Cover
Half-title page
Title page
Dedication
Contents
Preface
Part I Preliminaries
1 Vector Spaces and Bases
1.1 Definition of a Vector Space
1.2 Examples of Vector Spaces
1.3 Linear Subspaces
1.4 Spanning Sets, Linear Independence, and Bases
1.5 Linear Maps between Vector Spaces and Their Inverses
1.6 Existence of Bases and Zorn’s Lemma
Exercises
2 Metric Spaces
2.1 Metric Spaces
2.2 Open and Closed Sets
2.3 Continuity and Sequential Continuity
2.4 Interior, Closure, Density, and Separability
2.5 Compactness
Exercises
Part II Normed Linear Spaces
3 Norms and Normed Spaces
3.1 Norms
3.2 Examples of Normed Spaces
3.3 Convergence in Normed Spaces
3.4 Equivalent Norms
3.5 Isomorphisms between Normed Spaces
3.6 Separability of Normed Spaces
Exercises
4 Complete Normed Spaces
4.1 Banach Spaces
4.2 Examples of Banach Spaces
4.2.1 Sequence Spaces
4.2.2 Spaces of Functions
4.3 Sequences in Banach Spaces
4.4 The Contraction Mapping Theorem
Exercises
5 Finite-Dimensional Normed Spaces
5.1 Equivalence of Norms on Finite-Dimensional Spaces
5.2 Compactness of the Closed Unit Ball
Exercises
6 Spaces of Continuous Functions
6.1 The Weierstrass Approximation Theorem
6.2 The Stone–Weierstrass Theorem
6.3 The Arzelà–Ascoli Theorem
Exercises
7 Completions and the Lebesgue Spaces $L^p(\Omega)$
7.1 Non-completeness of C([0, 1]) with the $L^1$ Norm
7.2 The Completion of a Normed Space
7.3 Definition of the $L^p$ Spaces as Completions
Exercises
Part III Hilbert Spaces
8 Hilbert Spaces
8.1 Inner Products
8.2 The Cauchy–Schwarz Inequality
8.3 Properties of the Induced Norms
8.4 Hilbert Spaces
Exercises
9 Orthonormal Sets and Orthonormal Bases for Hilbert Spaces
9.1 Schauder Bases in Normed Spaces
9.2 Orthonormal Sets
9.3 Convergence of Orthogonal Series
9.4 Orthonormal Bases for Hilbert Spaces
9.5 Separable Hilbert Spaces
Exercises
10 Closest Points and Approximation
10.1 Closest Points in Convex Subsets of Hilbert Spaces
10.2 Linear Subspaces and Orthogonal Complements
10.3 Best Approximations
Exercises
11 Linear Maps between Normed Spaces
11.1 Bounded Linear Maps
11.2 Some Examples of Bounded Linear Maps
11.3 Completeness of B(X, Y ) When Y Is Complete
11.4 Kernel and Range
11.5 Inverses and Invertibility
Exercises
12 Dual Spaces and the Riesz Representation Theorem
12.1 The Dual Space
12.2 The Riesz Representation Theorem
Exercises
13 The Hilbert Adjoint of a Linear Operator
13.1 Existence of the Hilbert Adjoint
13.2 Some Examples of the Hilbert Adjoint
Exercises
14 The Spectrum of a Bounded Linear Operator
14.1 The Resolvent and Spectrum
14.2 The Spectral Mapping Theorem for Polynomials
Exercises
15 Compact Linear Operators
15.1 Compact Operators
15.2 Examples of Compact Operators
15.3 Two Results for Compact Operators
Exercises
16 The Hilbert–Schmidt Theorem
16.1 Eigenvalues of Self-Adjoint Operators
16.2 Eigenvalues of Compact Self-Adjoint Operators
16.3 The Hilbert–Schmidt Theorem
Exercises
17 Application: Sturm–Liouville Problems
17.1 Symmetry of L and the Wronskian
17.2 The Green’s Function
17.3 Eigenvalues of the Sturm–Liouville Problem
Part IV Banach Spaces
18 Dual Spaces of Banach Spaces
18.1 The Young and Hölder Inequalities
18.2 The Dual Spaces of $\ell^p$
18.3 Dual Spaces of $L^p(\Omega)$
Exercises
19 The Hahn–Banach Theorem
19.1 The Hahn–Banach Theorem: Real Case
19.2 The Hahn–Banach Theorem: Complex Case
Exercises
20 Some Applications of the Hahn–Banach Theorem
20.1 Existence of a Support Functional
20.2 The Distance Functional
20.3 Separability of $X^*$ Implies Separability of $X$
20.4 Adjoints of Linear Maps between Banach Spaces
20.5 Generalised Banach Limits
Exercises
21 Convex Subsets of Banach Spaces
21.1 The Minkowski Functional
21.2 Separating Convex Sets
21.3 Linear Functionals and Hyperplanes
21.4 Characterisation of Closed Convex Sets
21.5 The Convex Hull
21.6 The Krein–Milman Theorem
Exercises
22 The Principle of Uniform Boundedness
22.1 The Baire Category Theorem
22.2 The Principle of Uniform Boundedness
22.3 Fourier Series of Continuous Functions
Exercises
23 The Open Mapping, Inverse Mapping, and Closed Graph Theorems
23.1 The Open Mapping and Inverse Mapping Theorems
23.2 Schauder Bases in Separable Banach Spaces
23.3 The Closed Graph Theorem
Exercises
24 Spectral Theory for Compact Operators
24.1 Properties of T − I When T Is Compact
24.2 Properties of Eigenvalues
25 Unbounded Operators on Hilbert Spaces
25.1 Adjoints of Unbounded Operators
25.2 Closed Operators and the Closure of Symmetric Operators
25.3 The Spectrum of Closed Unbounded Self-Adjoint Operators
26 Reflexive Spaces
26.1 The Second Dual
26.2 Some Examples of Reflexive Spaces
26.3 $X$ Is Reflexive If and Only If $X^*$ Is Reflexive
Exercises
27 Weak and Weak-∗ Convergence
27.1 Weak Convergence
27.2 Examples of Weak Convergence in Various Spaces
27.2.1 Weak Convergence in $\ell^p$, $1 < p < \infty$
27.2.2 Weak Convergence in $\ell^1$: Schur’s Theorem
27.2.3 Weak versus Pointwise Convergence in C([0, 1])
27.3 Weak Closures
27.4 Weak-∗ Convergence
27.5 Two Weak-Compactness Theorems
Exercises
Appendices
Appendix A Zorn’s Lemma
Appendix B Lebesgue Integration
Appendix C The Banach–Alaoglu Theorem
Solutions to Exercises
References
Index


An Introduction to Functional Analysis

This accessible text covers key results in functional analysis that are essential for further study in the calculus of variations, analysis, dynamical systems, and the theory of partial differential equations. The treatment of Hilbert spaces covers the topics required to prove the Hilbert–Schmidt Theorem, including orthonormal bases, the Riesz Representation Theorem, and the basics of spectral theory. The material on Banach spaces and their duals includes the Hahn–Banach Theorem, the Krein–Milman Theorem, and results based on the Baire Category Theorem, before culminating in a proof of sequential weak compactness in reflexive spaces. Arguments are presented in detail, and more than 200 fully-worked exercises are included to provide practice applying techniques and ideas beyond the major theorems. Familiarity with the basic theory of vector spaces and point-set topology is assumed, but knowledge of measure theory is not required, making this book ideal for upper undergraduate-level and beginning graduate-level courses.

JAMES ROBINSON is a professor in the Mathematics Institute at the University of Warwick. He has been the recipient of a Royal Society University Research Fellowship and an EPSRC Leadership Fellowship. He has written six books in addition to his many publications in infinite-dimensional dynamical systems, dimension theory, and partial differential equations.

An Introduction to Functional Analysis

JAMES C. ROBINSON
University of Warwick

University Printing House, Cambridge CB2 8BS, United Kingdom
One Liberty Plaza, 20th Floor, New York, NY 10006, USA
477 Williamstown Road, Port Melbourne, VIC 3207, Australia
314–321, 3rd Floor, Plot 3, Splendor Forum, Jasola District Centre, New Delhi – 110025, India
79 Anson Road, #06–04/06, Singapore 079906

Cambridge University Press is part of the University of Cambridge. It furthers the University’s mission by disseminating knowledge in the pursuit of education, learning, and research at the highest international levels of excellence.

www.cambridge.org
Information on this title: www.cambridge.org/9780521899642
DOI: 10.1017/9781139030267

© James C. Robinson 2020

This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published 2020

Printed in the United Kingdom by TJ International Ltd. Padstow Cornwall

A catalogue record for this publication is available from the British Library.

ISBN 978-0-521-89964-2 Hardback
ISBN 978-0-521-72839-3 Paperback

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.


Preface

This book is intended to cover the core functional analysis syllabus and, in particular, presents many of the results that are needed in partial differential equations, the calculus of variations, or dynamical systems. The material is developed far enough that the next step would be application to one of these areas or further pursuit of ‘functional analysis’ itself at a significantly more advanced level.

The content is based on the two functional analysis modules taught at the University of Warwick to our third-year undergraduates. As such, it should be straightforward to use this book (with some judicious pruning) as the basis of a two-term course, with Part III (Hilbert spaces) taught in the first term and Part IV (Banach spaces) in the second term. Part II contains foundational material (a general theory of normed spaces and a collection of example spaces) that is needed for both Parts III and IV; some of this material could find a home in either term, according to taste. A one-term standalone module on Banach spaces could be based on Part II; Chapters 11, 14, and 15 from Part III; and Part IV.

Familiarity is assumed with the theory of finite-dimensional vector spaces and basic point-set topology (metric spaces, open and closed sets, compactness, and completeness), which is revised, at a fairly brisk pace and with some proofs omitted, in the first two chapters. No knowledge of measure theory or Lebesgue integration is required: the Lebesgue spaces are introduced as completions of the space of continuous functions in Chapter 7, with the standard construction of the Lebesgue integral outlined in Appendix B.

The canonical examples of non-Hilbert Banach spaces used in Part IV are the sequence spaces $\ell^p$ rather than the Lebesgue spaces $L^p$; I hope that this will make the book accessible to a wider audience. In the same spirit I have tried to spell out all the arguments in detail; there are no¹ four-line proofs that when written with all the details expand to fill the same number of pages.

¹ Actually, there is one. An abridged version of the proof that $(L^p)^* \equiv L^q$ appears in Chapter 18 and takes about half a page. The detailed proof, which requires some non-trivial measure theory, takes up two pages in Appendix B.

For the most part the approach adopted here is to cover the simpler case of Hilbert spaces in Part III before turning to Banach spaces, for which the theory becomes more abstract, in Part IV. There is an argument that it is more efficient to prove results in Banach spaces before specialising to Hilbert spaces, but my suspicion is that this is a product of familiarity and experience: in the same way one might argue that it is more economical to teach analysis in metric spaces before specialising to the particular case of real sequences and real-valued functions. That said, some basic concepts and results are not significantly simpler in Hilbert spaces, so portions of Parts II and III deal with Banach rather than Hilbert spaces.

By way of a very brief overview of the contents of the book, it is perhaps useful to describe the end points of Parts III and IV. Part III works towards the Hilbert–Schmidt Theorem that decomposes a self-adjoint compact operator on a Hilbert space in terms of its eigenvalues and eigenfunctions, and then applies this to the example of the Sturm–Liouville eigenvalue problem. It therefore covers orthonormal bases, orthogonal projections, the Riesz Representation Theorem, and the basics of spectral theory. Part IV culminates with the result that the closed unit ball in a reflexive Banach space is weakly sequentially compact. So this part covers dual spaces in more detail, the Hahn–Banach Theorem and applications to convex sets, results for linear operators based on the Baire Category Theorem, reflexivity, and weak and weak-∗ convergence.

Almost every chapter ends with a collection of exercises, and full solutions to these are given at the end of the book. There are three appendices. The first shows the equivalence of Zorn’s Lemma and the Axiom of Choice; the second provides a quick overview of the construction of the Lebesgue integral and proves properties of the Lebesgue spaces that rely on measure-theoretic techniques; and the third proves the Banach–Alaoglu Theorem on weak-∗ compactness of the closed unit ball in an arbitrary Banach space, a topological result that lies outside the scope of the main part of the book.

I am indebted to those at Warwick who taught the Functional Analysis courses before me, both in the selection of the material and the general approach. Although I have adapted both over the years, the skeleton of this book was provided by Robert MacKay and Keith Ball, to whom I am very grateful. Those who have subsequently taught the same material, Richard Sharp and Vassili Gelfreich, have also been extremely helpful.

Writing a textbook encourages a magpie approach to results, proofs, and examples. I have been extremely fortunate that there are already a large number of texts on functional analysis, and I have tried to take advantage of the many insights and the imaginative problems that they contain. Just as there are standard results and standard proofs, there are many standard exercises, but I have credited those that I have adopted that seemed particularly imaginative or unusual. In addition, there is a long list of references at the back of the book, and each of these has contributed something to this text. I would particularly like to acknowledge the book by Rynne and Youngson (2008) and the older texts by Kreyszig (1978) and Pryce (1973) as consistent sources of inspiration. The books by Giles (2000) and Lax (2002) contain many interesting examples and exercises.

I have not tried to trace the history of the many now ‘classical’ results that occur throughout the book. For those who are interested in this aspect of the subject, Giles (2000) has an appendix that gives a nice overview of the historical background, and historical comments are woven throughout the text by Lax (2002). Banach’s 1932 monograph contains a significant proportion of the results in Part IV.

Many staff at Cambridge University Press have been involved with this project over the years: Clare Dennison, Sam Harrison, Amy He, Kaitlin Leach, Peter Thompson, and David Tranah. Given such a long list of names, it goes without saying that I would like to thank them all for their patience and support (and apologise to anybody I have missed). I would particularly like to thank Kaitlin for ultimately holding me to a deadline that meant I finally finished the book.

Lastly, I am extremely grateful to Wojciech Ożański, who read a draft version of this book and provided me with many corrections, suggestions, and insightful comments.

PART I Preliminaries

1 Vector Spaces and Bases

Much of the theory of ‘functional analysis’ that we will consider in this book is an infinite-dimensional version of results familiar for linear operators between finite-dimensional vector spaces. We therefore start by recalling some of the basic theory of linear algebra, beginning with the formal definition of a vector space. We then discuss linear maps between vector spaces, and end by proving that every vector space has a basis using Zorn’s Lemma. Proofs of basic results from linear algebra can be found in Friedberg et al. (2004) or in Chapter 4 of Naylor and Sell (1982), for example.

1.1 Definition of a Vector Space

The linear spaces that occur naturally in functional analysis are vector spaces defined over $\mathbb{R}$ or $\mathbb{C}$; we will refer to real or complex vector spaces respectively, but generally we will omit the word ‘real’ or ‘complex’ unless we need to make an explicit distinction between the two cases. Throughout the book we use the symbol $\mathbb{K}$ to denote either $\mathbb{R}$ or $\mathbb{C}$.

Definition 1.1 A vector space $V$ over $\mathbb{K}$ is a set $V$ along with notions of addition in $V$ and multiplication by scalars, i.e.

$$x + y \in V \quad \text{for } x, y \in V \qquad \text{and} \qquad \lambda x \in V \quad \text{for } \lambda \in \mathbb{K},\ x \in V, \tag{1.1}$$

such that

(i) additive and multiplicative identities exist: there exists a zero element $0 \in V$ such that $x + 0 = x$ for all $x \in V$; and $1 \in \mathbb{K}$ is the identity for scalar multiplication, $1x = x$ for all $x \in V$;

(ii) there are additive inverses: for every $x \in V$ there exists an element $-x \in V$ such that $x + (-x) = 0$;

(iii) addition is commutative and associative,

$$x + y = y + x \quad \text{and} \quad x + (y + z) = (x + y) + z,$$

for all $x, y, z \in V$; and

(iv) multiplication is associative,

$$\alpha(\beta x) = (\alpha\beta)x \quad \text{for all } \alpha, \beta \in \mathbb{K},\ x \in V,$$

and distributive,

$$\alpha(x + y) = \alpha x + \alpha y \quad \text{and} \quad (\alpha + \beta)x = \alpha x + \beta x$$

for all $\alpha, \beta \in \mathbb{K}$, $x, y \in V$.

In checking that a particular collection $V$ is a vector space over $\mathbb{K}$, properties (i)–(iv) are often immediate; one usually has to check only that $V$ is closed under addition and scalar multiplication (i.e. that (1.1) holds).

1.2 Examples of Vector Spaces

Of course, $\mathbb{R}^n$ is a real vector space, but it is not a vector space over $\mathbb{C}$, since $i\mathbf{x} \notin \mathbb{R}^n$ for any¹ non-zero $\mathbf{x} \in \mathbb{R}^n$. In contrast, $\mathbb{C}^n$ can be viewed as a vector space over either $\mathbb{R}$ or $\mathbb{C}$; the space $\mathbb{C}^n$ over $\mathbb{R}$ is (according to the terminology introduced above) a ‘real vector space’. This example is a useful illustration that the real/complex label refers to the field $\mathbb{K}$, i.e. the allowable scalar multiples, rather than to the elements of the space itself.

¹ Throughout this book we will use a bold $\mathbf{x}$ for elements of $\mathbb{R}^n$ (also of $\mathbb{C}^n$), with $\mathbf{x}$ given in components by $\mathbf{x} = (x_1, \ldots, x_n)$.

Given any two vector spaces $V_1$ and $V_2$ over $\mathbb{K}$, the product space $V_1 \times V_2$ consisting of all pairs $(v_1, v_2)$ with $v_1 \in V_1$ and $v_2 \in V_2$ is another vector space if we define

$$(v_1, v_2) + (u_1, u_2) := (v_1 + u_1, v_2 + u_2) \quad \text{and} \quad \alpha(v_1, v_2) := (\alpha v_1, \alpha v_2),$$

for $v_1, u_1 \in V_1$, $v_2, u_2 \in V_2$, $\alpha \in \mathbb{K}$.

We now introduce some less trivial examples.

Example 1.2 The space $F(U, V)$ of all functions $f\colon U \to V$, where $U$ and $V$ are both vector spaces over the same field $\mathbb{K}$, is itself a vector space, if we use the obvious definitions of what addition and scalar multiplication should mean for functions. We give these definitions here for the one and only time: for $f, g \in F(U, V)$ and $\alpha \in \mathbb{K}$, we denote by $f + g$ the function from $U$ to $V$ whose values are given by

$$(f + g)(x) = f(x) + g(x), \quad x \in U,$$

(‘pointwise addition’) and by $\alpha f$ the function whose values are

$$(\alpha f)(x) = \alpha f(x), \quad x \in U$$

(‘pointwise multiplication’).

Example 1.3 The space $C([a, b]; \mathbb{K})$ of all $\mathbb{K}$-valued continuous functions on the interval $[a, b]$ is a vector space. We will often write $C([a, b])$ for $C([a, b]; \mathbb{R})$.

Proof The sum of two continuous functions is again continuous, as is any scalar multiple of a continuous function.

Example 1.4 The space $P(I)$ of all real polynomials on any interval $I \subset \mathbb{R}$,

$$P(I) = \Big\{ p\colon I \to \mathbb{R} : p(x) = \sum_{j=0}^{n} a_j x^j,\ n = 0, 1, 2, \ldots,\ a_j \in \mathbb{R} \Big\}$$

is a vector space.

The next example introduces a family of spaces that will prove to be particularly important.

Example 1.5 For $1 \le p < \infty$ the space $\ell^p(\mathbb{K})$ consists of all $p$th power summable sequences $\mathbf{x} = (x_j)_{j=1}^{\infty}$ with elements in $\mathbb{K}$, i.e.

$$\ell^p(\mathbb{K}) = \Big\{ \mathbf{x} = (x_j)_{j=1}^{\infty} : x_j \in \mathbb{K},\ \sum_{j=1}^{\infty} |x_j|^p < \infty \Big\}.$$

For $p = \infty$, $\ell^\infty(\mathbb{K})$ is the space of all bounded sequences in $\mathbb{K}$. Sometimes we will simply write $\ell^p$ for $\ell^p(\mathbb{K})$. Note that, as with $\mathbb{K}^n$, we will use a bold $\mathbf{x}$ to denote a particular sequence in $\ell^p$. For $\mathbf{x}, \mathbf{y} \in \ell^p(\mathbb{K})$ we set

$$\mathbf{x} + \mathbf{y} := (x_1 + y_1, x_2 + y_2, \ldots),$$

and for $\alpha \in \mathbb{K}$, $\mathbf{x} \in \ell^p$, we define

$$\alpha\mathbf{x} := (\alpha x_1, \alpha x_2, \ldots).$$

With these definitions $\ell^p(\mathbb{K})$ is a vector space.


Proof The only thing that is not immediate is whether $\mathbf{x} + \mathbf{y} \in \ell^p(\mathbb{K})$ if $\mathbf{x}, \mathbf{y} \in \ell^p(\mathbb{K})$. This is clear when $p = \infty$, since

$$\sup_{j \in \mathbb{N}} |x_j + y_j| \le \sup_{j \in \mathbb{N}} |x_j| + \sup_{j \in \mathbb{N}} |y_j| < \infty.$$

For $1 \le p < \infty$ this follows using the inequality

$$(a + b)^p \le [2\max(a, b)]^p \le 2^p(a^p + b^p), \quad \text{for } a, b \ge 0; \tag{1.2}$$

for every $n \in \mathbb{N}$ we have

$$\sum_{j=1}^{n} |x_j + y_j|^p \le \sum_{j=1}^{n} 2^p(|x_j|^p + |y_j|^p) \le 2^p \sum_{j=1}^{\infty} |x_j|^p + 2^p \sum_{j=1}^{\infty} |y_j|^p < \infty$$

and so $\sum_{j=1}^{\infty} |x_j + y_j|^p < \infty$ as required.

(The factor $2^p$ in (1.2) can be improved to $2^{p-1}$; see Exercise 1.1.)
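Inequality (1.2) and the resulting bound on partial sums can be sanity-checked numerically. The following is only a minimal sketch in Python, not part of the original text: the function names `check_inequality` and `truncated_sum_bound` and the sample values are assumptions chosen for illustration, and a finite check of this kind is no substitute for the proof above.

```python
from itertools import product


def check_inequality(p, values, tol=1e-9):
    """Check (a+b)^p <= [2 max(a,b)]^p <= 2^p (a^p + b^p) on all sample pairs."""
    for a, b in product(values, repeat=2):
        lhs = (a + b) ** p
        mid = (2 * max(a, b)) ** p
        rhs = 2 ** p * (a ** p + b ** p)
        # Allow a small relative tolerance for floating-point rounding.
        if lhs > mid * (1 + tol) or mid > rhs * (1 + tol):
            return False
    return True


def truncated_sum_bound(x, y, p, tol=1e-9):
    """Check sum |x_j + y_j|^p <= 2^p (sum |x_j|^p + sum |y_j|^p) for finite lists."""
    lhs = sum(abs(a + b) ** p for a, b in zip(x, y))
    rhs = 2 ** p * (sum(abs(a) ** p for a in x) + sum(abs(b) ** p for b in y))
    return lhs <= rhs * (1 + tol)


# Hypothetical sample data: non-negative reals and two truncated sequences.
samples = [0.0, 0.1, 1.0, 3.7, 100.0]
x = [1 / j for j in range(1, 1001)]
y = [(-1) ** j / j ** 2 for j in range(1, 1001)]

print(all(check_inequality(p, samples) for p in (1, 1.5, 2, 7)))
print(truncated_sum_bound(x, y, 2))
```

Only finitely many terms are ever summed here, which mirrors the step in the proof where the estimate is first established for every finite $n$ before passing to the infinite sum.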

1.3 Linear Subspaces

If $V$ is a vector space (over $\mathbb{K}$) then a subset $U \subset V$ is a subspace of $V$ if $U$ is again a vector space, i.e. if it is closed under addition and scalar multiplication: $u_1 + u_2 \in U$ for every $u_1, u_2 \in U$ and $\lambda u \in U$ for every $\lambda \in \mathbb{K}$, $u \in U$.

Example 1.6 For any $\mathbf{y} \in \mathbb{R}^n$, the set $\{\mathbf{x} \in \mathbb{R}^n : \mathbf{x} \cdot \mathbf{y} = 0\}$ is a subspace of $\mathbb{R}^n$.

Example 1.7 The set

$$X = \Big\{ f \in C([-1, 1]) : \int_{-1}^{0} f(x)\,dx = 0,\ \int_{0}^{1} f(x)\,dx = 0 \Big\}$$

is a subspace of $C([-1, 1])$.

Example 1.8 The space $c_0(\mathbb{K})$ of all null sequences, i.e. of all sequences $\mathbf{x} = (x_j)_{j=1}^{\infty}$ such that $x_j \to 0$ as $j \to \infty$, is a subspace of $\ell^\infty(\mathbb{K})$, and for every $1 \le p < \infty$ the space $\ell^p(\mathbb{K})$ is a subspace of $c_0(\mathbb{K})$. The space $c_{00}(\mathbb{K})$ of all sequences with only a finite number of non-zero terms is a subspace of $c_0(\mathbb{K})$ and of $\ell^p(\mathbb{K})$ for every $1 \le p \le \infty$.


Proof For the inclusion properties of $c_0(\mathbb{K})$, note that any convergent sequence (in particular any null sequence) is bounded, which shows that $c_0(\mathbb{K}) \subset \ell^\infty(\mathbb{K})$. If $\mathbf{x} \in \ell^p$, $1 \le p < \infty$, then $\sum_{j=1}^{\infty} |x_j|^p < \infty$, which implies that $|x_j|^p \to 0$ as $j \to \infty$, so $\mathbf{x} \in c_0(\mathbb{K})$. The properties of $c_{00}(\mathbb{K})$ are immediate.
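The closure conditions that define a subspace can be checked directly on concrete vectors. The sketch below, in Python, looks at the set from Example 1.6 with $n = 3$; the vector `y` and the helper names `dot`, `in_subspace`, `add`, and `scale` are hypothetical choices made for this illustration, and testing a few vectors only illustrates the definition rather than proving anything.

```python
def dot(x, y):
    """Euclidean dot product of two tuples of equal length."""
    return sum(a * b for a, b in zip(x, y))


def in_subspace(x, y):
    """Membership test for U = {x in R^n : x . y = 0}."""
    return dot(x, y) == 0


def add(u, v):
    """Componentwise sum of two vectors."""
    return tuple(a + b for a, b in zip(u, v))


def scale(lam, u):
    """Scalar multiple of a vector."""
    return tuple(lam * a for a in u)


# Hypothetical data: integer entries keep the arithmetic exact.
y = (1, 2, 3)
u1 = (2, -1, 0)
u2 = (3, 0, -1)

print(in_subspace(u1, y), in_subspace(u2, y))
print(in_subspace(add(u1, u2), y))   # closed under addition
print(in_subspace(scale(7, u1), y))  # closed under scalar multiplication
```

By contrast, a set such as $\{\mathbf{x} : \mathbf{x} \cdot \mathbf{y} = 1\}$ is not a subspace, since the sum of two of its elements has dot product 2 with $\mathbf{y}$.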

1.4 Spanning Sets, Linear Independence, and Bases

We now recall the definition of a vector-space basis, which will also allow us to define the dimension of a vector space.

Definition 1.9 The linear span of a subset $E$ of a vector space $V$ is the collection of all finite linear combinations of elements of $E$:

$$\mathrm{Span}(E) = \Big\{ v \in V : v = \sum_{j=1}^{n} \alpha_j e_j, \text{ for some } n \in \mathbb{N},\ \alpha_j \in \mathbb{K},\ e_j \in E \Big\}.$$

We say that $E$ spans $V$ if $V = \mathrm{Span}(E)$.

If $E$ spans $V$ this means that we can write any $v \in V$ in the form

$$v = \sum_{j=1}^{n} \alpha_j e_j,$$

i.e. $v$ can be expressed as a finite linear combination of elements of $E$. (Once we have a way to discuss convergence we will also be able to consider ‘infinite linear combinations’, but these are not available when we can only use the vector-space axioms.)

Definition 1.10 A set $E \subset V$ is linearly independent if any finite collection of elements of $E$ is linearly independent, i.e.

$$\sum_{j=1}^{n} \alpha_j e_j = 0 \quad \Rightarrow \quad \alpha_1 = \cdots = \alpha_n = 0$$

for any choice of $n \in \mathbb{N}$, $\alpha_j \in \mathbb{K}$, and distinct $e_j \in E$.

To distinguish the standard definition of a basis for a vector space from the notion of a ‘Schauder basis’, which we will meet later, we refer to such a basis as a ‘Hamel basis’.


Definition 1.11 A Hamel basis for a vector space $V$ is any linearly independent spanning set.

Expansions in terms of basis elements are unique (for a proof see Exercise 1.3).

Lemma 1.12 If $E$ is a Hamel basis for $V$, then any element of $V$ can be written uniquely in the form

$$v = \sum_{j=1}^{n} \alpha_j e_j$$

for some $n \in \mathbb{N}$, $\alpha_j \in \mathbb{K}$, and $e_j \in E$.

Any Hamel basis $E$ of $V$ must be a maximal linearly independent set, i.e. $E$ is linearly independent and $E \cup \{v\}$ is not linearly independent for any $v \in V \setminus E$. We now show that this can be reversed.

Lemma 1.13 If $E \subset V$ is a maximal linearly independent set, then $E$ is a Hamel basis for $V$.

Proof To show that $E$ is a Hamel basis we only need to show that it spans $V$, since it is linearly independent by assumption. If $E$ does not span $V$, then there exists some $v \in V$ that cannot be written as any finite linear combination of the elements of $E$. To obtain a contradiction, we show that in this case $E \cup \{v\}$ must be a linearly independent set. Choose $n \in \mathbb{N}$ and $\{e_j\}_{j=1}^{n} \subset E$, and suppose that

$$\sum_{j=1}^{n} \alpha_j e_j + \alpha_{n+1} v = 0.$$

Since $v$ cannot be written as a linear combination of any finite collection of the $\{e_j\}$, we must have $\alpha_{n+1} = 0$, which leaves $\sum_{j=1}^{n} \alpha_j e_j = 0$. However, since $E$ is linearly independent and $\{e_j\}_{j=1}^{n}$ is a finite subset of $E$ it follows that $\alpha_j = 0$ for all $j = 1, \ldots, n$. Since we already have $\alpha_{n+1} = 0$, it follows that $E \cup \{v\}$ is linearly independent, contradicting the fact that $E$ is a maximal linearly independent set. So $E$ spans $V$, as claimed.

If $V$ has a basis consisting of a finite number of elements, then every basis of $V$ contains the same number of elements (for a proof see Exercise 1.4).

Lemma 1.14 If $V$ has a basis consisting of $n$ elements, then every basis for $V$ has $n$ elements.


This result allows us to make the following definition of the dimension of a vector space.

Definition 1.15 If $V$ has a basis consisting of a finite number of elements, then $V$ is finite-dimensional and the dimension of $V$ is the number of elements in this basis. If $V$ has no finite basis, then $V$ is infinite-dimensional.

Since a basis is a maximal linearly independent set (Lemma 1.13), it follows that a space is infinite-dimensional if and only if for every $n \in \mathbb{N}$ one can find a set of $n$ linearly independent elements of $V$.

Example 1.16 For every $1 \le p \le \infty$ the space $\ell^p(\mathbb{K})$ is infinite-dimensional.

Proof Let us define for each $j \in \mathbb{N}$ the sequence

$$\mathbf{e}^{(j)} = (0, 0, \ldots, 1, 0, \ldots), \tag{1.3}$$

which consists entirely of zeros apart from having 1 as its $j$th term. We can also write

$$e^{(j)}_i = \delta_{ij} := \begin{cases} 1 & i = j \\ 0 & i \ne j, \end{cases} \tag{1.4}$$

where $\delta_{ij}$ is the Kronecker delta. These are all elements of $\ell^p(\mathbb{K})$ for every $p \in [1, \infty]$, and will frequently prove useful in what follows. For any $n \in \mathbb{N}$ the $n$ elements $\{\mathbf{e}^{(j)}\}_{j=1}^{n}$ are linearly independent, since

$$\sum_{j=1}^{n} \alpha_j \mathbf{e}^{(j)} = (\alpha_1, \alpha_2, \ldots, \alpha_n, 0, 0, 0, \ldots) = 0$$

implies that $\alpha_1 = \alpha_2 = \cdots = \alpha_n = 0$. It follows that $\ell^p(\mathbb{K})$ is an infinite-dimensional vector space.

Example 1.17 The vector space $C([0, 1]; \mathbb{K})$ is infinite-dimensional.

Proof For any $n \in \mathbb{N}$ the functions $\{1, x, x^2, \ldots, x^n\}$ are linearly independent: if

$$f(x) := \sum_{j=0}^{n} \alpha_j x^j = 0 \quad \text{for every } x \in [0, 1],$$

then $\alpha_j = 0$ for every $j$. To see this, first set $x = 0$, which shows that $\alpha_0 = 0$, then differentiate once to obtain

$$f'(x) = \sum_{j=1}^{n} \alpha_j j x^{j-1} = 0$$

and set $x = 0$ to show that $\alpha_1 = 0$. Continue differentiating repeatedly, each time setting $x = 0$ to show that $\alpha_j = 0$ for all $j = 0, \ldots, n$.
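The differentiate-and-evaluate argument in the proof of Example 1.17 amounts to the identity $f^{(k)}(0) = k!\,\alpha_k$, so the coefficients of a polynomial are determined by its derivatives at zero. The following Python sketch mirrors that argument on coefficient lists; it is an illustration under stated assumptions, and the names `derivative`, `evaluate`, and `coefficients_from_derivatives` are hypothetical helpers introduced here, not notation from the text.

```python
from fractions import Fraction
from math import factorial


def derivative(coeffs):
    """Coefficients of f' given the coefficients [a_0, a_1, ..., a_n] of f."""
    return [j * a for j, a in enumerate(coeffs)][1:]


def evaluate(coeffs, x):
    """Evaluate the polynomial at x using Horner's rule."""
    result = 0
    for a in reversed(coeffs):
        result = result * x + a
    return result


def coefficients_from_derivatives(coeffs):
    """Recover each a_k as f^(k)(0) / k!, following the proof step by step."""
    recovered = []
    current = list(coeffs)
    for k in range(len(coeffs)):
        # Setting x = 0 in the k-th derivative isolates k! * a_k.
        recovered.append(Fraction(evaluate(current, 0)) / factorial(k))
        current = derivative(current)
    return recovered


# Hypothetical example: f(x) = 3 - 2x^2 + 5x^3.
print(coefficients_from_derivatives([3, 0, -2, 5]))
```

In particular, if $f$ vanishes identically then every derivative at zero vanishes, and the recovered coefficients are all zero, which is the linear independence statement.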

1.5 Linear Maps between Vector Spaces and Their Inverses

Vector spaces have a linear structure, i.e. we can add elements and multiply by scalars. When we consider maps from one vector space to another, it is natural to consider maps that respect this linear structure.

Definition 1.18 If $X$ and $Y$ are vector spaces over $\mathbb{K}$, then a map $T : X \to Y$ is linear if

$$T(x + x') = T(x) + T(x') \quad \text{and} \quad T(\alpha x) = \alpha T(x), \qquad \alpha \in \mathbb{K},\ x, x' \in X.$$

(This is the same as requiring that $T(\alpha x + \beta x') = \alpha T(x) + \beta T(x')$ for any $\alpha, \beta \in \mathbb{K}$, $x, x' \in X$.) We often omit the brackets around the argument, and write $Tx$ for $T(x)$ when $T$ is linear.

Note that the definition of what it means to be linear involves the field $\mathbb{K}$. So, for example, if we take $X = Y = \mathbb{C}$ and let $T(z) = \bar{z}$ (the complex conjugate of $z$), this map is linear if we take $\mathbb{K} = \mathbb{R}$, but not if we take $\mathbb{K} = \mathbb{C}$. We always have

$$T(z + w) = \overline{z + w} = \bar{z} + \bar{w} = T(z) + T(w), \qquad z, w \in \mathbb{C},$$

but the linearity property for scalar multiples only holds if $\alpha \in \mathbb{R}$, since $T(\alpha z) = \overline{\alpha z} = \bar{\alpha}\,\bar{z}$ and (for $z \ne 0$) this is equal to $\alpha \bar{z} = \alpha T(z)$ if and only if $\alpha \in \mathbb{R}$. This kind of ‘conjugate-linear’ behaviour is common enough that it is worth making a formal definition.
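The dependence of linearity on the field can be seen in a few lines. The following sketch (the sample values $z$, $w$ and the scalars are arbitrary choices made for illustration) checks that complex conjugation is additive and respects real scalars, fails for the scalar $\mathrm{i}$, and instead pulls out the conjugate of the scalar.

```python
def T(z: complex) -> complex:
    """Complex conjugation, viewed as a map from C to C."""
    return z.conjugate()


z, w = 1 + 2j, 3 - 1j

# Additivity holds regardless of the field.
additive = T(z + w) == T(z) + T(w)

# Homogeneity holds for a real scalar ...
real_homogeneous = T(2.5 * z) == 2.5 * T(z)

# ... but fails for the complex scalar i ...
complex_homogeneous = T(1j * z) == 1j * T(z)

# ... where instead the conjugate of the scalar appears.
conjugate_homogeneous = T(1j * z) == (1j).conjugate() * T(z)
```

The sample values are chosen so that all the floating-point products involved are exact, which is why equality can be tested directly here.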


Definition 1.19 If $X$ and $Y$ are vector spaces over $\mathbb{C}$, then a map $T : X \to Y$ is conjugate-linear if

$$T(x + x') = Tx + Tx' \quad \text{and} \quad T(\alpha x) = \bar{\alpha}\, Tx, \qquad \alpha \in \mathbb{C},\ x, x' \in X.$$

(Such maps are sometimes called anti-linear.)

The space of all linear maps from $X$ into $Y$ we write as $L(X, Y)$, and when $Y = X$ we abbreviate this to $L(X)$. This is a vector space: for $T_1, T_2 \in L(X, Y)$ and $\alpha \in \mathbb{K}$ we define $T_1 + T_2$ and $\alpha T_1$ by setting

$$(T_1 + T_2)(x) = T_1 x + T_2 x \quad \text{and} \quad (\alpha T_1)(x) = \alpha T_1 x, \qquad x \in X.$$

With these definitions a linear combination of two linear maps is again a linear map:

$$T_1, T_2 \in L(X, Y) \quad \Rightarrow \quad \alpha T_1 + \beta T_2 \in L(X, Y), \qquad \alpha, \beta \in \mathbb{K}.$$

Similarly the composition of compatible linear maps is again linear,

$$T \in L(X, Y),\ S \in L(Y, Z) \quad \Rightarrow \quad S \circ T \in L(X, Z),$$

since $(S \circ T)(\alpha x + \beta x') = S(\alpha Tx + \beta Tx') = \alpha (S \circ T)x + \beta (S \circ T)x'$.

Definition 1.20 If $T \in L(X, Y)$, then we define its kernel as

$$\operatorname{Ker}(T) := \{x \in X : Tx = 0\}$$

and its range (or image) as

$$\operatorname{Range}(T) := \{y \in Y : y = Tx \text{ for some } x \in X\}.$$

These are both vector spaces (see Exercise 1.5).

One particularly simple (but important) example of a linear map is the identity map $I_X : X \to X$ given by $I_X(x) = x$.

Recall that a map $T : X \to Y$ is injective (or one-to-one) if

$$Tx = Tx' \quad \Rightarrow \quad x = x'.$$

To check if a linear map $T : X \to Y$ is injective, it is enough to show that its kernel is trivial, i.e. that $\operatorname{Ker}(T) = \{0\}$.

Lemma 1.21 A map $T \in L(X, Y)$ is injective if and only if $\operatorname{Ker}(T) = \{0\}$.


Proof We prove the equivalent statement that $T$ is not injective if and only if $\operatorname{Ker}(T) \ne \{0\}$. If $T$ is not injective, then there exist $x_1, x_2 \in X$ with $x_1 \ne x_2$ such that $Tx_1 = Tx_2$, i.e. $T(x_1 - x_2) = 0$, and so $x_1 - x_2 \in \operatorname{Ker}(T)$ and therefore $\operatorname{Ker}(T) \ne \{0\}$. Conversely, if $z \in \operatorname{Ker}(T)$ with $z \ne 0$, then for any $x_1 \in X$ we have $T(x_1 + z) = Tx_1$ with $x_1 + z \ne x_1$, and so $T$ is not injective.

A map $T : X \to Y$ is surjective (or onto) if for every $y \in Y$ there exists $x \in X$ such that $Tx = y$.

Lemma 1.22 If $X$ is a finite-dimensional vector space and $T \in L(X)$, then $T$ is injective if and only if $T$ is surjective.

Proof The Rank–Nullity Theorem (e.g. Theorem 2.3 in Friedberg et al. (2014) or Theorem 4.7.7 in Naylor and Sell (1982)) guarantees that

$$\dim(\operatorname{Ker}(T)) + \dim(\operatorname{Range}(T)) = \dim(X)$$

(the ‘nullity’ is the dimension of $\operatorname{Ker}(T)$ and the ‘rank’ is the dimension of $\operatorname{Range}(T)$). By Lemma 1.21, $T$ is injective precisely when $\dim(\operatorname{Ker}(T)) = 0$, which then implies that $\dim(\operatorname{Range}(T)) = \dim(X)$, so that $T$ is onto; similarly, if $T$ is onto, then $\dim(\operatorname{Range}(T)) = \dim(X)$, which implies that $\dim(\operatorname{Ker}(T)) = 0$, and so $T$ is also injective.

A map is bijective or a bijection if it is both injective and surjective. When $T$ is a bijection we can define its inverse.

Definition 1.23 A map $T : X \to Y$ has an inverse $T^{-1} : Y \to X$ if $T$ is a bijection, and in this case for each $y \in Y$ we define $T^{-1}y$ to be the unique $x \in X$ such that $Tx = y$.

Note that if $\operatorname{Ker}(T) = \{0\}$, then the linear map $T : X \to \operatorname{Range}(T)$ always has an inverse; if $X$ is infinite-dimensional $T$ may not map $X$ onto $Y$, but it always maps $X$ onto $\operatorname{Range}(T)$, by definition.

The following lemma shows that when $T \in L(X, Y)$ has an inverse, the map $T^{-1} : Y \to X$ is also linear.

Lemma 1.24 A linear map $T \in L(X, Y)$ has an inverse if and only if there exists $S \in L(Y, X)$ such that

$$ST = I_X \quad \text{and} \quad TS = I_Y, \tag{1.5}$$

and then $T^{-1} = S$.


Proof Suppose that $T : X \to Y$ is a bijection, so that it has an inverse $T^{-1} : Y \to X$; from the definition it follows that $TT^{-1} = I_Y$ and $T^{-1}T = I_X$. It remains to check that $T^{-1} : Y \to X$ is linear; this follows from the injectivity of $T$, since

$$T[T^{-1}(\alpha y + \beta z)] = \alpha y + \beta z = T[\alpha T^{-1}y + \beta T^{-1}z]$$

therefore implies that $T^{-1}(\alpha y + \beta z) = \alpha T^{-1}y + \beta T^{-1}z$.

For the converse, we note that $TS = I_Y$ implies that $T : X \to Y$ is onto, since $T(Sy) = y$, and that $ST = I_X$ implies that $T : X \to Y$ is one-to-one, since

$$Tx = Ty \quad \Rightarrow \quad S(Tx) = S(Ty) \quad \Rightarrow \quad x = y.$$

It follows that if (1.5) holds, then $T$ has an inverse $T^{-1}$, and applying $T^{-1}$ to both sides of $TS = I_Y$ shows that $S = T^{-1}$.

Note that if $T \in L(X, Y)$ and $S \in L(Y, Z)$ are both invertible, then so is $ST \in L(X, Z)$, with

$$(ST)^{-1} = T^{-1}S^{-1}; \tag{1.6}$$

since $ST$ is a bijection it has an inverse $(ST)^{-1}$ such that $ST(ST)^{-1} = I_Z$; multiplying on the left first by $S^{-1}$ and then by $T^{-1}$ yields (1.6).
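The order reversal in (1.6) is easy to get wrong, so a small concrete check may help. The sketch below represents two invertible maps on $\mathbb{R}^2$ by $2 \times 2$ matrices (the particular matrices, and the helper names `matmul` and `inverse`, are illustrative assumptions) and uses exact rational arithmetic to compare $(ST)^{-1}$ with both $T^{-1}S^{-1}$ and $S^{-1}T^{-1}$.

```python
from fractions import Fraction


def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def inverse(m):
    """Inverse of a 2x2 matrix via the adjugate formula, in exact arithmetic."""
    (a, b), (c, d) = m
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]


T = [[1, 2], [3, 5]]  # determinant -1, so invertible
S = [[2, 1], [1, 1]]  # determinant 1, so invertible

lhs = inverse(matmul(S, T))                  # (ST)^{-1}
rhs = matmul(inverse(T), inverse(S))         # T^{-1} S^{-1}
wrong_order = matmul(inverse(S), inverse(T)) # S^{-1} T^{-1}
```

For these matrices `lhs` and `rhs` agree, while `wrong_order` differs, reflecting the fact that $S$ and $T$ do not commute.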

1.6 Existence of Bases and Zorn’s Lemma

We end this chapter by showing that every vector space has a Hamel basis. To prove this, we will use Zorn’s Lemma, which is a very powerful result that will allow us to prove various existence results throughout this book. To state this ‘lemma’ (which is in fact equivalent to the Axiom of Choice, as shown in Appendix A) we need to introduce some auxiliary concepts.

Definition 1.25 A partial order on a set $P$ is a binary relation $\preceq$ on $P$ such that for $a, b, c \in P$
(i) $a \preceq a$;
(ii) $a \preceq b$ and $b \preceq a$ implies that $a = b$; and
(iii) $a \preceq b$ and $b \preceq c$ implies that $a \preceq c$.


The order is ‘partial’ because two arbitrary elements of $P$ need not be ordered: consider, for example, the case when $P$ consists of all subsets of $\mathbb{R}$ and $X \preceq Y$ if $X \subseteq Y$; one cannot order $[0, 1]$ and $[1, 2]$.

Definition 1.26 Two elements $a, b \in P$ are comparable if $a \preceq b$ or $b \preceq a$ (or both if $a = b$). A subset $C$ of $P$ is called a chain if any pair of elements of $C$ are comparable. An element $b \in P$ is an upper bound for a subset $S$ of $P$ if $s \preceq b$ for all $s \in S$. An element $m$ of $P$ is maximal if $m \preceq a$ for some $a \in P$ implies that $a = m$.

Note that among any finite collection of elements in a chain $C$ there is always a maximal and a minimal element: if $c_1, \ldots, c_n \in C$, then there are indices $j, k \in \{1, \ldots, n\}$ such that

$$c_j \preceq c_i \preceq c_k, \qquad i = 1, \ldots, n; \tag{1.7}$$

this can easily be proved by induction on $n$; see Exercise 1.7.

Theorem 1.27 (Zorn’s Lemma) If $P$ is a non-empty partially ordered set in which every chain has an upper bound, then $P$ has at least one maximal element.

It is easy to find examples in which there is more than one maximal element. For example, let $P$ consist of all points in the two disjoint intervals $I_1 = [0, 1]$ and $I_2 = [2, 3]$, and say that $a \preceq b$ if $a$ and $b$ are contained in the same interval and $a \le b$. Then every chain in $P$ has an upper bound, and $P$ contains two maximal elements, 1 and 3.

Theorem 1.28 Every vector space has a Hamel basis.

Proof If $V$ is finite-dimensional, then $V$ has a finite basis, by definition. So we assume that $V$ is infinite-dimensional. Let $P$ be the collection of all linearly independent subsets of $V$. We define a partial order on $P$ by declaring that $E_1 \preceq E_2$ if $E_1 \subseteq E_2$. If $C$ is a chain in $P$, then set

$$E^* = \bigcup_{E \in C} E.$$

Note that $E^*$ is linearly independent, since by (1.7) any finite collection of elements of $E^*$ must be contained in one $E \in C$ (which is linearly independent). Clearly $E \preceq E^*$ for all $E \in C$, so $E^*$ is an upper bound for $C$.


It follows from Zorn’s Lemma that $P$ has a maximal element, i.e. a maximal linearly independent set, and by Lemma 1.13 this is a Hamel basis for $V$.

As an example of a Hamel basis for an infinite-dimensional vector space, it is easy to see that the countable set $\{e^{(j)}\}_{j=1}^\infty$ (as defined in (1.3)) is a Hamel basis for the space $c_{00}$ from Example 1.8. However, this is a somewhat artificial example. We will see later (Exercises 5.7 and 22.1) that no infinite-dimensional Banach space (the particular class of vector spaces that will be our main subject in most of the rest of this book) can have a countable Hamel basis.

Exercises

1.1 Show that if $p \ge 1$ and $a, b \ge 0$, then

$$(a + b)^p \le 2^{p-1}(a^p + b^p).$$

[Hint: find the maximum of the function $f(x) = (1 + x)^p / (1 + x^p)$.]

1.2 For $1 \le p < \infty$, show that the set $\tilde{L}^p(0, 1)$ of all continuous real-valued functions on $(0, 1)$ for which

$$\int_0^1 |f(x)|^p \, \mathrm{d}x < \infty$$

is a vector space (with the obvious pointwise definitions of addition and scalar multiplication).

1.3 Show that if $E$ is a basis for a vector space $V$, then every non-zero $v \in V$ can be written uniquely in the form $v = \sum_{j=1}^n \alpha_j e_j$, for some $n \in \mathbb{N}$, $e_j \in E$, and non-zero coefficients $\alpha_j \in \mathbb{K}$.

1.4 Show that if $V$ has a basis consisting of $n$ elements, then every basis for $V$ has $n$ elements.

1.5 If $T \in L(X, Y)$ show that $\operatorname{Ker}(T)$ and $\operatorname{Range}(T)$ are both vector spaces.

1.6 If $X$ is a vector space over $\mathbb{K}$ and $U$ is a subspace of $X$ define an equivalence relation on $X$ by

$$x \sim y \quad \Leftrightarrow \quad x - y \in U.$$

The quotient space $X/U$ is the set of all equivalence classes

$$[x] = x + U := \{x + u : u \in U\}$$

for $x \in X$. Show that this is a vector space over $\mathbb{K}$ if we define

$$[x] + [y] := [x + y], \qquad \lambda[x] := [\lambda x], \qquad x, y \in X,\ \lambda \in \mathbb{K},$$

and deduce that the quotient map $Q : X \to X/U$ given by $x \mapsto [x]$ is linear.

1.7 Show that among any finite collection of elements in a chain $C$ there is always a maximal and a minimal element: if $c_1, \ldots, c_n \in C$, then there exist $j, k \in \{1, \ldots, n\}$ such that

$$c_j \preceq c_i \preceq c_k, \qquad i = 1, \ldots, n.$$

(Use induction on $n$.)

1.8 Let $Z$ be a linearly independent subset of a vector space $V$. Use Zorn’s Lemma to show that $V$ has a Hamel basis that contains $Z$.

2 Metric Spaces

Most of the results in this book concern normed spaces; but these are particular examples of metric spaces, and there are some ‘standard results’ that are no harder to prove in the more general context of metric spaces. In this chapter we therefore recall the definition of a metric space, along with definitions of convergence, continuity, separability, and compactness. The treatment in this chapter is intentionally brisk, but proofs are included. For a more didactic treatment see Sutherland (1975), for example.

2.1 Metric Spaces

A metric on a set $X$ is a generalisation of the ‘distance between two points’ familiar in Euclidean spaces.

Definition 2.1 A metric $d$ on a set $X$ is a map $d : X \times X \to [0, \infty)$ that satisfies
(i) $d(x, y) = 0$ if and only if $x = y$;
(ii) $d(x, y) = d(y, x)$ for every $x, y \in X$; and
(iii) $d(x, z) \le d(x, y) + d(y, z)$ for $x, y, z \in X$ (‘the triangle inequality’).

Even on a familiar space there can be many possible metrics.

Example 2.2 Take $X = \mathbb{K}^n$ with any one of the metrics

$$d_p(x, y) = \begin{cases} \left( \sum_{j=1}^n |x_j - y_j|^p \right)^{1/p} & 1 \le p < \infty, \\ \max_{j=1,\ldots,n} |x_j - y_j| & p = \infty. \end{cases}$$


The ‘standard metric’ on $\mathbb{K}^n$ is

$$d_2(x, y) = \left( \sum_{j=1}^n |x_j - y_j|^2 \right)^{1/2};$$

this is the metric we use on $\mathbb{K}^n$ (or subsets of $\mathbb{K}^n$) if none is specified.

Proof Property (i) is trivial, since $d_p(x, y) = 0$ implies that $x_j = y_j$ for each $j$, and property (ii) is immediate. We prove (iii) here only in the most common cases $p = 1, 2, \infty$; the proof for general $p$ is given in Lemma 3.6. For $p = 1$

$$d_1(x, z) = \sum_{j=1}^n |x_j - z_j| \le \sum_{j=1}^n \bigl( |x_j - y_j| + |y_j - z_j| \bigr) = \sum_{j=1}^n |x_j - y_j| + \sum_{j=1}^n |y_j - z_j| = d_1(x, y) + d_1(y, z),$$

using the triangle inequality in $\mathbb{K}$. For $p = \infty$ we have similarly

$$d_\infty(x, z) = \max_{j=1,\ldots,n} |x_j - z_j| \le \max_{j=1,\ldots,n} \bigl( |x_j - y_j| + |y_j - z_j| \bigr) \le \max_{j=1,\ldots,n} |x_j - y_j| + \max_{j=1,\ldots,n} |y_j - z_j| = d_\infty(x, y) + d_\infty(y, z).$$

For $p = 2$, writing $\xi_j = |x_j - y_j|$ and $\eta_j = |y_j - z_j|$,

$$d_2(x, z)^2 = \sum_{j=1}^n |x_j - z_j|^2 \le \sum_{j=1}^n \bigl( |x_j - y_j| + |y_j - z_j| \bigr)^2 = \sum_{j=1}^n \bigl( \xi_j^2 + 2\xi_j\eta_j + \eta_j^2 \bigr) \tag{2.1}$$

$$\le \left( \sum_{j=1}^n \xi_j^2 \right) + 2 \left( \sum_{j=1}^n \xi_j^2 \right)^{1/2} \left( \sum_{j=1}^n \eta_j^2 \right)^{1/2} + \left( \sum_{j=1}^n \eta_j^2 \right) \tag{2.2}$$

$$= \left[ \left( \sum_{j=1}^n \xi_j^2 \right)^{1/2} + \left( \sum_{j=1}^n \eta_j^2 \right)^{1/2} \right]^2 = [d_2(x, y) + d_2(y, z)]^2,$$


where to go from (2.1) to (2.2) we used the Cauchy–Schwarz inequality

$$\left( \sum_{j=1}^n \xi_j \eta_j \right)^2 \le \left( \sum_{j=1}^n \xi_j^2 \right) \left( \sum_{j=1}^n \eta_j^2 \right);$$

see Exercise 2.1 (and Lemma 8.5 in a more general context).

Note that the space $X$ in the definition of a metric need not be a vector space. The following example provides a metric on any set $X$; it is very useful for counterexamples.

Example 2.3 The discrete metric on any set $X$ is defined by setting

$$d(x, y) = \begin{cases} 0 & x = y, \\ 1 & x \ne y. \end{cases}$$

If $A$ is a subset of $X$ and $d$ is a metric on $X$, then $(A, d|_{A \times A})$ is another metric space, where by $d|_{A \times A}$ we denote the restriction of $d$ to $A \times A$, i.e.

$$d|_{A \times A}(a, b) = d(a, b), \qquad a, b \in A; \tag{2.3}$$

we usually drop the $|_{A \times A}$ since this is almost always clear from the context.

If we have two metric spaces $(X_1, d_1)$ and $(X_2, d_2)$, then we can choose many possible metrics on the product space $X_1 \times X_2$. The most useful choices are

$$\varrho_1\bigl( (x_1, x_2), (y_1, y_2) \bigr) := d_1(x_1, y_1) + d_2(x_2, y_2) \tag{2.4}$$

and

$$\varrho_2\bigl( (x_1, x_2), (y_1, y_2) \bigr) := \bigl( d_1(x_1, y_1)^2 + d_2(x_2, y_2)^2 \bigr)^{1/2}. \tag{2.5}$$

These have obvious generalisations to the product of any finite number of metric spaces. While the expression in (2.4) is simpler and easier to work with, the definition in (2.5) ensures that the metric on $\mathbb{K}^n$ that comes from viewing it as the $n$-fold product $\mathbb{K} \times \mathbb{K} \times \cdots \times \mathbb{K}$ agrees with the usual Euclidean distance. Exercise 2.2 provides a larger family $\varrho_p$ of product metrics.
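The metrics of Example 2.2 and the product metric (2.5) translate directly into code. The following is a minimal sketch (the function names `d_p`, `product_metric_2`, and `abs_metric`, and the sample points, are illustrative assumptions): it computes $d_p$ on $\mathbb{R}^3$, spot-checks the triangle inequality for a few values of $p$, and confirms on one example that building the metric on $\mathbb{R}^3$ as a three-fold product via (2.5) reproduces the Euclidean distance $d_2$.

```python
import math


def d_p(x, y, p):
    """The metric d_p on K^n from Example 2.2; p = math.inf gives the max metric."""
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if p == math.inf:
        return max(diffs)
    return sum(t ** p for t in diffs) ** (1 / p)


def abs_metric(a, b):
    """The usual metric |a - b| on K."""
    return abs(a - b)


def product_metric_2(metrics, x, y):
    """The product metric (2.5), extended to finitely many factors."""
    return math.sqrt(sum(d(a, b) ** 2 for d, a, b in zip(metrics, x, y)))


x, y, z = (0.0, 0.0, 0.0), (3.0, 4.0, 12.0), (1.0, -2.0, 2.0)

euclid = d_p(x, y, 2)
via_product = product_metric_2([abs_metric] * 3, x, y)

# Spot check of the triangle inequality (with a small floating-point tolerance).
triangle_ok = all(
    d_p(x, z, p) <= d_p(x, y, p) + d_p(y, z, p) + 1e-12
    for p in (1, 2, 3, math.inf)
)
```

A check on three sample points is of course no substitute for the proof; it only illustrates what the inequalities say.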

2.2 Open and Closed Sets

The notion of an open set is fundamental in the study of metric spaces, and forms the basis of the theory of topological spaces (see Appendix C). We begin with the definition of an open ball.


Definition 2.4 If $r > 0$ and $a \in X$ we define the open ball of radius $r$ centred at $a$ as

$$B_X(a, r) := \{x \in X : d(x, a) < r\}.$$

If the space $X$ is clear from the context (as in some of the following definitions), then we will omit the $X$ subscript.

Definition 2.5 A subset $A$ of a metric space $(X, d)$ is open if for every $x \in A$ there exists $r > 0$ such that $B(x, r) \subseteq A$. A subset $A$ of $(X, d)$ is closed if $X \setminus A$ is open.

Note that the whole space $X$ and the empty set $\emptyset$ are always open, so at the same time $X$ and $\emptyset$ are also always closed. The open ball $B(x, r)$ is open for any $x \in X$ and any $r > 0$ (see Exercise 2.6) and any open subset of $X$ can be written as the union of open balls (see Exercise 2.7).

Note that in any set $X$ with the discrete metric, any subset $A$ of $X$ is open (since if $x \in A$, then $B(x, 1/2) = \{x\} \subseteq A$) and any subset is closed (since $X \setminus A$ is open).

Lemma 2.6 Any finite intersection of open sets is open, and any union of open sets is open. Any finite union of closed sets is closed, and any intersection of closed sets is closed.

Proof We prove the result for open sets; for the corresponding results for closed sets (which follow by taking complements) see Exercise 2.5.

Let $U = \bigcup_{\alpha \in A} U_\alpha$, where $A$ is any index set; if $x \in U$, then $x \in U_\alpha$ for some $\alpha \in A$, and then there exists $r > 0$ such that $B(x, r) \subseteq U_\alpha \subseteq U$, so $U$ is open. If $U = \bigcap_{j=1}^n U_j$ and $x \in U$, then for each $j$ we have $x \in U_j$, and so $B(x, r_j) \subseteq U_j$ for some $r_j > 0$. Taking $r = \min_j r_j$ it follows that $B(x, r) \subseteq \bigcap_{j=1}^n U_j = U$.

In many arguments in this book it will be useful to have a less ‘topological’ definition of a closed set, based on the limits of sequences. We first define what it means for a sequence to converge in a metric space. Throughout this book we will use the notation $(x_n)_{n=1}^\infty$ for a sequence (to distinguish it from the set $\{x_n\}_{n=1}^\infty$, in which the order of the elements is irrelevant); we will often abbreviate this to $(x_n)$, including the index if this is required to prevent ambiguity, e.g. for a subsequence $(x_{n_k})_k$. We will also frequently abbreviate ‘a sequence $(x_n)_{n=1}^\infty$ such that $x_n \in A$ for every $n \in \mathbb{N}$’ to ‘a sequence $(x_n) \in A$’.


Definition 2.7 A sequence $(x_n)_{n=1}^\infty$ in a metric space $(X, d)$ converges in $(X, d)$ to $x \in X$ if

$$d(x_n, x) \to 0 \quad \text{as } n \to \infty.$$

We write $x_n \to x$ in $(X, d)$ (or often simply ‘in $X$’).

For sequences in $\mathbb{K}$ we often use the fact that any convergent sequence is bounded, and the same is true in a metric space, given the following definition.

Definition 2.8 A subset $Y$ of a metric space $(X, d)$ is bounded if there exist¹ $a \in X$ and $r > 0$ such that $Y \subseteq B(a, r)$, i.e. $d(y, a) < r$ for every $y \in Y$.

Any convergent sequence is bounded, since if $x_n \to x$, then there exists $N \in \mathbb{N}$ such that $d(x_n, x) < 1$ for all $n \ge N$ and so

$$d(x_n, x) \le \max\Bigl( 1, \max_{j=1,\ldots,N-1} d(x_j, x) \Bigr) \quad \text{for every } n \in \mathbb{N}.$$

We now describe convergence in terms of open sets.

Lemma 2.9 A sequence $(x_n) \in (X, d)$ converges to $x$ if and only if for any open set $U$ that contains $x$ there exists an $N$ such that $x_n \in U$ for every $n \ge N$.

Proof Suppose that $x_n \to x$. Given any open set $U$ that contains $x$ there exists $\varepsilon > 0$ such that $B(x, \varepsilon) \subseteq U$, and so there exists $N$ such that $x_n \in B(x, \varepsilon) \subseteq U$ for all $n \ge N$. For the other implication, just use the fact that for any $\varepsilon > 0$ the set $B(x, \varepsilon)$ is open and contains $x$.

We can now characterise closed sets in terms of the limits of sequences.

Lemma 2.10 A subset $A$ of $(X, d)$ is closed if and only if whenever $(x_n) \in A$ with $x_n \to x$ it follows that $x \in A$.

Proof Suppose that $A$ is closed and that $(x_n) \in A$ with $x_n \to x$, but $x \notin A$. Then $X \setminus A$ is open and contains $x$, and so there exists $N$ such that $x_n \in X \setminus A$ for all $n \ge N$, a contradiction.

Now suppose that whenever $(x_n) \in A$ with $x_n \to x$ we have $x \in A$, but $A$ is not closed. Then $X \setminus A$ is not open: there exist $y \in X \setminus A$ and a sequence $r_n \to 0$ such that $B(y, r_n) \cap A \ne \emptyset$. So there exist points $y_n \in B(y, r_n) \cap A$, i.e. a sequence $(y_n) \in A$ such that $y_n \to y$. But then, by assumption, $y \in A$, a contradiction once more.

¹ We could require that $a \in Y$ in this definition, since if $Y \subseteq B(a', r)$ with $a' \in X$, then for any choice of $a \in Y$ we have $d(y, a) \le d(y, a') + d(a', a) < r + d(a', a)$.


2.3 Continuity and Sequential Continuity

We now define what it means for a map $f : X \to Y$ to be continuous when $(X, d_X)$ and $(Y, d_Y)$ are two metric spaces. We begin with the $\varepsilon$–$\delta$ definition.

Definition 2.11 A function $f : (X, d_X) \to (Y, d_Y)$ is continuous at $x \in X$ if for every $\varepsilon > 0$ there exists $\delta > 0$ such that

$$d_X(x', x) < \delta \quad \Rightarrow \quad d_Y(f(x'), f(x)) < \varepsilon.$$

We say that $f$ is continuous (on $X$) if $f$ is continuous at every $x \in X$.

Note that strictly there is a distinction to be made between $f$ as a map from a set $X$ into a set $Y$, and the continuity of $f$, which depends on the metrics $d_X$ and $d_Y$ on $X$ and $Y$; this distinction is often blurred in practice.

As with simple real-valued functions, continuity and sequential continuity are equivalent in metric spaces (for a proof see Exercise 2.8).

Lemma 2.12 A function $f : (X, d_X) \to (Y, d_Y)$ is continuous at $x$ if and only if $f(x_n) \to f(x)$ in $Y$ whenever $(x_n) \in X$ with $x_n \to x$ in $X$.

Continuity can also be characterised in terms of open sets, by requiring the preimage of an open set to be open. This allows the notion of continuity to be generalised to topological spaces (see Appendix C).

Lemma 2.13 A function $f : (X, d_X) \to (Y, d_Y)$ is continuous on $X$ if and only if whenever $U$ is an open set in $(Y, d_Y)$, $f^{-1}(U)$ is an open set in $(X, d_X)$, where

$$f^{-1}(U) := \{x \in X : f(x) \in U\}$$

is the preimage of $U$ under $f$. The same is true if we replace open sets by closed sets.

Proof Suppose that $f$ is continuous (in the sense of Definition 2.11). Take an open subset $U$ of $Y$, and $z \in f^{-1}(U)$. Since $f(z) \in U$ and $U$ is open in $Y$, there exists an $\varepsilon > 0$ such that $B_Y(f(z), \varepsilon) \subseteq U$. Since $f$ is continuous, there exists a $\delta > 0$ such that $x \in B_X(z, \delta)$ implies that $f(x) \in B_Y(f(z), \varepsilon) \subseteq U$. So $B_X(z, \delta) \subseteq f^{-1}(U)$, i.e. $f^{-1}(U)$ is open.

For the opposite implication, take $x \in X$ and $\varepsilon > 0$, and set $U := B_Y(f(x), \varepsilon)$, which is an open set in $Y$. It follows that $f^{-1}(U)$ is open in $X$, so in particular, since $x \in f^{-1}(U)$, we have $B_X(x, \delta) \subseteq f^{-1}(U)$ for some $\delta > 0$. So


$f(B_X(x, \delta)) \subseteq B_Y(f(x), \varepsilon)$, which implies that $f$ is continuous.

The result for closed sets follows from the identity

$$f^{-1}(Y \setminus A) = X \setminus f^{-1}(A) \qquad \text{for all } A \subseteq Y.$$

One has to be a little careful with preimages. If $f : X \to Y$, then for $U \subseteq X$ and $V \subseteq Y$ we have

$$f^{-1}(f(U)) \supseteq U \quad \text{and} \quad f(f^{-1}(V)) \subseteq V.$$

However, both these inclusions can be strict, as the simple example $f : \mathbb{R} \to \mathbb{R}$ with $f(x) = 0$ for every $x \in \mathbb{R}$ shows: here we have

$$f^{-1}(f([-1, 1])) = f^{-1}(\{0\}) = \mathbb{R} \quad \text{and} \quad f(f^{-1}([-1, 1])) = f(\mathbb{R}) = \{0\}.$$

Also be aware that in general the image of an open set under a continuous map need not be open, e.g. the image of $(-4, 4)$ under the map $x \mapsto \sin x$ is $[-1, 1]$.
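The two inclusions for images and preimages, and the fact that both can be strict, can be reproduced on a finite set. In this sketch the domain is a finite stand-in for $\mathbb{R}$ and the helper names `image` and `preimage` are illustrative assumptions; the map is the constant map $f(x) = 0$ used in the text.

```python
def image(f, subset):
    """The image f(U) of a subset U of the domain."""
    return {f(x) for x in subset}


def preimage(f, domain, subset):
    """The preimage f^{-1}(V), computed by searching the (finite) domain."""
    return {x for x in domain if f(x) in subset}


X = {-2, -1, 0, 1, 2}  # finite stand-in for the domain


def f(x):
    """The constant map used in the text."""
    return 0


U = {-1, 0, 1}
V = {-1, 0, 1}

pre_of_image = preimage(f, X, image(f, U))  # all of X: strictly larger than U
image_of_pre = image(f, preimage(f, X, V))  # {0}: strictly smaller than V
```

The first inclusion becomes an equality when $f$ is injective and the second when $f$ is surjective, which is why a constant map is a convenient way to break both at once.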

2.4 Interior, Closure, Density, and Separability

We recall the definition of the interior $A^\circ$ and closure $\overline{A}$ of a subset of a metric space. The closure operation allows us to define what it means for a subset to be dense ($\overline{A} = X$) and this in turn gives rise to the notion of separability (existence of a countable dense subset).

Definition 2.14 If $A \subseteq (X, d)$, then the interior of $A$, written $A^\circ$, is the union of all open subsets of $A$.

Note that $A^\circ$ is open (since it is the union of open sets; see Lemma 2.6) and that $A^\circ = A$ if and only if $A$ is open.

Lemma 2.15 A point $x \in X$ is contained in $A^\circ$ if and only if

$$B(x, \varepsilon) \subseteq A \qquad \text{for some } \varepsilon > 0.$$

Proof If $x \in A^\circ$, then it is an element of some open set $U \subseteq A$, and then $B(x, \varepsilon) \subseteq U \subseteq A$ for some $\varepsilon > 0$. Conversely, if $B(x, \varepsilon) \subseteq A$, then, since $B(x, \varepsilon)$ is an open subset of $A$, we have $B(x, \varepsilon) \subseteq A^\circ$, and so $x \in A^\circ$.

We will make significantly more use of the closure in what follows.


Definition 2.16 If $A \subseteq (X, d)$, then the closure of $A$ in $X$, written $\overline{A}$, is the intersection of all closed subsets of $X$ that contain $A$.

Note that $\overline{A}$ is closed (since it is the intersection of closed sets; see Lemma 2.6 again). Furthermore, $A$ is closed if and only if $A = \overline{A}$, and hence $\overline{\overline{A}} = \overline{A}$.

Lemma 2.17 A point $x \in X$ is contained in $\overline{A}$ if and only if

$$B(x, \varepsilon) \cap A \ne \emptyset \qquad \text{for every } \varepsilon > 0. \tag{2.6}$$

It follows that $x \in \overline{A}$ if and only if there exists a sequence $(x_n) \in A$ such that $x_n \to x$.

Proof We prove the contrapositive, that $x \notin \overline{A}$ if and only if $B(x, \varepsilon) \cap A = \emptyset$ for some $\varepsilon > 0$.

If $x \notin \overline{A}$, then there is some closed set $K$ that contains $A$ such that $x \notin K$. Since $K$ is closed, $X \setminus K$ is open, and so $B(x, \varepsilon) \cap K = \emptyset$ for some $\varepsilon > 0$, which shows that $B(x, \varepsilon) \cap A = \emptyset$ (since $K \supseteq A$). Conversely, if there exists $\varepsilon > 0$ such that $B(x, \varepsilon) \cap A = \emptyset$, then $x$ is not contained in the closed set $X \setminus B(x, \varepsilon)$, which contains $A$; so $x \notin \overline{A}$. This proves the ‘if and only if’ statement in the lemma.

To prove the final part, if $x \in \overline{A}$, then (2.6) implies that for any $n \in \mathbb{N}$ we have $B(x, 1/n) \cap A \ne \emptyset$, so we can find $x_n \in A$ such that $d(x_n, x) < 1/n$ and thus $x_n \to x$. Conversely, if $(x_n) \in A$ with $x_n \to x$, then for any $\varepsilon > 0$ we have $d(x_n, x) < \varepsilon$ for $n$ sufficiently large, which gives (2.6).

Note that in a general metric space the closure of an open ball need not coincide with the corresponding closed ball, i.e. we can have

$$\overline{B_X(a, r)} \ne \{x \in X : d(x, a) \le r\}. \tag{2.7}$$

If we use the discrete metric from Example 2.3 on a set $X$ with more than one element, then $B_X(a, 1) = \{a\}$ for any $a \in X$, and since $\{a\}$ is closed we have $\overline{B_X(a, 1)} = \{a\}$. However, $\{x \in X : d(x, a) \le 1\} = X$.

Given the definition of the closure of a set, we can now define what it means for a set $A \subset X$ to be dense in $(X, d)$.

Definition 2.18 A subset $A$ of a metric space $(X, d)$ is dense in $X$ if $\overline{A} = X$.

Using Lemma 2.17 an equivalent definition is that $A$ is dense in $X$ if for every $x \in X$ and every $\varepsilon > 0$

$$B(x, \varepsilon) \cap A \ne \emptyset,$$

i.e. there exists $a \in A$ such that $d(a, x) < \varepsilon$. Another similar reformulation is that $A \cap U \ne \emptyset$ for every non-empty open subset $U$ of $X$.
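The discrete-metric example, in which the closure of an open ball is strictly smaller than the closed ball of the same radius, can be checked on a small finite set. In the sketch below the three-point set and the helper names are illustrative assumptions; the closure is computed using the criterion (2.6), which says that $x \in \overline{A}$ exactly when the distance from $x$ to $A$ is zero (for a finite set $A$ this infimum is a minimum).

```python
def discrete(x, y):
    """The discrete metric from Example 2.3."""
    return 0 if x == y else 1


def open_ball(X, d, a, r):
    return {x for x in X if d(x, a) < r}


def closed_ball(X, d, a, r):
    return {x for x in X if d(x, a) <= r}


def closure(X, d, A):
    """Closure of a finite set A in X: the points at distance zero from A,
    which is equivalent to criterion (2.6)."""
    return {x for x in X if A and min(d(x, a) for a in A) == 0}


X = {"a", "b", "c"}  # any set with more than one point would do

ball = open_ball(X, discrete, "a", 1)             # {"a"}
closure_of_ball = closure(X, discrete, ball)      # still {"a"}
ball_with_le = closed_ball(X, discrete, "a", 1)   # all of X
```

In $\mathbb{K}^n$ with the usual metric the two sets do coincide, so the discrepancy here is a feature of the discrete metric rather than of balls in general.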


Definition 2.19 A metric space $(X, d)$ is separable if it contains a countable dense subset.

Separability means that elements of $X$ can be approximated arbitrarily closely by elements of some countable collection $\{x_n\}_{n=1}^\infty$: given any $x \in X$ and $\varepsilon > 0$, there exists $j \in \mathbb{N}$ such that $d(x_j, x) < \varepsilon$.

For some familiar examples, $\mathbb{R}$ is separable, since $\mathbb{Q}$ is a countable dense subset; $\mathbb{C}$ is separable since the set ‘$\mathbb{Q} + \mathrm{i}\mathbb{Q}$’ of all complex numbers of the form $q_1 + \mathrm{i}q_2$ with $q_1, q_2 \in \mathbb{Q}$ is countable and dense. Since separability of $(X, d_X)$ and $(Y, d_Y)$ implies separability of $X \times Y$ (with an appropriate metric; see Exercise 2.9), it follows that $\mathbb{R}^n$ and $\mathbb{C}^n$ are separable.

Separability is inherited by subsets (using the same metric, as in (2.3)). This is not trivial, since the original countable dense set could be entirely disjoint from the chosen subset (e.g. $\mathbb{Q}^2$ is dense in $\mathbb{R}^2$, but disjoint from the subset $\{\pi\} \times \mathbb{R}$).

Lemma 2.20 If $(X, d)$ is separable and $Y \subseteq X$, then $(Y, d)$ is also separable.

Proof We construct $A$, a countable dense subset of $Y$, as follows. Suppose that $\{x_n\}_{n=1}^\infty$ is dense in $X$; then for each $n, k \in \mathbb{N}$, if

$$B(x_n, 1/k) \cap Y \ne \emptyset$$

then we choose one point from $B(x_n, 1/k) \cap Y$ and add it to $A$. Constructed in this way $A$ is (at most) a countable set, since we have added at most one point for each pair $(n, k) \in \mathbb{N} \times \mathbb{N}$.

To show that $A$ is dense, take $z \in Y$ and $\varepsilon > 0$. Now choose $k$ such that $1/k < \varepsilon/2$ and $x_n \in X$ with $d(x_n, z) < 1/k$. Since $z \in B(x_n, 1/k) \cap Y$, we have $B(x_n, 1/k) \cap Y \ne \emptyset$; because of this there must exist $y \in A$ such that $d(x_n, y) < 1/k$ and hence

$$d(y, z) \le d(y, x_n) + d(x_n, z) < 2/k < \varepsilon.$$
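The density of $\mathbb{Q}$ in $\mathbb{R}$, which underlies the separability of $\mathbb{R}$, can be made concrete: for any real $x$ and any $\varepsilon > 0$, rounding $x$ to the nearest multiple of $1/k$ with $1/k \le \varepsilon$ gives a rational within $1/(2k) < \varepsilon$ of $x$. The following is only a sketch in floating-point arithmetic (the function name `rational_within` is an assumption, and a float is itself a rational stand-in for a real number).

```python
import math
from fractions import Fraction


def rational_within(x: float, eps: float) -> Fraction:
    """Return a rational q with |q - x| < eps, by rounding x to the
    nearest multiple of 1/k, where k is chosen so that 1/k <= eps."""
    k = math.floor(1 / eps) + 1
    return Fraction(round(x * k), k)


q = rational_within(math.pi, 1e-6)
error = abs(float(q) - math.pi)
```

The same idea, applied coordinate by coordinate, produces points of $\mathbb{Q}^n$ close to any given point of $\mathbb{R}^n$.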

2.5 Compactness

Compactness is an extremely useful property that is the key to many of the proofs that follow. The most familiar ‘compactness’ result is the Bolzano–Weierstrass Theorem: any bounded sequence of real numbers has a convergent subsequence.

The fundamental definition of compactness in terms of open sets makes the definition applicable in any topological space (see Appendix C). To state


this definition we require the following terminology: a cover of a set $K$ is any collection of sets whose union contains $K$; given a cover, a subcover is a subcollection of sets from the original cover whose union still contains $K$.

Definition 2.21 A subset $K$ of a metric space $(X, d)$ is compact if any cover of $K$ by open sets has a finite subcover, i.e. if $\{O_\alpha\}_{\alpha \in A}$ is a collection of open subsets of $X$ such that

$$K \subseteq \bigcup_{\alpha \in A} O_\alpha,$$

then there is a finite set $\{\alpha_j\}_{j=1}^n \subset A$ such that

$$K \subseteq \bigcup_{j=1}^n O_{\alpha_j}.$$

In a metric space compactness in this sense is equivalent to ‘sequential compactness’, and it is in this form that we will most often make use of compactness in what follows. The equivalence of these two definitions in a metric space is not trivial; a proof is given in Appendix C (see Theorem C.14).

Definition 2.22 If $K$ is a subset of $(X, d)$, then $K$ is sequentially compact if any sequence in $K$ has a subsequence that converges and whose limit lies in $K$.

(Recall that a subsequence of $(x_n)_{n=1}^\infty$ is a sequence of the form $(x_{n_k})_{k=1}^\infty$ where $n_k \in \mathbb{N}$ with $n_{k+1} > n_k$.)

Using the Bolzano–Weierstrass Theorem we can easily prove the following basic compactness result.

Theorem 2.23 Any closed bounded subset of $\mathbb{K}$ is compact.

Proof First we prove the result for $\mathbb{K} = \mathbb{R}$. Take any closed bounded subset $A$ of $\mathbb{R}$, and let $(x_n)$ be a sequence in $A$. Since $(x_n) \in A$, we know that $(x_n)$ is bounded, and so by the Bolzano–Weierstrass Theorem it has a convergent subsequence $x_{n_j} \to x$ for some $x \in \mathbb{R}$. Since $x_{n_j} \in A$ and $A$ is closed, it follows that $x \in A$ and so $A$ is compact.

Now let $A$ be a closed bounded subset of $\mathbb{C}$ and $(z_n)$ a sequence in $A$. If we write $z_n = x_n + \mathrm{i}y_n$, then, since $|z_n|^2 = |x_n|^2 + |y_n|^2$, $(x_n)$ and $(y_n)$ are both bounded sequences in $\mathbb{R}$. First take a subsequence $(z_{n_j})_j$ such that $x_{n_j}$ converges to some $x \in \mathbb{R}$. Then take a subsequence $(z_{n'_j})_j$ of $(z_{n_j})_j$ such that $y_{n'_j}$ converges to some $y \in \mathbb{R}$; we still have $x_{n'_j} \to x$. It follows that $z_{n'_j} \to x + \mathrm{i}y$, and since $z_{n'_j} \in A$ and $A$ is closed it follows that $x + \mathrm{i}y \in A$, which shows that $A$ is compact.

Compact subsets of metric spaces are closed and bounded.

Lemma 2.24 If $K$ is a compact subset of a metric space $(X, d)$, then $K$ is closed and bounded.

Proof If $(x_n) \in K$ and $x_n \to x$, then any subsequence of $(x_n)$ also converges to $x$. Since $K$ is compact, $(x_n)$ has a subsequence $x_{n_j} \to x'$ with $x' \in K$. By uniqueness of limits it follows that $x' = x$ and so $x \in K$, which shows that $K$ is closed (see Lemma 2.10).

If $K$ is compact, then the cover of $K$ by the open balls $\{B(k, 1) : k \in K\}$ has a finite subcover by balls centred at $\{k_1, \ldots, k_n\}$. Then for any $k \in K$ we have $k \in B(k_j, 1)$ for some $j = 1, \ldots, n$ and so

$$d(k, k_1) \le d(k, k_j) + d(k_j, k_1) < 1 + \max_{j=2,\ldots,n} d(k_j, k_1),$$

which shows that $K$ is bounded.

Lemma 2.25 If $(X, d)$ is a compact metric space, then a subset $K$ of $X$ is compact if and only if it is closed.

Proof If $K$ is a compact subset of $(X, d)$, then it is closed by Lemma 2.24. If $K$ is a closed subset of a compact metric space, then any sequence in $K$ has a convergent subsequence; its limit must lie in $K$ since $K$ is closed, and thus $K$ is compact.

We will soon prove in Theorem 2.27 that being closed and bounded characterises compact subsets of $\mathbb{K}^n$, based on the following observation.

Theorem 2.26 If $K_1$ is a compact subset of $(X_1, d_1)$ and $K_2$ is a compact subset of $(X_2, d_2)$, then $K_1 \times K_2$ is a compact subset of the product space $(X_1 \times X_2, \varrho_p)$, where $\varrho_p$ is any of the product metrics from Exercise 2.2.

Proof Suppose that $(x_n, y_n) \in K_1 \times K_2$. Then, since $K_1$ is compact, there is a subsequence $(x_{n_j}, y_{n_j})$ such that $x_{n_j} \to x$ for some $x \in K_1$. Now, using the fact that $K_2$ is compact, take a further subsequence $(x_{n'_j}, y_{n'_j})$ such that we also have $y_{n'_j} \to y$ for some $y \in K_2$; because $x_{n'_j}$ is a subsequence of $x_{n_j}$ we still have $x_{n'_j} \to x$. Since (for $1 \le p < \infty$; the case $p = \infty$ is similar)

$$\varrho_p\bigl( (x_{n'_j}, y_{n'_j}), (x, y) \bigr) = \bigl( d_1(x_{n'_j}, x)^p + d_2(y_{n'_j}, y)^p \bigr)^{1/p}$$

it follows that $(x_{n'_j}, y_{n'_j}) \to (x, y) \in K_1 \times K_2$.


By induction this shows that the product of any finite number of compact sets is compact. (Tychonoff’s Theorem, proved in Appendix C, shows that the product of any collection of compact sets is compact when considered with an appropriate topology.)

Theorem 2.27 A subset of $\mathbb{K}^n$ (with the usual metric) is compact if and only if it is closed and bounded.

Note that $\mathbb{K}^n$ with the usual metric is given by the product $\mathbb{K} \times \cdots \times \mathbb{K}$, using $\varrho_2$ to construct the metric on the product.

Proof That any compact subset of $\mathbb{K}^n$ is closed and bounded follows immediately from Lemma 2.24. For the converse, note that it follows from Theorems 2.23 and 2.26 that

$$Q_M^n := \{x \in \mathbb{K}^n : |x_j| \le M,\ j = 1, \ldots, n\}$$

is a compact subset of $\mathbb{K}^n$ for any $M > 0$. If $K$ is a bounded subset of $\mathbb{K}^n$, then it is a subset of $Q_M^n$ for some $M > 0$. If it is also closed, then it is a closed subset of a compact set, and hence compact (Lemma 2.25).

We now give three fundamental results about continuous functions on compact sets.

Theorem 2.28 Suppose that $K$ is a compact subset of $(X, d_X)$ and that $f : (X, d_X) \to (Y, d_Y)$ is continuous. Then $f(K)$ is a compact subset of $(Y, d_Y)$.

Proof Let $(y_n) \in f(K)$. Then $y_n = f(x_n)$ for some $x_n \in K$. Since $(x_n) \in K$ and $K$ is compact, there is a subsequence of $x_n$ that converges to some $x^* \in K$, i.e. $x_{n_j} \to x^* \in K$. Since $f$ is continuous it follows (using Lemma 2.12) that as $j \to \infty$

$$y_{n_j} = f(x_{n_j}) \to f(x^*) =: y^* \in f(K),$$

i.e. the subsequence $y_{n_j}$ converges to some $y^* \in f(K)$. It follows that $f(K)$ is compact.

The following is an (almost) immediate corollary.

Proposition 2.29 Let $K$ be a compact subset of $(X, d)$. Then any continuous function $f : K \to \mathbb{R}$ is bounded and attains its bounds, i.e. there exists an $M > 0$ such that $|f(x)| \le M$ for all $x \in K$, and there exist $\underline{x}, \overline{x} \in K$ such that

$$f(\underline{x}) = \inf_{x \in K} f(x) \quad \text{and} \quad f(\overline{x}) = \sup_{x \in K} f(x). \tag{2.8}$$

Proof Since $f$ is continuous and $K$ is compact, $f(K)$ is a compact subset of $\mathbb{R}$, so $f(K)$ is closed and bounded (by Theorem 2.27); in particular, there exists $M > 0$ such that $|f(x)| \le M$ for every $x \in K$. Since $f(K)$ is closed, it follows that $\sup\{y : y \in f(K)\} \in f(K)$ (see Exercise 2.12) and so there exists $\overline{x}$ as in (2.8). The argument for $\underline{x}$ is almost identical.

Finally, any continuous function on a compact set is also uniformly continuous.

Lemma 2.30 If $f\colon (X, d_X) \to (Y, d_Y)$ is continuous and $X$ is compact, then $f$ is uniformly continuous on $X$: given $\varepsilon > 0$ there exists $\delta > 0$ such that
\[ d_X(x, y) < \delta \quad\Rightarrow\quad d_Y(f(x), f(y)) < \varepsilon, \qquad x, y \in X. \]

Proof If $f$ is not uniformly continuous, then there exists $\varepsilon > 0$ such that for every $\delta > 0$ we can find $x, y \in X$ with $d_X(x, y) < \delta$ and $d_Y(f(x), f(y)) \ge \varepsilon$. Taking $\delta = 1/n$ for each $n \in \mathbb{N}$ we obtain $x_n, y_n$ such that
\[ d_X(x_n, y_n) < 1/n \quad\text{and}\quad d_Y(f(x_n), f(y_n)) \ge \varepsilon. \tag{2.9} \]
Since $X$ is compact, we can find a subsequence $(x_{n_j})$ such that $x_{n_j} \to x$ with $x \in X$. Since
\[ d_X(y_{n_j}, x) \le d_X(y_{n_j}, x_{n_j}) + d_X(x_{n_j}, x), \]
it follows that $y_{n_j} \to x$ also. Since $f$ is continuous at $x$, we can find $\delta > 0$ such that $d_X(z, x) < \delta$ ensures that $d_Y(f(z), f(x)) < \varepsilon/2$. But then for $j$ sufficiently large we have $d_X(x_{n_j}, x) < \delta$ and $d_X(y_{n_j}, x) < \delta$, which implies that
\[ d_Y(f(x_{n_j}), f(y_{n_j})) \le d_Y(f(x_{n_j}), f(x)) + d_Y(f(x), f(y_{n_j})) < \varepsilon, \]
contradicting (2.9).

We often apply this when $f\colon (K, d_X) \to (Y, d_Y)$ and $K$ is a compact subset of a larger metric space $(X, d_X)$.


Exercises

2.1 Using the fact that $\sum_{j=1}^n (\xi_j - \lambda \eta_j)^2 \ge 0$ for any $\xi, \eta \in \mathbb{R}^n$ and any $\lambda \in \mathbb{R}$, prove the Cauchy–Schwarz inequality
\[ \Bigl( \sum_{j=1}^n \xi_j \eta_j \Bigr)^2 \le \Bigl( \sum_{j=1}^n \xi_j^2 \Bigr) \Bigl( \sum_{j=1}^n \eta_j^2 \Bigr). \]
(For the usual dot product in $\mathbb{R}^n$ this shows that $|x \cdot y| \le \|x\| \|y\|$.)

2.2 Suppose that $(X_j, d_j)$, $j = 1, \ldots, n$, are metric spaces. Show that $X_1 \times \cdots \times X_n$ is a metric space when equipped with any of the product metrics $\varrho_p$ defined by setting
\[ \varrho_p((x_1, \ldots, x_n), (y_1, \ldots, y_n)) := \begin{cases} \bigl( \sum_{j=1}^n d_j(x_j, y_j)^p \bigr)^{1/p}, & 1 \le p < \infty, \\ \max_{j=1,\ldots,n} d_j(x_j, y_j), & p = \infty. \end{cases} \]
(You should use the inequality
\[ \bigl( (\alpha_1 + \beta_1)^p + (\alpha_2 + \beta_2)^p \bigr)^{1/p} \le (\alpha_1^p + \alpha_2^p)^{1/p} + (\beta_1^p + \beta_2^p)^{1/p}, \tag{2.10} \]
which holds for all $\alpha_1, \alpha_2, \beta_1, \beta_2 \in \mathbb{R}$ and $1 \le p < \infty$; we will prove this in Lemma 3.6 in the next chapter.)

2.3 Show that if $d$ is a metric on $X$, then so is
\[ \hat{d}(x, y) := \frac{d(x, y)}{1 + d(x, y)}. \]
(This new metric makes $(X, \hat{d})$ into a bounded metric space.) [Hint: the map $t \mapsto t/(1 + t)$ is monotonically increasing in $t$.]

2.4 Let $s(\mathbb{K})$ be the space of all sequences $x = (x_j)_{j=1}^\infty$ with $x_j \in \mathbb{K}$ (bounded or unbounded). Show that
\[ d(x, y) := \sum_{j=1}^\infty 2^{-j} \frac{|x_j - y_j|}{1 + |x_j - y_j|} \]
defines a metric on $s$. If $(x^{(n)})_n$ is any sequence in $s$ show that $x^{(n)} \to y$ in this metric if and only if $x_j^{(n)} \to y_j$ for each $j \in \mathbb{N}$. (Kreyszig, 1978)

2.5 Show that any finite union of closed sets is closed and any intersection of closed sets is closed.

2.6 Show that if $x \in X$ and $r > 0$, then $B(x, r)$ is open.

2.7 Show that any open subset of a metric space $(X, d)$ can be written as the union of open balls. [Hint: if $U$ is open, then for each $x \in U$ there exists $r(x) > 0$ such that $x \in B(x, r(x)) \subseteq U$.]

2.8 Show that a function $f\colon (X, d_X) \to (Y, d_Y)$ is continuous if and only if $f(x_n) \to f(x)$ whenever $(x_n) \in X$ with $x_n \to x$ in $(X, d_X)$.

2.9 Show that if $(X, d_X)$ and $(Y, d_Y)$ are separable, then $(X \times Y, \varrho_p)$ is separable, where $\varrho_p$ is any one of the metrics from Exercise 2.2.

2.10 Suppose that $\{F_\alpha\}_{\alpha \in A}$ is a family of closed subsets of a compact metric space $(X, d)$ with the property that the intersection of any finite number of the sets is non-empty. Show that $\bigcap_{\alpha \in A} F_\alpha$ is non-empty.

2.11 Suppose that $(F_j)$ is a decreasing sequence [$F_{j+1} \subseteq F_j$] of non-empty closed subsets of a compact metric space $(X, d)$. Use the result of the previous exercise to show that $\bigcap_{j=1}^\infty F_j \ne \emptyset$.

2.12 Show that if $S$ is a closed subset of $\mathbb{R}$, then $\sup(S) \in S$.

2.13 Show that if $f\colon (X, d_X) \to (Y, d_Y)$ is a continuous bijection and $X$ is compact, then $f^{-1}$ is also continuous (i.e. $f$ is a homeomorphism).

2.14 Any compact metric space $(X, d)$ is separable. Prove the stronger result that in any compact metric space there exists a countable subset $(x_j)_{j=1}^\infty$ with the following property: for any $\varepsilon > 0$ there is an $M(\varepsilon)$ such that for every $x \in X$ we have
\[ d(x_j, x) < \varepsilon \quad\text{for some } 1 \le j \le M(\varepsilon). \]

PART II Normed Linear Spaces

3 Norms and Normed Spaces

In the first two chapters we considered spaces with a linear structure (vector spaces) and more general spaces in which we could define a notion of convergence (metric spaces). We now turn to the natural setting in which to combine these, i.e. in which to consider convergence in vector spaces.

3.1 Norms

The majority of the spaces that we will consider in the rest of the book will be normed spaces, i.e. vector spaces equipped with an appropriate norm. A norm $\|\cdot\|$ provides a generalised notion of 'length'.

Definition 3.1 A norm on a vector space $X$ is a map $\|\cdot\|\colon X \to [0, \infty)$ such that
(i) $\|x\| = 0$ if and only if $x = 0$;
(ii) $\|\lambda x\| = |\lambda| \|x\|$ for every $\lambda \in \mathbb{K}$, $x \in X$; and
(iii) $\|x + y\| \le \|x\| + \|y\|$ for every $x, y \in X$ (the triangle inequality).
A normed space is a pair $(X, \|\cdot\|)$ where $X$ is a vector space and $\|\cdot\|$ is a norm on $X$.

Any norm on $X$ gives rise to a metric on $X$ if we set $d(x, y) := \|x - y\|$. In this case we have $d(x, y) = 0$ if and only if $x = y$, $d(x, y) = d(y, x)$, and the triangle inequality holds since
\[ d(x, z) = \|x - z\| \le \|x - y\| + \|y - z\| = d(x, y) + d(y, z). \]
This means that any normed space $(X, \|\cdot\|)$ can also be viewed as a metric space $(X, d)$, and so all the concepts discussed in Chapter 2 are immediately applicable to normed spaces. (It is easy to find examples of metrics that do not come from norms, such as the discrete metric from Example 2.3 or the metric on the space of all sequences from Exercise 2.4; see Exercise 3.1.)

In a normed space the open ball in $X$ centred at $y$ of radius $r$ is
\[ B_X(y, r) = \{x \in X : \|x - y\| < r\}, \]
and the closed ball is$^1$
\[ \mathbb{B}_X(y, r) = \{x \in X : \|x - y\| \le r\}. \]
When the space is obvious from the context we will drop the $X$ subscript.

In a linear space the origin plays a special role, and the unit balls centred there are of particular interest. Since we will use these frequently, we will write $B_X$ for $B_X(0, 1)$ ('the open unit ball in $X$') and $\mathbb{B}_X$ for $\mathbb{B}_X(0, 1)$ ('the closed unit ball in $X$'). Both of these unit balls are convex.

Definition 3.2 Let $V$ be a vector space. A subset $K$ of $V$ is convex if whenever $x, y \in K$ the line segment joining $x$ and $y$ lies in $K$, i.e. for every $\lambda \in [0, 1]$ we have $\lambda x + (1 - \lambda) y \in K$; see Figure 3.1.

Figure 3.1 A set $K$ is convex if the line joining any $x, y \in K$ is entirely contained in $K$.

Lemma 3.3 In any normed space $B_X$ and $\mathbb{B}_X$ are convex.

Proof If $x, y \in \mathbb{B}_X$, then $\|x\| \le 1$ and $\|y\| \le 1$. So for $\lambda \in (0, 1)$
\[ \|\lambda x + (1 - \lambda) y\| \le |\lambda| \|x\| + |1 - \lambda| \|y\| \le \lambda + (1 - \lambda) = 1, \]
so $\lambda x + (1 - \lambda) y \in \mathbb{B}_X$. The convexity of $B_X$ follows similarly.

We now give a relatively simple way to check that a particular function defines a norm, based on the convexity of the resulting 'closed unit ball'.

$^1$ Note that in a normed space the closed ball is the closure of the open ball with the same radius, unlike in a general metric space; see (2.7).

Lemma 3.4 Suppose that $N\colon X \to \mathbb{R}$ satisfies
(i) $N(x) = 0$ if and only if $x = 0$;
(ii) $N(\lambda x) = |\lambda| N(x)$ for every $\lambda \in \mathbb{K}$, $x \in X$
(i.e. (i) and (ii) from the definition of a norm) and, in addition, that the set $B := \{x : N(x) \le 1\}$ is convex. Then $N$ satisfies the triangle inequality
\[ N(x + y) \le N(x) + N(y) \tag{3.1} \]
and so $N$ defines a norm on $X$.

Proof We only need to prove (3.1). If $N(x) = 0$, then $x = 0$ and $N(x + y) = N(y) = N(x) + N(y)$ (and similarly if $N(y) = 0$), so we can assume that $N(x) > 0$ and $N(y) > 0$. In this case $x/N(x) \in B$ and $y/N(y) \in B$, so using the convexity of $B$ we have
\[ \underbrace{\frac{N(x)}{N(x) + N(y)}}_{\lambda} \, \frac{x}{N(x)} + \underbrace{\frac{N(y)}{N(x) + N(y)}}_{1 - \lambda} \, \frac{y}{N(y)} \in B. \]
Therefore
\[ \frac{x + y}{N(x) + N(y)} \in B, \]
which means, using property (ii) from Definition 3.1, that
\[ 1 \ge N\Bigl( \frac{x + y}{N(x) + N(y)} \Bigr) = \frac{N(x + y)}{N(x) + N(y)} \quad\Rightarrow\quad N(x + y) \le N(x) + N(y), \]
as required.

In fact any bounded convex symmetric subset in $\mathbb{R}^n$ can be the unit ball for some norm on $\mathbb{R}^n$; see Exercise 5.1.

To use Lemma 3.4 in examples we often use the convexity of some real-valued function. Recall that a function $f\colon [a, b] \to \mathbb{R}$ is convex if whenever $x, y \in [a, b]$ we have
\[ f(\lambda x + (1 - \lambda) y) \le \lambda f(x) + (1 - \lambda) f(y) \quad\text{for all } \lambda \in (0, 1). \]
If $f \in C^2(a, b) \cap C^1([a, b])$, then a sufficient condition for the convexity of $f$ is that $f''(x) \ge 0$ for all $x \in (a, b)$; in particular, we will use the fact that $s \mapsto |s|^p$ is convex for all $1 \le p < \infty$ and that $s \mapsto \mathrm{e}^s$ is convex (see Exercise 3.4).
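The role of convexity in Lemma 3.4 can be seen in a small numerical experiment. The following is only a sketch under our own assumptions: the function names `p_functional` and `midpoint` and the test vectors are hypothetical choices, not notation from the text. It evaluates $N(x) = (\sum_j |x_j|^p)^{1/p}$ on $\mathbb{R}^2$ for several $p$, including $p = 1/2$, where $s \mapsto |s|^p$ is no longer convex.

```python
def p_functional(x, p):
    """Return (sum_j |x_j|^p)^(1/p) for a finite sequence x and any p > 0."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)


def midpoint(x, y):
    """Return the convex combination (x + y)/2, i.e. lambda = 1/2."""
    return [(a + b) / 2 for a, b in zip(x, y)]


# Two points with N(x) = N(y) = 1 for every p > 0 (illustrative choice).
x, y = [1.0, 0.0], [0.0, 1.0]

for p in (0.5, 1.0, 2.0, 3.0):
    n_mid = p_functional(midpoint(x, y), p)                   # in B iff <= 1
    n_sum = p_functional([a + b for a, b in zip(x, y)], p)    # compare with 2
    print(f"p={p}: N((x+y)/2)={n_mid:.4f}, N(x+y)={n_sum:.4f}")
```

For $p \ge 1$ the midpoint stays in the set $\{N \le 1\}$ and $N(x + y) \le N(x) + N(y) = 2$, whereas for $p = 1/2$ the midpoint leaves that set and the triangle inequality fails, so convexity of the 'unit ball' and the triangle inequality break down together in this example.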


3.2 Examples of Normed Spaces

Strictly, a normed space should be written $(X, \|\cdot\|)$ where $\|\cdot\|$ is the particular norm on $X$. However, many normed spaces have standard norms, and so often the norm is not specified. For example, unless otherwise stated, $\mathbb{K}^n$ is equipped with the norm
\[ \|x\| = \|x\|_{\ell^2} := \Bigl( \sum_{j=1}^n |x_j|^2 \Bigr)^{1/2}, \qquad x \in \mathbb{K}^n. \]

Example 3.5 One can equip $\mathbb{K}^n$ with many other norms. For example, for $1 \le p < \infty$ the $\ell^p$ norms are given by
\[ \|x\|_{\ell^p} := \Bigl( \sum_{j=1}^n |x_j|^p \Bigr)^{1/p}, \qquad 1 \le p < \infty \]

… $> 0$. Since $f$ is continuous, there exists $\delta > 0$ such that
\[ x \in [0, 1] \text{ with } |x - x_0| < \delta \quad\Rightarrow\quad |f(x)| > \frac{|f(x_0)|}{2}. \]
Therefore, reducing $\delta$ so that at least one of the intervals $[x_0 - \delta, x_0]$ and $[x_0, x_0 + \delta]$ lies within $[0, 1]$, we have
\[ \int_0^1 |f(x)|^p \, \mathrm{d}x \ge \delta \Bigl( \frac{|f(x_0)|}{2} \Bigr)^p > 0. \]
It follows that we must have $f = 0$. Property (ii) is clear, and the triangle inequality follows as for the $\ell^p$ norm, using Lemma 3.4: if $f, g \in C([0, 1])$ with $\|f\|_{L^p} \le 1$ and $\|g\|_{L^p} \le 1$, then
\[ \|\lambda f + (1 - \lambda) g\|_{L^p}^p = \int_0^1 |\lambda f(x) + (1 - \lambda) g(x)|^p \, \mathrm{d}x \le \int_0^1 \lambda |f(x)|^p + (1 - \lambda) |g(x)|^p \, \mathrm{d}x \le 1. \]

3.3 Convergence in Normed Spaces

As we already observed, any norm $\|\cdot\|$ on $X$ gives rise to a metric on $X$ by setting $d(x, y) := \|x - y\|$. The definitions of convergence, continuity, etc., from Chapter 2 therefore apply immediately to any normed space, but there are some particular properties of convergence that hold in normed spaces that we mention here.

First we show that the map $x \mapsto \|x\|$ is always continuous.

Lemma 3.14 If $x, y \in X$, then
\[ \bigl| \|x\| - \|y\| \bigr| \le \|x - y\|; \tag{3.8} \]
in particular, the map $x \mapsto \|x\|$ is continuous from $(X, \|\cdot\|)$ into $\mathbb{R}$.

Proof The triangle inequality gives
\[ \|x\| \le \|y\| + \|y - x\| \quad\text{and}\quad \|y\| \le \|x\| + \|x - y\|, \]
which implies (3.8).

We use this to prove part (i) of the following lemma.

Lemma 3.15 If $(x_n), (y_n) \in X$ with $x_n \to x$ and $y_n \to y$, then
(i) $\|x_n\| \to \|x\|$;
(ii) $x_n + y_n \to x + y$; and
(iii) if $(\alpha_n) \in \mathbb{K}$ with $\alpha_n \to \alpha$, then $\alpha_n x_n \to \alpha x$.

Proof (i) follows from (3.8) and (ii) follows from the triangle inequality since
\[ \|(x_n + y_n) - (x + y)\| \le \|x_n - x\| + \|y_n - y\|. \]
For (iii), since convergent sequences are bounded we have $|\alpha_n| \le M$ for some $M > 0$, and so
\[ \|\alpha_n x_n - \alpha x\| \le \|\alpha_n (x_n - x)\| + \|(\alpha_n - \alpha) x\| \le M \|x_n - x\| + \|x\| |\alpha_n - \alpha|. \]

3.4 Equivalent Norms

In general there are many possible norms on a vector space (as we saw with $\mathbb{K}^n$ in Example 3.5). However, there is a notion of equivalence of norms which, as we will soon see, ensures that they give rise to the same open sets and therefore make the same sequences convergent.

Definition 3.16 Two norms $\|\cdot\|_1$ and $\|\cdot\|_2$ on a vector space $X$ are equivalent (we write $\|\cdot\|_1 \sim \|\cdot\|_2$) if there exist constants $0 < c_1 \le c_2$ such that
\[ c_1 \|x\|_1 \le \|x\|_2 \le c_2 \|x\|_1 \quad\text{for all } x \in X. \]

It is clear that the above notion of 'equivalence' is reflexive ($\|\cdot\| \sim \|\cdot\|$) and symmetric ($\|\cdot\|_1 \sim \|\cdot\|_2$ if and only if $\|\cdot\|_2 \sim \|\cdot\|_1$). It is also transitive, so this is indeed an equivalence relation.

Lemma 3.17 Suppose that $\|\cdot\|_1$, $\|\cdot\|_2$, and $\|\cdot\|_3$ are all norms on a vector space $V$, such that
\[ \|\cdot\|_1 \sim \|\cdot\|_2 \quad\text{and}\quad \|\cdot\|_2 \sim \|\cdot\|_3. \]
Then $\|\cdot\|_1 \sim \|\cdot\|_3$.

Proof There exist constants $0 < \alpha_1 \le \alpha_2$ and $0 < \beta_1 \le \beta_2$ such that
\[ \alpha_1 \|x\|_1 \le \|x\|_2 \le \alpha_2 \|x\|_1 \quad\text{and}\quad \beta_1 \|x\|_2 \le \|x\|_3 \le \beta_2 \|x\|_2; \]

therefore
\[ \alpha_1 \beta_1 \|x\|_1 \le \|x\|_3 \le \alpha_2 \beta_2 \|x\|_1, \]
i.e. $\|\cdot\|_1 \sim \|\cdot\|_3$.

Exercise 3.10 shows that the norms $\|\cdot\|_{\ell^1}$, $\|\cdot\|_{\ell^2}$, and $\|\cdot\|_{\ell^\infty}$ on $\mathbb{K}^n$ are all equivalent. In fact all norms on any finite-dimensional vector space are equivalent, which we will prove in Theorem 5.1.

However, it is easy to find norms on infinite-dimensional spaces that are not equivalent. For example, Lemma 3.10 shows that we could use any choice of $\|\cdot\|_{\ell^p}$ as a norm on the vector space $\ell^1(\mathbb{K})$ if we wanted to (since if $x \in \ell^1$ all of its $\ell^p$ norms are finite). While that lemma shows that for $p < q$ we have
\[ \|x\|_{\ell^q} \le \|x\|_{\ell^p} \quad\text{for every } x \in \ell^1, \]
the same inequality does not hold in reverse: we do not have
\[ \|x\|_{\ell^p} \le C \|x\|_{\ell^q} \quad\text{for every } x \in \ell^1, \tag{3.9} \]
for any choice of $C > 0$. To see this, consider the sequences$^2$ $x^{(n)} \in \ell^1$ given by
\[ x_j^{(n)} = \begin{cases} j^{-1/p}, & j = 1, \ldots, n, \\ 0, & j = n + 1, \ldots. \end{cases} \]
Then $\|x^{(n)}\|_{\ell^q}^q < \sum_{j=1}^\infty j^{-q/p} < \infty$ for every $n$, but $\|x^{(n)}\|_{\ell^p}^p = \sum_{j=1}^n j^{-1}$ is unbounded, so (3.9) cannot hold for any $C > 0$.

The $L^p$ norms on $C([0, 1])$ defined in Example 3.13 are also not equivalent: if $p > q$ there is no constant $C$ such that
\[ \|f\|_{L^p} \le C \|f\|_{L^q} \quad\text{for every } f \in C([0, 1]). \tag{3.10} \]
To see this, consider the functions $f_n(x) = x^n$ with norms
\[ \|f_n\|_{L^p} = \Bigl( \int_0^1 x^{np} \, \mathrm{d}x \Bigr)^{1/p} = \Bigl( \frac{1}{np + 1} \Bigr)^{1/p}. \]

$^2$ Note that each $x^{(n)}$ is an element of $\ell^1$ with $x^{(n)} = (x_1^{(n)}, x_2^{(n)}, x_3^{(n)}, \ldots)$, so $(x^{(n)})_{n=1}^\infty$ is a sequence of sequences.

If we had (3.10), then applying this inequality to each $f_n$ would give
\[ \Bigl( \frac{1}{np + 1} \Bigr)^{1/p} \le C \Bigl( \frac{1}{nq + 1} \Bigr)^{1/q}, \]
i.e. $(nq + 1)^{1/q} (np + 1)^{-1/p} \le C$ for every $n \in \mathbb{N}$. But since $p > q$ the left-hand side tends to infinity as $n \to \infty$, which shows that (3.10) cannot hold.

Equivalent norms define the same open sets.

Lemma 3.18 Suppose that $\|\cdot\|_1$ and $\|\cdot\|_2$ are two equivalent norms on a linear space $X$. Then a set is open/closed in $(X, \|\cdot\|_1)$ if and only if it is open/closed in $(X, \|\cdot\|_2)$.

Proof We only need to prove the statement about open sets, since the statement for closed sets will follow by taking complements. We write $B_i(x, \varepsilon)$ for $\{y \in X : \|y - x\|_i < \varepsilon\}$. Assume that there exist constants $0 < c_1 \le c_2$ such that
\[ c_1 \|x\|_1 \le \|x\|_2 \le c_2 \|x\|_1 \quad\text{for all } x \in X; \tag{3.11} \]
then
\[ B_2(x, c_1 \varepsilon) \subseteq B_1(x, \varepsilon) \subseteq B_2(x, c_2 \varepsilon). \]
It follows that if $U \subseteq X$ is open using $\|\cdot\|_1$, i.e. for every $x \in U$ we can find $\varepsilon > 0$ such that $B_1(x, \varepsilon) \subseteq U$, then it is open using $\|\cdot\|_2$, since we have $B_2(x, c_1 \varepsilon) \subseteq B_1(x, \varepsilon) \subseteq U$. A similar argument applies to show that any set open using $\|\cdot\|_2$ is open using $\|\cdot\|_1$.

We saw in Chapter 2 that it is possible to define convergence and continuity solely in terms of open sets, so equivalent norms produce the same definitions of convergence and continuity. For a less topological argument it is easy to prove this directly: assuming that $\|\cdot\|_1$ and $\|\cdot\|_2$ are equivalent as in (3.11), then
\[ c_1 \|x_n - x\|_1 \le \|x_n - x\|_2 \le c_2 \|x_n - x\|_1, \]
which shows that $\|x_n - x\|_1 \to 0$ if and only if $\|x_n - x\|_2 \to 0$.

Compactness is also preserved under an equivalent norm since this relies on covers by open sets (Definition 2.21) or notions of convergence (Definition 2.22).
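The two counterexamples in this section can be explored numerically. The sketch below is illustrative only: the helper names (`lp_norm`, `truncated_sequence`, `monomial_lp_norm`) and the particular choices $p = 1$, $q = 2$ for sequences and $p = 2$, $q = 1$ for functions are our own assumptions. It computes the ratios that would have to stay bounded if (3.9) or (3.10) held.

```python
import math


def lp_norm(x, p):
    """ell^p norm of a finite list x (1 <= p < infinity)."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)


def truncated_sequence(n, p):
    """First n terms of x^(n), where x_j = j^(-1/p); the remaining terms are zero."""
    return [j ** (-1.0 / p) for j in range(1, n + 1)]


def monomial_lp_norm(n, p):
    """Closed form for the L^p norm of x^n on [0, 1]: (1/(np + 1))^(1/p)."""
    return (1.0 / (n * p + 1)) ** (1.0 / p)


def sequence_ratio(n, p=1.0, q=2.0):
    """||x^(n)||_p / ||x^(n)||_q with p < q; bounded in n would mean (3.9)."""
    x = truncated_sequence(n, p)
    return lp_norm(x, p) / lp_norm(x, q)


def function_ratio(n, p=2.0, q=1.0):
    """||f_n||_{L^p} / ||f_n||_{L^q} with p > q; bounded in n would mean (3.10)."""
    return monomial_lp_norm(n, p) / monomial_lp_norm(n, q)


for n in (10, 100, 1000, 10000):
    print(n, round(sequence_ratio(n), 4), round(function_ratio(n), 4))
```

Both columns grow with $n$ (roughly like $\log n$ and $\sqrt{n}$ respectively for these choices), which is consistent with the absence of any constant $C$ in (3.9) and (3.10). A finite computation of course only suggests, and does not prove, unboundedness.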

3.5 Isomorphisms between Normed Spaces

If we want to show that two mathematical objects are 'the same' we require a bijection between them that also preserves the essential structures of the objects. To identify two normed spaces $(X, \|\cdot\|_X)$ and $(Y, \|\cdot\|_Y)$ we therefore require a map $T\colon X \to Y$ that is
(i) a bijection (i.e. $T$ is injective and surjective);
(ii) linear, so that it preserves the linear (vector-space) structure; and
(iii) for some $0 < c_1 \le c_2$ we have
\[ c_1 \|x\|_X \le \|Tx\|_Y \le c_2 \|x\|_X \quad\text{for all } x \in X, \tag{3.12} \]
so that the norms in $X$ and $Y$ are comparable.

We will refer to such a map as an isomorphism between two normed spaces $X$ and $Y$, although perhaps one should say strictly that this is a 'normed-space isomorphism', since property (i) alone is enough to guarantee that $T$ is an isomorphism in the set-theoretic sense and (i)+(ii) make $X$ and $Y$ isomorphic as vector spaces.

Definition 3.19 Two normed spaces $(X, \|\cdot\|_X)$ and $(Y, \|\cdot\|_Y)$ are said to be isomorphic if there exists a bijective linear map $T\colon X \to Y$ that satisfies (iii), and in this case we will write $X \simeq Y$.

In some cases we can strengthen (iii) and find a map $T\colon X \to Y$ that satisfies (i), (ii), and
(iii′) $T$ is an isometry, i.e. we have
\[ \|Tx\|_Y = \|x\|_X \quad\text{for all } x \in X, \]
so that $T$ preserves the norm.

We will refer to such a map $T$ as an isometric isomorphism between the spaces $X$ and $Y$. (Pryce (1973) and some other authors use the term congruence in this case, but we will not adopt this here.)

Definition 3.20 We say that $(X, \|\cdot\|_X)$ and $(Y, \|\cdot\|_Y)$ are isometrically isomorphic if there exists a bijective linear isometry $T\colon X \to Y$, and in this case we will write $X \equiv Y$.

Note that the exact terms used to identify normed spaces, and the corresponding notation, can differ from one author to another, so it is important to check precisely which definition is being used when referring to other sources.

Observe that any linear map $T\colon X \to Y$ satisfying (3.12) (in particular, any linear isometry) is automatically injective. Indeed, if $Tx = Ty$, then
\[ c_1 \|x - y\|_X \le \|T(x - y)\|_Y = \|Tx - Ty\|_Y = 0 \]
and so $x = y$. So to show that such a map $T$ is bijective (and hence that $X \simeq Y$) we only need to show that $T$ is surjective.

Example 3.21 The space $\mathbb{C}^n$ is isometrically isomorphic to $\mathbb{R}^{2n}$ via the map $(z_1, \ldots, z_n) \mapsto (x_1, y_1, \ldots, x_n, y_n)$, where $z_j = x_j + \mathrm{i} y_j$. (This example shows very clearly that $\equiv$ is not the same as $=$.)

Proof The map is clearly a linear bijection, and
\[ \|(z_1, \ldots, z_n)\|_{\mathbb{C}^n}^2 = \sum_{j=1}^n |z_j|^2 = \sum_{j=1}^n \bigl( |x_j|^2 + |y_j|^2 \bigr) = \|(x_1, y_1, \ldots, x_n, y_n)\|_{\mathbb{R}^{2n}}^2. \]

We can also show that with the norm we constructed in Lemma 3.7 any finite-dimensional space is isometrically isomorphic to $\mathbb{K}^n$ with its standard norm.

Lemma 3.22 Let $V$ be a finite-dimensional vector space and $\|\cdot\|_E$ the norm on $V$ defined in Lemma 3.7. Then $(\mathbb{K}^n, \|\cdot\|_{\ell^2}) \equiv (V, \|\cdot\|_E)$.

Proof We take a basis $E = \{e_j\}_{j=1}^n$ for $V$. Writing $\alpha = (\alpha_1, \ldots, \alpha_n)$, the map $\mathbb{K}^n \to V$ given by
\[ \alpha \mapsto \sum_{j=1}^n \alpha_j e_j \]
is a linear bijection from $\mathbb{K}^n$ onto $(V, \|\cdot\|_E)$: injectivity and surjectivity follow from the fact that $\{e_j\}$ is a basis, and the definition of $\|\cdot\|_E$ ensures that $\bigl\| \sum_{j=1}^n \alpha_j e_j \bigr\|_E = \|\alpha\|_{\ell^2}$.
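The identity in the proof of Example 3.21 is easy to check on a concrete vector. The following sketch uses hypothetical helper names (`complex_norm`, `to_real`, `real_norm`) and an arbitrarily chosen test vector; it is an illustration of the map $(z_1, \ldots, z_n) \mapsto (x_1, y_1, \ldots, x_n, y_n)$ rather than part of the argument.

```python
import math


def complex_norm(z):
    """Standard (Euclidean) norm of a vector in C^n."""
    return math.sqrt(sum(abs(w) ** 2 for w in z))


def to_real(z):
    """Map (z_1, ..., z_n) to (x_1, y_1, ..., x_n, y_n), where z_j = x_j + i y_j."""
    out = []
    for w in z:
        out.extend([w.real, w.imag])
    return out


def real_norm(x):
    """Standard (Euclidean) norm of a vector in R^m."""
    return math.sqrt(sum(t * t for t in x))


z = [3 + 4j, 1 - 2j, -0.5j]          # an arbitrary element of C^3
print(complex_norm(z), real_norm(to_real(z)))
```

The two printed values agree up to rounding, reflecting $|z_j|^2 = |x_j|^2 + |y_j|^2$ in each coordinate.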

3.6 Separability of Normed Spaces

We briefly discussed separability in the context of metric spaces in Section 2.4. We now investigate this a little more given the additional structure available in a normed space.

If $E$ is a subset of a normed space $X$, recall that the linear span of $E$, $\mathrm{Span}(E)$, is the set of all finite linear combinations of elements of $E$ (Definition 1.9). The closure of $\mathrm{Span}(E)$ is called the closed linear span of $E$, and is the set of all elements of $X$ that can be approximated arbitrarily closely by finite linear combinations of elements of $E$ (see Lemma 2.17). We will denote it by $\mathrm{clin}(E)$, so
\[ \mathrm{clin}(E) := \overline{\mathrm{Span}(E)}. \]

Lemma 3.23 If $X$ is a normed space, then the following three statements are equivalent:
(i) $X$ is separable (i.e. $X$ contains a countable dense subset);
(ii) the unit sphere in $X$, $S_X := \{x \in X : \|x\| = 1\}$, is separable; and
(iii) $X$ contains a countable set $\{x_j\}_{j=1}^\infty$ whose linear span is dense, i.e. whose closed linear span is all of $X$, $\mathrm{clin}(\{x_j\}) = X$.

Proof Lemma 2.20 shows that (i) $\Rightarrow$ (ii). For (ii) $\Rightarrow$ (iii) choose a countable dense subset $\{x_1, x_2, x_3, \ldots\}$ of $S_X$: then for any non-zero $x \in X$ we have $x/\|x\| \in S_X$, and so for any $\varepsilon > 0$ there exists an $x_k$ such that
\[ \Bigl\| x_k - \frac{x}{\|x\|} \Bigr\| < \frac{\varepsilon}{\|x\|}. \]
It follows that
\[ \bigl\| x - \|x\| x_k \bigr\| < \varepsilon, \]
and since $\|x\| x_k$ is contained in the linear span of the $\{x_j\}$ this gives (iii).

To show that (iii) implies (i) note that the collection of finite linear combinations of the $\{x_j\}$ with rational coefficients is countable (when $\mathbb{K} = \mathbb{C}$ by 'rational' we mean an element of $\mathbb{Q} + \mathrm{i}\mathbb{Q}$). This countable collection is dense: given $x \in X$ and $\varepsilon > 0$, choose an element in the linear span of $\{x_1, x_2, \ldots\}$ such that
\[ \Bigl\| x - \sum_{j=1}^n \alpha_j x_j \Bigr\| < \frac{\varepsilon}{2}, \qquad \alpha_j \in \mathbb{K}, \]
and then for each $j = 1, \ldots, n$ choose rational $q_j$ such that $|q_j - \alpha_j| <$ …

… $> 0$ there exists $N$ such that
\[ \sum_{j=n+1}^\infty |x_j|^p < \varepsilon^p \quad\text{for every } n \ge N. \]

It follows that
\[ \Bigl\| x - \sum_{j=1}^n x_j e^{(j)} \Bigr\|_{\ell^p} = \|(0, \ldots, 0, x_{n+1}, x_{n+2}, \ldots)\|_{\ell^p} = \Bigl( \sum_{j=n+1}^\infty |x_j|^p \Bigr)^{1/p} < \varepsilon. \tag{3.13} \]

However, in the space $\ell^\infty(\mathbb{K})$ consider the set
\[ b := \{x \in \ell^\infty : x_j = 0 \text{ or } 1 \text{ for each } j \in \mathbb{N}\}. \]
This set $b$ is uncountable (this can be shown using Cantor's diagonal argument; see Exercise 3.15) and any two distinct elements $x$ and $y$ in $b$ satisfy
\[ \|x - y\|_{\ell^\infty} = 1, \]
since they must differ by 1 in at least one term. Any dense set $A$ must therefore contain an uncountable number of elements: since $A$ is dense, for every $x \in b$ there must be some $x' \in A$ such that $\|x' - x\|_{\ell^\infty} < 1/3$. But if $x, y$ are distinct elements of $b$, it then follows that $x'$ and $y'$ are also distinct, since
\[ \|x' - y'\|_{\ell^\infty} \ge \|x - y\|_{\ell^\infty} - \|x' - x\|_{\ell^\infty} - \|y' - y\|_{\ell^\infty} > 1/3. \]
Since $b$ contains an uncountable number of elements, so must $A$.

Note that the sequence space $c_0(\mathbb{K})$, which is a subspace of $\ell^\infty(\mathbb{K})$, is separable (see Example 3.8 for the definition of $c_0(\mathbb{K})$ and Exercise 3.12 for its separability).

Note also that the linear span of $\{e^{(j)}\}_{j=1}^\infty$ is precisely the space $c_{00}$ of sequences with only a finite number of non-zero terms; the proof of the above lemma shows that this space is dense in $\ell^p$ for all $1 \le p < \infty$.

Finally, we observe that linear subspaces of normed spaces may or may not be closed, although any finite-dimensional linear subspace is closed (see Exercise 5.3). As a simple example, for any $p \in [1, \infty)$ consider the subset $c_{00}$ of $\ell^p$ consisting of sequences with only a finite number of non-zero terms (Example 1.8). While this is a subspace of $\ell^p$, any $x \in \ell^p$ can be approximated by elements of $c_{00}$ arbitrarily closely: we did precisely this in the proof of Lemma 3.24. It follows that $c_{00}$ is not a closed subspace of $\ell^p$.

Any open linear subspace of a normed space is the whole space (see Exercise 3.8) and the closure of any linear subspace yields a closed linear subspace (see Exercise 3.9).
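The separation property used in the non-separability argument for $\ell^\infty$ can be observed on finite truncations of the set $b$. In the sketch below the truncation to vectors of length $n$ and the function names are our own assumptions, made only so that the computation is finite; it checks that distinct 0/1 vectors are at sup-distance exactly 1, so that the number of mutually separated points doubles with each extra coordinate.

```python
from itertools import product


def sup_distance(x, y):
    """Sup-norm distance between two finite sequences of equal length."""
    return max(abs(a - b) for a, b in zip(x, y))


def binary_vectors(n):
    """All 0/1 vectors of length n (a finite stand-in for the set b)."""
    return list(product((0, 1), repeat=n))


def min_pairwise_distance(points):
    """Smallest sup-distance between two distinct points of the list."""
    return min(
        sup_distance(x, y)
        for i, x in enumerate(points)
        for y in points[i + 1:]
    )


for n in (2, 4, 8):
    pts = binary_vectors(n)
    print(n, len(pts), min_pairwise_distance(pts))
```

This finite experiment does not establish uncountability, which needs the diagonal argument of Exercise 3.15; it only illustrates why no single point of a dense set can lie within $1/3$ of two distinct elements of $b$.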

Exercises

3.1 Show that if $d$ is a metric on a vector space $X$ derived from a norm $\|\cdot\|$, i.e. $d(x, y) = \|x - y\|$, then $d$ is translation invariant and homogeneous, i.e.
\[ d(x + z, y + z) = d(x, y) \quad\text{and}\quad d(\alpha x, \alpha y) = |\alpha| d(x, y). \]
Deduce that the metric on $s$ in Exercise 2.4 does not come from a norm. (Kreyszig, 1978)

3.2 If $A$ and $B$ are subsets of a vector space, then we can define
\[ A + B := \{a + b : a \in A,\ b \in B\}. \]
Show that if $A$ and $B$ are both convex, then so is $A + B$.

3.3 If $C$ is a closed subset of a vector space $X$ show that $C$ is convex if and only if $a, b \in C$ implies that $(a + b)/2 \in C$.

3.4 Show that if $f\colon [a, b] \to \mathbb{R}$ is $C^2$ on $(a, b)$ and $C^1$ on $[a, b]$, then $f$ is convex on $[a, b]$ if $f''(x) \ge 0$. Deduce that $f(x) = \mathrm{e}^x$ and $f(x) = |x|^p$, $1 \le p < \infty$, are convex functions on $\mathbb{R}$.

3.5 Show that if $m = \max_{j=1,\ldots,n} |x_j|$, then for any $p \in [1, \infty)$,
\[ m^p \le \sum_{j=1}^n |x_j|^p \le n m^p \]
and deduce that for any $x \in \mathbb{K}^n$, $\|x\|_{\ell^p} \to \|x\|_{\ell^\infty}$ as $p \to \infty$.

3.6 Show that if $x \in \ell^1$, then
\[ \|x\|_{\ell^\infty} = \lim_{p \to \infty} \|x\|_{\ell^p}. \]
(Show that for every $\varepsilon > 0$ there exists $N$ such that
\[ \|x\|_{\ell^p} - \varepsilon \le \|(x_1, \ldots, x_n)\|_{\ell^p} \le \|x\|_{\ell^p} \quad\text{for all } n \ge N; \]
treat $p \in [1, \infty)$ together using the result of Lemma 3.10, and then $p = \infty$ separately. To finish the proof use the result of Exercise 3.5.)

3.7 Given any $1 < p < \infty$, find a sequence $x$ such that $x \in \ell^p$ but $x \notin \ell^q$ for all $1 \le q < p$.

3.8 Show that if $U$ is an open subspace of a normed space $X$ then $U = X$. (Naylor and Sell, 1982)

3.9 Show that if $U$ is a linear subspace of a normed space $X$ then $\overline{U}$ is a closed linear subspace of $X$.

3.10 Show that the norms $\|\cdot\|_{\ell^1}$, $\|\cdot\|_{\ell^2}$, and $\|\cdot\|_{\ell^\infty}$ on $\mathbb{R}^n$ are all equivalent.

3.11 Show that if $(f_n), f \in C([0, 1])$ and $\|f_n - f\|_\infty \to 0$ (uniform convergence) then $\|f_n - f\|_{L^p} \to 0$ (convergence in $L^p$) and $f_n(x) \to f(x)$ for every $x \in [0, 1]$ ('pointwise convergence'). (For related results see Exercises 7.1–7.3.)

3.12 Show that $c_0(\mathbb{K})$ is separable.

3.13 If $(X, \|\cdot\|_X) \simeq (Y, \|\cdot\|_Y)$ show that $X$ is separable if and only if $Y$ is separable.

3.14 Let $(X, \|\cdot\|)$ be a normed space. If $A \subseteq B \subseteq X$ with $A$ dense in $B$ show that $\mathrm{clin}(A) = \mathrm{clin}(B)$. Deduce that if $A$ is dense in $B$ and $B$ is dense in $X$ then $A$ is dense in $X$ (equivalently that $\mathrm{clin}(A) = X$).

3.15 Show that the set
\[ b := \{x \in \ell^\infty : x_j = 0 \text{ or } 1 \text{ for each } j \in \mathbb{N}\} \]
is uncountable.

3.16 Show that a normed space $(X, \|\cdot\|)$ is separable if and only if
\[ X = \overline{\bigcup_{j=1}^\infty X_j}, \]
where the $\{X_j\}$ are finite-dimensional subspaces of $X$. (Zeidler, 1995)

3.17 Show that a normed space $(X, \|\cdot\|)$ is separable if and only if there exists a compact set $K \subset X$ such that $X = \mathrm{clin}(K)$. (Megginson, 1998)

4 Complete Normed Spaces

If a sequence $(x_n)$ in a metric space converges, then it must be Cauchy, i.e. for every $\varepsilon > 0$ there exists an $N$ such that
\[ d(x_n, x_m) < \varepsilon \quad\text{for all } n, m \ge N. \]
Indeed, if $(x_n)$ converges to some $x \in X$, then there exists $N$ such that $d(x_n, x) < \varepsilon/2$ for every $n \ge N$, and then for $n, m \ge N$ the triangle inequality yields
\[ d(x_n, x_m) \le d(x_n, x) + d(x, x_m) < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon. \]
A metric space in which any Cauchy sequence converges is called complete. Normed spaces with this completeness property are called 'Banach spaces' and are central to functional analysis; almost all of the spaces we consider throughout the rest of this book will be Banach spaces.$^1$ Banach himself called them 'spaces of type (B)'.

In the first section of this chapter we prove some abstract results about complete normed spaces, which will help us to show that various important normed spaces are complete in Section 4.2. We then discuss convergent series in Banach spaces: in this case the completeness will allow us to obtain results that parallel those available for sequences in $\mathbb{R}$. The chapter ends with a proof of the Contraction Mapping Theorem.

$^1$ This is the reason for treating completeness in the less general context of normed spaces, rather than introducing the definition for metric spaces and then delaying all our examples for two chapters.

4.1 Banach Spaces

Banach spaces, which are complete normed spaces, are one of the central topics of what follows.

Definition 4.1 A normed space $(X, \|\cdot\|)$ is complete if every Cauchy sequence in $X$ converges in $X$ (to a limit that lies in $X$). A Banach space is a complete normed space.

It is a fundamental property of $\mathbb{R}$ (and $\mathbb{C}$) that it is complete. This carries over to $\mathbb{R}^n$ and $\mathbb{C}^n$ with their standard norms, as we now prove. A much neater (but more abstract) proof of the completeness of $\mathbb{K}^n$ is given in Exercise 4.3, but the more methodical proof given here serves as a useful prototype for completeness proofs in more general situations.

Such 'completeness arguments' usually follow similar lines:
(i) use the definition of what it means for a sequence to be Cauchy to identify a possible limit;
(ii) show that the original sequence converges to this 'possible limit' in the appropriate norm;
(iii) check that the 'limit' lies in the correct space.
We label these steps in the proof of the following theorem, but will not be so explicit in the examples in Section 4.2.

Theorem 4.2 The space $\mathbb{K}^d$ is complete (with its standard norm).

Proof (i) Identify a possible limit. Let $(x^{(k)})_{k=1}^\infty$ be a Cauchy sequence in $\mathbb{K}^d$. Then for every $\varepsilon > 0$ there exists $N(\varepsilon)$ such that

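\[ \|x^{(n)} - x^{(m)}\|_{\ell^2} = \Bigl( \sum_{i=1}^d |x_i^{(n)} - x_i^{(m)}|^2 \Bigr)^{1/2} < \ \ldots \]

… $> 0$ there exists an $N_\varepsilon$ such that
\[ \|x^{(n)} - x^{(m)}\|_{\ell^p}^p = \sum_{j=1}^\infty |x_j^{(n)} - x_j^{(m)}|^p < \varepsilon^p \quad\text{for all } n, m \ge N_\varepsilon. \tag{4.3} \]
In particular, $(x_j^{(n)})_{n=1}^\infty$ is a Cauchy sequence in $\mathbb{K}$ for every fixed $j$. Since $\mathbb{K}$ is complete, it follows that for each $j \in \mathbb{N}$
\[ x_j^{(n)} \to a_j \quad\text{as } n \to \infty \]
for some $a_j \in \mathbb{K}$. Set $a = (a_1, a_2, \ldots)$. We need to show that $a \in \ell^p$ and that $\|x^{(n)} - a\|_{\ell^p} \to 0$ as $n \to \infty$. From (4.3) it follows that for any $k \in \mathbb{N}$ we have
\[ \sum_{j=1}^k |x_j^{(n)} - x_j^{(m)}|^p \le \sum_{j=1}^\infty |x_j^{(n)} - x_j^{(m)}|^p < \varepsilon^p \quad\text{for all } n, m \ge N_\varepsilon. \]
Letting $m \to \infty$ (which we can do since the left-hand side contains only a finite number of terms) we obtain
\[ \sum_{j=1}^k |x_j^{(n)} - a_j|^p \le \varepsilon^p \quad\text{for all } n \ge N_\varepsilon, \]
since $x_j^{(m)} \to a_j$ as $m \to \infty$. Since this holds for every $k \in \mathbb{N}$, it follows that
\[ \sum_{j=1}^\infty |x_j^{(n)} - a_j|^p \le \varepsilon^p \quad\text{for all } n \ge N_\varepsilon, \tag{4.4} \]
and so $x^{(n)} - a \in \ell^p$ provided that $n \ge N_\varepsilon$. But since $\ell^p$ is a vector space and $x^{(n)} \in \ell^p$ for every $n \in \mathbb{N}$, this implies that $a \in \ell^p$ and (4.4) shows that
\[ \|x^{(n)} - a\|_{\ell^p} \le \varepsilon \quad\text{for all } n \ge N_\varepsilon, \]
and so $x^{(n)} \to a$ in $\ell^p(\mathbb{K})$, as required.

The space $c_0(\mathbb{K})$ is a closed subspace of $\ell^\infty(\mathbb{K})$: for a proof see Exercise 4.4; it is therefore complete.

Corollary 4.8 The space $c_0(\mathbb{K})$ of null sequences is complete when equipped with the $\ell^\infty$ norm.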

4.2.2 Spaces of Functions

In this section we prove the completeness of various spaces of bounded functions in the supremum norm. We begin with a large space (the collection of all bounded functions on a metric space $X$), and then use the 'closed subspace' method to prove the completeness of the space of all bounded continuous functions on $X$. With a little more work we can use the completeness of $C([a, b])$ to prove the completeness of the space $C^1([a, b])$ of continuously differentiable functions with an appropriate norm.

Theorem 4.9 Let $X$ be a metric space and let $F_b(X; \mathbb{K})$ be the collection of all functions $f\colon X \to \mathbb{K}$ that are bounded, i.e. $\sup_{x \in X} |f(x)| < \infty$. Then $F_b(X; \mathbb{K})$ is complete with the supremum norm
\[ \|f\|_\infty := \sup_{x \in X} |f(x)|. \]

Convergence of functions in the supremum norm, i.e. $\|f_n - f\|_\infty \to 0$, is known as uniform convergence, since it implies that given any $\varepsilon > 0$, there exists $N$ such that
\[ \sup_{x \in X} |f_n(x) - f(x)| < \varepsilon \quad\text{for all } n \ge N, \]
i.e. $|f_n(x) - f(x)| < \varepsilon$ for every $x \in X$.

Proof If $(f_k)_{k=1}^\infty$ is a Cauchy sequence in $F_b(X; \mathbb{K})$, then given any $\varepsilon > 0$ there exists an $N$ such that
\[ \|f_n - f_m\|_\infty = \sup_{x \in X} |f_n(x) - f_m(x)| < \varepsilon \quad\text{for all } n, m \ge N. \tag{4.5} \]
In particular, for each fixed $x \in X$ the sequence $(f_k(x))$ is Cauchy in $\mathbb{K}$, and so converges (because $\mathbb{K}$ is complete); therefore we can set
\[ f(x) = \lim_{k \to \infty} f_k(x). \]

Now we need to show that $f \in F_b(X; \mathbb{K})$ and that $f_k \to f$ uniformly on $X$. Using (4.5) for every $x \in X$ we have
\[ |f_n(x) - f_m(x)| < \varepsilon \quad\text{for all } n, m \ge N, \]
where $N$ does not depend on $x$. Letting $m \to \infty$ in this expression we obtain
\[ |f_n(x) - f(x)| \le \varepsilon \quad\text{for all } n \ge N, \]
where again $N$ does not depend on $x$. It follows that
\[ \sup_{x \in X} |f_n(x) - f(x)| \le \varepsilon \quad\text{for all } n \ge N, \tag{4.6} \]
i.e. $f_n$ converges uniformly to $f$ on $X$. It now only remains to show that this limiting function $f$ is bounded; this follows from (4.6), since $f_N$ is bounded and $\|f_N - f\|_\infty \le \varepsilon$.

We can now use the fact that closed subspaces of Banach spaces are complete to prove the completeness of the space of all bounded continuous functions from $X$ into $\mathbb{K}$.

Corollary 4.10 If $(X, d)$ is any metric space, then $C_b(X; \mathbb{K})$, the space of all bounded continuous functions from $X$ into $\mathbb{K}$, is complete when equipped with the supremum norm $\|f\|_\infty := \sup_{x \in X} |f(x)|$.

Proof The space $C_b(X; \mathbb{K})$ is a subspace of $F_b(X; \mathbb{K})$, which is complete, so by Lemma 4.3 we need only show that $C_b(X; \mathbb{K})$ is a closed subspace of $F_b(X; \mathbb{K})$, i.e. if $(f_k) \in C_b(X; \mathbb{K})$ and $\|f_k - f\|_\infty \to 0$, then $f$ is continuous.

Given any $\varepsilon > 0$ there exists $N$ such that $\|f_n - f\|_\infty < \varepsilon/3$ for every $n \ge N$. In particular, $\|f_N - f\|_\infty < \varepsilon/3$. Now fix $x \in X$. Since $f_N \in C_b(X; \mathbb{K})$, there exists $\delta > 0$ such that $d(y, x) < \delta$ implies that $|f_N(y) - f_N(x)| < \varepsilon/3$. It follows that if $d(y, x) < \delta$, then
\[ |f(y) - f(x)| \le |f(y) - f_N(y)| + |f_N(y) - f_N(x)| + |f_N(x) - f(x)| < \frac{\varepsilon}{3} + \frac{\varepsilon}{3} + \frac{\varepsilon}{3} = \varepsilon, \]
and so $f$ is continuous at $x$. Since this holds for any $x \in X$, $f \in C_b(X; \mathbb{K})$ as required.
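The hypothesis of uniform convergence in the closedness argument above cannot be weakened to pointwise convergence, and a standard example makes this visible numerically. In the sketch below the choice $f_n(x) = x^n$ on $[0, 1]$, the function names, and the use of a finite grid are all our own assumptions; the grid maximum is only a lower bound for the true supremum.

```python
def f(n, x):
    """The continuous function f_n(x) = x^n on [0, 1]."""
    return x ** n


def pointwise_limit(x):
    """Pointwise limit of f_n on [0, 1]: 0 for x < 1 and 1 at x = 1."""
    return 1.0 if x == 1.0 else 0.0


def sup_distance_on_grid(n, grid_size=10001):
    """Maximum of |f_n - limit| over an equally spaced grid on [0, 1]."""
    grid = [k / (grid_size - 1) for k in range(grid_size)]
    return max(abs(f(n, x) - pointwise_limit(x)) for x in grid)


for n in (5, 50, 500):
    print(n, f(n, 0.5), sup_distance_on_grid(n))
```

At each fixed $x < 1$ the values $f_n(x)$ tend to zero, yet the grid estimate of $\|f_n - f\|_\infty$ stays close to 1. This is consistent with Corollary 4.10: the pointwise limit is discontinuous at $x = 1$, so the convergence cannot be uniform.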

60

Complete Normed Spaces

Corollary 4.11 If (X, d) is any compact metric space, then C(X ; K) is a Banach space when equipped with the maximum norm  f ∞ := max | f (x)|. x∈X

Proof Proposition 2.29 guarantees that a continuous function on a compact metric space is bounded, so $C(X; \mathbb{K}) = C_b(X; \mathbb{K})$. That the supremum norm is in fact the same as the maximum norm follows from the same proposition, since every continuous function on a compact metric space attains its bounds.

Since the supremum norm makes $C(X; \mathbb{K})$ complete, it is the ‘standard norm’ on this space. However, there are other (useful) norms on $C(X; \mathbb{K})$ that are not equivalent to the supremum norm (see the next chapter), and $C(X; \mathbb{K})$ is not complete in these norms.

We end this section by proving the completeness of the space of all continuously differentiable functions on $[a, b]$ with an appropriate norm.

Theorem 4.12 The space $C^1([a, b])$ of all continuously differentiable functions on $[a, b]$ is complete with the $C^1$ norm
\[ \|f\|_{C^1} := \|f\|_\infty + \|f'\|_\infty. \]

Proof Let $(f_n) \in C^1([a, b])$ be a Cauchy sequence in the $C^1$ norm. Then for any $\varepsilon > 0$ there exists $N$ such that
\[ \|f_n - f_m\|_\infty + \|f_n' - f_m'\|_\infty < \varepsilon, \qquad n, m \ge N. \tag{4.7} \]
Therefore $(f_n)$ and $(f_n')$ are both Cauchy sequences in $C([a, b])$ with the supremum norm, and since this space is complete there exist $f, g \in C([a, b])$ such that $f_n \to f$ and $f_n' \to g$. We need to show that $g = f'$. To do this we use the Fundamental Theorem of Calculus: for every $n$ we have
\[ f_n(x) = f_n(a) + \int_a^x f_n'(t)\,dt, \qquad x \in [a, b], \tag{4.8} \]
and now we take limits as $n \to \infty$. Since
\[ \left| \int_a^x f_n'(t)\,dt - \int_a^x g(t)\,dt \right| \le \int_a^x |f_n'(t) - g(t)|\,dt \le (b - a)\|f_n' - g\|_\infty \]
and $\|f_n' - g\|_\infty \to 0$ as $n \to \infty$, it follows that
\[ \int_a^x f_n'(t)\,dt \to \int_a^x g(t)\,dt \]
uniformly on $[a, b]$. Taking limits in (4.8) we therefore obtain
\[ f(x) = f(a) + \int_a^x g(t)\,dt, \]
which shows that $f' = g$; in particular, $f \in C^1([a, b])$. We can now take $m \to \infty$ in (4.7) to deduce that
\[ \|f_n - f\|_\infty + \|f_n' - f'\|_\infty \le \varepsilon, \qquad n \ge N, \]
which shows that $f_n \to f$ in $C^1([a, b])$ as required.

In a similar way one can prove the completeness of the space $C^k([a, b])$ of all $k$ times continuously differentiable functions on $[a, b]$ with the $C^k$ norm
\[ \|f\|_{C^k} = \sum_{j=0}^{k} \|f^{(j)}\|_\infty, \]
where $f^{(j)} = d^j f/dx^j$.
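The derivative term in the $C^1$ norm cannot be dropped. The following Python sketch illustrates this numerically with a family that is not taken from the text: the functions $f_n(x) = \sqrt{(x - \tfrac12)^2 + n^{-2}}$ on $[0, 1]$ (an assumed example) satisfy $\|f_n - f_{2n}\|_\infty = 1/(2n)$, so they are Cauchy in the supremum norm, but their derivatives stay a fixed distance apart, so $(f_n)$ is not Cauchy in the $C^1$ norm. The supremum is only sampled on a grid, so the printed values are approximations.

```python
import math

# Uniform grid on [0, 1]; the true sup norm is only approximated by sampling.
GRID = [i / 10000 for i in range(10001)]

def sup_norm(values):
    """Largest absolute value among the samples (a grid stand-in for the sup norm)."""
    return max(abs(v) for v in values)

def f(n, x):
    """Assumed example: f_n(x) = sqrt((x - 1/2)^2 + 1/n^2), a smooth version of |x - 1/2|."""
    return math.sqrt((x - 0.5) ** 2 + 1.0 / n ** 2)

def df(n, x):
    """Derivative of f_n, computed by hand."""
    return (x - 0.5) / math.sqrt((x - 0.5) ** 2 + 1.0 / n ** 2)

def sup_gap(n, m):
    """Sampled value of ||f_n - f_m||_inf."""
    return sup_norm([f(n, x) - f(m, x) for x in GRID])

def derivative_gap(n, m):
    """Sampled value of ||f_n' - f_m'||_inf."""
    return sup_norm([df(n, x) - df(m, x) for x in GRID])

for n in (10, 100, 1000):
    # First column shrinks like 1/(2n); second column does not shrink.
    print(n, sup_gap(n, 2 * n), derivative_gap(n, 2 * n))
```

The uniform limit here is $|x - \tfrac12|$, which is not differentiable at $x = \tfrac12$; this is consistent with Theorem 4.12, since the sequence fails to be Cauchy in the $C^1$ norm.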

4.3 Sequences in Banach Spaces

In any normed space we have both a notion of convergence and a linear structure, so it is possible to consider the convergence of series. As in the case of real numbers, if $(x_j) \in X$ we say that
\[ \sum_{j=1}^{\infty} x_j = x \]
if the partial sums $\sum_{j=1}^{k} x_j$ converge to $x$ (in $X$) as $k \to \infty$, i.e.
\[ \Big\| x - \sum_{j=1}^{k} x_j \Big\| \to 0 \quad \text{as } k \to \infty. \]

While we can consider convergence of series in any normed space (whether or not it is complete), the theory is much more satisfactory in Banach spaces. For example, for real series we know that any absolutely convergent series converges, and there is a corresponding result in Banach spaces.

Lemma 4.13 Let $X$ be a Banach space and $(x_j)$ a sequence in $X$. Then
\[ \sum_{j=1}^{\infty} \|x_j\| < \infty \quad \Rightarrow \quad \sum_{j=1}^{\infty} x_j \ \text{converges in } X. \]


Proof Since $\sum_{j=1}^{\infty} \|x_j\|$ converges, its partial sums form a Cauchy sequence. So for any $\varepsilon > 0$ there exists $N$ such that for $m > n \ge N$
\[ \Big| \sum_{j=1}^{m} \|x_j\| - \sum_{j=1}^{n} \|x_j\| \Big| = \sum_{j=n+1}^{m} \|x_j\| < \varepsilon. \]
Then
\[ \Big\| \sum_{j=1}^{m} x_j - \sum_{j=1}^{n} x_j \Big\| = \Big\| \sum_{j=n+1}^{m} x_j \Big\| \le \sum_{j=n+1}^{m} \|x_j\| < \varepsilon, \]
and so the partial sums $\sum_{j=1}^{k} x_j$ form a Cauchy sequence in $X$. Since $X$ is complete, these partial sums converge, i.e. the series converges.

The statement of this lemma can in some sense be reversed, providing a useful test for completeness.

Lemma 4.14 If $(X, \|\cdot\|)$ is a normed space with the property that whenever $\sum_{j=1}^{\infty} \|x_j\| < \infty$ the sum $\sum_{j=1}^{\infty} x_j$ converges in $X$, then $X$ is complete.

Proof Suppose that $(y_j)$ is a Cauchy sequence in $X$. Using Exercise 4.1 (ii) it is enough to show that some subsequence of $(y_j)$ converges to some $y \in X$. Find $n_0$ such that $\|y_i - y_j\| < 1$ for $i, j \ge n_0$, and then choose $n_k \in \mathbb{N}$ inductively such that $n_{k+1} > n_k$ and
\[ \|y_i - y_j\| < 2^{-k}, \qquad i, j \ge n_k; \]
we can do this because $(y_j)$ is Cauchy. Let $x_1 = y_{n_1}$ and set $x_j = y_{n_j} - y_{n_{j-1}}$ for $j \ge 2$. Then for $j \ge 2$ we have
\[ \|x_j\| = \|y_{n_j} - y_{n_{j-1}}\| < 2^{-(j-1)} \]
and so $\sum_{j=1}^{\infty} \|x_j\| < \infty$, which implies that $\sum_{j=1}^{\infty} x_j$ converges to some element $y \in X$. However,
\[ \sum_{j=1}^{n} x_j = y_{n_1} + \sum_{j=2}^{n} \big( y_{n_j} - y_{n_{j-1}} \big) = y_{n_n}, \]
and so $y_{n_j} \to y$ as $j \to \infty$. It follows from Exercise 4.1 (ii) that $y_k \to y$ as $k \to \infty$, and so $X$ is complete.


4.4 The Contraction Mapping Theorem

In a complete normed$^2$ space $(X, \|\cdot\|)$ the Contraction Mapping Theorem (also known as Banach’s Fixed Point Theorem) enables us to find a fixed point of any map that is a contraction. We will use this result to prove the existence of solutions for ordinary differential equations in Exercise 4.8, that the collection of all invertible operators is open among all bounded operators (Lemma 11.16), and the Lax–Milgram Lemma (Exercise 12.4).

Theorem 4.15 (Contraction Mapping Theorem) Let $K$ be a non-empty closed subset of a complete normed space $(X, \|\cdot\|)$ and $f : K \to K$ a contraction, i.e. a map such that
\[ \|f(x) - f(y)\| \le \kappa \|x - y\|, \qquad x, y \in K, \tag{4.9} \]
for some $\kappa < 1$. Then $f$ has a unique fixed point in $K$, i.e. there exists a unique $x \in K$ such that $f(x) = x$.

Proof Choose any $x_0 \in K$ and set $x_{n+1} = f(x_n)$. Then
\[ \|x_{j+1} - x_j\| \le \kappa \|x_j - x_{j-1}\| \le \kappa^2 \|x_{j-1} - x_{j-2}\| \le \cdots \le \kappa^j \|x_1 - x_0\|; \]
so if $k > j$, using the triangle inequality repeatedly, we have
\[ \|x_k - x_j\| \le \sum_{i=j}^{k-1} \|x_{i+1} - x_i\| \le \sum_{i=j}^{k-1} \kappa^i \|x_1 - x_0\| \le \frac{\kappa^j}{1 - \kappa} \|x_1 - x_0\|. \]
It follows that $(x_n)$ is a Cauchy sequence in $X$. Since $X$ is complete, $x_n \to x$ for some $x \in X$, and since $(x_n) \in K$ and $K$ is closed we have $x \in K$. Since (4.9) implies that $f$ is continuous, we have $f(x_n) \to f(x)$ (using Lemma 2.12) and so if we take limits on both sides of $x_{n+1} = f(x_n)$ we obtain $x = f(x)$.

Any such $x$ must be unique, since if $x, y \in K$ with $f(x) = x$ and $f(y) = y$ it follows that
\[ \|x - y\| = \|f(x) - f(y)\| \le \kappa \|x - y\| \quad \Rightarrow \quad (1 - \kappa)\|x - y\| = 0, \]
so $x = y$.

The conclusion of the theorem is no longer valid if we only have
\[ \|f(x) - f(y)\| < \|x - y\|, \qquad x, y \in K, \ x \ne y, \]
unless $K$ is compact (see Exercise 4.6).

$^2$ The theorem also holds in any complete metric space $(X, d)$ with the obvious changes.
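The proof of Theorem 4.15 is constructive: letting $k \to \infty$ in the estimate for $\|x_k - x_j\|$ gives the a priori bound $\|x - x_j\| \le \kappa^j (1 - \kappa)^{-1} \|x_1 - x_0\|$. The following Python sketch uses this bound as a stopping rule on the real line. The function name `fixed_point`, its parameters, and the choice $f = \cos$ on $K = [0, 1]$ are illustrative assumptions, and the contraction constant is supplied by the caller rather than checked.

```python
import math

def fixed_point(f, x0, kappa, tol=1e-12, max_iter=10_000):
    """Iterate x_{n+1} = f(x_n) until the a priori bound from the proof is below tol.

    kappa is a contraction constant for f supplied by the caller (an assumption:
    the code does not verify it). The bound used is
        |x - x_j| <= kappa**j / (1 - kappa) * |x_1 - x_0|.
    Returns the iterate x_j, the index j, and the value of the bound.
    """
    if not 0 <= kappa < 1:
        raise ValueError("kappa must satisfy 0 <= kappa < 1")
    x = x0
    first_step = abs(f(x0) - x0)
    for j in range(max_iter):
        bound = kappa ** j / (1 - kappa) * first_step
        if bound < tol:
            return x, j, bound
        x = f(x)
    raise RuntimeError("tolerance not reached within max_iter iterations")

# Illustrative choice: cos maps K = [0, 1] into itself and |cos'| <= sin(1) < 1 there.
x_star, steps, bound = fixed_point(math.cos, 1.0, kappa=math.sin(1.0))
print(x_star, steps, bound)
```

The bound is usually pessimistic: it depends only on $\kappa$ and the first step, so the iterates are typically closer to the fixed point than the stopping rule guarantees.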


Exercises

4.1 Show that in a normed space $(X, \|\cdot\|)$ (i) any Cauchy sequence is bounded; and (ii) if $(x_n)$ is Cauchy and $x_{n_k} \to x$ for some subsequence $(x_{n_k})$, then $x_n \to x$.

4.2 Show that $\ell^\infty(\mathbb{K})$ is complete.

4.3 Show, using Lemma 4.6, that $\mathbb{K}^n$ is complete.

4.4 Show that $c_0(\mathbb{K})$ is a closed subspace of $\ell^\infty(\mathbb{K})$, and deduce using Lemma 4.3 that $c_0(\mathbb{K})$ is complete.

4.5 Show that if $X$ is a normed space and $U$ is a closed linear subspace of $X$, then
\[ \|[x]\|_{X/U} = \inf_{u \in U} \|x + u\|_X \]
defines a norm on the quotient space $X/U$ introduced in Exercise 1.6. Show that if $X$ is complete, then $X/U$ is complete with this norm. (Use Lemma 4.14 for the completeness.)

4.6 Show that the conclusion of the Contraction Mapping Theorem (Theorem 4.15) need not hold if
\[ \|f(x) - f(y)\|_X < \|x - y\|_X, \qquad x, y \in K, \ x \ne y, \]
but that the result does still hold under this weakened condition if $K$ is compact.

4.7 Show that if $f \in C(\mathbb{R}; \mathbb{R})$ and $x \in C([0, T]; \mathbb{R})$, then
\[ \dot{x} = f(x) \quad \text{with} \quad x(0) = x_0, \qquad \text{for all } t \in [0, T] \tag{4.10} \]
if and only if
\[ x(t) = x_0 + \int_0^t f(x(s))\,ds \qquad \text{for all } t \in [0, T]. \tag{4.11} \]

4.8 Suppose that $f : \mathbb{R} \to \mathbb{R}$ is Lipschitz continuous, i.e. satisfies
\[ |f(x) - f(y)| \le L|x - y|, \qquad x, y \in \mathbb{R}, \]
for some $L > 0$. Use the Contraction Mapping Theorem in the space $C([0, T]; \mathbb{R})$ on the mapping
\[ (Jx)(t) = x_0 + \int_0^t f(x(s))\,ds \]
to show that (4.10) has a unique solution on any time interval $[0, T]$ with $LT < 1$. By considering the solution to (4.10) starting at $x(T)$ deduce that in fact (4.10) has a unique solution that exists for all $t \ge 0$.

4.9 Show that the space $C(\mathbb{R}; \mathbb{R})$ is complete when equipped with the metric
\[ d(f, g) = \sum_{n=1}^{\infty} \frac{1}{2^n} \, \frac{\|f - g\|_{[-n,n]}}{1 + \|f - g\|_{[-n,n]}}, \]
where $\|f\|_{[-n,n]} := \max_{x \in [-n,n]} |f(x)|$. Show that this metric does not correspond to a norm. (Goffman and Pedrick, 1983)
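As a numerical companion to Exercise 4.8 (not a solution of it), the following Python sketch applies the map $J$ repeatedly on a uniform grid, replacing the integral by the trapezoidal rule. The name `picard_iterates`, the grid size, the number of iterations, and the test problem $f(x) = -x$ with $T = 1/2$ are all assumptions made for illustration; the discretised map only approximates $J$.

```python
import math

def picard_iterates(f, x0, T, steps=200, iterations=40):
    """Apply (Jx)(t) = x0 + int_0^t f(x(s)) ds repeatedly on a uniform grid.

    The integral is replaced by the composite trapezoidal rule, so this is a
    discretised stand-in for the map J of Exercise 4.8, not J itself.
    Returns the grid values of the final iterate.
    """
    h = T / steps
    x = [x0] * (steps + 1)          # start from the constant function x(t) = x0
    for _ in range(iterations):
        fx = [f(v) for v in x]
        new_x = [x0]
        integral = 0.0
        for i in range(1, steps + 1):
            integral += 0.5 * h * (fx[i - 1] + fx[i])   # trapezoid on [t_{i-1}, t_i]
            new_x.append(x0 + integral)
        x = new_x
    return x

# Illustrative data (not from the text): f(x) = -x has Lipschitz constant L = 1,
# and T = 0.5 gives L*T < 1; the exact solution of (4.10) is x(t) = x0 * exp(-t).
approx = picard_iterates(lambda v: -v, x0=1.0, T=0.5)
print(approx[-1], math.exp(-0.5))
```

The two printed numbers agree to several decimal places; the remaining difference comes from the trapezoidal rule rather than from the fixed-point iteration.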

5 Finite-Dimensional Normed Spaces

In this chapter we will (briefly) investigate finite-dimensional normed spaces. We show that in a finite-dimensional space all norms are equivalent, and that being compact is the same as being closed and bounded. We also show that a normed space is finite-dimensional if and only if its closed unit ball is compact; this proof relies on Riesz’s Lemma, which will prove extremely useful later (in Chapter 24) when we discuss the spectral properties of compact operators.

5.1 Equivalence of Norms on Finite-Dimensional Spaces

First we show that all norms are equivalent on finite-dimensional spaces. We make use of the norm defined in Lemma 3.7.

Theorem 5.1 If $V$ is a finite-dimensional space, then all norms on $V$ are equivalent.

(To prove that all norms on $\mathbb{K}^n$ are equivalent is a little easier, since we can just use the standard norm on $\mathbb{K}^n$ rather than having to construct the norm $\|\cdot\|_E$ for $V$.)

Proof Let $E := (e_j)_{j=1}^{n}$ be a basis for $V$, and let
\[ \|x\|_E := \Big( \sum_{j=1}^{n} |\alpha_j|^2 \Big)^{1/2} \quad \text{when} \quad x = \sum_{j=1}^{n} \alpha_j e_j \]
be the norm on $V$ defined in Lemma 3.7. Recall for use later that we showed in Lemma 3.22 that $(\mathbb{K}^n, \|\cdot\|_{\ell^2})$ is isometrically isomorphic to $(V, \|\cdot\|_E)$ via the mapping $\Phi : \mathbb{K}^n \to V$ given by
\[ \Phi(\alpha_1, \ldots, \alpha_n) = \sum_{j=1}^{n} \alpha_j e_j. \tag{5.1} \]

Let $\|\cdot\|$ be any norm on $V$. We show that $\|\cdot\|$ and $\|\cdot\|_E$ are equivalent, which is sufficient to prove the result since equivalence of norms is an equivalence relation (Lemma 3.17). First, using the triangle inequality and the Cauchy–Schwarz inequality from Exercise 2.1, when $x = \sum_{j=1}^{n} \alpha_j e_j$ we have
\[ \|x\| = \Big\| \sum_{j=1}^{n} \alpha_j e_j \Big\| \le \sum_{j=1}^{n} |\alpha_j| \|e_j\| \le \Big( \sum_{j=1}^{n} \|e_j\|^2 \Big)^{1/2} \Big( \sum_{j=1}^{n} |\alpha_j|^2 \Big)^{1/2} = C \|x\|_E, \tag{5.2} \]
where $C := \big( \sum_{j=1}^{n} \|e_j\|^2 \big)^{1/2}$ is a constant that does not depend on $x$. This estimate implies that
\[ \big| \|x\| - \|y\| \big| \le \|x - y\| \le C \|x - y\|_E, \]
and so the map $x \mapsto \|x\|$ is continuous from $(V, \|\cdot\|_E)$ into $\mathbb{R}$. Now, the set $\{u \in \mathbb{K}^n \text{ with } \|u\|_{\ell^2} = 1\}$ is a closed and bounded subset of $\mathbb{K}^n$, so compact (Theorem 2.27); therefore
\[ S := \{x \in V : \|x\|_E = 1\} = \{\Phi(u) : u \in \mathbb{K}^n \text{ with } \|u\|_{\ell^2} = 1\} \]
is compact, since it is the continuous image of a compact set (see Theorem 2.28). Proposition 2.29 therefore guarantees that the map $x \mapsto \|x\|$ from $S$ into $\mathbb{R}$ is bounded below and attains its lower bound on $S$: there exists $\alpha \in \mathbb{R}$ such that $\|x\| \ge \alpha$ for every $x \in S$, and $\|x\| = \alpha$ for some $x \in S$. This second fact means that $\alpha > 0$, since otherwise $x = 0$ (since $\|\cdot\|$ is a norm), and this is impossible since $\|x\|_E = 1$. It follows that for every non-zero $x \in V$
\[ \|x\| = \|x\|_E \, \Big\| \frac{x}{\|x\|_E} \Big\| \ge \alpha \|x\|_E. \]
Combining this with the inequality $\|x\| \le C \|x\|_E$ we proved above in (5.2) shows that $\|\cdot\|$ and $\|\cdot\|_E$ are equivalent.

Using this result we can immediately deduce two important consequences.

Theorem 5.2 Any finite-dimensional normed space $(V, \|\cdot\|)$ is complete.


Proof The complete space $(\mathbb{K}^n, \|\cdot\|_{\ell^2})$ is isometrically isomorphic to the space $(V, \|\cdot\|_E)$ so $(V, \|\cdot\|_E)$ is complete by Lemma 4.4. Since $\|\cdot\|_E$ is equivalent to $\|\cdot\|$, it follows from Corollary 4.5 that $(V, \|\cdot\|)$ is complete.

Theorem 5.3 A subset of a finite-dimensional normed space is compact if and only if it is closed and bounded.

Proof We showed in Lemma 2.24 that any compact subset of a metric space must be closed and bounded, and any normed space can be considered as a metric space with metric $d(x, y) = \|x - y\|$.

Suppose that $K$ is a closed, bounded subset of $(V, \|\cdot\|)$. Then it is also a closed bounded subset of $(V, \|\cdot\|_E)$, since closure and boundedness are preserved under equivalent norms. If we can show that $K$ is a compact subset of $(V, \|\cdot\|_E)$, then it is also a compact subset of $(V, \|\cdot\|)$, since (5.2) shows that the identity map from $(V, \|\cdot\|_E)$ into $(V, \|\cdot\|)$ is continuous. To show that $K$ is a compact subset of $(V, \|\cdot\|_E)$ we once again use the map $\Phi : \mathbb{K}^n \to V$ from (5.1) that is an isometric isomorphism between $(\mathbb{K}^n, \|\cdot\|_{\ell^2})$ and $(V, \|\cdot\|_E)$, and observe that $\Phi^{-1}(K)$ is closed (since it is the preimage of a closed set under the continuous map $\Phi$) and bounded (since $\Phi$ is an isometry). It follows that $\Phi^{-1}(K)$ is compact, since it is a closed bounded subset of $\mathbb{K}^n$ (see Theorem 2.27). Since $\Phi$ is continuous, it follows that $K = \Phi(\Phi^{-1}(K))$ is the continuous image of a compact set, and so compact.
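The two constants in the proof of Theorem 5.1 can be computed in small cases. The following Python sketch does this for an assumed example that is not from the text: $V = \mathbb{R}^2$ with basis $e_1 = (1, 0)$, $e_2 = (1, 1)$ and the $\ell^1$ norm playing the role of $\|\cdot\|$. The upper constant $C$ comes from (5.2); the lower constant $\alpha$ is the minimum of $\|x\|$ over the set $S$, estimated here by sampling angles, so it is an approximation rather than a certified bound.

```python
import math

# Assumed example data: V = R^2 with basis E = (e1, e2), and the l^1 norm as "the" norm.
E = [(1.0, 0.0), (1.0, 1.0)]

def norm(v):
    """The norm under study on V = R^2 (here the l^1 norm, an arbitrary choice)."""
    return abs(v[0]) + abs(v[1])

def combine(coeffs):
    """Map coefficients (a1, a2) to the vector a1*e1 + a2*e2, as in (5.1)."""
    return tuple(sum(a * e[i] for a, e in zip(coeffs, E)) for i in range(2))

def norm_E(coeffs):
    """||x||_E: the Euclidean norm of the coefficients of x in the basis E."""
    return math.sqrt(sum(a * a for a in coeffs))

# Upper constant from (5.2): C = (sum_j ||e_j||^2)^(1/2).
C = math.sqrt(sum(norm(e) ** 2 for e in E))

# Lower constant: minimum of ||x|| over S = {||x||_E = 1}, estimated on a grid of angles.
N = 3600
alpha = min(
    norm(combine((math.cos(2 * math.pi * k / N), math.sin(2 * math.pi * k / N))))
    for k in range(N)
)

print(C, alpha)
```

For this choice the minimum over $S$ is attained at the coefficients $(-1, 1)/\sqrt{2}$, which lie on the sampling grid, so the estimate of $\alpha$ is accurate here; for a general norm a finer grid or an exact argument would be needed.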

5.2 Compactness of the Closed Unit Ball

We end this chapter by showing that a normed space is finite-dimensional if and only if its closed unit ball is compact. This means that Theorem 5.3 cannot hold in an infinite-dimensional space.

Lemma 5.4 (Riesz’s Lemma) Let $(X, \|\cdot\|)$ be a normed space and $Y$ a proper$^1$ closed subspace of $X$. Then there exists $x \in X$ with $\|x\| = 1$ such that $\|x - y\| \ge 1/2$ for every $y \in Y$.

$^1$ A proper subspace of $X$ is a subspace that is not equal to $X$.

Proof Choose $x_0 \in X \setminus Y$ and set
\[ d = \mathrm{dist}(x_0, Y) := \inf_{y \in Y} \|x_0 - y\|. \]
Since $Y$ is closed, $d > 0$. Indeed, if $\mathrm{dist}(x_0, Y) = 0$, then there exists a sequence $(y_n) \in Y$ such that $\|y_n - x_0\| \to 0$, i.e. such that $y_n \to x_0$, and then, since $Y$ is closed, we would have $x_0 \in Y$. Now choose $y_0 \in Y$ such that $d \le \|x_0 - y_0\| \le 2d$ and set
\[ x = \frac{x_0 - y_0}{\|x_0 - y_0\|}. \]
Clearly $\|x\| = 1$, and for any $y \in Y$ we have
\[ \|x - y\| = \Big\| \frac{x_0 - y_0}{\|x_0 - y_0\|} - y \Big\| = \frac{1}{\|x_0 - y_0\|} \Big\| x_0 - \big( y_0 + y \|x_0 - y_0\| \big) \Big\|. \]
Since $y_0 + y \|x_0 - y_0\| \in Y$, it follows that
\[ \|x - y\| \ge \frac{d}{\|x_0 - y_0\|} \ge \frac{d}{2d} = \frac{1}{2}. \]

Using this we can prove our promised result characterising finite-dimensional spaces.

Theorem 5.5 A normed space $X$ is finite-dimensional if and only if its closed unit ball is compact.

Proof If $X$ is finite-dimensional, then its closed unit ball is compact, by Theorem 5.3. So we suppose that $X$ is infinite-dimensional and show that its closed unit ball is not compact. Take any $x_1 \in X$ with $\|x_1\| = 1$. Then the linear span of $\{x_1\}$ is a proper closed linear subspace of $X$, so by Lemma 5.4 there exists $x_2 \in X$ with $\|x_2\| = 1$ and $\|x_2 - x_1\| \ge 1/2$. Now $\mathrm{Span}\{x_1, x_2\}$ is a proper closed linear subspace of $X$, so there exists $x_3 \in X$ with $\|x_3\| = 1$ and $\|x_3 - x_j\| \ge 1/2$ for $j = 1, 2$. One can continue inductively to obtain a sequence $(x_n)$ with $\|x_n\| = 1$ and $\|x_i - x_j\| \ge 1/2$ whenever $i \ne j$. No subsequence of the $(x_n)$ can be Cauchy, so no subsequence can converge, from which it follows that the closed unit ball in $X$ is not compact.

We can see this non-compactness easily in the $\ell^p$ sequence spaces, which we have already shown are infinite-dimensional (Example 1.16). Whatever value of $p$ we choose the elements $(e^{(j)})_{j=1}^{\infty}$ from (1.3) all have $\|e^{(j)}\|_{\ell^p} = 1$ so form a sequence in the closed unit ball of $\ell^p$. However, if $i \ne j$, then
\[ \|e^{(i)} - e^{(j)}\|_{\ell^p} = 2^{1/p}; \]
no subsequence of $(e^{(j)})$ can be Cauchy, so the closed unit ball is not compact in any of these spaces.
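The distance computation for the elements $e^{(j)}$ is easy to check directly. The following Python sketch does so for the first few of them; the helper names are hypothetical, indices are 0-based in the code, and truncating the sequences loses nothing because each $e^{(j)}$ has a single non-zero entry.

```python
import itertools

def lp_norm(x, p):
    """l^p norm of a finitely supported sequence; p = float('inf') gives the sup norm."""
    if p == float("inf"):
        return max(abs(t) for t in x)
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

def unit_vector(j, length):
    """First `length` entries of e^(j): 1 in position j (0-based here), 0 elsewhere."""
    return [1.0 if i == j else 0.0 for i in range(length)]

def pairwise_distances(p, count=6):
    """Distances ||e^(i) - e^(j)||_p for i < j among the first `count` unit vectors."""
    vecs = [unit_vector(j, count) for j in range(count)]
    return [
        lp_norm([a - b for a, b in zip(u, v)], p)
        for u, v in itertools.combinations(vecs, 2)
    ]

for p in (1, 2, 3, float("inf")):
    # Each line shows p and the set of distinct distances, which has one element.
    print(p, set(round(d, 12) for d in pairwise_distances(p)))
```

For $p = \infty$ the common distance is $1$, which matches $2^{1/p}$ under the usual convention $1/\infty = 0$.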

Exercises

5.1 Show that if $K$ is a non-empty open subset of $\mathbb{R}^n$ that is (i) convex, (ii) symmetric ($x \in K$ implies that $-x \in K$), and (iii) bounded, then
\[ N(x) := \inf\{M > 0 : M^{-1} x \in K\} \]
defines a norm on $\mathbb{R}^n$ (‘$K$ is the unit ball for $N$’).

5.2 Show that if $(V, \|\cdot\|_V)$ is a normed space, $W$ a vector space, and $T : W \to V$ a linear bijection, then
\[ \|x\|_W := \|Tx\|_V \]
defines a norm on $W$, and $T : (W, \|\cdot\|_W) \to (V, \|\cdot\|_V)$ is an isometry. (So $(W, \|\cdot\|_W)$ and $(V, \|\cdot\|_V)$ are isometrically isomorphic.)

5.3 Show that any finite-dimensional subspace of a Banach space is closed.

5.4 Show that if $Y$ is a finite-dimensional subspace of a normed space $X$ and $x \in X \setminus Y$, then there exists $y \in Y$ with $\|x - y\| = \mathrm{dist}(x, Y)$.

5.5 Show that if $Y$ is a subspace of a normed space $X$ and $x \in X \setminus Y$, then
\[ \mathrm{dist}(\alpha x, Y) = |\alpha| \, \mathrm{dist}(x, Y) \quad \text{for any } \alpha \in \mathbb{K}, \]
and
\[ \mathrm{dist}(x + w, Y) = \mathrm{dist}(x, Y) \quad \text{for any } w \in Y. \]

5.6 Suppose that $Y$ is a proper finite-dimensional subspace of a normed space $X$. Show that for any $y \in Y$ and $r > 0$ there exists $x \in X$ such that $\|x - y\| = \mathrm{dist}(x, Y) = r$. (Megginson, 1998)

5.7 Use the result of the previous exercise to show that no infinite-dimensional Banach space can have a countable Hamel basis. (Megginson, 1998) [Hint: given a Hamel basis $\{e_j\}_{j=1}^{\infty}$ for $X$, let $X_n = \mathrm{Span}(e_1, \ldots, e_n)$. Now find a sequence $(y_n)$ with $y_n \in X_n$ such that $\|y_n - y_{n-1}\| = \mathrm{dist}(y_n, X_{n-1}) = 3^{-n}$ and show that $(y_n)$ is Cauchy in $X$ but its limit cannot lie in any of the $X_n$.] For another, simpler, proof using the Baire Category Theorem, see Exercise 22.1.

6 Spaces of Continuous Functions

We showed in Corollary 4.11 that the space $C(X; \mathbb{K})$ of continuous functions from a compact metric space $X$ into $\mathbb{K}$ is complete with the supremum norm
\[ \|f\|_\infty = \sup_{x \in X} |f(x)|. \]

In this chapter we prove some key results about such spaces of continuous functions. First we show that continuous functions on an interval can be uniformly approximated by polynomials (the ‘Weierstrass Approximation Theorem’), which has interesting applications to Fourier series. Then we prove the Stone–Weierstrass Theorem, which generalises this to continuous functions on compact metric spaces and other collections of approximating functions. We end with a proof of the Arzelà–Ascoli Theorem, which characterises compact subsets of C(X ; K).

6.1 The Weierstrass Approximation Theorem

In this section we show that any continuous real-valued function on $[0, 1]$ can be uniformly approximated by polynomials. As a consequence we prove the same result for continuous functions on $[a, b]$ and deduce that $C([a, b]; \mathbb{R})$ is separable. We also consider the approximation of continuous functions by Fourier sine and cosine series. We will prove a more general version of this result, the Stone–Weierstrass Theorem, later in this chapter.

Theorem 6.1 (Weierstrass Approximation Theorem) If $f \in C([0, 1])$, then the sequence of polynomials$^1$
\[ P_n(x) = \sum_{k=0}^{n} \binom{n}{k} f\Big(\frac{k}{n}\Big) x^k (1 - x)^{n-k} \tag{6.1} \]
converges uniformly to $f(x)$ on $[0, 1]$.

$^1$ As is standard, we define $n! = 1 \cdot 2 \cdots n$ and $\binom{n}{k} = \frac{n!}{(n-k)!\,k!}$.

(The polynomials in (6.1) are known as the Bernstein polynomials.) For the proof we will need the following result.

Lemma 6.2 If we define
\[ r_k(x) := \binom{n}{k} x^k (1 - x)^{n-k}, \]
then
\[ \sum_{k=0}^{n} r_k(x) = 1 \quad \text{and} \quad \sum_{k=0}^{n} (k - nx)^2 r_k(x) = nx(1 - x). \]

Proof We start with the binomial identity
\[ (x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k}. \tag{6.2} \]
Differentiate with respect to $x$ and multiply by $x$ to give
\[ nx(x + y)^{n-1} = \sum_{k=0}^{n} k \binom{n}{k} x^k y^{n-k}; \]
differentiate twice with respect to $x$ and multiply by $x^2$ to give
\[ n(n - 1)x^2 (x + y)^{n-2} = \sum_{k=0}^{n} k(k - 1) \binom{n}{k} x^k y^{n-k}. \]
It follows, since $r_k(x)$ is the $k$th term on the right-hand side of (6.2) when we set $y = 1 - x$, that
\[ \sum_{k=0}^{n} r_k(x) = 1, \qquad \sum_{k=0}^{n} k\, r_k(x) = nx, \qquad \text{and} \qquad \sum_{k=0}^{n} k(k - 1)\, r_k(x) = n(n - 1)x^2. \]

Therefore
\[
\begin{aligned}
\sum_{k=0}^{n} (k - nx)^2 r_k(x) &= n^2 x^2 \sum_{k=0}^{n} r_k(x) - 2nx \sum_{k=0}^{n} k\, r_k(x) + \sum_{k=0}^{n} k^2 r_k(x) \\
&= n^2 x^2 - 2nx \cdot nx + \big( nx + n(n - 1)x^2 \big) \\
&= nx(1 - x).
\end{aligned}
\]

Using this we can now prove the Weierstrass Approximation Theorem.

Proof of Theorem 6.1 Since $f$ is continuous on the compact set $[0, 1]$, it is bounded with $|f(x)| \le M$ for some $M > 0$ (Proposition 2.29). It also follows (see Lemma 2.30) that $f$ is uniformly continuous on $[0, 1]$, so for any $\varepsilon > 0$ there exists a $\delta > 0$ such that
\[ |x - y| < \delta \quad \Rightarrow \quad |f(x) - f(y)| < \varepsilon. \]
Noting that we can write
\[ P_n(x) = \sum_{k=0}^{n} f\Big(\frac{k}{n}\Big) r_k(x) \]
and using the fact that $\sum_{k=0}^{n} r_k(x) = 1$ we have
\[ \Big| f(x) - \sum_{k=0}^{n} f\Big(\frac{k}{n}\Big) r_k(x) \Big| = \Big| \sum_{k=0}^{n} \Big[ f(x) - f\Big(\frac{k}{n}\Big) \Big] r_k(x) \Big|. \]
This expression is bounded by
\[ \Big| \sum_{\substack{k=0,\ldots,n \\ |(k/n) - x| \le \delta}} \Big[ f(x) - f\Big(\frac{k}{n}\Big) \Big] r_k(x) \Big| + \Big| \sum_{\substack{k=0,\ldots,n \\ |(k/n) - x| > \delta}} \Big[ f(x) - f\Big(\frac{k}{n}\Big) \Big] r_k(x) \Big|; \]
writing this as $R_1 + R_2$ we have
\[ |R_1| \le \sum_{\substack{k=0,\ldots,n \\ |(k/n) - x| \le \delta}} \Big| f(x) - f\Big(\frac{k}{n}\Big) \Big| r_k(x) \le \varepsilon \sum_{k=0}^{n} r_k(x) = \varepsilon, \]
and for $R_2$ we use Lemma 6.2 to obtain
\[
\begin{aligned}
|R_2| &\le \sum_{\substack{k=0,\ldots,n \\ |(k/n) - x| > \delta}} \Big| f(x) - f\Big(\frac{k}{n}\Big) \Big| r_k(x) \le 2M \sum_{\substack{k=0,\ldots,n \\ |(k/n) - x| > \delta}} r_k(x) \\
&\le \frac{2M}{n^2 \delta^2} \sum_{k=0}^{n} (k - nx)^2 r_k(x) = \frac{2M x(1 - x)}{n \delta^2} \le \frac{2M}{n \delta^2},
\end{aligned}
\]
since $x \in [0, 1]$, and this tends to zero as $n \to \infty$.

Note that for a fixed $k$ the coefficient of $x^k$ in $P_n(x)$ depends on $n$. For example, the first approximations of $f(x) = 1/2 - |x - 1/2|$ are
\[ P_1(x) = 0, \qquad P_2(x) = P_3(x) = x - x^2, \qquad P_4(x) = P_5(x) = x - 2x^3 + x^4, \]
and $P_6(x) = x - 5x^4 + 6x^5 - 2x^6$; see Figure 6.1.

Figure 6.1 Approximation of $f(x) = \tfrac12 - |x - \tfrac12|$ by Bernstein polynomials: shown are $P_2$, $P_4$, $P_6$, and the function $f$.
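The Bernstein polynomials listed above can be reproduced with exact rational arithmetic. The following Python sketch expands (6.1) into monomial coefficients; the function names `bernstein_coefficients`, `evaluate`, and `tent` are hypothetical, and the approach (direct binomial expansion of $x^k(1-x)^{n-k}$) is one simple choice among several.

```python
from fractions import Fraction
from math import comb

def bernstein_coefficients(f, n):
    """Coefficients [c_0, ..., c_n] of P_n(x) = sum_k C(n,k) f(k/n) x^k (1-x)^(n-k).

    f must accept and return Fractions so that the arithmetic stays exact.
    """
    coeffs = [Fraction(0)] * (n + 1)
    for k in range(n + 1):
        weight = comb(n, k) * f(Fraction(k, n))
        # Expand x^k (1-x)^(n-k) = sum_i C(n-k, i) (-1)^i x^(k+i).
        for i in range(n - k + 1):
            coeffs[k + i] += weight * comb(n - k, i) * (-1) ** i
    return coeffs

def evaluate(coeffs, x):
    """Evaluate the polynomial with the given coefficients by Horner's rule."""
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result

def tent(x):
    """The example function f(x) = 1/2 - |x - 1/2| from the text."""
    return Fraction(1, 2) - abs(x - Fraction(1, 2))

for n in (2, 4, 6):
    # Prints the coefficients of 1, x, x^2, ... for P_2, P_4 and P_6.
    print(n, [str(c) for c in bernstein_coefficients(tent, n)])
```

The convergence is slow near the corner of $f$ at $x = 1/2$: the error $f(1/2) - P_n(1/2)$ decreases as $n$ grows but is still visible for moderate $n$, as Figure 6.1 suggests.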

Corollary 6.3 Any $f \in C([a, b]; \mathbb{R})$ can be uniformly approximated by polynomials.

Proof The function $g(x) := f(a + (b - a)x)$ is a continuous function on $[0, 1]$. If it is approximated within $\varepsilon$ on $[0, 1]$ by a degree $n$ polynomial $q(x)$, then $p(x) := q((x - a)/(b - a))$ is a degree $n$ polynomial that approximates $f$ on $[a, b]$ to within $\varepsilon$.

An immediate consequence of this result is that $C([a, b]; \mathbb{R})$ is separable.

Corollary 6.4 The space $C([a, b]; \mathbb{R})$ (equipped with the supremum norm) is separable.

Proof The previous corollary guarantees that the linear span of the countable set $\{x^k\}_{k=0}^{\infty}$ is dense in $C([a, b]; \mathbb{R})$; the result now follows from Lemma 3.23.

We now deduce from Theorem 6.1 that we can approximate any continuous function by a Fourier cosine series. The proofs of the following two corollaries follow ideas in Renardy and Rogers (1993).

Corollary 6.5 Every continuous function on $[0, 1]$ can be uniformly approximated arbitrarily closely by an expression of the form
\[ \sum_{k=0}^{n} c_k \cos(k\pi x) \]
for some $n \in \mathbb{N}$ and $\{c_k\}_{k=0}^{n} \in \mathbb{R}$.

Proof Given $f \in C([0, 1])$, consider the function $g : [-1, 1] \to \mathbb{R}$ defined by setting $g(\cos \pi x) = f(x)$. Since the resulting function $g$ is continuous, given any $\varepsilon > 0$ we can use the Weierstrass Approximation Theorem (in the form of Corollary 6.3) to find $n$ and $\{a_k\}_{k=0}^{n}$ such that
\[ \sup_{u \in [-1, 1]} \Big| g(u) - \sum_{k=0}^{n} a_k u^k \Big| < \varepsilon. \]
It follows that
\[ \sup_{x \in [0, 1]} \Big| f(x) - \sum_{k=0}^{n} a_k (\cos \pi x)^k \Big| < \varepsilon, \]
and since elementary trigonometric identities can be used to show that $(\cos \pi x)^n$ is given by an expression of the form $\sum_{j=0}^{n} b_j \cos(j\pi x)$ the result follows as stated. (We will give a more elegant (but less ‘elementary’) proof of this result in the next section as a consequence of the Stone–Weierstrass Theorem.)
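The ‘elementary trigonometric identities’ used in this proof can be made explicit. One standard route, assumed here rather than taken from the text, writes $\cos t = (e^{it} + e^{-it})/2$ and applies the binomial theorem, giving $\cos^n t = 2^{-n} \sum_{k=0}^{n} \binom{n}{k} \cos((n - 2k)t)$. The following Python sketch computes the resulting coefficients $b_j$ exactly; the function names are hypothetical.

```python
import math
from fractions import Fraction

def cosine_power_coefficients(n):
    """Return [b_0, ..., b_n] with cos(t)^n = sum_j b_j cos(j t).

    Uses cos(t)^n = 2^{-n} sum_k C(n,k) cos((n - 2k) t) and folds the
    negative frequencies onto positive ones because cosine is even.
    """
    b = [Fraction(0)] * (n + 1)
    for k in range(n + 1):
        b[abs(n - 2 * k)] += Fraction(math.comb(n, k), 2 ** n)
    return b

def cosine_sum(b, t):
    """Evaluate sum_j b_j cos(j t) in floating point."""
    return sum(float(c) * math.cos(j * t) for j, c in enumerate(b))

print(cosine_power_coefficients(2))   # coefficients b_0, b_1, b_2 for cos^2
print(cosine_power_coefficients(3))   # coefficients b_0, ..., b_3 for cos^3
```

Setting $t = \pi x$ turns each power $(\cos \pi x)^k$ in the approximating polynomial into a finite cosine sum, which is exactly the form required in Corollary 6.5.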

With the restriction that $f(0) = f(1) = 0$ we can also approximate continuous functions on $[0, 1]$ using a sine series.

Corollary 6.6 Every $f \in C([0, 1])$ with $f(0) = f(1) = 0$ can be uniformly approximated arbitrarily closely by an expression of the form
\[ \sum_{k=1}^{n} b_k \sin(k\pi x) \tag{6.3} \]

for some $n \in \mathbb{N}$ and some $\{b_k\}_{k=1}^{n} \in \mathbb{R}$.

Note that if we are to approximate $f$ by a series like (6.3) we must have $f(0) = f(1) = 0$, since any expression of this form is zero at $x = 0$ and $x = 1$.

Proof First note that given such a function $f$ and $\varepsilon > 0$ we can find a function $g \in C([0, 1])$ such that $\|f - g\|_\infty < \varepsilon/2$ and $g = 0$ in a neighbourhood of $x = 0$ and $x = 1$: use the continuity of $f$ at $x = 0$ and $x = 1$ to find $\delta > 0$ such that $|f(x)| < \varepsilon$ for all $0 \le x < \delta$ and all $1 - \delta < x \le 1$ and set $g(x) = 0$ for $0 \le x$ […]

6.2 The Stone–Weierstrass Theorem

[…] $> 0$. For each pair $x, y \in X$ there exists $f_{x,y} \in A$ such that
\[ f_{x,y}(x) = f(x) \quad \text{and} \quad f_{x,y}(y) = f(y). \]
If $x \ne y$ we use the construction in the first paragraph of the proof, while if $x = y$ we can just take $f_{x,y}(z) = f(x)$ for every $z \in X$. For each fixed $x \in X$ the set
\[ U_{x,y} = \{\xi \in X : f_{x,y}(\xi) < f(\xi) + \varepsilon\} \tag{6.5} \]
is open and contains $y$, since $f_{x,y}(y) = f(y)$, both $f$ and $f_{x,y}$ are continuous, and $U_{x,y} = (f_{x,y} - f)^{-1}(-\infty, \varepsilon)$. Therefore for each fixed $x$ these sets form an open cover of $X$:
\[ X = \bigcup_{y \in X} U_{x,y}. \]
Since $X$ is compact, this cover has a finite subcover, so there exist $y_1, \ldots, y_n$ such that
\[ X = \bigcup_{j=1}^{n} U_{x,y_j}. \]
Now set
\[ h_x := \min_{j} \big( f_{x,y_j} \big), \]
i.e. $h_x(\xi) = \min_{j=1,\ldots,n} f_{x,y_j}(\xi)$ for each $\xi \in X$. Then

● $h_x \in A$ using Lemma 6.7;
● $h_x(x) = f(x)$ for every $x \in X$ (since $f_{x,y_j}(x) = f(x)$ for every $j$); and
● $h_x < f + \varepsilon$, since for every $z \in X$ we have $z \in U_{x,y_j}$ for some $j$ and then $h_x(z) \le f_{x,y_j}(z) < f(z) + \varepsilon$, using (6.5).

Now for each $x \in X$ let
\[ V_x := \{\xi \in X : f(\xi) - \varepsilon < h_x(\xi)\}; \tag{6.6} \]
then $V_x$ is open and contains $x$, since $h_x(x) = f(x)$, $h_x$ and $f$ are continuous, and $V_x = (f - h_x)^{-1}(-\infty, \varepsilon)$. The collection of all these sets provides an open cover of $X$; since $X$ is compact it follows that there exists a finite collection $x_1, \ldots, x_m$ such that
\[ X = \bigcup_{j=1}^{m} V_{x_j}. \]
Finally, we set
\[ F := \max_{j} h_{x_j}. \]
Then

● $F \in A$ using Lemma 6.7;
● $F < f + \varepsilon$ since $h_{x_j} < f + \varepsilon$ for each $j$; and
● $f - \varepsilon < F$, since for every $z \in X$ we have $z \in V_{x_j}$ for some $j$, and then $f(z) - \varepsilon < h_{x_j}(z) \le F(z)$ from (6.6).

In other words, we have found $F \in A$ such that $f - \varepsilon < F < f + \varepsilon$, i.e. $\|F - f\|_\infty < \varepsilon$. This shows that $f \in \overline{A} = A$.

We now show that this result is indeed a generalisation of the Weierstrass Approximation Theorem (Theorem 6.1). Take $X = [0, 1]$ and let $A$ be the set of all polynomials; this is an algebra, since it is a linear subspace of $C([0, 1])$, $1$ is a polynomial, and the product of two polynomials is another polynomial. Moreover, the polynomial $x$ separates points, so $\overline{A} = C([0, 1]; \mathbb{R})$, i.e. any element of $C([0, 1]; \mathbb{R})$ can be approximated arbitrarily closely by polynomials. (Despite the fact that the proof of Lemma 6.7 using the Weierstrass Approximation Theorem is easier, this argument would be circular if we had proved Lemma 6.7 this way.)

For another proof of the Fourier cosine series result in Corollary 6.5, we can take $X = [0, 1]$ and let
\[ A = \Big\{ \sum_{k=0}^{n} a_k \cos(\pi k x) : a_k \in \mathbb{R}, \ n \ge 0 \Big\}. \]
This is a subalgebra of $C([0, 1])$, since $1 \in A$ and
\[ \cos(\pi k x) \cos(\pi j x) = \frac{1}{2} \big[ \cos(\pi (k + j) x) + \cos(\pi (k - j) x) \big]. \]
Furthermore, it separates points, since if $x, y$ are distinct points in $[0, 1]$, then $\cos(\pi x) \ne \cos(\pi y)$.

We now prove a complex version of the Stone–Weierstrass Theorem. Note that for this we will require the additional assumption that $A$ is closed under complex conjugation: if $f \in A$, then $\overline{f} \in A$. To see that something additional must be required, note that a general continuous complex-valued function cannot be approximated by complex polynomials (expressions of the form $\sum_{j=0}^{n} a_j z^j$ with $a_j \in \mathbb{C}$). For example $\overline{z}$ cannot be approximated by polynomials, since
\[ \int_{|z|=1} \overline{z}\,dz = 2\pi i \quad \text{while} \quad \int_{|z|=1} p(z)\,dz = 0 \]
for any complex polynomial $p$. (The set of all complex polynomials is not closed under conjugation.)

Theorem 6.9 (Complex Stone–Weierstrass Theorem) Suppose that $X$ is compact and $A$ is a subalgebra of $C(X; \mathbb{C})$ that separates points in $X$ and is closed under conjugation, i.e. $f \in A$ implies that $\overline{f} \in A$. Then $\overline{A} = C(X; \mathbb{C})$.

Proof We want to show that $A_{\mathbb{R}}$, the elements of $A$ that are real-valued, satisfy the requirements of the real Stone–Weierstrass Theorem. The algebra property is inherited from $A$ itself; we need to show that $A_{\mathbb{R}}$ still separates points. So suppose that $x, y \in X$ are distinct and $f \in A$ separates them. Then we have either $\mathrm{Re}\, f(x) \ne \mathrm{Re}\, f(y)$ or $\mathrm{Im}\, f(x) \ne \mathrm{Im}\, f(y)$. Since $A$ is closed under conjugation, we have
\[ \mathrm{Re}(f) = \frac{1}{2} \big( f + \overline{f} \big) \in A \quad \text{and} \quad \mathrm{Im}(f) = \frac{1}{2i} \big( f - \overline{f} \big) \in A; \]
these are both elements of $C(X; \mathbb{R})$, and hence of $A_{\mathbb{R}}$. So $A_{\mathbb{R}}$ separates points.


Now since any element $f \in C(X; \mathbb{C})$ can be written as $f_1 + i f_2$, for appropriate $f_1, f_2 \in C(X; \mathbb{R})$, it follows that we can approximate any element of $C(X; \mathbb{C})$ by elements of the form $\varphi + i\psi$ with $\varphi, \psi \in A_{\mathbb{R}}$; and thus by $\varphi + i\psi \in A$.

As one application, if we take $X = S^1 := \{z \in \mathbb{C} : |z| = 1\} \subset \mathbb{C}$ and let
\[ A = \Big\{ \sum_{j=-n}^{n} a_j z^j : a_j \in \mathbb{C} \Big\}, \]
then $1 \in A$, $A$ is closed under conjugation (since $\overline{z} = 1/z$ if $|z| = 1$), and $z \in A$ so $A$ separates points. It follows that $\overline{A} = C(S^1; \mathbb{C})$.

Corollary 6.10 Any $f \in C([-\pi, \pi])$ for which $f(-\pi) = f(\pi)$ can be approximated uniformly on $[-\pi, \pi]$ by an expression of the form
\[ \sum_{k=-n}^{n} a_k e^{ikx}. \]

Proof Given any $f \in C([-\pi, \pi])$ with $f(-\pi) = f(\pi)$ we can define a continuous function $g : S^1 \to \mathbb{R}$ by setting $g(e^{ix}) = f(x)$ (see Exercise 6.9). This continuous $g$ can be approximated uniformly by expressions of the form
\[ \sum_{j=-n}^{n} a_j z^j; \]
it follows that $f$ can be approximated uniformly by expressions
\[ \sum_{j=-n}^{n} a_j e^{ijx} \tag{6.7} \]
as claimed.

Since the expression in (6.7) is equal to
\[ a_0 + \sum_{k=1}^{n} a_k \cos(kx) + \sum_{k=1}^{n} b_k \sin(kx) \]
for some choice of coefficients, this corollary shows that
\[ A = \Big\{ a_0 + \sum_{k=1}^{n} a_k \cos(kx) + \sum_{k=1}^{n} b_k \sin(kx) : a_k, b_k \in \mathbb{R}, \ n \ge 1 \Big\} \]
is uniformly dense in
\[ \{ f \in C([-\pi, \pi]) : f(-\pi) = f(\pi) \}. \tag{6.8} \]
(Note that although a priori the coefficients $a_k, b_k$ in (6.8) can be complex, if $f$ is real, then we can assume that these coefficients are real too, since for any complex function $p(x)$
\[ |f(x) - p(x)|^2 = |\mathrm{Re}(f(x) - p(x))|^2 + |\mathrm{Im}(f(x) - p(x))|^2 = |f(x) - \mathrm{Re}\, p(x)|^2 + |\mathrm{Im}\, p(x)|^2 \ge |f(x) - \mathrm{Re}\, p(x)|^2; \]
by taking the real part of (6.8) we do not increase the distance to $f$.)

We return to the topic of Fourier series in Lemma 9.16. For much more on this subject see the book by Körner (1989), for example.

6.3 The Arzelà–Ascoli Theorem

The Arzelà–Ascoli Theorem characterises precompact (and therefore compact) subsets of $C(X; \mathbb{K})$, where we say that a set is precompact if its closure is compact. We will use the following general result about precompact sets in the proof.

Lemma 6.11 A subset $A$ of a complete normed space $(X, \|\cdot\|)$ is precompact if and only if any sequence in $A$ has a Cauchy subsequence.

Proof Suppose that $A$ is precompact, and that $(x_n) \in A$. Then since $(x_n) \in \overline{A}$ and $\overline{A}$ is compact, $(x_n)$ has a convergent subsequence, and any convergent sequence is Cauchy.

For the other implication, suppose that any sequence in $A$ has a Cauchy subsequence. Take a sequence $(y_n) \in \overline{A}$; then, using Lemma 2.17, there exist $(x_n) \in A$ such that $\|x_n - y_n\| < 1/n$. By assumption, $(x_n)$ has a Cauchy subsequence $(x_{n_k})$; it follows that $(y_{n_k})$ is Cauchy too, and so, since $X$ is complete, converges to a limit $y$, which is contained in $\overline{A}$ since this set is closed. This shows that $\overline{A}$ is compact.

Theorem 6.12 (Arzelà–Ascoli Theorem: precompact version) If $X$ is a compact metric space then $A \subset C(X; \mathbb{R})$ is precompact if and only if it is bounded (there exists $R > 0$ such that $\|f\|_\infty \le R$ for all $f \in A$) and equicontinuous, that is for each $\varepsilon > 0$ there exists a $\delta > 0$ such that
\[ d_X(x, y) < \delta \quad \Rightarrow \quad |f(x) - f(y)| < \varepsilon \quad \text{for every } f \in A, \ x, y \in X. \]

[Figure 6.2: rows of marks labelled $f_n$, $f_{n_{1,j}}$, $f_{n_{2,j}}$, $f_{n_{3,j}}$, $f_{n_{4,j}}$, $f_{n_{5,j}}$.]

Figure 6.2 The argument used to show that $(f_j)$ has a subsequence $(f_{m,m})$ such that $f_{m,m}(x_i)$ converges for every $i$. The figure gives an idea of the indices used for each subsequence; the circled elements are the ‘diagonal sequence’ $f_{k,k}$.

Proof Using the result of Exercise 2.14 we can find a countable set (xk ) with the following property: given any δ > 0, we can guarantee that there is an M(δ) such that for every x ∈ X , d(x, xk ) < δ for some 1 ≤ k ≤ M(δ). Since ( f j ) is bounded, we can use a ‘diagonal argument’ to find a subsequence (which we relabel) such that f j (xk ) converges for every k. The idea is to repeatedly extract subsequences to ensure that f j (xi ) converges for more and more of the (xk ). Since ( f j ) is bounded, we know that f j (x1 ) is a bounded sequence in K, so we can find a subsequence f 1, j such that f 1, j (x1 ) converges. Now we consider f 1, j (x2 ), which is again a bounded sequence in K, so there is a subsequence of f 1, j , which we will label f 2, j , such that f 2, j (x2 ) converges. Since f 2, j (x1 ) is a subsequence of f 1, j (x1 ), this still converges. We continue in this way, extracting subsequences of subsequences, so that f n, j (xi )

converges for all i = 1, . . . , n.

This almost gives what we need, but not quite: although we can find a subsequence that converges at as many of the (xi ) as we wish, in order to obtain a subsequence that converges at all of the (xi ) simultaneously we use a ‘diagonal trick’ and consider the sequence ( f m,m )∞ m=1 . This is a subsequence of the original sequence ( f j ), and a subsequence of ( f n, j ) once m ≥ n; see Figure 6.2. It follows that f m,m (xi )

converges for all i ∈ N.

Set f m∗ = f m,m . We now show that ( f m∗ ) must be Cauchy in the supremum norm. Given ε > 0, since ( f j ) is equicontinuous there exists a δ > 0 such that d(x, y) < δ for every j.

| f j (x) − f j (y)| < ε/3

6.3 The Arzelà–Ascoli Theorem

85

By our construction of the $(x_i)$ there exists an $M$ (depending on $\delta$) such that for every $x \in X$ there exists an $x_i$ with $1 \le i \le M$ such that $d(x, x_i) < \delta$. Since $f_n^*(x_i)$ converges for every $i$, there exists $N$ such that for $n, m \ge N$ we have

$$|f_n^*(x_i) - f_m^*(x_i)| < \varepsilon/3 \qquad \text{for all } 1 \le i \le M,$$

and we can use the triangle inequality to obtain

$$\begin{aligned} |f_n^*(x) - f_m^*(x)| &= |f_n^*(x) - f_n^*(x_i) + f_n^*(x_i) - f_m^*(x_i) + f_m^*(x_i) - f_m^*(x)| \\ &\le |f_n^*(x) - f_n^*(x_i)| + |f_n^*(x_i) - f_m^*(x_i)| + |f_m^*(x_i) - f_m^*(x)| < \varepsilon, \end{aligned}$$

which shows that $(f_n^*)$ is Cauchy in the supremum norm, and so $A$ is precompact.

To show the converse, first note that if $A$ is precompact, then it is bounded since its closure is compact. It remains to prove that $A$ is equicontinuous. Since $\overline{A}$ is compact, for any $\varepsilon > 0$ there exist $\{f_1, \dots, f_n\}$ such that for every $f \in A$ we have

$$\|f - f_i\|_\infty < \varepsilon/3 \qquad \text{for some } i \in \{1, \dots, n\}.$$

Since the $f_i$ are all continuous functions on the compact set $X$ they are each uniformly continuous (Lemma 2.30), so there exists $\delta > 0$ such that for every $i = 1, \dots, n$

$$d(x, y) < \delta \quad\Rightarrow\quad |f_i(x) - f_i(y)| < \varepsilon/3 \qquad \text{for all } x, y \in X.$$

Then for any $f \in A$ choose $j$ such that $\|f_j - f\|_\infty < \varepsilon/3$; then whenever $d(x, y) < \delta$ we have

$$\begin{aligned} |f(x) - f(y)| &\le |f(x) - f_j(x)| + |f_j(x) - f_j(y)| + |f_j(y) - f(y)| \\ &\le \|f - f_j\|_\infty + |f_j(x) - f_j(y)| + \|f_j - f\|_\infty < \varepsilon, \end{aligned}$$

so $A$ is equicontinuous.

Since a compact set is the closure of a precompact set, the following corollary is immediate.

Corollary 6.13 (Arzelà–Ascoli Theorem: compact version) If $X$ is a compact metric space then $A \subset C(X; \mathbb{K})$ is compact if and only if it is closed, bounded, and equicontinuous.

As an example, for any $L > 0$ consider the collection of all $L$-Lipschitz functions on a compact metric space $X$,

$$\mathrm{Lip}_L(X; \mathbb{K}) := \{ f \in C(X; \mathbb{K}) : |f(x) - f(y)| \le L\, d(x, y) \text{ for all } x, y \in X \}.$$

This set is closed and equicontinuous, so every closed, bounded subset of it (for instance its intersection with a closed ball of $C(X; \mathbb{K})$) is a compact subset of $C(X; \mathbb{K})$. Boundedness has to be imposed separately, since $\mathrm{Lip}_L(X; \mathbb{K})$ contains every constant function. More generally, if $\omega : [0, \infty) \to [0, \infty)$ is any ‘modulus of continuity’, i.e. a non-decreasing function that satisfies

$$\lim_{r \to 0} \omega(r) = \omega(0) = 0,$$

then the same is true of $\{ f \in C(X; \mathbb{K}) : |f(x) - f(y)| \le \omega(d(x, y)) \text{ for all } x, y \in X \}$.

Exercises

6.1 Using the upper and lower bounds on the factorial

$$\sqrt{2\pi}\, n^{n+1/2} e^{-n} \le n! \le e\, n^{n+1/2} e^{-n}$$

(see e.g. Holland, 2016) show that

$$\binom{2k}{k} \le c\, \frac{2^{2k}}{\sqrt{k}},$$

where $c$ is a constant that does not depend on $k$.

6.2 Show that the Taylor expansion of $\sqrt{1 - x}$ about $x = 0$ is

$$\sqrt{1 - x} = 1 - \sum_{k=0}^{\infty} \frac{2}{k+1} \binom{2k}{k} \frac{x^{k+1}}{4^{k+1}}$$

and that this converges for all $x \in [0, 1)$. (Use Taylor’s Theorem in the form

$$f(x) = \sum_{k=0}^{n} \frac{f^{(k)}(0)}{k!} x^k + \frac{f^{(n+1)}(c)}{n!}\, x (x - c)^n, \qquad x > 0,$$

for some $c \in (0, x)$, where $f^{(k)}(x)$ is the $k$th derivative of $f$ at $x$.)

6.3 Use the Weierstrass Approximation Theorem to prove Lemma 6.7. [Hint: approximate $|x|$ uniformly on $[-1, 1]$ using polynomials.]

6.4 Prove Dini’s Theorem: suppose that $(f_n) \in C([a, b]; \mathbb{R})$ is an increasing sequence ($f_{n+1}(x) \ge f_n(x)$) that converges pointwise to some $f \in C([a, b]; \mathbb{R})$. Show that $f_n$ converges uniformly to $f$. (For each $\varepsilon > 0$ consider the sets $E_n := \{x \in [a, b] : f(x) - f_n(x) < \varepsilon\}$ and use the compactness of $[a, b]$.)

6.5 Define a sequence of polynomials $p_n : [-1, 1] \to \mathbb{R}$ iteratively, by setting $p_0(x) = 0$ and

$$p_{n+1}(x) = \tfrac{1}{2} x^2 + p_n(x) - \tfrac{1}{2} p_n(x)^2.$$

Using the result of the previous exercise show that $p_n(x) \to |x|$ uniformly on $[-1, 1]$ as $n \to \infty$. (First consider the map $f : [-1, 1] \to \mathbb{R}$ given by $p \mapsto \tfrac{1}{2} x^2 + p - \tfrac{1}{2} p^2$. This gives another way to prove Lemma 6.7.) (Pryce, 1973)

6.6 Let $X$ and $Y$ be compact metric spaces. Use the Stone–Weierstrass Theorem to show that any function $f \in C(X \times Y)$ can be uniformly approximated by functions of the form

$$F(x, y) = \sum_{i=1}^{n} f_i(x) g_i(y),$$

where $f_i \in C(X)$ and $g_i \in C(Y)$, $i = 1, \dots, n$.

6.7 Use the Stone–Weierstrass Theorem to show that any continuous function $f \in C([a, b] \times [c, d])$ can be uniformly approximated by functions of the form

$$F(x, y) = \sum_{i, j = 0}^{n} a_{ij}\, x^i y^j.$$

6.8 Suppose that $f \in C([a, b] \times [c, d]; \mathbb{R})$. Use the result of the previous exercise to show that

$$\int_a^b \int_c^d f(x, y)\, dy\, dx = \int_c^d \int_a^b f(x, y)\, dx\, dy$$

(Pryce, 1973).

6.9 Show that if $f \in C([-\pi, \pi])$ with $f(-\pi) = f(\pi)$, then the function $g : S^1 \to \mathbb{R}$ defined by setting $g(e^{ix}) = f(x)$ is continuous on $S^1$.

6.10 A subset $A$ of a normed space is called totally bounded if for every $\varepsilon > 0$ it is possible to cover $A$ with a finite collection of open balls of radius $\varepsilon$. Show that

(i) if $A$ is totally bounded, then $\overline{A}$ is totally bounded;
(ii) any sequence in a totally bounded set has a Cauchy subsequence;
(iii) if $A$ is a subset of a complete normed space, then $\overline{A}$ is compact if and only if $A$ is totally bounded.

(Part (iii) could be rephrased as ‘a subset of a complete normed space is precompact if and only if it is totally bounded’.)


6.11 Use the Arzelà–Ascoli Theorem repeatedly to show that if a sequence $(f_n) \in C_b(\mathbb{R})$ is bounded and equicontinuous on $\mathbb{R}$, then it has a subsequence that converges uniformly on all compact subsets of $\mathbb{R}$.

6.12 Suppose that $f \in C_b(\mathbb{R})$. Show that for every $\delta > 0$ the function

$$f_\delta(x) := \frac{1}{2\delta} \int_{x - \delta}^{x + \delta} f(y)\, dy$$

is Lipschitz with $\|f_\delta\|_\infty \le \|f\|_\infty$. Show furthermore that $f_\delta$ converges uniformly to $f$ on every bounded interval.

6.13 Suppose that $f \in C_b(\mathbb{R})$ with $\|f\|_\infty \le M$, and that $(f_n) \in C_b(\mathbb{R})$ is a sequence with $\|f_n\|_\infty \le M$ such that $f_n \to f$ uniformly on every bounded subinterval in $\mathbb{R}$. Use the Arzelà–Ascoli Theorem to show that if there exist $(x_n) \in C([0, T])$ such that

$$x_n(t) = x_0 + \int_0^t f_n(x_n(s))\, ds \qquad \text{for all } t \in [0, T] \tag{6.9}$$

for each $n$ then there exists $x \in C([0, T])$ such that

$$x(t) = x_0 + \int_0^t f(x(s))\, ds \qquad \text{for all } t \in [0, T]. \tag{6.10}$$

6.14 Suppose that $f \in C_b(\mathbb{R})$ (this global boundedness condition can be relaxed). Combine the results of the previous two exercises with that of Exercise 4.8 to deduce that the ordinary differential equation

$$\dot{x} = f(x) \qquad \text{with} \qquad x(0) = x_0,$$

has at least one solution on $[0, T]$ for any $T > 0$.

7 Completions and the Lebesgue Spaces $L^p(\Omega)$

In Chapter 4 we saw that $C([0, 1])$ is complete when we use the supremum norm. But there are other natural norms on this space with which it is not complete. In this chapter we look at one particular example, the $L^1$ norm (and then at the whole family of $L^p$ norms). We use this to motivate the abstract completion of a normed space and the Lebesgue integral, and then define the $L^p$ spaces of Lebesgue integrable functions as completions of the space of continuous functions in the $L^p$ norm.

7.1 Non-completeness of $C([0, 1])$ with the $L^1$ Norm

We have already seen in Example 3.13 that for any $1 \le p < \infty$ the integral expression

$$\|f\|_{L^p} := \left( \int_0^1 |f(x)|^p\, dx \right)^{1/p}$$

defines a norm (‘the $L^p$ norm’) on $C([0, 1])$. In this first section we concentrate on the $L^1$ norm $\|f\|_{L^1} = \int_0^1 |f(x)|\, dx$, and show that $C([0, 1])$ is not complete with this norm.

To do this we find a sequence $(f_n) \in C([0, 1])$ that is Cauchy in the $L^1$ norm but that does not converge to a function in $C([0, 1])$. Indeed, consider the sequence for $n \ge 2$ given by

$$f_n(x) = \begin{cases} 0 & 0 \le x < 1/2 - 1/n \\ 1 - n(1/2 - x) & 1/2 - 1/n \le x \le 1/2 \\ 1 & 1/2 < x \le 1, \end{cases}$$

see Figure 7.1.

Figure 7.1 The function $f_{10}$.

Then for $n > m$, since the integrands agree everywhere except on the interval $(1/2 - 1/m, 1/2)$ we have

$$\|f_n - f_m\|_{L^1} = \int_0^1 |f_n(x) - f_m(x)|\, dx = \int_{1/2 - 1/m}^{1/2} |f_n(x) - f_m(x)|\, dx \le 2/m;$$

so this sequence is Cauchy. Suppose that it converges in the $L^1$ norm to some continuous function $f$. Then for all $n \ge m$ we have

$$\begin{aligned} \|f_n - f\|_{L^1} &= \int_0^{1/2} |f(x) - f_n(x)|\, dx + \int_{1/2}^{1} |f(x) - 1|\, dx \\ &\ge \int_0^{1/2 - 1/m} |f(x)|\, dx + \int_{1/2}^{1} |f(x) - 1|\, dx, \end{aligned}$$

since $f_n = 0$ on $[0, 1/2 - 1/m]$ when $n \ge m$. Letting $n \to \infty$ it follows that

$$\int_0^{1/2 - 1/m} |f(x)|\, dx + \int_{1/2}^{1} |f(x) - 1|\, dx = 0.$$

Arguing as for Example 3.13 it follows that $f(x) = 0$ for $0 \le x \le 1/2 - 1/m$ and $f(x) = 1$ for $1/2 \le x \le 1$. Since this holds for all $m$, the limit function must satisfy

$$f(x) = \begin{cases} 0 & 0 \le x < 1/2 \\ 1 & 1/2 \le x \le 1. \end{cases} \tag{7.1}$$

But this function is not continuous, so $C([0, 1])$ is not complete in the $L^1$ norm.
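The behaviour of this sequence can be checked numerically. The sketch below is only an illustration: the helper names and the midpoint-rule quadrature are assumptions of the example, not part of the text. It approximates the $L^1$ distance from $f_n$ to the step function in (7.1), which a direct calculation gives as $1/(2n)$, alongside the largest sampled value of $|f_n - f|$, which stays close to 1.

```python
def f(n, x):
    """The ramp f_n from Section 7.1."""
    if x < 0.5 - 1.0 / n:
        return 0.0
    if x <= 0.5:
        return 1.0 - n * (0.5 - x)
    return 1.0


def step(x):
    """The discontinuous candidate limit from (7.1)."""
    return 0.0 if x < 0.5 else 1.0


def midpoints(samples):
    """Midpoints of `samples` equal subintervals of [0, 1]."""
    width = 1.0 / samples
    return [(k + 0.5) * width for k in range(samples)]


def l1_distance(g, h, samples=20_000):
    """Midpoint-rule approximation of the integral of |g - h| over [0, 1]."""
    return sum(abs(g(x) - h(x)) for x in midpoints(samples)) / samples


def sup_distance(g, h, samples=20_000):
    """Largest sampled value of |g - h| (a lower bound for the supremum)."""
    return max(abs(g(x) - h(x)) for x in midpoints(samples))


for n in (10, 100, 1000):
    f_n = lambda x, n=n: f(n, x)
    print(n, l1_distance(f_n, step), sup_distance(f_n, step))
```

The first column of distances shrinks like $1/(2n)$ while the second does not shrink at all, which is consistent with the sequence being Cauchy in $L^1$ without converging uniformly to anything continuous.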


7.2 The Completion of a Normed Space

We have just seen that $C([0, 1])$ is not complete in the $L^1$ norm. In order to obtain a complete space it should be enough to add to $C([0, 1])$ the limits of all sequences that are Cauchy in the $L^1$ norm. In such a way we can hope to obtain a complete space in which $C([0, 1])$ is dense.

We have seen a simple example of this earlier: we showed in the proof of Lemma 3.24 that for $1 \le p < \infty$ the space $c_{00}$ (of sequences with only a finite number of non-zero terms) is a dense subspace of $\ell^p$; so $\ell^p$ is the ‘completion’ of $(c_{00}, \|\cdot\|_{\ell^p})$.

However, in general we cannot just ‘add the limits of Cauchy sequences’ to $X$, since, given an abstract normed space, there is ‘nowhere else’ for the limits of these Cauchy sequences to lie. To circumvent this problem we introduce a more abstract notion of completion.

Definition 7.1 If $(X, \|\cdot\|)$ is a normed space, a completion of $X$ is a complete normed space $(\mathcal{X}, \|\cdot\|_{\mathcal{X}})$ along with a map $i : X \to \mathcal{X}$ that is an isometric isomorphism of $X$ onto a dense subspace of $\mathcal{X}$.

For simplicity we will often write $(i, \mathcal{X})$ in what follows, suppressing the norm. In simple examples $X$ is a subset of $\mathcal{X}$, and the map $i$ is the identity; this is the case when we say that $\ell^p$ is the completion of $c_{00}$ in the $\ell^p$ norm. For clarity, we spell this out in the following simple lemma.

Lemma 7.2 If $(\mathcal{X}, \|\cdot\|_{\mathcal{X}})$ is complete and $X$ is a dense subspace of $\mathcal{X}$, then $(\mathrm{id}, \mathcal{X})$ is a completion of $X$.

Proof The identity map $\mathrm{id} : X \to X$ is an isometric isomorphism of $X$ onto itself, and $X$ is a dense subspace of $(\mathcal{X}, \|\cdot\|_{\mathcal{X}})$.

We now show that every normed space has a completion.

Theorem 7.3 Every normed space $(X, \|\cdot\|)$ has a completion $(i, \mathcal{X})$.

The space $(\mathcal{X}, \|\cdot\|_{\mathcal{X}})$ that completes $X$ is an abstract one: it consists of equivalence classes of Cauchy sequences in $X$, where $(x_n) \sim (y_n)$ if $\lim_{n \to \infty} \|x_n - y_n\|_X = 0$; the norm on $\mathcal{X}$ is defined by setting

$$\|[(x_n)]\|_{\mathcal{X}} = \lim_{n \to \infty} \|x_n\|_X.$$

In the proof we have to show that $(\mathcal{X}, \|\cdot\|_{\mathcal{X}})$ really is a Banach space, and that it contains an isometrically isomorphic copy of $X$ as a dense subset.


Proof We consider Cauchy sequences in $X$, writing

$$\boldsymbol{x} = (x_1, x_2, \dots), \qquad x_j \in X,$$

for a sequence in $X$. We say that two Cauchy sequences $\boldsymbol{x}$ and $\boldsymbol{y}$ are equivalent, $\boldsymbol{x} \sim \boldsymbol{y}$, if

$$\lim_{n \to \infty} \|x_n - y_n\|_X = 0. \tag{7.2}$$

We let $\mathcal{X}$ be the space of equivalence classes of Cauchy sequences in $X$; any element $\xi \in \mathcal{X}$ can be written as $[\boldsymbol{x}]$, where this notation denotes the equivalence class of a Cauchy sequence $\boldsymbol{x} = (x_n)$. It is clear that $\mathcal{X}$ is a vector space, since the sum of two Cauchy sequences in $X$ is again a Cauchy sequence in $X$.

We define a candidate for our norm on $\mathcal{X}$: if $\xi \in \mathcal{X}$ then

$$\|\xi\|_{\mathcal{X}} := \lim_{n \to \infty} \|x_n\|_X, \tag{7.3}$$

for any $\boldsymbol{x} \in \xi$ (recall that $\xi$ is an equivalence class of Cauchy sequences). Note that (i) if $\boldsymbol{x} = (x_n)$ is a Cauchy sequence in $X$, then $(\|x_n\|)$ forms a Cauchy sequence in $\mathbb{R}$, so for any particular choice of $\boldsymbol{x} \in \xi$ the right-hand side of (7.3) exists, and (ii) if $\boldsymbol{x}, \boldsymbol{y} \in \xi$ then

$$\left| \lim_{n \to \infty} \|x_n\| - \lim_{n \to \infty} \|y_n\| \right| = \left| \lim_{n \to \infty} (\|x_n\| - \|y_n\|) \right| = \lim_{n \to \infty} \big| \|x_n\| - \|y_n\| \big| \le \lim_{n \to \infty} \|x_n - y_n\| = 0$$

since $\boldsymbol{x} \sim \boldsymbol{y}$ (see (7.2)). So the expression in (7.3) is well defined, and it is easy to check that it satisfies the three requirements of a norm.

Now we define a map $i : X \to \mathcal{X}$, by setting

$$i(x) = [(x, x, x, x, x, x, \dots)].$$

Clearly $i$ is linear, and is a bijective isometry between $X$ and its image. We want to show that $i(X)$ is a dense subset of $\mathcal{X}$. For any given $\xi \in \mathcal{X}$, choose some $\boldsymbol{x} \in \xi$. Since $\boldsymbol{x} = (x_n)$ is Cauchy, for any given $\varepsilon > 0$ there exists an $N$ such that

$$\|x_n - x_m\|_X < \varepsilon \qquad \text{for all } n, m \ge N.$$

In particular, $\|x_n - x_N\|_X < \varepsilon$ for all $n \ge N$, and so

$$\|\xi - i(x_N)\|_{\mathcal{X}} = \lim_{n \to \infty} \|x_n - x_N\|_X \le \varepsilon,$$

which shows that $i(X)$ is dense in $\mathcal{X}$.


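Finally, we have to show that $\mathcal{X}$ is complete, i.e. that any Cauchy sequence in $\mathcal{X}$ converges to another element of $\mathcal{X}$. (A Cauchy sequence in $\mathcal{X}$ is a Cauchy sequence of equivalence classes of Cauchy sequences in $X$!) Take such a Cauchy sequence, $(\xi_k)_{k=1}^\infty$. For each $k$, find $x_k \in X$ such that

$$\|i(x_k) - \xi_k\|_{\mathcal{X}} < 1/k, \tag{7.4}$$

using the density of $i(X)$ in $\mathcal{X}$. Now let $\boldsymbol{x} = (x_n)$; we will show (i) that $\boldsymbol{x}$ is a Cauchy sequence in $X$, and so $[\boldsymbol{x}] \in \mathcal{X}$, and (ii) that $\xi_k$ converges to $[\boldsymbol{x}]$. This will show that $\mathcal{X}$ is complete.

(i) To show that $\boldsymbol{x}$ is Cauchy, observe that

$$\begin{aligned} \|x_n - x_m\|_X &= \|i(x_n) - i(x_m)\|_{\mathcal{X}} \\ &= \|i(x_n) - \xi_n + \xi_n - \xi_m + \xi_m - i(x_m)\|_{\mathcal{X}} \\ &\le \|i(x_n) - \xi_n\|_{\mathcal{X}} + \|\xi_n - \xi_m\|_{\mathcal{X}} + \|\xi_m - i(x_m)\|_{\mathcal{X}} \\ &\le \frac{1}{n} + \|\xi_n - \xi_m\|_{\mathcal{X}} + \frac{1}{m}. \end{aligned}$$

So now given $\varepsilon > 0$ choose $N$ such that $\|\xi_n - \xi_m\|_{\mathcal{X}} < \varepsilon/3$ for $n, m \ge N$. If $N' = \max(N, 3/\varepsilon)$, it follows that

$$\|x_n - x_m\|_X < \varepsilon \qquad \text{for all } n, m \ge N',$$

i.e. $\boldsymbol{x}$ is Cauchy. So $[\boldsymbol{x}] \in \mathcal{X}$.

(ii) To show that $\xi_k \to [\boldsymbol{x}]$, first write

$$\|[\boldsymbol{x}] - \xi_k\|_{\mathcal{X}} \le \|[\boldsymbol{x}] - i(x_k)\|_{\mathcal{X}} + \|i(x_k) - \xi_k\|_{\mathcal{X}}.$$

Given $\varepsilon > 0$ choose $N$ large enough that $\|x_n - x_m\|_X < \varepsilon/2$ for all $n, m \ge N$, and then set $N' = \max(N, 2/\varepsilon)$. It follows that for $k \ge N'$,

$$\|[\boldsymbol{x}] - i(x_k)\|_{\mathcal{X}} = \lim_{n \to \infty} \|x_n - x_k\|_X \le \varepsilon/2$$

and $\|i(x_k) - \xi_k\|_{\mathcal{X}} < \varepsilon/2$ by (7.4); therefore $\|[\boldsymbol{x}] - \xi_k\|_{\mathcal{X}} < \varepsilon$, and so $\xi_k \to [\boldsymbol{x}]$ as claimed.

The completion we have just constructed is unique ‘up to isometric isomorphism’. We state the result here, and leave the proof to Exercise 7.4.

Lemma 7.4 The completion of $X$ is unique up to isometric isomorphisms: if $(i, \mathcal{X})$ and $(i', \mathcal{X}')$ are two completions of $X$ there exists an isometric isomorphism $j : \mathcal{X} \to \mathcal{X}'$ such that $i' = j \circ i$.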


7.3 Definition of the $L^p$ Spaces as Completions

Fortunately in many situations there is a more concrete description of the completion. For example, we saw above that $\ell^p$ is the completion of $c_{00}$ in the $\ell^p$ norm. We now return to the example of $C([0, 1])$ with the $L^1$ norm from the beginning of the chapter, which motivated our more abstract discussion of completions.

First we need to introduce some basic terminology from measure theory. A subset $E$ of $\mathbb{R}$ has measure zero if for every $\varepsilon > 0$ there exists a countable collection $\{I_j\}_{j=1}^\infty$ of open intervals such that

$$E \subseteq \bigcup_{j=1}^{\infty} I_j \qquad \text{and} \qquad \sum_{j=1}^{\infty} |I_j| < \varepsilon,$$

where by $|I|$ we denote the length of the interval $I$. A property holds almost everywhere if the set of points at which it fails has measure zero (this set could be empty).

Now, suppose we view $C([0, 1])$ as a subset of $F([0, 1]; \mathbb{R})$, the set of all real-valued functions on $[0, 1]$. Since $F([0, 1]; \mathbb{R})$ is a much larger space than $C([0, 1])$, we can hope that sequences $(f_n) \in C([0, 1])$ that are Cauchy in the $L^1$ norm have limits that lie in $F([0, 1]; \mathbb{R})$, even if they need not lie in $C([0, 1])$.

It is possible to show (see Corollary B.12) that if $(f_n)$ is Cauchy in the $L^1$ norm, then it has a subsequence $(f_{n_j})$ that converges ‘almost everywhere’, i.e. at every point in $[0, 1]$ apart from some set $E$ of measure zero: we can therefore define a limiting function by setting

$$f(x) := \lim_{j \to \infty} f_{n_j}(x)$$

at the points where $(f_{n_j}(x))$ converges, and however we like at the points of $E$. This gives a candidate $f \in F([0, 1]; \mathbb{R})$ for the limit of the sequence $(f_n)$, and one can show that $\|f_n - f\|_{L^1} \to 0$ as $n \to \infty$. The set of all $f$ constructed in this way we denote by $\mathcal{L}^1(0, 1)$.

While this gives us a way to construct a ‘limit function’ for any ‘$L^1$-Cauchy’ sequence $(f_n)$, note that different subsequences may require a different measure zero set $E$, and that the limiting function is arbitrarily defined on $E$. If we want to ensure that this procedure defines a unique limit we therefore have to identify any two functions in $\mathcal{L}^1(0, 1)$ that differ only on a set of measure zero; this identification gives us the Lebesgue space $L^1(0, 1)$ (so, strictly speaking, elements of $L^1(0, 1)$ are equivalence classes of functions that agree except on sets of measure zero).


Our construction also gives us a way of defining $\int_a^b f\, dx$ for any element $f \in L^1(a, b)$ (constructed from $C([a, b])$ in the same way): if $f$ is the ‘limit’ of a Cauchy sequence $(f_n) \in C([a, b])$ we set

$$\int_a^b f(x)\, dx = \lim_{n \to \infty} \int_a^b f_n(x)\, dx;$$

this limit exists since

$$\left| \int_a^b f_n(x)\, dx - \int_a^b f_m(x)\, dx \right| \le \int_a^b |f_n(x) - f_m(x)|\, dx = \|f_n - f_m\|_{L^1},$$

and we know that $(f_n)$ is Cauchy in the $L^1$ norm.

The space $L^1(0, 1)$ can also be constructed in a more intrinsic way as the space of all ‘Lebesgue integrable functions’ on $(0, 1)$, i.e. measurable functions such that

$$\int_0^1 |f(x)|\, dx$$

is finite, where the integral is understood in the Lebesgue sense. (The Lebesgue integral is also defined by taking limits, but of ‘simple’ functions that take only a finite number of non-zero values rather than continuous functions; see Appendix B for details.)

If we want $\|\cdot\|_{L^1}$ to be a norm on $L^1(0, 1)$, then we must have $f = g$ in $L^1(0, 1)$ whenever

$$\int_0^1 |f(x) - g(x)|\, dx = 0;$$

so we deem two elements of $L^1$ to be equal if they are equal almost everywhere. (The requirement that we identify functions that agree except on sets of measure zero therefore arises whether we define $L^1$ as a completion or using the theory of Lebesgue integration.)

In our discussion above there was nothing particularly special about our choice of the $L^1$ norm, and we can follow a very similar procedure for the $L^p$ norm for any choice of $p \in [1, \infty)$. We can also do the same starting with $C(\Omega)$ for any open subset $\Omega$ of $\mathbb{R}^n$; we therefore give the following definition.

Definition 7.5 The space $L^p(\Omega)$ is the completion of $C(\Omega)$ in the $L^p$ norm

$$\|f\|_{L^p(\Omega)} := \left( \int_\Omega |f(x)|^p\, dx \right)^{1/p},$$

where we identify limits of $L^p$-Cauchy sequences that agree almost everywhere.


The following two properties of $L^p(\Omega)$ are immediate consequences of Definition 7.5.

Lemma 7.6 For any $p \in [1, \infty)$

(i) the space $L^p(\Omega)$ is complete; and
(ii) $C(\Omega)$ is dense in $L^p(\Omega)$.

Defined intrinsically, the space $L^p(\Omega)$ consists of equivalence classes of measurable functions for which the $L^p$ norm is finite (with the integral considered in the Lebesgue sense). In this case the proof of Lemma 7.6 is far from trivial; see Section B.3 in Appendix B.

Using part (ii) we can prove the separability of $L^p(a, b)$ as a corollary of the Weierstrass Approximation Theorem (Theorem 6.1). For the separability of $L^p(\Omega)$ when $\Omega \subset \mathbb{R}^n$ see Lemma B.14.

Lemma 7.7 The set of all polynomials $P([a, b])$ is dense in $L^p(a, b)$ for any $1 \le p < \infty$. In particular, $L^p(a, b)$ is separable.

Proof We know that $P([a, b])$ is dense in $C([a, b])$ in the supremum norm; since

$$\|f - g\|_{L^p}^p = \int_a^b |f(x) - g(x)|^p\, dx \le (b - a) \|f - g\|_\infty^p$$

it follows that $P([a, b])$ is dense in $C([a, b])$ in the $L^p$ norm, and the density of $P([a, b])$ in $L^p(a, b)$ now follows from Exercise 3.14. The separability of $L^p(a, b)$ follows immediately, since $P([a, b])$ is the linear span of the countable collection $\{1, x, x^2, \dots\}$.

There is one final member of the family of the $L^p$ spaces that we cannot define as a completion of the space of continuous functions, namely $L^\infty(\Omega)$. This is the space of equivalence classes of measurable functions that agree almost everywhere such that

$$\|f\|_{L^\infty} := \inf\{M : |f(x)| \le M \text{ almost everywhere}\} < \infty. \tag{7.5}$$

Note that the $L^\infty$ norm is the same as the supremum norm for continuous functions on $\Omega$ (see Exercise 7.6) and the space $C(\Omega)$ is complete. So $L^\infty(\Omega)$ cannot be the completion of $C(\Omega)$ in the $L^\infty$ norm, since it contains functions that are not continuous, e.g. the function $f$ in (7.1) is in $L^\infty(0, 1)$ but is not continuous.


The space $L^\infty(\Omega)$ is not separable: to see this for $L^\infty(0, 1)$, consider the collection $U$ of functions $U := \{1_{[0,t]} : t \in [0, 1]\}$, where $1_{[0,t]}(x) = 1$ if $0 \le x \le t$ and is zero otherwise. This is an uncountable collection of functions, and any two elements $f, g \in U$ with $f \ne g$ have $\|f - g\|_{L^\infty} = 1$. As in the earlier proof that $\ell^\infty$ is not separable (Lemma 3.24) it follows that $L^\infty$ is not separable.
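The claim that distinct indicator functions are at distance 1 can be illustrated on a grid. The sketch below is an assumption-laden stand-in: it takes the maximum of $|f - g|$ over grid points rather than the essential supremum in (7.5), which is harmless here because two such indicators differ on an interval of positive length. The helper names are hypothetical.

```python
def indicator(t):
    """Return the function 1_[0,t] on [0, 1]."""
    return lambda x: 1.0 if 0.0 <= x <= t else 0.0


def grid_sup_distance(g, h, samples=1000):
    """Largest value of |g - h| over the grid k/samples, k = 0, ..., samples."""
    return max(abs(g(k / samples) - h(k / samples)) for k in range(samples + 1))


ts = [0.1, 0.25, 0.5, 0.75, 0.9]
distances = {
    (s, t): grid_sup_distance(indicator(s), indicator(t))
    for s in ts
    for t in ts
    if s < t
}
print(set(distances.values()))
```

Every pair is at the same distance however close $s$ and $t$ are, so no countable set can come within distance $1/2$ of every element of $U$.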

Exercises

7.1 Suppose that $(f_n) \in C([a, b])$ and that $f_n \to f$ in the supremum norm. Show that $\|f_n - f\|_{L^1} \to 0$ as $n \to \infty$.

7.2 Consider the functions $(f_n) \in C([0, 1])$ defined by setting

$$f_n(x) = \begin{cases} 1 - nx & 0 \le x \le 1/n \\ 0 & 1/n < x \le 1, \end{cases}$$

see Figure 7.2. Show that $f_n \to 0$ in the $L^1$ norm, but that $f_n$ does not converge to zero pointwise on $[0, 1]$.

Figure 7.2 The functions $f_{10}$ (left) and $g_{10}$ (right).

7.3 Consider the functions $(g_n) \in C([0, 1])$ defined by setting

$$g_n(x) = \begin{cases} n^2 x & 0 \le x \le 1/n, \\ n(2 - nx) & 1/n < x \le 2/n, \\ 0 & 2/n < x \le 1; \end{cases}$$

see Figure 7.2. Show that $g_n \to 0$ pointwise on $[0, 1]$ but $\|g_n\|_{L^1} = 1$, so $g_n$ does not converge to zero in $L^1$.

7.4 Prove Lemma 7.4. [Hint: if $x \in \mathcal{X}$ with $x = \lim_{n \to \infty} i(x_n)$, then set $j(x) = \lim_{n \to \infty} i'(x_n)$; show this map $j : \mathcal{X} \to \mathcal{X}'$ has the required properties.]

7.5 Suppose that $(X, d)$ is a metric space, and denote by $F_b(X; \mathbb{R})$ the normed space of all bounded maps from $X$ into $\mathbb{R}$, equipped with the supremum norm. Show that for any choice of $x_0 \in X$ the map $i : X \to F_b(X; \mathbb{R})$ given by

$$[i(x)](y) = d(y, x) - d(y, x_0)$$

is an isometry from $X$ onto a subset of $F_b(X; \mathbb{R})$. (It follows that if we let $\mathcal{X}$ be the closure of $i(X)$ in $F_b(X; \mathbb{R})$, then as a closed subspace of a complete space, $(\mathcal{X}, \|\cdot\|_\infty)$ is complete; this construction therefore provides a completion $(i, \mathcal{X})$ of any metric space.)

7.6 Show that if $f \in C(\Omega)$, then $\|f\|_{L^\infty} = \|f\|_\infty$. (Note that the norm on the left-hand side is the $L^\infty$ norm from (7.5); the norm on the right-hand side is the usual supremum norm.)

PART III Hilbert Spaces

8 Hilbert Spaces

Hilbert spaces – which form the main topic of this part of the book – are a particular class of Banach spaces in which the norm is derived from an inner product. As such they share many properties with the familiar Euclidean spaces Rn .

8.1 Inner Products

If $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$ are two elements of $\mathbb{R}^n$, then their dot (scalar) product is

$$x \cdot y = x_1 y_1 + \cdots + x_n y_n. \tag{8.1}$$

This is one concrete example of an inner product on a vector space.

Definition 8.1 An inner product $(\cdot, \cdot)$ on a vector space $V$ over $\mathbb{K}$ is a map $(\cdot, \cdot) : V \times V \to \mathbb{K}$ such that for all $x, y, z \in V$ and for all $\alpha \in \mathbb{K}$

(i) $(x, x) \ge 0$ with equality if and only if $x = 0$,
(ii) $(x + y, z) = (x, z) + (y, z)$,
(iii) $(\alpha x, y) = \alpha (x, y)$, and
(iv) $(x, y) = \overline{(y, x)}$.

Note that

● in a real vector space the complex conjugate in (iv) is unnecessary;
● in the complex case the restriction that $(y, x) = \overline{(x, y)}$ implies in particular that $(x, x) = \overline{(x, x)}$, i.e. that $(x, x)$ is real, and so the requirement that $(x, x) \ge 0$ makes sense; and
● (ii) and (iii) imply that the inner product is linear in its first argument, but because of (iv) it is conjugate-linear in its second argument when $\mathbb{K} = \mathbb{C}$, i.e. $(x, \alpha y) = \overline{\alpha} (x, y)$.

An inner-product space is a vector space with an inner product.

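Example 8.2 The canonical examples are $\mathbb{R}^n$ equipped with the standard dot product as in (8.1) and $\mathbb{C}^n$ equipped with the inner product

$$(x, y) = \sum_{j=1}^{n} x_j \overline{y_j}. \tag{8.2}$$

Example 8.3 In the space $\ell^2(\mathbb{K})$ of square summable sequences one can define the inner product of $\boldsymbol{x} = (x_1, x_2, \dots)$ and $\boldsymbol{y} = (y_1, y_2, \dots)$ by setting

$$(\boldsymbol{x}, \boldsymbol{y}) = \sum_{j=1}^{\infty} x_j \overline{y_j}. \tag{8.3}$$

This is the key example of an infinite-dimensional inner-product space: we will see why in Theorem 9.18.

Proof First observe that the expression on the right-hand side converges:

$$\sum_{j=1}^{\infty} |x_j \overline{y_j}| \le \frac{1}{2} \left( \sum_{j=1}^{\infty} |x_j|^2 + |y_j|^2 \right) = \frac{1}{2} \left( \|\boldsymbol{x}\|_{\ell^2}^2 + \|\boldsymbol{y}\|_{\ell^2}^2 \right) < \infty.$$

Now we check properties (i)–(iv) from Definition 8.1. For (i) we have

$$(\boldsymbol{x}, \boldsymbol{x}) = \sum_{j=1}^{\infty} |x_j|^2 \ge 0$$

and $(\boldsymbol{x}, \boldsymbol{x}) = 0$ implies that $x_j = 0$ for every $j$, i.e. $\boldsymbol{x} = 0$. For (ii)

$$(\boldsymbol{x} + \boldsymbol{y}, \boldsymbol{z}) = \sum_{j=1}^{\infty} (x_j + y_j) \overline{z_j} = \sum_{j=1}^{\infty} x_j \overline{z_j} + y_j \overline{z_j} = (\boldsymbol{x}, \boldsymbol{z}) + (\boldsymbol{y}, \boldsymbol{z}).$$

For (iii)

$$(\alpha \boldsymbol{x}, \boldsymbol{y}) = \sum_{j=1}^{\infty} \alpha x_j \overline{y_j} = \alpha \sum_{j=1}^{\infty} x_j \overline{y_j} = \alpha (\boldsymbol{x}, \boldsymbol{y}),$$

and for (iv)

$$(\boldsymbol{x}, \boldsymbol{y}) = \sum_{j=1}^{\infty} x_j \overline{y_j} = \overline{\sum_{j=1}^{\infty} \overline{x_j}\, y_j} = \overline{(\boldsymbol{y}, \boldsymbol{x})}.$$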
Example 8.4 The expression

$$(f, g) = \int_\Omega f(x) \overline{g(x)}\, dx \tag{8.4}$$

defines an inner product on the space $L^2(\Omega)$; see Exercise 8.1.
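The four properties in Definition 8.1 can be spot-checked for the inner product (8.2) on $\mathbb{C}^n$. This is a minimal sketch with hypothetical helper names; the random test vectors, the scalar `alpha`, and the tolerance are assumptions of the example, and a numerical check of this kind is of course not a proof.

```python
import random


def inner(x, y):
    """Inner product (8.2) on C^n: sum of x_j times the conjugate of y_j."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


def random_vector(n, rng):
    """A vector in C^n with real and imaginary parts drawn from [-1, 1]."""
    return [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]


rng = random.Random(0)
x, y, z = (random_vector(4, rng) for _ in range(3))
alpha = complex(0.3, -1.2)
tol = 1e-12

# (i) (x, x) is real and non-negative.
positivity = inner(x, x).real >= 0 and abs(inner(x, x).imag) < tol
# (ii) additivity in the first argument.
x_plus_y = [a + b for a, b in zip(x, y)]
additivity = abs(inner(x_plus_y, z) - (inner(x, z) + inner(y, z))) < tol
# (iii) homogeneity in the first argument.
homogeneity = abs(inner([alpha * a for a in x], y) - alpha * inner(x, y)) < tol
# (iv) conjugate symmetry.
symmetry = abs(inner(x, y) - inner(y, x).conjugate()) < tol
# Consequence: conjugate-linearity in the second argument.
conjugate_linear = (
    abs(inner(x, [alpha * b for b in y]) - alpha.conjugate() * inner(x, y)) < tol
)

print(positivity, additivity, homogeneity, symmetry, conjugate_linear)
```

The last check reflects the convention used in this chapter: the inner product is linear in its first argument and conjugate-linear in its second.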

8.2 The Cauchy–Schwarz Inequality

Given an inner product on $V$ we can define a map¹ $\|\cdot\| : V \to [0, \infty)$ by setting

$$\|v\|^2 := (v, v). \tag{8.5}$$

Note that for $\ell^2$ and $L^2(\Omega)$ the inner products defined in (8.3) and (8.4) produce the usual norms in $\ell^2$ and $L^2(\Omega)$ via (8.5):

$$(\boldsymbol{x}, \boldsymbol{x})_{\ell^2} = \sum_{j=1}^{\infty} |x_j|^2 \qquad \text{and} \qquad (f, f)_{L^2} = \int_\Omega |f(x)|^2\, dx.$$

We will soon show that $\|\cdot\|$ always defines a norm; we say that it is the norm induced by the inner product $(\cdot, \cdot)$. As a first step we prove the Cauchy–Schwarz inequality for inner products, a generalisation of the familiar inequality $|x \cdot y| \le \|x\|_2 \|y\|_2$ for $x, y \in \mathbb{R}^n$ (see Exercise 2.1).

Lemma 8.5 Any inner product satisfies the Cauchy–Schwarz inequality

$$|(x, y)| \le \|x\| \|y\| \qquad \text{for all } x, y \in V, \tag{8.6}$$

where $\|\cdot\|$ is defined in (8.5).

Proof If $x = 0$ or $y = 0$, then (8.6) is clear; so suppose that $x \ne 0$ and $y \ne 0$. For any $\lambda \in \mathbb{K}$ we have

$$0 \le (x - \lambda y, x - \lambda y) = (x, x) - \lambda (y, x) - \overline{\lambda} (x, y) + |\lambda|^2 (y, y).$$

Setting $\lambda = (x, y)/\|y\|^2$ we obtain

$$0 \le \|x\|^2 - 2 \frac{|(x, y)|^2}{\|y\|^2} + \frac{|(x, y)|^2}{\|y\|^2} = \|x\|^2 - \frac{|(x, y)|^2}{\|y\|^2},$$

which implies (8.6).

¹ We will show that this defines a norm in Lemma 8.6, but beware of ‘proof by notation’: just because we have denoted the quantity $(v, v)^{1/2}$ as $\|v\|$ does not guarantee (without proof) that this really is a norm.


In the sequence space $\ell^2$ the Cauchy–Schwarz inequality gives $|(\boldsymbol{x}, \boldsymbol{y})| \le \|\boldsymbol{x}\|_{\ell^2} \|\boldsymbol{y}\|_{\ell^2}$; by considering instead $\boldsymbol{x}'$ and $\boldsymbol{y}'$ with $x_j' = |x_j|$ and $y_j' = |y_j|$ this yields

$$\sum_{j=1}^{\infty} |x_j y_j| \le \|\boldsymbol{x}\|_{\ell^2} \|\boldsymbol{y}\|_{\ell^2}.$$

We will obtain a more general inequality involving other $\ell^p$ spaces later (see Lemma 18.3). The Cauchy–Schwarz inequality in $L^2(a, b)$ gives

$$\left| \int_a^b f(x) \overline{g(x)}\, dx \right| \le \left( \int_a^b |f(x)|^2\, dx \right)^{1/2} \left( \int_a^b |g(x)|^2\, dx \right)^{1/2},$$

for $f, g \in L^2(a, b)$. Again, if we take $|f|$ and $|g|$ rather than $f$ and $g$ this shows in particular that if $f, g \in L^2(a, b)$, then $fg \in L^1(a, b)$ with

$$\|fg\|_{L^1} \le \|f\|_{L^2} \|g\|_{L^2}.$$

We will see something more general in $L^p$ spaces later (Theorem 18.4).

The Cauchy–Schwarz inequality now allows us to show easily that the map $x \mapsto \|x\|$ is a norm on $V$.

Lemma 8.6 If $V$ is an inner-product space with inner product $(\cdot, \cdot)$, then the map $\|\cdot\| : V \to [0, \infty)$ defined by setting

$$\|v\| = (v, v)^{1/2}$$

defines a norm on $V$, which we call the norm induced by the inner product. In this way every inner-product space is also a normed space.

Proof We check that $\|\cdot\|$ satisfies the requirements of a norm in Definition 3.1. Property (i) is clear, since $\|x\| \ge 0$ and if $\|x\|^2 = (x, x) = 0$ then $x = 0$. Property (ii) is also clear, since

$$\|\alpha x\|^2 = (\alpha x, \alpha x) = \alpha \overline{\alpha} (x, x) = |\alpha|^2 \|x\|^2.$$

Property (iii), the triangle inequality, follows from the Cauchy–Schwarz inequality (8.6), since

$$\begin{aligned} \|x + y\|^2 &= (x + y, x + y) \\ &= \|x\|^2 + (x, y) + (y, x) + \|y\|^2 \\ &= \|x\|^2 + 2 \operatorname{Re}(x, y) + \|y\|^2 \\ &\le \|x\|^2 + 2 \|x\| \|y\| + \|y\|^2 \\ &= (\|x\| + \|y\|)^2, \end{aligned}$$

i.e. $\|x + y\| \le \|x\| + \|y\|$.
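Both inequalities of this section can be tested numerically on $\mathbb{C}^n$ with the inner product (8.2). The sketch below uses hypothetical helper names; the number of trials, the dimension, the seed, and the small tolerance for rounding are assumptions of the example. It also checks the equality case of Cauchy–Schwarz when one vector is a scalar multiple of the other.

```python
import math
import random


def inner(x, y):
    """Inner product (8.2) on C^n."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


def norm(x):
    """Induced norm: the square root of (x, x)."""
    return math.sqrt(inner(x, x).real)


def random_vector(n, rng):
    """A vector in C^n with real and imaginary parts drawn from [-1, 1]."""
    return [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]


def inequalities_hold(trials=1000, n=5, seed=0, tol=1e-12):
    """Check Cauchy-Schwarz and the triangle inequality on random pairs."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = random_vector(n, rng), random_vector(n, rng)
        cauchy_schwarz = abs(inner(x, y)) <= norm(x) * norm(y) + tol
        total = [a + b for a, b in zip(x, y)]
        triangle = norm(total) <= norm(x) + norm(y) + tol
        if not (cauchy_schwarz and triangle):
            return False
    return True


print(inequalities_hold())

# Equality in Cauchy-Schwarz when y is a scalar multiple of x.
x = [1 + 2j, -0.5j, 3.0]
y = [(2 - 1j) * a for a in x]
equality_case = abs(abs(inner(x, y)) - norm(x) * norm(y)) < 1e-12
print(equality_case)
```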

8.3 Properties of the Induced Norms

Norms derived from inner products must satisfy the ‘parallelogram law’.

Lemma 8.7 (Parallelogram law/identity) If $V$ is an inner-product space with induced norm $\|\cdot\|$, then

$$\|x + y\|^2 + \|x - y\|^2 = 2(\|x\|^2 + \|y\|^2) \qquad \text{for all } x, y \in V \tag{8.7}$$

(Figure 8.1).

Proof Simply expand the inner products:

$$\begin{aligned} \|x + y\|^2 + \|x - y\|^2 &= (x + y, x + y) + (x - y, x - y) \\ &= \|x\|^2 + (y, x) + (x, y) + \|y\|^2 + \|x\|^2 - (y, x) - (x, y) + \|y\|^2 \\ &= 2(\|x\|^2 + \|y\|^2). \end{aligned}$$

Figure 8.1 The parallelogram law: $\|x + y\|^2 + \|x - y\|^2 = 2(\|x\|^2 + \|y\|^2)$.

It follows that if a norm on a space does not satisfy the parallelogram law, it cannot have come from an inner product. For example, the supremum norm on $C([0, 1])$ does not come from an inner product, since if we consider $f(x) = x$ and $g(x) = 1 - x$ (for example), then $(f + g)(x) = 1$ and $(f - g)(x) = 2x - 1$, and so

$$\|f + g\|_\infty = \|f - g\|_\infty = \|f\|_\infty = \|g\|_\infty = 1$$

and (8.7) is not satisfied. Similarly, one can show that the $\ell^p$ and $L^p$ norms do not come from an inner product when $p \ne 2$ (see Exercises 8.5 and 8.6).

If a norm is derived from an inner product, we can reconstruct the inner product as follows.

Lemma 8.8 (Polarisation identity) Let $V$ be an inner-product space with induced norm $\|\cdot\|$. Then if $V$ is real,

$$4(x, y) = \|x + y\|^2 - \|x - y\|^2, \tag{8.8}$$

while if $V$ is complex,

$$4(x, y) = \|x + y\|^2 - \|x - y\|^2 + i\|x + iy\|^2 - i\|x - iy\|^2 = \sum_{n=0}^{3} i^n \|x + i^n y\|^2. \tag{8.9}$$

Proof Rewrite the right-hand sides as inner products, multiply out, and simplify: for example, in the real case

$$\begin{aligned} \|x + y\|^2 - \|x - y\|^2 &= (x + y, x + y) - (x - y, x - y) \\ &= [\|x\|^2 + (x, y) + (y, x) + \|y\|^2] - [\|x\|^2 - (x, y) - (y, x) + \|y\|^2] \\ &= 4(x, y). \end{aligned}$$

If $V$ is a real/complex vector space and $\|\cdot\|$ is a norm on $V$ that satisfies the parallelogram law, then (8.8) or (8.9) defines an inner product on $V$. In other words, the parallelogram law characterises those norms that can be derived from inner products; this is the Jordan–von Neumann Theorem (for the proof in the real case see Exercise 8.8).

We end this chapter with a very useful result that deals with the interaction between inner products and convergent sequences.

Lemma 8.9 If $V$ is an inner-product space, then $x_n \to x$ and $y_n \to y$ implies that $(x_n, y_n) \to (x, y)$.

Proof Let $\|\cdot\|$ be the norm induced by the inner product in $V$. Since $x_n$ and $y_n$ converge, $\|x_n\|$ and $\|y_n\|$ are bounded. Therefore

$$|(x_n, y_n) - (x, y)| = |(x_n - x, y_n) + (x, y_n - y)| \le \|x_n - x\| \|y_n\| + \|x\| \|y_n - y\|,$$

which implies that $(x_n, y_n) \to (x, y)$.

In particular, this lemma means that we can swap inner products and sums: if

$$x = \sum_{j=1}^{\infty} x_j, \qquad \text{i.e. if} \qquad \sum_{j=1}^{n} x_j \to x,$$

then

$$\left( \sum_{j=1}^{\infty} x_j, y \right) = \left( \lim_{n \to \infty} \sum_{j=1}^{n} x_j, y \right) = \lim_{n \to \infty} \left( \sum_{j=1}^{n} x_j, y \right) = \lim_{n \to \infty} \sum_{j=1}^{n} (x_j, y) = \sum_{j=1}^{\infty} (x_j, y). \tag{8.10}$$
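Two facts from Section 8.3 lend themselves to a quick computation: the failure of the parallelogram law for the supremum norm with $f(x) = x$ and $g(x) = 1 - x$, and the recovery of a complex inner product from its norm via (8.9). The sketch below is illustrative only; the grid standing in for $[0, 1]$, the helper names, and the test vectors are assumptions of the example.

```python
from fractions import Fraction


def sup_norm(h, grid):
    """Maximum of |h| over the grid, standing in for the supremum on [0, 1]."""
    return max(abs(h(x)) for x in grid)


# Exact rational grid on [0, 1] that includes both endpoints.
grid = [Fraction(k, 100) for k in range(101)]
f = lambda x: x
g = lambda x: 1 - x

lhs = sup_norm(lambda x: f(x) + g(x), grid) ** 2 + sup_norm(lambda x: f(x) - g(x), grid) ** 2
rhs = 2 * (sup_norm(f, grid) ** 2 + sup_norm(g, grid) ** 2)
print(lhs, rhs)  # 2 4, so (8.7) fails for the supremum norm


def inner(x, y):
    """Inner product (8.2) on C^n."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


def norm_squared(x):
    return inner(x, x).real


def polarisation(x, y):
    """Right-hand side of (8.9), divided by 4."""
    total = 0
    for unit in (1, 1j, -1, -1j):  # the powers i^0, i^1, i^2, i^3
        shifted = [a + unit * b for a, b in zip(x, y)]
        total += unit * norm_squared(shifted)
    return total / 4


x = [1 + 2j, -0.5j, 3 + 0j]
y = [2 - 1j, 1 + 1j, -1j]
recovered = polarisation(x, y)
print(abs(recovered - inner(x, y)) < 1e-12)
```

Using exact fractions for the grid avoids rounding in the first check, so the two sides are compared as exact numbers.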

8.4 Hilbert Spaces

We are now in a position to introduce Hilbert spaces.

Definition 8.10 A Hilbert space is a complete inner-product space. (Here ‘complete’ is understood with respect to the norm induced by the inner product.)

For the inner-product spaces we have introduced above – $\mathbb{R}^n$, $\mathbb{C}^n$, $\ell^2$, and $L^2(\Omega)$ – the norms induced by the inner products are the ‘standard norms’, with which they are complete. So these are all Hilbert spaces.

Since closed subspaces of Banach spaces are again Banach spaces (see Lemma 4.3) the same is true of Hilbert spaces.

Lemma 8.11 Suppose that $H$ is a Hilbert space and $U$ is a closed subspace of $H$. Then $U$ is a Hilbert space when equipped with the inner product from $H$.

From now on if we have a Hilbert space $H$ we will denote its inner product by $(\cdot, \cdot)$ and the induced norm by $\|\cdot\|$; unless stated otherwise we treat the case $\mathbb{K} = \mathbb{C}$, since taking $\mathbb{K} = \mathbb{R}$ only simplifies matters by removing the complex conjugates.


Exercises

8.1 Check that the expression

$$(f, g) = \int_\Omega f(x) \overline{g(x)}\, dx$$

(see Example 8.4) defines an inner product on $L^2(\Omega)$.

8.2 Show that if $|\Omega| = \int_\Omega 1\, dx < \infty$, then $L^\infty(\Omega) \subset L^2(\Omega) \subset L^1(\Omega)$ with

$$\|f\|_{L^1} \le |\Omega|^{1/2} \|f\|_{L^2} \le |\Omega| \|f\|_{L^\infty}. \tag{8.11}$$

8.3 Show that if $H$ and $K$ are Hilbert spaces with inner products $(\cdot, \cdot)_H$ and $(\cdot, \cdot)_K$, respectively, then $H \times K$ is a Hilbert space with inner product

$$((x, \xi), (y, \eta))_{H \times K} := (x, y)_H + (\xi, \eta)_K.$$

8.4 Let $T : H \to K$ be a linear surjective mapping between two real Hilbert spaces. Use the polarisation identity (8.8) to show that

$$(Tx, Ty)_K = (x, y)_H \qquad \text{for every } x, y \in H \tag{8.12}$$

if and only if $\|Tx\|_K = \|x\|_H$ for every $x \in H$. (In this case we say that $T$ is unitary.) (Young, 1988)

8.5 Show that if $p \ne 2$ there is no inner product on $\ell^p$ that induces the $\ell^p$ norm.

8.6 Show that if $p \ne 2$ there is no inner product on $C([0, 1])$ that induces the $L^p$ norm.

8.7 If $\|\cdot\|$ is a norm on a vector space $X$ induced by an inner product show that it satisfies Apollonius’s identity,

$$\|z - x\|^2 + \|z - y\|^2 = \frac{1}{2} \|x - y\|^2 + 2 \left\| z - \frac{1}{2}(x + y) \right\|^2$$

for every $x, y, z \in X$. (One can argue directly, expanding out the inner products, but it is much easier to use the parallelogram law.)

8.8 Show that if $\|\cdot\|$ is a norm on a real vector space $V$ that satisfies the parallelogram identity

$$\|x + y\|^2 + \|x - y\|^2 = 2(\|x\|^2 + \|y\|^2)$$

then

$$\langle x, y \rangle := \frac{1}{4} \left( \|x + y\|^2 - \|x - y\|^2 \right)$$

defines an inner product on $V$. This result is due to Jordan and von Neumann (1935). [Hint: properties (i) and (iv) from Definition 8.1 are immediate. To prove that $\langle x, z \rangle + \langle y, z \rangle = \langle x + y, z \rangle$ for all $x, y, z \in V$, i.e. property (ii) from the definition, use the parallelogram identity. Finally, deduce that $\langle \alpha x, y \rangle = \alpha \langle x, y \rangle$ for every $\alpha \in \mathbb{Q}$, $x, y \in V$, and use this to prove the same identity for every $\alpha \in \mathbb{R}$ (which is property (iii) of an inner product).] (Yosida, 1980)

8.9 Show that if $H$ is a Hilbert space and $U$ a closed linear subspace of $H$, then $H/U$ is also a Hilbert space. [Hint: Exercise 4.5 shows that $H/U$ is a Banach space; show that the norm satisfies the parallelogram law and hence comes from an inner product.]

8.10 A Banach space is called uniformly convex if for every $\varepsilon > 0$ there exists $\delta > 0$ such that

$$\|x - y\| > \varepsilon, \quad \|x\| = \|y\| = 1 \quad\Rightarrow\quad \left\| \frac{x + y}{2} \right\| < 1 - \delta.$$

Use the parallelogram identity to show that every Hilbert space is uniformly convex.

9 Orthonormal Sets and Orthonormal Bases for Hilbert Spaces

In this chapter we discuss how to define a basis for a general normed space. Since we are in a normed space and have a notion of convergence, we can now allow infinite linear combinations of basis elements (in contrast to the Hamel bases we considered in Chapter 1). We then consider orthonormal sets in inner-product spaces, and orthonormal bases for separable Hilbert spaces, which share much in common with the standard orthonormal basis in $\mathbb{R}^n$.

9.1 Schauder Bases in Normed Spaces

In an infinite-dimensional normed space $X$ we cannot hope to find a finite basis, since then the space would by definition be finite-dimensional. Assuming that $X$ is separable, the best that we can hope for is to find a countable basis $\{e_j\}_{j=1}^\infty$, in terms of which to expand any $x \in X$ as
$$x = \sum_{j=1}^\infty \alpha_j e_j,$$
where the sum converges in $X$ in the sense discussed in Section 3.3. (If $X$ is not separable it cannot have a countable basis; see Lemma 3.23 part (iii).) We now formalise the idea of a basis in a normed space, restricting to separable spaces for simplicity.

Definition 9.1 A countable set $\{e_j\}_{j=1}^\infty$ is a Schauder basis for a normed space $X$ if every $x \in X$ can be written uniquely¹ as
$$x = \sum_{j=1}^\infty \alpha_j e_j \quad\text{for some } \alpha_j \in \mathbb{K}. \tag{9.1}$$

¹ Equivalently, we could require that (i) every $x \in X$ can be written in the form $x = \sum_{j=1}^\infty \alpha_j e_j$ and (ii) if $\sum_{j=1}^\infty \alpha_j e_j = 0$, then $\alpha_j = 0$ for every $j$.

The equality here is to be understood as equality ‘in $X$’, i.e. the sum converges in $X$. Note that if $\{e_j\}_{j=1}^\infty$ is a basis in the sense of Definition 9.1, then the uniqueness of the expansion implies that $\{e_j\}_{j=1}^\infty$ is a linearly independent set: if
$$0 = \sum_{j=1}^n \alpha_j e_j,$$
then as there is a unique expansion for zero we must have $\alpha_j = 0$ for all $j = 1, \ldots, n$.

Example 9.2 The collection $\{e^{(j)}\}_{j=1}^\infty$ is a Schauder basis for $\ell^p$ for every $1 \le p < \infty$, but is not a Schauder basis for $\ell^\infty$.

Proof We proved the first part of this in the course of Lemma 3.24: in (3.13) we showed if $1 \le p < \infty$ then for every $\varepsilon > 0$ there exists $N > 0$ such that
$$\Big\| x - \sum_{j=1}^n x_j e^{(j)} \Big\| \;[\ldots]$$

[…] $x_n = \sum_{j=1}^n \alpha_j e_j$ we […] $n$,
$$\|x_m - x_n\|^2 = \Big\|\sum_{j=n+1}^m \alpha_j e_j\Big\|^2 = \sum_{j=n+1}^m |\alpha_j|^2.$$
It follows that $(x_n)$ is a Cauchy sequence, and since $H$ is complete it therefore converges to some $x \in H$. The equality in (9.4) now follows as above.

By combining this lemma with Bessel’s inequality we obtain the following convergence result.

Corollary 9.13 Let $H$ be a Hilbert space and $\{e_n\}_{n=1}^\infty$ an orthonormal set in $H$. Then $\sum_{n=1}^\infty (x, e_n)e_n$ converges for every $x \in H$.

Note that $\sum_{n=1}^\infty (x, e_n)e_n$ need not converge to $x$; we investigate how to ensure this in the next section.

9.4 Orthonormal Bases for Hilbert Spaces

If $E$ is a Schauder basis for an inner-product space $V$ and is also orthonormal, we refer to it as an orthonormal basis for $V$. Note that an orthonormal set $\{e_j\}_{j=1}^\infty$ is a basis provided that every $x$ can be written in the form (9.1), since in this case the uniqueness of the expansion follows from the orthonormality: indeed, if
$$x = \sum_{j=1}^\infty \alpha_j e_j = \sum_{j=1}^\infty \beta_j e_j \quad\Rightarrow\quad \sum_{j=1}^\infty (\alpha_j - \beta_j)e_j = 0$$
then taking the inner product with $e_i$ we have
$$0 = \Big(\sum_{j=1}^\infty (\alpha_j - \beta_j)e_j,\, e_i\Big) = \sum_{j=1}^\infty (\alpha_j - \beta_j)(e_j, e_i) = \alpha_i - \beta_i,$$
where we have used (8.10) to move the inner product inside the summation.

So if $E$ is an orthonormal basis we can expect to have
$$x = \sum_{j=1}^\infty (x, e_j)e_j \quad\text{for every } x \in H,$$
and in this case Parseval’s identity (9.4) from Lemma 9.12 will guarantee that
$$\sum_{j=1}^\infty |(x, e_j)|^2 = \|x\|^2.$$
We now show that $E = \{e_j\}_{j=1}^\infty$ forms a basis for $H$ if and only if this equality holds for every $x \in H$, and also provide some other equivalent conditions.

Proposition 9.14 Let $E = \{e_j\}_{j=1}^\infty$ be an orthonormal set in a Hilbert space $H$. Then the following statements are equivalent:
(a) $E$ is a basis for $H$;
(b) $x = \sum_{j=1}^\infty (x, e_j)e_j$ for all $x \in H$;
(c) Parseval’s identity holds: $\|x\|^2 = \sum_{j=1}^\infty |(x, e_j)|^2$ for all $x \in H$;
(d) $(x, e_j) = 0$ for all $j$ implies that $x = 0$; and
(e) $\operatorname{clin}(E) = H$.

Part (e) means that the linear span of $E$ is dense in $H$, i.e. for any $x \in H$ and any $\varepsilon > 0$ there exist an $n \in \mathbb{N}$ and $\alpha_j \in \mathbb{K}$ such that
$$\Big\| x - \sum_{j=1}^n \alpha_j e_j \Big\| < \varepsilon.$$
See Exercise 9.8 for an example showing that if $E$ is linearly independent but not orthonormal, then $\operatorname{clin}(E) = H$ does not necessarily imply that $E$ is a basis for $H$.

Proof First we show (a)⇔(b). If $E$ is an orthonormal basis for $H$, then we can write
$$x = \sum_{j=1}^\infty \alpha_j e_j, \quad\text{i.e.}\quad x = \lim_{n\to\infty} \sum_{j=1}^n \alpha_j e_j.$$
Clearly if $k \le n$ we have
$$\Big(\sum_{j=1}^n \alpha_j e_j,\, e_k\Big) = \alpha_k;$$
taking the limit $n \to \infty$ and using the compatibility of inner products and limits from (8.10) it follows that $\alpha_k = (x, e_k)$ and hence (b) holds. The same argument shows that if we assume (b), then this expansion is unique, and so $E$ is a basis.

We show that (b) ⇒ (c) ⇒ (d) ⇒ (b), and then that (b) ⇒ (e) and (e) ⇒ (d).

(b) ⇒ (c) is immediate from (3.14).

(c) ⇒ (d) is immediate since $\|x\| = 0$ implies that $x = 0$.

(d) ⇒ (b) Take $x \in H$ and let
$$y = x - \sum_{j=1}^\infty (x, e_j)e_j.$$
For each $m \in \mathbb{N}$ we have, using Lemma 8.9 (continuity of the inner product),
$$(y, e_m) = (x, e_m) - \lim_{n\to\infty}\Big(\sum_{j=1}^n (x, e_j)e_j,\, e_m\Big) = 0$$
since eventually $n \ge m$. It follows from (d) that $y = 0$, i.e. that
$$x = \sum_{j=1}^\infty (x, e_j)e_j$$
as required.

(b) ⇒ (e) is clear, since given any $x$ and $\varepsilon > 0$ there exists an $n$ such that
$$\Big\|\sum_{j=1}^n (x, e_j)e_j - x\Big\| < \varepsilon.$$

(e) ⇒ (d) Suppose that $x \in H$ with $(x, e_j) = 0$ for every $j$. Choose $x_n$ contained in the linear span of $E$ such that $x_n \to x$. Then
$$\|x\|^2 = (x, x) = \Big(\lim_{n\to\infty} x_n,\, x\Big) = \lim_{n\to\infty} (x_n, x) = 0,$$
since $x_n$ is a (finite) linear combination of the $e_j$. So $x = 0$.

Example 9.15 The sequence $(e^{(j)})_{j=1}^\infty$ defined in (1.3) is an orthonormal basis for $\ell^2$, since it is clear that if $(x, e^{(j)}) = x_j = 0$ for all $j$ then $x = 0$.

We now use Proposition 9.14 to show that we can expand any function $f \in L^2(-\pi, \pi)$ as a Fourier series, which will converge to $f$ in $L^2$; we use the orthonormality of the exponential functions in (9.5) to find an explicit expression for the Fourier coefficients.

Lemma 9.16 The exponential functions
$$\Big\{ \frac{1}{\sqrt{2\pi}}\, e^{ikx} : k \in \mathbb{Z} \Big\} \tag{9.5}$$
(as in Example 9.6) form an orthonormal basis for $L^2(-\pi, \pi)$. In particular, any $f \in L^2$ can be written as the Fourier series
$$f = \sum_{k=-\infty}^\infty c_k e^{ikx}, \tag{9.6}$$
where
$$c_k = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)e^{-ikx}\,dx \tag{9.7}$$
and the sum in (9.6) converges in $L^2$. Furthermore, we have
$$\int_{-\pi}^{\pi} |f(x)|^2\,dx = 2\pi \sum_{k=-\infty}^\infty |c_k|^2. \tag{9.8}$$

Before we give the proof, note that we have already shown in Corollary 6.10 that any continuous function $f \in C([-\pi, \pi])$ with $f(\pi) = f(-\pi)$ can be approximated uniformly by expressions of the form
$$\sum_{k=-n}^n c_k e^{ikx}$$
and we will use this in the proof below. Note, however, that this requires a different set of coefficients for each approximation. In contrast, taking partial sums of the series on the right-hand side of the equality in (9.6) produces approximating expressions of this form, but in general these will only converge in the $L^2$ norm and not uniformly. Indeed, in Section 22.3 we will show that there exist $2\pi$-periodic continuous functions for which the Fourier series in (9.6) fails to converge at $x = 0$ (and, in fact, at ‘very many’ points).
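As a numerical illustration of the coefficient formula (9.7) and the Parseval identity (9.8), the following sketch takes $f(x) = x$ on $[-\pi, \pi]$, for which integration by parts gives $c_0 = 0$ and $c_k = i(-1)^k/k$ for $k \ne 0$. The helper name `fourier_coefficient`, the midpoint rule, and the truncation level `K` are choices made for this illustration only.

```python
import cmath
import math

def fourier_coefficient(f, k, n=20000):
    """Approximate c_k = (1/2π) ∫_{-π}^{π} f(x) e^{-ikx} dx with the midpoint rule."""
    h = 2 * math.pi / n
    total = 0j
    for m in range(n):
        x = -math.pi + (m + 0.5) * h
        total += f(x) * cmath.exp(-1j * k * x)
    return total * h / (2 * math.pi)

def f(x):
    return x

# Closed form for f(x) = x: c_k = i(-1)^k / k for k != 0.
c3 = fourier_coefficient(f, 3)
exact_c3 = 1j * (-1) ** 3 / 3

# Parseval (9.8) with |c_k|^2 = 1/k^2, truncated to |k| <= K.
K = 2000
lhs = 2 * math.pi ** 3 / 3                                    # ∫_{-π}^{π} x^2 dx
rhs = 2 * math.pi * sum(2 / k ** 2 for k in range(1, K + 1))  # 2π Σ_{0<|k|<=K} |c_k|^2
print(abs(c3 - exact_c3), lhs - rhs)
```

The truncated sum stays below the integral and the gap shrinks roughly like $4\pi/K$, which is consistent with convergence of the series in (9.8).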

Proof We showed in Example 9.6 that these functions are orthonormal in $L^2(-\pi, \pi)$. To show that they form a basis for $L^2(-\pi, \pi)$ it is enough to show that their linear span is dense, by part (e) of Proposition 9.14.

Given any $f \in L^2(-\pi, \pi)$ and $\varepsilon > 0$, we first use the density of $C([-\pi, \pi])$ in $L^2(-\pi, \pi)$ (see Lemma 7.6) to find $g \in C([-\pi, \pi])$ such that $\|f - g\|_{L^2} < \varepsilon/3$. Now let
$$\tilde g(x) := \begin{cases} \dfrac{x + \pi}{\delta}\, g(-\pi + \delta), & -\pi \le x \le -\pi + \delta, \\[4pt] g(x), & -\pi + \delta < x < \pi - \delta, \\[4pt] \dfrac{\pi - x}{\delta}\, g(\pi - \delta), & \pi - \delta < x \le \pi, \end{cases}$$
with $\delta = \varepsilon/18\|g\|_\infty$, so that $\|\tilde g - g\|_{L^2} < \varepsilon/3$ and $\tilde g \in C([-\pi, \pi])$ with $\tilde g(\pi) = \tilde g(-\pi) = 0$.

Now we can use Corollary 6.10 to find coefficients $\alpha_j$ such that
$$\Big\| \tilde g - \sum_{j=-k}^k \alpha_j e^{ijx} \Big\|_\infty < \frac{\varepsilon}{3\sqrt{2\pi}}.$$
Then since for any $\varphi \in C([-\pi, \pi])$ we have
$$\|\varphi\|_{L^2} = \Big(\int_{-\pi}^{\pi} |\varphi(x)|^2\,dx\Big)^{1/2} \le \Big(\int_{-\pi}^{\pi} \|\varphi\|_\infty^2\,dx\Big)^{1/2} = \sqrt{2\pi}\,\|\varphi\|_\infty,$$
it follows that
$$\Big\| \tilde g - \sum_{j=-k}^k \alpha_j e^{ijx} \Big\|_{L^2} \le \sqrt{2\pi}\, \Big\| \tilde g - \sum_{j=-k}^k \alpha_j e^{ijx} \Big\|_\infty \;[\ldots]$$

[…] $M\}| \le \dfrac{\|x\|^2}{M^2}$.

9.4 Extend the argument from the previous exercise to show that if $E$ is an uncountable orthonormal set in an inner product space $V$, then for each $x \in V$, $\{e \in E : (x, e) \ne 0\}$ is at most a countable set.

9.5 Show that if $\{e_j\}_{j=1}^\infty$ is an orthonormal basis for $H$ then
$$(u, v) = \sum_{j=1}^\infty (u, e_j)(e_j, v)$$
for every $u, v \in H$. (This is a more general version of Parseval’s identity.)

9.6 Suppose that $(e_j)_{j=1}^\infty$ is an orthonormal sequence that forms a basis for a Hilbert space $H$. Show that the ‘Hilbert cube’
$$Q := \Big\{ \sum_{j=1}^\infty \alpha_j e_j : |\alpha_j| \le \frac1j \Big\}$$
is a compact subset of $H$.

9.7 Use Proposition 9.14 to deduce that if $E = \{e_j\}_{j=1}^\infty$ is an orthonormal set in a Hilbert space, then
$$\operatorname{clin}(E) = \Big\{ x \in H : x = \sum_{j=1}^\infty \alpha_j e_j,\ (\alpha_j) \in \ell^2(\mathbb{K}) \Big\}.$$

9.8 Proposition 9.14 shows that if $\{e_j\}_{j=1}^\infty$ is an orthonormal set, then it is a basis if its linear span is dense in $H$. This exercise gives an example to show that this is not true without the assumption that $\{e_j\}$ is orthonormal. Let $(e_j)_{j=1}^\infty$ be an orthonormal sequence that forms a basis for a Hilbert space $H$. Set
$$f_n = \sum_{j=1}^n \frac1j\, e_j.$$
Show that the linear span of $\{f_j\}$ is dense in $H$, but that $\{f_j\}$ is not a basis for $H$. (Show that $x = \sum_{j=1}^\infty j^{-1} e_j$ cannot be written in terms of the $\{f_j\}$.)

9.9 Suppose that $H$ is a Hilbert space. Show that if $(x, z) = (y, z)$ for all $z$ in a dense subset of $H$, then $x = y$.

9.10 Suppose that $f, g \in L^2(a, b)$ are such that
$$\int_a^b x^n f(x)\,dx = \int_a^b x^n g(x)\,dx$$
for every $n$. Use the result of the previous exercise along with Lemma 7.7 to show that $f = g$ in $L^2$. (Goffman and Pedrick, 1983; Pryce, 1973)

9.11 Use the results of Example 9.7 and Corollary 6.5 to show that any real $f \in L^2(0, 1)$ can be written as
$$f(x) = \sum_{k=0}^\infty a_k \cos k\pi x,$$
where the sum converges in $L^2(0, 1)$. Find an expression for the coefficients $a_k$.

9.12 Let $f(x) = x$ on $[-\pi, \pi]$. By finding the Fourier coefficients in the expansion $f(x) = \sum_{k=-\infty}^\infty c_k e^{ikx}$ and using the Parseval identity (9.8) show that
$$\sum_{k=1}^\infty \frac{1}{k^2} = \frac{\pi^2}{6}.$$

9.13 Show that any infinite-dimensional Hilbert space $H$ contains a countably infinite orthonormal sequence.

9.14 Use the result of the previous exercise to show that a Hilbert space is finite-dimensional if and only if its closed unit ball is compact.

9.15 Show that any Hilbert space $H$ has an orthonormal basis. (Use Zorn’s Lemma to show that $H$ has a maximal orthonormal subset $E = \{e_\alpha\}_{\alpha \in A}$, and then show that every element of $H$ can be written as $\sum_{j=1}^\infty a_j e_{\alpha_j}$ for some $a_j \in \mathbb{K}$ and $\alpha_j \in A$.)

10 Closest Points and Approximation

In this chapter we consider the existence of ‘closest points’ in convex subsets of Hilbert spaces. In particular, this will enable us to define the orthogonal projection onto a closed linear subspace U of a Hilbert space H , and thereby decompose any x ∈ H as x = u + v, where u ∈ U and v ∈ U ⊥ . Here U ⊥ is the ‘orthogonal complement’ of U , U ⊥ := {y ∈ H : (y, u) = 0 for every u ∈ U }.

10.1 Closest Points in Convex Subsets of Hilbert Spaces

In a Hilbert space there is always a unique closest point in any closed convex set. (The same result is not true in a general Banach space without additional assumptions; see Exercises 10.2–10.7.)

Proposition 10.1 Let $A$ be a non-empty closed convex subset of a Hilbert space $H$ and let $x \in H \setminus A$. Then there exists a unique $\hat a \in A$ such that
$$\|x - \hat a\| = \operatorname{dist}(x, A) := \inf\{\|x - a\| : a \in A\}.$$
Moreover, for every $a \in A$ we have
$$\operatorname{Re}\,(x - \hat a, a - \hat a) \le 0. \tag{10.1}$$
See Figure 10.1.

Figure 10.1 The point $\hat a$ is the unique closest point to $x$ in the convex set $A$. The inequality in (10.1) means that in $\mathbb{R}^n$, the angle between $x - \hat a$ and $a - \hat a$ is always at least a right angle.

Proof Set $\delta = \inf\{\|x - a\| : a \in A\} > 0$; that this is strictly positive follows from the facts that $x \notin A$ and $A$ is closed (see the beginning of the proof of Lemma 5.4). Then we can find a sequence $(a_n) \in A$ such that
$$\|x - a_n\|^2 \le \delta^2 + \frac1n. \tag{10.2}$$
We show that $(a_n)$ is a Cauchy sequence by using the parallelogram law (Lemma 8.7) to write
$$\|(x - a_n) + (x - a_m)\|^2 + \|(x - a_n) - (x - a_m)\|^2 = 2\left(\|x - a_n\|^2 + \|x - a_m\|^2\right).$$
This gives
$$\|2x - (a_n + a_m)\|^2 + \|a_n - a_m\|^2 \le 4\delta^2 + \frac2m + \frac2n$$
or
$$\|a_n - a_m\|^2 \le 4\delta^2 + \frac2m + \frac2n - 4\left\|x - \tfrac12(a_n + a_m)\right\|^2.$$
Since $A$ is convex, $\tfrac12(a_n + a_m) \in A$, and so $\|x - \tfrac12(a_n + a_m)\|^2 \ge \delta^2$, which implies that
$$\|a_n - a_m\|^2 \le \frac2m + \frac2n.$$
It follows that $(a_n)$ is Cauchy; since $H$ is a Hilbert space it is complete, and so $a_n \to \hat a$ for some $\hat a \in H$. Since $A$ is closed, $\hat a \in A$, and taking limits in (10.2) shows that $\|x - \hat a\| = \delta$ (since $\|x - a\| \ge \delta$ for every $a \in A$).

To prove (10.1), let $a$ be any other point in $A$; then, since $A$ is convex, $(1 - t)\hat a + ta \in A$ for all $t \in (0, 1)$. Since $\hat a$ is the closest point to $x$ in $A$,
$$\|x - \hat a\|^2 \le \|x - [(1 - t)\hat a + ta]\|^2 = \|(x - \hat a) - t(a - \hat a)\|^2 = \|x - \hat a\|^2 - 2t\operatorname{Re}\,(x - \hat a, a - \hat a) + t^2\|a - \hat a\|^2.$$
It follows that $\operatorname{Re}\,(a - \hat a, x - \hat a) \le 0$; otherwise for $t$ sufficiently small we obtain a contradiction.

Finally, if both $\hat a$ and $a'$ satisfy (10.1), then
$$\operatorname{Re}\,(x - \hat a, a' - \hat a) \le 0 \quad\text{and}\quad \operatorname{Re}\,(x - a', \hat a - a') \le 0.$$
Rewriting the second of these as $\operatorname{Re}\,(a' - x, a' - \hat a) \le 0$ and adding yields
$$\operatorname{Re}\,(a' - \hat a, a' - \hat a) \le 0$$
which implies that $\|a' - \hat a\|^2 = 0$ so $a' = \hat a$.

The following corollary shows that we can ‘separate’ $x \notin A$ from $A$ by looking at inner products with some $v \in H$. We will reinterpret this result later in Corollary 12.5.

Corollary 10.2 Suppose that $A$ is a non-empty closed convex subset of a Hilbert space, and $x \notin A$. Then there exists $v \in H$ such that
$$\operatorname{Re}\,(a, v) + d^2 \le \operatorname{Re}\,(x, v) \quad\text{for every } a \in A,$$
where $d = \operatorname{dist}(x, A)$; see Figure 10.2.

Proof Let $\hat a$ be the closest point to $x$ in $A$, and set $v = x - \hat a$. The result now follows from (10.1) since
$$\operatorname{Re}\,(x, v) = \operatorname{Re}\,(\hat a + (x - \hat a), v) = \operatorname{Re}\,(\hat a, v) + \|v\|^2 \ge \operatorname{Re}\,(a, v) + d^2.$$

Figure 10.2 The sets of points $u \in H$ with $(u, v)$ taking various constant values in the case of a real space: the level sets $(u, v) + d^2 < (x, v)$, $(u, v) + d^2 = (x, v)$, and $(u, v) = (x, v)$.
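A concrete finite-dimensional instance of Proposition 10.1 and Corollary 10.2 can be checked numerically. The sketch below assumes $H = \mathbb{R}^3$ and takes $A$ to be the closed unit box $[0,1]^3$, where the closest point is obtained by clipping each coordinate; the set, the point `x`, and the helper names are illustrative choices, not taken from the text.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project_to_box(x, lo=0.0, hi=1.0):
    """Closest point to x in the closed convex box [lo, hi]^n (componentwise clipping)."""
    return [min(max(t, lo), hi) for t in x]

random.seed(0)
x = [1.7, -0.4, 0.3]                 # a point outside the box A = [0, 1]^3
a_hat = project_to_box(x)            # the closest point in A
v = [xi - ai for xi, ai in zip(x, a_hat)]
d2 = dot(v, v)                       # dist(x, A)^2

worst_angle = float("-inf")          # largest value of (x - a_hat, a - a_hat) seen
worst_gap = float("inf")             # smallest value of (x, v) - (a, v) - d^2 seen
for _ in range(10000):
    a = [random.random() for _ in x]                 # a random point of A
    diff = [ai - bi for ai, bi in zip(a, a_hat)]
    worst_angle = max(worst_angle, dot(v, diff))     # inequality (10.1)
    worst_gap = min(worst_gap, dot(x, v) - dot(a, v) - d2)   # Corollary 10.2
print(a_hat, worst_angle, worst_gap)
```

Over the sampled points the first quantity never becomes positive and the second never becomes negative (up to rounding), as (10.1) and Corollary 10.2 predict with $v = x - \hat a$.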

10.2 Linear Subspaces and Orthogonal Complements

If $X$ is a subset of a Hilbert space $H$, then the orthogonal complement of $X$ in $H$ is
$$X^\perp = \{u \in H : (u, x) = 0 \text{ for all } x \in X\}.$$
Clearly, if $Y \subseteq X$, then $X^\perp \subseteq Y^\perp$. Note also that $X \cap X^\perp = \{0\}$ if $0 \in X$, and that this intersection is empty otherwise.

We have already remarked (see Section 4.1) that linear subspaces of infinite-dimensional spaces are not always closed; however, orthogonal complements are always closed.

Lemma 10.3 If $X$ is a subset of $H$, then $X^\perp$ is a closed linear subspace of $H$.

Proof It is clear that $X^\perp$ is a linear subspace of $H$: indeed, if $u, v \in X^\perp$ and $\alpha \in \mathbb{K}$, then
$$(u + v, x) = (u, x) + (v, x) = 0 \quad\text{and}\quad (\alpha u, x) = \alpha(u, x) = 0$$
for every $x \in X$. To show that $X^\perp$ is closed, suppose that $u_n \in X^\perp$ and $u_n \to u$; then for every $x \in X$
$$(u, x) = \Big(\lim_{n\to\infty} u_n,\, x\Big) = \lim_{n\to\infty} (u_n, x) = 0$$
(using (8.10)) and so $X^\perp$ is closed.

Note that Proposition 9.14 shows that $E$ is a basis for $H$ if and only if $E^\perp = \{0\}$, since this is just a rephrasing of part (d) of that result: $(u, e_j) = 0$ for all $j$ implies that $u = 0$.

We now show that given any closed linear subspace $U$ of $H$, any $x \in H$ has a unique decomposition in the form $x = u + v$, where $u \in U$ and $v \in U^\perp$: we say that $H$ is the direct sum of $U$ and $U^\perp$ and write $H = U \oplus U^\perp$.

Proposition 10.4 If $U$ is a closed linear subspace of a Hilbert space $H$, then any $x \in H$ can be written uniquely as
$$x = u + v \quad\text{with}\quad u \in U,\ v \in U^\perp,$$
i.e. $H = U \oplus U^\perp$. The map $P_U : H \to U$ defined by $P_U x := u$ is called the orthogonal projection of $x$ onto $U$, and satisfies
$$P_U^2 x = P_U x \quad\text{and}\quad \|P_U x\| \le \|x\| \quad\text{for all } x \in H.$$
See Figure 10.3.

Figure 10.3 Decomposing $x = u + v$ with $u \in U$ and $v \in U^\perp$. The point $u$ is the orthogonal projection of $x$ onto $U$.

Proof If $U$ is a closed linear subspace, then $U$ is closed and convex, so the result of Proposition 10.1 shows that given $x \in H$ there is a unique closest point $u \in U$. It is now simple to show that $x - u \in U^\perp$ and then such a decomposition is unique.

To show that $x - u \in U^\perp$ we use (10.1). First, given any $v \in U$, we have $u \pm v \in U$, so (10.1) yields
$$\operatorname{Re}\,(x - u, \pm v) \le 0,$$
which shows that $\operatorname{Re}\,(x - u, v) = 0$. Choosing instead $u \pm \mathrm{i}v \in U$ we obtain $\operatorname{Im}\,(x - u, v) = 0$, and so $(x - u, v) = 0$ for every $v \in U$, i.e. $x - u \in U^\perp$.

Finally, the uniqueness follows easily: if $x = u_1 + v_1 = u_2 + v_2$, then $u_1 - u_2 = v_2 - v_1$, and so
$$\|u_1 - u_2\|^2 = (u_1 - u_2, u_1 - u_2) = (u_1 - u_2, v_2 - v_1) = 0,$$
since $u_1 - u_2 \in U$ and $v_2 - v_1 \in U^\perp$.

If $P_U x$ denotes the closest point to $x$ in $U$, then clearly $P_U^2 = P_U$, and it follows from the fact that $(u, x - u) = 0$ that
$$\|x\|^2 = \|u\|^2 + \|x - u\|^2,$$
and so $\|P_U x\| \le \|x\|$, i.e. the projection can only decrease the norm.

We will now show that in general $X \subseteq (X^\perp)^\perp$; we can use the decomposition result we have just proved to show that this is an equality if $X$ is a closed linear subspace.

Lemma 10.5 If $X \subseteq H$, then $X \subseteq (X^\perp)^\perp$ with equality if and only if $X$ is a closed linear subspace of $H$.

Proof Any $x \in X$ satisfies
$$(x, z) = 0 \quad\text{for every } z \in X^\perp;$$
so $X \subseteq (X^\perp)^\perp$. Now suppose that $z \in (X^\perp)^\perp$, so that $(z, y) = 0$ for every $y \in X^\perp$. If $X$ is a closed linear subspace, then we can use Proposition 10.4 to write $z = x + \xi$, where $x \in X$ and $\xi \in X^\perp$. But then (since $z \in (X^\perp)^\perp$) we have
$$0 = (z, \xi) = (x + \xi, \xi) = \|\xi\|^2,$$
so in fact $\xi = 0$ and therefore $z \in X$. Finally, it follows from Lemma 10.3 that if $X = (X^\perp)^\perp$, then $X$ must be a closed linear subspace.

Exercise 10.9 shows that $E^\perp = [\operatorname{clin}(E)]^\perp$. In particular, whenever $X$ is a linear subspace of $H$ we have $(X^\perp)^\perp = \overline{X}$.
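The decomposition $H = U \oplus U^\perp$ of Proposition 10.4 can be made concrete in $\mathbb{R}^3$. The following sketch assumes $U$ is the span of two hand-picked vectors and computes the projection via an orthonormal basis of $U$; the function names `orthonormalise` and `project` are hypothetical helpers for this illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthonormalise(vectors):
    """Gram-Schmidt in R^n; assumes the input vectors are linearly independent."""
    basis = []
    for w in vectors:
        for e in basis:
            c = dot(w, e)
            w = [wi - c * ei for wi, ei in zip(w, e)]
        norm = math.sqrt(dot(w, w))
        basis.append([wi / norm for wi in w])
    return basis

def project(x, basis):
    """Orthogonal projection of x onto the span of the orthonormal list `basis`."""
    u = [0.0] * len(x)
    for e in basis:
        c = dot(x, e)
        u = [ui + c * ei for ui, ei in zip(u, e)]
    return u

spanning = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]   # U = span of these two vectors
basis = orthonormalise(spanning)
x = [1.0, 2.0, 3.0]
u = project(x, basis)                        # component in U
v = [xi - ui for xi, ui in zip(x, u)]        # component in the orthogonal complement
print(u, v)
```

Here $U^\perp$ is spanned by $(1, -1, 1)$, so one expects $v = \tfrac23(1, -1, 1)$ and $u = (\tfrac13, \tfrac83, \tfrac73)$; applying `project` to `u` again returns `u`, in line with $P_U^2 = P_U$.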

10.3 Best Approximations

We now investigate the best approximation of elements of $H$ using (possibly infinite) linear combinations of elements of an orthonormal set $E$. Exercise 9.7 shows that when $E$ is an orthonormal sequence the set of all such linear combinations is¹ precisely $\operatorname{clin}(E)$. Since this is a closed subspace of $H$, by Proposition 10.4 the closest point to any $x \in H$ in $\operatorname{clin}(E)$ is the orthogonal projection of $x$ onto $\operatorname{clin}(E)$.

¹ Note that this is not necessarily the case when $E$ is not orthonormal; see Exercise 9.8.

Theorem 10.6 Let $E = \{e_j\}_{j \in J}$ be an orthonormal set, where $J = \mathbb{N}$ or $(1, 2, \ldots, n)$. Then for any $x \in H$, the orthogonal projection of $x$ onto $\operatorname{clin}(E)$, which is the closest point to $x$ in $\operatorname{clin}(E)$, is given by
$$P_E x := \sum_{j \in J} (x, e_j)e_j.$$
(Of course, if $E$ is a basis for $H$, then there is no approximation involved.)

Proof Consider $x - \sum_{j \in J} \alpha_j e_j$. Then
$$\begin{aligned}
\Big\| x - \sum_{j \in J} \alpha_j e_j \Big\|^2
&= \|x\|^2 - \sum_{j \in J} (x, \alpha_j e_j) - \sum_{j \in J} (\alpha_j e_j, x) + \sum_{j \in J} |\alpha_j|^2 \\
&= \|x\|^2 - \sum_{j \in J} \overline{\alpha_j}\,(x, e_j) - \sum_{j \in J} \alpha_j \overline{(x, e_j)} + \sum_{j \in J} |\alpha_j|^2 \\
&= \|x\|^2 - \sum_{j \in J} |(x, e_j)|^2 + \sum_{j \in J} \Big( |(x, e_j)|^2 - \overline{\alpha_j}\,(x, e_j) - \alpha_j \overline{(x, e_j)} + |\alpha_j|^2 \Big) \\
&= \|x\|^2 - \sum_{j \in J} |(x, e_j)|^2 + \sum_{j \in J} |(x, e_j) - \alpha_j|^2,
\end{aligned}$$
and so the minimum occurs when $\alpha_j = (x, e_j)$ for all $j \in J$.

Example 10.7 If $E = \{e_j\}_{j=1}^\infty$ is an orthonormal basis in $H$, then the best approximation of an element of $H$ in terms of $\{e_j\}_{j=1}^n$ is just given by the partial sum
$$\sum_{j=1}^n (x, e_j)e_j.$$
For example, the best approximation of an element $x \in \ell^2$ in terms of $\{e^{(j)}\}_{j=1}^n$ (elements of the standard basis) is simply
$$\sum_{j=1}^n (x, e^{(j)})e^{(j)} = (x_1, x_2, \ldots, x_n, 0, 0, \ldots).$$
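The identity at the heart of the proof of Theorem 10.6 can be verified numerically in a small complex example. The sketch below assumes $H = \mathbb{C}^3$ with the inner product linear in its first argument, a hand-picked orthonormal pair `E`, and an arbitrary competing coefficient vector `alpha`; all of these are illustrative choices.

```python
import math

def inner(u, v):
    """Inner product on C^n, linear in the first argument: (u, v) = sum u_i * conj(v_i)."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def combo(coeffs, vectors, n):
    """The linear combination sum_j coeffs[j] * vectors[j] in C^n."""
    out = [0j] * n
    for c, e in zip(coeffs, vectors):
        out = [o + c * ei for o, ei in zip(out, e)]
    return out

s = 1 / math.sqrt(2)
E = [[1 + 0j, 0j, 0j], [0j, s + 0j, s * 1j]]      # an orthonormal set in C^3
x = [2 - 1j, 1 + 3j, -1 + 0.5j]
fourier = [inner(x, e) for e in E]                # the coefficients (x, e_j)

def dist2(alpha):
    """Squared distance from x to sum_j alpha_j e_j."""
    y = combo(alpha, E, 3)
    r = [xi - yi for xi, yi in zip(x, y)]
    return inner(r, r).real

alpha = [0.3 + 0.2j, -1 + 1j]                     # an arbitrary competing choice
lhs = dist2(alpha)
rhs = (inner(x, x).real
       - sum(abs(c) ** 2 for c in fourier)
       + sum(abs(c - a) ** 2 for c, a in zip(fourier, alpha)))
print(lhs, rhs, dist2(fourier))
```

The two sides agree up to rounding, and the distance obtained with the coefficients $(x, e_j)$ is no larger than the distance for `alpha`, which is the minimisation statement of the theorem in this instance.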

Now suppose that $E$ is a finite or countable set that is not orthonormal. We can still find the best approximation to any $u \in H$ that lies in $\operatorname{clin}(E)$ by using the Gram–Schmidt orthonormalisation process from Proposition 9.9. First we find an orthonormal set $\tilde E$ whose linear span is the same as that of $E$, and then we use the result of Theorem 10.6.

Example 10.8 Consider approximation of functions in $L^2(-1, 1)$ by polynomials of degree up to $n$. We can start with the set $\{1, x, x^2, \ldots, x^n\}$ and then use the Gram–Schmidt process to construct polynomials that are orthonormal with respect to the $L^2(-1, 1)$ inner product. We do this here for polynomials up to degree 3.

We begin with $e_1 = 1/\sqrt2$ and then consider
$$e_2' = x - \Big(x, \frac{1}{\sqrt2}\Big)\frac{1}{\sqrt2} = x - \frac12\int_{-1}^1 t\,dt = x,$$
so
$$\|e_2'\|^2 = \int_{-1}^1 t^2\,dt = \frac23; \qquad e_2 = \sqrt{\frac32}\,x.$$
Then
$$\begin{aligned}
e_3' &= x^2 - \Big(x^2, \sqrt{\tfrac32}\,x\Big)\sqrt{\tfrac32}\,x - \Big(x^2, \frac{1}{\sqrt2}\Big)\frac{1}{\sqrt2} \\
&= x^2 - \frac{3x}{2}\int_{-1}^1 t^3\,dt - \frac12\int_{-1}^1 t^2\,dt \\
&= x^2 - \frac13,
\end{aligned}$$
so
$$\|e_3'\|^2 = \int_{-1}^1 \Big(t^2 - \frac13\Big)^2 dt = \Big[\frac{t^5}{5} - \frac{2t^3}{9} + \frac{t}{9}\Big]_{-1}^1 = \frac{8}{45}$$
which gives

$$e_3 = \sqrt{\frac58}\,\big(3x^2 - 1\big).$$
Exercise 10.11 asks you to check that $e_4 = \sqrt{\tfrac78}\,(5x^3 - 3x)$ and that this is orthogonal to $e_1$, $e_2$, and $e_3$.

Using these orthonormal functions we can find the best approximation of any function $f \in L^2(-1, 1)$ by a degree three polynomial:
$$\frac78\Big(\int_{-1}^1 f(t)(5t^3 - 3t)\,dt\Big)(5x^3 - 3x) + \frac58\Big(\int_{-1}^1 f(t)(3t^2 - 1)\,dt\Big)(3x^2 - 1) + \frac32\Big(\int_{-1}^1 f(t)\,t\,dt\Big)x + \frac12\int_{-1}^1 f(t)\,dt.$$

Example 10.9 The best approximation in $L^2(-1, 1)$ of $f(x) = |x|$ by a third degree polynomial is
$$f_3(x) = \frac{15x^2 + 3}{16},$$
with $\|f - f_3\|_{L^2} = 1/\sqrt{96} = 1/(4\sqrt6)$.

Proof Since $\int_{-1}^1 |t|\,t^k\,dt = 0$ for $k$ odd and $2\int_0^1 t^{k+1}\,dt$ for $k$ even, we only need to calculate
$$\frac54\Big(\int_0^1 3t^3 - t\,dt\Big)(3x^2 - 1) + \int_0^1 t\,dt = \frac{5}{16}(3x^2 - 1) + \frac12 = \frac{15x^2 + 3}{16}.$$
The $L^2$ distance of $f_3$ from $f$ can be found by integrating
$$\|f - f_3\|^2 = \frac{2}{16^2}\int_0^1 (15x^2 - 16x + 3)^2\,dx = \frac{2}{16^2}\cdot\frac43 = \frac{1}{96}.$$

Figure 10.4 Graph of $f(x) = |x|$ (solid), $f(x) = (15x^2 + 3)/16$ (dashed), and $f(x) = x^2 + \tfrac18$ (dotted).

Of course, the meaning of the ‘best approximation’ is that this choice minimises the $L^2$ norm of the difference. It is not the ‘best approximation’ in terms of the supremum norm: we have $f_3(0) = 3/16$, while
$$\sup_{x \in [-1,1]} \Big|\, |x| - \Big(x^2 + \frac18\Big) \Big| = \frac18 < \frac{3}{16} \le \|f - f_3\|_\infty$$
(see Figure 10.4).

For other examples of orthonormal sets see Section 4.10 in Goffman and Pedrick (1983).
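The Gram–Schmidt computation for polynomials on $(-1, 1)$ and the best approximation of $|x|$ can be reproduced in exact rational arithmetic. The sketch below is one possible way to do this, assuming polynomials are stored as coefficient lists and left unnormalised so that every quantity stays rational; the helper names are hypothetical.

```python
from fractions import Fraction

def integral_of_power(m):
    """Integral of t^m over (-1, 1)."""
    return Fraction(2, m + 1) if m % 2 == 0 else Fraction(0)

def inner(p, q):
    """L^2(-1, 1) inner product of real polynomials given as coefficient lists (index = power)."""
    return sum(a * b * integral_of_power(i + j)
               for i, a in enumerate(p) for j, b in enumerate(q))

def gram_schmidt_monomials(degree):
    """Orthogonal (not normalised) polynomials obtained from 1, x, ..., x^degree."""
    basis = []
    for n in range(degree + 1):
        p = [Fraction(0)] * n + [Fraction(1)]          # the monomial x^n
        for q in basis:
            c = inner(p, q) / inner(q, q)
            p = [a - c * (q[i] if i < len(q) else 0) for i, a in enumerate(p)]
        basis.append(p)
    return basis

def abs_moment(m):
    """Integral of |t| t^m over (-1, 1)."""
    return Fraction(2, m + 2) if m % 2 == 0 else Fraction(0)

basis = gram_schmidt_monomials(3)
norms_sq = [inner(p, p) for p in basis]

# Best approximation of f(x) = |x|: sum over the orthogonal basis of (f, p) / (p, p) * p.
best = [Fraction(0)] * 4
for p in basis:
    coeff = sum(a * abs_moment(i) for i, a in enumerate(p)) / inner(p, p)
    best = [b + coeff * (p[i] if i < len(p) else 0) for i, b in enumerate(best)]

# Squared L^2 distance by Pythagoras: ||f||^2 - ||best||^2, with ||f||^2 = 2/3.
dist_sq = Fraction(2, 3) - inner(best, best)
print(basis, norms_sq, best, dist_sq)
```

Under these conventions the orthogonal polynomials come out as $1$, $x$, $x^2 - \tfrac13$, $x^3 - \tfrac35 x$ with squared norms $2$, $\tfrac23$, $\tfrac{8}{45}$, $\tfrac{8}{175}$; the best cubic approximation of $|x|$ has coefficients $\tfrac{3}{16}$ and $\tfrac{15}{16}$ for $1$ and $x^2$, and the squared $L^2$ distance evaluates to $\tfrac{1}{96}$.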

Exercises

10.1 Show that the result on the existence of a unique closest point in Proposition 10.1 is equivalent to the following statement: if $K$ is a non-empty closed convex subset of a Hilbert space that does not contain zero, then $K$ contains a unique element with minimum norm.

10.2 Let $X = C([-1, 1])$ with the supremum norm, and let
$$U := \Big\{ f \in C([-1, 1]) : \int_{-1}^0 f(t)\,dt = \int_0^1 f(t)\,dt = 0 \Big\},$$
which is a closed linear subspace of $X$. Let $g$ be a function in $X$ such that
$$\int_{-1}^0 g(t)\,dt = 1 \quad\text{and}\quad \int_0^1 g(t)\,dt = -1.$$
Show that $\operatorname{dist}(g, U) = 1$ but that $\operatorname{dist}(g, f) > 1$ for every $f \in U$, so that there is no closest point to $g$ in $U$. (Lax, 2002)

10.3 A Banach space is called strictly convex if $x, y \in X$, $x \ne y$, with $\|x\| = \|y\| = 1$ implies that $\|x + y\| < 2$. Show that if $X$ is strictly convex and $U$ is a closed linear subspace of $X$, then given any $x \notin U$, any closest point to $x$ in $U$ (should one exist) is unique.

10.4 Show that a uniformly convex Banach space (see Exercise 8.10) is strictly convex.

10.5 Show that if $2 \le p < \infty$, then for any $\alpha, \beta \ge 0$
$$\alpha^p + \beta^p \le (\alpha^2 + \beta^2)^{p/2},$$
and deduce, using the fact that $t \mapsto |t|^{p/2}$ is convex, that
$$\Big|\frac{a + b}{2}\Big|^p + \Big|\frac{a - b}{2}\Big|^p \le \frac12\big(|a|^p + |b|^p\big), \quad a, b \in \mathbb{C}.$$
Hence obtain Clarkson’s first inequality
$$\Big\|\frac{f + g}{2}\Big\|_{L^p}^p + \Big\|\frac{f - g}{2}\Big\|_{L^p}^p \le \frac12\big(\|f\|_{L^p}^p + \|g\|_{L^p}^p\big), \quad f, g \in L^p. \tag{10.3}$$
(The same inequality also holds in $\ell^p$, $2 \le p < \infty$, by a similar argument.)

10.6 Use Clarkson’s first inequality to show that $L^p$ is uniformly convex for all $2 \le p < \infty$. Clarkson’s second inequality is valid for the range $1 < p \le 2$: for all $f, g \in L^p$ we have
$$\Big\|\frac{f + g}{2}\Big\|_{L^p}^q + \Big\|\frac{f - g}{2}\Big\|_{L^p}^q \le \Big(\frac12\|f\|_{L^p}^p + \frac12\|g\|_{L^p}^p\Big)^{1/(p-1)},$$
where $p$ and $q$ are conjugate; this is much less straightforward to prove than Clarkson’s first inequality (Clarkson, 1936). Use this inequality to show that $L^p$ is uniformly convex for $1 < p \le 2$. (The same arguments work in the $\ell^p$ spaces for $1 < p < \infty$.)

10.7 In this exercise we show that if $X$ is a uniformly convex Banach space and $K$ is a closed convex subset of $X$ that does not contain 0, then $K$ has a unique element of minimum norm. Let $(k_n) \in K$ be a sequence such that $\|k_n\| \to d := \inf_{k \in K} \|k\|$.
(i) Set $x_n = k_n/\|k_n\|$ and use the convexity of $K$ to show that
$$\Big\|\tfrac12(x_n + x_m)\Big\| \ge \frac d2\Big(\frac{1}{\|k_n\|} + \frac{1}{\|k_m\|}\Big);$$
(ii) deduce that $\|\tfrac12(x_n + x_m)\| \to 1$ as $\min(n, m) \to \infty$;
(iii) use the uniform convexity of $X$ to show that $(x_n)$ is a Cauchy sequence; and
(iv) finally, use the fact that $k_n = \|k_n\|x_n$ to show that $(k_n)$ is also Cauchy.
This result implies that if $K$ is a closed convex subset of a uniformly convex Banach space and $x \notin K$, then there exists a unique closest point to $x$ in $K$ (just consider $K' = K - x$; see the solution of Exercise 10.1). (Lax, 2002)

10.8 Show that $(X + Y)^\perp = X^\perp \cap Y^\perp$.

10.9 Show that $E^\perp = (\operatorname{clin}(E))^\perp$.

10.10 Suppose that $M$ is a closed subspace of a Hilbert space $H$. Show that $H/M$ (see Exercise 8.9) is isometrically isomorphic to $M^\perp$ via the mapping $T : H/M \to M^\perp$ given by $T([x]) = P^\perp x$, where $P^\perp$ is the orthogonal projection onto $M^\perp$.

10.11 Continuing the analysis in Example 10.8, show that
$$e_4 = \sqrt{\frac78}\,(5x^3 - 3x).$$

10.12 The polynomials from Example 10.8 and the previous exercise are closely related to the Legendre polynomials $(P_n)$. The $n$th Legendre polynomial is given by the formula
$$P_n(x) = \frac{1}{2^n n!}\frac{d^n}{dx^n}u_n(x), \qquad u_n(x) := (x^2 - 1)^n.$$
Noting that $u_n^{(j)}(\pm1) = 0$ for all $j = 0, \ldots, n - 1$, show that
$$\int_{-1}^1 x^k u_n^{(n)}(x)\,dx = 0$$
whenever $0 \le k < n$, and deduce that $(P_m, P_n) = 0$ for $m \ne n$. (We use $u_n^{(j)}$ to denote the $j$th derivative of $u_n$.) (Rynne and Youngson, 2008)

11 Linear Maps between Normed Spaces

We now consider linear maps between general normed spaces: throughout the chapter unless explicitly stated otherwise $X$ and $Y$ are normed spaces with norms $\|\cdot\|_X$ and $\|\cdot\|_Y$, respectively. We say that a linear map $T : X \to Y$ is bounded if
$$\|Tx\|_Y \le M\|x\|_X \quad\text{for every } x \in X$$
for some $M > 0$; we will show that this is equivalent to continuity of $T$. After giving a number of examples we show that if $Y$ is a Banach space, then the collection of all bounded linear maps from $X$ into $Y$ is a Banach space when equipped with the norm
$$\|T\|_{B(X,Y)} = \inf\{M : \|Tx\|_Y \le M\|x\|_X\}.$$
We then discuss inverses of linear maps, pointing out that a bounded linear map can have an unbounded inverse; we reserve the term ‘invertible’ for those maps whose inverse is also bounded. We will return to the particular case of linear maps between Hilbert spaces in the next chapter.

11.1 Bounded Linear Maps

Recall from Section 1.5 that if $U$ and $V$ are vector spaces over $\mathbb{K}$, then a map $T : U \to V$ is linear if
$$T(\alpha x + \beta y) = \alpha Tx + \beta Ty \quad\text{for all } \alpha, \beta \in \mathbb{K},\ x, y \in U;$$
the collection $L(U, V)$ of all linear maps from $U$ into $V$ is a vector space; and we write $L(U)$ for $L(U, U)$.

Definition 11.1 A linear map $T : (X, \|\cdot\|_X) \to (Y, \|\cdot\|_Y)$ is bounded if there exists a constant $M$ such that
$$\|Tx\|_Y \le M\|x\|_X \quad\text{for all } x \in X. \tag{11.1}$$

Linear maps defined on finite-dimensional spaces are automatically bounded.

Lemma 11.2 If $X$ is a finite-dimensional vector space, then any linear map $T : (X, \|\cdot\|_X) \to (Y, \|\cdot\|_Y)$ is bounded.

Proof If $E = \{e_j\}_{j=1}^n$ is a basis for $X$, then recall that
$$\Big\|\sum_{j=1}^n \alpha_j e_j\Big\|_E = \Big(\sum_{j=1}^n |\alpha_j|^2\Big)^{1/2}$$
defines a norm on $X$ (Lemma 3.7). For any $x \in X$ with $x = \sum_{j=1}^n \alpha_j e_j$ we have
$$\begin{aligned}
\|Tx\|_Y &= \Big\|T\Big(\sum_{j=1}^n \alpha_j e_j\Big)\Big\|_Y = \Big\|\sum_{j=1}^n \alpha_j Te_j\Big\|_Y \le \sum_{j=1}^n |\alpha_j|\,\|Te_j\|_Y \\
&\le \Big(\sum_{j=1}^n \|Te_j\|_Y^2\Big)^{1/2}\Big(\sum_{j=1}^n |\alpha_j|^2\Big)^{1/2} = C\|x\|_E,
\end{aligned}$$
where $C := \big(\sum_{j=1}^n \|Te_j\|_Y^2\big)^{1/2}$. Since $X$ is finite-dimensional, all norms on $X$ are equivalent (Theorem 5.1); in particular, we have $\|x\|_E \le C'\|x\|_X$ for some $C' > 0$. It follows that $\|Tx\|_Y \le CC'\|x\|_X$, so $T$ is bounded from $(X, \|\cdot\|_X)$ into $(Y, \|\cdot\|_Y)$ as claimed.

However, linear operators on infinite-dimensional spaces need not be bounded. For example,¹ the linear map $T : (c_{00}, \ell^2) \to \ell^2$ that maps $e^{(j)}$ to $j e^{(j)}$ is not bounded; nor is the linear map from $(C^1([0, 1]), \|\cdot\|_\infty) \to (C([0, 1]), \|\cdot\|_\infty)$ given by $f \mapsto f'$, since $x^n \mapsto n x^{n-1}$ for any $n \in \mathbb{N}$.

¹ Recall that $c_{00}$ is the space of all sequences with only a finite number of non-zero terms; see Example 1.8.

Lemma 11.3 A linear map $T : X \to Y$ is continuous if and only if it is bounded.

Proof Suppose that $T$ is bounded; then for some $M > 0$
$$\|Tx_n - Tx\|_Y = \|T(x_n - x)\|_Y \le M\|x_n - x\|_X,$$
and so $T$ is continuous. Now suppose that $T$ is continuous; then in particular it is continuous at zero, and so, taking $\varepsilon = 1$ in the definition of continuity, there exists a $\delta > 0$ such that
$$\|Tx\| \le 1 \quad\text{for all } \|x\| \le \delta.$$
It follows that for $z \ne 0$
$$\|Tz\| = \Big\|\frac{\|z\|}{\delta}\,T\Big(\frac{\delta z}{\|z\|}\Big)\Big\| = \frac{\|z\|}{\delta}\Big\|T\Big(\frac{\delta z}{\|z\|}\Big)\Big\| \le \frac1\delta\|z\|,$$
and so $T$ is bounded.

The space of all bounded linear maps from $X$ into $Y$ is denoted by $B(X, Y)$; we write $B(X)$ for the space $B(X, X)$ of all bounded linear maps from $X$ into itself.

Definition 11.4 The norm in $B(X, Y)$ or operator norm of a linear map $T : X \to Y$ is the smallest value of $M$ such that (11.1) holds,
$$\|T\|_{B(X,Y)} := \inf\{M : \|Tx\|_Y \le M\|x\|_X \text{ for all } x \in X\}. \tag{11.2}$$

The infimum in (11.2) is attained: since for each $x \in X$, $\|Tx\|_Y \le M\|x\|_X$ for every $M > \|T\|_{B(X,Y)}$, it follows that
$$\|Tx\|_Y \le \|T\|_{B(X,Y)}\|x\|_X \quad\text{for all } x \in X.$$

Note that from now on we will use the terms ‘linear map’ and ‘linear operator’ interchangeably. The former is perhaps more common when considering maps between vector spaces, but as many of the examples in applications are differential operators, the latter terminology is more common when considering spectral theory, for example.

We now show that (11.2) really does define a norm on $B(X, Y)$.

Lemma 11.5 As defined in (11.2) $\|\cdot\|_{B(X,Y)}$ is a norm on $B(X, Y)$.

140

Linear Maps between Normed Spaces

Proof If T  B(X,Y ) = 0, then for every x ∈ X we have T xY = 0, which shows that T x = 0 and so T = 0. Since, by definition, (λT )x = λT x, the homogeneity property λT  B(X,Y ) = |λ|T  B(X,Y ) is immediate, and for the triangle inequality observe that (T + S)xY = T x + SxY ≤ T xY + SxY ≤ T  B(X,Y ) x X + S B(X,Y ) x X   = T  B(X,Y ) + S B(X,Y ) x X , from which it follows that T + S B(X,Y ) ≤ T  B(X,Y ) + S B(X,Y ) as required. Lemma 11.6 The norm in B(X, Y ) is also given by T  B(X,Y ) = sup T xY . x X =1

(11.3)

Proof Let us denote by T 1 the value defined in (11.2), and by T 2 the value defined in (11.3). Then, given x = 0, we have * * * * *T x * ≤ T 2 i.e. T xY ≤ T 2 x X , * x * X Y and so T 1 ≤ T 2 . It is also clear that if x X = 1 then T xY ≤ T 1 x X = T 1 , and so T 2 ≤ T 1 . It follows that T 1 = T 2 . We also have T xY ; x =0 x X

T  B(X,Y ) = sup T xY = sup x X ≤1

(11.4)

see Exercise 11.1.

We remarked in Section 1.5 that the composition of two linear maps is linear. In a similar way, the composition of two bounded linear maps is another bounded linear map:
$$T \in B(X, Y), \ S \in B(Y, Z) \quad \Rightarrow \quad S \circ T \in B(X, Z),$$
since
$$\|(S \circ T)x\|_Z \le \|S\|_{B(Y,Z)} \|Tx\|_Y \le \|S\|_{B(Y,Z)} \|T\|_{B(X,Y)} \|x\|_X, \tag{11.5}$$
and so
$$\|S \circ T\|_{B(X,Z)} \le \|S\|_{B(Y,Z)} \|T\|_{B(X,Y)}. \tag{11.6}$$

In particular, it follows that if $T \in B(X)$, then $T^n \in B(X)$, where $T^n$ is $T$ composed with itself $n$ times, and $\|T^n\|_{B(X)} \le \|T\|_{B(X)}^n$.
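The characterisation (11.3) and the bound (11.6) can be checked numerically in the simplest setting. The following is only a sketch under stated assumptions: it takes $X = Y = Z = \mathbb{R}^2$ with the Euclidean norm, uses two matrices `T` and `S` chosen purely for illustration, and estimates the supremum in (11.3) by sampling the unit circle, so the returned values are approximations from below.

```python
import math

def mat_vec(a, x):
    """Apply the 2x2 matrix a (a list of rows) to the vector x."""
    return [a[0][0] * x[0] + a[0][1] * x[1],
            a[1][0] * x[0] + a[1][1] * x[1]]

def mat_mul(a, b):
    """Matrix of the composition x -> a(b x)."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def op_norm(a, samples=20000):
    """Estimate sup{|a x| : |x| = 1} by sampling the unit circle, as in (11.3)."""
    best = 0.0
    for k in range(samples):
        t = 2.0 * math.pi * k / samples
        y = mat_vec(a, [math.cos(t), math.sin(t)])
        best = max(best, math.hypot(y[0], y[1]))
    return best

# Illustrative matrices (assumptions, not taken from the text).
T = [[1.0, 2.0], [0.0, 1.0]]
S = [[0.0, 1.0], [3.0, 0.0]]

norm_T = op_norm(T)
norm_S = op_norm(S)
norm_ST = op_norm(mat_mul(S, T))     # norm of the composition S o T
norm_T2 = op_norm(mat_mul(T, T))     # norm of T composed with itself

print(norm_ST, "<=", norm_S * norm_T)
print(norm_T2, "<=", norm_T ** 2)
```

For these particular matrices both inequalities are strict, which illustrates that (11.6) is an upper bound and not in general an equality.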

11.2 Some Examples of Bounded Linear Maps

When there is no room for confusion we will omit the $B(X, Y)$ subscript on the norm of a linear map. If $T : X \to Y$, then in order to find $\|T\|$ one can try the following: first show that
$$\|Tx\|_Y \le M \|x\|_X \tag{11.7}$$
for some $M > 0$, i.e. show that $T$ is bounded. It follows that $\|T\| \le M$ (since $\|T\|$ is the infimum of all $M$ such that (11.7) holds). Then, in order to show that in fact $\|T\| = M$, find an example of a particular $z \in X$ such that
$$\|Tz\|_Y = M \|z\|_X.$$
This shows from the definition in (11.4) that $\|T\| \ge M$ and hence that in fact $\|T\| = M$.

Example 11.7 Consider the right- and left-shift operators $s_r : \ell^2 \to \ell^2$ and $s_l : \ell^2 \to \ell^2$, given by
$$s_r(x) = (0, x_1, x_2, \ldots) \quad \text{and} \quad s_l(x) = (x_2, x_3, x_4, \ldots).$$
Both operators are linear with $\|s_r\| = \|s_l\| = 1$.

Proof It is clear that the operators are linear. We have
$$\|s_r(x)\|_{\ell^2}^2 = \sum_{i=1}^\infty |x_i|^2 = \|x\|_{\ell^2}^2,$$
so that $\|s_r\| = 1$, and
$$\|s_l(x)\|_{\ell^2}^2 = \sum_{i=2}^\infty |x_i|^2 \le \|x\|_{\ell^2}^2,$$
so that $\|s_l\| \le 1$. However, if we choose an $x$ with
$$x = (0, x_2, x_3, \ldots)$$
then we have
$$\|s_l(x)\|_{\ell^2}^2 = \sum_{j=2}^\infty |x_j|^2 = \|x\|_{\ell^2}^2,$$

and so we must have sl  = 1. In other cases one may need to do a little more; for example, given the bound T xY ≤ Mx X find a sequence (z n ) ∈ X such that T z n Y →M z n  X as n → ∞, which shows, using (11.4), that T  ≥ M and hence that T  = M. Example 11.8 Take X = L 2 (a, b) and, for some g ∈ C([a, b]), define the multiplication operator T from L 2 (a, b) into itself by [T f ](x) := f (x)g(x)

x ∈ [a, b].

Then T is linear and T  B(X ) = g∞ . Proof It is clear that T is linear. For the upper bound on T  B(X ) observe that ˆ b | f (x)g(x)|2 dx T f 2L 2 = a

ˆ

b

=

| f (x)|2 |g(x)|2 dx

a

≤

ˆ max |g(x)|

a≤x≤b

2

b

| f (x)|2 dx;

a

so T f  L 2 ≤ g∞  f  L 2 , i.e. T  B(X ) ≤ g∞ . Now let s be a point at which |g| attains its maximum. Assume for simplicity that s ∈ (a, b), and for each ε > 0 consider

1 |x − s| < ε f ε (x) = 0 otherwise, then T f ε 2 1 = 2 2ε  fε 

ˆ

s+ε

s−ε

|g(x)|2 dx → |g(s)|2

as

ε→0

11.2 Some Examples of Bounded Linear Maps

143

since g is continuous. Therefore in fact T  B(X ) = g∞ . If s = a, then we replace |x − s| < ε in the definition of f ε by a ≤ x < a + ε, and if s = b we replace it by b − ε < x ≤ b; the rest of the argument is identical. Example 11.9 Consider the map from X = C([a, b]) to R given by ˆ b φ(x) f (x) dx, Tf = a

where φ ∈ C([a, b]). Then T is linear with T  B(X ;R) = φ L 1 . Proof Linearity is clear. For the upper bound we have ˆ b  f ∞ |φ(x)| d x =  f ∞ φ L 1 . |T f | ≤ a

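The lower bound is a little more involved. Ideally we would choose
$$f(x) = \operatorname{sign}(\phi(x)) = \begin{cases} +1 & \phi(x) > 0 \\ 0 & \phi(x) = 0 \\ -1 & \phi(x) < 0, \end{cases}$$
as for such an $f$ we have $\|f\|_\infty = 1$ and
$$\int_a^b \phi(x) f(x) \, dx = \int_a^b |\phi(x)| \, dx = \|\phi\|_{L^1} = \|\phi\|_{L^1} \|f\|_\infty;$$
however, the function $f$ will not be continuous if $\phi$ changes sign. We therefore consider a sequence of continuous functions $f_\varepsilon$ that approximate this choice of $f$, setting
$$f_\varepsilon(x) = \frac{\phi(x)}{|\phi(x)| + \varepsilon}$$
for $\varepsilon > 0$. Since $\phi$ attains its maximum on $[a, b]$, we have
$$\frac{\|\phi\|_\infty}{\|\phi\|_\infty + \varepsilon} \le \|f_\varepsilon\|_\infty \le 1.$$
Then
$$\int_a^b |\phi(x)| \, dx - \int_a^b \phi(x) f_\varepsilon(x) \, dx = \int_a^b \left( |\phi(x)| - \frac{|\phi(x)|^2}{|\phi(x)| + \varepsilon} \right) dx = \int_a^b \frac{\varepsilon |\phi(x)|}{|\phi(x)| + \varepsilon} \, dx \le 2(b - a)\varepsilon.$$
This shows that
$$\left| \int_a^b \phi(x) f_\varepsilon(x) \, dx \right| \ge \|\phi\|_{L^1} - 2(b - a)\varepsilon \ge \left( \|\phi\|_{L^1} - 2(b - a)\varepsilon \right) \|f_\varepsilon\|_\infty;$$
letting $\varepsilon \to 0$ guarantees that $\|T\| \ge \|\phi\|_{L^1}$, yielding the required equality.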
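The approximation argument in Example 11.9 can be observed numerically. The sketch below makes several assumptions that are not in the text: the interval is $[0, 1]$, the weight is $\phi(x) = x - 1/2$ (which changes sign, with $\|\phi\|_{L^1} = 1/4$), and integrals are approximated by a midpoint rule in the hypothetical helper `integrate`.

```python
def integrate(h, a, b, n=20000):
    """Midpoint-rule approximation of the integral of h over [a, b]."""
    step = (b - a) / n
    return step * sum(h(a + (i + 0.5) * step) for i in range(n))

a, b = 0.0, 1.0

def phi(x):
    """Illustrative weight function; it changes sign at x = 1/2."""
    return x - 0.5

def T(f):
    """The functional T f = integral of phi * f over [a, b]."""
    return integrate(lambda x: phi(x) * f(x), a, b)

def f_eps(eps):
    """Continuous approximation phi / (|phi| + eps) of sign(phi)."""
    return lambda x: phi(x) / (abs(phi(x)) + eps)

l1_norm = integrate(lambda x: abs(phi(x)), a, b)

# Gap between the L^1 norm of phi and T f_eps for decreasing eps.
gaps = [(eps, l1_norm - T(f_eps(eps))) for eps in (0.1, 0.01, 0.001)]
for eps, gap in gaps:
    print(eps, gap)
```

Each gap is positive and shrinks with $\varepsilon$, consistent with the estimate in the proof; since $\|f_\varepsilon\|_\infty \le 1$, the values $T f_\varepsilon$ are lower bounds for $\|T\|$ that approach $\|\phi\|_{L^1}$.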

We now consider another example to which we will return a number of times in what follows.

Example 11.10 Take $X := L^2(a, b)$ and $K \in C([a, b] \times [a, b])$. Define $T : X \to X$ as the integral operator
$$(Tf)(x) = \int_a^b K(x, y) f(y) \, dy \quad \text{for all} \quad x \in [a, b].$$
Then $T$ is a bounded linear map with
$$\|T\|_{B(X)}^2 \le \int_a^b \int_a^b |K(x, y)|^2 \, dx \, dy. \tag{11.8}$$

Proof The operator $T$ is clearly linear, and
$$\|Tf\|_{L^2}^2 = \int_a^b \left| \int_a^b K(x, y) f(y) \, dy \right|^2 dx \le \int_a^b \left( \int_a^b |K(x, y)|^2 \, dy \right) \left( \int_a^b |f(y)|^2 \, dy \right) dx = \left( \int_a^b \int_a^b |K(x, y)|^2 \, dy \, dx \right) \|f\|_{L^2}^2,$$
where we have used the Cauchy–Schwarz inequality to obtain the inequality. This yields (11.8).

This example can be extended to treat the case $K \in L^2((a, b) \times (a, b))$; we now need to appeal to the Fubini–Tonelli Theorem (Theorem B.9) to justify the integration steps. Note that this upper bound on the operator norm can be strict; see Exercise 11.10.
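As a rough numerical companion to (11.8), the sketch below evaluates both sides for one hypothetical kernel. The assumptions are all mine: the interval is $(0, 1)$, the kernel is the rank-one choice $K(x, y) = xy$, the test function is $f(y) = y$, and integrals are approximated with a midpoint rule. For a rank-one kernel of this form the ratio $\|Tf\| / \|f\|$ for this $f$ already matches the right-hand side of (11.8), so here the bound is attained.

```python
import math

def integrate(h, a, b, n=400):
    """Midpoint-rule approximation of the integral of h over [a, b]."""
    step = (b - a) / n
    return step * sum(h(a + (i + 0.5) * step) for i in range(n))

a, b = 0.0, 1.0

def K(x, y):
    """Illustrative rank-one kernel (an assumption, not from the text)."""
    return x * y

def f(y):
    """Test function used to bound the operator norm from below."""
    return y

def Tf(x):
    """(T f)(x) = integral over y of K(x, y) f(y)."""
    return integrate(lambda y: K(x, y) * f(y), a, b)

def l2_norm(h):
    """Approximate L^2(a, b) norm of h."""
    return math.sqrt(integrate(lambda x: h(x) ** 2, a, b))

ratio = l2_norm(Tf) / l2_norm(f)        # lower bound for the operator norm
hs_bound = math.sqrt(
    integrate(lambda x: integrate(lambda y: K(x, y) ** 2, a, b), a, b)
)                                        # square root of the right side of (11.8)

print(ratio, hs_bound)
```

Both numbers are close to $1/3$. For kernels that are not of rank one the inequality can be strict.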

11.3 Completeness of B(X, Y) When Y Is Complete

The space $B(X, Y)$ (with the norm defined in (11.2)) is a Banach space whenever $Y$ is a Banach space. Remarkably this does not depend on whether the space $X$ is complete or not.

Theorem 11.11 If $X$ is a normed space and $Y$ is a Banach space, then $B(X, Y)$ is a Banach space.

Proof Given any Cauchy sequence $(T_n)$ in $B(X, Y)$ we need to show that $T_n \to T$ for some $T \in B(X, Y)$. Since $(T_n)$ is Cauchy, given $\varepsilon > 0$ there exists an $N_\varepsilon$ such that
$$\|T_n - T_m\|_{B(X,Y)} \le \varepsilon \quad \text{for all} \quad n, m \ge N_\varepsilon. \tag{11.9}$$
We now show that for every fixed $x \in X$ the sequence $(T_n x)$ is Cauchy in $Y$. This follows since
$$\|T_n x - T_m x\|_Y = \|(T_n - T_m)x\|_Y \le \|T_n - T_m\|_{B(X,Y)} \|x\|_X, \tag{11.10}$$
and $(T_n)$ is Cauchy in $B(X, Y)$. Since $Y$ is complete, it follows that
$$T_n x \to z$$
for some $z \in Y$, which depends on $x$. We can therefore define a mapping $T : X \to Y$ by setting $Tx = z$.

Now that we have identified our expected limit we need to make sure that $T \in B(X, Y)$ and that $T_n \to T$ in $B(X, Y)$. First, $T$ is linear since for any $x, y \in X$, $\alpha, \beta \in \mathbb{K}$,
$$T(\alpha x + \beta y) = \lim_{n \to \infty} T_n(\alpha x + \beta y) = \alpha \lim_{n \to \infty} T_n x + \beta \lim_{n \to \infty} T_n y = \alpha Tx + \beta Ty.$$
To show that $T$ is bounded take $n, m \ge N_\varepsilon$ (from (11.9)) in (11.10), and let $m \to \infty$. Since $T_m x \to Tx$ this limiting process shows that
$$\|T_n x - Tx\|_Y \le \varepsilon \|x\|_X. \tag{11.11}$$
Since (11.11) holds for every $x$, it follows that
$$\|T_n - T\|_{B(X,Y)} \le \varepsilon \quad \text{for } n \ge N_\varepsilon. \tag{11.12}$$
In particular, $T_{N_\varepsilon} - T \in B(X, Y)$, and since $B(X, Y)$ is a vector space and we have $T_{N_\varepsilon} \in B(X, Y)$, it follows that $T \in B(X, Y)$. Finally, (11.12) also shows that $T_n \to T$ in $B(X, Y)$.


11.4 Kernel and Range

Given a linear map $T : X \to Y$, recall (see Definition 1.20) that we define its kernel to be
$$\operatorname{Ker}(T) := \{x \in X : Tx = 0\}$$
and its range to be
$$\operatorname{Range}(T) := \{y \in Y : y = Tx \text{ for some } x \in X\}.$$

Lemma 11.12 If $T \in B(X, Y)$, then $\operatorname{Ker} T$ is a closed linear subspace of $X$.

Proof Given any $x, y \in \operatorname{Ker}(T)$ we have $T(\alpha x + \beta y) = \alpha Tx + \beta Ty = 0$. Furthermore, if $x_n \to x$ and $Tx_n = 0$ (i.e. $x_n \in \operatorname{Ker}(T)$), then since $T$ is continuous $Tx = \lim_{n \to \infty} Tx_n = 0$, so $\operatorname{Ker}(T)$ is closed.

For a more topological proof, we could simply note that $\operatorname{Ker}(T) = T^{-1}(\{0\})$ and so is the preimage under $T$ of $\{0\}$, which is a closed subset of $Y$, and so $\operatorname{Ker}(T)$ is closed in $X$ since $T$ is continuous (Lemma 2.13).

While $T \in B(X, Y)$ implies that $\operatorname{Ker}(T)$ is closed, the same is not true for the range of $T$: it need not be closed. Indeed, consider the map from $\ell^2$ into itself given by
$$Tx = \left( x_1, \frac{x_2}{2}, \frac{x_3}{3}, \frac{x_4}{4}, \ldots \right). \tag{11.13}$$
Since
$$\|Tx\|_{\ell^2}^2 = \sum_{j=1}^\infty \frac{1}{j^2} |x_j|^2 \le \sum_{j=1}^\infty |x_j|^2 = \|x\|_{\ell^2}^2,$$
we have $\|T\| \le 1$ and $T$ is bounded. Now consider $y^{(n)} \in \operatorname{Range}(T)$, where
$$y^{(n)} = T(\underbrace{1, 1, 1, \ldots, 1}_{\text{first } n \text{ terms}}, 0, \ldots) = \left( 1, \frac{1}{2}, \frac{1}{3}, \ldots, \frac{1}{n}, 0, \ldots \right).$$
We have $y^{(n)} \to y$, where $y$ is the element of $\ell^2$ with $y_j = j^{-1}$ (observe that $y \in \ell^2$ since $\sum_{j=1}^\infty j^{-2} < \infty$). However, there is no $x \in \ell^2$ such that $T(x) = y$: the only candidate is $x = (1, 1, 1, \ldots)$, but this is not in $\ell^2$ since its $\ell^2$ norm is not finite, so $y \notin \operatorname{Range}(T)$.

11.5 Inverses and Invertibility

We have already discussed the existence of inverses for linear maps between general vector spaces in Section 1.5, and we showed that a bijective linear map $T : X \to Y$ has an inverse $T^{-1} : Y \to X$ that is also linear (Lemma 1.24). However, in the context of bounded linear maps between normed spaces there is no guarantee that the inverse is also bounded.

As a somewhat artificial example, consider the subspace $c_{00}$ of $\ell^\infty$ consisting of sequences with only a finite number of non-zero terms. Then the linear map $T : c_{00} \to c_{00}$ defined by setting
$$T e^{(n)} = \frac{1}{n} e^{(n)}$$
is bijective and bounded (it has norm 1). However, the inverse map is not bounded, since it maps $e^{(n)}$ to $n e^{(n)}$.

To avoid this sort of pathology we incorporate boundedness into our requirement of invertibility. This gives rise to the somewhat strange situation in which $T \in B(X, Y)$ can ‘have an inverse’ but not be ‘invertible’.

Definition 11.13 An operator $T \in B(X, Y)$ is invertible if there exists an $S \in B(Y, X)$ such that$^2$ $ST = I_X$ and $TS = I_Y$, and then $T^{-1} = S$ is the inverse of $T$.

$^2$ As before, we denote by $I_X : X \to X$ the identity map from $X$ to itself.

Before continuing we make the (almost trivial but very useful) observation that for any non-zero $\alpha \in \mathbb{K}$, $T$ is invertible if and only if $\alpha T$ is invertible.

We now relate the concept of invertibility from this definition and the usual notion of an inverse when $T : X \to Y$ is a bijection. We will see later in Theorem 23.2 that if $X$ and $Y$ are both Banach spaces and $T : X \to Y$ is a bijection, then $T^{-1}$ is automatically bounded.

Lemma 11.14 Suppose that $X$ and $Y$ are both normed spaces. Then for any $T \in B(X, Y)$, the following are equivalent:

(i) $T$ is invertible;
(ii) $T$ is a bijection and $T^{-1} \in B(Y, X)$;
(iii) $T$ is onto and for some $c > 0$
$$\|Tx\|_Y \ge c \|x\|_X \quad \text{for every } x \in X. \tag{11.14}$$


Proof We showed in Lemma 1.24 that (i) implies (ii) apart from the boundedness of $T^{-1}$, but this is part of the definition of invertibility in Definition 11.13. That (ii) implies (i) is clear.

We now show that (i) ⇒ (iii). If $T$ is invertible, then it is onto (by (i) ⇒ (ii)), and since $T^{-1} \in B(Y, X)$ we have
$$\|T^{-1} y\|_X \le M \|y\|_Y \quad \text{for every } y \in Y,$$
for some $M > 0$; choosing $y = Tx$ we obtain $\|x\|_X \le M \|Tx\|_Y$, which yields (iii) with $c = 1/M$.

Finally, we show that (iii) ⇒ (ii). Note first that the lower bound on $Tx$ in (iii) implies that $T$ is one-to-one, since if $Tx = Tx'$, then
$$0 = \|T(x - x')\|_Y \ge c \|x - x'\|_X \quad \Rightarrow \quad x = x'.$$
Since $T$ is assumed to be onto, it is therefore a bijection. That the resulting inverse map $T^{-1} : Y \to X$ is bounded follows if we set $x = T^{-1} y$ in (11.14), since this yields $\|y\|_Y \ge c \|T^{-1} y\|_X$.

Corollary 11.15 If $X$ is finite-dimensional, then a linear operator $T : X \to X$ is invertible if and only if $\operatorname{Ker}(T) = \{0\}$.

Proof Since $X$ is finite-dimensional, we can use Lemma 1.22, which guarantees that $T$ is injective if and only if it is surjective; injectivity is equivalent to bijectivity in this case, and $T$ is injective if and only if $\operatorname{Ker}(T) = \{0\}$ (Lemma 1.21). Under this condition $T^{-1}$ exists, and is linear by Lemma 1.24. Lemma 11.2 guarantees that $T^{-1}$ is bounded, and so $T$ is invertible.

We can use the equivalence between (i) and (iii) in Lemma 11.14 to prove the following useful result.

Lemma 11.16 If $X$ and $Y$ are Banach spaces and $T \in B(X, Y)$ is invertible, then so is $T + S$ for any $S \in B(X, Y)$ with $\|S\| \|T^{-1}\| < 1$. Consequently, the subset of $B(X, Y)$ consisting of invertible operators is open.

For an alternative (and more ‘traditional’) proof see Exercises 11.5 and 11.6.

Proof Suppose that $T \in B(X, Y)$ is invertible; then by (i) ⇒ (iii) we know that $T$ is onto and that
$$\|Tx\|_Y \ge \frac{1}{\|T^{-1}\|} \|x\|_X.$$
We will show that for any $S \in B(X, Y)$ with $\|S\| \|T^{-1}\| = \alpha < 1$, $T + S$ is invertible. First we show that $T + S$ is onto: given $y \in Y$, we want to ensure that there is an $x \in X$ such that $(T + S)x = y$.


Consider the map $\mathcal{I} : X \to X$ defined by setting
$$x \mapsto \mathcal{I}(x) := T^{-1}(y - Sx).$$
Then
$$\|\mathcal{I}(x) - \mathcal{I}(x')\|_X = \left\| T^{-1}(y - Sx) - T^{-1}(y - Sx') \right\|_X = \left\| T^{-1} S (x - x') \right\|_X \le \|T^{-1}\| \|S\| \|x - x'\|_X = \alpha \|x - x'\|_X,$$
where $\alpha < 1$ by assumption. Since $X$ is a Banach space, we can use the Contraction Mapping Theorem (Theorem 4.15) to ensure that there is a unique $x \in X$ such that $x = \mathcal{I}(x)$, i.e. such that
$$x = T^{-1}(y - Sx).$$
Applying $T$ to both sides guarantees that $y = (T + S)x$ and so $T + S$ is onto.

We now just have to check that $T + S$ is bounded below in the sense of (iii). Note that since $\|S\| \|T^{-1}\| < 1$ we have
$$\frac{1}{\|T^{-1}\|} - \|S\| = c > 0.$$
Therefore
$$\|(T + S)x\|_Y \ge \|Tx\|_Y - \|Sx\|_Y \ge \frac{1}{\|T^{-1}\|} \|x\|_X - \|S\| \|x\|_X = c \|x\|_X.$$
Now using (iii) ⇒ (i) we deduce that $T + S$ is invertible.

We finish this section by considering inverses of products.

Lemma 11.17 If $T \in B(X, Y)$ and $S \in B(Y, Z)$ are invertible, then so is $ST \in B(X, Z)$, and
$$(ST)^{-1} = T^{-1} S^{-1}.$$

Proof We have $T^{-1} \in B(Y, X)$ and $S^{-1} \in B(Z, Y)$, so (see (11.6)) $T^{-1} S^{-1} \in B(Z, X)$, and
$$T^{-1} S^{-1} S T = I_X \quad \text{and} \quad S T T^{-1} S^{-1} = I_Z.$$
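The fixed-point construction used in the proof of Lemma 11.16 to show that $T + S$ is onto can be run directly in finite dimensions. The sketch below assumes $X = Y = \mathbb{R}^2$ with the Euclidean norm and uses the illustrative choices $T = \operatorname{diag}(2, 4)$ and a swap matrix $S$, so that $\|T^{-1}\| = 1/2$, $\|S\| = 1$ and $\alpha = 1/2$; none of these values come from the text.

```python
def apply(m, x):
    """Apply the 2x2 matrix m (a list of rows) to the vector x."""
    return [m[0][0] * x[0] + m[0][1] * x[1],
            m[1][0] * x[0] + m[1][1] * x[1]]

T = [[2.0, 0.0], [0.0, 4.0]]          # invertible, with inverse T_inv below
T_inv = [[0.5, 0.0], [0.0, 0.25]]     # operator norm 1/2
S = [[0.0, 1.0], [1.0, 0.0]]          # perturbation with operator norm 1
y = [3.0, 5.0]                        # right-hand side of (T + S) x = y

def fixed_point_map(x):
    """The map x -> T^{-1}(y - S x) from the proof of Lemma 11.16."""
    sx = apply(S, x)
    return apply(T_inv, [y[0] - sx[0], y[1] - sx[1]])

# Iterate the contraction starting from zero.
x = [0.0, 0.0]
for _ in range(80):
    x = fixed_point_map(x)

# Residual of the original equation (T + S) x = y.
tx, sx = apply(T, x), apply(S, x)
residual = [tx[0] + sx[0] - y[0], tx[1] + sx[1] - y[1]]
print(x, residual)
```

For this data the iterates converge to $x = (1, 1)$, and the residual of $(T + S)x = y$ is zero up to rounding.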

Two operators $T, S \in L(X, X)$ commute if $TS = ST$. We make the simple observation that if $S$ and $T$ commute and $T$ is invertible, then $S$ commutes with $T^{-1}$; indeed, since $ST = TS$, we have
$$T^{-1}[ST]T^{-1} = T^{-1}[TS]T^{-1} \quad \Rightarrow \quad T^{-1} S = S T^{-1}.$$

Proposition 11.18 If $\{T_1, \ldots, T_n\}$ are commuting operators in $B(X)$, then $T_1 \cdots T_n$ is invertible if and only if every $T_j$, $j = 1, \ldots, n$, is invertible.

Proof One direction follows from Lemma 11.17 and induction. For the other direction, suppose that $T = T_1 \cdots T_n$ is invertible; since $T_1$ commutes with $T$ it also commutes with $T^{-1}$, and so
$$T_1 [T^{-1} T_2 \cdots T_n] = T^{-1} T_1 T_2 \cdots T_n = T^{-1} T = I.$$
Since $\{T_1, \ldots, T_n\}$ commute we have
$$[T^{-1} T_2 \cdots T_n] T_1 = T^{-1} T = I$$
as well. All the operators are bounded, so $T_1$ is invertible. For other values of $j$ we can use the fact that the $\{T_j\}$ commute to reorder the factors of $T$ so that the first is $T_j$. We can now apply the same argument to show that $T_j$ is invertible.

No such result is true if the operators do not commute: if we consider the left and right shifts $s_l$ and $s_r$ on $\ell^2$, then $s_l s_r$ is the identity (so clearly invertible), but neither $s_l$ nor $s_r$ is invertible. (The left shift $s_l$ is not injective, and the right shift $s_r$ is not surjective.)
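The closing remark about the shift operators is easy to experiment with. In the sketch below a finitely supported sequence is represented by a Python list whose omitted trailing entries are understood to be zero; this representation, and the function names, are assumptions made for the illustration.

```python
import math

def shift_right(x):
    """s_r(x) = (0, x1, x2, ...)."""
    return [0.0] + list(x)

def shift_left(x):
    """s_l(x) = (x2, x3, x4, ...)."""
    return list(x[1:])

def l2_norm(x):
    """The l^2 norm of a finitely supported sequence."""
    return math.sqrt(sum(v * v for v in x))

x = [3.0, 4.0, 12.0]

left_after_right = shift_left(shift_right(x))    # s_l s_r x
right_after_left = shift_right(shift_left(x))    # s_r s_l x

print(left_after_right)    # the original sequence
print(right_after_left)    # the first entry has been replaced by zero
print(shift_left([1.0]))   # e^(1) is sent to the zero sequence
```

So $s_l s_r$ acts as the identity while $s_r s_l$ does not, and $s_l$ annihilates $e^{(1)}$, in line with the failure of injectivity noted above.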

Exercises

11.1 Show that
$$\|T\|_{B(X,Y)} = \sup_{\|x\|_X \le 1} \|Tx\|_Y$$
and
$$\|T\|_{B(X,Y)} = \sup_{x \ne 0} \frac{\|Tx\|_Y}{\|x\|_X}.$$

11.2 Let $X = C_b([0, \infty))$ with the supremum norm. Show that the map $T : X \to X$ defined by setting $[Tf](0) = f(0)$ and
$$[Tf](x) = \frac{1}{x} \int_0^x f(s) \, ds$$
is linear and bounded with $\|T\|_{B(X)} = 1$.

11.3 Show that $T \in L(X, Y)$ is bounded if and only if
$$\sum_{j=1}^\infty T x_j = T\left( \sum_{j=1}^\infty x_j \right)$$
whenever the sum on the right-hand side converges. (Pryce, 1973)

11.4 Suppose that $(T_n) \in B(X, Y)$ and $(S_n) \in B(Y, Z)$ are such that $T_n \to T$ and $S_n \to S$. Show that $S_n T_n \to ST$ in $B(X, Z)$.

11.5 Suppose that $X$ is a Banach space and $T \in B(X)$ is such that
$$\sum_{j=1}^\infty \|T^j\|_{B(X)} < \infty.$$
Show that
$$(I - T)^{-1} = I + T + T^2 + \cdots = \sum_{j=0}^\infty T^j.$$
(This is known as the Neumann series for $(I - T)^{-1}$.) In the case that $\|T\| < 1$ deduce that $\|(I - T)^{-1}\| \le (1 - \|T\|)^{-1}$.

11.6 Use the result of the previous exercise to show that if $X$ and $Y$ are Banach spaces and $T \in B(X, Y)$ is invertible, then so is $T + S$ for any $S \in B(X, Y)$ with $\|S\| \|T^{-1}\| < 1$, and then
$$\|(T + S)^{-1}\| \le \frac{\|T^{-1}\|}{1 - \|S\| \|T^{-1}\|}. \tag{11.15}$$
(This is the usual way to prove Lemma 11.16.)

11.7 Suppose that $K \in C([a, b] \times [a, b])$ with $\|K\|_\infty \le M$. Show that the operator $T$ defined on $X := C([a, b])$ by setting
$$[Tf](x) = \int_a^x K(x, y) f(y) \, dy$$
is a bounded linear operator from $X$ into itself. Show by induction that
$$|T^n f(x)| \le M^n \|f\|_\infty \frac{(x - a)^n}{n!}$$
and use the result of Exercise 11.5 to deduce that the equation
$$f(x) = g(x) + \lambda \int_a^x K(x, y) f(y) \, dy \tag{11.16}$$
has a unique solution $f \in X$ for any $g \in X$ and any $\lambda \in \mathbb{R}$.

11.8 Show that if $T \in B(X, Y)$ is a bijection, then $T$ is an isometry if and only if $\|T\|_{B(X,Y)} = \|T^{-1}\|_{B(Y,X)} = 1$.

11.9 Suppose that $X$ is a Banach space, $Y$ a normed space, and take some $T \in B(X, Y)$. Show that if there exists $\alpha > 0$ such that $\|Tx\| \ge \alpha \|x\|$, then $\operatorname{Range}(T)$ is closed. (Rynne and Youngson, 2008)

11.10 Let $e_1(x) = 1/\sqrt{2}$ and $e_2(x) = \sqrt{3/2}\, x$, which are orthonormal functions in $L^2(-1, 1)$, and set
$$K(x, y) = 1 + 6xy = 2 e_1(x) e_1(y) + 4 e_2(x) e_2(y).$$
Show that the norm of the operator $T : L^2(-1, 1) \to L^2(-1, 1)$ defined by setting
$$Tf(x) := \int_{-1}^1 K(x, y) f(y) \, dy$$
is strictly less than $\|K\|_{L^2((-1,1) \times (-1,1))}$.

11.11 Show that if $X$ is a Banach space and $T \in B(X)$, then
$$\exp(T) := \sum_{k=0}^\infty \frac{T^k}{k!}$$
defines an element of $B(X)$.

12 Dual Spaces and the Riesz Representation Theorem

If X is a normed space over K, then a linear map from X into K is called a linear functional on X . Linear functionals therefore take elements of X (which could be a very abstract space) and return a number. It is one of the central observations in functional analysis that understanding all of these linear functionals on X (the ‘dual space’ X ∗ ) gives us a good understanding of the space X itself. In this chapter we concentrate on linear functionals on Hilbert spaces and show that any bounded linear functional f : H → K must actually be of the form f (x) = (x, y) for some y ∈ H (this is the Riesz Representation Theorem). We also discuss a more geometric interpretation of this result and show how linear functionals are closely related to hyperplanes (sets of codimension one) in H .

12.1 The Dual Space

We denote by $X^*$ the collection of all bounded linear functionals on $X$, i.e. $X^* = B(X, \mathbb{K})$; we equip $X^*$ with the norm
$$\|f\|_{X^*} = \sup_{\|x\| = 1} |f(x)| \quad \text{for each } f \in X^*,$$
i.e. the standard norm in $B(X, \mathbb{K})$. The space $X^*$ is called$^1$ the dual (space) of $X$.

$^1$ Strictly speaking there is a distinction to be made between the ‘algebraic dual’ of $X$, which is the collection of all linear functionals on $X$, and the ‘normed dual’ of $X$, which is the normed space formed by this collection of all bounded linear functionals, equipped with the $B(X, \mathbb{K})$ norm.

Example 12.1 Take $X = \mathbb{R}^n$. Then if $e^{(j)}$ is the $j$th coordinate vector, we have $x = \sum_{j=1}^n x_j e^{(j)}$, and so if $f : \mathbb{R}^n \to \mathbb{R}$ is linear, then
$$f(x) = f\left( \sum_{j=1}^n x_j e^{(j)} \right) = \sum_{j=1}^n x_j f(e^{(j)});$$
if we write $y$ for the element of $\mathbb{R}^n$ with $y_j = f(e^{(j)})$, then we can write this as
$$f(x) = \sum_{j=1}^n x_j y_j = (x, y). \tag{12.1}$$
So with any $f \in (\mathbb{R}^n)^*$ we can associate some $y \in \mathbb{R}^n$ such that (12.1) holds; since
$$|f(x)| \le \|y\|_{\ell^2} \|x\|_{\ell^2} \quad \text{and} \quad |f(y)| = \|y\|_{\ell^2}^2,$$
it follows that $\|f\|_{(\mathbb{R}^n)^*} = \|y\|_{\ell^2}$. In this way $(\mathbb{R}^n)^* \equiv \mathbb{R}^n$.

Example 12.2 Let $X$ be $L^2(a, b)$, take any $\phi \in L^2(a, b)$, and consider the map $f : L^2(a, b) \to \mathbb{R}$ defined by setting
$$f(u) = \int_a^b \phi(t) u(t) \, dt.$$
Then
$$|f(u)| = \left| \int_a^b \phi(t) u(t) \, dt \right| = |(\phi, u)_{L^2}| \le \|\phi\|_{L^2} \|u\|_{L^2},$$
using the Cauchy–Schwarz inequality, and so $f \in X^*$ with $\|f\|_{X^*} \le \|\phi\|_{L^2}$. If we choose $u = \phi / \|\phi\|_{L^2}$, then $\|u\|_{L^2} = 1$ and
$$|f(u)| = \int_a^b \frac{|\phi(t)|^2}{\|\phi\|_{L^2}} \, dt = \|\phi\|_{L^2},$$
and so $\|f\|_{X^*} = \|\phi\|_{L^2}$.

This example shows that any element $u$ of $L^2$ gives rise to a bounded linear functional on $L^2$, which is defined by taking the inner product with $u$. It is natural to ask whether any bounded linear functional on $L^2$ can be obtained in this way, and remarkably this is true, not only for $L^2$ but for any Hilbert space. We will prove this result, the Riesz Representation Theorem, in the next section: it is one of the most useful fundamental properties of Hilbert spaces.
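The construction in Example 12.1 translates directly into code. In the sketch below the functional `f` on $\mathbb{R}^3$ is a hypothetical example chosen for illustration; the representing vector $y$ is recovered by evaluating `f` on the coordinate vectors, exactly as in the text.

```python
import math

def f(x):
    """A hypothetical linear functional on R^3 (not taken from the text)."""
    return 2.0 * x[0] - x[1] + 0.5 * x[2]

def inner(u, v):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(u, v))

n = 3
# Coordinate vectors e^(1), ..., e^(n).
basis = [[1.0 if i == j else 0.0 for i in range(n)] for j in range(n)]

# y_j = f(e^(j)), as in Example 12.1.
y = [f(e) for e in basis]
norm_y = math.sqrt(inner(y, y))

x = [1.0, -2.0, 4.0]
print(f(x), inner(x, y))        # the two values agree, as in (12.1)
print(f(y), norm_y ** 2)        # f(y) equals the squared norm of y
```

The second line of output reflects the identity $|f(y)| = \|y\|^2$, which is what shows that the Cauchy–Schwarz bound on $\|f\|$ is attained.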

12.2 The Riesz Representation Theorem

In an abstract Hilbert space $H$ we can generalise Examples 12.1 and 12.2 to give a very important example of a linear functional on $H$ (i.e. an element of $H^*$). In fact we will prove that any element of $H^*$ must be of this particular form.

Lemma 12.3 If $H$ is a Hilbert space over $\mathbb{K}$ and $y \in H$, then the map $f_y : H \to \mathbb{K}$ defined by setting
$$f_y(x) = (x, y) \tag{12.2}$$
is an element of $H^*$ with $\|f_y\|_{H^*} = \|y\|_H$.

Note that this shows in particular that $\|x\| = \max_{\|y\| = 1} |(x, y)|$.

Proof The map $f_y$ is linear since the inner product is always linear in its first argument (although conjugate-linear in its second argument when $\mathbb{K} = \mathbb{C}$). Using the Cauchy–Schwarz inequality we have
$$|f_y(x)| = |(x, y)| \le \|x\| \|y\|$$
and so it follows that $f_y \in H^*$ with $\|f_y\|_{H^*} \le \|y\|$. Choosing $x = y$ in (12.2) shows that
$$|f_y(y)| = |(y, y)| = \|y\|^2$$
and hence $\|f_y\|_{H^*} = \|y\|$.

The Riesz map $R : H \to H^*$ given by setting $R(y) = f_y$ is therefore an isometry from $H$ into $H^*$; it is linear when $H$ is real, and conjugate-linear when $H$ is complex (because in this case $y \mapsto (x, y)$ is conjugate-linear). The Riesz Representation Theorem shows that the map $R$ is onto, so that this example can be ‘reversed’, i.e. every bounded linear functional on $H$ can be realised as an inner product with some element $y \in H$.

Theorem 12.4 (Riesz Representation Theorem) If $H$ is a Hilbert space, then for every $f \in H^*$ there exists a unique element $y \in H$ such that
$$f(x) = (x, y) \quad \text{for all} \quad x \in H; \tag{12.3}$$
and $\|y\|_H = \|f\|_{H^*}$. In particular, the Riesz map $R : H \to H^*$ defined via (12.2) by setting $R(y) = f_y$ maps $H$ onto $H^*$.


Note that if $H$ is real, then $R$ is a bijective linear isometry and $H \equiv H^*$.

Proof Let $K = \operatorname{Ker} f$; since $f$ is bounded this is a closed linear subspace of $H$ (Lemma 11.12). We claim that $K^\perp$ is a one-dimensional linear subspace of $H$. Indeed, given $u, v \in K^\perp$ we have
$$f\big( f(u) v - f(v) u \big) = f(u) f(v) - f(v) f(u) = 0, \tag{12.4}$$
since $f$ is linear. Since $u, v \in K^\perp$, it follows that $f(u)v - f(v)u \in K^\perp$, while (12.4) shows that $f(u)v - f(v)u \in K$. Since $K \cap K^\perp = \{0\}$, it follows that $f(u)v - f(v)u = 0$, and so $u$ and $v$ are linearly dependent.

Therefore we can choose $z \in K^\perp$ such that $\|z\| = 1$, and use Proposition 10.4 to decompose any $x \in H$ as
$$x = (x, z)z + w \quad \text{with} \quad w \in (K^\perp)^\perp = K,$$
where we have used Lemma 10.5 and the fact that $K$ is closed to guarantee that $(K^\perp)^\perp = K$. Thus
$$f(x) = (x, z) f(z) = (x, \overline{f(z)}\, z),$$
and setting $y = \overline{f(z)}\, z$ we obtain (12.3).

To show that this choice of $y$ is unique, suppose that
$$(x, y) = (x, \hat{y}) \quad \text{for all} \quad x \in H.$$
Then $(x, y - \hat{y}) = 0$ for all $x \in H$; taking $x = y - \hat{y}$ gives $\|y - \hat{y}\|^2 = 0$. Finally, Lemma 12.3 shows that $\|y\|_H = \|f\|_{H^*}$.

This gives a way to rephrase the result of Corollary 10.2 somewhat more elegantly, by letting $f(u) = (u, v)$. (We will obtain a similar result later in a more general context as Theorem 21.2.)

Corollary 12.5 Suppose that $A$ is a non-empty closed convex subset of a real Hilbert space, and $x \notin A$. Then there exists $f \in H^*$ such that
$$f(a) + d^2 \le f(x) \quad \text{for every } a \in A,$$
where $d = \operatorname{dist}(x, A)$; see Figure 12.1.

The Riesz Representation Theorem also allows us to form a geometric picture of the action of a linear functional on a Hilbert space. Given any $f \in H^*$, the proof of Theorem 12.4 shows that the value of $f$ at any point $y \in H$ is determined by its projection $(y, z)$ onto any normal $z$ to the set $\operatorname{Ker}(f)$.

Figure 12.1 Separating $x$ from $A$ using a linear functional. In the figure, $\hat{a}$ is the closest point to $x$ in $A$; cf. Figure 10.2. (The level sets shown are $f(u) = \alpha - d^2$, $f(u) = \alpha - d^2/2$, and $f(u) = f(x) = \alpha$.)

Figure 12.2 Illustration of the foliation of a Hilbert space by the sets $f(x) = c$, which are translated copies of $\operatorname{Ker}(f)$. In the Riesz Representation Theorem the linear functional $f$ is reconstructed as the inner product with an element $z$ in the direction normal to $\operatorname{Ker}(f) = \{x : f(x) = 0\}$.

In this way the space H is ‘foliated’ by translated copies of Ker( f ), i.e. {x : f (x) = c, c ∈ K}; see Figure 12.2. The space Ker( f ) is an example of a hyperplane: a linear subspace that has ‘codimension 1’. We will make this more precise when we consider the situation in a more general setting in Chapter 21.
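The steps in the proof of Theorem 12.4 can be followed concretely in $\mathbb{R}^3$. The sketch below is illustrative only: the functional `f`, the two kernel vectors `u` and `v`, and the use of the cross product to find a normal to $\operatorname{Ker}(f)$ are all choices made here, and they only make sense in three real dimensions.

```python
import math

def f(x):
    """A hypothetical bounded linear functional on R^3."""
    return 2.0 * x[0] - x[1] + 2.0 * x[2]

def inner(u, v):
    """Standard inner product on R^3."""
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    """Cross product, used here to find a vector normal to Ker(f)."""
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

# Two linearly independent vectors in Ker(f): f(u) = f(v) = 0.
u, v = [1.0, 2.0, 0.0], [0.0, 2.0, 1.0]

normal = cross(u, v)
length = math.sqrt(inner(normal, normal))
z = [c / length for c in normal]          # unit vector spanning Ker(f)-perp

# Representing element y = f(z) z (no conjugate is needed in the real case).
y = [f(z) * c for c in z]

# Decompose an arbitrary x as (x, z) z + w with w in Ker(f).
x = [1.0, 1.0, 1.0]
coeff = inner(x, z)
w = [xi - coeff * zi for xi, zi in zip(x, z)]

print(y)                    # approximately [2, -1, 2]
print(f(w))                 # approximately 0, so w lies in Ker(f)
print(f(x), inner(x, y))    # both approximately 3
```

The value of `f` at `x` depends only on the coefficient `(x, z)`, which is the geometric picture described above: moving `x` within a translate of $\operatorname{Ker}(f)$ does not change $f(x)$.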

Exercises

12.1 Show that there are discontinuous linear functionals on any infinite-dimensional normed space $(X, \|\cdot\|)$.

12.2 Show that if $V$ is a finite-dimensional space, then $\dim V^* = \dim V$. (Given a basis $\{e_j\}_{j=1}^n$ for $V$ show that the set of linear functionals $\{\phi_i\}_{i=1}^n$ with $\phi_i(e_j) = \delta_{ij}$ form a basis for $V^*$.) (Kreyszig, 1978)

12.3 Suppose that $H$ is a real Hilbert space and that $B : H \times H \to \mathbb{R}$ is such that

(i) $B(x, y)$ is linear in $x$ and in $y$;
(ii) $|B(x, y)| \le c \|x\| \|y\|$ for some $c > 0$, for all $x, y \in H$;
(iii) $|B(x, x)| \ge b \|x\|^2$ for some $b \in \mathbb{R}$, for all $x \in H$; and
(iv) $B(x, y) = B(y, x)$ for every $x, y \in H$.

Show that $u \in H$ minimises
$$F(u) := \frac{1}{2} B(u, u) - f(u),$$
where $f \in H^*$, if and only if
$$B(u, v) = f(v) \quad \text{for every } v \in H.$$
[Hint: consider $\phi(t) := F(u + tv)$.] (Zeidler, 1995)

12.4 The following generalisation of the Riesz Representation Theorem, the Lax–Milgram Lemma, is very useful in the analysis of linear partial differential equations. Suppose that $H$ and $B$ are as in the previous exercise, but without the symmetry assumption (iv). Show that for every $f \in H^*$ there exists a unique $y \in H$ such that
$$f(x) = B(x, y) \quad \text{for every } x \in H \tag{12.5}$$
as follows:

(i) show that for each fixed $y \in H$ the map $x \mapsto B(x, y)$ is a bounded linear functional on $H$, so that $B(x, y) = (x, w)$ for some $w \in H$ by the Riesz Representation Theorem;
(ii) define $A : H \to H$ by setting $Ay = w$ and show that $A \in B(H)$;
(iii) given $f \in H^*$, use the Riesz Representation Theorem to find $z \in H$ such that $f(x) = (x, z)$ for every $x \in H$ and rewrite (12.5) as $(x, z) = (x, Ay)$, i.e. $Ay = z$. For any choice of $\varrho \in \mathbb{R}$ this equality holds if and only if
$$y = y - \varrho (Ay - z).$$
Use the Contraction Mapping Theorem applied to the map $T : H \to H$ defined by setting
$$Ty := y - \varrho (Ay - z)$$
to show that if $\varrho$ is sufficiently small $T$ has a fixed point, which is the required solution $y$ of our original equation. (To show that $T$ is a contraction consider $\|Ty - Ty'\|^2$.)

13 The Hilbert Adjoint of a Linear Operator

We now use the Riesz Representation Theorem to define the adjoint of a linear operator $T : H \to K$, where $H$ and $K$ are Hilbert spaces; this is a linear operator $T^* : K \to H$ such that
$$(Tx, y)_K = (x, T^* y)_H, \qquad x \in H, \ y \in K.$$

Properties of operators and their adjoints are closely related, and we will be able to develop a good spectral theory for operators T : H → H that are ‘self-adjoint’, i.e. for which T = T ∗ .

13.1 Existence of the Hilbert Adjoint

We let $H$ and $K$ be Hilbert spaces with inner products $(\cdot, \cdot)_H$ and $(\cdot, \cdot)_K$ respectively; these induce corresponding norms $\|\cdot\|_H$ and $\|\cdot\|_K$.

Theorem 13.1 Let $H$ and $K$ be Hilbert spaces and $T \in B(H, K)$. Then there exists a unique operator $T^* \in B(K, H)$, which we call the (Hilbert) adjoint of $T$, such that
$$(Tx, y)_K = (x, T^* y)_H \tag{13.1}$$
for all $x \in H$, $y \in K$. Furthermore, $T^{**} := (T^*)^* = T$ and $\|T^*\|_{B(K,H)} = \|T\|_{B(H,K)}$.

Proof Let $y \in K$ and consider $f : H \to \mathbb{K}$ defined by $f(x) := (Tx, y)_K$. Then clearly $f$ is linear and
$$|f(x)| = |(Tx, y)_K| \le \|Tx\|_K \|y\|_K \le \|T\|_{B(H,K)} \|x\|_H \|y\|_K.$$
It follows that $f \in H^*$, and so by the Riesz Representation Theorem there exists a unique $z \in H$ such that
$$(Tx, y)_K = (x, z)_H \quad \text{for all} \quad x \in H.$$
We now define $T^* : K \to H$ by setting $T^* y = z$. By definition we have
$$(Tx, y)_K = (x, T^* y)_H \quad \text{for all} \quad x \in H, \ y \in K,$$
i.e. (13.1). However, it remains to show that $T^* \in B(K, H)$. First, $T^*$ is linear since for all $\alpha, \beta \in \mathbb{K}$, $y_1, y_2 \in K$,
$$(x, T^*(\alpha y_1 + \beta y_2))_H = (Tx, \alpha y_1 + \beta y_2)_K = \overline{\alpha} (Tx, y_1)_K + \overline{\beta} (Tx, y_2)_K = \overline{\alpha} (x, T^* y_1)_H + \overline{\beta} (x, T^* y_2)_H = (x, \alpha T^* y_1 + \beta T^* y_2)_H,$$
i.e.
$$T^*(\alpha y_1 + \beta y_2) = \alpha T^* y_1 + \beta T^* y_2.$$
To show that $T^*$ is bounded, we can write
$$\|T^* y\|_H^2 = (T^* y, T^* y)_H = (T T^* y, y)_K \le \|T T^* y\|_K \|y\|_K \le \|T\|_{B(H,K)} \|T^* y\|_H \|y\|_K.$$
If $\|T^* y\|_H \ne 0$, then we can divide both sides by $\|T^* y\|_H$ to obtain
$$\|T^* y\|_H \le \|T\|_{B(H,K)} \|y\|_K,$$
while this final inequality is trivially true if $\|T^* y\|_H = 0$. Thus $T^* \in B(K, H)$ with $\|T^*\|_{B(K,H)} \le \|T\|_{B(H,K)}$.

We now show that $T^{**} := (T^*)^* = T$, from which we can obtain equality of the norms of $T$ and $T^*$. Indeed, if we have $T^{**} = T$, then it follows that
$$\|T\|_{B(H,K)} = \|(T^*)^*\|_{B(H,K)} \le \|T^*\|_{B(K,H)},$$
which combined with $\|T^*\|_{B(K,H)} \le \|T\|_{B(H,K)}$ shows that $\|T^*\|_{B(K,H)} = \|T\|_{B(H,K)}$.

To prove that $T^{**} = T$, note that since $T^* \in B(K, H)$ it follows that $(T^*)^* \in B(H, K)$, and by definition for all $x \in K$, $y \in H$ we have
$$(x, (T^*)^* y)_K = (T^* x, y)_H = \overline{(y, T^* x)_H} = \overline{(Ty, x)_K} = (x, Ty)_K,$$
i.e. $(T^*)^* y = Ty$ for all $y \in H$, which is exactly $(T^*)^* = T$.

Finally, we show that the requirement that (13.1) holds defines $T^*$ uniquely. Suppose that $T^*, \hat{T} : K \to H$ are such that
$$(x, T^* y)_H = (x, \hat{T} y)_H \quad \text{for all} \quad x \in H, \ y \in K.$$
Then for each $y \in K$ we have
$$(x, (T^* - \hat{T}) y)_H = 0 \quad \text{for every } x \in H;$$

this shows that $(T^* - \hat{T}) y = 0$ for each $y \in K$, i.e. that $\hat{T} = T^*$.

Before we give some examples we first prove some simple properties of the adjoint operation, and give an important definition.

Lemma 13.2 Let $H$, $K$, and $J$ be Hilbert spaces, $R, S \in B(H, K)$, and $T \in B(K, J)$; then

(a) $(\alpha R + \beta S)^* = \overline{\alpha} R^* + \overline{\beta} S^*$ and
(b) $(TR)^* = R^* T^*$.

Proof (a) For any $x \in H$, $y \in K$ we have
$$(x, (\alpha R + \beta S)^* y)_H = ((\alpha R + \beta S)x, y)_K = \alpha (Rx, y)_K + \beta (Sx, y)_K = \alpha (x, R^* y)_H + \beta (x, S^* y)_H = (x, \overline{\alpha} R^* y + \overline{\beta} S^* y)_H = (x, (\overline{\alpha} R^* + \overline{\beta} S^*) y)_H;$$
the uniqueness argument from Theorem 13.1 now guarantees that (a) holds.

(b) We have
$$(x, (TR)^* y)_H = (TRx, y)_J = (Rx, T^* y)_K = (x, R^* T^* y)_H,$$
and again we use the uniqueness argument from Theorem 13.1.

The following definition should seem natural.

Definition 13.3 If $H$ is a Hilbert space and $T \in B(H)$, then $T$ is self-adjoint if $T = T^*$.

Equivalently $T \in B(H)$ is self-adjoint if and only if it is symmetric, i.e.
$$(x, Ty) = (Tx, y) \quad \text{for all} \quad x, y \in H. \tag{13.2}$$


Note that this means that for operators T ∈ B(H ) we do not actually need the definition of the adjoint T ∗ in order to define what it means to be self-adjoint. (We will see later in Chapter 25 that self-adjointness of unbounded operators requires more than just symmetry.) Note that it is a consequence of part (b) of Lemma 13.2 that if T and R are both self-adjoint, then (T R)∗ = R ∗ T ∗ = RT , and so T R is self-adjoint if and only if T and R commute. In the next chapter we will introduce the notion of the spectrum of a linear operator. In Chapter 16 we will be able to give a full analysis of the spectrum of compact self-adjoint operators on Hilbert spaces; we will define what it means for an operator to be compact in Chapter 15.
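The remark that the product of two self-adjoint operators is self-adjoint exactly when they commute can be checked on small real symmetric matrices. The matrices `A`, `B` and `C` below are illustrative choices, not taken from the text; on $\mathbb{R}^2$ with the standard inner product, self-adjointness is the same as symmetry of the matrix.

```python
def mat_mul(a, b):
    """Product of two 2x2 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(a):
    """Transpose of a 2x2 matrix."""
    return [[a[j][i] for j in range(2)] for i in range(2)]

def is_symmetric(a):
    """On R^2 with the standard inner product, symmetric means self-adjoint."""
    return a == transpose(a)

A = [[2, 1], [1, 3]]
B = [[1, 0], [0, 2]]      # symmetric, but does not commute with A
C = [[3, 1], [1, 4]]      # C = A + I, so it commutes with A

print(mat_mul(A, B) == mat_mul(B, A), is_symmetric(mat_mul(A, B)))
print(mat_mul(A, C) == mat_mul(C, A), is_symmetric(mat_mul(A, C)))
```

In the first case the matrices do not commute and the product fails to be symmetric; in the second they commute and the product is symmetric.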

13.2 Some Examples of the Hilbert Adjoint We now give three examples of operators and their adjoints, and (in some cases) conditions under which they are self-adjoint. Example 13.4 Let H = K = Kn with its standard inner product. Then any matrix A = (ai j ) ∈ Kn×n defines a linear map T A on Kn by mapping x to Ax, where n  ai j x j . (Ax)i = j=1

Then we have

⎛ ⎞ n n   ⎝ (T A x, y) = ai j x j ⎠ yi i=1

=

n  j=1

j=1

xj

n 

(ai j yi ) = (x, T A∗ y),

i=1 T

where A∗ is the Hermitian conjugate of A, i.e. A∗ = A . If K = R, then T A is self-adjoint if and only if A T = A, i.e. if A is symT metric. If K = C, then T A is self-adjoint if and only if A = A, i.e. if A is Hermitian. Example 13.5 Let H = K = 2 and consider the shift operators from Example 11.7. If we start with the right-shift operator sr x = (0, x1 , x2 , . . .) we have


\[
(s_r x, y) = x_1 \overline{y_2} + x_2 \overline{y_3} + x_3 \overline{y_4} + \cdots = (x, s_r^* y);
\]
so $s_r^* y = (y_2, y_3, y_4, \ldots)$, i.e. $s_r^* = s_l$. Similarly for the left shift $s_l x = (x_2, x_3, x_4, \ldots)$ we have
\[
(s_l x, y) = x_2 \overline{y_1} + x_3 \overline{y_2} + x_4 \overline{y_3} + \cdots = (x, s_l^* y);
\]
so $s_l^* y = (0, y_1, y_2, \ldots)$, i.e. $s_l^* = s_r$. These maps are not self-adjoint, but we do have $s_l^{**} = s_l$ and $s_r^{**} = s_r$ (as is guaranteed by Theorem 13.1).

We will return to our next example again later.

Lemma 13.6 For $K \in C((a,b) \times (a,b))$ define $T : L^2(a,b) \to L^2(a,b)$ by setting
\[
(Tf)(x) := \int_a^b K(x,y) f(y)\, dy
\]
(see Example 11.10). Then
\[
T^* g(x) = \int_a^b \overline{K(y,x)}\, g(y)\, dy, \tag{13.3}
\]
and $T$ is self-adjoint if $K(x,y) = \overline{K(y,x)}$.

Proof For $f, g \in L^2(a,b)$ we have
\[
\begin{aligned}
(Tf, g)_H &= \int_a^b \Bigl( \int_a^b K(x,y) f(y)\, dy \Bigr) \overline{g(x)}\, dx \\
&= \int_a^b \int_a^b K(x,y) f(y) \overline{g(x)}\, dy\, dx \\
&= \int_a^b f(y)\, \overline{\int_a^b \overline{K(x,y)}\, g(x)\, dx}\; dy = (f, T^* g)_H,
\end{aligned}
\]

with T ∗ g defined as in (13.3). In order to justify the change in the order of integration in this calculation we can either appeal to Fubini’s Theorem (Theorem B.9) or, without recourse to measure theoretic results, use the fact that C([a, b]) is dense in L 2 : given f, g ∈ L 2 (a, b), find sequences ( f n ) and (gn ) in C([a, b]) such that f n → f and gn → g in L 2 . Then, with f n and gn replacing f and g, the above calculation is valid (using the result of Exercise 6.8, for example), yielding (T f n , gn ) = ( f n , T ∗ gn ).


Since f n → f , gn → g, and T and T ∗ are continuous from L 2 (a, b) into L 2 (a, b), we can take n → ∞ and deduce that (T f, g) = ( f, T ∗ g) for every f, g ∈ L 2 (a, b).
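The defining identity $(Tx, y) = (x, T^* y)$ behind Examples 13.4 and 13.5 is easy to test numerically. The sketch below is illustrative only: it assumes the standard inner product on $\mathbb{C}^n$ (linear in the first argument), represents finitely supported sequences as Python lists with the trailing zeros omitted, and uses made-up vectors and helper names.

```python
def inner(x, y):
    """Standard inner product: linear in x, conjugate-linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def apply(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def adjoint(A):
    """Conjugate transpose of a square matrix."""
    n = len(A)
    return [[A[j][i].conjugate() for j in range(n)] for i in range(n)]

# Example 13.4: (Ax, y) = (x, A* y) for a complex matrix A.
A = [[1 + 2j, 3 - 1j], [0 + 1j, 2 + 0j]]
x = [1 - 1j, 2 + 3j]
y = [4 + 0j, -1 + 2j]
assert abs(inner(apply(A, x), y) - inner(x, apply(adjoint(A), y))) < 1e-12

# Example 13.5: the adjoint of the right shift is the left shift.
def shift_right(seq):
    return [0j] + list(seq)

def shift_left(seq):
    return list(seq[1:])

u = [1 + 1j, 2 + 0j, -1j]           # (u1, u2, u3, 0, 0, ...)
v = [3 + 0j, 1 - 1j, 2j, 1 + 0j]    # (v1, v2, v3, v4, 0, ...)
assert abs(inner(shift_right(u), v) - inner(u, shift_left(v))) < 1e-12
```

Both assertions pass; replacing `adjoint(A)` by the plain transpose makes the first one fail for this complex matrix, which is why the conjugate matters when $\mathbb{K} = \mathbb{C}$.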

Exercises

13.1 Show that the adjoint of the operator $T : L^2(0,1) \to L^2(0,1)$ defined by setting
\[
(Tf)(x) := \int_0^x K(x,y) f(y)\, dy
\]
is given by
\[
(T^* g)(x) = \int_x^1 \overline{K(y,x)}\, g(y)\, dy.
\]

13.2 Show that if $(T_n) \in B(H)$ is a sequence of self-adjoint operators such that $T_n \to T$ in $B(H)$, then $T$ is also self-adjoint.

13.3 Show that if $T \in B(H,K)$, then $\operatorname{Ker}(T) = (\operatorname{Range}(T^*))^\perp$. (It then follows from the fact that $T^{**} = T$ that $\operatorname{Ker}(T^*) = (\operatorname{Range}(T))^\perp$.)

13.4 Show that if $T \in B(H,K)$, then $T^*T \in B(H,H)$ with $\|T^*T\|_{B(H,H)} = \|T\|^2_{B(H,K)}$.

13.5 Show that if $T \in B(H,K)$ is invertible, then $T^* \in B(K,H)$ is invertible with $(T^*)^{-1} = (T^{-1})^*$. (In particular, this shows that if $T \in B(H)$ is self-adjoint and invertible, then $T^{-1}$ is also self-adjoint.)

14 The Spectrum of a Bounded Linear Operator

In the theory of linear operators on finite-dimensional spaces, the eigenvalues play a prominent role. In this case the eigenvalues form the entire ‘spectrum’ of the operator; but we will see that in the case of infinite-dimensional spaces the situation is somewhat more subtle. If T : X → X then λ is an eigenvalue of T if there exists a non-zero x ∈ X such that T x = λx. Since this implies that (T − λI )x = 0, for λ to be an eigenvalue it must be the case that T − λI is not invertible; otherwise multiplying on the left by (T − λI )−1 would show that x = 0. When discussing the spectral properties of operators it is convenient to treat Banach spaces over C, but this is no restriction, since we can always consider the ‘complexification’ of a Banach space over R; see Exercises 14.1, 14.2, and 16.9.

14.1 The Resolvent and Spectrum

We have already remarked that for linear operators between infinite-dimensional spaces there is a distinction to be made between ‘having an inverse’ and ‘being invertible’, with the latter requiring the inverse to be bounded (see Section 11.5). We incorporate the requirement of invertibility into the following definition of the resolvent set and its complement, the spectrum.

Definition 14.1 Let $X$ be a complex Banach space and $T \in B(X)$. The resolvent set of $T$, $\rho(T)$, is
\[
\rho(T) = \{\lambda \in \mathbb{C} : T - \lambda I \text{ is invertible}\}.
\]


The spectrum of $T$, $\sigma(T)$, is the complement of $\rho(T)$,
\[
\sigma(T) = \mathbb{C} \setminus \rho(T) = \{\lambda \in \mathbb{C} : T - \lambda I \text{ is not invertible}\}.
\]

In a finite-dimensional space the spectrum consists entirely of eigenvalues, since Lemma 11.15 guarantees that the resolvent set $\rho(T)$ is exactly $\{\lambda \in \mathbb{C} : \operatorname{Ker}(T - \lambda I) = \{0\}\}$, and hence its complement is the set where $(T - \lambda I)x = 0$ for some non-zero $x \in X$, i.e. the eigenvalues.

In an infinite-dimensional space the spectrum can be strictly larger than the set of eigenvalues, which we term the ‘point spectrum’:
\[
\sigma_p(T) = \{\lambda \in \mathbb{C} : (T - \lambda I)x = 0 \text{ for some non-zero } x \in X\}.
\]
If $\lambda \in \sigma_p(T)$, then $\lambda$ is an eigenvalue of $T$, $E_\lambda := \operatorname{Ker}(T - \lambda I)$ is the eigenspace corresponding to $\lambda$, and any non-zero $x \in E_\lambda$ is one of the corresponding eigenvectors (if $x \in E_\lambda$, then $Tx = \lambda x$); the dimension of $E_\lambda$ is the multiplicity of $\lambda$.

To begin with we prove two simple results about the eigenvalues (and corresponding eigenvectors) of any bounded linear operator. First, we observe that any $\lambda \in \sigma_p(T)$ satisfies $|\lambda| \le \|T\|$: if there exists $x \neq 0$ such that $Tx = \lambda x$, then
\[
|\lambda| \|x\| = \|\lambda x\| = \|Tx\| \le \|T\| \|x\|, \tag{14.1}
\]
which shows that $|\lambda| \le \|T\|$. We now show that eigenvectors corresponding to distinct eigenvalues are linearly independent.

Lemma 14.2 Suppose that $T \in B(X)$ and that $\{\lambda_j\}_{j=1}^n$ are distinct eigenvalues of $T$. Then any set $\{e_j\}_{j=1}^n$ of corresponding eigenvectors (i.e. $Te_j = \lambda_j e_j$) is linearly independent.

Proof We argue by induction. Suppose that $\{e_1, \ldots, e_k\}$ are linearly independent and that
\[
\sum_{j=1}^{k+1} \alpha_j e_j = 0, \qquad \{\alpha_j\}_{j=1}^{k+1} \subset \mathbb{K}. \tag{14.2}
\]

By (i) applying $T$ to both sides and (ii) multiplying both sides by $\lambda_{k+1}$ we obtain
\[
\sum_{j=1}^{k+1} \lambda_j \alpha_j e_j = 0 = \lambda_{k+1} \sum_{j=1}^{k+1} \alpha_j e_j.
\]


It follows that
\[
\sum_{j=1}^{k} (\lambda_{k+1} - \lambda_j) \alpha_j e_j = 0.
\]
Since $\lambda_j \neq \lambda_{k+1}$ and $\{e_1, \ldots, e_k\}$ are linearly independent the preceding equation implies that $\alpha_j = 0$ for $j = 1, \ldots, k$; then $\alpha_{k+1} = 0$ from (14.2). It follows that $\{e_1, \ldots, e_{k+1}\}$ are linearly independent.

We remarked above that any $\lambda \in \sigma_p(T)$ satisfies $|\lambda| \le \|T\|$. We now show, using Lemma 11.16, that the same bound holds for any $\lambda \in \sigma(T)$.

Lemma 14.3 If $T \in B(X)$, then $\sigma(T)$ is closed and
\[
\sigma(T) \subseteq \{\lambda \in \mathbb{C} : |\lambda| \le \|T\|\}. \tag{14.3}
\]

Proof First we prove the inclusion in (14.3). To do this, note that for any $\lambda \neq 0$ we can write
\[
T - \lambda I = \lambda \Bigl( \frac{1}{\lambda} T - I \Bigr),
\]
so if $I - \frac{1}{\lambda} T$ is invertible, $\lambda \notin \sigma(T)$. But for $|\lambda| > \|T\|$ we have
\[
\Bigl\| \frac{1}{\lambda} T \Bigr\| \, \|I\| < 1,
\]
and then Lemma 11.16 guarantees that $I - \frac{1}{\lambda} T$ is invertible, i.e. $\lambda \in \rho(T)$, and the result follows.

To show that the spectrum is closed we show that the resolvent set is open. If $\lambda \in \rho(T)$, then $T - \lambda I$ is invertible and Lemma 11.16 shows that $(T - \lambda I) - \delta I$ is invertible provided that
\[
\|\delta I\| \, \|(T - \lambda I)^{-1}\| < 1,
\]
i.e. $T - (\lambda + \delta)I$ is invertible for all $\delta$ with $|\delta| < \|(T - \lambda I)^{-1}\|^{-1}$, and so $\rho(T)$ is open.

We now, following Rynne and Youngson (2008), consider the illustrative examples of the shift operators from Example 11.7. These allow us to show that the spectrum can be significantly larger than the point spectrum.

Example 14.4 The right-shift operator $s_r$ on $\ell^2$ from Example 11.7 has no eigenvalues.


Proof Observe that $s_r x = \lambda x$ implies that
\[
(0, x_1, x_2, \ldots) = \lambda (x_1, x_2, x_3, \ldots)
\]
and so
\[
\lambda x_1 = 0, \quad \lambda x_2 = x_1, \quad \lambda x_3 = x_2, \ \ldots.
\]
If $\lambda \neq 0$, then this implies that $x_1 = 0$, and then $x_2 = x_3 = x_4 = \cdots = 0$, and so $\lambda$ is not an eigenvalue. If $\lambda = 0$, then we also obtain $x = 0$, and so there are no eigenvalues, i.e. $\sigma_p(s_r) = \emptyset$.

Example 14.5 For the left-shift operator $s_l$ on $\ell^2$ every $\lambda \in \mathbb{C}$ with $|\lambda| < 1$ is an eigenvalue.

Proof Observe that $\lambda \in \mathbb{C}$ is an eigenvalue if $s_l x = \lambda x$ for some non-zero $x \in \ell^2$, i.e. if
\[
(x_2, x_3, x_4, \ldots) = \lambda (x_1, x_2, x_3, \ldots),
\]
i.e. if
\[
x_2 = \lambda x_1, \quad x_3 = \lambda x_2, \quad x_4 = \lambda x_3, \ \ldots.
\]
Given $\lambda \neq 0$ this gives a candidate eigenvector $x = (1, \lambda, \lambda^2, \lambda^3, \ldots)$, which is an element of $\ell^2$ (and so is an actual eigenvector) provided that
\[
\sum_{n=0}^\infty |\lambda|^{2n} = \frac{1}{1 - |\lambda|^2} < \infty,
\]
which is the case for any $\lambda$ with $|\lambda| < 1$; for $\lambda = 0$ the vector $x = (1, 0, 0, \ldots)$ is an eigenvector. It follows that
\[
\{\lambda \in \mathbb{C} : |\lambda| < 1\} \subseteq \sigma_p(s_l).
\]

We showed in Example 13.5 that $s_r^* = s_l$ and $s_l^* = s_r$. The following result about the spectrum of the Hilbert adjoint will therefore allow us to relate the spectra of these two operators.

Lemma 14.6 If $H$ is a Hilbert space and $T \in B(H)$, then
\[
\sigma(T^*) = \{\bar\lambda : \lambda \in \sigma(T)\}.
\]

Proof If $\lambda \notin \sigma(T)$, then $T - \lambda I$ has a bounded inverse,
\[
(T - \lambda I)(T - \lambda I)^{-1} = I = (T - \lambda I)^{-1}(T - \lambda I).
\]


Taking adjoints and using Lemma 13.2 we obtain
\[
[(T - \lambda I)^{-1}]^* (T^* - \bar\lambda I) = I = (T^* - \bar\lambda I)[(T - \lambda I)^{-1}]^*,
\]
and so $T^* - \bar\lambda I$ has a bounded inverse, i.e. $\bar\lambda \notin \sigma(T^*)$. Starting instead with $T^*$ we deduce that $\lambda \notin \sigma(T^*) \Rightarrow \bar\lambda \notin \sigma(T)$, which completes the proof.

Example 14.7 Let $s_r$ be the right-shift operator on $\ell^2$. We saw above that $s_r$ has no eigenvalues (Example 14.4), but that for $s_r^* = s_l$ the interior of the unit disc is contained in the point spectrum (Example 14.5). It follows from Lemma 14.6 that
\[
\{\lambda \in \mathbb{C} : |\lambda| < 1\} \subseteq \sigma(s_r)
\]
even though $\sigma_p(s_r) = \emptyset$.

Combining the above argument with the fact that $\sigma(T)$ is a compact subset of $\{\lambda : |\lambda| \le \|T\|\}$ (Lemma 14.3) allows us to determine the spectrum of these two shift operators.

Example 14.8 The spectrum of $s_l$ and of $s_r$ (as operators on $\ell^2$) are both equal to the unit disc in the complex plane:
\[
\sigma(s_l) = \sigma(s_r) = \{\lambda \in \mathbb{C} : |\lambda| \le 1\}.
\]

Proof We showed earlier that for the shift operators $s_r$ and $s_l$ on $\ell^2$,
\[
\sigma(s_l) = \sigma(s_r) \supseteq \{\lambda \in \mathbb{C} : |\lambda| < 1\}.
\]
Since the spectrum is closed and $\|s_r\| = \|s_l\| = 1$, it follows from Lemma 14.3 that $\sigma(s_l) = \sigma(s_r) = \{\lambda \in \mathbb{C} : |\lambda| \le 1\}$.
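The eigenvector computation of Example 14.5 can be illustrated on a truncated sequence. This is only a sketch: a computer can hold finitely many coordinates, so the code below works with the first $N$ terms of $x = (1, \lambda, \lambda^2, \ldots)$; the value of $\lambda$, the truncation length, and the helper names are arbitrary choices made here.

```python
def shift_left(seq):
    """Left shift on a (truncated) sequence: drop the first coordinate."""
    return seq[1:]

lam = 0.5 + 0.5j                      # any lam with |lam| < 1
N = 60                                # truncation length (an approximation)
x = [lam ** n for n in range(N)]      # first N terms of (1, lam, lam^2, ...)

# s_l x agrees with lam * x on every coordinate that survives truncation.
sx = shift_left(x)
assert all(abs(sx[n] - lam * x[n]) < 1e-12 for n in range(N - 1))

# The squared norm of x is the geometric series sum_{n>=0} |lam|^(2n).
norm_sq = sum(abs(t) ** 2 for t in x)
closed_form = 1 / (1 - abs(lam) ** 2)
assert abs(norm_sq - closed_form) < 1e-9
```

For $|\lambda| \ge 1$ the partial sums `norm_sq` grow without bound as `N` increases, which is the numerical shadow of the candidate eigenvector failing to lie in $\ell^2$.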

14.2 The Spectral Mapping Theorem for Polynomials

We end this chapter with a relatively simple version of the ‘spectral mapping theorem’, which in full generality guarantees that the spectrum of $f(T)$ consists of $\{f(\lambda) : \lambda \in \sigma(T)\}$. Here we prove a simpler result where we restrict to the case that $f$ is a polynomial; this does not require the theory of operator-valued complex functions used in the more general case (see Kreyszig, 1978, for example).


If $P(x) = \sum_{k=0}^n a_k x^k$ is a polynomial and $T \in L(X)$, then we can consider the operator
\[
P(T) = \sum_{k=0}^n a_k T^k,
\]
which is another linear operator from $X$ into itself.

Theorem 14.9 If $T \in B(X)$ and $P$ is a polynomial, then
\[
\sigma(P(T)) = P(\sigma(T)) := \{P(\lambda) : \lambda \in \sigma(T)\}.
\]

Proof If $P$ has degree $n$, then for each fixed $\lambda \in \mathbb{C}$ we can write
\[
\lambda - P(z) = a(\beta_1 - z) \cdots (\beta_n - z),
\]
where $a \in \mathbb{K}$ and the $\beta_j$ are the roots of the polynomial $\lambda - P(z)$. Note that the values of $\beta_j$ depend on the choice of $\lambda$, and that setting $z = \beta_j$ shows that $\lambda = P(\beta_j)$ for any $j = 1, \ldots, n$. It follows that
\[
\lambda I - P(T) = a(\beta_1 I - T) \cdots (\beta_n I - T). \tag{14.4}
\]
Note that all the factors on the right-hand side commute; this allows us to use Proposition 11.18 to deduce that $\lambda I - P(T)$ is invertible if and only if $\beta_j I - T$ is invertible for every $j = 1, \ldots, n$.

In particular, this means that when $\lambda \in \sigma(P(T))$, i.e. $\lambda I - P(T)$ is not invertible, it follows that $\beta_j I - T$ is not invertible for some $j$ and so $\beta_j \in \sigma(T)$. We already observed that $\lambda = P(\beta_j)$, so $\sigma(P(T)) \subseteq P(\sigma(T))$.

Now suppose that $\lambda \notin \sigma(P(T))$, in which case $\lambda I - P(T)$ is invertible, with inverse $S$, say. For each $i = 1, \ldots, n$ we can write
\[
\lambda I - P(T) = (\beta_i I - T) Q_i(T),
\]
where $Q_i$ is a polynomial in $T$ of degree $n - 1$. Since all the factors on the right-hand side of (14.4) commute, $\beta_i I - T$ commutes with $Q_i(T)$; moreover, since $\beta_i I - T$ commutes with $\lambda I - P(T)$, it commutes with $(\lambda I - P(T))^{-1} = S$ (see comment immediately before Proposition 11.18). Therefore we have
\[
I = S(\lambda I - P(T)) = S(\beta_i I - T) Q_i(T) = (\beta_i I - T) S Q_i(T) = S Q_i(T) (\beta_i I - T).
\]
It follows that $S Q_i(T)$ (which is bounded) is the inverse of $\beta_i I - T$, and so $\beta_i \notin \sigma(T)$.

By (14.4) the only possible choices of $z$ such that $\lambda = P(z)$ are the $\{\beta_j\}$, and we have just shown that none of these are in $\sigma(T)$. It follows that $\lambda \neq P(z)$


for any $z \in \sigma(T)$, i.e. $\lambda \notin P(\sigma(T))$. Thus $\lambda \in P(\sigma(T)) \Rightarrow \lambda \in \sigma(P(T))$, i.e. $P(\sigma(T)) \subseteq \sigma(P(T))$, and we obtain the required equality.
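In finite dimensions the spectral mapping theorem can be checked directly. The sketch below is an illustration under a simplifying assumption: it takes an upper triangular matrix $T$ on $\mathbb{C}^3$, whose spectrum is the set of its diagonal entries, so that no eigenvalue solver is needed. The matrix, the polynomial, and the function names are choices made for this example.

```python
def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def poly_of_matrix(coeffs, T):
    """P(T) = sum_k coeffs[k] * T^k, evaluated by Horner's rule."""
    n = len(T)
    result = [[0] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = matmul(result, T)
        for i in range(n):
            result[i][i] += c          # add c * I
    return result

def poly(coeffs, z):
    """P(z) for a scalar z, again by Horner's rule."""
    value = 0
    for c in reversed(coeffs):
        value = value * z + c
    return value

T = [[1, 4, 5], [0, 2, 6], [0, 0, 3]]   # upper triangular: sigma(T) = {1, 2, 3}
coeffs = [1, -3, 0, 2]                   # P(z) = 1 - 3z + 2z^3

PT = poly_of_matrix(coeffs, T)
# P(T) is again upper triangular, so its spectrum is its diagonal.
assert all(PT[i][j] == 0 for i in range(3) for j in range(i))
spectrum_T = [T[i][i] for i in range(3)]
spectrum_PT = [PT[i][i] for i in range(3)]
assert spectrum_PT == [poly(coeffs, t) for t in spectrum_T]
print(spectrum_PT)   # [0, 11, 46]
```

Note that $P(1) = 0$ here, so $P(T)$ is not invertible even though $T$ is; this matches $0 \in P(\sigma(T))$.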

Exercises

14.1 Let $X$ be a real Banach space. We define its complexification $X_{\mathbb{C}}$ as the vector space
\[
X_{\mathbb{C}} := \{(x, y) : x, y \in X\},
\]
equipped with operations of addition and multiplication by complex numbers defined via
\[
(x, y) + (x', y') = (x + x', y + y'), \qquad x, y, x', y' \in X
\]
and
\[
(a + \mathrm{i}b)(x, y) = (ax - by, bx + ay), \qquad a, b \in \mathbb{R}, \ x, y \in X. \tag{14.5}
\]
It is natural to denote $(x, y)$ by $x + \mathrm{i}y$, but this is purely ‘notational’, since multiplication by $\mathrm{i}$ in the original space $X$ has no meaning; then it is easy to see that (14.5) corresponds to the usual rule of multiplication for complex numbers. (Zeidler, 1995)

When $H$ is a Hilbert space, show that
\[
(x + \mathrm{i}y, x' + \mathrm{i}y')_{H_{\mathbb{C}}} := (x, x') + \mathrm{i}(y, x') - \mathrm{i}(x, y') + (y, y')
\]
is an inner product on $H_{\mathbb{C}}$, and that this makes $H_{\mathbb{C}}$ a Hilbert space. (Note that the induced norm on $H_{\mathbb{C}}$ is $\|(x, y)\|^2_{H_{\mathbb{C}}} = \|x\|^2 + \|y\|^2$.)

14.2 Let $H$ be a real Hilbert space and $H_{\mathbb{C}}$ its complexification. Given any $T \in L(H)$, define the complexification of $T$, $T_{\mathbb{C}} \in L(H_{\mathbb{C}})$, by setting
\[
T_{\mathbb{C}}(x + \mathrm{i}y) := Tx + \mathrm{i}Ty, \qquad x, y \in H.
\]
Show that
(i) if $T \in B(H)$, then $T_{\mathbb{C}} \in B(H_{\mathbb{C}})$ with $\|T_{\mathbb{C}}\|_{B(H_{\mathbb{C}})} = \|T\|_{B(H)}$; and
(ii) any eigenvalue of $T$ is an eigenvalue of $T_{\mathbb{C}}$, and any real eigenvalue of $T_{\mathbb{C}}$ is an eigenvalue of $T$.

14.3 For any $\alpha \in \ell^\infty(\mathbb{C})$ consider the operator $D_\alpha : \ell^2(\mathbb{C}) \to \ell^2(\mathbb{C})$ given by
\[
(x_1, x_2, x_3, \ldots) \mapsto (\alpha_1 x_1, \alpha_2 x_2, \alpha_3 x_3, \ldots),
\]
i.e. $(D_\alpha x)_j = \alpha_j x_j$. Show that
(i) $\sigma_p(D_\alpha) = \{\alpha_j\}_{j=1}^\infty$;
(ii) $\sigma(D_\alpha) = \overline{\sigma_p(D_\alpha)}$; and


(iii) any compact subset of $\mathbb{C}$ is the spectrum of an operator of this form. (Giles, 2000)

14.4 If $X$ is a complex Banach space and $T \in B(X)$, then the spectral radius of $T$, $r_\sigma(T)$, is defined as
\[
r_\sigma(T) = \sup\{|\lambda| : \lambda \in \sigma(T)\}.
\]
Show that $r_\sigma(T) \le \liminf_{n \to \infty} \|T^n\|^{1/n}$.

14.5 Let $X = C([0,1])$. Use the result of the previous exercise and Exercise 11.7 to show that every $\lambda \neq 0$ is in the resolvent set of the operator $T \in B(X)$ defined by setting
\[
[Tf](x) = \int_0^x f(s)\, ds.
\]
Show that $0 \in \sigma(T)$ but is not an eigenvalue of $T$. (Giles, 2000; Lax, 2002)

14.6 The Fourier transform of a function $f \in L^1$ is defined by
\[
\tilde f(k) = [\mathcal{F} f](k) := \frac{1}{\sqrt{2\pi}} \int_{-\infty}^\infty \mathrm{e}^{\mathrm{i}kx} f(x)\, dx,
\]
and if $\tilde f \in L^1$, then we can recover $f$ using the fact that
\[
f(-x) = [\mathcal{F} \tilde f](x). \tag{14.6}
\]
We can extend the definition of $\mathcal{F}$ to a map from $L^2(\mathbb{R})$ into itself using a density argument: we approximate $f$ by smooth functions that decay rapidly at infinity and use the fact that $\|f\|_{L^2} = \|\tilde f\|_{L^2}$ for such functions. In this way we also preserve the relationship $[\mathcal{F}^2 f](x) = f(-x)$ from (14.6). Show that if $\mathcal{F}$ is viewed as an operator from $L^2(\mathbb{R})$ into itself, then $\sigma(\mathcal{F}) \subseteq \{\pm 1, \pm \mathrm{i}\}$. (Lax, 2002)

15 Compact Linear Operators

If a linear operator is not only bounded but compact (we define this below), then we can obtain more information about its spectrum. We prove results for compact self-adjoint operators on Hilbert spaces in the next chapter, and for compact operators on Banach spaces in Chapter 24.

15.1 Compact Operators

A linear operator is compact if it maps bounded sequences into sequences that have a convergent subsequence.

Definition 15.1 Let $X$ and $Y$ be normed spaces. A linear operator $T : X \to Y$ is compact if for any bounded sequence $(x_n) \in X$, the sequence $(Tx_n) \in Y$ has a convergent subsequence (whose limit lies in $Y$).

Alternatively $T$ is compact if $T B_X$ is a precompact subset of $Y$, i.e. if $\overline{T B_X}$ is a compact subset of $Y$. The equivalence of these two definitions follows since we showed in Lemma 6.11 that a set $A$ is precompact if and only if any sequence in $A$ has a Cauchy subsequence. (See Exercise 15.1 for more details.)

Note that a compact operator must be bounded, since otherwise there exists a sequence $(x_n) \in X$ with $\|x_n\| = 1$ but $\|Tx_n\| \to \infty$, and clearly $(Tx_n)$ cannot have a convergent subsequence.

Example 15.2 Take $T \in B(X, Y)$ with finite-dimensional range. Then $T$ is compact, since any bounded sequence in a finite-dimensional space has a convergent subsequence.

Noting that if $T, S : X \to Y$ are both compact, then $T + S$ is also compact, and that $\lambda T$ is compact for any $\lambda \in \mathbb{K}$, we can define the space $K(X, Y)$ of all


compact linear operators from $X$ into $Y$, and this is then a vector space. Our next result shows that this is a closed subspace of $B(X, Y)$, and so is complete (by Lemma 4.3).

Theorem 15.3 Suppose that $X$ is a normed space and $Y$ is a Banach space. If $(K_n)_{n=1}^\infty$ is a sequence of compact (linear) operators in $K(X, Y)$ that converges to some $K \in B(X, Y)$, i.e.
\[
\|K_n - K\|_{B(X,Y)} \to 0 \quad \text{as } n \to \infty,
\]
then $K \in K(X, Y)$. In particular, $K(X, Y)$ is complete.

Proof Let $(x_n)_n$ be a bounded sequence in $X$ with $\|x_n\| \le M$ for all $n$. Then, since $K_1$ is compact, $(K_1(x_n))_n$ has a convergent subsequence, $(K_1(x_{n_{1,j}}))_j$. Since $(x_{n_{1,j}})_j$ is bounded, $(K_2(x_{n_{1,j}}))_j$ has a convergent subsequence, $(K_2(x_{n_{2,j}}))_j$. Repeat this process to get a family of nested subsequences, $(x_{n_{k,j}})_j$, with $(K_l(x_{n_{k,j}}))_j$ convergent for all $l \le k$.

As in the proof of the Arzelà–Ascoli Theorem we now consider the diagonal sequence $y_j = x_{n_{j,j}}$. Since $(y_j)_{j \ge k}$ is a subsequence of $(x_{n_{k,i}})_i$, it follows that $K_n(y_j)$ converges (as $j \to \infty$) for every $n$.

We now show that $(K(y_j))_{j=1}^\infty$ is Cauchy, and hence convergent, to complete the proof. Choose $\varepsilon > 0$, and use the triangle inequality to write
\[
\|K(y_i) - K(y_j)\|_Y \le \|K(y_i) - K_n(y_i)\|_Y + \|K_n(y_i) - K_n(y_j)\|_Y + \|K_n(y_j) - K(y_j)\|_Y. \tag{15.1}
\]
Since $(y_j)$ is bounded, with $\|y_j\| \le M$ for all $j$, and $K_n \to K$, pick $n$ large enough that
\[
\|K - K_n\|_{B(X,Y)} < \frac{\varepsilon}{3M};
\]
then
\[
\|K(y_j) - K_n(y_j)\|_Y \le \varepsilon/3 \quad \text{for every } j.
\]
For such an $n$, the sequence $(K_n(y_j))_{j=1}^\infty$ is Cauchy, and so there exists an $N$ such that for $i, j \ge N$ we can guarantee that
\[
\|K_n(y_i) - K_n(y_j)\|_Y \le \varepsilon/3.
\]
So now from (15.1)
\[
\|K(y_i) - K(y_j)\|_Y \le \varepsilon \quad \text{for all } i, j \ge N,
\]
and $(K(y_j))$ is a Cauchy sequence. Since $Y$ is complete, it follows that $(K(y_j))$ converges, and so $K$ is compact.


15.2 Examples of Compact Operators

For the remainder of this chapter we concentrate on operators on Hilbert spaces. As a first example we use Theorem 15.3 to show that the integral operator from Example 11.10 is compact.

Proposition 15.4 Suppose that $K \in C([a,b] \times [a,b])$. Then the integral operator $T : L^2(a,b) \to L^2(a,b)$ given by
\[
[Tu](x) := \int_a^b K(x,y) u(y)\, dy
\]
is compact.

For another proof using the Arzelà–Ascoli Theorem see Exercise 15.4.

Proof First note that if
\[
K_n(x,y) = \sum_{j=1}^n f_j(x) g_j(y), \tag{15.2}
\]
with $f_j, g_j \in C([a,b])$, then the integral operator $T_n : L^2 \to L^2$ defined by setting
\[
(T_n u)(x) := \int_a^b \Bigl( \sum_{j=1}^n f_j(x) g_j(y) \Bigr) u(y)\, dy = \sum_{j=1}^n \Bigl( \int_a^b g_j(y) u(y)\, dy \Bigr) f_j(x)
\]
has finite-dimensional range, namely the linear span of $\{f_j(x)\}_{j=1}^n$, and so is compact.

Now use the result of Exercise 6.6 to approximate $K$ uniformly on the set $[a,b] \times [a,b]$ by a sequence $(K_n)$ of the form in (15.2). Then
\[
\begin{aligned}
|Tu(x) - T_n u(x)| &= \Bigl| \int_a^b \{K(x,y) - K_n(x,y)\} u(y)\, dy \Bigr| \\
&\le \int_a^b |K(x,y) - K_n(x,y)| |u(y)|\, dy \\
&\le \Bigl( \int_a^b \|K - K_n\|_\infty^2\, dy \Bigr)^{1/2} \Bigl( \int_a^b |u(y)|^2\, dy \Bigr)^{1/2} \\
&= (b-a)^{1/2} \|K - K_n\|_\infty \|u\|_{L^2}.
\end{aligned}
\]
Taking the $L^2$ norm in $x$ over $(a,b)$ contributes a further factor of $(b-a)^{1/2}$, which shows that
\[
\|T - T_n\|_{B(H)} \le (b-a) \|K - K_n\|_\infty.
\]


It follows that $T$ is the limit in $B(H)$ of compact operators, and so $T$ is compact by Theorem 15.3.

Another class of compact operators on Hilbert spaces are the so-called Hilbert–Schmidt operators.

Definition 15.5 An operator $T \in B(H)$ is Hilbert–Schmidt if for some orthonormal basis $\{e_j\}_{j=1}^\infty$ of $H$
\[
\|T\|_{\mathrm{HS}}^2 := \sum_{j=1}^\infty \|Te_j\|^2 < \infty.
\]

This is a meaningful definition, since the quantity $\|T\|_{\mathrm{HS}}$ is independent of the orthonormal basis we choose. To see this, suppose that $\{e_j\}_{j=1}^\infty$ and $\{f_j\}_{j=1}^\infty$ are two orthonormal bases for $H$; then
\[
\sum_{j=1}^\infty \|Te_j\|^2 = \sum_{j,k=1}^\infty |(Te_j, f_k)|^2 = \sum_{j,k=1}^\infty |(e_j, T^* f_k)|^2 = \sum_{k=1}^\infty \|T^* f_k\|^2.
\]
Applying the resulting equality with $e_j = f_j$ shows that
\[
\sum_{j=1}^\infty \|Tf_j\|^2 = \sum_{k=1}^\infty \|T^* f_k\|^2 = \sum_{j=1}^\infty \|Te_j\|^2,
\]
and so $\|T\|_{\mathrm{HS}}$ is indeed independent of the choice of basis. Exercise 15.5 shows that $\|T\|_{B(H)} \le \|T\|_{\mathrm{HS}}$.

Proposition 15.6 Any Hilbert–Schmidt operator $T$ is compact.

Proof Choose some orthonormal basis $\{e_j\}_{j=1}^\infty$ for $H$, and observe that since $T$ is linear and continuous we can write
\[
Tu = T\Bigl( \sum_{j=1}^\infty (u, e_j) e_j \Bigr) = \sum_{j=1}^\infty (u, e_j) Te_j.
\]
Now for each $n$ let $T_n : H \to H$ be defined by setting
\[
T_n u := \sum_{j=1}^n (u, e_j) Te_j.
\]
This operator is clearly linear, and its range is finite-dimensional since it is the linear span of $\{Te_j\}_{j=1}^n$. It follows that $T_n$ is a compact operator for each $n$.


Now we have
\[
\begin{aligned}
\|(T_n - T)u\| &= \Bigl\| \sum_{j=n+1}^\infty (u, e_j) Te_j \Bigr\| \le \sum_{j=n+1}^\infty |(u, e_j)| \|Te_j\| \\
&\le \Bigl( \sum_{j=n+1}^\infty |(u, e_j)|^2 \Bigr)^{1/2} \Bigl( \sum_{j=n+1}^\infty \|Te_j\|^2 \Bigr)^{1/2} \\
&\le \|u\| \Bigl( \sum_{j=n+1}^\infty \|Te_j\|^2 \Bigr)^{1/2},
\end{aligned}
\]
which shows that
\[
\|T_n - T\|_{B(H)} \le \Bigl( \sum_{j=n+1}^\infty \|Te_j\|^2 \Bigr)^{1/2}.
\]
Since $T$ is Hilbert–Schmidt, $\sum_{j=1}^\infty \|Te_j\|^2 < \infty$, and so the right-hand side tends to zero as $n \to \infty$. It follows from Theorem 15.3 that $T$ is compact.

In fact the operator $T$ from Proposition 15.4 is Hilbert–Schmidt, which provides another proof that it is compact; see Exercise 15.6.
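To see the tail estimate from the proof of Proposition 15.6 in a concrete case, consider the diagonal operator on $\ell^2$ with $Te_j = e_j / j$ (the operator of Exercise 15.3). The sketch below rests on two standard facts assumed here rather than proved in this section: $\sum_j 1/j^2 = \pi^2/6$, and the operator norm of a diagonal operator is the supremum of the moduli of its diagonal entries, so that $\|T - T_n\| = 1/(n+1)$. Infinite sums are replaced by long partial sums, which is an approximation.

```python
import math

def tail_sum(n, terms=200_000):
    """Partial-sum approximation of sum_{j > n} 1/j^2 (truncated series)."""
    return sum(1.0 / (j * j) for j in range(n + 1, n + 1 + terms))

# ||T||_HS^2 = sum_j ||T e_j||^2 = sum_j 1/j^2, which equals pi^2/6.
hs_norm_sq = tail_sum(0)
assert abs(hs_norm_sq - math.pi ** 2 / 6) < 1e-5

results = []
for n in (1, 10, 100):
    bound = math.sqrt(tail_sum(n))   # bound on ||T - T_n|| from the proof
    exact = 1.0 / (n + 1)            # operator norm of the diagonal T - T_n
    assert exact <= bound
    results.append((n, exact, bound))
    print(n, exact, bound)
```

Both columns tend to zero, with the Hilbert–Schmidt tail bound lying above the true operator norm, as the proof requires.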

15.3 Two Results for Compact Operators

We end with two results for compact operators: we show that if $T$ is a compact operator on a Hilbert space, then $T^*$ is also compact; and that the spectrum of a compact operator on an infinite-dimensional Banach space always contains zero.

Lemma 15.7 If $H$ is a Hilbert space and $T \in K(H)$, then $T^* \in K(H)$.

Proof Since $T$ is compact and $T^*$ is bounded (Theorem 13.1), it follows (see Exercise 15.2) that $TT^*$ is compact. So given any bounded sequence $(x_n) \in H$, $(TT^* x_n)$ has a convergent subsequence (which we relabel). Therefore
\[
|(TT^*(x_n - x_m), x_n - x_m)| \le \|TT^*(x_n - x_m)\| \|x_n - x_m\| \to 0
\]
as $\min(m, n) \to \infty$. But the left-hand side of this expression is
\[
|(T^*(x_n - x_m), T^*(x_n - x_m))| = \|T^*(x_n - x_m)\|^2,
\]


which shows that $(T^* x_n)$ is Cauchy and thus convergent, showing that $T^*$ is compact.

We will investigate the spectrum of compact operators on Banach spaces in detail in Chapter 24, but for now we show that $0 \in \sigma(T)$ if $T$ acts on an infinite-dimensional Banach space.

Theorem 15.8 Suppose that $X$ is an infinite-dimensional Banach space and $T \in K(X)$. Then $0 \in \sigma(T)$.

Proof If $0 \notin \sigma(T)$, then $T$ is invertible and so $T^{-1}$ is bounded. Since the composition of a compact operator and a bounded operator is compact (see Exercise 15.2), it follows that $I = TT^{-1}$ is compact. But this implies that the unit ball in $X$ is compact, so by Theorem 5.5 $X$ is finite-dimensional, a contradiction.

While this result shows that the spectrum of a compact operator on an infinite-dimensional space is always non-empty, such an operator can have no eigenvalues (see Exercise 15.9).

Exercises

15.1 Show that $T \in B(X, Y)$ is compact if and only if $T B_X$ is a precompact subset of $Y$, i.e. any sequence in $T B_X$ has a convergent subsequence.

15.2 Suppose that $T \in B(X, Y)$ and $S \in B(Y, Z)$. Show that if either of $S$ or $T$ is compact, then $S \circ T$ is compact.

15.3 Show that the operator $T : \ell^2 \to \ell^2$ given by
\[
(x_1, x_2, x_3, x_4, \ldots) \mapsto \Bigl( x_1, \frac{x_2}{2}, \frac{x_3}{3}, \ldots \Bigr)
\]
is compact.

15.4 Suppose that $K \in C([a,b] \times [a,b])$. Use the Arzelà–Ascoli Theorem (Theorem 6.13) to show that the operator $T : C([a,b]) \to C([a,b])$ defined by
\[
Tu(x) = \int_a^b K(x,y) u(y)\, dy
\]
is compact. (Bollobás, 1990; Pryce, 1973)

15.5 Show that if $T$ is a Hilbert–Schmidt operator, then $\|T\|_{B(H)} \le \|T\|_{\mathrm{HS}}$.

15.6 Show that the operator from Proposition 15.4 is a Hilbert–Schmidt operator on $L^2(a,b)$ as follows. Take any orthonormal basis $\{e_j\}_{j=1}^\infty$ for $L^2(a,b)$ and for each fixed $x \in (a,b)$ consider the function $\kappa_x \in C^0(a,b)$ given by $\kappa_x(y) = K(x,y)$. Then
\[
(Te_j)(x) = \int_a^b K(x,y) e_j(y)\, dy = \int_a^b \kappa_x(y) e_j(y)\, dy = (\kappa_x, \overline{e_j}).
\]
Use this to show that
\[
\sum_{j=1}^\infty \|Te_j\|^2 = \int_a^b \int_a^b |K(x,y)|^2\, dx\, dy < \infty,
\]
and hence that $T$ is Hilbert–Schmidt. (Young, 1988)

15.7 Suppose that $\{K_{ij}\}_{i,j=1}^\infty \in \mathbb{K}$ with
\[
\sum_{i,j=1}^\infty |K_{ij}|^2 < \infty.
\]
Show that the operator $S : \ell^2 \to \ell^2$ defined by setting
\[
(Sx)_i = \sum_{j=1}^\infty K_{ij} x_j
\]
is compact. (Show that $S$ is Hilbert–Schmidt.)

15.8 Let $X$ be an infinite-dimensional normed space and $T : X \to Y$ a compact linear operator. Show that there exists $(x_n) \in S_X$ such that $Tx_n \to 0$. Show by example that there need not exist $x \in S_X$ with $Tx = 0$. (Giles, 2000)

15.9 Show that the operator $T' : \ell^2 \to \ell^2$ defined by
\[
(x_1, x_2, x_3, \ldots) \mapsto \Bigl( 0, x_1, \frac{x_2}{2}, \frac{x_3}{3}, \ldots \Bigr)
\]
is compact and has no eigenvalues. (Note that $T' = s_r \circ T$, where $T$ is the compact operator from Exercise 15.3 and $s_r$ is the right-shift operator from Example 11.7.) (Kreyszig, 1978)

15.10 Show that the operator $T : \ell^2 \to \ell^2$ that maps $e^{(j)}$ to $e^{(j+1)}$ when $j$ is odd and $e^{(j)}$ to zero when $j$ is even, i.e.
\[
(x_1, x_2, x_3, \ldots) \mapsto (0, x_1, 0, x_3, 0, x_5, \ldots),
\]
is not compact. (Since $T^2 = 0$, this gives an example of a non-compact operator whose square is compact.)

16 The Hilbert–Schmidt Theorem

It is one of the major results of finite-dimensional linear algebra that all eigenvalues of real symmetric matrices are real and that the eigenvectors of distinct eigenvalues are orthogonal. In this chapter we prove similar results for compact self-adjoint operators on infinite-dimensional Hilbert spaces: we show that the spectrum consists entirely of real eigenvalues (except perhaps zero), that the multiplicity of every non-zero eigenvalue is finite, and that the eigenvectors form an orthonormal basis for H .

16.1 Eigenvalues of Self-Adjoint Operators

If $T$ is self-adjoint, then the numerical range of $T$, $V(T)$, is the set
\[
V(T) := \{(Tx, x) : x \in H, \ \|x\| = 1\}. \tag{16.1}
\]
It is possible to deduce various facts about the spectrum of $T$ from its numerical range (see Exercises 16.1, 16.3, and 16.4), but we will mainly use it for the following result.

Theorem 16.1 Let $H$ be a Hilbert space and $T \in B(H)$ a self-adjoint operator. Then $V(T) \subset \mathbb{R}$ and
\[
\|T\|_{B(H)} = \sup\{|\lambda| : \lambda \in V(T)\}. \tag{16.2}
\]
(In fact when $H$ is a complex Hilbert space an operator $T \in B(H)$ is self-adjoint if and only if $V(T) \subset \mathbb{R}$; see Exercise 16.1.)

Proof We have
\[
(Tx, x) = (x, Tx) = \overline{(Tx, x)},
\]
and so $(Tx, x)$ is real for every $x \in H$.


To prove (16.2) we let $M = \sup\{|(Tx, x)| : x \in H, \ \|x\| = 1\}$. Clearly
\[
|(Tx, x)| \le \|Tx\| \|x\| \le \|T\| \|x\|^2 = \|T\|
\]
when $\|x\| = 1$, and so $M \le \|T\|$. Now observe that for any $u, v \in H$ we have
\[
\begin{aligned}
(T(u+v), u+v) - (T(u-v), u-v) &= 2[(Tu, v) + (Tv, u)] \\
&= 2[(Tu, v) + (v, Tu)] = 4 \operatorname{Re}(Tu, v),
\end{aligned}
\]
using the fact that $(Tv, u) = (v, Tu) = \overline{(Tu, v)}$ since $T$ is self-adjoint. Therefore
\[
\begin{aligned}
4 \operatorname{Re}(Tu, v) &= (T(u+v), u+v) - (T(u-v), u-v) \\
&\le M(\|u+v\|^2 + \|u-v\|^2) = 2M(\|u\|^2 + \|v\|^2)
\end{aligned}
\]
using the Parallelogram Law (Lemma 8.7). If $Tu \neq 0$ choose
\[
v = \frac{\|u\|}{\|Tu\|} Tu
\]
to obtain, since $\|v\| = \|u\|$, that
\[
4 \|u\| \|Tu\| \le 4M \|u\|^2,
\]
i.e. $\|Tu\| \le M \|u\|$ if $Tu \neq 0$. The same inequality is trivial if $Tu = 0$, and so it follows that $\|T\| \le M$ and therefore we obtain $\|T\| = M$, as required.

Corollary 16.2 If $T \in B(H)$ is self-adjoint, then
(i) all of its eigenvalues are real, and
(ii) if $Tx_1 = \lambda_1 x_1$ and $Tx_2 = \lambda_2 x_2$ with $\lambda_1 \neq \lambda_2$, then $(x_1, x_2) = 0$.

Proof If $x \neq 0$ and $Tx = \lambda x$, then
\[
(Tx, x) = (\lambda x, x) = \lambda \|x\|^2
\]
and $(Tx, x) \in \mathbb{R}$ by the previous theorem, so $\lambda$ is real. If $x_1$ or $x_2$ is zero, then the result in (ii) is immediate. Otherwise $\lambda_1$ and $\lambda_2$ are eigenvalues of $T$, and so by part (i) we must have $\lambda_1, \lambda_2 \in \mathbb{R}$; now simply note that
\[
\lambda_1 (x_1, x_2) = (Tx_1, x_2) = (x_1, Tx_2) = \lambda_2 (x_1, x_2)
\]


and so $(\lambda_1 - \lambda_2)(x_1, x_2) = 0$, which implies that $(x_1, x_2) = 0$ since $\lambda_1 \neq \lambda_2$.
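Theorem 16.1 can be observed numerically for a small symmetric matrix. The following sketch assumes the real symmetric matrix with rows $(2, 1)$ and $(1, 2)$ acting on $\mathbb{R}^2$ (eigenvalues $1$ and $3$) and samples unit vectors on a finite grid of angles, so both suprema are only approximated; the grid size and function names are choices made here.

```python
import math

T = [[2.0, 1.0], [1.0, 2.0]]   # real symmetric, eigenvalues 1 and 3

def apply(T, x):
    return (T[0][0] * x[0] + T[0][1] * x[1],
            T[1][0] * x[0] + T[1][1] * x[1])

def quad_form(T, x):
    """(Tx, x) for a real vector x."""
    Tx = apply(T, x)
    return Tx[0] * x[0] + Tx[1] * x[1]

steps = 3600
units = [(math.cos(2 * math.pi * k / steps), math.sin(2 * math.pi * k / steps))
         for k in range(steps)]

M = max(abs(quad_form(T, x)) for x in units)          # sup |(Tx, x)| over ||x|| = 1
op_norm = max(math.hypot(*apply(T, x)) for x in units)  # sup ||Tx|| over ||x|| = 1

assert abs(M - 3.0) < 1e-9
assert abs(op_norm - 3.0) < 1e-9
```

Both suprema are attained at the unit eigenvector $(1, 1)/\sqrt{2}$, which lies on the grid. For a non-symmetric matrix such as the one with rows $(0, 1)$ and $(0, 0)$ the two quantities differ ($1/2$ against $1$), so self-adjointness is genuinely needed in (16.2).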

16.2 Eigenvalues of Compact Self-Adjoint Operators

We now show that any compact self-adjoint operator has at least one eigenvalue.

Theorem 16.3 Let $H$ be a Hilbert space and $T \in B(H)$ a compact self-adjoint operator. Then at least one of $\pm\|T\|$ is an eigenvalue of $T$, and so in particular
\[
\|T\| = \max\{|\lambda| : \lambda \in \sigma_p(T)\}. \tag{16.3}
\]

Proof We assume that $T \neq 0$, otherwise the result is trivial. From Theorem 16.1 we have
\[
\|T\| = \sup_{\|x\| = 1} |(Tx, x)|,
\]
so there exists a sequence $(x_n)$ of unit vectors in $H$ such that
\[
(Tx_n, x_n) \to \alpha, \tag{16.4}
\]
where $\alpha$ is either $\|T\|$ or $-\|T\|$. Since $T$ is compact, there is a subsequence $x_{n_j}$ such that $Tx_{n_j}$ is convergent to some $y \in H$. Relabel $x_{n_j}$ as $x_n$ again, so that $Tx_n \to y$ and (16.4) still holds. Now consider
\[
\|Tx_n - \alpha x_n\|^2 = \|Tx_n\|^2 + \alpha^2 - 2\alpha (Tx_n, x_n) \le 2\alpha^2 - 2\alpha (Tx_n, x_n);
\]
by our choice of $x_n$, the right-hand side tends to zero as $n \to \infty$. It follows, since $Tx_n \to y$, that
\[
\alpha x_n \to y,
\]
and since $\alpha \neq 0$ is fixed we have $x_n \to x := y/\alpha$; note that $\|x\| = 1$ since it is the limit of the $x_n$ and $\|x_n\| = 1$ for every $n$. Since $T$ is bounded, it is continuous, so therefore
\[
Tx = \lim_{n \to \infty} Tx_n = y = \alpha x.
\]
We have found $x \in H$ with $\|x\| = 1$ such that $Tx = \alpha x$, so $\alpha \in \sigma_p(T)$. We already showed that any eigenvalue $\lambda$ must satisfy $|\lambda| \le \|T\|$ (see (14.1)) and so (16.3) follows.


We now start to investigate the spectrum of self-adjoint compact operators. We show that the eigenvalues can only accumulate at 0, and that every eigenspace corresponding to a non-zero eigenvalue is finite-dimensional.

Proposition 16.4 Let $T$ be a compact self-adjoint operator on a separable Hilbert space $H$. Then $\sigma_p(T)$ is either finite or consists of a countable sequence $(\lambda_n)_{n=1}^\infty$ with $\lambda_n \to 0$ as $n \to \infty$. Furthermore, every distinct non-zero eigenvalue corresponds to only a finite number of linearly independent eigenvectors.

Proof Suppose that $T$ has infinitely many eigenvalues that do not form a sequence tending to zero. Then for some $\varepsilon > 0$ there exists a sequence $(\lambda_n)$ of distinct eigenvalues with $|\lambda_n| > \varepsilon$ for every $n$. Let $(x_n)$ be a corresponding sequence of eigenvectors (i.e. $Tx_n = \lambda_n x_n$) with $\|x_n\| = 1$; then for $n \neq m$
\[
\begin{aligned}
\|Tx_n - Tx_m\|^2 &= (Tx_n - Tx_m, Tx_n - Tx_m) \\
&= (\lambda_n x_n - \lambda_m x_m, \lambda_n x_n - \lambda_m x_m) \\
&= |\lambda_n|^2 + |\lambda_m|^2 \ge 2\varepsilon^2
\end{aligned}
\]
since $(x_n, x_m) = 0$ (as we are assuming that $T$ is self-adjoint we can use Corollary 16.2). It follows that $(Tx_n)$ can have no convergent subsequence, which contradicts the compactness of $T$.

Now suppose that for some non-zero eigenvalue $\lambda$ there exist an infinite number of linearly independent eigenvectors $\{e_n\}_{n=1}^\infty$. Using the Gram–Schmidt process from Proposition 9.9 we can find a countably infinite orthonormal set of eigenvectors $\{\hat e_j\}$, since any linear combination of the $\{e_j\}$ is still an eigenvector:
\[
T\Bigl( \sum_{j=1}^n \alpha_j e_j \Bigr) = \sum_{j=1}^n \alpha_j Te_j = \lambda \Bigl( \sum_{j=1}^n \alpha_j e_j \Bigr).
\]
So for $n \neq m$ we have
\[
\|T\hat e_n - T\hat e_m\| = \|\lambda \hat e_n - \lambda \hat e_m\| = |\lambda| \|\hat e_n - \hat e_m\| = \sqrt{2}\, |\lambda|.
\]
It follows that $(T\hat e_n)$ can have no convergent subsequence, again contradicting the compactness of $T$. (Note that this second part does not use the fact that $T$ is self-adjoint.)

Note that if $T^n$ rather than $T$ is compact, then we can apply the above result to $T^n$: the Spectral Mapping Theorem for polynomials (Theorem 14.9) tells us that $\sigma(T^n) = [\sigma(T)]^n$, and so in this case too the point spectrum of $T$ is either finite or consists of a countable sequence tending to zero.


16.3 The Hilbert–Schmidt Theorem

We now prove our main result about compact self-adjoint operators, the Hilbert–Schmidt Theorem. We show that any such operator can be expressed in terms of its eigenvalues and eigenvectors, and that when augmented by a basis for the kernel of $T$ the eigenvectors form an orthonormal basis for $H$. We will need the following simple lemma.

Lemma 16.5 If $T \in B(H)$ and $Y$ is a closed linear subspace of $H$ such that $TY \subseteq Y$, then $T^* Y^\perp \subseteq Y^\perp$. In particular, if $T \in B(H)$ is self-adjoint and $Y$ is a closed linear subspace of $H$, then
\[
TY \subseteq Y \quad \Rightarrow \quad TY^\perp \subseteq Y^\perp.
\]

Proof Let $x \in Y^\perp$ and $y \in Y$. Then $Ty \in Y$ and so
\[
0 = (Ty, x) = (y, T^* x) \quad \text{for all } y \in Y,
\]
i.e. $T^* x \in Y^\perp$.

In the proof of the Hilbert–Schmidt Theorem we find successive eigenvalues of $T$, $\lambda_1, \lambda_2, \ldots$ by using Theorem 16.3 repeatedly. Each time we find a new eigenvector $w_1, w_2, \ldots$, and ‘remove’ these directions from $H$ by considering $H_{n+1} = \operatorname{Span}(w_1, \ldots, w_n)^\perp$, which will be invariant under $T$ due to Lemma 16.5.

Theorem 16.6 (Hilbert–Schmidt Theorem). Let $H$ be a Hilbert space and $T \in B(H)$ a compact self-adjoint operator. Then there exists a finite or countably infinite orthonormal sequence $(w_j)$ consisting of eigenvectors of $T$, with corresponding non-zero real eigenvalues $(\lambda_j)$, such that for all $x \in H$
\[
Tx = \sum_j \lambda_j (x, w_j) w_j. \tag{16.5}
\]

Proof By Theorem 16.3 there exists w1 ∈ H such that T w1 = ±T w1 and w1  = 1. Consider the subspace of H perpendicular to w1 , H2 = w1⊥ . Since H2 ⊂ H is closed, it is a Hilbert space (Lemma 8.11). Then since T is self-adjoint, Lemma 16.5 shows that T leaves H2 invariant. If we consider T2 = T | H2 , then we have T2 ∈ B(H2 , H2 ) with T2 compact; this operator is still self-adjoint, since for all x, y ∈ H2 (x, T2 y) = (x, T y) = (T x, y) = (T2 x, y).


Now apply Theorem 16.3 to the operator $T_2$ on the Hilbert space $H_2$ to find an eigenvalue $\lambda_2 = \pm\|T_2\|$ and an eigenvector $w_2 \in H_2$ with $\|w_2\| = 1$.

Now if we let $H_3 = \{w_1, w_2\}^\perp$, then $H_3$ is a closed subspace of $H_2$ and $T_3 = T|_{H_3}$ is compact and self-adjoint. We can once more apply Theorem 16.3 to find an eigenvalue $\lambda_3 = \pm\|T_3\|$ and a corresponding eigenvector $w_3 \in H_3$ with $\|w_3\| = 1$.

We continue this process as long as $T_n \neq 0$. If $T_n = 0$ for some $n$, then, for any given $x \in H$, if we set

$$y := x - \sum_{j=1}^{n-1} (x, w_j) w_j \in H_n,$$

we have

$$0 = T_n y = Ty = Tx - \sum_{j=1}^{n-1} (x, w_j) Tw_j = Tx - \sum_{j=1}^{n-1} \lambda_j (x, w_j) w_j,$$

which is (16.5).

If $T_n$ is never zero, then, given $x \in H$, consider

$$y_n := x - \sum_{j=1}^{n-1} (x, w_j) w_j \in H_n$$

(for $n \ge 2$). We have

$$\|x\|^2 = \|y_n\|^2 + \sum_{j=1}^{n-1} |(x, w_j)|^2,$$

and so $\|y_n\| \le \|x\|$. It follows, since $T_n = T|_{H_n}$ and $\|T_n\| = |\lambda_n|$, that

$$\Big\|Tx - \sum_{j=1}^{n-1} \lambda_j (x, w_j) w_j\Big\| = \|Ty_n\| \le \|T_n\|\,\|y_n\| \le |\lambda_n|\,\|x\|,$$

and since $|\lambda_n| \to 0$ as $n \to \infty$ (Proposition 16.4) we obtain (16.5).

There is a partial converse to this theorem; see Exercise 16.5.

The orthonormal sequence constructed in this theorem is only a basis for the range of $T$; however, we can augment this by a basis for the kernel of $T$ to obtain the following result.

Corollary 16.7 Let $H$ be an infinite-dimensional separable Hilbert space and $T \in B(H)$ a compact self-adjoint operator. Then there exists a countable orthonormal basis $E = \{e_j\}_{j=1}^\infty$ of $H$ consisting of eigenvectors of $T$, and for any $x \in H$

$$Tx = \sum_{j=1}^\infty \lambda_j (x, e_j) e_j, \tag{16.6}$$

where $Te_j = \lambda_j e_j$. In particular, if $\mathrm{Ker}(T) = \{0\}$, then $H$ has an orthonormal basis consisting of (suitably normalised) eigenvectors of $T$ corresponding to non-zero eigenvalues.

Proof Theorem 16.6 gives a finite or countable sequence $W = (w_k)$ of eigenvectors of $T$ such that

$$Tx = \sum_k \lambda_k (x, w_k) w_k. \tag{16.7}$$

Since $\mathrm{Ker}(T)$ is a closed subspace of $H$, it is a Hilbert space in its own right (Lemma 8.11); since $H$ is separable so is $\mathrm{Ker}(T)$, and as Proposition 9.17 shows that every separable Hilbert space has a countable orthonormal basis, it follows that $\mathrm{Ker}(T)$ has a finite or countable orthonormal basis $F$.

Note that each $f \in F$ is an eigenvector of $T$ with eigenvalue zero, and since $Tf = 0$ but $Tw_k = \lambda_k w_k$ with $\lambda_k \neq 0$, we know that $(f, w_k) = 0$ for every $f \in F$, $w_k \in W$. So $F \cup W$ is a countable orthonormal set in $H$.

Now, (16.7) implies that

$$T\Big[x - \sum_{j} (x, w_j) w_j\Big] = 0,$$

i.e. that $x - \sum_{j} (x, w_j) w_j \in \mathrm{Ker}\,T$. It follows that $F \cup W$ is an orthonormal basis for $H$. Since it is countable, we can relabel it to yield $E = \{e_j\}_{j=1}^\infty$, and the expression in (16.6) follows directly from (16.7).

Finally, we use this result to show that for compact self-adjoint operators, any non-zero element of the spectrum must be an eigenvalue. (In fact self-adjointness is not necessary: we will see that the same result holds for compact operators on a Banach space in Theorem 24.7.)

Theorem 16.8 If $T$ is a compact self-adjoint operator on a separable Hilbert space $H$, then $\sigma(T) = \overline{\sigma_p(T)}$. Every non-zero $\lambda \in \sigma(T)$ is an eigenvalue, and either $\sigma(T) = \sigma_p(T)$ or $\sigma(T) = \sigma_p(T) \cup \{0\}$.


Proof By Corollary 16.7 we have

$$Tx = \sum_{j=1}^\infty \lambda_j (x, e_j) e_j$$

for some orthonormal basis $\{e_j\}_{j=1}^\infty$ of $H$. Now take $\lambda \notin \overline{\sigma_p(T)}$. For such $\lambda$, it follows that there exists a $\delta > 0$ such that

$$\inf_{j \in \mathbb{N}} |\lambda - \lambda_j| \ge \delta > 0, \tag{16.8}$$

for otherwise $\lambda \in \overline{\sigma_p(T)}$. We use this to show that $T - \lambda I$ is invertible, i.e. that $\lambda \notin \sigma(T)$. Indeed,

$$(T - \lambda I)x = y \quad\Leftrightarrow\quad \sum_{j=1}^\infty (\lambda_j - \lambda)(x, e_j) e_j = \sum_{j=1}^\infty (y, e_j) e_j.$$

Taking the inner product of both sides with each $e_i$ in turn we have

$$(T - \lambda I)x = y \quad\Leftrightarrow\quad (\lambda_i - \lambda)(x, e_i) = (y, e_i) \ \text{ for every } i \in \mathbb{N}.$$

Since $\lambda \neq \lambda_i$ for every $i$, we can solve $(T - \lambda I)x = y$ by setting

$$(x, e_i) = \frac{(y, e_i)}{\lambda_i - \lambda} \quad\text{for every } i \in \mathbb{N}.$$

Then, using (16.8),

$$\|x\|^2 = \sum_{i=1}^\infty |(x, e_i)|^2 = \sum_{i=1}^\infty \frac{|(y, e_i)|^2}{|\lambda_i - \lambda|^2} \le \frac{1}{\delta^2} \sum_{i=1}^\infty |(y, e_i)|^2 = \frac{\|y\|^2}{\delta^2}.$$

Therefore $(T - \lambda I)^{-1}$ exists and is bounded: thus $\lambda \in \rho(T)$, which shows that $\lambda \notin \sigma(T)$.

We have shown that if $\lambda \notin \overline{\sigma_p(T)}$, then $\lambda \notin \sigma(T)$, which implies that $\sigma(T) \subseteq \overline{\sigma_p(T)}$; since $\sigma_p(T) \subseteq \sigma(T)$ and $\sigma(T)$ is closed (Lemma 14.3) we also have $\overline{\sigma_p(T)} \subseteq \sigma(T)$, so we can conclude that $\sigma(T) = \overline{\sigma_p(T)}$, as claimed.

The proof of Proposition 16.4 shows that the only possible limit point of $\sigma_p(T)$ is 0. So if $\lambda \in \sigma(T) = \overline{\sigma_p(T)}$, then either $\lambda \in \sigma_p(T)$ or $\lambda = 0$; in particular, every non-zero $\lambda \in \sigma(T)$ is an eigenvalue. If $H$ is infinite-dimensional, then we always have $0 \in \sigma(T)$ since $T$ is compact (Theorem 15.8). It follows, since the only possible limit point of $\sigma_p(T)$ is 0, that either $\sigma(T) = \sigma_p(T)$ (if 0 is an eigenvalue of $T$) or $\sigma(T) = \sigma_p(T) \cup \{0\}$ (if 0 is not an eigenvalue of $T$).
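The expansion (16.5) and the coefficient formula used to invert $T - \lambda I$ can be tried out in finite dimensions. The following is only an illustrative sketch: the $2 \times 2$ symmetric matrix, its hand-computed eigenpairs, and all function names are assumptions chosen for the example, not part of the text.

```python
import math

# A symmetric matrix on R^2, used as a finite-dimensional stand-in for T.
T = [[2.0, 1.0],
     [1.0, 2.0]]

# Orthonormal eigenvectors w_j with eigenvalues lambda_j (computed by hand):
# T(1, 1) = 3(1, 1) and T(1, -1) = 1(1, -1).
r = 1.0 / math.sqrt(2.0)
eigenpairs = [(3.0, (r, r)), (1.0, (r, -r))]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def apply(A, x):
    """Matrix-vector product A x."""
    return tuple(dot(row, x) for row in A)

def spectral_apply(x):
    """Evaluate sum_j lambda_j (x, w_j) w_j, the right-hand side of (16.5)."""
    out = [0.0, 0.0]
    for lam_j, w in eigenpairs:
        c = lam_j * dot(x, w)
        out = [o + c * wi for o, wi in zip(out, w)]
    return tuple(out)

def resolvent_solve(y, lam):
    """Solve (T - lam I) x = y by setting (x, e_i) = (y, e_i) / (lambda_i - lam)."""
    out = [0.0, 0.0]
    for lam_i, e in eigenpairs:
        c = dot(y, e) / (lam_i - lam)
        out = [o + c * ei for o, ei in zip(out, e)]
    return tuple(out)

x = (0.3, -1.7)
direct = apply(T, x)
expanded = spectral_apply(x)

y = (1.0, 2.0)
lam = 0.5                                   # not an eigenvalue of T
sol = resolvent_solve(y, lam)
residual = tuple(t - lam * s - b for t, s, b in zip(apply(T, sol), sol, y))

delta = min(abs(lam_i - lam) for lam_i, _ in eigenpairs)
norm = lambda v: math.sqrt(dot(v, v))
```

Here `direct` and `expanded` agree up to rounding, `residual` is numerically zero, and `norm(sol) <= norm(y) / delta` mirrors the bound $\|x\| \le \|y\|/\delta$ from the proof.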


Exercises

16.1 Suppose that $H$ is a complex Hilbert space and $T \in B(H)$. Show that if $(Tx, x) \in \mathbb{R}$ for every $x \in H$, then $T$ is self-adjoint. [Hint: consider the two expressions $(T(x + y), x + y)$ and $(T(x + iy), x + iy)$.]

16.2 Show that if $T \in B(H)$ and $(Tx, y) = 0$ for every $x, y \in H$ then $T = 0$. Show that if $H$ is a complex Hilbert space then $(Tx, x) = 0$ for every $x \in H$ implies that $T = 0$. (Kreyszig, 1978)

16.3 Suppose that $H$ is a Hilbert space, $T \in B(H)$ is self-adjoint, and $V(T) \subseteq [0, \beta]$ for some $\beta > 0$, where $V(T)$ is the numerical range of $T$ defined in (16.1). Show that

$$\|Tx\|^2 \le \beta (Tx, x), \qquad x \in H.$$

[Hint: rewrite this inequality as $((\beta I - T)Tx, x) \ge 0$.] (Pryce, 1973)

16.4 Using the result of the previous exercise and adapting the argument of Theorem 16.3 show that $\alpha := \inf V(T)$ and $\beta := \sup V(T)$ are both eigenvalues of $T$. (Note that Theorem 16.1 shows that either $\alpha = -\|T\|$ or $\beta = \|T\|$ (or perhaps both).)

16.5 Let $H$ be a Hilbert space, $(e_j)_{j=1}^\infty$ an orthonormal sequence in $H$, and define $T \in L(H)$ by setting

$$Tu := \sum_{j=1}^\infty \lambda_j (u, e_j) e_j,$$

where each $\lambda_j \in \mathbb{R}$. Show that every $\lambda_j$ is an eigenvalue of $T$ and that there are no other eigenvalues. Show also that
(i) $T$ is bounded if and only if $(\lambda_j)$ is bounded;
(ii) if $(\lambda_j)$ is bounded, then $T$ is self-adjoint;
(iii) $T$ is compact if and only if $\lambda_j \to 0$ as $j \to \infty$;
(iv) $T$ is Hilbert–Schmidt if and only if $\sum_j |\lambda_j|^2 < \infty$.
(Rynne and Youngson, 2008)

16.6 Suppose that $\{e_j(x)\}$ is an orthonormal set in $L^2((a, b); \mathbb{R})$ and that

$$K(x, y) = \sum_{j=1}^\infty \lambda_j e_j(x) e_j(y) \quad\text{with}\quad \sum_{j=1}^\infty |\lambda_j|^2 < \infty.$$

Show that the $\{e_j(x)\}$ are eigenvectors of

$$(Tu)(x) = \int_a^b K(x, y) u(y)\,dy$$

with corresponding eigenvalues $\lambda_j$ and that there are no other eigenvectors corresponding to non-zero eigenvalues.


16.7 In the setting of the previous exercise, consider
(i) $(a, b) = (-\pi, \pi)$ and $K(t, s) = \cos(t - s)$; show that the only non-zero eigenvalue of the corresponding operator $T$ is $\pi$ (use Exercise 9.1);
(ii) $(a, b) = (-1, 1)$ and $K(t, s) = 1 - 3(t - s)^2 + 9t^2 s^2$; show that the non-zero eigenvalues of the corresponding operator $T$ are $4$ and $8/5$ (use Example 10.8).

16.8 If $T$ is a compact self-adjoint operator on a Hilbert space $H$ and $(\lambda_j)_{j=1}^\infty$, ordered so that $\lambda_{n+1} \le \lambda_n$, are the eigenvalues of $T$, then

$$\lambda_{n+1} = \min_{V_n}\ \max_{x \in V_n^\perp,\ x \neq 0} \frac{(Tx, x)}{\|x\|^2}, \tag{16.9}$$

where the minimum is taken over all $n$-dimensional subspaces $V_n$ of $H$. To show this, prove that (i) if $V$ is an $n$-dimensional subspace of $H$, then any $(n + 1)$-dimensional subspace $W$ of $H$ contains a non-zero vector orthogonal to $V$; and (ii) if $x \in \mathrm{Span}(e_1, \ldots, e_n)$, where $(e_j)$ are orthonormal eigenvectors corresponding to the $(\lambda_j)$, then

$$\frac{(Tx, x)}{\|x\|^2} \ge \lambda_n.$$

Deduce that (16.9) holds. (Lax, 2002)

16.9 Show that if $T : H \to H$ is compact, then $T_{\mathbb{C}} : H_{\mathbb{C}} \to H_{\mathbb{C}}$ is compact, where $H_{\mathbb{C}}$ and $T_{\mathbb{C}}$ are the complexifications of $H$ and $T$ from Exercises 14.1 and 14.2. Show similarly that if $T$ is self-adjoint, then $T_{\mathbb{C}}$ is self-adjoint.

17 Application: Sturm–Liouville Problems

In this chapter we consider the Sturm–Liouville eigenvalue problem

$$-\frac{d}{dx}\Big(p(x)\frac{du}{dx}\Big) + q(x)u = \lambda u \quad\text{with}\quad u(a) = u(b) = 0. \tag{17.1}$$

Sturm–Liouville problems arise naturally in many situations via the technique of separation of variables. Throughout this chapter we will assume that $p \in C^1([a, b])$ with $p(x) > 0$ on $[a, b]$ and $q \in C([a, b])$ with $q(x) \ge 0$ on $[a, b]$.

Any $\lambda \in \mathbb{R}$ for which there exists a non-zero function $u \in C^2([a, b])$ such that (17.1) holds is called an eigenvalue of this problem, and $u$ is then the corresponding eigenfunction. As a shorthand, we write $L[u]$ for the left-hand side of (17.1), i.e.

$$L[u] = -(p(x)u')' + q(x)u,$$

and then $\lambda$ is an eigenvalue if $L[u] = \lambda u$ for some non-zero $u \in D$, where

$$D := \{u \in C^2([a, b]) : u(a) = u(b) = 0\}.$$

Our aim in this chapter is to show that the eigenfunctions of the Sturm–Liouville problem form an orthonormal basis for $L^2(a, b)$, and to prove some other properties of the eigenvalues and eigenfunctions. Some of these we will be able to prove relatively easily, while others will require the theory developed in the previous chapters. We will sum up all our results in Theorem 17.8 at the end of the chapter.


17.1 Symmetry of L and the Wronskian

Let us note first that $L$ has a useful symmetry property. We use $(\cdot, \cdot)$ for the inner product in $L^2(a, b)$.

Lemma 17.1 If $u, v \in D$, then $(L[u], v) = (u, L[v])$. For all $u \in D$ we have $(L[u], u) \ge 0$ with $(L[u], u) = 0$ if and only if $u = 0$.

Proof We integrate by parts and use the boundary conditions:

$$\begin{aligned}
(L[u], v) &= \int_a^b \big(-(pu')' + qu\big)v = -\int_a^b (pu')'v + \int_a^b quv \\
&= \int_a^b pu'v' - \big[pu'v\big]_a^b + \int_a^b quv \qquad (17.2) \\
&= -\int_a^b (pv')'u + \big[pv'u\big]_a^b + \int_a^b quv \\
&= -\int_a^b (pv')'u + \int_a^b quv = (u, L[v]).
\end{aligned}$$

Putting $v = u$, the boundary term in (17.2) vanishes and we have

$$(L[u], u) = \int_a^b p(u')^2 + qu^2,$$

which is non-negative since $p > 0$ and $q \ge 0$ on $[a, b]$. If $(L[u], u) = 0$, then, since $p > 0$ on $[a, b]$, it follows that $u' = 0$ on $[a, b]$, and so $u$ must be constant on $[a, b]$. Since $u(a) = 0$, it follows that $u \equiv 0$.

Using this result we can show that eigenvalues are positive and eigenfunctions with different eigenvalues are orthogonal.

Corollary 17.2 If $\lambda$ is an eigenvalue of (17.1), then $\lambda > 0$, and if $u$ and $v$ are eigenfunctions corresponding to distinct eigenvalues, then they are orthogonal in $L^2(a, b)$.

Proof If $L[u] = \lambda u$ and $u \in D$ is non-zero, then

$$\lambda \|u\|^2 = \lambda (u, u) = (L[u], u) > 0.$$

If $\lambda_1 \neq \lambda_2$ and $u_1$ and $u_2$ are non-zero with $L[u_i] = \lambda_i u_i$, $i = 1, 2$, then

$$\lambda_1 (u_1, u_2) = (L[u_1], u_2) = (u_1, L[u_2]) = \lambda_2 (u_1, u_2),$$

which implies that $(u_1, u_2) = 0$.


The following lemma will play an important role in our subsequent analysis.

Lemma 17.3 Suppose that $u_1, u_2 \in C^2([a, b])$ are two non-zero solutions of

$$-\frac{d}{dx}\Big(p(x)\frac{du}{dx}\Big) + w(x)u = 0,$$

where $p \in C^1([a, b])$ with $p(x) > 0$ and $w \in C([a, b])$. Then the 'Wronskian'

$$W_p(u_1, u_2)(x) := p(x)\,[u_1'(x)u_2(x) - u_2'(x)u_1(x)]$$

is constant, and $W_p$ is non-zero if and only if $u_1$ and $u_2$ are linearly independent.

Note that we do not require $w(x) \ge 0$ for this result. When $w(x) = q(x)$ the functions $u_1$ and $u_2$ are not required to satisfy the boundary conditions from (17.1), so they are not eigenfunctions of $L$.

Proof Differentiate $W_p$ with respect to $x$, then use the differential equation to substitute $pu_i'' = wu_i - p'u_i'$, which gives

$$\begin{aligned}
W_p' &= p'u_1'u_2 + pu_1''u_2 + pu_1'u_2' - p'u_2'u_1 - pu_2''u_1 - pu_2'u_1' \\
&= p'(u_1'u_2 - u_2'u_1) + p(u_1''u_2 - u_2''u_1) \\
&= p'(u_1'u_2 - u_2'u_1) + u_2(wu_1 - p'u_1') - u_1(wu_2 - p'u_2') = 0.
\end{aligned}$$

For the link between linear independence of $u_1$, $u_2$ and $W_p$, first note that since $p > 0$ and $W_p$ is constant, either $u_1'(x)u_2(x) - u_2'(x)u_1(x) = 0$ for every $x \in [a, b]$ or $u_1'(x)u_2(x) - u_2'(x)u_1(x) \neq 0$ for every $x \in [a, b]$.

Suppose that $\alpha u_1(x) + \beta u_2(x) = 0$ for every $x \in [a, b]$. Then, differentiating this equality, we obtain $\alpha u_1'(x) + \beta u_2'(x) = 0$ for every $x \in [a, b]$, and we can combine these two equations as

$$\begin{pmatrix} u_1(x) & u_2(x) \\ u_1'(x) & u_2'(x) \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \end{pmatrix} = 0.$$

If the determinant of this matrix is non-zero, i.e. if $W_p \neq 0$, then it is invertible and so we must have $\alpha = \beta = 0$, i.e. $u_1$ and $u_2$ are linearly independent. On the other hand, if the determinant is zero, then the matrix is not invertible; by Lemma 1.22 this means that its kernel is not trivial, so there exists a non-zero solution $(\alpha, \beta)$, which implies that $u_1$ and $u_2$ are linearly dependent.

We can now show that all eigenvalues of (17.1) are simple.


Corollary 17.4 All eigenvalues $\lambda$ of (17.1) are simple, i.e. the space

$$E_\lambda := \{u \in D : L[u] = \lambda u\}$$

has dimension one.

Proof Suppose that $L[u] = \lambda u$ and $L[v] = \lambda v$. Then both $u$ and $v$ are solutions of the problem

$$-(pu')' + (q - \lambda)u = 0, \qquad u(a) = u(b) = 0.$$

It follows from Lemma 17.3 that $W_p(u, v)$ is constant on $[a, b]$; since $W_p(u, v)(a) = 0$, $W_p \equiv 0$ and so $u$ and $v$ are linearly dependent, i.e. $E_\lambda$ has dimension one.
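As a quick sanity check of Lemma 17.3, one can evaluate the Wronskian for a concrete equation. The sketch below assumes the particular choice $p \equiv 1$, $w \equiv 1$ (so the equation is $-u'' + u = 0$, solved by $\cosh$ and $\sinh$); this choice and the helper name `wronskian` are illustrative assumptions, not taken from the text.

```python
import math

def wronskian(p, u1, du1, u2, du2, x):
    """W_p(u1, u2)(x) = p(x) [u1'(x) u2(x) - u2'(x) u1(x)]."""
    return p(x) * (du1(x) * u2(x) - du2(x) * u1(x))

p = lambda x: 1.0                       # assumed coefficient p = 1
points = [0.0, 0.25, 0.5, 1.0, 2.0]

# Linearly independent solutions cosh and sinh of -u'' + u = 0:
# W = sinh^2 - cosh^2 = -1 at every point.
independent = [wronskian(p, math.cosh, math.sinh, math.sinh, math.cosh, x)
               for x in points]

# Linearly dependent pair cosh and 2 cosh: the Wronskian vanishes identically.
dependent = [wronskian(p, math.cosh, math.sinh,
                       lambda t: 2.0 * math.cosh(t),
                       lambda t: 2.0 * math.sinh(t), x)
             for x in points]
```

The list `independent` is constant (equal to $-1$ up to rounding), while `dependent` is zero, matching the two alternatives in the lemma.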

17.2 The Green's Function

To go further we will need to use the theory of self-adjoint compact operators developed in the previous chapter. In order to do this, we first turn the differential equation (17.1) into an integral equation by finding the Green's function for the problem: this is a function $G$ such that the solution of $L[u] = f$ can be written as

$$u(x) = \int_a^b G(x, y) f(y)\,dy.$$

Theorem 17.5 Suppose that $u_1, u_2 \in C^2([a, b])$ are linearly independent solutions of

$$-\frac{d}{dx}\Big(p(x)\frac{dy}{dx}\Big) + q(x)y = 0, \tag{17.3}$$

with $u_1(a) = 0$ and $u_2(b) = 0$; set $W := W_p(u_1, u_2)$ and define

$$G(x, y) = \begin{cases} W^{-1} u_1(x) u_2(y) & a \le x < y \\ W^{-1} u_2(x) u_1(y) & y \le x \le b. \end{cases}$$

Then for any $f \in C([a, b])$ the function $u$ given by

$$u(x) = \int_a^b G(x, y) f(y)\,dy \tag{17.4}$$

is an element of $D$ and $L[u] = f$.

Proof Writing (17.4) out in full we have

$$u(x) = \frac{u_2(x)}{W} \int_a^x u_1(y) f(y)\,dy + \frac{u_1(x)}{W} \int_x^b u_2(y) f(y)\,dy.$$

Now,

$$\begin{aligned}
u'(x) &= \frac{u_2'(x)}{W} \int_a^x u_1(y) f(y)\,dy + \frac{u_2(x)u_1(x)}{W} f(x) - \frac{u_1(x)u_2(x)}{W} f(x) + \frac{u_1'(x)}{W} \int_x^b u_2(y) f(y)\,dy \\
&= \frac{u_2'(x)}{W} \int_a^x u_1(y) f(y)\,dy + \frac{u_1'(x)}{W} \int_x^b u_2(y) f(y)\,dy,
\end{aligned}$$

and then, since $W = p[u_1'u_2 - u_2'u_1]$,

$$\begin{aligned}
u''(x) &= \frac{u_2''(x)}{W} \int_a^x u_1(y) f(y)\,dy + \frac{u_2'(x)u_1(x)}{W} f(x) - \frac{u_1'(x)u_2(x)}{W} f(x) + \frac{u_1''(x)}{W} \int_x^b u_2(y) f(y)\,dy \\
&= -\frac{f(x)}{p(x)} + \frac{u_2''(x)}{W} \int_a^x u_1(y) f(y)\,dy + \frac{u_1''(x)}{W} \int_x^b u_2(y) f(y)\,dy.
\end{aligned}$$

These expressions show that $u \in C^2([a, b])$; since $G(a, y) = G(b, y) = 0$ for every $y \in [a, b]$ it follows that $u(a) = u(b) = 0$, so $u \in D$. Since

$$L[u] = -pu'' - p'u' + qu$$

and $L$ is linear with $L[u_1] = L[u_2] = 0$ it follows that $L[u] = f(x)$ as claimed.

The existence of the functions $u_1$ and $u_2$ needed for this result can be guaranteed using standard results in the theory of ordinary differential equations (see e.g. Hartman, 1973). We can rewrite (17.3) as the coupled system

$$u' = v, \qquad v' = -\frac{p'}{p} v + \frac{q}{p} u,$$

and solve this as an initial value problem with $u_1(a) = 0$, $v_1(a) = 1$ for $u_1$ and $u_2(b) = 0$, $v_2(b) = 1$ for $u_2$.

As a simple illustrative example, consider the case $L[u] = -u''$ on the interval $[0, 1]$. The general solution of $-y'' = 0$ is $y(x) = Ax + B$, so in the setting of Theorem 17.5 we can take $u_1(x) = x$ and $u_2(x) = x - 1$, for which $W = u_1'u_2 - u_2'u_1 = -1$. The Green's function for the equation $-u'' = f$ is therefore

$$G(x, y) = \begin{cases} x(1 - y) & 0 \le x < y \\ (1 - x)y & y \le x \le 1. \end{cases}$$
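The Green's function just obtained for $-u'' = f$ on $[0, 1]$ can be checked numerically. The sketch below takes the sample right-hand side $f \equiv 1$, for which the exact solution $u(x) = x(1 - x)/2$ is easily verified by hand; the midpoint-rule quadrature, the grid size, and the function names are assumptions made for illustration.

```python
def G(x, y):
    """Green's function for -u'' = f on [0, 1] with u(0) = u(1) = 0."""
    return x * (1.0 - y) if x < y else (1.0 - x) * y

def solve(f, x, n=2000):
    """Approximate u(x) = int_0^1 G(x, y) f(y) dy with the midpoint rule."""
    h = 1.0 / n
    return h * sum(G(x, (k + 0.5) * h) * f((k + 0.5) * h) for k in range(n))

f = lambda y: 1.0                         # sample right-hand side
exact = lambda x: 0.5 * x * (1.0 - x)     # solves -u'' = 1, u(0) = u(1) = 0

sample_points = (0.0, 0.1, 0.37, 0.5, 0.9, 1.0)
errors = [abs(solve(f, x) - exact(x)) for x in sample_points]
```

The entries of `errors` are small (limited only by the quadrature), and the values at $x = 0$ and $x = 1$ vanish because $G(0, y) = G(1, y) = 0$.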


17.3 Eigenvalues of the Sturm–Liouville Problem

The result of Theorem 17.5 shows that we can define a linear operator $T : C([a, b]) \to D$ by setting

$$[Tf](x) = \int_a^b G(x, y) f(y)\,dy.$$

However, in order to apply the theory from the previous chapter we need to have an operator that is defined on an appropriate Hilbert space, in this case $L^2(a, b)$. We therefore want to consider an operator of the same form, but defined on this larger space.

We will need the fact that the space $D$ is dense in $L^2(a, b)$.

Lemma 17.6 The space

$$D := \{f \in C^2([a, b]) : f(a) = f(b) = 0\}$$

is dense in $L^2(a, b)$.

Proof We know from Lemma 7.7 that $P([a, b])$ is dense in $L^2(a, b)$; since $P([a, b]) \subset C^2([a, b])$ it follows that $C^2([a, b])$ is dense in $L^2(a, b)$. To show that $D$ is dense we define a family of 'cutoff functions' $\psi_\delta \in C^2([a, b])$ by setting

$$\psi_\delta(x) = \begin{cases} \varphi((x - a)/\delta) & a \le x < a + \delta \\ 1 & a + \delta \le x \le b - \delta \\ \varphi((b - x)/\delta) & b - \delta < x \le b, \end{cases}$$

where $\varphi(x) := 3x - 3x^2 + x^3$ is such that $\varphi(0) = 0$, $\varphi(1) = 1$, and $\varphi'(1) = \varphi''(1) = 0$; see Figure 17.1. For any $\delta > 0$ and any $g \in C^2([a, b])$ we now have $\psi_\delta(x) g(x) \in D$, and $\|\psi_\delta g - g\|_{L^2} \to 0$ as $\delta \to 0$. Given any $\varepsilon > 0$ we can first approximate $f \in L^2(a, b)$ by $g \in C^2([a, b])$, and then choose $\delta$ small enough that

$$\|\psi_\delta g - f\|_{L^2} \le \|\psi_\delta g - g\|_{L^2} + \|g - f\|_{L^2} < \varepsilon.$$

Figure 17.1 The cutoff function $\psi_\delta$ (here shown for $a = 0$, $b = 1$, and $\delta = 0.2$) is used to turn a function $g \in C^2([0, 1])$ into an element $\psi_\delta g$ that is zero at $x = 0$ and $x = 1$.

Theorem 17.7 The operator $\mathbf{T} : L^2(a, b) \to L^2(a, b)$ defined by setting

$$\mathbf{T}f(x) := \int_a^b G(x, y) f(y)\,dy,$$

where $G$ is as in Theorem 17.5, is compact and self-adjoint and has trivial kernel, i.e. $\mathrm{Ker}(\mathbf{T}) = \{0\}$. The eigenvalues of $\mathbf{T}$ form a countable sequence tending to zero, all the eigenvectors of $\mathbf{T}$ are elements of $D$, and these eigenvectors (suitably normalised) form an orthonormal basis for $L^2(a, b)$.

Proof Since $G(x, y)$ is symmetric and bounded, it follows from Example 11.10 and Lemma 13.6 that $\mathbf{T}$ is a bounded self-adjoint operator from $L^2(a, b)$ into itself; Proposition 15.4 shows that $\mathbf{T}$ is compact. That the eigenvalues form a countable sequence tending to zero follows from Proposition 16.4.

To show that $\mathbf{T}f = 0$ implies that $f = 0$, we note first that $D$ is contained in the range of $\mathbf{T}$. Indeed, given any $u \in D$ we can define $g = L[u] \in C([a, b])$, and then by Theorem 17.5 we have $\mathbf{T}g = u$. If $\mathbf{T}f = 0$, then for any $g \in D$ we have $g = \mathbf{T}\phi$ for some $\phi \in L^2(a, b)$, so we have

$$0 = (\mathbf{T}f, \phi) = (f, \mathbf{T}\phi) = (f, g);$$

since $D$ is dense in $L^2(a, b)$ it follows from Exercise 9.9 that $f = 0$. So $\mathrm{Ker}(\mathbf{T}) = \{0\}$; in particular, zero is not an eigenvalue of $\mathbf{T}$.

As a first step towards showing that the eigenvectors of $\mathbf{T}$ are elements of $D$, we show that $\mathbf{T}f \in C^0([a, b])$ for any $f \in L^2(a, b)$. Since $G$ is continuous on the compact set $[a, b] \times [a, b]$, it is uniformly continuous, so given any $\varepsilon > 0$ there exists $\delta > 0$ such that

$$|G(x, y) - G(x', y)| < \varepsilon \quad\text{whenever}\quad |x - x'| < \delta, \ \text{for all } y \in [a, b].$$

Therefore, if we take $x, x' \in [a, b]$ with $|x - x'| < \delta$ we obtain

$$\begin{aligned}
|\mathbf{T}f(x) - \mathbf{T}f(x')| &= \Big|\int_a^b [G(x, y) - G(x', y)] f(y)\,dy\Big| \\
&\le \int_a^b |G(x, y) - G(x', y)|\,|f(y)|\,dy \\
&\le \varepsilon \int_a^b |f(y)|\,dy \le \varepsilon \Big(\int_a^b 1\,dy\Big)^{1/2} \Big(\int_a^b |f(y)|^2\,dy\Big)^{1/2} \\
&= \varepsilon (b - a)^{1/2} \|f\|_{L^2},
\end{aligned}$$

so $\mathbf{T}f \in C([a, b])$.

Now observe that if $f \in L^2(a, b)$ with $\mathbf{T}f = \mu f$, then $\mu \neq 0$, so we can write

$$f = \frac{1}{\mu} \mathbf{T}f. \tag{17.5}$$

We have just shown that if $f \in L^2$, then $\mathbf{T}f \in C([a, b])$, so it follows immediately from (17.5) that $f \in C([a, b])$. But $\mathbf{T}f$ is the same as $Tf$ whenever $f \in C([a, b])$, and we know from Theorem 17.5 that when $f \in C([a, b])$ we have $Tf \in D$; using (17.5) again we conclude that $f \in D$ as required. (This is an example of the well-established method of 'bootstrapping' to prove regularity of solutions of differential equations.)

Since $\mathrm{Ker}(\mathbf{T}) = \{0\}$, it follows from Corollary 16.7 that the eigenvectors of $\mathbf{T}$ form an orthonormal basis for $L^2(a, b)$.

We now combine all the results in this chapter.

Theorem 17.8 The eigenfunctions of the Sturm–Liouville problem form an orthonormal basis for $L^2(a, b)$. Furthermore,
(i) every eigenvalue is strictly positive;
(ii) each eigenvalue is simple (i.e. has multiplicity one); and
(iii) the eigenvalues can be ordered to form a countable sequence that tends to infinity.

Proof We have already proved (i) in Corollary 17.2 and (ii) in Corollary 17.4. Property (iii) and the fact that the eigenfunctions form a basis derive from the observation that if $u \in D$ with $L[u] = \lambda u$, then $u = T(\lambda u)$ and vice versa. Since zero is not an eigenvalue of either $L$ or $\mathbf{T}$, it follows that

$$L[u] = \lambda u \quad\Leftrightarrow\quad \mathbf{T}u = \frac{1}{\lambda} u, \tag{17.6}$$

i.e. the eigenfunctions of $L$ are precisely the eigenvectors of $\mathbf{T}$, with the eigenvalues following the reciprocal relationship in (17.6). Given this, Theorem 17.7 concludes the proof.

This gives us another proof that we can expand functions in $L^2(0, 1)$ using a Fourier sine series (see Corollary 6.6). Indeed, if we consider

$$-\frac{d^2 u}{dx^2} = \lambda u, \qquad u(0) = 0, \quad u(1) = 0, \tag{17.7}$$

which is (17.1) with $p = 1$, $q = 0$, it follows that the eigenfunctions of this equation will form a basis for $L^2(0, 1)$.

To find the eigenvalues and eigenfunctions for this problem, observe that the solutions of $-u'' = \lambda u$ are potentially of three kinds, depending on $\lambda$. If $\lambda < 0$, then $u(x) = Ae^{\sqrt{-\lambda}\,x} + Be^{-\sqrt{-\lambda}\,x}$. To satisfy the boundary conditions in (17.7) we would need

$$A + B = Ae^{\sqrt{-\lambda}} + Be^{-\sqrt{-\lambda}} = 0,$$

which implies that $A = B = 0$, so no $\lambda < 0$ can be an eigenvalue. Similarly if $\lambda = 0$, then the general solution of $-u'' = 0$ is $u(x) = Ax + B$, and the boundary conditions require $B = A + B = 0$, which again yields $A = B = 0$, and $\lambda = 0$ is not an eigenvalue. If $\lambda > 0$, then the general solution is

$$u(x) = A\cos\sqrt{\lambda}\,x + B\sin\sqrt{\lambda}\,x,$$

and the boundary conditions require $A = 0$ and $B\sin\sqrt{\lambda} = 0$; to ensure that $B \neq 0$ we have to take $\lambda$ such that $\sin\sqrt{\lambda} = 0$, i.e. $\lambda = (k\pi)^2$ with $k \in \mathbb{N}$.

The eigenvalues are therefore $\lambda_k = (k\pi)^2$, with corresponding eigenfunctions $\sin k\pi x$. It follows that the set of normalised eigenfunctions

$$\big\{\sqrt{2}\,\sin k\pi x\big\}_{k=1}^\infty$$

forms an orthonormal basis for $L^2(0, 1)$, so any $f \in L^2(0, 1)$ can be expanded in the form

$$f(x) = \sum_{k=1}^\infty \alpha_k \sin k\pi x,$$

with the sum converging in $L^2(0, 1)$.
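The reciprocal relationship (17.6) can be observed numerically for the problem (17.7). The following sketch applies the integral operator with the Green's function of $-u''$ on $[0, 1]$ to $\sqrt{2}\sin k\pi x$ and compares the result with $(k\pi)^{-2}$ times the same function; the midpoint quadrature, grid size, sample points, and helper names are all illustrative assumptions.

```python
import math

def G(x, y):
    """Green's function for -u'' = f on [0, 1] with u(0) = u(1) = 0."""
    return x * (1.0 - y) if x < y else (1.0 - x) * y

def apply_T(f, x, n=4000):
    """Midpoint-rule approximation of (T f)(x) = int_0^1 G(x, y) f(y) dy."""
    h = 1.0 / n
    return h * sum(G(x, (j + 0.5) * h) * f((j + 0.5) * h) for j in range(n))

def e(k):
    """Normalised eigenfunction sqrt(2) sin(k pi x) of -u'' = lambda u."""
    return lambda x: math.sqrt(2.0) * math.sin(k * math.pi * x)

def eigen_defect(k, xs=(0.1, 0.3, 0.5, 0.8)):
    """max over sample points of |T e_k(x) - e_k(x) / (k pi)^2|."""
    mu = 1.0 / (k * math.pi) ** 2
    return max(abs(apply_T(e(k), x) - mu * e(k)(x)) for x in xs)

def inner(k, m, n=4000):
    """Midpoint-rule approximation of the L^2(0, 1) inner product (e_k, e_m)."""
    h = 1.0 / n
    return h * sum(e(k)((j + 0.5) * h) * e(m)((j + 0.5) * h) for j in range(n))

defects = {k: eigen_defect(k) for k in (1, 2, 3)}
gram = {(k, m): inner(k, m) for k in (1, 2) for m in (1, 2)}
```

The values in `defects` are close to zero, consistent with $\mathbf{T}u = \lambda^{-1}u$ for $\lambda_k = (k\pi)^2$, and `gram` is close to the identity, consistent with the normalisation factor $\sqrt{2}$.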

PART IV Banach Spaces

18 Dual Spaces of Banach Spaces

In Chapter 12 we introduced the dual space $X^*$ of a normed linear space $X$ (over a field $\mathbb{K}$), the space $B(X, \mathbb{K})$. Recall that since $\mathbb{K}$ is complete, Theorem 11.11 guarantees that when $X$ is a normed space its dual space is always a Banach space with norm

$$\|\phi\|_{X^*} = \sup_{\|x\| \le 1} |\phi(x)|. \tag{18.1}$$

We remarked in Chapter 12 that knowledge of the action of elements of the dual space $X^*$ can tell us a lot about the space $X$. We give a simple example.

Example 18.1 Take $X = C([a, b])$, and for any choice of $p \in [a, b]$ consider $\delta_p : X \to \mathbb{R}$ defined by setting

$$\delta_p(\phi) = \phi(p) \quad\text{for all } \phi \in X.$$

Then

$$|\delta_p(\phi)| = |\phi(p)| \le \|\phi\|_\infty,$$

so that $\delta_p \in X^*$ with $\|\delta_p\|_{X^*} \le 1$. Choosing a function $\phi \in C([a, b])$ such that $|\phi(p)| = \|\phi\|_\infty$ shows that in fact $\|\delta_p\|_{X^*} = 1$. Observe here that if we knew the value of $f(\phi)$ for every $f \in X^*$, then we would know $\phi$.

In this first chapter of Part IV we return to the more general theory of dual spaces, which will play a major role in our study of Banach spaces. We identify the dual spaces of the $\ell^p$ sequence spaces, and discuss briefly the corresponding results for the spaces $L^p$ of Lebesgue integrable functions.

We showed in Chapter 12 that in a Hilbert space $H$, given any $y \in H$ we can define an element $f \in H^*$ by setting

$$f(x) = (x, y), \tag{18.2}$$

and that the resulting map $R : H \to H^*$ ('the Riesz map') defined by $y \mapsto (\cdot, y)$ is a bijective isometry, which is linear if $\mathbb{K} = \mathbb{R}$ and conjugate-linear if $\mathbb{K} = \mathbb{C}$. In particular, given any $f \in H^*$ there exists some $y \in H$ such that (18.2) holds and $\|y\|_H = \|f\|_{H^*}$.

When $H = \ell^2(\mathbb{R})$ this shows that $\ell^2 \equiv (\ell^2)^*$ via the map $x \mapsto f_x$, where

$$f_x(y) = (y, x) = \sum_{j=1}^\infty x_j y_j.$$

We now use the same map to investigate the dual spaces of $\ell^p$, $1 \le p < \infty$, and of $c_0$. First we prove two very useful inequalities.

18.1 The Young and Hölder Inequalities

We say that two indices $1 \le p, q \le \infty$ are conjugate if

$$\frac{1}{p} + \frac{1}{q} = 1,$$

allowing $p = 1$, $q = \infty$ and $p = \infty$, $q = 1$. The following simple inequality is fundamental.

Lemma 18.2 (Young's inequality) Let $a, b > 0$ and let $(p, q)$ be conjugate indices with $1 < p, q < \infty$. Then

$$ab \le \frac{a^p}{p} + \frac{b^q}{q}.$$

Proof The function $x \mapsto e^x$ is convex (see Exercise 3.4) and so

$$ab = \exp(\log a + \log b) = \exp\Big(\frac{1}{p}\log a^p + \frac{1}{q}\log b^q\Big) \le \frac{1}{p} e^{\log(a^p)} + \frac{1}{q} e^{\log(b^q)} = \frac{a^p}{p} + \frac{b^q}{q}.$$

Using Young's inequality we can prove Hölder's inequality in $\ell^p$.

Lemma 18.3 (Hölder's inequality in $\ell^p$ spaces) If $x \in \ell^p$ and $y \in \ell^q$ with $(p, q)$ conjugate, $1 \le p, q \le \infty$, then

$$\sum_{j=1}^\infty |x_j y_j| \le \|x\|_{\ell^p} \|y\|_{\ell^q}. \tag{18.3}$$

Proof For $1 < p < \infty$, consider

$$\sum_{j=1}^n \frac{|x_j|}{\|x\|_{\ell^p}} \frac{|y_j|}{\|y\|_{\ell^q}} \le \sum_{j=1}^n \Big(\frac{1}{p} \frac{|x_j|^p}{\|x\|_{\ell^p}^p} + \frac{1}{q} \frac{|y_j|^q}{\|y\|_{\ell^q}^q}\Big) \le 1.$$

So for each $n \in \mathbb{N}$

$$\sum_{j=1}^n |x_j y_j| \le \|x\|_{\ell^p} \|y\|_{\ell^q}$$

and (18.3) follows on letting $n \to \infty$. For $p = 1$, $q = \infty$,

$$\sum_{j=1}^n |x_j y_j| \le \max_{j=1,\ldots,n} |y_j| \Big(\sum_{j=1}^n |x_j|\Big) \le \|x\|_{\ell^1} \|y\|_{\ell^\infty}.$$

The Hölder inequality in $L^p$ is proved in a similar way.

Theorem 18.4 (Hölder's inequality in $L^p$ spaces) Suppose that $f \in L^p(\Omega)$ and $g \in L^q(\Omega)$, with $(p, q)$ conjugate, $1 \le p, q \le \infty$; then $fg \in L^1(\Omega)$ with

$$\|fg\|_{L^1} \le \|f\|_{L^p} \|g\|_{L^q}. \tag{18.4}$$

Proof If $1 < p < \infty$, then we use Young's inequality ($ab \le a^p/p + b^q/q$ from Lemma 18.2) to give

$$\begin{aligned}
\int_\Omega \frac{|fg|}{\|f\|_{L^p} \|g\|_{L^q}} &= \int_\Omega \frac{|f|}{\|f\|_{L^p}} \frac{|g|}{\|g\|_{L^q}} \\
&\le \int_\Omega \frac{1}{p} \frac{|f|^p}{\|f\|_{L^p}^p} + \frac{1}{q} \frac{|g|^q}{\|g\|_{L^q}^q} \\
&= \frac{1}{p\|f\|_{L^p}^p} \int_\Omega |f|^p + \frac{1}{q\|g\|_{L^q}^q} \int_\Omega |g|^q = 1,
\end{aligned}$$

from which (18.4) follows. If $p = 1$, $q = \infty$, then

$$\int_\Omega |f(x)g(x)|\,dx \le \int_\Omega |f(x)|\,\|g\|_{L^\infty}\,dx = \|f\|_{L^1} \|g\|_{L^\infty},$$

and similarly if $p = \infty$ and $q = 1$.

Hölder's inequality allows for an alternative proof of Minkowski's inequality (the triangle inequality in $\ell^p$ and $L^p$) that we proved using a convexity argument in Lemma 3.9; see Exercise 18.2.
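Both inequalities are easy to test on finite sequences, which can be viewed as elements of $\ell^p$ with finitely many non-zero terms. The sketch below is illustrative only: the sample vectors, the exponents tried, and the helper names (`p_norm`, `conjugate`, `young_gap`, `holder_gap`) are assumptions made for the example.

```python
INF = float("inf")

def p_norm(x, p):
    """The l^p norm of a finite sequence, including the case p = infinity."""
    if p == INF:
        return max(abs(t) for t in x)
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

def conjugate(p):
    """The conjugate index q with 1/p + 1/q = 1."""
    if p == 1:
        return INF
    if p == INF:
        return 1.0
    return p / (p - 1.0)

def young_gap(a, b, p):
    """a^p/p + b^q/q - ab for 1 < p < infinity; non-negative by Young's inequality."""
    q = conjugate(p)
    return a ** p / p + b ** q / q - a * b

def holder_gap(x, y, p):
    """||x||_p ||y||_q - sum |x_j y_j|; non-negative by Hölder's inequality."""
    q = conjugate(p)
    return p_norm(x, p) * p_norm(y, q) - sum(abs(s * t) for s, t in zip(x, y))

x = [1.0, -2.0, 0.5, 3.0]
y = [0.25, 1.5, -4.0, 0.1]
gaps = [holder_gap(x, y, p) for p in (1, 1.5, 2, 3, INF)]

equality_case = young_gap(2.0, 2.0, 2.0)   # a^p = b^q gives equality
strict_case = young_gap(1.0, 3.0, 3.0)
```

Every entry of `gaps` is non-negative, and `equality_case` vanishes because Young's inequality is an equality when $a^p = b^q$.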


18.2 The Dual Spaces of $\ell^p$

We can now identify (up to isometric isomorphism) the dual space of $\ell^p$ for $1 < p < \infty$.

Theorem 18.5 For $1 < p < \infty$ we have $\ell^p \equiv (\ell^q)^*$, with $(p, q)$ conjugate, via the mapping $x \mapsto L_x$, where

$$L_x(y) = \sum_{j=1}^\infty x_j y_j. \tag{18.5}$$

We denote this mapping $x \mapsto L_x$ by $T_q : \ell^p \to (\ell^q)^*$. (The case $p = q = 2$ here follows from the Riesz Representation Theorem when $H$ is real.)

Proof Given $x \in \ell^p$ define $L_x$ as in (18.5) above; this is clearly a linear map. Then from Hölder's inequality

$$|L_x(y)| = \Big|\sum_{j=1}^\infty x_j y_j\Big| \le \sum_{j=1}^\infty |x_j y_j| \le \|x\|_{\ell^p} \|y\|_{\ell^q}, \tag{18.6}$$

so we do indeed have $L_x \in (\ell^q)^*$ with

$$\|L_x\|_{(\ell^q)^*} \le \|x\|_{\ell^p}. \tag{18.7}$$

To show that we have $\|L_x\|_{(\ell^q)^*} = \|x\|_{\ell^p}$, consider the element $y$ given by

$$y_j = \begin{cases} |x_j|^p / x_j & x_j \neq 0 \\ 0 & x_j = 0; \end{cases}$$

this is in $\ell^q$ since

$$\|y\|_{\ell^q}^q = \sum_{j=1}^\infty |y_j|^q = \sum_{j=1}^\infty |x_j|^{q(p-1)} = \sum_{j=1}^\infty |x_j|^p = \|x\|_{\ell^p}^p < \infty,$$

where we have used the fact that $q(p - 1) = p$. We can rewrite this equality as

$$\|y\|_{\ell^q} = \|x\|_{\ell^p}^{p/q} = \|x\|_{\ell^p}^{p-1},$$

and so therefore

$$|L_x(y)| = \Big|\sum_{j=1}^\infty x_j y_j\Big| = \sum_{j=1}^\infty |x_j|^p = \|x\|_{\ell^p}^p = \|x\|_{\ell^p}^{p-1} \|x\|_{\ell^p} = \|y\|_{\ell^q} \|x\|_{\ell^p}.$$

This shows that $\|L_x\|_{(\ell^q)^*} \ge \|x\|_{\ell^p}$ and hence (combining this with the bound in (18.7)) that $x \mapsto L_x$ (the map $T_q$) is a linear isometry; it follows that $T_q$ is injective. So we only need to show that it is surjective to show that it is an isomorphism: we need to show that any $f \in (\ell^q)^*$ can be written as $L_x$ for some $x \in \ell^p$.

If we do in fact have $f = L_x$ for some $x \in \ell^p$, then for each $e^{(j)}$ (recall that $e^{(j)}_i = \delta_{ij}$) we must have

$$f(e^{(j)}) = L_x(e^{(j)}) = x_j,$$

which tells us what $x$ must be. Let $x$ be the sequence with $x_j = f(e^{(j)})$; if we can show that $x \in \ell^p$, then we will have finished the proof. Indeed, for any $y \in \ell^q$ we can write $y = \sum_{j=1}^\infty y_j e^{(j)}$ and then, using the continuity of $f$,

$$\begin{aligned}
f(y) &= f\Big(\sum_{j=1}^\infty y_j e^{(j)}\Big) = f\Big(\lim_{n\to\infty} \sum_{j=1}^n y_j e^{(j)}\Big) = \lim_{n\to\infty} f\Big(\sum_{j=1}^n y_j e^{(j)}\Big) \\
&= \lim_{n\to\infty} \sum_{j=1}^n y_j f(e^{(j)}) = \lim_{n\to\infty} \sum_{j=1}^n y_j x_j = L_x(y),
\end{aligned}$$

as required. So we need only show that $x$ defined by setting $x_j = f(e^{(j)})$ as above is an element of $\ell^p$.

To do this we consider the sequence $\phi^{(n)}$ of elements of $\ell^q$ defined by

$$\phi^{(n)}_j = \begin{cases} |x_j|^p / x_j & j \le n \text{ and } x_j \neq 0 \\ 0 & j > n \text{ or } x_j = 0; \end{cases}$$

then

$$f(\phi^{(n)}) = f\Big(\sum_{j=1}^n \phi^{(n)}_j e^{(j)}\Big) = \sum_{j=1}^n \phi^{(n)}_j f(e^{(j)}) = \sum_{j=1}^n \phi^{(n)}_j x_j = \sum_{j=1}^n |x_j|^p,$$

and so

$$\sum_{j=1}^n |x_j|^p = |f(\phi^{(n)})| \le \|f\|_{(\ell^q)^*} \|\phi^{(n)}\|_{\ell^q} = \|f\|_{(\ell^q)^*} \Big(\sum_{j=1}^n |x_j|^{q(p-1)}\Big)^{1/q} = \|f\|_{(\ell^q)^*} \Big(\sum_{j=1}^n |x_j|^p\Big)^{1/q},$$

which, since $1 - 1/q = 1/p$, shows that

$$\Big(\sum_{j=1}^n |x_j|^p\Big)^{1/p} \le \|f\|_{(\ell^q)^*} \quad\text{for every } n \in \mathbb{N},$$

and so $x \in \ell^p$ as required.

We have a similar result for one of the endpoint values ($\ell^\infty$) but not for the other ($\ell^1$).

Theorem 18.6 We have $\ell^\infty \equiv (\ell^1)^*$ via the mapping (18.5), which we denote by $T_1 : \ell^\infty \to (\ell^1)^*$.

Proof Given $x \in \ell^\infty$, Hölder's inequality as in (18.6) shows that $L_x$ defined as in (18.5) is an element of $(\ell^1)^*$ with $\|L_x\|_{(\ell^1)^*} \le \|x\|_{\ell^\infty}$. The equality of norms follows by choosing, for each sufficiently small $\varepsilon > 0$, a $j \in \mathbb{N}$ such that $|x_j| > \|x\|_{\ell^\infty} - \varepsilon$ (so that $x_j \neq 0$ when $x \neq 0$), and then considering $y \in \ell^1$ with

$$y_i = \begin{cases} x_j / |x_j| & i = j \\ 0 & i \neq j. \end{cases}$$

Then

$$|L_x(y)| = |x_j| \ge (\|x\|_{\ell^\infty} - \varepsilon)\,\|y\|_{\ell^1},$$

since $\|y\|_{\ell^1} = 1$. Since this is valid for any such $\varepsilon > 0$, it follows that $\|L_x\|_{(\ell^1)^*} = \|x\|_{\ell^\infty}$.

To show that the map $x \mapsto L_x$ is onto we use the same argument as before and, given $f \in (\ell^1)^*$, consider $x$ defined by setting $x_j = f(e^{(j)})$. It is easy to see that this is an element of $\ell^\infty$, since $e^{(j)} \in \ell^1$ and

$$|x_j| = |f(e^{(j)})| \le \|f\|_{(\ell^1)^*} \|e^{(j)}\|_{\ell^1} = \|f\|_{(\ell^1)^*}.$$

The arguments used above do not work for $\ell^\infty$, primarily because the elements $(e^{(j)})$ do not form a basis (see Example 9.2). We will prove later that this is not just a failure in our method of proof, and that $(\ell^\infty)^* \not\equiv \ell^1$. However, the dual of $c_0$ is isometrically isomorphic to $\ell^1$; see Exercise 18.4 for a proof.

Theorem 18.7 $(c_0)^* \equiv \ell^1$.
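The norm-attaining element used in the proof of Theorem 18.5 can be computed explicitly for a finitely supported real sequence. In the sketch below the exponent $p = 3$, the sample sequence, and the helper names (`extremal`, `L`) are assumptions chosen for illustration; sequences are truncated to their finitely many non-zero entries.

```python
def p_norm(x, p):
    """The l^p norm of a finitely supported real sequence (1 <= p < infinity)."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

def extremal(x, p):
    """y_j = |x_j|^p / x_j when x_j != 0, and y_j = 0 otherwise."""
    return [abs(t) ** p / t if t != 0 else 0.0 for t in x]

def L(x, y):
    """L_x(y) = sum_j x_j y_j, as in (18.5)."""
    return sum(s * t for s, t in zip(x, y))

p = 3.0
q = p / (p - 1.0)                      # conjugate index, here 3/2
x = [1.0, -2.0, 0.0, 0.5]
y = extremal(x, p)

value = L(x, y)                        # equals ||x||_p^p = 9.125 here
bound = p_norm(x, p) * p_norm(y, q)    # Hölder's upper bound ||x||_p ||y||_q
```

For this choice `value` and `bound` coincide (both equal $\|x\|_{\ell^p}^p$), so Hölder's inequality is attained and $\|L_x\|_{(\ell^q)^*} = \|x\|_{\ell^p}$; the identity $\|y\|_{\ell^q}^q = \|x\|_{\ell^p}^p$ can be checked in the same way.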


18.3 Dual Spaces of L p () Using Hölder’s inequality it is straightforward to show that any g ∈ L q () gives rise to an element g ∈ (L p ())∗ by setting ˆ

g ( f ) := f g dx, f ∈ L p (). (18.8) 

That this mapping is surjective requires more advanced results from measure theory, so we only give a very brief sketch of this part of the following result; the full proof can be found in Appendix B. (See Exercise 18.6 for a proof of this result in the range 1 < p < 2 that uses the Riesz Representation Theorem and the Monotone Convergence Theorem; see also Exercise 26.5 for an alternative proof that uses uniform convexity.) Theorem 18.8 For 1 ≤ p < ∞ the space L q () is isometrically isomorphic to (L p ())∗ , where ( p, q) are conjugate, via the mapping g → g defined in (18.8). Proof If we take g ∈ L q and define g as in (18.8), then this map is linear in f since the integral is linear and, using Hölder’s inequality, )ˆ ) ) ) ) | g ( f )| = ) f g )) ≤  f g L 1 ≤  f  L p g L q 

so that $\|\Phi_g\|_{(L^p)^*} \le \|g\|_{L^q}$. Furthermore, if we choose
$$f(x) = \begin{cases} g(x)|g(x)|^{q-2} & g(x) \ne 0 \\ 0 & g(x) = 0, \end{cases}$$
then, since $|f(x)|^p \le |g(x)|^{p(q-1)} = |g(x)|^q$, it follows that $f \in L^p$ with
$$\|f\|_{L^p}^p = \int_\Omega |f(x)|^p = \int_\Omega |g(x)|^q = \|g\|_{L^q}^q$$
and
$$|\Phi_g(f)| = \int_\Omega |g|^q = \|g\|_{L^q}^q.$$
Therefore
$$\|g\|_{L^q}^q = |\Phi_g(f)| \le \|\Phi_g\|_{(L^p)^*}\|f\|_{L^p} = \|\Phi_g\|_{(L^p)^*}\|g\|_{L^q}^{q/p},$$


which, since $q - q/p = q(1 - 1/p) = 1$, yields $\|\Phi_g\|_{(L^p)^*} \ge \|g\|_{L^q}$, from which it follows that $\|\Phi_g\|_{(L^p)^*} = \|g\|_{L^q}$.

That any element of $(L^p)^*$ can be obtained in this way requires some more powerful results from measure theory. For any $A \subseteq \Omega$ let
$$\chi_A(x) = \begin{cases} 1 & x \in A \\ 0 & x \notin A \end{cases}$$
denote the characteristic function of $A$. Then, for any $\Phi \in (L^p)^*$, the map $A \mapsto \Phi(\chi_A)$, defined on the measurable sets, determines a signed measure on $\Omega$; the Radon–Nikodym Theorem then implies that there is a $g \in L^1(\Omega)$ such that
$$\Phi(\chi_A) = \int_A g \, dx,$$
and one can then verify that $g \in L^q(\Omega)$ and that $\Phi_g(f) = \Phi(f)$ for every $f \in L^p(\Omega)$. For details see Theorem B.16 in Appendix B.
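The choice of $f$ in the first half of the proof can be checked on a finite-dimensional analogue, where the integral becomes a finite sum. This is a sketch under stated assumptions only: real-valued vectors stand in for functions, the exponent $p = 3$ is an arbitrary illustrative choice, and the names `p_norm` and `norming_function` are hypothetical.

```python
def p_norm(v, p):
    return sum(abs(t) ** p for t in v) ** (1.0 / p)

def norming_function(g, q):
    """f = g |g|^(q-2) where g != 0, and 0 where g == 0 (real-valued g)."""
    return [t * abs(t) ** (q - 2) if t != 0 else 0.0 for t in g]

p = 3.0
q = p / (p - 1.0)          # conjugate exponent: 1/p + 1/q = 1
g = [1.0, -2.0, 0.0, 0.5]
f = norming_function(g, q)

pairing = sum(fi * gi for fi, gi in zip(f, g))   # discrete analogue of the integral of f g
ratio = abs(pairing) / p_norm(f, p)              # expected to agree with ||g||_q
print(ratio, p_norm(g, q))
```

The two printed values should agree up to rounding, mirroring the identity $|\Phi_g(f)| = \|g\|_{L^q}\|f\|_{L^p}$ for this particular $f$.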

Note that coupled with (18.1) the result of the previous theorem gives one way (which is sometimes useful) to find the $L^q$ norm of $f$ for any $1 \le q < \infty$:
$$\|f\|_{L^q(\Omega)} = \sup_{\|g\|_{L^p(\Omega)} = 1} \left| \int_\Omega f(x) g(x) \, dx \right|,$$
where $p$ and $q$ are conjugate. (In a space of real functions we do not need the modulus signs around the integral.)

Exercises

18.1 Prove Young’s inequality by minimising the function
$$f(t) = \frac{t^p}{p} + \frac{1}{q} - t$$
and then setting $t = ab^{-q/p}$.

18.2 One of the standard ways of proving the Minkowski inequality for the $\ell^p$ norm on $\mathbb{K}^n$ (and similarly for the $L^p$ norm on $L^p(\Omega)$) starts by writing
$$\sum_{j=1}^n |x_j + y_j|^p = \sum_{j=1}^n |x_j + y_j|^{p-1}|x_j + y_j| \le \sum_{j=1}^n |x_j + y_j|^{p-1}|x_j| + \sum_{j=1}^n |x_j + y_j|^{p-1}|y_j|;$$
use Hölder’s inequality to complete the proof.

18.3 Suppose that $|\Omega| = \int_\Omega 1 \, dx < \infty$. Use Hölder’s inequality to show that for $1 \le p \le q \le \infty$ we have $L^q(\Omega) \subset L^p(\Omega)$ with
$$\|f\|_{L^p} \le |\Omega|^{(q-p)/pq}\|f\|_{L^q}. \tag{18.9}$$

Show, however, that $L^q(\mathbb{R}^n) \not\subset L^p(\mathbb{R}^n)$ for $p, q$ in the same range.

18.4 By following the proof of Theorem 18.5 show that $(c_0)^* \equiv \ell^1$.

18.5 Show that $C([-1, 1])^*$ is not separable.¹ (Find an uncountable collection of functions in $C([-1, 1])^*$ that are a distance 2 apart in the $C([-1, 1])^*$ norm.) (Lax, 2002)

18.6 For another proof that $(L^p)^* \equiv L^q$ for $1 < p < 2$, whose only measure-theoretic ingredient is the Monotone Convergence Theorem, show that

(i) $\|f\|_{L^p} \le |\Omega|^{1/p - 1/2}\|f\|_{L^2}$, where $|\Omega| = \int_\Omega 1 \, dx$;

(ii) if $\Phi \in (L^p)^*$, then $\Phi \in (L^2)^*$. The Riesz Representation Theorem therefore guarantees that there exists $g \in L^2$ such that
$$\Phi(f) = (f, g) \quad \text{for every } f \in L^2. \tag{18.10}$$

(iii) By considering the functions
$$f_k(x) = \begin{cases} |g_k(x)|^{q-2} g_k(x) & g_k(x) \ne 0 \\ 0 & g_k(x) = 0, \end{cases}$$
where
$$g_k(x) = \begin{cases} -k & g(x) < -k \\ g(x) & |g(x)| \le k \\ k & g(x) > k, \end{cases}$$
use the Monotone Convergence Theorem (Theorem B.7) to show that $g \in L^q$. (Lax, 2002)

¹ For identification of the space $C([a, b])^*$ see e.g. Theorem 5.6.5 in Brown and Page (1970) or Chapter 13 in Meise and Vogt (1997).

19 The Hahn–Banach Theorem

Often we want to define linear functionals on a Banach space that have particular properties. One way of doing this is to use the Hahn–Banach Theorem, which guarantees that a linear functional defined on a subspace $U$ of a normed space $X$ can be extended to a linear functional defined on the whole of $X$ without increasing its norm. In other words, given a bounded linear map $\phi : U \to \mathbb{K}$, we can find another bounded linear map $f : X \to \mathbb{K}$ such that
$$f(u) = \phi(u) \ \text{ for every } u \in U \qquad \text{and} \qquad \|f\|_{X^*} = \|\phi\|_{U^*}.$$
Note that since $U$ is a subspace of $X$ it naturally inherits the norm from $X$, and $\|\phi\|_{U^*}$ is understood in this way, i.e.
$$\|\phi\|_{U^*} = \sup\{|\phi(u)| : u \in U, \ \|u\|_X = 1\},$$
cf. the definition of the norm in $X^*$ in (18.1).

We will prove the Hahn–Banach Theorem in a more general form than this, first in Section 19.1 for real vector spaces, and then, after some preparation, for complex spaces in Section 19.2. For the very simple proof in a Hilbert space see Exercise 19.1.

19.1 The Hahn–Banach Theorem: Real Case

We now prove the Hahn–Banach Theorem in an arbitrary vector space, using Zorn’s Lemma.¹

¹ It is possible to prove a version of the Hahn–Banach Theorem in a separable normed space without using Zorn’s Lemma; see Exercise 19.7. Note, however, that this would exclude the cases of $\ell^\infty$ and $L^\infty$ (among others).


We first prove the theorem in a real vector space and then use this version (after some careful preparation) to deduce a similar result in a complex vector space.

Definition 19.1 If $V$ is a vector space, then a function $p : V \to \mathbb{R}$ is

● sublinear if for $x, y \in V$
$$p(x + y) \le p(x) + p(y) \qquad \text{and} \qquad p(\lambda x) = \lambda p(x), \quad \lambda \in \mathbb{R}, \ \lambda \ge 0;$$

● a seminorm if for $x, y \in V$
$$p(x + y) \le p(x) + p(y) \qquad \text{and} \qquad p(\lambda x) = |\lambda| p(x), \quad \lambda \in \mathbb{K}.$$
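The gap between the two notions in Definition 19.1 can be seen on a small example. The sketch below assumes the illustrative choice $p(x) = \max(x_1, x_2)$ on $\mathbb{R}^2$ (not an example from the text) and checks the defining properties on a finite grid, which is evidence rather than proof.

```python
import itertools

def p_max(x):
    """p(x) = max(x_1, x_2): sublinear on R^2, but not a seminorm."""
    return max(x)

def add(x, y):
    return tuple(a + b for a, b in zip(x, y))

def scale(lam, x):
    return tuple(lam * a for a in x)

grid = [(a, b) for a in (-2.0, -1.0, 0.0, 1.0, 3.0) for b in (-1.5, 0.0, 2.0)]

# Triangle inequality p(x + y) <= p(x) + p(y)
subadditive = all(p_max(add(x, y)) <= p_max(x) + p_max(y)
                  for x, y in itertools.product(grid, repeat=2))
# Positive homogeneity p(lam x) = lam p(x) for lam >= 0
pos_homogeneous = all(p_max(scale(lam, x)) == lam * p_max(x)
                      for x in grid for lam in (0.0, 0.5, 2.0))
# A seminorm would also need p(-x) = p(x)
symmetric = all(p_max(scale(-1.0, x)) == p_max(x) for x in grid)
print(subadditive, pos_homogeneous, symmetric)
```

The first two checks succeed and the third fails (for instance $p(1, 0) = 1$ but $p(-1, 0) = 0$), so this $p$ is sublinear without being a seminorm.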

Note that if $p$ is a seminorm, then $p(x) \ge 0$ for every $x \in X$ (see Exercise 19.4 for this and other properties of seminorms). Note also that if $\|\cdot\|$ is a norm on $X$, then $M\|\cdot\|$ defines a seminorm on $X$ for any $M \ge 0$.

Theorem 19.2 (Real Hahn–Banach Theorem) Let $X$ be a real vector space and $U$ a subspace of $X$. Suppose that $\phi : U \to \mathbb{R}$ is linear and satisfies
$$\phi(x) \le p(x) \quad \text{for all } x \in U$$
for some sublinear map $p : X \to \mathbb{R}$. Then there exists a linear map $f : X \to \mathbb{R}$ such that $f(x) = \phi(x)$ for all $x \in U$ and
$$f(x) \le p(x) \quad \text{for all } x \in X.$$
Furthermore, if $p$ is a seminorm, then
$$|f(x)| \le p(x) \quad \text{for all } x \in X.$$
In particular, if $X$ is a normed vector space, then any $\phi \in U^*$ has an extension $f \in X^*$ with $\|f\|_{X^*} = \|\phi\|_{U^*}$.

The applications in the following chapter mostly use the final version of the result as stated above (the extension of bounded linear functionals in normed vector spaces), but we will also use the more general version that allows bounds by a sublinear map in Section 20.5 and Chapter 21.

We already used Zorn’s Lemma in Chapter 1 to show that every vector space has a basis. We will not, therefore, recall all the terminology here, but do restate the lemma itself.

Lemma 19.3 (Zorn’s Lemma) Let $P$ be a non-empty partially ordered set. If every chain in $P$ has an upper bound, then $P$ has at least one maximal element.


Proof of the real Hahn–Banach Theorem. We will consider all possible linear extensions $g$ of $\phi$ satisfying the bound $g(x) \le p(x)$, and apply Zorn’s Lemma to deduce that there is a ‘maximal extension’. We then argue by contradiction to show that this maximal extension must be defined on the whole of $X$.

More precisely, we consider the collection $P$ of all pairs $(G, g)$, where $G$ is a subspace of $X$ that contains $U$ and $g : G \to \mathbb{R}$ is a linear functional such that $g = \phi$ on $U$ with
$$g(x) \le p(x) \quad \text{for every } x \in G.$$
Clearly $P$ is non-empty, since $(U, \phi) \in P$. We define an order on $P$ by declaring that $(G, g) \preceq (H, h)$ if $h$ is an extension of $g$, i.e. $H \supseteq G$ and $h = g$ on $G$.

To apply Zorn’s Lemma we have to show that every chain has an upper bound. Indeed, if $C = \{(G_\alpha, g_\alpha) : \alpha \in A\}$ is a chain, then an upper bound for $C$ is the pair $(G_\infty, g_\infty)$, where
$$G_\infty = \bigcup_{\alpha \in A} G_\alpha \qquad \text{and} \qquad g_\infty(x) := g_\alpha(x) \quad \text{whenever } x \in G_\alpha. \tag{19.1}$$
Note that $g_\infty$ is well defined, since any two elements in $C$ are ordered: indeed, if $x \in G_\beta \cap G_\alpha$, then either $(G_\beta, g_\beta) \preceq (G_\alpha, g_\alpha)$ (or vice versa), and we know that $g_\alpha = g_\beta$ on $G_\beta$, since $g_\alpha$ extends $g_\beta$, so there is no ambiguity in the definition in (19.1). Similarly, $g_\infty$ satisfies $g_\infty(x) \le p(x)$ for every $x \in G_\infty$, since this bound holds for $g_\alpha$ whenever $x \in G_\alpha$. Finally, to check that $g_\infty$ is linear, observe that if $x, y \in G_\infty$, then we have $x \in G_\alpha$ and $y \in G_\beta$, and then either $G_\alpha \supseteq G_\beta$ or $G_\beta \supseteq G_\alpha$. Supposing the former then $x, y \in G_\alpha$ and so
$$g_\infty(x + \lambda y) = g_\alpha(x + \lambda y) = g_\alpha(x) + \lambda g_\alpha(y) = g_\infty(x) + \lambda g_\infty(y);$$
we argue similarly if $G_\beta \supseteq G_\alpha$.

Since any chain has an upper bound, Zorn’s Lemma now guarantees that $P$ has a maximal element $(Y, f)$. We want to show that $Y = X$. Suppose, for a contradiction, that $Y \ne X$, in which case there exists an element $z \in X \setminus Y$; we want to show that in this case we can extend $f$ to the linear span of $Y \cup \{z\}$, which is a space strictly larger than $Y$. This will contradict the maximality of $(Y, f)$, and so it will follow that $Y = X$ and $f$ is the extension of $\phi$ to the whole of $X$ required by the theorem.


Any such linear extension $F$ of $f$ must satisfy
$$F(u + \alpha z) = F(u) + \alpha F(z) = f(u) + \alpha F(z) \quad \text{for every } u \in Y, \ \alpha \in \mathbb{R};$$
the only freedom we have is to choose $F(z) = c$ for some $c \in \mathbb{R}$; the issue is how to choose $c$ so that $F$ is still bounded above by $p$, i.e. so that
$$f(u) + \alpha c \le p(u + \alpha z) \tag{19.2}$$
for every choice of $\alpha \in \mathbb{R}$ and every $u \in Y$. We know (by assumption) that (19.2) holds for $\alpha = 0$, so we have to guarantee that we can find $c$ such that (i) for $\alpha > 0$ we have (dividing by $\alpha$)
$$c \le p\left(\frac{u}{\alpha} + z\right) - \frac{f(u)}{\alpha} \quad \text{for every } u \in Y$$
and (ii) for every $\alpha < 0$ we have (dividing by $-\alpha$)
$$c \ge \frac{f(u)}{-\alpha} - p\left(\frac{u}{-\alpha} - z\right) \quad \text{for every } u \in Y.$$
Since $f$ is linear, this is the same as requiring
$$f\left(\frac{u}{\alpha}\right) - p\left(\frac{u}{\alpha} - z\right) \le c \le p\left(\frac{u}{\alpha} + z\right) - f\left(\frac{u}{\alpha}\right)$$
for every $u \in Y$ and every $\alpha > 0$, and since $Y$ is a linear subspace this is just the same as
$$f(v) - p(v - z) \le c \le p(v + z) - f(v) \quad \text{for every } v \in Y. \tag{19.3}$$
To show that we can find such a $c$ we use the triangle inequality for $p$: take $v_1, v_2 \in Y$, and then
$$f(v_1) + f(v_2) = f(v_1 + v_2) \le p(v_1 + v_2) = p(v_1 - z + v_2 + z) \le p(v_1 - z) + p(v_2 + z);$$
therefore
$$f(v_1) - p(v_1 - z) \le p(v_2 + z) - f(v_2), \qquad v_1, v_2 \in Y.$$
It follows that for any fixed $v_1 \in Y$ we have
$$f(v_1) - p(v_1 - z) \le \inf_{v \in Y} \big[ p(v + z) - f(v) \big]$$
and hence we also have
$$\sup_{v \in Y} \big[ f(v) - p(v - z) \big] \le \inf_{v \in Y} \big[ p(v + z) - f(v) \big].$$
Therefore we can find a $c \in \mathbb{R}$ to ensure that (19.3) holds.
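The ‘one more dimension’ step can be made concrete in $\mathbb{R}^2$. The following sketch is illustrative only and makes several assumptions that are not in the text: $p$ is the $\ell^1$ norm, $Y$ is the first coordinate axis, $f(t, 0) = t$, $z = (0, 1)$, and the supremum and infimum in (19.3) are approximated over a finite sample of $Y$.

```python
def p(x):
    """Sublinear bound on R^2: the l^1 norm (an illustrative choice)."""
    return abs(x[0]) + abs(x[1])

def f(t):
    """f on Y = span{(1, 0)}: f((t, 0)) = t, so f <= p on Y."""
    return t

def admissible_interval(z, ts):
    """Approximate sup_v [f(v) - p(v - z)] and inf_v [p(v + z) - f(v)] over v = (t, 0)."""
    lower = max(f(t) - p((t - z[0], -z[1])) for t in ts)
    upper = min(p((t + z[0], z[1])) - f(t) for t in ts)
    return lower, upper

z = (0.0, 1.0)                               # a vector outside Y
ts = [k / 10.0 for k in range(-100, 101)]    # sample points of Y
lower, upper = admissible_interval(z, ts)
print(lower, upper)
```

Here the admissible values of $c = F(z)$ fill the interval $[-1, 1]$ (up to rounding in the printed endpoints), so the extension $F(t, s) = t + cs$ is far from unique; each choice satisfies $F \le p$ on all of $\mathbb{R}^2$.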


However, this shows that we can extend $(Y, f)$ to a linear functional $F$ defined on $Y'$, the span of $Y$ and $z$, that still satisfies $F(x) \le p(x)$ for all $x \in Y'$. This contradicts the maximality of $(Y, f)$, so in fact $Y = X$. This proves the result for sublinear functionals.

If $p$ is in fact a seminorm, then $p(-x) = p(x)$ and we have
$$f(x) \le p(x) \qquad \text{and} \qquad -f(x) = f(-x) \le p(-x) = p(x),$$
and so our extension satisfies $|f(x)| \le p(x)$.

To prove the result for bounded linear functionals on normed spaces, i.e. when $p(x) = \|\phi\|_{U^*}\|x\|$, first we observe that this does indeed define a seminorm, so that there exists an extension with $|f(x)| \le \|\phi\|_{U^*}\|x\|$, i.e. such that $\|f\|_{X^*} \le \|\phi\|_{U^*}$. Since $U \subseteq X$, we have $\|f\|_{X^*} \ge \|\phi\|_{U^*}$ if $f$ extends $\phi$, and so $\|f\|_{X^*} = \|\phi\|_{U^*}$ as claimed.

19.2 The Hahn–Banach Theorem: Complex Case

To extend the Hahn–Banach Theorem to the complex case we will have to restrict to the case that $p$ is a seminorm (rather than just a sublinear functional).

In order to use the real version of the theorem, we first observe that any complex vector space $V$ can be viewed as a real vector space by only allowing scalar multiplication by real numbers. This does not affect the elements of the space itself, but has a significant effect on what it means for a map to be ‘linear’, i.e. the values of $\alpha, \beta$ allowed in the expression
$$\phi(\alpha x + \beta y) = \alpha\phi(x) + \beta\phi(y), \qquad x, y \in V, \tag{19.4}$$
become restricted to real numbers. We therefore make the distinction in this section between ‘real-linear’ maps (allowing only $\alpha, \beta \in \mathbb{R}$ in (19.4)) and ‘complex-linear’ maps (for which we can take $\alpha, \beta \in \mathbb{C}$ in (19.4)).

While the ‘real version’ of a normed space $V$ has the same elements, its dual space contains more maps, since they are only required to be real-linear. We therefore use the notation $V_{\mathbb{R}}^*$ for the collection of all bounded real-linear functionals on $V$ in order to distinguish it from $V^*$ (which contains only complex-linear functionals when $V$ is complex).

Lemma 19.4 Let $V$ be a complex vector space. Given any complex-linear functional $f : V \to \mathbb{C}$ there exists a unique real-linear $\psi : V \to \mathbb{R}$ such that
$$f(v) = \psi(v) - i\psi(iv) \quad \text{for all } v \in V. \tag{19.5}$$


Furthermore, (i) if $p : V \to \mathbb{R}$ is a seminorm and $|f(v)| \le p(v)$, then $|\psi(v)| \le p(v)$; (ii) if $V$ is a normed space and $f \in V^*$, then $\psi \in V_{\mathbb{R}}^*$ with $\|\psi\|_{V_{\mathbb{R}}^*} = \|f\|_{V^*}$.

Conversely, if $\psi : V \to \mathbb{R}$ is real-linear and satisfies $|\psi(v)| \le p(v)$, then $f : V \to \mathbb{C}$ defined by (19.5) is complex-linear and satisfies $|f(v)| \le p(v)$ for all $v \in V$. Moreover, if $\psi \in V_{\mathbb{R}}^*$, then $f \in V^*$ with $\|f\|_{V^*} = \|\psi\|_{V_{\mathbb{R}}^*}$.

Proof If $v \in V$, then we can write
$$f(v) = \psi(v) + i\phi(v),$$
where $\psi, \phi : V \to \mathbb{R}$ are real-linear. Since
$$\psi(iv) + i\phi(iv) = f(iv) = i f(v) = i\psi(v) - \phi(v),$$
it follows that $\psi(iv) = -\phi(v)$, which yields (19.5).

If $|f(v)| \le p(v)$, then this is inherited by $\psi$, since
$$|f(v)|^2 = |\psi(v)|^2 + |\psi(iv)|^2 \quad \Rightarrow \quad |\psi(v)| \le |f(v)| \le p(v);$$
for bounded linear functionals on a normed space the same argument gives $\|\psi\|_{V_{\mathbb{R}}^*} \le \|f\|_{V^*}$, but the equality of these norms requires a neat trick. To show that $\|\psi\|_{V_{\mathbb{R}}^*} \ge \|f\|_{V^*}$, observe that for any $x$ we can write $|f(x)| = e^{i\theta} f(x)$ for some $\theta \in \mathbb{R}$. So
$$|f(x)| = e^{i\theta} f(x) = f(e^{i\theta}x) = \psi(e^{i\theta}x) - i\psi(ie^{i\theta}x).$$
Since $|f(x)|$ is real, we must have, for all $x \in V$,
$$|f(x)| = \psi(e^{i\theta}x) \le |\psi(e^{i\theta}x)| \le \|\psi\|_{V_{\mathbb{R}}^*}\|e^{i\theta}x\|_V = \|\psi\|_{V_{\mathbb{R}}^*}\|x\|_V,$$
and so $\|f\|_{V^*} \le \|\psi\|_{V_{\mathbb{R}}^*}$.

To prove the converse results (from $\psi$ to $f$), we first show that, given a real-linear map $\psi : V \to \mathbb{R}$, the map $f : V \to \mathbb{C}$ defined in (19.5) is complex-linear. It is clear that $f(u + v) = f(u) + f(v)$, since $\psi$ has this property. We need only show that $f(\lambda u) = \lambda f(u)$ for $\lambda \in \mathbb{C}$, and for this it suffices to check that $f(iu) = i f(u)$, since $f$ is clearly real-linear because $\psi$ is. We have
$$f(iu) = \psi(iu) - i\psi(-u) = i[\psi(u) - i\psi(iu)] = i f(u).$$

To show that $|f(x)| \le p(x)$ we use the same trick that we used above to show that $\|f\|_{V^*} \le \|\psi\|_{V_{\mathbb{R}}^*}$. Suppose that $|f(x)| = e^{i\theta} f(x)$; then
$$|f(x)| = f(e^{i\theta}x) = \psi(e^{i\theta}x) - i\psi(ie^{i\theta}x),$$
and since $|f(x)|$ is real we have
$$|f(x)| = \psi(e^{i\theta}x) \le |\psi(e^{i\theta}x)| \le p(e^{i\theta}x) = |e^{i\theta}| p(x) = p(x).$$
This also implies the result for the dual norms.

As remarked above, for the complex version of the Hahn–Banach Theorem we require our bounding functional $p$ to be at least a seminorm (Definition 19.1). Since we are now dealing with a complex space $X$, throughout the statement, ‘linear’ means ‘complex-linear’.

Theorem 19.5 (Complex Hahn–Banach Theorem) Let $X$ be a complex vector space, $U$ a subspace of $X$, and $p$ a seminorm on $X$. Suppose that $\phi : U \to \mathbb{C}$ is linear and satisfies
$$|\phi(x)| \le p(x) \quad \text{for all } x \in U.$$
Then there exists a linear map $f : X \to \mathbb{C}$ such that $f(x) = \phi(x)$ for all $x \in U$ and
$$|f(x)| \le p(x) \quad \text{for all } x \in X.$$
In particular, if $X$ is a normed space, then any $\phi \in U^*$ can be extended to some $f \in X^*$ with $\|f\|_{X^*} = \|\phi\|_{U^*}$.

Proof By Lemma 19.4 there exists a real-linear $\psi : U \to \mathbb{R}$ such that
$$\phi(v) = \psi(v) - i\psi(iv), \tag{19.6}$$
and $|\psi(w)| \le p(w)$ for all $w \in U$. We can now use the real Hahn–Banach Theorem to extend $\psi$ from $U$ to $X$ to give a real-linear map $\Psi : X \to \mathbb{R}$ that satisfies
$$|\Psi(x)| \le p(x) \quad \text{for every } x \in X.$$


Finally, if we define $f(u) = \Psi(u) - i\Psi(iu)$, then $f : X \to \mathbb{C}$ is complex-linear, extends $\phi$, and satisfies $|f(w)| \le p(w)$, using the second half of Lemma 19.4.

If $\phi \in U^*$, then we follow a very similar argument, first writing $\phi$ as in (19.6), where now $\psi \in U_{\mathbb{R}}^*$ with $\|\psi\|_{U_{\mathbb{R}}^*} = \|\phi\|_{U^*}$. We then extend $\psi$ to an element $\Psi \in X_{\mathbb{R}}^*$ with $\|\Psi\|_{X_{\mathbb{R}}^*} = \|\psi\|_{U_{\mathbb{R}}^*}$ and use the second part of Lemma 19.4 to guarantee that the complex-linear functional $f$ on $X$ defined by setting $f(u) := \Psi(u) - i\Psi(iu)$ satisfies
$$\|f\|_{X^*} = \|\Psi\|_{X_{\mathbb{R}}^*} = \|\psi\|_{U_{\mathbb{R}}^*} = \|\phi\|_{U^*}.$$
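The correspondence (19.5) between a complex-linear functional and its real part, which drives the proof above, can be checked directly on $\mathbb{C}^2$. This is a small sketch; the coefficient vector `a` and the test vector `v` are arbitrary illustrative choices.

```python
def f(v, a=(2 + 1j, -1j)):
    """A complex-linear functional on C^2: f(v) = a_1 v_1 + a_2 v_2 (a is illustrative)."""
    return a[0] * v[0] + a[1] * v[1]

def psi(v):
    """Real part of f; this map is real-linear but not complex-linear."""
    return f(v).real

def rebuilt(v):
    """Recover f from psi via f(v) = psi(v) - i psi(iv), as in (19.5)."""
    iv = tuple(1j * t for t in v)
    return psi(v) - 1j * psi(iv)

v = (1.0 - 2.0j, 0.5 + 3.0j)
print(f(v), rebuilt(v))
```

Both expressions give the same complex number, which is why extending the real part $\psi$ and then rebuilding via (19.5) is enough to extend $\phi$ itself.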

Exercises

19.1 Let $H$ be a Hilbert space and $U$ a closed linear subspace of $H$. Use the Riesz Representation Theorem to show that any $\phi \in U^*$ has an extension to an element $f \in H^*$ such that $f(x) = \phi(x)$ for every $x \in U$ and $\|f\|_{H^*} = \|\phi\|_{U^*}$.

19.2 Show that the extension obtained in the previous exercise is unique.

19.3 Let $X$ be a normed space and $U$ a subspace of $X$ that is not closed. If $\hat\phi : U \to \mathbb{K}$ is a linear map such that
$$|\hat\phi(x)| \le M\|x\| \quad \text{for every } x \in U$$
show that $\hat\phi$ has a unique extension $\phi$ to $\overline{U}$ (the closure of $U$ in $X$) that is linear and satisfies $|\phi(x)| \le M\|x\|$ for every $x \in \overline{U}$. (For any $x \in \overline{U}$ there exists a sequence $(x_n) \in U$ such that $x_n \to x$. Define
$$\phi(x) := \lim_{n \to \infty} \hat\phi(x_n).$$
Show that this is well defined and has the required properties.)

19.4 Show that a seminorm $p$ on a vector space $X$ satisfies (i) $p(0) = 0$; (ii) $|p(x) - p(y)| \le p(x - y)$; (iii) $p(x) \ge 0$; and (iv) $\{x : p(x) = 0\}$ is a subspace of $X$. (Rudin, 1991)


19.5 Suppose that $U$ is a subspace of a normed space $X$ and that $\phi \in U^*$. Show that the set of all ‘Hahn–Banach extensions’ $f$ of $\phi$ (using the notation from Theorem 19.2) is convex. (Costara and Popa, 2003)

19.6 Suppose that $X$ is a real separable normed space, and $W$ a closed linear subspace of $X$. Show that there exists a sequence of unit vectors $(z_j) \in X$ such that $z_{j+1} \notin W_j := \mathrm{Span}\, W \cup \{z_1, \ldots, z_j\}$, and if we define $W_\infty = \mathrm{Span}\, W \cup \{z_j\}_{j=1}^\infty$, then $\overline{W_\infty} = X$.

19.7 Suppose that $X$ is a real separable normed space, and that $W$ is a closed linear subspace of $X$. Use the results of Exercises 19.3 and 19.6, along with the ‘extension to one more dimension’ part of the proof of the Hahn–Banach Theorem given in Section 19.1 to show that any $\phi \in W^*$ has an extension to an $f \in X^*$ with $\|f\|_{X^*} = \|\phi\|_{W^*}$. (For a separable space this gives a proof of the Hahn–Banach Theorem that does not require Zorn’s Lemma.) (Rynne and Youngson, 2008)

20 Some Applications of the Hahn–Banach Theorem

We now explore some consequences of the Hahn–Banach Theorem in normed spaces. In Sections 20.1–20.4 we will only need the simplest version of the theorem, which we state here in a compact form.

Theorem 20.1 (Hahn–Banach) Let $X$ be a normed space and $U$ a subspace of $X$. If $\phi \in U^*$, then $\phi$ has an extension to an element $f \in X^*$, i.e. $f(x) = \phi(x)$ for every $x \in U$, such that $\|f\|_{X^*} = \|\phi\|_{U^*}$.

In the final section of the chapter we give an application that requires the extension of a linear map bounded by a sublinear functional.

20.1 Existence of a Support Functional

As a first application of the Hahn–Banach Theorem, we prove the existence of a particularly useful class of linear functionals (the ‘support functionals’) that return the norm of a particular element $x \in X$. This allows us to show that linear functionals can distinguish between elements of $X$, and that understanding all the linear functionals on $X$ is in some way enough to understand $X$ itself.

Lemma 20.2 (Support functional) If $X$ is a normed space, then given any $x \in X$ there exists an $f \in X^*$ such that $\|f\|_{X^*} = 1$ and $f(x) = \|x\|$. We term this $f$ the ‘support functional at $x$’.

Proof Define $\phi$ on the linear space $U = \mathrm{Span}(\{x\})$ by setting
$$\phi(\alpha x) = \alpha\|x\| \quad \text{for all } \alpha \in \mathbb{K}.$$
Then $\phi(x) = \|x\|$ and $|\phi(z)| \le \|z\|$ for all $z \in U$, which shows that $\|\phi\|_{U^*} = 1$. Use the Hahn–Banach Theorem to extend $\phi$ to an $f \in X^*$ such that $\|f\|_{X^*} = 1$, and then note that $f(x) = \phi(x) = \|x\|$.

The following simple corollary shows that $X^*$ is rich enough to distinguish between elements of $X$.

Corollary 20.3 ($X^*$ separates points) If $x, y \in X$ with $x \ne y$, then there exists $f \in X^*$ such that $f(x) \ne f(y)$. Consequently, if $x, y \in X$ and $f(x) = f(y)$ for every $f \in X^*$, then $x = y$.

Proof If $x \ne y$, then by the previous lemma, there exists an $f$ with $\|f\|_{X^*} = 1$ such that $f(x) - f(y) = f(x - y) = \|x - y\| \ne 0$.

This result shows that, rather than being particular to $C([a, b])^*$, the observation we made in Example 18.1 that understanding the action of elements of $X^*$ on $X$ is enough to ‘understand’ the whole of $X$ is true in a general context.
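In finite dimensions a support functional can be written down explicitly, with no appeal to the Hahn–Banach Theorem. The sketch below assumes $X = \mathbb{R}^n$ with the sup norm, whose dual norm is the $\ell^1$ norm of the coefficient vector; the function names are illustrative.

```python
def sup_norm(x):
    return max(abs(t) for t in x)

def support_functional(x):
    """Coefficients a of f(y) = sum a_k y_k with f(x) = ||x||_inf and dual norm sum |a_k| = 1."""
    k = max(range(len(x)), key=lambda i: abs(x[i]))   # an index where |x_k| is largest
    a = [0.0] * len(x)
    a[k] = 1.0 if x[k] >= 0 else -1.0
    return a

def apply(a, y):
    return sum(ak * yk for ak, yk in zip(a, y))

x = [1.0, -3.0, 2.0]
a = support_functional(x)
print(apply(a, x), sup_norm(x), sum(abs(ak) for ak in a))
```

When the largest modulus is attained at several indices, each such index gives a different support functional, so support functionals need not be unique.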

20.2 The Distance Functional

The next result is a key ingredient in many subsequent proofs. Given a closed linear subspace $Y$ and a point $x \notin Y$ it provides a linear functional that vanishes on $Y$ and encodes the distance of $x$ from $Y$. We call this functional a ‘distance functional’.

Proposition 20.4 (Distance functional) Let $X$ be a normed space and $Y$ be a proper closed subspace of $X$. Take $x \in X \setminus Y$ and set
$$d = \mathrm{dist}(x, Y) := \inf\{\|x - y\| : y \in Y\} > 0. \tag{20.1}$$
Then there is an $f \in X^*$ such that $\|f\|_{X^*} = 1$, $f(y) = 0$ for every $y \in Y$, and $f(x) = d$.

Proof First note that $d > 0$ since $Y$ is closed (see the proof of Lemma 5.4). Let $U = \mathrm{Span}\{Y \cup \{x\}\}$ and define $\phi : U \to \mathbb{K}$ by setting
$$\phi(y + \lambda x) := \lambda d, \qquad y \in Y, \ \lambda \in \mathbb{K}.$$
To see that $\phi$ is bounded on $U$, observe that for $\lambda \ne 0$
$$|\phi(y + \lambda x)| = |\lambda| d \le |\lambda| \, \|x - (-y/\lambda)\| = \|\lambda x + y\|,$$
since $(-y/\lambda) \in Y$ and $d$ is the distance between $x$ and $Y$; it follows that $\|\phi\|_{U^*} \le 1$.

To see that $\|\phi\|_{U^*} \ge 1$, take $y_n \in Y$ such that
$$\|x - y_n\| \le d\left(1 + \frac{1}{n}\right)$$
(the existence of such a sequence follows from the definition of $d$ in (20.1)). Then
$$\phi(-y_n + x) = d \ge \frac{n}{n+1}\|x - y_n\|$$
and so $\|\phi\|_{U^*} \ge n/(n+1)$ for every $n$, i.e. $\|\phi\|_{U^*} \ge 1$.

We have therefore shown that $\|\phi\|_{U^*} = 1$. We now extend $\phi$ to an element $f \in X^*$ using the Hahn–Banach Theorem; the resulting $f$ satisfies $\|f\|_{X^*} = 1$, $f(x) = d$, and $f(y) = 0$ for every $y \in Y$, as required.
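In a simple Euclidean setting the distance functional of Proposition 20.4 is explicit. The following sketch assumes $X = \mathbb{R}^2$ with the Euclidean norm and $Y$ the first coordinate axis (illustrative choices, not from the text), so that $d = |x_2|$ and $f$ picks out the second coordinate up to sign.

```python
import math

def dist_to_axis(x):
    """Distance from x in R^2 (Euclidean norm) to Y = {(t, 0) : t real}."""
    return abs(x[1])

def distance_functional(x):
    """Coefficients a of f(v) = a . v with f = 0 on Y, f(x) = dist(x, Y), ||f|| = 1."""
    return (0.0, math.copysign(1.0, x[1]))

def apply(a, v):
    return a[0] * v[0] + a[1] * v[1]

x = (3.0, -4.0)
a = distance_functional(x)
print(apply(a, x), dist_to_axis(x), apply(a, (7.0, 0.0)), math.hypot(*a))
```

The functional returns the distance at $x$, vanishes on $Y$, and has norm one, matching the three conclusions of the proposition.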

20.3 Separability of $X^*$ Implies Separability of $X$

We now use the existence of a distance functional to prove the more substantial result that separability of $X^*$ implies separability of $X$. Be warned that the converse is not true in general: we have already seen that $(\ell^1)^* \equiv \ell^\infty$, and we know that $\ell^1$ is separable but that $\ell^\infty$ is not (Lemma 3.24).

Lemma 20.5 If $X^*$ is separable, then $X$ is separable.

Proof Since $X^*$ is separable, $S_{X^*} = \{f \in X^* : \|f\|_{X^*} = 1\}$ (the unit sphere in $X^*$) is separable (by (i) ⇒ (ii) in Lemma 3.23). Let $\{f_n\}$ be a countable dense subset of $S_{X^*}$. Since $\|f_n\|_{X^*} = 1$, for each $n$, there exists an $x_n \in X$ with $\|x_n\| = 1$ such that $|f_n(x_n)| \ge 1/2$, by the definition of the norm in $X^*$.

We now show that $M := \mathrm{clin}(\{x_n\})$ is all of $X$, which will imply (by (iii) ⇒ (i) of Lemma 3.23) that $X$ is separable. Suppose for a contradiction that $M \ne X$. Then $M$ is a proper closed subspace of $X$, and so Proposition 20.4 (existence of a distance functional) provides an $f \in X^*$ with $\|f\|_{X^*} = 1$ (i.e. $f \in S_{X^*}$) and $f(x) = 0$ for every $x \in M$. But then $f(x_n) = 0$ for every $n$ and so
$$\frac{1}{2} \le |f_n(x_n)| = |f_n(x_n) - f(x_n)| \le \|f_n - f\|_{X^*}\|x_n\| = \|f_n - f\|_{X^*}$$
for every $n$, which contradicts the fact that $\{f_n\}$ is dense in $S_{X^*}$.


Note that one immediate consequence of this result is that $\ell^1$ is not the dual of $\ell^\infty$.

Corollary 20.6 $(\ell^\infty)^* \not\simeq \ell^1$ and $(L^\infty)^* \not\simeq L^1$.

Proof We know that $\ell^1$ is separable, so if we had $(\ell^\infty)^* \simeq \ell^1$, then $(\ell^\infty)^*$ would be separable, since separability is preserved under isomorphisms (Exercise 3.13). Lemma 20.5 would then imply that $\ell^\infty$ was itself separable, but we know that this is not true (Lemma 3.24). The same arguments work with $L^1$ and $(L^\infty)^*$.

The three results we have just proved will be used repeatedly in the rest of Part IV. We now give two more applications which demonstrate some of the power of the Hahn–Banach Theorem, although we will not explore their consequences further.

20.4 Adjoints of Linear Maps between Banach Spaces

When $H$ and $K$ are Hilbert spaces we defined in Theorem 13.1 the adjoint $T^* \in B(K, H)$ of a linear map $T \in B(H, K)$, and we showed there that $\|T^*\|_{B(K,H)} = \|T\|_{B(H,K)}$. We can do something similar for linear maps between Banach spaces.

Suppose that $X$ and $Y$ are Banach spaces and $T \in B(X, Y)$. Given any $g \in Y^* = B(Y, \mathbb{K})$, the map $g \circ T$ is a bounded linear functional on $X$ (see (11.5)) with
$$\|g \circ T\|_{X^*} \le \|g\|_{Y^*}\|T\|_{B(X,Y)}. \tag{20.2}$$
We define the adjoint $T^\times$ of $T$ to be the map $T^\times : Y^* \to X^*$ given by
$$T^\times g = g \circ T.$$
This is a linear map on $Y^*$, since for every $x \in X$
$$T^\times(\alpha g_1 + \beta g_2)(x) = \alpha g_1(T(x)) + \beta g_2(T(x)) = \alpha T^\times g_1(x) + \beta T^\times g_2(x)$$
and $T^\times(\alpha g) = \alpha g \circ T = \alpha T^\times g$. The inequality in (20.2) shows immediately that $\|T^\times\|_{B(Y^*,X^*)} \le \|T\|_{B(X,Y)}$.


To show that the norms are in fact equal, for every non-zero $x \in X$ we can find $g \in Y^*$ such that $\|g\|_{Y^*} = 1$ and $g(Tx) = \|Tx\|_Y$, using Lemma 20.2. Note that $g(Tx) = (T^\times g)(x)$, and so
$$\|Tx\|_Y = g(Tx) = (T^\times g)(x) \le \|T^\times\|_{B(Y^*,X^*)}\|g\|_{Y^*}\|x\|_X.$$
Since $\|g\|_{Y^*} = 1$, it follows that for every $x \in X$ we have
$$\|Tx\|_Y \le \|T^\times\|_{B(Y^*,X^*)}\|x\|_X,$$
which implies that $\|T^\times\|_{B(Y^*,X^*)} \ge \|T\|_{B(X,Y)}$ and hence that
$$\|T^\times\|_{B(Y^*,X^*)} = \|T\|_{B(X,Y)}. \tag{20.3}$$
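The equality (20.3) has a familiar finite-dimensional shadow. The sketch below assumes $X = \mathbb{R}^3$ and $Y = \mathbb{R}^2$ with their sup norms (so the duals carry $\ell^1$ norms) and identifies functionals with coefficient vectors, in which case $T^\times$ acts as the transpose matrix; the matrix `T` is an arbitrary illustrative choice.

```python
def op_norm_inf(T):
    """Norm of T between sup-normed spaces: the largest absolute row sum."""
    return max(sum(abs(t) for t in row) for row in T)

def op_norm_one(S):
    """Norm of S between l^1-normed spaces: the largest absolute column sum."""
    return max(sum(abs(row[j]) for row in S) for j in range(len(S[0])))

def transpose(T):
    return [list(col) for col in zip(*T)]

T = [[1.0, -2.0, 0.0],
     [0.5, 0.0, 3.0]]
print(op_norm_inf(T), op_norm_one(transpose(T)))
```

The columns of the transpose are the rows of $T$, so the two norms coincide, in line with $\|T^\times\| = \|T\|$.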

(The above shows that the mapping $T \mapsto T^\times$ is a linear isometry from $B(X, Y)$ onto a subspace of $B(Y^*, X^*)$; in general this map need not be onto.)

The following result will be useful later when we discuss reflexive spaces in Chapter 26.

Lemma 20.7 If $T : X \to Y$ is an isomorphism, then $T^\times : Y^* \to X^*$ is an isomorphism. If $T$ is an isometric isomorphism, then so is $T^\times$.

Proof First we show that $T^\times$ is an isomorphism by showing that $(T^\times)^{-1} = (T^{-1})^\times$. Since $T^{-1} \in B(Y, X)$, it follows that $(T^{-1})^\times \in B(X^*, Y^*)$. Now observe that for any $f \in X^*$ and $x \in X$
$$[T^\times((T^{-1})^\times f)](x) = ((T^{-1})^\times f)(Tx) = f \circ T^{-1}(Tx) = f(x),$$
i.e. $T^\times \circ (T^{-1})^\times = \mathrm{id}_{X^*}$; an almost identical argument can be used to show that $(T^{-1})^\times \circ T^\times = \mathrm{id}_{Y^*}$. Thus $(T^\times)^{-1} = (T^{-1})^\times$ and so in particular $T^\times$ is a bijection.

To show that $T^\times$ is an isometry we use Exercise 11.8, which shows that a bijection $T \in B(X, Y)$ is an isometry if $\|T\|_{B(X,Y)} = \|T^{-1}\|_{B(Y,X)} = 1$. We use (20.3) and the fact that $T$ is an isometry to deduce that
$$\|T^\times\| = \|T\| = 1 \qquad \text{and} \qquad \|(T^\times)^{-1}\| = \|T^{-1}\| = 1;$$
it follows that $T^\times$ is also an isometry.

For the relationship between the Banach adjoint and the Hilbert adjoint see Exercise 20.11.


20.5 Generalised Banach Limits

Let $X = \ell^\infty(\mathbb{R})$ be the space of all bounded real sequences, and let $c(\mathbb{R})$ be the subspace of $X$ consisting of all convergent real sequences. Let $\ell \in c(\mathbb{R})^*$ be defined by setting
$$\ell(x) = \lim_{n \to \infty} x_n,$$
i.e. $\ell(x)$ gives the limit of the sequence $(x_n)$. We want to show that we can define an extension of $\ell$ to the whole of $X$ that retains some of the most important properties of the usual limit. (Since $\ell$ is a bounded linear functional on $c(\mathbb{R})$, which is a subspace of $\ell^\infty(\mathbb{R})$, we could just extend $\ell$ to a linear functional on $\ell^\infty(\mathbb{R})$ using the Hahn–Banach Theorem to give some sort of ‘generalised limit’. However, the more involved construction we now give ensures that our generalised limit also satisfies (b) and (c), below.)

A Banach (generalised) limit on $X$ is any $L \in X^*$ such that for $x \in X$

(a) $L(x) \ge 0$ if $x_n \ge 0$ for all $n$;

(b) $L(x) = L(s_l x)$, where $s_l$ is the left shift from Example 11.7; and

(c) $L((1, 1, 1, \ldots)) = 1$.

Lemma 20.8 If $L$ is a Banach limit on $X$, then
$$\liminf_{n \to \infty} x_n \le L(x) \le \limsup_{n \to \infty} x_n$$
for every $x \in X$. In particular, if $x \in c(\mathbb{R})$, then $L(x) = \ell(x)$.

Proof Since
$$\liminf_{n \to \infty} x_n = \lim_{k \to \infty} \inf_{n \ge k} x_n \qquad \text{and} \qquad \limsup_{n \to \infty} x_n = \lim_{k \to \infty} \sup_{n \ge k} x_n$$
it follows from property (b) that it suffices to show that
$$\inf_n x_n \le L(x) \le \sup_n x_n.$$
Take $\varepsilon > 0$; then there exists $n_0$ such that
$$\sup_n x_n - \varepsilon < x_{n_0} \le \sup_n x_n,$$
and so
$$x_{n_0} - x_n + \varepsilon > 0 \quad \text{for all } n.$$
Using properties (a) and (c) we obtain
$$0 \le L(\{x_{n_0} + \varepsilon - x_n\}) = x_{n_0} + \varepsilon - L(x),$$
which implies that
$$L(x) - \varepsilon \le x_{n_0} \le \sup_n x_n.$$
Since this holds for all $\varepsilon > 0$, it follows that $L(x) \le \sup_n x_n$. A similar argument yields $\inf_n x_n \le L(x)$.

That $L(x) = \ell(x)$ whenever $x \in c(\mathbb{R})$ now follows since for $x \in c(\mathbb{R})$ we have
$$\liminf_{n \to \infty} x_n = \limsup_{n \to \infty} x_n = \lim_{n \to \infty} x_n = \ell(x).$$

We now use the Hahn–Banach Theorem with an appropriate sublinear functional to show that generalised Banach limits exist.

Proposition 20.9 Banach limits exist.

Proof Consider the functional $p : X \to \mathbb{R}$ defined by setting
$$p(x) := \limsup_{n \to \infty} \frac{x_1 + \cdots + x_n}{n}.$$

Then we have
$$p(x + y) = \limsup_{n \to \infty} \frac{(x_1 + y_1) + \cdots + (x_n + y_n)}{n} \le \limsup_{n \to \infty} \frac{x_1 + \cdots + x_n}{n} + \limsup_{n \to \infty} \frac{y_1 + \cdots + y_n}{n} = p(x) + p(y),$$
and clearly $p(\lambda x) = \lambda p(x)$ for $\lambda \ge 0$, so $p$ is a sublinear functional on $X$. We also have
$$-p(-x) = \liminf_{n \to \infty} \frac{x_1 + \cdots + x_n}{n}.$$
Now note that if $x \in c(\mathbb{R})$, then
$$\lim_{n \to \infty} \frac{x_1 + \cdots + x_n}{n} = \ell(x)$$
(see Exercise 20.13), and so in particular $p(x) = \ell(x)$ for every $x \in c(\mathbb{R})$. We can now use the Hahn–Banach Theorem to extend $\ell$ to some $L \in X^*$ such that
$$-p(-x) \le L(x) \le p(x), \qquad x \in X.$$
That $L$ satisfies properties (a) and (c) required for a generalised Banach limit follows immediately from this inequality; for property (b) note that
$$L(x - s_l x) \le p(x - s_l x) = \limsup_{n \to \infty} \frac{x_1 - x_{n+1}}{n} = 0,$$
since $x \in \ell^\infty$, and similarly for the lower bound.
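The sublinear functional $p$ used in this proof is built from Cesàro means, and these are easy to compute for a specific divergent sequence. The sketch below looks at $x = (1, 0, 1, 0, \ldots)$, an illustrative choice; a finite computation can only suggest the values of the lim sup and lim inf, not prove them.

```python
def cesaro_means(x, n_max):
    """Return the averages (x_1 + ... + x_n)/n for n = 1, ..., n_max."""
    total, means = 0.0, []
    for n in range(1, n_max + 1):
        total += x(n)
        means.append(total / n)
    return means

def alternating(n):
    """x = (1, 0, 1, 0, ...), a bounded sequence with no limit."""
    return 1.0 if n % 2 == 1 else 0.0

means = cesaro_means(alternating, 10_000)
tail = means[5_000:]            # the means for n = 5001, ..., 10000
print(min(tail), max(tail))
```

The means settle near $1/2$, so $-p(-x) = p(x) = 1/2$ for this sequence and the limit $L$ constructed in the proof assigns it the value $1/2$. The same value is forced for any Banach limit, since (b) and (c) give $L(x) + L(s_l x) = 1$ with $L(x) = L(s_l x)$.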


Goffman and Pedrick (1983) discuss conditions under which the Banach limit is unique, calling such sequences ‘almost convergent’ (Section 2.10).

Exercises

20.1 If $H$ is a Hilbert space, given $x \in H$, find an explicit form for the functional $f \in H^*$ such that $\|f\|_{H^*} = 1$ and $f(x) = \|x\|$ (as in Lemma 20.2).

20.2 Let $X$ be a normed space, $\{e_j\}_{j=1}^n \in X$ a linearly independent set, and $\{a_j\}_{j=1}^n \in \mathbb{K}$. Show that there exists $f \in X^*$ such that
$$f(e_j) = a_j, \qquad j = 1, \ldots, n.$$

20.3 If $X$ is a Banach space show that $\|x\| \le M$ if and only if $|f(x)| \le M$ for all $f \in X^*$ with $\|f\|_{X^*} = 1$, and hence show that
$$\|x\| = \sup\{|f(x)| : f \in X^*, \ \|f\|_{X^*} = 1\}.$$
(Sometimes this provides a useful way to bound $\|x\|$ ‘by duality’.)

20.4 Find an explicit form for the distance functional of Proposition 20.4 when $X$ is a Hilbert space.

20.5 Show that if $X^*$ is strictly convex (see Exercise 10.3), then for each $x \in X$ with $x \ne 0$ the set
$$\{f \in X^* : \|f\|_{X^*} = 1 \text{ and } f(x) = \|x\|_X\}$$
consists of a single linear functional.

20.6 If $X$ is a Banach space and $T \in B(X)$ then the numerical range of $T$, $V(T)$, is defined by setting
$$V(T) := \{f(Tx) : f \in X^*, \ \|f\|_{X^*} = \|x\|_X = f(x) = 1\}.$$
Show that this reduces to $V(T) = \{(Tx, x) : \|x\| = 1\}$ in a Hilbert space (see Exercise 16.3).

20.7 Deduce the existence of a support functional as a corollary of Proposition 20.4.

20.8 Show that if $f \in X^*$ and $f \ne 0$ then
$$\mathrm{dist}(x, \mathrm{Ker}(f)) = \frac{|f(x)|}{\|f\|_{X^*}}.$$
(Pryce, 1973)

20.9 Let $X$ be a separable Banach space. Show that $X$ is isometrically isomorphic to a subspace of $\ell^\infty$. [Hint: let $(x_n)$ be a dense sequence in the unit sphere of $X$, let $(\phi_n)$ be support functionals at $(x_n)$, and show that $T : X \to \ell^\infty$ defined by setting
$$T x := (\phi_1(x), \phi_2(x), \ldots)$$
is a linear isometry.] (Heinonen, 2003)

20.10 Show that a point $z \in X$ belongs to $\mathrm{clin}(E)$ if and only if $f(z) = 0$ for every $f \in X^*$ that vanishes on $E$, i.e. $f(x) = 0$ for every $x \in E$ implies that $f(z) = 0$. (Taking $E = Y$ with $Y$ a linear subspace shows that $x \in \overline{Y}$ if and only if $f(x) = 0$ whenever $f \in X^*$ with $f(y) = 0$ for every $y \in Y$.)

20.11 Suppose that $H$ and $K$ are two Hilbert spaces and $T \in B(H, K)$. Show that
$$T^* = R_H^{-1} \circ T^\times \circ R_K,$$
where $T^*$ is the Hilbert adjoint, $T^\times$ the Banach adjoint, and $R_H$ and $R_K$ the Riesz maps from $H \to H^*$ and from $K \to K^*$, respectively.

20.12 Use the Arzelà–Ascoli Theorem to show that if $T$ is compact then $T^\times$ is compact. [Hint: if $B$ is a bounded subset of $Y^*$ then we can regard $B$ as a subset of $C(K)$, where $K$ is the compact metric space $\overline{T(B_X)}$.]

20.13 Show that if $x \in c(\mathbb{R})$ with $\lim_{n \to \infty} x_n = \alpha$ then
$$\lim_{n \to \infty} \frac{x_1 + \cdots + x_n}{n} = \alpha.$$

21 Convex Subsets of Banach Spaces

In this chapter we investigate some properties of convex subsets of Banach spaces. We begin by introducing the Minkowski functional, which is a sublinear functional on $X$ that can be defined for any open convex subset that contains the origin. We then use this, along with the Hahn–Banach Theorem, to show that any two disjoint convex sets can be separated by a hyperplane (a translation of the kernel of a linear functional), and finally, we prove the Krein–Milman Theorem, which shows that a compact convex subset of a Banach space is determined by its extreme points.

21.1 The Minkowski Functional

For the proof of the ‘separation theorem’ for convex sets we will need the following lemma, which allows us to find a sublinear functional that somehow ‘encodes’ any convex set.

Lemma 21.1 If $C$ is an open convex subset of a Banach space $X$ with $0 \in C$, then we define the Minkowski functional of $C$ by setting
$$p(x) := \inf\{\lambda > 0 : \lambda^{-1}x \in C\} \quad \text{for each } x \in X;$$
see Figure 21.1. Then $p$ is a sublinear functional on $X$ and there exists a constant $c > 0$ such that
$$0 \le p(x) \le c\|x\| \quad \text{for every } x \in X. \tag{21.1}$$
Furthermore,
$$C = \{x : p(x) < 1\}. \tag{21.2}$$

Figure 21.1 The Minkowski functional: $p$ is constant on scaled versions of $\partial C$. On the dotted line $p = 3/2$; on the dashed line $p = 2$; and $p < 1$ in $C$.

Proof To see that $p$ is sublinear, first observe that it follows easily from the definition that $p(\lambda x) = \lambda p(x)$ for $\lambda > 0$. For the triangle inequality, take $\alpha > p(x)$ and $\beta > p(y)$; then $\alpha^{-1}x, \beta^{-1}y \in C$, and since $C$ is convex

$$\frac{\alpha}{\alpha+\beta}\,\alpha^{-1}x + \frac{\beta}{\alpha+\beta}\,\beta^{-1}y = \frac{x+y}{\alpha+\beta} \in C.$$

It follows that $p(x + y) \le \alpha + \beta$, and since this holds for any $\alpha > p(x)$, $\beta > p(y)$ we obtain $p(x + y) \le p(x) + p(y)$, as required.

Since $C$ is open and $0 \in C$, $C$ contains an open ball $B(0, \delta)$ for some $\delta > 0$, and so

$$\|z\| < \delta \quad\Rightarrow\quad z \in C \quad\Rightarrow\quad |p(z)| \le 1,$$

and then (21.1) follows, since for any non-zero $x \in X$ we can consider

$$z = \frac{\delta}{2}\,\frac{x}{\|x\|};$$

since $\|z\| < \delta$ it follows that $|p(z)| \le 1$, i.e.

$$\left| p\!\left(\frac{\delta}{2}\,\frac{x}{\|x\|}\right) \right| = \frac{\delta}{2\|x\|}\,|p(x)| \le 1,$$

which yields (21.1) with $c = 2/\delta$.

Finally, we prove (21.2). If $x \in C$, then, since $C$ is open, we have $\lambda^{-1}x \in C$ for some $\lambda < 1$, and so $p(x) \le \lambda < 1$, while if $p(x) < 1$, then $\lambda^{-1}x \in C$ for some $\lambda < 1$, and since $0 \in C$ and $C$ is convex, it follows that $x = \lambda(\lambda^{-1}x) + (1 - \lambda)0 \in C$.


21.2 Separating Convex Sets

We now use the Minkowski functional (applied to an appropriate set) as the sublinear functional in the Hahn–Banach Theorem, to show that any two disjoint convex sets in a Banach space $X$ can be ‘separated’ by some $f \in X^*$ (see Figure 21.2). We start with the case of a real Banach space.

Theorem 21.2 (Functional separation theorem) Suppose that $X$ is a real Banach space and $A, B \subset X$ are non-empty, disjoint, convex sets.

(i) If $A$ is open, then there exist $f \in X^*$ and $\gamma \in \mathbb{R}$ such that

$$f(a) < \gamma \le f(b), \qquad a \in A,\ b \in B.$$

(ii) If $A$ is compact and $B$ is closed, then there exist $f \in X^*$, $\gamma \in \mathbb{R}$, and $\delta > 0$ such that

$$f(a) \le \gamma - \delta < \gamma + \delta \le f(b), \qquad a \in A,\ b \in B.$$

A simple but important example of case (ii) is when $A = \{a\}$ is a point and $B$ is closed. (Compare this result with the Hilbert-space prototype of Corollary 10.2.)

Figure 21.2 Illustration of the functional separation theorem (case (ii)).

Proof (i) Choose $a_0 \in A$ and $b_0 \in B$, and let $w_0 = b_0 - a_0$. Now consider

$$C := w_0 + A - B,$$

i.e. $C = \{w_0 + a - b : a \in A,\ b \in B\}$. Then it is easy to check (see Exercise 3.2) that $C$ is an open convex set that contains 0; we let $p(\cdot)$ be the Minkowski functional for $C$ defined in Lemma 21.1. Since $A \cap B = \emptyset$, $w_0 \notin C$, and so $p(w_0) \ge 1$.

Let $U = \operatorname{Span}(w_0)$, and define a linear functional $\phi$ on $U$ by setting

$$\phi(\alpha w_0) = \alpha, \qquad \alpha \in \mathbb{R}.$$

If $\alpha \ge 0$, then $\phi(\alpha w_0) = \alpha \le \alpha p(w_0) = p(\alpha w_0)$, while if $\alpha < 0$, then $\phi(\alpha w_0) < 0 \le p(\alpha w_0)$, and so $\phi(w) \le p(w)$ for every $w \in U$. We can therefore use the Hahn–Banach Theorem (Theorem 19.2) to find a linear extension $f : X \to \mathbb{R}$ of $\phi$ such that

$$f(x) \le p(x) \qquad \text{for every } x \in X.$$

Since we have (21.1), this $f$ satisfies $f(x) \le p(x) \le c\|x\|$. Since $f$ is linear and $p$ is sublinear, it follows that

$$-f(x) = f(-x) \le p(-x) \le c\|{-x}\| = c\|x\|,$$

and so $|f(x)| \le c\|x\|$, i.e. $f$ is actually an element of $X^*$.

By definition for any $a \in A$ and $b \in B$ we have $w_0 + a - b \in C$, and so, since $f(w_0) = \phi(w_0) = 1$,

$$1 + f(a) - f(b) = f(w_0 + a - b) \le p(w_0 + a - b) < 1.$$

This shows that $f(a) < f(b)$, and so if we define $\gamma = \inf_{b \in B} f(b)$ we obtain

$$f(a) \le \gamma \le f(b), \qquad a \in A,\ b \in B. \tag{21.3}$$

To guarantee that the left-hand inequality is in fact strict, suppose not, i.e. that there exists an $a \in A$ such that $f(a) = \gamma$. Since $A$ is open, we must have $a + \delta w_0 \in A$ for some $\delta > 0$, and then we would have

$$f(a + \delta w_0) = f(a) + \delta\phi(w_0) = \gamma + \delta > \gamma,$$

which contradicts (21.3).


Note that if $A$ and $B$ are both open, then the same argument shows that

$$f(a) < \gamma < f(b), \qquad a \in A,\ b \in B.$$

To prove (ii) we set

$$\varepsilon = \tfrac14 \inf\{\|a - b\| : a \in A,\ b \in B\} > 0,$$

which is strictly positive since $A$ is compact and $B$ is closed. Now consider the two open convex disjoint sets

$$A_\varepsilon := A + B(0, \varepsilon) \qquad\text{and}\qquad B_\varepsilon := B + B(0, \varepsilon),$$

where here $B(0, \varepsilon)$ is the open ball of radius $\varepsilon$. We can now apply part (i) to the open sets $A_\varepsilon$ and $B_\varepsilon$. Set $w_0 = b_0 - a_0$ for some $a_0 \in A_\varepsilon$ and $b_0 \in B_\varepsilon$, and then follow the same argument as in part (i) to find $f \in X^*$ and $\gamma \in \mathbb{R}$ such that

$$f(a) < \gamma < f(b), \qquad a \in A_\varepsilon,\ b \in B_\varepsilon.$$

If we let $\delta = \varepsilon/(2\|w_0\|)$, then for any $a \in A$ we have $a + \delta w_0 \in A_\varepsilon$, and so

$$f(a) = f(a + \delta w_0) - \delta\phi(w_0) \le \gamma - \delta,$$

and similarly, $\gamma + \delta \le f(b)$ for any $b \in B$.

We have a very similar result in a complex Banach space, except that we replace the linear functional $f$ by its real part throughout.

Theorem 21.3 (Functional separation theorem – complex case) Suppose that $X$ is a complex Banach space. Then Theorem 21.2 still holds, except that in the inequalities $f$ should be replaced by $\operatorname{Re} f$ throughout.

Proof We consider real-linear maps on the space $X$, as in the proof of the complex version of the Hahn–Banach Theorem. So, for example, in case (i) (when $A$ is open) we can use Theorem 21.2 to find $\phi \in X_{\mathbb{R}}^*$ and $\gamma \in \mathbb{R}$ such that

$$\phi(a) < \gamma \le \phi(b), \qquad a \in A,\ b \in B.$$

Now we use the argument from the proof of the complex version of the Hahn–Banach Theorem to show that the linear functional $f(x) := \phi(x) - \mathrm{i}\phi(\mathrm{i}x)$ is in $X^*$. This clearly has the required properties, since $\operatorname{Re} f = \phi$.


21.3 Linear Functionals and Hyperplanes

We now show how linear functionals are related to hyperplanes (subspaces of codimension one). The discussion at the end of Chapter 12 has already given some indication of the nature of this relationship.

Definition 21.4 A hyperplane $U$ in a vector space $X$ is a codimension-one subspace of $X$, i.e. a maximal proper subspace: $U \ne X$ and if $Z$ is a subspace with $U \subseteq Z \subseteq X$, then $Z = U$ or $Z = X$.

This definition allows us to give a more geometric interpretation of the above separation theorems (which should be unsurprising given Figure 21.2).

Lemma 21.5 The following are equivalent:

(i) $U$ is a hyperplane in $X$;

(ii) $U$ is a subspace of $X$ with $U \ne X$ but for any $x \in X \setminus U$, the span of $U \cup \{x\}$ is $X$; and

(iii) $U = \operatorname{Ker}(\phi)$ for some non-zero linear functional $\phi : X \to \mathbb{K}$.

Note that in (iii) the linear functional $\phi$ does not have to be bounded; we will show in the next result that it is bounded if and only if $U$ is closed.

Proof (i) ⇔ (ii) If $U$ is a hyperplane and $x \notin U$, then $\operatorname{Span}(U \cup \{x\})$ is a subspace that strictly contains $U$, so must be $X$. Conversely, if $U \subseteq Z \subseteq X$, either $Z = U$ or there exists $z \in Z \setminus U$, and then $Z \supset \operatorname{Span}(U \cup \{z\}) = X$.

(ii) ⇒ (iii) Let $U$ be a hyperplane and choose $x \notin U$. Define $\phi : \operatorname{Span}(U \cup \{x\}) = X \to \mathbb{K}$ by setting

$$\phi(y + \lambda x) := \lambda, \qquad y \in U,\ \lambda \in \mathbb{K}.$$

This is well defined since if $y + \lambda x = y' + \lambda' x$, then

$$y - y' = (\lambda' - \lambda)x \quad\Rightarrow\quad (\lambda' - \lambda)x \in U \quad\Rightarrow\quad \lambda = \lambda'.$$

As defined $\phi$ is linear, non-zero (since $\phi(x) = 1$) and $\phi(y) = 0$ if and only if $y \in U$, so $U = \operatorname{Ker}(\phi)$.

(iii) ⇒ (ii) Suppose that $U = \operatorname{Ker}(\phi)$ and take any $x \notin U$; it follows that $\phi(x) \ne 0$. Now given any $z \in X \setminus U$ let

$$y := z - \frac{\phi(z)}{\phi(x)}\,x,$$


so that $\phi(y) = 0$, i.e. $y \in U$, and $z = y + (\phi(z)/\phi(x))x$. It follows that the span of $U \cup \{x\}$ is $X$, as claimed.

Lemma 21.6 If $U = \operatorname{Ker}(\phi)$ is a hyperplane in $X$, then $U$ is closed if and only if $\phi$ is bounded; otherwise $U$ is dense in $X$.

Proof First, note that since $U \ne X$, it cannot be both closed and dense. So it is enough to show that bounded implies closed, and unbounded implies dense.

If $\phi$ is bounded, then $U = \operatorname{Ker}(\phi)$ is closed by Lemma 11.12. If $\phi$ is unbounded, then we can find $(x_n) \in X$ such that $\|x_n\| = 1$ but $|\phi(x_n)| \ge n$. Now given $x \in X$, consider the sequence

$$y_n = x - \frac{\phi(x)}{\phi(x_n)}\,x_n.$$

Then $\phi(y_n) = 0$, so $y_n \in U$, and

$$\|x - y_n\| = \left\| \frac{\phi(x)}{\phi(x_n)}\,x_n \right\| = \frac{|\phi(x)|\,\|x_n\|}{|\phi(x_n)|} \le \frac{|\phi(x)|}{n},$$

and so $y_n \to x$ as $n \to \infty$ and $U$ is dense.

The translate by $y \in X$ of a hyperplane $U$ is the set

$$U + y = \{u + y : u \in U\}.$$

Since any closed hyperplane $U$ is equal to $\operatorname{Ker}(\phi)$ for some $\phi \in X^*$, any translate of $U$ is also given by

$$U + y = \{x \in X : \phi(x) = \phi(y)\}.$$

The following is now just a restatement of Theorem 21.2, if we understand that two sets $A$ and $B$ are separated by $\phi(x) = \gamma$ if $\phi(a) < \gamma < \phi(b)$ for every $a \in A$, $b \in B$.

Corollary 21.7 Suppose that $A$, $B$ are non-empty disjoint convex subsets of $X$ with $A$ closed and $B$ compact. Then there exists a closed hyperplane that can be translated so that it separates $A$ and $B$.

21.4 Characterisation of Closed Convex Sets

We can use the separation Theorem 21.2 to give a characterisation of any closed convex subset of $X$ as the envelope of its ‘supporting hyperplanes’; see Figure 21.3. This result will be useful later when we show that such sets are also ‘weakly closed’ (Theorem 27.7), a fact that is very useful in the calculus of variations (maximisation/minimisation problems).

Figure 21.3 A convex set in a real Banach space is the envelope of a collection of translated hyperplanes (Corollary 21.8). Each of the lines represents a set of the form $\{x : f(x) = \inf_{y \in C} f(y)\}$ (in the real case).

Corollary 21.8 Suppose that $C$ is a closed convex subset of a Banach space $X$. Then

$$C = \{x \in X : \operatorname{Re} f(x) \ge \inf_{y \in C} \operatorname{Re} f(y) \ \text{for every } f \in X^*\}.$$

Proof That $C$ is contained in the right-hand side is immediate. Suppose that $x_0 \notin C$. Then, since $\{x_0\}$ is compact and convex, we can use Theorem 21.2 to find $f \in X^*$ such that

$$\operatorname{Re} f(x_0) \le \gamma - \delta < \gamma + \delta \le \operatorname{Re} f(y) \qquad \text{for every } y \in C.$$

In particular, $\operatorname{Re} f(x_0) < \inf_{y \in C} \operatorname{Re} f(y)$, so $x_0$ does not belong to the right-hand side.

21.5 The Convex Hull

Suppose that $U$ is a subset of a vector space $X$. Then the convex hull of $U$ is the collection of all ‘convex linear combinations’ of points in $U$, i.e.

$$\operatorname{conv}(U) := \left\{ \sum_{j=1}^n \lambda_j u_j : n \in \mathbb{N},\ u_j \in U,\ \lambda_j > 0,\ \sum_{j=1}^n \lambda_j = 1 \right\}. \tag{21.4}$$

Lemma 21.9 The convex hull of $U$ is the smallest convex set that contains $U$.


Proof First we show that $\operatorname{conv}(U)$ contains $U$ and is convex. That it contains $U$ follows by taking $n = 1$ in (21.4). If $\{u_j\}_{j=1}^n$ and $\{v_k\}_{k=1}^m$ are in $U$ and $\lambda_j, \mu_k > 0$ with $\sum_{j=1}^n \lambda_j = 1$ and $\sum_{k=1}^m \mu_k = 1$, then for any $t \in (0, 1)$

$$t\left\{\sum_{j=1}^n \lambda_j u_j\right\} + (1 - t)\left\{\sum_{k=1}^m \mu_k v_k\right\}$$

is another convex linear combination of elements of $U$ (note that the definition in (21.4) does not require the $u_j$ to be distinct, although it could and would yield the same set).

Now suppose that $C$ is convex and contains $U$. We show by induction that for any $n \in \mathbb{N}$, $\sum_{j=1}^n \lambda_j u_j \in C$ whenever $u_j \in U$, $\lambda_j > 0$, and $\sum_{j=1}^n \lambda_j = 1$. This is true when $n = 1$ since $U \subseteq C$, and when $n = 2$ since $C$ is convex. Now take $\{u_j\}_{j=1}^n \in U$ and $\lambda_j > 0$ such that $\sum_{j=1}^n \lambda_j = 1$. By induction we know that $\sum_{j=1}^{n-1} \mu_j u_j \in C$ for any $\mu_j > 0$, $\sum_{j=1}^{n-1} \mu_j = 1$. Now choose $(1 - t) = \lambda_n$ and $\mu_j$ such that $t\mu_j = \lambda_j$; note that

$$\sum_{j=1}^{n-1} \mu_j = t^{-1} \sum_{j=1}^{n-1} \lambda_j = t^{-1}[1 - \lambda_n] = 1.$$

Then, since $C$ is convex,

$$\sum_{j=1}^n \lambda_j u_j = t \sum_{j=1}^{n-1} \mu_j u_j + (1 - t)u_n \in C.$$

So $\operatorname{conv}(U) \subseteq C$.

In a normed space we can take closures, which leads to the following definition.

Definition 21.10 In a normed space $X$ the closed convex hull of $U$ is $\overline{\operatorname{conv}(U)}$.

The closed convex hull of $U$ is the smallest closed convex subset of $X$ that contains $U$. Exercise 21.4 shows that if $U$ is compact, then its closed convex hull is also compact.
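The first step in the proof of Lemma 21.9 (a convex combination of two convex combinations is again a convex combination) can be checked mechanically. The sketch below is only an illustration: the representation of a combination as a list of `(weight, point)` pairs and the helper names `mix`, `evaluate` and `is_convex_combination` are assumptions made here, and exact rational arithmetic is used so that the weights can be compared with 1 without rounding.

```python
from fractions import Fraction

def mix(t, comb_u, comb_v):
    """Merge t*comb_u + (1 - t)*comb_v into one list of (weight, point) pairs."""
    return ([(t * w, p) for w, p in comb_u] +
            [((1 - t) * w, p) for w, p in comb_v])

def evaluate(comb):
    """Evaluate a combination of points in R^d given as (weight, point) pairs."""
    dim = len(comb[0][1])
    return tuple(sum(w * p[i] for w, p in comb) for i in range(dim))

def is_convex_combination(comb):
    """Weights must be strictly positive and sum to exactly 1, as in (21.4)."""
    return all(w > 0 for w, _ in comb) and sum(w for w, _ in comb) == 1

if __name__ == "__main__":
    u = [(Fraction(1, 3), (0, 0)), (Fraction(2, 3), (3, 0))]
    v = [(Fraction(1, 4), (0, 4)), (Fraction(3, 4), (3, 0))]
    t = Fraction(2, 5)
    merged = mix(t, u, v)
    print(is_convex_combination(merged), evaluate(merged))
```

The point `(3, 0)` appears in both input combinations and therefore twice in the merged list, which is harmless because (21.4) does not require the points to be distinct.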

21.6 The Krein–Milman Theorem

We end this chapter with the Krein–Milman Theorem: any non-empty compact convex subset of a Banach space is the closed convex hull of its extreme points.

Definition 21.11 If $K$ is a convex set, then a point $a \in K$ is an extreme point (of $K$) if whenever $a = \lambda x + (1 - \lambda)y$ for some $\lambda \in (0, 1)$, $x, y \in K$, then we must have $x = y = a$.

Extreme points are particular cases of extreme sets.


Definition 21.12 Suppose that $K$ is a convex subset of a Banach space $X$. A subset $M \subseteq K$ is an extreme set (in $K$) if $M$ is non-empty, closed, and whenever $x, y \in K$ with $\lambda x + (1 - \lambda)y \in M$ for some $\lambda \in (0, 1)$, then $x, y \in M$.

Note that $a$ is an extreme point in $K$ if and only if $\{a\}$ is an extreme set in $K$, and that – trivially – $K$ is extreme in itself. Note also that if $K$ is compact, then any extreme set in $K$ is also compact, since it is a closed subset of a compact set (Lemma 2.25).

We will use the following lemma twice in our proof of the Krein–Milman Theorem.

Lemma 21.13 Let $K$ be a non-empty compact convex subset of a Banach space $X$, let $M \subseteq K$ be an extreme set, and take any $f \in X^*$. Then

$$M^f := \{x \in M : f(x) = \max_{y \in M} f(y)\}$$

is another extreme set.

Proof First note that $M^f$ is non-empty; since $M$ is compact the continuous map $f : M \to \mathbb{R}$ attains its maximum. If $x, y \in K$ and $\lambda x + (1 - \lambda)y \in M^f$, $\lambda \in (0, 1)$, then $\lambda x + (1 - \lambda)y \in M$: since $M$ is extreme it follows that $x, y \in M$. Now we have

$$f(\lambda x + (1 - \lambda)y) = \lambda f(x) + (1 - \lambda)f(y) = \max_{z \in M} f(z);$$

Figure 21.4 The extreme points in three convex sets, identified by bold curves and by crosses: each set is the closed convex hull of its extreme points. Every extreme point is an extreme set; each of the straight edges in the second and third shapes is an extreme set. Note that every extreme set contains an extreme point.


it follows that $f(x) = f(y) = \max_{z \in M} f(z)$, and so $x, y \in M^f$.

The following result is the key ingredient in the proof of the Krein–Milman Theorem.

Proposition 21.14 Every extreme set in a non-empty compact convex set $K$ contains an extreme point.

Proof Let $M$ be an extreme set in $K$, and consider the collection $E$ of all extreme sets in $K$ that are contained in $M$. Order $E$ so that $A \preceq B$ if $B \subseteq A$ (so that the ‘maximal set’ is the smallest). We use Zorn’s Lemma to show that $E$ has a maximal element.

Given any chain $C$ we set $B = \bigcap_{A \in C} A$. The set $B$ is non-empty: if $B = \emptyset$, then $\{K \setminus A : A \in C\}$ is an open cover of $K$, and so has a finite subcover $\{K \setminus A_j : j = 1, \ldots, k\}$. It follows that $\bigcap_{j=1}^k A_j = \emptyset$; but this is not possible: since $A_j \in C$ and $C$ is a chain we have

$$\bigcap_{j=1}^k A_j = A_i$$

for some $i \in \{1, \ldots, k\}$; see Exercise 1.7.

Clearly $A \preceq B$ for every $A \in C$ (since we order by inclusion). Moreover, $B$ is an extreme set, since if $\lambda x + (1 - \lambda)y \in B$, then for any $A \in C$ we have $\lambda x + (1 - \lambda)y \in A$, and then $x, y \in A$ (since $A$ is an extreme set), from which it follows that $x, y \in B$. It follows that $B$ is an upper bound for $C$, and so – since any chain in $E$ has an upper bound – Zorn’s Lemma guarantees that $E$ has a maximal element $M_*$.

Now suppose that $M_*$ is not a single point, so there exist $a, b \in M_*$ with $a \ne b$. Then, since $X^*$ separates points in $X$ (Lemma 20.3), there must exist some $f \in X^*$ such that $f(a) < f(b)$. Consider

$$M_*^f := \{x \in M_* : f(x) = \max_{y \in M_*} f(y)\};$$

Lemma 21.13 guarantees that $M_*^f \in E$, and certainly $M_* \preceq M_*^f$. Since $M_*$ is maximal, we must have $M_*^f = M_*$; but $a \notin M_*^f$ so $M_*^f \ne M_*$. This contradiction shows that $M_*$ must consist of a single point, and hence $M$ contains an extreme point.

We can now prove the Krein–Milman Theorem.

Theorem 21.15 (Krein–Milman) Suppose that $K$ is a non-empty compact convex subset of a Banach space $X$. Then $K$ is the closed convex hull of its extreme points.


Proof Let $K'$ be the closed convex hull of the extreme points of $K$; then every extreme point is contained in $K'$ and $K' \subseteq K$. So suppose that there is a point $b \in K$ such that $b \notin K'$. Using the separation form of the Hahn–Banach Theorem (Theorem 21.2 (ii) with $A = K'$ and $B = \{b\}$), there is an $f \in X^*$ such that

$$f(x) < f(b) \qquad \text{for every } x \in K'.$$

Then

$$K^f := \{x \in K : f(x) = \max_{y \in K} f(y)\}$$

is an extreme set of $K$ (by Lemma 21.13) such that $K^f \cap K' = \emptyset$. Then $K^f$ must contain an extreme point of $K$ (Proposition 21.14), but these are all contained in $K'$. Hence $K^f \cap K' \ne \emptyset$, a contradiction.
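A finite-dimensional picture may help with the role of the sets $M^f$ in this argument. The Python sketch below is an illustration only, under the assumption that $K$ is the convex hull of finitely many points in the plane (here the vertices of a square); the helper names `max_face` and `random_convex_combination` are hypothetical. It selects the generating points at which a linear functional is maximal and checks on random samples that no point of $K$ exceeds that maximum.

```python
import random

def max_face(points, f, tol=1e-12):
    """Return the generating points at which the linear functional f is maximal."""
    m = max(f(p) for p in points)
    return [p for p in points if f(p) >= m - tol]

def random_convex_combination(points, rng):
    """Return a random point of the convex hull of `points`."""
    weights = [rng.random() + 1e-9 for _ in points]   # strictly positive weights
    total = sum(weights)
    dim = len(points[0])
    return tuple(sum(w / total * p[i] for w, p in zip(weights, points))
                 for i in range(dim))

if __name__ == "__main__":
    square = [(1.0, 1.0), (1.0, -1.0), (-1.0, 1.0), (-1.0, -1.0)]
    # A functional maximised at a single vertex, which is an extreme point.
    print(max_face(square, lambda p: p[0] + 2.0 * p[1]))
    # A functional maximised along a whole edge, which is an extreme set.
    print(max_face(square, lambda p: p[0]))
```

In the second case the maximising set is an edge rather than a point, and applying a further functional to that edge (as in the proof of Proposition 21.14) cuts it down to a single vertex.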

Exercises

21.1 If $(X, \|\cdot\|)$ is a normed space show that the Minkowski functional of $B_X$ is $\|\cdot\|$. (Lax, 2002)

21.2 Show that if $(K_\alpha)_{\alpha \in A}$ are a family of convex subsets of a vector space $X$, then $\bigcap_{\alpha \in A} K_\alpha$ is also convex.

21.3 Show that if $K$ is a convex subset of a normed space $X$, then $\overline{K}$ is also convex.

21.4 Show that if $U$ is a compact subset of a Banach space, then its closed convex hull is compact. (Show that $\operatorname{conv}(U)$ is totally bounded; see Exercise 6.10.) (Pryce, 1973)

21.5 Suppose that $U$ is a subset of $\mathbb{R}^n$. Show that any point in the convex hull of $U$ can be written as a convex combination of at most $n + 1$ points in $U$. (Suppose that $x \in \operatorname{conv}(U)$ is a convex combination of more than $n + 1$ points in $U$, and show that the number of points required can be reduced.)

21.6 If $K$ is a non-empty closed subset of a Banach space $X$ define

$$d(x) = \operatorname{dist}(x, K) = \inf_{k \in K} \|x - k\|.$$

Show that $K$ is convex if and only if $d : X \to \mathbb{R}$ is a convex function. (Giles, 2002)

22 The Principle of Uniform Boundedness

We now use the Baire Category Theorem – a fundamental result about complete spaces – to prove some results about linear maps between Banach spaces. We first state and prove this theorem, and then we use it to prove the Principle of Uniform Boundedness (in this chapter) and the Open Mapping Theorem and its corollaries (in the next chapter).

22.1 The Baire Category Theorem

The Baire Category Theorem encodes an important property of complete spaces.¹ It comes in two equivalent formulations. The first says that a countable intersection of ‘large’ sets is still large.

¹ The theorem also holds in any complete metric space.

Theorem 22.1 (Baire Category Theorem: residual form) If $\{G_i\}_{i=1}^\infty$ is a countable family of open dense subsets of a complete normed space $(X, \|\cdot\|)$, then

$$G = \bigcap_{i=1}^\infty G_i$$

is dense in $X$.

Any set that contains a countable intersection of open dense sets (such as the set $G$ in the theorem above) is called residual. Note that the intersection of a countable collection of residual sets is still residual.

Proof Take $x \in X$ and $\varepsilon > 0$; we need to show that $B(x, \varepsilon) \cap G$ is non-empty.


First note that for any $n \in \mathbb{N}$, since each $G_n$ is dense, given any $z \in X$ and $r > 0$ we have $B(z, r) \cap G_n \ne \emptyset$, so there exists some $y \in G_n$ such that $y \in B(z, r) \cap G_n$. Since each $G_n$ is also open, it follows that $B(z, r) \cap G_n$ is open too, and so there exists $r' > 0$ such that $B(y, 2r') \subset B(z, r) \cap G_n$. It follows that

$$\overline{B}(y, r') \subset B(y, 2r') \subset B(z, r) \cap G_n.$$

We now use this observation repeatedly. First choose $x_1 \in G_1$ and $r_1 < 1/2$ such that

$$\overline{B}(x_1, r_1) \subset B(x, \varepsilon) \cap G_1; \tag{22.1}$$

then take $x_2 \in G_2$ and $r_2 < 1/4$ such that

$$\overline{B}(x_2, r_2) \subset B(x_1, r_1) \cap G_2;$$

and inductively find $x_n \in G_n$ and $r_n < 2^{-n}$ such that

$$\overline{B}(x_n, r_n) \subset B(x_{n-1}, r_{n-1}) \cap G_n. \tag{22.2}$$

This yields a sequence of nested closed sets,

$$\overline{B}(x_1, r_1) \supset \overline{B}(x_2, r_2) \supset \overline{B}(x_3, r_3) \supset \cdots. \tag{22.3}$$

The points $(x_j)$ form a Cauchy sequence, since (22.3) shows that for $i, j \ge n$ we have

$$x_i, x_j \in \overline{B}(x_n, r_n) \quad\Rightarrow\quad d(x_i, x_j) < 2^{-(n-1)};$$

since $(X, d)$ is complete the sequence $(x_j)$ must converge to some $x_0 \in X$. Furthermore, since for each $n$ the point $x_i$ is contained in the closed set $\overline{B}(x_n, r_n)$ for all $i \ge n$ it follows, since $\overline{B}(x_n, r_n)$ is closed, that $x_0 \in \overline{B}(x_n, r_n)$. By (22.1) we have $x_0 \in B(x, \varepsilon)$, and from (22.2) we must have $x_0 \in G_n$ for every $n \in \mathbb{N}$. Therefore $x_0 \in B(x, \varepsilon) \cap G$, so this set is not empty and $G$ is dense in $X$ as claimed.

An alternative formulation of the Baire Category Theorem says that a complete metric space cannot be formed from a countable union of ‘small’ sets. To be more precise, we say that a subset $W$ of $(X, d)$ is nowhere dense if $(\overline{W})^\circ = \emptyset$, i.e. if the closure of $W$ contains no non-empty open sets.

Observe that if $W$ is nowhere dense, then $X \setminus \overline{W}$ is open and dense: that this set is open is clear (since its complement is closed); if it were not dense there would be a point $x \in X$ such that $B(x, r) \cap (X \setminus \overline{W}) = \emptyset$ for all $r$ sufficiently small, which would imply that $\overline{W} \supset B(x, r)$, a contradiction.


Theorem 22.2 (Baire Category Theorem: meagre form) Let $\{F_j\}_{j=1}^\infty$ be a countable collection of nowhere dense subsets of a complete normed space $(X, \|\cdot\|)$. Then

$$\bigcup_{j=1}^\infty F_j \ne X.$$

In particular, if $\{F_j\}_{j=1}^\infty$ are closed and $\bigcup_{j=1}^\infty F_j = X$, then at least one of the $F_j$ contains a non-empty open set.

A countable union of nowhere dense subsets is called² meagre.

² A less memorable terminology is that such sets are ‘of the first category’. A set that is not of the first category is ‘of the second category’.

Proof The sets $X \setminus \overline{F_j}$ form a countable collection of open dense sets. It follows using Theorem 22.1 that

$$\bigcap_{j=1}^\infty \left( X \setminus \overline{F_j} \right) = X \setminus \left\{ \bigcup_{j=1}^\infty \overline{F_j} \right\}$$

is dense, and in particular non-empty. Since $\bigcup_j F_j \subseteq \bigcup_j \overline{F_j}$, it follows that $\bigcup_j F_j \ne X$.

One classical application of the Baire Category Theorem is to prove the existence of continuous functions that are nowhere differentiable; see e.g. Exercise 13 in Chapter 9 of Costara and Popa (2003) or Section 1.10 in Goffman and Pedrick (1983).

22.2 The Principle of Uniform Boundedness

We now use the Baire Category Theorem in the form of Theorem 22.2 to prove the Banach–Steinhaus Principle of Uniform Boundedness.

Theorem 22.3 (Principle of Uniform Boundedness) Let $X$ be a Banach space and $Y$ a normed space. Let $S \subset B(X, Y)$ be a collection of bounded linear operators such that

$$\sup_{T \in S} \|Tx\|_Y < \infty \qquad \text{for each } x \in X.$$

Then

$$\sup_{T \in S} \|T\|_{B(X,Y)} < \infty.$$


Some care is needed in applying this theorem: each element of $S$ has to be a bounded linear map from $X$ to $Y$ (this is easy to forget).

Proof Consider the sets

$$F_j = \{x \in X : \|Tx\|_Y \le j \ \text{for all } T \in S\}.$$

Each $F_j$ is closed, since for each $T \in S$ the set $\{x \in X : \|Tx\|_Y \le j\}$ is closed because $T$ is continuous, and then

$$F_j = \bigcap_{T \in S} \{x \in X : \|Tx\|_Y \le j\}$$

is the intersection of closed sets and therefore closed (Lemma 2.6). By assumption

$$X = \bigcup_{j=1}^\infty F_j,$$

and so Theorem 22.2 implies that at least one of the $F_j$ must contain a non-empty open set; so there must exist $y \in X$ and $r > 0$ such that $B_X(y, r) \subset F_n$ for some $n$.

Then for any $x$ with $\|x\|_X < r$ we have $y + x \in B_X(y, r) \subset F_n$, and so for every $T \in S$ we must have

$$\|Tx\|_Y = \|T(y + x) + T(-y)\|_Y \le n + \|Ty\|_Y \le 2n,$$

since $y \in F_n$. So for any $x$ with $\|x\|_X = r/2$ we have

$$\|Tx\|_Y \le 2n \qquad \text{for every } T \in S.$$

Since $T$ is linear, we can write any non-zero $y \in X$ as $y = (2\|y\|_X/r)(ry/2\|y\|_X)$, and then

$$\|Ty\|_Y = \frac{2\|y\|_X}{r} \left\| T\!\left( \frac{ry}{2\|y\|_X} \right) \right\|_Y \le \frac{4n}{r}\,\|y\|_X,$$

and the conclusion follows.

Corollary 22.4 Suppose that $X$ is a Banach space, $Y$ a normed space, and that $T_n \in B(X, Y)$. Suppose that

$$Tx := \lim_{n\to\infty} T_n x$$

exists for every $x \in X$. Then $T \in B(X, Y)$.


Proof The operator $T$ is linear, since if $x, y \in X$ and $\alpha, \beta \in \mathbb{K}$, then

$$T(\alpha x + \beta y) = \lim_{n\to\infty} T_n(\alpha x + \beta y) = \lim_{n\to\infty} (\alpha T_n x + \beta T_n y) = \alpha \lim_{n\to\infty} T_n x + \beta \lim_{n\to\infty} T_n y = \alpha Tx + \beta Ty.$$

To show that $T$ is bounded, observe that since $\lim_{n\to\infty} \|T_n x\|_Y$ exists it follows that for every $x \in X$ the sequence $(T_n x)_{n=1}^\infty$ is bounded. The Principle of Uniform Boundedness now shows that $\|T_n\|_{B(X,Y)} \le M$ for every $n \in \mathbb{N}$. It follows that

$$\|Tx\|_Y = \lim_{n\to\infty} \|T_n x\|_Y \le M\|x\|_X$$

and so $T$ is bounded.

An almost immediate corollary of the Principle of Uniform Boundedness, the ‘Condensation of Singularities’, is often useful in applications; we will use it in the next section to show that there are continuous functions whose Fourier series do not converge at every point (i.e. that diverge at at least one point).

Corollary 22.5 (Condensation of Singularities) Suppose that $X$ is a Banach space, $Y$ a normed space, and $S \subset B(X, Y)$ with

$$\sup_{T \in S} \|T\|_{B(X,Y)} = \infty.$$

Then there exists $x \in X$ such that

$$\sup_{T \in S} \|Tx\|_Y = \infty. \tag{22.4}$$

Proof If (22.4) does not hold for any $x \in X$, then the Principle of Uniform Boundedness would imply that $\sup_{T \in S} \|T\|_{B(X,Y)} < \infty$.

Exercise 22.2 shows that under the conditions of this corollary the set of $x \in X$ for which (22.4) holds is a residual subset of $X$.

22.3 Fourier Series of Continuous Functions

As an application of the Principle of Uniform Boundedness we now prove that there is a $2\pi$-periodic continuous function $f : [-\pi, \pi] \to \mathbb{R}$ such that the Fourier series of $f$ at 0 does not converge, i.e. the partial sums are unbounded. This does not contradict the result of Corollary 6.10 that any such function can be uniformly approximated by an expression of the form

$$\sum_{k=-n}^n c_k \mathrm{e}^{\mathrm{i}kx};$$

rather, it shows that these approximations are not simply the partial sums of a single infinite ‘series expansion’ of $f$. Nor does it contradict the result of Lemma 9.16, where we showed that for any $f \in L^2(-\pi, \pi)$ the expansion

$$\sum_{k=-\infty}^\infty c_k \mathrm{e}^{\mathrm{i}kx}$$

converges to $f$ in $L^2$, since we will be showing rather that this expansion does not converge uniformly on $[-\pi, \pi]$.

Using the expression (9.7) for the coefficients $c_k$ from Lemma 9.16 the $n$th partial sum is

$$f_n(x) = \frac{1}{2\pi} \sum_{k=-n}^n \left( \int_{-\pi}^{\pi} f(t)\,\mathrm{e}^{\mathrm{i}kt}\,\mathrm{d}t \right) \mathrm{e}^{-\mathrm{i}kx}.$$

At $x = 0$ this gives

$$f_n(0) = \frac{1}{2\pi} \sum_{k=-n}^n \int_{-\pi}^{\pi} f(t)\,\mathrm{e}^{\mathrm{i}kt}\,\mathrm{d}t = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t) \left( \sum_{k=-n}^n \mathrm{e}^{\mathrm{i}kt} \right) \mathrm{d}t = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t) K_n(t)\,\mathrm{d}t,$$

where we define

$$K_n(t) := \sum_{k=-n}^n \mathrm{e}^{\mathrm{i}kt}.$$

Note that $K_n(t)$ is real and continuous with $|K_n(t)| \le 2n + 1$.

Let $X = \{f \in C([-\pi, \pi]) : f(-\pi) = f(\pi)\}$; this is a Banach space when equipped with the supremum norm. We consider the collection of maps $S_n : X \to \mathbb{R}$ given by $f \mapsto f_n(0)$. Note that we have $S_n \in B(X, \mathbb{R})$, since

$$|S_n(f)| = \left| \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t) K_n(t)\,\mathrm{d}t \right| \le \left( \frac{1}{2\pi} \int_{-\pi}^{\pi} |K_n(t)|\,\mathrm{d}t \right) \|f\|_\infty \le (2n + 1)\|f\|_\infty,$$

since $|K_n(t)| \le 2n + 1$.

Using the ‘Condensation of Singularities’ (Corollary 22.5) it is enough to show that $\sup_n \|S_n\| = \infty$: if this is true, then there must exist an $f \in X$ such that $|S_n f| = |f_n(0)|$ is unbounded, and therefore $f_n(0)$ cannot converge to a limit as $n \to \infty$. In Example 11.9 we showed that

$$\|S_n\| = I_n := \frac{1}{2\pi} \int_{-\pi}^{\pi} |K_n(t)|\,\mathrm{d}t;$$

we now prove that $I_n \to \infty$ as $n \to \infty$.

We first simplify the expression for $K_n$: if $t \ne 0$, then

$$K_n(t) := \sum_{k=-n}^n \mathrm{e}^{\mathrm{i}kt} = \mathrm{e}^{-\mathrm{i}nt}(1 + \cdots + \mathrm{e}^{2\mathrm{i}nt}) = \mathrm{e}^{-\mathrm{i}nt}\,\frac{\mathrm{e}^{\mathrm{i}(2n+1)t} - 1}{\mathrm{e}^{\mathrm{i}t} - 1} = \frac{\mathrm{e}^{\mathrm{i}(n+\frac12)t} - \mathrm{e}^{-\mathrm{i}(n+\frac12)t}}{\mathrm{e}^{\frac12\mathrm{i}t} - \mathrm{e}^{-\frac12\mathrm{i}t}} = \frac{\sin(n + \frac12)t}{\sin\frac12 t};$$

and clearly $K_n(0) = 2n + 1$. It follows that

$$I_n = \frac{1}{2\pi} \int_{-\pi}^{\pi} \left| \frac{\sin(n + \frac12)t}{\sin\frac12 t} \right| \mathrm{d}t.$$

To estimate $I_n$ from below, observe that $|\sin(t/2)| \le |t/2|$, and so

$$\left| \frac{\sin(n + \frac12)t}{\sin\frac12 t} \right| \ge \left| \frac{\sin(n + \frac12)t}{\frac12 t} \right|.$$

The right-hand side can be bounded below, for each $k = 1, \ldots, n$, by

$$\left| \frac{\sin(n + \frac12)t}{\frac12 t} \right| \ge \left| \frac{\sin(n + \frac12)t}{\frac12 k\pi/(n + \frac12)} \right| = \frac{2n + 1}{k\pi} \left| \sin\!\left(n + \tfrac12\right)t \right| \qquad \text{for all } t \in \left[ \frac{(k-1)\pi}{n + \frac12},\ \frac{k\pi}{n + \frac12} \right].$$

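So

$$\begin{aligned}
2\pi I_n &> \int_0^{n\pi/(n+\frac12)} \left| \frac{\sin(n + \frac12)t}{\sin\frac12 t} \right| \mathrm{d}t \\
&\ge \sum_{k=1}^n \frac{2n + 1}{k\pi} \int_{(k-1)\pi/(n+\frac12)}^{k\pi/(n+\frac12)} \left| \sin\!\left(n + \tfrac12\right)t \right| \mathrm{d}t \\
&= \sum_{k=1}^n \frac{2n + 1}{k\pi} \int_0^{\pi/(n+\frac12)} \left| \sin\!\left(n + \tfrac12\right)t \right| \mathrm{d}t \\
&= \sum_{k=1}^n \frac{2}{k\pi} \int_0^{\pi} \sin t\,\mathrm{d}t = \frac{4}{\pi} \sum_{k=1}^n \frac{1}{k},
\end{aligned}$$

which is unbounded as $n \to \infty$.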
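The growth of $I_n$ can also be observed numerically. The following Python sketch is a rough check rather than part of the argument: it approximates $I_n$ with a midpoint rule (the function names and the number of quadrature steps are choices made for this illustration) and compares the result with the lower bound $\frac{2}{\pi^2}\sum_{k=1}^n \frac1k$ that follows from dividing the estimate above by $2\pi$.

```python
import math

def lebesgue_constant(n, steps=200_000):
    """Midpoint-rule approximation of I_n = (1/2pi) * int_{-pi}^{pi} |K_n(t)| dt."""
    h = math.pi / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h            # midpoints of [0, pi], so t is never 0
        total += abs(math.sin((n + 0.5) * t) / math.sin(0.5 * t))
    # |K_n| is even, so the integral over [-pi, pi] is twice that over [0, pi].
    return total * h / math.pi

def harmonic_lower_bound(n):
    """The lower bound (2/pi^2) * (1 + 1/2 + ... + 1/n) for I_n."""
    return (2.0 / math.pi ** 2) * sum(1.0 / k for k in range(1, n + 1))

if __name__ == "__main__":
    for n in (1, 5, 20, 80):
        print(n, lebesgue_constant(n), harmonic_lower_bound(n))
```

The computed values increase slowly with $n$ and stay above the harmonic lower bound, consistent with the logarithmic divergence of $I_n$.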

Exercises

22.1 Let $X$ be an infinite-dimensional Banach space and let $(x_i)_{i=1}^\infty$ be a sequence in $X$. Let $Y_n = \operatorname{Span}(x_1, \ldots, x_n)$. Using the Baire Category Theorem show that the linear span of $(x_i)$ is not the whole of $X$. (No infinite-dimensional Banach space can have a countable Hamel basis.)

22.2 Show that if $S \subset B(X, Y)$ is such that $\sup_{T \in S} \|T\| = \infty$, then the set of $x \in X$ for which $\sup_{T \in S} \|Tx\| = \infty$ is residual. (Consider the collection of sets $\{x \in X : \sup_{T \in S} \|Tx\| \le n\}$.)

22.3 Consider the space $X$ of all real polynomials

$$p(x) = \sum_{j=0}^\infty a_j x^j, \tag{22.5}$$

where $a_j = 0$ for all $j \ge N$ for some $N$, equipped with the norm

$$\|p\| = \max_j |a_j|.$$

Consider the sequence of linear functionals $T_n : X \to \mathbb{R}$ defined by setting

$$T_n p = \sum_{j=0}^{n-1} a_j \qquad \text{when} \qquad p(x) = \sum_{j=0}^\infty a_j x^j.$$

Show that $\sup_n |T_n p| < \infty$ for every $p \in X$, but $\sup_n \|T_n\| = \infty$. Deduce that $X$ is not complete with the norm $\|\cdot\|$.


22.4 Suppose that $X$, $Y$, and $Z$ are normed spaces, and that one of $X$ and $Y$ is a Banach space. Suppose that $b : X \times Y \to Z$ is bilinear and continuous. Use the Principle of Uniform Boundedness to show that there exists an $M > 0$ such that

$$|b(x, y)| \le M\|x\|_X \|y\|_Y \qquad \text{for every } x \in X,\ y \in Y.$$

(Rudin, 1991)

22.5 Show that $\ell^p$ is meagre in $\ell^q$ if $1 \le p < q \le \infty$. (Show that for every $n$ the set $\{x \in \ell^p : \|x\|_{\ell^p} \le n\}$ is closed but has empty interior in $\ell^q$.) (Rudin, 1991)

22.6 Suppose that $x$ is a sequence (in $\mathbb{K}$) with the property that

$$\sum_{j=1}^\infty x_j y_j$$

converges for every $y \in \ell^p$, $1 < p < \infty$. Show that $x \in \ell^q$, where $q$ is conjugate to $p$. (Meise and Vogt, 1997)

23 The Open Mapping, Inverse Mapping, and Closed Graph Theorems

In this chapter we prove another three fundamental theorems about linear operators between Banach spaces. First we prove the Open Mapping Theorem, using the Baire Category Theorem once again. As a consequence we prove the Inverse Mapping Theorem, which guarantees that any bounded linear map between Banach spaces that has an inverse is ‘invertible’ (i.e. its inverse is bounded) and then the Closed Graph Theorem, which provides what can be a relatively simple way of proving that a linear map between Banach spaces is bounded.

23.1 The Open Mapping and Inverse Mapping Theorems

We start by proving the Open Mapping Theorem.

Theorem 23.1 (Open Mapping Theorem) If $X$ and $Y$ are Banach spaces and $T : X \to Y$ is a bounded surjective linear map, then $T$ maps open sets in $X$ to open sets in $Y$.

Recall that we use $B_X$ for the closed unit ball in $X$, and $B_Y(x, r)$ for the open ball in $Y$ of radius $r$ around $x$.

Proof It suffices to show that $T(B_X)$ includes an open ball around 0 in $Y$, say $B_Y(0, r)$ for some $r > 0$: if $U$ is an open set in $X$ and $y \in T(U)$, then $y = Tx$ for some $x \in U$, and so there exists a $\delta > 0$ such that $x + \delta B_X \subset U$. Then

$$T(U) \supset T(x + \delta B_X) = Tx + \delta T(B_X) \supset y + \delta B_Y(0, r) = B_Y(y, \delta r).$$

First we show that $\overline{T(B_X)}$ contains a non-empty open ball around 0.

Notice that the closed sets $\overline{T(nB_X)} = n\overline{T(B_X)}$ cover $Y$ (since $T(X) = Y$ as $T$ is surjective). It follows by using the Baire Category Theorem (in the form of Theorem 22.2) that at least one of them contains a non-empty open ball. All these sets are scaled copies of $\overline{T(B_X)}$, so $\overline{T(B_X)}$ contains $B_Y(z, r)$ for some $z \in Y$ and some $r > 0$. Since $\overline{T(B_X)}$ is symmetric, it follows that $B_Y(-z, r) \subset \overline{T(B_X)}$ too. Since $\overline{T(B_X)}$ is convex, we can deduce that $B_Y(0, r) \subset \overline{T(B_X)}$: any $v \in B_Y(0, r)$ can be written as $\frac12(v + z) + \frac12(v - z)$.

Now we show that $T(2B_X)$ must include $B_Y(0, r)$. Note that given any $y \in B_Y(0, \alpha r)$, since $\overline{T(\alpha B_X)} \ni y$, for any $\varepsilon > 0$ there exists $x \in \alpha B_X$ such that $\|y - Tx\|_Y < \varepsilon$. We use this argument repeatedly.

Given $u \in B_Y(0, r)$, find $x_1 \in B_X$ with $\|u - Tx_1\|_Y < r/2$. Then, since $u - Tx_1 \in B(0, r/2)$, we can find $x_2 \in \frac12 B_X$ such that

$$\|(u - Tx_1) - Tx_2\| < r/4.$$

Now find $x_3 \in \frac14 B_X$ such that

$$\|(u - Tx_1 - Tx_2) - Tx_3\| < r/8,$$

and so on, yielding a sequence $(x_n)$ with $x_n \in 2^{-(n-1)}B_X$ such that

$$\left\| u - T\!\left( \sum_{j=1}^n x_j \right) \right\| < r2^{-n}. \tag{23.1}$$

Now, since $X$ is a Banach space and $\sum_{j=1}^n \|x_j\| \le 2 < \infty$ it follows using Lemma 4.13 that $\sum_{j=1}^\infty x_j$ converges to some $x \in 2B_X$; since $T$ is bounded it is continuous, so taking limits in (23.1) yields $Tx = u$. Since $T(2B_X) \supset B_Y(0, r)$, it follows that $T(B_X) \supset B_Y(0, r/2)$.

The Inverse Mapping Theorem (also known as the Banach Isomorphism Theorem) is an almost immediate corollary. If $T \in B(X, Y)$ with $X$ and $Y$ both Banach spaces this result removes the seeming anomaly that a map can have an inverse but not be ‘invertible’.


Theorem 23.2 (Inverse Mapping Theorem) If $X$ and $Y$ are Banach spaces and $T \in B(X, Y)$ is bijective, then $T^{-1} \in B(Y, X)$, so $T$ is invertible in the sense of Definition 11.13.

Proof The map $T$ has an inverse since it is bijective, and this inverse is necessarily linear, as discussed in Section 11.5. The Open Mapping Theorem shows that whenever $U$ is open in $X$, $(T^{-1})^{-1}(U) = T(U)$ is open in $Y$, so $T^{-1}$ is continuous (Lemma 2.13) and hence bounded (Lemma 11.3).

For a less topological argument, note that the Open Mapping Theorem shows that $T(B_X)$ includes $\theta B_Y$ for some $\theta > 0$, so

$$T^{-1}(\theta B_Y) \subseteq B_X \quad\Rightarrow\quad T^{-1}(B_Y) \subseteq \frac{1}{\theta} B_X,$$

and so $\|T^{-1}\|_{B(Y,X)} \le \theta^{-1}$.

This result has an immediate application in spectral theory. Recall (see Chapter 14) that when $X$ is an infinite-dimensional normed space, the resolvent set of a bounded linear operator $T : X \to X$ consists of all those $\lambda \in \mathbb{C}$ for which $T - \lambda I$ is 'invertible', i.e. has a bounded inverse. The Inverse Mapping Theorem tells us that if $X$ is a Banach space and $T \in B(X)$, then the boundedness of the inverse of $T - \lambda I$ is automatic if the inverse exists, so in fact

$$\rho(T) := \{\lambda \in \mathbb{C} : T - \lambda I : X \to X \text{ is a bijection}\} \tag{23.2}$$

and $\sigma(T) = \{\lambda \in \mathbb{C} : T - \lambda I : X \to X \text{ is not a bijection}\}$. We will use this result in the next chapter to investigate the spectrum of compact operators on Banach spaces.

The following corollary of the Inverse Mapping Theorem, concerning equivalence of norms on Banach spaces, follows by considering the identity map $I_X : (X, \|\cdot\|_1) \to (X, \|\cdot\|_2)$.

Corollary 23.3 If $X$ is a Banach space that is complete with respect to two different norms $\|\cdot\|_1$ and $\|\cdot\|_2$ and $\|x\|_2 \le C\|x\|_1$, then the two norms are equivalent.

A quick example before a much longer one: you cannot put the $\ell^1$ norm on $\ell^2$ and make a complete space. We know that $\|x\|_{\ell^2} \le \|x\|_{\ell^1}$, but $(1, 1/2, 1/3, \ldots)$ is in $\ell^2$ and not in $\ell^1$, so the norms are not equivalent.
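The quick example above can be checked numerically. The following is a minimal sketch, not part of the original text: the helper names (`l1_norm`, `l2_norm`, `harmonic_truncation`, `norm_ratio`) and the truncation lengths are illustrative choices. It computes both norms of the truncated sequence $(1, 1/2, \ldots, 1/n)$; the $\ell^2$ norms stay below $\pi/\sqrt{6}$ while the $\ell^1$ norms keep growing, so no bound of the form $\|x\|_{\ell^1} \le C\|x\|_{\ell^2}$ can hold.

```python
import math

def l1_norm(x):
    """Sum of absolute values: the l^1 norm of a finite sequence."""
    return sum(abs(t) for t in x)

def l2_norm(x):
    """Square root of the sum of squares: the l^2 norm of a finite sequence."""
    return math.sqrt(sum(t * t for t in x))

def harmonic_truncation(n):
    """The finitely supported sequence (1, 1/2, ..., 1/n)."""
    return [1.0 / j for j in range(1, n + 1)]

def norm_ratio(n):
    """Ratio ||x||_1 / ||x||_2 for the truncated harmonic sequence."""
    x = harmonic_truncation(n)
    return l1_norm(x) / l2_norm(x)

if __name__ == "__main__":
    # The ratio in the last column grows without bound as n increases.
    for n in (10, 100, 1000, 10000):
        x = harmonic_truncation(n)
        print(n, round(l1_norm(x), 4), round(l2_norm(x), 4), round(norm_ratio(n), 4))
```

The ratio grows roughly like $\log n$, which is slow but unbounded; this is all that is needed to rule out equivalence.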


23.2 Schauder Bases in Separable Banach Spaces

We now use Corollary 23.3 to investigate bases in separable Banach spaces. Suppose that¹ $X$ is a separable Banach space with a countable Schauder basis $\{e_n\}_{n=1}^{\infty}$ so that (see Definition 9.1) any element $x \in X$ can be written uniquely in the form

$$x = \sum_{j=1}^{\infty} a_j e_j,$$

for some coefficients $a_j \in \mathbb{K}$, where the sum converges in $X$. Suppose that we consider the 'truncated' expansions

$$P_n x = \sum_{j=1}^{n} a_j e_j.$$

Can we find a constant $C$ such that $\|P_n x\| \le C\|x\|$ for every $n$ and every $x \in X$? If $\{e_n\}_{n=1}^{\infty}$ is an orthonormal basis in a Hilbert space, then this follows immediately from Bessel's Inequality with $C = 1$; see Lemma 9.11. (The minimal $C$ such that $\|P_n x\| \le C\|x\|$ for every $n$ and $x$ is known as the basis constant.) If we knew that $P_n$ was in $B(X)$ for every $n$ (i.e. that for each $n$ we had $\|P_n x\| \le C_n\|x\|$ for every $x \in X$) this would follow from the Principle of Uniform Boundedness; but this is not so easy to show directly. Instead we use Corollary 23.3.

Proposition 23.4 Suppose that $\{e_j\}_{j=1}^{\infty}$ is a countable Schauder basis for a Banach space $X$, with $\|e_j\| = 1$ for every $j$. Then, setting

$$|||x||| := \sup_n \Big\| \sum_{j=1}^{n} a_j e_j \Big\|_X \quad\text{when}\quad x = \sum_{j=1}^{\infty} a_j e_j$$

defines a norm $|||\cdot|||$ on $X$, and $X$ is complete with respect to this norm.

¹ Contrary to intuition, there are separable Banach spaces that do not have a Schauder basis; a counterexample was constructed by Enflo (1973).

Proof To check that $|||\cdot|||$ is a norm, the only issue is the triangle inequality; but note that by the triangle inequality for $\|\cdot\|$

$$\Big\| \sum_{i=1}^{n} (x_i + y_i)e_i \Big\| \le \Big\| \sum_{i=1}^{n} x_i e_i \Big\| + \Big\| \sum_{i=1}^{n} y_i e_i \Big\|,$$

so

$$\sup_n \Big\| \sum_{i=1}^{n} (x_i + y_i)e_i \Big\| \le \sup_n \Big\| \sum_{i=1}^{n} x_i e_i \Big\| + \sup_n \Big\| \sum_{i=1}^{n} y_i e_i \Big\|,$$

i.e. $|||x + y||| \le |||x||| + |||y|||$.

To show that $(X, |||\cdot|||)$ is complete is more involved. A key observation, however, is that if $x = \sum_{j=1}^{\infty} a_j e_j$, then for every $m$

$$|a_m| = \|a_m e_m\| = \Big\| \sum_{j=1}^{m} a_j e_j - \sum_{j=1}^{m-1} a_j e_j \Big\| \le \Big\| \sum_{j=1}^{m} a_j e_j \Big\| + \Big\| \sum_{j=1}^{m-1} a_j e_j \Big\| \le 2|||x|||. \tag{23.3}$$

Now suppose that $(x^{(n)})$ is a Cauchy sequence in $(X, |||\cdot|||)$, with

$$x^{(n)} = \sum_{j=1}^{\infty} a_j^{(n)} e_j:$$

given any $\varepsilon > 0$ there exists an $N = N(\varepsilon)$ such that

$$|||x^{(n)} - x^{(m)}||| = \sup_k \Big\| \sum_{j=1}^{k} [a_j^{(n)} - a_j^{(m)}]e_j \Big\| < \varepsilon \quad\text{for all } n, m \ge N(\varepsilon).$$

It follows, using (23.3), that for each $j$ the sequence $(a_j^{(n)})_n$ is Cauchy, so $a_j^{(n)} \to \alpha_j$ for some $\alpha_j \in \mathbb{K}$ as $n \to \infty$. We now show that $\sum_{j=1}^{\infty} \alpha_j e_j$ is convergent to some element $x \in X$ and that $|||x^{(n)} - x||| \to 0$ as $n \to \infty$.

Take $n \ge N(\varepsilon)$ and any $k \ge 1$. Then

$$\Big\| \sum_{j=1}^{k} (a_j^{(n)} - \alpha_j)e_j \Big\| = \Big\| \sum_{j=1}^{k} \big(a_j^{(n)} - \lim_{m\to\infty} a_j^{(m)}\big)e_j \Big\| = \lim_{m\to\infty} \Big\| \sum_{j=1}^{k} (a_j^{(n)} - a_j^{(m)})e_j \Big\| \le \varepsilon. \tag{23.4}$$

We will use this to show that the partial sums $\sum_{j=1}^{k} \alpha_j e_j$ form a Cauchy sequence in $(X, \|\cdot\|)$ and hence, since $(X, \|\cdot\|)$ is complete, they converge to some $x = \sum_{j=1}^{\infty} \alpha_j e_j$.

Now set $n := N(\varepsilon)$; we know that $\sum_{j=1}^{\infty} a_j^{(n)} e_j$ converges in $X$, so there exists $M(\varepsilon)$ such that if $r > s \ge M(\varepsilon)$ we have

$$\Big\| \sum_{i=s}^{r} a_i^{(n)} e_i \Big\| < \varepsilon. \tag{23.5}$$

Therefore

$$\Big\| \sum_{i=s}^{r} \alpha_i e_i \Big\| \le \Big\| \sum_{i=s}^{r} (a_i^{(n)} - \alpha_i)e_i \Big\| + \Big\| \sum_{i=s}^{r} a_i^{(n)} e_i \Big\| \le \Big\| \sum_{i=1}^{r} (a_i^{(n)} - \alpha_i)e_i - \sum_{i=1}^{s-1} (a_i^{(n)} - \alpha_i)e_i \Big\| + \varepsilon \le 3\varepsilon,$$

using (23.4) and (23.5), which shows that $\big(\sum_{j=1}^{k} \alpha_j e_j\big)_k$ is a Cauchy sequence. It now follows from (23.4) that $x^{(n)}$ converges to $x$ in the norm $|||\cdot|||$, as required.

Corollary 23.5 If $\{e_j\}_{j=1}^{\infty}$ satisfies the conditions of Proposition 23.4, then there exists a constant $C > 0$ such that

$$\|P_n x\| \le C\|x\| \quad\text{for every } n \in \mathbb{N}, \tag{23.6}$$

where

$$P_n x := \sum_{j=1}^{n} a_j e_j \quad\text{for}\quad x = \sum_{j=1}^{\infty} a_j e_j.$$

In particular, the map $x \mapsto a_j$ is an element of $X^*$ for each $j$ (and its norm can be bounded independently of $j$).

Proof Since both $(X, \|\cdot\|)$ and $(X, |||\cdot|||)$ are complete and

$$\|x\|_X = \lim_{n\to\infty} \Big\| \sum_{j=1}^{n} a_j e_j \Big\|_X \le \sup_n \Big\| \sum_{j=1}^{n} a_j e_j \Big\|_X = |||x|||,$$

it follows from Corollary 23.3 that $|||x||| \le C\|x\|$ for some $C > 0$. Since $\|P_n x\| \le |||x|||$ for every $n$, this gives (23.6). The linearity of the map $x \mapsto a_j$ follows from the uniqueness of the expansion $x = \sum_{j=1}^{\infty} a_j e_j$; this map is bounded since $|a_j| \le 2|||x||| \le 2C\|x\|$ (see (23.3)).

Corollary 23.3 seems to suggest that you cannot put two different norms on a vector space $X$ to make it a Banach space, but this is not the case. For simplicity consider a Banach space with a Schauder basis $\{e_n\}_{n=1}^{\infty}$, and define a linear map $T : X \to X$ by setting $Te_n = ne_n$ for each $n$. Exercise 5.2 guarantees that $\|\cdot\|_T$ defined by setting $\|x\|_T := \|Tx\|$ is also a norm on $X$, with which $X$ is again complete. However, $\|\cdot\|$ and $\|\cdot\|_T$ cannot be equivalent: $\|e_n\|_T = n\|e_n\|$, so there is no constant $C$ such that $\|x\|_T \le C\|x\|$ for every $x \in X$. (In a general infinite-dimensional Banach space one can use Riesz's Lemma to find a countable linearly independent set, and then use a similar construction.)

23.3 The Closed Graph Theorem

The Closed Graph Theorem gives a way to check whether a linear map $T : X \to Y$ is bounded when both $X$ and $Y$ are Banach spaces by considering its 'graph' in the product space $X \times Y$.

Theorem 23.6 (Closed Graph Theorem) Suppose that $T : X \to Y$ is a linear map between Banach spaces and that the graph of $T$,

$$G := \{(x, Tx) \in X \times Y : x \in X\},$$

is a closed subset of $X \times Y$ (with norm $\|(x, y)\|_{X \times Y} = \|x\|_X + \|y\|_Y$). Then $T$ is bounded.

If the graph $G$ is closed, then this means that if $x_n \to x$ and $Tx_n \to y$, then $Tx = y$. Continuity is stronger, since it does not require $Tx_n \to y$ as a hypothesis (so whenever $T \in B(X, Y)$ the set $G$ is automatically closed).

Proof Since $X \times Y$ is a Banach space (Lemma 4.6) and $G$ is a subspace of $X \times Y$ (since $T$ is linear), it follows from the assumption that $G$ is closed that $G$ is a Banach space when equipped with the norm of $X \times Y$ (Lemma 4.3). Now consider the projection map $\Pi_X : G \to X$, defined by $\Pi_X(x, y) = x$, which is both linear and bounded. This map is clearly surjective and it is one-to-one, since

$$\Pi_X(x, Tx) = \Pi_X(y, Ty) \quad\Rightarrow\quad x = y \quad\Rightarrow\quad (x, Tx) = (y, Ty).$$

By the Inverse Mapping Theorem (Theorem 23.2) the map $\Pi_X^{-1}$ is bounded. It follows that

$$\|x\|_X + \|Tx\|_Y = \|(x, Tx)\|_{X \times Y} = \|\Pi_X^{-1} x\|_{X \times Y} \le M\|x\|_X,$$

and so $\|Tx\|_Y \le M\|x\|_X$ as required.

Exercises

23.1 Show that if $(\alpha_n)$ is a sequence of strictly positive real numbers such that $\sum_{n=1}^{\infty} \alpha_n < \infty$, then there is a sequence $(y_n)$ with $y_n \to \infty$ as $n \to \infty$ such that $\sum_{n=1}^{\infty} \alpha_n y_n < \infty$. [Hint: consider the map $T : \ell^{\infty} \to \ell^1$ defined by setting $(Tx)_j = \alpha_j x_j$; this is clearly one-to-one. Assume that no $(y_n)$ as above exists and deduce a contradiction by applying the Inverse Mapping Theorem to $T$.] (Pryce, 1973)

23.2 Show that a countable set $\{e_j\}_{j=1}^{\infty}$ of norm one elements of a Banach space $X$ is a Schauder basis for $X$ if and only if (i) the linear span of $\{e_j\}_{j=1}^{\infty}$ is dense in $X$ and (ii) there is a constant $K$ such that

$$\Big\| \sum_{i=1}^{n} a_i e_i \Big\| \le K \Big\| \sum_{j=1}^{m} a_j e_j \Big\|$$

for all $\{a_j\} \subset \mathbb{K}$ and every $n < m$. (One way follows from Corollary 23.5. For the other direction, let $Y$ be the linear span of the $\{e_j\}$ and for each $n$ define a projection $\Pi_n : Y \to \mathrm{Span}(e_1, \ldots, e_n)$ by setting

$$\Pi_n \Big( \sum_{j=1}^{m} a_j e_j \Big) = \sum_{j=1}^{\min(m,n)} a_j e_j.$$

Extend each $\Pi_n$ to a map $P_n : X \to \mathrm{Span}(e_1, \ldots, e_n)$ using Exercise 19.3 and then show that $P_n x \to x$ for every $x \in X$.) (Carothers, 2005)

23.3 Use the Closed Graph Theorem to show that if $H$ is a Hilbert space and $T : H \to H$ is a linear operator that satisfies

$$(Tx, y) = (x, Ty) \quad\text{for every } x, y \in H$$

then $T$ is bounded. (This is the Hellinger–Toeplitz Theorem.)

23.4 Let $X = C^1([0, 1])$ with the supremum norm (so this is not a Banach space) and $Y = C([0, 1])$ with the supremum norm (which is a Banach space). If we define $T : X \to Y$ by $Tf = f'$ show that the graph of $T$ is closed but that $T$ is not bounded. (This does not contradict the Closed Graph Theorem since $X$ is not a Banach space.)

23.5 Use the Closed Graph Theorem to show that if $X$ is a real Banach space and $T : X \to X^*$ is a linear map such that

$$(Tx)(x) \ge 0 \quad\text{for every } x \in X,$$

then $T$ is bounded. (Brezis, 2011)

24 Spectral Theory for Compact Operators

In Chapter 16 we investigated the spectrum of compact self-adjoint operators on Hilbert spaces. We showed there that the spectrum of these operators consists entirely of eigenvalues, apart perhaps from zero. We also showed that each eigenvalue has finite multiplicity and that the eigenvalues have no accumulation points except zero.

In this chapter we will prove the same results but for a compact operator $T$ on a Banach space $X$; we drop the self-adjointness and the requirement that the operator acts on a Hilbert space. In the next chapter we will instead drop the compactness (and more besides), and consider self-adjoint unbounded operators on Hilbert spaces.

Recall (see Definition 15.1) that $T : X \to X$ is compact if whenever $(x_n)$ is a bounded sequence in $X$, $(Tx_n)$ has a convergent subsequence. Our primary tool throughout this chapter (in addition to the compactness of $T$ itself) will be Riesz's Lemma (Lemma 5.4): if $Y$ is a proper closed subspace of $X$, then there exists $x \notin Y$ with $\|x\| = 1$ such that $\|x - y\| \ge 1/2$ for every $y \in Y$.
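To see the kind of sequence that repeated use of Riesz's Lemma produces, it may help to look at the simplest infinite-dimensional situation. The sketch below is an illustration only, under the assumption that $X = \ell^2$ and that we use its standard unit vectors $e_n$, represented as finite lists; the function names are hypothetical. It checks that the $e_n$ have pairwise distance $\sqrt{2} \ge 1/2$, so no subsequence of $(e_n)$ can be Cauchy.

```python
import math
from itertools import combinations

def unit_vector(n, length):
    """e_n as a list of the given length: 1 in position n (1-indexed), 0 elsewhere."""
    return [1.0 if i == n else 0.0 for i in range(1, length + 1)]

def distance(x, y):
    """Euclidean (l^2) distance between two finite sequences of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def min_pairwise_distance(vectors):
    """Smallest distance between two distinct members of the list."""
    return min(distance(x, y) for x, y in combinations(vectors, 2))

if __name__ == "__main__":
    N = 50
    es = [unit_vector(n, N) for n in range(1, N + 1)]
    # Every pair of distinct unit vectors is the same distance apart.
    print(min_pairwise_distance(es))
```

A sequence of unit vectors that stays uniformly separated in this way is exactly what the compactness arguments in this chapter rule out.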

24.1 Properties of T − I When T Is Compact

In order to investigate the spectrum of $T$ we have to understand properties of the operators $T - \lambda I$ for $\lambda \in \mathbb{C}$. Since we can write $T - \lambda I = \lambda(\frac{1}{\lambda}T - I)$ for any $\lambda \ne 0$, it is enough to understand operators of the form $T - I$ when $T$ is compact. We therefore analyse this case and deduce results for $T - \lambda I$ as a consequence.

Our first result is that all eigenvalues of $T$ have finite multiplicity.

Lemma 24.1 If $T \in K(X)$, then $\dim \mathrm{Ker}(T - I) < \infty$.

Proof Write $E$ for $\mathrm{Ker}(T - I)$, and suppose that $\dim E = \infty$. We show that in this case we can find a sequence $(w_j) \in E$ such that $\|w_j\| = 1$ and $\|w_i - w_j\| \ge 1/2$ for $i \ne j$.

If we have a collection $(w_j)_{j=1}^{n}$ of elements of $E$ such that

$$\|w_j\| = 1 \quad\text{and}\quad \|w_i - w_j\| \ge 1/2, \qquad 1 \le i, j \le n,\ i \ne j,$$

then let $Y_n$ be the finite-dimensional space spanned by $\{w_1, \ldots, w_n\}$. Since this space is finite-dimensional, it is closed (Exercise 5.3), and it is a proper subspace of $E$ because $\dim E = \infty$, so we can use Riesz's Lemma (Lemma 5.4) to find $w_{n+1} \in E$ with $\|w_{n+1}\| = 1$ such that $\|w_{n+1} - w_j\| \ge 1/2$ for every $j = 1, \ldots, n$. This inductive process can be started with any choice of $w_1 \in E$ with $\|w_1\| = 1$.

In this way we generate a sequence $(w_j)$ such that $Tw_j = w_j$ for every $j$. However, since $\|w_j\| = 1$ and $T$ is compact the sequence $(Tw_j)$ must have a convergent subsequence: so $(w_j)$ must have a convergent subsequence. However, $\|w_i - w_j\| \ge 1/2 > 0$ for every $i \ne j$, so $(w_j)$ cannot have a convergent subsequence. It follows that $\dim \mathrm{Ker}(T - I)$ is finite, as claimed.

Recall (see Section 14.1) that the eigenspace corresponding to an eigenvalue $\lambda$ is

$$E_\lambda := \{x \in X : Tx = \lambda x\} = \mathrm{Ker}(T - \lambda I),$$

and that the multiplicity of $\lambda$ is the dimension of $E_\lambda$.

Corollary 24.2 Suppose that $T \in K(X)$ and $\lambda \ne 0$. Then $\dim \mathrm{Ker}(T - \lambda I) < \infty$; in particular, any non-zero eigenvalue of $T$ has finite multiplicity.

We have already seen that in general the range of a bounded operator need not be closed: we gave an example in (11.13), the map $T : \ell^2 \to \ell^2$ defined by setting

$$Tx := \Big( x_1, \frac{x_2}{2}, \frac{x_3}{3}, \frac{x_4}{4}, \ldots \Big).$$

This is also a compact map from $\ell^2$ into $\ell^2$ (see Exercise 15.3), so in fact the range of a compact map need not be closed either. It is therefore striking that the range of $T - I$ is closed whenever $T$ is compact.

Proposition 24.3 If $T \in K(X)$, then $\mathrm{Range}(T - I)$ is closed.


(For use in the proof of Theorem 24.4 we make explicit here the trivial observation that $\mathrm{Range}(I - T)$ is also closed if $T \in K(X)$.)

Proof We use Lemma 24.1, which guarantees that $\mathrm{Ker}(T - I)$ is finite-dimensional. Take $(y_n) \in \mathrm{Range}(T - I)$ with $y_n = (T - I)x_n$, such that $y_n \to y$ for some $y \in X$. We have to show that $y \in \mathrm{Range}(T - I)$, i.e. that $y = (T - I)x$ for some $x \in X$.

Let $d_n := \mathrm{dist}(x_n, \mathrm{Ker}(T - I))$; since $\mathrm{Ker}(T - I)$ is finite-dimensional there exists $z_n \in \mathrm{Ker}(T - I)$ such that $\|x_n - z_n\| = d_n$; see Exercise 5.4. We want to show that $(\|x_n - z_n\|)$ is a bounded sequence; if not, then there is a subsequence such that $\|x_{n_j} - z_{n_j}\| \to \infty$ as $j \to \infty$. Note that since $z_n \in \mathrm{Ker}(T - I)$ we have

$$y_n = (T - I)(x_n - z_n) = T(x_n - z_n) - (x_n - z_n). \tag{24.1}$$

Setting $w_j = (x_{n_j} - z_{n_j})/\|x_{n_j} - z_{n_j}\|$ it follows that $\|w_j\| = 1$ and

$$Tw_j - w_j = \frac{y_{n_j}}{\|x_{n_j} - z_{n_j}\|} \to 0 \quad\text{as } j \to \infty,$$

since the convergent sequence $(y_n)$ is bounded. Since $T$ is compact, we can find a subsequence $(w_{j_k})$ of $(w_j)$ such that $Tw_{j_k} \to q$ for some $q \in X$. Using the triangle inequality we obtain

$$\|w_{j_k} - q\| \le \|w_{j_k} - Tw_{j_k}\| + \|Tw_{j_k} - q\|,$$

and so we also have $w_{j_k} \to q$. Since $T$ is compact, it is bounded and therefore continuous, so

$$q = \lim_{k\to\infty} Tw_{j_k} = T\Big( \lim_{k\to\infty} w_{j_k} \Big) = Tq,$$

i.e. $q \in \mathrm{Ker}(T - I)$. So on the one hand $\mathrm{dist}(w_{j_k}, \mathrm{Ker}(T - I)) \to 0$, while on the other

$$\mathrm{dist}(w_{j_k}, \mathrm{Ker}(T - I)) = \frac{\mathrm{dist}(x_{n_{j_k}}, \mathrm{Ker}(T - I))}{\|x_{n_{j_k}} - z_{n_{j_k}}\|} = 1,$$

a contradiction.

Since $(x_n - z_n)$ is a bounded sequence in $X$, $(T(x_n - z_n))$ has a convergent subsequence with $T(x_{n_j} - z_{n_j}) \to p$ for some $p \in X$. Using this observation in (24.1) and recalling that $y_n \to y$ it follows that

$$x_{n_j} - z_{n_j} \to p - y,$$


so setting $x = p - y$ we obtain $y = (T - I)x$ and hence $\mathrm{Range}(T - I)$ is closed as claimed.

We will obtain more information about the spectrum of general compact operators from the following important result.

Theorem 24.4 If $T \in K(X)$ and $\mathrm{Ker}(T - I) = \{0\}$, then $T - I$ is invertible.

Proof Since $T - I$ is a bounded linear operator and the assumption that $\mathrm{Ker}(T - I) = \{0\}$ ensures that $T - I$ is injective (Lemma 1.21), $T - I$ will be a bijection from $X$ onto $X$ if we show that it is surjective. The Inverse Mapping Theorem (Theorem 23.2) will then guarantee that $(T - I)^{-1}$ is bounded, and hence that $T - I$ is invertible. So all we need to prove is that $T - I$ maps $X$ onto $X$.

We begin by observing that

$$(T - I)^n = \sum_{k=0}^{n} (-1)^{n-k} \binom{n}{k} T^k = (-1)^n [I - S_n],$$

where $S_n := \sum_{k=1}^{n} (-1)^{k+1} \binom{n}{k} T^k \in K(X)$ (since $T^k$ is compact for each $k \ge 1$ and $K(X)$ is a vector space). Now let $X_n := \mathrm{Range}((T - I)^n)$; it follows using Proposition 24.3 that each $X_n$ is closed. So we have

$$X \supseteq X_1 \supseteq X_2 \supseteq X_3 \supseteq \cdots, \tag{24.2}$$

with $X_{n+1}$ a closed linear subspace of $X_n$. To prove the theorem we will show that $X_1 = \mathrm{Range}(T - I) = X$.

If $X_{n+1} \ne X_n$ for every $n$, then, using Riesz's Lemma (Lemma 5.4), it follows that for each $n$ there exists $x_n \in X_n$ with $\|x_n\| = 1$ such that

$$\mathrm{dist}(x_n, X_{n+1}) \ge \frac{1}{2}.$$

Now observe that if $n > m$, then

$$\|Tx_m - Tx_n\| = \|x_m - [(T - I)x_n + x_n - (T - I)x_m]\| \ge \frac{1}{2}, \tag{24.3}$$

since $(T - I)x_n + x_n - (T - I)x_m \in X_{m+1}$ (the first term is in $X_{n+1}$, the second in $X_n$, the third in $X_{m+1}$, and these spaces are nested as in (24.2)). The inequality in (24.3) shows that $(Tx_j)$ cannot have a convergent subsequence, which contradicts the compactness of $T$ since the sequence $(x_j)$ is bounded. It follows that $X_{n+1} = X_n$ for some $n \in \mathbb{N}$, and then $X_j = X_n$ for all $j \ge n$.

Therefore, given any $x \in X$, we have $(T - I)^n x = (T - I)^{2n} y$ for some $y \in X$. Since $\mathrm{Ker}(T - I) = \{0\}$, it follows that $\mathrm{Ker}(T - I)^n = \{0\}$, and so $(T - I)^n$ is one-to-one, which shows that $x = (T - I)^n y$, i.e. $X_n = X$. That $X_1 = X$ now follows immediately from (24.2).

24.2 Properties of Eigenvalues

After all the above preparation, we can now quickly deduce that the spectrum of a compact operator consists entirely of eigenvalues, perhaps with the exception of zero, their only possible accumulation point.

Corollary 24.5 If $T \in B(X)$ is compact and $\lambda \in \sigma(T)$ with $\lambda \ne 0$, then $\lambda$ is an eigenvalue of $T$.

Proof If $\lambda$ is not an eigenvalue of $T$, then $\mathrm{Ker}(T - \lambda I) = \{0\}$. For $\lambda \ne 0$

$$\mathrm{Ker}\Big( \frac{T}{\lambda} - I \Big) = \mathrm{Ker}(T - \lambda I) = \{0\};$$

since $T/\lambda$ is a compact operator we can now apply Theorem 24.4 to guarantee that $(T/\lambda) - I$ is invertible. Since this implies that $T - \lambda I$ is invertible, it follows that $\lambda \notin \sigma(T)$.

Proposition 24.6 If $T \in K(X)$ and $(\lambda_j)_{j=1}^{\infty}$ is a sequence of distinct non-zero eigenvalues of $T$, then $\lambda_j \to 0$ as $j \to \infty$.

Proof Choose eigenvectors $(e_j)$ for $(\lambda_j)$ with $\|e_j\| = 1$, and define a sequence of closed subspaces of $X$ by setting $E_n := \mathrm{Span}(e_1, \ldots, e_n)$. Since the eigenvectors $(e_j)$ are linearly independent (Lemma 14.2), we have $\dim(E_n) = n$ and the spaces $E_n$ are strictly increasing (i.e. $E_n$ is a proper subspace of $E_{n+1}$).

Using Riesz's Lemma (Lemma 5.4) we can find $x_n \in E_n$ such that $\|x_n\| = 1$ and $\mathrm{dist}(x_n, E_{n-1}) \ge 1/2$. Now take $n > m \ge 2$; then we have

$$\Big\| \frac{Tx_n}{\lambda_n} - \frac{Tx_m}{\lambda_m} \Big\| = \Big\| x_n + \frac{(T - \lambda_n I)x_n}{\lambda_n} - x_m - \frac{(T - \lambda_m I)x_m}{\lambda_m} \Big\|,$$

and since

$$E_{m-1} \subset E_m \subseteq E_{n-1} \subset E_n \quad\text{and}\quad (T - \lambda_n I)E_n = E_{n-1}$$

it follows that

$$-\frac{(T - \lambda_n I)x_n}{\lambda_n} + x_m + \frac{(T - \lambda_m I)x_m}{\lambda_m} \in E_{n-1},$$

and so

$$\Big\| \frac{Tx_n}{\lambda_n} - \frac{Tx_m}{\lambda_m} \Big\| \ge \mathrm{dist}(x_n, E_{n-1}) \ge \frac{1}{2}.$$

This shows that $(Tx_n/\lambda_n)$ cannot have a convergent subsequence. Since $T$ is compact, this implies that $(x_n/\lambda_n)$ can have no bounded subsequence (if it did, then there would be a further subsequence $x_{n_{j_k}}$ for which $Tx_{n_{j_k}}/\lambda_{n_{j_k}}$ did converge). Since $\|x_n/\lambda_n\| = 1/|\lambda_n|$, it follows that $(\lambda_n^{-1})$ has no bounded subsequence, so $\lambda_n \to 0$ as $n \to \infty$.

We summarise the results for compact operators from Chapters 14 and 15 and this chapter in the following theorem.

Theorem 24.7 Suppose that $X$ is an infinite-dimensional Banach space and $T \in K(X)$. Then

(i) $0 \in \sigma(T)$;
(ii) $\sigma(T) = \sigma_p(T) \cup \{0\}$;
(iii) all non-zero eigenvalues of $T$ have finite multiplicity;
(iv) eigenvectors corresponding to distinct eigenvalues are linearly independent; and
(v) the only possible accumulation point of the set $\sigma_p(T)$ is zero.

Proof (i) is Theorem 15.8; (ii) follows from Corollary 24.5 and (i); (iii) is Corollary 24.2; (iv) is Lemma 14.2; and (v) is Proposition 24.6.
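Part (v) can be seen concretely for the compact diagonal map $Tx = (x_1, x_2/2, x_3/3, \ldots)$ on $\ell^2$ used earlier in this chapter, whose eigenvalues are $1/n$ (with eigenvectors the standard unit vectors, since $Te_n = e_n/n$). The following minimal sketch, with hypothetical helper names and arbitrary truncation lengths, counts how many of these eigenvalues lie outside a small disc around zero; the count is finite and does not change once the truncation is long enough.

```python
def diagonal_eigenvalues(n_max):
    """Eigenvalues 1/n (n = 1, ..., n_max) of the diagonal map T e_n = e_n / n."""
    return [1.0 / n for n in range(1, n_max + 1)]

def count_outside_disc(eigs, eps):
    """Number of eigenvalues in the list with modulus at least eps."""
    return sum(1 for lam in eigs if abs(lam) >= eps)

if __name__ == "__main__":
    for eps in (0.15, 0.015, 0.0015):
        # Two different truncations give the same count for each eps.
        short = count_outside_disc(diagonal_eigenvalues(10_000), eps)
        long_ = count_outside_disc(diagonal_eigenvalues(100_000), eps)
        print(eps, short, long_)
```

Only finitely many eigenvalues satisfy $|\lambda| \ge \varepsilon$ for each $\varepsilon > 0$, which is another way of saying that zero is the only possible accumulation point.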

25 Unbounded Operators on Hilbert Spaces

We will now look at unbounded self-adjoint operators defined on dense subspaces of Hilbert spaces. These arise very naturally in applications, in particular, in differential equations (see e.g. Evans, 1998; Renardy and Rogers, 1993) and quantum mechanics (see e.g. Kreyszig, 1978; Zeidler, 1995). Because we will want to prove some results about the spectrum we assume throughout that $H$ is a complex Hilbert space; in particular, we write $\ell^2$ for $\ell^2(\mathbb{C})$ (and similarly for other sequence spaces).

As an illustration throughout we will use a simple example, closely related to the operation of taking two derivatives. If we take $f \in L^2(0, \pi)$, then we can expand it as a Fourier cosine series,

$$f(x) = \sum_{k=0}^{\infty} c_k \cos kx,$$

where the sum converges in $L^2$ (cf. Exercise 9.11). If we take two derivatives of both sides (assuming that we can differentiate term-by-term on the right-hand side), then we obtain

$$-f''(x) = \sum_{k=0}^{\infty} k^2 c_k \cos kx.$$

Thus the map $f \mapsto -f''$ induces a map on the coefficients,

$$(c_1, c_2, c_3, \ldots) \mapsto (c_1, 2^2 c_2, 3^2 c_3, \ldots).$$

We will therefore consider the operator $T_2$ that acts on sequences $x \in \ell^2$ with $(T_2 x)_n = n^2 x_n$, i.e.

$$T_2 x := (x_1, 2^2 x_2, 3^2 x_3, \ldots, n^2 x_n, \ldots). \tag{25.1}$$

For $T_2 x$ to be an element of $\ell^2$ we certainly need to place some restrictions on $x$; for example, if $x \in \ell^2$ there is no reason why $T_2 x$ should also be in $\ell^2$. We have some freedom to define an appropriate domain (of definition) of $T_2$, which we write as $D(T_2)$, that ensures that $T_2 x \in \ell^2$.

To start with, our only requirement will be that this domain is a dense subspace of $\ell^2$; so, for example, we could choose $D(T_2) = c_{00}$, the space of sequences with only a finite number of non-zero terms (this is dense in $\ell^2$; see the very last paragraph of Chapter 3). While $T_2 x \in \ell^2$ for every $x \in c_{00}$, $T_2 : c_{00} \to \ell^2$ is unbounded, since $T_2 e_n = n^2 e_n$ for each $n$.

The density of the domain will prove crucial: we will frequently use the fact that if $(x, y) = (x, z)$ for every $x$ in a dense subset $A$ of $H$, then $y = z$ (this was proved in Exercise 9.9).
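The unboundedness of $T_2$ on $c_{00}$ is easy to see in a small computation. The following minimal sketch (the function names are illustrative, and finitely supported sequences are represented as Python lists with index 0 standing for $n = 1$) applies $T_2$ to the unit vectors $e_n$ and reports the ratio $\|T_2 e_n\| / \|e_n\| = n^2$, which exceeds any proposed constant $C$.

```python
import math

def apply_T2(x):
    """Apply (T2 x)_n = n^2 x_n to a finitely supported sequence (list index 0 is n = 1)."""
    return [(n * n) * t for n, t in enumerate(x, start=1)]

def l2_norm(x):
    """The l^2 norm of a finite sequence."""
    return math.sqrt(sum(abs(t) ** 2 for t in x))

def basis_vector(n):
    """e_n as a list of length n: zeros except for a 1 in position n."""
    return [0.0] * (n - 1) + [1.0]

def growth_ratio(n):
    """The ratio ||T2 e_n|| / ||e_n||, which equals n^2."""
    e = basis_vector(n)
    return l2_norm(apply_T2(e)) / l2_norm(e)

if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, growth_ratio(n))
```

Every $e_n$ lies in $c_{00}$, so the ratios show directly that no bound $\|T_2 x\| \le C\|x\|$ can hold on this domain.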

25.1 Adjoints of Unbounded Operators

We will develop a general theory for unbounded linear operators defined on a dense subspace $D(T)$ of $H$, $T : D(T) \to H$. We use the norm and inner product of $H$ on $D(T)$, so $T$ is 'unbounded' in the sense that there is no constant $C$ such that

$$\|Tx\|_H \le C\|x\|_H \quad\text{for every } x \in D(T).$$

The space $D(T)$ is called the domain of the operator $T$, and when an operator is unbounded this domain forms part of the definition of the operator. We therefore refer throughout this chapter to an 'operator' as a pair $(D(T), T)$ to emphasise the importance of the domain.

We want to define an adjoint for unbounded operators. We saw (see (13.2)) that a bounded operator on a Hilbert space is self-adjoint if and only if it is symmetric, but we require something more for unbounded operators.

Lemma 25.1 If $T : D(T) \to H$ is an unbounded operator, then there exists an adjoint operator $T^* : D(T^*) \to H$ such that

$$(Tx, y) = (x, T^* y) \quad\text{for every } x \in D(T),\ y \in D(T^*). \tag{25.2}$$

Note that as well as defining the adjoint we also have to define an appropriate domain $D(T^*)$ on which the adjoint can act. Once we fix the definition of this domain we will show that the adjoint operator is uniquely determined by (25.2).

Proof We define $D(T^*)$ to be all those $y \in H$ for which the map $x \mapsto (Tx, y)$ defines a bounded linear functional $\varphi : D(T) \to \mathbb{C}$. Since $D(T)$ is dense in $H$, we can use the result of Exercise 19.3 to extend $\varphi$ in a unique way to an element $f \in H^*$; now we can use the Riesz Representation Theorem to guarantee that there is a $z \in H$ such that $f(x) = (x, z)$. We set $T^* y := z$, and then by definition

$$(Tx, y) = (x, T^* y), \qquad x \in D(T),\ y \in D(T^*).$$

It is easy to check that $T^*$ is linear and that $D(T^*)$ is a subspace of $H$. Note that if (25.2) holds for two operators $T^*$ and $T^{\dagger}$ defined on $D(T^*)$ we have

$$(x, T^* y) = (x, T^{\dagger} y) \quad\text{for every } x \in D(T),\ y \in D(T^*).$$

Since $D(T)$ is dense in $H$, it follows that $T^* y = T^{\dagger} y$ for every $y \in D(T^*)$, and so $T^* = T^{\dagger}$.

Note that while we could choose the domain $D(T)$ for the original operator $T$, once this is done the domain of $T^*$ is determined by the above procedure.

Definition 25.2 An operator $(D(T), T)$ is self-adjoint if $(D(T^*), T^*) = (D(T), T)$.

This definition means that we need the domains of $T$ and $T^*$ to coincide ($D(T^*) = D(T)$) and for $T$ to be symmetric in the sense that

$$(Tx, y) = (x, Ty) \quad\text{for every } x, y \in D(T). \tag{25.3}$$

We showed in Exercise 23.3 that if (25.3) holds for every $x, y \in H$, then $T$ must be bounded, so this is the strongest symmetry property we can expect for unbounded operators. The proof used in Exercise 16.1 can easily be adapted to show that $(D(T), T)$ is symmetric if and only if $(Tx, x)$ is real for every $x \in D(T)$.

Our example operator $T_2 : c_{00} \to \ell^2$ is symmetric, since

$$(T_2 x, y) = \sum_{j=1}^{\infty} j^2 x_j \overline{y_j} = (x, T_2 y), \qquad x, y \in c_{00}.$$

However, $(c_{00}, T_2)$ is not self-adjoint. We have

$$(T_2 x, y) = \sum_{j=1}^{\infty} j^2 x_j \overline{y_j}.$$


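Thus the map $x \mapsto (T_2 x, y)$ defines a bounded linear functional on $c_{00} \subset \ell^2$ precisely when $(j^2 y_j) = T_2 y \in \ell^2$. So we have

$$D(T_2^*) = \Big\{ y \in \ell^2 : \sum_{j=1}^{\infty} j^4 |y_j|^2 < \infty \Big\} = \{ y \in \ell^2 : T_2 y \in \ell^2 \} \ne c_{00}$$

and in this case $T_2^* y = (y_1, 2^2 y_2, 3^2 y_3, \ldots) = T_2 y$. Since $D(T_2^*) \ne c_{00} = D(T_2)$, the operator $(c_{00}, T_2)$ is not self-adjoint. Essentially the same calculation shows that $T_2$ is self-adjoint if we take its domain to be

$$h^2 := \{ x \in \ell^2 : (j^2 x_j) \in \ell^2 \}. \tag{25.4}$$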
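To get a feel for which sequences belong to $h^2$, one can look at partial sums of $\sum_j j^4 |y_j|^2$. The sketch below is an illustration only; the two test sequences $y_j = j^{-3}$ and $y_j = j^{-2}$ and the function names are choices made here, not taken from the text. The first sequence has infinitely many non-zero terms, so it is not in $c_{00}$, yet its partial sums stay bounded (they are partial sums of $\sum_j j^{-2}$); the second is in $\ell^2$ but its partial sums grow linearly, so it is not in $h^2$.

```python
import math

def h2_partial_sum(y, n_terms):
    """Partial sum of sum_j j^4 |y(j)|^2 over j = 1..n_terms, where y maps j to y_j."""
    return sum((j ** 4) * abs(y(j)) ** 2 for j in range(1, n_terms + 1))

def y_cubic(j):
    """y_j = j^-3: partial sums reduce to those of sum_j j^-2, which are bounded."""
    return 1.0 / j ** 3

def y_square(j):
    """y_j = j^-2: in l^2, but j^4 |y_j|^2 = 1 for every j, so the series diverges."""
    return 1.0 / j ** 2

if __name__ == "__main__":
    for n in (10, 1000, 100000):
        print(n, h2_partial_sum(y_cubic, n), h2_partial_sum(y_square, n))
```

The bounded partial sums for $y_j = j^{-3}$ are consistent with $D(T_2^*)$ being strictly larger than $c_{00}$.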

25.2 Closed Operators and the Closure of Symmetric Operators

A linear operator is continuous if and only if it is bounded (Lemma 11.3), but there is a substitute for continuity for unbounded operators.

Definition 25.3 A densely defined linear operator $T : D(T) \to H$ is called closed if its graph

$$G = \{(x, Tx) \in H \times H : x \in D(T)\}$$

is closed, i.e. if $(x_n) \in D(T)$ with $x_n \to x$ and $Tx_n \to y$ implies that $x \in D(T)$ and $y = Tx$.

The Closed Graph Theorem (Theorem 23.6) guarantees that if $T$ is closed and $D(T) = H$, then $T$ is bounded. So being 'closed' is like a weak form of being bounded.

The operator $T_2$ is not closed if we take its domain to be $c_{00}$: we can take

$$x^{(n)} = (1, 2^{-3}, 3^{-3}, \ldots, n^{-3}, 0, 0, \ldots) \in c_{00},$$

and then

$$T_2 x^{(n)} = (1, 2^{-1}, 3^{-1}, \ldots, n^{-1}, 0, 0, \ldots).$$

As $n \to \infty$ we have $T_2 x^{(n)} \to (1/j)_j \in \ell^2$ and $x^{(n)} \to x := (j^{-3})_j \in \ell^2$, but $x \notin c_{00}$, so $(c_{00}, T_2)$ is not closed.

However, if we choose to define $T_2$ on the larger domain $h^2$ from (25.4), then it is closed. To prove this, we define $S : \ell^2 \to \ell^2$ by setting

$$S(x_1, x_2, x_3, \ldots) = \Big( x_1, \frac{x_2}{2^2}, \frac{x_3}{3^2}, \ldots \Big).$$

Then $S : \ell^2 \to \ell^2$ is bounded, and if $x \in h^2$, we have $S(T_2 x) = x$.

Now suppose that $x^{(n)} \in h^2$ with $x^{(n)} \to x$ and $T_2 x^{(n)} \to y$; then

$$x^{(n)} = S(T_2 x^{(n)}) \to S(y) \quad\text{as } n \to \infty$$

since $S$ is continuous; so $x = S(y)$. It follows that

$$\sum_{j=1}^{\infty} j^4 |x_j|^2 = \sum_{j=1}^{\infty} j^4 \Big| \frac{y_j}{j^2} \Big|^2 = \sum_{j=1}^{\infty} |y_j|^2 < \infty,$$

so $x \in h^2$ and $T_2 x = T_2 S y = y$.

Here we started with a symmetric operator that was not closed, and obtained a closed operator by enlarging its domain. That this is always possible is shown by the following result. We say that $(D_2, T_2)$ is an extension of $(D_1, T_1)$ if $D_1 \subseteq D_2$ and $T_2|_{D_1} = T_1$.

Theorem 25.4 If $T : D(T) \to H$ is symmetric, then it has an extension $(D(\overline{T}), \overline{T})$ that is closed and symmetric.

Proof Let $D(\overline{T})$ be the set of all $x \in H$ for which we can find a sequence $(x_n) \in D(T)$ such that

$$x_n \to x \quad\text{and}\quad Tx_n \to y$$

for some $y \in H$. The space $D(\overline{T})$ is a vector space, and $D(T) \subset D(\overline{T})$ (for any $x \in D(T)$ take the constant sequence $x_n = x$). On $D(\overline{T})$ we define $\overline{T}$ by setting $\overline{T}x = y$. We now show that $\overline{T}$ is well defined on $D(\overline{T})$, and is both symmetric and closed.

(i) $\overline{T}$ is well defined. Suppose that $(x_n')$ is another sequence in $D(T)$ such that

$$x_n' \to x \quad\text{and}\quad Tx_n' \to z.$$

Now for any $v \in D(T)$ we have

$$(v, Tx_n - Tx_n') = (v, T(x_n - x_n')) = (Tv, x_n - x_n')$$

since $T$ is symmetric. Letting $n \to \infty$ we obtain $(v, y - z) = 0$ for every $v \in D(T)$. Since $D(T)$ is dense in $H$, it follows that $y = z$.

(ii) It is easy to check that $\overline{T}$ is linear; so it is an extension of $T$. To check that $\overline{T}$ is also symmetric, for every $x, x' \in D(\overline{T})$ there exist $(x_n)$, $(x_n')$ in $D(T)$ such that

$$x_n \to x \quad\text{and}\quad Tx_n \to \overline{T}x$$

and

$$x_n' \to x' \quad\text{and}\quad Tx_n' \to \overline{T}x'.$$

Since $T$ is symmetric, $(Tx_n, x_n') = (x_n, Tx_n')$, and so $(\overline{T}x, x') = (x, \overline{T}x')$ as the inner product is continuous.

(iii) Finally, we show that $\overline{T}$ is closed. We take $x_n \in D(\overline{T})$ such that

$$x_n \to x \quad\text{and}\quad \overline{T}x_n \to y;$$

we need to show that $x \in D(\overline{T})$ and $\overline{T}x = y$. For each $n$ we find $\xi_n \in D(T)$ such that

$$\|x_n - \xi_n\| < \frac{1}{n} \quad\text{and}\quad \|\overline{T}x_n - T\xi_n\| < \frac{1}{n}.$$

It follows that $\xi_n \to x$ and $T\xi_n \to y$, and so $x \in D(\overline{T})$ and $\overline{T}x = y$, so $\overline{T}$ is closed.

We have actually found the minimal closed extension of $(D(T), T)$, i.e. if $(D(T'), T')$ is such that $D(T') \supseteq D(T)$ and $T'$ is closed, then $D(T') \supseteq D(\overline{T})$. Our extension $(D(\overline{T}), \overline{T})$ is called the closure of $T$.

An operator $T$ whose closure is self-adjoint is called essentially self-adjoint. While it is not the case that every symmetric operator has a self-adjoint closure, if $T$ is symmetric and bounded below then there always exists an extension of $T$ that is self-adjoint (the 'Friedrichs extension'). The proof is lengthy, and we do not give it here; it can be found in Chapter 5 of Zeidler (1995).

Theorem 25.5 (Friedrichs extension) If $T : D(T) \to H$ is symmetric and $(Tx, x) \ge \alpha\|x\|^2$ for some $\alpha \in \mathbb{R}$ for every $x \in D(T)$, then $T$ has a self-adjoint extension $(D(\hat{T}), \hat{T})$.
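Returning to the example $(c_{00}, T_2)$ from the start of this section, the closure construction can be watched numerically. The sketch below is a rough illustration under stated assumptions: the infinite tails are approximated by truncating at a fixed `cutoff` (a hypothetical parameter, so the printed values are slight underestimates), and the function names are chosen here. It estimates $\|x^{(n)} - x\| + \|T_2 x^{(n)} - y\|$ for $x = (j^{-3})_j$ and $y = (j^{-1})_j$, which tends to zero: the finitely supported $x^{(n)}$ approximate $x$ in the graph, so $x$ belongs to the domain of the closure although $x \notin c_{00}$.

```python
import math

def tail_norm(power, n, cutoff=100000):
    """Approximate l^2 norm of the tail (j^-power) for j > n, truncated at `cutoff`."""
    return math.sqrt(sum(j ** (-2.0 * power) for j in range(n + 1, cutoff + 1)))

def graph_distance(n):
    """Approximate ||x^(n) - x|| + ||T2 x^(n) - y|| with x = (j^-3), y = (j^-1)."""
    # x^(n) agrees with x in the first n entries, so only the tails contribute.
    return tail_norm(3, n) + tail_norm(1, n)

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, graph_distance(n))
```

The dominant contribution is the tail of $(1/j)_j$, whose norm decays like $n^{-1/2}$, so the convergence in the graph is slow but genuine.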

25.3 The Spectrum of Closed Unbounded Self-Adjoint Operators

We showed in Chapter 23, using the Inverse Mapping Theorem (Theorem 23.2), that for a bounded operator the resolvent set is given by (23.2),

$$\rho(T) := \{\lambda \in \mathbb{C} : T - \lambda I : H \to H \text{ is a bijection}\}, \tag{25.5}$$

and then $\sigma(T) = \{\lambda \in \mathbb{C} : T - \lambda I : H \to H \text{ is not a bijection}\}$. For a closed unbounded operator $(D(T), T)$ we cannot expect $T - \lambda I$ to map $H$ onto $H$, so we make the following alternative definition.

Definition 25.6 Suppose that $(D(T), T)$ is a closed linear operator. The resolvent set $\rho(T)$ is

$$\rho(T) := \{\lambda \in \mathbb{C} : T - \lambda I \text{ is injective with a dense range on which } (T - \lambda I)^{-1} \text{ is bounded}\}. \tag{25.6}$$

The spectrum of $T$, $\sigma(T)$, is the complement of $\rho(T)$; it can be decomposed into

● the point spectrum: $\lambda \in \sigma_p(T)$ if $T - \lambda I$ is not injective;
● the continuous spectrum: $\lambda \in \sigma_s(T)$ if $T - \lambda I$ is injective with a dense range but $(T - \lambda I)^{-1}$ is not bounded;
● the residual spectrum: $\lambda \in \sigma_r(T)$ if $T - \lambda I$ is injective and its range is not dense.

Although this new definition of the resolvent set looks quite different from that in (25.5), they do in fact coincide for bounded operators.

Lemma 25.7 If $T \in B(H)$, then the two definitions of the resolvent set agree.

Proof Assume that $S = T - \lambda I$ is injective with a dense range. We need to show that $S$ maps $H$ onto $H$ (as in (25.5)) if and only if $S^{-1}$ is bounded on $\mathrm{Range}(S)$ (as in Definition 25.6).

If $S$ maps $H$ onto $H$, then, by the Inverse Mapping Theorem, $S^{-1}$ is bounded on $H = \mathrm{Range}(S)$. If $S^{-1}$ is bounded on the dense set $\mathrm{Range}(S)$, then, given any $y \in H$, we can find $y_n \in \mathrm{Range}(S)$ such that $y_n \to y$ and then set

$$x := \lim_{n\to\infty} S^{-1} y_n.$$

The boundedness of $S^{-1}$ implies that $(S^{-1} y_n)$ is a Cauchy sequence in $H$, and so this limit exists. Now since $S$ is bounded we have

$$Sx = S\Big( \lim_{n\to\infty} S^{-1} y_n \Big) = \lim_{n\to\infty} S(S^{-1} y_n) = \lim_{n\to\infty} y_n = y;$$

it follows that $S$ is onto.

The point spectrum consists, as before, of eigenvalues of $T$, i.e. $\lambda \in \mathbb{C}$ such that $Tx = \lambda x$ for some non-zero $x \in D(T)$. The proof of the following result is identical to the case when $T$ is bounded (see Corollary 16.2).

Lemma 25.8 If $(D(T), T)$ is symmetric, then all of its eigenvalues are real.


We will now show that the residual spectrum of a self-adjoint operator is empty.

Lemma 25.9 Suppose that $(D(T), T)$ is self-adjoint. If $\lambda \in \mathbb{C}$ is such that $T - \lambda I$ is injective, then $\mathrm{Range}(T - \lambda I)$ is dense in $H$. In particular, $\sigma_r(T) = \emptyset$.

Proof Suppose that $T - \lambda I$ is injective but that its range is not dense, i.e. $R := \overline{\mathrm{Range}(T - \lambda I)} \ne H$. It follows that there exists a $y \in R^{\perp}$ with $y \ne 0$ such that

$$0 = ((T - \lambda I)x, y) \quad\text{for every } x \in D(T).$$

So $(Tx, y) = \lambda(x, y)$ for every $x \in D(T)$, which shows that $y \in D(T^*)$. Therefore we can write

$$0 = (Tx, y) - \lambda(x, y) = (x, T^* y) - (x, \overline{\lambda} y) = (x, Ty) - (x, \overline{\lambda} y) = (x, Ty - \overline{\lambda} y).$$

Since $D(T)$ is dense in $H$, it follows that $Ty = \overline{\lambda} y$, i.e. $\overline{\lambda}$ is an eigenvalue of $T$. But eigenvalues of self-adjoint operators are real, so $Ty = \lambda y$, and since $y \ne 0$ this contradicts the initial assumption that $T - \lambda I$ is injective.

Since $\sigma_r(T)$ consists of those $\lambda$ for which $T - \lambda I$ is injective but $\mathrm{Range}(T - \lambda I)$ is not dense, it follows immediately that $\sigma_r(T) = \emptyset$.

While not every element of the spectrum of a self-adjoint operator $T$ need be an eigenvalue, because $\sigma_r(T)$ is empty any $\lambda \in \sigma(T)$ is either an eigenvalue or a member of the continuous spectrum. Using this we can show that every element of $\sigma(T)$ is an approximate eigenvalue, in the sense made precise in the following result.

Corollary 25.10 If $T$ is self-adjoint, then for every $\lambda \in \sigma(T)$ there exists a sequence $(x_n) \in D(T)$ such that $\|x_n\| = 1$ and $Tx_n - \lambda x_n \to 0$.

Proof If $\lambda \in \sigma_p(T)$ this is immediate. Otherwise $(T - \lambda I)^{-1}$ is unbounded, so there exists a sequence $(y_n)$ with $\|y_n\| = 1$ and $\|\xi_n\| \ge n$, where we set $\xi_n = (T - \lambda I)^{-1} y_n$; note that $\xi_n \in D(T)$. Therefore

$$(T - \lambda I)\frac{\xi_n}{\|\xi_n\|} = \frac{y_n}{\|\xi_n\|} \to 0 \quad\text{as } n \to \infty,$$

from which the result follows with $x_n := \xi_n/\|\xi_n\|$.


As a consequence of this we can show that the spectrum is real and closed.

Theorem 25.11 If $T : D(T) \to H$ is self-adjoint, then its spectrum is real and closed.

Proof If $\lambda \in \sigma(T)$, then we can find $(x_n) \in D(T)$ such that $\|x_n\| = 1$ and $Tx_n - \lambda x_n \to 0$ and hence
\[ (Tx_n, x_n) - \lambda\|x_n\|^2 \to 0; \tag{25.7} \]
taking the complex conjugate and using the fact that $T$ is self-adjoint,
\[ (x_n, Tx_n) - \bar\lambda\|x_n\|^2 = (Tx_n, x_n) - \bar\lambda\|x_n\|^2 \to 0. \]
Combining this with (25.7) it follows that
\[ (\lambda - \bar\lambda) = (\lambda - \bar\lambda)\|x_n\|^2 \to 0 \quad \text{as } n \to \infty, \]
and hence $\lambda = \bar\lambda$.

The spectrum is closed since the resolvent set is open: if $\lambda \in \rho(T)$, then $T - \lambda I$ has a bounded inverse so
\[ \|(T - \lambda I)^{-1}x\| \le C\|x\| \quad \Rightarrow \quad \|Ty - \lambda y\| \ge \frac{1}{C}\|y\|, \]
where $y = (T - \lambda I)^{-1}x$. If $|\mu - \lambda| < 1/2C$, then, since $\|Ty - \mu y\| \ge \|Ty - \lambda y\| - |\mu - \lambda|\,\|y\|$,
\[ \|Ty - \mu y\| \ge \frac{1}{2C}\|y\|. \]
The above inequality shows that $T - \mu I$ is injective, and gives a bound on its inverse. Lemma 25.9 shows that if $T$ is self-adjoint and $T - \mu I$ is injective, then the range of $T - \mu I$ is dense; it follows that $\mu \in \rho(T)$.
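Corollary 25.10 can be checked by hand in a simple case. The sketch below is an illustration only: the diagonal operator $(Tx)_k = (1 - 1/k)x_k$ on $\ell^2$ and the helper names `apply_T`, `unit_vector`, and `residual` are assumptions introduced here, not objects from the text. This $T$ is bounded and self-adjoint, and $\lambda = 1$ is not an eigenvalue (no diagonal entry equals 1), yet the unit vectors $e^{(n)}$ satisfy $\|Te^{(n)} - e^{(n)}\| = 1/n \to 0$, so $\lambda = 1$ is an approximate eigenvalue in the sense of the corollary. Working with finitely many coordinates loses nothing here, since $e^{(n)}$ has a single non-zero entry.

```python
import math

def apply_T(x):
    """Hypothetical diagonal operator (Tx)_k = (1 - 1/k) x_k, k counted from 1."""
    return [(1.0 - 1.0 / k) * x_k for k, x_k in enumerate(x, start=1)]

def norm(x):
    """Euclidean (l^2) norm of a finitely supported sequence."""
    return math.sqrt(sum(abs(c) ** 2 for c in x))

def unit_vector(n, dim):
    """The n-th coordinate vector e^(n) (1-based) in dimension dim."""
    v = [0.0] * dim
    v[n - 1] = 1.0
    return v

def residual(n, lam=1.0):
    """Return ||T e^(n) - lam e^(n)||; n coordinates suffice for e^(n)."""
    x = unit_vector(n, n)
    Tx = apply_T(x)
    return norm([a - lam * b for a, b in zip(Tx, x)])

# Each e^(n) has norm 1, while the residuals shrink like 1/n.
residuals = {n: residual(n) for n in (1, 10, 100, 1000)}
for n, r in residuals.items():
    print(n, r)
```

The residuals decay like $1/n$ although each $e^{(n)}$ has norm one, which is exactly the behaviour that the corollary guarantees for a point of the spectrum that is not an eigenvalue.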

26 Reflexive Spaces

We have seen that $(\ell^q)^* \equiv \ell^p$ for $1 \le q < \infty$ and $(p, q)$ conjugate [i.e. $p^{-1} + q^{-1} = 1$]. It follows for $1 < q < \infty$ that if we take the ‘second dual’ (i.e. the dual of the dual), then $[(\ell^q)^*]^* \equiv (\ell^p)^* \equiv \ell^q$: we get back to where we started by taking the dual twice, so $(\ell^q)^{**} \equiv \ell^q$. We pursue this idea further in this chapter, introducing the notion of a ‘reflexive space’. However, we emphasise that being reflexive is more than just having $X^{**} \equiv X$. James (1951) constructed a Banach space $X$ that satisfies $X^{**} \equiv X$ but is not reflexive.

26.1 The Second Dual

Since $X^*$ is always a Banach space (Theorem 11.11), there is nothing to stop us from considering the dual of $X^*$, i.e. the set of bounded linear functionals from $X^*$ into $\mathbb{K}$. We write $X^{**}$ for this space (which we would otherwise denote by $(X^*)^*$), so
\[ X^{**} := B(X^*; \mathbb{K}). \tag{26.1} \]
There is a canonical way¹ of associating any element $x \in X$ with an element $x^{**} \in X^{**}$, by setting
\[ x^{**}(f) := f(x) \quad \text{for each } f \in X^*. \]
The following lemma shows that $x^{**}$ is indeed an element of $X^{**}$, and that the mapping $x \mapsto x^{**}$ is a linear isometry; in this way ‘$X \subseteq X^{**}$’ for any normed space $X$.

¹ The process is canonical, but the $x^{**}$ notation adopted here is not. The notation $J$ for the corresponding map from $X$ into $X^{**}$ is more common, but is not universal.


Lemma 26.1 For any normed space $X$ we can isometrically map $X$ onto a subspace of $X^{**}$ via the canonical linear mapping $x \mapsto x^{**}$, where $x^{**}$ is the element of $X^{**}$ defined by setting
\[ x^{**}(f) = f(x) \quad \text{for each } f \in X^*. \]
We denote this mapping by $J : X \to X^{**}$.

Proof We have to show that for any $x \in X$, $x^{**}$ defines a linear functional on $X^*$ (i.e. an element of $X^{**}$) with the same norm as $x$. Given $x \in X$ we set
\[ x^{**}(f) := f(x) \quad \text{for every } f \in X^*. \]
Then, since
\[ |x^{**}(f)| = |f(x)| \le \|f\|_{X^*}\|x\|_X, \]
it certainly follows that $x^{**} \in X^{**}$ and that $\|x^{**}\|_{X^{**}} \le \|x\|_X$. If we take the ‘support functional at $x$’ from Corollary 20.2, i.e. $f \in X^*$ for which $\|f\| = 1$ and $f(x) = \|x\|$, then we have
\[ |x^{**}(f)| = |f(x)| = \|x\|_X = \|x\|_X \|f\|_{X^*} \]
(since $\|f\|_{X^*} = 1$) and it follows that $\|x^{**}\|_{X^{**}} \ge \|x\|_X$, which yields the required equality of norms.

In general $J$ does not map $X$ onto $X^{**}$, but only onto a subspace of $X^{**}$.

Lemma 26.2 If $X$ is a Banach space, then $J(X)$ is a closed subspace of $X^{**}$.

Proof If $(F_n) \in J(X)$ with $F_n \to F$ in $X^{**}$, then $(F_n)$ must be Cauchy in $X^{**}$. Since there exist $x_n \in X$ such that $F_n = x_n^{**}$ and the map $J$ is a linear isometry, we have
\[ \|x_n - x_m\|_X = \|F_n - F_m\|_{X^{**}}, \]
so $(x_n)$ is Cauchy in $X$. It follows that there exists $x \in X$ such that $x_n \to x$ in $X$, and so
\[ \|F_n - x^{**}\|_{X^{**}} = \|x_n - x\|_X \to 0 \quad \text{as } n \to \infty. \]
By uniqueness of limits it follows that $F = x^{**}$, so $J(X)$ is closed.

When $J$ does map $X$ onto $X^{**}$, we say that $X$ is reflexive. It follows immediately in this case that $J$ is an isometric isomorphism and so $X \equiv X^{**}$; but note that this is a consequence of $X$ being reflexive, and not the definition of reflexivity.


Definition 26.3 A Banach space X is reflexive if J : X → X ∗∗ is onto, i.e. if every F ∈ X ∗∗ can be written as x ∗∗ for some x ∈ X .
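The canonical map $J$ is simply ‘evaluation at $x$’, which can be made concrete in a finite-dimensional toy setting. The following minimal sketch assumes $X = \mathbb{R}^n$ with the Euclidean norm, so that every functional is $f(x) = \sum_j a_j x_j$ and $\|f\|_{X^*}$ is the Euclidean norm of $a$; the names `J`, `functional`, and `support` are illustrative choices, not notation from the text. It checks that $x^{**}(f) = f(x)$ and that the bound $\|x^{**}\| \le \|x\|$ from the proof of Lemma 26.1 is attained at the support functional.

```python
import math

def J(x):
    """Canonical embedding: send x to the evaluation functional f -> f(x)."""
    return lambda f: f(x)

def functional(a):
    """Linear functional on R^n represented by the vector a: f(x) = sum a_j x_j."""
    return lambda x: sum(a_j * x_j for a_j, x_j in zip(a, x))

def norm(x):
    """Euclidean norm on R^n."""
    return math.sqrt(sum(c * c for c in x))

x = [3.0, 4.0]
f = functional([1.0, -2.0])
x_star_star = J(x)

# By definition x**(f) = f(x); here f(x) = 3 - 8 = -5.
value = x_star_star(f)

# Support functional at x: represented by x/||x||, so it has norm 1
# and takes the value ||x|| at x.
support = functional([c / norm(x) for c in x])
attained = x_star_star(support)

print(value, attained, norm(x))
```

In $\mathbb{R}^n$ every element of $X^{**}$ arises in this way, so finite-dimensional spaces are reflexive; the interest of Definition 26.3 lies in infinite dimensions, where surjectivity of $J$ can fail.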

26.2 Some Examples of Reflexive Spaces

We now show that all Hilbert spaces are reflexive, as are the $\ell^p$ and $L^p$ spaces when $1 < p < \infty$. We start with Hilbert spaces.

Proposition 26.4 All Hilbert spaces are reflexive.

Before we begin the proof, recall that we know from the Riesz Representation Theorem (Theorem 12.4) that the map $R : H \to H^*$ defined by setting
\[ R(x)(y) := (y, x) \quad \text{for every } x, y \in H \]
is a conjugate-linear isometric isomorphism. Since $R$ is surjective, given any $f \in H^*$ we can write
\[ f(y) = (y, R^{-1}f). \tag{26.2} \]
(Here $R^{-1}f$ is the element $z \in H$ such that $f(y) = (y, z)$ for all $y \in H$; the existence of such a $z$ is the main element of the Riesz Representation Theorem.)

Proof Given $F \in H^{**}$ we need to find $x \in H$ such that $F = x^{**}$, i.e. such that for every $f \in H^*$ we have $F(f) = f(x)$ (this is because $x^{**}(f)$ is defined to be $f(x)$).

Now, $F \circ R : H \to \mathbb{K}$ is a bounded conjugate-linear map, so the map $\overline{F \circ R} : H \to \mathbb{K}$, defined by setting
\[ \overline{F \circ R}(y) := \overline{(F \circ R)(y)}, \]
is a bounded linear map, i.e. an element of $H^*$. So, using the Riesz Representation Theorem, we can find an element $x \in H$ such that
\[ (\overline{F \circ R})(y) = (y, x) \tag{26.3} \]
for every $y \in H$. Since $R$ is a bijection, for any $f \in H^*$ we can choose $y = R^{-1}f$, and then
\[ F(f) = (F \circ R)(R^{-1}f) = \overline{(R^{-1}f, x)} = (x, R^{-1}f) = f(x), \]


where we use (26.3) for the first equality and (26.2) for the final equality; thus $F(f) = f(x)$ as required.

We now use a similar argument to show that $\ell^p$ is reflexive provided that $1 < p < \infty$.

Proposition 26.5 The sequence space $\ell^p(\mathbb{K})$ is reflexive if $1 < p < \infty$.

In the proof we use the notation
\[ \langle x, y \rangle := \sum_{j=1}^{\infty} x_j y_j; \]
although this agrees with the $L^2$ inner product when $\mathbb{K} = \mathbb{R}$, it lacks the complex conjugate that we would use in the complex case. We know from Theorem 18.5 that the map $T_q : \ell^p \to (\ell^q)^*$ defined by setting
\[ [T_q(x)](y) = \langle x, y \rangle \quad \text{for } y \in \ell^q, \]
is a linear isometric isomorphism. So, given any $f \in (\ell^q)^*$ we can write
\[ f(y) = \langle T_q^{-1}(f), y \rangle \quad \text{for all } y \in \ell^q \tag{26.4} \]
(and similarly with $p$ and $q$ switched).

Proof Since $(\ell^p)^* \equiv \ell^q$, we also have $(\ell^p)^{**} \equiv (\ell^q)^* \equiv \ell^p$; to prove reflexivity we have to be careful about the maps involved: given any $F \in (\ell^p)^{**}$ we need to find $x \in \ell^p$ such that $F(f) = f(x)$ for every $f \in (\ell^p)^*$.

We start by relating $F \in (\ell^p)^{**}$ to an element of $(\ell^q)^*$. To do this, note that $F \circ T_p : \ell^q \to \mathbb{K}$ is both linear and bounded; so $F \circ T_p \in (\ell^q)^*$. Now we use the fact that $(\ell^q)^* \equiv \ell^p$ via (26.4) to find $x \in \ell^p$ such that
\[ (F \circ T_p)(y) = \langle x, y \rangle \quad \text{for all } y \in \ell^q \]
(it is probably unhelpful to write $x = T_q^{-1}(F \circ T_p)$). Since $T_p : \ell^q \to (\ell^p)^*$ is a bijection, for any $f \in (\ell^p)^*$ we can choose $y = T_p^{-1}f$, and then (using (26.4) once again but with $p$ and $q$ swapped)
\[ F(f) = F \circ T_p(T_p^{-1}(f)) = \langle x, T_p^{-1}(f) \rangle = \langle T_p^{-1}(f), x \rangle = f(x). \]
This shows that $F = x^{**}$, and so $\ell^p$ is reflexive.

A very similar proof, involving little more than a slight change of notation, shows that $L^p$ is reflexive for $1 < p < \infty$. We write $q$ for the conjugate


exponent to $p$, and denote by $T_p$ the isometric isomorphism from $L^q$ onto $(L^p)^*$ given by
\[ (T_p g)(f) := \int_\Omega f g \, dx \quad \text{for every } f \in L^p. \]

Proposition 26.6 The space $L^p(\Omega)$ is reflexive for $1 < p < \infty$.

Proof Given any $F \in (L^p)^{**}$ we need to find $g \in L^p$ such that
\[ F(\phi) = \phi(g) \quad \text{for every } \phi \in (L^p)^*. \]
First, note that $F \circ T_p : L^q \to \mathbb{K}$ is a bounded linear functional, so defines an element $\psi \in (L^q)^*$; we can therefore write $F \circ T_p = T_q g$ for some $g \in L^p$. Now if we take any $\phi \in (L^p)^*$ we have $\phi = T_p f$ for $f = T_p^{-1}\phi \in L^q$, and then
\[ F(\phi) = F \circ T_p(T_p^{-1}\phi) = (T_q g) f = \int_\Omega g(x) f(x) \, dx = (T_p f) g = \phi(g). \]
It follows that $g = T_q^{-1}(F \circ T_p)$ is the required element of $L^p$.

We will see shortly that $\ell^1$, $\ell^\infty$, $L^1$, and $L^\infty$ are not reflexive. For now we give a quick proof that $C([-1, 1])$ is not reflexive; it relies on the fact that $C([-1, 1])$ is separable but $C([-1, 1])^*$ is not.

Lemma 26.7 The space $C([-1, 1])$ (with the usual supremum norm) is not reflexive.

Proof Let $X = C([-1, 1])$. We know that $X$ is separable from Corollary 6.4. If $X$ were reflexive we would as a consequence have $X^{**} \equiv X$, which would imply that $X^{**}$ was separable (by Exercise 3.13). Lemma 20.5 would then imply that $X^*$ was separable; but this contradicts the result of Exercise 18.5 that $X^*$ is not separable.

26.3 $X$ Is Reflexive If and Only If $X^*$ Is Reflexive

The following result is very useful; its proof is a good exercise in using the definition of reflexivity.


Theorem 26.8 Let $X$ be a Banach space. Then $X$ is reflexive if and only if $X^*$ is reflexive.

In the second part of the proof we use implicitly the fact that $(X^*)^{**} = (X^{**})^*$. This equality is clear when we note from (26.1) that
\[ (X^*)^{**} = B((X^*)^*; \mathbb{K}) = B(X^{**}; \mathbb{K}) = (X^{**})^*. \]
Before we begin the proof a brief remark on notation might be helpful: we will use $x, y$ for elements of $X$, $f, g$ for elements of $X^*$, $F, G$ for elements of $X^{**}$, and $\Phi, \Psi$ for elements of $X^{***}$.

Proof Suppose first that $X$ is reflexive; we want to show that $X^*$ is reflexive, i.e. that for any $\Phi \in (X^*)^{**}$ we can find an $f \in X^*$ such that $f^{**} = \Phi$, i.e. such that
\[ \Phi(F) = F(f) \quad \text{for every } F \in X^{**}. \]
This actually tells us what $f$ should be. Since any $F \in X^{**}$ can be written as $x^{**}$ for some $x \in X$, we require
\[ \Phi(x^{**}) = x^{**}(f) \quad \text{for every } x \in X. \]
But since, by definition, $x^{**}(f) = f(x)$, this says that we must have
\[ f(x) = \Phi(x^{**}) \quad \text{for every } x \in X, \]
and we now use this as the definition of $f$. We just have to check that $f$ really is an element of $X^*$, i.e. is a bounded linear map from $X$ into $\mathbb{K}$. But this follows immediately, as it is the composition of $J$, a bounded linear map from $X$ into $X^{**}$, with $\Phi$, which is a bounded linear map from $X^{**}$ into $\mathbb{K}$.

For the converse, suppose that $X^*$ is reflexive but $X$ is not, i.e. there is an element $F \in X^{**}$ such that $F \ne x^{**}$ for any $x \in X$. Then the set $J(X) = \{x^{**} : x \in X\}$ is a proper closed linear subspace of $X^{**}$ (see Lemma 26.2), and hence by Proposition 20.4 (existence of a distance functional) there is some non-zero $\Phi \in (X^{**})^*$ such that $\Phi = 0$ on $J(X)$, i.e.
\[ \Phi(x^{**}) = 0 \quad \text{for all } x \in X. \]
Since $(X^{**})^* = (X^*)^{**}$ and $X^*$ is reflexive, we know that $\Phi = f^{**}$ for some $f \in X^*$, and so if $x \in X$, we have
\[ f(x) = x^{**}(f) = f^{**}(x^{**}) = \Phi(x^{**}) = 0. \]


But this means that $f = 0$, which in turn implies that $\Phi = 0$, a contradiction.

Since any reflexive space satisfies $X^{**} \equiv X$ (but not vice versa, as commented above), and we know that
\[ (c_0)^* \equiv \ell^1 \quad \text{and} \quad (\ell^1)^* \equiv \ell^\infty, \]
it follows that the space $c_0$ is not reflexive. (We know that $c_0 \not\equiv \ell^\infty$ because $c_0$ is separable and $\ell^\infty$ is not.)

We would like to say now that $\ell^1$ cannot be reflexive (because it is the dual of $c_0$), and then that $\ell^\infty$ is not reflexive (because it is the dual of $\ell^1$); but we need to be a little careful, since the dual of $c_0$ is not $\ell^1$ but a space isometrically isomorphic to $\ell^1$.

Lemma 26.9 If $X$ is reflexive and $X \equiv Y$, then $Y$ is reflexive.

(In fact the hypothesis can be weakened to $X \simeq Y$; see e.g. Megginson, 1998.)

Proof Suppose that $\phi : X \to Y$ is a linear isometric isomorphism. Then the Banach adjoint of $\phi$, $\phi^\times : Y^* \to X^*$ defined by setting $\phi^\times(g) = g \circ \phi$, is a linear isometric isomorphism (see Lemma 20.7). Applying the same argument again, the map $\phi^{\times\times} : X^{**} \to Y^{**}$ defined by setting $\phi^{\times\times}(F) = F \circ \phi^\times$ is again a linear isometric isomorphism.

Take $G \in Y^{**}$; then $G = \phi^{\times\times}(F) = F \circ \phi^\times$ for some $F \in X^{**}$, so for any $g \in Y^*$ we have
\[ G(g) = F \circ \phi^\times(g) = F(g \circ \phi) = (g \circ \phi)(x) \]
for some $x \in X$ (since $X$ is reflexive); but
\[ (g \circ \phi)(x) = g(\phi(x)) = g(y), \]
where $y := \phi(x) \in Y$. So $Y$ is reflexive.

We can now say with confidence that $\ell^1$ is not reflexive (since it is isometrically isomorphic to $(c_0)^*$, which is not reflexive) and hence $\ell^\infty$ is not reflexive (since it is isometrically isomorphic to $(\ell^1)^*$, which is not reflexive).

The following result, whose argument follows similar lines to that used to prove Theorem 26.8, will be useful later.

Lemma 26.10 Any closed subspace $Y$ of a reflexive Banach space $X$ is reflexive.


Proof Take $f \in X^*$ and let $f_Y$ denote the restriction of $f$ to $Y$, so that $f_Y \in Y^*$. Because of the Hahn–Banach Theorem, any element of $Y^*$ can be obtained as such a restriction. To show that $Y$ is reflexive we need to show that for any $\Phi \in Y^{**}$ there exists a $y \in Y$ such that
\[ \Phi(f_Y) = y^{**}(f_Y) \quad \text{for every } f \in X^*. \]
First define an element $\hat\Phi : X^* \to \mathbb{R}$ by setting
\[ \hat\Phi(f) = \Phi(f_Y), \]
and then
\[ |\hat\Phi(f)| \le \|\Phi\|\,\|f_Y\| \le \|\Phi\|\,\|f\| \quad \text{for any } f \in X^*, \]
so $\hat\Phi \in X^{**}$. Now we can use the fact that $X$ is reflexive to find an $x \in X$ such that
\[ \hat\Phi = x^{**}. \]
We only need now show that $x \in Y$. Suppose that $x \notin Y$. Then the distance functional from Proposition 20.4 provides an $f \in X^*$ such that $f(x) = 1$ and $f(y) = 0$ for every $y \in Y$, i.e. such that $f_Y = 0$. Then
\[ f(x) = x^{**}(f) = \hat\Phi(f) = \Phi(f_Y) = 0, \]
a contradiction.

Exercises

26.1 Suppose that $X$ and $Y$ are Banach spaces, and that $T_X : X \to Y^*$ and $T_Y : Y \to X^*$ are both isometric isomorphisms (so that $X^* \equiv Y$ and $Y^* \equiv X$). Show that if
\[ [T_X x](y) = [T_Y y](x) \quad \text{for all } x \in X,\ y \in Y, \]
then $X$ is reflexive. (The proof, generalising the argument we used to prove reflexivity of the $\ell^p$ and $L^p$ spaces, gives some indication why $X^{**} \equiv X$ alone is not sufficient for $X$ to be reflexive.)

26.2 Suppose that $U$ is a subset of a Banach space $X$. Show that $U$ is bounded if and only if for every $f \in X^*$ the set
\[ f(U) = \{ f(u) : u \in U \} \]
is bounded in $\mathbb{R}$.


(Use the Principle of Uniform Boundedness on an appropriately chosen set of elements of $X^{**}$.)

26.3 If $T : X \to Y$ is a linear map between Banach spaces and $\phi \circ T$ is bounded for every $\phi \in Y^*$ show that $T$ is bounded. (Prove the contrapositive.)

26.4 Suppose that $X$ is a reflexive real Banach space and $\xi : [0, T] \to X$ is such that the real-valued function $f(\xi(\cdot)) : [0, T] \to \mathbb{R}$ is integrable for every $f \in X^*$. Show that if $\int_0^T \|\xi(t)\|_X \, dt < \infty$, then there exists a unique $y \in X$ such that
\[ f(y) = \int_0^T f(\xi(t)) \, dt \quad \text{for every } f \in X^*. \]
If we define $\int_0^T \xi(t) \, dt := y$ show that
\[ \left\| \int_0^T \xi(t) \, dt \right\|_X \le \int_0^T \|\xi(t)\|_X \, dt. \]

26.5 Assuming that $L^p$ is reflexive, show that $(L^p)^* \equiv L^q$ as follows. We showed in Theorem 18.8 that the map $T : L^q \to (L^p)^*$ defined by setting
\[ [T(g)](f) = \int_\Omega f g \, dx \quad \text{for each } f \in L^p \]
is a linear isometry. Suppose that $T$ is not onto and obtain a contradiction, using the fact that $T(L^q)$ is a closed subset of $(L^p)^*$ along with a variant of the proof of the second part of Theorem 26.8. (This is not necessarily a circular argument, since $L^p$ is uniformly convex (see Exercise 10.6) and the Milman–Pettis Theorem guarantees that any uniformly convex space is reflexive; see e.g. Theorem 5.2.15 in Megginson, 1998.) (Lax, 2002)

27 Weak and Weak-∗ Convergence

We have seen that in any infinite-dimensional space the closed unit ball is not compact (and that this characterises infinite-dimensional spaces). However, in this chapter we will prove that in any reflexive Banach space the closed unit ball is weakly sequentially compact, which is often sufficient in applications. We first introduce the notion of weak convergence: the key idea is to define a convergence based on the action of linear functionals.

27.1 Weak Convergence

The definition of convergence that we have used up until now ($x_n \to x$ if $\|x_n - x\| \to 0$) we will here call ‘strong convergence’ to distinguish it from the notion of weak convergence that we now introduce.

Definition 27.1 We say that a sequence $(x_n) \in X$ converges weakly to $x \in X$, and write $x_n \rightharpoonup x$, if
\[ f(x_n) \to f(x) \quad \text{for all } f \in X^*. \]

Note that in a Hilbert space, where every linear functional is of the form $x \mapsto (x, y)$ for some $y \in H$, $x_n \rightharpoonup x$ if
\[ (x_n, y) \to (x, y) \quad \text{for all } y \in H. \]
This observation allows us to provide an example of a sequence that converges weakly but does not converge strongly. Pick any countable orthonormal sequence $(e_j)_{j=1}^\infty$ in $H$; then for any $y \in H$ Bessel’s inequality (Lemma 9.11)
\[ \sum_{j=1}^{\infty} |(y, e_j)|^2 \le \|y\|^2 \]


shows that the sum converges; it follows that $(y, e_j) \to 0$ as $j \to \infty$ for any $y \in H$, and hence that $e_j \rightharpoonup 0$. But the sequence $(e_j)$ does not converge (any two elements are a distance $\sqrt{2}$ apart).

Lemma 27.2 Weak convergence has the following properties.
(i) Strong convergence implies weak convergence;
(ii) in a finite-dimensional normed space weak convergence and strong convergence are equivalent;
(iii) weak limits are unique;
(iv) weakly convergent sequences are bounded; and
(v) if $x_n \rightharpoonup x$, then
\[ \|x\| \le \liminf_{n \to \infty} \|x_n\|. \tag{27.1} \]

Proof (i) If $x_n \to x$, then for any $f \in X^*$
\[ |f(x_n) - f(x)| \le \|f\|_{X^*} \|x_n - x\|_X \to 0 \quad \text{as } n \to \infty, \]
so $f(x_n) \to f(x)$, and hence $x_n \rightharpoonup x$.

(ii) Due to part (i) we need only show that if $V$ is a finite-dimensional normed space, then weak convergence in $V$ implies strong convergence in $V$. If $\{e_1, \ldots, e_n\}$ is a basis for $V$, then for each $i = 1, \ldots, n$ the map
\[ x = \sum_{j=1}^{n} x_j e_j \mapsto x_i \]
is an element of $V^*$, so if $x^{(k)} \rightharpoonup x$ it follows that $x_j^{(k)} \to x_j$ for each $j = 1, \ldots, n$, and so
\[ x^{(k)} = \sum_{j=1}^{n} x_j^{(k)} e_j \to \sum_{j=1}^{n} x_j e_j = x. \]

(iii) Suppose that $x_n \rightharpoonup x$ and $x_n \rightharpoonup y$. Then for any $f \in X^*$,
\[ f(x) = \lim_{n \to \infty} f(x_n) = f(y), \]
so, by Lemma 20.3 ($X^*$ separates points in $X$), $x = y$.

(iv) Since $f(x_n)$ converges, it follows that $f(x_n)$ is a bounded sequence (in $\mathbb{K}$) for every $f \in X^*$. If we consider the sequence $(x_n^{**}) \in X^{**}$, then, since $x_n^{**}(f) = f(x_n)$, it follows that $(x_n^{**}(f))_n$ is bounded in $\mathbb{K}$ for every $f \in X^*$. We can now use the Principle of Uniform Boundedness (Theorem 22.3) to deduce that $(x_n^{**})$ is


bounded in $X^{**}$. Since $\|x^{**}\|_{X^{**}} = \|x\|_X$ (Lemma 26.1), it follows that $(x_n)$ is bounded in $X$.

(v) Choose $f \in X^*$ with $\|f\|_{X^*} = 1$ such that $f(x) = \|x\|$ (the support functional at $x$ whose existence is guaranteed in Lemma 20.2). Then
\[ \|x\| = f(x) = \lim_{n \to \infty} f(x_n), \]
so
\[ \|x\| \le \liminf_{n \to \infty} |f(x_n)| \le \liminf_{n \to \infty} \|f\|_{X^*} \|x_n\|_X; \]
the result follows since $\|f\|_{X^*} = 1$.

There are two situations in which we can easily convert weak to strong convergence. The first is in a Hilbert space: if a sequence $(x_n)$ converges weakly to $x$ and we also know that the norms converge, $\|x_n\| \to \|x\|$, then this implies strong convergence. In fact the same result is true in any uniformly convex Banach space (see Exercise 27.4), but the proof in a Hilbert space is particularly simple.

Lemma 27.3 Let $H$ be a Hilbert space. If $(x_n) \in H$ with $x_n \rightharpoonup x$ and $\|x_n\| \to \|x\|$, then $x_n \to x$.

Proof Observe that
\[ \|x - x_n\|^2 = (x - x_n, x - x_n) = \|x\|^2 - (x, x_n) - (x_n, x) + \|x_n\|^2. \]
Since $x_n \rightharpoonup x$, we have $(x_n, x) \to (x, x) = \|x\|^2$ and $\|x_n\|^2 \to \|x\|^2$ by assumption; so $\|x - x_n\|^2 \to 0$ as $n \to \infty$.

Another way to obtain a strongly convergent sequence starting with a weakly convergent one is to apply a compact operator.

Lemma 27.4 Suppose that $T : X \to Y$ is a compact linear operator. If $(x_n) \in X$ with $x_n \rightharpoonup x$ in $X$, then $Tx_n \to Tx$ in $Y$.

Proof We first show that $Tx_n \rightharpoonup Tx$ in $Y$; indeed, if $f \in Y^*$, then $f \circ T$ is an element of $X^*$, so that $x_n \rightharpoonup x$ implies that $f(Tx_n) \to f(Tx)$.

Now, suppose that $Tx_n \not\to Tx$; then there is an $\varepsilon > 0$ and a subsequence $(x_{n_j})_j$ such that
\[ \|Tx_{n_j} - Tx\| > \varepsilon \quad \text{for every } j. \tag{27.2} \]


Since $(x_{n_j})$ converges weakly, it is a bounded sequence in $X$ (by part (iv) of Lemma 27.2); since $T$ is compact it follows that $(Tx_{n_j})$ has a subsequence $(Tx_{n'_j})_j$ that converges to some $z \in Y$. Since strong convergence implies weak convergence (Lemma 27.2 (i)), we also have $Tx_{n'_j} \rightharpoonup z$; but weak limits are unique (part (iii) of Lemma 27.2) and we already know that $Tx_{n'_j} \rightharpoonup Tx$ (since $x_{n'_j}$ is a subsequence of $x_n$ and we know that $Tx_n \rightharpoonup Tx$), so we must have $z = Tx$ and
\[ \|Tx_{n'_j} - Tx\| \to 0 \quad \text{as } j \to \infty. \]
Since $x_{n'_j}$ is a subsequence of $x_{n_j}$ the preceding equation contradicts (27.2), and therefore $Tx_n \to Tx$ as claimed.
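The orthonormal-sequence example and Lemma 27.4 can be seen side by side numerically. The sketch below is a hedged illustration, not part of the text: it works in a finite truncation of $\ell^2$, takes $y_k = 1/k$ as a sample square-summable vector, and uses the diagonal operator $(Tx)_k = x_k/k$ as an assumed example of a compact operator; the names `unit_vector`, `inner`, and `apply_diagonal` are hypothetical helpers. The pairings $(e_j, y)$ tend to zero while $\|e_j\| = 1$, so $e_j \rightharpoonup 0$ without converging strongly, whereas $\|Te_j\| = 1/j \to 0$, as Lemma 27.4 predicts.

```python
import math

def unit_vector(j, dim):
    """Return the coordinate vector e_j (0-based index j) in R^dim."""
    v = [0.0] * dim
    v[j] = 1.0
    return v

def inner(x, y):
    """Real inner product of two finite sequences."""
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(inner(x, x))

def apply_diagonal(x):
    """Assumed compact operator (Tx)_k = x_k / (k + 1), with k counted from 0."""
    return [x_k / (k + 1) for k, x_k in enumerate(x)]

DIM = 1000
y = [1.0 / (k + 1) for k in range(DIM)]  # truncation of a square-summable sequence
indices = (0, 9, 99, 999)

pairings = [inner(unit_vector(j, DIM), y) for j in indices]      # tend to 0
norms = [norm(unit_vector(j, DIM)) for j in indices]             # all equal to 1
image_norms = [norm(apply_diagonal(unit_vector(j, DIM))) for j in indices]  # tend to 0

# Distinct orthonormal vectors stay sqrt(2) apart, so (e_j) is not Cauchy.
gap = norm([a - b for a, b in zip(unit_vector(0, DIM), unit_vector(1, DIM))])

print(pairings, norms, image_norms, gap)
```

The same computation also illustrates that the inequality in (27.1) can be strict: the weak limit has norm $0$, while $\liminf \|e_j\| = 1$.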

27.2 Examples of Weak Convergence in Various Spaces

We now look at some examples of weak convergence in particular spaces. We characterise weak convergence in $\ell^p$ for $1 < p < \infty$, show that weak and strong convergence in $\ell^1$ coincide, and make some observations about weak convergence in $C([a, b])$ and how it relates to other notions of convergence.

27.2.1 Weak Convergence in $\ell^p$, $1 < p < \infty$

If we take $1 \le p < \infty$, then we know from Theorem 18.5 that any element of $(\ell^p)^*$ can be represented as $\langle \cdot, y \rangle$ for some $y \in \ell^q$, where $p$ and $q$ are conjugate and we use $\langle \cdot, \cdot \rangle$ to denote the pairing
\[ \langle x, y \rangle = \sum_{j=1}^{\infty} x_j y_j \]
(whenever this makes sense). So we have
\[ x^{(n)} \rightharpoonup x \text{ in } \ell^p \quad \Leftrightarrow \quad \langle x^{(n)}, y \rangle \to \langle x, y \rangle \text{ for every } y \in \ell^q. \tag{27.3} \]
For $1 < p < \infty$ there is an even nicer characterisation. (The following result is not true in $\ell^1$ or in $\ell^\infty$: see Exercise 27.6.)

Lemma 27.5 Let $(x^{(n)})_{n=1}^\infty$ be a sequence in $\ell^p$, with $1 < p < \infty$. Then $x^{(n)} \rightharpoonup x$ in $\ell^p$ if and only if
\[ \|x^{(n)}\|_{\ell^p} \text{ is bounded} \quad \text{and} \quad x_k^{(n)} \to x_k \text{ for every } k \in \mathbb{N}. \]


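Proof ($\Rightarrow$) This follows from taking $y = e^{(k)}$ in (27.3), and using the fact that any weakly convergent sequence is bounded (Lemma 27.2 (iv)).

($\Leftarrow$) Suppose that $\|x^{(n)}\|_{\ell^p} \le M$; we first show that $\|x\|_{\ell^p} \le M$. For any $k$ and any $\varepsilon > 0$ there exists $N$ such that for every $n \ge N$ we have
\[ \sum_{j=1}^{k} |x_j|^p \le \left( \sum_{j=1}^{k} |x_j^{(n)}|^p \right) + \varepsilon \le M^p + \varepsilon. \]
Since this holds for every $k \in \mathbb{N}$ and $\varepsilon > 0$ is arbitrary, we have $\|x\|_{\ell^p} \le M$ as claimed.

Take any $y \in \ell^q$; then, since $y = \lim_{k \to \infty} \sum_{j=1}^{k} y_j e^{(j)}$, given any $\varepsilon > 0$ there exists $k$ such that
\[ \left\| y - \sum_{j=1}^{k} y_j e^{(j)} \right\|_{\ell^q} < \frac{\varepsilon}{4M}; \]
then
\begin{align*}
|\langle x^{(n)} - x, y \rangle| &= \left| \left\langle x^{(n)} - x, \sum_{j=1}^{k} y_j e^{(j)} \right\rangle + \left\langle x^{(n)} - x, y - \sum_{j=1}^{k} y_j e^{(j)} \right\rangle \right| \\
&\le \left| \left\langle x^{(n)} - x, \sum_{j=1}^{k} y_j e^{(j)} \right\rangle \right| + \|x^{(n)} - x\|_{\ell^p} \left\| y - \sum_{j=1}^{k} y_j e^{(j)} \right\|_{\ell^q} \\
&\le \sum_{j=1}^{k} |y_j| \, |\langle x^{(n)} - x, e^{(j)} \rangle| + 2M \frac{\varepsilon}{4M} \\
&= \sum_{j=1}^{k} |y_j| \, |\langle x^{(n)} - x, e^{(j)} \rangle| + \frac{\varepsilon}{2}.
\end{align*}
Since $\langle x^{(n)}, e^{(j)} \rangle \to \langle x, e^{(j)} \rangle$ for each $j$, it follows that $x^{(n)} \rightharpoonup x$.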

27.2.2 Weak Convergence in $\ell^1$: Schur’s Theorem

In $\ell^1$ weak convergence is equivalent to strong convergence.

Theorem 27.6 (Schur’s Theorem) If $x^{(n)} \rightharpoonup x$ in $\ell^1$, then $x^{(n)} \to x$ in $\ell^1$.

Proof By subtracting $x$ from $x^{(n)}$, it suffices to show that if $x^{(n)} \rightharpoonup 0$, then $x^{(n)} \to 0$.


Suppose that this conclusion does not hold: then $\|x^{(n)}\|_{\ell^1} \not\to 0$, so we can find $\varepsilon > 0$ and a subsequence (which we relabel) such that $x^{(n)} \rightharpoonup 0$ and
\[ \|x^{(n)}\|_{\ell^1} = \sum_{k=1}^{\infty} |x_k^{(n)}| \ge \varepsilon \quad \text{for every } n. \tag{27.4} \]
Note that we know, taking the linear map in $(\ell^1)^*$ given by $x \mapsto x_j$, that each component $x_j^{(n)} \to 0$ as $n \to \infty$.

Now we inductively choose $N_j$, $M_j$ in the following way: first choose $N_1$ such that $\sum^{\infty} |x_k^{(1)}|$
$\|f\|_{X^*} - \varepsilon$; then
\[ \|f\|_{X^*} - \varepsilon < f(x) = \lim_{n \to \infty} f_n(x) \le \liminf_{n \to \infty} \|f_n\|_{X^*} \|x\| = \liminf_{n \to \infty} \|f_n\|_{X^*}, \]
which yields the result since $\varepsilon > 0$ is arbitrary.

(v) $f_n \rightharpoonup f$ in $X^*$ means that for every $F \in X^{**}$ we have $F(f_n) \to F(f)$. Given any element $x \in X$ we can consider the corresponding $x^{**} \in X^{**}$. Since $f_n \rightharpoonup f$ in $X^*$, we have
\[ f_n(x) = x^{**}(f_n) \to x^{**}(f) = f(x), \]
and so $f_n \overset{*}{\rightharpoonup} f$.

(vi) When $X$ is reflexive any $F \in X^{**}$ is of the form $x^{**}$ for some $x \in X$. So if $f_n \overset{*}{\rightharpoonup} f$ in $X^*$ we have
\[ F(f_n) = x^{**}(f_n) = f_n(x) \to f(x) = x^{**}(f) = F(f), \]
using the weak-∗ convergence of $f_n$ to $f$ to take the limit. So $f_n \rightharpoonup f$ in $X^*$.


27.5 Two Weak-Compactness Theorems

We now prove two key compactness theorems. We begin with a preparatory lemma.¹

Lemma 27.10 Suppose that $(f_n)$ is a bounded sequence in $X^*$, so that $\|f_n\|_{X^*} \le M$ for some $M > 0$, and suppose that $f_n(a)$ converges as $n \to \infty$ for every $a \in A$, where $A$ is a dense subset of $X$. Then $\lim_{n \to \infty} f_n(x)$ exists for every $x \in X$, and the map $f : X \to \mathbb{R}$ defined by setting
\[ f(x) = \lim_{n \to \infty} f_n(x) \quad \text{for each } x \in X \]
is an element of $X^*$ with $\|f\|_{X^*} \le M$.

Proof We first prove that if $f_n(a)$ converges for every $a \in A$, then $f_n(x)$ converges for every $x \in X$. Given $\varepsilon > 0$ and $x \in X$, first choose $a \in A$ such that
\[ \|x - a\|_X \le \varepsilon/3M. \]
Now, using the fact that $f_n(a)$ converges as $n \to \infty$, choose $n_0$ sufficiently large that $|f_n(a) - f_m(a)| < \varepsilon/3$ for all $n, m \ge n_0$. Then for all $n, m \ge n_0$ we have
\begin{align*}
|f_n(x) - f_m(x)| &\le |f_n(x) - f_n(a)| + |f_n(a) - f_m(a)| + |f_m(a) - f_m(x)| \\
&\le \|f_n\|_{X^*} \|x - a\| + \frac{\varepsilon}{3} + \|f_m\|_{X^*} \|a - x\| \\
&\le \varepsilon.
\end{align*}
It follows that $(f_n(x))$ is Cauchy and hence converges.

We now define $f : X \to \mathbb{R}$ by setting
\[ f(x) := \lim_{n \to \infty} f_n(x). \]
Then $f$ is linear since
\[ f(x + \lambda y) = \lim_{n \to \infty} f_n(x + \lambda y) = \lim_{n \to \infty} \left( f_n(x) + \lambda f_n(y) \right) = f(x) + \lambda f(y) \]
and $f$ is bounded since
\[ |f(x)| = \lim_{n \to \infty} |f_n(x)| \le M\|x\|. \]

¹ Note that this is similar to Corollary 22.4, which translated to this particular setting says that if $(f_n) \in X^*$ and $f_n(x)$ converges for every $x \in X$, then setting $f(x) := \lim_{n \to \infty} f_n(x)$ defines an element $f \in X^*$. The lemma here weakens one hypothesis by requiring convergence only on a dense subset of $X$, but the boundedness of the sequence $(f_n)$, which was the key thing to be proved in Corollary 22.4, becomes an assumption here and is used to show that convergence on a dense subset implies convergence for every $x \in X$.


Using this we can prove a weak-∗ compactness result when $X$ is separable.² The theorem as stated here is due to Helly (1912).

Theorem 27.11 Suppose that $X$ is separable. Then any bounded sequence in $X^*$ has a weakly-∗ convergent subsequence.

Proof Let $\{x_k\}$ be a countable dense subset of $X$, and $(f_j)$ a sequence in $X^*$ such that $\|f_j\|_{X^*} \le M$. As in the proof of Theorem 15.3 we will use a diagonal argument to find a subsequence of the $(f_j)$ (which we relabel) such that $f_j(x_k)$ converges for every $k$.

Since $|f_n(x_1)| \le M\|x_1\|$, we can use the Bolzano–Weierstrass Theorem to find a subsequence $f_{n_{1,i}}$ such that $f_{n_{1,i}}(x_1)$ converges. Now, since $|f_{n_{1,i}}(x_2)| \le M\|x_2\|$ we can find a subsequence $f_{n_{2,i}}$ of $f_{n_{1,i}}$ such that $f_{n_{2,i}}(x_2)$ converges; $f_{n_{2,i}}(x_1)$ will still converge since it is a subsequence of $f_{n_{1,i}}(x_1)$ which we have already made converge. We continue in this way to find successive subsequences $f_{n_{m,i}}$ such that
\[ f_{n_{m,i}}(x_k) \text{ converges as } i \to \infty \text{ for every } k = 1, \ldots, m. \]
By taking the diagonal subsequence $f_m^* := f_{n_{m,m}}$ (as in the proof of the Arzelà–Ascoli Theorem) we can ensure that $f_m^*(x_k)$ converges for every $k \in \mathbb{N}$. The proof concludes using Lemma 27.10.

² This result can also be derived as a consequence of the more powerful Banach–Alaoglu Theorem, which guarantees that for any Banach space $X$ (no separability assumption required) the closed unit ball in $X^*$ is compact in the weak-∗ topology. While topological compactness and sequential compactness are not equivalent in general, they coincide in metric spaces, and hence also in ‘metrisable’ topologies, i.e. those that can arise from a metric. When $X$ is separable, it is possible to find a metric that gives rise to the weak-∗ topology on the closed unit ball in $X^*$, and one can then deduce the sequential compactness of our Theorem 27.11 from the topological compactness of the Banach–Alaoglu Theorem. This is a very long way round to obtain Theorem 27.11 compared to our more direct proof, but the more general result is important in the further development of the theory of Banach spaces (see e.g. Megginson, 1998). Details are given in Appendix C.

A consequence of Helly’s Theorem is the following extremely powerful weak-compactness result that holds in any reflexive space. It finally offers a way around the failure of the ‘Bolzano–Weierstrass property’ in infinite-dimensional spaces, i.e. the fact that in general a bounded sequence need not have a strongly convergent subsequence.

The proof seems short, but it builds on many of the results and techniques that have been introduced in the course of this book. The Hahn–Banach Theorem plays a crucial (though hidden) role, since it is this that allowed us to prove the transfer of separability from $X^*$ to $X$ (Lemma 20.5) and the transfer of reflexivity from $X$ to any closed subspace of $X$ (Lemma 26.10). Also note that while the statement concerns only weak convergence, the proof relies on the definition of weak-∗ convergence.

Theorem 27.12 Let $X$ be a reflexive Banach space. Then any bounded sequence in $X$ has a weakly convergent subsequence.

This is equivalent to the statement that the closed unit ball is ‘weakly sequentially compact’. Eberlein (1947) proved that the closed unit ball in $X$ is weakly sequentially compact if and only if $X$ is reflexive; see Exercises 27.9 and 27.10.

Proof Take a bounded sequence $(x_n) \in X$ and let $Y := \operatorname{clin}\{x_1, x_2, \ldots\}$. Then, using Lemma 3.23, $Y$ is separable. Since $Y \subseteq X$ and $X$ is reflexive, so is $Y$ (Lemma 26.10). Therefore $Y^{**} \equiv Y$, which implies that $Y^{**}$ is separable (Exercise 3.13); Lemma 20.5 implies that $Y^*$ is separable.

Now, $x_n^{**}$ is a bounded sequence in $Y^{**}$, so using Theorem 27.11 there is a subsequence $x_{n_k}$ such that $x_{n_k}^{**}$ is weakly-∗ convergent in $Y^{**}$ to some limit $\Phi \in Y^{**}$. Since $Y$ is reflexive, $\Phi = x^{**}$ for some $x \in Y \subseteq X$.

Now for any $f \in X^*$ we have $f_Y := f|_Y \in Y^*$, so
\begin{align*}
\lim_{k \to \infty} f(x_{n_k}) &= \lim_{k \to \infty} f_Y(x_{n_k}) = \lim_{k \to \infty} x_{n_k}^{**}(f_Y) \\
&= x^{**}(f_Y) = f_Y(x) = f(x),
\end{align*}
i.e. $x_{n_k} \rightharpoonup x$.

This theorem can be used to deduce that certain spaces are not reflexive; see e.g. Exercise 27.2.

Here is an example of the use of weak compactness and ‘approximation’ to prove the existence of a fixed point.

Lemma 27.13 Let $X$ be a reflexive Banach space, and $T : X \to X$ a compact linear operator. Suppose that $(x_n)$ is a sequence in $X$ such that there exist $c_1, c_2$ with $0 < c_1 \le c_2$ so that $c_1 \le \|x_n\| \le c_2$ and
\[ \|Tx_n - x_n\| \to 0 \tag{27.9} \]
as $n \to \infty$. Then there exists a non-zero $x \in X$ such that $Tx = x$.

Note that a linear map always has $x = 0$ as a fixed point.

Proof Since $(x_n)$ is a bounded sequence in a reflexive Banach space, by Theorem 27.12 it has a weakly convergent subsequence, $x_{n_j} \rightharpoonup x$. Since $T$ is compact, it follows from Lemma 27.4 that $Tx_{n_j} \to Tx$ strongly in $X$. Since


\[ \lim_{j \to \infty} \|Tx_{n_j} - x_{n_j}\| = 0, \]
it follows that $x_{n_j} \to Tx$. Since strong convergence implies weak convergence, we have $x_{n_j} \rightharpoonup Tx$, and since weak limits are unique and we already have $x_{n_j} \rightharpoonup x$ it follows that $x = Tx$. To ensure that $x \ne 0$, note that since $Tx_{n_j}$ converges strongly to $Tx = x$, it follows from (27.9) that $x_{n_j}$ also converges strongly to $x$, and so $\|x\| \ge c_1$.

We end with a simple prototype of the sort of minimisation problem that occurs in the calculus of variations. We will use Mazur’s Theorem and sequential weak compactness in a reflexive Banach space $X$ to prove the existence of at least one closest point in any closed, convex subset of $X$.

Lemma 27.14 Suppose that $X$ is reflexive and that $K$ is a closed, convex subset of $X$. Then for any $x \in X \setminus K$ there exists at least one $k \in K$ such that
\[ \|x - k\| = \operatorname{dist}(x, K) = \inf_{y \in K} \|x - y\|. \]

Proof Let $(y_n)$ be a sequence in $K$ such that
\[ \|x - y_n\| \to \operatorname{dist}(x, K). \]
Then $(y_n)$ is a bounded sequence in $X$, so has a subsequence $y_{n_k}$ that converges weakly to some $k \in X$. Since $K$ is closed and convex, it is also weakly closed (Theorem 27.7), and so $k \in K$. Since $y_{n_k} \rightharpoonup k$, $x - y_{n_k} \rightharpoonup x - k$, and so we have
\[ \|x - k\| \le \liminf_{k \to \infty} \|x - y_{n_k}\| = \operatorname{dist}(x, K) \]
using (27.1). Since $\|x - k\| \ge \operatorname{dist}(x, K)$, we have $\|x - k\| = \operatorname{dist}(x, K)$ as required.

Exercises

27.1 Suppose that $U$ is a closed linear subspace of a Banach space $X$, and suppose that $(x_n) \in U$ converges weakly to some $x \in X$. Show that $x \in U$. [Hint: use Exercise 20.10.]

27.2 Use Theorem 27.12 to show that $\ell^1$ is not reflexive.

27.3 Suppose that $X$ is a Banach space, and that $(x_n)$ is a sequence in $X$ such that $x_n \rightharpoonup x$. Show that there exist $y_n \in X$ such that $y_n \to x$, where each $y_n$ is a convex combination of $(x_1, \ldots, x_n)$. [Hint: use the (obvious) fact that the closed convex hull of the $\{x_n\}$ is closed and convex.]

296

27.4

Weak and Weak-∗ Convergence

Recall (see Exercise 8.10) that a space X is uniformly convex if for every ε > 0 there exists δ > 0 s.t. * * *x + y* * * x − y > ε, x, y ∈ B X ⇒ * 2 * < 1 − δ, where B X is the closed unit ball in X . Show that if X is uniformly convex, then xn  x

and

xn  → x

xn → x.

(Set yn = xn /xn , y = x/x, and show that (yn + y)/2 → 1. Then argue by contradiction using the uniform convexity of X .) 27.5 The space L 1 (0, 2π ) is not uniformly convex. Find a counterexample to the result of Exercise 27.4 in this space. 27.6 Find an example to show that Lemma 27.5 does not hold in 1 and ∞ . 27.7 Show that if (en ) is an orthonormal sequence in a Hilbert space H and T : H → H is compact, then T en → 0 as n → ∞. 27.8 Show that a sequence (φn ) ∈ C([−1, 1]; R) satisfies ˆ 1 lim f (t)φn (t) dt = f (0) (27.10) n→∞ −1

for all f ∈ C([−1, 1]) if and only if ˆ 1 lim φn (t) dt = 1, n→∞ −1

(27.11)

for every function g ∈ C([−1, 1]) that is zero in a neighbourhood of x = 0, ˆ 1 lim g(t)φn (t) dt = 0, (27.12) n→∞ −1

and there exists a constant M > 0 such that ˆ 1 φn (t) dt ≤ M for every n. −1

27.9

(27.13)

(Lax, 2002) Suppose that X is a real Banach space. A theorem due to James (1964) states that if X is not reflexive, then there exists θ ∈ (0, 1) and sequences ( f n ) ∈ S X ∗ , (xn ) ∈ S X , such that f n (x j ) ≥ θ, n ≤ j,

f n (x j ) = 0, n > j.

Show that the sets Cn := conv{xn , xn+1 , xn+2 , . . .} form a decreasing sequence of non-empty closed bounded convex sets in X that satisfies ∩ j C j = ∅. (Show that if x ∈ Ck for some k, then f n (x) → 0

Exercises

297

as n → ∞, but that if x ∈ ∩ j C j , then f n (x) ≥ θ for every n.) (Megginson, 1998) 27.10 Let X be a Banach space. Show that if every bounded sequence in X has a weakly convergent subsequence, then whenever (Cn ) is a decreasing sequence (Cn+1 ⊆ Cn ) of non-empty closed bounded convex sets in X , ∩n Cn = ∅. Deduce, using the previous exercise, that X is reflexive if and only if its closed unit ball is weakly sequentially compact. [Hint: use Corollary 21.8.] (Megginson, 1998)

APPENDICES

Appendix A Zorn’s Lemma

We will show here that Zorn's Lemma is a consequence of the Axiom of Choice (in fact the two are equivalent). We follow the lecture notes of Bergman (1997). We begin with a formal statement of the Axiom of Choice.

Axiom of Choice If $(X_\alpha)_{\alpha \in A}$ is any family of non-empty sets, there exists a function $\varphi : A \to \bigcup_{\alpha \in A} X_\alpha$ such that $\varphi(\alpha) \in X_\alpha$ for every $\alpha \in A$.

The statement of Zorn's Lemma requires some more terminology. A set $P$ is partially ordered with respect to the relation $\preceq$ provided that (i) $x \preceq x$ for all $x \in P$; (ii) if $x, y, z \in P$ with $x \preceq y$ and $y \preceq z$, then $x \preceq z$; (iii) if $x, y \in P$, $x \preceq y$, and $y \preceq x$, then $x = y$. Two elements $x, y \in P$ are comparable if $x \preceq y$ or $y \preceq x$. A subset $C \subseteq P$ is called a chain if every two elements of $C$ are comparable, and $P$ is totally ordered if every two elements of $P$ are comparable. An element $b \in P$ is an upper bound for a subset $T \subseteq P$ if $x \preceq b$ for every $x \in T$, and $m \in T$ is a maximal element for $T$ if $x \in T$ and $m \preceq x$ implies that $x = m$.

Zorn's Lemma If $P$ is a non-empty partially ordered set in which every chain has an upper bound, then $P$ contains at least one maximal element.

We will need some other terminology and minor results for the proof. An initial segment of a chain $S$ is a subset $T \subseteq S$ such that if $u, v \in S$ with $u \preceq v$ and $v \in T$, then $u \in T$. We will write $T \trianglelefteq S$. A well-ordered set is a totally ordered set in which every non-empty subset has a least element (i.e. every non-empty subset $A$ contains an element $s$ such that $s \preceq a$ for every $a \in A$). Such a least element is unique, since if $s_1, s_2 \in A$ are both least elements of $A$, then $s_1 \preceq s_2$ and $s_2 \preceq s_1$, which implies that $s_1 = s_2$.

Fact 1 If $S$ is a well-ordered subset of a partially ordered set $P$ and $t \notin S$ is an upper bound for $S$ in $P$, then $S \cup \{t\}$ is well ordered.

Proof If $a, b \in S \cup \{t\}$, then there are three possibilities: (i) $a, b \in S$ so are comparable; (ii) $a \in S$, $b = t$ so $a \preceq t$; (iii) $a = b = t$ so $a \preceq b$ and $b \preceq a$. Any non-empty subset of $S \cup \{t\}$ contains a least element: $\{t\}$ has least element $t$; a non-empty subset of $S$ contains a least element; and for non-empty $A \subset S$ the set $A \cup \{t\}$ has the same least element as $A$.

Fact 2 If $Z$ is a set of well-ordered subsets of a partially ordered set $P$, such that for all $X, Y \in Z$, either $X \trianglelefteq Y$ or $Y \trianglelefteq X$, then $\bigcup_{X \in Z} X$ is well ordered.

Proof First we show that $U = \bigcup_{X \in Z} X$ is totally ordered. Given any two elements $a, b \in U$, $a \in X$ and $b \in Y$, where $X, Y \in Z$; but either $X \trianglelefteq Y$ or $Y \trianglelefteq X$, so $a$ and $b$ both lie in the larger of the two sets, which is totally ordered, and hence $a, b$ are comparable.

Now we show that any non-empty subset $V$ of $U$ has a least element. Since $V \subset U$, $V$ has a non-empty intersection with some $X \in Z$, and then $V \cap X$ has a least element $s$ (since $X$ is well ordered). Now suppose that we also have $V \cap Y \neq \emptyset$ for another $Y \in Z$. Then either (i) $Y \trianglelefteq X$, in which case $Y \subseteq X$ so $V \cap Y \subseteq V \cap X$; since $s$ is the least element of $V \cap X$ we have $s \preceq v$ for every $v \in V \cap Y$; or (ii) $X \trianglelefteq Y$; in this case suppose that there exists $v \in V \cap Y$ with $v \preceq s$. Then since $s \in X$ and $X \trianglelefteq Y$ it follows that $v \in X$; but then, since $s$ is the least element of $V \cap X$, we also have $s \preceq v$, which shows that $v = s$. It follows that $V$ has $s$ as its least element, and hence $U$ is well ordered.

Theorem A.1 Zorn's Lemma is equivalent to the Axiom of Choice.

Proof First we show that the Axiom of Choice implies Zorn's Lemma. Let $P$ be a non-empty partially ordered set with the property that every chain in $P$ has an upper bound. In particular, for any chain $C$ the set of all upper bounds for $C$ is non-empty.

Suppose that $C$ does not contain an element that is maximal for $P$; then $C$ must have upper bounds that do not lie in $C$. Otherwise, suppose that $b \in C$ is an upper bound for $C$ and $m \in P$ satisfies $b \preceq m$; then $m$ is an upper bound for $C$ and so $m \in C$; therefore $m \preceq b$, whence $m = b$. It follows that $b$ is a maximal element of $P$.

We denote the set of those upper bounds for $C$ that do not lie in $C$ by $B(C)$, and using the Axiom of Choice, for each chain $C$ that contains no maximal element of $P$ we choose one element of $B(C)$, and denote it by $\varphi(C)$.

Now we would like to argue as follows: choose some $p_0 \in P$. If this is not maximal, then let $p_1 = \varphi(\{p_0\}) \succ p_0$. If $p_1$ is not maximal, then let $p_2 = \varphi(\{p_0, p_1\}) \succ p_1$. If this process never terminates, then we let $p^* = \varphi(\{p_0, p_1, p_2, \ldots\})$. If $p^*$ is not maximal in $P$, then we append $p^*$ to the above chain and continue...

To make this rigorous, fix an element $p \in P$, and let $Z$ denote the set of subsets $S$ of $P$ that have the following properties:

(i) $S$ is a well-ordered chain in $P$;
(ii) $p$ is the least element of $S$;
(iii) for every proper non-empty initial segment $T \subset S$ the least element of $S \setminus T$ is $\varphi(T)$.

Note that $Z$ is non-empty, since it contains $\{p\}$.

If $S$ and $S'$ are two members of $Z$ then one is an initial segment of the other. To see this, let $R$ denote the union of all sets that are initial segments of both $S$ and $S'$, the 'greatest common initial segment' ($R$ is non-empty since $\{p\}$ is an initial segment of every $S \in Z$). If $R$ is a proper subset of $S$ and of $S'$, then by (iii) the element $\varphi(R)$ is the least element of both $S \setminus R$ and $S' \setminus R$; this would mean that $R \cup \{\varphi(R)\}$ is an initial segment of both $S$ and $S'$, but this contradicts the maximality of $R$. Therefore $R = S$ or $R = S'$, i.e. one is an initial segment of the other.

By Fact 2, the set $U$, the union of all members of $Z$, is well ordered. All members of $Z$ are initial segments of $U$ (suppose that $u, v \in U$, $u \preceq v$, and $v \in X$ ($X \in Z$); if $u \in Y$, then either $Y$ is an initial segment of $X$, in which case $u \in X$ immediately; or $X$ is an initial segment of $Y$ and it follows from this that $u \in X$) and the least element of $U$ is $p$.

Furthermore, $U$ also satisfies (iii): if $T$ is a proper non-empty initial segment of $U$, then there exists some $u \in U \setminus T$. By construction of $U$, $u \in S$ for some $S \in Z$, and so $T$ must be a proper initial segment of $S$. Hence (iii) ensures that $\varphi(T)$ is the least element of $S \setminus T$, and since $S$ is an initial segment of $U$, $\varphi(T)$ is also the least element of $U \setminus T$. Therefore $U$ is a member of $Z$.

If $U$ does not contain a maximal element of $P$, then $U \cup \{\varphi(U)\}$ will be an element of $Z$ that is not a subset of $U$, which contradicts the definition of $U$.

Now we show that Zorn's Lemma implies the Axiom of Choice.

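Let $P$ be the collection of all subsets $\Phi \subset A \times \bigcup_{\alpha \in A} X_\alpha$ with the property that (i) for each $\alpha \in A$ there is at most one element of the form $(\alpha, \varphi) \in \Phi$ and (ii) if $(\alpha, \varphi) \in \Phi$ then $\varphi$ is a single element of $X_\alpha$. We partially order $P$ by inclusion. $P$ is non-empty because the empty set is a member of $P$ (or, alternatively, $\{(\alpha, \varphi)\} \in P$ for any choice of $\alpha \in A$ and $\varphi \in X_\alpha$).

Now suppose that $C$ is a chain in $P$; then $U = \bigcup_{S \in C} S$ belongs to $P$ (properties (i) and (ii) pass to the union because any two members of $C$ are comparable) and is an upper bound for $C$, since $S \subseteq U$ for every $S \in C$. It follows that $P$ has a maximal element $\Phi^*$.

If there exists an $\alpha \in A$ such that $\Phi^*$ contains no element of the form $(\alpha, \xi)$ with $\xi \in X_\alpha$ we can consider $\Phi' := \Phi^* \cup \{(\alpha, \xi)\}$ for some $\xi \in X_\alpha$, and then $\Phi' \in P$ satisfies $\Phi^* \subseteq \Phi'$ but $\Phi' \neq \Phi^*$, which contradicts the maximality of $\Phi^*$. Hence $\Phi^*$ contains exactly one pair $(\alpha, \xi)$ for each $\alpha \in A$, and setting $\varphi(\alpha) := \xi$ defines the required choice function.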
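The order-theoretic terms used in this appendix (chain, upper bound, maximal element) and the informal 'keep climbing' argument can be tried out on a finite example. The sketch below is only an illustration under stated assumptions: the poset is the set {2, ..., 10} ordered by divisibility, and the function names `leq`, `is_chain`, `maximal_elements` and `climb` are hypothetical helpers introduced here. In a finite set no Axiom of Choice is needed, because `min` serves as an explicit choice function.

```python
from itertools import combinations


def leq(a, b):
    """Illustrative partial order: a precedes b iff a divides b."""
    return b % a == 0


def is_chain(C):
    """A chain is a subset in which every two elements are comparable."""
    return all(leq(a, b) or leq(b, a) for a, b in combinations(C, 2))


def maximal_elements(P):
    """m is maximal if leq(m, x) with x in P forces x == m."""
    return sorted(m for m in P if all(x == m for x in P if leq(m, x)))


def climb(P, p):
    """Build p0 < p1 < ... until no strictly larger element exists.

    The 'choice function' is min(), which is available because P is a
    finite set of integers.
    """
    chain = [p]
    while True:
        above = [x for x in P if x != chain[-1] and leq(chain[-1], x)]
        if not above:
            return chain
        chain.append(min(above))


P = set(range(2, 11))
chain_from_2 = climb(P, 2)
```

Here `climb` always terminates because the set is finite; the transfinite construction with the family $Z$ in the proof is what replaces this loop when $P$ is infinite.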

Appendix B Lebesgue Integration

In this appendix we give an outline of the construction of the Lebesgue integral in $\mathbb{R}^d$, omitting the proofs. We then use this theory to give intrinsic definitions of the spaces $L^p(\Omega)$ for open subsets $\Omega$ of $\mathbb{R}^d$. Finally, we prove the properties of these spaces that we have obtained in other ways in the main part of the book. In Sections B.1 and B.2 we follow quite closely the presentation in Stein and Shakarchi (2005); for Section B.3 we follow Chapter 3 of Rudin (1966) and Chapter 2 of Adams (1975). An alternative construction of the Lebesgue integral based on step functions rather than simple functions can be found in Priestley (1997).

B.1 The Lebesgue Measure on $\mathbb{R}^d$

For full details and proofs of statements in this section see Chapter 1 of Stein and Shakarchi (2005).

A 'cube' $Q \subset \mathbb{R}^d$ is a set of the form $[a_1, b_1] \times \cdots \times [a_d, b_d]$ with $b_1 - a_1 = \cdots = b_d - a_d$. The volume of a cube $Q$ we write as $|Q|$, i.e. $|Q| = (b_1 - a_1)^d$. Given any subset $E$ of $\mathbb{R}^d$ we define the outer measure of $E$ to be

$$\mu^*(E) := \inf \sum_{j=1}^{\infty} |Q_j|,$$

where the infimum is taken over all countable collections of cubes $\{Q_j\}$ that cover $E$. A subset $E$ of $\mathbb{R}^d$ is called measurable if for any $\varepsilon > 0$ we can find an open subset $U$ of $\mathbb{R}^d$ containing $E$ such that $\mu^*(U \setminus E) < \varepsilon$. If $E$ is measurable, then we define its Lebesgue measure $\mu(E)$ to be $\mu(E) = \mu^*(E)$. The Lebesgue measure is countably additive, i.e. for any countable collection $\{A_j\}_{j=1}^{\infty}$ of disjoint measurable sets we have

$$\mu\Bigl(\bigcup_{j=1}^{\infty} A_j\Bigr) = \sum_{j=1}^{\infty} \mu(A_j).$$

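We denote by $\mathcal{M}(\mathbb{R}^d)$ the collection of all measurable sets in $\mathbb{R}^d$, and for any measurable subset $\Omega$ of $\mathbb{R}^d$ we let $\mathcal{M}(\Omega)$ be the set of all measurable subsets of $\Omega$ (it consists of $E \cap \Omega$ for all $E \in \mathcal{M}(\mathbb{R}^d)$). These collections are both examples of $\sigma$-algebras.

Definition B.1 A collection $\Sigma$ of subsets of $A$ is called a $\sigma$-algebra on $A$ provided that

(i) $\emptyset \in \Sigma$;
(ii) if $E \in \Sigma$, then $A \setminus E \in \Sigma$;
(iii) if $\{E_j\}_{j=1}^{\infty} \subset \Sigma$, then $\bigcup_{j=1}^{\infty} E_j \in \Sigma$.

Note that as a consequence of the definition $A \in \Sigma$ and whenever $\{E_j\}_{j=1}^{\infty} \subset \Sigma$, $\bigcap_{j=1}^{\infty} E_j \in \Sigma$.

Theorem B.2 (Lebesgue measure) The $\sigma$-algebra $\mathcal{M} = \mathcal{M}(\mathbb{R}^d)$ of measurable subsets of $\mathbb{R}^d$ and the Lebesgue measure $\mu : \mathcal{M} \to [0, \infty]$ satisfy

(i) every open subset of $\mathbb{R}^d$ belongs to $\mathcal{M}$, as does every closed subset;
(ii) if $A \subset B$ and $B \in \mathcal{M}$ with $\mu(B) = 0$, then $A \in \mathcal{M}$ and $\mu(A) = 0$;
(iii) every set $A$ of the form $\{x \in \mathbb{R}^d : a_i \le x_i \le b_i\}$ belongs to $\mathcal{M}$ and

0, Lusin's Theorem (Theorem B.4) guarantees that there exists $g \in C(\overline{\Omega})$ such that $g(x) = s(x)$ except on a set $E$ with $\mu(E) < \varepsilon$, and that $\|g\|_\infty \le \|s\|_{L^\infty}$. It follows that

$$\|g - s\|_{L^p} = \left(\int_E |g(x) - s(x)|^p \,dx\right)^{1/p} \le \mu(E)^{1/p} \|g - s\|_\infty \le 2\|s\|_\infty \varepsilon^{1/p}.$$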

Since $S(\Omega)$ is dense in $L^p(\Omega)$, the result now follows.

Since $L^p(\Omega)$ is complete and $C(\overline{\Omega})$ is a dense subset, it follows (see Lemma 7.2) that $L^p(\Omega)$ is the completion of $C(\overline{\Omega})$ in the $L^p$ norm, which was the definition we gave in Chapter 7.

Lemma B.14 For any $\Omega \subset \mathbb{R}^d$ and $1 \le p < \infty$ the space $L^p(\Omega)$ is separable.

Proof Suppose first that $\Omega$ is bounded; then $\Omega \subset [-M, M]^d =: Q_M$ for some $M > 0$ and $|\Omega| < \infty$. If we use an argument similar to that of Exercise 6.7 we can show that the collection of functions

$$\mathcal{A} := \left\{ \sum_{i_1, \ldots, i_n = 1}^{k} c_{i_1, i_2, \ldots, i_n}\, x_1^{i_1} \cdots x_n^{i_n} : n \in \mathbb{N},\ c_{i_1, \ldots, i_n} \in \mathbb{R} \right\}$$

is uniformly dense in $C(Q_M)$, and hence uniformly dense in $C(\overline{\Omega})$. Indeed, $\mathcal{A}$ is a subalgebra of $C(Q_M)$ that separates points, and hence $\overline{\mathcal{A}} = C(Q_M)$ by the Stone–Weierstrass Theorem. Since

$$\|f\|_{L^p(\Omega)}^p = \int_\Omega |f(x)|^p \,dx \le |\Omega|\, \|f\|_\infty^p,$$

$\mathcal{A}$ is also dense in $C(Q_M)$ in the $L^p$ norm. Since $\mathcal{A}$ is the linear span of the countable set

$$\mathcal{A}' := \{x_1^{i_1} \cdots x_n^{i_n} : i_j \in \mathbb{N},\ j = 1, \ldots, n\},$$

it follows from Exercise 3.14 that the linear span of $\mathcal{A}'$ is also dense in $L^p(\Omega)$, and so $L^p(\Omega)$ is separable.

If $\Omega$ is unbounded, then, given any $f \in L^p(\Omega)$ and $\varepsilon > 0$, there exists $N \in \mathbb{N}$ such that

$$\int_{\Omega \setminus Q_N} |f|^p < \varepsilon/2;$$

this is a consequence of the Monotone Convergence Theorem applied to the sequence of functions $f \chi_{Q_n}$. We have just shown that for any $j \in \mathbb{N}$ the function $f|_{Q_j}$ can be approximated to within $\varepsilon/2$ by elements of the linear span of $\{x^k|_{Q_j}\}$. Since the collection

$$\{x^k|_{Q_j}\}_{k, j \in \mathbb{N}}$$

is countable, this shows that $L^p(\Omega)$ is separable.

B.4 Dual Spaces of $L^p$, $1 \le p < \infty$

In the final section of this appendix we show that for $1 \le p < \infty$ we have $(L^p)^* \equiv L^q$ with $(p, q)$ conjugate. In this section we follow Rudin (1966) and give the standard measure-theoretic argument based on the Radon–Nikodym Theorem. We also give an alternative proof that is based on the uniform convexity of $L^p$ taken from Adams (1975).

Given any $f \in L^1(\Omega)$ the function $\lambda : \mathcal{M}(\Omega) \to \mathbb{R}$ defined by setting

$$\lambda(A) = \int_A f(x)\,dx$$

defines a signed measure on $\mathcal{M}(\Omega)$. That this can be reversed is the content of the Radon–Nikodym Theorem.

Theorem B.15 (Radon–Nikodym Theorem) If $\lambda : \mathcal{M}(\Omega) \to \mathbb{R}$ is a measure on $\mathcal{M}(\Omega)$ such that $\lambda(A) = 0$ whenever $\mu(A) = 0$, then there exists $f \in L^1(\Omega)$ such that

$$\lambda(A) = \int_A f(x)\,dx \qquad A \in \mathcal{M}(\Omega),$$

and $f$ is unique (up to sets of measure zero).

We now use this to find the dual spaces of $L^p(\Omega)$.

Theorem B.16 For $1 \le p < \infty$ we have $(L^p(\Omega))^* \equiv L^q(\Omega)$, where $p$ and $q$ are conjugate.

Proof We showed in the proof of Theorem 18.8 that given any $g \in L^q(\Omega)$ the linear functional

$$\Phi_g(f) := \int_\Omega f(x)g(x)\,dx, \qquad f \in L^p(\Omega) \tag{B.5}$$

is an element of $(L^p)^*$ with $\|\Phi_g\|_{(L^p)^*} = \|g\|_{L^q}$. What remains is to show that any $\Phi \in (L^p)^*$ can be written in the form (B.5) for some $g \in L^q(\Omega)$. We give the proof when $\mu(\Omega) < \infty$. For the steps required to prove the result when $\mu(\Omega) = \infty$ see the conclusion of Theorem 6.16 in Rudin (1966).

Given $\Phi \in (L^p)^*$, for any $E \in \mathcal{M}(\Omega)$ define $\lambda(E) = \Phi(\chi_E)$. Clearly $\lambda$ is additive, since if $A, B \in \mathcal{M}(\Omega)$ are disjoint then $\chi_{A \cup B} = \chi_A + \chi_B$, and so

$$\lambda(A \cup B) = \Phi(\chi_{A \cup B}) = \Phi(\chi_A + \chi_B) = \Phi(\chi_A) + \Phi(\chi_B) = \lambda(A) + \lambda(B).$$

If $E = \bigcup_{j=1}^{\infty} A_j$, where the $\{A_j\}$ are disjoint elements of $\mathcal{M}(\Omega)$, then, if we set $E_k = \bigcup_{j=1}^{k} A_j$, we have

$$\|\chi_E - \chi_{E_k}\|_{L^p} = \|\chi_{E \setminus E_k}\|_{L^p} = \mu(E \setminus E_k)^{1/p},$$

which tends to zero as $k \to \infty$ since $\mu(E) = \sum_{j=1}^{\infty} \mu(A_j) < \infty$. Since $\Phi$ is continuous, it follows that

$$\sum_{j=1}^{k} \lambda(A_j) = \lambda(E_k) = \Phi(\chi_{E_k}) \to \Phi(\chi_E) = \lambda(E),$$

and so $\lambda(E) = \sum_{j=1}^{\infty} \lambda(A_j)$ and $\lambda$ is countably additive. Furthermore, if $\mu(E) = 0$, then $\chi_E = 0$ in $L^p(\Omega)$ and so $\lambda(E) = \Phi(\chi_E) = 0$.

We can therefore apply the Radon–Nikodym Theorem to guarantee the existence of $g \in L^1(\Omega)$ such that

$$\Phi(\chi_E) = \int_E g\,dx = \int_\Omega \chi_E\, g\,dx \quad\text{for every } E \in \mathcal{M}(\Omega). \tag{B.6}$$

Since simple measurable functions are linear combinations of characteristic functions, it follows by linearity of the integral that

$$\Phi(f) = \int_\Omega f g\,dx \quad\text{for every } f \in S(\Omega).$$

Since every $f \in L^\infty(\Omega)$ is the uniform limit of simple functions [2], it follows that

$$\Phi(f) = \int_\Omega f g\,dx \quad\text{for every } f \in L^\infty(\Omega). \tag{B.7}$$

[2] Set $M = \|f\|_{L^\infty}$ and for each $n \in \mathbb{N}$ and $-n \le j \le n - 1$, let $a_{j,n} = jM/n$ and $E_{j,n} = f^{-1}(jM/n, (j + 1)M/n)$; then take $s_n = \sum_{j=-n}^{n-1} a_{j,n} \chi_{E_{j,n}}(x)$.

We now need to show that $g \in L^q$ and that $\Phi(f) = \int_\Omega f g$ for every $f \in L^p$.

If $p = 1$, then, from (B.6), it follows that

$$\left|\int_E g\,dx\right| = |\Phi(\chi_E)| \le \|\Phi\|_{(L^1)^*} \|\chi_E\|_{L^1} = \|\Phi\|_{(L^1)^*}\, \mu(E),$$

which shows that $|g(x)| \le \|\Phi\|_{(L^1)^*}$ almost everywhere (since otherwise there would be a set $E$ of positive measure for which the inequality was violated) and so $\|g\|_{L^\infty} \le \|\Phi\|_{(L^1)^*}$.

If $1 < p < \infty$, then for each $k \in \mathbb{N}$ define

$$g_k(x) = \begin{cases} g(x) & |g(x)| \le k \\ 0 & \text{otherwise.} \end{cases}$$

We apply (B.7) to the function $f_k \in L^\infty(\Omega)$ defined by setting

$$f_k(x) = \begin{cases} |g_k(x)|^{q-2} g_k(x) & g_k(x) \neq 0 \\ 0 & \text{otherwise} \end{cases}$$

and obtain

$$\int_\Omega |g_k(x)|^q \,dx = \int_\Omega g f_k \,dx = \Phi(f_k) \le \|\Phi\|_{(L^p)^*} \left(\int_\Omega |g_k|^q\right)^{1/p},$$

which implies that

$$\left(\int_\Omega |g_k(x)|^q \,dx\right)^{1/q} \le \|\Phi\|_{(L^p)^*}.$$

Since $|g_{k+1}(x)| \ge |g_k(x)|$ and $g_k(x) \to g(x)$ for almost every $x$, we can apply the Monotone Convergence Theorem to deduce that $g \in L^q(\Omega)$ with $\|g\|_{L^q} \le \|\Phi\|_{(L^p)^*}$.

It only remains to show that $\Phi(f) = \int_\Omega f g\,dx$ for every $f \in L^p(\Omega)$; we currently only have this for $f \in L^\infty(\Omega)$ (in (B.7)). Since we now know that $g \in L^q(\Omega)$, both $\Phi(f)$ and $\int_\Omega f g\,dx$ define continuous linear functionals on $L^p(\Omega)$ that agree for $f \in L^\infty(\Omega)$. To finish the proof, note that $L^\infty(\Omega)$ is dense in $L^p(\Omega)$, since $L^\infty(\Omega) \supset C^0(\overline{\Omega})$ and (as we showed in Theorem B.13) $C(\overline{\Omega})$ is dense in $L^p(\Omega)$. It follows that $\Phi(f) = \int_\Omega f g\,dx$ for every $f \in L^p(\Omega)$ and we have finished.

The proof of reflexivity of $L^p$, which is very similar to the proof that $\ell^p$ is reflexive once we have identified the dual spaces, is given in Chapter 26 (Proposition 26.6).

We end this appendix with an alternative proof of the 'onto' part of Theorem B.16 that does not use the Radon–Nikodym Theorem. This proof, which can be found in Adams (1975), uses the uniform convexity of $L^p$ ($1 < p < \infty$). The proof as presented here also uses many results that are proved in various exercises in this book. The only part we have not proved is the uniform convexity of $L^p$ for $1 < p < 2$, which uses Clarkson's second inequality. However, this case was treated without the Radon–Nikodym Theorem in Exercise 18.6.

Alternative proof that $(L^p)^* \equiv L^q$ for $1 < p < \infty$. We use the uniform convexity of $L^p$, which we proved in Exercise 10.6.

Suppose that we have $\Phi \in (L^p)^*$ with $\|\Phi\|_{(L^p)^*} = 1$. Then we can find a unique $\hat g \in L^p$ such that $\|\hat g\|_{L^p} = 1$ and $\Phi(\hat g) = 1$. Indeed, we showed in Exercise 10.7 that any closed convex subset of a uniformly convex Banach space contains a unique element of minimum norm, so the set

$$G := \{f \in L^p : \Phi(f) = 1\},$$

which is closed (it is the preimage of 1 under the continuous map $\Phi$) and convex, contains a unique element $\hat g$ of minimum norm. Since for any $f \in G$ we have

$$1 = |\Phi(f)| \le \|\Phi\|_{(L^p)^*} \|f\|_{L^p}$$

it follows that $\|f\|_{L^p} \ge 1$ for every $f \in G$, while from the definition of $\|\Phi\|_{(L^p)^*}$ we know that for every $\delta > 0$ there exists an $f \in G$ such that $\|f\|_{L^p} < 1 + \delta$. It follows that $\|\hat g\|_{L^p} = 1$ as claimed.

We now use the facts that any uniformly convex space is strictly convex (Exercise 10.4), and that in a strictly convex space for each $x \in X$ with $x \neq 0$ there is a unique linear functional $\varphi \in X^*$ such that $\|\varphi\|_{X^*} = 1$ and $\varphi(x) = \|x\|_X$ (Exercise 20.5).

Set

$$g(x) = \begin{cases} |\hat g(x)|^{p-2}\, \hat g(x) & \hat g(x) \neq 0 \\ 0 & \text{otherwise.} \end{cases}$$

Then

$$\|g\|_{L^q} = \left(\int_\Omega |\hat g(x)|^{(p-1)q} \,dx\right)^{1/q} = \left(\int_\Omega |\hat g(x)|^p \,dx\right)^{1/q} = 1,$$

and so the linear map $\Phi_g$ as defined in (B.5) is an element of $(L^p)^*$ with $\|\Phi_g\|_{(L^p)^*} = 1$. Furthermore, we have

$$\Phi_g(\hat g) = \int_\Omega \hat g(x) g(x) \,dx = \int_\Omega |\hat g(x)|^p \,dx = 1.$$

It follows from the uniqueness result from the beginning of this paragraph that $\Phi_g = \Phi$.

If we start with $\Phi \in (L^p)^*$ with $\nu = \|\Phi\|_{(L^p)^*} \neq 1$, then we can apply this result to $\nu^{-1}\Phi$, obtain $g$ such that $\Phi_g = \nu^{-1}\Phi$, and then $\Phi_{\nu g} = \Phi$.

For a limiting argument that allows one to obtain the result for $p = 1$, see Theorem 2.34 in Adams (1975).
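The two norm computations in the alternative proof rest on a single identity between conjugate exponents. The short derivation below spells it out as a sketch for real-valued functions; it introduces no new notation beyond $\hat g$, $g$, $p$ and $q$ from the proof.

```latex
Since $\tfrac1p + \tfrac1q = 1$ we have $q = \tfrac{p}{p-1}$, i.e.
\[
  (p-1)\,q = p .
\]
With $g = |\hat g|^{p-2}\hat g$ (and $g = 0$ where $\hat g = 0$) this gives
\[
  |g|^{q} = |\hat g|^{(p-1)q} = |\hat g|^{p},
  \qquad
  \hat g\, g = |\hat g|^{p-2}\,\hat g^{\,2} = |\hat g|^{p},
\]
so that, using $\|\hat g\|_{L^p} = 1$,
\[
  \|g\|_{L^q}^{q} = \int_\Omega |\hat g|^{p}\,dx = 1
  \quad\text{and}\quad
  \int_\Omega \hat g\, g \,dx = \int_\Omega |\hat g|^{p}\,dx = 1 .
\]
```

The same identity, with the roles of $p$ and $q$ exchanged, is what makes the choice $f_k = |g_k|^{q-2} g_k$ work in the Radon–Nikodym proof, since there $|f_k|^p = |g_k|^{(q-1)p} = |g_k|^q$.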

Appendix C The Banach–Alaoglu Theorem

This appendix gives a proof of two powerful results: the Tychonoff Theorem and the Banach–Alaoglu Theorem. The first guarantees that the topological product of any collection of compact topological spaces is compact (if we use the product topology). The second, which uses the Tychonoff Theorem as a key ingredient in the proof, says that if X is a Banach space, then the closed unit ball in X ∗ is compact in the weak-∗ topology. The Banach–Alaoglu Theorem is a more general version of Helly’s Theorem (sequential weak-∗ compactness of the closed unit ball in X ∗ when X is separable) that we proved as Theorem 27.11. (By finding a metric that gives rise to the weak-∗ topology on the closed unit ball when X is separable we can derive Theorem 27.11 as a consequence.) We have not discussed topologies and topological spaces elsewhere in this book, so we first give a quick overview of this material, including the notion of continuity and convergence in the topological setting. We show how to construct a topology from a basis or sub-basis, and define the product topology. We then prove Tychonoff’s Theorem, derive the Banach–Alaoglu Theorem as a (non-trivial) consequence, and show that the weak-∗ topology (on the closed unit ball) is metrisable when X is separable. To complete the proof that Helly’s Theorem follows from the Banach–Alaoglu Theorem we show that compactness and sequential compactness are equivalent in a metric space. For a more leisurely treatment of the essentials of topological spaces, see Sutherland (1975) or Munkres (2000). The presentation here is heavily influenced by David Preiss’s lecture notes for the University of Warwick Metric spaces module, with the later sections also drawing on Brezis (2011).

C.1 Topologies and Continuity

A topology $\mathcal{T}$ on a set $T$ is a collection of subsets of $T$ such that

(T1) $T, \emptyset \in \mathcal{T}$,
(T2) the union of any collection of elements of $\mathcal{T}$ is in $\mathcal{T}$, and
(T3) if $U, V \in \mathcal{T}$, then $U \cap V \in \mathcal{T}$.

The pair $(T, \mathcal{T})$ is a topological space. Any set in $\mathcal{T}$ is called 'open', and any subset $U$ of $T$ with $T \setminus U \in \mathcal{T}$ is called 'closed'.

For example, the collection of all open sets in a metric space $(X, d)$ forms a topology on $X$ (see Lemma 2.6).

We showed in Lemma 2.13 that continuity of a function between metric spaces can be defined in terms of preimages of open sets; we can use this form of the definition for functions between general topological spaces.

Definition C.1 A map $f : (T, \mathcal{T}) \to (S, \mathcal{S})$ is said to be continuous if $f^{-1}(U) \in \mathcal{T}$ for every $U \in \mathcal{S}$, where $f^{-1}(U) = \{x \in T : f(x) \in U\}$.

Lemma 2.9 provides, similarly, a definition of convergence in terms only of open sets, and we adopt this definition in a topological space.

Definition C.2 A sequence $(x_n) \in T$ converges to $x$ in the topology $\mathcal{T}$ if for every $U \in \mathcal{T}$ with $x \in U$ there exists $N$ such that $x_n \in U$ for all $n \ge N$.

C.2 Bases and Sub-bases

In a metric space we do not have to specify all the open sets: we can build them up from open balls (see Exercise 2.7). We can do something similar in an abstract topological space.

Definition C.3 A basis for a topology $\mathcal{T}$ on $T$ is a collection $\mathcal{B} \subseteq \mathcal{T}$ such that every set in $\mathcal{T}$ is a union of sets from $\mathcal{B}$.

The following lemma is an immediate consequence of the definition of a basis, since $T \in \mathcal{T}$, and if $B_1, B_2 \in \mathcal{B}$, then $B_1, B_2 \in \mathcal{T}$ so $B_1 \cap B_2 \in \mathcal{T}$.

Lemma C.4 If $\mathcal{B}$ is any basis for $\mathcal{T}$, then

(B1) $T$ is the union of sets from $\mathcal{B}$;
(B2) if $B_1, B_2 \in \mathcal{B}$, then $B_1 \cap B_2$ is the union of sets from $\mathcal{B}$.

However, this can be reversed.

Proposition C.5 Let $\mathcal{B}$ be a collection of subsets of a set $T$ that satisfy (B1) and (B2). Then there is a unique topology $\mathcal{T}$ on $T$ whose basis is $\mathcal{B}$; its open sets are precisely the unions of sets from $\mathcal{B}$.

Note that $\mathcal{T}$ is the smallest topology that contains $\mathcal{B}$.

Proof If there is such a topology, then, by the definition of a basis, its sets consist of the unions of sets from $\mathcal{B}$. So we only need check that if $\mathcal{T}$ consists of unions of sets from $\mathcal{B}$, then this is indeed a topology on $T$. We check properties (T1)–(T3).

(T1): $T$ is the union of sets from $\mathcal{B}$ by (B1), and $\emptyset$ is the union of the empty collection.
(T2): any union of unions of sets from $\mathcal{B}$ is a union of sets from $\mathcal{B}$.
(T3): if $U, V \in \mathcal{T}$, then $U = \bigcup_{i \in I} B_i$ and $V = \bigcup_{j \in J} D_j$, with $B_i, D_j \in \mathcal{B}$, and so

$$U \cap V = \bigcup_{i, j} B_i \cap D_j,$$

which is a union of sets in $\mathcal{B}$ by (B2) and hence an element of $\mathcal{T}$.

There is a smaller collection of sets from which we can construct the topology $\mathcal{T}$ by allowing for not only unions (as in (T2)) but also finite intersections (as in (T3)).

Definition C.6 A sub-basis for a topology $\mathcal{T}$ on $T$ is a collection $\mathcal{B} \subseteq \mathcal{T}$ such that every set in $\mathcal{T}$ is a union of finite intersections of sets from $\mathcal{B}$.

(Note that if $\mathcal{B}$ is a basis for $\mathcal{T}$, then it is also a sub-basis for $\mathcal{T}$; but in general a sub-basis will be 'smaller'.)

Example: the collection of intervals $(a, \infty)$ and $(-\infty, b)$ (ranging over all $a, b \in \mathbb{R}$) is a sub-basis for the usual topology on $\mathbb{R}$, since intersections give the open intervals $(a, b)$ and these are a basis for the usual topology.

Proposition C.7 If $\mathcal{B}$ is any collection of subsets of a set $T$ whose union is $T$, then there is a unique topology $\mathcal{T}$ on $T$ with sub-basis $\mathcal{B}$. Its open sets are precisely the unions of finite intersections of sets from $\mathcal{B}$.

Proof If $\mathcal{B}$ is a sub-basis for a topology $\mathcal{T}$, then $\mathcal{T}$ has $\mathcal{D}$, the collection of all finite intersections of elements of $\mathcal{B}$, as a basis. But $\mathcal{D}$ satisfies (B1) and (B2) from Lemma C.4, so by Proposition C.5 there is a unique topology $\mathcal{T}$ with basis $\mathcal{D}$, which is also the unique topology with sub-basis $\mathcal{B}$.
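On a finite set the construction in Proposition C.7 can be carried out by brute force, which makes the two stages (finite intersections first, then unions) easy to see. The sketch below is an illustration only: the set `T`, the sub-basis and all function names are hypothetical choices made here, and the intersection of the empty family is taken to be `T` itself.

```python
from itertools import combinations


def finite_intersections(T, subbasis):
    """All intersections of finite subfamilies of the sub-basis.

    The intersection of the empty family is taken to be T itself.
    """
    sets = [frozenset(B) for B in subbasis]
    result = {frozenset(T)}
    for r in range(1, len(sets) + 1):
        for family in combinations(sets, r):
            result.add(frozenset.intersection(*family))
    return result


def all_unions(basis):
    """All unions of subfamilies of `basis` (the empty union is the empty set)."""
    result = {frozenset()}
    for B in basis:
        result |= {U | B for U in result}
    return result


def topology_from_subbasis(T, subbasis):
    """Unions of finite intersections of sub-basis sets."""
    return all_unions(finite_intersections(T, subbasis))


def is_topology(T, tau):
    """Check (T1)-(T3); pairwise unions suffice because tau is finite."""
    return (frozenset(T) in tau and frozenset() in tau
            and all(U | V in tau for U in tau for V in tau)
            and all(U & V in tau for U in tau for V in tau))


T = {1, 2, 3}
tau = topology_from_subbasis(T, [{1, 2}, {2, 3}])
```

In this example the only new set produced by intersecting is {2}, so the resulting topology has five open sets; the singleton {1} is not open because no union of finite intersections produces it.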

Note that the topology $\mathcal{T}$ from this proposition is the smallest topology on $T$ that contains $\mathcal{B}$.

To check that a map $f : T_1 \to T_2$ is continuous it is enough to check that preimages of a sub-basis for $\mathcal{T}_2$ are open in $T_1$ (since any basis is also a sub-basis, we could check for a basis if we wanted).

Lemma C.8 Suppose that $f : (T_1, \mathcal{T}_1) \to (T_2, \mathcal{T}_2)$, and that $\mathcal{B}$ is a sub-basis for the topology $\mathcal{T}_2$. Then $f$ is continuous if and only if $f^{-1}(B) \in \mathcal{T}_1$ for every $B \in \mathcal{B}$.

Proof 'Only if' is clear since every element of the sub-basis is an element of $\mathcal{T}_2$. Now, any element $U$ of $\mathcal{T}_2$ can be written as $U = \bigcup_i D_i$, for some $\{D_i\}$ that are finite intersections of elements of $\mathcal{B}$. So

$$f^{-1}(U) = f^{-1}\Bigl(\bigcup_i D_i\Bigr) = \bigcup_i f^{-1}(D_i),$$

and since for each $D_i$ we have $D_i = \bigcap_{j=1}^{n(i)} B_j$ with $B_j \in \mathcal{B}$ we have

$$f^{-1}\Bigl(\bigcap_{j=1}^{n} B_j\Bigr) = \bigcap_{j=1}^{n} f^{-1}(B_j),$$

which is open by assumption. So $f^{-1}(U)$ is a union of open sets, so open.
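As a quick illustration of how Lemma C.8 shortens a continuity check, consider the sub-basis of half-lines for the usual topology on $\mathbb{R}$ mentioned in the example after Definition C.6. The map $f(x) = x^2$ is an arbitrary choice made here for the sketch; it is not taken from the text.

```latex
Let $f : \mathbb{R} \to \mathbb{R}$, $f(x) = x^2$, and use the sub-basis
$\{(a,\infty)\} \cup \{(-\infty,b)\}$, $a, b \in \mathbb{R}$. Then
\[
  f^{-1}\bigl((a,\infty)\bigr) =
  \begin{cases}
    \mathbb{R} & a < 0,\\
    (-\infty,-\sqrt{a}\,) \cup (\sqrt{a},\infty) & a \ge 0,
  \end{cases}
  \qquad
  f^{-1}\bigl((-\infty,b)\bigr) =
  \begin{cases}
    \emptyset & b \le 0,\\
    (-\sqrt{b},\sqrt{b}\,) & b > 0.
  \end{cases}
\]
Every one of these preimages is open in $\mathbb{R}$, so $f$ is continuous
by Lemma C.8; preimages of general open sets never need to be computed.
```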

C.3 The Weak-∗ Topology

The weak-∗ topology on $X^*$ is the smallest topology $\mathcal{T}_*$ on $X^*$ such that every map $\delta_x : X^* \to \mathbb{K}$ given by

$$\delta_x(f) := f(x)$$

is continuous. Since the open balls $\{B(z, \varepsilon) : z \in \mathbb{K},\ \varepsilon > 0\}$ form a basis (and so also a sub-basis) for the topology of $\mathbb{K}$, by Lemma C.8 it is enough to guarantee that $\delta_x^{-1}(B(z, \varepsilon))$ is an element of $\mathcal{T}_*$ for every $z \in \mathbb{K}$, $\varepsilon > 0$. In other words we should take $\mathcal{T}_*$ to be the topology with sub-basis

$$\{\varphi \in X^* : \varphi(x) \in B(z, \varepsilon)\}, \qquad x \in X,\ z \in \mathbb{K},\ \varepsilon > 0, \tag{C.1}$$

i.e. $\mathcal{T}_*$ is formed of all unions of finite intersections of sets of the form (C.1).

We now show that weak-∗ convergence and convergence in the weak-∗ topology coincide.

Proposition C.9 If $(f_n) \in X^*$, then $f_n \overset{*}{\rightharpoonup} f$ if and only if $f_n \to f$ in $\mathcal{T}_*$.

Proof Suppose that $f_n \overset{*}{\rightharpoonup} f$, and let $U \in \mathcal{T}_*$ be such that $f \in U$. Then, since $U$ is the union of finite intersections of sets of the form in (C.1), $f$ must belong to one such intersection that itself is a subset of $U$; so there exist $m$, $y_i \in X$, $z_i \in \mathbb{K}$, $\varepsilon > 0$ such that

$$f \in \bigcap_{i=1}^{m} \{\varphi \in X^* : \varphi(y_i) \in B(z_i, \varepsilon)\} \subset U.$$

It follows that, in particular, $f(y_i) \in B(z_i, \varepsilon)$. Since $f_n \overset{*}{\rightharpoonup} f$, for $n$ sufficiently large we have $f_n(y_i) \in B(z_i, \varepsilon)$ for all $i = 1, \ldots, m$, i.e. $f_n \in U$. So $f_n \to f$ in $\mathcal{T}_*$.

Conversely, if $f_n \to f$ in $\mathcal{T}_*$ and we choose some $x \in X$ and $\varepsilon > 0$, then $f$ is contained in the open set

$$U := \{\varphi \in X^* : \varphi(x) \in B(f(x), \varepsilon)\}$$

and so $f_n \in U$ for all $n \ge N$, i.e. $|f_n(x) - f(x)| < \varepsilon$, so $f_n \overset{*}{\rightharpoonup} f$.

C.4 Compactness and Sequential Compactness

A topological space $(T, \mathcal{T})$ is compact if every open cover of $T$ has a finite subcover, i.e. if

$$T = \bigcup_{\alpha \in A} U_\alpha, \qquad U_\alpha \in \mathcal{T},$$

then there exist $U_{\alpha_1}, \ldots, U_{\alpha_n}$ such that

$$T = \bigcup_{i=1}^{n} U_{\alpha_i}.$$

A topological space is sequentially compact if any sequence $(x_n) \in T$ has a subsequence $(x_{n_j})$ that converges to a limit (that lies in $T$).

Two relatively simple results will enable us to prove the equivalence of compactness and sequential compactness in any metric space.

Theorem C.10 Let $\mathcal{F}$ be a collection of non-empty closed subsets of a compact space $T$ such that every finite subcollection of $\mathcal{F}$ has a non-empty intersection. Then the intersection of all the sets from $\mathcal{F}$ is non-empty.

Proof Suppose that the intersection of all the sets from $\mathcal{F}$ is empty, and let $\mathcal{U}$ be the collection of their complements,

$$\mathcal{U} := \{T \setminus F : F \in \mathcal{F}\}.$$

Then $\mathcal{U}$ is an open cover of $T$, since

$$T \setminus \bigcup_{U \in \mathcal{U}} U = \bigcap_{U \in \mathcal{U}} (T \setminus U) = \bigcap_{F \in \mathcal{F}} F = \emptyset.$$

Thus $\mathcal{U}$ has a finite subcover $U_1, \ldots, U_n$, where $U_i = T \setminus F_i$ with $F_i \in \mathcal{F}$, which implies that

$$\emptyset = T \setminus \bigcup_{i=1}^{n} U_i = \bigcap_{i=1}^{n} F_i,$$

a contradiction.

Lemma C.11 Any closed subset $K$ of a compact space $T$ is compact.

Proof Let $\{U_\alpha\}_{\alpha \in A}$ be an open cover of $K$. Then $\{U_\alpha\}_{\alpha \in A} \cup \{T \setminus K\}$ is an open cover of $T$, so it has a finite subcover

$$\{U_{\alpha_j}\}_{j=1}^{n} \qquad\text{or}\qquad \{U_{\alpha_j}\}_{j=1}^{n} \cup \{T \setminus K\}.$$

Either way, $\{U_{\alpha_j}\}_{j=1}^{n}$ is a finite subcover of $K$, so $K$ is compact.

Corollary C.12 Let $F_1 \supset F_2 \supset F_3 \supset \cdots$ be non-empty closed subsets of a compact space $T$. Then $\bigcap_{j=1}^{\infty} F_j \neq \emptyset$.

Lemma C.13 If $K$ is a sequentially compact subset of a metric space, then, given any open cover $\mathcal{U}$ of $K$, there exists $\delta > 0$ such that for any $x \in K$ there exists $U \in \mathcal{U}$ such that $B(x, \delta) \subset U$. (Such a $\delta$ is called a Lebesgue number for the cover $\mathcal{U}$.)

Proof Suppose that $\mathcal{U}$ is an open cover of $K$ for which no such $\delta$ exists. Then for every $\varepsilon > 0$ there exists $x \in K$ such that $B(x, \varepsilon)$ is not contained in any element of $\mathcal{U}$.

Choose $x_n \in K$ such that $B(x_n, 1/n)$ is not contained in any element of $\mathcal{U}$. Then $(x_n)$ has a convergent subsequence, $x_{n_j} \to x \in K$. Since $\mathcal{U}$ covers $K$, $x \in U$ for some element $U \in \mathcal{U}$.

Since $U$ is open, $B(x, \varepsilon) \subset U$ for some $\varepsilon > 0$. But now take $j$ sufficiently large that $d(x_{n_j}, x) < \varepsilon/2$ and $1/n_j < \varepsilon/2$. Then $B(x_{n_j}, 1/n_j) \subset B(x, \varepsilon) \subset U$, contradicting the definition of $x_{n_j}$.

Theorem C.14 A subset $K$ of a metric space $(X, d)$ is sequentially compact if and only if it is compact.

Proof Step 1. Compactness implies sequential compactness.

Let $(x_j)$ be a sequence in a compact set $K$. Consider the sets $F_n$ defined by setting

$$F_n = \overline{\{x_n, x_{n+1}, \ldots\}}.$$

The sets $F_n$ are a decreasing sequence of non-empty closed subsets of $K$, so by Corollary C.12 we can find

$$x \in \bigcap_{j=1}^{\infty} F_j.$$

We now show that there is a subsequence that converges to $x$:

● since $x \in \overline{\{x_j : j \ge 1\}}$ there exists $j_1$ such that $d(x_{j_1}, x) < 1$;
● since $x \in \overline{\{x_j : j > j_1\}}$ there exists $j_2 > j_1$ such that $d(x_{j_2}, x) < 1/2$;
● continue in this way to find $j_k > j_{k-1}$ such that $d(x_{j_k}, x) < 1/k$.

Then $(x_{j_k})$ is a subsequence of $(x_j)$ that converges to $x$.

Step 2. Sequential compactness implies compactness.

First we show that for every $\varepsilon > 0$ there is a cover of $K$ by a finite number of sets of the form $B(x_j, \varepsilon)$ for some $x_j \in K$. Suppose that this is not true, and that we have found $\{x_1, \ldots, x_n\}$ such that

$$d(x_i, x_j) \ge \varepsilon \quad\text{for all } i \neq j,\ i, j = 1, \ldots, n.$$

Since the collection $B(x_j, \varepsilon)$ does not cover $K$, there exists $x_{n+1} \in K$ such that $\{x_1, \ldots, x_{n+1}\}$ all satisfy

$$d(x_i, x_j) \ge \varepsilon \quad\text{for all } i \neq j,\ i, j = 1, \ldots, n + 1.$$

Therefore the sequence $(x_j)$ has no Cauchy subsequence, so no convergent subsequence, which contradicts the sequential compactness of $K$.

Now, given any open cover $\mathcal{U}$ of $K$, let $\delta$ be the Lebesgue number from Lemma C.13 for this cover, and choose finitely many points $y_1, \ldots, y_N \in K$ such that the balls $B(y_i, \delta)$ cover $K$. Then $B(y_i, \delta) \subset U_i$ for some $U_i \in \mathcal{U}$, and we have

$$K \subset \bigcup_{i=1}^{N} B(y_i, \delta) \subset \bigcup_{i=1}^{N} U_i,$$

so we have found a finite subcover.

A topology $\mathcal{T}$ on $T$ is called metrisable if it coincides with the open sets for some metric $d$ defined on $T$. Since compactness is defined only in terms of open sets, and the notion of convergence used for sequential compactness can be defined solely in terms of open sets, it follows that in any metrisable topological space compactness and sequential compactness are equivalent.
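The first part of Step 2 is constructive in spirit: keep choosing points that are at least $\varepsilon$ away from all earlier ones until the $\varepsilon$-balls cover. The sketch below runs that greedy selection on a finite sample of the unit square. Both the sample and the name `greedy_eps_net` are assumptions made for illustration; a finite sample only imitates a compact set and does not prove anything about one.

```python
import math


def greedy_eps_net(points, eps):
    """Pick centres pairwise at least eps apart until every point of the
    sample lies in some ball B(centre, eps), as in Step 2 of the proof."""
    centres = []
    for p in points:
        # p becomes a new centre only if no existing ball B(c, eps) contains it
        if all(math.dist(p, c) >= eps for c in centres):
            centres.append(p)
    return centres


# A finite sample of the unit square [0, 1]^2 (an assumption for illustration).
points = [(i / 10, j / 10) for i in range(11) for j in range(11)]
eps = 0.25
centres = greedy_eps_net(points, eps)

# Every sample point lies within eps of some centre ...
covered = all(any(math.dist(p, c) < eps for c in centres) for p in points)
# ... and distinct centres are at least eps apart.
separated = all(math.dist(a, b) >= eps
                for i, a in enumerate(centres) for b in centres[i + 1:])
```

In the proof the process must stop after finitely many steps because an infinite $\varepsilon$-separated sequence would have no convergent subsequence; in the sketch it stops simply because the sample is finite.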


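C.5 Tychonoff’s Theorem

Suppose that $\{T_\alpha : \alpha \in A\}$ is a collection of topological spaces, with $\mathcal{T}_\alpha$ the topology on $T_\alpha$. Then the product of the $\{T_\alpha\}$,
\[ T = \prod_{\alpha \in A} T_\alpha, \]
is the collection of all functions $x : A \to \bigcup_{\alpha \in A} T_\alpha$ such that $x(\alpha) \in T_\alpha$ for each $\alpha \in A$. The product topology on $T$ is the smallest topology such that for each $\alpha \in A$ the map $\pi_\alpha : T \to T_\alpha$ defined by setting $\pi_\alpha(x) := x(\alpha)$ is continuous. This is the topology with sub-basis
\[ \{\pi_\alpha^{-1}(U) : U \in \mathcal{T}_\alpha\}, \qquad \alpha \in A, \]
where $\pi_\alpha^{-1}(U) = \{x \in T : x(\alpha) \in U\}$. The topology is therefore formed of unions of sets of the form
\[ \{x \in T : x(\alpha_i) \in U_i,\ i = 1, \ldots, n\} = \bigcap_{i=1}^n \pi_{\alpha_i}^{-1}(U_i), \qquad U_i \in \mathcal{T}_{\alpha_i},\ \alpha_1, \ldots, \alpha_n \in A. \tag{C.2} \]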

(Note that this basis does not consist of all sets of the form $\{x \in T : x(\alpha) \in U_\alpha \text{ for each } \alpha\}$ with $U_\alpha \in \mathcal{T}_\alpha$; these sets define the ‘box topology’ on $T$, and in general $T$ is not compact in this topology.) Tychonoff’s Theorem guarantees that if all of the $T_\alpha$ are compact, then so is the product $T$ (with the product topology). In order to prove the theorem we will need a preparatory lemma.

Lemma C.15 Let $(T, \mathcal{T})$ be a topological space. If $T$ is not compact, then it has a maximal open cover $\mathcal{V}$ that has no finite subcover. Moreover, if $(S, \mathcal{S})$ is any topological space, then for any continuous $\pi : (T, \mathcal{T}) \to (S, \mathcal{S})$
(i) the collection $\mathcal{U} := \{U \in \mathcal{S} : \pi^{-1}(U) \in \mathcal{V}\}$ contains no finite cover of $S$; and
(ii) if $U \subset S$ is open and not in $\mathcal{U}$, then $\pi^{-1}(S \setminus U)$ can be covered by finitely many sets from $\mathcal{V}$.

Proof We use Zorn’s Lemma to guarantee the existence of such a $\mathcal{V}$. If $T$ is not compact, then it has an open cover $\mathcal{V}_0$ with no finite subcover. Let $\mathcal{E}$ be the collection of all open covers of $T$ that contain $\mathcal{V}_0$ and have no finite subcover; define a partial order $\preceq$ on $\mathcal{E}$ so that $\mathcal{W} \preceq \mathcal{W}'$ if $\mathcal{W} \subseteq \mathcal{W}'$. Given any chain $\mathfrak{C} \subset \mathcal{E}$, let
\[ \mathcal{Z} := \bigcup_{\mathcal{C} \in \mathfrak{C}} \mathcal{C}. \]
Then $\mathcal{Z} \supseteq \mathcal{C}$ for every $\mathcal{C} \in \mathfrak{C}$, and so provided that $\mathcal{Z} \in \mathcal{E}$ it is an upper bound for $\mathfrak{C}$. Certainly $\mathcal{Z}$ is an open cover of $T$ that contains $\mathcal{V}_0$, and it has no finite subcover: if it did, then this subcover would consist of sets $\{U_i : i = 1, \ldots, n\}$ with $U_i \in \mathcal{C}_i \in \mathfrak{C}$, and since $\mathfrak{C}$ is a chain we would have $U_i \in \mathcal{C}$ for all $i$ for some $\mathcal{C} \in \mathfrak{C}$. This would yield some $\mathcal{C} \in \mathcal{E}$ with a finite subcover, which is not possible. Since every chain has an upper bound, there exists a maximal element $\mathcal{V}$ in $\mathcal{E}$.

(i) If $\mathcal{U}$ contained a finite cover $\{U_1, \ldots, U_n\}$ of $S$, then $\{\pi^{-1}(U_i)\}_{i=1}^n$ would give a finite open (as $\pi$ is continuous) cover of $T$ using sets from $\mathcal{V}$, which is impossible.

(ii) If (ii) does not hold, then we claim that $\mathcal{V}' := \{\pi^{-1}(U)\} \cup \mathcal{V}$ is an open cover of $T$ that is strictly larger than $\mathcal{V}$ but that still does not have a finite subcover. It is clearly an open cover of $T$ (since $\mathcal{V}$ is), $\pi^{-1}(U) \notin \mathcal{V}$ since $U \notin \mathcal{U}$, and if $\pi^{-1}(S \setminus U)$ cannot be covered by finitely many sets from $\mathcal{V}$ then $\pi^{-1}(U) \cup \pi^{-1}(S \setminus U) = T$ cannot be covered by finitely many sets from $\{\pi^{-1}(U)\} \cup \mathcal{V}$, so $\mathcal{V}'$ has no finite subcover, contradicting the maximality of $\mathcal{V}$.

We can now prove Tychonoff’s Theorem.

Theorem C.16 (Tychonoff’s Theorem) The topological product of compact spaces is compact.

Proof Let $(T_\alpha)_{\alpha \in A}$ be a collection of compact topological spaces, and let
\[ T = \prod_{\alpha \in A} T_\alpha \]

be their topological product. Suppose that $T$ is not compact; then by Lemma C.15 it has a maximal open cover with no finite subcover, which we will call $\mathcal{V}$. Let $\mathcal{U}_\alpha$ be the collection of open sets $U$ in $T_\alpha$ such that $\pi_\alpha^{-1}(U) \in \mathcal{V}$. Since $T_\alpha$ is compact, if $\mathcal{U}_\alpha$ covered $T_\alpha$ it would have a finite subcover, but this cannot hold by part (i) of Lemma C.15; so $\mathcal{U}_\alpha$ does not cover $T_\alpha$. Therefore for each $\alpha \in A$ we can choose some $x(\alpha) \in T_\alpha$ that is not covered by $\mathcal{U}_\alpha$. It follows that any open set $U \subset T_\alpha$ that contains $x(\alpha)$ is not an element of $\mathcal{U}_\alpha$, and so, by part (ii) of Lemma C.15, $\pi_\alpha^{-1}(T_\alpha \setminus U)$ can be covered by finitely many sets from $\mathcal{V}$.

The point $x = (x(\alpha))_{\alpha \in A}$ belongs to some set $V \in \mathcal{V}$, since $\mathcal{V}$ is a cover of $T$. So we can find $\{\alpha_1, \ldots, \alpha_n\}$ and open sets $U_{\alpha_i} \subset T_{\alpha_i}$ with $x(\alpha_i) \in U_{\alpha_i}$ such that
\[ \bigcap_{i=1}^n \pi_{\alpha_i}^{-1}(U_{\alpha_i}) \]
is contained in $V$ (recall that open sets in $T$ are unions of sets of this form; see (C.2)). But now, using part (ii) of Lemma C.15, since each $U_{\alpha_i}$ is not in $\mathcal{U}_{\alpha_i}$ (as $x(\alpha_i) \in U_{\alpha_i}$) it follows that $\pi_{\alpha_i}^{-1}(T_{\alpha_i} \setminus U_{\alpha_i})$ can be covered by a finite number of sets from $\mathcal{V}$. Thus
\[ T = \left[ \bigcap_{i=1}^n \pi_{\alpha_i}^{-1}(U_{\alpha_i}) \right] \cup \bigcup_{i=1}^n \pi_{\alpha_i}^{-1}(T_{\alpha_i} \setminus U_{\alpha_i}) \]
can be covered by a finite collection of sets from $\mathcal{V}$, a contradiction.

C.6 The Banach–Alaoglu Theorem

We can now prove the Banach–Alaoglu Theorem. This is a ‘topological’ version of Theorem 27.11; unlike that theorem it does not say that bounded sequences in $X^*$ always have weakly-∗ convergent subsequences, although in the next section we will be able to deduce this sequential result from the topological theorem when $X$ is separable.

Theorem C.17 (Banach–Alaoglu Theorem) If $X$ is a Banach space, then the closed unit ball in $X^*$ is compact in the weak-∗ topology.

Proof Denote the closed unit ball in $X^*$ by $B_{X^*}$, and consider the topological space
\[ Y := \prod_{x \in X} \{\lambda \in \mathbb{K} : |\lambda| \le \|x\|\} \]
equipped with the product topology. By Tychonoff’s Theorem (Theorem C.16), $Y$ is a compact topological space. Elements of $Y$ are maps $f : X \to \mathbb{K}$ such that $|f(x)| \le \|x\|$ for every $x \in X$; the product topology is the smallest topology on $Y$ that makes all the individual evaluation maps $\delta_x : f \mapsto f(x)$ continuous. So restricted to $B_{X^*} \subset Y$ (the collection of all elements in $Y$ that are also linear maps) the product topology coincides with the weak-∗ topology.

Since $B_{X^*} \subset Y$ and $Y$ is compact, to show that $B_{X^*}$ is compact it suffices to show that $B_{X^*}$ is closed (in the product topology). To do this, observe that $B_{X^*}$ consists of all elements of $Y$ that are linear, i.e.
\[ B_{X^*} = \{\varphi \in Y : \varphi(\lambda x + \mu y) = \lambda\varphi(x) + \mu\varphi(y),\ \lambda, \mu \in \mathbb{K},\ x, y \in X\} = Y \cap \bigcap_{x, y \in X,\ \lambda, \mu \in \mathbb{K}} (\delta_{\lambda x + \mu y} - \lambda\delta_x - \mu\delta_y)^{-1}\{0\}. \]
Since all evaluation maps are continuous, so is $\delta_{\lambda x + \mu y} - \lambda\delta_x - \mu\delta_y$, and so each set in the intersection is the preimage of the closed set $\{0\}$ under a continuous map, and hence closed. Therefore $B_{X^*}$ is an intersection of closed sets, and so closed itself.

C.7 Metrisability of the Weak-∗ Topology in a Separable Space

We showed in Theorem C.14 that compactness and sequential compactness are equivalent in metrisable topologies. We can therefore obtain Theorem 27.11 from Theorem C.17 as a consequence of the following observation.

Lemma C.18 If $X$ is separable, then the weak-∗ topology on $B_{X^*}$, the closed unit ball in $X^*$, is metrisable; it can be derived from the metric
\[ d(f, g) = \sum_{j=1}^\infty \frac{|f(x_j) - g(x_j)|}{2^j (1 + \|x_j\|)}, \tag{C.3} \]
where $(x_j)_{j=1}^\infty$ is any countable dense subset of $X$.

Note that (i) open sets in $(B_{X^*}, \mathcal{T}_*)$ are unions of sets of the form
\[ \{\varphi \in B_{X^*} : \varphi(y_j) \in B(z_j, \varepsilon_j),\ j = 1, \ldots, n\}, \qquad y_j \in X,\ z_j \in \mathbb{K},\ \varepsilon_j > 0, \]
and (ii) that for each $j$ we have (from (C.3))
\[ |f(x_j) - g(x_j)| \le 2^j (1 + \|x_j\|)\, d(f, g). \]

Proof Suppose that $U \in \mathcal{T}_*$ and that $f \in U$. Then there exist $y_i \in X$, for $i = 1, \ldots, n$, and $\varepsilon > 0$ such that
\[ f \in \{\varphi \in B_{X^*} : |\varphi(y_i) - f(y_i)| < \varepsilon,\ i = 1, \ldots, n\} \subset U. \]
Now for each $y_i$ find $x_{j(i)}$ such that $\|y_i - x_{j(i)}\| < \varepsilon/3$, and set
\[ M = \max_{i = 1, \ldots, n} 2^{j(i)} (1 + \|x_{j(i)}\|). \]

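Then, if $d(\varphi, f) < \varepsilon/3M$, we have
\begin{align*}
|\varphi(y_i) - f(y_i)| &= |\varphi(y_i) - \varphi(x_{j(i)}) + \varphi(x_{j(i)}) - f(x_{j(i)}) + f(x_{j(i)}) - f(y_i)| \\
&\le |\varphi(y_i) - \varphi(x_{j(i)})| + |\varphi(x_{j(i)}) - f(x_{j(i)})| + |f(x_{j(i)}) - f(y_i)| \\
&\le \|y_i - x_{j(i)}\| + 2^{j(i)} (1 + \|x_{j(i)}\|)\, d(\varphi, f) + \|x_{j(i)} - y_i\| \\
&< \frac{\varepsilon}{3} + \frac{\varepsilon}{3} + \frac{\varepsilon}{3} = \varepsilon,
\end{align*}
where we have used the fact that $\|\varphi\| \le 1$ and $\|f\| \le 1$. So $U$ is open in $(B_{X^*}, d)$.

Conversely, consider the open ball in $(B_{X^*}, d)$ given by $U := \{\varphi \in B_{X^*} : d(\varphi, f) < \varepsilon\}$. Given any $g \in U$ set $\delta = d(g, f)$ and $r = \varepsilon - \delta > 0$. Note that for any $\varphi \in B_{X^*}$ we have
\[ \frac{|\varphi(x_j) - g(x_j)|}{2^j (1 + \|x_j\|)} \le \frac{2\|x_j\|}{2^j (1 + \|x_j\|)} \le 2^{-(j-1)}, \]
so that, taking $N$ such that $2^{-(N-1)} < r/2$, we obtain
\[ \sum_{j=N+1}^\infty \frac{|\varphi(x_j) - g(x_j)|}{2^j (1 + \|x_j\|)} \le \sum_{j=N+1}^\infty 2^{-(j-1)} = 2^{-(N-1)} < r/2. \]
It follows that if
\[ \psi \in \bigcap_{j=1}^N \{\varphi \in B_{X^*} : |\varphi(x_j) - g(x_j)| < r/2\}, \]
then
\[ d(\psi, g) = \sum_{j=1}^N \frac{|\psi(x_j) - g(x_j)|}{2^j (1 + \|x_j\|)} + \sum_{j=N+1}^\infty \frac{|\psi(x_j) - g(x_j)|}{2^j (1 + \|x_j\|)} < \frac{r}{2} \sum_{j=1}^N 2^{-j} + \frac{r}{2} < r, \]
and so $d(\psi, f) \le d(\psi, g) + d(g, f) < r + \delta = \varepsilon$, i.e. $\psi \in U$. Since the intersection above is a weak-∗ open set containing $g$, and $g \in U$ was arbitrary, $U \in \mathcal{T}_*$.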

Solutions to Exercises

Chapter 1

1.1 Dividing by $b^p$ and setting $x = a/b$ it suffices to show that
\[ (1 + x)^p \le 2^{p-1} (1 + x^p). \]
The maximum of $f(x) := (1 + x)^p / (1 + x^p)$ occurs when
\[ f'(x) = \frac{p(1 + x)^{p-1}(1 + x^p) - p(1 + x)^p x^{p-1}}{(1 + x^p)^2} = 0, \]
i.e. when $(1 + x^p) - (1 + x)x^{p-1} = 0$, which happens at $x = 1$. Thus $f(x) \le f(1) = 2^{p-1}$, which yields the required inequality.

1.2 The only thing that is not immediate is that $f + g \in \tilde{L}^p(0, 1)$ whenever we take $f, g \in \tilde{L}^p(0, 1)$. First we have $f + g \in C(0, 1)$ as the sum of two continuous functions is continuous, and then we use the result of the previous exercise to guarantee that
\[ \int_0^1 |f(x) + g(x)|^p \,\mathrm{d}x \le 2^{p-1} \int_0^1 |f(x)|^p + |g(x)|^p \,\mathrm{d}x < \infty. \]
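The inequality from Exercise 1.1 can be sanity-checked numerically. The following is a minimal sketch, not part of the original solution: the helper name `ratio` and the sample grid are illustrative assumptions, and a numerical check is of course no substitute for the proof above.

```python
import itertools

def ratio(a: float, b: float, p: float) -> float:
    """Return (a + b)^p / (2^(p-1) (a^p + b^p)); Exercise 1.1 says this is at most 1."""
    return (a + b) ** p / (2 ** (p - 1) * (a ** p + b ** p))

# Illustrative sample values (assumed, not taken from the text).
grid = [0.1, 0.5, 1.0, 2.0, 7.5]
exponents = [1.0, 1.5, 2.0, 3.0, 10.0]

# The largest ratio over the grid; equality (ratio 1) occurs when a == b.
worst = max(ratio(a, b, p) for a, b, p in itertools.product(grid, grid, exponents))
```

The maximum is attained on the diagonal $a = b$, which corresponds to the critical point $x = a/b = 1$ found in the solution.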

1.3 Suppose that $v = \sum_{j=1}^n \alpha_j e_j = \sum_{j=1}^n \beta_j e_j$; then $\sum_{j=1}^n (\alpha_j - \beta_j) e_j = 0$, and since the $\{e_j\}$ are linearly independent it follows that $\alpha_j - \beta_j = 0$ for each $j$, i.e. $\alpha_j = \beta_j$ for each $j$.

1.4 Let $E := \{e_j\}_{j=1}^n$ be a basis for $V$, and $F := \{f_i\}_{i=1}^m$ another basis for $V$. Suppose that $m > n$; otherwise we can reverse the roles of the two bases. Since $\mathrm{Span}(F) = V$, we can write
\[ e_1 = \sum_{i=1}^m \alpha_i f_i; \tag{S.1} \]
since $e_1 \ne 0$, there is a $k \in \{1, \ldots, m\}$ such that $\alpha_k \ne 0$, and so we have
\[ f_k = \alpha_k^{-1} e_1 - \sum_{i \ne k} \alpha_k^{-1} \alpha_i f_i. \tag{S.2} \]
We can therefore replace $f_k$ by $e_1$ to get a new basis $F'$ for $V$: $F'$ still spans $V$ because of (S.2), and is still linearly independent, since if
\[ \beta_k e_1 + \sum_{i \ne k} \beta_i f_i = 0 \]
then, using (S.1),
\[ \beta_k \sum_{i=1}^m \alpha_i f_i + \sum_{i \ne k} \beta_i f_i = 0, \]
which yields
\[ \beta_k \alpha_k f_k + \sum_{i \ne k} (\beta_k \alpha_i + \beta_i) f_i = 0, \]
so $\beta_k = 0$ (since $\alpha_k \ne 0$), and then $\beta_i = 0$ for all $i \ne k$.

We can repeat this procedure in turn for each element of the basis $E$, ending up with a new basis $F^{(n)}$ that still contains $m$ elements but into which we have swapped all of the $n$ elements of $E$. Since $E$ spans $V$ and $F^{(n)}$ contains $m - n > 0$ further elements, $F^{(n)}$ cannot be linearly independent, which contradicts the fact that it is a basis. So $F$ cannot contain more than $n$ elements.

1.5 If $x, y \in \mathrm{Ker}(T)$ and $\alpha \in \mathbb{K}$, then
\[ T(x + y) = Tx + Ty = 0 + 0 = 0 \qquad \text{and} \qquad T(\alpha x) = \alpha T x = 0, \]
so $\mathrm{Ker}(T)$ is a vector space. If $\xi, \eta \in \mathrm{Range}(T)$, then $\xi = Tx$ and $\eta = Ty$ for some $x, y \in X$; then $T(x + y) = \xi + \eta$ and $T(\alpha x) = \alpha T x = \alpha\xi$, and so $\mathrm{Range}(T)$ is also a vector space.

1.6 Since $U$ is a vector space, we have
\[ [x] + [y] = x + U + y + U = x + y + U = [x + y] \]
and
\[ \lambda[x] = \lambda x + \lambda U = \lambda x + U = [\lambda x]. \]
These two properties show immediately that the map $x \mapsto [x]$ is linear.

1.7 This is clearly true if $n = 1$. Suppose that it is true for $n = m$, i.e. there are indices $j, k \in \{1, \ldots, m\}$ with $c_j \preceq c_i \preceq c_k$ for $i = 1, \ldots, m$. Then since $C$ is a chain we have $c_{m+1} \preceq c_j$ or $c_j \preceq c_{m+1}$; in the former case set $j' = m + 1$, in the latter set $j' = j$. Similarly we have $c_k \preceq c_{m+1}$ or $c_{m+1} \preceq c_k$; in the former case set $k' = m + 1$ and in the latter set $k' = k$. Then we obtain $c_{j'} \preceq c_i \preceq c_{k'}$ for $i = 1, \ldots, m + 1$.

1.8 Let $P$ be the collection of all linearly independent subsets of $V$ that contain $Z$. We define a partial order on $P$ as in the proof of Theorem 1.28. Arguing just as we did there, any chain in $P$ has an upper bound, so $P$ has a maximal element. This is a maximal linearly independent set, so is a Hamel basis for $V$ that contains $Z$.

Chapter 2

2.1 If $\eta = 0$ the inequality is immediate, so take $\eta \ne 0$. Since $\sum_{j=1}^n (\xi_j - \lambda\eta_j)^2 \ge 0$ for every $\lambda \in \mathbb{R}$, it follows that
\[ 0 \le \sum_{j=1}^n \left( \xi_j^2 - 2\lambda\xi_j\eta_j + \lambda^2\eta_j^2 \right) = \sum_{j=1}^n \xi_j^2 - 2\lambda \sum_{j=1}^n \xi_j\eta_j + \lambda^2 \sum_{j=1}^n \eta_j^2. \]
Now choose $\lambda = \left( \sum_j \xi_j\eta_j \right) / \left( \sum_j \eta_j^2 \right)$ and rearrange.
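The choice of $\lambda$ in Exercise 2.1 can be traced through on a concrete pair of vectors. This is a small sketch using exact rational arithmetic; the function name `cauchy_schwarz_gap` and the sample vectors are assumptions made for illustration only.

```python
from fractions import Fraction

def cauchy_schwarz_gap(xi, eta):
    """For eta != 0, return (lam, residual, gap) where
    lam      = sum(xi*eta) / sum(eta^2)       (the choice made in the solution),
    residual = sum((xi_j - lam*eta_j)^2)      (non-negative by construction),
    gap      = sum(xi^2) - (sum(xi*eta))^2 / sum(eta^2).
    Expanding the square shows residual == gap, which is the Cauchy-Schwarz inequality."""
    s_xx = sum(x * x for x in xi)
    s_xy = sum(x * y for x, y in zip(xi, eta))
    s_yy = sum(y * y for y in eta)
    lam = s_xy / s_yy
    residual = sum((x - lam * y) ** 2 for x, y in zip(xi, eta))
    gap = s_xx - s_xy ** 2 / s_yy
    return lam, residual, gap

# Hypothetical sample vectors, chosen only for illustration.
xi = [Fraction(1), Fraction(2), Fraction(3)]
eta = [Fraction(4), Fraction(5), Fraction(6)]
lam, residual, gap = cauchy_schwarz_gap(xi, eta)
```

Because `residual` is a sum of squares, its equality with `gap` is exactly the rearrangement that the solution leaves to the reader.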

2.2 Once we prove that this expression provides a metric on $X_1 \times X_2$ it is enough to proceed by induction. The only thing that is not trivial to check is the triangle inequality in the case $p \ne \infty$, and for this we use (2.10): taking any three points $(x_1, x_2), (y_1, y_2), (z_1, z_2) \in X_1 \times X_2$ we have
\begin{align*}
\varrho_p((x_1, x_2), (y_1, y_2)) &= \left[ d_1(x_1, y_1)^p + d_2(x_2, y_2)^p \right]^{1/p} \\
&\le \left( [d_1(x_1, z_1) + d_1(z_1, y_1)]^p + [d_2(x_2, z_2) + d_2(z_2, y_2)]^p \right)^{1/p} \\
&\le \left[ d_1(x_1, z_1)^p + d_2(x_2, z_2)^p \right]^{1/p} + \left[ d_1(z_1, y_1)^p + d_2(z_2, y_2)^p \right]^{1/p} \\
&= \varrho_p((x_1, x_2), (z_1, z_2)) + \varrho_p((z_1, z_2), (y_1, y_2)).
\end{align*}
To go from the first to second line, we use the triangle inequality in $X_1$ and in $X_2$; to go from the second to third line, we use (2.10); to go from the third to the fourth line, we use the definition of $\varrho_p$.

2.3 Clearly $\hat{d}(x, y) \ge 0$, $\hat{d}(x, y) = 0$ if and only if $d(x, y) = 0$ if and only if $x = y$, and $\hat{d}(x, y) = \hat{d}(y, x)$. It remains only to check the triangle inequality, and this follows from the fact that the map $t \mapsto t/(1 + t)$ is monotonically increasing: since $d$ satisfies the triangle inequality, $d(x, y) \le d(x, z) + d(z, y)$, we have
\begin{align*}
\hat{d}(x, y) = \frac{d(x, y)}{1 + d(x, y)} &\le \frac{d(x, z) + d(z, y)}{1 + d(x, z) + d(z, y)} \\
&\le \frac{d(x, z)}{1 + d(x, z)} + \frac{d(z, y)}{1 + d(z, y)} = \hat{d}(x, z) + \hat{d}(z, y).
\end{align*}
(To see that $t \mapsto t/(1 + t)$ is increasing, note that the function $f(t) = t/(1 + t)$ has derivative $f'(t) = 1/(1 + t)^2 > 0$.)

2.4 Clearly $d(x, y) \ge 0$, and if $d(x, y) = 0$, then $|x_j - y_j| = 0$ for every $j$ and so $x = y$. That $d(x, y) = d(y, x)$ is clear. To check the triangle inequality, note that
\begin{align*}
d(x, y) &= \sum_{j=1}^\infty 2^{-j} \frac{|x_j - y_j|}{1 + |x_j - y_j|} \\
&\le \sum_{j=1}^\infty 2^{-j} \frac{|x_j - z_j| + |z_j - y_j|}{1 + |x_j - z_j| + |z_j - y_j|} \\
&\le \sum_{j=1}^\infty 2^{-j} \left( \frac{|x_j - z_j|}{1 + |x_j - z_j|} + \frac{|z_j - y_j|}{1 + |z_j - y_j|} \right) \\
&= d(x, z) + d(z, y),
\end{align*}
using the facts that $|x_j - y_j| \le |x_j - z_j| + |z_j - y_j|$ and the map $t \mapsto t/(1 + t)$ is monotonically increasing in $t$.


If $d(x^{(n)}, y) \to 0$, then
\[ \sum_{j=1}^\infty 2^{-j} \frac{|x_j^{(n)} - y_j|}{1 + |x_j^{(n)} - y_j|} \to 0; \]
in particular, each term converges to zero, so we must have $x_j^{(n)} \to y_j$ as $n \to \infty$ for each $j$. Conversely, given any $\varepsilon > 0$, choose $M$ such that $\sum_{j=M+1}^\infty 2^{-j} < \varepsilon/2$, and then choose $N$ such that
\[ |x_j^{(n)} - y_j| < \varepsilon/2, \qquad j = 1, \ldots, M, \]
for all $n \ge N$. Then for $n \ge N$ we have
\[ d(x^{(n)}, y) = \sum_{j=1}^\infty 2^{-j} \frac{|x_j^{(n)} - y_j|}{1 + |x_j^{(n)} - y_j|} \le \frac{\varepsilon}{2} \sum_{j=1}^M 2^{-j} + \sum_{j=M+1}^\infty 2^{-j} \le \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon, \]
and so $x^{(n)} \to y$.

2.5 If $\{F_j\}_{j=1}^n$ are closed, then $X \setminus F_j$ are open, and
\[ X \setminus \bigcup_{j=1}^n F_j = \bigcap_{j=1}^n X \setminus F_j \]
is open since the finite intersection of open sets is open: this shows that $\bigcup_{j=1}^n F_j$ is closed. Similarly, if $\{F_\alpha\}_{\alpha \in A}$ is any family of closed sets, then $\{X \setminus F_\alpha\}_{\alpha \in A}$ is a family of open sets, so
\[ X \setminus \bigcap_{\alpha \in A} F_\alpha = \bigcup_{\alpha \in A} X \setminus F_\alpha \]
is open since any union of open sets is open: this shows that $\bigcap_\alpha F_\alpha$ is closed.

2.6 Take any $y \in B(x, r)$ and let $\varepsilon = r - d(y, x) > 0$. If $z \in B(y, \varepsilon)$, then
\[ d(z, x) \le d(z, y) + d(y, x) < \varepsilon + d(y, x) = r, \]
and so $B(y, \varepsilon) \subseteq B(x, r)$, which shows that $B(x, r)$ is open.

2.7 A set $U$ is open in a metric space $(X, d)$ if for every $x \in U$ there exists $\varepsilon_x > 0$ such that $B(x, \varepsilon_x) \subseteq U$. So we can write
\[ U = \bigcup_{x \in U} B(x, \varepsilon_x) \]
(every $x \in U$ is contained in the right-hand side, and all sets in the union on the right-hand side are subsets of $U$).

2.8 If $f : (X, d_X) \to (Y, d_Y)$ is continuous and $x_n \to x$ in $X$, then, given $\varepsilon > 0$, since $f$ is continuous at $x$ there exists $\delta > 0$ such that $d_X(x_n, x) < \delta$ implies that $d_Y(f(x_n), f(x)) < \varepsilon$.


Now, since $x_n \to x$ we can choose $N$ such that $d_X(x_n, x) < \delta$ for all $n \ge N$; this implies that $d_Y(f(x_n), f(x)) < \varepsilon$ for all $n \ge N$, so $f(x_n) \to f(x)$ in $Y$ as claimed.

If $f$ is not continuous at $x$, then there exists $\varepsilon > 0$ such that for every $\delta > 0$ there exists $y \in X$ with $d_X(y, x) < \delta$ but $d_Y(f(y), f(x)) \ge \varepsilon$. For each $n$ we set $\delta = 1/n$ and thereby obtain a sequence $(x_n)$ in $X$ such that $d_X(x_n, x) < 1/n$ and $d_Y(f(x_n), f(x)) \ge \varepsilon$: so we have found a sequence $(x_n)$ such that $x_n \to x$ but $f(x_n) \not\to f(x)$.

2.9 If $A$ is a countable dense subset of $(X, d_X)$ and $B$ is a countable dense subset of $(Y, d_Y)$, then $A \times B$ is a countable subset of $X \times Y$. To show that it is dense when we use the metric $\varrho_p$ with $1 \le p < \infty$, given $(x, y) \in X \times Y$ and $\varepsilon > 0$ choose $(a, b) \in A \times B$ such that
\[ d_X(a, x) < 2^{-1/p}\varepsilon \qquad \text{and} \qquad d_Y(b, y) < 2^{-1/p}\varepsilon, \]
and then
\[ \varrho_p((a, b), (x, y))^p = d_X(a, x)^p + d_Y(b, y)^p < \varepsilon^p. \]
For $p = \infty$ we find $(a, b) \in A \times B$ with $d_X(a, x) < \varepsilon$ and $d_Y(b, y) < \varepsilon$.

2.10 Suppose that $\bigcap_{\alpha \in A} F_\alpha$ is empty. Then
\[ X = X \setminus \bigcap_{\alpha \in A} F_\alpha = \bigcup_{\alpha \in A} X \setminus F_\alpha. \]
So $\{X \setminus F_\alpha\}_{\alpha \in A}$ is an open cover of $X$. Since $X$ is compact, there is a finite subcover,
\[ X = \bigcup_{j=1}^n X \setminus F_{\alpha_j} = X \setminus \bigcap_{j=1}^n F_{\alpha_j}; \]
this implies that $\bigcap_{j=1}^n F_{\alpha_j} = \emptyset$, but this contradicts the assumption that such an intersection is always non-empty. So $\bigcap_\alpha F_\alpha$ must be non-empty.

2.11 The sets $\{F_j\}$ have the finite intersection property from the previous exercise, since for any finite collection $\bigcap_{j=1}^k F_{n_j} \supseteq F_m \ne \emptyset$, where $m = \max_j n_j$. It follows that $\bigcap_j F_j \ne \emptyset$.

2.12 If $\alpha = \sup(S)$, then for any $\varepsilon > 0$ there exists $x \in S$ such that $x > \alpha - \varepsilon$. In particular, there exist $x_n \in S$ such that $\alpha - 1/n < x_n \le \alpha$, and so $x_n \to \alpha$. Since $S$ is closed, $\alpha \in S$.

2.13 Take any closed subset $A$ of $X$. Then, since $A$ is a closed subset of a compact set, it is compact, and then since $f$ is continuous, $f(A)$ is a compact subset of $Y$. It follows that $f(A)$ is a closed subset of $Y$. Now, since $f$ is a bijection, $f(A) = (f^{-1})^{-1}(A)$; since $(f^{-1})^{-1}(A)$ is therefore a closed subset of $Y$ it follows using Lemma 2.13 that $f^{-1}$ is continuous.

An alternative, less topological proof proceeds via contradiction. Suppose that $f^{-1}$ is not continuous: then there exist $\varepsilon > 0$, $y \in Y$, and a sequence $(y_n)$ in $Y$ such that
\[ d_Y(y_n, y) \to 0 \qquad \text{but} \qquad d_X(f^{-1}(y_n), f^{-1}(y)) \ge \varepsilon. \tag{S.3} \]


Since $X$ is compact, there is a subsequence of $f^{-1}(y_n)$ such that $f^{-1}(y_{n_j}) \to x$. Since $f$ is continuous, it follows that $y_{n_j} \to f(x)$, so $y = f(x)$. Since $f$ is injective, it follows that $f^{-1}(y) = x$, but then we have $f^{-1}(y_{n_j}) \to f^{-1}(y)$, contradicting (S.3).

2.14 For each $n$ cover $X$ by open balls of radius $1/n$. Since $X$ is compact, this cover has a finite subcover $\{B(y_i^{(n)}, 1/n)\}_{i=1}^{N_n}$. Take the centres of the balls in each of these finite subcovers, adding them successively (for each $n$) to our sequence $(x_j)$. (More explicitly, if we set $s_0 = 0$ and $s_n = \sum_{j=1}^n N_j$, then we let $x_{s_{n-1}+i} = y_i^{(n)}$ for all $1 \le i \le N_n$.) Now for any $x \in X$ and $\varepsilon > 0$ there exists an $n$ such that $1/n < \varepsilon$; if we set $M(\varepsilon) = s_n$, then there is an $x_j$ with $1 \le j \le M(\varepsilon)$ such that $d(x_j, x) < 1/n < \varepsilon$.

Chapter 3

3.1 We have
\[ d(x + z, y + z) = \|(x + z) - (y + z)\| = \|x - y\| = d(x, y) \]
and
\[ d(\alpha x, \alpha y) = \|\alpha x - \alpha y\| = |\alpha|\|x - y\| = |\alpha| d(x, y). \]
The metric from Exercise 2.4 does not satisfy the second of these two properties.

3.2 If $x_1, x_2 \in A + B$, then $x_i = a_i + b_i$, with $a_i \in A$ and $b_i \in B$, and for any $\lambda \in (0, 1)$
\[ \lambda x_1 + (1 - \lambda) x_2 = \lambda(a_1 + b_1) + (1 - \lambda)(a_2 + b_2) = (\lambda a_1 + (1 - \lambda) a_2) + (\lambda b_1 + (1 - \lambda) b_2) \in A + B. \]

3.3 A simple inductive argument shows that if $a, b \in C$, then $\lambda a + (1 - \lambda) b \in C$ for any $\lambda$ of the form $m 2^{-k}$ with $m = 0, \ldots, 2^k$. Given any $\lambda \in (0, 1)$ we can find $n_k$ with $0 \le n_k \le 2^k$ such that $n_k / 2^k \to \lambda$; since $C$ is closed it follows that
\[ \lambda a + (1 - \lambda) b = \lim_{k \to \infty} \left[ \frac{n_k}{2^k} a + \left( 1 - \frac{n_k}{2^k} \right) b \right] \in C. \]

3.4 If $f'' \ge 0$ on $(a, b)$, then $f'$ is increasing on $[a, b]$. Take $a < c < b$, where $c = (1 - t) a + t b$. Then, by the Mean Value Theorem, there exist $\xi \in (a, c)$ and $\eta \in (c, b)$ such that
\[ \frac{f(c) - f(a)}{c - a} = f'(\xi) \qquad \text{and} \qquad \frac{f(b) - f(c)}{b - c} = f'(\eta). \]
Since $f'$ is increasing,
\[ \frac{f(c) - f(a)}{c - a} \le \frac{f(b) - f(c)}{b - c}. \]
Since $c = (1 - t) a + t b$, we have $b - c = (1 - t)(b - a)$ and $c - a = t(b - a)$; it follows that
\[ (1 - t)(f(c) - f(a)) \le t(f(b) - f(c)), \]
i.e. $f((1 - t) a + t b) = f(c) \le (1 - t) f(a) + t f(b)$, and so $f$ is convex.


Convexity of $f(x) = e^x$ is now immediate, since $f''(x) = e^x > 0$. For convexity of $f(t) = |t|^q$ we first consider $t \ge 0$, where we have $f(t) = t^q$. This is twice differentiable, with $f''(t) = q(q - 1) t^{q-2} \ge 0$ on $[0, \infty)$, so $f$ is convex on $[0, \infty)$. A similar argument shows that $f$ is convex on $(-\infty, 0]$. So it only remains to check the required convexity inequality if $x < 0$ and $y > 0$, and then, using convexity on $[0, \infty)$,
\[ |t x + (1 - t) y|^q \le |t|x| + (1 - t)|y||^q \le t|x|^q + (1 - t)|y|^q. \]

3.5 The inequality is almost immediate, since $m = |x_i|$ for some $i$, and $|x_j| \le m$ for every $j = 1, \ldots, n$, so that $m^p \le \sum_{j=1}^n |x_j|^p \le n m^p$. Taking the $p$th root yields
\[ m \le \left( \sum_{j=1}^n |x_j|^p \right)^{1/p} \le n^{1/p} m, \]
and $n^{1/p} \to 1$ as $p \to \infty$. So $\|x\|_p \to m = \|x\|_\infty$ as $p \to \infty$.

3.6 If $x \in \ell^1$, then $x \in \ell^p$ for every $p \in [1, \infty]$ so $\|x\|_p < \infty$ for every $p \in [1, \infty]$. Fix $\varepsilon > 0$. Since $x \in \ell^1$, there exists $N$ such that
\[ \sum_{j=n+1}^\infty |x_j| < \varepsilon \qquad \text{for every } n \ge N. \]
In particular,
\[ \|x - (x_1, \ldots, x_n, 0, 0, 0, \ldots)\|_p \le \|x - (x_1, \ldots, x_n, 0, 0, 0, \ldots)\|_1 < \varepsilon \]
for every $n \ge N$. It follows that for every $p \in [1, \infty)$ we have
\[ \|x\|_p - \varepsilon \le \|(x_1, \ldots, x_n, 0, 0, 0, \ldots)\|_p = \|(x_1, \ldots, x_n)\|_p \le \|x\|_p. \tag{S.4} \]
In $\ell^\infty$ the inequality is immediate from the definition of the supremum: there exists an $N'$ such that for $n \ge N'$
\[ \|x\|_\infty - \varepsilon \le \max_{j=1,\ldots,n} |x_j| = \|(x_1, \ldots, x_n)\|_\infty \le \|x\|_\infty. \tag{S.5} \]
Now fix some $n \ge \max(N, N')$, and using Exercise 3.5 choose $p_0$ such that
\[ \|(x_1, \ldots, x_n)\|_\infty \le \|(x_1, \ldots, x_n)\|_p \le \|(x_1, \ldots, x_n)\|_\infty + \varepsilon \tag{S.6} \]
for all $p \ge p_0$. Combining (S.4), (S.5), and (S.6) we obtain
\[ \|x\|_\infty - \varepsilon \le \|(x_1, \ldots, x_n)\|_p \le \|x\|_p \]
and
\[ \|x\|_p - \varepsilon \le \|(x_1, \ldots, x_n)\|_p \le \|(x_1, \ldots, x_n)\|_\infty + \varepsilon \le \|x\|_\infty + \varepsilon, \]
and so
\[ \|x\|_\infty - \varepsilon \le \|x\|_p \le \|x\|_\infty + 2\varepsilon \]
for all $p \ge p_0$, i.e. $\|x\|_p \to \|x\|_\infty$ as $p \to \infty$.
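The sandwich $m \le \|x\|_p \le n^{1/p} m$ from Exercise 3.5 is easy to observe numerically. The sketch below is illustrative only: the function name `p_norm` and the sample vector are assumptions, and the maximum is factored out purely to avoid floating-point overflow for large $p$.

```python
def p_norm(x, p):
    """Return the l^p norm of a finite sequence x for 1 <= p < infinity.

    The largest entry m is factored out first, so that only numbers in [0, 1]
    are raised to the power p."""
    m = max(abs(v) for v in x)
    if m == 0:
        return 0.0
    return m * sum((abs(v) / m) ** p for v in x) ** (1.0 / p)

# Hypothetical sample vector with n = 4 entries and max-norm m = 4.
x = [3.0, -4.0, 1.0, 4.0]
m = max(abs(v) for v in x)
n = len(x)

# For each p, record (lower bound, norm, upper bound) from the sandwich in the solution.
sandwich = [(m, p_norm(x, p), n ** (1.0 / p) * m) for p in (1, 2, 10, 100, 1000)]
```

As $p$ grows the upper bound $n^{1/p} m$ decreases to $m$, which squeezes $\|x\|_p$ towards $\|x\|_\infty$.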


3.7 Set xn = [1/n(log n)2 ]1/ p . Then ∞ 

|xn | p =

n=1

so x ∈  p but

∞ 

∞  n=1

∞ 

|xn |q =

n=1

n=1

1 0. We have already observed that 0 ∈ U , while if x = 0 we have εx/2x ∈ B X (0ε) ⊂ U , and then, since U is a subspace, x=

2x εx ∈ U. ε 2x

3.9 Suppose that $x, y \in \overline{U}$ and $\alpha \in \mathbb{K}$. Then there exist sequences $(x_n)$ and $(y_n)$ in $U$ such that $x_n \to x$ and $y_n \to y$. It follows that $x_n + y_n \to x + y$, so $x + y \in \overline{U}$, and $\alpha x_n \to \alpha x$, so $\alpha x \in \overline{U}$. This shows that $\overline{U}$ is also a linear subspace, and it is closed since $\overline{A}$ is closed for any set $A$.

3.10 If $x = (x_1, \ldots, x_n) \in \mathbb{R}^n$, then
\[ \max_{j=1,\ldots,n} |x_j| \le \sum_{j=1}^n |x_j| \le n \max_{j=1,\ldots,n} |x_j|, \]
i.e. $\|x\|_\infty \le \|x\|_1 \le n\|x\|_\infty$; and
\[ \max_{j=1,\ldots,n} |x_j| \le \left( \sum_{j=1}^n |x_j|^2 \right)^{1/2} \le \left[ n \left( \max_{j=1,\ldots,n} |x_j| \right)^2 \right]^{1/2} = n^{1/2} \max_{j=1,\ldots,n} |x_j|, \]
i.e. $\|x\|_\infty \le \|x\|_2 \le n^{1/2}\|x\|_\infty$. Combining these two ‘equivalences’ yields
\[ n^{-1/2}\|x\|_2 \le \|x\|_1 \le n\|x\|_2. \]

3.11 If $\|f_n - f\|_\infty \to 0$, then
\[ \|f_n - f\|_{L^p}^p = \int_0^1 |f_n(x) - f(x)|^p \,\mathrm{d}x \le \|f_n - f\|_\infty^p \to 0, \]
so $\|f_n - f\|_{L^p} \to 0$; pointwise convergence follows almost immediately, since for each $x \in [0, 1]$
\[ |f_n(x) - f(x)| \le \|f_n - f\|_\infty. \]

3.12 The linear span of the elements $\{e^{(j)}\}_{j=1}^\infty$ is dense in $c_0(\mathbb{K})$. Given any $x \in c_0(\mathbb{K})$ and $\varepsilon > 0$ there exists $N$ such that $|x_n| < \varepsilon$ for all $n \ge N$, and then for every $n \ge N$
\[ \left\| x - \sum_{j=1}^n x_j e^{(j)} \right\|_\infty = \sup_{j > n} |x_j| < \varepsilon. \]


This shows that $c_0(\mathbb{K})$ is separable using Lemma 3.23 (iii).

3.13 Let $A$ be a countable dense subset of $(X, \|\cdot\|_X)$, and let $\varphi : X \to Y$ be an isomorphism. Then $\varphi(A) = \{\varphi(a) : a \in A\}$ is a countable subset of $Y$ that is dense in $Y$: given any $y \in Y$ and $\varepsilon > 0$, we know that $y = \varphi(x)$ for some $x \in X$; since $A$ is dense in $X$ we can find $a \in A$ such that $\|x - a\|_X < \varepsilon$, and then, since $\varphi$ is an isomorphism,
\[ \|\varphi(x) - \varphi(a)\|_Y \le c_2 \|x - a\|_X < c_2\varepsilon. \]

3.14 Since $A \subset B$, we have $\mathrm{clin}(A) \subseteq \mathrm{clin}(B)$. Given any $x \in \mathrm{clin}(B)$ and $\varepsilon > 0$, we have
\[ \left\| x - \sum_{j=1}^n \alpha_j b_j \right\| < \varepsilon/2 \qquad \text{for some } n \in \mathbb{N},\ \alpha_j \in \mathbb{K},\ b_j \in B. \]
Since $A$ is dense in $B$, for each $b_j$ there exists $a_j \in A$ with $\|a_j - b_j\| < \varepsilon/2n|\alpha_j|$, and then
\[ \left\| x - \sum_{j=1}^n \alpha_j a_j \right\| \le \left\| x - \sum_{j=1}^n \alpha_j b_j \right\| + \sum_{j=1}^n |\alpha_j| \|b_j - a_j\| < \varepsilon, \]
which shows that $x \in \mathrm{clin}(A)$ and therefore $\mathrm{clin}(B) \subseteq \mathrm{clin}(A)$. So if $A$ is dense in $B$ then $\mathrm{clin}(A) = \mathrm{clin}(B)$. Now if $B$ is dense in $X$ then $\mathrm{clin}(B) = X$, and so if in addition $A$ is dense in $B$, then $\mathrm{clin}(A) = \mathrm{clin}(B) = X$.

3.15 We use Cantor’s diagonal argument. Suppose that $\mathbf{b}$ is countable; then there is a bijection $\varphi : \mathbb{N} \to \mathbf{b}$. Consider the sequence $(x^{(j)})$ in $\ell^\infty$ with $x^{(j)} := \varphi(j)$; every element of $\mathbf{b}$ must occur as some $x^{(j)}$ since $\varphi$ is a bijection. However, if we set
\[ x_j = \begin{cases} 1 & x_j^{(j)} = 0 \\ 0 & x_j^{(j)} = 1, \end{cases} \]
then this defines an element $x \in \mathbf{b}$, but $x \ne x^{(j)}$ for any $j$, since $x_j \ne x_j^{(j)}$. So $\mathbf{b}$ must be uncountable.

3.16 Let $A = \{a_j\}_{j=1}^\infty$ be a countable set whose linear span is dense in $X$ and let $X_n = \mathrm{Span}(a_1, \ldots, a_n)$. Then $X_n$ has dimension at most $n$, and if $x \in X$ we can find elements $y_n$ in the linear span of $A$ such that $y_n \to x$ as $n \to \infty$. Since each element of the linear span of $A$ is contained in one of the spaces $X_n$, it follows that $\overline{\bigcup_n X_n} = X$.

On the other hand, if $X = \overline{\bigcup_n X_n}$, with each $X_n$ finite-dimensional, then for each $n$ find a basis $E_n$ for $X_n$, which will have a finite number of elements. Let $A = \bigcup_n E_n$; then $A$ is a countable set whose linear span is dense in $X$.

3.17 Suppose that $X$ is the closed linear span of a compact set $K$. Since $K$ is compact, it has a countable dense subset $A$ (see Exercise 2.14). Using Exercise 3.14 we have $\mathrm{clin}(A) = \mathrm{clin}(K) = X$, so $X$ is separable.

If $X$ is separable, then $X = \mathrm{clin}(\{x_j\}_{j=1}^\infty)$ for some countable set $\{x_j\}_{j=1}^\infty$. But this is the same as the closed linear span of $(x_n / n\|x_n\|)_{n=1}^\infty \cup \{0\}$, which is compact.


Chapter 4

4.1 (i) Taking $\varepsilon = 1$ in the definition of a Cauchy sequence there exists $N$ such that $\|x_n - x_m\| < 1$ for every $n, m \ge N$. In particular, $\|x_n - x_N\| < 1$ for every $n \ge N$. It follows that for every $n$ we have
\[ \|x_n\| \le \max\left( \|x_1\|, \ldots, \|x_{N-1}\|, \|x_N\| + 1 \right) \]
and so $(x_n)$ is bounded.

(ii) Given $\varepsilon > 0$ choose $N$ such that
\[ \|x_i - x_k\| < \varepsilon/2, \qquad i, k \ge N; \]
now choose $J$ such that $n_J \ge N$ and
\[ \|x_{n_j} - x\| < \varepsilon/2, \qquad j \ge J. \]
Then, for all $i \ge n_J$, we have
\[ \|x_i - x\| \le \|x_i - x_{n_J}\| + \|x_{n_J} - x\| < \varepsilon \]
and so $x_i \to x$ as claimed.

4.2 Suppose that $(x^{(k)})$ with $x^{(k)} = (x_1^{(k)}, x_2^{(k)}, \cdots)$ is a Cauchy sequence in $\ell^\infty(\mathbb{K})$. Then for every $\varepsilon > 0$ there exists an $N_\varepsilon$ such that

x (n) − x (m) ∞ = sup |x j j

(m)

− xj

| 0 for every x ∈ X . Since the function x  →  f (x)−x is continuous and K is compact,  f (x)−x attains its lower bound on K , i.e. inf  f (x) − x =  f (y) − y

x∈K

(S.8)


for some $y \in K$. But then $f(y) \in X$ and $\|f(f(y)) - f(y)\| < \|f(y) - y\|$, contradicting (S.8), so $f$ must have a fixed point in $K$. The uniqueness follows as in the standard theorem.

4.7 Integrating (4.10) from $0$ to $t$ we obtain (4.11). Conversely, setting $t = 0$ in (4.11) we obtain $x(0) = x_0$; the Fundamental Theorem of Calculus ensures that $x$ is differentiable with derivative $\dot{x}(t) = f(x(t))$.

4.8 The space $C([0, T])$ is complete (with the supremum norm), so we can apply the Contraction Mapping Theorem in this space. We have
\begin{align*}
|(Jx)(t) - (Jy)(t)| &= \left| \int_0^t f(x(s)) \,\mathrm{d}s - \int_0^t f(y(s)) \,\mathrm{d}s \right| \\
&\le \int_0^t |f(x(s)) - f(y(s))| \,\mathrm{d}s \\
&\le L \int_0^t |x(s) - y(s)| \,\mathrm{d}s \le L \int_0^t \|x - y\|_\infty \,\mathrm{d}s \le L T \|x - y\|_\infty.
\end{align*}
Since this holds for every $t \in [0, T]$, it follows that
\[ \|Jx - Jy\|_\infty \le L T \|x - y\|_\infty, \]
and so $J$ is a contraction on $C([0, T])$ for any $T < 1/L$.

Now observe that the time of existence given by this argument does not depend on $x(0)$; we obtain existence on $[0, 1/2L]$ for any $x(0)$. Set $x_0(t) = x(t)$ for all $t \in [0, 1/2L]$. To extend the existence time, observe that if $x_n(t)$ solves $\dot{x}_n = f(x_n)$ with $x_n(0) = x_{n-1}(1/2L)$, then
\[ \hat{x}(t) = x_k(t - k/2L), \qquad k/2L \le t < (k + 1)/2L, \]
solves $\dot{\hat{x}} = f(\hat{x})$ with $\hat{x}(0) = x(0)$ for all $t \ge 0$.

4.9 If $(f_n)$ is Cauchy, then for any $\varepsilon > 0$ there exists $N$ such that $d(f_i, f_j) =$

∞   f i − f j [−n,n] 1 n we must have gn = gm |[−n,n] we can unambiguously define g ∈ C(R) by setting g(x) = gn (x),

x ∈ [−n, n],

and then $f_i$ converges uniformly to $g$ on each interval $[-n, n]$, so $f_i \to g$ in the metric $d$ and $C(\mathbb{R})$ is complete. Just as in Exercise 3.1 this metric does not come from a norm, since it does not have the homogeneity property $d(\lambda x, \lambda y) = |\lambda| d(x, y)$.
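The contraction argument of Exercise 4.8 can be illustrated by iterating the map $(Jx)(t) = x_0 + \int_0^t f(x(s))\,\mathrm{d}s$ on a grid. The following is a rough sketch under stated assumptions: the function name `picard_iterates`, the trapezoidal quadrature, the grid size, and the test problem $f(x) = -x$ are all choices made here for illustration and do not come from the text.

```python
import math

def picard_iterates(f, x0, T, steps=1000, iterations=30):
    """Iterate (Jx)(t) = x0 + int_0^t f(x(s)) ds on a uniform grid over [0, T].

    The integral is approximated with the trapezoidal rule, so the result is a
    discrete approximation of the fixed point of J, not the fixed point itself."""
    h = T / steps
    x = [x0] * (steps + 1)              # initial guess: the constant function x0
    for _ in range(iterations):
        fx = [f(v) for v in x]
        new_x = [x0]
        integral = 0.0
        for i in range(1, steps + 1):
            integral += 0.5 * h * (fx[i - 1] + fx[i])   # trapezoidal rule on [t_{i-1}, t_i]
            new_x.append(x0 + integral)
        x = new_x
    return x

# Hypothetical test problem: f(x) = -x has Lipschitz constant L = 1, so T = 0.5 < 1/L.
T = 0.5
approx = picard_iterates(lambda v: -v, 1.0, T)

# Compare with the exact solution x(t) = exp(-t) at the grid points.
error = max(abs(a - math.exp(-T * i / 1000)) for i, a in enumerate(approx))
```

With $LT = 1/2$ each application of $J$ at least halves the distance to the fixed point in the supremum norm, so thirty iterations leave only the quadrature error.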


Chapter 5

5.1 Note that since $K$ is non-empty there exists some $x \in K$; since $K$ is symmetric $-x \in K$, and since $K$ is convex it follows that $\frac{1}{2}(x + (-x)) = 0 \in K$. Since $K$ is open, it follows that $K$ contains some ball $B(0, r)$, using the usual Euclidean norm on $\mathbb{R}^n$; since $K$ is bounded $K \subset B(0, R)$. Hence for every non-zero $x \in \mathbb{R}^n$ we have $\frac{r}{2\|x\|} x \in K$ and $\frac{2R}{\|x\|} x \notin K$, so
\[ \frac{1}{2R}\|x\| < N(x) < \frac{2}{r}\|x\| \]
and $N(x) = 0$ if and only if $x = 0$. Taking any non-zero $\lambda \in \mathbb{R}$ we have
\begin{align*}
N(\lambda x) &= \inf\{M > 0 : M^{-1}\lambda x \in K\} = \inf\{M > 0 : M^{-1}|\lambda| x \in K\} \\
&= |\lambda| \inf\{(M/|\lambda|) > 0 : M^{-1}|\lambda| x \in K\} = |\lambda| \inf\{L > 0 : L^{-1} x \in K\} = |\lambda| N(x),
\end{align*}
where we have used the fact that $K$ is symmetric.

Finally, we show that the ‘closed unit ball’ $B = \{x : N(x) \le 1\}$ is convex. If $x, y \in B$, then for any $\alpha > 1$ we have $\alpha^{-1} x \in K$ and $\alpha^{-1} y \in K$. Since $K$ is convex, for any $\lambda \in [0, 1]$
\[ \alpha^{-1}(\lambda x + (1 - \lambda) y) = \lambda\alpha^{-1} x + (1 - \lambda)\alpha^{-1} y \in K, \]
which shows that $N(\lambda x + (1 - \lambda) y) \le \alpha$. Since this holds for all $\alpha > 1$, it follows that $N(\lambda x + (1 - \lambda) y) \le 1$, so $B$ is convex and $N$ is a norm on $\mathbb{R}^n$.

5.2 We have $\|x\|_W = \|Tx\|_V \ge 0$, and if $0 = \|x\|_W = \|Tx\|_V$, then $Tx = 0$ (since $\|\cdot\|_V$ is a norm on $V$), which implies that $x = 0$ since $T$ is an isomorphism (and $T0 = 0$). We also have
\[ \|\alpha x\|_W = \|T(\alpha x)\|_V = \|\alpha T x\|_V = |\alpha|\|Tx\|_V = |\alpha|\|x\|_W, \]
since $T$ is linear. Finally, the triangle inequality follows, since for every $x, y \in W$ we have
\[ \|x + y\|_W = \|T(x + y)\|_V = \|Tx + Ty\|_V \le \|Tx\|_V + \|Ty\|_V = \|x\|_W + \|y\|_W. \]
That $T$ is an isometry is now immediate from the construction, and so $T$ is a bijective linear isometry, i.e. an isometric isomorphism (see Definition 3.19).

5.3 Theorem 5.2 guarantees that any finite-dimensional normed space (and hence any finite-dimensional subspace of a normed space) is complete. It follows from Lemma 4.3 that any finite-dimensional subspace of a Banach space must be closed.

5.4 For any $x \in X \setminus Y$ there exist $(y_n)$ in $Y$ such that
\[ \|x - y_n\| \to \mathrm{dist}(x, Y) \]
as $n \to \infty$. In particular, $(y_n)$ is a bounded sequence, and so, as $Y$ is finite-dimensional, there is a convergent subsequence $(y_{n_j})$ with $y_{n_j} \to y$, for some $y \in Y$ (we know that $Y$ is closed by the previous exercise). Since $x \mapsto \|x\|$ is continuous (Lemma 3.14), it follows that $\|x - y\| = \mathrm{dist}(x, Y)$.


5.5 For any $y \in Y$ and $\alpha \ne 0$
\[ \|\alpha x - y\| = |\alpha| \left\| x - \frac{y}{\alpha} \right\|; \]
since $Y$ is a linear space $Y/\alpha = Y$, so
\[ \mathrm{dist}(\alpha x, Y) = \inf_{y \in Y} \|\alpha x - y\| = |\alpha| \inf_{z \in Y} \|x - z\| = |\alpha|\,\mathrm{dist}(x, Y). \]
Similarly for any $w \in Y$ and $y \in Y$ we have
\[ \|(x + w) - y\| = \|x - (y - w)\|; \]
since $Y - w = Y$ it follows that
\[ \mathrm{dist}(x + w, Y) = \inf_{y \in Y} \|(x + w) - y\| = \inf_{z \in Y} \|x - z\| = \mathrm{dist}(x, Y). \]

5.6 First choose any $w \in X \setminus Y$ and let $\delta := \mathrm{dist}(w, Y) > 0$. Using Exercise 5.4 there exists $z \in Y$ such that $\|w - z\| = \delta$. Using the result of the previous exercise, if we set $w' = r w/\delta$ and $z' = r z/\delta$ (which is still in $Y$), then
\[ \|w' - z'\| = \mathrm{dist}(w', Y) = r. \]
Finally, we let $x = w' - z' + y$; since $y - z' \in Y$ we have
\[ \|x - y\| = \|w' - z'\| = \mathrm{dist}(x, Y) = r, \]
as required.

5.7 Suppose that $X$ is a normed space with a countably infinite Hamel basis $\{e_j\}_{j=1}^\infty$. Set $X_n = \mathrm{Span}(e_1, \ldots, e_n)$; since $\{e_j\}$ is a Hamel basis every element of $X$ can be written as a finite linear combination of the $\{e_j\}$, so $X = \bigcup_{j=1}^\infty X_j$. Starting with $y_1 = e_1/\|e_1\|$, for each $n$ use the result of the previous exercise to find $y_n \in X_n$ such that
\[ \|y_n - y_{n-1}\| = \mathrm{dist}(y_n, X_{n-1}) = 3^{-n}. \]
Then $(y_n)$ is a Cauchy sequence, since if $m > n$, we have
\[ \|y_n - y_m\| \le \sum_{k=n+1}^m \|y_k - y_{k-1}\| = \sum_{k=n+1}^m 3^{-k} < \frac{3^{-n}}{2}. \]
On the other hand, if $m > n$, then
\begin{align*}
\mathrm{dist}(y_m, X_n) &\ge \mathrm{dist}(y_{n+1}, X_n) - \sum_{j=n+2}^m \|y_j - y_{j-1}\| \\
&= 3^{-(n+1)} - \sum_{j=n+2}^m 3^{-j} \ge 3^{-(n+1)} - \frac{3^{-(n+1)}}{2} = \frac{3^{-(n+1)}}{2} > 0,
\end{align*}
so $(y_n)$ cannot have a limit lying in $X_n$. Since this holds for every $n$, the sequence $(y_n)$ does not converge.
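The geometric-series estimate used for $\mathrm{dist}(y_m, X_n)$ in Exercise 5.7 can be checked with exact arithmetic. This is a minimal sketch; the helper names `tail_sum` and `lower_bound` and the tested ranges of $n$ and $m$ are assumptions made for illustration.

```python
from fractions import Fraction

def tail_sum(n: int, m: int) -> Fraction:
    """Sum of 3^(-j) for j = n+2, ..., m (an empty sum, i.e. 0, if m < n+2)."""
    return sum((Fraction(1, 3 ** j) for j in range(n + 2, m + 1)), Fraction(0))

def lower_bound(n: int, m: int) -> Fraction:
    """The quantity 3^(-(n+1)) - sum_{j=n+2}^m 3^(-j) bounding dist(y_m, X_n) from below."""
    return Fraction(1, 3 ** (n + 1)) - tail_sum(n, m)

# The solution claims this is at least 3^(-(n+1)) / 2 whenever m > n.
checks = [lower_bound(n, m) >= Fraction(1, 2 * 3 ** (n + 1))
          for n in range(1, 6) for m in range(n + 1, n + 20)]
```

The bound holds uniformly in $m$ because the full tail $\sum_{j \ge n+2} 3^{-j}$ equals exactly $3^{-(n+1)}/2$.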


Chapter 6

6.1 We have
\[ \binom{2k}{k} = \frac{(2k)!}{(k!)^2} \le \frac{e (2k)^{2k+1/2} e^{-2k}}{2\pi k^{2k+1} e^{-2k}} = \frac{e\, 2^{2k}}{\sqrt{2}\,\pi\sqrt{k}}. \]

6.2 We calculate the derivatives of $f(x) = (1 - x)^{1/2}$. We have
\[ f'(x) = -\frac{1}{2}(1 - x)^{-1/2}, \qquad f''(x) = -\frac{1}{2}\,\frac{1}{2}(1 - x)^{-3/2}, \qquad f'''(x) = -\frac{1}{2}\,\frac{1}{2}\,\frac{3}{2}(1 - x)^{-5/2}, \]
and in general
\[ f^{(n)}(x) = -\frac{1}{2^n}(2n - 3)!!\,(1 - x)^{-(2n-1)/2} = -\frac{(2(n - 1))!}{2^{2n-1}(n - 1)!}(1 - x)^{-(2n-1)/2}, \]
using the fact that $(2n - 3)!! = 1 \cdot 3 \cdot 5 \cdots (2n - 3) = (2(n - 1))!/2^{n-1}(n - 1)!$. This gives the Taylor expansion in terms up to $x^n$ as
\[ f(x) = \left[ 1 - \sum_{k=1}^n \frac{(2(k - 1))!}{2 \cdot 2^{2(k-1)}(k - 1)!\,k!} x^k \right] - \frac{(2n)!}{2 \cdot 2^{2n}(n!)^2}(1 - c)^{-(2n+1)/2}\, x (x - c)^n \]
for some $c \in (0, x)$. In particular, for $x \in [0, 1)$ the error term is bounded by
\[ \frac{(2n)!}{2 \cdot 2^{2n}(n!)^2}\,\frac{x (x - c)^n}{(1 - c)^{n+1/2}} \le \frac{(2n)!}{2 \cdot 2^{2n}(n!)^2}\,\frac{1}{(1 - x)^{1/2}} \le \frac{A(x)}{\sqrt{n}}, \]
where $A(x)$ is a constant depending only on $x$ (using the result of the previous exercise to bound the factorial terms), which tends to zero as $n \to \infty$.

6.3 Using the Weierstrass Approximation Theorem, given any $\varepsilon > 0$ we can find a polynomial $p(x) = \sum_{k=0}^n c_k x^k$ such that
\[ \sup_{x \in [-1, 1]} |p(x) - |x|| < \varepsilon. \]
Then, given any $f \in A$ with $\|f\|_\infty \le 1$, we have
\[ \sup_{x \in X} \left| \sum_{k=0}^n c_k f^k(x) - |f(x)| \right| < \varepsilon. \]
Since $A$ is an algebra, $\sum_{k=0}^n c_k f^k \in A$. It follows that $|f| \in A$. The rest of the lemma follows as before.

6.4 Consider $E_n := \{x \in [a, b] : f(x) - f_n(x) < \varepsilon\}$. Then, since $f - f_n$ is continuous, each $E_n$ is open, since it is the preimage of the open set $(-\infty, \varepsilon)$. Since $f_n \to f$ pointwise, $\bigcup_n E_n = [a, b]$; this gives an open cover of the compact set $[a, b]$, and so there is a finite subcover. Since the sets $E_n$ are nested, with $E_{n+1} \supseteq E_n$, the largest of the $E_j$ in this subcover is a cover itself, i.e. $E_N = [a, b]$ for some $N$, which shows that $|f(x) - f_n(x)| < \varepsilon$ for all $x \in [a, b]$ and all $n \ge N$, i.e. $f_n$ converges uniformly to $f$.


6.5 For any fixed $x \in [-1, 1]$ consider the function $f(p) = \tfrac12 x^2 + p - \tfrac12 p^2$. Then
\[
0 \le p \le |x| \le 1 \quad\Rightarrow\quad p \le f(p) \le |x|,
\]
and $p = f(p)$ if and only if $p = |x|$. Now for each fixed $x \in [-1, 1]$ the sequence $(p_n(x))$ satisfies $p_{n+1}(x) = f(p_n(x))$, and is therefore a positive, monotone, bounded sequence. It follows that for each fixed $x \in [-1, 1]$ we have $p_n(x) \to p_x$ for some $p_x \in [-1, 1]$; taking limits as $n \to \infty$ in $p_{n+1}(x) = f(p_n(x))$ yields $p_x = f(p_x)$, and so $p_x = |x|$. Dini's Theorem (the result of the previous exercise) now guarantees that $p_n(x)$ converges uniformly to $|x|$ on $[-1, 1]$. To apply this to give another proof of Lemma 6.7, observe that if $p_n(x)$ is a polynomial in $x$, then so is $p_{n+1}(x)$; we now approximate $|f(x)|$ by $p_n(f(x))$, which will converge uniformly for all $f$ with $\|f\|_\infty \le 1$.

6.6 The collection $A$ of all functions of the form $\sum_{i=1}^{n} f_i(x)g_i(y)$ is clearly closed under addition and multiplication. We have $1 \in A$, since we can take $n = 1$ and $f_1(x) = g_1(y) = 1$, so $A$ is an algebra. It is clear that $A$ separates points, since $x \in A$ and $y \in A$. The result now follows immediately from the Stone–Weierstrass Theorem.

6.7 The argument is very similar to that of the previous exercise. We let $A$ be the collection of all functions of the form $\sum_{i,j=1}^{n} a_{ij}x^i y^j$. This forms an algebra, and separates points since $x \in A$ and $y \in A$.

6.8 Take $f \in C([a, b] \times [c, d]; \mathbb{R})$ and $\varepsilon > 0$. Then, using the result of the previous exercise, we can find $n \in \mathbb{N}$ and $a_{ij} \in \mathbb{R}$ such that
\[
\Bigl\| f(x, y) - \sum_{i,j=1}^{n} a_{ij}x^i y^j \Bigr\|_\infty < \varepsilon.
\]
Therefore
\[
\left| \int_a^b\!\!\int_c^d f(x, y)\,dy\,dx - \int_a^b\!\!\int_c^d \sum_{i,j=1}^{n} a_{ij}x^i y^j\,dy\,dx \right| < (b-a)(d-c)\varepsilon.
\]
Now for any $i, j$, we have
\[
\int_a^b\!\!\int_c^d x^i y^j\,dy\,dx = \int_a^b x^i\left[\frac{d^{j+1} - c^{j+1}}{j+1}\right]dx = \frac{b^{i+1} - a^{i+1}}{i+1}\,\frac{d^{j+1} - c^{j+1}}{j+1} = \int_c^d \left[\frac{b^{i+1} - a^{i+1}}{i+1}\right] y^j\,dy = \int_c^d\!\!\int_a^b x^i y^j\,dx\,dy,
\]
and so
\[
\left| \int_a^b\!\!\int_c^d f(x, y)\,dy\,dx - \int_c^d\!\!\int_a^b f(x, y)\,dx\,dy \right| < 2\varepsilon(b-a)(d-c).
\]
Since this holds for any $\varepsilon > 0$, we obtain the required result.
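Returning to the iteration in 6.5, the monotone and uniform convergence of $p_n(x)$ to $|x|$ can be observed numerically. This is a minimal sketch under one stated assumption: the exercise statement is not reproduced here, so the starting polynomial is taken to be $p_0 = 0$. The function names are illustrative.

```python
def abs_approximants(x: float, steps: int) -> list[float]:
    """Iterate p_{n+1} = p_n + (x^2 - p_n^2)/2 and return [p_0, ..., p_steps]."""
    p = 0.0  # assumed starting value p_0 = 0 (not stated in the solution text)
    values = [p]
    for _ in range(steps):
        p = p + 0.5 * (x * x - p * p)
        values.append(p)
    return values


def sup_error(steps: int, grid: int = 401) -> float:
    """Largest value of |x| - p_steps(x) over an even grid on [-1, 1]."""
    xs = [-1.0 + 2.0 * i / (grid - 1) for i in range(grid)]
    return max(abs(x) - abs_approximants(x, steps)[-1] for x in xs)


if __name__ == "__main__":
    for n in (5, 20, 200):
        print(n, sup_error(n))
```

The printed errors decrease with the number of steps, consistent with the uniform convergence that Dini's Theorem provides.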


6.9 Take $z, z_0 \in S^1$ with $z = e^{ix}$ and $z_0 = e^{ix_0}$, where $x, x_0 \in [-\pi, \pi)$. Since $f$ is continuous, for any $\varepsilon > 0$ there exists $\delta > 0$ such that (i) $|x - x_0| < \delta$ ensures that $|f(x) - f(x_0)| < \varepsilon$ and (ii) either $|x - \pi| < \delta$ or $|x + \pi| < \delta$ implies that $|f(x) - f(\pi)| < \varepsilon/2$. Now, observe that
\[
|z - z_0|^2 = |e^{ix} - e^{ix_0}|^2 = (e^{ix} - e^{ix_0})(e^{-ix} - e^{-ix_0}) = 2(1 - \cos(x - x_0)).
\]
It follows that if $|z - z_0| < \delta/2$, then
\[
\cos(x - x_0) > 1 - \frac{\delta^2}{8}.
\]
If $\delta$ is small, then either $x - x_0$ is close to zero or close to $2\pi$. Since $\cos\theta \le 1 - \theta^2/2 + \theta^4/4 \le 1 - \theta^2/4$ for $|\theta| \le 1$, it follows in the first case that
\[
\frac{|x - x_0|^2}{4} < \frac{\delta^2}{8},
\]
and so $|x - x_0| < \delta$, which ensures that $|g(e^{ix}) - g(e^{ix_0})| = |f(x) - f(x_0)| < \varepsilon$. In the second case, when $x - x_0$ is close to $2\pi$, we have $x = -\pi + \xi$ and $x_0 = \pi - \xi_0$ (or vice versa) with $\xi > 0$ and $\xi_0 > 0$ small. Then $\cos(x - x_0) = \cos(\xi_0 + \xi - 2\pi) = \cos(\xi_0 + \xi)$ and the same argument as before guarantees that $|\xi_0 + \xi| < \delta$, from which it follows that
\[
|f(x) - f(x_0)| \le |f(x) - f(-\pi)| + |f(-\pi) - f(\pi)| + |f(\pi) - f(x_0)| < \frac{\varepsilon}{2} + 0 + \frac{\varepsilon}{2} = \varepsilon,
\]
and so once again $|g(e^{ix}) - g(e^{ix_0})| = |f(x) - f(x_0)| < \varepsilon$.

6.10 (i) Given $\varepsilon > 0$ let $\{x_1, \ldots, x_n\}$ be such that $A \subset \bigcup_{i=1}^{n} B(x_i, \varepsilon/2)$. Then we have $\overline{A} \subset \bigcup_{i=1}^{n} B(x_i, \varepsilon)$, since, given any $y \in \overline{A}$, there exists $x \in A$ with $\|x - y\| < \varepsilon/2$ and $x_i$ such that $\|x - x_i\| < \varepsilon/2$, so $\|y - x_i\| < \varepsilon$.

(ii) Let $(x_n)$ be a sequence in $A$. Find a cover of $A$ by finitely many balls of radius 1, and choose one of these, $B(y_1, 1)$, such that $x_n \in B(y_1, 1)$ for infinitely many $n$, giving a subsequence $(x_{n_{1,j}})_j \in B(y_1, 1)$. Now cover $A$ by finitely many balls of radius $1/2$, and from these choose $B(y_2, 1/2)$ that contains infinitely many of the $x_{n_{1,j}}$: these give a further subsequence $(x_{n_{2,j}})_j$ that is contained in $B(y_2, 1/2)$. Continue in this way to find successive subsequences $(x_{n_{k,j}})_j \in B(y_k, 2^{-k})$. The sequence $x_{n_{k,k}}$ is a Cauchy sequence, since for $j > k$ we know that $x_{n_{j,j}}$ is an element of the sequence $(x_{n_{k,j}})_j$, and all elements of this sequence are in $B(y_k, 2^{-k})$ so no more than $2^{-(k-1)}$ apart.

(iii) If $A$ is a totally bounded subset of a complete normed space $X$, then $\overline{A}$ is totally bounded (by part (i)) and closed, so any sequence in $\overline{A}$ has a Cauchy subsequence (since $\overline{A}$ is totally bounded) that converges to a limit (since $X$ is complete) that must lie in $\overline{A}$ (since $\overline{A}$ is closed), i.e. $\overline{A}$ is compact. Conversely, if $\overline{A}$ is compact, then it is totally bounded, and since $A \subset \overline{A}$ the set $A$ must be totally bounded too.


6.11 Suppose that $(f_n)$ is bounded and equicontinuous on $\mathbb{R}$. Then it is bounded and equicontinuous on each compact interval $[-N, N]$. We can use the Arzelà–Ascoli Theorem to extract a subsequence $(f_{n_{1,j}})_j$ such that $f_{n_{1,j}}$ converges uniformly on $[-1, 1]$; now take a further subsequence to ensure that $f_{n_{2,j}}$ converges uniformly on $[-2, 2]$, and continue in this way, finding subsequences such that $(f_{n_{k,j}})_j$ converges uniformly on $[-k, k]$. As in the proof of the Arzelà–Ascoli Theorem itself take the 'diagonal' subsequence $(f_{n_{k,k}})_k$ to obtain a subsequence of the initial $(f_n)$ that converges uniformly on every interval of the form $[-k, k]$ for $k \in \mathbb{N}$, and so on any bounded interval in $\mathbb{R}$.

6.12 First it is simple to show that $f_\delta$ is bounded, since
\[
|f_\delta(x)| \le \frac{1}{2\delta}\int_{x-\delta}^{x+\delta} |f(y)|\,dy \le \frac{1}{2\delta}\int_{x-\delta}^{x+\delta} \|f\|_\infty\,dy = \|f\|_\infty.
\]
Consider $x, z \in \mathbb{R}$. If $|x - z| \ge 2\delta$, then $(x - \delta, x + \delta) \cap (z - \delta, z + \delta) = \emptyset$ and
\[
|f_\delta(x) - f_\delta(z)| \le \frac{1}{2\delta}\int_{x-\delta}^{x+\delta} |f(y)|\,dy + \frac{1}{2\delta}\int_{z-\delta}^{z+\delta} |f(y)|\,dy \le 2\|f\|_\infty \le \frac{2\|f\|_\infty}{\delta}|x - z|.
\]
If $|x - z| < 2\delta$ and $z > x$, then $(x - \delta, x + \delta) \cap (z - \delta, z + \delta) = (z - \delta, x + \delta)$, and so
\[
|f_\delta(x) - f_\delta(z)| \le \frac{1}{2\delta}\int_{x-\delta}^{z-\delta} |f(y)|\,dy + \frac{1}{2\delta}\int_{x+\delta}^{z+\delta} |f(y)|\,dy \le \frac{\|f\|_\infty}{\delta}|z - x|.
\]
This shows that $|f_\delta(x) - f_\delta(z)| \le (2\|f\|_\infty/\delta)|x - z|$ and so $f_\delta$ is Lipschitz.

To show convergence on $[-R, R]$, first note that since $f$ is continuous it is uniformly continuous on any closed bounded interval (a compact subset of $\mathbb{R}$): so, given any $\varepsilon > 0$, there exists $\delta_0 > 0$ such that $x, y \in [-(R + 1), R + 1]$ with $|x - y| < \delta_0$ implies that $|f(x) - f(y)| < \varepsilon$. It follows that for any $x \in [-R, R]$, if $\delta < \min(\delta_0, 1)$ we have
\[
|f_\delta(x) - f(x)| = \left|\frac{1}{2\delta}\int_{x-\delta}^{x+\delta} f(y)\,dy - \frac{1}{2\delta}\int_{x-\delta}^{x+\delta} f(x)\,dy\right| \le \frac{1}{2\delta}\int_{x-\delta}^{x+\delta} |f(y) - f(x)|\,dy < \varepsilon,
\]
which gives the required convergence.

6.13 Since $x_n$ satisfies (6.9), it follows that
\[
|x_n(t)| \le |x_0| + \int_0^t \|f_n\|_\infty\,ds \le |x_0| + T\|f_n\|_\infty \le B := |x_0| + MT,
\]
so $(x_n)$ is a bounded sequence in $C([0, T])$. It is also equicontinuous, since for each $n$ and any $t, t' \in [0, T]$ with $t' > t$ we have
\[
|x_n(t') - x_n(t)| = \left|\int_t^{t'} f_n(x_n(s))\,ds\right| \le M(t' - t).
\]


The Arzelà–Ascoli Theorem guarantees that $(x_n)$ has a subsequence that converges uniformly on $[0, T]$ to some limiting function $x \in C([0, T])$. Since $f_n \to f$ uniformly on $[-B, B]$, it follows that $f_n(x_n(\cdot)) \to f(x(\cdot))$ uniformly on $[0, T]$, and so taking limits in (6.9) we obtain (6.10).

6.14 Approximate $f$ by Lipschitz functions $(f_n)$ using Exercise 6.12. Exercise 4.8 guarantees that the ODE $\dot{x}_n = f_n(x_n)$, $x_n(0) = x_0$, has a solution on $[0, T]$. From Exercise 4.7 this is also a solution of the integral equation
\[
x_n(t) = x_0 + \int_0^t f_n(x_n(s))\,ds, \qquad t \in [0, T].
\]
Using the previous exercise there exists $x \in C([0, T])$ that satisfies the limit equation (6.10), and by Exercise 4.7 this is the required solution of the ODE $\dot{x} = f(x)$ with $x(0) = x_0$.

Chapter 7

7.1 This is immediate from the estimate
\[
\|f_n - f\|_{L^1} = \int_a^b |f_n(x) - f(x)|\,dx \le \int_a^b \|f_n - f\|_\infty\,dx = (b - a)\|f_n - f\|_\infty.
\]

7.2 We have
\[
\int_0^1 |f_n(x)|\,dx = \int_0^{1/n} (1 - nx)\,dx = \frac{1}{2n} \to 0,
\]
but $f_n(0) = 1$ for every $n$, so $f_n$ does not converge pointwise to zero.

7.3 We have $g_n(0) = 0$ for every $n$, and for each $x \in (0, 1]$ we have $g_n(x) = 0$ for all $n > 2/x$. However,
\[
\int_0^1 |g_n(x)|\,dx = \int_0^{1/n} n^2 x\,dx + \int_{1/n}^{2/n} n(2 - nx)\,dx = 1.
\]
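The two facts used in 7.3 (pointwise convergence to zero, but $L^1$ norm equal to 1) can be checked exactly with rational arithmetic. This is a minimal sketch; the piecewise formula for $g_n$ below is an assumption read off from the two integrals in the solution (the exercise statement itself is not reproduced here), and the function names are illustrative.

```python
from fractions import Fraction


def g(n: int, x: Fraction) -> Fraction:
    """Assumed form of g_n: n^2 x on [0, 1/n], n(2 - n x) on (1/n, 2/n], else 0."""
    if 0 <= x <= Fraction(1, n):
        return n * n * x
    if Fraction(1, n) < x <= Fraction(2, n):
        return n * (2 - n * x)
    return Fraction(0)


def l1_norm(n: int) -> Fraction:
    """Exact integral of g_n over [0, 1] for n >= 2.

    g_n is piecewise linear with breakpoints at the nodes below,
    so the trapezoid rule on those nodes is exact.
    """
    nodes = sorted({Fraction(0), Fraction(1, n), Fraction(2, n), Fraction(1)})
    return sum((b - a) * (g(n, a) + g(n, b)) / 2 for a, b in zip(nodes, nodes[1:]))


if __name__ == "__main__":
    print([l1_norm(n) for n in (2, 3, 10)])
    print(g(100, Fraction(1, 10)))  # x = 1/10 and n = 100 > 2/x
```

For each fixed $x > 0$ the value $g_n(x)$ is eventually zero, while the integral stays equal to 1, so pointwise convergence does not imply convergence in $L^1$.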

7.4 Suppose that $\xi \in \mathscr{X}$; then there exists a sequence $(x_n) \in X$ such that $i(x_n) \to \xi$, and so, in particular,
\[
\|\xi\|_{\mathscr{X}} = \lim_{n\to\infty} \|i(x_n)\|_{\mathscr{X}} = \lim_{n\to\infty} \|x_n\|_X,
\]
since $i$ is an isometry. The convergence of $i(x_n)$ to $\xi$ means that $(i(x_n))$ must be Cauchy in $\mathscr{X}$, and using once again the fact that $i$ is an isometry it follows that $(x_n)$ is Cauchy in $X$. Since $i'$ is another isometry, $(i'(x_n))$ is Cauchy in $\mathscr{X}'$, and since $\mathscr{X}'$ is complete $i'(x_n) \to \xi' \in \mathscr{X}'$, from which it follows that
\[
\|\xi'\|_{\mathscr{X}'} = \lim_{n\to\infty} \|i'(x_n)\|_{\mathscr{X}'} = \lim_{n\to\infty} \|x_n\|_X = \|\xi\|_{\mathscr{X}}. \tag{S.9}
\]
We now define $j : \mathscr{X} \to \mathscr{X}'$ by setting $j(\xi) = \xi'$. This map is well defined, and (S.9) shows that $j$ is an isometry.

7.5 First note that it follows from the triangle inequality that
\[
|[i(x)](y)| = |d(y, x) - d(y, x_0)| \le d(x, x_0),
\]
so $i(x) \in F_b(X; \mathbb{R})$ as claimed. To show that $i$ is an isometry, we use the triangle inequality again to obtain
\[
[i(x)](y) - [i(x')](y) = [d(y, x) - d(y, x_0)] - [d(y, x') - d(y, x_0)] = d(y, x) - d(y, x') \le d(x, x'),
\]
so $\|i(x) - i(x')\|_\infty \le d(x, x')$. For the opposite inequality, choose $y = x'$, and then
\[
[i(x)](x') - [i(x')](x') = d(x', x),
\]
which shows that $\|i(x) - i(x')\|_\infty \ge d(x, x')$. We therefore obtain
\[
\|i(x) - i(x')\|_\infty = d(x, x')
\]
and $i$ is an isometry as claimed.

7.6 We certainly have $\|f\|_{L^\infty} \le \|f\|_\infty$, since the supremum can be taken over a smaller set for the $L^\infty$ norm (excluding a set of measure zero). Suppose that $E \subset \Omega$ is a set of measure zero such that
\[
|f(x)| \le \|f\|_{L^\infty}, \qquad x \in \Omega \setminus E.
\]

Since $E$ has measure zero, given any $x \in E$ there exists a sequence $(x_n) \in \Omega \setminus E$ such that $x_n \to x$; otherwise, there would be an open ball $B(x, \delta) \subset E$, and then $E$ would not have measure zero. But then $|f(x_n)| \le \|f\|_{L^\infty}$, and since $f$ is continuous it follows that $|f(x)| = \lim_{n\to\infty} |f(x_n)| \le \|f\|_{L^\infty}$, which shows that $\|f\|_\infty = \|f\|_{L^\infty}$ as claimed.

Chapter 8

8.1 We check properties (i)–(iv) from Definition 8.1.

(i) We have $(f, f) = \int |f|^2 = \|f\|_{L^2}^2 \ge 0$, and if $(f, f) = 0$, then $f = 0$ almost everywhere, i.e. $f = 0$ in $L^2$.

(ii) We have
\[
(f + g, h) = \int (f + g)\overline{h} = \int f\overline{h} + g\overline{h} = (f, h) + (g, h)
\]
using the linearity of the integral.

(iii) Again, the linearity of the integral yields
\[
(\alpha f, g) = \int \alpha f\overline{g} = \alpha\int f\overline{g} = \alpha(f, g).
\]

(iv) Finally, we have
\[
(f, g) = \int f\overline{g} = \overline{\int \overline{f}g} = \overline{(g, f)}.
\]

8.2 We use the Cauchy–Schwarz inequality in $L^2$ to write
\[
\int_\Omega |f(x)|\,dx \le \left(\int_\Omega |f(x)|^2\,dx\right)^{1/2}\left(\int_\Omega 1\,dx\right)^{1/2} \le |\Omega|\,\|f\|_{L^\infty},
\]
which immediately yields (8.11).

8.3 First we check that this really does define an inner product.


(i) $((x, \xi), (x, \xi))_{H\times K} = (x, x)_H + (\xi, \xi)_K \ge 0$ and if $(x, x)_H + (\xi, \xi)_K = 0$, then, since $(x, x)_H \ge 0$ and $(\xi, \xi)_K \ge 0$, we must have $x = \xi = 0$, and so $(x, \xi) = 0$.

(ii)
\[
((x, \xi) + (x', \xi'), (y, \eta))_{H\times K} = (x + x', y)_H + (\xi + \xi', \eta)_K = [(x, y)_H + (\xi, \eta)_K] + [(x', y)_H + (\xi', \eta)_K] = ((x, \xi), (y, \eta))_{H\times K} + ((x', \xi'), (y, \eta))_{H\times K}.
\]

(iii)
\[
(\alpha(x, \xi), (y, \eta))_{H\times K} = (\alpha x, y)_H + (\alpha\xi, \eta)_K = \alpha(x, y)_H + \alpha(\xi, \eta)_K = \alpha((x, \xi), (y, \eta))_{H\times K}.
\]

(iv)
\[
((x, \xi), (y, \eta))_{H\times K} = (x, y)_H + (\xi, \eta)_K = \overline{(y, x)_H} + \overline{(\eta, \xi)_K} = \overline{((y, \eta), (x, \xi))_{H\times K}}.
\]

It remains to check that $H \times K$ is complete with the induced norm. But from (i) the induced norm satisfies
\[
\|(x, \xi)\|_{H\times K}^2 = \|x\|_H^2 + \|\xi\|_K^2,
\]
and $H \times K$ is complete with this norm (see comments around (4.2)).

8.4 If (8.12) holds, then putting $x = y$ yields $\|Tx\|_K^2 = \|x\|_H^2$. For the reverse implication we use the polarisation identity (8.8) to obtain
\[
4(Tx, Ty)_K = \|Tx + Ty\|_K^2 - \|Tx - Ty\|_K^2 = \|T(x + y)\|_K^2 - \|T(x - y)\|_K^2 = \|x + y\|_H^2 - \|x - y\|_H^2 = 4(x, y)_H.
\]
(The proof in the complex case is essentially the same, but notationally more involved since we have to use the complex polarisation identity in (8.9).)

8.5 Take $x = (1, 0, 0, \ldots)$ and $y = (0, 1, 0, \ldots)$. Then $x + y = (1, 1, 0, \ldots)$ and $x - y = (1, -1, 0, \ldots)$. So $\|x\|_{\ell^p} = \|y\|_{\ell^p} = 1$ and $\|x + y\|_{\ell^p} = \|x - y\|_{\ell^p} = 2^{1/p}$; but
\[
\|x + y\|_{\ell^p}^2 + \|x - y\|_{\ell^p}^2 = 2^{1+2/p} \ne 4 = 2(\|x\|_{\ell^p}^2 + \|y\|_{\ell^p}^2)
\]
unless $p = 2$.

8.6 Consider $f(x) = x$ and $g(x) = 1 - x$, with
\[
\|f\|_{L^p} = \|g\|_{L^p} = \left(\int_0^1 x^p\,dx\right)^{1/p} = \frac{1}{(p + 1)^{1/p}}.
\]
Now $(f + g)(x) = 1$ so $\|f + g\|_{L^p} = 1$, and $(f - g)(x) = 2x - 1$, so
\[
\|f - g\|_{L^p} = \left(\int_0^1 |2x - 1|^p\,dx\right)^{1/p} = \frac{1}{(p + 1)^{1/p}}.
\]


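Now
\[
\|f + g\|_{L^p}^2 + \|f - g\|_{L^p}^2 = 1 + \frac{1}{(p + 1)^{2/p}} \ne \frac{4}{(p + 1)^{2/p}} = 2(\|f\|_{L^p}^2 + \|g\|_{L^p}^2)
\]
unless $p = 2$.

8.7 Using the parallelogram identity we have
\[
\|z - x\|^2 + \|z - y\|^2 = \left\|\left(z - \frac{x + y}{2}\right) + \frac{y - x}{2}\right\|^2 + \left\|\left(z - \frac{x + y}{2}\right) - \frac{y - x}{2}\right\|^2 = 2\left\|z - \frac{x + y}{2}\right\|^2 + 2\left\|\frac{y - x}{2}\right\|^2 = \frac12\|x - y\|^2 + 2\left\|z - \frac{x + y}{2}\right\|^2.
\]

8.8 To prove (ii), i.e. $\langle x, z\rangle + \langle y, z\rangle = \langle x + y, z\rangle$, we write
\begin{align*}
4[\langle x, z\rangle + \langle y, z\rangle] &= \|x + z\|^2 - \|x - z\|^2 + \|y + z\|^2 - \|y - z\|^2 \\
&= \left\|\left(z + \frac{x + y}{2}\right) + \frac{x - y}{2}\right\|^2 + \left\|\left(z + \frac{x + y}{2}\right) - \frac{x - y}{2}\right\|^2 \\
&\qquad - \left\|\left(z - \frac{x + y}{2}\right) - \frac{x - y}{2}\right\|^2 - \left\|\left(z - \frac{x + y}{2}\right) + \frac{x - y}{2}\right\|^2 \\
&= 2\left\|z + \frac{x + y}{2}\right\|^2 + 2\left\|\frac{x - y}{2}\right\|^2 - 2\left\|z - \frac{x + y}{2}\right\|^2 - 2\left\|\frac{x - y}{2}\right\|^2 \\
&= 2\left\|z + \frac{x + y}{2}\right\|^2 - 2\left\|z - \frac{x + y}{2}\right\|^2 \\
&= 8\left\langle \frac{x + y}{2}, z\right\rangle. \tag{S.10}
\end{align*}
Setting $y = 0$ shows that $\langle x, z\rangle = 2\langle x/2, z\rangle$ for every $x$, from which it follows from (S.10) that
\[
\langle x, z\rangle + \langle y, z\rangle = \langle x + y, z\rangle, \tag{S.11}
\]
as required.

To prove (iii), i.e. $\langle \alpha x, z\rangle = \alpha\langle x, z\rangle$, observe that we can now use (S.11) to deduce that for any $n \in \mathbb{N}$ we have $\langle nx, z\rangle = n\langle x, z\rangle$, and so also $\langle x, z\rangle = \langle mx/m, z\rangle = m\langle x/m, z\rangle$, which shows that
\[
\langle x/m, z\rangle = \frac{1}{m}\langle x, z\rangle.
\]
We also have $\langle x, z\rangle + \langle -x, z\rangle = 0$, so $\langle -x, z\rangle = -\langle x, z\rangle$. Combining these shows that $\langle \alpha x, z\rangle = \alpha\langle x, z\rangle$ for any $\alpha \in \mathbb{Q}$. That this holds for every $\alpha \in \mathbb{R}$ now follows from the fact that $x \mapsto \|x\|$ is continuous: given any $\alpha \in \mathbb{R}$, find $\alpha_n \in \mathbb{Q}$ such that $\alpha_n \to \alpha$, and then
\[
\alpha\langle x, z\rangle = \lim_{n\to\infty} \alpha_n\langle x, z\rangle = \lim_{n\to\infty} \langle \alpha_n x, z\rangle = \lim_{n\to\infty} \frac14\left(\|\alpha_n x + z\|^2 - \|\alpha_n x - z\|^2\right) = \frac14\left(\|\alpha x + z\|^2 - \|\alpha x - z\|^2\right) = \langle \alpha x, z\rangle.
\]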

8.9 Recall that $\|[x]\|_{H/U} = \inf_{u\in U} \|x + u\|$. We show that this norm satisfies the parallelogram identity and therefore we can define an inner product on $H/U$ using the polarisation identity. Note that
\[
\|[x] + [y]\|^2 + \|[x] - [y]\|^2 = \|[x + y]\|^2 + \|[x - y]\|^2
\]
for every $x, y \in H$, and observe that for any $\xi, \eta \in U$ we have
\begin{align*}
\|x + y + \xi\|^2 + \|x - y + \eta\|^2 &= \bigl\|\bigl(x + \tfrac12(\xi + \eta)\bigr) + \bigl(y + \tfrac12(\xi - \eta)\bigr)\bigr\|^2 + \bigl\|\bigl(x + \tfrac12(\xi + \eta)\bigr) - \bigl(y + \tfrac12(\xi - \eta)\bigr)\bigr\|^2 \\
&= 2\left(\|x + \tfrac12(\xi + \eta)\|^2 + \|y + \tfrac12(\xi - \eta)\|^2\right). \tag{S.12}
\end{align*}
Taking the infimum over $\xi \in U$ we obtain
\[
\|[x + y]\|^2 + \|x - y + \eta\|^2 \ge 2(\|[x]\|^2 + \|[y]\|^2),
\]
and then taking the infimum over $\eta \in U$ gives
\[
\|[x + y]\|^2 + \|[x - y]\|^2 \ge 2(\|[x]\|^2 + \|[y]\|^2). \tag{S.13}
\]
Returning to (S.12) and setting $\xi + \eta = 2\alpha$ and $\xi - \eta = 2\beta$ we have
\[
\|x + y + (\alpha + \beta)\|^2 + \|x - y + (\alpha - \beta)\|^2 = 2(\|x + \alpha\|^2 + \|y + \beta\|^2).
\]
Now we take first the infimum over $\alpha \in U$,
\[
\|[x + y]\|^2 + \|[x - y]\|^2 \le 2(\|[x]\|^2 + \|y + \beta\|^2),
\]
and then the infimum over $\beta \in U$ to give
\[
\|[x + y]\|^2 + \|[x - y]\|^2 \le 2(\|[x]\|^2 + \|[y]\|^2). \tag{S.14}
\]
Combining the two inequalities (S.13) and (S.14) shows that the norm on $H/U$ satisfies the parallelogram identity, and hence comes from an inner product. So $H/U$ is a Hilbert space.

8.10 From the parallelogram identity, if $\|x\| = \|y\| = 1$ and $\|x - y\| > \varepsilon$ then
\[
4 = 2(\|x\|^2 + \|y\|^2) = \|x + y\|^2 + \|x - y\|^2 > \|x + y\|^2 + \varepsilon^2,
\]
which implies that $\|x + y\|^2 < 4 - \varepsilon^2$, so
\[
\left\|\frac{x + y}{2}\right\| < \left(1 - \frac{\varepsilon^2}{4}\right)^{1/2} = 1 - \delta
\]
if we set $\delta = 1 - \sqrt{1 - \varepsilon^2/4} > 0$.


Chapter 9

9.1 First we check that all the elements of $E$ have norm 1. This is clear for $1/\sqrt{2\pi}$, and for the other elements we have
\[
\int_{-\pi}^{\pi} \cos^2 nt\,dt = \frac12\int_{-\pi}^{\pi} (1 + \cos 2nt)\,dt = \pi
\]
and similarly for $\sin^2 nt$. Now we check the orthogonality properties: for any $n, m$
\[
\int_{-\pi}^{\pi} \cos nt\,dt = \int_{-\pi}^{\pi} \sin nt\,dt = \int_{-\pi}^{\pi} \sin nt\cos mt\,dt = 0;
\]
and for any $n \ne m$
\[
\int_{-\pi}^{\pi} \cos nt\cos mt\,dt = \int_{-\pi}^{\pi} \sin nt\sin mt\,dt = \int_{-\pi}^{\pi} \sin nt\cos mt\,dt = 0.
\]

9.2 Throughout we use the equality
\[
\|x + \alpha y\|^2 = (x + \alpha y, x + \alpha y) = \|x\|^2 + \alpha(y, x) + \overline{\alpha}(x, y) + |\alpha|^2\|y\|^2 = \|x\|^2 + 2\,\mathrm{Re}\,[\alpha(y, x)] + |\alpha|^2\|y\|^2.
\]
The 'only if' parts of (i) and (ii) now follow immediately. For part (i), if $\|x + \alpha y\| \ge \|x\|$ for every $\alpha \in \mathbb{K}$ then taking $\alpha \in \mathbb{R}$ yields
\[
2\alpha\,\mathrm{Re}(y, x) + |\alpha|^2\|y\|^2 \ge 0,
\]
which implies that $\mathrm{Re}(y, x) = 0$, for otherwise we could invalidate the inequality by taking $\alpha$ sufficiently small; similarly taking $\alpha = i\beta$ with $\beta \in \mathbb{R}$ yields
\[
2\beta\,\mathrm{Im}(x, y) + |\beta|^2\|y\|^2 \ge 0,
\]
which implies that $\mathrm{Im}(x, y) = 0$. So $(x, y) = 0$. For part (ii), if $\|x + \alpha y\| = \|x - \alpha y\|$, then we obtain
\[
\alpha(y, x) + \overline{\alpha}(x, y) = 0
\]
for every $\alpha \in \mathbb{K}$. We take $\alpha \in \mathbb{R}$ and then $\alpha = i\beta$ with $\beta \in \mathbb{R}$ to deduce that $(x, y) = 0$.

9.3 Bessel's inequality guarantees that $|\{j : |(x, e_j)|^2 > M\}|\,M^2$
$m^2\|x\|^2$ elements, one can select $N$ elements $\{e_1, \ldots, e_N\}$ from $E_m$, and then
\[
\sum_{j=1}^{N} |(x, e_j)|^2 \ge N \times \frac{1}{m^2} > \|x\|^2.
\]
But this contradicts Bessel's inequality. Thus each $E_m$ contains only a finite number of elements, and hence
\[
\bigcup_{m=1}^{\infty} E_m = \{e \in E : (x, e) \ne 0\}
\]
contains at most a countable number of elements.

9.5 We write $u = \sum_{j=1}^{\infty} (u, e_j)e_j$ and $v = \sum_{j=1}^{\infty} (v, e_j)e_j$. Then by the continuity of the inner product we have
\begin{align*}
(u, v) &= \lim_{n\to\infty} \left(\sum_{j=1}^{n} (u, e_j)e_j, \sum_{k=1}^{n} (v, e_k)e_k\right) \\
&= \lim_{n\to\infty} \sum_{j,k=1}^{n} (u, e_j)(e_k, v)\delta_{jk} \\
&= \lim_{n\to\infty} \sum_{j=1}^{n} (u, e_j)(e_j, v) = \sum_{j=1}^{\infty} (u, e_j)(e_j, v).
\end{align*}

(Note that the Cauchy–Schwarz inequality guarantees that the sum converges.)

9.6 Take any sequence $(x^{(n)}) \in Q$. Then, if we let
\[
x^{(n)} = \sum_{j=1}^{\infty} \alpha_j^{(n)} e_j,
\]
we know that each sequence $(\alpha_j^{(n)})_n$ is bounded in $\mathbb{K}$. We can use a diagonal argument as in the proof of the Arzelà–Ascoli Theorem to find a subsequence $x^{(n_k)}$ such that $\alpha_j^{(n_k)} \to \alpha_j^*$ as $k \to \infty$ for each $j \in \mathbb{N}$. Note that $|\alpha_j^*| \le 1/j$. Now we define $x^* = \sum_{j=1}^{\infty} \alpha_j^* e_j$; we want to show that $x^{(n_k)} \to x^*$ as $k \to \infty$. Given $\varepsilon > 0$ find $M$ such that
\[
\sum_{j=M+1}^{\infty} \frac{1}{j^2} < \varepsilon^2/8.
\]
Now find $N$ such that for all $k \ge N$ we have
\[
|\alpha_j^{(n_k)} - \alpha_j^*|^2 < \varepsilon^2/2M, \qquad j = 1, \ldots, M;
\]
then for all $k \ge N$,
\begin{align*}
\|x^{(n_k)} - x^*\|^2 = \sum_{j=1}^{\infty} |\alpha_j^{(n_k)} - \alpha_j^*|^2 &= \sum_{j=1}^{M} |\alpha_j^{(n_k)} - \alpha_j^*|^2 + \sum_{j=M+1}^{\infty} |\alpha_j^{(n_k)} - \alpha_j^*|^2 \\
&\le M\,\frac{\varepsilon^2}{2M} + \sum_{j=M+1}^{\infty} \frac{4}{j^2} < \varepsilon^2,
\end{align*}
which shows that $\|x^{(n_k)} - x^*\| < \varepsilon$, and so $x^{(n_k)} \to x^*$ as claimed.

9.7 Since $\hat{H} := \operatorname{clin}(E)$ is a closed linear subspace of $H$, it is a Hilbert space itself. Now we can use Proposition 9.14 to deduce [(e) $\Rightarrow$ (b)] that any element of $\operatorname{clin}(E)$ can be written as $\sum_{j=1}^{\infty} \alpha_j e_j$ for some $(\alpha_j) \in \mathbb{K}$.

9.8 Since $\{e_j\}$ is an orthonormal basis for $H$, the linear span of the $\{e_j\}$ is dense in $H$. Noting that $e_j$ can be written as a linear combination of $f_1, \ldots, f_j$, it follows that the linear span of the $(f_j)$ is also dense in $H$.

Now consider $x \in H$ with $x = \sum_{j=1}^{\infty} j^{-1}e_j$. There is no expansion for $x$ of the form $x = \sum_{j=1}^{\infty} \alpha_j f_j$, since taking the inner product of both sides with $e_k$ would yield
\[
\frac{1}{k} = (x, e_k) = \sum_{j=1}^{\infty} \alpha_j(f_j, e_k) = \frac{1}{k}\sum_{j=k}^{\infty} \alpha_j,
\]
i.e. $\sum_{j=k}^{\infty} \alpha_j = 1$ for every $k \in \mathbb{N}$; but this is impossible.

9.9 We have $(x - y, z) = 0$ for every $z$ in a dense subset of $H$. So we can find $z_n$ such that $z_n \to x - y$, and then
\[
\|x - y\|^2 = (x - y, x - y) = \left(x - y, \lim_{n\to\infty} z_n\right) = \lim_{n\to\infty} (x - y, z_n) = 0,
\]
using the continuity of the inner product. It follows that $x = y$.

9.10 By Lemma 7.7 the set $P(a, b)$ of polynomials restricted to $(a, b)$ forms a dense subset of $L^2(a, b)$. The given equality shows that
\[
(p, f) = \int_a^b p(x)f(x)\,dx = \int_a^b p(x)g(x)\,dx = (p, g)
\]
for every $p \in P$. Since $P$ is dense in $L^2$, the result of the previous exercise now guarantees that $f = g$.

9.11 The functions $\{1\} \cup \{\cos k\pi x\}_{k=1}^{\infty}$ are orthogonal in $L^2(0, 1)$ by Example 9.11. We need only show that their linear span is dense in $L^2(0, 1)$. Corollary 6.5 shows that the linear span of these functions is uniformly dense in $C([0, 1])$, and we know that $C([0, 1])$ is dense in $L^2(0, 1)$, so, given any $f \in L^2(0, 1)$ and $\varepsilon > 0$, first find $g \in C([0, 1])$ such that
\[
\|f - g\|_{L^2} < \varepsilon/2,
\]
and then approximate $g$ by an expression
\[
h(x) = \sum_{k=0}^{n} c_k\cos(k\pi x)
\]
such that $\|g - h\|_\infty < \varepsilon/2$. Since $\|h - g\|_{L^2} = \bigl(\int_0^1 |h - g|^2\bigr)^{1/2} \le \|h - g\|_\infty$, it follows that
\[
\|f - h\|_{L^2} \le \|f - g\|_{L^2} + \|g - h\|_{L^2} < \varepsilon.
\]
The coefficients in the resulting expansion
\[
f(x) = \sum_{k=0}^{\infty} a_k\cos k\pi x
\]
can be found by taking the inner product with $\cos n\pi x$:
\[
a_0 = \int_0^1 f(x)\,dx \qquad\text{and}\qquad a_n = 2\int_0^1 f(x)\cos n\pi x\,dx, \quad n \ne 0.
\]

9.12 We can write $x = \sum_{k=-\infty}^{\infty} c_k e^{ikx}$ where the Fourier coefficients $(c_k)$ are given by (9.7):
\[
c_k = \frac{1}{2\pi}\int_{-\pi}^{\pi} x e^{ikx}\,dx.
\]
If $k = 0$ this yields $c_0 = 0$, while for $k \ne 0$ we obtain
\[
c_k = \frac{1}{2\pi}\left\{\left[\frac{x}{ik}e^{ikx}\right]_{-\pi}^{\pi} - \int_{-\pi}^{\pi} \frac{e^{ikx}}{ik}\,dx\right\} = \frac{(-1)^k}{ik}.
\]
To use the Parseval identity, recall that the functions $\frac{1}{\sqrt{2\pi}}e^{ikx}$ are orthonormal, so we have
\[
x = \sum_{k\ne 0} \frac{(-1)^k\sqrt{2\pi}}{ik}\,\frac{1}{\sqrt{2\pi}}e^{ikx};
\]
therefore
\[
4\pi\sum_{k=1}^{\infty} \frac{1}{k^2} = \int_{-\pi}^{\pi} x^2\,dx = \frac{2\pi^3}{3},
\]
which yields $\sum_{k=1}^{\infty} k^{-2} = \pi^2/6$.
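The Parseval computation in 9.12 can be checked numerically by comparing partial sums of the left-hand side with the value of $\int_{-\pi}^{\pi} x^2\,dx$. This is a minimal sketch with illustrative function names; it only confirms the arithmetic and does not replace the argument.

```python
import math


def parseval_partial_sum(terms: int) -> float:
    """Partial sum 4*pi*sum_{k=1}^{terms} 1/k^2 of the Parseval series for f(x) = x."""
    return 4 * math.pi * sum(1.0 / (k * k) for k in range(1, terms + 1))


def norm_squared_of_x() -> float:
    """Integral of x^2 over (-pi, pi), which equals 2*pi^3/3."""
    return 2 * math.pi**3 / 3


if __name__ == "__main__":
    for n in (10, 1000, 100000):
        print(n, parseval_partial_sum(n), norm_squared_of_x())
```

The partial sums increase towards $2\pi^3/3$, with the gap after $n$ terms of size roughly $4\pi/n$.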

9.13 Suppose that $H$ contains an orthonormal set $E_k = \{e_j\}_{j=1}^{k}$. Then $E_k$ does not form a basis for $H$, since $H$ is infinite-dimensional. It follows that there exists a non-zero $u_k \in H$ such that
\[
(u_k, e_j) = 0 \qquad\text{for all}\quad j = 1, \ldots, k,
\]
for otherwise, by characterisation (c) of Proposition 9.14, $E_k$ would be a basis. Now define $e_{k+1} = u_k/\|u_k\|$ to obtain an orthonormal set $E_{k+1} = \{e_1, \ldots, e_{k+1}\}$. The result follows by induction, starting with $e_1 = x/\|x\|$ for some non-zero $x \in H$.

9.14 The closed unit ball is both closed and bounded. When $H$ is finite-dimensional this is equivalent to compactness by Theorem 5.3. If $H$ is infinite-dimensional, then by the previous exercise it contains a countable orthonormal set $\{e_j\}_{j=1}^{\infty}$, and we have $\|e_i - e_j\|^2 = 2$ for $i \ne j$. The $(e_j)$ therefore form a sequence in the unit ball that can have no convergent subsequence.

9.15 An application of Zorn's Lemma shows that there is a maximal orthonormal set: let $P$ be the set of all orthonormal subsets of $H$ ordered by inclusion. Then, for any chain $C$, the set
\[
E = \bigcup_{A\in C} A
\]
is an orthonormal set that provides an upper bound for $C$, and so $P$ has a maximal element $E = (e_\alpha)_{\alpha\in A}$.

Now suppose that there exists $y \in H$ that cannot be written as $\sum_{j=1}^{\infty} a_j e_{\alpha_j}$ for any choice of $(\alpha_j) \in A$ and $a_j \in \mathbb{K}$. Set $a_\alpha = (y, e_\alpha)$; then, by Exercise 9.4, $a_\alpha$ is only non-zero for a countable collection $\{\alpha_j\} \in A$. It then follows from Corollary 9.13 that $x = \sum_{\alpha\in A} a_\alpha e_\alpha$ converges, with $x \in H$. By construction
\[
(y - x, e_\alpha) = (y, e_\alpha) - (x, e_\alpha) = a_\alpha - a_\alpha = 0,
\]
and so $y - x$ is orthogonal to every element of $E$. Since $y - x$ is non-zero by assumption, so is $e' := (y - x)/\|y - x\|$; then $E \cup \{e'\}$ is an orthonormal set that is larger than $E$. This contradicts the maximality of $E$, so $E$ is indeed a basis for $H$.

Chapter 10

10.1 Proposition 10.1 is equivalent to finding the element of $A - x = \{a - x : a \in A\}$ with minimum norm; since $x \notin A$, $0 \notin A - x$, and $A - x$ is a closed convex set whenever $A$ is.

10.2 For any $\delta > 0$ define $\alpha_\delta \in C([-1, 1])$ by
\[
\alpha_\delta(t) := \begin{cases} -(1 + \delta) & -1 \le t < -2\delta/(1 + \delta), \\ \dfrac{(1 + \delta)^2}{2\delta}\,t & -2\delta/(1 + \delta) \le t \le 2\delta/(1 + \delta), \\ (1 + \delta) & 2\delta/(1 + \delta) < t \le 1. \end{cases}
\]
Observe that $\|\alpha_\delta\|_\infty = 1 + \delta$ and that
\[
\int_{-1}^{0} \alpha_\delta(t)\,dt = -1 \qquad\text{and}\qquad \int_0^1 \alpha_\delta(t)\,dt = 1.
\]
It follows that $f(t) := g(t) + \alpha_\delta(t)$ is an element of $U$ with $\|f - g\|_\infty = 1 + \delta$. Now, for any $f \in U$ we have
\[
\int_{-1}^{0} g(t) - f(t)\,dt = 1 \qquad\text{and}\qquad \int_0^1 g(t) - f(t)\,dt = -1.
\]
Since $g$ and $f$ are both continuous,
\[
\max_{t\in[-1,0]} g(t) - f(t) \ge 1, \tag{S.15}
\]
with equality holding if and only if $g(t) - f(t) = 1$ for all $t \in [-1, 0]$; similarly
\[
\min_{t\in[0,1]} g(t) - f(t) \le -1, \tag{S.16}
\]
with equality holding if and only if $g(t) - f(t) = -1$ for all $t \in [0, 1]$. We cannot have both $g(0) - f(0) = 1$ and $g(0) - f(0) = -1$, so the inequality in either (S.15) or (S.16) must be strict. Hence $\|g - f\|_\infty > 1$ for every $f \in U$. Combining this with our previous upper bound for a particular choice of $f$ shows that $\operatorname{dist}(g, U) = 1$, but we have also just shown that $\|g - f\|_\infty > 1$ for every $f \in U$.

10.3 Suppose that $x \notin U$, and $u, v \in U$ with $u \ne v$ such that
\[
\|x - u\| = \|x - v\| = \operatorname{dist}(x, U).
\]
Now, if we set $w = (u + v)/2 \in U$, then by strict convexity
\[
\|(x - u) + (x - v)\| < 2\operatorname{dist}(x, U) \quad\Rightarrow\quad \|x - w\| < \operatorname{dist}(x, U),
\]
contradicting the definition of $\operatorname{dist}(x, U)$.

10.4 If $x, y \in X$ with $\|x\| = \|y\| = 1$ and $x \ne y$, then certainly $\|x - y\| > \varepsilon$ for some $\varepsilon > 0$. If $X$ is uniformly convex, then there exists some $\delta > 0$ such that
\[
\left\|\frac{x + y}{2}\right\| < 1 - \delta \quad\Rightarrow\quad \|x + y\| < 2(1 - \delta) < 2,
\]

$\varepsilon$. Then
\[
\left\|\frac{f + g}{2}\right\|_{L^p}^p \le 1 - (\varepsilon/2)^p,
\]
so $\|\tfrac12(f + g)\|_{L^p} < 1 - \delta$ with $\delta = 1 - [1 - (\varepsilon/2)^p]^{1/p} > 0$, which shows that $L^p$ is uniformly convex. For the case $1 < p \le 2$ we again take $f, g \in L^p$ with $\|f\|_{L^p} = \|g\|_{L^p} = 1$ and $\|f - g\|_{L^p} > \varepsilon$. Clarkson's second inequality gives
\[
\left\|\frac{f + g}{2}\right\|_{L^p}^q \le 1 - (\varepsilon/2)^q,
\]
and the argument concludes similarly. (The proof in $\ell^p$ is more or less identical.)


10.7 (i) Let $d := \inf\{\|x\| : x \in K\} > 0$. Then there exist $k_n \in K$ such that $\|k_n\| \to d$. Set $x_n = k_n/\|k_n\|$ so that $\|x_n\| = 1$. Now
\[
\frac12(x_n + x_m) = \frac{1}{2\|k_n\|}k_n + \frac{1}{2\|k_m\|}k_m = \left(\frac{1}{2\|k_n\|} + \frac{1}{2\|k_m\|}\right)(c_n k_n + c_m k_m);
\]
note that $c_n + c_m = 1$, and so since $K$ is convex $c_n k_n + c_m k_m \in K$. It follows that $\|c_n k_n + c_m k_m\| \ge d$ and so
\[
\left\|\frac{x_n + x_m}{2}\right\| \ge \frac{d}{2\|k_n\|} + \frac{d}{2\|k_m\|}.
\]

(ii) Since $\|k_n\| \to d$, we can find $N$ such that for all $n, m \ge N$
\[
\left\|\frac{x_n + x_m}{2}\right\| > 1 - \delta. \tag{S.17}
\]

(iii) Now fix $\varepsilon > 0$. Since $X$ is uniformly convex, there exists $\delta > 0$ such that $\|x\| = \|x'\| = 1$ and $\|x - x'\| > \varepsilon$ implies that $\|(x + x')/2\| < 1 - \delta$. This implies that if $\|x\| = \|x'\| = 1$ and $\|(x + x')/2\| \ge 1 - \delta$ then $\|x - x'\| \le \varepsilon$. It follows from (S.17) that $\|x_n - x_m\| \le \varepsilon$ for all $n, m \ge N$, so $(x_n)$ is Cauchy.

(iv) We now deduce that $(k_n)$ is also Cauchy. Indeed, since $k_n = \|k_n\|x_n$ and $\|k_n\| \to d$ we can write
\[
\|k_n - k_m\| = \bigl\|\|k_n\|x_n - \|k_m\|x_m\bigr\| \le \bigl|\|k_n\| - \|k_m\|\bigr|\,\|x_n\| + \|k_m\|\,\|x_n - x_m\| \le \bigl|\|k_n\| - \|k_m\|\bigr| + M\|x_n - x_m\|,
\]
where $\|k_m\| \le M$ for some $M > 0$ since $\|k_m\| \to d$. Given any $\eta > 0$ choose $N'$ such that $|\|k_n\| - d| < \eta/3$ for all $n \ge N'$ and $\|x_n - x_m\| < \eta/3M$ for all $m, n \ge N'$; then, for $m, n \ge N'$, we have $\|k_n - k_m\| < \eta$, which shows that $(k_n)$ is Cauchy. Since $X$ is complete, it follows that $k_n \to k$; since $K$ is closed we must have $k \in K$, and we have $\|k\| = \lim_{n\to\infty} \|k_n\| = d$.

10.8 If $u \in (X + Y)^\perp$ then
\[
(u, x + y) = 0, \qquad x \in X,\ y \in Y.
\]
Choosing $y = 0 \in Y$ shows that $u \in X^\perp$; choosing $x = 0 \in X$ shows that $u \in Y^\perp$, so $u \in X^\perp \cap Y^\perp$. For the reverse inclusion, if $u \in X^\perp \cap Y^\perp$, then $(u, x + y) = (u, x) + (u, y) = 0$.

10.9 Since $\operatorname{Span}(E) \subseteq \operatorname{clin}(E)$, we have $(\operatorname{Span}(E))^\perp \supseteq (\operatorname{clin}(E))^\perp$. To show equality, take $y \in (\operatorname{Span}(E))^\perp$ and $u \in \operatorname{clin}(E)$: we want to show that we have $(y, u) = 0$ so that $y \in (\operatorname{clin}(E))^\perp$. Now, since $u \in \operatorname{clin}(E)$, there exists a sequence $x_n \in \operatorname{Span}(E)$ such that $x_n \to u$. Therefore
\[
(y, u) = \left(y, \lim_{n\to\infty} x_n\right) = \lim_{n\to\infty} (y, x_n) = 0,
\]
as required.

10.10 Noting that $T([x]) = P^\perp(x + M) = P^\perp x$ it is clear that $\operatorname{Range}(T) \subset M^\perp$, and that $T$ is linear. Since for any $m \in M$ we have
\[
\|x + m\|^2 = \|P_M(x + m)\|^2 + \|P^\perp x\|^2,
\]
it follows that
\[
\|[x]\|^2 = \inf_{m\in M} \|x + m\|^2 = \|P^\perp x\|^2,
\]
i.e. $\|[x]\| = \|P^\perp x\|$, which in particular shows that $T$ is an isometry. It remains only to show that $T$ is onto, but this is almost immediate, since given $x \in M^\perp$ we have $[x] = x + M$ and $P^\perp(x + M) = x$.

10.11 We have
\begin{align*}
e_4' &= x^3 - \left(x^3, \sqrt{\tfrac58}(3x^2 - 1)\right)\sqrt{\tfrac58}(3x^2 - 1) - \left(x^3, \sqrt{\tfrac32}\,x\right)\sqrt{\tfrac32}\,x - \left(x^3, \tfrac{1}{\sqrt2}\right)\tfrac{1}{\sqrt2} \\
&= x^3 - \frac58(3x^2 - 1)\int_{-1}^{1} 3t^5 - t^3\,dt - \frac32 x\int_{-1}^{1} t^4\,dt - \frac12\int_{-1}^{1} t^3\,dt \\
&= x^3 - \frac35 x
\end{align*}
and
\[
\|e_4'\|_{L^2}^2 = \int_{-1}^{1} x^6 - \frac65 x^4 + \frac{9}{25}x^2\,dx = 2\left[\frac17 - \frac{6}{25} + \frac{3}{25}\right] = \frac{8}{175}.
\]
It follows that
\[
e_4 = \sqrt{\frac{175}{8}}\,\frac15(5x^3 - 3x) = \sqrt{\frac78}\,(5x^3 - 3x).
\]
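The arithmetic in 10.11 can be verified exactly: the cubic $5x^3 - 3x$ should be orthogonal in $L^2(-1, 1)$ to $1$, $x$ and $3x^2 - 1$, and $\tfrac78$ times its squared norm should equal 1. This is a minimal sketch using exact rational arithmetic; the helper names and the coefficient-list representation of polynomials are illustrative choices, not taken from the text.

```python
from fractions import Fraction


def multiply(p, q):
    """Product of two polynomials given as coefficient lists (constant term first)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += Fraction(a) * Fraction(b)
    return out


def integrate(p):
    """Exact integral over [-1, 1]; odd powers contribute zero, x^k gives 2/(k+1) for even k."""
    return sum(c * Fraction(2, k + 1) for k, c in enumerate(p) if k % 2 == 0)


def inner(p, q):
    """Real L^2(-1, 1) inner product of two polynomials."""
    return integrate(multiply(p, q))


cubic = [0, -3, 0, 5]              # 5x^3 - 3x
lower = [[1], [0, 1], [-1, 0, 3]]  # 1, x, 3x^2 - 1 (unnormalised)

if __name__ == "__main__":
    print([inner(cubic, q) for q in lower])  # orthogonality to the lower-order polynomials
    print(Fraction(7, 8) * inner(cubic, cubic))  # squared norm of sqrt(7/8)(5x^3 - 3x)
```

Working with `Fraction` avoids any rounding, so the orthogonality and normalisation are confirmed exactly rather than approximately.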

10.12 We have, using repeated integration by parts,
\begin{align*}
\int_{-1}^{1} x^k u_n^{(n)}(x)\,dx &= \left[x^k u_n^{(n-1)}(x)\right]_{-1}^{1} - k\int_{-1}^{1} x^{k-1}u_n^{(n-1)}(x)\,dx \\
&= -k\int_{-1}^{1} x^{k-1}u_n^{(n-1)}(x)\,dx \\
&= \cdots \\
&= (-1)^k k!\int_{-1}^{1} u_n^{(n-k)}(x)\,dx \\
&= (-1)^k k!\left[u_n^{(n-k-1)}(x)\right]_{-1}^{1} = 0.
\end{align*}
Now observe that if $m < n$, then $P_m$ is a polynomial of order $m$, and $P_n$ is proportional to $u_n^{(n)}(x)$. It follows that $(P_m, P_n) = 0$ as claimed.

Chapter 11

11.1 Since $T$ is linear, for any non-zero $x \in X$ with $\|x\|_X \le 1$ we have
\[
\|Tx\|_Y = \|x\|_X\left\|T\frac{x}{\|x\|_X}\right\|_Y,
\]
and so
\[
\sup_{\|x\|_X\le 1} \|Tx\|_Y \le \sup_{\|x\|_X=1} \|Tx\|_Y.
\]
The reverse inequality follows immediately since $\{\|x\|_X = 1\} \subset \{\|x\|_X \le 1\}$; now use Lemma 11.6. For the second expression, note that if $x \ne 0$, then
\[
\frac{\|Tx\|_Y}{\|x\|_X} = \left\|T\frac{x}{\|x\|_X}\right\|_Y;
\]
therefore
\[
\sup_{x\ne 0} \frac{\|Tx\|_Y}{\|x\|_X} \le \sup_{\|z\|_X=1} \|Tz\|_Y,
\]
and the reverse inequality follows by restricting the supremum to those $x$ for which $\|x\|_X = 1$ on the left-hand side.

11.2 This is clearly a linear mapping, since the integral is linear. We check that $Tf \in C_b([0, \infty))$. That $Tf$ is continuous at each $x > 0$ is clear, since $\int_0^x f(s)\,ds$ is continuous in $x$, and $x \mapsto 1/x$ is continuous at each point in $(0, \infty)$. So we only need to check that $Tf$ is continuous at $x = 0$ and that it is bounded on $[0, \infty)$. To see that $Tf$ is continuous at zero, consider
\begin{align*}
|[Tf](x) - f(0)| &= \left|\frac{1}{x}\int_0^x f(s)\,ds - f(0)\right| \\
&= \left|\frac{1}{x}\int_0^x f(s) - f(0)\,ds\right| \\
&\le \frac{1}{x}\int_0^x |f(s) - f(0)|\,ds.
\end{align*}
Since $f$ is continuous at 0, for any $\varepsilon > 0$ there exists $\delta > 0$ such that $|f(s) - f(0)| < \varepsilon$ for all $0 \le s < \delta$. If we take $0 < x < \delta$, then we obtain
\[
|[Tf](x) - f(0)| \le \frac{1}{x}\int_0^x \varepsilon\,ds = \varepsilon,
\]
which shows that $Tf \in C([0, \infty))$.


The function T f is bounded since |[T f ](0)| = | f (0)| ≤  f ∞ and for every x > 0 we have ˆ 1 x  f ∞ dx ≤  f ∞ ; x 0

[T f ](x) ≤

it follows that T f ∞ ≤  f ∞ , which also shows T  B(X ) ≤ 1. The choice f (x) = 1 for all x yields [T f ](x) = 1 for all x, and so T f ∞ =  f ∞ for this f , which shows that T  B(X ) = 1. 11.3 If T is bounded, then it is continuous, in which case ⎛ ⎞ ⎛ ⎞ ⎞ ⎛ n ∞ n    x j ⎠ = T ⎝ lim x j ⎠ = lim T ⎝ xj⎠ T⎝ n→∞

j=1

n→∞

j=1

= lim

n→∞

j=1 n 

Txj =

j=1

∞ 

Txj.

j=1

For the converse, suppose that yk → y. Then set y0 = 0 and put xk = yk − yk−1 . Then ⎛ ⎞ n n n    x j = yn and T⎝ xj⎠ = T y j − T y j−1 = T yn . j=1

j=1

j=1

The assumed equality means that ⎞ ⎛ n n   x j ⎠ = lim T x j = lim T yn , T y = T ⎝ lim n→∞

n→∞

j=1

j=1

n→∞

i.e. T yn → T y whenever yn → y, which shows that T is continuous, and therefore bounded. 11.4 We have Sn Tn − ST  B(X,Z ) = Sn (Tn − T ) + T (Sn − S) B(X,Z ) ≤ Sn  B(Y,Z ) Tn − T  B(X,Y ) + T  B(X,Y ) Sn − S B(Y,Z ) ≤ MTn − T  B(X,Y ) + T  B(X,Y ) Sn − S B(Y,Z ) , where we use that fact that Sn  B(Y,Z ) ≤ M for some M since Sn converges in B(Y, Z ). Convergence of Sn Tn to ST now follows. j 11.5 Since ∞ j=1 T  < ∞ we know that the partial sums are Cauchy; in particular, for any ε > 0 there exists N such that for n > m ≥ N n 

T j  < ε

j=m+1

(as a particular case we have T j  → 0 as j → ∞). If we consider Vn = I + T + · · · + T n ,


then for $n > m \ge N$ we have
\[
\|V_n - V_m\| = \|T^{m+1} + \cdots + T^{n-1} + T^n\| \le \|T^{m+1}\| + \cdots + \|T^{n-1}\| + \|T^n\| = \sum_{j=m+1}^n \|T^j\| < \varepsilon.
\]
It follows that $(V_n)$ is Cauchy in $B(X)$, and so (since $B(X)$ is complete by Theorem 11.11) $V_n$ converges to some $V \in B(X)$ with $\|V\| \le \sum_{j=0}^\infty \|T^j\|$. For any finite $n$ we have
\[
(I + T + \cdots + T^n)(I - T) = I - T^{n+1} = (I - T)(I + T + \cdots + T^n);
\]
taking $n \to \infty$ we can use the fact that $\sum_{j=0}^n T^j \to V$ and $T^n \to 0$ as $n \to \infty$ to deduce (using the result of the previous exercise) that
\[
V(I - T) = I = (I - T)V,
\]
so $V = (I - T)^{-1}$. If $\|T\| < 1$, then $\|T^n\| \le \|T\|^n$, and then
\[
\sum_{j=0}^\infty \|T^j\| \le \sum_{j=0}^\infty \|T\|^j = \frac{1}{1 - \|T\|} < \infty.
\]
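The Neumann series argument of Exercise 11.5 can be illustrated in finite dimensions. The following is a minimal sketch, assuming a hypothetical $2 \times 2$ matrix `T` with $\|T\| < 1$ (the matrix and the helper names are illustrative choices, not taken from the text): it forms the partial sum $V_n = I + T + \cdots + T^n$ and multiplies by $I - T$, which should return (almost exactly) the identity since $V_n(I - T) = I - T^{n+1}$.

```python
def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matadd(a, b):
    """Entrywise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def neumann_partial_sum(t, n_terms):
    """Return V_n = I + T + ... + T^n with n = n_terms."""
    size = len(t)
    v = identity(size)
    power = identity(size)
    for _ in range(n_terms):
        power = matmul(power, t)   # power is now the next power of T
        v = matadd(v, power)
    return v


# Hypothetical example matrix; its entries are small enough that ||T|| < 1.
T = [[0.2, 0.1], [0.0, 0.3]]
I_minus_T = [[0.8, -0.1], [0.0, 0.7]]

V = neumann_partial_sum(T, 60)
product = matmul(V, I_minus_T)      # equals I - T^61, i.e. I up to rounding
```

Here `product` differs from the identity only by $T^{61}$, which is negligible for this choice of `T`.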

11.6 Let $P = T^{-1}(T + S) = I + T^{-1}S$. Then, since $\|T^{-1}S\| < 1$, it follows from the previous exercise that $P$ is invertible with
\[
\|P^{-1}\| \le \frac{1}{1 - \|T^{-1}S\|}.
\]
Using the definition of $P$ we have
\[
T^{-1}(T + S)P^{-1} = P^{-1}T^{-1}(T + S) = I;
\]
from the first of these identities we have, acting with $T$ on the left and then $T^{-1}$ on the right, $(T + S)P^{-1}T^{-1} = I$, and so $(T + S)^{-1} = P^{-1}T^{-1}$ and (11.15) follows.

11.7 We argue by induction. If $|T^n f(x)| \le M^n \|f\|_\infty (x - a)^n/n!$, then
\begin{align*}
|T^{n+1} f(x)| = |T(T^n f)(x)| &= \left|\int_a^x K(x,y)[T^n f(y)]\,\mathrm{d}y\right| \le \int_a^x |K(x,y)|\,|T^n f(y)|\,\mathrm{d}y \\
&\le \frac{M^{n+1}}{n!}\|f\|_\infty \int_a^x (y - a)^n\,\mathrm{d}y = \frac{M^{n+1}}{(n+1)!}\|f\|_\infty (x - a)^{n+1}.
\end{align*}
In particular,
\[
\|T^n f\|_\infty \le M^n \|f\|_\infty \frac{(b - a)^n}{n!} \quad\Rightarrow\quad \|T^n\|_{B(X)} \le M^n \frac{(b - a)^n}{n!}. \tag{S.18}
\]
We rewrite the integral equation (11.16) as
\[
(I - \lambda T) f = g.
\]


The bounds on $\|T^n\|$ above imply that
\[
\sum_{n=1}^\infty \|(\lambda T)^n\| \le \sum_{n=1}^\infty \frac{[|\lambda| M (b - a)]^n}{n!} < \infty.
\]
Exercise 11.5 then shows that $I - \lambda T$ is invertible with inverse $\sum_{j=0}^\infty (\lambda T)^j$, and so $f = \sum_{j=0}^\infty \lambda^j T^j g$.

11.8 That $\|T\|_{B(X,Y)} = \|T^{-1}\|_{B(Y,X)} = 1$ when $T$ is an isometry is immediate. For the opposite implication, we have
\[
\|Tx\|_Y \le \|x\|_X \quad\text{and}\quad \|T^{-1}y\|_X \le \|y\|_Y;
\]
setting $y = Tx$ in the second inequality yields $\|x\|_X \le \|Tx\|_Y$ and shows that $\|Tx\|_Y = \|x\|_X$ as claimed.

11.9 Suppose that $y_n \in \mathrm{Range}(T)$ with $y_n \to y \in Y$; then $y_n = Tx_n$ for some $x_n \in X$. Since $(y_n)$ converges, it is Cauchy in $Y$, and since
\[
\alpha\|x_n - x_m\|_X \le \|T(x_n - x_m)\|_Y = \|y_n - y_m\|_Y,
\]
it follows that $(x_n)$ is Cauchy in $X$. Because $X$ is complete, $x_n \to x$ for some $x \in X$, and now since $T$ is continuous $y_n = Tx_n \to Tx = y$. So $\mathrm{Range}(T)$ is closed.

11.10 We have
\[
\int_{-1}^1\int_{-1}^1 |K(t,s)|^2\,\mathrm{d}t\,\mathrm{d}s = \int_{-1}^1\int_{-1}^1 \big(1 + 12ts + 36t^2s^2\big)\,\mathrm{d}t\,\mathrm{d}s = \int_{-1}^1 \big(2 + 24s^2\big)\,\mathrm{d}s = 4 + 16 = 20,
\]
while
\[
Tx(t) = \int_{-1}^1 K(t,s)x(s)\,\mathrm{d}s = 2e_1(t)(x, e_1) + 4e_2(t)(x, e_2).
\]
Since $e_1$ and $e_2$ are orthonormal,
\[
\|Tx\|_{L^2}^2 = 4|(x,e_1)|^2 + 16|(x,e_2)|^2 \le 16\big(|(x,e_1)|^2 + |(x,e_2)|^2\big) \le 16\|x\|_{L^2}^2,
\]
and so $\|T\| \le 4$. In fact, since $Te_2 = 4e_2$, we have $\|T\| = 4 < \sqrt{20}$.

11.11 Since $\|T^k\| \le \|T\|^k$, it follows that
\[
\sum_{k=0}^\infty \|T^k/k!\| \le \sum_{k=0}^\infty \|T\|^k/k! < \infty.
\]
Lemma 4.13 now ensures that $\sum_{k=0}^\infty T^k/k!$ converges to an element of $B(X)$.
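The two numbers compared in Exercise 11.10 can be checked numerically. The sketch below assumes the kernel is $K(t,s) = 1 + 6ts$ on $[-1,1]^2$, which is what the expansion $|K|^2 = 1 + 12ts + 36t^2s^2$ in the solution implies; the midpoint quadrature and the function names are illustrative choices. It approximates the double integral of $|K|^2$ (expected to be close to 20) and applies $T$ to $x(s) = s$, a multiple of $e_2$ (expected to give $4t$).

```python
def midpoints(n):
    """Midpoints of n equal subintervals of [-1, 1], and the step size."""
    h = 2.0 / n
    return [-1.0 + (i + 0.5) * h for i in range(n)], h


def kernel(t, s):
    # Assumed kernel, consistent with |K|^2 = 1 + 12ts + 36 t^2 s^2.
    return 1.0 + 6.0 * t * s


def hs_norm_squared(n=400):
    """Midpoint-rule approximation of the double integral of |K|^2."""
    pts, h = midpoints(n)
    return sum(kernel(t, s) ** 2 for t in pts for s in pts) * h * h


def apply_T(x, t, n=400):
    """Midpoint-rule approximation of (Tx)(t), the integral of K(t,s) x(s) ds."""
    pts, h = midpoints(n)
    return sum(kernel(t, s) * x(s) for s in pts) * h


approx_hs = hs_norm_squared()             # close to 20
approx_Tt = apply_T(lambda s: s, 0.5)     # close to 4 * 0.5
```

The gap between the operator norm $4$ and $\sqrt{20}$ reflects that the second quantity sums the squares of both eigenvalues, $2^2 + 4^2$.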

Chapter 12

12.1 Let $X$ be any infinite-dimensional normed space, let $\{x_n\}_{n=1}^\infty$ be a linearly independent set with $\|x_n\| = 1$ and let $E$ be a Hamel basis containing this set (see Exercise 1.8). Define $L : X \to \mathbb{K}$ by setting $L(x_n) = n$ for each $n$ and $L(y) = 0$ for all $y \in E \setminus \{x_n\}_{n=1}^\infty$, and extending linearly, so that in particular $L\big(\sum_j \alpha_j x_j\big) = \sum_j j\alpha_j$. Then $L$ is linear but unbounded, since $|L(x_n)| = n$ while $\|x_n\| = 1$.

12.2 Let $\{e_1, \ldots, e_n\}$ be a basis for $V$ and take any $f \in V^*$; then for any $(\alpha_j)_{j=1}^n \in \mathbb{K}$ we have, since $f$ is linear,
\[
f\Big(\sum_{j=1}^n \alpha_j e_j\Big) = \sum_{j=1}^n \alpha_j f(e_j). \tag{S.19}
\]
Define linear functionals $\{\phi_j\}_{j=1}^n \in V^*$ by setting $\phi_j(e_i) = \delta_{ij}$. These form a basis for $V^*$: they are linearly independent, since if
\[
\sum_{j=1}^n \alpha_j \phi_j = 0
\]
we can apply both sides to $e_k$ to show that $\alpha_k = 0$ for each $k$; and they span $V^*$, since we can write any $f \in V^*$ as $f = \sum_{i=1}^n f(e_i)\phi_i$. Indeed, using (S.19) we have
\begin{align*}
\Big[\sum_{i=1}^n f(e_i)\phi_i\Big]\Big(\sum_{j=1}^n \alpha_j e_j\Big) &= \sum_{i,j} f(e_i)\alpha_j \phi_i(e_j) \\
&= \sum_{i,j} f(e_i)\alpha_j \delta_{ij} = \sum_{j=1}^n \alpha_j f(e_j) = f\Big(\sum_{j=1}^n \alpha_j e_j\Big).
\end{align*}
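The dual basis construction in Exercise 12.2 can be made concrete in $\mathbb{R}^2$. The following is a small sketch under the assumption that functionals on $\mathbb{R}^2$ are stored as coefficient pairs, so that $\phi(v) = \phi_0 v_0 + \phi_1 v_1$; the basis vectors and the functional `f` are hypothetical examples. It builds $\phi_1, \phi_2$ with $\phi_j(e_i) = \delta_{ij}$ and recovers $f$ as $f(e_1)\phi_1 + f(e_2)\phi_2$.

```python
def dual_basis_2d(e1, e2):
    """Return (phi1, phi2), coefficient pairs with phi_j(e_i) = delta_ij."""
    det = e1[0] * e2[1] - e1[1] * e2[0]
    if det == 0:
        raise ValueError("e1 and e2 are linearly dependent")
    phi1 = (e2[1] / det, -e2[0] / det)
    phi2 = (-e1[1] / det, e1[0] / det)
    return phi1, phi2


def evaluate(phi, v):
    """Apply the functional with coefficients phi to the vector v."""
    return phi[0] * v[0] + phi[1] * v[1]


# Hypothetical basis of R^2 and functional f(v) = 3 v_0 + 5 v_1.
e1, e2 = (1.0, 1.0), (1.0, -1.0)
f = (3.0, 5.0)

phi1, phi2 = dual_basis_2d(e1, e2)

# Expansion f = f(e1) phi1 + f(e2) phi2, computed coefficientwise.
c1, c2 = evaluate(f, e1), evaluate(f, e2)
reconstructed = (c1 * phi1[0] + c2 * phi2[0], c1 * phi1[1] + c2 * phi2[1])
```

For this choice, `reconstructed` coincides with `f`, mirroring the spanning argument in the solution.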

12.3 If $u$ minimises $F$, then $F(u) \le F(u + tv)$ for every $t \in \mathbb{R}$, so
\begin{align*}
\tfrac12 B(u,u) - f(u) \le F(u + tv) &= \tfrac12 B(u + tv, u + tv) - f(u + tv) \\
&= \tfrac12\big[B(u,u) + tB(u,v) + tB(v,u) + t^2B(v,v)\big] - f(u) - tf(v) \\
&= \tfrac12\big[B(u,u) + 2tB(u,v) + t^2B(v,v)\big] - f(u) - tf(v), \tag{S.20}
\end{align*}
which yields
\[
t[B(u,v) - f(v)] + \frac{t^2}{2}B(v,v) \ge 0.
\]
Taking $t > 0$ and sufficiently small shows that we must have $B(u,v) - f(v) \ge 0$; taking $t < 0$ and sufficiently small shows that we also have $B(u,v) - f(v) \le 0$, and so $B(u,v) = f(v)$ for every $v \in V$. For the converse, if $B(u,v) = f(v)$ for every $v \in V$, then the expression in (S.20) gives
\[
F(u + tv) = \tfrac12 B(u,u) - f(u) + \frac{t^2}{2}B(v,v) \ge F(u).
\]
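A two-dimensional instance of Exercise 12.3 may help fix ideas. This is a sketch only, assuming $V = \mathbb{R}^2$, a hypothetical symmetric positive definite matrix `A` defining $B(u,v) = u \cdot Av$, and a hypothetical vector `b` defining $f(v) = b \cdot v$. The condition $B(u,v) = f(v)$ for all $v$ then reads $Au = b$, and the solution of that system should minimise $F(u) = \tfrac12 B(u,u) - f(u)$.

```python
# Hypothetical data: A is symmetric positive definite, b is arbitrary.
A = ((2.0, 1.0), (1.0, 3.0))
b = (1.0, 2.0)


def B(u, v):
    """Bilinear form B(u, v) = u . (A v)."""
    return sum(u[i] * A[i][j] * v[j] for i in range(2) for j in range(2))


def f(v):
    """Linear functional f(v) = b . v."""
    return b[0] * v[0] + b[1] * v[1]


def F(u):
    """Energy F(u) = B(u, u)/2 - f(u)."""
    return 0.5 * B(u, u) - f(u)


def solve_2x2(mat, rhs):
    """Solve mat u = rhs by Cramer's rule."""
    det = mat[0][0] * mat[1][1] - mat[0][1] * mat[1][0]
    return ((rhs[0] * mat[1][1] - mat[0][1] * rhs[1]) / det,
            (mat[0][0] * rhs[1] - mat[1][0] * rhs[0]) / det)


u = solve_2x2(A, b)   # satisfies B(u, v) = f(v) for every v


def perturbed_energy(v, t):
    """F(u + t v), which by (S.20) equals F(u) + (t^2 / 2) B(v, v)."""
    return F((u[0] + t * v[0], u[1] + t * v[1]))
```

Evaluating `perturbed_energy(v, t) - F(u)` for a few directions `v` and step sizes `t` reproduces the quadratic term $\tfrac{t^2}{2}B(v,v)$ from the solution.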

12.4 (i) By assumption, for each fixed $y \in H$ the map $x \mapsto B(x,y)$ is a linear map from $H$ into $\mathbb{R}$, so we only need check that it is bounded. But this follows immediately from assumption (ii), namely $|B(x,y)| \le c\|x\|\|y\|$. The Riesz Representation Theorem guarantees that there exists $w \in H$ such that $B(x,y) = (x,w)$ for every $x \in H$ and $\|w\| \le c\|y\|$.

(ii) If we set $Ay = w$, then, by definition, $B(x,y) = (x, Ay)$. To show that $A$ is linear, we use the linearity of $B$ to write
\begin{align*}
(x, A(\alpha y + \beta z)) &= B(x, \alpha y + \beta z) = \alpha B(x,y) + \beta B(x,z) \\
&= \alpha(x, Ay) + \beta(x, Az) = (x, \alpha Ay + \beta Az);
\end{align*}
since this holds for every $x \in H$, it follows that $A$ is linear. To see that $A$ is bounded, either use the fact that $\|Ay\| = \|w\| \le c\|y\|$ from part (i), or note that
\[
\|Au\|^2 = (Au, Au) = B(Au, u) \le c\|u\|\|Au\|,
\]
and so $\|Au\| \le c\|u\|$.

(iii) We have
\begin{align*}
\|Ty - Ty'\|^2 &= \|(y - y') - \varrho A(y - y')\|^2 \\
&= \|y - y'\|^2 - 2\varrho\,(y - y', A(y - y')) + \varrho^2\|A(y - y')\|^2 \\
&= \|y - y'\|^2 - 2\varrho\,B(y - y', y - y') + \varrho^2\|A(y - y')\|^2 \\
&\le \|y - y'\|^2\,[1 - 2\varrho b + \varrho^2 c^2],
\end{align*}
and by choosing $\varrho$ sufficiently small we can ensure that $T$ is a contraction.

Chapter 13

13.1 We have
\begin{align*}
(Tf, g) &= \int_0^1 \Big(\int_0^t K(t,s)f(s)\,\mathrm{d}s\Big) g(t)\,\mathrm{d}t \\
&= \int_0^1\int_0^t K(t,s)f(s)g(t)\,\mathrm{d}s\,\mathrm{d}t \\
&= \int_0^1\int_s^1 K(t,s)f(s)g(t)\,\mathrm{d}t\,\mathrm{d}s = (f, T^*g),
\end{align*}
where
\[
T^*g(s) = \int_s^1 K(t,s)g(t)\,\mathrm{d}t.
\]

13.2 Suppose that $T_n \to T$. Since $\|T_n^* - T^*\| = \|(T_n - T)^*\| = \|T_n - T\|$, and $T_n = T_n^*$, it follows that
\[
\|T - T^*\| \le \|T - T_n\| + \|T_n - T_n^*\| + \|T_n^* - T^*\| = 2\|T_n - T\|;
\]
since the right-hand side tends to zero it follows that $\|T - T^*\| = 0$, i.e. that $T = T^*$ and so $T$ is self-adjoint.

13.3 Take $x \in \mathrm{Ker}(T)$ and $z \in \mathrm{Range}(T^*)$, so that $z = T^*y$ for some $y \in H$. Then
\[
(x, z) = (x, T^*y) = (Tx, y) = 0;
\]
it follows that $\mathrm{Ker}(T) \subset \mathrm{Range}(T^*)^\perp$.


Now suppose that $z \in \mathrm{Range}(T^*)^\perp$; since $T^*(Tz) \in \mathrm{Range}(T^*)$ it follows that
\[
\|Tz\|^2 = (Tz, Tz) = (z, T^*(Tz)) = 0,
\]
and so $Tz = 0$. This shows that $\mathrm{Range}(T^*)^\perp \subset \mathrm{Ker}(T)$, and equality follows.

13.4 Since $\|T\| = \|T^*\|$, we have
\[
\|T^*T\| \le \|T^*\|\|T\| = \|T\|^2.
\]
But we also have
\[
\|Tx\|^2 = (Tx, Tx) = (x, T^*Tx) \le \|x\|\|T^*Tx\| \le \|T^*T\|\|x\|^2,
\]
i.e. $\|T\|^2 \le \|T^*T\|$, which gives equality.

13.5 Since $T$ is invertible, we have $T^{-1} \in B(K, H)$ and $TT^{-1} = T^{-1}T = I$. Taking adjoints yields
\[
(T^{-1})^*T^* = T^*(T^{-1})^* = I^* = I.
\]
This shows that $(T^*)^{-1} = (T^{-1})^*$, and since $\|(T^{-1})^*\| = \|T^{-1}\|$ it follows that $T^*$ is invertible. If $T$ is self-adjoint, then $T^* = T$, and so
\[
(T^{-1})^* = (T^*)^{-1} = T^{-1},
\]
i.e. $T^{-1}$ is self-adjoint.

Chapter 14

14.1 First we check that $(\cdot,\cdot)_{H_{\mathbb{C}}}$ is an inner product on $H_{\mathbb{C}}$.

(i) positivity:
\[
(x + iy, x + iy)_{H_{\mathbb{C}}} = (x,x) + (y,y) = \|x\|^2 + \|y\|^2 \ge 0,
\]
and if $\|x\|^2 + \|y\|^2 = 0$, then $x = y = 0$;

(ii) linearity:
\begin{align*}
((x + iy) + (x' + iy'), w + iz)_{H_{\mathbb{C}}} &= ((x + x') + i(y + y'), w + iz)_{H_{\mathbb{C}}} \\
&= (x + x', w) + i(y + y', w) - i(x + x', z) + (y + y', z) \\
&= [(x,w) + i(y,w) - i(x,z) + (y,z)] + [(x',w) + i(y',w) - i(x',z) + (y',z)] \\
&= (x + iy, w + iz)_{H_{\mathbb{C}}} + (x' + iy', w + iz)_{H_{\mathbb{C}}};
\end{align*}

(iii) scalar multiples: if $\alpha \in \mathbb{C}$ with $\alpha = a + ib$, $a, b \in \mathbb{R}$, then
\begin{align*}
(\alpha(x + iy), w + iz)_{H_{\mathbb{C}}} &= ((ax - by) + i(bx + ay), w + iz)_{H_{\mathbb{C}}} \\
&= (ax - by, w) + i(bx + ay, w) - i(ax - by, z) + (bx + ay, z) \\
&= (a + ib)(x,w) + (ai - b)(y,w) + (b - ia)(x,z) + (a + ib)(y,z) \\
&= (a + ib)[(x,w) + i(y,w) - i(x,z) + (y,z)] \\
&= \alpha(x + iy, w + iz)_{H_{\mathbb{C}}};
\end{align*}


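(iv) conjugation: using the fact that $(a,b) = (b,a)$ since $H$ is real we have
\begin{align*}
\overline{(x + iy, w + iz)_{H_{\mathbb{C}}}} &= \overline{(x,w) + i(y,w) - i(x,z) + (y,z)} \\
&= (w,x) - i(w,y) + i(z,x) + (z,y) \\
&= (w + iz, x + iy)_{H_{\mathbb{C}}}.
\end{align*}
The space $H_{\mathbb{C}}$ is a Hilbert space: it is isomorphic to $H \times H$ with the norm $\|(x,y)\|_{H\times H}^2 = \|x\|_H^2 + \|y\|_H^2$, and so complete (see Lemma 4.6).

14.2 (i) We have to show that $T_{\mathbb{C}}$ is complex-linear and bounded. If $a, b \in \mathbb{R}$, then
\begin{align*}
T_{\mathbb{C}}((a + ib)(x + iy)) &= T_{\mathbb{C}}((ax - by) + i(bx + ay)) \\
&= T(ax - by) + iT(bx + ay) \\
&= (a + ib)Tx + (ia - b)Ty \\
&= (a + ib)[Tx + iTy] = (a + ib)T_{\mathbb{C}}(x + iy),
\end{align*}
so $T_{\mathbb{C}}$ is complex-linear. To show that $T_{\mathbb{C}}$ is bounded, we estimate
\[
\|T_{\mathbb{C}}(x + iy)\|_{H_{\mathbb{C}}}^2 = \|Tx\|^2 + \|Ty\|^2 \le \|T\|_{B(H)}^2(\|x\|^2 + \|y\|^2) = \|T\|_{B(H)}^2\|x + iy\|_{H_{\mathbb{C}}}^2,
\]
so $\|T_{\mathbb{C}}\| \le \|T\|$. But also $Tx = T_{\mathbb{C}}(x + i0)$, and so $\|T\| \le \|T_{\mathbb{C}}\|$, which shows that $\|T_{\mathbb{C}}\| = \|T\|$.

If $\lambda$ is an eigenvalue of $T$ with eigenvector $x$, then
\[
T_{\mathbb{C}}(x + i0) = Tx = \lambda x = \lambda(x + i0);
\]
while if $T_{\mathbb{C}}$ has eigenvalue $\lambda \in \mathbb{R}$ with eigenvector $x + iy$ (with either $x$ or $y$ non-zero), it follows that
\[
Tx + iTy = T_{\mathbb{C}}(x + iy) = \lambda(x + iy) = \lambda x + i\lambda y,
\]
and so $Tx = \lambda x$ and $Ty = \lambda y$, so since $x$ or $y$ is non-zero, $\lambda$ is an eigenvalue of $T$.

14.3 Suppose that $D_\alpha x = \lambda x$; then
\[
\alpha_j x_j = \lambda x_j \quad\Rightarrow\quad (\alpha_j - \lambda)x_j = 0
\]
for every $j$. So either $\lambda = \alpha_j$ or $x_j = 0$. This shows that the only eigenvalues are $\{\alpha_j\}_{j=1}^\infty$. Suppose now that $\lambda \in \mathbb{C}$ with $\lambda \notin \overline{\sigma_p(D_\alpha)}$; then there exists $\delta > 0$ such that $|\lambda - \alpha_j| \ge \delta$ for every $j \in \mathbb{N}$. For such $\lambda$ note that
\[
[(D_\alpha - \lambda I)x]_j = (\alpha_j - \lambda)x_j;
\]
since $|\alpha_j - \lambda| \ge \delta$ for every $j$ this map is one-to-one and onto, i.e. a bijection. So we can define its inverse by setting
\[
[(D_\alpha - \lambda I)^{-1}x]_j = (\alpha_j - \lambda)^{-1}x_j,
\]
and then
\[
|[(D_\alpha - \lambda I)^{-1}x]_j| = |(\alpha_j - \lambda)^{-1}x_j| \le \frac{1}{\delta}|x_j|,
\]
which shows that $[D_\alpha - \lambda I]^{-1} : \ell^\infty \to \ell^\infty$ is bounded. So $D_\alpha - \lambda I$ is invertible and $\lambda \notin \sigma(D_\alpha)$. That $\sigma(D_\alpha) = \overline{\sigma_p(D_\alpha)}$ now follows from the fact that the spectrum is closed (and must contain the point spectrum).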


Finally, given any compact subset $K$ of $\mathbb{C}$ we can find a set $\{\alpha_j\}_{j=1}^\infty$ that is dense in $K$ (see Exercise 2.14).

14.4 We use the Spectral Mapping Theorem (Theorem 14.9) and (14.3),
\[
\sigma(T) \subseteq \{\lambda \in \mathbb{C} : |\lambda| \le \|T\|\}.
\]
If $\lambda \in \sigma(T)$, then $\lambda^n \in \sigma(T^n)$, so
\[
[r_\sigma(T)]^n = r_\sigma(T^n),
\]
and since $\sigma(T^n) \subseteq \{\lambda : |\lambda| \le \|T^n\|\}$ it follows that we have $r_\sigma(T) \le \|T^n\|^{1/n}$ for every $n$, and so $r_\sigma(T) \le \liminf_{n\to\infty}\|T^n\|^{1/n}$.

14.5 This operator is the same as that in Exercise 11.7 for the particular choice $K(x,y) = 1$ and $[a,b] = [0,1]$. In this case the general bound we obtained in the solution as (S.18) becomes
\[
\|T^n\|_{B(X)} \le \frac{1}{n!}.
\]
It now follows from the previous exercise that
\[
r_\sigma(T) \le \liminf_{n\to\infty}\Big(\frac{1}{n!}\Big)^{1/n} = 0,
\]
so no non-zero $\lambda$ is contained in the spectrum. Since $T : X \to X$ is not invertible we know that $0 \in \sigma(T)$; $T$ is not surjective, since for any $f \in C([0,1])$ we have $Tf \in C^1([0,1])$. But $0$ is not an eigenvalue, since
\[
Tf = 0 \quad\Rightarrow\quad \int_0^x f(s)\,\mathrm{d}s = 0,\ x \in [0,1] \quad\Rightarrow\quad f(x) = 0,
\]
using the Fundamental Theorem of Calculus.

14.6 Simply note that $\mathcal{F}^4 = \mathrm{id}$. It follows from the Spectral Mapping Theorem that if $\lambda \in \sigma(\mathcal{F})$, then $\lambda^4 = 1$, so $\sigma(\mathcal{F}) \subset \{\pm 1, \pm i\}$.

Chapter 15

15.1 Suppose that $T$ is compact and take any sequence in $TB_X$, i.e. a sequence $(Tx_n)$ with $(x_n) \in B_X$. Then $(x_n)$ is a bounded sequence in $X$, so $(Tx_n)$ has a convergent subsequence. Conversely, take any bounded sequence $(x_n) \in X$ with $\|x_n\|_X \le R$. The sequence $(x_n/R) \in B_X$, so $(T(x_n/R)) \in TB_X$ has a convergent subsequence $(Tx_{n_j}/R)$. Since $T(x_{n_j}/R) = \frac{1}{R}Tx_{n_j}$, $(Tx_{n_j})$ converges, which shows that $T$ is compact.

15.2 Let $(x_n)$ be a bounded sequence in $X$. Suppose that $S$ is compact: then $(Tx_n)$ is a bounded sequence in $Y$, and so $S(Tx_n) = (S \circ T)x_n$ has a subsequence that converges in $Z$, i.e. $S \circ T$ is compact. If $T$ is compact, then $(Tx_n)$ has a subsequence that converges in $Y$, i.e. $Tx_{n_j} \to y \in Y$, and then since $S$ is continuous $(S \circ T)x_{n_j} = S(Tx_{n_j}) \to Sy$, so once again $S \circ T$ is compact.


15.3 Consider the operators $T_n : \ell^2 \to \ell^2$ where
\[
(T_nx)_j = \begin{cases} j^{-1}x_j & 1 \le j \le n, \\ 0 & j > n. \end{cases}
\]
Then each $T_n$ has finite-dimensional range, so is compact. We also have
\[
\|Tx - T_nx\|_{\ell^2}^2 = \sum_{j=n+1}^\infty \frac{|x_j|^2}{j^2} \le \frac{1}{(n+1)^2}\|x\|^2,
\]
so $T_n \to T$ in $B(X)$. That $T$ is compact now follows from Theorem 15.3.

Alternatively we can argue directly, using the compactness of the Hilbert cube $Q$ from Exercise 9.6. Suppose that $(x^{(j)})_j$ is a bounded sequence in $\ell^2$, $\|x^{(j)}\|_{\ell^2} \le M$. Then, in particular, $|x_i^{(j)}| \le M$ for all $i, j$. It follows that $(Tx^{(j)})$ is a sequence in $MQ$, where $Q$ is the Hilbert cube defined in Exercise 9.6. We showed there that $Q$ is compact, so $MQ$ is also compact, which shows that $T$ is a compact operator.

15.4 Since $K$ is continuous on the compact set $[a,b] \times [a,b]$, it is bounded and uniformly continuous on $[a,b] \times [a,b]$.

Let $B$ be the closed unit ball in $L^2(a,b)$. We will show that $T(B)$ is a bounded equicontinuous subset of $C([a,b])$, and so precompact in $C([a,b])$ by the Arzelà–Ascoli Theorem (Theorem 6.12). A precompact subset of $C([a,b])$ is also precompact in $L^2(a,b)$, since for any sequence $(f_n) \in C([a,b])$ we have
\[
\|f_n - f_m\|_{L^2} \le (b - a)^{1/2}\|f_n - f_m\|_\infty.
\]
Using Exercise 15.1 it will follow that $T : L^2 \to L^2$ is compact as claimed.

First, if $u \in B$, then we have
\[
|Tu(x)| \le \Big(\int_a^b |K(x,y)|^2\,\mathrm{d}y\Big)^{1/2}\Big(\int_a^b |u(y)|^2\,\mathrm{d}y\Big)^{1/2} \le (b - a)^{1/2}\|K\|_\infty\|u\|_{L^2} \le (b - a)^{1/2}\|K\|_\infty.
\]
Since $K$ is uniformly continuous, given any $\varepsilon > 0$, we can choose $\delta > 0$ so that $|K(x,y) - K(x',y)| < \varepsilon$ for all $y \in [a,b]$ whenever $|x - x'| < \delta$. Therefore if $u \in B$ and $|x - x'| < \delta$ we have
\begin{align*}
|Tu(x) - Tu(x')| &= \left|\int_a^b [K(x,y) - K(x',y)]u(y)\,\mathrm{d}y\right| \\
&\le \Big(\int_a^b |K(x,y) - K(x',y)|^2\,\mathrm{d}y\Big)^{1/2}\|u\|_{L^2} \\
&\le [(b - a)\varepsilon^2]^{1/2} = (b - a)^{1/2}\varepsilon.
\end{align*}
Therefore $T(B)$ is equicontinuous.


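15.5 Let $\{e_j\}$ be an orthonormal basis for $H$; then
\[
Tu = T\Big(\sum_{j=1}^\infty (u, e_j)e_j\Big) = \sum_{j=1}^\infty (u, e_j)Te_j.
\]
Therefore, using the Cauchy–Schwarz inequality,
\[
\|Tu\| \le \sum_{j=1}^\infty |(u, e_j)|\|Te_j\| \le \Big(\sum_{j=1}^\infty |(u, e_j)|^2\Big)^{1/2}\Big(\sum_{j=1}^\infty \|Te_j\|^2\Big)^{1/2} = \|T\|_{\mathrm{HS}}\|u\|,
\]
which shows that $\|T\|_{B(H)} \le \|T\|_{\mathrm{HS}}$ as claimed.

15.6 We have
\[
\|Te_j\|_{L^2}^2 = \int_a^b |(\kappa_x, e_j)|^2\,\mathrm{d}x = \int_a^b |(K(x,\cdot), e_j(\cdot))|^2\,\mathrm{d}x.
\]
Therefore
\begin{align*}
\sum_{j=1}^\infty \|Te_j\|_{L^2}^2 &= \sum_{j=1}^\infty \int_a^b |(\kappa_x, e_j)|^2\,\mathrm{d}x = \int_a^b \Big(\sum_{j=1}^\infty |(\kappa_x, e_j)|^2\Big)\mathrm{d}x \\
&= \int_a^b \|\kappa_x\|_{L^2}^2\,\mathrm{d}x = \int_a^b\int_a^b |K(x,y)|^2\,\mathrm{d}y\,\mathrm{d}x < \infty,
\end{align*}
and so $T$ is Hilbert–Schmidt.

15.7 We show that $S$ is Hilbert–Schmidt. Take the orthonormal basis $\{e^{(j)}\}_j$; then
\[
(Se^{(k)})_i = \sum_{j=1}^\infty K_{ij}\delta_{kj} = K_{ik},
\]
so
\[
\|Se^{(k)}\|^2 = \sum_{i=1}^\infty |K_{ik}|^2 \quad\text{and}\quad \sum_{k=1}^\infty \|Se^{(k)}\|^2 = \sum_{k,i=1}^\infty |K_{ik}|^2 < \infty.
\]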
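The identities in Exercises 15.5 and 15.7 have an elementary finite-dimensional analogue. The sketch below assumes a hypothetical $3 \times 3$ array `K` standing in for the infinite matrix $(K_{ij})$ (a truncation chosen purely for illustration): it computes $\sum_k \|Se^{(k)}\|^2$ by applying $S$ to each coordinate vector, and lets one compare $\|Sx\|$ with $\|S\|_{\mathrm{HS}}\|x\|$.

```python
import math

# Hypothetical finite matrix standing in for (K_ij).
K = [[1.0, 0.5, 0.0],
     [0.5, 0.25, 0.1],
     [0.0, 0.1, 0.05]]


def apply_S(x):
    """(Sx)_i = sum_j K_ij x_j."""
    return [sum(K[i][j] * x[j] for j in range(len(x))) for i in range(len(K))]


def norm(x):
    """Euclidean norm of a finite sequence."""
    return math.sqrt(sum(c * c for c in x))


def hs_norm():
    """Square root of the sum over k of ||S e^(k)||^2."""
    n = len(K)
    total = 0.0
    for k in range(n):
        e_k = [1.0 if j == k else 0.0 for j in range(n)]
        total += norm(apply_S(e_k)) ** 2
    return math.sqrt(total)


sum_of_squares = sum(entry * entry for row in K for entry in row)
x = [1.0, -2.0, 0.5]
```

In this finite setting `hs_norm() ** 2` agrees with `sum_of_squares`, and `norm(apply_S(x))` does not exceed `hs_norm() * norm(x)`, as in Exercise 15.5.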

15.8 If there is no such sequence, then there exists $\delta > 0$ such that $\|Tx\| \ge \delta$ for all $x \in S_X$. Since $X$ is infinite-dimensional, we can use the argument from Theorem 5.5 to find $(x_j)_{j=1}^\infty \in S_X$ with $\|x_i - x_j\|_X \ge 1/2$ whenever $i \ne j$. It follows that
\[
\frac{1}{\|x_i - x_j\|}\|Tx_i - Tx_j\| = \left\|T\Big(\frac{x_i - x_j}{\|x_i - x_j\|}\Big)\right\| \ge \delta,
\]
i.e.
\[
\|Tx_i - Tx_j\| \ge \frac{\delta}{2}.
\]
This shows that $(Tx_i)$ has no convergent subsequence, but this contradicts the compactness of $T$. For a counterexample take the map $T : \ell^2 \to \ell^2$ given by
\[
T(x_1, x_2, x_3, \ldots) = \Big(x_1, \frac{x_2}{2}, \frac{x_3}{3}, \ldots\Big).
\]
This map is compact (see Exercise 15.3), but if $Tx = 0$, then $x_j/j = 0$ for every $j$, i.e. $x = 0$.

15.9 The operator $\sigma_r \circ T$ is the composition of the compact operator $T$ with the bounded operator $\sigma_r$, so it follows from Exercise 15.2 that it is compact. To show that it has no eigenvalues, suppose first that $\lambda \ne 0$ is an eigenvalue; then
\[
\Big(0, x_1, \frac{x_2}{2}, \frac{x_3}{3}, \cdots\Big) = \lambda(x_1, x_2, x_3, x_4, \ldots),
\]
so
\[
\lambda x_1 = 0, \quad \lambda x_2 = x_1, \quad \lambda x_3 = \frac{x_2}{2}, \quad \cdots;
\]
but this implies that $x_j = 0$ for every $j$, so $\lambda$ is not an eigenvalue. If we try $\lambda = 0$, then we immediately obtain $x_j = 0$ for every $j$ also. So the operator has no eigenvalues.

15.10 To see that $T$ is not compact, consider the sequence $(e^{(2j-1)})_{j=1}^\infty$. This is a bounded sequence, but $(Te^{(2j-1)}) = (e^{(2j)})$, which has no convergent subsequence, so $T$ cannot be compact.

Chapter 16

16.1 Since $(T(x + iy), x + iy)$ is real, it equals its own complex conjugate, so
\begin{align*}
(T(x + iy), x + iy) &= (Tx, x) - i(Tx, y) + i(Ty, x) + (Ty, y) \\
&= (Tx, x) + i(y, Tx) - i(x, Ty) + (Ty, y),
\end{align*}
which shows that
\[
(Ty, x) + (x, Ty) = (Tx, y) + (y, Tx). \tag{S.21}
\]
Also
\begin{align*}
(T(x + y), x + y) &= (Tx, x) + (Tx, y) + (Ty, x) + (Ty, y) \\
&= (Tx, x) + (y, Tx) + (x, Ty) + (Ty, y),
\end{align*}
which gives
\[
(Ty, x) - (x, Ty) = (y, Tx) - (Tx, y). \tag{S.22}
\]
Adding (S.21) and (S.22) shows that
\[
(Ty, x) = (y, Tx) \quad\text{for every } x, y \in H,
\]
and so $T$ is self-adjoint.

16.2 Set $y = Tx$; then $(Tx, Tx) = 0$, so $\|Tx\|^2 = 0$, so $Tx = 0$. If $H$ is complex, then we can use the equalities from the previous exercise: first
\[
0 = (T(x + iy), x + iy) = -i(Tx, y) + i(Ty, x) \quad\Rightarrow\quad (Tx, y) - (Ty, x) = 0
\]
and then
\[
0 = (T(x + y), x + y) = (Tx, y) + (Ty, x).
\]
Adding these yields $(Tx, y) = 0$, and we can set $y = Tx$ as before.

16.3 Note first that $((\beta I - T)x, x) \ge 0$ and $(Tx, x) \ge 0$ for every $x \in H$, since $V(T) \subseteq [0, \beta]$. Now, since $T$ is self-adjoint we can write
\begin{align*}
\beta(Tx, x) - \|Tx\|^2 &= ((\beta I - T)Tx, x) \\
&= \frac{1}{\beta}\big((\beta I - T)Tx, (T + \beta I - T)x\big) \\
&= \frac{1}{\beta}\big[((\beta I - T)Tx, Tx) + (T(\beta I - T)x, (\beta I - T)x)\big] \ge 0.
\end{align*}

16.4 We prove the result for $\alpha = \inf V(T)$, since the result for $\beta$ is more similar to the proof of Theorem 16.3. Find a sequence $(x_n) \in H$ with $\|x_n\| = 1$ such that $(Tx_n, x_n) \to \alpha$. Then $T - \alpha I$ is self-adjoint, $V(T - \alpha I) \subset [0, \beta - \alpha]$, and we have $((T - \alpha I)x, x) \ge 0$ for every $x \in H$, so we can use the result of the previous exercise to obtain
\[
\|Tx_n - \alpha x_n\|^2 \le (\beta - \alpha)((T - \alpha I)x_n, x_n) = (\beta - \alpha)[(Tx_n, x_n) - \alpha].
\]
This shows that $Tx_n - \alpha x_n \to 0$ as $n \to \infty$. Now we argue as in the proof of Theorem 16.3 to deduce that $\alpha$ is an eigenvalue of $T$. For $\beta = \sup V(T)$ we apply a similar argument to $\beta I - T$, for which we again have $V(\beta I - T) \subset [0, \beta - \alpha]$.

16.5 We have $Te_j = \lambda_je_j$, so every $\lambda_j$ is an eigenvalue. If $Tu = \lambda u$, then
\[
\sum_{j=1}^\infty (\lambda_j - \lambda)(u, e_j)e_j = 0;
\]
if $\lambda \ne \lambda_j$ for every $j$, then $(u, e_j) = 0$ for every $j$ and hence $u = 0$. So such a $\lambda$ is not an eigenvalue of $T$.

(i) We have
\[
\|Tu\|^2 = \sum_{j=1}^\infty |\lambda_j|^2|(u, e_j)|^2 \le \Big(\sup_j |\lambda_j|^2\Big)\|u\|^2,
\]
so if $\sup_j |\lambda_j| < \infty$, then $T$ is bounded. If $T$ is bounded, then for $u = e_k$ we have
\[
\|Te_k\| = \|\lambda_ke_k\| = |\lambda_k| \le \|T\|\|e_k\| = \|T\|,
\]
so $\sup_k |\lambda_k| \le \|T\|$.

(ii) If $(\lambda_j)$ is bounded, then $T$ is self-adjoint: since the $\lambda_j$ are real, we have
\begin{align*}
(Tu, v) &= \Big(\sum_{j=1}^\infty \lambda_j(u, e_j)e_j, v\Big) = \sum_{j=1}^\infty \lambda_j(u, e_j)(e_j, v) \\
&= \sum_{j=1}^\infty \lambda_j(u, e_j)\overline{(v, e_j)} = \sum_{j=1}^\infty (u, \lambda_j(v, e_j)e_j) \\
&= \Big(u, \sum_{j=1}^\infty \lambda_j(v, e_j)e_j\Big) = (u, Tv).
\end{align*}


(iii) Suppose that $\lambda_j \to 0$. Observe that each operator
\[
T_nu := \sum_{j=1}^n \lambda_j(u, e_j)e_j
\]
is compact since its range is finite-dimensional (see Example 15.2). To show that $T$ is compact we show that $T_n \to T$ in $B(H)$ and use Theorem 15.3. We have
\begin{align*}
\|(T - T_n)u\|^2 &= \Big\|\sum_{j=n+1}^\infty \lambda_j(u, e_j)e_j\Big\|^2 = \sum_{j=n+1}^\infty |\lambda_j|^2|(u, e_j)|^2 \\
&\le \Big(\sup_{j > n}|\lambda_j|^2\Big)\sum_{j=n+1}^\infty |(u, e_j)|^2 \le \Big(\sup_{j > n}|\lambda_j|^2\Big)\|u\|^2,
\end{align*}
which shows that $\|T - T_n\|_{B(H)} \le \sup_{j > n}|\lambda_j|$, and $\sup_{j > n}|\lambda_j| \to 0$ as $n \to \infty$ by assumption. If $\lambda_j \not\to 0$, then there is a $\delta > 0$ and a subsequence with $|\lambda_{j_k}| > \delta$, and then
\[
\|Te_{j_k} - Te_{j_l}\|^2 = \|\lambda_{j_k}e_{j_k} - \lambda_{j_l}e_{j_l}\|^2 > \delta^2,
\]
so $(Te_{j_k})$ has no Cauchy subsequence, contradicting the compactness of $T$.

(iv) Since $\|Te_j\| = |\lambda_j|$, $\sum |\lambda_j|^2 < \infty$ if and only if $\sum \|Te_j\|^2 < \infty$ (recall that the property of being Hilbert–Schmidt does not depend on the choice of orthonormal basis).

16.6 First note that
\begin{align*}
[Tu](x) &= \int_a^b K(x,y)u(y)\,\mathrm{d}y \\
&= \int_a^b \sum_{j=1}^\infty \lambda_je_j(x)e_j(y)u(y)\,\mathrm{d}y = \sum_{j=1}^\infty \lambda_je_j(x)(e_j, u);
\end{align*}
if we augment $\{e_j\}$ to an orthonormal basis of $H$, then this is an operator of the form in Exercise 16.5 if we assign the eigenvalue zero to the additional elements of the basis; the result then follows immediately.

16.7 For (i) we rewrite the kernel as
\[
K(t,s) = \cos(t - s) = \cos t\cos s + \sin t\sin s,
\]
recalling (see Exercise 9.1) that $\cos t$ and $\sin t$ are orthogonal in $L^2(-\pi,\pi)$. We have
\begin{align*}
T(\sin t) &= \int_{-\pi}^\pi K(t,s)\sin s\,\mathrm{d}s \\
&= \int_{-\pi}^\pi \big(\cos t\cos s\sin s + \sin t\sin^2 s\big)\,\mathrm{d}s = \pi\sin t
\end{align*}
and
\begin{align*}
T(\cos t) &= \int_{-\pi}^\pi K(t,s)\cos s\,\mathrm{d}s \\
&= \int_{-\pi}^\pi \big(\cos t\cos^2 s + \sin t\sin s\cos s\big)\,\mathrm{d}s = \pi\cos t.
\end{align*}


For (ii) we write $K(t,s)$ in terms of Legendre polynomials from Example 10.8:
\[
K(t,s) = 4\Big(\sqrt{\tfrac32}\,t\Big)\Big(\sqrt{\tfrac32}\,s\Big) + \frac85\Big(\sqrt{\tfrac58}\,(3t^2 - 1)\Big)\Big(\sqrt{\tfrac58}\,(3s^2 - 1)\Big).
\]
Since $\big\{\sqrt{\tfrac32}\,t,\ \sqrt{\tfrac58}\,(3t^2 - 1)\big\}$ are orthonormal, the integral operator $T$ has
\[
T(t) = 4t \quad\text{and}\quad T(3t^2 - 1) = \frac85(3t^2 - 1).
\]
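The eigenvalue relations in part (ii) can be checked by quadrature. The sketch below assumes the kernel implied by the Legendre expansion above, namely $K(t,s) = 6ts + (3t^2 - 1)(3s^2 - 1)$ on $[-1,1]^2$; the midpoint rule and the function names are illustrative choices. Applying the discretised operator to $s$ and to $3s^2 - 1$ should give approximately $4t$ and $\tfrac85(3t^2 - 1)$.

```python
def kernel(t, s):
    # Assumed kernel: 4 * (3/2) t s + (8/5) * (5/8) (3t^2 - 1)(3s^2 - 1).
    return 6.0 * t * s + (3.0 * t * t - 1.0) * (3.0 * s * s - 1.0)


def apply_T(x, t, n=2000):
    """Midpoint-rule approximation of the integral of K(t,s) x(s) over [-1, 1]."""
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        s = -1.0 + (i + 0.5) * h
        total += kernel(t, s) * x(s)
    return total * h


def linear(s):
    return s


def legendre_two(s):
    return 3.0 * s * s - 1.0


t0 = 0.3
value_linear = apply_T(linear, t0)            # close to 4 * t0
value_quadratic = apply_T(legendre_two, t0)   # close to (8/5) * (3 t0^2 - 1)
```

Any other sample point `t0` in $[-1,1]$ can be substituted to probe the same two identities.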

16.8 (i) Given any $n$-dimensional subspace $V$ of $H$, any $(n+1)$-dimensional subspace $W$ of $H$ contains a vector orthogonal to $V$. Indeed, if $\{w_j\}_{j=1}^{n+1}$ is a basis of $W$, then $P_Vw_j \in V$, where $P_V$ is the orthogonal projection onto $V$. Since $V$ is $n$-dimensional, the $n+1$ vectors $P_Vw_j$ are linearly dependent, so there exists $k$ such that
\[
P_Vw_k = \sum_{j \ne k} \alpha_jP_Vw_j.
\]
It follows that
\[
P_V\Big[w_k - \sum_{j \ne k} \alpha_jw_j\Big] = 0,
\]
so $w_k - \sum_{j \ne k} \alpha_jw_j \in W$ is orthogonal to $V$ (note that this is non-zero, otherwise $w_1, \ldots, w_{n+1}$ would be linearly dependent).

(ii) Suppose that $x \in \mathrm{Span}\{e_1, \ldots, e_n\}$, i.e. $x = \sum_{j=1}^n \alpha_je_j$; then
\[
\frac{(Tx, x)}{\|x\|^2} = \frac{\big(\sum_{j=1}^n \alpha_j\lambda_je_j, \sum_{k=1}^n \alpha_ke_k\big)}{\sum_{j=1}^n |\alpha_j|^2} = \frac{\sum_{j=1}^n \lambda_j|\alpha_j|^2}{\sum_{j=1}^n |\alpha_j|^2} \ge \lambda_n.
\]
It follows from (i) that if $V_n$ is any $n$-dimensional subspace, then there is a vector in $\mathrm{Span}(e_1, \ldots, e_{n+1})$ contained in $V_n^\perp$, and so
\[
\max_{V_n^\perp} \frac{(Tx, x)}{\|x\|^2} \ge \lambda_{n+1},
\]
using (ii). If we take $V$ to be the $n$-dimensional space spanned by the first $n$ eigenvectors, then $V^\perp$ is the space spanned by $\{e_j\}_{j=n+1}^\infty$, and then
\[
\max_{V^\perp} \frac{(Tx, x)}{\|x\|^2} = \lambda_{n+1};
\]

this gives the required equality.

16.9 Suppose that $(x_n + iy_n)$ is a bounded sequence in $H_{\mathbb{C}}$; then $(x_n)$ and $(y_n)$ are bounded sequences in $H$. Since $T$ is compact, there is a subsequence $(x_{n_j})$ such that $(Tx_{n_j})$ converges. Since $(y_{n_j})$ is bounded in $H$, we can find a further subsequence such that both $(Tx_{n_j'})$ and $(Ty_{n_j'})$ are convergent. It follows that
\[
T_{\mathbb{C}}(x_{n_j'} + iy_{n_j'}) = Tx_{n_j'} + iTy_{n_j'}
\]
converges, so $T_{\mathbb{C}}$ is compact. To show that $T_{\mathbb{C}}$ is self-adjoint, we write
\begin{align*}
(T_{\mathbb{C}}(x + iy), u + iv)_{H_{\mathbb{C}}} &= (Tx + iTy, u + iv) \\
&= (Tx, u) - i(Tx, v) + i(Ty, u) + (Ty, v) \\
&= (x, Tu) - i(x, Tv) + i(y, Tu) + (y, Tv) \\
&= (x + iy, Tu + iTv) = (x + iy, T_{\mathbb{C}}(u + iv)).
\end{align*}

Chapter 18

18.1 We have $f'(t) = t^{p-1} - 1$, so $f'(t) \le 0$ for $0 \le t \le 1$ and $f'(t) \ge 0$ for all $t \ge 1$, so $f(1) \le f(t)$ for every $t \ge 0$. With $1/p + 1/q = 1$ we obtain $f(1) = 0$ and so
\[
\frac{t^p}{p} + \frac{1}{q} - t \ge 0.
\]
Choosing $t = ab^{-q/p}$ now yields
\[
\frac{a^pb^{-q}}{p} + \frac{1}{q} \ge ab^{-q/p},
\]
and so
\[
\frac{a^p}{p} + \frac{b^q}{q} \ge ab^{q(1 - 1/p)} = ab,
\]
as required.

18.2 Note that $p/(p-1)$ and $p$ are conjugate indices, so we have
\begin{align*}
\sum_{j=1}^n |x_j + y_j|^p &\le \sum_{j=1}^n |x_j + y_j|^{p-1}|x_j| + \sum_{j=1}^n |x_j + y_j|^{p-1}|y_j| \\
&\le \Big(\sum_{j=1}^n |x_j + y_j|^p\Big)^{(p-1)/p}\Big(\sum_{j=1}^n |x_j|^p\Big)^{1/p} + \Big(\sum_{j=1}^n |x_j + y_j|^p\Big)^{(p-1)/p}\Big(\sum_{j=1}^n |y_j|^p\Big)^{1/p} \\
&= \Big(\sum_{j=1}^n |x_j + y_j|^p\Big)^{(p-1)/p}\big[\|x\|_{\ell^p} + \|y\|_{\ell^p}\big];
\end{align*}
dividing both sides by $\big(\sum_j |x_j + y_j|^p\big)^{(p-1)/p}$ yields
\[
\|x + y\|_{\ell^p} = \Big(\sum_{j=1}^n |x_j + y_j|^p\Big)^{1/p} \le \|x\|_{\ell^p} + \|y\|_{\ell^p}.
\]
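The inequalities of Exercises 18.1 and 18.2 are easy to probe numerically. The following sketch (the function names and sample values are illustrative assumptions) evaluates the gap $a^p/p + b^q/q - ab$ in Young's inequality, with $q = p/(p-1)$ the conjugate index, and the $\ell^p$ norm on finite sequences used in Minkowski's inequality.

```python
def young_gap(a, b, p):
    """Return a^p/p + b^q/q - a*b for a, b >= 0 and p > 1, with 1/p + 1/q = 1."""
    q = p / (p - 1.0)
    return a ** p / p + b ** q / q - a * b


def p_norm(x, p):
    """The l^p norm of a finite sequence x, for 1 <= p < infinity."""
    return sum(abs(c) ** p for c in x) ** (1.0 / p)


# Hypothetical sample data.
x = [1.0, -2.0, 3.0]
y = [0.5, 4.0, -1.0]
x_plus_y = [a + b for a, b in zip(x, y)]

gap_generic = young_gap(1.5, 0.7, 3.0)   # strictly positive
gap_equality = young_gap(2.0, 2.0, 2.0)  # a^p = b^q, so the gap vanishes
```

The case `gap_equality` corresponds to $t = ab^{-q/p} = 1$ in the solution, where $f$ attains its minimum value $f(1) = 0$.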


18.3 We apply Hölder's inequality with exponents $q/p$ and $q/(q-p)$ to obtain
\[
\|f\|_{L^p}^p = \int_\Omega |f(x)|^p\,\mathrm{d}x \le \Big(\int_\Omega |f(x)|^q\,\mathrm{d}x\Big)^{p/q}\Big(\int_\Omega 1\,\mathrm{d}x\Big)^{(q-p)/q} = |\Omega|^{(q-p)/q}\|f\|_{L^q}^p,
\]
which yields (18.9). If $1 \le p < q < \infty$ the function $f(x) = (1 + |x|)^{-n/p}$ provides an example for which $f \in L^q(\mathbb{R}^n)$ but $f \notin L^p(\mathbb{R}^n)$. For $1 \le p < q = \infty$ simply take $f(x) = 1$.

18.4 We show that the map $x \mapsto L_x$, defined for $y \in c_0$ by setting
\[
L_x(y) = \sum_{j=1}^\infty x_jy_j,
\]
is an isometric isomorphism from $\ell^1$ onto $(c_0)^*$. Note that $L_x$ is linear in $y$. We first show that it is bounded. Take $x \in \ell^1$ and $y \in c_0$; then
\[
|L_x(y)| = \Big|\sum_{j=1}^\infty x_jy_j\Big| \le \|x\|_{\ell^1}\|y\|_{\ell^\infty}
\]
by Hölder's inequality, which shows that $L_x \in (c_0)^*$ with
\[
\|L_x\|_{(c_0)^*} \le \|x\|_{\ell^1}. \tag{S.23}
\]

The linearity of the map $x \mapsto L_x$ is clear. We now show that it is an isometry, i.e. that we have equality in (S.23). This is clear if $x = 0$, so we take $x \ne 0$, and we choose $y^{(n)} \in c_0$ with
\[
y_j^{(n)} = \begin{cases} \overline{x_j}/|x_j| & x_j \ne 0 \text{ and } j \le n, \\ 0 & x_j = 0 \text{ or } j > n. \end{cases} \tag{S.24}
\]
Now $\|y^{(n)}\|_{\ell^\infty} = 1$ once $n$ is large enough that $x_j \ne 0$ for some $j \le n$, and
\[
|L_x(y^{(n)})| = \sum_{j=1}^n |x_j| = \Big(\sum_{j=1}^n |x_j|\Big)\|y^{(n)}\|_{\ell^\infty}.
\]
It follows that for every such $n$ we have
\[
\|L_x\|_{(c_0)^*} \ge \sum_{j=1}^n |x_j|
\]
and hence $\|L_x\|_{(c_0)^*} \ge \|x\|_{\ell^1}$. Combined with (S.23) this gives the desired equality of norms.

We now show that the map $x \mapsto L_x$ is onto. Given $L \in (c_0)^*$, arguing as in the proof of Theorem 18.5, if we can find an $x \in \ell^1$ such that $L = L_x$, then, by applying this operator to $e_j \in c_0$, we must have
\[
L(e_j) = L_x(e_j) = x_j.
\]


If $x$ is an element of $\ell^1$, then since we can write any $y \in c_0$ as $y = \sum_{j=1}^\infty y_je_j$ (with the sum convergent in $c_0$), we will then have
\[
L(y) = L\Big(\sum_{j=1}^\infty y_je_j\Big) = \sum_{j=1}^\infty y_jL(e_j) = \sum_{j=1}^\infty x_jy_j
\]
as required. So we need only show that $x \in \ell^1$. If we define $y^{(n)}$ as in (S.24), then
\[
\sum_{j=1}^n |x_j| = |L(y^{(n)})| \le \|L\|_{(c_0)^*}\|y^{(n)}\|_{c_0} \le \|L\|_{(c_0)^*};
\]
this bound does not depend on $n$, so $x \in \ell^1$ as required.

18.5 The linear functionals $\delta_x$ on $X = C([-1,1])$ defined as $\delta_x(f) := f(x)$ for each $x \in [-1,1]$ clearly have norm 1. However, $\|\delta_x - \delta_y\|_{X^*} = 2$ whenever $x \ne y$; the upper bound is clear and we can find $f \in X$ such that $f(x) = -f(y) = \|f\|_\infty$. This gives an uncountable collection in $X^*$ that are all a distance 2 apart, from which it follows that $X^*$ cannot be separable.

18.6 (i) We have
\[
\|f\|_{L^p}^p = \int_\Omega |f|^p\,\mathrm{d}x \le \Big(\int_\Omega 1^{2/(2-p)}\,\mathrm{d}x\Big)^{(2-p)/2}\Big(\int_\Omega |f|^2\,\mathrm{d}x\Big)^{p/2} = |\Omega|^{1 - p/2}\|f\|_{L^2}^p.
\]

(ii) Part (i) shows that if $f \in L^2$, then $f \in L^p$, so any linear functional $\Phi \in (L^p)^*$ also acts on any $f \in L^2$, with
\[
|\Phi(f)| \le \|\Phi\|_{(L^p)^*}\|f\|_{L^p} \le \|\Phi\|_{(L^p)^*}|\Omega|^{1/p - 1/2}\|f\|_{L^2};
\]
so $\Phi \in (L^2)^*$.

(iii) The function $f_k$ is an element of $L^2$, since $\|f_k\|_{L^\infty} \le k$, so we can use $f_k$ in (18.10) to obtain
\[
\Phi(f_k) = \int_\Omega f_kg\,\mathrm{d}x = \int_\Omega |g_k|^{q-2}g_kg\,\mathrm{d}x \ge \int_\Omega |g_k|^q\,\mathrm{d}x.
\]
However, we also have
\[
\|f_k\|_{L^p}^p = \int_\Omega |g_k|^{(q-1)p}\,\mathrm{d}x = \int_\Omega |g_k|^q\,\mathrm{d}x,
\]
and so, since $|\Phi(f_k)| \le \|\Phi\|_{(L^p)^*}\|f_k\|_{L^p}$, it follows that
\[
\|g_k\|_{L^q}^q \le \|\Phi\|_{(L^p)^*}\|g_k\|_{L^q}^{q/p},
\]
i.e. $\|g_k\|_{L^q} \le \|\Phi\|_{(L^p)^*}$ uniformly in $k$. The Monotone Convergence Theorem (Theorem B.7) now ensures that $g \in L^q$.

Chapter 19

19.1 First note that Lemma 8.11 implies that $U$ is itself a Hilbert space with the same norm and inner product as $H$. Since $\phi \in U^*$, we can use the Riesz Representation Theorem (Theorem 12.4) to find $v \in U$ with $\|v\| = \|\phi\|_{U^*}$ such that
\[
\phi(u) = (u, v) \quad\text{for every } u \in U.
\]
Now define $f(u) := (u, v)$ for every $u \in H$; then $f \in H^*$ extends $\phi$ (i.e. $f(x) = \phi(x)$ for every $x \in U$) and $\|f\|_{H^*} = \|v\| = \|\phi\|_{U^*}$ (see the proof of Theorem 12.4) as required.

19.2 We have one extension $f$ with $f(x) = (x, v)$ with $v \in U$. Now suppose that $f' \in H^*$ is another extension of $\hat f$ to $H$ with the same norm. Then there exists (by the Riesz Representation Theorem) $v' \in H$ such that $f'(x) = (x, v')$ for every $x \in H$, and $\|f'\|_{H^*} = \|v'\|_H$; but then, since these two functionals must agree on $U$, we have
\[
(u, v') = (u, v) \quad\text{for every } u \in U,
\]
so $v' = v + q$ with $q \in U^\perp$. But then $\|v'\|^2 = \|v\|^2 + \|q\|^2$, and since $\|v'\| = \|v\|$ it follows that $q = 0$, so $v' = v$.

19.3 We extend $f$ 'by continuity': if $y \in \overline{U}$, then there exist $(x_n) \in U$ such that $x_n \to y$, and we define
\[
F(y) = \lim_{n\to\infty} f(x_n);
\]
the limit exists since $|f(x_n) - f(x_m)| \le M\|x_n - x_m\|$, so that $(f(x_n))$ is Cauchy in $\mathbb{K}$.

This is well defined, since if we also have $(y_n) \in U$ with $y_n \to y$, then
\[
|f(x_n) - f(y_n)| = |f(x_n - y_n)| \le M\|x_n - y_n\| \to 0 \quad\text{as } n \to \infty.
\]
Furthermore, $F$ is linear, since for $x, y \in \overline{U}$, $\alpha, \beta \in \mathbb{K}$, we can find sequences $(x_n), (y_n) \in U$ such that $x_n \to x$ and $y_n \to y$, and then, as $\alpha x_n + \beta y_n \in U$ and $\alpha x_n + \beta y_n \to \alpha x + \beta y$,
\[
F(\alpha x + \beta y) = \lim_{n\to\infty} f(\alpha x_n + \beta y_n) = \lim_{n\to\infty} \big[\alpha f(x_n) + \beta f(y_n)\big] = \alpha F(x) + \beta F(y).
\]
Finally, we have $|f(x_n)| \le M\|x_n\|$ for all $n$, and since $\|x_n\| \to \|x\|$ (as $x_n \to x$) we have $|F(x)| \le M\|x\|$ as required.

19.4 For (a) we use $p(\alpha x) = |\alpha|p(x)$ with $\alpha = 0$. For (b) note that we have
\[
p(y) \le p(x) + p(y - x) \quad\text{and}\quad p(x) \le p(y) + p(x - y),
\]
and since $p(y - x) = p(x - y)$ the inequality follows. For (c) put $y = 0$ in (b). Finally, for part (d), if $p(x) = p(y) = 0$ and $\alpha, \beta \in \mathbb{K}$, then
\[
0 \le p(\alpha x + \beta y) \le |\alpha|p(x) + |\beta|p(y) = 0,
\]
so $p(\alpha x + \beta y) = 0$.

19.5 Suppose that $f, g \in X^*$ with $f(x) = g(x) = \phi(x)$ for all $x \in U$ and that $\|f\|_{X^*} = \|g\|_{X^*} = \|\phi\|_{U^*}$. Then, for any $\lambda \in (0,1)$ and $x \in U$, we have
\[
[\lambda f + (1 - \lambda)g](x) = \lambda f(x) + (1 - \lambda)g(x) = \phi(x),
\]
and furthermore,
\[
\|\lambda f + (1 - \lambda)g\|_{X^*} \le \lambda\|f\|_{X^*} + (1 - \lambda)\|g\|_{X^*} = \|\phi\|_{U^*}.
\]
Since $\lambda f + (1 - \lambda)g = \phi$ on $U$, we also have $\|\lambda f + (1 - \lambda)g\|_{X^*} \ge \|\phi\|_{U^*}$, and so $\|\lambda f + (1 - \lambda)g\|_{X^*} = \|\phi\|_{U^*}$.


19.6 Let $\{\zeta_j\}_{j=1}^\infty$ be the countable sequence of unit vectors whose linear span is dense in $X$. Let $W_0 = W$ and $i_0 = 0$. Given $W_n$ and $i_n$, choose $i_{n+1}$ to be the smallest index $> i_n$ so that $\zeta_{i_{n+1}} \notin W_n$ and let $z_{n+1} = \zeta_{i_{n+1}}$. The resulting collection $(z_j)$ has the required properties.

19.7 First, if $W$ is not closed use Exercise 19.3 to extend $\phi$ to a bounded linear functional on $\overline{W}$ with the same norm. It therefore suffices to prove the result assuming from the outset that $W$ is closed.

Consider the sequence of linear subspaces and elements $(z_j)$ from Exercise 19.6. Given a linear functional $f_n \in W_n^*$ with $\|f_n\|_* = \|\phi\|_{W^*}$ we can use the extension argument from the proof of the Hahn–Banach Theorem in Section 19.1 to extend $f_n$ to $f_{n+1} \in W_{n+1}^*$ with $\|f_{n+1}\|_* = \|f_n\|_* = \|\phi\|_*$.

In this way we can define a linear functional $f_\infty$ on $W_\infty$ by letting $f_\infty(x) = f_n(x)$ for any $x \in W_n$ (this is well defined, since if $x \in W_n \cap W_m$ with $m > n$ we know that $f_m$ extends $f_n$). Finally, we use Exercise 19.3 again to extend $f_\infty$ to an element $f \in X^*$ (since $X = \overline{W_\infty}$) that satisfies the same bound.

Chapter 20

20.1 If $x = 0$ we take $f(y) = (y, z)$ for any $z \in H$ with $\|z\| = 1$. For $x \ne 0$ we set $f(y) = (y, x/\|x\|)$; then $\|f\|_{H^*} = 1$ and $f(x) = \|x\|$.

20.2 Let $U = \mathrm{Span}(x_j)_{j=1}^n$, and define a linear functional $\phi : U \to \mathbb{K}$ by setting
\[
\phi\Big(\sum_{j=1}^n \alpha_jx_j\Big) = \sum_{j=1}^n \alpha_ja_j.
\]
Since $U$ is finite-dimensional, $\phi$ is bounded, i.e. $\phi \in U^*$. Now extend $\phi$ to $f \in X^*$ using the Hahn–Banach Theorem.

20.3 If $\|x\| \le M$, then $|f(x)| \le \|f\|_*\|x\| \le M$ for all $f \in X^*$ with $\|f\|_* = 1$. Conversely, if $|f(x)| \le M$ for all $f \in X^*$ with $\|f\|_* = 1$, we can take $f$ to be the support functional for $x$, which gives $|f(x)| = \|x\| \le M$. The equality for $\|x\|$ follows from these two facts.

20.4 Since $Y$ is a proper subspace of $H$, we can decompose $x = u + v$, where $u \in Y$ and $v \in Y^\perp$; note that $d = \|v\|$. Set $z = v/\|v\|$ and define $f \in H^*$ by setting $f(a) = (a, z)$. Then $\|f\| = 1$, $f(y) = (y, z) = 0$ for every $y \in Y$, and
\[
f(x) = (u + v, z) = \Big(v, \frac{v}{\|v\|}\Big) = \|v\| = d
\]
as required.

20.5 Take $x \in X$ with $x \ne 0$ and suppose that there are distinct $f, g \in X^*$ with $\|f\|_{X^*} = \|g\|_{X^*} = 1$ and $f(x) = g(x) = \|x\|_X$.


Then, since $X^*$ is strictly convex, $\|\tfrac12(f+g)\|_{X^*} < 1$, and so

$$2\|x\|_X = |f(x) + g(x)| = 2\,\bigl|[\tfrac12(f+g)](x)\bigr| < 2\|x\|_X,$$

a contradiction.

20.6 A Hilbert space is uniformly convex (Exercise 8.10) and hence strictly convex (Exercise 10.4), so by Exercise 20.5 there is a unique linear functional $f \in X^*$ such that $\|f\|_{X^*} = \|x\|_X$ and $f(x) = 1$, and this is just $(\cdot, x)$. For this functional we have $f(Tx) = (Tx, x)$ and so $V(T) = \{(Tx, x) : x \in H\}$ as in Exercise 16.3.

20.7 Take $Y = \{0\}$ and $x \ne 0$. Then $\mathrm{dist}(x, Y) = \|x\|$, and so we have $f \in X^*$ with $\|f\|_{X^*} = 1$ and $f(x) = \|x\|$.

20.8 For any $y \in \mathrm{Ker}\, f$ we have

$$|f(x)| = |f(x - y)| \le \|f\|_{X^*} \|x - y\|,$$

so that $|f(x)| \le \|f\|_{X^*}\, \mathrm{dist}(x, \mathrm{Ker}(f))$. For the reverse inequality, we use the fact that for every $\varepsilon > 0$ there exists $u \in X$ with $\|u\| = 1$ such that $f(u) > \|f\|_{X^*} - \varepsilon$ and set

$$y = x - \frac{f(x)}{f(u)}\, u.$$

Then $f(y) = 0$, i.e. $y \in \mathrm{Ker}(f)$, and so

$$\mathrm{dist}(x, \mathrm{Ker}(f)) \le \|x - y\| = \frac{|f(x)|}{|f(u)|} < \frac{|f(x)|}{\|f\|_{X^*} - \varepsilon}.$$

Since this holds for all $\varepsilon > 0$, it follows that $\mathrm{dist}(x, \mathrm{Ker}(f)) \le |f(x)|/\|f\|_{X^*}$. Combining these two inequalities yields the result.

20.9 $T$ is linear and since each $\phi_n$ has norm 1 we have

$$\|Tx\|_\infty \le \sup_n |\phi_n(x)| \le \sup_n \|x\| = \|x\|,$$

and so $\|T\| \le 1$. We now have to show that $\|Tx\|_\infty \ge \|x\|$ for every $x \in X$; since $T$ is linear it is sufficient to do this for $x$ with $\|x\| = 1$. Given such an $x$ and $\varepsilon > 0$ choose $x_n$ from the dense sequence with $\|x - x_n\| < \varepsilon$, and then

$$\phi_n(x) = \phi_n(x_n) + \phi_n(x - x_n) = 1 + \phi_n(x - x_n) \ge 1 - \|\phi_n\| \|x - x_n\| > 1 - \varepsilon.$$

So $\|Tx\|_\infty > 1 - \varepsilon$ and since $\varepsilon$ was arbitrary it follows that $\|Tx\|_\infty \ge 1$ as required.

20.10 If $z \notin \mathrm{clin}(E)$, then Proposition 20.4 furnishes an $f \in X^*$ so that $f|_{\mathrm{clin}(E)} = 0$ but $f(z) \ne 0$, which is disallowed by our assumption. If $f \in X^*$ vanishes on $E$, then it vanishes on $\mathrm{Span}(E)$ (finite linear combinations of elements of $E$), and so vanishes on $\mathrm{clin}(E)$ since it is continuous: if $x \in \mathrm{clin}(E)$, then there exist $x_n \in \mathrm{Span}(E)$ such that $x_n \to x$, and then $f(x) = \lim_{n\to\infty} f(x_n) = 0$. So if there exists $f \in X^*$ that vanishes on $E$ but for which $f(z) \ne 0$, then we have

$$|f(z)| = |f(z - y)| \le \|f\|_{X^*} \|z - y\|$$


for any $y \in \mathrm{clin}(E)$. It follows that

$$\mathrm{dist}(z, \mathrm{clin}(E)) = \inf\{\|z - y\| : y \in \mathrm{clin}(E)\} \ge \frac{|f(z)|}{\|f\|_{X^*}} > 0.$$

20.11 If $T \in B(H, K)$, then we have $T^\times \in B(K^*, H^*)$ defined by $T^\times g = g \circ T$. The Hilbert adjoint is defined by setting $(Tu, v)_K = (u, T^* v)_H$, or $R_K v(Tu) = R_H(T^* v)(u)$. We can rewrite this as $(R_K v) \circ T = (R_H \circ T^*) v$, or, using the Banach adjoint, $(T^\times \circ R_K) v = (R_H \circ T^*) v$. This shows that $T^\times \circ R_K = R_H \circ T^*$, and so, applying $R_H^{-1}$ to both sides, we obtain

$$T^* = R_H^{-1} \circ T^\times \circ R_K.$$

20.12 Let $B$ be a bounded subset of $Y^*$; we need to show that $T^\times(B)$ is precompact, i.e. has compact closure. If we let $K = \overline{T(B_X)}$, then $K$ is a compact subset of $Y$, since $T$ is compact. Since any $f \in Y^*$ is a continuous map from $Y$ into $\mathbb{K}$, we can consider its restriction to $K$, which gives an element of $C(K)$. So we can think of $B$ as a subset of $C(K)$, and for any $f \in B$ and any $y_1, y_2 \in K$ we have

$$|f(y_1) - f(y_2)| \le \|f\|_{Y^*} \|y_1 - y_2\|_Y \qquad\text{and}\qquad |f(y)| \le \|f\|_{Y^*} \|y\|,$$

the second inequality since $0 \in K$. So $B$ is a bounded equicontinuous family in $C(K)$, so precompact by the Arzelà–Ascoli Theorem. Now take any sequence $(f_n) \in B$. By the above, $(f_n)$ has a subsequence $(f_{n_j})_j$ that converges uniformly on $K$. Therefore $(f_{n_j} \circ T) \in X^*$ converges uniformly on $B_X$; since $X^*$ is complete, $f_{n_j} \circ T \to g$ in $X^*$, for some $g \in X^*$. Since $T^\times f_{n_j} = f_{n_j} \circ T$, it follows that $T^\times f_{n_j} \to g$ in $X^*$, which shows that $T^\times$ is compact.

20.13 Since $(x_n)$ converges, it is bounded, with $|x_n| \le M$, say. Since $x_n \to \alpha$, given any $\varepsilon > 0$ there exists $N$ such that $|x_n - \alpha| < \varepsilon/2$ for all $n \ge N$. Now choose $N' \ge N$ sufficiently large that $N(M + |\alpha|)/N' < \varepsilon/2$, and then, for every $n \ge N'$,

$$\left|\frac{x_1 + \cdots + x_n}{n} - \alpha\right| = \left|\frac{(x_1 - \alpha) + \cdots + (x_n - \alpha)}{n}\right| \le \left|\frac{(x_1 - \alpha) + \cdots + (x_N - \alpha)}{n}\right| + \left|\frac{(x_{N+1} - \alpha) + \cdots + (x_n - \alpha)}{n}\right|$$

$$\le \frac{N(M + |\alpha|)}{n} + \frac{(n - N)\varepsilon/2}{n} < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$
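The estimate in Exercise 20.13 can be checked numerically. The following is a minimal sketch only: the sequence $x_n = \alpha + (-1)^n/\sqrt{n}$ and the helper name `cesaro_means` are illustrative assumptions, not taken from the text.

```python
from itertools import accumulate

def cesaro_means(xs):
    """Return the running averages (x_1 + ... + x_n)/n for n = 1, ..., len(xs)."""
    return [s / n for n, s in enumerate(accumulate(xs), start=1)]

# Illustrative choice (an assumption, not from the text): x_n -> alpha = 2.
alpha = 2.0
xs = [alpha + (-1) ** n / n ** 0.5 for n in range(1, 10001)]
means = cesaro_means(xs)

# Distance to alpha of the last term and of the last average.
print(abs(xs[-1] - alpha), abs(means[-1] - alpha))
```

Both printed distances are small, in line with the claim that the averages inherit the limit of the sequence.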

Chapter 21

21.1 We have

$$p(x) = \inf\{\lambda > 0 : \lambda^{-1} x \in B_X\} = \inf\{\lambda > 0 : \|\lambda^{-1} x\| < 1\} = \|x\|.$$
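As a sanity check on 21.1, the infimum can be approximated by bisection. This is a sketch under stated assumptions: the space is $\mathbb{R}^2$ with the Euclidean norm, $B_X$ is the open unit ball, and the names `minkowski_functional` and `in_open_unit_ball` are hypothetical helpers introduced here.

```python
import math

def minkowski_functional(x, in_ball, hi=1e6, tol=1e-10):
    """Approximate inf{lam > 0 : x/lam in B} by bisection.

    Assumes the admissible lam form an interval (p(x), infinity) and that
    the starting value hi is admissible.
    """
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if in_ball([t / mid for t in x]):
            hi = mid      # mid is admissible, so the infimum is at most mid
        else:
            lo = mid      # mid is not admissible, so the infimum is at least mid
    return hi

def in_open_unit_ball(v):
    """Membership test for the open Euclidean unit ball in R^2."""
    return math.hypot(*v) < 1.0

x = [3.0, 4.0]
p_x = minkowski_functional(x, in_open_unit_ball)
print(p_x, math.hypot(*x))
```

The two printed values should agree to within the bisection tolerance, as the identity $p(x) = \|x\|$ predicts.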


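21.2 If $a, b \in \bigcap_{\alpha \in A} K_\alpha$ and $\lambda \in (0, 1)$ then $\lambda a + (1 - \lambda) b \in K_\alpha$ for each $\alpha \in A$, since $a, b \in K_\alpha$ and $K_\alpha$ is convex. Since this holds for every $\alpha \in A$, it follows that $\lambda a + (1 - \lambda) b \in \bigcap_\alpha K_\alpha$ and hence $\bigcap_{\alpha \in A} K_\alpha$ is convex.

21.3 Suppose that $a, b \in \overline{K}$; then there exist sequences $(a_n), (b_n) \in K$ such that $a_n \to a$ and $b_n \to b$. If we take $\lambda \in (0, 1)$, then $\lambda a_n + (1 - \lambda) b_n \in K$ and $\lambda a_n + (1 - \lambda) b_n \to \lambda a + (1 - \lambda) b$, so $\lambda a + (1 - \lambda) b \in \overline{K}$, which shows that $\overline{K}$ is convex.

21.4 We will show that $\mathrm{conv}(U)$ is totally bounded, i.e. that for any $\varepsilon > 0$ we can find a cover of $\mathrm{conv}(U)$ by a finite number of open balls of radius $\varepsilon$. It then follows from Exercise 6.10 that $\overline{\mathrm{conv}(U)}$ is compact. Given $\varepsilon > 0$ find a cover of $U$ by a finite number of balls of radius $\varepsilon/2$ with centres in $F := \{x_1, \ldots, x_k\}$. The line segments $L_i = \{\lambda x_i : \lambda \in [0, 1]\}$ are all convex sets, so their sum $L := L_1 + \cdots + L_k$ is also convex, and contains all the $x_i$. Since $L$ is compact, there exists a finite set $G$ such that $L \subset G + \tfrac{\varepsilon}{2} B_X$. Then

$$U \subset F + \tfrac{\varepsilon}{2} B_X \subset L + \tfrac{\varepsilon}{2} B_X \subset G + \tfrac{\varepsilon}{2} B_X + \tfrac{\varepsilon}{2} B_X = G + \varepsilon B_X.$$

Since the set $L + \tfrac{\varepsilon}{2} B_X$ is convex and contains $U$, it must also contain $\mathrm{conv}(U)$, so $\mathrm{conv}(U) \subset G + \varepsilon B_X$. In other words, $\mathrm{conv}(U)$ has a finite cover by balls of radius $\varepsilon$ for every $\varepsilon > 0$. It follows that $\overline{\mathrm{conv}(U)}$ is compact.

21.5 Take any $x \in \mathrm{conv}(U)$ with

$$x = \sum_{j=1}^k \lambda_j x_j, \qquad\text{where } x_j \in U,\ \lambda_j \ge 0, \text{ and } \sum_{j=1}^k \lambda_j = 1,$$

and suppose that $k > n + 1$. Then $k - 1 > n$ and so the $k - 1$ vectors $\{x_2 - x_1, x_3 - x_1, \ldots, x_k - x_1\}$ are linearly dependent: there exist $\{\alpha_j\}_{j=2}^k$, not all zero, such that

$$\sum_{j=2}^k \alpha_j (x_j - x_1) = 0.$$

If we set $\alpha_1 = -\sum_{j=2}^k \alpha_j$, then

$$\sum_{j=1}^k \alpha_j x_j = 0 \qquad\text{and}\qquad \sum_{j=1}^k \alpha_j = 0.$$

Since not all the $\alpha_j$ are zero and $\sum_j \alpha_j = 0$, it follows that at least one of them, $\alpha_i$, say, is positive. For any choice of $\gamma \in \mathbb{R}$ we have

$$x = \sum_{j=1}^k \lambda_j x_j - \gamma \sum_{j=1}^k \alpha_j x_j = \sum_{j=1}^k [\lambda_j - \gamma \alpha_j] x_j. \tag{S.25}$$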


We choose

$$\gamma = \min\left\{\frac{\lambda_j}{\alpha_j} : \alpha_j > 0,\ j = 1, \ldots, k\right\};$$

then $\lambda_j - \gamma \alpha_j \ge 0$ for each $j = 1, \ldots, k$ and $\lambda_i - \gamma \alpha_i = 0$ for some $i \in \{1, \ldots, k\}$. The sum in (S.25) is therefore a convex combination of the $(x_j)$, since we have $\lambda_j - \gamma \alpha_j \ge 0$ and $\sum_{j=1}^k (\lambda_j - \gamma \alpha_j) = 1$. However, we can remove the term from the sum in which $\lambda_i - \gamma \alpha_i = 0$. We continue in this way until we have no more than $n + 1$ terms in the sum.

21.6 If $K$ is convex, then for any $x, y$ and $\lambda \in (0, 1)$ we have

$$\begin{aligned}
\lambda d(x) + (1 - \lambda) d(y) &= \lambda \inf_{a \in K} \|x - a\| + (1 - \lambda) \inf_{b \in K} \|y - b\| \\
&= \inf_{a \in K} \|\lambda x - \lambda a\| + \inf_{b \in K} \|(1 - \lambda) y - (1 - \lambda) b\| \\
&= \inf_{a, b \in K} \bigl( \|\lambda x - \lambda a\| + \|(1 - \lambda) y - (1 - \lambda) b\| \bigr) \\
&\ge \inf_{a, b \in K} \|[\lambda x + (1 - \lambda) y] - [\lambda a + (1 - \lambda) b]\| \\
&\ge \inf_{k \in K} \|[\lambda x + (1 - \lambda) y] - k\| \\
&= d(\lambda x + (1 - \lambda) y),
\end{aligned}$$

since $\lambda a + (1 - \lambda) b \in K$. For the converse simply take $a, b \in K$: then

$$0 \le d(\lambda a + (1 - \lambda) b) \le \lambda d(a) + (1 - \lambda) d(b) = 0,$$

so $d(\lambda a + (1 - \lambda) b) = 0$, i.e. $\lambda a + (1 - \lambda) b \in K$ (since $K$ is closed).

Chapter 22

22.1 First note that finite-dimensional subspaces are complete so they must be closed, and they contain no open balls (otherwise they would be infinite-dimensional) so they are nowhere dense. If we let $Y_n = \mathrm{Span}(x_1, \ldots, x_n)$, then $Y_n$ is nowhere dense, and since $X$ cannot be given as the countable union of nowhere dense subsets, it follows that $X \ne \bigcup_n Y_n$.

22.2 Consider again the sets

$$F_n = \{x \in X : \sup_{T \in S} \|Tx\| \le n\}$$

from the proof of the Principle of Uniform Boundedness. These sets are closed and must all have empty interior (so be nowhere dense), since if one set contained a nonempty open ball we could follow the proof of the Principle of Uniform Boundedness to show that $\sup_{T \in S} \|T\| < \infty$. The complements of these sets $F_n$,

$$G_n = \{x \in X : \sup_{T \in S} \|Tx\| > n\},$$


are therefore open and dense, and so their intersection

$$\bigcap_{n=1}^\infty G_n = \{x \in X : \sup_{T \in S} \|Tx\| = \infty\}$$

is residual.

22.3 For any particular polynomial $p$ we have $p(x) = \sum_{j=0}^m a_j x^j$, i.e. $p$ can be written in the form in (22.5), with $a_j = 0$ for all $j \ge N$. So

$$|T_n p(x)| \le \min(n, N) \max_j |a_j| \quad\Rightarrow\quad \sup_n |T_n p(x)| \le N \max_j |a_j| < \infty.$$

However, if we take $p_n(x) = \sum_{j=0}^n x^j$, then $\|p_n\| = 1$, but $T_n p_n = n$, so we have $\|T_n\|_{X^*} \ge n$. In order not to contradict the Principle of Uniform Boundedness, it must be the case that $(X, \|\cdot\|)$ is not a Banach space, i.e. not complete.
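The contrast in 22.3 between pointwise boundedness and the lack of a uniform bound can be seen in a small experiment. This is a hedged sketch: equation (22.5) and the definition of $T_n$ are not reproduced in this section, so the code assumes that a polynomial is stored as its coefficient list, that the norm is $\max_j |a_j|$, and that $T_n$ sums the first $n$ coefficients. These assumptions are inferred from the estimates above, and the function names are hypothetical.

```python
def poly_norm(coeffs):
    """Assumed norm on polynomials: the largest coefficient in absolute value."""
    return max(abs(a) for a in coeffs)

def T(n, coeffs):
    """Assumed functional T_n: the sum of the first n coefficients a_0, ..., a_{n-1}."""
    return sum(coeffs[:n])

# Pointwise boundedness: for a fixed polynomial, |T_n p| <= N * max|a_j| for every n.
p = [3.0, -1.0, 2.0]              # p(x) = 3 - x + 2x^2, so N = 3
bound = len(p) * poly_norm(p)
pointwise = [abs(T(n, p)) for n in range(1, 50)]

# No uniform bound: p_n has every coefficient equal to 1, so its norm is 1,
# while T_n p_n = n grows without bound.
ratios = [T(n, [1.0] * (n + 1)) / poly_norm([1.0] * (n + 1)) for n in range(1, 6)]
print(max(pointwise) <= bound, ratios)
```

Under these assumptions the ratio $|T_n p_n| / \|p_n\|$ equals $n$, which is the lower bound $\|T_n\| \ge n$ used in the solution.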

22.4 We treat the case when $Y$ is a Banach space. Define a map $T \colon X \to B(Y, Z)$ by setting $(Tx)(y) = b(x, y)$; then since $b(x, y)$ is linear and continuous in $y$ we have $Tx \in B(Y, Z)$ with

$$\|(Tx)(y)\|_Z = \|b(x, y)\|_Z \le C(x) \|y\|_Y.$$

Now consider the collection $\{Tx : x \in B_X\}$, which is a subset of $B(Y, Z)$. Arguing as above, for each $y \in Y$ we have

$$\|Tx(y)\|_Z = \|b(x, y)\|_Z \le C(y) \|x\|$$

since $x \mapsto b(x, y)$ is linear and continuous for each $y \in Y$. This shows, in particular, that

$$\sup_{x \in B_X} \|Tx(y)\|_Z \le C(y) < \infty$$

for each $y \in Y$. The Principle of Uniform Boundedness therefore guarantees that

$$\sup_{x \in B_X} \|Tx\|_{B(Y,Z)} \le M.$$

It follows, since $T$ is linear, that $\|Tx\|_{B(Y,Z)} \le M \|x\|_X$, and therefore

$$\|b(x, y)\|_Z = \|(Tx)(y)\|_Z \le \|Tx\|_{B(Y,Z)} \|y\|_Y \le M \|x\|_X \|y\|_Y.$$

22.5 First recall (see Lemma 3.10) that $\ell^p \subset \ell^q$. We let

$$S_n := \{x \in \ell^p : \|x\|_{\ell^p} \le n\}.$$

Now suppose that $(x^{(k)}) \in \ell^p$ with

$$\sum_{j=1}^\infty |x_j^{(k)}|^p \le n^p$$

and $x^{(k)} \to x$ in $\ell^q$. Then in particular, for any $N$ we have

$$\sum_{j=1}^N |x_j^{(k)} - x_j|^q \to 0,$$

which implies that $x_j^{(k)} \to x_j$ for every $j$. So we have

$$\sum_{j=1}^N |x_j|^p \le \lim_{k \to \infty} \sum_{j=1}^N |x_j^{(k)}|^p \le n^p.$$

Since $N$ is arbitrary, this shows that $S_n$ is closed in $\ell^q$. However, $S_n$ contains no open sets (so its interior is empty). To see this, take some $y \in \ell^q \setminus \ell^p$ (see Exercise 3.7), scaled so that $\|y\|_{\ell^q} < 1$: then $x + \varepsilon y \in B_{\ell^q}(x, \varepsilon)$ for any $x \in S_n$ and $\varepsilon > 0$, but $x + \varepsilon y \notin \ell^p$.

Since $S_n$ is closed and nowhere dense in $\ell^q$, it follows that $\ell^p = \bigcup_{n=1}^\infty S_n$ is meagre in $\ell^q$.

22.6 Consider the maps $T_n \colon \ell^p \to \mathbb{K}$ given by $T_n(y) :=$

$$\sum_{j=1}^n x_j y_j.$$

Then $T_n \in B(\ell^p; \mathbb{K})$ for each $n$, since by Hölder's inequality

$$|T_n(y)| = \left|\sum_{j=1}^n x_j y_j\right| \le \left(\sum_{j=1}^n |x_j|^q\right)^{1/q} \left(\sum_{j=1}^n |y_j|^p\right)^{1/p} \le \left(\sum_{j=1}^n |x_j|^q\right)^{1/q} \|y\|_{\ell^p}. \tag{S.26}$$

If we choose

$$y_j = \begin{cases} |x_j|^q / x_j & 1 \le j \le n \text{ and } x_j \ne 0, \\ 0 & \text{otherwise,} \end{cases}$$

then

$$|T_n(y)| = \sum_{j=1}^n |x_j|^q \le \|T_n\| \|y\|_{\ell^p} = \|T_n\| \left(\sum_{j=1}^n |x_j|^{(q-1)p}\right)^{1/p} = \|T_n\| \left(\sum_{j=1}^n |x_j|^q\right)^{1/p},$$

which shows, since $1 - (1/p) = (1/q)$, that

$$\|T_n\| \ge \left(\sum_{j=1}^n |x_j|^q\right)^{1/q},$$

which combined with (S.26) yields

$$\|T_n\| = \left(\sum_{j=1}^n |x_j|^q\right)^{1/q}. \tag{S.27}$$

We also know that $\sup_n |T_n(y)| < \infty$, since $\sum_{j=1}^\infty x_j y_j$ converges. The Principle of Uniform Boundedness now guarantees that $\sup_n \|T_n\| < \infty$; given (S.27), this implies that $x \in \ell^q$.
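The extremal choice of $y$ in 22.6 can be tested numerically. The sketch below is illustrative only: it works with real sequences, the values $p = 3$ and $x = (2, -1, 0, 0.5)$ are arbitrary assumptions, and `lp_norm` is a hypothetical helper.

```python
def lp_norm(v, p):
    """The l^p norm of a finite real sequence."""
    return sum(abs(t) ** p for t in v) ** (1.0 / p)

p = 3.0
q = p / (p - 1.0)                 # conjugate index, so 1/p + 1/q = 1
x = [2.0, -1.0, 0.0, 0.5]         # illustrative data (an assumption)

# Extremal choice from the solution: y_j = |x_j|^q / x_j where x_j != 0, else 0.
y = [abs(t) ** q / t if t != 0 else 0.0 for t in x]

Tn_y = sum(a * b for a, b in zip(x, y))    # equals sum of |x_j|^q
ratio = Tn_y / lp_norm(y, p)               # |T_n(y)| / ||y||_p for this y
print(ratio, lp_norm(x, q))
```

The two printed numbers should agree up to rounding, which is the statement that the bound in (S.26) is attained and hence that (S.27) holds.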


Chapter 23

23.1 If no $(y_n)$ exists such that $\sum_n \alpha_n y_n < \infty$, then the map $T \colon \ell^\infty \to \ell^1$ is not only one-to-one but also onto. Since $T$ is bounded, the Inverse Mapping Theorem would then guarantee that $T$ is an isomorphism. However, $\ell^\infty$ is not isomorphic to $\ell^1$ ($\ell^\infty$ is not separable but $\ell^1$ is, and separability is preserved under isomorphisms); so in fact there must exist such a sequence $(y_n)$.

23.2 If the $\{e_j\}$ are a basis, then their linear span is dense, and, using Corollary 23.5, we have

$$\left\|\sum_{i=1}^n a_i e_i\right\| = \left\|P_n\left(\sum_{j=1}^m a_j e_j\right)\right\| \le \sup_n \|P_n\| \left\|\sum_{j=1}^m a_j e_j\right\|.$$

Let $Y = \mathrm{Span}(\{e_j\})$, which is dense in $X$ by (i). Note also that it follows from (ii), using induction, that the $\{e_j\}$ are linearly independent and

$$|a_n| = |a_n| \|e_n\| \le 2K \left\|\sum_{j=1}^m a_j e_j\right\|, \qquad n < m.$$

Therefore the maps $\Pi_n \colon Y \to \mathrm{Span}(e_1, \ldots, e_n)$ defined by setting

$$\Pi_n\left(\sum_{j=1}^m a_j e_j\right) = \sum_{j=1}^{\min(m,n)} a_j e_j$$

are bounded linear projections on $Y$ with norm at most $K$. Since $Y$ is dense in $X$, it follows using Exercise 19.3 that each map $\Pi_n$ can be extended uniquely to a map $P_n \colon X \to \mathrm{Span}(e_1, \ldots, e_n)$ that satisfies $\|P_n\|_{B(X)} \le K$. Now, given any $x \in X$ and $\varepsilon > 0$, find $y = \sum_{j=1}^m a_j e_j$ such that $\|y - x\| < \varepsilon$; then, for all $n > m$, we have

$$\|x - P_n x\| \le \|x - y\| + \|y - P_n y\| + \|P_n y - P_n x\| \le \varepsilon + 0 + \|P_n\| \varepsilon \le (1 + K) \varepsilon,$$

and so $P_n x \to x$ as $n \to \infty$.

23.3 We will show that if $x_n \to x$ and $T x_n \to y$, then $y = Tx$, and then the boundedness of $T$ follows from the Closed Graph Theorem. Take any $z \in H$; then $(T x_n, z) = (x_n, Tz)$. Letting $n \to \infty$ on both sides we obtain

$$(y, z) = (x, Tz) = (Tx, z) \qquad\text{for every } z \in H,$$

and so $y = Tx$ as required.

23.4 Clearly $T$ is not bounded: if we set $f_n(x) = x^n$, then $f_n'(x) = n x^{n-1}$, so

$$\|f_n\|_\infty = 1 \qquad\text{and}\qquad \|f_n'\|_\infty = n.$$


However, suppose that $(f_n) \in X$, with $f_n \to f$ and $f_n' \to g$ in the supremum norm; then $(f_n)$ is a Cauchy sequence in $C^1$, and so $f_n \to f$ in $C^1$ with $f \in C^1$ and $f' = g$ (see Theorem 4.12).

23.5 Suppose that $(x_n) \in X$ with $x_n \to x$ and $T x_n \to f$, $f \in X^*$. Since

$$(T x_n - T y)(x_n - y) \ge 0,$$

we can take $n \to \infty$ to obtain

$$(f - T y)(x - y) \ge 0.$$

Set $y = x - tz$ with $z \ne 0$; then

$$0 \le (f - Tx + t\,Tz)(tz) = t[f(z) - Tx(z)] + t^2 (Tz)(z).$$

If this is to hold for all $t \in \mathbb{R}$, then we must have $f(z) = Tx(z)$, i.e. $Tx = f$. The Closed Graph Theorem now implies that $T$ is bounded.

Chapter 26

26.1 Take $F \in X^{**}$. We need to find $x \in X$ such that, for every $f \in X^*$ we have $F(f) = f(x)$. Note that $F \circ T_Y \colon Y \to \mathbb{K}$, and is both linear and bounded, since

$$|F \circ T_Y(y)| \le \|F\|_{X^{**}} \|T_Y(y)\|_{X^*} \le \|F\|_{X^{**}} \|T_Y\|_{B(Y, X^*)} \|y\|_Y.$$

It follows that $F \circ T_Y \in Y^*$, and so we can find an $x \in X$ such that $F \circ T_Y = T_X x$. Now, given $f \in X^*$, apply both sides to $T_Y^{-1} f$ to obtain

$$F(f) = [T_X x](T_Y^{-1} f) = (T_Y T_Y^{-1} f)(x) = f(x)$$

as required.

26.2 If $U$ is bounded, then $\|u\| \le M$ for all $u \in U$ and some $M > 0$, and then $|f(u)| \le \|f\| \|u\| \le M \|f\|$. To show the opposite implication, consider the family of maps $S = \{u^{**} : u \in U\}$ that all lie in $X^{**}$. Then for every $f \in X^*$ we have

$$\sup_{u \in U} |u^{**}(f)| = \sup_{u \in U} |f(u)| < \infty,$$

so the Principle of Uniform Boundedness guarantees that

$$\sup_{u \in U} \|u^{**}\| < \infty.$$

But we know that $\|u^{**}\| = \|u\|$, and so $\sup_{u \in U} \|u\| < \infty$, i.e. $U$ is bounded.

26.3 If $T$ is not bounded, then there exist $x_n \in X$ such that $\|x_n\| = 1$ and $\|T x_n\| \ge n$. The $(T x_n)$ also form an unbounded sequence when considered as elements of $Y^{**}$, so there is a functional $\phi \in Y^*$ for which $\phi(T x_n)$ is unbounded. (If this were not the case, then for each $\phi \in Y^*$, the scalars $(T x_n)^{**}(\phi)$ would be bounded, and then the


Principle of Uniform Boundedness would show that $\|T x_n\|_{Y^{**}}$ is bounded uniformly in $n$.) It follows that $\phi \circ T$ is not a bounded functional on $X$.

26.4 Define an element $\Phi \in X^{**}$ by setting

$$\Phi(f) = \int_0^T f(\xi(t))\, dt \qquad\text{for every } f \in X^*. \tag{S.28}$$

This map is clearly linear, and it is bounded since

$$|\Phi(f)| \le \int_0^T \|f\|_{X^*} \|\xi(t)\|_X\, dt \le \left(\int_0^T \|\xi(t)\|_X\, dt\right) \|f\|_{X^*}$$

and $\int_0^T \|\xi(t)\|_X\, dt < \infty$. Since $X$ is reflexive, it follows that there exists an element $y \in X$ such that $\Phi = y^{**}$, i.e.

$$\Phi(f) = f(y) \qquad\text{for all } f \in X^*.$$

Using (S.28) this yields

$$f(y) = \int_0^T f(\xi(t))\, dt. \tag{S.29}$$

To prove the inequality, let $f \in X^*$ be the support functional for $y$, i.e. $\|f\|_{X^*} = 1$ and $f(y) = \|y\|_X$; then from (S.29) we obtain

$$\left\|\int_0^T \xi(t)\, dt\right\|_X = \|y\|_X = f(y) = \int_0^T f(\xi(t))\, dt \le \int_0^T \|f\|_{X^*} \|\xi(t)\|_X\, dt = \int_0^T \|\xi(t)\|_X\, dt.$$

26.5 Suppose that $T$ does not map $L^q$ onto $(L^p)^*$, i.e. that there exists $\Phi \in (L^p)^*$ that cannot be realised as $T(g)$ for any $g \in L^q$. Since $T(L^q)$ is a closed subspace of $(L^p)^*$, it follows using the distance functional from Proposition 20.4 that there exists some non-zero $F \in (L^p)^{**}$ such that $F(\Phi) = 0$ for every $\Phi \in T(L^q)$. Since $L^p$ is reflexive it follows that there exists $f \in L^p$ such that

$$F(\Phi) = \Phi(f) \qquad\text{for every } \Phi \in (L^p)^*.$$

It follows that $\Phi(f) = 0$ for every $\Phi \in T(L^q)$, and so $f = 0$, which in turn implies that $F = 0$. But this contradicts the fact that $F$ is non-zero, so $T$ must be onto.

Chapter 27

27.1 We know that, given any closed linear subspace $M$ and $y \notin M$, there exists an $f \in X^*$ such that $f|_M = 0$ and $f(y) \ne 0$. So if $f(x) = 0$ for every $f \in X^*$ such that $f|_M = 0$ we must have $x \in M$. So if $x_n \in M$ and $x_n \rightharpoonup x$, then for all such $f$ we have $f(x) = \lim_{n \to \infty} f(x_n) = 0$ and so $x \in M$. (Or, using Theorem 27.7, $M$ is convex so closed implies weakly closed.)

27.2 Consider the bounded sequence $(e^{(j)})$ in $\ell^1$, and suppose that $(e^{(n_k)})$ is a weakly convergent subsequence. Choose $f \in \ell^\infty$ such that $f_{n_k} = (-1)^k$ and is zero otherwise. Then $L_f(e^{(n_k)}) = (-1)^k$ which does not converge as $k \to \infty$. It follows that $\ell^1$ is not


reflexive, since the unit ball is weakly (sequentially) compact in any reflexive Banach space.

27.3 The closed convex hull $H$ of $(x_n)$ is closed and convex, so it is also weakly closed. It follows that $x$ is contained in $H$. So $x$ can be approximated by finite convex combinations of the $x_n$. Now, denote by $\mathrm{conv}(A)$ the convex hull of $A$, and notice that

$$\mathrm{conv}(\{x_n\}_{n=1}^\infty) = \bigcup_{k=1}^\infty \mathrm{conv}\{x_1, \ldots, x_k\};$$

it follows that their closures are equal. We know that

$$x \in \overline{\bigcup_{k=1}^\infty \mathrm{conv}\{x_1, \ldots, x_k\}},$$

and so

$$0 = \lim_{n \to \infty} \mathrm{dist}\left(x, \bigcup_{k=1}^n \mathrm{conv}\{x_1, \ldots, x_k\}\right) = \lim_{n \to \infty} \mathrm{dist}(x, \mathrm{conv}\{x_1, \ldots, x_n\}).$$

It follows that we can choose a sequence $(y_n)$ with $y_n \in \mathrm{conv}\{x_1, \ldots, x_n\}$ such that $y_n \to x$.

27.4 If $x = 0$, then the result is immediate. So assume that $x \ne 0$. For $n$ sufficiently large that $\|x_n\| \ne 0$, set

$$y_n = \frac{x_n}{\|x_n\|} \qquad\text{and}\qquad y = \frac{x}{\|x\|}.$$

Then $y_n \rightharpoonup y$ and $\|y_n\| = 1$. It follows that $y_n + y \rightharpoonup 2y$ and so

$$\|y\| \le \liminf_{n \to \infty} \|(y_n + y)/2\|.$$

But also $\|y\| = 1$ and $\|y_n\| = 1$, so

$$\limsup_{n \to \infty} \|\tfrac12 (y_n + y)\| \le 1, \tag{S.30}$$

whence $\|\tfrac12 (y_n + y)\| \to 1$. The uniform convexity of $X$ implies that $\|y_n - y\| \to 0$ as $n \to \infty$: suppose not; then there is an $\varepsilon > 0$ and a subsequence $y_{n_k}$ such that $\|y_{n_k} - y\| > \varepsilon$. But then it follows from the uniform convexity of $X$ that $\left\|\frac{y_{n_k} + y}{2}\right\|$ …

… $n$, so that $x_j^{(n)} \to 1$ as $n \to \infty$ for every $j$, and $\|x^{(n)}\|_\infty = 1$. So if the lemma held in $\ell^\infty$ we would have $x^{(n)} \rightharpoonup \mathbf{1}$, where $\mathbf{1}$ is the sequence consisting entirely of 1s. However, $L(x^{(n)}) = 0$ for every $n$, but $L(\mathbf{1}) = 1$, so $x^{(n)} \not\rightharpoonup \mathbf{1}$.

27.7 We have $e^{(j)} \rightharpoonup 0$ as $j \to \infty$ (see the beginning of Chapter 27). Since $T$ is compact, we then have $T e^{(j)} \to 0$ using Lemma 27.4.

27.8 We first prove the 'if' part. Let $\hat f(x) = f(x) - f(0)$; then, given $\varepsilon > 0$, there exists $\delta(\varepsilon) > 0$ such that $|\hat f(x)| < \varepsilon$ whenever $|x| < 2\delta(\varepsilon)$. Define

$$\hat f_\varepsilon(x) = \begin{cases} \hat f(x) & 2\delta(\varepsilon) \le |x| \le 1, \\ \hat f(x) \left(\dfrac{|x|}{\delta(\varepsilon)} - 1\right) & \delta(\varepsilon) < |x| < 2\delta(\varepsilon), \\ 0 & |x| \le \delta(\varepsilon). \end{cases}$$

Then $\|\hat f_\varepsilon - \hat f\|_\infty < \varepsilon$ and $\hat f_\varepsilon = 0$ on $(-\delta/2, \delta/2)$. It follows using (27.11) that

$$\int_{-1}^1 \hat f_\varepsilon(t) \phi_n(t)\, dt \to 0 \qquad\text{as } n \to \infty$$

and, using (27.12),

$$\left|\int_{-1}^1 \hat f(t) \phi_n(t)\, dt - \int_{-1}^1 \hat f_\varepsilon(t) \phi_n(t)\, dt\right| \le \int_{-1}^1 |\hat f_\varepsilon(t) - \hat f(t)|\, \phi_n(t)\, dt \le \varepsilon.$$

Therefore, using (27.13) again,

$$\left|\int_{-1}^1 f(t) \phi_n(t)\, dt - f(0) \int_{-1}^1 \phi_n(t)\, dt\right| = \left|\int_{-1}^1 \hat f(t) \phi_n(t)\, dt\right|$$


$$\le \left|\int_{-1}^1 \hat f_\varepsilon(t) \phi_n(t)\, dt\right| + \varepsilon,$$

which proves (27.10). For 'only if', (27.11) is clearly required since it is a consequence of (27.10) with $f \equiv 1$ on $[-1, 1]$, and (27.12) is immediate from (27.10) when $g(0) = 0$. The only thing left to prove is the boundedness in (27.13); to see this, we regard

$$f \mapsto \int_{-1}^1 k_n(t) f(t)\, dt$$

as an element $F_n$ of $C([-1, 1])^*$, and (27.10) says that $F_n \overset{*}{\rightharpoonup} \delta_0$ (where $\delta_0(f) := f(0)$ for $f \in C([-1, 1])$), so $\|F_n\|_{C^*}$ must be bounded. We showed in Example 11.9 that

$$\|F_n\|_{C^*} = \int_{-1}^1 |k_n(t)|\, dt,$$

and so (27.13) now follows.

27.9 Each $C_n$ is clearly closed, convex, and non-empty; since $C_n$ is a subset of $B_X$ it is also bounded. Suppose that $x \in C_k$ for some $k$; then, for each $\varepsilon > 0$, there is an $m(\varepsilon)$ and some $y \in \mathrm{conv}\{x_k, \ldots, x_{k+m(\varepsilon)}\}$ such that $\|x - y\| < \varepsilon$. It follows that

$$|f_n(x)| = |f_n(x - y)| < \varepsilon \qquad\text{for } n > k + m(\varepsilon).$$

This shows that $f_n(x) \to 0$ as $n \to \infty$. However, since $f_k(x_j) \ge \theta$ for all $j \ge k$, it follows that we have $f_k(z) \ge \theta$ for all $z \in \mathrm{conv}\{x_k, x_{k+1}, x_{k+2}, \ldots\}$. In particular, if $x \in \bigcap_n C_n$, then we must have $f_n(x) \ge \theta$ for each $n$. But this contradicts the fact that $f_n(x) \to 0$, and so $\bigcap_n C_n = \emptyset$.

27.10 For each $n$ choose some $x_n \in C_n$; then, since $(x_n)$ is bounded, there is a subsequence $(x_{n_j})$ that converges weakly to some $x \in X$. If $x \notin C_k$ for some $k$ then by Corollary 21.8 there exists $f \in X^*$ such that

$$f(x) < \inf_{y \in C_k} f(y) \le f(x_{n_j})$$

for all $j$, since $x_{n_j} \in C_{n_j} \subset C_k$. It follows that

$$f(x) \le \lim_{j \to \infty} f(x_{n_j}) = f(x),$$

References

Adams, R. A. (1975) Sobolev spaces. Academic Press, New York.
Banach, S. ([1932] 1978) Théorie des opérations linéaires. Repr. Chelsea, New York.
Bergman, G. M. (1997) The Axiom of Choice, Zorn's Lemma, and all that. https://math.berkeley.edu/~gbergman/grad.hndts/AC+Zorn+.ps.
Bollobás, B. (1990) Linear analysis. Cambridge University Press, Cambridge.
Brezis, H. (2011) Functional analysis, Sobolev spaces and partial differential equations. Springer, New York.
Brown, A. L. and Page, A. (1970) Elements of functional analysis. Van Nostrand Reinhold, London.
Carothers, N. L. (2005) A short course on Banach space theory. Cambridge University Press, Cambridge.
Clarkson, J. A. (1936) Uniformly convex spaces. Trans. AMS 40, 396–414.
Costara, C. and Popa, D. (2003) Exercises in functional analysis. Kluwer Academic, Dordrecht, Netherlands.
Eberlein, W. F. (1947) Weak compactness in Banach spaces. I. Proc. Natl. Acad. Sci. U.S.A. 33, 51–3.
Enflo, P. (1973) A counterexample to the approximation problem in Banach spaces. Acta Math. 130, 309–17.
Evans, L. C. (1998) Partial differential equations. American Mathematical Society, Providence, RI.
Friedberg, S., Insel, A. and Spence, L. (2014) Linear algebra. Pearson, Harlow.
Giles, J. R. (2000) Introduction to the analysis of normed linear spaces. Cambridge University Press, Cambridge.
Goffman, C. and Pedrick, G. (1983) First course in functional analysis. Chelsea, New York.
Hartman, P. (1973) Ordinary differential equations. John Wiley, Baltimore.
Heinonen, J. (2003) Geometric embeddings of metric spaces, Report 90, Department of Mathematics and Statistics, University of Jyväskylä.
Helly, E. (1912) Über lineare Funktionaloperationen. S.-B. K. Akad. Wiss. Wien Math.-Naturwiss. Kl. 121, 265–97.
Holland, F. (2016) A leisurely elementary treatment of Stirling's formula. Irish Math. Soc. Bull. 77, 35–43.
James, R. C. (1951) A non-reflexive Banach space isometric with its second conjugate space. Proc. Natl. Acad. Sci. U.S.A. 37, 174–7.
James, R. C. (1964) Weak compactness and reflexivity. Israel J. Math. 2, 101–19.
Jordan, P. and von Neumann, J. (1935) On inner products in linear metric spaces. Ann. Math. 36, 719–23.
Körner, T. W. (1989) Fourier analysis. Cambridge University Press, Cambridge.
Kreyszig, E. (1978) Introductory functional analysis with applications. John Wiley, New York.
Lax, P. D. (2002) Functional analysis. John Wiley, New York.
Megginson, R. E. (1998) An introduction to Banach space theory. Graduate Texts in Mathematics 183. Springer, New York.
Meise, R. and Vogt, D. (1997) Introduction to functional analysis. Oxford University Press, Oxford.
Munkres, J. R. (2000) Topology, second edition. Prentice Hall, Upper Saddle River, NJ.
Naylor, A. W. and Sell, G. R. (1982) Linear operator theory in engineering and science. Springer Applied Mathematical Sciences 40. Springer, New York.
Priestley, H. (1997) Introduction to integration. Oxford University Press, Oxford.
Pryce, J. D. (1973) Basic methods of linear functional analysis. Hutchinson, London.
Renardy, M. and Rogers, R. C. (1993) An introduction to partial differential equations. Texts in Applied Mathematics 13. Springer, New York.
Rudin, W. (1966) Real and complex analysis. McGraw-Hill, New York.
Rudin, W. (1991) Functional analysis. McGraw-Hill, New York.
Rynne, B. P. and Youngson, M. A. (2008) Linear functional analysis. 2nd edn. Springer, London.
Stein, E. M. and Shakarchi, R. (2005) Real analysis: measure theory, integration, and Hilbert spaces. Princeton University Press, Princeton, NJ.
Sutherland, W. A. (1975) Introduction to metric and topological spaces. Oxford University Press, Oxford.
Yosida, K. (1980) Functional analysis. Springer Classics in Mathematics. Springer, Berlin.
Young, N. (1988) An introduction to Hilbert space. Cambridge University Press, Cambridge.
Zeidler, E. (1995) Applied functional analysis. Springer Applied Mathematical Sciences 108. Springer, New York.

Index

adjoint operator, 159, see also Banach adjoint
  of a matrix transformation, 162
  of compact operator is compact, 177
  of shifts, 162
  of unbounded operator, 265
  relationship to Banach adjoint, 227
  self-adjoint, see self-adjoint operator
  spectrum, 168
  sums and products, 161
algebra (of continuous functions), 77
  separates points, 79
almost everywhere, 94, 308
  convergence of sequences converging in $L^p$, 313
Apollonius's identity, 108
approximate eigenvalues, 271
Arzelà–Ascoli Theorem
  compact version, 85
  precompact version, 83
Axiom of Choice, 301
  equivalent to Zorn's Lemma, 302

$B(X)$, 139
$B(X, Y)$, 139
  complete when $Y$ is complete, 145
  norm, 139
Baire Category Theorem
  meagre form, 242
  residual form, 240
Banach adjoint, 222
  inherits compactness, 227
  isomorphisms and isometries, 223
  relationship to Hilbert adjoint, 227
Banach Fixed Point Theorem, 63
Banach Isomorphism Theorem, 251
Banach limits, 224–226
Banach space, 54, see also completeness
  complexification, 171
Banach–Alaoglu Theorem, 293, 328
Banach–Steinhaus Theorem, see Principle of Uniform Boundedness
basis
  for a topology, 320
  Hamel, see Hamel basis
  orthonormal, see orthonormal basis
  Schauder, see Schauder basis
basis constant, 252
Bernstein polynomials, 72
Bessel's inequality, 115
bijective, 12
Bolzano–Weierstrass Theorem, 26
bounded linear operator, 138
  complexification, 171
  composition again bounded, 140
  is continuous, 139
  kernel is closed, 146
  norm, 139
  on a finite-dimensional space, 138
  range need not be closed, 146
  spectral radius, 172
bounded set, 21

$c_0$, 6
  as normed space, 40
  dual space is $\ell^1$, 206, 209
  is complete, 58, 64
  is not reflexive, 279
  is separable, 50, 51
$c_{00}$, 6
  dense in $\ell^p$, $1 \le p < \infty$, 50
$C([a, b])$, 5
  definition of supremum norm, 41


  dual space, 201
  is infinite-dimensional, 9
  is separable, 75
  not complete with $L^1$ norm, 89
$C_b(X; \mathbb{K})$, 41
  is complete, 59
$C(X; \mathbb{K})$ is complete, 60
$C(\overline{\Omega})$ dense in $L^p(\Omega)$, 96
$C^1([a, b])$ is complete, 60
$C^k([a, b])$, 61
Cauchy sequence, 53
Cauchy–Schwarz inequality, 30
  in an inner-product space, 103
  in $L^2$, 104
chain, 14, 301
characteristic function, 307
Clarkson's inequalities, 135
clin, see closed linear span
closed ball, 36
closed convex hull, 236
  of compact set is compact, 239
Closed Graph Theorem, 255
closed linear span
  and linear functionals, 227
  of an orthonormal set, 124
closed operator, 267
  extension of symmetric operator, 268
closed set, 20
  definition with sequences, 21
closed subspace, 48
  inherits reflexivity, 279
closure, 24
compact, 26, 323
  closed bounded subsets of $\mathbb{K}$, 26
  closed bounded subsets of $\mathbb{K}^n$, 28
  continuous function on a compact set, 28
  equivalent to sequentially compact in a metric space, 324
  Hilbert cube, 124
  Hilbert–Schmidt operator, 176
  image under a continuous map, 28
  implies closed and bounded, 27
  product of compact sets, 27
  subsets of finite-dimensional spaces, 68
  unit ball in finite dimensions, 69
compact operator, 173
  adjoint is compact, 177
  and weak convergence, 284
  composition, 178
  $K(X, Y)$, see $K(X, Y)$
  spectrum, 263


  spectrum contains 0, 178
  when also self-adjoint
    eigenvalues tend to zero, 183
    eigenvectors form a basis, 185
    spectrum is closure of point spectrum, 186
comparable, 14, 301
completeness, 54
  and absolutely convergent sequences, 62
  of $B(X, Y)$, 145
  of $c_0$, 58, 64
  of $C(X; \mathbb{K})$, 60
  of $C_b(X; \mathbb{K})$, 59
  of $C^1([a, b])$, 60
  of $F_b(X; \mathbb{K})$, 58
  of $K(X, Y)$, 174
  of $\mathbb{K}^n$, 54
  of $\ell^p$, 57
  of $\ell^\infty$, 64
  of $L^p$, 96, 311
  of closed subspaces of complete spaces, 55
  of finite-dimensional spaces, 67
  of products, 56
  of quotient space, 64
  preserved by equivalent norms, 55
  preserved under isomorphisms, 55
completion
  of a metric space, 98
  of a normed space, 91
  unique up to isometric isomorphism, 93
complex-linear map, 214
complexification
  of a Banach space, 171
  of a bounded linear operator, 171
  of a compact self-adjoint operator, 189
  of a Hilbert space, 171
Condensation of Singularities, 244
  for a residual set, 247
conjugate indices, 202
conjugate-linear map, 10
continuity
  equivalent to sequential continuity, 22
  in metric spaces, 22
  in terms of a sub-basis, 322
  in terms of open sets, 22
  in topological spaces, 320
  uniform, 29
continuous function, see also continuity
  on a compact set, 28
continuous spectrum, 270
Contraction Mapping Theorem, 63


convergence
  and inner products, 106
  in metric spaces, 21
  in normed spaces, 43, 64
  in terms of open sets, 21
  in topological spaces, 320
convex function, 37, 51
convex hull, 235
  closed, see closed convex hull
convex set, 36
  and linear functionals, 235
  closed implies weakly closed, 289
  closest point in Hilbert space, 126
  functional separation theorem, 230
countably additive, 306
cover, 26

density, 24
  of $C(\overline{\Omega})$ in $L^p(\Omega)$, 313
diagonal argument, 84
dimension, 9
Dini's Theorem, 86
direct sum, 129
discrete metric, 19
distance functional, 220
  in Hilbert spaces, 226
distance to a closed set, 68
domain of an operator, 265
Dominated Convergence Theorem, 310
dual space, 153, 201
  of $c_0$, 206, 209
  of $C([a, b])$, 201
  of $\ell^1$, 206
  of $\ell^p$, 204
  of $L^p$, $1 \le p < \infty$, 207, 315
    proof using reflexivity, 281
    proof using Riesz Representation Theorem, 209
    proof using uniform convexity, 317
  of $\mathbb{R}^n$, 154
  separability passed from $X^*$ to $X$, 221

$e^{(j)}$ (basis sequence in $\ell^p$), 9
eigenfunctions of Sturm–Liouville problem, 190
  form a basis, 197
eigenspace, 166
eigenvalues, 165, 166
  bounded by operator norm, 166
  in finite dimensions, 165
  linearly independent eigenvectors, 166


  min-max expression, 189
  multiplicity, 166
  of compact operator, 258–263
  of compact self-adjoint operator, 182–183, 186
  of self-adjoint operators are real, 181
  of Sturm–Liouville problem
    are positive, 191
    are simple, 193
  real for a symmetric operator, 270
eigenvectors, 166
  linearly independent for distinct eigenvalues, 166
  of compact self-adjoint operator form a basis, 185
equicontinuous, 83
equivalent norms, 43
  define same open sets, 45
  on finite-dimensional spaces, 66
  preserve compactness and convergence, 45
  preserve completeness, 55
extreme point, 236
extreme set, 237
  contains an extreme point, 238

$F(U, V)$, 4
$F_b(X; \mathbb{K})$ is complete, 58
factorial, 72
Fatou's Lemma, 309
finite-dimensional spaces
  all norms are equivalent, 66
  compact subsets, 68
  have compact unit balls, 69
  are complete, 67
  linear operators are bounded, 138
Fourier series
  cosine series, 75, 81, 125
  exponential series, 82, 120
  non-convergence, 245–247
  Parseval's identity, 120
  sine series, 76, 198
Fourier transform, 172
Friedrichs extension, 269
Fubini's Theorem, 163, 310
functional separation theorem
  complex version, 232
  real version, 230

Gram–Schmidt orthonormalisation, 114
Green's function, 193


Hahn–Banach Theorem
  complex version, 216
  in normed spaces, 219
  proof in a Hilbert space, 217
  proof in a separable space, 218
  real version, 211
Hamel basis, 8, 16
  for any vector space, 14
  non-existence for Banach space, 70, 247
Hellinger–Toeplitz Theorem, 256
Helly's Theorem, 293
Hilbert adjoint, see adjoint operator
Hilbert cube is compact, 124
Hilbert space, 107
  closed subspace is Hilbert, 107
  complexification, 171
  is reflexive, 275
  non-separable example, 123
  products, 108
  separable ($H \equiv \ell^2$), 122
  weak convergence, 282
Hilbert–Schmidt operator, 176
  norm bounds operator norm, 178
Hilbert–Schmidt Theorem, 184
Hölder's inequality, 202
homeomorphism, 31
hyperplanes, 233
  and linear functionals, 233

identity map, 11
image, see range
induced norm, 104
infinite-dimensional, 9
initial segment, 301
injective, 11
  iff surjective in finite dimensions, 12
inner product, 101
  and convergence, 106
  induces norm, see induced norm
  on $\mathbb{K}^n$, 102
  on $\ell^2$, 102
  on $L^2$, 103
inner product space, 101
  complete, see Hilbert space
interior, 23
inverse, 12
  linear if map is linear, 12
  of composition, 13
  of invertible self-adjoint operator is self-adjoint, 164
Inverse Mapping Theorem, 251
invertibility, 147
  equivalent conditions, 147
  in finite dimensions, 148
  is open in $B(X, Y)$, 148
  of commuting operators, 150
  of products, 149
isometric isomorphism, 46
  Banach adjoint, 223
  preserves reflexivity, 279
isomorphism, 46
  Banach adjoint, 223
  preserves completeness, 55

$J$ (mapping into second dual), 274
Jordan–von Neumann Theorem, 108

$\mathbb{K} = \mathbb{R}$ or $\mathbb{C}$, 3
$\mathbb{K}^n$
  as a normed space, 38
  inner product, 102
  is complete, 54
  standard metric, 18
$K(X, Y)$ is complete, 174
kernel
  of a bounded linear map is closed, 146
  of a linear map, 11
  trivial implies injective, 11
Krein–Milman Theorem, 238

$L(X, Y)$ and $L(X)$ are vector spaces, 11
$\ell^1$, see also $\ell^p$
  dual space is $\ell^\infty$, 206
  is not reflexive, 279
  weak and strong convergence coincide, 286
$\ell^2$
  inner product, 102
$\ell^p$, 5
  as normed space, 40
  dual space is $\ell^q$ for $1 \le p < \infty$, 204
  Hölder's inequality, 202
  inclusions, 40
  is complete, 57
  is infinite-dimensional, 9
  is reflexive for $1 < p < \infty$, 276
  is separable for $1 \le p < \infty$, 49
  Schauder basis, 111
  weak convergence, 285
$\ell^\infty$, see also $\ell^p$
  dual is not $\ell^1$, 222
  is complete, 64
  is not reflexive, 279
  is not separable, 49


$L^2$, 95
  Cauchy–Schwarz inequality, 104
  inner product, 103
$L^p$, 95, 311
  C() dense in $L^p(\Omega)$, 313
  dual space is $L^q$, 207, 315
    proof using reflexivity, 281
    proof using Riesz Representation Theorem, 209
    proof using uniform convexity, 317
  Hölder’s inequality, 203
  inclusions, 209
  is complete, 96, 311
  is reflexive for $1 < p < \infty$, 277
  is separable for $1 \le p < \infty$, 313
  is separable [$L^p(a, b)$], 96
  is uniformly convex, 135
  norm by duality, 208
$L^\infty$, 96, 311
  dual is not $L^1$, 222
Lax–Milgram Lemma, 158
Lebesgue integral, 310
Lebesgue measure, 305
left shift, see shift operators
Legendre polynomials, 136
linear functionals, 153
  and convex sets, 235
  bounded, see bounded linear functional
linear map, see linear operator
linear operator, 10
  bounded, see bounded linear operator
  compact, see compact linear operator
  composition is linear, 11
  continuous iff bounded, 139
  has linear inverse, 12
linear span, 7
linearly independent, 7
Lipschitz function, 85
  approximating continuous functions, 88
Lusin’s Theorem, 307
maximal element, 301
maximal linearly independent set, 8
Mazur’s Theorem, 289
meagre, 242
measurable function, 306
measurable set, 305
measure zero, 94
metric, 17
  derived from a norm, 36
  discrete, 19


  on subsets and products, 19
metric space, 17
  bounded set, 21
  continuity, 22
  convergence, 21
  open ball, 20
  product, 19, 30
Milman–Pettis Theorem, 281
min-max expression for eigenvalues, 189
Minkowski functional, 228
Minkowski’s inequality, 208
  in $\mathbb{K}^n$, 39
  in $\ell^p$, 40
modulus of continuity, 86
Monotone Convergence Theorem, 209, 309
multiplicity of eigenvalues, 166
$\binom{n}{k}$, 72
Neumann series, 151
norm, 35
  all equivalent on a finite-dimensional space, 66
  equivalent, see equivalent norms
  gives rise to a metric, 36
  induced by inner product, see induced norm
  $L^p$, 42
  of bounded linear operator, 139
  on B(X, Y), 139
    alternative definitions, 150
  on finite-dimensional space, 39
  on $X^*$ (dual space), 153
  supremum, 41
normed space, 35
  complete, see Banach space
  convergence, 42
nowhere dense, 241
null sequences, 6
numerical range, 180, 188, 226

one-to-one, see injective
onto, see surjective
open ball
  in metric spaces, 20
  in normed spaces, 36
Open Mapping Theorem, 249
open set, 20
  is union of open balls, 31
ordinary differential equation
  with continuous right-hand side, 88
  with Lipschitz right-hand side, 64
orthogonal, 112


orthogonal complement, 129
orthogonal projection, 129
orthogonal series
  convergence, 116
orthonormal basis, 117
  criteria, 118
  for any Hilbert space, 125
  in finite-dimensional spaces, 115
orthonormal set, 112
  via Gram–Schmidt process, 114
outer measure, 305
parallelogram law, 105
  characterises inner products, 108
Parseval’s identity, 116, 118, 124
  for Fourier series, 120
partial order, 13, 301
point spectrum, 166, 270
pointwise convergence, 51
  and weak convergence, 288
polarisation identity, 106
polynomials, 5
  dense in $L^p(a, b)$, 96
  orthogonal in $L^2(-1, 1)$, 132
  spectral mapping theorem, 170
precompact, 83
preimage, 22
Principle of Uniform Boundedness, 242
product
  metric, 19
  of compact sets is compact, 27
  of compact topological spaces is compact, 327
  of complete spaces is complete, 56
  of Hilbert spaces, 108
  of invertible maps, 149
  of metric spaces, 30
  of vector spaces, 4
  topology, 326
Pythagoras Theorem, 113
$\mathbb{Q} + i\mathbb{Q}$, 25
quotient space, 15
  in Hilbert spaces, 109, 136
  is a vector space, 15
  is complete, 64
Radon–Nikodym Theorem, 315
range, 11
  finite-dimensional implies compact, 173


  of a bounded linear map need not be closed, 146
real-linear map, 214
reflexivity, 275
  and weak sequential compactness, 294, 297
  $c_0$ not reflexive, 279
  $C([-1, 1])$ not reflexive, 277
  inherited by closed subspaces, 279
  $\ell^1$ and $\ell^\infty$ not reflexive, 279
  of Hilbert spaces, 275
  of $L^p$ for $1 < p < \infty$, 277
  of $\ell^p$ for $1 < p < \infty$, 276
  preserved under isometric isomorphisms, 279
  $X$ reflexive iff $X^*$ reflexive, 278
residual, 240
residual spectrum, 270
  empty for self-adjoint operators, 271
resolvent set
  for bounded operators, 165
  for maps between Banach spaces, 251
  for unbounded operators, 270
Riesz map, 155
  is isometric isomorphism when H is real, 202
Riesz Representation Theorem, 155
Riesz’s Lemma, 68
right shift, see shift operators
Schauder basis, 110
  characterisation of a basis, 256
  for $\ell^p$, 111
  in Banach spaces, 252
Schur’s Theorem, 286
second dual, 273
  canonical mapping J, 274
self-adjoint operator, 161
  closed in B(H), 164
  for unbounded operators, 266
  has real eigenvalues, 181
  inverse is self-adjoint if bounded, 164
  orthogonality of eigenfunctions, 181
  residual spectrum is empty, 271
  when also compact
    eigenvalues tend to zero, 183
    eigenvectors form a basis, 185
    spectrum is closure of point spectrum, 186
seminorm, 211, 217
separability, 25
  equivalent characterisations, 48



  of $c_0$, 50, 51
  of C([a, b]), 75
  of $\ell^p$, 49
  of $L^p$, $1 \le p < \infty$, 313
  of $L^p(a, b)$, 96
  of products, 31
  passes from $X^*$ to $X$, 221
  passes to subsets, 25
sequence
  Cauchy, see Cauchy sequence
  in Banach space
    absolute convergence implies convergence, 61
sequential continuity, 31
  equivalent to continuity, 22
sequentially compact, 26, 323
  equivalent to compact in a metric space, 324
shift operators, 141
  adjoints, 162
  eigenvalues and spectrum, 167–169
  not invertible, 150
σ-algebra, 306
simple function, 306
Spectral Mapping Theorem, 170, 183
spectral radius, 172
spectrum
  in finite dimensions, 166
  of adjoint operators, 168
  of bounded operators, 166
    compact subset of $\mathbb{C}$, 167
  of compact operators, 263
    contains 0, 178
  of maps between Banach spaces, 251
  of unbounded operators, 270–272
∗∗ (map into second dual), 273
Stone–Weierstrass Theorem
  complex version, 81
  real version, 79
strictly convex Banach space, 135, 226
strong convergence, 282
Sturm–Liouville problem, 190
  eigenfunctions form a basis, 197
  eigenvalues are positive, 191
  eigenvalues are simple, 193
  integral formulation, 195
  orthogonality of eigenfunctions, 191
sub-basis (for a topology), 321
subalgebra, see algebra
subcover, 26
sublinear functional, 211
subsequence, 26

subspace, 6
  finite-dimensional subspace is closed, 70
  of normed space need not be closed, 50
support functional, 219
  in Hilbert spaces, 226
supremum norm, 41
surjective, 12
  iff injective in finite dimensions, 12
symmetric operator, 161
  eigenvalues are real, 270
Tonelli’s Theorem, 310
topological space, 320
topology, 319
  basis, 320
  on product space, 326
  sub-basis, 321
  weak-∗, 322
totally bounded, 87
totally ordered, 301
triangle inequality, 17, 35
Tychonoff’s Theorem, 327
unbounded operator, 265
  adjoint, 265
  resolvent and spectrum, 269
  self-adjoint, 266
    spectrum is real and closed, 272
uniform continuity, 29
uniform convergence, 58
uniformly convex Banach space, 109
  closest point in a convex set, 136
  $L^p$, 135
  weak plus norm convergence implies strong, 296
upper bound, 14, 301
vector space, 3
  always has a Hamel basis, 14, 16
  dimension, 9
  product, 4
  subspace, 6
weak convergence, 282–283
  and pointwise convergence, 288
  becomes strong under action of a compact operator, 284
  in a uniformly convex Banach space
    becomes strong with convergence of norms, 296
  in Hilbert spaces, 282


    becomes strong with convergence of norms, 284
  in $\ell^1$ is strong, 286
  in $\ell^p$, 285
weak-∗ compactness in separable spaces, 293
weak-∗ convergence, 290
weak-∗ topology, 322
  and weak-∗ convergence, 322
  metrisable on $B_{X^*}$, 329
weakly closed, 289
  closed convex sets are weakly closed, 289
weakly sequentially compact, 294
  if and only if reflexive, 297

  unit ball in reflexive space, 294
Weierstrass Approximation Theorem, 71
well-ordered set, 302
Wronskian, 192
$X^* = B(X; \mathbb{K})$ (dual space), 153
$x^{**}$ (map into second dual), 273
Young’s inequality, 202, 208
Zorn’s Lemma, 14, 125, 211, 238, 326
  equivalent to Axiom of Choice, 302
