The aim of this book is to provide a concise but complete introduction to the main mathematical tools of nonlinear funct


*English*
*Pages 622*
*Year 2018*

- Author / Uploaded
- Nikolaos S. Papageorgiou
- Patrick Winkert

*Table of contents: Preface; Contents; 1. Basic Topology; 2. Measure Theory; 3. Basic Functional Analysis; 4. Banach Spaces of Functions and Measures; 5. Convex Functions – Nonsmooth Analysis; 6. Nonlinear Analysis; Bibliography; Index; List of Symbols*

Nikolaos S. Papageorgiou and Patrick Winkert Applied Nonlinear Functional Analysis

Also of Interest Functional Analysis. A Terse Introduction Gerardo Chacón, Humberto Rafeiro, Juan Camilo Vallejo, 2016 ISBN 978-3-11-044191-8, e-ISBN (PDF) 978-3-11-044192-5, e-ISBN (EPUB) 978-3-11-043364-7

Convex and Set-Valued Analysis. Selected Topics Aram V. Arutyunov, Valeri Obukhovskii, 2016 ISBN 978-3-11-046028-5, e-ISBN (PDF) 978-3-11-046030-8, e-ISBN (EPUB) 978-3-11-046041-4

Complex Analysis. A Functional Analytic Approach Friedrich Haslinger, 2017 ISBN 978-3-11-041723-4, e-ISBN (PDF) 978-3-11-041724-1, e-ISBN (EPUB) 978-3-11-042615-1

Singular Solutions of Nonlinear Elliptic and Parabolic Equations Alexander A. Kovalevsky, Igor I. Skrypnik, Andrey E. Shishkov, 2016 ISBN 978-3-11-031548-6, e-ISBN (PDF) 978-3-11-033224-7, e-ISBN (EPUB) 978-3-11-039008-7

The d-bar Neumann Problem and Schrödinger Operators Friedrich Haslinger, 2014 ISBN 978-3-11-031530-1, e-ISBN (PDF) 978-3-11-031535-6, e-ISBN (EPUB) 978-3-11-037783-5

Nikolaos S. Papageorgiou and Patrick Winkert

Applied Nonlinear Functional Analysis | An Introduction

Mathematics Subject Classification 2010 26-XX, 28-XX, 46-XX, 47-XX, 49-XX Authors Prof. Dr. Nikolaos S. Papageorgiou National Technical University of Athens Department of Mathematics Zografou Campus 15780 Athens Greece [email protected] Dr. Patrick Winkert Technische Universität Berlin Institut für Mathematik Straße des 17. Juni 136 10623 Berlin Germany [email protected]

ISBN 978-3-11-051622-7 e-ISBN (PDF) 978-3-11-053298-2 e-ISBN (EPUB) 978-3-11-053183-1 Library of Congress Control Number: 2018939852 Bibliographic information published by the Deutsche Nationalbibliothek The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.dnb.de. © 2018 Walter de Gruyter GmbH, Berlin/Boston Cover image: Nikolaos S. Papageorgiou, Patrick Winkert Typesetting: le-tex publishing services GmbH, Leipzig Printing and binding: CPI books GmbH, Leck www.degruyter.com

This book is dedicated in memory of the first author’s mother M. S. Papageorgiou and in memory of the second author’s father Wolfgang Winkert who both passed away during its preparation.

Preface

The aim of this book is to present the foundations of modern Nonlinear Functional Analysis and equip the reader with all the necessary tools to continue with theoretical and/or applied research in the field. Nonlinear Functional Analysis is a very broad subject and has applications in many different areas of physics, mechanics, engineering, and economics. In fact, it emerged as a distinct discipline within mathematical analysis specifically as a way to address these needs in a mathematically rigorous way. In this way Nonlinear Functional Analysis distinguished itself from classical Linear Functional Analysis and acquired an interdisciplinary character.

The present book provides a starting point to follow some of the main paths of Nonlinear Functional Analysis, especially those leading to applications. The goal is to present the theories and techniques to the newcomer, which will allow him/her to proceed to more specialized topics.

The first three chapters present the main elements of topology, measure theory, and Banach space theory, which are needed to proceed further. In the last three chapters we present more advanced and specialized topics that are motivated by the applications.

In Chapter 4 we examine certain spaces of functions and measures that provide the functional framework in the applied problems. We deal with Lebesgue, Lebesgue–Bochner, and Sobolev spaces, which are the basic tools in the study of boundary value problems. We also study spaces of absolutely continuous functions, of functions of bounded variation, and of measures that eventually lead to Young measures. All these constitute the modern tools in dealing with problems of the calculus of variations, control theory and optimization, as well as mathematical economics.

In Chapter 5 we deal with nonsmooth and multivalued analysis, two fields of mathematical analysis that emerged simultaneously in the early 1960s and developed in parallel, feeding each other with new notions and methods. As a result, we deal with convex functions and their duality and subdifferential theory. We also examine the approximation properties of sets and extend the subdifferential theory to the nonconvex one in terms of locally Lipschitz functions in the sense of Clarke. Furthermore, we present the main topological and measure theoretic aspects of set-valued maps with applications to integral functionals.

In Chapter 6 we finally study topics that are traditionally associated with what is called “Nonlinear Analysis.” These are operators of monotone type, degree theory, fixed point theory, variational principles such as Ekeland’s Variational Principle, and variational convergence such as Γ- or epigraphical convergence.

With this choice of material, we believe that the reader will be properly equipped at the end to do research in this exciting field of mathematical analysis. Each chapter is followed by at least 50 problems. We encourage the reader to try them in order to test his/her understanding of the material. The solutions to the problems will be posted on the personal site of the second author. Our hope is that the reader, with the help of the material in this book, can proceed with confidence in the many different parts of this field.

https://doi.org/10.1515/9783110532982-201

Finally the authors wish to thank Dr. Apostolos Damialis, Maria Dassing, and Nadja Schedensack of De Gruyter for their kind support and help during the preparation of this book.

Nikolaos S. Papageorgiou, Athens, Greece
Patrick Winkert, Berlin, Germany
January 2018

Contents

Preface

1 Basic Topology
1.1 Basic Notions
1.2 Separation and Countability Properties – Convergence
1.3 Weak, Product, and Quotient Topologies
1.4 Connectedness and Compactness
1.5 Metric Spaces – Baire Category
1.6 Function Spaces
1.7 Semicontinuous Functions – Miscellaneous Notions
1.8 Remarks

2 Measure Theory
2.1 Basic Notions, Measures, and Outer Measures
2.2 Measurable Functions – Integration
2.3 Convergence Theorems and L^p-Spaces
2.4 Signed Measures and Radon–Nikodym Theorem
2.5 Regular and Radon Measures
2.6 Analytic (Souslin) Sets
2.7 Selection and Projection Theorems
2.8 Remarks

3 Basic Functional Analysis
3.1 Topological Vector Spaces, Hahn–Banach Theorem
3.2 Three Fundamental Theorems
3.3 Weak and Weak* Topologies
3.4 Separable and Reflexive Banach Spaces
3.5 Hilbert Spaces
3.6 Bounded and Unbounded Linear Operators
3.7 Compact Operators – Fredholm Operators
3.8 Remarks

4 Banach Spaces of Functions and Measures
4.1 L^p-Spaces
4.2 Lebesgue–Bochner Spaces
4.3 Functions of Bounded Variations
4.4 Absolutely Continuous Functions
4.5 Sobolev Spaces
4.6 Spaces of Measures
4.7 Young Measures
4.8 Remarks

5 Convex Functions – Nonsmooth Analysis
5.1 Convex Functions – Continuity Properties
5.2 Differentiability of Convex Functions
5.3 Conjugate Functions – Convex Subdifferential
5.4 Proximinal and Chebyshev Sets
5.5 Smoothness of the Norm
5.6 Multifunctions – Integral Functionals
5.7 Lipschitz and Locally Lipschitz Functions
5.8 Remarks

6 Nonlinear Analysis
6.1 Operators of Monotone Type
6.2 Brouwer Degree
6.3 Leray–Schauder Degree
6.4 Fixed Point Theory
6.4.1 Metric Fixed Point Theory
6.4.2 Topological Fixed Point Theory
6.5 Fixed Point Index
6.6 Variational Principles
6.7 Variational Convergence
6.8 Remarks

Bibliography
Index
List of Symbols

1 Basic Topology

Topology, as its name suggests¹, deals with geometric properties of objects that depend only on their relative positions and not on notions such as size or magnitude. The properties studied by topology are preserved by certain continuous transformations. Discontinuous transformations destroy topological properties. In this chapter we present the basic items of point-set topology that are needed to examine certain topics of applied analysis. We do not claim to give an exhaustive presentation of the subject.

¹ It comes from the Greek word τόπος = location or position.

https://doi.org/10.1515/9783110532982-001

1.1 Basic Notions

We start with the definition of topology.

Definition 1.1.1. Let X be a set and let τ ⊆ 2^X be such that the following hold:
(a) X and ∅ both belong to τ;
(b) τ is closed under arbitrary unions, that is, if {U_i}_{i∈I} ⊆ τ is any family of sets in τ, then ⋃_{i∈I} U_i ∈ τ;
(c) τ is closed under finite intersections, that is, if {U_i}_{i∈I} ⊆ τ is a finite family of sets in τ, then ⋂_{i∈I} U_i ∈ τ.
Then we say that τ is a topology on X. The sets in τ are called open sets. The complements of the elements of τ are called closed sets. In addition we say that the pair (X, τ) is a topological space.

Remark 1.1.2. When the topology τ is clearly understood from the context, we drop it and simply say that X is a topological space. From the definition above it is clear that the family of closed sets contains X and ∅ and that it is closed under finite unions and arbitrary intersections. If X is a set with two topologies τ1 and τ2 such that τ1 ⊆ τ2, then we say that τ1 is weaker than τ2 or that τ2 is stronger than τ1. The intersection of any family of topologies on X is also a topology that is weaker than every member of the family but stronger than any other topology with this property. Note that for any set X there is a strongest topology on X, namely τ = 2^X, known as the discrete topology. Moreover, there also exists a weakest topology on X, namely τ = {X, ∅}, known as the trivial topology.

In general, a topology is a very large collection of subsets. So it is useful to have a smaller collection of elements of τ, which generates the topology by taking unions.

Definition 1.1.3. Let (X, τ) be a topological space. A basis (or base) for the topology τ is a subfamily ℬ of τ such that every member of τ is the union of elements in ℬ. The elements of ℬ are called basic open sets and τ is the topology generated by ℬ. A subfamily ℒ of τ is a subbasis of the topology τ if the family of finite intersections of elements in ℒ is a basis for τ. The elements of ℒ are called subbasic open sets.

In the definition above, we have assumed a topology on X and defined a basis for it. On the other hand, one might start with a family of subsets of X and use it to generate a topology on X by taking unions. However, not every family in 2^X is a basis for a topology. The next proposition gives necessary and sufficient conditions for a family to generate a topology.

Proposition 1.1.4. A family ℬ ⊆ 2^X is a basis for a topology on X if and only if
(a) ⋃ℬ = X, that is, the union of the elements of ℬ is X;
(b) if B1, B2 ∈ ℬ and x ∈ B1 ∩ B2, then there exists B ∈ ℬ such that x ∈ B ⊆ B1 ∩ B2.

Proof. ⇒: The assertion in (a) follows from the fact that X is open; see Definition 1.1.3. Let us prove (b). We know that B1 ∩ B2 is open. So, according to Definition 1.1.3, B1 ∩ B2 is the union of elements in ℬ. Hence we can find B ∈ ℬ such that x ∈ B ⊆ B1 ∩ B2.
⇐: Let τ be the family of all unions of elements of ℬ. We need to show that τ is a topology on X; see Definition 1.1.1. Evidently ∅ ∈ τ and X ∈ τ; see (a). In addition, from its definition, τ is closed under arbitrary unions. We have to show that τ is closed under finite intersections. So, let U1, U2 ∈ τ; we show that U1 ∩ U2 ∈ τ. Given x ∈ U1 ∩ U2, there exist B1, B2 ∈ ℬ such that x ∈ B1 ⊆ U1 and x ∈ B2 ⊆ U2. Therefore, x ∈ B1 ∩ B2 ⊆ U1 ∩ U2. By (b) there is B(x) ∈ ℬ such that x ∈ B(x) ⊆ B1 ∩ B2 ⊆ U1 ∩ U2. Hence U1 ∩ U2 = ⋃_{x ∈ U1 ∩ U2} B(x) ∈ τ. Thus τ is a topology on X.

Remark 1.1.5. We say that τ is the topology generated by ℬ and we often write τ(ℬ) to emphasize the basis generating the topology.

Corollary 1.1.6. If (X, τ) is a topological space and ℬ is a subfamily of τ such that for each U ∈ τ and x ∈ U, we can find V ∈ ℬ such that x ∈ V ⊆ U, then ℬ is a basis for the topology τ.
Proposition 1.1.7. If (X, τ) is a topological space and ℬ is a basis for τ, then U ∈ τ, that is, U is open, if and only if for every x ∈ U there exists V_x ∈ ℬ such that x ∈ V_x ⊆ U.

Proof. ⇒: This follows from (b) of Proposition 1.1.4.
⇐: We have U = ⋃_{x∈U} V_x ∈ τ.

Definition 1.1.8. Two bases ℬ and ℬ′ of X are said to be equivalent if τ(ℬ) = τ(ℬ′).

Directly from Propositions 1.1.4 and 1.1.7 we have the following characterization of equivalent topological bases.

Proposition 1.1.9. Two bases ℬ and ℬ′ in X are equivalent if and only if
(a) for every B ∈ ℬ and x ∈ B, there exists B′ ∈ ℬ′ such that x ∈ B′ ⊆ B;
(b) for every B′ ∈ ℬ′ and x ∈ B′, there exists B ∈ ℬ such that x ∈ B ⊆ B′.
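On a finite set, the conditions of Proposition 1.1.4 and the axioms of Definition 1.1.1 can be checked by brute force. The following is only an illustrative sketch: the function names `is_basis`, `generated_topology`, and `is_topology` are hypothetical helpers introduced here, and the approach works only for small finite sets.

```python
from itertools import combinations

def is_basis(X, family):
    """Check conditions (a) and (b) of Proposition 1.1.4 on a finite set X."""
    X = frozenset(X)
    family = [frozenset(B) for B in family]
    # (a) the members of the family must cover X
    if frozenset().union(*family) != X:
        return False
    # (b) every point of B1 ∩ B2 lies in some member contained in B1 ∩ B2
    for B1 in family:
        for B2 in family:
            for x in B1 & B2:
                if not any(x in B and B <= (B1 & B2) for B in family):
                    return False
    return True

def generated_topology(family):
    """All unions of subfamilies (the empty subfamily yields the empty set)."""
    family = [frozenset(B) for B in family]
    return {
        frozenset().union(*sub)
        for r in range(len(family) + 1)
        for sub in combinations(family, r)
    }

def is_topology(X, tau):
    """Check Definition 1.1.1; for a finite family, pairwise checks suffice."""
    X = frozenset(X)
    tau = {frozenset(U) for U in tau}
    if X not in tau or frozenset() not in tau:
        return False
    return all((U | V) in tau and (U & V) in tau for U in tau for V in tau)
```

For instance, on X = {1, 2, 3} the family {{1}, {2}, {1, 2, 3}} passes `is_basis`, whereas {{1, 2}, {2, 3}} covers X but fails condition (b) at the point 2, and the unions of its members do not form a topology.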


Example 1.1.10. In ℝ^N with N ∈ ℕ, let ℬ = {B_r(x) : x ∈ ℝ^N, r > 0} with B_r(x) = {u ∈ ℝ^N : |u − x| < r}. Then ℬ is a basis for the so-called Euclidean topology (or standard topology) on ℝ^N. So, every open set in ℝ^N is the union of open balls. More generally this is also true for every metric space.

There is a local version of the notion of topological basis.

Definition 1.1.11. Let (X, τ) be a topological space and x ∈ X. We say that ℬ(x) ⊆ τ is a local basis (or a local base) at x if the following hold:
(a) x ∈ V for every V ∈ ℬ(x);
(b) if x ∈ U ∈ τ, then there exists V ∈ ℬ(x) such that x ∈ V ⊆ U.

Definition 1.1.12. Let (X, τ) be a topological space and A ⊆ X.
(a) A neighborhood of x ∈ X is any open set U such that x ∈ U.
(b) We say that x ∈ A is an interior point of A if we can find U ∈ τ such that x ∈ U ⊆ A. The interior of A, denoted by int A (or by Å), is the set of all interior points of A.
(c) We say that x ∈ X is a cluster point (or a limit point or an accumulation point) of A if every open set containing x contains a point of A distinct from x. The set of all cluster points of A is called the derived set of A and is denoted by A′. The closure of A, denoted by Ā (or cl A), is the union of A with its set of cluster points, that is, cl A = A ∪ A′.
(d) We say that x ∈ X is a boundary point of A if x ∈ cl A ∩ cl(X \ A). The set of boundary points of A is called the boundary of A and is denoted by bd A (or by ∂A).

Remark 1.1.13. Note that a cluster point or a boundary point of A need not belong to A.

In the sequel we denote by N(x) the family of all neighborhoods of x ∈ X.

Proposition 1.1.14. If (X, τ) is a topological space and A, C ⊆ X, then the following hold:
(a) int A = ⋃{U ∈ τ : U ⊆ A}, that is, int A is the largest open set contained in A;
(b) A is open if and only if A = int A;
(c) A ⊆ C implies int A ⊆ int C;
(d) int(A ∩ C) = int A ∩ int C.

Proof. (a) Let Ã = ⋃{U ∈ τ : U ⊆ A}. Then Ã is open and by Definition 1.1.12(b) it is clear that int A ⊆ Ã. On the other hand, if x ∈ Ã, then there is U ∈ τ, U ⊆ A such that x ∈ U. Hence, x is an interior point of A, therefore Ã ⊆ int A. We conclude that Ã = int A.
(b) This is an immediate consequence of (a).
(c) We have int A ⊆ A ⊆ C and since int A is open, it follows that int A ⊆ int C, see part (a).
(d) We have A ∩ C ⊆ A and A ∩ C ⊆ C. Then int(A ∩ C) ⊆ int A and int(A ∩ C) ⊆ int C because of part (c). This gives

    int(A ∩ C) ⊆ int A ∩ int C.    (1.1.1)

On the other hand, int A ∩ int C is an open subset of A ∩ C. Hence, because of (a),

    int A ∩ int C ⊆ int(A ∩ C).    (1.1.2)

From (1.1.1) and (1.1.2) we conclude that int A ∩ int C = int(A ∩ C).

Remark 1.1.15. In general it is not true that int(A ∪ C) = int A ∪ int C. Indeed let X = ℝ with the Euclidean topology, see Example 1.1.10, and let A = [0, 1] and C = [1, 2]. Then int A = (0, 1), int C = (1, 2) and int(A ∪ C) = (0, 2), so that int A ∪ int C = (0, 1) ∪ (1, 2) is a proper subset of int(A ∪ C). In general we can easily show that if {A_i}_{i∈I} is an arbitrary family of subsets of X, then

    ⋃_{i∈I} int A_i ⊆ int(⋃_{i∈I} A_i).
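The behavior of the interior under intersections and unions (Proposition 1.1.14(d) and Remark 1.1.15) can also be observed on a small finite space. The sketch below is an illustration only: the three-point space `X`, the topology `tau`, and the helper names `interior` and `subsets` are assumptions chosen for this example and serve as a finite analogue of the intervals [0, 1] and [1, 2].

```python
from itertools import combinations

def interior(A, tau):
    """Union of all open sets contained in A (Proposition 1.1.14(a))."""
    A = frozenset(A)
    return frozenset().union(*(U for U in tau if U <= A))

def subsets(X):
    """All subsets of a finite set X."""
    items = sorted(X)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

# A hypothetical topology on a three-point set.
X = frozenset({1, 2, 3})
tau = {frozenset(), frozenset({1}), frozenset({2, 3}), X}

A, C = frozenset({2}), frozenset({3})
union_of_interiors = interior(A, tau) | interior(C, tau)
interior_of_union = interior(A | C, tau)

# Strict inclusion: int A ∪ int C is a proper subset of int(A ∪ C).
strict = union_of_interiors < interior_of_union

# Equality for intersections holds for every pair of subsets.
intersection_rule = all(
    interior(S & T, tau) == interior(S, tau) & interior(T, tau)
    for S in subsets(X) for T in subsets(X)
)
```

Here neither {2} nor {3} contains a nonempty open set, while their union {2, 3} is itself open, which mirrors what happens at the common endpoint 1 in the remark.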

There is an analogous proposition for the closure.

Proposition 1.1.16. If (X, τ) is a topological space and A, C ⊆ X, then the following hold:
(a) cl A = ⋂{D : D closed, D ⊇ A}, that is, cl A is the smallest closed set containing A;
(b) A is closed if and only if A = cl A;
(c) A ⊆ C implies cl A ⊆ cl C;
(d) cl(A ∪ C) = cl A ∪ cl C.

Proof. (a) Let A∗ = ⋂{D : D closed, D ⊇ A}. Evidently, A∗ is closed and so X \ A∗ is open. Hence, if x ∉ A∗, then we find U ∈ N(x) such that U ∩ A = ∅. Therefore, x ∉ (A ∪ A′) = cl A and so cl A ⊆ A∗. Now suppose that x ∈ A∗ \ cl A. Then there exists U ∈ N(x) such that U ∩ A = ∅. Let C = X \ U. Then C is closed and C ⊇ A. Hence A∗ ⊆ C and so x ∈ C, a contradiction. Therefore cl A = A∗.
(b) This is an immediate consequence of (a).
(c) We have A ⊆ C ⊆ cl C and since cl C is closed, it follows that cl A ⊆ cl C, see part (a).
(d) Note that cl A ∪ cl C is closed and contains A ∪ C. Hence

    cl(A ∪ C) ⊆ cl A ∪ cl C.    (1.1.3)

Since A, C ⊆ A ∪ C, we have cl A, cl C ⊆ cl(A ∪ C), see part (c). Hence

    cl A ∪ cl C ⊆ cl(A ∪ C).    (1.1.4)

From (1.1.3) and (1.1.4) we conclude that cl(A ∪ C) = cl A ∪ cl C.

Remark 1.1.17. In general it is not true that cl(A ∩ C) = cl A ∩ cl C. To see this, let X = ℝ with the Euclidean topology and let A = (0, 1) as well as C = (1, 2). Then cl(A ∩ C) = ∅ and cl A ∩ cl C = [0, 1] ∩ [1, 2] = {1}. In general we can easily show that if {A_i}_{i∈I} is an arbitrary family of subsets of X, then

    cl(⋂_{i∈I} A_i) ⊆ ⋂_{i∈I} cl A_i.
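The two descriptions of the closure, as A together with its cluster points (Definition 1.1.12(c)) and as the smallest closed set containing A (Proposition 1.1.16(a)), can be compared directly on a finite space. This is a minimal sketch under stated assumptions: the space `X`, the topology `tau`, and the helper names `derived_set`, `closure`, and `subsets` are hypothetical and chosen only for illustration.

```python
from itertools import combinations

def derived_set(A, X, tau):
    """Cluster points of A: every open set around x meets A in a point other than x."""
    A = frozenset(A)
    return frozenset(
        x for x in X
        if all((U & A) - {x} for U in tau if x in U)
    )

def closure(A, X, tau):
    """Intersection of all closed sets containing A (Proposition 1.1.16(a))."""
    A, X = frozenset(A), frozenset(X)
    result = X
    for U in tau:
        D = X - U            # D is closed because U is open
        if A <= D:
            result &= D
    return result

def subsets(X):
    """All subsets of a finite set X."""
    items = sorted(X)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

# A hypothetical topology on a three-point set.
X = frozenset({1, 2, 3})
tau = {frozenset(), frozenset({1}), frozenset({2, 3}), X}

# cl A = A ∪ A' for every subset A.
descriptions_agree = all(
    closure(S, X, tau) == S | derived_set(S, X, tau) for S in subsets(X)
)

# Finite analogue of Remark 1.1.17: the closure of an intersection can be strictly smaller.
A, C = frozenset({2}), frozenset({3})
closure_of_intersection = closure(A & C, X, tau)
intersection_of_closures = closure(A, X, tau) & closure(C, X, tau)
```

In this space the disjoint sets {2} and {3} have the same closure {2, 3}, so the intersection of the closures is {2, 3} while the closure of the (empty) intersection is empty.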

In addition, the following formulas are easy to verify:
– x ∈ A′ if and only if x ∈ cl(A \ {x});
– (A ∪ C)′ = A′ ∪ C′, A′ \ C′ ⊆ (A \ C)′, (A′)′ ⊆ A′;
– (⋂_{i∈I} A_i)′ ⊆ ⋂_{i∈I} A_i′ with an arbitrary index set I;
– ⋃_{i∈I} A_i′ ⊆ (⋃_{i∈I} A_i)′ with an arbitrary index set I;
– cl(A′) = A′;
– A ⊆ C implies A′ ⊆ C′;
– (A \ {x})′ = A′ = (A ∪ {x})′.
The last formula means that the derived set remains unchanged if we add or remove a finite number of elements. If x ∈ A \ A′, then we say that x is isolated.

Proposition 1.1.18. If (X, τ) is a topological space and A ⊆ X, then the following hold:
(a) bd A = cl A ∩ cl(X \ A) = bd(X \ A);
(b) bd A, int A, int(X \ A) are pairwise disjoint sets whose union is X;
(c) bd A is a closed set;
(d) cl A = int A ∪ bd A;
(e) A is open if and only if bd A ⊆ X \ A;
(f) A is closed if and only if bd A ⊆ A;
(g) A is closed and open (usually called clopen) if and only if bd A = ∅.

Proof. (a)–(d) These are immediate consequences of Definition 1.1.12.
(e) ⇒: Since A is open we have A = int A due to Proposition 1.1.14(b). From part (b) we know that int A and bd A are disjoint sets. Therefore bd A ⊆ X \ A.
⇐: Since bd A ⊆ X \ A, no point of A is a boundary point. Hence, every point of A is an interior point, see part (d). Therefore, A = int A, that is, A is open.
(f) This follows from (e) by taking complements.
(g) Combine (e) and (f).

Definition 1.1.19. A subset A of a topological space X is said to be dense if cl A = X. We say that the topological space X is separable if it has a countable, dense subset.

Remark 1.1.20. It is easy to see that A is dense in the topological space (X, τ) if and only if for every U ∈ τ, U ≠ ∅ we have U ∩ A ≠ ∅. Clearly ℝ^N is separable since we can take the set of vectors with rational coordinates as a countable, dense set.

Definition 1.1.21. A subset A of a topological space X is said to be nowhere dense if int cl A = ∅.

Remark 1.1.22. From the definition above we see that A ⊆ X is nowhere dense if and only if X \ cl A is dense in X. It follows that A ⊆ X is nowhere dense if and only if X \ cl(X \ cl A) = ∅ or that A is nowhere dense if and only if A ⊆ cl(X \ cl A). Any set A that contains a dense set is itself dense. Similarly, any subset of a nowhere dense set is nowhere dense. The closure of a nowhere dense set is nowhere dense.

Proposition 1.1.23. If X is a topological space and A ⊆ X is open or closed, then bd A is nowhere dense.

Proof. Suppose that A is open. Then bd A = cl A \ A, see Proposition 1.1.18(d). Hence, int bd A = int(cl A \ A) = ∅, which shows that bd A is nowhere dense. Similarly, if A is closed, then bd A = A ∩ cl(X \ A), see Definition 1.1.12(d). Therefore, by Proposition 1.1.14(d), int bd A = int A ∩ int cl(X \ A). Hence, int bd A = ∅ and so bd A is nowhere dense in X.

Definition 1.1.24. Let (X, τ) be a topological space and A ⊆ X. The subspace or relative topology on A is the family

    τ(A) = {U ∩ A : U ∈ τ}.

It is also called the trace of τ on A. It is easy to see that τ(A) is a topology on A.

Proposition 1.1.25. If (X, τ) is a topological space, ℬ is a basis for the topology τ and A ⊆ X, then ℬ(A) = {U ∩ A : U ∈ ℬ} is a basis for τ(A).

Proof. Let U ∈ τ and u ∈ U ∩ A. We can find V ∈ ℬ such that u ∈ V ⊆ U. Then u ∈ V ∩ A ⊆ U ∩ A. This implies that ℬ(A) is a basis for τ(A); see Corollary 1.1.6.

Proposition 1.1.26. If (X, τ) is a topological space, A ∈ τ and V ∈ τ(A), then V ∈ τ.

Proof. Since V ∈ τ(A) we have V = U ∩ A with U ∈ τ. But U ∩ A ∈ τ since A ∈ τ.

Proposition 1.1.27. If (X, τ) is a topological space and A ⊆ X, then D ⊆ A is τ(A)-closed if and only if D = C ∩ A with closed C ⊆ X.

Proof. ⇒: Since D ⊆ A is τ(A)-closed, that is, relatively closed, we have A \ D = U ∩ A with U ∈ τ. Then

    D = A \ (A \ D) = A \ (U ∩ A) = (X \ U) ∩ A = C ∩ A

with closed C = X \ U.
⇐: Let U = X \ C. Then U ∈ τ and we have

    A \ D = A \ (C ∩ A) = (X \ C) ∩ A = U ∩ A,

which implies that A \ D is τ(A)-open and so D is τ(A)-closed.

As a consequence of Proposition 1.1.26 we have the following observation concerning neighborhoods of a point x ∈ A.

Corollary 1.1.28. If (X, τ) is a topological space, A ⊆ X, x ∈ A and V ⊆ A, then V ∈ N_A(x), where N_A(x) denotes the τ(A)-neighborhoods of x, if and only if V = U ∩ A with U ∈ N(x).

This discussion on relativization of topologies leads naturally to the following notion, which will be used in the sequel.
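The subspace topology of Definition 1.1.24 and the statement of Proposition 1.1.26 can be illustrated on a small finite example. The following is only a sketch: the four-point space, the chain topology `tau`, and the helper names `subspace_topology` and `is_topology` are assumptions made for this illustration.

```python
def subspace_topology(A, tau):
    """Trace of tau on A: all sets U ∩ A with U open (Definition 1.1.24)."""
    A = frozenset(A)
    return {U & A for U in tau}

def is_topology(X, tau):
    """Check Definition 1.1.1; for a finite family, pairwise checks suffice."""
    X = frozenset(X)
    if X not in tau or frozenset() not in tau:
        return False
    return all((U | V) in tau and (U & V) in tau for U in tau for V in tau)

# A hypothetical "chain" topology on a four-point set.
X = frozenset({1, 2, 3, 4})
tau = {frozenset(), frozenset({1}), frozenset({1, 2}), frozenset({1, 2, 3}), X}

# A subset that is not open: its relatively open sets need not be open in X.
A = frozenset({2, 3})
tau_A = subspace_topology(A, tau)

# An open subset: every relatively open set is open in X (Proposition 1.1.26).
W = frozenset({1, 2})
tau_W = subspace_topology(W, tau)
```

In this example {2} is open in the subspace A = {2, 3} but not in X, whereas every member of the trace on the open set W = {1, 2} already belongs to the original topology.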


Definition 1.1.29. A property of topological spaces is said to be hereditary if every subset with the relative (subspace) topology exhibits this property.

The notion of continuity is central in point-set topology. It is the main tool that allows us to determine which mathematical properties are intrinsic to a particular topological space.

Definition 1.1.30. Let X, Y be topological spaces. We say that a map f : X → Y is continuous at x ∈ X if for every U ∈ N(f(x)) we can find V ∈ N(x) such that f(V) ⊆ U. We say that f : X → Y is continuous if it is continuous at every x ∈ X.

Remark 1.1.31. From the last definition it is clear that continuity is a local property.

The next proposition provides a useful global characterization of continuity.

Proposition 1.1.32. If (X, τ_X) and (Y, τ_Y) are two topological spaces and f : X → Y, then f is continuous if and only if f⁻¹(τ_Y) ⊆ τ_X, that is, the preimage under f of every open set in Y is an open set in X.

Proof. ⇒: Let U ∈ τ_Y. Then U is a neighborhood of each of its points. So, f⁻¹(U) contains a neighborhood of every one of its points. Hence f⁻¹(U) ∈ τ_X.
⇐: This is immediate from Definition 1.1.30.

Remark 1.1.33. Since f⁻¹ preserves all set theoretic operations, in the proposition above we may replace τ_Y by a basis ℬ_Y or even better by a subbasis ℒ_Y.

We have a counterpart of Proposition 1.1.32 with closed sets instead of open sets.

Proposition 1.1.34. If X and Y are topological spaces and f : X → Y, then f is continuous if and only if for every closed C ⊆ Y, f⁻¹(C) is closed in X.

Proposition 1.1.35. If X and Y are topological spaces and f : X → Y, then the following statements are equivalent:
(a) f is continuous;
(b) f(cl A) ⊆ cl f(A) for every A ⊆ X;
(c) cl f⁻¹(C) ⊆ f⁻¹(cl C) for every C ⊆ Y.

Proof. (a) ⇒ (b): Let A ⊆ X and x ∈ cl A. Consider U ∈ N(f(x)) and choose V ∈ N(x) such that f(V) ⊆ U, see Definition 1.1.30. We have

    x ∈ cl A ⇒ V ∩ A ≠ ∅ ⇒ f(V ∩ A) ≠ ∅ ⇒ f(V) ∩ f(A) ≠ ∅ ⇒ U ∩ f(A) ≠ ∅.

Since U ∈ N(f(x)) is arbitrary it follows that f(x) ∈ cl f(A). Hence f(cl A) ⊆ cl f(A).
(b) ⇒ (c): Let A = f⁻¹(C). Then by hypothesis f(cl A) ⊆ cl f(A) = cl f(f⁻¹(C)) ⊆ cl C and so cl A = cl f⁻¹(C) ⊆ f⁻¹(cl C).
(c) ⇒ (a): Let C ⊆ Y be closed. Then by hypothesis cl f⁻¹(C) ⊆ f⁻¹(cl C) = f⁻¹(C) and so cl f⁻¹(C) = f⁻¹(C), that is, f⁻¹(C) is closed. From Proposition 1.1.34 it follows that f is continuous.
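For maps between finite topological spaces, the preimage criterion of Proposition 1.1.32 and the closure criterion of Proposition 1.1.35(b) can both be tested exhaustively. The sketch below is illustrative only: maps are modeled as Python dictionaries, and the spaces, the maps `f` and `g`, and all helper names are hypothetical choices made for this example.

```python
from itertools import combinations

def preimage(f, U):
    """Preimage of U under the map f, given as a dict from X to Y."""
    return frozenset(x for x in f if f[x] in U)

def is_continuous(f, tau_X, tau_Y):
    """Proposition 1.1.32: preimages of open sets are open."""
    return all(preimage(f, U) in tau_X for U in tau_Y)

def closure(A, X, tau):
    """Smallest closed set containing A (Proposition 1.1.16(a))."""
    A, X = frozenset(A), frozenset(X)
    result = X
    for U in tau:
        D = X - U
        if A <= D:
            result &= D
    return result

def satisfies_closure_criterion(f, X, tau_X, Y, tau_Y):
    """Proposition 1.1.35(b): f(cl A) ⊆ cl f(A) for every A ⊆ X."""
    for r in range(len(X) + 1):
        for c in combinations(sorted(X), r):
            A = frozenset(c)
            image_of_closure = frozenset(f[x] for x in closure(A, X, tau_X))
            closure_of_image = closure(frozenset(f[x] for x in A), Y, tau_Y)
            if not image_of_closure <= closure_of_image:
                return False
    return True

# Hypothetical finite spaces: a three-point space and a two-point space.
X = frozenset({1, 2, 3})
tau_X = {frozenset(), frozenset({1}), frozenset({2, 3}), X}
Y = frozenset({"a", "b"})
tau_Y = {frozenset(), frozenset({"a"}), Y}

f = {1: "a", 2: "b", 3: "b"}   # the preimage of {"a"} is {1}, which is open
g = {1: "b", 2: "a", 3: "b"}   # the preimage of {"a"} is {2}, which is not open
```

As the proposition predicts, the two tests agree on both maps: `f` passes both criteria and `g` fails both.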


Proposition 1.1.36. Let X, Y and Z be topological spaces.
(a) If f : X → Y and g : Y → Z are continuous maps, then g ∘ f : X → Z is continuous.
(b) If f : X → Y is a continuous map and A ⊆ X, then f|_A : A → Y is continuous for the subspace topology of A.
(c) If X = ⋃_{i∈I} U_i with U_i open and f : X → Y is a map such that f|_{U_i} is continuous for every i ∈ I, then f : X → Y is continuous.

Proof. (a) If U is open in Z, then g⁻¹(U) is open in Y and f⁻¹(g⁻¹(U)) is open in X, see Proposition 1.1.32. But recall that f⁻¹(g⁻¹(U)) = (g ∘ f)⁻¹(U). So, by Proposition 1.1.32, g ∘ f is continuous.
(b) Let i : A → X be the inclusion map where A is endowed with the subspace topology. Evidently i is continuous and since f|_A = f ∘ i we derive the conclusion using part (a).
(c) Let V ⊆ Y be open. Then f⁻¹(V) ∩ U_i = (f|_{U_i})⁻¹(V) is open in U_i and hence, by Proposition 1.1.26, open in X for all i ∈ I. Therefore f⁻¹(V) = ⋃_{i∈I} (f⁻¹(V) ∩ U_i) is open in X. Taking Proposition 1.1.32 into account yields the continuity of f.

Continuing in the same way, we prove the so-called “Pasting Lemma.”

Proposition 1.1.37 (Pasting Lemma). Let X and Y be topological spaces, let X = A ∪ B with closed subsets A and B of X, and let f : A → Y and g : B → Y be continuous maps, where A and B are endowed with the subspace topology, such that f(x) = g(x) for all x ∈ A ∩ B. Then h : X → Y defined by

    h(x) = f(x)  if x ∈ A,
    h(x) = g(x)  if x ∈ B,

is continuous.

Proof. Let C be a closed subset of Y. Then

    h⁻¹(C) = f⁻¹(C) ∪ g⁻¹(C).    (1.1.5)

By hypothesis f⁻¹(C) is closed in A and since A is closed in X, from Proposition 1.1.27, we have that f⁻¹(C) is closed in X. Similarly g⁻¹(C) is closed in X. From (1.1.5) it follows that h⁻¹(C) is closed in X. Hence, by Proposition 1.1.34, h is continuous.

In general the direct image of an open (resp. closed) set by a map need not be open (resp. closed) even if the map is continuous. For this reason we introduce the following definition.

Definition 1.1.38. Let X and Y be two topological spaces. We say that a map f : X → Y is open (respectively, closed) if the image of every open (respectively, closed) set in X is open (respectively, closed) in Y.

Remark 1.1.39. It is easy to see that the notions of continuous map, open map, and closed map are independent.


Proposition 1.1.40. Let (X, τ_X) and (Y, τ_Y) be topological spaces and f : X → Y. Then the following statements are equivalent:
(a) f is open;
(b) f(int A) ⊆ int f(A) for every A ⊆ X;
(c) if ℬ_X is a basis for τ_X, then f(ℬ_X) ⊆ τ_Y.

Proof. (a) ⇒ (b): We have f(int A) ⊆ f(A) and by hypothesis f(int A) is open. By Proposition 1.1.14(a) it follows that f(int A) ⊆ int f(A).
(b) ⇒ (c): Let V ∈ ℬ_X. Then by hypothesis f(V) = f(int V) ⊆ int f(V). Hence, f(V) = int f(V), that is, f(V) ∈ τ_Y.
(c) ⇒ (a): Let V ⊆ X be open. Then V = ⋃_{i∈I} V_i with V_i ∈ ℬ_X. We have

    f(V) = f(⋃_{i∈I} V_i) = ⋃_{i∈I} f(V_i) ∈ τ_Y.

Therefore, f is open.

Next we identify a subfamily of continuous functions that is at the core of point-set topology.

Definition 1.1.41. Let X and Y be two topological spaces and let f : X → Y be a bijection. We say that f is a homeomorphism if both f and f⁻¹ are continuous. Then we say that the spaces X and Y are homeomorphic. Instead of homeomorphism we also say that f is bicontinuous.

As an easy consequence of this definition and of Proposition 1.1.40 we have the following proposition.

Proposition 1.1.42. Let X and Y be topological spaces and let f : X → Y be a bijection. Then the following statements are equivalent:
(a) f is a homeomorphism;
(b) f is continuous and open;
(c) f is continuous and closed;
(d) f(cl A) = cl f(A) for every A ⊆ X.

Remark 1.1.43. Given a homeomorphism f : X → Y, U ⊆ X is open if and only if f(U) ⊆ Y is open. Thus a homeomorphism gives a bijection between the topologies of X and Y. Hence, any property of X that is expressed using only the topology of X yields the same property on Y. Such a property of X is said to be a topological property of X.
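Criterion (b) of Proposition 1.1.42, a bijection that is continuous and open, is easy to test on finite spaces. The following is a minimal sketch under stated assumptions: maps are dictionaries, and the two-point spaces, the maps `identity` and `swap`, and the helper names are hypothetical choices for this illustration.

```python
def image(f, A):
    """Direct image of A under the map f, given as a dict."""
    return frozenset(f[x] for x in A)

def preimage(f, U):
    """Preimage of U under the map f."""
    return frozenset(x for x in f if f[x] in U)

def is_continuous(f, tau_X, tau_Y):
    """Proposition 1.1.32: preimages of open sets are open."""
    return all(preimage(f, U) in tau_X for U in tau_Y)

def is_open_map(f, tau_X, tau_Y):
    """Definition 1.1.38: images of open sets are open."""
    return all(image(f, U) in tau_Y for U in tau_X)

def is_homeomorphism(f, X, tau_X, Y, tau_Y):
    """Proposition 1.1.42(b): a continuous, open bijection."""
    bijective = (set(f) == set(X)
                 and set(f.values()) == set(Y)
                 and len(set(f.values())) == len(f))
    return (bijective
            and is_continuous(f, tau_X, tau_Y)
            and is_open_map(f, tau_X, tau_Y))

# Two hypothetical topologies on the same two-point set.
X = frozenset({1, 2})
discrete = {frozenset(), frozenset({1}), frozenset({2}), X}
coarser = {frozenset(), frozenset({1}), X}

identity = {1: 1, 2: 2}
swap = {1: 2, 2: 1}
```

The identity from the discrete space to the coarser one is a continuous bijection that is not open, so it is not a homeomorphism; this also gives a concrete instance of the independence noted in Remark 1.1.39.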

1.2 Separation and Countability Properties – Convergence

The so-called separation properties determine how rich the supply of open sets is in a given topological space. This is important because the supply of open sets determines the supply of continuous functions. We need to have a rich enough supply of continuous functions in order to produce interesting results.

We start with a notion which, for analysis, is the minimal requirement for a topological space.

Definition 1.2.1. A topological space X is said to be Hausdorff (or T2-space) if for every pair of distinct points x, u ∈ X we can find U ∈ N(x) and V ∈ N(u) such that U ∩ V = ∅.

Since our aim is to use topology to investigate problems in analysis, from now on all topological spaces considered are Hausdorff. Let us give an example of a space that is important in algebraic geometry and that is not Hausdorff.

Example 1.2.2. Let n ∈ ℕ and let P denote the set of all polynomials in n variables {x_1, . . . , x_n}. Given p ∈ P, let

    Z(p) = {(x_1, . . . , x_n) ∈ ℝ^n : p(x_1, . . . , x_n) = 0}.

Let ℬ be the family of all complements of the sets Z(p) with p ∈ P. One can show that ℬ is a basis for a topology on ℝ^n. This topology is called the “Zariski topology” on ℝ^n and it turns out that it is not Hausdorff.

Proposition 1.2.3. The Hausdorff property is hereditary and topological.

Proof. Let (X, τ) be a Hausdorff topological space and let A ⊆ X be endowed with the subspace topology τ(A). Consider two distinct points x, u ∈ A. We can find U, V ∈ τ with x ∈ U and u ∈ V such that U ∩ V = ∅. Then U ∩ A ∈ τ(A), V ∩ A ∈ τ(A) and (U ∩ A) ∩ (V ∩ A) = ∅. Hence, (A, τ(A)) is Hausdorff.
Let X be a Hausdorff topological space, Y a topological space, and f : X → Y a homeomorphism. If y, v ∈ Y are distinct points, then f⁻¹(y), f⁻¹(v) ∈ X are distinct as well. Since X is Hausdorff we can find open sets U, V in X such that f⁻¹(y) ∈ U, f⁻¹(v) ∈ V and U ∩ V = ∅. This implies that y ∈ f(U) and v ∈ f(V), where f(U) and f(V) are both open sets in Y and f(U) ∩ f(V) = ∅. Therefore, Y is Hausdorff as well.

Proposition 1.2.4. If X is a Hausdorff topological space and A ⊆ X is finite, then A is closed.

Proof. Since a finite union of closed sets is closed, it suffices to show that every singleton {x} is closed. So let u ∈ X with u ≠ x. Then we can find U ∈ N(x) and V ∈ N(u) such that U ∩ V = ∅. In particular V is a neighborhood of u that does not contain x, which means that u ∉ cl {x}. Therefore cl {x} = {x} and so every singleton {x} is closed.
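Definition 1.2.1 and Proposition 1.2.4 can be checked by exhaustive search when the space is finite. The sketch below is illustrative only; the helper names `is_hausdorff`, `singletons_closed`, and `discrete_topology`, as well as the example spaces, are assumptions introduced for this example.

```python
from itertools import combinations

def is_hausdorff(X, tau):
    """Definition 1.2.1: distinct points have disjoint open neighborhoods."""
    return all(
        any(x in U and u in V and not (U & V) for U in tau for V in tau)
        for x in X for u in X if x != u
    )

def singletons_closed(X, tau):
    """Every singleton {x} is closed, i.e. its complement is open."""
    X = frozenset(X)
    return all((X - {x}) in tau for x in X)

def discrete_topology(X):
    """The topology consisting of all subsets of the finite set X."""
    items = sorted(X)
    return {frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)}

# Hypothetical examples: a discrete three-point space and a two-point
# space whose only nontrivial open set is {1}.
X3 = frozenset({1, 2, 3})
tau_discrete = discrete_topology(X3)

X2 = frozenset({1, 2})
tau_coarse = {frozenset(), frozenset({1}), X2}
```

The discrete space is Hausdorff and all of its singletons are closed, in line with Proposition 1.2.4. In the two-point example the points 1 and 2 cannot be separated, and the singleton {1} is not closed. Since every subset of a finite set is finite, Proposition 1.2.4 also shows that a finite Hausdorff space must carry the discrete topology.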
Proposition 1.2.5. If X is a Hausdorff topological space and A ⊆ X, then x ∈ A′, that is, x is a cluster point of A, if and only if every U ∈ N(x) contains infinitely many points of A.

Proof. ⇒: Arguing by contradiction, suppose that we can find U ∈ N(x) such that U ∩ A is a finite set. Then U ∩ (A \ {x}) is finite. Let U ∩ (A \ {x}) = {x_k}_{k=1}^n. From Proposition 1.2.4 we know that {x_k}_{k=1}^n is a closed subset of X. Hence X \ {x_k}_{k=1}^n is open. Then V = U ∩ (X \ {x_k}_{k=1}^n) ∈ N(x) and V ∩ (A \ {x}) = ∅, a contradiction to the fact that x ∈ A′.
⇐: By hypothesis, every U ∈ N(x) intersects A at infinitely many points, so in particular it contains a point of A distinct from x. Then according to Definition 1.1.12(c), we have x ∈ A′.

Proposition 1.2.6. For a topological space X the following statements are equivalent:
(a) X is Hausdorff;
(b) given x ∈ X and u ≠ x we can find U ∈ N(x) such that u ∉ cl U;
(c) for every x ∈ X we have {x} = ⋂{cl U : U ∈ N(x)}.

Proof. (a) ⇒ (b): Let x ∈ X and u ≠ x. Since by hypothesis X is Hausdorff we can find U ∈ N(x) and V ∈ N(u) such that U ∩ V = ∅. This means that u ∉ cl U.
(b) ⇒ (c): Let u ≠ x. By hypothesis we can find U ∈ N(x) such that u ∉ cl U. Therefore we conclude that {x} = ⋂{cl U : U ∈ N(x)}.
(c) ⇒ (a): Let x ≠ u. We can find U ∈ N(x) such that u ∉ cl U and V ∈ N(u) such that x ∉ cl V. We set U′ = U ∩ (X \ cl V) ∈ N(x) and V′ = V ∩ (X \ cl U) ∈ N(u). Evidently U′ ∩ V′ = ∅ and this shows that X is Hausdorff.

Now we strengthen the separation property.

Definition 1.2.7. A Hausdorff topological space X is said to be regular (or T3-space) if for each closed set C ⊆ X and each x ∉ C we can find open sets U and V such that x ∈ U, C ⊆ V and U ∩ V = ∅.

Proposition 1.2.8. A Hausdorff topological space X is regular if and only if for every point x ∈ X and every U ∈ N(x) we can find W ∈ N(x) such that cl W ⊆ U.

Proof. ⇒: Let x ∈ X and U ∈ N(x). Then X \ U is a closed set not containing x. Since by hypothesis X is regular, we can find open sets W, V such that

    x ∈ W,  X \ U ⊆ V  and  W ∩ V = ∅.    (1.2.1)

We have W ⊆ X \ V and so W ⊆ X \ V since X \ V is closed. Then, because of (1.2.1), W ⊆ X \ V ⊆ X \ (X \ U) = U . This means that W ∈ N(x) is the desired neighborhood of x. ⇐: Let x ∈ X and let C ⊆ X be closed such that x ∈ ̸ C. Then X \ C ∈ N(x) and so by hypothesis we can find W ∈ N(x) such that W ⊆ X \ C. Then W and X \ W are open sets such that x ∈ W, C ⊆ X \ W and W ∩ (X \ W) = 0 which by Definition 1.2.7 means that X is regular. Proposition 1.2.9. A Hausdorff topological space X is regular if and only if for every point x ∈ X and every closed set C ⊆ X such that x ∈ ̸ C we can find open sets U, V for which we have U ∩ V = 0. Proof. ⇒: Let x ∈ X and let C ⊆ X be a closed set such that x ∈ ̸ C. Since by hypothesis, X is regular, invoking Proposition 1.2.8, we can find W ∈ N(x) such that W ⊆ X \ C. A new application of Proposition 1.2.8 produces U ∈ N(x) such that U ⊆ W. Let V = X \ W,

which is open. Then we obtain Ū ⊆ W ⊆ W̄ ⊆ X \ C, which gives C ⊆ X \ W̄ = V. Moreover, V̄ ⊆ X \ W because X \ W is closed, and so Ū ∩ V̄ = ∅. Therefore, U and V are the desired pair of open sets.

⇐: This is obvious from Definition 1.2.7.

Proposition 1.2.10. The regularity property is hereditary and topological.

Proof. Let X be a regular space, let A ⊆ X, let D ⊆ A be relatively closed and let x ∈ A \ D. From Proposition 1.1.27 we have D = C ∩ A with closed C ⊆ X. Since x ∉ C and X is regular, we can find open subsets U, V of X such that x ∈ U, C ⊆ V and U ∩ V = ∅. Then U ∩ A, V ∩ A are relatively open in A, x ∈ U ∩ A and D ⊆ V ∩ A. This shows that A with the relative (subspace) topology is regular.

Let f : X → Y be a homeomorphism and y ∈ Y, C ⊆ Y closed with y ∉ C. Let x = f⁻¹(y) and Ĉ = f⁻¹(C). Evidently Ĉ ⊆ X is closed and x ∉ Ĉ. Since X is regular we can find open subsets Û, V̂ of X such that x ∈ Û, Ĉ ⊆ V̂ and Û ∩ V̂ = ∅. This gives y ∈ f(Û) = U, f(Ĉ) = C ⊆ f(V̂) = V and f(Û) ∩ f(V̂) = ∅ since f is a homeomorphism. But from Proposition 1.1.42 we have that U, V are open subsets of Y. Hence we conclude that Y is regular.

We further strengthen the separation property.

Definition 1.2.11. A Hausdorff topological space X is said to be normal (or a T_4-space) if for each pair A, C of disjoint closed sets in X, we can find open sets U, V such that A ⊆ U, C ⊆ V and U ∩ V = ∅.

Remark 1.2.12. The definition above can be equivalently stated as follows: “If U_1, U_2 are open sets in X such that X = U_1 ∪ U_2, then we can find closed subsets C_1, C_2 of X such that C_1 ⊆ U_1, C_2 ⊆ U_2 and X = C_1 ∪ C_2.”

The next two propositions characterize normality and are proven with arguments similar to the ones used in Propositions 1.2.8 and 1.2.9.

Proposition 1.2.13. A Hausdorff topological space X is normal if and only if for each closed set C ⊆ X and each open set U ⊆ X such that C ⊆ U we can find an open set V ⊆ X for which we have C ⊆ V ⊆ V̄ ⊆ U.

Proposition 1.2.14. A Hausdorff topological space X is normal if and only if for each pair A, C of disjoint closed sets in X we can find open sets U, V in X such that A ⊆ U, C ⊆ V and Ū ∩ V̄ = ∅.

Proposition 1.2.15.
(a) A closed subset of a normal space is normal.
(b) Normality is preserved under continuous, closed surjections.

Proof. (a) Let X be a normal topological space and A ⊆ X a closed set. Suppose that C ⊆ A is relatively closed. Then C ⊆ X is closed by Proposition 1.1.27. This observation leads immediately to the normality of A.

(b) Let X be a normal topological space, Y a topological space, and f : X → Y a continuous, closed surjection. Suppose that U_1, U_2 are open subsets of Y such that


Y = U_1 ∪ U_2. Then Û_1 = f⁻¹(U_1), Û_2 = f⁻¹(U_2) are open in X and X = Û_1 ∪ Û_2. The normality of X implies that we can find closed subsets Ĉ_1, Ĉ_2 of X such that Ĉ_1 ⊆ Û_1, Ĉ_2 ⊆ Û_2 and X = Ĉ_1 ∪ Ĉ_2; see Remark 1.2.12. Since f is closed we have that C_1 = f(Ĉ_1), C_2 = f(Ĉ_2) are closed subsets of Y and C_1 ⊆ U_1, C_2 ⊆ U_2 as well as Y = C_1 ∪ C_2. According to Remark 1.2.12, this means that Y is normal as well.

Remark 1.2.16. Part (a) of Proposition 1.2.15 fails if the subset is not closed. For a counterexample we refer to Dugundji [91, p. 145].

As we already mentioned in the beginning of this section, richness in open sets implies richness in continuous functions. This is illustrated in the theorem that follows. The result is known as “Urysohn’s Lemma.”

Theorem 1.2.17 (Urysohn’s Lemma). A Hausdorff topological space X is normal if and only if for each pair A, C of disjoint closed subsets of X we can find a continuous function f : X → [0, 1] such that f|_A = 0 and f|_C = 1.

Proof. ⇒: Let D be the set of all rationals r of the form r = k/2^n with 0 ≤ k/2^n ≤ 1, that is, k = 0, 1, ..., 2^n (the dyadic fractions in [0, 1]). We show that to every r ∈ D we can assign an open set U(r) such that
(a) A ⊆ U(0) ⊆ $\overline{U(0)}$ ⊆ X \ C, U(1) = X \ C.
(b) r < r′ implies $\overline{U(r)}$ ⊆ U(r′).

We proceed by induction on the exponent n ∈ ℕ. So, let

$$ E_n = \left\{ U\!\left(\tfrac{k}{2^n}\right) : k = 0, 1, \ldots, 2^n \right\}, \quad n \in \mathbb{N}. $$

Then E_0 = {U(0), U(1) = X \ C} and (a) is satisfied by Proposition 1.2.13. Suppose that E_{n−1} has been constructed. Clearly we need to define U(k/2^n) only for odd k. For odd k, from the induction hypothesis, we have

$$ \overline{U\!\left(\tfrac{k-1}{2^n}\right)} \subseteq U\!\left(\tfrac{k+1}{2^n}\right), $$

see (b). So we define U(k/2^n) = U with U being an open set such that, due to Proposition 1.2.13,

$$ \overline{U\!\left(\tfrac{k-1}{2^n}\right)} \subseteq U \subseteq \overline{U} \subseteq U\!\left(\tfrac{k+1}{2^n}\right). $$

This completes the induction and we have defined the collection

$$ \left\{ U\!\left(\tfrac{k}{2^n}\right) : k = 0, 1, \ldots, 2^n,\ n \in \mathbb{N} \right\}. $$

We define the desired function f by setting

$$ f(x) = \begin{cases} 0 & \text{if } x \in U(r) \text{ for every dyadic fraction } r \text{ as above}, \\ \sup\{r : x \notin U(r)\} & \text{otherwise}. \end{cases} $$

Then f has values in [0, 1] and f|_A = 0, f|_C = 1. So it remains to show that f is continuous. Note that the intervals {[0, a), (a, 1] : 0 < a < 1} form a subbasis for [0, 1] with the Euclidean topology. So, according to Remark 1.1.33 it suffices to show that f⁻¹([0, a)) and f⁻¹((a, 1]) are open. Note that f(x) < a if and only if x ∈ U(r) for some r < a. It follows that f⁻¹([0, a)) = ⋃_{r<a} U(r), which is open. Similarly, f(x) > a if and only if x ∉ $\overline{U(r)}$ for some r > a. Therefore f⁻¹((a, 1]) = ⋃_{r>a} (X \ $\overline{U(r)}$), which is open. This proves the continuity of f.

⇐: Let A, C ⊆ X be disjoint closed sets. By hypothesis we can find a continuous function f : X → [0, 1] such that

$$ f|_A = 0 \quad \text{and} \quad f|_C = 1. \tag{1.2.2} $$

Let U = {x ∈ X : f(x) < 1/2} and V = {x ∈ X : f(x) > 1/2}. Then U, V ⊆ X are open, U ∩ V = ∅, A ⊆ U, C ⊆ V, see (1.2.2), which implies that X is normal.

Remark 1.2.18. We can have a form of this result that is a little more flexible. To be more precise, we can replace [0, 1] by [a, b] with a, b ∈ ℝ, a ≤ b and f|_A = a, f|_C = b. Indeed, let f_0 be the continuous separating function postulated by Theorem 1.2.17. Then set f = (b − a)f_0 + a. Evidently this function has the desired properties.

There is another such functional characterization of normality, namely the so-called “Tietze Extension Theorem.” We state this result at the end of this section and for its proof, which is rather technical, we refer to Dugundji [91].

Evidently we have

Normal ⇒ Regular ⇒ Hausdorff.

None of these implications is in general reversible. Between regular and normal spaces we can fit another class given in the next definition.

Definition 1.2.19. A Hausdorff topological space X is said to be completely regular if for each x ∈ X and each closed set C ⊆ X with x ∉ C, we can find a continuous function f : X → [0, 1] such that f(x) = 0 and f|_C = 1.

Now we pass to the countability properties of a topological space.

Definition 1.2.20.
(a) A topological space X is said to be first countable if it has a countable local basis at each point of X.
(b) A topological space X is said to be second countable if it has a countable basis.

Remark 1.2.21. Evidently a second countable space is also first countable. The converse is not true. Every metric space (X, d) is first countable. Indeed, for every x ∈ X, B(x) = {B_r(x) : r ∈ ℚ, r > 0} with B_r(x) = {u ∈ X : d(u, x) < r} is a countable local basis at x and so X is first countable.

Proposition 1.2.22. Every second countable space is separable.


Proof. Let X be a second countable space and let B be the countable basis of X. Let D be the countable set formed by choosing an element from each nonempty basic open set. Then Corollary 1.1.6 implies that D̄ = X.

Remark 1.2.23. The converse of the proposition above is not true. Consider the space X = ℝ topologized with the topology that has as its basis intervals of the form (a, b] with a, b ∈ ℝ. This topology is known as the upper limit topology and is denoted by τ_u. We can easily check that the Euclidean topology on X = ℝ is weaker than τ_u. The space (ℝ, τ_u) is first countable. To see this, consider B(x) = {(r, x] : r ∈ ℚ, r < x} for each x ∈ ℝ. In addition, (ℝ, τ_u) is separable. Indeed, the rationals are a countable dense subset. However, (ℝ, τ_u) is not second countable. To see this, note that if {(a_n, b_n]}_{n∈ℕ} is a countable collection in τ_u, then by choosing a < b with b ≠ b_n for all n ∈ ℕ, the open set (a, b] cannot be expressed as a union of sets in the countable collection. The proposition above also says that every nonseparable metric space is first countable but not second countable.

Proposition 1.2.24.
(a) Second countability is preserved by continuous open surjections.
(b) Second countability is hereditary.
(c) Separability is preserved by continuous surjections.

Proof. (a) Let X be a second countable topological space, Y another topological space, and f : X → Y a continuous open surjection. Consider a countable basis {U_n}_{n∈ℕ} for the topology of X. Using Corollary 1.1.6, we see that {f(U_n)}_{n∈ℕ} is a countable basis for Y.

(b) This is obvious.

(c) Let X be a separable topological space, Y another topological space and f : X → Y a continuous surjection. Let D ⊆ X be a countable dense subset. From Proposition 1.1.35(b) we have Y = f(X) = f(D̄) ⊆ $\overline{f(D)}$. Hence, Y = $\overline{f(D)}$ and f(D) is countable.

Remark 1.2.25. Clearly, an open subset of a separable topological space is separable for the subspace topology. If X is a second countable topological space, then every subset of X endowed with the subspace topology is separable.

Definition 1.2.26. Let (X, τ) be a topological space.
(a) An open cover of X is a collection D ⊆ τ such that X = ⋃{U : U ∈ D}. A subcover of an open cover D is a subfamily D′ of D such that X = ⋃{U : U ∈ D′}.
(b) We say that X is a Lindelöf space if every open cover contains a countable subcover.

The next result relates the Lindelöf property with second countability. It is known as “Lindelöf’s Theorem.”

Theorem 1.2.27 (Lindelöf’s Theorem). Every second countable space is Lindelöf.

Proof. Let X be a second countable topological space and {U_n}_{n≥1} a countable basis of X. Consider an open cover D = {V_i}_{i∈I} of X. For each x ∈ X, let V_{i(x)} ∈ {V_i}_{i∈I} be such that x ∈ V_{i(x)}. Let U_{n(x)} ∈ {U_n}_{n≥1} be such that x ∈ U_{n(x)} ⊆ V_{i(x)}. Then the family {U_{n(x)}}_{x∈X} is a countable open cover of X. For each member U_{n(x)} of this countable family choose one set V_{i(x)} ∈ D such that U_{n(x)} ⊆ V_{i(x)}. Then the collection of the chosen sets is a countable subcover of D. Therefore, X is Lindelöf.

Remark 1.2.28. The converse of the theorem above is not true. Consider the space (ℝ, τ_u); see Remark 1.2.23. Then we can show that it is Lindelöf (see Dugundji [91]), but it is not second countable; see again Remark 1.2.23.

Proposition 1.2.29.
(a) The Lindelöf property is preserved by continuous surjections.
(b) A closed subset of a Lindelöf space is Lindelöf for the subspace topology.

Proof. (a) Let X be a Lindelöf space, Y another topological space, and f : X → Y a continuous surjection. Consider an open cover {U_i}_{i∈I} of Y. Then {V_i}_{i∈I} = {f⁻¹(U_i)}_{i∈I} is an open cover of X. Since X is Lindelöf, we can find a countable subcover {V_n}_{n∈ℕ} = {f⁻¹(U_n)}_{n∈ℕ}. Then {U_n}_{n∈ℕ} is a countable subcover of {U_i}_{i∈I} and so we conclude that Y is Lindelöf.

(b) Let X be a Lindelöf space and C ⊆ X a closed subset. Consider an open cover {V_i}_{i∈I} of C with the subspace topology. Then V_i = U_i ∩ C with U_i ⊆ X open. Then {U_i, X \ C}_{i∈I} is an open cover of X. Since X is Lindelöf we can find a countable subcover, consisting of sets {U_n}_{n∈ℕ} and possibly X \ C. Then {U_n ∩ C}_{n∈ℕ} is a countable subcover of {V_i}_{i∈I}. So, we conclude that C with the subspace topology is Lindelöf.

We know that a sequence is a map from ℕ into X but it is more convenient to think of a sequence as a subset of X indexed by ℕ. We generalize this notion by replacing ℕ with a more general index set.

Definition 1.2.30. Let X be a set.
(a) A relation is any subset R ⊆ X × X. Given a relation, it is more suggestive to write xRy instead of (x, y) ∈ R. We say that R is reflexive if xRx for all x ∈ X. We say that R is symmetric if xRy implies yRx. We say that R is antisymmetric if xRy and yRx imply x = y. We say that R is transitive if xRy and yRz imply xRz.
(b) A relation R is called an equivalence relation if it is reflexive, symmetric, and transitive.
(c) A relation R is called a partial order if it is antisymmetric and transitive. In this case we write x ≤ y if and only if xRy or x = y (a reflexive partial order) and x < y if and only if xRy and x ≠ y (a strict partial order). A linear order R is a partial order such that for all x, u ∈ X, either xRu or uRx. A chain is a linearly ordered subset of a partially ordered set.
(d) A directed set is a partially ordered set (I, ≤) such that for any α, β ∈ I we can find k ∈ I such that α ≤ k and β ≤ k.

Remark 1.2.31. Many authors require that a partial order is also reflexive. Definition 1.2.30(c) is more flexible and allows both “≤” and “<” […]

[…] 0 and for all x ∈ A, then f̂ can be chosen so that |f̂(x)| ≤ M for all x ∈ X.

1.3 Weak, Product, and Quotient Topologies

Let X and {Y_i}_{i∈I} be topological spaces and f_i : X → Y_i be continuous functions. From Proposition 1.1.32 we see that if we strengthen (enrich) the topology on X, we preserve the continuity of the f_i’s. Thus it is natural to ask what the smallest topology on X is that preserves the continuity of the f_i’s. This leads to the notions of weak and product topologies, which occur in a prominent position in many areas of analysis such as functional analysis.

Definition 1.3.1. Let X be a nonempty set, let {(Y_i, τ_i)}_{i∈I} be a family of Hausdorff topological spaces and let f_i : X → Y_i with i ∈ I be a family of functions. The weak topology or initial topology on X generated by the family of functions {f_i}_{i∈I} is the weakest topology on X that makes all f_i’s continuous. The weak topology is denoted by w(X, {f_i}) or simply by w if X and {f_i} are clearly understood.

Remark 1.3.2. Simple set theory reveals that the weak topology is generated by, that is, it has as subbasis, the sets of the form

$$ \{ f_i^{-1}(V) : V \in \tau_i,\ i \in I \}. \tag{1.3.1} $$

Recalling that to check continuity it suffices to consider the inverse image of subbasic sets, another more economical subbasis is given by

$$ \{ f_i^{-1}(V) : V \in \mathcal{L}_i,\ i \in I \} \tag{1.3.2} $$

with a subbasis $\mathcal{L}_i$ for the topology τ_i. Then a basis for the weak topology is produced by taking finite intersections of the sets above; see (1.3.1) and (1.3.2). An important special case is when Y_i = ℝ for all i ∈ I. This is the case of the weak topology in functional analysis. Then the subbasic elements are of the form U(x; f, ε) = {u ∈ X : |f(u) − f(x)| < ε} with x ∈ X, f ∈ {f_i} and ε > 0.

Proposition 1.3.3. A net {x_α}_{α∈J} converges to x for the weak topology, which is denoted by $x_\alpha \xrightarrow{w} x$, if and only if f_i(x_α) → f_i(x) for all i ∈ I.

Proof. ⇒: This follows from Proposition 1.2.37, since each f_i is w-continuous.

⇐: Let $V = \bigcap_{k=1}^{n} f_{i_k}^{-1}(V_{i_k})$ be a basic neighborhood of x where $V_{i_k} \in \tau_{i_k}$. Since by hypothesis $f_{i_k}(x_\alpha) \to f_{i_k}(x)$, we can find $\alpha_{i_k} \in J$ such that

$$ x_\alpha \in f_{i_k}^{-1}(V_{i_k}) \quad \text{for all } \alpha \ge \alpha_{i_k}. \tag{1.3.3} $$

Since J is directed we can find $\alpha_0 \ge \alpha_{i_k}$ for all k ∈ {1, ..., n}. Then x_α ∈ V for all α ≥ α_0 because of (1.3.3). This implies $x_\alpha \xrightarrow{w} x$ in X.

Proposition 1.3.4. If Z is another topological space and g : Z → X is a map, then g is continuous for the weak topology on X if and only if f_i ∘ g is continuous for all i ∈ I.

Proof. ⇒: From Proposition 1.1.36(a) we know that f_i ∘ g is continuous for all i ∈ I.

⇐: Let U ⊆ X be weakly open. Then

$$ U = \bigcup_{\text{arbitrary}} \ \bigcap_{\text{finite}} f_i^{-1}(V_i) \quad \text{with } V_i \in \tau_i. $$

This gives

$$ g^{-1}(U) = \bigcup_{\text{arbitrary}} \ \bigcap_{\text{finite}} g^{-1}\big(f_i^{-1}(V_i)\big) = \bigcup_{\text{arbitrary}} \ \bigcap_{\text{finite}} (f_i \circ g)^{-1}(V_i), $$

which is open in Z, and thus g is continuous.

Consider X endowed with the weak topology w(X, {f_i}). Suppose that A ⊆ X. Then we can consider on A the subspace topology induced by w(X, {f_i}). However, we can also consider the weak topology w(A, {f_i|_A}); see Proposition 1.1.36(b). It is natural to ask what the relation is between these two topologies on A. It is easy to see that the two topologies have the same convergent nets. This leads to the next result.

Proposition 1.3.5. If X is endowed with the weak topology w(X, {f_i}) and A ⊆ X, then w(X, {f_i})|_A = w(A, {f_i|_A}).

As we already mentioned, an analyst requires that a topological space is at least Hausdorff. So we need to know the conditions that guarantee that the weak topology is Hausdorff.

Definition 1.3.6. Let X and {Y_i}_{i∈I} be sets and let f_i : X → Y_i be a family of functions. We say that the family {f_i}_{i∈I} is separating (or total) if for every pair (x, u) ∈ X × X with x ≠ u we can find i_0 ∈ I such that f_{i_0}(x) ≠ f_{i_0}(u).

Proposition 1.3.7. If w(X, {f_i}) is the weak topology on X, then w(X, {f_i}) is Hausdorff if and only if {f_i}_{i∈I} is separating.

Proof. ⇒: Arguing by contradiction, suppose that the family {f_i}_{i∈I} is not separating. So, we can find a pair (x, u) ∈ X × X with x ≠ u such that f_i(x) = f_i(u) for all i ∈ I. Let U ∈ N_w(x) where N_w(x) is the family of weak neighborhoods of x. Then we can find $\{f_{i_k}\}_{k=1}^{n} \subseteq \{f_i\}_{i \in I}$ and $V_{i_k} \in \tau_{i_k}$ with k ∈ {1, ..., n} such that

$$ x \in \bigcap_{k=1}^{n} f_{i_k}^{-1}(V_{i_k}) \subseteq U. \tag{1.3.4} $$

Since f_i(x) = f_i(u) for all i ∈ I, we have

$$ u \in \bigcap_{k=1}^{n} f_{i_k}^{-1}(V_{i_k}). $$

Due to (1.3.4) it follows that u ∈ U. We infer that (X, w) is not Hausdorff, a contradiction.

⇐: As before, we proceed indirectly. Suppose that (X, w) is not Hausdorff. Then according to Proposition 1.2.35 we can find a net {x_α}_{α∈J} ⊆ X such that

$$ x_\alpha \xrightarrow{w} x \quad \text{and} \quad x_\alpha \xrightarrow{w} \hat{x}, \qquad x \ne \hat{x}. $$

For every i ∈ I we have f_i(x_α) → f_i(x) and f_i(x_α) → f_i(x̂) in Y_i, which is Hausdorff. Hence, f_i(x) = f_i(x̂) for all i ∈ I, see Proposition 1.2.35. This means that the family {f_i}_{i∈I} is not separating, a contradiction.

Next we derive some useful results concerning the weak topology. Let (X, τ) be a Hausdorff topological space. We will use the following notations:
– C(X, ℝ) = {f : X → ℝ : f is continuous};
– C_b(X, ℝ) = {f : X → ℝ : f is bounded and continuous}.

Proposition 1.3.8. If (X, τ) is a Hausdorff topological space, then w(X, C(X, ℝ)) = w(X, C_b(X, ℝ)).

Proof. Since C_b(X, ℝ) ⊆ C(X, ℝ) we infer that w(X, C_b(X, ℝ)) ⊆ w(X, C(X, ℝ)). So we need to show that the opposite inclusion also holds. Let U be a subbasic open set in w(X, C(X, ℝ)). Then we have U = U(x; f, ε) = {u ∈ X : |f(u) − f(x)| < ε} with x ∈ X, f ∈ C(X, ℝ) and ε > 0. Let g(u) = min{f(x) + ε, max{f(x) − ε, f(u)}}. Evidently we have g ∈ C_b(X, ℝ) and U(x; g, ε) = U(x; f, ε), which implies that w(X, C(X, ℝ)) ⊆ w(X, C_b(X, ℝ)). This proves the assertion.

The next theorem characterizes completely regular spaces (see Definition 1.2.19) via the weak topologies of the previous proposition.

Theorem 1.3.9. A Hausdorff topological space (X, τ) is completely regular if and only if τ = w(X, C(X, ℝ)) = w(X, C_b(X, ℝ)).

Proof. ⇒: Let U ∈ τ and x ∈ U. Since X is completely regular, we can find f ∈ C(X, ℝ) such that f(x) = 0 and f|_{X\U} = 1. Let V = {u ∈ X : f(u) < 1}. Then V is w(X, C(X, ℝ))-open, V ⊆ U, and x ∈ V. Therefore, U is w(X, C(X, ℝ))-open and so we infer that

$$ \tau \subseteq w(X, C(X, \mathbb{R})). \tag{1.3.5} $$

From Definition 1.3.1 it is clear that we always have w(X, C(X, ℝ)) ⊆ τ. This along with (1.3.5) and Proposition 1.3.8 yields τ = w(X, C(X, ℝ)) = w(X, C_b(X, ℝ)).

⇐: Let C ⊆ X be closed and x ∉ C. Then U = X \ C ∈ N_w(x) where N_w(x) is the family of weak neighborhoods of x. So we can find V = ⋂_{i=1}^n {u ∈ X : |f_i(u) − f_i(x)| < 1}, f_i ∈ C(X, ℝ) for all i ∈ {1, ..., n}, such that x ∈ V ⊆ U. For each i ∈ {1, ..., n} we define g_i(u) = min{1, |f_i(u) − f_i(x)|} and set g = max_{1≤i≤n} g_i. Obviously g : X → [0, 1] is continuous and g(x) = 0 as well as g|_C = 1. This proves that X is completely regular.

A weak topology of special interest is the product topology. So, let {(X_i, τ_i)}_{i∈I} be a family of Hausdorff topological spaces. Let X = ∏_{i∈I} X_i. The generic element x ∈ X is denoted by x = (x_i). For every i ∈ I let p_i : X → X_i be defined by p_i(x) = x_i, where p_i is the projection map onto the i-th component of the Cartesian product.

Definition 1.3.10. The product topology on X is the weak topology w(X, {p_i}).

Remark 1.3.11. A basic element for the product topology has the form V = ∏_{i∈I} V_i with V_i ∈ τ_i for all i ∈ I and V_i = X_i for all but a finite number of i’s. In addition, note that x^α = (x_i^α) → x = (x_i) in X = ∏_{i∈I} X_i if and only if x_i^α → x_i for all i ∈ I. Note that if A_i ⊆ X_i then $\overline{\prod_{i \in I} A_i} = \prod_{i \in I} \overline{A_i}$ and each projection map p_i is open.

Proposition 1.3.12. X = ∏_{i∈I} X_i with the product topology is Hausdorff.

Proof. Recall that each X_i is Hausdorff. Let x = (x_i) ∈ X and u = (u_i) ∈ X with x ≠ u. Then we can find at least one i_0 ∈ I such that x_{i_0} ≠ u_{i_0}. We can find U_{i_0}, V_{i_0} ∈ τ_{i_0} such that x_{i_0} ∈ U_{i_0}, u_{i_0} ∈ V_{i_0} and U_{i_0} ∩ V_{i_0} = ∅. Let $U = p_{i_0}^{-1}(U_{i_0})$ and $V = p_{i_0}^{-1}(V_{i_0})$. Then both are open in the product topology and x ∈ U, u ∈ V and U ∩ V = ∅. This implies that X is Hausdorff with the product topology.

Proposition 1.3.13. If {(X_i, τ_i)}_{i∈I} is a family of Hausdorff topological spaces, then X = ∏_{i∈I} X_i endowed with the product topology is regular if and only if (X_i, τ_i) is regular for each i ∈ I.

Proof. ⇒: Each X_i is homeomorphic to a slice of X = ∏_{i∈I} X_i. Hence, the implication follows from Proposition 1.2.10.

⇐: Let x = (x_i) ∈ X = ∏_{i∈I} X_i and let U be any subbasic neighborhood of x. Then U = ∏_{i∈I} V_i with V_i = X_i for all i ∈ I \ {i_0}, V_{i_0} ∈ τ_{i_0}. Exploiting the regularity of X_{i_0} we can find W_{i_0} ∈ τ_{i_0} such that

$$ x_{i_0} \in W_{i_0} \subseteq \overline{W_{i_0}} \subseteq V_{i_0}, \tag{1.3.6} $$

see Proposition 1.2.8. Let W = ∏_{i∈I} W_i with W_i = X_i for all i ∈ I \ {i_0} and W_{i_0} as above. Then W is open in the product topology and because of Remark 1.3.11 as well as (1.3.6), it follows that

$$ x \in W \subseteq \overline{W} = \prod_{i \in I} \overline{W_i} \subseteq \prod_{i \in I} V_i = U. $$

This proves that X = ∏_{i∈I} X_i is regular with the product topology; see Proposition 1.2.8.

The Cartesian product of normal spaces need not be normal. For a counterexample, see Dugundji [91, p. 145]. However, we have the following result.

Proposition 1.3.14. If {(X_i, τ_i)}_{i∈I} is a family of Hausdorff topological spaces and X = ∏_{i∈I} X_i endowed with the product topology is normal, then (X_i, τ_i) is normal for each i ∈ I.

Proof. Note that for each i ∈ I, X_i is homeomorphic to a slice of X = ∏_{i∈I} X_i, which is closed, and hence normal due to Proposition 1.2.15(a). Then the result follows from Proposition 1.2.15(b).

Next we will consider the complementary situation to the one that led to the weak topology. So, let X, Y be topological spaces and f : X → Y be a continuous map. If we weaken the topology on Y we preserve the continuity of f. Hence, we want to identify the largest topology on Y for which f remains continuous.

Definition 1.3.15. Let (X, τ) be a topological space, Y a set, and f : X → Y a surjection. The quotient topology on Y induced by f is τ_q = {U ⊆ Y : f⁻¹(U) ∈ τ}. When Y is endowed with the quotient topology, then we say that f is a quotient map.

Remark 1.3.16. The quotient topology on Y makes f continuous and it is clearly the largest topology on Y that does this.

Proposition 1.3.17. If (X, τ_X), (Y, τ_Y) are topological spaces and f : X → Y is a continuous, open surjection, then f is a quotient map, that is, τ_Y = τ_q.

Proof. By definition τ_Y ⊆ τ_q. On the other hand, if U ∈ τ_q, then f⁻¹(U) ∈ τ_X and since f is open and surjective, we have U = f(f⁻¹(U)) ∈ τ_Y and so τ_q ⊆ τ_Y. Therefore τ_Y = τ_q.

Corollary 1.3.18. If {(X_i, τ_i)}_{i∈I} are Hausdorff topological spaces and X = ∏_{i∈I} X_i is endowed with the product topology, then τ_i = τ_q for each i ∈ I, where τ_q denotes the quotient topology on X_i induced by the projection p_i.

Proof. Just recall that each projection map p_i : X = ∏_{i∈I} X_i → X_i is a continuous open surjection.

Proposition 1.3.19. If (X, τ_X), (Y, τ_Y) are topological spaces and f : X → Y is a continuous, closed surjection, then f is a quotient map, that is, τ_Y = τ_q.

Proof. Recall that τ_Y ⊆ τ_q. Let U ∈ τ_q. Then f⁻¹(U) ∈ τ_X and so X \ f⁻¹(U) =: C ⊆ X is closed. Since f is closed, we have that f(C) ⊆ Y is τ_Y-closed. Note that U = Y \ f(C) ∈ τ_Y. Hence τ_q ⊆ τ_Y and we conclude that τ_Y = τ_q.
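On a finite set the quotient topology of Definition 1.3.15 can be computed by brute force, which may help to make the definition concrete. The following is a minimal Python sketch under stated assumptions: the four-point space, its topology, the surjection f, and the helper names `powerset`, `preimage` and `quotient_topology` are all hypothetical and chosen only for illustration.

```python
from itertools import chain, combinations

def powerset(points):
    """All subsets of a finite iterable, returned as frozensets."""
    pts = list(points)
    subsets = chain.from_iterable(combinations(pts, k) for k in range(len(pts) + 1))
    return [frozenset(s) for s in subsets]

def preimage(f, domain, subset):
    """f^{-1}(subset) for a map f given as a dict on a finite domain."""
    return frozenset(x for x in domain if f[x] in subset)

def quotient_topology(X, tau, f, Y):
    """tau_q = {U subset of Y : f^{-1}(U) in tau}, as in Definition 1.3.15."""
    return {U for U in powerset(Y) if preimage(f, X, U) in tau}

# Hypothetical example: X = {0, 1, 2, 3} with a topology of nested open sets.
X = {0, 1, 2, 3}
tau = {frozenset(), frozenset({0}), frozenset({0, 1}),
       frozenset({0, 1, 2}), frozenset(X)}

# Surjection onto Y = {"a", "b"} collapsing {0, 1} to "a" and {2, 3} to "b".
Y = {"a", "b"}
f = {0: "a", 1: "a", 2: "b", 3: "b"}

tau_q = quotient_topology(X, tau, f, Y)
print(sorted(sorted(U) for U in tau_q))  # [[], ['a'], ['a', 'b']]
```

In this example {"b"} is not open in the quotient topology because its preimage {2, 3} is not open in X, whereas {"a"} is open because its preimage {0, 1} is.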

The next proposition gives a criterion to recognize when a function defined on a quotient space is continuous.

Proposition 1.3.20. If (X, τ_X), (Y, τ_Y), and (Z, τ_Z) are topological spaces, f : X → Y is a quotient map and g : Y → Z, then g is continuous if and only if g ∘ f is continuous.

Proof. ⇒: This follows from Proposition 1.1.36(a).

⇐: Let U ∈ τ_Z. Then (g ∘ f)⁻¹(U) = f⁻¹(g⁻¹(U)) ∈ τ_X. Hence g⁻¹(U) ∈ τ_Y since f is a quotient map, see Definition 1.3.15. This proves the continuity of g.

Now we will show that the whole topic of the quotient topology can be covered by considering Y to be X/R with R being an equivalence relation; see Definition 1.2.30(b). Suppose f : X → Y is a surjection and define the relation R ⊆ X × X by setting xRx′ if and only if f(x) = f(x′). Let e(x) be the equivalence class of x. Evidently f|_{e(x)} is constant. Then the map f̂ : X/R → Y defined by f̂(e(x)) = f(x) is actually well-defined and a bijection. Note that if e(x) = e(x′), then f(x) = f(x′). In order to topologize X/R consider the standard quotient map e : X → X/R and consider the quotient topology induced by e. Then we have the following result.

Proposition 1.3.21. If X is a topological space, Y is a set, f : X → Y is a surjection and R is the equivalence relation defined above, then X/R and Y are homeomorphic when both are endowed with the quotient topology.

Remark 1.3.22. Instead of using the equivalence relation we may assume that X is partitioned by a collection C of disjoint subsets. Then we define an equivalence relation by setting xRu if and only if x, u are in the same element of C. Then we can consider X/R. The simplest kind of quotient space can be obtained by the equivalence relation R in which only one equivalence class e(x_0) = A has more than one element and for all other equivalence classes we have e(x) = {x} with x ∈ X \ A. Then X/R is denoted by X/A and we obtain the quotient (identification) space by collapsing A to a single element {x_0}.

Example 1.3.23.
(a) The quotient space of [0, 1] obtained by identifying 0 and 1 is homeomorphic to a circle.
(b) The quotient space of I² = [0, 1] × [0, 1] obtained by identifying the boundary with a single point is homeomorphic to a sphere in ℝ³.
(c) The quotient space of I² = [0, 1] × [0, 1] obtained by identifying the points (0, x_2) and (1, 1 − x_2) with 0 ≤ x_2 ≤ 1 is homeomorphic to the Möbius strip.
(d) Let X = I² = [0, 1] × [0, 1] and consider an equivalence relation R ⊆ X × X defined as follows:

$$ (x_1, 0)\,R\,(x_1, 1) \quad \text{for every } 0 \le x_1 \le 1, \tag{1.3.7} $$

$$ (0, x_2)\,R\,(1, x_2) \quad \text{for every } 0 \le x_2 \le 1. \tag{1.3.8} $$


Then the quotient space is realized in two steps and gives a space homeomorphic to the torus. The first step is determined by (1.3.7), which produces a cylinder. In the second step, determined by (1.3.8), we identify the two bases of the cylinder to generate the torus.
(e) If we replace (1.3.8) in the example above by (0, x_2)R(1, 1 − x_2) for every 0 ≤ x_2 ≤ 1, then the resulting quotient space X/R is the Klein bottle.
(f) Let D ⊆ ℝ² be the unit disc, that is, D = {(x_1, x_2) ∈ ℝ² : x_1² + x_2² ≤ 1}, and consider the equivalence relation

$$ x\,R\,(-x) \quad \text{for all } x \in \partial D = \{(x_1, x_2) \in \mathbb{R}^2 : x_1^2 + x_2^2 = 1\}. $$

That means, diametrically opposite boundary points are identified. Then the quotient space D/R is called the projective plane and is denoted by P². One can proceed similarly to define P^n for any n ∈ ℕ_0 as the space obtained from S^n = {x ∈ ℝ^{n+1} : |x| = 1} by identifying each point x with its antipode −x. The space P^n is known as the projective n-space.
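The identifications in Example 1.3.23(c)–(e) can be explored at the level of sets by replacing I² with a finite grid and computing the equivalence classes generated by the gluing rules. The following Python sketch does this with a small union-find structure. The grid size `n`, the rule lists and all function names are illustrative assumptions, and the code only tracks the partition in the sense of Remark 1.3.22, not the quotient topology.

```python
from itertools import product

def classes(points, pairs):
    """Partition `points` into the classes of the equivalence relation generated by `pairs`."""
    parent = {p: p for p in points}

    def find(p):
        # Follow parent links to the representative, shortening the path on the way.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in pairs:
        parent[find(a)] = find(b)

    groups = {}
    for p in points:
        groups.setdefault(find(p), set()).add(p)
    return list(groups.values())

n = 4  # grid coordinates 0..n stand in for the interval [0, 1]
grid = list(product(range(n + 1), repeat=2))

glue_bottom_top = [((x1, 0), (x1, n)) for x1 in range(n + 1)]      # rule (1.3.7)
glue_left_right = [((0, x2), (n, x2)) for x2 in range(n + 1)]      # rule (1.3.8)
glue_flipped = [((0, x2), (n, n - x2)) for x2 in range(n + 1)]     # rule of (c) and (e)

torus = classes(grid, glue_bottom_top + glue_left_right)
klein = classes(grid, glue_bottom_top + glue_flipped)
moebius = classes(grid, glue_flipped)

print(len(grid), len(torus), len(klein), len(moebius))  # 25 16 16 20
```

For the torus and the Klein bottle all four corners of the grid end up in a single class, which reflects the fact that the two identifications interact at the corners of the square.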

1.4 Connectedness and Compactness

The property of connectedness says that the space has only one piece. It is a very important topological invariant with important applications in many other branches of mathematics. It is not difficult to come up with a definition of this very intuitive notion.

Definition 1.4.1. Let X be a topological space. A separation of X is a pair (U, V) of disjoint, nonempty, open sets of X such that X = U ∪ V. If such a separation exists, we say that the space is disconnected. If there is no such separation for X, then we say that the space is connected. A set A ⊆ X is connected if it is a connected space when endowed with the subspace topology.

Note that in a separation the two sets are both open and closed. We say that they are clopen.

Example 1.4.2.
(a) The space (ℝ, τ_u), see Remark 1.2.23, is disconnected and the sets {x ∈ ℝ : x > λ} and {x ∈ ℝ : x ≤ λ} with λ ∈ ℝ form a separation of ℝ.
(b) The rationals ℚ with the relative Euclidean topology form a disconnected space. The sets {x ∈ ℝ : x > π} ∩ ℚ and {x ∈ ℝ : x < π} ∩ ℚ form a separation of ℚ.
(c) A discrete space with more than one point is disconnected, whereas the empty set is connected since there are no nonempty open sets to form a separation of it.

26 | 1 Basic Topology (d) ℝ \ {0} is disconnected since (−∞, 0) and (0, +∞) form a separation. Similarly, ℝ2 \ ℝ is disconnected and we can have a separation using the sets U = {(x1 , x2 ) ∈ ℝ : x2 > 0} and

V = {(x1 , x2 ) ∈ ℝ : x2 < 0} .

Here, U is called the upper half plane and V is said to be the lower half plane. (e) ℝ, endowed with the Euclidean topology, is connected. To show this we argue by contradiction. So, suppose that ℝ is disconnected and (U, V) is a separation of ℝ. Let x ∈ U and y ∈ V and assume without loss of generality that x < y. Then Û = U ∩ [x, y] is closed and bounded in ℝ. Hence, û = sup Û ∈ U.̂ Furthermore, û ∈ ̸ V since U and V are disjoint. Therefore û < y and (u,̂ y] ⊆ V. Thus, û ∈ V and so û ∈ V. It follows that û ∈ U ∩ V, a contradiction. This proves the connectedness of ℝ. Remark 1.4.3. From the examples 1.4.2(b) and (e) we see that connectedness is not a hereditary property. Proposition 1.4.4. The connected subsets of ℝ are singletons and intervals (open, closed, or half-open). Proof. Clearly singletons are connected. In addition, the argument in Example 1.4.2(e) shows that intervals are connected. It remains to show if A ⊆ ℝ is connected, then A is an interval. If A is not an interval, then we can find x, y ∈ A and u ∈ ̸ A such that x < u < y. Then U = A ∩ {v ∈ ℝ : v < c} and V = A ∩ {v ∈ ℝ : v > c} are a separation of A, a contradiction. Proposition 1.4.5. Let X be a topological space. The following statements are equivalent: (a) X is disconnected. (b) There is a nonempty, proper subset of X, which is both open and closed. (c) There is a continuous function from X into the two-point space {a, b}. (d) X has a nonempty, proper subset A such that A ∩ (X \ A) = 0. Proof. (a) ⇒ (b): Since X is disconnected, it admits a separation (U, V). Then U as well as V are nonempty clopen. (b) ⇒ (a): Suppose A is a proper, nonempty subset of X that is clopen. Let C = X \ A. Then (A, C) is a separation of X and so X is disconnected. (a) ⇒ (c): Let (U, V) be a separation of X. Then the function f : X → {a, b} defined by {a if x ∈ U , f(x) = { b if x ∈ V { is continuous. 
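(c) ⇒ (a): Let f : X → {a, b} be a continuous surjection. Then U = f⁻¹(a) and V = f⁻¹(b) are disjoint, nonempty, open sets in X such that X = U ∪ V. So, (U, V) is a separation of X and we conclude that X is disconnected.

(a) ⇒ (d): Let (U, V) be a separation of X. Since U and V are clopen, $\overline{U} \cap \overline{V} = U \cap V = \emptyset$. So $\overline{U} \cap \overline{X \setminus U} = \emptyset$.

(d) ⇒ (a): Since $\overline{A}$ and $\overline{X \setminus A}$ are disjoint and their union is X, we have $\overline{A} = A$ and $\overline{X \setminus A} = X \setminus A$. So A and X \ A are disjoint, closed sets whose union is X. Hence A and X \ A are also open and form a separation of X.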
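On a finite set, the criterion in Proposition 1.4.5(b) can be checked by direct enumeration. The following Python sketch is only an illustration under stated assumptions: the helper names `is_topology`, `clopen_sets`, and `is_disconnected`, as well as the two sample topologies on {1, 2, 3}, are chosen here and do not come from the text.

```python
from itertools import combinations

def is_topology(points, opens):
    """Check the topology axioms for a finite family of subsets of `points`."""
    opens = set(opens)
    if frozenset() not in opens or frozenset(points) not in opens:
        return False
    # For a finite family, closure under pairwise unions and intersections
    # already gives closure under all unions and finite intersections.
    return all(a | b in opens and a & b in opens
               for a, b in combinations(opens, 2))

def clopen_sets(points, opens):
    """Return all subsets that are both open and closed."""
    whole = frozenset(points)
    opens = set(opens)
    return {u for u in opens if whole - u in opens}

def is_disconnected(points, opens):
    """Proposition 1.4.5(b): some nonempty, proper subset is clopen."""
    whole = frozenset(points)
    return any(u and u != whole for u in clopen_sets(points, opens))

# Hypothetical sample data (not from the text).
F = frozenset
X = {1, 2, 3}
tau_connected = {F(), F({1}), F({1, 2}), F(X)}      # a nested chain of open sets
tau_disconnected = {F(), F({1}), F({2, 3}), F(X)}   # {1} and {2, 3} separate X
```

In the first topology only ∅ and X are clopen, so the space is connected; in the second, ({1}, {2, 3}) is a separation.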

1.4 Connectedness and Compactness | 27

Corollary 1.4.6. Let X be a topological space. The following statements are equivalent:
(a) X is connected.
(b) The only subsets of X that are open and closed are ∅ and X.
(c) There is no continuous function from X onto the two-point space {a, b}.
(d) X has no nonempty, proper subset A such that $\overline{A} \cap \overline{X \setminus A} = \emptyset$.

Proposition 1.4.7. If X, Y are topological spaces, X is connected and f : X → Y is continuous, then f(X) is connected.

Proof. Since f : X → f(X) is continuous we may assume that f is a continuous surjection. Arguing by contradiction, suppose that Y = f(X) is disconnected and let (U, V) be a separation of Y. Then f⁻¹(U) and f⁻¹(V) are disjoint, nonempty, open sets in X such that X = f⁻¹(U) ∪ f⁻¹(V). Hence X is disconnected, a contradiction.

Remark 1.4.8. The last proposition gives at once that all open intervals in ℝ are connected. Indeed, recall that every open interval is homeomorphic to ℝ and that ℝ is connected; see Example 1.4.2(e).

If we adjoin to a connected set A some of its limit points, connectedness is preserved.

Proposition 1.4.9. If X is a topological space, A ⊆ X is connected and A ⊆ C ⊆ $\overline{A}$, then C is connected.

Proof. Arguing by contradiction, suppose that C is disconnected. Hence by Proposition 1.4.5, there exists a continuous surjection f : C → {0, 1}. Since A is connected, from Corollary 1.4.6 we have that f(A) = {0} or f(A) = {1}. To fix things assume that f(A) = {0}. Note that the closure of A in C is $\overline{A} \cap C = C$. So, from Proposition 1.1.35 we have f(C) ⊆ $\overline{f(A)}$ = {0}. Hence, f(C) = {0}, a contradiction.

Corollary 1.4.10. If X is a topological space and A ⊆ X is connected, then $\overline{A}$ is connected as well.

Another useful result for determining whether or not a given subset is connected is the following one.

Proposition 1.4.11. If (X, τ) is a topological space and A ⊆ X, then A is disconnected if and only if there exist open sets U, V ∈ τ such that

U ∩ A ≠ ∅, V ∩ A ≠ ∅, U ∩ V ∩ A = ∅, and A ⊆ U ∪ V.

Proof. ⇒: We have A = Û ∪ V̂ with Û, V̂ ∈ τ(A) nonempty and disjoint, where τ(A) denotes the subspace topology, and Û = U ∩ A as well as V̂ = V ∩ A with U, V ∈ τ. Then we can easily check that U and V have the desired properties.

⇐: Let Û = U ∩ A ≠ ∅ and V̂ = V ∩ A ≠ ∅. We have that Û, V̂ ∈ τ(A) and they are disjoint with A = Û ∪ V̂. Therefore, A is disconnected.
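As a quick illustration of Proposition 1.4.11, consider the following worked instance; the particular set A and the open sets U, V are chosen here for illustration and do not appear in the text.

```latex
\[
A = [0,1) \cup (1,2] \subseteq \mathbb{R}, \qquad U = (-\infty, 1), \qquad V = (1, +\infty),
\]
\[
U \cap A = [0,1) \neq \emptyset, \quad
V \cap A = (1,2] \neq \emptyset, \quad
U \cap V \cap A = \emptyset, \quad
A \subseteq U \cup V .
\]
```

Hence A is disconnected. Note that the proposition only requires U ∩ V ∩ A = ∅; in this instance U and V happen to be disjoint in ℝ as well.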

It is obvious that connectedness is not preserved by arbitrary unions. Additional restrictions are needed.

Proposition 1.4.12. If X is a topological space and $\{A_i\}_{i \in I}$ is any family of connected subsets of X such that $\bigcap_{i \in I} A_i \neq \emptyset$, then $\bigcup_{i \in I} A_i$ is connected.

Proof. Let $C = \bigcup_{i \in I} A_i$. Suppose that C is disconnected. Then by Proposition 1.4.5 we can find a continuous surjection f : C → {0, 1}. Since each $A_i$ is connected, the restriction $f|_{A_i}$ is not surjective, hence constant, for all i ∈ I. Let $x_0 \in \bigcap_{i \in I} A_i$. Then $f(x) = f(x_0)$ for all $x \in A_i$ and for all i ∈ I. So, f is not surjective, a contradiction.

Connectedness is preserved by arbitrary Cartesian products.

Proposition 1.4.13. If $\{X_i\}_{i \in I}$ is an arbitrary family of nonempty, connected topological spaces, then $X = \prod_{i \in I} X_i$, endowed with the product topology, is connected as well.

Proof. Arguing by contradiction, suppose that X is disconnected. So there is a continuous surjection f : X → {0, 1}. Fix $u = (u_i)_{i \in I} \in X$ and let $i_1 \in I$. We define $f_{i_1} : X_{i_1} \to X$ by setting $f_{i_1}(x_{i_1}) = y = (y_i)_{i \in I}$ with $y_i = u_i$ for $i \neq i_1$ and $y_{i_1} = x_{i_1}$. Evidently $f_{i_1}$ is continuous, which implies the continuity of $f \circ f_{i_1} : X_{i_1} \to \{0, 1\}$. By hypothesis, $X_{i_1}$ is connected. So, $f \circ f_{i_1}$ is constant and $(f \circ f_{i_1})(x_{i_1}) = f(u)$ for every $x_{i_1} \in X_{i_1}$. Hence f(x) = f(u) for all x ∈ X that agree with u except possibly in the $i_1$-component. We repeat this process with another index $i_2 \in I$. Continuing this way we see that f(x) = f(u) for all x ∈ X that agree with u except on a finite number of coordinates. This set is dense in X and so by Proposition 1.1.35(b), f is constant, a contradiction. This proves that X is connected.

Corollary 1.4.14. The space ℝⁿ with n ∈ ℕ is connected.

Example 1.4.15. Let A = {(0, y) ∈ ℝ² : −1 ≤ y ≤ 1} and C = {(x, y) ∈ ℝ² : 0 < x ≤ 1, y = sin(π/x)}. Evidently C is connected, being the image of the interval (0, 1] under the continuous map x ↦ (x, sin(π/x)); see Propositions 1.4.4 and 1.4.7. Furthermore, S = $\overline{C}$ = A ∪ C is connected; see Corollary 1.4.10. The set S is known as the topologist’s sine curve.

Remark 1.4.16. It is clear that the intersection of even two connected spaces need not be connected. Furthermore, suppose that $\{A_n\}_{n \in \mathbb{N}}$ is a decreasing sequence of connected spaces. Then $\bigcap_{n \geq 1} A_n$ need not be connected. To see this, let X = I² \ {(x, 0) : 1/2 ≤ x ≤ 2/3} with I = [0, 1] and $A_n$ = {(x, y) ∈ X : y ≤ 1/n} with n ∈ ℕ. Each $A_n$ is connected, but $\bigcap_{n \geq 1} A_n$ = {(x, 0) : x ∈ [0, 1/2) ∪ (2/3, 1]} is disconnected.

A disconnected space can be decomposed in a unique way into connected components and the number of components can be viewed as an indication of how disconnected the space is.

Definition 1.4.17. A component of a topological space X is a maximal connected subset C of X. That is, C is connected and it is not properly contained in a connected subset of X.
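For a small finite space, the components of Definition 1.4.17 can be found by brute force: by Proposition 1.4.12, the union of all connected subsets containing a point x is connected, so it is the component of x. The Python sketch below is a hedged illustration; the function names and the sample topology on {1, 2, 3, 4} are assumptions made here, and the enumeration over all subsets is only practical for very small spaces.

```python
from itertools import combinations

def subspace_opens(subset, opens):
    """Relative topology on `subset`: all traces U ∩ subset."""
    return {u & subset for u in opens}

def is_connected(subset, opens):
    """Connected means: no nonempty, proper, relatively clopen subset."""
    rel = subspace_opens(subset, opens)
    return not any(u and u != subset and (subset - u) in rel for u in rel)

def component(x, points, opens):
    """Union of all connected subsets containing x (cf. Proposition 1.4.12)."""
    others = [p for p in points if p != x]
    comp = frozenset({x})
    for r in range(len(others) + 1):
        for extra in combinations(others, r):
            candidate = frozenset({x, *extra})
            if is_connected(candidate, opens):
                comp |= candidate
    return comp

def components(points, opens):
    """The set of distinct components; they partition `points`."""
    return {component(x, points, opens) for x in points}

# Hypothetical sample topology on {1, 2, 3, 4} (not from the text).
F = frozenset
points = {1, 2, 3, 4}
opens = {F(), F({1}), F({1, 2}), F({3, 4}), F({1, 3, 4}), F(points)}
parts = components(points, opens)
```

Here `parts` consists of the two sets {1, 2} and {3, 4}, which are disjoint and cover the space, in line with the partition property of components.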


Remark 1.4.18. A component C is necessarily closed. Indeed, from Corollary 1.4.10 we know that $\overline{C}$ is connected. The maximality of C implies that C = $\overline{C}$. Hence, C is closed. The distinct components of X form a partition of X. To see this, note that if C, C′ are two distinct components of X and C ∩ C′ ≠ ∅, then from Proposition 1.4.12 we have that C ∪ C′ is connected, contradicting the maximality of the components. Moreover, for the same reason, each x ∈ X belongs to a unique component. Given x ∈ X, let C(x) denote the component of X containing x. Then, for points x, u ∈ X, C(x) and C(u) are either identical or disjoint. Every connected subset of X is contained in one component, and X is connected if and only if it has only one component. Finally, if {U, V} is a separation of X and C is a component of X, then C ⊆ U or C ⊆ V.

Taking into account the remarks above and Proposition 1.4.7, we infer the following result.

Proposition 1.4.19. If X, Y are topological spaces and f : X → Y is continuous, then the image of each component of X lies in a component of Y.

Remark 1.4.20. In particular, a homeomorphism f induces a 1-1 correspondence between the components of X and Y with C(x) being homeomorphic to C(f(x)) for all x ∈ X.

Definition 1.4.21.
(a) A topological space X is totally disconnected provided that each component of X is a singleton.
(b) A point x ∈ X is a cut point of a connected topological space X provided that X \ {x} is disconnected. We say that x ∈ X is an n-cut point provided that X \ {x} has n components.

From Proposition 1.4.19 we obtain the following result.

Proposition 1.4.22. Homeomorphic spaces have the same number of cut points of each type.

From an analytical point of view, the notion of path-connectedness is more natural. Path-connectedness is a topological property stronger than connectedness and it is useful in many applications. It is a very intuitive notion: in a path-connected space any two points can be joined by a continuous path in the space.

Definition 1.4.23.
(a) A path in a topological space X is a continuous map σ : [0, 1] → X. We say that σ(0) is the initial point of the path and σ(1) is the final point of the path. The set σ([0, 1]) ⊆ X is called a curve in X. If σ is a path in X, then $\overline{\sigma}(t) = \sigma(1 - t)$ for all t ∈ [0, 1] is the reverse path.
(b) A topological space X is said to be path-connected provided that for each pair of points x, u ∈ X there is a path in X with initial point x and final point u. A subset C of X is path-connected if C has this property for the subspace topology.

The next proposition compares connectedness and path-connectedness.


Proposition 1.4.24. Every path-connected topological space is connected.

Proof. Suppose that X is path-connected and let u ∈ X. For each x ∈ X, let $\sigma_x$ be a path in X with initial point u and final point x. Let $C_x = \sigma_x([0, 1])$ be the corresponding curve. From Proposition 1.4.7 we know that $C_x \subseteq X$ is connected. Note that $u \in \bigcap_{x \in X} C_x$. So, from Proposition 1.4.12, it follows that $\bigcup_{x \in X} C_x = X$ is connected.

Remark 1.4.25. The converse of the above is not true in general. As a counterexample, consider the topologist’s sine curve S = A ∪ C introduced in Example 1.4.15. Then S is connected but not path-connected. To prove that S is not path-connected, we show that it is not possible to join a point in A to a point in C by a path in S. To this end, let a ∈ A and let σ : [0, 1] → S be a path with initial point a. Note that A is closed in S (see Proposition 1.1.27), and so σ⁻¹(A) ⊆ [0, 1] is closed and nonempty, since 0 ∈ σ⁻¹(A). Let t ∈ σ⁻¹(A) and choose a small ε > 0 such that σ((t − ε, t + ε)) ⊆ $B_{1/2}(\sigma(t))$ = {u ∈ ℝ² : |u − σ(t)| ≤ 1/2}, which is possible since σ is continuous. Note that S ∩ $B_{1/2}(\sigma(t))$ consists of a closed interval on the y-axis of ℝ² together with parts of the curve y = sin(π/x), each of which is homeomorphic to a closed interval. Moreover, any two of these parts are separated from each other in S ∩ $B_{1/2}(\sigma(t))$. So A ∩ $B_{1/2}(\sigma(t))$ is a component of S ∩ $B_{1/2}(\sigma(t))$. Since σ(t) ∈ A ∩ $B_{1/2}(\sigma(t))$ and (t − ε, t + ε) is connected, we must have σ((t − ε, t + ε)) ⊆ A ∩ $B_{1/2}(\sigma(t))$. This shows that σ⁻¹(A) ⊆ [0, 1] is open. Hence σ⁻¹(A) = [0, 1], being both closed and open. So, σ([0, 1]) ⊆ A and this proves that S cannot be path-connected.

Proposition 1.4.26. If X is a topological space and u ∈ X, then X is path-connected if and only if each x ∈ X can be joined to u by a path.

Proof. ⇒: This is obvious.

⇐: Let x, x′ ∈ X and consider paths σ, σ′ : [0, 1] → X such that σ has initial point x and final point u, while σ′ has initial point u and final point x′. We define σ̂ : [0, 1] → X by

$$\hat{\sigma}(t) = \begin{cases} \sigma(2t) & \text{if } t \in [0, \tfrac{1}{2}], \\ \sigma'(2t - 1) & \text{if } t \in [\tfrac{1}{2}, 1]. \end{cases}$$

This is a continuous path since σ(1) = σ′(0) = u; see Proposition 1.1.37. Moreover, σ̂(0) = x and σ̂(1) = x′. Therefore, X is path-connected.

Definition 1.4.27. Let σ₁, σ₂ be two paths in X such that σ₁(1) = σ₂(0). The path composition of σ₁ and σ₂, denoted by σ₁ ∗ σ₂, is the path in X defined by

$$(\sigma_1 * \sigma_2)(t) = \begin{cases} \sigma_1(2t) & \text{if } t \in [0, \tfrac{1}{2}], \\ \sigma_2(2t - 1) & \text{if } t \in [\tfrac{1}{2}, 1]. \end{cases}$$

The next result is a straightforward consequence of Definition 1.4.23(b) and of Proposition 1.1.36(a).
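The path composition of Definition 1.4.27 and the reverse path of Definition 1.4.23(a) translate directly into code. The following Python sketch models paths in ℝ² as functions on [0, 1]; the names `compose`, `reverse`, `seg1`, `seg2`, the tolerance parameter `tol`, and the two sample segments are all assumptions made here for illustration.

```python
from typing import Callable, Tuple

Point = Tuple[float, float]
Path = Callable[[float], Point]

def compose(sigma1: Path, sigma2: Path, tol: float = 1e-12) -> Path:
    """Path composition sigma1 * sigma2; needs sigma1(1) = sigma2(0) up to tol."""
    end, start = sigma1(1.0), sigma2(0.0)
    if any(abs(a - b) > tol for a, b in zip(end, start)):
        raise ValueError("paths are not composable: sigma1(1) != sigma2(0)")

    def sigma(t: float) -> Point:
        if not 0.0 <= t <= 1.0:
            raise ValueError("t must lie in [0, 1]")
        # First half runs along sigma1 at double speed, second half along sigma2.
        return sigma1(2.0 * t) if t <= 0.5 else sigma2(2.0 * t - 1.0)

    return sigma

def reverse(sigma: Path) -> Path:
    """Reverse path t -> sigma(1 - t)."""
    return lambda t: sigma(1.0 - t)

# Hypothetical sample paths: segments (0,0) -> (1,0) and (1,0) -> (1,1).
seg1: Path = lambda t: (t, 0.0)
seg2: Path = lambda t: (1.0, t)
joined = compose(seg1, seg2)
```

The endpoint check mirrors the requirement σ₁(1) = σ₂(0), which is exactly what makes the two branches agree at t = 1/2 and the composed map continuous.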


Proposition 1.4.28. If X, Y are topological spaces, X is path-connected, and f : X → Y is continuous, then f(X) is path-connected.

Remark 1.4.29. It follows that path-connectedness is a topological invariant. In contrast to connectedness, see Corollary 1.4.10, the closure of a path-connected set need not be path-connected. Consider the topologist’s sine curve from Example 1.4.15. We have S = $\overline{C}$ and C is path-connected; see Proposition 1.4.28. However, we proved that S is not path-connected; see Remark 1.4.25.

Many results about connectedness have analogues for path-connectedness.

Proposition 1.4.30. If X is a topological space and $\{A_i\}_{i \in I}$ is any family of path-connected subsets of X such that $\bigcap_{i \in I} A_i \neq \emptyset$, then $\bigcup_{i \in I} A_i$ is path-connected.

Proof. Let $x \in \bigcup_{i \in I} A_i$ and pick $u \in \bigcap_{i \in I} A_i$. Since $x \in A_{i_0}$ for some $i_0 \in I$ and $A_{i_0}$ is path-connected, we can join x and u by a path in $A_{i_0} \subseteq \bigcup_{i \in I} A_i$. Proposition 1.4.26 implies that $\bigcup_{i \in I} A_i$ is path-connected.

Proposition 1.4.31. If $\{X_i\}_{i \in I}$ is an arbitrary family of nonempty, path-connected topological spaces, then $X = \prod_{i \in I} X_i$ endowed with the product topology is path-connected as well.

Proof. Let $x = (x_i)$, $u = (u_i) \in X$. For each i ∈ I, $X_i$ is path-connected so we can find a path $\sigma_i$ with initial point $x_i$ and final point $u_i$. Then $\sigma = (\sigma_i)$ is a path in X joining x and u; see Proposition 1.3.4. Hence, X is path-connected as well.

Definition 1.4.32. A path component of a topological space X is a maximal path-connected subset C of X. That is, C is path-connected and it is not properly contained in a path-connected subset of X.

Remark 1.4.33. Path components have almost the same properties as components. So every x ∈ X belongs to exactly one path component, denoted by P(x). If x ≠ x′, then P(x) ∩ P(x′) = ∅ or P(x) = P(x′). Every path-connected set C ⊆ X is contained in a path component, and X is path-connected if and only if X has only one path component. Note that we said almost the same properties. The reason is that, in contrast to components, path components need not be closed. Consider the topologist’s sine curve S = A ∪ C; see Example 1.4.15. Then A and C are the path components of S but C is not closed; recall that $\overline{C}$ = S. A path component of X is a subset of some component of X.

Connectedness and path-connectedness are global topological properties since they concern the whole topological space. Local topological properties concern the structure of the space near a particular point; recall, for instance, the notion of first countability, see Definition 1.2.20(a). In the next definition we provide local versions of the notions of connectedness and of path-connectedness.

Definition 1.4.34. A topological space X is said to be locally connected (resp. locally path-connected) if for every x ∈ X and every U ∈ N(x) we can find a connected (resp. path-connected) V ∈ N(x) such that V ⊆ U.

Remark 1.4.35. Equivalently, X is locally connected (resp. locally path-connected) if and only if every x ∈ X has a local basis consisting of connected (resp. path-connected) sets. A space can be connected (resp. path-connected) without being locally connected (resp. locally path-connected). Consider the topologist’s sine curve (see Example 1.4.15), which is connected but not locally connected. Of course, local connectedness (resp. local path-connectedness) does not imply connectedness (resp. path-connectedness). Consider the union of two disjoint, closed balls in ℝᴺ.

Proposition 1.4.36. A topological space X is locally connected if and only if for each open set U ⊆ X each component of U is open.

Proof. ⇒: Let C be a component of the open set U ⊆ X. Given x ∈ C we can find a connected open set $V_x \subseteq U$ with $x \in V_x$. We have $V_x \subseteq C$ and since x ∈ C was arbitrary, we conclude that C is open.

⇐: Let x ∈ X and let U ∈ N(x). Then by hypothesis the component C of U containing x is open and so X is locally connected.

Corollary 1.4.37. If a topological space X is locally connected, then every component of X is open (and closed).

Proposition 1.4.38. If X is a topological space, then the following statements are equivalent:
(a) Every path component of X is open, hence closed as well.
(b) Every point of X has a path-connected neighborhood.

Proof. (a) ⇒ (b): Let x ∈ X and let C(x) be the path component containing x. By hypothesis C(x) is open and so it is a path-connected neighborhood of x.

(b) ⇒ (a): Let C be a path component and x ∈ C. By hypothesis we can find a path-connected U ∈ N(x). Hence, U ⊆ C and since x ∈ C is arbitrary we conclude that C is open. Note that X \ C is the union of the remaining path components, which are open as we just proved. So X \ C is open and C is closed.
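To see that the two conditions of Proposition 1.4.38 can fail together, consider the following worked example; the choice of ℚ is made here for illustration and is not taken from the text.

```latex
\[
X = \mathbb{Q} \subseteq \mathbb{R} \ \text{(subspace topology)}, \qquad
P(q) = \{q\} \ \text{for every } q \in \mathbb{Q}.
\]
% By Proposition 1.4.4, a connected subset of \mathbb{R} is a singleton or an
% interval, and \mathbb{Q} contains no nondegenerate interval. Hence every
% path-connected subset of \mathbb{Q} is a singleton.
\[
(q - \varepsilon, q + \varepsilon) \cap \mathbb{Q} \not\subseteq \{q\}
\quad \text{for all } \varepsilon > 0 ,
\]
% so \{q\} is not open in \mathbb{Q}: statement (a) fails, and no point of
% \mathbb{Q} has a path-connected neighborhood, so statement (b) fails as well.
```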

We saw that path-connectedness is stronger than connectedness; see Proposition 1.4.24. The next proposition provides conditions for the two notions to be equivalent.

Proposition 1.4.39. A topological space X is path-connected if and only if X is connected and every x ∈ X has a path-connected neighborhood.

Proof. ⇒: This follows from Proposition 1.4.24 and the fact that X itself is a neighborhood of every x ∈ X, which by hypothesis is path-connected.

⇐: According to Proposition 1.4.38 every path component of X is open and closed in X. Since X is connected, it follows that it has only one path component, and hence X is path-connected.

Corollary 1.4.40. An open subset of ℝⁿ is connected if and only if it is path-connected.

Remark 1.4.41. The corollary above fails for nonopen sets in ℝⁿ. To see this, consider the topologist’s sine curve.

Now we pass to another fundamental topological notion, namely the notion of compactness. This concept is an abstraction to general topological spaces of a property of closed and bounded intervals, cf. the Heine–Borel Theorem. Compactness does not simply mean small in size. For example, the intervals [0, 1] and (0, 1) have the same size but [0, 1] is compact while (0, 1) is not. Compactness is important in analysis since it combines well with continuity.

Definition 1.4.42. Let X be a Hausdorff topological space. We say that X is compact if every open cover admits a finite subcover; see Definition 1.2.26. A subset A ⊆ X is compact provided A, endowed with the relative subspace topology, is compact.

Remark 1.4.43. Since compact subsets of a non-Hausdorff space need not be closed (a rather awkward situation), we have included in the definition of compactness that X is Hausdorff. Since relatively open sets in A are of the form U ∩ A with U ⊆ X open, the definition of compactness of A ⊆ X takes the following form: “A ⊆ X is compact if and only if every cover of A by open sets in X admits a finite subcover.”

Definition 1.4.44. Let X be a set and L ⊆ 2^X \ {∅}. We say that L has the finite intersection property if every finite subcollection of L has a nonempty intersection.

Proposition 1.4.45. Let X be a Hausdorff topological space. The following statements are equivalent:
(a) X is compact.
(b) Every family of nonempty, closed subsets of X with the finite intersection property has a nonempty intersection.
(c) Every net in X has a convergent subnet in X.

Proof. (a) ⇒ (b): Let L be a family of nonempty, closed subsets of X with the finite intersection property. If $\bigcap_{C \in L} C = \emptyset$, then $X = \bigcup_{C \in L} (X \setminus C)$ and so $\{X \setminus C\}_{C \in L}$ is an open cover of X. The compactness of X implies that we can find a finite subcover such that $X = \bigcup_{k=1}^{n} (X \setminus C_k)$ with n ∈ ℕ. Then $\bigcap_{k=1}^{n} C_k = \emptyset$, contradicting the fact that L has the finite intersection property.

(b) ⇒ (a): Let D be an open cover of X. Then $X = \bigcup_{U \in D} U$ and so $\bigcap_{U \in D} (X \setminus U) = \emptyset$. By (b), this means that the finite intersection property does not hold for the collection $\{X \setminus U\}_{U \in D}$ and so we can find $\{U_k\}_{k=1}^{n} \subseteq D$ such that $\bigcap_{k=1}^{n} (X \setminus U_k) = \emptyset$. Hence, $X = \bigcup_{k=1}^{n} U_k$ and so we conclude that X is compact.

(b) ⇒ (c): Let $\{x_i\}_{i \in I}$ be a net in X. Let $A_\alpha = \{x_i\}_{i \geq \alpha}$ with α ∈ I. Then $\{\overline{A_\alpha}\}_{\alpha \in I}$ is a family of nonempty, closed subsets of X with the finite intersection property. So, by hypothesis we can find $x \in \bigcap_{\alpha \in I} \overline{A_\alpha}$. Evidently, x is a cluster point of $\{x_i\}_{i \in I}$. So, using Proposition 1.2.36 we can find a subnet of $\{x_i\}_{i \in I}$ converging to x ∈ X.

(c) ⇒ (b): Let L be a family of nonempty, closed subsets of X with the finite intersection property. Let F be the family of all finite intersections of members of L. Then F has the finite intersection property and since L ⊆ F it suffices to show that $\bigcap_{D \in F} D \neq \emptyset$. Since the intersection of two elements in F is again an element of F, we see that F is directed. Let $x_D \in D$ with D ∈ F. Then $\{x_D\}_{D \in F} \subseteq X$ is a net and so by hypothesis it has a cluster point x. Then x ∈ D for all D ∈ F, since each D is closed, and so $\bigcap_{D \in F} D \neq \emptyset$.

Proposition 1.4.46. If X is a compact topological space and C ⊆ X is closed, then C is compact.

Proof. Let L be a cover of C by sets open in X. Then L₀ = L ∪ {X \ C} is an open cover of X. Since X is compact, L₀ has a finite subcover $\{U_k, X \setminus C\}_{k=1}^{n}$ with $U_k \in L$. Then $C \subseteq \bigcup_{k=1}^{n} U_k$ and so C is compact; see Remark 1.4.43.

Proposition 1.4.47. If X is a Hausdorff topological space and C ⊆ X is compact, then C is closed.

Proof. Let $\{x_i\}_{i \in I} \subseteq C$ be a net such that $x_i \to x$. Since C is compact, we can find a subnet $\{u_\alpha\}$ such that $u_\alpha \to u \in C$; see Propositions 1.4.45 and 1.2.40. Since the subnet also converges to x and limits in a Hausdorff space are unique, we have x = u ∈ C. Therefore, we conclude that C ⊆ X is closed.

Corollary 1.4.48. If X is a compact topological space and A ⊆ X, then A is compact if and only if A is closed.

Proposition 1.4.49. If X is a Hausdorff topological space and K₁, K₂ are compact, disjoint subsets of X, then we can find open U, V ⊆ X such that K₁ ⊆ U, K₂ ⊆ V and U ∩ V = ∅.

Proof. First assume that K₁ = {u} is a singleton. Then for each x ∈ K₂ we can find open sets $U_x, V_x \subseteq X$ such that $u \in U_x$, $x \in V_x$ and $U_x \cap V_x = \emptyset$ because X is Hausdorff. Then $\{V_x\}_{x \in K_2}$ is an open cover of K₂. The compactness of K₂ implies that we can find a finite subcover $\{V_{x_k}\}_{k=1}^{n}$. Let

$$U = \bigcap_{k=1}^{n} U_{x_k} \quad \text{and} \quad V = \bigcup_{k=1}^{n} V_{x_k}.$$

Both are open sets in X, u ∈ U, K₂ ⊆ V and U ∩ V = ∅. So, we have proven the proposition when K₁ is a singleton.

Now consider the case of a general compact set K₁ ⊆ X. From the previous part of the proof we know that for every u ∈ K₁ we can find open $U_u, V_u \subseteq X$ such that $u \in U_u$, $K_2 \subseteq V_u$ and $U_u \cap V_u = \emptyset$. Note that $\{U_u\}_{u \in K_1}$ is an open cover of K₁ and so by the compactness we can find a finite subcover $\{U_{u_k}\}_{k=1}^{n}$. Set

$$U = \bigcup_{k=1}^{n} U_{u_k} \quad \text{and} \quad V = \bigcap_{k=1}^{n} V_{u_k}.$$

Then both are open sets in X, K₁ ⊆ U, K₂ ⊆ V and U ∩ V = ∅.

Corollary 1.4.50. A compact topological space is normal.

The next result is one of the main theorems on compactness.

Theorem 1.4.51. If X, Y are Hausdorff topological spaces, K ⊆ X is compact, and f : X → Y is continuous, then f(K) ⊆ Y is compact.

Proof. Let $\{V_i\}_{i \in I}$ be an open cover of f(K). Then $\{f^{-1}(V_i)\}_{i \in I}$ is an open cover of K. The compactness of K implies the existence of a finite subcover $\{f^{-1}(V_{i_k})\}_{k=1}^{n}$, that is, $K \subseteq \bigcup_{k=1}^{n} f^{-1}(V_{i_k})$. Hence

$$f(K) \subseteq f\Big(\bigcup_{k=1}^{n} f^{-1}(V_{i_k})\Big) = \bigcup_{k=1}^{n} f\big(f^{-1}(V_{i_k})\big) \subseteq \bigcup_{k=1}^{n} V_{i_k}.$$

Therefore, f(K) is compact.

In ℝ the compact sets are the closed and bounded sets; see the Heine–Borel Theorem. So, Theorem 1.4.51 yields the following result known as the “Weierstraß Theorem.”

Theorem 1.4.52 (Weierstraß Theorem). If X is a compact topological space and f : X → ℝ is continuous, then there exist x₀, x̂ ∈ X such that

f(x₀) = inf[f(x) : x ∈ X] and f(x̂) = sup[f(x) : x ∈ X].
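The compactness assumption in the Weierstraß Theorem cannot be dropped. A minimal worked example (the space and function below are chosen here for illustration, not taken from the text):

```latex
\[
X = (0,1), \qquad f(x) = x, \qquad
\inf_{x \in X} f(x) = 0, \qquad \sup_{x \in X} f(x) = 1 ,
\]
\[
\text{but } 0 < f(x) < 1 \ \text{ for every } x \in X ,
\]
% so neither the infimum nor the supremum is attained. On the compact
% interval [0,1] the same function attains both, at x_0 = 0 and \hat{x} = 1.
```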

Remark 1.4.53. In addition, Theorem 1.4.51 implies that compactness is a topological property.

Theorem 1.4.54. If X, Y are Hausdorff topological spaces, X is compact and f : X → Y is a continuous bijection, then f is a homeomorphism.

Proof. Let C ⊆ X be closed. Then C is compact because of Corollary 1.4.48. Taking into account Theorem 1.4.51, we conclude that f(C) ⊆ Y is compact, hence closed as well; see Proposition 1.4.47. Therefore, f is a closed function and then by Proposition 1.1.42, f is a homeomorphism.

Compactness is preserved by Cartesian products. This is the celebrated “Tychonoff’s Product Theorem.” To prove this result, we need some preliminary material. First we present three statements of set theory that are equivalent.

Axiom of Choice: Let K be any set-valued map on a set X such that K(x) ≠ ∅ for all x ∈ X. Then there is a function k on X such that k(x) ∈ K(x) for all x ∈ X.

Zorn’s Lemma: Let (X, ≤) be a partially ordered set such that for every chain C ⊆ X there is an upper bound û ∈ X, that is, x ≤ û for all x ∈ C. Then X has a maximal element, that is, there exists x₀ ∈ X such that there is no v ∈ X with x₀ < v; see Definition 1.2.30(c).

Hausdorff Maximal Principle: For every partially ordered set (X, ≤) there is a maximal chain C ⊆ X.

Lemma 1.4.55. If (X, τ) is a Hausdorff topological space and L₀ is a collection of subsets of X with the finite intersection property, then there exists a maximal collection L of subsets of X with the finite intersection property and containing L₀. Moreover, finite intersections of elements in L are again in L and every subset of X intersecting every set in L is in L.

Proof. The family of all collections of subsets of X with the finite intersection property and containing L₀ is partially ordered by inclusion. Therefore, the Hausdorff Maximal Principle implies the existence of a maximal chain C. Let $L = \bigcup_{a \in C} a$. Let $\{A_k\}_{k=1}^{n} \subseteq L$. Each $A_k$ belongs to some collection $a_k \in C$ and $\{a_k\}_{k=1}^{n}$ is linearly ordered by inclusion. So, after relabeling, there is a collection $a_n$ that contains the others. Hence, $A_k \in a_n$ for all k = 1, …, n and $\bigcap_{k=1}^{n} A_k \neq \emptyset$ because $a_n$ has the finite intersection property. Thus, L has the finite intersection property. Note also that L is maximal, by the maximality of the chain C. Let L′ be the collection of all finite intersections of sets in L. Then L₀ ⊆ L ⊆ L′ and L′ has the finite intersection property. Hence, by maximality, L′ = L. Finally, let A ⊆ X be such that A ∩ D ≠ ∅ for all D ∈ L. Then the collection L″ = L ∪ {A} has the finite intersection property, since L is closed under finite intersections, and it contains L₀. Therefore, by the maximality, A ∈ L.

We will use this lemma to prove “Tychonoff’s Product Theorem.”

Theorem 1.4.56 (Tychonoff’s Product Theorem). If $\{(X_i, \tau_i)\}_{i \in I}$ are compact topological spaces, then $X = \prod_{i \in I} X_i$ endowed with the product topology is compact.

Proof.
Let L₀ be a collection of closed sets in X with the finite intersection property and let L be the maximal collection postulated by Lemma 1.4.55. Note that while the elements of L₀ are closed, those of L need not be closed. We will show that

$$\bigcap_{D \in L} \overline{D} \neq \emptyset. \tag{1.4.1}$$

For each i ∈ I, let $L_i$ be the i-projection of L, that is, $L_i = \{p_i(D) : D \in L\}$. The elements of this collection need be neither open nor closed. However, since L has the finite intersection property, it follows that so does $L_i$. Then, by the compactness of $X_i$, the family $\{\overline{p_i(D)} : D \in L\}$ has a nonempty intersection; see Proposition 1.4.45. Let $x_i \in \bigcap_{D \in L} \overline{p_i(D)} \subseteq X_i$ and $x = (x_i) \in X$. We claim that $x \in \overline{D}$ for all D ∈ L.


Let U ∈ N(x). Then from the definition of the product topology we know that we can find $i_1, \ldots, i_n \in I$ and $U_{i_k} \in \tau_{i_k}$ with k = 1, …, n such that

$$x \in \bigcap_{k=1}^{n} p_{i_k}^{-1}(U_{i_k}) \subseteq U.$$

Note that $x_{i_k} \in U_{i_k} \cap \overline{p_{i_k}(D)}$ for every D ∈ L, hence $U_{i_k} \cap p_{i_k}(D) \neq \emptyset$ for every D ∈ L. Therefore, $p_{i_k}^{-1}(U_{i_k}) \cap D \neq \emptyset$ for every D ∈ L. Thus, Lemma 1.4.55 implies that $p_{i_k}^{-1}(U_{i_k}) \in L$ for each k. Hence $\bigcap_{k=1}^{n} p_{i_k}^{-1}(U_{i_k}) \in L$, and so U ∩ D ≠ ∅ for every D ∈ L because L has the finite intersection property. Since U ∈ N(x) was arbitrary, $x \in \overline{D}$ for all D ∈ L. We conclude that (1.4.1) holds and, since L₀ ⊆ L consists of closed sets, this implies that X is compact; see Proposition 1.4.45.

Let us now introduce some generalizations of the notion of compactness.

Definition 1.4.57. Let (X, τ) be a Hausdorff topological space.
(a) We say that X is countably compact if every countable open cover has a finite subcover.
(b) We say that X is limit point compact (or that it has the Bolzano–Weierstraß property) if every sequence $\{x_n\}_{n \geq 1} \subseteq X$ has at least one cluster point.
(c) We say that X is sequentially compact if every sequence has a τ-convergent subsequence.

Remark 1.4.58. Clearly, “Compactness” implies “Countable Compactness” and “Sequential Compactness” implies “Limit Point Compactness.” In general, neither implication is reversible.

Combining Definition 1.4.57 and Proposition 1.4.45 gives the following result.

Proposition 1.4.59. A Hausdorff topological space (X, τ) is countably compact if and only if every countable family of closed sets with the finite intersection property has a nonempty intersection.

Proposition 1.4.60. A Hausdorff topological space (X, τ) is countably compact if and only if it is limit point compact.

Proof. ⇒: Let $\{x_n\}_{n \geq 1} \subseteq X$ and define $A_m = \{x_n\}_{n \geq m}$ with m ∈ ℕ. Then $\{\overline{A_m}\}_{m \geq 1}$ are closed sets with the finite intersection property. So, $\bigcap_{m \geq 1} \overline{A_m} \neq \emptyset$ by Proposition 1.4.59. Any $x \in \bigcap_{m \geq 1} \overline{A_m}$ is a cluster point of the sequence. Therefore, X is limit point compact.

⇐: Let $\{C_n\}_{n \geq 1}$ be closed sets in X with the finite intersection property. Let $x_n \in \bigcap_{k=1}^{n} C_k$ with n ∈ ℕ. The limit point compactness of X implies that $\{x_n\}_{n \geq 1}$ has at least one cluster point x. Then $x \in \overline{\{x_k\}_{k \geq n}} \subseteq \overline{C_n} = C_n$ for every n ∈ ℕ, and so $\bigcap_{n \geq 1} C_n \neq \emptyset$. Using Proposition 1.4.59, this implies that X is countably compact.

Corollary 1.4.61. “Sequential Compactness” implies “Countable Compactness.”

The reverse assertion is true under some additional assumptions.

Proposition 1.4.62. If (X, τ) is a Hausdorff topological space that is first countable and countably compact, then X is sequentially compact.

Proof. Let $\{x_n\}_{n \geq 1} \subseteq X$. By Proposition 1.4.60 this sequence has a cluster point x ∈ X. Since X is first countable, we can find a local basis $\{U_k\}_{k \in \mathbb{N}} \subseteq N(x)$ such that $U_{k+1} \subseteq U_k$ for all k ∈ ℕ. Since x is a cluster point, we can choose indices $n_1 < n_2 < \ldots$ with $x_{n_m} \in U_m$ for all m ∈ ℕ. Then $\{x_{n_m}\}_{m \geq 1}$ is a subsequence of $\{x_n\}_{n \geq 1}$ τ-converging to x. Therefore, X is sequentially compact.

This proposition together with Lindelöf’s Theorem (see Theorem 1.2.27) gives the following result.

Theorem 1.4.63. If (X, τ) is a Hausdorff topological space that is second countable, then the following statements are equivalent:
(a) X is compact.
(b) X is countably compact.
(c) X is limit point compact.
(d) X is sequentially compact.

Next we introduce a modification of compactness to a local property.

Definition 1.4.64. A Hausdorff topological space (X, τ) is said to be locally compact if for every x ∈ X there exists U ∈ N(x) such that $\overline{U}$ is compact.

Remark 1.4.65. A set A ⊆ X such that $\overline{A}$ is compact is said to be relatively compact (or precompact). The space ℝᴺ with the Euclidean topology is locally compact but not compact. Recall the Heine–Borel Theorem, which says that A ⊆ ℝᴺ is compact if and only if A is closed and bounded. Bounded means that there exists r > 0 such that A ⊆ $B_r$ = {u ∈ ℝᴺ : |u| ≤ r}.

Proposition 1.4.66. Let (X, τ) be a Hausdorff topological space. The following statements are equivalent:
(a) X is locally compact.
(b) For every x ∈ X and every U ∈ N(x) there is a relatively compact V ∈ N(x) such that x ∈ V ⊆ $\overline{V}$ ⊆ U.
(c) For every compact K and U ∈ τ such that U ⊇ K, there exists a relatively compact V ∈ τ such that K ⊆ V ⊆ $\overline{V}$ ⊆ U.
(d) X has a basis consisting of relatively compact open sets.

Proof. (a) ⇒ (b): Let x ∈ X and U ∈ N(x). Taking into account the local compactness of X we find W ∈ N(x) such that $\overline{W}$ is compact. Corollary 1.4.50 implies that $\overline{W}$ endowed with the relative topology is regular. Then $\overline{W} \cap U$ is a neighborhood of x in $\overline{W}$. Proposition 1.2.8 implies the existence of a relatively open set D ⊆ $\overline{W}$ such that

$$x \in D \subseteq \overline{D}^{\,\overline{W}} \subseteq \overline{W} \cap U,$$

where D denotes the closure of D in the relative topology of W. We have D = S ∩ W with S ∈ τ. Let V = S ∩ W ∈ N(x). This is the desired neighborhood of x. (b) ⇒ (c): Let K ⊆ X be compact and U ∈ τ such that U ⊇ K. For every x ∈ K we can find V x ∈ N(x) relatively compact such that x ∈ V x ⊆ V x ⊆ U. Evidently {V x }x∈K is

1.4 Connectedness and Compactness | 39

an open cover of K and so using compactness we can find a finite subcover {V x }nk=1 . Then V = ⋃nk=1 V x k ∈ τ, V is compact and K ⊆ V ⊆ V ⊆ U. (c) ⇒ (d): Let B = {U ∈ τ : U is compact}. Then since {x} is compact, assertion (c) implies that B is a basis; see Corollary 1.1.6. (d) ⇒ (a): This is obvious. Proposition 1.4.67. If (X, τ) is a Hausdorff, second countable, locally compact topological space, then X has a countable basis consisting of relatively compact open sets. Proof. Let {U n }n≥1 be a basis of X. Fix n ∈ ℕ and let {V x }x∈U n be an open cover of U n such that V x is compact and V x ⊆ U n for all x ∈ U n ; see Proposition 1.4.66. From Proposition 1.2.24(b) we know that U n is second countable. So, Lindelöf’s Theorem (see Theorem 1.2.27) implies that we can find a countable subcover {V kn }k≥1 of U n . Then the family B = {V kn : n, k ∈ ℕ} is a countable basis of X consisting of relatively compact open sets. The next proposition places more precisely locally compact spaces in the chart of topological spaces. Proposition 1.4.68. Every locally compact topological space is completely regular; see Definition 1.2.19. Proof. Let x ∈ X and C ⊆ X be a closed set such that x ∈ ̸ C. Applying Proposition 1.4.66(c) yields the existence of relatively compact sets V1 , V2 ∈ τ such that x ∈ V1 ⊆ V 1 ⊆ V2 ⊆ V 2 ⊆ U = X \ C . The set V 2 is compact, and hence normal; see Corollary 1.4.50. Then, Urysohn’s Lemma on normality (see Theorem 1.2.17) implies the existence of a continuous function f : V 2 → [0, 1] such that f V 2 \V1 = 0 and f(x) = 1. Let {f(x) if x ∈ V 2 , f ̂(x) = { 0 if x ∈ X \ V 2 . { According to Proposition 1.1.37, f ̂ is continuous and f ̂A = 0 and f ̂(x) = 1. Hence, X is completely regular. Proposition 1.4.69. Local compactness is preserved by continuous open surjections. Proof. Let X, Y be Hausdorff topological spaces with X locally compact and f : X → Y being a continuous, open surjection. Let y ∈ Y and choose x ∈ X such that f(x) = y. 
Then there exists a relatively compact $U \in N(x)$. Since $f$ is open, $f(U) \in N(y)$, and $f(\overline{U}) \subseteq Y$ is compact; see Theorem 1.4.51. Finally we have $y \in f(U) \subseteq \overline{f(U)} = f(\overline{U})$ with $f(\overline{U})$ being compact. Therefore, $Y$ is locally compact as well.

Of course every compact space is locally compact. In fact the following proposition is easy to prove.

Proposition 1.4.70. If $(X, \tau)$ is a locally compact topological space, $U \in \tau$ and $C \subseteq X$ is closed, then $U \cap C$ endowed with the relative topology is locally compact.

Proof. Let $x \in U \cap C$. Choose a relatively compact $V \in N(x)$ such that $x \in V \subseteq \overline{V} \subseteq U$. Then $V \cap (U \cap C)$ is a neighborhood of $x$ in the relative topology of $U \cap C$. It holds
$$\overline{V \cap (U \cap C)}^{\tau(U \cap C)} \subseteq \overline{V} \cap (U \cap C) = \overline{V} \cap C$$

and the latter is closed in $\overline{V}$, hence compact. Therefore, $U \cap C$ is locally compact.

Corollary 1.4.71. Every open subset and every closed subset of a locally compact space is locally compact for the relative topology.

We ask the natural question of when a Hausdorff topological space can be regarded as a subspace of a compact topological space. Local compactness is the right concept for answering this question.

Definition 1.4.72. Let $X$ be a Hausdorff topological space. A compactification of $X$ is a compact topological space $Y$ such that $X$ is homeomorphic to a dense subset of $Y$. So we may think of $X$ as an actual dense subset of $Y$.

Proposition 1.4.73. If $(X, \tau)$ is a Hausdorff topological space and $(\hat{X}, \hat{\tau})$ a compactification of $X$, then $X$ is locally compact if and only if $X \in \hat{\tau}$.

Proof. ⇒: Let $x \in X$ and choose a relatively compact $U \in N_X(x)$. We can find $V \in N_X(x)$ such that $x \in V \subseteq U$. We have $V = W \cap X$ with $W \in N_{\hat{X}}(x)$ and
$$W = W \cap \hat{X} = W \cap \overline{X}^{\hat{\tau}} \subseteq \overline{W \cap X}^{\hat{\tau}} = \overline{V}^{\hat{\tau}} \subseteq \overline{U}^{\hat{\tau}} = \overline{U}^{\tau} \subseteq X.$$
This implies that $x$ is $\hat{\tau}$-interior in $X$, hence $X \in \hat{\tau}$.

⇐: We know that $(\hat{X}, \hat{\tau})$ is compact, hence locally compact. Since $X \in \hat{\tau}$ we conclude from Corollary 1.4.71 that $X$ must be locally compact.

The simplest compactification of noncompact, locally compact topological spaces is the so-called “Alexandrov one-point compactification.”

Definition 1.4.74. Let $X$ be a Hausdorff topological space and $\infty$ an object not in $X$, called the point at infinity. Let $\hat{X} = X \cup \{\infty\}$ and define a topology $\hat{\tau}$ on $\hat{X}$ by specifying the following open sets:
(a) $\tau \subseteq \hat{\tau}$;
(b) $\hat{X} \setminus K$ with $K \subseteq X$ compact;
(c) $\hat{X}$.
Then we say that $(\hat{X}, \hat{\tau})$ is the one-point compactification of $X$.

Theorem 1.4.75. If $\hat{X} = X \cup \{\infty\}$ is as in Definition 1.4.74 and is endowed with the topology $\hat{\tau}$ and $(X, \tau)$ is not compact, then $(\hat{X}, \hat{\tau})$ is a compactification of $X$, and $\hat{X}$ is Hausdorff if and only if $X$ is locally compact.


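Proof. First we show that $(\hat{X}, \hat{\tau})$ is compact. So, let $L$ be an open cover of $\hat{X}$. Then $L$ must have a member $U$ such that $\infty \in U$. By Definition 1.4.74, $\hat{X} \setminus U$ is compact and so it has a finite subcover $\{U_k\}_{k=1}^{n} \subseteq L$. Evidently $\{U_1, \ldots, U_n, U\} \subseteq L$ is a finite open cover of $\hat{X}$ and so we conclude that $(\hat{X}, \hat{\tau})$ is compact.

It is easy to see from Definition 1.4.74 that $\hat{\tau}|_X = \tau$, that is, the subspace topology of $X \subseteq \hat{X}$ is $\tau$. Since $X$ is not compact, each $\hat{\tau}$-neighborhood of $\infty$, that is, each set $\hat{X} \setminus K$ with $K$ compact, must intersect $X$. Hence $\infty$ is a limit point of $X$ and so $\hat{X} = \overline{X}$. This proves that $(\hat{X}, \hat{\tau})$ is a compactification of $X$.

Suppose now that $\hat{X}$ is Hausdorff and let $x \in X$. We can find $U, V \in \hat{\tau}$ such that $\infty \in U$, $x \in V$ and $U \cap V = \emptyset$. This implies $V \subseteq \hat{X} \setminus U = K$ with $K$ compact; see Definition 1.4.74. Therefore, $X$ is locally compact.

Conversely, suppose that $X$ is locally compact. Let $x \in X$ and choose $V \in \tau$ such that $x \in V \subseteq \overline{V}$ with $\overline{V}$ compact. Let $U = \hat{X} \setminus \overline{V}$. Then $\infty \in U$, $x \in V$ and $U \cap V = \emptyset$. Since two distinct points of $X$ are already separated by members of $\tau \subseteq \hat{\tau}$, we conclude that $\hat{X}$ is Hausdorff.

Example 1.4.76. The Alexandrov compactification of $\mathbb{R}^n$ is the $n$-sphere $S^n = \{u \in \mathbb{R}^{n+1} : |u| = 1\}$. To see this, let $N = (0, 0, \ldots, 0, 1) \in \mathbb{R}^{n+1}$ be the north pole. We define the stereographic projection $h : S^n \setminus \{N\} \to \mathbb{R}^n$ by
$$h\big((u_k)_{k=1}^{n+1}\big) = \frac{(u_k)_{k=1}^{n}}{1 - u_{n+1}}.$$
This map sends a point $u \in S^n \setminus \{N\}$ to the point $x \in \mathbb{R}^n$ where the line from $N$ through $u$ intersects $\mathbb{R}^n$, identified with the hyperplane $\{u_{n+1} = 0\}$. It is a homeomorphism with inverse map
$$h^{-1}\big((x_k)_{k=1}^{n}\big) = \frac{\big((2x_k)_{k=1}^{n},\ |x|^2 - 1\big)}{|x|^2 + 1}.$$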

Therefore, $S^n \setminus \{N\}$ is homeomorphic to $\mathbb{R}^n$. Then $h$ extends to a homeomorphism of $S^n$ with the Alexandrov compactification $\hat{\mathbb{R}}^n$ of $\mathbb{R}^n$. We can easily visualize the stereographic projection when $n = 1$ (Fig. 1.1).

Fig. 1.1: Alexandrov one-point compactification of $\mathbb{R}^n$. (Labels in the figure: $N = \infty$ and $0$.)
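The two formulas of Example 1.4.76 can be checked numerically. The following is a minimal sketch, not part of the original text: the function names `stereographic`, `inverse_stereographic` and `norm` are chosen here for illustration, and points are represented as plain Python lists of floats.

```python
import math

def stereographic(u):
    """h(u) = (u_1, ..., u_n) / (1 - u_{n+1}) for u on S^n minus the north pole."""
    *head, last = u
    if math.isclose(last, 1.0):
        # The north pole N has no image in R^n; it plays the role of the point at infinity.
        raise ValueError("the north pole corresponds to the point at infinity")
    return [c / (1.0 - last) for c in head]

def inverse_stereographic(x):
    """h^{-1}(x) = (2x, |x|^2 - 1) / (|x|^2 + 1) for x in R^n."""
    s = sum(c * c for c in x)
    return [2.0 * c / (s + 1.0) for c in x] + [(s - 1.0) / (s + 1.0)]

def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(c * c for c in v))

x = [3.0, -4.0]                   # a point of R^2
u = inverse_stereographic(x)      # its preimage on S^2
print(norm(u))                    # equals 1 up to rounding: u lies on the sphere
print(stereographic(u))           # recovers x up to rounding
print(inverse_stereographic([1e6, 0.0]))   # far-away points land near N = (0, 0, 1)
```

The last line illustrates why adjoining the single point $\infty$ closes up $\mathbb{R}^n$: as $|x|$ grows, $h^{-1}(x)$ approaches the north pole.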

This map was known to map makers long ago. From the discussion above we see that by removing a single point from $S^n$ we obtain a space homeomorphic to $\mathbb{R}^n$. Which point we remove is irrelevant because we can rotate any point of $S^n$ into any other. For convenience we remove the north pole $N$.

Definition 1.4.77. A Hausdorff topological space $X$ is said to be σ-compact if it can be expressed as the union of at most countably many compact spaces.

Proposition 1.4.78. Let $(X, \tau)$ be a Hausdorff topological space. The following statements are equivalent:
(a) $X$ is locally compact and σ-compact.
(b) $X = \bigcup_{k \ge 1} U_k$ with $U_k$ open, relatively compact such that $\overline{U_k} \subseteq U_{k+1}$ for all $k \in \mathbb{N}$.
(c) $X$ is locally compact and Lindelöf.

Proof. (a) ⇒ (b): By hypothesis we have $X = \bigcup_{k \ge 1} K_k$ with $K_k \subseteq X$ compact. Proposition 1.4.66(c) says that we can find $U_1 \supseteq K_1$ open and relatively compact. By induction we can find $U_k$ open, relatively compact such that $U_k \supseteq \overline{U_{k-1}} \cup K_k$. Then $\{U_k\}_{k \ge 1}$ is the desired sequence of open sets.

(b) ⇒ (c): Let $L = \{U_i\}_{i \in I}$ be an open cover of $X$. For each $m \in \mathbb{N}$ we can find a finite subfamily $\{U_{i_m}^{k}\}_{k=1}^{n(m)} \subseteq L$ that covers the compact set $\overline{U_m}$, where $U_m$ is the set from (b). The family $\{U_{i_m}^{k} : 1 \le k \le n(m),\ m \in \mathbb{N}\} \subseteq L$ is a countable subcover; thus $X$ is Lindelöf.

(c) ⇒ (a): Let $L = \{U_x\}_{x \in X}$ be a cover by relatively compact open sets; see Proposition 1.4.66(c). The Lindelöf property implies that we can extract a countable subcover. Therefore, $X$ is σ-compact.

We introduce a generalization of σ-compactness that is determined by a requirement on the behavior of the coverings of the space.

Definition 1.4.79. Let $X$ be a Hausdorff topological space.
(a) Given two covers $L = \{U_i\}_{i \in I}$ and $L' = \{V_j\}_{j \in J}$ of $X$, we say that $L$ is a refinement of $L'$ if for each $i \in I$ there is a $j \in J$ such that $U_i \subseteq V_j$. We write $L \prec L'$.
(b) We say that a cover $L = \{U_i\}_{i \in I}$ of $X$ is locally finite if for every $x \in X$ there exists $V \in N(x)$ that intersects only finitely many of the $U_i$’s.
(c) We say that the cover $L = \{U_i\}_{i \in I}$ of $X$ is point finite if for every $x \in X$ there are at most finitely many indices $i \in I$ such that $x \in U_i$.

Remark 1.4.80.
Given two covers $L = \{U_i\}_{i \in I}$ and $L' = \{V_j\}_{j \in J}$ of $X$ we can define $L_0 = \{U_i \cap V_j : (i, j) \in I \times J\}$, which is also a cover of $X$ refining both $L$ and $L'$. Moreover, if both $L$ and $L'$ are locally finite (resp. point finite), then so is $L_0$. A common refinement of both $L$ and $L'$ is also a refinement of $L_0$. A refinement of a cover may contain more elements than the given cover.

Definition 1.4.81. A refinement $L = \{U_i\}_{i \in I}$ of the cover $L' = \{V_j\}_{j \in J}$ is said to be precise if $I = J$ and $U_i \subseteq V_i$ for all $i \in I$.

Proposition 1.4.82. If $X$ is a Hausdorff topological space and the cover $L' = \{V_j\}_{j \in J}$ of $X$ has a locally finite (resp. point finite) refinement $L = \{U_i\}_{i \in I}$, then it has a precise locally finite (resp. point finite) refinement $\hat{L} = \{\hat{U}_j\}_{j \in J}$. Moreover, if $L$ is open, then so is $\hat{L}$.


Proof. Let $\xi : I \to J$ be a map that assigns to each $i \in I$ a $j \in J$ such that $U_i \subseteq V_j$; see Definition 1.4.79(a). For every $j \in J$ let $\hat{U}_j = \bigcup \{U_i : \xi(i) = j\}$ (some $\hat{U}_j$ may be empty). Then $\hat{U}_j \subseteq V_j$ for every $j \in J$ and $\hat{L} = \{\hat{U}_j\}_{j \in J}$ is a cover of $X$. Clearly, $\hat{L}$ is locally finite (resp. point finite) if $L$ is, and it is open if $L$ is open.

Definition 1.4.83. A Hausdorff topological space $X$ is said to be paracompact if each open cover of $X$ admits a locally finite open refinement.

An immediate consequence of this definition is the following result.

Proposition 1.4.84. Every compact topological space is paracompact.

Closely related to paracompactness is the notion of partition of unity, which is essentially a variable convex combination.

Definition 1.4.85. Let $X$ be a Hausdorff topological space and $f : X \to \mathbb{R}$ a function.
(a) The support of $f$ is the closed set $\operatorname{supp} f := \overline{\{x \in X : f(x) \ne 0\}}$.
(b) A partition of unity on $X$ is a family $\{f_i\}_{i \in I}$ of continuous functions $f_i : X \to [0, 1]$ such that
(i) $\{\operatorname{supp} f_i\}_{i \in I}$ forms a locally finite closed cover of $X$;
(ii) $\sum_{i \in I} f_i(x) = 1$ for all $x \in X$ (the sum is well-defined because of (i)).
If $L = \{V_j\}_{j \in J}$ is an open cover of $X$, then we say that a partition of unity $\{f_j\}_{j \in J}$ is subordinated to $L$ if $\operatorname{supp} f_j \subseteq V_j$ for each $j \in J$.

There is a close relation between paracompactness and partitions of unity. The proof of the following theorem is very technical and so it is omitted. We refer to Dugundji [91, Theorem 4.2, p. 170].

Theorem 1.4.86. A Hausdorff topological space $X$ is paracompact if and only if every open cover of $X$ admits a locally finite partition of unity subordinated to the open cover.

This theorem allows us to fix the place of paracompactness in the chart of topological spaces.

Proposition 1.4.87. Every paracompact space is normal.

Proof. Let $C_1$ and $C_2$ be two disjoint, closed subsets of $X$. We consider the open cover $L = \{X \setminus C_1, X \setminus C_2\}$. Then Theorem 1.4.86 implies that there is a partition of unity $\{f_1, f_2\}$ subordinated to $L$.
Then $f_1|_{C_2} = 1$ and $f_1|_{C_1} = 0$ and so by Urysohn’s Normality Lemma (see Theorem 1.2.17) we conclude that $X$ is normal.

Closing this section, we mention that there is a “locally compact” version of the Tietze Extension Theorem; see Theorem 1.2.44. This version of the Tietze result reads as follows; see Hewitt–Stromberg [145, Theorem 7.40, p. 99].

Theorem 1.4.88. If $X$ is locally compact, $K \subseteq X$ is a nonempty, compact set and $U \subseteq X$ is open with $K \subseteq U$, then for every $f \in C(K, \mathbb{R})$ there exists $\hat{f} \in C(X, \mathbb{R})$ with compact support such that $\hat{f}|_K = f$ and $\hat{f}$ vanishes on $X \setminus U$.
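Definition 1.4.85 can be made concrete on the real line. The sketch below is an illustration under assumptions made here, not taken from the text: the open cover is $V_j = (j - 1.5,\, j + 1.5)$, $j \in \mathbb{Z}$, and the functions are the piecewise linear "hat" functions $f_j(x) = \max\{0,\, 1 - |x - j|\}$, whose supports $[j - 1, j + 1]$ lie inside $V_j$. The names `hat`, `active_indices` and `total` are hypothetical helpers.

```python
import math

def hat(j, x):
    """Continuous hat function f_j with values in [0, 1] and supp f_j = [j - 1, j + 1]."""
    return max(0.0, 1.0 - abs(x - j))

def active_indices(x):
    """Indices j with f_j(x) != 0; there are at most two, so the family is point finite."""
    lo = math.floor(x)
    return [j for j in (lo, lo + 1) if hat(j, x) > 0.0]

def total(x):
    """Sum of all f_j(x); only the active indices contribute."""
    return sum(hat(j, x) for j in active_indices(x))

for x in (-2.75, 0.0, 0.3, 41.5):
    js = active_indices(x)
    # x lies in V_j = (j - 1.5, j + 1.5) for every active index j
    assert all(j - 1.5 < x < j + 1.5 for j in js)
    print(x, js, total(x))   # the sum equals 1 up to rounding
```

Every bounded interval meets only finitely many of the supports, which is the local finiteness required in (i), and the printed sums illustrate condition (ii).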


1.5 Metric Spaces – Baire Category

Metric spaces are a very important class of topological spaces. In fact the development of metric spaces led to the more general notion of topological space. In metric spaces the metric leads to an analysis that is primarily based on the properties of the real line.

Definition 1.5.1. Let $X$ be a set. A metric on $X$ is a map $d : X \times X \to \mathbb{R}$ such that the following hold:
(a) $d(x, u) = 0$ if and only if $x = u$;
(b) $d(x, u) = d(u, x)$ for all $x, u \in X$ (symmetry);
(c) $d(x, u) \le d(x, v) + d(v, u)$ for all $x, u, v \in X$ (triangle inequality).
The pair $(X, d)$ of a set $X$ and of a metric $d$ on $X$ is said to be a metric space. If in (a) we only require $d(x, x) = 0$ for all $x \in X$, so that $d(x, u) = 0$ need not imply $x = u$, then $d$ is called a semimetric (in French “écart”) and $(X, d)$ is a semimetric space.

Remark 1.5.2. If $d$ is a metric, then, based on (a)–(c), it is clear that $d(x, y) \ge 0$ for all $x, y \in X$. If $d$ is a semimetric and $\sim$ is the equivalence relation defined by $x \sim u$ if and only if $d(x, u) = 0$, then $X/\!\sim$ is a metric space with metric $\hat{d}([x], [u]) = d(x, u)$. Here, for $x \in X$, $[x]$ is the corresponding equivalence class.

Definition 1.5.3.
(a) Let $(X, d)$ be a metric space and $A \subseteq X$. The diameter of $A$ is defined by
$$\operatorname{diam} A = \sup[d(x, u) : x, u \in A].$$
If $\operatorname{diam} A < \infty$, then we say that $A$ is bounded. Otherwise $A$ is unbounded. When $\operatorname{diam} X < \infty$, we say that $d$ is a bounded metric. In addition, for $x \in X$ and $r > 0$, the open ball with center $x$ and radius $r$ is defined by
$$B_r(x) = \{u \in X : d(u, x) < r\}.$$
The corresponding closed ball with center $x$ and radius $r$ is defined by
$$\overline{B}_r(x) = \{u \in X : d(u, x) \le r\}.$$
(b) Let $(X, d)$ be a metric space. A set $A \subseteq X$ is said to be $d$-open (or simply open) if for every $x \in A$ we can find $r = r(x) > 0$ such that $B_r(x) \subseteq A$. The collection $\tau_d = \{A \subseteq X : A \text{ is } d\text{-open}\}$ is a topology on $X$ called the metric topology on $(X, d)$.
(c) A topological space $(X, \tau)$ is said to be metrizable if $\tau = \tau_d$ for some metric $d$ on $X$. This metric is then said to be compatible with the topology.
If for two metrics $d_1$ and $d_2$ on $X$ we have $\tau_{d_1} = \tau_{d_2}$, then we say that $d_1$ and $d_2$ are equivalent.

Remark 1.5.4. The distinction between metric and metrizable spaces is a subtle one. In the case of a metric space we already have a fixed metric. For a metrizable space we have not yet chosen one from the multitude of equivalent metrics. Note that if $d$ is compatible, then so are $kd$ with $k \in \mathbb{N}$,
$$\hat{d}(x, u) = \frac{d(x, u)}{1 + d(x, u)} \quad \text{and} \quad \hat{d}_0(x, u) = \min\{1, d(x, u)\}.$$
The last two metrics are bounded even if $d$ is not. From the triangle inequality we have
$$|d(x, u) - d(y, v)| \le d(x, y) + d(u, v) \quad \text{for all } x, u, y, v \in X. \tag{1.5.1}$$

It follows that $d$ is jointly continuous. Of course $\tau_d$ is Hausdorff and first countable, and $u_n \xrightarrow{\tau_d} u$ if and only if $d(u_n, u) \to 0$. In Proposition 1.2.22 we saw that second countability implies separability. For metrizable spaces the two notions are equivalent.

Proposition 1.5.5. A metrizable space is second countable if and only if it is separable.

Proof. ⇒: This follows from Proposition 1.2.22.

⇐: Let $(X, \tau)$ be a separable metrizable space and $d$ a compatible metric, that is, $\tau_d = \tau$. Let $D \subseteq X$ be a countable dense set and consider the collection $L = \{B_{1/n}(x) : x \in D, n \in \mathbb{N}\}$. Clearly, $L$ is a countable basis for the topology $\tau$; see Corollary 1.1.6.

Combining this proposition with Proposition 1.2.24(b) we have the following result.

Corollary 1.5.6. If $X$ is a separable metrizable space and $A \subseteq X$, then $A$ is separable.

Definition 1.5.7. Let $(X, \tau)$ be a topological space. A set $A$ is said to be an $F_\sigma$-set if it is the union of at most countably many closed sets. A set $C$ is said to be a $G_\delta$-set if it is the intersection of at most countably many open sets.

Proposition 1.5.8. If $X$ is a metrizable space, then every closed set is $G_\delta$ and every open set is $F_\sigma$.

Proof. Let $d$ be a compatible metric and let $C \subseteq X$ be closed. Then $U_n = \{x \in X : d(x, C) < 1/n\}$ is open because of the continuity of $d$. Furthermore $C = \bigcap_{n \ge 1} U_n$. So $C$ is $G_\delta$. Next let $U \subseteq X$ be open. Since $X \setminus U$ is closed, the first part yields that $X \setminus U = \bigcap_{n \ge 1} U_n$ with $U_n$ open. Hence, $U = \bigcup_{n \ge 1} (X \setminus U_n)$ and so $U$ is $F_\sigma$.

Definition 1.5.9.
(a) Let $(X, d)$ be a metric space. A sequence $\{x_n\}_{n \ge 1} \subseteq X$ is said to be a Cauchy sequence if for any given $\varepsilon > 0$ there exists $n_0 = n_0(\varepsilon) \ge 1$ such that $d(x_n, x_m) \le \varepsilon$ for all $n, m \ge n_0$, that is, $d(x_n, x_m) \to 0$ as $n, m \to +\infty$. We say that $(X, d)$ is complete if every Cauchy sequence in $X$ converges in $X$.
(b) Let $(X, \tau)$ be a topological space. We say that $X$ is topologically complete if there is a compatible complete metric $d$, that is, $\tau_d = \tau$.

Remark 1.5.10.
The property of completeness is metric dependent. So it can happen that two metrics are equivalent, that is, they generate the same topology, but one is complete and the other not. On the other hand, topological completeness is a topological property.
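The bounded metrics $\hat{d} = d/(1 + d)$ and $\hat{d}_0 = \min\{1, d\}$ of Remark 1.5.4 can be probed numerically. The sketch below is only a sanity check on a random sample, not a proof; it assumes the usual metric on the real line as the base metric $d$, and the helper name `violates_triangle` is introduced here for illustration.

```python
import itertools
import random

def d(x, u):
    """Usual metric on the real line (an illustrative choice of base metric)."""
    return abs(x - u)

def d_hat(x, u):
    """Bounded metric d / (1 + d) from Remark 1.5.4."""
    return d(x, u) / (1.0 + d(x, u))

def d_hat0(x, u):
    """Bounded metric min{1, d} from Remark 1.5.4."""
    return min(1.0, d(x, u))

def violates_triangle(metric, points, tol=1e-12):
    """Return True if some triple breaks metric(x, u) <= metric(x, v) + metric(v, u)."""
    return any(metric(x, u) > metric(x, v) + metric(v, u) + tol
               for x, u, v in itertools.product(points, repeat=3))

random.seed(0)
sample = [random.uniform(-100.0, 100.0) for _ in range(40)]
for metric in (d, d_hat, d_hat0):
    largest = max(metric(x, u) for x in sample for u in sample)
    print(metric.__name__, violates_triangle(metric, sample), largest)
```

On the sample, the largest value of $d$ is large while those of $\hat{d}$ and $\hat{d}_0$ stay at or below 1, in line with the remark that these two metrics are bounded even if $d$ is not.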

Example 1.5.11. The interval $(-1, 1)$ with the usual metric is not a complete metric space but it is topologically complete since it is homeomorphic to $\mathbb{R}$, which is complete. The function $h : (-1, 1) \to \mathbb{R}$ defined by $h(x) = x/(1 - x^2)$ for all $x \in (-1, 1)$ is a homeomorphism between the two spaces.

Definition 1.5.12. Let $(X, d)$ and $(Y, \rho)$ be two metric spaces. A map $f : X \to Y$ is said to be an isometry if $d(x, u) = \rho(f(x), f(u))$ for all $x, u \in X$. If $f$ is a surjective isometry, then we say that $X$ and $Y$ are isometric spaces. Otherwise we say that $f$ is an isometric embedding.

Remark 1.5.13. Thus an isometric surjection is a distance preserving homeomorphism. In the case of an isometric embedding $f : X \to Y$ we may think of $X$ as a subspace of $Y$.

Every metric space can be isometrically and densely embedded in a complete metric space.

Theorem 1.5.14. If $(X, d)$ is any metric space, then there is a complete metric space $(Y, \rho)$ and an isometry $f : X \to Y$ such that $f(X)$ is dense in $Y$. We say that $Y$ is the completion of $X$.

Proof. Let $f_x(u) = d(x, u)$ for all $x, u \in X$. Choose a point $v \in X$ and let
$$S(X, d) = \{f_v + h : h \in C_b(X, \mathbb{R})\}.$$
On $S(X, d)$ we consider the supremum metric $d_\infty$ defined by
$$d_\infty(f_v + h, f_v + \hat{h}) = \sup\big[\,|h(x) - \hat{h}(x)| : x \in X\,\big].$$
For any $x, u, y \in X$ we have $|d(x, y) - d(u, y)| \le d(x, u)$ (see (1.5.1)) and equality holds if $y = x$ or $y = u$. Therefore, for any $x, u \in X$ we have
$$f_u - f_x \in C_b(X, \mathbb{R}), \quad d_\infty(f_x, f_u) = d(x, u).$$
Taking $x = v$, we see in addition that $f_u \in S(X, d)$ and that $S(X, d)$ does not depend on the choice of $v \in X$. Hence, the map $x \mapsto f_x$ from $X$ into $S(X, d)$ is an isometry for $d$ and $d_\infty$. Let $Y$ be the $d_\infty$-closure of the range of this map in $S(X, d)$. But $(C_b(X, \mathbb{R}), d_\infty)$ is complete; recall that the uniform limit of continuous functions is continuous. Since $h \mapsto f_v + h$ is an isometry of $C_b(X, \mathbb{R})$ onto $S(X, d)$, the closed subset $(Y, d_\infty)$ is complete and this is the completion of $(X, d)$.

Now we can provide a necessary and sufficient condition for the completeness of a metric space.
The necessary part of the result is known as “Cantor’s Intersection Theorem.” Theorem 1.5.15. A metric space (X, d) is complete if and only if every decreasing sequence {C n }n≥1 of nonempty, closed subsets of X such that diam C n → 0 as n → ∞, has a singleton intersection.
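The statement of Theorem 1.5.15 can be illustrated by the bisection method on the complete space $\mathbb{R}$. The sketch below is an illustration chosen here, not part of the text: the function $t \mapsto t^2 - 2$, the starting interval $[1, 2]$ and the helper name `nested_intervals` are assumptions. The method produces a decreasing sequence of nonempty closed intervals whose diameters halve at each step.

```python
def nested_intervals(f, a, b, steps):
    """Bisection: yield closed intervals [a_n, b_n] on which f changes sign."""
    for _ in range(steps):
        yield (a, b)
        mid = (a + b) / 2.0
        if f(a) * f(mid) <= 0.0:
            b = mid          # a sign change occurs on [a, mid]
        else:
            a = mid          # otherwise it occurs on [mid, b]

intervals = list(nested_intervals(lambda t: t * t - 2.0, 1.0, 2.0, 40))
for (a, b), (a_next, b_next) in zip(intervals, intervals[1:]):
    assert a <= a_next <= b_next <= b              # the sequence is decreasing
    assert b_next - a_next <= (b - a) / 2.0        # the diameters tend to 0
print(intervals[-1])   # a very short interval around the square root of 2
```

In $\mathbb{R}$ the intersection of these intervals is the singleton $\{\sqrt{2}\}$. Intersecting the same intervals with $\mathbb{Q}$ gives a decreasing sequence of nonempty closed subsets of $\mathbb{Q}$ with diameters tending to 0 and empty intersection, which reflects the fact that $\mathbb{Q}$ with the usual metric is not complete.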


Proof. ⇒: Let $C = \bigcap_{n \ge 1} C_n$. Then $\operatorname{diam} C \le \operatorname{diam} C_n$ for all $n \in \mathbb{N}$. Hence, $\operatorname{diam} C = 0$. This means that $C$ is empty or a singleton. We show that $C \ne \emptyset$. For each $n \in \mathbb{N}$ we pick $u_n \in C_n$. Then for $n \ge m$ we have $d(u_n, u_m) \le \operatorname{diam} C_m \to 0$ as $m \to \infty$. So $\{u_n\}_{n \ge 1} \subseteq X$ is a Cauchy sequence and the completeness of $X$ implies that there exists $u \in X$ such that $u_n \to u$. Since $u_n \in C_m$ for all $n \ge m$ and $C_m$ is closed, we have $u \in C_m$ for every $m \in \mathbb{N}$. Hence $u \in C$ and so $C = \bigcap_{n \ge 1} C_n = \{u\}$.

⇐: Let $\{u_n\}_{n \ge 1} \subseteq X$ be a Cauchy sequence. Set $C_n = \overline{\{u_k : k \ge n\}}$. Since $\{u_n\}_{n \ge 1}$ is a Cauchy sequence, we have $\operatorname{diam} C_n \to 0$. By hypothesis $\bigcap_{n \ge 1} C_n = \{u\}$ and, since $d(u_n, u) \le \operatorname{diam} C_n$, we have $u_n \to u$ in $X$, which means that $X$ is complete.

Now we consider the Cartesian product of metric spaces. To this end, let $\{X_n\}_{n \ge 1}$ be a sequence of nonempty Hausdorff topological spaces and let $X = \prod_{n \ge 1} X_n$ be furnished with the product topology.

Proposition 1.5.16. The product topology on $X = \prod_{n \ge 1} X_n$ is metrizable if and only if the space $X_n$ is metrizable for each $n \in \mathbb{N}$.

Proof. ⇒: Let $d$ be a compatible metric for $X$. For each $n \in \mathbb{N}$ we fix a $y_n \in X_n$. Then for $u \in X_m$ we define $\hat{u} = (u_k)_{k \ge 1} \in X$ by setting $u_k = y_k$ for $k \ne m$ and $u_m = u$. Now we define a metric $d_m$ on $X_m$ by setting $d_m(u, v) = d(\hat{u}, \hat{v})$. It is easy to see that $d_m$ is indeed a metric on $X_m$. Note that $d$-convergence in $X$ is equivalent to componentwise convergence. From this it follows easily that $\tau_{d_m}$ coincides with the topology of $X_m$.

⇐: Assume that each $X_n$ is metrizable and let $d_n$ be a compatible metric. We define a metric $d$ on the product $X$ by setting
$$d((u_n), (v_n)) = \sum_{n \ge 1} \frac{1}{2^n} \frac{d_n(u_n, v_n)}{1 + d_n(u_n, v_n)}.$$

It is straightforward to check that $d$ is a metric. Let $\{\hat{u}_\alpha\}_{\alpha \in J} = \{(u_n^\alpha)\}_{\alpha \in J} \subseteq X$ be a net. We have $d(\hat{u}_\alpha, \hat{u}) \to 0$ with $\hat{u} = (u_n)$ if and only if
$$\lim_{\alpha \in J} d_n(u_n^\alpha, u_n) = 0 \quad \text{for all } n \in \mathbb{N}. \tag{1.5.2}$$
From (1.5.2) we infer that the product topology and the $\tau_d$-topology on $X$ coincide.

In a similar fashion we can also have the following result.

Proposition 1.5.17. The product topology on $X$ is topologically complete if and only if the space $X_n$ is topologically complete for each $n \in \mathbb{N}$.

Proposition 1.5.18. If $\{X_n\}_{n \ge 1}$ is a sequence of metrizable spaces and $X = \prod_{n \ge 1} X_n$, then $X$ is separable if and only if $X_n$ is separable for each $n \in \mathbb{N}$.

Proof. ⇒: This is a consequence of the fact that the continuous image of a separable space is separable as well; see Proposition 1.2.24(c). In our case the continuous map is the projection to the $n$th factor.

⇐: From the proof of Proposition 1.5.16 we know that the product topology on $X$ is generated by the metric
$$d(\hat{u}, \hat{v}) = \sum_{n \ge 1} \frac{1}{2^n} \frac{d_n(u_n, v_n)}{1 + d_n(u_n, v_n)} \quad \text{for all } \hat{u} = (u_n),\ \hat{v} = (v_n) \in X.$$

For each $n \in \mathbb{N}$ let $D_n$ be a countable, dense subset of $X_n$. Fix $u_n \in D_n$ for each $n \in \mathbb{N}$ and consider the set $D \subseteq X$ defined by
$$D = \{(y_n) \in X : y_n \in D_n \text{ for each } n \in \mathbb{N} \text{ and } y_n = u_n \text{ eventually}\}.$$
Evidently $D \subseteq X$ is countable and dense. Therefore $X$ is separable.

Definition 1.5.19. The Hilbert cube is the space $\mathbb{H} = [0, 1]^{\mathbb{N}}$, that is, the space of all real sequences with values in $[0, 1]$, furnished with the product topology.

Remark 1.5.20. Evidently $\mathbb{H}$ is topologically complete, separable, and compact, which follows from Propositions 1.5.17 and 1.5.18 as well as Theorem 1.4.56.

The next theorem, known as “Urysohn’s Theorem,” says that in a sense $\mathbb{H}$ is the canonical separable metrizable space.

Theorem 1.5.21 (Urysohn’s Theorem). Every separable metrizable space is homeomorphic to a subset of $\mathbb{H}$.

Proof. Let $(X, d)$ be a separable metric space and $D = \{y_n\}_{n \ge 1}$ a countable dense subset. We define $\xi_n(u) = \min\{1, d(u, y_n)\}$ for all $n \in \mathbb{N}$ and consider $\xi : X \to \mathbb{H}$ defined by $\xi(u) = (\xi_n(u))_{n \ge 1}$ for all $u \in X$. Each $\xi_n$ is continuous, hence so is $\xi$. Suppose that $\xi(u) = \xi(v)$ and let $\{y_{n_k}\}_{k \ge 1} \subseteq \{y_n\}_{n \ge 1}$ be such that $y_{n_k} \to u$. Then $\xi_{n_k}(v) = \xi_{n_k}(u) \to 0$, so $\lim_{k \to \infty} d(v, y_{n_k}) = 0$, hence $d(v, u) = 0$, which means that $u = v$ and so $\xi$ is 1–1. Finally we need to show that $\xi^{-1}$ is continuous. To this end, let $\xi(v_n) \to \xi(v)$. Pick $\varepsilon \in (0, 1)$ and $y_m \in D$ such that $d(v, y_m) < \varepsilon$. Since $\xi_m(v_n) \to \xi_m(v) = d(v, y_m) < \varepsilon < 1$, we have
$$d(v_n, y_m) \to d(v, y_m) \quad \text{as } n \to \infty,$$
which means $d(v_n, y_m) < \varepsilon$ for all $n \ge n_0$. Hence, by the triangle inequality we derive $d(v_n, v) < 2\varepsilon$ for all $n \ge n_0$. Therefore, $v_n \to v$ and so $\xi^{-1}$ is continuous.

Some features of metrizable spaces are not topological and depend on the particular compatible metric. Such are Cauchy sequences (see Definition 1.5.9(a)) and uniform continuity, which we are about to introduce.

Definition 1.5.22. Let $(X, d)$ and $(Y, \rho)$ be two metric spaces and $f : X \to Y$ a map.
(a) We say that $f$ is uniformly continuous if for every given $\varepsilon > 0$ there exists $\delta = \delta(\varepsilon) > 0$ such that
$$d(x, u) < \delta \quad \text{implies} \quad \rho(f(x), f(u)) < \varepsilon \quad \text{for all } x, u \in X.$$


(b) We say that $f$ is $k$-Lipschitz, with $k > 0$, if
$$\rho(f(x), f(u)) \le k\, d(x, u) \quad \text{for all } x, u \in X.$$

Remark 1.5.23. A continuous function need not be uniformly continuous. For example, the function $f(x) = x^2$ for $x \in \mathbb{R}$ is continuous but not uniformly continuous. Indeed, note that for fixed $\varepsilon > 0$ the admissible $\delta > 0$ gets smaller as $|x|$ increases. A $k$-Lipschitz map is uniformly continuous. A 1-Lipschitz map is called nonexpansive and if $k \in (0, 1)$ we say that $f$ is a contraction.

Proposition 1.5.24. If $(X, d)$ is a metric space and $\varphi : \mathbb{R}_+ \to \mathbb{R}_+$ is continuous satisfying
(a) $\varphi$ is nondecreasing, that is, $x \le u$ implies $\varphi(x) \le \varphi(u)$ for all $x, u \ge 0$;
(b) $\varphi$ is subadditive, that is, $\varphi(x + u) \le \varphi(x) + \varphi(u)$ for all $x, u \ge 0$;
(c) $\varphi(x) = 0$ if and only if $x = 0$,
then $\varphi \circ d$ is a metric on $X$ and the identity maps
$$i_1 : (X, d) \to (X, \varphi \circ d) \quad \text{and} \quad i_2 : (X, \varphi \circ d) \to (X, d)$$
are both uniformly continuous.

Proof. Applying (a)–(c) it is straightforward to check that $\varphi \circ d$ is a metric on $X$. Moreover, for given $\varepsilon > 0$ there exists $\delta > 0$ such that $0 \le t < \delta$ implies $0 \le \varphi(t) < \varepsilon$, and $0 \le \varphi(t) < \eta = \varphi(\varepsilon)$ implies $0 \le t < \varepsilon$. Here we have used the continuity and monotonicity of $\varphi$. Thus we have uniform continuity for both $i_1$ and $i_2$.

Proposition 1.5.25. If $(X, d)$ and $(Y, \rho)$ are two metric spaces and $f : X \to Y$ is uniformly continuous, then $f$ maps Cauchy sequences in $X$ to Cauchy sequences in $Y$.

Proof. Let $\{u_n\}_{n \ge 1}$ be a Cauchy sequence in $X$, and for $\varepsilon > 0$ choose $\delta = \delta(\varepsilon) > 0$ such that $d(x, v) < \delta$ implies $\rho(f(x), f(v)) < \varepsilon$ for all $x, v \in X$. Let $B \subseteq X$ be a ball of radius less than $\delta/2$ which contains $\{u_n\}_{n \ge n_0}$ for some $n_0 \in \mathbb{N}$. Then $f(B)$ contains $\{f(u_n)\}_{n \ge n_0}$. Note that $\operatorname{diam} B < \delta$. Hence $\operatorname{diam} f(B) \le \varepsilon$. Thus $f(B)$ is included in a closed ball $D \subseteq Y$ of radius $\varepsilon > 0$ and so $D \supseteq \{f(u_n)\}_{n \ge n_0}$. Since $\varepsilon > 0$ is arbitrary, we conclude that $\{f(u_n)\}_{n \in \mathbb{N}} \subseteq Y$ is a $\rho$-Cauchy sequence.

Remark 1.5.26. The result above fails if $f$ is only continuous. To see this consider the function $f(x) = 1/x$ for all $x \in (0, 1)$, which is continuous but not uniformly continuous. Let $u_n = 1/n$ with $n \ge 2$. This is a Cauchy sequence in $(0, 1)$ but $f(u_n) = n$, which is not a Cauchy sequence.

Theorem 1.5.27. If $(X, d)$ is a metric space, $D \subseteq X$ a set, $(Y, \rho)$ is a complete metric space and $f : D \to Y$ is uniformly continuous, then there exists a unique uniformly continuous map $\hat{f} : \overline{D} \to Y$ such that $\hat{f}|_D = f$. In particular, if $Y = \mathbb{R}$ then $\sup_D |f| = \sup_{\overline{D}} |\hat{f}|$.

Proof. Let $\tilde{u} \in \overline{D}$. Then we find a sequence $\{u_n\}_{n \ge 1} \subseteq D$ such that $u_n \to \tilde{u}$ in $(X, d)$. The sequence $\{u_n\}_{n \ge 1}$ is a $d$-Cauchy sequence and then $\{f(u_n)\}_{n \ge 1} \subseteq Y$ is a $\rho$-Cauchy

sequence because of Proposition 1.5.25. The completeness of $Y$ implies that $f(u_n) \to y \in Y$. This $y$ is independent of the particular sequence in $D$ approaching $\tilde{u} \in \overline{D}$. Indeed, let $\{x_n\}_{n \ge 1} \subseteq D$ be another sequence such that $x_n \to \tilde{u}$ in $(X, d)$. We define
$$h_n = \begin{cases} x_n & \text{if } n \text{ is odd}, \\ u_n & \text{if } n \text{ is even}, \end{cases} \quad \text{with } n \in \mathbb{N}.$$

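We see that $h_n \to \tilde{u}$, so $\{f(h_n)\}_{n \ge 1}$ is a Cauchy sequence by Proposition 1.5.25, and its subsequence $\{f(u_{2n})\}_{n \ge 1}$ converges to $y$ in $(Y, \rho)$. Hence $f(h_n) \to y$ and, in particular, $f(x_{2n-1}) \to y$. Since $\{f(x_n)\}_{n \ge 1}$ is itself convergent, we obtain $f(x_n) \to y$. Hence, we have shown that $y$ is independent of the sequence $u_n \to \tilde{u} \in \overline{D}$. Therefore, we can set $\hat{f}(\tilde{u}) = y$. Now we show that $\hat{f}$ is uniformly continuous. From the uniform continuity of $f$ we know that for given $\varepsilon > 0$ there exists $\delta > 0$ such that
$$d(x, u) < \delta \quad \text{implies} \quad \rho(f(x), f(u)) < \varepsilon \quad \text{for all } x, u \in D. \tag{1.5.3}$$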

Suppose $x, v \in \overline{D}$ with $d(x, v) < \delta$. Then there exist $\{x_n\}_{n \ge 1}, \{v_n\}_{n \ge 1} \subseteq D$ such that $x_n \to x$ and $v_n \to v$ in $(X, d)$. Hence, $d(x_n, v_n) \to d(x, v)$ and so $d(x_n, v_n) < \delta$ for all $n \ge n_0$. Taking (1.5.3) into account we conclude that $\rho(f(x_n), f(v_n)) < \varepsilon$ for all $n \ge n_0$. Hence, $\rho(\hat{f}(x), \hat{f}(v)) \le \varepsilon$. This proves the uniform continuity of the extension $\hat{f}$. Clearly this extension is unique and we have $\sup_D |f| = \sup_{\overline{D}} |\hat{f}|$.

Definition 1.5.28. Let $(X, d)$ be a metric space. Recall that
$$C_b(X, \mathbb{R}) = \{f : X \to \mathbb{R} \mid f \text{ is bounded and continuous}\}.$$
We also introduce the subspace
$$U_b(X, \mathbb{R}) = \{f : X \to \mathbb{R} \mid f \text{ is bounded and uniformly continuous}\}$$
of $C_b(X, \mathbb{R})$. On them we consider the supremum metric defined by
$$d_\infty(f, g) = \sup_{x \in X} |f(x) - g(x)|.$$
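The role of uniform continuity in Theorem 1.5.27 can be seen on $D = (0, 1]$ with closure $[0, 1]$. The sketch below uses two functions chosen here for illustration, not taken from the text: $f(x) = \sin(x)/x$, which is uniformly continuous on $D$, and $g(x) = \sin(1/x)$, which is bounded and continuous but not uniformly continuous on $D$. Both are evaluated along two different sequences in $D$ converging to $0$.

```python
import math

def f(x):
    """sin(x)/x on D = (0, 1]: uniformly continuous, so Theorem 1.5.27 applies."""
    return math.sin(x) / x

def g(x):
    """sin(1/x) on D = (0, 1]: bounded and continuous but not uniformly continuous."""
    return math.sin(1.0 / x)

# Two sequences in D converging to the point 0 of the closure [0, 1].
seq_a = [1.0 / (2.0 * math.pi * n) for n in range(1, 2001)]
seq_b = [1.0 / (2.0 * math.pi * n + math.pi / 2.0) for n in range(1, 2001)]

print(f(seq_a[-1]), f(seq_b[-1]))   # both values are close to 1
print(g(seq_a[-1]), g(seq_b[-1]))   # one value is close to 0, the other close to 1
```

For $f$ the limit does not depend on the sequence, which is exactly how the proof defines the extension (here $\hat{f}(0) = 1$). For $g$ the two sequences give different limits, so no continuous extension to $0$ exists.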

Remark 1.5.29. If $X$ is a metrizable space and $d$, $e$ are two compatible metrics, then in general we have $U_d(X, \mathbb{R}) \ne U_e(X, \mathbb{R})$, where $U_d(X, \mathbb{R})$ denotes the space $U_b(X, \mathbb{R})$ formed with respect to the metric $d$. For example, the function $x \mapsto 1/x$ on $(0, 1)$ is not uniformly continuous for the usual metric on $(0, 1)$, but it is uniformly continuous for the metric $\rho(x, u) = |1/x - 1/u|$ for all $x, u \in (0, 1)$.

Proposition 1.5.30. If $(X, d)$ is a metric space, then $X$ is isometrically embedded into $U_d(X, \mathbb{R})$.

Proof. We fix $u_0 \in X$ and then for each $x \in X$, let $\eta_x : X \to \mathbb{R}$ be the function defined by $\eta_x(u) = d(x, u) - d(u_0, u)$ for all $u \in X$. We have
$$|\eta_x(u) - \eta_x(v)| \le |d(x, u) - d(x, v)| + |d(u_0, u) - d(u_0, v)| \le 2d(u, v),$$
which shows that $\eta_x$ is 2-Lipschitz. In addition we have $|\eta_x(u)| \le d(x, u_0)$ for all $u \in X$. Thus, $\eta_x$ is bounded. Consequently we have $\eta_x \in U_d(X, \mathbb{R})$. Note that
$$|\eta_x(u) - \eta_v(u)| \le d(x, v) \quad \text{for all } u \in X,$$
implying $d_\infty(\eta_x, \eta_v) \le d(x, v)$. Moreover, we have $|\eta_x(v) - \eta_v(v)| = d(x, v)$. Therefore, $d_\infty(\eta_x, \eta_v) = d(x, v)$, which means that $x \mapsto \eta_x$ is an isometry. This proves that $X$ is isometrically embedded into $U_d(X, \mathbb{R})$.

Now we turn our attention to compact metric spaces.

Definition 1.5.31. Let $(X, d)$ be a metric space and $\varepsilon > 0$. An $\varepsilon$-net in $X$ is a finite set $A$ in $X$ such that $X = \bigcup_{a \in A} B_\varepsilon(a)$. That is, for every $x \in X$ there exists $a \in A$ such that $d(x, a) < \varepsilon$. We say that $(X, d)$ is totally bounded if for every $\varepsilon > 0$ it has an $\varepsilon$-net.

Remark 1.5.32. Clearly a compact metric space is totally bounded.

Proposition 1.5.33. If the metric space $(X, d)$ is totally bounded, then it is separable.

Proof. For each $n \in \mathbb{N}$, let $A_n \subseteq X$ be a finite set such that $X = \bigcup_{x \in A_n} B_{1/n}(x)$. Let $D = \bigcup_{n \ge 1} A_n$. Then $D$ is countable and dense in $X$.

Proposition 1.5.34. If $(X, d)$ is a sequentially compact metric space and $L$ is an open cover of $X$, then there is a $\delta > 0$ such that every $A \subseteq X$ with $\operatorname{diam} A < \delta$ is contained in some $U \in L$.

Proof. Arguing by contradiction, suppose that we cannot find such a $\delta > 0$. Then for every $n \in \mathbb{N}$ choose $A_n \subseteq X$ with $\operatorname{diam} A_n < 1/n$ such that $A_n$ is not contained in any $U \in L$. Choose $x_n \in A_n$. Since $X$ is sequentially compact, by passing to a subsequence if necessary, we may assume that $x_n \to x$. Let $U \in L \cap N(x)$ and choose $\varrho > 0$ such that $B_\varrho(x) \subseteq U$. Then $x_n \in B_{\varrho/2}(x)$ for all $n \ge n_0$ with $1/n_0 < \varrho/2$. Since $\operatorname{diam} A_{n_0} < 1/n_0 < \varrho/2$, we have $A_{n_0} \subseteq B_\varrho(x) \subseteq U$, a contradiction. This proves the proposition.

Remark 1.5.35. A $\delta > 0$ satisfying the property above is called the Lebesgue number of the cover $L$.

The next theorem provides a complete characterization of compact metric spaces.

Theorem 1.5.36.
If $(X, d)$ is a metric space, then the following statements are equivalent:
(a) $X$ is compact;
(b) $X$ is complete and totally bounded;
(c) $X$ is sequentially compact.

Proof. (a) ⇒ (b): Since $(X, d)$ is compact, every Cauchy sequence $\{x_n\}_{n \ge 1}$ has a cluster point $x \in X$; see Remark 1.4.58. We claim that $x_n \to x$ in $X$. Since $\{x_n\}_{n \ge 1}$ is a Cauchy sequence, for every given $\varepsilon > 0$ there exists $n_0 \in \mathbb{N}$ such that
$$d(x_n, x_m) < \varepsilon \quad \text{for all } n, m \ge n_0. \tag{1.5.4}$$

Since $x$ is a cluster point of the Cauchy sequence, we can find $k \ge n_0$ such that
$$d(x_k, x) < \varepsilon. \tag{1.5.5}$$

Then, combining (1.5.4) and (1.5.5), we have for $n \ge n_0$
$$d(x_n, x) \le d(x_n, x_k) + d(x_k, x) < 2\varepsilon,$$
which means that $x_n \to x$ in $X$ and so $X$ is complete. For every $\varepsilon > 0$ we have $X = \bigcup_{x \in X} B_\varepsilon(x)$. The compactness of $X$ implies that we can find $x_1, \ldots, x_m$ such that $X = \bigcup_{n=1}^{m} B_\varepsilon(x_n)$. Thus, $X$ is totally bounded.

(b) ⇒ (c): Let $\{x_n\}_{n \ge 1}$ be a sequence in $X$. Since $X$ is totally bounded, a subsequence $S_1$ of $\{x_n\}_{n \ge 1}$ must be in a set $B_1 = \{u \in X : d(y_1, u) < 1\}$. Evidently, $B_1$ is totally bounded. Hence, there exists a subsequence $S_2$ of $S_1$, which will be in $B_2 = \{u \in B_1 : d(y_2, u) < 1/2\}$. By induction for each $n \in \mathbb{N}$ we can have a subsequence $S_{n+1}$ of $S_n$, which is in $B_{n+1} = \{u \in B_n : d(y_{n+1}, u) < 1/(n + 1)\}$. Let $i_1 < i_2 < \ldots < i_n < \ldots$ be such that $x_{i_n} \in S_n$. Then $\{x_{i_n}\}_{n \in \mathbb{N}}$ is a Cauchy sequence and thus converges, since $X$ is complete. This proves that $X$ is sequentially compact.

(c) ⇒ (a): Let $L$ be an open cover of $X$ and let $\delta > 0$ be the Lebesgue number of $L$; see Proposition 1.5.34 and Remark 1.5.35. First we show that $X$ is totally bounded. If this is not the case, then we can find $\varepsilon > 0$ such that no finite family of balls of radius $\varepsilon > 0$ covers $X$. Inductively we can generate a sequence $\{x_n\}_{n \ge 1} \subseteq X$ such that for all $n \in \mathbb{N}$, $x_n \notin \bigcup_{k<n} B_\varepsilon(x_k)$ [...]

[...] $\varepsilon > 0$ and $x \in X$, let $V_x = f^{-1}(B_{\varepsilon/2}(f(x))) \in N(x)$. Then, for $u, v \in V_x$ we have
$$\rho(f(u), f(v)) < \varepsilon. \tag{1.5.6}$$
We know that $X$ is sequentially compact because of Theorem 1.5.36. By Proposition 1.5.34 there exists $\delta > 0$ such that for every $v \in X$
$$B_\delta(v) \subseteq V_x \quad \text{for some } x \in X. \tag{1.5.7}$$

Recall that this $\delta$ is called the Lebesgue number of the cover $L = \{V_x\}_{x \in X}$; see Proposition 1.5.34 and Remark 1.5.35. Then, because of (1.5.6) and (1.5.7), $u \in B_\delta(v)$ implies $\rho(f(u), f(v)) < \varepsilon$. Hence, $f$ is uniformly continuous.

The next proposition is an easy consequence of the relevant definitions.

Proposition 1.5.40.
(a) Every metric space $X$ is first countable.
(b) For a metric space $X$ the notions of separability, second countability, and Lindelöf are all equivalent.

Proof. (a) For every $x \in X$, let $B(x) = \{B_r(x) : r \in \mathbb{Q},\ r > 0\}$. Then $B(x)$ is a countable local basis at $x \in X$. Therefore $X$ is first countable.

(b) First we show that “separability” implies “second countability.” Let $\{u_n\}_{n \ge 1}$ be dense in $X$. Then $B = \{B_r(u_n) : r \in \mathbb{Q},\ r > 0,\ n \in \mathbb{N}\}$ is a countable basis of $X$, hence $X$ is second countable. Theorem 1.2.27 says that “second countable” implies “Lindelöf.” Finally we show that “Lindelöf” implies “separable.” Consider the open cover $\{B_\varepsilon(x)\}_{x \in X}$ with $\varepsilon > 0$ of $X$. By the Lindelöf property there exists a countable subcover $\{B_\varepsilon(x_k)\}_{k \in \mathbb{N}}$. Let $A(\varepsilon) = \{x_k\}_{k \in \mathbb{N}}$. Then $D = \bigcup_{n \ge 1} A(1/n)$ is a countable dense subset of $X$. Therefore $X$ is separable.

Remark 1.5.41. In contrast to general topological spaces (see Proposition 1.2.22), for metric spaces, separability and second countability are equivalent notions.

Combining Proposition 1.5.40 with Theorem 1.4.63 we have the following result.

Theorem 1.5.42. Let $(X, d)$ be a metric space. Then the following assertions are equivalent:
(a) $X$ is compact.
(b) $X$ is countably compact.
(c) $X$ is limit point compact.
(d) $X$ is sequentially compact.

Definition 1.5.43. A Hausdorff topological space $(X, \tau)$ is said to be Polish if it is separable and there exists a compatible metric $d$, that is $\tau = \tau_d$, for which $X$ is complete.

Remark 1.5.44. In a Polish space the compatible metric is not a priori fixed. We know that it exists and generates the topology of $X$ and that the space furnished with this metric is complete.
There are many topological spaces that are Polish, but the corresponding complete metric is not particularly simple or natural. However, many constructions and facts depend only on the existence of a complete metric and not on the exact choice. Proposition 1.5.45. If X is a Polish space and A ⊆ X is open or closed, then A is Polish. Proof. From Corollary 1.5.6 we know that A is separable. First suppose that A is open. We assume that A ≠ X and let d be the compatible metric on X for which X is complete.

54 | 1 Basic Topology Let

d̂(x, u) = d(x, u) + |1/d(x, A^c) − 1/d(u, A^c)|   for all x, u ∈ A.   (1.5.8)
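As a concrete illustration of (1.5.8), the following minimal sketch takes X = ℝ with the usual metric and the open set A = (0, 1); both choices, and all function names, are assumptions made only for this example. It shows that the sequence 1/n, which is d-Cauchy but converges to a point outside A, fails to be Cauchy for d̂.

```python
def d(x, u):
    """Usual metric on the real line."""
    return abs(x - u)

def dist_to_complement(x):
    """Distance from x in A = (0, 1) to the complement of A."""
    return min(x, 1.0 - x)

def d_hat(x, u):
    """The metric of (1.5.8) for A = (0, 1)."""
    return d(x, u) + abs(1.0 / dist_to_complement(x) - 1.0 / dist_to_complement(u))

# The sequence u_n = 1/n, n >= 2, stays in A and converges to 0, which is not in A.
seq = [1.0 / n for n in range(2, 200)]
d_gaps = [d(seq[i], seq[i + 1]) for i in range(len(seq) - 1)]
d_hat_gaps = [d_hat(seq[i], seq[i + 1]) for i in range(len(seq) - 1)]

print(d_gaps[-1])       # consecutive d-distances shrink towards 0
print(min(d_hat_gaps))  # consecutive d_hat-distances stay above 1
```

Here 1/d(u_n, A^c) = n, so consecutive terms are always more than 1 apart in d̂; this is the mechanism that keeps d̂-Cauchy sequences away from the boundary of A.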

It is easy to see that d̂ is a metric on A. We show that d̂ metrizes the subspace topology on A. From the triangle inequality we have |d(x, A^c) − d(u, A^c)| ≤ d(x, u), which implies that x → d(x, A^c) is 1-Lipschitz, equivalently nonexpansive. Since d(x, A^c) > 0 for every x in the open set A, the map x → 1/d(x, A^c) is continuous on A. Therefore, u_n → u in d̂ if and only if u_n → u in d. Hence, d̂ metrizes the subspace topology on A.

Suppose that {u_n}_{n≥1} ⊆ A is a d̂-Cauchy sequence. Then, from (1.5.8) it is clear that {u_n}_{n≥1} is also a d-Cauchy sequence. Therefore, u_n → u ∈ X in d. If u ∈ A^c, then d(u_n, A^c) → 0, so the sequence {1/d(u_n, A^c)}_{n≥1} is unbounded and by (1.5.8) the sequence {u_n}_{n≥1} cannot be d̂-Cauchy, a contradiction. Thus, u ∈ A and so u_n → u in d̂, which proves the completeness of (A, d̂).

Now suppose that A is closed. Then d_A = d|_{A×A} is complete and so A is Polish.

Proposition 1.5.46. Countable products and countable intersections of Polish spaces are Polish spaces.

Proof. For the products the result follows from Propositions 1.5.16, 1.5.17 and 1.5.18. For the intersections let

∆ = {(u_n)_{n≥1} ∈ ∏_{n≥1} X_n : u_j = u_k for all j, k}.

Then ∆ is closed, hence Polish; see Proposition 1.5.45. But ∆ is homeomorphic to ⋂_{n≥1} X_n.

The next result is known as "Alexandrov's Theorem" and gives a characterization of Polish spaces.

Theorem 1.5.47 (Alexandrov's Theorem). If (X, τ) is a Polish space, then A ⊆ X is Polish if and only if A is a G_δ-subset of X.

Proof. ⇒: Let d be a compatible metric for X and d0 a compatible complete metric for A. For each n ∈ ℕ, let V_n be the union of the open subsets U of X for which U ∩ A ≠ ∅ and d0-diam(U ∩ A) < 1/n, where d0-diam denotes the diameter for the metric d0. Since d and d0 induce the same topology on A we have

A ⊆ \overline{A}^τ ∩ (⋂_{n≥1} V_n).   (1.5.9)

Let u ∈ \overline{A}^τ ∩ (⋂_{n≥1} V_n). Since u ∈ ⋂_{n≥1} V_n we can find a sequence {U_n}_{n≥1} of neighborhoods of u such that

U_n ∩ A ≠ ∅ and d0-diam(U_n ∩ A) < 1/n.


Evidently, by replacing U_n with a smaller neighborhood of u, we may assume that {U_n}_{n≥1} is decreasing and d-diam U_n ≤ 1/n. Since (A, d0) is complete, from Theorem 1.5.15, we have that

{u0} = ⋂_{n≥1} \overline{U_n ∩ A}^{τ(A)}.   (1.5.10)

For every n ∈ ℕ we have d-diam U_n ≤ 1/n and u, u0 ∈ \overline{U_n}^τ. Hence, because of (1.5.10), u = u0. Therefore, \overline{A}^τ ∩ (⋂_{n≥1} V_n) ⊆ A and due to (1.5.9) it holds that A = \overline{A}^τ ∩ (⋂_{n≥1} V_n). Invoking Proposition 1.5.8 for the closed set \overline{A}^τ, we conclude that A is a G_δ-subset of X.

⇐: By hypothesis A = ⋂_{n≥1} U_n with U_n ⊆ X open for all n ∈ ℕ. From Proposition 1.5.45 we know that each U_n is Polish and so Proposition 1.5.46 implies that ⋂_{n≥1} U_n = A is Polish.

Remark 1.5.48. From the last theorem we recover the part of Proposition 1.5.45 concerning open sets.

Corollary 1.5.49. The set of irrational numbers with the topology induced by ℝ is Polish.

Remark 1.5.50. We mention some more Polish spaces:
– Every locally compact, σ-compact metrizable space is Polish.
– Every locally compact and second countable Hausdorff space is Polish. This is a consequence of the so-called "Urysohn Metrization Theorem," which says that every regular, second countable space is metrizable.
– ℕ^∞ is Polish (see Proposition 1.5.46) and in fact every Polish space is a continuous image of ℕ^∞. More precisely every Polish space is a one-to-one continuous image of a closed subset of ℕ^∞. On ℕ^∞ we consider the tree metric defined by

t(p̂, q̂) = 0 if p̂ = q̂,   t(p̂, q̂) = 1/k if p̂ ≠ q̂ and k = min{n ∈ ℕ : p_n ≠ q_n},

for all p̂ = (p_n), q̂ = (q_n) ∈ ℕ^∞. This is a complete metric on ℕ^∞ compatible with the product topology.
– Every Polish space is a G_δ in some metrizable compactification.
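The tree metric of Remark 1.5.50 can be sketched on finite prefixes of sequences. This is only an illustration under stated assumptions: tuples of equal length stand in for elements of ℕ^∞, two tuples that agree on every listed entry are treated as equal, and the names `tree_metric`, `satisfies_ultrametric` and `PREFIXES` are introduced here for the example. The check uses the strong (ultrametric) form of the triangle inequality, which holds because the first index where p and r differ cannot be smaller than both of the first indices where p, q and q, r differ.

```python
from itertools import product

def tree_metric(p, q):
    """1/k where k is the first (1-based) index at which p and q differ, else 0."""
    for k, (a, b) in enumerate(zip(p, q), start=1):
        if a != b:
            return 1.0 / k
    return 0.0

def satisfies_ultrametric(points):
    """Check t(p, r) <= max(t(p, q), t(q, r)) for all triples of points."""
    return all(
        tree_metric(p, r) <= max(tree_metric(p, q), tree_metric(q, r))
        for p, q, r in product(points, repeat=3)
    )

# All 0/1-valued prefixes of length 4 (a small, hypothetical test set).
PREFIXES = list(product((0, 1), repeat=4))

print(tree_metric((1, 2, 3, 4), (1, 2, 5, 4)))  # first difference at index 3
print(satisfies_ultrametric(PREFIXES))
```

Two sequences are close in this metric exactly when they share a long common prefix, which matches convergence in the product topology coordinate by coordinate.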

Definition 1.5.51. A Hausdorff space X is said to be a Souslin space if there exist a Polish space Y and a continuous surjection f : Y → X. Remark 1.5.52. Equivalently we can say that the Hausdorff topological space (X, τ) is Souslin if and only if there is a topology τ0 ⊇ τ on X such that (X, τ0 ) is homeomorphic to a quotient of a Polish space. A Souslin space is always separable but need not be metrizable. Anticipating some basic material from Chapter 3, we mention that an infinite dimensional separable Banach space with the weak topology is Souslin, but not metrizable. Similarly for the dual X ∗ of an infinite dimensional separable Banach space endowed with the w∗ -topology.

Definition 1.5.53. The Souslin subspaces of a Polish space are called analytic sets.

Souslin spaces have nice stability properties.

Proposition 1.5.54.
(a) Closed and open subsets of Souslin spaces are Souslin spaces.
(b) Countable products of Souslin spaces are Souslin.
(c) Countable intersections and countable unions of Souslin subspaces of a Hausdorff topological space V are Souslin.

Proof. (a): Let X be a Souslin space. Then according to Definition 1.5.51 there exists a Polish space Y and a continuous surjection f : Y → X. Let E ⊆ X be a closed (resp. open) set. Then f^{-1}(E) ⊆ Y is closed (resp. open) and so by Proposition 1.5.45 f^{-1}(E) is Polish. Also the restriction f|_{f^{-1}(E)} is continuous and surjective onto f(f^{-1}(E)) = E since f is a surjection. Therefore, by Definition 1.5.51, E is Souslin.

(b): Let {X_n}_{n≥1} be a family of Souslin spaces. For every n ∈ ℕ there exists a Polish space Y_n and a continuous surjection f_n : Y_n → X_n. Set Y = ∏_{n≥1} Y_n, X = ∏_{n≥1} X_n and f̂ = (f_n)_{n≥1} : Y → X defined by f̂((y_n)_{n≥1}) = (f_n(y_n))_{n≥1}. Then Y is Polish by Proposition 1.5.46 and f̂ is a continuous surjection. So, X is a Souslin space.

(c): Let {X_n}_{n≥1} be a family of Souslin subspaces of V and let X = ⋂_{n≥1} X_n. We introduce V̂ = V^ℕ and ∆̂ the diagonal of V̂, that is, ∆̂ = {û = (u_n)_{n≥1} : there is u ∈ V with u_n = u for all n ∈ ℕ}. From Proposition 1.3.12 we know that V̂ is Hausdorff and so Problem 1.1 implies that ∆̂ ⊆ V̂ is closed. Let f̂ : V → ∆̂ be the canonical map of V onto ∆̂ defined by f̂(u) = (u, u, . . . , u, . . .). Then f̂(X) = ∆̂ ∩ (∏_{n≥1} X_n) and f̂ is a homeomorphism of X onto a closed subspace of ∏_{n≥1} X_n. But by part (b) ∏_{n≥1} X_n is Souslin, hence by part (a) f̂(X) is Souslin. Therefore X is Souslin.

Now we consider the union ⋃_{n≥1} X_n. For every n ∈ ℕ we can find a Polish space Y_n and a continuous surjection f_n : Y_n → X_n. Let X̃_n = {n} × X_n and Ỹ_n = {n} × Y_n. Note that Ỹ_n is Polish and X̃_n is Souslin.
Now we consider the map f ñ : Ỹ n → X̃ n defined by f ñ (n, y) = (n, f n (y)) for all n ∈ ℕ and for all y ∈ Y n . Evidently f ñ is a continuous surjection. Let Ỹ = ⋃n≥1 Ỹ n (this set is known as the free or disjoint union of the Y n s and sometimes it is denoted by ∑n≥1 Ỹ n ) and similarly we set X̃ = ⋃n≥1 X̃ n . The function f ̃ : Ỹ → X̃ defined by f ̃Ỹ n = f ñ for all n ∈ ℕ is a continuous surjection. The space Ỹ is Polish; see Proposition 1.5.46. Let h : X̃ → ⋃n≥1 X n be the canonical projection, that is, h(n, u) = u for all n ∈ ℕ and for all u ∈ X n . This is a homeomorphism onto ⋃n≥1 X n . Then g = h ∘ f ̃ : Ỹ → ⋃n≥1 X n is a continuous surjection, hence ⋃n≥1 X n is Souslin. Directly from Definition 1.5.51, we have the following useful property of Souslin spaces. It shows that although Souslin spaces are not necessarily metrizable, they are sequentially determined. Proposition 1.5.55. If X is a Souslin space and A ⊆ X, then there exists a countable set D ⊆ A such that D is sequentially dense in A. Proof. Let Y be a Polish space and f : Y → X a continuous surjection. Let B = f −1 (A) ⊆ Y. Then B is separable and so there exists a countable dense subset D0 ⊆ B, that is,

\overline{D_0}^Y ⊇ B. Since f is continuous and surjective we know that D = f(D0) ⊆ A is countable and sequentially dense in A.

Definition 1.5.56. A Hausdorff topological space X is said to be strongly Lindelöf if every open subset of X with the subspace topology is Lindelöf; see Definition 1.2.26(b).

Proposition 1.5.57. Every Souslin space X is strongly Lindelöf.

Proof. Let Y be a Polish space and f : Y → X a continuous surjection. Evidently Y is strongly Lindelöf; see Propositions 1.5.40(b) and 1.5.45. We can easily check that the continuous image of a strongly Lindelöf space is strongly Lindelöf. Hence X must be strongly Lindelöf.

Definition 1.5.58. Let X, {Y_α}_{α∈I} be sets and f_α : X → Y_α a family of functions. We say that the family {f_α}_{α∈I} is separating (or total) if for every pair (x, u) ∈ X × X with x ≠ u we have f_α(x) ≠ f_α(u) for some α ∈ I.

Lemma 1.5.59. If X is a Souslin space, {Y_α}_{α∈I} is a family of Hausdorff topological spaces and f_α : X → Y_α with α ∈ I is a separating family of continuous maps, then we can find a countable subset D ⊆ I such that {f_α}_{α∈D} remains separating.

Proof. Replacing the Y_α's by their free union (see the proof of Proposition 1.5.54(c)), we see that without any loss of generality we may assume that Y_α = Y for all α ∈ I. Let ∆_X ⊆ X × X and ∆_Y ⊆ Y × Y be the diagonals. If (x, u) ∈ ∆_X^c, then we can find α ∈ I such that (f_α(x), f_α(u)) ∈ ∆_Y^c. So, the open sets (f_α, f_α)^{-1}(∆_Y^c) with α ∈ I form an open cover of ∆_X^c. The space X × X is strongly Lindelöf; see Propositions 1.5.54(b) and 1.5.57. Therefore we can find a countable D ⊆ I such that {(f_α, f_α)^{-1}(∆_Y^c)}_{α∈D} is a countable open cover of ∆_X^c. This means that {f_α}_{α∈D} remains separating.

Combining this lemma with Problem 1.41 we can state the following result concerning compact Souslin spaces.

Theorem 1.5.60. Every compact Souslin space is metrizable, hence Polish.

Remark 1.5.61. An improvement of this theorem can be found in Problem 1.42.
The Baire category notion gives a topological meaning to the notion of the size of a set. It is based on density. So, according to Baire, a subset A of a Hausdorff topological space X is considered to be very small (sparse) if there is no nonempty open set U ⊆ X such that A ∩ U is dense in U, that is, the closure \overline{A} has an empty interior. Then large sets are those that are not countable unions of sparse sets.

Definition 1.5.62. Let X be a Hausdorff topological space and A ⊆ X.
(a) We say that A is nowhere dense if int \overline{A} = ∅.
(b) We say that A is of first category if it is the countable union of nowhere dense sets.
(c) We say that A is of second category if it is not of first category.

Remark 1.5.63. Note that ℚ is of first category and at the same time dense in ℝ. The set A ⊆ X is nowhere dense if and only if int(X \ A) is dense in X.

Definition 1.5.64. A Hausdorff topological space X is said to be a Baire space if the intersection of each countable family of dense, open sets in X is dense.

Proposition 1.5.65. A Hausdorff topological space X is of second category in itself if and only if every countable family of dense open sets in X has nonempty intersection.

Proof. ⇒: Let {U_n}_{n≥1} be dense, open sets. Then {U_n^c}_{n≥1} = {X \ U_n}_{n≥1} are nowhere dense, closed sets and so ⋃_{n≥1} U_n^c is of first category. Since by hypothesis X is of second category we have

X \ (⋃_{n≥1} U_n^c) = ⋂_{n≥1} U_n ≠ ∅.

⇐: Arguing by contradiction, suppose that X is of first category. Then X = ⋃_{n≥1} C_n with C_n being nowhere dense and closed for each n ∈ ℕ. We have

X \ (⋃_{n≥1} C_n) = ⋂_{n≥1} (X \ C_n) ≠ ∅

since each X \ C_n = U_n with n ∈ ℕ is dense and open, a contradiction. This shows that X must be of second category.

Proposition 1.5.66. If X is a compact Hausdorff topological space and A ⊆ X is a G_δ-set, then A is a Baire space.

Proof. First we show that X is a Baire space. Let {U_n}_{n≥1} be dense, open sets in X and let V ⊆ X be a nonempty, open set. We have U1 ∩ V ≠ ∅ and U1 ∩ V is open. From Corollary 1.4.50 we know that X is normal, hence regular as well. So, we can find a nonempty open W_1 ⊆ X such that \overline{W_1} ⊆ U1 ∩ V; see Proposition 1.2.8. Similarly, for n ∈ {2, 3, . . .} there exists a nonempty open W_n ⊆ X such that \overline{W_n} ⊆ U_n ∩ W_{n−1}. Evidently {\overline{W_n}}_{n≥1} is a decreasing sequence of nonempty compact sets, hence ⋂_{n≥1} \overline{W_n} ≠ ∅. But ⋂_{n≥1} \overline{W_n} ⊆ (⋂_{n≥1} U_n) ∩ V. So, every nonempty open set V ⊆ X has a nonempty intersection with ⋂_{n≥1} U_n and this shows that ⋂_{n≥1} U_n is dense in X. Hence, X is a Baire space.

Without loss of generality we may assume that A is dense in X since we can always replace X by \overline{A}. Let {U_n}_{n≥1} be dense, open subsets of A. Then U_n = V_n ∩ A with a dense and open V_n ⊆ X for every n ∈ ℕ. Then

⋂_{n≥1} (V_n ∩ A) = (⋂_{n≥1} V_n) ∩ A.

From the first part of the proof we know that ⋂_{n≥1} V_n ⊆ X is dense. Therefore ⋂_{n≥1} U_n = ⋂_{n≥1} (V_n ∩ A) is dense in A. This proves that A is a Baire space.

Corollary 1.5.67. If X is a complete metric space and X = ⋃_{n≥1} C_n with closed C_n ⊆ X for all n ∈ ℕ, then there exists a number n0 ∈ ℕ such that int C_{n0} ≠ ∅.

1.6 Function Spaces | 59

Now Theorems 1.4.75 and 1.5.47 lead to the so-called “Baire Theorem.” Theorem 1.5.68 (Baire Theorem). (a) Every locally compact Hausdorff topological space is a Baire space. (b) Every topologically complete Hausdorff space is a Baire space. We conclude this section with an important result known as “Stone’s Theorem.” For the proof we refer to Dugundji [91, p. 186]. Theorem 1.5.69 (Stone’s Theorem). Every metrizable space is paracompact.

1.6 Function Spaces

Let (X, τ_X) and (Y, τ_Y) be two Hausdorff topological spaces. By C(X, Y) we denote the space of continuous functions f : X → Y. In this section we topologize this space and study its properties.

Definition 1.6.1. Let K ⊆ X be compact and U ⊆ Y be open. We set

W(K, U) = {f ∈ C(X, Y) : f(K) ⊆ U}.

The compact-open topology (or c-topology) on C(X, Y) is the topology τ_ζ on C(X, Y) having as subbasis the family

{W(K, U) : K ⊆ X is compact and U ⊆ Y is open}.

Remark 1.6.2. A basic element for the τ_ζ-topology is given by

⋂_{n=1}^m W(K_n, U_n)

with compact K_n ⊆ X and open U_n ⊆ Y for all n ∈ {1, . . . , m}. Note that C(X, Y) ⊆ Y^X. So, we can consider on C(X, Y) the relative product topology that is the topology of pointwise convergence and is denoted by τ_p. Since W({x}, U) ∈ τ_ζ for all x ∈ X and all open U ⊆ Y, it follows that

τ_p ⊆ τ_ζ.   (1.6.1)

Note that we have

⋂_{n=1}^m W(K_n, U) = W(⋃_{n=1}^m K_n, U),
⋂_{n=1}^m W(K, U_n) = W(K, ⋂_{n=1}^m U_n),
⋂_{n=1}^m W(K_n, U_n) ⊆ W(⋃_{n=1}^m K_n, ⋃_{n=1}^m U_n),
\overline{W(K, U)}^{τ_ζ} ⊆ W(K, \overline{U}^{τ_Y}).
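The first three set identities of Remark 1.6.2 can be checked by brute force in a small model. The sketch below assumes finite sets X and Y with the discrete topology, so that every map X → Y is continuous, every subset of X is compact and every subset of Y is open; the names `FUNCTIONS`, `subsets`, `W` and `check_identities` are introduced only for this example.

```python
from itertools import combinations, product

X = (0, 1, 2)
Y = (0, 1)

# With discrete topologies, C(X, Y) is the set of all maps X -> Y.
# A map f is stored as a tuple with f[x] the value at x.
FUNCTIONS = list(product(Y, repeat=len(X)))

def subsets(points):
    """All subsets of a finite set of points."""
    return [frozenset(c) for r in range(len(points) + 1)
            for c in combinations(points, r)]

def W(K, U):
    """The subbasic set W(K, U) = {f : f(K) is contained in U}."""
    return frozenset(f for f in FUNCTIONS if all(f[x] in U for x in K))

def check_identities():
    """Verify the two equalities and the inclusion for every choice of sets."""
    for K1, K2 in product(subsets(X), repeat=2):
        for U1, U2 in product(subsets(Y), repeat=2):
            if W(K1, U1) & W(K2, U1) != W(K1 | K2, U1):
                return False
            if W(K1, U1) & W(K1, U2) != W(K1, U1 & U2):
                return False
            if not (W(K1, U1) & W(K2, U2) <= W(K1 | K2, U1 | U2)):
                return False
    return True

print(check_identities())
```

The fourth relation involves closures in τ_ζ and τ_Y and carries no information in a discrete model, so it is deliberately left out of the check.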

Proposition 1.6.3. If (X, τ X ) and (Y, τ Y ) are Hausdorff topological spaces and the function space C(X, Y) is endowed with the τ ζ -topology, then the following hold:

(a) C(X, Y) is Hausdorff;
(b) C(X, Y) is regular if and only if Y is regular.

Proof. (a) Let f, g ∈ C(X, Y) such that f ≠ g. We can find x ∈ X such that f(x) ≠ g(x). Because Y is Hausdorff, we can find U ∈ N(f(x)) and V ∈ N(g(x)) such that U ∩ V = ∅. Then W({x}, U) ∈ τ_ζ contains f, W({x}, V) ∈ τ_ζ contains g, and W({x}, U) ∩ W({x}, V) = ∅. This proves that (C(X, Y), τ_ζ) is Hausdorff.

(b) ⇒: Evidently, Y ⊆ C(X, Y) (the subspace of constant functions) and τ_ζ(Y) = τ_Y. Then the regularity of Y follows from the fact that the property is hereditary; see Proposition 1.2.10.
⇐: Let f ∈ W(K, U). The set f(K) ⊆ Y is compact. So, by Problem 1.52 we can find V ∈ τ_Y such that f(K) ⊆ V ⊆ \overline{V}^{τ_Y} ⊆ U. Then f ∈ W(K, V) ⊆ \overline{W(K, V)}^{τ_ζ} ⊆ W(K, \overline{V}^{τ_Y}) ⊆ W(K, U); see Remark 1.6.2. This proves that (C(X, Y), τ_ζ) is regular.

Remark 1.6.4. If Y is normal or first countable or second countable, then (C(X, Y), τ_ζ) need not have the same properties.

Let (X, τ_X), (Y, τ_Y) and (Z, τ_Z) be three Hausdorff topological spaces. We can define the map η : C(X, Y) × C(Y, Z) → C(X, Z) given by

η(f, g) = g ∘ f.   (1.6.2)

On C(X, Y), C(Y, Z) and C(X, Z) we consider the corresponding τ_ζ-topologies.

Proposition 1.6.5. The maps f → η(f, g) and g → η(f, g) are both continuous.

Proof. We fix f1 ∈ C(X, Y) and prove the continuity of g → η(f1, g) on C(Y, Z). Let W(K, U) be a subbasic neighborhood of g ∘ f1. Note that g ∘ f1 ∈ W(K, U) if and only if g ∈ W(f1(K), U). But the set f1(K) ⊆ Y is compact. Hence, W(f1(K), U) is a subbasic neighborhood of g. Therefore, η(f1, W(f1(K), U)) ⊆ W(K, U) and this proves the continuity of g → η(f1, g).
Next we fix g1 ∈ C(Y, Z) and consider the map f → η(f, g1) from C(X, Y) into C(X, Z). The proof of the continuity of this map is similar to the previous part. Note that in this case g1 ∘ f ∈ W(K, U) if and only if f ∈ W(K, g1^{-1}(U)) and g1^{-1}(U) ∈ τ_Y.

To have joint continuity of the map η we need to strengthen the conditions on the space Y.

Proposition 1.6.6. If (Y, τ_Y) is locally compact, then the map η is jointly continuous.

Proof. Let (f1, g1) ∈ C(X, Y) × C(Y, Z) and let W(K, U) be a subbasic neighborhood of g1 ∘ f1. Note that f1(K) ⊆ g1^{-1}(U), f1(K) ⊆ Y is compact and g1^{-1}(U) ⊆ Y is open. Since by hypothesis Y is locally compact, we can find a relatively compact V ∈ τ_Y such that

f1(K) ⊆ V ⊆ \overline{V} ⊆ g1^{-1}(U);

see Proposition 1.4.66(c). Then we have

W(K, V) ∈ N(f1),   W(\overline{V}, U) ∈ N(g1),   η(W(K, V), W(\overline{V}, U)) ⊆ W(K, U).

Hence, η is jointly continuous.

Definition 1.6.7. The map e : X × C(X, Y) → Y defined by e(x, f) = f(x) is called the evaluation map. If we fix x ∈ X, the map e_x : C(X, Y) → Y defined by e_x(f) = f(x) is called the evaluation at x map.

The next proposition establishes the continuity properties of these maps.

Proposition 1.6.8.
(a) If X is locally compact, then e : X × C(X, Y) → Y is continuous.
(b) For every x ∈ X, the map e_x : C(X, Y) → Y is continuous.

Proof. Note that when Z is a singleton and η : C(Z, X) × C(X, Y) → C(Z, Y) is the composition map (see (1.6.2)), then, identifying C(Z, X) with X and C(Z, Y) with Y, we have η = e. So, (a) follows from Proposition 1.6.6 (applied with X as the middle space) while (b) follows from Proposition 1.6.5.

We want to characterize the τ_ζ-compact subsets of C(X, Y). The next definition introduces notions that are crucial in this direction.

Definition 1.6.9. Let (X, τ_X) be a Hausdorff topological space and (Y, d) be a metric space.
(a) A set F ⊆ C(X, Y) is said to be equicontinuous at x if for a given ε > 0 there exists U ∈ N(x) such that d(f(u), f(x)) < ε for all u ∈ U and for all f ∈ F. We say that F is equicontinuous if it is equicontinuous at every x ∈ X.
(b) Given f ∈ C(X, Y), a compact K ⊆ X and ε > 0, we define

B_{K,ε}(f) = {g ∈ C(X, Y) : sup[d(g(x), f(x)) : x ∈ K] < ε}.

The sets B_{K,ε}(f) form a basis for a topology τ_u on C(X, Y) known as the topology of uniform convergence on compacta.

Remark 1.6.10. The τ_ζ-topology (see Definition 1.6.1) and the τ_p-topology (see Remark 1.6.2) on C(X, Y) are defined without requiring that Y is a metric space. In contrast, the τ_u-topology (see Definition 1.6.9) explicitly requires that Y must be a metric space. Nevertheless, we can prove the following remarkable result.

Theorem 1.6.11. If (X, τ_X) is a Hausdorff topological space and (Y, d) is a metric space, then τ_ζ = τ_u.

Proof. First we show that τ_ζ ⊆ τ_u. To this end let f ∈ W(K, U). Then f(K) ⊆ Y is compact and f(K) ⊆ U.

Claim: There exists ε > 0 such that f(K)_ε = {y ∈ Y : d(y, f(K)) < ε} ⊆ U.

Arguing by contradiction, suppose that the claim is not true. Then we can find {y_n}_{n≥1} ⊆ Y \ U such that d(y_n, f(K)) < 1/n. Recall that f(K) ⊆ Y is compact. So, for every n ∈ ℕ there exists v_n ∈ f(K) such that d(y_n, v_n) = d(y_n, f(K)) < 1/n. The compactness of f(K) implies that, by passing to a subsequence if necessary, we have v_n → v ∈ f(K) in Y. Since d(y_n, v_n) < 1/n for all n ∈ ℕ, it follows that y_n → v and, Y \ U being closed, v ∈ (Y \ U) ∩ f(K), a contradiction, since f(K) ⊆ U. This proves that the claim is true.

The claim implies that B_{K,ε}(f) ⊆ W(K, U), that is

τ_ζ ⊆ τ_u.   (1.6.3)

Next we show that the opposite inclusion holds as well. Let f ∈ C(X, Y) and consider a basic τ_u-neighborhood B_{K,ε}(f). Since K is compact Hausdorff, hence regular, for every x ∈ K there exists V_x ∈ N(x) such that f(\overline{V_x} ∩ K) ⊆ U_x with U_x ⊆ Y open and diam U_x < ε. Since K is compact we find x_1, . . . , x_n ∈ K such that K ⊆ ⋃_{k=1}^n V_{x_k}. Let K_{x_k} = \overline{V_{x_k}} ∩ K for k ∈ {1, . . . , n}. Then f ∈ ⋂_{k=1}^n W(K_{x_k}, U_{x_k}) ⊆ B_{K,ε}(f) and so

τ_u ⊆ τ_ζ.   (1.6.4)

From (1.6.3) and (1.6.4) it follows that τ_ζ = τ_u.

We know that τ_p ⊆ τ_ζ (= τ_u if Y is a metric space); see (1.6.1) and Theorem 1.6.11. However, on equicontinuous sets, the two topologies coincide.

Proposition 1.6.12. If (X, τ_X) is a Hausdorff topological space, (Y, d) is a metric space and F ⊆ C(X, Y) is equicontinuous, then τ_p(F) = τ_ζ(F), that is, the two topologies restricted on F coincide.

Proof. Evidently τ_p(F) ⊆ τ_ζ(F). Moreover, Theorem 1.6.11 yields that τ_ζ = τ_u. Therefore, given f ∈ F, a compact K ⊆ X and ε > 0, it suffices to find a basic element B for the τ_p-topology such that

f ∈ B ∩ F ⊆ B_{K,ε}(f) ∩ F.

Let ε1, ε2 > 0 be such that 2ε1 + ε2 ≤ ε. Since F is equicontinuous and K ⊆ X is compact, we find open sets {U_k}_{k=1}^n in X such that K ⊆ ⋃_{k=1}^n U_k and for each k ∈ {1, . . . , n}, each x, u ∈ U_k and each f ∈ F, d(f(x), f(u)) < ε1. We choose x_k ∈ U_k with k ∈ {1, . . . , n} and let

B = {g ∈ C(X, Y) : d(g(x_k), f(x_k)) < ε2 for all k ∈ {1, . . . , n}}.

Let g ∈ B ∩ F. Given x ∈ K, we find k ∈ {1, . . . , n} such that x ∈ U_k. Then we have

d(g(x), g(x_k)) < ε1,   d(g(x_k), f(x_k)) < ε2,   d(f(x_k), f(x)) < ε1,

which implies, by the triangle inequality and the choice of ε1, ε2 > 0, that d(g(x), f(x)) < ε. Hence g ∈ B_{K,ε}(f), thus B ∩ F ⊆ B_{K,ε}(f) ∩ F. This proves that τ_p(F) = τ_ζ(F).
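Equicontinuity at a point (Definition 1.6.9(a)), the hypothesis of Proposition 1.6.12, can be probed numerically. The sketch below is only a sampled estimate under assumptions made for this example: X = Y = ℝ with the usual metric, two hypothetical families `shifts` and `scaled`, and a helper `oscillation` that approximates the supremum of |f(u) − f(x)| over the family and over sampled points u with |u − x| < δ.

```python
import math

def oscillation(family, x, delta, samples=200):
    """Sampled estimate of sup |f(u) - f(x)| over f in family, |u - x| < delta."""
    us = [x - delta + 2.0 * delta * (i + 0.5) / samples for i in range(samples)]
    return max(abs(f(u) - f(x)) for f in family for u in us)

# The translates t -> sin(t + a) are all 1-Lipschitz, hence equicontinuous.
shifts = [lambda t, a=a: math.sin(t + a) for a in range(50)]

# The dilates t -> sin(n t) have unbounded slopes at 0, hence are not
# equicontinuous at x = 0 once n is allowed to grow.
scaled = [lambda t, n=n: math.sin(n * t) for n in range(1, 2001)]

for delta in (0.1, 0.01):
    print(delta, oscillation(shifts, 0.0, delta), oscillation(scaled, 0.0, delta))
```

For the translates the estimate stays below δ, while for the dilates it stays close to 1 however small δ is taken (within the range of n sampled), which is the failure of equicontinuity at 0.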


Proposition 1.6.13. If (X, τ_X) is a Hausdorff topological space, (Y, d) is a metric space, and F ⊆ C(X, Y) is equicontinuous, then \overline{F}^{τ_p} is equicontinuous as well.

Proof. Let x ∈ X and ε > 0. Since F is equicontinuous, there exists U ∈ N(x) such that d(f(u), f(x)) < ε for all u ∈ U and for all f ∈ F. Let g ∈ \overline{F}^{τ_p}. For v ∈ U we introduce V_v = {h ∈ C(X, Y) : d(h(v), g(v))

0 be given and choose ε1, ε2 > 0 such that 2ε1 + ε2 ≤ ε. We can find x_1, . . . , x_n ∈ K such that L_ζ ⊆ ⋃_{k=1}^n B_{ε1}(f̂_k). Since each f̂_k is continuous, we can find U ∈ N(x) such that

d(f̂_k(u), f̂_k(x)) < ε2   for all u ∈ U and for all k ∈ {1, . . . , n}.   (1.6.5)

Let f̂ ∈ L_ζ. Then f̂ ∈ B_{ε1}(f̂_k) for some k ∈ {1, . . . , n}. For every u ∈ U we have

d(f̂(u), f̂_k(u)) < ε1,   d(f̂_k(u), f̂_k(x)) < ε2,   d(f̂_k(x), f̂(x)) < ε1;

see (1.6.5). This gives d(f̂(u), f̂(x)) < ε for all u ∈ U, which implies that L_ζ is equicontinuous, and hence, so is F.

⇐: From Proposition 1.6.13 we know that \overline{F}^{τ_p} is equicontinuous. Then Proposition 1.6.12 implies that \overline{F}^{τ_p} = \overline{F}^{τ_ζ}. Recall that τ_p is the relative product topology on C(X, Y) ⊆ Y^X. Using Tychonoff's Product Theorem (see Theorem 1.4.56), we have that ∏_{x∈X} \overline{F(x)} is compact in the product topology and so \overline{F}^{τ_p} is compact. Therefore, \overline{F}^{τ_ζ} is compact.

A careful inspection of the second part of the proof above reveals that for that part of the result, the local compactness of X is not needed. So, we can state the following version of the Arzela–Ascoli Theorem.

Theorem 1.6.15. If (X, τ_X) is a Hausdorff topological space, (Y, d) is a metric space, and F ⊆ C(X, Y) is a set with the following two properties:
(a) F is equicontinuous;
(b) for every x ∈ X, F(x) = {f(x) : f ∈ F} ⊆ Y is relatively compact,
then \overline{F}^{τ_ζ} is τ_ζ-compact and equicontinuous on X.

When Y = ℝ^N, exploiting the Heine–Borel Theorem, we can have the following particular version of the Arzela–Ascoli Theorem; see Theorem 1.6.14.

Theorem 1.6.16. If (X, τ_X) is a compact topological space and F ⊆ C(X, ℝ^N), then F is compact for the supremum metric topology τ_d̂ if and only if F is equicontinuous, d̂-closed, and bounded, that is, |f(u)| ≤ M for all u ∈ X, all f ∈ F and for some M > 0.

Remark 1.6.17. If X is a compact space and (Y, d) is a metric space, then recall that the supremum metric d̂ or d_∞ is defined by

d̂(f, g) = d_∞(f, g) = max{d(f(x), g(x)) : x ∈ X}.

Evidently, f_n → f in d̂ if and only if f_n → f uniformly on X, that is, for given ε > 0, we can find n0 = n0(ε) ∈ ℕ such that d(f_n(u), f(u)) ≤ ε for all u ∈ X and for all n ≥ n0. It is easy to see that uniform limits of continuous maps are again continuous maps. According to Theorem 1.6.11, the d̂-metric topology depends only on the topology of Y and not on the particular metric d. So, if d1, d2 are two compatible metrics on Y, then the corresponding sup-metrics d̂1, d̂2 are compatible as well. Hence we can view C(X, Y) as a topological space without specifying the particular sup-metric and refer to the topology of uniform convergence on C(X, Y).

Proposition 1.6.18. If X is a compact metrizable space and Y is a separable metrizable space, then the space C(X, Y) with the τ_ζ = τ_u-topology is separable and metrizable.

Proof. On account of Proposition 1.5.40(b) and Remark 1.6.17, it suffices to show that C(X, Y) is second countable. Let D = {x_n}_{n≥1} ⊆ X be a dense set and {U_n}_{n≥1} a countable basis for Y. Let {B_n}_{n≥1} be an enumeration of the countable set of all closed balls with center in D and a rational radius. For n, m ∈ ℕ let W_{n,m} = W(B_n, U_m). We claim that {W_{n,m}}_{n,m≥1} is a countable subbasis for C(X, Y). To this end, let V ⊆ C(X, Y) be open and let f ∈ V. We choose δ > 0 such that

B_{2δ}(f) = {g ∈ C(X, Y) : d̂(g, f) < 2δ} ⊆ V.

1.7 Semicontinuous Functions – Miscellaneous Notions |

65

Let d_Y be a compatible metric on Y and let Y = ⋃_{k≥1} V_k with V_k ∈ {U_n}_{n≥1} and diam V_k < δ. Moreover, let d_X be a compatible metric on X and write the open set f^{-1}(V_k) as a union of d_X-balls with center u_k ∈ X, a rational radius, and closure in f^{-1}(V_k). We have X = ⋃_{k≥1} f^{-1}(V_k) and the compactness of X implies that there exists a finite number of the balls B_n with n ∈ ℕ such that ⋃_{i=1}^k B_{n_i} = X. For each i, choose m_i such that B_{n_i} ⊆ f^{-1}(U_{m_i}). Let g ∈ ⋂_{i=1}^k W(B_{n_i}, U_{m_i}). If x ∈ X, we choose i such that x ∈ B_{n_i} and note that f(x), g(x) ∈ U_{m_i}. Since diam U_{m_i} < δ, we have d_Y(g(x), f(x)) < δ, which gives d̂(g, f) ≤ δ < 2δ. Hence g ∈ B_{2δ}(f) ⊆ V. Therefore, f ∈ ⋂_{i=1}^k W(B_{n_i}, U_{m_i}) ⊆ V and this proves the second countability of C(X, Y).

Remark 1.6.19. Combining Proposition 1.6.18 with Problem 1.21, we conclude that if Y is a Polish space, then so is C(X, Y) equipped with the τ_ζ = τ_u-topology.

1.7 Semicontinuous Functions – Miscellaneous Notions

In this section we examine semicontinuous extended real-valued functions and at the end we introduce some topological notions that arise in various parts of nonlinear analysis. Semicontinuous ℝ*-valued functions, where ℝ* = ℝ ∪ {±∞}, provide a natural framework to study minimization or maximization problems with constraints. Here we will focus on lower semicontinuous ℝ̄ = ℝ ∪ {+∞}-valued functions. Of course with a minus sign all results can be reformulated for upper semicontinuous ℝ̃ = ℝ ∪ {−∞}-valued functions.

So, let X be a set and let φ : X → ℝ̄ = ℝ ∪ {+∞} be a function. We introduce the following sets:

epi φ = {(u, λ) ∈ X × ℝ : φ(u) ≤ λ} is the epigraph of φ,
φ_λ = {u ∈ X : φ(u) ≤ λ} with λ ∈ ℝ is the λ-sublevel set of φ,
dom φ = {u ∈ X : φ(u) < +∞} is the effective domain of φ.

To avoid trivial situations, we will always consider functions with dom φ ≠ ∅. In the optimization literature such functions are called proper. However, in nonlinear analysis, this name is reserved for maps that have the property where the inverse image of a compact set is compact. Note that if {φ_α}_{α∈I} is a family of ℝ̄-valued functions then

epi (sup_{α∈I} φ_α) = ⋂_{α∈I} epi φ_α,   (1.7.1)
epi (inf_{α∈I} φ_α) = ⋃_{α∈I} epi φ_α.   (1.7.2)

Definition 1.7.1. Let (X, τ) be a Hausdorff topological space and φ : X → ℝ̄ = ℝ ∪ {+∞}. We say that φ is τ-lower semicontinuous at x ∈ X if for every λ < φ(x) there exists

U_λ ∈ N(x) such that λ < φ(u) for all u ∈ U_λ. We say that φ is τ-lower semicontinuous if it is τ-lower semicontinuous at every x ∈ X.

Proposition 1.7.2. If (X, τ) is a Hausdorff topological space and φ : X → ℝ̄ a function, then the following statements are equivalent:
(a) φ is τ-lower semicontinuous;
(b) epi φ ⊆ X × ℝ is closed (we consider the product topology on X × ℝ);
(c) for every λ ∈ ℝ, φ_λ ⊆ X is closed;
(d) φ(x) ≤ lim inf_{u→x} φ(u) = sup_{U∈N(x)} inf_{u∈U} φ(u) for all x ∈ X.

Proof. (a) ⇒ (b): Let (u, μ) ∉ epi φ. Then μ < φ(u). Let η ∈ (μ, φ(u)). Then by Definition 1.7.1, there exists U_η ∈ N(u) such that μ < η < φ(v) for all v ∈ U_η. Then

(U_η × (−∞, η)) ∩ epi φ = ∅.

Since U_η × (−∞, η) is a neighborhood of (u, μ) in X × ℝ, we conclude that (X × ℝ) \ epi φ is open, hence epi φ is closed in X × ℝ with the product topology.
(b) ⇒ (c): Note that φ_λ × {λ} = epi φ ∩ (X × {λ}). Therefore φ_λ × {λ} is closed in X × ℝ. But the map u → (u, λ) is a homeomorphism from X onto X × {λ}. Therefore φ_λ is closed.
(c) ⇒ (d): Let λ < φ(x). Since by hypothesis X \ φ_λ is open and x ∈ X \ φ_λ, we can find U ∈ N(x) such that U ⊆ (X \ φ_λ). So, we have λ ≤ inf_U φ, which implies λ ≤ sup_{U∈N(x)} inf_{u∈U} φ(u) = lim inf_{u→x} φ(u). Since λ < φ(x) is arbitrary we let λ ↗ φ(x) to conclude that φ(x) ≤ lim inf_{u→x} φ(u).
(d) ⇒ (a): Let λ < φ(x). By hypothesis λ < sup_{U∈N(x)} inf_{u∈U} φ(u) and thus λ < inf_{u∈U0} φ(u) for some U0 ∈ N(x). Hence, φ is τ-lower semicontinuous at every x ∈ X.

Remark 1.7.3. If φ : X → ℝ̃ = ℝ ∪ {−∞}, then instead we use the hypograph hyp φ = {(u, λ) ∈ X × ℝ : λ ≤ φ(u)} and the λ-superlevel set φ^λ = {u ∈ X : φ(u) ≥ λ}. We have that φ is upper semicontinuous if and only if hyp φ is closed if and only if for every λ ∈ ℝ, φ^λ is closed if and only if φ(x) ≥ lim sup_{u→x} φ(u) = inf_{U∈N(x)} sup_{u∈U} φ(u) for all x ∈ X.

Proposition 1.7.2 leads to some useful stability properties for lower semicontinuous functions.

Proposition 1.7.4. If (X, τ) is a Hausdorff topological space and φ_α : X → ℝ̄ with α ∈ I is a family of τ-lower semicontinuous functions, then the following hold:
(a) sup_{α∈I} φ_α is τ-lower semicontinuous;
(b) if I is finite, then inf_{α∈I} φ_α is τ-lower semicontinuous.

Proof. (a) This follows from (1.7.1) and Proposition 1.7.2.
(b) Since I is finite and the finite union of closed sets is closed, the result follows from (1.7.2) and Proposition 1.7.2.

Similarly, using Proposition 1.7.2, we have the following result.


Proposition 1.7.5. If (X, τ) is a Hausdorff topological space and φ, ψ : X → ℝ are τlower semicontinuous functions, then φ + ψ is τ-lower semicontinuous. On metric spaces semicontinuous functions can be realized as monotone limits of Lipschitz functions. Proposition 1.7.6. If (X, d) is a metric space and φ : X → ℝ is bounded from below, then φ is lower semicontinuous if and only if there exists an increasing sequence of Lipschitz continuous bounded functions φ̂ n : X → ℝ such that φ̂ n (u) ↗ φ(u) for all u ∈ X. Proof. ⇒: For every n ∈ ℕ let φ n : X → ℝ be defined by φ n (u) = inf[φ(x) + nd(x, u) : x ∈ X] .

(1.7.3)

Clearly {φ n }n≥1 is increasing and φ n ≤ φ for every n ∈ ℕ. Moreover, for every v ∈ X we have φ n (u) ≤ φ(x) + nd(x, u) ≤ φ(x) + nd(x, v) + nd(v, u) for all x ∈ X . This gives φ n (u) ≤ φ n (v) + nd(v, u), hence |φ n (u) − φ n (v)| ≤ nd(v, u). Thus each φ n is Lipschitz. ̃ We have φ n (u) ↗ φ(u) ≤ φ(u) for all u ∈ X. Given ε > 0, from (1.7.3), we see that there exists x n ∈ X such that φ(x n ) + nd(x n , u) ≤ φ n (u) + ε .

(1.7.4)

Let η ≤ φ(x) for all x ∈ X. So, from (1.7.4), we have d(x n , u) ≤

1 [φ n (u) + ε − η] . n

(1.7.5)

Hence, if u ∈ dom φ, then d(x n , u) ≤ 1/n[φ(u) + ε − η], which shows that d

xn → u .

(1.7.6)

Hence, if we pass to the limit as n → ∞ in (1.7.4) and use (1.7.6) together with the lower semicontinuity of φ, then φ(u) ≤ φ̃(u) + ε, where φ̃(u) = lim_{n→∞} φ_n(u). Since ε > 0 is arbitrary, we let ε ↘ 0 and obtain φ(u) ≤ φ̃(u), which implies φ(u) = φ̃(u) for all u ∈ dom φ.

If u ∉ dom φ, then we claim that φ̃(u) = +∞. Indeed, if φ̃(u) ∈ ℝ, then from (1.7.5) we have

\[ d(x_n,u) \le \frac{1}{n}\,[\tilde{\varphi}(u) + \varepsilon - \eta]. \]

Hence x_n → u in (X, d). So, as above, we obtain +∞ = φ(u) ≤ φ̃(u) < +∞, a contradiction. Thus φ_n(u) ↗ +∞ for all u ∉ dom φ. Finally, let φ̂_n = min{φ_n, n}. Then φ̂_n is bounded as well.

⇐: Each φ̂_n is continuous, hence lower semicontinuous, and φ = sup_{n≥1} φ̂_n. So φ is lower semicontinuous by Proposition 1.7.4(a).

Remark 1.7.7. If φ : X → ℝ ∪ {−∞} is upper semicontinuous and bounded above, then we can find a decreasing sequence of Lipschitz continuous bounded functions φ̂_n : X → ℝ such that φ̂_n(u) → φ(u) for all u ∈ X as n → ∞.
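The regularization (1.7.3) can be explored numerically. The following is only a sketch under stated assumptions: the space X = [0, 1] is replaced by a finite grid, the test function phi is a lower semicontinuous step function chosen for illustration, and all names (GRID, phi, phi_n, phi_hat_n) are hypothetical and not taken from the text.

```python
# Sketch of the regularization (1.7.3) on a finite grid standing in for X = [0, 1].
# The grid, the test function phi, and all names are illustrative assumptions.

def phi(x):
    # Lower semicontinuous step function: 0 on [0, 1/2], 1 on (1/2, 1].
    return 0.0 if x <= 0.5 else 1.0

GRID = [k / 1000 for k in range(1001)]  # discrete stand-in for X


def phi_n(n, u):
    # phi_n(u) = inf over x in GRID of phi(x) + n * |x - u|, as in (1.7.3)
    return min(phi(x) + n * abs(x - u) for x in GRID)


def phi_hat_n(n, u):
    # Truncation min{phi_n, n}, which makes the approximant bounded.
    return min(phi_n(n, u), float(n))


if __name__ == "__main__":
    for u in (0.25, 0.5, 0.6, 0.9):
        values = [phi_hat_n(n, u) for n in (1, 2, 5, 10, 50)]
        print(u, [round(v, 3) for v in values], "phi(u) =", phi(u))
```

For each fixed u the printed values should be nondecreasing in n and bounded above by phi(u), in line with the proof. A grid computation of this kind can only illustrate the statement; it does not replace the argument.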

From Proposition 1.7.6 and Remark 1.7.7, we infer the following useful result.

Corollary 1.7.8. If (X, d) is a metric space and φ ∈ C_b(X, ℝ), then there exist two sequences of Lipschitz continuous bounded functions ξ_n, η_n : X → ℝ such that
(a) {ξ_n}_{n≥1} is increasing and ξ_n(u) ↗ φ(u) for all u ∈ X;
(b) {η_n}_{n≥1} is decreasing and η_n(u) ↘ φ(u) for all u ∈ X.

In general, pointwise convergence of functions does not imply uniform convergence. However, under additional hypotheses it does. The result is known as “Dini’s Theorem.”

Theorem 1.7.9 (Dini’s Theorem). If (X, τ) is a countably compact Hausdorff topological space, φ_n : X → ℝ with n ∈ ℕ is an increasing (resp. decreasing) sequence of lower (resp. upper) semicontinuous functions and φ_n(u) → φ(u) for all u ∈ X with φ : X → ℝ upper (resp. lower) semicontinuous, then φ is continuous and φ_n → φ uniformly, that is,

\[ \hat{d}(\varphi_n,\varphi) = \sup_{x \in X} |\varphi_n(x) - \varphi(x)| \to 0 \quad \text{as } n \to \infty. \]

Proof. We do the case of an increasing sequence of lower semicontinuous functions. The other case is obtained by multiplying with −1. From Proposition 1.7.4(a), we have that φ is lower semicontinuous as well, hence continuous. Then, for all n ∈ ℕ, φ_n − φ ≤ 0 and it is lower semicontinuous. Given ε > 0, let U_n = {u ∈ X : (φ_n − φ)(u) > −ε}. Each U_n is open by lower semicontinuity, and the pointwise convergence shows that {U_n}_{n≥1} is an open cover of X. So by countable compactness we can find a finite subcover; see Definition 1.4.57(a). Since the sets U_n are increasing, U_n = X for some n ∈ ℕ. Hence −ε < (φ_m − φ)(u) ≤ 0 for all m ≥ n and all u ∈ X. Therefore, φ_n → φ uniformly on X.

Remark 1.7.10. The hypotheses in Theorem 1.7.9 cannot be relaxed. Let φ_n(x) = x^n for all x ∈ [0, 1). Then φ_n ↘ 0 but the convergence is not uniform; the domain [0, 1) is not compact. Moreover, if X = [0, 1], then φ_n(x) = x^n converges pointwise to the characteristic function χ_{{1}} of the singleton {1} and again the convergence is not uniform, since χ_{{1}} is not lower semicontinuous.
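The first counterexample of Remark 1.7.10 can be illustrated numerically. The sketch below compares the sup-distance of φ_n(x) = x^n to its limit 0 on a sample of the compact interval [0, 1/2] and on a sample of [0, 1) whose points approach 1. The sample sets and the helper name sup_gap are assumptions made for this illustration; a finite sample can only suggest, not prove, the failure of uniform convergence.

```python
# Numerical illustration of Remark 1.7.10 (sample sets and names are illustrative assumptions).

def sup_gap(n, points):
    # Largest value of |x**n - 0| over the sample points,
    # a stand-in for sup |phi_n - 0| with phi_n(x) = x**n.
    return max(abs(x ** n) for x in points)


# Sample of the compact interval [0, 1/2] and of the non-compact interval [0, 1).
compact_pts = [0.5 * k / 1000 for k in range(1001)]
noncompact_pts = [1 - 10.0 ** (-j) for j in range(1, 13)]  # points creeping up to 1

if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, sup_gap(n, compact_pts), sup_gap(n, noncompact_pts))
```

On the compact sample the gap equals (1/2)^n and decays to 0, while on the second sample it stays close to 1 for every n tried.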
Note that the characteristic function

\[ \chi_C(x) = \begin{cases} 1 & \text{if } x \in C, \\ 0 & \text{if } x \notin C \end{cases} \]

of a closed set C is only upper semicontinuous.

Next we introduce some topological notions that are used often in problems of nonlinear analysis.

Definition 1.7.11. Let (X, τ) be a Hausdorff topological space and A ⊆ X. We say that A is a retract of X if there is a continuous map r : X → A such that r|_A = id_A. The map r : X → A is called a retraction.

Remark 1.7.12. Equivalently, we can say that A ⊆ X is a retract of X if id_A is continuously extendable to X. The concept of retracts is a topological notion, that is, if h : X → Y is a homeomorphism and A ⊆ X is a retract of X, then h(A) is a retract of Y.

Example 1.7.13.
(a) X and, for u ∈ X, the singletons {u} are retracts of X.


(b) If B̄_1^n = {u ∈ ℝ^n : |u| ≤ 1} and S^{n−1} = {u ∈ ℝ^n : |u| = 1}, then B̄_1^n is a retract of ℝ^n with a retraction given by

\[ r(u) = \begin{cases} \dfrac{u}{|u|} & \text{if } |u| \ge 1, \\ u & \text{if } |u| < 1, \end{cases} \]

while S^{n−1} is a retract of ℝ^n \ {0} with a retraction given by r(u) = u/|u| for all u ∈ ℝ^n \ {0}.
(c) Every nonempty closed subset of the Polish space ℕ^∞ is a retract of ℕ^∞.

Proposition 1.7.14. If (X, τ) is a Hausdorff topological space and A is a retract of X, then A is closed.

Proof. Arguing by contradiction, suppose that A is not closed and let x ∈ Ā \ A. Then, for a retraction r, we have r(x) ≠ x, since r(x) ∈ A, and so we can find U ∈ N(x), V ∈ N(r(x)) such that U ∩ V = ∅, since X is assumed to be Hausdorff. Because of the continuity of r, we may assume that r(U) ⊆ V, shrinking U if necessary. Let u ∈ A ∩ U; such a point exists since x ∈ Ā. Then r(u) = u ∈ V, and so u ∈ U ∩ V, a contradiction.

Proposition 1.7.15. If X is a Hausdorff topological space and A ⊆ X, then A is a retract of X if and only if for every Hausdorff topological space Y every continuous map f : A → Y is continuously extendable to all of X.

Proof. ⇒: Let r : X → A be a retraction. Then f ∘ r : X → Y is a continuous extension of f.
⇐: Let Y = A and f = id_A. Then id_A is continuously extendable to X and so, according to Remark 1.7.12, A is a retract of X.

Definition 1.7.16. Let X, Y be two Hausdorff topological spaces and f, g : X → Y two continuous maps. A homotopy from f to g is a continuous map h : [0, 1] × X → Y such that h(0, ⋅) = f(⋅) and h(1, ⋅) = g(⋅). Then we say that f and g are homotopic and write f ≃ g (or f ≃ g (h) if we need to emphasize the homotopy).

Remark 1.7.17. We can think of the homotopy as a time-dependent deformation of f into g, with the parameter t ∈ [0, 1] being the time, as time moves from 0 to 1. This deformation is continuous, so there are no breaks or jumps.

Proposition 1.7.18. ≃ is an equivalence relation on C(X, Y).

Proof. First, we see that f ≃ f via the constant homotopy h(t, ⋅) = f(⋅) for all t ∈ [0, 1]. Now let f, g ∈ C(X, Y) and suppose that f ≃ g. Denote by h : [0, 1] × X → Y the corresponding homotopy. Then h̃(t, x) = h(1 − t, x) for all t ∈ [0, 1] and for all x ∈ X is a homotopy from g to f. Therefore g ≃ f.
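Finally, if f ≃ g (h_1) and g ≃ k (h_2), then

\[ h(t,x) = \begin{cases} h_1(2t, x) & \text{if } t \in [0, \tfrac12], \\ h_2(2t-1, x) & \text{if } t \in [\tfrac12, 1], \end{cases} \]

for all x ∈ X, is a homotopy from f to k. The two branches agree at t = 1/2 because h_1(1, ⋅) = g = h_2(0, ⋅), so h is continuous; see Proposition 1.1.37. Hence, f ≃ k.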
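The concatenation used for transitivity can be sketched as a higher-order function. The example maps f = sin, g = 0, k = cos on ℝ and the straight-line homotopies h1, h2 are assumptions chosen only for illustration; the helper name concat is hypothetical.

```python
import math


def concat(h1, h2):
    # Concatenation from the proof of Proposition 1.7.18:
    # run h1 at double speed on [0, 1/2], then h2 at double speed on [1/2, 1].
    def h(t, x):
        return h1(2 * t, x) if t <= 0.5 else h2(2 * t - 1, x)
    return h


# Illustrative maps R -> R (assumptions for this sketch): f = sin, g = 0, k = cos.
def h1(t, x):
    # Straight-line homotopy from f = sin to g = 0.
    return (1 - t) * math.sin(x)


def h2(t, x):
    # Straight-line homotopy from g = 0 to k = cos.
    return t * math.cos(x)


h = concat(h1, h2)
```

Here h(0, x) = sin x and h(1, x) = cos x, and at t = 1/2 both branches give the intermediate map g, which is exactly the compatibility condition needed for continuity.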

Definition 1.7.19. Let X, Y be two Hausdorff topological spaces.
(a) If f ∈ C(X, Y) is homotopic to a constant map, then we say that f is nullhomotopic and we write f ≃ 0.
(b) We say that the space X is contractible if id_X is nullhomotopic.
(c) If φ ∈ C(X, Y) and ψ ∈ C(Y, X), then we say that ψ is a homotopy inverse of φ if ψ ∘ φ ≃ id_X and φ ∘ ψ ≃ id_Y. If φ has a homotopy inverse, then φ is said to be a homotopy equivalence. In this case we say that X and Y are homotopy equivalent (or of the same homotopy type).

Remark 1.7.20. It is easy to check by applying Proposition 1.7.18 that homotopy equivalence is an equivalence relation. Note that every convex set in ℝ^N is contractible and, more generally, every star-shaped set in ℝ^N is contractible. Recall that a set A ⊆ ℝ^N is star-shaped if there exists u_0 ∈ A such that for every u ∈ A, the line segment [u_0, u] = {(1 − t)u_0 + tu : 0 ≤ t ≤ 1} is contained in A. In general, a contractible space is one that can be continuously shrunk to a point. Indeed, according to Definition 1.7.19(b), there exist a point x_0 ∈ X and a continuous map h : [0, 1] × X → X such that h(0, x) = x and h(1, x) = x_0 for all x ∈ X.

Definition 1.7.21. Let X be a Hausdorff topological space.
(a) A continuous map h : [0, 1] × X → X is a deformation of X if h(0, ⋅) = id_X. Moreover, if h(1, X) ⊆ A ⊆ X, then we say that h is a deformation of X onto A.
(b) A closed set A ⊆ X is a (resp. strong) deformation retract of X if there exists a deformation h : [0, 1] × X → X of X onto A such that h(1, ⋅)|_A = id_A (resp. such that h(t, ⋅)|_A = id_A for all t ∈ [0, 1]). The deformation h is called a (resp. strong) deformation retraction.

Remark 1.7.22. Note that A ⊆ X is a deformation retract if and only if there exists a retraction r : X → A (see Definition 1.7.11) such that i_A ∘ r ≃ id_X; see Definition 1.7.16. Then, since r ∘ i_A = id_A, we infer that the inclusion map i_A : A → X is a homotopy equivalence.

Example 1.7.23.
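From Example 1.7.13(b), we know that S^n is a retract of ℝ^{n+1} \ {0}. In fact, it is a strong deformation retract. Indeed, consider the deformation h : [0, 1] × (ℝ^{n+1} \ {0}) → ℝ^{n+1} \ {0} defined by

\[ h(t,x) = (1-t)\,x + t\,\frac{x}{|x|} \quad \text{for all } t \in [0,1] \text{ and for all } x \in \mathbb{R}^{n+1} \setminus \{0\}. \]

Note that h(t, x) = (1 − t + t/|x|) x is a positive multiple of x, so it never vanishes, and that h(t, x) = x whenever |x| = 1.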
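The defining properties of this strong deformation retraction can be checked on sample points. The sketch below represents points of ℝ^{n+1} as tuples of floats; the helper names norm and h are assumptions of this illustration.

```python
import math


def norm(x):
    # Euclidean norm of a point given as a tuple of floats.
    return math.sqrt(sum(c * c for c in x))


def h(t, x):
    # h(t, x) = (1 - t) x + t x/|x| = (1 - t + t/|x|) x, defined for x != 0.
    r = norm(x)
    if r == 0.0:
        raise ValueError("h is only defined away from the origin")
    scale = (1 - t) + t / r
    return tuple(scale * c for c in x)


if __name__ == "__main__":
    x = (3.0, 4.0, 0.0)
    for t in (0.0, 0.5, 1.0):
        print(t, h(t, x), norm(h(t, x)))
```

For the sample point above, the norm of h(t, x) moves from |x| = 5 at t = 0 to 1 at t = 1, while points already on the sphere are left fixed up to rounding.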

Directly from the previous definitions we have the following result. Proposition 1.7.24. If X is a Hausdorff topological space, then the following statements are equivalent: (a) X is contractible. (b) X is homotopy equivalent to a singleton. (c) Any point of X is a deformation retract of X.


Proposition 1.7.25. If Y is a Hausdorff topological space and B̄_1^{n+1} = {u ∈ ℝ^{n+1} : |u| ≤ 1}, then f ∈ C(S^n, Y) is nullhomotopic if and only if there exists f̂ ∈ C(B̄_1^{n+1}, Y) such that f̂|_{S^n} = f, that is, f̂ is a continuous extension of f to B̄_1^{n+1}.

Proof. ⇒: Since 0 ≃ f, there exists a homotopy h : [0, 1] × S^n → Y such that h(0, ⋅) = u_0 and h(1, ⋅) = f. Let

\[ \hat f(x) = \begin{cases} u_0 & \text{if } 0 \le |x| \le \tfrac12, \\ h\bigl(2|x| - 1, \tfrac{x}{|x|}\bigr) & \text{if } \tfrac12 \le |x| \le 1. \end{cases} \]

Then f̂ ∈ C(B̄_1^{n+1}, Y) and f̂|_{S^n} = f.

⇐: Let h(t, x) = f̂(tx) for all t ∈ [0, 1] and for all x ∈ S^n. Then, using this homotopy, we see that 0 ≃ f.
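The construction of the extension in the first half of the proof can be sketched as follows. The concrete data are assumptions made for illustration only: Y = ℝ², f is the inclusion of S¹ into ℝ², and the nullhomotopy is h(t, x) = t x with constant value u0 = (0, 0). The helper names norm and extend are hypothetical.

```python
import math


def norm(x):
    # Euclidean norm of a point given as a tuple of floats.
    return math.sqrt(sum(c * c for c in x))


def extend(h, u0):
    # Builds f_hat on the closed unit ball from a homotopy h with
    # h(0, .) = u0 (constant) and h(1, .) = f on the unit sphere.
    def f_hat(x):
        r = norm(x)
        if r <= 0.5:
            return u0
        direction = tuple(c / r for c in x)
        return h(2 * r - 1, direction)
    return f_hat


# Illustrative data (assumptions): Y = R^2, f = inclusion of S^1 into R^2,
# nullhomotopic in R^2 via h(t, x) = t x with u0 = (0, 0).
def h(t, x):
    return tuple(t * c for c in x)


f_hat = extend(h, (0.0, 0.0))
```

The inner half-ball is collapsed to u0 and each radial segment from radius 1/2 to radius 1 traces the homotopy, so the two branches match on the sphere of radius 1/2.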

The next notion is related to the Tietze Extension Theorem; see Theorem 1.2.44.

Definition 1.7.26. A Hausdorff topological space X is said to be an absolute retract (AR for short) if the following are true:
(a) X is metrizable;
(b) for any metrizable space Y and any closed set A ⊆ Y, each f ∈ C(A, X) can be extended to some f̂ ∈ C(Y, X), that is, f̂|_A = f.

Remark 1.7.27. So an AR can replace ℝ in the Tietze Extension Theorem (see Theorem 1.2.44) for metric spaces.

Proposition 1.7.28. If X is an AR and C is a retract of X, then C is an AR.

Proof. Let Y be a metrizable space, A ⊆ Y a closed set, and f ∈ C(A, C). Let r : X → C be a retraction. Since X is an AR, there exists f̂ ∈ C(Y, X) such that f̂|_A = f. Then f̂_0 = r ∘ f̂ ∈ C(Y, C) is the desired extension of f.

Now we will identify some useful spaces that are AR. The first result is known as “Dugundji’s Extension Theorem.”

Theorem 1.7.29 (Dugundji’s Extension Theorem). If X is a metrizable space, A ⊆ X is closed, Y is a locally convex space, and f ∈ C(A, Y), then there exists f̂ ∈ C(X, Y) such that f̂|_A = f and f̂(X) ⊆ conv f(A).

Proof. Let d be a compatible metric on X. For x ∈ X and r > 0, let B(x, r) = {u ∈ X : d(u, x) < r}. We consider the family {B(x, ½ d(x, A)) : x ∈ X \ A}. This is an open cover of X \ A. Since X \ A is paracompact (see Theorem 1.5.69), there exists a locally finite refinement {U_α}_{α∈I}. For each U_α choose x_α ∈ X \ A such that

\[ U_\alpha \subseteq B\bigl(x_\alpha, \tfrac12\, d(x_\alpha, A)\bigr); \tag{1.7.7} \]

see Definition 1.4.79(a). We choose u_α ∈ A such that

\[ d(x_\alpha, u_\alpha) \le 2\,d(x_\alpha, A). \tag{1.7.8} \]

We have

\[ d(x_\alpha, A) \le 2\,d(x, A) \quad \text{for all } x \in U_\alpha. \tag{1.7.9} \]

To see (1.7.9), note that for all x ∈ U_α

\[ d(x_\alpha, A) \le d(x_\alpha, x) + d(x, A) \le \tfrac12\, d(x_\alpha, A) + d(x, A); \]

see (1.7.7). Hence, (1.7.9) holds. Moreover, we have

\[ d(u, u_\alpha) \le 6\,d(u, x) \quad \text{for all } u \in A \text{ and all } x \in U_\alpha. \tag{1.7.10} \]

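Again, to see (1.7.10), note that, because of (1.7.7), (1.7.8), and (1.7.9), for all u ∈ A and for all x ∈ U_α,

\[ \begin{aligned} d(u, u_\alpha) &\le d(u, x) + d(x, x_\alpha) + d(x_\alpha, u_\alpha) \\ &\le d(u, x) + \tfrac12\, d(x_\alpha, A) + 2\,d(x_\alpha, A) \\ &\le d(u, x) + d(x, A) + 4\,d(x, A) \\ &\le 6\,d(u, x), \end{aligned} \]

where the last step uses d(x, A) ≤ d(u, x) for u ∈ A. Thus, (1.7.10) holds. Invoking Theorem 1.4.86, there exists a partition of unity {ξ_α}_{α∈I} subordinated to the cover {U_α}_{α∈I}. We define

\[ \hat f(u) = \begin{cases} f(u) & \text{if } u \in A, \\ \sum_{\alpha \in I} \xi_\alpha(u)\, f(u_\alpha) & \text{if } u \in X \setminus A. \end{cases} \tag{1.7.11} \]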

Clearly, f̂|_A = f and f̂ is continuous on the open set X \ A. We need to show the continuity of f̂ at the points of A. Let u ∈ A and V ∈ N(f(u)). Since Y is locally convex and f is continuous at u, we can find a convex set C and a δ > 0 such that

\[ f\bigl(A \cap B_\delta(u)\bigr) \subseteq C \subseteq V. \tag{1.7.12} \]

Let x be any point of B_{δ/6}(u) \ A. Since the cover {U_α}_{α∈I} is locally finite, x belongs to only finitely many sets U_{α_1}, . . . , U_{α_n}. Then d(x, u) < δ/6 and, since x ∈ U_{α_i}, we have d(u, u_{α_i}) < δ for all i ∈ {1, . . . , n}; see (1.7.10). This implies that u_{α_i} ∈ A ∩ B_δ(u) for all i ∈ {1, . . . , n}. Because of (1.7.11), f̂(x) is a convex combination of the points f(u_{α_i}) ∈ C, and since C is convex it follows that f̂(x) ∈ C. Therefore, due to (1.7.12), f̂(B_{δ/6}(u)) ⊆ V. Hence f̂ is continuous at every point of A.

Corollary 1.7.30. If C is a convex subset of a locally convex space X and C is metrizable, then C is an AR.

Next we show that in an infinite dimensional normed space X, the unit sphere ∂B_1 = {u ∈ X : ‖u‖ = 1} is an AR. To do this we will need the following remarkable result due to Klee [176].


Theorem 1.7.31. If X is an infinite dimensional normed space and K ⊆ X is compact, then X \ K and X are homeomorphic.

Using this theorem, we can prove the following important result.

Theorem 1.7.32. If X is an infinite dimensional normed space, then ∂B_1 = {u ∈ X : ‖u‖ = 1} is an AR and a retract of X.

Proof. By Theorem 1.7.31, X and X \ {0} are homeomorphic. Due to Corollary 1.7.30, X is an AR. Hence X \ {0} is an AR as well. Applying the radial retraction r : X \ {0} → ∂B_1 defined by r(u) = u/‖u‖ for all u ∈ X \ {0}, we see that ∂B_1 is a retract of X \ {0}, hence an AR; see Proposition 1.7.28. Since ∂B_1 is closed in X, the identity of ∂B_1 extends to a continuous map from X into ∂B_1 by Definition 1.7.26(b). Therefore we conclude that ∂B_1 is an AR and a retract of X.

Remark 1.7.33. The result fails if X is finite dimensional. We will show this in Section 6.4 by using fixed point theory.

1.8 Remarks

(1.1) Point-set topology emerged as a coherent field of mathematics with Hausdorff’s 1914 book [140]. Hausdorff found the right set of axioms to introduce the notion of topology in a general setting. He provided a unified framework for all previous topological research. Abstract spaces were first introduced by Fréchet [117] and Riesz [240]. The notion of a subbasis (see Definition 1.1.3) is due to Bourbaki [42]. The books of Choquet [65], Dugundji [91], Kelley [172], Kuratowski [183, 184], Munkres [226], Nagata [228], and Willard [309] are excellent references for all topics of point-set topology discussed here.

(1.2) The Hausdorff property (see Definition 1.2.1) was among the axioms for a topology used by Hausdorff. Before Hausdorff spaces, there was a more general class, the T_1-spaces introduced by Fréchet and Riesz.

Definition 1.8.1. A topological space X is a T_1-space if and only if for every pair of distinct points x, u ∈ X, there is a neighborhood of each not containing the other.

Remark 1.8.2. In such spaces singletons are closed sets.

Regular spaces (see Definition 1.2.7) were introduced by Vietoris [294] and the normality property is due to Tietze [285]. Many authors define regularity and normality for T_1-spaces (see Definition 1.8.1): for example, Kelley [172] and Munkres [226]. Here we follow Dugundji [91]. Urysohn’s Lemma (see Theorem 1.2.17) was proven by Urysohn [289]. The companion Theorem 1.2.44 (Tietze Extension Theorem) was proven by Tietze [284]. The notion of complete regularity (see Definition 1.2.19) is due to Urysohn [289]. The notions of first and second countability (see Definition 1.2.20) were defined by Hausdorff [140] while the notion of separability is due to Fréchet [117]. The Lindelöf property (see Definition 1.2.26(b)) goes back to Lindelöf [200] for Euclidean spaces. The general study of Lindelöf spaces started with the paper of Kuratowski–Sierpinski [182]. E. H. Moore [219] and E. H. Moore–Smith [220] developed the general theory of convergence using nets, although the term “net” is due to Kelley [171]. Subnets (see Definition 1.2.38) were introduced by E. H. Moore [221] and studied in detail by Kelley [171]. There is an alternative approach using filters instead of nets. This approach is used by Bourbaki [45].

(1.3) Weak topologies are discussed in Bourbaki [45] under the name “initial topologies.” Moreover, quotient topologies were first studied by Alexandrov [4] and R. L. Moore [223]. Weak topologies are important in Banach space theory.

(1.4) The notion of connectedness (see Definition 1.4.23(b)) is even older and appears in the work of Weierstraß. Locally connected spaces (see Definition 1.4.34) were introduced by Hahn [135] and are discussed in detail in the books of Dugundji [91] and Kuratowski [184]. Here is another notion of “connectedness” for metric spaces that can be traced back to the work of Cantor.

Definition 1.8.3. A metric space (X, d) is said to be well-chained (or well-linked) if for every pair (x, u) ∈ X × X and every ε > 0 there exists a finite sequence v_1, . . . , v_n of points in X such that v_1 = x, v_n = u and d(v_k, v_{k+1}) ≤ ε for all k ∈ {1, . . . , n − 1}. That means x and u can be joined by a chain of steps of length at most ε.

Proposition 1.8.4. Every connected metric space is well-chained. For compact metric spaces we have “connected ⇐⇒ well-chained.”

The term “compact space” is due to Fréchet [117] who used it to describe sequential compactness of metric spaces. Hausdorff [140] observed that the sequential definition of compactness is equivalent to the general definition (see Definition 1.4.42) for metric spaces.
Alexandrov–Urysohn [5] used Definition 1.4.42 to describe compact spaces and called them “bicompact spaces.” The Product Theorem of Tychonoff (see Theorem 1.4.56) was proven by Tychonoff [288] and showed that Definition 1.4.42 is the right general notion of compactness, since it passes to arbitrary products. Local compactness was introduced by Alexandrov [3] and Tietze [285]. For a topological vector space, local compactness is equivalent to finite dimensionality. Local compactness is important in integration theory and in the theory of topological groups. The problem of compactification was initiated by Alexandrov [3] who introduced the one-point compactification; see Definition 1.4.74. Paracompactness was defined by Dieudonné [81] with important contributions of Michael [213, 215, 216].

(1.5) The extension of topological considerations beyond the realm of Euclidean spaces was achieved by Fréchet [117] who introduced metric spaces and allowed the “points” under consideration to be abstract objects and not real numbers or real vectors. The idea of completion of metric spaces can be traced back to Cauchy who tried to define irrational numbers as the limits of Cauchy sequences of rational numbers. The notion of complete metric space can be found in Fréchet [117] and the general completion construction is due to Hausdorff [140]. The supremum metric (see Definition 1.5.28), although attributed to Fréchet, was first used by Weierstraß back in 1885. The systematic study of continuous maps and homeomorphisms started with Fréchet [117] although the idea of homeomorphism (but in a less general context) was used by Poincaré back in 1895. Next we present an important theorem that gives us conditions under which a Hausdorff topological space is metrizable. The result is due to Urysohn [290] and is known as the “Urysohn Metrization Theorem.”

Theorem 1.8.5 (Urysohn Metrization Theorem). Every second countable regular topological space is metrizable.

Polish spaces are discussed in Bourbaki [45] and Souslin spaces in L. Schwartz [268]. More about them can be found in the Remarks of Chapter 2. The notions of first and second category spaces (see Definition 1.5.62(b),(c)) were introduced by Baire [20] who also proved Theorem 1.5.68(b). Theorem 1.5.68(a) is due to R. L. Moore [222] and Theorem 1.5.69 is due to A. H. Stone [278].

(1.6) The compact-open topology (see Definition 1.6.1) was defined and studied in detail by Arens [10] and Fox [116]. The Arzela–Ascoli Theorem (see Theorem 1.6.14) was first proven for C[0, 1] by Arzela [11] (the necessary part) and by Ascoli [12] (the sufficient part).

Definition 1.8.6. A Hausdorff topological space X is a k-space (or a compactly generated space) if the following condition holds: “C ⊆ X is closed if and only if C ∩ K is closed for every K ⊆ X compact.”

Theorem 1.8.7.
(a) Every locally compact space is a k-space.
(b) Every first countable space is a k-space.

Remark 1.8.8. In particular a metric space is a k-space.

This leads us to the following generalization of Theorem 1.6.14.

Theorem 1.8.9. Theorem 1.6.14 remains true if Y is only a k-space (not necessarily a metric space).
In this general form the result is due to Kelley [172, pp. 233-234].

(1.7) For further results on semicontinuous functions we refer to Dal Maso [70]. The next notion is important in variational problems.

Definition 1.8.10. A function φ : X → ℝ̄ = ℝ ∪ {+∞} is said to be coercive (resp. sequentially coercive) if for every λ ∈ ℝ the sublevel set φ^λ = {x ∈ X : φ(x) ≤ λ} is relatively compact (resp. relatively sequentially compact).

Remark 1.8.11. Sequential coercivity implies coercivity. Another name for coercivity is inf-compactness (resp. sequential inf-compactness). Note that lower semicontinuity and coercivity are antagonistic notions. More precisely, let τ_1, τ_2 be two Hausdorff topologies on X and assume that τ_2 ⊆ τ_1. Then for a function φ : X → ℝ̄ = ℝ ∪ {+∞} we have that “φ is τ_2-lower semicontinuous” implies “φ is τ_1-lower semicontinuous” as well as “φ is τ_1-coercive” implies “φ is τ_2-coercive.” A balance between these two properties leads to the choice of a good topology for variational analysis.

For additional information on retracts, absolute retracts, homotopies, etc. we refer to Borsuk [40], Hu [159] and Granas–Dugundji [133].

Problems

Problem 1.1. Suppose that X, Y are Hausdorff topological spaces and f : X → Y is a continuous map. Show that the set C = {(x, u) ∈ X × X : f(x) = f(u)} is closed in X × X with the product topology.

Problem 1.2. Suppose that X, Y are Hausdorff topological spaces and f, g : X → Y are continuous maps. Show that {x ∈ X : f(x) = g(x)} is closed in X.

Problem 1.3. Show that every subspace of a completely regular space is completely regular. Moreover show that X = ∏_{α∈I} X_α with the product topology is completely regular if and only if each factor space X_α is completely regular.

Problem 1.4. Show that X is completely regular if and only if it is homeomorphic to a subspace of some cube.

Problem 1.5. Show that a topological space X is Hausdorff if and only if the diagonal D = {(u, u) ∈ X × X : u ∈ X} is closed in X × X with the product topology.

Problem 1.6. Suppose that X is a Hausdorff topological space and let {u_n}_{n≥1} ⊆ X be a sequence such that u_n → u ∈ X. Show that the set K = {u_n}_{n≥1} ∪ {u} is compact. Is the result true for nets? Justify your answer.

Problem 1.7. Show that a regular Lindelöf space is normal.

Problem 1.8. Suppose that X, Y are Hausdorff topological spaces, Y is compact and f : X → Y. Show that f is continuous if and only if Gr f = {(u, y) ∈ X × Y : y = f(u)} is closed in X × Y with the product topology.

Problem 1.9. Suppose that {X_α}_{α∈I} are Hausdorff topological spaces and K_α ⊆ X_α with α ∈ I are compact sets. Let U ⊆ X = ∏_{α∈I} X_α be an open set for the product topology such that ∏_{α∈I} K_α ⊆ U. Show that there exists a basic open set V (for the product topology) such that ∏_{α∈I} K_α ⊆ V ⊆ U.


Problem 1.10. Let X, Y be Hausdorff topological spaces and let f : X → Y be a map with Gr f = {(u, y) ∈ X × Y : y = f(u)}, which is closed in X × Y with the product topology. Show that for every compact K ⊆ Y, f^{−1}(K) ⊆ X is closed.

Problem 1.11. Let X be a locally compact topological space. Show that X is second countable if and only if it is separable and metrizable.

Problem 1.12. Let X, Y be Hausdorff topological spaces and let A ⊆ X, B ⊆ Y be nonempty sets. Show that A × B is closed (resp. open, dense) in X × Y with the product topology if and only if A and B are closed (resp. open, dense) in X and Y, respectively.

Problem 1.13. Suppose that X is a normal topological space and A ⊆ X is closed. Show that the following statements are equivalent:
(a) A is a G_δ-set.
(b) There exists a continuous map f : X → [0, 1] such that A = f^{−1}(0).
(c) For every closed C ⊆ X with A ∩ C = ∅, there exists a continuous function f : X → [0, 1] such that f^{−1}(0) = A and f(C) = 1.

Problem 1.14. Let (X, τ) be a Hausdorff topological space and L ⊆ τ be a subbasis of the topology. Assume that every L-cover of X admits a finite subcover. Show that (X, τ) is compact.
Remark: this result is known as “Alexandrov’s Subbasis Theorem.”

Problem 1.15. Let (X, d) be a metric space. Show that there exists a normed space V and an isometry ξ : X → V such that ξ(X) ⊆ V is closed.
Remark: this result is known as the “Arens–Eells Embedding Theorem.”

Problem 1.16. Let A ⊆ ℝ^N be connected and let A_ε = {u ∈ ℝ^N : d(u, A) < ε}. Show that A_ε is connected and path-connected.

Problem 1.17. Let X be a Hausdorff topological space that is connected and let A be a proper nonempty subset of X. Show that bd A ≠ ∅.

Problem 1.18. Let X be a Hausdorff topological space that is connected and A ⊆ X. Assume that bd A is connected. Show that A is connected as well.

Problem 1.19. Let X be a Hausdorff topological space and A ⊆ X a connected set. Consider a set D ⊆ X such that A ∩ D ≠ ∅ and A ∩ (X \ D) ≠ ∅. Show that A ∩ bd D ≠ ∅.
Problem 1.20. Let X be a Hausdorff topological space, {K_α}_{α∈I} a family of compact subsets of X and U ⊆ X an open set such that ⋂_{α∈I} K_α ⊆ U. Show that there exists a finite F ⊆ I such that ⋂_{α∈F} K_α ⊆ U.

Problem 1.21. Let X be a compact topological space and (Y, d) a metric space. On C(X, Y) we consider the supremum metric d_∞; see Definition 1.5.28. Show that C(X, Y) is d_∞-complete if and only if Y is d-complete.

Problem 1.22. Show that a compact metric space cannot be isometric to a proper subset of itself.

Problem 1.23. Let (X, d) be a compact metric space. Show that:
(a) Every nonexpansive map f : X → X (see Remark 1.5.23) is an isometry.
(b) If f : X → X satisfies d(x, u) ≤ d(f(x), f(u)) for all x, u ∈ X, then f is an isometry.

Problem 1.24. Let X be a noncompact, locally compact Hausdorff topological space and let X̂ be its one-point Alexandrov compactification; see Theorem 1.4.75. Show that X̂ is metrizable if and only if X is second countable.

Problem 1.25. Let (X, d) and (Y, ρ) be two metric spaces. Show the following two statements:
(a) If f : X → Y is continuous, then there exists an equivalent metric d̂ on X such that f : (X, d̂) → (Y, ρ) is Lipschitz continuous.
(b) If L is a countable family of continuous functions from X into Y, then there exists an equivalent metric d̂ on X and an equivalent metric ρ̂ on Y such that each f ∈ L with f : (X, d̂) → (Y, ρ̂) is Lipschitz continuous.

Problem 1.26. Let X be a Hausdorff topological space, (Y, d) a metric space, f : X → Y a map, and D_f = {x ∈ X : f is not continuous at x}. Show that D_f is an F_σ-set.

Problem 1.27. Is there a function f : [0, 1] → ℝ with D_f being the irrational numbers in [0, 1] (see Problem 1.26)? Justify your answer.

Problem 1.28. Let X be a Hausdorff topological space and φ : X → ℝ̄ = ℝ ∪ {+∞} a coercive and lower semicontinuous (resp. sequentially coercive and sequentially lower semicontinuous) function. Show that there exists u_0 ∈ X such that φ(u_0) = inf[φ(u) : u ∈ X].

Problem 1.29. Let φ : ℝ^N → ℝ be a function such that lim_{|u|→∞} φ(u)/|u| > 0. Show that φ is coercive in the sense of Definition 1.8.10.

Problem 1.30. Let X, Y be metrizable spaces with Y compact and φ : X × Y → ℝ̄ = ℝ ∪ {+∞} lower semicontinuous. Let m(u) = inf[φ(u, y) : y ∈ Y]. Show that m : X → ℝ̄ is lower semicontinuous and for every u ∈ X there exists y_0 ∈ Y such that m(u) = φ(u, y_0).

Problem 1.31. Suppose that X is a k-space (see Definition 1.8.6) and Y is a Hausdorff topological space. Show that f : X → Y is continuous if and only if f|_K is continuous for every compact K ⊆ X.

Problem 1.32. Let X be a metric space, A ⊆ X closed, and V ⊆ [0, 1] × X an open set such that [0, 1] × A ⊆ V. Show that there exists an open set U ⊆ X such that A ⊆ U and [0, 1] × U ⊆ V.

Problem 1.33. Let X be a Hausdorff topological space and A ⊆ X closed. Show that A is a deformation retract of X if and only if A is a retract of X and X is deformable into A.

Problem 1.34. Let X be an AR. Show that any open set U ⊆ X is also an AR.


Problem 1.35. Show that ℚ is not topologically complete.

Problem 1.36. Let

\[ \chi_{\mathbb{Q}}(x) = \begin{cases} 1 & \text{if } x \in \mathbb{Q}, \\ 0 & \text{if } x \notin \mathbb{Q} \end{cases} \]

be the characteristic function of the rationals. Show that χ_ℚ is not the pointwise limit of a sequence of continuous functions.

Problem 1.37. Let (X, d) be a compact metric space and f : X → X an isometry. Show that f is surjective.

Problem 1.38. Is the pointwise limit of lower semicontinuous functions a lower semicontinuous function? How about the uniform limit? Justify your answer.

Problem 1.39. Show that the set of irrational numbers ℝ \ ℚ is topologically complete.

Problem 1.40. Let X, Y be Hausdorff topological spaces and f : X → Y. Show that Gr f = {(u, y) ∈ X × Y : y = f(u)} is a retract of X × Y.

Problem 1.41. Let (X, τ) be a compact topological space and suppose that there exists a countable, separating family F of continuous functions f : X → Y with (Y, d) a metric space. Show that τ is metrizable.

Problem 1.42. Show that every locally compact Souslin space is Polish.

Problem 1.43. Let (X, d) be a metric space and C_1, C_2 ⊆ X nonempty, disjoint, closed sets with C_2 compact. Show that d(C_1, C_2) = inf[d(u, v) : u ∈ C_1, v ∈ C_2] > 0.

Problem 1.44. Let X be a locally compact and σ-compact topological space. Show that every open cover L of X has a locally finite open refinement {V_n}_{n≥1} such that V̄_n is compact for all n ∈ ℕ.

Problem 1.45. Let X be a metrizable, locally compact, σ-compact topological space. Show the following:
(a) Every open set U ⊆ X can be written as U = ⋃_{n≥1} K_n with compact K_n and K_n ⊆ int K_{n+1} for all n ∈ ℕ.
(b) Every compact set K ⊆ X can be written as K = ⋂_{n≥1} U_n with U_n open, Ū_n compact and U_n ⊇ U_{n+1} for all n ∈ ℕ.

Problem 1.46. Let X be a locally compact space and X̂ its one-point Alexandrov compactification. Set V = {f̂ ∈ C(X̂, ℝ) : f̂(∞) = 0}. For every f̂ ∈ V, let f̃ = f̂|_X. Show that f̂ ↦ f̃ is an isometry of V onto C_0(X, ℝ) = {f ∈ C(X, ℝ) : for every ε > 0 there exists a compact K ⊆ X such that |f(x)| < ε for all x ∈ X \ K}, the space of continuous functions on X vanishing at infinity.

Problem 1.47. Let X, Y be Hausdorff topological spaces, {V_α}_{α∈I} an open cover of Y, and f : X → Y a continuous map such that f_α = f|_{f^{−1}(V_α)} : f^{−1}(V_α) → V_α is a homeomorphism for every α ∈ I. Show that f is a homeomorphism.

Problem 1.48. Let X, Y be Hausdorff topological spaces, f : X → Y a map, and G = Gr f = {(u, y) ∈ X × Y : y = f(u)}. Let g : X → G be defined by g(u) = (u, f(u)). Show that f is continuous if and only if g is a homeomorphism.

Problem 1.49. Let X be a Baire space, Y a separable metric space and f : X → Y a map such that the inverse image of any open set is an F_σ-set. Show that f is continuous at every point of a dense G_δ-set.

Problem 1.50. Let X be a second countable regular topological space and U ⊆ X an open set. Show that there exists a continuous function f : X → [0, 1] such that f(u) > 0 for all u ∈ U and f(u) = 0 for all u ∈ X \ U.

Problem 1.51. Let X, Y be Hausdorff topological spaces, f : X → Y a continuous map, C_n ⊆ X closed for all n ∈ ℕ, C_n ↘ C with C nonempty and compact, and suppose that for every open U ⊇ C there is n ∈ ℕ such that C_n ⊆ U. Show that f(C) = ⋂_{n≥1} f(C_n) = ⋂_{n≥1} cl f(C_n), where cl denotes the closure in Y.

Problem 1.52. Let X be a regular topological space, K ⊆ X compact and U ⊆ X open such that K ⊆ U. Show that there exists an open set V ⊆ X such that K ⊆ V ⊆ V̄ ⊆ U.

Figure 1.2 shows the relations between various spaces introduced in this chapter.

[Figure 1.2: diagram relating the following classes of spaces: Hausdorff, Regular, Completely Regular, Second Countable, Baire, Locally Compact, Completely Normal, Normal, Perfectly Normal, Paracompact, Metric, Regular Lindelöf, Separable, Separable Metric, σ-Compact, Strongly Lindelöf, Borel, Compact, Complete Metric, Souslin, Polish, Compact Metric.]

Fig. 1.2: Topological spaces: From Compact Metric to Hausdorff.

2 Measure Theory

Measure Theory is the part of mathematical analysis that deals with the development of a precise way to measure large classes of sets and how to integrate functions. It started at the end of the 19th century with the works of Jordan, Borel, Young, and Lebesgue. By that time it was evident that the Riemann integral had serious limitations and had to be replaced by a new integral that was more general (that is, more functions could be integrated) and more flexible (that is, it led to more efficient calculus rules and in particular convergence theorems). The construction of Lebesgue turned out to be extremely fruitful and launched “Measure Theory.” The idea of Lebesgue to partition the f(x)-axis (instead of the x-axis as is done in the Riemann integral) was a remarkable conceptual insight, which allowed the full power of measure theory to reveal itself. In this chapter we present some basic aspects of this theory, which are needed to deal with the topics that follow.

2.1 Basic Notions, Measures, and Outer Measures

We start by defining algebras and σ-algebras. These are families of subsets of a given set. On σ-algebras, the theory exhibits its full strength.

Definition 2.1.1. Let X be a set and L ⊆ 2^X a nonempty family of subsets.
(a) We say that L is an algebra (or a field) if A, B ∈ L implies A ∪ B ∈ L and A^c = X \ A ∈ L. That is, L is closed under finite unions and complementation.
(b) We say that L is a σ-algebra (or a σ-field) if L is an algebra and it is closed under countable unions, that is, if {A_n}_{n≥1} ⊆ L, then ⋃_{n≥1} A_n ∈ L.

Remark 2.1.2. Note that if L is an algebra, then ∅, X ∈ L. Indeed, let A ∈ L. Then A^c ∈ L and so X = A ∪ A^c ∈ L. Hence ∅ = X^c ∈ L. Moreover, by de Morgan’s law, every algebra (resp. σ-algebra) is closed under finite (resp. countable) intersections. If E ⊆ X, then the restriction (or trace) of L on E is defined by L_E = {E ∩ A : A ∈ L}.

Example 2.1.3. (a) There are two extreme cases: L_1 = {∅, X} and L_2 = 2^X. Both are σ-algebras, with L_1 being the smallest with respect to inclusion and L_2 being the greatest one.
(b) Let X = [0, 1) and let L be the family of all finite unions of intervals [a, b) ⊆ [0, 1). Then L is an algebra but not a σ-algebra since E = ⋂_{n≥1} [0, 1/n) = {0} ∉ L.

Evidently the intersection of σ-algebras is again a σ-algebra. This leads to the following definitions.

https://doi.org/10.1515/9783110532982-002

Definition 2.1.4. (a) Let X be a set and let F ⊆ 2^X be nonempty. The σ-algebra generated by F, denoted by σ(F), is defined by

\[ \sigma(F) = \bigcap \{ L \subseteq 2^X : F \subseteq L,\ L \text{ is a } \sigma\text{-algebra} \}. \]

(b) Let (X, τ) be a Hausdorff topological space. The Borel σ-algebra is defined by B(X) = σ(τ).

As we will see later in our discussion of measures, it is often more convenient to start with families that have less structure than σ-algebras and eventually pass to the σ-algebra they generate.

Definition 2.1.5. Let X be a set and let L ⊆ 2^X be a nonempty family of subsets.
(a) We say that L is a ring if A, B ∈ L implies A ∪ B ∈ L and A \ B ∈ L. That is, L is closed under finite unions and relative complementation.
(b) We say that L is a σ-ring if L is a ring and it is closed under countable unions, that is, if {A_n}_{n≥1} ⊆ L, then ⋃_{n≥1} A_n ∈ L.
(c) We say that L is a semiring if the following hold:
(i) ∅ ∈ L;
(ii) A, B ∈ L implies A ∩ B ∈ L;
(iii) A, B ∈ L implies A \ B = ⋃_{k=1}^{n} C_k for some n ∈ ℕ and disjoint {C_k}_{k=1}^{n} ⊆ L.

Remark 2.1.6. Note that if L is a ring and A ∈ L, then ∅ = A \ A ∈ L. So, the empty set is always an element of a ring. Hence if L is a ring and X ∈ L, then L is an algebra. Thus we see that the collection of all finite subsets of X is a ring but not an algebra unless X is a finite set. On the other hand, the collection of all finite subsets of X and of their complements is an algebra but not a σ-algebra unless X is a finite set. If L is a ring and A, B ∈ L, then A ∩ B = A \ (A \ B) ∈ L. So, a ring is also closed under finite intersections. Similarly A∆B = (A \ B) ∪ (B \ A) ∈ L and so a ring is also closed under symmetric differences. We have the following relations among the notions introduced thus far:

σ-algebra ⟹ σ-ring ⟹ ring ⟹ semiring,

σ-algebra ⟹ algebra ⟹ ring.
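On a finite set X, the generated σ-algebra of Definition 2.1.4(a) can be computed by brute force, because closing a family under complements and pairwise unions already gives closure under countable unions. The following Python sketch illustrates this special case only; the function name `generated_sigma_algebra` and the sample family are illustrative choices, not notation from the text.

```python
from itertools import combinations


def generated_sigma_algebra(X, family):
    """Return sigma(family) for a FINITE set X as a set of frozensets.

    The family is closed repeatedly under complement and pairwise union
    until nothing new appears. This terminates because 2^X is finite.
    """
    X = frozenset(X)
    sets = {frozenset(A) for A in family} | {X}
    changed = True
    while changed:
        changed = False
        current = list(sets)
        for A in current:                      # closure under complement
            if X - A not in sets:
                sets.add(X - A)
                changed = True
        for A, B in combinations(current, 2):  # closure under finite unions
            if A | B not in sets:
                sets.add(A | B)
                changed = True
    return sets


X = {1, 2, 3, 4}
sigma = generated_sigma_algebra(X, [{1}, {1, 2}])
print(sorted(sorted(A) for A in sigma))
```

In this example the atoms of the generated σ-algebra are {1}, {2}, and {3, 4}, so it consists of all unions of these three blocks.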

Apart from trivial cases, σ(L) (see Definition 2.1.4(a)) cannot be constructively obtained from L. In order to overcome this difficulty, we introduce the following notions.

Definition 2.1.7. Let X be a set and D ⊆ 2^X. We say that D is a Dynkin system (or a λ-system) if the following conditions hold:


(i) X ∈ D;
(ii) A, B ∈ D with B ⊆ A implies A \ B ∈ D;
(iii) {A_n}_{n≥1} ⊆ D increasing implies A = ⋃_{n≥1} A_n ∈ D.

Remark 2.1.8. Evidently (i) and (ii) imply that ∅ is in every Dynkin system, and {∅, X} as well as 2^X are both Dynkin systems. Consider also the following conditions on the family D ⊆ 2^X:
(iv) A ∈ D implies A^c ∈ D;
(v) for every disjoint sequence {A_n}_{n≥1} ⊆ D we have ⋃_{n≥1} A_n ∈ D.
It is easy to show that D is a Dynkin system if and only if (i), (iv), and (v) hold if and only if (i), (ii), and (v) hold.

Definition 2.1.9. Let X be a set and L ⊆ 2^X a nonempty family of subsets of X. We say that L is a monotone class if, whenever {A_n}_{n≥1} ⊆ L is increasing (resp. decreasing), we have A = ⋃_{n≥1} A_n ∈ L (resp. A = ⋂_{n≥1} A_n ∈ L).
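Returning to Definition 2.1.7, a small finite example shows that a Dynkin system need not be closed under intersections, which is why an intersection-stable generator is natural in results about Dynkin systems. The Python sketch below checks conditions (i) and (ii) directly for the subsets of even cardinality of a four-point set. Condition (iii) is automatic for a finite family, since an increasing sequence is eventually constant. The helper names are hypothetical.

```python
from itertools import combinations


def is_dynkin_system(X, D):
    """Check Definition 2.1.7 for a finite family D of subsets of X."""
    X = frozenset(X)
    D = {frozenset(A) for A in D}
    if X not in D:                              # condition (i)
        return False
    for A in D:                                 # condition (ii)
        for B in D:
            if B <= A and (A - B) not in D:
                return False
    # Condition (iii) holds automatically: in a finite family every
    # increasing sequence is eventually constant, so its union is a member.
    return True


def is_closed_under_intersection(D):
    D = {frozenset(A) for A in D}
    return all((A & B) in D for A in D for B in D)


X = {1, 2, 3, 4}
even_sets = [frozenset(c) for r in (0, 2, 4)
             for c in combinations(sorted(X), r)]

print(is_dynkin_system(X, even_sets))
print(is_closed_under_intersection(even_sets))
```

Here {1, 2} ∩ {2, 3} = {2} has odd cardinality, so the family is a Dynkin system that is not an algebra.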

Remark 2.1.10. Any σ-algebra is a monotone class, but a topology is not in general. Of course 2^X is always a monotone class and the intersection of a family of monotone classes is a monotone class. So, there is a smallest monotone class containing a nonempty family L ⊆ 2^X. A monotone class that is also an algebra is also a σ-algebra.

The next result is known as the “Dynkin System Theorem.” The name “Dynkin’s π−λ Theorem” can also be found in the literature.

Theorem 2.1.11 (Dynkin System Theorem). If X is a set, L ⊆ 2^X is a nonempty family of subsets that is closed under finite intersections, and D is a Dynkin system such that D ⊇ L, then D ⊇ σ(L).

Proof. Let D_0 be the smallest Dynkin system containing L. Evidently D_0 ⊆ D. Moreover, σ(L) is a Dynkin system. So, we also have D_0 ⊆ σ(L). Let

\[ R = \{ A \in D_0 : A \cap B \in D_0 \text{ for every } B \in L \}. \]

Since L is closed under finite intersections we have L ⊆ R, and since D_0 is a Dynkin system, R is a Dynkin system as well. Therefore

\[ D_0 = R. \tag{2.1.1} \]

Let R̂ = {E ∈ D_0 : E ∩ D ∈ D_0 for all D ∈ D_0}. Because of (2.1.1) we have L ⊆ R̂, and R̂ is a Dynkin system. Hence D_0 = R̂, which means that D_0 is closed under finite intersections. Thus, D_0 is a σ-algebra; see Remark 2.1.8. Since D_0 contains L, this gives σ(L) ⊆ D_0. Hence, σ(L) = D_0 ⊆ D.

Monotone classes are closely related to σ-algebras and by Theorem 2.1.11 are also related to Dynkin systems. The next result illustrates this and is known as the “Monotone Class Theorem.”

Theorem 2.1.12 (Monotone Class Theorem). If X is a set, L ⊆ 2^X is an algebra, and M ⊆ 2^X is a nonempty monotone class such that M ⊇ L, then M ⊇ σ(L).

Proof. Let Σ = σ(L) and let M_0 be the smallest monotone class containing L. Evidently M_0 ⊆ M. If we show that Σ = M_0, then we are done. To this end, we fix A ∈ M_0 and let

\[ M_0^A = \{ B \in M_0 : A \cap B,\ A \setminus B,\ B \setminus A \in M_0 \}. \]

Then M_0^A is a monotone class. If A ∈ L, then since L is an algebra, we have L ⊆ M_0^A and therefore M_0 ⊆ M_0^A, hence M_0 = M_0^A. So, for any B ∈ M_0 we have

\[ A \cap B,\ A \setminus B,\ B \setminus A \in M_0 \quad \text{for any } A \in L. \]

Thus, L ⊆ M_0^B, which implies M_0 = M_0^B. Then we see that M_0 is an algebra and so it follows that M_0 is a σ-algebra; see Remark 2.1.10. It follows that Σ ⊆ M_0 and because Σ is also a monotone class containing L we conclude that Σ = M_0 ⊆ M.

Remark 2.1.13. From the proof above we see that if L ⊆ 2^X is an algebra, then σ(L) coincides with the smallest monotone class generated by L. Therefore, the algebra L is a monotone class if and only if L is a σ-algebra.

Since the Borel σ-algebra (see Definition 2.1.4(b)) is an important σ-algebra, we state some easy but useful facts concerning its generation. The first result is an immediate consequence of Theorem 2.1.11.

Proposition 2.1.14. If X is a Hausdorff topological space, then the Borel σ-algebra is the smallest Dynkin system containing the open sets or the closed sets.

In the context of metric spaces we can state a slightly different characterization of the Borel sets.

Proposition 2.1.15. If X is a metrizable space, then the Borel σ-algebra B(X) is the smallest family of subsets of X that includes the open sets and is closed under countable intersections and under countable disjoint unions.

Proof. From Proposition 1.5.8 we know that every closed set is G_δ. Hence, every family of sets that contains the open sets and is closed under countable intersections must contain the closed sets. Then the result follows from Problem 2.1.

For a similar result for families containing the closed sets, we need to require closure under arbitrary countable unions, not just disjoint ones.

Proposition 2.1.16. If X is a metrizable space, then the Borel σ-algebra B(X) is the smallest family of subsets of X that includes the closed sets and is closed under countable intersections and under countable unions.


Proof. Recall again from Proposition 1.5.8 that every open set is F_σ. Hence every family of sets that contains the closed sets and is closed under countable unions must contain the open sets as well. Again an appeal to Problem 2.1 concludes the proof.

Remark 2.1.17. In a Hausdorff topological space the closure of any set is closed and hence belongs to the Borel σ-algebra. Similarly, the interior of any set (being open) and the boundary of any set (being closed) are Borel. Recalling that singletons are closed sets, we infer that countable sets are Borel. Finally, compact sets are also Borel, being closed.

For the real line ℝ we can choose among many different generators of the Borel σ-algebra. So let

L_1 = {(a, b) : a < b}, L_2 = {[a, b) : a < b}, L_3 = {(a, b] : a < b}, L_4 = {[a, b] : a < b},
L_5 = {(a, ∞) : a ∈ ℝ}, L_6 = {(−∞, b) : b ∈ ℝ}, L_7 = {[a, +∞) : a ∈ ℝ}, L_8 = {(−∞, b] : b ∈ ℝ},
L_9 = open sets of ℝ, L_10 = closed sets of ℝ.

Moreover, by L_k^r, k ∈ {1, . . . , 8}, we denote the collection of intervals in L_k with rational endpoints. The next result is straightforward.

Proposition 2.1.18. B(ℝ) = σ(L_k) for all k ∈ {1, . . . , 10} and B(ℝ) = σ(L_k^r) for all k ∈ {1, . . . , 8}.

In many cases we will deal with the extended real line ℝ* = ℝ ∪ {±∞}. In this case we have the following.

Definition 2.1.19. It holds that B(ℝ*) = σ(B(ℝ) ∪ {{+∞}, {−∞}}).

Remark 2.1.20. Evidently B(ℝ*) = {the B(ℝ)-sets or the B(ℝ)-sets with +∞ or −∞ or both attached to them}.

From Proposition 2.1.18 and Definition 2.1.19 we obtain the following.

Proposition 2.1.21. It holds that card(B(ℝ)) = card(B(ℝ*)) = c, where c is the cardinality of the continuum.

Now we pass to set functions.

Definition 2.1.22. Let X be a set, ∅ ∈ L ⊆ 2^X, and μ : L → ℝ* a set function.
(a) We say that μ is monotone if A ⊆ B with A, B ∈ L implies μ(A) ≤ μ(B).
(b) We say that μ is additive (or finitely additive) if {A_k}_{k=1}^{n} ⊆ L pairwise disjoint and ⋃_{k=1}^{n} A_k ∈ L imply μ(⋃_{k=1}^{n} A_k) = ∑_{k=1}^{n} μ(A_k).
(c) We say that μ is σ-additive (or countably additive) if {A_k}_{k≥1} ⊆ L pairwise disjoint and ⋃_{k≥1} A_k ∈ L imply μ(⋃_{k≥1} A_k) = ∑_{k≥1} μ(A_k).

(d) We say that μ is subadditive if {A_k}_{k=1}^{n} ⊆ L and ⋃_{k=1}^{n} A_k ∈ L imply μ(⋃_{k=1}^{n} A_k) ≤ ∑_{k=1}^{n} μ(A_k).
(e) We say that μ is σ-subadditive if {A_k}_{k≥1} ⊆ L and ⋃_{k≥1} A_k ∈ L imply μ(⋃_{k≥1} A_k) ≤ ∑_{k≥1} μ(A_k).
(f) When L = Σ is a σ-algebra, we say that the set function μ : Σ → ℝ* = ℝ ∪ {±∞} is a signed measure if it takes at most one of the values +∞ and −∞, μ(∅) = 0, and it is σ-additive. If μ takes only nonnegative values, then we say that μ is a measure.
(g) A pair (X, Σ) with X being a set and Σ ⊆ 2^X being a σ-algebra is said to be a measurable space. If μ is a measure on (X, Σ), then (X, Σ, μ) is said to be a measure space. We say that μ is finite (or that the measure space (X, Σ, μ) is finite) if μ(X) < ∞. We say that μ is σ-finite if X = ⋃_{n≥1} X_n with X_n ∈ Σ and μ(X_n) < +∞ for all n ∈ ℕ.

Example 2.1.23. (a) Let X be a nonempty set and Σ = 2^X. The set function μ : Σ → [0, +∞] defined by

\[ \mu(A) = \begin{cases} \operatorname{card}(A) & \text{if } A \text{ is finite}, \\ +\infty & \text{otherwise}, \end{cases} \]

is a measure known as the counting measure. If X is finite (resp. countable), then μ : Σ → [0, +∞] is finite (resp. σ-finite). More generally, let f : X → [0, +∞) be a function and define μ : 2^X → [0, +∞] by setting

\[ \mu(A) = \sum_{x \in A} f(x) = \sup\Big[ \sum_{x \in F} f(x) : F \subseteq A \text{ is finite} \Big]. \]

Then μ : 2^X → [0, +∞] is a measure that is σ-finite if {x ∈ X : f(x) > 0} is countable. Evidently, if f(x) = 1 for all x ∈ X, then we have the counting measure. If f(x_0) = 1 and f(x) = 0 for x ≠ x_0, then μ : 2^X → [0, +∞] is called the Dirac measure at x_0 and is denoted by δ_{x_0}.
(b) Let X be an uncountable set and let Σ = {A ⊆ X : A is countable or A^c is countable}. Then Σ is a σ-algebra, the σ-algebra of countable or co-countable sets. The set function μ : Σ → [0, 1] defined by

\[ \mu(A) = \begin{cases} 0 & \text{if } A \text{ is countable}, \\ 1 & \text{if } A^c \text{ is countable, that is, } A \text{ is co-countable}, \end{cases} \]

is a finite measure.

The next proposition summarizes the main properties of measures.

Proposition 2.1.24. Let (X, Σ, μ) be a measure space. Then the following hold:
(a) μ(A ∪ B) + μ(A ∩ B) = μ(A) + μ(B) for all A, B ∈ Σ.
(b) μ(A) = μ(B) + μ(A \ B) for all A, B ∈ Σ with B ⊆ A.
(c) μ(B) ≤ μ(A) for all A, B ∈ Σ with B ⊆ A (monotonicity).


(d) μ(⋃_{k≥1} A_k) ≤ ∑_{k≥1} μ(A_k) for all {A_k}_{k≥1} ⊆ Σ (σ-subadditivity).
(e) If {A_k}_{k≥1} ⊆ Σ is increasing, then μ(⋃_{k≥1} A_k) = lim_{k→∞} μ(A_k) (continuity from below).
(f) If {A_k}_{k≥1} ⊆ Σ is decreasing and μ(A_1) < +∞, then μ(⋂_{k≥1} A_k) = lim_{k→∞} μ(A_k) (continuity from above).

Proof. (a) By additivity we have

\[ \mu(A) = \mu(A \cap B) + \mu(A \setminus B) \quad \text{and} \quad \mu(B) = \mu(A \cap B) + \mu(B \setminus A). \]

Adding these two equations gives

\[ \mu(A) + \mu(B) = \mu(A \cap B) + [\mu(A \cap B) + \mu(A \setminus B) + \mu(B \setminus A)] = \mu(A \cap B) + \mu(A \cup B), \]

again by additivity.
(b) Writing A = B ∪ (A \ B) as a disjoint union and using additivity, we obtain μ(A) = μ(B) + μ(A \ B).
(c) Since μ is nonnegative, the assertion follows from (b).
(d) Let B_1 = A_1 and B_k = A_k \ ⋃_{i=1}^{k−1} A_i for k ≥ 2. Then the sets {B_k}_{k≥1} are disjoint, B_k ⊆ A_k, and ⋃_{k≥1} B_k = ⋃_{k≥1} A_k. Then, taking the σ-additivity and part (c) into account, it follows that

\[ \mu\Big( \bigcup_{k \ge 1} A_k \Big) = \mu\Big( \bigcup_{k \ge 1} B_k \Big) = \sum_{k \ge 1} \mu(B_k) \le \sum_{k \ge 1} \mu(A_k). \]

(e) Let A_0 = ∅. Then

\[ \mu\Big( \bigcup_{k \ge 1} A_k \Big) = \sum_{k \ge 1} \mu(A_k \setminus A_{k-1}) = \lim_{n \to \infty} \sum_{k=1}^{n} \mu(A_k \setminus A_{k-1}) = \lim_{n \to \infty} \mu(A_n). \]

(f) Let B_k = A_1 \ A_k. Then {B_k}_{k≥1} ⊆ Σ is increasing, μ(A_1) = μ(A_k) + μ(B_k) for all k ∈ ℕ (see part (b)), and ⋃_{k≥1} B_k = A_1 \ ⋂_{k≥1} A_k. By parts (e) and (b) there holds

\[ \mu(A_1) = \mu\Big( \bigcap_{k \ge 1} A_k \Big) + \lim_{k \to \infty} \mu(B_k) = \mu\Big( \bigcap_{k \ge 1} A_k \Big) + \lim_{k \to \infty} [\mu(A_1) - \mu(A_k)]. \]

Hence, subtracting μ(A_1) < ∞ from both sides gives μ(⋂_{k≥1} A_k) = lim_{k→∞} μ(A_k).

Remark 2.1.25. Clearly, the condition μ(A_1) < +∞ in Proposition 2.1.24(f) can be replaced by the hypothesis that μ(A_n) < +∞ for some n ∈ ℕ, since the first (n − 1) sets do not affect the intersection.

It turns out that continuity from below (see Proposition 2.1.24(e)) for an additive set function is equivalent to σ-additivity.

Proposition 2.1.26. If X is a set, L ⊆ 2^X is an algebra of sets in X, and μ : L → [0, +∞] is an additive set function, then μ is σ-additive if and only if μ is continuous from below, that is, if {A_n}_{n≥1} ⊆ L is increasing and ⋃_{n≥1} A_n ∈ L, then μ(⋃_{n≥1} A_n) = lim_{n→∞} μ(A_n).

Proof. ⇒: This follows from the proof of Proposition 2.1.24(e).
⇐: Suppose we have continuity from below. Let {B_k}_{k≥1} ⊆ L be a sequence of pairwise disjoint sets such that ⋃_{k≥1} B_k ∈ L. We set A_n = ⋃_{k=1}^{n} B_k. From the continuity from below hypothesis and the additivity of μ, it follows that

\[ \mu\Big( \bigcup_{k \ge 1} B_k \Big) = \mu\Big( \bigcup_{k \ge 1} A_k \Big) = \lim_{n \to \infty} \mu(A_n) = \lim_{n \to \infty} \sum_{k=1}^{n} \mu(B_k) = \sum_{k \ge 1} \mu(B_k). \]

This shows that μ : L → [0, +∞] is σ-additive.

We get a similar result when we suppose continuity from above at the empty set.

Proposition 2.1.27. If X is a set, L ⊆ 2^X is an algebra of sets in X, and μ : L → [0, +∞] is an additive set function with μ(X) < +∞, then μ is σ-additive if and only if μ is continuous from above at the empty set, that is, if {A_k}_{k≥1} ⊆ L is a decreasing sequence such that ⋂_{k≥1} A_k = ∅, then lim_{k→∞} μ(A_k) = 0.

Proof. ⇒: This implication follows again from the proof of Proposition 2.1.24(f).
⇐: Let {A_k}_{k≥1} ⊆ L be an increasing sequence such that ⋃_{k≥1} A_k ∈ L. Let B_n = (⋃_{k≥1} A_k) \ A_n for all n ∈ ℕ. Then {B_n}_{n≥1} ⊆ L is decreasing and ⋂_{n≥1} B_n = ∅. Therefore, by hypothesis, we have

\[ 0 = \lim_{n \to \infty} \mu(B_n) = \mu\Big( \bigcup_{k \ge 1} A_k \Big) - \lim_{n \to \infty} \mu(A_n). \]

Hence, μ(⋃_{k≥1} A_k) = lim_{k→∞} μ(A_k) and so μ is continuous from below. Then Proposition 2.1.26 implies that μ is σ-additive.

The next result gives a sufficient condition for two finite measures to be equal. It suffices to know that they coincide on a generating family that is closed under finite intersections.

Proposition 2.1.28. If (X, Σ) is a measurable space, Σ = σ(L) with L closed under finite intersections, μ_1, μ_2 are two finite measures on Σ, and μ_1(X) = μ_2(X) as well as μ_1|_L = μ_2|_L, then μ_1 = μ_2.

Proof. Let D = {A ∈ Σ : μ_1(A) = μ_2(A)}. By hypothesis X ∈ D, and applying Proposition 2.1.24(b) and (e) we see that D is a Dynkin system; see Definition 2.1.7. Moreover, by hypothesis, L ⊆ D. Then, invoking Theorem 2.1.11, we infer that Σ = σ(L) = D, which means that μ_1 = μ_2.

Corollary 2.1.29. If X is a Hausdorff topological space, B(X) is its Borel σ-field, and μ_1, μ_2 are two finite measures on B(X) that coincide on the open or closed sets, then μ_1 = μ_2.

In the next definition we introduce a notion that will lead us to a property reminiscent of the intermediate value property.


Definition 2.1.30. Let (X, Σ, μ) be a measure space.
(a) We say that the measure μ : Σ → [0, +∞] is semifinite if for every A ∈ Σ with μ(A) > 0, there exists B ∈ Σ with B ⊆ A such that 0 < μ(B) < +∞.
(b) We say that A ∈ Σ is an atom of μ if 0 < μ(A) < +∞ and for every B ⊆ A with B ∈ Σ either μ(B) = 0 or μ(B) = μ(A). A measure without any atoms is called nonatomic.

Remark 2.1.31. The measure μ on Σ is nonatomic if for every set A ∈ Σ with μ(A) > 0, there exists B ∈ Σ with B ⊆ A such that 0 < μ(B) < μ(A). For the Dirac measure

\[ \delta_{x_0}(A) = \begin{cases} 1 & \text{if } x_0 \in A, \\ 0 & \text{otherwise}, \end{cases} \qquad \text{with } x_0 \in X,\ A \in \Sigma, \]

we see that {x_0} is an atom. The main examples of atoms are singletons {x} with positive measure.

Here is the result that recalls the intermediate value property.

Proposition 2.1.32. If (X, Σ, μ) is a nonatomic measure space, then the range of μ is the interval [0, μ(X)].

Proof. We fix λ ∈ (0, μ(X)) and define L = {A ∈ Σ : 0 < μ(A) ≤ λ}. First we show that L ≠ ∅. The nonatomicity of μ implies the existence of B ∈ Σ such that 0 < μ(B) < μ(X). The same argument (nonatomicity of μ) implies that we can find E_1, E_2 ∈ Σ such that B = E_1 ∪ E_2, E_1 ∩ E_2 = ∅, and μ(E_1), μ(E_2) ∈ (0, μ(B)). It follows that at least one of the sets E_1, E_2, say E_1, satisfies μ(E_1) ∈ (0, ½ μ(B)]. Proceeding inductively, suppose that we produced E_1, . . . , E_n ∈ Σ such that

\[ \mu(E_n) \in \Big( 0, \frac{1}{2^n} \mu(B) \Big]. \tag{2.1.2} \]

Applying again the nonatomicity of μ, there exists E_{n+1} ∈ Σ with E_{n+1} ⊆ E_n such that μ(E_{n+1}) ∈ (0, ½ μ(E_n)]. Evidently, because of (2.1.2), we have μ(E_{n+1}) ≤ (1/2^{n+1}) μ(B). Therefore, (2.1.2) holds for all n ∈ ℕ. Moreover, for a large enough n ∈ ℕ, we have μ(E_n) ≤ λ. Hence, E_n ∈ L for a large enough n ∈ ℕ, thus yielding L ≠ ∅.

Next we show that there exists a Σ-set with measure equal to λ. To this end, let D_0 = ∅ and suppose that D_n ∈ Σ is given. Let

\[ \lambda_n = \sup [ \mu(C) : C \in \Sigma,\ D_n \subseteq C,\ \mu(C) \le \lambda ]. \]

Choose D_{n+1} ∈ Σ such that

\[ D_n \subseteq D_{n+1} \quad \text{and} \quad \lambda_n - \frac{1}{n} \le \mu(D_{n+1}) \le \lambda_n. \tag{2.1.3} \]

It holds that 0 < λ_{n+1} ≤ λ_n ≤ λ and so lim_{n→∞} λ_n = λ̂ exists and λ̂ ≤ λ. We define

\[ \hat{D} = \bigcup_{n \ge 1} D_n. \tag{2.1.4} \]

This implies, due to (2.1.3) and Proposition 2.1.24(e), that

\[ \mu(\hat{D}) = \lim_{n \to \infty} \mu(D_n) = \hat{\lambda}. \tag{2.1.5} \]

We need to show that λ̂ = λ. If λ̂ < λ, then μ(X \ D̂) = μ(X) − μ(D̂) > λ − λ̂ > 0; see Proposition 2.1.24(b). Reasoning as in the first part of the proof with X replaced by X \ D̂ and λ replaced by λ − λ̂ > 0, we produce

\[ C \in \Sigma, \quad C \subseteq X \setminus \hat{D} \quad \text{and} \quad 0 < \mu(C) < \lambda - \hat{\lambda}. \tag{2.1.6} \]

Then, since C and D̂ are disjoint, the additivity yields λ̂ = μ(D̂) < μ(D̂) + μ(C) = μ(C ∪ D̂) ≤ λ, which gives, because of (2.1.5) and (2.1.6), that λ_n < μ(C ∪ D̂) for all sufficiently large n ∈ ℕ. But D_n ⊆ C ∪ D̂ for all n ∈ ℕ; see (2.1.4). This contradicts the definition of λ_n for large enough n ∈ ℕ. We conclude that λ̂ = λ and the proof is finished.

The notion of outer measure is an abstract generalization of the “outer area” when we apply the exhaustion method of Archimedes to calculate the area of a bounded region in ℝ^2.

Definition 2.1.33. Let X be a nonempty set and μ* : 2^X → [0, +∞] be a set function. We say that μ* is an outer measure if it satisfies the following conditions:
(a) μ*(∅) = 0;
(b) μ* is monotone, that is, A ⊆ B implies μ*(A) ≤ μ*(B);
(c) μ* is σ-subadditive, that is, μ*(⋃_{n≥1} A_n) ≤ ∑_{n≥1} μ*(A_n).
We say that the outer measure μ* is finite (resp. σ-finite) if μ*(X) < +∞ (resp. X = ⋃_{n≥1} X_n and μ*(X_n) < +∞ for all n ∈ ℕ).

A way to produce an outer measure is to start with a family of elementary sets on which a measure is naturally defined (for example intervals in ℝ and rectangles in ℝ^2) and approximate any set from above by countable unions of such elementary sets. This process is formalized in the proposition that follows.

Proposition 2.1.34. If X is a nonempty set, L ⊆ 2^X is such that ∅, X ∈ L, ϑ : L → [0, +∞] satisfies ϑ(∅) = 0, and for any A ⊆ X we set

\[ \mu^*(A) = \inf\Big[ \sum_{n \ge 1} \vartheta(E_n) : E_n \in L,\ A \subseteq \bigcup_{n \ge 1} E_n \Big], \tag{2.1.7} \]

then μ* is an outer measure.

Proof. First note that in (2.1.7) the infimum is taken over a nonempty set since A ⊆ X and, by hypothesis, X ∈ L. Moreover, μ*(∅) = 0 and it is clear from (2.1.7) that A ⊆ B implies μ*(A) ≤ μ*(B). Finally we show the σ-subadditivity of μ*. So, let {A_k}_{k≥1} ⊆ 2^X and ε > 0. For each k ∈ ℕ we can find {E_n^k}_{n≥1} ⊆ L such that

\[ A_k \subseteq \bigcup_{n \ge 1} E_n^k \quad \text{and} \quad \sum_{n \ge 1} \vartheta(E_n^k) \le \mu^*(A_k) + \frac{\varepsilon}{2^k}. \]

Let A = ⋃_{k≥1} A_k. Then we have

\[ A \subseteq \bigcup_{k,n \ge 1} E_n^k \quad \text{and} \quad \sum_{k,n \ge 1} \vartheta(E_n^k) \le \sum_{k \ge 1} \mu^*(A_k) + \varepsilon. \]

This gives, due to (2.1.7), μ*(A) ≤ ∑_{k≥1} μ*(A_k) + ε. Letting ε ↘ 0, we conclude that μ* is σ-subadditive. Therefore μ* is an outer measure.

Example 2.1.35. Let f : ℝ → ℝ be an increasing function. Let L be the family of all intervals (a, b] with a, b ∈ ℝ and set ϑ((a, b]) = f(b) − f(a). Then the conditions in Proposition 2.1.34 are satisfied and by applying (2.1.7) we can define an outer measure μ*. This outer measure is called the Lebesgue–Stieltjes outer measure and if f(x) = x for all x ∈ ℝ it is called the Lebesgue outer measure. Note that

\[ \mu^*((a, b]) = f(b) - \lim_{x \to a^+} f(x) \le f(b) - f(a) = \vartheta((a, b]). \]

Thus, the inequality is strict at those points a where f is not continuous from the right.

Now we will pass from outer measures to measures. Outer measures, although defined on the entire power set 2^X, have the disadvantage that they are in general not σ-additive. However, when restricted to a particular subfamily of 2^X, they become σ-additive. In this direction we need the following remarkable definition due to Carathéodory.

Definition 2.1.36. Let X be a nonempty set and μ* an outer measure on 2^X. We say that A ⊆ X is μ*-measurable if μ*(B) = μ*(B ∩ A) + μ*(B ∩ A^c) for all B ⊆ X, that is, A splits additively all sets in X.

Remark 2.1.37. From Definition 2.1.33 we know that

\[ \mu^*(B) \le \mu^*(B \cap A) + \mu^*(B \cap A^c) \quad \text{for all } B \subseteq X, \]

due to the subadditivity property of the outer measure. In order to check the μ*-measurability of a set A ⊆ X it suffices to show that

\[ \mu^*(B) \ge \mu^*(B \cap A) + \mu^*(B \cap A^c) \quad \text{for all } B \subseteq X \text{ with } \mu^*(B) < +\infty. \]

This definition of Carathéodory essentially says that the outer measure μ*(A) of A is equal to its inner measure μ*(X) − μ*(A^c). For this reason Definition 2.1.36 is the right one and leads to a σ-algebra on which μ* is σ-additive, hence a measure. This is shown in the next theorem known as the “Carathéodory Theorem.”

Theorem 2.1.38 (Carathéodory Theorem). If X is a nonempty set and μ* : 2^X → [0, +∞] is an outer measure, then the family Σ* of all μ*-measurable sets is a σ-algebra and μ = μ*|_{Σ*} is a measure.

Proof. The symmetric character of Definition 2.1.36 implies that Σ* is closed under complementation.

Next let A, E ∈ Σ* and let B ⊆ X. We have

\[ \mu^*(B) = \mu^*(B \cap A) + \mu^*(B \cap A^c) = \mu^*(B \cap A \cap E) + \mu^*(B \cap A \cap E^c) + \mu^*(B \cap A^c \cap E) + \mu^*(B \cap A^c \cap E^c). \]

Note that A ∪ E = (A ∩ E) ∪ (A △ E) = (A ∩ E) ∪ (A ∩ E^c) ∪ (A^c ∩ E). Hence, by the subadditivity,

\[ \mu^*(B \cap (A \cup E)) \le \mu^*(B \cap A \cap E) + \mu^*(B \cap A \cap E^c) + \mu^*(B \cap A^c \cap E). \]

Since (A ∪ E)^c = A^c ∩ E^c, this implies

\[ \mu^*(B \cap (A \cup E)) + \mu^*(B \cap (A \cup E)^c) \le \mu^*(B). \]

Hence, see Remark 2.1.37, A ∪ E ∈ Σ* and thus Σ* is an algebra. In addition, if A, E ∈ Σ* and A ∩ E = ∅, then

\[ \mu^*(A \cup E) = \mu^*((A \cup E) \cap A) + \mu^*((A \cup E) \cap A^c) = \mu^*(A) + \mu^*(E), \]

where we used that A ∩ E = ∅. This means that μ* is additive on Σ*.

Now we show that Σ* is a σ-algebra. Let {A_n}_{n≥1} ⊆ Σ* and let D = ⋃_{n≥1} A_n. Since from the first part of the proof we have

\[ D_k = \bigcup_{n=1}^{k} A_n \in \Sigma^* \quad \text{and} \quad D_k \setminus \bigcup_{n=1}^{k-1} A_n \in \Sigma^* \quad \text{for all } k \in \mathbb{N}, \]

without any loss of generality we may assume that the sets {A_n}_{n≥1} ⊆ Σ* are mutually disjoint. For any B ⊆ X, since D_n, A_n ∈ Σ*, we have for all n ∈ ℕ

\[ \mu^*(B) = \mu^*(B \cap D_n) + \mu^*(B \cap D_n^c) = \mu^*(B \cap A_n) + \mu^*\Big( B \cap \bigcup_{i \le n-1} A_i \Big) + \mu^*(B \cap D_n^c). \]

Then, by induction on n ∈ ℕ, we show that

\[ \mu^*(B) = \sum_{i=1}^{n} \mu^*(B \cap A_i) + \mu^*(B \cap D_n^c) \ge \sum_{i=1}^{n} \mu^*(B \cap A_i) + \mu^*(B \cap D^c), \]

since μ* is additive on Σ* and monotone, and since D_n ⊆ D for all n ∈ ℕ. We let n → ∞ and obtain

\[ \mu^*(B) \ge \sum_{i \ge 1} \mu^*(B \cap A_i) + \mu^*(B \cap D^c) \ge \mu^*(B \cap D) + \mu^*(B \cap D^c) \]

by the σ-subadditivity; see Definition 2.1.33. This implies that D ∈ Σ* (see Remark 2.1.37) and μ*(B) = ∑_{i≥1} μ*(B ∩ A_i) + μ*(B ∩ D^c). Let B = D ⊆ X. Then μ*(D) = ∑_{i≥1} μ*(A_i) and so we conclude that Σ* is a σ-algebra and μ = μ*|_{Σ*} is a measure.
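The Carathéodory construction can be traced by hand on a three-point set. The Python sketch below is only an illustration under stated assumptions: X is finite, the family L = {∅, {1}, {2, 3}, X} and the values of ϑ are hypothetical, and the infimum in (2.1.7) is computed over subfamilies of L, which suffices here because L is finite and ϑ ≥ 0, so repeating a set in a cover never lowers the cost. The code then lists all sets that satisfy the splitting condition of Definition 2.1.36.

```python
from itertools import combinations


def powerset(X):
    items = sorted(X)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def outer_measure(A, theta):
    """Formula (2.1.7) for a finite generating family given as a dict
    {set: theta(set)}: minimise the total cost over covering subfamilies."""
    A = frozenset(A)
    generators = list(theta)
    best = float("inf")
    for r in range(len(generators) + 1):
        for cover in combinations(generators, r):
            if A <= frozenset().union(*cover):
                best = min(best, sum(theta[E] for E in cover))
    return best


def caratheodory_sets(X, theta):
    """Return the mu*-measurable sets and the table of mu* values."""
    subsets = powerset(X)
    mu_star = {B: outer_measure(B, theta) for B in subsets}
    measurable = [A for A in subsets
                  if all(mu_star[B] == mu_star[B & A] + mu_star[B - A]
                         for B in subsets)]
    return measurable, mu_star


X = frozenset({1, 2, 3})
theta = {frozenset(): 0, frozenset({1}): 1, frozenset({2, 3}): 2, X: 3}
sigma_star, mu_star = caratheodory_sets(X, theta)
print(sorted(sorted(A) for A in sigma_star))
```

In this example {2} is not μ*-measurable, because it fails to split B = {2, 3} additively, while {1} and {2, 3} are, and μ* is additive on the resulting σ-algebra.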


Definition 2.1.39. Let (X, Σ, μ) be a measure space.
(a) A set A ∈ Σ is said to be μ-null (or simply null if μ is clearly understood) if μ(A) = 0.
(b) We say that μ is complete if Σ contains all subsets of null sets.

Remark 2.1.40. If A is μ-null and B ⊆ A, then μ(B) = 0, provided B ∈ Σ. But in general it need not be the case that B ∈ Σ. For example this is the case with the Borel σ-algebra B(ℝ). However, completeness can always be achieved by simply extending the domain of the measure. This is done in the next proposition, whose proof is straightforward and so it is omitted.

Proposition 2.1.41. If (X, Σ, μ) is a measure space, N = {D ∈ Σ : μ(D) = 0}, Σ_μ = {A ∪ E : A ∈ Σ, E ⊆ D ∈ N}, and μ̄(A ∪ E) = μ(A) for all A ∪ E ∈ Σ_μ, then Σ_μ is a σ-algebra and μ̄ is a complete measure on Σ_μ.

Let (X, Σ*, μ) be the measure space produced in Theorem 2.1.38.

Proposition 2.1.42. (X, Σ*, μ) is a complete measure space.

Proof. Assume that μ*(A) = 0. Then, by the subadditivity, the monotonicity, and since μ*(A) = 0, for any B ⊆ X we have

\[ \mu^*(B) \le \mu^*(B \cap A) + \mu^*(B \cap A^c) \le \mu^*(B \cap A^c) \le \mu^*(B). \]

This gives A ∈ Σ* and so μ = μ*|_{Σ*} is complete.

Now let X be a set and let L ⊆ 2^X be a semiring. We consider a σ-additive set function μ : L → [0, +∞]. Applying Proposition 2.1.34, we can define the outer measure μ* : 2^X → [0, +∞] corresponding to μ. It holds that μ*(A) = μ(A) for all A ∈ L. We have the following result.

Proposition 2.1.43. If D is a semiring satisfying L ⊆ D ⊆ Σ*, then μ* is the unique extension of μ to a σ-additive set function on D.

Proof. Let λ : D → [0, +∞] be a σ-additive extension of μ on D and let λ* be the corresponding outer measure; see Proposition 2.1.34. If A ⊆ X and {E_n}_{n≥1} ⊆ L are such that A ⊆ ⋃_{n≥1} E_n, then

\[ \lambda^*(A) \le \sum_{n \ge 1} \lambda^*(E_n) = \sum_{n \ge 1} \lambda(E_n) = \sum_{n \ge 1} \mu(E_n). \]

This implies

\[ \lambda^*(A) \le \mu^*(A) \quad \text{for every } A \subseteq X. \tag{2.1.8} \]

In order to show that λ = μ* on D, it suffices to show that μ*(A) ≤ λ(A) for all A ∈ D with μ*(A) < +∞. Recall that μ is σ-additive. Fix A ∈ D with μ*(A) < +∞ and ε > 0. Consider {E_n}_{n≥1} ⊆ L such that

\[ A \subseteq \bigcup_{n \ge 1} E_n \quad \text{and} \quad \sum_{n \ge 1} \mu(E_n) \le \mu^*(A) + \varepsilon; \tag{2.1.9} \]

see Proposition 2.1.34. Taking Problem 2.2 into account we find pairwise disjoint {C_n}_{n≥1} ⊆ L such that

\[ \hat{E} = \bigcup_{n \ge 1} E_n = \bigcup_{n \ge 1} C_n \in \sigma(D). \]

We know that μ*|_{σ(D)} and λ*|_{σ(D)} are both measures that coincide with μ on L. Therefore

\[ \mu^*(\hat{E}) = \sum_{n \ge 1} \mu^*(C_n) = \sum_{n \ge 1} \mu(C_n) = \sum_{n \ge 1} \lambda(C_n) = \lambda^*(\hat{E}). \tag{2.1.10} \]

Moreover, because of (2.1.8) and (2.1.9) as well as the σ-subadditivity of μ* and since μ*|_L = μ, we have

\[ \lambda^*(\hat{E} \setminus A) \le \mu^*(\hat{E} \setminus A) = \mu^*(\hat{E}) - \mu^*(A) \le \sum_{n \ge 1} \mu(E_n) - \mu^*(A) \le \varepsilon. \tag{2.1.11} \]

Hence μ*(A) ≤ μ*(Ê) = λ*(Ê) = λ(A) + λ*(Ê \ A) ≤ λ(A) + ε; see (2.1.10) and (2.1.11). Letting ε ↘ 0, we obtain μ*(A) ≤ λ(A). Therefore, λ(A) = μ*(A) for all A ∈ D.

The Lebesgue measure on ℝ was the starting point of “Measure Theory.” So, let us look in some detail at how we can produce it using the previous abstract theory. To this end, we introduce L = {(a, b] : a ≤ b, a, b ∈ ℝ} with (a, a] = ∅. This is a semiring of subsets of ℝ. Let λ : L → [0, +∞] be the set function defined by λ((a, b]) = b − a. This set function is σ-additive and σ-finite. Using Proposition 2.1.43, we know that λ has a unique extension to Σ* = Σ_λ, the σ-field of λ*-measurable sets; see Definition 2.1.36. We continue to denote this extension by λ. Then
– λ is the Lebesgue measure on ℝ.
– Σ* = Σ_λ is the σ-algebra of the Lebesgue measurable subsets of ℝ.
Note that λ is translation invariant, that is, λ(A) = λ(A + x) for all A ∈ Σ_λ and for all x ∈ ℝ. Moreover, we have λ(θA) = |θ|λ(A) for all A ∈ Σ_λ and for all θ ∈ ℝ.

From the previous discussion it is not clear whether Σ_λ = 2^ℝ. In fact the next theorem shows that this is not the case. Indeed there are subsets of ℝ that are not Lebesgue measurable.

Theorem 2.1.44. There is no translation invariant measure defined on all of 2^ℝ that assigns to every interval its length.

Proof. We will define a subset of ℝ that is not Lebesgue measurable. On ℝ we consider the following equivalence relation:

\[ x \sim u \quad \text{if and only if} \quad x - u \in \mathbb{Q}. \]

Choose a single element x ∈ [0, 1] from every equivalence class formed by ∼. Here we assume that the Axiom of Choice holds. Let A ⊆ [0, 1] be the set formed by these

2.1 Basic Notions, Measures, and Outer Measures | 97

representatives. Suppose that A ∈ Σ λ . Then by translation invariance we have that {A + r}r∈ℚ is a countable, Lebesgue measurable partition of ℝ with λ(A + r) = η independent of r ∈ ℚ. If η = 0, then we have a contradiction to the fact that λ(ℝ) = +∞. If η > 0, then, with D = ℚ ∩ [0, 1], we obtain 2 = λ([0, 2]) = ∑r∈D λ(A + r) = +∞, again a contradiction. Hence, A ∈ ̸ Σ λ . In general the measure theoretic and topological properties of sets in ℝ differ. Example 2.1.45. Singletons have a Lebesgue measure of zero. Hence, λ(ℚ) = 0. Let {r n }n≥1 ⊆ [0, 1] be an enumeration of the rationals in [0, 1]. Let I n = (r n −ε/2n , r n +ε/2n ) and let U = (0, 1) ∩ (⋃n≥1 I n ). Evidently, U ⊆ [0, 1] is open and dense, so topologically “large.” On the other hand we have λ(U) ≤ ∑n≥1 ε/2n = ε. Hence, U is measure theoretically “small.” Similarly, C = [0, 1] \ U is nowhere dense and closed, thus topologically small, but λ(C) ≥ 1 − ε, thus it is measure theoretically “large.” The Cantor set will help us to get an idea on what the relation is between B(ℝ) and Σ λ . Example 2.1.46. The Cantor set is constructed as follows. Let C0 = [0, 1]. We trisect [0, 1] and remove the open middle third (1/3, 2/3). We set C1 = [0, 1/3] ∪ [2/3, 1]. Then we trisect each of the two intervals of C1 and remove the open middle thirds. We obtain C2 = [0, 1/9] ∪ [2/9, 1/3] ∪ [2/3, 7/9] ∪ [8/9, 1]. We proceed inductively. So, suppose we have C n . This consists of 2n closed intervals. We trisect each one of them and remove the open middle thirds. The remaining part of C n is the set C n+1 , which is the union of 2n+1 disjoint closed intervals. Evidently {C n }n≥1 is decreasing. Then the Cantor set C of [0, 1] is defined by C = ⋂n≥1 C n . This set consists of those points x ∈ [0, 1], which in base −3 have an expansion x = ∑k≥1 a k 1/3k with a k ≠ 1 for all k ∈ ℕ. Proposition 2.1.47. The Cantor set C has the following properties: (a) C is compact and nowhere dense. (b) λ(C) = 0. 
(c) $\operatorname{card}(C) = \mathfrak{c}$ = the cardinality of the continuum.

Proof. (a) Clearly $C$ is closed since it is the intersection of closed sets. Being also bounded, $C$ is compact. Moreover, $\operatorname{int} C = \emptyset$: the set $C$ contains no interval, since $C \subseteq C_n$ for every $n$ and each interval of $C_n$ has length $1/3^n$. Therefore, $C$ is nowhere dense.

(b) At the $n$th step we remove $2^{n-1}$ open intervals, each one of length $1/3^n$. Therefore the total measure of the set removed at the $n$th step is $2^{n-1}/3^n$. Hence, we have

$$\lambda([0, 1] \setminus C) = \sum_{n\ge 1} \frac{2^{n-1}}{3^n} = \frac{1}{2} \sum_{n\ge 1} \Bigl(\frac{2}{3}\Bigr)^n = 1 .$$

Thus, $\lambda(C) = 0$.

(c) Let $x \in C$. Then $x = \sum_{k\ge 1} a_k/3^k$ with $a_k = 0$ or $a_k = 2$ for all $k \in \mathbb{N}$. Let $f(x) = \sum_{k\ge 1} c_k/2^k$ with $c_k = a_k/2$ for all $k \in \mathbb{N}$, which is a base-2 expansion of $f(x)$. Since every number in $[0, 1]$ has a base-2 expansion, $f : C \to [0, 1]$ is onto, thus $\operatorname{card}(C) = \mathfrak{c}$.
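The three parts of the proof can be checked on finite stages of the construction. The following is a minimal sketch in Python, using exact rational arithmetic; the function names `cantor_intervals`, `removed_measure` and `cantor_to_binary` are illustrative choices and not notation from the text.

```python
from fractions import Fraction

def cantor_intervals(n):
    """Return the 2**n closed intervals of C_n as (left, right) pairs."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        refined = []
        for left, right in intervals:
            third = (right - left) / 3
            refined.append((left, left + third))    # keep the left third
            refined.append((right - third, right))  # keep the right third
        intervals = refined
    return intervals

def removed_measure(n):
    """Total length removed in steps 1..n, i.e. the sum of 2**(k-1) / 3**k."""
    return sum(Fraction(2 ** (k - 1), 3 ** k) for k in range(1, n + 1))

def cantor_to_binary(digits):
    """Map ternary digits a_k in {0, 2} to the partial sum of (a_k / 2) / 2**k."""
    return sum(Fraction(a // 2, 2 ** k) for k, a in enumerate(digits, start=1))

c2 = cantor_intervals(2)
length_c5 = sum(right - left for left, right in cantor_intervals(5))
print(c2)
print(length_c5, removed_measure(5), length_c5 + removed_measure(5))
print(cantor_to_binary([2, 0, 2]))
```

The length of $C_n$ is $(2/3)^n$ and the removed length is $1 - (2/3)^n$, which is the partial-sum form of the series in part (b). The last line evaluates the map of part (c) on the finite digit string $2, 0, 2$.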

Remark 2.1.48. The Cantor set is interesting because it is “large” from the cardinality point of view but negligible from the measure theoretic point of view. We can generalize the above construction and obtain “Cantor-like sets” that still satisfy (a) and (c) from Proposition 2.1.47. So, let $I$ be a bounded interval and $\vartheta \in (0, 1)$. We call the open interval with the same midpoint as $I$ and length $\vartheta\lambda(I)$ the open middle $\vartheta$ of $I$. Now let $\{\vartheta_k\}_{k\ge 1} \subseteq (0, 1)$ and produce a decreasing sequence $\{\hat{C}_k\}_{k\ge 1}$ of closed sets in $[0, 1]$ as follows: $\hat{C}_0 = [0, 1]$ and $\hat{C}_k$ is produced by removing the open middle $\vartheta_k$ from each component interval of $\hat{C}_{k-1}$. We set $\hat{C} = \bigcap_{k\ge 1} \hat{C}_k$. We still have that $\hat{C}$ is compact and nowhere dense and $\operatorname{card}(\hat{C}) = \mathfrak{c}$. Concerning the Lebesgue measure, note that $\lambda(\hat{C}_k) = (1 - \vartheta_k)\lambda(\hat{C}_{k-1})$ for all $k \ge 1$. So,

$$\lambda(\hat{C}) = \prod_{k\ge 1} (1 - \vartheta_k) = \lim_{n\to\infty} \prod_{k=1}^{n} (1 - \vartheta_k) .$$

If $\vartheta_k = \vartheta \in (0, 1)$ for all $k \in \mathbb{N}$, then $\lambda(\hat{C}) = 0$. Note that the Cantor set corresponds to the particular case of $\vartheta = 1/3$. If $\vartheta_k \to 0$ sufficiently fast as $k \to \infty$, then $\lambda(\hat{C}) > 0$. In particular, $\prod_{k\ge 1} (1 - \vartheta_k) > 0$ if and only if $\sum_{k\ge 1} \vartheta_k < +\infty$.

We point out that part (c) of the proposition above implies that there are $2^{\mathfrak{c}}$ Lebesgue measurable subsets of $\mathbb{R}$, since every subset of the null set $C$ is Lebesgue measurable. On the other hand $\operatorname{card}(B(\mathbb{R})) = \mathfrak{c}$. So, there are many more Lebesgue measurable sets than Borel sets in $\mathbb{R}$, although it is not easy to produce a set that is Lebesgue measurable but not a Borel set. For such a concrete set we refer to Federer [109, p. 68].
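The dependence of the measure of a Cantor-like set on the ratios $\vartheta_k$ can be explored numerically. The sketch below only evaluates finite partial products in floating point, so it illustrates the trend rather than proving anything; the function name `cantor_like_measure` and the three choices of ratios are assumptions made for the illustration.

```python
def cantor_like_measure(thetas):
    """Lebesgue measure left after removing the open middle theta_k at step k."""
    measure = 1.0
    for theta in thetas:
        measure *= 1.0 - theta
    return measure

n = 1000
# Constant ratio 1/3 (the classical Cantor set): the products tend to 0.
constant = cantor_like_measure([1 / 3] * n)
# Summable ratios 1/(k+1)**2: the products stay bounded away from 0.
summable = cantor_like_measure([1 / (k + 1) ** 2 for k in range(1, n + 1)])
# Non-summable ratios 1/(k+1): the products tend to 0 again.
harmonic = cantor_like_measure([1 / (k + 1) for k in range(1, n + 1)])

print(constant, summable, harmonic)
```

For the last two choices the partial products telescope: the product of $1 - 1/(k+1)^2$ over $k = 1, \dots, n$ equals $(n+2)/(2(n+1))$, which tends to $1/2$, while the product of $1 - 1/(k+1)$ equals $1/(n+1)$, which tends to $0$. This matches the criterion that the product is positive exactly when the ratios are summable.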

2.2 Measurable Functions – Integration The Lebesgue integral is defined for measurable functions. For this reason we start this section with a discussion of measurable functions. Definition 2.2.1. Let (X, Σ) and (Y, L) be two measurable spaces and f : X → Y be a map. We say that f is (Σ, L)-measurable if f −1 (A) ∈ Σ for all A ∈ L. If X, Y are Hausdorff topological spaces, then they become measurable spaces by considering their Borel σ-algebras B(X), B(Y) and then f is said to be Borel measurable (or simply a Borel function). When Y = ℝ or Y = ℝ∗ we always use the Borel σ-field of Y. Remark 2.2.2. The reason that we use the Borel σ-algebra on ℝ as range space is that the Lebesgue σ-algebra Σ λ , as the completion of B(ℝ), is in general too large for the Lebesgue measure; see Remark 2.1.48. In particular, there exists a continuous, nondecreasing function h : [0, 1] → [0, 1] and a Lebesgue measurable set C ⊆ [0, 1] such that h−1 (C) is not Lebesgue measurable (assuming the Axiom of Choice). In fact h(x) = 1/2[f ̂(x) + x] with f ̂ being the function from the proof of Proposition 2.1.47(c) extended to all of [0, 1] by declaring it to be constant on each interval missing from C. Then f ̂ is nondecreasing and continuous and is known as the Cantor function. Proposition 2.2.3. If (X, Σ) and (Y, L) are measurable spaces, L = σ(a) and f : X → Y, then f is (Σ, L)-measurable if and only if f −1 (A) ∈ Σ for all A ∈ a.


Proof. ⇒: This is immediate from Definition 2.2.1.
⇐: Let $\mathcal{D} = \{A \subseteq Y : f^{-1}(A) \in \Sigma\}$. Evidently $\mathcal{D} \supseteq a$ and $\mathcal{D}$ is a σ-algebra. Therefore, $\mathcal{D} \supseteq \sigma(a) = \mathcal{L}$ and this proves the $(\Sigma, \mathcal{L})$-measurability of $f$.

Combining Propositions 2.1.18 and 2.2.3 we have the following result.

Proposition 2.2.4. If $(X, \Sigma)$ is a measurable space and $f : X \to \mathbb{R}$, then the following statements are equivalent:
(a) $f$ is Σ-measurable;
(b) $f^{-1}((a, +\infty)) \in \Sigma$ for all $a \in \mathbb{R}$;
(c) $f^{-1}([a, +\infty)) \in \Sigma$ for all $a \in \mathbb{R}$;
(d) $f^{-1}((-\infty, a]) \in \Sigma$ for all $a \in \mathbb{R}$;
(e) $f^{-1}((-\infty, a)) \in \Sigma$ for all $a \in \mathbb{R}$.

Remark 2.2.5. In case $f$ is $\mathbb{R}^*$-valued, we need to add the requirement that $f^{-1}(\{\pm\infty\}) \in \Sigma$ in the statements (b)–(e). Evidently we can take $a \in \mathbb{Q}$ in (b)–(e).

Immediately from Definition 2.2.1, we have that composition preserves measurability.

Proposition 2.2.6. If $(X, \Sigma)$, $(Y, \mathcal{L})$, $(Z, \mathcal{D})$ are measurable spaces and $f : X \to Y$, $g : Y \to Z$ are measurable maps, then $h = g \circ f : X \to Z$ is measurable as well.

Moreover, we have the following as a consequence of Proposition 2.2.3.

Proposition 2.2.7. If $X$, $Y$ are Hausdorff topological spaces and $f : X \to Y$ is continuous, then $f$ is Borel measurable.

Proposition 2.2.8. If $(X, \Sigma)$ is a measurable space and $f, g : X \to \mathbb{R}$ are Σ-measurable functions, then $f \pm g$ and $fg$ are both Σ-measurable.

Proof. If $f(x) + g(x) < a$, then $f(x) < a - g(x)$. Let $c \in \mathbb{Q}$ be such that $f(x) < c < a - g(x)$. So, we have that

$$\{x \in X : f(x) + g(x) < a\} = \bigcup_{c\in\mathbb{Q}} \bigl[\{x \in X : f(x) < c\} \cap \{x \in X : g(x) < a - c\}\bigr] \in \Sigma .$$

Hence $f + g$ is Σ-measurable. Since $-g$ is Σ-measurable whenever $g$ is, it follows that $f - g$ is Σ-measurable as well. For any Σ-measurable $h : X \to \mathbb{R}$ and $a \ge 0$, we have

$$\{x \in X : h(x)^2 > a\} = \{x \in X : h(x) > a^{1/2}\} \cup \{x \in X : h(x) < -a^{1/2}\} \in \Sigma .$$

Therefore $h^2$ is Σ-measurable. Since $fg = \frac{1}{2}\bigl[(f + g)^2 - f^2 - g^2\bigr]$, using the fact above and the Σ-measurability of $f + g$, we conclude that $fg$ is Σ-measurable.

Remark 2.2.9. The result above is also valid for $\mathbb{R}^*$-valued functions, provided we always take the same value for $f \pm g$ at the points where it is undefined, that is, of the

form $\infty - \infty$. In addition, recalling that we always define $0 \cdot (\pm\infty) = 0$, the function $fg$ is Σ-measurable for $\mathbb{R}^*$-valued $f$ and $g$.

Proposition 2.2.10. If $(X, \Sigma)$ is a measurable space and $f_n : X \to \mathbb{R}^*$ with $n \in \mathbb{N}$ are Σ-measurable, then

$$\sup\{f_n\}_{n=1}^{m}, \quad \inf\{f_n\}_{n=1}^{m}, \quad \sup_{n\ge 1} f_n, \quad \inf_{n\ge 1} f_n, \quad \liminf_{n\to\infty} f_n, \quad \limsup_{n\to\infty} f_n$$

are all Σ-measurable.

Proof. Let $g(x) = \sup_{1\le n\le m} f_n(x)$. Then for all $a \in \mathbb{R}$, we have

$$\{x \in X : g(x) > a\} = \bigcup_{n=1}^{m} \{x \in X : f_n(x) > a\} \in \Sigma .$$

Thus $g$ is Σ-measurable. Similarly, if $\hat{g}(x) = \sup_{n\ge 1} f_n(x)$, then for all $a \in \mathbb{R}$, we have

$$\{x \in X : \hat{g}(x) > a\} = \bigcup_{n\ge 1} \{x \in X : f_n(x) > a\} \in \Sigma .$$

In a similar fashion we also show that inf 1≤n≤m f n and inf n≥1 f n are both Σ-measurable. Finally, recall that lim inf n→∞ f n = supk≥1 inf n≥k f n and lim supn→∞ f n = inf k≥1 supn≥k f n , to conclude that both are Σ-measurable. When a sequence of measurable functions does not converge pointwise, we can still have the measurability of the set of points where pointwise convergence occurs. Proposition 2.2.11. If (X, Σ) is a measurable space and f n : X → ℝ with n ≥ 1 is a sequence of Σ-measurable functions, then the set C = {x ∈ X : limn→∞ f n (x) exists} ∈ Σ. Proof. Given x ∈ C, we have that {f n (x)}n≥1 ⊆ ℝ is a Cauchy sequence. So, for ε = 1/n with n ∈ ℕ we can find m = m(ε) ∈ ℕ such that |f m+k (x) − f m (x)|

0}). Let $D \subseteq I$ be a non-Borel set and let $V = \bigcup_{x\in D} U_x$. Evidently $V \subseteq I^I$ is open and $f^{-1}(V) = D$. This shows that $f$ is not measurable.

Definition 2.2.14. Let $(X, \Sigma, \mu)$ be a measure space. A statement about $x \in X$ is said to hold almost everywhere or a.e. (for almost all $x$ or a.a. $x \in X$) if there exists $D \in \Sigma$ with $\mu(D) = 0$ such that it holds for all $x \notin D$. Note that the set of all $x \in X$ for which the statement holds will be in $\Sigma_\mu$ but not necessarily in $\Sigma$.

Measurability is not affected by changing the function on a μ-null set.

Proposition 2.2.15. If $(X, \Sigma, \mu)$ is a complete measure space, $(Y, \mathcal{L})$ is a measurable space, $f : X \to Y$ is $(\Sigma, \mathcal{L})$-measurable and $g : X \to Y$ satisfies $f(x) = g(x)$ for μ-a.a. $x \in X$, then $g$ is $(\Sigma, \mathcal{L})$-measurable as well.

Next we introduce the functions which are the building blocks for the theory of integration.

Definition 2.2.16. Let $(X, \Sigma)$ be a measurable space.
(a) Given $A \subseteq X$, the characteristic function $\chi_A$ of $A$ is defined by

$$\chi_A(x) = \begin{cases} 1 & \text{if } x \in A , \\ 0 & \text{if } x \notin A . \end{cases}$$

(b) A simple function is a measurable function $s : X \to \mathbb{R}$ which has finite range. So, if $a_1, \dots, a_n$ are the distinct values of $s$, then we can write $s(x) = \sum_{k=1}^{n} a_k \chi_{A_k}(x)$ with $A_k = \{x \in X : s(x) = a_k\} \in \Sigma$. We call this the standard representation of $s$.

Remark 2.2.17. Since in probability theory a characteristic function is a Fourier transform, probabilists use the name indicator function and denote it by $i_A$. On the other hand, in nonsmooth analysis and optimization, this name and symbol are reserved for another function, namely

$$i_A(x) = \begin{cases} 0 & \text{if } x \in A , \\ +\infty & \text{if } x \notin A . \end{cases}$$

A simple function is a linear combination with distinct coefficients of characteristic functions of disjoint sets whose union is $X$. One of the coefficients $a_k$ may well be zero, but still the term $a_k \chi_{A_k}$ is implicitly understood in the standard representation so as to have $X = \bigcup_{k=1}^{n} A_k$. If $s$ and $\tau$ are simple functions, then so are $s + \tau$ and $s\tau$.

Simple functions approximate measurable functions.

Proposition 2.2.18. If $(X, \Sigma)$ is a measurable space and $f : X \to [0, +\infty]$ is a Σ-measurable function, then there exists a sequence $\{s_n\}_{n\ge 1}$ of simple functions on $X$ such that

$$0 \le s_1(x) \le s_2(x) \le \dots \le s_n(x) \to f(x) \quad \text{for all } x \in X \text{ as } n \to \infty .$$

Moreover the convergence is uniform on any set on which $f$ is bounded from above.

Proof. Given $n \in \mathbb{N}$ we partition the interval $[0, n)$ into $n2^n$ half-open intervals of length $1/2^n$.
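Then for each $k \in \mathbb{N}$ with $1 \le k \le n2^n$ we define

$$D_{n,k} = \Bigl\{x \in X : \frac{k-1}{2^n} \le f(x) < \frac{k}{2^n}\Bigr\} , \qquad D_n = \{x \in X : f(x) \ge n\} .$$

The Σ-measurability of $f$ implies that $D_{n,k}, D_n \in \Sigma$. We set

$$s_n = \sum_{k=1}^{n2^n} \frac{k-1}{2^n} \chi_{D_{n,k}} + n\chi_{D_n} .$$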
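The definition of $s_n$ depends on $x$ only through the value $f(x)$, so it can be evaluated directly. The following minimal sketch in Python does this; the function name `simple_approx` is an illustrative choice, and the sample values are arbitrary.

```python
import math

def simple_approx(value, n):
    """Return s_n(x) for a point x with f(x) == value, where value is in [0, +inf]."""
    if value >= n:
        # x lies in D_n, where s_n takes the value n.
        return float(n)
    # x lies in D_{n,k} with (k - 1) / 2**n <= value < k / 2**n,
    # so k - 1 is the integer part of value * 2**n.
    return math.floor(value * 2 ** n) / 2 ** n

for value in (0.7, 2.5, math.inf):
    print(value, [simple_approx(value, n) for n in range(1, 6)])
```

For a finite value the printed sequence increases to that value once $n$ exceeds it, with error below $1/2^n$. For the value $+\infty$ the sequence is $1, 2, 3, \dots$, which increases to $+\infty$.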


Evidently this is a simple function for every $n \in \mathbb{N}$. Let $x \in D_{n,k}$. Then

$$\frac{2k-2}{2^{n+1}} \le f(x) < \frac{2k}{2^{n+1}} ,$$

which implies that $s_{n+1}(x) = (2k-2)/2^{n+1}$ or $s_{n+1}(x) = (2k-1)/2^{n+1}$. Hence $s_n(x) \le s_{n+1}(x)$. Now let $x \in D_n$. Then $f(x) \ge n$ and we have $f(x) \ge n + 1$ or $n \le f(x) < n + 1$. If the first case holds, then $s_{n+1}(x) = n + 1 > n = s_n(x)$. In the second case, let $k \in \{1, \dots, (n+1)2^{n+1}\}$ be such that $(k-1)/2^{n+1} \le f(x) < k/2^{n+1}$. Since $f(x) \ge n$ it follows that $k/2^{n+1} > n$, hence $k - 1 \ge n2^{n+1}$. Therefore, $s_{n+1}(x) = (k-1)/2^{n+1} \ge n = s_n(x)$. This proves that $s_n \le s_{n+1}$.

Now we prove the pointwise convergence. So, fix $x \in X$ such that $f(x) \in [0, +\infty)$ and let $n > f(x)$. Then

$$0 \le f(x) - s_n(x) < \frac{1}{2^n} , \tag{2.2.3}$$

so $s_n(x) \to f(x)$. If $f(x) = +\infty$, then $s_n(x) = n \to +\infty$. Finally, if $f(x) \le M$ for some $M > 0$ and for all $x \in X$, then (2.2.3) holds for every $x \in X$ provided $n > M$. Therefore $s_n \to f$ uniformly.

If $f^+ = \max\{f, 0\}$ and $f^- = \max\{-f, 0\}$, then $f = f^+ - f^-$ as well as $|f| = f^+ + f^-$, and if $f : X \to \mathbb{R}$ is Σ-measurable, then so are $f^+$ and $f^-$; see Proposition 2.2.10. So using Proposition 2.2.18 on each of the functions $f^+$ and $f^-$ we have the following.

Corollary 2.2.19. If $(X, \Sigma)$ is a measurable space and $f : X \to \mathbb{R}$ is Σ-measurable, then there exists a sequence $\{s_n\}_{n\ge 1}$ of simple functions on $X$ such that

$$|s_1| \le |s_2| \le \dots \le |s_n| \le \dots \le |f| , \qquad s_n(x) \to f(x) \quad \text{for all } x \in X .$$

Moreover if $f$ is bounded, then the convergence is uniform.

We can extend these results to maps with values in a separable metric space. This is useful when studying integration of Banach space-valued maps; see the Lebesgue–Bochner integral in Section 4.2.

Proposition 2.2.20. If $(X, \Sigma)$ is a measurable space, $(Y, d)$ is a separable metric space and $f : X \to Y$, then the following hold:
(a) If $(Y, d)$ is in addition totally bounded, then $f$ is Σ-measurable if and only if it is the $d$-uniform limit of a sequence of simple functions with values in $Y$.
(b) $f$ is Σ-measurable if and only if $f$ is the $d$-pointwise limit of a sequence of simple functions with values in $Y$.

Proof. (a) ⇒: Suppose that $f : X \to Y$ is Σ-measurable and let $\varepsilon > 0$. Since $Y$ is by hypothesis totally bounded, there exist $y_1, \dots, y_m \in Y$ such that $Y = \bigcup_{k=1}^{m} B_\varepsilon(y_k)$ with $B_\varepsilon(y_k) = \{y \in Y : d(y, y_k) < \varepsilon\}$. We set $A_1 = B_\varepsilon(y_1)$ and $A_{k+1} = B_\varepsilon(y_{k+1}) \setminus \bigcup_{i=1}^{k} B_\varepsilon(y_i)$ for all $k \in \{1, \dots, m-1\}$. Then $\{A_k\}_{k=1}^{m}$ are mutually disjoint Borel sets in $Y$ whose union is $Y$. We have

$$X = \bigcup_{k=1}^{m} f^{-1}(A_k) \quad \text{and} \quad f^{-1}(A_k) \cap f^{-1}(A_n) = \emptyset \ \text{ if } k \ne n .$$

We define s : X → Y by s(x) = y k if x ∈ f −1 (A k ). Evidently s is a simple function and d(s(x), f(x)) < ε for all x ∈ X. Therefore f is the d-uniform limit of a sequence of simple functions with values in Y. ⇐: This is a consequence of Proposition 2.2.12. (b) By Theorem 1.5.21 there is a homeomorphism (embedding) ξ : Y → ℍ onto a subset of the Hilbert cube ℍ = [0, 1]ℕ . Let e(u, y) = dℍ (ξ(u), ξ(y)) for all u, y ∈ Y. Then e is a metric on Y, compatible with d and (Y, e) is totally bounded. By part (a) we know that f is the e-uniform limit of a sequence of simple functions. Since e and d are topologically equivalent, we have that the sequence of simple functions is d-pointwise convergent to f . Definition 2.2.21. Let {(Y α , Lα )}α∈I be a family of measurable spaces and f α : X → Y α be a map for each α ∈ I. There is a unique σ-algebra on X with respect to which the f α ’s are all measurable and this is the σ-algebra generated by the sets f α−1 (A α ) for all A α ∈ Lα and all α ∈ I. It is called the σ-algebra generated by {f α }α∈I and is denoted by σ({f α }). Proposition 2.2.22. If (Y, L) is a measurable space, f : X → Y and g : X → ℝ are given maps, then g is σ(f)-measurable if and only if there exists a L-measurable h : Y → ℝ such that g = h ∘ f . Proof. ⇒: First we assume that g is a σ(f)-simple function. Then g = ∑nk=1 a k χ A k with a k ∈ ℝ and A k ∈ σ(f). For k ∈ {1, . . . , n} let C k ∈ L be such that A k = f −1 (C k ). We set h = ∑nk=1 a k χ C k . Then h is a L-simple function on Y and clearly g = h ∘ f . Now suppose that g is a general σ(f)-measurable function. Then by Corollary 2.2.19 there exists a sequence {s n }n≥1 of σ(f)-simple functions such that s n (x) → g(x) for all x ∈ X. From the first part of the proof we can find h n : Y → ℝ with n ∈ ℕ being L-measurable functions such that s n = h n ∘ f with n ∈ ℕ. Let E = {y ∈ Y : limn→∞ h n (y) exists in ℝ}. Since h n (f(x)) = s n (x) → g(x) it follows that f(X) ⊆ E. 
Define

$$h(y) = \lim_{n\to\infty} h_n(y) \ \text{ if } y \in E \quad \text{and} \quad h(y) = 0 \ \text{ if } y \notin E .$$

From the inclusion f(X) ⊆ E it follows that g = h ∘ f . Moreover, from Proposition 2.2.11 we know that E ∈ L. Hence h n χ E is L-measurable and since h n χ E → hχ E it follows that h is L-measurable. ⇐: This follows from Proposition 2.2.6. Definition 2.2.23. Let {(X α , Σ α )}α∈I be a family of measurable spaces. Set X = ∏α∈I X α and let p α : X → X α with α ∈ I be the corresponding projection (coordinate) maps. Then the product σ-algebra on X denoted by ⨂α∈I Σ α is defined by ⨂α∈I Σ α = σ({p α }).


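Remark 2.2.24. Let $(X, \Sigma)$, $(Y, \mathcal{L})$ be two measurable spaces. A set of the form $A \times B$ with $A \in \Sigma$, $B \in \mathcal{L}$ is said to be a measurable rectangle. By $\mathcal{R}$ we denote the family of measurable rectangles in $X \times Y$. It is easy to see that $\mathcal{R}$ is closed under finite intersections and that the complement of a measurable rectangle is a finite disjoint union of measurable rectangles, so $\mathcal{R}$ is a semialgebra. Then $\Sigma \otimes \mathcal{L} = \sigma(\mathcal{R})$. More generally if the index set $I$ is countable, then

$$\bigotimes_{\alpha\in I} \Sigma_\alpha = \sigma\Bigl(\Bigl\{\prod_{\alpha\in I} A_\alpha : A_\alpha \in \Sigma_\alpha\Bigr\}\Bigr) .$$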

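Proposition 2.2.25. If $\{(X_\alpha, \Sigma_\alpha)\}_{\alpha\in I}$ are measurable spaces and each $\Sigma_\alpha$ is generated by $a_\alpha$, then $\bigotimes_{\alpha\in I} \Sigma_\alpha$ is generated by $\hat{a} = \{p_\alpha^{-1}(B_\alpha) : B_\alpha \in a_\alpha, \alpha \in I\}$. Moreover, if the index set $I$ is countable, then $\bigotimes_{\alpha\in I} \Sigma_\alpha$ is generated by $\tilde{a} = \{\prod_{\alpha\in I} B_\alpha : B_\alpha \in a_\alpha\}$.

Proof. From Definition 2.2.23 it is clear that $\sigma(\hat{a}) \subseteq \bigotimes_{\alpha\in I} \Sigma_\alpha$. Let

$$\mathcal{D}_\alpha = \{B \subseteq X_\alpha : p_\alpha^{-1}(B) \in \sigma(\hat{a})\} , \quad \alpha \in I .$$

It is easy to see that $\mathcal{D}_\alpha$ is a σ-algebra and $a_\alpha \subseteq \mathcal{D}_\alpha$. Therefore $\Sigma_\alpha \subseteq \mathcal{D}_\alpha$ for all $\alpha \in I$. Hence $\bigotimes_{\alpha\in I} \Sigma_\alpha \subseteq \sigma(\hat{a})$ and so equality holds. The second assertion follows from Remark 2.2.24.

Proposition 2.2.26. If $\{X_k\}_{k=1}^{n}$ are Hausdorff topological spaces, then the following hold:
(a) $\bigotimes_{k=1}^{n} B(X_k) \subseteq B(\prod_{k=1}^{n} X_k)$;
(b) If $\{X_k\}_{k=1}^{n}$ are second countable, then $\bigotimes_{k=1}^{n} B(X_k) = B(\prod_{k=1}^{n} X_k)$.

Proof. (a) By Proposition 2.2.25, $\bigotimes_{k=1}^{n} B(X_k)$ is generated by the sets $p_k^{-1}(U_k)$ with open $U_k \subseteq X_k$ for all $k \in \{1, \dots, n\}$. These sets are open in $X = \prod_{k=1}^{n} X_k$ and so, we infer that $\bigotimes_{k=1}^{n} B(X_k) \subseteq B(X)$.
(b) Let $\mathcal{D}_k$ be a countable basis of $X_k$, $k \in \{1, \dots, n\}$. Recall that every open set in $X_k$ is a countable union of elements in $\mathcal{D}_k$. Therefore $B(X_k)$ is generated by $\mathcal{D}_k$ and $B(X)$ is generated by $\hat{\mathcal{D}} = \{\prod_{k=1}^{n} B_k : B_k \in \mathcal{D}_k\}$. Hence, by Proposition 2.2.25, we conclude that $\bigotimes_{k=1}^{n} B(X_k) = B(X)$.

Definition 2.2.27. Let $X$, $Y$ be nonempty sets and $A \subseteq X \times Y$. For each $x \in X$ and each $y \in Y$, the $x$-section of $A$ (resp. the $y$-section of $A$) is defined by

$$A_x = \{y \in Y : (x, y) \in A\} \qquad (\text{resp. } A^y = \{x \in X : (x, y) \in A\}) .$$

Clearly for every $x \in X$ and every $y \in Y$ we have $\emptyset_x = \emptyset^y = \emptyset$ and $(X \times Y)_x = Y$ as well as $(X \times Y)^y = X$.

Remark 2.2.28. If $\{A_\alpha\}_{\alpha\in I}$ is a family of subsets of $X \times Y$, then for all $x \in X$ and for all $y \in Y$ we have

$$\Bigl(\bigcup_{\alpha\in I} A_\alpha\Bigr)_x = \bigcup_{\alpha\in I} (A_\alpha)_x , \qquad \Bigl(\bigcup_{\alpha\in I} A_\alpha\Bigr)^y = \bigcup_{\alpha\in I} (A_\alpha)^y ,$$

$$\Bigl(\bigcap_{\alpha\in I} A_\alpha\Bigr)_x = \bigcap_{\alpha\in I} (A_\alpha)_x , \qquad \Bigl(\bigcap_{\alpha\in I} A_\alpha\Bigr)^y = \bigcap_{\alpha\in I} (A_\alpha)^y .$$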

So, it follows that if $\mathcal{L}$ is a σ-algebra on $X$ and $\mathcal{D} = \{A \subseteq X \times Y : A^y \in \mathcal{L} \text{ for all } y \in Y\}$, then $\mathcal{D}$ is a σ-algebra on $X \times Y$. Similarly for $\mathcal{F}$ being a σ-algebra on $Y$. Finally, if $(X, \Sigma)$ and $(Y, \mathcal{L})$ are measurable spaces and $A \subseteq X \times Y$, then we say that $A$ has measurable sections if for all $x \in X$ and for all $y \in Y$, $A_x \in \mathcal{L}$ and $A^y \in \Sigma$.

Proposition 2.2.29. If $(X, \Sigma)$ and $(Y, \mathcal{L})$ are measurable spaces and $A \in \Sigma \otimes \mathcal{L}$, then $A$ has measurable sections.

Proof. Let

$$\hat{\mathcal{D}} = \{A \subseteq X \times Y : A_x \in \mathcal{L} \text{ and } A^y \in \Sigma \text{ for all } x \in X \text{ and for all } y \in Y\} .$$

Then $\hat{\mathcal{D}}$ is a σ-algebra that contains the measurable rectangles. Indeed, note that

$$(A \times B)_x = \begin{cases} B & \text{if } x \in A , \\ \emptyset & \text{if } x \notin A \end{cases} \qquad \text{and} \qquad (A \times B)^y = \begin{cases} A & \text{if } y \in B , \\ \emptyset & \text{if } y \notin B . \end{cases}$$

Therefore, we have that $\sigma(\mathcal{R}) = \Sigma \otimes \mathcal{L} \subseteq \hat{\mathcal{D}}$; see Remark 2.2.24.

Definition 2.2.30. Let $(X, \Sigma)$ be a measurable space, let $Y$ and $V$ be two Hausdorff topological spaces and $f : X \times Y \to V$. We say that $f$ is a Carathéodory function if the following properties hold:
(a) $x \mapsto f(x, y)$ is Σ-measurable for every $y \in Y$;
(b) $y \mapsto f(x, y)$ is continuous for every $x \in X$.

Proposition 2.2.31. If $(X, \Sigma)$ is a measurable space, $Y$ is a separable metrizable space, $V$ is a metrizable space and $f : X \times Y \to V$ is a Carathéodory function, then $f$ is jointly measurable, that is, $f$ is $(\Sigma \otimes B(Y), B(V))$-measurable.

Proof. Let $d$ be a compatible metric for $Y$ and $e$ a compatible metric for $V$. Recall that $Y$ is separable. So, let $D = \{y_k\}_{k\ge 1}$ be dense in $Y$. Moreover, let $C \subseteq V$ be a closed set. Then $f(x, u) \in C$ if and only if for every $n \in \mathbb{N}$ there exists $y_k \in D$ such that $d(u, y_k)$

0} and A n = {x ∈ X : f(x) ≥ 1/n} with n ∈ ℕ. Then A n ↗ A and so μ(A n ) ↗ μ(A); see Proposition 2.1.26. If μ(A) > 0, then there exists n ∈ ℕ such that μ(A n ) > 0. We have 0

0. Hence $A$ is σ-finite.

(c) The second equivalence is obvious. Moreover, if $f = g$ μ-a.e., then $\int_C f\,d\mu = \int_C g\,d\mu$ for all $C \in \Sigma$. So, it remains to show that $\int_C f\,d\mu = \int_C g\,d\mu$ for all $C \in \Sigma$ implies that $f = g$ μ-a.e. To this end let $C = \{x \in X : (f - g)(x) \ne 0\} \in \Sigma$. Suppose that $\mu(C) > 0$ and set $C_n = \{x \in X : |(f - g)(x)| \ge 1/n\} \in \Sigma$. As above there exists $n \in \mathbb{N}$ such that $\mu(C_n) > 0$. We have $C_n = C_n^+ \cup C_n^-$ with

$$C_n^+ = \Bigl\{x \in X : (f - g)(x) \ge \frac{1}{n}\Bigr\} \in \Sigma \quad \text{and} \quad C_n^- = \Bigl\{x \in X : (f - g)(x) \le -\frac{1}{n}\Bigr\} \in \Sigma .$$

So, at least one of $C_n^+$, $C_n^-$ has positive μ-measure. To fix things, suppose that $\mu(C_n^+) > 0$. Then

$$0 = \int_{C_n^+} (f - g)\,d\mu \ge \frac{1}{n}\mu(C_n^+) > 0 ,$$

a contradiction. Therefore $\mu(C) = 0$ and so $f = g$ μ-a.e. as in the assertion.

The next result is known as the “Markov inequality.”

Proposition 2.2.41 (Markov inequality). If $(X, \Sigma, \mu)$ is a measure space and $f : X \to \mathbb{R}^*$ is μ-integrable, then for any $\lambda \in (0, +\infty)$ we have

$$\mu(\{x \in X : |f(x)| \ge \lambda\}) \le \frac{1}{\lambda} \int_X |f|\,d\mu .$$

Proof. Let $A_\lambda = \{x \in X : |f(x)| \ge \lambda\} \in \Sigma$. Then

$$\infty > \int_X |f|\,d\mu \ge \int_{A_\lambda} |f|\,d\mu \ge \lambda\mu(A_\lambda)$$

implies

$$\mu(A_\lambda) \le \frac{1}{\lambda} \int_X |f|\,d\mu .$$
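The Markov inequality can be verified by hand on a finite measure space. The sketch below, in Python, uses a hypothetical five-point space in which each point carries a weight; the data and the function names `abs_integral` and `tail_measure` are assumptions made for the illustration.

```python
def abs_integral(values, weights):
    """Integral of |f| with respect to the weighted counting measure."""
    return sum(abs(v) * w for v, w in zip(values, weights))

def tail_measure(values, weights, level):
    """Measure of the set where |f| is at least level."""
    return sum(w for v, w in zip(values, weights) if abs(v) >= level)

# Hypothetical data: f(x_i) = values[i] and mu({x_i}) = weights[i].
values = [0.5, -2.0, 3.0, 1.0, -4.0]
weights = [1.0, 0.5, 2.0, 1.5, 0.25]

total = abs_integral(values, weights)
for level in (0.5, 1.0, 2.0, 3.0, 4.0):
    lhs = tail_measure(values, weights, level)
    rhs = total / level
    print(f"level={level}: tail measure {lhs} <= bound {rhs}")
```

The bound is far from sharp for small levels and tightens as the level grows, which is the typical behaviour of this estimate.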

Proposition 2.2.42. If $(X, \Sigma, \mu)$ is a measure space and $f : X \to \mathbb{R}^*$ is μ-integrable, then the following hold:
(a) $\mu(\{x \in X : |f(x)| = +\infty\}) = 0$, that is, $f$ is μ-a.e. $\mathbb{R}$-valued;
(b) if $B \in \Sigma$ and $\mu(B) = 0$, then $\int_B f\,d\mu = 0$.

Proof. (a) From Proposition 2.2.41 we see that for all $\lambda > 0$, $\mu(\{x \in X : |f(x)| \ge \lambda\}) < +\infty$ and $\lim_{\lambda\to+\infty} \mu(\{x \in X : |f(x)| \ge \lambda\}) = 0$. Note that

$$\{x \in X : |f(x)| \ge n\} \searrow \{x \in X : |f(x)| = +\infty\} \quad \text{as } n \to \infty .$$

This gives, due to Proposition 2.1.24(f),

$$\mu(\{x \in X : |f(x)| = +\infty\}) = \lim_{n\to\infty} \mu(\{x \in X : |f(x)| \ge n\}) = 0 .$$

(b) We may assume that $f \ge 0$ since $f = f^+ - f^-$. If $f$ is a simple function, then clearly from Definitions 2.2.35(a) and 2.2.37 we have $\int_B f\,d\mu = 0$. Then Definition 2.2.35(b) implies that $\int_B f\,d\mu = 0$ for general $f \ge 0$.

2.3 Convergence Theorems and $L^p$-Spaces

We start with certain convergence theorems that reveal the continuity properties of the Lebesgue integral. The first such result is the so-called “Beppo Levi Theorem.”

Theorem 2.3.1 (Beppo Levi Theorem). If $(X, \Sigma, \mu)$ is a measure space and $f_n : X \to \mathbb{R}^*_+$ with $n \in \mathbb{N}$ is an increasing sequence of Σ-measurable functions such that $f_n \nearrow f$, then $\lim_{n\to\infty} \int_X f_n\,d\mu = \int_X f\,d\mu$.

Proof. From Proposition 2.2.10 we have that $f$ is Σ-measurable. The monotonicity of the integral implies that

$$\lim_{n\to\infty} \int_X f_n\,d\mu \le \int_X f\,d\mu . \tag{2.3.1}$$

Claim: If $s$ is a simple function and $s \le f$, then $\int_X s\,d\mu \le \lim_{n\to\infty} \int_X f_n\,d\mu$.

For every $x \in X$ and every $\eta \in (0, 1)$ there exists $n_0 = n_0(x, \eta) \in \mathbb{N}$ such that $\eta s(x) \le f_n(x)$ for all $n \ge n_0$. If we set $B_n = \{x \in X : \eta s(x) \le f_n(x)\}$, then $\{B_n\}_{n\ge 1} \subseteq \Sigma$ and $B_n \nearrow X$. We have $\eta\chi_{B_n} s \le \chi_{B_n} f_n \le f_n$. Let $s = \sum_{k=1}^{m} a_k \chi_{A_k}$ be the standard representation of the simple function $s$. Then one gets

$$\eta \sum_{k=1}^{m} a_k \mu(A_k \cap B_n) = \eta \int_X \chi_{B_n} s\,d\mu \le \int_X f_n\,d\mu \le \sup_{n\ge 1} \int_X f_n\,d\mu = \lim_{n\to\infty} \int_X f_n\,d\mu . \tag{2.3.2}$$


Note that for every $k \in \{1, \dots, m\}$, due to Proposition 2.1.24(e), it holds that $\mu(A_k \cap B_n) \nearrow \mu(A_k)$ as $n \to \infty$. This implies, because of (2.3.2), that

$$\eta \sum_{k=1}^{m} a_k \mu(A_k) = \eta \int_X s\,d\mu \le \lim_{n\to\infty} \int_X f_n\,d\mu .$$

Recall that $\eta \in (0, 1)$ is arbitrary. So, let $\eta \to 1^-$. Then $\int_X s\,d\mu \le \lim_{n\to\infty} \int_X f_n\,d\mu$. This proves the claim.

From the claim and Definition 2.2.35(b), we derive

$$\int_X f\,d\mu \le \lim_{n\to\infty} \int_X f_n\,d\mu . \tag{2.3.3}$$

From (2.3.1) and (2.3.3) we conclude that $\int_X f_n\,d\mu \nearrow \int_X f\,d\mu$.

Corollary 2.3.2. If $(X, \Sigma, \mu)$ is a measure space and $f : X \to \mathbb{R}^*_+$ is Σ-measurable, then $\int_X f\,d\mu = \lim_{n\to\infty} \int_X s_n\,d\mu$ for every increasing sequence of simple functions $s_n \nearrow f$.

Now we can prove the famous “Monotone Convergence Theorem.”

Theorem 2.3.3 (Monotone Convergence Theorem). If $(X, \Sigma, \mu)$ is a measure space and $f_n : X \to \mathbb{R}^*$ with $n \in \mathbb{N}$ is a sequence of Σ-measurable functions such that $f_n \nearrow f$ and $\int_X f_1\,d\mu > -\infty$, then $\int_X f_n\,d\mu \nearrow \int_X f\,d\mu$ as $n \to \infty$.

Proof. Just let $g_n = f_n - f_1 \ge 0$ for all $n \in \mathbb{N}$ and apply Theorem 2.3.1 to this sequence.

Remark 2.3.4. The hypothesis that $\int_X f_1\,d\mu > -\infty$ cannot be removed. To see this, let $X = \mathbb{R}$ with the Lebesgue measure and consider the sequence $f_n = -\chi_{[n,\infty)}$ with $n \in \mathbb{N}$. Then $f_n \nearrow 0$ but $\int_X f_n\,d\mu = -\infty$ for all $n \in \mathbb{N}$. Moreover, there is a “decreasing” version of the theorem, namely $f_n \searrow f$ and $\int_X f_1\,d\mu < +\infty$ imply that $\int_X f_n\,d\mu \searrow \int_X f\,d\mu$.

We can also formulate Theorem 2.3.3 in a series form.

Theorem 2.3.5. If $(X, \Sigma, \mu)$ is a measure space and $f_n : X \to \mathbb{R}^*_+$ with $n \in \mathbb{N}$ is a sequence of Σ-measurable functions, then

$$\int_X \Bigl(\sum_{n\ge 1} f_n\Bigr) d\mu = \sum_{n\ge 1} \int_X f_n\,d\mu .$$

The next convergence theorem is known as “Fatou’s Lemma.”

Theorem 2.3.6 (Fatou’s Lemma). If $(X, \Sigma, \mu)$ is a measure space and $f_n, h : X \to \mathbb{R}^*$ with $n \in \mathbb{N}$ are Σ-measurable functions, then the following hold:
(a) If $h \le f_n$ μ-a.e. for all $n \in \mathbb{N}$ and $-\infty < \int_X h\,d\mu$, then

$$\int_X \liminf_{n\to\infty} f_n\,d\mu \le \liminf_{n\to\infty} \int_X f_n\,d\mu .$$

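(b) If $f_n \le h$ μ-a.e. for all $n \in \mathbb{N}$ and $\int_X h\,d\mu < +\infty$, then

$$\limsup_{n\to\infty} \int_X f_n\,d\mu \le \int_X \limsup_{n\to\infty} f_n\,d\mu .$$

Proof. (a) Let $g_n = \inf_{k\ge n} f_k$ with $n \in \mathbb{N}$. Then $g_n \ge h$ μ-a.e. for all $n \in \mathbb{N}$ and $g_n \nearrow \liminf_{n\to\infty} f_n$. Invoking the Monotone Convergence Theorem (see Theorem 2.3.3) we have

$$\int_X g_n\,d\mu \nearrow \int_X \liminf_{n\to\infty} f_n\,d\mu .$$

Since $g_n \le f_n$, it follows that $\int_X g_n\,d\mu \le \int_X f_n\,d\mu$ for all $n \in \mathbb{N}$, which implies

$$\int_X \liminf_{n\to\infty} f_n\,d\mu \le \liminf_{n\to\infty} \int_X f_n\,d\mu .$$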

(b) Just apply (a) to the sequence $\{-f_n\}_{n\ge 1}$.

Remark 2.3.7. The bound by $h$ cannot be removed. To see this, consider $X = \mathbb{R}$ and $\mu = \lambda$ being the Lebesgue measure. Let $f_n = -\frac{1}{n}\chi_{[0,n]}$ for all $n \in \mathbb{N}$. Then $\liminf_{n\to\infty} \int_{\mathbb{R}} f_n\,d\lambda = -1 < 0 = \int_{\mathbb{R}} \liminf_{n\to\infty} f_n\,d\lambda$ and so the conclusion of Fatou’s Lemma fails.

Now we will present the main convergence theorem for the Lebesgue integral known as the “Lebesgue Dominated Convergence Theorem.” It allows us to interchange limits and integrals under general conditions and is the main reason why the Lebesgue integral is more powerful than the Riemann integral.

Theorem 2.3.8 (Lebesgue Dominated Convergence Theorem). If $(X, \Sigma, \mu)$ is a measure space and $f_n : X \to \mathbb{R}^*$ with $n \in \mathbb{N}$ is a sequence of Σ-measurable functions such that
– $f_n(x) \to f(x)$ for μ-a.a. $x \in X$;
– $|f_n(x)| \le h(x)$ for μ-a.a. $x \in X$ and for all $n \in \mathbb{N}$ with $h$ being a μ-integrable function,
then $f$ is μ-integrable and $\int_X |f_n - f|\,d\mu \to 0$. In particular there holds

$$\int_X f_n\,d\mu \to \int_X f\,d\mu \quad \text{as } n \to \infty .$$

Proof. From Proposition 2.2.12 we know that $f$ is Σ-measurable. Moreover, $|f(x)| \le h(x)$ for μ-a.a. $x \in X$. Therefore, $f$ is μ-integrable. Note that $0 \le |f_n - f| \le 2h$ μ-a.e. for all $n \in \mathbb{N}$. Applying Fatou’s Lemma, Theorem 2.3.6, gives

$$0 \le \liminf_{n\to\infty} \int_X |f_n - f|\,d\mu \le \limsup_{n\to\infty} \int_X |f_n - f|\,d\mu \le 0 ,$$

which implies $\int_X |f_n - f|\,d\mu \to 0$ as $n \to \infty$. Hence, $\int_X (f_n - f)\,d\mu \to 0$ and so

$$\int_X f_n\,d\mu \to \int_X f\,d\mu \quad \text{as } n \to \infty .$$
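The role of the dominating function can be seen numerically on $[0, 1]$ with the Lebesgue measure. The following minimal sketch in Python compares a dominated sequence, $f_n(x) = x^n$ with $|f_n| \le 1$, with the sequence $n\chi_{[0,1/n]}$, which admits no integrable dominating function. The integrals are approximated by a midpoint rule, which is an assumption of the sketch and not part of the theory; the names `midpoint_integral`, `dominated` and `undominated` are illustrative.

```python
def midpoint_integral(func, a, b, cells=20_000):
    """Approximate the integral of func over [a, b] by the midpoint rule."""
    width = (b - a) / cells
    return width * sum(func(a + (i + 0.5) * width) for i in range(cells))

def dominated(n):
    """f_n(x) = x**n, bounded by the integrable function h = 1 on [0, 1]."""
    return lambda x: x ** n

def undominated(n):
    """f_n = n on [0, 1/n] and 0 elsewhere; no integrable bound exists."""
    return lambda x: float(n) if x <= 1.0 / n else 0.0

for n in (1, 10, 50):
    print(n, midpoint_integral(dominated(n), 0.0, 1.0),
          midpoint_integral(undominated(n), 0.0, 1.0))
```

Both sequences converge to $0$ almost everywhere. The integrals of the first equal $1/(n+1)$ and tend to $0$, while the integrals of the second stay equal to $1$.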


Remark 2.3.9. If the dominating function $h$ is not μ-integrable, then the theorem fails in general. To see this, consider $X = [0, 1]$ and $\mu = \lambda$ being the Lebesgue measure. Let $f_n = n\chi_{[0,1/n]}$ with $n \in \mathbb{N}$. Then $\lim_{n\to\infty} \int_0^1 f_n\,d\lambda = 1 \ne 0 = \int_0^1 \lim_{n\to\infty} f_n\,d\lambda$.

We have already seen in Proposition 2.2.42(b) that integration is insensitive to changes on null sets. Hence, we can integrate functions $f$ that are only defined on a measurable set $A$ with a null complement by simply setting $f|_{A^c} = 0$. This also implies that if $f$ is $\mathbb{R}^*$-valued and it is a.e. $\mathbb{R}$-valued, then for the purposes of integration we can treat $f$ as $\mathbb{R}$-valued. With this in mind we are led to the introduction of the following spaces of integrable functions.

Definition 2.3.10. Let $(X, \Sigma, \mu)$ be a measure space and let $1 \le p < \infty$. For any Σ-measurable function $f : X \to \mathbb{R}^*$ we define

$$\|f\|_p = \Bigl(\int_X |f|^p\,d\mu\Bigr)^{1/p} .$$

Let

$$\mathcal{L}^p(X) = \{f : X \to \mathbb{R}^* : f \text{ is Σ-measurable}, \ \|f\|_p < +\infty\} .$$

Evidently $\mathcal{L}^p(X)$ is a vector space. However in order to have a vector space on which $\|\cdot\|_p$ is a norm, we need to take care of functions that differ only on a μ-null set. So, we consider the following equivalence relation on $\mathcal{L}^p(X)$:

$$f \sim h \quad \text{if and only if} \quad f(x) = h(x) \ \text{ for μ-a.a. } x \in X .$$

Then we define $L^p(X) = \mathcal{L}^p(X)/\sim$. Next let $f : X \to \mathbb{R}^*$ be Σ-measurable and define the essential supremum $\|f\|_\infty$ by

$$\|f\|_\infty = \inf\{\vartheta \ge 0 : \mu(\{x \in X : |f(x)| \ge \vartheta\}) = 0\}$$

with the convention that $\inf \emptyset = +\infty$. We define

$$\mathcal{L}^\infty(X) = \{f : X \to \mathbb{R}^* : f \text{ is Σ-measurable}, \ \|f\|_\infty < +\infty\}$$

and $L^\infty(X) = \mathcal{L}^\infty(X)/\sim$.

Given $1 \le p < \infty$ we say that $1 < p' \le \infty$ is the conjugate of $p$ if $1/p + 1/p' = 1$. Note that $p' = p/(p-1)$. Recall the following elementary inequality known as “Young’s inequality.” It is a very special case of the so-called “Young–Fenchel inequality,” which we discuss in Section 5.3.

Lemma 2.3.11 (Young’s inequality). If $p, p' \in (1, \infty)$ are conjugate exponents and $a, b \ge 0$, then $ab \le \frac{1}{p}a^p + \frac{1}{p'}b^{p'}$ with equality if and only if $b = a^{p-1}$.

Next we will present three inequalities that are very basic in the theory of $L^p$-spaces. The first inequality is known as “Hölder’s inequality.”

Theorem 2.3.12 (Hölder’s inequality). If $(X, \Sigma, \mu)$ is a measure space, $1 \le p < \infty$, $1 < p' \le \infty$ are conjugate exponents and $f \in L^p(X)$, $h \in L^{p'}(X)$, then $fh \in L^1(X)$ and $\|fh\|_1 \le \|f\|_p \|h\|_{p'}$. Moreover, for $1 < p < \infty$, equality holds if and only if

$$\frac{|f(x)|^p}{\|f\|_p^p} = \frac{|h(x)|^{p'}}{\|h\|_{p'}^{p'}} \quad \text{for μ-a.a. } x \in X .$$

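Proof. First assume that $p \in (1, \infty)$, hence $p' \in (1, \infty)$. We may assume that $\|f\|_p > 0$ and $\|h\|_{p'} > 0$, since otherwise $fh = 0$ μ-a.e. and the inequality is trivial. Let $a = |f(x)|/\|f\|_p$ and $b = |h(x)|/\|h\|_{p'}$. Then by applying Young’s inequality (see Lemma 2.3.11) it follows

$$\frac{|f(x)h(x)|}{\|f\|_p \|h\|_{p'}} \le \frac{1}{p} \frac{|f(x)|^p}{\|f\|_p^p} + \frac{1}{p'} \frac{|h(x)|^{p'}}{\|h\|_{p'}^{p'}} \tag{2.3.4}$$

with equality if and only if $|f(x)|^p/\|f\|_p^p = |h(x)|^{p'}/\|h\|_{p'}^{p'}$ for μ-a.a. $x \in X$. Integrating (2.3.4) it follows

$$\frac{1}{\|f\|_p \|h\|_{p'}} \int_X |fh|\,d\mu \le \frac{1}{p} + \frac{1}{p'} = 1 ,$$

which implies $\|fh\|_1 \le \|f\|_p \|h\|_{p'}$. If $p = 1$, then $p' = +\infty$ and from the definition of the $L^\infty$-norm, we have

$$\|fh\|_1 = \int_X |fh|\,d\mu \le \|h\|_\infty \int_X |f|\,d\mu = \|f\|_1 \|h\|_\infty .$$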
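Hölder’s inequality and its equality case can be checked on a finite measure space, where the integrals are weighted sums. The sketch below, in Python, uses hypothetical data on a four-point space; the function names `p_norm`, `conjugate` and `holder_sides` are illustrative. The equality case is tested with $h = |f|^{p-1}$, for which $|h|^{p'} = |f|^p$.

```python
import math

def p_norm(values, weights, p):
    """The p-norm with respect to the weighted counting measure, 1 <= p < inf."""
    return sum(w * abs(v) ** p for v, w in zip(values, weights)) ** (1.0 / p)

def conjugate(p):
    """Conjugate exponent p' = p / (p - 1) for 1 < p < inf."""
    return p / (p - 1.0)

def holder_sides(f, h, weights, p):
    """Return the pair (||fh||_1, ||f||_p * ||h||_p') of Hoelder's inequality."""
    lhs = sum(w * abs(a * b) for a, b, w in zip(f, h, weights))
    rhs = p_norm(f, weights, p) * p_norm(h, weights, conjugate(p))
    return lhs, rhs

# Hypothetical data: mu({x_i}) = weights[i].
weights = [1.0, 0.5, 2.0, 1.5]
f = [0.5, -2.0, 3.0, 1.0]
h = [1.0, 4.0, -0.5, 2.0]
p = 3.0

lhs, rhs = holder_sides(f, h, weights, p)
h_eq = [abs(a) ** (p - 1.0) for a in f]  # equality case: |h|^p' = |f|^p
lhs_eq, rhs_eq = holder_sides(f, h_eq, weights, p)
print(lhs, rhs)
print(lhs_eq, rhs_eq, math.isclose(lhs_eq, rhs_eq, rel_tol=1e-9))
```

In the first pair the inequality is strict. In the second pair both sides agree up to rounding, as the equality condition of the theorem predicts.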

When $p = p' = 2$, the inequality is usually called the “Cauchy–Bunyakowsky–Schwarz inequality.”

Corollary 2.3.13 (Cauchy–Bunyakowsky–Schwarz inequality). If $(X, \Sigma, \mu)$ is a measure space and $f, h \in L^2(X)$, then $fh \in L^1(X)$ and $\|fh\|_1 \le \|f\|_2 \|h\|_2$. Moreover, equality holds if and only if $f(x)^2/\|f\|_2^2 = h(x)^2/\|h\|_2^2$ for μ-a.a. $x \in X$.

The second inequality is known as the “Minkowski inequality.” In fact it is a consequence of Hölder’s inequality.

Theorem 2.3.14 (Minkowski inequality). If $(X, \Sigma, \mu)$ is a measure space and $f, h \in L^p(X)$ with $1 \le p \le \infty$, then $\|f + h\|_p \le \|f\|_p + \|h\|_p$.

Proof. Via the triangle inequality the result is clear if $p = 1$ or $p = +\infty$. So, assume that $1 < p < \infty$ and that $f + h \ne 0$, otherwise the result is clear. We estimate

$$|f(x) + h(x)|^p \le (|f(x)| + |h(x)|)\,|f(x) + h(x)|^{p-1} ,$$

which gives

$$\|f + h\|_p^p \le \int_X |f(x)||f(x) + h(x)|^{p-1}\,d\mu + \int_X |h(x)||f(x) + h(x)|^{p-1}\,d\mu .$$


Recall that $p - 1 = p/p'$. So, $|f + h|^{p-1} \in L^{p'}(X)$ and we can apply Hölder’s inequality (see Theorem 2.3.12) to get

$$\|f + h\|_p^p \le (\|f\|_p + \|h\|_p)\,\|f + h\|_p^{p-1} .$$

This implies $\|f + h\|_p \le \|f\|_p + \|h\|_p$.

The third inequality is the so-called “Jensen inequality.”

Theorem 2.3.15 (Jensen inequality). If $(X, \Sigma, \mu)$ is a finite measure space, $f \in L^1(X)$ and $\varphi : \mathbb{R} \to \mathbb{R}$ is a convex function, then

$$\varphi\Bigl(\frac{1}{\mu(X)} \int_X f\,d\mu\Bigr) \le \frac{1}{\mu(X)} \int_X (\varphi \circ f)\,d\mu .$$

Moreover, if $\varphi$ is strictly convex, then equality holds if and only if $f$ is a constant function.

Proof. It is well-known that $\varphi$ is continuous. See Section 5.1 for more general continuity results for convex functions. In what follows for notational economy we set

$$(f)_X = \frac{1}{\mu(X)} \int_X f\,d\mu \tag{2.3.5}$$

being the average of $f$ over $X$. The convexity of $\varphi$ implies that there exists $\eta \in \mathbb{R}$ such that

$$\eta(t - (f)_X) \le \varphi(t) - \varphi((f)_X) \quad \text{for all } t \in \mathbb{R} . \tag{2.3.6}$$

So, if $t = f(x)$, then integrating over $X$ gives, due to (2.3.5),

$$\eta\Bigl(\int_X f\,d\mu - (f)_X\,\mu(X)\Bigr) = 0 \le \int_X (\varphi \circ f)\,d\mu - \varphi((f)_X)\,\mu(X) .$$

This yields

$$\varphi\Bigl(\frac{1}{\mu(X)} \int_X f\,d\mu\Bigr) \le \frac{1}{\mu(X)} \int_X (\varphi \circ f)\,d\mu .$$

Finally, if $\varphi$ is strictly convex, then (2.3.6) is a strict inequality for all $t \ne (f)_X$. If $f$ is not constant, then $f(x) - (f)_X$ takes on both positive and negative values on sets of positive measure. Therefore, we cannot have equality.

Now let us state some consequences of these inequalities. The first is a consequence of Hölder’s inequality; see Theorem 2.3.12.

Proposition 2.3.16. If $(X, \Sigma, \mu)$ is a measure space, $1 \le p_k \le \infty$ for all $k = 1, \dots, n$, $\sum_{k=1}^{n} 1/p_k = 1/r \le 1$ and $f_k \in L^{p_k}(X)$ for all $k = 1, \dots, n$, then $\prod_{k=1}^{n} f_k \in L^r(X)$ and

$$\Bigl\|\prod_{k=1}^{n} f_k\Bigr\|_r \le \prod_{k=1}^{n} \|f_k\|_{p_k} .$$

Proof. Let $F = \{k \in \{1, \dots, n\} : p_k < \infty\}$ and assume that $F \ne \emptyset$ or otherwise the result is clear. Then

$$\Bigl\|\prod_{k=1}^{n} f_k\Bigr\|_r \le \Bigl\|\prod_{k\in F} f_k\Bigr\|_r \prod_{k\notin F} \|f_k\|_\infty \quad \text{and} \quad \sum_{k\in F} \frac{1}{p_k} = \frac{1}{r} .$$

So we may assume that $F = \{1, \dots, n\}$. First consider the case $n = 2$. By hypothesis one obtains

$$\frac{r}{p_1} + \frac{r}{p_2} = 1 .$$

Applying Hölder’s inequality for the conjugate exponents $p_1/r$ and $p_2/r$ to the functions $|f_1|^r$, $|f_2|^r$ leads to

$$\|f_1 f_2\|_r^r \le \|f_1\|_{p_1}^r \|f_2\|_{p_2}^r .$$

This proves the case $n = 2$. When $n > 2$, we argue by induction. So let $1/\vartheta = \sum_{k=2}^{n} 1/p_k$. Hence $1/r = 1/p_1 + 1/\vartheta$. Assuming that the result holds for $n - 1$, we have, by the induction assumption and the validity of the case $n = 2$, that

$$\Bigl\|\prod_{k=1}^{n} f_k\Bigr\|_r \le \|f_1\|_{p_1} \Bigl\|\prod_{k=2}^{n} f_k\Bigr\|_\vartheta \le \|f_1\|_{p_1} \prod_{k=2}^{n} \|f_k\|_{p_k} = \prod_{k=1}^{n} \|f_k\|_{p_k} .$$

Another useful consequence of Hölder’s inequality (see Theorem 2.3.12) is the so-called “Interpolation inequality.”

Proposition 2.3.17 (Interpolation inequality). If $(X, \Sigma, \mu)$ is a measure space, $1 \le p \le q \le \infty$ and $f \in L^p(X) \cap L^q(X)$, then $f \in L^r(X)$ for all $p \le r \le q$ and $\|f\|_r \le \|f\|_p^t \|f\|_q^{1-t}$ with

$$\frac{1}{r} = \frac{t}{p} + \frac{1-t}{q} \quad \text{with } t \in [0, 1] . \tag{2.3.7}$$

Proof. If $q = \infty$, then $t = p/r$ and $|f|^r \le \|f\|_\infty^{r-p} |f|^p$. Hence

$$\|f\|_r \le \|f\|_\infty^{1-p/r} \|f\|_p^{p/r} = \|f\|_p^t \|f\|_\infty^{1-t} .$$

So, suppose now that $q < \infty$. Consider the conjugate exponents $p/(tr)$, $q/((1-t)r)$; see (2.3.7). Then by applying Hölder’s inequality (see Theorem 2.3.12), it follows

$$\|f\|_r^r = \int_X |f|^r\,d\mu = \int_X |f|^{tr} |f|^{(1-t)r}\,d\mu \le \|f\|_p^{tr} \|f\|_q^{(1-t)r} ,$$

which gives $\|f\|_r \le \|f\|_p^t \|f\|_q^{1-t}$.

In finite measure spaces, by using Hölder’s inequality, we can show that the $L^p$-spaces decrease as $p$ increases.

Proposition 2.3.18. If $(X, \Sigma, \mu)$ is a finite measure space and $1 \le p \le q \le \infty$, then $L^q(X) \subseteq L^p(X)$ and $\|f\|_p \le \|f\|_q\,\mu(X)^{1/p - 1/q}$.

2.3 Convergence Theorems and L p -Spaces

| 117

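Proof. First assume that q = ∞. Then for $f \in L^\infty(X)$ we have
\[ \|f\|_p^p = \int_X |f|^p \, d\mu \le \|f\|_\infty^p \, \mu(X) . \]
Next assume that q < ∞ (the case p = q being trivial). Consider the conjugate exponents q/p and q/(q − p) and apply Hölder’s inequality for them to the functions $|f|^p$, with $f \in L^q(X)$, and 1. This gives
\[ \|f\|_p^p = \int_X |f|^p \, d\mu \le \big\| |f|^p \big\|_{\frac{q}{p}} \, \|1\|_{\frac{q}{q-p}} = \|f\|_q^p \, \mu(X)^{1 - \frac{p}{q}} < +\infty . \]
Taking p-th roots yields $\|f\|_p \le \|f\|_q \, \mu(X)^{1/p - 1/q}$.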
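As an illustration of Proposition 2.3.18 (the functions below are our own choice and do not appear in the text), take X = (0, 1) with the Lebesgue measure. The following sketch indicates that the inclusion $L^2 \subseteq L^1$ can be strict and checks the norm estimate in one simple case.

```latex
% Illustrative example; X = (0,1), \mu = \lambda, so \mu(X) = 1.
% The functions f and g are assumptions made for this sketch.
Let $f(x) = x^{-1/2}$ on $(0,1)$. Then
\[
  \|f\|_1 = \int_0^1 x^{-1/2}\,dx = 2 < \infty ,
  \qquad
  \|f\|_2^2 = \int_0^1 x^{-1}\,dx = +\infty ,
\]
so $f \in L^1(0,1) \setminus L^2(0,1)$ and the inclusion
$L^2(0,1) \subseteq L^1(0,1)$ is strict.
For $g(x) = x$ and $p = 1$, $q = 2$ the estimate reads
\[
  \|g\|_1 = \tfrac{1}{2}
  \;\le\;
  \|g\|_2 \,\mu(X)^{1 - \frac12} = \tfrac{1}{\sqrt{3}} \approx 0.577 .
\]
```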

Now we turn our attention to the Minkowski inequality; see Theorem 2.3.14. Evidently this inequality implies that $(L^p(X), \|\cdot\|_p)$ with 1 ≤ p ≤ ∞ is a normed space. In fact, it is a complete normed space, that is, a Banach space.

Theorem 2.3.19. If (X, Σ, μ) is a measure space and 1 ≤ p ≤ ∞, then $(L^p(X), \|\cdot\|_p)$ is a Banach space.

Proof. First assume that p = ∞. Let $\{f_n\}_{n \ge 1} \subseteq L^\infty(X)$ be a Cauchy sequence. From Definition 2.3.10 we obtain
\[ |f_n(x) - f_m(x)| \le \|f_n - f_m\|_\infty \quad\text{for } \mu\text{-a.a. } x \in X \text{ and for all } n, m \in \mathbb{N} . \]
This gives that $\{f_n(x)\}_{n \ge 1} \subseteq \mathbb{R}$ is a Cauchy sequence for all x ∈ X \ A with μ(A) = 0. Then, for all x ∈ X \ A, $f_n(x) \to f(x)$. Let f(x) = 0 for x ∈ A. From Proposition 2.2.12 we know that f is Σ-measurable and
\[ |f(x) - f_m(x)| \le \sup_{n \ge m} \|f_n - f_m\|_\infty \le 1 \]
for m ∈ ℕ large enough and for all x ∈ X \ A. This yields $\|f\|_\infty \le \|f_m\|_\infty + 1$ for m ∈ ℕ large enough. Hence, $f \in L^\infty(X)$. Since $\sup_{n \ge m} \|f_n - f_m\|_\infty \to 0$ as m → ∞, the same estimate shows that $f_m \to f$ in $L^\infty(X)$, and so $L^\infty(X)$ is a Banach space.

Next assume that 1 ≤ p < ∞. Let $\{f_n\}_{n \ge 1} \subseteq L^p(X)$ be a Cauchy sequence. Recall that a Cauchy sequence is convergent if it has a convergent subsequence. So, passing to a subsequence, we may assume that
\[ \|f_m - f_n\|_p < 2^{-n} \quad\text{for all } m \ge n \text{ with } m \in \mathbb{N} . \tag{2.3.8} \]
Let $A(n) = \{x \in X : |f_n(x) - f_{n+1}(x)| \ge 1/n^2\}$. Then $\chi_{A(n)} \, 1/n^2 \le |f_n - f_{n+1}|$ for all n ∈ ℕ. Thus, because of (2.3.8),
\[ \mu(A(n)) \, \frac{1}{n^{2p}} \le \int_X |f_n - f_{n+1}|^p \, d\mu < 2^{-np} \quad\text{for all } n \in \mathbb{N} . \]
Therefore
\[ \sum_{n \ge 1} \mu(A(n)) \le \sum_{n \ge 1} \frac{n^{2p}}{2^{np}} < +\infty . \]

Let $C(n) = \bigcup_{m \ge n} A(m)$. Then $\{C(n)\}_{n \ge 1}$ is decreasing and $\mu(C(n)) \to 0$ as n → ∞. Hence, if $C = \bigcap_{n \ge 1} C(n)$, then μ(C) = 0 and for x ∈ X \ C we have
\[ |f_n(x) - f_{n+1}(x)| \le \frac{1}{n^2} \quad\text{for all } n \in \mathbb{N} \text{ large enough} . \]
Then for any m > n, with n large enough, it holds that $|f_m(x) - f_n(x)| \le \sum_{k \ge n} 1/k^2 \to 0$ as n → ∞. So it follows that, for μ-a.a. x ∈ X, $\{f_n(x)\}_{n \ge 1}$ is a Cauchy sequence and so it converges to some f(x). On the exceptional μ-null set, we put f(x) = 0. Clearly f is measurable and by Fatou’s Lemma (see Theorem 2.3.6), one gets
\[ \int_X |f|^p \, d\mu \le \liminf_{n \to \infty} \int_X |f_n|^p \, d\mu < \infty \]
since a Cauchy sequence is bounded. Hence, $f \in L^p(X)$. Similarly, we obtain
\[ \int_X |f - f_n|^p \, d\mu \le \liminf_{m \to \infty} \int_X |f_m - f_n|^p \, d\mu , \]
which implies that $f_n \to f$ in $L^p(X)$.

A useful consequence of the result above is the following corollary.

Corollary 2.3.20. If (X, Σ, μ) is a measure space, $\{f_n\}_{n \ge 1} \subseteq L^p(X)$ with 1 ≤ p ≤ ∞, and $f_n \to f$ in $L^p(X)$, then there is a subsequence $\{f_{n_k}\}_{k \ge 1}$ of $\{f_n\}_{n \ge 1}$ such that $f_{n_k}(x) \to f(x)$ μ-a.e.

Example 2.3.21. We have to pass to a subsequence to get pointwise convergence. To see this, consider the sequence $f_k = \chi_{[(i-1)/n,\, i/n]}$ for $k = i + n(n-1)/2$ with n ∈ ℕ and i = 1, . . . , n. Then $\int_0^1 f_k^p \, d\lambda = 1/n \to 0$, that is, $f_k \to 0$ in $L^p[0, 1]$. However, $\liminf_{k \to \infty} f_k(x) = 0 < 1 = \limsup_{k \to \infty} f_k(x)$ for all x ∈ [0, 1] and so we do not have pointwise convergence.

The next result provides a useful dense subset of the Banach space $L^p(X)$. It is a straightforward consequence of Proposition 2.2.18.

Proposition 2.3.22. If (X, Σ, μ) is a measure space, then the set of simple functions in $L^p(X)$ is dense in $L^p(X)$ for 1 ≤ p ≤ ∞.

We continue with the examination of the Banach spaces $L^p(X)$ for 1 ≤ p ≤ ∞. Next we examine under what conditions we can have separability of $L^p(X)$. We start with a definition.

Definition 2.3.23. Let (X, Σ, μ) be a measure space. On Σ we define the semimetric
\[ d_\mu(A, B) = \mu(A \,\triangle\, B) \quad\text{for all } A, B \in \Sigma . \]


According to Remark 1.5.2 if we introduce on Σ the equivalence relation ∼ defined by A ∼ B if and only if μ(A △ B) = 0, then, on Σ(μ) = Σ/ ∼, d μ is a metric. Clearly we have d μ (A, B) = ‖χ A − χ B ‖1

for all A, B ∈ Σ(μ) .

Proposition 2.3.24. If (X, Σ, μ) is a measure space, then (Σ(μ), d μ ) is a separable metric space if and only if the Banach space L1 (X) is separable. Proof. ⇒: Let {A k }k≥1 ⊆ Σ(μ) be a countable d μ -dense subset. Then the set of all functions that are finite linear combinations of {χ A k }k≥1 with rational coefficients is a countable dense subset of L1 (X). Hence L1 (X) is separable. ⇐: By identifying an element of Σ with its characteristic function, we see that Σ(μ) can be viewed as a subset of L1 (X). Then the separability of L1 (X) implies the separability of Σ(μ). The next proposition provides a condition for the separability of (Σ(μ), d μ ). Proposition 2.3.25. If (X, Σ, μ) is a finite measure space and Σ = σ(L) with L being countable, then (Σ(μ), d μ ) is separable. Proof. Note that the ring generated by L is still countable. So we may assume that L is a ring. Then, using Problem 2.3, for every A ∈ Σ(μ) we can find B ∈ L such that d μ (A, B) = μ(A △ B) ≤ ε. Hence L is d μ -dense in Σ(μ) and so (Σ(μ), d μ ) is separable. Corollary 2.3.26. If X is a separable metric space, Σ = B(X) and μ is a finite measure on Σ, then (Σ(μ), d μ ) is separable. In fact combining Propositions 2.3.18, 2.3.24, and 2.3.25, we can state the following result. Proposition 2.3.27. If (X, Σ, μ) is a σ-finite measure space, Σ = σ(L) with L countable and a is the smallest algebra containing L, then the simple functions of the form s = ∑nk=1 a k χ A k with n ∈ ℕ, a k ∈ ℚ, A k ∈ a, μ(A k ) < ∞, k = 1, . . . , n form a countable dense subset of L p (X) for 1 ≤ p < ∞. In particular, L p (X) is separable for 1 ≤ p < ∞. For the space L∞ (X) we show that it is not separable. In order to show this first we mention the following decomposition result, which can be found in Dudley [90, p. 82]. Proposition 2.3.28. If (X, Σ, μ) is a σ-finite measure space, then μ = μ a + μ d with μ a purely atomic and μ d nonatomic. Moreover the atoms on which μ a is defined are at most countable. 
We can use this result to establish the nonseparability of L∞ (X). Proposition 2.3.29. If (X, Σ, μ) is a σ-finite measure space, then the Banach space L∞ (X) is not separable.

Proof. Applying Proposition 2.3.28, we split X into its atomic part $X_a$ and its nonatomic (diffuse) part $X_d$. We consider two distinct cases:
(a) $X_d$ is not μ-null.
(b) $X_d$ is μ-null.

Suppose that (a) holds. Then for each $\eta \in (0, \mu(X_d))$ there exists $A_\eta \in \Sigma$ such that $\mu(A_\eta) = \eta$; see Proposition 2.1.32. Then $\{A_\eta\}_{\eta \in (0, \mu(X_d))}$ is an uncountable set of distinct Σ-sets, that is, $\mu(A_\eta \,\triangle\, A_{\eta'}) > 0$ if $\eta \neq \eta'$. Let $U_\eta = \{f \in L^\infty(X) : \|f - \chi_{A_\eta}\|_\infty$ […]

[…] 0, $\mu(\{x \in X : |f_n(x) - f(x)| \ge \varepsilon\}) \to 0$ as n → ∞.

We denote the convergence in measure by $f_n \xrightarrow{\mu} f$. If μ is a probability measure, that is, μ(X) = 1, then we say that the sequence $\{f_n\}_{n \ge 1}$ converges in probability to f. We say that the sequence $\{f_n\}_{n \ge 1}$ is a Cauchy sequence in measure if for every ε > 0,
\[ \lim_{n, m \to \infty} \mu(\{x \in X : |f_n(x) - f_m(x)| \ge \varepsilon\}) = 0 . \]

The following proposition is a straightforward consequence of the definition above.

Proposition 2.3.31. If (X, Σ, μ) is a measure space, then the following hold:
(a) $f_n \xrightarrow{\mu} f$ and $h_n \xrightarrow{\mu} h$ imply $\eta f_n + \vartheta h_n \xrightarrow{\mu} \eta f + \vartheta h$ for all η, ϑ ∈ ℝ;
(b) $f_n \xrightarrow{\mu} f$ implies $f_n^{\pm} \xrightarrow{\mu} f^{\pm}$ and $|f_n| \xrightarrow{\mu} |f|$;
(c) $f_n \xrightarrow{\mu} f$ and $f_n \xrightarrow{\mu} g$ imply f = g μ-a.e.


Proposition 2.3.32. If (X, Σ, μ) is a finite measure space and $f_n \to f$ μ-a.e., then $f_n \xrightarrow{\mu} f$.

Proof. Let ε > 0. For every n ∈ ℕ, let
\[ A_n = \{x \in X : |f_n(x) - f(x)| \ge \varepsilon\} = \Big\{x \in X : \frac{|f_n(x) - f(x)|}{1 + |f_n(x) - f(x)|} \ge \frac{\varepsilon}{1 + \varepsilon}\Big\} . \tag{2.3.9} \]
This gives
\[ \mu(A_n) \le \frac{1 + \varepsilon}{\varepsilon} \int_X \frac{|f_n - f|}{1 + |f_n - f|} \, d\mu \]
by the Markov inequality; see Proposition 2.2.41. But from the Lebesgue Dominated Convergence Theorem (see Theorem 2.3.8), it follows that
\[ \frac{1 + \varepsilon}{\varepsilon} \int_X \frac{|f_n - f|}{1 + |f_n - f|} \, d\mu \to 0 \quad\text{as } n \to \infty . \]
Hence $\mu(A_n) \to 0$ and so $f_n \xrightarrow{\mu} f$; see (2.3.9).

In fact in finite measure spaces convergence in measure is strictly weaker than pointwise convergence.

Example 2.3.33. Let X = [0, 1], Σ = B([0, 1]), $\mu = \lambda_{[0,1]}$ with λ being the Lebesgue measure on ℝ. Consider the sequence of Σ-measurable functions
\[ f_n(x) = \chi_{\left[\frac{i}{2^k}, \frac{i+1}{2^k}\right]}(x) \quad\text{for all } i \in \{0, 1, \dots, 2^k - 1\},\ n = i + 2^k . \]
It follows that, for every ε ∈ (0, 1],
\[ \lambda(\{x \in [0, 1] : |f_n(x)| \ge \varepsilon\}) = \frac{1}{2^k} \to 0 \quad\text{as } n = n(k) \to +\infty . \]
Hence, $f_n \xrightarrow{\mu} 0$. But the pointwise limit of the $f_n$’s does not exist at any x ∈ [0, 1].

The following is a variant of the Markov inequality (see Proposition 2.2.41) and is known as the “Chebyshev inequality.”

Proposition 2.3.34 (Chebyshev inequality). If (X, Σ, μ) is a measure space, $f \in L^p(X)$, 1 ≤ p < ∞, and λ > 0, then
\[ \mu(\{x \in X : |f(x)| \ge \lambda\}) \le \frac{1}{\lambda^p} \|f\|_p^p . \]

Proof. Let $A_\lambda = \{x \in X : |f(x)| \ge \lambda\}$. Then $\|f\|_p^p \ge \int_{A_\lambda} |f|^p \, d\mu \ge \lambda^p \mu(A_\lambda)$.

Using the Chebyshev inequality we can compare convergence in $L^p(X)$ for 1 ≤ p < ∞ with convergence in measure.

Proposition 2.3.35. If (X, Σ, μ) is a measure space, $\{f_n\}_{n \ge 1} \subseteq L^p(X)$ with 1 ≤ p < ∞, and $\|f_n - f\|_p \to 0$, then $f_n \xrightarrow{\mu} f$.

Proof. Applying the Chebyshev inequality (see Proposition 2.3.34) yields the assertion of the proposition.
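For the reader's convenience, the one-line argument behind Proposition 2.3.35 can be written out as follows. This is only a sketch of the step the proof leaves implicit: it applies Proposition 2.3.34 to the difference $f_n - f$ with threshold λ = ε.

```latex
% Chebyshev inequality (Proposition 2.3.34) applied to f_n - f with \lambda = \varepsilon.
For every fixed $\varepsilon > 0$,
\[
  \mu\bigl(\{x \in X : |f_n(x) - f(x)| \ge \varepsilon\}\bigr)
  \;\le\; \frac{1}{\varepsilon^{p}}\, \|f_n - f\|_p^{p}
  \;\longrightarrow\; 0
  \qquad (n \to \infty),
\]
which is exactly the statement $f_n \xrightarrow{\mu} f$.
```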

Although convergence in measure is strictly weaker than pointwise convergence, from any sequence that converges in measure we can always extract a subsequence that converges pointwise almost everywhere.

Proposition 2.3.36. If (X, Σ, μ) is a measure space and $f_n \xrightarrow{\mu} f$, then there exists a subsequence $\{f_{n_k}\}_{k \ge 1} \subseteq \{f_n\}_{n \ge 1}$ such that $f_{n_k} \to f$ μ-a.e.

Proof. Since $f_n \xrightarrow{\mu} f$ there is a strictly increasing sequence $\{k_n\}_{n \ge 1} \subseteq \mathbb{N}$ such that
\[ \mu\Big(\Big\{x \in X : |f_k(x) - f(x)| \ge \frac{1}{n}\Big\}\Big) < \frac{1}{2^n} \quad\text{for all } k \ge k_n . \]
For each n ∈ ℕ, let $A_n = \{x \in X : |f_{k_n}(x) - f(x)| \ge 1/n\} \in \Sigma$. We set $A = \bigcap_{k \ge 1} \bigcup_{n \ge k} A_n \in \Sigma$. Then we have
\[ \mu(A) \le \mu\Big(\bigcup_{n \ge k} A_n\Big) \le \sum_{n \ge k} \mu(A_n) \le \frac{1}{2^{k-1}} \quad\text{for every } k \in \mathbb{N} . \]
Hence, μ(A) = 0. If $x \notin A$, then there exists $k_0 \in \mathbb{N}$ such that $x \notin \bigcup_{n \ge k_0} A_n$ and so $|f_{k_n}(x) - f(x)| < 1/n$ for all $n \ge k_0$. Thus $f_{k_n}(x) \to f(x)$ for all $x \notin A$ with μ(A) = 0.

Definition 2.3.37. Let (X, Σ, μ) be a measure space and let $M(X) = \{f : X \to \mathbb{R}^* : f \text{ is } \Sigma\text{-measurable}\}$. As before, we define f ∼ h if and only if f = h μ-a.e. Then we set $L^0(X) = M(X)/\sim$. When μ(X) < ∞, on $L^0(X)$ we introduce the translation invariant metric
\[ d_\mu(f, h) = \int_X \frac{|f - h|}{1 + |f - h|} \, d\mu \quad\text{for all } f, h \in L^0(X) . \tag{2.3.10} \]
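To get a feeling for the metric in (2.3.10), here is a small computation on X = (0, 1) with the Lebesgue measure. The sequence $f_n$ is our own illustrative choice, not taken from the text; the sketch suggests that $d_\mu$ is bounded by μ(X) and can tend to zero even when the $L^1$-norms blow up.

```latex
% Illustrative computation; X = (0,1), \mu = \lambda. The sequence f_n is an assumption.
Since $0 \le \dfrac{s}{1+s} < 1$ for all $s \ge 0$, we always have
\[
  0 \le d_\mu(f, h) \le \mu(X) .
\]
For $f_n = n^2 \chi_{(0, 1/n)}$ and $h = 0$ one finds
\[
  d_\mu(f_n, 0) = \int_0^{1/n} \frac{n^2}{1 + n^2}\, dx
  = \frac{n}{1 + n^2} \longrightarrow 0 ,
  \qquad\text{while}\qquad
  \|f_n\|_1 = n^2 \cdot \frac{1}{n} = n \longrightarrow \infty .
\]
```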

Remark 2.3.38. It is easy to check that $d_\mu$ is a metric on $L^0(X)$. For the triangle inequality, use the elementary inequality that says that
\[ a, b, c \in \mathbb{R}_+ ,\ a \le b + c \quad\text{implies}\quad \frac{a}{1 + a} \le \frac{b}{1 + b} + \frac{c}{1 + c} . \]

In the next proposition we show that in finite measure spaces, convergence in measure is in fact a metric convergence.

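Proposition 2.3.39. If (X, Σ, μ) is a finite measure space and $\{f_n\}_{n \ge 1} \subseteq L^0(X)$, $f \in L^0(X)$, then $f_n \xrightarrow{\mu} f$ if and only if $f_n \xrightarrow{d_\mu} f$ in $L^0(X)$; see (2.3.10).

Proof. In what follows, for a given ε > 0 let
\[ A_n = \{x \in X : |f_n(x) - f(x)| \ge \varepsilon\} = \Big\{x \in X : \frac{|f_n(x) - f(x)|}{1 + |f_n(x) - f(x)|} \ge \frac{\varepsilon}{1 + \varepsilon}\Big\} , \quad n \in \mathbb{N} . \tag{2.3.11} \]
Suppose that $f_n \xrightarrow{\mu} f$. Then we can find $n_0 \in \mathbb{N}$ such that
\[ \mu(A_n) \le \varepsilon \quad\text{for all } n \ge n_0 . \tag{2.3.12} \]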


Then, because of (2.3.11) and (2.3.12), it follows that
\[ d_\mu(f_n, f) = \int_{A_n} \frac{|f_n - f|}{1 + |f_n - f|} \, d\mu + \int_{X \setminus A_n} \frac{|f_n - f|}{1 + |f_n - f|} \, d\mu \le \mu(A_n) + \frac{\varepsilon}{1 + \varepsilon} \mu(X \setminus A_n) \le (1 + \mu(X)) \varepsilon \]
for all $n \ge n_0$. This gives $d_\mu(f_n, f) \to 0$ as n → ∞.

Now assume that $f_n \xrightarrow{d_\mu} f$. Then $\frac{\varepsilon}{1 + \varepsilon} \chi_{A_n} \le \frac{|f_n - f|}{1 + |f_n - f|}$ for all n ∈ ℕ; see (2.3.11). This implies $\mu(A_n) \le \frac{1 + \varepsilon}{\varepsilon} d_\mu(f_n, f) \to 0$ as n → ∞. Hence $f_n \xrightarrow{\mu} f$.

The next notion will allow us to relax the dominating function requirement in the Lebesgue Dominated Convergence Theorem; see Theorem 2.3.8.

Definition 2.3.40. Let (X, Σ, μ) be a measure space and $F \subseteq L^0(X)$. We say that F is uniformly integrable if for every ε > 0 there exists $D_\varepsilon \in \Sigma$ with $\mu(D_\varepsilon) < \infty$ and $\sup_{f \in F} \int_{X \setminus D_\varepsilon} |f| \, d\mu \le \varepsilon$ as well as $\lim_{c \to \infty} \sup_{f \in F} \int_{\{|f| \ge c\}} |f| \, d\mu = 0$.

Remark 2.3.41. In the literature one can find other definitions of uniform integrability that are equivalent to the definition above when μ(X) < ∞. Some of these alternative definitions are examined in the exercises. In particular we mention the following equivalent definition for a set $F \subseteq L^1(X)$ to be uniformly integrable:
(UI)’
(a) $F \subseteq L^1(X)$ is bounded, that is, $\sup_{f \in F} \|f\|_1 < \infty$;
(b) for every ε > 0 there exists $D_\varepsilon \in \Sigma$ with $\mu(D_\varepsilon) < \infty$ such that $\sup_{f \in F} \int_{X \setminus D_\varepsilon} |f| \, d\mu \le \varepsilon$;
(c) for every ε > 0 there exists δ > 0 such that μ(A) ≤ δ implies $\sup_{f \in F} \int_A |f| \, d\mu \le \varepsilon$.

The next result is a key property of the Lebesgue integral and will help us identify uniformly integrable subsets of $L^1(X)$. The result is referred to as the absolute continuity property of the integral.

Proposition 2.3.42. If (X, Σ, μ) is a measure space and $f \in L^1(X)$, then for any given ε > 0 there exists δ = δ(ε) > 0 such that
\[ A \in \Sigma ,\ \mu(A) \le \delta \quad\text{implies}\quad \int_A |f| \, d\mu \le \varepsilon . \]

Proof. Since $f = f^+ - f^-$, without any loss of generality, we may assume that f ≥ 0. Let $f_n = \min\{f, n\}$ with n ∈ ℕ. Then $f_n \nearrow f$ and so by the Monotone Convergence Theorem (Theorem 2.3.3), we have $\int_X f_n \, d\mu \nearrow \int_X f \, d\mu$. So, given ε > 0 there exists $n_0 = n_0(\varepsilon) \in \mathbb{N}$ such that
\[ 0 \le \int_X (f - f_n) \, d\mu \le \frac{\varepsilon}{2} \quad\text{for all } n \ge n_0 . \tag{2.3.13} \]
If $\delta = \varepsilon/(2 n_0)$ and A ∈ Σ satisfies μ(A) ≤ δ, then, due to (2.3.13) and $f_{n_0} \le n_0$,
\[ \int_A f \, d\mu \le \int_A f_{n_0} \, d\mu + \int_X (f - f_{n_0}) \, d\mu \le \varepsilon . \]
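A concrete instance may help to see how δ depends on ε in Proposition 2.3.42. The following sketch uses a function of our own choosing on X = (0, 1) with the Lebesgue measure λ; it is meant as an illustration only, and the explicit δ is specific to this example.

```latex
% Illustrative example; X = (0,1), \mu = \lambda, f(x) = x^{-1/2} is an assumption.
Let $f(x) = x^{-1/2}$, which is integrable and decreasing on $(0,1)$.
If $A \in \Sigma$ has $\lambda(A) = m$, then, since $f \ge m^{-1/2}$ on $(0,m)$
and $f \le m^{-1/2}$ on $(0,1) \setminus (0,m)$, and since
$\lambda(A \setminus (0,m)) = \lambda((0,m) \setminus A)$,
\[
  \int_A f \, d\lambda
  \;\le\; \int_{(0,m)} f \, d\lambda
  = \int_0^{m} x^{-1/2}\, dx
  = 2\sqrt{m} .
\]
Hence $\lambda(A) \le \delta$ implies $\int_A f\, d\lambda \le 2\sqrt{\delta}$,
and the choice $\delta(\varepsilon) = \varepsilon^{2}/4$ works for every $\varepsilon > 0$.
```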

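Corollary 2.3.43. If (X, Σ, μ) is a measure space and $F \subseteq L^0(X)$ satisfies
\[ |f(x)| \le h(x) \quad\text{for } \mu\text{-a.a. } x \in X \text{ and for all } f \in F \text{ with } h \in L^1(X) , \]
then F is uniformly integrable. In particular, every finite set $F \subseteq L^1(X)$ is uniformly integrable.

Now we can state the generalization of the Lebesgue Dominated Convergence Theorem; see Theorem 2.3.8. The result is known as the “Vitali Convergence Theorem” or “Extended Dominated Convergence Theorem.”

Theorem 2.3.44 (Vitali Convergence Theorem). If (X, Σ, μ) is a measure space, $\{f_n\}_{n \ge 1} \subseteq L^1(X)$ is uniformly integrable and $f_n \xrightarrow{\mu} f$ as n → ∞, then $f \in L^1(X)$ and $\|f_n - f\|_1 \to 0$. In particular, we have $\int_X f_n \, d\mu \to \int_X f \, d\mu$.

Proof. On account of Proposition 2.3.36, we may assume that $f_n \to f$ μ-a.e. Given ε > 0, let δ > 0 and $D_\varepsilon \in \Sigma$ be as postulated by (UI)’; see Remark 2.3.41. Moreover, thanks to Egorov’s Theorem, Theorem 2.2.32, we know that there exists $A_\varepsilon \in \Sigma$ with $A_\varepsilon \subseteq D_\varepsilon$ and $\mu(A_\varepsilon) \le \delta$ such that
\[ f_n \to f \quad\text{uniformly on } D_\varepsilon \setminus A_\varepsilon . \tag{2.3.14} \]
We have
\[ \int_{D_\varepsilon} |f_n - f| \, d\mu = \int_{A_\varepsilon} |f_n - f| \, d\mu + \int_{D_\varepsilon \setminus A_\varepsilon} |f_n - f| \, d\mu \le \int_{A_\varepsilon} |f_n| \, d\mu + \int_{A_\varepsilon} |f| \, d\mu + \|f_n - f\|_{L^\infty(D_\varepsilon \setminus A_\varepsilon)} \, \mu(D_\varepsilon) . \tag{2.3.15} \]
Note that according to (UI)’ (see also Definition 2.3.40), it holds that
\[ \int_{A_\varepsilon} |f_n| \, d\mu \le \varepsilon , \quad \int_{X \setminus D_\varepsilon} |f_n| \, d\mu \le \varepsilon \quad\text{for all } n \in \mathbb{N} . \tag{2.3.16} \]
Moreover, by Fatou’s Lemma, one gets
\[ \int_{A_\varepsilon} |f| \, d\mu \le \varepsilon , \quad \int_{X \setminus D_\varepsilon} |f| \, d\mu \le \varepsilon . \tag{2.3.17} \]
Taking (2.3.15), (2.3.16) and (2.3.17) into account it follows that
\[ \int_X |f_n - f| \, d\mu \le \int_{X \setminus D_\varepsilon} |f_n| \, d\mu + \int_{X \setminus D_\varepsilon} |f| \, d\mu + \int_{D_\varepsilon} |f_n - f| \, d\mu \le 4\varepsilon + \|f_n - f\|_{L^\infty(D_\varepsilon \setminus A_\varepsilon)} \, \mu(D_\varepsilon) \]
for all n ∈ ℕ. Hence, because of (2.3.14) and since $\mu(D_\varepsilon)$ is finite and ε > 0 is arbitrary, it follows that $f_n \to f$ in $L^1(X)$.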
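The role of uniform integrability in Theorem 2.3.44 can be illustrated by a standard counterexample. The sequence below is our own choice (it is not part of the text) and lives on X = (0, 1) with the Lebesgue measure λ; it converges in measure but fails the second condition in Definition 2.3.40, and the conclusion of the theorem fails as well.

```latex
% Illustrative counterexample; X = (0,1), \mu = \lambda, f_n is an assumption.
Let $f_n = n\, \chi_{(0, 1/n)}$. For every $\varepsilon > 0$,
\[
  \lambda(\{x : |f_n(x)| \ge \varepsilon\}) \le \frac{1}{n} \to 0 ,
  \qquad\text{so } f_n \xrightarrow{\lambda} 0 ,
\]
but
\[
  \int_0^1 f_n \, d\lambda = 1 \quad\text{for all } n ,
  \qquad\text{hence } \|f_n - 0\|_1 = 1 \not\to 0 .
\]
The family is not uniformly integrable: for any $c > 0$ and any $n \ge c$,
\[
  \int_{\{|f_n| \ge c\}} |f_n| \, d\lambda = \int_0^{1/n} n \, dx = 1 ,
  \qquad\text{so}\qquad
  \sup_{n} \int_{\{|f_n| \ge c\}} |f_n| \, d\lambda = 1 \ \text{ for all } c > 0 .
\]
```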


Now that we have the convergence theorems for the Lebesgue integral, we can establish the existence and uniqueness of the product measure. So, let (X, Σ, μ) and (Y, L, ν) be two measure spaces. Suppose that Σ = σ(a) and L = σ(b). We want to define a measure ξ on rectangles of the form A × B with A ∈ a and B ∈ b such that
\[ \xi(A \times B) = \mu(A)\nu(B) \quad\text{for all } A \in a,\ B \in b . \tag{2.3.18} \]
If the generators a and b are rich enough, we can have the uniqueness of the measure ξ satisfying (2.3.18).

Proposition 2.3.45. If (X, Σ, μ) and (Y, L, ν) are two measure spaces, Σ = σ(a), L = σ(b) and
(i) a and b are closed under finite intersections;
(ii) there exist sequences $\{A_n\}_{n \ge 1} \subseteq a$, $\{B_n\}_{n \ge 1} \subseteq b$ with $A_n \nearrow X$, $B_n \nearrow Y$ and $\mu(A_n) < \infty$, $\nu(B_n) < \infty$ for all n ∈ ℕ,
then there is at most one measure ξ on Σ ⨂ L satisfying (2.3.18).

Proof. From Proposition 2.2.25 we know that Σ ⨂ L = σ(a × b). Moreover we have
\[ A_n \times B_n \nearrow X \times Y \quad\text{and}\quad \xi(A_n \times B_n) = \mu(A_n)\nu(B_n) < \infty \quad\text{for all } n \in \mathbb{N} . \]
Proposition 2.1.28 implies the uniqueness of ξ.

Now we examine the issue of the existence of the product measure.

Theorem 2.3.46. If (X, Σ, μ) and (Y, L, ν) are two σ-finite measure spaces, then the set function ξ : Σ × L → [0, +∞] defined by ξ(A × B) = μ(A)ν(B) for all A ∈ Σ, B ∈ L, extends uniquely to a σ-finite measure on Σ ⨂ L such that
\[ \xi(C) = \int_Y \int_X \chi_C(x, y) \, d\mu \, d\nu = \int_X \int_Y \chi_C(x, y) \, d\nu \, d\mu \quad\text{for all } C \in \Sigma \otimes L \]
and $x \to \chi_C(x, y)$, $y \to \chi_C(x, y)$, $x \to \int_Y \chi_C(x, y) \, d\nu$ and $y \to \int_X \chi_C(x, y) \, d\mu$ are measurable.

Proof. Uniqueness follows from Proposition 2.3.45. Consider sequences $\{A_n\}_{n \ge 1} \subseteq \Sigma$ and $\{B_n\}_{n \ge 1} \subseteq L$ such that
\[ A_n \nearrow X , \quad B_n \nearrow Y \quad\text{and}\quad \mu(A_n) < \infty , \quad \nu(B_n) < \infty \quad\text{for all } n \in \mathbb{N} . \]
Note that $C_n = A_n \times B_n \nearrow X \times Y$. For every n ∈ ℕ, let $D_n$ be the family of all subsets E ⊆ X × Y such that
– $x \to \chi_{E \cap C_n}(x, y)$ and $y \to \chi_{E \cap C_n}(x, y)$ are measurable.
– $x \to \int_Y \chi_{E \cap C_n}(x, y) \, d\nu$ and $y \to \int_X \chi_{E \cap C_n}(x, y) \, d\mu$ are measurable.
– $\int_Y \int_X \chi_{E \cap C_n}(x, y) \, d\mu \, d\nu = \int_X \int_Y \chi_{E \cap C_n}(x, y) \, d\nu \, d\mu$.
It is straightforward to check that $D_n$ is a Dynkin system (see Definition 2.1.7) which contains Σ × L. So, applying the Dynkin System Theorem (see Theorem 2.1.11)

yields that Σ ⨂ L ⊆ $D_n$ for all n ∈ ℕ. Since $C_n \nearrow X \times Y$, Proposition 2.2.10 implies the measurability of $x \to \chi_C(x, y)$ and $y \to \chi_C(x, y)$ and then the Monotone Convergence Theorem (see Theorem 2.3.3) gives the measurability of $x \to \int_Y \chi_C(x, y) \, d\nu$ and of $y \to \int_X \chi_C(x, y) \, d\mu$. Finally, if E = X × Y, then we have that
\[ C \to \xi(C) = \int_Y \int_X \chi_C(x, y) \, d\mu \, d\nu = \int_X \int_Y \chi_C(x, y) \, d\nu \, d\mu \]
is indeed a measure on Σ ⨂ L and ξ(A × B) = μ(A)ν(B) for all A ∈ Σ and for all B ∈ L.

Definition 2.3.47. Let (X, Σ, μ) and (Y, L, ν) be two σ-finite measure spaces. The unique measure ξ on Σ ⨂ L produced in Theorem 2.3.46 is called the product measure of μ and ν and is denoted by μ × ν. The measure space (X × Y, Σ ⨂ L, μ × ν) is called the product measure space.

Remark 2.3.48. Now we can define the Lebesgue measure $\lambda^n$ on $(\mathbb{R}^n, B(\mathbb{R}^n))$ such that
\[ \lambda^n(R) = \prod_{k=1}^n (b_k - a_k) \quad\text{for all rectangles } R = \prod_{k=1}^n [a_k, b_k) . \]

The next two theorems enable us to interchange the order of integration and to calculate integrals with respect to product measures using iteration. Their proofs are straightforward. Indeed, the results are true for characteristic functions, hence for simple functions. One then exploits the density of the simple functions to pass to the general case. The first result is known as “Tonelli’s Theorem.”

Theorem 2.3.49 (Tonelli’s Theorem). If (X, Σ, μ) and (Y, L, ν) are two σ-finite measure spaces and if f : X × Y → [0, ∞] is Σ ⨂ L-measurable, then the following hold:
(a) for all y ∈ Y, $x \to f(x, y)$ is Σ-measurable and for all x ∈ X, $y \to f(x, y)$ is L-measurable;
(b) $x \to \int_Y f(x, y) \, d\nu$ is Σ-measurable and $y \to \int_X f(x, y) \, d\mu$ is L-measurable;
(c) $\int_{X \times Y} f \, d(\mu \times \nu) = \int_Y \int_X f(x, y) \, d\mu \, d\nu = \int_X \int_Y f(x, y) \, d\nu \, d\mu$.

The second is known as “Fubini’s Theorem.”

Theorem 2.3.50 (Fubini’s Theorem). If (X, Σ, μ) and (Y, L, ν) are two σ-finite measure spaces, $f : X \times Y \to \mathbb{R}^*$ is Σ ⨂ L-measurable and at least one of the following three integrals is finite
\[ \int_{X \times Y} |f| \, d(\mu \times \nu) , \quad \int_Y \int_X |f| \, d\mu \, d\nu , \quad \int_X \int_Y |f| \, d\nu \, d\mu , \]
then all three integrals are finite, $f \in L^1(X \times Y)$ and
(a) $x \to f(x, y) \in L^1(X)$ for ν-a.a. y ∈ Y;
(b) $y \to f(x, y) \in L^1(Y)$ for μ-a.a. x ∈ X;
(c) $y \to \int_X f(x, y) \, d\mu \in L^1(Y)$;
(d) $x \to \int_Y f(x, y) \, d\nu \in L^1(X)$;
(e) $\int_{X \times Y} f \, d(\mu \times \nu) = \int_Y \int_X f(x, y) \, d\mu \, d\nu = \int_X \int_Y f(x, y) \, d\nu \, d\mu$.
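The integrability hypothesis in Fubini's Theorem cannot simply be dropped. The following classical example on X = Y = (0, 1) with the Lebesgue measure is added here as an illustration (it is not taken from the text) and sketches a function whose two iterated integrals exist but differ.

```latex
% Classical illustrative counterexample on (0,1) x (0,1) with Lebesgue measure.
Let
\[
  f(x, y) = \frac{x^2 - y^2}{(x^2 + y^2)^2}
  = \frac{\partial}{\partial y}\Bigl(\frac{y}{x^2 + y^2}\Bigr) ,
  \qquad (x, y) \in (0,1) \times (0,1) .
\]
Then
\[
  \int_0^1 f(x, y)\, dy = \frac{1}{1 + x^2} ,
  \qquad
  \int_0^1 \int_0^1 f(x, y)\, dy\, dx = \int_0^1 \frac{dx}{1 + x^2} = \frac{\pi}{4} ,
\]
whereas, by the antisymmetry $f(x, y) = -f(y, x)$,
\[
  \int_0^1 \int_0^1 f(x, y)\, dx\, dy = -\frac{\pi}{4} .
\]
By Theorem 2.3.50 this forces
$\int_{(0,1)^2} |f| \, d(\lambda \times \lambda) = +\infty$.
```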

2.4 Signed Measures and Radon–Nikodym Theorem

In this section we examine the notion of differentiating a measure ν with respect to another measure μ defined on the same σ-algebra. This differentiation theory can be developed more precisely if we extend the notion of measure and allow also negative values. This leads us to the concept of signed measure already introduced in Definition 2.1.22(f). For convenience, let us recall the definition here.

Definition 2.4.1. Let (X, Σ) be a measurable space and let $\mu : \Sigma \to \mathbb{R}^*$ be a set function. We say that μ is a signed measure if the following hold:
(a) μ(∅) = 0;
(b) μ takes at most one of the values +∞ and −∞, that is, either μ : Σ → (−∞, +∞] or μ : Σ → [−∞, +∞);
(c) for every sequence $\{A_n\}_{n \ge 1} \subseteq \Sigma$ of pairwise disjoint sets, we have
\[ \mu\Big(\bigcup_{n \ge 1} A_n\Big) = \sum_{n \ge 1} \mu(A_n) . \tag{2.4.1} \]

Remark 2.4.2. If $\mu(\bigcup_{n \ge 1} A_n)$ is finite in (2.4.1), then the sum on the right-hand side must converge independently of any rearrangement since the left-hand side is independent of the order of the terms. So the sum in (2.4.1) converges absolutely. Note that if $\mu_1$, $\mu_2$ are two measures on Σ and at least one of them is finite, then $\mu = \mu_1 - \mu_2$ is a signed measure.

Straightforward modifications in the proofs of Propositions 2.1.26 and 2.1.27 lead to the following characterization of signed measures.

Proposition 2.4.3. If (X, Σ) is a measurable space and μ : Σ → ℝ is an additive set function such that μ(∅) = 0, then μ is a signed measure if and only if one of the following equivalent properties holds:
(a) $\{A_n\}_{n \ge 1} \subseteq \Sigma$ and $A_n \nearrow A$ imply $\mu(A_n) \to \mu(A)$;
(b) $\{A_n\}_{n \ge 1} \subseteq \Sigma$ and $A_n \searrow A$ imply $\mu(A_n) \to \mu(A)$;
(c) $\{A_n\}_{n \ge 1} \subseteq \Sigma$ and $A_n \searrow \emptyset$ imply $\mu(A_n) \to 0$.

As we will see in the sequel, in order to study signed measures it is convenient to write them as differences of measures. For this reason we state the following definition.

Definition 2.4.4. Let (X, Σ) be a measurable space and let $\mu : \Sigma \to \mathbb{R}^*$ be a signed measure. A set A ∈ Σ is said to be a positive (resp. negative) set for μ, if μ(B) ≥ 0 (resp. μ(B) ≤ 0) for all B ∈ Σ, B ⊆ A.

Example 2.4.5. Suppose that (X, Σ, μ) is a measure space and let $f : X \to \mathbb{R}^*$ be a Σ-measurable function such that at least one of $\int_X f^+ \, d\mu$ and $\int_X f^- \, d\mu$ is finite. Then the set function $\nu : \Sigma \to \mathbb{R}^*$ defined by $\nu(A) = \int_A f \, d\mu = \int_X f \chi_A \, d\mu$ is a signed measure and a set A ∈ Σ is positive (resp. negative, null) for ν if f ≥ 0 (resp. f ≤ 0, f = 0) μ-a.e. on A.

It can happen that a set has positive μ-measure with μ being a signed measure but the set is not positive for μ.

Example 2.4.6. Let X = ℝ and Σ = B(X). Consider f : ℝ → ℝ to be an odd function that is λ-integrable where λ denotes the Lebesgue measure. Assume that f(x) > 0 for all x > 0. Then $\nu(A) = \int_A f \, d\lambda$ is a signed measure (see Example 2.4.5), and any set of the form [−a, b] with 0 < a < b has positive ν-measure without being a positive set for ν.

Next we will describe the structure of signed measures. We will show that X is the union of two disjoint sets, one positive and the other one negative. We start with a proposition for positive sets.

Proposition 2.4.7. If (X, Σ) is a measurable space, $\mu : \Sigma \to \mathbb{R}^*$ is a signed measure and A ∈ Σ is a positive set for μ, then any B ∈ Σ, B ⊆ A is also a positive set for μ. Moreover, the union of any countable family of positive sets for μ is a positive set for μ.

Proof. The first part of the conclusion is an immediate consequence of Definition 2.4.4. Suppose that $\{A_n\}_{n \ge 1} \subseteq \Sigma$ are positive sets for μ. Let $C_n = A_n \setminus \bigcup_{k=1}^{n-1} A_k$. Then $C_n \in \Sigma$, $C_n \subseteq A_n$ and so from the first part $C_n$ is positive for μ. Note that $\bigcup_{n \ge 1} A_n = \bigcup_{n \ge 1} C_n$ and the $C_n$’s are mutually disjoint. So, if B ∈ Σ, $B \subseteq \bigcup_{n \ge 1} A_n$, then, by the σ-additivity of μ, $\mu(B) = \sum_{n \ge 1} \mu(B \cap C_n)$. Hence, μ(B) ≥ 0. So, we conclude that $\bigcup_{n \ge 1} A_n \in \Sigma$ is a positive set for μ.

Now we can state the following important theorem for signed measures. The result is known as the “Hahn Decomposition Theorem.”

Theorem 2.4.8 (Hahn Decomposition Theorem). If (X, Σ) is a measurable space and $\mu : \Sigma \to \mathbb{R}^*$ is a signed measure, then there exists a positive set P ∈ Σ and a negative set N ∈ Σ such that X = P ∪ N and P ∩ N = ∅. Moreover, if P′, N′ is another such positive-negative decomposition of X, then P △ P′ = N △ N′ is μ-null.

Proof. Without any loss of generality we may assume that μ has values in [−∞, +∞); see Definition 2.4.1. We define
\[ \eta = \sup\,[\mu(A) : A \in \Sigma,\ A \text{ is a positive set for } \mu] \ge 0 . \tag{2.4.2} \]
Let $\{A_n\}_{n \ge 1} \subseteq \Sigma$ be a sequence of positive sets such that $\mu(A_n) \to \eta$. Let $P = \bigcup_{n \ge 1} A_n$. Then Propositions 2.4.7 and 2.4.3 imply that P is positive for μ and
\[ \mu(P) = \eta < +\infty . \tag{2.4.3} \]


Let N = X \ P. We claim that N is a negative set for μ. Arguing by contradiction, suppose that N is not negative for μ. First we show that N cannot contain a positive set that is not μ-null. Indeed, if A ⊆ N is positive and μ(A) > 0, then A ∪ P is positive (see Proposition 2.4.7), and μ(A ∪ P) = μ(A) + μ(P) > η (see (2.4.3)), a contradiction to the definition of η ≥ 0 (see (2.4.2)). Second, if A ⊆ N and μ(A) > 0, then there exists B ∈ Σ, B ⊆ A with μ(B) > μ(A). Indeed, since A is not positive by the first step, we can find C ∈ Σ, C ⊆ A with μ(C) < 0. Then if B = A \ C, we have μ(B) = μ(A) − μ(C) > μ(A).

Since we have assumed that N is not a negative set for μ, we can produce a sequence $\{A_n\}_{n \ge 1} \subseteq \Sigma$ with $A_n \subseteq N$ for all n ∈ ℕ and a sequence $\{k_n\}_{n \ge 1} \subseteq \mathbb{N}$ as follows: $k_1$ is the smallest natural number for which we can find B ∈ Σ, B ⊆ N with $\mu(B) > 1/k_1$. We set $A_1 = B$. Continuing inductively, let $k_n$ be the smallest natural number for which we can find B ∈ Σ, $B \subseteq A_{n-1}$ with $\mu(B) \ge \mu(A_{n-1}) + 1/k_n$. We set $A_n = B$. Let $A = \bigcap_{n \ge 1} A_n$. Then by Proposition 2.4.3, it follows that $\infty > \mu(A) = \lim_{n \to \infty} \mu(A_n) \ge \sum_{n \ge 1} 1/k_n$, which gives $k_n \to \infty$. But as before, there exists B ∈ Σ, B ⊆ A with μ(B) ≥ μ(A) + 1/k for some k ∈ ℕ. Then for large enough n ∈ ℕ, we have $k < k_n$ and $B \subseteq A_{n-1}$, a contradiction to the construction of the sequences $\{A_n\}_{n \ge 1} \subseteq \Sigma$ and $\{k_n\}_{n \ge 1} \subseteq \mathbb{N}$. It follows that N is negative for μ.

Finally suppose that P′, N′ is another such positive-negative pair. We have P \ P′ ⊆ P and P \ P′ ⊆ N′, which yields that P \ P′ is both positive and negative for μ; see Proposition 2.4.7. This gives μ(P \ P′) = 0. Similarly we can show this for the set P′ \ P. This completes the proof of the theorem.

Remark 2.4.9. The pair (P, N) is called a Hahn decomposition for the signed measure μ.

The Hahn decomposition will lead us to a canonical decomposition of a signed measure. First we state a definition that is central in our considerations in this section.

Definition 2.4.10. Let (X, Σ) be a measurable space and μ, ν : Σ → [0, +∞] be two measures.
(a) We say that μ and ν are mutually singular, denoted by μ⊥ν, if there exist two disjoint sets $X_\mu, X_\nu \in \Sigma$ such that $X = X_\mu \cup X_\nu$ and for every A ∈ Σ, it holds that
\[ \mu(A) = \mu(A \cap X_\mu) \quad\text{and}\quad \nu(A) = \nu(A \cap X_\nu) . \]
(b) We say that ν is absolutely continuous with respect to μ, denoted by ν ≪ μ, if for every A ∈ Σ with μ(A) = 0 it holds that ν(A) = 0.

Proposition 2.4.11. If (X, Σ) is a measurable space and μ, ν : Σ → [0, +∞] are two measures with ν being finite, then ν ≪ μ if and only if for every ε > 0 there exists δ > 0 such that
\[ A \in \Sigma \text{ and } \mu(A) \le \delta \text{ imply } \nu(A) \le \varepsilon . \tag{2.4.4} \]

Proof. ⇒: Arguing by contradiction suppose that the implication is not true. Then there exist ε > 0 and a sequence $\{A_n\}_{n \ge 1} \subseteq \Sigma$ such that
\[ \mu(A_n) \le \frac{1}{2^n} \quad\text{and}\quad \nu(A_n) \ge \varepsilon \quad\text{for all } n \in \mathbb{N} . \tag{2.4.5} \]
Set $B_k = \bigcup_{n \ge k} A_n \in \Sigma$ and $B = \bigcap_{k \ge 1} B_k \in \Sigma$. Then
\[ \mu(B) \le \mu(B_k) \le \sum_{n \ge k} \frac{1}{2^n} = \frac{1}{2^{k-1}} \to 0 \quad\text{as } k \to +\infty . \]
Hence,
\[ \mu(B) = 0 . \tag{2.4.6} \]
On the other hand since ν is finite, Proposition 2.1.24(f) gives
\[ \nu(B) = \lim_{n \to \infty} \nu(B_n) \ge \limsup_{n \to \infty} \nu(A_n) \ge \varepsilon ; \]
see (2.4.5). This contradicts the hypothesis that ν ≪ μ; see (2.4.6).
⇐: If A ∈ Σ with μ(A) = 0, then ν(A) ≤ ε for all ε > 0 and so ν(A) = 0. Therefore ν ≪ μ.

Remark 2.4.12. From the proposition above, we infer that if ν is finite, then ν ≪ μ if and only if $\lim_{\mu(A) \to 0} \nu(A) = 0$. If ν is not finite, then only the implication “⇐” is valid in Proposition 2.4.11.

Example 2.4.13. Let X = (0, 1), Σ = B((0, 1)) and μ = λ be the Lebesgue measure on (0, 1). Define $\nu(A) = \int_A 1/x \, d\lambda(x)$ for all A ∈ B((0, 1)). Then ν ≪ μ, but (2.4.4) fails.

Now we will use the Hahn decomposition of X to produce a canonical representation of a signed measure as the difference of two measures. The result is known as the “Jordan Decomposition Theorem.”

Theorem 2.4.14 (Jordan Decomposition Theorem). If (X, Σ) is a measurable space and $\mu : \Sigma \to \mathbb{R}^*$ is a signed measure, then there exist unique positive measures $\mu^+, \mu^- : \Sigma \to [0, +\infty]$ with at least one of them finite such that $\mu = \mu^+ - \mu^-$ and $\mu^+ \perp \mu^-$.

Proof. Let (P, N) be a Hahn decomposition for μ; see Theorem 2.4.8. We define
\[ \mu^+(A) = \mu(A \cap P) \quad\text{and}\quad \mu^-(A) = -\mu(A \cap N) \quad\text{for all } A \in \Sigma . \]
Then we have $\mu = \mu^+ - \mu^-$ and $\mu^+ \perp \mu^-$. Suppose that $(\xi^+, \xi^-)$ is another pair of measures such that $\mu = \xi^+ - \xi^-$ and $\xi^+ \perp \xi^-$. Let A, B ∈ Σ such that A ∩ B = ∅, A ∪ B = X and $\xi^+(B) = \xi^-(A) = 0$. Then X = A ∪ B is another Hahn decomposition for μ and so μ(P △ A) = 0; see Theorem 2.4.8. Therefore for any D ∈ Σ it follows that
\[ \xi^+(D) = \xi^+(D \cap A) = \mu(D \cap A) = \mu(D \cap P) = \mu^+(D) , \]
which gives $\xi^+ = \mu^+$. Similarly we show that $\xi^- = \mu^-$ and this proves the uniqueness of the difference decomposition.
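To make Theorems 2.4.8 and 2.4.14 concrete, consider the signed measure of Example 2.4.5 for one particular density. The choice X = [−1, 1], f(x) = x below is ours and serves only as an illustration; it also shows the phenomenon described in Example 2.4.6.

```latex
% Illustrative example; X = [-1,1] with Borel sets, \lambda Lebesgue measure,
% \nu(A) = \int_A x \, d\lambda(x). The density f(x) = x is an assumption.
A Hahn decomposition for $\nu$ is
\[
  P = [0, 1] , \qquad N = [-1, 0) ,
\]
and the Jordan decomposition $\nu = \nu^+ - \nu^-$ is given by
\[
  \nu^+(A) = \nu(A \cap P) = \int_{A \cap [0,1]} x \, d\lambda ,
  \qquad
  \nu^-(A) = -\nu(A \cap N) = \int_{A \cap [-1,0)} |x| \, d\lambda .
\]
In particular $\nu^+(X) = \nu^-(X) = \tfrac12$ and $\nu(X) = 0$.
For $0 < a < b \le 1$ we get
\[
  \nu([-a, b]) = \frac{b^2 - a^2}{2} > 0 ,
  \qquad\text{but}\qquad
  \nu([-a, 0)) = -\frac{a^2}{2} < 0 ,
\]
so $[-a, b]$ has positive $\nu$-measure without being a positive set for $\nu$.
```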


Definition 2.4.15. The measures $\mu^+$ and $\mu^-$ from Theorem 2.4.14 are called the positive and negative variations of μ and $\mu = \mu^+ - \mu^-$ is called the Jordan decomposition of μ. The total variation of μ is the measure |μ| defined by $|\mu| = \mu^+ + \mu^-$.

Remark 2.4.16. For every A ∈ Σ we have
\[ \mu^+(A) = \sup\,[\mu(C) : C \in \Sigma,\ C \subseteq A,\ C \text{ is positive}] = \sup\,[\mu(C) : C \in \Sigma,\ C \subseteq A] , \]
\[ \mu^-(A) = -\inf\,[\mu(C) : C \in \Sigma,\ C \subseteq A,\ C \text{ is negative}] = -\inf\,[\mu(C) : C \in \Sigma,\ C \subseteq A] , \]
\[ |\mu|(A) = \sup\Big[\sum_{k=1}^n |\mu(A_k)| : n \in \mathbb{N},\ \{A_k\}_{k=1}^n \subseteq \Sigma \text{ are disjoint and } A = \bigcup_{k=1}^n A_k\Big] . \]
Moreover, using the Jordan decomposition, we can define the Lebesgue integral with respect to a signed measure. So, let (X, Σ) be a measurable space and let $\mu : \Sigma \to \mathbb{R}^*$ be a signed measure. Consider a Σ-measurable function $f : X \to \mathbb{R}^*$ and A ∈ Σ. Suppose that at least one of the integrals $\int_A f \, d\mu^+$ and $\int_A f \, d\mu^-$ is finite. Then the Lebesgue integral of f over A is defined as
\[ \int_A f \, d\mu = \int_A f \, d\mu^+ - \int_A f \, d\mu^- . \]
If both integrals $\int_A f \, d\mu^+$, $\int_A f \, d\mu^-$ are finite, then we say that f is Lebesgue integrable with respect to μ over the set A ∈ Σ.

The Jordan decomposition established in Theorem 2.4.14 is minimal in the following sense.

Proposition 2.4.17. If (X, Σ) is a measurable space, $\mu : \Sigma \to \mathbb{R}^*$ is a signed measure and $\mu = \xi_1 - \xi_2$ with $\xi_1, \xi_2 : \Sigma \to [0, +\infty]$ measures, then $\xi_1 \ge \mu^+$ and $\xi_2 \ge \mu^-$.

Proof. We have $\mu \le \xi_1$. Hence, for all A ∈ Σ and with (P, N) being a Hahn decomposition for μ,
\[ \mu^+(A) = \mu(A \cap P) \le \xi_1(A \cap P) \le \xi_1(A) . \]
Therefore $\mu^+ \le \xi_1$. Similarly we show that $\mu^- \le \xi_2$.

We extend the notions introduced in Definition 2.4.10 to signed measures.

Definition 2.4.18. Let (X, Σ) be a measurable space and $\mu, \nu : \Sigma \to \mathbb{R}^*$ be two signed measures.
(a) We say that μ and ν are mutually singular, denoted by μ⊥ν, if |μ|⊥|ν|; see Definition 2.4.10(a).
(b) We say that ν is absolutely continuous with respect to μ, denoted by ν ≪ μ, if |ν| ≪ |μ|; see Definition 2.4.10(b).

Remark 2.4.19. If μ is a signed measure, then $\mu^+ \perp \mu^-$.

The notion of mutual singularity is the antithesis of the notion of absolute continuity.

Proposition 2.4.20. If (X, Σ) is a measurable space and $\mu, \nu : \Sigma \to \mathbb{R}^*$ are signed measures, then μ⊥ν and ν ≪ μ imply ν = 0.

Proof. Since by hypothesis μ⊥ν, there exist A, B ∈ Σ with A ∩ B = ∅, X = A ∪ B, and |μ|(A) = |ν|(B) = 0; see Definition 2.4.18(a). By hypothesis we also have that ν ≪ μ and so |ν|(A) = 0; see Definition 2.4.18(b). For every C ∈ Σ, it holds that
\[ |\nu|(C) = |\nu|(C \cap A) + |\nu|(C \cap B) \ge |\nu(C \cap A)| + |\nu(C \cap B)| \ge |\nu(C \cap A) + \nu(C \cap B)| = |\nu(C)| , \]
by the additivity of ν. Since |ν|(C ∩ A) = |ν|(C ∩ B) = 0, we get |ν(C)| = 0 for all C ∈ Σ and so ν ≡ 0.

Proposition 2.4.21. If (X, Σ) is a measurable space and $\mu, \nu : \Sigma \to \mathbb{R}^*$ are signed measures, then ν ≪ μ if and only if $\nu^+ \ll \mu$ and $\nu^- \ll \mu$.

Proof. ⇒: Suppose that A ∈ Σ satisfies |μ|(A) = 0. Then for B ∈ Σ, B ⊆ A it follows that |μ|(B) = 0 and so |ν(B)| ≤ |ν|(B) = 0. From Remark 2.4.16 we have
\[ \nu^+(A) = \sup\,[\nu(B) : B \in \Sigma,\ B \subseteq A] = 0 . \]
Hence $\nu^+ \ll \mu$. Similarly we show that $\nu^- \ll \mu$.
⇐: Suppose that A ∈ Σ satisfies |μ|(A) = 0. By hypothesis one gets $\nu^+(A) = \nu^-(A) = 0$. Recall that $|\nu| = \nu^+ + \nu^-$; see Definition 2.4.15. Therefore |ν|(A) = 0 and we have proved that ν ≪ μ.

Remark 2.4.22. Evidently ν ≪ μ if and only if A ∈ Σ with |μ|(A) = 0 implies ν(A) = 0.

In a similar fashion we also show the following facts about singular and absolutely continuous signed measures.

Proposition 2.4.23. If (X, Σ) is a measurable space and $\mu, \nu, \xi : \Sigma \to \mathbb{R}^*$ are signed measures, then the following hold:
(a) μ ≪ ξ and ν ≪ ξ imply |μ| + |ν| ≪ ξ;
(b) μ⊥ξ and ν⊥ξ imply |μ| + |ν|⊥ξ;
(c) μ ≪ ξ and ν ≪ μ imply ν ≪ ξ;
(d) μ⊥ξ and ν ≪ μ imply ν⊥ξ.

Definition 2.4.24. Let (X, Σ) be a measurable space and let $\mu : \Sigma \to \mathbb{R}^*$ be a signed measure.
(a) We say that μ is finite if μ(A) ∈ ℝ for every A ∈ Σ.
(b) We say that μ is σ-finite if there exists a sequence $\{A_n\}_{n \ge 1} \subseteq \Sigma$ such that $X = \bigcup_{n \ge 1} A_n$ and $\mu(A_n) \in \mathbb{R}$ for all n ∈ ℕ.

Remark 2.4.25. A signed measure μ is finite if and only if |μ(X)| < +∞.
Moreover, we can assume in Definition 2.4.24(b) that the A n ’s are mutually disjoint. Proposition 2.4.26. If (X, Σ) is a measurable space, ν : Σ → ℝ∗ is a finite signed measure and μ : Σ → [0, +∞] is a measure, then ν ≪ μ if and only if for every ε > 0 there exists δ > 0 such that A ∈ Σ, μ(A) ≤ δ imply |ν(A)| ≤ ε.


Proof. According to Definition 2.4.18(b), ν ≪ μ if and only if |ν| ≪ μ and recall that |ν(A)| ≤ |ν|(A) for all A ∈ Σ. Then the conclusion follows from Proposition 2.4.11.

Corollary 2.4.27. If (X, Σ, μ) is a measure space and f ∈ L^1(X), then for a given ε > 0 there exists δ = δ(ε) > 0 such that A ∈ Σ with μ(A) ≤ δ imply |∫_A f dμ| ≤ ε.

The technical result, which we prove next, will be used in the proof of the main structural result concerning signed measures, the so-called “Radon–Nikodym Theorem.”

Lemma 2.4.28. If (X, Σ) is a measurable space, μ, ν are measures on Σ with μ being σ-finite, ν ≢ 0 and ν ≪ μ, then there exist ε > 0 and B ∈ Σ with 0 < μ(B) < +∞ such that εμ(C) ≤ ν(C) for all C ∈ Σ, C ⊆ B, that is, B is a positive set for ν − εμ.

Proof. Let {A_n}_{n≥1} ⊆ Σ be disjoint sets such that X = ⋃_{n≥1} A_n and μ(A_n) < +∞ for all n ∈ ℕ. Since ν ≢ 0 we can find m ∈ ℕ such that ν(A_m) > 0. We choose ε > 0 small such that

ν(A_m) − εμ(A_m) = (ν − εμ)(A_m) > 0 .

From Problem 2.53 we know that there exists B ∈ Σ, B ⊆ A_m such that

(ν − εμ)(B) > 0 and B is a positive set for ν − εμ . (2.4.7)

Evidently (ν − εμ)(B) < +∞. Moreover, if μ(B) = 0, then from (2.4.7) we have ν(B) > 0, which contradicts the hypothesis that ν ≪ μ. Therefore μ(B) > 0. In addition, (2.4.7) implies that εμ(C) ≤ ν(C) for all C ∈ Σ, C ⊆ B.

We saw in Example 2.4.5 that for a given measure space (X, Σ, μ) and f ∈ L^1(X), the set function ν : Σ ∋ A → ∫_A f dμ is a signed measure. It is natural to ask whether the converse is true as well. Namely, if ν ≪ μ, then can we find f ∈ L^1(X, μ) such that dν = f dμ? The answer to this fundamental question is given by the so-called “Radon–Nikodym Theorem.”

Theorem 2.4.29 (Radon–Nikodym Theorem). If (X, Σ) is a measurable space, μ : Σ → [0, +∞] is a σ-finite measure, ν : Σ → ℝ^* is a σ-finite signed measure and ν ≪ μ, then there exists a unique up to equality μ-a.e. Σ-measurable function f : X → ℝ^* such that ν(A) = ∫_A f dμ for all A ∈ Σ.

Proof. We know that ν^+, ν^− are σ-finite measures on Σ and from Proposition 2.4.21, we know that ν^+ ≪ μ and ν^− ≪ μ. Moreover, one has ν = ν^+ − ν^−. Therefore without any loss of generality we may assume that ν is a σ-finite measure. It holds that Σ ⊆ Σ_μ ⊆ Σ_ν. First assume that ν is finite. We introduce the set

L = {h ∈ L^1(X) : h ≥ 0 μ-a.e. and ∫_A h dμ ≤ ν(A) for all A ∈ Σ_μ} . (2.4.8)

We have 0 ∈ L and so L ≠ ∅. Let h_1, h_2 ∈ L and A ∈ Σ_μ and let

B = {x ∈ A : h_1(x) ≥ h_2(x)} , C = A \ B = {x ∈ A : h_2(x) > h_1(x)} .

Evidently B, C ∈ Σ_μ, A = B ∪ C and B ∩ C = ∅. Hence

∫_A max{h_1, h_2} dμ = ∫_B max{h_1, h_2} dμ + ∫_C max{h_1, h_2} dμ
= ∫_B h_1 dμ + ∫_C h_2 dμ ≤ ν(B) + ν(C) = ν(A) .

Thus, max{h_1, h_2} ∈ L. We define

η = sup[∫_X h dμ : h ∈ L] ≤ ν(X) < +∞ ;

see (2.4.8). Let {h_n}_{n≥1} ⊆ L be such that lim_{n→∞} ∫_X h_n dμ = η. We set g_n = max{h_k}_{k=1}^n. Then from the previous part of the proof we have that {g_n}_{n≥1} ⊆ L is increasing and ∫_X g_n dμ ↗ η. From the Monotone Convergence Theorem (see Theorem 2.3.3) we know that there exists g ∈ L^1(X, μ) such that g_n ↗ g and ∫_X g dμ = η. We have

0 ≤ g_n χ_A ↗ g χ_A and ∫_X g_n χ_A dμ = ∫_A g_n dμ ≤ ν(A) for all n ∈ ℕ ,

which implies ∫_A g dμ ≤ ν(A) for all A ∈ Σ_μ and so g ∈ L. Finally we show that ν(A) = ∫_A g dμ for all A ∈ Σ_μ. Let

ξ(A) = ν(A) − ∫_A g dμ for all A ∈ Σ_μ . (2.4.9)

Then ξ is a measure on Σ_μ and ξ ≪ μ. Suppose that ξ ≢ 0. Then Lemma 2.4.28 implies that there exist ε > 0 and B ∈ Σ_μ such that

0 < μ(B) < ∞ and εμ(C) ≤ ξ(C) for all C ∈ Σ_μ, C ⊆ B . (2.4.10)

Let h = g + εχ_B. Then h ≥ 0 μ-a.e. and h ∈ L^1(X, μ). We have η = ∫_X g dμ < ∫_X h dμ, which gives

h ∉ L . (2.4.11)

On the other hand, for every A ∈ Σ_μ, we derive, combining (2.4.8), (2.4.9), (2.4.10),

∫_A h dμ = ∫_A [g + εχ_B] dμ = ∫_A g dμ + εμ(B ∩ A) ≤ ∫_A g dμ + ξ(B ∩ A)
≤ ∫_A g dμ + ν(B ∩ A) − ∫_{B∩A} g dμ = ∫_{A\B} g dμ + ν(B ∩ A)
≤ ν(A \ B) + ν(B ∩ A) = ν(A) .

This yields

h ∈ L . (2.4.12)


Comparing (2.4.11) and (2.4.12), we reach a contradiction. Therefore

ν(A) = ∫_A g dμ for all A ∈ Σ .

Proposition 2.2.40(c) implies that g ∈ L^1(X, μ) is unique.

Now suppose that ν is σ-finite. Then we find a sequence {A_n}_{n≥1} ⊆ Σ of disjoint sets such that X = ⋃_{n≥1} A_n with ν(A_n) < +∞ for all n ∈ ℕ. Let ν_n = ν|_{A_n} for every n ∈ ℕ, that is, ν_n(B) = ν(B ∩ A_n) for all B ∈ Σ. Evidently, ν_n is a finite measure on Σ and ν_n ≪ μ. So, from the first part of the proof there exists a unique g_n ∈ L^1(X, μ) such that ν_n(B) = ∫_B g_n dμ for all B ∈ Σ. Recall that the A_n’s are disjoint. We define g = ∑_{n≥1} g_n χ_{A_n} and we have that g : X → ℝ is Σ-measurable as well as

ν(B) = ∑_{n≥1} ν(B ∩ A_n) = ∑_{n≥1} ∫_B g_n χ_{A_n} dμ = ∫_B g dμ ,

see Theorem 2.3.5.

Definition 2.4.30. The unique (up to equality μ-a.e.) function g : X → ℝ^* postulated by Theorem 2.4.29 is called the Radon–Nikodym derivative of ν with respect to μ and is denoted by dν/dμ = g or by dν = g dμ. If ν is finite, then g ∈ L^1(X, μ) and if ν is a measure then g ≥ 0 μ-a.e.

Theorem 2.4.29 leads to an interesting decomposition of ν. This result is known as the “Lebesgue Decomposition Theorem.”

Theorem 2.4.31 (Lebesgue Decomposition Theorem). If (X, Σ) is a measurable space, μ : Σ → [0, +∞] is a σ-finite measure and ν : Σ → ℝ^* is a σ-finite signed measure, then ν = ν_a + ν_s with ν_a ≪ μ, ν_s ⊥ μ and this decomposition is unique.

Proof. Let ξ = μ + ν. Then ξ is a σ-finite measure on Σ and μ ≪ ξ, ν ≪ ξ. Applying Theorem 2.4.29, we can find Σ-measurable functions g, h : X → [0, +∞] such that

μ(A) = ∫_A g dξ and ν(A) = ∫_A h dξ for all A ∈ Σ . (2.4.13)

Let B = {x ∈ X : g(x) > 0} and C = {x ∈ X : g(x) = 0}. Then B, C ∈ Σ, B ∩ C = ∅, X = B ∪ C and μ(C) = 0; see (2.4.13). Let ν̂ = ν|_C, that is, ν̂(E) = ν(E ∩ C) for all E ∈ Σ. Then ν̂(B) = 0 and so it follows that ν̂ ⊥ μ. Let ν̃ = ν|_B, that is, ν̃(E) = ν(E ∩ B) for all E ∈ Σ. We obtain ν̃(E) = ν(E ∩ B) = ∫_{E∩B} h dξ (see (2.4.13)) and ν = ν̃ + ν̂. We need to show that ν̃ ≪ μ. To this end, let E ∈ Σ be such that μ(E) = 0. Then 0 = μ(E) = ∫_E g dξ (see (2.4.13)) and so, since g ≥ 0 ξ-a.e., g(x) = 0 for ξ-a.a. x ∈ E. As g|_{E∩B} > 0, we must have ξ(E ∩ B) = 0, hence ν(E ∩ B) = 0 since ν ≪ ξ. Therefore ν̃(E) = ν(E ∩ B) = 0 and this shows that ν̃ ≪ μ.

Finally we show the uniqueness of this decomposition. So, suppose that (ν_a, ν_s) and (ν′_a, ν′_s) are two such decompositions. Then

ν_a − ν′_a = ν′_s − ν_s . (2.4.14)

From Proposition 2.4.23 we have

(ν_a − ν′_a) ≪ μ and (ν′_s − ν_s) ⊥ μ . (2.4.15)

From (2.4.14), (2.4.15) and Proposition 2.4.20, we conclude that ν_a = ν′_a and ν_s = ν′_s. Hence, the decomposition is unique.

Definition 2.4.32. The decomposition ν = ν_a + ν_s provided by the previous theorem with ν_a ≪ μ as well as ν_s ⊥ μ is called the Lebesgue decomposition of ν with respect to μ.

We conclude this section with two useful results concerning setwise limits of sequences of finite measures. The first result is known as the “Vitali–Hahn–Saks Theorem.”

Theorem 2.4.33 (Vitali–Hahn–Saks Theorem). If (X, Σ) is a measurable space, {ν_n}_{n≥1} are finite signed measures, μ is a finite measure, ν_n ≪ μ for all n ∈ ℕ and for all A ∈ Σ, the limit ν(A) = lim_{n→∞} ν_n(A) exists, then ν : Σ → ℝ is a signed measure such that ν ≪ μ.

Proof. On account of the Jordan Decomposition Theorem (see Theorem 2.4.14) we may assume that the ν_n’s are measures. First we show that {ν_n}_{n≥1} is in fact uniformly absolutely continuous with respect to μ, that is, for given ε > 0 there exists δ = δ(ε) > 0 such that μ(A) ≤ δ implies ν_n(A) ≤ ε for all n ∈ ℕ; see Proposition 2.4.11.

Let Σ(μ) and d_μ be as in Definition 2.3.23. We claim that (Σ(μ), d_μ) is a complete metric space. Indeed, let S = {χ_A : A ∈ Σ_μ} ⊆ L^1(X, μ). Let {χ_{A_n}}_{n≥1} ⊆ S and assume that χ_{A_n} → f in L^1(X, μ). Then according to Corollary 2.3.20, there exists a subsequence {χ_{A_{n_k}}}_{k≥1} of {χ_{A_n}}_{n≥1} such that χ_{A_{n_k}}(x) → f(x) for μ-a.a. x ∈ X. Therefore, f(x) ∈ {0, 1} for μ-a.a. x ∈ X and since f is measurable, there exists A ∈ Σ_μ such that f = χ_A. This implies that S is a closed subset of L^1(X, μ), hence a complete metric space in its own right. But S is isometrically isomorphic to (Σ(μ), d_μ). Therefore the latter is a complete metric space. Note that for every n ∈ ℕ

|ν_n(A) − ν_n(B)| ≤ ν_n(A △ B) for all A, B ∈ Σ and ν_n ≪ μ .

So, the map ν_n : Σ(μ) → [0, +∞) with n ∈ ℕ is well-defined and continuous. We introduce the sets

D_k = {A ∈ Σ(μ) : |ν_n(A) − ν_m(A)| ≤ ε for all n, m ≥ k} , k ∈ ℕ .

These sets are closed and Σ(μ) = ⋃_{k∈ℕ} D_k. So, according to Theorem 1.5.68(b), we can find k ∈ ℕ such that int D_k ≠ ∅. This means that there exist Ã ∈ D_k and δ_1 > 0 such that A ∈ Σ and μ(A △ Ã) ≤ δ_1 imply A ∈ D_k. By hypothesis, ν_i ≪ μ for all i ∈ {1, . . . , k}. So using Proposition 2.4.11 there is a δ ∈ (0, δ_1] such that A ∈ Σ with μ(A) ≤ δ imply ν_i(A) ≤ ε for all i ∈ {1, . . . , k}.


If A ∈ Σ and μ(A) ≤ δ, then μ((A ∪ Ã) △ Ã) ≤ μ(A) ≤ δ ≤ δ_1 and similarly μ((Ã \ A) △ Ã) ≤ μ(A) ≤ δ_1, so both A ∪ Ã and Ã \ A belong to D_k. Hence

|ν_n(A) − ν_k(A)| = |(ν_n − ν_k)(A ∪ Ã) − (ν_n − ν_k)(Ã \ A)|
≤ |(ν_n − ν_k)(A ∪ Ã)| + |(ν_n − ν_k)(Ã \ A)| ≤ 2ε for all n ≥ k.

Therefore it follows that A ∈ Σ with μ(A) ≤ δ imply ν_n(A) ≤ 2ε + ν_k(A) ≤ 3ε for all n ∈ ℕ, which is the uniform absolute continuity of {ν_n}_{n≥1} with respect to μ.

Now let {A_n}_{n≥1} ⊆ Σ be mutually disjoint sets and ε > 0. We set A = ⋃_{n≥1} A_n ∈ Σ. Let δ > 0 be as postulated by the uniform absolute continuity with respect to μ established in the first part of the proof. We choose k ∈ ℕ such that μ(A \ ⋃_{i=1}^k A_i) ≤ δ; see Proposition 2.1.24(e). This implies

ν_n(A) − ∑_{i=1}^m ν_n(A_i) = ν_n(A \ ⋃_{i=1}^m A_i) ≤ ε for all n, m ≥ k .

Hence

ν(A) − ∑_{i=1}^m ν(A_i) ≤ ε for all m ≥ k .

Since ε > 0 is arbitrary, it follows that ν(A) = ∑_{i∈ℕ} ν(A_i) and so ν is a measure. Moreover, from the first part of the proof and Proposition 2.4.11 we have ν ≪ μ.

The next theorem, known as “Nikodym’s Theorem”, is an easy consequence of the theorem above.

Theorem 2.4.34 (Nikodym’s Theorem). If (X, Σ) is a measurable space and {ν_n}_{n≥1} is a sequence of nonzero finite measures defined on Σ such that the limit lim_{n→∞} ν_n(A) exists for all A ∈ Σ, then ν(A) = lim_{n→∞} ν_n(A) with A ∈ Σ is a finite measure.

Proof. Consider the set function μ : Σ → [0, +∞) defined by

μ(A) = ∑_{n∈ℕ} (1/2^n) ν_n(A)/ν_n(X) for all A ∈ Σ .

Evidently μ is a finite measure on Σ and ν_n ≪ μ for all n ∈ ℕ. So, invoking Theorem 2.4.33, we conclude that ν is a finite measure on Σ.
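On a finite set X with Σ = 2^X, the Radon–Nikodym Theorem (Theorem 2.4.29) and the Lebesgue Decomposition Theorem (Theorem 2.4.31) reduce to pointwise arithmetic, which makes them easy to check by hand. The following Python sketch is only an illustration under that assumption; the function names `measure`, `lebesgue_decomposition`, `all_subsets` and the sample weights are hypothetical and do not come from the text. Measures are stored as dictionaries of point masses, the absolutely continuous part lives on {μ > 0} with density ν({x})/μ({x}), and the singular part lives on the μ-null set {μ = 0}.

```python
from fractions import Fraction
from itertools import chain, combinations


def measure(weights, subset):
    """Evaluate a measure given by point masses on a subset of a finite set."""
    return sum(weights[x] for x in subset)


def lebesgue_decomposition(mu, nu):
    """Split nu = nu_a + nu_s with nu_a << mu and nu_s singular to mu.

    mu, nu: dicts mapping each point of a finite set X to a nonnegative weight.
    Returns (density, nu_a, nu_s); density plays the role of d(nu_a)/d(mu).
    """
    density, nu_a, nu_s = {}, {}, {}
    for x in mu:
        if mu[x] > 0:
            # On {mu > 0} the density is the ratio of the point masses.
            density[x] = Fraction(nu[x]) / Fraction(mu[x])
            nu_a[x], nu_s[x] = nu[x], 0
        else:
            # {mu = 0} is mu-null, so all mass of nu there is singular.
            density[x] = Fraction(0)
            nu_a[x], nu_s[x] = 0, nu[x]
    return density, nu_a, nu_s


def all_subsets(points):
    """All subsets of a finite set, i.e. the sigma-algebra 2^X."""
    points = list(points)
    return chain.from_iterable(
        combinations(points, r) for r in range(len(points) + 1)
    )


# Illustrative data (an assumption, not taken from the text).
mu = {"a": 1, "b": 2, "c": 0}
nu = {"a": 3, "b": 1, "c": 5}
density, nu_a, nu_s = lebesgue_decomposition(mu, nu)

for A in all_subsets(mu):
    # nu_a(A) equals the integral of the density over A with respect to mu.
    assert measure(nu_a, A) == sum(density[x] * mu[x] for x in A)
    # nu = nu_a + nu_s on every measurable set.
    assert measure(nu, A) == measure(nu_a, A) + measure(nu_s, A)
```

In this toy case the sets B = {μ > 0} and C = {μ = 0} play the same role as the sets B and C in the proof of Theorem 2.4.31.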

2.5 Regular and Radon Measures

In this section we investigate the connections between measure theory and topology. When we combine the measure theoretic and topological structures, we obtain stronger and more interesting results. Throughout this section (X, τ) is a Hausdorff topological space. Additional conditions on X will be introduced as needed. By C_c(X) we denote the space of all continuous

functions f : X → ℝ with compact support. Recall that the support of f, denoted by supp f, is defined to be the closure of the set {x ∈ X : f(x) ≠ 0}.

Definition 2.5.1. The Baire σ-algebra of X, denoted by Ba(X), is defined to be the smallest σ-algebra on X, which makes all functions in C_c(X) measurable. So, Ba(X) has as generators the sets {x ∈ X : f(x) ≥ η} with f ∈ C_c(X) and η ∈ ℝ. These sets are known as Baire sets.

This new σ-algebra is most useful within the framework of locally compact spaces.

Lemma 2.5.2. If X is locally compact, K ⊆ X is compact and W ⊆ X is open such that K ⊆ W, then we can find U ∈ τ ∩ Ba(X) and a compact G_δ-set C such that K ⊆ U ⊆ C ⊆ W.

Proof. Proposition 1.4.66(c) says that there exists D ∈ τ being relatively compact such that K ⊆ D ⊆ D̄ ⊆ W. Then Proposition 1.4.68 implies that there is f ∈ C_c(X) such that f|_K = 1 and f|_{D^c} = 0. Let C = {x ∈ X : f(x) ≥ 1/2}. Then C ⊆ X is compact, G_δ, U = {x ∈ X : f(x) > 1/2} ∈ τ and we have K ⊆ U ⊆ C ⊆ W.

Corollary 2.5.3. If X is locally compact, then τ ∩ Ba(X) is a basis for τ.

Proof. Let x ∈ X and U ∈ N(x). Then Lemma 2.5.2 implies that there exists f ∈ C_c(X) such that f(x) = 1 and f|_{U^c} = 0. Consider the set V = {y ∈ X : f(y) > 1/2}. Then V ∈ τ ∩ Ba(X) and x ∈ V ⊆ U.

Now we can give an alternative characterization of Ba(X) when X is locally compact.

Theorem 2.5.4. If X is locally compact, then

Ba(X) = σ({C ⊆ X : C is compact and a G_δ-set}) .

Proof. Let L = σ({C ⊆ X : C is compact and a G_δ-set}). For every f ∈ C_c(X) and η > 0, the set {x ∈ X : f(x) ≥ η} is compact and G_δ. Note that {f ≥ η} = ⋂_{n≥1} {f > η − 1/n}. Therefore {x ∈ X : f(x) ≥ η} ∈ L for all f ∈ C_c(X) and for all η > 0. For η < 0, we have 0 < −η < −η − η/(2n) and

{f ≥ η} = {f < η}^c = {−f > −η}^c = ( ⋃_{n≥1} {−f ≥ −η − η/(2n)} )^c ∈ L .

Moreover, note that {f ≥ 0} = ⋂_{n≥1} {f ≥ −1/n} ∈ L. So, every set {x ∈ X : f(x) ≥ η} for f ∈ C_c(X) and η ∈ ℝ, belongs to L and we have

Ba(X) ⊆ L ; (2.5.1)

see Definition 2.5.1. Now suppose that K ⊆ X is compact and K = ⋂_{n≥1} W_n with W_n ∈ τ. Lemma 2.5.2 implies that we can find U_n ∈ τ ∩ Ba(X) such that K ⊆ U_n ⊆ W_n for all n ∈ ℕ. Then K = ⋂_{n≥1} U_n ∈ Ba(X), which gives

L ⊆ Ba(X) . (2.5.2)

From (2.5.1) and (2.5.2) we conclude that L = Ba(X).


Next we compare the Baire and Borel σ-algebras.

Theorem 2.5.5.
(a) Ba(X) ⊆ B(X).
(b) If X is locally compact, separable and metrizable, then Ba(X) = B(X).

Proof. (a) Just recall that every continuous function f : X → ℝ is Borel measurable.
(b) From Proposition 1.4.78 (see also Proposition 1.5.40), we know that X is σ-compact. Therefore, every closed subset of X is likewise σ-compact. It follows that it suffices to show that every compact set belongs to Ba(X). But Proposition 1.5.8 says that every compact set in X is G_δ. So, according to Theorem 2.5.4, it belongs to Ba(X) and we conclude that Ba(X) = B(X).

Using Proposition 1.4.66(d) we have at once the following result.

Proposition 2.5.6. If X is locally compact and B̂ is a basis for τ, then Ba(X) ⊆ σ(B̂) ⊆ B(X).

The next theorem is the Baire counterpart of Proposition 2.2.26(b).

Theorem 2.5.7. If X and Y are second countable, locally compact spaces, then Ba(X × Y) = Ba(X) ⨂ Ba(Y).

Proof. Note that X × Y is locally compact. We define

M(A) = {B ⊆ Y : A × B ∈ Ba(X × Y)} .

It is routine to check that M(A) is a σ-ring for any A. Suppose that C ⊆ X is compact and a G_δ-set. Then if E ⊆ Y is compact and G_δ, then so is C × E ⊆ X × Y and we infer that M(C) contains every compact G_δ-set in Y. Moreover, we have Y ∈ M(C); see Proposition 1.4.78 and Theorem 1.2.27. It follows that M(C) is a σ-algebra containing Ba(Y). Let L = {A ⊆ X : Ba(Y) ⊆ M(A)}. This family is closed under countable intersections and under complementation and we have seen above it contains every compact G_δ. Therefore

Ba(X) ⨂ Ba(Y) ⊆ Ba(X × Y) . (2.5.3)

On the other hand, from Corollary 2.5.3, we know that the family

B = {U × V : U ⊆ X Baire open, V ⊆ Y Baire open}

is a basis for X × Y. Since U × V ∈ Ba(X) ⨂ Ba(Y) it follows that σ(B) ⊆ Ba(X) ⨂ Ba(Y). Then Proposition 2.5.6 gives

Ba(X × Y) ⊆ Ba(X) ⨂ Ba(Y) . (2.5.4)

From (2.5.3) and (2.5.4), we conclude that Ba(X × Y) = Ba(X) ⨂ Ba(Y).

Definition 2.5.8.
(a) A (signed) Borel measure is a (signed) measure defined on B(X).
(b) We say that a Borel measure μ is regular if for every A ∈ B(X)

μ(A) = inf[μ(U) : U ⊆ X is open, A ⊆ U] (outer regularity)
= sup[μ(C) : C ⊆ X is closed, C ⊆ A] (inner regularity) .

(c) We say that a Borel measure μ is compact regular if for every A ∈ B(X)

μ(A) = sup[μ(K) : K ⊆ X is compact, K ⊆ A] .

(d) We say that a Borel measure μ is a Radon measure if the following hold:
– μ(K) < +∞ for every compact K ⊆ X;
– μ(A) = inf[μ(U) : U ⊆ X is open, A ⊆ U] for all A ∈ B(X);
– μ(A) = sup[μ(K) : K ⊆ X is compact, K ⊆ A] for all A ∈ B(X).

For a signed Borel measure μ we say that μ is regular (resp. compact regular, Radon) if |μ| is such a measure or equivalently if μ^+ and μ^− have the corresponding properties.

Remark 2.5.9. Evidently two regular Borel measures are equal if and only if they coincide on the open or closed subsets. Similarly two compact regular measures are equal if and only if they coincide on the compact sets.

Proposition 2.5.10. For finite Borel measures μ, outer and inner regularity are equivalent properties.

Proof. Suppose that for all A ∈ B(X)

μ(A) = inf[μ(U) : U ⊆ X is open, A ⊆ U] . (2.5.5)

Taking Proposition 2.1.24(b) and (2.5.5) into account yields

μ(X) − μ(A) = μ(A^c) = inf[μ(U) : U ⊆ X is open, A^c ⊆ U] = μ(X) − sup[μ(C) : C ⊆ X is closed, C ⊆ A] .

Therefore, μ(A) = sup[μ(C) : C ⊆ X is closed, C ⊆ A]. Hence, outer regularity implies inner regularity. In a similar way we show that the opposite implication holds as well. So, the two notions are equivalent.

Theorem 2.5.11. If μ : B(X) → [0, +∞) is a finite, compact regular Borel measure, then μ is a Radon measure.

Proof. Since every compact subset of X is closed, for every A ∈ B(X) we derive

μ(A) = sup[μ(K) : K ⊆ X is compact, K ⊆ A] ≤ sup[μ(C) : C ⊆ X is closed, C ⊆ A] ≤ μ(A) .


Hence,

μ(A) = sup[μ(C) : C ⊆ X is closed, C ⊆ A] . (2.5.6)

From (2.5.6) and Proposition 2.5.10, we conclude that μ is a Radon measure.

Theorem 2.5.12. If X is metrizable and μ : B(X) → [0, +∞) is a finite Borel measure, then μ is regular.

Proof. Let M = {A ∈ B(X) : A is both outer and inner regular}; see Definition 2.5.8(b). We are going to show that M is a σ-algebra containing all the open sets. Therefore M = B(X).

Fact 1: A ∈ M implies A^c ∈ M.
This is immediate from the definition of M. Recall that μ is finite and that μ(X) − μ(A) = μ(A^c); see Proposition 2.1.24(b).

Fact 2: {A_n}_{n≥1} ⊆ M implies A = ⋃_{n≥1} A_n ∈ M.
Let ε > 0. For every n ∈ ℕ there exist an open U_n ⊆ X and a closed C_n ⊆ X such that

C_n ⊆ A_n ⊆ U_n and μ(U_n) ≤ μ(C_n) + ε/2^n . (2.5.7)

Let U = ⋃_{n≥1} U_n. Then U ⊆ X is open and A ⊆ U. We know that U \ A ⊆ ⋃_{n≥1} (U_n \ A_n). Then, due to (2.5.7), this gives

0 ≤ μ(U) − μ(A) = μ(U \ A) ≤ ∑_{n≥1} μ(U_n \ A_n) = ∑_{n≥1} (μ(U_n) − μ(A_n)) ≤ ∑_{n≥1} ε/2^n = ε .

Hence,

μ(A) = inf[μ(U) : U ⊆ X is open, A ⊆ U] (outer regularity of A) .

Let C = ⋃_{n≥1} C_n. Arguing as above, we show that

μ(A) ≤ μ(C) + ε . (2.5.8)

For every m ∈ ℕ, let C̃_m = ⋃_{n=1}^m C_n. Evidently C̃_m is closed and C̃_m ↗ C. Invoking Proposition 2.1.24(e), there exists m ∈ ℕ such that μ(C) ≤ μ(C̃_m) + ε which gives, thanks to (2.5.8), that μ(A) ≤ μ(C̃_m) + 2ε. This finally yields

μ(A) = sup[μ(C) : C ⊆ X is closed, C ⊆ A] (inner regularity of A) .

Hence, A ∈ M.

Fact 3: M contains all open sets.
Let U ⊆ X be open. Proposition 1.5.8 says that U is an F_σ-set. So, we can find closed subsets {C_n}_{n≥1} of X such that C_n ↗ U. Then μ(C_n) ↗ μ(U); see Proposition 2.1.24(e). Hence

μ(U) = sup[μ(C) : C ⊆ X is closed, C ⊆ U] ,

which gives U ∈ M since U is open. Combining Facts 1–3 implies that M = B(X).

Proposition 2.5.13. If X is metrizable and μ : B(X) → [0, +∞) is a finite Borel measure, then μ is compact regular if and only if for every ε > 0 there exists a compact K_ε ⊆ X such that μ(X) − ε ≤ μ(K_ε).

Proof. ⇒: This is immediate from Definition 2.5.8(c).
⇐: From Theorem 2.5.12 we know that μ is regular. So, it suffices to show that for every closed C ⊆ X, we have

μ(C) = sup[μ(K) : K ⊆ X is compact, K ⊆ C] . (2.5.9)

Arguing by contradiction suppose that there exists a closed C ⊆ X such that (2.5.9) is not true. So we can find ε > 0 such that

sup[μ(K) : K ⊆ X is compact, K ⊆ C] ≤ μ(C) − ε/2 . (2.5.10)

For K ⊆ X compact we have that K ∩ C ⊆ C is compact and, because of (2.5.10),

μ(K) = μ(K ∩ C) + μ(K ∩ C^c) ≤ μ(C) − ε/2 + μ(C^c) = μ(X) − ε/2 .

Since K ⊆ X is arbitrary, we get a contradiction to our hypothesis.

On Polish spaces all finite Borel measures are Radon measures.

Theorem 2.5.14. If X is a Polish space and μ : B(X) → [0, +∞) is a finite Borel measure, then μ is a Radon measure.

Proof. On account of Theorem 2.5.11 we only need to show that μ is compact regular. Suppose that D = {x_k}_{k≥1} ⊆ X is dense. We consider the closed balls B_n(x_k) = {x ∈ X : d(x, x_k) ≤ 1/n} with n, k ∈ ℕ. Obviously X = ⋃_{k≥1} B_n(x_k) for every n ∈ ℕ. Given ε > 0, for every n ∈ ℕ, we can find m_n ∈ ℕ such that

μ(X \ ⋃_{k=1}^{m_n} B_n(x_k)) ≤ ε/2^n . (2.5.11)

Let K = ⋂_{n≥1} ⋃_{k=1}^{m_n} B_n(x_k). The set K is closed and totally bounded, hence K is compact; see Theorem 1.5.36. Taking (2.5.11) into account it follows

μ(X) − μ(K) = μ(X \ K) = μ[⋃_{n≥1} (X \ ⋃_{k=1}^{m_n} B_n(x_k))]
≤ ∑_{n≥1} μ(X \ ⋃_{k=1}^{m_n} B_n(x_k)) ≤ ∑_{n≥1} ε/2^n = ε .

Hence, μ is compact regular (see Proposition 2.5.13), and so, μ is a Radon measure.

In the next proposition we produce another useful dense subset of L^p(X) for 1 ≤ p < ∞.

Proposition 2.5.15. If X is locally compact and μ : B(X) → [0, +∞] is a Radon measure, then C_c(X) is dense in L^p(X) for 1 ≤ p < ∞ where C_c(X) is the space of all continuous functions f : X → ℝ that have a compact support.


Proof. From Proposition 2.3.22, we know that simple functions are dense in L^p(X). So, it suffices to show that for every A ∈ B(X) with μ(A) < +∞ we can approximate χ_A in the L^p-norm by C_c(X)-functions. Given ε > 0 there exist an open set U ⊆ X and a compact set K ⊆ X such that

K ⊆ A ⊆ U and μ(U \ K) ≤ ε^p . (2.5.12)

Since X is locally compact, combining Urysohn’s Lemma (see Theorem 1.2.17) and Proposition 1.4.66(c), we can find f ∈ C_c(X) such that χ_K ≤ f ≤ χ_U. Then, using (2.5.12), ‖χ_A − f‖_p ≤ μ(U \ K)^{1/p} ≤ ε, which demonstrates that C_c(X) is dense in L^p(X) for 1 ≤ p < ∞.

Remark 2.5.16. Since L^∞(X) contains noncontinuous functions, the density result above fails for p = +∞.

The next theorem is another remarkable result in the spirit of Egorov’s Theorem; see Theorem 2.2.32. It asserts that a Borel measurable map between certain metric spaces is “almost” continuous. The result is known as “Lusin’s Theorem.”

Theorem 2.5.17 (Lusin’s Theorem). If X is a Polish space, Y is a separable metric space, f : X → Y is Borel measurable, and μ : B(X) → [0, +∞) is a finite Borel measure, then given any ε > 0, there exists a compact set K_ε ⊆ X such that μ(X \ K_ε) ≤ ε and f|_{K_ε} is continuous.

Proof. We know that Y is second countable; see Proposition 1.5.5. So, let {V_n}_{n≥1} be a countable basis for the metric topology of Y. We have f^{-1}(V_n) ∈ B(X) for all n ∈ ℕ and so using Theorem 2.5.12 there exists an open set U_n ⊆ X such that

f^{-1}(V_n) ⊆ U_n and μ(U_n \ f^{-1}(V_n)) ≤ ε/2^{n+1} for all n ∈ ℕ . (2.5.13)

The set f^{-1}(V_n) is relatively open in (X \ U_n) ∪ f^{-1}(V_n). Note that f^{-1}(V_n) = [(X \ U_n) ∪ f^{-1}(V_n)] ∩ U_n, see (2.5.13). Let

A_ε = X \ ⋃_{n≥1} (U_n \ f^{-1}(V_n)) = ⋂_{n≥1} ((X \ U_n) ∪ f^{-1}(V_n)) .

Thanks to (2.5.13), one gets

μ(X \ A_ε) ≤ ε/2 . (2.5.14)

Using Theorem 2.5.14 there exists a compact set K_ε ⊆ A_ε such that μ(A_ε \ K_ε) ≤ ε/2, which gives μ(X \ K_ε) ≤ ε; see (2.5.14). For every n ∈ ℕ, f^{-1}(V_n) ∩ K_ε is relatively open in K_ε. Since {V_n}_{n≥1} is a basis for the metric topology of Y, it follows that for all open V ⊆ Y, f^{-1}(V) ∩ K_ε is relatively open in K_ε. Hence f|_{K_ε} is continuous.

In addition there is also a second version of Lusin’s Theorem.


Theorem 2.5.18 (Lusin’s Theorem, Second Version). If X is locally compact, μ is a Radon measure and f : X → ℝ is a Borel measurable function that vanishes outside a set of finite μ-measure, then for given ε > 0, there exist A ∈ B(X) and h ∈ C_c(X) such that μ(A) ≤ ε and f|_{X\A} = h|_{X\A}. Moreover if f is bounded, then it holds that ‖h‖_∞ ≤ ‖f‖_∞.

Proof. First assume that f is bounded. Let A = {x ∈ X : f(x) ≠ 0} ∈ B(X). By hypothesis, μ(A) < +∞. So, we can use Proposition 2.5.15 and find {h_n}_{n≥1} ⊆ C_c(X) such that h_n → f in L^1(X). So, by passing to a suitable subsequence if necessary, we may assume that h_n(x) → f(x) for μ-a.a. x ∈ X; see Corollary 2.3.20. Invoking Egorov’s Theorem (see Theorem 2.2.32), there exists B ⊆ A such that

μ(A \ B) ≤ ε/3 and h_n → f uniformly on B . (2.5.15)

Exploiting the fact that μ is a Radon measure, we find a compact set K ⊆ B and an open set U ⊇ A such that

μ(B \ K) ≤ ε/3 and μ(U \ A) ≤ ε/3 . (2.5.16)

Since h_n → f uniformly on K, it follows that f|_K is continuous. Invoking the locally compact version of the Tietze Extension Theorem (see Theorem 1.4.88), there exists ĥ ∈ C_c(X) such that ĥ|_K = f|_K and supp ĥ ⊆ U. Hence, D = {x ∈ X : ĥ(x) ≠ f(x)} ⊆ U \ K, which demonstrates, due to (2.5.15) and (2.5.16), that μ(D) ≤ μ(U \ K) ≤ ε. Now let ξ : ℝ → ℝ be defined by

ξ(t) = t if |t| ≤ ‖f‖_∞ , ξ(t) = ‖f‖_∞ sgn t if |t| > ‖f‖_∞ .

Evidently ξ is continuous with ξ(0) = 0. So, if we define h = ξ ∘ ĥ, then h ∈ C_c(X), h = f on the set {ĥ = f} and ‖h‖_∞ ≤ ‖f‖_∞.

Finally we consider the general case in which f is unbounded. In this case we define A_n = {x ∈ X : 0 < |f(x)| ≤ n} ∈ B(X). Then A_n ↗ A and for large enough n ≥ 1, we have that μ(A \ A_n) ≤ ε/2. Then from the first part of the proof there exists h ∈ C_c(X) such that h = fχ_{A_n} outside a set D ∈ B(X) with μ(D) ≤ ε/2. Then finally we have h = f outside a set D_0 ∈ B(X) with μ(D_0) ≤ ε.

There is a parametric variant of Lusin’s Theorem concerning Carathéodory functions; see Definition 2.2.30. The result is known as the “Scorza–Dragoni Theorem.”

Theorem 2.5.19 (Scorza–Dragoni Theorem). If T and X are Polish spaces, Y is a separable metric space, μ : B(T) → [0, +∞) is a finite compact regular Borel measure, and f : T × X → Y is a Carathéodory function, then for every ε > 0 there exists a compact set K_ε ⊆ T with μ(T \ K_ε) ≤ ε such that f|_{K_ε×X} is continuous.

Proof. From Theorem 1.5.21 we know that Y is homeomorphic to a subset of the Hilbert cube ℍ = [0, 1]^ℕ. Let h = (h_n)_{n∈ℕ} : Y → ℍ be this homeomorphism. Then f is a


Carathéodory function if and only if for every n ∈ ℕ, h_n ∘ f : T × X → [0, 1] is a Carathéodory function. Therefore without any loss of generality we may assume that Y = [0, 1]. Let {U_n}_{n≥1} be a basis for the topology of X and let {x_m}_{m≥1} ⊆ X be dense. For every q ∈ [0, 1] ∩ ℚ let ξ_{nq} : X → [0, 1] be defined by ξ_{nq}(x) = qχ_{U_n}(x). Since U_n is open, χ_{U_n} is lower semicontinuous (see Definition 1.7.1), and if φ : X → Y = [0, 1] is lower semicontinuous, then φ(x) = sup[ξ_{nq}(x) : ξ_{nq} ≤ φ] with x ∈ X. So, we define

A_{nqm} = {t ∈ T : ξ_{nq}(x_m) ≤ f(t, x_m)} ∈ B(T) .

Let A_{nq} = ⋂_{m∈ℕ} A_{nqm} ∈ B(T). The density of {x_m}_{m≥1} in X, the continuity of f(t, ⋅), and the lower semicontinuity of ξ_{nq} imply that

A_{nq} = {t ∈ T : ξ_{nq}(x) ≤ f(t, x) for all x ∈ X} .

We set η_{nq}(t, x) = χ_{A_{nq}}(t) ξ_{nq}(x). Then η_{nq} ≤ f and for all (t, x) ∈ T × X we have f(t, x) = sup_{n,q} η_{nq}(t, x). Note that ℕ × ([0, 1] ∩ ℚ) is countable. So we can write that

f = sup_{k∈ℕ} χ_{B_k} h_k with B_k ∈ B(T) , h_k lower semicontinuous on X .

Since by hypothesis μ is a finite, compact regular measure on T, there exist an open set V_k ⊆ T and a compact set K_k ⊆ T such that

K_k ⊆ B_k ⊆ V_k and μ(V_k \ K_k) ≤ ε/2^{k+2} for all k ∈ ℕ . (2.5.17)

Let E_k = K_k ∪ (T \ V_k) for all k ∈ ℕ. Then χ_{B_k}|_{E_k} is continuous (see (2.5.17)), and this implies that χ_{B_k} h_k is lower semicontinuous on E_k × X. Let E = ⋂_{k∈ℕ} E_k ⊆ T be compact. We see that μ(T \ E) ≤ ε/2 and f|_{E×X} is lower semicontinuous as the upper envelope of lower semicontinuous functions; see Proposition 1.7.4(a). The same argument applied to 1 − f produces another compact set Ẽ ⊆ T with μ(T \ Ẽ) ≤ ε/2 such that (1 − f)|_{Ẽ×X} is lower semicontinuous. We set T_ε = E ∩ Ẽ ⊆ T, which is compact. Then we see that μ(T \ T_ε) ≤ ε and f|_{T_ε×X} is continuous.

Next we introduce an extension of the notion of a Carathéodory function (see Definition 2.2.30), which is important in the calculus of variations, optimal control, and optimization.

Definition 2.5.20. Let (X, Σ) be a measurable space, Y a Hausdorff topological space, and f : X × Y → ℝ̄ = ℝ ∪ {+∞}. We say that f is a normal integrand if the following hold:
(a) f is Σ ⨂ B(Y)-measurable;
(b) y → f(x, y) is lower semicontinuous for all x ∈ X.

Proposition 2.5.21. If (X, Σ, μ) is a complete measure space, Y is a Polish space, and f : X × Y → ℝ̄ = ℝ ∪ {+∞} is a normal integrand such that there is a Carathéodory

function ξ : X × Y → ℝ satisfying ξ(x, y) ≤ f(x, y) for all (x, y) ∈ X × Y, then there is a sequence of Carathéodory functions f_n : X × Y → ℝ such that ξ(x, y) ≤ f_n(x, y) ≤ f(x, y) for all (x, y) ∈ X × Y and f_n ↗ f as n → ∞.

Proof. We reason as in the proof of Proposition 1.7.6. So, we define

f_n(x, y) = inf[f(x, z) + n d(y, z) : z ∈ Y] for all n ∈ ℕ

with d being the metric on Y. If {z_m}_{m≥1} ⊆ Y is dense in Y, then

f_n(x, y) = inf_{m∈ℕ} [f(x, z_m) + n d(y, z_m)] for all n ∈ ℕ .

This shows that f_n is Σ ⨂ B(Y)-measurable; see Proposition 2.2.31. Clearly we have ξ(x, y) ≤ f_n(x, y) for all (x, y) ∈ X × Y, for all n ∈ ℕ and as in the proof of Proposition 1.7.6, we show that f_n ↗ f.

Using this proposition we can have the following extension of the Scorza–Dragoni Theorem; see Theorem 2.5.19.

Theorem 2.5.22. If T and X are Polish spaces, μ is a finite, compact regular Borel measure on T and f : T × X → ℝ̄ = ℝ ∪ {+∞} is a normal integrand bounded below by a Carathéodory function ξ, then for given ε > 0 there is a compact set T_ε ⊆ T such that μ(T \ T_ε) ≤ ε and f|_{T_ε×X} is lower semicontinuous.

Proof. Using Proposition 2.5.21, there exist Carathéodory functions f_n such that ξ ≤ f_n ≤ f for all n ∈ ℕ and f_n ↗ f. We apply the Scorza–Dragoni Theorem (see Theorem 2.5.19), and for each n ∈ ℕ there is a compact set T_n ⊆ T with μ(T \ T_n) ≤ ε/2^n such that f_n|_{T_n×X} is continuous. Let T_ε = ⋂_{n≥1} T_n ⊆ T, which is compact. Then, of course, μ(T \ T_ε) ≤ ε and f|_{T_ε×X} is lower semicontinuous, being the supremum of the continuous functions f_n|_{T_ε×X}.

Definition 2.5.23. Let (X, Σ, μ) be a measure space, (Y, L) a measurable space, and f : X → Y a (Σ, L)-measurable map. Then μ induces an image measure μ ∘ f^{-1} on Y by (μ ∘ f^{-1})(A) = μ(f^{-1}(A)) for all A ∈ L. Since f^{-1} preserves all the set theoretic operations, we see that indeed μ ∘ f^{-1} is a measure on (Y, L).

Proposition 2.5.24. If (X, Σ, μ) is a measure space, (Y, L) is a measurable space, f : X → Y is a (Σ, L)-measurable map, and h : Y → ℝ is an L-measurable function, then

∫_Y h d(μ ∘ f^{-1}) = ∫_X (h ∘ f) dμ

whenever either side exists.

Proof. If h = χ_A with A ∈ L, then the result follows from Definition 2.5.23. So, the result is also true for simple functions that are linear combinations of characteristic functions. Finally we use Proposition 2.2.18 to pass to the general case.
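The change of variables formula of Proposition 2.5.24 can be verified directly when X is finite and every subset is measurable. The sketch below is illustrative only: the helper names `image_measure` and `integrate`, the weights of μ, and the maps f and h are assumptions made for the example, not objects from the text. The image measure μ ∘ f^{-1} is obtained by adding up the mass of all points with the same image, exactly as in Definition 2.5.23.

```python
from fractions import Fraction


def image_measure(mu, f):
    """Push the point masses of mu on X forward along f : X -> Y.

    The mass of a point y is mu(f^{-1}({y})), as in Definition 2.5.23.
    """
    pushed = {}
    for x, weight in mu.items():
        y = f(x)
        pushed[y] = pushed.get(y, 0) + weight
    return pushed


def integrate(h, weights):
    """Integral of h with respect to a measure given by point masses."""
    return sum(h(point) * weight for point, weight in weights.items())


# Hypothetical example data: uniform measure on four points of the real line.
mu = {x: Fraction(1, 4) for x in (-2, -1, 1, 2)}
f = lambda x: x * x      # measurable map X -> Y
h = lambda y: y + 1      # integrand on Y

lhs = integrate(h, image_measure(mu, f))   # integral of h d(mu o f^{-1}) over Y
rhs = integrate(lambda x: h(f(x)), mu)     # integral of (h o f) dmu over X
assert lhs == rhs
```

Exact rational arithmetic via `Fraction` is used so that the two sides can be compared with `==` rather than up to rounding.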


Image measures via continuous maps preserve the property of being a Radon measure.

Proposition 2.5.25. If X, Y are Hausdorff topological spaces, X is compact, f : X → Y is continuous, and μ : B(X) → [0, +∞] is a Radon measure, then μ ∘ f^{-1} : B(Y) → [0, +∞] is a Radon measure as well.

Proof. Since X is compact and μ is a Radon measure, μ is finite. So, according to Theorem 2.5.11, it suffices to show that μ ∘ f^{-1} is compact regular. Since μ is a Radon measure, for every A ∈ B(Y) one gets

(μ ∘ f^{-1})(A) = sup[μ(K) : K ⊆ X is compact, K ⊆ f^{-1}(A)] ; (2.5.18)

see Definition 2.5.23. For a compact K ⊆ f^{-1}(A) it follows f(K) ⊆ A and so K ⊆ f^{-1}(f(K)) ⊆ f^{-1}(A). Hence

μ(K) ≤ μ(f^{-1}(f(K))) ≤ (μ ∘ f^{-1})(A) . (2.5.19)

The continuity of f implies that K̃ = f(K) ⊆ Y is compact. Then from (2.5.18) and (2.5.19) it follows that

(μ ∘ f^{-1})(A) = sup[(μ ∘ f^{-1})(K̃) : K̃ ⊆ Y is compact, K̃ ⊆ A] ,

which shows that μ ∘ f^{-1} is compact regular, hence a Radon measure.
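The compact regularity criterion of Proposition 2.5.13, which Theorem 2.5.14 guarantees on every Polish space, can be made concrete for a specific finite Borel measure on ℝ. As an assumption chosen purely for illustration, the sketch below takes μ to be the standard normal distribution, for which μ([−r, r]) = erf(r/√2), and searches for a compact interval K_ε = [−r, r] with μ(ℝ) − ε ≤ μ(K_ε). The function names `normal_mass` and `compact_with_mass` are hypothetical.

```python
import math


def normal_mass(r):
    """Mass of the compact interval [-r, r] under the standard normal law."""
    return math.erf(r / math.sqrt(2.0))


def compact_with_mass(eps):
    """Return r such that mu(R \\ [-r, r]) <= eps for the standard normal mu.

    The radius is doubled until the tail mass drops below eps, so the result
    is a valid but not the smallest possible radius.
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    r = 1.0
    while 1.0 - normal_mass(r) > eps:
        r *= 2.0
    return r


r = compact_with_mass(1e-3)
# The tail outside K = [-r, r] carries at most eps of the total mass 1.
assert 1.0 - normal_mass(r) <= 1e-3
```

The doubling search is a design shortcut: any radius with small enough tail mass serves as K_ε, so there is no need to locate the optimal one.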

2.6 Analytic (Souslin) Sets

In Definition 1.5.51 we introduced the notion of a Souslin space. Souslin spaces are of fundamental importance in measure theory since they give to the theory of Borel sets and Borel functions depth and power. Let us start by recalling the definition of a Souslin space.

Definition 2.6.1. A Hausdorff topological space X is said to be a Souslin space if it is the continuous image of a Polish space, that is, there exists a Polish space Y and a continuous surjection f : Y → X. A subset of a Hausdorff topological space that is a Souslin space is called a Souslin set. A Souslin subset of a Polish space is also called an analytic set. The complement of a Souslin set is called a co-Souslin set (or coanalytic set).

Remark 2.6.2. We have that a Souslin space is always separable but need not be metrizable; see Remark 1.5.52. Moreover, using Remark 1.5.50, we see that a nonempty subset of a Hausdorff space is a Souslin set if it is the image of the Polish space ℕ^∞ under a continuous map.

Given a set B, by B^f we denote the set of all finite sequences with terms in the set B. That is, B^f = ⋃_{n≥1} B^f_n with B^f_n being the set of n-sequences. Of special interest to us is the set ℕ^f. Note that ℕ^f is countable in contrast to ℕ^∞, which is uncountable. Using ℕ^f we introduce the following definition.

Definition 2.6.3. Let X be a nonempty set and $L \subseteq 2^X$. An L-Souslin scheme is a map $A : \mathbb{N}^f \to L$. Let D be the family of all L-Souslin schemes. The Souslin operation (or A-operation) over the class L is a map a : D → L such that

\[ a(A) = \bigcup_{p \in \mathbb{N}^\infty} \bigcap_{k \in \mathbb{N}} A(p_1, \ldots, p_k) \quad \text{for all } A \in D . \tag{2.6.1} \]

The collection of all sets of this form is denoted by S(L). The elements of S(L) are called L-Souslin (or L-analytic) sets. A Souslin scheme A is said to be regular (or monotone) if $A(p_1, \ldots, p_{k+1}) \subseteq A(p_1, \ldots, p_k)$ for all $p \in \mathbb{N}^\infty$ and all k ∈ ℕ.

Remark 2.6.4. If ∅ ∈ L (or if L contains disjoint sets), then ∅ ∈ S(L). Note that in (2.6.1) the union is uncountable. So, if L is a σ-algebra and A is an L-Souslin scheme, then a(A) may be outside of L.

In what follows we will use the following notation. Given $s = (s_k)_{k=1}^n \in \mathbb{N}^f$ and $p \in \mathbb{N}^\infty$, we write s < p if and only if $s_1 = p_1, \ldots, s_n = p_n$. In the next proposition we collect some basic properties of the operator S.

Proposition 2.6.5. If X is a nonempty set and $L, L' \subseteq 2^X$, then the following hold:
(a) $S(L) \subseteq S(L')$ if $L \subseteq L'$, that is, S is monotone;
(b) $S(L)_\delta = S(L)$, that is, S(L) is closed under countable intersections;
(c) $S(L)_\sigma = S(L)$, that is, S(L) is closed under countable unions;
(d) L ⊆ S(L).

Proof. (a) This is an immediate consequence of Definition 2.6.3.
(b) Clearly we have $S(L) \subseteq S(L)_\delta$. Suppose that $\bigcap_{k \ge 1} a(A_k) \in S(L)_\delta$. We need to produce an L-Souslin scheme $A : \mathbb{N}^f \to L$ such that $a(A) = \bigcap_{k \ge 1} a(A_k)$. To this end, for every k ∈ ℕ, let $T_k = \{(2m-1)2^{k-1} : m \in \mathbb{N}\}$. Then $\{T_k\}_{k \ge 1}$ is a partition of ℕ into infinitely many infinite sets. For each k ∈ ℕ, let $\xi_k : \mathbb{N}^\infty \to \mathbb{N}^\infty$ be defined by

\[ \xi_k((p_n)) = (p_{2^{k-1}}, p_{3 \cdot 2^{k-1}}, p_{5 \cdot 2^{k-1}}, \ldots) , \]

that is, $\xi_k$ picks from the sequence $(p_n)_{n \in \mathbb{N}}$ those elements with index in $T_k$. We will produce an L-Souslin scheme A such that $\bigcap_{s<p} A(s) = \bigcap \bigcap A_k(s)$ for all $p \in \mathbb{N}^\infty$. […]

[…] it holds that $\lim_{t \to t_0^-} d_f(t) = \mu(\{x \in X : f(x) \ge t_0\})$.

Problem 2.13. Given ε > 0, produce a dense open set U ⊆ ℝ such that λ(U) ≤ ε, where λ is the Lebesgue measure on ℝ.

Problem 2.14. Suppose that 1 ≤ p < ∞ and let $f \in L^p(\mathbb{R}^N)$ for the Lebesgue measure on $\mathbb{R}^N$. Show that

\[ \lim_{h \to 0} \int_{\mathbb{R}^N} |f(x+h) - f(x)| \, d\lambda = 0 . \]


Problem 2.15. (a) Suppose that f : ℝN → ℝ is integrable and K ⊆ ℝN is nonempty and compact. Show that lim|y|→∞ ∫K+y |f(x)|dx = 0. (b) Suppose that f : ℝN → ℝ is uniformly continuous and f ∈ L p (ℝN ) for some 1 ≤ p < ∞. Show that lim|x|→∞ f(x) = 0. Problem 2.16. Let X be a nonempty set, Y is a metrizable space and f : X → Y is a map that is the pointwise limit of simple functions. Show that f(X) ⊆ Y is separable. Problem 2.17. Let (X, Σ) be a measurable space, Y a second countable Hausdorff topological space, and f : X → Y a Σ-measurable multifunction. Show that Gr f ∈ Σ ⨂ B(Y). Problem 2.18. Let (Ω, Σ, μ) be a measure space and L ⊆ Σ a countable subset such that if A ∈ Σ, μ(A) < ∞, then there exists B ∈ L with μ(A △ B) ≤ ε. Show that L p (Ω) is separable for all 1 ≤ p < ∞. Problem 2.19. Let (Ω, Σ, μ) be a σ-finite measure space and assume that f ∈ L p (Ω) for all p ≥ p0 ≥ 1. Show that limp→+∞ ‖f‖p = ‖f‖∞ . Problem 2.20. Let (X, Σ), (Y, L), and (V, D) be measurable spaces, f : X → Y, g : X → V, and let h : X → Y × V be defined by h(x) = (f(x), g(x)) for all x ∈ X. Show that h is (Σ, L ⨂ D)-measurable if and only if f is (Σ, L)-measurable and g is (Σ, D)-measurable. Problem 2.21. Let (X, Σ) be a measurable space, Y, Y1 , Y2 separable metrizable spaces, and V a Hausdorff topological space. Suppose that f k : X × Y → Y k , k = 1, 2 are Carathéodory functions , g : Y1 × Y2 → V is Borel measurable . Show that h : X × Y → V defined by h(x, y) = g(f1 (x, y), f2 (x, y)) is Σ ⨂ B(X)measurable. Problem 2.22. Let E ⊆ ℝ be Lebesgue measurable with λ(E) > 0. Show that there exists a nonmeasurable subset of E. Problem 2.23. Let (X, Σ, μ) be a finite measure space and f nm : X → ℝ with n, m ∈ ℕ a family of Σ-measurable functions such that f nm (x) → f n (x)

μ-a.e. as m → ∞ and $f_n(x) \to f(x)$ μ-a.e. as n → ∞. Show that there exists an increasing sequence $m_n \in \mathbb{N}$ with n ≥ 1 such that $f_{n m_n}(x) \to f(x)$ μ-a.e. as n → ∞.

Problem 2.24. Let X be a compact metrizable space and Y be a separable metrizable space, and consider the function space C(X, Y) with the τ u -topology; see Remark 1.6.17. Let L = {e−1 x (C), C ⊆ Y is closed} ; see Definition 1.6.7. Show that B(C(X, Y)) = σ(L).

Problem 2.25. Let (X, Σ) be a measurable space, V a compact metrizable space, Y a separable metrizable space, and consider the function space C(V, Y) endowed with the $\tau_u$-topology; see Remark 1.6.17.
(a) Given a Carathéodory function f : X × V → Y, show that $\hat{f} : X \to C(V, Y)$ defined by $\hat{f}(x)(\cdot) = f(x, \cdot)$ is Σ-measurable.
(b) If h : X → C(V, Y) is Σ-measurable, show that $\tilde{h} : X \times V \to Y$ defined by $\tilde{h}(x, \cdot) = h(x)(\cdot)$ is a Carathéodory function.

Problem 2.26. Let (X, Σ, μ) be a measure space and f : X → ℝ a μ-integrable function. Show that the set C = {x ∈ X : f(x) ≠ 0} has σ-finite μ-measure.

Problem 2.27. Suppose that X and Y are Hausdorff topological spaces such that D(Y) = {(y, v) ∈ Y × Y : y = v} ∈ B(Y) ⨂ B(Y). Show that the graph of any Borel function f : X → Y belongs to B(X) ⨂ B(Y).

Problem 2.28. Let (X, Σ, μ) be a finite measure space. Show that there exists an at most countable family $\{A_n\}_{n \ge 1} \subseteq \Sigma$ of atoms such that $X \setminus \bigcup_{n \ge 1} A_n$ is nonatomic.

Problem 2.29. Let (X, Σ, μ) be a measure space with μ being semifinite (see Definition 2.1.30(a)), and let f, g : X → [0, +∞] be two Σ-measurable functions such that

\[ \int_A f \, d\mu \le \int_A g \, d\mu \quad \text{for all } A \in \Sigma \text{ with } \mu(A) < \infty . \]

Show that f(x) ≤ g(x) for μ-a.a. x ∈ X.

Problem 2.30. Let A ⊆ ℝ be a set of finite Lebesgue measure and let f : ℝ → ℝ be defined by f(x) = λ(A ∩ (−∞, x]) for all x ∈ ℝ. Here λ denotes the Lebesgue measure on ℝ. Show that f is continuous.

Problem 2.31. Let A ⊆ ℝ be a Lebesgue measurable set with λ(A) > 0, where λ is the Lebesgue measure on ℝ. Show that A − A contains an open set.

Problem 2.32. Let (X, Σ, μ) be a measure space and f : X → [0, ∞] a Σ-measurable function. Show that $\int_X f \, d\mu = \int_0^\infty \mu(\{x \in X : f(x) > s\}) \, ds$.

Problem 2.33. Let (X, Σ, μ), (Y, L, ν) be two σ-finite measure spaces. Show that (X × Y, Σ ⨂ L, μ × ν) is σ-finite as well.

Problem 2.34. Let (X, Σ, μ) be a measure space, let $f_n, f : X \to [0, +\infty)$ with n ≥ 1 be Σ-measurable functions, and suppose that $f_n \xrightarrow{\mu} f$. Show that for every ϑ > 0, $f_n^\vartheta \xrightarrow{\mu} f^\vartheta$.

Problem 2.35. Let (X, Σ, μ) be a nonatomic measure space and f : X → [0, ∞] a Σ-measurable function. Show that the measure $\Sigma \ni A \to \xi(A) = \int_A f \, d\mu$ is nonatomic if and only if μ({x ∈ X : f(x) = +∞}) = 0.


Problem 2.36. Let X be a Hausdorff topological space, μ : B(X) → [0, +∞) be a finite Borel measure, and f : X → ℝ be a continuous function. Show that there exists an at most countable set D ⊆ ℝ such that μ({x ∈ X : f(x) = η}) > 0 for all η ∈ D. Problem 2.37. Let X, Y be two metric spaces and f : X → Y. Let C f = {x ∈ X : f is continuous}. Show that C f ∈ B(X). Problem 2.38. Does the Lebesgue Dominated Convergence Theorem (see Theorem 2.3.8) hold for nets? Justify your answer. Problem 2.39. Let X be a Polish space and A ⊆ X. Show that A is analytic if and only if A = projX B with B ∈ B(X × X) = B(X) ⨂ B(X). Problem 2.40. Let (X, Σ) be a measurable space and Y a metric space. Show that f : X → Y is Σ-measurable if and only if for all continuous φ : Y → ℝ we have that φ ∘ f is Σ-measurable. Problem 2.41. Let (Ω, Σ) be a measurable space, X a separable metrizable space, Y a Hausdorff topological space, f : Ω × X → Y a Carathéodory map, and U ⊆ Y be open. Show that the multifunction w → G(w) = {x ∈ X : f(w, x) ∈ U} is measurable. Problem 2.42. Let (Ω, Σ) be a measurable space, X is a Polish space and F n : Ω → P f (X) with n ∈ ℕ are measurable multifunctions such that for every w ∈ Ω, there exists n ∈ ℕ such that F n (w) ∈ P k (X). Show that w → ⋂n≥1 F n (w) is measurable. Problem 2.43. Let {X n }n≥1 be a sequence of Polish spaces and for each n ∈ ℕ, A n ⊆ X n is analytic. Show that ∏n≥1 A n is an analytic subset of ∏n≥1 X n . Problem 2.44. Let X, Y be a Polish spaces, A ∈ B(X), f : A → Y is a Borel measurable map, and E = f(A). Assume that f is injective and B ∈ B(Y). Show that f −1 is Borel measurable. Problem 2.45. Let X, Y be Polish spaces and f : X → Y be Borel measurable. (a) Show that if A ⊆ X is analytic, then f(A) ⊆ Y is analytic. (b) Show that if B ⊆ Y is analytic, then f −1 (B) ⊆ X is analytic. Problem 2.46. Let X, Y be Hausdorff topological spaces and f : X → Y be a map that has a graph that is a Souslin subset of X × Y. Show that f is Borel measurable. 
Problem 2.47. Let (X, Σ, μ) be a finite measure space, K ⊆ L1 (X) be uniformly integrable, and K ∗ be the sequential closure for the μ-almost everywhere convergence in K. Show that K ∗ is uniformly integrable as well. Problem 2.48. Let (X, Σ, μ) be a measure space and C ⊆ L1 (X) a uniformly integrable set. Show that for given ε > 0 there exist ξ ε ∈ L1 (X)+ and δ > 0 such that A ∈ Σ, ∫A ξ ε dμ ≤ δ implies supf ∈C ∫A |f|dμ ≤ ε.

Problem 2.49. Let (X, Σ, μ) be a measure space and $C \subseteq L^1(X)$ a uniformly integrable set. Show that for given ε > 0 there is $\xi_\varepsilon \in L^1(X)_+$ such that $\sup_{f \in C} \int_{\{|f| \ge \xi_\varepsilon\}} |f| \, d\mu \le \varepsilon$.

Problem 2.50. Let (X, Σ, μ) be a measure space and $C \subseteq L^1(X)$. Assume that for every ε > 0 we can find $\xi_\varepsilon \in L^1(X)_+$ such that

\[ \sup_{f \in C} \int_{\{|f| \ge \xi_\varepsilon\}} |f| \, d\mu \le \varepsilon . \]

Show that C is uniformly integrable.

Problem 2.51. Let (X, Σ, μ) be a measure space and $C \subseteq L^1(X)$ be a bounded set, and suppose that for every ε > 0 we can find $\xi_\varepsilon \in L^1(X)_+$ and δ > 0 such that A ∈ Σ, $\int_A \xi_\varepsilon \, d\mu \le \delta$ implies that $\sup_{f \in C} \int_A |f| \, d\mu \le \varepsilon$. Show that C is uniformly integrable.

Problem 2.52. Let (Ω, Σ) be a measurable space, X a separable metrizable space, f : Ω × X → ℝ a Carathéodory function, and $F : \Omega \to P_k(X)$ a measurable multifunction. Let m(w) = min[f(w, x) : x ∈ F(w)] and M(w) = {x ∈ F(w) : m(w) = f(w, x)}. Show that m and M are both measurable.

Problem 2.53. Let (X, Σ) be a measurable space and μ, ν be finite measures on (X, Σ). Show that either μ⊥ν or that there exist ε > 0 and B ∈ Σ with μ(B) > 0 and ν ≥ εμ on B, that is, B is a positive set for ν − εμ.

3 Basic Functional Analysis Functional Analysis emerged as a coherent field of mathematics in the first four decades of the 20th century. It provided a unified framework to treat different objects using abstraction and axiomatization. The main idea is to view functions as points, respectively elements, of an abstract space endowed with certain structures that are axiomatically defined. This way mathematicians were able to “escape” from the usual finite dimensional Euclidean spaces and consider infinite dimensional function spaces. The starting point was the thesis of Fréchet in 1906 who introduced the abstract notion of “metric space” – a concept that was influential in the development of both functional analysis and point set topology. The work of Fréchet was the culmination of the efforts and contributions of many prominent mathematicians from France, Germany, and Italy. Combined with the revolution of measure theory this provided a fertile ground for the development of functional analysis. The prominent figure in the story is that of the Polish mathematician Stefan Banach (1892–1945). In this chapter, we review the basic notions and results of “Linear Functional Analysis.” Moreover, we touch on “Operator Theory” and in particular, we discuss the spectral properties of compact self-adjoint operators on a Hilbert space.

3.1 Topological Vector Spaces, Hahn–Banach Theorem

We start with the basic notion of a topological vector space. Recall that a vector space or linear space is a set X equipped with two operations: the vector addition + : X × X → X defined by (x, u) → x + u, and the scalar multiplication ⋅ : 𝕂 × X → X defined by (λ, x) → λ ⋅ x, where 𝕂 = ℝ or 𝕂 = ℂ.

Definition 3.1.1. A topological vector space is a vector space endowed with a Hausdorff topology τ, which makes the two vector space operations above continuous. Then we say that τ is a vector topology on X.

Remark 3.1.2. Continuity of vector addition means that if x, u ∈ X and V ∈ τ is a neighborhood of x + u, that is, V ∈ N(x + u), then there exist $U_x \in N(x)$ and $U_u \in N(u)$ such that $U_x + U_u \subseteq V$. Similarly, the continuity of the scalar multiplication implies that if (λ, x) ∈ 𝕂 × X and V ∈ N(λx), then there exist ε > 0 and $U_x \in N(x)$ such that $\mu U_x \subseteq V$ for all μ ∈ 𝕂 with |μ − λ| < ε. Moreover, for a given x ∈ X and a given λ ∈ 𝕂 \ {0} we introduce

\[ \hat{T}_x(u) = x + u \quad \text{for all } u \in X \quad \text{(the translation operator)} , \]
\[ \hat{M}_\lambda(u) = \lambda u \quad \text{for all } u \in X \quad \text{(the scalar multiplication operator)} . \]

Clearly, these operators are homeomorphisms of X onto X. It follows that the vector topology τ is translation invariant, that is, U ∈ τ if and only if x + U ∈ τ for all x ∈ X.

https://doi.org/10.1515/9783110532982-003

Hence, τ is completely determined by any local basis, in particular by the local basis at the origin. If the vector topology is metrizable, then it is induced by an invariant metric d, that is, d(x + v, u + v) = d(x, u) for all x, u, v ∈ X. An immediate consequence of these observations is the following simple lemma.

Lemma 3.1.3. Let (X, τ) be a topological vector space.
(a) For all U, V ∈ τ and for all λ ∈ 𝕂 \ {0} it follows that U + V ∈ τ and λU ∈ τ.
(b) If A ⊆ X and U ∈ τ, then $\overline{A} + U = A + U$ and it is open.
(c) If K ⊆ X is compact and C ⊆ X is closed, then K + C ⊆ X is closed.
(d) If $K_1, K_2 \subseteq X$ are compact sets, then $K_1 + K_2 \subseteq X$ is compact.
(e) If φ : X → ℝ is linear, then φ is continuous if and only if φ is continuous at x = 0.

Proof. (a), (b), and (e) are clear.
(c) Let $\{v_\alpha\}_{\alpha \in I} \subseteq K + C$ be a net such that $v_\alpha \to v$. We have $v_\alpha = x_\alpha + u_\alpha$ with $x_\alpha \in K$ and $u_\alpha \in C$ for all α ∈ I. The compactness of K implies that there exists a subnet $\{x_\beta\}_{\beta \in J}$ of $\{x_\alpha\}_{\alpha \in I}$ such that $x_\beta \to x \in K$; see Proposition 1.4.45(c). Then $u_\beta = v_\beta - x_\beta \to v - x = u \in C$ since C is closed; see Proposition 1.2.36. Therefore v = x + u with x ∈ K and u ∈ C. Hence, we conclude that K + C is closed.
(d) Since "+" is continuous on X × X and $K_1 \times K_2 \subseteq X \times X$ is compact (see Theorem 1.4.56), we conclude that $+(K_1 \times K_2) = K_1 + K_2 \subseteq X$ is compact; see Theorem 1.4.51.

Remark 3.1.4. The algebraic sum of two closed sets need not be closed. In $\mathbb{R}^2$ equipped with the usual Euclidean metric, we consider the sets

\[ C_1 = \{(x, 1/x) : x \in \mathbb{R} \setminus \{0\}\} \quad \text{and} \quad C_2 = \{(u, 0) : u \in \mathbb{R}\} . \]

Then both are closed in $\mathbb{R}^2$ but $C_1 + C_2 = \{(x + u, 1/x) : x \in \mathbb{R} \setminus \{0\},\ u \in \mathbb{R}\}$ is not closed in ℝ × ℝ. Indeed, $C_1 + C_2 = \mathbb{R} \times (\mathbb{R} \setminus \{0\})$, and the points (0, 1/n) of this set converge to (0, 0), which does not belong to it.

Remark 3.1.5. Let X, Y be two vector spaces. Recall that a map A : X → Y is called a linear function if it is additive and homogeneous, that is,

\[ A(x + y) = A(x) + A(y) \quad \text{for all } x, y \in X , \]
\[ A(\lambda x) = \lambda A(x) \quad \text{for all } \lambda \in \mathbb{K} \text{ and for all } x \in X . \]

By N(A) we denote the kernel of A, that is, N(A) = {x ∈ X : A(x) = 0} and by R(A) the range of A, that is, R(A) = {A(x) : x ∈ X}.

Now we introduce certain classes of sets that are important in the study of topological vector spaces.

Definition 3.1.6. Let X be a vector space and A ⊆ X.
(a) We say that A is convex if for all x, u ∈ A and λ ∈ [0, 1], it holds that (1 − λ)x + λu ∈ A.
(b) We say that A is absorbing if for any x ∈ X there is t = t(x) > 0 such that x ∈ tA. So every absorbing set contains the origin.


(c) We say that A is balanced if λA ⊆ A for all λ ∈ 𝕂 with |λ| ≤ 1.
(d) We say that A is symmetric if A = −A.

Lemma 3.1.7. If (X, τ) is a topological vector space and V ∈ N(0), then there exists a symmetric set U ∈ N(0) such that U + U ⊆ V.

Proof. The continuity of the vector addition operation implies that there exist $U_1, U_2 \in N(0)$ such that $U_1 + U_2 \subseteq V$. Let $U = U_1 \cap (-U_1) \cap U_2 \cap (-U_2)$. Then U ∈ N(0) is symmetric and U + U ⊆ V.

Proposition 3.1.8. If (X, τ) is a topological vector space, K ⊆ X is compact, C ⊆ X is closed, and K ∩ C = ∅, then there exists U ∈ N(0) such that (K + U) ∩ (C + U) = ∅.

Proof. We assume that K ≠ ∅ since otherwise the result is obvious. Let x ∈ K. Applying Lemma 3.1.7 there is a symmetric $U_x \in N(0)$ such that $(x + U_x + U_x + U_x) \cap C = \emptyset$. Exploiting the symmetry of $U_x$ it follows that $(x + U_x + U_x) \cap (C + U_x) = \emptyset$. The compactness of K implies that there exist $\{x_n\}_{n=1}^m \subseteq K$ such that $K \subseteq \bigcup_{n=1}^m (x_n + U_{x_n})$. Let $U = \bigcap_{n=1}^m U_{x_n} \in N(0)$. Then

\[ K + U \subseteq \bigcup_{n=1}^m (x_n + U_{x_n} + U) \subseteq \bigcup_{n=1}^m (x_n + U_{x_n} + U_{x_n}) . \]

We conclude that (K + U) ∩ (C + U) = ∅. Note that K + U is an open set containing K and C + U is an open set containing C; see Lemma 3.1.3(b).

Taking K to be a singleton we obtain the following result.

Corollary 3.1.9. Every topological vector space is regular; see Definition 1.2.7.

Proposition 3.1.10. Let (X, τ) be a topological vector space.
(a) If A ⊆ X, then $\overline{A} = \bigcap_{U \in N(0)} (A + U)$.
(b) If A, C ⊆ X, then $\overline{A} + \overline{C} \subseteq \overline{A + C}$.
(c) If A ⊆ X is convex, then int A and $\overline{A}$ are convex.
(d) If A ⊆ X is balanced, then $\overline{A}$ is balanced and when 0 ∈ int A, then int A is balanced.

Proof. (a) We know that $x \in \overline{A}$ if and only if (x + U) ∩ A ≠ ∅ for all U ∈ N(0). Hence, $x \in \overline{A}$ if and only if x ∈ A − U for every U ∈ N(0). But U ∈ N(0) if and only if −U ∈ N(0).
(b) Let $x \in \overline{A}$, $u \in \overline{C}$ and let V ∈ N(x + u). Then there exist $V_x \in N(x)$, $V_u \in N(u)$ such that $V_x + V_u \subseteq V$. Then choose $x' \in A \cap V_x$ and $u' \in C \cap V_u$. The existence follows since $x \in \overline{A}$ and $u \in \overline{C}$. Then $x' + u' \in (A + C) \cap V$. Since V ∈ N(x + u) was arbitrary, we conclude that $x + u \in \overline{A + C}$, thus $\overline{A} + \overline{C} \subseteq \overline{A + C}$.
(c) Since int A ⊆ A and A is convex, it follows that

\[ (1 - \lambda)\,\mathrm{int}\,A + \lambda\,\mathrm{int}\,A \subseteq A \quad \text{for all } \lambda \in (0, 1) . \tag{3.1.1} \]

Note that the left-hand side in (3.1.1) is an open set and so

\[ (1 - \lambda)\,\mathrm{int}\,A + \lambda\,\mathrm{int}\,A \subseteq \mathrm{int}\,A \quad \text{for all } \lambda \in (0, 1) . \]

Hence, int A is convex. For λ ∈ (0, 1), due to part (b) and since A is convex, one gets

\[ (1 - \lambda)\overline{A} + \lambda\overline{A} = \overline{(1 - \lambda)A} + \overline{\lambda A} \subseteq \overline{(1 - \lambda)A + \lambda A} \subseteq \overline{A} . \]

Therefore $\overline{A}$ is convex.
(d) The proof that $\overline{A}$ is balanced is similar to the proof of part (c). Let λ ∈ 𝕂 be such that 0 < |λ| ≤ 1. Since A is balanced, we derive λ int A = int λA ⊆ λA ⊆ A, which shows that λ int A ⊆ int A because λ int A is open. Moreover, since 0 ∈ int A, for λ = 0, it follows that λ int A ⊆ int A and so int A is balanced.

This leads to the following structural result for the topology of X.

Proposition 3.1.11. Let (X, τ) be a topological vector space.
(a) Every V ∈ N(0) contains a balanced U ∈ N(0).
(b) Every convex V ∈ N(0) contains a balanced convex U ∈ N(0).

Proof. (a) Let V ∈ N(0). Exploiting the continuity of the scalar multiplication operation, there exist δ > 0 and $\tilde{U} \in N(0)$ such that $\lambda\tilde{U} \subseteq V$ for all λ ∈ 𝕂 with |λ| < δ. Let U be the union of all these sets $\lambda\tilde{U}$. Evidently, U ∈ N(0), U is balanced and U ⊆ V.
(b) Let V ∈ N(0) be convex. Let $A = \bigcap_{|\lambda| = 1} \lambda V$. Applying part (a), let $\hat{U} \in N(0)$ be balanced such that $\hat{U} \subseteq V$. We have $\lambda^{-1}\hat{U} = \hat{U}$ for all λ ∈ 𝕂 with |λ| = 1. Hence $\hat{U} \subseteq \lambda V$ and thus $\hat{U} \subseteq A$. This means that $\hat{U} \subseteq \mathrm{int}\,A \in N(0)$. Moreover, int A ⊆ V. The set A is convex, being the intersection of convex sets. Hence, int A is convex; see Proposition 3.1.10(c). We claim that int A is balanced. According to Proposition 3.1.10(d) it suffices to show that A is balanced. To this end, let t ∈ [0, 1] and μ ∈ 𝕂 with |μ| = 1. Then, since λV ∈ N(0) is convex,

\[ t\mu A = \bigcap_{|\lambda| = 1} t\mu\lambda V = \bigcap_{|\lambda| = 1} t\lambda V \subseteq \bigcap_{|\lambda| = 1} \lambda V . \]

Therefore, tμA ⊆ A and so A is balanced. We conclude that U = int A ∈ N(0) is the desired balanced and convex neighborhood of the origin.

Corollary 3.1.12. Every topological vector space has a local basis consisting of balanced sets.

We introduce some particular types of topological vector spaces depending on the structure of the local basis.

Definition 3.1.13. Let (X, τ) be a topological vector space.
(a) A set A ⊆ X is said to be bounded if for every U ∈ N(0) there is a $t_U > 0$ such that A ⊆ tU for all $t > t_U$.
(b) We say that X is locally convex if it has a local basis B consisting of convex sets.
(c) We say that X is locally bounded if it has a bounded set in N(0).
(d) We say that X is Fréchet if it is locally convex and the topology τ is induced by a complete translation invariant metric d.
(e) A norm on X is a real function ‖ ⋅ ‖ such that


(e)1 ‖x‖ ≥ 0 for all x ∈ X and ‖x‖ = 0 if and only if x = 0;
(e)2 ‖λx‖ = |λ|‖x‖ for all (λ, x) ∈ 𝕂 × X;
(e)3 ‖x + u‖ ≤ ‖x‖ + ‖u‖ for all x, u ∈ X, which is called the triangle inequality.
X equipped with a norm is called a normed space. The norm defines a translation invariant metric d(x, u) = ‖x − u‖. If (X, d) is complete, then X is a Banach space.
(f) We say that X is normable if τ is generated by the metric induced by a norm.

Remark 3.1.14. If X is locally bounded, then it is first countable. Indeed, if U ∈ N(0) is bounded and $r_n \to 0^+$, then $\{r_n U\}_{n \in \mathbb{N}}$ is a local basis for the origin.

Finite dimensional vector spaces exhibit some distinguishing properties. If X is finite dimensional with dim X = n, the Euclidean norm on X is defined by

\[ \|x\|_2 = \Big( \sum_{k=1}^n |x_k|^2 \Big)^{1/2} \quad \text{for all } x = (x_k)_{k=1}^n \in X . \]

The topology on X induced by $\|\cdot\|_2$ is known as the Euclidean topology. It turns out that the Euclidean space is the prototype of an n-dimensional vector space.

Definition 3.1.15. Let X be a vector space and let ‖ ⋅ ‖, | ⋅ | be two norms on X. We say that these norms are equivalent if there exist constants η > m > 0 such that

\[ m\|x\| \le |x| \le \eta\|x\| \quad \text{for all } x \in X . \]
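To make Definition 3.1.15 concrete, the following sketch tests the inequality numerically for the Euclidean norm $\|\cdot\|_2$ and the sum norm $\|x\|_1 = \sum_k |x_k|$ on $\mathbb{R}^n$, for which the constants m = 1 and η = √n are known to work. The function names and the random sampling scheme are assumptions made for illustration; a finite sample can refute a pair of constants but never proves equivalence.

```python
import math
import random

def norm1(x):
    """Sum norm ||x||_1."""
    return sum(abs(c) for c in x)

def norm2(x):
    """Euclidean norm ||x||_2."""
    return math.sqrt(sum(c * c for c in x))

def equivalence_holds(n, m, eta, trials=1000, seed=0):
    """Check m*||x||_2 <= ||x||_1 <= eta*||x||_2 on random samples in R^n."""
    rng = random.Random(seed)
    slack = 1e-12  # absorbs floating-point rounding
    for _ in range(trials):
        x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        a, b = norm2(x), norm1(x)
        if not (m * a <= b + slack and b <= eta * a + slack):
            return False
    return True

n = 5
ok_sharp = equivalence_holds(n, m=1.0, eta=math.sqrt(n))
ok_too_small = equivalence_holds(n, m=1.0, eta=1.0)
print(ok_sharp, ok_too_small)
```

The first check uses admissible constants, while the second uses an upper constant that is too small for generic vectors with several nonzero coordinates, so the sample detects a violation.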

Remark 3.1.16. Equivalence of norms is an equivalence relation and equivalent norms generate the same topology on X.

Proposition 3.1.17. In a finite dimensional vector space any two norms are equivalent.

Proof. Let X be an n-dimensional vector space with norm ‖ ⋅ ‖ and consider $\mathbb{R}^n$ equipped with the norm $\|\cdot\|_2$. Let $\{e_k\}_{k=1}^n \subseteq X$ be a basis for X and consider the linear map $A : \mathbb{R}^n \to X$ defined by

\[ A(\lambda) = \sum_{k=1}^n \lambda_k e_k \quad \text{for all } \lambda = (\lambda_k)_{k=1}^n \in \mathbb{R}^n . \]

It is easy to see that A is an isomorphism. Moreover, we obtain the estimate

\[ \|A(\lambda)\| \le \sum_{k=1}^n |\lambda_k| \|e_k\| \le \Big( \sum_{k=1}^n |\lambda_k|^2 \Big)^{1/2} \Big( \sum_{k=1}^n \|e_k\|^2 \Big)^{1/2} \le \eta \|\lambda\|_2 \tag{3.1.2} \]

with $\eta = \big( \sum_{k=1}^n \|e_k\|^2 \big)^{1/2}$. Therefore, A is continuous. In addition, let $\xi = \|\cdot\| \circ A : \mathbb{R}^n \to \mathbb{R}$, that is,

\[ \xi(\lambda) = \Big\| \sum_{k=1}^n \lambda_k e_k \Big\| \quad \text{for all } \lambda = (\lambda_k)_{k=1}^n \in \mathbb{R}^n . \tag{3.1.3} \]

Of course, ξ is continuous. Moreover, $\partial B_1 = \{\lambda \in \mathbb{R}^n : \|\lambda\|_2 = 1\}$ is closed and bounded, and thus compact; see Theorem 1.5.38. Hence, there exists $\lambda^* \in \partial B_1$ such that

\[ \xi(\lambda^*) = \inf_{\lambda \in \partial B_1} \xi(\lambda) = m \ge 0 ; \]

see Theorem 1.4.52. If m = 0, then $\sum_{k=1}^n \lambda_k^* e_k = 0$ (see (3.1.3)), a contradiction since $\lambda^* \in \partial B_1$ and the vectors $e_1, \ldots, e_n$ are linearly independent. Hence, m > 0 and we get

\[ m\|\lambda\|_2 \le \|A(\lambda)\| \quad \text{for all } \lambda \in \mathbb{R}^n . \tag{3.1.4} \]

From (3.1.2) and (3.1.4) we infer that X and $\mathbb{R}^n$ are linearly homeomorphic and so we conclude that any two norms on X are equivalent.

Corollary 3.1.18. Every finite dimensional normed space is complete, thus a Banach space.

Corollary 3.1.19. Every finite dimensional subspace of a normed space is closed.

Next we will give a characterization of finite dimensional normed spaces in terms of the topological properties of the closed unit ball $B_1 = \{x \in X : \|x\| \le 1\}$. First we need an auxiliary result known as the "Riesz Lemma."

Lemma 3.1.20 (Riesz Lemma). If X is a normed space, Y ⊆ X is a proper, closed vector subspace, and 0 < ϑ < 1, then there exists $x_\vartheta \in (X \setminus Y) \cap \partial B_1$ such that $d(x_\vartheta, Y) \ge \vartheta$.

Proof. Let u ∈ X \ Y. Since Y is closed it holds that d(u, Y) = m > 0. We choose y ∈ Y such that ‖u − y‖ ≤ m/ϑ and set $x_\vartheta = (u - y)/\|u - y\| \in \partial B_1$. Then for every v ∈ Y it follows that

\[ \|x_\vartheta - v\| = \frac{1}{\|u - y\|} \big\| u - (y + v\|u - y\|) \big\| . \tag{3.1.5} \]

Note that y + v‖u − y‖ ∈ Y. Therefore, from (3.1.5) and the choice of y ∈ Y, we obtain $\|x_\vartheta - v\| \ge m/(m/\vartheta) = \vartheta$.

Applying this lemma, we have the following characterization of finite dimensional normed spaces.

Theorem 3.1.21. A normed space X is finite dimensional if and only if $B_1$ is compact.

Proof. ⇒: This direction follows from Theorem 1.5.38.
⇐: The set $B_1$ is totally bounded; see Remark 1.5.32. Hence, there is $\{x_k\}_{k=1}^n \subseteq B_1$ such that

\[ B_1 \subseteq \bigcup_{k=1}^n (x_k + B_{1/2}) \tag{3.1.6} \]

with $B_{1/2} = \{x \in X : \|x\| < 1/2\}$. Let $Y = \mathrm{span}\{x_k\}_{k=1}^n$. Corollary 3.1.19 implies that Y ⊆ X is closed. Suppose that Y ≠ X. Then by Lemma 3.1.20 we find $\hat{x} \in (X \setminus Y) \cap \partial B_1$ such that

\[ d(\hat{x}, Y) \ge \vartheta > \frac{1}{2} . \tag{3.1.7} \]

Comparing (3.1.6) and (3.1.7) we reach a contradiction. Hence X = Y and so X is finite dimensional.

Proposition 3.1.22. If X is a finite dimensional normed space, Y is a normed space and L : X → Y is a linear map, then L is continuous.

Proof. Suppose dim X = n and let $\{e_k\}_{k=1}^n$ be a basis for X. Since L is linear, we derive for $x = \sum_{k=1}^n \lambda_k e_k \in X$ with $\lambda_k \in \mathbb{K}$ that $L(x) = \sum_{k=1}^n \lambda_k L(e_k)$. Hence

\[ \|L(x)\|_Y \le \sum_{k=1}^n |\lambda_k| \|L(e_k)\|_Y \le M \Big( \sum_{k=1}^n |\lambda_k|^2 \Big)^{1/2} \]

by the Bunyakowsky–Cauchy–Schwarz inequality for finite sums with $M = \big( \sum_{k=1}^n \|L(e_k)\|_Y^2 \big)^{1/2}$. On the other hand we know from Proposition 3.1.17 the existence of m > 0 such that $\|\lambda\|_2 \le (1/m)\|x\|_X$ where $\lambda = (\lambda_k)_{k=1}^n \in \mathbb{R}^n$. Therefore, it follows that $\|L(x)\|_Y \le (M/m)\|x\|_X$. Hence L is continuous.

Remark 3.1.23. In particular, if X is a finite dimensional normed space, then every linear functional f : X → ℝ is continuous. In fact the converse is true as well.

We conclude our discussion of finite dimensional topological vector spaces with a result closely related to Theorem 3.1.21. It says that there are no infinite dimensional locally compact topological vector spaces.

Proposition 3.1.24. A topological vector space (X, τ) is locally compact if and only if X is finite dimensional.

Proof. ⇒: Let U ∈ N(0) be relatively compact. So there is $\{x_k\}_{k=1}^n \subseteq U$ such that

\[ U \subseteq \bigcup_{k=1}^n \Big( x_k + \frac{1}{2} U \Big) = \{x_1, \ldots, x_n\} + \frac{1}{2} U . \tag{3.1.8} \]

Let $Y = \mathrm{span}\{x_k\}_{k=1}^n$. Then from (3.1.8) it follows that

\[ \frac{1}{2} U \subseteq \frac{1}{2} \Big[ Y + \frac{1}{2} U \Big] = Y + \frac{1}{2^2} U . \]

By induction we have

\[ U \subseteq Y + \frac{1}{2^n} U \quad \text{for all } n \in \mathbb{N} . \tag{3.1.9} \]

We fix x ∈ U. Then from (3.1.9) we see that $x = y_n + (1/2^n) u_n$ with $y_n \in Y$, $u_n \in U$ and n ∈ ℕ. Since U is relatively compact we find a subnet $\{u_\beta\}_{\beta \in J}$ of $\{u_n\}_{n \in \mathbb{N}}$ such that $u_\beta \to u$. Moreover, $1/2^\beta \to 0$. Hence, $y_\beta = x - (1/2^\beta) u_\beta \to x \in Y$. Therefore U ⊆ Y and since U is absorbing, we conclude that X = Y. Hence, X is finite dimensional.
⇐: Since X is finite dimensional, we see that X is linearly homeomorphic to $(\mathbb{R}^n, \|\cdot\|_2)$. As X is a normed space, invoking Theorem 3.1.21, we get that $B_1$ is compact. Thus, X is locally compact.
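The compactness criterion of Theorem 3.1.21 can be probed numerically. The following sketch, an illustration and not part of the argument, takes the standard unit vectors $e_1, \ldots, e_n$ of $(\mathbb{R}^n, \|\cdot\|_2)$ and checks that they are pairwise at distance √2. A ball of radius 1/2 has diameter at most 1, so it contains at most one of them, and any cover of the unit ball by such balls needs at least n members. This count grows without bound as n increases, which is the obstruction to total boundedness in infinite dimensions. The function names are hypothetical.

```python
import math
from itertools import combinations

def unit_vector(k, n):
    """k-th standard unit vector of R^n (0-based index)."""
    return [1.0 if i == k else 0.0 for i in range(n)]

def dist2(x, y):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def min_pairwise_distance(n):
    """Smallest distance between distinct unit vectors e_j, e_k in R^n."""
    vectors = [unit_vector(k, n) for k in range(n)]
    return min(dist2(x, y) for x, y in combinations(vectors, 2))

# For every n >= 2 the unit vectors are sqrt(2) apart, so no ball of
# radius 1/2 (diameter at most 1) can contain two of them.
separations = {n: min_pairwise_distance(n) for n in (2, 5, 20)}
print(separations)
```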

Proposition 3.1.25. If (X, τ) is a topological vector space and A ⊆ X, then the following statements are equivalent:
(a) A is bounded; see Definition 3.1.13(a).
(b) If $\{x_n\}_{n \ge 1} \subseteq A$ and $\{\lambda_n\}_{n \ge 1} \subseteq \mathbb{K}$ with $\lambda_n \to 0$, then $\lambda_n x_n \to 0$ in X.

Proof. (a) ⇒ (b): Let U ∈ N(0) be balanced; see Corollary 3.1.12. Then A ⊆ tU for some t > 0. Suppose $\{x_n\}_{n \ge 1} \subseteq A$ and $\{\lambda_n\}_{n \ge 1} \subseteq \mathbb{K}$ such that $\lambda_n \to 0$. Then there exists $n_0 \in \mathbb{N}$ such that $|\lambda_n| t < 1$ for all $n > n_0$. Since U is balanced, it follows that $\lambda_n x_n = (\lambda_n t)\big(\tfrac{1}{t} x_n\big) \in U$ for all $n > n_0$. We conclude that $\lambda_n x_n \to 0$ in X as n → ∞.
(b) ⇒ (a): Arguing by contradiction suppose that A is not bounded. Then there exist $t_n \to +\infty$ and U ∈ N(0) such that $(X \setminus t_n U) \cap A \ne \emptyset$ for all n ∈ ℕ. Let $x_n \in A$ with $x_n \notin t_n U$ for all n ∈ ℕ. We have $(1/t_n) x_n \notin U$ for all n ∈ ℕ. Hence $(1/t_n) x_n$ does not converge to 0, a contradiction to our hypothesis.

Next we take a closer look at convex sets. In Proposition 3.1.10(c) we saw that the interior and the closure of a convex set remain convex. In fact we can say more.

Proposition 3.1.26. If X is a topological vector space, C ⊆ X is a convex set and 0 ≤ t < 1, then $(1 - t)\,\mathrm{int}\,C + t\overline{C} \subseteq \mathrm{int}\,C$.

Proof. For t = 0, the result is trivially true. So, suppose that 0 < t < 1 and let x ∈ int C and $u \in \overline{C}$. Then there exists U ∈ N(0) such that x + U ⊆ C. Note that $u - \frac{1 - t}{t} U \in N(u)$ and so there exists $y \in C \cap \big(u - \frac{1 - t}{t} U\big)$. Therefore t(u − y) ∈ (1 − t)U. Let V = (1 − t)(x + U) + ty = (1 − t)x + (1 − t)U + ty. This is a nonempty open set and V ⊆ C due to the convexity of C. One gets

\[ (1 - t)x + tu = (1 - t)x + t(u - y) + ty \in (1 - t)x + (1 - t)U + ty = V \subseteq C , \]

which gives (1 − t)x + tu ∈ int C.

Proposition 3.1.27. If X is a topological vector space and C ⊆ X is convex with int C ≠ ∅, then $\overline{\mathrm{int}\,C} = \overline{C}$ and $\mathrm{int}\,C = \mathrm{int}\,\overline{C}$.

Proof. From Proposition 3.1.26 it follows that $(1 - t)\,\mathrm{int}\,C + t\overline{C} \subseteq \mathrm{int}\,C$ for all 0 ≤ t < 1. Letting $t \to 1^-$ gives $\overline{C} \subseteq \overline{\mathrm{int}\,C}$ and hence $\overline{C} = \overline{\mathrm{int}\,C}$. Let u ∈ int C and $x \in \mathrm{int}\,\overline{C}$. Then there exists U ∈ N(0) such that $x + U \subseteq \overline{C}$. Since U is absorbing there exists ϑ ∈ (0, 1) such that ϑ(x − u) ∈ U. Then $x + \vartheta(x - u) \in \overline{C}$. Applying Proposition 3.1.26 gives x − ϑ(x − u) = (1 − ϑ)x + ϑu ∈ int C. Applying Proposition 3.1.26 again yields

\[ x = \frac{1}{2} [x - \vartheta(x - u)] + \frac{1}{2} [x + \vartheta(x - u)] \in \mathrm{int}\,C . \]

This shows $\mathrm{int}\,\overline{C} \subseteq \mathrm{int}\,C \subseteq \mathrm{int}\,\overline{C}$ and so $\mathrm{int}\,C = \mathrm{int}\,\overline{C}$.

Remark 3.1.28. Usually, sets C satisfying $\overline{\mathrm{int}\,C} = \overline{C}$ and $\mathrm{int}\,C = \mathrm{int}\,\overline{C}$ are called regular.

Clearly the intersection of any family of convex sets is again convex. So we can state the following definition.


Definition 3.1.29. Let X be a vector space and A ⊆ X a nonempty set. The convex hull of A, denoted by conv A, is the intersection of all convex sets that contain A. Therefore, conv A is the smallest convex set containing A. An alternative description is given by

\[ \mathrm{conv}\,A = \Big\{ x \in X : \exists\, x_k \in A,\ t_k \ge 0,\ k = 1, \ldots, n \text{ with } \sum_{k=1}^n t_k = 1,\ x = \sum_{k=1}^n t_k x_k \Big\} . \]

That is, conv A is the set of all convex combinations of elements in A. If X is a topological vector space, the closed convex hull of A, denoted by $\overline{\mathrm{conv}}\,A$, is the set $\overline{\mathrm{conv}\,A}$.

For finite dimensional vector spaces, the convex hull of a set is described more precisely by the so-called "Carathéodory Convexity Theorem."

Theorem 3.1.30 (Carathéodory Convexity Theorem). If X is an m-dimensional vector space, A ⊆ X, and x ∈ conv A, then x is a convex combination of at most m + 1 elements of A.

Proof. From Definition 3.1.29 we know that $x = \sum_{k=1}^n t_k x_k$ with $t_k \ge 0$, $x_k \in A$, k = 1, . . . , n and $\sum_{k=1}^n t_k = 1$. Without any loss of generality we may assume that $t_k > 0$ for all k = 1, . . . , n. Suppose that n > m + 1. Then $\{x_k - x_1\}_{k=2}^n$ must be linearly dependent. Hence, there exist $\beta_2, \ldots, \beta_n \in \mathbb{R}$ not all of them equal to zero such that

\[ \sum_{k=2}^n \beta_k x_k - \Big( \sum_{k=2}^n \beta_k \Big) x_1 = 0 . \]

Thus, there are $\eta_1, \ldots, \eta_n \in \mathbb{R}$ not all of them equal to zero such that $\sum_{k=1}^n \eta_k x_k = 0$ and $\sum_{k=1}^n \eta_k = 0$. We set

\[ I_+ = \{k \in \{1, \ldots, n\} : \eta_k > 0\} , \quad I_- = \{k \in \{1, \ldots, n\} : \eta_k < 0\} , \]
\[ \mu = \min_{k \in I_+} \frac{t_k}{\eta_k} , \quad J = \{k \in I_+ : t_k - \mu\eta_k = 0\} . \]

The sets $I_+$, $I_-$ and J are nonempty and μ > 0. One obtains

\[ x = \sum_{k=1}^n t_k x_k = \sum_{k=1}^n (t_k - \mu\eta_k) x_k = \sum_{k \notin J} (t_k - \mu\eta_k) x_k . \tag{3.1.10} \]

If $k \in I_+$, then $t_k - \mu\eta_k \ge 0$. If $k \in I_-$, then $t_k - \mu\eta_k > 0$. If $k \in I_+ \setminus J$, then $t_k - \mu\eta_k > 0$. If $\eta_k = 0$, then $t_k - \mu\eta_k = t_k > 0$. Moreover, we get

\[ \sum_{k=1}^n (t_k - \mu\eta_k) = \sum_{k=1}^n t_k - \mu \sum_{k=1}^n \eta_k = 1 . \tag{3.1.11} \]

From (3.1.10) and (3.1.11) we see that x is written as a convex combination with positive weights of n′ elements with n′ < n. We repeat this process until n ≤ m + 1.
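The proof is constructive, and the reduction step can be sketched in code. The following is a minimal illustration in exact rational arithmetic, assuming X = ℝ^m with points given as coordinate tuples; the function names `null_vector` and `caratheodory` and the sample data are hypothetical. The vector `eta` plays the role of $(\eta_k)$, `mu` that of μ, and indices with vanishing new weight (the set J) are dropped.

```python
from fractions import Fraction

def null_vector(rows, ncols):
    """Return a nonzero exact solution v of rows * v = 0, or None."""
    a = [[Fraction(c) for c in row] for row in rows]
    pivots = []                      # (row index, column index) pairs
    r = 0
    for c in range(ncols):
        if r == len(a):
            break
        p = next((i for i in range(r, len(a)) if a[i][c] != 0), None)
        if p is None:
            continue                 # column c stays free
        a[r], a[p] = a[p], a[r]
        piv = a[r][c]
        a[r] = [entry / piv for entry in a[r]]
        for i in range(len(a)):
            if i != r and a[i][c] != 0:
                factor = a[i][c]
                a[i] = [vi - factor * vr for vi, vr in zip(a[i], a[r])]
        pivots.append((r, c))
        r += 1
    pivot_cols = {c for _, c in pivots}
    free = next((c for c in range(ncols) if c not in pivot_cols), None)
    if free is None:
        return None
    v = [Fraction(0)] * ncols
    v[free] = Fraction(1)
    for row, c in pivots:
        v[c] = -a[row][free]
    return v

def caratheodory(points, weights):
    """Reduce a convex combination in R^m to at most m + 1 points."""
    pairs = [(tuple(Fraction(c) for c in p), Fraction(w))
             for p, w in zip(points, weights) if w != 0]
    pts = [p for p, _ in pairs]
    t = [w for _, w in pairs]
    m = len(pts[0])
    while len(pts) > m + 1:
        n = len(pts)
        # eta solves sum eta_k x_k = 0 and sum eta_k = 0, as in the proof
        rows = [[p[i] for p in pts] for i in range(m)] + [[1] * n]
        eta = null_vector(rows, n)
        mu = min(t[k] / eta[k] for k in range(n) if eta[k] > 0)
        t = [t[k] - mu * eta[k] for k in range(n)]
        keep = [k for k in range(n) if t[k] > 0]   # drop the set J
        pts = [pts[k] for k in keep]
        t = [t[k] for k in keep]
    return pts, t

# Hypothetical data: the centre of the unit square as a mean of its corners.
points = [(0, 0), (1, 0), (0, 1), (1, 1)]
weights = [Fraction(1, 4)] * 4
reduced_points, reduced_weights = caratheodory(points, weights)
barycenter = tuple(sum(w * p[i] for p, w in zip(reduced_points, reduced_weights))
                   for i in range(2))
print(reduced_points, reduced_weights, barycenter)
```

Exact fractions are used so that the test for a vanishing weight is reliable; with floating-point numbers a tolerance would be needed.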

Corollary 3.1.31. If X is an m-dimensional topological vector space and K ⊆ X is compact, then conv K ⊆ X is compact as well.

Proof. Let $D = \{(t_1, \ldots, t_{m+1}) : t_k \ge 0,\ k = 1, \ldots, m + 1,\ \sum_{k=1}^{m+1} t_k = 1\} \subseteq \mathbb{R}^{m+1}$ and consider the map $\xi : \mathbb{R}^{m+1} \times X^{m+1} \to X$ defined by

\[ \xi\big( (t_k)_{k=1}^{m+1}, x_1, \ldots, x_{m+1} \big) = \sum_{k=1}^{m+1} t_k x_k . \]

It is easy to see that ξ is continuous. Since $D \subseteq \mathbb{R}^{m+1}$ and $K^{m+1} \subseteq X^{m+1}$ are both compact, we get that $D \times K^{m+1}$ is compact as well and so ξ(D, K, . . . , K) ⊆ X is also compact. But according to Theorem 3.1.30, ξ(D, K, . . . , K) = conv K. Hence, conv K ⊆ X is compact.

The corollary fails in infinite dimensional topological vector spaces.

Example 3.1.32. Let $c_0 = \{(x_n)_{n \ge 1} : x_n \in \mathbb{R} \text{ for all } n \in \mathbb{N} \text{ with } x_n \to 0\}$ furnished with the norm $\|(x_n)_{n \ge 1}\| = \sup\{|x_n| : n \in \mathbb{N}\}$. Then $c_0$ is a Banach space. Let $\hat{u}_n = (\delta_{k,n}/n)_{k \ge 1}$ with $\delta_{k,n}$ being the Kronecker delta. Evidently $\hat{u}_n \in c_0$ for all n ∈ ℕ. Let $K = \{\hat{u}_n\}_{n \ge 1} \cup \{0\}$. Then $K \subseteq c_0$ is compact, but

\[ \hat{u} = \sum_{n \ge 1} \frac{1}{2^n} \hat{u}_n \in \overline{\mathrm{conv}}\,K , \qquad \hat{u} \notin \mathrm{conv}\,K . \]

Thus, conv K is not closed, hence it is not compact.

In the next definition we extend the notion of total boundedness (see Definition 1.5.31) to general topological vector spaces that are not necessarily metrizable.

Definition 3.1.33. Let X be a topological vector space with a local basis B. A set A ⊆ X is said to be totally bounded if for every U ∈ B there exists a finite subset F ⊆ X such that A ⊆ F + U.

Remark 3.1.34. The following assertions are easy to see:
(a) A totally bounded set is bounded; see Definition 3.1.13(a).
(b) The closure of a totally bounded set is totally bounded.
(c) Compact sets are totally bounded.

Proposition 3.1.35. If X is a locally convex space and A ⊆ X is totally bounded, then conv A is totally bounded.

Proof. Let U ∈ N(0) be convex. Since A is totally bounded, there exists a finite F ⊆ X such that $A \subseteq F + \frac{1}{2} U$. Corollary 3.1.31 implies that conv F is compact. Let x ∈ conv A. Then

\[ x = \sum_{k=1}^n t_k x_k \quad \text{with} \quad t_k \ge 0,\ x_k \in A,\ k = 1, \ldots, n,\ \sum_{k=1}^n t_k = 1 . \]

For every k ∈ {1, . . . , n} there is $u_k \in F$ such that $x_k \in u_k + \frac{1}{2} U$. Then one gets

\[ x = \sum_{k=1}^n t_k (x_k - u_k) + \sum_{k=1}^n t_k u_k \in \frac{1}{2} U + \mathrm{conv}\,F \]

since U is convex. Hence

\[ \mathrm{conv}\,A \subseteq \mathrm{conv}\,F + \frac{1}{2} U . \tag{3.1.12} \]

As we already remarked, conv F ⊆ X is compact. Thus, we find a finite set E ⊆ conv F such that $\mathrm{conv}\,F \subseteq E + \frac{1}{2} U$, which gives, due to (3.1.12) and the fact that U is convex, that conv A ⊆ E + U and so we see that conv A is totally bounded in the sense of Definition 3.1.33.

From this proposition we deduce the following useful result.

Theorem 3.1.36. If X is a Fréchet space and A ⊆ X is compact, then $\overline{\operatorname{conv}}\,A$ ⊆ X is compact as well.

Proof. Since A ⊆ X is compact, it is totally bounded; see Theorem 1.5.36. Then Proposition 3.1.35 implies that conv A is totally bounded, hence $\overline{\operatorname{conv}}\,A$ is totally bounded; see Remark 3.1.34. Then Theorem 1.5.36 implies that $\overline{\operatorname{conv}}\,A$ is compact.

Next we introduce an important class of convex functionals that describes locally convex spaces.

Definition 3.1.37. Let X be a vector space. A function ρ : X → ℝ is a seminorm if the following hold:
(a) ρ is subadditive, that is, ρ(x + u) ≤ ρ(x) + ρ(u) for all x, u ∈ X.
(b) ρ is absolutely homogeneous, that is, ρ(λx) = |λ|ρ(x) for all λ ∈ 𝕂 and for all x ∈ X.
If ρ(x) ≠ 0 for x ≠ 0, then the seminorm is a norm; see Definition 3.1.13(e). A family P of seminorms on X is said to be separating if for each x ≠ 0 there exists ρ ∈ P such that ρ(x) ≠ 0. Given an absorbing set A ⊆ X, the real functional ρ_A : X → ℝ defined by ρ_A(x) = inf[t > 0 : x ∈ tA] is the Minkowski functional of A (or gauge of A).

Proposition 3.1.38. If X is a vector space and ρ : X → ℝ is a seminorm, then the following hold:
(a) ρ(0) = 0, |ρ(x) − ρ(u)| ≤ ρ(x − u) for all x, u ∈ X, ρ(x) ≥ 0 for all x ∈ X;
(b) N(ρ) = {x ∈ X : ρ(x) = 0} is a vector subspace of X;
(c) B_1 = {x ∈ X : ρ(x) < 1} is convex, absorbing, balanced, and ρ = ρ_{B_1}.

Proof. (a) From Definition 3.1.37 we have ρ(0) = ρ(λ0) = |λ|ρ(0) for all λ ∈ 𝕂, hence ρ(0) = 0. Moreover, ρ(x) = ρ(x − u + u) ≤ ρ(x − u) + ρ(u) for all x, u ∈ X, hence |ρ(x) − ρ(u)| ≤ ρ(x − u) by interchanging the roles of x and u. If u = 0, then we see that ρ(x) ≥ 0 for all x ∈ X.

(b) Let λ ∈ 𝕂 and x, u ∈ N(ρ). Then 0 ≤ ρ(λx + u) ≤ |λ|ρ(x) + ρ(u) = 0. Hence λx + u ∈ N(ρ) and so N(ρ) is a vector subspace of X.

(c) Let x, u ∈ B_1 and t ∈ (0, 1). Then ρ((1 − t)x + tu) ≤ (1 − t)ρ(x) + tρ(u) < 1, which implies that B_1 is convex. If x ∈ X and ϑ > ρ(x), then ρ((1/ϑ)x) = (1/ϑ)ρ(x) < 1 and so B_1 is absorbing. Moreover, it is clear that B_1 is balanced. From the previous argument we see that ρ_{B_1} ≤ ρ. Next let 0 < η ≤ ρ(x). Then 1 ≤ ρ((1/η)x) and so (1/η)x ∉ B_1. Therefore, ρ ≤ ρ_{B_1} and we conclude that ρ = ρ_{B_1}.

For the Minkowski functional we obtain the following result.

Proposition 3.1.39. If X is a vector space and A ⊆ X is convex and absorbing, then the following hold:
(a) ρ_A is subadditive and positively homogeneous, that is, ρ_A is sublinear;
(b) ρ_A is a seminorm if A is in addition balanced;
(c) if B = {x ∈ X : ρ_A(x) < 1} and C = {x ∈ X : ρ_A(x) ≤ 1}, then B ⊆ A ⊆ C and ρ_B = ρ_A = ρ_C.

Proof. (a) For every x ∈ X, let A(x) = {t > 0 : x ∈ tA}. Pick t ∈ A(x) and ϑ > t. Since 0 ∈ A and A is convex, it holds ϑ ∈ A(x). Therefore, A(x) is a half-line starting at ρ_A(x). Suppose that ρ_A(x) < ϑ and ρ_A(u) < μ. Let τ = ϑ + μ. Then it follows that (1/ϑ)x ∈ A, (1/μ)u ∈ A and, since A is convex,

\[
\frac{1}{\tau}(x + u) = \frac{\vartheta}{\tau} \Big( \frac{1}{\vartheta} x \Big) + \frac{\mu}{\tau} \Big( \frac{1}{\mu} u \Big) \in A .
\]

This gives ρ_A(x + u) ≤ τ and so ρ_A is subadditive. Of course, ρ_A is also positively homogeneous.

(b) This is immediate from Definition 3.1.6(c) and Definition 3.1.37.

(c) Suppose ρ_A(x) < 1. Then 1 ∈ A(x) and so x ∈ A. On the other hand, if x ∈ A, then ρ_A(x) ≤ 1 and so we conclude that B ⊆ A ⊆ C. It follows that B(x) ⊆ A(x) ⊆ C(x) for every x ∈ X and so, taking infima, ρ_C(x) ≤ ρ_A(x) ≤ ρ_B(x). Suppose ρ_C(x) < ϑ < μ. Then (1/ϑ)x ∈ C and so ρ_A((1/ϑ)x) ≤ 1, hence

\[
\rho_A \Big( \frac{1}{\mu} x \Big) = \rho_A \Big( \frac{\vartheta}{\mu} \frac{1}{\vartheta} x \Big) = \frac{\vartheta}{\mu}\, \rho_A \Big( \frac{1}{\vartheta} x \Big) \le \frac{\vartheta}{\mu} < 1 .
\]

Therefore, (1/μ)x ∈ B, ρ_B((1/μ)x) ≤ 1, hence ρ_B(x) ≤ μ. Since μ > ρ_C(x) was arbitrary, ρ_B(x) ≤ ρ_C(x), and we conclude that ρ_B = ρ_A = ρ_C.

Seminorms characterize locally convex topologies.
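The proof of Proposition 3.1.39(a) shows that A(x) = {t > 0 : x ∈ tA} is a half-line starting at ρ_A(x), so the gauge can be approximated by bisection on t whenever membership in A can be tested. The sketch below assumes A is given by a membership predicate on ℝ² (the names `minkowski` and `in_diamond` are hypothetical) and uses the open unit ball of the ℓ¹ norm, whose gauge is the ℓ¹ norm itself by Proposition 3.1.38(c).

```python
def minkowski(contains, x, hi=1.0, tol=1e-12):
    """Approximate rho_A(x) = inf{t > 0 : x in tA} for a convex absorbing set A.

    `contains` is a membership test for A; A(x) is a half-line, so bisection applies.
    """
    def scaled(t):
        return tuple(c / t for c in x)

    while not contains(scaled(hi)):   # A is absorbing: x in tA for t large enough
        hi *= 2.0
    lo = 0.0
    while hi - lo > tol:              # invariant: hi in A(x), lo not in A(x) or lo == 0
        mid = 0.5 * (lo + hi)
        if contains(scaled(mid)):
            hi = mid
        else:
            lo = mid
    return hi


def in_diamond(p):
    """Membership test for A = {(p1, p2) : |p1| + |p2| < 1}."""
    return abs(p[0]) + abs(p[1]) < 1


def l1_norm(p):
    return abs(p[0]) + abs(p[1])
```

Evaluating `minkowski(in_diamond, (3.0, -4.0))` returns a value within the tolerance of `l1_norm((3.0, -4.0))`, and the computed values are subadditive up to the same tolerance, as the proposition predicts.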
The following theorem can be found in Yosida [311, p. 26].

Theorem 3.1.40. If X is a vector space and {ρ_α}_{α∈I} is a separating family of seminorms on X, then there is a weakest locally convex topology on X making all the seminorms continuous. Conversely, any locally convex space is topologized by the seminorms defined by the Minkowski functionals of the convex, absorbing, balanced sets. Such sets are often called barrels. Moreover, if F ⊆ ℝ^X is a set of ℝ-valued linear functionals on X, then the weakest topology on X making all elements of F continuous is locally convex.

What about normable spaces, the next more restrictive class of vector spaces after locally convex spaces? We have the following theorem known as “Kolmogorov’s Normability Criterion.”

Theorem 3.1.41 (Kolmogorov’s Normability Criterion). A topological vector space X is normable if and only if it is locally convex and locally bounded, that is, it possesses a bounded convex neighborhood of the origin.

Proof. ⇒: The open unit ball B_1 = {x ∈ X : ‖x‖ < 1} is a bounded convex neighborhood of the origin.

⇐: Let U ∈ N(0) be bounded and convex. We may also assume that U is balanced; see Corollary 3.1.12. Let ‖x‖ = ρ_U(x) for all x ∈ X with ρ_U being the Minkowski functional of U. Note that {tU}_{t>0} is a local basis for the topology of X. If x ≠ 0, then there exists t > 0 such that x ∉ tU and so ‖x‖ = ρ_U(x) ≥ t. Then, from Proposition 3.1.39(b) we infer that ‖·‖ is a norm on X. Moreover, from Proposition 3.1.39(c) we conclude that {x ∈ X : ‖x‖ < t} = tU for all t > 0. Therefore the norm topology coincides with the initial locally convex topology on X.

Now we are ready for one of the most important results in analysis with far-reaching consequences. This is the celebrated “Hahn–Banach Extension Theorem.”

Theorem 3.1.42 (Hahn–Banach Extension Theorem). If X is a vector space, ρ : X → ℝ is subadditive and positively homogeneous, that is, sublinear, V ⊆ X is a vector subspace, f : V → ℝ is linear and f(x) ≤ ρ(x) for all x ∈ V, then there exists a linear f̂ : X → ℝ such that f̂|_V = f and f̂(x) ≤ ρ(x) for all x ∈ X.

Proof. We assume that V ≠ X and let u ∈ X \ V. Let Y = span{V ∪ {u}}. Then each y ∈ Y can be written in a unique way as y = x + λu with x ∈ V and λ ∈ ℝ. Then any extension f̂ of f on Y must be of the form f̂(x + λu) = f(x) + λf̂(u). So, the main problem is to define f̂(u). Recall that the extension f̂ must satisfy f̂ ≤ ρ on Y. Therefore

\[
f(x) + \lambda \hat f(u) \le \rho(x + \lambda u) . \tag{3.1.13}
\]

Taking λ = 1 in (3.1.13) yields f̂(u) ≤ ρ(x + u) − f(x). Similarly, if we take λ = −1 and replace x by −x in (3.1.13), we infer −f(x) − f̂(u) ≤ ρ(−x − u), because the linearity of f gives f(−x) = −f(x). It follows that

\[
-f(v) - \rho(-v - u) \le \hat f(u) \le -f(x) + \rho(x + u) \quad \text{for all } v, x \in V . \tag{3.1.14}
\]

Therefore the value f̂(u) cannot be chosen arbitrarily but it must satisfy (3.1.14). However, in order to make (3.1.14) possible, we need to have

\[
-f(v) - \rho(-v - u) \le -f(x) + \rho(x + u) \quad \text{for all } v, x \in V . \tag{3.1.15}
\]

But note that f(x) − f(v) = f(x − v) ≤ ρ(x − v) = ρ(x + u + (−v − u)) ≤ ρ(x + u) + ρ(−v − u) and so (3.1.15) holds. Now we can define the extension f̂ of f on Y. We can take for example f̂(u) = inf[−f(x) + ρ(x + u) : x ∈ V] and obtain f̂(x + λu) = f(x) + λf̂(u). Clearly f̂ is linear on Y and f̂|_V = f. We need to show that f̂ ≤ ρ. Since f̂|_V = f we get f̂ ≤ ρ when λ = 0. So, let λ ≠ 0. Then we replace v, x ∈ V by (1/λ)x ∈ V in (3.1.14). This gives

\[
-f\Big(\frac{1}{\lambda} x\Big) - \rho\Big(-\frac{1}{\lambda} x - u\Big) \le \hat f(u) \le -f\Big(\frac{1}{\lambda} x\Big) + \rho\Big(\frac{1}{\lambda} x + u\Big) ,
\]

which implies

\[
f\Big(\frac{1}{\lambda} x\Big) + \hat f(u) \le \rho\Big(\frac{1}{\lambda} x + u\Big)
\quad \text{and} \quad
-f\Big(\frac{1}{\lambda} x\Big) - \hat f(u) \le \rho\Big(-\frac{1}{\lambda} x - u\Big) . \tag{3.1.16}
\]

If λ > 0, then multiplying the first inequality in (3.1.16) by λ gives f̂(x + λu) ≤ ρ(x + λu). If λ < 0, then multiplying the second inequality in (3.1.16) by −λ gives f̂(x + λu) ≤ ρ(x + λu). In summary we have shown that f̂ ≤ ρ on Y.

Now let

L = {(Y, f̂) : Y is a subspace of X containing V and f̂ is a linear extension of f on Y with f̂ ≤ ρ}.

We order L as follows: (Y, f̂) ≤ (Y′, f̂′) if Y ⊆ Y′ and f̂′|_Y = f̂. Then every chain D of L has an upper bound in L. Namely, if D = {(Y_α, f̂_α)}_{α∈I}, then Y = ⋃_{α∈I} Y_α is a linear subspace of X and f̂(x) = f̂_α(x) for x ∈ Y_α is a well-defined linear functional on Y. Evidently, (Y, f̂) ∈ L and (Y_α, f̂_α) ≤ (Y, f̂) for all α ∈ I. By Zorn’s Lemma (see Section 1.4), L admits a maximal element (Y, f̂). We must have Y = X, since otherwise we could repeat the construction in the first part of the proof and contradict the maximality of (Y, f̂).

Remark 3.1.43. It should be noted that the extension f̂ is in general not unique. A careful reading of the proof of Theorem 3.1.42 reveals that the complex variant of the result requires a modification of the condition on ρ, since positive homogeneity of ρ makes no sense in that case.

Theorem 3.1.44 (Hahn–Banach Extension Theorem (Complex Variant)). If X is a complex vector space, ρ : X → ℝ is a seminorm, V ⊆ X is a vector subspace, f : V → ℂ is linear and |f(x)| ≤ ρ(x) for all x ∈ V, then there exists a linear f̂ : X → ℂ such that f̂|_V = f and |f̂(x)| ≤ ρ(x) for all x ∈ X.

From now on, unless otherwise stated, all vector spaces will be over the reals.

Definition 3.1.45. Let X, Y be normed spaces. A linear operator A : X → Y is bounded if ‖A(x)‖_Y ≤ M‖x‖_X for some M > 0 and for all x ∈ X. The smallest M ≥ 0 for which the inequality above holds is called the operator norm of A and it is denoted by

\[
\|A\|_{L} = \sup \Big[ \frac{\|A(x)\|_Y}{\|x\|_X} : x \in X,\ x \ne 0 \Big] .
\]


By L(X, Y) we denote the vector space of all bounded, linear operators from X into Y. Evidently (L(X, Y), ‖·‖_L) is a normed space and the resulting norm topology is called the uniform operator topology. If Y = ℝ, then L(X, ℝ) = X* is the topological dual and its elements are called bounded linear functionals. If x ∈ X and x* ∈ X* we usually write ⟨x*, x⟩ instead of x*(x) and call ⟨·, ·⟩ the duality brackets for the pair (X*, X). The proof of the next proposition is straightforward and is omitted.

Proposition 3.1.46. If X, Y are normed spaces and A : X → Y is a linear operator, then the following properties are equivalent:
(a) A is bounded.
(b) A is continuous.
(c) A is continuous at x = 0.

Proposition 3.1.47. If X is a normed space and Y is a Banach space, then (L(X, Y), ‖·‖_L) is a Banach space.

Proof. Suppose that {A_n}_{n≥1} ⊆ L(X, Y) is a ‖·‖_L-Cauchy sequence. Then it follows that

\[
\|(A_n - A_m)(x)\|_Y \le \|A_n - A_m\|_{L} \|x\|_X \quad \text{for all } n, m \in \mathbb{N} \text{ and for all } x \in X ,
\]

so {A_n(x)}_{n≥1} is a Cauchy sequence in Y for every x ∈ X. Since Y is complete, one gets that A(x) = lim_{n→∞} A_n(x) exists for all x ∈ X. Of course, A : X → Y is linear and

\[
\|A(x) - A_n(x)\|_Y = \lim_{m \to \infty} \|A_m(x) - A_n(x)\|_Y \le \limsup_{m \to \infty} \|A_m - A_n\|_{L} \|x\|_X .
\]

So, for given ε > 0, there exists n_0 = n_0(ε) ∈ ℕ such that

\[
\|A(x) - A_n(x)\|_Y \le \varepsilon \|x\|_X \quad \text{for all } x \in X \text{ and for all } n \ge n_0 . \tag{3.1.17}
\]

Hence ‖A(x)‖_Y ≤ ‖A(x) − A_{n_0}(x)‖_Y + ‖A_{n_0}(x)‖_Y ≤ (ε + ‖A_{n_0}‖_L)‖x‖_X. This implies that A ∈ L(X, Y) and ‖A_n − A‖_L → 0 as n → ∞; see (3.1.17).

Corollary 3.1.48. If X is a normed space, then X* is a Banach space and ‖x*‖_* = sup{|⟨x*, x⟩| : ‖x‖ ≤ 1} = sup{⟨x*, x⟩ : ‖x‖ ≤ 1}.

Proposition 3.1.49. If X is a normed space, V ⊆ X is a vector subspace, and u* ∈ V*, then there exists x* ∈ X* such that x*|_V = u* and ‖x*‖_* = ‖u*‖_{V*}.

Proof. Applying Theorem 3.1.42 with ρ(x) = ‖u*‖_{V*}‖x‖ for all x ∈ X yields the assertion.

Proposition 3.1.50. If X is a normed space and x_0 ∈ X, then there exists x_0* ∈ X* such that ‖x_0*‖_* = ‖x_0‖ and ⟨x_0*, x_0⟩ = ‖x_0‖².

Proof. Apply Proposition 3.1.49 with V = ℝx_0 and the functional on V given by tx_0 ↦ t‖x_0‖², whose norm on V equals ‖x_0‖. The resulting extension x_0* ∈ X* satisfies ⟨x_0*, tx_0⟩ = t‖x_0‖² and ‖x_0*‖_* = ‖x_0‖, which is the desired result.
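In finite dimensions the functional of Proposition 3.1.50 can be written down explicitly. The following sketch assumes X = ℝ^n with the ℓ¹ norm, whose dual norm is the standard sup norm when functionals are identified with vectors acting through the dot product; the names `norming_functional` and `pair` are illustrative only. It also shows that the functional need not be unique when some coordinate of x vanishes.

```python
def norming_functional(x):
    """Return x* with ||x*||_inf = ||x||_1 and <x*, x> = ||x||_1 ** 2.

    x is a vector in (R^n, l^1); x* is a vector in the dual (R^n, sup norm).
    """
    norm1 = sum(abs(c) for c in x)

    def sign(c):
        return (c > 0) - (c < 0)

    return [norm1 * sign(c) for c in x]


def pair(xstar, x):
    """Duality bracket <x*, x>, realised here as the dot product."""
    return sum(a * b for a, b in zip(xstar, x))


x = [3, -1, 0, 2]                     # ||x||_1 = 6
xstar = norming_functional(x)         # one admissible choice
other = [6, -6, 5, 6]                 # differs where x_3 = 0, same norm and pairing
```

Both `xstar` and `other` have sup norm 6 = ‖x‖₁ and pair with `x` to 36 = ‖x‖₁², so the element x_0* is not determined by x_0 in this space.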

Remark 3.1.51. The element x_0* ∈ X* is not unique in general. In order to have uniqueness we need additional structure on X*, for example, strict convexity; see Section 3.4. The multivalued map F : X → 2^{X*} \ {∅} defined by F(x) = {x* ∈ X* : ‖x*‖_* = ‖x‖ and ⟨x*, x⟩ = ‖x‖²} is called the duality map from X into X*. It is important in Nonlinear Analysis and we will encounter it again in Section 6.1.

Proposition 3.1.52. If X is a normed space and x ∈ X, then ‖x‖ = sup[|⟨x*, x⟩| : x* ∈ X*, ‖x*‖_* ≤ 1] = sup[⟨x*, x⟩ : x* ∈ X*, ‖x*‖_* ≤ 1].

Proof. We assume that x ≠ 0. Note that

\[
\sup \big[ |\langle x^*, x \rangle| : x^* \in X^*,\ \|x^*\|_* \le 1 \big] \le \|x\| . \tag{3.1.18}
\]

On the other hand, from Proposition 3.1.50 we know that there is x_0* ∈ X* such that ‖x_0*‖_* = ‖x‖ and ⟨x_0*, x⟩ = ‖x‖². Let x̂_0* = x_0*/‖x‖. Then ‖x̂_0*‖_* = 1 and ⟨x̂_0*, x⟩ = ‖x‖. This combined with (3.1.18) implies the assertion of the proposition.

Now we will produce some important geometric interpretations of the Hahn–Banach Extension Theorem; see Theorem 3.1.42. These are the well-known “Separation Theorems” for convex sets.

Definition 3.1.53. Let X be a vector space. A hyperplane is a set of the form {f = ϑ} = {x ∈ X : f(x) = ϑ} with f : X → ℝ being linear and ϑ ∈ ℝ. A hyperplane determines two half-spaces, namely {f ≥ ϑ} = {x ∈ X : f(x) ≥ ϑ} and {f ≤ ϑ} = {x ∈ X : f(x) ≤ ϑ}. Given two sets A, C ⊆ X, we say that the hyperplane H = {f = ϑ} separates A and C if A ⊆ H_− = {f ≤ ϑ} and C ⊆ H_+ = {f ≥ ϑ}. We say that H strongly separates A and C if there exists ε > 0 such that

\[
A \subseteq H_-^{\varepsilon} = \{ f \le \vartheta - \varepsilon \} \quad \text{and} \quad C \subseteq H_+^{\varepsilon} = \{ f \ge \vartheta + \varepsilon \} .
\]

Proposition 3.1.54. If X is a topological vector space, then a hyperplane H = {f = ϑ} is either closed or dense. H is closed if and only if f is continuous, while H is dense if and only if f is discontinuous.

Proof. Due to the linearity of f, we may assume that ϑ = 0. If f is continuous, then H is closed, while if H is dense, then clearly f is not continuous.

Now assume that H is closed. Suppose that {x_α}_{α∈I} ⊆ X and x_α → 0. In addition, let u ∈ X with f(u) = 1. Arguing by contradiction, suppose that f(x_α) ↛ 0; see Proposition 3.1.46. Then, for at least a subnet, we have |f(x_α)| ≥ ε for all α ∈ I. Let v_α = u − (f(u)/f(x_α))x_α. Then v_α ∈ H since ϑ = 0, and v_α → u. So, u ∈ H, a contradiction. Therefore f(x_α) → 0 and so f is continuous; see Proposition 3.1.46.

Now suppose that f is discontinuous. Then there exist a net {x_α}_{α∈I} ⊆ X and ε > 0 such that x_α → 0 and |f(x_α)| ≥ ε for all α ∈ I. Given any u ∈ X, let v_α = u − (f(u)/f(x_α))x_α ∈ H for all α ∈ I. We have v_α → u and so we conclude that H is dense.


Definition 3.1.55. Let X be a vector space and A ⊆ X. A point x ∈ A is said to be an absorbing point of A if A − x ⊆ X is absorbing; see Definition 3.1.6(b).

Remark 3.1.56. If X is a topological vector space and int A ≠ ∅, then every x ∈ int A is an absorbing point. However, the set A can have absorbing points even if int A = ∅. Suppose that X is a normed space and A = ∂B_1 ∪ {0} where ∂B_1 = {x ∈ X : ‖x‖ = 1}. Then x = 0 is absorbing but int A = ∅.

Next we present the “First Separation Theorem.”

Theorem 3.1.57 (First Separation Theorem). If X is a vector space, A, C ⊆ X are two nonempty convex sets, A ∩ C = ∅ and one of them has an absorbing point, then they can be separated by a hyperplane H = {f = ϑ} with f ≠ 0 and A ∪ C is not included in H.

Proof. Suppose A has an absorbing point. Then A − C has an absorbing point x. Since A ∩ C = ∅, we see that x ≠ 0. Moreover, the set E = A − C − x is nonempty, convex, and absorbing, and −x ∉ E since A ∩ C = ∅. Then Proposition 3.1.39 implies that ρ_E is sublinear. Suppose that ρ_E(−x) < 1. Then there exist 0 ≤ t < 1 and e ∈ E such that −x = te. Note that 0 ∈ E, since E is absorbing. So we have −x = te + (1 − t)0 ∈ E, a contradiction. Therefore

\[
\rho_E(-x) \ge 1 . \tag{3.1.19}
\]

Let V = ℝ(−x) and let f : V → ℝ be defined by f(t(−x)) = t. Clearly, f is linear and f ≤ ρ_E on V. Indeed, if t ≥ 0, then ρ_E(t(−x)) = tρ_E(−x) ≥ t; see (3.1.19). If t < 0, then f(t(−x)) < 0 ≤ ρ_E(t(−x)). Invoking Theorem 3.1.42 implies the existence of a linear f̂ : X → ℝ such that f̂|_V = f and f̂ ≤ ρ_E. Note that f̂(x) = −1 and so f̂ ≠ 0.

We claim that f̂ separates A and C. To see this, let a ∈ A and c ∈ C. It holds that

\[
\hat f(a) = \hat f(a - c - x) + \hat f(x) + \hat f(c) \le \rho_E(a - c - x) + \hat f(x) + \hat f(c) = \rho_E(a - c - x) - 1 + \hat f(c) \le 1 - 1 + \hat f(c) = \hat f(c) .
\]

Since a ∈ A and c ∈ C are arbitrary, we see that f̂ separates A and C. Finally, since 0 ∈ E, we have x = a − c with a ∈ A and c ∈ C. Recall that f̂(x) = −1. Then f̂(a) ≠ f̂(c) and so A and C cannot both be subsets of the same hyperplane.

Lemma 3.1.58. If X is a topological vector space, f : X → ℝ is linear, and f is bounded above or bounded below on a neighborhood of the origin, then f is continuous.

Proof. Let U ∈ N(0) be symmetric and assume that f ≤ M on U for some M > 0. Then, for given ε > 0, one gets, since U is symmetric, that x − u ∈ (ε/M)U implies |f(x) − f(u)| = |f(x − u)| ≤ (ε/M)M = ε. Hence, f is continuous.

Using this lemma, we can state a topological version of Theorem 3.1.57.

Theorem 3.1.59. If X is a topological vector space, A, C ⊆ X are nonempty convex sets, A ∩ C = ∅ and one of them has nonempty interior, then they can be separated by a closed hyperplane H and A ∪ C is not included in H.

Proof. Applying Theorem 3.1.57, we obtain a separating hyperplane H = {f = ϑ} with f ≠ 0. We only need to show that f is continuous. Suppose that int A ≠ ∅. Then f(a) ≤ ϑ ≤ f(c) for all a ∈ A and for all c ∈ C. Note that if x ∈ int A, then U = int A − x ∈ N(0) and so f|_U is bounded above, hence f is continuous; see Lemma 3.1.58.

Next we present the “Second Separation Theorem,” also called the “Strong Separation Theorem.”

Theorem 3.1.60 (Strong Separation Theorem). If X is a locally convex space and A, C ⊆ X are nonempty, disjoint, convex sets, then A and C can be strongly separated by a closed hyperplane if and only if there exists a convex U ∈ N(0) such that (A + U) ∩ C = ∅.

Proof. ⇒: Let f be the linear functional associated with the closed separating hyperplane. Then f is continuous; see Proposition 3.1.54. Moreover, taking ε > 0 from the strong separation (see Definition 3.1.53), U = {x ∈ X : |f(x)| < ε} is a convex neighborhood of the origin and (A + U) ∩ C = ∅.

⇐: The set A + U is convex and open. So, we can apply Theorem 3.1.59 and find a linear, continuous functional f : X → ℝ and ϑ ∈ ℝ as well as ε > 0 such that f(a) ≤ ϑ − ε for all a ∈ A and f(c) ≥ ϑ + ε for all c ∈ C. Hence A and C are strongly separated by H = {f = ϑ}.

Corollary 3.1.61. If X is locally convex, A, C ⊆ X are nonempty, disjoint, convex sets, A is compact and C is closed, then A and C can be strongly separated by a closed hyperplane.

Proof. The set X \ C is open and A ⊆ X \ C. The compactness of A implies that there exists a convex neighborhood U ∈ N(0) such that A + U ⊆ X \ C. Hence (A + U) ∩ C = ∅. Applying Theorem 3.1.60 gives the assertion.

Proposition 3.1.62. If X is a normed space, V ⊆ X is a vector subspace, and $\overline{V}$ ≠ X, then there exists x* ∈ X* with x* ≠ 0 such that ⟨x*, v⟩ = 0 for all v ∈ V.

Proof. Let x_0 ∈ X \ $\overline{V}$. Then apply Corollary 3.1.61 with A = {x_0} and C = $\overline{V}$. Thus, we find x* ∈ X* with x* ≠ 0 and ϑ ∈ ℝ such that ⟨x*, x_0⟩ < ϑ < ⟨x*, v⟩ for all v ∈ V. Since V is a vector space, we have λ⟨x*, v⟩ > ϑ for all λ ∈ ℝ and all v ∈ V, which forces ⟨x*, v⟩ = 0 for all v ∈ V and hence ϑ < 0.

Remark 3.1.63. This proposition is useful for determining whether a linear subspace V is dense in X: it suffices to show that the only element of X* vanishing on V is x* = 0.


3.2 Three Fundamental Theorems

In this section we present three basic theorems that are the core results of linear functional analysis. These are the “Uniform Boundedness Principle,” the “Open Mapping Theorem,” and the “Closed Graph Theorem.” All three depend on the Baire Category Theorem; see Theorem 1.5.68. We recall that the Baire Category Theorem, roughly speaking, provides conditions for a set to be large in the sense that it has a nonempty interior.

We start with the “Uniform Boundedness Principle.” This theorem asserts that for any family of bounded linear operators, pointwise boundedness implies uniform boundedness, that is, boundedness in the operator norm. As before, we consider real vector spaces.

Theorem 3.2.1 (Uniform Boundedness Principle). If X is a Banach space, Y is a normed space, and L ⊆ L(X, Y) satisfies

\[
\sup \big[ \|A(x)\|_Y : A \in L \big] = M(x) < \infty \quad \text{for all } x \in X ,
\]

then there exists M_0 > 0 such that sup[‖A‖_L : A ∈ L] ≤ M_0.

Proof. For every n ∈ ℕ let E_n = {x ∈ X : ‖A(x)‖_Y ≤ n for all A ∈ L}. The hypothesis implies that

\[
X = \bigcup_{n \ge 1} E_n . \tag{3.2.1}
\]

Moreover, we claim that for every n ∈ ℕ, E_n ⊆ X is closed. To see this, let {x_m}_{m≥1} ⊆ E_n and assume that x_m → x in X. We obtain ‖A(x_m)‖_Y ≤ n for all A ∈ L and for all m ∈ ℕ. The continuity of A (see Proposition 3.1.46) implies that ‖A(x_m)‖_Y → ‖A(x)‖_Y as m → ∞ for every A ∈ L. Therefore, ‖A(x)‖_Y ≤ n for all A ∈ L and so x ∈ E_n, which implies that E_n ⊆ X is closed for every n ∈ ℕ. From (3.2.1) and the Baire Category Theorem (see Theorem 1.5.68 and Corollary 1.5.67), we infer that there exists n_0 ∈ ℕ such that int E_{n_0} ≠ ∅. Hence, there exist x_0 ∈ X and ε > 0 such that

\[
B_\varepsilon(x_0) \subseteq E_{n_0} \quad \text{with} \quad B_\varepsilon(x_0) = \{ x \in X : \|x - x_0\|_X \le \varepsilon \} . \tag{3.2.2}
\]

Let x ∈ X with ‖x‖_X ≤ ε and A ∈ L. Then, due to (3.2.2),

\[
\|A(x)\|_Y = \|A(x + x_0) - A(x_0)\|_Y \le \|A(x + x_0)\|_Y + \|A(x_0)\|_Y \le n_0 + n_0 = 2 n_0 . \tag{3.2.3}
\]

Thus, for all u ∈ X with ‖u‖_X = 1, it follows, because of (3.2.3), that

\[
\|A(u)\|_Y = \frac{1}{\varepsilon} \|A(\varepsilon u)\|_Y \le \frac{2 n_0}{\varepsilon} \quad \text{for all } A \in L .
\]

Hence,

\[
\sup \big[ \|A(u)\|_Y : \|u\|_X \le 1 \big] = \|A\|_{L} \le \frac{2 n_0}{\varepsilon} \quad \text{for all } A \in L .
\]
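Completeness of X is essential in Theorem 3.2.1. As a small numerical aside, the sketch below works on the space of finitely supported real sequences with the supremum norm (not a Banach space) and uses the operators A_n(x̂) = (x_1, 2x_2, ..., nx_n, 0, 0, ...), which are the operators of Example 3.2.3(a). Sequences are represented as finite Python lists, and the helper names are illustrative assumptions.

```python
def apply_A(n, x):
    """A_n(x) = (x_1, 2 x_2, ..., n x_n, 0, 0, ...) for a finitely supported x given as a list."""
    return [(k + 1) * c if k < n else 0 for k, c in enumerate(x)]


def sup_norm(x):
    """Supremum norm of a finitely supported sequence."""
    return max((abs(c) for c in x), default=0)


def unit_vector(n):
    """The n-th unit sequence e_n = (0, ..., 0, 1, 0, ...), truncated after position n."""
    return [0] * (n - 1) + [1]


x = [1, -2, 0.5]
pointwise_bound = max(sup_norm(apply_A(n, x)) for n in range(1, 51))   # stays finite
growth = [sup_norm(apply_A(n, unit_vector(n))) for n in range(1, 6)]   # equals n
```

For each fixed x the values ‖A_n(x)‖ stabilise once n exceeds the length of the support, so the family is pointwise bounded, while ‖A_n(e_n)‖ = n shows that the operator norms are unbounded.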

Theorem 3.2.1 leads to the so-called “Banach–Steinhaus Theorem,” which says that the pointwise limit of a sequence of bounded linear operators is a bounded linear operator.

Theorem 3.2.2 (Banach–Steinhaus Theorem). If X, Y are Banach spaces and {A_n}_{n≥1} ⊆ L(X, Y) is a sequence such that

\[
A_n(x) \to A(x) \quad \text{in } Y \text{ as } n \to \infty \text{ for all } x \in X ,
\]

then the following hold:
(a) A ∈ L(X, Y) and sup_{n≥1} ‖A_n‖_L < ∞;
(b) ‖A‖_L ≤ lim inf_{n→∞} ‖A_n‖_L.

Proof. (a) Clearly, A : X → Y is linear. Since {A_n(x)}_{n≥1} ⊆ Y is convergent, it holds that

\[
\sup_{n \in \mathbb{N}} \|A_n(x)\|_Y = M(x) < \infty .
\]

Applying Theorem 3.2.1, there exists M_0 > 0 such that sup_{n∈ℕ} ‖A_n‖_L ≤ M_0 < ∞, which implies ‖A_n(x)‖_Y ≤ M_0‖x‖_X for all x ∈ X and for all n ∈ ℕ. Therefore, we derive ‖A(x)‖_Y = lim_{n→∞} ‖A_n(x)‖_Y ≤ M_0‖x‖_X for all x ∈ X, which, due to Proposition 3.1.46, results in A ∈ L(X, Y).

(b) It holds that ‖A_n(x)‖_Y ≤ ‖A_n‖_L‖x‖_X for all x ∈ X and for all n ∈ ℕ. This gives ‖A(x)‖_Y ≤ lim inf_{n→∞} ‖A_n‖_L‖x‖_X for all x ∈ X and so ‖A‖_L ≤ lim inf_{n→∞} ‖A_n‖_L.

Example 3.2.3. (a) Theorems 3.2.1 and 3.2.2 fail if X is only a normed space. To see this, let us define the following subspaces:

\[
\begin{aligned}
l^\infty &= \Big\{ \hat x = (x_n)_{n \ge 1} \in \mathbb{R}^{\mathbb{N}} : \sup_{n \ge 1} |x_n| < \infty \Big\} , \\
c_0 &= \big\{ \hat x = (x_n)_{n \ge 1} \in \mathbb{R}^{\mathbb{N}} : x_n \to 0 \text{ as } n \to \infty \big\} , \\
X &= \big\{ \hat x = (x_n)_{n \ge 1} \in \mathbb{R}^{\mathbb{N}} : \text{there exists } n_0 \in \mathbb{N} \text{ such that } x_n = 0 \text{ for } n \ge n_0 \big\} .
\end{aligned}
\]

Evidently, X ⊆ c_0 ⊆ l^∞ and we furnish l^∞ with the supremum norm ‖x̂‖ = sup_{n∈ℕ} |x_n|. With this norm, l^∞ is a Banach space and c_0 is a closed subspace, hence a Banach space itself, but X is not complete, since it is only a dense subspace of c_0, that is, $\overline{X}^{\|\cdot\|} = c_0$. Let A_n : X → X with n ≥ 1 and A : X → X be defined by

\[
A_n(\hat x) = (x_1, 2 x_2, \dots, n x_n, 0, 0, \dots) , \qquad A(\hat x) = (k x_k)_{k \ge 1} .
\]

Then A_n(x̂) → A(x̂) as n → ∞ for all x̂ ∈ X and ‖A_n‖_L = n for all n ∈ ℕ. Precisely, {A_n}_{n≥1} is pointwise convergent, hence pointwise bounded as well, but sup_{n≥1} ‖A_n‖_L = ∞ and thus A is not bounded.

(b) In Theorem 3.2.2(b) the inequality can be strict. Let

\[
l^2 = \Big\{ \hat x = (x_n)_{n \ge 1} \in \mathbb{R}^{\mathbb{N}} : \sum_{n \ge 1} x_n^2 < \infty \Big\}
\]

furnished with the norm

\[
\|\hat x\| = \Big( \sum_{n \ge 1} x_n^2 \Big)^{1/2} .
\]

With this norm, l² becomes a Banach space. In fact it becomes a Hilbert space; see Section 3.5. Let X = l², Y = ℝ, and consider the bounded linear operators A_k : l² → ℝ with k ≥ 1 defined by A_k(x̂) = x_k for every x̂ = (x_n)_{n≥1} ∈ l² and for every k ∈ ℕ. Evidently, A_k(x̂) → 0 as k → ∞ for every x̂ ∈ l² but ‖A_k‖_L = 1 for all k ∈ ℕ.

Theorem 3.2.1 leads to interesting characterizations of bounded sets in a Banach space X and in its dual X*; see Definition 3.1.45. In the next section we will interpret these results in terms of weak and weak* topologies, respectively.

Proposition 3.2.4. If X is a normed space and B ⊆ X is nonempty, then B is bounded if and only if x*(B) = {⟨x*, u⟩ : u ∈ B} ⊆ ℝ is bounded for every x* ∈ X*.

Proof. ⇒: This follows from the fact that |⟨x*, u⟩| ≤ ‖x*‖_*‖u‖ for every x* ∈ X* and for all u ∈ B. So, if B is bounded, then ‖u‖ ≤ M for some M > 0 and for all u ∈ B. Therefore, x*(B) ⊆ [−ϱ, ϱ] with ϱ = ‖x*‖_*M.

⇐: For every u ∈ B, let A_u(x*) = ⟨x*, u⟩ for all x* ∈ X*, where ⟨·, ·⟩ denotes the duality brackets for the pair (X*, X). Then A_u ∈ L(X*, ℝ) for all u ∈ B and by hypothesis, for every x* ∈ X*,

\[
\sup_{u \in B} |A_u(x^*)| = \sup_{u \in B} |\langle x^*, u \rangle| < +\infty .
\]

Since X* is a Banach space (see Corollary 3.1.48), we can apply Theorem 3.2.1 and find M > 0 such that

\[
|A_u(x^*)| = |\langle x^*, u \rangle| \le M \|x^*\|_* \quad \text{for all } x^* \in X^* \text{ and for all } u \in B .
\]

Because of Proposition 3.1.52 we infer that ‖u‖ ≤ M for all u ∈ B, which shows that B is bounded.

There is also a “dual” version of this result.

Proposition 3.2.5. If X is a Banach space and B* ⊆ X* is nonempty, then B* is bounded if and only if x(B*) = {⟨u*, x⟩ : u* ∈ B*} ⊆ ℝ is bounded for every x ∈ X.

Proof. ⇒: This is as in the previous proof.

⇐: For every u* ∈ B*, let A_{u*}(x) = ⟨u*, x⟩ for all x ∈ X. Then A_{u*} ∈ L(X, ℝ) for all u* ∈ B* and by hypothesis, for every x ∈ X,

\[
\sup_{u^* \in B^*} |A_{u^*}(x)| = \sup_{u^* \in B^*} |\langle u^*, x \rangle| < \infty .
\]

Since X is a Banach space, we can apply Theorem 3.2.1 and find M > 0 such that

\[
|A_{u^*}(x)| = |\langle u^*, x \rangle| \le M \|x\| \quad \text{for all } x \in X \text{ and for all } u^* \in B^* .
\]

Then, Corollary 3.1.48 implies that ‖u*‖_* ≤ M for all u* ∈ B*.

Next we will prove the “Open Mapping Theorem,” which asserts that a surjective bounded linear operator between Banach spaces is an open map. In order to prove this theorem, we will need two auxiliary results.

Lemma 3.2.6. If X, Y are Banach spaces and A ∈ L(X, Y) is surjective, then there exists ϑ > 0 such that for any ε > 0 and y ∈ Y we find x ∈ X such that

\[
\|A(x) - y\|_Y \le \varepsilon \quad \text{and} \quad \|x\|_X \le \frac{1}{\vartheta} \|y\|_Y .
\]

Proof. Let B_1^X = {x ∈ X : ‖x‖_X < 1}. The surjectivity of A implies that

\[
Y = \bigcup_{n \ge 1} \overline{A(n B_1^X)} .
\]

Then by the Baire Category Theorem there is n ∈ ℕ such that int $\overline{A(nB_1^X)}$ ≠ ∅. This implies B_η(y_0) ⊆ $\overline{A(nB_1^X)}$ for some η > 0 and y_0 ∈ Y. Here B_η(y_0) = {y ∈ Y : ‖y − y_0‖_Y < η}. Given y ∈ Y with ‖y‖_Y < η, let {x_k}_{k≥1}, {u_k}_{k≥1} ⊆ nB_1^X be such that

\[
A(x_k) \to y_0 \quad \text{and} \quad A(u_k) \to y_0 + y \quad \text{in } Y \text{ as } k \to \infty .
\]

Let v_k = u_k − x_k for k ∈ ℕ. Then

\[
A(v_k) \to y \ \text{ in } Y \text{ as } k \to \infty \quad \text{and} \quad \|v_k\|_X < 2n \ \text{ for all } k \in \mathbb{N} . \tag{3.2.4}
\]

Let w ∈ Y \ {0} and let z = (η/2)(w/‖w‖_Y). Then z ∈ Y and ‖z‖_Y < η. From (3.2.4) we know that there exists {ṽ_k}_{k≥1} ⊆ X such that

\[
A(\tilde v_k) \to z = \frac{\eta}{2} \frac{w}{\|w\|_Y} \ \text{ in } Y \text{ as } k \to \infty \quad \text{and} \quad \|\tilde v_k\|_X < 2n \ \text{ for all } k \in \mathbb{N} .
\]

Hence,

\[
A \Big( \frac{2}{\eta} \|w\|_Y \tilde v_k \Big) \to w \quad \text{in } Y \text{ as } k \to \infty .
\]

Note that

\[
\frac{2}{\eta} \|w\|_Y \|\tilde v_k\|_X < \frac{4n}{\eta} \|w\|_Y \quad \text{for all } k \in \mathbb{N} . \tag{3.2.5}
\]

Finally let ϑ = η/(4n) and apply (3.2.5) to obtain the result of the lemma.

Using this lemma, we can prove the following proposition.

Proposition 3.2.7. If X, Y are Banach spaces, B_1^X = {x ∈ X : ‖x‖_X < 1}, B_1^Y = {y ∈ Y : ‖y‖_Y < 1}, and A ∈ L(X, Y) is surjective, then there exists δ > 0 such that δB_1^Y ⊆ A(B_1^X).

Proof. Let ϑ > 0 be as postulated by Lemma 3.2.6. Let y ∈ ϑB_1^Y and ε = ϑ/2 > 0. Using Lemma 3.2.6, there exists x_1 ∈ X such that

\[
\|A(x_1) - y\|_Y \le \frac{\vartheta}{2} \quad \text{and} \quad \|x_1\|_X \le \frac{1}{\vartheta} \|y\|_Y < 1 . \tag{3.2.6}
\]

Now consider y − A(x_1) ∈ Y and ε = ϑ/4. A new application of Lemma 3.2.6 gives x_2 ∈ X such that

\[
\|A(x_2) - (y - A(x_1))\|_Y \le \frac{\vartheta}{4} \quad \text{and} \quad \|x_2\|_X \le \frac{1}{\vartheta} \|y - A(x_1)\|_Y \le \frac{1}{2} ;
\]

see (3.2.6). Suppose that we have produced {x_k}_{k=1}^n ⊆ X such that

\[
\Big\| A \Big( \sum_{k=1}^{n} x_k \Big) - y \Big\|_Y \le \frac{\vartheta}{2^n} \quad \text{and} \quad \|x_k\|_X \le \frac{1}{2^{k-1}} \ \text{ for all } k = 1, \dots, n . \tag{3.2.7}
\]

Using Lemma 3.2.6, we obtain x_{n+1} ∈ X such that

\[
\Big\| A \Big( \sum_{k=1}^{n+1} x_k \Big) - y \Big\|_Y \le \frac{\vartheta}{2^{n+1}} \quad \text{and} \quad \|x_{n+1}\|_X \le \frac{1}{\vartheta} \Big\| A \Big( \sum_{k=1}^{n} x_k \Big) - y \Big\|_Y \le \frac{1}{2^n} ; \tag{3.2.8}
\]

see (3.2.7). By induction we have a sequence {x_n}_{n≥1} ⊆ X such that (3.2.8) holds. Let u_n = ∑_{k=1}^n x_k ∈ X with n ∈ ℕ. For m > n one gets

\[
\|u_m - u_n\|_X = \Big\| \sum_{k=n+1}^{m} x_k \Big\|_X \le \sum_{k=n+1}^{m} \frac{1}{2^{k-1}} ;
\]

see (3.2.8). This implies that {u_n}_{n≥1} ⊆ X is a Cauchy sequence. Since X is a Banach space, we obtain u_n → u in X. Then

\[
\|u\|_X \le \sum_{k \ge 1} \|x_k\|_X \le \sum_{k \ge 1} \frac{1}{2^{k-1}} = 2 ,
\]

and in fact ‖u‖_X < 2 because ‖x_1‖_X < 1 by (3.2.6), which shows that u ∈ 2B_1^X. From (3.2.8) it follows that ‖A(u_n) − y‖_Y ≤ ϑ/2^n, hence A(u_n) → y in Y. But we also have A(u_n) → A(u) in Y. Therefore, y = A(u). Recall that y ∈ ϑB_1^Y is arbitrary and u ∈ 2B_1^X. That means ϑB_1^Y ⊆ A(2B_1^X), that is, (ϑ/2)B_1^Y ⊆ A(B_1^X). Choosing δ = ϑ/2 > 0, we obtain the assertion of the proposition.

Remark 3.2.8. This proposition provides estimates for the solutions x ∈ X of A(x) = y ∈ Y in terms of y. That the equation A(x) = y always has a solution for all y ∈ Y is a consequence of the surjectivity of A.

Once we have this proposition, we can easily prove the “Open Mapping Theorem.”

Theorem 3.2.9 (Open Mapping Theorem). If X, Y are Banach spaces and A ∈ L(X, Y) is surjective, then A is an open map, that is, it maps open sets in X to open sets in Y.

Proof. Let U ⊆ X be nonempty and open, and let x_0 ∈ U. Let V = U − x_0 ∈ N(0). Then there exists ξ > 0 such that ξB_1^X ⊆ V. Using Proposition 3.2.7 we find δ > 0 such that

\[
A(V) \supseteq A(\xi B_1^X) = \xi A(B_1^X) \supseteq \xi \delta B_1^Y ,
\]

which implies

\[
A(U) = A(V + x_0) = A(x_0) + A(V) \supseteq A(x_0) + \xi \delta B_1^Y .
\]

202 | 3 Basic Functional Analysis The last set is open in Y centered at A(x0 ) with a radius of ξδ > 0. This means that A(U) ⊆ Y is open. As an easy consequence of the Open Mapping Theorem we obtain the so-called “Banach Theorem.” Theorem 3.2.10 (Banach Theorem). If X, Y are Banach spaces and A ∈ L(X, Y) is a bijection, that is, A is surjective and injective, then A−1 ∈ L(Y, X). Proof. First note that A−1 : Y → X is a well-defined linear map. Let U ⊆ X be open. Due to Theorem 3.2.9 it follows that (A−1 )−1 (U) = A(U) ⊆ Y is open. Then Proposition 3.1.46 implies that A−1 ∈ L(Y, X). Definition 3.2.11. Let X be a vector space and let ‖ ⋅ ‖, | ⋅ | be two norms on X. We say that the two norms are equivalent if there exists a constant ϑ ≥ 1 such that 1 ‖x‖ ≤ |x| ≤ ϑ‖x‖ ϑ

for all x ∈ X .

Remark 3.2.12. This notion defines an equivalence relation on the set of all possible norms on X. The norms ‖⋅‖, |⋅| on X are equivalent if and only if id : (X, ‖⋅‖) → (X, |⋅|) and id : (X, |⋅|) → (X, ‖⋅‖) are both bounded linear operators. Two norms are equivalent if and only if they generate the same metric topology on X. Finally, if ‖⋅‖, |⋅| are equivalent norms, then (X, ‖⋅‖) is a Banach space if and only if (X, |⋅|) is a Banach space.

Proposition 3.2.13. If V is a vector space, ‖⋅‖ and |⋅| are two norms on V with V being a Banach space for both norms and there exists η > 0 such that |x| ≤ η‖x‖ for all x ∈ V, then ‖⋅‖ and |⋅| are equivalent norms on V.

Proof. Let X = (V, ‖⋅‖), Y = (V, |⋅|), and A = id : X → Y with id(x) = x for all x ∈ X. Then A ∈ L(X, Y) is bijective and we can apply Theorem 3.2.10 and infer that A⁻¹ = id : Y = (V, |⋅|) → X = (V, ‖⋅‖) is continuous. So, it follows that the norms ‖⋅‖ and |⋅| are equivalent; see Remark 3.2.12.

Recall that a continuous map f : X → Y has a closed graph Gr f = {(x, y) ∈ X × Y : y = f(x)}. The converse is not true in general. To see this, let X = Y = ℝ+ and consider the function f : ℝ+ → ℝ+ defined by

$$f(x) = \begin{cases} 0 & \text{if } x = 0, \\ \dfrac{1}{x} & \text{if } x > 0. \end{cases}$$

Then Gr f is closed but f is not continuous at x = 0. For linear operators between Banach spaces, the situation changes and we have the third basic theorem of linear functional analysis, which is called the “Closed Graph Theorem.”
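The claim about the function f with f(0) = 0 and f(x) = 1/x for x > 0 can be verified directly. The following is a short sketch of that verification, written out as a worked argument; it is an added illustration, not part of the original proof.

```latex
% Claim: Gr f is closed in \mathbb{R}_+ \times \mathbb{R}_+ although f is discontinuous at 0.
% Let (x_n, f(x_n)) \to (x, y) in \mathbb{R}_+ \times \mathbb{R}_+ .
\begin{itemize}
  \item If $x > 0$, then $x_n > 0$ for all large $n$, so
        $f(x_n) = \frac{1}{x_n} \to \frac{1}{x}$ and hence $y = f(x)$.
  \item If $x = 0$ and $x_n > 0$ for infinitely many $n$, then along these
        indices $f(x_n) = \frac{1}{x_n} \to +\infty$, which contradicts
        $f(x_n) \to y \in \mathbb{R}_+$. So $x_n = 0$ for all large $n$,
        hence $f(x_n) = 0$ and $y = 0 = f(0)$.
\end{itemize}
% In both cases (x, y) \in \operatorname{Gr} f, so the graph is closed.
% Discontinuity at 0: x_n = \frac{1}{n} \to 0, but f(x_n) = n \not\to 0 = f(0).
```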


Theorem 3.2.14 (Closed Graph Theorem). If X, Y are Banach spaces and A : X → Y is a linear operator, then A ∈ L(X, Y) if and only if Gr A = {(x, y) ∈ X × Y : y = A(x)} ⊆ X × Y is closed.

Proof. ⇒: The graph of any continuous map (linear or not) is closed.

⇐: On X we consider the following norms

$$\|x\| = \|x\|_X \quad \text{and} \quad |x| = \|x\|_X + \|A(x)\|_Y \quad \text{for all } x \in X.$$

Note that | ⋅ | is called the graph norm. Since Gr A ⊆ X × Y is closed, (X, | ⋅ |) is a Banach space. Moreover, the inequality ‖x‖ ≤ |x| for all x ∈ X is clearly satisfied. Invoking Proposition 3.2.13, we conclude that ‖ ⋅ ‖ and | ⋅ | are equivalent norms. Thus, there exists M > 0 such that |x| ≤ M‖x‖ for all x ∈ X, which implies ‖A(x)‖Y ≤ M‖x‖X for all x ∈ X. Then Proposition 3.1.46 finally gives A ∈ L(X, Y).

We can apply these results to quotient spaces (see Section 1.3), which in turn will lead us to complemented spaces. So, let X be a normed vector space and V ⊆ X a closed subspace. We define the equivalence relation ∼ on X by

$$x \sim u \quad \text{if and only if} \quad x - u \in V. \tag{3.2.9}$$

Let [x] denote the equivalence class corresponding to x ∈ X. Then [x] = x + V = {x + v : v ∈ V} and let X/V be the quotient space, that is, the set of all equivalence classes under ∼ defined by (3.2.9). So, the whole subspace V is collapsed in the quotient space X/V and identified with the zero vector. The quotient space X/V becomes a vector space under the following operations:

vector addition: [x1] + [x2] = (x1 + V) + (x2 + V) = x1 + x2 + V,

scalar multiplication: λ(x + V) = λx + V,

for all x1, x2, x ∈ X and for all λ ∈ ℝ. As we already mentioned, the zero vector in X/V is 0 + V = V. We can define a norm on X/V by setting ‖[x]‖ = inf[‖x + v‖ : v ∈ V]. It is easy to check that this is a norm on X/V. Note that

$$\|[x]\| = \inf[\|x + v\| : v \in V] = \inf[\|x - v\| : v \in V] \quad \text{for all } x \in X. \tag{3.2.10}$$
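To make the quotient norm concrete, here is a small worked example (an added illustration under the stated assumptions, not from the text): X = ℝ² with the Euclidean norm and V the first coordinate axis.

```latex
% Assumption (illustration only): X = \mathbb{R}^2 with the Euclidean norm,
%   V = \{(t, 0) : t \in \mathbb{R}\}, a closed subspace.
\[
  \|[x]\|
  = \inf_{t \in \mathbb{R}} \|(x_1 + t,\, x_2)\|
  = \inf_{t \in \mathbb{R}} \sqrt{(x_1 + t)^2 + x_2^2}
  = |x_2| ,
\]
% the infimum being attained at t = -x_1. Thus [x] \mapsto x_2 is an isometric
% isomorphism of X/V onto \mathbb{R}, and \|[x]\| = |x_2| \le \|x\| for all x \in X.
```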

Proposition 3.2.15. If X is a normed space and V ⊆ X is a closed subspace, then the following hold:
(a) ‖x‖ ≥ ‖[x]‖ for all x ∈ X;
(b) if x ∈ X and ε > 0, then there exists u ∈ X with u ∼ x, that is [x] = [u], such that ‖u‖ ≤ ‖[x]‖ + ε.

Proof. (a) This is an immediate consequence of (3.2.10) (take v = 0).
(b) Let v ∈ V be such that ‖x − v‖ ≤ d(x, V) + ε = ‖[x]‖ + ε, where d(x, V) = inf[‖x − v‖ : v ∈ V]; see (3.2.10). Set u = x − v ∈ [x]. Then ‖u‖ ≤ ‖[x]‖ + ε.

Remark 3.2.16. Suppose that x, y ∈ X are such that ‖[x − y]‖ < ϑ for some ϑ > 0. Then according to Proposition 3.2.15(b), there exists y′ ∈ X such that [x − y] = [x − y′] and ‖x − y′‖ < ϑ.

Proposition 3.2.17. If X is a Banach space and V ⊆ X is a closed subspace, then X/V is a Banach space as well.

Proof. Suppose that {[xn]}n≥1 ⊆ X/V is a Cauchy sequence. By passing to a subsequence if necessary we may assume that ‖[x n − x n+1 ]‖

0 such that $x + \vartheta B_1^X \subseteq U$, hence $p(x) + \vartheta\, p(B_1^X) \subseteq p(U)$. We claim that $p(B_1^X) = B_1^{X/V} = \{[x] \in X/V : \|[x]\| < 1\}$. To see this, let $x \in B_1^X$. Then ‖p(x)‖ = ‖[x]‖ ≤ ‖x‖ < 1; see Proposition 3.2.15(a). Therefore, $p(B_1^X) \subseteq B_1^{X/V}$. On the other hand, if $[u] \in B_1^{X/V}$, then there is $u' \in B_1^X$ such that p(u′) = [u′] = [u] (see Proposition 3.2.15(b)), and so $B_1^{X/V} \subseteq p(B_1^X)$. Thus finally $p(B_1^X) = B_1^{X/V}$ and so $p(x) + \vartheta B_1^{X/V} \subseteq p(U)$. Hence, p is open.

Proposition 3.2.21. If X, Z are normed spaces, V ⊆ X is a closed subspace and A ∈ L(X, Z) satisfies N(A) = {x ∈ X : A(x) = 0} ⊇ V, then there exists a unique Â ∈ L(X/V, Z) such that A = Â ∘ p.


Proof. The operator Â : X/V → Z defined by Â([x]) = A(x) is well-defined since V ⊆ N(A). Clearly Â is linear and

$$\|\hat{A}([x])\|_Z = \|A(x + v)\|_Z \le \|A\|_L \, \|x + v\|_X \quad \text{for all } v \in V,$$

since V ⊆ N(A). Hence,

$$\|\hat{A}([x])\|_Z \le \|A\|_L \inf[\|x + v\|_X : v \in V] = \|A\|_L \, \|[x]\|.$$

This shows that Â ∈ L(X/V, Z) and A = Â ∘ p. Clearly Â is unique.

Remark 3.2.22. This is a factorization theorem and it can be better remembered if we use the following figure:

[Figure: commutative diagram with A : X → Z, p : X → X/V and Â : X/V → Z, so that A = Â ∘ p.]

Proposition 3.2.23. If X, Z are Banach spaces, A ∈ L(X, Z) is surjective and V = N(A) = {x ∈ X : A(x) = 0}, then X/V and Z are isomorphic, that is, there exists L : X/V → Z being a linear, continuous bijection with a continuous inverse.

Proof. From Proposition 3.2.21 we know that there exists a unique Â ∈ L(X/V, Z) such that A = Â ∘ p. If Â([x]) = Â([u]), then A(x) = A(u) and so x − u ∈ N(A) = V, that is, [x] = [u], which means that Â is one-to-one. Let z ∈ Z and recall that A is surjective. Then we can find x ∈ X such that A(x) = z. Thus, Â([x]) = z, which implies that Â is surjective, that is, a bijection. Since X/V is a Banach space (see Proposition 3.2.17), invoking Theorem 3.2.10, we conclude that Â is an isomorphism.

Definition 3.2.24. Let X be a normed space and let D ⊆ X. The annihilator of D is defined by D⊥ = {x∗ ∈ X∗ : ⟨x∗, d⟩ = 0 for all d ∈ D}. Evidently, D⊥ is a closed vector subspace of X∗.

Using this notion we can characterize the dual of a quotient space.

Proposition 3.2.25. If X is a normed space and V ⊆ X is a closed subspace, then (X/V)∗ and V⊥ are isometrically isomorphic.

Proof. Let l ∈ (X/V)∗ and let x∗ = l ∘ p : X → ℝ. Then x∗ ∈ X∗ and $x^*|_V = 0$. So, x∗ ∈ V⊥. Conversely, let x∗ ∈ V⊥. Then according to Proposition 3.2.21, there exists a unique l ∈ (X/V)∗ such that x∗ = l ∘ p. So, the linear map ξ : (X/V)∗ → V⊥ defined by ξ(l) = l ∘ p is a bijection and

l([x]) = ⟨ξ(l), x⟩ = ⟨ξ(l), x + v⟩ ≤ ‖ξ(l)‖∗ ‖x + v‖ for all v ∈ V.

Thus, taking the infimum over v ∈ V, we obtain

$$\|l\|_{(X/V)^*} \le \|\xi(l)\|_*. \tag{3.2.11}$$

On the other hand, thanks to Proposition 3.2.15(a), one gets ⟨ξ(l), x⟩ = l([x]) ≤ ‖l‖(X/V)∗ ‖[x]‖ ≤ ‖l‖(X/V)∗ ‖x‖. This gives

$$\|\xi(l)\|_* \le \|l\|_{(X/V)^*}. \tag{3.2.12}$$

From (3.2.11) and (3.2.12) we infer that ‖ξ(l)‖∗ = ‖l‖(X/V)∗ and so ξ is an isometric isomorphism.

We present some additional properties of closed subspaces in Banach spaces.

Proposition 3.2.26. If X is a Banach space and V, W ⊆ X are closed subspaces of X such that V + W is closed, then there exists ĉ > 0 such that every u ∈ V + W admits a decomposition u = v + w with v ∈ V and w ∈ W as well as

$$\|v\| \le \hat{c}\,\|u\| \quad \text{and} \quad \|w\| \le \hat{c}\,\|u\|.$$

Proof. We consider the Cartesian product V × W furnished with the norm ‖(v, w)‖ = ‖v‖ + ‖w‖. Moreover, we consider on V + W the norm inherited from X. Let A : V × W → V + W be defined by A((v, w)) = v + w. Evidently, A ∈ L(V × W, V + W) and is surjective. Since V × W and V + W are Banach spaces, invoking the Open Mapping Theorem (see Theorem 3.2.9), there exists c > 0 such that u ∈ V + W with ‖u‖ < c implies u = v + w with v ∈ V, w ∈ W and ‖v‖ + ‖w‖ < 1. By homogeneity (apply this to (c/2)u/‖u‖ when u ≠ 0), every u ∈ V + W can be written as u = v + w with v ∈ V, w ∈ W and ‖v‖ + ‖w‖ ≤ (2/c)‖u‖. Then the result follows with ĉ = 2c⁻¹.

Definition 3.2.27. Let X be a normed space. A closed subspace V ⊆ X is called complemented (or we say that it admits a topological complement), if there exists a closed subspace W ⊆ X such that V ∩ W = {0} and X = V + W (we write X = V ⊕ W). Then we say that V and W are complementary subspaces of X.

The next result shows that finite dimensional subspaces, or subspaces with finite codimension, are complemented.

Proposition 3.2.28. If X is a normed space and V ⊆ X is a closed subspace such that dim V < ∞ or dim (X/V) < ∞, then V is complemented.

Proof. Let n = dim V < ∞ and let $\{e_k\}_{k=1}^n$ be a basis of V. According to Proposition 3.1.49, there exists $\{e_m^*\}_{m=1}^n \subseteq X^*$ such that

$$\langle e_m^*, e_k \rangle = \delta_{mk} = \begin{cases} 1 & \text{if } m = k, \\ 0 & \text{if } m \ne k. \end{cases}$$

Let $W = \{x \in X : \langle e_m^*, x \rangle = 0 \text{ for all } m \in \{1, \ldots, n\}\}$. Clearly W ⊆ X is a closed subspace and X = V ⊕ W since $x - \sum_{m=1}^n \langle e_m^*, x \rangle e_m \in W$ for all x ∈ X.


Next let n = dim (X/V) < ∞. We choose $\{x_k\}_{k=1}^n \subseteq X$ such that $\{[x_k]\}_{k=1}^n$ is a basis of X/V. Then $W = \operatorname{span}\{x_k\}_{k=1}^n \subseteq X$ is closed (see Corollary 3.1.19) and satisfies X = V ⊕ W.

Remark 3.2.29. It is not true that every closed subspace of an infinite dimensional Banach space is complemented. For example, c0 ⊆ l∞ is a closed subspace, but it is not complemented; see Phillips [237]. In fact a result due to Lindenstrauss-Tzafriri [201] says that every Banach space that is not isomorphic to a Hilbert space admits a closed subspace that is not complemented.
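The decomposition X = V ⊕ W used in the proof of Proposition 3.2.28 for dim V < ∞ can be written out through a projection. The sketch below is an added elaboration; the symbol P is introduced here only for illustration and does not appear in the text.

```latex
% Setting: \{e_k\}_{k=1}^n is a basis of V and \{e_m^*\}_{m=1}^n \subseteq X^*
% satisfies \langle e_m^*, e_k \rangle = \delta_{mk}.
% Hypothetical notation: P : X \to X,
\[
  P(x) = \sum_{m=1}^{n} \langle e_m^*, x \rangle\, e_m .
\]
% (1) P(e_k) = \sum_m \delta_{mk} e_m = e_k, hence P v = v for all v \in V and P^2 = P.
% (2) \langle e_j^*, x - P(x) \rangle
%       = \langle e_j^*, x \rangle - \sum_m \langle e_m^*, x \rangle \delta_{jm} = 0,
%     hence x - P(x) \in W and x = P(x) + (x - P(x)) \in V + W.
% (3) If v = \sum_k \lambda_k e_k \in V \cap W, then
%     \lambda_m = \langle e_m^*, v \rangle = 0 for all m, hence V \cap W = \{0\}.
```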

3.3 Weak and Weak* Topologies

In this section we study the weak topology on a normed space X and the weak* topology on X∗, which is always a Banach space; see Corollary 3.1.48. These are locally convex topologies and are special cases of the weak topologies introduced in Definition 1.3.1 when Yi = ℝ for all i ∈ I and {fi}i∈I = X∗ (for the weak topology) as well as {fi}i∈I = X (for the weak* topology). The strong (norm) topology on an infinite dimensional normed space is too strong for many purposes. In particular, note that a strongly compact set in an infinite dimensional normed space has an empty interior. Indeed, if this is not the case, then the space is locally compact, hence by Proposition 3.1.24, it is finite dimensional, a contradiction. The main result of this section is “Alaoglu’s Theorem” (see Theorem 3.3.38), which says that the closed unit ball in the dual space X∗ is compact for the relative weak* topology. This result is reminiscent of the classical Heine–Borel Theorem; see Theorem 1.5.38.

Definition 3.3.1. Let X be a normed space. The weak topology on X is the weakest topology on X with respect to which every element x∗ ∈ X∗ (x∗ : X → ℝ being norm continuous and linear) is continuous. We denote the weak topology by w(X, X∗) or simply by w.

Remark 3.3.2. As we already mentioned, the w-topology is a particular case of the weak (initial) topology introduced in Definition 1.3.1 when the initial space is X (the normed space), Yi = ℝ for all i ∈ I, I = X∗ and fx∗ : X → ℝ with x∗ ∈ X∗ = I is the linear functional fx∗(x) = ⟨x∗, x⟩. Recall that ⟨⋅, ⋅⟩ denotes the duality brackets for the pair (X∗, X). Evidently the weak topology w is weaker than the norm (metric) topology on X.

Proposition 3.3.3. The weak topology w(X, X∗) is Hausdorff.

Proof. From Corollary 3.1.61, we know that the family {fx∗}x∗∈X∗ is separating and so Proposition 1.3.7 implies that w(X, X∗) is Hausdorff.
The weak topology on X is clearly linear, that is, both operations, vector addition and scalar multiplication, are continuous. Moreover, it is locally convex; see Theorem 3.1.40. Note that ℝ is regular and recall that regularity is hereditary and topological (see Proposition 1.2.10), and it is preserved in Cartesian products; see Proposition 1.3.13. Therefore, we can improve Proposition 3.3.3 in the following way.

Proposition 3.3.4. The weak topology w(X, X∗) is regular; see Definition 1.2.7.

Remark 3.3.5. In fact for the same reasons, w(X, X∗) is completely regular; see Definition 1.2.19.

The linearity of the weak topology implies that in order to describe it we only need to specify a local basis at the origin. Then by translation we obtain a local basis at any other point. Remark 1.3.2 allows us to give a precise description of the local basis at the origin.

Proposition 3.3.6. A typical basic weak neighborhood of the origin is given by

$$U(0; x_1^*, \ldots, x_n^*, \varepsilon) = \{x \in X : |\langle x_k^*, x \rangle| < \varepsilon \text{ for all } k = 1, \ldots, n\}$$

with $\{x_k^*\}_{k=1}^n \subseteq X^*$, n ∈ ℕ and ε > 0. As ε > 0, n ∈ ℕ and $\{x_k^*\}_{k=1}^n$ vary, we obtain a local basis for the weak topology at the origin. At any other point x0 ∈ X the local basis consists of sets of the form

$$x_0 + U(0; x_1^*, \ldots, x_n^*, \varepsilon) = \{x \in X : |\langle x_k^*, x - x_0 \rangle| < \varepsilon \text{ for all } k = 1, \ldots, n\}.$$

In infinite dimensional normed spaces the weak topology and the strong (norm) topology never coincide. To see this we will need to recall some simple facts from linear algebra. The first is an algebraic variant of the factorization result stated in Proposition 3.2.21.

Lemma 3.3.7. If X, Y, Z are vector spaces, f : X → Z and g : X → Y are linear maps and N(g) ⊆ N(f), where N(g) = {x ∈ X : g(x) = 0}, N(f) = {x ∈ X : f(x) = 0}, then there exists a linear map ξ : Y → Z such that f = ξ ∘ g.

Proof. Let ξ : g(X) → Z be defined by ξ(g(x)) = f(x) for all x ∈ X. This linear map is well-defined since if g(x1) = g(x2), then x1 − x2 ∈ N(g) ⊆ N(f) and so f(x1) = f(x2). Extending ξ to a linear map on all of Y gives f = ξ ∘ g.

Using this lemma, we can prove the second auxiliary result from linear algebra.

Lemma 3.3.8. If X is a vector space, f, f1, . . . , fn : X → ℝ are linear maps and $\bigcap_{k=1}^n N(f_k) \subseteq N(f)$, then f is a linear combination of the fk’s.

Proof. Apply Lemma 3.3.7 with Y = ℝⁿ, Z = ℝ and $g = (f_k)_{k=1}^n : X \to \mathbb{R}^n$, noting that $N(g) = \bigcap_{k=1}^n N(f_k)$, to produce a linear functional ξ : ℝⁿ → ℝ such that f = ξ ∘ g. Then $\xi(\hat{y}) = \sum_{k=1}^n \lambda_k y_k$ with λ1, . . . , λn ∈ ℝ, $\hat{y} = (y_k)_{k=1}^n \in \mathbb{R}^n$. It follows that $f(x) = \sum_{k=1}^n \lambda_k f_k(x)$ for all x ∈ X.

These auxiliary results lead to the following important observations about the weak topology.

Proposition 3.3.9. If X is an infinite dimensional normed space and U ⊆ X is nonempty and w-open, then U is not bounded.


Proof. Translating U if necessary, we may assume that 0 ∈ U. By Proposition 3.3.6 there exist x∗1, . . . , x∗n ∈ X∗ and ε > 0 such that U(0; x∗1, . . . , x∗n, ε) ⊆ U. Note that $V = \bigcap_{k=1}^n N(x_k^*) \subseteq U$. Of course, V is a vector subspace of X and we claim that V ≠ {0}. Indeed, if V = {0}, then it holds that V ⊆ N(x∗) for all x∗ ∈ X∗ and so Lemma 3.3.8 implies that every x∗ ∈ X∗ is a linear combination of the x∗k’s. This means that $X^* = \operatorname{span}\{x_k^*\}_{k=1}^n$ and so X∗ is finite dimensional, and hence X is finite dimensional, a contradiction. Therefore U is not bounded since it contains the nontrivial subspace V.

Remark 3.3.10. This proposition implies that weakly open sets are large. In particular, if x ∈ V (see the previous proof), x ≠ 0, then ℝx ⊆ U. Therefore the open unit ball B1 = {x ∈ X : ‖x‖ < 1} is never w-open in an infinite dimensional normed space X.

Corollary 3.3.11. If X is an infinite dimensional normed space, then the weak and strong (norm) topology do not coincide.

In finite dimensional normed spaces, which are then of course Banach spaces, the two topologies coincide.

Proposition 3.3.12. If X is a finite dimensional normed space, then the weak topology and the strong (norm) topology coincide.

Proof. By definition, the weak topology is smaller than the strong topology. So, in order to prove the proposition, it suffices to show that every strongly open set is weakly open. Let x0 ∈ X and let U be a strongly open set containing x0. Then there exists ϱ > 0 such that

$$B_\varrho(x_0) = \{x \in X : \|x - x_0\| < \varrho\} \subseteq U. \tag{3.3.1}$$

Let $\{e_k\}_{k=1}^n$ be a basis for X with ‖ek‖ = 1 for all k = 1, . . . , n. Then every x ∈ X admits an expression $x = \sum_{k=1}^n \lambda_k e_k$ with λk ∈ ℝ. For every k = 1, . . . , n the coordinate map x ↦ λk, denoted by x∗k, is linear and continuous. We consider U(x0; x∗1, . . . , x∗n, ϱ/n), the basic weak neighborhood of x0 determined by these coordinate maps. Then it follows that

$$\|x - x_0\| \le \sum_{k=1}^n |\langle x_k^*, x - x_0 \rangle| < n\,\frac{\varrho}{n} = \varrho \quad \text{for all } x \in U\Big(x_0; x_1^*, \ldots, x_n^*, \frac{\varrho}{n}\Big),$$

which implies

$$U\Big(x_0; x_1^*, \ldots, x_n^*, \frac{\varrho}{n}\Big) \subseteq B_\varrho(x_0) \subseteq U,$$

see (3.3.1). That means that U is w-open and so the two topologies coincide.

In what follows, we denote the convergence in the weak topology by $\xrightarrow{w}$ and the convergence in the strong (norm) topology by →.

Proposition 3.3.13. If X is a normed space and {xα}α∈I ⊆ X is a net, then the following hold:
(a) $x_\alpha \xrightarrow{w} x$ if and only if ⟨x∗, xα⟩ → ⟨x∗, x⟩ for all x∗ ∈ X∗;
(b) xα → x implies $x_\alpha \xrightarrow{w} x$;
(c) $x_\alpha \xrightarrow{w} x$ implies $\|x\| \le \liminf_{\alpha \in I} \|x_\alpha\|$ and a weakly convergent sequence is norm bounded;
(d) $x_\alpha \xrightarrow{w} x$ in X and x∗α → x∗ in X∗ imply ⟨x∗α, xα⟩ → ⟨x∗, x⟩.

Proof. (a) This is a consequence of Proposition 1.3.3.
(b) For every x∗ ∈ X∗, we have |⟨x∗, xα⟩ − ⟨x∗, x⟩| = |⟨x∗, xα − x⟩| ≤ ‖x∗‖∗ ‖xα − x‖ → 0.
(c) Suppose that there is a sequence {xn}n∈ℕ ⊆ X such that $x_n \xrightarrow{w} x$. Then, it follows ⟨x∗, xn − x⟩ → 0 for all x∗ ∈ X∗, which implies supn∈ℕ |⟨x∗, xn − x⟩| < ∞ for every x∗ ∈ X∗. Taking Theorem 3.2.1 into account there exists M > 0 such that ‖xn‖ ≤ M for all n ∈ ℕ. For the inequality, evidently we may assume that x ≠ 0. According to Proposition 3.1.50, there exists x̂∗ ∈ X∗ with ‖x̂∗‖∗ = 1 such that ⟨x̂∗, x⟩ = ‖x‖. So, ‖x‖ = limα∈I |⟨x̂∗, xα⟩|. Then, for given ε > 0 we can find α0 = α0(ε) ∈ I such that

$$\|x\| - \varepsilon \le |\langle \hat{x}^*, x_\alpha \rangle| \le \|x_\alpha\| \quad \text{for all } \alpha \ge \alpha_0.$$

Hence, $\|x\| \le \liminf_{\alpha \in I} \|x_\alpha\|$.
(d) Applying part (c), we derive, for some M > 0 and for every α ∈ I, that |⟨x∗α, xα⟩ − ⟨x∗, x⟩| ≤ |⟨x∗α − x∗, xα⟩| + |⟨x∗, xα − x⟩| ≤ ‖x∗α − x∗‖∗ M + |⟨x∗, xα − x⟩| → 0. Thus, ⟨x∗α, xα⟩ → ⟨x∗, x⟩.

Remark 3.3.14. We emphasize that the boundedness in Proposition 3.3.13(c) holds only for weakly convergent sequences and it fails for nets. Indeed, every infinite dimensional normed space admits a net {xα}α∈I ⊆ X such that $x_\alpha \xrightarrow{w} 0$ in X and sup[‖xη‖ : η ≥ α, η ∈ I] = +∞ for every α ∈ I. To see this let E denote the collection of all nonempty finite subsets of X∗. This set is directed by set inclusion, that is, if α, η ∈ E, then α ≥ η if and only if α ⊇ η. For each $\alpha = (x_k^*)_{k=1}^n \in E$ there exists some $x_\alpha \in \bigcap_{k=1}^n N(x_k^*)$ such that ‖xα‖ = card α. The net {xα}α∈E has the desired properties.

The weak topology is not metrizable in general and so sequences are not adequate to describe it. In fact we have the following result.

Proposition 3.3.15. If X is a normed space and the weak topology on X is metrizable, then X is finite dimensional.

Proof. Since the weak topology is metrizable, it is first countable. Hence, we can find a sequence {x∗n}n≥1 ⊆ X∗ such that for any given U ∈ Nw(0), with Nw(0) being the filter of weak neighborhoods of the origin, there exist ε ∈ (0, 1) ∩ ℚ and nU ∈ ℕ such that

$$U(0; x_1^*, \ldots, x_{n_U}^*, \varepsilon) \subseteq U. \tag{3.3.2}$$

For each x∗ ∈ X∗, we have U(0; x∗, 1) ∈ Nw(0) and so by (3.3.2) it follows that

$$U(0; x_1^*, \ldots, x_{n(U(0; x^*, 1))}^*, \varepsilon) \subseteq U(0; x^*, 1).$$

Then

$$\bigcap_{k=1}^{n(U(0; x^*, 1))} N(x_k^*) \subseteq N(x^*),$$

which, due to Lemma 3.3.8, results in

$$x^* \in \operatorname{span}\{x_k^*\}_{k=1}^{n(U(0; x^*, 1))}.$$

Since x∗ ∈ X∗ is arbitrary, it follows that X∗ = ⋃k≥1 Vk with each Vk being finite dimensional. Recall that X∗ is a Banach space. So, invoking Corollary 1.5.67 we see that int Vk0 ≠ ∅ for some k0 ∈ ℕ. This means that Vk0 = X∗ and so X∗ is finite dimensional. Hence X is finite dimensional.

In what follows, we define for a normed space X

$$\overline{B}_1 = \{x \in X : \|x\| \le 1\} \quad \text{and} \quad \partial B_1 = \{x \in X : \|x\| = 1\}.$$

Both sets are strongly closed. However, the situation changes for the weak topology. This is another illustration of the character of the weak topology compared with the strong (norm) topology, in the case of infinite dimensional normed spaces, of course; see Proposition 3.3.12.

Proposition 3.3.16. If X is an infinite dimensional normed space, then $\overline{\partial B_1}^{\,w} = \overline{B}_1$.

Proof. First we point out that the set $\overline{B}_1$ is w-closed. Indeed, if $\{x_\alpha\}_{\alpha \in I} \subseteq \overline{B}_1$ is a net such that $x_\alpha \xrightarrow{w} x$, then from Proposition 3.3.13(c) one gets $\|x\| \le \liminf_{\alpha \in I} \|x_\alpha\| \le 1$. Hence $x \in \overline{B}_1$ and so $\overline{B}_1$ is w-closed. It follows that

$$\overline{\partial B_1}^{\,w} \subseteq \overline{B}_1. \tag{3.3.3}$$

Next let x0 ∈ B1 = {x ∈ X : ‖x‖ < 1} and take U ∈ Nw(x0), with Nw(x0) being the filter of weak neighborhoods of x0. We may always assume that U is basic, that is,

$$U = U(x_0; x_1^*, \ldots, x_n^*, \varepsilon) \quad \text{with } \{x_k^*\}_{k=1}^n \subseteq X^* \text{ and } \varepsilon > 0.$$

We fix $u \in \bigcap_{k=1}^n N(x_k^*)$, u ≠ 0 (see the proof of Proposition 3.3.9) and consider the function ξ : ℝ+ → ℝ+ defined by ξ(λ) = ‖x0 + λu‖ for all λ ≥ 0. We see that ξ is continuous, ξ(0) < 1 and limλ→+∞ ξ(λ) = +∞. So, by Bolzano’s Theorem there exists λ0 > 0 such that ξ(λ0) = ‖x0 + λ0u‖ = 1, hence x0 + λ0u ∈ ∂B1. Moreover, for every k = 1, . . . , n we obtain |⟨x∗k, x0 + λ0u − x0⟩| = 0, which shows that x0 + λ0u ∈ ∂B1 ∩ U. Therefore it follows that $B_1 \subseteq \overline{\partial B_1}^{\,w}$ and since the weak topology is smaller we infer that $\overline{B}_1 \subseteq \overline{B_1}^{\,w} \subseteq \overline{\partial B_1}^{\,w}$. Finally, because of (3.3.3), we conclude that $\overline{\partial B_1}^{\,w} = \overline{B}_1$.

Remark 3.3.17. Consider the infinite dimensional Banach space $l^1 = \{\hat{x} = (x_n)_{n \ge 1} \in \mathbb{R}^{\mathbb{N}} : \sum_{n \ge 1} |x_n| < \infty\}$, which is called the space of all absolutely summable sequences in ℝ. One can show that weak and norm convergent sequences coincide in l1. This is known as “Schur’s Theorem” and its proof can be found in the book of Diestel [79, p. 85].
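A standard example may help to contrast this with Schur's Theorem. The sketch below is an added illustration; it assumes the usual identifications (l²)∗ = l² and (l¹)∗ = l∞, which are not established in this section, and writes e_n for the n-th unit sequence.

```latex
% Assumptions (illustration only): (l^2)^* = l^2, (l^1)^* = l^\infty,
%   e_n = (0, \ldots, 0, 1, 0, \ldots) with the 1 in position n.
% In l^2: for every y = (y_k)_{k \ge 1} \in l^2,
\[
  \langle y, e_n \rangle = y_n \to 0
  \quad \text{since } \sum_{k \ge 1} |y_k|^2 < \infty ,
\]
% so e_n \xrightarrow{w} 0 although \|e_n\|_2 = 1 for all n. In particular
% 0 lies in the weak closure of \partial B_1 (compare Proposition 3.3.16), and
% the inequality in Proposition 3.3.13(c) is strict:
%   0 = \|0\| < \liminf_n \|e_n\|_2 = 1 .
% In l^1: with y = (1, 1, 1, \ldots) \in l^\infty one gets
\[
  \langle y, e_n \rangle = 1 \not\to 0 ,
\]
% so e_n does not converge weakly to 0 in l^1, consistent with Schur's Theorem.
```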

Our previous discussion of the weak topology has established that in an infinite dimensional normed space there are many more strongly closed sets than there are weakly closed sets. In the next theorem we show that for convex sets both notions agree. This is a remarkable result since a purely algebraic property, namely convexity, leads to a purely topological conclusion, namely that weak and strong closures coincide. The result is known as “Mazur’s Theorem.”

Theorem 3.3.18 (Mazur’s Theorem). If X is a normed space and C ⊆ X is convex, then $\overline{C} = \overline{C}^{\,w}$.

Proof. Since the strong (norm) topology is larger than the weak topology we directly obtain

$$\overline{C} \subseteq \overline{C}^{\,w}. \tag{3.3.4}$$

Arguing by contradiction suppose that the inclusion in (3.3.4) is strict. That means there exists $x_0 \in \overline{C}^{\,w} \setminus \overline{C}$. Invoking the Strong Separation Theorem (see Theorem 3.1.60), we find x∗ ∈ X∗ \ {0} and ε > 0 such that

$$\langle x^*, x_0 \rangle + \varepsilon \le \langle x^*, u \rangle \quad \text{for all } u \in C.$$

We set ϑ = inf[⟨x∗, u⟩ : u ∈ C] and U = {x ∈ X : ⟨x∗, x⟩ < ϑ}. Evidently U ∈ Nw(x0), with Nw(x0) being the filter of weak neighborhoods of x0. Then U ∩ C = ∅ and so $x_0 \notin \overline{C}^{\,w}$, a contradiction. Therefore from (3.3.4) we conclude that $\overline{C} = \overline{C}^{\,w}$.

Corollary 3.3.19. If X is a normed space and V ⊆ X is a vector subspace, then $\overline{V} = \overline{V}^{\,w}$.

Corollary 3.3.20. If X is a normed space and $x_n \xrightarrow{w} x$, then there exists a sequence {un}n≥1 ⊆ X consisting of convex combinations of the xn’s such that un → x in X.

Proof. Let C = conv {xn}n≥1. Theorem 3.3.18 gives $x \in \overline{C}^{\,w} = \overline{C}$ and so $x \in \overline{\operatorname{conv}}\,\{x_n\}_{n \ge 1}$. The result follows.

Remark 3.3.21. This corollary, known as “Mazur’s Lemma,” says that if $x_n \xrightarrow{w} x$, then for a given ε > 0 there exist m ∈ ℕ and t1, . . . , tm ≥ 0 such that $\sum_{k=1}^m t_k = 1$ and $\big\|x - \sum_{k=1}^m t_k x_k\big\| < \varepsilon$.

Corollary 3.3.22. If X is a normed space and C ⊆ X is convex, then C is closed if and only if C is w-closed.

The next result is a consequence of the projective character of the weak topology.

Proposition 3.3.23. If X, Y are normed spaces and A : X → Y is linear, then A ∈ L(X, Y) if and only if A is weak-to-weak continuous.

Proof. Note that A ∈ L(X, Y) if and only if $A(\overline{B}_1^X) \subseteq Y$ is bounded with $\overline{B}_1^X = \{x \in X : \|x\|_X \le 1\}$; see Proposition 3.1.46. From Proposition 3.2.4 we know that $A(\overline{B}_1^X) \subseteq Y$ is bounded if and only if $y^*(A(\overline{B}_1^X)) \subseteq \mathbb{R}$ is bounded for every y∗ ∈ Y∗. But a linear functional on a normed space is continuous if and only if it is weakly continuous. Invoking Proposition 1.3.4 we conclude that A is continuous if and only if it is weak-to-weak continuous.
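Mazur's Lemma (Corollary 3.3.20 and Remark 3.3.21) can be seen at work in a simple case. The sketch below is an added illustration; it assumes X = l² with (l²)∗ = l², where the unit sequences e_n converge weakly, but not strongly, to 0.

```latex
% Assumptions (illustration only): X = l^2, (l^2)^* = l^2, e_n the n-th unit sequence.
% Then \langle y, e_n \rangle = y_n \to 0 for every y \in l^2, i.e. e_n \xrightarrow{w} 0,
% while \|e_n\|_2 = 1. Consider the convex combinations
\[
  u_N = \frac{1}{N} \sum_{k=1}^{N} e_k ,
  \qquad
  \|u_N\|_2 = \Big( \sum_{k=1}^{N} \frac{1}{N^2} \Big)^{1/2} = \frac{1}{\sqrt{N}} \to 0 .
\]
% So u_N \to 0 in norm, as predicted by Mazur's Lemma with t_k = 1/N.
```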


From Proposition 3.2.4 we have the following result about bounded sets.

Proposition 3.3.24. If X is a normed space and A ⊆ X, then A is bounded if and only if A is w-bounded.

Remark 3.3.25. We can formulate this result in a more general form. We say that a locally convex topology τ on X is compatible with the pair (X∗, X) if and only if $(X_\tau)^* = X^*$. Then A ⊆ X is bounded if and only if A is τ-bounded. In short, we can say that boundedness is duality invariant.

On the dual space X∗ we can define two topologies. The first is the usual strong (metric) topology induced by the norm and the second is the weak topology w = w(X∗, X∗∗). Recall that the weak topology w is the weakest topology on X∗ such that $(X^*_w)^* = X^{**}$. There is a third topology that we can define, known as the w∗-topology. This topology makes sense only on dual spaces.

Definition 3.3.26. Let X be a normed space and let X∗ be its topological dual, that is, X∗ = L(X, ℝ). The weak* topology on X∗ is the weakest topology w∗ on X∗ such that $(X^*_{w^*})^* = X$. Consider now the linear functional fx : X∗ → ℝ defined by fx(x∗) = ⟨x∗, x⟩. Then the weak* topology is the weakest topology on X∗ making the collection {fx}x∈X of maps from X∗ into ℝ continuous. The weak* topology on X∗ is denoted by w∗ or by w(X∗, X).

Remark 3.3.27. Since X ⊆ X∗∗ it is clear that w∗ ⊆ w, that is, the weak* topology has fewer open (resp. closed) sets than the weak topology.

Similarly to the weak topology (see Proposition 3.3.4 and Remark 3.3.5), we have the following result.

Proposition 3.3.28. If X is a normed space, then X∗, equipped with the weak* topology, is a completely regular locally convex space.

Moreover, we obtain the next two propositions as a consequence of Proposition 3.3.12.

Proposition 3.3.29. If X is a normed space, then the w∗, the w, and the strong topologies on X∗ coincide if and only if X is finite dimensional.

Proposition 3.3.30. If X is a normed space, then the basic weak* neighborhood of the origin has the form

$$U(0; x_1, \ldots, x_n, \varepsilon) = \{x^* \in X^* : |\langle x^*, x_k \rangle| < \varepsilon \text{ for all } k = 1, \ldots, n\}$$

with $\{x_k\}_{k=1}^n \subseteq X$, n ∈ ℕ and ε > 0. Since the weak* topology is linear, we obtain the local basis at any other point by translation.

In what follows we denote the convergence in the weak* topology by $\xrightarrow{w^*}$. The proof of Proposition 3.3.13 gives the following result.

Proposition 3.3.31. If X is a normed space and {x∗α}α∈I ⊆ X∗ is a net, then the following hold:
(a) $x_\alpha^* \xrightarrow{w^*} x^*$ if and only if ⟨x∗α, x⟩ → ⟨x∗, x⟩ for all x ∈ X;
(b) x∗α → x∗ or $x_\alpha^* \xrightarrow{w} x^*$ implies $x_\alpha^* \xrightarrow{w^*} x^*$;
(c) $x_\alpha^* \xrightarrow{w^*} x^*$ implies $\|x^*\|_* \le \liminf_{\alpha \in I} \|x_\alpha^*\|_*$ and every weakly* convergent sequence is norm bounded;
(d) $x_\alpha^* \xrightarrow{w^*} x^*$ and xα → x in X imply ⟨x∗α, xα⟩ → ⟨x∗, x⟩.

Remark 3.3.32. From the definition of the weak* topology, we see that any linear functional f : X∗ → ℝ, which is continuous for the w∗-topology, has the form f(x∗) = ⟨x∗, x̂⟩ for some x̂ ∈ X.

Proposition 3.3.33. If X is a normed space and H ⊆ X∗ is a w∗-closed hyperplane, then there exist x̂ ∈ X, x̂ ≠ 0, and ϑ ∈ ℝ such that H = {x∗ ∈ X∗ : ⟨x∗, x̂⟩ = ϑ}.

Proof. We know that H = {x∗ ∈ X∗ : f(x∗) = ϑ} with f : X∗ → ℝ being linear and ϑ ∈ ℝ; see Definition 3.1.53. Since by hypothesis H is w∗-closed, Proposition 3.1.54 implies that f is w∗-continuous. Finally, using Remark 3.3.32, we conclude that there exists x̂ ∈ X such that H = {x∗ ∈ X∗ : ⟨x∗, x̂⟩ = ϑ}.

Recall that every x ∈ X defines in a natural way a linear functional fx : X∗ → ℝ according to the formula fx(x∗) = ⟨x∗, x⟩. Indeed, we see that |fx(x∗)| = |⟨x∗, x⟩| ≤ ‖x∗‖∗ ‖x‖, which shows that fx is bounded, that is, fx ∈ X∗∗, and ‖fx‖∗∗ ≤ ‖x‖. Thus we can define the map j : X → X∗∗ by j(x) = fx. Clearly j is linear, injective, and ‖j(x)‖∗∗ ≤ ‖x‖ for all x ∈ X. Additional information about this map is supplied by the next proposition.

Proposition 3.3.34. If X is a normed space and j : X → X∗∗ is the linear map defined above, then j is an isometric isomorphism onto j(X).

Proof. We already noted that j is linear and that ‖j(x)‖∗∗ ≤ ‖x‖ for all x ∈ X. On the other hand, from Proposition 3.1.50, we know that for every x ∈ X there exists x∗ ∈ X∗ such that ‖x∗‖∗ = 1 and j(x)(x∗) = ⟨x∗, x⟩ = ‖x‖. This shows that ‖j(x)‖∗∗ ≥ ‖x‖ for all x ∈ X. Hence, j is an isometry.

Definition 3.3.35. The isometry j : X → X∗∗ of Proposition 3.3.34 is called the canonical embedding of the normed space X into X∗∗.

Remark 3.3.36. Using the canonical embedding we can identify X with a subspace of X∗∗. Moreover, $\overline{j(X)}$ is a closed subspace of the Banach space X∗∗. Hence, $V = \overline{j(X)}$ is a Banach space as well. Therefore j is an isometric isomorphism onto a dense subset of the Banach space V. Hence, the canonical embedding provides a shortcut to the completion of a normed space. Every normed space can be viewed as a dense subspace of a Banach space.

When the canonical embedding j is not surjective, then the weak topology w(X∗, X∗∗) is strictly larger than the weak* topology. Indeed let û ∈ X∗∗ \ j(X)


and consider the subspace H = {x∗ ∈ X∗ : ⟨û, x∗⟩ = 0}. Then H is w-closed, but it is not w∗-closed; see Proposition 3.3.33. In fact this example shows that Mazur’s Theorem (see Theorem 3.3.18) fails for the w∗-topology. A strongly closed convex set need not be w∗-closed. Moreover, a normed space and its completion have the same dual space; however, their weak* topologies differ in general. So, one should be careful when dealing with the weak* topology of the dual of a normed space and that of the dual of the Banach space resulting from its completion.

Since X can be viewed as a subspace of X∗∗, it is natural to ask what kind of subspace it is. The answer is given by the so-called “Goldstine’s Theorem.” In what follows we set

$$B_1^X = \{x \in X : \|x\| < 1\}, \qquad \overline{B}_1^X = \{x \in X : \|x\| \le 1\},$$

$$B_1^{X^{**}} = \{x^{**} \in X^{**} : \|x^{**}\|_{**} < 1\}, \qquad \overline{B}_1^{X^{**}} = \{x^{**} \in X^{**} : \|x^{**}\|_{**} \le 1\}.$$

Theorem 3.3.37 (Goldstine’s Theorem). If X is a normed space, then $\overline{j(B_1^X)}^{\,w^*} = \overline{B}_1^{X^{**}}$ and $\overline{j(X)}^{\,w^*} = X^{**}$.

Proof. Clearly, the second equality is a consequence of the first. So, let us prove the first one. Since j is an isometry and $\overline{B}_1^{X^{**}}$ is w∗-closed (compare Proposition 3.3.31(c)), we have $\overline{j(B_1^X)}^{\,w^*} \subseteq \overline{B}_1^{X^{**}}$. For the reverse inclusion, let $x^{**} \in X^{**} \setminus \overline{j(B_1^X)}^{\,w^*}$. Since $\overline{j(B_1^X)}^{\,w^*} \subseteq X^{**}$ is convex and w∗-closed, by the Strong Separation Theorem (see Corollary 3.1.61), there exists $x^* \in (X^{**}_{w^*})^* = X^*$ with x∗ ≠ 0 such that

$$\sup\big[\langle x^*, u^{**} \rangle : u^{**} \in \overline{j(B_1^X)}^{\,w^*}\big] < \langle x^*, x^{**} \rangle. \tag{3.3.5}$$

We may always assume that ‖x∗‖∗ = 1. Then, from (3.3.5), we have

$$1 = \|x^*\|_* < \langle x^*, x^{**} \rangle \le \|x^*\|_* \, \|x^{**}\|_{**}.$$

Hence, 1 < ‖x∗∗‖∗∗ and so $\overline{j(B_1^X)}^{\,w^*} = \overline{B}_1^{X^{**}}$.
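Goldstine's Theorem can be checked by hand in a classical case. The following sketch is an added illustration; it assumes the standard identifications (c0)∗ = l¹ and (l¹)∗ = l∞, under which the canonical embedding j is the inclusion c0 ⊆ l∞. None of these identifications is proved in this section.

```latex
% Assumptions (illustration only): X = c_0, X^* = l^1, X^{**} = l^\infty,
%   and j : c_0 \to l^\infty is the inclusion.
% Let y = (y_n)_{n \ge 1} \in l^\infty with \|y\|_\infty \le 1 and define
\[
  y^{(N)} = \Big(1 - \frac{1}{N}\Big)\,(y_1, \ldots, y_N, 0, 0, \ldots) \in c_0 ,
  \qquad \|y^{(N)}\|_\infty \le 1 - \frac{1}{N} < 1 .
\]
% For every a = (a_n)_{n \ge 1} \in l^1 we obtain
\[
  |\langle a, y - y^{(N)} \rangle|
  \le \sum_{n > N} |a_n|\,|y_n| + \frac{1}{N} \sum_{n \le N} |a_n|\,|y_n|
  \le \sum_{n > N} |a_n| + \frac{1}{N}\,\|a\|_1 \;\to\; 0 .
\]
% Hence y^{(N)} \xrightarrow{w^*} y, so every element of the closed unit ball of
% l^\infty is a w^*-limit of elements of the open unit ball of c_0.
```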

The weaker a topology is, the more compact sets it has. The next theorem is the most important feature of the weak* topology. It is reminiscent of the Heine–Borel Theorem and it is the reason why the weak* topology is important in the theory of Banach spaces. The result is known as “Alaoglu’s Theorem.”

Theorem 3.3.38 (Alaoglu’s Theorem). If X is a normed space, then $\overline{B}_1^{X^*} = \{x^* \in X^* : \|x^*\|_* \le 1\}$ is w∗-compact. More generally every w∗-closed and bounded subset of X∗ is w∗-compact.

Proof. Suppose that $x^* \in \overline{B}_1^{X^*}$. Then for each $x \in \overline{B}_1^X$ it follows |⟨x∗, x⟩| ≤ 1. Therefore

$$x^*(\overline{B}_1^X) \subseteq I = \{\lambda \in \mathbb{R} : |\lambda| \le 1\}.$$

We can identify each element of $\overline{B}_1^{X^*}$ with a point in $I^{\overline{B}_1^X}$. From Tychonoff’s Theorem, see Theorem 1.4.56, $I^{\overline{B}_1^X}$ equipped with the product topology is compact. Since the weak* topology is by definition the topology of pointwise convergence on $\overline{B}_1^X$, the identification of $\overline{B}_1^{X^*}$ with a subset of $I^{\overline{B}_1^X}$ leaves the weak* topology unchanged. So, it remains to show that $\overline{B}_1^{X^*}$ is closed in $I^{\overline{B}_1^X}$. To this end, let $\{x_\alpha^*\}_{\alpha \in J} \subseteq \overline{B}_1^{X^*}$ be a net and assume that it converges pointwise to $g \in I^{\overline{B}_1^X}$. Evidently g is linear and so g is the restriction on $\overline{B}_1^X$ of a linear functional x∗ on X. Moreover, since |g(x)| ≤ 1 for all $x \in \overline{B}_1^X$, it follows that $x^* \in \overline{B}_1^{X^*}$ and this proves that $\overline{B}_1^{X^*}$ is closed in $I^{\overline{B}_1^X}$, and hence w∗-compact.

Every bounded set C ⊆ X∗ satisfies $C \subseteq r\,\overline{B}_1^{X^*}$ for some r > 0. Since $r\,\overline{B}_1^{X^*}$ is w∗-compact and C is w∗-closed, we conclude that C is w∗-compact.

Remark 3.3.39. From the theorem above, we derive that if X is a normed space and C ⊆ X∗, then C is w∗-closed and bounded implies that C is w∗-compact. For the converse to hold, we need to assume that X is a Banach space. To see this, let X = {x̂ = (an)n∈ℕ : an = 0 for all n ≥ n0, for some n0 ∈ ℕ} equipped with the norm $\|\hat{x}\| = \sum_{n \in \mathbb{N}} |a_n|$. Clearly this is a normed space but not a Banach space. Consider a sequence {ξn}n∈ℕ ⊆ ℝ with ξn > 0 for all n ∈ ℕ such that ξn → +∞ as n → ∞. Let $\{\hat{x}_n^*\}_{n \in \mathbb{N}} \subseteq X^*$ be defined by $\hat{x}_n^*(\hat{x}) = a_n$ for all n ∈ ℕ. Let $D = \{0, \xi_1 \hat{x}_1^*, \xi_2 \hat{x}_2^*, \ldots, \xi_n \hat{x}_n^*, \ldots\} \subseteq X^*$. This set is unbounded since $\|\xi_n \hat{x}_n^*\|_* = \xi_n \to +\infty$. However, it holds $\xi_n \hat{x}_n^*(\hat{x}) = \xi_n a_n \to 0$ for all x̂ ∈ X. So, $\xi_n \hat{x}_n^* \xrightarrow{w^*} 0$ and it follows that D ⊆ X∗ is w∗-compact, in particular w∗-bounded, although it is not norm bounded.

From the previous remark, we have the following corollary.

Corollary 3.3.40. If X is a Banach space and C ⊆ X∗, then C is bounded if and only if C is w∗-bounded, that is, x(C) ⊆ ℝ is bounded for every x ∈ X.

We conclude this section with a remarkable result of R. C. James, which provides a necessary and sufficient condition for a set C in a Banach space X to be weakly compact. The result is known as “James’ Theorem”; its proof is lengthy and can be found in Holmes [155, p. 157].

Theorem 3.3.41 (James’ Theorem). If X is a Banach space and C ⊆ X is bounded and w-closed, then C is w-compact if and only if every x∗ ∈ X∗ attains its supremum over C.
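James' Theorem gives a quick way to see that a closed unit ball can fail to be weakly compact. The sketch below is an added illustration; it assumes X = c0 with the standard identification (c0)∗ = l¹, which is not established in this section.

```latex
% Assumptions (illustration only): X = c_0 with the sup norm, X^* = l^1,
%   C = \{x \in c_0 : \|x\|_\infty \le 1\}, which is bounded, convex and closed,
%   hence w-closed by Corollary 3.3.22.
% Take x^* = (2^{-n})_{n \ge 1} \in l^1. For every x = (x_n)_{n \ge 1} \in C,
\[
  \langle x^*, x \rangle = \sum_{n \ge 1} 2^{-n} x_n
  \le \sum_{n \ge 1} 2^{-n} |x_n|
  \le \sum_{n \ge 1} 2^{-n} = 1 .
\]
% The supremum over C equals 1: for x^{(N)} = (1, \ldots, 1, 0, 0, \ldots)
% with N ones, \langle x^*, x^{(N)} \rangle = 1 - 2^{-N} \to 1.
% It is not attained: \langle x^*, x \rangle = 1 with x \in C forces
% \sum_{n \ge 1} 2^{-n}(1 - x_n) = 0 with nonnegative terms, hence x_n = 1
% for all n, which contradicts x_n \to 0.
% By James' Theorem, C is not w-compact.
```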

3.4 Separable and Reflexive Banach Spaces

In this section we examine two special classes of Banach spaces, namely separable and reflexive Banach spaces. They exhibit special properties, which are important in applications.


Definition 3.4.1. (a) A normed space $X$ is separable if it contains a countable dense subset. (b) A normed space $X$ is reflexive if the canonical embedding $j : X \to X^{**}$ (see Definition 3.3.35) is surjective. A reflexive normed space is necessarily complete, that is, a Banach space.

Remark 3.4.2. Any subset of a separable normed space is a separable metric space. Many important spaces in analysis are separable and/or reflexive. Every finite dimensional Banach space is separable and reflexive. In the definition of reflexivity it is essential to use the canonical embedding $j$ stated in Definition 3.3.35. R. C. James produced in 1951 a remarkable example of a nonreflexive Banach space $X$ that is isometrically isomorphic to $X^{**}$. In this example, the image of $X$ under the canonical embedding $j : X \to X^{**}$ is a closed subspace of codimension one. A detailed construction of this space can be found in Megginson [212]; see Section 4.5. In what follows, for the sake of notational simplicity, we drop the use of the map $j$. It is understood that $X$ is embedded into $X^{**}$ via the canonical embedding.

Proposition 3.4.3. If $X$ is a Banach space and $X^*$ is separable, then $X$ is separable.

Proof. Let $\{x^*_n\}_{n\ge 1} \subseteq X^*$ be dense. Thanks to Corollary 3.1.48 we know that $\|x^*_n\|_* = \sup[\langle x^*_n, x\rangle : x \in X, \|x\| \le 1]$ for all $n \in \mathbb{N}$. Hence, there exists $x_n \in X$ such that
$$\|x_n\| = 1 \quad\text{and}\quad \tfrac{1}{2}\|x^*_n\|_* \le \langle x^*_n, x_n\rangle, \quad n \in \mathbb{N}. \tag{3.4.1}$$
Let $V_0 = \operatorname{span}_{\mathbb{Q}}\{x_n\}_{n\in\mathbb{N}}$, that is, $V_0$ is the set of all finite linear combinations with coefficients in $\mathbb{Q}$ of the vectors $\{x_n\}_{n\in\mathbb{N}}$. This set is countable since $V_0 = \bigcup_{m\ge 1} V_m$ with $V_m$ being the set of linear combinations with coefficients in $\mathbb{Q}$ of $\{x_n\}_{n=1}^m$. Each $V_m$ is countable, and so $V_0 = \bigcup_{m\ge 1} V_m$ is countable as well. Let $V = \operatorname{span}\{x_n\}_{n\in\mathbb{N}}$. We claim that $V$ is dense in $X$. To this end, let $x^* \in V^\perp$. Then there exists $\{x^*_{n_k}\}_{k\in\mathbb{N}} \subseteq \{x^*_n\}_{n\in\mathbb{N}}$ such that
$$x^*_{n_k} \to x^* \quad\text{in } X^* \text{ as } k \to \infty. \tag{3.4.2}$$
Then, because of (3.4.1) and since $x^* \in V^\perp$, it follows that
$$\|x^*_{n_k}\|_* \le 2\langle x^*_{n_k}, x_{n_k}\rangle = 2\langle x^*_{n_k} - x^*, x_{n_k}\rangle \le 2\|x^*_{n_k} - x^*\|_*\|x_{n_k}\| = 2\|x^*_{n_k} - x^*\|_*.$$
Hence, thanks to (3.4.2), one gets $x^*_{n_k} \to x^* = 0$ in $X^*$. This shows that $V^\perp = \{0\}$ and so $V$ is dense in $X$; see Remark 3.1.63. Since $V_0$ is countable and dense in $V$, we conclude that $X$ is separable.

Remark 3.4.4. The converse of this result is not true. Namely, separability of $X$ does not imply separability of $X^*$. For example, $X = L^1([0,1])$ is separable (see Proposition 2.3.24), but $X^* = L^\infty([0,1])$ is not separable; see Proposition 2.3.29. In Section 4.1 we will show that $L^\infty([0,1]) = L^1([0,1])^*$.

Theorem 3.4.5. A Banach space $X$ is reflexive if and only if $B_1^X = \{x \in X : \|x\| \le 1\}$ is $w$-compact.

Proof. $\Rightarrow$: The reflexivity of $X$ implies that $X = X^{**}$. Hence $B_1^X = B_1^{X^{**}}$. By Alaoglu's Theorem (see Theorem 3.3.38), $B_1^{X^{**}}$ is $w^*$-compact and from Proposition 1.3.5, we know that
$$w(X^{**}, X^*)\big|_X = w(X, X^*). \tag{3.4.3}$$
Therefore, $B_1^X$ is $w$-compact.

$\Leftarrow$: Since by hypothesis $B_1^X$ is $w$-compact, it is $w^*$-closed in $X^{**}$; see (3.4.3). Goldstine's Theorem (see Theorem 3.3.37) gives $\overline{B_1^X}^{\,w^*} = B_1^{X^{**}}$ and since $B_1^X$ is $w^*$-closed in $X^{**}$, we obtain $B_1^X = B_1^{X^{**}}$. Therefore $X = X^{**}$ and so we conclude that $X$ is reflexive.

Proposition 3.4.6. A Banach space $X$ is reflexive if and only if $X^*$ is reflexive.

Proof. $\Rightarrow$: Since $X$ is reflexive, we know that $X = X^{**}$ and so the weak and weak* topologies on $X^*$ coincide. Alaoglu's Theorem (see Theorem 3.3.38) implies that $B_1^{X^*} = \{x^* \in X^* : \|x^*\|_* \le 1\}$ is $w$-compact and so Theorem 3.4.5 implies that $X^*$ is reflexive.

$\Leftarrow$: Since $X^*$ is reflexive, by the previous part of the proof we have that $X^{**}$ is reflexive as well. Then, Theorem 3.4.5 implies that $B_1^{X^{**}} = \{x^{**} \in X^{**} : \|x^{**}\|_{**} \le 1\}$ is $w$-compact. The set $B_1^X$ is closed, convex, hence a $w$-closed subset of $B_1^{X^{**}}$; see Mazur's Theorem (Theorem 3.3.18). Therefore $B_1^X$ is $w$-compact in $X^{**}$. Since the $w^*$-topology on $X^{**}$ is weaker than the $w$-topology, it follows that $B_1^X$ is $w^*$-compact in $X^{**}$. Hence it is $w$-compact in $X$; see (3.4.3). We conclude by using Theorem 3.4.5.

Proposition 3.4.7. If $X$ is a reflexive Banach space and $V$ is a closed subspace of $X$, then $V$ is a reflexive Banach space.

Proof. We know that
$$w(V, V^*) = w(X, X^*)\big|_V; \tag{3.4.4}$$
see Proposition 1.3.5. The set $B_1^V = \{x \in V : \|x\| \le 1\}$ is a weakly closed subset of the weakly compact set $B_1^X$; see Theorem 3.4.5. Combining this with (3.4.4), we infer that $B_1^V$ is $w$-compact in $V$. Then invoking Theorem 3.4.5 we conclude that $V$ is reflexive.

Combining Propositions 3.4.3 and 3.4.6, we obtain the following.

Proposition 3.4.8. If X is a Banach space, then X is separable and reflexive if and only if X ∗ is separable and reflexive. Proposition 3.4.9. If X is a reflexive Banach space and V ⊆ X is a closed subspace, then X/V is reflexive. Proof. From Proposition 3.2.25, we know that (X/V)∗ and V ⊥ are isometrically isomorphic. Let ξ : (X/V)∗ → V ⊥ be this isometric isomorphism. If p : X → X/V is the


quotient map (see Definition 3.2.19), then from the proof of Proposition 3.2.25 we know that $\xi(l) = l \circ p$ for all $l \in (X/V)^*$. Let $l^* \in (X/V)^{**}$. The map $l^* \circ \xi^{-1} : V^\perp \to \mathbb{R}$ is a bounded linear functional on a subspace of $X^*$. Hence, by Proposition 3.1.49 there exists $x^{**} \in X^{**}$ such that $\langle x^{**}, x^*\rangle = \langle l^*, \xi^{-1}(x^*)\rangle$ for all $x^* \in V^\perp$. This implies that
$$\langle x^{**}, l \circ p\rangle = \langle l^*, l\rangle \quad\text{for all } l \in (X/V)^*. \tag{3.4.5}$$
The reflexivity of $X$ implies that there exists $x \in X$ such that $j(x) = x^{**}$ with $j$ being the canonical embedding. Let $u = [x] = p(x) \in X/V$. Combining Definition 3.3.35 and (3.4.5), it follows that
$$\langle l^*, l\rangle = \langle x^{**}, l \circ p\rangle = \langle j(x), l \circ p\rangle = \langle l \circ p, x\rangle = \langle l, p(x)\rangle = \langle l, u\rangle.$$
Hence, $j(u) = l^*$ with $j$ being the canonical embedding for $X/V$. Since $l^* \in (X/V)^{**}$ is arbitrary, it follows that $j$ is surjective and so $X/V$ is reflexive; see Definition 3.4.1(b).

We know that on an infinite dimensional normed space and on its dual, the weak and weak* topologies are never metrizable. Nevertheless, the traces of these topologies on certain subspaces can be metrizable. The results that follow investigate this issue. We start with a general topological result.

Lemma 3.4.10. If $(X, \tau)$ is a compact topological space and $\{f_n\}_{n\ge 1}$ is a separating sequence of continuous functions on $X$ (see Definition 1.3.6), then the topology $\tau$ is metrizable.

Proof. We may assume that $|f_n(x)| \le 1$ for all $x \in X$ and for all $n \in \mathbb{N}$. On $X$ we consider the metric $d$ defined by
$$d(x, u) = \sum_{n\in\mathbb{N}} \frac{1}{2^n}\,|f_n(x) - f_n(u)| \quad\text{for all } x, u \in X.$$

Let τ d be the metric topology induced by this metric on X. For every fixed u ∈ X, x → d(x, u) is τ-continuous as the uniform limit of τ-continuous functions. So, for every ε > 0, it follows that B ε (u) = {x ∈ X : d(x, u) < ε} ∈ τ, which means that τ d ⊆ τ. Using Theorem 1.4.54, we see that the identity map i X : (X, τ) → (X, τ d ) is a homeomorphism. Hence τ = τ d . Using this lemma, we can state the first metrizability result for the weak* topology. Theorem 3.4.11. If X is a separable normed space and C ⊆ X ∗ is w∗ -compact, then C equipped with the w∗ -topology is metrizable. Proof. Let {x n }n≥1 ⊆ X be dense in X. If j : X → X ∗∗ is the canonical embedding, then ⟨j(x n ), x∗ ⟩ = ⟨x∗ , x n ⟩

for all n ∈ ℕ and for all x∗ ∈ X ∗ ;

see Definition 3.3.35. So, if $\langle j(x_n), x^*\rangle = 0$ for all $n \in \mathbb{N}$, we derive that $\langle x^*, x_n\rangle = 0$ for all $n \in \mathbb{N}$ and the density of $\{x_n\}_{n\ge 1}$ in $X$ implies that $x^* = 0$. Therefore $\{j(x_n)\}_{n\ge 1} \subseteq X^{**}$ is separating and each $j(x_n)$ is $w^*$-continuous. Applying Lemma 3.4.10, we conclude that $(C, w^*)$ is metrizable.

We can improve this result in the following way.

Theorem 3.4.12. If $X$ is a normed space, then the following hold:
(a) $(B_1^{X^*}, w^*)$ is metrizable if and only if $X$ is separable;
(b) $(B_1^X, w)$ is metrizable if and only if $X^*$ is separable.

Proof. (a) $\Rightarrow$: Since $(B_1^{X^*}, w^*)$ is metrizable we can find a countable basis $\{U_n\}_{n\ge 1}$ at the origin. We obtain
$$U_n = \{x^* \in B_1^{X^*} : |\langle x^*, x\rangle| < \varepsilon_n \text{ for all } x \in F_n\}, \quad n \in \mathbb{N},$$
with $F_n \subseteq X$ finite and $\varepsilon_n > 0$. Let $E = \bigcup_{n\ge 1} F_n$. Then $E \subseteq X$ is countable and $x^*(E) = 0$ implies $x^* \in U_n$ for all $n \in \mathbb{N}$ and so $x^* = 0$. Moreover, if $x^*(\overline{\operatorname{span}}\,E) = 0$, then $x^* = 0$. Therefore $\overline{\operatorname{span}}\,E = X$ and so we conclude that $X$ is separable.

$\Leftarrow$: This follows from Theorem 3.4.11.

(b) $\Rightarrow$: As before, let $\{U_n\}_{n\ge 1}$ be a countable local basis at the origin of $X$. We obtain
$$U_n = \{x \in B_1^X : |\langle x^*, x\rangle| < \varepsilon_n \text{ for all } x^* \in F^*_n\}, \quad n \in \mathbb{N}, \tag{3.4.6}$$
with $F^*_n \subseteq X^*$ finite and $\varepsilon_n > 0$. Let $E^* = \bigcup_{n\ge 1} F^*_n$. Then $E^* \subseteq X^*$ is countable and so $\overline{\operatorname{span}}\,E^*$ is separable. We will show that $X^* = \overline{\operatorname{span}}\,E^*$. Arguing by contradiction, suppose that there exists $\hat{x}^* \in X^* \setminus \overline{\operatorname{span}}\,E^*$. Let $d = d(\hat{x}^*, \overline{\operatorname{span}}\,E^*)$. Then we can find $\hat{x}^{**} \in X^{**}$ such that
$$\|\hat{x}^{**}\|_{**} = \frac{1}{d}, \quad \hat{x}^{**}(\overline{\operatorname{span}}\,E^*) = 0 \quad\text{and}\quad \langle \hat{x}^{**}, \hat{x}^*\rangle = 1; \tag{3.4.7}$$

see Proposition 3.1.50. We introduce X

V = {x ∈ B1 : |⟨x̂ ∗ , x⟩|

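0 there is a weakly compact set $K_\varepsilon \subseteq X$ such that $C \subseteq K_\varepsilon + \varepsilon B_1^X$, then $C$ is weakly compact.

Proof. Viewing $C$ as a subset of $X^{**}$ via the canonical embedding, we directly obtain
$$\overline{C}^{\,w^*} \subseteq \overline{K_\varepsilon + \varepsilon B_1^X}^{\,w^*} = \overline{K_\varepsilon}^{\,w^*} + \varepsilon\,\overline{B_1^X}^{\,w^*} = K_\varepsilon + \varepsilon B_1^{X^{**}},$$
since $K_\varepsilon$ is $w$-compact and due to Theorem 3.3.37. Therefore
$$\overline{C}^{\,w^*} \subseteq \bigcap_{\varepsilon > 0}\big(K_\varepsilon + \varepsilon B_1^{X^{**}}\big) \subseteq X,$$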

which shows that $C$ is $w$-compact since $C$ is $w$-closed.

Continuing with weakly compact sets, we show that this property is preserved if we take the closed convex hull of the set.

Proposition 3.4.20. If $X$ is a Banach space and $C \subseteq X$ is $w$-compact, then $\overline{\operatorname{conv}}\,C \subseteq X$ is $w$-compact as well.

Proof. Let $x^* \in X^*$. Then
$$\sup[\langle x^*, x\rangle : x \in C] = \sup[\langle x^*, u\rangle : u \in \overline{\operatorname{conv}}\,C]. \tag{3.4.9}$$
Because $C \subseteq X$ is $w$-compact, there exists $\hat{x} \in C$ such that $\langle x^*, \hat{x}\rangle = \sup[\langle x^*, x\rangle : x \in C]$. This implies, due to (3.4.9), that $\langle x^*, \hat{x}\rangle = \sup[\langle x^*, u\rangle : u \in \overline{\operatorname{conv}}\,C]$. Since $x^* \in X^*$ is arbitrary, invoking James' Theorem (see Theorem 3.3.41), we conclude that $\overline{\operatorname{conv}}\,C$ is $w$-compact. Note that $\overline{\operatorname{conv}}\,C$ is $w$-closed by Theorem 3.3.18.

Next we introduce some new classes of Banach spaces based on some geometric properties of the unit ball.

Definition 3.4.21. Let $X$ be a Banach space.
(a) We say that $X$ is strictly convex if for all $x, u \in X$ with $x \ne u$ and $\|x\| = \|u\| = 1$ it holds $\|(1 - t)x + tu\| < 1$ for all $t \in (0, 1)$.
(b) We say that $X$ is uniformly convex if for every $\varepsilon > 0$ there exists $\delta = \delta(\varepsilon) > 0$ such that
$$x, u \in X,\ \|x\| \le 1,\ \|u\| \le 1,\ \|x - u\| \ge \varepsilon \quad\text{imply}\quad \tfrac{1}{2}\|x + u\| \le 1 - \delta.$$
(c) We say that $X$ is locally uniformly convex if for every $\varepsilon > 0$ and $x \in X$ with $\|x\| = 1$ there exists $\delta = \delta(\varepsilon, x) > 0$ such that
$$u \in X,\ \|u\| = 1,\ \|x - u\| \ge \varepsilon \quad\text{imply}\quad \tfrac{1}{2}\|x + u\| \le 1 - \delta.$$

Remark 3.4.22. Evidently it holds

Uniformly convex $\Rightarrow$ Locally uniformly convex $\Rightarrow$ Strictly convex.

Note that these implications are not reversible in general. For finite dimensional spaces, the three notions are equivalent.
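The midpoint condition in Definition 3.4.21 can be probed numerically in $\mathbb{R}^2$. The following is only a minimal sketch, assuming the standard $p$-norms on $\mathbb{R}^2$ (the helper names `norm_p` and `midpoint_norm` are illustrative and not taken from the text): for the Euclidean norm the midpoint of two distinct unit vectors falls strictly inside the unit ball, while for the $l^1$ norm and the max norm one can pick unit vectors whose midpoint still has norm one, so these norms are not strictly convex.

```python
import math

def norm_p(v, p):
    """p-norm of a finite vector; p = math.inf gives the max norm."""
    if p == math.inf:
        return max(abs(c) for c in v)
    return sum(abs(c) ** p for c in v) ** (1.0 / p)

def midpoint_norm(x, u, p):
    """Norm of the midpoint (x + u)/2 in the chosen p-norm."""
    mid = [(a + b) / 2.0 for a, b in zip(x, u)]
    return norm_p(mid, p)

# Two distinct vectors that are unit vectors in both the l^1 and the l^2 norm.
x, u = (1.0, 0.0), (0.0, 1.0)
euclidean_mid = midpoint_norm(x, u, 2)   # strictly below 1
l1_mid = midpoint_norm(x, u, 1)          # equals 1: the segment [x, u] lies on the l^1 unit sphere

# Two distinct unit vectors for the max norm.
a, b = (1.0, 1.0), (1.0, -1.0)
max_mid = midpoint_norm(a, b, math.inf)  # equals 1 as well

print(euclidean_mid, l1_mid, max_mid)
```

A single pair of vectors can only disprove strict convexity; it cannot prove it, so the Euclidean case above is an illustration rather than a verification.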


Proposition 3.4.23. Let $X$ be a Banach space. The following properties are equivalent:
(a) $X$ is strictly convex.
(b) The boundary of the unit ball, called the unit sphere, contains no line segments.
(c) $x \ne u$ and $\|x\| = \|u\| = 1$ implies $\|x + u\| < 2$.
(d) If $\|x - y\| = \|x - u\| + \|u - y\|$ for $x, u, y \in X$, then there exists $t \in [0, 1]$ such that $u = (1 - t)x + ty$.
(e) Every $x^* \in X^* \setminus \{0\}$ attains its supremum on $B_1^X$ at at most one point.

Proof. (a) $\Rightarrow$ (b): This is obvious from Definition 3.4.21(a).

(b) $\Rightarrow$ (a): Arguing by contradiction suppose that we can find $x, u \in X$, $x \ne u$, $\|x\| = \|u\| = 1$ and $t_0 \in (0, 1)$ such that $\|(1 - t_0)x + t_0u\| = 1$. Let $t \in (0, t_0)$. Then we obtain
$$(1 - t_0)x + t_0u = \frac{1 - t_0}{1 - t}\big((1 - t)x + tu\big) + \frac{t_0 - t}{1 - t}\,u,$$
which gives
$$1 \le \frac{1 - t_0}{1 - t}\,\|(1 - t)x + tu\| + \frac{t_0 - t}{1 - t}.$$
Hence $\|(1 - t)x + tu\| \ge 1$ and so $\|(1 - t)x + tu\| = 1$. Similarly we treat the case $t \in (t_0, 1)$. Therefore the line segment $[x, u]$ is on the unit sphere of $X$, a contradiction to the hypothesis.

(a) $\Rightarrow$ (c) and (c) $\Rightarrow$ (b): These implications are obvious.

(a) $\Rightarrow$ (d): Let $x, u, y \in X$ be such that $\|x - y\| = \|x - u\| + \|u - y\|$. We may assume that $\|x - u\| \ne 0$, $\|u - y\| \ne 0$ and $\|x - u\| \le \|u - y\|$. Then we derive
$$\begin{aligned}
1 &\ge \left\|\frac{1}{2}\,\frac{x - u}{\|x - u\|} + \frac{1}{2}\,\frac{u - y}{\|u - y\|}\right\| \\
&\ge \left\|\frac{1}{2}\,\frac{x - u}{\|x - u\|} + \frac{1}{2}\,\frac{u - y}{\|x - u\|}\right\| - \left\|\frac{1}{2}\,\frac{u - y}{\|x - u\|} - \frac{1}{2}\,\frac{u - y}{\|u - y\|}\right\| \\
&= \frac{1}{2}\,\frac{\|x - y\|}{\|x - u\|} - \frac{1}{2}\,\frac{\|u - y\| - \|x - u\|}{\|x - u\|} \\
&= \frac{1}{2}\,\frac{1}{\|x - u\|}\,2\|x - u\| = 1.
\end{aligned}$$
Hence we obtain
$$\left\|\frac{x - u}{\|x - u\|} + \frac{u - y}{\|u - y\|}\right\| = 2,$$
which finally gives
$$\frac{x - u}{\|x - u\|} = \frac{u - y}{\|u - y\|}.$$
Therefore $u = (1 - t)x + ty$ with $t = \|x - u\|/\|x - y\| \in (0, 1)$.

(d) $\Rightarrow$ (c): Let $x, y \in X$, $x \ne y$ with $\|x\| = \|y\| = \tfrac{1}{2}\|x + y\| = 1$. Then $\|x + y\| = \|x\| + \|y\|$, which gives $u = 0 = (1 - t)x - ty$ for some $t \in (0, 1)$. Hence $x = \frac{t}{1 - t}\,y$ and so $t = 1/2$, that is, $x = y$, a contradiction. Therefore we conclude that $\|x + y\| < 2$.

(a) $\Rightarrow$ (e): Let $x^* \in X^* \setminus \{0\}$, and suppose that there exist $x, u \in X$, $x \ne u$, with $\|x\| = \|u\| = 1$ such that $\langle x^*, x\rangle = \langle x^*, u\rangle = \|x^*\|_*$. For $t \in (0, 1)$ it follows that
$$\|x^*\|_* = (1 - t)\langle x^*, x\rangle + t\langle x^*, u\rangle = \langle x^*, (1 - t)x + tu\rangle \le \|x^*\|_*\,\|(1 - t)x + tu\|,$$
which implies $1 \le \|(1 - t)x + tu\| < 1$, a contradiction. Thus, $x^* \in X^* \setminus \{0\}$ has at most one maximizer on the closed unit ball of $X$.

(e) $\Rightarrow$ (c): Suppose that there are $x, u \in X$, $x \ne u$ with $\|x\| = \|u\| = 1$, $\|x + u\| = 2$. Invoking Proposition 3.1.50, there exists $x^* \in X^*$ such that $\|x^*\|_* = 1$ and $\langle x^*, \tfrac{1}{2}(x + u)\rangle = \tfrac{1}{2}\|x + u\| = 1$. Hence
$$\langle x^*, x\rangle + \langle x^*, u\rangle = 2. \tag{3.4.10}$$
It holds $\langle x^*, x\rangle \le 1$ and $\langle x^*, u\rangle \le 1$. So, from (3.4.10) it follows that $\langle x^*, x\rangle = \langle x^*, u\rangle = 1$, which contradicts the hypothesis. Therefore $\|x + u\| < 2$.

From the proposition above and its proof we directly obtain the following corollary.

Corollary 3.4.24. Let $X$ be a Banach space. The following properties are equivalent:
(a) $X$ is strictly convex.
(b) If $x, u \in X$, $\|x\| = \|u\| = 1$ and $\|x + u\| = 2$, then $x = u$.
(c) If $x, u \in X$ satisfy $2\|x\|^2 + 2\|u\|^2 = \|x + u\|^2$, then $x = u$.
(d) If $x, u \in X \setminus \{0\}$ satisfy $\|x + u\| = \|x\| + \|u\|$, then $x = tu$ for some $t > 0$.

A sequential reformulation of Definition 3.4.21(b),(c) gives the following characterization of uniform convexity and local uniform convexity.

Proposition 3.4.25. Let $X$ be a Banach space.
(a) $X$ is uniformly convex if and only if for every $\{x_n\}_{n\ge 1}, \{u_n\}_{n\ge 1} \subseteq B_1^X$ such that $\|x_n + u_n\| \to 2$, we have $\|x_n - u_n\| \to 0$ as $n \to \infty$.
(b) $X$ is locally uniformly convex if and only if for any $x \in X$, $\|x\| = 1$ and for every sequence $\{x_n\}_{n\ge 1} \subseteq X$ with $\|x_n\| = 1$ for all $n \in \mathbb{N}$ such that $\|x_n + x\| \to 2$, we have $\|x_n - x\| \to 0$.

Remark 3.4.26. In the characterizations above, the sequences can be replaced by nets.

Another characterization of uniform convexity is given by the next proposition.

Proposition 3.4.27. If $X$ is a Banach space, then $X$ is uniformly convex if and only if for all sequences $\{x_n\}_{n\ge 1}, \{u_n\}_{n\ge 1} \subseteq X$ with $\{x_n\}_{n\ge 1}$ bounded such that
$$2\|x_n\|^2 + 2\|u_n\|^2 - \|x_n + u_n\|^2 \to 0 \quad\text{as } n \to \infty,$$
we have $\|x_n - u_n\| \to 0$ as $n \to \infty$.

Proof. $\Rightarrow$: Note that
$$(\|x_n\| - \|u_n\|)^2 = 2\|x_n\|^2 + 2\|u_n\|^2 - (\|x_n\| + \|u_n\|)^2 \le 2\|x_n\|^2 + 2\|u_n\|^2 - \|x_n + u_n\|^2$$
for all $n \in \mathbb{N}$. Hence $\|x_n\| - \|u_n\| \to 0$ as $n \to \infty$. Therefore if $\|x_n\| \to 0$ or $\|u_n\| \to 0$, then $\|x_n - u_n\| \to 0$. So we may assume that there exists $\varepsilon > 0$ such that
$$\|x_n\| \ge \varepsilon \quad\text{and}\quad \|u_n\| \ge \varepsilon \quad\text{for all } n \in \mathbb{N}.$$

Let $y_n = x_n/\|x_n\|$, $v_n = u_n/\|u_n\|$ with $n \in \mathbb{N}$. Then $\|y_n\| = \|v_n\| = 1$ for all $n \in \mathbb{N}$ and $\|y_n + v_n\| \to 2$. It follows that $\|y_n - v_n\| \to 0$ and so $\|x_n - u_n\| \to 0$.

$\Leftarrow$: This implication is obvious; see Proposition 3.4.25.

Uniformly convex Banach spaces are reflexive. The result is known as the "Milman–Pettis Theorem."

Theorem 3.4.28 (Milman–Pettis Theorem). If $X$ is a uniformly convex Banach space, then $X$ is reflexive.

Proof. Let $x^{**} \in B_1^{X^{**}}$. Invoking Goldstine's Theorem (see Theorem 3.3.37), we can find a net $\{x_\alpha\}_{\alpha\in I} \subseteq B_1^X$ such that $x_\alpha \xrightarrow{w^*} x^{**}$ in $X^{**}$. Exploiting the $w^*$-lower semicontinuity of the norm $\|\cdot\|_{**}$ on $X^{**}$ (see Proposition 3.3.31(c)), we see that $\|x_\alpha + x_\beta\| \to 2$. Applying Proposition 3.4.25(a) gives $\|x_\alpha - x_\beta\| \to 0$, which implies that $\{x_\alpha\}_{\alpha\in I} \subseteq X$ is a Cauchy net. The completeness of $X$ implies that $x_\alpha \to x^{**} \in X$ and so $X = X^{**}$, that is, $X$ is reflexive.

In Remark 3.3.17 we mentioned that in the Banach space $l^1$, for sequences, weak and norm convergence are equivalent. More generally, any Banach space having this property is said to have the Schur property.

Example 3.4.29. The Banach space (in fact Hilbert space; see Section 3.5) $l^2 = \{\hat{x} = (x_n)_{n\ge 1} \in \mathbb{R}^{\mathbb{N}} : \sum_{n\ge 1} x_n^2 < \infty\}$ does not have the Schur property. Since $l^2$ is a Hilbert space, we have $(l^2)^* = l^2$; see Theorem 3.5.21. Let $e_n = (0, \ldots, 1, 0, \ldots)$ with 1 at the $n$th spot. Then for every $\hat{x}^* \in (l^2)^* = l^2$ we have $\langle \hat{x}^*, e_n\rangle \to 0$, that is, $e_n \xrightarrow{w} 0$. On the other hand $\|e_n\| = 1$ for all $n \in \mathbb{N}$ and so $e_n \not\to 0$ in the norm topology.

However, $l^2$ as well as every Hilbert space has the following weakened version of the Schur property.

Definition 3.4.30. A normed space $X$ is said to have the Kadec–Klee property if it satisfies the following condition: For every sequence $\{x_n\}_{n\ge 1} \subseteq X$ such that $x_n \xrightarrow{w} x$ in $X$ and $\|x_n\| \to \|x\|$, we have $x_n \to x$ in $X$.

Remark 3.4.31. The names Radon–Riesz property or property (H) are also used in the literature.

Proposition 3.4.32. If $X$ is a locally uniformly convex Banach space, then $X$ has the Kadec–Klee property.

Proof. Consider $x_n \xrightarrow{w} x$ in $X$ with $\|x_n\| \to \|x\|$. Evidently we may assume that $x \ne 0$. Let $y_n = x_n/\|x_n\|$, $y = x/\|x\|$ with $n \in \mathbb{N}$. Then $\|y_n\| = \|y\| = 1$ for all $n \in \mathbb{N}$ and $y_n \xrightarrow{w} y$ in $X$. Hence
$$2 = 2\|y\| \le \liminf_{n\to\infty} \|y_n + y\| \le \limsup_{n\to\infty} \|y_n + y\| \le \lim_{n\to\infty} \|y_n\| + \|y\| = 2;$$
see Proposition 3.3.13(c). Then $\lim_{n\to\infty} \|y_n + y\| = 2$. Proposition 3.4.25(b) implies that $\|y_n - y\| \to 0$ since $X$ is locally uniformly convex, and consequently $x_n \to x$ in $X$.

3.5 Hilbert Spaces

In this section we turn our attention to Hilbert spaces, which are Banach spaces with some additional structure, resulting from the presence of an inner product. The inner product supplies a very rich structure, which leads to important simplifications and makes Hilbert spaces the infinite dimensional analog of Euclidean spaces.

Definition 3.5.1. Let $H$ be a vector space over the field $\mathbb{F}$ with $\mathbb{F} = \mathbb{R}$ or $\mathbb{F} = \mathbb{C}$. An inner product on $H$ is a map $(\cdot, \cdot) : H \times H \to \mathbb{F}$ such that
(a) $(\lambda x + u, y) = \lambda(x, y) + (u, y)$ for all $x, u, y \in H$ and for all $\lambda \in \mathbb{F}$ (linearity);
(b) $(x, u) = \overline{(u, x)}$ for all $x, u \in H$ (conjugate symmetry);
(c) $(x, x) \ge 0$ and $(x, x) = 0$ if and only if $x = 0$ (positive definiteness).

Remark 3.5.2. Linearity in (a) in fact means linearity in the first argument. In the second argument the map is conjugate linear. Property (b) is sometimes called Hermitian symmetry.

The next result is of fundamental importance and is known as the "Cauchy–Bunyakowsky–Schwarz inequality."

Proposition 3.5.3 (Cauchy–Bunyakowsky–Schwarz inequality). If $H$ is a vector space with inner product $(\cdot, \cdot)$, then $|(x, u)|^2 \le (x, x)(u, u)$ for all $x, u \in H$.

Proof. Let $x, u \in H$ and let $\lambda \in \mathbb{F}$. Then it follows that
$$0 \le (x - \lambda u, x - \lambda u) = (x, x) - \overline{\lambda}(x, u) - \lambda\overline{(x, u)} + |\lambda|^2(u, u).$$
Choosing $\lambda = (x, u)/\vartheta$ with $\vartheta > 0$ results in
$$0 \le (x, x) - \frac{1}{\vartheta}\left(2 - \frac{(u, u)}{\vartheta}\right)|(x, u)|^2.$$
If $u \ne 0$, then choose $\vartheta = (u, u)$ to get the desired inequality. If $u = 0$, then $(x, u) = 0$ and so the inequality holds trivially.

Proposition 3.5.4. If $H$ is a vector space with inner product $(\cdot, \cdot)$, then $\|x\| = (x, x)^{1/2}$ for all $x \in H$ defines a norm on $H$.


Proof. We only need to verify the triangle inequality. So, let $x, u \in H$. Then, using Proposition 3.5.3, it follows that
$$\|x + u\|^2 = (x + u, x + u) = (x, x) + (x, u) + (u, x) + (u, u) = \|x\|^2 + 2\operatorname{Re}(x, u) + \|u\|^2 \le \|x\|^2 + 2|(x, u)| + \|u\|^2 \le \|x\|^2 + 2\|x\|\|u\| + \|u\|^2 = (\|x\| + \|u\|)^2.$$
This shows the assertion.

Remark 3.5.5. A vector space with an inner product will be referred to as an inner product space. Usually we will not explicitly mention the inner product unless we want to distinguish between different inner products defined on $H$. The norm $\|\cdot\|$ defined in Proposition 3.5.4 is the norm defined (induced or generated) by the inner product $(\cdot, \cdot)$.

At this point it is natural to ask when a norm is defined by an inner product. The next proposition will lead to a necessary and sufficient condition for this to happen.

Proposition 3.5.6. If $H$ is an inner product space, then the following hold:
(a) Parallelogram law: For all $x, u \in H$ we have
$$\|x + u\|^2 + \|x - u\|^2 = 2\left(\|x\|^2 + \|u\|^2\right).$$
(b) Polarization identities: For all $x, u \in H$ we have
$$(x, u) = \frac{1}{4}\left[\|x + u\|^2 - \|x - u\|^2 + i\|x + iu\|^2 - i\|x - iu\|^2\right] \quad\text{if } \mathbb{F} = \mathbb{C},$$
$$(x, u) = \frac{1}{4}\left[\|x + u\|^2 - \|x - u\|^2\right] \quad\text{if } \mathbb{F} = \mathbb{R}.$$

Proof. (a) For all $x, u \in H$ and for all $\lambda \in \mathbb{F}$ one gets
$$\|x + \lambda u\|^2 = \|x\|^2 + 2\operatorname{Re}\big(\overline{\lambda}(x, u)\big) + |\lambda|^2\|u\|^2 = \|x\|^2 + 2\left[\operatorname{Re}\lambda\,\operatorname{Re}(x, u) + \operatorname{Im}\lambda\,\operatorname{Im}(x, u)\right] + |\lambda|^2\|u\|^2. \tag{3.5.1}$$
Choosing $\lambda = 1$ and $\lambda = -1$ in (3.5.1) and adding these equalities, we obtain the desired parallelogram law.

(b) Choosing $\lambda = 1$ and $\lambda = -1$ in (3.5.1) and subtracting, we get the real polarization identity, that is, the case $\mathbb{F} = \mathbb{R}$. Choosing in addition $\lambda = i$ and $\lambda = -i$ in (3.5.1) and subtracting, we obtain the complex polarization identity, that is, the case $\mathbb{F} = \mathbb{C}$.

The next theorem provides a necessary and sufficient condition for a norm to be generated by an inner product. For a proof of this result, we refer to Weidmann [307, p. 9].

Theorem 3.5.7. A norm on a vector space $H$ is defined by an inner product if and only if it satisfies the parallelogram law. Moreover, if the norm on $H$ satisfies the parallelogram law, then the unique inner product defining the norm is given by the polarization identities; see Proposition 3.5.6(b).
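The parallelogram law and the real polarization identity are easy to test numerically. The sketch below is only an illustration under the assumption that we work in $\mathbb{R}^2$ with the usual $p$-norms (the helper names `norm_p`, `parallelogram_defect` and `polarization` are hypothetical): the defect $\|x+u\|^2 + \|x-u\|^2 - 2(\|x\|^2 + \|u\|^2)$ vanishes up to rounding for $p = 2$ but not for $p = 1$, and for $p = 2$ the polarization formula returns the Euclidean inner product.

```python
def norm_p(v, p):
    """p-norm of a finite real vector for 1 <= p < infinity."""
    return sum(abs(c) ** p for c in v) ** (1.0 / p)

def parallelogram_defect(x, u, p):
    """||x+u||^2 + ||x-u||^2 - 2(||x||^2 + ||u||^2) in the p-norm."""
    s = [a + b for a, b in zip(x, u)]
    d = [a - b for a, b in zip(x, u)]
    lhs = norm_p(s, p) ** 2 + norm_p(d, p) ** 2
    rhs = 2.0 * (norm_p(x, p) ** 2 + norm_p(u, p) ** 2)
    return lhs - rhs

def polarization(x, u, p=2):
    """Real polarization formula (1/4)(||x+u||^2 - ||x-u||^2)."""
    s = [a + b for a, b in zip(x, u)]
    d = [a - b for a, b in zip(x, u)]
    return 0.25 * (norm_p(s, p) ** 2 - norm_p(d, p) ** 2)

x, u = (1.0, 0.0), (0.0, 1.0)
print(parallelogram_defect(x, u, 2))  # zero up to rounding error
print(parallelogram_defect(x, u, 1))  # nonzero: the l^1 norm violates the law

y, v = (1.0, 2.0), (3.0, -1.0)
dot = sum(a * b for a, b in zip(y, v))
print(polarization(y, v), dot)        # both values agree up to rounding
```

One failing pair is enough to show that a norm does not come from an inner product, whereas a vanishing defect for a few pairs is of course only consistent with, not a proof of, the parallelogram law.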

Definition 3.5.8. A Hilbert space is a complete inner product space.

Remark 3.5.9. So, according to Theorem 3.5.7, a Hilbert space is a Banach space whose norm satisfies the parallelogram law.

Theorem 3.5.10. Every Hilbert space $H$ is uniformly convex, hence reflexive; see Theorem 3.4.28.

Proof. Let $\varepsilon > 0$ and let $x, u \in H$ with $\|x\| \le 1$, $\|u\| \le 1$ and $\|x - u\| \ge \varepsilon$. Using the parallelogram law (see Proposition 3.5.6(a)), we derive $\|(x + u)/2\|^2 \le 1 - \varepsilon^2/4$, which implies that
$$\frac{1}{2}\|x + u\| \le 1 - \delta \quad\text{with}\quad \delta = 1 - \left(1 - \frac{\varepsilon^2}{4}\right)^{1/2} > 0.$$
Therefore $H$ is uniformly convex; see Definition 3.4.21(b).

The next notion is particular to inner product spaces and gives them the extra structure with respect to general Banach spaces.

Definition 3.5.11. Let $H$ be an inner product space and $x, u \in H$. We say that $x, u$ are orthogonal, denoted by $x \perp u$, if $(x, u) = 0$. If $x \in H$ and $C \subseteq H$, then we say that $x$ is orthogonal to $C$, denoted by $x \perp C$, if $x \perp u$ for all $u \in C$. Finally, if $C, D \subseteq H$, we say that the two sets are orthogonal, denoted by $C \perp D$, if $x \perp u$ for all $x \in C$ and for all $u \in D$. We say that $C \subseteq H$ is an orthogonal set if $x \perp u$ for all $x, u \in C$ with $x \ne u$.

Remark 3.5.12. Clearly, $x \perp u$ if and only if $u \perp x$. Hence $C \perp D$ if and only if $D \perp C$. Moreover, $C \perp D$ implies $C \cap D \subseteq \{0\}$.

The next result is an extension of the classical "Pythagorean Theorem."

Theorem 3.5.13 (Generalized Pythagorean Theorem). If $H$ is an inner product space and $\{x_k\}_{k=0}^n \subseteq H$ is a finite orthogonal set, then
$$\Big\|\sum_{k=0}^n x_k\Big\|^2 = \sum_{k=0}^n \|x_k\|^2.$$

Proof. First suppose that $n = 1$, that is, we have a pair $x_0, x_1 \in H$ of orthogonal vectors. Since $x_0 \perp x_1$ we derive
$$\|x_0 + x_1\|^2 = (x_0 + x_1, x_0 + x_1) = \|x_0\|^2 + 2\operatorname{Re}(x_0, x_1) + \|x_1\|^2 = \|x_0\|^2 + \|x_1\|^2.$$
So, the result holds for $n = 1$. Proceeding by induction, suppose that it holds for some $n \in \mathbb{N}$, that is
$$\Big\|\sum_{k=0}^n x_k\Big\|^2 = \sum_{k=0}^n \|x_k\|^2 \quad\text{for every orthogonal set } \{x_k\}_{k=0}^n \subseteq H. \tag{3.5.2}$$


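Let $\{x_k\}_{k=0}^{n+1} \subseteq H$ be an arbitrary orthogonal set. Since $x_{n+1} \perp \{x_k\}_{k=0}^n$ it follows that $x_{n+1} \perp \sum_{k=0}^n x_k$, and hence
$$\Big\|\sum_{k=0}^{n+1} x_k\Big\|^2 = \Big\|\sum_{k=0}^n x_k + x_{n+1}\Big\|^2 = \Big\|\sum_{k=0}^n x_k\Big\|^2 + \|x_{n+1}\|^2 = \sum_{k=0}^{n+1} \|x_k\|^2;$$
see (3.5.2). So, the induction is complete and the Generalized Pythagorean Theorem holds.

We can state an infinite version of the Pythagorean Theorem.

Theorem 3.5.14. If $H$ is an inner product space and $\{x_k\}_{k\ge 1} \subseteq H$ is an orthogonal sequence, then the following hold:
(a) $\sum_{k\ge 1} x_k$ exists in $H$ implies that $\sum_{k\ge 1} \|x_k\|^2 < \infty$ and $\big\|\sum_{k\ge 1} x_k\big\|^2 = \sum_{k\ge 1} \|x_k\|^2$.
(b) If $H$ is a Hilbert space and $\sum_{k\ge 1} \|x_k\|^2 < \infty$, then $\sum_{k\ge 1} x_k$ exists in $H$.

Proof. (a) By hypothesis we have
$$\sum_{k=1}^n x_k \to \sum_{k\ge 1} x_k \quad\text{in } H \text{ as } n \to \infty,$$
which implies that
$$\Big\|\sum_{k=1}^n x_k\Big\|^2 \to \Big\|\sum_{k\ge 1} x_k\Big\|^2 \quad\text{as } n \to \infty. \tag{3.5.3}$$
From Theorem 3.5.13 we obtain
$$\Big\|\sum_{k=1}^n x_k\Big\|^2 = \sum_{k=1}^n \|x_k\|^2 \quad\text{for every } n \in \mathbb{N}.$$
Hence, due to (3.5.3),
$$\sum_{k=1}^n \|x_k\|^2 \to \Big\|\sum_{k\ge 1} x_k\Big\|^2 \quad\text{as } n \to \infty.$$
Therefore
$$\Big\|\sum_{k\ge 1} x_k\Big\|^2 = \sum_{k\ge 1} \|x_k\|^2 < \infty.$$

(b) For $m > n$, it holds
$$\Big\|\sum_{k=1}^m x_k - \sum_{k=1}^n x_k\Big\|^2 = \Big\|\sum_{k=n+1}^m x_k\Big\|^2 = \sum_{k=n+1}^m \|x_k\|^2;$$
see Theorem 3.5.13. Hence
$$\Big\|\sum_{k=1}^m x_k - \sum_{k=1}^n x_k\Big\|^2 \to 0 \quad\text{as } m, n \to \infty.$$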

Therefore, $\{\sum_{k=1}^n x_k\}_{n\in\mathbb{N}} \subseteq H$ is a Cauchy sequence. Since $H$ is a Hilbert space it follows that $\sum_{k=1}^n x_k \to \sum_{k\ge 1} x_k$ in $H$ as $n \to \infty$.

Corollary 3.5.15. If $H$ is a Hilbert space and $\{x_k\}_{k\ge 1} \subseteq H$ is an orthogonal sequence, then $\sum_{k\ge 1} \|x_k\|^2 < +\infty$ if and only if $\sum_{k\ge 1} x_k$ exists in $H$ and $\|\sum_{k\ge 1} x_k\|^2 = \sum_{k\ge 1} \|x_k\|^2$.

Example 3.5.16. Two classical examples of Hilbert spaces are the following ones:
(a) $\mathbb{R}^N$ equipped with the Euclidean inner product
$$(\hat{x}, \hat{u}) = \sum_{k=1}^N x_k u_k \quad\text{with } \hat{x} = (x_k)_{k=1}^N,\ \hat{u} = (u_k)_{k=1}^N \in \mathbb{R}^N.$$
(b) The Banach space $l^2 = \{\hat{x} = (x_k)_{k\ge 1} \in \mathbb{R}^{\mathbb{N}} : \sum_{k\ge 1} x_k^2 < \infty\}$ equipped with the inner product
$$(\hat{x}, \hat{u}) = \sum_{k\ge 1} x_k u_k \quad\text{for all } \hat{x}, \hat{u} \in l^2.$$

Remark 3.5.17. The other sequence Banach spaces $l^p = \{\hat{x} = (x_k)_{k\ge 1} \in \mathbb{R}^{\mathbb{N}} : \sum_{k\ge 1} |x_k|^p < \infty\}$ with $1 < p < \infty$ and $p \ne 2$ are not Hilbert spaces. We can easily see that the parallelogram law fails; see Theorem 3.5.7.

Now we present a basic property of closed convex sets in a Hilbert space. From now on all Hilbert spaces considered are real, that is, $\mathbb{F} = \mathbb{R}$.

Theorem 3.5.18. If $H$ is a Hilbert space and $C \subseteq H$ is nonempty, closed, and convex, then for any given $x \in H$ there exists a unique element $p_C(x) \in C$ such that $\|x - p_C(x)\| \le \|x - u\|$ for all $u \in C$.

Proof. By translating things if necessary, we assume that $x = 0$. Let $\eta = \inf[\|u\| : u \in C]$ and consider a minimizing sequence $\{u_n\}_{n\ge 1} \subseteq C$, that is, $\|u_n\| \searrow \eta$ as $n \to \infty$. From the parallelogram law (see Proposition 3.5.6(a)), one gets for $m > n$ that
$$\|u_m - u_n\|^2 = 2\|u_m\|^2 + 2\|u_n\|^2 - 4\left\|\frac{u_m + u_n}{2}\right\|^2 \le 2\|u_m\|^2 + 2\|u_n\|^2 - 4\eta^2,$$
since $C$ is convex. Hence $\|u_m - u_n\|^2 \to 0$ as $m, n \to \infty$ and so $\{u_n\}_{n\ge 1} \subseteq C$ is a Cauchy sequence. Thus $u_n \to u$ in $H$, $u \in C$ because $C$ is closed, and $\|u\| = \eta$. Now we show the uniqueness of this best approximation (minimum norm) point $u$. Suppose that some $v \in C$ satisfies $\|v\| = \eta$. A new application of the parallelogram law gives
$$0 \le \|u - v\|^2 = 2\|u\|^2 + 2\|v\|^2 - 4\left\|\frac{u + v}{2}\right\|^2 \le 4\eta^2 - 4\eta^2 = 0,$$
recall again the convexity of $C$. Then $u = v$. So, $u = p_C(x)$ is the unique best approximation of $x$ in $C$.

Definition 3.5.19. The map $p_C : H \to C$ assigning to each $x \in H$ its unique best approximation from $C$ is called the metric projection of $H$ onto $C$.
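For simple closed convex sets in $\mathbb{R}^N$ the metric projection has an explicit form. The following minimal sketch assumes $H = \mathbb{R}^N$ with the Euclidean inner product and takes $C$ to be either a closed ball centered at the origin or a box $[\text{lo}, \text{hi}]^N$; the function names `project_ball` and `project_box` are illustrative choices, not notation from the text. The sampling loop merely checks, on randomly drawn points of the unit ball, that none of them is closer to $x$ than the computed projection.

```python
import math
import random

def norm(v):
    """Euclidean norm on R^N."""
    return math.sqrt(sum(c * c for c in v))

def project_ball(x, radius=1.0):
    """Metric projection onto the closed ball {u : ||u|| <= radius}."""
    n = norm(x)
    if n <= radius:
        return list(x)                    # points of C are left fixed
    return [radius * c / n for c in x]    # radial rescaling onto the sphere

def project_box(x, lo, hi):
    """Metric projection onto the box [lo, hi]^N (coordinatewise clipping)."""
    return [min(max(c, lo), hi) for c in x]

x = [3.0, -4.0]
p = project_ball(x)                       # approximately [0.6, -0.8]
best = norm([a - b for a, b in zip(x, p)])

# Sample points of the unit ball by rejection and compare distances to x.
random.seed(0)
closer_found = False
for _ in range(2000):
    u = [random.uniform(-1.0, 1.0), random.uniform(-1.0, 1.0)]
    if norm(u) > 1.0:
        continue                          # keep only points inside the ball
    if norm([a - b for a, b in zip(x, u)]) < best - 1e-12:
        closer_found = True

print(p, best, closer_found)
print(project_box([2.0, -3.0, 0.5], -1.0, 1.0))
```

Random sampling can only fail to find a counterexample; the actual minimality is what Theorem 3.5.18 guarantees.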


The next proposition establishes the main properties of the metric projection map.

Proposition 3.5.20. Let $H$ be a Hilbert space, let $C \subseteq H$ be nonempty, closed, and convex, and let $p_C : H \to C$ be the metric projection map. Then the following hold:
(a) $p_C\big|_C = \operatorname{id}_C$;
(b) if $x \in H \setminus C$, then $p_C(x) \in \operatorname{bd} C$;
(c) $(x - p_C(x), u - p_C(x)) \le 0$ for all $u \in C$;
(d) $\|p_C(x) - p_C(y)\| \le \|x - y\|$ for all $x, y \in H$;
(e) if $C$ is a closed vector subspace of $H$, then $x - p_C(x) \perp C$ and $p_C \in L(H)$.

Proof. (a) This is obvious.

(b) Let $t \in (0, 1)$ and let $x_t = (1 - t)x + tp_C(x)$. We get
$$\|x - x_t\| = t\|x - p_C(x)\| < \|x - p_C(x)\|.$$
So, if $p_C(x) \in \operatorname{int} C$, then for $t \in (0, 1)$ close to 1 it follows $x_t \in C$, a contradiction. Hence $p_C(x) \in \operatorname{bd} C$.

(c) Let $x \in H$, $u \in C$ and $t \in (0, 1)$. The convexity of $C$ implies
$$\|x - p_C(x)\|^2 \le \|x - ((1 - t)p_C(x) + tu)\|^2 = \|x - p_C(x) - t(u - p_C(x))\|^2 = \|x - p_C(x)\|^2 - 2t(x - p_C(x), u - p_C(x)) + t^2\|u - p_C(x)\|^2,$$
which implies $2(x - p_C(x), u - p_C(x)) \le t\|u - p_C(x)\|^2$. We let $t \searrow 0$ and obtain $(x - p_C(x), u - p_C(x)) \le 0$ for all $u \in C$.

(d) Let $x, y \in H$. Using part (c) with $u = p_C(y) \in C$ it follows that
$$(x - p_C(x), p_C(y) - p_C(x)) \le 0. \tag{3.5.4}$$
Reversing the roles of $x, y \in H$ we also obtain
$$(y - p_C(y), p_C(x) - p_C(y)) \le 0. \tag{3.5.5}$$
Adding (3.5.4) and (3.5.5) yields
$$(x - y, p_C(y) - p_C(x)) + (p_C(y) - p_C(x), p_C(y) - p_C(x)) \le 0,$$
which leads to
$$\|p_C(x) - p_C(y)\|^2 \le \|x - y\|\,\|p_C(x) - p_C(y)\|;$$
see Proposition 3.5.3. This finally gives $\|p_C(x) - p_C(y)\| \le \|x - y\|$ for all $x, y \in H$.

(e) For every $u \in C$ and $\vartheta > 0$ we get
$$\|x - p_C(x)\|^2 \le \|x - [p_C(x) + \vartheta(\pm u)]\|^2 = \|x - p_C(x)\|^2 \mp 2\vartheta(x - p_C(x), u) + \vartheta^2\|u\|^2,$$
which turns into
$$\pm 2(x - p_C(x), u) \le \vartheta\|u\|^2.$$

Letting $\vartheta \searrow 0$, it results in $\pm(x - p_C(x), u) \le 0$ for all $u \in C$ and so $(x - p_C(x), u) = 0$ for all $u \in C$ since $C \subseteq H$ is a subspace. This gives
$$x - p_C(x) \perp C. \tag{3.5.6}$$
Finally note that using (3.5.6), for all $x, y \in H$, leads to
$$(p_C(x + y) - (p_C(x) + p_C(y)), u) = 0 \quad\text{for all } u \in C.$$
Hence, $p_C(x + y) = p_C(x) + p_C(y)$, that is, $p_C$ is additive. Clearly $p_C(0) = 0$ and for all $\lambda \in \mathbb{R} \setminus \{0\}$ it follows that $(p_C(\lambda x) - \lambda p_C(x), u) = 0$ for all $u \in C$, which shows that $p_C(\lambda x) = \lambda p_C(x)$, that is, $p_C$ is homogeneous. Therefore $p_C \in L(H)$.

A remarkable application of this result is a characterization of the topological dual of a Hilbert space. The result is known as the "Riesz–Fréchet Representation Theorem for Hilbert Spaces."

Theorem 3.5.21 (Riesz–Fréchet Representation Theorem for Hilbert Spaces). If $H$ is a Hilbert space and $x^* \in H^*$, then there exists a unique $x_0 \in H$ such that $\langle x^*, y\rangle = (x_0, y)$ for all $y \in H$ and $\|x^*\|_* = \|x_0\|$.

Proof. Let $V = (x^*)^{-1}(0)$. This is a closed subspace of $H$. We may assume that $V \ne H$; otherwise $x^* = 0$ and the result is trivially true with $x_0 = 0$. Let $u_0 \in H \setminus V$, $u_1 = p_V(u_0)$ and $u = (u_0 - u_1)/\|u_0 - u_1\|$. Then $\|u\| = 1$ and $(u, x) = 0$ for all $x \in V$; see Proposition 3.5.20(e). Therefore $u \notin V$. For any $y \in H$, we set
$$z = y - tu \quad\text{with}\quad t = \frac{\langle x^*, y\rangle}{\langle x^*, u\rangle}.$$
Note that $\langle x^*, u\rangle \ne 0$ since $u \notin V$. Then $\langle x^*, z\rangle = 0$ and so $z \in V$. Therefore $(u, z) = 0$, which implies that
$$\langle x^*, y\rangle = \langle x^*, u\rangle(u, y) \quad\text{for all } y \in H. \tag{3.5.7}$$
So if we set $x_0 = \langle x^*, u\rangle u$, then it follows that $\langle x^*, y\rangle = (x_0, y)$ for all $y \in H$. Clearly this $x_0$ is unique. Evidently, one gets $\|x_0\| \le |\langle x^*, u\rangle|\,\|u\| = |\langle x^*, u\rangle| \le \|x^*\|_*$. Moreover, from (3.5.7) and Proposition 3.5.3 we conclude that $\|x^*\|_* \le |\langle x^*, u\rangle|\,\|u\| = \|x_0\|$, which implies $\|x^*\|_* = \|x_0\|$.

Remark 3.5.22. According to this theorem there is a surjective linear isometry from $H^*$ onto $H$. This means that we can identify $H^*$ with $H$, that is, a Hilbert space is self-dual. However, it is not always possible to do this identification. This is the case of evolution triples, which we will discuss in Section 4.2.

Definition 3.5.23. Let $H$ be a Hilbert space and $C \subseteq H$. The orthogonal complement $C^\perp$ of $C$ is the set $C^\perp = \{x \in H : (x, u) = 0 \text{ for all } u \in C\}$.


On account of Theorem 3.5.21 the orthogonal complement of $C$ is simply the annihilator of $C$ introduced in Definition 3.2.24. Evidently $C^\perp$ is a closed vector subspace of $H$. Moreover, $C^{\perp\perp} = (C^\perp)^\perp$.

Remark 3.5.24. Clearly $\{0\}^\perp = H$, $H^\perp = \{0\}$. Moreover $C \perp C^\perp$, $C \cap C^\perp \subseteq \{0\}$ and if $0 \in C$, then $C \cap C^\perp = \{0\}$. Also, if $C, D \subseteq H$ are nonempty sets, then $C \perp D$ if and only if $C \subseteq D^\perp$. Since $\perp$ is a symmetric relation, that is, $C \perp D$ if and only if $D \perp C$, we also obtain that $D \subseteq C^\perp$. Moreover, $C \perp D$ implies that $C \cap D \subseteq \{0\}$. We can easily see that
$$C \subseteq D \text{ implies that } D^\perp \subseteq C^\perp \text{ and } C^{\perp\perp} \subseteq D^{\perp\perp}, \qquad C^\perp = (\operatorname{span} C)^\perp = (\overline{\operatorname{span}}\,C)^\perp. \tag{3.5.8}$$
In addition, since $C \perp C^\perp$ and $C^\perp \perp C^{\perp\perp}$, we derive that $C \subseteq C^{\perp\perp}$ and $C^\perp \subseteq C^{\perp\perp\perp}$; here $C^{\perp\perp\perp} = (C^{\perp\perp})^\perp$. Therefore we have $C \subseteq C^{\perp\perp}$ and $C^\perp = C^{\perp\perp\perp}$. Finally, if $C \subseteq H$ is a vector subspace, then $C^{\perp\perp} = \overline{C}$ and $C^\perp = \{0\}$ if and only if $C$ is dense in $H$.

Proposition 3.5.25. If $H$ is a Hilbert space and $V$ is a closed vector subspace of $H$, then $H = V \oplus V^\perp$; see Definition 3.2.27.

Proof. It is easy to see that $V \oplus V^\perp$ is a closed vector subspace of $H$. Suppose that $H \ne V \oplus V^\perp$. Then there exists $u \in H$, $u \ne 0$ such that $u \perp V \oplus V^\perp$. We have $u \in V^\perp \cap V^{\perp\perp} = \{0\}$, a contradiction. Therefore, $H = V \oplus V^\perp$.

From Propositions 3.5.20 and 3.5.25 we infer at once the so-called "Projection Theorem."

Theorem 3.5.26 (Projection Theorem). If $H$ is a Hilbert space and $V$ is a closed vector subspace of $H$, then there exists a unique pair of continuous linear operators $P : H \to V$ and $Q : H \to V^\perp$ such that
(a) $x \in V$ implies that $P(x) = x$, $Q(x) = 0$ and $y \in V^\perp$ implies that $P(y) = 0$, $Q(y) = y$;
(b) $P(x) = p_V(x)$ and $Q(x) = p_{V^\perp}(x)$;
(c) for all $x \in H$ one has $\|x\|^2 = \|P(x)\|^2 + \|Q(x)\|^2$.

Now we turn our attention to orthogonal sets that lead to bases for Hilbert spaces. First we recall the following basic notion from linear algebra.

Definition 3.5.27. Let $X$ be a vector space and $C \subseteq X$. We say that $C$ is linearly independent if every $x \in C$ is not a linear combination of vectors in $C \setminus \{x\}$, that is, $x \notin \operatorname{span}[C \setminus \{x\}]$. A set $C \subseteq X$ that is not linearly independent is said to be linearly dependent.

Remark 3.5.28. The empty set $\emptyset$ is linearly independent. Also, the singleton $C = \{x\}$, $x \ne 0$, is linearly independent. Any set $C \subseteq X$ which contains the origin is linearly dependent. Finally, $C \subseteq X$ is linearly independent if and only if every finite subset of $C$ is linearly independent.

Proposition 3.5.29. If $H$ is an inner product space and $C \subseteq H$ is an orthogonal set consisting of nonzero vectors, then $C$ is linearly independent.

Proof. Arguing by contradiction, suppose that there are finitely many distinct vectors $x_0, x_1, \ldots, x_n \in C$ with $n \ge 1$ such that $x_0 = \sum_{k=1}^n \lambda_k x_k$ with $\lambda_k \in \mathbb{R}$, $k = 1, \ldots, n$; see Remark 3.5.28. Exploiting the orthogonality of the set $C$ yields $\|x_0\|^2 = \sum_{k=1}^n \lambda_k (x_k, x_0) = 0$, a contradiction.

Definition 3.5.30. Let $H$ be an inner product space and $C \subseteq H$. We say that $C$ is an orthonormal set if it is an orthogonal set consisting of vectors with unit norm, that is, unit vectors.

Remark 3.5.31. Every orthogonal set consisting of nonzero vectors can be normalized. Indeed, if $C$ is an orthogonal set such that $x \ne 0$ for all $x \in C$, then $\{x/\|x\| : x \in C\}$ is an orthonormal set.

From Proposition 3.5.29 we directly obtain the following result.

Proposition 3.5.32. If $H$ is an inner product space and $C \subseteq H$ is an orthonormal set, then $C$ is linearly independent.

The next proposition is an immediate consequence of Definition 3.5.30.

Proposition 3.5.33. If $H$ is an inner product space, $C \subseteq H$ is an orthonormal set, and $x \in H$ with $\|x\| = 1$ and $x \perp C$, then $C \cup \{x\}$ is an orthonormal set as well.

Definition 3.5.34. Let $H$ be an inner product space and let $\mathcal{L}$ be the family of all orthonormal subsets of $H$. Evidently $\mathcal{L} \ne \emptyset$ since $C = \{x\}$ with $\|x\| = 1$ is orthonormal. A set $C \in \mathcal{L}$ is maximal orthonormal if there is no set $C' \in \mathcal{L}$ such that $C \ne C'$ and $C \subseteq C'$.

Proposition 3.5.35. If $H$ is an inner product space and $C \subseteq H$ is an orthonormal set, then the following statements are equivalent:
(a) $C$ is a maximal orthonormal set.
(b) There is no unit vector $x \in H$, that is, $\|x\| = 1$, such that $C \cup \{x\}$ is an orthonormal set.
(c) $C^\perp = \{0\}$.

Proof. (a) $\Rightarrow$ (b): Otherwise $C \cup \{x\}$ contradicts the maximality of $C$.
(b) $\Rightarrow$ (c): Arguing by contradiction, let $y \in H$, $y \ne 0$, be such that $y \perp C$. Let $x = y/\|y\|$. Then $\|x\| = 1$ and $C \cup \{x\}$ is an orthonormal set, a contradiction.
(c) $\Rightarrow$ (a): Arguing by contradiction, suppose that there exists an orthonormal set $C' \subseteq H$ such that $C \subseteq C'$ and $C' \setminus C \ne \emptyset$. Let $x \in C' \setminus C$. Then $\|x\| = 1$ with $x \perp C$, a contradiction.

Proposition 3.5.36. If $H$ is an inner product space and $C \subseteq H$ is an orthonormal set, then there exists a maximal orthonormal set $C_0 \subseteq H$ such that $C \subseteq C_0$.

Proof. Let $\mathcal{L}_C = \{D \in 2^H : D \text{ is an orthonormal set}, C \subseteq D\}$ and let $\mathcal{D}$ be a chain in $\mathcal{L}_C$. Let $\bigcup \mathcal{D} = \bigcup_{D \in \mathcal{D}} D$ and consider $x, u \in \bigcup \mathcal{D}$ with $x \ne u$. Then $x \in D_x \in \mathcal{D}$ and $u \in D_u \in \mathcal{D}$. Since $\mathcal{D}$ is a chain, we may assume that $D_x \subseteq D_u$. Hence $x, u \in D_u$, so $x \perp u$, and therefore $\bigcup \mathcal{D} \in \mathcal{L}_C$. Invoking Zorn’s Lemma (see Section 1.4), $\mathcal{L}_C$ has a maximal element $C_0$ such that $C \subseteq C_0$ and $C_0$ is orthonormal. If we can find a unit vector $x \in H$ such that $C_0 \cup \{x\}$ is

orthonormal, then $C_0 \cup \{x\} \in \mathcal{L}_C$ and this contradicts the maximality of $C_0$. This proves that $C_0$ is a maximal orthonormal set.

Now that we have established that maximal orthonormal sets exist, we can show that they span $H$.

Proposition 3.5.37. If $H$ is an inner product space and $C \subseteq H$ is an orthonormal set, then the following hold:
(a) $\overline{\operatorname{span}}\, C = H$ implies that $C$ is maximal orthonormal.
(b) If $H$ is a Hilbert space and $C \subseteq H$ is maximal orthonormal, then $\overline{\operatorname{span}}\, C = H$.

Proof. (a) From (3.5.8) we know that $C^\perp = (\overline{\operatorname{span}}\, C)^\perp = H^\perp = \{0\}$. Then Proposition 3.5.35 implies that $C$ is maximal orthonormal.
(b) From Proposition 3.5.35 and (3.5.8), we deduce that $\{0\} = C^\perp = (\overline{\operatorname{span}}\, C)^\perp$. Hence $\overline{\operatorname{span}}\, C = H$; see Remark 3.5.24.

Definition 3.5.38. Let $H$ be an inner product space. A set $B \subseteq H$ is an orthonormal basis of $H$ if the following hold:
(a) $B$ is an orthonormal set.
(b) $\overline{\operatorname{span}}\, B = H$.

Remark 3.5.39. According to Propositions 3.5.36 and 3.5.37, every Hilbert space admits an orthonormal basis. In fact, for Hilbert spaces, the notions of maximal orthonormal set and of orthonormal basis coincide. That is, if $H$ is a Hilbert space, then $B \subseteq H$ is a maximal orthonormal set if and only if $B \subseteq H$ is an orthonormal basis. In a finite dimensional Hilbert space all orthonormal bases are finite and have cardinality equal to the dimension of the space.

The next proposition establishes a fundamental inequality for inner product spaces known as “Bessel’s inequality.” First let us see how we interpret summation over an arbitrary index set.

Definition 3.5.40. Let $(X, \tau)$ be a Hausdorff topological vector space, $I$ be an arbitrary index set, and $I \ni \alpha \to x_\alpha \in X$ be a map. Then the sum $\sum_{\alpha \in I} x_\alpha$ is defined as follows: Let $\mathcal{F}$ be the family of all finite subsets of $I$ ordered by inclusion. Then $\sum_{\alpha \in I} x_\alpha = x$ if and only if the net $\{\sum_{\alpha \in F} x_\alpha\}_{F \in \mathcal{F}}$ $\tau$-converges to $x$. This is called unconditional convergence since it does not depend on any ordering on the index set $I$. If $I = \mathbb{N}$, then $\sum_{n \ge 1} x_n = x$ means that $\sum_{n=1}^m x_n \xrightarrow{\tau} x$ as $m \to \infty$. Then the series $\sum_{n \ge 1} (-1)^n \frac{1}{n}$ is convergent but not unconditionally convergent.
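The closing example of Definition 3.5.40 can be explored numerically. The following is a minimal sketch, not part of the original text: it sums the terms $(-1)^n/n$ once in the natural order and once in a rearranged order (two odd-index terms followed by one even-index term). The function names are illustrative choices, and the limits $-\ln 2$ and $-\tfrac32 \ln 2$ are the classical values for these two orderings.

```python
import math

def natural_order_sum(m):
    """Partial sum of sum_{n>=1} (-1)^n / n taken in the order n = 1, ..., m."""
    return sum((-1) ** n / n for n in range(1, m + 1))

def rearranged_sum(blocks):
    """Same terms, reordered: two odd-index (negative) terms, then one
    even-index (positive) term, repeated `blocks` times."""
    total = 0.0
    for k in range(1, blocks + 1):
        total += -1.0 / (4 * k - 3) - 1.0 / (4 * k - 1) + 1.0 / (2 * k)
    return total

if __name__ == "__main__":
    # The two orderings approach different limits, so the sum depends on
    # the ordering and the series is not unconditionally convergent.
    print(natural_order_sum(100000), -math.log(2))
    print(rearranged_sum(100000), -1.5 * math.log(2))
```

Since every index $n$ appears exactly once in the rearranged order, both functions sum the same family of terms; only the ordering differs.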

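Remark 3.5.41. If $X = \mathbb{R}$, then $\sum_{\alpha \in I} x_\alpha = x \in \mathbb{R}$ means that for a given $\varepsilon > 0$ there exists a finite set $F \subseteq I$ such that $|x - \sum_{\alpha \in G} x_\alpha| \le \varepsilon$ for all finite $G$ with $F \subseteq G \subseteq I$. On the other hand $\sum_{\alpha \in I} x_\alpha = +\infty$ means that for any given $M > 0$ we can find a finite set $F \subseteq I$ such that $\sum_{\alpha \in G} x_\alpha \ge M$ for all finite $G$ with $F \subseteq G \subseteq I$. Also recall that absolutely convergent series can be rearranged (see Amann–Escher [8, p. 201]) and we mention a remarkable result known as the “Orlicz–Pettis Theorem,” which says that a series $\sum_{n \ge 1} x_n$ in a Banach space $X$ is weakly unconditionally convergent if and only if it is strongly unconditionally convergent. Finally we mention another important result, the “Dvoretzky–Rogers Theorem,” which says that if $X$ is infinite dimensional, then there exists a sequence $\{x_n\}_{n \ge 1} \subseteq X$ such that $\sum_{n \ge 1} x_n$ is unconditionally convergent and $\sum_{n \ge 1} \|x_n\| = +\infty$.

Lemma 3.5.42. If $X = \mathbb{R}$ and $\{x_\alpha\}_{\alpha \in I} \subseteq [0, +\infty)$, then

$$\sum_{\alpha \in I} x_\alpha = \sup \Bigl[ \sum_{\alpha \in F} x_\alpha : F \subseteq I \text{ is finite} \Bigr] .$$

Proof. First suppose that $\sum_{\alpha \in I} x_\alpha < +\infty$. Then for a given $\varepsilon > 0$ there exists a finite set $F \subseteq I$ such that

$$\sum_{\alpha \in F} x_\alpha \ge \sum_{\alpha \in I} x_\alpha - \varepsilon .$$

Hence,

$$\sum_{\alpha \in I} x_\alpha \ge \sum_{\alpha \in G} x_\alpha \ge \sum_{\alpha \in F} x_\alpha \ge \sum_{\alpha \in I} x_\alpha - \varepsilon \quad \text{for all finite } F \subseteq G \subseteq I .$$

Therefore

$$\sum_{\alpha \in I} x_\alpha = \sup \Bigl[ \sum_{\alpha \in F} x_\alpha : F \subseteq I \text{ finite} \Bigr] .$$

Now assume that $\sum_{\alpha \in I} x_\alpha = +\infty$. Then for any given $M > 0$ there exists a finite $F \subseteq I$ such that $\sum_{\alpha \in F} x_\alpha \ge M$, which implies that $\sum_{\alpha \in G} x_\alpha \ge M$ for all finite $F \subseteq G \subseteq I$. Hence,

$$\sup \Bigl[ \sum_{\alpha \in F} x_\alpha : F \subseteq I \text{ finite} \Bigr] = +\infty .$$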

Remark 3.5.43. If $I$ is uncountable and uncountably many $x_\alpha$ are different from zero, then $\sum_{\alpha \in I} x_\alpha$ cannot converge to a finite limit.

The next result is a fundamental inequality in the theory of Hilbert spaces and is known as “Bessel’s inequality.”

Proposition 3.5.44 (Bessel’s inequality). If $H$ is an inner product space and $\{x_\alpha\}_{\alpha \in I} \subseteq H$ is an orthonormal set, then $\sum_{\alpha \in I} |(x, x_\alpha)|^2 \le \|x\|^2$ for all $x \in H$.

Proof. On account of Lemma 3.5.42, we may assume that $I$ is finite. Let $u = \sum_{\alpha \in I} (x, x_\alpha) x_\alpha$. Then $(x, u) = \sum_{\alpha \in I} |(x, x_\alpha)|^2 = (u, u)$; see Theorem 3.5.13. Therefore $x - u \perp u$ and so $\|x\|^2 = \|x - u\|^2 + \|u\|^2$ due to the Generalized Pythagorean Theorem; see Theorem 3.5.13. Hence, $\|x\|^2 \ge \|u\|^2 = (u, u) = \sum_{\alpha \in I} |(x, x_\alpha)|^2$.

Corollary 3.5.45. If $H$ is an inner product space and $\{x_\alpha\}_{\alpha \in I} \subseteq H$ is an orthonormal set, then for every $x \in H$ the set $\{\alpha \in I : (x, x_\alpha) \ne 0\}$ is countable.

Remark 3.5.46. We have already mentioned that every Hilbert space has an orthonormal basis. In fact all orthonormal bases of a Hilbert space have the same cardinality, that is, every maximal orthonormal set in a Hilbert space has the same cardinality.
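Bessel’s inequality (Proposition 3.5.44) is easy to check in a finite dimensional example. The sketch below is an illustration only and works in $\mathbb{R}^3$ with the Euclidean inner product; the helper names `inner` and `bessel_sum` and the chosen vectors are assumptions made for this example.

```python
import math

def inner(x, y):
    """Euclidean inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))

def bessel_sum(x, orthonormal_set):
    """Sum of |(x, e)|^2 over the vectors e of an orthonormal set."""
    return sum(inner(x, e) ** 2 for e in orthonormal_set)

s = 1.0 / math.sqrt(2.0)
E = [(s, s, 0.0), (s, -s, 0.0)]      # orthonormal, but not maximal in R^3
E_full = E + [(0.0, 0.0, 1.0)]       # an orthonormal basis of R^3
x = (1.0, 2.0, 3.0)

if __name__ == "__main__":
    print(bessel_sum(x, E), "<=", inner(x, x))   # strict inequality here
    print(bessel_sum(x, E_full), inner(x, x))    # equal up to rounding
```

For the non-maximal set `E` the inequality is strict, while for the basis `E_full` both sides agree up to rounding error.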

Proposition 3.5.47. If $H$ is a separable inner product space, then every orthonormal set in $H$ is countable.

Proof. Let $B = \{x_\alpha\}_{\alpha \in I} \subseteq H$ be an orthonormal set and let $D = \{u_n\}_{n \ge 1} \subseteq H$ be dense, which is possible since $H$ is separable. Then for any $\alpha \in I$, $B_{1/2}(x_\alpha) \cap D \ne \emptyset$. So, there exists $n_\alpha \in \mathbb{N}$ such that $\|x_\alpha - u_{n_\alpha}\| \le 1/2$. Let $\varphi : I \to \mathbb{N}$ be defined by $\varphi(\alpha) = n_\alpha$. We claim that $\varphi$ is injective. Using the Generalized Pythagorean Theorem and the triangle inequality, it follows for $\alpha \ne \beta$ that

$$\sqrt{2} = \|x_\alpha - x_\beta\| = \|x_\alpha - u_{n_\alpha} - x_\beta + u_{n_\beta} + u_{n_\alpha} - u_{n_\beta}\| \le \|x_\alpha - u_{n_\alpha}\| + \|x_\beta - u_{n_\beta}\| + \|u_{n_\alpha} - u_{n_\beta}\| \le 1 + \|u_{n_\alpha} - u_{n_\beta}\| .$$

Hence, $\sqrt{2} - 1 \le \|u_{n_\alpha} - u_{n_\beta}\|$ for all $\alpha, \beta \in I$ with $\alpha \ne \beta$. This proves the injectivity of $\varphi$, which means that $\operatorname{card} I \le \operatorname{card} \mathbb{N}$ and so $I$ is countable.

This leads to the following useful characterization of separable Hilbert spaces.

Theorem 3.5.48. A Hilbert space $H$ is separable if and only if it has a countable orthonormal basis.

Given a linearly independent sequence one can produce an orthonormal set with the same linear span. The process to achieve this is known as the “Gram–Schmidt Orthonormalization Process.”

Proposition 3.5.49 (Gram–Schmidt Orthonormalization Process). If $H$ is an inner product space and $\{u_n\}_{n \ge 1} \subseteq H$ are linearly independent, then there exists an orthonormal sequence $\{x_n\}_{n \ge 1} \subseteq H$ such that $\operatorname{span}\{u_n\}_{n \ge 1} = \operatorname{span}\{x_n\}_{n \ge 1}$.

Proof. Let $x_1 = u_1/\|u_1\|$. So, the result holds for $n = 1$. Proceeding by induction suppose that we have produced $x_1, \ldots, x_{n-1}$. Then we set

$$h_n = u_n - \sum_{k=1}^{n-1} (u_n, x_k) x_k .$$

Evidently $h_n \perp x_k$ for all $k = 1, \ldots, n-1$ and $h_n \ne 0$ since $u_n \notin \operatorname{span}\{u_k\}_{k=1}^{n-1}$, due to the linear independence of the sequence $\{u_n\}_{n \ge 1} \subseteq H$. According to the induction hypothesis, we have $\operatorname{span}\{u_k\}_{k=1}^{n-1} = \operatorname{span}\{x_k\}_{k=1}^{n-1}$. Let $x_n = h_n/\|h_n\|$. Then by induction we have produced the desired orthonormal set $\{x_n\}_{n \ge 1} \subseteq H$.

We conclude this section with a brief look at the notion of the basis for a vector space $X$. If $X$ is finite dimensional, then it is well-known that a basis is a set $\{e_k\}_{k=1}^n$ such that every $x \in X$ can be written in a unique way as $x = \sum_{k=1}^n \lambda_k e_k$ with $\lambda_k \in \mathbb{R}$ known as the coordinates of $x$ for the given basis. How do we extend this notion to infinite dimensional vector spaces?

Definition 3.5.50. (a) Given a vector space $X$, a Hamel basis is a set $\{e_\alpha\}_{\alpha \in I} \subseteq X$ such that every $x \in X$ can be written in a unique way as $x = \sum_{\alpha \in I} \lambda_\alpha e_\alpha$ with only finitely many of the real numbers $\lambda_\alpha$ different from zero. If $X$ is finite dimensional, then a Hamel basis is the usual basis. But in infinite dimensional spaces there are no obvious Hamel bases although they can be shown to exist via Zorn’s Lemma.
(b) Let $X$ be a Banach space. A sequence $\{x_n\}_{n \ge 1} \subseteq X$ is a Schauder basis for $X$ if for each $x \in X$ there exists a unique sequence $\{\lambda_n\}_{n \ge 1} \subseteq \mathbb{R}$ such that $x = \sum_{n \ge 1} \lambda_n x_n$.

Remark 3.5.51. The Hamel basis is an algebraic notion that does not relate to any topology. A Banach space with a Schauder basis is necessarily separable. Banach [25, p. 111] asked if every infinite dimensional separable Banach space has a Schauder basis. This question was settled in the negative by Enflo [104] who produced a separable reflexive Banach space with no Schauder basis.
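The Gram–Schmidt process of Proposition 3.5.49 can be sketched for finitely many vectors in $\mathbb{R}^n$ as follows. This is a minimal illustration under stated assumptions: the Euclidean inner product is used, the helper names and the tolerance `tol` for detecting linear dependence are choices made for this example, and no attention is paid to numerical stability.

```python
import math

def inner(x, y):
    """Euclidean inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))

def gram_schmidt(vectors, tol=1e-12):
    """Orthonormalize linearly independent vectors u_1, ..., u_m.

    Implements h_n = u_n - sum_{k<n} (u_n, x_k) x_k and x_n = h_n / ||h_n||.
    Raises ValueError if some h_n is numerically zero, that is, if the
    input vectors are (numerically) linearly dependent.
    """
    basis = []
    for u in vectors:
        h = list(u)
        for x in basis:
            coeff = inner(u, x)                  # the coefficient (u_n, x_k)
            h = [hi - coeff * xi for hi, xi in zip(h, x)]
        norm = math.sqrt(inner(h, h))
        if norm <= tol:
            raise ValueError("input vectors are linearly dependent")
        basis.append([hi / norm for hi in h])
    return basis

if __name__ == "__main__":
    for x in gram_schmidt([(1.0, 1.0, 0.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]):
        print(x)
```

Note that the coefficient of $x_k$ in $h_n$ is $(u_n, x_k)$, which is what makes $h_n$ orthogonal to every previously constructed $x_k$.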

3.6 Bounded and Unbounded Linear Operators

Let $X, Y$ be Banach spaces. Recall that by $L(X, Y)$ we denote the Banach space of all bounded linear operators from $X$ into $Y$. The norm of $L(X, Y)$ is defined by

$$\|A\|_L = \sup \left[ \frac{\|A(x)\|_Y}{\|x\|_X} : x \in X \setminus \{0\} \right] ; \tag{3.6.1}$$

see Definition 3.1.45. If $X = Y$, we write $L(X, X) = L(X)$.

Definition 3.6.1. (a) The norm (metric) topology induced on $L(X, Y)$ by the norm $\|\cdot\|_L$ (see (3.6.1)) is called the uniform operator topology or simply the norm topology.
(b) The strong operator topology on $L(X, Y)$ is the weakest topology on $L(X, Y)$ for which the maps $e_x : L(X, Y) \to Y$ with $x \in X$ defined by $e_x(A) = A(x)$ for all $A \in L(X, Y)$ are continuous. Then a local basis at the origin consists of the sets $\{A \in L(X, Y) : \|A(x_k)\|_Y < \varepsilon \text{ for } k = 1, \ldots, n\}$ with $n \in \mathbb{N}$, $x_1, \ldots, x_n \in X$ and $\varepsilon > 0$. A net $\{A_\alpha\}_{\alpha \in I} \subseteq L(X, Y)$ converges to $A \in L(X, Y)$ in this topology if and only if $\|A_\alpha(x) - A(x)\|_Y \to 0$ for all $x \in X$. We write $A_\alpha \xrightarrow{s} A$ in $L(X, Y)$.
(c) The weak operator topology on $L(X, Y)$ is the weakest topology on $L(X, Y)$ for which the maps $e_{x,y^*} : L(X, Y) \to \mathbb{R}$ with $x \in X$ and $y^* \in Y^*$ defined by $e_{x,y^*}(A) = \langle y^*, A(x) \rangle$ are continuous. Then a local basis at the origin consists of the sets $\{A \in L(X, Y) : |\langle y_i^*, A(x_k) \rangle| < \varepsilon \text{ for } k = 1, \ldots, n,\ i = 1, \ldots, m\}$ with $n, m \in \mathbb{N}$, $x_1, \ldots, x_n \in X$, $y_1^*, \ldots, y_m^* \in Y^*$ and $\varepsilon > 0$. A net $\{A_\alpha\}_{\alpha \in I} \subseteq L(X, Y)$ converges to $A \in L(X, Y)$ in this topology if and only if $|\langle y^*, A_\alpha(x) \rangle - \langle y^*, A(x) \rangle| \to 0$ for all $x \in X$, $y^* \in Y^*$. We write $A_\alpha \xrightarrow{w} A$ in $L(X, Y)$.
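As an illustration of Definition 3.6.1, the following worked example (a standard one, not taken from the text above) sketches why the three topologies differ on $L(l^2)$ for the real sequence space $l^2$. The operators $P_n$ and $S$ are names introduced only for this example.

```latex
% Truncation operators: strong but not norm convergence.
Let $P_n \in L(l^2)$ be defined by
$P_n(\hat{x}) = (x_1, \ldots, x_n, 0, 0, \ldots)$
for $\hat{x} = (x_k)_{k \ge 1} \in l^2$. Then
\[
  \|P_n(\hat{x}) - \hat{x}\|^2 = \sum_{k > n} |x_k|^2 \to 0
  \quad \text{as } n \to \infty ,
\]
so $P_n \xrightarrow{s} I$. On the other hand, for the unit vector $e_{n+1}$
we have $\|(I - P_n)(e_{n+1})\| = 1$, hence $\|P_n - I\|_L \ge 1$ for all $n$
and $P_n \not\to I$ in the norm topology.

% Powers of the right shift: weak but not strong convergence.
Let $S(\hat{x}) = (0, x_1, x_2, \ldots)$. For $\hat{x}, \hat{y} \in l^2$,
\[
  |(\hat{y}, S^n(\hat{x}))| = \Bigl| \sum_{k \ge 1} y_{n+k} x_k \Bigr|
  \le \Bigl( \sum_{k > n} |y_k|^2 \Bigr)^{1/2} \|\hat{x}\| \to 0 ,
\]
so $S^n \xrightarrow{w} 0$, while $\|S^n(\hat{x})\| = \|\hat{x}\|$ for all $n$,
so $S^n \not\to 0$ in the strong operator topology.
```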

Remark 3.6.2. Evidently it holds that weak operator topology $\subseteq$ strong operator topology $\subseteq$ norm topology. We should not confuse the weak operator topology with the weak topology that we can define on the Banach space $L(X, Y)$. Let $V$ be a third Banach space and consider the map $\vartheta : L(X, Y) \times L(Y, V) \to L(X, V)$ defined by $\vartheta(A, B) = B \circ A$. Then $\vartheta$ is jointly continuous for the uniform operator topology but only separately continuous for the strong and weak operator topologies. In general, the strong and weak operator topologies are not first countable and this complicates their study.

Proposition 3.6.3. If $H$ is a Hilbert space and $\{A_n\}_{n \ge 1} \subseteq L(H)$ is a sequence such that $\{(y, A_n(x))\}_{n \ge 1}$ is convergent for all $x, y \in H$, then there exists $A \in L(H)$ such that $A_n \xrightarrow{w} A$.

Proof. For given $x, y \in H$ we derive $\sup_{n \ge 1} |(y, A_n(x))| < \infty$. Invoking Theorem 3.2.1 we obtain that $\sup_{n \ge 1} \|A_n(x)\| < \infty$. A second application of Theorem 3.2.1 gives $\sup_{n \ge 1} \|A_n\|_L < \infty$. Let $\xi(x, y) = \lim_{n \to \infty} (y, A_n(x))$. Evidently $\xi$ is bilinear and

$$|\xi(x, y)| \le \limsup_{n \to \infty} |(y, A_n(x))| \le \|y\| \|x\| \Bigl( \sup_{n \ge 1} \|A_n\|_L \Bigr) .$$

Hence $\xi$ is bounded. Then there exists $A \in L(H)$ such that $(y, A(x)) = \xi(x, y)$; see Theorem 3.5.21. Therefore we get $A_n \xrightarrow{w} A$ in $L(H)$.

In a similar way we obtain the corresponding result for the strong operator topology.

Proposition 3.6.4. If $X, Y$ are Banach spaces, $\{A_n\}_{n \ge 1} \subseteq L(X, Y)$ and $\{A_n(x)\}_{n \ge 1} \subseteq Y$ is a Cauchy sequence for each $x \in X$, then there exists $A \in L(X, Y)$ such that $A_n \xrightarrow{s} A$.

Remark 3.6.5. Both results fail for nets of operators.

Definition 3.6.6. Let $X, Y$ be normed spaces and $A \in L(X, Y)$. The adjoint (or dual) operator of $A$ is the unique operator $A^* : Y^* \to X^*$ defined by

$$A^*(y^*) = y^* \circ A \quad \text{for all } y^* \in Y^* .$$

Continuing, the second adjoint (or second dual or bidual) $(A^*)^*$ of $A$ is the unique linear map $A^{**} : X^{**} \to Y^{**}$ such that

$$A^{**}(x^{**}) = x^{**} \circ A^* \quad \text{for all } x^{**} \in X^{**} .$$

The next proposition summarizes the main properties of $A^*$ and $A^{**}$.

Proposition 3.6.7. If $X, Y$ are normed spaces and $A, S, T \in L(X, Y)$, then the following hold:
(a) $A^* \in L(Y^*, X^*)$ and $\|A^*\|_L = \|A\|_L$.
(b) If $\lambda_1, \lambda_2 \in \mathbb{R}$, then $(\lambda_1 S + \lambda_2 T)^* = \lambda_1 S^* + \lambda_2 T^*$.

(c) $A^{**}|_X = A$.
(d) If $V$ is a third normed space and $B \in L(Y, V)$, then $(B \circ A)^* = A^* \circ B^*$.
(e) If $A$ is invertible, that is, $A^{-1}$ exists and $A^{-1} \in L(Y, X)$, then $A^*$ is invertible as well and $(A^*)^{-1} = (A^{-1})^*$.

Proof. (a) For all $x \in X$ and for all $y^* \in Y^*$ one gets

$$\langle A^*(y^*), x \rangle = \langle y^*, A(x) \rangle \le \|y^*\|_* \|A(x)\|_Y \le \|y^*\|_* \|A\|_L \|x\|_X ,$$

hence $\|A^*(y^*)\|_* \le \|y^*\|_* \|A\|_L$, and so $\|A^*\|_L \le \|A\|_L$. Given $\varepsilon > 0$ there exists $x_0 \in X$ with $\|x_0\| = 1$ such that $\|A\|_L - \varepsilon \le \|A(x_0)\|$. Let $y^* \in Y^*$ with $\|y^*\|_* = 1$ such that $\langle y^*, A(x_0) \rangle = \|A(x_0)\|$; see Proposition 3.1.50. Then it follows that

$$\langle A^*(y^*), x_0 \rangle = \langle y^*, A(x_0) \rangle = \|A(x_0)\| \ge \|A\|_L - \varepsilon ,$$

which gives $\|A^*\|_L \ge \|A\|_L - \varepsilon$. Letting $\varepsilon \searrow 0$, we obtain $\|A^*\|_L \ge \|A\|_L$. Therefore, $A^* \in L(Y^*, X^*)$ and $\|A^*\|_L = \|A\|_L$.
(b) This follows immediately from Definition 3.6.6.
(c) This is also clear from Definition 3.6.6.
(d) For all $x \in X$ and for all $v^* \in V^*$ we derive

$$\langle A^*(B^*(v^*)), x \rangle = \langle B^*(v^*), A(x) \rangle = \langle v^*, B(A(x)) \rangle$$

and so we conclude that $A^* \circ B^* = (B \circ A)^*$.
(e) Since $A$ is invertible we have $A^{-1} \circ A = i_X$ and $A \circ A^{-1} = i_Y$. Then using part (d) we obtain

$$A^* \circ (A^{-1})^* = i_X^* = i_{X^*} \quad \text{and} \quad (A^{-1})^* \circ A^* = i_Y^* = i_{Y^*} .$$

Hence, $A^*$ is invertible and $(A^*)^{-1} = (A^{-1})^*$.

Remark 3.6.8. According to this proposition the map $A \to A^*$ from $L(X, Y)$ into $L(Y^*, X^*)$ is an isometric isomorphism. It is also continuous for the weak operator topologies but not for the strong operator topology. When $X = Y = H$ is a complex Hilbert space, that is, over $\mathbb{F} = \mathbb{C}$, then, since $H$ is self-dual, that is, $H = H^*$, we want to define $A^*$ on the space $H$. From the Riesz–Fréchet Representation Theorem (see Theorem 3.5.21), we know that $H$ is isometric with its dual $H^*$ but the isometry is a conjugate isomorphism $j : H \to H^*$. We set $A' = j^{-1} \circ A^* \circ j$ and get that

$$(x, A(y)) = \langle j(x), A(y) \rangle = \langle A^*(j(x)), y \rangle = (j^{-1}(A^*(j(x))), y) = (A'(x), y) \tag{3.6.2}$$

for all $x, y \in H$. Then $A' \in L(H)$ is the Hilbert space adjoint and now the map $A \to A'$ is conjugate linear, that is, $\lambda A \to \bar{\lambda} A'$ for all $\lambda \in \mathbb{C}$, because $A'$ is defined on $H$ rather than on $H^*$ and $H$ is identified with $H^*$ by a conjugate isometric isomorphism. However, in what follows for notational uniformity we denote $A'$ by $A^*$ with the understanding that $A^*$ is defined on $H$. When $H$ is a real Hilbert space, we define again $A' = A^*$ on $H$ as above.
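In the finite dimensional case $H = \mathbb{C}^n$ the Hilbert space adjoint of Remark 3.6.8 is the conjugate transpose of the matrix of $A$, and the identity (3.6.2) together with the conjugate linearity can be checked directly. The sketch below is an illustration only; it assumes the inner product $(x, y) = \sum_k x_k \overline{y_k}$ (the convention used in the text may place the conjugate on the other argument, which does not affect the two checks), and all function names are choices made for this example.

```python
def inner(x, y):
    """Inner product on C^n, linear in the first argument (an assumption)."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def apply(A, x):
    """Matrix-vector product A x for a matrix given as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def adjoint(A):
    """Conjugate transpose of A."""
    rows, cols = len(A), len(A[0])
    return [[A[i][j].conjugate() for i in range(rows)] for j in range(cols)]

def scale(lam, A):
    """Scalar multiple lam * A."""
    return [[lam * a for a in row] for row in A]

A = [[1 + 2j, 3j], [2, 1 - 1j]]
x = [1j, 2]
y = [1 - 1j, 3]
lam = 2 - 3j

if __name__ == "__main__":
    # (x, A y) and (A' x, y) agree, as in (3.6.2).
    print(inner(x, apply(A, y)), inner(apply(adjoint(A), x), y))
    # (lam A)' equals conj(lam) A', so A -> A' is conjugate linear.
    print(adjoint(scale(lam, A)), scale(lam.conjugate(), adjoint(A)))
```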

Proposition 3.6.9. If $H$ is a Hilbert space over $\mathbb{R}$ or $\mathbb{C}$ and if $A \in L(H)$, then $\|A\|_L^2 = \|A^* \circ A\|_L$.

Proof. Taking Proposition 3.6.7(a) and (3.6.2) into account yields

$$\|A\|_L^2 = \sup [\|A(x)\|^2 : \|x\| \le 1] = \sup [(A(x), A(x)) : \|x\| \le 1] = \sup [(A^*(A(x)), x) : \|x\| \le 1] \le \|A^* \circ A\|_L \le \|A^*\|_L \|A\|_L = \|A\|_L^2 .$$

Example 3.6.10. Section 4.1 shows that $(l^1)^* = l^\infty$. Consider the right shift operator $A \in L(l^1)$ defined by $A(\hat{x}) = (0, x_1, x_2, \ldots)$ for all $\hat{x} = (x_n)_{n \ge 1} \in l^1$. Then $A^* : l^\infty \to l^\infty$ is defined by $A^*(\hat{u}) = (u_2, u_3, \ldots)$ for all $\hat{u} = (u_n)_{n \ge 1} \in l^\infty$. In this case we have $\|A\|_L = \|A^*\|_L = 1$.

Proposition 3.6.11. If $X, Y$ are normed spaces and $A \in L(X, Y)$, then $A^* \in L(Y^*, X^*)$ is weak*-to-weak* continuous. Conversely, if $T : Y^* \to X^*$ is a weak*-to-weak* continuous linear operator, then there exists $A \in L(X, Y)$ such that $A^* = T$.

Proof. Let $\{y_\alpha^*\}_{\alpha \in I} \subseteq Y^*$ be a net such that $y_\alpha^* \xrightarrow{w^*} y^*$ in $Y^*$. Then for every $x \in X$, it follows that

$$\langle A^*(y_\alpha^*), x \rangle = \langle y_\alpha^*, A(x) \rangle \to \langle y^*, A(x) \rangle = \langle A^*(y^*), x \rangle ,$$

hence, $A^*(y_\alpha^*) \xrightarrow{w^*} A^*(y^*)$ and so $A^*$ is weak*-to-weak* continuous.

Let $j_X : X \to X^{**}$ and $j_Y : Y \to Y^{**}$ be the canonical embeddings; see Definition 3.3.35. For every $x \in X$, $j_X(x) \circ T$ is a $w^*$-continuous linear functional on $Y^*$, hence $j_X(x) \circ T \in j_Y(Y)$. Then $j_Y^{-1}(j_X(x) \circ T) \in Y$. So, we can define an operator $A : X \to Y$ by setting $A(x) = j_Y^{-1}(j_X(x) \circ T)$ for all $x \in X$. Clearly $A$ is linear. Moreover, let $\{x_\alpha\}_{\alpha \in I} \subseteq X$ be a net such that $x_\alpha \xrightarrow{w} x$. Then $j_X(x_\alpha) \xrightarrow{w^*} j_X(x)$; see Proposition 3.3.23. Hence, for all $y^* \in Y^*$, we have

$$(j_X(x_\alpha) \circ T)(y^*) \to (j_X(x) \circ T)(y^*) \quad \text{in } \mathbb{R} ,$$

thus $j_X(x_\alpha) \circ T \xrightarrow{w^*} j_X(x) \circ T$ in $Y^{**}$. Therefore,

$$A(x_\alpha) = j_Y^{-1}(j_X(x_\alpha) \circ T) \xrightarrow{w} j_Y^{-1}(j_X(x) \circ T) = A(x) \quad \text{in } Y .$$

This means that $A : X \to Y$ is weak-to-weak continuous, hence $A \in L(X, Y)$; see Proposition 3.3.23. Moreover, in view of Definition 3.3.35, we get

$$\langle A^*(y^*), x \rangle = \langle y^*, A(x) \rangle = \langle y^*, j_Y^{-1}(j_X(x) \circ T) \rangle = \langle j_X(x) \circ T, y^* \rangle = \langle T(y^*), x \rangle .$$

Thus, $A^* = T$.

Corollary 3.6.12. If $X, Y$ are normed spaces and $S : X^* \to Y^*$ is a weak*-to-weak* continuous linear operator, then $S \in L(X^*, Y^*)$.

Next we introduce some important special classes of linear operators.

Definition 3.6.13. (a) Let $X$ be a vector space and let $P : X \to X$ be a linear operator. We say that $P$ is a projection if $P^2 = P$, that is, $P(P(x)) = P(x)$ for all $x \in X$.
(b) Let $H$ be a Hilbert space and $A \in L(H)$. We say that $A$ is self-adjoint (or hermitian) if $A = A^*$, that is, $(A(x), y) = (x, A(y))$ for all $x, y \in H$.
(c) Let $H$ be a Hilbert space and $P \in L(H)$. We say that $P$ is an orthogonal projection if $P$ is a projection and $P$ is self-adjoint.

Proposition 3.6.14. If $H$ is a Hilbert space and $T, S \in L(H)$ are self-adjoint and commuting, that is, $T \circ S = S \circ T$, then $T \circ S \in L(H)$ is self-adjoint as well.

Proof. For every $x, y \in H$ we see that

$$(T(S(x)), y) = (S(x), T(y)) = (x, S(T(y))) = (x, T(S(y))) .$$

This shows that $T \circ S$ is self-adjoint.

Proposition 3.6.15. If $H$ is a Hilbert space and $A \in L(H)$ is self-adjoint, then for every $m \in \mathbb{N}$, $A^m$ is self-adjoint and $\|A^m\|_L = \|A\|_L^m$.

Proof. That $A^m$ is self-adjoint for every $m \in \mathbb{N}$ follows from Proposition 3.6.14. From Proposition 3.6.9 we see that

$$\|A\|_L^2 = \|A^* \circ A\|_L = \|A^2\|_L , \qquad \|A^4\|_L = \|A^2\|_L^2 = \|A\|_L^4$$

and so on. Therefore we obtain

$$\|A^{2^n}\|_L = \|A\|_L^{2^n} . \tag{3.6.3}$$

If $1 \le m \le 2^n$, then

$$\|A^{2^n}\|_L = \|A^m \circ A^{2^n - m}\|_L \le \|A^m\|_L \|A\|_L^{2^n - m} \le \|A\|_L^m \|A\|_L^{2^n - m} = \|A\|_L^{2^n} ,$$

which, due to (3.6.3), results in

$$\|A^m\|_L \|A\|_L^{2^n - m} = \|A\|_L^{2^n} .$$

Thus, $\|A^m\|_L = \|A\|_L^m$.

Proposition 3.6.16. If $H$ is a Hilbert space and $A \in L(H)$ is self-adjoint, then $\|A\|_L = \sup [|(A(x), x)| : \|x\| \le 1]$.

Proof. For $x \in H$ with $\|x\| \le 1$ we infer $|(A(x), x)| \le \|A(x)\| \|x\| \le \|A\|_L \|x\|^2 \le \|A\|_L$, which gives

$$\sup [|(A(x), x)| : \|x\| \le 1] \le \|A\|_L . \tag{3.6.4}$$

Let $\eta = \sup [|(A(x), x)| : \|x\| \le 1]$. Then $|(A(u), u)| \le \eta \|u\|^2$ for all $u \in H$. For $u \in H$ with $A(u) \ne 0$ (so that in particular $u \ne 0$) let $\lambda = (\|A(u)\| / \|u\|)^{1/2}$ and $y = \frac{1}{\lambda} A(u)$. Since $A$ is self-adjoint, $(A(\lambda u), y) \in \mathbb{R}$

and, due to the parallelogram law, we obtain

$$\begin{aligned}
\|A(u)\|^2 &= (A(u), A(u)) = \Bigl( A(\lambda u), \tfrac{1}{\lambda} A(u) \Bigr) = (A(\lambda u), y) \\
&= \tfrac{1}{4} \bigl[ (A(\lambda u + y), \lambda u + y) - (A(\lambda u - y), \lambda u - y) \bigr] \\
&\le \tfrac{1}{4} \eta \bigl( \|\lambda u + y\|^2 + \|\lambda u - y\|^2 \bigr) = \tfrac{1}{2} \eta \bigl( \|\lambda u\|^2 + \|y\|^2 \bigr) \\
&= \tfrac{1}{2} \eta \Bigl( \lambda^2 \|u\|^2 + \tfrac{1}{\lambda^2} \|A(u)\|^2 \Bigr) = \eta \|u\| \|A(u)\| ,
\end{aligned}$$

where we used the fact that $\lambda \|u\| = \frac{1}{\lambda} \|A(u)\|$. Hence $\|A(u)\| \le \eta \|u\|$ for every $u \in H$, which gives $\|A\|_L \le \eta$ and so, because of (3.6.4), the result follows.

Next we present a useful factorization result.

Proposition 3.6.17. If $X, Y, V$ are Banach spaces, $A \in L(X, Y)$, $T \in L(V, Y)$, and $A$ is injective, then the following statements are equivalent:
(a) $R(T) \subseteq R(A)$.
(b) There exists $S \in L(V, X)$ such that $A \circ S = T$.

Proof. (a) $\Rightarrow$ (b): Let $S = A^{-1} \circ T : V \to X$, where we recall that $A$ is injective. Then $S$ is linear and $A \circ S = T$. We claim that $\operatorname{Gr} S \subseteq V \times X$ is closed. To this end, let $\{v_n\}_{n \ge 1} \subseteq V$ such that $v_n \to v$ in $V$ and $S(v_n) \to x$ in $X$. Then $A(x) = \lim_{n \to \infty} A(S(v_n)) = \lim_{n \to \infty} T(v_n) = T(v) = A(S(v))$. Since $A$ is injective it follows that $x = S(v)$ and so $\operatorname{Gr} S \subseteq V \times X$ is closed. Hence, by the Closed Graph Theorem (see Theorem 3.2.14), we conclude that $S \in L(V, X)$.
(b) $\Rightarrow$ (a): It holds that $R(T) = R(A \circ S) \subseteq R(A)$.

Next we present two theorems relating operators with the same range space and their adjoints. We start with an auxiliary result.

Lemma 3.6.18. If $X, Y$ are normed spaces, $A \in L(X, Y)$ and $x^* \in X^*$, then the following statements are equivalent:
(a) $x^* \in R(A^*)$.
(b) $|\langle x^*, x \rangle| \le c \|A(x)\|_Y$ for all $x \in X$ and for some $c > 0$.

Proof. (a) $\Rightarrow$ (b): Of course, $x^* = A^*(y^*)$ for some $y^* \in Y^*$. Then

$$|\langle x^*, x \rangle| = |\langle A^*(y^*), x \rangle| = |\langle y^*, A(x) \rangle| \le \|y^*\|_* \|A(x)\|_Y \quad \text{for all } x \in X ,$$

which gives $|\langle x^*, x \rangle| \le c \|A(x)\|_Y$ with $c = \|y^*\|_*$.
(b) $\Rightarrow$ (a): Setting $g(A(x)) = \langle x^*, x \rangle$ for $x \in X$, which is well defined and bounded on $R(A)$ because of (b), we obtain a continuous, linear functional $g : R(A) \to \mathbb{R}$ such that $x^* = g \circ A$. According to Proposition 3.1.49, there exists $y^* \in Y^*$ such that $y^*|_{R(A)} = g$. Then $x^* = y^* \circ A = A^*(y^*)$; see Definition 3.6.6.

Theorem 3.6.19. If $X, Y, V$ are Banach spaces and $A \in L(X, Y)$, $T \in L(V, Y)$ with $R(T) \subseteq R(A)$, then $\|T^*(y^*)\|_* \le c \|A^*(y^*)\|_*$ for all $y^* \in Y^*$ and for some $c > 0$.

Proof. Let $\hat{X} = X / N(A)$ with $N(A)$ being the kernel of $A$ and $p : X \to \hat{X}$ being the quotient map. Then $p^* : \hat{X}^* \to X^*$ is an isometric embedding onto $N(A)^\perp \subseteq X^*$; see Proposition 3.2.25. Let $\hat{A} : \hat{X} \to Y$ be defined by $\hat{A} \circ p = A$. Then $A^* = p^* \circ \hat{A}^*$, and so

$$\|A^*(y^*)\|_* = \|\hat{A}^*(y^*)\|_* \quad \text{for all } y^* \in Y^* .$$

By hypothesis, $R(T) \subseteq R(A) = R(\hat{A})$ and $\hat{A}$ is injective. So, we can use Proposition 3.6.17 and produce $S \in L(V, \hat{X})$ such that $\hat{A} \circ S = T$. Then, since $\hat{A} \circ S = T$,

$$\begin{aligned}
\|T^*(y^*)\|_* &= \sup \left[ \frac{\langle T^*(y^*), v \rangle}{\|v\|_V} : v \in V, v \ne 0 \right] \\
&= \sup \left[ \frac{\langle y^*, T(v) \rangle}{\|v\|_V} : v \in V, v \ne 0 \right] \\
&= \sup \left[ \frac{\langle \hat{A}^*(y^*), S(v) \rangle}{\|v\|_V} : v \in V, v \ne 0 \right] \\
&\le \sup \left[ \frac{\|\hat{A}^*(y^*)\|_* \|S(v)\|_{\hat{X}}}{\|v\|_V} : v \in V, v \ne 0 \right] \\
&= \|S\|_L \|\hat{A}^*(y^*)\|_* \quad \text{for all } y^* \in Y^* .
\end{aligned}$$

So, the conclusion of the theorem holds with $c = \|S\|_L$.

Theorem 3.6.20. If $X, Y, V$ are normed spaces and $A \in L(X, Y)$, $T \in L(X, V)$, then the following statements are equivalent:
(a) $R(T^*) \subseteq R(A^*)$.
(b) $\|T(x)\|_V \le c \|A(x)\|_Y$ for all $x \in X$ and for some $c > 0$.

Proof. (a) $\Rightarrow$ (b): Using Theorem 3.6.19, we infer

$$\|T^{**}(x^{**})\|_{**} \le c \|A^{**}(x^{**})\|_{**} \quad \text{for all } x^{**} \in X^{**} \text{ and for some } c > 0 .$$

Applying Proposition 3.6.7 gives

$$\|T(x)\|_V = \|T^{**}(x)\|_{**} \le c \|A^{**}(x)\|_{**} = c \|A(x)\|_Y \quad \text{for all } x \in X .$$

(b) $\Rightarrow$ (a): Let $x^* \in R(T^*) \subseteq X^*$. Using Lemma 3.6.18 yields

$$|\langle x^*, x \rangle| \le c_0 \|T(x)\|_V \quad \text{for all } x \in X \text{ and for some } c_0 > 0 ,$$

which implies

$$|\langle x^*, x \rangle| \le c_0 c \|A(x)\|_Y \quad \text{for all } x \in X .$$

Hence, in view of Lemma 3.6.18, we see that $x^* \in R(A^*)$. Thus, $R(T^*) \subseteq R(A^*)$.

Theorem 3.6.21. If $X, Y, V$ are Banach spaces, $X$ is reflexive, $A \in L(X, Y)$, $T \in L(V, Y)$ and

$$\|T^*(y^*)\|_* \le c \|A^*(y^*)\|_* \quad \text{for all } y^* \in Y^* \text{ and for some } c > 0 ,$$

then $R(T) \subseteq R(A)$.

Proof. Applying Theorem 3.6.20 we obtain $R(T^{**}) \subseteq R(A^{**})$. Let $v \in V$ and let $x \in X^{**} = X$ such that $A(x) = A^{**}(x) = T^{**}(v) = T(v)$; see Proposition 3.6.7(c). Hence $R(T) \subseteq R(A)$.

Motivated by Definition 3.2.24, we introduce a similar notion for sets in $X^*$.

Definition 3.6.22. Let $X$ be a normed space and $E \subseteq X^*$. The preannihilator of $E$ is defined by

$${}^\perp E = \{x \in X : \langle x^*, x \rangle = 0 \text{ for all } x^* \in E\} .$$

Evidently ${}^\perp E$ is a closed linear subspace of $X$.

Remark 3.6.23. It is easy to see that if $E \subseteq X^*$ is a vector subspace, then $\overline{E}^{w^*} = ({}^\perp E)^\perp$, $E$ is $w^*$-closed if and only if $E = ({}^\perp E)^\perp$, and $\overline{E}^{w^*} = X^*$ if and only if ${}^\perp E = \{0\}$. Moreover, if $Y, V$ are closed vector subspaces of $X$, then

$$V \cap Y = {}^\perp (V^\perp + Y^\perp) , \qquad (V \cap Y)^\perp \supseteq \overline{V^\perp + Y^\perp} , \qquad V^\perp \cap Y^\perp = (V + Y)^\perp , \qquad {}^\perp (V^\perp \cap Y^\perp) = \overline{V + Y} .$$

Proposition 3.6.24. If $X, Y$ are normed spaces and $A \in L(X, Y)$, then the following hold:
(a) $R(A)^\perp = N(A^*)$ and ${}^\perp R(A^*) = N(A)$.
(b) $\overline{R(A)} = Y$ if and only if $A^*$ is injective.
(c) $A$ is injective if and only if $\overline{R(A^*)}^{w^*} = X^*$.

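Proof. (a) Note that

$$\begin{aligned}
y^* \in R(A)^\perp \quad &\text{if and only if} \quad \langle y^*, A(x) \rangle = 0 \quad \text{for all } x \in X \\
&\text{if and only if} \quad \langle A^*(y^*), x \rangle = 0 \quad \text{for all } x \in X \\
&\text{if and only if} \quad A^*(y^*) = 0 .
\end{aligned}$$

Hence $R(A)^\perp = N(A^*)$. Similarly, we have

$$\begin{aligned}
x \in {}^\perp R(A^*) \quad &\text{if and only if} \quad \langle A^*(y^*), x \rangle = 0 \quad \text{for all } y^* \in Y^* \\
&\text{if and only if} \quad \langle y^*, A(x) \rangle = 0 \quad \text{for all } y^* \in Y^* \\
&\text{if and only if} \quad A(x) = 0 .
\end{aligned}$$

Thus, ${}^\perp R(A^*) = N(A)$.
(b) $\Rightarrow$: It holds that $R(A)^\perp = \{0\}$ and so with part (a), $N(A^*) = \{0\}$. Hence $A^*$ is injective.
$\Leftarrow$: It holds that $N(A^*) = \{0\}$ and so with part (a), $R(A)^\perp = \{0\}$. Hence $\overline{R(A)} = Y$.
(c) $\Rightarrow$: It holds that $N(A) = \{0\}$ and so with part (a), ${}^\perp R(A^*) = \{0\}$. Hence, $\overline{R(A^*)}^{w^*} = X^*$; see Remark 3.6.23.
$\Leftarrow$: It holds that ${}^\perp R(A^*) = \{0\}$ (see Remark 3.6.23), and so with part (a), $N(A) = \{0\}$. Hence, $A$ is injective.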
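For matrices, Proposition 3.6.24(a) reduces to the familiar statement that the orthogonal complement of the column space of $A$ is the null space of the transpose. The following sketch, offered only as an illustration, checks this for one hypothetical $3 \times 2$ real matrix, identifying $(\mathbb{R}^n)^*$ with $\mathbb{R}^n$ so that $A^*$ becomes the transpose; the matrix, the vector `y`, and the helper names are assumptions of the example.

```python
def inner(x, y):
    """Euclidean inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))

def apply(A, x):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def transpose(A):
    """Transpose of A, playing the role of the adjoint A*."""
    return [list(col) for col in zip(*A)]

A = [[1, 0], [0, 1], [1, 1]]   # A maps R^2 into R^3
y = [1, 1, -1]                 # candidate element of N(A*)

if __name__ == "__main__":
    print(apply(transpose(A), y))          # the zero vector, so y is in N(A*)
    for x in ([1, 0], [0, 1], [2, -5]):
        print(inner(y, apply(A, x)))       # zero, so y is orthogonal to R(A)
```

Both checks rest on the identity $\langle y, A(x) \rangle = \langle A^*(y), x \rangle$, which is exactly the step used in the proof of part (a).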

Remark 3.6.25. If $X, Y$ are Banach spaces with $X$ or $Y$ finite dimensional and $A \in L(X, Y)$, we know from linear algebra that

$$A \text{ is surjective if and only if } A^* \text{ is injective} , \qquad A^* \text{ is surjective if and only if } A \text{ is injective} .$$

Indeed in this case $R(A)$ is closed if $\dim Y < \infty$ and $R(A^*)$ is closed if $\dim X < \infty$ and so the equivalences above follow from Proposition 3.6.24. In the general infinite dimensional case we only have the following implications (see Proposition 3.6.24(a)):

$$A \text{ is surjective} \ \Rightarrow \ A^* \text{ is injective} , \qquad A^* \text{ is surjective} \ \Rightarrow \ A \text{ is injective} .$$

The reverse implications fail. To see this, let $X = Y = H = l^2$, which is a Hilbert space, and let $A \in L(H)$ be defined by $A(\hat{x}) = (\frac{1}{n} x_n)_{n \ge 1}$ for all $\hat{x} = (x_n)_{n \ge 1} \in l^2$. Then $A^* = A$ and $A$ is injective but not surjective since $R(A) = R(A^*)$ is only dense in $H$.

Next we present some results dealing with the basic properties of projections.

Proposition 3.6.26. If $X$ is a normed space and $P \in L(X)$, then $P$ is a projection if and only if $P^* \in L(X^*)$ is a projection.

Proof. $\Rightarrow$: For all $x \in X$ and for all $x^* \in X^*$ we directly obtain

$$\langle P^*(x^*), x \rangle = \langle x^*, P(x) \rangle = \langle x^*, P(P(x)) \rangle = \langle P^*(P^*(x^*)), x \rangle .$$

This shows that $P^*(x^*) = P^*(P^*(x^*))$ for all $x^* \in X^*$. Hence $P^*$ is a projection as well.
$\Leftarrow$: This is proven in a similar fashion.

Proposition 3.6.27. If $X$ is a normed space and $P \in L(X)$, then $P$ is a projection if and only if $I - P$ is a projection.

Proof. $\Rightarrow$: For every $x \in X$ one gets

$$(I - P)(I - P)(x) = x - 2P(x) + P(P(x)) = x - P(x) = (I - P)(x) .$$

Hence $I - P$ is a projection.
$\Leftarrow$: Note that $P = I - (I - P)$ and so the implication follows from the previous part.

Proposition 3.6.28. If $X$ is a normed space and $P \in L(X)$ is a projection, then $N(P) = R(I - P)$ and $R(P) = N(I - P)$.

Proof. Let $x \in N(P)$. Then $(I - P)(x) = x$ and so $N(P) \subseteq R(I - P)$. Let $u \in R(I - P)$. Then $u = (I - P)(x)$ with $x \in X$. Then $P(u) = P(x - P(x)) = P(x) - P(P(x)) = P(x) - P(x) = 0$ and so $u \in N(P)$. Therefore we conclude that $N(P) = R(I - P)$. Applying this result to the projection $I - P$ we get $R(P) = N(I - P)$.
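Propositions 3.6.26 to 3.6.28 can be illustrated with a small matrix example. The sketch below uses a hypothetical non-orthogonal projection on $\mathbb{R}^2$, with the transpose standing in for the adjoint $P^*$; the matrix and the helper names are assumptions made for this illustration.

```python
def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def apply(A, x):
    """Matrix-vector product."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

I2 = [[1, 0], [0, 1]]
P = [[1, 1], [0, 0]]          # projects onto the first axis along (1, -1)
Q = [[I2[i][j] - P[i][j] for j in range(2)] for i in range(2)]   # Q = I - P

if __name__ == "__main__":
    print(matmul(P, P) == P)                        # P is a projection
    print(matmul(Q, Q) == Q)                        # so is I - P
    print(matmul(transpose(P), transpose(P)) == transpose(P))   # and P*
    print(apply(P, apply(Q, [3, 4])))               # R(I - P) lies in N(P)
    print(apply(Q, [1, -1]))                        # (1, -1) in N(P) is fixed by I - P
```

Here $N(P)$ is the line spanned by $(1, -1)$, which is also the range of $I - P$, in agreement with Proposition 3.6.28.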

Corollary 3.6.29. If $X$ is a normed space and $P \in L(X)$ is a projection, then $R(P) = \{x \in X : P(x) = x\}$ and $R(P)$ is closed.

Corollary 3.6.30. If $X$ is a Banach space and $P \in L(X)$ is a projection, then $X = N(P) \oplus R(P)$.

If $V$ and $W$ are complementary subspaces of a Banach space $X$ (see Definition 3.2.27), then we obtain in a unique way, for every $x \in X$, that $x = v + w$ with $v \in V$ and $w \in W$. Let $P_V : X \to V$ be the linear operator such that $P_V(x) = v$. Evidently $P_V^2 = P_V$.

Proposition 3.6.31. $P_V \in L(X)$, that is, $P_V$ is a projection.

Proof. Suppose that $x_n \to x$ in $X$ and $P_V(x_n) \to v$ in $X$. Then $(I - P_V)(x_n) \to x - v$ in $X$. Since $V$ and $W$ are closed, note that $v \in V$ and $x - v \in W$. So, $v = P_V(x)$ and by the Closed Graph Theorem (see Theorem 3.2.14), it follows that $P_V \in L(X)$.

Corollary 3.6.32. If $X$ is a Banach space and $V \subseteq X$ is a subspace, then $V$ is complemented if and only if $V = R(P)$ with $P \in L(X)$ being a projection.

Corollary 3.6.33. If $X$ is a Banach space and $V, W \subseteq X$ are complementary subspaces, then $V$ and $X/W$ are isomorphic.

Next we use complemented subspaces to obtain a kind of Hahn–Banach Extension Theorem for vector valued maps.

Proposition 3.6.34. If $X$ is a Banach space and $V \subseteq X$ is a subspace, then the following statements are equivalent:
(a) For every Banach space $Y$ and every $A \in L(V, Y)$, there is $\hat{A} \in L(X, Y)$ such that $\hat{A}|_V = A$.
(b) $V$ is complemented in $X$.

Proof. (a) $\Rightarrow$ (b): Let $i_0 : V \to V$ be the bounded linear operator defined by $i_0(v) = v$ for all $v \in V$, that is, the identity map on $V$. Then by hypothesis there exists $\hat{i}_0 \in L(X, V)$ such that $\hat{i}_0|_V = i_0$. Due to the continuity of $\hat{i}_0$ we directly obtain that $\hat{i}_0|_{\overline{V}}$ coincides with the identity operator of $\overline{V}$. Therefore, $\hat{i}_0 \in L(X)$ is a projection with $R(\hat{i}_0) = V$. Then Corollary 3.6.32 implies that $V$ is complemented.
(b) $\Rightarrow$ (a): Corollary 3.6.32 implies that $V = R(P)$ with $P \in L(X)$ being a projection. Let $Y$ be a Banach space and $A \in L(V, Y)$. Then there exists $A_0 \in L(\overline{V}, Y)$ such that $A_0|_V = A$; see Theorem 1.5.27. One gets $A_0 \circ P \in L(X, Y)$ and $(A_0 \circ P)|_V = A$. So, $\hat{A} = A_0 \circ P \in L(X, Y)$.

Proposition 3.6.35. If $X$ is a Banach space, $Y$ is a normed space, and $A \in L(X, Y)$, then $A^{-1} \in L(Y, X)$ if and only if $R(A)$ is dense in $Y$ and there exists $c > 0$ such that $\|A(x)\|_Y \ge c \|x\|_X$ for all $x \in X$.

Proof. $\Rightarrow$: This is obvious.
$\Leftarrow$: Evidently $A$ is injective and $A^{-1} \in L(V, X)$ with $V = R(A)$. Moreover, $V$ is closed in $Y$: if $A(x_n) \to y$ in $Y$, then the estimate $\|A(x)\|_Y \ge c \|x\|_X$ shows that $\{x_n\}_{n \ge 1}$ is a Cauchy sequence in the Banach space $X$, so $x_n \to x$ and $y = A(x) \in V$. Hence $A^{-1} \in L(Y, X)$ since by hypothesis $\overline{V} = Y$. Moreover, note that $\|A^{-1}\|_L \le 1/c$.

Using this proposition we can improve Proposition 3.6.7(e).

Proposition 3.6.36. If $X$ is a Banach space, $Y$ is a normed space, and $A \in L(X, Y)$, then $A$ is invertible if and only if $A^*$ is invertible.

Proof. $\Rightarrow$: This follows from Proposition 3.6.7(e).
$\Leftarrow$: From Proposition 3.6.24(a) one has that $R(A)^\perp = N(A^*) = \{0\}$ and so $R(A) \subseteq Y$ is dense. Let $x \in X$ and let $x^* \in X^*$ be such that

$$\langle x^*, x \rangle = \|x\|_X \quad \text{and} \quad \|x^*\|_* = 1 ;$$

see Proposition 3.1.50. Then

$$\|x\|_X = \langle x^*, x \rangle = \langle A^*((A^*)^{-1}(x^*)), x \rangle = \langle (A^*)^{-1}(x^*), A(x) \rangle \le \|(A^*)^{-1}(x^*)\|_* \|A(x)\|_Y \le \|(A^*)^{-1}\|_L \|A(x)\|_Y .$$

This implies $\|A(x)\|_Y \ge c \|x\|_X$ with $c = (\|(A^*)^{-1}\|_L)^{-1}$. Now we may apply Proposition 3.6.35 and conclude that $A^{-1} \in L(Y, X)$.

Corollary 3.6.37. If $X$ is a Banach space, $Y$ is a normed space, and $A \in L(X, Y)$, then the following statements are equivalent:
(a) $A$ is invertible.
(b) $A^*$ is invertible.
(c) There exist $c, \hat{c} > 0$ such that

$$\|A(x)\|_Y \ge c \|x\|_X \quad \text{for all } x \in X , \qquad \|A^*(y^*)\|_* \ge \hat{c} \|y^*\|_* \quad \text{for all } y^* \in Y^* .$$

In the last part of this section we deal with unbounded linear operators.

Definition 3.6.38. Let $X, Y$ be Banach spaces. An unbounded linear operator is a linear map $A : D(A) \subseteq X \to Y$ from a linear subspace $D(A)$ into $Y$. The subspace $D(A)$ is called the domain of $A$. We say that $A$ is closed if $\operatorname{Gr} A \subseteq X \times Y$ is closed. By $N(A)$ we denote the kernel of $A$, that is, $N(A) = \{x \in D(A) : A(x) = 0\}$ and by $R(A)$ the range of $A$, that is, $R(A) = \{A(x) : x \in D(A)\}$.

Remark 3.6.39. In this context, $A$ is closed if and only if for every $\{x_n\}_{n \ge 1} \subseteq D(A)$ such that $x_n \to x$ in $X$ and $A(x_n) \to y$ in $Y$, it follows that $x \in D(A)$ and $A(x) = y$. Note that now it is not enough to check that if $x_n \to 0$ in $X$ and $A(x_n) \to y$ in $Y$, then $y = 0$. Moreover, if $A$ is closed, then $N(A)$ is closed but $R(A)$ need not be closed. In applications most unbounded linear operators are densely defined, that is, $\overline{D(A)} = X$, and closed.

We can extend the notion of adjoint to unbounded linear operators. So, let $A : D(A) \subseteq X \to Y$ be an unbounded linear operator that is densely defined, that is, $\overline{D(A)} = X$. Let

$$D(A^*) = \{y^* \in Y^* : |\langle y^*, A(x) \rangle| \le c \|x\| \text{ for all } x \in D(A) \text{ and for some } c > 0\} . \tag{3.6.5}$$

Evidently $D(A^*) \subseteq Y^*$ is a vector subspace. Let $y^* \in D(A^*)$ and consider the functional $f : D(A) \to \mathbb{R}$ defined by $f(x) = \langle y^*, A(x) \rangle$ for all $x \in D(A)$. Because of (3.6.5) it follows

3.6 Bounded and Unbounded Linear Operators |

249

that |f(x)| ≤ c‖x‖ for all x ∈ D(A). Since D(A) is dense in X, extending by continuity, there exists a unique functional f ̂ : X → ℝ such that f ̂D(A) = f and |f ̂(x)| ≤ c‖x‖ for all x ∈ X. Thus, f ̂ ∈ X ∗ . Then we set A∗ (y∗ ) = f ∗ .

(3.6.6)

Definition 3.6.40. The unbounded linear operator A∗ : D(A∗) ⊆ Y∗ → X∗ defined by (3.6.6) is called the adjoint of A. So, according to the previous construction, we obtain

⟨y∗, A(x)⟩ = ⟨A∗(y∗), x⟩ for all x ∈ D(A) and for all y∗ ∈ D(A∗). (3.6.7)

Remark 3.6.41. In general, we cannot say that A∗ is densely defined. However, if A is also closed, then D(A∗) is w∗-dense in Y∗. Therefore, if Y is reflexive and A : D(A) ⊆ X → Y is closed and densely defined, then A∗ : D(A∗) ⊆ Y∗ → X∗ is densely defined as well.

Next we show that A∗ is always closed.

Proposition 3.6.42. If X, Y are Banach spaces and A : D(A) ⊆ X → Y is a densely defined unbounded linear operator, then A∗ is closed.

Proof. Suppose that y∗_n → y∗ in Y∗ with y∗_n ∈ D(A∗) for all n ∈ ℕ and A∗(y∗_n) → x∗ in X∗. Thanks to (3.6.7) we have

⟨y∗_n, A(x)⟩ = ⟨A∗(y∗_n), x⟩ for all x ∈ D(A) and for all n ∈ ℕ,

which implies

⟨y∗, A(x)⟩ = ⟨x∗, x⟩ for all x ∈ D(A).

This gives

|⟨y∗, A(x)⟩| ≤ ‖x∗‖∗ ‖x‖_X for all x ∈ D(A),

which yields, because of (3.6.5), that y∗ ∈ D(A∗), which in combination with (3.6.7) results in ⟨A∗(y∗), x⟩ = ⟨x∗, x⟩ for all x ∈ D(A). Since D(A) is dense in X, this implies x∗ = A∗(y∗). Hence, A∗ is closed; see Remark 3.6.39.

Let i_0 : Y∗ × X∗ → X∗ × Y∗ be the isomorphism defined by i_0(y∗, x∗) = (−x∗, y∗) for all y∗ ∈ Y∗ and for all x∗ ∈ X∗.

Proposition 3.6.43. If X, Y are Banach spaces and A : D(A) ⊆ X → Y is a densely defined unbounded linear operator, then i_0(Gr A∗) = (Gr A)⊥.

Proof. Let (y∗, x∗) ∈ Y∗ × X∗. Then, thanks to (3.6.7), one has

(y∗, x∗) ∈ Gr A∗
if and only if ⟨y∗, A(x)⟩ = ⟨x∗, x⟩ for all x ∈ D(A)
if and only if ⟨y∗, A(x)⟩ − ⟨x∗, x⟩ = 0 for all x ∈ D(A)
if and only if (−x∗, y∗) ∈ (Gr A)⊥.
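The identity i_0(Gr A∗) = (Gr A)⊥ can be made concrete in finite dimensions, where every operator is bounded and A∗ is represented by the transposed matrix. The sketch below is an illustration only: the matrix `A`, the functional `y_star` and the test points are arbitrary assumptions, and exact rational arithmetic is used so that the pairings vanish exactly.

```python
from fractions import Fraction

def apply(m, x):
    return [sum(entry * t for entry, t in zip(row, x)) for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

def pairing(functional, vector):
    return sum(a * b for a, b in zip(functional, vector))

A = [[Fraction(1), Fraction(2)],
     [Fraction(0), Fraction(-1)],
     [Fraction(3), Fraction(1)]]          # A : R^2 -> R^3

y_star = [Fraction(2), Fraction(-1), Fraction(4)]
x_star = apply(transpose(A), y_star)      # x* = A*(y*), so (y*, x*) lies in Gr A*

def pairing_with_graph_point(x):
    # Pairing of i_0(y*, x*) = (-x*, y*) with the graph point (x, A x).
    minus_x_star = [-t for t in x_star]
    return pairing(minus_x_star, x) + pairing(y_star, apply(A, x))

test_points = [[Fraction(1), Fraction(0)],
               [Fraction(0), Fraction(1)],
               [Fraction(3), Fraction(-5)]]
values = [pairing_with_graph_point(x) for x in test_points]
print(values)
```

Each value is ⟨y∗, A(x)⟩ − ⟨x∗, x⟩, which is exactly the quantity appearing in the chain of equivalences in the proof.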

The next result is an extension of Proposition 3.6.24 to unbounded linear operators. Its proof can be found in Brézis [48, Theorem 2.19, p. 46].

Proposition 3.6.44. If X, Y are Banach spaces and A : D(A) ⊆ X → Y is a closed, densely defined, unbounded linear operator, then the following statements are equivalent:
(a) R(A) ⊆ Y is closed.
(b) R(A∗) ⊆ X∗ is closed.
(c) R(A) = ⊥N(A∗).
(d) R(A∗) = N(A)⊥.

The next two theorems provide useful characterizations of surjective operators.

Theorem 3.6.45. If X, Y are Banach spaces and A : D(A) ⊆ X → Y is a closed, densely defined, unbounded linear operator, then the following statements are equivalent:
(a) A is surjective, that is, R(A) = Y.
(b) ‖y∗‖∗ ≤ c‖A∗(y∗)‖∗ for all y∗ ∈ D(A∗) and for some c > 0.
(c) R(A∗) ⊆ X∗ is closed and N(A∗) = {0}.

Proof. (a) ⇒ (b): It suffices to show that D∗ = {y∗ ∈ D(A∗) : ‖A∗(y∗)‖∗ ≤ 1} is bounded. According to Proposition 3.2.5 we need to show that for all y ∈ Y, ⟨D∗, y⟩ ⊆ ℝ is bounded. Exploiting the surjectivity of A, there exists x ∈ D(A) such that y = A(x). Then ⟨y∗, y⟩ = ⟨y∗, A(x)⟩ = ⟨A∗(y∗), x⟩, which implies |⟨y∗, y⟩| ≤ ‖x‖ for every y∗ ∈ D∗. Thus, D∗ is bounded.

(b) ⇒ (c): Let x∗_n ∈ R(A∗) for all n ∈ ℕ and assume that x∗_n → x∗ in X∗. We can find y∗_n ∈ D(A∗) such that x∗_n = A∗(y∗_n) for all n ∈ ℕ. From (b) we see that

‖y∗_m − y∗_n‖∗ ≤ c‖A∗(y∗_m − y∗_n)‖∗ = c‖A∗(y∗_m) − A∗(y∗_n)‖∗.

This shows that {y∗_n}_{n≥1} ⊆ Y∗ is a Cauchy sequence and so, we conclude that y∗_n → y∗ in Y∗. But from Proposition 3.6.42, we know that A∗ is closed. Hence, x∗ = A∗(y∗), and so R(A∗) ⊆ X∗ is closed. From (b) it is clear that N(A∗) = {0}.

(c) ⇒ (a): From Proposition 3.6.44 one has R(A) = ⊥N(A∗) = Y.

In a similar way we can prove a dual version of this theorem.

Theorem 3.6.46. If X, Y are Banach spaces and A : D(A) ⊆ X → Y is a closed, densely defined, unbounded linear operator, then the following statements are equivalent:
(a) A∗ is surjective, that is, R(A∗) = X∗.
(b) ‖x‖_X ≤ c‖A(x)‖_Y for all x ∈ D(A) and for some c > 0.
(c) R(A) ⊆ Y is closed and N(A) = {0}.


Definition 3.6.47. Let X, Y be Banach spaces and let A : D(A) ⊆ X → Y be an unbounded linear operator. We say that A is closable if there is a closed unbounded linear operator Â : D(Â) ⊆ X → Y such that

D(A) ⊆ D(Â) and Â|D(A) = A.

Every closable operator A has a smallest closed extension, called the closure of A and denoted by Ā. The next proposition characterizes closable operators. In what follows, cl(Gr A) denotes the closure of Gr A in X × Y.

Proposition 3.6.48. If X, Y are Banach spaces and A : D(A) ⊆ X → Y is an unbounded linear operator, then the following statements are equivalent:
(a) A is closable.
(b) If {x_n}_{n≥1} ⊆ D(A) are such that x_n → 0 in X and A(x_n) → y in Y, then y = 0.
(c) The projection map p_X : cl(Gr A) → X is injective.

Proof. (a) ⇒ (b): For every closed extension Â of A one has y = Â(0) = 0.
(b) ⇒ (c): cl(Gr A) ⊆ X × Y is a vector subspace and so p_X : cl(Gr A) → X is linear. By hypothesis, N(p_X) = {0} and so p_X is injective.
(c) ⇒ (a): Let D(Â) = p_X(cl(Gr A)) ⊆ X. This is a vector subspace. Let p_Y : cl(Gr A) → Y be the projection on the second factor. Then Â = p_Y ∘ p_X⁻¹ : D(Â) → Y is an unbounded linear operator with Gr Â = cl(Gr A) and so Â is a closed extension of A.

Proposition 3.6.49. If X, Y are Banach spaces and A : D(A) ⊆ X → Y is a closable unbounded linear operator, then Gr Ā = cl(Gr A).

Proof. Let Â be a closed extension of A. Then cl(Gr A) ⊆ Gr Â and so if (0, y) ∈ cl(Gr A), then y = 0. Let A_0 : D(A_0) → Y be defined by D(A_0) = {x ∈ X : (x, y) ∈ cl(Gr A) for some y ∈ Y} and A_0(x) = y with y ∈ Y being the unique element such that (x, y) ∈ cl(Gr A). One has Gr A_0 = cl(Gr A) and so A_0 is a closed extension of A. But A_0 ⊆ Â, which is an arbitrary closed extension of A. Therefore, A_0 = Ā.

Remark 3.6.50. Note that the domain D(A) of an unbounded linear operator A : D(A) ⊆ X → Y is a normed space with the graph norm defined by |x| = ‖x‖_X + ‖A(x)‖_Y for all x ∈ D(A); see the proof of Theorem 3.2.14. Therefore an unbounded linear operator can be viewed also as a bounded linear operator from its domain equipped with the graph norm. It is easy to see that A : D(A) ⊆ X → Y is closed if and only if D(A) is a Banach space when furnished with the graph norm.

Example 3.6.51. (a) Let X = C[0, 1] be equipped with the supremum norm. This is a Banach space.
Let A : D(A) ⊆ X → X be the unbounded linear operator defined by A(u) = u′ for all u ∈ D(A) = C¹[0, 1]. Evidently A is closed and densely defined. Moreover, the graph norm on D(A) is the usual C¹-norm.
(b) Let H be a separable Hilbert space. From Theorem 3.5.48, we know that H has a countable orthonormal basis {e_n}_{n≥1}. Let λ̂ = (λ_n)_{n≥1} ∈ ℝ^ℕ and consider the linear operator A_λ̂ : D(A_λ̂) ⊆ H → H defined by

A_λ̂(x) = Σ_{n≥1} λ_n (x, e_n) e_n for all x ∈ D(A_λ̂),

where

D(A_λ̂) = {x ∈ H : Σ_{n≥1} |λ_n (x, e_n)|² < ∞}.

This is a closed, densely defined unbounded linear operator. Note that A_λ̂ ∈ L(H) if and only if λ̂ = (λ_n)_{n≥1} is bounded.

We extend the notion of self-adjoint operator to unbounded linear operators.

Definition 3.6.52. Let H be a Hilbert space and let A : D(A) ⊆ H → H be a densely defined unbounded linear operator. Then the adjoint of A is the unbounded linear operator A∗ : D(A∗) ⊆ H → H defined by

D(A∗) = {u ∈ H : |(u, A(x))| ≤ c‖x‖ for all x ∈ D(A) and for some c > 0}

and

(A∗(u), x) = (u, A(x)) for all x ∈ D(A) and for all u ∈ D(A∗).

We say that A is symmetric if A ⊆ A∗, that is, D(A) ⊆ D(A∗) and A∗|D(A) = A, so A∗ is an extension of A. We say that A is self-adjoint if A = A∗.

Remark 3.6.53. Evidently A is symmetric if and only if (A(u), x) = (u, A(x)) for all x, u ∈ D(A). A symmetric operator is always closable (see Proposition 3.6.42). Recall that D(A∗) ⊇ D(A) is dense in H. If A is symmetric, then A∗ is a closed extension of A. So, we consider the smallest closed extension A∗∗ of A. We have A∗∗ ⊆ A∗. Therefore for symmetric operators we obtain A ⊆ A∗∗ ⊆ A∗. If A is closed and symmetric, then A = A∗∗ ⊆ A∗. Finally, if A is self-adjoint, then A = A∗∗ = A∗. Therefore, a closed symmetric operator A is self-adjoint if and only if A∗ is symmetric.
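The diagonal operator A_λ̂ of Example 3.6.51(b) gives a convenient way to see both unboundedness and symmetry on finitely supported vectors. The sketch below is an illustration under assumptions that are not in the text: the sequence is chosen as λ_n = n, the orthonormal basis is the standard one, and only the first `N` coordinates are stored, which is enough for vectors in span{e_1, ..., e_N} ⊆ D(A_λ̂).

```python
import math

def apply_diagonal(lam, x):
    # (A x)_n = lam_n * x_n for a coefficient vector x with finitely many entries.
    return [l * t for l, t in zip(lam, x)]

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(inner(u, u))

N = 50
lam = [float(n) for n in range(1, N + 1)]    # lam_n = n, an unbounded real sequence

def unit_vector(n):
    # e_n with coordinates numbered from 1.
    return [1.0 if i == n - 1 else 0.0 for i in range(N)]

# ||A e_n|| = |lam_n| grows without bound although ||e_n|| = 1.
growth = [norm(apply_diagonal(lam, unit_vector(n))) for n in (1, 10, 50)]

# Symmetry (A x, u) = (x, A u) for real lam, tested on two sample vectors.
x = [1.0 / n for n in range(1, N + 1)]
u = [(-1.0) ** n / n ** 2 for n in range(1, N + 1)]
symmetric_gap = abs(inner(apply_diagonal(lam, x), u) - inner(x, apply_diagonal(lam, u)))
print(growth, symmetric_gap)
```

Since ‖A_λ̂(e_n)‖ = |λ_n|, no constant c can satisfy ‖A_λ̂(x)‖ ≤ c‖x‖ when the sequence is unbounded, which matches the remark that A_λ̂ ∈ L(H) only for bounded λ̂.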

3.7 Compact Operators – Fredholm Operators

In this section we study a class of operators that closely resemble the operators on finite dimensional spaces. These operators are similar to N × N matrices and so are small in the sense that they map the closed unit ball to a relatively compact set.

Definition 3.7.1. Let X, Y be Banach spaces and let D ⊆ X be a nonempty subset. A map f : D → Y, not necessarily linear, is said to be compact if it is continuous and for every bounded set B ⊆ D, the set f(B) ⊆ Y is relatively compact. By K(D, Y) we denote the family of all compact maps. If D = X, then we define L_c(X, Y) = K(X, Y) ∩ L(X, Y).

Remark 3.7.2. If Y is finite dimensional, then every continuous bounded map f : D → Y is compact. If A ∈ L_c(X, Y), then R(A) is separable.


Another notion closely related to compactness is the following one.

Definition 3.7.3. Let X, Y be Banach spaces and let D ⊆ X be nonempty. A map f : D → Y is said to be completely continuous if for every sequence {x_n}_{n≥1} ⊆ D such that x_n → x weakly in X with x ∈ D, it follows that f(x_n) → f(x) in Y.

Remark 3.7.4. Completely continuous operators A ∈ L(X, Y) are also known as Dunford–Pettis operators. It is easy to see that a linear operator A : X → Y is completely continuous if and only if A(C) ⊆ Y is compact for every weakly compact C ⊆ X.

In general the classes of compact maps and of completely continuous maps are distinct. However, for linear operators we can relate the two classes.

Proposition 3.7.5. If X, Y are Banach spaces and A ∈ L_c(X, Y), then A is completely continuous.

Proof. Let x_n → x weakly in X. Then {x_n}_{n≥1} ⊆ X is bounded and so {A(x_n)}_{n≥1} ⊆ Y is relatively compact. Thus there exists a subsequence {x_{n_k}}_{k≥1} of {x_n}_{n≥1} such that A(x_{n_k}) → y in Y. From Proposition 3.3.23, one has A(x_n) → A(x) weakly in Y. Therefore y = A(x). Since the same argument applies to every subsequence of {x_n}_{n≥1}, we conclude that A(x_n) → A(x) in Y. This proves that A is completely continuous.

Example 3.7.6. The converse is not true in general. Recall that in l¹, weakly convergent and norm convergent sequences coincide; see Remark 3.3.17. Then the identity map i : l¹ → l¹ is a completely continuous linear operator, but clearly it is not compact.

However, if we strengthen the structure of X, then the converse of Proposition 3.7.5 holds. In fact we obtain the following result.

Proposition 3.7.7. If X is a reflexive Banach space, Y is a Banach space, D ⊆ X is nonempty, w-closed, and f : D → Y is completely continuous, then f ∈ K(D, Y).

Proof. Evidently, f is continuous. Let B ⊆ D be a bounded set. We need to show that f(B) ⊆ Y is relatively compact. So, let {y_n}_{n≥1} ⊆ f(B) ⊆ Y. Then y_n = f(x_n) with {x_n}_{n≥1} ⊆ B. The reflexivity of X implies that B is relatively weakly compact. So, the Eberlein–Smulian Theorem, Theorem 3.4.14, says that there exists a subsequence {x_{n_k}}_{k≥1} of {x_n}_{n≥1} such that x_{n_k} → x weakly in X with x ∈ D; recall that D is w-closed. We get y_{n_k} = f(x_{n_k}) → f(x) in Y, which means that f(B) ⊆ Y is relatively compact.

Corollary 3.7.8. If X is a reflexive Banach space, Y is a Banach space, and A ∈ L(X, Y), then A ∈ L_c(X, Y) if and only if A is completely continuous.

The next theorem explains why compact maps resemble maps between finite dimensional spaces. First a simple lemma about relatively compact sets in a Banach space Y.

Lemma 3.7.9. If Y is a Banach space, K ⊆ Y is nonempty and for every ε > 0, there exists a relatively compact set K_ε ⊆ Y such that for every y ∈ K we can find y_ε ∈ K_ε such that ‖y − y_ε‖_Y < ε, then K ⊆ Y is relatively compact.

Proof. Let ε > 0 be given. There exists a relatively compact set K_{ε/2} ⊆ Y as postulated by the hypothesis of the lemma. The total boundedness of K_{ε/2} implies that there exist {y_ε^k}_{k=1}^m ⊆ K_{ε/2} such that

K_{ε/2} ⊆ ⋃_{k=1}^m B_{ε/2}(y_ε^k).

By hypothesis, given y ∈ K, there exists y_{ε/2} ∈ K_{ε/2} such that ‖y − y_{ε/2}‖_Y < ε/2. Since y_{ε/2} ∈ B_{ε/2}(y_ε^{k_0}) for some k_0 ∈ {1, . . . , m} one has ‖y_{ε/2} − y_ε^{k_0}‖_Y < ε/2. Therefore ‖y − y_ε^{k_0}‖_Y < ε, which implies K ⊆ ⋃_{k=1}^m B_ε(y_ε^k). Hence, K is totally bounded and so relatively compact.

Theorem 3.7.10. If X, Y are Banach spaces, D ⊆ X is nonempty, bounded, and f : D → Y, then the following two statements are equivalent:
(a) f ∈ K(D, Y).
(b) For every ε > 0 there exists a continuous, bounded map f_ε : D → Y such that ‖f(x) − f_ε(x)‖_Y < ε for all x ∈ D and f_ε(D) ⊆ conv f(D) as well as dim(span f_ε(D)) < ∞.

Proof. (a) ⇒ (b): Since f is compact, f(D) ⊆ Y is relatively compact. So, for every ε > 0 there exists a finite family {y_k}_{k=1}^m ⊆ f(D) such that

min_{k∈{1,...,m}} ‖f(x) − y_k‖_Y < ε for all x ∈ D. (3.7.1)

Recall that f(D) is totally bounded. Let λ_k(x) = max{ε − ‖f(x) − y_k‖_Y, 0}. Clearly λ_k : D → ℝ₊ with k = 1, . . . , m are continuous functions and do not all vanish simultaneously for x ∈ D, see (3.7.1). We introduce the map f_ε : D → Y defined by

f_ε(x) = (Σ_{k=1}^m λ_k(x) y_k) / (Σ_{k=1}^m λ_k(x)). (3.7.2)

Evidently f_ε is continuous, bounded, and

‖f_ε(x) − f(x)‖_Y = ‖Σ_{k=1}^m λ_k(x)(y_k − f(x))‖_Y / (Σ_{k=1}^m λ_k(x)) < (Σ_{k=1}^m λ_k(x) ε) / (Σ_{k=1}^m λ_k(x)) = ε

for all x ∈ D; see (3.7.1). Then the boundedness of f(D) implies the boundedness of f_ε(D) while from (3.7.2) we see that dim(span f_ε(D)) < ∞. Therefore, f_ε is compact, and it is clear from (3.7.2) that f_ε(D) ⊆ conv f(D).

(b) ⇒ (a): Let ε = 1/n with n ∈ ℕ. Then there exist continuous, bounded maps f_{1/n} : D → Y such that

‖f(x) − f_{1/n}(x)‖_Y < 1/n for all x ∈ D,

which shows that f is continuous since it is the uniform limit of continuous maps. Let y = f(x) with x ∈ D. Then it follows that ‖y − y_n‖

0, there exists n_0 ∈ ℕ such that ‖A_n(x) − A(x)‖_Y < ε/2 for all x ∈ B_1^X and for all n ≥ n_0. The set A_{n_0}(B_1^X) is totally bounded; recall that A_{n_0} ∈ L_c(X, Y). Hence, there exists a finite ε/2-net F ⊆ A_{n_0}(B_1^X); see Definition 1.5.31. Given x ∈ B_1^X there exists y ∈ F such that ‖A_{n_0}(x) − y‖_Y < ε/2. Then

‖A(x) − y‖_Y ≤ ‖A(x) − A_{n_0}(x)‖_Y + ‖A_{n_0}(x) − y‖_Y < ε/2 + ε/2 = ε.

Hence, F is an ε-net for A(B_1^X), thus, A(B_1^X) is relatively compact. Therefore, A ∈ L_c(X, Y).

Proposition 3.7.15. If X, Y, V are Banach spaces, A ∈ L(X, Y), T ∈ L(Y, V), and A or T is compact, then T ∘ A ∈ L_c(X, V).

Proof. First suppose that A is compact. Then the closure of A(B_1^X) in Y is compact, hence its image under T is a compact subset of V that contains T(A(B_1^X)). This means that T ∘ A ∈ L_c(X, V).

Now suppose that T is compact. The set A(B_1^X) ⊆ Y is bounded. Since T ∈ L_c(Y, V) we have that T(A(B_1^X)) ⊆ V is relatively compact. This means that T ∘ A ∈ L_c(X, V).
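The ε/2-net argument above shows that an operator which is approximated uniformly on the unit ball by compact operators is itself compact. A standard illustration, sketched here under assumptions that are not in the text, is the diagonal operator A(e_n) = e_n/n on l² together with its finite rank truncations A_N: on vectors with finitely many coordinates one can observe that ‖(A − A_N)(x)‖ ≤ ‖x‖/(N + 1), with equality at x = e_{N+1}. The names `apply_A`, `apply_truncation` and the chosen dimensions are hypothetical.

```python
import math

def norm(x):
    return math.sqrt(sum(t * t for t in x))

def apply_A(x):
    # (A x)_n = x_n / n with coordinates numbered from 1.
    return [t / (i + 1) for i, t in enumerate(x)]

def apply_truncation(x, N):
    # A_N keeps the first N coordinates of A x and drops the rest, so rank A_N <= N.
    return [t / (i + 1) if i < N else 0.0 for i, t in enumerate(x)]

def tail_ratio(x, N):
    # ||(A - A_N) x|| / ||x|| for a nonzero vector x.
    diff = [p - q for p, q in zip(apply_A(x), apply_truncation(x, N))]
    return norm(diff) / norm(x)

dim, N = 200, 9
e_next = [0.0] * dim
e_next[N] = 1.0                        # the basis vector e_{N+1}
attained = tail_ratio(e_next, N)       # equals 1 / (N + 1) up to rounding

probe = [math.sin(i + 1.0) for i in range(dim)]
within_bound = tail_ratio(probe, N) <= 1.0 / (N + 1) + 1e-12
print(attained, within_bound)
```

Because 1/(N + 1) → 0, the finite rank operators A_N converge to A in the operator norm, so the argument above applies to this example.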

Corollary 3.7.16. If X is a Banach space, then L_c(X) is a closed ideal of L(X).

The next characterization of operator compactness is useful on many occasions and is known as "Schauder's Theorem."

Theorem 3.7.17 (Schauder's Theorem). If X, Y are Banach spaces and A ∈ L(X, Y), then A ∈ L_c(X, Y) if and only if A∗ ∈ L_c(Y∗, X∗).

Proof. ⇒: Let K be the closure of A(B_1^X) in Y. Then K ⊆ Y is compact. Moreover, let B ⊆ Y∗ be bounded. Then

|⟨y∗, y_1 − y_2⟩| ≤ c‖y_1 − y_2‖ for all y∗ ∈ B, for all y_1, y_2 ∈ K, for some c > 0.

This shows that B, viewed as a subset of C(K), is bounded and equicontinuous. So, invoking the Arzela–Ascoli Theorem (see Theorem 1.6.16), we infer that B is relatively compact in C(K). Then, if {y∗_n}_{n≥1} ⊆ B, there exists a subsequence {y∗_{n_k}}_{k≥1} of {y∗_n}_{n≥1}, which is a uniformly Cauchy sequence on K. This implies that {y∗_{n_k} ∘ A}_{k≥1} is a uniformly Cauchy sequence on B_1^X. Therefore, {y∗_{n_k} ∘ A}_{k≥1} ⊆ X∗ is convergent. But by Definition 3.6.6, y∗_{n_k} ∘ A = A∗(y∗_{n_k}). Thus, we conclude that A∗ ∈ L_c(Y∗, X∗).

⇐: From the previous implication we obtain that A∗∗ ∈ L_c(X∗∗, Y∗∗). Let j_X : X → X∗∗ and j_Y : Y → Y∗∗ be the corresponding canonical embeddings. Then A = j_Y⁻¹ ∘ A∗∗ ∘ j_X and so Proposition 3.7.15 implies that A ∈ L_c(X, Y).

Definition 3.7.18. (a) If X is a vector space and V is a vector subspace of X, then the codimension of V in X is the dimension of the quotient vector space X/V.
(b) Let X, Y be Banach spaces and A ∈ L(X, Y). We say that A is a Fredholm operator if N(A) is finite dimensional and R(A) has finite codimension. The number

i(A) = dim N(A) − codim R(A) = dim N(A) − dim(Y/R(A))

is called the index of A.

Remark 3.7.19. If A ∈ L(X, Y) is a Fredholm operator, then X = N(A) ⊕ V for some closed subspace V ⊆ X and A|V is an isomorphism of V onto R(A). Moreover, R(A) ⊆ Y is closed.

Lemma 3.7.20. If X is a Banach space, A ∈ L(X), T = i_X − A, and V = R(T) is a proper closed subspace of X, then for every ε > 0 there exists x_0 ∈ B_1^X such that d(A(x_0), A(V)) ≥ 1 − ε.

Proof. According to the Riesz Lemma (see Lemma 3.1.20), there exists x_0 ∈ X with ‖x_0‖ = 1 such that d(x_0, V) ≥ 1 − ε. One has T(x_0) ∈ V and A(V) = (i_X − T)(V) ⊆ V. Therefore,

d(A(x_0), A(V)) ≥ d(A(x_0) + T(x_0), V) = d(x_0, V) ≥ 1 − ε.

Using this lemma we can prove the following theorem, which gives an important class of Fredholm operators.


Theorem 3.7.21. If X is a Banach space, A ∈ L_c(X), and λ ≠ 0, then λi_X − A is a Fredholm operator.

Proof. Clearly, we may assume that λ = 1. Let N = N(i_X − A). For every x ∈ N one has A(x) = x. Therefore A|N is an isomorphism with a subspace of X and A|N is compact as well. It follows that N is finite dimensional. Proposition 3.2.28 implies that there is a closed subspace V of X such that X = N ⊕ V. Let T = i_X − A and T̂ = T|V. We obtain that R(T) = T(V) = R(T̂) and N(T̂) = N ∩ V = {0}, hence T̂ is injective. We claim that

inf[‖T̂(x)‖ : x ∈ V, ‖x‖ = 1] > 0. (3.7.3)

Arguing by contradiction, suppose that (3.7.3) does not hold. Then there exists x_n ∈ V with ‖x_n‖ = 1 for all n ∈ ℕ such that T̂(x_n) → 0. Since A ∈ L_c(X) we may assume that A(x_n) → u in X. Note that x_n − A(x_n) = T̂(x_n) → 0, so x_n → u as well; hence u ∈ V and ‖u‖ = 1. Moreover, T̂(u) = 0 and this contradicts the injectivity of T̂.

From (3.7.3) we infer that ‖T̂(x)‖ ≥ c‖x‖ for all x ∈ V and for some c > 0. Then, Theorem 3.6.45 implies that R(T) = R(T̂) ⊆ X is closed. We will show that codim R(T) < ∞. Inductively we define

T^0 = i_X, T^1 = T and T^{k+1} = T T^k for all k ∈ ℕ_0.

Moreover we set N_k = N(T^k). Since T^k = (i_X − A)^k and powers of compact operators are again compact operators (see Proposition 3.7.15), we get T^k = i_X − S_k with S_k ∈ L_c(X). From the first part of the proof we see that dim N_k < ∞ for all k ∈ ℕ_0. Let Z_k = R(T^k) = T^k(X) with k ∈ ℕ_0. We have that

{N_k}_{k∈ℕ_0} is increasing and {Z_k}_{k∈ℕ_0} is decreasing. (3.7.4)

For some n ∈ ℕ_0 we obtain Z_n = Z_{n+1}. Indeed if all the inclusions Z_k ⊇ Z_{k+1} are strict, then with Lemma 3.7.20 there exists u_n in the closed unit ball of Z_n such that d(A(u_n), A(Z_{n+1})) ≥ 1/2. Then ‖A(u_n) − A(u_m)‖ ≥ 1/2 for n ≠ m, a contradiction to the fact that A ∈ L_c(X). Similarly, for some m ∈ ℕ_0, it holds N_m = N_{m+1}. Indeed if x ∈ N_k, that is, T^k(x) = 0, then T^{k−1}(T(x)) = 0 and so T(x) ∈ N_{k−1} ⊆ N_k; see (3.7.4). Therefore, again via Lemma 3.7.20, we conclude that N_m = N_{m+1} for some m ∈ ℕ_0. Thus, we obtain

Z_k = Z_n for all k ≥ n and N_k = N_m for all k ≥ m.

Let i = max{n, m}. We claim that X = N_i ⊕ Z_i. Let x ∈ X. Then T^i(x) ∈ Z_i and T^i(Z_i) = T^i(T^i(X)) = T^{2i}(X) = T^i(X) = Z_i. Therefore there exists u ∈ Z_i such that T^i(u) = T^i(x), hence T^i(u − x) = 0. Therefore, u − x ∈ N_i and x = x − u + u. Since X = N_i ⊕ Z_i, the codimension of Z_i and also of Z_1 ⊇ Z_i is finite.

Example 3.7.22. (a) If X, Y are finite dimensional Banach spaces, then every linear operator A : X → Y is a Fredholm operator and i(A) = dim X − dim Y.
(b) If X, Y are Banach spaces and A ∈ L(X, Y) is a bijection, then A is a Fredholm operator and i(A) = 0.

(c) Let X = l^p with 1 ≤ p ≤ ∞ and let A ∈ L(l^p) be defined by

A(x̂) = (x_{n+k})_{n≥1} for all x̂ = (x_n)_{n≥1} ∈ l^p and for some k ∈ ℕ.

Recall that for every n ∈ ℕ, e_n = (0, . . . , 0, 1, 0, . . .) where 1 is located at the n-th entry. We see that N(A) = span{e_n}_{n=1}^k and R(A) = l^p. Therefore A is a Fredholm operator and i(A) = k.

Let X be a Banach space and A ∈ L_c(X). Then according to Theorem 3.7.21, i_X − A is a Fredholm operator. The next theorem, known as the "Fredholm Alternative Theorem," asserts that either the nonhomogeneous linear equation x − A(x) = u has a solution x ∈ X for every u ∈ X or the corresponding homogeneous equation x − A(x) = 0 has a nontrivial solution. The result has interesting applications in boundary value problems.

Theorem 3.7.23 (Fredholm Alternative Theorem). If X is a Banach space, A ∈ L_c(X) and λ ≠ 0, then the equation λx − A(x) = u has a solution for every u ∈ X if and only if the equation λx − A(x) = 0 only has the trivial solution.

Proof. Again we may assume that λ = 1. Let T = i_X − A. If x − A(x) = 0 only has the trivial solution, then N = N(T) = {0} and so T is an isomorphism into. We will show that it is surjective. Let V_k = R(T^k) for all k ∈ ℕ_0. From the proof of Theorem 3.7.21 we know that there exists n ∈ ℕ_0 such that V_k = V_n for all k ≥ n. We claim that V_1 = V_0 = X. If this is not the case, let m ∈ ℕ be the smallest integer such that V_{m−1} ≠ V_m = V_{m+1}. We pick u ∈ V_{m−1} \ V_m. Then T(u) ∈ V_m = V_{m+1}. Hence, there exists v ∈ V_m such that T(u) = T(v) and u ≠ v since u ∉ V_m. But this contradicts the injectivity of T.

Next, assume that T is surjective. Let N_k = N(T^k) for k ∈ ℕ. We need to show that N_1 = N(T) = {0}. Recall that {N_k}_{k≥1} is increasing. Arguing by contradiction, suppose that there is x_1 ≠ 0 such that x_1 ∈ N_1. Inductively we will generate a sequence {x_k}_{k≥1} ⊆ X such that T(x_{k+1}) = x_k and x_k ∈ N_k \ N_{k−1} for all k ∈ ℕ. Suppose that x_1, . . . , x_k have been constructed. Since R(T) = X, there exists x_{k+1} ∈ X such that T(x_{k+1}) = x_k. Then T^k(x_{k+1}) = T^{k−1}(x_k) = ⋅ ⋅ ⋅ = x_1 ≠ 0 and T^{k+1}(x_{k+1}) = T(x_1) = 0, that is, x_{k+1} ∈ N_{k+1} \ N_k. This completes the induction.
Since N_m = N_{m+1} for some m ∈ ℕ_0 (see the proof of Theorem 3.7.21), we have reached a contradiction, and this proves the assertion of the theorem.

Next we prove a duality property of Fredholm operators, that is, we show that A ∈ L(X, Y) is Fredholm if and only if A∗ ∈ L(Y∗, X∗) is Fredholm. We start with a simple lemma.

Lemma 3.7.24. If X, Y are Banach spaces, A ∈ L(X, Y) and dim(Y/R(A)) < ∞, then R(A) ⊆ Y is a closed subspace.

Proof. Let m = dim(Y/R(A)) < ∞. Then there exist vectors {y_k}_{k=1}^m ⊆ Y such that

[y_k] = y_k + R(A) ∈ Y/R(A) for all k ∈ {1, . . . , m}

form a basis of Y/R(A). We introduce the space

X̂ = X × ℝ^m with norm ‖(x, λ̂)‖_X̂ = ‖x‖_X + |λ̂|

for all x ∈ X and for all λ̂ = (λ_k)_{k=1}^m ∈ ℝ^m. Of course X̂ with the norm above is a Banach space. Let Â ∈ L(X̂, Y) be defined by

Â(x, λ̂) = A(x) + Σ_{k=1}^m λ_k y_k.

Then Â is surjective and

N(Â) = {(x, λ̂) ∈ X × ℝ^m : A(x) = 0, λ̂ = 0} = N(A) × {0}.

Invoking Theorem 3.8.19, there exists c > 0 such that

inf[‖x + u‖_X : u ∈ N(A)] + |λ̂| ≤ c‖A(x) + Σ_{k=1}^m λ_k y_k‖_Y for all x ∈ X, λ̂ ∈ ℝ^m.

Let λ̂ = 0. Then

inf[‖x + u‖_X : u ∈ N(A)] ≤ c‖A(x)‖_Y for all x ∈ X,

which shows that R(A) ⊆ Y is closed; see Theorem 3.8.19.

Using this lemma, we can prove the duality property for Fredholm operators.

Theorem 3.7.25. If X, Y are Banach spaces and A ∈ L(X, Y), then the following hold:
(a) A is a Fredholm operator if and only if A∗ is a Fredholm operator.
(b) If A is a Fredholm operator, then dim N(A∗) = dim(Y/R(A)) and dim N(A) = dim(X∗/R(A∗)).

Proof. According to Theorem 3.8.19, R(A) ⊆ Y is closed if and only if R(A∗) ⊆ X∗ is closed. So, we may assume that both R(A) ⊆ Y and R(A∗) ⊆ X∗ are closed subspaces. Then

R(A∗) = N(A)⊥ and R(A)⊥ = N(A∗); (3.7.5)

see Proposition 3.6.44. Applying Proposition 3.2.25, one has

N(A)∗ = X∗/N(A)⊥ = X∗/R(A∗) and (Y/R(A))∗ = R(A)⊥ = N(A∗);

see (3.7.5). This completes the proof of both statements (a) and (b).

The last part of this section is devoted to the spectral theory of bounded linear operators. First, let us recall some standard results about invertible operators. Recall that A ∈ L(X, Y) is invertible if and only if it is an isomorphism of X onto Y with X, Y being Banach spaces. Moreover, from Proposition 3.6.7(e) we know that A ∈ L(X, Y) with X, Y being Banach spaces is invertible if and only if A∗ is invertible and (A⁻¹)∗ = (A∗)⁻¹. In addition, if X, Y, V are Banach spaces and A ∈ L(X, Y), T ∈ L(Y, V) are invertible operators, then T ∘ A ∈ L(X, V) is invertible as well and (T ∘ A)⁻¹ = A⁻¹T⁻¹.

Lemma 3.7.26. If X is a Banach space, A ∈ L(X) and ‖A‖_L < 1, then i_X − A ∈ L(X) is invertible and (i_X − A)⁻¹ = Σ_{n≥0} A^n with the series being absolutely convergent.

Proof. Note that

Σ_{n≥0} ‖A^n‖_L ≤ Σ_{n≥0} ‖A‖_L^n < ∞

since by hypothesis ‖A‖_L < 1. Hence Σ_{n≥0} A^n is absolutely convergent in L(X). Then we obtain

(i_X − A) Σ_{n≥0} A^n = (i_X − A) + (A − A²) + . . . = i_X,

which is a telescoping sum. Similarly we get (Σ_{n≥0} A^n)(i_X − A) = i_X. Therefore we conclude that i_X − A ∈ L(X) is invertible and (i_X − A)⁻¹ = Σ_{n≥0} A^n.

Lemma 3.7.27. If X is a Banach space, A, T ∈ L(X), A is invertible and ‖A − T‖_L < 1/‖A⁻¹‖_L, then T is invertible as well and

‖T⁻¹ − A⁻¹‖_L ≤ (‖A⁻¹‖_L² ‖T − A‖_L) / (1 − ‖A⁻¹‖_L ‖T − A‖_L).

Proof. Note that

‖A⁻¹(A − T)‖_L ≤ ‖A⁻¹‖_L ‖T − A‖_L < 1.

Using Lemma 3.7.26 it follows that i_X − A⁻¹(A − T) = A⁻¹T ∈ L(X) is invertible. Hence T ∈ L(X) is invertible since T = A(A⁻¹T). Moreover, we get

(i_X − A⁻¹(A − T))⁻¹ = Σ_{n≥0} (A⁻¹(A − T))^n;

see Lemma 3.7.26. Therefore

T⁻¹ = (A − (A − T))⁻¹ = (A(i_X − A⁻¹(A − T)))⁻¹ = Σ_{n≥0} (A⁻¹(A − T))^n A⁻¹.

Thus,

‖T⁻¹ − A⁻¹‖_L ≤ ‖Σ_{n≥1} (A⁻¹(A − T))^n A⁻¹‖_L ≤ ‖A⁻¹‖_L Σ_{n≥1} (‖A⁻¹‖_L ‖A − T‖_L)^n = (‖A⁻¹‖_L² ‖A − T‖_L) / (1 − ‖A⁻¹‖_L ‖T − A‖_L).
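Both lemmas are easy to observe numerically for matrices. The sketch below is an illustration only, with assumptions that are not in the text: X = ℝ² carries the max-norm (so the operator norm is the largest absolute row sum), the matrix `A` is an arbitrary choice with ‖A‖_L = 0.7 < 1, and the series of Lemma 3.7.26 is truncated after 60 terms. For Lemma 3.7.27 the invertible operator is taken to be i_X and the perturbed one is T = i_X − A.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def mat_add(a, b):
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def mat_sub(a, b):
    return [[p - q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def norm_inf(m):
    # Operator norm induced by the max-norm: the largest absolute row sum.
    return max(sum(abs(t) for t in row) for row in m)

def neumann_partial_sum(a, terms):
    # Returns I + a + a^2 + ... + a^(terms - 1).
    n = len(a)
    total = [[0.0] * n for _ in range(n)]
    power = identity(n)
    for _ in range(terms):
        total = mat_add(total, power)
        power = mat_mul(power, a)
    return total

A = [[0.2, 0.1],
     [0.3, 0.4]]
I2 = identity(2)
S = neumann_partial_sum(A, 60)

# Lemma 3.7.26: (I - A) S should be close to I; the defect is A^60.
residual = norm_inf(mat_sub(mat_mul(mat_sub(I2, A), S), I2))

# Lemma 3.7.27 with invertible operator I and T = I - A, so T^{-1} is approximately S.
perturbation = norm_inf(mat_sub(S, I2))
bound = norm_inf(A) / (1.0 - norm_inf(A))
print(residual, perturbation, bound)
```

With the identity as the invertible operator, the bound of Lemma 3.7.27 reduces to ‖A‖_L/(1 − ‖A‖_L), which is what `bound` computes.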

Corollary 3.7.28. If X is a Banach space and L ⊆ L(X) is the set of all invertible operators, then L is an open set in L(X) and the map A → A⁻¹ is a homeomorphism of L onto L.

Now we introduce the spectrum of a bounded linear operator. In order to have a complete spectral theory we need to assume that X is a complex Banach space.

Definition 3.7.29. Let X be a complex Banach space and let A ∈ L(X). The spectrum σ(A) of A is the set

σ(A) = {λ ∈ ℂ : λi_X − A is not invertible}.

The resolvent set ρ(A) of A is the complement of σ(A), that is, ρ(A) = ℂ \ σ(A). The elements of ρ(A) are called regular values of A. Moreover, if λ ∈ ρ(A), then R(λ) = (λi_X − A)⁻¹ ∈ L(X) is called the resolvent of A at λ. The spectrum of A is decomposed in the following way:

Pσ(A) = {λ ∈ ℂ : λi_X − A is not injective},
Rσ(A) = {λ ∈ ℂ : λi_X − A is injective but R(λi_X − A) ⊆ X is not dense},
Cσ(A) = {λ ∈ ℂ : λi_X − A is injective, R(λi_X − A) ⊆ X is dense but λi_X − A is not surjective}.

We call Pσ(A) the point spectrum of A, Rσ(A) is the residual spectrum of A, and Cσ(A) is the continuous spectrum of A. Given λ ∈ ℂ we see that λ ∈ Pσ(A) if and only if there exists x ∈ X \ {0} such that A(x) = λx. The elements of Pσ(A) are called eigenvalues of A; every x ∈ X \ {0} with A(x) = λx is called an eigenvector for λ and N(λi_X − A) is the eigenspace for λ.

Remark 3.7.30. If X is finite dimensional and n = dim X, then σ(A) = Pσ(A) and card σ(A) ≤ n. If X is infinite dimensional and A ∈ L_c(X), then 0 ∈ σ(A) or otherwise A would be a compact isomorphism, a contradiction.

Proposition 3.7.31. If X is a Banach space and A ∈ L(X), then σ(A) = σ(A∗).

Proof. From Proposition 3.6.7(e), we know that λi_X − A is invertible if and only if (λi_X − A)∗ is invertible. To conclude the proof just note that (λi_X − A)∗ = λi_{X∗} − A∗.

On account of Remark 3.6.8, we can state the following corollary concerning operators defined on a Hilbert space into itself.

Corollary 3.7.32. If H is a complex Hilbert space and A ∈ L(H), then σ(A∗) = {λ̄ : λ ∈ σ(A)}.

Proposition 3.7.33. If X is a Banach space and A ∈ L(X), then σ(A) ⊆ ℂ is compact and if λ ∈ σ(A), then |λ| ≤ ‖A‖_L.

Proof. Corollary 3.7.28 implies that ρ(A) ⊆ ℂ is open. Hence, σ(A) = ℂ \ ρ(A) is closed. Let λ ∈ ℂ such that |λ| > ‖A‖_L. Then λi_X − A = λ(i_X − (1/λ)A) and so with Lemma 3.7.26, λi_X − A is invertible. Therefore, if λ ∈ σ(A), then |λ| ≤ ‖A‖_L and σ(A) ⊆ ℂ is compact.

The next result is valid only for complex Banach spaces.
That is why we said that in order to have a complete theory, we need to consider Banach spaces over ℂ.

Proposition 3.7.34. If X is a complex Banach space and A ∈ L(X), then σ(A) ≠ ∅.

Proof. We fix λ_0 ∈ ρ(A) and consider λ ∈ ℂ such that |λ − λ_0| < ‖(λ_0 i_X − A)⁻¹‖_L⁻¹. Using Lemma 3.7.27 for the operators λ_0 i_X − A and λi_X − A, we get

R(λ) = (λi_X − A)⁻¹ = Σ_{n≥0} [(λ_0 i_X − A)⁻¹(λ_0 − λ)i_X]^n (λ_0 i_X − A)⁻¹ = Σ_{n≥0} (λ_0 − λ)^n (λ_0 i_X − A)^{−(n+1)} = Σ_{n≥0} (λ_0 − λ)^n R(λ_0)^{n+1};

see the proof of Lemma 3.7.27. Note that the series is absolutely convergent. So λ → R(λ) is an analytic function from ρ(A) into L(X). From the proof of Proposition 3.7.33 we know that if |λ| > ‖A‖_L, then R(λ) = Σ_{n≥0} (1/λ^{n+1}) A^n, hence ‖R(λ)‖_L ≤ 1/(|λ| − ‖A‖_L). Arguing by contradiction, suppose that ρ(A) = ℂ. Then R(λ) → 0 as |λ| → +∞. So with Liouville's Theorem, we obtain that R ≡ 0, a contradiction since the values of R are invertible operators. Therefore ρ(A) ≠ ℂ and so σ(A) ≠ ∅.

As we already pointed out (see Remark 3.7.30), if dim X < ∞ and A ∈ L(X), then σ(A) = Pσ(A); just recall that in this case A is injective if and only if A is surjective. However, it is not true in general that every point of σ(A) is an eigenvalue. For compact operators every nonzero element of σ(A) is an eigenvalue.

Proposition 3.7.35. If X is a Banach space, A ∈ L_c(X) and λ ∈ σ(A) \ {0}, then λ ∈ Pσ(A).

Proof. Suppose that λ ≠ 0 is not an eigenvalue of A. Then according to Definition 3.7.29 we obtain N(λi_X − A) = {0}. Then with the Fredholm Alternative Theorem (see Theorem 3.7.23), we have R(λi_X − A) = X. Hence, according to Theorem 3.2.10, λi_X − A is invertible, which means that λ ∉ σ(A).

Lemma 3.7.36. If X is a Banach space, A ∈ L(X), {λ_k}_{k=1}^n are distinct eigenvalues of A and e_k is an eigenvector corresponding to λ_k for each k = 1, . . . , n with n ∈ ℕ, then {e_k}_{k=1}^n ⊆ X are linearly independent.

Proof. The proof goes by induction. So, suppose that {e_k}_{k=1}^{n−1} are linearly independent and, arguing by contradiction, that e_n = Σ_{k=1}^{n−1} ϑ_k e_k with ϑ_k ∈ ℂ. Then

Σ_{k=1}^{n−1} λ_n ϑ_k e_k = λ_n e_n = A(e_n) = Σ_{k=1}^{n−1} λ_k ϑ_k e_k.

Hence, Σ_{k=1}^{n−1} (λ_n − λ_k)ϑ_k e_k = 0. Since by the induction hypothesis {e_k}_{k=1}^{n−1} ⊆ X are linearly independent and λ_n − λ_k ≠ 0, we must have ϑ_k = 0 for all k = 1, . . . , n − 1. This gives e_n = 0, a contradiction. Therefore {e_k}_{k=1}^n ⊆ X are linearly independent.

Proposition 3.7.37.
If X is a Banach space, A ∈ Lc (X), and ε > 0, then A has only finitely many eigenvalues λ ∈ ℂ such that |λ| > ε. Proof. Arguing by contradiction, suppose that there exist distinct eigenvalues {λ k }k≥1 such that |λ k | > ε for all k ∈ ℕ. For every eigenvalue λ k , we choose an eigenvector e k . For n ∈ ℕ let X n = span{e k }nk=1 . With Lemma 3.7.36 it follows that A(X n ) = X n and X n−1 ≠ X n . Invoking the Riesz Lemma (see Lemma 3.1.20), there is a u n ∈ X n such that d(u n , u n+1 ) ≥

1 2

and ‖u n ‖ = 1

for all n ≥ 2 .

(3.7.6)

Let y n = 1/λ n u n and note that ‖y n ‖ ≤ 1/ε. Then A(y n ) ∈ X n and u n − A(y n ) ∈ X n−1 . To see this second inclusion, note that y n = ∑nk=1 ϑ k e k with ϑ k ∈ ℂ. Then n

u n − A(y n ) = ∑ (1 − k=1

n−1 λk λk ) ϑ k e k = ∑ (1 − ) ϑ k e k ∈ X n−1 . λn λ n k=1

3.7 Compact Operators – Fredholm Operators |

263

Let n > m. Then A(y m ) ∈ X m ⊆ X n−1 and u n − A(y n ) ∈ X n−1 . Therefore one has ‖A(y n ) − A(y m )‖ ≥ d(A(y n ), X n−1 ) = d(A(y n ) + u n − A(y n ), X n−1 ) 1 = d(u n , X n−1 ) ≥ ; 2

(3.7.7)

see (3.7.6). But $\{A(y_n)\}_{n \geq 1} \subseteq A(\overline{B}_{1/\varepsilon})$, where $\overline{B}_{1/\varepsilon} = \{x \in X : \|x\| \leq 1/\varepsilon\}$, and the latter set is relatively compact, a contradiction to (3.7.7). This proves that only finitely many eigenvalues $\lambda \in \mathbb{C}$ satisfy $|\lambda| > \varepsilon$.

Combining this proposition with Theorem 3.7.21 we obtain the following corollary.

Corollary 3.7.38. If $X$ is a Banach space and $A \in L_c(X)$, then $\sigma(A) = \{0\} \cup P\sigma(A)$, where $P\sigma(A)$ is either a finite, possibly empty, set or a sequence $\{\lambda_k\}_{k \geq 1} \subseteq \mathbb{C}$ such that $\lambda_k \to 0$ as $k \to \infty$, and each $\lambda_k$ has a corresponding eigenspace that is finite dimensional.

Now we focus on self-adjoint operators defined on a Hilbert space.

Proposition 3.7.39. If $H$ is a Hilbert space and $A \in L(H)$ is self-adjoint, then $P\sigma(A) \subseteq \mathbb{R}$ and eigenvectors corresponding to different eigenvalues are orthogonal.

Proof. Since $A \in L(H)$ is self-adjoint, from Definition 3.6.13(b) it follows that $(A(x), y) = (x, A(y))$ for all $x, y \in H$. Taking $y = x$ gives $(A(x), x) = (x, A(x)) = \overline{(A(x), x)}$ for all $x \in H$. Hence

$$(A(x), x) \in \mathbb{R} \quad \text{for all } x \in H. \tag{3.7.8}$$

Suppose that $\lambda \in P\sigma(A)$ and let $x \neq 0$ be a corresponding eigenvector. Then $(A(x), x) = (\lambda x, x) = \lambda \|x\|^2$, which implies, because of (3.7.8), that $\lambda = (A(x), x)/\|x\|^2 \in \mathbb{R}$. Next let $\lambda, \mu \in P\sigma(A)$ with $\lambda \neq \mu$ and suppose that $x, u \in H$ are eigenvectors corresponding to $\lambda, \mu$, respectively. Then one gets

$$(A(x), u) = (\lambda x, u) = \lambda (x, u), \qquad (A(x), u) = (x, A(u)) = (x, \mu u) = \mu (x, u),$$

since the eigenvalues are real; see above. It follows that $(\lambda - \mu)(x, u) = 0$. As $\lambda \neq \mu$ we conclude that $(x, u) = 0$.

Proposition 3.7.40. If $H$ is a Hilbert space and $A \in L(H)$ is self-adjoint, then $\lambda \in \sigma(A)$ if and only if $\inf[\|\lambda x - A(x)\| : \|x\| = 1] = 0$.

Proof. ⇒: We argue by contraposition. Suppose that $\inf[\|(\lambda i_H - A)(x)\| : \|x\| = 1] > 0$. Then there exists $c > 0$ such that

$$\|(\lambda i_H - A)(x)\| \geq c \|x\| \quad \text{for all } x \in H. \tag{3.7.9}$$

We will show that $(\lambda i_H - A)^{-1} \in L(H)$ and so $\lambda \in \rho(A)$. According to Proposition 3.6.35, it suffices to show that $R(\lambda i_H - A)$ is dense in $H$. If this is not the case, then there exists $\hat{u} \in H \setminus \{0\}$ such that $((\lambda i_H - A)(x), \hat{u}) = 0$ for all $x \in H$. This gives $(x, (\overline{\lambda} i_H - A)(\hat{u})) = 0$ for all $x \in H$. Therefore $\overline{\lambda} \hat{u} = A(\hat{u})$, that is, $\overline{\lambda} \in P\sigma(A)$. But from Proposition 3.7.39 we know that $P\sigma(A) \subseteq \mathbb{R}$. Hence, $\lambda = \overline{\lambda}$ and so $(\lambda i_H - A)(\hat{u}) = 0$, a contradiction to (3.7.9). It follows that $R(\lambda i_H - A)$ is dense in $H$ and so Proposition 3.6.35 implies that $(\lambda i_H - A)^{-1} \in L(H)$, and thus $\lambda \in \rho(A)$.

⇐: Let $\lambda \in \rho(A)$. Then $(\lambda i_H - A)^{-1} \in L(H)$. So, for $x \in H$ with $\|x\| = 1$ we get

$$1 = \|x\| = \|(\lambda i_H - A)^{-1} (\lambda i_H - A)(x)\| \leq \|(\lambda i_H - A)^{-1}\|_L \, \|(\lambda i_H - A)(x)\|.$$

Hence, $\|(\lambda i_H - A)^{-1}\|_L^{-1} \leq \|(\lambda i_H - A)(x)\|$ for every such $x$, which gives

$$\|(\lambda i_H - A)^{-1}\|_L^{-1} \leq \inf[\|(\lambda i_H - A)(x)\| : \|x\| = 1].$$

So, if $\inf[\|(\lambda i_H - A)(x)\| : \|x\| = 1] = 0$, then we must have $\lambda \in \sigma(A)$.

Using this proposition we can conclude that the spectrum of a self-adjoint operator is real; compare with Proposition 3.7.39.

Proposition 3.7.41. If $H$ is a Hilbert space and $A \in L(H)$ is self-adjoint, then $\sigma(A) \subseteq \mathbb{R}$.

Proof. Let $\lambda = \eta + i\vartheta$ with $\vartheta \neq 0$. For every $x \in H$ with $\|x\| = 1$ we obtain

$$(\lambda x - A(x), x) - (x, \lambda x - A(x)) = (\lambda - \overline{\lambda}) \|x\|^2 = 2i\vartheta.$$

Hence,

$$2|\vartheta| = |(\lambda x - A(x), x) - (x, \lambda x - A(x))| \leq |(\lambda x - A(x), x)| + |(x, \lambda x - A(x))| \leq 2 \|(\lambda i_H - A)(x)\|.$$

Therefore,

$$|\vartheta| \leq \inf[\|(\lambda i_H - A)(x)\| : \|x\| = 1]. \tag{3.7.10}$$

So, from (3.7.10) and Proposition 3.7.40, we see that $\lambda \in \sigma(A)$ implies that $\vartheta = 0$. Thus, $\sigma(A) \subseteq \mathbb{R}$.

Using this fact we can locate more precisely the spectrum of a self-adjoint operator.

Proposition 3.7.42. If $H$ is a Hilbert space, $A \in L(H)$ is self-adjoint, and

$$m_A = \inf[(A(x), x) : \|x\| = 1], \qquad M_A = \sup[(A(x), x) : \|x\| = 1],$$

then $\sigma(A) \subseteq [m_A, M_A]$.


Proof. Note that if $T = A + \mu i_H$, then $T \in L(H)$ is self-adjoint and $m_T = m_A + \mu$ as well as $M_T = M_A + \mu$. So, without any loss of generality we may assume that $0 \leq m_A \leq M_A$. From Proposition 3.6.16, we know that $M_A = \|A\|_L$, while from Proposition 3.7.41, we know that $\sigma(A) \subseteq \mathbb{R}$. We will show that, for every $\vartheta > 0$, $\lambda = M_A + \vartheta \notin \sigma(A)$. According to Proposition 3.7.40, it suffices to show that $\inf[\|(\lambda i_H - A)(x)\| : \|x\| = 1] > 0$. For every $x \in H$ with $\|x\| = 1$ one has

$$((\lambda i_H - A)(x), x) = (\lambda x, x) - (A(x), x) \geq (\lambda - M_A) \|x\|^2 = \vartheta \|x\|^2 = \vartheta.$$

By the Cauchy–Schwarz inequality, this gives $0 < \vartheta \leq \|(\lambda i_H - A)(x)\|$ for all $x \in H$ with $\|x\| = 1$. Therefore, $0 < \vartheta \leq \inf[\|(\lambda i_H - A)(x)\| : \|x\| = 1]$. Then, due to Proposition 3.7.40, this finally proves that $\lambda \notin \sigma(A)$. Similarly we show that for every $\vartheta > 0$, $\lambda = m_A - \vartheta \notin \sigma(A)$. Hence, we conclude that $\sigma(A) \subseteq [m_A, M_A]$.

Proposition 3.7.43. If $H$ is a Hilbert space and $A \in L(H)$ is self-adjoint, then $m_A, M_A \in \sigma(A)$; see Proposition 3.7.42.

Proof. As before (see the proof of Proposition 3.7.42), we may assume that $0 \leq m_A \leq M_A$. Recall that $M_A = \|A\|_L$; see Proposition 3.6.16. Let $\{x_n\}_{n \geq 1} \subseteq H$ with $\|x_n\| = 1$ for all $n \in \mathbb{N}$ be such that

$$(A(x_n), x_n) \to M_A = \|A\|_L \quad \text{as } n \to \infty. \tag{3.7.11}$$

Then, using the fact that $(A(x), x) \geq 0$ for all $x \in H$ and the validity of (3.7.11), it follows that

$$0 \leq \|(M_A i_H - A)(x_n)\|^2 = (M_A x_n - A(x_n), M_A x_n - A(x_n)) = M_A^2 + \|A(x_n)\|^2 - 2 M_A (A(x_n), x_n) \leq M_A^2 + M_A^2 - 2 M_A (A(x_n), x_n) \to 0$$

as $n \to \infty$. Hence $\inf[\|(M_A i_H - A)(x)\| : \|x\| = 1] = 0$, which gives $M_A \in \sigma(A)$; see Proposition 3.7.40. Similarly we show that $m_A \in \sigma(A)$.

Next we restrict ourselves further to compact self-adjoint operators.

Proposition 3.7.44. If $H$ is a Hilbert space and $A \in L_c(H)$ is self-adjoint, then $P\sigma(A) \neq \emptyset$.

Proof. If $A = 0$, then $\lambda = 0 \in P\sigma(A)$. So, suppose that $A \neq 0$. Then Proposition 3.7.43 gives $\|A\|_L \in \sigma(A)$. Since $\|A\|_L \neq 0$, Corollary 3.7.38 implies that $\|A\|_L \in P\sigma(A)$.

Proposition 3.7.45. If $H$ is an infinite dimensional Hilbert space and $A \in L_c(H) \setminus \{0\}$ is self-adjoint, then $\sigma(A) = \{0\} \cup \{\lambda_k\}_{k \geq 1}$, where the $\lambda_k$ are distinct nonzero eigenvalues of $A$, one of these eigenvalues equals $\|A\|_L$, and $\{\lambda_k\}_{k \geq 1}$ is either finite or a countable sequence such that $\lambda_k \to 0$. Moreover, the Hilbert space $H$ admits an orthonormal basis consisting of eigenvectors corresponding to the eigenvalues of $A$.

Proof. This is basically Corollary 3.7.38. From Proposition 3.7.43 we also know that one of the eigenvalues equals $\|A\|_L$. It remains to prove the last part of the proposition concerning the basis of $H$. Let $\lambda \in P\sigma(A)$ and let $N_\lambda = N(\lambda i_H - A)$. From Theorem 3.7.21 we know that $\dim N_\lambda < +\infty$. Let $B_\lambda$ be an orthonormal basis for $N_\lambda$ and let $B = \bigcup_{\lambda \in P\sigma(A)} B_\lambda$. From Proposition 3.7.39 we know that $B \subseteq H$ is an orthonormal set and $\overline{\operatorname{span}}\, B$ contains all the eigenvectors of $A$. Suppose that $H \neq \overline{\operatorname{span}}\, B$ and let $V = (\overline{\operatorname{span}}\, B)^\perp$. Note that $\overline{\operatorname{span}}\, B$ is $A$-invariant. Hence so is $V$. One has $\sigma(A) = \sigma(A|_{\overline{\operatorname{span}}\, B}) \cup \sigma(A|_V)$. But $\sigma(A|_V)$ contains an eigenvalue (see Proposition 3.7.44), and so a corresponding eigenvector $u$ as well. Then $u$ is also an eigenvector of $A$ and so $u \in V \cap \overline{\operatorname{span}}\, B$, $u \neq 0$, a contradiction. This means that $H = \overline{\operatorname{span}}\, B$, and so $B$ is an orthonormal basis of $H$.

Corollary 3.7.46. If $H$ is a Hilbert space and $A \in L_c(H)$ is self-adjoint, then $\sigma(A) = \overline{P\sigma(A)}$.

Proof. If $H$ is finite dimensional, then $\sigma(A) = P\sigma(A)$ and it is compact; see Proposition 3.7.33. If $H$ is infinite dimensional, then $P\sigma(A)$ is a countable sequence or a finite sequence. If it is a countable sequence, then the conclusion follows from Proposition 3.7.44. If it is a finite sequence, then, since the eigenspaces for the nonzero eigenvalues are finite dimensional (see Corollary 3.7.38) and $H$ is infinite dimensional, on account of Proposition 3.7.44 we must have that $\lambda = 0 \in P\sigma(A)$.
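As a concrete illustration of Propositions 3.7.42, 3.7.43, and 3.7.45, the following sketch works out a diagonal operator on $\ell^2$. The operator $A$, the space $\ell^2$, and the standard unit vectors $e_k$ are assumptions chosen for illustration and do not come from the text; compactness of $A$ is taken from the standard fact that a norm limit of finite rank operators is compact.

```latex
% Assumed example (not from the text): H = \ell^2, A(x) = (x_k / k)_{k \ge 1}.
\begin{align*}
A(e_k) &= \tfrac{1}{k}\, e_k, \qquad k \in \mathbb{N}, \\
(A(x), x) &= \sum_{k \ge 1} \tfrac{1}{k}\, |x_k|^2 \in (0, \|x\|^2]
  \quad \text{for } x \neq 0, \\
M_A &= (A(e_1), e_1) = 1 = \|A\|_L, \\
m_A &= \inf_{\|x\| = 1} (A(x), x) = \lim_{k \to \infty} (A(e_k), e_k) = 0, \\
\|(0 \cdot i_H - A)(e_k)\| &= \tfrac{1}{k} \to 0
  \;\Longrightarrow\; 0 \in \sigma(A), \\
N(A) &= \{0\} \;\Longrightarrow\; 0 \notin P\sigma(A), \\
\sigma(A) &= \{0\} \cup \{\tfrac{1}{k} : k \in \mathbb{N}\} \subseteq [m_A, M_A] = [0, 1].
\end{align*}
```

In this example $M_A$ is an eigenvalue, whereas $m_A = 0$ belongs to $\sigma(A)$ without being an eigenvalue, so the infimum in Proposition 3.7.40 is not attained for $\lambda = 0$.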

We have reached the main result on the spectral analysis of compact self-adjoint operators defined on a Hilbert space. The result is known as the “Spectral Decomposition Theorem.”

Theorem 3.7.47 (Spectral Decomposition Theorem). If $H$ is an infinite dimensional separable Hilbert space and $A \in L_c(H)$ is self-adjoint, then there exists an orthonormal basis $\{e_k\}_{k \geq 1} \subseteq H$ consisting of eigenvectors corresponding to the distinct eigenvalues $\{\lambda_k\}_{k \geq 1} \subseteq \mathbb{R}$ and

$$A(x) = \sum_{k \geq 1} \lambda_k (x, e_k) e_k \quad \text{for all } x \in H.$$

Moreover, for every $\lambda \in \rho(A)$ and $x \in H$, it holds that

$$R(\lambda)(x) = \sum_{k \geq 1} \frac{(x, e_k)}{\lambda - \lambda_k}\, e_k.$$

Proof. Let $\{e_k\}_{k \geq 1} \subseteq H$ be an orthonormal basis of $H$ consisting of eigenvectors; see Propositions 3.7.43 and 3.5.47. Then, for $1 \leq n < m$, one has

$$\left\| \sum_{k=n}^{m} \lambda_k (x, e_k) e_k \right\|^2 = \sum_{k=n}^{m} |\lambda_k (x, e_k)|^2 \leq \|A\|_L^2 \sum_{k=n}^{m} |(x, e_k)|^2 \to 0$$

as $n \to \infty$; see Proposition 3.7.33. Hence, $\sum_{k \geq 1} \lambda_k (x, e_k) e_k$ converges in $H$.


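If $\|x\| \leq 1$, then, for every $n \in \mathbb{N}$, we derive

$$\left\| \sum_{k=1}^{n} \lambda_k (x, e_k) e_k \right\|^2 = \sum_{k=1}^{n} \lambda_k^2 |(x, e_k)|^2 \leq \|A\|_L^2 \sum_{k=1}^{n} |(x, e_k)|^2 \leq \|A\|_L^2 \sum_{k \geq 1} |(x, e_k)|^2 = \|A\|_L^2 \|x\|^2.$$

Consider the operator $T$ defined by $T(x) = \sum_{k \geq 1} \lambda_k (x, e_k) e_k$. Of course, $T \in L(H)$. Moreover, $A(e_k) = T(e_k)$ for all $k \in \mathbb{N}$ and so $A = T$.

Now suppose that $\lambda \in \rho(A)$. Recalling that $\sigma(A) = \mathbb{C} \setminus \rho(A)$ is compact, it follows that $d(\lambda, \sigma(A)) > \vartheta$ for some $\vartheta > 0$. Hence, $|\lambda - \lambda_k| > \vartheta$ for all $k \in \mathbb{N}$. Therefore,

$$\left\| \sum_{k=n}^{m} \frac{(x, e_k)}{\lambda - \lambda_k}\, e_k \right\|^2 = \sum_{k=n}^{m} \frac{|(x, e_k)|^2}{|\lambda - \lambda_k|^2} \leq \frac{1}{\vartheta^2} \sum_{k=n}^{m} |(x, e_k)|^2.$$

This shows that $\sum_{k \geq 1} \frac{(x, e_k)}{\lambda - \lambda_k} e_k$ is convergent in $H$ for all $x \in H$. From now on let $T(x) = \sum_{k \geq 1} \frac{(x, e_k)}{\lambda - \lambda_k} e_k$. Then, for $\|x\| \leq 1$, we obtain

$$\left\| \sum_{k=1}^{n} \frac{(x, e_k)}{\lambda - \lambda_k}\, e_k \right\|^2 \leq \frac{1}{\vartheta^2} \sum_{k=1}^{n} |(x, e_k)|^2 \leq \frac{1}{\vartheta^2} \|x\|^2 \leq \frac{1}{\vartheta^2}.$$

Thus, $T \in L(H)$. Since $x = \sum_{k \geq 1} (x, e_k) e_k$, we have $A(x) = \sum_{k \geq 1} \lambda_k (x, e_k) e_k$ and

$$(\lambda i_H - A)(x) = \sum_{k \geq 1} (\lambda - \lambda_k)(x, e_k) e_k.$$

As $(e_k, e_i) = \delta_{k,i}$ we then get

$$(\lambda i_H - A)(T(x)) = \sum_{k, i \geq 1} (\lambda - \lambda_k) \frac{(x, e_i)}{\lambda - \lambda_i} (e_i, e_k) e_k = \sum_{k \geq 1} (x, e_k) e_k = x.$$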

Similarly we show that $T((\lambda i_H - A)(x)) = x$ for all $x \in H$. Therefore, $T = R(\lambda)$.

We conclude this section by introducing two more classes of bounded linear operators from a Hilbert space into itself.

Definition 3.7.48. Let $H$ be a Hilbert space and $A \in L(H)$.
(a) We say that $A$ is normal if $A \circ A^* = A^* \circ A$.
(b) We say that $A$ is unitary if $A$ is invertible and $A^{-1} = A^*$.

Remark 3.7.49. Clearly every unitary operator is normal and every self-adjoint operator is normal.

Proposition 3.7.50. If $H$ is a Hilbert space and $A \in L(H)$, then $A$ is normal if and only if $\|A(x)\| = \|A^*(x)\|$ for all $x \in H$.

Proof. For every $x \in H$, we derive

$$\|A(x)\|^2 - \|A^*(x)\|^2 = (A(x), A(x)) - (A^*(x), A^*(x)) = (A^*(A(x)), x) - (A(A^*(x)), x) = ((A^* \circ A - A \circ A^*)(x), x). \tag{3.7.12}$$

From (3.7.12) it follows that $A$ is normal if and only if $\|A(x)\| = \|A^*(x)\|$ for all $x \in H$.

Proposition 3.7.51. If $H$ is a Hilbert space and $A \in L(H)$ is surjective, then the following statements are equivalent:
(a) $A$ is unitary.
(b) $(A(x), A(u)) = (x, u)$ for all $x, u \in H$.
(c) $A$ is an isometry.

Proof. (a) ⇒ (b): For every $x, u \in H$ it holds that $(A(x), A(u)) = (A^*(A(x)), u) = (x, u)$.

(b) ⇐⇒ (c): This follows from the polarization identities; see Proposition 3.5.6(b).

(c) ⇒ (a): The operator $A$ is an isometry and surjective, and hence, $A^{-1} \in L(H)$; see Theorem 3.2.10. Moreover, for all $x, u \in H$, one has $(A^*(A(x)), u) = (A(x), A(u)) = (x, u)$, since (b) is equivalent to (c). Hence, $A^* \circ A = i_H$ and similarly $A \circ A^* = i_H$, and so $A^{-1} = A^*$.

Remark 3.7.52. So according to the proposition above, a surjective $A \in L(H)$ is unitary if and only if it preserves inner products.
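To see why surjectivity is assumed in Proposition 3.7.51, one can look at the right shift on $\ell^2$. This is only a sketch: the operator $S$ is an illustrative choice that is not taken from the text, and $e_1 = (1, 0, 0, \dots)$ denotes the first standard unit vector.

```latex
% Assumed example (not from the text): H = \ell^2, S the right shift.
\begin{align*}
S(x_1, x_2, \dots) &= (0, x_1, x_2, \dots), &
S^*(x_1, x_2, \dots) &= (x_2, x_3, \dots), \\
(S(x), S(u)) &= \sum_{k \ge 1} x_k \overline{u_k} = (x, u), &
\|S(x)\| &= \|x\|, \\
S^* \circ S &= i_H, &
(S \circ S^*)(x) &= (0, x_2, x_3, \dots), \\
\|S(e_1)\| &= 1, &
\|S^*(e_1)\| &= 0 .
\end{align*}
```

So $S$ is an isometry that preserves inner products, yet $S \circ S^* \neq i_H$, and by Proposition 3.7.50 the operator $S$ is not even normal. Since $e_1 \notin R(S)$, the operator $S$ is not surjective, which is exactly the hypothesis of Proposition 3.7.51 that fails here.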

3.8 Remarks

(3.1) The major development of mathematics in the twentieth century was the emphasis on the axiomatic method. This abstract tendency with emphasis on the structural properties led to the development of whole new areas such as “Functional Analysis” with the seminal contributions of Banach, von Neumann, and Riesz, to mention only a few major figures, and to “Modern Algebra,” where prominent figures were Noether and van der Waerden. In this approach, the emphasis is not on the objects but on the rules used to handle them, which are the same for many different classes of objects. The power of the axiomatic method can be traced back to the work of Euclid, who provided a model for space locally. The first breakthrough in the abstract axiomatic approach was achieved by Fréchet, who introduced abstract metric spaces in his thesis [117]. He was the first to go beyond the familiar concrete Euclidean space setting. The normed space axioms (see Definition 3.1.13(e)) were first introduced by Banach [23] in his thesis. Normed spaces form a special class of metric spaces. The thing that makes normed


spaces such a prolific concept is the linkage between the algebraic and the topological structures of the space. This is expressed by the requirement that the two algebraic operations, namely vector addition and scalar multiplication, are continuous. This leads at a higher level of generality to the notion of a topological vector space. Moreover, the convexity of the balls in a normed space leads to topological vector spaces with a local neighborhood basis consisting of convex sets. These are the locally convex spaces (see Definition 3.1.13(b)) first introduced by von Neumann [303]. Until the mid-1940s, the study of functional analysis focused on normed spaces. The first major paper on the theory of locally convex spaces was that of Dieudonné–Schwartz [84], motivated by Schwartz’s construction of the theory of distributions. Lemma 3.1.20, called the Riesz Lemma, was proved by Riesz [246] and turned out to be a fruitful result on many occasions. Theorem 3.1.30 is due to Carathéodory [62] and Theorem 3.1.41, due to Kolmogorov [179], seems to be the first theorem about locally convex spaces. The Hahn–Banach Theorem (see Theorem 3.1.42) is crucial in the development of the theory of normed spaces. The first version of it was due to Minkowski, who proved that every boundary point of the closed unit ball of a finite dimensional normed space admits at least one supporting hyperplane through it. Later Helly [143] generalized the ideas of Minkowski to certain separable spaces. Fifteen years later, in 1927, Hahn [138], starting from the work of Helly, proved an extension theorem in a more general form without any separability hypothesis. Soon thereafter, we have the result of Banach [24] (see also Banach [25]), who proved the theorem in general vector spaces apart from any topology. The Hahn–Banach Theorem turned out to be a major tool in the development of the theory of locally convex spaces. Although the original proof uses transfinite induction, this part of the argument was later replaced by the use of Zorn’s Lemma. The complex version of the result (see Theorem 3.1.44) is due to Bohnenblust–Sobczyk [38] and Suchomlinov [280]. Theorem 3.1.59, the First Separation Theorem, is due to Edelheit [96]. Theorem 3.1.60, the Strong Separation Theorem, is due to Tukey [287] and Klee [176].

(3.2) Theorem 3.2.1, the Uniform Boundedness Principle, was first proved by Hahn [137] for sequences of linear functionals. A more general form was produced by Hildebrandt [148]. The general version of the result and a proof based on the Baire Category Theorem were provided by Banach–Steinhaus [26]; see also Theorem 3.2.2. Theorem 3.2.9, the Open Mapping Theorem, was proved by Schauder [263] for Banach spaces. Banach [25] extended the result to Fréchet spaces; see Definition 3.1.13(d). Theorem 3.2.10 and Theorem 3.2.14, the Closed Graph Theorem, are due to Banach [25]. Banach [25] extended both the Open Mapping Theorem and the Closed Graph Theorem to topological groups. The book of Banach [25] turned out to be one of the most influential books in analysis and remains a reference even today.

Definition 3.8.1. Let $P$ be a property of normed spaces. Suppose that whenever $X$ is a normed space and $V \subseteq X$ is a closed subspace such that two of the spaces $X$, $V$, $X/V$ have property $P$, then so does the third. Then we say that $P$ is a three space property.

Using this notion we can improve Proposition 3.2.17 in the following way.


Proposition 3.8.2. Completeness is a three space property.

(3.3) We point out that Banach [25] worked only with weakly convergent sequences and did not use the notion of “weak topology.” On certain occasions this led to unnecessary separability assumptions. The first explicit description of weak neighborhoods in a Hilbert space was given by von Neumann [301], who was the first to recognize that the weak topology is indeed a topology. He also realized the nonmetrizability of the weak topology in an infinite dimensional normed space; see Proposition 3.3.15. Further discussion on this issue can be found in Wehausen [306]. Proposition 3.3.16 was first proven for $X = \ell^2$ by von Neumann [301]. Theorem 3.3.18 is due to Mazur [210]. Earlier particular versions of this result for the Banach space $C[0, 1]$ can be found in Gillespie–Hurwitz [128] and Zalcwasser [312]. That bounded linear operators are weakly continuous was first observed by Banach [24]. The converse (see Proposition 3.3.23) is due to Bade [19]. Theorem 3.3.37, Goldstine’s Theorem, is naturally due to Goldstine [132] and Theorem 3.3.38, Alaoglu’s Theorem, was proved by Alaoglu [2]. For separable Banach spaces the theorem can be found in Banach [25]. For this reason some people call it the “Banach–Alaoglu Theorem”; see, for example, Megginson [212, p. 229]. Theorem 3.3.41 is due to James [164] and is one of the most influential results in Banach space theory. Another locally convex topology on $X^*$, the dual of the normed space $X$, is the bounded weak* topology introduced by Dieudonné [83].

Definition 3.8.3. Let $X$ be a normed space. The bounded weak* topology (or the $bw^*$-topology) is the strongest topology on $X^*$ which coincides with the relative $w^*$-topology on each set $t\overline{B}_1^{X^*} = \{x^* \in X^* : \|x^*\|_* \leq t\}$, $t > 0$. Therefore a set $U \subseteq X^*$ is $bw^*$-open if and only if $U \cap t\overline{B}_1^{X^*}$ is relatively $w^*$-open in $t\overline{B}_1^{X^*}$ for every $t > 0$, and $C \subseteq X^*$ is $bw^*$-closed if and only if $C \cap t\overline{B}_1^{X^*}$ is relatively $w^*$-closed in $t\overline{B}_1^{X^*}$ for all $t > 0$.

Remark 3.8.4. It can be shown (see, for example, Dunford–Schwartz [94, Lemma V.5.4, p. 427]) that a local basis at the origin for the $bw^*$-topology is given by the sets

$$B(S) = \{x^* \in X^* : |\langle x^*, x \rangle| < 1 \text{ for all } x \in S\},$$

where $S = \{x_k\}_{k \geq 1} \subseteq X$ is a sequence converging to zero. We have $w^* \subseteq bw^* \subseteq \text{norm}$. These inclusions are strict if $X$ is an infinite dimensional normed space. Directly from Definition 3.8.3 we see that if $\{x_\alpha^*\}_{\alpha \in I} \subseteq X^*$ is a bounded net and $x^* \in X^*$, then $x_\alpha^* \xrightarrow{w^*} x^*$ if and only if $x_\alpha^* \xrightarrow{bw^*} x^*$. Of course $X^*_{bw^*}$ is a locally convex space.

Proposition 3.8.5. If $X$ is a Banach space, then $X = (X^*_{w^*})^* = (X^*_{bw^*})^*$.

Using this proposition one can show the following theorem known as the “Krein–Smulian Theorem.”

Theorem 3.8.6 (Krein–Smulian Theorem). If $X$ is a Banach space and $C \subseteq X^*$ is a nonempty convex set, then $C$ is $w^*$-closed if and only if $C \cap t\overline{B}_1^{X^*}$ is $w^*$-closed for every $t > 0$, that is, $C$ is $w^*$-closed if and only if $C$ is $bw^*$-closed.


Remark 3.8.7. As in Mazur’s Theorem (see Theorem 3.3.18), in the theorem above we see that an algebraic property, namely the convexity of $C$, has topological consequences.

Corollary 3.8.8. If $X$ is a separable Banach space and $C \subseteq X^*$ is a nonempty convex set, then $C$ is $w^*$-closed if and only if it is weakly* sequentially closed.

We can introduce one more locally convex topology on $X^*$. Recall that the weak* topology is the weakest topology $\tau$ on $X^*$ such that $(X^*_\tau)^* = X$. Suppose we ask for the strongest (finest) topology $m$ on $X^*$ for which $(X^*_m)^* = X$ is satisfied.

Theorem 3.8.9. There exists a strongest topology $m$ on $X^*$ such that $(X^*_m)^* = X$. This is the topology of uniform convergence on all $w$-compact sets, that is, $x_\alpha^* \xrightarrow{m} x^*$ in $X^*$ if and only if $\sup[|\langle x_\alpha^* - x^*, u \rangle| : u \in K] \to 0$ for all $w$-compact $K \subseteq X$. The space $X^*_m$ is locally convex and $m$ is called the Mackey topology on $X^*$ and is denoted by $m(X^*, X)$.

We have already seen how important the notion of convexity is. Next we will see that in some convex sets we can isolate special points that in fact generate the set.

Definition 3.8.10. Let $X$ be a topological vector space and $C \subseteq X$ be a nonempty, closed, convex set. A set $E \subseteq C$ is extremal in $C$ if $E$ is nonempty, closed, convex and if $x, u \in C$ and $(1 - \lambda)x + \lambda u \in E$ for some $\lambda \in (0, 1)$, then $x, u \in E$. An extreme point of $C$ is an $x \in C$ such that $\{x\}$ is an extremal subset of $C$, that is, $x$ is an extreme point of $C$ if it does not lie in the interior of any nontrivial closed line segment of $C$. By $\operatorname{ext} C$ we denote the set of extreme points of $C$.

The following is the basic theorem about extreme points and it is known as the “Krein–Milman Theorem.”

Theorem 3.8.11 (Krein–Milman Theorem). If $X$ is a locally convex space and $C \subseteq X$ is nonempty, compact, and convex, then $\operatorname{ext} C \neq \emptyset$ and $C = \overline{\operatorname{conv}} \operatorname{ext} C$.

For more on the structure of convex sets, we refer to Giles [127]. The books of Aliprantis–Border [6], Beauzamy [29], Brézis [48], Denkowski–Migórski–Papageorgiou [77], Diestel [79], Fabian et al. [106], Giles [127], Holmes [155], Megginson [212], Rudin [260], and Yosida [311] discuss in detail the weak and weak* topologies.

(3.4) Reflexive Banach spaces were introduced by Hahn [138]. He called them regular. The term “reflexive” is due to Lorch [204] and Theorem 3.4.5 is due to James [163]. There are other useful characterizations of reflexivity. We mention three of them. The first is due to Smulian [273].

Theorem 3.8.12. If $X$ is a Banach space, then $X$ is reflexive if and only if for every decreasing sequence $\{C_n\}_{n \geq 1}$ of nonempty, bounded, closed, convex subsets of $X$, it holds that $\bigcap_{n \geq 1} C_n \neq \emptyset$.
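A standard way to see the criterion of Theorem 3.8.12 at work is the space $c_0$ of null sequences with the supremum norm. The sets $C_n$ in the sketch below are an illustrative choice and are not taken from the text.

```latex
% Assumed example (not from the text): X = c_0 with \|x\|_\infty = \sup_k |x_k|.
\begin{align*}
C_n &= \{ x \in c_0 : \|x\|_\infty \le 1,\ x_1 = \dots = x_n = 1 \},
  \qquad n \in \mathbb{N}, \\
(\underbrace{1, \dots, 1}_{n}, 0, 0, \dots) &\in C_n,
  \qquad C_{n+1} \subseteq C_n, \\
x \in \bigcap_{n \ge 1} C_n &\;\Longrightarrow\; x_k = 1 \text{ for all } k \in \mathbb{N}
  \;\Longrightarrow\; x \notin c_0, \\
\bigcap_{n \ge 1} C_n &= \emptyset .
\end{align*}
```

Each $C_n$ is nonempty, bounded, closed, and convex, and the sequence is decreasing, so Theorem 3.8.12 shows that $c_0$ is not reflexive.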
The second is due to James [165].


Theorem 3.8.13. If $X$ is a Banach space, then the following statements are equivalent:
(a) $X$ is not reflexive.
(b) For every $\lambda \in (0, 1)$ there exists a sequence $\{x_n\}_{n \geq 1} \subseteq X$ with $\|x_n\| = 1$ for all $n \in \mathbb{N}$ such that $d(\operatorname{conv}\{x_k\}_{k=1}^n, \operatorname{conv}\{x_k\}_{k \geq n+1}) \geq \lambda$ for every $n \in \mathbb{N}$.
(c) For some $\lambda \in (0, 1)$ there exists a sequence $\{x_n\}_{n \geq 1} \subseteq X$ with $\|x_n\| = 1$ for all $n \in \mathbb{N}$ such that $d(\operatorname{conv}\{x_k\}_{k=1}^n, \operatorname{conv}\{x_k\}_{k \geq n+1}) \geq \lambda$ for every $n \in \mathbb{N}$.

Remark 3.8.14. The interesting feature of the theorem above for reflexivity is that it is intrinsic. Namely, it does not require any knowledge of $X^*$ or $X^{**}$.

The third is also due to James [165].

Theorem 3.8.15. If $X$ is a Banach space, then $X$ is reflexive if and only if every $x^* \in X^*$ is norm attaining, that is, there exists $x_0 \in X$, $\|x_0\| \leq 1$ such that $\|x^*\|_* = \langle x^*, x_0 \rangle$.

It is easy to check the next proposition.

Proposition 3.8.16. Separability and reflexivity are three space properties; see Definition 3.8.1.

The direct assertions in Theorem 3.4.12 are due to Banach [25]. Concerning Theorem 3.4.14, the Eberlein–Smulian Theorem, Smulian [274] showed that weakly compact sets are weakly sequentially compact. Later Eberlein [95] proved the converse. Whitley [308] provided an elementary proof of the theorem. Theorem 3.4.18 reveals the distinctive character of weakly compact sets. They are sequentially compact and each subset of a weakly compact set has a sequentially determined closure. These properties are a particular instance of a more general class of spaces known as angelic spaces; see Floret [113]. Strict convexity and uniform convexity (see Definition 3.4.21(a),(b)) were introduced by Clarkson [68]. Local uniform convexity was introduced by Lovaglia [205]. In the paper of Smith [272], we find examples of reflexive Banach spaces that are locally uniformly convex but not uniformly convex, and of reflexive and nonreflexive Banach spaces that are strictly convex but not locally uniformly convex. 
The Kadec–Klee property is also called the Radon–Riesz property or the H-property; see Day [73]. Proposition 3.8.17. If X is a uniformly convex Banach space and V ⊆ X is a closed subspace, then X/V is uniformly convex as well. (3.5) The notion of abstract Hilbert spaces was introduced by von Neumann [300]. His definition is for a separable space and his aim was to develop the spectral theory for classes of operators on this abstract space. Earlier special realizations of Hilbert spaces were examined by many authors. In particular, Hilbert [147] published between 1904 and 1910 a series of six papers collected in book form developing Hilbert space methods to study integral equations. The name Hilbert space was first used by Riesz [241] for what we know today as l2 . Theorem 3.5.21 was stated by Riesz [242] and Fréchet [118] as separate notes in the same issue of the “Comptes Rendus.” In addition to Bessel’s


inequality (see Proposition 3.5.44), we should also mention the so-called Parseval’s identity.

Proposition 3.8.18 (Parseval’s identity). If $H$ is a Hilbert space and $\{e_n\}_{n \geq 1} \subseteq H$ is an orthonormal set, then $\{e_n\}_{n \geq 1}$ is an orthonormal basis for $H$ if and only if $\|x\|^2 = \sum_{n \geq 1} |(x, e_n)|^2$ for all $x \in H$.

The Gram–Schmidt Orthonormalization Process was first discovered by the Danish statistician Gram. It was elaborated further by Schmidt [265], who demonstrated its usefulness in the study of Hilbert spaces.

(3.6) The operator topologies in Definition 3.6.1 were introduced, in the context of Hilbert spaces, by von Neumann [301]. The notion of adjoint operators (see Definition 3.6.6) was first introduced by Banach [25]. Of course the notion was used earlier in the context of matrix theory. The notion of projection operator (see Definition 3.6.13(a), (c)) is due to Schmidt [265]. The theory of unbounded linear operators was stimulated by attempts in the late 1920s to give quantum mechanics a rigorous mathematical foundation. The first fundamental works on this subject are those of von Neumann [300, 301, 302] and Stone [279]. A more detailed treatment of unbounded linear operators can be found in the books of Goldberg [131], Hille–Phillips [149], Kato [170], Reed–Simon [239], and Weidmann [307]. We state a theorem related to the material of this section.

Theorem 3.8.19. If $X, Y$ are Banach spaces and $A \in L(X, Y)$, then the following statements are equivalent:
(a) $R(A) \subseteq Y$ is closed;
(b) $\inf[\|x + v\|_X : A(v) = 0] \leq c \|A(x)\|_Y$ for all $x \in X$ and for some $c > 0$;
(c) $R(A^*) \subseteq X^*$ is closed;
(d) $\inf[\|y^* + v^*\|_{Y^*} : A^*(v^*) = 0] \leq c \|A^*(y^*)\|_{X^*}$ for all $y^* \in Y^*$ and for some $c > 0$.

(3.7) The notion of compact operators (see Definition 3.7.1) is essentially due to Hilbert [147]. However, the general definition was given by Riesz [246]. Theorem 3.7.10 is due to Schauder [263]. It is the starting point of the Leray–Schauder degree theory; see Section 6.2. 
Theorem 3.7.17 is due to Schauder [263]. The terminology “Fredholm Operator” was introduced in recognition of the pioneering work of E. Fredholm on integral equations. The work of Fredholm influenced Hilbert. Fredholm operators exhibit nice composition and stability properties. Proposition 3.8.20. If X, Y, V are Banach spaces and A ∈ L(X, Y), T ∈ L(Y, V) are Fredholm operators, then T ∘ A ∈ L(X, V) is a Fredholm operator and i(T ∘ A) = i(A) + i(T). Proposition 3.8.21. If X, Y are Banach spaces and A ∈ L(X, Y) is a Fredholm operator, then the following hold: (a) A + L is a Fredholm operator for every L ∈ Lc (X, Y) and i(A + L) = i(A); (b) there exists ε > 0 such that if T ∈ L(X, Y) with ‖T‖L < ε, then A + T is a Fredholm operator and i(A + T) = i(A).
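The additivity of the index in Proposition 3.8.20 can be checked by hand on the shift operators of $\ell^2$. The sketch below assumes the convention $i(A) = \dim N(A) - \operatorname{codim} R(A)$ for the index; the operators $S$ (right shift) and $S^*$ (left shift) are illustrative choices that are not taken from the text, and $e_1$ denotes the first standard unit vector.

```latex
% Assumed convention: i(A) = \dim N(A) - \operatorname{codim} R(A).
% Assumed example: S(x_1, x_2, \dots) = (0, x_1, x_2, \dots),
%                  S^*(x_1, x_2, \dots) = (x_2, x_3, \dots) on \ell^2.
\begin{align*}
N(S) &= \{0\}, \quad R(S) = \{x \in \ell^2 : x_1 = 0\}, \\
i(S) &= \dim N(S) - \operatorname{codim} R(S) = 0 - 1 = -1, \\
N(S^*) &= \operatorname{span}\{e_1\}, \quad R(S^*) = \ell^2, \\
i(S^*) &= 1 - 0 = 1, \\
i(S^* \circ S) &= i(i_H) = 0 = i(S) + i(S^*), \\
i(S^n) &= 0 - n = -n = n\, i(S) \quad (n \in \mathbb{N}).
\end{align*}
```

If the opposite sign convention is used for the index, all signs flip and the additivity formula is unchanged.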

The terminology “spectrum” of $A \in L(X)$ comes from Hilbert, who published some papers in book form [147] initiating modern spectral theory. The mathematical setting of self-adjoint operators on a Hilbert space was an important mathematical tool for the development by physicists of the theory of quantum mechanics.

Definition 3.8.22. Let $H$ be a Hilbert space and $A \in L(H)$. We say that $A$ is positive (or monotone) if $(A(x), x) \geq 0$ for all $x \in H$. Then we write $A \geq 0$. Moreover, if $A, T \in L(H)$, then we write $A \geq T$ if and only if $A - T \geq 0$.

Remark 3.8.23. Every positive $A \in L(H)$ with $H$ being a complex Hilbert space is automatically self-adjoint. This is false for real Hilbert spaces. Moreover, $A^* \circ A \geq 0$ for any $A \in L(H)$.

Proposition 3.8.24. If $H$ is a Hilbert space, $A \in L(H)$ and $A \geq 0$, then there exists a unique $T \in L(H)$, $T \geq 0$ such that $T^2 = A$. Moreover, $T$ commutes with every bounded linear operator that commutes with $A$. We denote $T$ by $A^{1/2}$, the square root of $A$.

Definition 3.8.25. Let $H$ be a Hilbert space and $A \in L(H)$. Then $|A| = (A^* \circ A)^{1/2}$; see Proposition 3.8.24.

Finally let us state a result on the usage of unitary operators (see Definition 3.7.48) to identify compact self-adjoint operators.

Proposition 3.8.26. If $H$ is a separable Hilbert space and $A, T \in L_c(H)$ are self-adjoint, then there exists a unitary operator $U \in L(H)$ such that $U^* \circ T \circ U = A$ if and only if $\dim N(\lambda i_H - A) = \dim N(\lambda i_H - T)$ for all $\lambda \in \mathbb{C}$. In this case we say that the operators $A$ and $T$ are unitarily equivalent.
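Two small finite dimensional computations may help with Remark 3.8.23 and Proposition 3.8.24. The matrices $J$ and $A$ below are illustrative choices that are not taken from the text, and the inner product is taken to be linear in the first argument, as in the computations of Section 3.7.

```latex
% (i) Remark 3.8.23, assumed example: J(x_1, x_2) = (-x_2, x_1).
\begin{align*}
\text{on } \mathbb{R}^2: \quad (J(x), x) &= -x_2 x_1 + x_1 x_2 = 0 \ge 0,
  \qquad J^* = -J \neq J, \\
\text{on } \mathbb{C}^2,\ x = (1, i): \quad (J(x), x)
  &= (-i)\, \overline{1} + 1 \cdot \overline{i} = -2i \notin \mathbb{R} .
\end{align*}
% (ii) Proposition 3.8.24, assumed example: a positive matrix and its square root.
\[
A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \qquad
T = \frac{1}{2} \begin{pmatrix} \sqrt{3} + 1 & \sqrt{3} - 1 \\
                                \sqrt{3} - 1 & \sqrt{3} + 1 \end{pmatrix}, \qquad
T^2 = A .
\]
```

On $\mathbb{R}^2$ the rotation $J$ is positive in the sense of Definition 3.8.22 but not self-adjoint, while on $\mathbb{C}^2$ the same matrix is not positive. In (ii), $T$ is symmetric with eigenvalues $1$ and $\sqrt{3}$, so $T \geq 0$ and $T = A^{1/2}$.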

Problems

Problem 3.1. Let $X$ be a vector space and let $\rho : X \to \mathbb{R}_+$ be a function such that
(a) $\rho(x) = 0$ if and only if $x = 0$;
(b) $\rho(\lambda x) = |\lambda| \rho(x)$ for all $x \in X$ and for all $\lambda \in \mathbb{F}$.
Show that $\rho$ is a norm if and only if $B_1 = \{x \in X : \rho(x) \leq 1\}$ is convex.

Problem 3.2. Let $X$ be a vector space and let $\|\cdot\|$, $|\cdot|$ be two equivalent norms on $X$, that is, they generate the same topology. Show that $(X, \|\cdot\|)$, $(X, |\cdot|)$ are either both Banach spaces or both noncomplete.

Problem 3.3. Let $X$ be a topological vector space and let $\{C_k\}_{k=1}^n$ be a finite family of compact, convex subsets of $X$. Show that $\operatorname{conv}(\bigcup_{k=1}^n C_k)$ is compact.

Problem 3.4. Let $X$ be a normed space, $Y \subseteq X$ be a closed subspace, and let $V \subseteq X$ be a finite dimensional subspace. Show that $Y + V = \{y + v : y \in Y, v \in V\} \subseteq X$ is closed.


Problem 3.5. Let $X$ be a normed space and let $V \subseteq X$ be a finite dimensional subspace. Show that there exists $x \in X$ with $\|x\| = 1$ such that $1 = d(x, V)$.

Problem 3.6. Let $X$ be a normed space that is a Polish space for the norm topology. Show that $X$ is a Banach space.

Problem 3.7. Show that a normed space $X$ is complete, that is, $X$ is a Banach space, if and only if every absolutely convergent series in $X$ is convergent.

Problem 3.8. Let $K$ be a compact topological space and let $D \subseteq K$ be a closed set. Show that $C(D)$ is isomorphic to a quotient of $C(K)$.

Problem 3.9. Let $K, D$ be compact topological spaces and let $A : C(K) \to C(D)$ be a linear operator such that $f \geq 0$ implies $A(f) \geq 0$, that is, $A$ is positive. Show that $A$ is continuous and $\|A\|_L = \|A(1)\|_{C(D)}$, where $1 \in C(K)$ denotes the constant function equal to 1.

Problem 3.10. Let $X = C[0, 1]$, $u \in X$, and let $f : X \to \mathbb{R}$ be the linear function defined by $f(y) = \int_0^1 y(t) u(t)\, dt$ for all $y \in X$. Show that $f \in X^*$ and $\|f\|_* = \int_0^1 |u(t)|\, dt$.

Problem 3.11. Let $X$ be a normed space and $C \subseteq X$ be a nonempty set. Show that $\overline{\operatorname{conv}}\, C = \{x \in X : \langle x^*, x \rangle \leq \sigma_C(x^*) \text{ for all } x^* \in X^*\}$, where $\sigma_C(x^*) = \sup\{\langle x^*, c \rangle : c \in C\}$ and $\sigma_C : X^* \to \overline{\mathbb{R}} = \mathbb{R} \cup \{+\infty\}$ is called the support function of $C$.

Problem 3.12. Show that every normed space is isometrically isomorphic to a subspace of $C(K)$ for some compact topological space $K$.

Problem 3.13. Let $X, Y$ be Banach spaces and let $A \in L(X, Y)$ be surjective. Show that there exists $M > 0$ such that for every $y \in Y$ there is $x \in A^{-1}(y)$ satisfying $\|x\|_X \leq M \|y\|_Y$.

Problem 3.14. Let $X, Y$ be Banach spaces and let $A \in L(X, Y)$ be surjective. Show that $Y$ is isomorphic to $X/N(A)$.

Problem 3.15. Let $X$ be a Banach space and let $C \subseteq X$ be a weakly compact set. Show that $C$ is bounded.

Problem 3.16. Let $X$ be a normed space and $\{x_n^*\}_{n \geq 1} \subseteq X^*$. Suppose that there exists a sequence $\{\varepsilon_n\}_{n \geq 1} \subseteq (0, +\infty)$ with $\varepsilon_n \to 0$ such that for every $x \in X$ there exists $\eta_x > 0$ with $|\langle x_n^*, x \rangle| \leq \eta_x \varepsilon_n$ for all $n \in \mathbb{N}$. Show that $x_n^* \to 0$.

Problem 3.17. Show that separability and reflexivity are three space properties; see Definition 3.8.1.

Problem 3.18. Show that a normed space $X$ is reflexive if and only if each separable, closed subspace $V \subseteq X$ is reflexive.

Problem 3.19. Show that if $Y$ is an infinite dimensional subspace of $\ell^1$, then $Y$ is not reflexive.

Problem 3.20. Let $X$ be a separable Banach space. Show that there exists $x_n^* \in X^*$ with $\|x_n^*\|_* = 1$ for all $n \in \mathbb{N}$ such that $\{x_n^*\}_{n \geq 1}$ is separating on $X$.

Problem 3.21. Let $X, Y$ be Banach spaces with $X$ being reflexive and let $A \in L(X, Y)$ be surjective. Show that $Y$ is reflexive as well.

Problem 3.22. Let $X$ be a Banach space with a separable dual $X^*$. Show that $B(X^*) = B(X^*_{w^*})$. Recall that if $Z$ is a Hausdorff topological space, then $B(Z)$ denotes the Borel $\sigma$-algebra of $Z$.

Problem 3.23. Let $X$ be a normed space and let $C \subseteq X^*$ be a nonempty, $w^*$-closed set. Show that for any given $x^* \in X^*$ there exists $u_0^* \in C$ such that $\|x^* - u_0^*\|_* = d(x^*, C)$. A set that has this best approximation property for every element in the space is called proximinal.

Problem 3.24. Show that a Banach space $X$ is reflexive if and only if every closed convex set is proximinal; see Problem 3.23.

Problem 3.25. Let $X, Y$ be two nontrivial normed spaces and assume that $L(X, Y)$ equipped with the operator norm is a Banach space. Show that $Y$ is a Banach space.

Problem 3.26. Let $X$ be a reflexive Banach space and let $Y$ be another Banach space that is isomorphic to $X$. Show that $Y$ is reflexive as well.

Problem 3.27. Let $X, Y$ be Banach spaces with $X$ being nonreflexive and $Y$ being reflexive. Suppose that $A \in L(X, Y)$ is injective. Show that $R(A) \subseteq Y$ cannot be closed.

Problem 3.28. Let $X$ be a Banach space and let $C \subseteq X^*$ be a $w^*$-compact set. Show that $\overline{\operatorname{conv}}^{w^*} C$ is $w^*$-compact.

Problem 3.29. Let $X$ be a separable Banach space. Show that $X^*$ is $w^*$-separable.

Problem 3.30. Let $H$ and $V$ be real Hilbert spaces and let $k : H \times V \to \mathbb{R}$ be a bilinear form that is bounded, that is, there exists $c > 0$ such that $|k(u, v)| \leq c \|u\|_H \|v\|_V$ for all $u \in H$ and for all $v \in V$. Show that there exists a unique $A \in L(H, V)$ such that $k(u, v) = (A(u), v)_V$ for all $u \in H$ and for all $v \in V$.

Problem 3.31. Let $H$ be a Hilbert space and let $\{e_n\}_{n \geq 1} \subseteq H$ be an orthonormal set. Suppose that $u = \sum_{n \geq 1} a_n e_n$. Show that $a_n = (u, e_n)$ for all $n \in \mathbb{N}$.

Problem 3.32. Let $H, V$ be infinite dimensional separable Hilbert spaces, let $\{e_n\}_{n \geq 1} \subseteq H$ be an orthonormal basis for $H$, and let $\{\xi_n\}_{n \geq 1} \subseteq V$ be an orthonormal basis for $V$. Suppose that $A \in L(H, V)$ and $A(e_n) = \sum_{m \geq 1} \lambda_{nm} \xi_m$ for all $n \in \mathbb{N}$. Show that $\sum_{m \geq 1} |\lambda_{nm}|^2 \leq \|A\|_L^2$ for all $n \in \mathbb{N}$ and $\sum_{n \geq 1} |\lambda_{nm}|^2 \leq \|A\|_L^2$ for all $m \in \mathbb{N}$.

Problem 3.33. Let $H$ be a Hilbert space and let $A \in L(H)$ be a self-adjoint positive operator. Show that the following statements are equivalent:


(a) $R(A) \subseteq H$ is dense.
(b) $N(A) = \{0\}$.
(c) $(A(x), x) > 0$ for all $x \neq 0$.

Problem 3.34. Let $H$ be a Hilbert space and let $A, T\colon H \to H$ be two linear operators such that $(A(x), u) = (x, T(u))$ for all $x, u \in H$. Show that $A \in L(H)$ and $T = A^*$.

Problem 3.35. Let $H$ be a Hilbert space and let $\{A_n\}_{n\ge 1} \subseteq L(H)$ be such that $\sup_{n\ge 1} |(A_n(x), u)| < \infty$ for all $x, u \in H$. Show that $\sup_{n\ge 1} \|A_n\|_L < \infty$.

Problem 3.36. Let $H$ be a Hilbert space and let $\{A_n\}_{n\ge 1} \subseteq L(H)$ be such that $\lim_{n\to\infty} |(A_n(x), u)| = 0$ for all $x, u \in H$. Can we say that $\|A_n\|_L \to 0$? Justify your answer.

Problem 3.37. Let $K, D$ be compact spaces, let $g \in C(K,D)$, and let $A\colon C(K) \to C(D)$ be the operator defined by $A(f)(t) = f(g(s))$ for all $s \in K$ and for all $t \in D$. Show that
(a) $A \in L(C(K), C(D))$ and find $\|A\|_L$.
(b) $R(A) = C(D)$ if and only if $g$ is injective.
(c) $A$ is an isometry if and only if $g$ is surjective.

Problem 3.38. Let $X$ be a Banach space, let $V$ be a normed space, and let $A \in L(X,V)$. Show that $A^{-1} \in L(V,X)$ if and only if $R(A) \subseteq V$ is dense and $\|A(x)\|_V \ge c\|x\|_X$ for all $x \in X$ and for some $c > 0$.

Problem 3.39. Let $H$ be a Hilbert space and let $A \in L(H)$ be normal. Show that $\lim_{n\to\infty} \|A^n\|_L^{1/n} = \|A\|_L$.

Problem 3.40. Let $H$ be a Hilbert space and let $P \in L(H)$ be a projection, that is, $P^2 = P$. Show that the following properties are equivalent:
(a) $P$ is an orthogonal projection.
(b) $P$ is normal.
(c) $(P(x), x) = \|P(x)\|^2$ for all $x \in H$.

Problem 3.41. Let $X, Y$ be Banach spaces with $X$ being reflexive, $A \in L_c(X,Y)$, $\|\cdot\|_X$ being the norm of $X$, and $|\cdot|_X$ being another norm on $X$, which generates a weaker topology on $X$. Show that for every $\varepsilon > 0$ there exists $c_\varepsilon > 0$ such that
\[ \|A(x)\|_Y \le \varepsilon\|x\|_X + c_\varepsilon |x|_X \quad\text{for all } x \in X. \]

Problem 3.42. Let $X$ be a normed space and let $P \in L_c(X)$ be a projection, that is, $P^2 = P$. Show that $P \in L_f(X)$.

Problem 3.43. Let $H$ be a Hilbert space and let $A \in L(H)$ be self-adjoint. Assume that $A \ge \vartheta\, i_H$ for some $\vartheta > 0$; see Definition 3.8.22. Show that $A$ is invertible.

Problem 3.44. Let $H$ be a Hilbert space and let $A \in L(H)$ be self-adjoint. Show that the residual spectrum $R\sigma(A)$ of $A$ (see Definition 3.7.29) is empty.

Problem 3.45. Let $X$ be a Banach space and let $A \in L(X)$ and $\lambda \in \mathbb{C}$. Suppose that there exists a sequence $\{x_n\}_{n\ge 1} \subseteq X$ with $\|x_n\| = 1$ for all $n \in \mathbb{N}$ such that $A(x_n) - \lambda x_n \to 0$ in $X$. Show that $\lambda \in \sigma(A)$.

Problem 3.46. Let $H$ be a Hilbert space and let $P \in L(H)$ be an orthogonal projection. Show that $0 \le P \le i_H$; see Definition 3.8.22.

Problem 3.47. Let $H$ be a Hilbert space and let $A \in L(H)$ be such that $(A(x), x) \ge c\|x\|^2$ for all $x \in H$ and for some $c > 0$. Show that $A$ is an isomorphism.

Problem 3.48. Let $X$ be an infinite dimensional Banach space and let $A \in L_c(X)$. Show that there exists $h \in X$ such that there is no $x \in X$ for which we have $A(x) = h$.

Problem 3.49. Let $X$ be a Banach space and let $A\colon D(A) \subseteq X \to X$ be an unbounded linear operator. Suppose there exists $\lambda \in \mathbb{C}$ such that $(A - \lambda I)^{-1} \in L(X)$. Show that $A$ is closed.

Problem 3.50. Let $X, Y$ be Banach spaces and let $A\colon D(A) \subseteq X \to Y$ be an unbounded linear operator such that $\|A(x)\|_Y \ge c\|x\|_X$ for all $x \in D(A)$ and for some $c > 0$. Show that $A$ is closed.

Problem 3.51. Let $H$ be a Hilbert space, let $\{u_n\}_{n\ge 1} \subseteq H$ be an orthonormal set and let $A \in L_c(H)$. Show that $A(u_n) \to 0$ in $H$.

Problem 3.52. Let $X, Y$ be Banach spaces, let $A\colon X \to Y$ be a linear operator, and suppose that for every $y^* \in Y^*$ one has $y^* \circ A \in X^*$. Show that $A \in L(X,Y)$.

Problem 3.53. Let $X$ be a Banach space and let $P \in L(X)$ be a projection, that is, $P^2 = P$. Show that $P^*$ is a projection in $X^*$.

Problem 3.54. Let $H$ be a Hilbert space and let $A \in L_c(H)$. Show that there exists $x \in H$ with $\|x\| \le 1$ such that $\|A(x)\| = \|A\|_L$.

Problem 3.55. Let $X, Y$ be Banach spaces with $Y \neq \{0\}$. Show that $X$ is reflexive if and only if for every $A \in L_c(X,Y)$ there exists $x \in X$ with $\|x\|_X \le 1$ such that $\|A(x)\|_Y = \|A\|_L$.

Problem 3.56. Let $X, Y$ be Banach spaces and let $A \in L_c(X,Y)$. Show that $R(A) \subseteq Y$ is separable.

Problem 3.57. Let $X, Y$ be Banach spaces and let $A \in L(X,Y)$, which satisfies $\|A(x)\|_Y \ge c\|x\|_X$ for all $x \in X$ and for some $c > 0$. Is it possible for $A$ to be compact? Justify your answer.

Problem 3.58. Let $X$ be an infinite dimensional Banach space and let $A \in L_c(X)$. Show that $0 \in \overline{A(\partial B_1)}$.

Problem 3.59. Let $X$ be a Banach space that is $w$-separable. Show that $X$ is separable.


Problem 3.60. Let $X$ be an infinite dimensional Banach space and let $K \subseteq X$ be a nonempty, compact set. Show that $\operatorname{int} K = \emptyset$.

Problem 3.61. Let $X$ be a Banach space and assume that there exists an uncountable family $\{U_i\}_{i\in I}$ such that
(a) for each $i \in I$, $U_i \subseteq X$ is nonempty and open;
(b) $U_i \cap U_j = \emptyset$ if $i \neq j$.
Show that $X$ is nonseparable.

4 Banach Spaces of Functions and Measures

Now that we have a reasonable background on measure theory and functional analysis, we can look at concrete spaces of functions and measures that are common in many different fields of analysis. We study them using the abstract tools developed in Chapter 2 (measure theory) and in Chapter 3 (functional analysis). We start with the $L^p$-spaces, which we already encountered in Section 2.3. Now our emphasis is on the duality theory for such spaces. Then we consider Banach space-valued functions. Vector valued integration theories were first developed in the 1930s in an attempt to better understand differentiation theorems for Banach space-valued functions. Functions of bounded variation are associated with the early days of real analysis. Recently, they have found new applications in the context of geometric measure theory. It is a small natural step to pass from functions of bounded variation to absolutely continuous and Lipschitz functions. Associated with Lipschitz functions are some interesting and useful extension theorems. Subsequently we pass to Sobolev spaces. The theory of Sobolev spaces is one of the most useful tools in modern mathematics with many remarkable applications. The Banach spaces of measures and their modes of convergence are useful in probability theory and stochastic analysis. Finally, capacities and Young measures provide useful applications of the previous material.

https://doi.org/10.1515/9783110532982-004

4.1 $L^p$-Spaces

Let $(X,\Sigma,\mu)$ be a measure space and suppose that $p, p' \in [1,+\infty]$ are conjugate exponents, that is, $1/p + 1/p' = 1$; see Definition 2.3.10. Given $h \in L^{p'}(X)$, Hölder's inequality (see Theorem 2.3.12) implies that the linear functional $f \mapsto \xi_h(f) = \int_X fh\,d\mu$ on $L^p(X)$ is bounded, and hence continuous, and so $\xi_h \in L^p(X)^*$. Moreover, $\|\xi_h\|_* \le \|h\|_{p'}$. Next we examine this functional more closely. It will lead us to a very convenient description of the dual of $L^p(X)$.

Proposition 4.1.1. If $p$ and $p'$ are conjugate exponents and, in the case $p' = \infty$, $\mu$ is semifinite, then
\[ \|h\|_{p'} = \|\xi_h\|_* = \sup\Big[\int_X fh\,d\mu : f \in L^p(X),\ \|f\|_p = 1\Big]. \]

Proof. As we already mentioned, Hölder's inequality gives $\|\xi_h\|_* \le \|h\|_{p'}$. If $h(x) = 0$ $\mu$-a.e., then clearly we obtain equality. So, suppose now that $h \neq 0$ and $1 < p' < \infty$. Let
\[ \hat f(x) = \frac{|h(x)|^{p'-1}\operatorname{sgn} h(x)}{\|h\|_{p'}^{p'-1}}, \]

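where
\[ \operatorname{sgn} h(x) = \begin{cases} -1 & \text{if } h(x) < 0,\\ 0 & \text{if } h(x) = 0,\\ 1 & \text{if } h(x) > 0. \end{cases} \]
Since $p' - 1 = p'/p$, it follows that
\[ \|\hat f\|_p^p = \frac{1}{\|h\|_{p'}^{(p'-1)p}} \int_X |h(x)|^{(p'-1)p}\,d\mu = \frac{1}{\|h\|_{p'}^{p'}}\,\|h\|_{p'}^{p'} = 1. \]
Then by Corollary 3.1.48, this implies
\[ \|\xi_h\|_* \ge \int_X \hat f h\,d\mu = \frac{1}{\|h\|_{p'}^{p'-1}} \int_X |h|^{p'}\,d\mu = \|h\|_{p'}. \]
This gives $\|\xi_h\|_* = \|h\|_{p'}$.

If $p' = 1$, then we take $f = \operatorname{sgn} h$, for which $\|f\|_\infty = 1$. Then $\int_X fh\,d\mu = \int_X |h|\,d\mu = \|h\|_1$ and so we have again $\|\xi_h\|_* = \|h\|_1$.

Finally, if $p' = +\infty$, then we need to assume that $\mu$ is semifinite; see Definition 2.1.30(a). Clearly every $\sigma$-finite measure is semifinite. Let $\varepsilon > 0$ and consider the set $A_\varepsilon = \{x \in X : |h(x)| \ge \|h\|_\infty - \varepsilon\}$. Then $\mu(A_\varepsilon) > 0$ and since $\mu$ is semifinite, there exists $B \in \Sigma$ with $B \subseteq A_\varepsilon$ such that $0 < \mu(B) < \infty$. Let $f = \frac{1}{\mu(B)}\chi_B \operatorname{sgn} h$. Then $\|f\|_1 = 1$ and so
\[ \|\xi_h\|_* \ge \int_X fh\,d\mu = \frac{1}{\mu(B)} \int_B |h|\,d\mu \ge \|h\|_\infty - \varepsilon. \]
Since $\varepsilon > 0$ is arbitrary, we let $\varepsilon \to 0^+$ and obtain $\|\xi_h\|_* \ge \|h\|_\infty$. Therefore, $\|\xi_h\|_* = \|h\|_\infty$.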
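The case $1 < p' < \infty$ of the proof above can be checked numerically on a finite weighted measure space. The following Python sketch is only an illustration: the weights, the vector `h`, the exponent `p = 3` and all function names are assumptions chosen for the example and do not come from the text.

```python
import math

def lp_norm(values, weights, p):
    """L^p norm (finite p) on a finite measure space with point masses `weights`."""
    return sum(w * abs(v) ** p for v, w in zip(values, weights)) ** (1.0 / p)

def norming_function(h, weights, p_conj):
    """f_hat = |h|^(p'-1) sgn(h) / ||h||_{p'}^(p'-1), as in the proof (1 < p' < infinity)."""
    scale = lp_norm(h, weights, p_conj) ** (p_conj - 1.0)
    return [math.copysign(abs(v) ** (p_conj - 1.0), v) / scale if v != 0 else 0.0
            for v in h]

def pairing(f, h, weights):
    """The integral of f*h with respect to the weighted counting measure."""
    return sum(w * a * b for a, b, w in zip(f, h, weights))

p = 3.0
p_conj = p / (p - 1.0)              # conjugate exponent p' = 3/2
weights = [0.5, 1.0, 2.0, 0.25]     # hypothetical masses of four atoms
h = [1.5, -2.0, 0.0, 4.0]           # hypothetical element of L^{p'}

f_hat = norming_function(h, weights, p_conj)
norm_f_hat = lp_norm(f_hat, weights, p)   # should equal 1
value = pairing(f_hat, h, weights)        # should equal ||h||_{p'}
norm_h = lp_norm(h, weights, p_conj)
print(norm_f_hat, value, norm_h)
```

On this example the unit vector `f_hat` attains the supremum, in agreement with the identity $\|\xi_h\|_* = \|h\|_{p'}$.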

Next we will show the converse. Namely, if $f \mapsto \int_X fh\,d\mu$ is a bounded linear functional, then $h \in L^{p'}(X)$ in all cases of interest.

Proposition 4.1.2. If $p$ and $p'$ are conjugate exponents, $h\colon X \to \mathbb{R}$ is a $\Sigma$-measurable function such that $fh \in L^1(X)$ for all $f$ in the space $L$ of $\Sigma$-simple functions which vanish outside a set of finite measure,
\[ N_{p'}(h) = \sup\Big[\int_X fh\,d\mu : f \in L,\ \|f\|_p = 1\Big] < \infty \tag{4.1.1} \]
and one of the following holds:
(a) $D_h = \{x \in X : h(x) \neq 0\}$ is $\sigma$-finite;
(b) $\mu$ is semifinite,
then $h \in L^{p'}(X)$ and $\|h\|_{p'} = N_{p'}(h)$.

Proof. If $f$ is bounded, $\Sigma$-measurable and vanishes outside a set $A \in \Sigma$ of finite $\mu$-measure, then according to Corollary 2.2.19, there exists a sequence $\{s_n\}_{n\ge 1}$ of $\Sigma$-simple functions such that
\[ |s_n| \le |f| \quad\text{and}\quad s_n \to f \ \text{uniformly on } X. \]
With the Dominated Convergence Theorem (see Theorem 2.3.8), one has
\[ \int_X fh\,d\mu = \lim_{n\to\infty} \int_X s_n h\,d\mu. \]
So, if $\|f\|_p = 1$, then $\int_X fh\,d\mu \le N_{p'}(h)$.

First assume that $p' < \infty$. If $\mu$ is semifinite, then, because of (4.1.1), it is easy to see that for every $\varepsilon > 0$ the set $\{x \in X : |h(x)| > \varepsilon\}$ has finite $\mu$-measure and so $D_h$ is $\sigma$-finite. Therefore it is enough to consider the case where (a) holds. Let $\{A_n\}_{n\ge 1} \subseteq \Sigma$ be an increasing sequence of sets of finite $\mu$-measure such that $D_h = \bigcup_{n\ge 1} A_n$. As before, let $\{s_n\}_{n\ge 1}$ be a sequence of $\Sigma$-simple functions such that $s_n(x) \to h(x)$ $\mu$-a.e. and $|s_n| \le |h|$. Let $\hat s_n = s_n \chi_{A_n}$. Then $\hat s_n(x) \to h(x)$ $\mu$-a.e. and $|\hat s_n| \le |h|$ for all $n \in \mathbb{N}$. Let
\[ f_n = \frac{|\hat s_n|^{p'-1}\operatorname{sgn} h}{\|\hat s_n\|_{p'}^{p'-1}}. \]
From the proof of Proposition 4.1.1 one gets $\|f_n\|_p = 1$ and from Fatou's Lemma (see Theorem 2.3.6), it follows that
\[ \|h\|_{p'} \le \liminf_{n\to\infty} \|\hat s_n\|_{p'} = \liminf_{n\to\infty} \int_X |f_n \hat s_n|\,d\mu \le \liminf_{n\to\infty} \int_X |f_n h|\,d\mu = \liminf_{n\to\infty} \int_X f_n h\,d\mu \le N_{p'}(h); \tag{4.1.2} \]

see the first part of the proof. On the other hand, from Hölder's inequality (see Theorem 2.3.12), we infer that
\[ N_{p'}(h) \le \|h\|_{p'}. \tag{4.1.3} \]
From (4.1.2) and (4.1.3) we conclude that $N_{p'}(h) = \|h\|_{p'}$.

Next assume that $p' = +\infty$. For $\varepsilon > 0$, let $A_\varepsilon = \{x \in X : |h(x)| \ge N_\infty(h) + \varepsilon\}$. If $\mu(A_\varepsilon) > 0$, then there exists $E \in \Sigma$ with $E \subseteq A_\varepsilon$ such that $\mu(E) \in (0,+\infty)$. We set
\[ f = \frac{1}{\mu(E)}\chi_E \operatorname{sgn} h. \]
We get $\|f\|_1 = 1$ and
\[ \int_X fh\,d\mu \ge \frac{1}{\mu(E)} \int_E |h|\,d\mu \ge N_\infty(h) + \varepsilon. \]
But since $f$ is bounded, by the first part of the proof, this cannot happen. Therefore, $\|h\|_\infty \le N_\infty(h)$. The opposite inequality is clear from (4.1.1). Hence, $\|h\|_\infty = N_\infty(h)$.

Now we are ready to describe the dual of $L^p(X)$ with $1 < p < \infty$. The result is known as the "Riesz Representation Theorem."

Theorem 4.1.3 (Riesz Representation Theorem). If $(X,\Sigma,\mu)$ is a measure space and $1 < p < \infty$, then $L^p(X)^*$ is isometrically isomorphic to $L^{p'}(X)$ with $1/p + 1/p' = 1$.

Proof. First assume that $\mu$ is finite. Then all $\Sigma$-simple functions belong to $L^p(X)$. Suppose $\xi \in L^p(X)^*$ and $A \in \Sigma$. We set $\vartheta(A) = \xi(\chi_A)$. If $\{A_n\}_{n\ge 1} \subseteq \Sigma$ are pairwise disjoint and $A = \bigcup_{n\ge 1} A_n$, then $\chi_A = \sum_{n\ge 1} \chi_{A_n}$. The series converges in $L^p(X)$ since
\[ \Big\|\chi_A - \sum_{n=1}^{m} \chi_{A_n}\Big\|_p = \Big\|\sum_{n\ge m+1} \chi_{A_n}\Big\|_p = \mu\Big(\bigcup_{n\ge m+1} A_n\Big)^{1/p} \to 0 \quad\text{as } m \to \infty, \]
where we recall that $p < \infty$. Therefore, from the linearity and continuity of $\xi$, we obtain
\[ \vartheta(A) = \sum_{n\ge 1} \xi(\chi_{A_n}) = \sum_{n\ge 1} \vartheta(A_n). \]
This shows that $\vartheta$ is a signed measure. If $\mu(A) = 0$, then $\chi_A = 0$ in $L^p(X)$ and so $\vartheta(A) = 0$. Hence $\vartheta \ll \mu$; see Remark 2.4.22. Invoking the Radon–Nikodym Theorem (see Theorem 2.4.29), there exists $h \in L^1(X)$ such that $\xi(\chi_A) = \vartheta(A) = \int_A h\,d\mu$ for all $A \in \Sigma$. This implies
\[ \xi(s) = \int_X sh\,d\mu \quad\text{for every simple function } s. \tag{4.1.4} \]
From (4.1.4) it follows
\[ \Big|\int_X sh\,d\mu\Big| = |\xi(s)| \le \|\xi\|_* \|s\|_p < \infty. \]

Invoking Proposition 4.1.2 we infer that $h \in L^{p'}(X)$. Then from (4.1.4) and the density of simple functions in $L^p(X)$ (see Proposition 2.3.22), we conclude that $\xi(f) = \int_X fh\,d\mu$ for all $f \in L^p(X)$.

Next suppose that $\mu$ is $\sigma$-finite. Let $\{A_n\}_{n\ge 1} \subseteq \Sigma$ be an increasing sequence such that $0 < \mu(A_n) < \infty$ for all $n \in \mathbb{N}$ and $X = \bigcup_{n\ge 1} A_n$. From the first part of the proof we get that
\[ L^p(A_n)^* = L^{p'}(A_n) \quad\text{for all } n \in \mathbb{N} \tag{4.1.5} \]
and that $L^p(A_n)$ (resp. $L^{p'}(A_n)$) is a subspace of $L^p(X)$ (resp. $L^{p'}(X)$), namely those functions which vanish outside of $A_n$. If $\xi \in L^p(X)^*$, then from (4.1.5) we obtain the existence of $h_n \in L^{p'}(A_n)$ such that
\[ \xi(f) = \int_{A_n} f h_n\,d\mu \quad\text{for all } f \in L^p(A_n) \text{ with } n \in \mathbb{N}. \]
The function $h_n$ is unique up to a $\mu$-null set and so $h_n(x) = h_m(x)$ $\mu$-a.e. on $A_n$ for $n < m$. Therefore we can define $h\colon X \to \mathbb{R}$ by setting $h = h_n$ on $A_n$ for all $n \in \mathbb{N}$. According to the Monotone Convergence Theorem (see Theorem 2.3.3), one has
\[ \|h\|_{p'} = \lim_{n\to\infty} \|h_n\|_{p'} \le \|\xi\|_* < \infty. \]
Hence, $h \in L^{p'}(X)$. For $f \in L^p(X)$, according to the Dominated Convergence Theorem (see Theorem 2.3.8), we obtain $f\chi_{A_n} \to f$ in $L^p(X)$, which implies
\[ \xi(f) = \lim_{n\to\infty} \xi(f\chi_{A_n}) = \lim_{n\to\infty} \int_{A_n} fh\,d\mu = \int_X fh\,d\mu. \]

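Finally we consider the general case of an arbitrary measure space. If $A \in \Sigma$ is $\sigma$-finite, then from the second part of the proof there exists $h_A \in L^{p'}(A)$, unique up to a $\mu$-null set, such that $\xi(f) = \int_X f h_A\,d\mu$ for all $f \in L^p(A)$ and $\|h_A\|_{p'} \le \|\xi\|_*$. If $E \in \Sigma$ with $A \subseteq E$ is $\sigma$-finite, then $h_E = h_A$ $\mu$-a.e. on $A$ and so $\|h_A\|_{p'} \le \|h_E\|_{p'}$. Let
\[ \eta = \sup\big[\|h_A\|_{p'} : A \in \Sigma \text{ is } \sigma\text{-finite}\big] \le \|\xi\|_*. \]
We choose a sequence $\{A_n\}_{n\ge 1} \subseteq \Sigma$ with each $A_n$ $\sigma$-finite such that $\|h_{A_n}\|_{p'} \to \eta$. Let $E = \bigcup_{n\ge 1} A_n$. Then $E \in \Sigma$ is $\sigma$-finite and $\|h_E\|_{p'} \ge \|h_{A_n}\|_{p'}$ for all $n \in \mathbb{N}$. Therefore $\|h_E\|_{p'} = \eta$. If $D \in \Sigma$ is a $\sigma$-finite set with $E \subseteq D$, then
\[ \int_X |h_E|^{p'}\,d\mu + \int_X |h_{D\setminus E}|^{p'}\,d\mu = \int_X |h_D|^{p'}\,d\mu \le \eta^{p'} = \int_X |h_E|^{p'}\,d\mu. \]
This gives $\int_X |h_{D\setminus E}|^{p'}\,d\mu = 0$, that is, $h_{D\setminus E} = 0$ $\mu$-a.e. since $p' < \infty$. Hence $h_D = h_E$ $\mu$-a.e. But if $f \in L^p(X)$, then $D = E \cup \{x \in X : f(x) \neq 0\} \in \Sigma$ is $\sigma$-finite; see Problem 2.26. Hence
\[ \xi(f) = \int_X f h_D\,d\mu = \int_X f h_E\,d\mu. \]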

Therefore we can finally take $h = h_E$.

Corollary 4.1.4. If $(X,\Sigma,\mu)$ is a measure space and $1 < p < \infty$, then $L^p(X)$ is a reflexive Banach space.

Next we consider the dual of $L^1(X)$. To this end we need to restrict ourselves to $\sigma$-finite measure spaces. The result is again known as the "Riesz Representation Theorem for $L^1$."

Theorem 4.1.5 (Riesz Representation Theorem for $L^1$). If $(X,\Sigma,\mu)$ is a $\sigma$-finite measure space, then $L^1(X)^*$ is isometrically isomorphic to $L^\infty(X)$.

Proof. First suppose that $\mu$ is finite. Let $\xi \in L^1(X)^*$. Reasoning as in the first part of the proof of Theorem 4.1.3 there exists a unique $h \in L^1(X)$ such that
\[ \xi(f) = \int_X fh\,d\mu \quad\text{for all } f \in L^1(X). \]
Invoking Proposition 4.1.2, we infer that $h \in L^\infty(X)$.

Now assume that $\mu$ is $\sigma$-finite. Then there exists an increasing sequence $\{A_n\}_{n\ge 1} \subseteq \Sigma$ such that $X = \bigcup_{n\ge 1} A_n$ and $0 < \mu(A_n) < \infty$ for all $n \in \mathbb{N}$. From the first part of the proof there exists a unique $h_n \in L^\infty(A_n)$ for each $n \in \mathbb{N}$ such that $\xi(f) = \int_X f h_n\,d\mu$ for all $f \in L^1(A_n)$. Evidently, $h_n = h_m$ $\mu$-a.e. on $A_n$ for $n < m$. So if $h\colon X \to \mathbb{R}$ is defined by $h = h_n$ on $A_n$ for all $n \in \mathbb{N}$, then $\|h\|_\infty \le \|\xi\|_*$. Hence, $h \in L^\infty(X)$ and $\xi(f) = \int_X fh\,d\mu$ for all $f \in L^1(X)$ as well as $\|\xi\|_* = \|h\|_\infty$. Propositions 4.1.1 and 4.1.2 imply that $L^1(X)^*$ is isometrically isomorphic to $L^\infty(X)$.

Proposition 4.1.6. If $(X,\Sigma,\mu)$ is a measure space and $1 < p < \infty$, then $L^p(X)$ is uniformly convex.

Proof. Let $f, h \in L^p(X)$ with $\|f\|_p = \|h\|_p \le 1$, $\varepsilon > 0$, and set $A_\varepsilon = \{x \in X : |(f-h)(x)| \le \varepsilon |(f+h)(x)|\}$. We obtain
\[ \int_{A_\varepsilon} \Big|\tfrac12 (f-h)\Big|^p d\mu \le \varepsilon^p \int_X \Big|\tfrac12 (f+h)\Big|^p d\mu \le \varepsilon^p. \tag{4.1.6} \]

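The function $t \mapsto |t|^p$ is strictly convex. Hence $s \mapsto \tfrac12\big[|s+1|^p + |s-1|^p\big] - |s|^p$ is continuous and positive on $\mathbb{R}$. So, there exists $r = r(\varepsilon,p) > 0$ such that
\[ \tfrac12\big[|s+1|^p + |s-1|^p\big] - |s|^p \ge r \quad\text{for all } s \in \Big[-\frac1\varepsilon, \frac1\varepsilon\Big]. \tag{4.1.7} \]
Let $s = (f(x)+h(x))/(f(x)-h(x))$ for $x \in X \setminus A_\varepsilon$. Then from (4.1.7) it follows that
\[ \tfrac12\big[|f(x)|^p + |h(x)|^p\big] \ge r\,\Big|\tfrac12 (f-h)(x)\Big|^p + \Big|\tfrac12 (f+h)(x)\Big|^p \tag{4.1.8} \]
for all $x \in X \setminus A_\varepsilon$. Moreover, recall that
\[ \tfrac12\big[|f(x)|^p + |h(x)|^p\big] \ge \Big|\tfrac12 (f+h)(x)\Big|^p \quad\text{for all } x \in X. \tag{4.1.9} \]
We integrate (4.1.8) over $X \setminus A_\varepsilon$, (4.1.9) over $A_\varepsilon$ and then add both inequalities to obtain
\[ 1 \ge \int_X \Big|\tfrac12 (f+h)(x)\Big|^p d\mu + \int_{X\setminus A_\varepsilon} r\,\Big|\tfrac12 (f-h)\Big|^p d\mu. \tag{4.1.10} \]
If we choose $\delta = r\varepsilon^p$, then from (4.1.10) we see that
\[ \int_{X\setminus A_\varepsilon} \Big|\tfrac12 (f-h)\Big|^p d\mu \le \varepsilon^p \quad\text{if } \Big\|\tfrac12 (f+h)\Big\|_p^p \ge 1 - \delta. \tag{4.1.11} \]
From (4.1.6) and (4.1.11) it follows that
\[ \Big\|\tfrac12 (f+h)\Big\|_p^p \ge 1 - \delta \quad\text{implies}\quad \Big\|\tfrac12 (f-h)\Big\|_p^p \le 2\varepsilon^p. \tag{4.1.12} \]
From (4.1.12) and Definition 3.4.21(b) we conclude that $L^p(X)$ with $1 < p < \infty$ is uniformly convex.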


Remark 4.1.7. From the Milman–Pettis Theorem (see Theorem 3.4.28), it follows that $L^p(X)$ with $1 < p < \infty$ is reflexive. So, we have reached Corollary 4.1.4 following a different route.

Let $(X,\Sigma,\mu)$ be a measure space and let $1 \le p < \infty$ as well as $1 < p' \le \infty$ be conjugate exponents, that is, $1/p + 1/p' = 1$. If $p = \infty$, then we set $p' = 1$. When $p = 1$ or $p = \infty$ we assume in addition that $\mu$ is finite. On account of the Riesz Representation Theorems (see Theorems 4.1.3 and 4.1.5), we have the following modes of convergence.

Definition 4.1.8. Let $\{f_n, f\}_{n\ge 1} \subseteq L^p(X)$.
(a) If $1 \le p < \infty$, then we say that the $f_n$'s converge weakly to $f$, denoted by $f_n \xrightarrow{w} f$, if
\[ \int_X f_n h\,d\mu \to \int_X fh\,d\mu \]
is satisfied for all $h \in L^{p'}(X)$.
(b) If $p = +\infty$, then we say that the $f_n$'s converge weakly* to $f$, denoted by $f_n \xrightarrow{w^*} f$, if
\[ \int_X f_n h\,d\mu \to \int_X fh\,d\mu \]
is satisfied for all $h \in L^1(X)$.

Applying Vitali's Convergence Theorem (see Theorem 2.3.44), we have the following result.

Proposition 4.1.9. If $\{f_n\}_{n\ge 1} \subseteq L^p(X)$ with $1 < p < \infty$ is bounded and $f_n(x) \to f(x)$ $\mu$-a.e. or $f_n \xrightarrow{\mu} f$ as $n \to +\infty$, then $f \in L^p(X)$ and $f_n \xrightarrow{w} f$ in $L^p(X)$.

From Propositions 3.3.13 and 3.3.31 we conclude the following.

Proposition 4.1.10. If $\{f_n\}_{n\ge 1} \subseteq L^p(X)$ with $1 \le p \le \infty$ and $f_n \xrightarrow{w} f$ in $L^p(X)$ for $1 \le p < \infty$ and $f_n \xrightarrow{w^*} f$ in $L^\infty(X)$, then
\[ \|f\|_p \le \liminf_{n\to\infty} \|f_n\|_p \le \sup_{n\ge 1} \|f_n\|_p < \infty. \]

Recalling that $L^p(X)$ is uniformly convex for $1 < p < \infty$ and since uniformly convex Banach spaces exhibit the Kadec–Klee Property (see Proposition 3.4.32), we can state the following result.

Proposition 4.1.11. If $\{f_n\}_{n\ge 1} \subseteq L^p(X)$, $1 < p < \infty$, $f_n \xrightarrow{w} f$ in $L^p(X)$ and $\|f_n\|_p \to \|f\|_p$, then $\|f_n - f\|_p \to 0$.

The reflexivity of $L^p(X)$ for $1 < p < \infty$ implies that bounded sets in $L^p(X)$ are relatively weakly compact; see Theorem 3.4.5. Then the Eberlein–Smulian Theorem (see Theorem 3.4.14) gives the following result.

Proposition 4.1.12. If $\{f_n\}_{n\ge 1} \subseteq L^p(X)$ with $1 < p < \infty$ and $\|f_n\|_p \le M$ for some $M > 0$ and for all $n \in \mathbb{N}$, then there exists a subsequence $\{f_{n_k}\}_{k\ge 1}$ of $\{f_n\}_{n\ge 1}$ such that $f_{n_k} \xrightarrow{w} f$ in $L^p(X)$ with $f \in L^p(X)$.

For the case $p = \infty$, instead of the Eberlein–Smulian Theorem, which is not valid for the $w^*$-topology (see Remark 3.4.15), we use Theorem 3.4.12(a), which imposes additional restrictions on the measure space. So, we assume that the metric space $(\Sigma(\mu), d_\mu)$ (see Definition 2.3.23) is separable. Then, according to Proposition 2.3.24, this is equivalent to saying that $L^1(X)$ is separable. Therefore, applying Theorem 3.4.12(a), we obtain the following.

Proposition 4.1.13. If $\{f_n\}_{n\ge 1} \subseteq L^\infty(X)$, $\|f_n\|_\infty \le M$ for some $M > 0$ and for all $n \in \mathbb{N}$, and $L^1(X)$ is separable, then there exists a subsequence $\{f_{n_k}\}_{k\ge 1}$ of $\{f_n\}_{n\ge 1}$ such that $f_{n_k} \xrightarrow{w^*} f$ in $L^\infty(X)$ with $f \in L^\infty(X)$.

Remark 4.1.14. If $X \subseteq \mathbb{R}^N$ is Borel and $\mu$ is a Radon measure on $B(X)$, then $L^1(X)$ is separable.

Recalling that the simple functions in $L^p(X)$ are dense in $L^p(X)$ for any $p \in [1,\infty]$ (see Proposition 2.3.22), we directly get the following proposition.

Proposition 4.1.15. If $\{f_n, f\}_{n\ge 1} \subseteq L^p(X)$ with $1 \le p \le \infty$, then $f_n \xrightarrow{w} f$ in $L^p(X)$ for $1 \le p < \infty$ and $f_n \xrightarrow{w^*} f$ in $L^\infty(X)$ for $p = +\infty$ if and only if
(a) $\sup_{n\ge 1} \|f_n\|_p < \infty$;
(b) $\int_A f_n\,d\mu \to \int_A f\,d\mu$ for all $A \in \Sigma$ with $\mu(A) < \infty$.

Remark 4.1.16. The space $L^1(X)$ is not reflexive. To see this, assume that $\mu$ is nonatomic; see Definition 2.1.30(b). Then there exists a decreasing sequence $\{A_n\}_{n\ge 1} \subseteq \Sigma$ such that $0 < \mu(A_n)$ for all $n \in \mathbb{N}$ and $\mu(A_n) \to 0^+$. We set $f_n = \chi_{A_n} \|\chi_{A_n}\|_1^{-1}$ for $n \in \mathbb{N}$. Then $\|f_n\|_1 = 1$ for all $n \in \mathbb{N}$. If $L^1(X)$ is reflexive, then by passing to a subsequence if necessary we may assume that $f_n \xrightarrow{w} f$ in $L^1(X)$ with $f \in L^1(X)$. Then it follows that
\[ \int_X f_n h\,d\mu \to \int_X fh\,d\mu \quad\text{for all } h \in L^\infty(X); \tag{4.1.13} \]
see Definition 4.1.8. Fix $k \in \mathbb{N}$ and let $h = \chi_{A_k} \in L^\infty(X)$. One has
\[ \int_X f_n \chi_{A_k}\,d\mu = 1 \quad\text{for all } n \ge k, \]
which in view of (4.1.13) results in
\[ \int_X f \chi_{A_k}\,d\mu = 1 \quad\text{for all } k \in \mathbb{N}. \tag{4.1.14} \]


On the other hand, from the Dominated Convergence Theorem, we conclude that
\[ \int_X f \chi_{A_k}\,d\mu \to 0 \quad\text{as } k \to \infty. \tag{4.1.15} \]
Comparing (4.1.14) and (4.1.15) we have a contradiction, which proves that the space $L^1(X)$ cannot be reflexive.

For $1 < p < \infty$, the space $L^p(X)$ is reflexive while we just saw that $L^1(X)$ is not. It follows that for $1 < p < \infty$ the relatively weakly compact sets in $L^p(X)$ are exactly the bounded ones. The situation is different for $L^1(X)$ and the characterization of weakly compact sets is more involved.

Example 4.1.17. Let $X = (-1,1)$ be equipped with the Lebesgue measure and consider the sequence
\[ f_n(x) = \begin{cases} n & \text{if } x \in \big[-\frac{1}{2n}, \frac{1}{2n}\big],\\ 0 & \text{otherwise}, \end{cases} \qquad n \in \mathbb{N}. \]
Then we easily see the following facts:
\[ f_n \ge 0, \qquad \int_X f_n\,dx = 1 \ \text{for all } n \in \mathbb{N}, \qquad f_n(x) \to 0 \ \text{a.e.}, \qquad \int_X f_n h\,dx \to h(0) \ \text{for all } h \in C(-1,1). \]
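The facts listed in Example 4.1.17 can be observed numerically. The sketch below is only an illustration under stated assumptions: it uses a midpoint rule on the support of $f_n$, the test function $h = \cos$ is an arbitrary continuous choice, and the names `f_n` and `integrate_against` are hypothetical helpers rather than anything defined in the text.

```python
import math

def f_n(n, x):
    """The function of Example 4.1.17: height n on [-1/(2n), 1/(2n)], zero elsewhere."""
    return float(n) if abs(x) <= 1.0 / (2 * n) else 0.0

def integrate_against(n, h, cells=2000):
    """Midpoint rule for the integral of f_n * h over (-1, 1).

    f_n vanishes off [-1/(2n), 1/(2n)], so only that interval is sampled.
    """
    a, b = -1.0 / (2 * n), 1.0 / (2 * n)
    width = (b - a) / cells
    mids = (a + (i + 0.5) * width for i in range(cells))
    return sum(f_n(n, x) * h(x) for x in mids) * width

for n in (1, 10, 100):
    mass = integrate_against(n, lambda x: 1.0)   # total mass of f_n
    tested = integrate_against(n, math.cos)      # pairing with h = cos, h(0) = 1
    print(n, mass, tested)
```

The total mass stays equal to one while the pairing with $h$ approaches $h(0)$, so the mass of $f_n$ concentrates at the origin although $f_n \to 0$ almost everywhere.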

The sequence $\{f_n\}_{n\ge 1} \subseteq L^1(-1,1)$ is bounded, but evidently we cannot find a subsequence that converges weakly.

In the case of the space $L^1(X)$, weakly compact sets can be characterized using the notion of uniform integrability; see Definition 2.3.40. The result is known as the "Dunford–Pettis Theorem."

Theorem 4.1.18 (Dunford–Pettis Theorem). If $(X,\Sigma,\mu)$ is a measure space and $F \subseteq L^1(X)$ is bounded, then $F$ is relatively weakly compact if and only if it is uniformly integrable.

Proof. $\Rightarrow$: We may assume that $F = \{f_n\}_{n\ge 1}$. Since by hypothesis $F \subseteq L^1(X)$ is relatively weakly compact, according to the Eberlein–Smulian Theorem (see Theorem 3.4.14), we may assume that $f_n \xrightarrow{w} f$ in $L^1(X)$ as $n \to \infty$. Then according to Proposition 4.1.15, we obtain
\[ \int_A f_n\,d\mu \to \int_A f\,d\mu \quad\text{for all } A \in \Sigma. \]
If $\nu_n(A) = \int_A f_n\,d\mu$ with $n \in \mathbb{N}$ and $\nu(A) = \int_A f\,d\mu$, then these are signed measures such that $\nu_n \ll \mu$ for all $n \in \mathbb{N}$ and $\nu_n(A) \to \nu(A)$ for all $A \in \Sigma$. From the first part of the proof of the Vitali–Hahn–Saks Theorem (see Theorem 2.4.33), we derive that $\{\nu_n\}_{n\ge 1}$ is uniformly absolutely continuous with respect to $\mu$. But $d\nu_n = f_n\,d\mu$ for all $n \in \mathbb{N}$. Therefore, $\{f_n\}_{n\ge 1} \subseteq L^1(X)$ is uniformly integrable.

$\Leftarrow$: First suppose that $\Sigma$ is countably generated, that is, $\Sigma = \sigma(\{A_k\}_{k\ge 1})$. Then via a diagonal argument on the generators $\{A_k\}_{k\ge 1}$ and by passing to a subsequence of $\{f_n\}_{n\ge 1}$, we see that $\lim_{n\to\infty} \int_{A_k} f_n\,d\mu$ exists for every $k \in \mathbb{N}$. Let
\[ \Sigma_0 = \Big\{E \in \Sigma : \lim_{n\to\infty} \int_E f_n\,d\mu \text{ exists}\Big\}. \]
Evidently $\{A_k\}_{k\ge 1} \subseteq \Sigma_0$ and $\Sigma_0$ is a Dynkin system; see Definition 2.1.7. Invoking Theorem 2.1.11, we conclude that $\Sigma_0 = \Sigma$. Therefore,
\[ \lim_{n\to\infty} \int_A f_n\,d\mu \quad\text{exists for all } A \in \Sigma. \]
If $\nu_n(A) = \int_A f_n\,d\mu$ for all $A \in \Sigma$ and for all $n \in \mathbb{N}$, then $\nu_n \ll \mu$ for all $n \in \mathbb{N}$ and we have just proven that $\nu_n(A) \to \nu(A)$ for all $A \in \Sigma$, where $\nu(A)$ denotes the limit. From the Vitali–Hahn–Saks Theorem (see Theorem 2.4.33), we get that $\nu$ is a signed measure satisfying $\nu \ll \mu$. Then, by the Radon–Nikodym Theorem (see Theorem 2.4.29), there exists $f \in L^1(X)$ such that $\nu(A) = \int_A f\,d\mu$ for all $A \in \Sigma$. Hence $\int_A f_n\,d\mu \to \int_A f\,d\mu$ for all $A \in \Sigma$ which, due to Proposition 4.1.15, gives that $f_n \xrightarrow{w} f$ in $L^1(X)$. Therefore, $F \subseteq L^1(X)$ is relatively weakly compact.

Next we remove the hypothesis that $\Sigma$ is countably generated. In this case we replace $\Sigma$ with the $\sigma$-algebra $\Sigma'$ generated by the countably many sets $\{x \in X : f_n(x) > \eta\}$ and $\{x \in X : f_n(x) < -\eta\}$ for all $n \in \mathbb{N}$ and for all $\eta > 0$ with $\eta \in \mathbb{Q}$. Moreover we replace $X$ with
\[ V = \bigcup_{n\ge 1} \{x \in X : f_n(x) \neq 0\}. \]
Finally note that by a straightforward application of the Radon–Nikodym Theorem, one has, for any $h \in L^\infty(V,\Sigma)$, the existence of $h' \in L^\infty(V,\Sigma')$ such that
\[ \int_V fh\,d\mu = \int_V fh'\,d\mu \quad\text{for all } f \in L^1(V,\Sigma'). \]

Proposition 4.1.19. If $(X,\Sigma,\mu)$ is a finite measure space and $\{u_n\}_{n\ge 1} \subseteq L^1(X)$ is relatively weakly compact, then $u_n \xrightarrow{w} u$ in $L^1(X)$ as $n \to \infty$ if and only if
\[ \|u + y\|_1 \le \liminf_{n\to\infty} \|u_n + y\|_1 \quad\text{for all } y \in L^1(X). \]

Proof. $\Rightarrow$: Let $y \in L^1(X)$ be fixed. Evidently $u_n + y \xrightarrow{w} u + y$ in $L^1(X)$. So, invoking Proposition 3.3.13(c), it follows that
\[ \|u + y\|_1 \le \liminf_{n\to\infty} \|u_n + y\|_1. \]


$\Leftarrow$: Since $\{u_n\}_{n\ge 1} \subseteq L^1(X)$ is relatively weakly compact, by passing to a subsequence if necessary, we may assume that $u_n \xrightarrow{w} \hat u$ in $L^1(X)$ as $n \to \infty$. Let $A = \{x \in X : \hat u(x) > u(x)\}$ and $E = \{x \in X : \hat u(x) \le u(x)\}$. Evidently, $A, E \in \Sigma$. From the Dunford–Pettis Theorem (see Theorem 4.1.18), we conclude that $\{u_n - u\}_{n\ge 1} \subseteq L^1(X)$ is uniformly integrable. So, given $\varepsilon > 0$, there exists $c > 0$ such that
\[ \int_{\{|u_n - u| \ge c\}} |u_n - u|\,d\mu \le \varepsilon \quad\text{for all } n \in \mathbb{N}; \tag{4.1.16} \]
see Definition 2.3.40. Let $y = -u - c\chi_A + c\chi_E$. Since $\mu$ is finite, one gets that $y \in L^1(X)$. Moreover, we obtain
\[ |u_n + y| \le c - (u_n - u)\chi_A + (u_n - u)\chi_E + 2|u_n - u|\chi_{\{|u_n - u| \ge c\}}, \]
which implies, since $X = A \cup E$ and because of (4.1.16), that
\[ c\mu(X) = \|u + y\|_1 \le \liminf_{n\to\infty} \|u_n + y\|_1 \le c\mu(X) - \int_A (\hat u - u)\,d\mu + \int_E (\hat u - u)\,d\mu + 2\varepsilon = c\mu(X) - \int_X |\hat u - u|\,d\mu + 2\varepsilon. \]
This implies that $\int_X |\hat u - u|\,d\mu \le 2\varepsilon$. Since $\varepsilon > 0$ is arbitrary, we let $\varepsilon \searrow 0$ to conclude that $\hat u = u$. Therefore, $u_n \xrightarrow{w} u$ in $L^1(X)$.

Let $\xi\colon \mathbb{R} \to [0,+\infty)$ be a continuous function with $\xi(0) = 0$ which satisfies the following condition: For every $\varepsilon > 0$, there exists $c_\varepsilon > 0$ such that
\[ |\xi(s+t) - \xi(s)| \le \varepsilon\xi(s) + c_\varepsilon \xi(t) \quad\text{for all } s, t \in \mathbb{R}. \tag{4.1.17} \]

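Remark 4.1.20. Condition (4.1.17) is satisfied by convex functions. Moreover, if $\xi\colon \mathbb{R} \to [0,+\infty)$ is continuous, $\xi(s) > 0$ for all $s \neq 0$ and there exist $p, q \in (1,+\infty)$ such that
\[ \lim_{s\to 0} \frac{\xi(s)}{|s|^p} = \eta_0 > 0 \quad\text{and}\quad \lim_{s\to\pm\infty} \frac{\xi(s)}{|s|^q} = \eta_\infty > 0, \]
then $\xi$ satisfies condition (4.1.17).

Proposition 4.1.21. If $(X,\Sigma,\mu)$ is a measure space, $\xi\colon \mathbb{R} \to [0,\infty)$ is a continuous function with $\xi(0) = 0$, which satisfies condition (4.1.17), and $f_n\colon X \to \mathbb{R}$ with $n \in \mathbb{N}$ is a sequence of $\Sigma$-measurable functions such that
\[ f_n \to f \ \mu\text{-a.e.}, \qquad \sup_{n\ge 1} \int_X \xi(f_n)\,d\mu < \infty, \qquad\text{and}\qquad \int_X \xi(f)\,d\mu < \infty, \]
then
\[ \sup_{n\ge 1} \int_X \xi(f_n - f)\,d\mu < \infty \quad\text{and}\quad \int_X |\xi(f_n) - \xi(f) - \xi(f_n - f)|\,d\mu \to 0 \ \text{as } n \to \infty. \]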

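Proof. Applying condition (4.1.17) yields
\[ |\xi(f_n) - \xi(f_n - f)| \le \varepsilon\xi(f_n - f) + c_\varepsilon \xi(f). \tag{4.1.18} \]
Let $\varepsilon = 1/2$. Then from (4.1.18) we derive $\xi(f_n - f) \le 2[\xi(f_n) + c_{1/2}\xi(f)]$ for all $n \in \mathbb{N}$. Hence, $\sup_{n\ge 1} \int_X \xi(f_n - f)\,d\mu < \infty$. Let
\[ \vartheta_{n,\varepsilon} = \big[\,|\xi(f_n) - \xi(f) - \xi(f_n - f)| - \varepsilon\xi(f_n - f)\big]^+. \]
Then
\[ \vartheta_{n,\varepsilon} \le (1 + c_\varepsilon)\xi(f) \quad \mu\text{-a.e.} \tag{4.1.19} \]
Moreover we have
\[ \vartheta_{n,\varepsilon} \to 0 \quad \mu\text{-a.e. as } n \to \infty. \tag{4.1.20} \]
Then (4.1.19), (4.1.20) and the Dominated Convergence Theorem imply that $\int_X \vartheta_{n,\varepsilon}\,d\mu \to 0$ as $n \to \infty$. Note that
\[ |\xi(f_n) - \xi(f) - \xi(f_n - f)| \le \vartheta_{n,\varepsilon} + \varepsilon\xi(f_n - f) \quad \mu\text{-a.e.}, \]
which gives
\[ \limsup_{n\to\infty} \int_X |\xi(f_n) - \xi(f) - \xi(f_n - f)|\,d\mu \le M\varepsilon \quad\text{for some } M > 0. \]
Since $\varepsilon > 0$ is arbitrary, we conclude that
\[ \int_X |\xi(f_n) - \xi(f) - \xi(f_n - f)|\,d\mu \to 0 \quad\text{as } n \to \infty. \]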

Of special interest is the case when $\xi(s) = |s|^p$ with $1 \le p < \infty$. In this particular form, the result is known as the "Brézis–Lieb Lemma."

Lemma 4.1.22 (Brézis–Lieb Lemma). If $(X,\Sigma,\mu)$ is a measure space, $\{f_n\}_{n\ge 1} \subseteq L^p(X)$ with $1 \le p < \infty$ is bounded, and $f_n \to f$ $\mu$-a.e., then
\[ \lim_{n\to\infty} \big[\|f_n\|_p^p - \|f_n - f\|_p^p\big] = \|f\|_p^p. \]
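The Brézis–Lieb Lemma can be illustrated on a simple explicit sequence. The example below is an assumption made for illustration and does not appear in the text: on $X = (0,1)$ with the Lebesgue measure take $f \equiv 1$ and $f_n = 1 + n^{1/p}\chi_{(0,1/n)}$, so that $f_n \to f$ a.e., $\{f_n\}$ is bounded in $L^p$, and $\|f_n - f\|_p^p = 1$ for every $n$. All three norms have closed forms, which the hypothetical helper `brezis_lieb_gap` evaluates.

```python
def brezis_lieb_gap(n, p):
    """Return ||f_n||_p^p - ||f_n - f||_p^p - ||f||_p^p for the example sequence.

    f = 1 on (0, 1) and f_n = 1 + n^(1/p) on (0, 1/n), f_n = 1 elsewhere.
    """
    norm_fn_p = (1.0 + n ** (1.0 / p)) ** p / n + (1.0 - 1.0 / n)
    norm_diff_p = 1.0   # integral of n over (0, 1/n)
    norm_f_p = 1.0      # integral of 1 over (0, 1)
    return norm_fn_p - norm_diff_p - norm_f_p

for n in (10 ** 2, 10 ** 4, 10 ** 8, 10 ** 12):
    print(n, brezis_lieb_gap(n, 2.0))
```

The gap tends to zero even though $f_n$ does not converge to $f$ in $L^p$, which is exactly the situation the lemma is designed for.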

Corollary 4.1.23. If $(X,\Sigma,\mu)$ is a measure space, $\{f_n\}_{n\ge 1} \subseteq L^p(X)$ with $1 \le p < \infty$ is bounded and $f_n \to f$ $\mu$-a.e. as well as $\|f_n\|_p \to \|f\|_p$, then $\|f_n - f\|_p \to 0$.

We have already seen that bounded sequences in $L^1$ need not have weakly convergent subsequences; see Example 4.1.17. However, if we exclude a decreasing sequence of measurable sets $\{A_k\}_{k\ge 1}$ with $\mu(A_k) \to 0$, then we can extract a weakly convergent subsequence. This is the content of the so-called "Biting Theorem."


Theorem 4.1.24 (Biting Theorem). If $(X,\Sigma,\mu)$ is a measure space and $\{f_n\}_{n\ge 1} \subseteq L^1(X)$ is a bounded sequence, then there exist a subsequence $\{f_{n_k}\}_{k\ge 1}$ of $\{f_n\}_{n\ge 1}$, a nonincreasing sequence $\{A_m\}_{m\ge 1} \subseteq \Sigma$ with $\mu(A_m) \searrow 0$ and $f \in L^1(X)$ such that $f_{n_k} \xrightarrow{w} f$ in $L^1(X \setminus A_m)$ for all $m \in \mathbb{N}$ as $k \to \infty$.

Proof. For $n \in \mathbb{N}$ and $c \ge 0$, let $\vartheta_{n,c} = \int_{\{|f_n| \ge c\}} |f_n|\,d\mu \ge 0$. Evidently, $c \mapsto \vartheta_{n,c}$ is nonincreasing. We define $\eta = \lim_{c\to\infty} \sup_{n\ge 1} \vartheta_{n,c} \ge 0$. If $\eta = 0$, then $\{f_n\}_{n\ge 1} \subseteq L^1(X)$ is uniformly integrable and so on account of the Dunford–Pettis Theorem (see Theorem 4.1.18), the result holds with $A_m = \emptyset$ for all $m \in \mathbb{N}$. So, assume that $\eta > 0$. For each $k \in \mathbb{N}$, let $n_k \in \mathbb{N}$ be such that $\vartheta_{n_k,2k} \ge \sup_{n\ge 1} \vartheta_{n,2k} - 1/k$. Hence,
\[ \vartheta_{n_k,2k} \ge \eta - \frac1k, \tag{4.1.21} \]

since {supn≥1 ϑ n,2k }k≥1 is nonincreasing. Moreover, by monotonicity we know that the following limit exists: η = lim sup c→∞ n≥1

|f n k |dμ ≥ 0 .

∫ {c≤|f nk | 0, there is a subsequence {k(c)} such that |f n k(c) |dμ ≥

∫

η . 2

(4.1.22)

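{c≤|f n k(c) | ε}.

(b) By $L^p_{\mathrm{loc}}(\Omega)$ with $1 \le p < \infty$ we denote the space of all measurable functions $f\colon \Omega \to \mathbb{R}$ such that $f \in L^p(U)$ for all $U \subset\subset \Omega$, that is, for every open $U \subseteq \Omega$ such that $\overline{U} \subseteq \Omega$ with $\overline{U}$ being compact.

(c) A sequence of mollifiers is any sequence of functions on $\mathbb{R}^N$ such that
\[ \vartheta_n \in C_c^\infty(\Omega), \qquad \operatorname{supp} \vartheta_n \subseteq B_{1/n}(0), \qquad \int_{\mathbb{R}^N} \vartheta_n\,dx = 1, \qquad \vartheta_n \ge 0 \quad\text{for all } n \in \mathbb{N}, \]
where $C_c^\infty(\Omega)$ denotes the space of all $C^\infty$-functions on $\Omega$ that have compact support. Consider the $C^\infty$-function $\vartheta\colon \mathbb{R}^N \to \mathbb{R}$ defined by
\[ \vartheta(x) = \begin{cases} c\exp\Big(\dfrac{1}{|x|^2 - 1}\Big) & \text{if } |x| < 1,\\ 0 & \text{if } |x| \ge 1, \end{cases} \]
with $c > 0$ such that $\int_{\mathbb{R}^N} \vartheta(x)\,dx = 1$. For any $\varepsilon > 0$ we set $\vartheta_\varepsilon(x) = \frac{1}{\varepsilon^N}\vartheta(x/\varepsilon)$ for all $x \in \mathbb{R}^N$. Then $\{\vartheta_\varepsilon\}_{\varepsilon>0}$ is the standard mollifier.

(d) Given $f \in L^1_{\mathrm{loc}}(\Omega)$ we define $f^\varepsilon = \vartheta_\varepsilon * f$, where $*$ denotes the operation of convolution, that is, $f^\varepsilon(x) = \int_\Omega \vartheta_\varepsilon(x-y) f(y)\,dy$ for all $x \in \Omega_\varepsilon$.

Proposition 4.1.28. Given $f \in L^1_{\mathrm{loc}}(\Omega)$, it holds that $f^\varepsilon \in C^\infty(\Omega_\varepsilon)$ for every $\varepsilon > 0$.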

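Proof. We fix $x \in \Omega_\varepsilon$. For any $k \in \{1,\dots,N\}$, let $e_k = (0,\dots,1,\dots,0)$ be the $k$-th basic vector in $\mathbb{R}^N$. For $\lambda \in \mathbb{R}$ with $|\lambda|$ small, one has $x + \lambda e_k \in \Omega_\varepsilon$. Then
\[ \frac{f^\varepsilon(x+\lambda e_k) - f^\varepsilon(x)}{\lambda} = \frac{1}{\varepsilon^N} \int_\Omega \frac1\lambda \Big[\vartheta\Big(\frac{x+\lambda e_k - y}{\varepsilon}\Big) - \vartheta\Big(\frac{x-y}{\varepsilon}\Big)\Big] f(y)\,dy = \frac{1}{\varepsilon^N} \int_U \frac1\lambda \Big[\vartheta\Big(\frac{x+\lambda e_k - y}{\varepsilon}\Big) - \vartheta\Big(\frac{x-y}{\varepsilon}\Big)\Big] f(y)\,dy \]
for some $U \subset\subset \Omega$. Note that
\[ \lim_{\lambda\to 0} \frac1\lambda \Big[\vartheta\Big(\frac{x+\lambda e_k - y}{\varepsilon}\Big) - \vartheta\Big(\frac{x-y}{\varepsilon}\Big)\Big] = \frac1\varepsilon \frac{\partial\vartheta}{\partial x_k}\Big(\frac{x-y}{\varepsilon}\Big) = \varepsilon^N \frac{\partial\vartheta_\varepsilon}{\partial x_k}(x-y) \quad\text{for all } y \in U. \tag{4.1.24} \]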

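Moreover,
\[ \Big|\frac1\lambda \Big[\vartheta\Big(\frac{x+\lambda e_k - y}{\varepsilon}\Big) - \vartheta\Big(\frac{x-y}{\varepsilon}\Big)\Big] f(y)\Big| \le \frac1\varepsilon \|D\vartheta\|_\infty |f(y)| \tag{4.1.25} \]
with $\frac1\varepsilon \|D\vartheta\|_\infty |f(\cdot)| \in L^1(U)$. From (4.1.24) and (4.1.25) we see that we can apply the Dominated Convergence Theorem to get
\[ \frac{\partial f^\varepsilon}{\partial x_k}(x) = \int_\Omega \frac{\partial\vartheta_\varepsilon}{\partial x_k}(x-y) f(y)\,dy \quad\text{for all } k \in \{1,\dots,N\}. \]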

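Repeating this argument for the higher-order partial derivatives, we conclude that $f^\varepsilon \in C^\infty(\Omega_\varepsilon)$.

Proposition 4.1.29. If $f \in C(\Omega)$, then $f^\varepsilon \to f$ as $\varepsilon \to 0^+$ uniformly on compact subsets of $\Omega$.

Proof. Let $U \subset\subset \Omega$ and let $V \subseteq \mathbb{R}^N$ be open such that $U \subset\subset V \subset\subset \Omega$. For $x \in U$ and small $\varepsilon > 0$ we obtain
\[ f^\varepsilon(x) = \frac{1}{\varepsilon^N} \int_{B_\varepsilon(x)} \vartheta\Big(\frac{x-y}{\varepsilon}\Big) f(y)\,dy = \int_{B_1(0)} \vartheta(u) f(x - \varepsilon u)\,du. \]
Since $\int_{B_1(0)} \vartheta(u)\,du = 1$ (see Definition 4.1.27(c)), one has
\[ |f^\varepsilon(x) - f(x)| \le \int_{B_1(0)} \vartheta(u) |f(x - \varepsilon u) - f(x)|\,du. \tag{4.1.26} \]
Since $f|_V$ is uniformly continuous, from (4.1.26) we infer that $f^\varepsilon \to f$ uniformly on $U$.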
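The uniform convergence in Proposition 4.1.29 can be observed in one dimension. The sketch below is a hedged illustration, not the authors' construction: it takes $\Omega = \mathbb{R}$, the compact set $[-1,1]$ and $f(x) = |x|$, replaces the integral in $f^\varepsilon(x) = \int_{B_1(0)} \vartheta(u) f(x - \varepsilon u)\,du$ by a midpoint rule, and normalises the discrete weights so that they sum to one. The helper names `bump`, `mollify` and `sup_error` are assumptions.

```python
import math

def bump(u):
    """Unnormalised standard bump: exp(1 / (u^2 - 1)) for |u| < 1, zero otherwise."""
    return math.exp(1.0 / (u * u - 1.0)) if abs(u) < 1.0 else 0.0

def mollify(f, x, eps, nodes=400):
    """Midpoint approximation of f^eps(x) = int_{-1}^{1} theta(u) f(x - eps*u) du."""
    width = 2.0 / nodes
    us = [-1.0 + (i + 0.5) * width for i in range(nodes)]
    weights = [bump(u) for u in us]
    total = sum(weights)   # discrete normalisation: the weights act as a probability vector
    return sum(w * f(x - eps * u) for w, u in zip(weights, us)) / total

def sup_error(f, eps, a=-1.0, b=1.0, samples=201):
    """Largest deviation |f^eps(x) - f(x)| over an equally spaced grid on [a, b]."""
    xs = [a + (b - a) * i / (samples - 1) for i in range(samples)]
    return max(abs(mollify(f, x, eps) - f(x)) for x in xs)

for eps in (0.5, 0.1, 0.01):
    print(eps, sup_error(abs, eps))
```

Because $|x|$ is 1-Lipschitz, estimate (4.1.26) bounds the error by $\varepsilon$, and the computed sup-errors respect this bound; the largest deviation sits at the kink $x = 0$.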

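Proposition 4.1.30. If $f \in L^p_{\mathrm{loc}}(\Omega)$ with $1 \le p < \infty$, then $f^\varepsilon \to f$ in $L^p_{\mathrm{loc}}(\Omega)$.

Proof. Let $U \subset\subset V \subset\subset \Omega$, $x \in U$ and small $\varepsilon > 0$. For $1 < p < \infty$ and $1/p + 1/p' = 1$ we derive, using Hölder's inequality, that
\[ |f^\varepsilon(x)| \le \int_{B_1(0)} \vartheta(u)^{1/p'} \vartheta(u)^{1/p} |f(x - \varepsilon u)|\,du \le \Big(\int_{B_1(0)} \vartheta(u)\,du\Big)^{1/p'} \Big(\int_{B_1(0)} \vartheta(u) |f(x - \varepsilon u)|^p\,du\Big)^{1/p} = \Big(\int_{B_1(0)} \vartheta(u) |f(x - \varepsilon u)|^p\,du\Big)^{1/p}. \]
Applying Fubini's Theorem we obtain
\[ \int_U |f^\varepsilon(x)|^p\,dx \le \int_{B_1(0)} \vartheta(u) \Big(\int_U |f(x - \varepsilon u)|^p\,dx\Big) du \le \int_V |f(y)|^p\,dy \tag{4.1.27} \]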

4.1 L p -Spaces

| 297

for small ε > 0. Since f ∈ L p (V), there exists h ∈ C(V) such that ‖f − h‖L p (V) ≤ δ

with δ > 0 ;

(4.1.28)

see Proposition 2.5.15. Hence, due to (4.1.27), ‖f ε − h ε ‖L p (U) ≤ δ .

(4.1.29)

Finally, combining (4.1.28), (4.1.29) and Proposition 4.1.29, we see that, for small ε > 0, ε f − f L p (U) ≤ f ε − h ε L p (U) + h ε − hL p (U) + ‖h − f‖L p (U) ≤ 3δ . p

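Therefore, $f^\varepsilon \to f$ in $L^p_{\mathrm{loc}}(\Omega)$ as $\varepsilon \to 0^+$.

Corollary 4.1.31. If $f \in L^p(\mathbb{R}^N)$ with $1 \leq p < \infty$, then $f^\varepsilon \to f$ in $L^p(\mathbb{R}^N)$ as $\varepsilon \to 0^+$.

Corollary 4.1.32. If $\Omega \subseteq \mathbb{R}^N$ is open, then $C_c(\Omega)$ is dense in $L^p(\Omega)$ for $1 \leq p < \infty$.

Remark 4.1.33. This corollary is a particular case of Proposition 2.5.15 when $X = \Omega$ and $\mu = \lambda^N$ is the Lebesgue measure on $\mathbb{R}^N$.

Finally we mention for future use that if $f \in L^1(\mathbb{R}^N)$ and $h \in L^p(\mathbb{R}^N)$ with $1 \leq p \leq \infty$, then $f * h \in L^p(\mathbb{R}^N)$ and $\|f * h\|_p \leq \|f\|_1 \|h\|_p$, which is a version of Young's inequality. For the proof we refer to Brézis [48, Theorem 4.15, p. 104].

In the last part of this section we will have a quick look at some basic sequence spaces. So, let $\mathbb{R}^{\mathbb{N}}$ be the space of all real sequences. For $1 \leq p < \infty$, the $l^p$-norm of a sequence $\hat{x} = (x_k)_{k \geq 1} \in \mathbb{R}^{\mathbb{N}}$ is defined by
\[
\|\hat{x}\|_p = \left( \sum_{k \geq 1} |x_k|^p \right)^{\frac{1}{p}} .
\]
For $p = \infty$, the $l^\infty$-norm of $\hat{x} = (x_k)_{k \geq 1} \in \mathbb{R}^{\mathbb{N}}$ is defined by
\[
\|\hat{x}\|_\infty = \sup_{k \geq 1} |x_k| .
\]
These are the norms of $L^p(\mathbb{N})$ with $1 \leq p \leq \infty$ when we consider the counting measure on $\mathbb{N}$.

Definition 4.1.34. We introduce the following sequence spaces:
\[
\begin{aligned}
c_0 &= \{\hat{x} = (x_k)_{k \geq 1} \in \mathbb{R}^{\mathbb{N}} : x_k \to 0 \text{ as } k \to \infty\} , \\
c &= \{\hat{x} = (x_k)_{k \geq 1} \in \mathbb{R}^{\mathbb{N}} : \lim_{n \to \infty} x_n \text{ exists in } \mathbb{R}\} , \\
l^p &= \{\hat{x} = (x_k)_{k \geq 1} \in \mathbb{R}^{\mathbb{N}} : \|\hat{x}\|_p < \infty\} , \quad 1 \leq p \leq \infty , \\
s_c &= \{\hat{x} = (x_k)_{k \geq 1} \in \mathbb{R}^{\mathbb{N}} : x_k = 0 \text{ for all but a finite number of } k\text{'s}\} .
\end{aligned}
\]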

Remark 4.1.35. We can view $s_c$ as all continuous $\mathbb{R}$-valued functions on $\mathbb{N}$ equipped with the discrete topology, which have compact support. Similarly, $l^\infty$ is the space of all bounded continuous $\mathbb{R}$-valued functions on $\mathbb{N}$, while $c_0$ is the space of all bounded continuous $\mathbb{R}$-valued functions on $\mathbb{N}$ which vanish at infinity. On $s_c$, $c_0$ and $c$ we consider the $\|\cdot\|_\infty$-norm. We easily see that, for $1 \leq p < \infty$,
\[
s_c \subseteq l^p \subseteq c_0 \subseteq c \subseteq l^\infty \subseteq \mathbb{R}^{\mathbb{N}} .
\]

Proposition 4.1.36. If $1 \leq p < q \leq \infty$, then $l^p \subseteq l^q$ and the inclusion is proper.

Proof. Suppose that $\hat{x} = (x_k)_{k \geq 1} \in l^p$. Then $\{x_k\}_{k \geq 1}$ is a sequence converging to zero, hence it is bounded and so $\hat{x} \in l^\infty$. Therefore, $l^p \subseteq l^\infty$ for all $1 \leq p < \infty$. Now suppose that $1 \leq p < q < \infty$ and let $\hat{x} = (x_k)_{k \geq 1} \in l^p$. Since $x_k \to 0$, there exists $m \in \mathbb{N}$ such that $|x_k| \leq 1$ for all $k \geq m$. Then $|x_k|^q \leq |x_k|^p$ for all $k \geq m$, and this proves that $\sum_{k \geq 1} |x_k|^q < \infty$. Hence $\hat{x} = (x_k)_{k \geq 1} \in l^q$. We conclude that $l^p \subseteq l^q$. Finally, note that $\hat{x} = (1/k^{1/p})_{k \geq 1} \in l^q$, but $\hat{x} \notin l^p$. So the inclusion $l^p \subseteq l^q$ is proper.

On account of Remark 4.1.35 and since the $l^p$-spaces result from $L^p(\mathbb{N})$ with the counting measure, we obtain the following result.

Proposition 4.1.37. The spaces $l^p$ with $1 \leq p \leq \infty$ and $c_0$ as well as $c$ are Banach spaces. The space $s_c$ is not complete.

From Theorems 4.1.3 and 4.1.5 we get the following proposition.

Proposition 4.1.38. For $1 \leq p < \infty$, $(l^p)^* = l^{p'}$ with $1/p + 1/p' = 1$. Therefore $(l^1)^* = l^\infty$. Moreover, for $1 < p < \infty$, $l^p$ is a reflexive Banach space.
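The witness sequence $\hat{x} = (1/k^{1/p})_{k \geq 1}$ from the proof of Proposition 4.1.36 can be checked with partial sums. The sketch below assumes the concrete exponents $p = 1$ and $q = 2$, chosen only for illustration. The function name `partial_power_sum` is hypothetical. Partial sums cannot prove divergence, but they show the logarithmic growth of $\sum |x_k|^p$ against the bounded sums $\sum |x_k|^q$.

```python
import math

def partial_power_sum(p_exp, r, n):
    """Return sum_{k=1}^{n} |x_k|^r for the sequence x_k = k^(-1/p_exp)."""
    return sum(k ** (-r / p_exp) for k in range(1, n + 1))

p, q = 1.0, 2.0  # illustrative exponents with p < q
for n in (10**2, 10**3, 10**5):
    s_p = partial_power_sum(p, p, n)  # harmonic partial sum, grows like log(n)
    s_q = partial_power_sum(p, q, n)  # partial sum of k^(-2), bounded by pi^2 / 6
    print(f"n = {n:>6}   sum |x_k|^p = {s_p:8.4f}   sum |x_k|^q = {s_q:.6f}")
```

Here $\sum_k |x_k|^p$ is the harmonic series, which exceeds $\log n$ after $n$ terms, while $\sum_k |x_k|^q = \sum_k k^{-2}$ stays below $\pi^2/6$.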

Taking Hölder's inequality (see Theorem 2.3.12) into account yields the following.

Proposition 4.1.39. If $1 \leq p, p' \leq \infty$ are conjugate exponents and $\hat{x} = (x_k)_{k \geq 1} \in l^p$, $\hat{u} = (u_k)_{k \geq 1} \in l^{p'}$, then the series $\langle \hat{x}, \hat{u} \rangle = \sum_{k \geq 1} x_k u_k$ converges absolutely and $|\langle \hat{x}, \hat{u} \rangle| \leq \|\hat{x}\|_p \|\hat{u}\|_{p'}$.

From Remark 3.3.17 we know that $l^1$ has the following property, known as the Schur property.

Proposition 4.1.40. The Banach space $l^1$ has the following property:
\[
\hat{x}_n \xrightarrow{w} \hat{x} \text{ in } l^1 \quad \text{implies} \quad \|\hat{x}_n - \hat{x}\|_1 \to 0 ,
\]
which is called the Schur property. Hence, every weakly compact subset of $l^1$ is also norm compact.

Consider the particular sequences
\[
\hat{e} = (1, 1, 1, \ldots) , \qquad \hat{e}_k = (0, \ldots, 0, e_k = 1, 0, \ldots) \quad \text{for } k \in \mathbb{N} .
\]

Proposition 4.1.41. The sequence $\{\hat{e}, \hat{e}_k\}_{k \geq 1}$ is a Schauder basis (see Definition 3.5.50(b)) for the Banach space $c$. Hence, $c$ is separable.

Proof. Let $\hat{x} = (x_k)_{k \geq 1} \in c$ and let $x_\infty = \lim_{k \to \infty} x_k$. Then
\[
\hat{x} = x_\infty \hat{e} + \sum_{k \geq 1} (x_k - x_\infty) \hat{e}_k ,
\]
which shows that $\{\hat{e}, \hat{e}_k\}_{k \geq 1}$ is a Schauder basis for $c$.

Since $c_0$ is a closed subspace of $c$, we infer the following.

Corollary 4.1.42. The Banach space $c_0$ is separable.

Proposition 4.1.43. If $1 \leq p < \infty$, then $\{\hat{e}_k\}_{k \geq 1}$ is a Schauder basis for $l^p$. Hence, $l^p$ is separable for $1 \leq p < \infty$.

Proof. Let $\hat{x} = (x_k)_{k \geq 1} \in l^p$ with $1 \leq p < \infty$. Then it follows that
\[
\left\| \hat{x} - \sum_{k=1}^n x_k \hat{e}_k \right\|_p = \left( \sum_{k \geq n+1} |x_k|^p \right)^{\frac{1}{p}} \to 0 \quad \text{as } n \to \infty .
\]

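Hence $\hat{x} = \sum_{k \geq 1} x_k \hat{e}_k$ and so we have that $\{\hat{e}_k\}_{k \geq 1}$ is a Schauder basis for $l^p$ with $1 \leq p < \infty$.

Proposition 4.1.44. $c_0^* = l^1$.

Proof. Every $\hat{u} = (u_k)_{k \geq 1} \in l^1$ defines a linear functional $\xi_{\hat{u}} : c_0 \to \mathbb{R}$ by
\[
\xi_{\hat{u}}(\hat{x}) = \sum_{k \geq 1} x_k u_k \quad \text{for all } \hat{x} = (x_k)_{k \geq 1} \in c_0 .
\]
Moreover, one has
\[
|\xi_{\hat{u}}(\hat{x})| \leq \sum_{k \geq 1} |x_k u_k| \leq \|\hat{x}\|_\infty \sum_{k \geq 1} |u_k| = \|\hat{x}\|_\infty \|\hat{u}\|_1 , \tag{4.1.30}
\]
which shows that $\xi_{\hat{u}} \in c_0^*$. Thus, $\hat{u} \mapsto \xi_{\hat{u}}$ is a bounded linear map from $l^1$ into $c_0^*$. We claim that this map is an isometric isomorphism. To see this, let $\hat{u} = (u_k)_{k \geq 1} \in l^1$ and define $\lambda_k = \operatorname{sgn} u_k$ for $k \in \mathbb{N}$, that is,
\[
\lambda_k = 1 \text{ if } u_k > 0 , \qquad \lambda_k = 0 \text{ if } u_k = 0 , \qquad \lambda_k = -1 \text{ if } u_k < 0 .
\]
For $n \in \mathbb{N}$ we define
\[
\hat{y}_n = \sum_{k=1}^n \lambda_k \hat{e}_k \in c_0 .
\]
This gives $\xi_{\hat{u}}(\hat{y}_n) = \sum_{k=1}^n |u_k|$ and $\|\hat{y}_n\|_\infty \leq 1$, which implies $\|\xi_{\hat{u}}\|_* \geq \sum_{k=1}^n |u_k|$ for all $n \in \mathbb{N}$. Thus
\[
\|\xi_{\hat{u}}\|_* \geq \|\hat{u}\|_1 . \tag{4.1.31}
\]
From (4.1.30) and (4.1.31), it follows that $\|\xi_{\hat{u}}\|_* = \|\hat{u}\|_1$. Therefore, $\hat{u} \mapsto \xi_{\hat{u}}$ is an isometry. We need to show that $\hat{u} \mapsto \xi_{\hat{u}}$ is surjective. So, let $\xi^* \in c_0^*$ and let $u_k = \xi^*(\hat{e}_k)$ for all $k \in \mathbb{N}$. As before, we define $\hat{y}_n = \sum_{k=1}^n (\operatorname{sgn} u_k) \hat{e}_k \in c_0$. Then $\|\hat{y}_n\|_\infty \leq 1$ for all $n \in \mathbb{N}$ and so we derive
\[
\sum_{k=1}^n |u_k| = \xi^*(\hat{y}_n) \leq \|\xi^*\|_* \quad \text{for all } n \in \mathbb{N} ,
\]
which directly yields
\[
\|\hat{u}\|_1 = \sum_{k \geq 1} |u_k| \leq \|\xi^*\|_* \quad \text{with } \hat{u} = (u_k)_{k \geq 1} .
\]
Therefore, $\hat{u} \in l^1$. Since $\xi_{\hat{u}}(\hat{e}_k) = u_k = \xi^*(\hat{e}_k)$ for all $k \in \mathbb{N}$ and since $\operatorname{span}\{\hat{e}_k\}_{k \geq 1}$ is dense in $c_0$, we infer that $\xi_{\hat{u}} = \xi^*$. Thus, the map $\hat{u} \mapsto \xi_{\hat{u}}$ is surjective, hence an isometric isomorphism.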
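The norming sequence $\hat{y}_n = \sum_{k=1}^n (\operatorname{sgn} u_k)\hat{e}_k$ used in the proof of Proposition 4.1.44 can be traced on a concrete element of $l^1$. The sketch below assumes the illustrative choice $u_k = (-1)^k / k^2$, for which $\|\hat{u}\|_1 = \pi^2/6$. The names `sgn`, `u` and `pairing_with_y_n` are hypothetical helpers, not notation from the text.

```python
import math

def sgn(t):
    """Sign of a real number: 1, 0 or -1."""
    return (t > 0) - (t < 0)

def u(k):
    """Illustrative element of l^1: u_k = (-1)^k / k^2, so that ||u||_1 = pi^2 / 6."""
    return (-1) ** k / k**2

def pairing_with_y_n(n):
    """xi_u(y_n) = sum_{k=1}^{n} sgn(u_k) * u_k for y_n = sum_{k<=n} sgn(u_k) e_k."""
    return sum(sgn(u(k)) * u(k) for k in range(1, n + 1))

l1_norm = math.pi ** 2 / 6
for n in (1, 10, 100, 1000):
    value = pairing_with_y_n(n)
    print(f"n = {n:>4}   xi_u(y_n) = {value:.6f}   ||u||_1 - xi_u(y_n) = {l1_norm - value:.6f}")
```

Each $\hat{y}_n$ has sup-norm 1 and finite support, so it lies in $c_0$, and the values $\xi_{\hat{u}}(\hat{y}_n)$ increase to $\|\hat{u}\|_1$. This is the inequality (4.1.31) in action.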

4.2 Lebesgue–Bochner Spaces

In this section we deal with Banach space-valued functions. We define integrals for such functions and the corresponding Lebesgue spaces, which we study in detail. These spaces play an important role in the theory of evolution equations and in the study of Young measures, which in turn are basic tools in the calculus of variations and in optimal control. We start by introducing some notions of measurability for Banach space-valued functions.

Definition 4.2.1. Let $(\Omega, \Sigma, \mu)$ be a measure space and let $X$ be a Banach space.
(a) A simple function $s : \Omega \to X$ is a function of the form
\[
s(w) = \sum_{k=1}^n \eta_k \chi_{A_k}(w) \quad \text{for all } w \in \Omega
\]
with $n \in \mathbb{N}$, $\{\eta_k\}_{k=1}^n \subseteq X$ and $\{A_k\}_{k=1}^n \subseteq \Sigma$ mutually disjoint.
(b) A function $f : \Omega \to X$ is strongly measurable if there exists a sequence $\{s_n\}_{n \geq 1}$ of simple functions such that $\|s_n(w) - f(w)\| \to 0$ $\mu$-a.e.
(c) A function $f : \Omega \to X$ (resp. $f : \Omega \to X^*$) is said to be weakly measurable (resp. weakly*-measurable) if $w \mapsto \langle x^*, f(w) \rangle$ is $\Sigma$-measurable for all $x^* \in X^*$ (resp. $w \mapsto \langle f(w), x \rangle$ is $\Sigma$-measurable for all $x \in X$). Here by $\langle \cdot, \cdot \rangle$ we denote the duality brackets for the pair $(X^*, X)$.

Proposition 4.2.2. If $f : \Omega \to X$ is strongly measurable, then $w \mapsto \|f(w)\|$ is $\Sigma$-measurable from $\Omega$ into $\mathbb{R}_+$.

Proof. By Definition 4.2.1(b), there exists a sequence of simple functions $\{s_n\}_{n \geq 1}$ such that $\|s_n(w) - f(w)\| \to 0$ $\mu$-a.e., which implies that
\[
\big| \|s_n(w)\| - \|f(w)\| \big| \leq \|s_n(w) - f(w)\| \to 0 \quad \mu\text{-a.e.}
\]
Hence, $w \mapsto \|f(w)\|$ is $\Sigma$-measurable; see Proposition 2.2.12.

Definition 4.2.3. Let $(\Omega, \Sigma, \mu)$ be a measure space and let $X$ be a Banach space. A function $f : \Omega \to X$ is said to be essentially separably valued if there exists $N \in \Sigma$ with $\mu(N) = 0$ such that $f(\Omega \setminus N) \subseteq X$ is separable.

Using this definition we can state a convenient characterization of strongly measurable functions. The result is known as the "Pettis Measurability Theorem."

Theorem 4.2.4 (Pettis Measurability Theorem). If $(\Omega, \Sigma, \mu)$ is a measure space, $X$ is a Banach space and $f : \Omega \to X$, then the following statements are equivalent:
(a) $f$ is strongly measurable.
(b) $f$ is essentially separably valued and $f^{-1}(U) \in \Sigma$ for all open sets $U \subseteq X$.
(c) $f$ is essentially separably valued and weakly measurable.

Proof. (a) ⇒ (b): The strong measurability of $f$ implies that there exists a sequence of simple functions $s_n : \Omega \to X$ with $n \in \mathbb{N}$ such that $s_n(w) \to f(w)$ $\mu$-a.e. Then, for every $x^* \in X^*$, $\langle x^*, s_n(w) \rangle \to \langle x^*, f(w) \rangle$ in $\mathbb{R}$ for all $w \in \Omega \setminus N$ with $\mu(N) = 0$. Since $w \mapsto \langle x^*, s_n(w) \rangle$ is $\Sigma$-measurable for each $n \in \mathbb{N}$, from Proposition 2.2.12 it follows that $w \mapsto \langle x^*, f(w) \rangle$ is $\Sigma$-measurable. So, $f$ is weakly measurable. The union $E$ of the ranges of the $s_n$'s is a countable set. Hence $\overline{E} \subseteq X$ is separable and $f(w) \in \overline{E}$ for all $w \in \Omega \setminus N$ with $\mu(N) = 0$. Hence, $f$ is essentially separably valued.

(b) ⇒ (c): Note that for every open $V \subseteq \mathbb{R}$ and for every $x^* \in X^*$, $(x^*)^{-1}(V) \subseteq X$ is open and so $(x^* \circ f)^{-1}(V) = f^{-1}((x^*)^{-1}(V)) \in \Sigma$. Therefore, $f$ is weakly measurable.

(c) ⇒ (a): Without any loss of generality we may assume that $X$ is separable, otherwise we replace $X$ by $\overline{\operatorname{span}}\, f(\Omega \setminus N)$. Then the separability of $X$ implies that there exists a sequence $\{x_n^*\}_{n \geq 1} \subseteq B_1^{X^*}$ such that $\|f(w)\| = \sup_{n \geq 1} |\langle x_n^*, f(w) \rangle|$; see Theorem 3.4.12(a). Hence $w \mapsto \|f(w)\|$ is $\Sigma$-measurable. Let $A_+ = \{w \in \Omega : \|f(w)\| > 0\}$. Then $A_+ \in \Sigma$ and $w \mapsto f(w) - x$ is weakly measurable on $A_+$ for every $x \in X$. Hence, $w \mapsto \|f(w) - x\|$ is $\Sigma \cap A_+$-measurable.

Let $\{x_n\}_{n \geq 1} \subseteq X$ be a dense sequence. Given $\varepsilon > 0$, let
\[
D_n = \{w \in A_+ : \|f(w) - x_n\| < \varepsilon\} \in \Sigma \cap A_+ \quad \text{with } n \in \mathbb{N} .
\]
We define $C_n = D_n \setminus \bigcup_{k=1}^{n-1} D_k \in \Sigma \cap A_+$ and these sets are disjoint. Note that $A_+ = \bigcup_{n \geq 1} C_n$. We define
\[
f_\varepsilon(w) = \begin{cases} x_n & \text{if } w \in C_n , \ n \in \mathbb{N} , \\ 0 & \text{if } w \in \Omega \setminus A_+ . \end{cases}
\]
Evidently, $f_\varepsilon$ is countably valued and $\|f(w) - f_\varepsilon(w)\| < \varepsilon$ for all $w \in \Omega$. Therefore, $f$ is the uniform limit of countably valued functions. Truncating the $f_\varepsilon$'s, we produce a

sequence $\{s_n\}_{n \geq 1}$ of simple functions such that $s_n(w) \to f(w)$ $\mu$-a.e. Hence $f$ is strongly measurable.

Corollary 4.2.5. If $(\Omega, \Sigma, \mu)$ is a measure space, $X$ is a separable Banach space, and $f : \Omega \to X$, then the following statements are equivalent:
(a) $f$ is strongly measurable.
(b) $f$ is measurable.
(c) $f$ is weakly measurable.

Another useful consequence of Theorem 4.2.4 is the following result.

Corollary 4.2.6. If $f : \Omega \to X$ is the $\mu$-a.e. limit of a sequence of strongly measurable functions, then $f$ is strongly measurable.

Example 4.2.7. (a) Let $X = l^2[0, 1]$ and let $\{e_t\}_{t \in [0,1]}$ be the canonical basis of this nonseparable Hilbert space. Let $f : [0, 1] \to X$ be defined by $f(t) = e_t$. For $x^* \in X^*$ we see that $\langle x^*, f(t) \rangle = 0$ for all $t \in [0, 1] \setminus C$ with $C$ being countable, since $\sum_{t \in [0,1]} (\langle x^*, e_t \rangle)^2 < \infty$. Therefore, $f$ is weakly measurable. However, $\|f(s) - f(t)\| = \sqrt{2}$ for $s \neq t$, which implies that $f$ is not essentially separably valued and so it is not strongly measurable.
(b) Let $X = L^\infty[0, 1]$ and let $f : [0, 1] \to X$ be defined by $f(t) = \chi_{[0,t]}$. Note that $X^*$ is the space of finitely additive measures which are absolutely continuous with respect to the Lebesgue measure; see Dunford–Schwartz [94, IV 8.16]. So, every $x^* \in X^*$ is the difference of two positive elements. For $x^* \geq 0$, the function $t \mapsto \langle x^*, f(t) \rangle$ is increasing and so $f$ is weakly measurable. On the other hand, $\|f(s) - f(t)\|_\infty = 1$ for $s \neq t$ and so $f$ is not essentially separably valued. Thus it is not strongly measurable.

The notion of a Bochner integral is an abstraction of Proposition 2.3.22.

Definition 4.2.8. Let $(\Omega, \Sigma, \mu)$ be a $\sigma$-finite measure space and let $X$ be a Banach space.
(a) A simple function $s : \Omega \to X$ is Bochner integrable if it has the form
\[
s(w) = \sum_{k=1}^n \eta_k \chi_{A_k}(w) \quad \text{for all } w \in \Omega
\]
with $n \in \mathbb{N}$, distinct elements $\{\eta_k\}_{k=1}^n \subseteq X$, mutually disjoint sets $\{A_k\}_{k=1}^n \subseteq \Sigma$, and $\eta_k = 0$ if $\mu(A_k) = +\infty$. For any $A \in \Sigma$ the Bochner integral of $s$ over $A$ is defined by
\[
\int_A s\,d\mu = \sum_{k=1}^n \eta_k \mu(A_k \cap A)
\]
with $\eta_k \mu(A_k \cap A) = 0$ if $\eta_k = 0$ and $\mu(A_k \cap A) = +\infty$.
(b) A strongly measurable function $f : \Omega \to X$ is Bochner integrable if there exists a sequence $\{s_n\}_{n \geq 1}$ of Bochner integrable simple functions such that
\[
\|s_n(w) - f(w)\| \to 0 \ \mu\text{-a.e.} \quad \text{and} \quad \int_\Omega \|s_n - f\|\,d\mu \to 0 \text{ as } n \to \infty .
\]
Then for any $A \in \Sigma$ the Bochner integral of $f$ over $A$ is defined by
\[
\int_A f\,d\mu = \lim_{n \to \infty} \int_A s_n\,d\mu . \tag{4.2.1}
\]

Remark 4.2.9. It is easy to see that the limit in (4.2.1) exists and is independent of the sequence $\{s_n\}_{n \geq 1}$. Evidently $\int_A f\,d\mu = \int_\Omega f \chi_A\,d\mu$.

An immediate consequence of the definition above is the following result.

Proposition 4.2.10. If $f : \Omega \to X$ is Bochner integrable, then
\[
\left\| \int_\Omega f\,d\mu \right\| \leq \int_\Omega \|f\|\,d\mu \quad \text{and} \quad \lim_{\mu(A) \to 0} \int_A f\,d\mu = 0 .
\]
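For a simple function, Definition 4.2.8(a) and the norm estimate of Proposition 4.2.10 reduce to finite sums. The sketch below assumes $X = \mathbb{R}^2$ with the Euclidean norm and encodes a simple function as a list of pairs $(\eta_k, \mu(A_k))$. The data and the helper names `bochner_integral` and `integral_of_norm` are illustrative assumptions.

```python
import math

# Illustrative simple function s = sum_k eta_k * chi_{A_k} with values in X = R^2;
# each entry is (eta_k, mu(A_k)) for mutually disjoint sets A_k of finite measure.
simple_function = [((1.0, 0.0), 0.5), ((0.0, 2.0), 0.25), ((-1.0, 1.0), 0.25)]

def bochner_integral(s):
    """Integral over Omega of a simple function: sum_k eta_k * mu(A_k), componentwise."""
    return tuple(sum(eta[i] * m for eta, m in s) for i in range(2))

def integral_of_norm(s):
    """Integral of w -> ||s(w)||, that is, sum_k ||eta_k|| * mu(A_k)."""
    return sum(math.hypot(*eta) * m for eta, m in s)

integral = bochner_integral(simple_function)
lhs = math.hypot(*integral)              # norm of the integral
rhs = integral_of_norm(simple_function)  # integral of the norm
print("integral of s        :", integral)
print("|| integral of s ||  :", lhs)
print("integral of || s ||  :", rhs)
```

The inequality `lhs <= rhs` is the triangle inequality applied to the finite sum. It is strict here because the values $\eta_k$ point in different directions.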

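As in the case of the Lebesgue integral, the Bochner integral defines a vector valued measure as well.

Proposition 4.2.11. If $\{A_k\}_{k \geq 1} \subseteq \Sigma$ is a disjoint partition of $\Omega$ and $f : \Omega \to X$ is Bochner integrable over $A_k$ with $k \in \mathbb{N}$ and $\sum_{k \geq 1} \int_{A_k} \|f\|\,d\mu < \infty$, then $f$ is Bochner integrable over $\Omega$ and $\int_\Omega f\,d\mu = \sum_{k \geq 1} \int_{A_k} f\,d\mu$.

Proof. Let $\varepsilon > 0$ and choose $n \in \mathbb{N}$ such that
\[
\sum_{k \geq n+1} \int_{A_k} \|f\|\,d\mu \leq \varepsilon . \tag{4.2.2}
\]
For each $k \in \{1, \ldots, n\}$, we choose a Bochner integrable simple function $s_k : \Omega \to X$ such that
\[
\int_{A_k} \|f - s_k\|\,d\mu \leq \frac{\varepsilon}{2^k} . \tag{4.2.3}
\]
We set $s = \sum_{k=1}^n s_k \chi_{A_k}$. Clearly this is a simple function. By applying (4.2.2) and (4.2.3) we obtain
\[
\int_\Omega \|f - s\|\,d\mu = \sum_{k \geq 1} \int_{A_k} \|f - s\|\,d\mu = \sum_{k=1}^n \int_{A_k} \|f - s_k\|\,d\mu + \sum_{k \geq n+1} \int_{A_k} \|f\|\,d\mu \leq \sum_{k=1}^n \frac{\varepsilon}{2^k} + \varepsilon \leq 2\varepsilon .
\]
Hence, $f$ is Bochner integrable over $\Omega$. Finally note that
\[
\lim_{n \to \infty} \left\| \int_\Omega f\,d\mu - \sum_{k=1}^n \int_{A_k} f\,d\mu \right\| \leq \lim_{n \to \infty} \sum_{k \geq n+1} \int_{A_k} \|f\|\,d\mu = 0 .
\]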

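The definition of the Bochner integral is not that easy to use. The next proposition provides a very convenient criterion for Bochner integrability.

Proposition 4.2.12. A function $f : \Omega \to X$ is Bochner integrable if and only if $f$ is strongly measurable and $\|f(\cdot)\| \in L^1(\Omega)$.

Proof. ⇒: Suppose that $f$ is Bochner integrable. Strong measurability of $f$ is part of Definition 4.2.8(b). Moreover, there exists a sequence of Bochner integrable simple functions $\{s_n\}_{n \geq 1}$ such that $\int_\Omega \|s_n - f\|\,d\mu \to 0$. For $w \in \Omega$ and $n, m \in \mathbb{N}$ we get
\[
0 \leq \big| \|s_n(w)\| - \|s_m(w)\| \big| \leq \|s_n(w) - s_m(w)\| \leq \|s_n(w) - f(w)\| + \|f(w) - s_m(w)\| ,
\]
which implies that
\[
\int_\Omega \big| \|s_n\| - \|s_m\| \big|\,d\mu \leq \int_\Omega \|s_n - f\|\,d\mu + \int_\Omega \|f - s_m\|\,d\mu \to 0 \quad \text{as } n, m \to \infty .
\]
Hence, $\{\|s_n\|\}_{n \geq 1} \subseteq L^1(\Omega)$ is a Cauchy sequence, thus bounded. We finally see that
\[
\int_\Omega \|f\|\,d\mu \leq \int_\Omega \|f - s_n\|\,d\mu + \int_\Omega \|s_n\|\,d\mu \leq M \quad \text{for some } M > 0 \text{ and for all } n \in \mathbb{N} .
\]

⇐: From Proposition 4.2.11, we may assume without any loss of generality that $\mu$ is finite. From the proof of the Pettis Measurability Theorem (see Theorem 4.2.4), we know that for a given $\varepsilon > 0$ we find a countably valued measurable function $h_\varepsilon : \Omega \to X$ such that
\[
\|h_\varepsilon(w) - f(w)\| \leq \varepsilon \quad \text{for all } w \in \Omega \setminus N \text{ with } \mu(N) = 0 . \tag{4.2.4}
\]
This gives
\[
\int_\Omega \|h_\varepsilon\|\,d\mu \leq \int_\Omega \|h_\varepsilon - f\|\,d\mu + \int_\Omega \|f\|\,d\mu \leq \varepsilon \mu(\Omega) + \int_\Omega \|f\|\,d\mu < \infty .
\]
Hence $\|h_\varepsilon(\cdot)\| \in L^1(\Omega)$ and we find $\delta > 0$ such that
\[
\int_A \|h_\varepsilon\|\,d\mu \leq \varepsilon \quad \text{for all } A \in \Sigma \text{ with } \mu(A) \leq \delta . \tag{4.2.5}
\]
We consider a $\Sigma$-partition $\Omega = E \cup A$ with $\mu(A) \leq \delta$ and such that $\hat{h}_\varepsilon = h_\varepsilon \chi_E$ has finite range, that is, $\hat{h}_\varepsilon$ is a simple function. Then, because of (4.2.4) and (4.2.5), we conclude that
\[
\int_\Omega \|f - \hat{h}_\varepsilon\|\,d\mu \leq \int_\Omega \|f - h_\varepsilon\|\,d\mu + \int_\Omega \|h_\varepsilon - \hat{h}_\varepsilon\|\,d\mu \leq \int_\Omega \|f - h_\varepsilon\|\,d\mu + \int_A \|h_\varepsilon\|\,d\mu \leq \varepsilon \mu(\Omega) + \varepsilon .
\]
Therefore, $f$ is Bochner integrable in the sense of Definition 4.2.8(b).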

The next result follows directly from the definitions for Bochner integrable simple functions and by approximation for general Bochner integrable functions.

Proposition 4.2.13. If $(\Omega, \Sigma, \mu)$ is a $\sigma$-finite measure space, $X, Y$ are Banach spaces, $T \in \mathcal{L}(X, Y)$, and $f : \Omega \to X$ and $T(f) : \Omega \to Y$ are both Bochner integrable, then
\[
T\left( \int_\Omega f\,d\mu \right) = \int_\Omega (T \circ f)\,d\mu .
\]

Corollary 4.2.14. If $(\Omega, \Sigma, \mu)$ is a $\sigma$-finite measure space, $X$ is a Banach space, and $f : \Omega \to X$ is Bochner integrable, then
\[
\left\langle x^*, \int_\Omega f\,d\mu \right\rangle = \int_\Omega \langle x^*, f \rangle\,d\mu \quad \text{for all } x^* \in X^* .
\]

The next result is a straightforward consequence of Definition 4.2.8.

Proposition 4.2.15. If $f, h : \Omega \to X$ are Bochner integrable functions and $\vartheta, \lambda \in \mathbb{R}$, then the following hold:
(a) $\int_A [\vartheta f + \lambda h]\,d\mu = \vartheta \int_A f\,d\mu + \lambda \int_A h\,d\mu$ for all $A \in \Sigma$.
(b) If $f(w) \leq h(w)$ $\mu$-a.e., then $\int_A f\,d\mu \leq \int_A h\,d\mu$ for all $A \in \Sigma$.

Proposition 4.2.16. If $f, h : \Omega \to X$ are Bochner integrable functions and $\int_A f\,d\mu = \int_A h\,d\mu$ for all $A \in \Sigma$, then $f(w) = h(w)$ $\mu$-a.e.

Proof. On account of the Pettis Measurability Theorem (see Theorem 4.2.4), without any loss of generality we may assume that $X$ is separable. Then according to Theorem 3.4.12(a), there exists a sequence $\{x_n^*\}_{n \geq 1} \subseteq B_1^{X^*}$ such that $\overline{\{x_n^*\}_{n \geq 1}}^{w^*} = B_1^{X^*}$. Applying Corollary 4.2.14 yields $\int_A \langle x_n^*, f - h \rangle\,d\mu = 0$ for all $A \in \Sigma$ and all $n \in \mathbb{N}$. Hence $\langle x_n^*, f(w) - h(w) \rangle = 0$ for all $w \in \Omega \setminus N$ with $\mu(N) = 0$ and $n \in \mathbb{N}$. Therefore, we obtain $\|f(w) - h(w)\| = 0$ for all $w \in \Omega \setminus N$ with $\mu(N) = 0$ and so $f = h$ $\mu$-a.e.

An interesting byproduct of this proof is the following corollary.

Corollary 4.2.17. If $f, h : \Omega \to X$ are strongly measurable and $\langle x^*, f(w) \rangle = \langle x^*, h(w) \rangle$ $\mu$-a.e. for all $x^* \in X^*$, the exceptional $\mu$-null set depending on $x^*$, then $f(w) = h(w)$ $\mu$-a.e.

Next we present a version of the mean value theorem for Bochner integrals.

Proposition 4.2.18. If $f : \Omega \to X$ is Bochner integrable and $A \in \Sigma$ with $0 < \mu(A) < \infty$, then $\frac{1}{\mu(A)} \int_A f\,d\mu \in \overline{\operatorname{conv}}\, f(A)$.

Proof. We argue by contradiction. So, suppose that $\frac{1}{\mu(A)} \int_A f\,d\mu \notin \overline{\operatorname{conv}}\, f(A)$. Then according to the Strong Separation Theorem (see Corollary 3.1.61), there exist $x^* \in X^* \setminus \{0\}$ and $\vartheta \in \mathbb{R}$ such that
\[
\left\langle x^*, \frac{1}{\mu(A)} \int_A f\,d\mu \right\rangle < \vartheta \leq \langle x^*, f(w) \rangle \quad \text{for all } w \in A .
\]
This implies, thanks to Corollary 4.2.14, that
\[
\frac{1}{\mu(A)} \int_A \langle x^*, f \rangle\,d\mu < \vartheta \leq \langle x^*, f(w) \rangle \quad \text{for all } w \in A .
\]
Therefore,
\[
\int_A \langle x^*, f \rangle\,d\mu < \vartheta \mu(A) \leq \int_A \langle x^*, f \rangle\,d\mu ,
\]
a contradiction. This proves the proposition.

The Lebesgue Dominated Convergence Theorem has its counterpart for the Bochner integral, as stated in the next theorem.

Theorem 4.2.19. If $\{f_n\}_{n \geq 1}$ is a sequence of Bochner integrable functions, $f_n \to f$ $\mu$-a.e. and there exists $\eta \in L^1(\Omega)$ such that $\|f_n(w)\| \leq \eta(w)$ $\mu$-a.e. for all $n \in \mathbb{N}$, then $f$ is Bochner integrable,
\[
\int_\Omega \|f_n - f\|\,d\mu \to 0 \quad \text{and} \quad \int_A f_n\,d\mu \to \int_A f\,d\mu \quad \text{for all } A \in \Sigma .
\]

Proof. By Corollary 4.2.6, $f$ is strongly measurable. Since $\|f_n(w)\| \leq \eta(w)$ $\mu$-a.e. for all $n \in \mathbb{N}$ with $\eta \in L^1(\Omega)$, we have $\|f(w)\| \leq \eta(w)$ $\mu$-a.e. and Proposition 4.2.12 implies that $f$ is Bochner integrable. By hypothesis, $\|f_n(w) - f(w)\| \to 0$ $\mu$-a.e. as $n \to \infty$ and $\|f_n(w) - f(w)\| \leq 2\eta(w)$ $\mu$-a.e. So, by the scalar Dominated Convergence Theorem we get that $\int_\Omega \|f_n - f\|\,d\mu \to 0$ as $n \to \infty$. Moreover, for every $A \in \Sigma$, we obtain that
\[
\left\| \int_A f_n\,d\mu - \int_A f\,d\mu \right\| = \left\| \int_\Omega (f_n - f) \chi_A\,d\mu \right\| \leq \int_\Omega \|f_n - f\| \chi_A\,d\mu \leq \int_\Omega \|f_n - f\|\,d\mu \to 0 \quad \text{as } n \to \infty .
\]

Now we are ready to introduce the analogs of the $L^p$-spaces for $1 \leq p \leq \infty$ for Banach space-valued functions. These spaces are known as Lebesgue–Bochner spaces.

Definition 4.2.20. Let $(\Omega, \Sigma, \mu)$ be a measure space and $X$ a Banach space.
(a) For $1 \leq p < \infty$ we define $L^p(\Omega, X)$ to be the space of all equivalence classes for the relation of equality $\mu$-a.e. of strongly measurable functions $f : \Omega \to X$ such that $\int_\Omega \|f\|^p\,d\mu < \infty$. This is a normed space with the norm defined by
\[
\|f\|_p = \left( \int_\Omega \|f\|^p\,d\mu \right)^{\frac{1}{p}} .
\]
(b) For $p = \infty$ we define $L^\infty(\Omega, X)$ to be the space of all equivalence classes of strongly measurable functions $f : \Omega \to X$ which are essentially bounded, that is,
\[
\operatorname*{ess\,sup}_\Omega \|f(w)\| = \inf \{M > 0 : \|f(w)\| \leq M \ \mu\text{-a.e.}\} < \infty .
\]
This is a normed space with the norm defined by $\|f\|_\infty = \operatorname{ess\,sup}_\Omega \|f(w)\|$.

Remark 4.2.21. By Proposition 4.2.12, $L^1(\Omega, X)$ coincides with the class of all Bochner integrable functions.

The next proposition is an easy consequence of the definition above and of the properties of the Bochner integral.

Proposition 4.2.22. If $(\Omega, \Sigma, \mu)$ is a measure space and $X$ is a Banach space, then the following hold:
(a) $L^p(\Omega, X)$ is a Banach space for every $1 \leq p \leq \infty$.
(b) The set of integrable simple functions is dense in $L^p(\Omega, X)$ for $1 \leq p < \infty$ and the countably valued functions in $L^\infty(\Omega, X)$ are dense in $L^\infty(\Omega, X)$.
(c) If $(\Sigma(\mu), d_\mu)$ is a separable metric space (see Definition 2.3.23), and $X$ is a separable Banach space, then $L^p(\Omega, X)$ is separable as well for $1 \leq p < \infty$.
(d) If $X$ is reflexive (resp. uniformly convex), then the same is true for $L^p(\Omega, X)$ for $1 < p < \infty$.
(e) If $X$ is continuously embedded into $Y$, then so is $L^p(\Omega, X)$ into $L^q(\Omega, Y)$ for $1 \leq q \leq p \leq \infty$.

A basic problem in the theory of Lebesgue–Bochner spaces is the identification of the dual of $L^p(\Omega, X)$ for $1 \leq p < \infty$. To do this, we need to introduce some basic definitions from the theory of vector measures.

Definition 4.2.23. Let $(\Omega, \Sigma)$ be a measurable space and $X$ a Banach space.
(a) We say that $\xi : \Sigma \to X$ is a vector measure if $\xi(\emptyset) = 0$ and for every $\{A_n\}_{n \geq 1} \subseteq \Sigma$ mutually disjoint, one has $\xi(\bigcup_{n \geq 1} A_n) = \sum_{n \geq 1} \xi(A_n)$ in the norm topology of $X$.
(b) If $\mu$ is a measure on $\Sigma$, we say that the vector measure $\xi : \Sigma \to X$ is $\mu$-continuous if $\lim_{\mu(A) \to 0} \xi(A) = 0$. This is equivalent to saying that $\mu(A) = 0$ implies $\xi(A) = 0$. As in the scalar case, we denote this by $\xi \ll \mu$.
(c) We say that a vector measure $\xi : \Sigma \to X$ is of bounded variation if
\[
|\xi|(\Omega) = \sup_P \sum_{A \in P} \|\xi(A)\| < \infty ,
\]
where $P$ runs through the finite $\Sigma$-partitions of $\Omega$. Similarly we can define this for $E \in \Sigma$ by
\[
|\xi|(E) = \sup_P \sum_{A \in P} \|\xi(A)\| ,
\]
where $P$ runs through the finite $\Sigma$-partitions of $E$. We call $|\xi|(\cdot)$ the variation of $\xi$ and it is a measure on $\Sigma$.
(d) The Banach space $X$ is said to have the Radon–Nikodym Property (the RNP for short) if for every probability measure $\mu$ on $\Sigma$ and every vector measure $\xi : \Sigma \to X$ of bounded variation with $\xi \ll \mu$, there exists $f \in L^1(\Omega, X)$ such that $\xi(A) = \int_A f\,d\mu$ for all $A \in \Sigma$.
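On a finite measure space the variation in Definition 4.2.23(c) can be computed by brute force. The sketch below assumes $\Omega = \{0, 1, 2, 3\}$ with point masses, $X = \mathbb{R}^2$ with the Euclidean norm, and the vector measure $\xi(A) = \int_A f\,d\mu$ for an illustrative density $f$. All data and the helper names `partitions`, `xi` and `variation` are hypothetical. In this setting the supremum over partitions is attained at the partition into singletons, which gives $|\xi|(\Omega) = \int_\Omega \|f\|\,d\mu$.

```python
import math

# Illustrative data: point masses mu({w}) and a density f : Omega -> R^2.
mu = {0: 0.5, 1: 0.5, 2: 1.0, 3: 0.25}
f = {0: (1.0, 0.0), 1: (-1.0, 0.0), 2: (0.0, 2.0), 3: (3.0, 4.0)}

def partitions(items):
    """Yield every partition of the list `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # place `first` into each existing block in turn ...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ... or into a new block of its own
        yield [[first]] + part

def xi(block):
    """Vector measure xi(A) = sum_{w in A} f(w) * mu({w})."""
    return tuple(sum(f[w][i] * mu[w] for w in block) for i in range(2))

def variation(omega):
    """sup over all partitions P of omega of sum_{A in P} ||xi(A)||."""
    return max(sum(math.hypot(*xi(block)) for block in part) for part in partitions(omega))

omega = sorted(mu)
total_variation = variation(omega)
integral_of_norm = sum(math.hypot(*f[w]) * mu[w] for w in omega)
print("|xi|(Omega)         :", total_variation)
print("integral of ||f||   :", integral_of_norm)
print("||xi(Omega)||       :", math.hypot(*xi(omega)))
```

The trivial partition $\{\Omega\}$ yields only $\|\xi(\Omega)\|$, which is smaller than the variation because the values of $f$ partially cancel.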

Remark 4.2.24. If $f \in L^1(\Omega, X)$, then from Proposition 4.2.11 we know that $\Sigma \ni A \mapsto \xi(A) = \int_A f\,d\mu$ is a vector measure. The RNP is not a property that every Banach space $X$ has. For example, let $X = c_0$ and let $(\Omega, \Sigma, \mu) = ([0, 1], \mathcal{B}([0, 1]), \lambda)$ with $\mathcal{B}([0, 1])$ being the Borel $\sigma$-algebra of $[0, 1]$ and $\lambda$ being the Lebesgue measure. We consider the vector measure $\xi : \Sigma \to c_0$ defined by $\xi(A) = \left( \int_A \cos(nt)\,dt \right)_{n \geq 1}$, which is well-defined by the Riemann–Lebesgue Lemma; see Hewitt–Stromberg [145, p. 249]. Clearly, $\xi$ is of bounded variation and $\xi \ll \lambda$. But $(\cos(nt))_{n \geq 1} \notin c_0$. So $\xi$ does not have a density in $L^1([0, 1], c_0)$. Hence $c_0$ does not have the RNP.

The RNP is a hereditary property, that is, every closed subspace of a Banach space with the RNP has the RNP. The next theorem identifies two major classes of Banach spaces that exhibit the RNP. For a proof of this result we refer to Diestel–Uhl [80, pp. 79, 82].

Theorem 4.2.25. If $X$ is a reflexive Banach space or $X$ is a separable dual Banach space, then $X$ has the RNP.

Using this notion, we can state our first result concerning the dual of a Lebesgue–Bochner space.

Theorem 4.2.26. If $(\Omega, \Sigma, \mu)$ is a $\sigma$-finite measure space and $X$ is a Banach space such that $X^*$ has the RNP, then $L^p(\Omega, X)^* = L^{p'}(\Omega, X^*)$ for all $1 \leq p < \infty$ with $1/p + 1/p' = 1$.

Proof. On account of Proposition 4.2.11, we may assume that $\mu$ is finite. Let $h \in L^{p'}(\Omega, X^*)$ and define
\[
\eta_h(f) = \int_\Omega \langle f, h \rangle\,d\mu \quad \text{for all } f \in L^p(\Omega, X) .
\]

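Evidently, $\eta_h : L^p(\Omega, X) \to \mathbb{R}$ is linear and
\[
|\eta_h(f)| \leq \int_\Omega |\langle f, h \rangle|\,d\mu \leq \int_\Omega \|f\|_X \|h\|_{X^*}\,d\mu \leq \|f\|_p \|h\|_{p'} ,
\]
which implies that
\[
\eta_h \in L^p(\Omega, X)^* \quad \text{and} \quad \|\eta_h\|_* \leq \|h\|_{p'} . \tag{4.2.6}
\]
First suppose that $h = \sum_{k \geq 1} x_k^* \chi_{A_k}$ with $x_k^* \in X^*$ and a partition $\{A_k\}_{k \geq 1} \subseteq \Sigma$ of $\Omega$ with $\mu(A_k) > 0$ for all $k \in \mathbb{N}$. Given $\varepsilon > 0$ we choose $\vartheta \in L^p(\Omega)$ with $\vartheta \geq 0$, $\vartheta \neq 0$ and $\|\vartheta\|_p \leq 1$ such that
\[
\|h\|_{p'} - \frac{\varepsilon}{2} \leq \int_\Omega \|h\|_{X^*} \vartheta\,d\mu . \tag{4.2.7}
\]
For each $k \in \mathbb{N}$, let $x_k \in X$ with $\|x_k\|_X = 1$ such that
\[
\|x_k^*\|_{X^*} - \frac{\varepsilon}{2\|\vartheta\|_1} \leq \langle x_k^*, x_k \rangle . \tag{4.2.8}
\]
Let $f = \sum_{k \geq 1} x_k \vartheta \chi_{A_k} \in L^p(\Omega, X)$. Then $\|f\|_p = \|\vartheta\|_p \leq 1$ and, because of (4.2.7) and (4.2.8),
\[
\int_\Omega \langle f, h \rangle\,d\mu = \int_\Omega \vartheta \sum_{k \geq 1} \langle x_k^*, x_k \rangle \chi_{A_k}\,d\mu \geq \int_\Omega \vartheta \sum_{k \geq 1} \left( \|x_k^*\|_{X^*} - \frac{\varepsilon}{2\|\vartheta\|_1} \right) \chi_{A_k}\,d\mu = \int_\Omega \vartheta \|h\|_{X^*}\,d\mu - \frac{\varepsilon}{2} \geq \|h\|_{p'} - \varepsilon .
\]
Letting $\varepsilon \searrow 0$ we conclude that in this case $\|\eta_h\|_* = \|h\|_{p'}$; see (4.2.6).

Now suppose that $h \in L^{p'}(\Omega, X^*)$ is general, not necessarily countably valued. Then from the proof of Theorem 4.2.4 we know that there exists a sequence $\{h_n\}_{n \geq 1}$ of countably valued functions such that $\|h_n - h\|_{p'} \to 0$. Moreover, we know that $\|\eta_{h_n}\|_* = \|h_n\|_{p'}$ for all $n \in \mathbb{N}$ and $\|\eta_{h_n} - \eta_h\|_* \leq \|h_n - h\|_{p'} \to 0$ as $n \to \infty$. Therefore,
\[
\|\eta_h\|_* = \lim_{n \to \infty} \|\eta_{h_n}\|_* = \lim_{n \to \infty} \|h_n\|_{p'} = \|h\|_{p'} .
\]
So, we have proved that $L^{p'}(\Omega, X^*)$ is contained isometrically as a subspace of $L^p(\Omega, X)^*$.

Now suppose that $X^*$ has the RNP. Let $\beta \in L^p(\Omega, X)^*$ and consider $\xi : \Sigma \to X^*$ defined by $\langle \xi(A), x \rangle = \beta(x \chi_A)$ for all $A \in \Sigma$ and for all $x \in X$. Clearly, $\xi$ is a vector valued measure. Let $\{A_k\}_{k=1}^n \subseteq \Sigma$ be a finite partition of $\Omega$ and let $\{x_k\}_{k=1}^n \subseteq B_1^X$, that is, $\|x_k\|_X \leq 1$ for all $k = 1, \ldots, n$. Then it follows that
\[
\left| \sum_{k=1}^n \langle \xi(A_k), x_k \rangle \right| = \left| \beta\left( \sum_{k=1}^n x_k \chi_{A_k} \right) \right| \leq \|\beta\|_* \left\| \sum_{k=1}^n x_k \chi_{A_k} \right\|_p \leq \|\beta\|_* \left\| \sum_{k=1}^n \chi_{A_k} \right\|_p \leq \|\beta\|_* \mu(\Omega)^{\frac{1}{p}} .
\]
Then $|\xi|(\Omega) < \infty$; see Definition 4.2.23(c). Since $X^*$ has the RNP there exists $h \in L^1(\Omega, X^*)$ such that $\xi(A) = \int_A h\,d\mu$ for all $A \in \Sigma$. Note that if $f \in L^p(\Omega, X)$ is simple, then $\beta(f) = \int_\Omega \langle h, f \rangle\,d\mu$. Let $\{A_k\}_{k \geq 1} \subseteq \Sigma$ be an increasing sequence such that $\Omega = \bigcup_{k \geq 1} A_k$ and $h|_{A_k}$ is bounded. We fix $k_0 \in \mathbb{N}$ and observe that $f \mapsto \int_{A_{k_0}} \langle h, f \rangle\,d\mu$ is a bounded linear functional on $L^p(\Omega, X)$, which agrees with $\beta$ on the simple functions that are supported on $A_{k_0}$. It follows that
\[
\beta(f \chi_{A_{k_0}}) = \int_\Omega \langle h \chi_{A_{k_0}}, f \rangle\,d\mu \quad \text{for all } f \in L^p(\Omega, X) .
\]
Note that $h \chi_{A_{k_0}}$ is bounded, hence $h \chi_{A_{k_0}} \in L^{p'}(\Omega, X^*)$ and $\|h \chi_{A_{k_0}}\|_{p'} \leq \|\beta\|_*$. Since this is true for all $k_0 \in \mathbb{N}$, by the Monotone Convergence Theorem, we get that $h \in L^{p'}(\Omega, X^*)$. Then
\[
\beta(f) = \lim_{k \to \infty} \int_\Omega \langle h, f \chi_{A_k} \rangle\,d\mu = \int_\Omega \langle h, f \rangle\,d\mu \quad \text{for all } f \in L^p(\Omega, X) .
\]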

Remark 4.2.27. In fact the converse is also true, namely if $L^p(\Omega, X)^* = L^{p'}(\Omega, X^*)$ with $1 \leq p < \infty$, then $X^*$ has the RNP; see Diestel–Uhl [80, p. 99].

We can also state a vector valued version of the Dunford–Pettis Theorem; see Theorem 4.1.18.

Theorem 4.2.28. Let $(\Omega, \Sigma, \mu)$ be a finite measure space, let $X$ be a Banach space such that both $X$ and $X^*$ have the RNP, and let $F \subseteq L^1(\Omega, X)$ satisfy the following conditions:
(a) $F$ is bounded.
(b) $F$ is uniformly integrable, that is, $\lim_{\mu(A) \to 0} \sup_{f \in F} \int_A \|f\|\,d\mu = 0$.
(c) The set $\{\int_A f\,d\mu : f \in F\}$ is relatively weakly compact for every $A \in \Sigma$.
Then $F \subseteq L^1(\Omega, X)$ is relatively weakly compact.

Proof. Let $\{f_n\}_{n \geq 1} \subseteq F$. Invoking the Pettis Measurability Theorem (see Theorem 4.2.4), there exists a countable algebra $L = \{A_k\}_{k \geq 1} \subseteq \Sigma$ such that if $\Sigma_1 = \sigma(L)$, then each $f_n$ is $\Sigma_1$-measurable. By a diagonalization process based on condition (c) and using the Eberlein–Smulian Theorem, we produce a subsequence $\{f_{n_m}\}_{m \geq 1}$ of $\{f_n\}_{n \geq 1}$ such that
\[
w\text{-}\lim_{m \to \infty} \int_{A_k} f_{n_m}\,d\mu \quad \text{exists for all } k \in \mathbb{N} .
\]
Therefore,
\[
w\text{-}\lim_{m \to \infty} \int_A f_{n_m}\,d\mu \quad \text{exists for all } A \in L .
\]
The condition (b) implies that $\{\int_A f_{n_m}\,d\mu\}_{m \geq 1} \subseteq X$ is a weakly Cauchy sequence for all $A \in \Sigma_1$. Hence, condition (c) implies that we can define a set function $\xi : \Sigma_1 \to X$ by
\[
\xi(A) = w\text{-}\lim_{m \to \infty} \int_A f_{n_m}\,d\mu \quad \text{for all } A \in \Sigma_1 .
\]
We see that $\lim_{\mu(A) \to 0} \langle x^*, \xi(A) \rangle = 0$ for each $x^* \in X^*$. Therefore $\xi$ is weakly countably additive and so by the Orlicz–Pettis Theorem (see Remark 3.5.41), we know that $\xi$ is a vector measure such that $\xi \ll \mu$. Next we show that $\xi$ is of bounded variation. One has
\[
\|\xi(A)\|_X \leq \liminf_{m \to \infty} \left\| \int_A f_{n_m}\,d\mu \right\| \quad \text{for all } A \in \Sigma_1 .
\]
So, if $P \subseteq \Sigma_1$ is a finite partition of $\Omega$, then, due to condition (a),
\[
\sum_{A \in P} \|\xi(A)\|_X \leq \sum_{A \in P} \liminf_{m \to \infty} \left\| \int_A f_{n_m}\,d\mu \right\| \leq \liminf_{m \to \infty} \sum_{A \in P} \left\| \int_A f_{n_m}\,d\mu \right\| \leq \sup_{m \in \mathbb{N}} \sum_{A \in P} \int_A \|f_{n_m}\|\,d\mu = \sup_{m \geq 1} \|f_{n_m}\|_1 \leq \sup_{f \in F} \|f\|_1 < \infty .
\]

Hence, $\xi$ is of bounded variation. Since $X$ has the RNP there exists $f \in L^1(\Omega, \Sigma_1, X)$ such that
\[
\xi(A) = \int_A f\,d\mu \quad \text{for all } A \in \Sigma_1 .
\]
We need to show that $f_{n_m} \xrightarrow{w} f$ in $L^1(\Omega, \Sigma_1, X)$ as $m \to \infty$. Hence we will also have weak convergence in $L^1(\Omega, X)$. Then according to the Eberlein–Smulian Theorem, $F \subseteq L^1(\Omega, X)$ is relatively weakly compact.

Note that $\int_A f_{n_m}\,d\mu \xrightarrow{w} \int_A f\,d\mu$ for all $A \in \Sigma_1$. Hence, for every countably valued $h \in L^\infty(\Omega, \Sigma_1, X^*)$ we have $\int_\Omega \langle h, f_{n_m} \rangle\,d\mu \to \int_\Omega \langle h, f \rangle\,d\mu$. But countably valued functions are dense in $L^\infty(\Omega, \Sigma_1, X^*)$; see Proposition 4.2.22(b). So, finally
\[
\int_\Omega \langle h, f_{n_m} \rangle\,d\mu \to \int_\Omega \langle h, f \rangle\,d\mu \quad \text{for all } h \in L^\infty(\Omega, \Sigma_1, X^*) .
\]
Thus, $f_{n_m} \xrightarrow{w} f$ in $L^1(\Omega, X)$.

Next we examine what the dual of $L^1(\Omega, X)$ is when $X$ is an arbitrary Banach space, that is, no condition is imposed on $X^*$; see Theorem 4.2.26.

Definition 4.2.29. Let $(\Omega, \Sigma, \mu)$ be a $\sigma$-finite measure space and let $X$ be a Banach space.
(a) Two functions $f, h : \Omega \to X^*$ which are $w^*$-measurable are said to be equivalent, denoted by $f \sim h$, if $\langle f(w), x \rangle = \langle h(w), x \rangle$ $\mu$-a.e. for all $x \in X$. The exceptional $\mu$-null set depends on $x \in X$ in general. Evidently $\sim$ is an equivalence relation.
(b) By $L^\infty(\Omega, X^*_{w^*})$ we denote the linear space of the equivalence classes for the relation $\sim$ of $w^*$-measurable functions $f : \Omega \to X^*$ such that
\[
|\langle f(w), x \rangle| \leq c \|x\| \quad \mu\text{-a.e., for all } x \in X \text{ and for some } c > 0 .
\]
The exceptional $\mu$-null set may depend on $x \in X$. The infimum of all such $c > 0$ is denoted by $\|f\|_{L^\infty(\Omega, X^*_{w^*})}$ and is a norm on $L^\infty(\Omega, X^*_{w^*})$.

Remark 4.2.30. If $X$ is separable and $f \in L^\infty(\Omega, X^*_{w^*})$, then the function $w \mapsto \|f(w)\|_{X^*}$ belongs to $L^\infty(\Omega)$ and it holds that $\|f\|_{L^\infty(\Omega, X^*_{w^*})} = \operatorname{ess\,sup}_\Omega \|f(\cdot)\|_{X^*}$. Some authors denote the space $L^\infty(\Omega, X^*_{w^*})$ by $L^\infty_w(\Omega, X^*)$.

Example 4.2.31. Let $(\Omega, \Sigma, \mu)$ be a nonatomic $\sigma$-finite measure space and let $X = l^2[0, 1] = \{x = (x_\alpha)_{\alpha \in [0,1]} \in \mathbb{R}^{[0,1]} : \|x\|^2 = \sum_{0 \leq \alpha \leq 1} |x_\alpha|^2 < \infty\}$. This means that $x_\alpha = 0$ except for at most a countable number of indices; see Definition 3.5.40. This is a nonseparable Hilbert space and $L^\infty(\Omega, X^*_{w^*}) = L^\infty_w(\Omega, X)$ consists of all functions $w \mapsto f(w) = (f_\alpha(w))_{\alpha \in [0,1]}$ with each $f_\alpha$ being $\Sigma$-measurable and essentially bounded with $\operatorname{ess\,sup}_\Omega |f_\alpha(\cdot)| \leq M$ for all $\alpha \in [0, 1]$. Consider the function $w \mapsto e(w) = (e_\alpha(w))_{\alpha \in [0,1]}$, where
\[
e_\alpha(w) = \begin{cases} 1 & \text{if } w = \alpha , \\ 0 & \text{otherwise} . \end{cases}
\]
Then $e \sim 0$ but $\|e(w)\| = 1$ for all $w \in \Omega$. Therefore a function in the equivalence class of zero in $L^\infty(\Omega, X^*_{w^*})$ may be nonzero everywhere. If we multiply $e(w)$ with a scalar function $\vartheta(w)$, we obtain another element in the same class with norm $|\vartheta(w)|$. Hence $w \mapsto \|f(w)\|_{X^*}$ need not be essentially bounded or even measurable for an element $f \in L^\infty(\Omega, X^*_{w^*})$.

The next remarkable result, known as the "Lifting Theorem", eliminates the exceptional $\mu$-null set from all elements in $L^\infty(\Omega)$ at once. For a proof of this result we refer to A. and C. Ionescu–Tulcea [162, Theorem IV. 3, p. 46]. In what follows, by $B(\Omega)$ we denote the space of all bounded functions $f : \Omega \to \mathbb{R}$ with the supremum norm.

Theorem 4.2.32 (Lifting Theorem). If $(\Omega, \Sigma, \mu)$ is a $\sigma$-finite measure space, then there exists a linear map $\rho : L^\infty(\Omega) \to B(\Omega)$ such that:
(a) $\rho(f) \sim f$;
(b) $\rho(1) = 1$, where $1$ is the function identically 1;
(c) $\rho(f)(w) \geq 0$ for all $w \in \Omega$ if $f(w) \geq 0$ $\mu$-a.e.
The map $\rho$ is called a linear lifting.

Proposition 4.2.33. If $(\Omega, \Sigma, \mu)$ is a $\sigma$-finite measure space, $X$ is a Banach space, and $K \in \mathcal{L}(L^1(\Omega), X^*)$, then there exists a unique $f \in L^\infty(\Omega, X^*_{w^*})$ such that
\[
\langle K(h), x \rangle = \int_\Omega h(w) \langle f(w), x \rangle\,d\mu \quad \text{for all } h \in L^1(\Omega) \text{ and for all } x \in X . \tag{4.2.9}
\]
Moreover, $\|K\|_{\mathcal{L}} = \|f\|_{L^\infty(\Omega, X^*_{w^*})}$. The map $S : \mathcal{L}(L^1(\Omega), X^*) \to L^\infty(\Omega, X^*_{w^*})$ defined by $S(K) = f$ is linear and surjective, that is, every $f \in L^\infty(\Omega, X^*_{w^*})$ corresponds to a $K \in \mathcal{L}(L^1(\Omega), X^*)$ via (4.2.9).

Proof. Let $x \in X$. Then $\eta_x(h) = \langle K(h), x\rangle$ is a bounded linear functional on $L^1(\Omega)$ since
$$|\eta_x(h)| \le \|K(h)\|_{X^*}\|x\|_X \le \|K\|_{\mathcal{L}}\|h\|_1\|x\|_X, \tag{4.2.10}$$
which shows that $\eta_x \in L^1(\Omega)^*$. Theorem 4.1.5 implies that there exists a unique $f_x \in L^\infty(\Omega)$ such that
$$\langle K(h), x\rangle = \int_\Omega h f_x\,d\mu \quad \text{for all } h \in L^1(\Omega) \quad \text{and} \quad \|f_x\|_\infty \le \|K\|_{\mathcal{L}}\|x\|_X; \tag{4.2.11}$$
see (4.2.10). Note that $x \to f_x \in L^\infty(\Omega)$ is linear and bounded; see (4.2.11). Let $\rho$ be the linear lifting from Theorem 4.2.32. Then $x \to \rho(f_x)(w)$ belongs to $X^*$ for every $w \in \Omega$; see (4.2.11). Therefore there exists $f(w) \in X^*$ with $\|f(w)\|_{X^*} \le \|K\|_{\mathcal{L}}$ (see again (4.2.11)) such that
$$\langle f(w), x\rangle = \rho(f_x)(w) \quad \text{for all } x \in X. \tag{4.2.12}$$
Then $f \in L^\infty(\Omega, X^*_{w^*})$ and $\|f(w)\|_{X^*} \le \|K\|_{\mathcal{L}}$ for all $w \in \Omega$. Hence, because of (4.2.11) and (4.2.12),
$$\|f\|_{L^\infty(\Omega, X^*_{w^*})} \le \|K\|_{\mathcal{L}} \tag{4.2.13}$$
and
$$\langle K(h), x\rangle = \int_\Omega h(w)\langle f(w), x\rangle\,d\mu. \tag{4.2.14}$$
From (4.2.14) it follows that $\|K\|_{\mathcal{L}} \le \|f\|_{L^\infty(\Omega, X^*_{w^*})}$, which becomes equality because of (4.2.13), that is, $\|K\|_{\mathcal{L}} = \|f\|_{L^\infty(\Omega, X^*_{w^*})}$. Evidently $S$ is linear.

Conversely, consider $f \in L^\infty(\Omega, X^*_{w^*})$. Then (4.2.9) defines a unique $K(h) \in X^*$ for every $h \in L^1(\Omega)$. In addition, $h \to K(h)$ is linear and $\|K\|_{\mathcal{L}} \le \|f\|_{L^\infty(\Omega, X^*_{w^*})}$. Reasoning as in the first part of the proof, we obtain $\hat f \in L^\infty(\Omega, X^*_{w^*})$ with $\|\hat f\|_{L^\infty(\Omega, X^*_{w^*})} \le \|K\|_{\mathcal{L}}$. Then $f - \hat f$ produces the zero operator (see (4.2.9)), hence $f \sim \hat f$.

Remark 4.2.34. In fact from the proof above we have $\sup_{w\in\Omega} \|f(w)\|_{X^*} = \|K\|_{\mathcal{L}}$. So, we can state the following lifting theorem for $L^\infty(\Omega, X^*_{w^*})$.

Corollary 4.2.35. If $(\Omega, \Sigma, \mu)$ is a σ-finite measure space and $X$ is a Banach space, then there exists a continuous linear map $\hat\rho : L^\infty(\Omega, X^*_{w^*}) \to L^\infty(\Omega, X^*_{w^*})$ such that
(a) $\hat\rho(f) \sim f$;
(b) $\sup_{w\in\Omega} \|\hat\rho(f)(w)\|_{X^*} = \|f\|_{L^\infty(\Omega, X^*_{w^*})}$.
This map $\hat\rho$ is called a lifting on $L^\infty(\Omega, X^*_{w^*})$.

Remark 4.2.36. Evidently, $\hat\rho$ depends on the lifting $\rho$ on $L^\infty(\Omega)$ stated in Theorem 4.2.32. Note that if $X$ is separable, then $\|f\|_{L^\infty(\Omega, X^*)} = \|K\|_{\mathcal{L}}$. In general, $L^\infty(\Omega, X^*_{w^*}) \ne L^\infty(\Omega, X^*)$ even if $X$ is separable.

Now we are ready to characterize $L^1(\Omega, X)^*$ for an arbitrary Banach space $X$.

Theorem 4.2.37. If $(\Omega, \Sigma, \mu)$ is a σ-finite measure space and $X$ is a Banach space, then $L^1(\Omega, X)^*$ is isometrically isomorphic to $L^\infty(\Omega, X^*_{w^*})$ and the duality pairing is given by
$$\langle f, g\rangle = \int_\Omega \langle f(w), g(w)\rangle_X\,d\mu \quad \text{for all } g \in L^1(\Omega, X),\ f \in L^\infty(\Omega, X^*_{w^*}).$$

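Proof. Let $f \in L^\infty(\Omega, X^*_{w^*})$. Then $\eta_f : L^1(\Omega, X) \to \mathbb{R}$ defined by $\eta_f(g) = \int_\Omega \langle f, g\rangle\,d\mu$ is bounded linear, hence $\eta_f \in L^1(\Omega, X)^*$ with $\|\eta_f\|_* \le \|f\|_{L^\infty(\Omega, X^*_{w^*})}$. We show that the opposite inequality also holds. So, let $\varepsilon > 0$. Then there exist $x \in X$ with $\|x\|_X = 1$ and $A \in \Sigma$ with $\mu(A) > 0$ such that
$$\|f\|_{L^\infty(\Omega, X^*_{w^*})} - \varepsilon \le \langle f(w), x\rangle_X \quad \text{for all } w \in A.$$
This implies
$$\big(\|f\|_{L^\infty(\Omega, X^*_{w^*})} - \varepsilon\big)\mu(A) \le \eta_f(\chi_A x).$$
Since $\|\chi_A x\|_{L^1(\Omega, X)} = \mu(A)$, we get $\|f\|_{L^\infty(\Omega, X^*_{w^*})} \le \|\eta_f\|_*$ because $\varepsilon > 0$ is arbitrary, and thus $\|\eta_f\|_* = \|f\|_{L^\infty(\Omega, X^*_{w^*})}$.

Next we show that every element in $L^1(\Omega, X)^*$ is of the form $\eta_f$ for some $f \in L^\infty(\Omega, X^*_{w^*})$. So let $\beta \in L^1(\Omega, X)^*$ and let $h \in L^1(\Omega)$. Then $\vartheta : X \to \mathbb{R}$ defined by $\vartheta(x) = \beta(hx)$ is linear and $|\vartheta(x)| = |\beta(hx)| \le \|\beta\|_*\|h\|_1\|x\|_X$, which gives
$$\vartheta \in X^* \quad \text{and} \quad \|\vartheta\|_{X^*} \le \|\beta\|_*\|h\|_1. \tag{4.2.15}$$
Hence, there exists $x^*_h \in X^*$ such that $\beta(hx) = \langle x^*_h, x\rangle$ and $\|x^*_h\|_{X^*} \le \|\beta\|_*\|h\|_1$; see (4.2.15). Consider the map $K : L^1(\Omega) \to X^*$ defined by $K(h) = x^*_h$. Then $K \in \mathcal{L}(L^1(\Omega), X^*)$ and $\|K\|_{\mathcal{L}} \le \|\beta\|_*$. Invoking Proposition 4.2.33 there exists a unique $f \in L^\infty(\Omega, X^*_{w^*})$ such that
$$\beta(hx) = \langle x^*_h, x\rangle = \int_\Omega h(w)\langle f(w), x\rangle_X\,d\mu \quad \text{for all } h \in L^1(\Omega) \text{ and for all } x \in X.$$
Thus, $\|\beta\|_* = \|f\|_{L^\infty(\Omega, X^*_{w^*})}$. Finally if we take $\{h_k\}_{k=1}^n \subseteq L^1(\Omega)$ and $\{x_k\}_{k=1}^n \subseteq X$, then
$$\beta\Big(\sum_{k=1}^n h_k x_k\Big) = \int_\Omega \Big\langle f(w), \sum_{k=1}^n h_k(w)x_k\Big\rangle_X d\mu.$$
But such functions are dense in $L^1(\Omega, X)$. Therefore,
$$\beta(g) = \int_\Omega \langle f(w), g(w)\rangle_X\,d\mu \quad \text{for all } g \in L^1(\Omega, X).$$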

Remark 4.2.38. If $X^*$ has the RNP, for example when $X^*$ is separable or when $X$ is reflexive, then $L^\infty(\Omega, X^*_{w^*}) = L^\infty(\Omega, X^*)$.

Next we introduce a notion that is a useful tool in the theory of evolution equations.

Definition 4.2.39. A triple $(X, H, X^*)$ of spaces is called an evolution triple (or Gelfand triple) if the following properties hold:
(a) $X \subseteq H$ and $X^*$ is the dual of $X$.
(b) $X$ is a separable reflexive Banach space.
(c) $H$ is a separable Hilbert space that is identified with its dual (pivot space); see Theorem 3.5.21.
(d) $X$ is embedded continuously and densely in $H$.

Remark 4.2.40. We can easily check that property (d) implies that $H^* = H$ is embedded into $X^*$ continuously. Moreover, the reflexivity of $X$ implies that the embedding $H \to X^*$ is also dense. So, we have $X \to H \to X^*$ with all embeddings being continuous and dense. If $2 \le p < \infty$, then $X = L^p[0,1]$, $H = L^2[0,1]$, $X^* = L^{p'}[0,1]$ with $1/p + 1/p' = 1$ is an evolution triple. Other evolution triples, useful in partial differential equations, can be produced using Sobolev functions; see Section 4.5.


In what follows, we denote by $\langle\cdot,\cdot\rangle$ the duality brackets for the pair $(X^*, X)$ and by $(\cdot,\cdot)$ we denote the inner product of $H$. Moreover by $\|\cdot\|$, $|\cdot|$, $\|\cdot\|_*$ we denote the norms of $X$, $H$, $X^*$, respectively. We easily see that
$$\langle\cdot,\cdot\rangle|_{H\times X} = (\cdot,\cdot), \quad \|\cdot\|_* \le c_1|\cdot| \quad \text{and} \quad |\cdot| \le c_2\|\cdot\| \quad \text{for some } c_1, c_2 > 0. \tag{4.2.16}$$

Definition 4.2.41. Let $T = [0, b]$ and let $X, Y$ be Banach spaces with $X \subseteq Y$ and $u \in L^1(T, X)$. Then the distributional derivative $du/dt$ of $u$ is understood as the linear operator $du/dt \in \mathcal{L}(C_c^\infty(0,b), Y)$ defined by
$$\frac{du}{dt}(\varphi) = -\int_0^b u\,\frac{d\varphi}{dt}\,dt \quad \text{for all } \varphi \in C_c^\infty(0,b).$$
Here $C_c^\infty(0,b)$ denotes the space of all $C^\infty$ functions with compact support. We write $du/dt = u'$ and if $u' \in L^1(T, Y)$, then $\int_0^b \varphi u'\,dt = -\int_0^b \varphi' u\,dt$ for all $\varphi \in C_c^\infty(0,b)$.

The next proposition is an immediate consequence of this definition.

Proposition 4.2.42. If $(X, H, X^*)$ is an evolution triple, $1 \le p < \infty$ and $u \in L^p(T, X)$, then the distributional derivative $u' \in L^{p'}(T, X^*) = L^p(T, X)^*$ with $1/p + 1/p' = 1$ exists if and only if there exists $v \in L^{p'}(T, X^*)$ such that
$$\int_0^b (u(t), x)\varphi'(t)\,dt = -\int_0^b \langle v(t), x\rangle\varphi(t)\,dt \quad \text{for all } x \in X \text{ and for all } \varphi \in C_c^\infty(0,b).$$
The distributional derivative is uniquely defined and $u' = v$.

Definition 4.2.43. Let $T = [0, b]$ and let $X, Y$ be Banach spaces such that $X \subseteq Y$ and let $1 \le p, q$. The space $W_{pq}(T, X, Y)$ is defined by
$$W_{pq}(T, X, Y) = \{u \in L^p(T, X) : u' \in L^q(T, Y)\},$$
where $u'$ denotes the distributional derivative of $u$; see Definition 4.2.41. The space $W_{pq}(T, X, Y)$ is equipped with the norm
$$\|u\|_W = \|u\|_{L^p(T,X)} + \|u'\|_{L^q(T,Y)}.$$
This is clearly a Banach space. If $X = Y$ and $p = q$, then $W_{pp}(T, X, X) = W^{1,p}((0,b), X)$ and this is a vector valued Sobolev space; see Section 4.5. Finally when $Y = X^*$ and $q = p'$ with $1/p + 1/p' = 1$, then we write $W_p(0,b) = W_{pp'}(T, X, X^*)$.

Remark 4.2.44. If $X, Y$ are separable and $1 < p, q < +\infty$, then $W_{pq}(0,b)$ is a separable and reflexive Banach space.

Proposition 4.2.45. If $T = [0, b]$ and if $X, Y$ are Banach spaces with $X \to Y$ continuously and $1 \le p, q$, then $W_{pq}(0,b) \to C(T, Y)$ continuously.

Proof. Let $u \in W_{pq}(0,b)$. Then from Definition 4.2.43 we know that $u'$ is Bochner integrable. We set $v(t) = \int_0^t u'(s)\,ds$. Then
$$\|v(t) - v(\tau)\|_Y = \Big\|\int_\tau^t u'(s)\,ds\Big\|_Y \le \int_\tau^t \|u'(s)\|_Y\,ds.$$
Hence $v : T \to Y$ is continuous. But $v = u + y$ with $y \in Y$. Therefore $u : T \to Y$ is continuous as well. From Hölder’s inequality we get
$$\|v(t)\|_Y \le \int_0^b \|u'(t)\|_Y\,dt = \|u'\|_{L^1(T,Y)} \le c_1\|u'\|_{L^q(T,Y)} \tag{4.2.17}$$
for some $c_1 > 0$. Taking into account (4.2.16), (4.2.17) as well as the fact that $X \to Y$ continuously, we obtain
$$\|y\|_Y = \frac{1}{b^{1/p}}\Big[\int_0^b \|y\|_Y^p\,dt\Big]^{1/p} = \frac{1}{b^{1/p}}\|v - u\|_{L^p(T,Y)} \le \frac{1}{b^{1/p}}\big[\|v\|_{L^p(T,Y)} + \|u\|_{L^p(T,Y)}\big] \le \frac{1}{b^{1/p}}\big[b^{1/p}c_1\|u'\|_{L^q(T,Y)} + c_2\|u\|_{L^p(T,Y)}\big] \tag{4.2.18}$$
for some $c_2 > 0$. Then, applying Hölder’s inequality in (4.2.17) and using (4.2.18), we derive
$$\|u\|_{C(T,Y)} = \sup_{t\in T}\|v(t) - y\|_Y = \sup_{t\in T}\Big\|\int_0^t u'(s)\,ds - y\Big\|_Y \le \int_0^b \|u'(s)\|_Y\,ds + \|y\|_Y \le 2c_1\|u'\|_{L^q(T,Y)} + \frac{c_2}{b^{1/p}}\|u\|_{L^p(T,Y)} \le c_3\|u\|_W$$
for some $c_3 > 0$. Thus, $W_{pq}(0,b) \to C(T, Y)$ continuously.

Proposition 4.2.46. If $(X, H, X^*)$ is an evolution triple, then $C^1(T, X)$ is dense in $W_p(0,b)$ and $W_p(0,b) \to C(T, H)$ continuously and densely.

Proof. Let $a < 0 < b < d$ and let $u \in W_p(0,b)$. We extend $u$ to $(a, 0)$ and to $(b, d)$ by symmetry. Let $\varphi \in C_c^\infty(a,d)$ with $\varphi|_T = 1$ and set $\hat u = \varphi u$. Evidently $\hat u \in W_p(a,d)$ and $\hat u|_T = u$. Moreover, we get
$$\|u\|_{W_p(0,b)} \le \|\hat u\|_{W_p(a,d)} \le c(\varphi)\|u\|_{W_p(0,b)}$$
with $c(\varphi) > 0$ depending on the test function $\varphi$. Note that $\hat u$ vanishes on some neighborhoods of $a$ and $d$. Let $\vartheta$ be the standard mollifier (see Definition 4.1.27(c)), and let $\{u_m = \vartheta_{1/m} * \hat u\}_{m\ge1}$ be the regularizations of $\hat u$; see Definition 4.1.27(d). Then, as in Proposition 4.1.28, one has $u_m \in C_c^\infty((a,d), X)$ for every $m \ge 1$ large enough and $u_m \to \hat u$ in $W_p(a,d)$ with $\|u_m\|_{W_p(a,d)} \le \|\hat u\|_{W_p(a,d)}$. It follows that $C^1(T, X)$ is dense in $W_p(0,b)$. Moreover, for every $m, n \in \mathbb{N}$, we infer that
$$\frac12\frac{d}{dt}|u_m(t) - u_n(t)|^2 = \big((u_m)'(t) - (u_n)'(t),\ u_m(t) - u_n(t)\big).$$
Then, by applying (4.2.16), it results in
$$\frac12|u_m(t) - u_n(t)|^2 = \int_a^t \big\langle (u_m)'(s) - (u_n)'(s),\ u_m(s) - u_n(s)\big\rangle\,ds \le \int_a^t \|(u_m)'(s) - (u_n)'(s)\|_{X^*}\|u_m(s) - u_n(s)\|_X\,ds \le \frac12\|u_m - u_n\|^2_{W_p(a,d)}$$
for all $t \in (a, d)$.

Hence, $\{u_m\}_{m\ge1}$ is a Cauchy sequence in $C([a,d], H)$. Therefore $u_m \to \hat u$ in $C([a,d], H)$ as $m \to \infty$ and $\|\hat u\|_{C([a,d],H)} \le \|\hat u\|_{W_p(a,d)}$. So, we conclude that $u \in C(T, H)$. More precisely, there is a class representative with this property, and $W_p(0,b) \to C(T, H)$ continuously and of course densely.

An interesting byproduct of the proof above is the following integration by parts formula.

Corollary 4.2.47. If $(X, H, X^*)$ is an evolution triple and $u, v \in W_p(0,b)$, then
$$\frac{d}{dt}(u(t), v(t)) = \langle u'(t), v(t)\rangle + \langle u(t), v'(t)\rangle \quad \text{for a.a. } t \in T.$$

The embedding of $W_p(0,b)$ into $C(T, H)$ is not compact in general. However we can prove compact embedding of $W_p(0,b)$ into $L^p(T, H)$. This is a particular case of Theorem 4.2.49 below. First we need an interpolation-type lemma known as “Ehrling’s inequality.”

Lemma 4.2.48 (Ehrling’s inequality). If $X, Y, V$ are Banach spaces such that $X \subseteq Y \subseteq V$ with the embedding of $X$ into $Y$ being compact and the embedding of $Y$ into $V$ being continuous, then, for a given $\varepsilon > 0$ there exists $c(\varepsilon) > 0$ such that
$$\|x\|_Y \le \varepsilon\|x\|_X + c(\varepsilon)\|x\|_V \quad \text{for all } x \in X. \tag{4.2.19}$$

Proof. Suppose that (4.2.19) is false. Then there exist $\varepsilon > 0$ and a sequence $\{x_n\}_{n\ge1} \subseteq X$ such that
$$\|x_n\|_Y > \varepsilon\|x_n\|_X + n\|x_n\|_V \quad \text{for all } n \in \mathbb{N}.$$
We set $u_n = x_n/\|x_n\|_X$ for all $n \in \mathbb{N}$. Then $\|u_n\|_X = 1$ for all $n \in \mathbb{N}$ and
$$\|u_n\|_Y \ge \varepsilon + n\|u_n\|_V \quad \text{for all } n \in \mathbb{N}. \tag{4.2.20}$$
The set $\{u_n\}_{n\ge1} \subseteq X$ is bounded. Since by hypothesis $X \to Y$ compactly and $Y \to V$ continuously, by passing to a suitable subsequence if necessary, we may assume that
$$u_n \to u \quad \text{in } Y \text{ and in } V \text{ as } n \to \infty. \tag{4.2.21}$$
Since $\{u_n\}_{n\ge1}$ is bounded in $Y$, (4.2.20) gives $\|u_n\|_V \le \|u_n\|_Y/n \to 0$. So from (4.2.20) and (4.2.21) we infer that $\|u\|_V = 0$ and $\|u\|_Y \ge \varepsilon > 0$, a contradiction.

Using this lemma, we can prove the following theorem concerning the embedding of $W_p(0,b)$ into $L^p(T, H)$.

Theorem 4.2.49. If $X, Y, V$ are Banach spaces, $X, V$ are reflexive, $X \subseteq Y \subseteq V$, the embedding of $X$ into $Y$ is compact, the embedding of $Y$ into $V$ is continuous, and $1 < p < \infty$ as well as $1 \le q < \infty$, then the embedding $W_{pq}(T, X, V) \to L^p(T, Y)$ is compact.

Proof. We need to show that bounded sets in $W_{pq}(T, X, V)$ are relatively compact in $L^p(T, Y)$. So let $\{u_n\}_{n\ge1} \subseteq W_{pq}(T, X, V)$ be bounded. Then $\{u_n\}_{n\ge1} \subseteq L^p(T, X)$ is bounded and $L^p(T, X)$ is reflexive; see Proposition 4.2.22(d). Therefore, we may assume that
$$u_n \xrightarrow{w} u \quad \text{in } L^p(T, X) \text{ as } n \to \infty. \tag{4.2.22}$$
Recall that $L^q(T, V) \subseteq L^1(T, V)$. So we obtain
$$\{u_n'\}_{n\ge1} \subseteq L^1(T, V) \text{ is bounded.} \tag{4.2.23}$$
Without any loss of generality we may assume that $u = 0$; otherwise replace $u_n$ by $u_n - u$. Applying Lemma 4.2.48 we see that for given $\varepsilon > 0$
$$\|u_n\|^p_{L^p(T,Y)} \le \varepsilon\|u_n\|^p_{L^p(T,X)} + c(\varepsilon)\|u_n\|^p_{L^p(T,V)} \quad \text{for all } n \in \mathbb{N}. \tag{4.2.24}$$

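Pick $\delta \in (0, b/2]$. For $t \in [0, b/2]$ we can write
$$u_n(t) = \tilde u_n(t) + y_n(t) \quad \text{with} \quad \tilde u_n(t) = \frac1\delta\int_0^\delta u_n(t+s)\,ds. \tag{4.2.25}$$
Note that, by using integration by parts,
$$\int_0^\delta \Big(\frac{s}{\delta} - 1\Big)\frac{d}{ds}u_n(t+s)\,ds = \Big[\Big(\frac{s}{\delta} - 1\Big)u_n(t+s)\Big]_0^\delta - \frac1\delta\int_0^\delta u_n(t+s)\,ds = u_n(t) - \tilde u_n(t) = y_n(t) \quad \text{for } t \in \Big[0, \frac b2\Big].$$
Hence,
$$\int_0^{b/2}\|u_n(t)\|_V^p\,dt \le 2^{p-1}\Big[\int_0^{b/2}\|\tilde u_n(t)\|_V^p\,dt + \int_0^{b/2}\|y_n(t)\|_V^p\,dt\Big]. \tag{4.2.26}$$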
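The splitting in (4.2.25) and the integration by parts identity for $y_n$ can be checked numerically in the simplest setting. The following is only an illustrative sketch under assumptions that are not in the text: the function is scalar valued ($X = Y = V = \mathbb{R}$), $u = \sin$, and the integrals are approximated with a composite midpoint rule; the helper names are hypothetical.

```python
import math

def midpoint_integral(g, lo, hi, n=20000):
    """Composite midpoint rule for the integral of g over [lo, hi]."""
    step = (hi - lo) / n
    return step * sum(g(lo + (k + 0.5) * step) for k in range(n))

def averaged(u, t, delta):
    """Average (1/delta) * int_0^delta u(t+s) ds, as in (4.2.25)."""
    return midpoint_integral(lambda s: u(t + s), 0.0, delta) / delta

def remainder(du, t, delta):
    """int_0^delta (s/delta - 1) u'(t+s) ds, the integrated-by-parts form of y(t)."""
    return midpoint_integral(lambda s: (s / delta - 1.0) * du(t + s), 0.0, delta)

# Assumed test data: u = sin, u' = cos, one point t and one window delta.
t, delta = 0.3, 0.5
u, du = math.sin, math.cos
lhs = u(t) - averaged(u, t, delta)   # y(t) = u(t) - u_tilde(t)
rhs = remainder(du, t, delta)        # same quantity via integration by parts
print(lhs, rhs)
```

Both printed values agree up to quadrature error, which is the scalar version of $y_n(t) = u_n(t) - \tilde u_n(t)$.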

Moreover, we have the following estimate:
$$\int_0^{b/2}\|y_n(t)\|_V^p\,dt \le \int_0^{b/2}\Big[\int_0^\delta\Big(1 - \frac{s}{\delta}\Big)\|u_n'(t+s)\|_V\,ds\Big]^p dt = \big\|\,\|u_n'\|_V * \eta_\delta\big\|^p_{L^p[0,\frac b2]},$$
where $*$ denotes convolution and $\eta_\delta(t) = [t/\delta + 1]\chi_{(-\delta,0]}(t)$. Then, taking Remark 4.1.33 into account, one has
$$\big\|\,\|u_n'\|_V * \eta_\delta\big\|_{L^p[0,\frac b2]} \le \|u_n'\|_{L^1([0,\frac b2],V)}\|\eta_\delta\|_p.$$
So, finally we obtain, due to (4.2.23), that
$$\int_0^{b/2}\|y_n(t)\|_V^p\,dt \le \|u_n'\|^p_{L^1([0,b],V)}\sqrt{\delta} \le \hat c\sqrt{\delta} \tag{4.2.27}$$
for some $\hat c > 0$ and for all $n \in \mathbb{N}$. From (4.2.22), (4.2.25) and recalling that $u = 0$, we have
$$\tilde u_n(t) \xrightarrow{w} 0 \quad \text{in } X \text{ for all } t \in T.$$
Since $X \to Y$ is compact and $Y \to V$ is continuous, we get
$$\tilde u_n(t) \to 0 \quad \text{in } Y \text{ for all } t \in T, \qquad \tilde u_n(t) \to 0 \quad \text{in } V \text{ for all } t \in T.$$
This gives
$$\|\tilde u_n(t)\|_V \le c_0\|\tilde u_n(t)\|_X \le \frac{c_0}{\delta}\int_0^\delta\|u_n(t+s)\|_X\,ds \le \frac{c_0}{\delta}\,b^{1/p'}\|u_n\|_{L^p(T,X)}$$
for some $c_0 > 0$ and for all $n \in \mathbb{N}$. Hence $\{\tilde u_n(t)\}_{n\ge1} \subseteq V$ is bounded uniformly for $t \in T$. Then, from the Lebesgue Dominated Convergence Theorem, we obtain
$$\int_0^{b/2}\|\tilde u_n(t)\|_V^p\,dt \to 0 \quad \text{as } n \to \infty,$$
which, because of (4.2.26) and (4.2.27), results in
$$\int_0^{b/2}\|u_n(t)\|_V^p\,dt \to 0 \quad \text{as } n \to \infty.$$
In a similar way we show that
$$\int_{b/2}^b\|u_n(t)\|_V^p\,dt \to 0 \quad \text{as } n \to \infty,$$
which implies
$$\|u_n\|^p_{L^p(T,V)} \to 0 \quad \text{and} \quad \|u_n\|^p_{L^p(T,Y)} \to 0 \quad \text{as } n \to \infty;$$

see (4.2.24) and (4.2.22), and recall that $\varepsilon > 0$ is arbitrary.

Proposition 4.2.50. If $X, Y$ are Banach spaces, $X$ is reflexive, $X \to Y$ is continuous, and $u \in L^\infty(T, X) \cap C(T, Y_w)$, then $u \in C(T, X_w)$, where $X_w$ and $Y_w$ denote the spaces $X$ and $Y$ endowed with their weak topologies, respectively.

Proof. By replacing $Y$ with $\overline{X}^{\|\cdot\|_Y}$ if necessary we may assume that the embedding of $X$ into $Y$ is also dense. Then $Y^* \to X^*$ is continuous and dense since $X$ is reflexive; see Remark 4.2.40. Let $t_n \to t$ in $T$. Since $u \in C(T, Y_w)$, we have
$$u(t_n) \xrightarrow{w} u(t) \quad \text{in } Y \text{ as } n \to \infty.$$
First we show that $u(t) \in X$ for all $t \in T$ and that $\|u(t)\|_X \le \|u\|_{L^\infty(T,X)}$ for all $t \in T$. To this end, we extend the function $u$ by zero outside $T$ and denote this extension by $\hat u$. Using mollification we regularize $\hat u$ and obtain a sequence $\{u_n\}_{n\ge1} \subseteq C^1(T, X)$ such that
$$\|u_n(t)\| \le \|u\|_{L^\infty(T,X)} \quad \text{for all } t \in T \text{ and for all } n \in \mathbb{N}, \qquad u_n(t) \xrightarrow{w} u(t) \quad \text{in } Y \text{ and for all } t \in T. \tag{4.2.28}$$
Using (4.2.28) one has for all $y^* \in Y^*$ and for all $t \in T$
$$|\langle y^*, u_n(t)\rangle_Y| = |\langle y^*, u_n(t)\rangle_X| \le \|y^*\|_{X^*}\|u_n(t)\|_X \le \|y^*\|_{X^*}\|u_n\|_{L^\infty(T,X)} \le \|y^*\|_{X^*}\|u\|_{L^\infty(T,X)}. \tag{4.2.29}$$
The density of $Y^*$ in $X^*$ and (4.2.29) imply that
$$u(t) \in X \quad \text{and} \quad \|u(t)\|_X \le \|u\|_{L^\infty(T,X)} \quad \text{for all } t \in T. \tag{4.2.30}$$

Let $x^* \in X^*$. The density of $Y^*$ in $X^*$ implies that there exists $\{y^*_m\}_{m\ge1} \subseteq Y^*$ such that $y^*_m \to x^*$ in $X^*$ as $m \to \infty$. Thanks to (4.2.30) one gets
$$\langle y^*_m, u(t_n)\rangle_X \to \langle y^*_m, u(t)\rangle_X \quad \text{as } n \to \infty \text{ and for all } m \in \mathbb{N}$$
and
$$\langle y^*_m, u(t)\rangle_X \to \langle x^*, u(t)\rangle_X \quad \text{as } m \to \infty.$$
So, we can find a sequence $\{m(n)\}_{n\ge1}$ not necessarily strictly increasing such that $m(n) \to \infty$ as $n \to \infty$ and
$$\langle y^*_{m(n)}, u(t_n)\rangle_X \to \langle x^*, u(t)\rangle_X \quad \text{as } n \to \infty. \tag{4.2.31}$$
Finally, because of (4.2.30) and (4.2.31), we infer that
$$|\langle x^*, u(t_n)\rangle_X - \langle x^*, u(t)\rangle_X| \le |\langle x^*, u(t_n)\rangle_X - \langle y^*_{m(n)}, u(t_n)\rangle_X| + |\langle y^*_{m(n)}, u(t_n)\rangle_X - \langle x^*, u(t)\rangle_X| \le \|x^* - y^*_{m(n)}\|_{X^*}\|u\|_{L^\infty(T,X)} + |\langle y^*_{m(n)}, u(t_n)\rangle_X - \langle x^*, u(t)\rangle_X| \to 0$$
as $n \to \infty$. Hence, $u \in C(T, X_w)$.

We conclude this section with a brief mention of another weaker integral for Banach space valued functions. This is called the Pettis integral.

Definition 4.2.51. Let $(\Omega, \Sigma, \mu)$ be a measure space and $X$ a Banach space with dual $X^*$. Suppose that $f : \Omega \to X$ is weakly measurable and $\langle x^*, f(\cdot)\rangle \in L^1(\Omega)$ for all $x^* \in X^*$. We call such functions weakly integrable. We say that $f$ is Pettis integrable if for all $A \in \Sigma$ there exists $x_A \in X$ such that
$$\langle x^*, x_A\rangle = \int_A \langle x^*, f(w)\rangle\,d\mu \quad \text{for all } x^* \in X^*.$$

We write $x_A = P\text{-}\int_A f\,d\mu$ and call it the Pettis integral of $f$.

Remark 4.2.52. Clearly a Bochner integrable function is Pettis integrable and the two integrals coincide. The converse is not true in general.

Proposition 4.2.53. If $(\Omega, \Sigma, \mu)$ is a finite nonatomic measure space and $X$ is a Banach space, then the following statements are equivalent:
(a) $X$ is finite dimensional.
(b) Every Pettis integrable function is also Bochner integrable.
(c) Every strongly measurable and Pettis integrable function is Bochner integrable.

Proof. (a) ⇒ (b): Let $\xi(A) = P\text{-}\int_A f\,d\mu$ for all $A \in \Sigma$. This is a vector measure of bounded variation, $\xi \ll \mu$ and so by the Radon–Nikodym Theorem it holds that $f \in L^1(\Omega, X)$.
(b) ⇒ (c): This is clear.
(c) ⇒ (a): Suppose that $\dim X = +\infty$. Then according to the Dvoretzky–Rogers Theorem (see Remark 3.5.41), there exists a sequence $\{x_n\}_{n\ge1} \subseteq X$ such that
$$\sum_{n\ge1} x_n \text{ is unconditionally convergent and } \sum_{n\ge1}\|x_n\| = +\infty.$$
Let $\{A_n\}_{n\ge1} \subseteq \Sigma$ be a partition of $\Omega$ with $\mu(A_n) > 0$ for all $n \in \mathbb{N}$. Recall that $\mu$ is nonatomic. Then $f = \sum_{n\ge1} \frac{x_n}{\mu(A_n)}\chi_{A_n}$ is strongly measurable and Pettis integrable, but $f \notin L^1(\Omega, X)$; see Proposition 4.2.12.

The next proposition describes an essential difference between Bochner and Pettis integrable functions.
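To make the construction in the proof above concrete, here is a small numerical sketch. It assumes, purely for illustration, that $X = l^2$ and $x_n = e_n/n$ with $e_n$ the standard unit vectors; this specific choice is not taken from the text, which only uses the existence statement of the Dvoretzky–Rogers Theorem. For this choice $\sum\|x_n\| = \sum 1/n$ diverges, while the partial sums of $\sum x_n$ stay bounded in norm by $\pi/\sqrt6$.

```python
import math

def pettis_example(num_terms):
    """Compare sum ||x_n|| with ||sum x_n|| for the assumed choice x_n = e_n / n in l^2."""
    # Bochner integrability of f = sum (x_n / mu(A_n)) chi_{A_n} needs sum ||x_n|| < infinity.
    sum_of_norms = sum(1.0 / n for n in range(1, num_terms + 1))
    # The vectors e_n / n are orthogonal, so the norm of the partial sum is sqrt(sum 1/n^2).
    norm_of_sum = math.sqrt(sum(1.0 / n ** 2 for n in range(1, num_terms + 1)))
    return sum_of_norms, norm_of_sum

sum_of_norms, norm_of_sum = pettis_example(10000)
print(sum_of_norms, norm_of_sum)
```

The first number grows like $\log N$ as more terms are added, the second stays below $\pi/\sqrt6$, which mirrors the gap between absolute and unconditional convergence used in the proof.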

Proposition 4.2.54. If $(\Omega, \Sigma, \mu)$ is a finite measure space, $X$ is a Banach space, and $f : \Omega \to X$ is a function such that $f = \sum_{n\ge1} x_n\chi_{A_n}$ with $\{x_n\}_{n\ge1} \subseteq X$ and $\{A_n\}_{n\ge1} \subseteq \Sigma$ pairwise disjoint, then the following hold:
(a) $f$ is Bochner integrable if and only if $\sum_{n\ge1} x_n\mu(A_n)$ is absolutely convergent in $X$.
(b) $f$ is Pettis integrable if and only if $\sum_{n\ge1} x_n\mu(A_n)$ is unconditionally convergent in $X$.
In both cases the integral over $A \in \Sigma$ equals $\sum_{n\ge1} x_n\mu(A \cap A_n)$.

Definition 4.2.55. Let $(\Omega, \Sigma, \mu)$ be a finite measure space and let $X$ be a Banach space.
(a) Two weakly measurable functions $f, h : \Omega \to X$ are said to be weakly equivalent if
$$\langle x^*, f(w)\rangle = \langle x^*, h(w)\rangle \quad \mu\text{-a.e. and for all } x^* \in X^*.$$
The exceptional μ-null set depends on $x^* \in X^*$. By $\sim$ we denote the equivalence relation of weak equivalence of functions as above.
(b) Let $\mathcal{P}(\mu, X)$ be the space of all Pettis integrable functions and set $P(\mu, X) = \mathcal{P}(\mu, X)/\sim$. This is a normed space when equipped with the norm
$$\|f\|_{Pe} = \sup\Big[\int_\Omega |\langle x^*, f\rangle|\,d\mu : x^* \in X^*,\ \|x^*\|_* \le 1\Big].$$

Remark 4.2.56. This norm satisfies
$$\sup_{A\in\Sigma}\Big\|\int_A f\,d\mu\Big\| \le \|f\|_{Pe} \le 2\sup_{A\in\Sigma}\Big\|\int_A f\,d\mu\Big\| \quad \text{for all } f \in P(\mu, X).$$
Moreover we have $\|f\|_{Pe} \le \|f\|_1$, and they are not equivalent unless $X$ is finite dimensional.

4.3 Functions of Bounded Variations

We start by examining monotone functions.

Definition 4.3.1. Let $A \subseteq \mathbb{R}$ and consider a function $f : A \to \mathbb{R}$.
(a) We say that $f$ is increasing (resp. decreasing) if for all $x, u \in A$ with $x < u$ we have $f(x) \le f(u)$ (resp. $f(u) \le f(x)$).
(b) We say that $f$ is strictly increasing (resp. strictly decreasing) if the inequalities from (a) are strict.

Remark 4.3.2. An increasing or decreasing function is called monotone. Similarly a strictly increasing or strictly decreasing function is called strictly monotone. Of course strictly monotone functions are monotone. A monotone function need not be continuous. The next proposition tells us how discontinuous it can be.


Proposition 4.3.3. If $T \subseteq \mathbb{R}$ is an interval and $f : T \to \mathbb{R}$ is a monotone function, then the set of discontinuity points of $f$ is countable.

Proof. First suppose that $T = [a, b]$ and that $f$ is increasing. The reasoning is similar if $f$ is decreasing. Let $x \in (a, b)$. Then the following limits exist:
$$f_+(x) = \lim_{u\to x^+} f(u) \quad \text{and} \quad f_-(x) = \lim_{u\to x^-} f(u).$$
Obviously it holds that $s(x) = f_+(x) - f_-(x) \ge 0$ for all $x \in (a, b)$. This is the “jump” of $f$ at $x$. Evidently $f$ is continuous at $x$ if and only if $s(x) = 0$. For each $n \in \mathbb{N}$, let $A_n = \{x \in (a, b) : s(x) \ge 1/n\}$. If $x_1, \ldots, x_m \in A_n$, then it follows that
$$f(b) - f(a) \ge \sum_{k=1}^m [f_+(x_k) - f_-(x_k)] \ge \frac{m}{n},$$
which implies that $A_n$ is finite for all $n \in \mathbb{N}$ and so $\bigcup_{n\ge1} A_n$ is countable. But this union is clearly the set of discontinuity points of $f$ in $(a, b)$.

Now suppose that $T$ is an arbitrary interval. Let $T_n = [a_n, b_n]$ with $n \in \mathbb{N}$ such that
$$a_n \searrow \inf T \quad \text{and} \quad b_n \nearrow \sup T \quad \text{as } n \to \infty.$$
On each $T_n$ the set $D_n$ of discontinuity points of $f|_{T_n}$ is countable. Hence $\bigcup_{n\ge1} D_n$ is countable and is the set of discontinuity points of $f$ on $T$.

There is a kind of converse to this proposition.

Proposition 4.3.4. If $D \subseteq \mathbb{R}$ is a countable subset, then there exists a monotone function $f : \mathbb{R} \to \mathbb{R}$, which has $D$ as the set of its discontinuity points.

Proof. If $D \subseteq \mathbb{R}$ is finite, then the construction of $f$ is evident. So assume that $D$ is countably infinite, that is, $D = \{x_n\}_{n\ge1}$. For each $n \in \mathbb{N}$ let $f_n : \mathbb{R} \to \mathbb{R}$ be defined by
$$f_n(x) = \begin{cases} -\dfrac{1}{n^2} & \text{if } x < x_n, \\[1ex] \dfrac{1}{n^2} & \text{if } x \ge x_n. \end{cases}$$
Then $f_n$ has only one discontinuity point at $x_n$. We define
$$f(x) = \sum_{n\ge1} f_n(x) \quad \text{for all } x \in \mathbb{R}.$$
Since $|f_n(x)| = 1/n^2$ for all $x \in \mathbb{R}$, by the Weierstrass M-test, $\sum_{n\ge1} f_n(x)$ converges uniformly. Hence, $f$ is continuous at all $x \in \mathbb{R}$ where each $f_n$ is continuous. Therefore $f$ is continuous on $\mathbb{R} \setminus D$. On the other hand, $f$ is increasing since each $f_n$ is, and at $x_k \in D$ the function $f - f_k$ is continuous while $f_k$ jumps by $2/k^2$, so $f$ is discontinuous at every point of $D$.

Corollary 4.3.5. There exists an increasing function $f : \mathbb{R} \to \mathbb{R}$ that is continuous at all $x \in \mathbb{R} \setminus \mathbb{Q}$ and discontinuous at all $x \in \mathbb{Q}$.

The previous considerations lead to the following definition.
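The construction in the proof of Proposition 4.3.4 can be tried out on a finite truncation. The sketch below assumes a hypothetical short list of points $x_1, \ldots, x_7$ (dyadic fractions chosen only for illustration) and evaluates the partial sum $\sum_{n\le 7} f_n$; the jump at $x_k$ should be $2/k^2$ and the partial sum should be increasing.

```python
def jump_sum(points, x):
    """Partial sum of f = sum_n f_n, where f_n = -1/n^2 left of x_n and +1/n^2 from x_n on."""
    total = 0.0
    for n, x_n in enumerate(points, start=1):
        total += (1.0 if x >= x_n else -1.0) / n ** 2
    return total

# Assumed enumeration of a few points of D; any distinct points would do.
points = [0.5, 0.25, 0.75, 0.125, 0.375, 0.625, 0.875]
eps = 1e-9  # smaller than the gaps between the chosen points

# Jump of the partial sum at each x_k: value at x_k minus value just to the left.
jumps = [jump_sum(points, p) - jump_sum(points, p - eps) for p in points]

# Values on a grid, used to check monotonicity.
grid_values = [jump_sum(points, i / 1000.0) for i in range(1001)]
print(jumps)
```

The list `jumps` reproduces $2/k^2$ for $k = 1, \ldots, 7$, which is the discontinuity that each $f_k$ contributes in the proof.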

Definition 4.3.6. Let $T \subseteq \mathbb{R}$ be an interval and let $f : T \to \mathbb{R}$ be an increasing function. The function
$$s_f(x) = \sum_{u\in T,\ u<x} [f_+(u) - f_-(u)] + f(x) - f_-(x) \quad \text{for all } x \in T$$
[…] $> \inf I = \inf_T f$. Then $f(x) < u_0$ for all $x \in T$ with $x < h(u_0)$. Given $\varepsilon > 0$, let $x_0 \in T \cap [h(u_0) - \varepsilon, h(u_0) + \varepsilon)$. Then for every $u \in (f(x_0), u_0)$ we deduce that
$$h(u) = \inf[x \in T : f(x) \ge u] \ge x_0 \ge h(u_0) - \varepsilon.$$
Hence $h$ is left continuous.

(b) ⇒: Suppose that $h(u_0) < h_+(u_0)$ for some $u_0 \in I$ with $u_0 < \sup I = \sup_T f$. Then we get $f(x) \ge u_0$ for all $x > h(u_0)$. On the other hand if $h(u_0) < x < h_+(u_0)$, then the monotonicity of $h$ (see part (a)) implies that $h(u) > x$ for all $u > u_0$. So, $f(x) < u$ for all $u > u_0$, hence $f(x) \le u_0$. Therefore, $f(x) = u_0$ for all $x \in (h(u_0), h_+(u_0))$.
⇐: Suppose that $f(x) = u_0$ for all $x \in (x_1, x_2)$ and $u_0 < \sup I = \sup_T f$. Then $h(u_0) \le x_1$. Moreover, if $u \in (u_0, \sup I)$, then $f(x) = u_0 < u$ and so $h(u) \ge x$ for every $x \in (x_1, x_2)$. Let $x \to x_2^+$. Then $h(u) \ge x_2$ for all $u \in (u_0, \sup I)$. Hence $h_+(u_0) \ge x_2$ and so we conclude that $h(u_0) \le x_1 < x_2 \le h_+(u_0)$.


(c) ⇒: Let $u = f(x)$. Then $h(u) = h(f(x)) \le x$. Suppose this last inequality is strict, so then there exists $v \in T$ with $v < x$ such that $f(v) \ge f(x)$. Hence, $f|_{[v,x]} = f(x)$.
⇐: If $f|_{[v,x]}$ is constant, then $h(f(x)) \le v < x$.

(d) ⇒: If $x \in T$ with $x > x_0$, then $f(x) \ge u$ for all $u \in (u_1, u_2)$ with $u_1 < u_2$. Let $u \to u_2^+$. It holds that $f(x) \ge u_2$, hence $f_+(x_0) \ge u_2$. On the other hand, if $x \in T$ with $x < x_0$, then $f(x) < u$ for all $u \in (u_1, u_2)$. Let $u \to u_1^-$. Then $f(x) \le u_1$ and so $f_-(x_0) \le u_1$.
⇐: Let $u \in (f_-(x_0), f_+(x_0))$. If $x < x_0$, then $f(x) < u$ and so $h(u) \ge x_0$. On the other hand one has
$$x_0 \le h(u) \le h(f_+(x_0)) \le h(f(x)) \le x.$$
Let $x \to x_0^+$. We obtain that $h(u) = x_0$ for all $u \in (f_-(x_0), f_+(x_0))$.

Proposition 4.3.9. If $f : T \to \mathbb{R}$ is strictly increasing, then $h$ is the left inverse of $f$ and it is continuous.

Proof. Since $f$ is strictly increasing, using parts (a) and (b) of Proposition 4.3.8, we infer that $h$ is continuous. Moreover, part (c) of Proposition 4.3.8 implies that $h(f(x)) = x$ for all $x \in T$. Hence, $h$ is the left inverse of $f$.

Remark 4.3.10. The previous two propositions remain valid if $f$ is decreasing (resp. strictly decreasing). In this case $h : I \to \mathbb{R}$ is defined by
$$h(u) = \inf[x \in T : f(x) \le u].$$

Now we turn our attention to the differentiability properties of monotone functions. To do this we will need the so-called “Vitali Covering Theorem,” which we present first.

Definition 4.3.11. Consider $\mathbb{R}^N$ equipped with the Lebesgue measure $\lambda$ and let $\mathcal{L}$ be a family of nontrivial, closed cubes in $\mathbb{R}^N$. We say that $\mathcal{L}$ is a fine cover for a set $A \subseteq \mathbb{R}^N$ if for every $x \in A$ and every $\delta > 0$, there exists a cube $Q \in \mathcal{L}$ such that $x \in Q$ and $\operatorname{diam} Q \le \delta$.

Example 4.3.12. Let $m \in \mathbb{N}$ and $\hat\eta = (\eta_k)_{k=1}^N \in \mathbb{Z}^N$, the $N$-tuples of integers. We define
$$Q_{m,\hat\eta} = \Big\{\hat x = (x_k)_{k=1}^N \in \mathbb{R}^N : \frac{\eta_k - 1}{2^m} \le x_k \le \frac{\eta_k}{2^m} \text{ for all } k = 1, \ldots, N\Big\}.$$
This is a closed dyadic cube. We can partition $\mathbb{R}^N$ into closed dyadic cubes with pairwise disjoint interiors using the hyperplanes $\{x_i = \eta_{k_i}/2^m\}$ where for every $i = 1, \ldots, N$, the numbers $\eta_{k_i} \in \mathbb{Z}$. The collection $\mathcal{L}$ of all $N$-dimensional closed dyadic cubes of diameter $\le \vartheta$ for some $\vartheta > 0$ forms a fine cover for any $A \subseteq \mathbb{R}^N$.

The next result is known as “Vitali’s Covering Theorem.”

Theorem 4.3.13 (Vitali’s Covering Theorem). If $A \subseteq \mathbb{R}^N$ is bounded and Lebesgue measurable and $\mathcal{L}$ is a fine cover of $A$, then there exists a countable subcollection $\{Q_n\}_{n\ge1} \subseteq \mathcal{L}$ with pairwise disjoint interiors such that $\lambda(A \setminus \bigcup_{n\ge1} Q_n) = 0$.

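Proof. Evidently we may assume that there exists a closed cube $\hat Q$ such that $A \subseteq \hat Q$ and $Q \subseteq \hat Q$ for all $Q \in \mathcal{L}$. Let $\mathcal{L}_0 = \mathcal{L}$ and let $Q_0 \in \mathcal{L}_0$. If $Q_0$ covers $A$ we are done. Otherwise we define $\mathcal{L}_1 = \{Q \in \mathcal{L} : \operatorname{int} Q \cap \operatorname{int} Q_0 = \emptyset\}$. Since $Q_0$ does not cover $A$, we see that $\mathcal{L}_1 \ne \emptyset$. Let $\vartheta_1 = \sup[\operatorname{diam} Q : Q \in \mathcal{L}_1]$. Choose $Q_1 \in \mathcal{L}_1$ such that $\frac12\vartheta_1 < \operatorname{diam} Q_1$. If $Q_0 \cup Q_1$ covers $A$, we are done; otherwise, we define $\mathcal{L}_2 = \{Q \in \mathcal{L}_1 : \operatorname{int} Q \cap \operatorname{int} Q_1 = \emptyset\}$. Moreover we set $\vartheta_2 = \sup[\operatorname{diam} Q : Q \in \mathcal{L}_2]$. Choose $Q_2 \in \mathcal{L}_2$ such that $\frac12\vartheta_2 < \operatorname{diam} Q_2$. We continue this way and inductively we generate collections $\{\mathcal{L}_n\}_{n\ge1}$, positive numbers $\{\vartheta_n\}_{n\ge1}$, and cubes $\{Q_n\}_{n\ge1}$ such that
$$\mathcal{L}_n = \{Q \in \mathcal{L}_{n-1} : \operatorname{int} Q \cap \operatorname{int} Q_{n-1} = \emptyset\}, \quad \vartheta_n = \sup[\operatorname{diam} Q : Q \in \mathcal{L}_n], \quad Q_n \in \mathcal{L}_n \text{ with } \tfrac12\vartheta_n < \operatorname{diam} Q_n.$$
We have
$$\sum_{n\ge1}\Big(\frac{\operatorname{diam} Q_n}{\sqrt N}\Big)^N = \sum_{n\ge1}\lambda(Q_n) \le \lambda(\hat Q) < \infty. \tag{4.3.1}$$
This gives
$$\operatorname{diam} Q_n \to 0 \quad \text{as } n \to \infty. \tag{4.3.2}$$
Arguing by contradiction, suppose that the conclusion of the theorem is not true. So, there exists $\varepsilon > 0$ such that
$$\lambda\Big(A \setminus \bigcup_{n\ge1} Q_n\Big) \ge 2\varepsilon. \tag{4.3.3}$$
Let $\tilde Q_n$ be another cube with the same center and parallel faces to $Q_n$ such that
$$\operatorname{diam}\tilde Q_n = (4\sqrt N + 1)\operatorname{diam} Q_n. \tag{4.3.4}$$
From (4.3.1) and (4.3.4) we see that there exists $n_0 = n_0(\varepsilon) \in \mathbb{N}$ such that
$$\lambda\Big(\bigcup_{n\ge n_0+1}\tilde Q_n\Big) \le \sum_{n\ge n_0+1}\lambda(\tilde Q_n) \le \varepsilon. \tag{4.3.5}$$
Applying (4.3.3) and (4.3.5) yields
$$\lambda\Big(\Big[A \setminus \bigcup_{n=1}^{n_0} Q_n\Big] \setminus \bigcup_{n\ge n_0+1}\tilde Q_n\Big) \ge 2\varepsilon - \varepsilon = \varepsilon.$$
So, we can find $x \in [A \setminus \bigcup_{n=1}^{n_0} Q_n] \setminus \bigcup_{n\ge n_0+1}\tilde Q_n$. Then, since $\bigcup_{n=1}^{n_0} Q_n \subseteq \mathbb{R}^N$ is closed,
$$2\beta = d\Big(x, \bigcup_{n=1}^{n_0} Q_n\Big) > 0.$$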


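Then according to Definition 4.3.11, there exists $Q_\mu \in \mathcal{L}$ with $\operatorname{diam} Q_\mu = \mu \le \beta$ and $x \in Q_\mu$. It holds that $Q_\mu \cap \operatorname{int} Q_n = \emptyset$ for all $n = 1, \ldots, n_0$. Therefore $Q_\mu \in \mathcal{L}_{n_0+1}$. We claim that
$$Q_\mu \cap \operatorname{int} Q_n \ne \emptyset \quad \text{for some } n \ge n_0. \tag{4.3.6}$$
Indeed, if (4.3.6) is not true, then $Q_\mu \in \mathcal{L}_n$ for all $n \in \mathbb{N}$. Hence, because of (4.3.2) and since $\frac12\vartheta_n < \operatorname{diam} Q_n$,
$$0 < \mu \le \vartheta_n \to 0 \quad \text{as } n \to \infty,$$
a contradiction. So, (4.3.6) is true. Let $m \ge n_0 + 1$ be the smallest integer such that (4.3.6) holds. Then $Q_\mu \notin \mathcal{L}_{m+1}$, $Q_\mu \in \mathcal{L}_m$ and $\mu \le \vartheta_m$. Recall that $x \notin \tilde Q_m$. Therefore $Q_\mu \cap \operatorname{int} Q_m \ne \emptyset$. If
$$\mu = \operatorname{diam} Q_\mu > \frac{1}{2\sqrt N}\big[\operatorname{diam}\tilde Q_m - \operatorname{diam} Q_m\big], \tag{4.3.7}$$
then from (4.3.4) and (4.3.7) it follows that $\vartheta_m \ge \mu > \vartheta_m$, a contradiction. This proves the theorem.

Corollary 4.3.14. If $A \subseteq \mathbb{R}^N$ is bounded and Lebesgue measurable and $\mathcal{L}$ is a fine cover of $A$, then for any given $\varepsilon > 0$ there exists a finite collection $F_\varepsilon = \{Q_n\}_{n=1}^{n_0} \subseteq \mathcal{L}$ with elements that have pairwise disjoint interiors and
$$\sum_{n\ge1}\lambda(Q_n) - \varepsilon \le \lambda(A) \le \lambda\Big(\bigcup_{n=1}^{n_0}(A \cap Q_n)\Big) + \varepsilon.$$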

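Proof. Exploiting the regularity of the Lebesgue measure, there exists an open set $U_\varepsilon \subseteq \mathbb{R}^N$ such that
$$A \subseteq U_\varepsilon \quad \text{and} \quad \lambda(U_\varepsilon) \le \lambda(A) + \varepsilon. \tag{4.3.8}$$
Moreover, let $\mathcal{L}_\varepsilon = \{Q \in \mathcal{L} : Q \subseteq U_\varepsilon\}$. Using Theorem 4.3.13, we find $\{Q_n\}_{n\ge1} \subseteq \mathcal{L}_\varepsilon$ with pairwise disjoint interiors such that
$$\lambda\Big(A \setminus \bigcup_{n\ge1} Q_n\Big) = 0, \tag{4.3.9}$$
which in combination with (4.3.8) implies that
$$\sum_{n\ge1}\lambda(Q_n) \le \lambda(U_\varepsilon) \le \lambda(A) + \varepsilon < \infty.$$
Hence,
$$\sum_{n\ge n_0+1}\lambda(Q_n) \le \varepsilon \quad \text{for some } n_0 = n_0(\varepsilon) \in \mathbb{N}.$$
From this and (4.3.9) we obtain
$$\lambda(A) = \lambda\Big(\bigcup_{n\ge1}(A \cap Q_n)\Big) \le \lambda\Big(\bigcup_{n=1}^{n_0}(A \cap Q_n)\Big) + \varepsilon. \tag{4.3.10}$$
Then, due to (4.3.8) and (4.3.10) it follows that
$$\sum_{n\ge1}\lambda(Q_n) - \varepsilon \le \lambda(A).$$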

We will use this corollary to establish the differentiability properties of monotone functions.

Definition 4.3.15. Let $f : [a, b] \to \mathbb{R}$ and for fixed $x \in [a, b]$ we define
$$D_+f(x) = \liminf_{h\to0^+}\frac{f(x+h) - f(x)}{h}, \qquad D_-f(x) = \liminf_{h\to0^+}\frac{f(x) - f(x-h)}{h},$$
$$D^+f(x) = \limsup_{h\to0^+}\frac{f(x+h) - f(x)}{h}, \qquad D^-f(x) = \limsup_{h\to0^+}\frac{f(x) - f(x-h)}{h}.$$
We call $D^\pm f(x)$, $D_\pm f(x)$ the Dini derivatives or derivates of $f$ at $x$. Clearly, $D_+f(x) \le D^+f(x)$ and $D_-f(x) \le D^-f(x)$. Moreover $f$ is differentiable at $x$ with derivative $f'(x) \in \mathbb{R}$ if and only if $f'(x) = D^+f(x) = D_+f(x) = D^-f(x) = D_-f(x)$.

Proposition 4.3.16. If $f : [a, b] \to \mathbb{R}$ is increasing, then the functions $x \to D^\pm f(x)$ and $x \to D_\pm f(x)$ are all measurable.

Proof. We do the proof for $D^+f(x)$; the proofs for the other functions are similar. For every $n \in \mathbb{N}$ let $\xi_n(x) = \sup[(f(x+h) - f(x))/h : 0 < h \le 1/n]$. We see that $D^+f(x) = \lim_{n\to\infty}\xi_n(x)$. So, according to Proposition 2.2.10, it suffices to show that each $\xi_n$ is measurable. Let $Q_n = (0, 1/n] \cap \mathbb{Q}$ and set $\vartheta_n(x) = \sup[(f(x+h) - f(x))/h : h \in Q_n]$. Then $\vartheta_n(x) \le \xi_n(x)$ for all $n \in \mathbb{N}$. We will show that the reverse inequality is also true. So, let $\varepsilon > 0$. We can find $s \in (0, 1/n]$ such that $\xi_n(x) - \varepsilon \le (f(x+s) - f(x))/s$. Having fixed $\varepsilon > 0$ and $s \in (0, 1/n]$ as above, we choose $h \in Q_n$ with $h \ge s$ such that
$$\frac1s - \frac1h < \frac{\varepsilon}{|f(x+s)| + |f(x)| + 1},$$
which is equivalent to
$$\Big[\frac1s - \frac1h\Big](|f(x+s)| + |f(x)| + 1) < \varepsilon.$$
Since $s \le h$, we obtain
$$\Big[\frac1s - \frac1h\Big](f(x+s) - f(x)) < \varepsilon$$
and because of $f(x+s) \le f(x+h)$, it follows that
$$\frac{f(x+s) - f(x)}{s} < \frac{f(x+h) - f(x)}{h} + \varepsilon.$$
Hence, $\xi_n(x) - 2\varepsilon \le \vartheta_n(x)$ for all $\varepsilon > 0$ and thus $\xi_n = \vartheta_n$. But $\vartheta_n$ is measurable. Hence, so is $\xi_n$, and thus $D^+f$ is measurable. Similarly we show this for $D^-f$, $D_+f$, and $D_-f$.

Theorem 4.3.17. If $f : [a, b] \to \mathbb{R}$ is increasing, then $f$ is differentiable a.e. on $[a, b]$.
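The four derivates of Definition 4.3.15 can differ even for a continuous function. As a hedged illustration, the sketch below estimates the right derivates of the (non-monotone) function $f(x) = x\sin(1/x)$, $f(0) = 0$, at $x = 0$, where $D^+f(0) = 1$ and $D_+f(0) = -1$. The sampling scheme ($h = 1/t$ with $t$ on a uniform grid) is an assumption tailored to this example, not a general-purpose method.

```python
import math

def f(x):
    """Test function with f(0) = 0; its difference quotient at 0 equals sin(1/h)."""
    return x * math.sin(1.0 / x) if x != 0.0 else 0.0

def right_dini_estimates(func, x, delta, num=70000, spacing=1e-3):
    """Estimate (D_+ func(x), D^+ func(x)) from quotients with steps h = 1/t <= delta."""
    quotients = []
    for k in range(num):
        h = 1.0 / (1.0 / delta + k * spacing)  # steps in (0, delta], uniform in t = 1/h
        quotients.append((func(x + h) - func(x)) / h)
    return min(quotients), max(quotients)

lower, upper = right_dini_estimates(f, 0.0, delta=1e-2)
print(lower, upper)
```

The estimates are close to $-1$ and $1$, so $D_+f(0) < D^+f(0)$ and $f$ is not differentiable at $0$. For increasing functions, Theorem 4.3.17 says that such a gap can occur only on a Lebesgue-null set.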


Proof. We will show that the set where the Dini derivatives are not equal is Lebesgue-null. We show this for the set $\{D^+f > D_-f\}$. The proofs for the others are similar. So, let $r, q \in \mathbb{Q}$ and define
$$C_{r,q} = \{x \in [a, b] : D^+f(x) > r > q > D_-f(x)\}.$$
By Proposition 4.3.16, this set is Lebesgue measurable. Let $\varepsilon > 0$ and choose an open set $U_\varepsilon \subseteq \mathbb{R}$ such that $C_{r,q} \subseteq U_\varepsilon$ and $\lambda(U_\varepsilon) \le \lambda(C_{r,q}) + \varepsilon$, which is possible because of the regularity of the Lebesgue measure $\lambda$. For every $x \in C_{r,q}$ there exists an arbitrarily small interval $[x - h, x] \subseteq U_\varepsilon$ such that
$$f(x) - f(x - h) < qh. \tag{4.3.11}$$
Invoking Corollary 4.3.14 there exists a finite collection $\{I_n\}_{n=1}^{n_0}$ of such intervals that cover $A \subseteq C_{r,q}$ with $\lambda^*(A) > \lambda(C_{r,q}) - \varepsilon$ where $\lambda^*$ denotes the Lebesgue outer measure; see Definition 2.1.33 and Example 2.1.35. Summing over these intervals, we obtain, due to (4.3.11), that
$$\sum_{n=1}^{n_0}[f(x_n) - f(x_n - h_n)] < q\sum_{n=1}^{n_0} h_n < q\lambda(U_\varepsilon) \le q(\lambda(C_{r,q}) + \varepsilon). \tag{4.3.12}$$
Each $y \in A$ is the left endpoint of an arbitrarily small interval $(y, y + \mu) \subseteq I_n$ and $r\mu < f(y + \mu) - f(y)$. A new application of Corollary 4.3.14 gives a finite collection $\{\tilde I_n\}_{n=1}^{\tilde n_0}$ of such intervals such that their union contains a subset of $A$ of outer measure $> \lambda(C_{r,q}) - 2\varepsilon$. Summing over these intervals, one gets
$$\sum_{n=1}^{\tilde n_0}[f(y_n + \mu_n) - f(y_n)] \ge r\sum_{n=1}^{\tilde n_0}\mu_n > r(\lambda(C_{r,q}) - 2\varepsilon). \tag{4.3.13}$$
Each interval $\tilde I_n$ is contained in some interval $I_m$ and if we sum over those $n$ for which $\tilde I_n \subseteq I_m$, then, since $f$ is increasing, this gives
$$\sum_{n:\ \tilde I_n \subseteq I_m}[f(y_n + \mu_n) - f(y_n)] \le f(x_m) - f(x_m - h_m),$$
which implies
$$\sum_{n=1}^{\tilde n_0}[f(y_n + \mu_n) - f(y_n)] \le \sum_{n=1}^{n_0}(f(x_n) - f(x_n - h_n)).$$
Then, from (4.3.12) and (4.3.13), we see that $r(\lambda(C_{r,q}) - 2\varepsilon) < q(\lambda(C_{r,q}) + \varepsilon)$. Since $\varepsilon > 0$ is arbitrary, we let $\varepsilon \searrow 0$ to obtain $r\lambda(C_{r,q}) \le q\lambda(C_{r,q})$. Hence, $\lambda(C_{r,q}) = 0$, since $q < r$. So, we have that $k(x) = \lim_{h\to0}(f(x+h) - f(x))/h$ exists for a.a. $x \in [a, b]$. We need to show that $k(x) \in \mathbb{R}$ for a.a. $x \in [a, b]$. We define $k_n(x) = n[f(x + 1/n) - f(x)]$ with $f(x) = f(b)$ if $x \ge b$. Then $k_n(x) \to k(x)$ for a.a. $x \in [a, b]$. Thus, $k$ is measurable. Moreover $k_n \ge 0$ for

330 | 4 Banach Spaces of Functions and Measures all n ∈ ℕ since f is increasing. From Fatou’s Lemma we infer that b

b

b

∫ kdx ≤ lim inf ∫ k n dx = lim inf n ∫ [f (x + n→∞

a

n→∞

a

a b+ 1n

1 ) − f(x)] dx n a+ 1n

a+ 1n

] [ = lim inf [n ∫ fdx − n ∫ fdx] = f(b) − lim sup n ∫ fdx n→∞

a

[ b ≤ f(b) − f(a) .

]

n→∞

a

Hence, k ∈ L1 [a, b] and so k(x) ∈ ℝ for a.a. x ∈ [a, b]. Therefore f is differentiable almost everywhere and f = k. Remark 4.3.18. The result above is sharp in the sense that for any given Lebesgue-null set D ⊆ ℝ there exists an increasing continuous function that is differentiable at all ℝ \ D. Corollary 4.3.19. If T ⊆ ℝ is an interval and f : T → ℝ is a monotone function, then f ∈ L1 [a, b] for every [a, b] ⊆ T and b

∫ |f (x)|dx ≤ |f(b) − f(a)| . a

Hence

f

∈

L1loc (T).

Proposition 4.3.20. If $T \subseteq \mathbb{R}$ is an interval, $f : T \to \mathbb{R}$ is a monotone function, and $h > 0$, $[a,b] \subseteq T$ with $b - a > h$, then
\[
\frac1h \int_a^{b-h} |f(x+h) - f(x)|\,dx \le |f(b) - f(a)| .
\]
Moreover if $f$ is bounded, then
\[
\frac1h \int_{T_h} |f(x+h) - f(x)|\,dx \le \sup_T f - \inf_T f
\]
with $T_h = \{x \in T : x + h \in T\}$.

Proof. To fix things we may assume that $f$ is increasing. Let
\[
g(x) = \begin{cases} f(x) & \text{if } x \le b , \\ f(b) & \text{if } b \le x . \end{cases}
\]
Since $f$ is increasing, we obtain
\[
\frac1h \int_a^b [g(x+h) - g(x)]\,dx = \frac1h \Big[ \int_b^{b+h} g(x)\,dx - \int_a^{a+h} g(x)\,dx \Big] \le \frac1h [f(b)h - f(a)h] = f(b) - f(a) .
\]
This gives
\[
\frac1h \int_a^{b-h} [f(x+h) - f(x)]\,dx \le \frac1h \int_a^b [g(x+h) - g(x)]\,dx \le f(b) - f(a) .
\]
Now suppose that $f$ is bounded. Let $T_n = [a_n, b_n]$ such that $a_n \searrow \inf T$ and $b_n \nearrow \sup T$. Recalling that $f$ is increasing, we get
\[
0 \le \frac1h \int_{a_n}^{b_n - h} [f(x+h) - f(x)]\,dx \le f(b_n) - f(a_n) \le \sup_T f - \inf_T f .
\]
If we let $n \to \infty$, using the Lebesgue Monotone Convergence Theorem, it follows that
\[
\frac1h \int_{T_h} |f(x+h) - f(x)|\,dx \le \sup_T f - \inf_T f .
\]
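The first estimate of Proposition 4.3.20 can be checked numerically. The following is only a minimal sketch and not part of the original text: the helper name `shifted_difference_average`, the test function $f(x) = x^3$ on $[0, 2]$, the step $h = 0.5$, and the use of the midpoint rule are all illustrative assumptions.

```python
def shifted_difference_average(f, a, b, h, n=20000):
    """Midpoint-rule approximation of (1/h) * int_a^{b-h} |f(x+h) - f(x)| dx."""
    width = (b - h - a) / n          # the integral runs over [a, b - h]
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * width    # midpoint of the i-th subinterval
        total += abs(f(x + h) - f(x))
    return total * width / h


def cube(x):
    # Increasing on [0, 2]; an illustrative choice, not taken from the text.
    return x ** 3


lhs = shifted_difference_average(cube, 0.0, 2.0, 0.5)
rhs = abs(cube(2.0) - cube(0.0))     # |f(b) - f(a)| = 8
print(lhs, "<=", rhs)
```

For this particular choice the left-hand side can be computed by hand and equals $5.4375$, which is indeed below $f(2) - f(0) = 8$.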

An important consequence of Theorem 4.3.17 is the following result.

Theorem 4.3.21. If $T \subseteq \mathbb{R}$ is an interval, $f_n : T \to \mathbb{R}$ with $n \in \mathbb{N}$ is a sequence of increasing functions, and $\sum_{n\ge1} f_n(x)$ converges pointwise on $T$, then $\sum_{n\ge1} f_n(x)$ converges uniformly on compact sets in $T$; the function $f(x) = \sum_{n\ge1} f_n(x)$ is differentiable for a.a. $x \in T$ and $f'(x) = \sum_{n\ge1} f_n'(x)$ for a.a. $x \in T$.

Proof. The result is local. So, without any loss of generality we may assume that $T = [a,b]$. Let $s_n(x) = \sum_{k=1}^n f_k(x)$ for all $n \in \mathbb{N}$. Then $\{s_n\}_{n\ge1}$ is a sequence of increasing functions such that $s_n(x) \to f(x)$ for all $x \in [a,b]$. We will show that $f$ is increasing as well. Arguing by contradiction, suppose that $f$ is not increasing. Then there exist $x_1, x_2 \in [a,b]$ such that $x_1 < x_2$ and $f(x_2) < f(x_1)$. Let $h = f(x_1) - f(x_2)$ and $\varepsilon \in (0, h/2)$. We can find $n_0 = n_0(\varepsilon, x_1, x_2) \in \mathbb{N}$ such that
\[
|s_n(x_1) - f(x_1)| \le \varepsilon \quad\text{and}\quad |s_n(x_2) - f(x_2)| \le \varepsilon \quad\text{for all } n \ge n_0 .
\]
This yields
\[
s_n(x_2) - s_n(x_1) < f(x_2) + \frac h2 - f(x_1) + \frac h2 = h - [f(x_1) - f(x_2)] = h - h = 0
\]
for all $n \ge n_0$. Therefore $s_n(x_1) - s_n(x_2) > 0$ for all $n \ge n_0$, a contradiction to the fact that $s_n$ is increasing. This proves that $f$ is increasing as well.

Now we will show that $s_n \to f$ uniformly on $[a,b]$. Without any loss of generality we may assume that $f_n(a) \ge 0$ for all $n \in \mathbb{N}$. Hence, $f_n \ge 0$ for all $n \in \mathbb{N}$. We have
\[
0 \le f(x) - s_n(x) = \sum_{k\ge n+1} f_k(x) \le \sum_{k\ge n+1} f_k(b) ,
\]
which results in
\[
\sup_{x\in[a,b]} |f(x) - s_n(x)| \le \sum_{k\ge n+1} f_k(b) \to 0 \quad\text{as } n \to \infty .
\]
This proves the desired uniform convergence on $[a,b]$.

Since $\{f_n\}_{n\ge1}$ and $f$ are all increasing, by Theorem 4.3.17, they are all differentiable on $D \subseteq [a,b]$ with $\lambda([a,b] \setminus D) = 0$ where $\lambda$ is the Lebesgue measure on $\mathbb{R}$. Let $x \in D$ and $h > 0$ be small such that $x + h \in [a,b]$. Then
\[
\frac{f(x+h) - f(x)}{h} = \sum_{n\ge1} \frac{f_n(x+h) - f_n(x)}{h} \ge \sum_{k=1}^n \frac{f_k(x+h) - f_k(x)}{h} = \frac{s_n(x+h) - s_n(x)}{h} ,
\]
which shows that
\[
f'(x) \ge \sum_{k=1}^n f_k'(x) = s_n'(x) \ge 0 .
\]
Hence, $s_n'(x) \to g(x)$ for all $x \in D$ as $n \to \infty$. We will show that $g = f'$. It holds that $s_n(b) \to f(b)$. So, there exists a subsequence $\{s_{n_k}(b)\}_{k\ge1}$ such that $0 \le f(b) - s_{n_k}(b) \le 1/2^k$. Since $f - s_{n_k}$ is increasing, we then obtain
\[
0 \le f(x) - s_{n_k}(x) \le \frac{1}{2^k} \quad\text{for all } x \in [a,b] .
\]
Thus, $\{f - s_{n_k}\}_{k\ge1}$ is a sequence of increasing functions and it is convergent to $0$. So, reasoning as above, it follows that $f'(x) - s_{n_k}'(x) \to 0$ for a.a. $x \in [a,b]$. Since $\{s_n'\}_{n\ge1}$ is increasing we conclude that $f'(x) - s_n'(x) \to 0$ for a.a. $x \in [a,b]$.

The set of monotone functions is not a vector space because the difference of two monotone functions need not be monotone. So, we pass to the smallest vector space containing the set of monotone functions. This is the space of functions of bounded variation.

Definition 4.3.22.
(a) Let $T \subseteq \mathbb{R}$ be an interval. A partition of $T$ is a finite set $P = \{x_k\}_{k=0}^n \subseteq T$ such that $x_0 < x_1 < \ldots < x_n$. Let $\mathcal{P}$ be the set of all finite partitions of $T$.
(b) Let $T \subseteq \mathbb{R}$ be an interval and $f : T \to \mathbb{R}$. We say that $f$ is of bounded variation if
\[
\operatorname{var}_T f = \sup \Big[ \sum_{k=0}^{n-1} |f(x_{k+1}) - f(x_k)| : P = \{x_k\}_{k=0}^n \in \mathcal{P} \Big] < \infty .
\]
We denote the space of functions of bounded variation by $BV(T)$.
(c) We say that $f \in BV_{\mathrm{loc}}(T)$ if $\operatorname{var}_{[a,b]} f < \infty$ for all $[a,b] \subseteq T$.

Remark 4.3.23. Suppose that $b = \sup T \in T$. Then in the definition above it suffices to consider partitions $P = \{x_k\}_{k=0}^n$ of the form $x_0 < x_1 < \ldots < x_n = b$. Indeed if $P = \{x_k\}_{k=0}^n \in \mathcal{P}$ with $x_n < b$ and $P' = P \cup \{b\} \in \mathcal{P}$, then
\[
\sum_{k=0}^{n-1} |f(x_{k+1}) - f(x_k)| \le \sum_{k=0}^{n-1} |f(x_{k+1}) - f(x_k)| + |f(b) - f(x_n)| \le \operatorname{var}_T f .
\]
Similarly if $a = \inf T \in T$. So, in what follows we will use this fact without further comment. Note that $BV_{\mathrm{loc}}([a,b]) = BV([a,b])$. Finally if $U \subseteq \mathbb{R}$ is open, then $U = \bigcup_{n\ge1} T_n$ with $\{T_n\}_{n\ge1}$ pairwise disjoint intervals. Then for $f : U \to \mathbb{R}$ we define $\operatorname{var}_U f = \sum_{n\ge1} \operatorname{var}_{T_n} f$.

The following proposition is a straightforward consequence of Definition 4.3.22.

Proposition 4.3.24. If $f, h \in BV([a,b])$, then the following hold:
(a) $f \pm h \in BV([a,b])$;
(b) $fh \in BV([a,b])$;
(c) if $h(x) \ge c > 0$ for all $x \in [a,b]$, then $f/h \in BV([a,b])$;
(d) if $f$ is differentiable on $[a,b]$ and $f'$ is bounded, then $f \in BV([a,b])$;
(e) if $f$ is Lipschitz continuous on $[a,b]$, then $f \in BV([a,b])$.

Proposition 4.3.25. If $T \subseteq \mathbb{R}$ is an interval and $f : T \to \mathbb{R}$, then the following hold:
(a) for every $u \in T$, $\sup_T |f| \le |f(u)| + \operatorname{var}_T f$; hence if $f \in BV(T)$, then $f$ is bounded;
(b) for every $u \in T$ it holds that $\operatorname{var}_T f = \operatorname{var}_{T\cap(-\infty,u]} f + \operatorname{var}_{T\cap[u,+\infty)} f$;
(c) if $T$ does not contain $\sup T$ (resp. $\inf T$), then $\operatorname{var}_T f = \lim_{u\to(\sup T)^-} \operatorname{var}_{T\cap(-\infty,u]} f$ (resp. $\operatorname{var}_T f = \lim_{u\to(\inf T)^+} \operatorname{var}_{T\cap[u,+\infty)} f$).

Proof. (a): Let $x \ne u$ and let $P = \{u, x\} \in \mathcal{P}$. It holds that
\[
|f(x)| \le |f(u)| + |f(x) - f(u)| \le |f(u)| + \operatorname{var}_T f ,
\]
which implies $\sup_T |f| \le |f(u)| + \operatorname{var}_T f$.

(b): Let $T_1 = T \cap (-\infty, u]$ and $T_2 = T \cap [u, +\infty)$. Consider a partition $P_1 = \{x_k\}_{k=0}^n$ of $T_1$ and a partition $P_2 = \{y_i\}_{i=0}^m$ of $T_2$. As we already pointed out (see Remark 4.3.23), we can have $x_n = u = y_0$. Then $P = P_1 \cup P_2$ is a partition of $T$ and so
\[
\sum_{k=0}^{n-1} |f(x_{k+1}) - f(x_k)| + \sum_{i=0}^{m-1} |f(y_{i+1}) - f(y_i)| \le \operatorname{var}_T f .
\]
This gives
\[
\operatorname{var}_{T_1} f + \operatorname{var}_{T_2} f \le \operatorname{var}_T f . \tag{4.3.14}
\]
Next let $P = \{x_k\}_{k=0}^n$ be a partition of $T$ with $u < x_n$. Let $m \in \{1, \ldots, n\}$ be such that $x_{m-1} \le u \le x_m$ and let $P_1 = \{x_k\}_{k=0}^{m-1} \cup \{u\}$ as well as $P_2 = \{u\} \cup \{x_k\}_{k=m}^n$. These are partitions of $T_1$ and $T_2$, respectively. We obtain
\[
\begin{aligned}
\sum_{k=0}^{n-1} |f(x_{k+1}) - f(x_k)|
&= \sum_{k=0}^{m-2} |f(x_{k+1}) - f(x_k)| + |f(x_m) + f(u) - f(u) - f(x_{m-1})| + \sum_{k=m}^{n-1} |f(x_{k+1}) - f(x_k)| \\
&\le \sum_{k=0}^{m-2} |f(x_{k+1}) - f(x_k)| + |f(x_{m-1}) - f(u)| + |f(u) - f(x_m)| + \sum_{k=m}^{n-1} |f(x_{k+1}) - f(x_k)| \\
&\le \operatorname{var}_{T_1} f + \operatorname{var}_{T_2} f .
\end{aligned} \tag{4.3.15}
\]
Taking the supremum over all such partitions $P$ in (4.3.15) and combining with (4.3.14), we conclude that $\operatorname{var}_T f = \operatorname{var}_{T_1} f + \operatorname{var}_{T_2} f$.
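The additivity in part (b) can be illustrated numerically. The sketch below is not from the original text: the names `variation_sum` and `uniform_partition`, the function $\sin$ on $[0, 2\pi]$, and the split point $u = 1$ are assumptions chosen for illustration. Partition sums only give lower bounds for the variation, so the printed values approximate $\operatorname{var}_{[0,1]} \sin = \sin 1$, $\operatorname{var}_{[1,2\pi]} \sin = 4 - \sin 1$ and $\operatorname{var}_{[0,2\pi]} \sin = 4$ from below.

```python
import math


def variation_sum(f, points):
    """Sum of |f(x_{k+1}) - f(x_k)| over consecutive points of a partition."""
    return sum(abs(f(points[k + 1]) - f(points[k])) for k in range(len(points) - 1))


def uniform_partition(a, b, n):
    """Partition of [a, b] into n subintervals of equal length."""
    return [a + (b - a) * k / n for k in range(n + 1)]


a, u, b = 0.0, 1.0, 2.0 * math.pi    # illustrative interval and split point
n = 20000
p_left = uniform_partition(a, u, n)
p_right = uniform_partition(u, b, n)

left = variation_sum(math.sin, p_left)
right = variation_sum(math.sin, p_right)
# The union of the two partitions is a partition of [a, b] that contains u.
whole = variation_sum(math.sin, p_left + p_right[1:])
print(left, right, whole)
```

Because the combined partition contains $u$, its sum splits exactly into the two partial sums, which mirrors the argument leading to (4.3.14).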

(c): Evidently we may assume that $\operatorname{var}_T f > 0$. We consider the case $\sup T \notin T$; the other case is treated similarly. Let $\eta \in (0, \operatorname{var}_T f)$ and consider a partition $P = \{x_k\}_{k=0}^n$ such that $\eta < \sum_{k=0}^{n-1} |f(x_{k+1}) - f(x_k)|$. Consider $u \in (x_n, \sup T)$. Then $P$ is a partition of $T \cap (-\infty, u]$ and so from part (b) we conclude that
\[
\eta < \sum_{k=0}^{n-1} |f(x_{k+1}) - f(x_k)| \le \operatorname{var}_{T\cap(-\infty,u]} f \le \operatorname{var}_T f .
\]

Hence, η

0 we have
\[
\frac1h \int_{T_h} |f(x+h) - f(x)|\,dx \le \operatorname{var}_T f ,
\]
where $T_h = \{x \in T : x + h \in T\}$.

Proof. Clearly we may assume that $\operatorname{var}_T f < \infty$. Consider $[a,b] \subseteq T$ with $0 < h \le b - a$. From (4.3.18) and Proposition 4.3.20, we get
\[
\frac1h \int_a^{b-h} |f(x+h) - f(x)|\,dx \le \frac1h \int_a^{b-h} (V(x+h) - V(x))\,dx \le V(b) - V(a) = \operatorname{var}_{[a,b]} f , \tag{4.3.19}
\]
since $V$ is increasing; see Proposition 4.3.29. Let $T_n = [a_n, b_n]$ with $n \in \mathbb{N}$ be an increasing sequence of subintervals of $T$ such that $a_n \searrow \inf T$ and $b_n \nearrow \sup T$. Assume that $\lambda(T_h) > 0$, otherwise the result is obvious. Then for large enough $n \in \mathbb{N}$, one has $0 < h < b_n - a_n$ and so from (4.3.19) it follows that
\[
\frac1h \int_{a_n}^{b_n - h} |f(x+h) - f(x)|\,dx \le \operatorname{var}_{[a_n,b_n]} f \le \operatorname{var}_T f \quad\text{for all } n \in \mathbb{N} \text{ large enough} .
\]
Passing to the limit as $n \to \infty$ and using the Lebesgue Monotone Convergence Theorem, we obtain
\[
\frac1h \int_{T_h} |f(x+h) - f(x)|\,dx \le \operatorname{var}_T f .
\]

Now we come to the theorem that characterizes functions of bounded variation.

Theorem 4.3.31. If $T \subseteq \mathbb{R}$ is an interval, then the smallest vector space containing all monotone functions (resp. all bounded monotone functions) is $BV_{\mathrm{loc}}(T)$ (resp. $BV(T)$). Moreover every $f \in BV_{\mathrm{loc}}(T)$ (resp. $f \in BV(T)$) can be written as the difference of two increasing functions (resp. of two bounded increasing functions).

Proof. Let $f, h : T \to \mathbb{R}$. From Definition 4.3.22(b) we see that for every subinterval $I \subseteq T$ we have $\operatorname{var}_I(\vartheta f) = |\vartheta| \operatorname{var}_I f$ for all $\vartheta \in \mathbb{R}$ and $\operatorname{var}_I(f + h) \le \operatorname{var}_I f + \operatorname{var}_I h$. Hence, $BV_{\mathrm{loc}}(T)$ (resp. $BV(T)$) is a vector space. Proposition 4.3.27 implies that the monotone functions (resp. the bounded monotone functions) belong to $BV_{\mathrm{loc}}(T)$ (resp. to $BV(T)$). Moreover, if $f \in BV_{\mathrm{loc}}(T)$ (resp. $f \in BV(T)$), then $f = V - (V - f)$ and both $V$, $V - f$ are increasing; see Proposition 4.3.29. So, $BV_{\mathrm{loc}}(T)$ (resp. $BV(T)$) is the smallest vector space containing the monotone (resp. bounded monotone) functions.

Corollary 4.3.32. $f \in BV([a,b])$ if and only if $f$ is the difference of two increasing functions.

Corollary 4.3.33. If $T \subseteq \mathbb{R}$ is an interval and $f \in BV_{\mathrm{loc}}(T)$, then $f$ has at most countably many discontinuity points, $f'$ exists $\lambda$-a.e. and
\[
\int_a^b |f'|\,dx \le \operatorname{var}_{[a,b]} f \quad\text{for all } [a,b] \subseteq T .
\]
Moreover, if $f \in BV(T)$, then $f' \in L^1(T)$ and for $x_0 \in T$, one has
\[
\int_T |f'|\,dx \le \int_T |V_{x_0}'|\,dx \le \sup_T V_{x_0} - \inf_T V_{x_0} = \operatorname{var}_T f .
\]

Proposition 4.3.34. If $f \in BV([a,b]) \cap C([a,b])$, then $V_a(x) = \operatorname{var}_{[a,x]} f$ is continuous on $[a,b]$.

Proof. Let $x_0 \in [a,b)$. We show that $V_a$ is right continuous at $x_0$. Given $\varepsilon > 0$, let $\{x_k\}_{k=0}^n \subseteq [x_0, b]$ be a partition such that
\[
\operatorname{var}_{[x_0,b]} f - \varepsilon \le \sum_{k=0}^{n-1} |f(x_{k+1}) - f(x_k)| = s_n . \tag{4.3.20}
\]
Clearly the sum $s_n$ only increases if we add new points to the partition. So, by the continuity of $f$ at $x_0$, we may assume that $|f(x_1) - f(x_0)| \le \varepsilon$. Then from (4.3.20), we infer that
\[
\operatorname{var}_{[x_0,b]} f \le \varepsilon + \sum_{k=0}^{n-1} |f(x_{k+1}) - f(x_k)| \le 2\varepsilon + \sum_{k=1}^{n-1} |f(x_{k+1}) - f(x_k)| \le 2\varepsilon + \operatorname{var}_{[x_1,b]} f .
\]
Hence, $\operatorname{var}_{[x_0,x_1]} f \le 2\varepsilon$ and so $V_a(x_1) - V_a(x_0) \le 2\varepsilon$. Since $\varepsilon > 0$ is arbitrary and $V_a$ is increasing, we conclude that $\lim_{x\to x_0^+} V_a(x) = V_a(x_0)$ and this proves the right continuity of $V_a$ at $x_0 \in [a,b)$. Similarly we show the left continuity of $V_a$ at $\hat x \in (a,b]$. Therefore $V_a$ is continuous.

Corollary 4.3.35. If $f \in BV([a,b]) \cap C([a,b])$, then $f$ can be written as the difference of two continuous increasing functions.

Proof. Note that $f = V_a - (V_a - f)$, and use Propositions 4.3.34 and 4.3.29.

We have already seen that $f \mapsto \operatorname{var}_T f$ is absolutely homogeneous, that is, $\operatorname{var}_T(\vartheta f) = |\vartheta| \operatorname{var}_T f$ for all $\vartheta \in \mathbb{R}$, and subadditive, that is, $\operatorname{var}_T(f + h) \le \operatorname{var}_T f + \operatorname{var}_T h$. So, it is almost a norm. What fails is that $\operatorname{var}_T f = 0$ does not imply $f \equiv 0$; it only implies that $f$ is constant. This can be remedied if we add the term $|f(u)|$ for some $u \in T$. This leads to the next result.

Proposition 4.3.36. If $T \subseteq \mathbb{R}$ is an interval and $u \in T$, then $f \mapsto |f(u)| + \operatorname{var}_T f = \|f\|_{BV}$ is a norm on $BV(T)$.

So, $BV(T)$ is a normed space. It is natural to ask whether it is complete. This will be proved using the so-called "Helly's Selection Theorem." To prove this result we will need some auxiliary results that are actually of independent interest.

Proposition 4.3.37. If $T \subseteq \mathbb{R}$ is an interval and $H = \{f\}$ is an infinite family of functions $f : T \to \mathbb{R}$ such that $|f(x)| \le M$ for all $x \in T$, for all $f \in H$ and for some $M > 0$, then for every countable set $D \subseteq T$ there exists a sequence $\{f_n\}_{n\ge1} \subseteq H$ such that $\lim_{n\to\infty} f_n(x)$ exists in $\mathbb{R}$ for all $x \in D$.

Proof. Let $D = \{x_k\}_{k\ge1}$. Then $\{f(x_1) : f \in H\} \subseteq \mathbb{R}$ is bounded. So, there exists a sequence $\{f_n^1\}_{n\ge1} \subseteq H$ such that $\eta_1 = \lim_{n\to\infty} f_n^1(x_1)$ exists. For this sequence of functions we consider the real sequence $\{f_n^1(x_2)\}_{n\ge1} \subseteq \mathbb{R}$. Then we find a subsequence $\{f_n^2\}_{n\ge1}$ of $\{f_n^1\}_{n\ge1}$ such that $\eta_2 = \lim_{n\to\infty} f_n^2(x_2)$ exists. Inductively, for every $k \in \mathbb{N}$ with $k \ge 2$, there exists a subsequence $\{f_n^k\}_{n\ge1}$ of $\{f_n^{k-1}\}_{n\ge1}$ for which $\eta_k = \lim_{n\to\infty} f_n^k(x_k)$ exists.
We form the sequence $\{f_n^n\}_{n\ge1}$ of diagonal elements based on the Cantor diagonalization process. Then, for every $x_k \in D$, we obtain that $\{f_n^n(x_k)\}_{n\ge k}$ is a subsequence of $\{f_n^k(x_k)\}_{n\ge1}$ and so it converges to $\eta_k$.

Proposition 4.3.38. If $T \subseteq \mathbb{R}$ is an interval and $H = \{f\}$ is an infinite family of increasing functions $f : T \to \mathbb{R}$ such that $|f(x)| \le M$ for all $x \in T$, for all $f \in H$ and for some $M > 0$, then there exists a sequence $\{f_n\}_{n\ge1} \subseteq H$ and an increasing function $f_* : T \to \mathbb{R}$ such that $f_n(x) \to f_*(x)$ for all $x \in T$ as $n \to \infty$.

Proof. Let $D$ be the set of rational points of $T$ together with the endpoints of $T$ which belong to $T$. We apply Proposition 4.3.37 to find a sequence $\{f_n\}_{n\ge1} \subseteq H$ such that $f(x) = \lim_{n\to\infty} f_n(x)$ for all $x \in D$. If $x, y \in D$ with $x \le y$, then $f_n(x) \le f_n(y)$ for all $n \in \mathbb{N}$. Hence, $f(x) \le f(y)$. We extend $f$ on $T$ by setting
\[
f(x) = \sup [f(y) : y \in D,\ y < x] .
\]
Then $f$ is clearly increasing and so the discontinuity points of $f$ are at most countable. We will show that $f_n(x) \to f(x)$ at every continuity point $x$ of $f$. So, let $\varepsilon > 0$ and $x_i, x_k \in D$ be given such that $x_i < x < x_k$ and $f(x_k) - f(x_i) \le \varepsilon/2$. There exists $n_0 \in \mathbb{N}$ such that $|f_n(x_k) - f(x_k)| \le \varepsilon/2$ and $|f_n(x_i) - f(x_i)| \le \varepsilon/2$ for all $n \ge n_0$. Recall that $f$ is increasing. So, we obtain
\[
f(x_i) \le f(x) \le f(x_k) \le f(x_i) + \frac{\varepsilon}{2} , \qquad f(x_i) - f_n(x_i) \le |f(x_i) - f_n(x_i)| \le \frac{\varepsilon}{2} .
\]
Hence,
\[
f(x) \le f_n(x_i) + \varepsilon \quad\text{for all } n \ge n_0 . \tag{4.3.21}
\]
Similarly we show that
\[
f_n(x_k) \le f(x) + \varepsilon \quad\text{for all } n \ge n_0 . \tag{4.3.22}
\]
From (4.3.21) and (4.3.22) it follows that
\[
f(x) - \varepsilon \le f_n(x_i) \le f_n(x) \le f_n(x_k) \le f(x) + \varepsilon \quad\text{for all } n \ge n_0 .
\]
Thus, $f_n(x) \to f(x)$ as $n \to \infty$ for every continuity point $x$ of $f$. Let $E \subseteq T$ be the countable set where $f$ is not continuous. Then Proposition 4.3.37 says that there exists a subsequence of $\{f_n\}_{n\ge1}$, still denoted by the same index, such that $f_n(y) \to \hat f(y)$ for all $y \in E$ as $n \to \infty$. Finally let
\[
f_*(x) = \begin{cases} f(x) & \text{if } x \in T \setminus E , \\ \hat f(x) & \text{if } x \in E . \end{cases}
\]
Then $f_n(x) \to f_*(x)$ for all $x \in T$, and $f_*$ is increasing as a pointwise limit of increasing functions.

Now we are ready to state and prove "Helly's Selection Theorem."

Theorem 4.3.39 (Helly's Selection Theorem). If $T \subseteq \mathbb{R}$ is an interval and $H \subseteq BV(T)$ is an infinite subset such that $\|f\|_{BV} = |f(u)| + \operatorname{var}_T f \le M$ for all $f \in H$, for some $u \in T$ and for some $M > 0$, then there exists a sequence $\{f_n\}_{n\ge1} \subseteq H$ and a function $f \in BV(T)$ such that $f_n(x) \to f(x)$ for all $x \in T$.

Proof. Corollary 4.3.32 says that $f = V_f - (V_f - f)$ for all $f \in H$ with $V_f$ as in Proposition 4.3.29. It holds that $V_f$ and $V_f - f$ are increasing and we obtain
\[
|V_f(x)| \le M \quad\text{and}\quad |V_f(x) - f(x)| \le |V_f(x)| + |f(x) - f(u)| + |f(u)| \le 3M
\]
for all $x \in T$. Using Proposition 4.3.38 there is a sequence $\{f_n\}_{n\ge1} \subseteq H$ and an increasing function $h_1 : T \to \mathbb{R}$ such that $V_{f_n}(x) \to h_1(x)$ for all $x \in T$. A new application of Proposition 4.3.38 on $\{V_{f_n} - f_n\}$ implies that there exists a subsequence $\{V_{f_{n_k}} - f_{n_k}\}_{k\ge1}$ of $\{V_{f_n} - f_n\}_{n\ge1}$ and an increasing function $h_2 : T \to \mathbb{R}$ such that $(V_{f_{n_k}} - f_{n_k})(x) \to h_2(x)$ for all $x \in T$. Theorem 4.3.31 implies that $f = h_1 - h_2 \in BV(T)$ and $f_{n_k}(x) \to f(x)$ for all $x \in T$.
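The decomposition $f = V_f - (V_f - f)$ used in the proof can be sketched on sampled data. This is a discrete illustration under stated assumptions, not the construction of Proposition 4.3.29 itself: the names `jordan_decomposition` and `is_nondecreasing`, the sample function $\sin(3x)$ on $[0, 2\pi]$, and the grid of 1001 points are all hypothetical choices. The cumulative sums below are the variation of the sampled sequence, which bounds the true variation from below.

```python
import math


def jordan_decomposition(values):
    """Given samples f(x_0), ..., f(x_n), return (V, W) where
    V[i] = sum_{k<i} |f(x_{k+1}) - f(x_k)| and W[i] = V[i] - f(x_i)."""
    V = [0.0]
    for k in range(len(values) - 1):
        V.append(V[-1] + abs(values[k + 1] - values[k]))
    W = [v - fx for v, fx in zip(V, values)]
    return V, W


def is_nondecreasing(seq, tol=1e-12):
    # A small tolerance absorbs floating-point rounding.
    return all(seq[i + 1] >= seq[i] - tol for i in range(len(seq) - 1))


xs = [2.0 * math.pi * k / 1000 for k in range(1001)]
fs = [math.sin(3.0 * x) for x in xs]          # illustrative BV function
V, W = jordan_decomposition(fs)
print(is_nondecreasing(V), is_nondecreasing(W), V[-1])
```

Both sequences are nondecreasing because each increment of $W$ equals $|\Delta f| - \Delta f \ge 0$, and $V[-1]$ is close to $\operatorname{var}_{[0,2\pi]} \sin(3x) = 12$.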

Using this theorem we can show that $(BV(T), \|\cdot\|_{BV})$ (see Proposition 4.3.36) is in fact a Banach space.

Theorem 4.3.40. If $T \subseteq \mathbb{R}$ is an interval, then $(BV(T), \|\cdot\|_{BV})$ is a Banach space.

Proof. Let $\{f_n\}_{n\ge1} \subseteq BV(T)$ be a Cauchy sequence. Hence it is $\|\cdot\|_{BV}$-bounded. So, by Theorem 4.3.39 there exists a subsequence $\{f_{n_k}\}_{k\ge1}$ of $\{f_n\}_{n\ge1}$ and a function $f \in BV(T)$ such that $f_{n_k}(x) \to f(x)$ for all $x \in T$. Given $\varepsilon > 0$ we find a number $n_0 = n_0(\varepsilon) \in \mathbb{N}$ such that
\[
\|f_n - f_m\|_{BV} = |f_n(u) - f_m(u)| + \operatorname{var}_T(f_n - f_m) \le \varepsilon \quad\text{for all } n, m \ge n_0 .
\]
This implies that
\[
\|f_n - f_{n_k}\|_{BV} = |f_n(u) - f_{n_k}(u)| + \operatorname{var}_T(f_n - f_{n_k}) \le \varepsilon \quad\text{for all } n, n_k \ge n_0 .
\]
Letting $k \to +\infty$ yields
\[
|f_n(u) - f(u)| + \limsup_{k\to\infty} \operatorname{var}_T(f_n - f_{n_k}) \le \varepsilon \quad\text{for all } n \ge n_0 . \tag{4.3.23}
\]
Claim: The map $f \mapsto \operatorname{var}_T f$ is lower semicontinuous for the pointwise convergence.

Let $f_n \to f$ pointwise and assume that $\operatorname{var}_T f > 0$, otherwise there is nothing to prove. Let $\eta \in (0, \operatorname{var}_T f)$ and let $\{x_k\}_{k=0}^m$ be a partition of $T$ such that
\[
\eta < \sum_{k=0}^{m-1} |f(x_{k+1}) - f(x_k)| .
\]
Exploiting the pointwise convergence, there exists a number $n_0 = n_0(\varepsilon) \ge 1$ such that
\[
|f_n(x_k) - f(x_k)| \le \frac{\varepsilon}{2m} \quad\text{for all } n \ge n_0 \text{ and for all } k = 0, \ldots, m .
\]
Then it follows that
\[
\eta < \sum_{k=0}^{m-1} |f(x_{k+1}) - f(x_k)| \le \sum_{k=0}^{m-1} \Big( |f_n(x_{k+1}) - f_n(x_k)| + \frac{\varepsilon}{m} \Big) \le \operatorname{var}_T f_n + \varepsilon
\]
for all $n \ge n_0$. Since $\varepsilon > 0$ is arbitrary, we conclude that $\eta \le \liminf_{n\to\infty} \operatorname{var}_T f_n$ and so $\operatorname{var}_T f \le \liminf_{n\to\infty} \operatorname{var}_T f_n$. This proves the claim.

Using the claim in (4.3.23) we obtain
\[
|f_n(u) - f(u)| + \operatorname{var}_T(f_n - f) \le \varepsilon \quad\text{for all } n \ge n_0 .
\]
Hence, $f_n \to f$ in $BV(T)$, and so the latter is a Banach space.

Remark 4.3.41. The space $BV(T)$ is not separable. Indeed, let $f_t = \chi_{\{t\}}$ with $t \in T$ and consider the open balls $B_t = \{h \in BV(T) : \|h - f_t\|_{BV} < 1\}$ with $t \in T$. Evidently if $t \ne s$, then $B_t \cap B_s = \emptyset$ and $\{B_t\}_{t\in T}$ is uncountable. So, $BV(T)$ must be nonseparable; see Problem 3.61.
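The disjointness of the balls in Remark 4.3.41 rests on the fact that $\|f_t - f_s\|_{BV} \ge 2$ for $t \ne s$. The following sketch evaluates one partition sum for a concrete pair; the interval $T = [0, 1]$, the points $t = 0.3$, $s = 0.7$, $u = 0$, the partition, and the helper names are illustrative assumptions. A partition sum is in general only a lower bound for the variation, which is all the argument needs.

```python
def indicator(t):
    """Characteristic function of the singleton {t}."""
    def chi(x):
        return 1.0 if x == t else 0.0
    return chi


def bv_norm_lower_bound(g, u, points):
    """|g(u)| plus the variation sum of g over the given partition.
    In general this is only a lower bound for the BV norm of g."""
    var_sum = sum(abs(g(points[k + 1]) - g(points[k])) for k in range(len(points) - 1))
    return abs(g(u)) + var_sum


t, s, u = 0.3, 0.7, 0.0                      # illustrative points of T = [0, 1]
f_t, f_s = indicator(t), indicator(s)


def difference(x):
    return f_t(x) - f_s(x)


partition = [0.0, 0.2, t, 0.5, s, 0.8, 1.0]  # contains t, s and points in between
distance = bv_norm_lower_bound(difference, u, partition)
print(distance)
```

Here the values of $f_t - f_s$ along the partition are $0, 0, 1, 0, -1, 0, 0$, so the sum equals $4$; since the distance between the centers is at least $2$, the open unit balls $B_t$ and $B_s$ cannot intersect.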

We conclude with a result characterizing continuous functions of bounded variation.

Definition 4.3.42. Let $A$, $B$ be nonempty sets, $D \subseteq A$, and $f : A \to B$. For every $b \in B$ we define
\[
N_f(b, D) = \begin{cases} \operatorname{card}\{a \in D : f(a) = b\} & \text{if the set is finite} , \\ +\infty & \text{otherwise} . \end{cases}
\]
Then $N_f(\cdot, D)$ is called the Banach indicatrix of $f$ on $D$.

The proof of the next theorem can be found in Leoni [195, p. 68] or Natanson [229, p. 225].

Theorem 4.3.43. If $T \subseteq \mathbb{R}$ is an interval and $f \in C(T)$, then $N_f(\cdot, T)$ is Borel and $\int_{\mathbb{R}} N_f(u, T)\,du = \operatorname{var}_T f$. Moreover, $f \in BV(T)$ if and only if $N_f(\cdot, T) \in L^1(\mathbb{R})$.
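The identity $\int_{\mathbb{R}} N_f(u, T)\,du = \operatorname{var}_T f$ can be checked numerically for a simple example. The sketch below is an illustration under stated assumptions, not part of the original text: it takes $f = \sin$ on $T = [0, 2\pi]$, approximates $N_f(u, T)$ by counting sign changes of $f - u$ along a uniform grid (the hypothetical helper `crossing_count`), and integrates over $u$ with the midpoint rule.

```python
import math


def crossing_count(values, level):
    """Number of sign changes of (f - level) along the sampled values;
    this approximates the Banach indicatrix N_f(level, T)."""
    count = 0
    for k in range(len(values) - 1):
        if (values[k] - level) * (values[k + 1] - level) < 0:
            count += 1
    return count


n_x, n_u = 2000, 200                             # grid sizes (assumptions)
samples = [math.sin(2.0 * math.pi * k / n_x) for k in range(n_x + 1)]
du = 2.0 / n_u                                   # the range of sin is [-1, 1]
levels = [-1.0 + (j + 0.5) * du for j in range(n_u)]
integral = sum(crossing_count(samples, u) * du for u in levels)
print(integral)                                  # close to var_{[0, 2*pi]} sin = 4
```

Every level $u \in (-1, 1)$ with $u \ne 0$ is attained exactly twice by $\sin$ on $[0, 2\pi]$, so the integral of the indicatrix is $2 \cdot 2 = 4$, in agreement with the total variation of $\sin$ over one period.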

4.4 Absolutely Continuous Functions

Functions of bounded variation, although differentiable almost everywhere, fail to satisfy the fundamental theorem of calculus for the Lebesgue integral. The Cantor function $h$ (see Remark 2.2.2) is continuous, increasing, and $h'(x) = 0$ for almost all $x \in [0,1]$. So, we have
\[
0 = \int_0^1 h'(x)\,dx < h(1) - h(0) = 1 .
\]
Hence, we need to go to a smaller space of functions. This smaller class of functions is given in the next definition.

Definition 4.4.1. Let $T \subseteq \mathbb{R}$ be an interval and $f : T \to \mathbb{R}$. We say that $f$ is absolutely continuous if for every $\varepsilon > 0$ there exists a $\delta > 0$ such that
\[
\Big| \sum_{k=1}^n (f(b_k) - f(a_k)) \Big| \le \varepsilon \tag{4.4.1}
\]
for any family of nonoverlapping open intervals $(a_k, b_k)$, $k = 1, \ldots, n$ with $[a_k, b_k] \subseteq T$ and $\sum_{k=1}^n (b_k - a_k) \le \delta$. We denote the space of absolutely continuous functions by $AC(T)$. We say that $f : T \to \mathbb{R}$ is locally absolutely continuous if it is absolutely continuous in $[a,b]$ for every $[a,b] \subseteq T$. We denote the space of locally absolutely continuous functions by $AC_{\mathrm{loc}}(T)$. Of course it holds that $AC([a,b]) = AC_{\mathrm{loc}}([a,b])$.

Remark 4.4.2. If $U \subseteq \mathbb{R}$ is open, then the definition above is still valid if instead of $T$ we use $U$. Now we require that $[a_k, b_k] \subseteq U$ for all $k = 1, \ldots, n$. In the definition above, $n \in \mathbb{N}$ is arbitrary and in fact we can allow it to be $+\infty$ by replacing the finite sums by infinite series. In the definition of absolute continuity of a function, without altering the definition, we can replace (4.4.1) by the following stronger requirement
\[
\sum_{k=1}^n |f(b_k) - f(a_k)| \le \varepsilon . \tag{4.4.2}
\]
Indeed, let $\delta > 0$ correspond to $\varepsilon/2$ in (4.4.1) and let $\{(a_k, b_k)\}_{k=1}^n$ be a family of pairwise disjoint open intervals such that $[a_k, b_k] \subseteq T$ and $\sum_{k=1}^n (b_k - a_k) \le \delta$. Let $L_1$ be the collection of subintervals $[a_k, b_k]$ for which $f(b_k) - f(a_k) \ge 0$ and $L_2$ the collection of subintervals for which $f(b_k) < f(a_k)$. Then
\[
\sum_{L_1} |f(b_k) - f(a_k)| = \Big| \sum_{L_1} (f(b_k) - f(a_k)) \Big| \le \frac{\varepsilon}{2}
\quad\text{and}\quad
\sum_{L_2} |f(b_k) - f(a_k)| = \Big| \sum_{L_2} (f(b_k) - f(a_k)) \Big| \le \frac{\varepsilon}{2} .
\]
Therefore (4.4.2) holds.

If we take $n = 1$ in Definition 4.4.1, this leads to the following result.

Proposition 4.4.3. If $f \in AC(T)$, then $f$ is uniformly continuous.

Remark 4.4.4. The converse is not true. The function
\[
f(x) = \begin{cases} x \sin\big(\frac1x\big) & \text{if } x \in (0,1] , \\ 0 & \text{if } x = 0 , \end{cases}
\]
is uniformly continuous on $[0,1]$ but not absolutely continuous. On the other hand the function
\[
f(x) = \begin{cases} x^{1+\varepsilon} \sin\big(\frac1x\big) & \text{if } x \in (0,1] , \\ 0 & \text{if } x = 0 , \end{cases}
\]
is absolutely continuous on $[0,1]$ for $\varepsilon > 0$. Now if $f \in AC(T)$, then, on account of the previous proposition, $f$ is uniformly continuous and so it can be extended uniquely to $\overline{T}$ as a uniformly continuous function $\hat f$; see Theorem 1.5.27. Then $\hat f \in AC(\overline{T})$.

Another straightforward consequence of Definition 4.4.1 is the following proposition.

Proposition 4.4.5. If $f, h \in AC([a,b])$ and $c \in \mathbb{R}$, then the following hold:
(a) $f \pm ch \in AC([a,b])$;
(b) $fh \in AC([a,b])$;
(c) if $h > 0$, then $f/h \in AC([a,b])$.

Proposition 4.4.6. If $f \in AC([a,b])$, $f([a,b]) \subseteq [c,d]$ and $\xi : [c,d] \to \mathbb{R}$ is Lipschitz continuous, then $h = \xi \circ f \in AC([a,b])$.

Proof. Suppose that $|\xi(x) - \xi(u)| \le \eta |x - u|$ for all $x, u \in [c,d]$ and with $\eta > 0$. Then for any family $\{(a_k, b_k)\}_{k=1}^n$ of pairwise disjoint open intervals, we obtain
\[
\Big| \sum_{k=1}^n \big( \xi(f(b_k)) - \xi(f(a_k)) \big) \Big| \le \eta \sum_{k=1}^n |f(b_k) - f(a_k)| .
\]
Hence, $\xi \circ f \in AC([a,b])$.

Proposition 4.4.7. If $T \subseteq \mathbb{R}$ is an interval and $f \in AC_{\mathrm{loc}}(T)$ (resp. $f \in AC(T)$), then $f \in BV_{\mathrm{loc}}(T)$ (resp. $f \in BV(I)$ for every bounded subinterval $I \subseteq T$).

Proof. First assume that $f \in AC_{\mathrm{loc}}(T)$. Let $[a,b] \subseteq T$. Choose $\varepsilon = 1$ and let $\delta > 0$ be as in Definition 4.4.1. Moreover, let $n = [(2(b-a))/\delta]$ be the integer part of $(2(b-a))/\delta$ and partition $[a,b]$ into $n$ intervals $[x_k, x_{k+1}]$ of length $(b-a)/n$ with $k = 0, \ldots, n-1$, that is, $a = x_0 < x_1 < \ldots < x_n = b$. Recall $\varepsilon = 1$. It follows that
\[
\operatorname{var}_{[x_k,x_{k+1}]} f \le 1 \quad\text{for all } k = 0, \ldots, n-1 ,
\]
which implies
\[
\operatorname{var}_{[a,b]} f = \sum_{k=0}^{n-1} \operatorname{var}_{[x_k,x_{k+1}]} f \le n \le \frac{2(b-a)}{\delta} < +\infty .
\]
Hence $f \in BV_{\mathrm{loc}}(T)$.

Next assume that $f \in AC(T)$ and let $I \subseteq T$ be a bounded subinterval. We know that $f$ can be extended to $\overline{I}$ and for the extension $\hat f$ we have $\hat f \in AC(\overline{I})$; see Remark 4.4.4. Then from the first part of the proof it follows that $\operatorname{var}_{\overline{I}} \hat f < +\infty$ and so $\operatorname{var}_I f < +\infty$.

Corollary 4.4.8.
(a) If $T \subseteq \mathbb{R}$ is an interval and $f \in AC_{\mathrm{loc}}(T)$, then $f$ is differentiable a.e. on $T$ and $f' \in L^1_{\mathrm{loc}}(T)$.
(b) If $T \subseteq \mathbb{R}$ is a bounded interval, then $AC(T) \subseteq BV(T)$ and for $f \in AC(T)$ it holds that $f' \in L^1(T)$.

Remark 4.4.9. The inclusion above is clearly strict. Consider a monotone discontinuous function. Moreover, there exist continuous monotone functions that are not absolutely continuous. Think of the Cantor function; see Remark 2.2.2.

So, it is natural to ask what is missing from a continuous function $f \in BV_{\mathrm{loc}}(T)$ in order to be absolutely continuous. The property that we seek is given in the next definition.

Definition 4.4.10. Let $T \subseteq \mathbb{R}$ be an interval and $f : T \to \mathbb{R}$ a function. We say that $f$ satisfies Lusin's Condition (N) if $f$ maps sets of Lebesgue measure zero to sets of Lebesgue measure zero.

Proposition 4.4.11. If $T \subseteq \mathbb{R}$ is an interval and $f : T \to \mathbb{R}$ is a continuous function, then $f$ maps Lebesgue measurable sets to Lebesgue measurable sets if and only if $f$ satisfies Lusin's Condition (N).

Proof. $\Rightarrow$: Arguing by contradiction, suppose that $f$ does not satisfy Lusin's Condition (N). Then we can find a Lebesgue-null set $D_0 \subseteq T$ such that $0 < \lambda^*(f(D_0))$ with $\lambda^*$ being the Lebesgue outer measure on $\mathbb{R}$. Then $f(D_0)$ contains a nonmeasurable set $E$. Let $C \subseteq D_0$ such that $f(C) = E$. Since $C$ is a subset of a Lebesgue-null set, it is Lebesgue measurable. But $E$ is not, which is a contradiction to the hypothesis.

$\Leftarrow$: Let $A \subseteq T$ be a Lebesgue measurable set. Then $A = C \cup D$ with $C$ being $\sigma$-compact and $D$ being Lebesgue-null. Here we use the fact that the Lebesgue measure is Radon; see Theorem 2.5.14. Then it follows that
\[
f(A) = f(C \cup D) = f(C) \cup f(D) . \tag{4.4.3}
\]

Since $C = \bigcup_{n\ge1} K_n$ with compact $K_n$, then $f(C) = f(\bigcup_{n\ge1} K_n) = \bigcup_{n\ge1} f(K_n)$ and for each $n \in \mathbb{N}$, $f(K_n) \subseteq \mathbb{R}$ is compact. In addition, by hypothesis, $f(D)$ is Lebesgue-null. So, from (4.4.3) it follows that $f(A)$ is measurable.

Theorem 4.4.12. If $T \subseteq \mathbb{R}$ is an interval and $f : T \to \mathbb{R}$, then $f \in AC_{\mathrm{loc}}(T)$ if and only if
(a) $f$ is continuous on $T$;
(b) $f \in BV_{\mathrm{loc}}(T)$;
(c) $f$ satisfies Lusin's Condition (N).

Proof. Since the result is local we may assume that $T = [a,b]$. First we suppose that $f$ is absolutely continuous on $[a,b]$. Evidently $f$ is continuous and by Proposition 4.4.7, $f \in BV_{\mathrm{loc}}(T)$. So it remains to prove statement (c). Let $\varepsilon > 0$. There exists $\delta > 0$ such that for any finite (or countable) collection of mutually disjoint intervals $\{(a_k, b_k)\}_{k\ge1}$ with $[a_k, b_k] \subseteq [a,b]$,
\[
\sum_{k\ge1} (b_k - a_k) \le \delta \quad\text{implies}\quad \sum_{k\ge1} |f(b_k) - f(a_k)| \le \varepsilon . \tag{4.4.4}
\]
Let $D \subseteq T$ be Lebesgue-null and let $U = \bigcup_{k\ge1} (c_k, d_k)$ be an open set such that $D \subseteq U$ and $\lambda(U) = \sum_{k\ge1} (d_k - c_k) \le \delta$. It holds that
\[
f(D) \subseteq f(U) \subseteq f\Big( \bigcup_{k\ge1} [c_k, d_k] \Big) \subseteq \bigcup_{k\ge1} [f(m_k), f(M_k)]
\]
with $m_k, M_k \in [c_k, d_k]$ such that
\[
f(m_k) = \min[f(u) : u \in [c_k, d_k]] , \qquad f(M_k) = \max[f(u) : u \in [c_k, d_k]] .
\]
Taking into account (4.4.4) and recalling that $\sum_{k\ge1} |M_k - m_k| \le \delta$, we derive
\[
\lambda^*(f(D)) \le \sum_{k\ge1} [f(M_k) - f(m_k)] \le \varepsilon .
\]
Because $\varepsilon > 0$ is arbitrary, we let $\varepsilon \to 0^+$ to conclude that $\lambda(f(D)) = 0$, that is, $f(D)$ is Lebesgue-null. Therefore $f$ satisfies statement (c).

Now suppose that properties (a), (b), and (c) hold. Arguing by contradiction, suppose that $f$ is not absolutely continuous. So, there exists $\varepsilon_0 > 0$ such that for every $\delta > 0$ there is a pairwise disjoint family of open intervals $\{(a_k, b_k)\}_{k=1}^n$ in $[a,b]$ such that
\[
\sum_{k=1}^n (b_k - a_k) \le \delta \quad\text{and}\quad \sum_{k=1}^n (M_k - m_k) \ge \varepsilon_0
\]
with $m_k = \min[f(u) : u \in [a_k, b_k]]$ and $M_k = \max[f(u) : u \in [a_k, b_k]]$. Let $\sum_{m\ge1} \delta_m$ be a convergent series of positive terms. For each $\delta_m$, let $(a_k^m, b_k^m)$ with $k = 1, \ldots, n_m$ be pairwise disjoint open intervals for which
\[
\sum_{k=1}^{n_m} (b_k^m - a_k^m) \le \delta_m \quad\text{and}\quad \sum_{k=1}^{n_m} (M_k^m - m_k^m) \ge \varepsilon_0 . \tag{4.4.5}
\]
Let
\[
C_m = \bigcup_{k=1}^{n_m} (a_k^m, b_k^m) \quad\text{and}\quad B = \bigcap_{n\ge1} \bigcup_{m\ge n} C_m = \limsup_{m\to\infty} C_m .
\]
Then $\lambda(B) = 0$, since $\sum_{m\ge1} \lambda(C_m) \le \sum_{m\ge1} \delta_m < \infty$ (Borel-Cantelli), and so by (c) it follows that $\lambda(f(B)) = 0$. For $k = 1, \ldots, n_m$ with $m \in \mathbb{N}$, we define the following functions:
\[
\xi_k^m(u) = \begin{cases} 1 & \text{if } f(x) = u \text{ for some } x \in (a_k^m, b_k^m) , \\ 0 & \text{otherwise} . \end{cases}
\]
Then $\xi_k^m(u) = 1$ for all $u \in (m_k^m, M_k^m)$ and $\xi_k^m(u) = 0$ for all $u \notin [m_k^m, M_k^m]$. Therefore with $\hat m = \min_T f$, $\hat M = \max_T f$, that is $[\hat m, \hat M] = f([a,b])$, we infer that
\[
\int_{\hat m}^{\hat M} \xi_k^m(u)\,du = M_k^m - m_k^m . \tag{4.4.6}
\]
We set $N_m(u) = \sum_{k=1}^{n_m} \xi_k^m(u)$. Then $N_m(u)$ is the number of intervals $(a_k^m, b_k^m)$ that contain at least one $x$ with $f(x) = u$. So, $N_m(u) \le N_f(u, T)$, the latter being the indicatrix function of $f$ on $T = [a,b]$; see Definition 4.3.42. From (4.4.5) and (4.4.6) we obtain
\[
\int_{\hat m}^{\hat M} N_m(u)\,du \ge \varepsilon_0 . \tag{4.4.7}
\]
Let $E = \{u \in [\hat m, \hat M] : \lim_{m\to\infty} N_m(u) \ne 0\}$ and $F = \{u \in [\hat m, \hat M] : N_f(u, T) = +\infty\}$. From Theorem 4.3.43 one has that $N_f(\cdot, T) \in L^1(\mathbb{R})$. Therefore $\lambda(F) = 0$. Let $u_0 \in E \setminus F = E \cap F^c$. There exists a sequence $\{m_i\}_{i\ge1} \subseteq \mathbb{N}$ such that $N_{m_i}(u_0) \ge 1$. For every $i \in \mathbb{N}$ there exists $x_{m_i}$ such that
\[
f(x_{m_i}) = u_0 , \qquad x_{m_i} \in C_{m_i} . \tag{4.4.8}
\]
Since $N_f(u_0, T) < \infty$, there are only finitely many distinct $\{x_{m_i}\}_{i\in\mathbb{N}}$ such that (4.4.8) holds. Therefore one of them, call it $x_0$, occurs an infinite number of times in $\{x_{m_i}\}_{i\in\mathbb{N}}$ and $f(x_0) = u_0$. Then $x_0 \in B$ and $f(x_0) = u_0 \in f(B)$. Hence, we get $E \setminus F = E \cap F^c \subseteq f(B)$, which shows that $\lambda(E) = 0$ and so $\lim_{m\to\infty} N_m(u) = 0$ for almost all $u \in [\hat m, \hat M]$. By Lebesgue's Dominated Convergence Theorem (note that $0 \le N_m \le N_f(\cdot, T)$), we infer that $\lim_{m\to\infty} \int_{\hat m}^{\hat M} N_m(u)\,du = 0$. This contradicts (4.4.7). Thus, $f$ is absolutely continuous.

As a consequence of Proposition 4.4.11 and Theorem 4.4.12, we obtain the following corollary.

Corollary 4.4.13. If $T \subseteq \mathbb{R}$ is an interval and $f \in AC_{\mathrm{loc}}(T)$, then $f$ maps Lebesgue measurable sets to Lebesgue measurable sets.

Next we present some results about differentiable functions, which will help us better understand absolutely continuous functions.

Proposition 4.4.14. If $T \subseteq \mathbb{R}$ is an interval, $f : T \to \mathbb{R}$ is differentiable at every $x \in A$ with $A \subseteq T$ not necessarily measurable and $|f'(x)| \le M$ for all $x \in A$ and for some $M > 0$, then $\lambda^*(f(A)) \le M \lambda^*(A)$ with $\lambda^*$ being the Lebesgue outer measure.

Proof. Without any loss of generality, we assume that $A \subseteq \operatorname{int} T$. Fix $\varepsilon > 0$. For every $n \in \mathbb{N}$, let $A_n$ be the set of points $x \in A$ such that $\lambda^*(f(I)) \le (M + \varepsilon)\lambda(I)$ for all intervals $I \subseteq T$ with $x \in I$ and $\lambda(I) \in (0, 1/n)$. We see that $\{A_n\}_{n\ge1}$ is increasing. In addition, if $x \in A$, then $|f'(x)| \le M$

and

f (x) = lim

u→x

f(u) − f(x) . u−x

So, there exists δ > 0 such that |f(u) − f(x)| ≤ (M + ε)(u − x) for all u ∈ I with |u − x| ≤ δ .

(4.4.9)

Let u, v ∈ I with |u − v| ≤ δ and u < x < u . Thanks to (4.4.9), we obtain |f(u) − f(v)| ≤ |f(u) − f(x)| + |f(x) − f(v)| ≤ (M + ε)(v − u) . This gives x ∈ A n for every n ∈ ℕ with n > 1/δ. It follows that A = ⋃ An .

(4.4.10)

n≥1

We fix n ∈ ℕ and let U n be open such that λ(U n ) ≤ λ∗ (A n ) + ε. Replacing U n with U n ∩ int T if necessary, we may assume that U n ⊆ int T. We can write U n = ⋃k≥1 I kn with {I kn }k≥1 being pairwise disjoint intervals with 0 < λ(I kn ) < 1/n for all k ∈ ℕ. Let L = {k ∈ ℕ : I kn ∩ A n ≠ 0}. If k ∈ L, then λ∗ (f(I kn )) ≤ (M + ε)λ∗ (I kn ) and so λ∗ (f(A n )) ≤ λ∗ ( ⋃ f(I kn )) ≤ ∑ λ∗ (f(I kn )) ≤ (M + ε)λ (⋃ I kn ) k∈L

k∈L

k≥1

≤ (M + ε)λ(U n ) ≤ (M + ε)(λ (A n ) + ε) . ∗

Letting $n \to \infty$ gives $\lambda^*(f(A)) \le (M + \varepsilon)(\lambda^*(A) + \varepsilon)$; see (4.4.10). Finally we let $\varepsilon \searrow 0$ to conclude that $\lambda^*(f(A)) \le M\lambda^*(A)$.

Corollary 4.4.15. If $T \subseteq \mathbb{R}$ is an interval, $f : T \to \mathbb{R}$ is differentiable on $A \subseteq T$ and either $A$ is Lebesgue-null or $f'|_A = 0$, then $\lambda(f(A)) = 0$.

Proof. First suppose that $A$ is Lebesgue-null. For every $n \in \mathbb{N}$, let $A_n = \{x \in A : |f'(x)| \le n\}$. Then $A = \bigcup_{n \ge 1} A_n$ and by Proposition 4.4.14, we have that $\lambda^*(f(A_n)) \le n\lambda^*(A_n) = 0$. Since $f(A) = \bigcup_{n \ge 1} f(A_n)$, the countable subadditivity of $\lambda^*$ gives $\lambda(f(A)) = 0$. The case of $f'|_A = 0$ follows directly from Proposition 4.4.14 with $M = 0$.

Remark 4.4.16. If we assume that $f$ is almost everywhere differentiable on $A$ in the corollary above, then the result fails. Think of the Cantor function; see Remark 2.2.2.

Proposition 4.4.17. If $T \subseteq \mathbb{R}$ is an interval, $f : T \to \mathbb{R}$ is Lebesgue measurable, $A \subseteq T$ is Lebesgue measurable, and $f$ is differentiable on $A$, then $f(A) \subseteq \mathbb{R}$ is Lebesgue measurable and $\lambda(f(A)) \le \int_A |f'(x)|\,dx$.

Proof. From Corollary 4.4.15 we see that $f|_A$ satisfies Lusin's Condition (N) and so Proposition 4.4.11 implies that $f(A) \subseteq \mathbb{R}$ is Lebesgue measurable. First suppose that $\lambda(A) < +\infty$. We fix $n \in \mathbb{N}$ and for every $k \in \mathbb{N}$, we define

$$
A_k^n = \Big\{x \in A : \frac{k-1}{2^n} \le |f'(x)| < \frac{k}{2^n}\Big\} .
$$

We see that $\{A_k^n\}_{k \ge 1}$ are pairwise disjoint and $A = \bigcup_{k \ge 1} A_k^n$. Due to Proposition 4.4.14 and the σ-additivity of $\lambda$, we get

$$
\begin{aligned}
\lambda(f(A)) &= \lambda\Big(f\Big(\bigcup_{k \ge 1} A_k^n\Big)\Big) \le \sum_{k \ge 1} \lambda(f(A_k^n)) \le \sum_{k \ge 1} \frac{k}{2^n}\lambda(A_k^n) \\
&= \sum_{k \ge 1} \frac{k-1}{2^n}\lambda(A_k^n) + \frac{1}{2^n}\sum_{k \ge 1} \lambda(A_k^n) \le \sum_{k \ge 1} \int_{A_k^n} |f'(x)|\,dx + \frac{1}{2^n}\lambda(A) \\
&= \int_A |f'(x)|\,dx + \frac{1}{2^n}\lambda(A) \quad \text{for all } n \in \mathbb{N} .
\end{aligned}
$$

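Letting $n \to \infty$ we conclude that $\lambda(f(A)) \le \int_A |f'(x)|\,dx$. Next suppose that $\lambda(A) = +\infty$. For every $k \in \mathbb{Z}$ we set $A_k = A \cap [k, k+1)$. Then, by the countable subadditivity of $\lambda$ and the previous part, we obtain

$$
\lambda(f(A)) = \lambda\Big(f\Big(\bigcup_{k \in \mathbb{Z}} A_k\Big)\Big) = \lambda\Big(\bigcup_{k \in \mathbb{Z}} f(A_k)\Big) \le \sum_{k \in \mathbb{Z}} \lambda(f(A_k)) \le \sum_{k \in \mathbb{Z}} \int_{A_k} |f'(x)|\,dx = \int_A |f'(x)|\,dx .
$$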


Corollary 4.4.18. If $T \subseteq \mathbb{R}$ is an interval and $f : T \to \mathbb{R}$ is differentiable on $[a, b] \subseteq T$, then $|f(b) - f(a)| \le \int_a^b |f'(x)|\,dx$.

From this corollary we infer the following result.

Theorem 4.4.19. If $T \subseteq \mathbb{R}$ is an interval, $f : T \to \mathbb{R}$ is differentiable everywhere on $T$ and $f' \in L^1_{\mathrm{loc}}(T)$, then $f \in AC_{\mathrm{loc}}(T)$.

Next we will show that absolutely continuous functions are exactly the class of functions for which the fundamental theorem of calculus for the Lebesgue integral holds.

Lemma 4.4.20. If $h \in L^1[a, b]$ and $f(x) = \int_a^x h(t)\,dt$ for all $x \in [a, b]$, which is the indefinite Lebesgue integral of $h$, then $f \in AC([a, b])$ and $f' = h$ a.e.

Proof. From Proposition 2.3.42, we know that for a given $\varepsilon > 0$ there exists a $\delta > 0$ such that if $\{(a_k, b_k)\}_{k=1}^n$ are pairwise disjoint intervals such that $[a_k, b_k] \subseteq [a, b]$ for all $k = 1, \dots, n$ and $\sum_{k=1}^n (b_k - a_k) \le \delta$, then $\sum_{k=1}^n \big|\int_{a_k}^{b_k} h(t)\,dt\big| \le \varepsilon$. Then $\sum_{k=1}^n |f(b_k) - f(a_k)| \le \varepsilon$ and so $f \in AC([a, b])$; see Definition 4.4.1. Clearly if $h$ is simple, then $f' = h$ a.e. For the general case, use Corollary 2.2.19.

Theorem 4.4.21. If $T \subseteq \mathbb{R}$ is an interval and $f : T \to \mathbb{R}$, then $f \in AC_{\mathrm{loc}}(T)$ if and only if
(a) $f$ is continuous on $T$;
(b) $f$ is differentiable a.e. on $T$ and $f' \in L^1_{\mathrm{loc}}(T)$;
(c) $f(x) = f(c) + \int_c^x f'(t)\,dt$ for all $c, x \in T$.

Proof. First assume that $f \in AC_{\mathrm{loc}}(T)$. On account of Corollary 4.4.8 we need to prove (c). Let $[a, b] \subseteq T$ such that $c \in [a, b]$. From Lemma 4.4.20 we know that $f \in AC([a, b])$ and there exists a Lebesgue-null set $D \subseteq [a, b]$ such that $f$ is differentiable on $[a, b] \setminus D$. If we set $g(x) = f(x) - \big[f(c) + \int_c^x f'(t)\,dt\big]$, then $g'(x) = 0$ for all $x \in [a, b] \setminus D$. Then Corollary 4.4.15 implies that $\lambda(g([a, b] \setminus D)) = 0$. Note that $g \in AC([a, b])$. So, from Theorem 4.4.12, it maps Lebesgue-null sets to Lebesgue-null sets. Hence, $\lambda(g(D)) = 0$. Therefore it follows that $\lambda(g([a, b])) = 0$. The continuity of $g$ implies that $g([a, b])$ is either a singleton or a nondegenerate interval. A nondegenerate interval has positive measure, so the second possibility is excluded and we conclude that $g$ is constant. We have $g(c) = 0$, hence $g|_{[a,b]} = 0$ and so $f(x) = f(c) + \int_c^x f'(t)\,dt$ for all $x \in [a, b]$.

Conversely if (a), (b), and (c) hold, then from Lemma 4.4.20 we get $f \in AC_{\mathrm{loc}}(T)$.
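The Cantor function mentioned in Remark 4.4.16 shows why continuity and almost everywhere differentiability alone do not give condition (c): it is continuous and increasing, yet it is not absolutely continuous. The following minimal sketch (the helper names `cantor`, `cantor_intervals`, `total_length`, and `total_increment` are illustrative and not taken from the text) evaluates the Cantor function exactly on ternary rationals and checks that the $2^n$ closed intervals of the $n$-th construction step have total length $(2/3)^n$ while the increments of the function over them always sum to $1$. Hence no $\delta$ in the definition of absolute continuity can work for $\varepsilon < 1$.

```python
from fractions import Fraction


def cantor(x, depth=64):
    """Cantor function on [0, 1]; exact for rationals with a finite ternary expansion."""
    x = Fraction(x)
    if x <= 0:
        return Fraction(0)
    if x >= 1:
        return Fraction(1)
    value, scale = Fraction(0), Fraction(1, 2)
    for _ in range(depth):
        x *= 3
        digit = int(x)          # next ternary digit: 0, 1 or 2
        x -= digit
        if digit == 1:          # x lies in a removed middle third: the function is constant there
            return value + scale
        if digit == 2:
            value += scale
        scale /= 2
        if x == 0:              # ternary expansion terminated
            break
    return value


def cantor_intervals(level):
    """Closed intervals remaining after `level` steps of the Cantor set construction."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(level):
        refined = []
        for a, b in intervals:
            third = (b - a) / 3
            refined.append((a, a + third))
            refined.append((b - third, b))
        intervals = refined
    return intervals


def total_length(level):
    return sum(b - a for a, b in cantor_intervals(level))


def total_increment(level):
    return sum(cantor(b) - cantor(a) for a, b in cantor_intervals(level))


for level in (1, 4, 8):
    print(level, float(total_length(level)), total_increment(level))
```

Exact rational arithmetic is used so that the comparison is not blurred by rounding; the total length tends to zero while the total increment stays equal to one.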

Corollary 4.4.22. If $T \subseteq \mathbb{R}$ is an interval, $f : T \to \mathbb{R}$ is everywhere differentiable and $f' \in L^1_{\mathrm{loc}}(T)$, then $f(x) = f(c) + \int_c^x f'(t)\,dt$ for all $c, x \in T$.

An important consequence of Theorem 4.4.21 is the following formula known as "Integration by Parts."

Proposition 4.4.23 (Integration by Parts). If $T \subseteq \mathbb{R}$ is an interval and $f, h \in AC_{\mathrm{loc}}(T)$, then

$$
\int_c^x f h'\,dt + \int_c^x f' h\,dt = (fh)(x) - (fh)(c) \quad \text{for all } c, x \in T .
$$

Proof. Recall that $fh \in AC_{\mathrm{loc}}(T)$; see Proposition 4.4.5. So, from Theorem 4.4.21, we obtain

$$
(fh)(x) - (fh)(c) = \int_c^x (fh)'(t)\,dt .
$$

Since $f$, $h$, $fh$ are differentiable almost everywhere, one has $(fh)'(x) = f'(x)h(x) + f(x)h'(x)$ for a.a. $x \in T$ and so we obtain the desired formula.

Now we will use Theorem 4.4.21 to make a connection between the notions of absolute continuity for functions and for Lebesgue–Stieltjes measures; see Example 2.1.35.

Proposition 4.4.24. A continuous increasing function $f : [a, b] \to \mathbb{R}$ is absolutely continuous if and only if the corresponding Lebesgue–Stieltjes measure $\vartheta_f([a, b]) = f(b) - f(a)$ is absolutely continuous with respect to the Lebesgue measure $\lambda$, that is, $\vartheta_f \ll \lambda$.

Proof. ⇒: From Theorem 4.4.21 we get $\vartheta_f(A) = \lambda(f(A))$ for all Lebesgue measurable $A \subseteq [a, b]$. Then invoking Theorem 4.4.12, we see that $\vartheta_f \ll \lambda$.

⇐: Since $\vartheta_f$ is finite on $[a, b]$ and $\vartheta_f \ll \lambda$, for a given $\varepsilon > 0$ there exists $\delta > 0$ such that

$$
\lambda(A) \le \delta \quad \text{implies} \quad \vartheta_f(A) \le \varepsilon ; \tag{4.4.11}
$$

see Proposition 2.4.11. Suppose that $\{(a_k, b_k)\}_{k=1}^n$ are pairwise disjoint open intervals with $[a_k, b_k] \subseteq [a, b]$ for all $k = 1, \dots, n$. Then for $A = \bigcup_{k=1}^n (a_k, b_k)$, from (4.4.11), one obtains that

$$
\lambda(A) = \sum_{k=1}^n (b_k - a_k) \le \delta \quad \text{implies} \quad \vartheta_f(A) = \sum_{k=1}^n [f(b_k) - f(a_k)] \le \varepsilon .
$$

Hence, $f \in AC([a, b])$.

Corollary 4.4.25. If $T \subseteq \mathbb{R}$ is an interval, $f : T \to \mathbb{R}$ is a monotone function and $[a, b] \subseteq T$, then $f \in AC([a, b])$ if and only if $|f(b) - f(a)| = \int_a^b |f'(x)|\,dx$. Moreover, if $f$ is bounded, then $f \in AC(T)$ if and only if $\int_T |f'(x)|\,dx = \sup_T f - \inf_T f$.

This corollary leads to the following result.

Theorem 4.4.26. If $T \subseteq \mathbb{R}$ is an interval, $f \in BV_{\mathrm{loc}}(T)$, and $[a, b] \subseteq T$, then $f \in AC([a, b])$ if and only if $\int_a^b |f'(x)|\,dx = \operatorname{var}_{[a,b]} f$. Moreover, if $f \in BV(T)$, then $f \in AC(T)$ if and only if $\int_T |f'(x)|\,dx = \operatorname{var}_T f$.


Proof. ⇒: Let $\{x_k\}_{k=0}^n$ be a partition of $[a, b]$. Then

$$
\sum_{k=0}^{n-1} |f(x_{k+1}) - f(x_k)| = \sum_{k=0}^{n-1} \Big|\int_{x_k}^{x_{k+1}} f'(x)\,dx\Big| \le \sum_{k=0}^{n-1} \int_{x_k}^{x_{k+1}} |f'(x)|\,dx = \int_a^b |f'(x)|\,dx .
$$

Hence,

$$
\operatorname{var}_{[a,b]} f \le \int_a^b |f'(x)|\,dx . \tag{4.4.12}
$$

Combining (4.4.12) with Corollary 4.3.33 we conclude that $\operatorname{var}_{[a,b]} f = \int_a^b |f'(x)|\,dx$. If $f \in AC(T)$, then from the previous part, for every $[a, b] \subseteq T$, we obtain

$$
\int_a^b |f'(x)|\,dx = \operatorname{var}_{[a,b]} f . \tag{4.4.13}
$$

Let $m = \inf T$ as well as $M = \sup T$ and consider $a_n \searrow m$ and $b_n \nearrow M$. From (4.4.13) it follows that

$$
\int_{a_n}^{b_n} |f'(x)|\,dx = \operatorname{var}_{[a_n, b_n]} f \quad \text{for all } n \in \mathbb{N} .
$$

Passing to the limit as $n \to \infty$ and using Proposition 4.3.25 as well as the Monotone Convergence Theorem, we derive

$$
\int_T |f'(x)|\,dx = \operatorname{var}_T f .
$$

⇐: From Corollary 4.3.33 it holds that $\int_a^b |f'(x)|\,dx = \operatorname{var}_{[a,b]} f = \int_a^b V_a'(x)\,dx = V_a(b)$. Since $V_a$ is increasing (see Proposition 4.3.29), from Corollary 4.4.25 we get $V_a \in AC([a, b])$. But from Proposition 4.3.29 it follows that $|f(x) - f(y)| \le V_a(y) - V_a(x)$ for all $a \le x \le y \le b$. Hence, $f \in AC([a, b])$. If $f \in BV(T)$, then Corollary 4.3.33 gives, for $x_0 \in T$, that

$$
\int_T |f'(x)|\,dx \le \int_T |V_{x_0}'(x)|\,dx \le \sup_T V_{x_0} - \inf_T V_{x_0} = \operatorname{var}_T f .
$$

Using the hypothesis we obtain

$$
\int_T |V_{x_0}'(x)|\,dx = \sup_T V_{x_0} - \inf_T V_{x_0} .
$$

Hence, $V_{x_0} \in AC(T)$, see Corollary 4.4.25, and as before, we conclude that $f \in AC(T)$.

By replacing the absolute value by the norm, the notion of absolute continuity can be extended to vector-valued functions.

Definition 4.4.27. Let $T = [a, b]$ and let $X$ be a Banach space. A function $f : T \to X$ is said to be absolutely continuous if for every $\varepsilon > 0$ there exists $\delta = \delta(\varepsilon) > 0$ such that for any family $\{(a_k, b_k)\}_{k=1}^n$ of pairwise disjoint open subintervals of $T$ such that $\sum_{k=1}^n (b_k - a_k) \le \delta$, we have $\sum_{k=1}^n \|f(b_k) - f(a_k)\| \le \varepsilon$.

In Theorem 4.4.21 we have seen that if $X = \mathbb{R}$ or more generally $\mathbb{R}^N$, then the fundamental theorem of Lebesgue calculus characterizes absolutely continuous functions. This is no longer true for $X$-valued functions when $X$ is an infinite dimensional Banach space.

Example 4.4.28. Let $X = L^1[0, 1]$ and let $f : [0, 1] \to X$ be defined by $f(t) = \chi_{[0,t]}$. Clearly $f$ is absolutely continuous. However $f$ is nowhere differentiable. Indeed, if $f$ were differentiable at $t = t_0$, then for every $h \in L^\infty[0, 1] = L^1[0, 1]^*$, the function $t \mapsto \xi(t) = \langle h, f(t) \rangle$ would be differentiable at $t = t_0$. This means that $t \mapsto \xi(t) = \int_0^t h(s)\,ds$ would be differentiable at $t = t_0$ for every $h \in L^\infty[0, 1]$. Choose

$$
h(s) = \begin{cases} 1 & \text{if } s < t_0 \\ -1 & \text{if } t_0 < s \end{cases} , \quad \text{then} \quad \xi(t) = \begin{cases} t & \text{if } t < t_0 \\ 2t_0 - t & \text{if } t_0 \le t , \end{cases}
$$

and this function is not differentiable at $t = t_0$.

The problem with the example above is the fact that $X = L^1[0, 1]$ does not have the RNP; see Definition 4.2.23(d). In fact we can state the following result.

Theorem 4.4.29. If $T = [a, b]$ and $X$ is a Banach space, then the following statements are equivalent:
(a) $X$ has the RNP;
(b) every absolutely continuous function $f : T \to X$ is almost everywhere differentiable and $f(t) = f(s) + \int_s^t f'(\tau)\,d\tau$ for all $0 \le s \le t \le b$.

Proof. (a) ⇒ (b): Let $f : T \to X$ be an absolutely continuous function. The variation of $f$,

$$
V_0(t) = \sup\Big[\sum_{k=0}^{n-1} \|f(t_{k+1}) - f(t_k)\| : 0 = t_0 < t_1 < \dots < t_n = t\Big] ,
$$

is an $\mathbb{R}_+$-valued absolutely continuous function. Therefore there exists a finite positive measure $\mu$ on $T$ such that $V_0(t) = \mu([0, t])$. Evidently $\mu \ll \lambda$. We introduce a vector measure $m : B(T) \to X$, with $B(T)$ being the Borel σ-field of $T$, defined as follows. If $U \subseteq T$ is open, then $U = \bigcup_{k \ge 1} (a_k, b_k)$ with a disjoint union and we set $m(U) = \sum_{k \ge 1} [f(b_k) - f(a_k)]$. For a general $A \in B(T)$, let $\{U_n\}_{n \ge 1}$ be a decreasing sequence of open sets, $A \subseteq U_n$ for all $n \in \mathbb{N}$, such that $\mu(U_n) \searrow \mu(A)$. Note that $\mu(U_n \setminus U_m) \to 0$ for $n < m$ and $n \to \infty$. Therefore $m(A) = \lim_{n \to \infty} m(U_n)$ exists. Evidently $m$ is well-defined, σ-additive, thus a measure, and $\|m(A)\| \le \mu(A)$ for all $A \in B(T)$. Therefore $m \ll \mu$, hence $m \ll \lambda$. Since $X$ has the RNP (see Definition 4.2.23(d)), there exists $h \in L^1(T, X)$ such that $m([0, t]) = \int_0^t h(s)\,ds$. Therefore $f(t) = f(0) + m([0, t]) = f(0) + \int_0^t h(s)\,ds$. This finally gives $f'(t) = h(t)$ for a.a. $t \in T$.


(b) ⇒ (a): Let $m : B(T) \to X$ be a vector measure such that $m \ll \lambda$. Let $f(t) = m([0, t])$ for all $t \in T$. Evidently $f : T \to X$ is absolutely continuous. So, by hypothesis, $f'(t)$ exists for almost all $t \in T$ and $f(t) = f(0) + \int_0^t f'(s)\,ds$ for all $t \in T$. Then, as in the first part, we see that $m(A) = \int_A f'(s)\,ds$ for all $A \in B(T)$. Hence, $X$ has the RNP.

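4.5 Sobolev Spaces

In this section we present an outline of the theory of Sobolev spaces. These spaces play a key role in the theory of partial differential equations.

Definition 4.5.1. Let $\Omega \subseteq \mathbb{R}^N$ be an open set and let $1 \le p \le \infty$.
(a) Given $u \in L^p(\Omega)$, the distributional derivative $\partial u/\partial z_k$ with $k = 1, \dots, N$ is defined by

$$
\frac{\partial u}{\partial z_k}(\varphi) = -\int_\Omega u \frac{\partial \varphi}{\partial z_k}\,dz \quad \text{for all } \varphi \in C_c^1(\Omega) .
$$

If $\partial u/\partial z_k \in L^p(\Omega)$, then

$$
\int_\Omega \frac{\partial u}{\partial z_k}\varphi\,dz = -\int_\Omega u \frac{\partial \varphi}{\partial z_k}\,dz \quad \text{for all } k = 1, \dots, N .
$$

If $u$ is differentiable in the classical sense, then it is also differentiable in the distributional sense and the two derivatives are equal.
(b) The Sobolev space $W^{1,p}(\Omega)$ is defined by

$$
W^{1,p}(\Omega) = \Big\{u \in L^p(\Omega) : \frac{\partial u}{\partial z_k} \in L^p(\Omega) \text{ for all } k = 1, \dots, N\Big\} .
$$

The space $W^{1,p}(\Omega)$ is equipped with the norm

$$
\begin{aligned}
\|u\|_{W^{1,p}(\Omega)} &= \Big[\|u\|_p^p + \sum_{k=1}^N \Big\|\frac{\partial u}{\partial z_k}\Big\|_p^p\Big]^{1/p} && \text{for } 1 \le p < \infty , \\
\|u\|_{W^{1,\infty}(\Omega)} &= \max\Big\{\|u\|_\infty, \Big\|\frac{\partial u}{\partial z_1}\Big\|_\infty, \dots, \Big\|\frac{\partial u}{\partial z_N}\Big\|_\infty\Big\} && \text{for } p = +\infty .
\end{aligned}
$$

We say that $u \in W^{1,p}_{\mathrm{loc}}(\Omega)$ if $u \in W^{1,p}(U)$ for every open set $U$ such that $\overline{U}$ is compact and $\overline{U} \subseteq \Omega$, that is, $U$ is compactly contained in $\Omega$, denoted by $U \subset\subset \Omega$.

Remark 4.5.2. If $p = 2$, then we usually write $W^{1,2}(\Omega) = H^1(\Omega)$ to denote that the space is a Hilbert space with inner product given by

$$
(u, v)_{H^1} = \int_\Omega uv\,dz + \sum_{k=1}^N \int_\Omega \frac{\partial u}{\partial z_k}\frac{\partial v}{\partial z_k}\,dz .
$$

Another equivalent norm on $W^{1,p}(\Omega)$ is given by

$$
|u|_{W^{1,p}(\Omega)} = \|u\|_p + \sum_{k=1}^N \Big\|\frac{\partial u}{\partial z_k}\Big\|_p \quad \text{for all } u \in W^{1,p}(\Omega) .
$$

Note that $C_c^\infty(\Omega)$, the space of test functions, also denoted by $D(\Omega)$, is a subspace of $W^{1,p}(\Omega)$ for $1 \le p \le \infty$. So, we may consider its closure in $W^{1,p}(\Omega)$.

Definition 4.5.3. $W_0^{1,p}(\Omega) = \overline{C_c^\infty(\Omega)}^{\|\cdot\|_{W^{1,p}(\Omega)}}$.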

Later when we introduce the notion of trace, we will characterize the elements of $W_0^{1,p}(\Omega)$ as those Sobolev functions that vanish on $\partial\Omega$.

Example 4.5.4. Let $N = 1$, $\Omega = (-1, 1)$, and $u(z) = |z|$ for all $z \in (-1, 1)$. The function is not differentiable in the classical sense at $z = 0$. Let us compute its distributional derivative. To this end, let $\varphi \in C_c^\infty(-1, 1)$. Then we derive

$$
\int_{-1}^1 u\varphi'\,dz = \int_0^1 z\varphi'\,dz + \int_{-1}^0 (-z)\varphi'\,dz = -\int_0^1 \varphi\,dz + \int_{-1}^0 \varphi\,dz = -\int_{-1}^1 \operatorname{sgn}(z)\varphi\,dz .
$$

Recall that

$$
\operatorname{sgn} z = \begin{cases} -1 & \text{if } -1 \le z < 0 , \\ 1 & \text{if } 0 < z \le 1 . \end{cases}
$$

Hence, $du/dz = u' = \operatorname{sgn}$ and so $u' \in L^\infty(-1, 1)$. Therefore $u \in W^{1,\infty}(-1, 1)$. In particular, $u \in W^{1,p}(-1, 1)$ for all $1 \le p < \infty$.

Remark 4.5.5. A function with a jump at a point is not a Sobolev function. Consider the Heaviside function $u : (-1, 1) \to \mathbb{R}$ defined by $u(z) = 0$ if $z < 0$ and $u(z) = 1$ if $z \ge 0$. Then $u' = \delta_0$ where $\delta_0$ is the Dirac measure concentrated at $z = 0$. But the distribution $\delta_0$ does not correspond to a function $f \in L^1(-1, 1)$.

Continuing with functions of one variable, that is, $N = 1$, we will show that every such Sobolev function has an absolutely continuous representative.

Lemma 4.5.6. If $u \in L^1_{\mathrm{loc}}(a, b)$ and $u' = 0$ in the distributional sense, then $u(z) = c \in \mathbb{R}$ for a.a. $z \in (a, b)$.

Proof. By hypothesis we have $\int_a^b u\varphi'\,dz = 0$ for all $\varphi \in C_c^\infty(a, b)$. Consider the space $V = \{\varphi' : \varphi \in C_c^\infty(a, b)\}$. Note that if $\varphi \in C_c^\infty(a, b)$, then $\varphi' \in C_c^\infty(a, b)$ and $\int_a^b \varphi'(z)\,dz = \varphi(b) - \varphi(a) = 0$. On the other hand, if $\vartheta \in C_c^\infty(a, b)$ and $\int_a^b \vartheta(z)\,dz = 0$, then we set $\varphi(z) = \int_a^z \vartheta(t)\,dt$ and obtain $\varphi \in C_c^\infty(a, b)$ with $\varphi' = \vartheta$. Therefore $V = \{\vartheta \in C_c^\infty(a, b) : \int_a^b \vartheta(z)\,dz = 0\}$. In order to produce a function $\vartheta \in V$, let $\eta \in C_c^\infty(a, b)$ be such that $\int_a^b \eta(z)\,dz = 1$. Then for any $\varphi \in C_c^\infty(a, b)$, we set

$$
\vartheta(z) = \varphi(z) - \Big(\int_a^b \varphi(t)\,dt\Big)\eta(z) .
$$

Evidently $\vartheta \in V$ and $\int_a^b u\vartheta\,dz = 0$. We get

$$
\int_a^b u\varphi\,dz = \Big(\int_a^b \varphi\,dz\Big)\int_a^b u\eta\,dz = \int_a^b c\varphi\,dz \quad \text{with } c = \int_a^b u\eta\,dz .
$$

Hence, $\int_a^b (u - c)\varphi\,dz = 0$ for all $\varphi \in C_c^\infty(a, b)$ and so $u(z) = c$ for a.a. $z \in (a, b)$; see Corollary 4.1.31 and recall that $C_c^\infty(a, b)$ is dense in $C_c(a, b)$.

Theorem 4.5.7. If $\Omega = (a, b)$ and $1 \le p \le \infty$, then the following hold:
(a) for any $u \in W^{1,p}(a, b)$ there exists a unique $\tilde{u} \in AC([a, b])$ such that $u = \tilde{u}$ a.e. on $\Omega$ and $\tilde{u}(x) - \tilde{u}(v) = \int_v^x u'(z)\,dz$ for all $x, v \in (a, b)$;
(b) if $u \in L^p(a, b)$ and $u(x) - u(v) = \int_v^x h(z)\,dz$ for some $h \in L^p(a, b)$, then $u \in W^{1,p}(a, b)$ and $u' = h$ in the distributional sense.

Proof. (a) Let $h(x) = \int_a^x u'(z)\,dz$. Then $h \in AC([a, b])$; see Theorem 4.4.21. Moreover, for every $\varphi \in C_c^\infty(a, b)$, by using Fubini's Theorem, we obtain

$$
\begin{aligned}
h'(\varphi) &= -\int_a^b h\varphi'\,dz = -\int_a^b \Big(\int_a^z u'(t)\,dt\Big)\varphi'(z)\,dz = -\int_a^b \Big(\int_a^b \chi_{[a,z]}(t)u'(t)\,dt\Big)\varphi'(z)\,dz \\
&= -\int_a^b u'(t)\Big(\int_a^b \chi_{[a,z]}(t)\varphi'(z)\,dz\Big)dt = -\int_a^b u'(t)\Big(\int_t^b \varphi'(z)\,dz\Big)dt = \int_a^b u'\varphi\,dt .
\end{aligned}
$$

Hence, $h' = u'$ and so $(h - u)' = 0$ in the distributional sense. Thus, Lemma 4.5.6 gives $h = u + c$ a.e. with $c \in \mathbb{R}$. Then $\tilde{u} = h - c \in AC([a, b])$ is the desired absolutely continuous representative of $u$.

(b) Evidently $u \in AC([a, b])$ and so it is differentiable almost everywhere in the classical sense and $u' = h \in L^p(a, b)$. Hence $u \in W^{1,p}(a, b)$ and $u' = h$ in the distributional sense.

Remark 4.5.8. If $N \ge 2$, then the result above fails.

Proposition 4.5.9. If $\Omega \subseteq \mathbb{R}^N$ is an open set, then the following hold:
(a) $W^{1,p}(\Omega)$ is a Banach space for all $1 \le p \le \infty$;
(b) $W^{1,p}(\Omega)$ is reflexive for $1 < p < \infty$;
(c) $W^{1,p}(\Omega)$ is separable for $1 \le p < \infty$ and $W^{1,2}(\Omega) = H^1(\Omega)$ is a separable Hilbert space.

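Proof. (a) Let $\{u_n\}_{n \ge 1} \subseteq W^{1,p}(\Omega)$ be a Cauchy sequence. Let $Du = (\partial u/\partial z_k)_{k=1}^N$ be the gradient of $u$. We see that $\{u_n\}_{n \ge 1} \subseteq L^p(\Omega)$ and $\{Du_n\}_{n \ge 1} \subseteq L^p(\Omega, \mathbb{R}^N)$ are Cauchy sequences. Therefore $u_n \to u$ in $L^p(\Omega)$ and $Du_n \to g$ in $L^p(\Omega, \mathbb{R}^N)$. By definition it holds that

$$
\int_\Omega (Du_n)\varphi\,dz = -\int_\Omega u_n (D\varphi)\,dz \quad \text{for all } \varphi \in C_c^\infty(\Omega) \text{ and for all } n \in \mathbb{N} .
$$

This implies

$$
\int_\Omega g\varphi\,dz = -\int_\Omega u(D\varphi)\,dz \quad \text{for all } \varphi \in C_c^\infty(\Omega) .
$$

Hence, $g = Du$ and so $u \in W^{1,p}(\Omega)$. Therefore $u_n \to u$ in $W^{1,p}(\Omega)$ and we conclude that $W^{1,p}(\Omega)$ is a Banach space.

(b) The space $V = L^p(\Omega) \times L^p(\Omega, \mathbb{R}^N)$ is reflexive for $1 < p < \infty$. Let $K \in L(W^{1,p}(\Omega), V)$ be defined by $K(u) = (u, Du)$. Clearly this is an isometry into $V$ and so $K(W^{1,p}(\Omega))$ is a closed subspace of $V$. Therefore $K(W^{1,p}(\Omega))$ is reflexive and then so is $W^{1,p}(\Omega)$.

(c) The space $V = L^p(\Omega) \times L^p(\Omega, \mathbb{R}^N)$ is separable for $1 \le p < \infty$. Then $K(W^{1,p}(\Omega)) \subseteq V$ is separable, hence so is $W^{1,p}(\Omega)$. Then $W^{1,2}(\Omega) = H^1(\Omega)$ is a separable Hilbert space.

Next we will see how we can approximate Sobolev functions with smooth functions. We will use approximation by mollification; see Definition 4.1.27(c) and (d).

Proposition 4.5.10. If $\Omega \subseteq \mathbb{R}^N$ is open and $u \in W^{1,p}_{\mathrm{loc}}(\Omega)$ for some $1 \le p < \infty$, then $u_\varepsilon \to u$ in $W^{1,p}_{\mathrm{loc}}(\Omega)$.

Proof. Let $\{\vartheta_\varepsilon\}_{\varepsilon > 0}$ be the standard mollifier and $u_\varepsilon = \vartheta_\varepsilon * u$ with $\varepsilon > 0$; see Definition 4.1.27(c) and (d). From the proof of Proposition 4.1.28 we know that

$$
\begin{aligned}
\frac{\partial u_\varepsilon}{\partial z_k}(z) &= \int_\Omega \frac{\partial \vartheta_\varepsilon}{\partial z_k}(z - y)u(y)\,dy = -\int_\Omega \frac{\partial \vartheta_\varepsilon}{\partial y_k}(z - y)u(y)\,dy \\
&= \int_\Omega \vartheta_\varepsilon(z - y)\frac{\partial u}{\partial y_k}(y)\,dy = \Big(\vartheta_\varepsilon * \frac{\partial u}{\partial z_k}\Big)(z) \quad \text{for all } z \in \Omega .
\end{aligned}
$$

Then the result follows from Proposition 4.1.30.

Theorem 4.5.11. If $\Omega \subseteq \mathbb{R}^N$ is open and $u \in W^{1,p}(\Omega)$ for some $1 \le p < \infty$, then there exists a sequence $\{u_n\}_{n \ge 1} \subseteq W^{1,p}(\Omega) \cap C^\infty(\Omega)$ such that $u_n \to u$ in $W^{1,p}(\Omega)$.

Proof. Let $\varepsilon > 0$ be given and define

$$
\Omega_0 = \emptyset , \quad \Omega_n = \Big\{z \in \Omega : d(z, \partial\Omega) > \frac{1}{n}\Big\} \cap B_r
$$

for $n \in \mathbb{N}$ with $B_r = \{z \in \mathbb{R}^N : |z| < r\}$. We set $U_n = \Omega_{n+1} \setminus \Omega_n$ for all $n \in \mathbb{N}$ and consider a smooth partition $\{\psi_n\}_{n \ge 1}$ of unity subordinate to $\{U_n\}_{n \ge 1}$. Then we have

$$
\psi_n \in C_c^\infty(U_n) , \quad 0 \le \psi_n \le 1 \text{ for all } n \in \mathbb{N} \quad \text{and} \quad \sum_{n \ge 1} \psi_n(z) = 1 \text{ for all } z \in \Omega .
$$

Then $u\psi_n \in W^{1,p}(\Omega)$ and $\operatorname{supp}(u\psi_n) \subseteq U_n$. Hence, there exists $\varepsilon_n > 0$ such that

$$
\begin{aligned}
&\operatorname{supp}\big(\vartheta_{\varepsilon_n} * (u\psi_n)\big) \subset U_n , \\
&\Big(\int_\Omega |\vartheta_{\varepsilon_n} * (u\psi_n) - u\psi_n|^p\,dz\Big)^{1/p} < \frac{\varepsilon}{2^n} , \\
&\Big(\int_\Omega |\vartheta_{\varepsilon_n} * (D(u\psi_n)) - D(u\psi_n)|^p\,dz\Big)^{1/p} < \frac{\varepsilon}{2^n} .
\end{aligned} \tag{4.5.1}
$$

We set $u_\varepsilon = \sum_{n \ge 1} \vartheta_{\varepsilon_n} * (u\psi_n)$. Every $z \in \Omega$ has a neighborhood in which only a finite number of terms in this sum are nonzero. Therefore $u_\varepsilon \in C^\infty(\Omega)$. We obtain $u = \sum_{n \ge 1} (u\psi_n)$. Then from (4.5.1) it follows that

$$
\begin{aligned}
\|u_\varepsilon - u\|_p &\le \sum_{n \ge 1} \Big(\int_\Omega |\vartheta_{\varepsilon_n} * (u\psi_n) - u\psi_n|^p\,dz\Big)^{1/p} < \varepsilon , \\
\|Du_\varepsilon - Du\|_p &\le \sum_{n \ge 1} \Big(\int_\Omega |\vartheta_{\varepsilon_n} * (D(u\psi_n)) - D(u\psi_n)|^p\,dz\Big)^{1/p} < \varepsilon .
\end{aligned} \tag{4.5.2}
$$

Then $u_\varepsilon \in W^{1,p}(\Omega) \cap C^\infty(\Omega)$ and $u_\varepsilon \to u$ in $W^{1,p}(\Omega)$ as $\varepsilon \to 0^+$.

Remark 4.5.12. In this theorem we do not claim that the approximating smooth functions belong to $C^\infty(\overline{\Omega})$. For this reason the result is known as the "Local Approximation Theorem by Smooth Functions." In order to have a "Global Approximation Theorem by Smooth Functions," that is, for the approximating smooth functions to belong to $C^\infty(\overline{\Omega})$, we need to strengthen the condition on $\Omega$.

Definition 4.5.13. Let $\Omega \subseteq \mathbb{R}^N$ be an open set. We say that $\partial\Omega$ is Lipschitz if it can be represented locally as the graph of a Lipschitz function defined on some open ball of $\mathbb{R}^{N-1}$.

The next theorem is the global approximation result and its rather technical proof can be found in Evans–Gariepy [105, Theorem 3, p. 127].

Theorem 4.5.14. If $\Omega \subseteq \mathbb{R}^N$ is bounded, $\partial\Omega$ is Lipschitz, and $u \in W^{1,p}(\Omega)$ for some $1 \le p < \infty$, then there exists a sequence $\{u_n\}_{n \ge 1} \subseteq W^{1,p}(\Omega) \cap C^\infty(\overline{\Omega})$ such that $u_n \to u$ in $W^{1,p}(\Omega)$.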
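The mollification argument of Proposition 4.5.10 can be observed numerically for the function $u(z) = |z|$ of Example 4.5.4. The sketch below is only an illustration under stated assumptions: the profile `bump`, the midpoint quadrature, the compact subinterval $(-0.5, 0.5) \subset\subset (-1, 1)$, and the choice $p = 2$ are all ours and not prescribed by the text. It uses the identity $(\vartheta_\varepsilon * u)' = \vartheta_\varepsilon * u'$ with the weak derivative $u' = \operatorname{sgn}$ and reports an approximate $W^{1,2}$ distance between $u_\varepsilon$ and $u$ on the subinterval for decreasing $\varepsilon$.

```python
import math


def bump(x):
    """Unnormalized smooth profile supported in (-1, 1)."""
    t = 1.0 - x * x
    return math.exp(-1.0 / t) if t > 0.0 else 0.0


def midpoint(func, a, b, m=400):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / m
    return h * sum(func(a + (i + 0.5) * h) for i in range(m))


NORMALIZER = midpoint(bump, -1.0, 1.0)  # so that the mollifier has integral 1


def mollify(g, eps):
    """Return z -> (theta_eps * g)(z), theta_eps(x) = bump(x / eps) / (eps * NORMALIZER)."""
    def smoothed(z):
        def kernel(y):
            return bump((z - y) / eps) / (eps * NORMALIZER) * g(y)
        return midpoint(kernel, z - eps, z + eps)
    return smoothed


def lp_error(f, g, a, b, p=2, m=200):
    """Approximate L^p(a, b) distance between f and g."""
    return midpoint(lambda z: abs(f(z) - g(z)) ** p, a, b, m) ** (1.0 / p)


def u(z):
    return abs(z)


def du(z):
    return (z > 0) - (z < 0)  # weak derivative of |z|, see Example 4.5.4


def w1p_error(eps, p=2):
    """Approximate W^{1,p}(-0.5, 0.5) distance between u_eps and u."""
    u_eps = mollify(u, eps)
    du_eps = mollify(du, eps)  # derivative of the mollification
    return lp_error(u_eps, u, -0.5, 0.5, p) + lp_error(du_eps, du, -0.5, 0.5, p)


errors = [w1p_error(eps) for eps in (0.2, 0.1, 0.05)]
print(errors)
```

The values of $\varepsilon$ are kept at most $0.2$ so that the convolution at points of $(-0.5, 0.5)$ only uses values of $u$ inside $\Omega = (-1, 1)$.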

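The approximating theorems permit the extension of the usual calculus rules to Sobolev functions.

Proposition 4.5.15. If $\Omega \subseteq \mathbb{R}^N$ is an open set and $u, v \in W^{1,p}(\Omega) \cap L^\infty(\Omega)$ for some $1 \le p < \infty$, then $uv \in W^{1,p}(\Omega) \cap L^\infty(\Omega)$ and

$$
\frac{\partial (uv)}{\partial z_k} = \frac{\partial u}{\partial z_k}v + u\frac{\partial v}{\partial z_k} \quad \text{a.e. for all } k = 1, \dots, N .
$$

Proof. Let $\psi \in C_c^1(\Omega)$ with $\operatorname{supp} \psi \subseteq U \subset\subset \Omega$. From Proposition 4.5.10 it follows that

$$
\begin{aligned}
\int_\Omega uv\frac{\partial \psi}{\partial z_k}\,dz &= \int_U uv\frac{\partial \psi}{\partial z_k}\,dz = \lim_{\varepsilon \to 0} \int_U u_\varepsilon v_\varepsilon \frac{\partial \psi}{\partial z_k}\,dz \\
&= -\lim_{\varepsilon \to 0} \int_U \Big[\frac{\partial u_\varepsilon}{\partial z_k}v_\varepsilon + u_\varepsilon \frac{\partial v_\varepsilon}{\partial z_k}\Big]\psi\,dz = -\int_U \Big[\frac{\partial u}{\partial z_k}v + u\frac{\partial v}{\partial z_k}\Big]\psi\,dz \\
&= -\int_\Omega \Big[\frac{\partial u}{\partial z_k}v + u\frac{\partial v}{\partial z_k}\Big]\psi\,dz .
\end{aligned}
$$

Remark 4.5.16. The result is also true if $W^{1,p}(\Omega)$ is replaced by $W_0^{1,p}(\Omega)$; see Definition 4.5.3.

Before proving additional calculus rules for Sobolev functions, let us state a result that shows when a Sobolev function $u \in W^{1,p}(\Omega)$ actually belongs to $W_0^{1,p}(\Omega)$; see Definition 4.5.3.

Proposition 4.5.17. If $\Omega \subseteq \mathbb{R}^N$ is an open set, $u \in W^{1,p}(\Omega)$ for some $1 \le p < \infty$ and $u$ vanishes outside of a compact set $K \subseteq \Omega$, then $u \in W_0^{1,p}(\Omega)$.

Proof. Let $\Omega^* \subseteq \mathbb{R}^N$ be a bounded open set such that $K \subseteq \Omega^* \subset\subset \Omega$ with Lipschitz boundary $\partial\Omega^*$. Choose a cut-off function $\psi \in C_c^\infty(\mathbb{R}^N)$ such that $\psi|_K \equiv 1$. Evidently $u = \psi u$. Applying Corollary 4.1.32 and Theorem 4.5.14, there exists a sequence $\{u_n\}_{n \ge 1} \subseteq C_c^\infty(\Omega)$ such that

$$
u_n \to u \text{ in } L^p(\Omega) \quad \text{and} \quad Du_n \to Du \text{ in } L^p(\Omega^*, \mathbb{R}^N) .
$$

Then $\psi u_n \to \psi u = u$ in $W^{1,p}(\Omega)$ and $\psi u_n \in C_c^\infty(\Omega)$. Hence $u \in W_0^{1,p}(\Omega)$; see Definition 4.5.3.

Now we can prove a chain rule for Sobolev functions.

Theorem 4.5.18 (Chain Rule for Sobolev Functions). If $\Omega \subseteq \mathbb{R}^N$ is a bounded open set with Lipschitz boundary, $\varphi : \mathbb{R} \to \mathbb{R}$ is Lipschitz continuous and $u \in W^{1,p}(\Omega)$ (resp. $u \in W_0^{1,p}(\Omega)$) for some $1 \le p < \infty$, then $\varphi \circ u \in W^{1,p}(\Omega)$ (resp. $\varphi \circ u \in W_0^{1,p}(\Omega)$) and

$$
\frac{\partial}{\partial z_k}(\varphi \circ u)(z) = \varphi^*(u(z))\frac{\partial u}{\partial z_k}(z) \quad \text{for a.a. } z \in \Omega \text{ and for all } k = 1, \dots, N ,
$$

where $\varphi^* : \mathbb{R} \to \mathbb{R}$ is any measurable function such that $\varphi^* = \varphi'$ a.e.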

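Proof. First assume that $u \in W^{1,p}(\Omega)$. Using Theorem 4.5.14 there exists a sequence $\{u_n\}_{n \ge 1} \subseteq C^\infty(\overline{\Omega})$ such that $u_n \to u$ in $W^{1,p}(\Omega)$. We set $\xi_n = \varphi \circ u_n$ with $n \in \mathbb{N}$. It is clear that $\xi_n$ is Lipschitz continuous and $|\partial \xi_n/\partial z_k| \le \operatorname{Lip}(\xi_n)$ for all $k = 1, \dots, N$ and for all $n \in \mathbb{N}$, where $\operatorname{Lip}(\xi_n)$ denotes the Lipschitz constant of $\xi_n$. Hence $\partial \xi_n/\partial z_k \in L^p(\Omega)$ and so $\xi_n \in W^{1,p}(\Omega)$ for all $n \in \mathbb{N}$. We obtain

$$
|\xi_n(z) - \varphi(u(z))| = |\varphi(u_n(z)) - \varphi(u(z))| \le \operatorname{Lip}(\varphi)|u_n(z) - u(z)| \quad \text{for a.a. } z \in \Omega ,
$$

which gives $\xi_n \to \varphi \circ u$ in $L^p(\Omega)$. Let $\{e_k\}_{k=1}^N$ be the standard orthonormal basis of $\mathbb{R}^N$. Using this yields

$$
\frac{|\xi_n(z + te_k) - \xi_n(z)|}{|t|} \le \frac{\operatorname{Lip}(\varphi)|u_n(z + te_k) - u_n(z)|}{|t|} ,
$$

which implies

$$
\limsup_{n \to \infty} \Big\|\frac{\partial \xi_n}{\partial z_k}\Big\|_p \le \operatorname{Lip}(\varphi)\Big\|\frac{\partial u}{\partial z_k}\Big\|_p \quad \text{for all } k = 1, \dots, N .
$$

Hence, $\{\partial \xi_n/\partial z_k\}_{n \ge 1} \subseteq L^p(\Omega)$ is bounded for all $k = 1, \dots, N$. By passing to a suitable subsequence if necessary we may assume that

$$
\frac{\partial \xi_n}{\partial z_k} \xrightarrow{w} h_k \text{ in } L^p(\Omega) \text{ as } n \to \infty \text{ with } h_k \in L^p(\Omega) \text{ and } k = 1, \dots, N . \tag{4.5.3}
$$

From (4.5.2) and (4.5.3) it follows that $h_k = \partial \varphi(u)/\partial z_k$ for all $k = 1, \dots, N$. Therefore $\xi_n \to \varphi(u)$ in $W^{1,p}(\Omega)$ and so $\varphi \circ u \in W^{1,p}(\Omega)$. Note that $D\xi_n = \varphi'(u_n)Du_n$ for all $n \in \mathbb{N}$. So, taking the limit as $n \to \infty$ we derive $D(\varphi \circ u) = \varphi^*(u)Du$ with a measurable function $\varphi^* : \mathbb{R} \to \mathbb{R}$ such that $\varphi^* = \varphi'$ a.e.

If $u \in W_0^{1,p}(\Omega)$, then in the proof above we have $\{u_n\}_{n \ge 1} \subseteq C_c^\infty(\Omega)$; see Definition 4.5.3. So, $\xi_n \in W^{1,p}(\Omega)$ has compact support. Thus by Proposition 4.5.17, we infer that $\xi_n \in W_0^{1,p}(\Omega)$, and so the limit as $n \to \infty$ gives $\varphi \circ u \in W_0^{1,p}(\Omega)$. Moreover, $D(\varphi \circ u) = \varphi^*(u)Du$.

This chain rule has interesting consequences.

Corollary 4.5.19. If $\Omega \subseteq \mathbb{R}^N$ is a bounded open set with Lipschitz boundary, $X = W^{1,p}(\Omega)$ or $W_0^{1,p}(\Omega)$ with $1 \le p < \infty$ and $u, v \in X$, then the following hold:
(a) $u^+, u^-, |u| \in X$ and

$$
Du^+ = \begin{cases} 0 & \text{a.e. on } \{u \le 0\} \\ Du & \text{a.e. on } \{0 < u\} \end{cases} , \quad
Du^- = \begin{cases} -Du & \text{a.e. on } \{u < 0\} \\ 0 & \text{a.e. on } \{0 \le u\} \end{cases} , \quad
D|u| = \begin{cases} -Du & \text{a.e. on } \{u < 0\} \\ 0 & \text{a.e. on } \{u = 0\} \\ Du & \text{a.e. on } \{0 < u\} \end{cases} .
$$

(b) $\max\{u, v\} = h \in X$, $\min\{u, v\} = g \in X$ and

$$
Dh = \begin{cases} Du & \text{a.e. on } \{u \ge v\} \\ Dv & \text{a.e. on } \{v \ge u\} \end{cases} , \quad
Dg = \begin{cases} Du & \text{a.e. on } \{u \le v\} \\ Dv & \text{a.e. on } \{v \le u\} \end{cases} .
$$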
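The formulas of Corollary 4.5.19(a) can be tested against the definition of the distributional derivative (Definition 4.5.1). The following sketch is a numerical illustration only; the sample function $u(z) = z^2 - 1/4$ on $\Omega = (-1, 1)$, the test function `phi` (a shifted and scaled bump with support in $(-0.5, 0.9)$), and the midpoint quadrature are our own choices. For $w = u^+$ and $w = |u|$ it evaluates the defect $\int_\Omega w\varphi'\,dz + \int_\Omega (Dw)\varphi\,dz$, which should vanish up to quadrature error when $Dw$ is given by the corollary.

```python
import math


def bump(x):
    """Smooth profile supported in (-1, 1)."""
    t = 1.0 - x * x
    return math.exp(-1.0 / t) if t > 0.0 else 0.0


def bump_prime(x):
    """Classical derivative of bump."""
    t = 1.0 - x * x
    return bump(x) * (-2.0 * x / (t * t)) if t > 0.0 else 0.0


CENTER, RADIUS = 0.2, 0.7  # hypothetical choice: supp(phi) = [-0.5, 0.9] inside (-1, 1)


def phi(z):
    return bump((z - CENTER) / RADIUS)


def dphi(z):
    return bump_prime((z - CENTER) / RADIUS) / RADIUS


def u(z):
    return z * z - 0.25


def du(z):
    return 2.0 * z


def u_plus(z):
    return max(u(z), 0.0)


def d_u_plus(z):
    return du(z) if u(z) > 0.0 else 0.0      # Du on {0 < u}, 0 on {u <= 0}


def u_abs(z):
    return abs(u(z))


def d_u_abs(z):
    if u(z) > 0.0:
        return du(z)                          # Du on {0 < u}
    if u(z) < 0.0:
        return -du(z)                         # -Du on {u < 0}
    return 0.0                                # 0 on {u = 0}


def integrate(func, a=-1.0, b=1.0, m=20000):
    """Midpoint rule on (a, b)."""
    h = (b - a) / m
    return h * sum(func(a + (i + 0.5) * h) for i in range(m))


def weak_derivative_defect(w, dw):
    """int w phi' dz + int dw phi dz; zero when dw is the weak derivative of w."""
    return integrate(lambda z: w(z) * dphi(z)) + integrate(lambda z: dw(z) * phi(z))


defect_plus = weak_derivative_defect(u_plus, d_u_plus)
defect_abs = weak_derivative_defect(u_abs, d_u_abs)
print(defect_plus, defect_abs)
```

A non-symmetric test function is used on purpose: with an even test function both integrals would vanish by symmetry and the check would be vacuous.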

The proper definition of boundary values for Sobolev functions requires the notion of trace. This notion is produced in the next theorem whose proof can be found in Evans–Gariepy [105, Theorem 1, p. 133] and Kufner–John–Fučík [180, Theorem 6.6.4, p. 325].

Theorem 4.5.20. If $\Omega \subseteq \mathbb{R}^N$ is a bounded open set with Lipschitz boundary $\partial\Omega$ and $1 \le p < \infty$, then the following hold:
(a) there exists a bounded linear operator $\gamma_0 : W^{1,p}(\Omega) \to L^p(\partial\Omega)$, where we consider the $(N-1)$-dimensional Hausdorff surface measure $\sigma$ on $\partial\Omega$, such that $\gamma_0(u) = u|_{\partial\Omega}$ for all $u \in W^{1,p}(\Omega) \cap C(\overline{\Omega})$;
(b) for every $\vartheta \in C^1(\mathbb{R}^N, \mathbb{R}^N)$ and $u \in W^{1,p}(\Omega)$ it holds that

$$
\int_\Omega u(\operatorname{div} \vartheta)\,dz + \int_\Omega (Du, \vartheta)_{\mathbb{R}^N}\,dz = \int_{\partial\Omega} \gamma_0(u)(\vartheta, n)_{\mathbb{R}^N}\,d\sigma ,
$$

with $n$ being the outward unit normal on $\partial\Omega$;
(c) $W_0^{1,p}(\Omega) = \ker \gamma_0$.

Remark 4.5.21. Since by hypothesis $\partial\Omega$ is Lipschitz, $n(z)$ exists for σ-almost all $z \in \partial\Omega$.

Definition 4.5.22. The function $\gamma_0(u) \in L^p(\partial\Omega)$ is uniquely defined up to σ-null sets. It is called the trace of $u$ on $\partial\Omega$ and we interpret $\gamma_0(u)$ as describing the boundary values of the Sobolev function $u \in W^{1,p}(\Omega)$ with $1 \le p < \infty$.

Remark 4.5.23. In the definition above we have excluded the case $p = +\infty$. The reason for this is that $u \in W^{1,\infty}_{\mathrm{loc}}(\Omega)$ if and only if $u$ is locally Lipschitz. So, it admits a continuous extension on $\overline{\Omega}$ and then the values of $u$ on $\partial\Omega$ are defined in the classical sense. The operator $\gamma_0 : W^{1,p}(\Omega) \to L^p(\partial\Omega)$ is not surjective. In order to describe the range of $\gamma_0$ we need to introduce Sobolev spaces of fractional order that are beyond our scope here.

Finally we mention a compactness result for the trace map, which can be found in Kufner–John–Fučík [180, Theorem 6.10.5, p. 344].

Theorem 4.5.24. If $\Omega \subseteq \mathbb{R}^N$ is a bounded open set with Lipschitz boundary $\partial\Omega$, then the following hold:
(a) if $1 < p < N$, then $\gamma_0 : W^{1,p}(\Omega) \to L^r(\partial\Omega)$ is compact for any $1 \le r < (Np - p)/(N - p)$;
(b) if $p \ge N$, then $\gamma_0 : W^{1,p}(\Omega) \to L^r(\partial\Omega)$ is compact for any $r \ge 1$.

Another result in this vein is the so-called "Rellich–Kondrachov Theorem," which can be found in Adams [1, Theorem 6.2, p. 144].

Theorem 4.5.25 (Rellich–Kondrachov Theorem). If $\Omega \subseteq \mathbb{R}^N$ is a bounded open set with Lipschitz boundary and $1 \le p < \infty$, then the following hold:
(a) when $1 \le p < N$, it holds $W^{1,p}(\Omega) \hookrightarrow L^r(\Omega)$ for all $r \in [1, p^*]$ with $p^* = Np/(N - p)$ and the embedding is compact if $1 \le r < p^*$;
(b) when $p = N$, it holds $W^{1,p}(\Omega) \hookrightarrow L^r(\Omega)$ for all $r \in [1, +\infty)$ and the embedding is compact;
(c) when $p > N$, it holds that $W^{1,p}(\Omega) \hookrightarrow C(\overline{\Omega})$ compactly.

Remark 4.5.26. The embedding $W^{1,p}(\Omega) \hookrightarrow L^{p^*}(\Omega)$ is never compact. Also, if $\Omega$ is not bounded, then the embedding $W^{1,p}(\Omega) \hookrightarrow L^p(\Omega)$ is not compact.

As a consequence of Theorem 4.5.25 we have the following proposition providing useful equivalent norms on $W^{1,p}(\Omega)$.

Proposition 4.5.27. If $\Omega \subseteq \mathbb{R}^N$ is a bounded open set with Lipschitz boundary, then $|u| = \|u\|_r + \|Du\|_p$ is an equivalent norm on $W^{1,p}(\Omega)$ in the following cases:
(a) $1 \le r \le p^*$ if $1 \le p < N$;
(b) $1 \le r < \infty$ if $p = N$;
(c) $1 \le r \le \infty$ if $p > N$.

For the space $W_0^{1,p}(\Omega)$ we can do better. The result is known as "Poincaré's inequality."

Theorem 4.5.28 (Poincaré's inequality). If $\Omega \subseteq \mathbb{R}^N$ is a bounded open set and $1 \le p < \infty$, then there exists a constant $c = c(p, N, \Omega) > 0$ such that $\|u\|_p \le c\|Du\|_p$ for all $u \in W_0^{1,p}(\Omega)$.

Proof. Since $\Omega \subseteq \mathbb{R}^N$ is bounded, there exists $\eta > 0$ such that $\Omega \subseteq (-\eta, \eta)^N$. For $\vartheta \in C_c^\infty(\Omega)$, extended by zero outside $\Omega$, we have

$$
\vartheta(z) = \int_{-\eta}^{z_N} \frac{\partial \vartheta}{\partial z_N}(z_1, \dots, z_{N-1}, s)\,ds .
$$

By using Hölder's inequality this results in

$$
|\vartheta(z)|^p \le (2\eta)^{p-1} \int_{-\eta}^{\eta} \Big|\frac{\partial \vartheta}{\partial z_N}(z_1, \dots, z_{N-1}, s)\Big|^p ds .
$$

Therefore

$$
\int_{[-\eta,\eta]^{N-1}} |\vartheta(z_1, \dots, z_{N-1}, z_N)|^p\,dz_1 \dots dz_{N-1} \le (2\eta)^{p-1} \int_{[-\eta,\eta]^{N-1}} \Big(\int_{-\eta}^{\eta} \Big|\frac{\partial \vartheta}{\partial z_N}(z_1, \dots, z_{N-1}, s)\Big|^p ds\Big) dz_1 \dots dz_{N-1} ,
$$

which, after integrating with respect to $z_N \in (-\eta, \eta)$, implies

$$
\|\vartheta\|_p^p \le (2\eta)^p \int_\Omega \Big|\frac{\partial \vartheta}{\partial z_N}\Big|^p dz \le (2\eta)^p \|D\vartheta\|_p^p .
$$

Thus, $\|u\|_p \le 2\eta\|Du\|_p$ for all $u \in W_0^{1,p}(\Omega)$, where we have used Definition 4.5.3.
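The constant $2\eta$ obtained in the proof can be compared with actual ratios $\|u\|_p / \|u'\|_p$ in the one-dimensional case $\Omega = (-\eta, \eta)$. The sketch below is a sanity check under our own assumptions ($\eta = 1$, three sample functions vanishing at $\pm\eta$, exponents $p \in \{1, 2, 4\}$, midpoint quadrature); it does not prove anything, it only confirms that the sampled ratios stay below $2\eta$.

```python
import math

ETA = 1.0  # Omega = (-ETA, ETA)


def lp_norm(func, p, m=4000):
    """Midpoint-rule approximation of the L^p(-ETA, ETA) norm of func."""
    h = 2.0 * ETA / m
    total = sum(abs(func(-ETA + (i + 0.5) * h)) ** p for i in range(m))
    return (h * total) ** (1.0 / p)


# Sample functions vanishing at the endpoints, paired with their classical derivatives.
SAMPLES = {
    "1 - z^2": (
        lambda z: 1.0 - z * z,
        lambda z: -2.0 * z,
    ),
    "sin(pi (z + 1) / 2)": (
        lambda z: math.sin(math.pi * (z + 1.0) / 2.0),
        lambda z: (math.pi / 2.0) * math.cos(math.pi * (z + 1.0) / 2.0),
    ),
    "sin(3 pi (z + 1) / 2)": (
        lambda z: math.sin(3.0 * math.pi * (z + 1.0) / 2.0),
        lambda z: (3.0 * math.pi / 2.0) * math.cos(3.0 * math.pi * (z + 1.0) / 2.0),
    ),
}


def poincare_ratio(u, du, p):
    """Return ||u||_p / ||u'||_p on (-ETA, ETA)."""
    return lp_norm(u, p) / lp_norm(du, p)


ratios = {
    (name, p): poincare_ratio(u, du, p)
    for name, (u, du) in SAMPLES.items()
    for p in (1, 2, 4)
}

for key, value in ratios.items():
    print(key, round(value, 4), "<=", 2.0 * ETA)
```

For $p = 2$ the largest sampled ratio is attained by $\sin(\pi(z+1)/2)$ and equals $2\eta/\pi$, so the constant $2\eta$ from the proof is not sharp; the proof only aims at the existence of some constant.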

Corollary 4.5.29. If $\Omega \subseteq \mathbb{R}^N$ is a bounded open set and $1 \le p < \infty$, then $|\cdot| = \|D(\cdot)\|_p$ is an equivalent norm on $W_0^{1,p}(\Omega)$.

There are some other related inequalities that are also useful.

Definition 4.5.30. We say that $\Omega \subseteq \mathbb{R}^N$ is a domain if it is open and connected.

Theorem 4.5.31. If $\Omega \subseteq \mathbb{R}^N$ is a bounded domain with Lipschitz boundary and $V \subseteq W^{1,p}(\Omega)$ with $1 < p < \infty$ is a closed subspace such that the only constant function belonging to $V$ is the zero function, then there exists a constant $c > 0$ such that $\|v\|_p \le c\|Dv\|_p$ for all $v \in V$.

Proof. Arguing by contradiction, suppose that the conclusion of the theorem is false. Then there exists a sequence $\{v_n\}_{n \ge 1} \subseteq V$ such that $\|v_n\|_p > n\|Dv_n\|_p$. Let $y_n = v_n/\|v_n\|_p$ with $n \in \mathbb{N}$. Then $y_n \in V$, $\|y_n\|_p = 1$ for all $n \in \mathbb{N}$ and $\|Dy_n\|_p < 1/n$ for all $n \in \mathbb{N}$. Therefore $\{y_n\}_{n \ge 1} \subseteq W^{1,p}(\Omega)$ is bounded and so by passing to a suitable subsequence if necessary we may assume that $y_n \xrightarrow{w} y$ in $W^{1,p}(\Omega)$. Hence, $y_n \to y$ in $L^p(\Omega)$ (see Theorem 4.5.25), and so $y \in V$ and $\|y\|_p = 1$. From the weak lower semicontinuity of the norm (see Proposition 3.3.13(c)), it follows that

$$
\|y\|_p + \|Dy\|_p \le \liminf_{n \to \infty} \big[\|y_n\|_p + \|Dy_n\|_p\big] .
$$

Hence, $\|Dy\|_p \le \liminf_{n \to \infty} \|Dy_n\|_p = 0$, which implies $y \equiv c \in \mathbb{R}$ as $\Omega$ is connected. Since $y \in V$ we obtain $y = 0$, a contradiction to the fact that $\|y\|_p = 1$.

Corollary 4.5.32. If $\Omega \subseteq \mathbb{R}^N$ is a bounded domain with Lipschitz boundary, $\Gamma^* \subseteq \partial\Omega$ with $\sigma(\Gamma^*) > 0$ and $V = \{v \in W^{1,p}(\Omega) : \gamma_0(v) = 0 \text{ on } \Gamma^*\}$ with $1 < p < \infty$, then there exists a constant $c > 0$ such that $\|v\|_p \le c\|Dv\|_p$ for all $v \in V$.

The next consequence of Theorem 4.5.31 is the following result known as the "Poincaré–Wirtinger inequality."

Corollary 4.5.33 (Poincaré–Wirtinger inequality). If $\Omega \subseteq \mathbb{R}^N$ is a bounded domain with a Lipschitz boundary, then there exists a constant $c > 0$ such that

$$
\Big\|u - \frac{1}{\lambda^N(\Omega)}\int_\Omega u\,dz\Big\|_p \le c\|Du\|_p \quad \text{for all } u \in W^{1,p}(\Omega) \text{ with } 1 < p < \infty ,
$$

where $\lambda^N$ denotes the Lebesgue measure on $\mathbb{R}^N$.

Proof. Let $V = \{v \in W^{1,p}(\Omega) : \int_\Omega v\,dz = 0\}$ and apply Theorem 4.5.31.

We conclude with another equivalent norm for Sobolev spaces that is useful in the study of boundary value problems.

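Proposition 4.5.34. If $\Omega \subseteq \mathbb{R}^N$ is a bounded domain with a Lipschitz boundary and $\beta \in L^\infty(\partial\Omega)$ with $\beta(z) \ge 0$ σ-a.e. and $\beta \not\equiv 0$, then

$$
u \mapsto |u| = \Big[\|Du\|_p^p + \int_{\partial\Omega} \beta(z)|u|^p\,d\sigma\Big]^{1/p}
$$

is an equivalent norm on the Sobolev space $W^{1,p}(\Omega)$ with $1 < p < \infty$.

Proof. For every $u \in W^{1,p}(\Omega)$ we obtain

$$
|u|^p \le \|Du\|_p^p + \|\beta\|_{L^\infty(\partial\Omega)}\|u\|_{L^p(\partial\Omega)}^p \le \|Du\|_p^p + \|\beta\|_{L^\infty(\partial\Omega)}\|\gamma_0\|_L^p\|u\|^p ;
$$

see Theorem 4.5.20. Then

$$
|u| \le c_1\|u\| \quad \text{for some } c_1 > 0 . \tag{4.5.4}
$$

Next we show that there exists a constant $c_2 > 0$ such that

$$
\|u\|_p \le c_2|u| \quad \text{for all } u \in W^{1,p}(\Omega) . \tag{4.5.5}
$$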

Arguing by contradiction, suppose that (4.5.5) is not true. Then there exists a sequence {u n }n≥1 ⊆ W 1,p (Ω) such that ‖u n ‖p > n|u n | for all n ∈ ℕ. Normalizing in L p (Ω), we can say that ‖u n ‖p = 1

and |u n |

0 .

(4.5.9)

Then (4.5.4) and (4.5.9) imply that ‖ ⋅ ‖ and | ⋅ | are equivalent norms on W 1,p (Ω).

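4.6 Spaces of Measures

Let $(X, \Sigma)$ be a measurable space. We define

$$
\begin{aligned}
ca(\Sigma) &= \text{the set of all } \mathbb{R}\text{-valued signed measures on } \Sigma , \\
ca_+(\Sigma) &= \text{the set of all } \mathbb{R}_+\text{-valued measures on } \Sigma .
\end{aligned}
$$

Given $\mu \in ca(\Sigma)$, one has the total variation $|\mu| : \Sigma \to \mathbb{R}_+$ defined by

$$
|\mu|(A) = \sup\Big[\sum_{k=1}^n |\mu(A_k)| : n \in \mathbb{N}, \{A_k\}_{k=1}^n \text{ is a } \Sigma\text{-partition of } A\Big] ;
$$

see Definition 2.4.15 and Remark 2.4.16. We know the following.

Proposition 4.6.1. If $\mu \in ca(\Sigma)$, then the following hold:
(a) $\sup[|\mu(C)| : C \in \Sigma, C \subseteq A] \le |\mu|(A) \le 2\sup[|\mu(C)| : C \in \Sigma, C \subseteq A]$;
(b) $|\mu| \in ca_+(\Sigma)$.

Moreover, for $\mu \in ca(\Sigma)$ we have the positive and negative variations of $\mu$ defined by

$$
\mu^+ = \frac{1}{2}[|\mu| + \mu] \quad \text{and} \quad \mu^- = \frac{1}{2}[|\mu| - \mu] .
$$

Evidently $\mu = \mu^+ - \mu^-$, $|\mu| = \mu^+ + \mu^-$ and $\mu^+, \mu^- \in ca_+(\Sigma)$. Moreover $\mu^+ \perp \mu^-$ and there exist $P, N \in \Sigma$, the Hahn decomposition of $X$, such that for all $A \in \Sigma$

$$
\begin{aligned}
&\mu^+(A) = \mu(A \cap P) = \sup[\mu(C) : C \in \Sigma, C \subseteq A] , \\
&\mu^-(A) = -\mu(A \cap N) = -\inf[\mu(C) : C \in \Sigma, C \subseteq A] , \\
&\mu^+(A) \ge 0 \text{ for all } A \subseteq P , \quad \mu^-(A) \ge 0 \text{ for all } A \subseteq N .
\end{aligned}
$$

On $ca(\Sigma)$ we define the following two quantities

$$
\|\mu\| = |\mu|(X) \quad \text{and} \quad \|\mu\|_\infty = \sup[|\mu(A)| : A \in \Sigma] .
$$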
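On a finite set with the power set as σ-field all of these objects can be computed by brute force, which makes the relations of Proposition 4.6.1 and the Hahn decomposition easy to inspect. The sketch below is a toy illustration; the point masses in `weights` and all helper names are hypothetical. It uses the fact that for a purely atomic measure on a finite set the supremum defining $|\mu|(A)$ is attained by the partition of $A$ into singletons.

```python
from itertools import combinations

# Hypothetical signed measure on X = {a, b, c, d, e}, given by its point masses.
weights = {"a": 3, "b": -1, "c": 4, "d": -5, "e": 2}
X = frozenset(weights)


def mu(A):
    """mu(A) as the sum of the point masses in A."""
    return sum(weights[x] for x in A)


def subsets(points):
    """All subsets of a finite set, i.e. the whole sigma-field restricted to it."""
    pts = sorted(points)
    for r in range(len(pts) + 1):
        for combo in combinations(pts, r):
            yield frozenset(combo)


def total_variation(A):
    """|mu|(A); the partition of A into singletons attains the supremum."""
    return sum(abs(weights[x]) for x in A)


# Hahn decomposition: P carries the nonnegative masses, N the negative ones.
P = frozenset(x for x in X if weights[x] >= 0)
N = X - P


def mu_plus(A):
    return mu(A & P)


def mu_minus(A):
    return -mu(A & N)


norm_tv = total_variation(X)                      # ||mu|| = |mu|(X)
norm_sup = max(abs(mu(A)) for A in subsets(X))    # ||mu||_infty

print(norm_tv, norm_sup)   # the two norms
print(norm_sup <= norm_tv <= 2 * norm_sup)
```

The brute-force maximum over all $2^5$ subsets is only feasible because $X$ is tiny; the point is to see the inequality $\|\mu\|_\infty \le \|\mu\| \le 2\|\mu\|_\infty$ and the formulas for $\mu^\pm$ on a concrete example.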

It is easy to see that both are norms on $ca(\Sigma)$. Moreover, we have the following result.

Proposition 4.6.2. $\|\cdot\|$, $\|\cdot\|_\infty$ are equivalent norms on $ca(\Sigma)$ and $(ca(\Sigma), \|\cdot\|)$ and $(ca(\Sigma), \|\cdot\|_\infty)$ are Banach spaces.

Proof. From Proposition 4.6.1(a) it follows that

$$
\|\mu\|_\infty \le \|\mu\| \le 2\|\mu\|_\infty \quad \text{for all } \mu \in ca(\Sigma) , \tag{4.6.1}
$$

which shows that $\|\cdot\|$, $\|\cdot\|_\infty$ are equivalent norms on $ca(\Sigma)$. In addition, from (4.6.1) we see that

$$
\mu_n \xrightarrow{\|\cdot\|} \mu \quad \text{if and only if} \quad \mu_n \xrightarrow{\|\cdot\|_\infty} \mu \quad \text{if and only if} \quad \sup[|\mu_n(A) - \mu(A)| : A \in \Sigma] \to 0 . \tag{4.6.2}
$$

Now suppose that $\{\mu_n\}_{n \ge 1} \subseteq ca(\Sigma)$ is a $\|\cdot\|$-Cauchy sequence, hence also a $\|\cdot\|_\infty$-Cauchy sequence because of their equivalence; see (4.6.1). From (4.6.2) we obtain that $\{\mu_n(A)\}_{n \ge 1} \subseteq \mathbb{R}$ is a Cauchy sequence uniformly in $A \in \Sigma$. Thus, there exists an additive $\mu : \Sigma \to \mathbb{R}$ such that

$$
\sup[|\mu_n(A) - \mu(A)| : A \in \Sigma] \to 0 \quad \text{as } n \to \infty . \tag{4.6.3}
$$

We need to show that $\mu \in ca(\Sigma)$. Let $\{A_k\}_{k \ge 1} \subseteq \Sigma$ be pairwise disjoint, let $A = \bigcup_{k \ge 1} A_k \in \Sigma$ and let $\varepsilon > 0$ be given. From (4.6.3) we see that there exists a number $n_0 \in \mathbb{N}$ such that

$$
|\mu_n(C) - \mu(C)| \le \varepsilon \quad \text{for all } C \in \Sigma \text{ and for all } n \ge n_0 . \tag{4.6.4}
$$

Moreover, since $\mu_n \in ca(\Sigma)$ for $n \in \mathbb{N}$, there exists a number $n_1 \in \mathbb{N}$ with $n_1 \ge n_0$ such that

$$
\Big|\mu_{n_0}(A) - \sum_{k=1}^n \mu_{n_0}(A_k)\Big| \le \varepsilon \quad \text{for all } n \ge n_1 . \tag{4.6.5}
$$

Taking (4.6.4) and (4.6.5) into account, we get for $n \ge n_1$ that

$$
\begin{aligned}
\Big|\mu(A) - \sum_{k=1}^n \mu(A_k)\Big| &= \Big|\mu\Big(A \setminus \bigcup_{k=1}^n A_k\Big)\Big| \le \Big|\mu\Big(A \setminus \bigcup_{k=1}^n A_k\Big) - \mu_{n_0}\Big(A \setminus \bigcup_{k=1}^n A_k\Big)\Big| + \Big|\mu_{n_0}\Big(A \setminus \bigcup_{k=1}^n A_k\Big)\Big| \\
&= \Big|\mu\Big(A \setminus \bigcup_{k=1}^n A_k\Big) - \mu_{n_0}\Big(A \setminus \bigcup_{k=1}^n A_k\Big)\Big| + \Big|\mu_{n_0}(A) - \sum_{k=1}^n \mu_{n_0}(A_k)\Big| \le 2\varepsilon .
\end{aligned}
$$

Hence, $\mu \in ca(\Sigma)$. Therefore $(ca(\Sigma), \|\cdot\|)$ and $(ca(\Sigma), \|\cdot\|_\infty)$ are Banach spaces.

Definition 4.6.3. Let $X$ be a locally compact topological space. We introduce the following vector spaces of continuous functions:
(a) $C_c(X) = \{f : X \to \mathbb{R} : f \text{ is continuous with compact support}\}$.
(b) $C_0(X) = \{f : X \to \mathbb{R} : f$ is continuous and for every $\varepsilon > 0$ there exists a compact set $K_\varepsilon \subseteq X$ such that $|f(x)| \le \varepsilon$ for all $x \in X \setminus K_\varepsilon\}$. These are the continuous functions on $X$ that vanish at infinity.
(c) $C_b(X) = \{f : X \to \mathbb{R} : f \text{ is continuous and bounded}\}$.

Remark 4.6.4. From the definitions above it is clear that we always have $C_c(X) \subseteq C_0(X) \subseteq C_b(X)$. If $X$ is compact, then the three spaces coincide. We endow the space $C_b(X)$ with the supremum norm $\|f\|_\infty = \sup[|f(x)| : x \in X]$.

Proposition 4.6.5. If $X$ is a locally compact topological space, then the following hold:
(a) $(C_b(X), \|\cdot\|_\infty)$ is a Banach space;
(b) $C_0(X)$ is a closed subspace of $C_b(X)$ and so a Banach space as well;
(c) $\overline{C_c(X)}^{\|\cdot\|_\infty} = C_0(X)$.

4 Banach Spaces of Functions and Measures

Proof. (a) Let {f_n}_{n≥1} ⊆ C_b(X) be a Cauchy sequence. The completeness of ℝ implies that for every fixed x ∈ X, the sequence {f_n(x)}_{n≥1} ⊆ ℝ has a limit f(x). We claim that

\[ \|f_n - f\|_\infty \to 0 \quad \text{and} \quad f \in C_b(X). \tag{4.6.6} \]

Given ε > 0, there exists n0 ∈ ℕ such that ‖f n − f m ‖∞

0, there exists an open set U ⊆ X such that |μ|(U \ K) < ε. Invoking Urysohn's Lemma (see Theorem 1.2.17), there exists a continuous function f : X → [0, 1] such that f|_K = 1 and f|_{X\U} = 0. Then

\[ \xi_f(\mu) = \int_X f(x)\,d\mu = \int_U f(x)\,d\mu = \mu(K) + \int_{U \setminus K} f(x)\,d\mu, \]

which gives

\[ \mu(K) = \xi_f(\mu) - \int_{U \setminus K} f(x)\,d\mu \ge - \int_{U \setminus K} f(x)\,d\mu \ge -|\mu|(U \setminus K) \ge -\varepsilon. \]

Since ε > 0 is arbitrary we let ε ↘ 0 to conclude that μ(K) ≥ 0 for all compact sets K ⊆ X. But μ ∈ ca_R(B(X)). Hence it is compact regular; see Definition 2.5.8(d). Therefore μ ∈ (ca_R)_+(B(X)) and we have proven that the latter is τ_n(X)-closed.

The next theorem yields several equivalent descriptions of convergence in the narrow topology on ca_+(B(X)). In what follows, we denote by U_b(X) the space of all ℝ-valued, bounded, uniformly continuous functions on X.

Theorem 4.6.21. If X is metrizable and {μ_α, μ}_{α∈I} ⊆ ca_+(B(X)), then the following statements are equivalent:
(a) μ_α → μ in τ_n(X);
(b) ξ_f(μ_α) → ξ_f(μ) for all f ∈ U_b(X);
(c) lim sup_{α∈I} μ_α(C) ≤ μ(C) for all closed C ⊆ X;
(d) μ(U) ≤ lim inf_{α∈I} μ_α(U) for all open U ⊆ X;
(e) lim_α μ_α(A) = μ(A) for all A ∈ B(X) with μ(Ā \ int A) = μ(bd A) = 0.

Proof. (a) ⇒ (b): This is clear from Definition 4.6.11(a) since U_b(X) ⊆ C_b(X).

(b) ⇒ (c): Let C ⊆ X be closed and let U_n = {x ∈ X : d(x, C) < 1/n}. Then U_n is open and C ∩ (X \ U_n) = ∅. We define

\[ f_n(x) = \frac{d(x, X \setminus U_n)}{d(x, C) + d(x, X \setminus U_n)} \quad \text{for } x \in X. \]

Evidently f_n ∈ U_b(X), f_n|_C = 1, f_n|_{X\U_n} = 0 and 0 ≤ f_n ≤ 1. Moreover, ⋂_{n≥1} U_n = C. Then it follows that

\[ \limsup_{\alpha} \mu_\alpha(C) \le \limsup_{\alpha} \int_X f_n(x)\,d\mu_\alpha = \int_X f_n(x)\,d\mu \le \mu(U_n) \quad \text{for all } n \ge 1. \]

Since μ is finite and U_n ↓ C, we have μ(U_n) ↓ μ(C) as n → ∞. Hence, lim sup_α μ_α(C) ≤ μ(C).

(c) ⇐⇒ (d): Since closed sets are the complements of open sets the result follows.

(d) ⇒ (e): Let A ∈ B(X) with μ(bd A) = μ(Ā \ int A) = 0. Then

\[ \limsup_{\alpha} \mu_\alpha(A) \le \limsup_{\alpha} \mu_\alpha(\overline{A}) \le \mu(\overline{A}) = \mu(A) \tag{4.6.24} \]

and

\[ \liminf_{\alpha} \mu_\alpha(A) \ge \liminf_{\alpha} \mu_\alpha(\operatorname{int} A) \ge \mu(\operatorname{int} A) = \mu(A). \tag{4.6.25} \]

From (4.6.24) and (4.6.25) we conclude that μ_α(A) → μ(A).

(e) ⇒ (a): Let f ∈ C_b(X) and define μ_f(A) = μ({x ∈ X : f(x) ∈ A}) for all A ∈ B(ℝ). Then μ_f can have at most a countable number of mass points; see Saks Lemma (Lemma 2.8.1). Hence, given any ε > 0, there exist {η_k}_{k=0}^m and a, b ∈ ℝ such that

\[ a < f(x) < b \ \text{ for all } x \in X, \qquad \eta_k - \eta_{k-1} < \varepsilon, \qquad a = \eta_0 < \eta_1 < \ldots < \eta_m = b, \tag{4.6.26} \]

\[ \mu(\{x \in X : f(x) = \eta_k\}) = 0 \quad \text{for all } k = 1, \ldots, m. \tag{4.6.27} \]

Let A_k = {x ∈ X : η_{k−1} ≤ f(x) < η_k} with k = 1, …, m. Then {A_k}_{k=1}^m ⊆ B(X) are disjoint and X = ⋃_{k=1}^m A_k; see (4.6.26). Moreover, we obtain

\[ \overline{A_k} \setminus \operatorname{int} A_k \subseteq \{x \in X : f(x) = \eta_{k-1}\} \cup \{x \in X : f(x) = \eta_k\}. \]

This gives, due to (4.6.27), that μ(Ā_k \ int A_k) = μ(bd A_k) = 0 for all k = 1, …, m. Then

\[ \mu_\alpha(A_k) \to \mu(A_k) \quad \text{for all } k = 1, \ldots, m. \tag{4.6.28} \]

Let s(x) = ∑_{k=1}^m η_{k−1} χ_{A_k}(x) for all x ∈ X. Then |s(x) − f(x)| < ε for all x ∈ X; see (4.6.26). It follows that

\[
\begin{aligned}
\Big| \int_X f(x)\,d\mu_\alpha - \int_X f(x)\,d\mu \Big|
&\le \int_X |f(x) - s(x)|\,d\mu_\alpha + \Big| \int_X s(x)\,d\mu_\alpha - \int_X s(x)\,d\mu \Big| + \int_X |s(x) - f(x)|\,d\mu \\
&\le \varepsilon\,[\mu_\alpha(X) + \mu(X)] + \sum_{k=1}^{m} |\mu_\alpha(A_k) - \mu(A_k)|\,|\eta_{k-1}|.
\end{aligned}
\]

Hence, thanks to (4.6.28) and to μ_α(X) → μ(X) (take A = X in (e)),

\[ \limsup_{\alpha} \Big| \int_X f(x)\,d\mu_\alpha - \int_X f(x)\,d\mu \Big| \le 2\varepsilon\mu(X). \]

Letting ε ↘ 0 we conclude that μ_α → μ in τ_n(X).
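The equivalences of Theorem 4.6.21 can be checked by hand on the simplest example, the Dirac masses δ_{1/n} on X = ℝ, which converge narrowly to δ_0. The following Python sketch is only an illustration under assumptions of our own: measures are finitely supported and stored as dictionaries, the helper names `dirac`, `mass`, and `integrate` are hypothetical, and the lim sup and lim inf are approximated by looking at a finite tail of the sequence.

```python
from fractions import Fraction

def dirac(point):
    """Dirac measure at `point`, stored as {atom: mass} (illustrative encoding)."""
    return {Fraction(point): Fraction(1)}

def mass(measure, indicator):
    """Measure of the set described by the predicate `indicator`."""
    return sum(m for atom, m in measure.items() if indicator(atom))

def integrate(measure, f):
    """Integral of f against a finitely supported measure."""
    return sum(m * f(atom) for atom, m in measure.items())

mu = dirac(0)                                         # the narrow limit delta_0
mus = [dirac(Fraction(1, n)) for n in range(1, 200)]  # delta_{1/n}

in_closed = lambda x: x == 0    # closed set C = {0}
in_open = lambda x: x > 0       # open set U = (0, +inf)
in_cont = lambda x: x > -1      # A = (-1, +inf), whose boundary {-1} is mu-null

f = lambda x: 1 / (1 + x * x)   # bounded and uniformly continuous on R

tail = mus[-50:]                # finite tail as a stand-in for lim sup / lim inf
limsup_closed = max(mass(m, in_closed) for m in tail)
liminf_open = min(mass(m, in_open) for m in tail)

print(limsup_closed, "<=", mass(mu, in_closed))   # statement (c) on C
print(mass(mu, in_open), "<=", liminf_open)       # statement (d) on U
```

Both inequalities are strict here, which shows that (c) and (d) cannot be improved to equalities: the closed set {0} gains mass in the limit, while the open set (0, +∞) loses it. Equality holds for sets such as A = (−1, +∞) whose boundary is μ-null, as in (e).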

Remark 4.6.22. The theorem above fails on all of ca(B(X)). To see this, let X = ℝ with the usual metric topology. Let μ_n = δ_{1/n} − δ_{−1/n} for all n ∈ ℕ. Then μ_n → 0 in τ_n(X). On the other hand, if C = [0, +∞), then μ_n(C) = 1 for all n ∈ ℕ. Hence (c) is not satisfied.

Next we consider a metrizable space X and we focus on the set of all probability measures on X denoted by P(X). Note that all probability measures on X are regular; see Theorem 2.5.12. So, P(X) = P_r(X). The reason we focus on P(X) is that span P(X) = ca(B(X)). It turns out that P(X) equipped with the relative τ_n(X)-topology inherits many of the properties of the space X. So, in what follows, without further mention, we consider on P(X) the trace of the τ_n(X)-topology.

Theorem 4.6.23. X is compact if and only if P(X) is compact metrizable.

Proof. ⇒: From Proposition 4.6.13 we know that τ_n(X) = τ_{w∗}(X). Since C_b(X) = C(X) is separable, the w∗-topology on the closed unit ball of ca(B(X)) is compact and metrizable; see Theorem 3.4.11. But P(X) is a w∗-closed subset of the closed unit ball in ca(B(X)). We conclude that P(X) is compact and metrizable.
⇐: Applying Proposition 4.6.17, we know that X is a topological subspace of P(X). So, it follows that X is compact.

Theorem 4.6.24. X is separable if and only if P(X) is separable metrizable.

Proof. ⇒: By Urysohn's Theorem (see Theorem 1.5.21), X is homeomorphic to a subset of the Hilbert cube ℍ = [0, 1]^ℕ. Therefore there exists an equivalent totally bounded metrization of X. We use this metric d on X. Then the completion (X̄, d̄) of (X, d) is compact and we have an isometry ϑ : U_d(X) → C(X̄) defined by ϑ(f) = f̄ with f̄ being the unique continuous extension of f to X̄; see Theorem 1.5.27. Here, U_d(X) denotes the space of all ℝ-valued d-uniformly continuous functions on X. Since C(X̄) is separable, so is U_d(X). Let D ⊆ U_d(X) be a countable dense subset. Let S : P(X) → ℝ^ℕ be defined by

\[ S(\mu) = \Big\{ \int_X f(x)\,d\mu : f \in D \Big\}. \]

We claim that S is a homeomorphism onto its range. First we show that S is injective. So suppose that S(μ) = S(λ). Then ∫_X f(x)dμ = ∫_X f(x)dλ for all f ∈ D. Exploiting the density of D in U_d(X), we obtain that ∫_X f(x)dμ = ∫_X f(x)dλ for all f ∈ U_d(X). Let C ⊆ X be closed and let U_n = {x ∈ X : d(x, C) < 1/n}. Then U_n is open, C ∩ (X \ U_n) = ∅, and C = ⋂_{n≥1} U_n. Then as in the proof of Theorem 4.6.21, we produce a function f̂_n : X → [0, 1] such that

\[ \hat{f}_n|_C = 1, \qquad \hat{f}_n|_{X \setminus U_n} = 0, \qquad 0 \le \hat{f}_n \le 1, \qquad \hat{f}_n \in U_d(X) \quad \text{for all } n \in \mathbb{N}. \]

It follows that

\[ \mu(C) \le \int_X \hat{f}_n(x)\,d\mu = \int_X \hat{f}_n(x)\,d\lambda = \int_{U_n} \hat{f}_n(x)\,d\lambda \le \lambda(U_n) \quad \text{for all } n \in \mathbb{N}, \]


which, upon letting n → ∞, shows that μ(C) ≤ λ(C). Interchanging the roles of μ and λ in the argument above, we also get that λ(C) ≤ μ(C); hence μ(C) = λ(C) for all closed C ⊆ X. Therefore μ = λ; see Remark 2.5.9. We have proven that S is injective.

Next we show that S is bicontinuous. So, suppose that μ_α → μ in P(X). Then ∫_X f(x)dμ_α → ∫_X f(x)dμ for all f ∈ D; see Theorem 4.6.21. Hence, S(μ_α) → S(μ) and this proves the continuity of S. In order to show the continuity of S^{−1}, let {μ_α}_{α∈I} ⊆ P(X) be a net and assume that S(μ_α) → S(μ) in ℝ^ℕ, that is, ∫_X f(x)dμ_α → ∫_X f(x)dμ for all f ∈ D. For any h ∈ U_d(X) and any f ∈ D, one has

\[ \Big| \int_X h(x)\,d\mu_\alpha - \int_X h(x)\,d\mu \Big| \le 2\|h - f\|_\infty + \Big| \int_X f(x)\,d\mu_\alpha - \int_X f(x)\,d\mu \Big|. \]

Then

\[ \limsup_{\alpha} \Big| \int_X h(x)\,d\mu_\alpha - \int_X h(x)\,d\mu \Big| \le 2\|h - f\|_\infty \quad \text{for all } f \in D. \]

Hence, μ_α → μ in P(X) since D ⊆ U_d(X) is dense. Therefore S^{−1} is continuous and so S is bicontinuous. Thus, S is a homeomorphism. Since ℝ^ℕ is separable metrizable, then so is P(X).
⇐: By Proposition 4.6.17, X can be viewed as a topological subspace of P(X). Therefore X is separable.

Theorem 4.6.25. If X is separable metrizable, then X is Polish if and only if P(X) is Polish.

Proof. ⇒: As in the proof of Theorem 4.6.24, we may assume that X is totally bounded for some metric d. Its d-completion X̄ is compact and X, being Polish, is a G_δ-subset of X̄; see Theorem 1.5.47. From Theorem 4.6.23 we know that P(X̄) is compact metrizable. Let

P_0 = {μ ∈ P(X̄) : μ(X̄ \ X) = 0}.

Then P(X) and P_0 are homeomorphic. If we show that P_0 ⊆ P(X̄) is G_δ, then Theorem 1.5.47 implies that P_0 is Polish, and hence, so is P(X). Since X is G_δ in X̄, there exist open sets {U_n}_{n≥1} in X̄ such that X = ⋂_{n≥1} U_n. Let

P_n = {μ ∈ P(X̄) : μ(X̄ \ U_n) = 0} with n ∈ ℕ.

Then P_0 = ⋂_{n≥1} P_n and P_n = ⋂_{k≥1} P_n^k with P_n^k = {μ ∈ P(X̄) : μ(X̄ \ U_n) < 1/k}. Suppose that {μ_α}_{α∈I} ⊆ P(X̄) \ P_n^k such that μ_α → μ in P(X̄). Then by Theorem 4.6.21, we obtain

\[ \mu(\overline{X} \setminus U_n) \ge \limsup_{\alpha} \mu_\alpha(\overline{X} \setminus U_n) \ge \frac{1}{k}, \]

which implies that P_n^k ⊆ P(X̄) is open for all n, k ∈ ℕ. Hence P_0 is G_δ in P(X̄) and so P(X) is Polish.
⇐: Once again we use Proposition 4.6.17 to view X as a closed subspace of P(X). Then X is Polish by Proposition 1.5.45.

Dudley [88] furnished a compatible metric

\[ d_D(\mu, \mu^*) = \sup\big[\, |\xi_f(\mu) - \xi_f(\mu^*)| : f \in \mathrm{Lip}_b(X),\ \|f\|_{\mathrm{Lip}_b} \le 1 \,\big], \]

where Lip_b(X) = {f : X → ℝ : f is bounded and Lipschitz continuous} and

\[ \|f\|_{\mathrm{Lip}_b} = \|f\|_\infty + \sup_{\substack{x \ne u \\ x, u \in X}} \frac{|f(x) - f(u)|}{d(x, u)}. \]
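To get a feeling for the Dudley metric, one can compute it between two Dirac masses on a two-point metric space X = {x, y} with d(x, y) = r. A function on X is a pair (a, b) = (f(x), f(y)), and the constraint ‖f‖_{Lip_b} ≤ 1 reads max(|a|, |b|) + |a − b|/r ≤ 1. A short calculation (not taken from the text) suggests the value 2r/(2 + r), attained at a = −b = r/(2 + r). The Python sketch below checks this by brute force; the function names and the grid resolution are illustrative assumptions.

```python
def lipb_norm(a, b, r):
    """||f||_inf plus the Lipschitz constant of f = (a, b) on a two-point space."""
    return max(abs(a), abs(b)) + abs(a - b) / r

def dudley_two_point(r, steps=400):
    """Grid search for sup |f(x) - f(y)| over all f with ||f||_Lipb <= 1."""
    best = 0.0
    for i in range(-steps, steps + 1):
        for j in range(-steps, steps + 1):
            a, b = i / steps, j / steps
            if lipb_norm(a, b, r) <= 1.0:
                best = max(best, abs(a - b))
    return best

for r in (0.5, 1.0, 2.0):
    print(r, dudley_two_point(r), 2 * r / (2 + r))
```

The closed form stays below 2 and tends to 2 as r → ∞, which is the largest value d_D can take between two probability measures.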

Definition 4.6.26. Let X be a completely regular topological space and C ⊆ ca(B(X)). We say that C is tight if
(a) sup[|μ|(X) : μ ∈ C] < ∞;
(b) for every ε > 0 there exists a compact set K_ε ⊆ X such that sup[|μ|(X \ K_ε) : μ ∈ C] ≤ ε.

Using Proposition 4.6.17, we easily see that the following result is true.

Proposition 4.6.27. If X is completely regular, A ⊆ X and C = {δ_x : x ∈ A}, then C is tight if and only if A is relatively compact.

Proposition 4.6.28. If X is completely regular and C ⊆ ca(B(X)) is bounded, then C is tight if and only if there exists a function φ : X → ℝ̄ = ℝ ∪ {+∞} such that
(a) for all η ≥ 0, it holds that φ_η = {x ∈ X : φ(x) ≤ η} is compact;
(b) sup[∫_X φ d|μ| : μ ∈ C] < +∞.

Proof. ⇒: On account of Definition 4.6.26, there exists an increasing sequence {K_n}_{n≥1} of compact subsets of X such that sup[|μ|(X \ K_n) : μ ∈ C] ≤ 1/2^n. Let φ = ∑_{n≥1} χ_{X\K_n} and let [η] be the integer part of η > 0. We have φ_η = K_{[η]+1} and so we have satisfied (a). Moreover,

\[ \sup\Big[ \int_X \varphi\,d|\mu| : \mu \in C \Big] = \sup\Big[ \sum_{n \ge 1} |\mu|(X \setminus K_n) : \mu \in C \Big] \]

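\[ \le \sum_{n \ge 1} \frac{1}{2^n} = 1 < +\infty, \]

so (b) holds as well.
⇐: Let M = sup[∫_X φ d|μ| : μ ∈ C] < +∞. Given ε > 0, let K_ε = φ_{M/ε} ⊆ X be compact. Then for every μ ∈ C we get

\[ |\mu|(X \setminus K_\varepsilon) = |\mu|\Big( \Big\{ x \in X : \varphi(x) > \frac{M}{\varepsilon} \Big\} \Big) \le \frac{\varepsilon}{M} \int_X \varphi\,d|\mu| \le \varepsilon. \]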

Hence, C is tight.

Remark 4.6.29. A function φ : X → ℝ̄ = ℝ ∪ {+∞} such that φ_η = {x ∈ X : φ(x) ≤ η} is compact for every η ∈ ℝ is said to be inf-compact or coercive.

Proposition 4.6.30. If X is metrizable and C ⊆ ca_+(B(X)) is tight, then the τ_n(X)-closure of C is tight as well.

Proof. Let M = sup[μ(X) : μ ∈ C] < +∞; see Definition 4.6.26. Given μ in the τ_n(X)-closure of C, there exists {μ_α}_{α∈I} ⊆ C such that μ_α → μ in τ_n(X). Let f ≡ 1 ∈ C_b(X). Then μ_α(X) = ξ_f(μ_α) → ξ_f(μ) = μ(X), and hence, μ(X) ≤ M for all μ in the τ_n(X)-closure of C. For every ε > 0 we can find a compact set K_ε ⊆ X such that

\[ \mu(X \setminus K_\varepsilon) \le \varepsilon \quad \text{for all } \mu \in C. \tag{4.6.29} \]

The function χ_{X\K_ε} is lower semicontinuous and nonnegative. So, if μ belongs to the τ_n(X)-closure of C and {μ_α}_{α∈I} ⊆ C satisfies μ_α → μ in τ_n(X), then by Proposition 4.6.18, we infer that

\[ \mu(X \setminus K_\varepsilon) \le \liminf_{\alpha} \mu_\alpha(X \setminus K_\varepsilon) \le \varepsilon \quad \text{for all } \mu \in \overline{C}^{\tau_n(X)}; \]

see (4.6.29). Hence, the τ_n(X)-closure of C is tight.

We conclude with a result characterizing the τn (X)-compact subsets of caR (B(X)). The result is known as “Prokhorov’s Theorem.” Its proof can be found in Bourbaki [43, Theorem 5.2, p. 64] in the case of X being completely regular and in Dudley [90, Theorem 11.5.4, p. 316] in the case of X being Polish. Theorem 4.6.31 (Prokhorov’s Theorem). If X is completely regular and C ⊆ caR (B(X)) is tight, then C is relatively τn (X)-compact. Remark 4.6.32. For Polish spaces we know that C ⊆ ca(B(X)) is tight if and only if it is relatively τn (X)-compact; see Dudley [90].
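The inf-compact criterion of Proposition 4.6.28 is easy to test numerically on X = ℝ. The sketch below is only an illustration with choices of our own: it takes the family μ_k = (1 − 1/k)δ_0 + (1/k)δ_k, which pushes a vanishing amount of mass to infinity, and the inf-compact function φ(x) = |x|, and compares it with the family {δ_k}, which is not tight. The helper names and the truncation of the families to k ≤ 2000 are assumptions made so that the example is finite.

```python
from fractions import Fraction

def mu(k):
    """mu_k = (1 - 1/k) delta_0 + (1/k) delta_k, stored as {atom: mass}."""
    return {0: Fraction(k - 1, k), k: Fraction(1, k)}

phi = abs  # inf-compact on R: {phi <= eta} = [-eta, eta] is compact

def integral(measure, f):
    """Integral of f against a finitely supported measure."""
    return sum(m * f(x) for x, m in measure.items())

def mass_outside(measure, radius):
    """Mass carried by the complement of the compact set [-radius, radius]."""
    return sum(m for x, m in measure.items() if abs(x) > radius)

family = [mu(k) for k in range(1, 2001)]
M = max(integral(m, phi) for m in family)   # uniform bound on the phi-moments

def worst_tail(eps):
    """sup over the family of the mass outside K_eps = {phi <= M/eps}."""
    return max(mass_outside(m, M / eps) for m in family)

escaping = [{k: Fraction(1)} for k in range(1, 2001)]   # the Diracs delta_k
escaping_tail = max(mass_outside(m, 1000) for m in escaping)

print(M, worst_tail(Fraction(1, 10)), escaping_tail)
```

For the first family the φ-moments are all equal to 1, so K_ε = [−1/ε, 1/ε] works for every ε, exactly as in the proof of the implication ⇐. For the Diracs δ_k the φ-moments are unbounded and every compact interval eventually misses all the mass.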

4.7 Young Measures

In this section we present a brief introduction to the theory of Young measures. Such measures generalize measurable functions and in fact can be viewed as the completion in some sense of the set of measurable functions. In this completion, the measurable functions correspond to the Dirac Young measures. Nowadays Young measures are an important tool in many parts of mathematical analysis.

Let (Ω, Σ, μ) be a complete probability space and X a Polish space. By B(X), we denote the Borel σ-algebra of X and by P(X), the set of all probability measures on X endowed with the narrow topology τ_n(X). The next result is a "disintegration theorem" and its proof can be found in Valadier [291].

Theorem 4.7.1. If p_Ω : Ω × X → Ω is the projection map, λ ∈ ca(Σ ⊗ B(X)), and μ = λ ∘ p_Ω^{−1}, then there exists a Σ-measurable λ̂ : Ω → P(X) such that λ(A × C) = ∫_A λ̂(w)(C)dμ.

Remark 4.7.2. The map λ̂ : Ω → P(X) is unique in the sense that if λ̂_0 : Ω → P(X) is another such measurable map, then λ̂(w)(C) = λ̂_0(w)(C) μ-a.e. for every C ∈ B(X). Since B(X) is countably generated, λ̂(w) = λ̂_0(w) μ-a.e. The map λ̂ is said to be the disintegration of λ ∈ ca(Σ ⊗ B(X)) with respect to μ. Using the Monotone Class Theorem (see Theorem 2.1.12), one easily sees that

\[ \lambda(E) = \int_\Omega \hat{\lambda}(w)(E_w)\,d\mu \tag{4.7.1} \]

for all E ∈ Σ ⊗ B(X) where E_w = {x ∈ X : (w, x) ∈ E}. Using (4.7.1) and approximating with simple functions, we infer that for every f ∈ L^1(Ω × X, λ) (or alternatively for every Σ ⊗ B(X)-measurable f : Ω × X → ℝ̄_+), we obtain

\[ \int_{\Omega \times X} f(w, x)\,d\lambda = \int_\Omega \Big[ \int_X f(w, x)\,\hat{\lambda}(w)(dx) \Big]\,d\mu. \]

Now we can introduce Young measures.

Definition 4.7.3. A Young measure on Ω × X is a λ ∈ ca_+(Σ ⊗ B(X)) such that μ = λ ∘ p_Ω^{−1}, that is, μ(A) = λ(A × X) for all A ∈ Σ. By Y(Ω × X) (or simply by Y if no ambiguity can occur), we denote the space of Young measures on Ω × X.

Remark 4.7.4. On account of Theorem 4.7.1 we can identify each λ ∈ Y(Ω × X) with its unique disintegration λ̂(w). So, we can say that a Young measure is a Σ-measurable map λ̂ : Ω → P(X). So, Y ⊆ ca_+(Σ ⊗ B(X)) and Y ⊆ L^0(Ω, P(X)) = {λ̂ : Ω → P(X) : λ̂ is Σ-measurable}. Note that if λ ∈ Y(Ω × X), then for every A ∈ Σ, λ(A × ⋅) ∈ ca_+(B(X)) and so it is a Radon measure; see Theorem 2.5.14.

Proposition 4.7.5. If λ̂ : Ω → P(X), then the following two statements are equivalent:
(a) λ̂ is Σ-measurable;
(b) the map w → ξ_C(w) = λ̂(w)(C) is Σ-measurable for every C ∈ B(X).

Proof. (a) ⇒ (b): Let U ⊆ X be open and let η_U : P(X) → [0, 1] be defined by η_U(ϑ) = ϑ(U) for all ϑ ∈ P(X). According to Theorem 4.6.21, η_U is lower semicontinuous. Recall that we consider the τ_n(X)-topology on P(X). Then ξ_U = η_U ∘ λ̂ and so ξ_U is Σ-measurable. Since the elements of P(X) are regular (see Theorem 2.5.12), we conclude that ξ_C is Σ-measurable for all C ∈ B(X).

(b) ⇒ (a): For every E ∈ Σ ⊗ B(X), let h_E : Ω → [0, 1] be defined by h_E(w) = λ̂(w)(E_w). We set M = {E ∈ Σ ⊗ B(X) : h_E is Σ-measurable}. A standard argument shows that M is a monotone class that contains the algebra of measurable rectangles. Then from the Monotone Class Theorem (see Theorem 2.1.12), we obtain Σ ⊗ B(X) ⊆ M; see Remark 2.2.24. Hence, h_E is Σ-measurable for every E ∈ Σ ⊗ B(X). Now let C ∈ B(X) and let E = Ω × C. We have h_E(w) = λ̂(w)(C) = ∫_X χ_C(x)λ̂(w)(dx).
We have seen that h_E is Σ-measurable. Then w → ∫_X s(x)λ̂(w)(dx) is Σ-measurable for every simple function s(x) on X. Finally if h ∈ C_b(X), then there exists a sequence {s_n}_{n≥1} of simple functions such that |s_n(x)| ≤ |h(x)| for all x ∈ X, for all n ∈ ℕ and by Corollary 2.2.19, we also get that s_n(x) → h(x) for all x ∈ X as n → ∞. Then by the Lebesgue Dominated Convergence Theorem (see Theorem 2.3.8), it follows

\[ \int_X s_n(x)\,\hat{\lambda}(w)(dx) \to \int_X h(x)\,\hat{\lambda}(w)(dx) = \langle \hat{\lambda}(w), h \rangle, \]

which shows that w → ⟨λ̂(w), h⟩ is Σ-measurable for all h ∈ C_b(X). Hence, w → λ̂(w) is Σ-measurable.

Let L^0(Ω, X) = {u : Ω → X : u is Σ-measurable}. Given u ∈ L^0(Ω, X), let λ̂^u : Ω → P(X) be defined by λ̂^u(w) = δ_{u(w)}. Note that w → λ̂^u(w)(C) = χ_{u^{−1}(C)}(w) is Σ-measurable for every C ∈ B(X). Invoking Proposition 4.7.5, we infer that λ̂^u : Ω → P(X) is Σ-measurable and so it is a Young measure; see Remark 4.7.4.

Definition 4.7.6. Given u ∈ L^0(Ω, X), λ̂^u defined above is the Young measure associated to the measurable function u.

Remark 4.7.7. Evidently, λ̂^u is the disintegration of the Young measure λ^u defined by

λ^u(A × C) = μ(A ∩ u^{−1}(C)) for all A ∈ Σ and for all C ∈ B(X).

Then for every Σ ⊗ B(X)-measurable function φ : Ω × X → ℝ̄_+ = ℝ_+ ∪ {+∞} or every φ ∈ L^1(Ω × X, λ^u) we have

\[ \int_{\Omega \times X} \varphi(w, x)\,d\lambda^u = \int_\Omega \varphi(w, u(w))\,d\mu. \]

Moreover, the map u → λ^u is an embedding of L^0(Ω, X) into Y(Ω × X). Therefore, we can view L^0(Ω, X) as a subspace of Y(Ω × X). Of course this also leads to an identification of X with a subspace of Y(Ω × X), that is, identify x ∈ X with the constant function u ≡ x.
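On a finite probability space the Young measure associated to a function can be written down explicitly, which makes the change-of-variables formula of Remark 4.7.7 a finite sum. The following Python sketch is a toy model under assumptions of our own: Ω = {0, 1, 2, 3} with the uniform probability, X = ℝ, measures on Ω × X stored as dictionaries indexed by pairs (w, x), and hypothetical helper names. It also evaluates the mass that a Young measure puts off the graph of u, both for the measure associated to u and for a product measure μ ⊗ ϑ.

```python
from fractions import Fraction

omega = [0, 1, 2, 3]
mu = {w: Fraction(1, 4) for w in omega}    # uniform probability on Omega
u = {0: -1, 1: 2, 2: 2, 3: 5}              # a map u : Omega -> X = R

def young_of(u):
    """Young measure of u as {(w, x): mass}; all mass sits on the graph of u."""
    return {(w, u[w]): mu[w] for w in omega}

def product(theta):
    """mu (x) theta for a finitely supported probability theta on X."""
    return {(w, x): mu[w] * p for w in omega for x, p in theta.items()}

def integrate(lam, phi):
    """Integral of phi(w, x) against a finitely supported measure on Omega x X."""
    return sum(m * phi(w, x) for (w, x), m in lam.items())

def marginal(lam, A):
    """lam(A x X), which must equal mu(A) for a Young measure."""
    return sum(m for (w, _), m in lam.items() if w in A)

def off_graph(lam, u):
    """Mass of the complement of the graph of u."""
    return sum(m for (w, x), m in lam.items() if u[w] != x)

lam_u = young_of(u)
lam_theta = product({-1: Fraction(1, 2), 1: Fraction(1, 2)})

phi = lambda w, x: (w + 1) * x * x
lhs = integrate(lam_u, phi)                          # integral over Omega x X
rhs = sum(mu[w] * phi(w, u[w]) for w in omega)       # integral of phi(w, u(w))

print(lhs, rhs, off_graph(lam_u, u), off_graph(lam_theta, u))
```

Both measures have marginal μ on Ω, so both are Young measures in the sense of Definition 4.7.3, but only the first one is concentrated on the graph of u.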

We would like to have conditions that allow us to infer that a Young measure is associated with a measurable function. The next proposition provides a criterion to identify such Young measures.

Proposition 4.7.8. If λ ∈ Y(Ω × X) and u ∈ L^0(Ω, X), then λ = λ^u if and only if λ((Gr u)^c) = 0, where Gr u = {(w, x) ∈ Ω × X : u(w) = x} denotes the graph of u and (Gr u)^c = (Ω × X) \ Gr u.

Proof. ⇒: Let η_u : Ω → Ω × X be defined by η_u(w) = (w, u(w)). For every E ∈ Σ ⊗ B(X) we obtain

\[ \lambda(E) = \lambda^u(E) = \mu(\eta_u^{-1}(E)). \tag{4.7.2} \]

Note that Gr u ∈ Σ ⊗ B(X). Hence, E = (Ω × X) \ Gr u ∈ Σ ⊗ B(X) and η_u^{−1}(E) = ∅. Therefore, due to (4.7.2), λ((Ω × X) \ Gr u) = 0.
⇐: Let E ∈ Σ ⊗ B(X) and A = η_u^{−1}(E) ∈ Σ. Then

\[ \lambda(E) = \lambda(E \setminus (\mathrm{Gr}\,u)^c) \le \lambda(A \times X) = \mu(A). \tag{4.7.3} \]

Moreover we have

\[ \mu(A) = \lambda(A \times X) = \lambda((A \times X) \setminus (\mathrm{Gr}\,u)^c) \le \lambda(E), \tag{4.7.4} \]

since (A × X) \ (Gr u)^c ⊆ E. From (4.7.3) and (4.7.4) we infer that

\[ \lambda(E) = \mu(A) = \mu(\eta_u^{-1}(E)) = \lambda^u(E). \]

Hence, λ = λ^u.

Definition 4.7.9.
(a) An integrand is a Σ ⊗ B(X)-measurable function φ : Ω × X → ℝ^∗ = ℝ ∪ {±∞}.
(b) We say that the integrand φ is L^1-bounded if there exists k ∈ L^1(Ω) such that |φ(w, x)| ≤ k(w) μ-a.e. on Ω for all x ∈ X.
(c) We say that a function φ : Ω × X → ℝ is Carathéodory if w → φ(w, x) is Σ-measurable for all x ∈ X and x → φ(w, x) is continuous for μ-a.e. w ∈ Ω; see also Definition 2.2.30. A Carathéodory function is an integrand; see Proposition 2.2.31. By Car(Ω × X) (resp. Car_b(Ω × X)), we denote the set of all Carathéodory (resp. L^1-bounded Carathéodory) integrands.
(d) We say that an integrand φ is normal if it takes values in ℝ̄ = ℝ ∪ {+∞} and φ(w, ⋅) is lower semicontinuous for μ-a.e. w ∈ Ω. By Nor(Ω × X) we denote the set of all normal integrands on Ω × X.

Let d be the metric generating the topology of X. Using d and reasoning as in the proof of Proposition 1.7.6, we obtain the following approximation result.

Proposition 4.7.10. If φ ∈ Nor(Ω × X), then there exists a sequence {φ_n}_{n≥1} ⊆ Car(Ω × X) such that φ_n(w, ⋅) is d-Lipschitz for every n ∈ ℕ and for μ-a.a. w ∈ Ω and φ_n ↗ φ on Ω × X. Moreover, if φ is L^1-bounded, then so are the φ_n's.

Using L^1-bounded Carathéodory integrands we can define a counterpart of the narrow topology (see Definition 4.6.11(a)) for the space of Young measures Y(Ω × X).

Definition 4.7.11. The Young narrow topology on Y(Ω × X) is the weakest topology on Y(Ω × X) for which the maps

\[ \lambda \mapsto I_\varphi(\lambda) = \int_{\Omega \times X} \varphi(w, x)\,d\lambda = \int_\Omega \Big[ \int_X \varphi(w, x)\,\hat{\lambda}(w)(dx) \Big]\,d\mu \]

with λ̂ the disintegration of λ (see Theorem 4.7.1), and φ ∈ Car_b(Ω × X), are all continuous. We denote by τ_n^Y(Ω × X) or simply τ_n^Y this topology on Y(Ω × X).

Remark 4.7.12. From the definition above it is clear that

\[ \lambda_n \xrightarrow{\tau_n^Y} \lambda \quad \text{if and only if} \quad \lambda_n(A \times \cdot) \xrightarrow{\tau_n(X)} \lambda(A \times \cdot) \ \text{ for all } A \in \Sigma. \tag{4.7.5} \]

In fact, using the Monotone Class Theorem, we can show that we can replace Σ in (4.7.5) with an algebra a such that Σ = σ(a).


Using Definition 4.7.11, (4.7.5), and Proposition 4.6.18 together with Theorem 4.6.21, we easily reach the following result.

Proposition 4.7.13. If {λ_α}_{α∈I} ⊆ Y(Ω × X) is a net and λ ∈ Y(Ω × X), then the following statements are equivalent:
(a) λ_α → λ in τ_n^Y;
(b) for every φ = χ_A ⊗ f with A ∈ Σ and a lower semicontinuous and bounded from below function f : X → ℝ̄, it holds that I_φ(λ) ≤ lim inf_α I_φ(λ_α);
(c) for every φ = χ_A ⊗ f with A ∈ Σ and for every f ∈ U_b^d(X), the space of bounded, d-uniformly continuous functions on X, it holds that I_φ(λ_α) → I_φ(λ);
(d) λ(A × U) ≤ lim inf_α λ_α(A × U) for all A ∈ Σ and for all open sets U ⊆ X;
(e) lim sup_α λ_α(A × C) ≤ λ(A × C) for all A ∈ Σ and for all closed sets C ⊆ X.

In the next proposition we establish the metrizability of Y(Ω × X).

Proposition 4.7.14. If Σ is countably generated, then (Y(Ω × X), τ_n^Y) is metrizable.

Proof. Let a = {A_n}_{n≥1} be a countable algebra such that Σ = σ(a). Also, as before, d is a metric generating the topology on X. For λ, λ^∗ ∈ Y(Ω × X) we have λ(A_n × ⋅), λ^∗(A_n × ⋅) ∈ P(X) for all n ∈ ℕ and by Theorem 4.6.25, P(X) equipped with the τ_n(X)-topology is Polish. We know that the Dudley metric d_D is compatible and

\[ d_D(\lambda(A_n \times \cdot), \lambda^*(A_n \times \cdot)) = \sup\big[\, |I_\varphi(\lambda) - I_\varphi(\lambda^*)| : \varphi = \chi_{A_n} \otimes f,\ f \in \mathrm{Lip}_b(X),\ \|f\|_{\mathrm{Lip}_b} \le 1 \,\big] \le 2\mu(A_n) \le 2 \]

for all n ∈ ℕ. So, we can define e : Y(Ω × X) × Y(Ω × X) → ℝ_+ by

\[ e(\vartheta, \vartheta') = \sum_{n \ge 1} \frac{1}{2^n}\, d_D(\vartheta(A_n \times \cdot), \vartheta'(A_n \times \cdot)) \quad \text{for all } \vartheta, \vartheta' \in Y(\Omega \times X). \]

It is easy to check that this is a metric on Y(Ω × X). If {λ_α}_{α∈I} ⊆ Y(Ω × X) is a net and λ ∈ Y(Ω × X), then according to Remark 4.7.12, we obtain

\[ \lambda_\alpha \xrightarrow{\tau_n^Y} \lambda \quad \text{if and only if} \quad \lambda_\alpha(A_n \times \cdot) \xrightarrow{\tau_n(X)} \lambda(A_n \times \cdot) \ \text{ for all } n \in \mathbb{N}. \]

Hence,

\[ \lambda_\alpha \xrightarrow{\tau_n^Y} \lambda \quad \text{if and only if} \quad d_D(\lambda_\alpha(A_n \times \cdot), \lambda(A_n \times \cdot)) \to 0 \ \text{ for all } n \in \mathbb{N}, \]

which, since each term is bounded by 2, holds if and only if e(λ_α, λ) → 0. So, e generates τ_n^Y and we conclude that (Y(Ω × X), τ_n^Y) is metrizable.

Every ϑ ∈ P(X) generates a Young measure

\[ \lambda^\vartheta = \mu \otimes \vartheta, \ \text{ whose disintegration is } \hat{\lambda}^\vartheta(w) = \vartheta \text{ for all } w \in \Omega. \tag{4.7.6} \]

Definition 4.7.15. λ^ϑ ∈ Y(Ω × X) defined by (4.7.6) is called the Young measure corresponding to ϑ ∈ P(X). The map ϑ → λ^ϑ is injective from P(X) into Y(Ω × X) and so we can say that P(X) ⊆ Y(Ω × X).

Proposition 4.7.16.
(a) τ_n^Y(Ω × X)|_{P(X)} = τ_n(X);
(b) P(X) is τ_n^Y-closed in Y(Ω × X).

Proof. (a) We know that if {ϑ_α}_{α∈I} ⊆ P(X) is a net and ϑ ∈ P(X), then

\[ \vartheta_\alpha \xrightarrow{\tau_n(X)} \vartheta \quad \text{if and only if} \quad \xi_f(\vartheta_\alpha) \to \xi_f(\vartheta) \ \text{ for all } f \in C_b(X); \tag{4.7.7} \]

see Definition 4.6.11(a). Then for φ = χ_A ⊗ f with A ∈ Σ and f ∈ C_b(X) we get from (4.7.7) that

\[ I_\varphi(\lambda^{\vartheta_\alpha}) = \mu(A)\,\xi_f(\vartheta_\alpha) \to \mu(A)\,\xi_f(\vartheta) = I_\varphi(\lambda^\vartheta). \tag{4.7.8} \]

Invoking Proposition 4.7.13, from (4.7.8) we conclude that

\[ \vartheta_\alpha \xrightarrow{\tau_n(X)} \vartheta \quad \text{if and only if} \quad \lambda^{\vartheta_\alpha} \xrightarrow{\tau_n^Y} \lambda^\vartheta. \tag{4.7.9} \]

(b) Let {ϑ_α}_{α∈I} be a net in P(X) such that λ^{ϑ_α} → λ ∈ Y(Ω × X) in τ_n^Y. Let ϑ ∈ P(X) be defined by

\[ \vartheta(C) = \int_\Omega \hat{\lambda}(w)(C)\,d\mu \quad \text{for all } C \in B(X). \tag{4.7.10} \]

Let λ^ϑ = μ ⊗ ϑ. Then, by using (4.7.10), we obtain for φ = χ_A ⊗ f with A ∈ Σ and f ∈ C_b(X) that

\[
\begin{aligned}
I_\varphi(\lambda^\vartheta) = \mu(A)\,\xi_f(\vartheta)
&= \mu(A) \int_\Omega \Big[ \int_X f(x)\,\hat{\lambda}(w)(dx) \Big]\,d\mu = \mu(A)\, I_{\chi_\Omega \otimes f}(\lambda) \\
&= \mu(A) \lim_{\alpha} I_{\chi_\Omega \otimes f}(\lambda^{\vartheta_\alpha}) = \mu(A) \lim_{\alpha} \xi_f(\vartheta_\alpha) = \lim_{\alpha} I_\varphi(\lambda^{\vartheta_\alpha}),
\end{aligned}
\]

which shows that λ^{ϑ_α} → λ^ϑ in τ_n^Y; see Proposition 4.7.13. Since λ^{ϑ_α} → λ as well, the uniqueness of the limit gives λ = λ^ϑ ∈ P(X); see also (4.7.9). Therefore P(X) is τ_n^Y-closed in Y(Ω × X).

Proposition 4.7.17. If X, Y are Polish spaces with X homeomorphic to a subset of Y, then (Y(Ω × X), τ_n^Y) is homeomorphic to a subset of (Y(Ω × Y), τ_n^Y).

Proof. Let j : X → Y be the homeomorphism between X and j(X). We have B(X) = j^{−1}(B(Y)). Let e : ca_+(B(X)) → ca_+(B(Y)) be defined by e(ϑ)(C) = ϑ(j^{−1}(C)). This is a homeomorphism between ca_+(B(X)) and e(ca_+(B(X))) ⊆ ca_+(B(Y)) when the spaces are equipped with their narrow topologies. Similarly for e|_{P(X)} with e(P(X)) ⊆ P(Y). We know that B(P(X)) = e^{−1}(B(P(Y))). Note that if λ ∈ Y(Ω × X) and λ̂(w) is its disintegration, then η̂(w) = e ∘ λ̂(w) is the disintegration of some η ∈ Y(Ω × Y) and for all E ∈ Σ ⊗ B(Y) we derive

\[
\begin{aligned}
\eta(E) &= \int_\Omega e(\hat{\lambda}(w))(E_w)\,d\mu = \int_\Omega \hat{\lambda}(w)(j^{-1}(E_w))\,d\mu \\
&= \int_\Omega \hat{\lambda}(w)(k^{-1}(E)_w)\,d\mu = \lambda(k^{-1}(E))
\end{aligned}
\tag{4.7.11}
\]

with k : Ω × X → Ω × Y defined by k(w, x) = (w, j(x)). Hence, the map H : Y(Ω × X) → Y(Ω × Y) defined by H(λ) = η is injective.

Let {λ_α}_{α∈I} ⊆ Y(Ω × X) be a net and λ ∈ Y(Ω × X). We assume that λ_α → λ in τ_n^Y. Then by Proposition 4.7.13, this is equivalent to saying that

\[ \lambda(A \times U) \le \liminf_{\alpha} \lambda_\alpha(A \times U) \quad \text{for all } A \in \Sigma \text{ and for all open } U \subseteq X. \]

Due to (4.7.11) this is equivalent to

\[ \eta(A \times V) = \lambda(A \times j^{-1}(V)) \le \liminf_{\alpha} \lambda_\alpha(A \times j^{-1}(V)) = \liminf_{\alpha} \eta_\alpha(A \times V) \]

with η_α = H(λ_α) and with A ∈ Σ and V ⊆ Y open. This proves that H is bicontinuous and so a homeomorphism.

The next theorem provides a characterization of the τ_n^Y-topology in terms of Carathéodory integrands.

Theorem 4.7.18. If {λ_α}_{α∈I} ⊆ Y(Ω × X) is a net and λ ∈ Y(Ω × X), then λ_α → λ in τ_n^Y if and only if I_φ(λ_α) → I_φ(λ) for all φ ∈ Car_b(Ω × X).

Proof. ⇒: First assume that X is compact. Then from Proposition 4.2.33 we know that L^1(Ω, C(X))^∗ = L^∞(Ω, ca(B(X))_{w∗}) and w∗ = τ_n(X); see Proposition 4.6.13. We have Car_b(Ω × X) = L^1(Ω, C(X)) and the set D = {φ = χ_A ⊗ f : A ∈ Σ, f ∈ C(X)} is dense in Car_b(Ω × X) = L^1(Ω, C(X)). So, the implication follows from Proposition 4.7.13.

Now consider the general noncompact case. Since X is Polish, it is homeomorphic to a subset of the Hilbert cube ℍ = [0, 1]^ℕ. Then by Proposition 4.7.17, Y(Ω × X) is homeomorphic to a subset of Y(Ω × ℍ). Since ℍ is compact, the result follows from the first part of the proof.
⇐: This is immediate from Definition 4.7.11.

Using Proposition 4.7.10 we can state the following alternative characterization of convergence in the Young narrow topology.

Theorem 4.7.19. If {λ_α}_{α∈I} ⊆ Y(Ω × X) is a net and λ ∈ Y(Ω × X), then λ_α → λ in τ_n^Y if and only if I_φ(λ) ≤ lim inf_α I_φ(λ_α) for all bounded from below functions φ ∈ Nor(Ω × X).

We conclude with a compactness result similar to Prokhorov's Theorem; see Theorem 4.6.31.

Definition 4.7.20. A subset E ⊆ Y(Ω × X) is said to be Y-tight if for every ε > 0 there exists a compact set K_ε ⊆ X such that λ(Ω × (X \ K_ε)) < ε for all λ ∈ E.

Remark 4.7.21. In fact the definition above is equivalent to each of the following conditions:
(a) There exists f : X → ℝ̄_+ = ℝ_+ ∪ {+∞} such that for every η ≥ 0 the set f_η = {x ∈ X : f(x) ≤ η} is compact and, for φ = χ_Ω ⊗ f, it holds that sup[I_φ(λ) : λ ∈ E] < ∞.
(b) For every ε > 0 there exists a tight set C_ε ⊆ P(X) (see Definition 4.6.26) such that for every λ ∈ E, with λ̂(w) being its disintegration, it holds that

{w ∈ Ω : λ̂(w) ∈ P(X) \ C_ε} ∈ Σ and μ({w ∈ Ω : λ̂(w) ∈ P(X) \ C_ε}) ≤ ε.

Using the notion of Y-tightness we can state the following Young counterpart of Theorem 4.6.31. The result can be found in Valadier [293].

Theorem 4.7.22. A set E ⊆ Y(Ω × X) is relatively sequentially τ_n^Y-compact if and only if E is Y-tight.

Remark 4.7.23. If C ⊆ L^1(Ω, ℝ^N) is bounded and E = {λ^u : u ∈ C}, then E ⊆ Y(Ω × ℝ^N) is Y-tight. So, if {u_n}_{n≥1} ⊆ C, then there exist a subsequence {u_{n_k}}_{k≥1} of {u_n}_{n≥1} and λ ∈ Y(Ω × ℝ^N) such that λ^{u_{n_k}} → λ in τ_n^Y. Moreover, if {u_n}_{n≥1} ⊆ C is uniformly integrable, then there exist a subsequence {u_{n_k}}_{k≥1} of {u_n}_{n≥1} and u ∈ L^1(Ω, ℝ^N), λ ∈ Y(Ω × ℝ^N) such that

\[ \lambda^{u_{n_k}} \xrightarrow{\tau_n^Y} \lambda \quad \text{and} \quad u_{n_k} \xrightarrow{w} u \ \text{ in } L^1(\Omega, \mathbb{R}^N) \quad \text{with} \quad u(w) = \int_{\mathbb{R}^N} x\,\hat{\lambda}(w)(dx), \]

with λ̂(w) being the disintegration of the Young measure λ. The limit function u is known as the barycenter of λ̂(w). Moreover we mention that if μ is nonatomic, then D = {λ^u : u ∈ L^0(Ω, X)} is dense in (Y(Ω × X), τ_n^Y). Finally it is clear from Definitions 4.6.11(a), 4.7.6, and 4.7.11 that if C ⊆ P(X) and E = {λ^ϑ : ϑ ∈ C} ⊆ Y(Ω × X), then E is Y-tight if and only if C is tight.

We conclude by providing a list of alternative equivalent definitions of Y-tightness; see Definition 4.7.20. Recall that f : X → [0, +∞] is inf-compact if for every η ≥ 0, f^{−1}([0, η]) ⊆ X is compact. Evidently f is lower semicontinuous. An integrand f : Ω × X → [0, +∞] is said to be inf-compact if for every w ∈ Ω, f(w, ⋅) is inf-compact.

Proposition 4.7.24. If X is Polish and E ⊆ Y(Ω × X), then the following statements are equivalent:
(a) E is Y-tight.
(b) There exists an inf-compact function f : X → [0, +∞] such that

\[ \sup_{\lambda \in E} \int_{\Omega \times X} f(x)\,d\lambda(w, x) < +\infty. \]

(c) There exists an inf-compact integrand ψ : Ω × X → [0, +∞] such that

\[ \sup_{\lambda \in E} \int_{\Omega \times X} \psi(w, x)\,d\lambda(w, x) < +\infty. \]

(d) For every ε > 0 there is a measurable multifunction F : Ω → 2^X \ {∅} with compact values such that

\[ \sup_{\lambda \in E} \int_\Omega \hat{\lambda}(w)(X \setminus F(w))\,d\mu < \varepsilon \]

with λ̂(w) being the disintegration of λ.
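The compactness statement of Remark 4.7.23 can be illustrated by a standard oscillating sequence. On Ω = [0, 1] with the Lebesgue measure, let u_n be the square wave equal to +1 on the first half of each interval of length 1/n and to −1 on the second half. This sequence is bounded and uniformly integrable, and its Young measures are expected to converge to λ = μ ⊗ ϑ with ϑ = ½δ_{−1} + ½δ_{1}, whose barycenter is the weak limit u ≡ 0. The Python sketch below checks this numerically for test integrands of the form φ(w, x) = g(w)f(x); the choice of sequence, the quadrature, and the helper names `u` and `I` are our own assumptions and are not taken from the text.

```python
import math

def u(n, w):
    """Square wave: +1 on the first half of each period 1/n, -1 on the second."""
    return 1.0 if math.floor(2 * n * w) % 2 == 0 else -1.0

def I(n, g, f, cells_per_half=200):
    """Midpoint rule for the integral of g(w) f(u_n(w)) over [0, 1].

    The grid is aligned with the jumps of u_n, so no cell straddles a jump.
    """
    N = 2 * n * cells_per_half
    total = 0.0
    for i in range(N):
        w = (i + 0.5) / N
        total += g(w) * f(u(n, w))
    return total / N

g = lambda w: w
# Limit predicted by lambda = mu (x) theta: (integral of g) * (mean of f under theta).
limit_exp = 0.5 * 0.5 * (math.exp(1.0) + math.exp(-1.0))

print(I(50, g, math.exp), limit_exp)      # Young narrow convergence, f = exp
print(I(50, g, lambda x: x))              # weak limit (barycenter) is 0
print(I(50, g, lambda x: x * x))          # stays at 1/2: no strong convergence to 0
```

The last line shows why the Young measure carries more information than the weak limit: the integrals of g(w)u_n(w)² do not tend to 0, although u_n → 0 weakly, and the limit value ½ is exactly what the measure ϑ predicts.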

4.8 Remarks

(4.1) As we already mentioned in Section 2.3, the completeness of L^p[0, 1] with p ≠ 2 was proven by Riesz [244, 245], while the completeness of L^2[0, 1] was established by Fischer [111] and Riesz [243]. Theorem 4.1.3 for the Hilbert space case, that is, p = 2, was established simultaneously by Fréchet [118] and Riesz [242] for Ω = [0, 1], that is, for L^2[0, 1]. For L^p[0, 1] with 1 < p < ∞, the result was proven by Riesz [245]. For a finite measure space (Ω, Σ, μ) the result was proven by Dunford [92] and for an arbitrary measure space by McShane [211]. Theorem 4.1.5 is due to Schwartz [267]. The uniform convexity of the L^p-spaces with 1 < p < ∞ stated in Proposition 4.1.6 is usually proven using the Clarkson inequalities; see Clarkson [68] and Hewitt–Stromberg [145, pp. 225, 227]. Theorem 4.1.18 is due to Dunford–Pettis [93]. The notion of uniform integrability is very important, with applications to different parts of mathematical analysis. The following result provides a sufficient condition for uniform integrability of a sequence in L^1(Ω).

Proposition 4.8.1. If (Ω, Σ, μ) is a finite measure space and {f_n}_{n≥1} ⊆ L^1(Ω) is bounded such that

\[ \lim_{n \to \infty} \int_A f_n\,d\mu = 0 \quad \text{for all } A \in \Sigma, \]

then {f_n}_{n≥1} ⊆ L^1(Ω) is uniformly integrable.

In fact the proof of the result above leads to the following interesting proposition.

Proposition 4.8.2. If (Ω, Σ, μ) is a finite measure space and C ⊆ L^1(Ω) is bounded but not uniformly integrable, then there exist sequences {f_n}_{n≥1} ⊆ C and {A_n}_{n≥1} ⊆ Σ mutually disjoint and ε_0 > 0 such that

\[ \Big| \int_{A_n} f_n\,d\mu \Big| \ge \varepsilon_0 \quad \text{for all } n \in \mathbb{N}. \]

In the problems of Chapter 2 the reader can find alternative equivalent definitions of the notion of uniform integrability. Proposition 4.1.21 and Lemma 4.1.22 are due to Brézis–Lieb [50]. The notion of biting convergence (see Definition 4.1.25) is due to Chacon, and Theorem 4.1.24 can be found in Brooks–Chacon [53]. Mollification techniques (see Definition 4.1.27) are standard, especially in the theory of Sobolev spaces; see Adams [1] and Brézis [48]. More detailed accounts on the Banach spaces of sequences can be found in Dunford-Schwartz [94] and Lindenstrauss-Tzafriri [201].

(4.2) Integration theories for vector-valued functions appeared in the 1930s as a tool in the study of the differentiation properties of vector-valued functions. A second revival of the subject occurred in the 1970s with the study of the geometry of Banach spaces. Nowadays vector-valued integration and the associated Lebesgue-Bochner spaces constitute a mature subject in mathematical analysis with many applications such as in the theory of infinite dimensional dynamical systems and in the theory of infinite dimensional stochastic processes. The Pettis Measurability Theorem (see Theorem 4.2.4) can be found in Pettis [235]. The Bochner integral (see Definition 4.2.8) was first introduced by Bochner [34]. Various parts of the theory of Bochner integration can be found in the books of Denkowski-Migórski-Papageorgiou [77], Diestel-Uhl [80], Gasiński-Papageorgiou [124], Hille-Phillips [149], and Schwabik-Ye [266]. Two basic references for the theory of vector-valued measures (see Definition 4.2.23) are the books of Diestel-Uhl [80] and Dinculeanu [85]. For the notion of RNP (see Definition 4.2.23(d)) we refer to Diestel-Uhl [80]. Theorem 4.2.26 is essentially due to Bochner-Taylor [35] although not formulated in terms of the RNP. The book of A. and C. Ionescu Tulcea [162] is a very good reference for the notion of linear lifting; see Theorem 4.2.32. Theorem 4.2.37 is due to Dieudonné [82] and Dinculeanu-Foiaş [86] and can be found in the book of A. and C. Ionescu Tulcea [162, p. 95]. Another structural result for the Lebesgue-Bochner space L^1(Ω, X) is the following one due to Talagrand [283]. Recall that a Banach space X is weakly sequentially complete if every weakly Cauchy sequence in X converges weakly.

Proposition 4.8.3. If (Ω, Σ, μ) is a finite measure space and X is a Banach space, which is weakly sequentially complete, then L^1(Ω, X) is weakly sequentially complete.

Evolution triples (see Definition 4.2.39) are a basic tool in the study of evolution equations; see Hu-Papageorgiou [158], Lions [202], Roubíček [257], and Zeidler [314]. Lemma 4.2.48 is due to Ehrling [99] and in various forms it is used extensively in the theory of evolution equations. Theorem 4.2.49 is due to Aubin [16] and Lions [202]. For extensions of this result we refer to Roubíček [257, pp. 208, 211]. The Pettis integral was first introduced by Pettis [235]. Interest in it was revived in the 1970s in order to develop an integration theory for functions that are only weakly measurable or for which ‖f(⋅)‖ is not integrable. Detailed accounts on the theory of the Pettis integral can be found in Musial [227] and Talagrand [282].

(4.3) The notion of a function of bounded variation (see Definition 4.3.22) goes back to Jordan [167] who also proved that such a function is the difference of two increasing functions. For the theory of monotone functions we refer to Leoni [195]

4.8 Remarks | 385

and Natanson [229]. Theorem 4.3.13 is due to Vitali [296]. Covering Theorems like Theorem 4.3.13 are important in geometric measure theory. We can extend the notion of bounded variation to functions of many variables. Definition 4.8.4. Let Ω ⊆ ℝN be an open set. We say that f ∈ L1 (Ω) is of bounded variation in Ω if sup [∫ f div ϑdz : ϑ ∈ C1c (Ω, ℝN ) with |ϑ(x)| ≤ 1 for all z ∈ Ω] < ∞ . [Ω ] By BV(Ω) we denote the space of all functions on Ω that are of bounded variation. A function f ∈ L1loc (Ω) has locally bounded variation in Ω if for each open U with U ⊂⊂ Ω we have f ∈ BV(U). We denote the space of such functions by BVloc (Ω). A Lebesgue measurable set A ⊆ Ω has a finite perimeter (resp. locally finite perimeter if χ A ∈ BV(Ω) (resp. χ A ∈ BVloc (Ω)). The next theorem is the basic structural result for the space BVloc (Ω). Theorem 4.8.5. If f ∈ BVloc (Ω), then there exists a Radon measure μ on Ω and a μ-measurable function ξ : Ω → ℝN such that (a) |ξ(z)| = 1 μ-a.e.; (b) ∫Ω f div ϑdz = − ∫Ω (ϑ, ξ)ℝN dμ for all ϑ ∈ C1c (Ω, ℝN ). Remark 4.8.6. We usually write μ = ‖Df‖. Then f ∈ BV(Ω) if and only if ‖Df‖(Ω) < ∞ and ‖f‖BV = ‖f‖1 + ‖Df‖(Ω) is a norm on BV(Ω) making it a Banach space. For further details on the space BV(Ω) we refer to Evan-Gariepy [105], Leoni [195], and Ziemer [316]. (4.4) The notion of absolutely continuous functions is due to Lebesgue [192] and Vitali [296]. Lebesgue proved the fundamental theorem of the calculus for the Lebesgue integral while Vitali showed that a function is absolutely continuous if and only if it is an indefinite integral of an L1 -function; see Theorem 4.4.21. Extensions to vector valued functions can be found in Diestel-Uhl [80]. (4.5) The material on Sobolev spaces is standard and can be found in many books such as Adams [1], Brézis [48], Evans-Gariepy [105], and Leoni [195]. Let us also mention 1,p a result describing the dual of W0 (Ω); see Adams [1, Theorem 3.10, p. 50]. Theorem 4.8.7. 
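If $\Omega \subseteq \mathbb{R}^N$ is open and $1 \le p < \infty$, then
\[ W_0^{1,p}(\Omega)^* = \Big\{ g \in C_c^\infty(\Omega)^* : g = -\sum_{k=1}^{N} D_k h_k \text{ for some } h = (h_k)_{k=1}^{N} \in L^{p'}(\Omega, \mathbb{R}^N) \Big\} . \]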

We write $W^{-1,p'}(\Omega) = W_0^{1,p}(\Omega)^*$ with $1/p + 1/p' = 1$.

(4.6) A good and complete reference for the space $ca(B(X))$ and its topologies is the book of Bourbaki [44]. Additional information can be found in the books of Aliprantis-Border [6], Bogachev [37], Dellacherie-Meyer [76], Dunford-Schwartz [94], Florescu-Godet-Thobie [112], and Schwartz [268]. The space of finitely additive set functions describes the bidual of $L^1(\Omega)$.

Definition 4.8.8. Let $(\Omega, \Sigma)$ be a measurable space. We define
\[ ba(\Sigma) = \{ \mu : \Sigma \to \mathbb{R} : \mu \text{ is finitely additive} \} . \]
Equipped with the supremum norm $\|\mu\|_\infty = \sup[|\mu(A)| : A \in \Sigma]$, $ba(\Sigma)$ is a Banach space. Another equivalent norm, making $ba(\Sigma)$ a Banach space, is the total variation norm $\|\mu\| = |\mu|(\Omega)$. Let $\lambda : \Sigma \to \overline{\mathbb{R}}_+ = \mathbb{R}_+ \cup \{+\infty\}$ be $\sigma$-finite. We define
\[ ba_\lambda(\Sigma) = \{ \mu \in ba(\Sigma) : \mu \ll \lambda \} . \]

The next result characterizes the bidual of $L^1(\Omega)$, that is, the dual of $L^\infty(\Omega)$, and can be found in Dunford-Schwartz [94, Theorem IV.8.16, p. 296].

Theorem 4.8.9. If $(\Omega, \Sigma, \lambda)$ is a $\sigma$-finite measure space, then $L^1(\Omega)^{**} = L^\infty(\Omega)^* = ba_\lambda(\Sigma)$.

Continuing with the dual of $L^\infty(\Omega)$, we will provide a more detailed description of its elements.

Definition 4.8.10. Let $(\Omega, \Sigma, \lambda)$ be a $\sigma$-finite measure space and $X$ a Banach space.
(a) We say that $\eta \in L^\infty(\Omega, X)^*$ is absolutely continuous with respect to $\lambda$ if there exists $u \in L^1(\Omega, X^*_{w^*})$ such that
\[ \eta(v) = \int_\Omega \langle u(w), v(w) \rangle \, d\lambda \quad \text{for all } v \in L^\infty(\Omega, X) . \]
We say that $u$ is the density of $\eta$ and
\[ \|\eta\|_* = \|u\|_{L^1(\Omega, X^*_{w^*})} = \int_\Omega \|u(w)\|_{X^*} \, d\lambda . \]
Hence, an absolutely continuous element of $L^\infty(\Omega, X)^*$ can be identified with its $\lambda$-density.
(b) We say that $\eta \in L^\infty(\Omega, X)^*$ is singular with respect to $\lambda$ if there exists a decreasing sequence $\{A_n\}_{n \ge 1} \subseteq \Sigma$ such that $\lambda(A_n) \searrow 0$ and $\eta$ is supported by $A_n$ for every $n \in \mathbb{N}$, that is,
\[ v \in L^\infty(\Omega, X) \text{ with } v|_{A_n} = 0 \text{ for some } n \in \mathbb{N} \text{ implies } \eta(v) = 0 . \]

The next result is due to Levin [198].

Theorem 4.8.11. If $(\Omega, \Sigma, \lambda)$ is a $\sigma$-finite measure space and $X$ is a Banach space, then $L^\infty(\Omega, X)^* = L^1(\Omega, X^*_{w^*}) \oplus L_s$ with $L_s$ being the space of $\lambda$-singular functionals, that is, if $\eta \in L^\infty(\Omega, X)^*$, then
\[ \eta(v) = \int_\Omega \langle u(w), v(w) \rangle \, d\lambda + \eta_s(v) \quad \text{for all } v \in L^\infty(\Omega, X) \]
with $u \in L^1(\Omega, X^*_{w^*})$ and $\eta_s \in L_s$. Moreover,
\[ \|\eta\|_* = \|u\|_{L^1(\Omega, X^*_{w^*})} + \|\eta_s\|_* . \]

(4.7) For a complete account of the theory of Young measures we refer to Balder [21, 22], Florescu-Godet-Thobie [112], Pedregal [234], Roubíček [256], and Valadier [292, 293]. Applications to mathematical economics, optimal control, and calculus of variations can be found in Balder [22], Pedregal [234], and Roubíček [256].

Problems

Problem 4.1. Let $(\Omega, \Sigma, \mu)$ be a measure space and for $p \ge 1$ let $L^p(\Omega)_+ = \{f \in L^p(\Omega) : f(w) \ge 0 \ \mu\text{-a.e.}\}$. Show that the map $f \mapsto f^\vartheta$ is a homeomorphism from $L^p(\Omega)_+$ onto $L^{p/\vartheta}(\Omega)_+$ for every $\vartheta > 0$.

Problem 4.2. Let $(\Omega, \Sigma, \mu)$ be a measure space, $1 \le p, q \le \infty$, and $B_1^p = \{f \in L^p(\Omega) : \|f\|_p \le 1\}$. Show that the set $B_1^p \cap L^q(\Omega)$ is w-closed in $L^q(\Omega)$.

Problem 4.3. Let $(\Omega, \Sigma, \mu)$ be a measure space, $1 \le p \le \infty$, $1/p + 1/p' = 1$, and $f : \Omega \to \mathbb{R}$ be $\Sigma$-measurable such that $fg \in L^1(\Omega)$ for all $g \in L^{p'}(\Omega)$. Show that $f \in L^p(\Omega)$.

Problem 4.4. Let $(\Omega, \Sigma, \mu)$ be a finite measure space, $1 < p < \infty$, $\{f_n, f\}_{n \ge 1} \subseteq L^p(\Omega)$, and assume that $f_n(w) \to f(w)$ $\mu$-a.e. and $\|f_n\|_p \to \|f\|_p$ as $n \to \infty$. Show that $f_n \to f$ in $L^p(\Omega)$.

Problem 4.5. Let $(\Omega, \Sigma, \mu)$ be a measure space and let $f : \Omega \to X$ be a nonzero $\Sigma$-measurable function. We set $T_f = \{p \in [1, \infty] : f \in L^p(\Omega)\}$. Show that $T_f$ is an interval.

Problem 4.6. Let $(\Omega, \Sigma, \mu)$ be a measure space, $1 < p < \infty$, $\{f_n\}_{n \ge 1} \subseteq L^p(\Omega)$, and assume that $f_n \xrightarrow{w} f$ in $L^p(\Omega)$ and $\limsup_{n \to \infty} \|f_n\|_p \le \|f\|_p$. Show that $f_n \to f$ in $L^p(\Omega)$.

Problem 4.7. Suppose that $\{f_n, f\}_{n \ge 1} \subseteq L^1[0, 1]$ and assume that $f_n(t) \to f(t)$ a.e. on $[0, 1]$. Is it true that $f_n \to f$ in $L^1[0, 1]$? Justify your answer.

Problem 4.8. Let $(\Omega, \Sigma, \mu)$ be a finite measure space and let $f, h : \Omega \to \mathbb{R}_+$ be two $\Sigma$-measurable functions such that
\[ \mu(\{h > \lambda\}) \le \frac{1}{\lambda} \int_{\{h > \lambda\}} f \, d\mu \quad \text{for all } \lambda > 0 . \]
Show that $\|h\|_p \le \frac{p}{p-1} \|f\|_p$ for all $p \in (1, \infty)$.

Problem 4.9. Let $(\Omega, \Sigma, \mu)$ be a measure space, $1 \le p \le \infty$, and $\{f_n\}_{n \ge 1} \subseteq L^p(\Omega)$ such that $f_n \to f$ in $L^p(\Omega)$. Show that there exists a subsequence $\{f_{n_k}\}_{k \ge 1}$ of $\{f_n\}_{n \ge 1}$ such that for every $\varepsilon > 0$ there exists a set $A_\varepsilon \in \Sigma$ with $\mu(A_\varepsilon) < \varepsilon$ and $f_{n_k} \to f$ uniformly on $\Omega \setminus A_\varepsilon$ and $f_{n_k}(w) \to f(w)$ $\mu$-a.e.

Problem 4.10. Let $f \in L^p(0, +\infty)$, $1 < p < \infty$, and for all $x > 0$ let $F(x) = \frac{1}{x} \int_0^x f(s) \, ds$. Show that $F \in L^p(0, \infty)$ and $\|F\|_p \le \frac{p}{p-1} \|f\|_p$. This inequality is known as "Hardy's inequality."

Problem 4.11. Let $(\Omega, \Sigma, \mu)$ be a measure space, $1 \le p \le \infty$, $\{f_n, f\}_{n \ge 1} \subseteq L^p(\Omega)$, $f_n \xrightarrow{w} f$ in $L^p(\Omega)$, and $f_n(w) \to \hat{f}(w)$ $\mu$-a.e. Show that $f(w) = \hat{f}(w)$ $\mu$-a.e.

Problem 4.12. Let $(\Omega, \Sigma, \mu)$ be a semifinite measure space and $f, h : \Omega \to \overline{\mathbb{R}}_+ = \mathbb{R}_+ \cup \{+\infty\}$ be two $\Sigma$-measurable functions such that
\[ \int_A f \, d\mu \le \int_A h \, d\mu \quad \text{for all } A \in \Sigma \text{ with } \mu(A) < +\infty . \]
Show that $f(w) \le h(w)$ $\mu$-a.e.

Problem 4.13. Let $(\Omega, \Sigma, \mu)$ be a measure space and $f \in L^1(\Omega)$. Show that $\mu(\{f = \pm\infty\}) = 0$.

Problem 4.14. Let $(X, d)$ be a separable metric space, $\mu$ a locally finite Borel measure, that is, for all $x \in X$, there exists $r > 0$ such that $\mu(B_r(x)) < +\infty$ where $B_r(x) = \{u \in X : d(u, x) < r\}$, $V$ a Banach space, and $1 \le p < \infty$.
(a) Show that $C(X, V) \cap L^p(X, V)$ is dense in $L^p(X, V)$.
(b) If $\mu$ is a Radon measure, then show that $C_c(X, V)$ is dense in $L^p(X, V)$.

Problem 4.15. Let $(X, d)$ be a locally compact, separable metric space, $\mu$ a locally finite Borel measure on $X$ (see Problem 4.14), $V$ a Banach space, and $1 \le p < \infty$. Show that $C_c(X, V)$ is dense in $L^p(X, V)$.

Problem 4.16. Let $T = [0, b]$, $H$ be a separable Hilbert space, and $\{f_n\}_{n \ge 1} \subseteq L^2(T, H)$ such that
(a) $f_n \xrightarrow{w} f$ in $L^2(T, H)$;
(b) $\sup_{n \ge 1} |f_n(t)| \le M$ for a.a. $t \in T$;
(c) there exists a countable dense set $D$ of $H$ such that $\{(h, f_n(\cdot))\}_{n \ge 1} \subseteq L^1(T)$ is relatively compact.
Show that there exists a subsequence $\{f_{n_k}\}_{k \ge 1}$ of $\{f_n\}_{n \ge 1}$ such that $f_{n_k}(t) \xrightarrow{w} f(t)$ in $H$ for a.a. $t \in T$.

Problem 4.17. Let $\Omega \subseteq \mathbb{R}^N$ be an open set, $X$ a Banach space, $D \subseteq X$ a dense subset, and $1 \le p < \infty$. Show that $C_c(\Omega) \otimes D$ is dense in $L^p(\Omega, X)$.

Problem 4.18. Let $(\Omega, \Sigma, \mu)$ be a finite measure space with countably generated $\Sigma$, $X$ a separable Banach space, and $1 \le p < \infty$. Show that $L^p(\Omega, X)$ is separable.

Problem 4.19. Let $(\Omega, \Sigma, \mu)$ be a $\sigma$-finite measure space and $X$ a Banach space. Show that simple functions are dense in $L^\infty(\Omega, X)$ if and only if $X$ is finite dimensional.

Problem 4.20. Let $(\Omega, \Sigma, \mu)$ be a measure space, $X$ a Banach space, and $f : \Omega \to X$. Prove the following:


(a) If $f$ is the $\mu$-a.e. limit of a sequence of countably valued functions $\{h_n\}_{n \ge 1}$, then $f$ is essentially separably valued.
(b) If $f$ is as in (a), then there exists a sequence $\{\hat{h}_n\}_{n \ge 1}$ of countably valued functions such that $\hat{h}_n \to f$ uniformly on $\Omega \setminus N$ with $\mu(N) = 0$.
(c) If $f$ is essentially separably valued and $w \mapsto \|f(w) - x\|$ is $\Sigma$-measurable for all $x \in X$, then $f$ is strongly measurable.

Problem 4.21. Let $(\Omega, \Sigma, \mu)$ be a finite measure space, $\Sigma_0 \subseteq \Sigma$ a sub-$\sigma$-algebra, and $X$ a Banach space. Show that there exists a unique operator $E^{\Sigma_0} \in \mathcal{L}(L^1(\Omega, \Sigma, X), L^1(\Omega, \Sigma_0, X))$ such that
\[ \int_A f \, d\mu = \int_A E^{\Sigma_0} f \, d\mu \quad \text{for all } f \in L^1(\Omega, \Sigma, X) \text{ and all } A \in \Sigma_0 . \]
$E^{\Sigma_0} f$ is the conditional expectation of $f$ with respect to $\Sigma_0$.

Problem 4.22. Show that $L^1(0, 1)$ is not the dual space of any normed space $V$.

Problem 4.23. Let $(\Omega, \Sigma, \mu)$ and $(X, L, \lambda)$ be two measure spaces, $A \in \mathcal{L}(L^1(\Omega), L^1(X))$, and $D \subseteq L^1(\Omega)$ be a uniformly integrable set. Show that $A(D) \subseteq L^1(X)$ is uniformly integrable.

Problem 4.24. Let $(X, \|\cdot\|)$ and $(H, |\cdot|)$ be two Hilbert spaces with $X \hookrightarrow H$ compactly and densely and let $\{f_n\}_{n \ge 1} \subseteq L^2(T, X)$ with $T = [0, b]$. Show that the following two statements are equivalent:
(a) $f_n \to f$ in $L^2(T, H)$;
(b) $f_n(t) \xrightarrow{w} f(t)$ a.e. in $H$ and $\lim_{\lambda(A) \to 0} \sup_{n \ge 1} \int_A |f_n(t)|^2 \, dt = 0$.

Problem 4.25. Show that the composition of two functions of bounded variation need not be of bounded variation. Similarly, show this for absolutely continuous functions.

Problem 4.26. Is the uniform limit of absolutely continuous functions an absolutely continuous function? Justify your answer.

Problem 4.27. Find a condition that guarantees that the pointwise limit of a sequence of absolutely continuous functions is an absolutely continuous function. Hint: Recall the Arzela-Ascoli Theorem.

Problem 4.28. Suppose that $f : [a, b] \to \mathbb{R}$ is continuous and of bounded variation. Assume that for every $c \in (a, b)$, $f \in AC([a, c])$. Show that $f \in AC([a, b])$.

Problem 4.29. Suppose $f \in AC([a, b])$ and let $p \ge 1$. Show that $|f|^p \in AC([a, b])$.

Problem 4.30. Let $A \subseteq [0, 1]$ be a measurable set such that $\lambda(A \cap [a, b]) \ge \vartheta(b - a)$ for some $\vartheta > 0$ and for all $0 \le a \le b \le 1$, where $\lambda$ stands for the Lebesgue measure on $\mathbb{R}$. Show that $\lambda(A) = 1$.

Problem 4.31. Let $A \subseteq \mathbb{R}$ be not necessarily Lebesgue measurable. Show that
\[ \lim_{\delta \to 0} \frac{\lambda^*(A \cap [t - \delta, t + \delta])}{2\delta} = 1 \quad \text{for a.a. } t \in A , \]
where $\lambda^*$ denotes the Lebesgue outer measure.

Problem 4.32. Let $f : [a, b] \to \mathbb{R}$ be continuous and $\eta < \operatorname{var}_{[a,b]} f \in [0, +\infty]$. Show that there exists $\delta > 0$ such that for every partition $a = x_0 < x_1 < \ldots < x_n = b$ with $\max\{x_{k+1} - x_k : k = 0, 1, \ldots, n - 1\} < \delta$ it holds that
\[ \sum_{k=0}^{n-1} |f(x_{k+1}) - f(x_k)| > \eta . \]

Problem 4.33. Show that $BV([a, b])$ is compactly embedded into $L^p([a, b])$.

Problem 4.34. Let $f \in BV([a, b])$ and let $\bar{f} = \frac{1}{b - a} \int_a^b f \, dx$. Show that
\[ \int_a^b |f(x) - \bar{f}| \, dx \le \frac{b - a}{2} \int_a^b |f'| \, dx \]
and that this inequality can be strict.

Problem 4.35. Let $f \in C^1(a, b)$. Show that $f \in BV([a, b])$ if and only if $f' \in L^1(a, b)$, and in that case $\operatorname{var}_{[a,b]} f = \|f'\|_1$.

Problem 4.36. Find a bounded function $f$ such that $f \in BV_{loc}(\mathbb{R})$ but $f \notin BV(\mathbb{R})$.

Problem 4.37. Let $f(x) = x^2 \sin(\pi/x)$ if $x \in (0, 1]$, $f(0) = 0$, and $\hat{f}(x) = x^2 \sin(\pi/x^2)$ if $x \in (0, 1]$, $\hat{f}(0) = 0$. Show that $f$ is absolutely continuous, but $\hat{f}$ is not.

Problem 4.38. Let $X, Y, V$ be Banach spaces with $X$ reflexive, $X \hookrightarrow V$ continuously, and $K \in \mathcal{L}_c(X, Y)$. Show that for every $\varepsilon > 0$ there exists $c_\varepsilon > 0$ such that
\[ \|K(x)\|_Y \le \varepsilon \|x\|_X + c_\varepsilon \|x\|_V \quad \text{for all } x \in X . \]

Problem 4.39. Let $\Omega \subseteq \mathbb{R}^N$ be an open set, $1 \le p \le \infty$, and $u \in W_0^{1,p}(\Omega)$ with $u \ge 0$. Show that there exists a sequence $\{u_n\}_{n \ge 1} \subseteq C_c^\infty(\Omega)$ with $u_n \ge 0$ for all $n \in \mathbb{N}$ such that $u_n \to u$ in $W_0^{1,p}(\Omega)$.

Problem 4.40. Let $\Omega \subseteq \mathbb{R}^N$ be an open set, $1 \le p < \infty$, $u \in W^{1,p}(\Omega)$, and suppose that $u$ vanishes outside a compact set $K \subseteq \Omega$. Show that $u \in W_0^{1,p}(\Omega)$.

Problem 4.41. Let $\Omega \subseteq \mathbb{R}^N$ be a bounded open set, $1 \le p < \infty$, and $u \in W^{1,p}(\Omega)$ be such that $\lim_{z \to x} u(z) = 0$ for all $x \in \partial\Omega$. Show that $u \in W_0^{1,p}(\Omega)$.

Problem 4.42. Let $\{u_n\}_{n \ge 1} \subseteq W^{1,1}(0, 1)$ be a sequence such that $u_n(t) \to u(t)$ for a.a. $t \in (0, 1)$ and there exists $h \in L^1(0, 1)$ such that $|u_n'(t)| \le h(t)$ for a.a. $t \in (0, 1)$ and for all $n \in \mathbb{N}$. Show that $u_n \to u$ uniformly on $[0, 1]$.


Problem 4.43. Let $u \in W^{1,p}(0, 1)$ with $1 \le p < \infty$ and let $Z = \{t \in (0, 1) : u(t) = 0\}$. Show that $u'(t) = 0$ for a.a. $t \in Z$.

Problem 4.44. Let $\Omega \subseteq \mathbb{R}^N$ be a bounded open set with Lipschitz boundary, $p \in (1, N)$, and $r \in (1, ((N - 1)p)/(N - p))$. Show that for every $\varepsilon > 0$ there exists $c_\varepsilon > 0$ such that
\[ \|\gamma_0(u)\|_{L^r(\partial\Omega)} \le \varepsilon \|Du\|_{L^p(\Omega)} + c_\varepsilon \|u\|_{L^p(\Omega)} \quad \text{for all } u \in W^{1,p}(\Omega) . \]

Problem 4.45. Let $\Omega \subseteq \mathbb{R}^N$ be a bounded open set with Lipschitz boundary $\partial\Omega$. Consider $u \in C^1(\Omega) \cap W^{1,p}(\Omega)$ with $1 < p < \infty$, and assume that it has finitely many nodal domains, where a nodal domain of $u$ is a connected component of $\Omega \setminus Z(u)$ with $Z(u) = \{z \in \Omega : u(z) = 0\}$. Show that for any nodal domain $\Omega_0$, $u_0 = \chi_{\Omega_0} u \in W^{1,p}(\Omega)$ and
\[ u_0|_{\partial\Omega}(z) = \begin{cases} u(z) & \text{if } z \in \partial\Omega \cap \partial\Omega_0 , \\ 0 & \text{if } z \in \partial\Omega \setminus \partial\Omega_0 . \end{cases} \]

Problem 4.46. Let $\Omega \subseteq \mathbb{R}^N$ be a bounded open set, $1 \le p < \infty$, and $h^* \in L^{p'}(\Omega) \subseteq W^{-1,p'}(\Omega)$. Show that
\[ \langle h^*, u \rangle = \int_\Omega h^* u \, dz \quad \text{for all } u \in W_0^{1,p}(\Omega) . \]

Problem 4.47. Let $X$ be a Hausdorff topological space. Show that $ca_r(B(X))$ is a closed subspace of $ca(B(X))$. Hence it is itself a Banach space.

Problem 4.48. Let $X$ be a locally compact topological space, $\{\mu_\alpha\}_{\alpha \in I} \subseteq ca_R^+(B(X))$ be a net, and $\mu \in ca_R^+(B(X))$. Show that
\[ \mu_\alpha \xrightarrow{\tau_n(X)} \mu \quad \text{if and only if} \quad \mu_\alpha \xrightarrow{\tau_v(X)} \mu \text{ and } \mu_\alpha(X) \to \mu(X) . \]

Problem 4.49. Let $(X, \Sigma)$ be a measurable space. Show that the Banach space $ca(\Sigma)$ is weakly complete.

Problem 4.50. Let $X$ be a separable metric space and $D \subseteq X$ a countable dense subset. Show that the set of all probability measures supported by finite subsets of $D$ is dense in $(P(X), \tau_n(X))$.

Problem 4.51. Let $X, Y$ be separable metric spaces, $h_n, h : X \to Y$ with $n \in \mathbb{N}$ be Borel maps with $h$ continuous, $h_n \to h$ uniformly on compact subsets of $X$, and $\{\mu_n\}_{n \ge 1} \subseteq P(X)$ be tight with $\mu_n \xrightarrow{\tau_n(X)} \mu \in P(X)$. Show that $\mu_n \circ h_n^{-1} \xrightarrow{\tau_n(Y)} \mu \circ h^{-1}$.

Problem 4.52. Let $(X, \Sigma)$ be a measurable space and $\{\mu_n, \mu\}_{n \ge 1} \subseteq ca(\Sigma)$. Show that $\mu_n \xrightarrow{w} \mu$ in $ca(\Sigma)$ if and only if $\mu_n(A) \to \mu(A)$ for all $A \in \Sigma$.

Problem 4.53. Let $(X, \Sigma)$ be a measurable space, $\{\mu_n\}_{n \ge 1} \subseteq ca_+(\Sigma)$, $\mu_n \xrightarrow{w} \mu$, and $f_n : X \to [0, +\infty)$ be a sequence of $\Sigma$-measurable functions such that $f_n(x) \to f(x)$ for all $x \in X$. Show that $\int_X f \, d\mu \le \liminf_{n \to \infty} \int_X f_n \, d\mu_n$.

Problem 4.54. Let $(\Omega, \Sigma, \mu)$ be a complete probability space, $X$ a Polish space, and $u \in L^0(\Omega, X)$. Let $\lambda_u$ be the Young measure associated with $u$ (see Definition 4.7.6), and define $\lambda_u(\Omega) = \{\delta_{u(w)} : w \in \Omega\} \subseteq P(X)$. Show that $\lambda_u(\Omega)$ is tight if and only if $u(\Omega) \subseteq X$ is relatively compact.

Problem 4.55. Let $X$ be a metric space and $\mu, \lambda \in P(X)$ such that $\int_X f \, d\mu = \int_X f \, d\lambda$ for all $f \in U_b(X)$. Show that $\mu = \lambda$.

Problem 4.56. Let $V, X$ be separable metric spaces, $\hat{\lambda} : V \to P(X)$ be a Borel map, and $f \in C(V \times X)$. Show that $v \mapsto g(v) = \int_X f(v, x) \, \hat{\lambda}(v)(dx)$ is continuous on $V$.

Problem 4.57. Let $\Omega \subseteq \mathbb{R}^N$ be bounded and open, $C \subseteq W^{1,p}(\Omega)$ with $1 < p < \infty$ be closed and convex, $\{u_n\}_{n \ge 1} \subseteq C$, $u \in L^p(\Omega)$, $y \in L^p(\Omega, \mathbb{R}^N)$, and $u_n \xrightarrow{w} u$ in $L^p(\Omega)$, $Du_n \xrightarrow{w} y$ in $L^p(\Omega, \mathbb{R}^N)$. Show that $u \in C$ and $y = Du$.

Problem 4.58. Let $\Omega \subseteq \mathbb{R}^N$ be bounded and open, $\{u_n\}_{n \ge 1} \subseteq W^{1,p}(\Omega)$ with $1 < p < \infty$ be bounded, and $u_n(z) \to u(z)$ a.e. in $\Omega$. Show that $u \in W^{1,p}(\Omega)$ and $u_n \xrightarrow{w} u$ in $W^{1,p}(\Omega)$.

Problem 4.59. Let $\{f_n\}_{n \ge 1} \subseteq L^p(0, b)$ with $1 < p < \infty$ and assume that $f_n \xrightarrow{w} f$ in $L^p(0, b)$, $f_n \to f$ in $W^{-1,p}(0, b)$. Show that $f_n \to f$ in $L^p(0, b)$.

Problem 4.60. Let $X$ be a locally compact separable metric space, $f : X \to \mathbb{R}_+$ a lower semicontinuous function, and $\{\mu_n, \mu\}_{n \ge 1} \subseteq ca_+(B(X))$ such that $\mu_n \xrightarrow{\tau_n(X)} \mu$ and $\mu_n \le \lambda$ for all $n \in \mathbb{N}$ and some $\lambda \in ca_+(B(X))$ such that $\int_X f \, d\lambda < \infty$. Show that $\int_X f \, d\mu_n \to \int_X f \, d\mu$.

5 Convex Functions – Nonsmooth Analysis

Convex sets and convex functions are a basic tool in many parts of mathematical analysis as well as in many applied fields such as optimization, optimal control, game theory, mathematical economics, and others. They exhibit many interesting properties that lead to remarkable results. For convex sets, topological, algebraic, and geometric notions often coincide. Convex functions have properties that lead to many fruitful continuity properties and a coherent differentiability theory. Moreover, local minima turn out to be global ones. Their systematic study started in the early 1960s. This effort led to a rich theory of convex sets and functions known as "Convex Analysis." One of the main features of this theory is duality, which provides significant insight into convex optimization in the context of applications. In addition, convex analysis permits the treatment of nonsmooth functions and provides a calculus for convex functions that goes well beyond the classical one.

In the absence of smoothness and convexity, the situation becomes more complicated. A major step in this direction was made by considering locally Lipschitz functions. An effective calculus as well as powerful optimality conditions were produced for this class of functions, extending the corresponding theory for continuous and convex functions. The body of these results constitutes what is known nowadays as "Nonsmooth Analysis."

https://doi.org/10.1515/9783110532982-005

5.1 Convex Functions – Continuity Properties

In this chapter we often deal with extended real valued functions. So, we set $\mathbb{R}^* = \mathbb{R} \cup \{\pm\infty\}$ and $\overline{\mathbb{R}} = \mathbb{R} \cup \{+\infty\}$. The operations on $\mathbb{R}^*$ and $\overline{\mathbb{R}}$ are defined as usual. In addition, we set $0 \cdot (\pm\infty) = (\pm\infty) \cdot 0 = 0$. However, we do not define $(+\infty) - \infty$.

Definition 5.1.1. Let $X$ be a real vector space and $f : X \to \mathbb{R}^*$.
(a) We say that $f$ is convex if
\[ f(\lambda x + (1 - \lambda)u) \le \lambda f(x) + (1 - \lambda) f(u) \tag{5.1.1} \]
for all $x, u \in X$ and for all $\lambda \in [0, 1]$ provided the right-hand side is defined.
(b) We say that $f$ is strictly convex if
\[ f(\lambda x + (1 - \lambda)u) < \lambda f(x) + (1 - \lambda) f(u) \]
for all $x, u \in X$ with $x \ne u$ and for all $\lambda \in (0, 1)$ provided the right-hand side is defined.
(c) The set $\operatorname{dom} f = \{x \in X : f(x) < +\infty\}$ is called the effective domain of $f$. Moreover, the epigraph of $f$ is the set
\[ \operatorname{epi} f = \{(x, \eta) \in X \times \mathbb{R} : f(x) \le \eta\} . \]
(d) We say that $f$ is proper if $\operatorname{dom} f \ne \emptyset$ and $f(x) > -\infty$ for all $x \in X$.
(e) We say that $f$ is concave (resp. strictly concave) if $-f$ is convex (resp. strictly convex).

Remark 5.1.2. In Definitions 5.1.1 (a) and (b), the right-hand side is not defined only if $f(x) = \pm\infty$ and $f(u) = \mp\infty$.

Proposition 5.1.3. If $X$ is a vector space and $f : X \to \mathbb{R}^*$, then the following hold:
(a) $f$ is convex if and only if $\operatorname{epi} f \subseteq X \times \mathbb{R}$ is convex;
(b) if $f$ is convex, then $\operatorname{dom} f$ is convex.

Proof. (a) $\Rightarrow$: Let $(x, \eta), (u, \vartheta) \in \operatorname{epi} f$ and let $\lambda \in (0, 1)$. Since $f$ is convex, we obtain
\[ f(\lambda x + (1 - \lambda)u) \le \lambda f(x) + (1 - \lambda) f(u) \le \lambda\eta + (1 - \lambda)\vartheta . \]
This shows that $(\lambda x + (1 - \lambda)u, \lambda\eta + (1 - \lambda)\vartheta) \in \operatorname{epi} f$. But $(\lambda x + (1 - \lambda)u, \lambda\eta + (1 - \lambda)\vartheta) = \lambda(x, \eta) + (1 - \lambda)(u, \vartheta)$. Therefore, $\operatorname{epi} f$ is convex.

$\Leftarrow$: Note that $\operatorname{dom} f = p_X(\operatorname{epi} f)$ where $p_X$ denotes the projection operator onto $X$. So, $\operatorname{dom} f \subseteq X$ is convex. It suffices to check (5.1.1) on $\operatorname{dom} f$. So, let $x, u \in \operatorname{dom} f$ and consider $\eta, \vartheta \in \mathbb{R}$ such that $f(x) \le \eta$ and $f(u) \le \vartheta$, that is, $(x, \eta), (u, \vartheta) \in \operatorname{epi} f$. By hypothesis, one has $\lambda(x, \eta) + (1 - \lambda)(u, \vartheta) \in \operatorname{epi} f$ for every $\lambda \in [0, 1]$. Hence, $f(\lambda x + (1 - \lambda)u) \le \lambda\eta + (1 - \lambda)\vartheta$. If both $f(x), f(u) \in \mathbb{R}$, then we can take $\eta = f(x)$, $\vartheta = f(u)$. This gives $f(\lambda x + (1 - \lambda)u) \le \lambda f(x) + (1 - \lambda) f(u)$. If $f(x) = -\infty$ and $f(u) = -\infty$, then we let $\eta \searrow -\infty$ and $\vartheta \searrow -\infty$ and again inequality (5.1.1) is satisfied.

(b) Since $f$ is convex, the epigraph of $f$ is convex; see part (a). Recall that $\operatorname{dom} f = p_X(\operatorname{epi} f)$.

Remark 5.1.4. Note that if $f : D \subseteq X \to \mathbb{R}$, then we introduce $\hat{f} : X \to \overline{\mathbb{R}}$ by setting
\[ \hat{f}(x) = \begin{cases} f(x) & \text{if } x \in D , \\ +\infty & \text{if } x \in X \setminus D , \end{cases} \]
that is, we impose an infinite penalty if we violate the constraint $D$. So by considering $\overline{\mathbb{R}}$-valued functions, we can deal only with functions defined on all of $X$. In this respect, the following definition is useful.

Definition 5.1.5. Let $X$ be a vector space and let $D \subseteq X$. The indicator function $i_D : X \to \overline{\mathbb{R}}$ is defined by
\[ i_D(x) = \begin{cases} 0 & \text{if } x \in D , \\ +\infty & \text{if } x \in X \setminus D . \end{cases} \]
Evidently, $D \subseteq X$ is convex if and only if $i_D$ is a convex function. Moreover, $\operatorname{dom} i_D = D$. Thus the study of convex sets is reduced to the study of convex functions.

The following proposition is an easy consequence of Definition 5.1.1.


Proposition 5.1.6. If $X$ is a vector space, $f, h : X \to \overline{\mathbb{R}}$ are convex functions and $\eta, \vartheta \ge 0$, then $\eta f + \vartheta h : X \to \overline{\mathbb{R}}$ is convex as well.

Proposition 5.1.7. If $X$ is a vector space and $f_\alpha : X \to \overline{\mathbb{R}}$ with $\alpha \in I$ is a family of convex functions, then $f = \sup_{\alpha \in I} f_\alpha$ is convex as well.

Proof. Note that $\operatorname{epi} f = \bigcap_{\alpha \in I} \operatorname{epi} f_\alpha$. Therefore, $\operatorname{epi} f$ is convex and so the result follows from Proposition 5.1.3.

The next proposition shows that it is quite pathological for a convex function to attain the value $-\infty$.

Proposition 5.1.8. If $X$ is a topological vector space, $f : X \to \mathbb{R}^*$ is convex, and there exists $x_0 \in \operatorname{int} \operatorname{dom} f$, then $f$ is proper.

Proof. Arguing by contradiction, suppose that we can find $u \in X$ such that $f(u) = -\infty$. Since $x_0 \in \operatorname{int} \operatorname{dom} f$, for small enough $\lambda \in (0, 1)$, we get $\hat{x} = x_0 + \lambda(x_0 - u) \in \operatorname{dom} f$. Note that $x_0 = \frac{1}{1 + \lambda} \hat{x} + \frac{\lambda}{1 + \lambda} u$. Then the convexity of $f$ implies that
\[ f(x_0) \le \frac{1}{1 + \lambda} f(\hat{x}) + \frac{\lambda}{1 + \lambda} f(u) = -\infty , \]
a contradiction. Therefore, $f(x) > -\infty$ for all $x \in X$, that is, $f$ is proper.

Proposition 5.1.9. If $X$ is a vector space, $f : X \to \mathbb{R}$ is convex and $h : \mathbb{R} \to \mathbb{R}$ is convex and nondecreasing, then $h \circ f : X \to \mathbb{R}$ is convex as well.

Proof. Let $x, u \in X$ and let $\lambda \in [0, 1]$. Since $f$ is convex, that is, $f(\lambda x + (1 - \lambda)u) \le \lambda f(x) + (1 - \lambda) f(u)$, we obtain
\[ h(f(\lambda x + (1 - \lambda)u)) \le h(\lambda f(x) + (1 - \lambda) f(u)) \le \lambda h(f(x)) + (1 - \lambda) h(f(u)) \]
because $h$ is nondecreasing and convex. Therefore $h \circ f$ is convex.

Remark 5.1.10. From the proof above it is clear that if $f$ is strictly convex and $h$ is convex and strictly increasing, then $h \circ f$ is strictly convex.

Corollary 5.1.11. If $X$ is a normed space, then $x \mapsto \|x\|^p$ is convex if and only if $p \ge 1$. Moreover, if $X$ is strictly convex, then $x \mapsto \|x\|^p$ is strictly convex if and only if $p > 1$.

Definition 5.1.12. Let $X$ be a vector space and let $f_k : X \to \mathbb{R}^*$ for $k = 1, \ldots, n$ be proper functions. We define
\[ f(x) = \inf\Big[ \sum_{k=1}^{n} f_k(x_k) : x = \sum_{k=1}^{n} x_k \Big] . \]
Then $f$ is called the infimal convolution of the $f_k$'s and is denoted by
\[ f = \bigoplus_{k=1}^{n} f_k . \]
We say that the infimal convolution is exact at $x$ if there exists a sequence $\{x_k\}_{k=1}^{n} \subseteq X$ such that
\[ x = \sum_{k=1}^{n} x_k \quad \text{and} \quad f(x) = \sum_{k=1}^{n} f_k(x_k) . \]

Remark 5.1.13. If $n = 2$, then $(f_1 \oplus f_2)(x) = \inf[f_1(x - u) + f_2(u) : u \in X]$, which reminds us of the formula for the usual convolution $(f_1 * f_2)(x) = \int_{\mathbb{R}^N} f_1(x - u) f_2(u) \, du$ if $X = \mathbb{R}^N$. Clearly, if the $f_k$'s are convex, then so is $f$, but $f$ may fail to be proper. In order to see this, let $f_1 = i_{C_1}$ and $f_2 = i_{C_2}$ with $C_1, C_2 \subseteq X$ nonempty, disjoint, and convex. Then $f_1 \oplus f_2 \equiv +\infty$. Similarly, if $f_1$ and $f_2$ are linear with $f_1 \ne f_2$, then $f_1 \oplus f_2 = -\infty$. Moreover, from Definition 5.1.12 we get
\[ (f_1 \oplus f_2)(x) = \inf[\eta \in \mathbb{R} : (x, \eta) \in \operatorname{epi} f_1 + \operatorname{epi} f_2] . \]
In addition, it is easily seen that
\[ f_1 \oplus f_2 = f_2 \oplus f_1 \quad \text{and} \quad f_1 \oplus (f_2 \oplus f_3) = (f_1 \oplus f_2) \oplus f_3 . \]

Example 5.1.14. (a) Let $f_1 = f$ and $f_2 = i_{\{x_0\}}$. Then $(f_1 \oplus f_2)(x) = f(x - x_0)$. So, if $x_0 = 0$, then $(f_1 \oplus f_2)(x) = f(x)$.
(b) Let $X$ be a normed space, $C \subseteq X$ be nonempty and convex, and $f_1(x) = \|x\|$, $f_2(x) = i_C(x)$ for all $x \in X$. Then
\[ (f_1 \oplus f_2)(x) = \inf[\|x - u\| + i_C(u) : u \in X] = \inf[\|x - u\| : u \in C] = d(x, C) , \]
where $d(x, C)$ stands for the distance of $x$ from $C$. Since $f_1$ is convex (see Corollary 5.1.11), and $f_2$ is also convex, because $C \subseteq X$ is convex, it follows that $x \mapsto d(x, C)$ is convex; see Remark 5.1.13.

Definition 5.1.15. Let $X$ be a vector space and $q : X \to \mathbb{R}^*$. We say that $q$ is sublinear if the following hold:
(a) $q$ is proper;
(b) $q$ is positively homogeneous, that is, $q(\lambda x) = \lambda q(x)$ for all $\lambda > 0$ and for all $x \in X$;
(c) $q$ is subadditive, that is, $q(x + u) \le q(x) + q(u)$ for all $x, u \in X$.

Remark 5.1.16. From this definition it follows that a sublinear function $q$ is convex. Moreover, $q(0) = 0$. If $X$ is a normed space, then $q(x) = \|x\|$ is sublinear. In fact, a sublinear function can be seen as a generalization of the norm in a vector space. Given a convex absorbing set $A \subseteq X$, let $\rho_A$ be the Minkowski function of $A$; see Definition 3.1.37. Then $\rho_A$ is sublinear. Another important sublinear function is given in the next definition.

Definition 5.1.17. Let $X$ be a normed space and $A \subseteq X$ a nonempty set. The support function of $A$ is the function $\sigma(\cdot; A) : X^* \to \overline{\mathbb{R}}$ defined by
\[ \sigma(x^*; A) = \sup[\langle x^*, u \rangle : u \in A] . \]
Evidently, $\sigma(\cdot; A)$ is sublinear and $\sigma(\cdot; A) = \sigma(\cdot; \operatorname{conv} A)$.

Now we turn our attention to the continuity properties of convex functions.
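The identity of Example 5.1.14(b), namely that the infimal convolution of the norm with the indicator function of $C$ is the distance function, can be checked numerically. The following is only a minimal sketch under stated assumptions: $X = \mathbb{R}$, $C = [1, 2]$, and the infimum over $u$ is approximated by a minimum over a finite grid. The helper names (`inf_convolution`, `indicator`, `dist_to_interval`, `approx_distance`) are hypothetical and chosen for this illustration.

```python
import math

def inf_convolution(f1, f2, x, grid):
    """Approximate (f1 (+) f2)(x) = inf_u [f1(x - u) + f2(u)] by a minimum over a finite grid of u."""
    return min(f1(x - u) + f2(u) for u in grid)

def indicator(lo, hi):
    """Indicator function i_C of the interval C = [lo, hi]: 0 on C, +infinity outside."""
    return lambda u: 0.0 if lo <= u <= hi else math.inf

def dist_to_interval(x, lo, hi):
    """Exact distance from the point x to the interval [lo, hi]."""
    return max(lo - x, 0.0, x - hi)

# Assumption: candidate points u are restricted to a grid on [-5, 5] with spacing 0.01.
grid = [k / 100.0 for k in range(-500, 501)]
i_C = indicator(1.0, 2.0)

def approx_distance(x):
    """Grid approximation of (|.| (+) i_C)(x), which should equal d(x, C)."""
    return inf_convolution(abs, i_C, x, grid)
```

For points $x$ whose nearest point in $C$ lies on the grid, `approx_distance(x)` agrees with `dist_to_interval(x, 1.0, 2.0)` up to rounding; for other points the grid only gives an upper bound on the infimum.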


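Proposition 5.1.18. If $X$ is a locally convex space, $f : X \to \overline{\mathbb{R}}$ is proper and convex, and for $x_0 \in \operatorname{dom} f$ there exists $U \in N(x_0)$ such that $f|_U$ is bounded above, then $f$ is continuous at $x_0$.

Proof. Replacing $f$ with $\hat{f}(y) = f(x_0 + y) - f(x_0)$, $y \in X$, if necessary, we assume, without any loss of generality, that $x_0 = 0$ and $f(0) = 0$. By hypothesis we have $f(x) \le M$ for all $x \in U \in N(0)$. Let $V = U \cap (-U)$. Then $V \in N(0)$, and it is symmetric. Let $\lambda \in (0, 1)$ and $x \in \lambda V$. From the convexity of $f$ it follows that
\[ f(x) \le \lambda f\Big(\frac{1}{\lambda} x\Big) \le \lambda M , \tag{5.1.2} \]
since $x = (1 - \lambda) 0 + \lambda \frac{1}{\lambda} x$ and $f(0) = 0$. Note, since $V$ is symmetric, that
\[ -\frac{1}{\lambda} x \in V \subseteq U \quad \text{and} \quad 0 = \frac{1}{1 + \lambda} x + \frac{\lambda}{1 + \lambda} \Big(-\frac{1}{\lambda} x\Big) . \]
This gives
\[ 0 = f(0) \le \frac{1}{1 + \lambda} f(x) + \frac{\lambda}{1 + \lambda} f\Big(-\frac{1}{\lambda} x\Big) . \]
Hence,
\[ f(x) \ge -\lambda f\Big(-\frac{1}{\lambda} x\Big) \ge -\lambda M . \tag{5.1.3} \]
From (5.1.2) and (5.1.3) it follows that $|f(x)| \le \lambda M$ for all $x \in \lambda V$ and this implies the continuity of $f$ at $x_0 = 0$.

Proposition 5.1.19. If $X$ is a normed space, and $f : X \to \overline{\mathbb{R}}$ is convex and continuous at $x_0 \in \operatorname{int} \operatorname{dom} f$, then $f$ is Lipschitz on some neighborhood of $x_0$.

Proof. Since $f$ is continuous at $x_0$ there exist $M, \delta > 0$ such that
\[ |f(x)| \le M \quad \text{for all } x \in B_{2\delta}(x_0) \subseteq \operatorname{int} \operatorname{dom} f . \tag{5.1.4} \]
Let $x, u \in B_\delta(x_0)$ with $x \ne u$ and $\vartheta = \|u - x\| > 0$. We set
\[ v = u + \frac{\delta}{\vartheta}(u - x) = u + \delta \frac{u - x}{\|u - x\|} = \Big(1 + \frac{\delta}{\vartheta}\Big) u - \frac{\delta}{\vartheta} x . \tag{5.1.5} \]
Then, (5.1.5) implies
\[ \|v - x_0\| = \Big\| u - x_0 + \delta \frac{u - x}{\|u - x\|} \Big\| \le \|u - x_0\| + \delta < 2\delta . \]
Hence $v \in B_{2\delta}(x_0)$. Moreover, from (5.1.5) it follows that $u = \frac{\vartheta}{\vartheta + \delta} v + \frac{\delta}{\vartheta + \delta} x$, which gives, due to the convexity of $f$, that $f(u) \le \frac{\vartheta}{\vartheta + \delta} f(v) + \frac{\delta}{\vartheta + \delta} f(x)$. Since $v \in B_{2\delta}(x_0)$ and because of (5.1.4) we then derive
\[ f(u) - f(x) \le \frac{\vartheta}{\vartheta + \delta} [f(v) - f(x)] \le \frac{\vartheta}{\vartheta + \delta} 2M \le \frac{2M}{\delta} \|u - x\| . \tag{5.1.6} \]
Interchanging the roles of $x$ and $u$ in the argument above, we also get
\[ f(x) - f(u) \le \frac{2M}{\delta} \|x - u\| . \tag{5.1.7} \]
From (5.1.6) and (5.1.7) we conclude that
\[ |f(x) - f(u)| \le \frac{2M}{\delta} \|x - u\| \quad \text{for all } x, u \in B_\delta(x_0) . \]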
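As a sanity check of the Lipschitz constant $2M/\delta$ obtained in the proof of Proposition 5.1.19, one can compare it with observed difference quotients. The sketch below rests on illustrative assumptions that are not part of the text: $X = \mathbb{R}$, $f(x) = x^2$, $x_0 = 0$, $\delta = 1$, so that $|f| \le M = 4$ on $B_{2\delta}(x_0) = (-2, 2)$. The helper `max_difference_quotient` is a hypothetical name.

```python
import itertools

def max_difference_quotient(f, points):
    """Largest |f(x) - f(u)| / |x - u| over all pairs of distinct sample points."""
    return max(abs(f(x) - f(u)) / abs(x - u)
               for x, u in itertools.combinations(points, 2))

def f(x):
    """Illustrative convex function f(x) = x^2 (an assumption of this sketch)."""
    return x * x

x0, delta = 0.0, 1.0
M = 4.0                   # |f| <= M on the ball B_{2*delta}(x0) = (-2, 2)
bound = 2 * M / delta     # Lipschitz constant predicted by (5.1.6) and (5.1.7)

# Sample points inside B_delta(x0) = (-1, 1).
points = [x0 + delta * k / 50.0 for k in range(-49, 50)]
observed = max_difference_quotient(f, points)
```

Here `observed` stays below `bound`. The estimate $2M/\delta$ is not sharp for this particular function, which is expected, since the proof only uses the bound $M$ on the larger ball.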

Proposition 5.1.20. If $X$ is a normed space and $f : X \to \overline{\mathbb{R}}$ is proper and convex, then the following statements are equivalent:
(a) $f$ is Lipschitz on some neighborhood of $x_0$;
(b) $f$ is continuous at $x_0$;
(c) $f$ is bounded above on some neighborhood of $x_0$.

Proof. Clearly, the implications (a) $\Rightarrow$ (b) $\Rightarrow$ (c) are easy to verify. Suppose that (c) holds. Then $f(x) \le M$ for some $M > 0$ and for all $x \in B_{2\delta}(x_0)$. From the proof of Proposition 5.1.19, we obtain that $f|_{B_\delta(x_0)}$ is $2M/\delta$-Lipschitz.

Proposition 5.1.21. If $X$ is a Banach space and $f : X \to \overline{\mathbb{R}}$ is lower semicontinuous, proper, and convex, then the following statements are equivalent:
(a) $f$ is continuous at $x_0$;
(b) $x_0 \in \operatorname{int} \operatorname{dom} f$.

Proof. (a) $\Rightarrow$ (b): This implication is clear.

(b) $\Rightarrow$ (a): Replacing $f$ with $\hat{f}(x) = f(x_0 + x)$ if necessary, we may assume that $x_0 = 0$. Let
\[ C_n = \{x \in X : \max\{f(x), f(-x)\} \le n\} . \]
The lower semicontinuity of $f$ implies that each $C_n$ is closed and clearly $X = \bigcup_{n \ge 1} n C_n$. Then the Baire Category Theorem (see Theorem 1.5.68) implies that $\operatorname{int} n_0 C_{n_0} \ne \emptyset$ for some $n_0 \in \mathbb{N}$. Therefore, $0 \in \operatorname{int} C_{n_0}$. Then, by Proposition 5.1.20, we see that $f$ is continuous at $x_0 = 0$.

Definition 5.1.22. Let $X$ be a normed space and let $f : X \to \mathbb{R}$. We say that $f$ is locally Lipschitz if for every $x \in X$ there exist $U \in N(x)$ and $k_U > 0$ such that
\[ |f(u) - f(v)| \le k_U \|u - v\| \quad \text{for all } u, v \in U . \]

As a consequence of Propositions 5.1.21 and 5.1.20 and of the definition above, we can state the following corollary.

Corollary 5.1.23. If $X$ is a Banach space and $f : X \to \overline{\mathbb{R}}$ is lower semicontinuous and convex, then $f|_{\operatorname{int} \operatorname{dom} f}$ is locally Lipschitz. In particular, a continuous and convex function $f : X \to \mathbb{R}$ is locally Lipschitz.

Proposition 5.1.24. If $X$ is a normed space and $f : X \to \overline{\mathbb{R}}$ is proper and convex, then the following statements are equivalent:


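(a) $f$ is bounded above in a neighborhood of $x_0$;
(b) $f$ is continuous at $x_0 \in X$;
(c) $\operatorname{int} \operatorname{epi} f \ne \emptyset$;
(d) $\operatorname{int} \operatorname{dom} f \ne \emptyset$ and $f|_{\operatorname{int} \operatorname{dom} f}$ is continuous.
Moreover, if one of the statements above holds, then
\[ \operatorname{int} \operatorname{epi} f = \{(x, \eta) \in X \times \mathbb{R} : x \in \operatorname{int} \operatorname{dom} f, \ f(x) < \eta\} . \]

Proof. (a) $\Longleftrightarrow$ (b): This equivalence is Proposition 5.1.20.

(a) $\Rightarrow$ (c): Let $U \in N(x_0)$ be such that $f|_U \le M$. Then $U \subseteq \operatorname{int} \operatorname{dom} f$ and $\{(x, \eta) \in U \times \mathbb{R} : M < \eta\} \subseteq \operatorname{epi} f$. Hence, $\operatorname{int} \operatorname{epi} f \ne \emptyset$.

(c) $\Rightarrow$ (a): Let $(x_0, \eta) \in \operatorname{int} \operatorname{epi} f$. Then there exist $U \in N(x_0)$ and $\varepsilon > 0$ such that $U \times [\eta - \varepsilon, \eta + \varepsilon] \subseteq \operatorname{epi} f$. Hence, $U \times \{\eta\} \subseteq \operatorname{epi} f$ and so $f|_U \le \eta$.

(a) $\Rightarrow$ (d): As before, without any loss of generality, we may assume that $x_0 = 0$. Let $U \in N(x_0)$ be such that $f|_U \le M$ for some $M > 0$. Then $U \subseteq \operatorname{dom} f$ and so $\operatorname{int} \operatorname{dom} f \ne \emptyset$. Note that the set $\operatorname{dom} f$ is convex. So, if $x \in \operatorname{int} \operatorname{dom} f$, there is $\lambda > 1$ such that $v = \lambda x \in \operatorname{dom} f$; see Proposition 3.1.26. We set $V = x + (1 - 1/\lambda) U \in N(x)$. If $y \in V$, then we obtain $y = x + (1 - 1/\lambda) u$ with $u \in U$. From the convexity of $f$ we derive
\[ f(y) = f\Big(\frac{1}{\lambda} v + \Big(1 - \frac{1}{\lambda}\Big) u\Big) \le \frac{1}{\lambda} f(v) + \Big(1 - \frac{1}{\lambda}\Big) f(u) \le \frac{1}{\lambda} f(v) + \Big(1 - \frac{1}{\lambda}\Big) M = \hat{M} . \]
This shows that $f|_V$ is bounded above and so $f|_{\operatorname{int} \operatorname{dom} f}$ is continuous.

(d) $\Rightarrow$ (a): This is clear.

Finally let $D = \{(x, \eta) \in X \times \mathbb{R} : x \in \operatorname{int} \operatorname{dom} f, \ f(x) < \eta\}$. Clearly, $\operatorname{int} \operatorname{epi} f \subseteq D$. Let $x \in \operatorname{int} \operatorname{dom} f$ with $f(x) < \eta$. We choose $\mu \in \mathbb{R}$ such that $f(x) < \mu < \eta$. By hypothesis $f|_{\operatorname{int} \operatorname{dom} f}$ is continuous, so there exists $U \in N(x)$ with $U \subseteq \operatorname{int} \operatorname{dom} f$ and $f|_U < \mu$. Then $U \times (\mu, +\infty) \subseteq \operatorname{int} \operatorname{epi} f$. Therefore, $D \subseteq \operatorname{int} \operatorname{epi} f$, which means that $D = \operatorname{int} \operatorname{epi} f$.

In finite dimensional spaces the situation is simpler.

Proposition 5.1.25. If $X$ is a finite dimensional vector space and $f : X \to \overline{\mathbb{R}}$ is convex, then $f|_{\operatorname{int} \operatorname{dom} f}$ is locally Lipschitz.

Proof. Let $x \in \operatorname{int} \operatorname{dom} f$. We can find $\delta > 0$ and $\{e_n\}_{n=1}^{N+1} \subseteq X$, where $N = \dim X$, such that $B_\delta(x) \subseteq \operatorname{conv} \{e_n\}_{n=1}^{N+1} \subseteq \operatorname{dom} f$; see Theorem 3.1.30. So, if $u \in B_\delta(x)$, there exists $\{\lambda_n\}_{n=1}^{N+1} \subseteq [0, 1]$ such that
\[ \sum_{n=1}^{N+1} \lambda_n = 1 \quad \text{and} \quad u = \sum_{n=1}^{N+1} \lambda_n e_n . \]
The convexity of $f$ implies that
\[ f(u) \le \sum_{n=1}^{N+1} \lambda_n f(e_n) \le \max[f(e_n) : 1 \le n \le N + 1] = M , \]
which shows that $f|_{\operatorname{int} \operatorname{dom} f}$ is locally Lipschitz; see Proposition 5.1.20.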
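The key step in the proof of Proposition 5.1.25, that a convex function on a simplex is bounded above by its maximum over the vertices, can be illustrated by random sampling. This is a sketch under assumptions made only for the illustration: $X = \mathbb{R}^2$ (so $N + 1 = 3$ vertices), the convex function $f(x, y) = \max\{|x|, |y|\} + x^2$, and a fixed triangle. All names in the code are hypothetical.

```python
import random

def f(point):
    """Illustrative convex function on R^2: f(x, y) = max(|x|, |y|) + x^2."""
    x, y = point
    return max(abs(x), abs(y)) + x * x

def convex_combination(vertices, weights):
    """Return sum_n weights[n] * vertices[n], computed coordinate by coordinate."""
    dim = len(vertices[0])
    return tuple(sum(w * v[i] for w, v in zip(weights, vertices)) for i in range(dim))

def random_weights(n, rng):
    """Random weights lambda_1, ..., lambda_n in [0, 1] that sum to 1."""
    raw = [rng.random() + 1e-12 for _ in range(n)]
    total = sum(raw)
    return [r / total for r in raw]

# Vertices e_1, e_2, e_3 of a triangle in R^2 (N = 2, hence N + 1 = 3 points).
vertices = [(2.0, 0.0), (-1.0, 2.0), (-1.0, -2.0)]
M = max(f(e) for e in vertices)          # M = max[f(e_n) : 1 <= n <= N + 1]

rng = random.Random(0)                   # fixed seed so the run is reproducible
sampled_values = [
    f(convex_combination(vertices, random_weights(len(vertices), rng)))
    for _ in range(1000)
]
```

Every entry of `sampled_values` is at most `M` (up to rounding), in line with the inequality $f(u) \le \sum_n \lambda_n f(e_n) \le M$ used in the proof.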

Remark 5.1.26. Comparing the proposition above with Corollary 5.1.23, we see that in the infinite dimensional case, we need the extra condition that f is lower semicontinuous.

Definition 5.1.27. Let X be a normed space.
(a) Let x∗ ∈ X∗ and η ∈ ℝ. A function a : X → ℝ of the form

\[ a(x) = \langle x^*, x\rangle + \eta \quad\text{for all } x\in X \]

is said to be a continuous affine function. We denote the set of such functions by Aff(X).
(b) We define the following sets:

\[ \Gamma(X) = \{f : X\to\mathbb{R}^* : f(x) = \sup[a(x) : a\in\operatorname{Aff}(X),\ a\le f]\}, \qquad \Gamma_0(X) = \{f\in\Gamma(X) : f \text{ is proper}\}. \]

Evidently Γ0(X) ⊆ Γ(X) and both are cones, that is, they are closed under positive scalar multiplication. The next proposition characterizes these cones.

Proposition 5.1.28. If X is a normed space and f : X → ℝ∗, then f ∈ Γ(X) if and only if f is lower semicontinuous and convex. Moreover if f attains the value −∞, then f ≡ −∞.

Proof. ⇒: The pointwise supremum of an empty set of functions is −∞. So, if the set of continuous affine minorants of f is nonempty, the function f does not take the value −∞ and, being the supremum of continuous convex (in fact affine) functions, it is lower semicontinuous and convex; see Propositions 1.7.4(a) and 5.1.7.

⇐: Suppose that f is lower semicontinuous, convex, and f ≢ −∞. If f ≡ +∞, then clearly f is the pointwise supremum of all continuous affine functions. So, we assume that f is proper. Let (x̂, η̂) ∉ epi f. Since epi f is closed and convex, by the Strong Separation Theorem, there exist x∗ ∈ X∗ and α, β ∈ ℝ such that

\[ \langle x^*,\hat{x}\rangle + \alpha\hat{\eta} < \beta < \langle x^*,x\rangle + \alpha\eta \quad\text{for all } (x,\eta)\in\operatorname{epi} f. \tag{5.1.8} \]

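First assume that x̂ ∈ dom f, that is, f(x̂) < +∞. Then we choose x = x̂ and η = f(x̂) in (5.1.8) to obtain 0 < α[f(x̂) − η̂], thus α > 0. Again from (5.1.8), after dividing by α, one has

\[ \hat{\eta} < \frac{\beta}{\alpha} - \frac{1}{\alpha}\langle x^*,\hat{x}\rangle \quad\text{and}\quad \frac{\beta}{\alpha} - \frac{1}{\alpha}\langle x^*,x\rangle < f(x) \quad\text{for all } x\in\operatorname{dom} f, \]

so x → β/α − (1/α)⟨x∗, x⟩ is a continuous affine minorant of f whose value at x̂ exceeds η̂. Next assume that x̂ ∉ dom f. Letting η → +∞ in (5.1.8) shows that α ≥ 0. If α > 0, we argue as before. If α = 0, then the continuous affine function a(x) = β − ⟨x∗, x⟩ satisfies

\[ a(\hat{x}) > 0 \quad\text{and}\quad a(x) < 0 \quad\text{for all } x\in\operatorname{dom} f. \]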

Therefore there exist u∗ ∈ X∗ and ϑ ∈ ℝ such that if â(x) = ϑ − ⟨u∗, x⟩ for all x ∈ X, then â(x) < f(x) for all x ∈ X. For every m > 0 we set $\hat{a}_m(x) = \hat{a}(x) + m\,a(x)$ for all x ∈ X. Hence, $\hat{\eta} \le \hat{a}_m(\hat{x})$ and $\hat{a}_m(x) \le f(x)$ for all x ∈ X, provided m > 0 is large enough. Thus, f ∈ Γ(X).

Remark 5.1.29. So, Γ0(X) is the cone of lower semicontinuous, convex, proper functions and each f ∈ Γ0(X) is the supremum of all its continuous affine minorants. In particular, every f ∈ Γ0(X) admits continuous affine minorants.

Definition 5.1.30. Let X be a normed space and let f : X → ℝ be a proper function. The largest minorant of f in Γ0(X) is called the Γ-regularization of f and is denoted by $f^c$.

From this definition and Proposition 5.1.28 we have the following.

Proposition 5.1.31. If X is a normed space and f : X → ℝ is proper, then $\operatorname{epi} f^c = \overline{\operatorname{conv}}\,\operatorname{epi} f$.

In addition to the Γ-regularization of a proper function f : X → ℝ, we also introduce its lower semicontinuous regularization.

Definition 5.1.32. Let X be a topological space and let f : X → ℝ be a proper function. The largest lower semicontinuous minorant of f is called the lower semicontinuous regularization of f and is denoted by f̄.

The next proposition characterizes f̄.

Proposition 5.1.33. If X is a topological space and f : X → ℝ is a proper function, then

\[ \bar{f}(x) = \sup_{U\in N(x)}\inf_{u\in U} f(u) = \liminf_{u\to x} f(u) \quad\text{for all } x\in X \]

and $\operatorname{epi}\bar{f} = \overline{\operatorname{epi} f}$.

Proof. Evidently, the function $f_0(x) = \sup_{U\in N(x)}\inf_{u\in U} f(u)$ is lower semicontinuous and f0(x) ≤ f(x) for all x ∈ X. Hence f0 ≤ f̄. On the other hand, if h is a lower semicontinuous minorant of f, then

\[ h(x) = \sup_{U\in N(x)}\inf_{u\in U} h(u) \le \sup_{U\in N(x)}\inf_{u\in U} f(u) = f_0(x) \quad\text{for all } x\in X; \]

see Remark 1.7.3. Hence, f̄(x) ≤ f0(x) for all x ∈ X and so f̄ = f0. Since f̄ ≤ f, we have epi f ⊆ epi f̄, and hence, since epi f̄ is closed, $\overline{\operatorname{epi} f} \subseteq \operatorname{epi}\bar{f}$. On the other hand $\overline{\operatorname{epi} f} = \operatorname{epi} h$ for some proper function h : X → ℝ. This function is lower semicontinuous (see Proposition 1.7.2) and satisfies h ≤ f. Therefore, h = f̄.

Remark 5.1.34. In addition, it holds that

\[ \{x\in X : \bar{f}(x)\le\eta\} = \bigcap_{\vartheta>\eta}\overline{\{x\in X : f(x)\le\vartheta\}} \quad\text{for all } \eta\in\mathbb{R}. \]

Example 5.1.35. If A ⊆ X and $f = i_A$, then $\bar{f} = i_{\overline{A}}$.

In first countable spaces, we can state another characterization of f̄ in terms of sequences.

Proposition 5.1.36. If X is a first countable topological space and f : X → ℝ is proper, then f̄ is characterized by the following two properties:
(a) for every sequence $x_n \to x$, we have $\bar{f}(x) \le \liminf_{n\to\infty} f(x_n)$;
(b) there exists a sequence $x_n \to x$ such that $\limsup_{n\to\infty} f(x_n) \le \bar{f}(x)$.

Proof. Since f̄ is lower semicontinuous and f̄ ≤ f, we obtain

\[ \bar{f}(x) \le \liminf_{n\to\infty}\bar{f}(x_n) \le \liminf_{n\to\infty} f(x_n) \]

for every sequence $x_n \to x$. This establishes (a). For property (b) we may assume that x ∈ dom f̄, that is, f̄(x) < +∞. Let $\{U_n\}_{n\ge1}$ be a decreasing local basis at x, which exists since X is first countable. Let $\eta_n \searrow \bar{f}(x)$ with $\eta_n \ne \bar{f}(x)$. Then by Proposition 5.1.33 it follows that $\inf_{u\in U_n} f(u) < \eta_n$ and so there exists $x_n \in U_n$ such that $f(x_n) < \eta_n$. Evidently $x_n \to x$ and $\limsup_{n\to\infty} f(x_n) \le \bar{f}(x)$. This establishes property (b).

A direct consequence of Definition 5.1.32 is the following result.

Proposition 5.1.37. If X is a topological space and f, h : X → ℝ are proper functions, then $\bar{f} + \bar{h} \le \overline{f+h}$. Moreover, if h is continuous, then $\overline{f+h} = \bar{f} + h$.

Another consequence of Definitions 5.1.32 and 5.1.30 is the following one.

Proposition 5.1.38. If X is a normed space and f : X → ℝ is a proper function, then $f^c \le \bar{f} \le f$.

5.2 Differentiability of Convex Functions

Convex functions exhibit interesting differentiability properties. In this section we explore some of these properties. Let X be a normed space, U ⊆ X an open set, x ∈ U, and f : U → ℝ. We introduce the following directional derivatives of f at x in the direction h ∈ X:

\[ f'_+(x;h) = \lim_{\lambda\to0^+}\frac{f(x+\lambda h)-f(x)}{\lambda}, \qquad f'_-(x;h) = \lim_{\lambda\to0^-}\frac{f(x+\lambda h)-f(x)}{\lambda}, \qquad f'(x;h) = \lim_{\lambda\to0}\frac{f(x+\lambda h)-f(x)}{\lambda}. \]

From the definitions above it is clear that f′₊(x; h) = −f′₋(x; −h). Moreover, f′(x; h) exists if and only if f′₊(x; ±h) both exist and f′₊(x; h) = −f′₊(x; −h). In addition, it holds that f′₊(x; 0) = 0 and f′₊(x; λh) = λf′₊(x; h) for all λ > 0.

Definition 5.2.1.
(a) We say that f is Gateaux differentiable at x if f′(x; ⋅) ∈ X∗. We say that f is Gateaux differentiable if it is Gateaux differentiable at every x ∈ U.
(b) We say that f is Fréchet differentiable at x if there exists x∗ ∈ X∗ such that

\[ \lim_{h\to0}\frac{f(x+h)-f(x)-\langle x^*,h\rangle}{\|h\|} = 0. \]

We say that f is Fréchet differentiable if it is Fréchet differentiable at every x ∈ U.

In what follows we denote by f′(x) ∈ X∗ both the Gateaux and Fréchet derivatives of f at x. It will be clear from the context which one is used.

Remark 5.2.2. Note that both notions remain unaffected by equivalent renorming of X. A function that is Fréchet differentiable at x is continuous at x, but this is not true for functions that are Gateaux differentiable at x. For example, the function f : ℝ² → ℝ defined by

\[ f(x_1,x_2) = \begin{cases} \dfrac{x_1^6}{x_1^8+(x_2-x_1^2)^2} & \text{if } (x_1,x_2)\ne 0,\\[2mm] 0 & \text{if } (x_1,x_2) = 0 \end{cases} \]

is Gateaux differentiable at (0, 0) with f′(0, 0) = 0, but f is not continuous at zero. Note that f(x, x²) = 1/x² for x ≠ 0.

Directly from Definition 5.2.1 we obtain the following simple facts.

Proposition 5.2.3.
(a) f is Gateaux differentiable at x ∈ U if and only if there exists x∗ ∈ X∗ such that, for every h ∈ X, f(x + λh) = f(x) + λ⟨x∗, h⟩ + o(λ) as λ → 0. Then x∗ = f′(x).
(b) f is Fréchet differentiable at x ∈ U if and only if there exists x∗ ∈ X∗ such that f(x + h) = f(x) + ⟨x∗, h⟩ + o(‖h‖) as h → 0. Then x∗ = f′(x).
(c) f is Fréchet differentiable at x if and only if f is Gateaux differentiable at x and

\[ \lim_{\lambda\to0}\ \sup\Big[\Big|\frac{f(x+\lambda h)-f(x)}{\lambda} - \langle f'(x),h\rangle\Big| : \|h\|\le1\Big] = 0, \]

that is, the difference quotient converges to ⟨f′(x), h⟩ uniformly in ‖h‖ ≤ 1.
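The function in Remark 5.2.2 can be probed numerically. The following is a minimal sketch, not part of the original text: the helper names `difference_quotient` and `along_parabola` are illustrative assumptions. It evaluates the difference quotients at the origin, which shrink as λ → 0, and the values along the parabola x2 = x1², which blow up like 1/x².

```python
import math


def f(x1, x2):
    """Function from Remark 5.2.2; it takes the value 0 at the origin."""
    if x1 == 0.0 and x2 == 0.0:
        return 0.0
    return x1**6 / (x1**8 + (x2 - x1**2) ** 2)


def difference_quotient(h, v, lam):
    """(f(lam*h, lam*v) - f(0, 0)) / lam for the direction (h, v)."""
    return f(lam * h, lam * v) / lam


def along_parabola(x):
    """Value of f on the parabola x2 = x1**2, where f equals 1/x**2."""
    return f(x, x**2)


if __name__ == "__main__":
    for lam in (1e-1, 1e-2, 1e-3):
        # Quotients in two fixed directions tend to 0 (Gateaux derivative 0).
        print(lam, difference_quotient(1.0, 1.0, lam), difference_quotient(1.0, 0.0, lam))
    for x in (1e-1, 1e-2, 1e-3):
        # Values along the parabola are unbounded near the origin.
        print(x, along_parabola(x))
```

The quotients illustrate the Gateaux derivative being zero in the sampled directions, while the parabola values show why continuity at the origin fails; a finite sample of directions is of course only an illustration, not a proof.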

For a vector space V and a subset A ⊆ V, the core (or algebraic interior) of A, denoted by cor A, is the set of points x of A such that for all h ∈ V \ {x} there exists λ ∈ (0, 1) for which [x, (1 − λ)x + λh) = {tx + (1 − t)[(1 − λ)x + λh] : 0 < t ≤ 1} ⊆ A. If Aff A, the affine hull of A, is not all of V, then cor A = ∅. Moreover, for convex sets A in a topological vector space with int A ≠ ∅, we have int A = cor A. This fact and Proposition 5.1.21 imply the following result.

Proposition 5.2.4. If X is a Banach space and f : X → ℝ is lower semicontinuous, convex, and Gateaux differentiable at x ∈ X, then f is continuous at x.

Proposition 5.2.5. If X = ℝᴺ and f : X → ℝ is Lipschitz on some neighborhood of x ∈ X, then f is Fréchet differentiable at x if and only if it is Gateaux differentiable at x.

Proof. ⇒: This implication is always true.

⇐: Let f′(x) ∈ X be the Gateaux derivative of f at x. Arguing by contradiction, suppose that f is not Fréchet differentiable at x ∈ ℝᴺ. Then there exists a sequence $\{h_n\}_{n\ge1} \subseteq \mathbb{R}^N\setminus\{0\}$ such that $\|h_n\| \to 0$ and

\[ d_n = \frac{|f(x+h_n)-f(x)-(f'(x),h_n)_{\mathbb{R}^N}|}{\|h_n\|} \not\to 0. \tag{5.2.1} \]

We set $v_n = h_n/\|h_n\|$ and have $h_n = t_n v_n$ with $t_n = \|h_n\|$ for all n ∈ ℕ. Note that $\|v_n\| = 1$ for all n ∈ ℕ and so, passing to a subsequence, we may assume that $v_n \to v$ in ℝᴺ with ‖v‖ = 1. We obtain

\[ \begin{aligned} d_n &= \Big|\frac{f(x+t_nv_n)-f(x)}{t_n} - (f'(x),v_n)_{\mathbb{R}^N}\Big| \\ &\le \Big|\frac{f(x+t_nv)-f(x)}{t_n} - (f'(x),v)_{\mathbb{R}^N}\Big| + \frac{|f(x+t_nv_n)-f(x+t_nv)|}{t_n} + |(f'(x),v_n-v)_{\mathbb{R}^N}| \\ &\le \Big|\frac{f(x+t_nv)-f(x)}{t_n} - (f'(x),v)_{\mathbb{R}^N}\Big| + [k+\|f'(x)\|]\,\|v_n-v\| \to 0, \end{aligned} \]

where k > 0 is the Lipschitz constant on a neighborhood of x. This contradicts (5.2.1).

Combining this with Proposition 5.1.25 gives the following result.

Corollary 5.2.6. If f : ℝᴺ → ℝ is convex and all partial derivatives $\partial f/\partial x_k(x)$ exist, then f is Fréchet differentiable.

Remark 5.2.7. From multivariable calculus we know that the existence of partial derivatives does not imply the existence of the Fréchet derivative in general. In fact, it does not even imply Gateaux differentiability. Consider the function

\[ f(x_1,x_2) = \begin{cases} \dfrac{x_1(x_1^2-3x_2^2)}{x_1^2+x_2^2} & \text{if } (x_1,x_2)\ne 0,\\[2mm] 0 & \text{if } (x_1,x_2) = 0. \end{cases} \]

Then f′((0, 0); (h, v)) = f(h, v), which is not linear in (h, v).

Proposition 5.2.8. If X is a normed space and f : X → ℝ is proper and convex, then the following hold:
(a) for x ∈ dom f and h ∈ X, the function λ → (f(x + λh) − f(x))/λ is increasing on ℝ \ {0} and so f′₊(x; h) exists and it holds that

\[ f(x)-f(x-h) \le f'_+(x;h) = \inf_{\lambda>0}\frac{f(x+\lambda h)-f(x)}{\lambda} \le f(x+h)-f(x); \]

(b) for x ∈ dom f, the function h → f′₊(x; h) is sublinear;
(c) for x ∈ int dom f, it holds f′₊(x; h) ∈ ℝ for all h ∈ X;
(d) if f : X → ℝ is continuous and convex, then f′₊(x; ⋅) exists and is continuous on X.

Proof. (a) Since f is proper and convex, the same is true for φ(λ) = f(x + λh) with λ ∈ ℝ. If λ1 < λ2 < λ3, we set $\mu_{mk} = \lambda_m - \lambda_k$ for m, k = 1, 2, 3. Note that $\lambda_2 = (\mu_{32}/\mu_{31})\lambda_1 + (\mu_{21}/\mu_{31})\lambda_3$. So, from the convexity of φ, we obtain that

\[ \varphi(\lambda_2) \le \frac{\mu_{32}}{\mu_{31}}\varphi(\lambda_1) + \frac{\mu_{21}}{\mu_{31}}\varphi(\lambda_3), \]

which results in

\[ \frac{\varphi(\lambda_2)-\varphi(\lambda_1)}{\lambda_2-\lambda_1} \le \frac{\varphi(\lambda_3)-\varphi(\lambda_1)}{\lambda_3-\lambda_1} \le \frac{\varphi(\lambda_3)-\varphi(\lambda_2)}{\lambda_3-\lambda_2}. \tag{5.2.2} \]

Consider 0 < λ < ϑ. Applying (5.2.2) to the points 0 < λ < ϑ, it follows that (φ(λ) − φ(0))/λ ≤ (φ(ϑ) − φ(0))/ϑ, that is,

\[ \frac{f(x+\lambda h)-f(x)}{\lambda} \le \frac{f(x+\vartheta h)-f(x)}{\vartheta}. \]

So, we have proven that λ → (f(x + λh) − f(x))/λ is increasing on (0, +∞). For λ < 0, note that

\[ \frac{f(x+\lambda h)-f(x)}{\lambda} = -\frac{f(x+(-\lambda)(-h))-f(x)}{-\lambda}, \]

which shows that λ → (f(x + λh) − f(x))/λ is increasing on (−∞, 0). It follows that f′₊(x; h) exists in ℝ∗ and we get

\[ f(x)-f(x-h) \le f'_+(x;h) = \inf_{\lambda>0}\frac{f(x+\lambda h)-f(x)}{\lambda} \le f(x+h)-f(x) \quad\text{for all } h\in X. \tag{5.2.3} \]

(b) Since f is convex, for every h, v ∈ X and λ > 0, we obtain

\[ f(x+\lambda(h+v)) = f\Big(\frac12(x+2\lambda h)+\frac12(x+2\lambda v)\Big) \le \frac12 f(x+2\lambda h)+\frac12 f(x+2\lambda v). \]

Subtracting f(x), dividing by λ, and letting λ → 0⁺ gives

\[ f'_+(x;h+v) \le f'_+(x;h)+f'_+(x;v). \]

Clearly, f′₊(x; ⋅) is positively homogeneous. Therefore, f′₊(x; ⋅) is sublinear.

(c) If x ∈ int dom f, then for every h ∈ X there exists δ > 0 such that x ± δh ∈ dom f. From (5.2.3) with h replaced by δh, we have δf′₊(x; h) = f′₊(x; δh) ∈ ℝ, which gives f′₊(x; h) ∈ ℝ for all h ∈ X.

(d) There exists δ > 0 such that f(x + h) − f(x) ≤ 1 for all $h \in B_\delta(0)$. Then, due to (5.2.3), f′₊(x; ⋅) is bounded above on $B_\delta(0)$. Since f′₊(x; ⋅) is sublinear (see part (b)), it follows that f′₊(x; ⋅) is continuous; see Proposition 5.1.20.

Remark 5.2.9. Evidently, if x ∈ dom f, then h → f′₋(x; h) is superlinear, that is, f′₋(x; ⋅) is positively homogeneous and superadditive: if h, v ∈ X, then f′₋(x; h + v) ≥ f′₋(x; h) + f′₋(x; v).

We can state characterizations of Fréchet and Gateaux differentiability without explicit mention of the derivative.

Proposition 5.2.10. If X is a Banach space and f : X → ℝ is convex and continuous at x0 ∈ X, then f is Fréchet differentiable at x0 ∈ X if and only if

\[ \lim_{\lambda\to0}\frac{f(x_0+\lambda h)+f(x_0-\lambda h)-2f(x_0)}{\lambda} = 0 \]

uniformly in h ∈ X with ‖h‖ = 1.

Proof. ⇒: The Fréchet differentiability of f at x0 implies that for a given ε > 0 there exists δ > 0 such that

\[ |f(x_0+h)-f(x_0)-\langle f'(x_0),h\rangle| \le \frac{\varepsilon}{2}\|h\| \quad\text{for all } h\in X \text{ with } \|h\|\le\delta. \tag{5.2.4} \]

We use (5.2.4) first with h and then with −h. Adding these inequalities leads to

\[ 0 \le f(x_0+h)+f(x_0-h)-2f(x_0) \le \varepsilon\|h\| \quad\text{for all } h\in X \text{ with } \|h\|\le\delta. \tag{5.2.5} \]

The first inequality in (5.2.5) is a consequence of the convexity of f. For ‖h‖ = 1 and λ ∈ (0, δ] one has ‖λh‖ ≤ δ and from (5.2.5) it follows that

\[ 0 \le \frac{f(x_0+\lambda h)+f(x_0-\lambda h)-2f(x_0)}{\lambda} \le \varepsilon \quad\text{for all } \lambda\in(0,\delta] \text{ and all } h \text{ with } \|h\|=1. \]

This gives

\[ \lim_{\lambda\to0}\frac{f(x_0+\lambda h)+f(x_0-\lambda h)-2f(x_0)}{\lambda} = 0 \quad\text{uniformly in } h\in X \text{ with } \|h\|=1. \]

⇐: We easily see that

\[ \frac{f(x_0+\lambda h)+f(x_0-\lambda h)-2f(x_0)}{\lambda} = \frac{f(x_0+\lambda h)-f(x_0)}{\lambda} + \frac{f(x_0-\lambda h)-f(x_0)}{\lambda}. \]

Hence, f′₊(x0; h) = f′₋(x0; h) = f′(x0; h) and so f′(x0; ⋅) ∈ X∗, which shows that f is Gateaux differentiable at x0. In addition, the limit $\lim_{\lambda\to0}(f(x_0+\lambda h)-f(x_0))/\lambda = f'(x_0;h)$ is uniform in ‖h‖ = 1. Therefore, f is Fréchet differentiable at x0 by Proposition 5.2.3.

In a similar way we can state an analogous characterization of Gateaux differentiability for convex functions.

Proposition 5.2.11. If X is a Banach space and f : X → ℝ is convex and continuous at x0 ∈ X, then f is Gateaux differentiable at x0 if and only if for every ε > 0 and every h ∈ X with ‖h‖ = 1 there exists δ = δ(ε, h) > 0 such that

\[ f(x_0+th)+f(x_0-th)-2f(x_0) \le \varepsilon t \quad\text{for all } t\in[0,\delta]. \]
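The symmetric difference quotient appearing in Propositions 5.2.10 and 5.2.11 is easy to evaluate on the real line. The sketch below is illustrative only; the helper name `symmetric_quotient` is an assumption, not notation from the text. It compares f(x) = |x|, for which the quotient at 0 stays equal to 2, with f(x) = x², for which it equals 2t and tends to 0.

```python
def symmetric_quotient(func, x0, h, t):
    """(f(x0 + t*h) + f(x0 - t*h) - 2*f(x0)) / t for a real function func."""
    return (func(x0 + t * h) + func(x0 - t * h) - 2.0 * func(x0)) / t


def square(x):
    """Smooth convex example f(x) = x**2."""
    return x * x


if __name__ == "__main__":
    for t in (1e-1, 1e-3, 1e-6):
        # |x| is not differentiable at 0: the quotient does not tend to 0.
        print(t, symmetric_quotient(abs, 0.0, 1.0, t))
        # x**2 is differentiable at 0: the quotient is 2*t.
        print(t, symmetric_quotient(square, 0.0, 1.0, t))
```

On ℝ the unit sphere consists of h = ±1 only, so Gateaux and Fréchet differentiability coincide there and the two propositions give the same test.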

Convex functions on the real line have many points of differentiability, as the next proposition shows.

Proposition 5.2.12. If T ⊆ ℝ is an open interval and f : T → ℝ is convex, then f is differentiable at all but at most countably many points of T.

Proof. From (5.2.2) we see that $x \to f'_+(x) = \lim_{\lambda\to0^+}(f(x+\lambda)-f(x))/\lambda$ is nondecreasing on T. Note that the points where f fails to be differentiable are the jump discontinuity points of f′₊. But by Proposition 4.3.3 this set is at most countable.

Next we introduce a notion that will be the main object of interest in the next section.

Definition 5.2.13. Let X be a normed space, f : X → ℝ a function, and x0 ∈ dom f. The subdifferential of f at x0 is the set

\[ \partial f(x_0) = \{x^*\in X^* : \langle x^*,x-x_0\rangle \le f(x)-f(x_0) \text{ for all } x\in X\}. \]

When x0 ∉ dom f, we set ∂f(x0) = ∅. The elements of the set ∂f(x0) are called subgradients of f at x0.

Remark 5.2.14. According to the definition above, x∗ ∈ ∂f(x0) if and only if the affine function x → f(x0) + ⟨x∗, x − x0⟩ supports the epigraph of f at (x0, f(x0)). Thus the subdifferential generalizes the classical notion of derivative. The subdifferential of a lower semicontinuous convex function may be empty at some points in its effective domain. Consider the function $f(x) = -\sqrt{1-x^2}$ for all x ∈ [−1, 1]. Then ∂f(±1) = ∅. The domain of ∂f is the set D(∂f) = {x ∈ X : ∂f(x) ≠ ∅}. For a convex function f, dom f is always convex. However, D(∂f) need not be convex.

In the next section we give a detailed study of the subdifferential of a convex function. At this point we want to use the subdifferential to characterize the Gateaux differentiability of convex functions. First we relate f′₊(x; ⋅) and ∂f(x).

Proposition 5.2.15. If X is a Banach space and f : X → ℝ is a proper, convex function, then the following hold:
(a) for x0 ∈ dom f we have ∂f(x0) = {x∗ ∈ X∗ : ⟨x∗, h⟩ ≤ f′₊(x0; h) for all h ∈ X};
(b) if x0 ∈ dom f and f is continuous at x0, then ∂f(x0) ⊆ X∗ is nonempty, w∗-compact and convex;
(c) if x0 ∈ int dom f and f is continuous at x0, then σ(h; ∂f(x0)) = f′₊(x0; h) for all h ∈ X; see Definition 5.1.17.

Proof. (a) Let x∗ ∈ ∂f(x0). Then for λ > 0 we obtain λ⟨x∗, h⟩ ≤ f(x0 + λh) − f(x0), which gives ⟨x∗, h⟩ ≤ f′₊(x0; h) for all h ∈ X. Hence ∂f(x0) ⊆ {x∗ ∈ X∗ : ⟨x∗, h⟩ ≤ f′₊(x0; h) for all h ∈ X} =: D∗. On the other hand, if x∗ ∈ D∗, then from Proposition 5.2.8(a) it follows that ⟨x∗, h⟩ ≤ f(x0 + h) − f(x0) for all h ∈ X. Therefore, x∗ ∈ ∂f(x0) and so ∂f(x0) = D∗.
(b) From Proposition 5.1.24 we know that int epi f ≠ ∅ and in addition, that x0 ∈ int dom f. Since (x0, f(x0)) is a boundary point of epi f, by the First Separation Theorem (see Theorem 3.1.59), there exists (x∗, η) ∈ X∗ × ℝ with (x∗, η) ≠ (0, 0) such that

\[ \eta(f(x_0)-\lambda) \le \langle x^*,u-x_0\rangle \quad\text{for all } (u,\lambda)\in\operatorname{epi} f. \tag{5.2.6} \]

Note that λ can increase up to +∞. So, from (5.2.6) we see that η ≥ 0. If η = 0, then

\[ \langle x^*,x_0\rangle \le \langle x^*,u\rangle \quad\text{for all } u\in\operatorname{dom} f. \tag{5.2.7} \]

Since x0 ∈ int dom f, from (5.2.7) we conclude that x∗ = 0, a contradiction. So η > 0 and we can take η = 1. From (5.2.6) with λ = f(u) one obtains f(x0) − f(u) ≤ ⟨x∗, u − x0⟩ for all u ∈ dom f. Thus, −x∗ ∈ ∂f(x0) and so ∂f(x0) ≠ ∅. From Proposition 5.1.20, we know that there exist δ > 0 and k > 0 such that $f|_{B_\delta(x_0)}$ is Lipschitz with constant k. Then, for x∗ ∈ ∂f(x0), it follows that

\[ \langle x^*,h\rangle \le f(x_0+h)-f(x_0) \le k\|h\| \quad\text{for all } h\in B_\delta(0). \]

Hence, ‖x∗‖ ≤ k. Therefore, ∂f(x0) is bounded and clearly w∗-closed and convex. Thus, ∂f(x0) is nonempty, w∗-compact, and convex.

(c) Let ρ(h) = f′₊(x0; h). By Proposition 5.2.8, ρ is sublinear, continuous, and in fact, Lipschitz. Let h ∈ X with ‖h‖ = 1 and H = ℝh. We introduce the linear functional l : H → ℝ defined by l(th) = tρ(h) for all t ∈ ℝ. Then by the Hahn–Banach Theorem, there exists l̂ ∈ X∗ such that $\hat{l}|_H = l$. Moreover, we have

\[ \langle\hat{l},v\rangle \le \rho(v) \le f(x_0+v)-f(x_0) \quad\text{for all } v\in X. \]

Thus, l̂ ∈ ∂f(x0) and l̂(th) = f′₊(x0; th) for all t ≥ 0. Combined with part (a), this gives

\[ f'_+(x_0;h) = \sigma(h;\partial f(x_0)) \quad\text{for all } h\in X. \]
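Part (c) of Proposition 5.2.15 can be checked numerically in a simple finite dimensional case. The sketch below is illustrative only and rests on one assumption not stated in the text: for f(x1, x2) = max(x1, x2) on ℝ², the subdifferential at the origin is the segment joining (1, 0) and (0, 1), so its support function is a maximum over these two vertices. The helper names are hypothetical.

```python
import math


def f(x):
    """f(x1, x2) = max(x1, x2), a continuous convex function on R^2."""
    return max(x)


def one_sided_derivative(func, x, h, lam=1e-6):
    """Forward difference quotient approximating f'_+(x; h)."""
    shifted = [xi + lam * hi for xi, hi in zip(x, h)]
    return (func(shifted) - func(x)) / lam


def support_function(h, points):
    """Support function of conv(points): the maximum of <p, h> over the points."""
    return max(sum(pi * hi for pi, hi in zip(p, h)) for p in points)


# Assumed description of the subdifferential of max(x1, x2) at the origin.
SUBDIFFERENTIAL_VERTICES = [(1.0, 0.0), (0.0, 1.0)]

if __name__ == "__main__":
    for h in [(1.0, -2.0), (-3.0, -1.0), (0.5, 0.5)]:
        left = one_sided_derivative(f, [0.0, 0.0], h)
        right = support_function(h, SUBDIFFERENTIAL_VERTICES)
        print(h, left, right)
```

Because max is positively homogeneous, the forward quotient at the origin already equals the directional derivative up to rounding, so both columns agree for every sampled direction.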

Using this proposition we can state the following characterization of the Gateaux differentiability of a convex function in terms of its subdifferential.

Theorem 5.2.16. If X is a Banach space and f : X → ℝ is proper and convex with x0 ∈ dom f, then the following hold:
(a) if f is Gateaux differentiable at x0, then ∂f(x0) = {f′(x0)};
(b) if f is continuous at x0 and ∂f(x0) is a singleton, then f is Gateaux differentiable at x0 and ∂f(x0) = {f′(x0)}.

Proof. (a) The convexity of f implies that

\[ \langle f'(x_0),h\rangle \le \frac{1}{\lambda}[f(x_0+\lambda h)-f(x_0)] \le f(x_0+h)-f(x_0) \]

for all λ ∈ (0, 1) and for all h ∈ X. Then, f′(x0) ∈ ∂f(x0); see Definition 5.2.13. Suppose that x∗ ∈ ∂f(x0). Then

\[ \langle x^*,h\rangle \le \frac{1}{\lambda}[f(x_0+\lambda h)-f(x_0)] \quad\text{for all } \lambda>0 \text{ and for all } h\in X. \]

Letting λ → 0⁺, this implies ⟨x∗, h⟩ ≤ ⟨f′(x0), h⟩ for all h ∈ X. Hence, x∗ = f′(x0) and so ∂f(x0) = {f′(x0)}.

(b) Suppose ∂f(x0) = {x∗}. From Proposition 5.2.15 we know that f′₊(x0; h) = ⟨x∗, h⟩ for all h ∈ X. Hence, f′₊(x0; ⋅) = x∗ and so, f is Gateaux differentiable at x0.

For separable Banach spaces we have generic Gateaux differentiability of continuous, convex functions.

Theorem 5.2.17. If X is a separable Banach space and f : X → ℝ is continuous and convex, then f is Gateaux differentiable on a dense $G_\delta$-subset of X.

Proof. Let $\{x_n\}_{n\ge1}$ be dense in the unit sphere ∂B1 = {x ∈ X : ‖x‖ = 1}. For n, m ∈ ℕ let

\[ A_{n,m} = \{x\in X : \text{there exist } x^*,u^*\in\partial f(x) \text{ such that } \langle x^*-u^*,x_n\rangle \ge 1/m\}. \]

According to Theorem 5.2.16, f′(x) does not exist if and only if ∂f(x) is not a singleton if and only if $x \in \bigcup_{n,m\ge1} A_{n,m}$. We show that each set $A_{n,m}$ is closed. So, let $\{y_k\}_{k\ge1} \subseteq A_{n,m}$ and assume that $y_k \to y$ in X. For each k ∈ ℕ there exist $x_k^*, u_k^* \in \partial f(y_k)$ such that $\langle x_k^*-u_k^*,x_n\rangle \ge 1/m$. The separability of X implies that bounded sets in X∗ endowed with the relative weak* topology are metrizable; see Theorem 3.4.12 and Remark 3.4.13. From the proof of Proposition 5.2.15(b), it is clear that $\bigcup_{k\ge1}\partial f(y_k) \subseteq X^*$ is bounded. Therefore, we may assume that

\[ x_k^* \xrightarrow{\,w^*\,} x^* \quad\text{and}\quad u_k^* \xrightarrow{\,w^*\,} u^* \quad\text{in } X^*. \]

For any h ∈ X we see that

\[ \begin{aligned} \langle x^*,h-y\rangle &= \lim_{k\to\infty}\langle x_k^*,h-y_k\rangle \le \lim_{k\to\infty}[f(h)-f(y_k)] = f(h)-f(y), \\ \langle u^*,h-y\rangle &= \lim_{k\to\infty}\langle u_k^*,h-y_k\rangle \le \lim_{k\to\infty}[f(h)-f(y_k)] = f(h)-f(y). \end{aligned} \]

Hence, x∗, u∗ ∈ ∂f(y) and $\langle x^*-u^*,x_n\rangle = \lim_{k\to\infty}\langle x_k^*-u_k^*,x_n\rangle \ge 1/m$. Thus, $y \in A_{n,m}$, and so $A_{n,m} \subseteq X$ is closed.

The set $U_{n,m} = X\setminus A_{n,m}$ is open for all n, m ∈ ℕ. We claim that $U_{n,m}$ is also dense. Let x0 ∈ X and consider the function $\xi(\lambda) = f(x_0+\lambda x_n)$ for all λ ∈ ℝ. According to Proposition 5.2.12 we can approximate x0 by points of the form $x_0+\lambda x_n$ with ξ being differentiable at λ. If $x^*,u^* \in \partial f(x_0+\lambda x_n)$, then their restrictions to $x_0+\mathbb{R}x_n$ give subgradients of ξ at λ. But ξ is differentiable at λ. So, x∗ and u∗ coincide on the line $x_0+\mathbb{R}x_n$. In particular, $\langle x^*,x_n\rangle = \langle u^*,x_n\rangle$. Hence, $x_0+\lambda x_n \in U_{n,m}$ for all m ∈ ℕ. From the Baire Category Theorem, we see that $\bigcap_{n,m\ge1}U_{n,m}$ is dense and $G_\delta$ in X.

Definition 5.2.18. A Banach space X is said to be weak Asplund if every continuous and convex function f : X → ℝ is Gateaux differentiable on a dense $G_\delta$-subset of X.

This definition in combination with Theorem 5.2.17 yields the following corollary.

Corollary 5.2.19. Every separable Banach space is weak Asplund.

Proposition 5.2.20. If X is a Banach space and f : X → ℝ is continuous and convex, then the set of points at which f is Fréchet differentiable is a (possibly empty) $G_\delta$-set.

Proof. From Proposition 5.2.10 we know that f is Fréchet differentiable at x ∈ X if and only if for every ε > 0 there exists δ > 0 such that

\[ f(x+\lambda h)+f(x-\lambda h)-2f(x) < \lambda\varepsilon \tag{5.2.8} \]

for all h ∈ X with ‖h‖ = 1 and for all λ ∈ (0, δ). For each n ∈ ℕ we define

\[ U_n = \Big\{x\in X : \text{there exists } \delta>0 \text{ such that } \sup_{\|h\|=1}\frac{f(x+\delta h)+f(x-\delta h)-2f(x)}{\delta} < \frac{1}{n}\Big\}. \]

If D is the set of points of Fréchet differentiability of f, from (5.2.8), we obtain that

\[ D = \bigcap_{n\ge1}U_n. \tag{5.2.9} \]

So, we need to show that for every n ∈ ℕ, $U_n$ is open. Let $x \in U_n$. From Corollary 5.1.23 we know that f is locally Lipschitz. So, there exist δ1 > 0 and k > 0 such that

\[ |f(u)-f(v)| \le k\|u-v\| \quad\text{for all } u,v\in B_{\delta_1}(x). \tag{5.2.10} \]

Moreover, because $x \in U_n$, there exist δ > 0 and η > 0 such that

\[ \frac{f(x+\delta h)+f(x-\delta h)-2f(x)}{\delta} \le \eta < \frac{1}{n} \tag{5.2.11} \]

for all h ∈ X with ‖h‖ = 1. We choose a small enough δ2 > 0 such that η + (4kδ2)/δ < 1/n and let $y \in B_{\delta_2}(x)$. Taking (5.2.10) and (5.2.11) into account, one has, for any h ∈ X with ‖h‖ = 1, that

\[ \begin{aligned} \frac{f(y+\delta h)+f(y-\delta h)-2f(y)}{\delta} &\le \frac{f(x+\delta h)+f(x-\delta h)-2f(x)}{\delta} + \frac{2|f(y)-f(x)|}{\delta} \\ &\quad + \frac{|f(y+\delta h)-f(x+\delta h)|}{\delta} + \frac{|f(y-\delta h)-f(x-\delta h)|}{\delta} \\ &\le \eta + \frac{4k\|y-x\|}{\delta} \le \eta + \frac{4k\delta_2}{\delta} < \frac{1}{n}. \end{aligned} \]

Hence, $y \in U_n$ and so $B_{\delta_2}(x) \subseteq U_n$. This proves that $U_n$ is open for every n ∈ ℕ and so from (5.2.9) we conclude that D is $G_\delta$, possibly empty.

Definition 5.2.21. A Banach space X is said to be Asplund if every continuous and convex function f : X → ℝ is Fréchet differentiable on a dense set.

Remark 5.2.22. On account of Proposition 5.2.20, such an f is in fact Fréchet differentiable on a dense $G_\delta$-set.

The following theorem characterizes Asplund spaces and its proof can be found in Phelps [236, p. 23].

Theorem 5.2.23. If X is a Banach space, then X is an Asplund space if and only if every separable closed subspace of X has a separable dual space. In particular, every Banach space with a separable dual space is an Asplund space.

Corollary 5.2.24. Every reflexive Banach space is an Asplund space.

Proposition 5.2.25. If H is a Hilbert space and f(x) = ½‖x‖² for all x ∈ H, then f is Fréchet differentiable and ⟨f′(x), h⟩ = (x, h) for all x, h ∈ H, where (⋅, ⋅) is the inner product of H.

Proof. For every x, h ∈ H we easily see that

\[ \frac12\|x+h\|^2 - \frac12\|x\|^2 - (x,h) = \frac12\|h\|^2. \]

Hence, ⟨f′(x), h⟩ = (x, h).

Applying the chain rule and the proposition above, we can state the following result.

Corollary 5.2.26. The norm of a Hilbert space is Fréchet differentiable at every x ∈ H with x ≠ 0.

Let H be a Hilbert space and let C ⊆ H be a nonempty, closed, and convex set. We define the metric projection map $p_C : H \to C$ that assigns to each x ∈ H its unique best approximation from C; see Definition 3.5.19.

Proposition 5.2.27. If H is a Hilbert space, C ⊆ H is nonempty, closed, convex, and

\[ f(x) = \frac12\big[\|x\|^2 - \|x-p_C(x)\|^2\big] \quad\text{for all } x\in H, \]

then f is convex, Fréchet differentiable and $\langle f'(x),h\rangle = (p_C(x),h)$ for all x, h ∈ H.

Proof. We easily see that

\[ 2f(x) = \|x\|^2 - \inf[\|x-u\|^2 : u\in C] = \sup[2(x,u)-\|u\|^2 : u\in C]. \]

So, f is the supremum of affine continuous functions. Therefore, f is convex and from Proposition 3.5.20, we know that f is continuous. Moreover, Proposition 5.2.25 implies that f is Fréchet differentiable and

\[ \langle f'(x),h\rangle = (x,h) - (x-p_C(x),h) = (p_C(x),h) \quad\text{for all } h\in H. \]
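The gradient formula of Proposition 5.2.27 can be illustrated in H = ℝ² with C the closed unit ball, where the metric projection has a closed form. The following sketch is illustrative only: the choice of C, the central-difference step, and all helper names are assumptions made for this example.

```python
import math


def project_unit_ball(x):
    """Metric projection of x onto the closed unit ball of R^n."""
    norm = math.sqrt(sum(c * c for c in x))
    if norm <= 1.0:
        return list(x)
    return [c / norm for c in x]


def f(x):
    """f(x) = (||x||^2 - ||x - p_C(x)||^2) / 2 with C the closed unit ball."""
    p = project_unit_ball(x)
    squared_norm = sum(c * c for c in x)
    squared_dist = sum((c - q) ** 2 for c, q in zip(x, p))
    return 0.5 * (squared_norm - squared_dist)


def numerical_gradient(func, x, eps=1e-6):
    """Central finite differences, one coordinate at a time."""
    grad = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += eps
        backward[i] -= eps
        grad.append((func(forward) - func(backward)) / (2.0 * eps))
    return grad


if __name__ == "__main__":
    for point in ([3.0, 4.0], [0.2, -0.3]):
        # The two printed vectors should agree up to discretization error.
        print(numerical_gradient(f, point), project_unit_ball(point))
```

Outside the ball the finite-difference gradient matches x/‖x‖, and inside the ball it matches x itself, in line with ∇f(x) = p_C(x).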

Remark 5.2.28. If H is a Hilbert space and f : H → ℝ is Gateaux differentiable at x0 ∈ H, then f′(x0) ∈ H∗. From Theorem 3.5.21 we know that there exists a unique u0 ∈ H such that ⟨f′(x0), h⟩ = (u0, h) for all h ∈ H. This element is called the gradient of f at x0 and is denoted by ∇f(x0). So, we have ⟨f′(x0), h⟩ = (∇f(x0), h) for all h ∈ H.

Proposition 5.2.29. If T ⊆ ℝ is an interval, then the following hold:
(a) a differentiable function f : T → ℝ is convex (resp. strictly convex) if and only if f′ is increasing (resp. strictly increasing) on T;
(b) a twice differentiable function f : T → ℝ is convex if and only if f″(t) ≥ 0 for all t ∈ T.

Proof. (a) ⇒: This is a consequence of (5.2.2).

⇐: Arguing by contradiction, suppose that f is not convex. Then there exist t < s < r in the interval T such that

\[ \frac{r-s}{r-t}f(t) + \frac{s-t}{r-t}f(r) < f(s), \]

which implies

\[ \frac{f(r)-f(s)}{r-s} < \frac{f(s)-f(t)}{s-t}. \tag{5.2.12} \]

From (5.2.12) and the Mean Value Theorem, we contradict the hypothesis that f′ is increasing. Similarly we show the assertion for strictly convex functions.

(b) This follows immediately from (a).

Recall that if f : ℝᴺ → ℝ is twice Gateaux differentiable at x0, then we can identify f″(x0) with the Hessian matrix

\[ H(x_0) = \big(f_{x_kx_i}(x_0)\big)_{k,i=1}^{N} \quad\text{where}\quad f_{x_kx_i}(x_0) = \frac{\partial^2 f}{\partial x_i\,\partial x_k}(x_0), \]

by setting

\[ f''(x_0)(u,v) = (H(x_0)u,v)_{\mathbb{R}^N} \quad\text{for all } u,v\in\mathbb{R}^N. \]

Then the second derivative ∇²f(x0) = H(x0) is a symmetric matrix.

Proposition 5.2.30. If f : ℝᴺ → ℝ is twice Gateaux differentiable, then f is convex if and only if ∇²f(x) ≥ 0 for all x ∈ ℝᴺ, that is, $(\nabla^2 f(x)h,h)_{\mathbb{R}^N} \ge 0$ for all x, h ∈ ℝᴺ.

Proof. Let x, h ∈ ℝᴺ with h ≠ 0 and define ξ(t) = f(x + th). Then

\[ \xi'(t) = (\nabla f(x+th),h)_{\mathbb{R}^N} \quad\text{and}\quad \xi''(t) = (\nabla^2 f(x+th)h,h)_{\mathbb{R}^N}. \]

The result follows from Proposition 5.2.29.

Remark 5.2.31. If ∇²f(x) > 0 for all x ∈ ℝᴺ, that is, $(\nabla^2 f(x)h,h)_{\mathbb{R}^N} > 0$ for all x, h ∈ ℝᴺ with h ≠ 0, then f is strictly convex. However, a function can be strictly convex without ∇²f(x) being positive definite at all points. For example, let f(x) = x⁴. Then f is strictly convex, but f″(0) = 0.

5.3 Conjugate Functions – Convex Subdifferential

Conjugate functions play a major role in the duality theory, which is one of the main themes of “Convex Analysis.” Conjugate functions have many interesting properties and are also closely related to the notion of convex subdifferentials, which we investigate in the second half of this section.

Definition 5.3.1. Let X be a locally convex space, X∗ its dual space, and f : X → ℝ∗ a function. The conjugate of f is the function f∗ : X∗ → ℝ∗ defined by

\[ f^*(x^*) = \sup[\langle x^*,x\rangle - f(x) : x\in X]. \tag{5.3.1} \]

We can also define the conjugate of f∗, which is called the second conjugate of f. This is a function defined on X∗∗.

Remark 5.3.2. Clearly we can restrict ourselves to x ∈ dom f in (5.3.1). So, if dom f ≠ ∅, then f∗(x∗) > −∞ for all x∗ ∈ X∗. From (5.3.1) we see that f∗ is the pointwise supremum of the family of continuous affine functions x∗ → ⟨x∗, x⟩ − f(x), x ∈ dom f. Therefore, f∗ ∈ Γ(X∗); see Definition 5.1.27(b). The conjugate f∗ may not be proper, even if f is proper on X.

The next proposition contains some properties of the conjugate function that are an immediate consequence of Definition 5.3.1.

Proposition 5.3.3. The following hold:
(a) f∗(0) = − inf[f(x) : x ∈ X];
(b) f ≤ h implies h∗ ≤ f∗;
(c) $(\inf_{i\in I} f_i)^* = \sup_{i\in I} f_i^*$ and $(\sup_{i\in I} f_i)^* \le \inf_{i\in I} f_i^*$ for every family of functions $f_i : X \to \mathbb{R}^*$ with i ∈ I;
(d) (λf)∗(x∗) = λf∗(x∗/λ) for all λ > 0 and for all x∗ ∈ X∗;
(e) (f + η)∗ = f∗ − η for all η ∈ ℝ;
(f) if for u ∈ X, $f_u(x) = f(x-u)$, then $(f_u)^*(x^*) = f^*(x^*) + \langle x^*,u\rangle$ for all x∗ ∈ X∗;
(g) if f is proper, then ⟨x∗, x⟩ ≤ f(x) + f∗(x∗) for all x ∈ X and x∗ ∈ X∗; this inequality is known as the Young–Fenchel inequality;
(h) if C ⊆ X is convex and $f = i_C$, then f∗(⋅) = σ(⋅; C).

Proposition 5.3.4. If X is a normed space and f(x) = ‖x‖ for all x ∈ X, then $f^* = i_{\overline{B}_1^*}$ with $\overline{B}_1^* = \{x^*\in X^* : \|x^*\|_*\le1\}$.

Proof. From Definition 5.3.1, we have

\[ f^*(x^*) = \sup[\langle x^*,x\rangle - \|x\| : x\in X]. \tag{5.3.2} \]

First suppose that ‖x∗‖∗ ≤ 1. Then ⟨x∗, x⟩ ≤ ‖x‖ for all x ∈ X and so ⟨x∗, x⟩ − ‖x‖ ≤ 0, with equality at x = 0. Hence, f∗(x∗) = 0 for all ‖x∗‖∗ ≤ 1; see (5.3.2). Next suppose that ‖x∗‖∗ > 1. Then

\[ \|x^*\|_* = \sup\Big[\Big\langle x^*,\frac{x}{\|x\|}\Big\rangle : x\in X,\ x\ne0\Big] > 1. \]

Thus, there exists x̂ ∈ X with x̂ ≠ 0 such that ⟨x∗, x̂/‖x̂‖⟩ > 1, which implies that ⟨x∗, x̂⟩ > ‖x̂‖. Then we obtain

\[ f^*(x^*) = \sup[\langle x^*,x\rangle - \|x\| : x\in X] \ge \sup[\langle x^*,\lambda\hat{x}\rangle - \lambda\|\hat{x}\| : \lambda>0] = \Big(\sup_{\lambda>0}\lambda\Big)[\langle x^*,\hat{x}\rangle - \|\hat{x}\|] = +\infty. \]

So, we have proven that

\[ f^*(x^*) = \begin{cases} 0 & \text{if } \|x^*\|_*\le1,\\ +\infty & \text{if } 1<\|x^*\|_*. \end{cases} \]

Hence, $f^* = i_{\overline{B}_1^*}$.
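In the scalar case X = ℝ, the dichotomy in Proposition 5.3.4 can be observed by truncating the supremum in (5.3.2) to a finite grid. The sketch below is illustrative only; the grid-based helper `conjugate_on_grid` and its parameters `radius` and `steps` are assumptions, and a truncated supremum can only suggest the value +∞ by growing with the radius.

```python
def conjugate_on_grid(func, s, radius, steps=2001):
    """Approximate sup over x in [-radius, radius] of s*x - func(x) on a uniform grid."""
    xs = [-radius + 2.0 * radius * i / (steps - 1) for i in range(steps)]
    return max(s * x - func(x) for x in xs)


if __name__ == "__main__":
    for s in (0.0, 0.5, 1.0):
        # |s| <= 1: the supremum is 0, attained at x = 0.
        print(s, conjugate_on_grid(abs, s, 10.0), conjugate_on_grid(abs, s, 100.0))
    for s in (2.0, -1.5):
        # |s| > 1: the truncated supremum equals (|s| - 1) * radius and grows without bound.
        print(s, conjugate_on_grid(abs, s, 10.0), conjugate_on_grid(abs, s, 100.0))
```

For |s| ≤ 1 the computed value does not depend on the radius, while for |s| > 1 it scales linearly with it, which is the numerical footprint of the indicator function of the dual unit ball.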

Proposition 5.3.5. If X is a normed space, ξ : ℝ → ℝ is proper, even, and f(x) = ξ(‖x‖) for all x ∈ X, then f∗(x∗) = ξ∗(‖x∗‖∗) for all x∗ ∈ X∗.

Proof. From Definition 5.3.1, since ξ is even, we obtain

\[ \begin{aligned} f^*(x^*) &= \sup_{x\in X}[\langle x^*,x\rangle - \xi(\|x\|)] = \sup_{t\ge0}\ \sup_{\|x\|=t}[\langle x^*,x\rangle - \xi(t)] \\ &= \sup_{t\ge0}\Big[\sup_{\|x\|=t}\langle x^*,x\rangle - \xi(t)\Big] = \sup_{t\ge0}[t\|x^*\|_* - \xi(t)] \\ &= \sup_{t\in\mathbb{R}}[t\|x^*\|_* - \xi(t)] = \xi^*(\|x^*\|_*). \end{aligned} \]

Proposition 5.3.6. If X is a normed space and f ∈ Γ0(X), then f∗ ∈ Γ0(X∗).

Proof. Since f ∈ Γ0(X), it admits a continuous affine minorant; see Remark 5.1.29. So, there exist x∗ ∈ X∗ and η ∈ ℝ such that ⟨x∗, x⟩ + η ≤ f(x) for all x ∈ X. Then

\[ f^*(x^*) = \sup[\langle x^*,x\rangle - f(x) : x\in X] \le -\eta, \]

that is, x∗ ∈ dom f∗. Hence, f∗ is proper and so f∗ ∈ Γ0(X∗); see Remark 5.3.2.

Proposition 5.3.7. If X is a normed space and f : X → ℝ, then the following hold:
(a) $f^{**}|_X = f^c \le f$;
(b) if f is proper, then $f^{**}|_X = f$ if and only if f ∈ Γ0(X).

Proof. (a) We know that $f^c \le f$. Let x∗ ∈ X∗ and η ∈ ℝ. Then

\[ \langle x^*,x\rangle + \eta \le f(x) \ \text{ for all } x\in X \quad\text{if and only if}\quad f^*(x^*) \le -\eta. \tag{5.3.3} \]

From Definition 5.1.30 and (5.3.3), it follows that

\[ f^c(x) = \sup[\langle x^*,x\rangle + \eta : x^*\in X^*,\ \eta\in\mathbb{R},\ \eta\le -f^*(x^*)]. \tag{5.3.4} \]

If f∗(x∗) > −∞ for all x∗ ∈ X∗, then from (5.3.4) we have

\[ f^c(x) = \sup[\langle x^*,x\rangle - f^*(x^*) : x^*\in X^*] = f^{**}(x) \quad\text{for all } x\in X. \]

Hence, $f^{**}|_X = f^c$. If f∗(x∗) = −∞ for some x∗ ∈ X∗, then $f \equiv +\infty = f^{**}|_X$.

(b) ⇒: From Proposition 5.3.6, it follows that $f = f^{**}|_X \in \Gamma_0(X)$.

⇐: This follows from (a) and Proposition 5.1.38.

Remark 5.3.8. If X is a normed space and f(x) = ‖x‖ for all x ∈ X, then f is continuous, convex, and from Proposition 5.3.4 we see that $f^* = i_{\overline{B}_1^*}$. Invoking Proposition 5.3.7 we recover the familiar formula for ‖ ⋅ ‖, namely

\[ \|x\| = \sup[\langle x^*,x\rangle : \|x^*\|_*\le1] = \max[\langle x^*,x\rangle : \|x^*\|_*\le1] \]

by Alaoglu’s Theorem; see also Proposition 3.1.52.

Recall the operation of infimal convolution introduced in Definition 5.1.12. We have the following conjugation rule for this operation.

Proposition 5.3.9. If X is a normed space and f, h : X → ℝ are proper functions, then (f ⊕ h)∗ = f∗ + h∗.

Proof. For x∗ ∈ X∗ we obtain

\[ \begin{aligned} (f\oplus h)^*(x^*) &= \sup[\langle x^*,x\rangle - \inf(f(x-u)+h(u) : u\in X) : x\in X] \\ &= \sup[\langle x^*,y\rangle - f(y) + \langle x^*,u\rangle - h(u) : y,u\in X] \\ &= f^*(x^*) + h^*(x^*). \end{aligned} \]
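The conjugation rule of Proposition 5.3.9 can be checked approximately on the real line. The sketch below is illustrative only and makes several assumptions not in the text: both functions are taken to be x²/2, all suprema and infima are truncated to a uniform grid on [−4, 4], and the helper names are hypothetical. For this choice the infimal convolution is x²/4 and both sides of the rule equal s² as long as the maximizers stay inside the grid.

```python
def grid(lo, hi, n):
    """Uniform grid of n points on [lo, hi]."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


XS = grid(-4.0, 4.0, 401)


def conjugate(func, s, points=XS):
    """Grid approximation of sup_x [s*x - func(x)]."""
    return max(s * x - func(x) for x in points)


def inf_convolution(f, h, x, points=XS):
    """Grid approximation of inf_u [f(x - u) + h(u)]."""
    return min(f(x - u) + h(u) for u in points)


def f(x):
    return 0.5 * x * x


def h(x):
    return 0.5 * x * x


def lhs(s):
    """Approximation of (f (+) h)^*(s)."""
    return conjugate(lambda x: inf_convolution(f, h, x), s)


def rhs(s):
    """Approximation of f^*(s) + h^*(s)."""
    return conjugate(f, s) + conjugate(h, s)


if __name__ == "__main__":
    for s in (0.0, 1.0, 1.5):
        print(s, lhs(s), rhs(s), s * s)
```

The agreement is only up to grid error, and it degrades for values of s whose maximizers fall outside [−4, 4]; the check is meant as a sanity test of the formula, not as a general-purpose conjugate solver.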

Proposition 5.3.10. If X is a normed space and f, h : X → ℝ are proper, then the following hold:
(a) (f + h)∗ ≤ f∗ ⊕ h∗;
(b) if f, h are convex and there exists a point in dom f ∩ dom h, where one of the two functions is continuous, then (f + h)∗ = f∗ ⊕ h∗.

Proof. (a) From the Young–Fenchel inequality (see Proposition 5.3.3(g)), we derive

\[ \langle x^*+u^*,x\rangle - f(x) - h(x) \le f^*(x^*) + h^*(u^*) \]

for all x∗, u∗ ∈ X∗ and for all x ∈ X. This implies

\[ (f+h)^*(x^*+u^*) \le f^*(x^*) + h^*(u^*). \tag{5.3.5} \]

In particular, for a given y∗ ∈ X∗, (5.3.5) holds for all x∗, u∗ ∈ X∗ such that y∗ = x∗ + u∗. Therefore,

\[ (f+h)^* \le f^*\oplus h^*. \tag{5.3.6} \]

(b) Let x∗ ∈ X ∗ and λ = (f + h)∗ (x∗ ). If λ = +∞, then from (5.3.6) we see that ⊕ h∗ )(x∗ ) = +∞ and so we have equality. Let us assume that λ < +∞. Note that x ∈ dom(f + h) and so λ ∈ ℝ. We consider the set

(f ∗

D = {(x, η) ∈ X × ℝ : η ≤ ⟨x∗ , x⟩ − h(x) − λ} .

(5.3.7)

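This set is convex. Assuming that f is continuous at a point of $\operatorname{dom} f \cap \operatorname{dom} h$, we have $\operatorname{int} \operatorname{epi} f \neq \emptyset$ (see Proposition 5.1.24), and we will show that $D \cap (\operatorname{int} \operatorname{epi} f) = \emptyset$. Indeed, suppose $(x, \eta) \in D \cap (\operatorname{int} \operatorname{epi} f)$. Then $f(x) < \eta \le \langle x^*, x\rangle - h(x) - \lambda$. Hence,

$$\lambda < \langle x^*, x\rangle - (f(x) + h(x)) \le (f+h)^*(x^*) = \lambda,$$

a contradiction. So, $D \cap (\operatorname{int} \operatorname{epi} f) = \emptyset$ and we can apply the First Separation Theorem (see Theorem 3.1.59) and find $(u^*, \vartheta) \in X^* \times \mathbb{R}$ with $(u^*, \vartheta) \neq (0, 0)$ such that

$$\sup [\vartheta\eta + \langle u^*, x\rangle : (x, \eta) \in \operatorname{epi} f] \le \inf [\vartheta\eta + \langle u^*, x\rangle : (x, \eta) \in D].$$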

(5.3.8)

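Since $\eta$ can increase up to $+\infty$ in $\operatorname{epi} f$ (and decrease to $-\infty$ in D; see (5.3.7)), inequality (5.3.8) forces $\vartheta \le 0$. If $\vartheta = 0$, then $u^* \neq 0$ separates $\operatorname{dom} f$ and $\operatorname{dom} h$, but this is a contradiction since $\operatorname{int} \operatorname{dom} f \cap \operatorname{dom} h \neq \emptyset$. Therefore, $\vartheta < 0$. We divide both sides of (5.3.8) by $|\vartheta|$ and set $\hat{u}^* = \frac{1}{|\vartheta|} u^*$. Taking (5.3.7) and (5.3.8) into account, we obtain

$$\begin{aligned} f^*(\hat{u}^*) &= \sup [\langle \hat{u}^*, x\rangle - f(x) : x \in X] = \sup [\langle \hat{u}^*, x\rangle - \eta : (x, \eta) \in \operatorname{epi} f] \\ &\le \inf [\langle \hat{u}^*, x\rangle - \eta : (x, \eta) \in D] = \inf [\langle \hat{u}^* - x^*, x\rangle + h(x) : x \in \operatorname{dom} h] + \lambda \\ &= -h^*(x^* - \hat{u}^*) + \lambda. \end{aligned}$$

Therefore,

$$(f^* \oplus h^*)(x^*) \le f^*(\hat{u}^*) + h^*(x^* - \hat{u}^*) \le \lambda = (f+h)^*(x^*).$$

Then, due to (5.3.6), we get $f^* \oplus h^* = (f+h)^*$.

Let $C \subseteq X$ be a nonempty set. Recall that

$$d(x, C) = \inf [\|x - c\| : c \in C] \quad \text{with } x \in X, \qquad \sigma(x^*; C) = \sup [\langle x^*, c\rangle : c \in C] \quad \text{with } x^* \in X^*.$$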

In the next proposition, to simplify the notation, we write $d_C(\cdot) = d(\cdot, C)$ and $\sigma_C(\cdot) = \sigma(\cdot; C)$.

Proposition 5.3.11. If X is a normed space and $C \subseteq X$ is nonempty and convex, then $d_C = (\sigma_C + i_{\overline{B}_1^*})^*\big|_X$.

Proof. Consider the proper convex functions $f(x) = \|x\|$ and $h(x) = i_C(x)$. We have

$$(f \oplus h)(x) = \inf [\|x - u\| + i_C(u) : u \in X] = d_C(x) \quad \text{for all } x \in X.$$

The function $x \mapsto d_C(x)$ is continuous and convex, hence, thanks to Propositions 5.3.9, 5.3.3(h) and 5.3.4,

$$\begin{aligned} d_C(x) = (f \oplus h)^{**}(x) = (f^* + h^*)^*(x) &= \sup [\langle x^*, x\rangle - i_{\overline{B}_1^*}(x^*) - \sigma_C(x^*) : x^* \in X^*] \\ &= \sup [\langle x^*, x\rangle - \sigma_C(x^*) : x^* \in \overline{B}_1^*]. \end{aligned}$$

(5.3.9)

In fact, since $\overline{B}_1^*$ is $w^*$-compact by Alaoglu's Theorem and $x^* \mapsto \langle x^*, x\rangle - \sigma_C(x^*)$ is $w^*$-upper semicontinuous, the second supremum in (5.3.9) becomes a maximum.

Using the support function $\sigma(\cdot; C)$ we can characterize some important classes of subsets of X. For a proof we refer to Moreau [225] and Laurent [186].


Proposition 5.3.12. If X is a normed space and $C \subseteq X$ is nonempty, closed, and convex, then the following hold:
(a) C is bounded if and only if $\sigma(\cdot; C)$ is strongly continuous on $X^*$;
(b) C is compact if and only if $\sigma(\cdot; C)\big|_{\overline{B}_1^*}$ is $w^*$-continuous;
(c) C is weakly compact if and only if $\sigma(\cdot; C)$ is m-continuous, where m denotes the Mackey topology; see Theorem 3.8.9;
(d) C is weakly locally compact and contains no line if and only if there exists $x^* \in X^*$ such that $\sigma(\cdot; C)$ is m-continuous at $x^*$.
Moreover, $\sigma(\cdot; C) \in \Gamma_0(X^*)$ and

$$C = \{x \in X : \langle x^*, x\rangle \le \sigma(x^*; C) \text{ for all } x^* \in X^*\}.$$

Now we turn our attention to the subdifferential of a convex function. We start by recalling the definition of the subdifferential.

Definition 5.3.13. Let X be a normed space and $f : X \to \overline{\mathbb{R}}$ a proper function. The subdifferential of f at x is the set $\partial f(x)$ defined by

$$\partial f(x) = \{x^* \in X^* : \langle x^*, h\rangle \le f(x+h) - f(x) \text{ for all } h \in X\}.$$

(5.3.10)

The elements of $\partial f(x)$ are called subgradients of f at x. Moreover, we say that f is subdifferentiable at x if $\partial f(x) \neq \emptyset$. The set of all such points is known as the domain of $\partial f$ and is denoted by $D(\partial f)$, that is, $D(\partial f) = \{x \in X : \partial f(x) \neq \emptyset\}$.

Remark 5.3.14. From (5.3.10) it is clear that $\partial f(x) \subseteq X^*$ is always closed and convex. In fact, $\partial f(x)$ is $w^*$-closed. Note that $D(\partial f) \subseteq \operatorname{dom} f$. In the definition above, f need not be convex. However, a coherent theory and a remarkable calculus can be developed for convex functions (the convex subdifferential, for short); see Proposition 5.3.17.

We present three characteristic and important examples of subdifferentials.

Example 5.3.15. (a) Let X be a Banach space and let $f(x) = \|x\|$ for all $x \in X$. We show that

$$\partial f(x) = \begin{cases} \overline{B}_1^* = \{x^* \in X^* : \|x^*\|_* \le 1\} & \text{if } x = 0, \\ \{x^* \in X^* : \|x^*\|_* = 1, \ \langle x^*, x\rangle = \|x\|\} & \text{if } x \neq 0. \end{cases}$$

(5.3.11)

In order to prove (5.3.11), we first assume that $x = 0$. Then

$$x^* \in \partial f(0) \quad \text{if and only if} \quad \langle x^*, h\rangle \le \|h\| \text{ for all } h \in X.$$

(5.3.12)

From (5.3.12) and the definition of the dual norm, one has $\partial f(0) = \overline{B}_1^*$. Next suppose that $x \neq 0$. Let $C = \{x^* \in X^* : \|x^*\|_* = 1, \ \langle x^*, x\rangle = \|x\|\}$ and take $x^* \in C$. Then $\langle x^*, h\rangle \le \|h\|$ for all $h \in X$. Since $x^* \in C$, we get

$$\langle x^*, x + h\rangle - \langle x^*, x\rangle \le \|x + h\| - \|x\|.$$

Hence, $x^* \in \partial f(x)$ and so $C \subseteq \partial f(x)$. On the other hand, if $x^* \in \partial f(x)$, then choosing $h = -x$ in (5.3.10) gives $-\langle x^*, x\rangle \le -\|x\|$. This gives

$$\langle x^*, x\rangle \ge \|x\|.$$

(5.3.13)

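But, choosing $h = x$ in (5.3.10), we have $\|x\| = \|2x\| - \|x\| \ge \langle x^*, x\rangle$. Hence, $\langle x^*, x\rangle = \|x\|$; see (5.3.13). Moreover, for $h \in X$ and $\lambda > 0$, we have $\langle x^*, \lambda h\rangle \le \|x + \lambda h\| - \|x\|$. Hence,

$$\langle x^*, h\rangle \le \left\| \frac{x}{\lambda} + h \right\| - \left\| \frac{x}{\lambda} \right\| \quad \text{for all } \lambda > 0.$$

Sending $\lambda \to +\infty$, we obtain $\langle x^*, h\rangle \le \|h\|$ for all $h \in X$. Therefore, $\|x^*\|_* \le 1$, and together with $\langle x^*, x\rangle = \|x\|$ and $x \neq 0$ this yields $\|x^*\|_* = 1$. Thus, we conclude that $\partial f(x) = C$.

(b) Let X be a Banach space and $f(x) = \frac{1}{2}\|x\|^2$ for all $x \in X$. We claim that

$$\partial f(x) = F(x) = \{x^* \in X^* : \|x^*\|_* = \|x\|, \ \langle x^*, x\rangle = \|x\|^2\} \quad \text{for all } x \in X,$$

where F is the duality map; see Remark 3.1.51. So, let $x^* \in F(x)$. Then we obtain

$$\langle x^*, u - x\rangle = \langle x^*, u\rangle - \|x\|^2 \le \|x\|\|u\| - \|x\|^2 \le \frac{1}{2}\left[\|u\|^2 - \|x\|^2\right]$$

for all $u \in X$. Hence, $x^* \in \partial f(x)$ and so

$$F(x) \subseteq \partial f(x) \quad \text{for all } x \in X.$$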

(5.3.14)

On the other hand, if $x^* \in \partial f(x)$, then

$$\langle x^*, u - x\rangle \le \frac{1}{2}\left[\|u\|^2 - \|x\|^2\right] \quad \text{for all } u \in X.$$

(5.3.15)

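Let $u = x + \lambda h$ with $\lambda > 0$ and $h \in X$. Then

$$\langle x^*, h\rangle \le \frac{1}{2\lambda}\left[\|x + \lambda h\|^2 - \|x\|^2\right] \le \frac{1}{2\lambda}\left[(\|x\| + \lambda\|h\|)^2 - \|x\|^2\right] = \|x\|\|h\| + \frac{\lambda}{2}\|h\|^2 \quad \text{for all } \lambda > 0.$$

Letting $\lambda \to 0^+$, this implies

$$\langle x^*, h\rangle \le \|x\|\|h\|.$$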

(5.3.16)

Moreover, if we choose u = (1 − λ)x in (5.3.15), divide by λ > 0, and let λ → 0+ , then ⟨x∗ , x⟩ ≥ ‖x‖2 .

(5.3.17)

From (5.3.16) and (5.3.17) it follows that $\langle x^*, x\rangle = \|x\|^2 = \|x^*\|_*^2$ and so $\partial f(x) = F(x)$; see (5.3.14).


(c) Let X be a Banach space and let $C \subseteq X$ be a closed and convex set. The normal cone $N_C(x)$ to C at $x \in C$ is defined by

$$N_C(x) = \{x^* \in X^* : \langle x^*, u - x\rangle \le 0 \text{ for all } u \in C\}.$$

This is a closed, convex cone with $0 \in N_C(x)$ and $N_C(x) = \partial i_C(x)$ for all $x \in C$. Note that $D(\partial i_C) = C$ and $\partial i_C(x) = \{0\}$ if $x \in \operatorname{int} C$. If C is a vector subspace of X, then $\partial i_C(x) = C^\perp$ for all $x \in C$; see Definition 3.2.24. In general, the normal cone to C at $x \in C$ is the set of normal vectors to half-spaces which support C at x.

Remark 5.3.16. If $X = \mathbb{R}$, then from Example 5.3.15(a) we get for $f(x) = |x|$ with $x \in \mathbb{R}$ that

$$\partial f(x) = \begin{cases} \{-1\} & \text{if } x < 0, \\ \{v \in \mathbb{R} : |v| \le 1\} & \text{if } x = 0, \\ \{+1\} & \text{if } x > 0. \end{cases}$$

Proposition 5.3.17. If X is a normed space and $f : X \to \overline{\mathbb{R}}$ is subdifferentiable at every $x \in X$, then $f \in \Gamma_0(X)$.

Proof. First we show that f is convex. So, let $x^* \in \partial f(x)$ and $u, v \in X$. Then

$$\langle x^*, u - x\rangle \le f(u) - f(x) \quad \text{and} \quad \langle x^*, v - x\rangle \le f(v) - f(x)$$

and

$$\langle x^*, \lambda(u - x)\rangle \le \lambda[f(u) - f(x)], \qquad \langle x^*, (1-\lambda)(v - x)\rangle \le (1-\lambda)[f(v) - f(x)]$$

(5.3.18)

with λ ∈ (0, 1). Adding these two inequalities in (5.3.18), we obtain ⟨x∗ , λu + (1 − λ)v − x⟩ ≤ λf(u) + (1 − λ)f(v) − f(x) .

(5.3.19)

So, if $x = \lambda u + (1-\lambda)v$, then inequality (5.3.19) gives

$$f(\lambda u + (1-\lambda)v) \le \lambda f(u) + (1-\lambda)f(v),$$

that is, f is convex. Next we show that f is lower semicontinuous. So, let $x_n \to x$ in X and let $x^* \in \partial f(x)$. Then

$$\langle x^*, x_n - x\rangle \le f(x_n) - f(x) \quad \text{for all } n \in \mathbb{N}.$$

Therefore, $f(x) \le \liminf_{n\to\infty} f(x_n)$, that is, f is lower semicontinuous. We conclude that $f \in \Gamma_0(X)$.

One of the main uses of the subdifferential is to detect minimizers. In fact, directly from Definition 5.3.13, one has the following extension of the classical Fermat rule.

Proposition 5.3.18. If X is a normed space and $f : X \to \overline{\mathbb{R}}$ is proper and convex, then $x_0 \in X$ is a (global) minimizer of f if and only if $0 \in \partial f(x_0)$.

Proposition 5.3.19. If X is a Banach space and $f : X \to \overline{\mathbb{R}}$ is proper and convex, then $x^* \in \partial f(x)$ if and only if $f(x) + f^*(x^*) = \langle x^*, x\rangle$. Moreover, if $f \in \Gamma_0(X)$, then $x^* \in \partial f(x)$ if and only if $x \in \partial f^*(x^*)$.

Proof. Let $x^* \in \partial f(x)$. Then $\langle x^*, u - x\rangle \le f(u) - f(x)$ for all $u \in X$. Hence,

$$\langle x^*, u\rangle - f(u) + f(x) \le \langle x^*, x\rangle \quad \text{for all } u \in X.$$

This implies $f^*(x^*) + f(x) \le \langle x^*, x\rangle$ and so $f(x) + f^*(x^*) = \langle x^*, x\rangle$ according to the Young–Fenchel inequality; see Proposition 5.3.3(g). Now suppose that $f(x) + f^*(x^*) = \langle x^*, x\rangle$. The continuous affine function

$$u \mapsto a(u) = \langle x^*, u\rangle - f^*(x^*) = \langle x^*, u\rangle + f(x) - \langle x^*, x\rangle$$

is a minorant of f and $a(x) = f(x)$. Therefore, $x^* \in \partial f(x)$. Now we assume that $f \in \Gamma_0(X)$. Then from the first part we have $x \in \partial f^*(x^*)$ if and only if $f^*(x^*) + f^{**}(x) = f^*(x^*) + f(x) = \langle x^*, x\rangle$ (see Propositions 5.3.6 and 5.3.7) if and only if $x^* \in \partial f(x)$, again by the first part of the proof.

From Propositions 5.3.18 and 5.3.19 we deduce the following corollaries.

Corollary 5.3.20. If X is a Banach space and $f \in \Gamma_0(X)$, then f attains its infimum on X if and only if $\partial f^*(0) \cap X \neq \emptyset$.

Corollary 5.3.21. If X is a reflexive Banach space and $f \in \Gamma_0(X)$, then $\partial f^* = (\partial f)^{-1}$.

Recall that $D(\partial f) \subseteq \operatorname{dom} f$. Combining Proposition 5.1.24 and Proposition 5.2.15(b) we obtain the following.

Proposition 5.3.22. If X is a Banach space and $f \in \Gamma_0(X)$, then $\operatorname{int} \operatorname{dom} f \subseteq D(\partial f)$.

Directly from Definition 5.3.13 we can state the following proposition.

Proposition 5.3.23. If X is a normed space and $f, h : X \to \overline{\mathbb{R}}$, then the following hold:
(a) $\partial(\lambda f) = \lambda \partial f$ for all $\lambda > 0$;
(b) $\partial f + \partial h \subseteq \partial(f + h)$.

Next we are going to improve part (b) of the proposition above.

Theorem 5.3.24. If X is a normed space, $f, h \in \Gamma_0(X)$, and $\operatorname{dom} f \cap \operatorname{dom} h \neq \emptyset$ with one of the two functions being continuous at some $x_0 \in \operatorname{dom} f \cap \operatorname{dom} h$, then $\partial(f + h)(x) = \partial f(x) + \partial h(x)$ for all $x \in X$.

Proof. From Proposition 5.3.23(b) we already know that

$$\partial f(x) + \partial h(x) \subseteq \partial(f + h)(x)$$

for all x ∈ X .

(5.3.20)


So, let x∗ ∈ ∂(f + h)(x). Then ⟨x∗ , u − x⟩ + f(x) + h(x) ≤ f(u) + h(u) for all u ∈ X .

(5.3.21)

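We consider the following two convex subsets of $X \times \mathbb{R}$:

$$C_1 = \{(u, \lambda) \in X \times \mathbb{R} : f(u) - \langle x^*, u - x\rangle - f(x) \le \lambda\}, \qquad C_2 = \{(u, \lambda) \in X \times \mathbb{R} : \lambda \le h(x) - h(u)\}.$$

We assume that f is continuous at $x_0 \in \operatorname{dom} f \cap \operatorname{dom} h$. Note that if

$$g(u) = f(u) - \langle x^*, u - x\rangle - f(x) \quad \text{for } u \in X,$$

then $C_1 = \operatorname{epi} g$ and since g is continuous at $x_0$, we have that $\operatorname{int} C_1 \neq \emptyset$; see Proposition 5.1.24. Moreover, (5.3.21) implies that $\operatorname{int} C_1 \cap C_2 = \emptyset$. We now apply the First Separation Theorem (see Theorem 3.1.59), and find $(u^*, \eta) \in X^* \times \mathbb{R}$ with $(u^*, \eta) \neq 0$ such that

$$h(x) - h(u) \le \langle u^*, u\rangle + \eta \le f(u) - \langle x^*, u - x\rangle - f(x) \quad \text{for all } u \in X.$$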

(5.3.22)

If $u = x$, then $\eta = -\langle u^*, x\rangle$ (see (5.3.22)), and so

$$\langle u - x, -u^*\rangle \le h(u) - h(x) \quad \text{for all } u \in X,$$

(5.3.23)

$$\langle u - x, u^* + x^*\rangle \le f(u) - f(x) \quad \text{for all } u \in X.$$

(5.3.24)

From (5.3.23) we see that −u∗ ∈ ∂h(x) and from (5.3.24) we have u∗ + x∗ ∈ ∂f(x). Therefore, x∗ ∈ ∂(f + h)(x) has been decomposed as x∗ = (u∗ + x∗ ) + (−u∗ ) with u∗ + x∗ ∈ ∂f(x) and −u∗ ∈ ∂h(x). This means that ∂(f + h)(x) ⊆ ∂f(x) + ∂h(x). This and (5.3.20) imply that ∂(f + h)(x) = ∂f(x) + ∂h(x)

for all x ∈ X .
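As an illustration of how the sum rule of Theorem 5.3.24 combines with the Fermat rule of Proposition 5.3.18, consider minimizing $F(x) = |x| + \tfrac12 (x - a)^2$ on $\mathbb{R}$. Since the quadratic term is continuous everywhere, $\partial F(x) = \partial |x| + \{x - a\}$, so $0 \in \partial F(x)$ means $a - x \in \partial |x|$; with Remark 5.3.16 this gives the well-known soft-thresholding formula. The sketch below is an assumed example (the objective, function names, and grid are illustrative, not from the text) that compares the formula with a brute-force grid search.

```python
# Fermat rule + sum rule for F(x) = |x| + 0.5 * (x - a)**2 on R.
# Assumed illustrative example; all names here are hypothetical.

def objective(x, a):
    return abs(x) + 0.5 * (x - a) ** 2

def minimizer(a):
    # Solve 0 ∈ ∂|x| + (x - a), case by case:
    #   x > 0: 1 + x - a = 0   -> x = a - 1 (requires a > 1)
    #   x < 0: -1 + x - a = 0  -> x = a + 1 (requires a < -1)
    #   x = 0: a ∈ [-1, 1]
    if a > 1.0:
        return a - 1.0
    if a < -1.0:
        return a + 1.0
    return 0.0

def in_subdifferential_of_abs(v, x, tol=1e-12):
    # Membership test for ∂|.|(x) as described in Remark 5.3.16
    if x > 0:
        return abs(v - 1.0) <= tol
    if x < 0:
        return abs(v + 1.0) <= tol
    return abs(v) <= 1.0 + tol

def grid_minimizer(a, lo=-5.0, hi=5.0, n=10001):
    # Brute-force minimizer on an equally spaced grid (step 1e-3 by default)
    step = (hi - lo) / (n - 1)
    return min((lo + i * step for i in range(n)), key=lambda x: objective(x, a))
```

For each value of `a`, the point `minimizer(a)` satisfies the inclusion `a - x ∈ ∂|x|`, and the grid search lands within one grid step of it.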

Remark 5.3.25. By induction we can extend this to any finite set $\{f_k\}_{k=1}^N \subseteq \Gamma_0(X)$ and obtain

$$\partial\left(\sum_{k=1}^N f_k\right)(x) = \sum_{k=1}^N \partial f_k(x) \quad \text{for all } x \in X,$$

provided that $\bigcap_{k=1}^N \operatorname{dom} f_k \neq \emptyset$ and all but one of the functions are continuous at a point $x_0 \in \bigcap_{k=1}^N \operatorname{dom} f_k$. If f is lower semicontinuous and $f^*$ is Fréchet differentiable, then $f \in \Gamma_0(X)$.

Another subdifferential rule concerns composite functions.

Theorem 5.3.26. If X, Y are normed spaces, $A \in L(X, Y)$, $f \in \Gamma_0(Y)$, and there is a point $A(x_0) \in Y$ where f is continuous and finite, then $\partial(f \circ A)(x) = A^* \partial f(A(x))$ for all $x \in X$.

Proof. Evidently, $f \circ A \in \Gamma_0(X)$. Let $y^* \in \partial f(A(x))$. We obtain

$$\langle y^*, y - A(x)\rangle_Y \le f(y) - f(A(x)) \quad \text{for all } y \in Y.$$

(5.3.25)

Choosing $y = A(u)$ with $u \in X$ in (5.3.25) yields

$$\langle y^*, A(u) - A(x)\rangle_Y \le (f \circ A)(u) - (f \circ A)(x) \quad \text{for all } u \in X,$$

which implies

$$\langle A^*(y^*), u - x\rangle_X \le (f \circ A)(u) - (f \circ A)(x) \quad \text{for all } u \in X.$$

Hence $A^*(y^*) \in \partial(f \circ A)(x)$. So, we have proved that

$$A^*(\partial f(A(x))) \subseteq \partial(f \circ A)(x) \quad \text{for all } x \in X.$$

(5.3.26)

Now, let x∗ ∈ ∂(f ∘ A)(x). Then ⟨x∗ , u − x⟩X ≤ (f ∘ A)(u) − (f ∘ A)(x) for all u ∈ X .

(5.3.27)

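In $Y \times \mathbb{R}$ we consider the affine space

$$L = \{(A(u), \langle x^*, u - x\rangle + (f \circ A)(x)) : u \in X\}.$$

From (5.3.27) and the hypotheses of the theorem, we see that $L \cap (\operatorname{int} \operatorname{epi} f) = \emptyset$. So, as before, by the First Separation Theorem there exists $(y^*, \eta) \in Y^* \times \mathbb{R}$ such that the closed hyperplane H, which is the graph of the affine function $y \mapsto \langle y^*, y\rangle_Y + \eta$, separates L and $\operatorname{epi} f$. Since $L \subseteq H$, we get

$$\langle y^*, A(u)\rangle_Y + \eta = \langle x^*, u - x\rangle_X + (f \circ A)(x) \quad \text{for all } u \in X.$$

It follows that

$$\langle y^*, A(u)\rangle_Y = \langle x^*, u\rangle_X \quad \text{for all } u \in X,$$

(5.3.28)

$$\eta = (f \circ A)(x) - \langle x^*, x\rangle_X.$$

(5.3.29)

Equality (5.3.28) implies $A^*(y^*) = x^*$. Moreover, since $H \cap \operatorname{int} \operatorname{epi} f = \emptyset$, we obtain

$$\langle y^*, y - A(x)\rangle_Y \le f(y) - (f \circ A)(x) \quad \text{for all } y \in Y;$$

see (5.3.29). Hence, $y^* \in \partial f(A(x))$, which implies $x^* = A^*(y^*) \in A^*(\partial f(A(x)))$. Therefore,

$$\partial(f \circ A)(x) \subseteq A^*(\partial f(A(x))) \quad \text{for all } x \in X.$$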

(5.3.30)

From (5.3.26) and (5.3.30) we conclude that ∂(f ∘ A)(x) = A∗ (∂f(A(x))) for all x ∈ X. In the next proposition we give some more subdifferential calculus rules. They are easy consequences of the definition of the subdifferential.


Proposition 5.3.27. If X is a normed space and $f : X \to \overline{\mathbb{R}}$ is proper and convex, then the following hold:
(a) for $h(x) = f(x + x_0)$ with $x \in X$, we have $\partial h(x) = \partial f(x + x_0)$;
(b) for $h(x) = \lambda f(x)$ with $\lambda > 0$, we have $\partial h(x) = \lambda \partial f(x)$;
(c) for $h(x) = f(\lambda x)$ with $\lambda > 0$, we have $\partial h(x) = \lambda \partial f(\lambda x)$.

We will return to the properties of the convex subdifferential in Section 6.1, where we discuss maximal monotone maps. The convex subdifferential is a prime example of such a map.

In addition to the subdifferential of a proper, convex function, we can also define the approximate subdifferential, also called the ε-subdifferential, which is a useful tool on some occasions.

Definition 5.3.28. Let X be a normed space and let $f : X \to \overline{\mathbb{R}}$ be a proper, convex function. For each $\varepsilon \ge 0$, the ε-subdifferential of f at $x \in \operatorname{dom} f$ is defined to be the $w^*$-closed set

$$\partial_\varepsilon f(x) = \{x^* \in X^* : \langle x^*, u - x\rangle - \varepsilon \le f(u) - f(x) \text{ for all } u \in X\}.$$

Remark 5.3.29. When $\varepsilon = 0$, we recover the notion of the convex subdifferential; see Definition 5.3.13. However, there is a basic difference between the subdifferential ($\varepsilon = 0$) and the ε-subdifferential ($\varepsilon > 0$). The usual subdifferential $\partial f$ is a local notion while $\partial_\varepsilon f$ is a global one, that is, the behavior of f on all of X may be relevant to the construction of $\partial_\varepsilon f$. This explains why $\partial f$ and $\partial_\varepsilon f$ have in general different properties. The next proposition presents an important such difference.

Proposition 5.3.30. If X is a normed space and $f \in \Gamma_0(X)$, then for every $\varepsilon > 0$ and every $x \in \operatorname{dom} f$, we have $\partial_\varepsilon f(x) \neq \emptyset$.

Proof. Note that $(x, f(x) - \varepsilon) \notin \operatorname{epi} f$. Then by the Strong Separation Theorem (see Theorem 3.1.60), there exists $(u^*, \eta) \in X^* \times \mathbb{R}$ with $(u^*, \eta) \neq 0$ such that

$$\langle u^*, x\rangle + \eta(f(x) - \varepsilon) < \langle u^*, u\rangle + \eta\lambda \quad \text{for all } (u, \lambda) \in \operatorname{epi} f.$$

(5.3.31)

We choose $u = x$ and $\lambda = f(x)$ in (5.3.31) to get $\eta(-\varepsilon) < 0$, which implies $\eta > 0$. Without any loss of generality, we can assume that $\eta = 1$ by replacing $u^*$ with $\frac{1}{\eta}u^*$. Setting $x^* = -u^*$ and $\lambda = f(u)$ gives

$$\langle x^*, u - x\rangle - \varepsilon \le f(u) - f(x) \quad \text{for all } u \in X.$$

This shows that $x^* \in \partial_\varepsilon f(x)$.

The next proposition generalizes the formula in Proposition 5.2.15(c).

Proposition 5.3.31. If X is a normed space and $f \in \Gamma_0(X)$, then for any $x \in \operatorname{dom} f$ we have

$$f'_+(x; h) = \lim_{\varepsilon \searrow 0} \sigma(h; \partial_\varepsilon f(x)) \quad \text{for all } h \in X.$$

(5.3.32)

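Proof. Clearly, the family $\{\partial_\varepsilon f(x)\}_{\varepsilon > 0}$ decreases as $\varepsilon \searrow 0$. So, the limit on the right-hand side of (5.3.32) exists in $\mathbb{R}^*$. For $\varepsilon > 0$, $x^* \in \partial_\varepsilon f(x)$, $\lambda > 0$, and $h \in X$, we obtain

$$\langle x^*, h\rangle \le \frac{1}{\lambda}[f(x + \lambda h) - f(x) + \varepsilon].$$

Let $\lambda = \sqrt{\varepsilon}$. Then

$$\langle x^*, h\rangle \le \frac{1}{\sqrt{\varepsilon}}\left[f(x + \sqrt{\varepsilon}h) - f(x)\right] + \sqrt{\varepsilon}.$$

Taking the supremum over $x^* \in \partial_\varepsilon f(x)$ and letting $\varepsilon \searrow 0$, this implies

$$\lim_{\varepsilon \searrow 0} \sigma(h; \partial_\varepsilon f(x)) \le f'_+(x; h) \quad \text{for all } h \in X.$$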

(5.3.33)

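Evidently, we may assume that $f'_+(x; h) > -\infty$. Let $\vartheta \in \mathbb{R}$ be such that $\vartheta < f'_+(x; h)$ and let $\varepsilon > 0$. For $\lambda \in [0, 1]$ we obtain $f(x) + \lambda\vartheta \le f(x + \lambda h)$. In $X \times \mathbb{R}$ we consider the sets

$$K = \{(x, f(x) - \varepsilon) + \lambda(h, \vartheta) : 0 \le \lambda \le 1\} \quad \text{and} \quad \operatorname{epi} f.$$

Both are convex, K is compact, $\operatorname{epi} f$ is closed, and $K \cap \operatorname{epi} f = \emptyset$. So, by the Strong Separation Theorem there exists $(x^*, \eta) \in X^* \times \mathbb{R}$ with $(x^*, \eta) \neq 0$ such that

$$\langle x^*, x + \lambda h\rangle + \eta(f(x) - \varepsilon + \lambda\vartheta) < \langle x^*, u\rangle + \eta f(u)$$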

(5.3.34)

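for all $u \in \operatorname{dom} f$ and $0 \le \lambda \le 1$. In (5.3.34) we choose $u = x$ and $\lambda = 0$. Then $\eta(-\varepsilon) < 0$ and so $\eta > 0$. Now let $u = x$ and $\lambda = 1$ and divide by $\eta > 0$. We obtain

$$\vartheta - \varepsilon \le \left\langle -\frac{1}{\eta}x^*, h\right\rangle.$$

(5.3.35)

For given $v \in X$ with $x + v \in \operatorname{dom} f$, let $u = x + v$ and $\lambda = 0$. Dividing by $\eta > 0$, we get

$$\left\langle -\frac{1}{\eta}x^*, v\right\rangle \le f(x + v) - f(x) + \varepsilon.$$

Hence, $-\frac{1}{\eta}x^* \in \partial_\varepsilon f(x)$ and so, due to (5.3.35), $\vartheta - \varepsilon \le \sigma(h; \partial_\varepsilon f(x))$. Since $\vartheta < f'_+(x; h)$ was arbitrary, this gives

$$f'_+(x; h) - \varepsilon \le \sigma(h; \partial_\varepsilon f(x)) \quad \text{for all } h \in X \text{ and } \varepsilon > 0.$$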

(5.3.36)

From (5.3.33) and (5.3.36) we conclude that (5.3.32) holds.

Next we introduce a notion that is useful in optimization theory.

Definition 5.3.32. Let X be a normed space and $f \in \Gamma_0(X)$. The recession function $f^\infty$ of f is defined by

$$f^\infty(h) = \lim_{\lambda \to +\infty} \frac{f(x + \lambda h)}{\lambda} \quad \text{for all } x \in \operatorname{dom} f \text{ and for all } h \in X.$$

(5.3.37)

Remark 5.3.33. The function ξ(λ) = f(x + λh) is convex on ℝ. Hence, λ → (ξ(λ)− ξ(0))/λ is nondecreasing and therefore the limit in (5.3.37) exists.
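The definition and the monotonicity observation in Remark 5.3.33 can be illustrated numerically. The sketch below uses the assumed example $f(x) = \sqrt{1 + x^2}$ on $X = \mathbb{R}$ (not taken from the text), for which $f^\infty(h) = |h|$; the function names and the large parameter value are illustrative choices only.

```python
import math

# Recession function of the assumed example f(x) = sqrt(1 + x**2) on R.
# For this f one has f^∞(h) = |h|, independently of the base point x.

def f(x):
    return math.sqrt(1.0 + x * x)

def difference_quotient(x, h, lam):
    # (f(x + lam*h) - f(x)) / lam, nondecreasing in lam > 0 by convexity
    return (f(x + lam * h) - f(x)) / lam

def recession_estimate(x, h, lam=1e8):
    # f(x + lam*h) / lam for one large lam, approximating the limit in (5.3.37)
    return f(x + lam * h) / lam
```

Evaluating `difference_quotient` at increasing values of `lam` gives a nondecreasing sequence bounded above by `abs(h)`, while `recession_estimate` is close to `abs(h)` for every base point `x`.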


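Proposition 5.3.34. If X is a normed space and $f \in \Gamma_0(X)$, then

$$f^\infty(h) = \sup\left[\frac{f(x + \lambda h) - f(x)}{\lambda} : \lambda > 0\right] = \sup [f(x + h) - f(x) : x \in \operatorname{dom} f].$$

Proof. First we prove that

$$f^\infty(h) = \lim_{\lambda \to +\infty} \frac{f(x + \lambda h)}{\lambda} = \sup\left[\frac{f(x + \lambda h) - f(x)}{\lambda} : \lambda > 0\right]$$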

(5.3.38)

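for $x \in \operatorname{dom} f$ and $h \in X$. Note that

$$\lim_{\lambda \to +\infty} \frac{f(x + \lambda h)}{\lambda} = \lim_{\lambda \to +\infty} \frac{f(x + \lambda h) - f(x)}{\lambda} \le \sup\left[\frac{f(x + \lambda h) - f(x)}{\lambda} : \lambda > 0\right].$$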

(5.3.39)

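We need to show that the opposite inequality also holds. We fix $\lambda > 0$ and $t > \lambda$. The convexity of f gives

$$f(x + \lambda h) = f\left(\left(1 - \frac{\lambda}{t}\right)x + \frac{\lambda}{t}(x + th)\right) \le \left(1 - \frac{\lambda}{t}\right)f(x) + \frac{\lambda}{t}f(x + th).$$

Hence,

$$\frac{f(x + \lambda h) - f(x)}{\lambda} \le \frac{f(x + th) - f(x)}{t}.$$

This yields

$$\sup\left[\frac{f(x + \lambda h) - f(x)}{\lambda} : \lambda > 0\right] \le \lim_{t \to +\infty} \frac{f(x + th) - f(x)}{t} = \lim_{t \to +\infty} \frac{f(x + th)}{t} = f^\infty(h).$$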

(5.3.40)

From (5.3.39) and (5.3.40) we infer that (5.3.38) holds. In order to finish the proof of the proposition we need to show that, for every $u \in \operatorname{dom} f$ and $h \in X$, there holds

$$\sup [f(x + h) - f(x) : x \in \operatorname{dom} f] = \sup\left[\frac{f(u + \lambda h) - f(u)}{\lambda} : \lambda > 0\right].$$

(5.3.41)

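Let $x, u \in \operatorname{dom} f$ and $h \in X$. Since $f \in \Gamma_0(X)$ we obtain

$$\begin{aligned} f(x + h) &\le \liminf_{\lambda \to +\infty} f\left(\left(1 - \frac{1}{\lambda}\right)x + \frac{1}{\lambda}(u + \lambda h)\right) \\ &\le \liminf_{\lambda \to +\infty} \left[\left(1 - \frac{1}{\lambda}\right)f(x) + \frac{1}{\lambda}f(u + \lambda h)\right] \\ &= f(x) + \lim_{\lambda \to +\infty} \frac{f(u + \lambda h) - f(u)}{\lambda}. \end{aligned}$$

This implies

$$\sup [f(x + h) - f(x) : x \in \operatorname{dom} f] \le \sup\left[\frac{f(u + \lambda h) - f(u)}{\lambda} : \lambda > 0\right].$$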

(5.3.42)

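Let $\vartheta = \sup [f(x + h) - f(x) : x \in \operatorname{dom} f]$ and assume that $\vartheta < +\infty$. Then $x + h \in \operatorname{dom} f$ for every $x \in \operatorname{dom} f$ and so from $f(x + h) \le \vartheta + f(x)$ we deduce that

$$f(x + mh) = f(x) + \sum_{k=1}^m [f(x + kh) - f(x + (k-1)h)] \le f(x) + m\vartheta$$

(5.3.43)

for all $m \in \mathbb{N}_0$. Let $m \in \mathbb{N}_0$ and $n \in \mathbb{N}$. Then, the convexity of f and (5.3.43) imply

$$\begin{aligned} f\left(x + \frac{m}{n}h\right) &= f\left(\left(1 - \frac{1}{n}\right)x + \frac{1}{n}(x + mh)\right) \le \left(1 - \frac{1}{n}\right)f(x) + \frac{1}{n}f(x + mh) \\ &\le \left(1 - \frac{1}{n}\right)f(x) + \frac{1}{n}(f(x) + m\vartheta) = f(x) + \frac{m}{n}\vartheta. \end{aligned}$$

Exploiting the lower semicontinuity of f, we obtain

$$f(x + \lambda h) \le f(x) + \lambda\vartheta \quad \text{for all } \lambda \ge 0.$$

Hence,

$$\frac{f(x + \lambda h) - f(x)}{\lambda} \le \vartheta \quad \text{for all } \lambda > 0,$$

which gives

$$\sup\left[\frac{f(x + \lambda h) - f(x)}{\lambda} : \lambda > 0\right] \le \vartheta.$$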

(5.3.44)

From (5.3.42) and (5.3.44) it follows that (5.3.41) holds. So, we have proven the proposition.

Corollary 5.3.35. If X is a normed space and $f \in \Gamma_0(X)$, then $f^\infty$ is independent of $x \in \operatorname{dom} f$; see Definition 5.3.32.

Proposition 5.3.36. If X is a normed space and $f \in \Gamma_0(X)$, then $f^\infty \in \Gamma_0(X)$ and it is positively homogeneous.

Proof. For every $x \in \operatorname{dom} f$, the function $g(h) = f(x + h) - f(x)$ belongs to $\Gamma_0(X)$. Then invoking Proposition 5.3.34, we conclude that $f^\infty \in \Gamma_0(X)$. For the positive homogeneity, note that for given $h \in X$ and $t > 0$, we have

$$f^\infty(th) = \lim_{\lambda \to +\infty} \frac{f(x + \lambda th) - f(x)}{\lambda}.$$

We set $s = \lambda t$. Then

$$f^\infty(th) = t \lim_{s \to +\infty} \frac{f(x + sh) - f(x)}{s} = t f^\infty(h).$$

Proposition 5.3.37. If X is a normed space and f ∈ Γ0 (X), then f ∞ (h) + f ∞ (−h) ≥ 0 for all h ∈ X.


Proof. We assume that $f^\infty(h), f^\infty(-h) \in \mathbb{R}$, otherwise the inequality is clearly true. Taking Proposition 5.3.34 into account gives $x + h \in \operatorname{dom} f$ and $x - h \in \operatorname{dom} f$ for all $x \in \operatorname{dom} f$. Therefore, using once more Proposition 5.3.34 leads to

$$f^\infty(h) + f^\infty(-h) \ge \sup [f(x) - f(x - h) : x \in \operatorname{dom} f] + \sup [f(x - h) - f(x) : x \in \operatorname{dom} f] \ge 0.$$

Proposition 5.3.38. If X is a normed space, $f \in \Gamma_0(X)$, and $\inf_X f > -\infty$, then $f^\infty(h) \ge 0$ for all $h \in X$.

Proof. Let $m = \inf_X f > -\infty$. Let $x \in \operatorname{dom} f$ and $h \in X$. Then

$$f^\infty(h) = \lim_{\lambda \to +\infty} \frac{f(x + \lambda h)}{\lambda} \ge \lim_{\lambda \to +\infty} \frac{m}{\lambda} = 0.$$

Another easy consequence of Definition 5.3.32 is the following formula.

Proposition 5.3.39. If X is a normed space and $\{f_k\}_{k=1}^n \subseteq \Gamma_0(X)$ with $\bigcap_{k=1}^n \operatorname{dom} f_k \neq \emptyset$, then

$$\left(\sum_{k=1}^n f_k\right)^\infty = \sum_{k=1}^n f_k^\infty.$$

Remark 5.3.40. If $f \in \Gamma_0(X)$ is positively homogeneous of degree $p > 1$, that is, $f(\lambda x) = \lambda^p f(x)$ for all $\lambda > 0$ and for all $x \in X$, then we have

$$f^\infty(h) = \begin{cases} 0 & \text{if } f(h) = 0, \\ +\infty & \text{otherwise.} \end{cases}$$

On the other hand, if f is positively homogeneous of degree 1, then $f^\infty = f$.

5.4 Proximinal and Chebyshev Sets

In Theorem 3.5.18 we proved that every nonempty closed convex set in a Hilbert space has the unique best approximation property. In this section we examine such approximation properties for sets in general normed spaces, not necessarily Hilbert spaces.

Definition 5.4.1. Let X be a normed space and let $C \subseteq X$ be a nonempty set. The best approximation map $p_C : X \to 2^C$ is defined by

$$p_C(x) = \{c \in C : \|x - c\| = d(x, C)\}$$

for all x ∈ X .

If $p_C(x) \neq \emptyset$ for every $x \in X$, then C is said to be proximinal. If $p_C(x)$ is a singleton for every $x \in X$, then C is said to be a Chebyshev set. In that case $p_C : X \to C$ is called the metric projection.

Proposition 5.4.2. If X is a reflexive Banach space and $C \subseteq X$ is nonempty, closed, and convex, then $p_C(x) \neq \emptyset$ for every $x \in X$.

Proof. Let $\{c_n\}_{n \ge 1} \subseteq C$ be such that $\|x - c_n\| \searrow d(x, C)$. Evidently, $\{c_n\}_{n \ge 1} \subseteq C$ is bounded. So, by passing to a subsequence if necessary, we may assume that $c_n \xrightarrow{w} c \in C$. We obtain $\|x - c\| \le \liminf_{n \to \infty} \|x - c_n\| = d(x, C)$, which shows that $\|x - c\| = d(x, C)$ and so $c \in p_C(x)$.

Proposition 5.4.3. If X is a strictly convex Banach space and $C \subseteq X$ is nonempty, closed, and convex, then for every $x \in X$ the set $p_C(x)$ is either empty or a singleton.

Proof. Let $c, \hat{c} \in p_C(x)$. The strict convexity of X implies that, if $c \neq \hat{c}$, then $\|2x - (c + \hat{c})\| < 2d(x, C)$; see Definition 3.4.21(a). Hence,

$$\left\|x - \frac{1}{2}(c + \hat{c})\right\| < d(x, C),$$

a contradiction since $\frac{1}{2}(c + \hat{c}) \in C$. So, $p_C(x)$ is either empty or a singleton.

Combining Propositions 5.4.2 and 5.4.3 we can state the following result.

Corollary 5.4.4. If X is a reflexive and strictly convex Banach space and $C \subseteq X$ is nonempty, closed, and convex, then C is Chebyshev.

Using the notion of proximinality we can characterize reflexive Banach spaces.

Theorem 5.4.5. If X is a Banach space, then X is reflexive if and only if every nonempty, closed, convex set is proximinal.

Proof. ⇒: This implication is stated in Proposition 5.4.2.
⇐: If X is not reflexive, then on account of James' Theorem (see Theorem 3.3.41), there exists $x^* \in X^*$ with $\|x^*\|_* = 1$ such that $\langle x^*, x\rangle < 1$ for all $x \in X$ with $\|x\| \le 1$. Let $C = (x^*)^{-1}(1)$. Then $d(0, C) = 1 < \|x\|$ for all $x \in C$. But $C = (x^*)^{-1}(1)$ is nonempty, closed, and convex, and thus by hypothesis proximinal, a contradiction.

Proposition 5.4.6. If X is a Banach space and $C \subseteq X$ is a Chebyshev set, then the metric projection map $p_C : X \to C$ has a closed graph and is locally bounded.

Proof. Let $x_n \to x$ in X and $p_C(x_n) \to u \in C$. We have

$$\big|\, \|x_n - p_C(x_n)\| - \|x - p_C(x)\| \,\big| = |d(x_n, C) - d(x, C)| \le \|x_n - x\|.$$
Therefore, ‖x n − p C (x n )‖ ≤ ‖x n − x‖ + ‖x − p C (x)‖ .

(5.4.1)

This yields ‖x − u‖ ≤ ‖x − p C (x)‖, hence u = p C (x) and so Gr p C is closed. From (5.4.1) it is clear that x → p C (x) is locally bounded.
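The difference between proximinal and Chebyshev sets in Definition 5.4.1, and the role of strict convexity in Proposition 5.4.3, can be seen in a small finite-dimensional experiment. The following is a minimal sketch under assumptions of our own choosing: the space is $\mathbb{R}^2$, the set C is a sampled piece of the horizontal axis, and we compare the max-norm (not strictly convex) with the Euclidean norm (strictly convex). All names are hypothetical.

```python
import math

# Best approximations from x = (0, 1) to a sampled segment of the line R x {0}
# under two norms on R^2 (assumed illustrative setup, not from the text).

def dist_max(x, c):
    # Distance induced by the max-norm, which is not strictly convex
    return max(abs(x[0] - c[0]), abs(x[1] - c[1]))

def dist_euclid(x, c):
    # Distance induced by the Euclidean norm, which is strictly convex
    return math.hypot(x[0] - c[0], x[1] - c[1])

def best_approximations(x, candidates, dist, tol=1e-12):
    # Returns d(x, C) and the (sampled) set p_C(x) of nearest points
    d = min(dist(x, c) for c in candidates)
    return d, [c for c in candidates if dist(x, c) <= d + tol]

CANDIDATES = [(i / 100.0, 0.0) for i in range(-300, 301)]
X0 = (0.0, 1.0)

d_max, best_max = best_approximations(X0, CANDIDATES, dist_max)
d_euc, best_euc = best_approximations(X0, CANDIDATES, dist_euclid)
```

Under the max-norm every sampled point $(c, 0)$ with $|c| \le 1$ is a best approximation, so the set is proximinal at this point but the nearest point is not unique; under the Euclidean norm only the origin remains.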


Corollary 5.4.7. If X is a Banach space and $C \subseteq X$ is a locally compact Chebyshev set, then $p_C : X \to C$ is continuous.

Proof. Arguing by contradiction, suppose that $p_C$ is not continuous at $x_0$. Then there exist a sequence $\{x_n\}_{n \ge 1} \subseteq X$ and $\varepsilon > 0$ such that

$$x_n \to x_0 \text{ in } X \quad \text{and} \quad \|p_C(x_n) - p_C(x_0)\| \ge \varepsilon \quad \text{for all } n \in \mathbb{N}.$$

(5.4.2)

From Proposition 5.4.6, $p_C$ is locally bounded. So, it follows that $\{p_C(x_n)\}_{n \ge 1} \subseteq C$ is bounded. Since by hypothesis C is locally compact, we may assume that $p_C(x_n) \to u \in C$ and $u \neq p_C(x_0)$; see (5.4.2). Then we obtain $\|x_0 - u\| \le \|x_0 - p_C(x_0)\|$. Hence, $u \in p_C(x_0)$ and so $u = p_C(x_0)$ since C is Chebyshev, a contradiction.

Corollary 5.4.8. If X is a finite dimensional Banach space and $C \subseteq X$ is a Chebyshev set, then the metric projection map $p_C : X \to C$ is continuous.

Proof. Note that C is a closed subset of a locally compact space. Hence, C is locally compact and we can apply Corollary 5.4.7.

Proposition 5.4.9. If X is a reflexive Banach space and $C \subseteq X$ is a weakly closed Chebyshev set, then the metric projection map $p_C : X \to C$ is norm-to-weak continuous.

Proof. Suppose that $x_n \to x$ in X. Then $\|x_n - p_C(x_n)\| = d(x_n, C) \to d(x, C) = \|x - p_C(x)\|$. Since $\{p_C(x_n)\}_{n \ge 1} \subseteq C$ is bounded, we may assume that $p_C(x_n) \xrightarrow{w} u \in C$ since C is weakly closed. Then we obtain

$$\|x - u\| \le \liminf_{n \to \infty} \|x_n - p_C(x_n)\| = \|x - p_C(x)\|,$$

which shows that $u = p_C(x)$. Therefore, we have $p_C(x_n) \xrightarrow{w} p_C(x)$ for the original sequence and so we conclude that $p_C : X \to C$ is norm-to-weak continuous.

Corollary 5.4.10. If X is a reflexive Banach space with the Kadec–Klee property (see Definition 3.4.30), and $C \subseteq X$ is a weakly closed Chebyshev set, then the metric projection map $p_C : X \to C$ is continuous.

Proof. From the proof of Proposition 5.4.9 we know that $x_n \to x$ in X implies $x_n - p_C(x_n) \xrightarrow{w} x - p_C(x)$ and $\|x_n - p_C(x_n)\| \to \|x - p_C(x)\|$. So, by the Kadec–Klee property it follows that $x_n - p_C(x_n) \to x - p_C(x)$ in X, which implies $p_C(x_n) \to p_C(x)$ in X. Hence, $p_C$ is continuous.

Next we focus on Hilbert spaces and try to characterize the Chebyshev sets. We start with a technical duality formula that will be used in the sequel.

Lemma 5.4.11. If H is a Hilbert space, $C \subseteq H$ is a nonempty, closed subset, and $f = \frac{1}{2}\|\cdot\|^2 + i_C$, then $d_C^2 = \|\cdot\|^2 - 2f^*$, where $d_C(x) = \inf [\|x - c\| : c \in C]$.

Proof. We identify H and $H^*$; see Theorem 3.5.21. According to Definition 5.3.1 we see that

$$\begin{aligned} f^*(h) &= \sup\left[(h, x) - \tfrac{1}{2}\|x\|^2 : x \in C\right] = \sup\left[(h, x) - \tfrac{1}{2}(x, x) : x \in C\right] \\ &= \sup\left[\tfrac{1}{2}(h, h) + (h, x) - \tfrac{1}{2}(h, h) - \tfrac{1}{2}(x, x) : x \in C\right] \\ &= \tfrac{1}{2}(h, h) + \sup\left[-\tfrac{1}{2}(x, x) + (h, x) - \tfrac{1}{2}(h, h) : x \in C\right] \\ &= \tfrac{1}{2}(h, h) - \tfrac{1}{2}\inf\left[(x, x) - 2(h, x) + (h, h) : x \in C\right] = \tfrac{1}{2}(h, h) - \tfrac{1}{2}d_C^2(h) \end{aligned}$$

for all $h \in H$. Hence, $d_C^2 = \|\cdot\|^2 - 2f^*$.

Using this lemma we can characterize closed and convex sets in a Hilbert space by employing the distance function.

Theorem 5.4.12. If H is a Hilbert space and $C \subseteq H$ is nonempty and closed, then the following statements are equivalent:
(a) C is convex;
(b) $d_C^2$ is Fréchet differentiable;
(c) $d_C^2$ is Gateaux differentiable.

Proof. (a) ⇒ (b): Let $\varphi(u) = \frac{1}{2}d_C^2(u)$. We have

$$\varphi(u) = \inf\left[\tfrac{1}{2}\|u - c\|^2 : c \in C\right] = \inf\left[\tfrac{1}{2}\|u - c\|^2 + i_C(c) : c \in H\right] = \left(\tfrac{1}{2}\|\cdot\|^2 \oplus i_C\right)(u) \quad \text{for all } u \in H.$$

Then, by Proposition 5.3.9, we obtain

$$\varphi^*(u^*) = \frac{1}{2}\|u^*\|^2 + \sigma(u^*; C) \quad \text{for all } u^* \in H.$$

From the Young–Fenchel inequality (see Proposition 5.3.3(g)), it follows that

$$(u^*, u) - \frac{1}{2}\|u^*\|^2 - \sigma(u^*; C) \le \varphi(u) \quad \text{for all } u, u^* \in H.$$

(5.4.3)

Fix $v \in H$ and let $u^* = v - p_C(v)$. From the properties of the metric projection (see Proposition 3.5.20), one gets $0 \le (v - p_C(v), p_C(v) - c)$ for all $c \in C$. This implies

$$\sigma(u^*; C) = \sup [(u^*, c) : c \in C] = \sup [(v - p_C(v), c) : c \in C] = (v - p_C(v), p_C(v)).$$

(5.4.4)

We return to (5.4.3) and use (5.4.4). We obtain

$$(v - p_C(v), u - p_C(v)) - \frac{1}{2}\|v - p_C(v)\|^2 \le \varphi(u),$$


which implies

$$(v - p_C(v), u - p_C(v)) - \varphi(v) \le \varphi(u).$$

This finally gives

$$\varphi(u) - \varphi(v) \ge (v - p_C(v), u - p_C(v)) - 2\varphi(v) = (v - p_C(v), u - p_C(v)) - \|v - p_C(v)\|^2 = (v - p_C(v), u - v).$$

Therefore,

$$0 \le \varphi(u) - \varphi(v) - (v - p_C(v), u - v).$$

(5.4.5)

Reversing the roles of u and v, we also have 0 ≤ φ(v) − φ(u) − (u − p C (u), v − u) .

(5.4.6)

From (5.4.5) and (5.4.6) it follows that

$$0 \le \varphi(u) - \varphi(v) - (v - p_C(v), u - v) \le 2\|u - v\|^2$$

and so

$$0 \le \frac{\varphi(u) - \varphi(v) - (v - p_C(v), u - v)}{\|u - v\|} \le 2\|u - v\|.$$

This shows that $\varphi'(v) = v - p_C(v)$ and so $\varphi \in C^1(H)$.

(b) ⇒ (c): This is immediate.

(c) ⇒ (a): Let $f = \frac{1}{2}\|\cdot\|^2 + i_C$. Taking Lemma 5.4.11 into account yields $f^* = \frac{1}{2}\left[\|\cdot\|^2 - d_C^2\right]$. Hence, $f^*$ is Gateaux differentiable. So, by the Kadec–Klee property of Hilbert spaces it follows that the Gateaux derivative of $d_C^2$ is continuous, hence $d_C^2$ is Fréchet differentiable. Then so is $f^*$ and this implies the convexity of $C = \operatorname{dom} f$; see Remark 5.3.25.

Finally we mention a result of Vlasov [297] on Chebyshev sets. For a proof of this theorem we refer to Giles [126, p. 245].

Theorem 5.4.13. If X is a Banach space with strictly convex dual space, then every Chebyshev set with continuous metric projection is convex.

Remark 5.4.14. In a Hilbert space a Chebyshev set is convex if and only if the metric projection map is nonexpansive. Moreover, in a Hilbert space a weakly closed Chebyshev set is convex.
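The derivative formula $\varphi'(v) = v - p_C(v)$ obtained in the proof of Theorem 5.4.12, together with the two-sided estimate derived from (5.4.5) and (5.4.6), can be checked numerically in $H = \mathbb{R}^2$. The sketch below assumes, purely for illustration, that C is the unit square $[0,1]^2$, whose metric projection is coordinatewise clipping; the helper names and the finite-difference step are our own choices.

```python
# Check of phi'(v) = v - p_C(v) for phi = 0.5 * d_C**2 in R^2.
# Assumed example: C = [0,1]^2, so p_C is coordinatewise clipping.

LO, HI = (0.0, 0.0), (1.0, 1.0)

def project_box(v, lo=LO, hi=HI):
    # Metric projection onto the box [lo, hi] in the Euclidean norm
    return tuple(min(max(vi, l), u) for vi, l, u in zip(v, lo, hi))

def phi(v):
    # phi(v) = 0.5 * d_C(v)**2
    p = project_box(v)
    return 0.5 * sum((vi - pi) ** 2 for vi, pi in zip(v, p))

def grad_phi(v):
    # Gradient predicted by the theorem: v - p_C(v)
    p = project_box(v)
    return tuple(vi - pi for vi, pi in zip(v, p))

def numeric_grad(v, eps=1e-6):
    # Central finite differences, one coordinate at a time
    g = []
    for i in range(len(v)):
        vp, vm = list(v), list(v)
        vp[i] += eps
        vm[i] -= eps
        g.append((phi(vp) - phi(vm)) / (2.0 * eps))
    return tuple(g)

def remainder(u, v):
    # phi(u) - phi(v) - (v - p_C(v), u - v), which lies in [0, 2*||u - v||^2]
    g = grad_phi(v)
    return phi(u) - phi(v) - sum(gi * (ui - vi) for gi, ui, vi in zip(g, u, v))
```

For points away from the boundary of the box, `numeric_grad` agrees with `grad_phi` up to finite-difference error, and `remainder(u, v)` stays between 0 and $2\|u - v\|^2$ for the sampled pairs.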

5.5 Smoothness of the Norm

In this section we present some basic results on the differentiability properties of the norms of Banach spaces. These properties provide important information about the geometry of the Banach space. We start with a basic duality theorem from the theory of convex sets. First we state a definition.

Definition 5.5.1. Let X be a topological vector space and $A \subseteq X$. The polar of A is the set $A^\circ \subseteq X^*$ defined by

$$A^\circ = \{x^* \in X^* : \langle x^*, a\rangle \le 1 \text{ for all } a \in A\}.$$

Given a subset $C \subseteq X^*$, the prepolar of C is the set ${}^\circ C \subseteq X$ defined by

$${}^\circ C = \{x \in X : \langle c, x\rangle \le 1 \text{ for all } c \in C\}.$$

The next theorem is known as the "Bipolar Theorem." It is the basic result concerning polars of sets in Banach spaces.

Theorem 5.5.2 (Bipolar Theorem). If X is a Banach space and $A \subseteq X$ as well as $C \subseteq X^*$, then the following hold:
(a) ${}^\circ(A^\circ)$ is the closed convex hull of $\{0\} \cup A$, that is, ${}^\circ(A^\circ) = \overline{\operatorname{conv}}\,[\{0\} \cup A]$;
(b) $({}^\circ C)^\circ$ is the $w^*$-closed convex hull of $\{0\} \cup C$, that is, $({}^\circ C)^\circ = \overline{\operatorname{conv}}^{\,w^*}[\{0\} \cup C]$.

Proof. (a) Clearly, it holds that $\{0\} \cup A \subseteq {}^\circ(A^\circ)$. But the set ${}^\circ(A^\circ)$ is closed and convex; see Definition 5.5.1. Therefore,

$$\overline{\operatorname{conv}}\,[\{0\} \cup A] \subseteq {}^\circ(A^\circ).$$

(5.5.1)

On the other hand, every closed half-space that contains $\{0\} \cup A$ also contains ${}^\circ(A^\circ)$. Hence,

$$\overline{\operatorname{conv}}\,[\{0\} \cup A] \supseteq {}^\circ(A^\circ).$$

(5.5.2)

Recall that $\overline{\operatorname{conv}}\, D$ is the intersection of all closed half-spaces in X that contain D. From (5.5.1) and (5.5.2), it follows that

$${}^\circ(A^\circ) = \overline{\operatorname{conv}}\,[\{0\} \cup A].$$

(b) This follows from part (a) above.

Corollary 5.5.3. If X is a Banach space and $A \subseteq X$, then span A is dense in X if and only if $x^* = 0$ is the only functional that vanishes on A.

Corollary 5.5.4. If X is a Banach space and $C \subseteq X^*$, then C separates points in X, that is, C is total, if and only if span C is $w^*$-dense in $X^*$.

The Bipolar Theorem (see Theorem 5.5.2) is used to recognize dual norms.

Proposition 5.5.5. If $(X, \|\cdot\|)$ is a Banach space and $|\cdot|_*$ is a norm on $X^*$ equivalent to the dual norm $\|\cdot\|_*$ of $\|\cdot\|$, then $|\cdot|_*$ is a dual norm to some norm $|\cdot|$ on X equivalent to $\|\cdot\|$ if and only if $|\cdot|_*$ is $w^*$-lower semicontinuous on $X^*$.


Proof. ⇒: By hypothesis, $|x^*|_* = \sup [\langle x^*, x\rangle : |x| \le 1]$. So, $|\cdot|_*$ is the supremum of $w^*$-continuous linear functionals. Hence, $|\cdot|_*$ is $w^*$-lower semicontinuous.
⇐: Let $B_1 = \{x^* \in X^* : |x^*|_* \le 1\}$. Then $B_1$ is $w^*$-closed and by the Bipolar Theorem (see Theorem 5.5.2), we obtain $B_1 = ({}^\circ B_1)^\circ$. Then $|\cdot|_*$ is the dual norm to the equivalent norm given by the Minkowski functional of ${}^\circ B_1$.

Proposition 5.5.6. If $(X, \|\cdot\|)$ is a Banach space and if $x \in X$ with $\|x\| = 1$, then the following hold:
(a) $\|\cdot\|$ is Fréchet differentiable at x if and only if $\|x_n^* - u_n^*\|_* \to 0$ whenever $x_n^*, u_n^* \in X^*$ with $\|x_n^*\|_* = \|u_n^*\|_* = 1$ satisfy

$$\lim_{n \to \infty} \langle x_n^*, x\rangle = \lim_{n \to \infty} \langle u_n^*, x\rangle = 1;$$

(5.5.3)

(b) $\|\cdot\|$ is Gateaux differentiable at x if and only if $x_n^* - u_n^* \xrightarrow{w^*} 0$ whenever $x_n^*, u_n^* \in X^*$ with $\|x_n^*\|_* = \|u_n^*\|_* = 1$ fulfill (5.5.3).

Proof. (a) ⇒: Since by hypothesis $\|\cdot\|$ is Fréchet differentiable at x, for a given $\varepsilon > 0$ there exists $\delta > 0$ such that

$$\|x + h\| + \|x - h\| \le 2 + \varepsilon\|h\| \quad \text{for all } \|h\| \le \delta;$$

(5.5.4)

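see Proposition 5.2.10. Let $x_n^*, u_n^* \in X^*$ with $\|x_n^*\|_* = \|u_n^*\|_* = 1$ such that (5.5.3) is satisfied. We can find a number $n_0 \in \mathbb{N}$ such that
\[ \max\{|\langle x_n^*, x\rangle - 1|,\ |\langle u_n^*, x\rangle - 1|\} \le \varepsilon\delta \quad \text{for all } n \ge n_0 . \tag{5.5.5} \]
Then, due to (5.5.4) and (5.5.5), we obtain, for $n \ge n_0$ and $\|h\| \le \delta$, that
\[
\begin{aligned}
\langle x_n^* - u_n^*, h\rangle &= \langle x_n^*, x + h\rangle + \langle u_n^*, x - h\rangle - \langle x_n^* + u_n^*, x\rangle \\
&\le \|x + h\| + \|x - h\| - \langle x_n^* + u_n^*, x\rangle \\
&\le 2 + \varepsilon\|h\| - \langle x_n^* + u_n^*, x\rangle \le 3\varepsilon\delta .
\end{aligned}
\tag{5.5.6}
\]
Taking (5.5.6) into account, we get for $n \ge n_0$ that
\[ \|x_n^* - u_n^*\|_* = \sup[\langle x_n^* - u_n^*, y\rangle : \|y\| = 1] = \sup\Big[\frac{\langle x_n^* - u_n^*, \delta y\rangle}{\delta} : \|y\| = 1\Big] \le 3\varepsilon . \]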

Hence, $\lim_{n\to\infty} \|x_n^* - u_n^*\|_* = 0$.

⇐: Arguing by contradiction, suppose that $\|\cdot\|$ is not Fréchet differentiable at $x$. Applying Proposition 5.2.10, there exist $\varepsilon > 0$ and a sequence $\{h_n\}_{n\ge 1} \subseteq X$ with $h_n \to 0$ in $X$ such that
\[ \|x + h_n\| + \|x - h_n\| \ge 2 + \varepsilon\|h_n\| \quad \text{for all } n \in \mathbb{N} . \]
Let us choose $x_n^*, u_n^* \in X^*$ with $\|x_n^*\|_* = \|u_n^*\|_* = 1$ such that
\[ \langle x_n^*, x + h_n\rangle = \|x + h_n\| \quad \text{and} \quad \langle u_n^*, x - h_n\rangle = \|x - h_n\| \quad \text{for all } n \in \mathbb{N} . \tag{5.5.7} \]
We obtain
\[ |\langle x_n^*, h_n\rangle| \le \|h_n\| \quad \text{and} \quad \big|\|x + h_n\| - \|x\|\big| \le \|h_n\| \quad \text{for all } n \in \mathbb{N} . \]
Hence, $\langle x_n^*, h_n\rangle \to 0$ and $\|x + h_n\| \to 1$ as $n \to \infty$. Because of (5.5.7) we then derive
\[ \lim_{n\to\infty} \langle x_n^*, x\rangle = \lim_{n\to\infty} [\langle x_n^*, x + h_n\rangle - \langle x_n^*, h_n\rangle] = \lim_{n\to\infty} [\|x + h_n\| - \langle x_n^*, h_n\rangle] = 1 . \]

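Similarly, we show that $\lim_{n\to\infty} \langle u_n^*, x\rangle = 1$. On the other hand, (5.5.7) together with $\langle x_n^*, x\rangle \le 1$ and $\langle u_n^*, x\rangle \le 1$ gives
\[ \langle x_n^* - u_n^*, h_n\rangle = \|x + h_n\| + \|x - h_n\| - \langle x_n^* + u_n^*, x\rangle \ge \varepsilon\|h_n\| , \]
so that $\|x_n^* - u_n^*\|_* \ge \varepsilon$ for all $n \in \mathbb{N}$ (as $h_n \ne 0$ may be assumed), which contradicts the hypothesis.

(b) This equivalence is proved in the same way as (a), this time using Proposition 5.2.11.

Remark 5.5.7. Note that the second statement in (a) is equivalent to the following one: every sequence $\{x_n^*\}_{n\ge 1} \subseteq X^*$ with $\|x_n^*\|_* = 1$ is convergent whenever $\langle x_n^*, x\rangle \to 1$. Similarly, the second statement in (b) can be written equivalently as follows: there exists a unique $x^* \in X^*$ with $\|x^*\|_* = 1$ such that $\langle x^*, x\rangle = 1$.

Definition 5.5.8. Let $(X, \|\cdot\|)$ be a Banach space. We say that $\|\cdot\|$ is Fréchet (resp. Gateaux) differentiable if $\|\cdot\|$ is Fréchet (resp. Gateaux) differentiable at every $x \in X \setminus \{0\}$.

Remark 5.5.9. The norm $\|\cdot\|$ is never differentiable at $x = 0$. Differentiability conditions for a norm are homogeneous, that is, $\|\cdot\|$ is differentiable at $x$ if and only if it is differentiable at $\lambda x$ with $\lambda \in \mathbb{R} \setminus \{0\}$. So, it is enough to check differentiability at points $x \in X$ with $\|x\| = 1$.

Proposition 5.5.10. If $X$ is a Banach space with a Fréchet differentiable norm $\|\cdot\|$, then $\|\cdot\| \in C^1(X \setminus \{0\})$.

Proof. Let $\{x_n\}_{n\ge 1} \cup \{x\} \subseteq X \setminus \{0\}$ and assume that $x_n \to x$. Let $\varphi(u) = \|u\|$ for all $u \in X \setminus \{0\}$ and let $x_n^* = \varphi'(x_n)$ as well as $x^* = \varphi'(x)$. We have
\[ \|x_n^*\|_* = \|x^*\|_* = 1 , \quad \langle x_n^*, x_n\rangle = \|x_n\| \quad \text{and} \quad \langle x^*, x\rangle = \|x\| ; \]
see Example 5.3.15(a). Let $v_n = x_n/\|x_n\|$ and $v = x/\|x\|$. Then
\[ \langle x_n^*, v_n\rangle = 1 \quad \text{and} \quad \langle x^*, v\rangle = 1 . \tag{5.5.8} \]
Note that, since $|\langle x_n^*, v\rangle - \langle x_n^*, v_n\rangle| \le \|v - v_n\| \to 0$,
\[ \langle x_n^*, v\rangle \to 1 \quad \text{as } n \to \infty . \tag{5.5.9} \]
Then, (5.5.8), (5.5.9) and Proposition 5.5.6 (see also Remark 5.5.7) imply that $x_n^* \to x^*$ in $X^*$. This proves that $\varphi(\cdot) = \|\cdot\| \in C^1(X \setminus \{0\})$.

Theorem 5.5.11. If $X$ is a Banach space and $X^*$ has a Fréchet differentiable norm, then $X$ is reflexive.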
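The criteria of Proposition 5.5.6(a) and Remark 5.5.7 can be seen at work in the simplest setting of a real Hilbert space $H$. The computation below is an illustration only: the inner product $(\cdot,\cdot)_H$, the Riesz identification of $H^*$ with $H$, and the representing vectors $y_n$ are assumptions introduced here and are not taken from the text.

```latex
% Assumptions: H is a real Hilbert space, \|x\| = 1, and each x_n^* \in H^*
% with \|x_n^*\|_* = 1 is represented by y_n \in H with \|y_n\| = 1,
% i.e. \langle x_n^*, h\rangle = (y_n, h)_H for all h \in H.
\begin{align*}
\langle x_n^*, x\rangle = (y_n, x)_H \to 1
\quad \Longrightarrow \quad
\|y_n - x\|^2 &= \|y_n\|^2 - 2\,(y_n, x)_H + \|x\|^2 \\
              &= 2 - 2\,(y_n, x)_H \to 0 .
\end{align*}
% So every such sequence converges in norm, to the functional (x, \cdot)_H.
% For general x \ne 0 the derivative of \varphi = \|\cdot\| is
%   \varphi'(x) = (x/\|x\|, \cdot)_H ,
% which depends continuously on x \in H \setminus \{0\}.
```

In this setting the norm is therefore Fréchet differentiable away from the origin with continuous derivative, in agreement with Proposition 5.5.10. Since $H^*$ is again a Hilbert space, this is also consistent with Theorem 5.5.11, because Hilbert spaces are reflexive.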


Proof. According to James’ Theorem (see Theorem 3.3.41), it suffices to show that every $x^* \in X^*$ attains its norm on $B_1^X = \{x \in X : \|x\| \le 1\}$. So, suppose that $x^* \in X^*$ with $\|x^*\|_* = 1$ and choose $\{x_n\}_{n\ge 1} \subseteq \partial B_1^X$ such that