INTEGRAL AND FUNCTIONAL ANALYSIS (UPDATED EDITION) 9781536196177, 1536196177


English Pages [414] Year 2021


Table of contents:
INTEGRAL AND FUNCTIONAL ANALYSIS (UPDATED EDITION)
Contents
Preface
Acknowledgments
Chapter 1 Preliminaries
1.1 Sets, Relations, Functions, Cardinals and Ordinals
1.2 Reals, Some Basic Theorems and Sequence Limits
Problems
Chapter 2 Riemann Integrals
2.1 Definitions, Examples and Basic Properties
2.2 Algebraic Operations and the Darboux Criterion
2.3 Fundamental Theorem of Calculus
2.4 Improper Integrals
Problems
Chapter 3 Riemann-Stieltjes Integrals
3.1 Functions of Bounded Variation
3.2 Definition and Basic Properties
3.3 Nonexistence and Existence for Integrals
3.4 Evaluations of Integrals
3.5 Improper Situations
Problems
Chapter 4 Lebesgue-Radon-Stieltjes Integrals
4.1 Foundational Material
4.2 Essential Properties
4.3 Convergence Theorems
4.4 Extension via Measurability
4.5 Double, Iterated and Generic Integrals
Problems
Chapter 5 Absolute Continuities in Lebesgue Integrals
5.1 Lebesgue’s Outer Measure and Vitali’s Covering
5.2 Derivatives of Increasing Functions
5.3 Absolutely Continuous Functions
5.4 Cantor’s Ternary Set and Singular Function
5.5 Lebesgue’s Points
Problems
Chapter 6 Metric Spaces
6.1 Metrizable Topology and Connectedness
6.2 Completeness
6.3 Compactness, Density and Separability
Problems
Chapter 7 Continuous Mappings
7.1 Criteria for Continuity
7.2 Continuous Mappings over Compact or Connected Metric Spaces
7.3 Sequences of Mappings
7.4 Contractions
7.5 Structures of Metric Spaces
Problems
Chapter 8 Normed Linear Spaces
8.1 Linear Spaces, Norms and Quotient Spaces
8.2 Finite Dimensional Spaces
8.3 Bounded Linear Operators
8.4 Linear Functionals via Hahn-Banach Extension
Problems
Chapter 9 Banach Spaces via Operators and Functionals
9.1 Definition and Beginning Examples
9.2 Uniform Boundedness - Open Map - Closed Graph
9.3 Dual Banach Spaces by Examples
9.4 Weak and Weak* Topologies
9.5 Compact and Dual Operators
Problems
Chapter 10 Hilbert Spaces and Their Operators
10.1 Definition, Examples and Basic Properties
10.2 Orthogonality, Orthogonal Complement and Duality
10.3 Orthonormal Sets and Bases
10.4 Five Special Bounded Operators
10.5 Compact Operators via Spectrum
Problems
Hints or Solutions
1 Preliminaries
3 Riemann-Stieltjes Integrals
4 Lebesgue-Radon-Stieltjes Integrals
5 Absolute Continuities in Lebesgue Integrals
6 Metric Spaces
7 Continuous Mappings
8 Normed Linear Spaces
9 Banach Spaces via Operators and Functionals
10 Hilbert Spaces and Their Operators
2 Riemann Integrals
References
About the Author
Index


MATHEMATICS RESEARCH DEVELOPMENTS

INTEGRAL AND FUNCTIONAL ANALYSIS (UPDATED EDITION)

No part of this digital document may be reproduced, stored in a retrieval system or transmitted in any form or by any means. The publisher has taken reasonable care in the preparation of this digital document, but makes no expressed or implied warranty of any kind and assumes no responsibility for any errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of information contained herein. This digital document is sold with the clear understanding that the publisher is not engaged in rendering legal, medical or any other professional services.

MATHEMATICS RESEARCH DEVELOPMENTS Additional books and e-books in this series can be found on Nova’s website under the Series tab.


JIE XIAO

Copyright © 2021 by Nova Science Publishers, Inc. All rights reserved. No part of this book may be reproduced, stored in a retrieval system or transmitted in any form or by any means: electronic, electrostatic, magnetic, tape, mechanical photocopying, recording or otherwise without the written permission of the Publisher. We have partnered with Copyright Clearance Center to make it easy for you to obtain permissions to reuse content from this publication. Simply navigate to this publication’s page on Nova’s website and locate the “Get Permission” button below the title description. This button is linked directly to the title’s permission page on copyright.com. Alternatively, you can visit copyright.com and search by title, ISBN, or ISSN. For further questions about using the service on copyright.com, please contact: Copyright Clearance Center Phone: +1-(978) 750-8400 Fax: +1-(978) 750-4470 E-mail: [email protected].

NOTICE TO THE READER The Publisher has taken reasonable care in the preparation of this book, but makes no expressed or implied warranty of any kind and assumes no responsibility for any errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of information contained in this book. The Publisher shall not be liable for any special, consequential, or exemplary damages resulting, in whole or in part, from the readers’ use of, or reliance upon, this material. Any parts of this book based on government reports are so indicated and copyright is claimed for those parts to the extent applicable to compilations of such works. Independent verification should be sought for any data, advice or recommendations contained in this book. In addition, no responsibility is assumed by the Publisher for any injury and/or damage to persons or property arising from any methods, products, instructions, ideas or otherwise contained in this publication. This publication is designed to provide accurate and authoritative information with regard to the subject matter covered herein. It is sold with the clear understanding that the Publisher is not engaged in rendering legal or any other professional services. If legal or any other expert assistance is required, the services of a competent person should be sought. FROM A DECLARATION OF PARTICIPANTS JOINTLY ADOPTED BY A COMMITTEE OF THE AMERICAN BAR ASSOCIATION AND A COMMITTEE OF PUBLISHERS. Additional color graphics may be available in the e-book version of this book.

Library of Congress Cataloging-in-Publication Data

Published by Nova Science Publishers, Inc., New York


Preface

Since the publication of the textbook Integral and Functional Analysis (IaFA) in 2008, I have received many helpful comments from colleagues and students. Not only have typos been corrected, but some interesting topics have also been added as enhancements. Below is a revision of the original preface.

This book is based on three closely related courses, (a) Lebesgue Integration, (b) Integration and Metric Spaces, and (c) Functional Analysis, which I have offered at Memorial University since 2002. Though the part on Functional Analysis has been used for both an undergraduate course and a graduate course, this textbook is designed primarily for senior undergraduate students. The prerequisites are deliberately modest: it is assumed that students have some familiarity with Introductory Calculus and Linear Algebra, plus the basic (direct and indirect) proof methods. I have striven to give an exposition that is at the same time introductory and modern in spirit, yet addresses the classical concerns of these three Analysis courses. This leads to my aim in writing this textbook: to present the basic ideas and results of this area in a natural sequence, while covering enough material to provide a firm base for later studies in Advanced Analysis and Partial Differential Equations.

The textbook comprises two parts, (i) Integral Analysis and (ii) Functional Analysis, with results from the first part used partly to motivate problems discussed in the second; the textbook can be covered from beginning to end in two semesters.

The first part (Chapters 1, 2, 3, 4, 5), Integral Analysis, concerns a mathematical theory that defines reasonable integrals for functions of different kinds. The integral of a function generalizes area, mass, volume, sum, and total. There are several possible definitions of integration with different technical underpinnings. In this part, we start with the basic arithmetic and topological properties of the real number system. Then we discuss Riemann integration, including some essential properties and the fundamental theorem of calculus, and use it as a model to handle both Riemann-Stieltjes integration via bounded variation and Lebesgue-Radon-Stieltjes integration, covering the three convergence theorems, Fubini-Tonelli’s theorem, and absolutely continuous functions within Lebesgue integration.

The second part (Chapters 6, 7, 8, 9, 10), Functional Analysis, concerns a mathematical theory of function spaces and their functionals and operators. It has its historical roots in the investigation of transformations such as the Fourier transform and in the study of differential and integral equations. The notion of a functional goes back to the calculus of variations, where it means a function whose argument is itself a function. In this part, motivated by the first part, we first work with more abstract metric spaces and their continuous mappings, and then introduce elements of functional analysis, including Banach spaces and Hilbert spaces, the latter having norms that arise from inner products. Of course, we also discuss major and foundational results of this theory, such as the Hahn-Banach theorem, which extends a functional from a subspace to the full space in a norm-preserving fashion, the uniform boundedness principle, the open mapping theorem and closed graph theorem, and the approximation of compact operators on complex Hilbert spaces.

While learning IaFA, students are strongly encouraged to follow the important principle that the best way to learn mathematics is to do mathematics. Moreover, students are urged to acquire the habit of studying with paper and pen or pencil in hand (plus a desktop or laptop computer); in this way mathematics will become increasingly meaningful to them.
Each chapter is followed by a series of exercises on the material presented in that chapter. Although brief hints or complete solutions for these exercises are included, students are encouraged to attempt most of the exercises before checking the hints or solutions. A list of references is provided; those textbooks, which were used while preparing the courses, also supplement this one. Last but not least, an index is attached.

Jie Xiao
September 2020 to January 2021

Acknowledgments

I wish to thank all the colleagues who supported the preparation of the original textbook and its subsequent revisions. Next, I am grateful to the students of the Department of Mathematics and Statistics at Memorial University, who actively participated in the three courses and gave helpful comments and suggestions on the textbook. Last but not least, I would like to thank Nova Science Publishers for continuing to publish this updated textbook.

Chapter 1

Preliminaries

The objective of this two-section chapter is to lay an appropriate groundwork for developing the theory of integrals and functionals. The first section establishes some conventions concerning sets, relations, functions, and cardinal and ordinal numbers. The second section collects the basic properties (taken as axioms) of the real numbers, including their arithmetic, their order and the least upper bound property, as well as limit rules.

1.1 Sets, Relations, Functions, Cardinals and Ordinals

A set is a collection or group of objects, considered as an entity unto itself. Sets are usually symbolized by uppercase, italicized, boldface letters such as A, B, X, Y, or Z. Each object in a set is called a member or an element of the set. If $x$ is an element of the set $X$, we write $x \in X$; when $x$ is not an element of $X$, we write $x \notin X$. A set may be described either by explicitly listing its elements inside braces (for example, $\{a, b, c\}$ is the set having $a$, $b$, and $c$ as members) or by giving a property that every element of the set possesses and that no object outside the set possesses (for instance, $\{x : x \text{ is a car}\}$ is the set of all cars, read "the set of all $x$ such that $x$ is a car"). Below are the standard sets that will be used throughout this textbook:


$\mathbb{N}$ = the set of natural numbers $= \{1, 2, 3, \dots, n, \dots\}$;
$\mathbb{Z}$ = the set of integers $= \{0, 1, -1, 2, -2, 3, -3, \dots, n, -n, \dots\}$;
$\mathbb{Q}$ = the set of rational numbers $= \{mn^{-1} : m \text{ and } n \text{ are integers with } n \neq 0\}$;
$\mathbb{R}$ = the set of real numbers;
$\mathbb{C}$ = the set of complex numbers.

Remark 1.1.1. (i) In $\mathbb{Q}$ we identify $mn^{-1}$ and $kl^{-1}$ if $lm = kn$.

(ii) $\mathbb{C} = \mathbb{R} + i\mathbb{R} = \{z = x + iy : x, y \in \mathbb{R}\}$, where $i = \sqrt{-1}$ is the imaginary unit; $x = \Re z$ and $y = \Im z$ are called the real and imaginary parts of $z = x + iy$, respectively.

Definition 1.1.2. Let $X$ and $Y$ be two sets. We say that:

(i) $X$ is a subset of $Y$, written $X \subseteq Y$, provided that $x \in X$ implies $x \in Y$; we write $X \not\subseteq Y$ when $X$ is not a subset of $Y$;

(ii) $X$ and $Y$ are equal, denoted $X = Y$, provided that $X \subseteq Y$ and $Y \subseteq X$; we write $X \neq Y$ when $X$ is not equal to $Y$;

(iii) $X$ is a proper subset of $Y$, written $X \subset Y$, provided that $X \subseteq Y$ but $X \neq Y$;

(iv) $X \cup Y$, $X \cap Y$, and $X \setminus Y$ are, respectively, the union of $X$ and $Y$, i.e., the set $\{x : x \in X \text{ or } x \in Y\}$; the intersection of $X$ and $Y$, i.e., the set $\{x : x \in X \text{ and } x \in Y\}$; and the complement of $Y$ in $X$, i.e., the set $\{x : x \in X \text{ and } x \notin Y\}$;

(v) $\emptyset$ is the empty set, i.e., the set that contains no elements; moreover, $X$ and $Y$ are disjoint whenever $X \cap Y = \emptyset$.

Union and intersection, the two most basic set operations, can be applied to any family of sets. More precisely, if $\{X_j\}_{j \in I}$ is any family of sets indexed by some set $I$ (which may be regarded simply as a set of labels distinguishing the various members of the family), then
$$\bigcup_{j \in I} X_j = \{x : x \in X_j \text{ for some } j \in I\} \quad\text{and}\quad \bigcap_{j \in I} X_j = \{x : x \in X_j \text{ for all } j \in I\}$$
denote the union and intersection of $\{X_j\}_{j \in I}$, respectively. Combined with the complement operation, union and intersection satisfy De Morgan's laws for an arbitrary family of sets, as follows.


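Proposition 1.1.3. Given a set $X$, let $\{X_j\}_{j \in I}$ be a family of sets indexed by $I$. Then
$$\bigcup_{j \in I} (X \setminus X_j) = X \setminus \bigcap_{j \in I} X_j \quad\text{and}\quad \bigcap_{j \in I} (X \setminus X_j) = X \setminus \bigcup_{j \in I} X_j.$$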

Proof. It suffices to verify the first equality, since the second can be proved similarly. First, suppose that $x \in \bigcup_{j \in I} (X \setminus X_j)$. Then $x \in X \setminus X_j$ for some $j \in I$; hence $x \in X$ and $x \notin X_j$, implying $x \notin \bigcap_{j \in I} X_j$. Consequently, $x \in X \setminus \bigcap_{j \in I} X_j$; that is, $\bigcup_{j \in I} (X \setminus X_j) \subseteq X \setminus \bigcap_{j \in I} X_j$. On the other hand, if $x \in X \setminus \bigcap_{j \in I} X_j$, then $x \in X$ but $x \notin \bigcap_{j \in I} X_j$. Accordingly, there exists some $j \in I$ such that $x \notin X_j$, i.e., $x \in X \setminus X_j$. Then $x \in \bigcup_{j \in I} (X \setminus X_j)$. Consequently, $X \setminus \bigcap_{j \in I} X_j \subseteq \bigcup_{j \in I} (X \setminus X_j)$. By Definition 1.1.2 (ii), we obtain the desired equality.

Next, we introduce the Cartesian product, relations, and equivalence relations.

Definition 1.1.4. Let $X$ and $Y$ be two sets.

(i) The Cartesian product $X \times Y$ of $X$ and $Y$ is the set of all ordered pairs $(x, y)$ such that $x \in X$ and $y \in Y$; here the ordered pair $(x, y)$ is the set whose elements are $\{x\}$ and $\{x, y\}$, in symbols $(x, y) = \{\{x\}, \{x, y\}\}$.

(ii) A subset $R$ of $X \times Y$ is called a relation between $X$ and $Y$; in particular, a subset of $X \times X$ is said to be a relation on $X$. In this case, $(x, y) \in R$ may also be denoted by $xRy$, read "$x$ is $R$-related to $y$".

(iii) A relation $R$ on $X$ is an equivalence relation provided that it has the following properties for all $x, y, z \in X$: (a) $xRx$ (reflexivity); (b) $xRy$ implies $yRx$ (symmetry); (c) $xRy$ and $yRz$ yield $xRz$ (transitivity).

Here, for simplicity, we use $(x, y)$ to denote an ordered pair; we hope this choice causes no confusion when the same notation is used later for an open interval of reals. Of course, the equality “=” on any set $X$ is an equivalence relation. To better understand this definition, we prove the following proposition.
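For readers who like to experiment, the set identities and the ordered-pair encoding can be checked on small finite examples. The Python sketch below is only an illustration under that finite assumption; the helper names `kpair` and `de_morgan_holds` are ours, not the text's. It tests both identities of Proposition 1.1.3 for one finite family and models the ordered pair of Definition 1.1.4 (i) with frozensets, and a brute-force comparison over a three-element universe exhibits the property that Proposition 1.1.5 establishes in general. Such a check illustrates the statements but is no substitute for the proofs.

```python
from itertools import product

def kpair(x, y):
    # Ordered pair (x, y) = {{x}, {x, y}} as in Definition 1.1.4 (i)
    return frozenset({frozenset({x}), frozenset({x, y})})

def de_morgan_holds(X, family):
    # Both identities of Proposition 1.1.3 for a finite, nonempty family of sets
    complements = [X - Xj for Xj in family]
    first = set().union(*complements) == X - set.intersection(*family)
    second = set(X).intersection(*complements) == X - set().union(*family)
    return first and second

X = set(range(10))
family = [{0, 1, 2, 3}, {2, 3, 4, 11}, {3, 5, 7}]  # members need not lie inside X
print(de_morgan_holds(X, family))  # True

# Pairs agree exactly when their coordinates agree (three-element universe)
print(all((kpair(a, b) == kpair(c, d)) == (a == c and b == d)
          for a, b, c, d in product(range(3), repeat=4)))  # True

# Order matters for kpair, but not for a plain two-element set
print(kpair(0, 1) == kpair(1, 0), frozenset({0, 1}) == frozenset({1, 0}))  # False True
```

Frozensets are used because ordinary Python sets are unhashable and so cannot be elements of other sets.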


Proposition 1.1.5. Let $(x, y), (x', y') \in X \times Y$. Then $(x, y) = (x', y')$ if and only if $x = x'$ and $y = y'$.

Proof. Clearly, $x = x'$ and $y = y'$ imply
$$(x, y) = \{\{x\}, \{x, y\}\} = \{\{x'\}, \{x', y'\}\} = (x', y').$$
Conversely, suppose that $(x, y) = (x', y')$. To prove $x = x'$ and $y = y'$, we consider two cases.

Case 1: $x = y$. Then $\{x\} = \{x, y\}$ and hence $(x', y') = (x, y) = \{\{x\}\}$. This means that $\{\{x'\}, \{x', y'\}\}$ has only one element, $\{x\}$, so that $\{x', y'\} = \{x'\} = \{x\} = \{y\}$. Accordingly, $x' = y' = x = y$.

Case 2: $x \neq y$. Then the argument for Case 1, with the two pairs interchanged, yields $x' \neq y'$. Since $(x, y) = (x', y')$, we conclude $\{x\} \in \{\{x'\}, \{x', y'\}\}$, and so either $\{x\} = \{x'\}$ or $\{x\} = \{x', y'\}$. In either case $x' \in \{x\}$, and thus $x = x'$. Similarly, $\{x, y\} \in \{\{x'\}, \{x', y'\}\}$ yields either $\{x, y\} = \{x'\}$ or $\{x, y\} = \{x', y'\}$. But since $x \neq y$, the set $\{x, y\}$ has two elements, so we must have $\{x, y\} = \{x', y'\}$. Now $x = x'$, $x \neq y$ and $y \in \{x', y'\}$, so it follows that $y = y'$, as desired.

As a special relation, we have the notion of a function.

Definition 1.1.6. Let $X$ and $Y$ be two sets.

(i) A function $f$ between $X$ and $Y$ is a nonempty relation $f \subseteq X \times Y$ such that if $(x, y), (x, y') \in f$ then $y = y'$. The sets
$$D(f) = \{x \in X : \text{there is a } y \in Y \text{ such that } (x, y) \in f\}$$
and
$$R(f) = \{y \in Y : \text{there is an } x \in X \text{ such that } (x, y) \in f\}$$
are called the domain and the range of $f$, respectively. In the case $D(f) = X$, $f$ is called a function from $X$ into $Y$, denoted $f : X \to Y$, with $f(A) = \{f(x) : x \in A\}$ and

$$f^{-1}(B) = \{x \in X : f(x) \in B\}$$

as the image of $A \subseteq X$ and the pre-image of $B \subseteq Y$, respectively.

(ii) A function $f : X \to Y$ is called onto or surjective provided $f(X) = Y$.

(iii) A function $f : X \to Y$ is called one-to-one or injective provided that $f(x) = f(x')$ for $x, x' \in X$ implies $x = x'$.


(iv) A surjection is an onto function, while an injection is a one-to-one function. Furthermore, a function that is both onto and one-to-one is said to be bijective or a bijection.

Proposition 1.1.7. Let $X$ and $Y$ be two sets and let $f : X \to Y$ be a function.

(i) Suppose that $A, B \subseteq X$. Then $f(A \cup B) = f(A) \cup f(B)$. Furthermore, $f(A \cap B) = f(A) \cap f(B)$ whenever $f$ is injective.

(ii) Suppose that $A \subseteq X$ and $B \subseteq Y$. Then $f^{-1}(f(A)) = A$ provided $f$ is injective, and $f(f^{-1}(B)) = B$ provided $f$ is surjective.

Proof. (i) If $y \in f(A \cup B)$, then by definition there is an $x \in A \cup B$ such that $y = f(x)$. Now $x \in A$ or $x \in B$; consequently $f(x) \in f(A)$ or $f(x) \in f(B)$. In either case, $y \in f(A) \cup f(B)$. This proves $f(A \cup B) \subseteq f(A) \cup f(B)$. On the other hand, if $y \in f(A) \cup f(B)$, then $y \in f(A)$ or $y \in f(B)$, and hence there exists $x \in A \cup B$ such that $y = f(x)$, namely, $y \in f(A \cup B)$. Thus $f(A) \cup f(B) \subseteq f(A \cup B)$. Accordingly, $f(A \cup B) = f(A) \cup f(B)$. Clearly, $f(A \cap B) \subseteq f(A) \cap f(B)$. Now if $f$ is injective, then for any $y \in f(A) \cap f(B)$ there exist $x \in A$ and $x' \in B$ such that $y = f(x) = f(x')$, and hence $x = x' \in A \cap B$. Accordingly, $y \in f(A \cap B)$, i.e., $f(A) \cap f(B) \subseteq f(A \cap B)$, which gives $f(A \cap B) = f(A) \cap f(B)$.

(ii) This follows from Definition 1.1.6. In general, $A \subseteq f^{-1}(f(A))$ and $f(f^{-1}(B)) \subseteq B$. If $f$ is injective and $x \in f^{-1}(f(A))$, then $f(x) = f(a)$ for some $a \in A$, so $x = a \in A$. If $f$ is surjective and $y \in B$, then $y = f(x)$ for some $x \in X$, and this $x$ lies in $f^{-1}(B)$, so $y \in f(f^{-1}(B))$.
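Proposition 1.1.7 can be explored on a finite example. In the Python sketch below, the helpers `image` and `preimage` are illustrative names that compute $f(A)$ and $f^{-1}(B)$ for a function given as a Python callable on a finite domain. The squaring map on $\{-3, \dots, 3\}$ is not injective, and it is not onto any target set containing $5$, so it shows that the hypotheses in the second half of (i) and in (ii) cannot simply be dropped.

```python
def image(f, A):
    # f(A) = {f(x) : x in A}
    return {f(x) for x in A}

def preimage(f, X, B):
    # f^{-1}(B) = {x in X : f(x) in B}, for f viewed as a function on X
    return {x for x in X if f(x) in B}

def square(x):
    # Not injective on X, since (-1) ** 2 == 1 ** 2
    return x * x

X = set(range(-3, 4))  # the finite domain {-3, ..., 3}
A, B = {-2, -1, 0}, {0, 1, 2}

print(image(square, A | B) == image(square, A) | image(square, B))  # True: no injectivity needed
print(sorted(image(square, A & B)), sorted(image(square, A) & image(square, B)))  # [0] [0, 1, 4]
print(sorted(preimage(square, X, image(square, A))))  # [-2, -1, 0, 1, 2]: strictly larger than A
print(sorted(image(square, preimage(square, X, {4, 5}))))  # [4]: 5 is not a square of any x in X
```

The sets are sorted before printing only because Python does not guarantee an iteration order for sets.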

Remark 1.1.8. We make two comments on Proposition 1.1.7 (ii).

(i) The symbol $f^{-1}(B)$ is not to be thought of as an inverse function applied to points in $B$. However, if $f^{-1}(y) = \{x \in X : f(x) = y\}$ contains precisely one member of $X$ for each $y \in Y$, then $f^{-1}$ defines a function from $Y$ into $X$. In fact, this amounts to $f : X \to Y$ being bijective. To see this, suppose $f$ is a bijection. Then for any $y \in Y$ there exists one (surjectivity) and only one (injectivity) element $x \in X$ such that $y = f(x)$, and hence $f^{-1}(y) = x$ is uniquely defined for each $y \in Y$. Consequently, $f^{-1}$ is a function from $Y$ into $X$. Conversely, if $f^{-1} : Y \to X$ is a function, then for each $y \in Y$, $f^{-1}(y) = x$ is a unique member of $X$; that is to say, $f$ is injective. Moreover, $f^{-1}(y) = x$ gives $f(x) = y$, so $f$ is surjective. Therefore, $f$ is bijective.

(ii) The symbols $f^{-1}(f(A))$ and $f(f^{-1}(B))$ lead to the notion of composition of functions. For functions $f : X \to Y$ and $g : Y \to Z$, the composition $g \circ f$ of $g$ with $f$ is the function from $X$ into $Z$ determined by $(g \circ f)(x) = g(f(x))$ for each $x \in X$. For this to be meaningful, we naturally require $f(X) \subseteq Y$. It is evident that if $f : X \to Y$ is bijective, then $f^{-1} : Y \to X$ is bijective, $f^{-1} \circ f = i_X$ and $f \circ f^{-1} = i_Y$, where $i_X : X \to X$ and $i_Y : Y \to Y$ stand for the identity functions: $i_X(x) = x$ for $x \in X$ and $i_Y(y) = y$ for $y \in Y$.
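Remark 1.1.8 can be mirrored with finite functions stored as Python dictionaries. This is a sketch under that representation only: `inverse` and `compose` are illustrative helpers, not notation from the text, and the target set $Y$ is taken to be the set of values of the dictionary, so the map is onto $Y$ by construction. Attempting to invert a non-injective dictionary raises an error, in line with the observation that $f^{-1}$ is a function exactly when $f$ is bijective.

```python
def inverse(f):
    # Build f^{-1} on the range of f; refuse if two points share an image (f not injective)
    inv = {}
    for x, y in f.items():
        if y in inv:
            raise ValueError(f"not injective: {inv[y]!r} and {x!r} both map to {y!r}")
        inv[y] = x
    return inv

def compose(g, f):
    # (g o f)(x) = g(f(x)); requires every value of f to be a key of g, i.e. f(X) ⊆ dom g
    return {x: g[y] for x, y in f.items()}

f = {"a": 1, "b": 2, "c": 3}  # a bijection from X = {a, b, c} onto Y = {1, 2, 3}
f_inv = inverse(f)
print(compose(f_inv, f))  # {'a': 'a', 'b': 'b', 'c': 'c'}, i.e. the identity i_X
print(compose(f, f_inv))  # {1: 1, 2: 2, 3: 3}, i.e. the identity i_Y
```

For example, `inverse({"a": 1, "b": 1})` raises `ValueError`, since both keys map to `1`.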

We now introduce the cardinality of a set, a property that describes the size of the set by means of a cardinal number. Measuring size in this way amounts to comparing two sets, which leads to the following concept.

Definition 1.1.9. Two sets $X$ and $Y$ are called equivalent, or are said to have the same cardinality, denoted $X \sim_R Y$, provided that there exists a bijection between $X$ and $Y$. In particular, any set that has the same cardinality $\aleph_0$ as the set $\mathbb{N}$ of all natural numbers is said to be an infinite countable set; if a set is $\emptyset$ or there is a bijection between it and $\{1, 2, \dots, n\}$ for some $n \in \mathbb{N}$, then it is said to be finite. A set that is finite or infinite countable is called countable; otherwise, it is said to be uncountable.

Example 1.1.10. (i) If $X = \{a, b, c\}$ and $Y = \{\text{tables}, \text{chairs}, \text{chalkboards}\}$, then $X$ and $Y$ have the same cardinality, since both have three elements.

(ii) The set $2\mathbb{N} = \{2, 4, 6, \dots, 2n, \dots\}$ has the same cardinality as $\mathbb{N}$, since the function $f(n) = 2n$ is a bijection between $\mathbb{N}$ and $2\mathbb{N}$.

(iii) $\mathbb{Z}$ has cardinality $\aleph_0$. In fact, if $\mathbb{Z}^- = \{-1, -2, -3, \dots, -n, \dots\}$ and
$$f(x) = \begin{cases} 2x, & x \in \mathbb{N}, \\ 1, & x = 0, \\ (-1)(2x) + 1, & x \in \mathbb{Z}^-, \end{cases}$$
then $\mathbb{Z} = \mathbb{N} \cup \{0\} \cup \mathbb{Z}^-$ and $f$ is a bijection between $\mathbb{Z}$ and $\mathbb{N}$: it maps $\mathbb{N}$ onto the even numbers, $0$ to $1$, and $\mathbb{Z}^-$ onto the odd numbers greater than $1$.

Proposition 1.1.11. (i) Any subset of a finite set is finite. (ii) Any subset of a countable set is countable. (iii) The union of a countable collection of countable sets is countable. (iv) The Cartesian product of two countable sets is countable.


Proof. (i) Suppose $X$ is finite and $A \subseteq X$. Then either $X = \emptyset$ or there exists a bijection $f : X \to \{1, 2, 3, \dots, n\}$ for some $n \in \mathbb{N}$. If $A = \emptyset$, then it is finite. If $A \neq \emptyset$, then $X \neq \emptyset$, and the assumption $A \subseteq X$ yields $f(A) \subseteq \{1, 2, 3, \dots, n\}$; hence we may write $f(A) = \{j_1, j_2, \dots, j_m\}$ with distinct $j_1, \dots, j_m$. Consequently, upon defining $g : A \to \{1, 2, \dots, m\}$ by $g(x) = k$ where $f(x) = j_k$ for $x \in A$, we obtain a bijection, so $A$ is finite.

(ii) Assume that $X$ is countable and $S \subseteq X$. Then $X = \{x_j\}_{j=1}^{\infty}$ (with repetitions allowed when $X$ is finite). If $S = \emptyset$, then it is countable. If $S \neq \emptyset$, then it contains at least one element, say $s$. For each $j \in \mathbb{N}$, set $y_j = x_j$ if $x_j \in S$ and $y_j = s$ if $x_j \notin S$. Then $S = \{y_j\}_{j=1}^{\infty}$, so $S$ is the range of a sequence; deleting repeated terms lists $S$ either as a finite set or as a sequence of distinct elements in bijection with $\mathbb{N}$. Thus $S$ is countable.

(iii) Suppose that $\{A_j\}_{j=1}^{\infty}$ is a countable collection of countable sets $A_j = \{a_{jk}\}_{k=1}^{\infty}$. For $a \in \bigcup_{j=1}^{\infty} A_j$, let $f(a)$ be the least number $2^j 3^k$ with $a = a_{jk}$. By the uniqueness of prime factorization, $f(a)$ determines the pair $(j, k)$ and hence $a$, so $f$ is a bijection between $\bigcup_{j=1}^{\infty} A_j$ and a subset of $\mathbb{N}$. Thus $\bigcup_{j=1}^{\infty} A_j$ is countable by (ii).

(iv) Assume that $X = \{x_j\}$ and $Y = \{y_k\}$ are enumerated without repetition by (finitely or infinitely many) indices $j, k \in \mathbb{N}$. Then each member of $X \times Y$ has the form $(x_j, y_k)$. Define $f : X \times Y \to \mathbb{N}$ by $f(x_j, y_k) = 2^j 3^k$. By the uniqueness of prime factorization, $f$ is one-to-one. Thus $X \times Y$ is countable by (ii).

Example 1.1.12. (i) $\mathbb{Q}$ is countable. In fact, by associating each rational number, written as $mn^{-1}$ with $n > 0$ and $m$, $n$ relatively prime, with the pair $(m, n)$, we see that $\mathbb{Q}$ has the same cardinality as a subset of $\mathbb{Z} \times \mathbb{Z}$, which is countable by Proposition 1.1.11 (iv).

(ii) $\mathbb{R}$ is uncountable. To verify this, let $X$ be the set of unending decimals between $0$ and $1$ that contain only $0$ and $1$ as digits. If $f$ is an arbitrary function from $\mathbb{N}$ into $X$, then $f$ cannot be onto, and hence $X$ is uncountable. In fact, we form a member $d = 0.x_1 x_2 x_3 x_4 \cdots$ of $X$ as follows. If the first digit of $f(1)$ is $0$ or $1$, then $x_1$ is taken to be $1$ or $0$, respectively; consequently, $d$ differs from $f(1)$ in the first digit. We choose $x_2$ to be either $0$ or $1$, but different from the second digit of $f(2)$. Continuing this process, we find that $d$ is not $f(n)$ for any $n$, since $d$ differs from $f(n)$ at least in the $n$th digit. Consequently, $d \notin f(\mathbb{N})$, and so $f$ is not surjective. Note that $X$ is a subset of $\mathbb{R}$. So, if $\mathbb{R}$ were countable, then $X$ would also be countable by Proposition 1.1.11 (ii). This contradiction proves that $\mathbb{R}$ is uncountable.

The preceding discussion of the cardinals of sets reveals $X = \{1, 2, 3, \dots\} \sim_R Y = \{\dots, 3, 2, 1\}$. However, if $X$ is ordered by the usual relation $\le$ and $Y$ is ordered by $\ge$, then $X$ has a first but not a last element, while $Y$ has a last but not a first element; this gives a way to distinguish the two. So, we are finally led to a brief consideration of the ordinals of sets.

Definition 1.1.13. Let $\preceq$ be a relation on a given set $X$.

(i) $\preceq$ is said to be a partial order on $X$ provided the following three properties hold for $x, y, z \in X$: (a) $x \preceq x$ (reflexivity); (b) $x \preceq y$ and $y \preceq x$ imply $x = y$ (antisymmetry); (c) $x \preceq y$ and $y \preceq z$ yield $x \preceq z$ (transitivity). In this case, $(X, \preceq)$ is called a partially ordered set; in such a set, not all pairs of elements need be comparable.

(ii) $\preceq$ is said to be a total (or linear) order provided the following three properties hold for $x, y, z \in X$: (a) $x \preceq y$ or $y \preceq x$ (totalness); (b) $x \preceq y$ and $y \preceq x$ imply $x = y$ (antisymmetry); (c) $x \preceq y$ and $y \preceq z$ yield $x \preceq z$ (transitivity). In this case, $(X, \preceq)$ is called a totally (or linearly) ordered set, or a chain. The totalness property states that any two elements of a chain are comparable. Notice that the totalness condition implies reflexivity (take $y = x$). Thus a total order is also a partial order, that is, a binary relation that is reflexive, antisymmetric and transitive. It follows that a total order can also be defined as a partial order that is total.

Example 1.1.14. (i) Given a set $X$, let $\mathcal{P}(X)$ be the collection of all subsets of $X$. Then $(\mathcal{P}(X), \subseteq)$ is a partially ordered set, but it is not totally ordered when $X$ has at least two elements: if $a \neq b$ in $X$, then neither $\{a\} \subseteq \{b\}$ nor $\{b\} \subseteq \{a\}$.




(ii) $(\mathbb{Z}, \le)$ is totally ordered.

Definition 1.1.15. Let $(X, \preceq)$ be a partially ordered set.

(i) An element $u$ of $X$ is an upper bound of a subset $A$ of $X$ provided that $a \preceq u$ for all $a \in A$; in this case, $A$ is called bounded from above. Using $u \preceq a$ instead of $a \preceq u$ leads to the definition of a lower bound of $A$, and of $A$ being bounded from below.

(ii) An element $m$ of a subset $A$ of $X$ is a maximal element of $A$ if $m \preceq a$ for some $a \in A$ implies $m = a$. The definition of a minimal element is obtained by using $a \preceq m$ instead of $m \preceq a$.

Clearly, a subset of a partially ordered set may fail to have any upper bounds. Consider, for example, the set of natural numbers greater than a given natural number. On the other hand, a set may have many upper and lower bounds, and hence we are usually interested in picking out specific elements from the sets of upper or lower bounds. This leads to the consideration of least upper bounds (or suprema) and greatest lower bounds (or infima). It is important to note that maximal elements of a subset $S$ are in general not greatest elements of $S$. Indeed, consider $\mathcal{P}(\mathbb{N})$, the set of all subsets of the natural numbers ordered by inclusion. The subset $S$ of all one-element subsets of $\mathbb{N}$ consists only of maximal elements, but has no greatest element. This example also shows that maximal elements need not be unique and that an element may well be both maximal and minimal at the same time. If a subset has a greatest element, then this is its unique maximal element. Conversely, even if a set has only one maximal element, it is not necessarily the greatest one. Take the set of natural numbers in their usual order, which obviously has no maximal elements, and add a single new element $a$ that can only be compared to itself; in other words, $a$ is neither smaller nor greater than any natural number. Then the whole set has $a$ as its single maximal element, which is not the greatest element. Similar conclusions are valid for minimal elements.
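The distinction between maximal and greatest elements can be tested mechanically on finite partially ordered sets. The Python sketch below passes the order relation as a function `leq`; since $\mathcal{P}(\mathbb{N})$ is infinite, it uses the one-element subsets of $\{1, \dots, 5\}$ as a finite stand-in for the collection $S$ above. The helper names `maximal_elements`, `greatest_element` and `is_subset` are illustrative assumptions, not notation from the text.

```python
def is_subset(A, B):
    # The partial order of Example 1.1.14 (i): set inclusion
    return A <= B

def reverse(A, B):
    # Reversed inclusion: maximal elements for this order are minimal for inclusion
    return is_subset(B, A)

def maximal_elements(S, leq):
    # m is maximal in S if m <= a for some a in S forces m == a (Definition 1.1.15 (ii))
    return [m for m in S if all(a == m for a in S if leq(m, a))]

def greatest_element(S, leq):
    # An element g of S with a <= g for every a in S, or None if no such element exists
    for g in S:
        if all(leq(a, g) for a in S):
            return g
    return None

# One-element subsets of {1, ..., 5}, a finite stand-in for the singletons of P(N)
singletons = [frozenset({n}) for n in range(1, 6)]
print(len(maximal_elements(singletons, is_subset)))  # 5: every singleton is maximal
print(len(maximal_elements(singletons, reverse)))    # 5: and every one is also minimal
print(greatest_element(singletons, is_subset))       # None: there is no greatest element

# In a chain, maximal and greatest coincide
chain = [3, 1, 4, 5, 2]
print(maximal_elements(chain, lambda a, b: a <= b), greatest_element(chain, lambda a, b: a <= b))  # [5] 5
```

The example with an extra incomparable element $a$ relies on $\mathbb{N}$ being infinite, so it has no faithful finite analogue and is not included.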
Nevertheless, in a totally ordered set the notions of maximal (minimal) element and greatest (least) element coincide, which is why the two terms are used interchangeably in fields like analysis, where only total orders are considered. With the previous concepts, we may state the celebrated Zorn's lemma, also known as the Kazimierz Kuratowski - Max Zorn lemma, as follows.


Jie Xiao

Zorn's Lemma. Every partially ordered set in which every totally ordered subset has an upper bound contains at least one maximal element.

We offer no proof of this lemma, because it is equivalent to the following axiom of choice in the sense that either one suffices to prove the other.

The Axiom of Choice. Given an arbitrary collection $\{A_j\}_{j\in I} = \{A_j : j \in I\}$ of nonempty sets indexed by a nonempty set $I$, there is a function $f : I \to \bigcup_{j\in I} A_j$, called a choice or selection function, such that $f(j) \in A_j$ for every $j \in I$.

Here it is worth pointing out that Zorn's lemma, the most useful of all equivalents of the axiom of choice, occurs in the proofs of several theorems of crucial importance: for example, the Hans Hahn - Stefan Banach theorem in functional analysis, the theorem that every vector space has a basis, the Andrey Nikolayevich Tychonoff theorem in topology saying that every product of compact spaces is compact, and the theorems in abstract algebra that every nonzero ring with identity has a maximal ideal and that every field has an algebraic closure.

1.2 Reals, Some Basic Theorems and Sequence Limits

In this section we list the basic properties of the reals, rather than construct the reals, and then use them to derive some foundational theorems. Furthermore, we consider sequences of reals and their most important limit theorems, including the Augustin Louis Cauchy criterion for the convergence of a sequence. The properties used to define $\mathbb{R}$ (along with an ordered pair of functions from $\mathbb{R} \times \mathbb{R}$ into $\mathbb{R}$) are usually regarded as axioms and classified into three groups.

Field Axioms. There are two functions $+$ and $\cdot$ on $\mathbb{R}$ such that $\mathbb{R}$ becomes a field:

F1. $x + y = y + x$ and $x \cdot y = y \cdot x$ for every $x, y \in \mathbb{R}$ (commutativity);
F2. $(x + y) + z = x + (y + z)$ and $(x \cdot y) \cdot z = x \cdot (y \cdot z)$ for every $x, y, z \in \mathbb{R}$ (associativity);
F3. $x \cdot (y + z) = x \cdot y + x \cdot z$ for every $x, y, z \in \mathbb{R}$ (distributivity);
F4. There are distinct elements $0$ and $1$ in $\mathbb{R}$ such that $x + 0 = x$ and $x \cdot 1 = x$ for every $x \in \mathbb{R}$ (existence of neutral elements);
F5. For every $x \in \mathbb{R}$ there is an element of $\mathbb{R}$, denoted $-x$, such that $x + (-x) = 0$, and for every nonzero $x \in \mathbb{R}$ there is an element of $\mathbb{R}$, denoted $x^{-1} = 1/x$, such that $x \cdot x^{-1} = 1$ (existence of inverses).

Preliminaries


Here and elsewhere we employ the common notational conventions when no confusion is possible. For example, we often write $xy$ and $x - y$ for $x \cdot y$ and $x + (-y)$, respectively. Most of the rules of elementary algebra can be justified by the above five field axioms. For instance, for any $a, b \in \mathbb{R}$ the equation $x + a = b$ has one and only one solution, $x = b - a$. The next group consists of the order axioms.

Order Axioms. There is an order relation $<$ defined on $\mathbb{R}$ satisfying:

O1. If $x, y \in \mathbb{R}$, then exactly one of $x < y$, $x = y$, $y < x$ holds (trichotomy);
O2. If $x < y$ and $y < z$ for $x, y, z \in \mathbb{R}$, then $x < z$ (transitivity);
O3. If $x < y$ and $x, y, z \in \mathbb{R}$, then $x + z < y + z$;
O4. If $x < y$, $x, y, z \in \mathbb{R}$ and $0 < z$, then $xz < yz$.

We also write $y > x$ for $x < y$, and $x \le y$ or $y \ge x$ for either $x < y$ or $x = y$. We say that $x$ is positive if $x > 0$ and negative if $x < 0$; according to Definition 1.1.13 (ii), $(\mathbb{R}, \le)$ is a totally ordered set. Naturally, the inequality $0 < 1$ follows from the above field and order axioms. Furthermore, let
$$|x| = \begin{cases} x, & x > 0,\\ 0, & x = 0,\\ -x, & x < 0.\end{cases}$$

Then the field and order axioms can be used to derive that the triangle inequality and its reverse counterpart |x + y| ≤ |x| + |y| and |x| − |y| ≤ |x − y| hold for every x, y ∈ R. The final axiom for R is the following axiom which gives a further condition on the ordering of R.

Axiom of Least Upper Bound (LUB). If a subset $S$ of $\mathbb{R}$ is nonempty and bounded from above, then it has a least upper bound, denoted $\sup S$.

Obviously, $\sup S$ is unique when it exists; thus it can be called the least upper bound. Using this axiom, we can prove that any nonempty subset $S$ of $\mathbb{R}$ having a lower bound in $\mathbb{R}$ must have a greatest lower bound, denoted $\inf S$. In fact, a set $S \subseteq \mathbb{R}$ is bounded from below if and only if the set $T = \{x \in \mathbb{R} : -x \in S\}$ is bounded from above, and if $S \neq \emptyset$ is bounded from below, then $-\sup T$ is the greatest lower bound of $S$.

To better understand the previously introduced axioms, we proceed to show that the LUB-axiom is not valid for the rationals, but that the reals are not very far from the rationals, in the sense that any real number may be approximated by rationals as closely as we wish. More precisely, we have the following result.

Theorem 1.2.1. (i) If $S = \{r \in \mathbb{Q} : r > 0 \text{ and } r^2 < 2\}$, then $S$ has an upper bound in $\mathbb{Q}$ but does not have a least upper bound in $\mathbb{Q}$. (ii) $\mathbb{Q}$ is dense in $\mathbb{R}$ in the sense that for $x, \varepsilon \in \mathbb{R}$ with $\varepsilon > 0$ there exists an $r \in \mathbb{Q}$ such that $|x - r| < \varepsilon$.

Proof. (i) Clearly, $2$ is an upper bound of $S$. Suppose that $S$ has a rational supremum $s$; note that $s \ge 1 > 0$ since $1 \in S$. Then, by definition, for any rational $\varepsilon$ with $0 < \varepsilon < s$ there is an $r \in S$ such that $0 < s - \varepsilon < r$. Consequently, $(s - \varepsilon)^2 < r^2 < 2$. Note that $s + \varepsilon$ is a rational greater than $s$, so $s + \varepsilon \notin S$. This yields $(s + \varepsilon)^2 \ge 2$, and hence $(s - \varepsilon)^2 < 2 \le (s + \varepsilon)^2$. Also $(s - \varepsilon)^2 < s^2 \le (s + \varepsilon)^2$. Since both $2$ and $s^2$ lie in the interval $\big((s - \varepsilon)^2, (s + \varepsilon)^2\big]$, we obtain
$$|s^2 - 2| < (s + \varepsilon)^2 - (s - \varepsilon)^2 = 4s\varepsilon.$$
This is valid for $0 < \varepsilon < s$, and therefore certainly for those rationals $\varepsilon \ge s$. Accordingly, $s^2 = 2$. Since $s$ is rational, it can be written as $m/n$, where $m, n \in \mathbb{Z}$, $n \neq 0$, and $m, n$ have no common factor. It follows that $m^2 = 2n^2$, so $m^2$ is even; hence $m$ is even, say $m = 2k$. Consequently, $n^2 = 2k^2$, which implies that $n$ is even. Thus $m$ and $n$ have $2$ as a common factor, contradicting the choice of $m$ and $n$. This shows that $S$ has no rational supremum, as desired.

(ii) Given $x, \varepsilon \in \mathbb{R}$ with $\varepsilon > 0$, in order to find a rational $r$ obeying $|x - r| < \varepsilon$, we first verify three intermediate results.

Step 1. There is an $n \in \mathbb{Z}$ such that $n > x$. To see this, assume $n \le x$ for all $n \in \mathbb{Z}$. Then $\mathbb{Z}$ has the upper bound $x$, and hence, by the LUB-axiom, it has a least upper bound $z \in \mathbb{R}$. But $n \in \mathbb{Z}$ implies $n + 1 \in \mathbb{Z}$, so $n + 1 \le z$, i.e., $n \le z - 1$, showing that $z - 1$ is also an upper bound of $\mathbb{Z}$; this contradicts the fact that $z$ is the least upper bound.

Step 2. There is an $n \in \mathbb{Z}$ such that $n \le x < n + 1$. By Step 1, we choose a natural number $n_0 > |x|$, so that $-n_0 < x < n_0$. Now we take $n$ to be the greatest element of the finite set $\{-n_0, -n_0 + 1, \dots, -1, 0, 1, \dots, n_0 - 1, n_0\}$ that is less than or equal to $x$.


Step 3. There is an $n \in \mathbb{N}$ such that $n^{-1} < \varepsilon$. By Step 1, we take an integer $n > \varepsilon^{-1} > 0$, which gives $n^{-1} < \varepsilon$ thanks to O4 above.

Now, for the given $x$ and $\varepsilon$, we use Step 3 to find a natural number $n_0$ such that $n_0^{-1} < \varepsilon$, and then apply Step 2 to $n_0 x$ to get an $n \in \mathbb{Z}$ such that $n \le n_0 x < n + 1$. Hence $0 \le x - n n_0^{-1} < n_0^{-1} < \varepsilon$, so $r = n/n_0$ is as desired.

As important consequences of the LUB-axiom of $\mathbb{R}$, we will prove three theorems, namely the nested interval theorem, the Bernard Bolzano - Karl Theodor Wilhelm Weierstrass theorem, and the Eduard Heine - Félix Édouard Justin Émile Borel theorem, which will serve as useful tools in the subsequent chapters. First of all, we adopt the following convention: for $a, b \in \mathbb{R}$ with $a < b$, let
$$(a, b) = \{x \in \mathbb{R} : a < x < b\} \quad\text{and}\quad [a, b] = \{x \in \mathbb{R} : a \le x \le b\},$$
which are called the open and closed intervals with endpoints $a$ and $b$, respectively, and let
$$(a, b] = \{x \in \mathbb{R} : a < x \le b\} \quad\text{and}\quad [a, b) = \{x \in \mathbb{R} : a \le x < b\},$$
which are called the open-closed and closed-open intervals with endpoints $a$ and $b$, respectively. In all cases, the length of the interval is $b - a$. We next introduce the concepts required by the above-mentioned theorems.

Definition 1.2.2. (i) $\{I_n\}_{n=1}^\infty$ is called a sequence of intervals provided that it is a function that assigns an interval $I_n$ to each $n \in \mathbb{N}$. If $I_{n+1} \subseteq I_n$ (respectively $I_n \subseteq I_{n+1}$) for all $n \in \mathbb{N}$, then $\{I_n\}_{n=1}^\infty$ is called a nested downward (respectively upward) sequence of intervals. (ii) Given $S \subseteq \mathbb{R}$ and $x \in \mathbb{R}$, $x$ is called a cluster point of $S$ provided that $\big((x - \varepsilon, x + \varepsilon) \cap S\big) \setminus \{x\} \neq \emptyset$ for all $\varepsilon > 0$.

(iii) A set $S \subseteq \mathbb{R}$ is said to be closed provided that every cluster point of $S$ belongs to $S$. (iv) A set $O \subseteq \mathbb{R}$ is said to be open provided that its complement $O^c = \mathbb{R} \setminus O$ is closed; equivalently, every point of $O$ is an interior point, namely, for any $x \in O$ there exists an $r > 0$ such that $(x - r, x + r) \subseteq O$.
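The construction in the proof of Theorem 1.2.1 (ii) is entirely explicit, so it can be traced step by step. The Python sketch below is illustrative only: since a computer cannot store an arbitrary real number, it assumes that $x$ is given exactly as a rational (for instance the binary value of a float, which is itself rational) and uses `fractions.Fraction` for exact arithmetic; the name `rational_approximation` is hypothetical.

```python
import math
from fractions import Fraction

def rational_approximation(x, eps):
    """Return a rational r with |x - r| < eps, following Steps 1-3 of the proof of
    Theorem 1.2.1 (ii). x and eps are converted to Fraction so that all arithmetic
    is exact (a Python float is itself a rational number)."""
    x, eps = Fraction(x), Fraction(eps)
    if eps <= 0:
        raise ValueError("eps must be positive")
    # Step 3: a natural number n0 with 1/n0 < eps; any integer n0 > 1/eps will do.
    n0 = math.floor(1 / eps) + 1
    # Step 2 applied to n0 * x: the integer n with n <= n0 * x < n + 1.
    n = math.floor(n0 * x)
    r = Fraction(n, n0)
    # Hence 0 <= x - r < 1/n0 < eps.
    assert 0 <= x - r < Fraction(1, n0) < eps
    return r

r = rational_approximation(math.pi, Fraction(1, 1000))
print(r, float(abs(Fraction(math.pi) - r)))
```

The returned rational always has denominator dividing $n_0$, mirroring the choice $r = n/n_0$ in the text.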


The structure of open subsets of $\mathbb{R}$ is described in Theorem 1.2.3 (v) below, Georg Cantor's characterization.

Theorem 1.2.3. (i) The union of any two closed subsets of $\mathbb{R}$ is closed. (ii) The intersection of any family of closed subsets of $\mathbb{R}$ is closed. (iii) The intersection of any two open subsets of $\mathbb{R}$ is open. (iv) The union of any family of open subsets of $\mathbb{R}$ is open. (v) Every open subset of $\mathbb{R}$ is the union of a countable family of pairwise disjoint open intervals.

Proof. Since (i)-(iv) are straightforward consequences of the definitions, it is enough to verify (v). Let $O \subseteq \mathbb{R}$ be a nonempty open set. If $x \in O$, then there exists a point $y < x$ with $(y, x) \subseteq O$, and hence $a = \inf\{y : (y, x) \subseteq O\}$ is well defined (possibly $a = -\infty$). Similarly, $b = \sup\{z : (x, z) \subseteq O\}$ is well defined (possibly $b = \infty$). Clearly, $-\infty \le a < x < b \le \infty$. Writing $I(x) = (a, b)$, we see that $I(x) \subseteq O$ is an open interval containing $x$, and that $a, b \notin O$ whenever they are finite. Moreover, if $x_1 \neq x_2$ in $O$, then either $I(x_1) = I(x_2)$ or $I(x_1) \cap I(x_2) = \emptyset$. Note that each $x \in O$ belongs to $I(x)$, which contains a rational number. Since distinct members of the family $\{I(x)\}_{x \in O}$ are disjoint and each contains a rational number, the family has at most countably many distinct members, and $O$ is their union: $O = \bigcup_{j=1}^{n} I(r_j)$ with $n \le \infty$ and rationals $r_j \in O$.

Theorem 1.2.4. (i) The nested interval property: if $\{I_n\}_{n=1}^\infty$ is a nested downward sequence of closed intervals in $\mathbb{R}$, then $\bigcap_{n=1}^\infty I_n$ is nonempty.

(ii) The Bolzano-Weierstrass property: if a subset S of R is infinite and bounded from below and above, then S has at least one cluster point in R.

(iii) The Heine-Borel property: if $\mathcal{F}$ is a family of open intervals covering a closed interval $I$ in $\mathbb{R}$, then $\mathcal{F}$ contains a finite subcovering; that is, there are finitely many open intervals from $\mathcal{F}$ whose union covers $I$.

Proof. (i) Let $I_n = [a_n, b_n]$ be nested. Then $I_n \subseteq I_1$, and hence $a_n \le b_1$ for all $n \in \mathbb{N}$. This just says that $\{a_n\}_{n=1}^\infty$ is bounded from above. By the LUB-axiom, the supremum $x = \sup\{a_n\}_{n=1}^\infty$ exists in $\mathbb{R}$. Of course, $a_n \le x$ for all $n \in \mathbb{N}$. Since $\{I_n\}_{n=1}^\infty$ is decreasing, it follows that
$$a_1 \le a_2 \le a_3 \le \cdots \le a_n \le \cdots \quad\text{and}\quad b_1 \ge b_2 \ge b_3 \ge \cdots \ge b_n \ge \cdots$$


and consequently $a_m \le a_n < b_n$ when $m \le n$, as well as $a_m < b_m \le b_n$ when $m \ge n$. In both cases we have $a_m \le b_n$, so $b_n$ is an upper bound of $\{a_m\}_{m=1}^\infty$, and hence $x \le b_n$. Therefore, $x \in \bigcap_{n=1}^\infty I_n$.

(ii) By hypothesis, $S$ is a subset of some closed interval $I_1 = [a_1, b_1]$. Upon subdividing $I_1$ into the two equal intervals $[a_1, 2^{-1}(a_1 + b_1)]$ and $[2^{-1}(a_1 + b_1), b_1]$, and noticing that $S$ is infinite, we find that one of these two intervals, denoted $I_2$, must contain infinitely many points of $S$. Continuing this argument produces a nested sequence of closed intervals $I_n = [a_n, b_n]$, each containing infinitely many points of $S$. According to (i), there is a point $x \in \bigcap_{n=1}^\infty I_n$. This point is a cluster point of $S$. In fact, for any $\varepsilon > 0$ there is an $n_0 \in \mathbb{N}$ such that $b_n - a_n = 2^{1-n}(b_1 - a_1) < \varepsilon$ for $n > n_0$. Since $I_n$ contains $x$, it must be the case that $I_n \subseteq (x - \varepsilon, x + \varepsilon)$ when $n > n_0$. In particular, $I_{n_0+1} \subseteq (x - \varepsilon, x + \varepsilon)$, and since $I_{n_0+1}$ contains infinitely many points of $S$, we get $(S \setminus \{x\}) \cap (x - \varepsilon, x + \varepsilon) \neq \emptyset$. In other words, $x$ is a cluster point of $S$.

(iii) Suppose the conclusion were false, that is, $\mathcal{F}$ has no finite subcovering of $I$. As in (ii), let $I_1 = I = [a_1, b_1]$. By bisecting $I_1$ we obtain the two intervals $[a_1, 2^{-1}(a_1 + b_1)]$ and $[2^{-1}(a_1 + b_1), b_1]$. Since $I_1$ cannot be covered by finitely many intervals in $\mathcal{F}$, at least one of these two subintervals, labelled $I_2 = [a_2, b_2]$, cannot be covered by finitely many intervals in $\mathcal{F}$. Repeating this process, we find a nested sequence of closed intervals $I_n = [a_n, b_n]$, none of which can be covered by finitely many intervals in $\mathcal{F}$. An application of (i) yields a point $x \in \bigcap_{n=1}^\infty I_n$. Furthermore, $x \in I_1$ produces an open interval $(a, b)$ in $\mathcal{F}$ such that $a < x < b$. Now, if $\delta$ is the smaller of $x - a$ and $b - x$, then there is an $n_0 \in \mathbb{N}$ such that $n > n_0$ implies $b_n - a_n < \delta$. Since $x \in I_n$, it follows that $I_n \subseteq (a, b)$ for $n > n_0$; that is, $I_n$ is covered by the single interval $(a, b) \in \mathcal{F}$. But this contradicts the choice of $I_n$. Therefore, $\mathcal{F}$ contains a finite subcover of $I$.

Remark 1.2.5.
The following simple comments may help us better understand Theorem 1.2.4. (i) The requirement "closed" in the nested interval property cannot be replaced by "open", "open-closed" or "closed-open". For example, if $I_n = (0, n^{-1})$ or $(0, n^{-1}]$ or $[-n^{-1}, 0)$, then $\bigcap_{n=1}^\infty I_n = \emptyset$.


(ii) A finite set has no cluster point. Of course, an infinite set may also fail to have cluster points, for example $\mathbb{N}$. Nevertheless, according to the Bolzano-Weierstrass theorem, any infinite subset of $\mathbb{R}$ that is bounded from above and below must have at least one cluster point. (iii) Although $(0, 1) \subseteq \bigcup_{n=3}^\infty (n^{-1}, 1 - n^{-1})$, the open interval $(0, 1)$ cannot be covered by finitely many intervals from $\{(n^{-1}, 1 - n^{-1})\}_{n=3}^\infty$. However, this does not violate the Heine-Borel property, which is asserted only for finite closed intervals.
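Returning to Theorem 1.2.3 (v), the intervals $I(x)$ of its proof can be computed explicitly when the open set happens to be given as a finite union of open intervals. The Python sketch below assumes exactly that (the theorem itself covers arbitrary open sets); the name `open_components` is hypothetical. Two overlapping intervals lie in the same $I(x)$; when one interval starts exactly at the right endpoint of the current component, that endpoint is not in the union (compare $a, b \notin O$ in the proof), so a new component begins.

```python
def open_components(intervals):
    """Given finitely many open intervals (a, b) with a < b, return the pairwise disjoint
    open intervals whose union equals their union, sorted from left to right.
    These are the intervals I(x) from the proof of Theorem 1.2.3 (v)."""
    components = []
    for a, b in sorted(intervals):
        if a >= b:
            raise ValueError(f"not an open interval: ({a}, {b})")
        if components and a < components[-1][1]:
            # (a, b) overlaps the current component, so both lie in the same I(x).
            left, right = components[-1]
            components[-1] = (left, max(right, b))
        else:
            # The right endpoint of the current component is not in the union,
            # so a new component starts here.
            components.append((a, b))
    return components

print(open_components([(0, 1), (0.5, 2), (2, 3), (5, 6), (5.5, 7)]))
# [(0, 2), (2, 3), (5, 7)]
```

Applied to the finitely many intervals $(n^{-1}, 1 - n^{-1})$, $3 \le n \le N$, from Remark 1.2.5 (iii), the sketch returns the single component $(N^{-1}, 1 - N^{-1})$.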

We close this section by discussing sequences of reals and their limits.

Definition 1.2.6. (i) A sequence $\{s_n\}_{n=1}^\infty$ in $\mathbb{R}$ is a function from $\mathbb{N}$ into $\mathbb{R}$. We also use $\{s_n\}_{n=1}^\infty = \{s_n : n \in \mathbb{N}\}$ to denote the range of the sequence.

(ii) A sequence $\{s_n\}_{n=1}^\infty$ is called bounded provided that its range is bounded, i.e., there exists a $b > 0$ such that $\sup\{|s_n|\}_{n=1}^\infty \le b$. (iii) For $s \in \mathbb{R}$ and a sequence $\{s_n\}_{n=1}^\infty$, we say that $\{s_n\}_{n=1}^\infty$ converges (or is convergent) to $s$, denoted either $\lim_{n\to\infty} s_n = s$ or $s_n \to s$, if and only if for any $\varepsilon > 0$ there exists an $n_0 \in \mathbb{N}$ such that $n > n_0$ implies $|s_n - s| < \varepsilon$. A sequence that is not convergent is called divergent.

It is easy to see that a convergent sequence has only one limit and is bounded. Moreover, if $\lim_{n\to\infty} s_n = s \neq 0$, then for $\varepsilon = 2^{-1}|s|$ there is an $n_0 \in \mathbb{N}$ such that $|s_n - s| < \varepsilon$ for $n > n_0$, and hence $|s_n| \ge |s| - \varepsilon = 2^{-1}|s|$. Using these facts and the triangle inequality, we can establish the following limit theorem.

Theorem 1.2.7. For $s, t \in \mathbb{R}$ and sequences $\{s_n\}_{n=1}^\infty$ and $\{t_n\}_{n=1}^\infty$ in $\mathbb{R}$, let $\lim_{n\to\infty} s_n = s$ and $\lim_{n\to\infty} t_n = t$. Then:

(i) $\lim_{n\to\infty}(s_n + t_n) = s + t$; (ii) $\lim_{n\to\infty}(s_n t_n) = st$; (iii) $\lim_{n\to\infty}(c s_n) = cs$ for any $c \in \mathbb{R}$;

(iv) $\lim_{n\to\infty}(s_n t_n^{-1}) = s t^{-1}$ provided that $t \neq 0$ and $t_n \neq 0$ for each $n \in \mathbb{N}$; (v) $s \le t$ provided that $s_n \le t_n$ for each $n \in \mathbb{N}$.

Proof. We leave the demonstration as an exercise.

Definition 1.2.8.


(i) A function $f$ from $\mathbb{N}$ into $\mathbb{N}$ is called strictly increasing provided $f(m) < f(n)$ for $m, n \in \mathbb{N}$ with $m < n$. Moreover, a sequence $\{x_n\}_{n=1}^\infty$ in $\mathbb{R}$ is called a subsequence of a given sequence $\{s_n\}_{n=1}^\infty$ in $\mathbb{R}$ when there is a strictly increasing function $f : \mathbb{N} \to \mathbb{N}$ such that $x_n = s_{f(n)}$ for each $n \in \mathbb{N}$; writing $n_k = f(k)$, we denote the subsequence by $\{s_{n_k}\}_{k=1}^\infty$ and note that $n_k \ge k$. (ii) A sequence $\{s_n\}_{n=1}^\infty$ in $\mathbb{R}$ is called increasing (respectively, strictly increasing) if $s_n \le s_{n+1}$ (respectively, $s_n < s_{n+1}$) for every $n \in \mathbb{N}$. Similarly, a sequence $\{s_n\}_{n=1}^\infty$ in $\mathbb{R}$ is called decreasing (respectively, strictly decreasing) if $s_n \ge s_{n+1}$ (respectively, $s_n > s_{n+1}$) for every $n \in \mathbb{N}$. Furthermore, a sequence is called monotonic provided it is either increasing or decreasing.

The following result, in which (i) characterizes the convergence of a sequence by its subsequences and (ii) is the monotone convergence theorem, is natural, important and useful.

Theorem 1.2.9. Let $\{s_n\}_{n=1}^\infty$ be a sequence in $\mathbb{R}$. Then:

(i) $\{s_n\}_{n=1}^\infty$ converges to $s \in \mathbb{R}$ if and only if each subsequence of $\{s_n\}_{n=1}^\infty$ converges to $s$; (ii) $\{s_n\}_{n=1}^\infty$ is convergent whenever it is monotonic and bounded.

Proof. (i) Suppose that $s_n \to s$ and $\{s_{n_k}\}_{k=1}^\infty$ is a subsequence of $\{s_n\}_{n=1}^\infty$. Then for $\varepsilon > 0$ there is an $n_0 \in \mathbb{N}$ such that $n > n_0$ implies $|s_n - s| < \varepsilon$; consequently, if $k > n_0$, then $n_k \ge k > n_0$ and hence $|s_{n_k} - s| < \varepsilon$, that is, $s_{n_k} \to s$. Conversely, suppose that every subsequence of $\{s_n\}_{n=1}^\infty$ converges to $s$. If $\{s_n\}_{n=1}^\infty$ does not converge to $s$, then, by negating the definition of the limit, there is an $\varepsilon_0 > 0$ such that for every $N \in \mathbb{N}$ there is an $n > N$ with $|s_n - s| \ge \varepsilon_0$. Choosing $n_1 < n_2 < \cdots$ inductively in this way, we obtain a subsequence $\{s_{n_k}\}_{k=1}^\infty$ with $|s_{n_k} - s| \ge \varepsilon_0$ for all $k$, which does not converge to $s$, a contradiction. (ii) It suffices to consider the case that $\{s_n\}_{n=1}^\infty$ is increasing and bounded, since the other case can be handled similarly. By the LUB-axiom, $a = \sup\{s_n\}_{n=1}^\infty$ is a (finite) real number. Also, by definition, for every $\varepsilon > 0$ there is an $n_0 \in \mathbb{N}$ such that $a - \varepsilon < s_{n_0}$. Hence $a - \varepsilon < s_{n_0} \le s_n \le a$ for $n > n_0$. Consequently, $s_n \to a$.

Without using subsequences, we can also characterize the convergence of a sequence. To see this, we introduce the notion of a Cauchy sequence.

Definition 1.2.10. A sequence $\{s_n\}_{n=1}^\infty$ in $\mathbb{R}$ is called a Cauchy sequence provided that for every $\varepsilon > 0$ there exists an $n_0 \in \mathbb{N}$ such that $|s_m - s_n| < \varepsilon$ for $m, n > n_0$.


The next property is the so-called completeness of $\mathbb{R}$.

Theorem 1.2.11. A sequence in $\mathbb{R}$ is convergent if and only if it is a Cauchy sequence.

Proof. Suppose that the sequence $\{s_n\}_{n=1}^\infty$ in $\mathbb{R}$ converges to $s$. Then for any $\varepsilon > 0$ there is an $n_0 \in \mathbb{N}$ such that $n > n_0$ implies $|s_n - s| < 2^{-1}\varepsilon$; consequently, if $m, n > n_0$, then by the triangle inequality $|s_m - s_n| \le |s_m - s| + |s_n - s| < \varepsilon$, namely, $\{s_n\}_{n=1}^\infty$ is a Cauchy sequence. Conversely, if $\{s_n\}_{n=1}^\infty$ is a Cauchy sequence, then there is an $n_0 \in \mathbb{N}$ such that $|s_n - s_{n_0+1}| < 1$, and thus $|s_n| < 1 + |s_{n_0+1}|$, whenever $n > n_0$. Accordingly, $\{s_n\}_{n=1}^\infty$ is bounded by $\max\{|s_1|, \dots, |s_{n_0}|, 1 + |s_{n_0+1}|\}$. Let us consider two cases.

Case 1: the range $\{s_n\}_{n=1}^\infty$ is finite. Then some value, say $s$, is taken by the sequence infinitely often, so there is a subsequence $\{s_{n_k}\}_{k=1}^\infty$ with $s_{n_k} = s$ for all $k \in \mathbb{N}$. Case 2: the range $\{s_n\}_{n=1}^\infty$ is infinite. Then, using the Bolzano-Weierstrass property, we again get a subsequence $\{s_{n_k}\}_{k=1}^\infty$ that converges to some $s \in \mathbb{R}$. In either case, for any $\varepsilon > 0$ there are $k_1, k_2 \in \mathbb{N}$ such that $|s_{n_k} - s| < 2^{-1}\varepsilon$ for $k > k_1$ and $|s_m - s_n| < 2^{-1}\varepsilon$ for $m, n > k_2$. Finally, if $n > k_3 = \max\{k_1, k_2\}$, then, taking any $k > k_3$ (so that $n_k \ge k > k_2$), we derive
$$|s_n - s| \le |s_n - s_{n_k}| + |s_{n_k} - s| < 2^{-1}\varepsilon + 2^{-1}\varepsilon = \varepsilon,$$
thereby proving $s_n \to s$.

Remark 1.2.12. (i) The argument for Theorem 1.2.11 yields the Bolzano-Weierstrass property for sequences: any bounded sequence in $\mathbb{R}$ has a convergent subsequence. (ii) Also, Theorem 1.2.11 provides another way to look at the difference between $\mathbb{R}$ and $\mathbb{Q}$. To be more specific, let $\{s_n\}_{n=1}^\infty$ be a sequence of rationals with $s_n \to \sqrt{2}$ (such a sequence exists by the density of $\mathbb{Q}$ in $\mathbb{R}$). Then this sequence is a Cauchy sequence in $\mathbb{Q}$, but its limit $\sqrt{2}$ lies in $\mathbb{R} \setminus \mathbb{Q}$. In other words, $\mathbb{Q}$ is not complete, but $\mathbb{R}$ is.
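One concrete rational sequence as in Remark 1.2.12 (ii) comes from the bisection used in the proof of Theorem 1.2.4. The Python sketch below is an illustration under our own choices (starting interval $[1, 2]$, left endpoints as the sequence, hypothetical name `sqrt2_bisection`); `fractions.Fraction` keeps every term exactly rational. Since $a_n \le a_m \le \sqrt{2} < b_n$ and $b_n - a_n = 2^{-n}$, we get $|a_m - a_n| \le 2^{-\min(m, n)}$, so the sequence is Cauchy in $\mathbb{Q}$.

```python
from fractions import Fraction

def sqrt2_bisection(steps):
    """Left endpoints a_0, ..., a_steps of nested rational intervals [a_n, b_n] with
    a_n^2 < 2 < b_n^2 and b_n - a_n = 2^(-n), obtained by repeated bisection."""
    a, b = Fraction(1), Fraction(2)  # 1^2 < 2 < 2^2
    left_endpoints = [a]
    for _ in range(steps):
        m = (a + b) / 2
        # m * m == 2 is impossible for rational m (Theorem 1.2.1 (i)), so exactly one
        # of the halves [a, m], [m, b] keeps 2 strictly between the squared endpoints.
        if m * m < 2:
            a = m
        else:
            b = m
        left_endpoints.append(a)
    return left_endpoints

s = sqrt2_bisection(30)
# Every term is rational with s_n^2 < 2, and |s_m - s_n| <= 2^(-min(m, n)),
# so the sequence is Cauchy in Q, while its limit sqrt(2) is not in Q.
print(s[-1], float(s[-1]))
```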


Problems

1.1. The symmetric difference of two given sets $X$ and $Y$ is defined as $(X \cup Y) \setminus (X \cap Y)$. Prove that $(X \cup Y) \setminus (X \cap Y) = (X \setminus Y) \cup (Y \setminus X)$.

1.2. For $j \in \mathbb{N}$ let $X_j = [0, j^{-1}]$. Find $\bigcup_{j=1}^\infty X_j$ and $\bigcap_{j=1}^\infty X_j$.

1.3. Given a set $X$, let $\{X_j\}_{j\in I}$ be a family of sets indexed by $I$. Prove that $X \cap \big(\bigcup_{j\in I} X_j\big) = \bigcup_{j\in I} (X \cap X_j)$ and $X \cup \big(\bigcap_{j\in I} X_j\big) = \bigcap_{j\in I} (X \cup X_j)$.

1.4. Prove that if $X$, $Y$ and $Z$ are sets, then $X \times (Y \cup Z) = (X \times Y) \cup (X \times Z)$.

1.5. Let $(a, b)$ be an open interval of $\mathbb{R}$. Find a bijection from $(0, 1)$ onto $(a, b)$, and then one onto $(0, \infty)$.

1.6. Prove that if $f : X \to Y$ and $g : Y \to Z$ are two functions and $C$ is a subset of $Z$, then $(g \circ f)^{-1}(C) = f^{-1}\big(g^{-1}(C)\big)$.

1.7. Prove that if $X$ is an infinite set and $x \in X$, then $X \sim_R X \setminus \{x\}$.

1.8. Suppose that $f$ is a one-to-one function from $X$ into $Y$ with $Y$ countable. Show that $X$ is countable.

1.9. Prove that if $X$ is the set of all functions $f : [0, 1] \to \mathbb{R}$, then there is no function from $[0, 1]$ onto $X$.

1.10. Use the field axioms to show the following properties. (i) If $a, b \in \mathbb{R}$ with $a \neq 0$, then the equation $ax = b$ has one and only one solution. (ii) If $a \in \mathbb{R}$ is nonzero, then $(a^{-1})^{-1} = a$.

(iii) If $a, b \in \mathbb{R}$ are nonzero, then $(ab)^{-1} = a^{-1} b^{-1}$.

1.11. Use mathematical induction to prove that $|a_1 + a_2 + \cdots + a_n| \le |a_1| + |a_2| + \cdots + |a_n|$ for all $a_1, a_2, \dots, a_n \in \mathbb{R}$.

1.12. Let $a, b \ge 0$. Prove that $a \le b$ if and only if $a^3 \le b^3$.


1.13. For two nonempty bounded subsets $X$ and $Y$ of $\mathbb{R}$ let $X + Y = \{x + y : x \in X \text{ and } y \in Y\}$. Prove that
$$\sup(X + Y) = \sup X + \sup Y \quad\text{and}\quad \inf(X + Y) = \inf X + \inf Y.$$

1.14. In the notation of Theorem 1.2.4 and its proof, verify the following three facts. (i) $a = \lim_{n\to\infty} a_n$ and $b = \lim_{n\to\infty} b_n$ are in $\bigcap_{n=1}^\infty I_n$. (ii) $[a, b] = \bigcap_{n=1}^\infty I_n$. (iii) $\bigcap_{n=1}^\infty I_n$ is a single point provided that $\lim_{n\to\infty}(b_n - a_n) = 0$.

1.15. Prove Theorem 1.2.7.

1.16. A sequence $\{s_n\}_{n=1}^\infty$ is said to diverge to $\infty$ (respectively $-\infty$), denoted $\lim_{n\to\infty} s_n = \infty$ (respectively $\lim_{n\to\infty} s_n = -\infty$), provided that for any $a \in \mathbb{R}$ there is an $n_0 \in \mathbb{N}$ such that $n > n_0$ implies $s_n > a$ (respectively $s_n < a$). Prove that if $\{s_n\}_{n=1}^\infty$ is a sequence of positive numbers, then $\lim_{n\to\infty} s_n = \infty$ if and only if $\lim_{n\to\infty} s_n^{-1} = 0$.

1.17. For $n \in \mathbb{N}$ let $s_n = 1 + 2^{-1} + \cdots + n^{-1}$. Prove that: (i) $\{s_n\}_{n=1}^\infty$ is increasing but divergent; (ii) $\{s_n\}_{n=1}^\infty$ is not a Cauchy sequence, although $\lim_{n\to\infty} |s_{n+1} - s_n| = 0$.

1.18. Let $\{s_n\}_{n=1}^\infty$ be a bounded sequence in $\mathbb{R}$. A subsequential limit of $\{s_n\}_{n=1}^\infty$ is a real number that is the limit of some subsequence of $\{s_n\}_{n=1}^\infty$. If $S$ is the set of all subsequential limits of $\{s_n\}_{n=1}^\infty$, then we define the limit superior (upper limit) and the limit inferior (lower limit) of $\{s_n\}_{n=1}^\infty$ to be
$$\limsup_{n\to\infty} s_n = \sup S \quad\text{and}\quad \liminf_{n\to\infty} s_n = \inf S.$$
Prove the following two facts. (i) $\limsup_{n\to\infty} s_n = \lim_{m\to\infty} \sup\{s_n : n > m\}$ and $\liminf_{n\to\infty} s_n = \lim_{m\to\infty} \inf\{s_n : n > m\}$. (ii) $\{s_n\}_{n=1}^\infty$ is convergent if and only if $\limsup_{n\to\infty} s_n = \liminf_{n\to\infty} s_n$.

Chapter 2

Riemann Integrals

The Riemann integral, the first rigorous definition of the integral of a function on an interval, created by Bernhard Riemann, is the integral normally encountered in real analysis and used by physicists and engineers. In this chapter, we discuss the definition and basic properties of the Riemann integral for real-valued functions of one real variable. Here we are concerned with the simplest results, including Riemann sums, the integrability of continuous functions and the fundamental theorem of calculus. Moreover, we consider the improper integral as the limit of Riemann integrals on an interval when one endpoint, or both endpoints, of the interval approach either a specified real number or infinity.

2.1 Definitions, Examples and Basic Properties

From now on, we denote the implication "if A then B" by $A \Rightarrow B$ whenever appropriate. Unless otherwise stated, $[a, b]$ always stands for a finite closed interval in $\mathbb{R}$. Let $f$ be a real-valued function on the interval $[a, b]$. A partition $P$ of $[a, b]$ is a set $\{x_k\}_{k=0}^n$ satisfying $a = x_0 < x_1 < \cdots < x_n = b$, $n \in \mathbb{N}$. The length of the subinterval $[x_{k-1}, x_k]$ is written as $\Delta x_k = x_k - x_{k-1}$, and the norm $\|P\|$ of the partition $P$ is defined by $\|P\| = \max\{\Delta x_k : k = 1, 2, \dots, n\}$. Thus $P$ determines $n$ subintervals of $[a, b]$, of which the largest has length $\|P\|$, and the subinterval $[x_{k-1}, x_k]$ is called the $k$-th subinterval associated with $P$. In each of the $n$ subintervals, choose a number $\xi_k \in [x_{k-1}, x_k]$, and form the following sum:
$$S(f, P, \xi) = S(f, P, \{\xi_k\}_{k=1}^n) = \sum_{k=1}^n f(\xi_k)\,\Delta x_k.$$

This is called a Riemann sum for the function $f$ on $[a, b]$. With some familiarity with integral calculus, we immediately recognize that if $f$ is nonnegative, then $S(f, P, \xi)$ approximates the area of the region between the graph of $f$ and the horizontal axis. Needless to say, if $f$ is not continuous on $[a, b]$, then it may be hard to think of the graph of $f$ as a boundary of any such region. Another interpretation, which has the advantage of being free of geometrical dependence, is that $S(f, P, \xi)$ represents a certain average value of $f$ on $[a, b]$. This can be seen as follows:
$$(b - a)^{-1} S(f, P, \xi) = \sum_{k=1}^n f(\xi_k)\,(b - a)^{-1}\Delta x_k.$$
In the sum, $f(\xi_k)$ is multiplied by $(b - a)^{-1}\Delta x_k$, which is the fraction of $[a, b]$ occupied by the subinterval from which $\xi_k$ is chosen. Since these weights are nonnegative and add up to $1$, the sum is a weighted average of the $n$ values $\{f(\xi_k)\}_{k=1}^n$; multiplying this weighted average by the length $b - a$ of $[a, b]$, we get $S(f, P, \xi)$. Therefore it is natural to expect that $(b - a)^{-1} S(f, P, \xi)$ gives an average of the values of $f$ on $[a, b]$. This average is a better indicator of the behavior of $f$ if $P$ determines only small subintervals, so we examine the behavior of $S(f, P, \xi)$ as $\|P\|$ tends to $0$.

Definition 2.1.1. $f$ is called Riemann integrable on $[a, b]$ provided $\lim_{\|P\| \to 0} S(f, P, \xi)$ exists. In this case the limit value, denoted $\int_a^b f(x)\,dx$, is called the Riemann integral of $f$ on $[a, b]$. The set of all Riemann integrable functions on $[a, b]$ is written as $\mathcal{R}[a, b]$.

Remark 2.1.2. We must acknowledge that the limit $\lim_{\|P\| \to 0} S(f, P, \xi)$ in Definition 2.1.1 is neither a sequential limit nor a function limit (because $S(f, P, \xi)$ is not a function of $\|P\|$). In order to fully appreciate the latter assertion, note that for a given value of $\|P\|$ there could be many partitions of $[a, b]$ whose largest subinterval has length $\|P\|$. For example, $[0, 1]$ may be partitioned by
$$P_1 = \{0, 3^{-1}, 2\cdot 3^{-1}, 1\};\quad P_2 = \{0, 3^{-1}, 2^{-1}, 3\cdot 2^{-2}, 1\};\quad P_3 = \{0, 5^{-1}, 2\cdot 5^{-1}, 2^{-1}, 2\cdot 3^{-1}, 1\},$$
and then $\|P_1\| = \|P_2\| = \|P_3\| = 3^{-1}$.


Moreover, for each partition of $[a, b]$ having the given norm $\|P\|$, there are many ways of choosing $\{\xi_k\}_{k=1}^n$ from the $n$ subintervals. Thus the value of $\|P\|$ does not determine $S(f, P, \xi)$. It is therefore necessary to give a complete description of the limit concept used in Definition 2.1.1.

Definition 2.1.3. That $\lim_{\|P\| \to 0} S(f, P, \xi) = s \in \mathbb{R}$ exists means that for every $\varepsilon > 0$ there exists a $\delta > 0$ such that for every partition $P$ of $[a, b]$ with $\|P\| < \delta$ and any choice of the points $\{\xi_k \in [x_{k-1}, x_k]\}_{k=1}^n$, the inequality $|S(f, P, \xi) - s| < \varepsilon$ holds.

Usually, a direct argument for the existence of $\lim_{\|P\| \to 0} S(f, P, \xi)$ is too complicated to handle, but Definition 2.1.3 of $\int_a^b f(x)\,dx$ has certain advantages for developing the theory. In the forthcoming section we prove an equivalent formulation of the limit which is easier to use in proving the integrability of particular functions. Meanwhile, we first deal with several examples.

Example 2.1.4. If $f$ is a constant $\kappa$ on $[a, b]$, then $f \in \mathcal{R}[a, b]$ and $\int_a^b f(x)\,dx = \kappa(b - a)$.

In fact, for any partition $P$ of $[a, b]$ we have
$$S(f, P, \xi) = \kappa \sum_{k=1}^n \Delta x_k = \kappa(x_n - x_0) = \kappa(b - a).$$
Thus, for a given $\varepsilon > 0$, the inequality $|S(f, P, \xi) - \kappa(b - a)| = 0 < \varepsilon$ is satisfied trivially. Hence
$$\int_a^b f(x)\,dx = \lim_{\|P\| \to 0} S(f, P, \xi) = \kappa(b - a).$$
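Remark 2.1.2 can be checked with exact arithmetic: the three partitions $P_1$, $P_2$, $P_3$ share the norm $3^{-1}$, yet their Riemann sums differ, and for a fixed partition the sum also depends on the tags. The Python sketch below is illustrative only; it uses $f(x) = x$ as a test function of our own choosing, left or right endpoints as tags, and hypothetical helper names `norm`, `riemann_sum` and `identity`.

```python
from fractions import Fraction as F

def norm(P):
    """||P||: the length of the largest subinterval of the partition P = [x_0, ..., x_n]."""
    return max(P[k] - P[k - 1] for k in range(1, len(P)))

def riemann_sum(f, P, tags):
    """S(f, P, xi) = sum over k of f(xi_k) * (x_k - x_{k-1}), with tags[k-1] = xi_k."""
    assert len(tags) == len(P) - 1
    assert all(P[k - 1] <= t <= P[k] for k, t in enumerate(tags, start=1))
    return sum(f(t) * (P[k] - P[k - 1]) for k, t in enumerate(tags, start=1))

def identity(x):
    return x

P1 = [F(0), F(1, 3), F(2, 3), F(1)]
P2 = [F(0), F(1, 3), F(1, 2), F(3, 4), F(1)]
P3 = [F(0), F(1, 5), F(2, 5), F(1, 2), F(2, 3), F(1)]

for P in (P1, P2, P3):
    left = riemann_sum(identity, P, P[:-1])   # xi_k = x_{k-1}
    right = riemann_sum(identity, P, P[1:])   # xi_k = x_k
    print(norm(P), left, right)
# 1/3 1/3 2/3
# 1/3 53/144 91/144
# 1/3 347/900 553/900
```

For $f(x) = x$ the left and right sums always add up to $\sum_k (x_k^2 - x_{k-1}^2) = 1$, which gives a quick consistency check on the printed values.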

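Example 2.1.5. Let $c$ be a fixed point of $[a, b]$ and
$$f(x) = \begin{cases} 1, & x = c,\\ 0, & x \neq c.\end{cases}$$
Then $f \in \mathcal{R}[a, b]$ and $\int_a^b f(x)\,dx = 0$.

As a matter of fact, for any $S(f, P, \xi)$ corresponding to a partition $P$ of $[a, b]$ we have $|S(f, P, \xi)| \le 2\|P\|$, where the factor $2$ appears since $c$ may be one of the partition points $x_k$, in which case we may have both $\xi_k$ and $\xi_{k+1}$ equal to $c$. Thus, given $\varepsilon > 0$, $\|P\| < 2^{-1}\varepsilon$ implies $|S(f, P, \xi)| < \varepsilon$, so $\int_a^b f(x)\,dx = 0$.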


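Example 2.1.6. Let $[\alpha, \beta] \subseteq [a, b]$ with $\alpha < \beta$, and let
$$f(x) = \begin{cases} 1, & x \in (\alpha, \beta),\\ 0, & x \in [a, b] \setminus (\alpha, \beta).\end{cases}$$
Then $f \in \mathcal{R}[a, b]$ and $\int_a^b f(x)\,dx = \beta - \alpha$.

To verify this, suppose that $P = \{x_0, x_1, \dots, x_n\}$ is a partition of $[a, b]$ obeying $\|P\| < \delta$, and $S(f, P, \xi)$ corresponds to this $P$. Since $f(\xi_k)$ is $1$ or $0$ according as $\xi_k \in (\alpha, \beta)$ or not, we have
$$S(f, P, \xi) = \sum_{\xi_k \in (\alpha, \beta)} \Delta x_k.$$
Now choose $l, j \in \{1, 2, \dots, n\}$ such that $x_{l-1} \le \alpha < x_l$ and $x_{j-1} < \beta \le x_j$. Then $\xi_k \in (\alpha, \beta)$ whenever $l + 1 \le k \le j - 1$, while $\xi_k \notin (\alpha, \beta)$ whenever $k < l$ or $k > j$. Therefore
$$\sum_{l+1 \le k \le j-1} \Delta x_k \le S(f, P, \xi) \le \sum_{l \le k \le j} \Delta x_k.$$
By the choice of $l$ and $j$, we see that $\beta - \alpha \le x_j - x_{l-1} \le (j - l + 1)\|P\|$, so if $\delta \le 3^{-1}(\beta - \alpha)$, then $j - l + 1 > 3$, hence $l + 1 \le j - 1$, and so
$$x_{j-1} - x_l \le S(f, P, \xi) \le x_j - x_{l-1}.$$
Hence
$$(x_{j-1} - \beta) - (x_l - \alpha) \le S(f, P, \xi) - (\beta - \alpha) \le (x_j - \beta) - (x_{l-1} - \alpha).$$
Note that each of $x_{j-1} - \beta$, $x_l - \alpha$, $x_j - \beta$ and $x_{l-1} - \alpha$ has absolute value at most $\|P\|$. So, by the triangle inequality, $|S(f, P, \xi) - (\beta - \alpha)| \le 2\|P\| < 2\delta$. Given $\varepsilon > 0$, choosing $\delta = \min\{2^{-1}\varepsilon, 3^{-1}(\beta - \alpha)\}$ shows that $\int_a^b f(x)\,dx = \beta - \alpha$.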
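The key estimate of Example 2.1.6 is that $|S(f, P, \xi) - (\beta - \alpha)|$ never exceeds $2\|P\|$. The Python sketch below only samples random partitions and random tags to observe this ratio; sampling is evidence, not a proof, and floating-point rounding may push the ratio above $2$ by a negligible amount. The helper names `riemann_sum` and `worst_ratio`, the interval $[0, 1]$ and the sampling scheme are our own assumptions.

```python
import random

def riemann_sum(f, P, tags):
    """S(f, P, xi) = sum over k of f(xi_k) * (x_k - x_{k-1}), with tags[k-1] = xi_k."""
    return sum(f(t) * (P[k] - P[k - 1]) for k, t in enumerate(tags, start=1))

def worst_ratio(alpha, beta, a=0.0, b=1.0, trials=2000, seed=1):
    """Largest observed |S(f, P, xi) - (beta - alpha)| / ||P|| over random partitions of
    [a, b] and random tags, where f is the indicator of (alpha, beta) as in Example 2.1.6.
    The argument in the text bounds this ratio by 2."""
    def f(x):
        return 1.0 if alpha < x < beta else 0.0

    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        cuts = (rng.uniform(a, b) for _ in range(rng.randint(1, 30)))
        P = sorted({a, b, *cuts})
        tags = [rng.uniform(lo, hi) for lo, hi in zip(P, P[1:])]
        norm = max(hi - lo for lo, hi in zip(P, P[1:]))
        worst = max(worst, abs(riemann_sum(f, P, tags) - (beta - alpha)) / norm)
    return worst

print(worst_ratio(0.25, 0.6))
```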

Example 2.1.7. If
$$f(x) = \begin{cases} 1, & x \in \mathbb{Q},\\ 0, & x \notin \mathbb{Q},\end{cases}$$
then $f \notin \mathcal{R}[a, b]$ for any interval $[a, b]$.

To prove this, we take $\varepsilon = 2^{-1}(b - a)$ and use Theorem 1.2.1 (ii), the density of the rational numbers in $\mathbb{R}$, together with the density of the irrational numbers in $\mathbb{R}$ (which follows by applying Theorem 1.2.1 (ii) to $x - \sqrt{2}$). For any partition $P$ of $[a, b]$, no matter how small $\|P\|$ may be, each subinterval associated with $P$ contains both rational and irrational numbers, say $\hat{\xi}_k \in [x_{k-1}, x_k] \cap \mathbb{Q}$ and $\breve{\xi}_k \in [x_{k-1}, x_k] \cap (\mathbb{R} \setminus \mathbb{Q})$. Then
$$S(f, P, \hat{\xi}) = \sum_{k=1}^n f(\hat{\xi}_k)\,\Delta x_k = b - a \quad\text{and}\quad S(f, P, \breve{\xi}) = \sum_{k=1}^n f(\breve{\xi}_k)\,\Delta x_k = 0.$$
Since these two numbers are $b - a = 2\varepsilon$ units apart, there is no number $I$ that can be within $\varepsilon$ of both sums. Hence $\lim_{\|P\| \to 0} S(f, P, \xi)$ does not exist, so $f \notin \mathcal{R}[a, b]$.

Because we are working with an entirely new limit concept, we must begin by proving such basic results as the uniqueness of the limit.

Theorem 2.1.8. If $f \in \mathcal{R}[a, b]$, then $\int_a^b f(x)\,dx$ is unique and $f$ is bounded.

Proof. Suppose that $\lim_{\|P\| \to 0} S(f, P, \xi) = s$, let $t \neq s$, and take $\varepsilon = 2^{-1}|s - t|$. Since $\varepsilon$ is half of the distance between $s$ and $t$, the intervals $(s - \varepsilon, s + \varepsilon)$ and $(t - \varepsilon, t + \varepsilon)$ do not intersect. When $P$ is a partition of $[a, b]$ with $\|P\|$ sufficiently small, $S(f, P, \xi)$ lies in $(s - \varepsilon, s + \varepsilon)$ for any choice of $\{\xi_k\}_{k=1}^n$. Then $S(f, P, \xi)$ is not in $(t - \varepsilon, t + \varepsilon)$, so $|S(f, P, \xi) - t| \ge \varepsilon$. Hence $\lim_{\|P\| \to 0} S(f, P, \xi)$ cannot equal $t$.

Let us next prove the boundedness of $f$ on $[a, b]$. Let $s$ be the integral of $f$, and suppose that $f$ is not bounded on $[a, b]$. Then for any partition $P$ of $[a, b]$, $f$ must be unbounded on at least one of the subintervals associated with $P$, say $[x_{m-1}, x_m]$. For $k \neq m$, choose $\xi_k$ with $x_{k-1} \le \xi_k \le x_k$; this determines the number $|s| + \sum_{k \neq m} |f(\xi_k)|\,\Delta x_k$. Now we use the unboundedness of $f$ on $[x_{m-1}, x_m]$ to choose $\xi_m$ so that
$$|f(\xi_m)(x_m - x_{m-1})| > 1 + |s| + \sum_{k \neq m} |f(\xi_k)|\,\Delta x_k.$$
Thus the $m$-th term dominates $S(f, P, \xi)$, and we have $|S(f, P, \xi)| > 1 + |s|$, so $|S(f, P, \xi) - s| > 1$. Since this happens for partitions $P$ of arbitrarily small norm, $\lim_{\|P\| \to 0} S(f, P, \xi)$ cannot equal $s$, contradicting $f \in \mathcal{R}[a, b]$.

The next theorem shows that an integral can be evaluated on two adjacent intervals and the resulting values added.

Theorem 2.1.9. If $f \in \mathcal{R}[a, b]$ and $f \in \mathcal{R}[b, c]$, then $f \in \mathcal{R}[a, c]$ and
$$\int_a^c f(x)\,dx = \int_a^b f(x)\,dx + \int_b^c f(x)\,dx.$$

Proof. Any partition $P$ of $[a,c]$ determines partitions $P_1$ and $P_2$ of $[a,b]$ and $[b,c]$, respectively. If $b$ is not one of the partition points of $P$, it can be inserted between $x_{j-1}$ and $x_j$, where $j$ is the least integer such that $b \le x_j$. The number $\xi_j$ in the sum $S(f,P,\xi)$ is selected from the subinterval $[x_{j-1}, x_j]$, and we may have either $\xi_j \in [x_{j-1}, b]$ or $\xi_j \in [b, x_j]$. In the former case, we can write
$$\begin{aligned} S(f,P,\xi) &= \sum_{k=1}^{j-1} f(\xi_k)\Delta x_k + f(\xi_j)(b - x_{j-1}) - f(b)(x_j - b) + f(\xi_j)(x_j - b) + f(b)(x_j - b) + \sum_{k=j+1}^{n} f(\xi_k)\Delta x_k \\ &= S(f,P_1,\hat\xi) + \big(f(\xi_j) - f(b)\big)(x_j - b) + S(f,P_2,\breve\xi), \end{aligned}$$
where $\hat\xi = \{\xi_k\}_{k=1}^{j}$ and $\breve\xi = \{b\}\cup\{\xi_k\}_{k=j+1}^{n}$. Similarly, if $\xi_j \in [b, x_j]$, then
$$S(f,P,\xi) = S(f,P_1,\hat\xi) + \big(f(\xi_j) - f(b)\big)(b - x_{j-1}) + S(f,P_2,\breve\xi),$$
where $\hat\xi = \{\xi_k\}_{k=1}^{j-1}\cup\{b\}$ and $\breve\xi = \{\xi_k\}_{k=j}^{n}$. By Theorem 2.1.8, $f$ is bounded on $[a,b]$ and on $[b,c]$, hence on $[a,c]$. So there is a constant $\kappa > 0$ such that $|f(x)| < \kappa$ on $[a,c]$, and then
$$\big|S(f,P,\xi) - S(f,P_1,\hat\xi) - S(f,P_2,\breve\xi)\big| \le 2\kappa\|P\|,$$
due to $b - x_{j-1} \le x_j - x_{j-1} \le \|P\|$ and $x_j - b \le x_j - x_{j-1} \le \|P\|$. If $\varepsilon > 0$, then $f \in \mathcal{R}[a,b]\cap\mathcal{R}[b,c]$ allows us to choose a $\delta > 0$ so that $\max\{\|P_1\|, \|P_2\|\} < \delta$ implies
$$\Big|S(f,P_1,\hat\xi) - \int_a^b f(x)\,dx\Big| < 3^{-1}\varepsilon \quad\text{and}\quad \Big|S(f,P_2,\breve\xi) - \int_b^c f(x)\,dx\Big| < 3^{-1}\varepsilon.$$


Riemann Integrals

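We can also choose $\delta$ even smaller, if necessary, so that $\delta < \varepsilon/(6\kappa)$. Now since $\max\{\|P_1\|, \|P_2\|\} \le \|P\|$, we conclude that $\|P\| < \delta$ implies
$$\begin{aligned} \Big|S(f,P,\xi) - \Big(\int_a^b f(x)\,dx + \int_b^c f(x)\,dx\Big)\Big| &\le \big|S(f,P,\xi) - S(f,P_1,\hat\xi) - S(f,P_2,\breve\xi)\big| \\ &\quad + \Big|S(f,P_1,\hat\xi) - \int_a^b f(x)\,dx\Big| + \Big|S(f,P_2,\breve\xi) - \int_b^c f(x)\,dx\Big| \\ &< 2\kappa\|P\| + 3^{-1}\varepsilon + 3^{-1}\varepsilon < \varepsilon. \end{aligned}$$
Hence $f \in \mathcal{R}[a,c]$ and
$$\int_a^c f(x)\,dx = \int_a^b f(x)\,dx + \int_b^c f(x)\,dx.$$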
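The following Python code is a numerical illustration of Theorem 2.1.9. It is our own sketch, not part of the original argument. It forms Riemann sums with randomly chosen tags. The helper `riemann_sum`, the sample integrand $f(x) = x^2$ and the split point $b = 1$ on $[0,2]$ are assumptions made only for this example. Since $x^2$ is increasing on $[0,2]$, each of these sums lies within $0.004$ of $\int_0^2 x^2\,dx = 8/3$ for the step sizes used, whatever the tags.

```python
import random

def riemann_sum(f, a, b, n, rng):
    """Riemann sum S(f, P, xi) for the uniform partition of [a, b] into n pieces,
    with each tag xi_k drawn uniformly at random from [x_{k-1}, x_k)."""
    h = (b - a) / n
    total = 0.0
    for k in range(1, n + 1):
        left = a + (k - 1) * h
        xi = left + rng.random() * h   # an arbitrary tag in the k-th subinterval
        total += f(xi) * h
    return total

rng = random.Random(0)   # fixed seed so that the run is reproducible
f = lambda x: x * x      # sample integrand (our choice)

whole = riemann_sum(f, 0.0, 2.0, 2000, rng)                                    # sum over [a, c]
split = riemann_sum(f, 0.0, 1.0, 1000, rng) + riemann_sum(f, 1.0, 2.0, 1000, rng)  # over [a, b] and [b, c]
print(whole, split)      # both close to 8/3 = 2.666...
```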

Theorem 2.1.9 suggests that the conventions
$$\int_a^a f(x)\,dx = 0 \quad\text{and}\quad \int_a^b f(x)\,dx = -\int_b^a f(x)\,dx$$
should be adopted. Moreover, the argument of Theorem 2.1.9 shows that, when averaging a large set of numbers, a few values can be changed without altering the average very much. This is true for $S(f,P,\xi)$ and for the resulting integral value. Theorem 2.1.10. If $f \in \mathcal{R}[a,b]$ and $g = f$ at all but a finite number of points in $[a,b]$, then $g \in \mathcal{R}[a,b]$ and $\int_a^b g(x)\,dx = \int_a^b f(x)\,dx$. Proof. It suffices to prove the assertion when $g$ differs from $f$ at exactly one point of $[a,b]$. Any finite number of changes can then be produced by changing the value at one point and repeating the procedure finitely many times. So we may assume that $g(x) = f(x)$ for all $x \in [a,b]\setminus\{c\}$ and $g(c) \neq f(c)$. For any partition $P$, the point $c$ can lie in at most two of the subintervals, which happens when $c$ is a partition point, say, $[x_{m-1}, x_m]$ and $[x_m, x_{m+1}]$. Then
$$\begin{aligned} \Big|S(g,P,\xi) - \int_a^b f(x)\,dx\Big| &\le \Big|\sum_{k=1}^n \big(g(\xi_k) - f(\xi_k)\big)\Delta x_k\Big| + \Big|S(f,P,\xi) - \int_a^b f(x)\,dx\Big| \\ &\le |g(c) - f(c)|(x_m - x_{m-1}) + |g(c) - f(c)|(x_{m+1} - x_m) + \Big|S(f,P,\xi) - \int_a^b f(x)\,dx\Big| \\ &\le 2|g(c) - f(c)|\,\|P\| + \Big|S(f,P,\xi) - \int_a^b f(x)\,dx\Big|. \end{aligned}$$
Given $\varepsilon > 0$, choose $\delta > 0$ so small that $\delta \le \varepsilon/(4|g(c) - f(c)|)$ and, for any partition $P$ of $[a,b]$,
$$\|P\| < \delta \Rightarrow \Big|S(f,P,\xi) - \int_a^b f(x)\,dx\Big| < 2^{-1}\varepsilon.$$
Substituting this into the last estimate, we see that
$$\|P\| < \delta \Rightarrow \Big|S(g,P,\xi) - \int_a^b f(x)\,dx\Big| < 2|g(c) - f(c)|\,\delta + 2^{-1}\varepsilon \le \varepsilon.$$
It is worth pointing out that the finiteness in Theorem 2.1.10 is important; see Example 2.1.7.
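The estimate in the proof of Theorem 2.1.10 can be checked exactly with Python's `fractions.Fraction`. This is an illustration of ours. The setup is as follows:

- $f(x) = x$ on $[0,1]$;
- $g$ differs from $f$ only at $c = 1/2$, with $g(c) - f(c) = 1000$;
- the tags are chosen adversarially, so that $c$ is used in both subintervals that contain it.

The gap $S(g,P,\xi) - S(f,P,\xi)$ then equals $2|g(c) - f(c)|\,\|P\|$ exactly and shrinks as the partition is refined.

```python
from fractions import Fraction

def riemann_sum(func, points, tags):
    """S(func, P, xi) for partition points x_0 < ... < x_n and tags xi_1, ..., xi_n."""
    return sum(func(t) * (points[k + 1] - points[k]) for k, t in enumerate(tags))

c = Fraction(1, 2)
f = lambda x: x                                   # sample integrand (our choice)
g = lambda x: f(x) + 1000 if x == c else f(x)     # differs from f only at c

def worst_case_gap(n):
    """Uniform partition of [0, 1] into n (even) pieces, so that c = 1/2 is a
    partition point. Tags are midpoints, except that c is used as the tag of
    both subintervals touching c."""
    points = [Fraction(k, n) for k in range(n + 1)]
    tags = [(points[k] + points[k + 1]) / 2 for k in range(n)]
    m = n // 2            # points[m] == c
    tags[m - 1] = c       # right endpoint of [x_{m-1}, x_m]
    tags[m] = c           # left endpoint of [x_m, x_{m+1}]
    return riemann_sum(g, points, tags) - riemann_sum(f, points, tags)

for n in (10, 100, 1000):
    print(n, worst_case_gap(n))   # equals 2 * 1000 / n = 2|g(c) - f(c)| * ||P||
```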

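2.2 Algebraic Operations and the Darboux Criterion

Theorem 2.2.1. If $f_1, f_2 \in \mathcal{R}[a,b]$ and $c_1, c_2 \in \mathbb{R}$, then $c_1 f_1 + c_2 f_2 \in \mathcal{R}[a,b]$ and $\int_a^b \big(c_1 f_1(x) + c_2 f_2(x)\big)\,dx = c_1\int_a^b f_1(x)\,dx + c_2\int_a^b f_2(x)\,dx$. Proof. Since $S(cf,P,\xi) = c\,S(f,P,\xi)$ for every constant $c$, it suffices to prove that $f_1 + f_2 \in \mathcal{R}[a,b]$ and that the integral is additive. Assume $\varepsilon > 0$ and choose $\delta_1 > 0$ and $\delta_2 > 0$ such that
$$\|P\| < \delta_1 \Rightarrow \Big|S(f_1,P,\xi) - \int_a^b f_1(x)\,dx\Big| < 2^{-1}\varepsilon$$
and
$$\|P\| < \delta_2 \Rightarrow \Big|S(f_2,P,\xi) - \int_a^b f_2(x)\,dx\Big| < 2^{-1}\varepsilon.$$
Now, defining $\delta = \min\{\delta_1, \delta_2\}$ guarantees that if $\|P\| < \delta$, then
$$\Big|S(f_1 + f_2,P,\xi) - \Big(\int_a^b f_1(x)\,dx + \int_a^b f_2(x)\,dx\Big)\Big| \le \Big|S(f_1,P,\xi) - \int_a^b f_1(x)\,dx\Big| + \Big|S(f_2,P,\xi) - \int_a^b f_2(x)\,dx\Big| < \varepsilon,$$
and hence $f_1 + f_2 \in \mathcal{R}[a,b]$ and
$$\int_a^b \big(f_1(x) + f_2(x)\big)\,dx = \int_a^b f_1(x)\,dx + \int_a^b f_2(x)\,dx.$$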

When a limit is taken of an expression that is bounded above or below, the same bound controls the value of the limit. Corollary 2.2.2. If $f_1, f_2 \in \mathcal{R}[a,b]$ and $f_1(x) \le f_2(x)$ for all $x \in [a,b]$, then $\int_a^b f_1(x)\,dx \le \int_a^b f_2(x)\,dx$. Proof. Consider first the case in which $f_1$ is identically $0$. Then $f_2 \ge 0$ on $[a,b]$, so $S(f_2,P,\xi) \ge 0$ for every $P$ and $\xi$. Suppose $s_2 = \int_a^b f_2(x)\,dx < 0$ and take $\varepsilon = -s_2$. Then it is impossible to have $|S(f_2,P,\xi) - s_2| < \varepsilon$, because the distance between $S(f_2,P,\xi)$ and $s_2$ is at least $|s_2|$; hence $s_2 \ge 0$. Now for the general case $f_1 \le f_2$ on $[a,b]$, define $f = f_2 - f_1$. Clearly $f \ge 0$, and, by Theorem 2.2.1, $f \in \mathcal{R}[a,b]$. So the first case of this argument ensures
$$0 \le \int_a^b f(x)\,dx = \int_a^b f_2(x)\,dx - \int_a^b f_1(x)\,dx.$$

In order to prove that $\mathcal{R}[a,b]$ is closed under multiplication, we need a characterization of $\mathcal{R}[a,b]$ that does not rely on predicting the value of the limit $\lim_{\|P\|\to 0} S(f,P,\xi)$. This is the so-called Darboux criterion for integrability, named after Jean Gaston Darboux. In this sense, it is analogous to Cauchy's criterion for the convergence of sequences.


Let $f$ be a bounded function on $[a,b]$ and $P$ be a partition of $[a,b]$. Due to the boundedness of $f$ on each $[x_{k-1}, x_k]$, we may define
$$m_k = \inf\{f(x) : x \in [x_{k-1}, x_k]\} \quad\text{and}\quad M_k = \sup\{f(x) : x \in [x_{k-1}, x_k]\},$$
and form the lower sum and upper sum for $f$ with respect to $P$:
$$L(f,P) = \sum_{k=1}^n m_k\Delta x_k \quad\text{and}\quad U(f,P) = \sum_{k=1}^n M_k\Delta x_k.$$

Clearly, we have $L(f,P) \le S(f,P,\xi) \le U(f,P)$. The lower and upper sums need not be Riemann sums, however, since $m_k$ and $M_k$ need not be values of $f$. Let $P$ and $P^\star$ be partitions of $[a,b]$. If $P \subseteq P^\star$, then $P^\star$ is called a refinement of $P$. Thus, we can produce a refinement of $P$ by inserting additional partition points between those in $P$. It is obvious that if $P^\star$ is a refinement of $P$, then $\|P^\star\| \le \|P\|$. Also, any two partitions $P$ and $P^\star$ of $[a,b]$ have a common refinement. For example, $P \cup P^\star$ is a refinement of both $P$ and $P^\star$ because it contains both sets of partition points. Lemma 2.2.3. Let $P$ and $P^\star$ be partitions of $[a,b]$. Then $L(f,P) \le U(f,P^\star)$. Proof. Note that if partition points are added to a partition $P$, then the lower sums do not decrease and the upper sums do not increase. In fact, suppose $L(f,P) = \sum_{k=1}^n m_k\Delta x_k$ and we add a point $z \in (x_{j-1}, x_j)$ to form $P^\star = P\cup\{z\}$. Then
$$L(f,P^\star) = \sum_{k\neq j} m_k\Delta x_k + m^\dagger(z - x_{j-1}) + m^\ddagger(x_j - z),$$
where $m^\dagger$ and $m^\ddagger$ are the greatest lower bounds of $f$ on $[x_{j-1}, z]$ and $[z, x_j]$, respectively. Thus $m^\dagger \ge m_j$ and $m^\ddagger \ge m_j$, so
$$m^\dagger(z - x_{j-1}) + m^\ddagger(x_j - z) \ge m_j(z - x_{j-1}) + m_j(x_j - z) = m_j(x_j - x_{j-1}).$$
Hence $L(f,P^\star) \ge L(f,P)$. Similarly, $U(f,P^\star) \le U(f,P)$. Adding points one at a time shows that the same inequalities hold for any refinement. Suppose now that $P$ and $P^\star$ are any two partitions of $[a,b]$, and let $P^{\star\star} = P\cup P^\star$. Since $P^{\star\star}$ is a refinement of both $P$ and $P^\star$, the preceding observation gives
$$L(f,P) \le L(f,P^{\star\star}) \le U(f,P^{\star\star}) \le U(f,P^\star),$$
as desired.
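The following sketch shows Lemma 2.2.3 on a concrete case. It is ours, not the author's. It computes exact lower and upper sums with `fractions.Fraction` for the increasing function $f(x) = x^2$ on $[0,1]$, where $m_k = f(x_{k-1})$ and $M_k = f(x_k)$. The partitions $P = \{0, 1/2, 1\}$ and $P^\star = \{0, 1/3, 1\}$ are arbitrary choices. The helper `lower_upper` relies on monotonicity and would not compute $L$ and $U$ correctly for a general $f$.

```python
from fractions import Fraction

def lower_upper(f, points):
    """L(f, P) and U(f, P) for an INCREASING f: on [x_{k-1}, x_k] the infimum
    is f(x_{k-1}) and the supremum is f(x_k)."""
    pieces = list(zip(points, points[1:]))
    lower = sum(f(p) * (q - p) for p, q in pieces)
    upper = sum(f(q) * (q - p) for p, q in pieces)
    return lower, upper

f = lambda x: x * x                          # increasing on [0, 1]
P = [Fraction(0), Fraction(1, 2), Fraction(1)]
P_star = [Fraction(0), Fraction(1, 3), Fraction(1)]
P_common = sorted(set(P) | set(P_star))      # the common refinement P** = P ∪ P*

L_P, U_P = lower_upper(f, P)
L_s, U_s = lower_upper(f, P_star)
L_c, U_c = lower_upper(f, P_common)
print(L_P, L_c, U_c, U_s)   # L(f,P) <= L(f,P**) <= U(f,P**) <= U(f,P*)
```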

In the light of the foregoing lemma, we may define
$$\lambda(f) = \sup_P L(f,P) \quad\text{and}\quad \Lambda(f) = \inf_P U(f,P),$$
where both the supremum and the infimum are taken over all partitions $P$ of $[a,b]$. From Lemma 2.2.3 it follows that $\lambda(f) \le \Lambda(f)$. Moreover, the forthcoming lemma connects the limit as $\|P\|$ tends to $0$ with these supremum and infimum concepts. Lemma 2.2.4. If $f$ is a bounded function on $[a,b]$, then
$$\lim_{\|P\|\to 0} L(f,P) = \lambda(f) \quad\text{and}\quad \lim_{\|P\|\to 0} U(f,P) = \Lambda(f).$$
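Proof. It suffices to prove that $\lim_{\|P\|\to 0} U(f,P) = \Lambda(f)$, because $L(f,P) = -U(-f,P)$ and $\lambda(f) = -\Lambda(-f)$. Suppose that $|f(x)| < \kappa$ for all $x \in [a,b]$, where $\kappa > 0$ is a constant. By the definition of $\Lambda(f)$, for any $\varepsilon > 0$ there exists a partition $P^\star = \{z_l\}_{l=0}^{j}$ of $[a,b]$ such that $U(f,P^\star) < \Lambda(f) + 2^{-1}\varepsilon$. Define $\delta = \varepsilon/(8j\kappa)$. We show that if a partition $P = \{x_k\}_{k=0}^{n}$ of $[a,b]$ satisfies $\|P\| < \delta$, then $U(f,P) < \Lambda(f) + \varepsilon$. To do so, let $P' = P \cup P^\star$ and split
$$U(f,P) = \sum_{[x_{k-1},x_k]\cap P^\star = \emptyset} M_k\Delta x_k + \sum_{[x_{k-1},x_k]\cap P^\star \neq \emptyset} M_k\Delta x_k.$$
Each subinterval in the first sum contains no point of $P^\star$. It is therefore also a subinterval of $P'$, and its term appears in $U(f,P')$ as well. Each $z_l$ belongs to at most two of the closed subintervals $[x_{k-1}, x_k]$, and $z_0 = a$ and $z_j = b$ each belong to only one. So the second sum has at most $2j$ terms. Take such a subinterval $I = [x_{k-1}, x_k]$. The corresponding part of $U(f,P')$ is $\sum_{J\subseteq I} M_J|J|$, taken over the subintervals $J$ of $P'$ contained in $I$, with $M_J = \sup_J f$. Since $-\kappa < M_J \le M_k < \kappa$, we get
$$M_k|I| - \sum_{J\subseteq I} M_J|J| = \sum_{J\subseteq I} (M_k - M_J)|J| \le 2\kappa|I| \le 2\kappa\|P\|.$$
Therefore $U(f,P) - U(f,P') \le 2j\cdot 2\kappa\|P\| < 4j\kappa\delta = 2^{-1}\varepsilon$. Since $P'$ is a refinement of $P^\star$, the proof of Lemma 2.2.3 gives $U(f,P') \le U(f,P^\star)$. Accordingly,
$$\Lambda(f) \le U(f,P) < U(f,P') + 2^{-1}\varepsilon \le U(f,P^\star) + 2^{-1}\varepsilon < \Lambda(f) + \varepsilon,$$
which completes the proof.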


Because the Riemann sums lie between the lower and upper sums, it is natural to ask what happens if $\lambda(f) = \Lambda(f)$. One would expect the Riemann sums to converge, with their limit equal to the common value of $\lambda(f)$ and $\Lambda(f)$. That is precisely what happens, and it gives the following Darboux characterization of integrability. Theorem 2.2.5. Let $f : [a,b] \to \mathbb{R}$ be bounded. The following statements are equivalent. (i) $f \in \mathcal{R}[a,b]$.

(ii) $\lambda(f) = \Lambda(f)$. (iii) For any $\varepsilon > 0$ there is a $\delta > 0$ such that $U(f,P) - L(f,P) < \varepsilon$ whenever $\|P\| < \delta$. Proof. (ii)⇒(iii)⇒(i). Suppose that $\lambda(f) = \Lambda(f) = s$ and $\varepsilon > 0$. Using Lemma 2.2.4, we can choose $\delta > 0$ so that
$$\|P\| < \delta \Rightarrow L(f,P) > s - 2^{-1}\varepsilon \ \text{and}\ U(f,P) < s + 2^{-1}\varepsilon.$$
Then for any choice of $\{\xi_k\}_{k=1}^n$, we have
$$s - 2^{-1}\varepsilon < L(f,P) \le S(f,P,\xi) \le U(f,P) < s + 2^{-1}\varepsilon.$$
Thus $\|P\| < \delta$ implies $U(f,P) - L(f,P) < \varepsilon$. Since $L(f,P) \le s \le U(f,P)$, it also implies $|S(f,P,\xi) - s| \le U(f,P) - L(f,P) < \varepsilon$. Hence
$$f \in \mathcal{R}[a,b] \quad\text{and}\quad \int_a^b f(x)\,dx = s.$$
(i)⇒(iii)⇒(ii). If $f \in \mathcal{R}[a,b]$ and $\int_a^b f(x)\,dx = s$, then for any $\varepsilon > 0$ we can find $\delta > 0$ such that
$$\|P\| < \delta \Rightarrow |S(f,P,\xi) - s| < 2^{-2}\varepsilon.$$
By the definitions of $M_k$ and $m_k$, there exist points $\xi'_k, \xi''_k \in [x_{k-1}, x_k]$ satisfying
$$f(\xi'_k) > M_k - 2^{-2}\varepsilon(b-a)^{-1} \quad\text{and}\quad f(\xi''_k) < m_k + 2^{-2}\varepsilon(b-a)^{-1}.$$
Then, for $\|P\| < \delta$,
$$U(f,P) = \sum_{k=1}^n M_k\Delta x_k < \sum_{k=1}^n f(\xi'_k)\Delta x_k + 2^{-2}\varepsilon = S(f,P,\xi') + 2^{-2}\varepsilon < s + 2^{-1}\varepsilon,$$
and similarly $L(f,P) > S(f,P,\xi'') - 2^{-2}\varepsilon > s - 2^{-1}\varepsilon$. This gives $U(f,P) - L(f,P) < \varepsilon$ and
$$0 \le \Lambda(f) - \lambda(f) \le U(f,P) - L(f,P) < \varepsilon.$$
Since $\varepsilon > 0$ is arbitrary, $\lambda(f) = \Lambda(f)$. Remark 2.2.6. It is worth remarking that $\int_a^b f(x)\,dx = \lambda(f) = \Lambda(f)$ holds for every $f \in \mathcal{R}[a,b]$.
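For an increasing function the Darboux gap can be computed exactly. The sketch below is an illustration under the assumption $f(x) = x^2$ on $[0,1]$ with uniform partitions $P_n$. It shows that $U(f,P_n) - L(f,P_n) = 1/n \to 0$, while both sums squeeze the common value $\lambda(f) = \Lambda(f) = 1/3$ described in Remark 2.2.6.

```python
from fractions import Fraction

def darboux_sums(n):
    """L(f, P_n) and U(f, P_n) for f(x) = x**2 on [0, 1] with the uniform
    partition P_n into n pieces. Since f is increasing there, m_k = f(x_{k-1})
    and M_k = f(x_k)."""
    h = Fraction(1, n)
    lower = sum(((k - 1) * h) ** 2 * h for k in range(1, n + 1))
    upper = sum((k * h) ** 2 * h for k in range(1, n + 1))
    return lower, upper

for n in (4, 40, 400):
    lo, up = darboux_sums(n)
    print(n, float(lo), float(up), up - lo)   # the gap U - L equals 1/n
```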

We know that $\mathcal{R}[a,b]$ is closed under addition and multiplication by constants. To complete the algebraic closure of $\mathcal{R}[a,b]$, we now use Theorem 2.2.5 to prove the result for the product of two general functions in $\mathcal{R}[a,b]$. Theorem 2.2.7. If $f_1, f_2 \in \mathcal{R}[a,b]$, then $f_1 f_2 \in \mathcal{R}[a,b]$. Proof. For convenience, let $f = f_1$ and $g = f_2$. We first prove the assertion for the special case $f, g \ge 0$. Suppose that $P$ is any partition of $[a,b]$. Let $M_{f,k}$, $M_{g,k}$ and $M_{fg,k}$ denote the least upper bounds of $f$, $g$ and $fg$, respectively, on the $k$-th subinterval $[x_{k-1}, x_k]$. Since $f, g \ge 0$, it is not hard to see that $M_{fg,k} \le M_{f,k}M_{g,k}$. Similarly, let $m_{f,k}$, $m_{g,k}$ and $m_{fg,k}$ be the corresponding greatest lower bounds on $[x_{k-1}, x_k]$. Then $m_{fg,k} \ge m_{f,k}m_{g,k}$. Thus
$$M_{fg,k} - m_{fg,k} \le M_{f,k}M_{g,k} - m_{f,k}m_{g,k}.$$
Now, let
$$M_f = \sup_{x\in[a,b]} f(x) \quad\text{and}\quad M_g = \sup_{x\in[a,b]} g(x),$$
and rewrite the last inequality as
$$\begin{aligned} M_{fg,k} - m_{fg,k} &\le M_{f,k}M_{g,k} - m_{f,k}M_{g,k} + m_{f,k}M_{g,k} - m_{f,k}m_{g,k} \\ &\le (M_{f,k} - m_{f,k})M_g + (M_{g,k} - m_{g,k})M_f. \end{aligned}$$
Accordingly, we have
$$\begin{aligned} U(fg,P) - L(fg,P) &= \sum_{k=1}^n (M_{fg,k} - m_{fg,k})\Delta x_k \\ &\le M_g\sum_{k=1}^n (M_{f,k} - m_{f,k})\Delta x_k + M_f\sum_{k=1}^n (M_{g,k} - m_{g,k})\Delta x_k \\ &= M_g\big(U(f,P) - L(f,P)\big) + M_f\big(U(g,P) - L(g,P)\big). \end{aligned}$$
Let $\varepsilon > 0$. By Theorem 2.2.5, $f, g \in \mathcal{R}[a,b]$ allows us to choose $\delta > 0$ such that every partition $P$ with $\|P\| < \delta$ satisfies $U(f,P) - L(f,P) < 2^{-1}\varepsilon(1 + M_g)^{-1}$ and $U(g,P) - L(g,P) < 2^{-1}\varepsilon(1 + M_f)^{-1}$. This gives $U(fg,P) - L(fg,P) < \varepsilon$ whenever $\|P\| < \delta$, and so $fg \in \mathcal{R}[a,b]$ by Theorem 2.2.5. In the general case, $f$ and $g$ need not be nonnegative. We then use Theorem 2.1.8, which assures us that $f$ and $g$ are bounded from below, say, by constants $\kappa_f$ and $\kappa_g$, respectively. Then $f - \kappa_f, g - \kappa_g \ge 0$ on $[a,b]$ and $f - \kappa_f, g - \kappa_g \in \mathcal{R}[a,b]$. So the case just proved gives $(f - \kappa_f)(g - \kappa_g) \in \mathcal{R}[a,b]$. Note that
$$fg = (f - \kappa_f)(g - \kappa_g) + \kappa_f g + \kappa_g f - \kappa_f\kappa_g,$$
where each term on the right-hand side is in $\mathcal{R}[a,b]$. Thus it follows from the special case above, together with Theorem 2.2.1, that $fg \in \mathcal{R}[a,b]$. There is no general formula expressing $\int_a^b f_1(x)f_2(x)\,dx$ in terms of $\int_a^b f_1(x)\,dx$ and $\int_a^b f_2(x)\,dx$. But an inequality does connect it with $\int_a^b f_1^2(x)\,dx$ and $\int_a^b f_2^2(x)\,dx$. This is the Cauchy-Schwarz inequality, also called the Cauchy-Bunyakovsky-Schwarz inequality, named after Augustin Louis Cauchy, Hermann Amandus Schwarz and Viktor Yakovlevich Bunyakovsky.

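Theorem 2.2.8. If $f_1, f_2 \in \mathcal{R}[a,b]$, then
$$\Big|\int_a^b f_1(x)f_2(x)\,dx\Big| \le \Big(\int_a^b f_1^2(x)\,dx\Big)^{2^{-1}}\Big(\int_a^b f_2^2(x)\,dx\Big)^{2^{-1}}.$$

Proof. By Theorem 2.2.7, $f_1f_2, f_1^2, f_2^2 \in \mathcal{R}[a,b]$, and by Corollary 2.2.2,
$$0 \le \int_a^b f_1^2(x)\,dx \quad\text{and}\quad 0 \le \int_a^b f_2^2(x)\,dx.$$
If
$$\alpha = \int_a^b f_1^2(x)\,dx, \quad \beta = 2\int_a^b f_1(x)f_2(x)\,dx, \quad \gamma = \int_a^b f_2^2(x)\,dx,$$
then, by Theorem 2.2.1 and Corollary 2.2.2,
$$\tau(t) = \alpha t^2 + \beta t + \gamma = \int_a^b \big(tf_1(x) + f_2(x)\big)^2\,dx \ge 0 \quad\text{for all } t \in \mathbb{R}.$$
If $\alpha = 0$, then $\tau(t) = \beta t + \gamma \ge 0$ for all $t$ forces $\beta = 0$, and the inequality is trivial; so assume $\alpha > 0$. From elementary algebra we recall that $\tau(t) = 0$ has two distinct real roots if and only if $\beta^2 - 4\alpha\gamma > 0$, in which case $\tau$ is negative between them. Since $\tau(t)$ is never negative, the quadratic equation $\tau(t) = 0$ cannot have two distinct real roots, which means that $\beta^2 - 4\alpha\gamma \le 0$. But this says that
$$\Big(\int_a^b f_1(x)f_2(x)\,dx\Big)^2 - \Big(\int_a^b f_1^2(x)\,dx\Big)\Big(\int_a^b f_2^2(x)\,dx\Big) \le 0.$$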
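The midpoint Riemann sums used below are weighted sums with equal weights. The finite-dimensional Cauchy-Schwarz inequality therefore guarantees that the inequality of Theorem 2.2.8 already holds for the approximations themselves. The sketch is ours and uses the sample functions $f_1(x) = \sin x$ and $f_2(x) = x$ on $[0,\pi]$. For these, $\int_0^\pi x\sin x\,dx = \pi$, $\int_0^\pi \sin^2 x\,dx = \pi/2$ and $\int_0^\pi x^2\,dx = \pi^3/3$, so the exact right-hand side is $\pi^2/\sqrt{6}$.

```python
import math

def midpoint_sum(func, a, b, n):
    """Midpoint Riemann sum of func on [a, b] with n equal subintervals."""
    h = (b - a) / n
    return sum(func(a + (k + 0.5) * h) for k in range(n)) * h

a, b, n = 0.0, math.pi, 1000
f1 = math.sin            # sample integrands (our choice)
f2 = lambda x: x

lhs = abs(midpoint_sum(lambda x: f1(x) * f2(x), a, b, n))
rhs = math.sqrt(midpoint_sum(lambda x: f1(x) ** 2, a, b, n)) * \
      math.sqrt(midpoint_sum(lambda x: f2(x) ** 2, a, b, n))
print(lhs, rhs)          # lhs <= rhs; the exact values are pi and pi**2 / sqrt(6)
```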

2.3 Fundamental Theorem of Calculus

This section is devoted to proving the fundamental theorem of calculus via an investigation of the integrability of continuous functions. The ordinary definition of continuity is actually pointwise continuity. It allows us to choose $\delta$ (in the $\varepsilon$-$\delta$ language) point by point, using different choices of $\delta$ for different points in the domain of $f$. For some functions, however, such as $f(x) = x^2$ on $[-5,5]$ or $f(x) = x^{-1}$ on $[1,\infty)$, we can choose one $\delta$ that works uniformly well at every point in the domain of $f$. This provides both the motivation and the name of the next concept. Definition 2.3.1. Given a set $D \subseteq \mathbb{R}$ and a function $f : D \to \mathbb{R}$, we say that:


(i) $f$ has a limit $L \in \mathbb{R}$ at $c \in D$, denoted $\lim_{x\to c} f(x) = L$, provided that for any $\varepsilon > 0$ there is a $\delta > 0$ such that $x \in D$ and $0 < |x - c| < \delta \Rightarrow |f(x) - L| < \varepsilon$; (ii) $f$ is continuous on $D$ provided that it is continuous at every point $c \in D$, i.e., for any $\varepsilon > 0$ there is a $\delta > 0$ such that $x \in D$ and $|x - c| < \delta \Rightarrow |f(x) - f(c)| < \varepsilon$; (iii) $f$ is uniformly continuous on $D$ provided that for any $\varepsilon > 0$ there exists a $\delta > 0$ (depending only on $\varepsilon$ and $D$) such that $x_1, x_2 \in D$ and $|x_1 - x_2| < \delta \Rightarrow |f(x_1) - f(x_2)| < \varepsilon$. It is plain that the linear function $x \mapsto f(x) = ax + b$ with $a \neq 0$ is uniformly continuous on $\mathbb{R}$: for $\varepsilon > 0$, take $\delta = \varepsilon|a|^{-1}$. However, the sign function $f(x) = |x|x^{-1}$ is not uniformly continuous on $\mathbb{R}\setminus\{0\}$, although it is so on $(-\infty,0)$ and on $(0,\infty)$. Also, it is easy to see that if $f$ is uniformly continuous on $D$, then $f$ is continuous on $D$. The converse is not true in general, but there are situations in which continuity at every point of $D$ does imply uniform continuity on $D$. Unless a special remark is made, the symbol $C[a,b]$ denotes the set of all real-valued continuous functions on $[a,b]$. Theorem 2.3.2. Below are the basic properties of $f \in C[a,b]$. (i) $f$ is bounded; that is, $\sup\{|f(x)| : x \in [a,b]\} < \infty$.

(ii) $f$ has a minimum and a maximum; that is, there exist $x_1, x_2 \in [a,b]$ such that $f(x_1) = \sup\{f(x) : x \in [a,b]\}$ and $f(x_2) = \inf\{f(x) : x \in [a,b]\}$.

(iii) Intermediate value theorem: for any $y$ between $f(a)$ and $f(b)$ there is $x \in [a,b]$ such that $f(x) = y$. (iv) $f$ is uniformly continuous on $[a,b]$.

Proof. (i) If not, then for any $n \in \mathbb{N}$ there is an $x_n \in [a,b]$ such that $|f(x_n)| \ge n$. By the Bolzano-Weierstrass property (see also Theorem 1.2.4 (ii)), there exist $x \in [a,b]$ and a subsequence $\{x_{n_k}\}_{k\in\mathbb{N}}$ of $\{x_n\}_{n\in\mathbb{N}}$ such that $x_{n_k} \to x$. Hence $f \in C[a,b]$ yields $\lim_{k\to\infty} f(x_{n_k}) = f(x)$, and so $\{f(x_{n_k})\}_{k\in\mathbb{N}}$ is bounded. This is a contradiction, since $|f(x_{n_k})| \ge n_k \ge k$ for any $k \in \mathbb{N}$. Therefore, $f$ must be bounded on $[a,b]$.


(ii) By (i), $M = \sup\{f(x) : x \in [a,b]\}$ exists in $\mathbb{R}$. This produces a sequence $x_n \in [a,b]$ with $M - n^{-1} < f(x_n) \le M$ for any $n \in \mathbb{N}$. Again by the Bolzano-Weierstrass property, there is a subsequence $\{x_{n_k}\}_{k=1}^\infty$ of $\{x_n\}_{n=1}^\infty$ such that $x_{n_k} \to x \in [a,b]$. Accordingly, from $f \in C[a,b]$ and Theorem 1.2.7 it follows that
$$f(x_{n_k}) \to f(x), \quad f(x_{n_k}) \to M, \quad\text{and hence}\quad M = f(x).$$

The argument for a minimum is similar. (iii) It suffices to consider the case $f(a) \neq f(b)$. Replacing $f$ by $-f$ if necessary, we may assume $f(a) < y < f(b)$. Set $S = \{x' \in [a,b] : f(x') < y\}$. Since $a \in S$, we have $S \neq \emptyset$, and $b$ is an upper bound of $S$. The least upper bound axiom for $\mathbb{R}$ implies that $x = \sup S$ exists. The continuity of $f$ at $a$ and at $b$, together with $f(a) < y < f(b)$, gives $a < x < b$. Next, we prove $f(x) = y$ by handling two cases. Case 1: $f(x) < y$. Since $f \in C[a,b]$, there is a $\delta > 0$ such that $|x_1 - x| < \delta$ and $x_1 \in [a,b] \Rightarrow f(x_1) < y$. Consequently, there is an $x_2 \in [a,b]$ with $x_2 > x$ and $f(x_2) < y$, contradicting the definition of $x$. Case 2: $f(x) > y$. As in Case 1, we can find a $\delta > 0$ such that $f(x_3) > y$ whenever $x_3 \in [a,b]$ and $|x_3 - x| < \delta$. Then no point of $S$ lies in $(x - \delta, x]$, and none lies to the right of $x$. So $x - \delta$ is an upper bound of $S$, which contradicts the choice of $x$. (iv) Let $c \in [a,b]$. Since $f \in C[a,b]$, for any $\varepsilon > 0$ there is a $\delta_c > 0$ such that $|x - c| < \delta_c \Rightarrow |f(x) - f(c)| < 2^{-1}\varepsilon$. Now consider the collection of open intervals
$$\mathcal{O} = \big\{(c - 2^{-1}\delta_c, c + 2^{-1}\delta_c) : c \in [a,b]\big\}.$$

For each $c \in [a,b]$, there is an interval in $\mathcal{O}$ centered at $c$, so $\mathcal{O}$ is an open cover of $[a,b]$. By the Heine-Borel property (see also Theorem 1.2.4 (iii)), there exists a finite subcollection of $\mathcal{O}$ that covers $[a,b]$, say,
$$[a,b] \subset \bigcup_{j=1}^n \big(c_j - 2^{-1}\delta_{c_j}, c_j + 2^{-1}\delta_{c_j}\big).$$


Define $\delta = 2^{-1}\min_{1\le j\le n}\delta_{c_j}$. Now let $x_1, x_2 \in [a,b]$ with $|x_1 - x_2| < \delta$. Then $x_1$ is in one of the $n$ intervals of the above finite covering, so $|x_1 - c_j| < 2^{-1}\delta_{c_j}$ holds for some $c_j$. Consequently,
$$|x_2 - c_j| \le |x_2 - x_1| + |x_1 - c_j| < \delta + 2^{-1}\delta_{c_j} \le \delta_{c_j}.$$
Thus the choice of $\delta_{c_j}$ yields $|f(x_1) - f(c_j)| < 2^{-1}\varepsilon$ and $|f(x_2) - f(c_j)| < 2^{-1}\varepsilon$, and consequently,
$$|f(x_1) - f(x_2)| \le |f(x_1) - f(c_j)| + |f(x_2) - f(c_j)| < \varepsilon.$$
Example 2.3.3. To handle the integrability of continuous functions, we consider the integrability of the following function:
$$f(x) = \begin{cases} n^{-1}, & x = mn^{-1} \in \mathbb{Q}\cap(0,1] \text{ with } mn^{-1} \text{ in lowest terms},\\ 0, & x \in \{0\}\cup\big((0,1]\setminus\mathbb{Q}\big).\end{cases}$$

Recall that this function $f$ is continuous at each irrational number and discontinuous at each rational number in $(0,1)$. In fact, if $c \in (0,1)$ is rational, we take irrational points $s_k \in (0,1)$ converging to $c$, and then $\lim_{k\to\infty} f(s_k) = 0 \neq f(c)$. To see that $f$ is continuous at any irrational $d \in (0,1)$, let $\varepsilon > 0$. There are only finitely many irreducible fractions $mn^{-1}$ in $(0,1)$ such that $n^{-1} \ge \varepsilon$. Therefore there is a $\delta > 0$ such that none of those fractions belongs to $(d - \delta, d + \delta)$. This means that for all $x$ satisfying $|x - d| < \delta$, we have $0 \le f(x) < \varepsilon$. It is worth mentioning that $f$ is continuous from the right at $0$ and discontinuous from the left at $1$. We now assert that $f \in \mathcal{R}[0,1]$ with $\int_0^1 f(x)\,dx = 0$. As a matter of fact, the density of $\mathbb{R}\setminus\mathbb{Q}$ in $\mathbb{R}$ ensures that $L(f,P) = 0$ for every partition $P$ of $[0,1]$. Thus, for any $\varepsilon > 0$, it suffices to find $\delta > 0$ such that $U(f,P) < \varepsilon$ whenever $\|P\| < \delta$. In $[0,1]$ there are just finitely many points $mn^{-1}$ such that $n^{-1} > 2^{-1}\varepsilon$. Let $l$ be the number of such points in $[0,1]$, and let $P$ be any partition with $\|P\| < \varepsilon(4l)^{-1}$. In $U(f,P) = \sum_{k=1}^n M_k\Delta x_k$ there are at most $2l$ terms with $M_k > 2^{-1}\varepsilon$. Since $M_k \le 1$, each of these terms satisfies $M_k\Delta x_k \le \|P\| < \varepsilon(4l)^{-1}$. Therefore the total of these terms is less than $(2l)\big(\varepsilon(4l)^{-1}\big) = 2^{-1}\varepsilon$. In each of the remaining terms, $M_k \le 2^{-1}\varepsilon$, so their total is at most $2^{-1}\varepsilon\sum_{k=1}^n\Delta x_k = 2^{-1}\varepsilon$. Hence $U(f,P) < \varepsilon$. Darboux's integrability criterion (Theorem 2.2.5) then implies that $f \in \mathcal{R}[0,1]$ and $\int_0^1 f(x)\,dx = 0$.
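The upper sums in Example 2.3.3 can be computed exactly. The Python sketch below is our illustration; the helper names `sup_on_interval` and `upper_sum` are hypothetical. The supremum of $f$ on $[k/N,(k+1)/N]$ is $1/q$ for the smallest denominator $q$ of a point $m/q$ with $m \ge 1$ in that interval. The first such $m/q$ found is automatically in lowest terms, since a reducible one would have been found with a smaller denominator. Because $L(f,P) = 0$, the shrinking values of $U(f,P_N)$ for uniform partitions $P_N$ display the convergence to $\int_0^1 f(x)\,dx = 0$.

```python
from fractions import Fraction

def sup_on_interval(k, N):
    """Supremum of the function f of Example 2.3.3 on [k/N, (k+1)/N].
    It equals 1/q for the smallest q such that some m/q with m >= 1 lies in
    the interval (m >= 1 excludes the point 0, where f(0) = 0)."""
    for q in range(1, N + 1):
        m = max(1, -(-k * q // N))    # ceil(k*q/N), forced to be at least 1
        if m * N <= (k + 1) * q:      # then k/N <= m/q <= (k+1)/N
            return Fraction(1, q)
    raise AssertionError("unreachable: q = N always succeeds")

def upper_sum(N):
    """U(f, P_N) for the uniform partition P_N of [0, 1] into N pieces."""
    return sum(sup_on_interval(k, N) for k in range(N)) / N

for N in (10, 100, 1000):
    print(N, float(upper_sum(N)))   # L(f, P_N) = 0, while U(f, P_N) shrinks toward 0
```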


Up to this point, we have seen very few examples that we could verify as being integrable functions. But the next two theorems provide us with a very large set of such examples. Theorem 2.3.4. If $f \in C[a,b]$, then $f \in \mathcal{R}[a,b]$. Proof. By Theorem 2.3.2 (iv), $f$ is uniformly continuous on $[a,b]$. So for any $\varepsilon > 0$ there is a $\delta > 0$ such that
$$x^\dagger, x^\ddagger \in [a,b] \text{ and } |x^\dagger - x^\ddagger| < \delta \Rightarrow |f(x^\dagger) - f(x^\ddagger)| < \varepsilon(b-a)^{-1}.$$
Now choose any partition $P$ such that $\|P\| < \delta$, and consider the $k$-th subinterval $[x_{k-1}, x_k]$. By Theorem 2.3.2 (ii), there are $\xi_{k,\dagger}, \xi_{k,\ddagger} \in [x_{k-1}, x_k]$ such that $f(\xi_{k,\dagger}) = M_k$ and $f(\xi_{k,\ddagger}) = m_k$. Note that $|\xi_{k,\dagger} - \xi_{k,\ddagger}| \le \Delta x_k \le \|P\| < \delta$. So it follows that $M_k - m_k < \varepsilon(b-a)^{-1}$. Thus
$$U(f,P) - L(f,P) = \sum_{k=1}^n (M_k - m_k)\Delta x_k < \varepsilon,$$
and hence $f \in \mathcal{R}[a,b]$ by Theorem 2.2.5. As a consequence of Theorem 2.3.4, we can establish the mean value theorem for integrals. Theorem 2.3.5. If $f \in C[a,b]$, then there is a number $\xi \in [a,b]$ such that $f(\xi)(b-a) = \int_a^b f(x)\,dx$. Proof. By Theorem 2.3.4, we have $f \in \mathcal{R}[a,b]$. Let

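$$m = \min_{x\in[a,b]} f(x) \quad\text{and}\quad M = \max_{x\in[a,b]} f(x).$$
Then, by Corollary 2.2.2,
$$m(b-a) \le \int_a^b f(x)\,dx \le M(b-a);$$
that is, $(b-a)^{-1}\int_a^b f(x)\,dx$ lies between $m$ and $M$. Since $f \in C[a,b]$, $f$ assumes every value between $m$ and $M$. This follows by applying Theorem 2.3.2 (iii) on the closed interval between the points where $m$ and $M$ are attained. So there is a $\xi \in [a,b]$ such that
$$f(\xi) = (b-a)^{-1}\int_a^b f(x)\,dx.$$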
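The proof of Theorem 2.3.5 locates $\xi$ through the intermediate value theorem. Bisection is a constructive way to find such a point numerically. The sketch below is our choice of technique, not the author's. It uses the hypothetical helper `bisect_root` and the sample function $f(x) = x^2$ on $[0,1]$, whose average value is $1/3$, so that $\xi = 1/\sqrt{3}$.

```python
import math

def bisect_root(func, lo, hi, tol=1e-12):
    """Find a zero of func on [lo, hi] by bisection, assuming func is continuous
    and func(lo) <= 0 < func(hi) (the intermediate value theorem then
    guarantees that a zero exists in between)."""
    f_lo = func(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = func(mid)
        if (f_mid <= 0) == (f_lo <= 0):   # the sign change lies in [mid, hi]
            lo, f_lo = mid, f_mid
        else:                              # the sign change lies in [lo, mid]
            hi = mid
    return (lo + hi) / 2

f = lambda x: x * x              # sample continuous function on [0, 1]
average = 1.0 / 3.0              # (b - a)^(-1) * integral of x**2 over [0, 1]
xi = bisect_root(lambda x: f(x) - average, 0.0, 1.0)
print(xi, 1 / math.sqrt(3))      # f(xi) equals the average value of f
```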


The idea involved in the argument of Theorem 2.3.5 also leads to the following result. Theorem 2.3.6. Suppose $f$ is monotonic on $[a,b]$, that is, $f$ is either increasing,
$$x_1 \le x_2 \text{ and } x_1, x_2 \in [a,b] \Rightarrow f(x_1) \le f(x_2),$$
or decreasing,
$$x_1 \le x_2 \text{ and } x_1, x_2 \in [a,b] \Rightarrow f(x_1) \ge f(x_2).$$
Then $f \in \mathcal{R}[a,b]$. Proof. Without loss of generality, we may assume that $f$ is increasing (otherwise consider $-f$). Then for any subinterval $[x_{k-1}, x_k]$ of $[a,b]$, $M_k$ occurs at the right endpoint: $M_k = f(x_k)$. Similarly, $m_k = f(x_{k-1})$. Therefore, if $P$ is any partition of $[a,b]$, then
$$U(f,P) - L(f,P) = \sum_{k=1}^n (M_k - m_k)\Delta x_k \le \|P\|\sum_{k=1}^n \big(f(x_k) - f(x_{k-1})\big) = \|P\|\big(f(b) - f(a)\big).$$
Now let $\varepsilon > 0$. Every partition $P$ with
$$\|P\| < \varepsilon\big(f(b) - f(a) + 1\big)^{-1}$$
satisfies $U(f,P) - L(f,P) < \varepsilon$. This shows that $f \in \mathcal{R}[a,b]$ by Theorem 2.2.5. To obtain the fundamental theorem of calculus, which links an integral to a derivative, we recall the notion of differentiation and the corresponding mean value theorem. Definition 2.3.7. Let $I \subseteq \mathbb{R}$ be an interval containing an element $c$. We say that the function $f : I \to \mathbb{R}$ is differentiable at $c$, or has a derivative at $c$, provided that the limit
$$f'(c) = \lim_{x\to c}\frac{f(x) - f(c)}{x - c}$$
exists; that is, for any $\varepsilon > 0$ there exists a $\delta > 0$ such that
$$x \in I \text{ and } 0 < |x - c| < \delta \Rightarrow \Big|f'(c) - \frac{f(x) - f(c)}{x - c}\Big| < \varepsilon.$$

If $f$ is differentiable at every element of the set $D \subseteq I$, then $f$ is called differentiable on $D$, and the function $f' : D \to \mathbb{R}$ is called the derivative of $f$ on $D$. It follows directly that if $f$ is differentiable at $c$, then it is continuous at $c$; the converse fails. We can also take the derivative of the sum, product, quotient and composition of two differentiable functions. Moreover, we can establish the generalized mean value theorem for derivatives as follows. Theorem 2.3.8. Let $f, g \in C[a,b]$ be differentiable on $(a,b)$. Then there exists a point $c \in (a,b)$ such that $\big(f(b) - f(a)\big)g'(c) = \big(g(b) - g(a)\big)f'(c)$. Proof. Define the function $h : [a,b] \to \mathbb{R}$ by
$$h(x) = \big(f(b) - f(a)\big)g(x) - \big(g(b) - g(a)\big)f(x).$$

Then $h \in C[a,b]$ is differentiable on $(a,b)$ with $h(a) = h(b)$. An application of Theorem 2.3.2 (ii) produces two points $x_1, x_2 \in [a,b]$ such that $\inf\{h(x) : x \in [a,b]\} = h(x_1)$ and $\sup\{h(x) : x \in [a,b]\} = h(x_2)$. If $x_1$ and $x_2$ are both endpoints of $[a,b]$, then $h(x) = h(a) = h(b)$ for any $x \in [a,b]$. Hence $h$ is constant, giving $h'(x) = 0$ for all $x \in (a,b)$. Otherwise, we may assume $x_2 \in (a,b)$; the case $x_1 \in (a,b)$ is handled in the same way. Now let $s_n = x_2 - n^{-1}$, so that $a < s_n < x_2$ for sufficiently large $n \in \mathbb{N}$. Since $h(s_n) \le h(x_2)$ and $h$ is differentiable at $x_2$, we have
$$0 \le \frac{h(s_n) - h(x_2)}{s_n - x_2} \to h'(x_2) \quad\text{as } n \to \infty,$$
whence $h'(x_2) \ge 0$. Similarly, if $x_2 < t_n = x_2 + n^{-1} < b$, then
$$0 \ge \frac{h(t_n) - h(x_2)}{t_n - x_2} \to h'(x_2) \quad\text{as } n \to \infty,$$
and hence $h'(x_2) \le 0$. Therefore, we have $h'(x_2) = 0$, whence
$$\big(f(b) - f(a)\big)g'(x_2) = \big(g(b) - g(a)\big)f'(x_2).$$


Next comes the fundamental theorem of calculus, whose importance can hardly be overstated. Theorem 2.3.9. If $f \in \mathcal{R}[a,b]$ and $F'(x) = f(x)$ for $x \in [a,b]$, then $\int_a^b f(x)\,dx = F(b) - F(a)$. Conversely, if $f \in C[a,b]$ and $F(x) = \int_a^x f(t)\,dt$ for $x \in [a,b]$, then $F'(x) = f(x)$ for $x \in [a,b]$ (with one-sided derivatives at the endpoints).

Proof. Let $P$ be a partition of $[a,b]$, and consider the collapsing sum
$$F(b) - F(a) = \sum_{k=1}^n \big(F(x_k) - F(x_{k-1})\big).$$
Since $F$ is differentiable on $[x_{k-1}, x_k]$, we can apply the special case $g(x) = x$ of Theorem 2.3.8. This gives a number $\xi_k \in [x_{k-1}, x_k]$ such that
$$F(x_k) - F(x_{k-1}) = F'(\xi_k)\Delta x_k = f(\xi_k)\Delta x_k.$$
Substituting the right-hand side of this equation into the collapsing sum, we get $F(b) - F(a) = S(f,P,\xi)$, a Riemann sum for $f$ with respect to $P$. Thus, for any partition $P$, the points $\{\xi_k\}_{k=1}^n$ can be chosen so that $S(f,P,\xi) = F(b) - F(a)$. Therefore $F(b) - F(a)$ is the only possible value of $\lim_{\|P\|\to 0} S(f,P,\xi)$. Since $f \in \mathcal{R}[a,b]$, this limit exists, and its value is denoted by $\int_a^b f(x)\,dx$. Thus $\int_a^b f(x)\,dx = F(b) - F(a)$. To prove the converse, let $c \in [a,b]$. For $h \neq 0$ with $c + h \in [a,b]$, consider the difference quotient
$$Q_c(h) = h^{-1}\big(F(c+h) - F(c)\big) = h^{-1}\int_c^{c+h} f(t)\,dt.$$
By Theorem 2.3.5, there is a $\xi$ between $c$ and $c + h$ such that
$$\int_c^{c+h} f(t)\,dt = f(\xi)h.$$
Substituting this into the expression for $Q_c(h)$, we obtain $Q_c(h) = f(\xi)$. As $h \to 0$, it follows that $\xi \to c$; therefore
$$F'(c) = \lim_{h\to 0} Q_c(h) = \lim_{\xi\to c} f(\xi) = f(c),$$
thanks to the continuity of $f$ at $c$.
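The first half of the proof can be seen at work on an example. Take the assumed pair $F(x) = x^3$ and $f(x) = 3x^2$ on $[0,2]$. On $[x_{k-1}, x_k]$ the mean value point is explicit, namely $\xi_k = \sqrt{(x_{k-1}^2 + x_{k-1}x_k + x_k^2)/3}$. With these tags the Riemann sum telescopes to $F(2) - F(0) = 8$ for any partition; a random one is used below. The code is our illustration only.

```python
import math
import random

F = lambda x: x ** 3          # antiderivative (our sample choice)
f = lambda x: 3 * x ** 2      # F' = f

def mvt_tags(points):
    """Tags xi_k in [x_{k-1}, x_k] with F(x_k) - F(x_{k-1}) = f(xi_k) * (x_k - x_{k-1}).
    For F(x) = x**3 and 0 <= x_{k-1} < x_k this means
    3 * xi_k**2 = x_{k-1}**2 + x_{k-1} * x_k + x_k**2."""
    return [math.sqrt((p * p + p * q + q * q) / 3) for p, q in zip(points, points[1:])]

rng = random.Random(1)
a, b = 0.0, 2.0
points = [a] + sorted(rng.uniform(a, b) for _ in range(9)) + [b]   # a random partition
tags = mvt_tags(points)
S = sum(f(t) * (q - p) for t, p, q in zip(tags, points, points[1:]))
print(S, F(b) - F(a))   # equal up to rounding: the Riemann sum collapses to F(b) - F(a)
```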


Theorem 2.3.9 is a powerful tool for the computation of integrals. For example, if $n \in \mathbb{N}$, then $(n+1)^{-1}x^{n+1}$ has the continuous derivative $x^n$, and hence
$$\int_a^b x^n\,dx = (n+1)^{-1}\big(b^{n+1} - a^{n+1}\big).$$
Note also that the condition $F' = f$ in the first part of Theorem 2.3.9 cannot guarantee $f \in \mathcal{R}[a,b]$. For instance, if
$$F(x) = \begin{cases} x^2\sin x^{-2}, & x \neq 0,\\ 0, & x = 0,\end{cases}$$
then
$$F'(x) = \begin{cases} 2x\sin x^{-2} - 2x^{-1}\cos x^{-2}, & x \neq 0,\\ 0, & x = 0.\end{cases}$$
Since $f = F'$ is unbounded on $[-1,1]$, Theorem 2.1.8 gives $f \notin \mathcal{R}[-1,1]$. As a consequence of Theorem 2.3.9, we can establish the following change of variable formula for Riemann integrals. Corollary 2.3.10. Let $U$ and $V$ be open intervals in $\mathbb{R}$, $\varphi : U \to V$ a differentiable function with continuous derivative, and $f : V \to \mathbb{R}$ a continuous function. Then
$$\int_{\varphi(a)}^{\varphi(b)} f(v)\,dv = \int_a^b f\big(\varphi(u)\big)\varphi'(u)\,du \quad\text{for all } a, b \in U.$$

Proof. Let $F : V \to \mathbb{R}$ be the function defined by $F(y) = \int_{\varphi(a)}^y f(v)\,dv$ for all $y \in V$. Then $F$ is differentiable and $F' = f$ by Theorem 2.3.9. Define $G : U \to \mathbb{R}$ by $G(x) = \int_{\varphi(a)}^{\varphi(x)} f(v)\,dv$. It is the composition $G = F\circ\varphi$ of two differentiable functions, hence is itself differentiable. By the chain rule we have
$$G'(x) = F'\big(\varphi(x)\big)\varphi'(x) = f\big(\varphi(x)\big)\varphi'(x) \quad\text{for all } x \in U.$$
The right-hand side is continuous. So, by Theorem 2.3.9, $x \mapsto \int_a^x f(\varphi(u))\varphi'(u)\,du$ has the same derivative as $G$. Their difference has zero derivative and is therefore constant by Theorem 2.3.8. Thus
$$G(x) = \int_a^x f\big(\varphi(u)\big)\varphi'(u)\,du + C \quad\text{for some constant } C \in \mathbb{R}.$$
Setting $x = a$, we obtain
$$C = 0 \quad\text{and}\quad G(x) = \int_a^x f\big(\varphi(u)\big)\varphi'(u)\,du \quad\text{for all } x \in U.$$
Letting $x = b$ gives the desired formula.
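The following is a quick numerical check of Corollary 2.3.10 under these assumed choices:

- $\varphi(u) = u^2$ and $f = \cos$;
- $a = 1/2$ and $b = 3/2$.

Both sides are approximated by midpoint Riemann sums and compared with $\sin(\varphi(b)) - \sin(\varphi(a))$, the value given by Theorem 2.3.9. The helper `midpoint_sum` is ours.

```python
import math

def midpoint_sum(func, a, b, n=20000):
    """Midpoint Riemann sum approximating the integral of func over [a, b]."""
    h = (b - a) / n
    return sum(func(a + (k + 0.5) * h) for k in range(n)) * h

phi = lambda u: u * u          # sample substitution v = phi(u) (our choice)
dphi = lambda u: 2 * u         # phi'(u), continuous
f = math.cos                   # continuous integrand

a, b = 0.5, 1.5
left = midpoint_sum(f, phi(a), phi(b))                       # integral of f over [phi(a), phi(b)]
right = midpoint_sum(lambda u: f(phi(u)) * dphi(u), a, b)    # integral of f(phi(u)) phi'(u) over [a, b]
exact = math.sin(phi(b)) - math.sin(phi(a))                  # via Theorem 2.3.9
print(left, right, exact)
```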


2.4 Improper Integrals

Many useful, commonly encountered functions have unbounded domains or graphs with vertical asymptotes. In this section we therefore extend the Riemann integral to two new types of integrals that cover some of these functions. As one of their applications, we prove the Weierstrass approximation theorem via the convolution of two functions on $\mathbb{R}$. Definition 2.4.1. (i) Let $a \in \mathbb{R}$ and $f \in \mathcal{R}[a,t]$ for every $t \ge a$. Then the improper integral of $f$ on $[a,\infty)$ is defined by
$$\int_a^\infty f(x)\,dx = \lim_{t\to\infty}\int_a^t f(x)\,dx.$$
If the limit exists, the improper integral is said to be convergent; otherwise, it is divergent. Existence of the limit means that for any $\varepsilon > 0$ there is a $t_0 > 0$ such that
$$t > t_0 \Rightarrow \Big|\int_a^t f(x)\,dx - \int_a^\infty f(x)\,dx\Big| < \varepsilon.$$
(ii) The improper integral of a function $f$ on $(-\infty,b]$ is defined similarly:
$$\int_{-\infty}^b f(x)\,dx = \lim_{t\to-\infty}\int_t^b f(x)\,dx.$$
(iii) If $f \in \mathcal{R}[-t,t]$ for any $t > 0$, then the improper integral of $f$ on $(-\infty,\infty)$ is defined by
$$\int_{-\infty}^\infty f(x)\,dx = \lim_{t\to-\infty}\int_t^c f(x)\,dx + \lim_{t\to\infty}\int_c^t f(x)\,dx \quad\text{for all } c \in \mathbb{R}.$$
If $\int_{-\infty}^c f(x)\,dx$ and $\int_c^\infty f(x)\,dx$ are convergent, then
$$\int_{-\infty}^\infty f(x)\,dx = \int_{-\infty}^c f(x)\,dx + \int_c^\infty f(x)\,dx,$$
and, by Theorem 2.1.9, this value does not depend on the choice of $c$.
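The following sketch makes Definition 2.4.1 (i) concrete. It is ours. It evaluates the proper integrals $\int_0^t (1+x^2)^{-1}\,dx$ for increasing $t$ with midpoint sums. They track $\arctan t$ and approach $\pi/2$, in line with Example 2.4.2 (iii). The step counts are arbitrary choices.

```python
import math

def midpoint_sum(func, a, b, n):
    """Midpoint Riemann sum of func on [a, b] with n equal subintervals."""
    h = (b - a) / n
    return sum(func(a + (k + 0.5) * h) for k in range(n)) * h

def g(x):
    return 1.0 / (1.0 + x * x)

# Proper integrals over [0, t] for growing t; Definition 2.4.1 (i) lets t -> infinity.
partials = {t: midpoint_sum(g, 0.0, t, 100_000) for t in (10.0, 100.0, 1000.0)}
for t, value in partials.items():
    print(t, value, math.atan(t))   # value tracks arctan(t), which tends to pi/2
```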

The following examples show how to evaluate these improper integrals.

Example 2.4.2. (i)
$$\int_1^\infty x^{-2}\,dx = \lim_{t\to\infty}\int_1^t x^{-2}\,dx = \lim_{t\to\infty}\big(1 - t^{-1}\big) = 1.$$
Hence the improper integral is convergent. (ii)
$$\int_{-\infty}^0 \cos x\,dx = \lim_{t\to-\infty}\int_t^0 \cos x\,dx = \lim_{t\to-\infty}(-\sin t).$$
Since this limit does not exist, the improper integral is divergent. (iii)
$$\int_{-\infty}^\infty (1+x^2)^{-1}\,dx = \int_{-\infty}^0 (1+x^2)^{-1}\,dx + \int_0^\infty (1+x^2)^{-1}\,dx = \lim_{t\to-\infty}\arctan x\Big|_t^0 + \lim_{t\to\infty}\arctan x\Big|_0^t = \pi.$$
Hence the improper integral is convergent. (iv)
$$\int_{-\infty}^\infty e^{-x}\,dx = \int_{-\infty}^0 e^{-x}\,dx + \int_0^\infty e^{-x}\,dx = \lim_{t\to-\infty}\big(-e^{-x}\big)\Big|_t^0 + \lim_{t\to\infty}\big(-e^{-x}\big)\Big|_0^t = \infty.$$
Hence the improper integral is divergent. Here is a fundamental result on improper integrals; part (iii) is called Cauchy's convergence criterion. Theorem 2.4.3. Let $\int_a^\infty f_1(x)\,dx$ and $\int_a^\infty f_2(x)\,dx$ exist. Then:

(i) $\int_a^\infty \big(c_1f_1(x) + c_2f_2(x)\big)\,dx = c_1\int_a^\infty f_1(x)\,dx + c_2\int_a^\infty f_2(x)\,dx$ for any constants $c_1, c_2 \in \mathbb{R}$.

(ii) $\int_a^\infty f_1(x)\,dx \le \int_a^\infty f_2(x)\,dx$ provided that $f_1 \le f_2$ on $[a,\infty)$.

(iii) Let $f \in \mathcal{R}[a,t]$ for every $t \ge a$. Then $\int_a^\infty f(x)\,dx$ exists if and only if for any $\varepsilon > 0$ there is a $t_0 > 0$ such that $b_2 > b_1 > t_0$ implies $\big|\int_{b_1}^{b_2} f(x)\,dx\big| < \varepsilon$.


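Proof. The arguments for (i) and (ii) are left as an exercise for the students. Now we verify (iii). If $\int_a^\infty f(x)\,dx$ exists, then for any $\varepsilon > 0$ there is a $t_0 > 0$ such that
$$t > t_0 \Rightarrow \Big|\int_a^t f(x)\,dx - \int_a^\infty f(x)\,dx\Big| < 2^{-1}\varepsilon.$$
The desired inequality then follows for $b_2 > b_1 > t_0$, since $\int_{b_1}^{b_2} f(x)\,dx = \int_a^{b_2} f(x)\,dx - \int_a^{b_1} f(x)\,dx$. Conversely, suppose that for any $\varepsilon > 0$ there is a $t_0 > 0$ such that $b_2 > b_1 > t_0$ implies $\big|\int_{b_1}^{b_2} f(x)\,dx\big| < \varepsilon$. Then the sequence $\big\{\int_a^{a+n} f(x)\,dx\big\}_{n=1}^\infty$ is a Cauchy sequence, and hence converges to a finite number $s$. So for $\varepsilon > 0$ there exists a $t_0 > 0$, enlarged if necessary so that the Cauchy condition above also holds with this $t_0$, such that
$$a + n \ge t_0 \Rightarrow \Big|\int_a^{a+n} f(x)\,dx - s\Big| < \varepsilon.$$
Then, if $\mathbb{N} \ni n_0 > t_0 - a$, every $t > a + n_0$ yields
$$\Big|\int_a^t f(x)\,dx - s\Big| = \Big|\int_a^{a+n_0} f(x)\,dx - s + \int_{a+n_0}^t f(x)\,dx\Big| < 2\varepsilon,$$
as required.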

Defining the improper integral in terms of the definite Riemann integral has both advantages and disadvantages. On the positive side, there is no need to develop a separate limit theory for the new type of integral. On the other hand, we must evaluate $\int_a^t f(x)\,dx$ as a function of $t$ to determine whether $\int_a^\infty f(x)\,dx$ is convergent. The difficulty of finding a primitive of a given function is well known to the students. It is frequently sufficient to determine the convergence of $\int_a^\infty f(x)\,dx$ without finding its value. So a test is very helpful if it says that, under appropriate conditions, the convergence of a simple integral implies that of a complicated one. Such a result is called a comparison test, and the next theorem is of this kind. Theorem 2.4.4. Let $f_1, f_2 \in \mathcal{R}[a,t]$ for every $t \ge a$, and $0 \le f_1(x) \le f_2(x)$ for every $x \ge a$. If $\int_a^\infty f_2(x)\,dx$ is convergent, then $\int_a^\infty f_1(x)\,dx$ is convergent too. Proof. Let $f = f_1$ and $g = f_2$. First, we verify the following claim: if there is a number $M > 0$ such that $F(t) = \int_a^t f(x)\,dx \le M$ for every $t \ge a$, then $\int_a^\infty f(x)\,dx$ is convergent. In fact, since
$$F(t+h) - F(t) = \int_t^{t+h} f(x)\,dx \ge 0 \quad\text{for } h > 0,$$
it follows that $F$ is increasing on $[a,\infty)$. Therefore $F(t) \le M$ ensures that
$$\lim_{t\to\infty} F(t) = \sup_{t\ge a} F(t) < \infty, \quad\text{i.e.,}\quad \lim_{t\to\infty}\int_a^t f(x)\,dx \text{ exists}.$$
Secondly, let $h(x) = g(x) - f(x)$ for all $x \ge a$. Then $h \in \mathcal{R}[a,t]$ and $h(x) \ge 0$ for every $x \ge a$. Since $f \ge 0$ and $g \ge 0$,
$$\int_a^t h(x)\,dx \le \int_a^t g(x)\,dx \le \int_a^\infty g(x)\,dx.$$
Thus $\int_a^\infty g(x)\,dx$ is an upper bound for $\int_a^t h(x)\,dx$ for $t \ge a$, so by the first paragraph $\int_a^\infty h(x)\,dx$ is convergent. Now
$$\lim_{t\to\infty}\int_a^t f(x)\,dx = \lim_{t\to\infty}\int_a^t g(x)\,dx - \lim_{t\to\infty}\int_a^t h(x)\,dx,$$
and the last two limits exist because both $\int_a^\infty g(x)\,dx$ and $\int_a^\infty h(x)\,dx$ are convergent.

The utility of this result is illustrated by the convergence of the improper integral
$$\int_1^\infty (1+x^3)^{-1}\,dx, \quad\text{since}\quad (1+x^3)^{-1} \le x^{-2} \text{ for } x \ge 1.$$
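The comparison in Theorem 2.4.4 can be observed numerically. With the assumed step counts below, the partial integrals $\int_1^t (1+x^3)^{-1}\,dx$ increase with $t$ and stay below $\int_1^t x^{-2}\,dx = 1 - t^{-1} < 1$. This is exactly the bounded-and-increasing situation used in the first part of the proof. The sketch is ours, and the midpoint sums only approximate the integrals.

```python
def midpoint_sum(func, a, b, n):
    """Midpoint Riemann sum of func on [a, b] with n equal subintervals."""
    h = (b - a) / n
    return sum(func(a + (k + 0.5) * h) for k in range(n)) * h

f1 = lambda x: 1.0 / (1.0 + x ** 3)    # the integrand whose convergence is in question
ts = [2.0, 10.0, 100.0]
partials = [midpoint_sum(f1, 1.0, t, 200_000) for t in ts]
bounds = [1.0 - 1.0 / t for t in ts]   # exact integral of x**(-2) over [1, t]
for t, p, bd in zip(ts, partials, bounds):
    print(t, p, bd)   # the partial integrals increase with t and stay below 1
```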

For integrals of unbounded functions, we have the following definition. Definition 2.4.5. (i) Let $f \in \mathcal{R}[c,d]$ for every $[c,d] \subset [a,b)$. Then the improper integral of $f$ on $[a,b]$ is defined by
$$\int_a^{b-} f(x)\,dx = \lim_{t\to b^-}\int_a^t f(x)\,dx.$$
If the limit exists, the improper integral is said to be convergent; otherwise, it is divergent. Existence of the limit means that for any $\varepsilon > 0$ there is a $\delta > 0$ such that
$$0 < b - t < \delta \Rightarrow \Big|\int_a^{b-} f(x)\,dx - \int_a^t f(x)\,dx\Big| < \varepsilon.$$


(ii) Similarly, we define the analogous improper integral
$$\int_{a+}^b f(x)\,dx = \lim_{t\to a^+}\int_t^b f(x)\,dx.$$
(iii) If $f$ is unbounded at $a$ and $b$, then for any $c \in (a,b)$, the improper integral $\int_{a+}^{b-} f(x)\,dx$ is defined by
$$\lim_{t\to a^+}\int_t^c f(x)\,dx + \lim_{t\to b^-}\int_c^t f(x)\,dx.$$
It is convergent if both $\int_{a+}^c f(x)\,dx$ and $\int_c^{b-} f(x)\,dx$ are convergent, and in that case
$$\int_{a+}^{b-} f(x)\,dx = \int_{a+}^c f(x)\,dx + \int_c^{b-} f(x)\,dx.$$

Example 2.4.6. The integrals
$$\int_0^{1-} (x^2 - 1)^{-1}\,dx, \quad \int_{-1+}^0 (x^2 - 1)^{-1}\,dx \quad\text{and}\quad \int_{-1+}^{1-} (x^2 - 1)^{-1}\,dx$$
are improper integrals of the kind in the foregoing definition. Plainly, these types of improper integrals can be combined to form the improper integral of any function whose graph has at most a finite number of vertical asymptotes. For instance, let $f(x) = \big((x-a)(x-b)\big)^{-1}$, where $a < b$. We choose an arbitrary number $c \in (a,b)$ and write
$$\int_{-\infty}^\infty f(x)\,dx = \int_{-\infty}^{a-1} f(x)\,dx + \int_{a-1}^{a-} f(x)\,dx + \int_{a+}^c f(x)\,dx + \int_c^{b-} f(x)\,dx + \int_{b+}^{b+1} f(x)\,dx + \int_{b+1}^\infty f(x)\,dx.$$

Computing such an integral may be cumbersome (we have just split one improper integral into six limits). The main gain is that no separate theory is needed for each situation. There are just two basic types of improper integrals, and finite combinations of them suffice for most functions. There is also a comparison test for improper integrals of unbounded functions. It is analogous to Theorem 2.4.4, and its proof is similar.

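Theorem 2.4.7. Let $f_1, f_2 \in R[a,t]$ for every $t \in (a,b)$, and let $0 \le f_1(t) \le f_2(t)$ for every $a \le t < b$. If $\int_a^{b-} f_2(x)\,dx$ is convergent, then $\int_a^{b-} f_1(x)\,dx$ is convergent too.

This theorem has an obvious dual for $\int_{a+}^{b} f(x)\,dx$. Its contrapositive form says that if the smaller integral diverges, so does the larger one. This is illustrated by
$$\int_{0+}^{1} \big(x(x+1)\big)^{-1}\,dx = \infty,$$
which holds because $\big(x(x+1)\big)^{-1} \ge (2x)^{-1}$ for all $x \in (0,1]$ and $\int_{0+}^1 (2x)^{-1}\,dx$ diverges.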
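Here is a numerical sketch of this divergence. The quadrature rule and the sample points are our own choices. We also substitute $x = e^s$, a device not used in the text. It turns $\int_t^1 \big(x(x+1)\big)^{-1}dx$ into the smooth integral $\int_{\ln t}^0 (1+e^s)^{-1}\,ds$. The sketch compares this with the closed form $\ln\big((1+t)/(2t)\big)$ and with the comparison bound $\int_t^1 (2x)^{-1}\,dx = -\tfrac12\ln t$.

```python
import math


def simpson(h, lo, hi, m=2000):
    """Composite Simpson rule on [lo, hi] with an even number m of subintervals."""
    step = (hi - lo) / m
    s = h(lo) + h(hi)
    for k in range(1, m):
        s += (4 if k % 2 else 2) * h(lo + k * step)
    return s * step / 3


def F(t):
    """∫_t^1 dx / (x (x + 1)) for 0 < t < 1, computed after substituting x = e^s,
    which turns the integrand into the smooth function 1 / (1 + e^s) on [ln t, 0]."""
    return simpson(lambda s: 1.0 / (1.0 + math.exp(s)), math.log(t), 0.0)


for t in (1e-1, 1e-2, 1e-4, 1e-6):
    exact = math.log((1 + t) / (2 * t))   # closed form of F(t)
    lower = -0.5 * math.log(t)            # ∫_t^1 (2x)^{-1} dx, the comparison bound
    print(f"t = {t:.0e}:  F(t) = {F(t):.6f}  exact = {exact:.6f}  lower bound = {lower:.6f}")
```

Both columns grow without bound as $t \to 0^+$, and $F(t)$ always stays above the lower bound.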

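As a good application of improper integrals, we consider the Gamma function.

Example 2.4.8. For each $x > 0$, let
$$\Gamma(x) = \int_{0+}^{\infty} e^{-t} t^{x-1}\,dt$$
be the Gamma function. Then:

(i) $\Gamma(x)$ is convergent;

(ii) $\Gamma(x+1) = x\Gamma(x)$; in particular, $\Gamma(n+1) = n!$ for any $n \in \mathbb N$;

(iii) $\lim_{x\to0^+}\Gamma(x) = \infty$.

For (i), we write
$$\int_{0+}^{\infty} e^{-t}t^{x-1}\,dt = \int_{0+}^{1} e^{-t}t^{x-1}\,dt + \int_{1}^{\infty} e^{-t}t^{x-1}\,dt.$$
The first term is finite since $e^{-t}t^{x-1} \le t^{x-1}$ for $t \in (0,1)$ and $\int_{0+}^{1} t^{x-1}\,dt = x^{-1}$ for $x > 0$. The second term is finite too. Indeed, fix an integer $n \ge x+1$. Since $\lim_{t\to\infty} e^{-t}t^n = 0$, there is a $T \ge 1$ such that $e^{-t}t^n < 1$ for $t \ge T$. Hence
$$e^{-t}t^{x-1} = e^{-t}t^{x+1}t^{-2} \le e^{-t}t^{n}t^{-2} < t^{-2}\quad (t \ge T).$$
Because $\int_1^\infty t^{-2}\,dt < \infty$, Theorem 2.4.4 gives the convergence of $\int_T^\infty e^{-t}t^{x-1}\,dt$. Meanwhile, $\int_1^T e^{-t}t^{x-1}\,dt$ is an ordinary Riemann integral.

For (ii), let $\varepsilon, b > 0$ with $\varepsilon < 1 < b$. Integration by parts gives
$$\int_\varepsilon^1 e^{-t}t^{x-1}\,dt = -e^{-\varepsilon}\varepsilon^x x^{-1} + e^{-1}x^{-1} + x^{-1}\int_\varepsilon^1 e^{-t}t^x\,dt.$$
Letting $\varepsilon \to 0$, we get
$$\int_{0+}^1 e^{-t}t^{x-1}\,dt = e^{-1}x^{-1} + x^{-1}\int_{0+}^1 e^{-t}t^x\,dt.$$
Similarly,
$$\int_1^b e^{-t}t^{x-1}\,dt = e^{-b}b^x x^{-1} - e^{-1}x^{-1} + x^{-1}\int_1^b e^{-t}t^x\,dt,$$
and letting $b \to \infty$, we obtain
$$\int_1^\infty e^{-t}t^{x-1}\,dt = -e^{-1}x^{-1} + x^{-1}\int_1^\infty e^{-t}t^x\,dt.$$
Adding the last two identities gives $\Gamma(x) = x^{-1}\Gamma(x+1)$. Note also that $\Gamma(1) = 1 = 0!$, so the second formula follows from the first by induction.

(iii) If $x > 0$, then
$$\Gamma(x) > \int_{0+}^1 e^{-t}t^{x-1}\,dt \ge e^{-1}x^{-1} \to \infty \quad\text{as } x \to 0^+.$$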
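Here is a numerical sketch of (i) and (ii). Several choices are ours: the quadrature rule, the truncation point, and the substitution $t = u^2$. The substitution gives $\Gamma(x) = 2\int_0^\infty e^{-u^2}u^{2x-1}\,du$, which keeps the integrand smooth when $2x-1$ is a nonnegative integer. Python's standard `math.gamma` serves only as a reference value.

```python
import math


def simpson(h, lo, hi, m=4000):
    """Composite Simpson rule on [lo, hi] with an even number m of subintervals."""
    step = (hi - lo) / m
    s = h(lo) + h(hi)
    for k in range(1, m):
        s += (4 if k % 2 else 2) * h(lo + k * step)
    return s * step / 3


def gamma_numeric(x, upper=12.0):
    """Approximate Γ(x) = ∫_{0+}^∞ e^{-t} t^{x-1} dt.

    The substitution t = u^2 gives Γ(x) = 2 ∫_0^∞ e^{-u^2} u^{2x-1} du, whose
    integrand is smooth on [0, ∞) when 2x - 1 is a nonnegative integer; the part
    beyond u = upper is negligible (e^{-144} is about 3e-63)."""
    return simpson(lambda u: 2.0 * math.exp(-u * u) * u ** (2 * x - 1), 0.0, upper)


for x in (0.5, 1.0, 1.5, 2.5, 4.0):
    g, g1 = gamma_numeric(x), gamma_numeric(x + 1)
    print(f"x = {x}: Γ(x) ≈ {g:.10f}  math.gamma = {math.gamma(x):.10f}  Γ(x+1)/Γ(x) ≈ {g1 / g:.10f}")
```

The last column reproduces $x$, as (ii) predicts, and $\Gamma(1/2)$ comes out as $\sqrt\pi$.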

As an interesting application of some of the previous results, we introduce an important product, the so-called convolution of two functions. We then use it to prove the polynomial approximation theorem on an interval.

Definition 2.4.9. Let $f_1, f_2: \mathbb R \to \mathbb R$. The convolution $f_1 * f_2$ of $f_1$ and $f_2$ is defined by
$$f_1 * f_2(x) = \int_{-\infty}^{\infty} f_1(x-y)f_2(y)\,dy,$$
provided that the improper integral exists. It is an easy exercise to prove that $f_1 * f_2 = f_2 * f_1$. The convolution can be shown to exist for a large class of functions; however, we restrict our discussion to continuous functions.

Definition 2.4.10. A sequence $\varphi_n: \mathbb R \to \mathbb R$ is a Dirac sequence, or a delta sequence, or an approximate identity, provided that:

(i) $\varphi_n(t) \ge 0$ for all $t \in \mathbb R$;

(ii) $\varphi_n$ is continuous and integrable over $\mathbb R$ with $\int_{-\infty}^{\infty}\varphi_n(t)\,dt = 1$;

(iii) given any $\delta > 0$, $\lim_{n\to\infty}\left(\int_{-\infty}^{-\delta}\varphi_n(t)\,dt + \int_{\delta}^{\infty}\varphi_n(t)\,dt\right) = 0$.

Example 2.4.11. The sequence $\varphi_n(t) = \big(\pi(nt^2 + n^{-1})\big)^{-1}$ is the Siméon-Denis Poisson kernel. For our purpose, let us consider the Edmund Georg Hermann Landau kernel:
$$\varphi_n(t) = \begin{cases} (1-t^2)^n c_n^{-1}, & |t| < 1,\\ 0, & |t| \ge 1,\end{cases}\qquad\text{where } c_n = \int_{-1}^{1}(1-t^2)^n\,dt.$$
This function clearly satisfies (i) and (ii) of Definition 2.4.10. For (iii), note that $1 - t^2 = (1-t)(1+t) \ge 1 - t$ on $[0,1]$, so
$$c_n = 2\int_0^1 (1-t^2)^n\,dt \ge 2\int_0^1 (1-t)^n\,dt = 2(n+1)^{-1}.$$
Thus, if $\delta \in (0,1)$ is fixed, then
$$\int_\delta^1 \varphi_n(t)\,dt = c_n^{-1}\int_\delta^1 (1-t^2)^n\,dt \le 2^{-1}(n+1)(1-\delta^2)^n(1-\delta) \to 0 \quad\text{as } n \to \infty.$$
By symmetry, $\int_{-1}^{-\delta}\varphi_n(t)\,dt = \int_\delta^1 \varphi_n(t)\,dt$, and (iii) follows. For $\delta \ge 1$ both tails vanish.
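To see the tail estimate concretely, here is a sketch using exact rational arithmetic from Python's `fractions` module. The value $\delta = 1/2$ and the helper names are hypothetical choices made for illustration. The sketch computes $c_n$ and the tail mass $\int_\delta^1\varphi_n$ exactly from the binomial expansion of $(1-t^2)^n$. It then compares them with $c_n \ge 2(n+1)^{-1}$ and with the bound $2^{-1}(n+1)(1-\delta^2)^n(1-\delta)$.

```python
from fractions import Fraction
from math import comb


def integral_poly(n, lo, hi):
    """Exact ∫_lo^hi (1 - t^2)^n dt for rational lo, hi, using the expansion
    (1 - t^2)^n = Σ_k C(n, k) (-1)^k t^{2k} and integrating term by term."""
    return sum(
        Fraction((-1) ** k * comb(n, k), 2 * k + 1) * (hi ** (2 * k + 1) - lo ** (2 * k + 1))
        for k in range(n + 1)
    )


def landau_c(n):
    """Normalising constant c_n = ∫_{-1}^{1} (1 - t^2)^n dt of the Landau kernel."""
    return integral_poly(n, Fraction(-1), Fraction(1))


def landau_tail(n, delta):
    """∫_δ^1 φ_n(t) dt; by symmetry it also equals ∫_{-1}^{-δ} φ_n(t) dt."""
    return integral_poly(n, Fraction(delta), Fraction(1)) / landau_c(n)


delta = Fraction(1, 2)
for n in (1, 5, 20, 50):
    bound = Fraction(n + 1, 2) * (1 - delta ** 2) ** n * (1 - delta)
    print(f"n = {n:2d}: c_n = {float(landau_c(n)):.6f}  tail = {float(landau_tail(n, delta)):.3e}"
          f"  bound = {float(bound):.3e}")
```

Exact fractions avoid cancellation in the alternating binomial sum, so the inequalities can be checked without rounding error.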

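Theorem 2.4.12. Let $\varphi_n$ be a Dirac sequence, and let $f: \mathbb R \to \mathbb R$ be bounded and continuous. Then $\lim_{n\to\infty}\sup_{x\in[a,b]}|f*\varphi_n(x) - f(x)| = 0$ for any finite closed interval $[a,b]$ of $\mathbb R$.

Proof. The existence of $f * \varphi_n$ follows from this estimate, valid for every $x \in \mathbb R$ and any finite interval $[\alpha,\beta]$:
$$\int_\alpha^\beta |f(x-t)\varphi_n(t)|\,dt \le \sup_{y\in\mathbb R}|f(y)|\int_{-\infty}^{\infty}|\varphi_n(t)|\,dt.$$
Fix a finite interval $[a,b]$ and an arbitrary $\varepsilon > 0$. Since $f$ is uniformly continuous on $[a-1,b+1]$, there is a $\delta \in (0,1]$ such that $|f(x-t) - f(x)| < 2^{-1}\varepsilon$ for $x \in [a,b]$ and $|t| \le \delta$. Since $f$ is bounded, $M = \sup_{x\in\mathbb R}|f(x)| + 1$ is finite. By (iii) of Definition 2.4.10, there is an $n_0 \in \mathbb N$ such that
$$n \ge n_0 \implies \int_{-\infty}^{-\delta}\varphi_n(t)\,dt + \int_\delta^\infty \varphi_n(t)\,dt < \varepsilon(4M)^{-1}.$$
Now use $\int_{-\infty}^\infty \varphi_n(t)\,dt = 1$ and $|f(x-t) - f(x)| \le 2M$. For $x \in [a,b]$ and $n \ge n_0$ we get
$$\begin{aligned}|f*\varphi_n(x) - f(x)| &= \left|\int_{-\infty}^{\infty}\big(f(x-t) - f(x)\big)\varphi_n(t)\,dt\right|\\ &\le \left(\int_{-\infty}^{-\delta} + \int_{-\delta}^{\delta} + \int_{\delta}^{\infty}\right)|f(x-t) - f(x)|\,\varphi_n(t)\,dt\\ &\le 2M\int_{-\infty}^{-\delta}\varphi_n(t)\,dt + 2^{-1}\varepsilon\int_{-\delta}^{\delta}\varphi_n(t)\,dt + 2M\int_\delta^\infty\varphi_n(t)\,dt\\ &< 2M\varepsilon(4M)^{-1} + 2^{-1}\varepsilon = \varepsilon.\end{aligned}$$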
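The theorem can be seen at work on the Poisson kernel of Example 2.4.11. This sketch rests on several choices of ours: the test function $f(x) = (1+x^2)^{-1}$, the interval $[-1,1]$, the grid, and the quadrature rule. The substitution $t = n^{-1}\tan\theta$ gives $\varphi_n(t)\,dt = \pi^{-1}d\theta$. So $f*\varphi_n(x) = \pi^{-1}\int_{-\pi/2}^{\pi/2} f(x - n^{-1}\tan\theta)\,d\theta$ becomes an integral over a bounded interval. For this particular $f$, the standard semigroup property of Poisson kernels yields a closed form. With $y = 1 + n^{-1}$, it reads $f*\varphi_n(x) = y/(y^2+x^2)$, and the test uses it as a check.

```python
import math


def simpson(h, lo, hi, m=4000):
    """Composite Simpson rule on [lo, hi] with an even number m of subintervals."""
    step = (hi - lo) / m
    s = h(lo) + h(hi)
    for k in range(1, m):
        s += (4 if k % 2 else 2) * h(lo + k * step)
    return s * step / 3


def poisson_conv(f, n, x):
    """(f * φ_n)(x) for the Poisson kernel φ_n(t) = 1 / (π (n t^2 + 1/n)).

    With t = tan(θ)/n we have φ_n(t) dt = dθ/π, so the integral over R becomes
    (1/π) ∫_{-π/2}^{π/2} f(x - tan(θ)/n) dθ over a bounded interval."""
    return simpson(lambda th: f(x - math.tan(th) / n), -math.pi / 2, math.pi / 2) / math.pi


def f(x):
    return 1.0 / (1.0 + x * x)   # bounded and continuous on R


def sup_error(n, a=-1.0, b=1.0, grid=21):
    """max |f * φ_n - f| over an evenly spaced grid on [a, b]."""
    xs = [a + (b - a) * k / (grid - 1) for k in range(grid)]
    return max(abs(poisson_conv(f, n, x) - f(x)) for x in xs)


for n in (1, 4, 16):
    print(n, sup_error(n))
```

For this $f$ the maximal error on $[-1,1]$ occurs at $x = 0$ and equals $1 - y^{-1} = (n+1)^{-1}$. The uniform convergence is therefore real but slow.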

Choosing the Landau kernel, we obtain the classical Weierstrass approximation theorem for continuous functions on bounded closed intervals of $\mathbb R$, as follows.

Theorem 2.4.13. Let $f \in C[a,b]$. Then for each $\varepsilon > 0$ there is a polynomial $p$ such that $|f(x) - p(x)| < \varepsilon$ for all $x \in [a,b]$.

Proof. We first simplify the problem by changes of variables. Let $u = (x-a)(b-a)^{-1}$. Then $x = (b-a)u + a$ and $u \in [0,1]$. Set $g(u) = f((b-a)u + a) = f(x)$. Suppose we can find a polynomial $p$ such that $|p(u) - g(u)| < \varepsilon$ for all $u \in [0,1]$. Then
$$\left|p\big((x-a)(b-a)^{-1}\big) - f(x)\right| < \varepsilon \quad\text{for all } x \in [a,b],$$
and the function $x \mapsto p\big((x-a)(b-a)^{-1}\big)$ is a polynomial. So we may assume that $[a,b] = [0,1]$.

Next, let $h(x) = f(x) - f(0) - \big(f(1) - f(0)\big)x$. Since $f - h$ is a polynomial of degree at most one, a uniform polynomial approximation of $h$ on $[0,1]$ yields one of $f$. Since $h(0) = h(1) = 0$, we may assume that $f(0) = f(1) = 0$. We then extend $f$ continuously to $\mathbb R$ by setting $f(x) = 0$ for $x \notin [0,1]$. The extension, still denoted by $f$, is bounded and continuous on $\mathbb R$.

Let $\varphi_n$ be the Landau kernel. For $x, t \in [0,1]$ we have $|x - t| \le 1$, so $\varphi_n(x-t) = \big(1 - (x-t)^2\big)^n c_n^{-1}$ there (both sides vanish when $|x-t| = 1$). This is a polynomial in $x$ and $t$. Since $f$ vanishes outside $[0,1]$, for $x \in [0,1]$ we have
$$f*\varphi_n(x) = \varphi_n * f(x) = \int_{-\infty}^{\infty}\varphi_n(x-t)f(t)\,dt = c_n^{-1}\int_0^1 \big(1-(x-t)^2\big)^n f(t)\,dt,$$
which is a polynomial in $x$. The result now follows from Theorem 2.4.12.
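The proof is constructive, and the following sketch carries it out. We chose the test function $f(t) = \sin(\pi t)$ because $f(0) = f(1) = 0$. The degrees $n$, the grid, the quadrature rule and the helper names are also ours. The polynomial $c_n^{-1}\int_0^1\big(1-(x-t)^2\big)^n f(t)\,dt$ is evaluated by numerical integration rather than by expanding its coefficients.

```python
import math


def simpson(h, lo, hi, m=2000):
    """Composite Simpson rule on [lo, hi] with an even number m of subintervals."""
    step = (hi - lo) / m
    s = h(lo) + h(hi)
    for k in range(1, m):
        s += (4 if k % 2 else 2) * h(lo + k * step)
    return s * step / 3


def landau_polynomial(f, n):
    """Return x -> c_n^{-1} ∫_0^1 (1 - (x - t)^2)^n f(t) dt, i.e. (φ_n * f)(x) for f on
    [0, 1] with f(0) = f(1) = 0 extended by 0. On [0, 1] this is a polynomial of
    degree at most 2n in x; here it is only evaluated numerically."""
    c_n = simpson(lambda t: (1 - t * t) ** n, -1.0, 1.0)
    return lambda x: simpson(lambda t: (1 - (x - t) ** 2) ** n * f(t), 0.0, 1.0) / c_n


def sup_error(f, n, grid=51):
    """max |p_n - f| over an evenly spaced grid on [0, 1]."""
    p = landau_polynomial(f, n)
    xs = [k / (grid - 1) for k in range(grid)]
    return max(abs(p(x) - f(x)) for x in xs)


f = lambda t: math.sin(math.pi * t)   # continuous on [0, 1], vanishes at 0 and 1
for n in (10, 40, 160):
    print(n, sup_error(f, n))
```

The uniform error shrinks as $n$ grows, but slowly. For this $f$ it behaves roughly like $n^{-1/2}$ near the endpoints. The Landau construction is thus a proof device rather than a practical approximation scheme.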

Problems

2.1. Verify the following four results.

(i) $f \in R[0,1]$ and $\int_0^1 f(x)\,dx = 0$ for $f(x) = \begin{cases}1, & x = 2^{-1}\\ 0, & x \ne 2^{-1}.\end{cases}$

(ii) $f \in R[0,1]$ and $\int_0^1 f(x)\,dx = 0$ for $f(x) = \begin{cases}1, & x = 1, 2^{-1}, 3^{-1}, \dots\\ 0, & \text{otherwise}.\end{cases}$

(iii) $f \in R[0,2]$ and $\int_0^2 f(x)\,dx = -1$ for $f(x) = \begin{cases}1, & x\in[0,1]\\ -2, & x \in (1,2].\end{cases}$

(iv) $f \in R[0,2]$ and $\int_0^2 f(x)\,dx = 3$ for $f(x) = \begin{cases}2, & x \in [0,1)\\ 5, & x = 1\\ 1, & x \in (1,2].\end{cases}$

2.2. Prove that if $f \in R[0,1]$ then
$$\lim_{n\to\infty} n^{-1}\sum_{k=1}^{n} f(kn^{-1}) = \int_0^1 f(x)\,dx.$$
What about $f(x) = 1 - x^2$?

2.3. Determine whether or not
$$f(x) = \begin{cases} 2, & x \le 0\\ x^{-1}, & x > 0\end{cases}$$
is Riemann integrable on $[0,1]$, $[-1,0]$, and $[-1,1]$, respectively.

2.4. Prove that if $f \in R[a,b]$ and $m \le f(x) \le M$ for all $x \in [a,b]$, then $m(b-a) \le \int_a^b f(x)\,dx \le M(b-a)$.

2.5. Given $f(x) = x$ and $P_n = \{kn^{-1}\}_{k=0}^{n}$, find $L(f,P_n)$ and $U(f,P_n)$. Then show $f \in R[0,1]$. Finally, evaluate $\int_0^1 f(x)\,dx$.

2.6. Prove that if $f \in R[a,b]$ then $|f| \in R[a,b]$ with $\left|\int_a^b f(x)\,dx\right| \le \int_a^b |f(x)|\,dx$, but not conversely.

2.7. Let $f$ be bounded on $[a,b]$. Prove that the following are equivalent: $f \in R[a,b]$; $f \in R[c,d]$ for all $[c,d] \subset (a,b)$; and $f \in R[a,c]$ and $f \in R[c,b]$ for some $c \in (a,b)$.

2.8. Prove $\int_0^\pi \sqrt{x\sin x}\,dx \le \pi$.

2.9. Let $f, g \in R[a,b]$. Prove the Minkowski inequality:
$$\left(\int_a^b \big(f(x) + g(x)\big)^2\,dx\right)^{2^{-1}} \le \left(\int_a^b f^2(x)\,dx\right)^{2^{-1}} + \left(\int_a^b g^2(x)\,dx\right)^{2^{-1}}.$$

2.10. Show that if $f \in R[a,b]$ and there is a constant $\delta > 0$ such that $f(x) \ge \delta$ for all $x \in [a,b]$, then $1/f \in R[a,b]$.

2.11. Prove that $f(x) = \sqrt{x}$ is uniformly continuous on $[0,\infty)$ by choosing $\delta = 2^{-2}\varepsilon^2$.

2.12. Prove that if $f$ is uniformly continuous on $(a,b)$ then $f$ is bounded there.

2.13. Prove that $f(x) = \cos(x^{-1})$ is not uniformly continuous on $(0,1]$.

2.14. Prove that if $f(x)$ is the linear function $a_nx + b_n$ on $[n-1,n)$ for $n = 1, 2, \dots, k$, then $f \in R[0, k - 2^{-1}]$.

2.15. Let
$$f(x) = \begin{cases} x^{-1}\sin x, & x \ne 0\\ 3, & x = 0.\end{cases}$$
Prove $f \in R[0,5]$.

2.16. Prove that if $f \in R[a,b]$ and $F(x) = \int_a^x f(t)\,dt$, then $F \in C[a,b]$.

2.17. Prove that if $f \in C[a,b]$ with $f > 0$ and $F(x) = \int_a^x f(t)\,dt$, then $F$ is strictly increasing on $[a,b]$; that is, $x_1 < x_2 \implies F(x_1) < F(x_2)$.

2.18. Let
$$f(x) = \begin{cases} \sin x^{-1}, & x \ne 0\\ 0, & x = 0.\end{cases}$$
Is $f$ continuous on $[0,\pi^{-1}]$? Is $f$ in $R[0,\pi^{-1}]$? Explain your answers.

2.19. For what values of $p$ is $\int_1^\infty x^p\,dx$ or $\int_{0+}^1 x^p\,dx$ convergent?

2.20. Determine whether each of the following improper integrals is convergent:

(i) $\int_1^\infty \big(x\sqrt{x+1}\big)^{-1}\,dx$;

(ii) $\int_{-\infty}^{\infty} xe^{-x^2}\,dx$;

(iii) $\int_0^{\frac{\pi}{2}-} \tan x\,dx$;

(iv) $\int_{1+}^{\infty} \big(x(\ln x)^2\big)^{-1}\,dx$.

2.21. Prove that $\int_{-1}^{1} x^{-1}\,dx$ and $\int_{-\infty}^{\infty} 2x(x^2+1)^{-1}\,dx$ are divergent, but their Cauchy principal values are finite:
$$\lim_{a\to0^+}\left(\int_{-1}^{-a} x^{-1}\,dx + \int_a^1 x^{-1}\,dx\right) = 0 \quad\text{and}\quad \lim_{a\to\infty}\int_{-a}^{a} 2x(x^2+1)^{-1}\,dx = 0.$$

2.22. For $a, b > 0$ let $f(x) = e^{-a|x|}$ and $g(x) = e^{-b|x|}$. Evaluate $f * g$.

2.23. Let $f \in C[0,1]$ and $\int_0^1 f(x)x^n\,dx = 0$ for $n = 0, 1, 2, \dots$. Prove $f = 0$.

2.24. Evaluate $\int_{0+}^{1-}\big(\ln u^{-1}\big)^{-2^{-1}}\,du$.

2.25. Prove $\Gamma(n + 2^{-1}) = (2n)!\sqrt{\pi}\,(4^n n!)^{-1}$ for any nonnegative integer $n$.

2.26. If the domain of $f$ includes $[0,\infty)$, then the Pierre-Simon Laplace transform of $f$ is defined by
$$L(f)(x) = \int_{0+}^{\infty} e^{-xt}f(t)\,dt.$$
If $f(t)$ is bounded as $t \to 0^+$, then $L(f)(x)$ is the improper integral of $e^{-xt}f(t)$ on $[0,\infty)$. If $f(t)$ is unbounded as $t \to 0^+$, then we write
$$L(f)(x) = \int_{0+}^{b} e^{-xt}f(t)\,dt + \int_b^\infty e^{-xt}f(t)\,dt.$$
Prove the following results for $a, b \in \mathbb R$:

(i) $L(e^{-x}) = (x+1)^{-1}$ for $x > -1$ and $L(x) = x^{-2}$ for $x > 0$;

(ii) $L(af + bg) = aL(f) + bL(g)$;

(iii) if $L(f)(x)$ is defined for $x > b$, then $L\big(e^{ax}f(x)\big) = L(f)(x-a)$ for $x > a + b$;

(iv) suppose $f$ has a continuous derivative on $[0,\infty)$, $\lim_{t\to\infty}e^{-xt}f(t) = 0$, and $L(f)(x)$ exists when $x > a$; then $L(f')(x) = xL(f)(x) - f(0)$ for $x > a$;

(v) $L(xe^{ax}) = (x-a)^{-2}$ and $L(e^{ax} + axe^{ax}) = x(x-a)^{-2}$ for $x > a$.

Chapter 3

Riemann-Stieltjes Integrals

In this chapter we study Riemann-Stieltjes integration, Thomas Joannes Stieltjes' generalization of Riemann integration. A particularly convenient feature of Riemann-Stieltjes integration is that it can represent discrete sums.

3.1 Functions of Bounded Variation

Recall that a partition $P$ of a closed interval $[a,b]$ of $\mathbb R$ is a set $\{x_k\}_{k=0}^n$ such that $a = x_0 < x_1 < \cdots < x_n = b$. Let $f$ be a function on $[a,b]$ and consider the sum
$$S(f,P) = \sum_{k=1}^n |f(x_k) - f(x_{k-1})|.$$
We wish to study the set of all values of $S(f,P)$, where $P$ ranges over all partitions of $[a,b]$. This set may or may not be bounded above, and we write its supremum as $\sup_P S(f,P)$.

Definition 3.1.1. The function $f$ is said to have bounded variation on $[a,b]$ provided that $\sup_P S(f,P)$ is finite. In this case, $V_a^b f$ denotes $\sup_P S(f,P)$, the variation of $f$ on $[a,b]$. The collection of all functions having bounded variation on $[a,b]$ is written as $BV[a,b]$.

Example 3.1.2. If $f$ is a monotonic function on $[a,b]$, then $S(f,P) = |f(b) - f(a)|$ for any partition $P$ of $[a,b]$. Moreover, let
$$f(x) = \begin{cases}0, & x \le 1\\ 1, & x > 1\end{cases}\qquad\text{and}\qquad g(x) = \begin{cases}0, & x \ne 1\\ 1, & x = 1.\end{cases}$$
Then $V_0^2 f = 1$ and $V_0^2 g = 2$.

Our first theorem on this topic gives a simple sufficient condition for a function to have bounded variation.

Theorem 3.1.3. If $f$ has a bounded derivative on $[a,b]$, then $f \in BV[a,b]$. Moreover, if $f \in BV[a,b]$, then $f$ is bounded on $[a,b]$.

Proof. First, let $f'$ be bounded on $[a,b]$ and let $P = \{x_k\}_{k=0}^n$ be any partition of $[a,b]$. Then $f$ is differentiable on the $k$th subinterval $[x_{k-1},x_k]$. By the mean value theorem for derivatives, there is a number $\xi_k \in [x_{k-1},x_k]$ such that $f'(\xi_k)\Delta x_k = f(x_k) - f(x_{k-1})$. Using this, we get
$$S(f,P) = \sum_{k=1}^n |f'(\xi_k)|\Delta x_k \le \sup_{x\in[a,b]}|f'(x)|\,(b-a).$$
Hence $V_a^b f \le \sup_{x\in[a,b]}|f'(x)|\,(b-a)$.

Next, suppose $f \in BV[a,b]$. If $x \in (a,b)$, let $P$ be the simple partition $\{a, x, b\}$. Then $S(f,P) = |f(x) - f(a)| + |f(b) - f(x)| \le V_a^b f$. Therefore $|f(x)| \le |f(a)| + V_a^b f$, and this is trivial for $x \in \{a,b\}$. So $f$ is bounded on $[a,b]$.

Of course, the finiteness of $V_a^b f$ does not imply that $f$ has a bounded derivative. Next we see that differentiability does not imply bounded variation.

Example 3.1.4. Let
$$f(x) = \begin{cases} x^2\sin\dfrac{\pi}{2x^2}, & x \ne 0\\ 0, & x = 0.\end{cases}$$
Then $f \notin BV[0,1]$ although $f$ is differentiable on $[0,1]$. To see $V_0^1 f = \infty$, for $n \in \mathbb N$ we choose a partition $P = \{x_j\}_{j=0}^n$, where
$$x_0 = 0,\quad x_1 = \frac{1}{\sqrt{2n-1}},\ \dots,\ x_k = \frac{1}{\sqrt{2(n-k)+1}},\ \dots,\ x_{n-1} = \frac{1}{\sqrt3},\quad x_n = 1.$$
This choice gives, for $k = 1, \dots, n-1$,
$$|f(x_{n-k}) - f(x_{n-k+1})| = \frac{1}{2k+1} + \frac{1}{2k-1} > \frac{2}{2k+1}.$$
The series $\sum_{k=1}^\infty (2k+1)^{-1}$ is divergent. So for $n$ large enough, $S(f,P)$ contains enough terms for the partial sum to exceed any given number. For differentiability, it suffices to consider $x = 0$, since $f$ is clearly differentiable at every $x \ne 0$. But
$$f'(0) = \lim_{x\to0} x\sin\frac{\pi}{2x^2} = 0 \quad\text{since}\quad \left|x\sin\frac{\pi}{2x^2}\right| \le |x|.$$
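Here is a numerical sketch of this example; the helper names are our own. It evaluates $S(f,P)$ on the partition above and compares it with the exact value
$$\sum_{k=1}^{n-1}\big((2k+1)^{-1} + (2k-1)^{-1}\big) + (2n-1)^{-1}.$$
The last term comes from the first subinterval $[0,x_1]$, and the whole expression grows like $\ln n$.

```python
import math


def f(x):
    """f(x) = x^2 sin(π / (2x^2)) for x ≠ 0 and f(0) = 0 (Example 3.1.4)."""
    return 0.0 if x == 0 else x * x * math.sin(math.pi / (2 * x * x))


def S(func, points):
    """Variation sum S(f, P) = Σ |f(x_k) - f(x_{k-1})| over the partition points."""
    return sum(abs(func(q) - func(p)) for p, q in zip(points, points[1:]))


def example_partition(n):
    """x_0 = 0 and x_k = 1/sqrt(2(n-k)+1) for k = 1, ..., n, so that x_n = 1."""
    return [0.0] + [1.0 / math.sqrt(2 * (n - k) + 1) for k in range(1, n + 1)]


for n in (10, 100, 1000, 10000):
    print(n, S(f, example_partition(n)))
```

The printed sums keep growing with $n$, slowly but without bound, which is the logarithmic divergence used in the argument.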

Remark 3.1.5. Suppose that $f \in BV[a,b]$ and $x \in (a,b]$. Then $f \in BV[a,x]$. Indeed, when $x < b$, any partition $P = \{x_k\}_{k=0}^n$ of $[a,x]$ extends to a partition $P^*$ of $[a,b]$ by adjoining the single point $x_{n+1} = b$. Clearly, $S(f,P) \le S(f,P^*) \le V_a^b f$. Thus we can define the variation function $v_f$ by $v_f(x) = V_a^x f$ for $x \in [a,b]$, with $v_f(a) = 0$.

Lemma 3.1.6. Let $f \in BV[a,b]$. Then:

(i) $V_a^y f = V_a^x f + V_x^y f$ for any $a \le x < y \le b$;

(ii) $v_f$ and $v_f - f$ are increasing on $[a,b]$.

Proof. (i) Let $a \le x < y \le b$. We first show $V_a^y f \ge V_a^x f + V_x^y f$. Let $P_1$ and $P_2$ be any partitions of $[a,x]$ and $[x,y]$, respectively. Then $P_1 \cup P_2$ is a partition of $[a,y]$, and
$$\begin{aligned}S(f,P_1) + S(f,P_2) &= \sum_{x_k \le x}|f(x_k) - f(x_{k-1})| + \sum_{x_k > x}|f(x_k) - f(x_{k-1})|\\ &= \sum_{x_k\in P_1\cup P_2}|f(x_k) - f(x_{k-1})| = S(f, P_1\cup P_2)\\ &\le \sup\{S(f,P): P \text{ is a partition of } [a,y]\}.\end{aligned}$$
Accordingly, $\sup_{P_1} S(f,P_1) + \sup_{P_2} S(f,P_2) \le V_a^y f$; that is, $V_a^x f + V_x^y f \le V_a^y f$.

Secondly, we prove the reverse inequality. Let $P$ be a partition of $[a,y]$, and let $P^*$ be obtained by adding the point $x$ to $P$. Let $P_1$ and $P_2$ be the partitions of $[a,x]$ and $[x,y]$, respectively, induced by $P^*$. Then
$$S(f,P) \le S(f,P^*) = S(f,P_1) + S(f,P_2) \le V_a^x f + V_x^y f.$$
Taking the supremum over all partitions $P$ yields $V_a^x f + V_x^y f \ge V_a^y f$. So (i) follows.

(ii) That $v_f$ is increasing on $[a,b]$ is clear from (i). Next, we prove that $v_f - f$ is increasing there. If $a \le x < y \le b$, then by (i)
$$\big(V_a^y f - f(y)\big) - \big(V_a^x f - f(x)\big) = V_x^y f - \big(f(y) - f(x)\big) \ge 0.$$
The right-hand side is nonnegative because $|f(y) - f(x)|$ is exactly $S(f,P)$ for the trivial partition $P = \{x,y\}$. Hence $f(y) - f(x) \le |f(y) - f(x)| \le V_x^y f$. So $v_f(x) - f(x) \le v_f(y) - f(y)$, as desired.

Theorem 3.1.7. $f \in BV[a,b]$ if and only if there exist increasing functions $f_1$ and $f_2$ such that $f = f_1 - f_2$ on $[a,b]$.

Proof. Suppose that $f = f_1 - f_2$ on $[a,b]$, where $f_1$ and $f_2$ are increasing. Then $f_1, f_2 \in BV[a,b]$ by Example 3.1.2. Since $S(f_1 - f_2, P) \le S(f_1,P) + S(f_2,P)$ for every partition $P$, we get $f_1 - f_2 \in BV[a,b]$. Conversely, if $f \in BV[a,b]$, then Lemma 3.1.6 shows that $v_f$ and $v_f - f$ are increasing on $[a,b]$. Clearly $f = v_f - (v_f - f)$.

Likewise, a continuous function of bounded variation can be written as the difference of two increasing continuous functions. By the proof of Theorem 3.1.7, this would follow at once if $v_f$ were continuous whenever $f$ is. The next theorem states exactly that.

Theorem 3.1.8. Let $f \in BV[a,b]$. If $f$ is continuous at $x_0 \in [a,b]$, then $v_f$ is continuous at $x_0$.

Proof. If $\varepsilon > 0$ and $x_0 < b$, then there is a partition $P$ of $[x_0,b]$ such that
$$V_{x_0}^b f < S(f,P) + 2^{-1}\varepsilon.$$
Since $f$ is continuous at $x_0$, we may add a point $x_1$ to $P$ to obtain a partition $P^* = \{x_0, x_1, \dots, x_n\}$ of $[x_0,b]$ such that $|f(x_1) - f(x_0)| < 2^{-1}\varepsilon$. Then
$$S(f,P^*) = |f(x_1) - f(x_0)| + \sum_{j=2}^n |f(x_j) - f(x_{j-1})| < 2^{-1}\varepsilon + V_{x_1}^b f$$
implies
$$V_{x_0}^b f < S(f,P) + 2^{-1}\varepsilon \le S(f,P^*) + 2^{-1}\varepsilon < \varepsilon + V_{x_1}^b f.$$
By Lemma 3.1.6 (i), $v_f(x_1) - v_f(x_0) = V_{x_0}^{x_1} f = V_{x_0}^b f - V_{x_1}^b f$, and hence $0 \le v_f(x_1) - v_f(x_0) < \varepsilon$. By Lemma 3.1.6 (ii), $v_f$ is increasing, so the limit $\lim_{x\to x_0^+} v_f(x)$ exists. By the last inequality, this limit must equal $v_f(x_0)$. Similarly, if $a < x_0$, then $\lim_{x\to x_0^-} v_f(x) = v_f(x_0)$. Hence $v_f$ is continuous at $x_0$.
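The decomposition $f = v_f - (v_f - f)$ can be sketched numerically. This snippet is our own illustration. It uses the test function $\sin$ on $[0,2\pi]$ and a uniform grid on which the turning points $\pi/2$ and $3\pi/2$ are grid points. It accumulates $v_f$ along the grid. Because $\sin$ is monotone between consecutive grid points, Lemma 3.1.6 (i) makes these grid values exact up to rounding.

```python
import math


def variation_function(f, a, b, n):
    """Grid approximation of v_f(x) = V_a^x f at x_k = a + k (b - a)/n.

    Sums |f(x_j) - f(x_{j-1})| over the partition a = x_0 < ... < x_k of [a, x_k];
    this equals V_a^{x_k} f exactly when f is monotone on every [x_{j-1}, x_j]."""
    xs = [a + k * (b - a) / n for k in range(n + 1)]
    v = [0.0]
    for x_prev, x in zip(xs, xs[1:]):
        v.append(v[-1] + abs(f(x) - f(x_prev)))
    return xs, v


xs, v = variation_function(math.sin, 0.0, 2 * math.pi, 400)
f1 = v                                           # v_f: increasing
f2 = [vk - math.sin(x) for x, vk in zip(xs, v)]  # v_f - f: increasing
print(v[-1])   # total variation of sin on [0, 2π], which is 4
```

Both lists are nondecreasing, and their difference recovers $\sin$ at every grid point.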

3.2 Definition and Basic Properties

We start by defining the Riemann-Stieltjes integral on a finite closed interval.

Definition 3.2.1. Suppose that $f$ and $g$ are real-valued functions defined and bounded on $[a,b]$. Let $P = \{x_k\}_{k=0}^n$ be a partition of $[a,b]$. Let $\{\xi_k\}_{k=1}^n$ be such that each $\xi_k$ lies in the $k$th subinterval associated with $P$; that is, $x_{k-1} \le \xi_k \le x_k$. Then:

(i) $\displaystyle\sum_{k=1}^n f(\xi_k)\big(g(x_k) - g(x_{k-1})\big)$ is called a Riemann-Stieltjes sum for $f$ with respect to $g$.

(ii) $f$ is called Riemann-Stieltjes integrable with respect to $g$ on $[a,b]$ provided that there is a number $J$ with the following property. For every $\varepsilon > 0$ there is a $\delta > 0$ such that
$$\|P\| < \delta \implies \left|J - \sum_{k=1}^n f(\xi_k)\big(g(x_k) - g(x_{k-1})\big)\right| < \varepsilon,$$
regardless of the choice of $\xi_k \in [x_{k-1},x_k]$. In this case $J$ is called the Riemann-Stieltjes integral of $f$ with respect to $g$, and it is denoted by $\int_a^b f(x)\,dg(x)$. Also, $f$ is called the integrand and $g$ is called the integrator. The class of all functions Riemann-Stieltjes integrable on $[a,b]$ with respect to $g$ is written as $RS_g[a,b]$.
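Definition 3.2.1 translates directly into code; the function names below are ours. The sketch forms Riemann-Stieltjes sums on a uniform partition of $[0,1]$ for $f(x) = x^2$ and $g(x) = x^3$. It uses left-endpoint, right-endpoint and randomly chosen tags $\xi_k$. Taking $g(x) = x$ instead gives back an ordinary Riemann sum. The three tagged sums lie close together and close to $3/5 = \int_0^1 x^2\cdot 3x^2\,dx$, the value predicted by Theorem 3.2.3.

```python
import random


def rs_sum(f, g, xs, tags):
    """Riemann-Stieltjes sum Σ f(ξ_k) (g(x_k) - g(x_{k-1})) for the partition xs;
    tags[k-1] must lie in [xs[k-1], xs[k]]."""
    return sum(f(t) * (g(b) - g(a)) for t, a, b in zip(tags, xs, xs[1:]))


n = 1000
xs = [k / n for k in range(n + 1)]            # uniform partition of [0, 1]
rng = random.Random(0)                         # fixed seed, reproducible tags
f = lambda x: x * x
g = lambda x: x ** 3

left = rs_sum(f, g, xs, xs[:-1])               # ξ_k = x_{k-1}
right = rs_sum(f, g, xs, xs[1:])               # ξ_k = x_k
rand = rs_sum(f, g, xs, [rng.uniform(a, b) for a, b in zip(xs, xs[1:])])
riemann = rs_sum(f, lambda x: x, xs, xs[:-1])  # g(x) = x gives a Riemann sum

print(left, rand, right)   # all close to 3/5
print(riemann)             # close to ∫_0^1 x^2 dx = 1/3
```

Since $f$ and $g$ are both increasing here, every tagged sum is squeezed between the left- and right-endpoint sums.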

If $g(x) = x$ for $x \in [a,b]$, the above definition reduces to the Riemann setting. However, Riemann-Stieltjes integrals can behave quite differently from Riemann integrals.

Example 3.2.2. Suppose that $f \in C[a,b]$, $a \le t < b$, and
$$g(x) = \begin{cases}0, & x \in [a,t]\\ 1, & x \in (t,b].\end{cases}$$
Given $P = \{x_k\}_{k=0}^n$, the point of discontinuity $t$ lies in one or two subintervals. Either $x_{m-1} < t < x_m$, or else $t = x_m$, in which case $[x_{m-1},t]$ and $[t,x_{m+1}]$ are subintervals of $P$. All terms of the Riemann-Stieltjes sum that do not involve these subintervals are $0$. In the first case the sum reduces to $f(\xi_m)$. In the second case it reduces to $f(\xi_m)\cdot 0 + f(\xi_{m+1}) = f(\xi_{m+1})$. As $\|P\| \to 0$, both $\xi_m$ and $\xi_{m+1}$ approach $t$. By the continuity of $f$, the limit of the Riemann-Stieltjes sums exists and equals $f(t)$. Hence $\int_a^b f(x)\,dg(x) = f(t)$.

The next theorem gives the connection between Riemann-Stieltjes integrals and Riemann integrals suggested by the differential notation $dg$.

Theorem 3.2.3. Let $f \in R[a,b]$ and $g' \in C[a,b]$. Then $f \in RS_g[a,b]$ and
$$\int_a^b f(x)\,dg(x) = \int_a^b f(x)g'(x)\,dx.$$

Proof. Let $P = \{x_k\}_{k=0}^n$ be any partition of $[a,b]$. Then $g$ is differentiable on the $k$th subinterval $[x_{k-1},x_k]$. By the mean value theorem for derivatives, there is a number $c_k \in [x_{k-1},x_k]$ such that $g'(c_k)\Delta x_k = g(x_k) - g(x_{k-1})$. Using this, we get
$$\sum_{k=1}^n f(\xi_k)\big(g(x_k) - g(x_{k-1})\big) = \sum_{k=1}^n f(\xi_k)g'(\xi_k)\Delta x_k + \sum_{k=1}^n f(\xi_k)\big(g'(c_k) - g'(\xi_k)\big)\Delta x_k.$$
The first sum on the right-hand side is a Riemann sum for $fg' \in R[a,b]$. So as $\|P\| \to 0$ it approaches $\int_a^b f(x)g'(x)\,dx$. It therefore remains to show that the second sum on the right-hand side approaches $0$ as $\|P\| \to 0$.

We may assume $M = \sup_{x\in[a,b]}|f(x)| \ne 0$. We use the uniform continuity of $g'$: for $\varepsilon > 0$ choose $\delta > 0$ so that $|\xi - c| < \delta \implies |g'(c) - g'(\xi)| < \varepsilon(b-a)^{-1}M^{-1}$. Now $\|P\| < \delta \implies |c_k - \xi_k| < \delta$, and consequently
$$\begin{aligned}\left|\sum_{k=1}^n f(\xi_k)\big(g'(c_k) - g'(\xi_k)\big)\Delta x_k\right| &\le \sum_{k=1}^n |f(\xi_k)|\,|g'(c_k) - g'(\xi_k)|\,\Delta x_k\\ &< (b-a)^{-1}M^{-1}\varepsilon\max_{1\le k\le n}|f(\xi_k)|\sum_{k=1}^n\Delta x_k \le \varepsilon.\end{aligned}$$
Thus, as $\|P\| \to 0$, the Riemann-Stieltjes sums approach $\int_a^b f(x)g'(x)\,dx$.
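Here is a numerical sketch of Theorem 3.2.3. We chose $f = \cos$ and $g(x) = x^3$ on $[0,1]$, with midpoint tags. The sketch compares the Riemann-Stieltjes sum with the Riemann sum of $fg'$ at the same tags. Their difference is exactly the second sum in the proof. For $g(x) = x^3$ and midpoint tags it equals $\sum_k f(\xi_k)(\Delta x_k)^3/4$, so it vanishes like $n^{-2}$. The exact value $3(2\cos 1 - \sin 1)$ follows from the antiderivative $x^2\sin x + 2x\cos x - 2\sin x$ of $x^2\cos x$.

```python
import math


def rs_sum(f, g, xs, tags):
    """Riemann-Stieltjes sum Σ f(ξ_k) (g(x_k) - g(x_{k-1}))."""
    return sum(f(t) * (g(b) - g(a)) for t, a, b in zip(tags, xs, xs[1:]))


def riemann_sum(h, xs, tags):
    """Riemann sum Σ h(ξ_k) Δx_k."""
    return sum(h(t) * (b - a) for t, a, b in zip(tags, xs, xs[1:]))


f, g, dg = math.cos, (lambda x: x ** 3), (lambda x: 3 * x * x)
exact = 3 * (2 * math.cos(1.0) - math.sin(1.0))   # ∫_0^1 cos(x) 3x^2 dx
for n in (10, 100, 1000):
    xs = [k / n for k in range(n + 1)]
    tags = [(a + b) / 2 for a, b in zip(xs, xs[1:])]   # midpoints ξ_k
    s_rs = rs_sum(f, g, xs, tags)
    s_r = riemann_sum(lambda x: f(x) * dg(x), xs, tags)
    print(n, s_rs, s_r, s_rs - s_r, exact)
```

The fourth column shrinks by a factor of about 100 each time $n$ grows tenfold, as the $n^{-2}$ estimate predicts.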

The linearity of Riemann-Stieltjes integrals is expressed by formulas (i) and (ii) below.

Theorem 3.2.4. Let $f_1, f_2 \in RS_{g_1}[a,b] \cap RS_{g_2}[a,b]$.

(i) If $c_1, c_2 \in \mathbb R$, then
$$\int_a^b \big(c_1f_1(x) + c_2f_2(x)\big)\,dg_1(x) = c_1\int_a^b f_1(x)\,dg_1(x) + c_2\int_a^b f_2(x)\,dg_1(x).$$

(ii) If $c_1, c_2 \in \mathbb R$, then
$$\int_a^b f_1(x)\,d\big(c_1g_1(x) + c_2g_2(x)\big) = c_1\int_a^b f_1(x)\,dg_1(x) + c_2\int_a^b f_1(x)\,dg_2(x).$$

(iii) If $f_1, f_2 \in RS_g[a,b]$, $f_1 \le f_2$, and $g$ is increasing on $[a,b]$, then
$$\int_a^b f_1(x)\,dg(x) \le \int_a^b f_2(x)\,dg(x).$$

Proof. This is left as an exercise.

The foregoing result assumes $a \le b$. To handle an integral such as $\int_b^a f(x)\,dg(x)$ with $a < b$, we adopt the familiar convention
$$\int_b^a f(x)\,dg(x) = -\int_a^b f(x)\,dg(x).$$

Theorem 3.2.5. Let $f \in RS_g[a,b] \cap RS_g[b,c] \cap RS_g[a,c]$. Then
$$\int_a^c f(x)\,dg(x) = \int_a^b f(x)\,dg(x) + \int_b^c f(x)\,dg(x).$$

Proof. It suffices to prove the formula when $a < b < c$; the other orderings then follow from the convention above. Given $\varepsilon > 0$, choose $\delta > 0$ as follows. Let $P$ be a partition of $[a,b]$, $[b,c]$ or $[a,c]$ with $\|P\| < \delta$. Then any choice of the $\xi_k$'s in the subintervals of $P$ yields a Riemann-Stieltjes sum within $3^{-1}\varepsilon$ of the corresponding Riemann-Stieltjes integral. We write these sums as $\sum_a^b(\cdots)$, $\sum_b^c(\cdots)$ or $\sum_a^c(\cdots)$, and the integrals as $\int_a^b(\cdots)$, $\int_b^c(\cdots)$ or $\int_a^c(\cdots)$.

Now take partitions of $[a,b]$ and $[b,c]$ with norms less than $\delta$. Since $a < b < c$, their union is a partition of $[a,c]$ with norm less than $\delta$. The sum $\sum_a^b(\cdots) + \sum_b^c(\cdots)$ then becomes a Riemann-Stieltjes sum $\sum_a^c(\cdots)$ for $[a,c]$. Thus we can write
$$\begin{aligned}\left|\int_a^c(\cdots) - \left(\int_a^b(\cdots) + \int_b^c(\cdots)\right)\right| &= \left|\left(\int_a^c(\cdots) - \sum_a^c(\cdots)\right) - \left(\int_a^b(\cdots) - \sum_a^b(\cdots)\right) - \left(\int_b^c(\cdots) - \sum_b^c(\cdots)\right)\right|\\ &\le \left|\int_a^c(\cdots) - \sum_a^c(\cdots)\right| + \left|\int_a^b(\cdots) - \sum_a^b(\cdots)\right| + \left|\int_b^c(\cdots) - \sum_b^c(\cdots)\right|\\ &< 3^{-1}\varepsilon + 3^{-1}\varepsilon + 3^{-1}\varepsilon = \varepsilon.\end{aligned}$$
Namely,
$$\left|\int_a^c f(x)\,dg(x) - \left(\int_a^b f(x)\,dg(x) + \int_b^c f(x)\,dg(x)\right)\right| < \varepsilon.$$
Since $\varepsilon > 0$ is arbitrary, the desired formula follows.

Note that the above theorem assumes all three Riemann-Stieltjes integrals exist and then concludes that the formula holds. In the Riemann analogue, it was enough to assume the existence of $\int_a^b(\cdots)$ and $\int_b^c(\cdots)$, which implies the existence of $\int_a^c(\cdots)$. That implication fails for the Riemann-Stieltjes integral, as the following example shows:

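Example 3.2.6. Let
$$f(x) = \begin{cases}0, & x\in[0,1]\\ 1, & x \in (1,2]\end{cases}\qquad\text{and}\qquad g(x) = \begin{cases}0, & x \in [0,1)\\ 1, & x \in [1,2].\end{cases}$$
Then $\int_0^1 f(x)\,dg(x) = 0$, since the integrand vanishes on $[0,1]$. Also $\int_1^2 f(x)\,dg(x) = 0$, since the integrator is constant on $[1,2]$. But $\int_0^2 f(x)\,dg(x)$ does not exist, since $f$ and $g$ are both discontinuous at $1$; see Theorem 3.3.2 below.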
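The failure can be observed directly. This sketch is our own. We use uniform partitions of $[0,2]$ with an odd number $n$ of subintervals, so that $1$ is never a partition point. The Riemann-Stieltjes sum then equals $0$ with left-endpoint tags and $1$ with right-endpoint tags, however small the mesh $2/n$ is. So the sums have no limit.

```python
def rs_sum(f, g, xs, tags):
    """Riemann-Stieltjes sum Σ f(ξ_k) (g(x_k) - g(x_{k-1}))."""
    return sum(f(t) * (g(b) - g(a)) for t, a, b in zip(tags, xs, xs[1:]))


f = lambda x: 1.0 if x > 1 else 0.0    # f of Example 3.2.6
g = lambda x: 1.0 if x >= 1 else 0.0   # g of Example 3.2.6


def sums_for_mesh(n):
    """Uniform partition of [0, 2] into n subintervals (n odd, so 1 is not a
    partition point); returns the sums with left- and right-endpoint tags."""
    xs = [2 * k / n for k in range(n + 1)]
    return rs_sum(f, g, xs, xs[:-1]), rs_sum(f, g, xs, xs[1:])


for n in (11, 101, 1001, 10001):
    print(n, sums_for_mesh(n))   # (0.0, 1.0) for every mesh size
```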

3.3 Nonexistence and Existence for Integrals

We now establish some results on the nonexistence and existence of Riemann-Stieltjes integrals for special functions, such as discontinuous functions, step functions and continuous functions.

Example 3.3.1. If
$$f(x) = g(x) = \begin{cases}0, & x \in [0,1]\\ 1, & x \in (1,2],\end{cases}$$
then $f \notin RS_g[0,2]$. Indeed, let $P$ be a partition of $[0,2]$ with arbitrarily small norm such that $x_{m-1} = 1 \in P$. Then $g(x_m) - g(x_{m-1}) = 1$, and hence
$$\sum_{k=1}^n f(\xi_k)\big(g(x_k) - g(x_{k-1})\big) = \begin{cases}0, & \xi_m = 1\\ 1, & \xi_m > 1.\end{cases}$$
Examples 3.2.6 and 3.3.1 illustrate the following general nonexistence result.

Theorem 3.3.2. If $f, g: [a,b] \to \mathbb R$ are both discontinuous at $c \in [a,b]$, then $f \notin RS_g[a,b]$.

Proof. We consider two cases. First suppose that $\lim_{x\to c} g(x)$ does not exist. Then there is an $\varepsilon_g > 0$ such that for any $\delta > 0$ we can choose $x_{m-1}, x_m \in [a,b]$ satisfying
$$x_{m-1} < c < x_m;\quad x_m - x_{m-1} < \delta;\quad |g(x_m) - g(x_{m-1})| \ge \varepsilon_g.$$
Now let $P$ be a partition of $[a,b]$ with $\|P\| < \delta$ whose $m$th subinterval is $[x_{m-1},x_m]$. Since $f$ is discontinuous at $c$, there are an $\varepsilon_f > 0$ and points $\xi_{m,1}, \xi_{m,2} \in [x_{m-1},x_m]$ such that $|f(\xi_{m,1}) - f(\xi_{m,2})| \ge \varepsilon_f$. If we choose $\xi_{k,1} = \xi_{k,2}$ for $k \ne m$, the corresponding Riemann-Stieltjes sums differ by at least $\varepsilon_f\varepsilon_g$. Since $\|P\|$ is arbitrarily small, the Riemann-Stieltjes sums cannot approach a finite number as $\|P\| \to 0$.

Next, suppose that $\lim_{x\to c}g(x)$ exists but does not equal $g(c)$, say,
$$\left|g(c) - \lim_{x\to c} g(x)\right| = \varepsilon_g > 0.$$
For a given $\delta > 0$, choose a partition $P$ of $[a,b]$ such that $\|P\| < \delta$, $x_m = c$, and either $|g(c) - g(x_{m+1})| \ge 2^{-1}\varepsilon_g$ or $|g(c) - g(x_{m-1})| \ge 2^{-1}\varepsilon_g$. Since $f$ is discontinuous at $c$, there is an $\varepsilon_f > 0$ such that $[x_{m-1},x_m]$ or $[x_m,x_{m+1}]$ contains points $\xi_{m,1}$ and $\xi_{m,2}$ with $|f(\xi_{m,1}) - f(\xi_{m,2})| \ge \varepsilon_f$. As above, this produces two Riemann-Stieltjes sums for $P$ that differ by at least $2^{-1}\varepsilon_f\varepsilon_g$. So the sums do not approach a finite number as $\|P\| \to 0$.

Motivated by Examples 3.2.6 and 3.3.1, we consider step functions as integrators. These are functions that are constant on an interval except for finitely many jump discontinuities.

Definition 3.3.3. Let $g$ be defined on $[a,b]$ and discontinuous at finitely many points $\{c_k\}_{k=1}^l$, where $l \in \mathbb N$ and $a \le c_1 < c_2 < \cdots < c_l \le b$. Suppose $g$ is a constant $\lambda_k$ on each subinterval $I_k$ with endpoints $c_{k-1}$ and $c_k$. Then $g$ is called a step function, and $g = \sum_{k=1}^l \lambda_k 1_{I_k}$, where
$$1_{I_k}(x) = \begin{cases}1, & x \in I_k\\ 0, & x \notin I_k.\end{cases}$$
The number $g(c_k^+) - g(c_k^-)$ is called the jump of $g$ at $c_k$. If $c_1 = a$ and $c_l = b$, then the jumps of $g$ at $c_1$ and $c_l$ are $g(c_1^+) - g(c_1)$ and $g(c_l) - g(c_l^-)$, respectively. Here, for each $k = 1, 2, \dots, l$,
$$g(c_k^+) = \lim_{x\to c_k^+} g(x)\quad\text{and}\quad g(c_k^-) = \lim_{x\to c_k^-} g(x).$$

Step functions have bounded variation, and they also let us represent Riemann-Stieltjes integrals as finite sums.
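Here is a sketch of this finite-sum representation, which Theorem 3.3.4 and Remark 3.3.5 below make precise. We take the integrator $g(x) = [x]$ (Python's `math.floor`) on $[0,n]$. Every jump equals $1$ and occurs at $x = 1, \dots, n$, so the Riemann-Stieltjes sums should approach $\sum_{k=1}^n f(k)$. The integrand $f(x) = e^{-x}\cos x$, the value $n = 5$, the numbers $m$ of subintervals and the midpoint tags are our own choices.

```python
import math


def rs_sum(f, g, xs, tags):
    """Riemann-Stieltjes sum Σ f(ξ_k) (g(x_k) - g(x_{k-1}))."""
    return sum(f(t) * (g(b) - g(a)) for t, a, b in zip(tags, xs, xs[1:]))


def floor_rs_sum(f, n, m):
    """RS sum of f with respect to g(x) = floor(x) on [0, n], using m uniform
    subintervals (x_k = n k / m) and midpoint tags."""
    xs = [n * k / m for k in range(m + 1)]
    tags = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return rs_sum(f, math.floor, xs, tags)


f = lambda x: math.exp(-x) * math.cos(x)      # continuous at every jump point
n = 5
target = sum(f(k) for k in range(1, n + 1))   # Σ f(k) · 1 over the unit jumps of floor
for m in (70, 700, 7000):
    print(m, floor_rs_sum(f, n, m), target)
```

Here the partition points include the integers. Each jump is therefore picked up by the single subinterval ending at it, whose midpoint tag lies half a mesh width from the jump. The error shrinks roughly in proportion to $1/m$.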

Theorem 3.3.4. Let $g$ be a step function on $[a,b]$ with jumps $d_k$ at $c_k$, where $l \in \mathbb N$ and $a \le c_1 < c_2 < \cdots < c_l \le b$. If $f: [a,b] \to \mathbb R$ is continuous at each $c_k$, then $f \in RS_g[a,b]$ and
$$\int_a^b f(x)\,dg(x) = \sum_{k=1}^l f(c_k)\,d_k.$$

Proof. Let $\delta = \min_{2\le k\le l}(c_k - c_{k-1})$; any $\delta > 0$ will do if $l = 1$. For any partition $P = \{x_k\}_{k=0}^n$ with $\|P\| < \delta$, each interval $I_k = [x_{k-1},x_k]$ contains at most one of the points $\{c_j\}_{j=1}^l$. If $I_k$ contains no $c_j$, then $g(x_k) - g(x_{k-1}) = 0$. If $I_k$ contains a point $c_j$ in its interior, then $g(x_k) - g(x_{k-1}) = d_j$. Finally, if $c_j$ is common to $[x_{k-1},x_k]$ and $[x_k,x_{k+1}]$, that is, if $c_j = x_k$, then
$$\big(g(x_k) - g(x_{k-1})\big) + \big(g(x_{k+1}) - g(x_k)\big) = d_j.$$
Let $I_{k_1}, I_{k_2}, \dots, I_{k_r}$ $(k_1 < \cdots < k_r)$ denote the intervals that contain some $c_j$. Then
$$\sum_{k=1}^n f(\xi_k)\big(g(x_k) - g(x_{k-1})\big) = \sum_{m=1}^r f(\xi_{k_m})\big(g(x_{k_m}) - g(x_{k_m-1})\big).$$
If $c_j$ is in the interior of $I_{k_m}$, then by the continuity of $f$ at $c_j$,
$$f(\xi_{k_m})\big(g(x_{k_m}) - g(x_{k_m-1})\big) \to f(c_j)\,d_j\quad\text{as } \|P\| \to 0.$$
If $c_j$ is common to $I_{k_m}$ and $I_{k_{m+1}}$, then by the continuity of $f$ at $c_j$,
$$f(\xi_{k_m})\big(g(x_{k_m}) - g(x_{k_m-1})\big) + f(\xi_{k_{m+1}})\big(g(x_{k_{m+1}}) - g(x_{k_m})\big) \to f(c_j)\,d_j\quad\text{as } \|P\| \to 0.$$
Hence
$$\lim_{\|P\|\to0}\sum_{k=1}^n f(\xi_k)\big(g(x_k) - g(x_{k-1})\big) = \lim_{\|P\|\to0}\sum_{m=1}^r f(\xi_{k_m})\big(g(x_{k_m}) - g(x_{k_m-1})\big) = \sum_{k=1}^l f(c_k)\,d_k,$$
which establishes the desired formula.

Remark 3.3.5. In Theorem 3.3.4, take $[a,b] = [0,n]$ for some $n \in \mathbb N$ and $g(x) = [x]$, the greatest integer less than or equal to $x$. Then the jumps of $g$ are $d_k = 1$ at $c_k = k$ for $k = 1, \dots, n$. For any real numbers $a_1, a_2, \dots, a_n$, choose a function $f \in C[0,n]$ such that $f(k) = a_k$ for each $k$. By Theorem 3.3.4,
$$\sum_{k=1}^n a_k = \sum_{k=1}^n f(k) = \int_0^n f(x)\,d[x].$$

In other words, a finite sum can be written as a Riemann-Stieltjes integral. More generally, we have the following existence theorem for Riemann-Stieltjes integrals of continuous functions.

Theorem 3.3.6. Let $f \in C[a,b]$ and $g \in BV[a,b]$. Then $f \in RS_g[a,b]$.

Proof. By Theorem 3.1.7, $g = g_1 - g_2$, where $g_1, g_2$ are increasing. Suppose we prove that $\int_a^b f(x)\,dg_1(x)$ exists for increasing $g_1$. The same argument then gives the existence of $\int_a^b f(x)\,dg_2(x)$. Every Riemann-Stieltjes sum for $f$ with respect to $g$ is the difference of the corresponding sums with respect to $g_1$ and $g_2$. Hence
$$\int_a^b f(x)\,d\big(g_1(x) - g_2(x)\big) = \int_a^b f(x)\,dg(x)$$
exists. To simplify notation, we write $g$ in place of $g_1$. For any partition $P = \{x_k\}_{k=0}^n$, let
$$I_k = [x_{k-1},x_k];\quad m_k = \inf\{f(x): x\in I_k\};\quad M_k = \sup\{f(x): x \in I_k\}.$$
We define the upper and lower sums of $f$ with respect to $g$ by
$$U(f,P;g) = \sum_{k=1}^n M_k\big(g(x_k) - g(x_{k-1})\big)\quad\text{and}\quad L(f,P;g) = \sum_{k=1}^n m_k\big(g(x_k) - g(x_{k-1})\big).$$
If $\xi_k \in I_k$, then $m_k \le f(\xi_k) \le M_k$. Since $g$ is increasing, it follows that
$$L(f,P;g) \le \sum_{k=1}^n f(\xi_k)\big(g(x_k) - g(x_{k-1})\big) \le U(f,P;g).$$
If $P$ and $P^*$ are two partitions of $[a,b]$, then $P \cup P^*$ refines both. Refining a partition does not decrease its lower sums and does not increase its upper sums, as is easy to verify. So
$$L(f,P;g) \le L(f,P\cup P^*;g) \le U(f,P\cup P^*;g) \le U(f,P^*;g).$$
Therefore any lower sum is at most any upper sum. It follows that the least upper bound of all lower sums cannot exceed the greatest lower bound of all upper sums:
$$L(f;g) = \sup_P L(f,P;g) \le \inf_P U(f,P;g) = U(f;g).$$
One can prove that $\int_a^b f(x)\,dg(x)$ exists if and only if for any $\varepsilon > 0$ there is a $\delta > 0$ such that
$$\|P\| < \delta \implies U(f,P;g) - L(f,P;g) < \varepsilon.$$
Equivalently, for any $\varepsilon > 0$ there is a partition $P$ of $[a,b]$ such that $U(f,P;g) - L(f,P;g) < \varepsilon$. The proof is virtually identical to that of Theorem 2.2.5, with $g(x_k) - g(x_{k-1})$ replacing $\Delta x_k = x_k - x_{k-1}$. Any Riemann-Stieltjes sum using $P$ lies between $L(f,P;g)$ and $U(f,P;g)$, and so does $L(f;g)$. Hence any such Riemann-Stieltjes sum is within $\varepsilon$ of $L(f;g)$. Thus the Riemann-Stieltjes sums converge to $L(f;g)$ as $\|P\| \to 0$.

By the uniform continuity of $f \in C[a,b]$, we choose $\delta > 0$ so that for any $x, y \in [a,b]$,
$$|x - y| < \delta \implies |f(x) - f(y)| < 2^{-1}\varepsilon\big(g(b) - g(a)\big)^{-1}.$$
Here we may assume $g(a) \ne g(b)$; otherwise the monotonic function $g$ is constant and the conclusion is trivial. If $\|P\| < \delta$, then for $k = 1, \dots, n$,
$$M_k - m_k < \varepsilon\big(g(b) - g(a)\big)^{-1}.$$
Therefore
$$U(f,P;g) - L(f,P;g) = \sum_{k=1}^n (M_k - m_k)\big(g(x_k) - g(x_{k-1})\big) < \varepsilon\big(g(b) - g(a)\big)^{-1}\sum_{k=1}^n\big(g(x_k) - g(x_{k-1})\big) = \varepsilon.$$
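Upper and lower sums are easy to compute when the infimum and supremum on each subinterval are attained at its endpoints. The sketch below assumes exactly that. The assumption holds for our example $f(x) = x^2$ on $[0,1]$, but not for general continuous $f$. The increasing integrator $g(x) = x + 1_{[1/2,1]}(x)$, with a unit jump at $1/2$, is also our own choice. By Theorems 3.2.3, 3.2.4 and 3.3.4 the integral equals $\int_0^1 x^2\,dx + f(1/2) = 7/12$. The difference $U - L$ shrinks as the partition is refined.

```python
def upper_lower(f, g, xs):
    """Upper and lower Riemann-Stieltjes sums U(f, P; g), L(f, P; g) for increasing g.

    Assumes f is monotone on each subinterval, so m_k and M_k are attained at the
    endpoints; this holds for the example below but not for general f."""
    U = L = 0.0
    for a, b in zip(xs, xs[1:]):
        dg = g(b) - g(a)
        lo, hi = sorted((f(a), f(b)))
        U += hi * dg
        L += lo * dg
    return U, L


f = lambda x: x * x                            # continuous, increasing on [0, 1]
g = lambda x: x + (1.0 if x >= 0.5 else 0.0)   # increasing, with a unit jump at 1/2
exact = 1 / 3 + f(0.5)                         # ∫ x^2 dx + f(1/2) · (jump 1) = 7/12
for n in (10, 100, 1000):
    xs = [k / n for k in range(n + 1)]
    U, L = upper_lower(f, g, xs)
    print(n, L, U, U - L)
```

The gap $U - L$ equals $n^{-1}$ plus the oscillation of $f$ on the subinterval that carries the jump, so it tends to $0$ as the proof requires.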



The foregoing theorem can be used to handle compositions with Riemann-Stieltjes integrable functions.

Theorem 3.3.7. Let $g$ be increasing on $[a,b]$ and $f \in RS_g[a,b]$, with the range of $f$ contained in the closed interval $[c,d]$. If $\psi \in C[c,d]$, then $\psi\circ f \in RS_g[a,b]$.

Proof. Since $\psi \in C[c,d]$, we have $M = 1 + \sup_{x\in[c,d]}|\psi(x)| < \infty$, and $\psi$ is uniformly continuous on $[c,d]$. Consequently, for any $\varepsilon > 0$ there is a $\delta \in \Big(0, \varepsilon\big(g(b) - g(a) + 2M\big)^{-1}\Big)$ such that $y_1, y_2 \in [c,d]$ and $|y_1 - y_2| < \delta \implies |\psi(y_1) - \psi(y_2)|$
0 there exist a partition $P = \{x_j\}_{j=0}^{n}$ of $[a,b]$ and the lower and upper Riemann-Stieltjes sums $L(f,P;g)$ and $U(f,P;g)$ (see also the argument for Theorem 3.3.6) such that $U(f,P;g) - L(f,P;g) < (2k)^{-1}\varepsilon$. Consequently,

$$(2k)^{-1}\varepsilon > \sum_{(x_{j-1},x_j)\cap D_k(f) \neq \emptyset} (M_j - m_j)\big(g(x_j) - g(x_{j-1})\big) + \sum_{(x_{j-1},x_j)\cap D_k(f) = \emptyset} (M_j - m_j)\big(g(x_j) - g(x_{j-1})\big).$$

Note that every point of $D_k(f)$ belongs to $P$ or satisfies $(x_{j-1},x_j) \cap D_k(f) \neq \emptyset$ for some $j \in \{1, \dots, n\}$, and that $M_j - m_j \ge k^{-1}$ whenever the open interval $(x_{j-1},x_j)$ meets $D_k(f)$. Accordingly,

$$(2k)^{-1}\varepsilon > \sum_{(x_{j-1},x_j)\cap D_k(f) \neq \emptyset} (M_j - m_j)\big(g(x_j) - g(x_{j-1})\big) \ge k^{-1}\sum_{(x_{j-1},x_j)\cap D_k(f) \neq \emptyset} \big(g(x_j) - g(x_{j-1})\big),$$

which means

$$\sum_{(x_{j-1},x_j)\cap D_k(f) \neq \emptyset} \big(g(x_j) - g(x_{j-1})\big) < 2^{-1}\varepsilon.$$

Riemann-Stieltjes Integrals

Since $g$ is uniformly continuous on $[a-1, b+1]$, there exists a $\delta > 0$ such that

$$\sup_{0 \le j \le n}\big(g(x_j + \delta) - g(x_j - \delta)\big) < 2^{-1}\varepsilon(1+n)^{-1}.$$

But then

$$D_k(f) \subseteq \bigcup_{(x_{j-1},x_j)\cap D_k(f) \neq \emptyset} (x_{j-1},x_j) \ \cup\ \bigcup_{j=0}^{n} (x_j - \delta, x_j + \delta)$$

and

$$\sum_{(x_{j-1},x_j)\cap D_k(f) \neq \emptyset} \big(g(x_j) - g(x_{j-1})\big) + \sum_{j=0}^{n}\big(g(x_j + \delta) - g(x_j - \delta)\big) < \varepsilon.$$

In other words, $D_k(f)$ is a $g$-null set.

On the other hand, suppose that $D(f)$ is a $g$-null set. We now prove $f \in RS_g[a,b]$. Since $D_k(f) \subseteq D(f)$ for each $k \in \mathbb{N}$, each $D_k(f)$ is a $g$-null set too. Consequently, we can find open intervals $\{(a_{k,j}, b_{k,j})\}_{j=1}^{\infty}$ such that

$$D_k(f) \subseteq \bigcup_{j=1}^{\infty} (a_{k,j}, b_{k,j}) \quad\text{and}\quad \sum_{j=1}^{\infty}\big(g(b_{k,j}) - g(a_{k,j})\big) < \varepsilon(4M)^{-1}.$$

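Note that each $D_k(f)$ is closed and bounded. So by the Heine-Borel property, we get a finite subfamily $\{(a_{k,j_l}, b_{k,j_l})\}_{l=1}^{n_0}$ of $\{(a_{k,j}, b_{k,j})\}_{j=1}^{\infty}$ such that $D_k(f) \subseteq \bigcup_{l=1}^{n_0} (a_{k,j_l}, b_{k,j_l})$. Thus $[a,b] \setminus \bigcup_{l=1}^{n_0} (a_{k,j_l}, b_{k,j_l})$ is a disjoint union of finitely many intervals $\{[c_{k,l}, d_{k,l}]\}_{l=1}^{l_0}$. Hence

$$[a,b] = \Big(\bigcup_{l=1}^{l_0} [c_{k,l}, d_{k,l}]\Big) \cup \Big(\bigcup_{l=1}^{n_0} (a_{k,j_l}, b_{k,j_l})\Big).$$

Since $D_k(f)$ collects the points $x$ with $\omega_f(x) \ge k^{-1}$ and the intervals $[c_{k,l}, d_{k,l}]$ do not meet $D_k(f)$, we find that $\omega_f(x) < k^{-1}$ whenever $x \in \bigcup_{l=1}^{l_0} [c_{k,l}, d_{k,l}]$. Now the step functions $G$ and $H$ are defined by

$$G(x) = \begin{cases} \inf_{t \in [c_{k,l}, d_{k,l}]} f(t), & x \in [c_{k,l}, d_{k,l}],\ l = 1, \dots, l_0, \\ \inf_{t \in (a_{k,j_l}, b_{k,j_l})} f(t), & x \in (a_{k,j_l}, b_{k,j_l}),\ l = 1, \dots, n_0, \end{cases}$$

and

$$H(x) = \begin{cases} \sup_{t \in [c_{k,l}, d_{k,l}]} f(t), & x \in [c_{k,l}, d_{k,l}],\ l = 1, \dots, l_0, \\ \sup_{t \in (a_{k,j_l}, b_{k,j_l})} f(t), & x \in (a_{k,j_l}, b_{k,j_l}),\ l = 1, \dots, n_0. \end{cases}$$

Then, for any $\varepsilon > 0$ there is an $\breve{n}_0 \in \mathbb{N}$ such that $k > \breve{n}_0$ implies

$$\int_a^b \big(H(x) - G(x)\big)\,dg(x) \le k^{-1}\sum_{l=1}^{l_0}\big(g(d_{k,l}) - g(c_{k,l})\big) + 2M\sum_{l=1}^{n_0}\big(g(b_{k,j_l}) - g(a_{k,j_l})\big) \le k^{-1}\big(g(b) - g(a)\big) + 2^{-1}\varepsilon < \varepsilon.$$

Hence $f \in RS_g[a,b]$, due to the argument for Theorem 3.3.6.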

3.4 Evaluations of Integrals

With the help of the previous discussion on the nonexistence and existence of Riemann-Stieltjes integrals, we can introduce some methods and techniques for evaluating Riemann-Stieltjes integrals. Firstly, we compute Riemann-Stieltjes integrals by parts.

Theorem 3.4.1. If $f \in RS_g[a,b]$, then $g \in RS_f[a,b]$ and

$$\int_a^b f(x)\,dg(x) + \int_a^b g(x)\,df(x) = f(b)g(b) - f(a)g(a).$$

Proof. Let $P = \{x_k\}_{k=0}^{n}$ be a partition of $[a,b]$ and let $\xi_k \in [x_{k-1}, x_k]$ for $k = 1, \dots, n$. Any Riemann-Stieltjes sum for $g$ with respect to $f$ can be expanded and rewritten in the following way:

$$\begin{aligned} \sum_{k=1}^{n} g(\xi_k)\big(f(x_k) - f(x_{k-1})\big) &= g(\xi_1)\big(f(x_1) - f(a)\big) + \cdots + g(\xi_n)\big(f(b) - f(x_{n-1})\big) \\ &= -f(a)g(a) - \sum_{k=0}^{n} f(x_k)\big(g(\xi_{k+1}) - g(\xi_k)\big) + f(b)g(b), \end{aligned}$$

where we have introduced $\xi_0 = a$ and $\xi_{n+1} = b$. Note that the last sum is a Riemann-Stieltjes sum for $f$ with respect to $g$ on the partition $\{\xi_k\}_{k=0}^{n+1}$, with $x_k$ belonging to $[\xi_k, \xi_{k+1}]$; moreover, $\xi_{k+1} - \xi_k \le 2\|P\|$, so the mesh of this partition tends to $0$ with $\|P\|$. So this summation identity ensures that if the Riemann-Stieltjes sums for $\int_a^b f(x)\,dg(x)$ converge as $\|P\| \to 0$, then the Riemann-Stieltjes sums for $\int_a^b g(x)\,df(x)$ also converge, and their limit values are related by the desired equation.

Example 3.4.2. Integrating by parts, we get

$$\int_{-1}^{2} x\,d|x| = x|x|\,\Big|_{-1}^{2} - \int_{-1}^{2} |x|\,dx = 5 - 5\cdot 2^{-1} = 5\cdot 2^{-1}.$$
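Both the summation identity in the proof of Theorem 3.4.1 and the value in Example 3.4.2 can be checked numerically. The Python sketch below is an illustration only: the helper names `rs_sum` and `abel_identity_sides` are hypothetical, and the random partition, tags and test functions are arbitrary data. It evaluates both sides of the identity (with $\xi_0 = a$ and $\xi_{n+1} = b$, as in the proof), then approximates $\int_{-1}^{2} x\,d|x|$ and $\int_{-1}^{2} |x|\,dx$ with midpoint tags on a uniform partition.

```python
import random


def rs_sum(f, g, xs, tags):
    """Riemann-Stieltjes sum: sum over k of f(tags[k-1]) * (g(xs[k]) - g(xs[k-1]))."""
    return sum(f(tags[k - 1]) * (g(xs[k]) - g(xs[k - 1])) for k in range(1, len(xs)))


def abel_identity_sides(f, g, a, b, n, seed=0):
    """Return both sides of the summation identity used to prove Theorem 3.4.1."""
    rng = random.Random(seed)
    xs = [a] + sorted(rng.uniform(a, b) for _ in range(n - 1)) + [b]   # x_0, ..., x_n
    xis = [rng.uniform(xs[k - 1], xs[k]) for k in range(1, n + 1)]     # xi_1, ..., xi_n
    lhs = rs_sum(g, f, xs, xis)                          # sum of g(xi_k) (f(x_k) - f(x_{k-1}))
    ext = [a] + xis + [b]                                # xi_0 = a, ..., xi_{n+1} = b
    rhs = f(b) * g(b) - f(a) * g(a) - rs_sum(f, g, ext, xs)   # tag x_k lies in [xi_k, xi_{k+1}]
    return lhs, rhs


lhs, rhs = abel_identity_sides(lambda x: x * x, abs, -1.0, 2.0, n=25)
print(lhs, rhs)  # equal up to rounding error

# Example 3.4.2 with a uniform partition and midpoint tags.
n = 3000
xs = [-1.0 + 3.0 * k / n for k in range(n + 1)]
mids = [(xs[k - 1] + xs[k]) / 2 for k in range(1, n + 1)]
stieltjes = rs_sum(lambda x: x, abs, xs, mids)   # approximates the integral of x d|x|
riemann = rs_sum(abs, lambda x: x, xs, mids)     # approximates the integral of |x| dx
print(stieltjes, riemann, stieltjes + riemann)
```

The identity is purely algebraic, so it holds for any $f$ and $g$; the last line illustrates that the two integrals add up to $x|x|\,\big|_{-1}^{2} = 5$.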

Secondly, closely related to the integration by parts formula are the following first and second mean value theorems for Riemann-Stieltjes integrals.

Theorem 3.4.3. (i) If $f \in C[a,b]$ and $g$ is increasing on $[a,b]$, then there exists a point $\xi \in [a,b]$ such that

$$\int_a^b f(x)\,dg(x) = f(\xi)\big(g(b) - g(a)\big).$$

(ii) If $f$ is monotonic on $[a,b]$ and $g \in C[a,b]$, then there is a point $\xi \in [a,b]$ such that

$$\int_a^b f(x)\,dg(x) = f(a)\int_a^{\xi} dg(x) + f(b)\int_{\xi}^{b} dg(x).$$

Proof. (i) Setting $m_f = \min_{x \in [a,b]} f(x)$ and $M_f = \max_{x \in [a,b]} f(x)$, we get by Theorem 3.2.4 (iii) and Theorem 3.3.6

$$m_f\int_a^b dg(x) \le \int_a^b f(x)\,dg(x) \le M_f\int_a^b dg(x).$$

If $g(b) - g(a) = \int_a^b dg(x) = 0$, then $g$ must be constant on $[a,b]$, and hence $\int_a^b f(x)\,dg(x) = 0$. In this case, $\xi$ may be chosen to be any number in $[a,b]$. On the other hand, if $g(b) - g(a) > 0$, then, since $f \in C[a,b]$ takes every value between $m_f$ and $M_f$, the intermediate value theorem gives a point $\xi \in [a,b]$ such that

$$f(\xi)\big(g(b) - g(a)\big) = \int_a^b f(x)\,dg(x).$$

Thus, the desired result follows.

(ii) Without loss of generality, we may assume that $f$ is increasing. Since $g \in C[a,b]$ and $f$ is increasing, Theorem 3.3.6 gives $g \in RS_f[a,b]$; hence, by Theorem 3.4.1 with the roles of $f$ and $g$ interchanged, $f \in RS_g[a,b]$ and

$$\int_a^b f(x)\,dg(x) = f(b)g(b) - f(a)g(a) - \int_a^b g(x)\,df(x).$$

An application of (i) to $\int_a^b g(x)\,df(x)$ produces a point $\xi \in [a,b]$ such that

$$\int_a^b g(x)\,df(x) = g(\xi)\big(f(b) - f(a)\big).$$

Consequently,

$$\int_a^b f(x)\,dg(x) = f(a)\big(g(\xi) - g(a)\big) + f(b)\big(g(b) - g(\xi)\big) = f(a)\int_a^{\xi} dg(x) + f(b)\int_{\xi}^{b} dg(x),$$

as desired.

Corollary 3.4.4. (i) If $f$ is monotonic on $[a,b]$ and $h \in C[a,b]$, then there exists a point $\xi \in [a,b]$ such that

$$\int_a^b f(x)h(x)\,dx = f(a)\int_a^{\xi} h(x)\,dx + f(b)\int_{\xi}^{b} h(x)\,dx.$$

(ii) If $f$ is nonnegative and increasing on $[a,b]$ and $h \in C[a,b]$, then there exists a point $\xi \in [a,b]$ such that

$$\int_a^b f(x)h(x)\,dx = f(b)\int_{\xi}^{b} h(x)\,dx.$$

(iii) If $f$ is nonnegative and decreasing on $[a,b]$ and $h \in C[a,b]$, then there exists a point $\xi \in [a,b]$ such that

$$\int_a^b f(x)h(x)\,dx = f(a)\int_a^{\xi} h(x)\,dx.$$

Proof. (i) Applying Theorems 3.4.3 (ii) and 3.2.3 to $g(x) = \int_a^x h(t)\,dt$, we reach the desired assertion. (ii) and (iii). These follow from (i) by redefining $f(a) = 0$ in case (ii) and $f(b) = 0$ in case (iii); the modified function is still monotonic. Of course, we have used the fact that this change does not affect the Riemann integral $\int_a^b f(x)h(x)\,dx$; see also Theorem 2.1.10.
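Corollary 3.4.4 (iii) only asserts that a suitable $\xi$ exists. As an illustration (not part of the original argument), the Python sketch below locates one such $\xi$ for the hypothetical choices $f(x) = e^{-x}$ (nonnegative and decreasing) and $h(x) = \cos x$ on $[0, 3]$. The integrals are approximated with a composite Simpson rule and $\xi$ is found by a grid scan followed by bisection, so the result is accurate only up to numerical error; the names `simpson` and `bonnet_point` are illustrative.

```python
import math


def simpson(func, lo, hi, n=1000):
    """Composite Simpson approximation of the integral of func over [lo, hi] (n even)."""
    if hi == lo:
        return 0.0
    step = (hi - lo) / n
    total = func(lo) + func(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(lo + i * step)
    return total * step / 3


def bonnet_point(f, h, a, b, grid=300, tol=1e-10):
    """Find xi in [a, b] with  int_a^b f h dx = f(a) * int_a^xi h dx  (Corollary 3.4.4 (iii)).

    Assumes f is nonnegative and decreasing and h is continuous. A root of
    phi(t) = f(a) * int_a^t h dx - int_a^b f h dx is bracketed on a grid, then bisected.
    """
    target = simpson(lambda x: f(x) * h(x), a, b)
    phi = lambda t: f(a) * simpson(h, a, t) - target
    ts = [a + (b - a) * i / grid for i in range(grid + 1)]
    vals = [phi(t) for t in ts]
    for i in range(grid):
        if vals[i] == 0.0:
            return ts[i]
        if vals[i] * vals[i + 1] < 0:
            lo, hi, phi_lo = ts[i], ts[i + 1], vals[i]
            while hi - lo > tol:
                mid = 0.5 * (lo + hi)
                phi_mid = phi(mid)
                if phi_lo * phi_mid <= 0:
                    hi = mid
                else:
                    lo, phi_lo = mid, phi_mid
            return 0.5 * (lo + hi)
    if vals[-1] == 0.0:
        return ts[-1]
    raise ValueError("no sign change of phi found on the grid")


xi = bonnet_point(lambda x: math.exp(-x), math.cos, 0.0, 3.0)
print(xi)
```

Here $f(0) = 1$ and $\int_0^{\xi}\cos x\,dx = \sin\xi$, so the returned $\xi$ satisfies $\sin\xi \approx \int_0^3 e^{-x}\cos x\,dx = \big(1 + e^{-3}(\sin 3 - \cos 3)\big)/2$. The grid scan returns the first such point; the corollary does not claim uniqueness, and in this example a second point lies near $\pi - \xi$.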

As is well known, Corollary 3.4.4 (ii) and (iii) give Pierre Ossian Bonnet's form of the second mean value theorem for integrals. Thirdly, we conclude this section with a useful result on the evaluation of Riemann-Stieltjes integrals: the change of variable formula.

Theorem 3.4.5. Let $\varphi \in C[a,b]$ be monotonic. If $f \in RS_g[\varphi(a), \varphi(b)]$, then $f \circ \varphi \in RS_{g \circ \varphi}[a,b]$ and

$$\int_a^b f\circ\varphi(x)\,d\,g\circ\varphi(x) = \int_{\varphi(a)}^{\varphi(b)} f(t)\,dg(t).$$

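Proof. It is enough to handle the case that $\varphi$ is increasing. Since $\varphi \in C[a,b]$, $\varphi$ maps $[a,b]$ onto $[\varphi(a), \varphi(b)]$. Let $P = \{x_k\}_{k=0}^{n}$ be an arbitrary partition of $[a,b]$ and choose $\tau_k \in [x_{k-1}, x_k]$ for $1 \le k \le n$. Let $\varphi(P) = \{\varphi(x_k)\}_{k=0}^{n}$. Then $\varphi(P)$ is a partition of $[\varphi(a), \varphi(b)]$ and $\varphi(x_{k-1}) \le \varphi(\tau_k) \le \varphi(x_k)$ for $1 \le k \le n$. We have

$$\sum_{k=1}^{n} f\circ\varphi(\tau_k)\big(g\circ\varphi(x_k) - g\circ\varphi(x_{k-1})\big) = \sum_{k=1}^{n} f\big(\varphi(\tau_k)\big)\Big(g\big(\varphi(x_k)\big) - g\big(\varphi(x_{k-1})\big)\Big),$$

and the right-hand side is a Riemann-Stieltjes sum for $\int_{\varphi(a)}^{\varphi(b)} f(t)\,dg(t)$ on the partition $\varphi(P)$. Note that $\varphi \in C[a,b]$, so it is uniformly continuous on $[a,b]$. It follows that $\|\varphi(P)\| \to 0$ as $\|P\| \to 0$. We conclude

$$\begin{aligned} \int_a^b f\circ\varphi(x)\,d\,g\circ\varphi(x) &= \lim_{\|P\|\to 0}\sum_{k=1}^{n} f\circ\varphi(\tau_k)\big(g\circ\varphi(x_k) - g\circ\varphi(x_{k-1})\big) \\ &= \lim_{\|P\|\to 0}\sum_{k=1}^{n} f\big(\varphi(\tau_k)\big)\Big(g\big(\varphi(x_k)\big) - g\big(\varphi(x_{k-1})\big)\Big) \\ &= \int_{\varphi(a)}^{\varphi(b)} f(t)\,dg(t). \end{aligned}$$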
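The key step of this proof, that a Riemann-Stieltjes sum for $f\circ\varphi$ with respect to $g\circ\varphi$ over $P$ is literally a Riemann-Stieltjes sum for $f$ with respect to $g$ over $\varphi(P)$, can be seen in a small Python sketch. The choices $\varphi(x) = x^2$, $f(t) = t$ and a $g$ with a unit jump at $t = 1/4$ are hypothetical illustrations, as is the helper `rs_sum`.

```python
def rs_sum(f, g, xs, tags):
    """Riemann-Stieltjes sum: sum over k of f(tags[k-1]) * (g(xs[k]) - g(xs[k-1]))."""
    return sum(f(tags[k - 1]) * (g(xs[k]) - g(xs[k - 1])) for k in range(1, len(xs)))


phi = lambda x: x * x                            # continuous and increasing on [0, 1]
f = lambda t: t
g = lambda t: t + (1.0 if t >= 0.25 else 0.0)    # increasing, with a jump of size 1 at t = 1/4

n = 1000
xs = [k / n for k in range(n + 1)]               # partition P of [0, 1]
taus = [(xs[k - 1] + xs[k]) / 2 for k in range(1, n + 1)]

# Sum over P for the integral of f∘φ with respect to g∘φ.
lhs = rs_sum(lambda x: f(phi(x)), lambda x: g(phi(x)), xs, taus)
# The same numbers read as a sum over φ(P) for the integral of f dg, with tags φ(τ_k).
rhs = rs_sum(f, g, [phi(x) for x in xs], [phi(t) for t in taus])
print(lhs, rhs)
```

Both printed values agree and are close to $\int_0^1 t\,dg(t) = 1/2 + 1/4 = 3/4$, where $1/4$ is $f(1/4)$ times the jump of $g$.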

3.5 Improper Situations

The previously discussed Riemann-Stieltjes integrals are subject to the restrictions that both $f$ and $g$ are defined and bounded on a finite interval $[a,b]$. In this section, we briefly extend the scope of integration by relaxing these restrictions, as we did in the section on improper Riemann integrals.

Definition 3.5.1. (i) Let $f \in RS_g[a,b]$ for every $b \ge a$. Then $\int_a^{\infty} f(x)\,dg(x)$ is said to be convergent and equal to $s \in \mathbb{R}$ provided that $s = \lim_{b\to\infty}\int_a^b f(x)\,dg(x)$ is finite; otherwise, $\int_a^{\infty} f(x)\,dg(x)$ is said to diverge.

(ii) Let $f$ be defined on $(a,b]$ and $f \in RS_g[c,b]$ for any $c \in (a,b]$. Then $\int_{a+}^{b} f(x)\,dg(x)$ is said to be convergent and equal to $s \in \mathbb{R}$ provided that $s = \lim_{c\to a+}\int_c^b f(x)\,dg(x)$ is finite; otherwise, $\int_{a+}^{b} f(x)\,dg(x)$ is said to diverge.

The improper Riemann-Stieltjes integrals share almost all the properties of the improper Riemann integrals; indeed, the proofs of most results concerning the former are virtually identical to the proofs of the corresponding results about the latter, but with $dg(x)$ replacing $dx$. Below are some instances.

Theorem 3.5.2. (i) Let $g$ be increasing on $[a,\infty)$ and $f \in RS_g[a,b]$ for any $b \ge a$. If $f \ge 0$ on $[a,\infty)$, then $\int_a^{\infty} f(x)\,dg(x)$ converges if and only if there exists a constant $\kappa > 0$ such that $\int_a^b f(x)\,dg(x) \le \kappa$ for all $b \ge a$.

(ii) Let $g$ be increasing on $(a,b]$ and $f \in RS_g[c,b]$ for any $c \in (a,b]$. If $f \ge 0$ on $(a,b]$, then $\int_{a+}^{b} f(x)\,dg(x)$ converges if and only if there exists a constant $\kappa > 0$ such that $\int_c^b f(x)\,dg(x) \le \kappa$ for all $c \in (a,b]$.

Proof. It suffices to prove (i). Since $f \ge 0$ and $g$ is increasing, $\int_a^b f(x)\,dg(x)$ is increasing in $b$, and so

$$\lim_{b\to\infty}\int_a^b f(x)\,dg(x) = \sup_{b\in[a,\infty)}\int_a^b f(x)\,dg(x).$$

The assertion follows from the above formula and the observation that, whenever the integral converges, $\kappa = 1 + \int_a^{\infty} f(x)\,dg(x)$ is an admissible constant.

Theorem 3.5.3. (i) Let $g$ be increasing on $[a,\infty)$ and $f_1, f_2 \in RS_g[a,b]$ for any $b \ge a$. If $f_2 \ge f_1 \ge 0$ on $[a,\infty)$, then $\int_a^{\infty} f_1(x)\,dg(x)$ converges whenever $\int_a^{\infty} f_2(x)\,dg(x)$ converges, and in that case

$$\int_a^{\infty} f_1(x)\,dg(x) \le \int_a^{\infty} f_2(x)\,dg(x).$$

(ii) Let $g$ be increasing on $(a,b]$ and $f_1, f_2 \in RS_g[c,b]$ for any $c \in (a,b]$. If $f_2 \ge f_1 \ge 0$ on $(a,b]$, then $\int_{a+}^{b} f_1(x)\,dg(x)$ converges whenever $\int_{a+}^{b} f_2(x)\,dg(x)$ converges, and in that case

$$\int_{a+}^{b} f_1(x)\,dg(x) \le \int_{a+}^{b} f_2(x)\,dg(x).$$

Proof. (i) follows from Theorem 3.5.2 (i) and

$$\int_a^b f_1(x)\,dg(x) \le \int_a^b f_2(x)\,dg(x) \le \int_a^{\infty} f_2(x)\,dg(x).$$

Similarly, we can verify (ii).

Theorem 3.5.4. (i) Let $g$ be increasing on $[a,\infty)$, $f_1, f_2 \in RS_g[a,b]$ for any $b \ge a$ and $f_1, f_2 \ge 0$ on $[a,\infty)$. If $\lim_{x\to\infty} f_1(x)/f_2(x) = q \in (0,\infty)$, then $\int_a^{\infty} f_1(x)\,dg(x)$ and $\int_a^{\infty} f_2(x)\,dg(x)$ both converge or both diverge.

(ii) Let $g$ be increasing on $(a,b]$, $f_1, f_2 \in RS_g[c,b]$ for any $c \in (a,b]$ and $f_1, f_2 \ge 0$ on $(a,b]$. If $\lim_{x\to a+} f_1(x)/f_2(x) = q \in (0,\infty)$, then $\int_{a+}^{b} f_1(x)\,dg(x)$ and $\int_{a+}^{b} f_2(x)\,dg(x)$ both converge or both diverge.


Proof. To prove (i), we note that there is an $n_0 \in \mathbb{N}$ such that $2^{-1}q f_2(x) \le f_1(x) \le 2q f_2(x)$ for all $x \ge n_0$. The conclusion follows by applying Theorem 3.5.2 (i). Of course, the argument for (ii) is similar.

Theorem 3.5.5. (i) Let $g$ be increasing on $[a,\infty)$. If $f \in RS_g[a,b]$ for any $b \ge a$ and $\int_a^{\infty} |f(x)|\,dg(x)$ converges, then $\int_a^{\infty} f(x)\,dg(x)$ also converges.

(ii) Let $g$ be increasing on $(a,b]$. If $f \in RS_g[c,b]$ for any $c \in (a,b]$ and $\int_{a+}^{b} |f(x)|\,dg(x)$ converges, then $\int_{a+}^{b} f(x)\,dg(x)$ also converges.

Proof. This follows from the inequality $0 \le |f(x)| - f(x) \le 2|f(x)|$: by Theorem 3.5.3, the improper integral of $|f| - f$ converges, and $f = |f| - (|f| - f)$.

As in the case of infinite series, we distinguish between absolutely and conditionally convergent integrals.

Definition 3.5.6. Let $g$ be increasing on $[a,\infty)$ and $f \in RS_g[a,b]$ for every $b \ge a$. Then $\int_a^{\infty} f(x)\,dg(x)$ is said to be absolutely convergent provided that $\int_a^{\infty} |f(x)|\,dg(x)$ exists. Also, $\int_a^{\infty} f(x)\,dg(x)$ is said to be conditionally convergent provided that $\int_a^{\infty} |f(x)|\,dg(x)$ does not exist, but $\int_a^{\infty} f(x)\,dg(x)$ exists.

For instance, $\int_1^{\infty} x^{-p}\sin x\,dx$ is absolutely convergent for $p \in (1,\infty)$. Of course, there are also examples of conditional convergence. To see this, we need the following test of Johann Peter Gustav Lejeune Dirichlet, which is an analog of the Dirichlet test for infinite series.

Theorem 3.5.7. Let $f$ be a nonnegative decreasing function on $[a,\infty)$ with $\lim_{x\to\infty} f(x) = 0$. If $g$ is bounded on $[a,\infty)$ and $f \in RS_g[a,b]$ for every $b \ge a$, then $\int_a^{\infty} f(x)\,dg(x)$ is convergent.

Proof. Integrating by parts, we have

$$\int_a^b f(x)\,dg(x) = f(b)g(b) - f(a)g(a) + \int_a^b g(x)\,d\big(-f(x)\big).$$

Now since $g$ is bounded, we find $\lim_{b\to\infty} f(b)g(b) = 0$. Therefore, to complete the argument it suffices to prove that $\int_a^{\infty} g(x)\,d\big(-f(x)\big)$ is convergent. To do so, we assume $\sup_{x\in[a,\infty)} |g(x)| = M < \infty$ and get that $\int_a^{\infty} M\,d\big(-f(x)\big)$ is convergent due to

$$\lim_{b\to\infty}\int_a^b M\,d\big(-f(x)\big) = \lim_{b\to\infty} M\big(f(a) - f(b)\big) = Mf(a).$$

Notice that $-f$ is increasing. So Theorem 3.5.3 (i), applied to $0 \le |g| \le M$, yields that $\int_a^{\infty} |g(x)|\,d\big(-f(x)\big)$ is convergent, and hence, by Theorem 3.5.5 (i), so is $\int_a^{\infty} g(x)\,d\big(-f(x)\big)$.

On the other hand, the following result provides a practical way to establish the convergence of improper integrals that need not converge absolutely.

Theorem 3.5.8. Let $f$ be continuous and $\varphi$ decreasing on $[a,\infty)$. If $F(t) = \int_a^t f(x)\,dx$ is bounded on $[a,\infty)$ and $\lim_{x\to\infty}\varphi(x) = 0$, then $\int_a^{\infty} f(x)\varphi(x)\,dx$ converges.

Proof. This result can be verified by Theorem 3.5.7. Nevertheless, we give a different argument as follows. By assumption, $F$ is bounded on $[a,\infty)$, i.e., $M = \sup_{x\in[a,\infty)} |F(x)| < \infty$. Thus

$$\Big|\int_{t_1}^{t_2} f(x)\,dx\Big| = |F(t_2) - F(t_1)| \le 2M \quad\text{for } a \le t_1 < t_2 < \infty.$$

Since $\varphi$ decreases to $0$, it is nonnegative; so by Corollary 3.4.4 (iii), Bonnet's form of the second mean value theorem, we have

$$\int_{t_1}^{t_2} f(x)\varphi(x)\,dx = \varphi(t_1)\int_{t_1}^{\zeta} f(x)\,dx$$

for some $\zeta \in [t_1, t_2]$. Hence

$$\Big|\int_{t_1}^{t_2} f(x)\varphi(x)\,dx\Big| \le 2M\varphi(t_1).$$

By assumption $\lim_{x\to\infty}\varphi(x) = 0$, so for any $\varepsilon > 0$ there is $t_0 \ge a$ such that $t_1 \ge t_0$ implies $\varphi(t_1) < \varepsilon$. It follows that for any $\varepsilon > 0$ there is $t_0 \ge a$ such that $t_2 > t_1 \ge t_0$ implies

$$\Big|\int_{t_1}^{t_2} f(x)\varphi(x)\,dx\Big| \le 2M\varepsilon.$$

From Theorem 2.4.3 (iii), Cauchy's convergence criterion, it follows that $\int_a^{\infty} f(x)\varphi(x)\,dx$ is convergent.
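The tail estimate $\big|\int_{t_1}^{t_2} f(x)\varphi(x)\,dx\big| \le 2M\varphi(t_1)$ can be checked numerically in a simple case. The Python sketch below assumes the hypothetical choices $f(x) = \sin x$, $\varphi(x) = 1/x$ and $a = 1$, so that $F(t) = \cos 1 - \cos t$ and $M = \sup_{t \ge 1}|F(t)| = 1 + \cos 1$. The integrals are approximated by a composite Simpson rule, so the comparison holds only up to small numerical error.

```python
import math


def simpson(func, lo, hi, n=20000):
    """Composite Simpson approximation of the integral of func over [lo, hi] (n even)."""
    step = (hi - lo) / n
    total = func(lo) + func(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(lo + i * step)
    return total * step / 3


a = 1.0
M = 1 + math.cos(a)                  # sup over t >= a of |F(t)| = |cos a - cos t|
integrand = lambda x: math.sin(x) / x

results = []
for t1, t2 in [(10.0, 50.0), (100.0, 400.0), (1000.0, 1500.0)]:
    tail = simpson(integrand, t1, t2)
    bound = 2 * M / t1               # 2 M phi(t1) with phi(x) = 1/x
    results.append((t1, t2, abs(tail), bound))
    print(f"[{t1:g}, {t2:g}]: |integral| = {abs(tail):.6f}, bound = {bound:.6f}")
```

As $t_1$ grows, the bound $2M/t_1$ goes to $0$, which is exactly the Cauchy criterion used at the end of the proof.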

We can obtain many examples of conditional convergence via the following consequence of Theorem 3.5.8.


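Corollary 3.5.9. Let $\varphi$ be a decreasing function on $[a,\infty)$ with $\lim_{x\to\infty}\varphi(x) = 0$. Then $\int_a^{\infty}\varphi(x)\sin x\,dx$ and $\int_a^{\infty}\varphi(x)\cos x\,dx$ converge. Moreover, the convergence is absolute or conditional according as $\int_a^{\infty}\varphi(x)\,dx < \infty$ or $\int_a^{\infty}\varphi(x)\,dx = \infty$.

Proof. It is enough to verify the sine case. Firstly, we have

$$\Big|\int_a^t \sin x\,dx\Big| = |\cos a - \cos t| \le 2.$$

It follows from Theorem 3.5.8 that $\int_a^{\infty}\varphi(x)\sin x\,dx$ is convergent.

Secondly, suppose that $\int_a^{\infty}\varphi(x)\,dx$ is finite. Since $\varphi$ decreases to $0$, it is nonnegative, and so $|\varphi(x)\sin x| \le \varphi(x)$ for $x \in [a,\infty)$. We conclude from the comparison test that $\int_a^{\infty}|\varphi(x)\sin x|\,dx$ is convergent. It remains to prove that

$$\int_a^{\infty}\varphi(x)\,dx = \infty \Longrightarrow \int_a^{\infty}|\varphi(x)\sin x|\,dx = \infty.$$

To check this, we show

$$\lim_{n\to\infty}\int_a^{n\pi}|\varphi(x)\sin x|\,dx = \infty.$$

Let $m$ be a fixed integer $\ge a/\pi$. For $m < n$, we have

$$\int_a^{n\pi}|\varphi(x)\sin x|\,dx \ge \int_{m\pi}^{n\pi}|\varphi(x)\sin x|\,dx = \sum_{k=m+1}^{n}\int_{(k-1)\pi}^{k\pi}|\varphi(x)\sin x|\,dx.$$

Since $\varphi$ is decreasing, if $m+1 \le k \le n$, then

$$\begin{aligned} &\int_{(k-1)\pi}^{k\pi}|\varphi(x)\sin x|\,dx \ge \varphi(k\pi)\int_{(k-1)\pi}^{k\pi}|\sin x|\,dx = 2\varphi(k\pi); \\ &\int_{k\pi}^{(k+1)\pi}\varphi(x)\,dx \le \varphi(k\pi)\int_{k\pi}^{(k+1)\pi}dx = \pi\varphi(k\pi); \\ &\int_{(k-1)\pi}^{k\pi}|\varphi(x)\sin x|\,dx \ge \frac{2}{\pi}\int_{k\pi}^{(k+1)\pi}\varphi(x)\,dx. \end{aligned}$$

Consequently,

$$\int_a^{n\pi}|\varphi(x)\sin x|\,dx \ge \frac{2}{\pi}\sum_{k=m+1}^{n}\int_{k\pi}^{(k+1)\pi}\varphi(x)\,dx = \frac{2}{\pi}\int_{(m+1)\pi}^{(n+1)\pi}\varphi(x)\,dx.$$

Letting $n \to \infty$ in the last estimate, we obtain the desired result:

$$\int_a^{\infty}|\varphi(x)\sin x|\,dx = \infty.$$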

Example 3.5.10. In particular, for $p \in (0,1]$ the integral $\int_1^{\infty} x^{-p}\sin x\,dx$ is conditionally convergent, since $p \in (0,1]$ implies $\int_1^{\infty} x^{-p}\,dx = \infty$.
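Example 3.5.10 can be illustrated numerically for $p = 1$. In the Python sketch below (an illustration only; the name `hump_integrals` is hypothetical), the integrals of $\sin x/x$ and of $|\sin x|/x$ over $[1, n\pi]$ are accumulated one hump at a time with a composite Simpson rule. The signed integrals settle down, while the absolute ones keep growing and stay above the lower bound $\frac{2}{\pi}\int_{2\pi}^{(n+1)\pi}\frac{dx}{x} = \frac{2}{\pi}\ln\frac{n+1}{2}$ from the proof of Corollary 3.5.9 (taking $a = 1$ and $m = 1$).

```python
import math


def simpson(func, lo, hi, n=200):
    """Composite Simpson approximation of the integral of func over [lo, hi] (n even)."""
    step = (hi - lo) / n
    total = func(lo) + func(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(lo + i * step)
    return total * step / 3


def hump_integrals(p, n_max):
    """Return lists whose (n-1)-th entries approximate the integrals over [1, n*pi]
    of x**(-p) * sin(x) and of |x**(-p) * sin(x)|, for n = 1, ..., n_max."""
    integrand = lambda x: math.sin(x) / x ** p
    signed_total = simpson(integrand, 1.0, math.pi)   # sin x >= 0 on [1, pi]
    abs_total = signed_total
    signed, absolute = [signed_total], [abs_total]
    for k in range(2, n_max + 1):
        piece = simpson(integrand, (k - 1) * math.pi, k * math.pi)
        signed_total += piece
        abs_total += abs(piece)    # sin x keeps one sign on [(k-1)pi, k*pi]
        signed.append(signed_total)
        absolute.append(abs_total)
    return signed, absolute


signed, absolute = hump_integrals(1.0, 2000)
for n in (10, 100, 1000, 2000):
    bound = 2 / math.pi * math.log((n + 1) / 2)
    print(n, round(signed[n - 1], 6), round(absolute[n - 1], 4), round(bound, 4))
```

Doubling $n$ from $1000$ to $2000$ barely moves the signed value but adds roughly $\frac{2}{\pi}\ln 2$ to the absolute one, in line with the logarithmic divergence of $\int_1^{\infty} x^{-1}\,dx$.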

Problems

3.1. Find the variations of the following functions on the indicated intervals:
(i) $f(x) = \sin x$ on $[0, 3\pi]$;
(ii) $f(x) = 2x^3 - 3x^2$ on $[-1, 2]$;
(iii) $\{a_k\}_{k=1}^{\infty}$ is an infinite sequence of real numbers and
$$f(x) = \begin{cases} a_k, & x = (k+1)^{-1}, \\ 0, & x \in [0,1] \setminus \{(k+1)^{-1}\}_{k=1}^{\infty}. \end{cases}$$

3.2. Find $v_f(x)$ for $f(x) = \sin x$ on $[0, 2\pi]$.

3.3. Suppose that $\{x = x(t),\ y = y(t)\}_{t\in[a,b]}$ represents a continuous arc $\Gamma$ in the plane. To each partition $P = \{a = t_0, t_1, \dots, t_n = b\}$ there corresponds a sequence of points $\{P_k = (x(t_k), y(t_k))\}_{k=0}^{n}$. Let $L(P)$ be the sum of the lengths of the segments $P_0P_1, P_1P_2, \dots, P_{n-1}P_n$. Prove that $L = \sup_P L(P) < \infty$ (i.e., $\Gamma$ is rectifiable) if and only if $x(t)$ and $y(t)$ have bounded variation.

3.4. Let $D$ be a dense subset of the open interval $(a,b) \subseteq \mathbb{R}$. Prove that if two monotonic functions $f_1, f_2 : (a,b) \to \mathbb{R}$ satisfy $f_1 = f_2$ on $D$, then $f_1$ and $f_2$ have the same points of continuity.

3.5. Prove that $V_a^b(f_1) < \infty$ and $V_a^b(f_2) < \infty$ imply $V_a^b(f_1 f_2) < \infty$.

3.6. Prove that the function
$$f(x) = \begin{cases} x\cos\big(\pi(2x)^{-1}\big), & x \neq 0, \\ 0, & x = 0, \end{cases}$$
does not have bounded variation on $[0,1]$.

3.7. For each $k \in \mathbb{N}$ let $f_k \in BV[a,b]$ and $\lim_{k\to\infty} f_k(x) = f(x)$, where $|f(x)| < \infty$ for each $x \in [a,b]$. Prove that if $\sup_{k\in\mathbb{N}} V_a^b(f_k) < \infty$, then $f \in BV[a,b]$.

3.8. Evaluate the following Riemann-Stieltjes integrals:
(i) $\int_0^4 x^2\,d([x])$, where $[x]$ denotes the greatest integer function;
(ii) $\int_0^1 x^3\,d(x^2)$;
(iii) $\int_a^c f(x)\,dg(x)$, where $f$ is continuous and $g(x) = 0$ for $a \le x \le b$, $g(x) = 1$ for $b$ 0 or $x < 0$; (d) $(\pm\infty)\cdot(\pm\infty) = \infty$, $\infty\cdot(-\infty) = (-\infty)\cdot\infty = -\infty$; (e) $\infty + (-\infty)$, $(-\infty) + \infty$, $0\cdot(\pm\infty)$ and $(\pm\infty)\cdot 0$ are undefined. In addition, by an interval with endpoints $a, b \in \widetilde{\mathbb{R}}$, we mean one of $[a,b]$, $[a,b)$, $(a,b]$ or $(a,b)$. Note that $[a,a) = (a,a] = (a,a) = \emptyset$ and $[a,a] = \{a\}$. When $a, b \in \mathbb{R}$, the interval is said to be bounded or finite.

Definition 4.1.2. Suppose that $g : \mathbb{R} \to \mathbb{R}$ is an increasing function. For any $c \in \mathbb{R}$ let $g(c^+) = \lim g(x)$ and $g(c^-) = \lim g(x)$. 0 $\delta$ for each $n \in \mathbb{N}$. But, since $g$ is increasing on $\mathbb{R}$, for each $n \in \mathbb{N}$ we can find a simple


set $I_n$ and a closed simple set $J_n$ such that $I_n \subseteq J_n \subseteq R_n$ and $m_g(R_n \setminus I_n) < \delta 2^{-n-1}$. Accordingly, it follows from (a)-(b)-(c) in Remark 4.1.3 that

$$m_g\Big(\bigcap_{j=1}^{n} I_j\Big) = m_g\Big(\bigcap_{j=1}^{n}\big(R_j \setminus (R_j \setminus I_j)\big)\Big) \ge m_g(R_n) - \sum_{j=1}^{n} m_g(R_j \setminus I_j) > 2^{-1}\delta,$$

and consequently, $\{\bigcap_{j=1}^{n} J_j\}_{n=1}^{\infty}$ is a decreasing sequence of nonempty, bounded, closed sets. Therefore, we use Theorem 1.2.4 (i) to obtain

$$\emptyset \neq \bigcap_{n=1}^{\infty} J_n \subseteq \bigcap_{n=1}^{\infty} R_n = \bigcap_{n=1}^{\infty}\Big(S \setminus \bigcup_{j=1}^{n} S_j\Big) = S \setminus \bigcup_{j=1}^{\infty} S_j = \emptyset,$$

a contradiction.

Secondly, we modify the Riemann-Stieltjes sums to define summable functions and admissible sequences.

Definition 4.1.5. Let $g : \mathbb{R} \to \mathbb{R}$ be an increasing function.

(i) A function $s : \mathbb{R} \to \mathbb{R}$ is called a step function provided that $\{x \in \mathbb{R} : s(x) \neq 0\}$ is a simple set; that is, there are $n \in \mathbb{N}$, $c_1, \dots, c_n \in \mathbb{R}$ and disjoint finite intervals $\{I_k\}_{k=1}^{n}$ such that $s = \sum_{k=1}^{n} c_k 1_{I_k}$. In this case, we write

$$A_g(s) = \sum_{k=1}^{n} c_k\, m_g(I_k)$$

for the $g$-area of $s$.

(ii) A function sequence $\{s_k\}_{k=1}^{\infty}$ on $\mathbb{R}$ is called admissible for a nonnegative function $f : \mathbb{R} \to \widetilde{\mathbb{R}}$ provided that (a) each $s_k$ is a step function on $\mathbb{R}$; (b) each $s_k$ is nonnegative; (c) $f \le \sum_{k=1}^{\infty} s_k$. In this case, we denote by

$$S_g(f) = \inf\Big\{\sum_{k=1}^{\infty} A_g(s_k)\Big\}$$

the $g$-sum of $f$, where the infimum ranges over all admissible sequences $\{s_k\}_{k=1}^{\infty}$ for $f$.
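To make the $g$-area $A_g(s)$ concrete, here is a small Python sketch. It assumes the usual Lebesgue-Stieltjes convention for the $g$-length of a finite interval, namely $m_g([c,d]) = g(d^+) - g(c^-)$, $m_g((c,d)) = g(d^-) - g(c^+)$, $m_g([c,d)) = g(d^-) - g(c^-)$ and $m_g((c,d]) = g(d^+) - g(c^+)$, which is consistent with Remark 4.1.10 (ii) (b). The one-sided limits are approximated by evaluating $g$ at $c \pm 10^{-9}$, which is adequate only for piecewise-smooth $g$ like the one used here. The names `Interval`, `m_g` and `g_area` are hypothetical.

```python
import math
from dataclasses import dataclass

H = 1e-9  # offset used to approximate the one-sided limits g(c+) and g(c-)


@dataclass(frozen=True)
class Interval:
    left: float
    right: float
    closed_left: bool
    closed_right: bool


def m_g(g, I):
    """g-length of a finite interval under the convention stated above."""
    if I.left > I.right or (I.left == I.right and not (I.closed_left and I.closed_right)):
        return 0.0                                                # empty, e.g. [a, a) or (a, a)
    lower = g(I.left - H) if I.closed_left else g(I.left + H)     # g(c-) or g(c+)
    upper = g(I.right + H) if I.closed_right else g(I.right - H)  # g(d+) or g(d-)
    return upper - lower


def g_area(g, terms):
    """A_g(s) for s = sum_k c_k 1_{I_k}, given terms = [(c_k, I_k), ...] with disjoint I_k."""
    return sum(c * m_g(g, I) for c, I in terms)


g = math.floor   # increasing, with unit jumps at the integers
s = [(1.0, Interval(0, 2, True, True)),      # 1 on [0, 2]
     (2.0, Interval(3, 4, False, True)),     # 2 on (3, 4]
     (-1.0, Interval(5, 5, True, True))]     # -1 on {5} = [5, 5]
s_plus = [t for t in s if t[0] > 0]
s_minus = [(-c, I) for c, I in s if c < 0]
print(g_area(g, s), g_area(g, s_plus) - g_area(g, s_minus))
```

With $g = \lfloor\cdot\rfloor$, the closed interval $[0,2]$ picks up the jumps at $0$, $1$ and $2$, so $m_g([0,2]) = 3$, while the one-point interval $[5,5]$ has $g$-length $g(5^+) - g(5^-) = 1$. Both printed numbers equal $4$, as Theorem 4.1.7 (i) (c) predicts for $A_g(s) = A_g(s^+) - A_g(s^-)$.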

Lebesgue-Radon-Stieltjes Integrals


Remark 4.1.6. $A_g(s)$ is just the (signed) Euclidean area under the graph of $s$ when $g$ is the identity. Moreover, there always exists an admissible sequence for any nonnegative $f : \mathbb{R} \to \widetilde{\mathbb{R}}$. To this end, we take $I_k = (-k,k)$ and $s_k = 1_{I_k}$. Clearly, $\{s_k\}_{k=1}^{\infty}$ is admissible for $f$, because $\sum_{k=1}^{\infty} s_k(x) = \infty$ for every $x \in \mathbb{R}$. This shows that $S_g(f)$ always exists, even though it may be $\infty$. To better understand $A_g$ and $S_g$, we explore their basic properties as follows.

Theorem 4.1.7. Let $g : \mathbb{R} \to \mathbb{R}$ be an increasing function. Then
(i) $A_g(\cdot)$ has the following four properties:
(a) $A_g(0) = 0$, and $A_g(s) \ge 0$ whenever $s$ is a nonnegative step function on $\mathbb{R}$;
(b) $A_g(s_1) \le A_g(s_2)$ whenever $s_1$ and $s_2$ are step functions satisfying $s_1 \le s_2$ on $\mathbb{R}$;
(c) if $s$ is a step function on $\mathbb{R}$, so are $|s|$, $s^+ = \max\{s,0\}$ and $s^- = \max\{-s,0\}$, with $A_g(s) = A_g(s^+) - A_g(s^-)$ and $A_g(|s|) = A_g(s^+) + A_g(s^-)$;
(d) if $\{s_k\}_{k=1}^{n}$ are step functions on $\mathbb{R}$ and $\{c_k\}_{k=1}^{n}$ are finite real numbers, then $s = \sum_{k=1}^{n} c_k s_k$ is a step function on $\mathbb{R}$ and $A_g(s) = \sum_{k=1}^{n} c_k A_g(s_k)$.
(ii) $S_g(\cdot)$ has the following four properties:
(a) for $f : \mathbb{R} \to \widetilde{\mathbb{R}}$, $S_g(f^{\pm}) \le S_g(|f|)$, where $f^{\pm} = \max\{\pm f, 0\}$;
(b) for $f : \mathbb{R} \to \widetilde{\mathbb{R}}$ and $c \in \mathbb{R}$, $S_g(|cf|) = |c|\,S_g(|f|)$;
(c) for $f_1, f_2 : \mathbb{R} \to \widetilde{\mathbb{R}}$ with $0 \le f_1 \le f_2$, $S_g(f_1) \le S_g(f_2)$;
(d) if $\{f_k\}_{k=1}^{\infty}$ is a sequence of functions from $\mathbb{R}$ to $\widetilde{\mathbb{R}}$ and $\sum_{k=1}^{\infty} f_k(x)$ converges for any $x \in \mathbb{R}$, then
$$S_g\Big(\Big|\sum_{k=1}^{\infty} f_k\Big|\Big) \le \sum_{k=1}^{\infty} S_g(|f_k|).$$
(iii) $A_g(s) = S_g(s)$ holds for any nonnegative step function $s$ on $\mathbb{R}$.

Proof. It is straightforward to verify (i). Next we prove (ii). (a) It follows from the definitions of $S_g$, $f^+$ and $f^-$.

(b) If $c = 0$, then the result is trivial. Suppose now $c \neq 0$. Then $\{s_k\}_{k=1}^{\infty}$ is an admissible sequence for $|f|$ if and only if $\{|c|s_k\}_{k=1}^{\infty}$ is an admissible sequence for $|cf|$. Moreover,

$$S_g(|cf|) = \inf\Big\{\sum_{k=1}^{\infty} A_g(|c|s_k)\Big\} = |c|\inf\Big\{\sum_{k=1}^{\infty} A_g(s_k)\Big\} = |c|\,S_g(|f|).$$

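(c) This follows from the fact that if $\{s_k\}_{k=1}^{\infty}$ is admissible for $f_2$, then it is admissible for $f_1$.

(d) Without loss of generality, we may assume $\sum_{k=1}^{\infty} S_g(|f_k|) < \infty$; otherwise, there is nothing to prove. Under this assumption, we naturally have $S_g(|f_k|) < \infty$ for each $k \in \mathbb{N}$. This, along with the definition of the infimum, implies that for any $\varepsilon > 0$ and each $k$ there exists a function sequence $\{s_{k,j}\}_{j=1}^{\infty}$ which is admissible for $|f_k|$ and satisfies

$$\sum_{j=1}^{\infty} A_g(s_{k,j}) \le S_g(|f_k|) + 2^{-k}\varepsilon.$$

Thus

$$\sum_{k,j=1}^{\infty} A_g(s_{k,j}) = \sum_{k=1}^{\infty}\sum_{j=1}^{\infty} A_g(s_{k,j}) \le \sum_{k=1}^{\infty} S_g(|f_k|) + \varepsilon.$$

Note that

$$0 \le \Big|\sum_{k=1}^{\infty} f_k\Big| \le \sum_{k=1}^{\infty}|f_k| \le \sum_{k=1}^{\infty}\sum_{j=1}^{\infty} s_{k,j} = \sum_{k,j=1}^{\infty} s_{k,j}.$$

So $\{s_{k,j}\}_{k,j=1}^{\infty}$ is an admissible sequence for $\big|\sum_{k=1}^{\infty} f_k\big|$. Further, we get

$$S_g\Big(\Big|\sum_{k=1}^{\infty} f_k\Big|\Big) \le \sum_{k,j=1}^{\infty} A_g(s_{k,j}) \le \sum_{k=1}^{\infty} S_g(|f_k|) + \varepsilon,$$

and consequently, since $\varepsilon > 0$ is arbitrary,

$$S_g\Big(\Big|\sum_{k=1}^{\infty} f_k\Big|\Big) \le \sum_{k=1}^{\infty} S_g(|f_k|),$$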

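as desired.

Last of all, we verify (iii). If $s : \mathbb{R} \to \mathbb{R}$ is a nonnegative step function, then $S_g(s) \le A_g(s)$, because $s, 0, 0, \dots$ is an admissible sequence for $s$. Assume now that $\{s_k\}_{k=1}^{\infty}$ is any admissible sequence for $s$ and let $t_n = s - \sum_{j=1}^{n} s_j$. Then $t_n : \mathbb{R} \to \mathbb{R}$ is a step function, and $\{t_n\}$ decreases to $\lim_{n\to\infty} t_n \le 0$. Consequently, $t_n^+ = \max\{t_n, 0\}$ decreases and converges to $0$. Since $K_{1,+} = \{x \in \mathbb{R} : t_1^+(x) > 0\}$ and $K_{n,\varepsilon} = \{x \in \mathbb{R} : t_n^+(x) \ge \varepsilon\}$ are simple sets for any given $\varepsilon > 0$, and $t_n^+$ vanishes outside $K_{1,+}$, we conclude

$$A_g(t_n^+) \le \Big(\sup_{x\in K_{n,\varepsilon}} t_n^+(x)\Big)m_g(K_{n,\varepsilon}) + \Big(\sup_{x\in\mathbb{R}\setminus K_{n,\varepsilon}} t_n^+(x)\Big)m_g(K_{1,+}\setminus K_{n,\varepsilon}) \le \Big(\sup_{x\in\mathbb{R}} t_1^+(x)\Big)m_g(K_{n,\varepsilon}) + \varepsilon\, m_g(K_{1,+}).$$

Since $K_{n+1,\varepsilon} \subseteq K_{n,\varepsilon}$ and $\bigcap_{n=1}^{\infty} K_{n,\varepsilon} = \emptyset$, we employ Theorem 4.1.4 to achieve

$$m_g(K_{1,\varepsilon}) = \lim_{n\to\infty}\sum_{j=1}^{n}\big(m_g(K_{j,\varepsilon}) - m_g(K_{j+1,\varepsilon})\big) = m_g(K_{1,\varepsilon}) - \lim_{n\to\infty} m_g(K_{n,\varepsilon}),$$

whence $\lim_{n\to\infty} m_g(K_{n,\varepsilon}) = 0$. The previous discussion immediately implies

$$\limsup_{n\to\infty} A_g(t_n^+) \le \varepsilon\, m_g(K_{1,+}),$$

which yields $\lim_{n\to\infty} A_g(t_n^+) = 0$, since $\varepsilon > 0$ is arbitrary. But

$$s = t_n + \sum_{j=1}^{n} s_j \le t_n^+ + \sum_{j=1}^{n} s_j \Longrightarrow A_g(s) \le A_g(t_n^+) + \sum_{j=1}^{n} A_g(s_j).$$

Letting $n \to \infty$, we obtain $A_g(s) \le \sum_{j=1}^{\infty} A_g(s_j)$, which, together with the definition of $S_g(s)$, yields $A_g(s) \le S_g(s)$. Accordingly, $A_g(s) = S_g(s)$.

Thirdly, the Lebesgue-Radon-Stieltjes integrals are founded on the following facts.

Theorem 4.1.8. Let $g : \mathbb{R} \to \mathbb{R}$ be an increasing function. For $k \in \mathbb{N}$ let $f_k : \mathbb{R} \to \mathbb{R}$ be a function with $S_g(|f_k|) < \infty$. If $f : \mathbb{R} \to \mathbb{R}$ satisfies $\lim_{k\to\infty} S_g(|f - f_k|) = 0$, then:
(i) $S_g(|f|) < \infty$ and $S_g(f^{\pm}) < \infty$;
(ii) $\lim_{k\to\infty} S_g\big(\big||f| - |f_k|\big|\big) = 0$ and $\lim_{k\to\infty} S_g(|f^{\pm} - f_k^{\pm}|) = 0$;
(iii) $\lim_{k\to\infty} S_g(|f_k|) = S_g(|f|)$ and $\lim_{k\to\infty} S_g(f_k^{\pm}) = S_g(f^{\pm})$.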

Proof. (i) By hypothesis, there is an $n_0 \in \mathbb{N}$ such that $S_g(|f - f_{n_0}|) < \infty$; moreover, $S_g(|f_{n_0}|) < \infty$. So, from Theorem 4.1.7 (ii) (a) and (d) it follows that

$$S_g(f^{\pm}) \le S_g(|f|) \le S_g(|f - f_{n_0}|) + S_g(|f_{n_0}|) < \infty.$$

(ii) Note that

$$\max\big\{\big||f| - |f_k|\big|,\ |f^+ - f_k^+|,\ |f^- - f_k^-|\big\} \le |f - f_k| \quad\text{for all } k \in \mathbb{N}.$$

So this inequality, along with Theorem 4.1.7 (ii) (c) and $\lim_{k\to\infty} S_g(|f - f_k|) = 0$, implies the desired limits.

(iii) It follows from Theorem 4.1.7 (ii) (d) that for each $k \in \mathbb{N}$,

$$S_g(|f_k|) \le S_g(|f|) + S_g(|f_k - f|) \quad\text{and}\quad S_g(|f|) \le S_g(|f_k|) + S_g(|f - f_k|),$$

hence $\big|S_g(|f_k|) - S_g(|f|)\big| \le S_g(|f - f_k|)$. This inequality plus the hypothesis yields $\lim_{k\to\infty}\big|S_g(|f_k|) - S_g(|f|)\big| = 0$. Since $f_k^{\pm} = f^{\pm} + (f_k^{\pm} - f^{\pm})$, we can validate $\lim_{k\to\infty} S_g(f_k^{\pm}) = S_g(f^{\pm})$ in a similar way, using (ii).

Now we are in a position to introduce the definition of a Lebesgue-Radon-Stieltjes integral.

Definition 4.1.9. Let $g : \mathbb{R} \to \mathbb{R}$ be an increasing function.
(i) For a step function $f : \mathbb{R} \to \mathbb{R}$, we define the Lebesgue-Radon-Stieltjes integral of $f$ on $\mathbb{R}$ with respect to $m_g$ by

$$\int_{\mathbb{R}} f\,dm_g = A_g(f^+) - A_g(f^-).$$

(ii) For an arbitrary function $f : \mathbb{R} \to \widetilde{\mathbb{R}}$, we say that $f$ is Lebesgue-Radon-Stieltjes integrable on $\mathbb{R}$ with respect to $m_g$, denoted $f \in LRS_g(\mathbb{R})$, provided that there exists a sequence $\{s_k\}_{k=1}^{\infty}$ of step functions on $\mathbb{R}$ such that $\lim_{k\to\infty} S_g(|f - s_k|) = 0$. In this case, we write

$$\int_{\mathbb{R}} f\,dm_g = S_g(f^+) - S_g(f^-)$$

for the Lebesgue-Radon-Stieltjes integral of $f$ on $\mathbb{R}$.

We close this section by making two important comments on the above notion.

Remark 4.1.10. (i) If $g$ in Definition 4.1.9 is the identity on $\mathbb{R}$, then $\int_{\mathbb{R}} f\,dm_g$ is called the Lebesgue integral of $f : \mathbb{R} \to \widetilde{\mathbb{R}}$ on $\mathbb{R}$, denoted $\int_{\mathbb{R}} f\,dm_{id}$. It is clear that if $f$ coincides with the function discussed in Example 4.1.1 on $[0,1]$ and equals $0$ elsewhere, then $f$ is Lebesgue integrable on $\mathbb{R}$.

(ii) Definition 4.1.9 produces the following three facts:
(a) $\int_{\mathbb{R}} 1_I f\,dm_g = 0$ for any interval $I \subseteq \mathbb{R}$ on which $g$ is constant;
(b) $\int_{\mathbb{R}} 1_{[c,c]} f\,dm_g = f(c)\big(g(c^+) - g(c^-)\big)$ for any $c \in \mathbb{R}$;
(c) $\int_{\mathbb{R}} 1_I\,dm_g = m_g(I)$ for any finite interval $I \subset \mathbb{R}$.

4.2 Essential Properties

After introducing the concept of a Lebesgue-Radon-Stieltjes integral, we seek to establish its basic properties.

Theorem 4.2.1. Let $g : \mathbb{R} \to \mathbb{R}$ be increasing. If $f \in LRS_g(\mathbb{R})$, then $|f|, f^+, f^- \in LRS_g(\mathbb{R})$ with

$$\begin{aligned} &\int_{\mathbb{R}} f\,dm_g = \int_{\mathbb{R}} f^+\,dm_g - \int_{\mathbb{R}} f^-\,dm_g; \\ &\int_{\mathbb{R}} |f|\,dm_g = \int_{\mathbb{R}} f^+\,dm_g + \int_{\mathbb{R}} f^-\,dm_g; \\ &\Big|\int_{\mathbb{R}} f\,dm_g\Big| \le \int_{\mathbb{R}} |f|\,dm_g. \end{aligned}$$

Proof. Because $f \in LRS_g(\mathbb{R})$, we can take a sequence $\{s_k\}_{k=1}^{\infty}$ of step functions on $\mathbb{R}$ such that $\lim_{k\to\infty} S_g(|f - s_k|) = 0$. Since each $s_k$ is a step function on $\mathbb{R}$, so is $|s_k|$, and it follows that $S_g(|s_k|) < \infty$. From Theorem 4.1.8 (ii) it turns out that

$$\lim_{k\to\infty} S_g\big(\big||f| - |s_k|\big|\big) = 0 \quad\text{and}\quad \lim_{k\to\infty} S_g(|f^{\pm} - s_k^{\pm}|) = 0.$$

By definition, we have $|f|, f^+, f^- \in LRS_g(\mathbb{R})$ with

$$\int_{\mathbb{R}} f\,dm_g = S_g(f^+) - S_g(f^-) = \int_{\mathbb{R}} f^+\,dm_g - \int_{\mathbb{R}} f^-\,dm_g.$$

Meanwhile, note by Theorem 4.1.7 (i) (c) that $A_g(|s_k|) = A_g(s_k^+) + A_g(s_k^-)$. So, by Theorem 4.1.7 (iii), $S_g(|s_k|) = S_g(s_k^+) + S_g(s_k^-)$. This, along with Theorem 4.1.8 (iii), implies

$$\int_{\mathbb{R}} |f|\,dm_g = S_g(|f|) = S_g(f^+) + S_g(f^-) = \int_{\mathbb{R}} f^+\,dm_g + \int_{\mathbb{R}} f^-\,dm_g,$$

as required. Finally, the inequality in Theorem 4.2.1 follows right away from the two identities just proved.

Theorem 4.2.2. Let $g : \mathbb{R} \to \mathbb{R}$ be increasing. If $f_k \in LRS_g(\mathbb{R})$ and $\lim_{k\to\infty} S_g(|f - f_k|) = 0$, then $f \in LRS_g(\mathbb{R})$ with

$$\lim_{k\to\infty}\int_{\mathbb{R}} f_k\,dm_g = \int_{\mathbb{R}} f\,dm_g; \quad \lim_{k\to\infty}\int_{\mathbb{R}} f_k^{\pm}\,dm_g = \int_{\mathbb{R}} f^{\pm}\,dm_g; \quad \lim_{k\to\infty}\int_{\mathbb{R}} |f_k|\,dm_g = \int_{\mathbb{R}} |f|\,dm_g.$$

Proof. By Definition 4.1.9 (ii), each $f_k \in LRS_g(\mathbb{R})$ induces a sequence $\{s_{k,j}\}_{j=1}^{\infty}$ of step functions on $\mathbb{R}$ such that $\lim_{j\to\infty} S_g(|f_k - s_{k,j}|) = 0$. Thus, for each $k$ we can pick $j = j_k \in \mathbb{N}$ with $S_g(|f_k - s_{k,j_k}|) < k^{-1}$. For any $\varepsilon > 0$, there is an $n_0 \in \mathbb{N}$ such that $S_g(|f - f_k|) < 2^{-1}\varepsilon$ for $k \ge n_0$. Accordingly, if $k \ge 2\varepsilon^{-1}$, then $S_g(|f_k - s_{k,j_k}|) < 2^{-1}\varepsilon$. Using Theorem 4.1.7 (ii) (d), we obtain

$$k \ge \max\{n_0, 2\varepsilon^{-1}\} \Longrightarrow S_g(|f - s_{k,j_k}|) \le S_g(|f - f_k|) + S_g(|f_k - s_{k,j_k}|) < \varepsilon,$$

so $f \in LRS_g(\mathbb{R})$. The limit formulas may be verified by Theorems 4.2.1 and 4.1.8 (iii), plus the definition of the Lebesgue-Radon-Stieltjes integral.

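Theorem 4.2.3. Let $g : \mathbb{R} \to \mathbb{R}$ be increasing.
(i) If $0 \le f \in LRS_g(\mathbb{R})$, then $\int_{\mathbb{R}} f\,dm_g \ge 0$.
(ii) If $f_1, f_2 \in LRS_g(\mathbb{R})$ satisfy $f_1 \le f_2$ on $\mathbb{R}$, then $\int_{\mathbb{R}} f_1\,dm_g \le \int_{\mathbb{R}} f_2\,dm_g$.
(iii) If $f_1, \dots, f_n \in LRS_g(\mathbb{R})$ and $c_1, \dots, c_n \in \mathbb{R}$, then

$$\sum_{k=1}^{n} c_k f_k \in LRS_g(\mathbb{R}) \quad\text{with}\quad \int_{\mathbb{R}}\Big(\sum_{k=1}^{n} c_k f_k\Big)(x)\,dm_g(x) = \sum_{k=1}^{n} c_k\int_{\mathbb{R}} f_k\,dm_g.$$

Proof. (ii) follows from (i) and (iii), while (i) is derived easily from the fact that $\int_{\mathbb{R}} f\,dm_g = S_g(f) \ge 0$ for $f \ge 0$. To prove (iii), we may assume $f_k \ge 0$. The condition $f_k \in LRS_g(\mathbb{R})$ produces a sequence $\{s_{k,j}\}_{j=1}^{\infty}$ of step functions on $\mathbb{R}$ such that $\lim_{j\to\infty} S_g(|f_k - s_{k,j}|) = 0$. Now for each $j \in \mathbb{N}$ we use Theorem 4.1.7 (ii) (b) and (d) to get

$$0 \le S_g\Big(\Big|\sum_{k=1}^{n} c_k f_k - \sum_{k=1}^{n} c_k s_{k,j}\Big|\Big) \le \sum_{k=1}^{n}|c_k|\,S_g(|f_k - s_{k,j}|).$$

Since $n$ is finite, we conclude that

$$\lim_{j\to\infty} S_g(|f_k - s_{k,j}|) = 0 \Longrightarrow \lim_{j\to\infty} S_g\Big(\Big|\sum_{k=1}^{n} c_k f_k - \sum_{k=1}^{n} c_k s_{k,j}\Big|\Big) = 0 \Longrightarrow \sum_{k=1}^{n} c_k f_k \in LRS_g(\mathbb{R}).$$

An application of Theorem 4.2.2 implies

$$\lim_{j\to\infty}\int_{\mathbb{R}} s_{k,j}\,dm_g = \int_{\mathbb{R}} f_k\,dm_g \quad\text{and}\quad \lim_{j\to\infty}\int_{\mathbb{R}}\Big(\sum_{k=1}^{n} c_k s_{k,j}\Big)dm_g = \int_{\mathbb{R}}\sum_{k=1}^{n} c_k f_k\,dm_g.$$

By virtue of the foregoing limits and the equalities

$$\int_{\mathbb{R}}\Big(\sum_{k=1}^{n} c_k s_{k,j}\Big)dm_g = A_g\Big(\sum_{k=1}^{n} c_k s_{k,j}\Big) = \sum_{k=1}^{n} c_k\int_{\mathbb{R}} s_{k,j}\,dm_g,$$

we obtain the desired result.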

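The following result, a consequence of Theorem 4.2.3, will be very useful in evaluating some Lebesgue-Radon-Stieltjes integrals.

Corollary 4.2.4. Let $g : \mathbb{R} \to \mathbb{R}$ be increasing and $I \subseteq \mathbb{R}$ a union of finitely many disjoint finite intervals $\{I_k\}_{k=1}^{n}$. Then $1_I f \in LRS_g(\mathbb{R})$ if and only if $1_{I_k} f \in LRS_g(\mathbb{R})$ for every $k = 1, \dots, n$. In this case,

$$\int_{\mathbb{R}} 1_I f\,dm_g = \sum_{k=1}^{n}\int_{\mathbb{R}} 1_{I_k} f\,dm_g.$$

Proof. If $1_I f \in LRS_g(\mathbb{R})$, then there is a sequence of step functions $\{s_k\}_{k=1}^{\infty}$ on $\mathbb{R}$ such that $\lim_{k\to\infty} S_g(|1_I f - s_k|) = 0$. Since $|1_{I_j} f - 1_{I_j} s_k| \le |1_I f - s_k|$ and each $1_{I_j} s_k$ is a step function, we get $\lim_{k\to\infty} S_g(|1_{I_j} f - 1_{I_j} s_k|) = 0$, and hence $1_{I_j} f \in LRS_g(\mathbb{R})$ for each $j = 1, 2, \dots, n$. Note that $1_I f = \sum_{k=1}^{n} 1_{I_k} f$ on $\mathbb{R}$. Then by Theorem 4.2.3 (iii) we get

$$\int_{\mathbb{R}} 1_I f\,dm_g = \int_{\mathbb{R}}\sum_{k=1}^{n} 1_{I_k} f\,dm_g = \sum_{k=1}^{n}\int_{\mathbb{R}} 1_{I_k} f\,dm_g.$$

Conversely, if $1_{I_k} f \in LRS_g(\mathbb{R})$ for each $k = 1, 2, \dots, n$, then there is a sequence of step functions $\{s_{k,j}\}_{j=1}^{\infty}$ on $\mathbb{R}$ such that $\lim_{j\to\infty} S_g(|1_{I_k} f - s_{k,j}|) = 0$. This yields $\lim_{j\to\infty} S_g\big(\big|1_I f - \sum_{k=1}^{n} s_{k,j}\big|\big) = 0$. Since $\{\sum_{k=1}^{n} s_{k,j}\}_{j=1}^{\infty}$ is also a sequence of step functions on $\mathbb{R}$, $1_I f \in LRS_g(\mathbb{R})$ follows.

We give an example to demonstrate how Corollary 4.2.4 is applied.

Example 4.2.5. Let g(x) = 0, x 0 and $\varepsilon_m = \varepsilon 2^{-m}$ for $m \in \mathbb{N}$. Using the assumptions on $\{t_n\}_{n=1}^{\infty}$ and Fact 1, we can get step functions $u_m$ on $\mathbb{R}$ such that $S_g(t_m - u_m) < \varepsilon_m$ and $0 \le u_m \le t_m$. Then $v_n = \inf_{1\le m\le n} u_m$ is a step function on $\mathbb{R}$, $\{v_n\}_{n=1}^{\infty}$ is decreasing, and since $0 \le v_n \le u_n \le t_n$, it follows that $\lim_{n\to\infty} v_n = 0$ $m_g$-a.e. on $\mathbb{R}$; thus, by Fact 1, we get

$$\lim_{n\to\infty} S_g(v_n) = \lim_{n\to\infty}\int_{\mathbb{R}} v_n\,dm_g = 0.$$

However, $\{t_n\}_{n=1}^{\infty}$ is a decreasing sequence and satisfies

$$t_n = \inf_{1\le m\le n} t_m = \inf_{1\le m\le n}\big(u_m + (t_m - u_m)\big) \le v_n + \sum_{m=1}^{n}(t_m - u_m),$$

which implies that there is an $n_0 \in \mathbb{N}$ such that

$$S_g(t_n) \le S_g(v_n) + \sum_{m=1}^{n} S_g(t_m - u_m) \le 2\varepsilon \quad\text{as } n > n_0.$$

Thus, the result follows.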

Now let us prove the desired convergence result. Let $\varepsilon > 0$ and $\varepsilon_k = 2^{-k}\varepsilon$ for $k \in \mathbb{N}$. To each $k$ there corresponds a step function $s_k$ with $S_g(|f_k - s_k|) \le \varepsilon_k$. Then

$$\sup_{k\ge n}|f_k - s_k| \le \sum_{k=n}^{\infty}|f_k - s_k|,$$

and hence

$$S_g\Big(\limsup_{k\to\infty}|f_k - s_k|\Big) \le S_g\Big(\sup_{k\ge n}|f_k - s_k|\Big) \le \sum_{k=n}^{\infty} S_g(|f_k - s_k|) \le 2^{1-n}\varepsilon \to 0 \quad\text{as } n\to\infty.$$

This last estimate tells us that $\limsup_{k\to\infty}|f_k - s_k|$ is an $m_g$-null function, and so it equals $0$ $m_g$-a.e. on $\mathbb{R}$. Note that $\lim_{k\to\infty} f_k = f$ $m_g$-a.e. on $\mathbb{R}$. So it follows that $\lim_{k\to\infty} s_k = f$ $m_g$-a.e. on $\mathbb{R}$. Of course, this gives

$$t_n = \sup_{k,m\ge n}|s_k - s_m| \Longrightarrow \lim_{n\to\infty} t_n = \lim_{n\to\infty}\sup_{k,m\ge n}|s_k - s_m| = 0 \quad m_g\text{-a.e. on } \mathbb{R}.$$

In addition, $\{t_n\}_{n=1}^{\infty}$ is a decreasing sequence with the property

$$t_1 \le 2\sup_{k\in\mathbb{N}}|s_k| \le 2\sup_{k\in\mathbb{N}}|f_k| + 2\sup_{k\in\mathbb{N}}|f_k - s_k| \le 2h + 2\sum_{k=1}^{\infty}|f_k - s_k|,$$

which gives

$$S_g(t_1) \le 2S_g(h) + 2\sum_{k=1}^{\infty} S_g(|f_k - s_k|) \le 2S_g(h) + 2\varepsilon.$$

In other words, $\{t_n\}$ satisfies the conditions of Fact 2. So $\lim_{n\to\infty} S_g(t_n) = 0$. Moreover, since

$$|f - s_n| = \lim_{k\to\infty}|s_k - s_n| \le t_n \quad m_g\text{-a.e. on } \mathbb{R},$$

it follows that $\lim_{n\to\infty} S_g(|f - s_n|) = 0$, and so $f \in LRS_g(\mathbb{R})$.

Finally, $S_g(|f_k - s_k|) \le \varepsilon_k$, together with Theorems 4.1.7 (ii) (d) and 4.1.8, implies

$$\lim_{k\to\infty}\int_{\mathbb{R}} f_k\,dm_g = \int_{\mathbb{R}} f\,dm_g.$$

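As a direct consequence of the dominated convergence theorem, we get the following monotone convergence theorem.

Theorem 4.3.2. Let $g : \mathbb{R} \to \mathbb{R}$ be increasing. Suppose that $\{f_k\}_{k=1}^{\infty}$ is an increasing sequence in $LRS_g(\mathbb{R})$ with $\lim_{k\to\infty} f_k = f$ $m_g$-a.e. on $\mathbb{R}$. Then $f \in LRS_g(\mathbb{R})$ if and only if $\lim_{k\to\infty}\int_{\mathbb{R}} f_k\,dm_g$ is finite. In this case,

$$\lim_{k\to\infty}\int_{\mathbb{R}} f_k\,dm_g = \int_{\mathbb{R}} f\,dm_g.$$

Proof. If $f \in LRS_g(\mathbb{R})$, then $\int_{\mathbb{R}} f\,dm_g$ is finite, and hence the estimate $|f_k - f_1| \le |f - f_1|$ and Theorem 4.3.1 give

$$\lim_{k\to\infty}\int_{\mathbb{R}} f_k\,dm_g = \int_{\mathbb{R}} f\,dm_g.$$

On the other hand, if $\lim_{k\to\infty}\int_{\mathbb{R}} f_k\,dm_g$ is finite, then by Theorems 4.1.7 (ii) (d) and 4.2.3 (iii) we have

$$\begin{aligned} S_g(f - f_1) &= S_g\Big(\sum_{k=2}^{\infty}(f_k - f_{k-1})\Big) \le \sum_{k=2}^{\infty} S_g(f_k - f_{k-1}) = \lim_{n\to\infty}\sum_{k=2}^{n}\int_{\mathbb{R}}(f_k - f_{k-1})\,dm_g \\ &= \lim_{n\to\infty}\sum_{k=2}^{n}\Big(\int_{\mathbb{R}} f_k\,dm_g - \int_{\mathbb{R}} f_{k-1}\,dm_g\Big) = -\int_{\mathbb{R}} f_1\,dm_g + \lim_{n\to\infty}\int_{\mathbb{R}} f_n\,dm_g < \infty. \end{aligned}$$

Consequently, there exists a sequence of nonnegative step functions $\{s_j\}_{j=1}^{\infty}$ on $\mathbb{R}$ such that

$$f - f_1 \le h = \sum_{j=1}^{\infty} s_j = \lim_{n\to\infty}\sum_{j=1}^{n} s_j \quad\text{and}\quad S_g(f - f_1) \le \sum_{j=1}^{\infty} A_g(s_j) < \infty.$$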

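With the last result, we have

$$S_g\Big(h - \sum_{j=1}^{n} s_j\Big) \le \sum_{j=n+1}^{\infty} S_g(s_j) \le \sum_{j=n+1}^{\infty} A_g(s_j) \to 0 \quad\text{as } n\to\infty.$$

This, plus Theorem 4.2.2, implies $h \in LRS_g(\mathbb{R})$. Since $f_1 \in LRS_g(\mathbb{R})$, $0 \le f_k - f_1 \le h$ and

$$\lim_{k\to\infty}(f_k - f_1) = f - f_1 \quad m_g\text{-a.e. on } \mathbb{R},$$

we conclude from Theorem 4.3.1 that $f \in LRS_g(\mathbb{R})$ and

$$\int_{\mathbb{R}}(f - f_1)\,dm_g = \lim_{k\to\infty}\int_{\mathbb{R}}(f_k - f_1)\,dm_g,$$

as desired.

Let $\{a_n\}_{n=1}^{\infty}$ be a sequence in $\mathbb{R}$ bounded from below. If $b_n = \inf\{a_m : m \ge n\}$, then $\{b_n\}_{n=1}^{\infty}$ is an increasing sequence of real numbers, and hence $\liminf_{n\to\infty} a_n = \lim_{n\to\infty} b_n \in \widetilde{\mathbb{R}}$. Using this notion, we have the following Fatou's lemma.

Lemma 4.3.3. Let $g : \mathbb{R} \to \mathbb{R}$ be increasing. If $\{f_k\}_{k=1}^{\infty}$ is a sequence of nonnegative functions in $LRS_g(\mathbb{R})$ such that $\lim_{k\to\infty} f_k = f$ $m_g$-a.e. on $\mathbb{R}$ and $\liminf_{n\to\infty}\int_{\mathbb{R}} f_n\,dm_g$ is finite, then $f \in LRS_g(\mathbb{R})$ and

$$\int_{\mathbb{R}} f\,dm_g \le \liminf_{n\to\infty}\int_{\mathbb{R}} f_n\,dm_g.$$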

Proof. For each $k\in\mathbb{N}$ let $h_k=\inf_{j\ge k}f_j$. Then $\{h_k\}_{k=1}^\infty$ is increasing and satisfies $f=\lim_{k\to\infty}h_k=\sup_{k\in\mathbb{N}}h_k$ $m_g$-a.e. on $\mathbb{R}$. By Theorem 4.3.2 and the fact that $0\le h_k\le f_j$ for each $j\ge k$, we derive
$$\int_{\mathbb{R}}f\,dm_g=\lim_{k\to\infty}\int_{\mathbb{R}}h_k\,dm_g\le\lim_{k\to\infty}\inf_{j\ge k}\int_{\mathbb{R}}f_j\,dm_g,$$
as desired.

In the rest of this section, we use the above convergence results to show that the Lebesgue-Radon-Stieltjes integral generalizes the Riemann-Stieltjes integral.
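The two convergence results can be watched in action on a purely atomic $m_g$. This is the situation when $g$ is an increasing step function: assuming, as in the usual Lebesgue-Stieltjes construction, that an atom at $x$ carries the jump $g(x+)-g(x-)$, integrals reduce to weighted sums. The sketch below takes hypothetical jumps of size $2^{-x}$ at $x=1,\dots,40$. It checks that the integrals of $\min\{f,k\}$ increase to $\int_{\mathbb{R}}f\,dm_g$ for $f(x)=x$. It also shows that alternating indicators of two disjoint sets make Fatou's inequality strict.

```python
from fractions import Fraction

# Hypothetical atomic m_g: g jumps by 2**-x at x = 1, ..., 40, so m_g({x}) = 2**-x.
ATOMS = {x: Fraction(1, 2 ** x) for x in range(1, 41)}


def integrate(func):
    """Integral of func with respect to the atomic measure ATOMS (a weighted sum)."""
    return sum(w * func(x) for x, w in ATOMS.items())


# Monotone convergence: f_k = min(f, k) increases to f(x) = x.
f = lambda x: x
mct = [integrate(lambda x, k=k: min(f(x), k)) for k in range(1, 45)]
print(float(mct[0]), float(mct[-1]), float(integrate(f)))

# Fatou: alternate between the indicators of the disjoint sets A and B.
A, B = {1}, {2, 3}


def f_k(k):
    """k-th term: the indicator of A for even k and of B for odd k."""
    S = A if k % 2 == 0 else B
    return lambda x: 1 if x in S else 0


liminf_of_integrals = min(integrate(f_k(0)), integrate(f_k(1)))  # the integrals alternate
integral_of_liminf = integrate(lambda x: min(f_k(0)(x), f_k(1)(x)))
print(integral_of_liminf, liminf_of_integrals)  # 0 3/8: the inequality is strict
```

Exact rational arithmetic (`Fraction`) is used so that the equalities in the monotone case hold exactly rather than up to rounding.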


Theorem 4.3.4. Let $g:\mathbb{R}\to\mathbb{R}$ be increasing and continuous, and suppose that $f:\mathbb{R}\to\mathbb{R}$ equals $0$ outside the finite interval $[a,b]$ in $\mathbb{R}$. If $f\in\mathrm{RS}_g[a,b]$, then $f\in\mathrm{LRS}_g(\mathbb{R})$ and
$$\int_{\mathbb{R}}f\,dm_g=\int_a^bf(x)\,dg(x).$$

Proof. Assume $f\in\mathrm{RS}_g[a,b]$. For each $k\in\mathbb{N}$, partition $[a,b]$ into $2^k$ subintervals of equal length $2^{-k}(b-a)$, so that each partition refines the preceding one. Denote this partition by $P_k=\{x_j\}_{j=0}^{2^k}$ and define
$$G_k=\sum_{j=1}^{2^k}m_j1_{[x_{j-1},x_j)}\quad\text{and}\quad H_k=\sum_{j=1}^{2^k}M_j1_{[x_{j-1},x_j)},$$
where $m_j=\inf\{f(x):x\in[x_{j-1},x_j]\}$ and $M_j=\sup\{f(x):x\in[x_{j-1},x_j]\}$. Clearly, $\{G_k\}_{k=1}^\infty$ and $\{H_k\}_{k=1}^\infty$ are increasing and decreasing sequences, respectively. Moreover, $G_k$ and $H_k$ belong to $\mathrm{LRS}_g(\mathbb{R})$, and since $g$ is continuous on $\mathbb{R}$ and $f\in\mathrm{RS}_g[a,b]$,
$$\lim_{k\to\infty}\int_{\mathbb{R}}G_k\,dm_g=\lim_{k\to\infty}A_g(G_k)=\lim_{k\to\infty}\sum_{j=1}^{2^k}m_j\big(g(x_j)-g(x_{j-1})\big)=\int_a^bf(x)\,dg(x)$$
and
$$\lim_{k\to\infty}\int_{\mathbb{R}}H_k\,dm_g=\lim_{k\to\infty}A_g(H_k)=\lim_{k\to\infty}\sum_{j=1}^{2^k}M_j\big(g(x_j)-g(x_{j-1})\big)=\int_a^bf(x)\,dg(x).$$
Putting $G=\lim_{k\to\infty}G_k$ and $H=\lim_{k\to\infty}H_k$, we employ Theorem 4.3.2 to deduce $G,H\in\mathrm{LRS}_g(\mathbb{R})$ with $G\le f\le H$ $m_g$-a.e. on $\mathbb{R}$. Notice also that $H_k-G_k\ge0$ and $\lim_{k\to\infty}(H_k-G_k)=H-G$ $m_g$-a.e. on $\mathbb{R}$.


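So, from Theorems 4.2.7 and 4.3.2 it follows that
$$0\le\int_{\mathbb{R}}(H-G)\,dm_g=\lim_{k\to\infty}\int_{\mathbb{R}}(H_k-G_k)\,dm_g=\lim_{k\to\infty}\int_{\mathbb{R}}H_k\,dm_g-\lim_{k\to\infty}\int_{\mathbb{R}}G_k\,dm_g=0.$$
The above estimates indicate that $H-G=0$ $m_g$-a.e. on $\mathbb{R}$, and so $G=f=H$ $m_g$-a.e. on $\mathbb{R}$. Accordingly, we obtain $f\in\mathrm{LRS}_g(\mathbb{R})$ and
$$\int_{\mathbb{R}}f\,dm_g=\lim_{k\to\infty}\int_{\mathbb{R}}G_k\,dm_g=\int_a^bf(x)\,dg(x).$$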
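The mechanism of this proof can be followed numerically by computing $A_g(G_k)$ and $A_g(H_k)$ directly. The sketch uses $2^k$ equal pieces so that consecutive partitions are nested, which makes the lower sums increase. It also assumes, for simplicity, that $f$ is increasing, so that $m_j=f(x_{j-1})$ and $M_j=f(x_j)$. The choices $f(x)=x^2$ and $g(x)=x^3$ on $[0,1]$ are illustrative; for them $\int_0^1x^2\,d(x^3)=3\int_0^1x^4\,dx=3/5$.

```python
def darboux_stieltjes(f, g, a, b, k):
    """Return (A_g(G_k), A_g(H_k)) for the partition of [a, b] into 2**k equal pieces.

    Assumes f is increasing on [a, b], so that m_j = f(x_{j-1}) and M_j = f(x_j),
    and g is continuous, so that m_g([x_{j-1}, x_j)) = g(x_j) - g(x_{j-1}).
    """
    n = 2 ** k
    xs = [a + (b - a) * j / n for j in range(n + 1)]
    lower = sum(f(xs[j - 1]) * (g(xs[j]) - g(xs[j - 1])) for j in range(1, n + 1))
    upper = sum(f(xs[j]) * (g(xs[j]) - g(xs[j - 1])) for j in range(1, n + 1))
    return lower, upper


f = lambda x: x * x   # increasing on [0, 1]
g = lambda x: x ** 3  # increasing and continuous
for k in range(1, 11):
    lo, hi = darboux_stieltjes(f, g, 0.0, 1.0, k)
    print(k, lo, hi)  # both tend to the Riemann-Stieltjes value 3/5
```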

Theorem 4.3.4 presents a practical way to evaluate many Lebesgue-Radon-Stieltjes integrals. Example 4.3.5. Let g(x) =



0, x h

then $|t_k|\le h$ and $t_k\in\mathrm{LRS}_g(\mathbb{R})$ by Theorem 4.2.3 (iii). It is easy to see that $\lim_{k\to\infty}t_k=f$ $m_g$-a.e. on $\mathbb{R}$. So Theorem 4.3.1 is used to deduce $f\in\mathrm{LRS}_g(\mathbb{R})$. Any step function on $\mathbb{R}$ is obviously $m_g$-measurable, and so $1=\lim_{n\to\infty}1_{(-n,n)}$ is $m_g$-measurable. Moreover, $m_g$-measurability is preserved under the usual analytic operations; more precisely, we have the following property. Theorem 4.4.3. Let $g:\mathbb{R}\to\mathbb{R}$ be increasing.

(i) If $f:\mathbb{R}\to\mathbb{R}^e$ is $m_g$-measurable, so are $|f|$, $f^+$ and $f^-$.

(ii) If $f_1,f_2:\mathbb{R}\to\mathbb{R}^e$ are $m_g$-measurable, so are $f_1\pm f_2$, $f_1f_2$, $f_1/f_2$ (where $f_2\ne0$ $m_g$-a.e. on $\mathbb{R}$), $\max\{f_1,f_2\}$ and $\min\{f_1,f_2\}$.

(iii) If $\{f_j\}_{j=1}^\infty$ is a sequence of $m_g$-measurable functions from $\mathbb{R}$ to $\mathbb{R}^e$, so are $\limsup f_j$ and $\liminf f_j$. In particular, if the $m_g$-measurable sequence $\{f_j\}_{j=1}^\infty$ converges to $f$ $m_g$-a.e. on $\mathbb{R}$, then $f$ is $m_g$-measurable.

Proof. (i) This follows from the definition of an $m_g$-measurable function on $\mathbb{R}$. (ii) It suffices to verify that $1/f$ is $m_g$-measurable when $f:\mathbb{R}\to\mathbb{R}^e$ is $m_g$-measurable and $f\ne0$ $m_g$-a.e. on $\mathbb{R}$. Under this assumption, we get a sequence of step functions $\{s_k\}_{k=1}^\infty$ on $\mathbb{R}$ such that $\lim_{k\to\infty}s_k=f$ $m_g$-a.e. on $\mathbb{R}$. If
$$t_k(x)=\begin{cases}s_k(x)^{-1}, & s_k(x)\ne0,\\ 1, & s_k(x)=0,\ x\in(-k,k),\\ 0, & s_k(x)=0,\ x\in\mathbb{R}\setminus(-k,k),\end{cases}$$
then $\{t_k\}_{k=1}^\infty$ is a sequence of step functions on $\mathbb{R}$ converging to $1/f$ $m_g$-a.e. on $\mathbb{R}$, and hence $1/f$ is $m_g$-measurable. (iii) Noting the formulas
$$\limsup f_j=\lim_{j\to\infty}\sup_{k\ge j}f_k=\lim_{j\to\infty}\lim_{l\to\infty}\sup_{l\ge k\ge j}f_k,\qquad\liminf f_j=\lim_{j\to\infty}\inf_{k\ge j}f_k=\lim_{j\to\infty}\lim_{l\to\infty}\inf_{l\ge k\ge j}f_k,$$


we just prove the special case that the $m_g$-a.e. limit function $f$ of the $m_g$-measurable function sequence $\{f_j\}_{j=1}^\infty$ is $m_g$-measurable. To do so, let $\mathcal{S}=\{S_k\}_{k=1}^\infty$ be a sequence of disjoint simple sets such that $\mathbb{R}=\cup_{k=1}^\infty S_k$ and $0<m_g(S_k)<\infty$. Then the positive function $h:\mathbb{R}\to\mathbb{R}^e$, defined by $h(x)=\big(k^2m_g(S_k)\big)^{-1}$ for $x\in S_k$, belongs to $\mathrm{LRS}_g(\mathbb{R})$ thanks to Theorem 4.3.2. Since $f_j$ is $m_g$-measurable, by Definition 4.4.1 there exists a sequence of step functions $\{s_{j,k}\}_{k=1}^\infty$ on $\mathbb{R}$ such that $s_{j,k}$ equals $0$ outside a finite union of sets from $\mathcal{S}$ and $\lim_{k\to\infty}s_{j,k}=f_j$ $m_g$-a.e. on $\mathbb{R}$. Accordingly, if
$$q_{j,k}=hs_{j,k}\big(h+|s_{j,k}|\big)^{-1}\quad\text{and}\quad q_j=\begin{cases}hf_j\big(h+|f_j|\big)^{-1}, & f_j\ne\pm\infty,\\ \pm h, & f_j=\pm\infty,\end{cases}$$
then $\lim_{k\to\infty}q_{j,k}=q_j$ $m_g$-a.e. on $\mathbb{R}$; $q_{j,k}$ is a step function on $\mathbb{R}$ and hence $q_j$ is $m_g$-measurable; moreover, $|q_j|\le h$ and $\{q_j\}_{j=1}^\infty$ converges $m_g$-a.e. on $\mathbb{R}$ to the function
$$q=\begin{cases}hf\big(h+|f|\big)^{-1}, & f\ne\pm\infty,\\ \pm h, & f=\pm\infty.\end{cases}$$
From $h\in\mathrm{LRS}_g(\mathbb{R})$ and Theorems 4.3.1 and 4.4.2 it follows that $q$ is $m_g$-measurable, and one can define a sequence of step functions $\{s_j\}_{j=1}^\infty$, each being $0$ outside a finite union of sets from $\mathcal{S}$, such that $|s_j|<h$ and $\lim_{j\to\infty}s_j=q$ $m_g$-a.e. on $\mathbb{R}$. Accordingly, $(h-|s_j|)^{-1}s_jh$ is a step function on $\mathbb{R}$ converging $m_g$-a.e. on $\mathbb{R}$ to $f$ (recall that $f=hq(h-|q|)^{-1}$ wherever $f$ is finite), and consequently $f$ is $m_g$-measurable.

In light of Theorem 4.4.2, we introduce the following concept. Definition 4.4.4. Let $g:\mathbb{R}\to\mathbb{R}$ be increasing. (i) A subset $E$ of $\mathbb{R}$ is called Lebesgue-Radon-Stieltjes $g$-measurable, or $m_g$-measurable, provided that $1_E$ is $m_g$-measurable. (ii) If $E\subseteq\mathbb{R}$ is $m_g$-measurable, then $m_g(E)=\int_{\mathbb{R}}1_E\,dm_g=S_g(1_E)$ is called the Lebesgue-Radon-Stieltjes $g$-measure of $E$.


(iii) Let $E\subseteq\mathbb{R}$ and $f:\mathbb{R}\to\mathbb{R}^e$. We say $f\in\mathrm{LRS}_g(E)$ provided that $f1_E\in\mathrm{LRS}_g(\mathbb{R})$ and $E$ is $m_g$-measurable. In this case, the Lebesgue-Radon-Stieltjes integral of $f$ on $E$ is defined by $\int_Ef\,dm_g=\int_{\mathbb{R}}f1_E\,dm_g$. In particular, if $g=\mathrm{id}$, i.e., $g(x)=x$ for all $x\in\mathbb{R}$, in (i)-(ii)-(iii), then the corresponding terms are Lebesgue measurable, the Lebesgue measure and the Lebesgue integral.

Remark 4.4.5. A few words on Definition 4.4.4 are in order. (a) It is possible to have $m_g(E)=\infty$; this means that $E$ has infinite $g$-measure. Of course, if $m_g(E)<\infty$ then $E$ is said to have finite $g$-measure; in particular, $m_g(E)=m_g(S)<\infty$ whenever $E=S$ is a simple set. (b) When $f\in\mathrm{LRS}_g(\mathbb{R})$ and $E$ is $m_g$-measurable, (a) and Theorem 4.4.2 imply that $f1_E$ is $m_g$-measurable with $S_g(|f1_E|)\le S_g(|f|)<\infty$, and hence $f1_E\in\mathrm{LRS}_g(\mathbb{R})$ by Theorem 4.4.2. (c) Given an $m_g$-measurable interval $I$ in $\mathbb{R}$, the theorems about $\mathrm{LRS}_g(\mathbb{R})$ given above have analogues for $\mathrm{LRS}_g(I)$, obtained by multiplying all functions involved by $1_I$. To avoid unnecessary repetition, we will not state such theorems explicitly, yet will use them freely.

Theorem 4.4.6. Let $g:\mathbb{R}\to\mathbb{R}$ be increasing and let $\mathcal{M}_g$ comprise all $m_g$-measurable subsets of $\mathbb{R}$. Then

(i) $\mathcal{M}_g$ is a $\sigma$-ring in the sense that $\emptyset,\mathbb{R}\in\mathcal{M}_g$; $E\in\mathcal{M}_g\Rightarrow E^c=\mathbb{R}\setminus E\in\mathcal{M}_g$; and $E_j\in\mathcal{M}_g$ for all $j\Rightarrow\cup_{j=1}^\infty E_j\in\mathcal{M}_g$.

(ii) $m_g$ is a measure on $\mathcal{M}_g$ in the sense that $m_g(\emptyset)=0$ and $m_g(\cup_{j=1}^\infty E_j)=\sum_{j=1}^\infty m_g(E_j)$ for any sequence of disjoint sets $\{E_j\}_{j=1}^\infty\subseteq\mathcal{M}_g$.

(iii) $m_g$ is not only increasing, i.e., $m_g(E_1)\le m_g(E_2)$ whenever $E_1\subseteq E_2$ in $\mathcal{M}_g$, but also countably subadditive:
$$m_g\big(\cup_{j=1}^\infty E_j\big)\le\sum_{j=1}^\infty m_g(E_j)\quad\text{for any sequence }\{E_j\}_{j=1}^\infty\subseteq\mathcal{M}_g.$$


(iv) $m_g(\mathbb{R})<\infty$ if and only if $m_g$ is bounded on all simple sets, if and only if both $\lim_{n\to\infty}g(n)$ and $\lim_{n\to\infty}g(-n)$ are finite.

(v) If $f\in\mathrm{LRS}_g(\mathbb{R})$, then
$$L_g(E)=\int_Ef\,dm_g\quad\text{and}\quad L_g^\pm(E)=\int_Ef^\pm\,dm_g\quad\text{for }E\in\mathcal{M}_g$$
are countably additive in the sense that
$$L_g\big(\cup_{j=1}^\infty E_j\big)=\sum_{j=1}^\infty L_g(E_j)\quad\text{and}\quad L_g^\pm\big(\cup_{j=1}^\infty E_j\big)=\sum_{j=1}^\infty L_g^\pm(E_j)$$
for any sequence of disjoint sets $\{E_j\}_{j=1}^\infty\subseteq\mathcal{M}_g$.

(vi) $\int_Ef\,dm_g=0$ for all $E\in\mathcal{M}_g$ if and only if $f$ is an $m_g$-null function.

(vii) A function $f:\mathbb{R}\to\mathbb{R}^e$ is $m_g$-measurable if and only if $E(f;\lambda)=\{x\in\mathbb{R}:f(x)>\lambda\}$ is $m_g$-measurable for every $\lambda\in\mathbb{R}$.

Proof. (i) We begin with a simple observation. If $f\ge0$ is $m_g$-measurable and $E,E_1,E_2\in\mathcal{M}_g$, then $1_Ef$ is $m_g$-measurable by Theorem 4.4.3 (ii), and the formulas
$$1_{E^c}f=(1-1_E)f\quad\text{and}\quad1_{E_1\cup E_2}f=\max\{1_{E_1}f,1_{E_2}f\}$$
are valid. This observation, along with Definition 4.4.4 and Theorems 4.4.2-4.4.3, establishes the desired result.

(ii) Trivially, $m_g(\emptyset)=0$. Suppose that $\{E_j\}_{j=1}^\infty$ is a sequence of disjoint sets in $\mathcal{M}_g$. Then $1_{\cup_{j=1}^\infty E_j}$ is the pointwise limit of the increasing sequence $\{\sum_{j=1}^n1_{E_j}\}_{n=1}^\infty$, so that
$$m_g\big(\cup_{j=1}^\infty E_j\big)=\int_{\mathbb{R}}1_{\cup_{j=1}^\infty E_j}\,dm_g=\lim_{n\to\infty}\int_{\mathbb{R}}1_{\cup_{j=1}^nE_j}\,dm_g=\lim_{n\to\infty}\sum_{j=1}^n\int_{\mathbb{R}}1_{E_j}\,dm_g=\sum_{j=1}^\infty m_g(E_j).$$


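(iii) On the one hand, if $E_1\subseteq E_2$ in $\mathcal{M}_g$, then (ii) gives
$$m_g(E_2)=m_g\big(E_1\cup(E_2\setminus E_1)\big)=m_g(E_1)+m_g(E_2\setminus E_1)\ge m_g(E_1).$$
On the other hand, for any sequence $\{E_j\}_{j=1}^\infty$ of sets in $\mathcal{M}_g$ let
$$F_1=E_1\quad\text{and}\quad F_j=E_j\cap\big(\cup_{k=1}^{j-1}E_k\big)^c\quad\text{for }j>1.$$
Then $F_j\cap F_l=\emptyset$ for $j\ne l$, $F_j\subseteq E_j$ with $F_j\in\mathcal{M}_g$, and $\cup_{j=1}^\infty E_j=\cup_{k=1}^\infty F_k\in\mathcal{M}_g$. Accordingly, (ii) yields
$$m_g\big(\cup_{j=1}^\infty E_j\big)=m_g\big(\cup_{k=1}^\infty F_k\big)=\sum_{k=1}^\infty m_g(F_k)\le\sum_{k=1}^\infty m_g(E_k).$$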
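The disjointification step $F_j=E_j\cap(\cup_{k<j}E_k)^c$ is easy to see on finite sets. In the sketch below, counting measure on a few points plays the role of $m_g$, which is a purely illustrative choice. The sets produced are pairwise disjoint, have the same union, and their total size never exceeds $\sum_j|E_j|$.

```python
def disjointify(sets):
    """Return F_1 = E_1 and F_j = E_j minus the union of E_1, ..., E_{j-1} for j > 1."""
    seen = set()
    result = []
    for E_j in sets:
        result.append(set(E_j) - seen)
        seen |= set(E_j)
    return result


E = [{1, 2, 3}, {2, 3, 4}, {4, 5}, {1, 5, 6}]
F = disjointify(E)

# Counting measure stands in for m_g here (unit atoms at 1, ..., 6).
union_size = len(set().union(*E))
print(union_size, sum(len(F_j) for F_j in F), sum(len(E_j) for E_j in E))  # 6 6 11
```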

(iv) This follows from $\mathbb{R}=\cup_{j=1}^\infty(-j,j)$ and (ii), as well as
$$\begin{aligned}m_g(\mathbb{R})&=m_g\big((-1,1)\big)+\sum_{j=2}^\infty m_g\big((-j,j)\setminus(-j+1,j-1)\big)\\&=m_g\big((-1,1)\big)+\lim_{n\to\infty}\sum_{j=2}^n\Big(m_g\big((-j,j)\big)-m_g\big((-j+1,j-1)\big)\Big)=\lim_{n\to\infty}m_g\big((-n,n)\big).\end{aligned}$$
(v) Since $f\in\mathrm{LRS}_g(\mathbb{R})$, the result follows from Theorems 4.2.3 (iii) and 4.3.1 for $L_g^+(\cdot)$ as well as for $L_g^-(\cdot)$ and $L_g(\cdot)$. (vi) The sufficiency is evident. The necessity follows from the fact that $L_g(E)=0$ for all $E\in\mathcal{M}_g$ implies $L_g^\pm(E)=0$ for all $E\in\mathcal{M}_g$. (vii) Assume that the condition after "if and only if" holds. Then
$$E(f;\infty)=\{x\in\mathbb{R}:f(x)=\infty\}=\cap_{j=1}^\infty E(f;j)\quad\text{and}\quad E(f;-\infty)=\{x\in\mathbb{R}:f(x)=-\infty\}=\cap_{j=1}^\infty E(f;-j)^c$$


are $m_g$-measurable. Moreover,
$$E(f;j,k)=\{x\in\mathbb{R}:jk^{-1}<f(x)\le(j+1)k^{-1}\}=E(f;jk^{-1})\cap E\big(f;(j+1)k^{-1}\big)^c$$
is $m_g$-measurable for $k\in\mathbb{N}$ and $j\in\mathbb{Z}$. Consequently, the characteristic functions $1_{E(f;\pm\infty)}$ and $1_{E(f;j,k)}$ are $m_g$-measurable. This gives that
$$f_k=k^{-1}\sum_{j=-\infty}^{\infty}j1_{E(f;j,k)}+k1_{E(f;\infty)}-k1_{E(f;-\infty)}$$
is $m_g$-measurable, and so is $f=\lim_{k\to\infty}f_k$ $m_g$-a.e. on $\mathbb{R}$. Conversely, fix $\lambda\in\mathbb{R}$ and suppose that $f$ is $m_g$-measurable. Then there exists a sequence of step functions $\{s_k\}_{k=1}^\infty$ converging to $f$ $m_g$-a.e. on $\mathbb{R}$. Note that
$$\{x\in\mathbb{R}:\liminf_{k\to\infty}s_k(x)>\lambda\}=\cup_{j=1}^\infty\liminf_{k\to\infty}\{x\in\mathbb{R}:s_k(x)\ge\lambda+j^{-1}\}=\cup_{j=1}^\infty\cup_{n=1}^\infty\cap_{m=n}^\infty\{x\in\mathbb{R}:s_m(x)\ge\lambda+j^{-1}\}$$
is $m_g$-measurable, since each set $\{x\in\mathbb{R}:s_m(x)\ge\lambda+j^{-1}\}$ is either a simple set or the complement of one (according as $\lambda+j^{-1}>0$ or not). So $E(f;\lambda)$ is $m_g$-measurable, since $f-\liminf_{k\to\infty}s_k$ is an $m_g$-null function.
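At a single point, the approximating function $f_k$ from part (vii) takes the value $j/k$ for the unique integer $j$ with $j/k<f(x)\le(j+1)/k$, and the value $\pm k$ where $f=\pm\infty$. The sketch below (the helper name `dyadic_level` is illustrative) computes this value with a ceiling. It shows that $0<f(x)-f_k(x)\le1/k$ at every point where $f$ is finite.

```python
import math


def dyadic_level(value, k):
    """Value of f_k at a point where f takes `value`.

    f_k = j/k on E(f; j, k) = {j/k < f <= (j+1)/k}, f_k = k where f = +inf,
    and f_k = -k where f = -inf.
    """
    if value == math.inf:
        return float(k)
    if value == -math.inf:
        return float(-k)
    j = math.ceil(k * value) - 1  # the integer with j/k < value <= (j+1)/k
    return j / k


samples = [-2.7, -0.05, 0.0, 0.3, 1.0, 3.14159, math.inf, -math.inf]
for k in (1, 10, 100):
    print(k, [dyadic_level(v, k) for v in samples])
```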

4.5 Double, Iterated and Generic Integrals

Motivated by Theorems 4.2.3 (iii) and 4.3.1 or 4.3.2, as well as Remark 3.3.5 or Theorem 3.3.4, in this section we settle the question of exchanging the order of integration in a double integral and of evaluating an integral as an iterated one; we also employ the established results to introduce an integral over an abstract set. Definition 4.5.1. Let $g_1,g_2:\mathbb{R}\to\mathbb{R}$ be increasing. Then

(i) A rectangle in $\mathbb{R}^2=\mathbb{R}\times\mathbb{R}$ is a set of the form $I_1\times I_2$, where $I_1,I_2$ are finite intervals in $\mathbb{R}$. A simple set in $\mathbb{R}^2$ is a union of finitely many disjoint rectangles. (ii) The $g_1\otimes g_2$-measure of a rectangle $I_1\times I_2$ in $\mathbb{R}^2$ is determined by $m_{g_1\otimes g_2}(I_1\times I_2)=m_{g_1}(I_1)m_{g_2}(I_2)$.


The $g_1\otimes g_2$-measure of a simple set $S=\cup_{k=1}^n(I_{1,k}\times I_{2,k})$ in $\mathbb{R}^2$ is defined by
$$m_{g_1\otimes g_2}(S)=\sum_{k=1}^nm_{g_1\otimes g_2}(I_{1,k}\times I_{2,k})=\sum_{k=1}^nm_{g_1}(I_{1,k})m_{g_2}(I_{2,k}).$$
(iii) A function $f:\mathbb{R}^2\to\mathbb{R}$ is said to be a step function on $\mathbb{R}^2$ provided that $\{x=(x_1,x_2)\in\mathbb{R}^2:f(x)\ne0\}$ is a simple set $S=\cup_{k=1}^nI_{1,k}\times I_{2,k}$ such that $f$ equals a constant $c_k$ on each $I_{1,k}\times I_{2,k}$. (iv) The Lebesgue-Radon-Stieltjes double integral on $\mathbb{R}^2$ of a function $f:\mathbb{R}^2\to\mathbb{R}^e$, denoted
$$\int_{\mathbb{R}^2}f\,dm_{g_1\otimes g_2}=\int_{\mathbb{R}^2}f^+\,dm_{g_1\otimes g_2}-\int_{\mathbb{R}^2}f^-\,dm_{g_1\otimes g_2},$$
is defined by a process similar to that used for the one-variable Lebesgue-Radon-Stieltjes integral in Section 4.1. The class of all functions $f$ with $\int_{\mathbb{R}^2}f\,dm_{g_1\otimes g_2}$ finite is written as $\mathrm{LRS}_{g_1\otimes g_2}(\mathbb{R}^2)$.

Under this definition, we can analogously establish the two-dimensional concepts and results corresponding to those presented in Sections 4.1-4.4. For instance, a function $f:\mathbb{R}^2\to\mathbb{R}^e$ is $m_{g_1\otimes g_2}$-measurable provided that there is a sequence of step functions $\{s_k\}_{k=1}^\infty$ on $\mathbb{R}^2$ such that $\lim_{k\to\infty}s_k=f$ $m_{g_1\otimes g_2}$-a.e. on $\mathbb{R}^2$. When evaluating a Lebesgue-Radon-Stieltjes double integral, we are naturally led to the problem of converting it into iterated integrals. This problem can be handled via the Fubini-Tonelli theorem (named in honor of Guido Fubini and Leonida Tonelli) as follows. Theorem 4.5.2. For two increasing functions $g_1,g_2:\mathbb{R}\to\mathbb{R}$ let $f:\mathbb{R}^2\to\mathbb{R}^e$ be $m_{g_1\otimes g_2}$-measurable.

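(i) $f(\cdot,x_2)$ is $m_{g_1}$-measurable for $m_{g_2}$-almost all $x_2\in\mathbb{R}$, and $f(x_1,\cdot)$ is $m_{g_2}$-measurable for $m_{g_1}$-almost all $x_1\in\mathbb{R}$.

(ii) The following three conditions are equivalent:
(a) $\int_{\mathbb{R}}\big(\int_{\mathbb{R}}|f|\,dm_{g_1}\big)dm_{g_2}<\infty$;
(b) $\int_{\mathbb{R}^2}|f|\,dm_{g_1\otimes g_2}<\infty$;
(c) $\int_{\mathbb{R}}\big(\int_{\mathbb{R}}|f|\,dm_{g_2}\big)dm_{g_1}<\infty$.
Under any one of (a)-(b)-(c), the three integrals
$$\int_{\mathbb{R}}\Big(\int_{\mathbb{R}}f\,dm_{g_1}\Big)dm_{g_2};\qquad\int_{\mathbb{R}^2}f\,dm_{g_1\otimes g_2};\qquad\int_{\mathbb{R}}\Big(\int_{\mathbb{R}}f\,dm_{g_2}\Big)dm_{g_1}$$
are finite and equal.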


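Proof. First of all, if $f$ is a step function on $\mathbb{R}^2$, then there are real numbers $\{c_k\}_{k=1}^n$ and disjoint rectangles $\{I_{1,k}\times I_{2,k}\}_{k=1}^n$ such that
$$f=\sum_{k=1}^nc_k1_{I_{1,k}\times I_{2,k}}.$$
Accordingly, (i) follows right away, and (ii) follows from the related definitions and properties of the integrals via
$$\begin{aligned}\int_{\mathbb{R}^2}f\,dm_{g_1\otimes g_2}&=\sum_{k=1}^nc_k\int_{\mathbb{R}^2}1_{I_{1,k}\times I_{2,k}}\,dm_{g_1\otimes g_2}=\sum_{k=1}^nc_k\,m_{g_1}(I_{1,k})\,m_{g_2}(I_{2,k})\\&=\sum_{k=1}^nc_k\int_{\mathbb{R}}\Big(\int_{\mathbb{R}}1_{I_{2,k}}\,dm_{g_2}\Big)1_{I_{1,k}}\,dm_{g_1}=\int_{\mathbb{R}}\Big(\int_{\mathbb{R}}f\,dm_{g_2}\Big)dm_{g_1},\end{aligned}$$
and, in the same way,
$$\sum_{k=1}^nc_k\,m_{g_1}(I_{1,k})\,m_{g_2}(I_{2,k})=\sum_{k=1}^nc_k\int_{\mathbb{R}}\Big(\int_{\mathbb{R}}1_{I_{1,k}}\,dm_{g_1}\Big)1_{I_{2,k}}\,dm_{g_2}=\int_{\mathbb{R}}\Big(\int_{\mathbb{R}}f\,dm_{g_1}\Big)dm_{g_2}.$$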

Next, we consider the general case. Since $f=f^+-f^-$ and $|f|=f^++f^-$, without loss of generality we may assume $f\ge0$. (i) Note that $f=\lim_{n\to\infty}1_{Q_n}\min\{f,n\}$ holds $m_{g_1\otimes g_2}$-a.e. on $\mathbb{R}^2$, where $Q_n$ is the rectangle $\{(x_1,x_2)\in\mathbb{R}^2:|x_1|,|x_2|\le n\}$ for $n\in\mathbb{N}$. If $f_n=1_{Q_n}\min\{f,n\}$, then $0\le f_n\le n$ on $Q_n$ and $f_n=0$ on $\mathbb{R}^2\setminus Q_n$. Since $f$ is $m_{g_1\otimes g_2}$-measurable, so is $f_n$. Therefore, it suffices to verify (i) for each $f_n$. By definition there are a sequence of step functions $\{s_{n,k}\}_{k=1}^\infty$ on $\mathbb{R}^2$ and a set $E_n\subseteq Q_n$ such that $m_{g_1\otimes g_2}(E_n)=0$; $0\le s_{n,k}\le n$; and $\lim_{k\to\infty}s_{n,k}(x)=f_n(x)$ for $x\in Q_n\setminus E_n$. Since $E_n$ is an $m_{g_1\otimes g_2}$-null set, for any $\varepsilon>0$ there are a sequence of rectangles


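$\{R_{n,j}\}_{j=1}^\infty$ and a sequence of numbers $\{c_{n,j}\}_{j=1}^\infty$ such that
$$1_{E_n}\le\sum_{j=1}^\infty c_{n,j}1_{R_{n,j}}\quad\text{and}\quad\sum_{j=1}^\infty c_{n,j}\,m_{g_1\otimes g_2}(R_{n,j})<\varepsilon.$$
Accordingly,
$$\int_{\mathbb{R}}\Big(\int_{\mathbb{R}}1_{E_n}\,dm_{g_2}\Big)dm_{g_1}\le\sum_{j=1}^\infty\int_{\mathbb{R}}\Big(\int_{\mathbb{R}}c_{n,j}1_{R_{n,j}}\,dm_{g_2}\Big)dm_{g_1}=\sum_{j=1}^\infty\int_{\mathbb{R}^2}c_{n,j}1_{R_{n,j}}\,dm_{g_1\otimes g_2}<\varepsilon$$
and
$$\int_{\mathbb{R}}\Big(\int_{\mathbb{R}}1_{E_n}\,dm_{g_1}\Big)dm_{g_2}\le\sum_{j=1}^\infty\int_{\mathbb{R}}\Big(\int_{\mathbb{R}}c_{n,j}1_{R_{n,j}}\,dm_{g_1}\Big)dm_{g_2}=\sum_{j=1}^\infty\int_{\mathbb{R}^2}c_{n,j}1_{R_{n,j}}\,dm_{g_1\otimes g_2}<\varepsilon.$$
Since $\varepsilon>0$ is arbitrary, it follows that
$$\int_{\mathbb{R}}1_{E_n}(x_1,\cdot)\,dm_{g_2}=0\ \text{for }m_{g_1}\text{-almost all }x_1\in\mathbb{R}\quad\text{and}\quad\int_{\mathbb{R}}1_{E_n}(\cdot,x_2)\,dm_{g_1}=0\ \text{for }m_{g_2}\text{-almost all }x_2\in\mathbb{R}.$$
In other words, $1_{E_n}(x_1,\cdot)$ and $1_{E_n}(\cdot,x_2)$ are $m_{g_2}$-null and $m_{g_1}$-null functions for $m_{g_1}$-almost all $x_1$ and $m_{g_2}$-almost all $x_2$ in $\mathbb{R}$, respectively, and consequently,
$$\lim_{k\to\infty}s_{n,k}(x_1,\cdot)=f_n(x_1,\cdot)\ m_{g_2}\text{-a.e. for }m_{g_1}\text{-almost all }x_1\in\mathbb{R}$$
and
$$\lim_{k\to\infty}s_{n,k}(\cdot,x_2)=f_n(\cdot,x_2)\ m_{g_1}\text{-a.e. for }m_{g_2}\text{-almost all }x_2\in\mathbb{R}.$$
Accordingly, (i) is valid for $f_n$ and then for $f$.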

(ii) It is enough to show that
$$\int_{\mathbb{R}}\Big(\int_{\mathbb{R}}f\,dm_{g_1}\Big)dm_{g_2}<\infty\iff\int_{\mathbb{R}^2}f\,dm_{g_1\otimes g_2}<\infty.$$

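To this end, let $f_n=1_{Q_n}\min\{f,n\}$ be as above. Then
$$\int_{\mathbb{R}^2}f_n\,dm_{g_1\otimes g_2}=\int_{\mathbb{R}}\Big(\int_{\mathbb{R}}f_n\,dm_{g_1}\Big)dm_{g_2},$$
which follows from the dominated convergence theorem and the validity of this formula for the step functions $\{s_{n,k}\}_{k=1}^\infty$ above. Let $\int_{\mathbb{R}}\big(\int_{\mathbb{R}}f\,dm_{g_1}\big)dm_{g_2}<\infty$. Since $f_n=\min\{f,n\}1_{Q_n}$ belongs to $\mathrm{LRS}_{g_1\otimes g_2}(\mathbb{R}^2)$ for each $n\in\mathbb{N}$ and increases to $f$ as $n\to\infty$, we use the last formula for $f_n$ to obtain
$$\int_{\mathbb{R}^2}f_n\,dm_{g_1\otimes g_2}\le\int_{\mathbb{R}}\Big(\int_{\mathbb{R}}f\,dm_{g_1}\Big)dm_{g_2}<\infty,$$
which, by the monotone convergence theorem, ensures $f\in\mathrm{LRS}_{g_1\otimes g_2}(\mathbb{R}^2)$. Conversely, if $f\in\mathrm{LRS}_{g_1\otimes g_2}(\mathbb{R}^2)$, then the last formula for $f_n$ and the monotone convergence theorem give
$$\lim_{n\to\infty}\int_{\mathbb{R}}\Big(\int_{\mathbb{R}}f_n\,dm_{g_1}\Big)dm_{g_2}=\lim_{n\to\infty}\int_{\mathbb{R}^2}f_n\,dm_{g_1\otimes g_2}=\int_{\mathbb{R}^2}f\,dm_{g_1\otimes g_2}<\infty.$$
Since $\{f_n\}_{n=1}^\infty$ increases to $f$, for each fixed $x_2\in\mathbb{R}$ the sequence $\{f_n(\cdot,x_2)\}_{n=1}^\infty$ increases to $f(\cdot,x_2)$; consequently, if
$$F_n(x_2)=\int_{\mathbb{R}}f_n(\cdot,x_2)\,dm_{g_1}\quad\text{and}\quad F(x_2)=\int_{\mathbb{R}}f(\cdot,x_2)\,dm_{g_1},$$
then $\{F_n(x_2)\}_{n=1}^\infty$ increases to $F(x_2)$. Since each $F_n$ is $m_{g_2}$-measurable by the argument for (i), so is $F$. This fact and the monotone convergence theorem yield
$$\lim_{n\to\infty}\int_{\mathbb{R}}F_n\,dm_{g_2}=\int_{\mathbb{R}}F\,dm_{g_2}.$$
The previous argument actually implies
$$\int_{\mathbb{R}}\Big(\int_{\mathbb{R}}f\,dm_{g_1}\Big)dm_{g_2}=\int_{\mathbb{R}^2}f\,dm_{g_1\otimes g_2}<\infty.$$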
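The integrability condition in Theorem 4.5.2 (ii) cannot be dropped. Take $g_1=g_2$ to be an increasing step function with a unit jump at each positive integer, for instance the hypothetical choice $g(x)=\max\{\lfloor x\rfloor,0\}$. Then $m_{g_1}$ and $m_{g_2}$ are counting measure on $\mathbb{N}$, and iterated integrals are iterated sums. For the classic array $a(m,n)=1$ if $m=n$, $-1$ if $m=n+1$, and $0$ otherwise, the two orders of summation give $1$ and $0$. This does not contradict the theorem, because $\sum_{m,n}|a(m,n)|=\infty$. The sketch computes each inner sum exactly over the support of the row or column.

```python
def a(m, n):
    """Array on positive integers: 1 on the diagonal, -1 just below it, 0 elsewhere."""
    if m == n:
        return 1
    if m == n + 1:
        return -1
    return 0


N = 300
# Row m is supported on n in {m - 1, m}, so summing n over 1..m+1 gives the exact inner sum.
rows_first = sum(sum(a(m, n) for n in range(1, m + 2)) for m in range(1, N + 1))
# Column n is supported on m in {n, n + 1}, so summing m over 1..n+1 is exact.
cols_first = sum(sum(a(m, n) for m in range(1, n + 2)) for n in range(1, N + 1))
# The absolute mass of the N x N corner grows without bound as N increases.
abs_mass = sum(abs(a(m, n)) for m in range(1, N + 1) for n in range(1, N + 1))
print(rows_first, cols_first, abs_mass)  # 1 0 599
```

Every inner sum for $m\ge2$ (rows) and every column sum vanishes, so the truncation at $N$ does not affect the two iterated totals.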


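Example 4.5.3. Let $g_1(x)=g_2(x)=x$ on $\mathbb{R}$ and
$$f(x_1,x_2)=\begin{cases}x_1\exp\big(-x_1^2(1+x_2^2)\big), & (x_1,x_2)\in[0,\infty)\times[0,\infty),\\ 0, & (x_1,x_2)\in\mathbb{R}^2\setminus[0,\infty)\times[0,\infty).\end{cases}$$
Then
$$\int_{\mathbb{R}}\Big(\int_{\mathbb{R}}f\,dm_{g_1}\Big)dm_{g_2}=\int_0^\infty\Big(\int_0^\infty x_1\exp\big(-x_1^2(1+x_2^2)\big)\,dx_1\Big)dx_2=\int_0^\infty2^{-1}(1+x_2^2)^{-1}\,dx_2=2^{-2}\pi,$$
and hence by Theorem 4.5.2 (ii) and the substitution $z_1=x_1x_2$ we get
$$2^{-2}\pi=\int_0^\infty\Big(\int_0^\infty x_1\exp\big(-x_1^2(1+x_2^2)\big)\,dx_2\Big)dx_1=\int_0^\infty\exp(-x_1^2)\Big(\int_0^\infty\exp(-z_1^2)\,dz_1\Big)dx_1=\Big(\int_0^\infty\exp(-t^2)\,dt\Big)^2,$$
that is, $\int_0^\infty\exp(-t^2)\,dt=2^{-1}\sqrt{\pi}$.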
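A quick numerical sanity check of Example 4.5.3 uses a hand-written composite Simpson rule, an illustrative choice rather than anything prescribed by the text. Truncating $[0,\infty)$ to $[0,10]$ is an assumption justified by the Gaussian decay of the integrands. The sketch compares the inner integral in $x_1$ with the closed form $2^{-1}(1+x_2^2)^{-1}$, and the squared Gaussian integral with $\pi/4$.

```python
import math


def simpson(func, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3


def f(x1, x2):
    """The integrand of Example 4.5.3 on [0, inf) x [0, inf)."""
    return x1 * math.exp(-x1 * x1 * (1 + x2 * x2))


# Inner integral in x1 (truncated to [0, 10]) versus the closed form 1/(2(1 + x2^2)).
for x2 in (0.0, 0.5, 2.0):
    numeric = simpson(lambda x1: f(x1, x2), 0.0, 10.0)
    print(x2, numeric, 1 / (2 * (1 + x2 * x2)))

# The conclusion of the example: (integral of exp(-t^2) over [0, inf))^2 = pi/4.
gauss = simpson(lambda t: math.exp(-t * t), 0.0, 10.0)
print(gauss ** 2, math.pi / 4)
```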

The Fubini-Tonelli theorem can be used to establish the well-known Minkowski inequality (named after Hermann Minkowski). To this end, we need the following Hölder inequality, which is named after Otto Hölder and regarded as a fundamental inequality in analysis. Theorem 4.5.4. Let $g:\mathbb{R}\to\mathbb{R}$ be increasing, $1<p<\infty$ and $q=p(p-1)^{-1}$. If $f_1,f_2:\mathbb{R}\to\mathbb{R}^e$ are $m_g$-measurable, then
$$\int_{\mathbb{R}}|f_1f_2|\,dm_g\le\Big(\int_{\mathbb{R}}|f_1|^p\,dm_g\Big)^{\frac1p}\Big(\int_{\mathbb{R}}|f_2|^q\,dm_g\Big)^{\frac1q}.$$
Proof. Two cases are considered below.

Case 1: $\int_{\mathbb{R}}|f_1|^p\,dm_g=\infty$ or $\int_{\mathbb{R}}|f_2|^q\,dm_g=\infty$. The desired inequality is trivial.

Case 2: 0
c} ∪ {x ∈ R : |f_2(x)| ≤ c}, and hence, as a limiting case of Theorem 4.5.4, the following Hölder inequality is valid:
$$\int_{\mathbb{R}}|f_1f_2|\,dm_g\le\Big(\int_{\mathbb{R}}|f_1|\,dm_g\Big)\inf\big\{c\ge0:m_g(\{x\in\mathbb{R}:|f_2(x)|>c\})=0\big\}.$$


(iii) The Hölder inequality has the following generalization: if $p_1,\dots,p_n\in(1,\infty)$ with $\sum_{j=1}^np_j^{-1}=1$, and $f_1,\dots,f_n$ are $m_g$-measurable, then
$$\int_{\mathbb{R}}\prod_{j=1}^n|f_j|\,dm_g\le\prod_{j=1}^n\Big(\int_{\mathbb{R}}|f_j|^{p_j}\,dm_g\Big)^{\frac1{p_j}}.$$
This can be proved by applying Hölder's inequality to $f_1$ and $\prod_{j=2}^nf_j$, and then using induction on
$$\int_{\mathbb{R}}\prod_{j=2}^n|f_j|^{\frac{p_1}{p_1-1}}\,dm_g.$$
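Hölder's inequality and its generalization can be sanity-checked on a purely atomic $m_g$. The sketch uses a hypothetical $g$ with finitely many jumps $w_i$ at points $x_i$, so that $\int_{\mathbb{R}}f\,dm_g=\sum_iw_if(x_i)$. It works with random data and exponents chosen for illustration, and it also confirms the equality case $|f_2|=|f_1|^{p-1}$.

```python
import random


def integral(values, weights):
    """Integral against an atomic m_g: the sum of w_i * f(x_i)."""
    return sum(w * v for v, w in zip(values, weights))


def lp_norm(values, weights, p):
    """(integral of |f|^p dm_g)^(1/p) for the atomic measure."""
    return integral([abs(v) ** p for v in values], weights) ** (1 / p)


random.seed(0)
n = 50
weights = [random.uniform(0.0, 2.0) for _ in range(n)]  # hypothetical jump sizes of g
f1 = [random.uniform(-3.0, 3.0) for _ in range(n)]
f2 = [random.uniform(-3.0, 3.0) for _ in range(n)]
f3 = [random.uniform(-3.0, 3.0) for _ in range(n)]

# Theorem 4.5.4 with p = 3 and q = p/(p - 1) = 3/2.
p = 3.0
q = p / (p - 1)
lhs = integral([abs(u * v) for u, v in zip(f1, f2)], weights)
rhs = lp_norm(f1, weights, p) * lp_norm(f2, weights, q)

# Equality case: |f_2| = |f_1|^(p - 1).
f2_eq = [abs(u) ** (p - 1) for u in f1]
lhs_eq = integral([abs(u * v) for u, v in zip(f1, f2_eq)], weights)
rhs_eq = lp_norm(f1, weights, p) * lp_norm(f2_eq, weights, q)

# Generalized version with p_1, p_2, p_3 = 2, 3, 6, since 1/2 + 1/3 + 1/6 = 1.
lhs3 = integral([abs(u * v * w) for u, v, w in zip(f1, f2, f3)], weights)
rhs3 = 1.0
for fj, pj in zip((f1, f2, f3), (2.0, 3.0, 6.0)):
    rhs3 *= lp_norm(fj, weights, pj)

print(lhs <= rhs, lhs_eq, rhs_eq, lhs3 <= rhs3)
```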

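Now we can establish the following general form of Minkowski's inequality. Theorem 4.5.6. Let $g_1,g_2:\mathbb{R}\to\mathbb{R}$ be increasing and $1\le p<\infty$. If $f:\mathbb{R}^2\to\mathbb{R}^e$ is $m_{g_1\otimes g_2}$-measurable, then
$$\Big(\int_{\mathbb{R}}\Big(\int_{\mathbb{R}}|f|\,dm_{g_2}\Big)^p\,dm_{g_1}\Big)^{\frac1p}\le\int_{\mathbb{R}}\Big(\int_{\mathbb{R}}|f|^p\,dm_{g_1}\Big)^{\frac1p}\,dm_{g_2}.$$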

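Proof. The case $p=1$ follows immediately from the Fubini-Tonelli theorem. Next, let $p\in(1,\infty)$ and put $F(x_1)=\int_{\mathbb{R}}|f(x_1,x_2)|\,dm_{g_2}(x_2)$. Then a combination of the Fubini-Tonelli theorem and Hölder's inequality gives
$$\begin{aligned}\int_{\mathbb{R}}F^p\,dm_{g_1}&=\int_{\mathbb{R}}F^{p-1}F\,dm_{g_1}=\int_{\mathbb{R}}F^{p-1}(x_1)\Big(\int_{\mathbb{R}}|f(x_1,x_2)|\,dm_{g_2}(x_2)\Big)dm_{g_1}(x_1)\\&=\int_{\mathbb{R}}\Big(\int_{\mathbb{R}}|f(x_1,x_2)|F^{p-1}(x_1)\,dm_{g_1}(x_1)\Big)dm_{g_2}(x_2)\\&\le\int_{\mathbb{R}}\Big(\int_{\mathbb{R}}|f(x_1,x_2)|^p\,dm_{g_1}(x_1)\Big)^{\frac1p}\Big(\int_{\mathbb{R}}F^p\,dm_{g_1}\Big)^{\frac{p-1}p}dm_{g_2}(x_2),\end{aligned}$$
whence the required inequality follows upon dividing by $\big(\int_{\mathbb{R}}F^p\,dm_{g_1}\big)^{(p-1)/p}$ whenever this quantity is finite and nonzero.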

Remark 4.5.7. In Theorem 4.5.6, suppose that $f(x_1,1)=f_1(x_1)$ and $f(x_1,2)=f_2(x_1)$ are $m_{g_1}$-measurable, and let $g_2$ be the step function
$$g_2(x_2)=\begin{cases}0, & x_2\in(-\infty,1),\\ 1, & x_2\in[1,2],\\ 2, & x_2\in(2,\infty).\end{cases}$$
Then the well-known Minkowski inequality of triangle type,
$$\Big(\int_{\mathbb{R}}|f_1+f_2|^p\,dm_{g_1}\Big)^{\frac1p}\le\Big(\int_{\mathbb{R}}|f_1|^p\,dm_{g_1}\Big)^{\frac1p}+\Big(\int_{\mathbb{R}}|f_2|^p\,dm_{g_1}\Big)^{\frac1p},$$
holds for $p\in[1,\infty)$ by Remark 4.1.10 (ii)(a)-(b). In the case $p=\infty$, we readily get
$$\begin{aligned}\inf\{c\ge0:m_{g_1}(\{x\in\mathbb{R}:|f_1(x)+f_2(x)|>c\})=0\}&\le\inf\{c\ge0:m_{g_1}(\{x\in\mathbb{R}:|f_1(x)|>c\})=0\}\\&\quad+\inf\{c\ge0:m_{g_1}(\{x\in\mathbb{R}:|f_2(x)|>c\})=0\}.\end{aligned}$$
Furthermore, it is easy to verify that
$$\int_{\mathbb{R}}|f_1+f_2|^p\,dm_{g_1}\le\int_{\mathbb{R}}|f_1|^p\,dm_{g_1}+\int_{\mathbb{R}}|f_2|^p\,dm_{g_1}$$
holds for $p\in(0,1)$. Based on Remark 4.5.7 and Definition 4.1.9, we consider the class of Lebesgue-Radon-Stieltjes $p$-integrable functions. Definition 4.5.8. Let $g:\mathbb{R}\to\mathbb{R}$ be increasing and $p\in(0,\infty]$.

(i) When $p\in(0,\infty)$, the so-called Lebesgue-Radon-Stieltjes space $\mathrm{LRS}_g^p(\mathbb{R})$, often denoted $L^p(m_g,\mathbb{R})$, comprises all functions $f:\mathbb{R}\to\mathbb{R}^e$ satisfying $|f|^p\in\mathrm{LRS}_g(\mathbb{R})$, i.e., $S_g(|f|^p)<\infty$. For the limiting case $p=\infty$, $\mathrm{LRS}_g^\infty(\mathbb{R})$, often denoted $L^\infty(m_g,\mathbb{R})$, stands for the class of all functions $f:\mathbb{R}\to\mathbb{R}^e$ with $\inf\{c\ge0:m_g(\{x\in\mathbb{R}:|f(x)|>c\})=0\}<\infty$.

(ii) If $E\subseteq\mathbb{R}$ is $m_g$-measurable, then $\mathrm{LRS}_g^p(E)$ is defined to be the class of all functions $f:\mathbb{R}\to\mathbb{R}^e$ with $f1_E\in\mathrm{LRS}_g^p(\mathbb{R})$.

Remark 4.5.9. For the purpose of integration, changing function values on $m_g$-null sets makes no difference. Actually, one can integrate functions $f$ that are only defined on an $m_g$-measurable set $E$ with $m_g(E^c)=0$ simply by taking $f$ to be $0$ on $E^c=\mathbb{R}\setminus E$. In this manner, extended real-valued functions that are finite $m_g$-a.e. can be treated freely as real-valued functions.
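The three inequalities of Remark 4.5.7 can be checked on a purely atomic $m_{g_1}$, where integrals are weighted sums over hypothetical atoms. For such a measure, $\inf\{c\ge0:m_{g_1}(\{|f|>c\})=0\}$ is the largest $|f(x_i)|$ over atoms of positive mass. The last two lines of the sketch show why the case $p\in(0,1)$ is stated without the $1/p$ power: for two disjoint unit atoms the rooted triangle inequality fails.

```python
import random


def integral(values, weights):
    """Integral against an atomic m_g: the sum of w_i * f(x_i)."""
    return sum(w * v for v, w in zip(values, weights))


def lp(values, weights, p):
    """(integral of |f|^p dm_g)^(1/p)."""
    return integral([abs(v) ** p for v in values], weights) ** (1 / p)


def ess_sup(values, weights):
    """inf{c >= 0 : m_g(|f| > c) = 0}: the largest |f(x_i)| over atoms of positive mass."""
    return max((abs(v) for v, w in zip(values, weights) if w > 0), default=0.0)


random.seed(1)
n = 40
weights = [random.choice([0.0, 0.5, 1.0, 2.0]) for _ in range(n)]  # hypothetical jumps of g_1
f1 = [random.uniform(-5.0, 5.0) for _ in range(n)]
f2 = [random.uniform(-5.0, 5.0) for _ in range(n)]
s = [u + v for u, v in zip(f1, f2)]

for p in (1.0, 1.5, 2.0, 4.0):  # triangle type, p in [1, inf)
    assert lp(s, weights, p) <= lp(f1, weights, p) + lp(f2, weights, p) + 1e-12
assert ess_sup(s, weights) <= ess_sup(f1, weights) + ess_sup(f2, weights)  # p = inf
for p in (0.25, 0.5, 0.9):  # the unrooted inequality for p in (0, 1)
    def mass(vals, p=p):
        return integral([abs(v) ** p for v in vals], weights)
    assert mass(s) <= mass(f1) + mass(f2) + 1e-12

# For p < 1 the rooted triangle inequality can fail: two disjoint unit atoms.
w2, u2, v2 = [1.0, 1.0], [1.0, 0.0], [0.0, 1.0]
print(lp([1.0, 1.0], w2, 0.5), lp(u2, w2, 0.5) + lp(v2, w2, 0.5))  # 4.0 2.0
```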


Minkowski's inequality in Remark 4.5.7 is used to produce the following convergence theorem in $\mathrm{LRS}_g^p(\mathbb{R})$. Theorem 4.5.10. Let $g:\mathbb{R}\to\mathbb{R}$ be increasing, $p\in[1,\infty)$, and let $C_0(\mathbb{R})$ be the class of all continuous functions on $\mathbb{R}$ vanishing outside a bounded closed subset of $\mathbb{R}$.

(i) If
$$f_j,f\in\mathrm{LRS}_g^p(\mathbb{R});\qquad\lim_{j\to\infty}S_g(|f_j-f|^p)=0;\qquad h\in\mathrm{LRS}_g^{\frac p{p-1}}(\mathbb{R}),$$
then
$$\lim_{j\to\infty}S_g(|f_j|^p)=S_g(|f|^p);\qquad\lim_{j\to\infty}S_g(|f_jh-fh|)=0;\qquad\lim_{j\to\infty}\int_{\mathbb{R}}f_jh\,dm_g=\int_{\mathbb{R}}fh\,dm_g.$$

(ii) If
$$f_j\in\mathrm{LRS}_g^p(\mathbb{R})\quad\text{and}\quad\lim_{j,k\to\infty}S_g(|f_j-f_k|^p)=0,$$
then there exist a function $f\in\mathrm{LRS}_g^p(\mathbb{R})$ and a subsequence $\{f_{j_n}\}_{n=1}^\infty$ such that
$$\lim_{j\to\infty}S_g(|f_j-f|^p)=0\quad\text{and}\quad\lim_{n\to\infty}f_{j_n}(x)=f(x)\ \text{for }m_g\text{-a.e. }x\in\mathbb{R}.$$

(iii) If $f\in\mathrm{LRS}_g^p(\mathbb{R})$ and $\varepsilon\in(0,1)$, then there exists a function $h\in C_0(\mathbb{R})$ such that $S_g(|f-h|^p)<\varepsilon^p$.

Proof. (i) It suffices to handle the case $p\in(1,\infty)$, since $p(p-1)^{-1}$ is treated as $\infty$ when $p=1$. Minkowski's inequality in Remark 4.5.7 gives
$$S_g(|f_j|^p)^{\frac1p}\le S_g(|f_j-f|^p)^{\frac1p}+S_g(|f|^p)^{\frac1p}\quad\text{and}\quad S_g(|f|^p)^{\frac1p}\le S_g(|f_j-f|^p)^{\frac1p}+S_g(|f_j|^p)^{\frac1p},$$
whence the first limit result. The second and third limit assertions follow from an application of Hölder's inequality to
$$\Big|\int_{\mathbb{R}}f_jh\,dm_g-\int_{\mathbb{R}}fh\,dm_g\Big|\le\int_{\mathbb{R}}|f_j-f||h|\,dm_g\le S_g(|f_j-f|^p)^{\frac1p}\,S_g\big(|h|^{\frac p{p-1}}\big)^{\frac{p-1}p}.$$

(ii) Choose an increasing sequence $\{j_n\}_{n=1}^\infty$ in $\mathbb{N}$ such that $S_g(|f_j-f_{j_n}|^p)^{\frac1p}\le2^{-n}$ for all $j\ge j_n$. Then $F_n=\sum_{k=1}^{n-1}|f_{j_{k+1}}-f_{j_k}|$ increases with $n\ge2$, and hence, by Minkowski's inequality in Remark 4.5.7,
$$S_g(|F_n|^p)^{\frac1p}\le\sum_{k=1}^{n-1}S_g(|f_{j_{k+1}}-f_{j_k}|^p)^{\frac1p}\le\sum_{k=1}^{n-1}2^{-k}<1.$$
By Fatou's lemma (i.e., Lemma 4.3.3) there is an $F\in\mathrm{LRS}_g^p(\mathbb{R})$ such that
$$\lim_{n\to\infty}F_n(x)=F(x)\quad\text{for }m_g\text{-a.e. }x\in\mathbb{R}.$$
Accordingly, $f_{j_1}+\sum_{k=1}^\infty(f_{j_{k+1}}-f_{j_k})$ converges absolutely $m_g$-a.e. on $\mathbb{R}$, and its partial sum sequence $\{f_{j_n}\}_{n=1}^\infty$ converges $m_g$-a.e. on $\mathbb{R}$ to a function $f\in\mathrm{LRS}_g^p(\mathbb{R})$, thanks to $|f_{j_n}|\le|f_{j_1}|+F_n\le|f_{j_1}|+F$ $m_g$-a.e. on $\mathbb{R}$. A further application of Minkowski's inequality in Remark 4.5.7 gives, for $j\ge j_n$,
$$S_g(|f_j-f|^p)^{\frac1p}\le S_g(|f_j-f_{j_n}|^p)^{\frac1p}+S_g(|f_{j_n}-f|^p)^{\frac1p}\le2^{-n}+\sum_{k=n}^\infty S_g(|f_{j_{k+1}}-f_{j_k}|^p)^{\frac1p}\le2^{-n}+2^{1-n}<2^{2-n},$$

which yields the desired conclusion. (iii) For $f\in\mathrm{LRS}_g^p(\mathbb{R})$ and $j\in\mathbb{N}$ let
$$f_j(x)=\begin{cases}-j, & f(x)<-j,\\ f(x), & |f(x)|\le j,\\ j, & f(x)>j.\end{cases}$$
Then
$$|f_j(x)-f(x)|\le|f(x)|\quad\text{and}\quad\lim_{j\to\infty}|f_j(x)-f(x)|=0\ \text{for }m_g\text{-a.e. }x\in\mathbb{R},$$
and hence, for any $\varepsilon\in(0,1)$, Theorem 4.3.1 provides a $j\in\mathbb{N}$ with $S_g(|f_j-f|^p)<(2^{-1}\varepsilon)^p$. At the same time, there is a step function $s$ on $\mathbb{R}$ with $|s|\le j$ such that
$$S_g(|s-f_j|^p)^{\frac1p}\le(2j)^{1-\frac1p}\Big(\int_{\mathbb{R}}|s-f_j|\,dm_g\Big)^{\frac1p}\le2^{-1}\varepsilon.$$


Accordingly, Minkowski's inequality in Remark 4.5.7 yields
$$S_g(|f-s|^p)^{\frac1p}\le S_g(|f-f_j|^p)^{\frac1p}+S_g(|f_j-s|^p)^{\frac1p}<\varepsilon.$$
Note that $s=\sum_{k=1}^nc_k1_{I_k}$, where $\{c_k\}_{k=1}^n$ are constants and $\{I_k\}_{k=1}^n$ are disjoint finite intervals in $\mathbb{R}$. So, if each $1_{I_k}$ can be approximated by $C_0(\mathbb{R})$-functions, the argument is finished. Without loss of generality (see also the argument for Theorem 2.4.13), we may assume $I_k=[0,1]$. For any $\varepsilon>0$ let
$$\varphi_\varepsilon(x)=\begin{cases}0, & x\in(-\infty,-\varepsilon),\\ 1+\varepsilon^{-1}x, & x\in[-\varepsilon,0),\\ 1, & x\in[0,1],\\ \varepsilon^{-1}(1+\varepsilon-x), & x\in(1,1+\varepsilon],\\ 0, & x\in(1+\varepsilon,\infty).\end{cases}$$
Then $\varphi_\varepsilon\in C_0(\mathbb{R})$, $1_{[0,1]}\le\varphi_\varepsilon$ and
$$\int_{\mathbb{R}}|\varphi_\varepsilon-1_{[0,1]}|^p\,dm_g\le m_g([-\varepsilon,0))+m_g((1,1+\varepsilon]).$$
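For Lebesgue measure ($g=\mathrm{id}$) the bound just stated can be checked directly. Each linear ramp of $\varphi_\varepsilon$ contributes $\int_0^\varepsilon(s/\varepsilon)^p\,ds=\varepsilon/(p+1)$, so $\int_{\mathbb{R}}|\varphi_\varepsilon-1_{[0,1]}|^p\,dx=2\varepsilon/(p+1)\le2\varepsilon$. The sketch below compares a midpoint-rule value with that closed form. The grid size and the sample values of $\varepsilon$ are arbitrary choices.

```python
def phi(eps, x):
    """The piecewise linear function phi_eps from the proof of Theorem 4.5.10 (iii)."""
    if x < -eps or x > 1 + eps:
        return 0.0
    if x < 0:
        return 1 + x / eps
    if x <= 1:
        return 1.0
    return (1 + eps - x) / eps


def indicator01(x):
    """The characteristic function of [0, 1]."""
    return 1.0 if 0 <= x <= 1 else 0.0


def lebesgue_error(eps, p, n=60000):
    """Midpoint-rule value of the integral of |phi_eps - 1_[0,1]|^p over [-1, 2]
    with respect to Lebesgue measure (g = id); for eps <= 1 the integrand
    vanishes outside this interval."""
    a, b = -1.0, 2.0
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h
        total += abs(phi(eps, x) - indicator01(x)) ** p
    return total * h


p = 2.0
for eps in (0.5, 0.1, 0.01):
    # numerical value, closed form 2*eps/(p+1), and the bound m([-eps,0)) + m((1,1+eps]) = 2*eps
    print(eps, lebesgue_error(eps, p), 2 * eps / (p + 1), 2 * eps)
```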

Note that the right-hand side of the last inequality tends to zero as $\varepsilon\to0^+$, since $\cap_{\varepsilon>0}[-\varepsilon,0)=\emptyset=\cap_{\varepsilon>0}(1,1+\varepsilon]$. So the desired conclusion follows.

The next application of the Fubini-Tonelli theorem provides a very useful way of evaluating a Lebesgue-Radon-Stieltjes integral via the classical Lebesgue integral, i.e., of reducing a problem about the integral of a general function to one about the integration of characteristic functions. Theorem 4.5.11. Let $g:\mathbb{R}\to\mathbb{R}$ be increasing and $p\in(0,\infty)$. If $f:\mathbb{R}\to\mathbb{R}^e$ is $m_g$-measurable, then
$$\int_{\mathbb{R}}|f|^p\,dm_g=p\int_0^\infty m_g(\{x\in\mathbb{R}:|f(x)|>t\})\,t^{p-1}\,dt.$$


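Proof. Using the Fubini-Tonelli theorem we have
$$\int_{\mathbb{R}}|f|^p\,dm_g=\int_{\mathbb{R}}\Big(\int_0^{|f|}dt^p\Big)dm_g=p\int_{\mathbb{R}}\Big(\int_0^\infty1_{\{0<t<|f|\}}\,t^{p-1}\,dt\Big)dm_g=p\int_0^\infty m_g(\{x\in\mathbb{R}:|f(x)|>t\})\,t^{p-1}\,dt.$$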

Remark 4.5.12. For a fixed point c ∈ R, let  0, xt} (c) dt.
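For a purely atomic $m_g$ the distribution function $t\mapsto m_g(\{|f|>t\})$ is a step function. The right-hand side of Theorem 4.5.11 can therefore be integrated exactly, piece by piece: on each interval between consecutive values $v_{k-1}<v_k$ of $|f|$ the set $\{|f|>t\}$ equals $\{|f|\ge v_k\}$, and $p\int_{v_{k-1}}^{v_k}t^{p-1}\,dt=v_k^p-v_{k-1}^p$. The sketch compares this with the direct weighted sum. The atoms, weights and exponents are made-up data.

```python
def direct_integral(values, weights, p):
    """Left-hand side: sum_i w_i |f(x_i)|^p for the atomic measure."""
    return sum(w * abs(v) ** p for v, w in zip(values, weights))


def layer_cake(values, weights, p):
    """Right-hand side: p * int_0^inf m_g({|f| > t}) t^(p-1) dt, integrated exactly.

    Between consecutive distinct levels lo < hi of |f| the set {|f| > t} equals
    {|f| >= hi}, and p * int_lo^hi t^(p-1) dt = hi^p - lo^p.
    """
    levels = sorted({abs(v) for v in values} | {0.0})
    total = 0.0
    for lo, hi in zip(levels, levels[1:]):
        mass = sum(w for v, w in zip(values, weights) if abs(v) >= hi)
        total += mass * (hi ** p - lo ** p)
    return total


values = [-2.0, 0.5, 3.0, 0.0, 3.0]   # hypothetical values f(x_i)
weights = [1.0, 0.25, 2.0, 5.0, 0.5]  # hypothetical atom masses m_g({x_i})
for p in (0.5, 1.0, 2.0, 3.7):
    print(p, direct_integral(values, weights, p), layer_cake(values, weights, p))
```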

Even more interesting and important is that Theorem 4.5.11 suggests a useful approach to defining an integral on an abstract set. Definition 4.5.13. For a given set $X$ let $\mathcal{P}(X)$ be the collection of all subsets of $X$ and $\mu:\mathcal{P}(X)\to[0,\infty]$ an increasing set function; that is, $\mu$ satisfies both $\mu(\emptyset)=0$ and $\mu(E_1)\le\mu(E_2)$ for $E_1\subseteq E_2\subseteq X$. Suppose $f:X\to[0,\infty]$. Then $\mu(\{x\in X:f(x)>t\})$ is a decreasing (i.e., nonincreasing) function of $t\in(0,\infty)$, and hence
$$\int_Xf\,d\mu=\int_0^\infty\mu(\{x\in X:f(x)>t\})\,dt$$
can be used to define the integral of $f$ on $X$. In general, for $f:X\to\mathbb{R}^e$ we write
$$\int_Xf\,d\mu=\int_Xf^+\,d\mu-\int_Xf^-\,d\mu.$$
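When $X$ is finite and $f$ takes finitely many finite values, the integral of Definition 4.5.13 reduces to a finite sum. On each interval $[v_{k-1},v_k)$ between consecutive values (with $v_0=0$) we have $\{f>t\}=\{f\ge v_k\}$. The sketch below implements that sum; the function name and the three set functions are illustrative. For an additive $\mu$ it returns the ordinary weighted sum. For the increasing but non-additive $\mu(E)=1$ when $E\ne\emptyset$, it returns $\max f$.

```python
def choquet(f, mu):
    """Integral of f >= 0 over a finite X with respect to an increasing set function mu,
    following Definition 4.5.13: int_0^inf mu({f > t}) dt.

    f is a dict {point: finite value >= 0}; mu takes a frozenset of points.
    For t in [v_{k-1}, v_k) between consecutive levels (v_0 = 0) we have
    {f > t} = {f >= v_k}, so the integral is a finite sum.
    """
    levels = sorted(set(f.values()) | {0.0})
    total = 0.0
    for lo, hi in zip(levels, levels[1:]):
        upper_set = frozenset(x for x, v in f.items() if v >= hi)
        total += (hi - lo) * mu(upper_set)
    return total


X = ("a", "b", "c")
f = {"a": 1.0, "b": 4.0, "c": 2.5}
weights = {"a": 0.5, "b": 1.0, "c": 2.0}

additive = lambda E: sum(weights[x] for x in E)  # a genuine (finite) measure
nonempty = lambda E: 1.0 if E else 0.0           # increasing but not additive
square = lambda E: (len(E) / len(X)) ** 2        # increasing but not additive

print(choquet(f, additive))  # 9.5, the weighted sum 0.5*1 + 1*4 + 2*2.5
print(choquet(f, nonempty))  # 4.0, the maximum of f
print(choquet(f, square))
```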

Theorem 4.5.14. The integral introduced in Definition 4.5.13 enjoys the following four properties.


(i) If $f_1,f_2:X\to[0,\infty]$ and $f_1\le f_2$, then $\int_Xf_1\,d\mu\le\int_Xf_2\,d\mu$.

(ii) $\int_X\min\{f,c\}\,d\mu+\int_X(f-c)^+\,d\mu=\int_Xf\,d\mu$ for any $c\in[0,\infty)$.

(iii) $\int_Xf\,d\mu=\lim_{\varepsilon\to0^+}\int_X\min\{(f-\varepsilon)^+,\varepsilon^{-1}\}\,d\mu$.

(iv) $\int_Xcf\,d\mu=c\int_Xf\,d\mu$ for $c\in[0,\infty)$.

Conversely, suppose a functional $L$ with values in $[0,\infty]$, defined on the family of all functions from $X$ to $[0,\infty]$, obeys (i)-(ii)-(iii)-(iv) above. If $\mu(E)=L(1_E)$ for every $E\in\mathcal{P}(X)$, then
$$L(f)=\int_Xf\,d\mu\quad\text{for all }f:X\to[0,\infty].$$

Proof. (i) This follows from the definition. (ii) This follows from the two formulas
$$\int_X\min\{f,c\}\,d\mu=\int_0^c\mu(\{x\in X:f(x)>t\})\,dt\quad\text{and}\quad\int_X(f-c)^+\,d\mu=\int_c^\infty\mu(\{x\in X:f(x)>t\})\,dt.$$
(iii) This follows from the monotonicity of the function $t\mapsto\mu(\{x\in X:f(x)>t\})$ and the calculation
$$\int_X\min\{(f-\varepsilon)^+,\varepsilon^{-1}\}\,d\mu=\int_\varepsilon^{\varepsilon+\varepsilon^{-1}}\mu(\{x\in X:f(x)>t\})\,dt.$$
(iv) This follows from the substitution $t=cs$. For the converse, suppose that $L$ satisfies (i)-(ii)-(iii)-(iv) and set $\mu(E)=L(1_E)$ for $E\in\mathcal{P}(X)$. Fix $f:X\to[0,\infty]$. Then
$$f\ge\varepsilon1_{\{x\in X:f(x)>\varepsilon\}}\quad\text{for any }\varepsilon>0,$$
and hence we may assume that $\mu(\{x\in X:f(x)>\varepsilon\})$ is finite; otherwise it follows from (i) and (iv) that
$$L(f)\ge\varepsilon L\big(1_{\{x\in X:f(x)>\varepsilon\}}\big)=\infty=\int_Xf\,d\mu,$$
so that the above inequality becomes an equality.


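Given $\varepsilon>0$ and $n\in\mathbb{N}$, set
$$c=\varepsilon+\varepsilon^{-1};\qquad T_\varepsilon(f)=\min\{(f-\varepsilon)^+,\varepsilon^{-1}\};\qquad X_{n,j}=\{x\in X:T_\varepsilon(f)(x)>jcn^{-1}\}\ \text{for }j\in\{0,\dots,n\}.$$
Using (ii) we obtain
$$L\big(T_\varepsilon(f)\big)-\int_XT_\varepsilon(f)\,d\mu=\sum_{j=0}^{n-1}\Big(L\big(\min\{T_\varepsilon(f)-jcn^{-1},cn^{-1}\}1_{X_{n,j}}\big)-\int_X\min\{T_\varepsilon(f)-jcn^{-1},cn^{-1}\}1_{X_{n,j}}\,d\mu\Big).$$
Meanwhile, using (i) and (iv) we find the two estimates
$$cn^{-1}\mu(X_{n,j+1})\le L\big(\min\{T_\varepsilon(f)-jcn^{-1},cn^{-1}\}1_{X_{n,j}}\big)\le cn^{-1}\mu(X_{n,j})$$
and
$$cn^{-1}\mu(X_{n,j+1})\le\int_X\min\{T_\varepsilon(f)-jcn^{-1},cn^{-1}\}1_{X_{n,j}}\,d\mu\le cn^{-1}\mu(X_{n,j}).$$
Therefore, since $X_{n,n}=\emptyset$,
$$\Big|L\big(T_\varepsilon(f)\big)-\int_XT_\varepsilon(f)\,d\mu\Big|\le cn^{-1}\mu(X_{n,0})=cn^{-1}\mu(\{x\in X:f(x)>\varepsilon\}).$$
Since $\varepsilon>0$ and $n\in\mathbb{N}$ are arbitrary, this last estimate, along with (iii), ensures $L(f)=\int_Xf\,d\mu$.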
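Properties (ii)-(iv) of Theorem 4.5.14 can be observed numerically for a non-additive set function. The sketch uses the hypothetical choice $\mu(E)=(|E|/|X|)^2$ on a six-point set. It repeats the finite-sum evaluation of Definition 4.5.13 so that it runs on its own; the function $f$ and the constant $c=2$ are arbitrary.

```python
def choquet(f, mu):
    """int_0^inf mu({f > t}) dt for a finite-valued f >= 0 on a finite set (Definition 4.5.13)."""
    levels = sorted(set(f.values()) | {0.0})
    return sum((hi - lo) * mu(frozenset(x for x, v in f.items() if v >= hi))
               for lo, hi in zip(levels, levels[1:]))


X = tuple(range(6))
mu = lambda E: (len(E) / len(X)) ** 2  # hypothetical increasing, non-additive set function
f = dict(zip(X, (0.0, 0.7, 1.9, 3.2, 3.2, 5.0)))
c = 2.0

lower = {x: min(v, c) for x, v in f.items()}        # min{f, c}
upper = {x: max(v - c, 0.0) for x, v in f.items()}  # (f - c)^+
print(choquet(lower, mu) + choquet(upper, mu), choquet(f, mu))           # property (ii)
print(choquet({x: 3 * v for x, v in f.items()}, mu), 3 * choquet(f, mu))  # property (iv)

for eps in (0.5, 0.1, 0.01):  # property (iii): the truncations approach the integral
    trunc = {x: min(max(v - eps, 0.0), 1 / eps) for x, v in f.items()}
    print(eps, choquet(trunc, mu))
```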

As a consequence of Theorem 4.5.14, we can establish the following properties of the above-defined integral. Corollary 4.5.15. For a set $X$ let $\mu:\mathcal{P}(X)\to[0,\infty]$ be an increasing set function, $f:X\to[0,\infty]$ and $E\subseteq X$.

(i) If $c\in[0,\infty)$, then
$$\int_X(f+c)1_E\,d\mu=\int_Xf1_E\,d\mu+c\mu(E).$$
(ii) If $\mu$ is subadditive, that is, $\mu(E_1\cup E_2)\le\mu(E_1)+\mu(E_2)$ for all $E_1,E_2\subseteq X$, then
$$\int_Xf1_S\,d\mu\le\int_Xf1_{S\cap E}\,d\mu+\int_Xf1_{S\setminus E}\,d\mu\quad\text{for all }S\subseteq X,$$
with equality if $E$ is $\mu$-measurable, that is, $\mu(A)=\mu(A\cap E)+\mu(A\setminus E)$ for all $A\subseteq X$.


Proof. (i) This follows from Theorem 4.5.14 (ii) applied to
$$\min\{(f+c)1_E,c\}=c1_E\quad\text{and}\quad\big((f+c)1_E-c\big)^+=f1_E.$$
(ii) Using both the subadditivity of $\mu$ and the equality $S=(S\cap E)\cup(S\setminus E)$ for any $S\subseteq X$, we have
$$\begin{aligned}\int_Xf1_S\,d\mu&=\int_0^\infty\mu(\{x\in X:1_S(x)f(x)>t\})\,dt\\&\le\int_0^\infty\mu(\{x\in X:1_{S\cap E}(x)f(x)>t\})\,dt+\int_0^\infty\mu(\{x\in X:1_{S\setminus E}(x)f(x)>t\})\,dt\\&=\int_Xf1_{S\cap E}\,d\mu+\int_Xf1_{S\setminus E}\,d\mu.\end{aligned}$$
Equality occurs when $E$ is $\mu$-measurable: applying the definition with $A=\{x\in X:1_S(x)f(x)>t\}$ gives $A\cap E=\{x\in X:1_{S\cap E}(x)f(x)>t\}$ and $A\setminus E=\{x\in X:1_{S\setminus E}(x)f(x)>t\}$ for every $t>0$.

Remark 4.5.16. We can develop a measure-theoretic theory of integration by establishing countable subadditivity/additivity, the dominated convergence theorem, the monotone convergence theorem, Fatou's lemma, the Fubini-Tonelli theorem and so on.

Problems

4.1. Let $g:\mathbb{R}\to\mathbb{R}$ be determined by  x, x 0 Find $m_g(I)$ when $I=(0,1)$; $[0,1]$; $(-1,1)$; $[0,0]$.

4.2. Suppose g : R → R is an increasing function.

(i) Prove that if $S_1$ and $S_2$ are disjoint simple subsets of $\mathbb{R}$, then $m_g(S_1\cup S_2)=m_g(S_1)+m_g(S_2)$. Construct examples to show that if $S_1$ and $S_2$ are not disjoint, then $m_g(S_1\cup S_2)$ may or may not equal $m_g(S_1)+m_g(S_2)$.

(ii) Prove that if $S_1$ and $S_2$ are simple subsets of $\mathbb{R}$ such that $S_1\subseteq S_2$, then $m_g(S_2\setminus S_1)=m_g(S_2)-m_g(S_1)$. Construct examples to show that if $S_1$ and $S_2$ are simple with $S_1\not\subseteq S_2$, then $m_g(S_2\setminus S_1)$ may or may not equal $m_g(S_2)-m_g(S_1)$.

4.3. Let $g:\mathbb{R}\to\mathbb{R}$ be given by
$$g(x)=\begin{cases}2^{-1}x, & x<0,\\ 1, & x\ge0.\end{cases}$$

Decide whether or not the following five functions are step functions on $\mathbb{R}$. If the answer is yes, then find the corresponding $A_g(\cdot)$:

(i) $s_1(x)=\begin{cases}0, & x\in\mathbb{R}\setminus(-\infty,-1),\\ 1, & x\in[-1,0);\end{cases}$

(ii) $s_2(x)=\begin{cases}-2, & x\in[-1,0),\\ 1, & x\in[0,1],\\ 0, & x\in[-1,1];\end{cases}$

(iii) $s_3(x)=\begin{cases}2, & x\in[-1,3],\\ 1, & x\in(3,\infty),\\ 0, & x\in\mathbb{R}\setminus(-\infty,-1);\end{cases}$

(iv) $s_4(x)=\begin{cases}3, & x\in(-2,-1),\\ -3, & x\in[-1,1),\\ 0, & x\in\mathbb{R}\setminus(-2,1);\end{cases}$

(v) $s_5(x)=\begin{cases}-1, & x\in(-\infty,0),\\ 1, & x\in[0,\infty).\end{cases}$

4.4. Give a sequence of step functions $\{s_k\}_{k=1}^\infty$ on $\mathbb{R}$ such that $f=\sum_{k=1}^\infty s_k$ is not a step function and $S_{\mathrm{id}}(f)=0$.

4.5. Let $f:\mathbb{R}\to\mathbb{R}$ be the function
$$f(x)=\begin{cases}n^{-1}, & x\in[n-1,n-2^{-1}),\ n\in\mathbb{N},\\ -n^{-1}, & x\in[n-2^{-1},n),\ n\in\mathbb{N},\\ 0, & x\in(-\infty,0).\end{cases}$$
(i) Evaluate $\int_0^bf(x)\,dx$ for any $b>0$.

(ii) Prove that $\int_0^\infty f(x)\,dx$ exists.

(iii) Prove that $\lim_{b\to\infty}\int_0^b|f(x)|\,dx$ and $\int_{\mathbb{R}}1_{[0,\infty)}f\,dm_{\mathrm{id}}$ do not exist.

4.6. Prove the following mean value inequality for Lebesgue-Radon-Stieltjes integrals. Let $g:\mathbb{R}\to\mathbb{R}$ be increasing and $I\subseteq\mathbb{R}$ a finite interval. If $1_If\in\mathrm{LRS}_g(\mathbb{R})$ and $c_1\le f\le c_2$ on $I$ for some finite constants $c_1,c_2$, then $c_1m_g(I)\le\int_{\mathbb{R}}f1_I\,dm_g\le c_2m_g(I)$.

4.7. (i) Let $g:\mathbb{R}\to\mathbb{R}$ be increasing. Show that if $f_1,f_2,\dots,f_n\in\mathrm{LRS}_g(\mathbb{R})$, then
$$\Big|\int_{\mathbb{R}}\sum_{k=1}^nf_k\,dm_g\Big|\le\sum_{k=1}^n\int_{\mathbb{R}}|f_k|\,dm_g.$$

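(ii) Let $g=\sum_{k=1}^nc_kg_k$, where each $g_k:\mathbb{R}\to\mathbb{R}$ is increasing and each $c_k\ge0$. Prove that
$$f\in\cap_{k=1}^n\mathrm{LRS}_{g_k}(\mathbb{R})\ \Rightarrow\ f\in\mathrm{LRS}_g(\mathbb{R})\quad\text{and}\quad\int_{\mathbb{R}}f\,dm_g=\sum_{k=1}^nc_k\int_{\mathbb{R}}f\,dm_{g_k}.$$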

4.8. (i) Show that the union of two $m_g$-null sets is an $m_g$-null set. (ii) Give an example of a finite subset $E\subseteq\mathbb{R}$ and an increasing function $g:\mathbb{R}\to\mathbb{R}$ such that $E$ is not an $m_g$-null set. (iii) Let $g:\mathbb{R}\to\mathbb{R}$ be increasing. Verify that if $f_1,f_2\in\mathrm{LRS}_g(\mathbb{R})$ and $f_1\le f_2$ $m_g$-a.e. on $\mathbb{R}$, then $\int_{\mathbb{R}}f_1\,dm_g\le\int_{\mathbb{R}}f_2\,dm_g$.

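4.9. Let $g:\mathbb{R}\to\mathbb{R}$ be an increasing function. Suppose that $\{f_k\}_{k=1}^{\infty}$ is a sequence in $\mathrm{LRS}_g(\mathbb{R})$ and that $\sum_{k=1}^{\infty}f_k=f$ exists $m_g$-a.e. on $\mathbb{R}$. Prove the following results.
(i) If there is a function $F\in\mathrm{LRS}_g(\mathbb{R})$ such that $\left|\sum_{k=1}^{n}f_k\right|\le F$ on $\mathbb{R}$ for every $n\in\mathbb{N}$, then
$$f\in\mathrm{LRS}_g(\mathbb{R})\quad\text{and}\quad\int_{\mathbb{R}}f\,dm_g=\sum_{k=1}^{\infty}\int_{\mathbb{R}}f_k\,dm_g.$$
(ii) If $f_k\ge0$ or $f_k\le0$ on $\mathbb{R}$ and $\sum_{k=1}^{\infty}\int_{\mathbb{R}}f_k\,dm_g$ converges, then
$$f\in\mathrm{LRS}_g(\mathbb{R})\quad\text{and}\quad\int_{\mathbb{R}}f\,dm_g=\sum_{k=1}^{\infty}\int_{\mathbb{R}}f_k\,dm_g.$$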

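4.10. For $n\in\mathbb{N}$ let
$$f_n(x)=\begin{cases}1, & x\in[n,n+1);\\ 0, & x\in\mathbb{R}\setminus[n,n+1).\end{cases}$$
(i) Find $f=\lim_{n\to\infty}f_n$. (ii) Prove
$$\lim_{n\to\infty}\int_{\mathbb{R}}f_n\,dm_{\mathrm{id}}\ne\int_{\mathbb{R}}f\,dm_{\mathrm{id}}.$$
(iii) Point out which hypothesis of the monotone convergence theorem is not valid in this case, and write down the reasoning why it is not valid.

4.11. If $\alpha>0$, prove the following two formulas:
(i) $$\lim_{n\to\infty}\int_{\mathbb{R}}1_{[0,n]}(x)\left(1-n^{-1}x\right)^{n}|x|^{\alpha-1}\,dm_{\mathrm{id}}(x)=\int_{\mathbb{R}}1_{(0,\infty)}(x)e^{-x}|x|^{\alpha-1}\,dm_{\mathrm{id}}(x);$$
(ii) $$\sum_{n=1}^{\infty}\int_{\mathbb{R}}1_{(0,\infty)}(x)\frac{|x|^{\alpha}}{e^{nx}}\,dm_{\mathrm{id}}(x)=\int_{\mathbb{R}}1_{(0,\infty)}(x)\frac{|x|^{\alpha}}{e^{x}-1}\,dm_{\mathrm{id}}(x)=\Gamma(1+\alpha)\sum_{n=1}^{\infty}n^{-1-\alpha}.$$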
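As a hedged numerical sanity check of the identity in 4.11(ii) (it proves nothing), the Python sketch below takes the sample value $\alpha=3/2$ and compares a composite Simpson approximation of $\int_0^{\infty}x^{\alpha}(e^x-1)^{-1}\,dx$ with $\Gamma(1+\alpha)\sum_{n\ge1}n^{-1-\alpha}$. The truncation point $60$, the step count and the integral-based tail correction for the series are illustrative assumptions, not part of the exercise.

```python
import math

def integrand(x: float, alpha: float) -> float:
    # x^alpha / (e^x - 1); its limit at x = 0 is 0 when alpha > 1
    if x == 0.0:
        return 0.0
    return x ** alpha / math.expm1(x)

def simpson(func, a: float, b: float, n: int) -> float:
    # composite Simpson rule on [a, b] with an even number n of subintervals
    if n % 2:
        n += 1
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

def zeta_estimate(s: float, terms: int) -> float:
    # partial sum of k^{-s} plus the integral estimate terms^{1-s}/(s-1) of the tail
    partial = math.fsum(k ** (-s) for k in range(1, terms + 1))
    return partial + terms ** (1 - s) / (s - 1)

alpha = 1.5                                    # sample value with alpha > 1
lhs = simpson(lambda x: integrand(x, alpha), 0.0, 60.0, 200_000)
rhs = math.gamma(1 + alpha) * zeta_estimate(1 + alpha, 100_000)
print(lhs, rhs)                                # both close to 1.7833
```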

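4.12. Prove the following formulas:
(i) $$\int_{\mathbb{R}}1_{(0,1]}(x)|x|^{-p}\,dm_{\mathrm{id}}(x)=\begin{cases}(1-p)^{-1}, & p\in(0,1);\\ \infty, & p\in[1,\infty);\end{cases}$$
(ii) $$\lim_{n\to\infty}\int_{\mathbb{R}}1_{(0,\infty)}(x)|x|^{-\frac{1}{n}}\left(1+n^{-1}|x|\right)^{-n}\,dm_{\mathrm{id}}(x)=1.$$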
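The limit in 4.12(ii) can also be watched numerically. A short computation, not taken from the text, shows that after the substitution $x=nt$ the $n$-th integral equals $n^{a}B(a,n-a)$ with $a=1-n^{-1}$ and $B$ the Beta function. The Python sketch below evaluates this closed form through `math.lgamma` (the helper name `integral_n` is hypothetical) and is meant only to illustrate the claimed convergence to $1$; it is valid for $n\ge2$, since the integral diverges when $n=1$.

```python
import math

def integral_n(n: int) -> float:
    # I_n = ∫_0^∞ x^{-1/n} (1 + x/n)^{-n} dx for n >= 2.
    # Substituting x = n t gives I_n = n^a B(a, n - a) with a = 1 - 1/n,
    # and B(p, q) = Γ(p)Γ(q)/Γ(p + q); log-gamma avoids overflow for large n.
    if n < 2:
        raise ValueError("the integral diverges for n = 1")
    a = 1.0 - 1.0 / n
    log_beta = math.lgamma(a) + math.lgamma(n - a) - math.lgamma(n)
    return math.exp(a * math.log(n) + log_beta)

for n in (2, 10, 100, 10_000, 100_000):
    print(n, integral_n(n))
```

The printed values decrease towards $1$, in line with the dominated convergence argument the exercise asks for.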

4.13. For each $n\in\mathbb{N}$ and all $x\in\mathbb{R}$, let $f_n(x)=e^{-nx}-2e^{-2nx}$.

(i) Prove that $f(x)=\sum_{n=1}^{\infty}f_n(x)$ is convergent for all $x>0$, and calculate $f(x)$. (ii) Prove that each $1_{(0,\infty)}f_n$, as well as $1_{(0,\infty)}f$, is Lebesgue integrable on $\mathbb{R}$. (iii) Compare
$$\int_{\mathbb{R}}1_{(0,\infty)}(x)f(x)\,dm_{\mathrm{id}}(x)\quad\text{and}\quad\sum_{n=1}^{\infty}\int_{\mathbb{R}}1_{(0,\infty)}(x)f_n(x)\,dm_{\mathrm{id}}(x).$$

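4.14. (i) For each $n\in\mathbb{N}$ let
$$f_n=\begin{cases}1_{[0,\frac{1}{4}]}, & n\ \text{is even};\\ 1_{[\frac{1}{4},1]}, & n\ \text{is odd}.\end{cases}$$
Calculate the following four numbers:
$$N_1=\lim_{k\to\infty}\inf_{n\ge k}\int_0^{\infty}f_n(x)\,dx;\qquad N_2=\int_0^{\infty}\lim_{k\to\infty}\inf_{n\ge k}f_n(x)\,dx;$$
$$N_3=\lim_{k\to\infty}\sup_{n\ge k}\int_0^{\infty}f_n(x)\,dx;\qquad N_4=\int_0^{\infty}\lim_{k\to\infty}\sup_{n\ge k}f_n(x)\,dx.$$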

(ii) If $\{f_n\}_{n=1}^{\infty}$ is a sequence of positive Lebesgue measurable functions, what can be said of the four numbers above, and more particularly about $N_3$ and $N_4$?

4.15. Let $g(x)=x$ on $\mathbb{R}$. If $f\in\mathrm{LRS}_g(0,\infty)$ is uniformly continuous on $(0,\infty)$, prove $\lim_{x\to\infty}f(x)=0$.

4.16. Let $g:\mathbb{R}\to\mathbb{R}$ be increasing, let $f:[a,b]\times[c,d]\to\mathbb{R}$ satisfy $f(\cdot,y)\in\mathrm{LRS}_g[a,b]$ for every $y\in[c,d]$, and set
$$F(y)=\int_{[a,b]}f(x,y)\,dm_g(x).$$
(i) If there is an $h\in\mathrm{LRS}_g([a,b])$ such that $|f(x,y)|\le h(x)$ for all $(x,y)\in[a,b]\times[c,d]$, and if $\lim_{y\to y_0}f(x,y)=f(x,y_0)$ for all $(x,y_0)\in[a,b]\times[c,d]$, then prove that $\lim_{y\to y_0}F(y)=F(y_0)$; in particular, if $f(x,\cdot)$ is continuous for every $x\in[a,b]$, then $F$ is continuous. (ii) If $\partial_yf(x,y)$ exists and there is an $h\in\mathrm{LRS}_g([a,b])$ such that $|\partial_yf(x,y)|\le h(x)$ for all $(x,y)\in[a,b]\times[c,d]$, then prove that $F$ is differentiable and
$$F'(y)=\int_{[a,b]}\partial_yf(\cdot,y)\,dm_g(\cdot)\quad\text{for any }y\in[c,d].$$

4.17. Let $g:\mathbb{R}\to\mathbb{R}$ be increasing and $\mathcal{M}_g$ be the σ-ring of all $m_g$-measurable sets. Prove that if $\{E_j\}_{j=1}^{\infty}$ is a sequence of sets in $\mathcal{M}_g$, then $m_g(\cup_{j=1}^{\infty}E_j)\le\sum_{j=1}^{\infty}m_g(E_j)$.

4.18. Given an increasing function $g:\mathbb{R}\to\mathbb{R}$, for an $m_g$-measurable set $E\subseteq\mathbb{R}$ let $L_g(E)=\int_Edm_g<\infty$. If $1_Ef:\mathbb{R}\to\mathbb{R}_e$ is $m_g$-measurable and $E_j=\{x\in E:j-1\le f(x)<j\}$ for each $j\in\mathbb{Z}$, prove that
$$\int_E|f|\,dm_g<\infty\iff\sum_{j=-\infty}^{\infty}|j|\,m_g(E_j)<\infty.$$

4.19. Let $g:\mathbb{R}\to\mathbb{R}$ be increasing. Prove that $f:\mathbb{R}\to\mathbb{R}_e$ is $m_g$-measurable if and only if $\{x\in\mathbb{R}:f(x)\ge c\}$ is $m_g$-measurable for any $c\in\mathbb{R}$.

4.20. Let $g:\mathbb{R}\to\mathbb{R}$ be increasing. A sequence $\{f_n\}_{n=1}^{\infty}$ of $m_g$-measurable functions on $\mathbb{R}$ is said to be:
(i) convergent to $f$ in the Lebesgue-Radon-Stieltjes $g$-measure provided that
$$\lim_{n\to\infty}m_g(\{x\in\mathbb{R}:|f_n(x)-f(x)|\ge\varepsilon\})=0\quad\text{for any }\varepsilon>0;$$
(ii) a Cauchy sequence in the Lebesgue-Radon-Stieltjes $g$-measure provided that
$$\lim_{k,n\to\infty}m_g(\{x\in\mathbb{R}:|f_k(x)-f_n(x)|\ge\varepsilon\})=0\quad\text{for any }\varepsilon>0.$$
Prove the following results.
(iii) If $\lim_{n\to\infty}\int_{\mathbb{R}}|f_n-f|\,dm_g=0$, then $f_n\to f$ in the Lebesgue-Radon-Stieltjes $g$-measure $m_g$.
(iv) If $\{f_n\}_{n=1}^{\infty}$ is a Cauchy sequence in the Lebesgue-Radon-Stieltjes $g$-measure, then there exists an $m_g$-a.e. unique function $f$ such that $\{f_n\}_{n=1}^{\infty}$ is convergent to $f$ in $m_g$, and hence there exists a subsequence $\{f_{n_k}\}_{k=1}^{\infty}$ of $\{f_n\}_{n=1}^{\infty}$ which is convergent to $f$ $m_g$-a.e. on $\mathbb{R}$.
(v) If $\{f_n\}_{n=1}^{\infty}$ is a sequence of nonnegative $m_g$-measurable functions which converges to $f$ $m_g$-a.e. on $\mathbb{R}$, then $\lim_{n\to\infty}\int_{\mathbb{R}}f_n\,dm_g=0\Rightarrow f=0$ $m_g$-a.e. on $\mathbb{R}$.
(vi) Give an example of $g$ to show that $f_n\to f$ $m_g$-a.e. on $\mathbb{R}$ does not ensure $f_n\to f$ in the $g$-measure $m_g$.

4.21. For p ∈ (0, 1) let f 1 (x) = |x|−p 1{x∈R: |x| 0. Then:

(i) there is a union $F$ of finitely many intervals such that $m_{\mathrm{id}}\big((E\setminus F)\cup(F\setminus E)\big)<\varepsilon$;
(ii) there is an open set $O\supseteq E$ such that $0\le m_{\mathrm{id}}(O)-m_{\mathrm{id}}(E)<\varepsilon$;
(iii) there are an $N\in\mathbb{N}$ and a decreasing sequence of open sets $\{O_j\}_{j=1}^{\infty}$ such that $O_j\supseteq E$ and $0\le m_{\mathrm{id}}(O_j)-m_{\mathrm{id}}(E)<\varepsilon$ when $j\ge N$.

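Proof. (i) Since $1_E$ is $m_{\mathrm{id}}$-measurable, there is a sequence of step functions $\{\psi_j\}_{j=1}^{\infty}$ such that $1_E=\lim_{j\to\infty}\psi_j$ $m_{\mathrm{id}}$-a.e. on $E$. Upon choosing
$$\phi_j(x)=\begin{cases}1, & \psi_j(x)\ge2^{-1};\\ 0, & \psi_j(x)<2^{-1},\end{cases}$$
we read off that $\lim_{j\to\infty}\phi_j=1_E$ $m_{\mathrm{id}}$-a.e. on $E$. Note that $\phi_j$ is the indicator $1_{F_j}$ of a set $F_j$ which is a union of finitely many intervals. So, we have
$$\max\{\phi_j-1_E,0\}=1_{F_j\setminus E};\qquad\max\{1_E-\phi_j,0\}=1_{E\setminus F_j};\qquad|\phi_j-1_E|=1_{F_j\setminus E}+1_{E\setminus F_j}.$$
Since
$$\lim_{j\to\infty}\big(m_{\mathrm{id}}(E\setminus F_j)+m_{\mathrm{id}}(F_j\setminus E)\big)=\lim_{j\to\infty}\int_{\mathbb{R}}|\phi_j-1_E|\,dm_{\mathrm{id}}=0,$$
there is an $n_0\in\mathbb{N}$ such that $m_{\mathrm{id}}(E\setminus F_{n_0})+m_{\mathrm{id}}(F_{n_0}\setminus E)<\varepsilon$, as desired.

(ii) In accordance with (i), for each $j\in\mathbb{N}$ we can get an open set $F_j$ such that $m_{\mathrm{id}}\big((E\setminus F_j)\cup(F_j\setminus E)\big)<2^{-j-1}\varepsilon$. Upon choosing
$$F_0=\cup_{j=1}^{\infty}F_j;\qquad F_*=\cap_{j=1}^{\infty}(E\setminus F_j);\qquad F_\star=\cup_{j=1}^{\infty}(F_j\setminus E),$$
we obtain
$$m_{\mathrm{id}}(F_*)\le\inf_{j\in\mathbb{N}}m_{\mathrm{id}}(E\setminus F_j)=0\quad\text{and}\quad m_{\mathrm{id}}(F_\star)\le\sum_{j=1}^{\infty}m_{\mathrm{id}}(F_j\setminus E)<2^{-1}\varepsilon.$$
Note that $E\setminus F_0=F_*$ and $F_0=(E\cap F_0)\cup(F_0\setminus E)\subseteq E\cup F_\star$. Since $m_{\mathrm{id}}(F_*)=0$, there is an open set $O_0\supseteq F_*$ enjoying $m_{\mathrm{id}}(O_0)<2^{-1}\varepsilon$, and so $O=F_0\cup O_0$ is an open set containing $E$ which satisfies
$$m_{\mathrm{id}}(O)\le m_{\mathrm{id}}(F_0)+m_{\mathrm{id}}(O_0)\le m_{\mathrm{id}}(E)+m_{\mathrm{id}}(F_\star)+m_{\mathrm{id}}(O_0)<m_{\mathrm{id}}(E)+\varepsilon,$$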

Absolute Continuities in Lebesgue Integrals

as desired. (iii) According to (ii), for each $j\in\mathbb{N}$ there are an $N\in\mathbb{N}$ and an open set $F_j\supseteq E$ such that $j\ge N\Rightarrow0\le m_{\mathrm{id}}(F_j)-m_{\mathrm{id}}(E)<\varepsilon$. If $O_j=\cap_{k=1}^{j}F_k$, then $O_j$ is open, $E\subseteq O_{j+1}\subseteq O_j\subseteq F_j$, and
$$j\ge N\ \Rightarrow\ 0\le m_{\mathrm{id}}(O_j)-m_{\mathrm{id}}(E)\le m_{\mathrm{id}}(F_j)-m_{\mathrm{id}}(E)<\varepsilon,$$
as desired.

Next, we bring the concepts of Lebesgue's outer measure and Vitali's covering (due to Giuseppe Vitali) into play.

Definition 5.1.2. Let $E\subseteq\mathbb{R}$.

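(i) The Lebesgue outer measure $m_{\mathrm{id},*}(E)$ of $E$ is defined as the infimum of $\sum_jm_{\mathrm{id}}(I_j)$ over all sequences of open intervals $I_j\subseteq\mathbb{R}$ enjoying $E\subseteq\cup_jI_j$. (ii) A Vitali's covering $\mathcal{V}$ of $E$ is a family of closed intervals such that for any pair $(\varepsilon,x)\in(0,\infty)\times E$ there is an $I\in\mathcal{V}$ with $0<m_{\mathrm{id}}(I)<\varepsilon$ and $x\in I$.

Theorem 5.1.3. For $j\in\mathbb{N}$ let $E,E_j\subseteq\mathbb{R}$ obey $m_{\mathrm{id},*}(E),m_{\mathrm{id},*}(E_j)<\infty$, let $E$ have a Vitali's covering $\mathcal{V}$, and for $t\in\mathbb{R}$ let $E+t=\{x+t:x\in E\}$ be the $t$-translation of $E$. Then:
(i) $m_{\mathrm{id},*}(E)=\inf\{m_{\mathrm{id}}(O):\ \text{open }O\supseteq E\}=m_{\mathrm{id},*}(E+t)$.
(ii) (ii-1) $E_1\subseteq E_2\Rightarrow m_{\mathrm{id},*}(E_1)\le m_{\mathrm{id},*}(E_2)$;
(ii-2) $E_j\subseteq E_{j+1}\Rightarrow m_{\mathrm{id},*}(\cup_{j=1}^{\infty}E_j)=\lim_{j\to\infty}m_{\mathrm{id},*}(E_j)$;
(ii-3) $m_{\mathrm{id},*}(\cup_{j=1}^{\infty}E_j)\le\sum_{j=1}^{\infty}m_{\mathrm{id},*}(E_j)$.
(iii) $m_{\mathrm{id},*}(E_0)=m_{\mathrm{id}}(E_0)$ for any $m_{\mathrm{id}}$-measurable set $E_0\subseteq\mathbb{R}$; in particular, $m_{\mathrm{id},*}(I)=m_{\mathrm{id}}(I)$ for any interval $I\subseteq\mathbb{R}$.
(iv) For an arbitrary $\varepsilon>0$ there exist mutually disjoint intervals $I_1,\dots,I_n\in\mathcal{V}$ such that $m_{\mathrm{id},*}(E\setminus\cup_{j=1}^{n}I_j)<\varepsilon$.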

Proof. (i) On the one hand, the translation-invariance $m_{\mathrm{id},*}(E)=m_{\mathrm{id},*}(E+t)$ follows from Definition 5.1.2(i) and the translation-invariance $m_{\mathrm{id}}(I+t)=m_{\mathrm{id}}(I)$ for any interval $I\subseteq\mathbb{R}$. On the other hand, if $O\subseteq\mathbb{R}$ is an open set containing $E$, then there exists a sequence $\{I_j\}_{j=1}^{\infty}$ of mutually disjoint open intervals such that $O=\cup_{j=1}^{\infty}I_j$, and hence
$$1_O=\sum_{j=1}^{\infty}1_{I_j};\qquad m_{\mathrm{id}}(O)=\sum_{j=1}^{\infty}m_{\mathrm{id}}(I_j);\qquad m_{\mathrm{id},*}(E)\le m_{\mathrm{id}}(O).$$
Furthermore, for any $\varepsilon>0$, Definition 5.1.2(i) provides a sequence $\{I_j\}_{j=1}^{\infty}$ of open intervals such that $E\subseteq\cup_{j=1}^{\infty}I_j$ and $\sum_{j=1}^{\infty}m_{\mathrm{id}}(I_j)<m_{\mathrm{id},*}(E)+\varepsilon$; then $O_\varepsilon=\cup_{j=1}^{\infty}I_j$ is an open set containing $E$ with
$$m_{\mathrm{id}}(O_\varepsilon)\le\sum_{j=1}^{\infty}m_{\mathrm{id}}(I_j)<m_{\mathrm{id},*}(E)+\varepsilon.$$
Accordingly, $m_{\mathrm{id},*}(E)\le\inf\{m_{\mathrm{id}}(O):\ \text{open }O\supseteq E\}\le m_{\mathrm{id},*}(E)+\varepsilon$, which yields the desired equality upon letting $\varepsilon\to0$.

(ii) Regarding (ii-1), if $E_1\subseteq E_2$, then any interval covering of $E_2$ is an interval covering of $E_1$, and hence Definition 5.1.2(i) gives $m_{\mathrm{id},*}(E_1)\le m_{\mathrm{id},*}(E_2)$. Regarding (ii-2), we use (ii-1) to get that if $E_j\subseteq E_{j+1}$, then $m_{\mathrm{id},*}(E_j)\le m_{\mathrm{id},*}(E_{j+1})$, and hence
$$\lim_{j\to\infty}m_{\mathrm{id},*}(E_j)\le m_{\mathrm{id},*}(\cup_{j=1}^{\infty}E_j).$$

So, if $\lim_{j\to\infty}m_{\mathrm{id},*}(E_j)=\infty$, then (ii-2) is trivially true. This leads us to consider the case $\lim_{j\to\infty}m_{\mathrm{id},*}(E_j)<\infty$. Now, for any $\varepsilon>0$ and $j\in\mathbb{N}$, according to (i) there is a sequence of mutually disjoint open intervals $\{I_{j,k}\}_{k=1}^{\infty}$ such that $E_j\subseteq\cup_{k=1}^{\infty}I_{j,k}=I_j$ and
$$m_{\mathrm{id}}(I_j)=\sum_{k=1}^{\infty}m_{\mathrm{id}}(I_{j,k})<m_{\mathrm{id},*}(E_j)+2^{-j}\varepsilon.$$

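Note that $I_j$ is open and $m_{\mathrm{id}}(I_1)<m_{\mathrm{id},*}(E_1)+(1-2^{-1})\varepsilon$. So, if $J_j=\cap_{k=j}^{\infty}I_k$ (which is $m_{\mathrm{id}}$-measurable, increases and contains $E_j$), then by induction we may assume
$$m_{\mathrm{id}}(\cup_{k=1}^{j}J_k)<m_{\mathrm{id},*}(E_j)+(1-2^{-j})\varepsilon$$
to achieve (via Theorem 4.4.6(ii))
$$\begin{aligned}m_{\mathrm{id}}(\cup_{k=1}^{j+1}J_k)&=m_{\mathrm{id}}(\cup_{k=1}^{j}J_k)+m_{\mathrm{id}}(J_{j+1})-m_{\mathrm{id}}\big((\cup_{k=1}^{j}J_k)\cap J_{j+1}\big)\\&<\big(m_{\mathrm{id},*}(E_j)+(1-2^{-j})\varepsilon\big)+\big(m_{\mathrm{id},*}(E_{j+1})+2^{-j-1}\varepsilon\big)-m_{\mathrm{id},*}(E_j)\\&=m_{\mathrm{id},*}(E_{j+1})+(1-2^{-j-1})\varepsilon.\end{aligned}$$
Consequently, we obtain $m_{\mathrm{id}}(\cup_{k=1}^{l}J_k)<m_{\mathrm{id},*}(E_l)+(1-2^{-l})\varepsilon<m_{\mathrm{id},*}(E_l)+\varepsilon$ for all $l\in\mathbb{N}$. This in turn implies
$$m_{\mathrm{id},*}(\cup_{j=1}^{\infty}E_j)\le m_{\mathrm{id}}(\cup_{j=1}^{\infty}J_j)=m_{\mathrm{id}}\big(\cup_{k=1}^{\infty}\cup_{j=1}^{k}J_j\big)=\lim_{k\to\infty}m_{\mathrm{id}}(\cup_{j=1}^{k}J_j)\le\lim_{k\to\infty}m_{\mathrm{id},*}(E_k)+\varepsilon,$$
and sending $\varepsilon\to0$ gives the desired equality. Regarding (ii-3), for any $\varepsilon>0$ and each $E_j$, according to (i) there is a sequence of mutually disjoint intervals $\{I_{j,k}\}_{k=1}^{\infty}$ such that $E_j\subseteq\cup_{k=1}^{\infty}I_{j,k}$ and
$$\sum_{k=1}^{\infty}m_{\mathrm{id}}(I_{j,k})=m_{\mathrm{id}}(\cup_{k=1}^{\infty}I_{j,k})<m_{\mathrm{id},*}(E_j)+\varepsilon2^{-j}.$$
Note that $\cup_{j,k=1}^{\infty}I_{j,k}\supseteq\cup_{j=1}^{\infty}E_j$. So we have
$$m_{\mathrm{id},*}(\cup_{j=1}^{\infty}E_j)\le\sum_{j,k=1}^{\infty}m_{\mathrm{id}}(I_{j,k})\le\sum_{j=1}^{\infty}m_{\mathrm{id},*}(E_j)+\varepsilon,$$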

as desired.

(iii) Clearly, if $E_0\subseteq\mathbb{R}$ is $m_{\mathrm{id}}$-measurable, then (i) and Theorem 5.1.1(ii) yield $m_{\mathrm{id},*}(E_0)=\inf\{m_{\mathrm{id}}(O):\ \text{open }O\supseteq E_0\}=m_{\mathrm{id}}(E_0)$. However, for the special case that $E_0$ is an interval $I\subseteq\mathbb{R}$, we offer a direct argument. Note that $m_{\mathrm{id},*}(I)\le m_{\mathrm{id}}(I)$ follows from Definition 5.1.2(i). So, it is required to verify the reversed inequality. To do so, we consider three situations.

The first is that $I$ is a compact (bounded and closed) interval $[a,b]$. If $\{I_j\}_{j=1}^{\infty}$ is a sequence of open intervals covering $I$, then finitely many of them, say $\{I_{j_k}\}_{k=1}^{n}$, satisfy $I\subseteq\cup_{k=1}^{n}I_{j_k}$, and hence one of them, say $I_{j_1}=(a_1,b_1)$, satisfies $a_1<a<b_1$. Of course, if $b<b_1$, then $I\subseteq I_{j_1}$, and hence
$$m_{\mathrm{id}}(I)\le m_{\mathrm{id}}(I_{j_1})\le\sum_{k=1}^{n}m_{\mathrm{id}}(I_{j_k}).$$
However, if $b\ge b_1$, then there exists another open interval, say $I_{j_2}=(a_2,b_2)$, such that $a_2<b_1<b_2$. Continuing in this way yields finitely many of these open intervals, say (after relabeling) $\{I_{j_k}=(a_k,b_k)\}_{k=1}^{l}$ with $l\le n$, such that $a_1<a$, $a_k<b_{k-1}<b_k$ for $2\le k\le l$, and $b<b_l$. Accordingly,
$$\sum_{k=1}^{n}m_{\mathrm{id}}(I_{j_k})\ge\sum_{k=1}^{l}(b_k-a_k)=b_l+\sum_{k=2}^{l}(b_{k-1}-a_k)-a_1>b-a=m_{\mathrm{id}}(I).$$
So, $m_{\mathrm{id},*}(I)=m_{\mathrm{id}}(I)$.

The second is that $I$ is a bounded interval. Regarding this case, for a sufficiently small $\varepsilon>0$ there exists a compact interval $J\subseteq I$ such that $m_{\mathrm{id}}(I)<m_{\mathrm{id}}(J)+\varepsilon$. Consequently, $m_{\mathrm{id},*}(I)\ge m_{\mathrm{id},*}(J)=m_{\mathrm{id}}(J)>m_{\mathrm{id}}(I)-\varepsilon$, and letting $\varepsilon\to0$ gives the desired estimate $m_{\mathrm{id},*}(I)\ge m_{\mathrm{id}}(I)\ge m_{\mathrm{id},*}(I)$.

The third is that $I$ is an unbounded interval. Regarding this case, for any $\ell>0$ there exists a compact interval $J\subseteq I$ such that $m_{\mathrm{id},*}(J)=m_{\mathrm{id}}(J)=\ell$. Accordingly, $m_{\mathrm{id},*}(I)\ge m_{\mathrm{id},*}(J)=\ell$, and so $m_{\mathrm{id},*}(I)=\infty=m_{\mathrm{id}}(I)$.

(iv) First of all, according to (i) we may assume that for a Vitali's covering $\mathcal{V}$ of $E$ there is an open set $O$ with $m_{\mathrm{id}}(O)<\infty$ and $E\subseteq O$. Note that this last inclusion implies that if $x\in E$, then there are an open interval $(x-r,x+r)\subseteq O$ and an interval $I=[a,b]\in\mathcal{V}$ with $x\in I$ and $m_{\mathrm{id}}(I)<2^{-1}r$. So, it follows that
$$\begin{cases}b<a+2^{-1}r;\\ a\le x\le b;\\ y\in I\Rightarrow|x-y|\le|x-a|+|y-a|<r;\\ y\in I\Rightarrow y\in(x-r,x+r)\subseteq O.\end{cases}$$
Accordingly, after discarding the members of $\mathcal{V}$ that are not contained in $O$ (what remains is still a Vitali's covering of $E$), we may further assume that any interval $I\in\mathcal{V}$ is a subset of $O$, thereby getting that if $\{I_j\}_{j=1}^{\infty}\subseteq\mathcal{V}$ are mutually disjoint, then
$$\sum_{j=1}^{\infty}m_{\mathrm{id}}(I_j)\le m_{\mathrm{id}}(O)<\infty.$$
Moreover, because each $I_j$ is closed and $\mathcal{V}$ is a Vitali's covering of $E$, it follows that
$$\{I_j\}_{j=1}^{n}\subseteq\mathcal{V}\ \Rightarrow\ E\setminus\cup_{j=1}^{n}I_j\subseteq\cup_{I\in\mathcal{V}_n}I,$$
where
$$\mathcal{V}_n=\big\{I\in\mathcal{V}:I\cap(\cup_{j=1}^{n}I_j)=\emptyset\big\}\quad\text{for all }n\in\mathbb{N}.$$
As a matter of fact, if $x\in E\setminus\cup_{j=1}^{n}I_j$, then $x\notin I_j$ for any $j\in\{1,\dots,n\}$, and the closedness of $I_j$ produces $r_j>0$ with $(x-r_j,x+r_j)\cap I_j=\emptyset$. Upon choosing $r_0=2^{-1}\min\{r_1,\dots,r_n\}$, we get a closed interval $I_{r_0}\in\mathcal{V}$ such that $m_{\mathrm{id}}(I_{r_0})<r_0\le2^{-1}r_j$ and $x\in I_{r_0}$, thereby finding
$$I_{r_0}\subseteq(x-r_j,x+r_j);\qquad I_{r_0}\cap I_j=\emptyset;\qquad x\in I_{r_0}\subseteq\cup_{I\in\mathcal{V}_n}I.$$
If there exists a finite mutually disjoint subfamily of $\mathcal{V}$ which covers $E$, then the argument is over. Otherwise, we can inductively select a mutually disjoint countable subfamily $\{I_j=[c_j-r_j,c_j+r_j]\}_{j=1}^{\infty}$ of $\mathcal{V}$ such that
$$J_k=[c_k-5r_k,c_k+5r_k]\ \Rightarrow\ E\setminus\cup_{j=1}^{n}I_j\subseteq\cup_{j=n+1}^{\infty}J_j\quad\text{for all }k,n\in\mathbb{N}.$$

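To be more precise, let $I_1\in\mathcal{V}$ be selected arbitrarily, and suppose that the mutually disjoint closed intervals $\{I_j\}_{j=1}^{n}\subseteq\mathcal{V}$ have been chosen. Since $E\not\subseteq\cup_{j=1}^{n}I_j$, it follows that $\mathcal{V}_n\ne\emptyset$, and so
$$\kappa_n=\sup\{m_{\mathrm{id}}(I):I\in\mathcal{V}_n\}\le m_{\mathrm{id}}(O)<\infty.$$
Now, via choosing $I_{n+1}\in\mathcal{V}_n$ with $m_{\mathrm{id}}(I_{n+1})>2^{-1}\kappa_n$, we inductively obtain a countable mutually disjoint subfamily of $\mathcal{V}$ such that
$$I\in\mathcal{V}\ \text{and}\ I\cap(\cup_{j=1}^{n}I_j)=\emptyset\ \Rightarrow\ m_{\mathrm{id}}(I_{n+1})>2^{-1}m_{\mathrm{id}}(I).$$
Since
$$\sum_{j=1}^{\infty}m_{\mathrm{id}}(I_j)<\infty\ \Rightarrow\ \lim_{j\to\infty}m_{\mathrm{id}}(I_j)=0,$$
in order to validate
$$E\setminus\cup_{j=1}^{n}I_j\subseteq\cup_{j=n+1}^{\infty}J_j,$$
let $x\in E\setminus\cup_{j=1}^{n}I_j$. Then $x\in\cup_{I\in\mathcal{V}_n}I$, and hence there exists $I\in\mathcal{V}$ which is disjoint from $\cup_{j=1}^{n}I_j$ and contains $x$. Now, this closed interval $I$ must enjoy the property $I\cap I_{j_0}\ne\emptyset$ for some $j_0\in\mathbb{N}$; otherwise $m_{\mathrm{id}}(I_{k+1})>2^{-1}m_{\mathrm{id}}(I)$ for all $k\in\mathbb{N}$, contradicting $\lim_{k\to\infty}m_{\mathrm{id}}(I_k)=0$. Suppose that $n_0$ is the first integer obeying $I\cap I_{n_0}\ne\emptyset$. Then $n_0>n$. Thanks to $I\cap(\cup_{j=1}^{n_0-1}I_j)=\emptyset$, we have $m_{\mathrm{id}}(I_{n_0})>2^{-1}m_{\mathrm{id}}(I)$, that is, $m_{\mathrm{id}}(I)<2m_{\mathrm{id}}(I_{n_0})$. Note that the distance from $x$ to the centre $c_{n_0}$ of $I_{n_0}$ is at most $m_{\mathrm{id}}(I)+2^{-1}m_{\mathrm{id}}(I_{n_0})$. This in turn implies that the distance from $x$ to $c_{n_0}$ is less than $2^{-1}\cdot5\,m_{\mathrm{id}}(I_{n_0})=5r_{n_0}$, and hence $x\in J_{n_0}\subseteq\cup_{j=n+1}^{\infty}J_j$ due to $n_0\ge n+1$.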

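Now, for any $\varepsilon>0$ there is $n\in\mathbb{N}$ such that $\sum_{j=n+1}^{\infty}m_{\mathrm{id}}(I_j)<5^{-1}\varepsilon$. Consequently, both (ii) and (iii) are used to derive
$$m_{\mathrm{id},*}(E\setminus\cup_{j=1}^{n}I_j)\le m_{\mathrm{id},*}(\cup_{j=n+1}^{\infty}J_j)\le\sum_{j=n+1}^{\infty}m_{\mathrm{id},*}(J_j)=5\sum_{j=n+1}^{\infty}m_{\mathrm{id}}(I_j)<\varepsilon.$$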
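The selection in the proof of (iv) is greedy: at each stage one picks an interval of $\mathcal{V}$ that misses all earlier picks and is longer than half of the available supremum $\kappa_n$. For a finite family of closed intervals the supremum is attained, so taking a longest admissible interval is a legitimate choice. The Python sketch below is a finite stand-in for the infinite Vitali covering (all function names are hypothetical): it performs this selection with exact rational arithmetic and checks that every member of the family lies inside some enlargement $J_k=[c_k-5r_k,c_k+5r_k]$; in this finite setting even the factor $3$ suffices.

```python
from fractions import Fraction
from typing import List, Tuple

Interval = Tuple[Fraction, Fraction]  # closed interval [left, right]

def disjoint(i: Interval, j: Interval) -> bool:
    # two closed intervals are disjoint iff one ends strictly before the other starts
    return i[1] < j[0] or j[1] < i[0]

def greedy_select(family: List[Interval]) -> List[Interval]:
    # Finite analogue of the selection in the proof: repeatedly take a longest
    # interval disjoint from all earlier picks (for a finite family kappa_n is
    # attained, so the pick exceeds kappa_n / 2 whenever kappa_n > 0).
    chosen: List[Interval] = []
    while True:
        candidates = [I for I in family if all(disjoint(I, J) for J in chosen)]
        if not candidates:
            return chosen
        chosen.append(max(candidates, key=lambda I: I[1] - I[0]))

def enlarge(I: Interval, factor: int) -> Interval:
    # same centre c, radius r multiplied by factor: [c - factor*r, c + factor*r]
    c, r = (I[0] + I[1]) / 2, (I[1] - I[0]) / 2
    return (c - factor * r, c + factor * r)

def covered(family: List[Interval], chosen: List[Interval], factor: int = 5) -> bool:
    # every member of the family lies inside the enlargement of some chosen interval
    bigs = [enlarge(J, factor) for J in chosen]
    return all(any(B[0] <= I[0] and I[1] <= B[1] for B in bigs) for I in family)

family = [(Fraction(k, 10), Fraction(k, 10) + Fraction(1, k + 2)) for k in range(20)]
picked = greedy_select(family)
print(len(picked), covered(family, picked), covered(family, picked, factor=3))
```

The factor $5$ in the text comes from the relaxed rule $m_{\mathrm{id}}(I_{n+1})>2^{-1}\kappa_n$, which is needed when the supremum over an infinite family is not attained.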

Remark 5.1.4. The property (ii-3) in Theorem 5.1.3 means that $m_{\mathrm{id},*}$ is countably subadditive. And yet, $m_{\mathrm{id},*}$ is not countably additive. To see this, we say that two points $x,y$ in the closed interval $[0,1]$ are equivalent, denoted $x\sim y$, whenever $x-y\in\mathbb{Q}$. According to the Axiom of Choice, we take a set $E_\star\subseteq[0,1]$ that contains exactly one point from each equivalence class determined by $\sim$, thereby finding
(i) $r_1,r_2\in\mathbb{Q}$ and $r_1\ne r_2\Rightarrow(E_\star+r_1)\cap(E_\star+r_2)=\emptyset$;
(ii) $[0,1]\subseteq\cup_{r\in\mathbb{Q}\cap[-1,1]}(E_\star+r)\subseteq[-1,2]$;
(iii) $1=m_{\mathrm{id},*}([0,1])\le m_{\mathrm{id},*}\big(\cup_{r\in\mathbb{Q}\cap[-1,1]}(E_\star+r)\big)\le m_{\mathrm{id},*}([-1,2])=3$;
(iv) $r\in\mathbb{Q}\cap[-1,1]\Rightarrow m_{\mathrm{id},*}(E_\star+r)=m_{\mathrm{id},*}(E_\star)$.
Accordingly, if $m_{\mathrm{id},*}$ were countably additive, then
$$1\le m_{\mathrm{id},*}\big(\cup_{r\in\mathbb{Q}\cap[-1,1]}(E_\star+r)\big)=\sum_{r\in\mathbb{Q}\cap[-1,1]}m_{\mathrm{id},*}(E_\star+r)=\sum_{r\in\mathbb{Q}\cap[-1,1]}m_{\mathrm{id},*}(E_\star)\le3,$$
and hence we would have a contradiction no matter whether $m_{\mathrm{id},*}(E_\star)$ is zero (the sum vanishes) or positive (the sum is infinite). Moreover, $E_\star$ is not $m_{\mathrm{id}}$-measurable; otherwise, (iii) of Theorem 5.1.3 would give $m_{\mathrm{id},*}(E_\star+r)=m_{\mathrm{id}}(E_\star+r)<\infty$ for each $r$, and the countable additivity of $m_{\mathrm{id}}$ on $m_{\mathrm{id}}$-measurable sets would lead to the same contradiction.

5.2 Derivatives of Increasing Functions

Recall that if $f:[a,b]\to\mathbb{R}$ has derivative $f'(c)$ at any $c\in(a,b)$, then for any $\varepsilon>0$ there is a closed interval $I_{c,\varepsilon}$ such that it contains $c$ in its interior and enjoys
$$m_{\mathrm{id}}(I_{c,\varepsilon})<\varepsilon\quad\text{and}\quad\left|\frac{f(x)-f(c)}{x-c}-f'(c)\right|<\varepsilon\quad\text{for }x\in(I_{c,\varepsilon}\cap[a,b])\setminus\{c\}.$$
Then $\mathcal{V}=\{I_{c,\varepsilon}\}$ is a Vitali's covering of $(a,b)$. Accordingly, it is natural to consider four one-sided derivatives.

Definition 5.2.1. Given $f:[a,b]\to\mathbb{R}$ and $c\in(a,b)$.

(i) Four one-sided limits:   lim sup00 sup0 mid,∗ (E) − 2ε.



This in turn implies
$$\sum_{j=1}^{m_0}\big(f(y_j+\eta_j)-f(y_j)\big)>R\sum_{j=1}^{m_0}\eta_j>R\big(m_{\mathrm{id},*}(E)-2\varepsilon\big).$$
At the same time, noticing that each $J_j$ is a subset of some $I_n$, we sum only over those $j$'s with $J_j\subseteq I_n$ and use the hypothesis that $f$ is increasing to obtain
$$\sum_{j:\,J_j\subseteq I_n}\big(f(y_j+\eta_j)-f(y_j)\big)\le f(x_n)-f(x_n-h_n).$$
Furthermore, via summing over all such $n$, we estimate
$$\sum_{n=1}^{n_0}\big(f(x_n)-f(x_n-h_n)\big)\ge\sum_{j=1}^{m_0}\big(f(y_j+\eta_j)-f(y_j)\big)>R\big(m_{\mathrm{id},*}(E)-2\varepsilon\big).$$

The above two-fold argument, combined with $r<R$, yields $R\big(m_{\mathrm{id},*}(E)-2\varepsilon\big)<r\big(m_{\mathrm{id},*}(E)+\varepsilon\big)$, that is, $m_{\mathrm{id},*}(E)<(R-r)^{-1}(r+2R)\varepsilon$. Since $\varepsilon>0$ is arbitrary, $m_{\mathrm{id},*}(E)=0$, and therefore $m_{\mathrm{id}}(E)=0$.

Below is Lebesgue's theorem on the derivative of an increasing function.

Theorem 5.2.4. If $f:[a,b]\to\mathbb{R}$ is increasing, then $f'$ exists $m_{\mathrm{id}}$-a.e. on $[a,b]$, is nonnegative and enjoys
$$\int_{[a,b]}f'\,dm_{\mathrm{id}}\le f(b)-f(a),$$
with the strict inequality holding for any increasing step function $f$ with $f(a)<f(b)$.

Proof. Note that
$$\{x\in(a,b):D_-f(x)<D^+f(x)\}=\bigcup_{r,R\in\mathbb{Q}}\{x\in(a,b):D_-f(x)<r<R<D^+f(x)\}.$$
So, Proposition 5.2.3 yields
$$m_{\mathrm{id}}\big(\{x\in(a,b):D_-f(x)<D^+f(x)\}\big)=0,$$
whence
$$D_-f(x)\ge D^+f(x)\quad\text{for }m_{\mathrm{id}}\text{-a.e. }x\in[a,b].$$

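Since $-f(-x)$ is increasing on $[-b,-a]$, substituting $-f(-x)$ for $f(x)$ in the last a.e. statement gives $D^-f(x)\le D_+f(x)$ for $m_{\mathrm{id}}$-a.e. $x\in[a,b]$. Consequently,
$$D^+f(x)\le D_-f(x)\le D^-f(x)\le D_+f(x)\quad\text{for }m_{\mathrm{id}}\text{-a.e. }x\in[a,b],$$
and since $D_+f\le D^+f$ trivially, all four one-sided derivatives coincide $m_{\mathrm{id}}$-a.e. on $[a,b]$. This in turn implies the existence of $f'(x)$ for $m_{\mathrm{id}}$-a.e. $x\in[a,b]$. In order to verify the desired integral inequality, we extend the definition of $f$ via setting $f(x)=f(b)$ for $x>b$ and define the $m_{\mathrm{id}}$-measurable functions
$$f_n(x)=n\big(f(x+n^{-1})-f(x)\big)\quad\text{for all }n\in\mathbb{N}.$$
Clearly, $f_n(x)\ge0$ and $f_n(x)\to f'(x)\ge0$ for $m_{\mathrm{id}}$-a.e. $x\in[a,b]$. Since $f'$ is $m_{\mathrm{id}}$-measurable, we can utilize Fatou's lemma and the monotonicity of $f$ to estimate
$$\begin{aligned}\int_{[a,b]}f'\,dm_{\mathrm{id}}&\le\liminf_{n\to\infty}\int_{[a,b]}f_n\,dm_{\mathrm{id}}\\&=\liminf_{n\to\infty}n\int_{[a,b]}\big(f(x+n^{-1})-f(x)\big)\,dm_{\mathrm{id}}(x)\\&=\liminf_{n\to\infty}n\left(\int_{[a+n^{-1},\,b+n^{-1}]}f\,dm_{\mathrm{id}}-\int_{[a,b]}f\,dm_{\mathrm{id}}\right)\\&=\liminf_{n\to\infty}n\left(\int_{b}^{b+n^{-1}}f\,dm_{\mathrm{id}}-\int_{a}^{a+n^{-1}}f\,dm_{\mathrm{id}}\right)\\&=f(b)-\limsup_{n\to\infty}n\int_{a}^{a+n^{-1}}f\,dm_{\mathrm{id}}\\&\le f(b)-f(a).\end{aligned}$$
In particular, $f'$ is finite $m_{\mathrm{id}}$-a.e. on $[a,b]$.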

Note that if $f$ is an increasing step function on $[a,b]$, then $f'=0$ $m_{\mathrm{id}}$-a.e. on $[a,b]$. So if $f(a)<f(b)$ is further assumed, then there is a strict inequality
$$\int_{[a,b]}f'\,dm_{\mathrm{id}}=0<f(b)-f(a).$$
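The strict inequality, and the way Fatou's lemma can lose mass in the proof above, can be seen on the simplest example. The Python sketch below assumes $[a,b]=[0,1]$ and the increasing step function $f=1_{[1/2,1]}$, extended by $f(x)=f(1)$ for $x>1$, and integrates $f_n(x)=n\big(f(x+n^{-1})-f(x)\big)$ exactly over $[0,1]$ (the helper names are hypothetical). Each $f_n$ with $n\ge2$ is a spike of height $n$ on $[2^{-1}-n^{-1},2^{-1})$, so $\int_{[0,1]}f_n\,dm_{\mathrm{id}}=1=f(1)-f(0)$, while $f_n\to f'=0$ pointwise and $\int_{[0,1]}f'\,dm_{\mathrm{id}}=0$.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def f(x: Fraction) -> int:
    # increasing step function on [0, 1] with one jump at 1/2; the extension
    # f(x) = f(1) for x > 1 used in the proof is automatic here
    return 0 if x < HALF else 1

def integral_fn(n: int) -> Fraction:
    # exact integral over [0, 1] of f_n(x) = n (f(x + 1/n) - f(x)); f_n is constant
    # between the breakpoints 0, 1/2 - 1/n, 1/2, 1, so sample each piece once
    h = Fraction(1, n)
    cuts = sorted(p for p in {Fraction(0), HALF - h, HALF, Fraction(1)} if 0 <= p <= 1)
    total = Fraction(0)
    for left, right in zip(cuts, cuts[1:]):
        centre = (left + right) / 2
        total += n * (f(centre + h) - f(centre)) * (right - left)
    return total

print([integral_fn(n) for n in (1, 2, 10, 1000)])
```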

5.3 Absolutely Continuous Functions

The class $BV[a,b]$ more or less leads to the following concept.

Definition 5.3.1. A function $f:[a,b]\to\mathbb{R}$ is said to be absolutely continuous on $[a,b]$ provided that for any $\varepsilon>0$ there exists $\delta>0$ such that for any finite sequence of mutually disjoint intervals $(a_1,b_1),\dots,(a_n,b_n)\subseteq[a,b]$ one has the implication
$$\sum_{j=1}^{n}(b_j-a_j)<\delta\ \Rightarrow\ \sum_{j=1}^{n}|f(b_j)-f(a_j)|<\varepsilon.$$

The class of all absolutely continuous functions on $[a,b]$ is written as $AC[a,b]$.

Example 5.3.2. (i) $x\mapsto f(x)=x^{\frac{3}{2}}\sin(x^{-1})$ with $f(0)=0$ is in $AC[a,b]$. (ii) If $0<\alpha\le2^{-1}$, then $x\mapsto f_{2^{-1}}(x)=\sqrt{x}$ is in $AC[0,1]$ and satisfies
$$\sup_{x\ne y\ \text{in}\ [0,1]}\frac{|f_{2^{-1}}(x)-f_{2^{-1}}(y)|}{|x-y|^{\alpha}}<\infty.$$
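Example 5.3.2(ii) rests on the elementary inequality $|\sqrt{x}-\sqrt{y}|\le|x-y|^{1/2}$ on $[0,1]$, and the same function shows that $AC[0,1]$ is strictly larger than $Lip[0,1]$. The Python sketch below samples the difference quotients on a uniform grid (the grid size $400$ is an arbitrary assumption): the quotient with exponent $\alpha=1/2$ never exceeds $1$, whereas the Lipschitz quotient ($\alpha=1$) equals $1/(\sqrt{x}+\sqrt{y})$, already reaches $20$ at the pair $(0,1/400)$, and grows without bound as the grid is refined.

```python
import itertools
import math

def quotient(x: float, y: float, alpha: float) -> float:
    # |sqrt(x) - sqrt(y)| / |x - y|^alpha for x != y
    return abs(math.sqrt(x) - math.sqrt(y)) / abs(x - y) ** alpha

grid = [k / 400 for k in range(401)]           # sample points of [0, 1]
pairs = list(itertools.combinations(grid, 2))  # all pairs x < y
sup_half = max(quotient(x, y, 0.5) for x, y in pairs)
sup_lip = max(quotient(x, y, 1.0) for x, y in pairs)
print(sup_half, sup_lip)
```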

Proposition 5.3.3. The basic properties of $AC[a,b]$ are listed below.
(i) The sum, difference and product of two $AC[a,b]$-functions are also $AC[a,b]$-functions.
(ii) If an $AC[a,b]$-function is nowhere zero, then its reciprocal is also an $AC[a,b]$-function.
(iii) If $f$ is of Lipschitz class on $[a,b]$, denoted $f\in Lip[a,b]$, namely
$$\sup_{x\ne y\ \text{in}\ [a,b]}\frac{|f(x)-f(y)|}{|x-y|}<\infty,$$
which implies that for any $\varepsilon>0$ there is a $\delta>0$ such that for any finite family of open intervals $(x_j,y_j)\subseteq[a,b]$,
$$\sum_{j=1}^{n}(y_j-x_j)<\delta\ \Rightarrow\ \sum_{j=1}^{n}|f(x_j)-f(y_j)|<\varepsilon,$$
then $f\in AC[a,b]$, and hence $f$ is uniformly continuous on $[a,b]$, denoted $f\in UC[a,b]$, i.e., for any $\varepsilon>0$ there is $\delta>0$ such that $x,y\in[a,b]$ and $|x-y|<\delta\Rightarrow|f(x)-f(y)|<\varepsilon$.
(iv) If $f\in Lip[c,d]$ and $\phi\in AC[a,b]$ obeys $\phi([a,b])\subseteq[c,d]\subseteq\mathbb{R}$, then $f\circ\phi\in AC[a,b]$.
(v) If $f\in AC[a,b]$, then $f\in BV[a,b]$, and hence $f$ is differentiable $m_{\mathrm{id}}$-a.e. on $[a,b]$ with
$$\int_{[a,b]}|f'|\,dm_{\mathrm{id}}\le2V_a^bf-\big(f(b)-f(a)\big).$$

(vi) Any $f\in AC[a,b]$ has Luzin's invariant property, namely, if $E\subseteq[a,b]$ satisfies $m_{\mathrm{id}}(E)=0$, then $f(E)\subseteq\mathbb{R}$ satisfies $m_{\mathrm{id}}\big(f(E)\big)=0$.

Proof. Since (i)-(ii)-(iii) follow directly from Definition 5.3.1, we are about to validate (iv)-(v)-(vi). Firstly, for (iv), we use the condition that $f$ is Lipschitz continuous on $[c,d]$ to get a constant
$$M=1+\sup_{x\ne y\ \text{in}\ [c,d]}\frac{|f(x)-f(y)|}{|x-y|}<\infty,$$
and then use $\phi\in AC[a,b]$ to see that for any $\varepsilon>0$ there is $\delta>0$ such that
$$\sum_{j=1}^{n}|\phi(y_j)-\phi(x_j)|<M^{-1}\varepsilon$$
holds for any finite family of mutually disjoint open intervals $(x_1,y_1),\dots,(x_n,y_n)\subseteq[a,b]$ with $\sum_{j=1}^{n}(y_j-x_j)<\delta$. Accordingly,
$$\sum_{j=1}^{n}|f\circ\phi(y_j)-f\circ\phi(x_j)|\le M\sum_{j=1}^{n}|\phi(y_j)-\phi(x_j)|<\varepsilon,$$

namely, $f\circ\phi\in AC[a,b]$. Secondly, in order to verify (v), suppose $f\in AC[a,b]$. Definition 5.3.1 ensures a $\delta>0$ such that for any finite family of mutually disjoint open intervals $(x_1,y_1),\dots,(x_n,y_n)\subseteq[a,b]$ there is the implication
$$\sum_{j=1}^{n}(y_j-x_j)<\delta\ \Rightarrow\ \sum_{j=1}^{n}|f(y_j)-f(x_j)|<1.$$
Consequently, if the natural number $n_0$ is bigger than $(b-a)\delta^{-1}$, then
$$a_j=a+j(b-a)n_0^{-1}\ \text{and}\ j\in\{1,\dots,n_0\}\ \Rightarrow\ a_j-a_{j-1}<\delta\ \text{and}\ V_{a_{j-1}}^{a_j}f\le1,$$
and hence $f\in BV[a,b]$ follows from
$$V_a^bf=\sum_{j=1}^{n_0}V_{a_{j-1}}^{a_j}f\le n_0.$$
According to Theorem 3.1.7, there are two increasing functions $f_1(x)=V_{[a,x]}f$ and $f_2(x)=V_{[a,x]}f-f(x)$ such that $f=f_1-f_2$. This decomposition, along with Theorem 5.2.4, implies that $f$ is differentiable $m_{\mathrm{id}}$-a.e. on $[a,b]$ and enjoys
$$\begin{aligned}\int_a^b|f'|\,dm_{\mathrm{id}}&\le\int_a^bf_1'\,dm_{\mathrm{id}}+\int_a^bf_2'\,dm_{\mathrm{id}}\\&\le f_1(b)-f_1(a)+f_2(b)-f_2(a)\\&=V_a^bf+V_a^bf-\big(f(b)-f(a)\big)\\&=2V_a^bf-\big(f(b)-f(a)\big).\end{aligned}$$
Thirdly, we validate (vi). To do so, let $f\in AC[a,b]$ and $E\subseteq[a,b]$ obey $m_{\mathrm{id}}(E)=0$. Then for any $\varepsilon>0$ there are a $\delta>0$ and a sequence of mutually disjoint open intervals $\{(a_j,b_j)\}_{j=1}^{\infty}$ such that
$$E\subseteq\cup_{j=1}^{\infty}(a_j,b_j);\qquad\sum_{j=1}^{\infty}(b_j-a_j)<\delta;\qquad\sum_{j=1}^{\infty}|f(b_j)-f(a_j)|<\varepsilon.$$

According to (iii), $f\in C[a,b]$, and consequently there are $x_j,y_j\in[a_j,b_j]$ such that
$$f(x_j)=\min_{z\in[a_j,b_j]}f(z)=m_j\quad\text{and}\quad f(y_j)=\max_{z\in[a_j,b_j]}f(z)=M_j.$$
This, along with $f\in AC[a,b]$, implies
$$f(E)\subseteq\cup_{j=1}^{\infty}f([a_j,b_j])\subseteq\cup_{j=1}^{\infty}[m_j,M_j];\qquad\sum_{j=1}^{\infty}|x_j-y_j|\le\sum_{j=1}^{\infty}(b_j-a_j)<\delta;\qquad\sum_{j=1}^{\infty}(M_j-m_j)=\sum_{j=1}^{\infty}|f(y_j)-f(x_j)|<\varepsilon.$$
In other words, $f(E)$ enjoys $m_{\mathrm{id}}\big(f(E)\big)=0$.

The above proposition leads to two alternative characterizations of $AC[a,b]$.

Theorem 5.3.4. The following statements on a function $f:[a,b]\to\mathbb{R}$ are equivalent:
(i) $f\in AC[a,b]$;
(ii) $f$ has a derivative $f'\in\mathrm{LRS}_{\mathrm{id}}[a,b]$ ($m_{\mathrm{id}}$-a.e. on $[a,b]$) and
$$f(x)=f(a)+\int_{[a,x]}f'\,dm_{\mathrm{id}}\quad\text{for all }x\in[a,b];$$
(iii) $f\in C[a,b]\cap BV[a,b]$ and $m_{\mathrm{id}}(E)=0\Rightarrow m_{\mathrm{id}}\big(f(E)\big)=0$.

Here, (i)⇔(ii) and (i)⇔(iii) are known as the fundamental theorem of Lebesgue integration and the Banach-Zarecki theorem (due to Stefan Banach and Moisej Abramovitch Zarecki), respectively.

Proof. The result will be verified according to (i)⇔(ii) and (i)⇔(iii).

(i)⇔(ii) Suppose that (i) holds. Proposition 5.3.3(v) implies that $f\in BV[a,b]$, and so $f'$ exists $m_{\mathrm{id}}$-a.e. on $[a,b]$ and is $m_{\mathrm{id}}$-integrable. If
$$\phi(x)=f(a)+\int_{[a,x]}f'\,dm_{\mathrm{id}}\quad\text{for all }x\in[a,b],$$
then $\phi\in AC[a,b]$ and $(\phi-f)'(x)=0$ for $m_{\mathrm{id}}$-a.e. $x\in[a,b]$. Next, we are about to prove the desired formula $\phi=f$ on $[a,b]$. To do so, let $c\in(a,b]$. Then $\phi-f\in AC[a,c]$ and there is an $m_{\mathrm{id}}$-measurable set $E\subseteq(a,c)$

with $m_{\mathrm{id}}(E)=c-a$ such that $\phi'=f'$ on $E$. Let $\varepsilon>0$, and let $\delta>0$ be chosen as in the definition of $\phi-f\in AC[a,c]$. If $x\in E$, then there is $h_x>0$ such that $|(\phi-f)(x+h)-(\phi-f)(x)|<\varepsilon h$ for all $h\in(0,h_x)$, and hence
$$\mathcal{V}=\{[x,x+h]:h\in(0,h_x)\ \text{and}\ x\in E\}$$
is a Vitali's covering of $E$. Now, $m_{\mathrm{id}}(E)=c-a$ and Theorem 5.1.3(iv) (applied with $\delta$ in place of $\varepsilon$) produce mutually disjoint intervals $\{[x_j,x_j+h_j]\}_{j=1}^{n}$ from $\mathcal{V}$ such that $\cup_{j=1}^{n}(x_j,x_j+h_j)\subseteq(a,c)$ and $m_{\mathrm{id},*}(E\setminus\cup_{j=1}^{n}[x_j,x_j+h_j])<\delta$. Upon arranging the intervals $\{[x_j,x_j+h_j]\}_{j=1}^{n}$ so that $x_j+h_j<x_{j+1}$, setting $x_0=a$, $h_0=0$ and $x_{n+1}=c$, and using $\phi-f\in AC[a,c]$, we obtain
$$\sum_{j=0}^{n}|x_{j+1}-(x_j+h_j)|<\delta;\qquad\sum_{j=0}^{n}|(\phi-f)(x_{j+1})-(\phi-f)(x_j+h_j)|<\varepsilon;\qquad\sum_{j=0}^{n}|(\phi-f)(x_j+h_j)-(\phi-f)(x_j)|\le\varepsilon\sum_{j=0}^{n}h_j\le\varepsilon(c-a),$$
whence

$$\begin{aligned}|(\phi-f)(c)-(\phi-f)(a)|&=\left|\sum_{j=0}^{n}\big((\phi-f)(x_{j+1})-(\phi-f)(x_j)\big)\right|\\&\le\sum_{j=0}^{n}|(\phi-f)(x_{j+1})-(\phi-f)(x_j+h_j)|+\sum_{j=0}^{n}|(\phi-f)(x_j+h_j)-(\phi-f)(x_j)|\\&\le\varepsilon(1+c-a).\end{aligned}$$
Letting $\varepsilon\to0$ gives $(\phi-f)(c)=(\phi-f)(a)=0$, whence $\phi=f$ on $[a,b]$. Conversely, suppose that (ii) is valid. Then there exists a function $g=f'\in\mathrm{LRS}_{\mathrm{id}}[a,b]$ satisfying
$$f(x)=f(a)+\int_{[a,x]}g\,dm_{\mathrm{id}}\quad\text{for all }x\in[a,b],$$
and thus it is enough to verify that $x\mapsto\int_{[a,x]}g\,dm_{\mathrm{id}}$ is in $AC[a,b]$. To do so, for $\varepsilon>0$ and $n\in\mathbb{N}$, let $g_n=\min\{|g|,n\}$. Then the monotone convergence theorem

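yields an $n_0\in\mathbb{N}$ such that
$$\int_{[a,b]}(|g|-g_{n_0})\,dm_{\mathrm{id}}<2^{-1}\varepsilon.$$
Upon taking $\delta=(2n_0)^{-1}\varepsilon$, we obtain that if $m_{\mathrm{id}}(E)<\delta$, then
$$\int_E|g|\,dm_{\mathrm{id}}\le\int_{[a,b]}(|g|-g_{n_0})\,dm_{\mathrm{id}}+\int_Eg_{n_0}\,dm_{\mathrm{id}}<2^{-1}\varepsilon+n_0m_{\mathrm{id}}(E)<\varepsilon.$$
Next, assume that $\{(a_j,b_j)\}_{j=1}^{n}$ is a finite family of mutually disjoint open subintervals of $[a,b]$ with $\sum_{j=1}^{n}(b_j-a_j)<\delta$. If $E=\cup_{j=1}^{n}(a_j,b_j)$, then $m_{\mathrm{id}}(E)<\delta$ and
$$\int_E|g|\,dm_{\mathrm{id}}<\varepsilon,$$
and hence
$$\sum_{j=1}^{n}|f(b_j)-f(a_j)|=\sum_{j=1}^{n}\left|\int_{(a_j,b_j)}g\,dm_{\mathrm{id}}\right|\le\sum_{j=1}^{n}\int_{(a_j,b_j)}|g|\,dm_{\mathrm{id}}=\int_E|g|\,dm_{\mathrm{id}}<\varepsilon.$$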

Therefore, (i) holds.

(i)⇔(iii) Suppose that (i) is valid. Then (iii) follows from both (iii) and (v)-(vi) of Proposition 5.3.3. Conversely, let (iii) hold. Since $f\in BV[a,b]$, Theorem 5.2.4 (applied to the two increasing functions in the decomposition $f=f_1-f_2$ of Theorem 3.1.7) ensures $f'\in\mathrm{LRS}_{\mathrm{id}}[a,b]$. Consequently, for any $\varepsilon>0$ there is a step function $\psi:[a,b]\to\mathbb{R}$ such that
$$\int_{[a,b]}|f'-\psi|\,dm_{\mathrm{id}}<2^{-1}\varepsilon.$$
Moreover, if $E_\varepsilon\subseteq[a,b]$ is $m_{\mathrm{id}}$-measurable with
$$m_{\mathrm{id}}(E_\varepsilon)<2^{-1}\varepsilon\Big(1+\sup_{x\in[a,b]}|\psi(x)|\Big)^{-1},$$

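then
$$\int_{E_\varepsilon}|f'|\,dm_{\mathrm{id}}\le\int_{[a,b]}|f'-\psi|\,dm_{\mathrm{id}}+\int_{E_\varepsilon}|\psi|\,dm_{\mathrm{id}}\le\int_{[a,b]}|f'-\psi|\,dm_{\mathrm{id}}+\Big(\sup_{x\in[a,b]}|\psi(x)|\Big)m_{\mathrm{id}}(E_\varepsilon)<2^{-1}\varepsilon+2^{-1}\varepsilon=\varepsilon.$$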

Also, $f'$ exists $m_{\mathrm{id}}$-a.e. on $[a,b]$; in other words, if $D$ stands for the set of points in $[a,b]$ where $f'$ exists, then $m_{\mathrm{id}}([a,b]\setminus D)=0$. If $[c,d]\subseteq[a,b]$, then
$$f\in C[a,b]\ \Rightarrow\ [\min\{f(c),f(d)\},\max\{f(c),f(d)\}]\subseteq f([c,d]),$$
and hence setting $E=[c,d]\cap D$ and $F=[c,d]\setminus D$, along with Theorem 5.1.3(ii-1)-(ii-3) and the second condition in (iii), yields
$$[c,d]=E\cup F;\qquad m_{\mathrm{id}}(F)=0;\qquad m_{\mathrm{id}}\big(f(F)\big)=0;\qquad|f(d)-f(c)|\le m_{\mathrm{id},*}\big(f([c,d])\big)\le m_{\mathrm{id},*}\big(f(E)\big)+m_{\mathrm{id},*}\big(f(F)\big).$$
Next, let us control $m_{\mathrm{id},*}\big(f(E)\big)$ from above.

Step 1: proving that if $\varepsilon>0$, $k\in\mathbb{N}$ and

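$$E_{\varepsilon,k}=\{x\in E:(k-1)\varepsilon\le|f'(x)|<\varepsilon k\},$$
then
$$E_{\varepsilon,k}\cap E_{\varepsilon,l}=\emptyset\ \text{for}\ k\ne l\quad\text{and}\quad m_{\mathrm{id},*}\big(f(E_{\varepsilon,k})\big)\le\varepsilon k\,m_{\mathrm{id},*}(E_{\varepsilon,k}).$$
In fact, note that for any $\eta>0$ and $x\in E_{\varepsilon,k}$ there is $n_x\in\mathbb{N}$ enjoying the implication
$$y\in E_{\varepsilon,k}\ \text{and}\ |x-y|<n_x^{-1}\ \Rightarrow\ |f(x)-f(y)|\le(\varepsilon k+\eta)|x-y|.$$
So, for each $n\in\mathbb{N}$ let $E_{\varepsilon,k;\eta,n}$ be the set of all points $x\in E_{\varepsilon,k}$ with the implication
$$y\in E_{\varepsilon,k}\ \text{and}\ |x-y|<n^{-1}\ \Rightarrow\ |f(x)-f(y)|\le(\varepsilon k+\eta)|x-y|.$$
Then we use Theorem 5.1.3(ii) to get
$$\begin{cases}E_{\varepsilon,k;\eta,n}\subseteq E_{\varepsilon,k;\eta,n+1};\\ E_{\varepsilon,k}=\cup_{n=1}^{\infty}E_{\varepsilon,k;\eta,n};\\ m_{\mathrm{id},*}(E_{\varepsilon,k})=\lim_{n\to\infty}m_{\mathrm{id},*}(E_{\varepsilon,k;\eta,n});\\ f(E_{\varepsilon,k;\eta,n})\subseteq f(E_{\varepsilon,k;\eta,n+1});\\ f(E_{\varepsilon,k})=\cup_{n=1}^{\infty}f(E_{\varepsilon,k;\eta,n});\\ m_{\mathrm{id},*}\big(f(E_{\varepsilon,k})\big)=\lim_{n\to\infty}m_{\mathrm{id},*}\big(f(E_{\varepsilon,k;\eta,n})\big).\end{cases}$$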

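By Theorem 5.1.3(i) there is a sequence of intervals $\{I_{n,j}\}_{j=1}^{\infty}$ such that
$$m_{\mathrm{id}}(I_{n,j})<n^{-1};\qquad E_{\varepsilon,k;\eta,n}\subseteq\cup_{j=1}^{\infty}I_{n,j};\qquad\sum_{j=1}^{\infty}m_{\mathrm{id}}(I_{n,j})\le m_{\mathrm{id},*}(E_{\varepsilon,k;\eta,n})+\eta.$$
Thus,
$$x,y\in E_{\varepsilon,k;\eta,n}\cap I_{n,j}\ \Rightarrow\ |x-y|<n^{-1}\ \Rightarrow\ |f(x)-f(y)|\le(\varepsilon k+\eta)|x-y|.$$
As a consequence, we read off that $f(E_{\varepsilon,k;\eta,n}\cap I_{n,j})$ is a subset of an interval of length at most $(\varepsilon k+\eta)m_{\mathrm{id}}(I_{n,j})$, thereby estimating
$$m_{\mathrm{id},*}\big(f(E_{\varepsilon,k;\eta,n})\big)\le\sum_{j=1}^{\infty}m_{\mathrm{id},*}\big(f(E_{\varepsilon,k;\eta,n}\cap I_{n,j})\big)\le(\varepsilon k+\eta)\sum_{j=1}^{\infty}m_{\mathrm{id}}(I_{n,j})\le(\varepsilon k+\eta)\big(m_{\mathrm{id},*}(E_{\varepsilon,k;\eta,n})+\eta\big).$$
This, along with Theorem 5.1.3(ii-2), derives
$$m_{\mathrm{id},*}\big(f(E_{\varepsilon,k})\big)=\lim_{n\to\infty}m_{\mathrm{id},*}\big(f(E_{\varepsilon,k;\eta,n})\big)\le(\varepsilon k+\eta)\lim_{n\to\infty}\big(m_{\mathrm{id},*}(E_{\varepsilon,k;\eta,n})+\eta\big)=(\varepsilon k+\eta)\big(m_{\mathrm{id},*}(E_{\varepsilon,k})+\eta\big).$$
Upon sending $\eta$ to $0$ in the last estimate, we reach the desired inequality.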

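Step 2: proving $m_{\mathrm{id},*}\big(f(E)\big)\le\int_E|f'|\,dm_{\mathrm{id}}$. As a matter of fact, via using $E=\cup_{k=1}^{\infty}E_{\varepsilon,k}$ (where $E$ and every $E_{\varepsilon,k}$ are $m_{\mathrm{id}}$-measurable) and Step 1, we obtain
$$\begin{aligned}m_{\mathrm{id},*}\big(f(E)\big)&=m_{\mathrm{id},*}\big(\cup_{k=1}^{\infty}f(E_{\varepsilon,k})\big)\le\sum_{k=1}^{\infty}m_{\mathrm{id},*}\big(f(E_{\varepsilon,k})\big)\\&\le\sum_{k=1}^{\infty}(\varepsilon k)\,m_{\mathrm{id},*}(E_{\varepsilon,k})=\sum_{k=1}^{\infty}\varepsilon(k-1)\,m_{\mathrm{id},*}(E_{\varepsilon,k})+\varepsilon\sum_{k=1}^{\infty}m_{\mathrm{id},*}(E_{\varepsilon,k})\\&=\sum_{k=1}^{\infty}\varepsilon(k-1)\,m_{\mathrm{id}}(E_{\varepsilon,k})+\varepsilon\sum_{k=1}^{\infty}m_{\mathrm{id}}(E_{\varepsilon,k})\\&\le\sum_{k=1}^{\infty}\int_{E_{\varepsilon,k}}|f'|\,dm_{\mathrm{id}}+\varepsilon\,m_{\mathrm{id}}(E)=\int_E|f'|\,dm_{\mathrm{id}}+\varepsilon\,m_{\mathrm{id}}(E),\end{aligned}$$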

thereby arriving at the desired inequality as $\varepsilon\to0$. Now, an application of Step 2 derives
$$|f(d)-f(c)|\le m_{\mathrm{id},*}\big(f([c,d])\big)\le m_{\mathrm{id},*}\big(f(E)\big)+m_{\mathrm{id},*}\big(f(F)\big)\le\int_E|f'|\,dm_{\mathrm{id}}+\int_F|f'|\,dm_{\mathrm{id}}=\int_{[c,d]}|f'|\,dm_{\mathrm{id}}.$$

Accordingly, for $\varepsilon>0$ there is $\delta>0$ such that if $\{(a_j,b_j)\}_{j=1}^{n}$ are mutually disjoint subintervals of $[a,b]$ with
$$m_{\mathrm{id}}\big(\cup_{j=1}^{n}(a_j,b_j)\big)=\sum_{j=1}^{n}(b_j-a_j)<\delta,$$
then
$$\int_{\cup_{j=1}^{n}[a_j,b_j]}|f'|\,dm_{\mathrm{id}}<\varepsilon,$$
and hence
$$\sum_{j=1}^{n}|f(b_j)-f(a_j)|\le\sum_{j=1}^{n}\int_{(a_j,b_j)}|f'|\,dm_{\mathrm{id}}=\int_{\cup_{j=1}^{n}(a_j,b_j)}|f'|\,dm_{\mathrm{id}}<\varepsilon,$$
namely, $f\in AC[a,b]$.
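The choice $\delta=(2n_0)^{-1}\varepsilon$ in the proof that (ii) implies (i) can be made concrete. For the assumed example $g(x)=x^{-1/2}$ on $(0,1]$ (not taken from the text) one computes $\int_{[0,1]}(|g|-\min\{|g|,n\})\,dm_{\mathrm{id}}=n^{-1}$, and because $g$ is decreasing, the largest value of $\int_E|g|\,dm_{\mathrm{id}}$ over sets $E\subseteq[0,1]$ with $m_{\mathrm{id}}(E)=t$ is $\int_{(0,t)}g\,dm_{\mathrm{id}}=2\sqrt{t}$. The Python sketch below (helper names are hypothetical) picks $n_0$ and $\delta$ as in the proof and reports this worst case, which stays below $\varepsilon$.

```python
import math
from typing import Tuple

def tail_mass(n: int) -> float:
    # integral over [0, 1] of g - min(g, n) for g(x) = x^{-1/2}: g > n exactly on
    # (0, n^{-2}), so the integral is ∫_0^{n^{-2}} (x^{-1/2} - n) dx = 2/n - 1/n = 1/n
    return 1.0 / n

def choose_delta(eps: float) -> Tuple[int, float]:
    # smallest n0 with tail_mass(n0) < eps/2, then delta = eps/(2 n0) as in the proof
    n0 = 1
    while tail_mass(n0) >= eps / 2:
        n0 += 1
    return n0, eps / (2 * n0)

def worst_case(measure: float) -> float:
    # g is decreasing, so among E in [0, 1] with m(E) = measure the integral of g
    # over E is largest for E = (0, measure), where it equals 2 sqrt(measure)
    return 2.0 * math.sqrt(measure)

for eps in (0.5, 0.1, 0.01):
    n0, delta = choose_delta(eps)
    print(eps, n0, delta, worst_case(delta))
```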

As the first application of Theorem 5.3.4, we have the following Lebesgue integration-by-parts formula.

Corollary 5.3.5. If $f,g\in AC[a,b]$, then
$$\int_{[a,b]}fg'\,dm_{\mathrm{id}}+\int_{[a,b]}f'g\,dm_{\mathrm{id}}=f(b)g(b)-f(a)g(a).$$

Proof. Since $f,g\in AC[a,b]$, from Proposition 5.3.3(i) it follows that $fg\in AC[a,b]$. Consequently, Theorem 5.3.4 derives
$$\int_{[a,b]}(fg'+f'g)\,dm_{\mathrm{id}}=\int_{[a,b]}(fg)'\,dm_{\mathrm{id}}=f(b)g(b)-f(a)g(a),$$
as desired.

As the second application of Theorem 5.3.4, we establish the following change-of-variable formula for $\mathrm{LRS}_{\mathrm{id}}[a,b]$.

Corollary 5.3.6. If

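$$\begin{cases}f\in\mathrm{LRS}_{\mathrm{id}}[c,d];\\ \phi\in AC[a,b];\\ \phi([a,b])\subseteq[c,d];\\ (f\circ\phi)\phi'\in\mathrm{LRS}_{\mathrm{id}}[a,b],\end{cases}$$
then
$$\int_{\phi(t_1)}^{\phi(t_2)}f\,dm_{\mathrm{id}}=\int_{[t_1,t_2]}(f\circ\phi)\phi'\,dm_{\mathrm{id}}\quad\text{for all }a\le t_1<t_2\le b,$$
where $\int_u^vf\,dm_{\mathrm{id}}$ means $\int_{[u,v]}f\,dm_{\mathrm{id}}$ if $u\le v$ and $-\int_{[v,u]}f\,dm_{\mathrm{id}}$ if $u>v$; when $\phi$ is increasing, the left-hand side is $\int_{\phi([t_1,t_2])}f\,dm_{\mathrm{id}}$.

Proof. Upon setting $F(x)=\int_{[c,x]}f\,dm_{\mathrm{id}}$, we obtain $F\in AC[c,d]$ and
$$(F\circ\phi)'(t)=f(\phi(t))\phi'(t)\quad\text{for }m_{\mathrm{id}}\text{-a.e. }t\in[a,b],$$
thereby using Theorem 5.3.4 to derive that if $a\le t_1<t_2\le b$, then
$$\int_{\phi(t_1)}^{\phi(t_2)}f\,dm_{\mathrm{id}}=(F\circ\phi)(t_2)-(F\circ\phi)(t_1)=\int_{[t_1,t_2]}(F\circ\phi)'\,dm_{\mathrm{id}}=\int_{[t_1,t_2]}(f\circ\phi)\phi'\,dm_{\mathrm{id}},$$
as desired.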
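A quick numerical check of Corollaries 5.3.5 and 5.3.6 may help. The Python sketch below uses assumed examples: $f(x)=\sqrt{x}$ and $g(x)=x$ on $[0,1]$ for integration by parts ($f'$ is unbounded near $0$ but integrable, and both sides equal $1$), and the integrand $y\mapsto y^2$ with the increasing map $\phi(t)=t^2$ on $[0.2,0.9]$ for the change of variable; since this $\phi$ is increasing, $\phi([t_1,t_2])=[\phi(t_1),\phi(t_2)]$. A midpoint rule is used because it never evaluates the endpoint singularity of $f'$.

```python
import math

def midpoint(h, a: float, b: float, n: int = 200_000) -> float:
    # composite midpoint rule; it never evaluates the endpoints, so the
    # integrable singularity of x^{-1/2} at x = 0 causes no division by zero
    step = (b - a) / n
    return step * math.fsum(h(a + (k + 0.5) * step) for k in range(n))

# Corollary 5.3.5 with the assumed pair f(x) = sqrt(x), g(x) = x on [0, 1]
f = math.sqrt
df = lambda x: 0.5 / math.sqrt(x)        # f' on (0, 1], integrable near 0
g = lambda x: x
dg = lambda x: 1.0
ibp_left = midpoint(lambda x: f(x) * dg(x) + df(x) * g(x), 0.0, 1.0)
ibp_right = f(1.0) * g(1.0) - f(0.0) * g(0.0)

# Corollary 5.3.6 with the assumed integrand y -> y^2 and increasing phi(t) = t^2
phi = lambda t: t * t
dphi = lambda t: 2.0 * t
t1, t2 = 0.2, 0.9
cov_left = midpoint(lambda y: y * y, phi(t1), phi(t2))         # over phi([t1, t2])
cov_right = midpoint(lambda t: phi(t) ** 2 * dphi(t), t1, t2)   # (f∘phi) phi'
print(ibp_left, ibp_right, cov_left, cov_right)
```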

5.4 Cantor’s Ternary Set and Singular Function The Cantor ternary set C is constructed by iteratively deleting the open middle third from a family of line segments within the closed interval [0, 1]. More precisely, let ( C0 = [0, 1]; αE + β = {αx + β : x ∈ E} for E ⊆ R and α, β ∈ R. Then, the first step is to remove the open interval (3−1 , 3−1 · 2) from C0 - the remaining part is C1 = [0, 3−1] ∪ [3−1 · 2, 1].

We continue this process by removing the middle third of each interval in C1 to reach the second step C2 = [0, 3−2] ∪ [3−2 · 2, 3−1] ∪ [0, 3−1 · 2] ∪ [3−2 · 23 , 1]. For the 3 ≤ n-th step, we remove the middle third of each interval in the recursively-defined Cn−1 to get Cn = 3−1 Cn−1 ∪ (3−1 · 2 + 3−1 Cn−1 ).

Clearly, $\{C_k\}_{k=0}^{\infty}$ is a decreasing sequence of closed subsets of $[0,1]$, and each $C_k$ with $k \ge 1$ is the union of $2^k$ mutually disjoint closed intervals $\{M_{k,j}\}_{j=1}^{2^k}$ with $m_{id}(M_{k,j}) = 3^{-k}$. Now, Cantor’s ternary set is defined by $C = \cap_{k=0}^{\infty} C_k$.
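The recursion $C_n = 3^{-1}C_{n-1} \cup (3^{-1}\cdot 2 + 3^{-1}C_{n-1})$ is easy to run exactly. The Python sketch below uses illustrative function names that do not come from the text. It represents each $C_n$ by the list of its closed intervals $M_{n,j}$, using exact rational arithmetic. For small $n$ it confirms that there are $2^n$ intervals of length $3^{-n}$, so that $m_{id}(C_n) = (2\cdot 3^{-1})^n$.

```python
from fractions import Fraction


def cantor_step(intervals):
    """One step of C_n = (1/3) C_{n-1} ∪ (2/3 + (1/3) C_{n-1}) on a list of closed intervals."""
    third = Fraction(1, 3)
    left = [(third * lo, third * hi) for lo, hi in intervals]
    right = [(2 * third + third * lo, 2 * third + third * hi) for lo, hi in intervals]
    return left + right  # stays sorted: left part lies in [0, 1/3], right part in [2/3, 1]


def cantor_level(n):
    """Return the 2**n closed intervals M_{n,j} whose union is C_n, in increasing order."""
    intervals = [(Fraction(0), Fraction(1))]  # C_0 = [0, 1]
    for _ in range(n):
        intervals = cantor_step(intervals)
    return intervals


C2 = cantor_level(2)
print([(str(lo), str(hi)) for lo, hi in C2])
# [('0', '1/9'), ('2/9', '1/3'), ('2/3', '7/9'), ('8/9', '1')]

# Total length m_id(C_n) for n = 0, ..., 5; it equals (2/3)**n.
lengths = [sum(hi - lo for lo, hi in cantor_level(n)) for n in range(6)]
print([str(total) for total in lengths])
# ['1', '2/3', '4/9', '8/27', '16/81', '32/243']
```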

Absolute Continuities in Lebesgue Integrals


Theorem 5.4.1.
(i) $C$ is closed and yet has no isolated points.
(ii) $m_{id}(C) = 0$.
(iii) $C$ consists of all ternary numbers $\sum_{j=1}^{\infty} 3^{-j}a_j$ with $a_j \in \{0,2\}$.
(iv) $C$ is uncountable.

Proof. (i) Because each $C_j$ is closed, so is $\cap_{j=0}^{\infty} C_j = C$. If $x \in C$ and $\varepsilon > 0$, then there is a large integer $j$ such that $3^{-j} < \varepsilon$. Hence $x \in C_j$ gives $x \in M_{j,l}$ for some interval $M_{j,l} \subseteq C_j$ with $m_{id}(M_{j,l}) = 3^{-j}$. Accordingly, both endpoints of $M_{j,l}$ lie in $(x - \varepsilon, x + \varepsilon)$, and at least one of them differs from $x$. Since the endpoints of intervals in $C_j$ are not removed in the subsequent $C_k$, $k > j$, there is a point $y \in C \cap \big((x - \varepsilon, x + \varepsilon) \setminus \{x\}\big)$. Thus, $C$ contains no isolated points.
(ii) Note that, for each $k$, the intervals $\{M_{k,j}\}_{j=1}^{2^k}$ are mutually disjoint. So it follows from Theorem 5.1.3(ii) that
$$m_{id,*}(C) \le m_{id,*}(C_k) \le m_{id,*}\big(\cup_{j=1}^{2^k} M_{k,j}\big) \le \sum_{j=1}^{2^k} m_{id,*}(M_{k,j}) = (2\cdot 3^{-1})^k \to 0 \quad \text{as } k \to \infty.$$
Consequently, $m_{id}(C) = 0$.
(iii) On the one hand, we have the following implication

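$$\begin{aligned} x = \sum_{j=1}^{\infty} 3^{-j}a_j \ \text{with } a_j \in \{0,2\} &\Rightarrow 3^{-1}x = \sum_{j=1}^{\infty} 3^{-j}b_j \ \text{with } b_1 = 0 \text{ and } b_j = a_{j-1} \text{ for } j \ge 2\\ &\Rightarrow 3^{-1}\cdot 2 + 3^{-1}x = \sum_{j=1}^{\infty} 3^{-j}c_j \ \text{with } c_1 = 2 \text{ and } c_j = a_{j-1} \text{ for } j \ge 2. \end{aligned}$$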

Accordingly, $x \in C_1$ if and only if $x$ is of the form $3^{-1}y$ or $2\cdot 3^{-1} + 3^{-1}y$ for some $y \in [0,1]$. Upon repeating this argument, we find that $x \in C_k$ amounts to $x = \sum_{j=1}^{k} 3^{-j}a_j + 3^{-k}y$ with $a_j \in \{0,2\}$ for $j \in \{1, \dots, k\}$ and some $y \in [0,1]$. Hence, if $x = \sum_{j=1}^{\infty} 3^{-j}a_j$ with all $a_j \in \{0,2\}$, then $x \in C_k$ for every $k$ (take $y = \sum_{j=1}^{\infty} 3^{-j}a_{k+j}$), that is, $x \in C$. On the other hand, let $x \in C$. If the ternary expansion $x = \sum_{j=1}^{\infty} 3^{-j}a_j$ with $a_j \in \{0,2\}$ is unique, then the previous argument reveals that the digits produced at every step lie in $\{0,2\}$, and hence the desired result follows. It remains to verify the uniqueness. In fact, let $x$ have two ternary expansions

$$\sum_{j=1}^{\infty} 3^{-j}a_j = x = \sum_{j=1}^{\infty} 3^{-j}b_j \quad \text{with } a_j, b_j \in \{0,2\}.$$

If the statement “$a_j = b_j$ for all $j \in \mathbb{N}$” is not valid, then there exists a minimal $n$ such that $a_n \ne b_n$. This ensures $a_j = b_j$ for all $j \in \{1, \dots, n-1\}$, and, since $a_n, b_n \in \{0,2\}$, one of $a_n$ and $b_n$ is $0$ and the other is $2$. Without loss of generality, we may take $a_n = 0$ and $b_n = 2$, thereby getting

$$\sum_{j=1}^{n-1} 3^{-j}a_j + \sum_{j=n+1}^{\infty} 3^{-j}a_j = \sum_{j=1}^{\infty} 3^{-j}a_j = \sum_{j=1}^{\infty} 3^{-j}b_j = \sum_{j=1}^{n-1} 3^{-j}b_j + 3^{-n}\cdot 2 + \sum_{j=n+1}^{\infty} 3^{-j}b_j.$$

The choice of $n$ gives $a_j = b_j$ for $j \in \{1, \dots, n-1\}$, and thus
$$3^{-n}\cdot 2 + \sum_{j=n+1}^{\infty} 3^{-j}b_j = \sum_{j=n+1}^{\infty} 3^{-j}a_j \le 2\sum_{j=n+1}^{\infty} 3^{-j} = 3^{-n}.$$

Consequently, we meet the contradiction
$$0 \le \sum_{j=n+1}^{\infty} 3^{-j}b_j \le -3^{-n} \quad \text{where } b_j \in \{0,2\}.$$

So $a_j = b_j$ for all $j \in \mathbb{N}$; namely, the ternary representation of $x$ with digits in $\{0,2\}$ is unique.
(iv) Since $[0,1]$ is uncountable, it is enough to find a surjective map from $C$ onto $[0,1]$. To do so, according to (iii) we define
$$C \ni x = \sum_{j=1}^{\infty} 3^{-j}a_j \mapsto c(x) = \sum_{j=1}^{\infty} 2^{-1-j}a_j \in [0,1].$$


Note that any $y \in [0,1]$ has a binary expansion $y = \sum_{j=1}^{\infty} 2^{-j}b_j$ with $b_j \in \{0,1\}$. So, if $x = \sum_{j=1}^{\infty} 3^{-j}(2b_j)$, then $x \in C$ and $c(x) = y$; namely, $c$ is surjective.

Next, we extend the Cantor map $c$ from the proof of Theorem 5.4.1(iv) to $[0,1]$. Let $a$ and $b$ be the left and right endpoints, respectively, of an open interval $(a,b)$ that was removed in the $k$-th step, that is, the open middle third of one of the intervals of $C_{k-1}$. Then, for some $a_1, \dots, a_{k-1} \in \{0,2\}$,
$$\begin{cases} a = \sum_{j=1}^{k-1} 3^{-j}a_j + 0\cdot 3^{-k} + \sum_{j=k+1}^{\infty} 2\cdot 3^{-j};\\ b = \sum_{j=1}^{k-1} 3^{-j}a_j + 2\cdot 3^{-k} + \sum_{j=k+1}^{\infty} 0\cdot 3^{-j}. \end{cases}$$
This yields
$$\begin{cases} c(a) = \sum_{j=1}^{k-1} 2^{-1-j}a_j + 0\cdot 2^{-k} + \sum_{j=k+1}^{\infty} 1\cdot 2^{-j} = \sum_{j=1}^{k-1} 2^{-1-j}a_j + 2^{-k};\\ c(b) = \sum_{j=1}^{k-1} 2^{-1-j}a_j + 1\cdot 2^{-k} + \sum_{j=k+1}^{\infty} 0\cdot 2^{-j} = c(a). \end{cases}$$
Since the union of $C$ and the removed intervals is $[0,1]$, Cantor’s singular function $c_s : [0,1] \to [0,1]$ is now defined as
$$c_s(x) = \begin{cases} c(x), & x \in C;\\ \sup_{C \ni y \le x} c(y), & x \in [0,1] \setminus C, \end{cases}$$
and enjoys the following assertion.

Theorem 5.4.2.
(i) $c_s$ is increasing.
(ii) $c_s$ is continuous.
(iii) $c_s$ has zero derivative on $[0,1] \setminus C$.
(iv) $c_s$ belongs to $BV[0,1] \setminus AC[0,1]$.

Proof. (i) Firstly, let

$$x = \sum_{j=1}^{\infty} 3^{-j}x_j,\quad y = \sum_{j=1}^{\infty} 3^{-j}y_j \in C \quad \text{with } x_j, y_j \in \{0,2\}.$$
If $x < y$, then for some $k$ the points $x$ and $y$ lie in the same interval of $C_j$ for every $j < k$, while at step $k$ the point $y$ lies in a later interval. Namely, $x_j = y_j$ for all $j < k$ and $x_k < y_k$. However, $x_k < y_k$ together with $x_k, y_k \in \{0,2\}$ implies $x_k = 0$ and $y_k = 2$. Accordingly,
$$c_s(x) = \sum_{j=1}^{k-1} 2^{-1-j}x_j + \sum_{j=k+1}^{\infty} 2^{-1-j}x_j \le \sum_{j=1}^{k-1} 2^{-1-j}y_j + 2^{-k} \le c_s(y).$$


Secondly, let $x, y \in [0,1]$ with $x < y$, and suppose that $c_s(x) > c_s(y)$. There exist two points $x_0, y_0 \in C$ obeying $c_s(x) = c_s(x_0)$ and $c_s(y) = c_s(y_0)$. In fact, $x_0$ and $y_0$ can be chosen to be the left and right endpoints of the removed intervals to which $x$ and $y$ belong, respectively, or $x_0 = x$ and $y_0 = y$ if these points are in $C$. Then $x_0 \le x < y \le y_0$, so $x_0 < y_0$. Hence $x_0, y_0 \in C$ and the first part give $c_s(x) = c_s(x_0) \le c_s(y_0) = c_s(y)$, against $c_s(x) > c_s(y)$. Consequently, $c_s$ is increasing.
(ii) Evidently, $c_s$ is continuous at every $x \in [0,1] \setminus C$, since it is constant on the removed open interval containing $x$. Now, if $x \in C$, then we consider two situations. The first is to prove that if $y \in C$ is close to $x$ then $c_s(y)$ is close to $c_s(x)$. Without loss of generality, we may take $y > x$ and $y - x < 3^{-l}$ for a given $l \in \mathbb{N}$. Upon taking $\{x_j\}$ and $\{y_j\}$ as the respective coefficients of the ternary expansions of $x$ and $y$ with digits in $\{0,2\}$, we get

$$0 < y - x = \sum_{j=1}^{\infty} 3^{-j}(y_j - x_j) < 3^{-l},$$
thereby reaching $x_j = y_j$ for all $j < l$. Otherwise, if $j_0 < l$ is the minimal index with $x_{j_0} \ne y_{j_0}$, then either $x_{j_0} = 2$, $y_{j_0} = 0$ or $x_{j_0} = 0$, $y_{j_0} = 2$. Hence $y - x < 0$ or $y - x \ge 3^{-j_0} > 3^{-l}$, contradicting $0 < y - x < 3^{-l}$. Consequently, we achieve
$$c_s(y) - c_s(x) = \sum_{j=1}^{\infty} 2^{-1-j}(y_j - x_j) = \sum_{j=l}^{\infty} 2^{-1-j}(y_j - x_j) \le 2^{1-l},$$
and, for any $\varepsilon > 0$, it suffices to take $l$ so large that $2^{1-l} < \varepsilon$.

The second is to prove that if $y \notin C$ is close to $x$ then $c_s(y)$ is also close to $c_s(x)$. In fact, for $y \notin C$ there is $y_0 \in C$ such that $c_s(y) = c_s(y_0)$ and $|x - y_0| \le |x - y|$. To see this, take $y_0$ to be the endpoint of the removed interval containing $y$ that is closer to $x$; both endpoints have the same value of $c_s$. But then, by the first situation, for any $\varepsilon > 0$ there is $\delta > 0$ such that $|x - y_0| < \delta \Rightarrow |c_s(x) - c_s(y_0)| < \varepsilon$.


Since $c_s(y) = c_s(y_0)$, we just take $|x - y| < \delta$.
(iii) If $x \in [0,1] \setminus C$, then $c_s$ is constant on an open interval around $x$, and hence
$$c_s'(x) = \lim_{t \to 0} t^{-1}\big(c_s(x+t) - c_s(x)\big) = 0.$$
(iv) According to the just-proved (iii) and $m_{id}(C) = 0$, we have $c_s'(x) = 0$ for $m_{id}$-a.e. $x \in [0,1]$, and
$$\int_{[0,1]} c_s'\,dm_{id} = 0 \ne 1 = c_s(1) - c_s(0).$$
Thus the Fundamental Theorem of Lebesgue Calculus fails for $c_s$, and so $c_s \notin AC[0,1]$. However, $c_s \in BV[0,1]$, because $c_s$ is increasing by (i).
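Theorem 5.4.2 can be explored with a short exact computation. The following Python sketch is an approximation under stated assumptions and does not reproduce the book's definition verbatim. It evaluates $c_s$ by scanning the ternary digits of a rational $x \in [0,1]$ with exact `Fraction` arithmetic, and maps each digit 2 to a binary digit 1, as the map $c$ does. It stops at the first digit 1, which places $x$ in $[a,b)$ for a removed interval $(a,b)$ where $c_s = c(a)$. Reading is truncated after `depth` digits (a hypothetical parameter). Values at points whose expansions never reach a digit 1 are therefore approximations from below.

```python
from fractions import Fraction


def cantor_function(x, depth=60):
    """Approximate Cantor's singular function c_s(x) for a rational 0 <= x <= 1.

    Ternary digits of x are read exactly; a digit 2 contributes a binary digit 1,
    and the first digit 1 means x lies in [a, b) for a removed interval (a, b),
    where c_s equals c(a). At most `depth` digits are read.
    """
    x = Fraction(x)
    if x >= 1:
        return Fraction(1)
    value, scale = Fraction(0), Fraction(1, 2)
    for _ in range(depth):
        x *= 3
        digit = int(x)  # next ternary digit: 0, 1 or 2 (x >= 0, so int() is the floor)
        x -= digit
        if digit == 1:
            return value + scale
        value += scale * (digit // 2)
        scale /= 2
    return value


print(cantor_function(Fraction(1, 3)), cantor_function(Fraction(2, 3)))  # 1/2 1/2
print(float(cantor_function(Fraction(1, 4))))  # close to 1/3, since 1/4 = (0.0202...)_3

grid = [Fraction(k, 1000) for k in range(1001)]
values = [cantor_function(t) for t in grid]
assert all(s <= t for s, t in zip(values, values[1:]))  # c_s is increasing on the grid
```

The value $c_s(3^{-1}) = c_s(3^{-1}\cdot 2) = 2^{-1}$ matches the endpoint formula above with $k = 1$. Together with $c_s(1) - c_s(0) = 1$, these samples reflect the failure of absolute continuity in (iv).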

5.5 Lebesgue’s Points

Needless to say, Cantor’s singular function $c_s$, being continuous on the compact interval $[0,1]$, is uniformly continuous on $[0,1]$. However, there is a clear difference between the continuity and the uniform continuity of a function on $\mathbb{R}$.

Theorem 5.5.1. Let $f : \mathbb{R} \to \mathbb{R}$ be locally $m_{id}$-integrable, namely, $f1_{[-r,r]} \in LRS_{id}(\mathbb{R})$ for all $r > 0$.
(i) If $f$ is continuous, denoted $f \in C(\mathbb{R})$, then
$$\lim_{r \to 0} (2r)^{-1}\int_{(x-r,x+r)} |f - f(x)|\,dm_{id} = 0 \quad \text{for all } x \in \mathbb{R},$$
but not conversely.
(ii) $f$ is uniformly continuous on $\mathbb{R}$, denoted $f \in UC(\mathbb{R})$, if and only if for any $\varepsilon > 0$ there is a $\delta > 0$ such that
$$(2r)^{-1}\int_{(x-r,x+r)} |f - f(x)|\,dm_{id} < \varepsilon \quad \text{for all } (x,r) \in \mathbb{R} \times (0,\delta).$$
Here uniform continuity means that for any $\varepsilon > 0$ there exists $\delta > 0$ such that $|y - x| < \delta \Rightarrow |f(y) - f(x)| < \varepsilon$.

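Proof. (i) Suppose that $f \in C(\mathbb{R})$ and $x \in \mathbb{R}$. Then for any $\varepsilon > 0$ there is $\delta > 0$ such that $|y - x| < \delta \Rightarrow |f(y) - f(x)| < \varepsilon$. This implies that, for $0 < r < \delta$,
$$(2r)^{-1}\int_{(x-r,x+r)} |f - f(x)|\,dm_{id} \le \varepsilon,$$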

as desired. To see that the converse fails, we may consider the function $f^0 = \sum_{j=1}^{\infty} f_j$, where
$$f_j(x) = \begin{cases} j^3x - j^2, & 0 \le x - j^{-1} \le 2^{-1}j^{-3};\\ 1 + j^2 - j^3x, & 2^{-1}j^{-3} \le x - j^{-1} \le j^{-3};\\ 0, & x - j^{-1} \in \mathbb{R} \setminus [0, j^{-3}]. \end{cases}$$

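Clearly, $f^0$ is discontinuous at $0$, since $f^0(0) = 0$ while $f^0(j^{-1} + 2^{-1}j^{-3}) = 2^{-1}$ for every $j \in \mathbb{N}$. But if $(j_0+1)^{-1} \le r < j_0^{-1}$ for some $j_0 \in \mathbb{N}$, then a computation gives
$$\begin{aligned} (2r)^{-1}\int_{(-r,r)} |f^0 - f^0(0)|\,dm_{id} &= (2r)^{-1}\sum_{j=1}^{\infty}\int_{(0,r)} |f_j|\,dm_{id}\\ &\le 2^{-1}(j_0+1)\sum_{j=j_0}^{\infty}\int_{(0,r)} |f_j|\,dm_{id}\\ &\le 2^{-1}(j_0+1)\sum_{j=j_0}^{\infty}\int_{\mathbb{R}} |f_j|\,dm_{id}\\ &= 2^{-3}(j_0+1)\sum_{j=j_0}^{\infty} j^{-3}. \end{aligned}$$
… 0 there is a $\delta > 0$ such that
$$(2r)^{-1}\int_{(x-r,x+r)} |f - f(x)|\,dm_{id} < \varepsilon \quad \text{for all } (x,r) \in \mathbb{R} \times (0,\delta).$$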

If $f \notin UC(\mathbb{R})$, then there exists an $\varepsilon_0 > 0$ such that for every $\delta > 0$ there is a pair of points $\{x_0, y_0\}$ satisfying $|x_0 - y_0| < 2^{-1}\delta$ and $|f(x_0) - f(y_0)| > \varepsilon_0$. If $r_0 = 2|x_0 - y_0| < \delta$, then
$$(x_0 - 2^{-1}r_0, x_0 + 2^{-1}r_0) \subseteq (x_0 - r_0, x_0 + r_0) \cap (y_0 - r_0, y_0 + r_0),$$
and hence
$$\begin{aligned} (2r_0)^{-1}\Big(\int_{(x_0-r_0,x_0+r_0)} |f - f(x_0)|\,dm_{id} + \int_{(y_0-r_0,y_0+r_0)} |f - f(y_0)|\,dm_{id}\Big) &\ge (2r_0)^{-1}\int_{(x_0-2^{-1}r_0,x_0+2^{-1}r_0)} \big(|f - f(x_0)| + |f - f(y_0)|\big)\,dm_{id}\\ &\ge (2r_0)^{-1}\int_{(x_0-2^{-1}r_0,x_0+2^{-1}r_0)} |f(x_0) - f(y_0)|\,dm_{id}\\ &\ge 2^{-1}\varepsilon_0. \end{aligned}$$

Consequently, we have
$$2^{-2}\varepsilon_0 \le (2r_0)^{-1}\int_{(x_0-r_0,x_0+r_0)} |f - f(x_0)|\,dm_{id} \quad \text{or} \quad 2^{-2}\varepsilon_0 \le (2r_0)^{-1}\int_{(y_0-r_0,y_0+r_0)} |f - f(y_0)|\,dm_{id}.$$

In other words, we get $\varepsilon_1 = 2^{-2}\varepsilon_0 > 0$ such that for every $\delta > 0$ there are $z_0 \in \mathbb{R}$ and $r_0 \in (0,\delta)$ obeying
$$\varepsilon_1 \le (2r_0)^{-1}\int_{(z_0-r_0,z_0+r_0)} |f - f(z_0)|\,dm_{id},$$
contradicting the assumption. So $f \in UC(\mathbb{R})$.

The original Lebesgue differentiation theorem states that if $f$ is $m_{id}$-integrable on $\mathbb{R}$, i.e., $f \in LRS_{id}(\mathbb{R})$, then
$$F(x) = \int_{(-\infty,x)} f\,dm_{id}$$
has the following property:
$$F'(x) = \lim_{r \to 0} (2r)^{-1}\int_{(x-r,x+r)} f\,dm_{id} = f(x) \quad \text{for } m_{id}\text{-a.e. } x \in \mathbb{R}.$$

In part, this is an analogue, or an extension, of Theorem 5.3.4, which is stated over a compact interval. Based on Theorem 5.5.1, there is a slightly stronger statement.
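The averages appearing here, and the Lebesgue points defined next, can be inspected on the simplest discontinuous example. The following Python sketch takes $f = 1_{[0,1]}$, which is our choice and not from the text. For this $f$, $(2r)^{-1}\int_{(x-r,x+r)}|f - f(x)|\,dm_{id}$ is just the proportion of $(x-r, x+r)$ on which $f \ne f(x)$, so no numerical integration is needed. As $r \to 0$, the averages tend to $0$ at $x = 0.5$ and $x = 1.5$, but stay at $2^{-1}$ at the jump $x = 0$. So $0$ is not a Lebesgue point of this $f$. This is consistent with the almost-everywhere statement below, because $\{0, 1\}$ is $m_{id}$-null.

```python
def overlap(a, b, c, d):
    """Length of (a, b) ∩ [c, d]."""
    return max(0.0, min(b, d) - max(a, c))


def mean_oscillation(x, r):
    """(2r)^{-1} times the integral of |f - f(x)| over (x - r, x + r), for f = 1_[0,1]."""
    inside = overlap(x - r, x + r, 0.0, 1.0)   # measure of the part where f = 1
    if 0.0 <= x <= 1.0:                        # f(x) = 1: f differs from f(x) off [0, 1]
        differing = max(0.0, 2 * r - inside)   # clamp away tiny negative rounding errors
    else:                                      # f(x) = 0: f differs from f(x) on [0, 1]
        differing = inside
    return differing / (2 * r)


for x in (0.5, 0.0, 1.5):
    print(x, [round(mean_oscillation(x, r), 6) for r in (0.1, 0.01, 0.001)])
# 0.5 [0.0, 0.0, 0.0]
# 0.0 [0.5, 0.5, 0.5]
# 1.5 [0.0, 0.0, 0.0]
```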


Theorem 5.5.2. Let $f : \mathbb{R} \to \mathbb{R}$ be locally $m_{id}$-integrable. Then
$$\lim_{r \to 0} (2r)^{-1}\int_{(x-r,x+r)} |f - f(x)|\,dm_{id} = 0 \quad \text{for } m_{id}\text{-a.e. } x \in \mathbb{R}.$$

Accordingly, any point $x$ for which this last formula is true is called a Lebesgue point of $f$.

Proof. A point $x$ is a Lebesgue point of $f$ if and only if it is a Lebesgue point of $f1_{(-r,r)}$ for any given $r > |x|$. So, without loss of generality, we may assume $f \in LRS_{id}(\mathbb{R})$, and then verify the assertion in the following two steps.

Step 1 (Hardy-Littlewood’s maximal theorem): if $f \in LRS_{id}(\mathbb{R})$ and
$$Mf(x) = \sup_{r > 0} (2r)^{-1}\int_{(x-r,x+r)} |f|\,dm_{id} \quad \text{for all } x \in \mathbb{R},$$
then
$$m_{id}\big(\{x \in \mathbb{R} : Mf(x) > \lambda\}\big) \le 2\lambda^{-1}\int_{\mathbb{R}} |f|\,dm_{id} \quad \text{for all } \lambda > 0.$$

To validate this result of Godfrey Harold Hardy and John Edensor Littlewood, given $f \in LRS_{id}(\mathbb{R})$ let $E_\lambda(f) = \{x \in \mathbb{R} : Mf(x) > \lambda\}$. Then $E_\lambda(f)$ is open. In fact, if $x \in E_\lambda(f)$, then there is $r > 0$ such that
$$(2r)^{-1}\int_{(x-r,x+r)} |f|\,dm_{id} > \lambda.$$
Upon choosing $s > r$ such that
$$(2s)^{-1}\int_{(x-r,x+r)} |f|\,dm_{id} > \lambda,$$
we get that $|y - x| < s - r \Rightarrow (x - r, x + r) \subseteq (y - s, y + s)$ and
$$\lambda < (2s)^{-1}\int_{(x-r,x+r)} |f|\,dm_{id} \le (2s)^{-1}\int_{(y-s,y+s)} |f|\,dm_{id} \le Mf(y),$$

namely, $y \in E_\lambda(f)$, and so $E_\lambda(f)$ is an open set. Accordingly, any $x \in E_\lambda(f)$ has an open interval $I_x = (x - \rho, x + \rho) \subseteq E_\lambda(f)$ enjoying $x \in I_x$ and
$$(m_{id}(I_x))^{-1}\int_{I_x} |f|\,dm_{id} > \lambda.$$

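Consequently, $E_\lambda(f) = \cup_{x \in E_\lambda(f)} I_x$. Since $f \in LRS_{id}(\mathbb{R})$ gives
$$\varepsilon = \lambda^{-1}\int_{\mathbb{R}} |f|\,dm_{id} > m_{id}(I_x) > 0,$$
an application of Theorem 5.1.3(iv) yields a finite family of mutually disjoint open intervals $\{J_j\}_{j=1}^{n}$ such that
$$E_\lambda(f) = \big(E_\lambda(f) \setminus \cup_{j=1}^{n} J_j\big) \cup \big(\cup_{j=1}^{n} J_j\big) \quad \text{and} \quad m_{id}\big(E_\lambda(f) \setminus \cup_{j=1}^{n} J_j\big) < \varepsilon,$$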

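thereby producing the desired estimate
$$\begin{aligned} m_{id}(E_\lambda(f)) &\le \sum_{j=1}^{n} m_{id}(J_j) + \varepsilon\\ &\le \lambda^{-1}\sum_{j=1}^{n}\int_{J_j} |f|\,dm_{id} + \varepsilon\\ &\le \lambda^{-1}\int_{\cup_{j=1}^{n} J_j} |f|\,dm_{id} + \lambda^{-1}\int_{\mathbb{R}} |f|\,dm_{id}\\ &\le 2\lambda^{-1}\int_{\mathbb{R}} |f|\,dm_{id}. \end{aligned}$$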
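The weak-type bound of Step 1 can be watched on an explicit example. Take $f = 1_{[0,1]}$ (our choice). One checks from the definition that $Mf = 1$ on $(0,1)$, $Mf(x) = (2x)^{-1}$ for $x > 1$ and $Mf(x) = (2(1-x))^{-1}$ for $x < 0$. Hence $m_{id}(\{Mf > \lambda\}) = \lambda^{-1} - 1$ for $0 < \lambda \le 2^{-1}$, which is below the bound $2\lambda^{-1}\int_{\mathbb{R}}|f|\,dm_{id} = 2\lambda^{-1}$. The Python sketch below estimates the same level-set measures numerically. It is only a sketch: it replaces the supremum over all $r > 0$ by a maximum over a finite geometric grid of radii, and the measure by a Riemann count on a grid of sample points. Both grids are assumptions.

```python
def centred_average(x, r):
    """(2r)^{-1} * m_id((x - r, x + r) ∩ [0, 1]): the centred average of f = 1_[0,1] at x."""
    return max(0.0, min(x + r, 1.0) - max(x - r, 0.0)) / (2 * r)


# Assumption: sup over all r > 0 is replaced by a max over finitely many radii 1e-4, ..., 1e2.
RADII = [10 ** (k / 50) for k in range(-200, 101)]


def maximal_function(x):
    """Approximate the Hardy-Littlewood maximal function Mf(x) for f = 1_[0,1]."""
    return max(centred_average(x, r) for r in RADII)


STEP = 0.02
GRID = [-10 + STEP * i for i in range(1001)]  # sample points covering [-10, 10]
MF_VALUES = [maximal_function(x) for x in GRID]


def level_set_measure(lam):
    """Riemann-count estimate of m_id({x : Mf(x) > lam}) on the sample grid."""
    return STEP * sum(1 for value in MF_VALUES if value > lam)


for lam in (0.1, 0.25, 0.5):
    # estimate, exact value 1/lam - 1, weak-type bound 2/lam (here the integral of |f| is 1)
    print(lam, round(level_set_measure(lam), 2), round(1 / lam - 1, 2), 2 / lam)
```

The estimates approach $\lambda^{-1} - 1$ from below, because the finite grid of radii slightly underestimates $Mf$, and they stay well under $2\lambda^{-1}$.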

Step 2 (Lebesgue point determination): $m_{id}$-a.e. $x \in \mathbb{R}$ is a Lebesgue point of $f \in LRS_{id}(\mathbb{R})$. To do so, for $x \in \mathbb{R}$ let
$$\tilde{M}f(x) = \limsup_{r \to 0} (2r)^{-1}\int_{(x-r,x+r)} |f - f(x)|\,dm_{id}.$$
Note that $\tilde{M}f(x) = 0$ implies that $x$ is a Lebesgue point of $f$. So, for $\lambda \ge 0$ we consider
$$F_\lambda(f) = \{x \in \mathbb{R} : \tilde{M}f(x) > \lambda\}$$
and prove $m_{id}(F_\lambda(f)) = 0$. The case $\lambda = 0$ implies that $m_{id}$-a.e. $x$ is a Lebesgue point of $f$.

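Now, for any open interval $I$ with centre $x$ we have
$$(m_{id}(I))^{-1}\int_I |f - f(x)|\,dm_{id} \le Mf(x) + |f(x)|,$$
thereby getting $2\lambda < Mf(x) + |f(x)|$ for all $x \in F_{2\lambda}(f)$ and $\lambda > 0$. Thus, $\lambda > 0$ and $x \in F_{2\lambda}(f)$ ensure either $Mf(x) > \lambda$ or $|f(x)| > \lambda$. We apply Step 1, together with $m_{id}(\{x \in \mathbb{R} : |f(x)| > \lambda\}) \le \lambda^{-1}\int_{\mathbb{R}} |f|\,dm_{id}$. Consequently,
$$m_{id}(F_{2\lambda}(f)) \le m_{id}\big(\{x \in \mathbb{R} : Mf(x) > \lambda\}\big) + m_{id}\big(\{x \in \mathbb{R} : |f(x)| > \lambda\}\big) \le 3\lambda^{-1}\int_{\mathbb{R}} |f|\,dm_{id}.$$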

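In accordance with Theorem 2.4.12 and the Fubini theorem, for any $\varepsilon > 0$ there are $h = f * \phi_n \in C(\mathbb{R})$ and $\eta > 0$ such that
$$\int_{\mathbb{R}} |f - h|\,dm_{id} \le \int_{[-\eta,\eta]} |f - f * \phi_n|\,dm_{id} + 2\int_{\mathbb{R}} |f|\,dm_{id}\int_{\mathbb{R} \setminus [-\eta,\eta]} \phi_n\,dm_{id} < \varepsilon.$$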

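This, along with Theorem 5.5.1(i), implies
$$\tilde{M}f(x) \le \tilde{M}(f - h)(x) + \tilde{M}h(x) = \tilde{M}(f - h)(x),$$
and so, if $\lambda > 0$, then
$$m_{id}(F_{2\lambda}(f)) \le m_{id}\big(\{x \in \mathbb{R} : \tilde{M}(f - h)(x) > 2\lambda\}\big) \le 3\lambda^{-1}\int_{\mathbb{R}} |f - h|\,dm_{id} \le 3\lambda^{-1}\varepsilon.$$
Letting $\varepsilon \to 0$ gives
$$m_{id}(F_{2\lambda}(f)) = 0 \quad \text{for all } \lambda > 0,$$
whence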

  mid F0 ( f ) = mid ∪0 0 such that f (x) < κ + ε for all x with 0 < x − c < δ; (ii) lim inf0 0 there is x with 0 < x − c < δ such that f (x) < κ + ε;

(iii) if $D^+f(c) > R \in \mathbb{R}$, then for any $\varepsilon > 0$ there is $h \in (0,\varepsilon)$ enjoying $R < h^{-1}\big(f(c+h) - f(c)\big)$;

(iv) if $D_+f(c) < r \in \mathbb{R}$, then for any $\varepsilon > 0$ there is $h \in (0,\varepsilon)$ enjoying $h^{-1}\big(f(c+h) - f(c)\big) < r$.

5.5. For a compact interval $[a,b]$ and a function $f : [a,b] \to \mathbb{R}$, define its positive and negative variations as follows:
$$\begin{cases} \langle V^+\rangle_a^b f = \sup\Big\{\sum_{j=1}^{n}\big(f(x_j) - f(x_{j-1})\big)^+ : a = x_0 < x_1 < \cdots < x_n = b\Big\};\\ \langle V^-\rangle_a^b f = \sup\Big\{\sum_{j=1}^{n}\big(f(x_j) - f(x_{j-1})\big)^- : a = x_0 < x_1 < \cdots < x_n = b\Big\}. \end{cases}$$


Prove the following assertions:
(i) $\max\{\langle V^+\rangle_a^b f, \langle V^-\rangle_a^b f\} \le V_a^b f \le \langle V^+\rangle_a^b f + \langle V^-\rangle_a^b f$;
(ii) if $f \in AC[a,b]$, then:
(ii-1) $x \mapsto \langle V^+\rangle_{[a,x]} f$ and $x \mapsto \langle V^-\rangle_{[a,x]} f$ are increasing on $[a,b]$;
(ii-2) $V_{[a,x]} f = \langle V^+\rangle_{[a,x]} f + \langle V^-\rangle_{[a,x]} f$ for all $x \in [a,b]$;
(ii-3) $f(x) - f(a) = \langle V^+\rangle_{[a,x]} f - \langle V^-\rangle_{[a,x]} f$ for all $x \in [a,b]$;
(iii) if $f \in BV[a,b]$, then $\int_{[a,b]} |f'|\,dm_{id} \le V_a^b f$, with equality holding for $f \in AC[a,b]$;

(iv) if $f'$ exists everywhere on $[a,b]$ and $f' \in LRS_{id}[a,b]$, then $f \in AC[a,b]$.

5.6. For a compact interval $[a,b]$ let $f \in AC[a,b]$ and let $E \subseteq [a,b]$ be $m_{id}$-measurable. Prove that $f(E)$ is $m_{id}$-measurable.

5.7. Verify Lebesgue’s decomposition theorem on a given compact interval $[a,b]$: if $f : [a,b] \to \mathbb{R}$ is increasing, then there are two increasing functions $g, h$ on $[a,b]$ such that $f = g + h$, where $g \in AC[a,b]$ and $h' = 0$ $m_{id}$-a.e. on $[a,b]$; moreover, $g$ and $h$ are unique up to a constant.

5.8. Given a compact interval $[a,b]$, prove that if $f \in LRS_{id}[a,b]$ then
$$\int_{[a,x]} f\,dm_{id} = 0 \ \text{for all } x \in [a,b] \iff f(t) = 0 \ \text{for } m_{id}\text{-a.e. } t \in [a,b].$$

5.9. Let f : R → R be locally m_id-integrable and 0 < α ≤ 1. Prove that sup over x ≠ y in R

| f (y) − f (x)| (2r)−1−α c} = 0 , p = ∞ is finite. If

$$d_{p,m_g,E}(f_1, f_2) = \begin{cases} \|f_1 - f_2\|_{p,m_g,E}^{p}, & p \in (0,1);\\ \|f_1 - f_2\|_{p,m_g,E}, & p \in [1,\infty], \end{cases}$$
then $d_{p,m_g,E}(f_1,f_2) = 0$ if and only if $f_1 = f_2$ $m_g$-a.e. on $E$. Moreover,
$$d_{p,m_g,E}(f_1,f_2) \le d_{p,m_g,E}(f_1,f_3) + d_{p,m_g,E}(f_3,f_2)$$
for $f_1, f_2, f_3 \in LRS_g^p(E)$. For $p \ge 1$ this is Minkowski’s inequality, and for $0 < p < 1$ it follows from the elementary inequality $(s+t)^p \le s^p + t^p$ for $s, t \ge 0$. Accordingly, $\big(LRS_g^p(E,F), d_{p,m_g,E}\big)$ is a pseudo-metric space. Example 6.1.3 leads to a general fact about metric spaces.


Theorem 6.1.4. (i) Let $X$ be any nonempty set and define the discrete distance $d(x,y)$, $x, y \in X$, by
$$d(x,y) = \begin{cases} 1, & x \ne y;\\ 0, & x = y. \end{cases}$$
Then $(X,d)$ is a metric space.
(ii) If $(X,d)$ is a metric space and $Y$ is a subset of $X$, then $(Y,d)$ is a metric space for which the domain of $d$ is now restricted to $Y \times Y$.

Proof. (i) It is clear that this $d$ satisfies Definition 6.1.1 (i) and (ii). To verify Definition 6.1.1 (iii), consider any points $x, y, z \in X$. If $d(x,z) = 0$, then Definition 6.1.1 (iii) is trivial. If $d(x,z) > 0$, then $x \ne z$ and $d(x,z) = 1$. Hence $y$ cannot equal both $x$ and $z$, so at least one of the distances $d(x,y)$ and $d(y,z)$ must equal $1$. Therefore, Definition 6.1.1 (iii) follows.
(ii) This is an immediate consequence of the definition of a metric space and the concept of a subset.

The topology of a space of points provides a theory of cluster points and convergence for sets of points in the space. In a metric space, this convergence theory is based on the concept of distance between points. For more abstract spaces it is based on a system of open sets, and there may be no distance concept associated with the notion of limit. Here we take the former approach and use the distance function to define certain open sets, which are generalizations of the open intervals in $\mathbb{R}$. In the rest of this section, we always assume that $X$ is a metric space with distance function $d$.

Definition 6.1.5. If $x \in X$ and $r > 0$, then
$$\begin{cases} B_r(x) = B_r^X(x) = \{y \in X : d(x,y) < r\};\\ \overline{B}_r(x) = \overline{B}_r^X(x) = \{y \in X : d(x,y) \le r\};\\ S_r(x) = S_r^X(x) = \{y \in X : d(x,y) = r\}, \end{cases}$$

are called the open ball, the closed ball, and the sphere of radius $r$ about $x$, respectively. A neighborhood of $x$ is any subset of $X$ that contains some open ball about $x$.

Remark 6.1.6. Note that, with $r = 0$ in the formulas above, $B_0(x) = \emptyset$ and $\overline{B}_0(x) = \{x\}$.
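On a finite set, the metric axioms and the three sets of Definition 6.1.5 can be checked by brute force. The Python sketch below does this for the discrete distance of Theorem 6.1.4(i). The helper names are our own. The axioms tested are the usual ones: nonnegativity with $d(x,y) = 0$ exactly when $x = y$, symmetry, and the triangle inequality of Definition 6.1.1 (iii). The sketch also shows that, in the discrete metric, the open ball $B_1(x)$ is $\{x\}$ while the closed ball $\overline{B}_1(x)$ is all of $X$.

```python
from itertools import product


def discrete_distance(x, y):
    """The discrete distance of Theorem 6.1.4(i)."""
    return 0 if x == y else 1


def is_metric(points, d):
    """Brute-force check of the metric axioms for d on a finite set of points."""
    for x, y, z in product(points, repeat=3):
        if d(x, y) < 0 or (d(x, y) == 0) != (x == y):
            return False  # nonnegativity / identity fails
        if d(x, y) != d(y, x):
            return False  # symmetry fails
        if d(x, z) > d(x, y) + d(y, z):
            return False  # triangle inequality fails
    return True


def open_ball(points, d, x, r):
    return {y for y in points if d(x, y) < r}


def closed_ball(points, d, x, r):
    return {y for y in points if d(x, y) <= r}


def sphere(points, d, x, r):
    return {y for y in points if d(x, y) == r}


X = ["a", "b", "c", "d"]
print(is_metric(X, discrete_distance))                     # True
print(open_ball(X, discrete_distance, "a", 1))             # {'a'}
print(sorted(closed_ball(X, discrete_distance, "a", 1)))   # ['a', 'b', 'c', 'd']
print(sorted(sphere(X, discrete_distance, "a", 1)))        # ['b', 'c', 'd']
# A non-example: d(x, y) = (x - y)**2 on {0, 1, 2} violates the triangle inequality.
print(is_metric([0, 1, 2], lambda x, y: (x - y) ** 2))     # False
```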

Metric Spaces


Example 6.1.7. Let $X = \mathbb{R}^n$, $x \in \mathbb{R}^n$ and $r > 0$.

(i) If $n = 1$, then $B_r(x) = (x - r, x + r)$ and $\overline{B}_r(x) = [x - r, x + r]$.

(ii) If $n = 2$, then $B_r(x)$ and $\overline{B}_r(x)$ are the open and closed disks of radius $r$ centered at $x$, respectively.

(iii) If $n = 3$, then $B_r(x)$ and $\overline{B}_r(x)$ are the open and closed (3-dimensional) balls of radius $r$ centered at $x$, respectively.

Definition 6.1.8. Let $A \subseteq X$.

(i) $x \in X$ is called an interior point of $A$, denoted $x \in A^\circ$, provided there exists $r > 0$ such that $B_r(x) \subseteq A$.
(ii) $A$ is said to be open provided $A \subseteq A^\circ$.

(iii) $x \in X$ is called a cluster point of $A$, denoted $x \in A'$, provided for every $r > 0$ one has $B_r(x) \cap (A \setminus \{x\}) \ne \emptyset$.

(iv) $x \in X$ is called an isolated point of $A$ provided there exists an $r_0 > 0$ such that $B_{r_0}(x) \cap A = \{x\}$.
(v) $A$ is said to be closed provided $A' \subseteq A$.

(vi) $A$ is said to be bounded provided $A \subseteq B_r(x)$ for certain $r > 0$ and some $x \in X$.

Remark 6.1.9. It is clear that $A$ is open or closed if and only if $A = A^\circ$ or $A = \overline{A} = A \cup A'$, respectively.

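Example 6.1.10. (i) Any open ball $B_r(x)$ in a given metric space $X$ is a bounded open set. Indeed, if $y \in B_r(x)$, then the triangle inequality gives $B_{r - d(x,y)}(y) \subseteq B_r(x)$.
(ii) In $\mathbb{R}^2$, let $A = B_2(0)$ and $x = (\sqrt{2}, \sqrt{2})$. Then $x \in A'$, since $(\sqrt{2} - 2^{-1}\varepsilon, \sqrt{2} - 2^{-1}\varepsilon) \in B_2(0) \cap B_\varepsilon(x)$, where $B_\varepsilon(x)$ is the open disk about $x$ with radius $\varepsilon \in (0,2)$. However, in $\mathbb{R}^2$, if $A = B_1(0) \cup \{x\}$ where $x = (2,0)$, then $x \notin A'$, for the open disk $B_{2^{-1}}(x)$ contains no point of $A$ except $x$ itself.
(iii) Any singleton $\{x\}$ of $X$ is closed.

(iv) $X$ and $\emptyset$ are not only open, but also closed.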


(v) Let $F = \{(1,0), (2^{-1},0), (3^{-1},0), \dots, (k^{-1},0), \dots\}$ be a subset of $\mathbb{R}^2$. Then $(0,0)$ is a cluster point of $F$, and since $(0,0)$ is not in $F$ we infer that $F$ is not closed. Also, the point $(1,0)$ is not an interior point of $F$, so $F$ is not open.

The forthcoming theorem is the fundamental relationship between the notions of open set and closed set.

Theorem 6.1.11. For a subset $A$ of $X$ let $A^c = X \setminus A$ be the complement of $A$ in $X$. Then $A$ is open if and only if $A^c$ is closed.

Proof. Suppose $A$ is open. To show that $A^c$ is closed, we prove the contrapositive: if $x \notin A^c$, then $x$ is not a cluster point of $A^c$. Suppose now $x \notin A^c$. Then $x \in A$. Since $A$ is open, there is a ball $B_r(x) \subseteq A$, which contains no point of $A^c$. Thus $x$ is not a cluster point of $A^c$, as desired. Conversely, suppose that $A^c$ is closed. If $A$ is not open, then $A$ contains some point $y$ that is not an interior point of $A$. Thus every ball $B_r(y)$ contains a point of $A^c$. Consequently, $y$ is a cluster point of $A^c$. Since $y \in A$, i.e., $y \notin A^c$, we conclude that $A^c$ is not closed, a contradiction to the assumption.

Remark 6.1.12. Perhaps the greatest value of Theorem 6.1.11 is that it gives us an alternative method for proving that a set is open or closed. It is sometimes easier to work with the complement of a set than with the set itself. For instance, we know immediately that $X \setminus \{x\}$ is open because $\{x\}$ is closed.

Theorem 6.1.13. (i) The intersection of a finite number of open sets is itself an open set.
(ii) The union of any collection of open sets is an open set.

Proof. (i) Suppose that each of $A_1, \dots, A_n$ is open, and let $x \in A = \cap_{k=1}^{n} A_k$. Then $x$ is in each of $A_1, \dots, A_n$, so for each $1 \le k \le n$ there exists a positive radius $r_k$ such that $B_{r_k}(x) \subseteq A_k$. Define $r = \min\{r_1, \dots, r_n\}$. Then $B_r(x) \subseteq B_{r_k}(x) \subseteq A_k$ for each $k$, and hence $B_r(x) \subseteq \cap_{k=1}^{n} A_k = A$. So $x$ is in $A^\circ$, and $A$ is open.
(ii) Suppose that $A_j$ is an open set for each $j$ in an index set $I$. Let $x \in \cup_{j \in I} A_j$. Then $x$ belongs to some $A_j$, say, $A_{j_0}$. Since $A_{j_0}$ is open, there exists an open ball $B_r(x)$ such that $B_r(x) \subseteq A_{j_0} \subseteq \cup_{j \in I} A_j$. Hence $\cup_{j \in I} A_j$ is open. This theorem leads to a result regarding closed sets.


Corollary 6.1.14. (i) The union of a finite number of closed sets is a closed set.
(ii) The intersection of any collection of closed sets is itself a closed set.

Proof. We prove (i); (ii) follows in the same way from Theorem 6.1.13(ii). Suppose that $F_1, \dots, F_n$ are closed sets. It is easy to verify that $X \setminus \cup_{k=1}^{n} F_k = \cap_{k=1}^{n}(X \setminus F_k)$. By Theorem 6.1.11, each $X \setminus F_k$ is open. Thus, according to Theorem 6.1.13(i), their intersection is open. Hence $\cup_{k=1}^{n} F_k$ is closed, by Theorem 6.1.11 again, since its complement is open.

Remark 6.1.15. (i) The union of an infinite collection of closed sets need not be closed. For instance, let $F_k$ be the singleton $\{(k^{-1}, 0)\}$ in $\mathbb{R}^2$. It is clear that $(0,0)$ is a cluster point of $\cup_{k=1}^{\infty} F_k$, but $(0,0)$ is not in this union. So the union is not closed. Likewise, it is not hard to give an example of an infinite collection of open sets whose intersection is not open: for example, $A_k = B_{k^{-1}}(0)$. Clearly, $\cap_{k=1}^{\infty} A_k = \{0\}$, which is not open.
(ii) We should take particular notice of the fact that the collections of sets in Theorem 6.1.13 and Corollary 6.1.14 may be arbitrarily large. In particular, the collections may contain so many sets that it is impossible to list them as a sequence of sets. This is why we use the device of an index set $I$ to describe the collection. Had we written $\{A_n\}_{n=1}^{\infty}$ for the collection of sets, this would have indicated a sequence of sets and therefore limited our discussion to countable collections.

We finish this section by applying the formula
$$\overline{A} = \{x \in X : B_r(x) \cap A \ne \emptyset \text{ for all } r > 0\}$$
to the connectedness of a metric space, that is, the phenomenon of a set that cannot be naturally separated into two subsets. The vagueness of the phrase “naturally separated” suggests that it is not easy to give a precise definition of the property we want. Indeed, we define it by first stating when a set does not possess this property.

Definition 6.1.16. The set $S \subseteq X$ is disconnected provided that there exist nonempty sets $S_1$ and $S_2$ such that $S_1 \cup S_2 = S$, $S_1 \cap \overline{S_2} = \emptyset$ and $\overline{S_1} \cap S_2 = \emptyset$. If $S$ is not disconnected, then $S$ is said to be connected.


It is worth pointing out that the above $S_1$ and $S_2$ satisfy a property that is stronger than being disjoint. Not only must $S_1$ and $S_2$ fail to contain any point of the other set, they must also fail to contain any cluster point of the other set. This extra stipulation in Definition 6.1.16 is significant, because any set containing two or more points can be written as the union of two nonempty disjoint subsets: for $x \in S$, let $S_1 = \{x\}$ and $S_2 = S \setminus \{x\}$. But as we see in the next theorem, when we are dealing with open sets, disjointness is sufficient to imply that their union is disconnected.

Theorem 6.1.17. Let $S \subseteq X$. Then $S$ is disconnected if and only if there are disjoint open sets $A$ and $B$ such that $S \subseteq A \cup B$, $A \cap S \ne \emptyset$, and $B \cap S \ne \emptyset$.

Proof. Suppose that $A$ and $B$ exist as in the statement of Theorem 6.1.17, and let $S_1 = A \cap S$ and $S_2 = B \cap S$. Clearly, $S_1$ and $S_2$ are nonempty, and $S_1 \cup S_2 = S$. Consider an arbitrary point $x \in S_1$. Then $x$ is in the open set $A$, so $B_r(x) \subseteq A$ for some $r > 0$. Thus $B_r(x) \cap B = \emptyset$ because $A \cap B = \emptyset$, so $B_r(x) \cap S_2 = \emptyset$. Thus $x \notin \overline{S_2}$; hence $S_1 \cap \overline{S_2} = \emptyset$. Similarly, $\overline{S_1} \cap S_2 = \emptyset$, and we conclude that $S$ is disconnected.

Now assume that $S$ is disconnected, say, $S = S_1 \cup S_2$ and $S_1 \cap \overline{S_2} = \overline{S_1} \cap S_2 = \emptyset$. Then no point of $S_1$ is in $\overline{S_2}$. For each $x \in S_1$, choose $r_x > 0$ such that $B_{r_x}(x) \cap S_2 = \emptyset$. Thus $A = \cup_{x \in S_1} B_{2^{-1}r_x}(x)$ is an open set that contains no point of $S_2$. Similarly, we can take a ball $B_{s_y}(y)$ for each $y \in S_2$ that satisfies $B_{s_y}(y) \cap S_1 = \emptyset$, and define $B = \cup_{y \in S_2} B_{2^{-1}s_y}(y)$, which is open. It remains only to show that $A \cap B = \emptyset$. Suppose not, and let $p \in A \cap B$. Then for some $x \in S_1$ and $y \in S_2$, $p \in B_{2^{-1}r_x}(x) \cap B_{2^{-1}s_y}(y)$. But this implies that
$$d(x,y) \le d(x,p) + d(p,y) < 2^{-1}(r_x + s_y) \le \max\{r_x, s_y\}.$$
Therefore either $y \in B_{r_x}(x)$ or $x \in B_{s_y}(y)$, which contradicts the choice of either $r_x$ or $s_y$, respectively.

Corollary 6.1.18. Let $S \subseteq X$. If $S$ is connected, then $\overline{S}$ is connected.

Proof. Assume that $\overline{S}$ is disconnected, and let $A$ and $B$ be disjoint open sets as in Theorem 6.1.17: $\overline{S} \subseteq A \cup B$, $A \cap \overline{S} \ne \emptyset$, and $B \cap \overline{S} \ne \emptyset$. Since $S$ is obviously contained in $A \cup B$, we need to prove only that $A \cap S \ne \emptyset$ and $B \cap S \ne \emptyset$. Let $x \in \overline{S} \cap A$. Since $A$ is open, there is a ball about $x$ such that $B_r(x) \subseteq A$. But $x \in \overline{S}$ too, so $B_r(x)$ must contain some point $y$ of $S$. Therefore $y \in B_r(x) \cap S \subseteq A \cap S$, so $A \cap S \ne \emptyset$. Similarly, $B \cap S \ne \emptyset$, so by Theorem 6.1.17, $S$ is disconnected. This contradicts the assumption that $S$ is connected.


6.2 Completeness

To discuss the completeness of a metric space, we start with the definition of the convergence of a point sequence in the metric space. A point sequence is a function from $\mathbb{N}$ into a metric space $(X,d)$; that is, it is a sequence whose terms are points in $X$. Our principal examples of metric spaces are the Euclidean spaces, and we use subscripts to designate the coordinates of points in $\mathbb{R}^n$. Therefore we use superscripts to index the terms of a point sequence: $\{x^{(k)}\}_{k=1}^{\infty}$ represents a point sequence in a metric space.

Definition 6.2.1. Given a metric space $(X,d)$, the point sequence $\{x^{(k)}\}_{k=1}^{\infty}$ in $X$ is said to converge to the point $x \in X$, denoted $\lim_{k\to\infty} x^{(k)} = x$, provided for every $\varepsilon > 0$ there is an $N \in \mathbb{N}$ such that $d(x^{(k)}, x) < \varepsilon$ whenever $k > N$.

Our first task is to establish a connection between convergent point sequences and cluster points.

Theorem 6.2.2. Given a metric space $(X,d)$ and $A \subseteq X$. Then $x \in A'$ if and only if there is a non-repeating point sequence in $A$ that converges to $x$.

Proof. Suppose $x \in A'$ and $r_1 = 1$. Then there exists a point $x^{(1)}$ in $B_{r_1}(x) \cap (A \setminus \{x\})$. Let $r_2 = \min\{2^{-1}, d(x^{(1)}, x)\}$, and choose a point $x^{(2)}$ in $(A \setminus \{x\}) \cap B_{r_2}(x)$. After $x^{(1)}, \dots, x^{(k)}$ have been defined, let $r_{k+1} = \min\{(k+1)^{-1}, d(x^{(k)}, x)\}$ and choose $x^{(k+1)}$ as a point of $A \setminus \{x\}$ in the open ball $B_{r_{k+1}}(x)$. Since $r_k \le k^{-1}$, it is clear that $\lim_{k\to\infty} x^{(k)} = x$. Since $d(x^{(k)}, x) < r_k \le d(x^{(k-1)}, x)$, the distances $d(x^{(k)}, x)$ strictly decrease, so no two $x^{(k)}$’s are the same. To prove the converse, we simply observe that if $A$ contains such a point sequence converging to $x$, then it is obvious from the definition that every open ball about $x$ contains infinitely many points of $A$, in particular a point of $A \setminus \{x\}$.

Example 6.2.3. In $\mathbb{R}^n$ there is a very strong connection between convergent point sequences and convergent number sequences. More precisely, $x^{(k)} \to x$ in $\mathbb{R}^n$ if and only if $\lim_{k\to\infty} x_j^{(k)} = x_j$ for each $j = 1, \dots, n$. This is because $x^{(k)} \in B_\varepsilon(x)$ if and only if
$$d_n(x^{(k)}, x) = \Big(\sum_{j=1}^{n} |x_j^{(k)} - x_j|^2\Big)^{2^{-1}} < \varepsilon,$$
and $\max_{1 \le j \le n} |x_j^{(k)} - x_j| \le d_n(x^{(k)}, x) \le \sqrt{n}\,\max_{1 \le j \le n} |x_j^{(k)} - x_j|$.
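Example 6.2.3 rests on the two-sided comparison between $d_n$ and the largest coordinate gap. The Python sketch below computes both quantities for a sample sequence in $\mathbb{R}^3$ and shows them shrinking together. The sequence is our own choice, and the helper names are hypothetical.

```python
import math


def euclidean_distance(x, y):
    """d_n(x, y) = (sum_j |x_j - y_j|^2)^(1/2) in R^n."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def max_coordinate_gap(x, y):
    """max_j |x_j - y_j|, the largest coordinate difference."""
    return max(abs(a - b) for a, b in zip(x, y))


LIMIT = (0.0, 1.0, -2.0)


def term(k):
    """Sample point sequence x^(k) in R^3 converging coordinatewise to LIMIT."""
    return (1.0 / k, 1.0 + 2.0 ** -k, -2.0 + math.sin(k) / k ** 2)


for k in (1, 10, 100, 1000):
    x = term(k)
    gap, dist = max_coordinate_gap(x, LIMIT), euclidean_distance(x, LIMIT)
    # gap <= dist <= sqrt(3) * gap, so dist -> 0 exactly when every coordinate converges
    print(k, gap, dist, math.sqrt(3) * gap)
```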


Definition 6.2.4. Suppose $(X,d)$ is a metric space. A point sequence $\{x^{(k)}\}_{k=1}^{\infty}$ in $X$ is a Cauchy sequence provided for every $\varepsilon > 0$ there is an $N > 0$ such that $d(x^{(k)}, x^{(m)}) < \varepsilon$ whenever $k > m > N$. This definition is obviously a generalization of the concept of Cauchy number sequences treated in Chapter 1.

Example 6.2.5. A point sequence in $\mathbb{R}^n$ is a Cauchy sequence if and only if it converges to a point in $\mathbb{R}^n$. To verify this, suppose that $\{x^{(k)}\}_{k=1}^{\infty}$ is a Cauchy sequence. Since
$$d_n(x^{(k)}, x^{(m)}) \ge |x_j^{(k)} - x_j^{(m)}| \quad \text{for each } j = 1, \dots, n,$$
it follows that $\{x_j^{(k)}\}_{k=1}^{\infty}$ forms a Cauchy number sequence. Thus such a sequence converges to a point in $\mathbb{R}$, say, $\lim_{k\to\infty} x_j^{(k)} = x_j$. Now Example 6.2.3 ensures that $\{x^{(k)}\}_{k=1}^{\infty}$ converges to $x = (x_1, \dots, x_n)$ in $\mathbb{R}^n$. The converse assertion is true in any metric space, and therefore we give it as the next theorem.

Theorem 6.2.6. Suppose $(X,d)$ is a metric space.
(i) If $\{x^{(k)}\}_{k=1}^{\infty}$ in $X$ is convergent, then $\{x^{(k)}\}_{k=1}^{\infty}$ is a Cauchy sequence.

(ii) If $\{x^{(k)}\}_{k=1}^{\infty}$ in $X$ is a Cauchy sequence, then it is bounded.

(iii) If $x \in X$ and $Y = X \setminus \{x\}$, then any sequence $\{x^{(k)}\}_{k=1}^{\infty}$ in $Y$ with $\lim_{k\to\infty} d(x^{(k)}, x) = 0$ is a Cauchy but not a convergent sequence in $Y$ with respect to $d$.

Proof. (i) Assume that $\{x^{(k)}\}_{k=1}^{\infty}$ converges to $x$, and suppose $\varepsilon > 0$. Choose $N > 0$ so that $d(x^{(k)}, x) < 2^{-1}\varepsilon$ as $k > N$. This implies that $d(x^{(k)}, x^{(m)}) \le d(x^{(k)}, x) + d(x^{(m)}, x) < \varepsilon$ whenever $k > m > N$. Therefore, $\{x^{(k)}\}_{k=1}^{\infty}$ is a Cauchy sequence in $X$.
(ii) Let $\{x^{(k)}\}_{k=1}^{\infty}$ be a Cauchy sequence. By definition, we find an $N > 0$ such that $d(x^{(k)}, x^{(m)}) < 1$ whenever $k > m > N$. In particular, if $m = N+1$, this property becomes $d(x^{(k)}, x^{(N+1)}) < 1$ whenever $k > N$. Therefore all the points $x^{(k)}$ for $k > N$ lie in the open ball $B_1(x^{(N+1)})$. Now we simply enlarge the radius of the ball until it also includes the first $N$ points of the sequence. Define
$$r = 1 + \max\{1, d(x^{(1)}, x^{(N+1)}), \dots, d(x^{(N)}, x^{(N+1)})\}.$$


Thus $r$ is at least one unit more than the distance between $x^{(N+1)}$ and $x^{(k)}$ for every $k = 1, 2, \dots$; so $B_r(x^{(N+1)})$ contains every point $x^{(k)}$ of the sequence.
(iii) Suppose that $x \in X$ is the limit of $\{x^{(k)}\}_{k=1}^{\infty}$ under $d$. Then $\{x^{(k)}\}_{k=1}^{\infty}$ is a sequence in both $X$ and $Y$, and it converges in $X$; so by (i) it is a Cauchy sequence in $X$. The distance function $d$ on $Y \times Y$ is the same one as on $X \times X$, so $\{x^{(k)}\}_{k=1}^{\infty}$ is also a Cauchy sequence in $Y$. But in $Y$ there is no point to which $\{x^{(k)}\}_{k=1}^{\infty}$ can converge, because $x$ has been removed and the limit of a sequence is unique.

Accordingly, we see from Theorem 6.2.6 that Example 6.2.5 cannot be generalized to all metric spaces. This idea is the general principle behind the following example.

Example 6.2.7. (i) Let $X = B_1(0)$ be the open unit ball in $\mathbb{R}^2$ with the usual Euclidean distance. Consider a sequence in $B_1(0)$ that converges to a point on its boundary, say, $x^{(k)} = (1 - k^{-1}, 0)$, so that it has the cluster point $(1,0)$. Then this sequence is a Cauchy sequence in $X$, but it has no limit in $X$, so it does not converge in $X$.
(ii) Again, for $n \in \mathbb{N}$ let $\mathbb{Q}^n$ be the subset of $\mathbb{R}^n$ consisting of all points whose coordinates are all rational numbers. The point $x = (\sqrt{2}, 0, \dots, 0)$ is in $\mathbb{R}^n \setminus \mathbb{Q}^n$. But if $\{r_k\}_{k=1}^{\infty}$ is a sequence of rational numbers such that $\lim_{k\to\infty} r_k = \sqrt{2}$, and $x^{(k)} = (r_k, 0, \dots, 0)$, then $\lim_{k\to\infty} x^{(k)} = x$. Therefore $\{x^{(k)}\}_{k=1}^{\infty}$ is a Cauchy sequence but does not converge in $\mathbb{Q}^n$.

Naturally, we can now give the definition of the completeness of a metric space.

Definition 6.2.8. Suppose $(X,d)$ is a metric space. $S \subseteq X$ is said to be complete with respect to $d$ provided every Cauchy sequence in $S$ converges to a point in $S$ with respect to $d$. In particular, if $X$ is complete with respect to $d$, then we say that $(X,d)$ is a complete metric space.

Theorem 6.2.9. Let $(X,d)$ be a complete metric space. Then $S \subseteq X$ is complete with respect to $d$ if and only if $S$ is a closed subset of $X$.

Proof. Suppose that $S$ is closed and $\{x_k\}_{k=1}^{\infty}$ is a Cauchy sequence in $S$. Since $X$ is complete, there is an $x \in X$ such that $x_k \to x$. Then $x$ is either one of the $x_k$ or a cluster point of $S$, and since $S$ is closed, one gets $x \in S$. Conversely, suppose $S$ is complete, and let $\{x_k\}_{k=1}^{\infty}$ be any sequence in $S$ converging to $x \in X$. This sequence is a Cauchy sequence in $S$ and, therefore, converges to a point $y \in S$. By uniqueness of limits, $x = y \in S$. By Theorem 6.2.2 every cluster point of $S$ arises in this way, so $S$ is closed.
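Example 6.2.7(ii) only needs some sequence of rationals tending to $\sqrt{2}$. The Python sketch below uses Newton's iteration $r_{k+1} = 2^{-1}(r_k + 2r_k^{-1})$ with $r_1 = 1$; this choice of sequence is ours, not the text's. Exact `Fraction` arithmetic makes every term visibly an element of $\mathbb{Q}$. The gaps between consecutive terms shrink rapidly, reflecting the Cauchy property, while $r_k^2 - 2$ is never $0$. Indeed, by the classical irrationality of $\sqrt{2}$, no rational number squares to $2$, so the sequence has no limit in $\mathbb{Q}$.

```python
from fractions import Fraction


def newton_sqrt2(n_terms):
    """First n_terms rational Newton iterates r_{k+1} = (r_k + 2/r_k)/2 with r_1 = 1."""
    r = Fraction(1)
    terms = [r]
    for _ in range(n_terms - 1):
        r = (r + 2 / r) / 2
        terms.append(r)
    return terms


terms = newton_sqrt2(6)
print([str(t) for t in terms])
# ['1', '3/2', '17/12', '577/408', '665857/470832', '886731088897/627013566048']

gaps = [abs(terms[k + 1] - terms[k]) for k in range(len(terms) - 1)]  # Cauchy behaviour
residuals = [t * t - 2 for t in terms]                               # never equal to 0
print([float(g) for g in gaps])
print([str(res) for res in residuals])
```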


In the terminology of Definition 6.2.8, $\mathbb{R}^n$ is a complete metric space. The next result gives four alternative properties that are equivalent to completeness in $\mathbb{R}^n$. Each is an important result in its own right, and collectively they are the most important tools of analysis. They are immediately recognizable as multidimensional extensions of the four corresponding theorems for $\mathbb{R}^1$.

Theorem 6.2.10. For $n \in \mathbb{N}$ let $X = \mathbb{R}^n$ be equipped with the distance $d_n(\cdot,\cdot)$. Then the following statements are equivalent:
(i) $X$ is complete.
(ii) The Heine-Borel property: If $F$ is a closed and bounded subset of $X$ and $\{A_\mu\}_{\mu \in I}$ is a collection of open sets whose union contains $F$, then there is a finite subcollection $\{A_{\mu_j}\}_{j=1}^{m}$ such that $F \subseteq \cup_{j=1}^{m} A_{\mu_j}$.
(iii) The Bolzano-Weierstrass property: If $S$ is an infinite bounded subset of $X$, then $S' \ne \emptyset$.

(iv) The bounded sequence property: If $\{x^{(k)}\}_{k=1}^{\infty}$ is a bounded sequence in $X$, then it has a convergent subsequence.

(v) The nested set property: If $\{F_k\}_{k=1}^{\infty}$ is a sequence of closed, bounded, nonempty sets such that $F_{k+1} \subseteq F_k \subseteq X$ for each $k \in \mathbb{N}$, then $\bigcap_{k=1}^{\infty} F_k \neq \emptyset$.

Proof. Statements (ii)-(v) refer to bounded sets, so it is more convenient to use another form of boundedness. A set $S$ is bounded in $X$ if and only if it is contained in an $n$-cube $Q$ with side-length $2r > 0$, i.e., $S \subseteq Q = \{x \in \mathbb{R}^n : |x_j| \le r \text{ for } j = 1, \dots, n\}$.

(i)⇒(ii) Let $F$ be a closed and bounded set, and suppose that $\{A_\mu\}_{\mu \in I}$ is an open cover of $F$ as in (ii). Since $F$ is bounded, there is an $n$-cube $Q_1$ containing $F$. Assume, to reach a contradiction, that $F$ cannot be covered by any finite number of the $A_\mu$'s.

Subdivide $Q_1$ into $2^n$ $n$-cubes by halving each of the coordinate intervals of $Q_1$. That is, the $j$th coordinate of a point $x$ in one of the $n$-cubes is restricted to either $[0, r]$ or $[-r, 0]$ instead of $[-r, r]$. Suppose the part of $F$ lying in each one of the $2^n$ $n$-cubes could be covered by finitely many $A_\mu$'s. Then all of these $A_\mu$'s together would still be finite in number and would cover $F$. Therefore one of these $2^n$ $n$-cubes must contain a subset of $F$ that cannot be covered by any finite number of the $A_\mu$'s. Call this $n$-cube $Q_2$.

Now subdivide $Q_2$ into $2^n$ $n$-cubes and repeat the process. The result is a sequence of $n$-cubes such that $Q_{k+1} \subseteq Q_k$, and the $j$th coordinate of each

Metric Spaces

point in $Q_k$ is restricted to an interval of length $r 2^{2-k}$.

Choose a point sequence $\{x^{(k)}\}_{k=1}^{\infty}$ such that $x^{(k)} \in F \cap Q_k$. This is possible because $F \cap Q_k$ cannot be finitely covered and so is nonempty. For $k > m$, both $x^{(k)}$ and $x^{(m)}$ lie in $Q_m$. Hence, for each $j$, the sequence $\{x_j^{(k)}\}_{k=1}^{\infty}$ of $j$th coordinates is a Cauchy sequence in $\mathbb{R}$, thanks to
$$|x_j^{(k)} - x_j^{(m)}| \le r 2^{2-m} \to 0 \quad \text{as } k > m \to \infty.$$
Consequently, $\{x^{(k)}\}_{k=1}^{\infty}$ is a Cauchy sequence in $\mathbb{R}^n$. Assuming (i), we conclude that there is a point $x \in \mathbb{R}^n$ such that $\lim_{k\to\infty} x^{(k)} = x$. Since each $x^{(k)}$ is in $F$ and $F$ is closed, we have $x \in F$.

Therefore $x$ is in one of the $A_\mu$'s, say $x \in A_{\mu_x}$. But $A_{\mu_x}$ is open, so $B_s(x) \subseteq A_{\mu_x}$ for some $s > 0$. The point $x$ lies in every (closed) cube $Q_k$, and the side-lengths of the $Q_k$'s tend to $0$. So $B_s(x)$ contains all but a finite number of the $n$-cubes $Q_k$. Hence the points of $F$ in all these $Q_k$'s are contained in the single set $A_{\mu_x}$. This contradicts the choice of $Q_k$.

(ii)⇒(iii) Suppose that $S$ is a bounded set which has no cluster point. We show that $S$ must be finite, which establishes (iii). Let $B$ be a closed and bounded ball containing $S$. Since no point of $B$ is a cluster point of $S$, we can choose an open ball $B_{r_x}(x)$ for each $x \in B$ such that $B_{r_x}(x)$ contains no point of $S$ except possibly $x$ itself.

Now $B \subseteq \bigcup_{x \in B} B_{r_x}(x)$, so $\{B_{r_x}(x) : x \in B\}$ is a collection of open sets whose union contains $B$. Therefore, by (ii), there is a finite subcollection of these open balls that covers $B$. This subcollection also covers $S$, because $S \subseteq B$. Each of these open balls contains at most one point of $S$, so $S$ is finite.

(iii)⇒(iv) Assume that $\{x^{(k)}\}_{k=1}^{\infty}$ is a bounded point sequence, and let $S$ be the range of this sequence; that is, $S = \{p \in \mathbb{R}^n : p = x^{(k)} \text{ for some } k\}$. If $S$ is a finite set, then at least one of its points must appear infinitely many times as a term of the sequence. In that case $\{x^{(k)}\}_{k=1}^{\infty}$ has a constant subsequence. If $S$ has infinitely many points, then (iii) implies that $S$ has a cluster point. In proving Theorem 6.2.2 we have seen how to construct a sequence in $S$ that converges to that cluster point.

(iv)⇒(v) Let $\{F_k\}_{k=1}^{\infty}$ be a nested sequence of closed and bounded sets as in the hypothesis of (v). For each $k$, choose a point $x^{(k)}$ in $F_k$. Every such point is in $F_1$, so $\{x^{(k)}\}_{k=1}^{\infty}$ is a bounded sequence.
By (iv), $\{x^{(k)}\}_{k=1}^{\infty}$ has a convergent subsequence, whose limit we call $x$. Because the sets are nested, each $F_k$ contains every point $x^{(j)}$ with $j \ge k$, that is, all but possibly a finite number of the terms. Therefore, for each $k$, a tail of the subsequence is a sequence of points in $F_k$ that converges to $x$. Since each $F_k$ is closed, $x$ belongs to each $F_k$. Hence $\bigcap_{k=1}^{\infty} F_k$ contains $x$, so the intersection is nonempty.

(v)⇒(i) Let $\{x^{(k)}\}_{k=1}^{\infty}$ be a Cauchy sequence. For each $m$, let $F_m$ be the closure of the set of points $\{x^{(k)} : k \ge m\}$. Thus $\{F_m\}_{m=1}^{\infty}$ is a nested sequence of closed sets. By Theorem 6.2.6 (ii), $\{x^{(k)}\}_{k=1}^{\infty}$ is bounded, and therefore $F_1$ is bounded. By (v), there is a point $x$ in all the sets $F_m$.

Let $B_\varepsilon(x)$ be any open ball about $x$, and choose a number $N$ such that $k > m > N$ implies $d_n(x^{(k)}, x^{(m)}) < 2^{-1}\varepsilon$. Since $x$ is in $F_{N+1}$, the closure of $\{x^{(k)} : k \ge N + 1\}$, there is some point $x^{(m)}$ with $m > N$ such that $x^{(m)} \in B_{2^{-1}\varepsilon}(x)$. Now if $k > N$, then
$$d_n(x^{(k)}, x) \le d_n(x^{(k)}, x^{(m)}) + d_n(x^{(m)}, x) < \varepsilon,$$
and hence $\lim_{k\to\infty} x^{(k)} = x$.

Remark 6.2.11. (i) Examining the proof of Theorem 6.2.10, we find that the Euclidean distance formula was used only in proving (i)⇒(ii). Therefore the other four implications hold in any metric space.

(ii) For $x \in \mathbb{R}^n$, let $X = \mathbb{R}^n \setminus \{x\}$, with the same Euclidean distance between points as in $\mathbb{R}^n$. Let $\{x^{(k)}\}_{k=1}^{\infty}$ be a sequence that converges to $x$ in $\mathbb{R}^n$ with $x^{(k)} \neq x$. Then it is still a Cauchy sequence in $X$. But there is no point in $X$ to which it converges, so $\{x^{(k)}\}_{k=1}^{\infty}$ is not convergent in $X$. Thus $X$ is not complete. By (i), each of the other four statements implies completeness in any metric space. Hence all five statements are false in $X$.

(iii) Of course, the fact that all five statements in Theorem 6.2.10 are true in $X = \mathbb{R}^n$ depends heavily on the distance formula. Its use in proving (i)⇒(ii) is very subtle: it is built into the notions of boundedness and $n$-cubes that were used.

Example 6.2.12. Let $X$ be an infinite set and define $d$ by
$$d(x, y) = \begin{cases} 1, & x \neq y, \\ 0, & x = y. \end{cases}$$
For each point $x \in X$ we have $\{x\} = B_{2^{-1}}(x)$, so every singleton set is an open set in $X$. Also $X = B_2(x)$ for any $x \in X$, so $X$ is bounded. Therefore $X$ is a bounded and closed set. Moreover, $\{\{x\} : x \in X\}$ is an open cover of $X$ with no finite subcover, because $X$ is infinite.

Nevertheless $X$ is complete. It is not hard to show that if $\{x^{(k)}\}_{k=1}^{\infty}$ is a Cauchy sequence, then there is an $N$ such that $x^{(k)} = x^{(N)}$ whenever $k \ge N$. Therefore $\lim_{k\to\infty} x^{(k)} = x^{(N)}$. So $X$ is complete even though the Heine-Borel property does not hold.
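The failure of the Heine-Borel property in Example 6.2.12 can be made concrete. The sketch below assumes, for illustration only, that the infinite set $X$ is the set $\{0, 1, 2, \dots\}$ of nonnegative integers; the function names are hypothetical. It checks three things:

- that $B_{2^{-1}}(x) = \{x\}$;
- that $B_2(x)$ contains every sampled point;
- that any finite family of singleton balls misses some point of $X$, which the code exhibits.

```python
def discrete_metric(x, y):
    """Example 6.2.12: d(x, y) = 1 if x != y and d(x, y) = 0 if x == y."""
    return 0 if x == y else 1


def in_ball(p, center, radius):
    """True when p lies in the open ball B_radius(center) for the discrete metric."""
    return discrete_metric(p, center) < radius


def uncovered_point(centers):
    """For finitely many balls B_{1/2}(c) = {c}, c in centers, inside X = {0, 1, 2, ...},
    return a point of X that none of these balls contains."""
    p = max(centers, default=-1) + 1  # exceeds every center, so differs from all of them
    assert not any(in_ball(p, c, 0.5) for c in centers)
    return p


print([in_ball(p, 7, 0.5) for p in (6, 7, 8)])    # only the center lies in B_{1/2}(7)
print(all(in_ball(p, 7, 2) for p in range(100)))  # B_2(7) contains every sampled point
print(uncovered_point([3, 10, 5]))                # a point missed by {3}, {10}, {5}
```

Since $d$ takes only the values $0$ and $1$, a Cauchy sequence with $\varepsilon = 2^{-1}$ must have all terms equal beyond some index. This is why $X$ is complete.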

6.3 Compactness, Density and Separability

As an expansion of the Heine-Borel property, we discuss the compactness, density and separability of sets in a given metric space. The first important notion induced by the Heine-Borel property is that of compact sets. To see this picture clearly, we first consider two examples. They enable us to describe some important and attractive results in the theory of functions.

Example 6.3.1. A set $K \subseteq \mathbb{R}^n$ is said to be compact provided every sequence in $K$ contains a subsequence that converges to a point in $K$. This notion can be characterized by the equivalence of the following three statements:

(i) $K$ is compact;

(ii) $K$ is closed and bounded;

(iii) Every open cover of $K$ has a finite subcover; that is, if $K \subseteq \bigcup_{\alpha \in I} G_\alpha$ where each $G_\alpha$ is an open subset of $\mathbb{R}^n$, then there is a finite subset $J$ of the index set $I$ such that $K \subseteq \bigcup_{\alpha \in J} G_\alpha$.

In fact, suppose that (i) holds. If a sequence in $K$ converges to some point of $\mathbb{R}^n$, then by (i) a subsequence converges to a point of $K$, so the limit lies in $K$. A subset $E$ of $\mathbb{R}^n$ is closed if and only if $E$ equals its closure, so $K$ is closed. Now suppose $K$ were not bounded. Then for each $k \in \mathbb{N}$ there would be an $x_k \in K$ such that $d_n(x_k, 0) > k$. No subsequence of $(x_k)_{k=1}^{\infty}$ could converge, so $K$ would not be compact, a contradiction. Thus (ii) holds.

If (ii) holds, then the Heine-Borel property, Theorem 6.2.10 (ii), yields (iii).

Finally, suppose that (iii) holds. To verify (i), it suffices to prove that $K$ is bounded and closed. Indeed, Theorem 6.2.10 (iv) then implies that every sequence in $K$ has a subsequence converging to a point of $\mathbb{R}^n$, and this point lies in $K$ because $K$ is closed.

To see that $K$ is bounded, set $G_x = B_1(x)$. Then $\{G_x : x \in K\}$ is an open cover of $K$ and so has a finite subcover, say $G_{x_1}, \dots, G_{x_k}$. Let $M = \max\{d_n(x_j, 0) : j = 1, \dots, k\}$. If $x \in K$, then $x \in G_{x_j}$ for some $j$, and $d_n(x, 0) \le d_n(x_j, 0) + 1 \le M + 1$. Thus $K$ is bounded.

To show that $K$ is closed, fix a point $x \in \mathbb{R}^n \setminus K$. For any $y \in K$, there are disjoint open balls $U_y$ and $V_y$ centered at $y$ and $x$, respectively.
Note that these two balls generally depend on $y$. The family $\{U_y : y \in K\}$ is an open cover of $K$ and so has a finite subcover, say $U_{y_1}, \dots, U_{y_k}$. Then $K \subseteq \bigcup_{j=1}^{k} U_{y_j}$, which is disjoint from the open ball $B = \bigcap_{j=1}^{k} V_{y_j}$ centered at $x$.

This ball $B$ must be contained entirely in $\mathbb{R}^n \setminus K$. Otherwise $B \cap K \neq \emptyset$, and hence $B \cap \big(\bigcup_{j=1}^{k} U_{y_j}\big) \supseteq B \cap K \neq \emptyset$, a contradiction. It follows that $\mathbb{R}^n \setminus K$ is open, and so $K$ is closed.

Example 6.3.2. The family $\mathcal{F} = \{(k^{-1}, 2) : k \in \mathbb{N}\}$ is an open cover of $(0, 1]$. The family $\mathcal{G} = \{(2^{-k}, 2) : k \in \mathbb{N}\}$ is a subfamily of $\mathcal{F}$. Each point of $(0, 1]$ belongs to $(2^{-k}, 2)$ for $k$ sufficiently large, and so $\mathcal{G}$ is a subcover of $(0, 1]$.

Now let $\mathcal{H}$ be a finite subfamily of $\mathcal{F}$. There is a largest value of $k$, say $k_1$, such that $(k_1^{-1}, 2) \in \mathcal{H}$. The point $(k_1 + 1)^{-1}$ lies in $(0, 1]$ but does not belong to any element of $\mathcal{H}$. So $\mathcal{H}$ is not a subcover of $(0, 1]$.

A close look at this example reveals that the problem lies in the fact that $(0, 1]$ is not closed. Other examples, which the reader should construct, show that similar problems can arise with unbounded sets.

Remark 6.3.3. There is an important result generalizing the nested set property. Let $\mathcal{F} = \{K_j\}_{j \in I}$ be a class of compact subsets of $\mathbb{R}^n$ such that the intersection of any finite subclass of $\mathcal{F}$ is nonempty. Then $\bigcap_{j \in I} K_j \neq \emptyset$.

To see this, let $K \in \mathcal{F}$ and suppose, to the contrary, that $K \cap \big(\bigcap_{j \in I} K_j\big) = \emptyset$. Then $K \subseteq \bigcup_{j \in I} (\mathbb{R}^n \setminus K_j)$, and each $\mathbb{R}^n \setminus K_j$ is open. Since $K$ is compact, there are finitely many indices $j_1, \dots, j_k$ such that
$$K \subseteq \bigcup_{l=1}^{k} (\mathbb{R}^n \setminus K_{j_l}) = \mathbb{R}^n \setminus \bigcap_{l=1}^{k} K_{j_l}.$$
Then $K \cap \big(\bigcap_{l=1}^{k} K_{j_l}\big) = \emptyset$, a contradiction. Accordingly, $K \cap \big(\bigcap_{j \in I} K_j\big) \neq \emptyset$, and so $\bigcap_{j \in I} K_j \neq \emptyset$.

An advantage of the last characterization of compactness in Example 6.3.1 is that it remains meaningful in topological spaces much more general than $\mathbb{R}^n$. It thus serves as the basis for a definition of compactness in these more general spaces.

Definition 6.3.4. Suppose that $(X, d)$ is a metric space. We say that $S \subseteq X$ is compact provided that whenever $S$ is contained in the union of a collection of open subsets of $X$, $S$ is contained in the union of finitely many of these open subsets. In particular, if $S = X$ then $(X, d)$ is called a compact metric space.

It is natural to give an alternative to Definition 6.3.4 in terms of cluster points.

Theorem 6.3.5. Let $(X, d)$ be a metric space and $S \subseteq X$.
Then the following statements are equivalent:

(i) $S$ is compact;

(ii) any infinite subset of $S$ has at least one cluster point in $S$;

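(iii) any sequence of points in $S$ has a subsequence converging to a point in $S$.

Proof. (i)⇒(ii) Suppose that (i) is true. If (ii) is false, then there is an infinite subset $E$ of $S$ that has no cluster point in $S$. Then, for each $x \in S$, there is an open ball centered at $x$ containing only finitely many points of $E$. $S$ is compact and covered by all such open balls, so it is contained in the union of finitely many of them. Each of these contains only finitely many points of $E$. This implies that $E \subseteq S$ is finite, a contradiction.

(ii)⇒(iii) Let $\{x_k\}_{k=1}^{\infty}$ be a sequence in $S$. If its range is finite, some point occurs infinitely often, which gives a constant subsequence. Otherwise, by (ii), the range has a cluster point $x \in S$. Every ball about $x$ then contains infinitely many terms of the range, so one can choose $k_1 < k_2 < \cdots$ with $d(x_{k_j}, x) < j^{-1}$.

(iii)⇒(i) Suppose that (iii) holds, and let $\{O_\alpha\}_{\alpha \in I}$ be a collection of open sets whose union contains $S$.

We first claim that there is an $\varepsilon > 0$ such that every ball $B_\varepsilon(x)$ with $x \in S$ is contained in some $O_\alpha$. Otherwise, for each $k \in \mathbb{N}$ there is $y_k \in S$ such that $B_{k^{-1}}(y_k)$ lies in no $O_\alpha$. By (iii), a subsequence $\{y_{k_i}\}_{i=1}^{\infty}$ converges to some $y \in S$. Then $y \in O_\alpha$ for some $\alpha$, and $B_r(y) \subseteq O_\alpha$ for some $r > 0$. Take $i$ so large that $d(y_{k_i}, y) < 2^{-1} r$ and $k_i^{-1} < 2^{-1} r$. Then $B_{k_i^{-1}}(y_{k_i}) \subseteq B_r(y) \subseteq O_\alpha$, a contradiction.

With this $\varepsilon$, pick $x_1 \in S$. As long as $S \not\subseteq \bigcup_{k=1}^{n} B_\varepsilon(x_k)$, choose $x_{n+1} \in S$ with $d(x_{n+1}, x_k) \ge \varepsilon$ for $k = 1, 2, \dots, n$. Suppose this selection never stopped. By (iii), the sequence $\{x_n\}_{n=1}^{\infty}$ would possess a subsequence $\{x_{n_j}\}_{j=1}^{\infty}$ converging to a point in $S$. Then $d(x_{n_j}, x_{n_k}) < \varepsilon$ whenever $n_j$ and $n_k$ are large enough, contradicting $d(x_{n_j}, x_{n_k}) \ge \varepsilon$ for $n_j \neq n_k$.

Hence $S \subseteq \bigcup_{k=1}^{n} B_\varepsilon(x_k)$ for some $n$. Choosing for each $k$ an $O_{\alpha_k} \supseteq B_\varepsilon(x_k)$ gives the finite subcover $S \subseteq \bigcup_{k=1}^{n} O_{\alpha_k}$. Thus $S$ is compact.

Here is an interesting consequence of Theorem 6.3.5.

Corollary 6.3.6. Let $(X, d)$ be a complete metric space. Then a closed subset $S$ of $X$ is compact if and only if $S$ is totally bounded. That is, for any $\varepsilon > 0$ there exist finitely many open balls $\{B_\varepsilon(x_j)\}_{j=1}^{n}$ in $X$ such that $S \subseteq \bigcup_{j=1}^{n} B_\varepsilon(x_j)$.

Proof. Suppose $S \subseteq X$ is closed. If $S$ is compact and $\varepsilon > 0$, then the trivial inclusion $S \subseteq \bigcup_{x \in S} B_\varepsilon(x)$ yields a finite cover $S \subseteq \bigcup_{j=1}^{n} B_\varepsilon(x_j)$, as desired.

Conversely, suppose that $S$ is totally bounded, and let $S$ be covered by a collection of open sets $\{O_\alpha\}_{\alpha \in I}$. To reach a contradiction, assume $S$ cannot be covered by any finite subcollection of $\{O_\alpha\}_{\alpha \in I}$.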
By total boundedness with $\varepsilon = 2^{-1}$, there are finitely many points $x_1, \dots, x_{n_1}$ such that $S \subseteq \bigcup_{j=1}^{n_1} B_{2^{-1}}(x_j)$. Hence
$$S = \bigcup_{j=1}^{n_1} \big(S \cap \overline{B_{2^{-1}}(x_j)}\big),$$
a finite union of closed sets (recall that $S$ is closed). By the assumption, one of these closed sets, say $S_1 = S \cap \overline{B_{2^{-1}}(x_1)}$, cannot be covered by finitely many elements of $\{O_\alpha\}_{\alpha \in I}$.

At the $k$th step, do the same with $\varepsilon = (2k)^{-1}$ and $S_{k-1}$ in place of $\varepsilon = 2^{-1}$ and $S$, and continue this process. The result is a sequence of closed sets $(S_k)_{k=1}^{\infty}$ such that:

(a) $S \supseteq S_1 \supseteq S_2 \supseteq \cdots$;

(b) $\sup_{x, y \in S_k} d(x, y) \le k^{-1}$;

(c) no $S_k$ can be covered by finitely many elements of $\{O_\alpha\}_{\alpha \in I}$.

Choose $x_k \in S_k$; each $S_k$ is nonempty by (c). By (a) and (b), $d(x_k, x_m) \le m^{-1}$ for $k > m$, so $\{x_k\}_{k=1}^{\infty}$ is a Cauchy sequence. Since $X$ is complete, it converges to some $x \in X$. Each $S_k$ is closed and contains $x_j$ for all $j \ge k$, so $x \in \bigcap_{k=1}^{\infty} S_k$.

Now $x \in O_\alpha$ for some $\alpha \in I$, and $B_r(x) \subseteq O_\alpha$ for some $r > 0$. By (b), $S_k \subseteq B_r(x) \subseteq O_\alpha$ whenever $k > r^{-1}$, contradicting (c). Therefore $S$ is covered by finitely many elements of $\{O_\alpha\}_{\alpha \in I}$, i.e., $S$ is compact.

One example of a compact metric space yields many more, by means of the following result.

Theorem 6.3.7. Let $(X, d)$ be a compact metric space. Then $X$ is complete with respect to $d$. Furthermore:

(i) any closed subset $S$ of $X$ is compact;

(ii) for any sequence of nonempty closed subsets $(S_j)_{j=1}^{\infty}$ of $X$ with $S_1 \supseteq S_2 \supseteq S_3 \supseteq \cdots$, there is at least one point in $\bigcap_{j \in \mathbb{N}} S_j$.

Proof. By Theorem 6.3.5 (iii), any Cauchy sequence has a convergent subsequence and therefore is itself convergent. Thus $(X, d)$ is complete.

(i) Let $S$ be closed. Then $S^c$ is open. Suppose $S \subseteq \bigcup_{\alpha \in I} U_\alpha$, where $U_\alpha \subseteq X$ is open for each $\alpha \in I$. Then $X \subseteq \big(\bigcup_{\alpha \in I} U_\alpha\big) \cup S^c$. Since $X$ is compact, there is a finite subset $J \subseteq I$ such that $X \subseteq \big(\bigcup_{\alpha \in J} U_\alpha\big) \cup S^c$. Hence $S \subseteq \bigcup_{\alpha \in J} U_\alpha$. This shows that $S$ is compact.

(ii) Suppose $\bigcap_{j \in \mathbb{N}} S_j = \emptyset$. Then $X = \emptyset^c = \bigcup_{j \in \mathbb{N}} S_j^c$. Since $X$ is compact, finitely many of these open sets cover $X$, say $X = \bigcup_{i=1}^{J} S_{j_i}^c$. But $S_1^c \subseteq S_2^c \subseteq S_3^c \subseteq \cdots$. So $X = S_k^c$ with $k = \max\{j_1, \dots, j_J\}$, which produces the contradiction $S_k = X^c = \emptyset$.

Remark 6.3.8. Theorem 6.3.7 (ii) does not hold if the word "compact" is replaced by "complete" in the hypothesis. For example, take $X = \mathbb{R}$ and $S_k = \{x \in \mathbb{R} : x \ge k\}$ for $k \in \mathbb{N}$. In addition to this counterexample, we have the following interesting consequence.

Corollary 6.3.9. Suppose that $(X, d)$ is a metric space. If $S \subseteq X$ is compact, then it is bounded and closed, but not conversely.

Proof. We have $S \subseteq \bigcup_{x \in S} B_1(x)$. If $S$ is compact, then $S$ is covered by finitely many of these balls, and hence $S$ is bounded.

To establish the closedness of $S$, note that $(S, d)$ is a compact metric space, and so it is complete by Theorem 6.3.7. Accordingly, $S$ is closed. The argument in Theorem 6.2.9 that a complete subset is closed does not use the completeness of $X$.

For the converse, consider the distance $d$ defined by $d(x, y) = 1$ if $x \neq y$ and $d(x, y) = 0$ if $x = y$. Any infinite set $X$ is bounded and closed under $d$. But $X$ is not compact, since the open cover $X = \bigcup_{x \in X} B_{2^{-1}}(x)$ has no finite subcover.

Next we deal with the density and the separability of metric spaces.

Definition 6.3.10. Suppose that $(X, d)$ is a metric space.

(i) $D \subseteq X$ is said to be dense in $X$ provided that any open ball in $X$ contains a point of $D$.

(ii) $X$ is called separable provided that there is a countable set $S = \{x_1, x_2, \dots\} \subseteq X$ that is dense in $X$.

For examples, we turn immediately to $\mathbb{R}^n$, $\ell^2$, $\ell^\infty$ and $C[a, b]$.

Example 6.3.11. (i) For $n \in \mathbb{N}$, $\mathbb{Q}^n$ is a countable dense subset of $\mathbb{R}^n$. The countability of $\mathbb{Q}^n$ follows from that of $\mathbb{Q}$. To show that $\mathbb{Q}^n$ is dense in $\mathbb{R}^n$, let $B_\varepsilon(x)$ be any open ball of radius $\varepsilon > 0$ and center $x = (x_1, \dots, x_n) \in \mathbb{R}^n$. By the density of $\mathbb{Q}$ in $\mathbb{R}$, choose rational numbers $r_1, \dots, r_n$ such that
$$|x_j - r_j| < \varepsilon n^{-1/2} \quad \text{for } j = 1, \dots, n.$$
Then
$$d_n(x, r) = \Big(\sum_{j=1}^{n} |x_j - r_j|^2\Big)^{1/2} < \varepsilon,$$
and so $r = (r_1, \dots, r_n) \in \mathbb{Q}^n \cap B_\varepsilon(x)$.
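The estimate in Example 6.3.11 (i) can be mirrored numerically. This sketch is illustrative only. It rounds each coordinate to a fraction whose denominator $M$ is our own choice, large enough that the rounding error $1/(2M)$ is below $\varepsilon/\sqrt{n}$, and then checks that $d_n(x, r) < \varepsilon$. Floating-point inputs are themselves rational, so this demonstrates the bookkeeping of the bound rather than the approximation of a genuinely irrational point. The function names are hypothetical.

```python
import math
from fractions import Fraction


def rational_point_near(x, eps):
    """Return r in Q^n with |x_j - r_j| < eps / sqrt(n) for every j (Example 6.3.11 (i))."""
    n = len(x)
    M = math.ceil(math.sqrt(n) / eps) + 1  # rounding to 1/M costs at most 1/(2M) < eps/sqrt(n)
    return [Fraction(round(xj * M), M) for xj in x]


def d_n(x, y):
    """Euclidean distance on R^n."""
    return math.sqrt(sum((float(a) - float(b)) ** 2 for a, b in zip(x, y)))


x = [math.pi, math.e, math.sqrt(2)]
r = rational_point_near(x, 1e-3)
print(r, d_n(x, r))
```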

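(ii) Let $\ell^2$ and $\ell^\infty$ be the spaces of real-valued sequences $(x_k)_{k=1}^{\infty}$ with $\sum_{j=1}^{\infty} x_j^2 < \infty$ and $\sup_{k \in \mathbb{N}} |x_k| < \infty$, respectively. For two sequences $x = (x_j)_{j=1}^{\infty}$ and $y = (y_j)_{j=1}^{\infty}$, define
$$d_p(x, y) = \begin{cases} \Big(\sum_{j=1}^{\infty} |x_j - y_j|^2\Big)^{1/2}, & p = 2, \\ \sup_{1 \le j < \infty} |x_j - y_j|, & p = \infty. \end{cases}$$
Then $(\ell^2, d_2)$ is separable, whereas $(\ell^\infty, d_\infty)$ is not.

For the former, let $S$ be the set of all sequences $(r_1, \dots, r_m, 0, 0, \dots)$ with $m \in \mathbb{N}$ and $r_1, \dots, r_m \in \mathbb{Q}$; $S$ is countable. Let $\varepsilon > 0$ and $x = (x_j)_{j=1}^{\infty} \in \ell^2$ be given. There is some $N \in \mathbb{N}$ such that $\sum_{j=N}^{\infty} |x_j|^2 < (2^{-1}\varepsilon)^2$. Also, for each $j \in \{1, \dots, N - 1\}$ there is an $r_j \in \mathbb{Q}$ such that $|r_j - x_j| < \varepsilon (2\sqrt{N})^{-1}$. Accordingly, the sequence $r = (r_1, \dots, r_{N-1}, 0, 0, \dots) \in S$ satisfies
$$d_2(x, r) \le \Big(\sum_{j=1}^{N-1} |x_j - r_j|^2\Big)^{1/2} + \Big(\sum_{j=N}^{\infty} x_j^2\Big)^{1/2} < \varepsilon.$$
This implies that $S$ is dense in $\ell^2$.

As for the latter, take any sequence $\{x^{(n)}\}_{n=1}^{\infty}$ in $\ell^\infty$ with $x^{(n)} = (x_1^{(n)}, x_2^{(n)}, \dots, x_j^{(n)}, \dots)$. Define $x = (x_1, \dots, x_j, \dots)$, where
$$x_j = \begin{cases} x_j^{(j)} + 1, & |x_j^{(j)}| \le 1, \\ 0, & |x_j^{(j)}| > 1. \end{cases}$$
Thus $x \in \ell^\infty$; indeed $|x_j| \le 2$. Moreover,
$$d_\infty(x, x^{(n)}) \ge |x_n - x_n^{(n)}| \ge 1 \quad \text{for all } n \in \mathbb{N}.$$
So the ball $B_1(x)$ contains no point of $\{x^{(n)}\}_{n=1}^{\infty}$, and this set cannot be dense in $\ell^\infty$. Since the sequence was arbitrary, $\ell^\infty$ is not separable.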
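The diagonal construction behind the non-separability of $\ell^\infty$ in Example 6.3.11 (ii) is easy to run on finitely many truncated sequences. This sketch is an illustration, not part of the proof. Each row stands for the first few coordinates of some $x^{(n)}$, and we assume every row has at least as many entries as there are rows. Exact `Fraction` values avoid rounding in the comparison $|x_n - x_n^{(n)}| \ge 1$. The helper names are hypothetical.

```python
from fractions import Fraction


def diagonal_escape(rows):
    """Diagonal sequence of Example 6.3.11 (ii) for finitely many truncated rows:
    x_j = x_j^(j) + 1 if |x_j^(j)| <= 1, and x_j = 0 otherwise."""
    x = []
    for j, row in enumerate(rows):
        v = row[j]  # the diagonal entry x_j^(j)
        x.append(v + 1 if abs(v) <= 1 else Fraction(0))
    return x


def sup_distance(x, y):
    """d_infinity restricted to the coordinates both lists share."""
    return max(abs(a - b) for a, b in zip(x, y))


F = Fraction
rows = [
    [F(0), F(5), F(-1), F(1, 2)],
    [F(3), F(1), F(2), F(-2)],
    [F(3, 10), F(3, 10), F(-9, 10), F(0)],
    [F(1), F(1), F(1), F(1)],
]
x = diagonal_escape(rows)
print(x, [sup_distance(x, row) for row in rows])
```

However many rows are supplied, each stays at sup-distance at least $1$ from the diagonal sequence. This is exactly why no countable family can be dense.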

(iii) For any finite closed interval $[a, b] \subset \mathbb{R}$, the set $C[a, b]$ of all real-valued continuous functions on $[a, b]$ is a metric space under the sup-distance
$$d(f, g) = \sup_{x \in [a, b]} |f(x) - g(x)| < \infty \quad \text{for all } f, g \in C[a, b].$$
Moreover, it is separable. In fact, let $S$ be the set of piecewise linear functions of the form
$$f(t) = s_k + \Big(\frac{s_{k+1} - s_k}{t_{k+1} - t_k}\Big)(t - t_k) \quad \text{for all } t_k \le t \le t_{k+1}.$$
Here $a = t_0 < t_1 < \cdots < t_n = b$ is a partition of $[a, b]$, and the $s_k, t_k$ are rational, with the possible exceptions of $t_0 = a$ and $t_n = b$. The set $S$ is countable.

Suppose $g \in C[a, b]$. By the uniform continuity of $g$ on $[a, b]$, for any $\varepsilon > 0$ there is a $\delta > 0$ such that
$$|t - s| < \delta \Rightarrow |g(t) - g(s)| < 2^{-2}\varepsilon.$$
Let $a = t_0 < t_1 < \cdots < t_n = b$ be a partition of $[a, b]$ with $t_1, \dots, t_{n-1}$ rational and $\max_{0 \le k \le n-1}(t_{k+1} - t_k) < \delta$. Let $s_0, \dots, s_n$ be rational numbers such that $\max_{0 \le k \le n} |g(t_k) - s_k| < 4^{-1}\varepsilon$. Then for $t_k \le t \le t_{k+1}$ we have
$$f(t) - g(t) = \Big(\frac{t_{k+1} - t}{t_{k+1} - t_k}\Big)\big(s_k - g(t)\big) + \Big(\frac{t - t_k}{t_{k+1} - t_k}\Big)\big(s_{k+1} - g(t)\big),$$
and hence
$$|f(t) - g(t)| \le |s_k - g(t_k)| + |g(t_k) - g(t)| + |s_{k+1} - g(t_{k+1})| + |g(t_{k+1}) - g(t)| < \varepsilon.$$

Thus $S$ is dense in $C[a, b]$.

Motivated by the foregoing example, we can establish a relationship between compactness and separability.

Theorem 6.3.12. Let $(X, d)$ be a metric space. If $X$ is compact, then it is separable.

Proof. Suppose that $X$ is compact and $n \in \mathbb{N}$. Since $X = \bigcup_{x \in X} B_{n^{-1}}(x)$, there are a $j_n \in \mathbb{N}$ and points $x_1^{(n)}, \dots, x_{j_n}^{(n)} \in X$ such that $X = \bigcup_{j=1}^{j_n} B_{n^{-1}}(x_j^{(n)})$.

The countable set $S = \bigcup_{n=1}^{\infty} \bigcup_{j=1}^{j_n} \{x_j^{(n)}\}$ is dense in $X$. Indeed, let $x \in X$ and $\varepsilon > 0$, and choose $n \in \mathbb{N}$ with $n^{-1} < \varepsilon$. Then $x \in B_{n^{-1}}(x_k^{(n)})$ for some $k \in \{1, \dots, j_n\}$. Hence $x_k^{(n)} \in B_{n^{-1}}(x) \subseteq B_\varepsilon(x)$, i.e., $B_\varepsilon(x) \cap S \neq \emptyset$. Therefore $X$ is separable.
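The finite $n^{-1}$-nets in the proof of Theorem 6.3.12 can be illustrated on a finite sample. So can the $\varepsilon$-separated selection in the proof of Theorem 6.3.5. This sketch assumes a finite sample of the unit circle in $\mathbb{R}^2$ as a stand-in for a compact space. It uses a greedy rule of our own choosing: a sample point becomes a new center only if it is at distance at least $\varepsilon$ from all earlier centers.

The resulting centers are $\varepsilon$-separated, and every sample point lies within $\varepsilon$ of some center. Taking $\varepsilon = 1, 2^{-1}, 3^{-1}, \dots$ and collecting all centers mimics the countable dense set $S$. The function names are hypothetical.

```python
import math


def greedy_net(points, eps):
    """Centers chosen greedily from a finite sample: pairwise distances >= eps,
    and every sample point is at distance < eps from some center."""
    centers = []
    for p in points:
        if all(math.dist(p, c) >= eps for c in centers):
            centers.append(p)  # p is not yet covered, so it becomes a new center
    return centers


# Finite sample of the unit circle, standing in for a compact metric space.
sample = [(math.cos(2 * math.pi * i / 2000), math.sin(2 * math.pi * i / 2000))
          for i in range(2000)]

dense_candidates = []
for n in range(1, 6):
    net = greedy_net(sample, 1 / n)
    dense_candidates.extend(net)
    print(n, len(net))
```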

The next result is Lebesgue's property, which is of great generality and combines the concepts of open sets, density and separability. This theorem should be compared to the Heine-Borel property, since each makes an assertion about reducing an open covering of a set.

Theorem 6.3.13. For $n \in \mathbb{N}$ let $X = \mathbb{R}^n$ be equipped with the distance $d_n(\cdot, \cdot)$.

(i) There is a countable collection $\mathcal{B}$ of open balls in $X$ such that any open subset of $X$ can be expressed as the union of some subcollection of $\mathcal{B}$.

(ii) If $S$ is any subset of $X$ and $\{A_\mu\}_{\mu \in I}$ is a collection of open sets whose union contains $S$, then there is a countable subcollection $\{A_{\mu_k}\}_{k=1}^{\infty}$ whose union contains $S$.

Proof. (i) Let $\mathcal{B}$ be the collection of all open balls $B_r(q)$ with $r > 0$ rational and $q \in \mathbb{Q}^n$. For a given open set $A$, consider the subcollection $\mathcal{B}_A$ of those balls $B_r(q) \in \mathcal{B}$ with $B_r(q) \subseteq A$. Their union is obviously contained in $A$. Also, $\mathcal{B}$ (and hence $\mathcal{B}_A$) is countable, because both $\mathbb{Q}^n$ and $\mathbb{Q}$ are countable. It remains only to show that $A$ is contained in the union of this collection.

If $x$ is a point in $A$, then there is an open ball about $x$ with rational radius $r$ such that $B_r(x) \subseteq A$. Since $\mathbb{Q}^n$ is dense, there is a point $q$ of $\mathbb{Q}^n$ in $B_{2^{-1}r}(x)$. Then $x$ is in $B_{2^{-1}r}(q)$. Moreover, $B_{2^{-1}r}(q)$ is in $\mathcal{B}_A$, because it is contained in $B_r(x)$, which lies in $A$. This last assertion holds because
$$y \in B_{2^{-1}r}(q) \Rightarrow d_n(x, y) \le d_n(x, q) + d_n(q, y) < r.$$
We have shown that an arbitrary point $x \in A$ is contained in some open ball in $\mathcal{B}_A$. So $A$ is contained in the union of all balls in $\mathcal{B}_A$.

(ii) Suppose $S \subseteq \bigcup_{\mu \in I} A_\mu$. By (i), each open set $A_\mu$ is the union of those balls of $\mathcal{B}$ that are contained in $A_\mu$. Let $\mathcal{B}'$ be the countable set of balls $\beta \in \mathcal{B}$ that are contained in at least one $A_\mu$. If $x \in S$, then $x \in A_\mu$ for some $\mu$, and hence $x$ lies in some ball $\beta \in \mathcal{B}'$.

For each ball $\beta \in \mathcal{B}'$, choose just one $A_\mu$ that contains $\beta$ and call it $A_\beta$. The subcollection $\{A_\beta : \beta \in \mathcal{B}'\}$ is countable, because there is only one $A_\beta$ for each $\beta$. Its union contains the union of the balls in $\mathcal{B}'$, which in turn contains $S$. Hence there is a countable subcollection of $\{A_\mu\}_{\mu \in I}$ whose union contains $S$.

Remark 6.3.14. It is worth remarking that the hypothesis of Theorem 6.3.13 (ii) makes no restrictive assumption about $S$. That is the strength and generality of this result: it applies to any subset $S$ of $\mathbb{R}^n$. Suppose, in addition, that $S$ is closed and bounded. Then the countable subcollection obtained from Lebesgue's property can be further reduced to a finite subcollection whose union still contains $S$.

Definition 6.3.15. Suppose that $(X, d)$ is a metric space.

(i) A set $S \subseteq X$ is said to be nowhere dense if $\overline{S}$ contains no open ball, i.e., $(\overline{S})^\circ = \emptyset$.

(ii) A set $S \subseteq X$ is said to be of first category in $X$ if $S$ is a countable union of nowhere dense sets.

(iii) A set $S \subseteq X$ is said to be of second category in $X$ if $S$ is not of first category.

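Example 6.3.16.

(i) Cantor's ternary set $C$ is nowhere dense. In fact, $C$ is closed, so $\overline{C} = C$, and $C$ has Lebesgue measure $0$. Suppose $C^\circ$ were not empty. Then $C$ would contain a finite open interval $(a, b)$ with $a < b$, and $b - a$ would be at most the Lebesgue measure of $C$, which is $0$. Hence $a = b$, a contradiction. Thus $C^\circ = \emptyset$.

(ii) $\mathbb{Q}$ is of first category in $\mathbb{R}$. This follows from the definition: $\mathbb{Q}$ is the countable union of the singletons $\{q\}$, $q \in \mathbb{Q}$, each of which is closed with empty interior and hence nowhere dense.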

(iii) $\mathbb{R}$ is of second category in itself. This assertion is not obvious. In fact, it can only be deduced as a consequence of the so-called Baire category theorem (named after René-Louis Baire), as follows.

Theorem 6.3.17. Let $(X, d)$ be a metric space. If $X$ is complete, then it is of second category. Hence the complement of a first category set in $X$ is of second category and dense in $X$.

Proof. Let $A_k$ be nowhere dense in $X$ for every $k \in \mathbb{N}$, and let $E = \bigcup_{k=1}^{\infty} A_k$. We shall show that there is a point $x \in X \setminus E = E^c$; this will establish the result.

Let $x^{(0)} \in X$. Since $A_1$ is nowhere dense, there is a closed ball $B_1$ of radius less than $2^{-1}$ inside the open ball $B_1(x^{(0)})$ such that $B_1 \cap A_1 = \emptyset$. Since $A_2$ is nowhere dense, there is also a closed ball $B_2$ of radius less than $3^{-1}$ such that $B_2 \subseteq B_1$ and $B_2 \cap A_2 = \emptyset$. Continuing this construction produces a nested sequence of closed balls $\{B_k\}_{k=1}^{\infty}$. The radius of $B_k$ is less than $(k + 1)^{-1}$, and $B_k \cap A_k = \emptyset$.

Let $c_k$ denote the center of $B_k$. For $k > m$, $c_k \in B_k \subseteq B_m$, so $d(c_k, c_m) < (m + 1)^{-1}$. Hence $\{c_k\}_{k=1}^{\infty}$ is a Cauchy sequence, and by completeness it converges to some $x \in X$. Each $B_m$ is closed and contains $c_k$ for all $k \ge m$. Therefore $x \in \bigcap_{k=1}^{\infty} B_k \subseteq B_1(x^{(0)})$. But $B_k \cap A_k = \emptyset$, so $x \notin A_k$ for every $k$; that is, $x \notin E$.

Now let $S$ be of first category. The density of $S^c$ follows by replacing $B_1(x^{(0)})$ with $B_\varepsilon(x^{(0)})$ in the foregoing argument, which shows that every open ball meets $S^c$. Moreover, $S^c$ must be of second category. Otherwise $X = S \cup S^c$ would be of first category, a contradiction.
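Example 6.3.16 (i) argues through Lebesgue measure. The following sketch illustrates nowhere density differently, in the sense of Problem 6.16. Inside any open interval $(a, b)$, it finds an open subinterval that misses the stage-$n$ set $C_n$ of the ternary construction, and hence misses $C \subseteq C_n$.

We assume the standard construction $C = \bigcap_n C_n$ with $C_0 = [0, 1]$, where $C_n$ is a union of $2^n$ disjoint closed intervals of length $3^{-n}$. Once $3^{-n} < b - a$, the interval $(a, b)$ cannot fit inside one component of $C_n$. So it must meet a hole of $C_n$. The helper names are hypothetical, and exact `Fraction` arithmetic is used.

```python
from fractions import Fraction


def cantor_stage(n):
    """Closed intervals (l, r), in increasing order, whose union is C_n:
    start from [0, 1] and remove open middle thirds n times."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        refined = []
        for l, r in intervals:
            third = (r - l) / 3
            refined.append((l, l + third))
            refined.append((r - third, r))
        intervals = refined
    return intervals


def gap_inside(a, b):
    """Open subinterval (lo, hi) of (a, b) disjoint from C_n, hence from C."""
    a, b = Fraction(a), Fraction(b)
    n = 0
    while Fraction(1, 3 ** n) >= b - a:  # make every component of C_n shorter than (a, b)
        n += 1
    ivs = cantor_stage(n)
    # Components of R \ C_n: (-inf, 0), the open gaps between consecutive intervals, (1, inf).
    holes = [(None, ivs[0][0])]
    holes += [(ivs[i][1], ivs[i + 1][0]) for i in range(len(ivs) - 1)]
    holes.append((ivs[-1][1], None))
    for lo, hi in holes:
        lo2 = a if lo is None else max(a, lo)
        hi2 = b if hi is None else min(b, hi)
        if lo2 < hi2:
            return lo2, hi2
    raise AssertionError("unreachable: (a, b) is longer than every component of C_n")


print(sum(r - l for l, r in cantor_stage(4)))  # total length (2/3)**4
print(gap_inside(Fraction(1, 10), Fraction(1, 5)))
```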

Problems

6.1. For each of the following functions on $\mathbb{R}^2$, determine which of the properties in Definition 6.1.1 (i)-(ii)-(iii) are satisfied:

(i) $d(x, y) = |x_1 - y_1| + |x_2 - y_2|$;

(ii) $d(x, y) = \max\{|x_1 - y_1|, |x_2 - y_2|\}$;

(iii) $d(x, y) = \begin{cases} \big((x_1 - y_1)^2 + (x_2 - y_2)^2\big)^{1/2}, & x_2 \ge y_2, \\ |x_1 - y_1| + |x_2 - y_2|, & x_2 \le y_2. \end{cases}$

6.2. Prove that if $x^{(1)}, \dots, x^{(m)}$ are points in a metric space $(X, d)$, then $d(x^{(1)}, x^{(m)}) \le \sum_{j=2}^{m} d(x^{(j)}, x^{(j-1)})$.

6.3. Let $n \in \mathbb{N}$.

(i) Let $d_n^*(x, y) = \sum_{k=1}^{n} |x_k - y_k|$ for $x, y \in \mathbb{R}^n$. Prove that $d_n^*$ satisfies Definition 6.1.1 (i)-(ii)-(iii). In addition, show that $d_n(x, y) \le d_n^*(x, y)$ for $x, y \in \mathbb{R}^n$.

(ii) Let $d_{*,n}(x, y) = \max_{1 \le k \le n} |x_k - y_k|$ for $x, y \in \mathbb{R}^n$. Prove that $d_{*,n}$ satisfies Definition 6.1.1 (i)-(ii)-(iii). Discuss the relationship between $d_n$ and $d_{*,n}$.

6.4. Prove that $A^\circ$ and $\overline{A}$ are the largest open subset of $A$ and the smallest closed set containing $A$, respectively.

6.5. In $\mathbb{R}^2$, show that the closure of $B_1(0)$ is the closed unit ball $\{x \in \mathbb{R}^2 : d_2(x, 0) \le 1\}$.

6.6. Suppose $(X, d)$ is a metric space. Prove that if $S$ is a finite subset of $X$ with at least two points, then $S$ is disconnected.

6.7. Let $f, g > 0$ on $\mathbb{R}$, and let their graphs be the sets $G_1 = \{x \in \mathbb{R}^2 : x_2 = f(x_1)\}$ and $G_2 = \{x \in \mathbb{R}^2 : x_2 = -g(x_1)\}$ in $\mathbb{R}^2$. Prove that $S = G_1 \cup G_2$ is disconnected.

6.8. Decide whether or not each of the following sequences in $\mathbb{R}^2$ converges and, if so, find its limit:

(i) $x^{(k)} = \big((k - 1)k^{-1}, (k + 1)k^{-1}\big)$;

(ii) $x^{(k)} = (k^{-1}, \sin \pi k)$;

(iii) $x^{(k)} = (k^{-1}, k)$.

6.9. Let $X$ be the interval $(0, 1)$ in $\mathbb{R}$ with the usual distance $|x - y|$. Find a Cauchy sequence in $X$ that does not converge in $X$.

6.10. For $n \in \mathbb{N}$, prove the following two statements:

(i) $\{x^{(k)}\}_{k=1}^{\infty}$ is a bounded sequence in $\mathbb{R}^n$ if and only if each coordinate sequence $\{x_j^{(k)}\}_{k=1}^{\infty}$ ($j = 1, \dots, n$) is bounded in $\mathbb{R}$;

(ii) $A$ is a bounded set in $\mathbb{R}^n$ if and only if there exists an $n$-cube containing $A$.

6.11. Prove that if $X$ is an infinite set and its distance function $d$ is given by
$$d(x, y) = \begin{cases} 1, & x \neq y, \\ 0, & x = y, \end{cases}$$
then every Cauchy sequence in $X$ is eventually constant: there exists an $N \in \mathbb{N}$ such that $x^{(k)} = x^{(N)}$ whenever $k \ge N$.

6.12. For $n \in \mathbb{N}$, prove that $D = \{x = (x_1, \dots, x_n) \in \mathbb{R}^n : x_1, \dots, x_n \in \mathbb{R} \setminus \mathbb{Q}\}$ is dense in $\mathbb{R}^n$.

6.13. Let $X$ be a metric space with the distance function $d$. Suppose that $\{x^{(k)}\}_{k=1}^{\infty}$ is a Cauchy sequence in $X$ and has a subsequence $\{x^{(k_j)}\}_{j=1}^{\infty}$ that converges to a point $x \in X$. Show that $d(x^{(k)}, x) \to 0$.

6.14. Let $(X, d)$ be a metric space. Prove that $X$ is complete if and only if every nested sequence of closed balls $\{y \in X : d(y, x_k) \le r_k\}$, $k \in \mathbb{N}$, with $r_k \to 0$ has nonempty intersection.

6.15. Let $(X, d)$ be a complete metric space. Let $\{S_j\}_{j=1}^{\infty}$ be a sequence of nonempty closed subsets of $X$ with $S_{j+1} \subseteq S_j$ for every $j \in \mathbb{N}$ and $\operatorname{diam}(S_j) = \sup\{d(x, y) : x, y \in S_j\} \to 0$ as $j \to \infty$. Prove that $\bigcap_{j=1}^{\infty} S_j \neq \emptyset$.

6.16. Suppose that $(X, d)$ is a metric space. Show that $S \subseteq X$ is nowhere dense if and only if every open ball $O$ contains an open ball $O'$ such that $O' \cap S = \emptyset$.

6.17. Suppose that $(X, d)$ is a metric space. Prove that if $E_\alpha \subseteq X$ is compact for every $\alpha \in I$, then $\bigcap_{\alpha \in I} E_\alpha$ is compact.

6.18. Let $(X, d)$ be a metric space and $\lim_{n\to\infty} d(x_n, x_0) = 0$. Prove that $\{x_j\}_{j=0}^{\infty}$ is compact.

Chapter 7

Continuous Mappings

Every metric space is automatically a topological space, with the collection of all open sets as its metrizable topology. So we have a notion of continuous mappings between metric spaces. Recall that a function between topological spaces is said to be continuous if the inverse image of every open set is open. This is an attempt to capture the intuition that there are no breaks or separations in the function. Without referring to the topology, this notion can also be defined directly using limits of sequences. In this chapter, we consider properties of continuous mappings, contractions with straightforward applications to integral and differential equations, and equivalence of metric spaces.

7.1 Criteria for Continuity

By a mapping on a metric space $(X, d)$, frequently denoted $(X, d_X)$ in what follows, we mean a mapping on the set of points in $X$. By a mapping with values in a metric space $Y$, we mean a mapping with values in the set of points in $Y$. Thus if $f : X \to Y$ is a mapping from the metric space $X$ into the metric space $Y$, then to each point $x \in X$ is associated a point $f(x) \in Y$. The mapping $f$ will be called continuous at a point $x_0 \in X$ if, roughly speaking, points of $X$ that are near $x_0$ are mapped by $f$ into points of $Y$ that are near $f(x_0)$. Here is the precise definition.

Definition 7.1.1. Let $(X, d_X)$ and $(Y, d_Y)$ be two metric spaces, let $f : X \to Y$ be a mapping, and let $x_0 \in X$. Then $f$ is said to be continuous at $x_0$ if, given any $\varepsilon > 0$, there exists a $\delta > 0$ such that
$$x \in X \text{ and } d_X(x, x_0) < \delta \Rightarrow d_Y\big(f(x), f(x_0)\big) < \varepsilon.$$

Moreover, if $f$ is continuous at all points in $X$, then $f$ is said to be continuous on $X$. We emphasize that $\delta$ depends on $\varepsilon$ (as well as on $x_0$), so it would be more accurate to write $\delta(\varepsilon)$ instead of $\delta$. We keep the notation $\delta$ for simplicity, always bearing in mind that each $\varepsilon$ must have its own $\delta$.

Example 7.1.2. (i) The mapping $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = x^2$ is continuous.

(ii) Let $X$ be any metric space with the distance $d$, and let $x_0$ be a fixed point in $X$. Then the mapping $f : X \to \mathbb{R}$ given by $f(x) = d(x, x_0)$ for all $x \in X$ is continuous. This follows from the triangle inequality for $d$, which gives $|d(x, x_0) - d(y, x_0)| \le d(x, y)$. The special case $X = \mathbb{R}$ and $x_0 = 0$ shows that the absolute value function $|x|$ is continuous.

(iii) If $f : X \to Y$ is continuous and $S$ is a subspace of $X$, then the restriction of $f$ to $S$ is continuous on $S$. This is clear from Definition 7.1.1.

One of the curious consequences of Definition 7.1.1 occurs at a point $x_0 \in S$ that is not a cluster point of $S$. Such an $x_0$ is an isolated point of $S$, so there is an open ball $B_\delta(x_0)$ that contains no point of $S$ other than $x_0$. Accordingly, every mapping on $S$ is continuous at $x_0$.

Let us look at another example.

Example 7.1.3. For $n, m \in \mathbb{N}$, let $T : \mathbb{R}^n \to \mathbb{R}^m$ be a linear mapping; that is, $T(ax + by) = aT(x) + bT(y)$ for all $x, y \in \mathbb{R}^n$ and $a, b \in \mathbb{R}$. Then there exists an $m \times n$ real matrix $A = [a_{jk}]$ such that $y = T(x) = Ax$, i.e.,
$$T\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix}, \quad \text{where } y_j = \sum_{k=1}^{n} a_{jk} x_k$$
with $a_{jk} \in \mathbb{R}$; $k = 1, \dots, n$; $j = 1, \dots, m$.

Thus $T$ is continuous.

To prove this result, we first consider the case $m = 1$, in which $T : \mathbb{R}^n \to \mathbb{R}$ is a function. Let $e^{(k)}$, $k = 1, \dots, n$, be the standard basis vectors of $\mathbb{R}^n$: the $k$th coordinate of $e^{(k)}$ is $1$ and the others are $0$. Set $A_k = T(e^{(k)})$. Since $x = \sum_{k=1}^{n} x_k e^{(k)}$, the linearity of $T$ gives
$$T(x) = \sum_{k=1}^{n} x_k T(e^{(k)}) = \sum_{k=1}^{n} x_k A_k.$$
Thus $T$ is represented by the row matrix $A = [A_1, \dots, A_n]$ whenever $x$ is written as the column matrix
$$x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}.$$
Now let $m > 1$. Then
$$T(x) = \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix},$$
where each $y_j$ is a function of $x$, say $y_j = f_j(x)$. The linearity of $T$ implies that $T$ is linear in each coordinate; that is, each $f_j$ is a linear function from $\mathbb{R}^n$ into $\mathbb{R}$. Therefore, by the first case, each $f_j$ is given by a row matrix:
$$y_j = f_j(x) = \sum_{k=1}^{n} a_{jk} x_k.$$
These $m$ linear functions $f_1, \dots, f_m$ determine the $m$ rows of the matrix $[a_{jk}]$. Thus $T(x) = Ax$.

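To prove the continuity of $T$, for
$$x = \begin{pmatrix} x_1 \\ \vdots \\ x_l \end{pmatrix} \quad \text{and} \quad y = \begin{pmatrix} y_1 \\ \vdots \\ y_l \end{pmatrix} \quad \text{in } \mathbb{R}^l,$$
let
$$M_T = \Big(\sum_{j=1}^{m} \sum_{k=1}^{n} a_{jk}^2\Big)^{1/2}; \qquad \|x\|_l = \Big(\sum_{k=1}^{l} x_k^2\Big)^{1/2}; \qquad d_l(x, y) = \|x - y\|_l.$$
Then
$$\|T(x)\|_m = \Bigg(\Big(\sum_{k=1}^{n} a_{1k} x_k\Big)^2 + \cdots + \Big(\sum_{k=1}^{n} a_{mk} x_k\Big)^2\Bigg)^{1/2}.$$
Using the Cauchy-Schwarz inequality, we get
$$\Big(\sum_{k=1}^{n} a_{jk} x_k\Big)^2 \le \Big(\sum_{k=1}^{n} a_{jk}^2\Big)\Big(\sum_{k=1}^{n} x_k^2\Big) \quad \text{for } j = 1, 2, \dots, m,$$
and hence $\|T(x)\|_m \le M_T \|x\|_n$. Consequently, for any fixed point $x_0 \in \mathbb{R}^n$, the linearity of $T$ gives
$$d_m\big(T(x), T(x_0)\big) = \|T(x) - T(x_0)\|_m = \|T(x - x_0)\|_m \le M_T\, d_n(x, x_0).$$
If $\varepsilon > 0$ is given, then we can define $\delta = \varepsilon(1 + M_T)^{-1}$, and hence
$$d_m\big(T(x), T(x_0)\big) < \varepsilon \quad \text{whenever } d_n(x, x_0) < \delta.$$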
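The bound $\|T(x)\|_m \le M_T \|x\|_n$ and the choice $\delta = \varepsilon(1 + M_T)^{-1}$ in Example 7.1.3 can be checked numerically. The sketch below stores the matrix $A = [a_{jk}]$ as plain Python lists, with rows indexed by $j$. The particular matrix, the test vectors and the function names are our own hypothetical choices. Note that `delta_for` never looks at $x_0$; this is the uniform continuity observed in Remark 7.1.4.

```python
import math


def apply(A, x):
    """y = A x for A given as a list of m rows, each of length n."""
    return [sum(a_jk * x_k for a_jk, x_k in zip(row, x)) for row in A]


def norm(v):
    """Euclidean norm (sum of v_k**2) ** (1/2)."""
    return math.sqrt(sum(t * t for t in v))


def bound_M(A):
    """M_T = (sum over j, k of a_jk**2) ** (1/2)."""
    return math.sqrt(sum(a * a for row in A for a in row))


def delta_for(A, eps):
    """delta = eps / (1 + M_T); it depends only on A and eps, not on x0."""
    return eps / (1 + bound_M(A))


A = [[1.0, 2.0, 0.0],
     [-1.0, 0.0, 3.0]]  # T : R^3 -> R^2
for x in ([1.0, 0.0, 0.0], [3.0, -4.0, 12.0], [0.5, -2.0, 7.0]):
    print(norm(apply(A, x)), bound_M(A) * norm(x))
```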

Remark 7.1.4. Note that the choice of $\delta$ above does not depend on the point $x_0$ at which continuity is established. Thus linear mappings, like linear functions on $\mathbb{R}$, are uniformly continuous on $\mathbb{R}^n$.

Definition 7.1.1 may be reformulated by saying that $f$ is continuous at $x_0$ if, given any open ball in $Y$ of center $f(x_0)$, there exists an open ball in $X$ of center $x_0$ whose image under $f$ is contained in the former ball. Clearly, these open balls can be replaced by open sets containing $f(x_0)$ and $x_0$, respectively. The following criterion for the continuity of a mapping from one metric space into another is often useful.


Theorem 7.1.5. Let $(X,d_X)$ and $(Y,d_Y)$ be two metric spaces and let $f:X\to Y$ be a mapping. Then $f$ is continuous if and only if for every open subset $V$ of $Y$ the inverse image $f^{-1}(V)=\{x\in X: f(x)\in V\}$ is an open subset of $X$.

Proof. First suppose that $f$ is continuous. We have to show that if $V\subseteq Y$ is open, then $f^{-1}(V)$ is also open. Let $x_0\in f^{-1}(V)$. Then $f(x_0)\in V$. Since $V$ is open, it contains the open ball in $Y$ of center $f(x_0)$ and some radius $\varepsilon>0$. Since $f$ is continuous at $x_0$, there is a $\delta>0$ such that if $x\in X$ and $d_X(x,x_0)<\delta$ then $d_Y\big(f(x),f(x_0)\big)<\varepsilon$. This means that if $x$ is in the open ball in $X$ of center $x_0$ and radius $\delta$, then $f(x)$ belongs to the open ball in $Y$ of center $f(x_0)$ and radius $\varepsilon$, so that $f(x)\in V$. That is, $f^{-1}(V)$ contains the open ball in $X$ of center $x_0$ and radius $\delta$. Since $x_0$ was any point of $f^{-1}(V)$, the set $f^{-1}(V)$ is open.

Conversely, suppose that for every open $V\subseteq Y$ the set $f^{-1}(V)$ is an open subset of $X$. We must show that $f$ is continuous at any point $x_0\in X$. For any $\varepsilon>0$ let $B_\varepsilon\big(f(x_0)\big)\subseteq Y$ be the open ball of center $f(x_0)$ and radius $\varepsilon$. Then $f^{-1}\big(B_\varepsilon(f(x_0))\big)$ is an open subset of $X$ that contains $x_0$, and hence contains the open ball in $X$ of center $x_0$ and some radius $\delta>0$. Thus
$$x\in X\ \text{and}\ d_X(x,x_0)<\delta\ \Longrightarrow\ d_Y\big(f(x),f(x_0)\big)<\varepsilon.$$
Namely, $f$ is continuous at $x_0$.

Remark 7.1.6. As an immediate consequence of this theorem, if $f:X\to\mathbb{R}$ is continuous, then for any $c\in\mathbb{R}$ the sets
$$\{x\in X: f(x)>c\}=f^{-1}\big((c,\infty)\big)\quad\text{and}\quad\{x\in X: f(x)<c\}=f^{-1}\big((-\infty,c)\big)$$
are open subsets of $X$. Additionally, it is not hard to see that a composition of continuous mappings is continuous; that is, if $f:X\to Y$ and $g:Y\to Z$ are continuous, so is $g\circ f:X\to Z$ (indeed, $(g\circ f)^{-1}(W)=f^{-1}\big(g^{-1}(W)\big)$ is open for every open $W\subseteq Z$).

The next result is the sequence criterion for continuity.

Theorem 7.1.7. Let $(X,d_X)$ and $(Y,d_Y)$ be two metric spaces and let $f:X\to Y$ be a mapping. Then $f$ is continuous if and only if for every sequence $\{x_k\}_{k=1}^\infty$ of points in $X$ that converges to some $x_0\in X$, the sequence $\{f(x_k)\}_{k=1}^\infty$ converges to $f(x_0)$ in $Y$.


Proof. Suppose first that $f$ is continuous at $x_0\in X$ and that $\{x_k\}_{k=1}^\infty$ converges to $x_0$ in $X$. We have to show that $\{f(x_k)\}_{k=1}^\infty$ converges to $f(x_0)$ in $Y$. Given $\varepsilon>0$, the continuity of $f$ at $x_0$ implies that there is a $\delta>0$ such that $d_Y\big(f(x),f(x_0)\big)<\varepsilon$ whenever $x\in X$ and $d_X(x,x_0)<\delta$. Because $\{x_k\}_{k=1}^\infty$ converges to $x_0$, there exists a positive integer $n_0$ such that $d_X(x_k,x_0)<\delta$ for all $k>n_0$. Hence if $k>n_0$ then $d_Y\big(f(x_k),f(x_0)\big)<\varepsilon$, which shows that $\{f(x_k)\}_{k=1}^\infty$ converges to $f(x_0)$.

We now suppose that the sequence condition holds. If $f$ is not continuous at some $x_0$, then there is some $\varepsilon_0>0$ such that for no $\delta>0$ is it true that $x\in X$ and $d_X(x,x_0)<\delta$ imply $d_Y\big(f(x),f(x_0)\big)<\varepsilon_0$. Hence for any $k\in\mathbb{N}$ we can find a point $x_k\in X$ such that $d_X(x_k,x_0)<k^{-1}$ and $d_Y\big(f(x_k),f(x_0)\big)\ge\varepsilon_0$. Since $d_X(x_k,x_0)<k^{-1}$, the sequence $\{x_k\}_{k=1}^\infty$ converges to $x_0$. However, $\{f(x_k)\}_{k=1}^\infty$ does not converge to $f(x_0)$, since $d_Y\big(f(x_k),f(x_0)\big)\ge\varepsilon_0$ for all $k$. This completes the proof.

Example 7.1.8. Let $f$ be defined on $\mathbb{R}^2$ by
$$f(x)=f(x_1,x_2)=\begin{cases}0, & x=0,\\ x_1x_2\,(x_1^2+x_2^2)^{-1}, & x\neq 0.\end{cases}$$
Then as $\{x^{(k)}\}_{k=1}^\infty$ approaches $0$ along the axes, we see that $\{f(x^{(k)})\}_{k=1}^\infty$ converges to $0$. But if $x^{(k)}=(k^{-1},k^{-1})$, then $f(x^{(k)})=2^{-1}$ for every $k$. Hence, by Theorem 7.1.7, $f$ is not continuous at $0$.
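The two limits in Example 7.1.8 can be reproduced numerically. The following Python sketch, which is just an illustration with sample sequences of our choosing, evaluates $f$ along the $x_1$-axis and along the diagonal sequence $x^{(k)}=(k^{-1},k^{-1})$.

```python
def f(x1: float, x2: float) -> float:
    """The function of Example 7.1.8."""
    if x1 == 0.0 and x2 == 0.0:
        return 0.0
    return x1 * x2 / (x1 * x1 + x2 * x2)


ks = range(1, 11)
along_axis = [f(1 / k, 0.0) for k in ks]        # x^(k) = (1/k, 0): every value is 0.0
along_diagonal = [f(1 / k, 1 / k) for k in ks]  # x^(k) = (1/k, 1/k): every value is 0.5
print(along_axis)
print(along_diagonal)
```

Both sequences converge to $0$, yet the image sequences have different limits, which is exactly what Theorem 7.1.7 rules out for a continuous $f$.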

7.2 Continuous Mappings over Compact or Connected Metric Spaces

First, we consider continuous mappings on a compact metric space.

Theorem 7.2.1. Let $(X,d_X)$ and $(Y,d_Y)$ be two metric spaces and let $f:X\to Y$ be a continuous mapping. If $X$ is compact, so is its image $f(X)$.

Proof. We must prove that if $f(X)=\{f(x):x\in X\}$ is covered by the union of a collection of open subsets of $Y$, then it is covered by the union of a finite number of these open subsets. So, suppose that $f(X)\subseteq\bigcup_{j\in I}V_j$, where $I$ is an index set and each $V_j$ is open. Because $f$ is continuous, each inverse image $f^{-1}(V_j)$ is open. Also, for any $x\in X$ we have $f(x)\in V_j$ for some $j\in I$, in which case $x\in f^{-1}(V_j)$, so that $X=\bigcup_{j\in I}f^{-1}(V_j)$. By the compactness of $X$ we can find a finite subset $J\subseteq I$ such that $X=\bigcup_{j\in J}f^{-1}(V_j)$. Therefore $f(X)=\bigcup_{j\in J}f\big(f^{-1}(V_j)\big)\subseteq\bigcup_{j\in J}V_j$. Thus $f(X)$ is compact.

This theorem has two extremely important immediate consequences.

Corollary 7.2.2. Let $(X,d_X)$ and $(Y,d_Y)$ be two metric spaces and let $f:X\to Y$ be a continuous mapping. If $X$ is compact, then $f$ is bounded.

Proof. Since $f(X)$ is compact, it is bounded according to Corollary 6.3.9.

The example $f(x)=x$ on $(0,1)$ shows that a bounded continuous real-valued function need not assume either its least upper bound or its greatest lower bound on its domain of definition. If, however, the domain is compact, then these values are actually taken on by the continuous function.

Corollary 7.2.3. Let $(X,d_X)$ be a nonempty compact metric space and $f:X\to\mathbb{R}$ be continuous. Then

(i) $\sup\{f(x):x\in X\}=M$ and $\inf\{f(x):x\in X\}=m$ are finite.

(ii) There are points $x,y\in X$ such that $f(x)=M$ and $f(y)=m$.

Proof. (i) By Theorem 7.2.1, $f(X)$ is a compact subset of $\mathbb{R}$, hence closed and bounded. Since $f(X)$ is also nonempty, $f(X)$ has both a greatest and a least element; in particular, $M$ and $m$ are finite.

To prove (ii), by the definition of $M$ we derive that for each $k\in\mathbb{N}$ there is $x_k\in X$ such that $M-k^{-1}<f(x_k)\le M$. Since $X$ is compact, $\{x_k\}_{k=1}^\infty$ has a convergent subsequence $\{x_{k_j}\}_{j=1}^\infty$, which converges to a point $x\in X$. Since $f$ is continuous, $f(x)=\lim_{j\to\infty}f(x_{k_j})=M$. The argument for the existence of the point $y$ such that $f(y)=m$ is similar.

Definition 7.2.4. Let $(X,d_X)$ and $(Y,d_Y)$ be two metric spaces and let $f:X\to Y$ be a mapping. Then $f$ is said to be uniformly continuous if, given any $\varepsilon>0$, there exists a $\delta>0$ such that if $x,y\in X$ and $d_X(x,y)<\delta$ then $d_Y\big(f(x),f(y)\big)<\varepsilon$.

If, for a certain subset $S$ of $X$, the restriction of $f$ to $S$ is uniformly continuous, we say that $f$ is uniformly continuous on $S$.


Clearly, a uniformly continuous mapping is continuous: to check continuity at a point $x_0\in X$, just set $y=x_0$ in the definition. The next theorem states that, conversely, a continuous $f$ is actually uniformly continuous, provided $X$ is compact.

Theorem 7.2.5. Let $(X,d_X)$ and $(Y,d_Y)$ be two metric spaces and let $f:X\to Y$ be a continuous mapping. If $X$ is compact, then $f$ is uniformly continuous; equivalently, $\{f(x_j)\}_{j=1}^\infty$ is a Cauchy sequence in $Y$ whenever $\{x_j\}_{j=1}^\infty$ is a Cauchy sequence in $X$.

Proof. Suppose $f$ is not uniformly continuous on $X$. Then there are an $\varepsilon_0>0$ and sequences $\{x_k\}_{k=1}^\infty$ and $\{y_k\}_{k=1}^\infty$ of points in $X$ such that
$$d_X(x_k,y_k)<k^{-1}\quad\text{and}\quad d_Y\big(f(x_k),f(y_k)\big)\ge\varepsilon_0.$$
Since $X$ is compact, $\{x_k\}_{k=1}^\infty$ has a subsequence $\{x_{k_j}\}_{j=1}^\infty$ which converges to a point $x\in X$. Then $y_{k_j}\to x$, since
$$0\le d_X(y_{k_j},x)\le d_X(y_{k_j},x_{k_j})+d_X(x_{k_j},x)<k_j^{-1}+d_X(x_{k_j},x).$$
Since $f$ is continuous, we have
$$\lim_{j\to\infty}d_Y\big(f(x_{k_j}),f(x)\big)=0\quad\text{and}\quad\lim_{j\to\infty}d_Y\big(f(y_{k_j}),f(x)\big)=0,$$
so the triangle inequality
$$d_Y\big(f(x_{k_j}),f(y_{k_j})\big)\le d_Y\big(f(x_{k_j}),f(x)\big)+d_Y\big(f(y_{k_j}),f(x)\big)$$
gives $\lim_{j\to\infty}d_Y\big(f(x_{k_j}),f(y_{k_j})\big)=0$, contradicting $d_Y\big(f(x_{k_j}),f(y_{k_j})\big)\ge\varepsilon_0$.

Next, we prove the equivalence. The above argument indicates that if $f$ is not uniformly continuous on $X$, then the sequence $\{x_{k_1},y_{k_1},x_{k_2},y_{k_2},\dots\}$ converges to $x$ and is therefore a Cauchy sequence in $X$, but $\{f(x_{k_1}),f(y_{k_1}),f(x_{k_2}),f(y_{k_2}),\dots\}$ is not a Cauchy sequence in $Y$, because the consecutive terms $f(x_{k_j})$ and $f(y_{k_j})$ stay at distance at least $\varepsilon_0$. On the other hand, if $f$ is uniformly continuous, then for any $\varepsilon>0$ there is a $\delta>0$ such that $d_X(x,y)<\delta$ implies $d_Y\big(f(x),f(y)\big)<\varepsilon$. Now if $\{x_j\}_{j=1}^\infty$ is a Cauchy sequence in $X$, then for this $\delta$ there exists $n_0>0$ such that $d_X(x_j,x_k)<\delta$ for $k,j>n_0$, and consequently $d_Y\big(f(x_j),f(x_k)\big)<\varepsilon$; i.e., $\{f(x_j)\}_{j=1}^\infty$ is a Cauchy sequence in $Y$.


Remark 7.2.6. The compactness of the domain is important. For example, the function $f$ defined by $f(x)=x^{-1}$ on $(0,1]$ is continuous but is not uniformly continuous on the noncompact set $(0,1]$. The problem, of course, lies at the left end-point of the interval: the points $x=k^{-1}$ and $y=(k+1)^{-1}$ are within distance $k^{-2}$ of each other, yet $|f(x)-f(y)|=1$ for every $k$.

Secondly, we discuss continuous mappings on a connected metric space.

Theorem 7.2.7. Let $(X,d_X)$ and $(Y,d_Y)$ be two metric spaces and let $f:X\to Y$ be a continuous mapping. If $X$ is connected, so is its image $f(X)$.

Proof. If $f(X)$ is not connected, then $f(X)=A\cup B$, where $A$ and $B$ are disjoint nonempty subsets of $f(X)$ that are open in $f(X)$; that is, $A=U\cap f(X)$ and $B=V\cap f(X)$ for some open subsets $U,V$ of $Y$. Then $f^{-1}(A)=f^{-1}(U)$ and $f^{-1}(B)=f^{-1}(V)$ are open in $X$ by Theorem 7.1.5, and they are disjoint and nonempty. We therefore have the expression $X=f^{-1}(A)\cup f^{-1}(B)$ of $X$ as the union of two disjoint nonempty open subsets. This contradicts the fact that $X$ is connected, proving the theorem.

Below is the intermediate value property as a consequence of the above theorem.

Example 7.2.8. Suppose $[a,b]$ is a finite interval of $\mathbb{R}$. Let $f:[a,b]\to\mathbb{R}$ be continuous with $f(a)<f(b)$. If $f(a)<y<f(b)$, then there is $\zeta\in[a,b]$ such that $f(\zeta)=y$. In fact, since $[a,b]$ is connected, so is $f([a,b])$. Note that a subset $S$ of $\mathbb{R}$ that contains two points $p<q$ but does not contain some point $c$ with $p<c<q$ is not connected: $S=\big(S\cap\{x\in\mathbb{R}:x<c\}\big)\cup\big(S\cap\{x\in\mathbb{R}:x>c\}\big)$ expresses $S$ as the union of two disjoint nonempty subsets that are open in $S$. In other words, any connected subset of $\mathbb{R}$ contains all points between any two of its points. Since $y$ lies between the points $f(a)$ and $f(b)$ of the connected set $f([a,b])$, we therefore have $y\in f([a,b])$.
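Example 7.2.8 proves that $\zeta$ exists via connectedness but does not say how to find it. A standard constructive complement, which is a different technique from the proof above, is bisection: repeatedly halve the interval while keeping $f(\mathrm{lo})<y\le f(\mathrm{hi})$. The sketch below assumes $f$ is continuous on $[a,b]$ with $f(a)<y<f(b)$; the function name `find_intermediate_point` and the test function are hypothetical choices for illustration.

```python
from typing import Callable


def find_intermediate_point(f: Callable[[float], float], a: float, b: float,
                            y: float, tol: float = 1e-12) -> float:
    """Approximate a point zeta in [a, b] with f(zeta) = y, assuming f is
    continuous on [a, b] and f(a) < y < f(b)."""
    if not f(a) < y < f(b):
        raise ValueError("expected f(a) < y < f(b)")
    lo, hi = a, b
    # Invariant: f(lo) < y <= f(hi), so by the intermediate value property
    # some zeta with f(zeta) = y lies in [lo, hi].
    for _ in range(200):            # cap the number of halvings
        if hi - lo <= tol:
            break
        mid = (lo + hi) / 2
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


zeta = find_intermediate_point(lambda x: x * x, 0.0, 2.0, 2.0)
print(zeta)   # close to sqrt(2) = 1.41421356...
```

Each halving keeps a point with $f(\zeta)=y$ inside the current interval, so the returned midpoint is within `tol` of such a point.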

7.3 Sequences of Mappings

Definition 7.3.1. Let $(X,d_X)$ and $(Y,d_Y)$ be two metric spaces, and for $k\in\mathbb{N}$ let $f_k:X\to Y$ be a mapping. We say that:

(i) $\{f_k\}_{k=1}^\infty$ converges at $x\in X$ provided that $\{f_k(x)\}_{k=1}^\infty$ converges in $Y$;

(ii) $\{f_k\}_{k=1}^\infty$ converges on $X$ provided that $\{f_k\}_{k=1}^\infty$ converges at every $x\in X$;

(iii) $\{f_k\}_{k=1}^\infty$ converges to a mapping $f:X\to Y$, denoted $f=\lim_{k\to\infty}f_k$, provided that $f(x)=\lim_{k\to\infty}f_k(x)$ for every $x\in X$.

Example 7.3.2. (i) Let $f_k:[0,1]\to\mathbb{R}$ be given by $f_k(x)=x-k^{-1}x$. Then $\lim_{k\to\infty}f_k(x)=x$.

(ii) Let $f_k:[0,1]\to\mathbb{R}$ be given by $f_k(x)=x^k$. Then
$$\lim_{k\to\infty}f_k(x)=f(x)=\begin{cases}0, & x\in[0,1),\\ 1, & x=1.\end{cases}$$
Note that each $f_k$ is continuous, but $f$ is not.

Definition 7.3.3. Let $(X,d_X)$ and $(Y,d_Y)$ be two metric spaces, for $k\in\mathbb{N}$ let $f_k:X\to Y$ be a mapping, and let $f:X\to Y$ be another mapping. Then we say that $\{f_k\}_{k=1}^\infty$ converges uniformly to $f$ on $X$ if, given any $\varepsilon>0$, there is an $n_0\in\mathbb{N}$ such that $d_Y\big(f_k(x),f(x)\big)<\varepsilon$ for all $k>n_0$ and all $x\in X$. If the restrictions of $\{f_k\}_{k=1}^\infty$ to a certain subset $S$ of $X$ converge uniformly to some mapping on $S$, we say that $\{f_k\}_{k=1}^\infty$ converges uniformly on $S$.

Uniform convergence clearly implies convergence. Looking back at Example 7.3.2, the convergence in (i) is uniform, since $|f_k(x)-x|=k^{-1}x\le k^{-1}$ for all $x\in[0,1]$, whereas the convergence in (ii) is not, since by Theorem 7.3.5 below a uniform limit of continuous mappings is continuous. Moreover, we have the following Cauchy criterion for uniform convergence.

Theorem 7.3.4. Let $(X,d_X)$ and $(Y,d_Y)$ be two metric spaces, let $Y$ be complete, and for $k\in\mathbb{N}$ let $f_k:X\to Y$ be a mapping. Then $\{f_k\}_{k=1}^\infty$ converges uniformly to some mapping $f$ on $X$ if and only if for any $\varepsilon>0$ there is an $n_0\in\mathbb{N}$ such that $d_Y\big(f_k(x),f_l(x)\big)<\varepsilon$ for all $k,l>n_0$ and all $x\in X$.

Proof. The necessity follows from the triangle inequality, so it is enough to verify the sufficiency. Suppose that the uniform Cauchy condition holds. Then for any $x\in X$, $\{f_k(x)\}_{k=1}^\infty$ is a Cauchy sequence in $Y$. Since $Y$ is complete, this sequence has a limit. Thus $\{f_k\}_{k=1}^\infty$ converges to some mapping $f$. Given $\varepsilon>0$, choose an integer $n_0$ so that $d_Y\big(f_k(x),f_l(x)\big)<2^{-1}\varepsilon$ whenever $k,l>n_0$, for all $x\in X$. Then for any fixed $k>n_0$ and fixed $x\in X$, all terms of the sequence $\{f_l(x)\}_{l=1}^\infty$ after the $n_0$-th are within distance $2^{-1}\varepsilon$ of $f_k(x)$, and are therefore in the closed ball in $Y$ of center $f_k(x)$ and radius $2^{-1}\varepsilon$. Since closed balls are closed sets, the limit $f(x)$ of the convergent sequence $\{f_l(x)\}_{l=1}^\infty$ is also in this closed ball, so that $d_Y\big(f_k(x),f(x)\big)\le2^{-1}\varepsilon$. Hence if $k>n_0$ we have $d_Y\big(f_k(x),f(x)\big)<\varepsilon$ for all $x\in X$, proving uniform convergence.

Interestingly, continuity is preserved under uniform convergence.

Theorem 7.3.5. Let $(X,d_X)$ and $(Y,d_Y)$ be two metric spaces and let $\{f_k\}_{k=1}^\infty$ be a uniformly convergent sequence of continuous mappings from $X$ into $Y$. Then $f=\lim_{k\to\infty}f_k$ is continuous.

Proof. Fix $x_0\in X$ and let $\varepsilon>0$. By the uniform convergence, there is an $n_0\in\mathbb{N}$ such that $d_Y\big(f_k(x),f(x)\big)<3^{-1}\varepsilon$ for all $k>n_0$ and all $x\in X$; fix one such $k$. Since $f_k$ is continuous at $x_0$, there is a $\delta>0$ such that if $x\in X$ and $d_X(x,x_0)<\delta$ then $d_Y\big(f_k(x),f_k(x_0)\big)<3^{-1}\varepsilon$. Hence if $x\in X$ and $d_X(x,x_0)<\delta$, we have
$$d_Y\big(f(x),f(x_0)\big)\le d_Y\big(f(x),f_k(x)\big)+d_Y\big(f_k(x),f_k(x_0)\big)+d_Y\big(f_k(x_0),f(x_0)\big)<\varepsilon.$$

Thus $f$ is continuous at $x_0$. We are done.
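The contrast in Example 7.3.2 can also be seen numerically. In (i) the sup-distance $\sup_{x\in[0,1]}|f_k(x)-x|$ equals $k^{-1}$; in (ii) the points $x_k=2^{-1/k}\in[0,1)$, where the limit $f$ vanishes, satisfy $f_k(x_k)=2^{-1}$, so $\sup_{x\in[0,1]}|f_k(x)-f(x)|\ge2^{-1}$ for every $k$. The Python sketch below approximates the first supremum on a finite grid (a simplifying assumption); the helper names are hypothetical.

```python
def f_i(k: int, x: float) -> float:
    """Example 7.3.2(i): f_k(x) = x - x/k."""
    return x - x / k


def f_ii(k: int, x: float) -> float:
    """Example 7.3.2(ii): f_k(x) = x^k."""
    return x ** k


def limit_ii(x: float) -> float:
    """Pointwise limit in (ii): 0 on [0, 1), 1 at x = 1."""
    return 1.0 if x == 1.0 else 0.0


grid = [i / 1000 for i in range(1001)]   # finite grid standing in for [0, 1]
results = []
for k in (1, 10, 100, 1000):
    # (i): |f_k(x) - x| = x/k, whose sup over [0, 1] is 1/k, attained at x = 1.
    sup_i = max(abs(f_i(k, x) - x) for x in grid)
    # (ii): at x_k = 2^(-1/k) < 1 the limit is 0 while f_k(x_k) = 1/2.
    x_k = 2 ** (-1 / k)
    gap_ii = abs(f_ii(k, x_k) - limit_ii(x_k))
    results.append((k, sup_i, gap_ii))
    print(k, sup_i, gap_ii)
```

The first column of gaps shrinks like $k^{-1}$, while the second stays at $2^{-1}$, so only (i) converges uniformly.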

If $f_1$ and $f_2$ are mappings from $X$ into $Y$, it is natural to try to find some measure of the extent to which $f_1$ and $f_2$ differ, that is, to find some sort of distance between $f_1$ and $f_2$. For any specific $x\in X$ we may say that $f_1$ and $f_2$ differ at $x$ by the distance between their values at $x$, that is, by $d_Y\big(f_1(x),f_2(x)\big)$, but we would really like to measure how much $f_1$ and $f_2$ differ over all points of $X$, not just at $x$. There are various ways to do this, depending on the circumstances and purposes in mind, but the most simple-minded method turns out to be one of the most useful. It is to take the distance between $f_1$ and $f_2$ to be
$$\max\big\{d_Y\big(f_1(x),f_2(x)\big):x\in X\big\}$$
if this maximum happens to exist. In order to develop this idea we turn aside for a simple lemma.

Lemma 7.3.6. Let $(X,d_X)$ and $(Y,d_Y)$ be two metric spaces and let $f_1$ and $f_2$ be continuous mappings from $X$ into $Y$. Then the function $x\mapsto d_Y\big(f_1(x),f_2(x)\big)$ is continuous on $X$.

Proof. This follows from the estimate below, based on the triangle inequality:
$$\begin{aligned}\big|d_Y\big(f_1(x),f_2(x)\big)-d_Y\big(f_1(x_0),f_2(x_0)\big)\big|&\le\big|d_Y\big(f_1(x),f_2(x)\big)-d_Y\big(f_1(x),f_2(x_0)\big)\big|\\&\quad+\big|d_Y\big(f_1(x),f_2(x_0)\big)-d_Y\big(f_1(x_0),f_2(x_0)\big)\big|\\&\le d_Y\big(f_2(x),f_2(x_0)\big)+d_Y\big(f_1(x),f_1(x_0)\big),\end{aligned}$$
whose right-hand side tends to $0$ as $x\to x_0$ by the continuity of $f_1$ and $f_2$.


Now we consider the set $C(X,Y)$ of all continuous mappings from $X$ into $Y$. If $X$ is compact, then for $f_1,f_2\in C(X,Y)$ the quantity
$$d_{C(X,Y)}(f_1,f_2)=\max\big\{d_Y\big(f_1(x),f_2(x)\big):x\in X\big\}$$
makes sense, since by Lemma 7.3.6 the function $x\mapsto d_Y\big(f_1(x),f_2(x)\big)$ is continuous, and any continuous real-valued function on a compact metric space attains a maximum (Corollary 7.2.3). Moreover, it is easy to prove that $C(X,Y)$ is a metric space under $d_{C(X,Y)}(\cdot,\cdot)$. This metric space is abstract in the sense that its points are mappings on another metric space. A sequence of points in $C(X,Y)$ is a sequence of continuous mappings $\{f_k\}_{k=1}^\infty$ from $X$ into $Y$. This sequence converges to a point $f\in C(X,Y)$ if and only if $\lim_{k\to\infty}d_{C(X,Y)}(f_k,f)=0$; in other words, if and only if for each $\varepsilon>0$ there is an $n_0\in\mathbb{N}$ such that $d_{C(X,Y)}(f_k,f)<\varepsilon$ for every integer $k>n_0$; that is, $d_Y\big(f_k(x),f(x)\big)<\varepsilon$ for all $x\in X$. Equivalently, $\{f_k\}_{k=1}^\infty$ converges uniformly to $f$.

An application of the last two theorems gives the following result, which covers the well-known Arzelà-Ascoli theorem. Its history goes back to Cesare Arzelà, who also proved that if $\{f_n\}$ is a uniformly bounded sequence of Riemann integrable functions converging pointwise to a Riemann integrable function $f$ on the finite interval $[a,b]$, then
$$\int_a^bf(x)\,dx=\lim_{n\to\infty}\int_a^bf_n(x)\,dx.$$

Theorem 7.3.7. Let $(X,d_X)$ and $(Y,d_Y)$ be a compact metric space and a complete metric space, respectively. Then the following two results are true.

(i) $C(X,Y)$ is a complete metric space with respect to the distance $d_{C(X,Y)}(\cdot,\cdot)$.

(ii) A subset $E$ of $C(X,Y)$ is compact if and only if $E$ is bounded, closed, and equi-continuous in the sense that for any $\varepsilon>0$ there is a $\delta>0$ such that
$$d_X(x_1,x_2)<\delta\ \Longrightarrow\ d_Y\big(f(x_1),f(x_2)\big)<\varepsilon\quad\text{for all } f\in E.$$

Proof. (i) Suppose $\{f_k\}_{k=1}^\infty\subseteq C(X,Y)$ is a Cauchy sequence. Then for any $\varepsilon>0$ there is an $n_0\in\mathbb{N}$ such that if $k,l>n_0$ then $d_{C(X,Y)}(f_k,f_l)<\varepsilon$; that is, $d_Y\big(f_k(x),f_l(x)\big)<\varepsilon$ for all $x\in X$. Since $Y$ is complete, Theorem 7.3.4 shows that $\{f_k\}_{k=1}^\infty$ converges uniformly on $X$ to some mapping $f:X\to Y$. Moreover, Theorem 7.3.5 indicates that $f$ is continuous. Thus $f\in C(X,Y)$ and $\lim_{k\to\infty}f_k=f$ in the sense of points of the metric space $\big(C(X,Y),d_{C(X,Y)}\big)$. Hence $C(X,Y)$ is complete under $d_{C(X,Y)}(\cdot,\cdot)$.


(ii) Let $E$ be a subset of $C(X,Y)$. If $E$ is compact, then $E$ is automatically bounded and closed. Moreover, $E$ is totally bounded by part (i) and Corollary 6.3.6, and hence for $\varepsilon>0$ there are finitely many open balls $\{B^{C(X,Y)}_{3^{-1}\varepsilon}(f_k)\}_{k=1}^{n}$ of $C(X,Y)$ such that $E\subseteq\bigcup_{k=1}^{n}B^{C(X,Y)}_{3^{-1}\varepsilon}(f_k)$. Since each $f_k:X\to Y$ is uniformly continuous by Theorem 7.2.5, there exists a $\delta_k>0$ such that
$$d_X(x_1,x_2)<\delta_k\ \Longrightarrow\ d_Y\big(f_k(x_1),f_k(x_2)\big)<3^{-1}\varepsilon.$$
Let $\delta=\min\{\delta_k:k=1,2,\dots,n\}$. Then for a given $f\in E$, choose an $f_k$ such that $d_{C(X,Y)}(f,f_k)<3^{-1}\varepsilon$. Hence $d_X(x_1,x_2)<\delta$ yields
$$d_Y\big(f(x_1),f(x_2)\big)\le d_Y\big(f(x_1),f_k(x_1)\big)+d_Y\big(f_k(x_1),f_k(x_2)\big)+d_Y\big(f_k(x_2),f(x_2)\big)<\varepsilon,$$
so $E$ is equi-continuous.

Conversely, assume that $E$ is bounded, closed and equi-continuous. Since $E$ is closed and $C(X,Y)$ is complete, according to Corollary 6.3.6 and Theorem 6.3.5 it remains to check that each sequence $\{f_k\}_{k=1}^\infty$ in $E$ has a Cauchy subsequence. Since $X$ is a compact metric space, it is separable, so there is a sequence $\{x_l\}_{l=1}^\infty$ which is dense in $X$. Note that $E$ is bounded, and then $\{f_k(x_1)\}_{k=1}^\infty$ is bounded in the complete space $Y$. So this sequence has a convergent subsequence, denoted $\{f^{(1)}_k(x_1)\}_{k=1}^\infty$. Similarly, let $\{f^{(2)}_k\}_{k=1}^\infty$ be a subsequence of $\{f^{(1)}_k\}_{k=1}^\infty$ such that $\{f^{(2)}_k(x_2)\}_{k=1}^\infty$ converges. In this way, we can inductively obtain subsequences $\{f^{(j)}_k\}_{k=1}^\infty$ of $\{f_k\}_{k=1}^\infty$, each a subsequence of the previous one, such that $\lim_{k\to\infty}f^{(j)}_k(x_l)$ exists for $l=1,2,\dots,j$. Now the diagonal sequence $\{f^{(k)}_k\}_{k=1}^\infty$ ensures that $\{f^{(k)}_k(x_j)\}_{k=1}^\infty$ is convergent for each $j$. Since $E$ is equi-continuous, for $\varepsilon>0$ there is a $\delta>0$ such that
$$d_X(x,y)<\delta\ \Longrightarrow\ d_Y\big(f^{(k)}_k(x),f^{(k)}_k(y)\big)<3^{-1}\varepsilon\quad\text{for all }k.$$
Also, note that $\{x_l\}_{l=1}^\infty$ is dense in the compact space $X$. So it follows that $X=\bigcup_{l=1}^{n_0}B^X_\delta(x_l)$ for some natural number $n_0$, where $B^X_\delta(x_l)$ is the open ball in $X$ centered at $x_l$ with radius $\delta$. Furthermore, the convergence of $\{f^{(k)}_k(x_j)\}_{k=1}^\infty$ for $j=1,\dots,n_0$ yields a natural number $m_0$ such that
$$m,n>m_0\ \Longrightarrow\ d_Y\big(f^{(m)}_m(x_j),f^{(n)}_n(x_j)\big)<3^{-1}\varepsilon\quad\text{for } j=1,2,\dots,n_0.$$
Finally, for any $x\in X$ choose $x_j$ ($1\le j\le n_0$) such that $x\in B^X_\delta(x_j)$; then for $m,n>m_0$ we get
$$d_Y\big(f^{(m)}_m(x),f^{(n)}_n(x)\big)\le d_Y\big(f^{(m)}_m(x),f^{(m)}_m(x_j)\big)+d_Y\big(f^{(m)}_m(x_j),f^{(n)}_n(x_j)\big)+d_Y\big(f^{(n)}_n(x_j),f^{(n)}_n(x)\big)<\varepsilon.$$
Consequently, $\{f^{(k)}_k\}_{k=1}^\infty$ is a Cauchy sequence in $C(X,Y)$; since $C(X,Y)$ is complete and $E$ is closed, it converges to an element of $E$, and hence $E$ is compact.

Remark 7.3.8. Theorem 7.3.7 extends (from a closed interval to a compact metric space) Guido Ascoli's theorem: every bounded equi-continuous sequence of real-valued functions on the unit interval $[0,1]$ has a uniformly convergent subsequence.

7.4 Contractions

In general, a fixed point of a mapping $f$ from a set $X$ to itself is a point $x_0\in X$ such that $f(x_0)=x_0$. For example, $0$ is a fixed point of the mapping $f(x)=x^2+x$ from $\mathbb{R}$ to $\mathbb{R}$. But not all mappings have fixed points; for instance, the mapping $f(x)=x+1$ has no fixed point on $\mathbb{R}$. So which mappings have fixed points is an interesting question. In what follows, we introduce an important tool, the so-called Banach fixed point theorem (named after Stefan Banach), which not only ensures the existence and uniqueness of fixed points of certain self-mappings of metric spaces, but also provides a constructive method to find those fixed points.

Definition 7.4.1. Let $(X,d_X)$ be a metric space. A mapping $f$ from $X$ to itself is a contraction if there is an $\alpha\in(0,1)$ such that
$$d_X\big(f(x_1),f(x_2)\big)\le\alpha\,d_X(x_1,x_2)\quad\text{for all } x_1,x_2\in X.$$

A contraction mapping shrinks the distance between any two points by at least the factor $\alpha$. Clearly, any contraction is uniformly continuous on $X$ (given $\varepsilon>0$, take $\delta=\varepsilon$). This property leads to the following theorem of Banach on contraction mappings and fixed points.

Theorem 7.4.2. Let $(X,d_X)$ be a complete metric space. If $f:X\to X$ is a contraction mapping, then $f$ has a unique fixed point.

Proof. Let $x_0\in X$. Define $x_{k+1}=f(x_k)$ for $k\in\mathbb{N}\cup\{0\}$. We claim that $\{x_k\}_{k=1}^\infty$ is a Cauchy sequence in $X$. In fact,
$$d_X(x_2,x_1)=d_X\big(f(x_1),f(x_0)\big)\le\alpha\,d_X(x_1,x_0),$$
and so
$$d_X(x_3,x_2)=d_X\big(f(x_2),f(x_1)\big)\le\alpha\,d_X(x_2,x_1)\le\alpha^2d_X(x_1,x_0).$$
Generally, $d_X(x_{i+1},x_i)\le\alpha^id_X(x_1,x_0)$, so the triangle inequality gives, for $k>j$,
$$d_X(x_k,x_j)\le\sum_{i=j}^{k-1}d_X(x_{i+1},x_i)\le\sum_{i=j}^{k-1}\alpha^id_X(x_1,x_0)\le\alpha^j(1-\alpha)^{-1}d_X(x_1,x_0).$$
This, together with $\alpha\in(0,1)$, implies that $\{x_k\}_{k=1}^\infty$ is a Cauchy sequence; since $X$ is complete, it converges to a point $x\in X$: $\lim_{k\to\infty}x_k=x$ in $X$. Since $f$ is uniformly continuous, one has
$$f(x)=\lim_{k\to\infty}f(x_k)=\lim_{k\to\infty}x_{k+1}=x;$$
that is to say, $x$ is a fixed point. Regarding the uniqueness, suppose that $y$ is also a fixed point of $f$. Then
$$d_X(x,y)=d_X\big(f(x),f(y)\big)\le\alpha\,d_X(x,y),$$
and hence $d_X(x,y)=0$, thanks to $\alpha\in(0,1)$; that is, $x=y$.
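The iteration in this proof, together with the a priori bound $d_X(x_k,x_j)\le\alpha^j(1-\alpha)^{-1}d_X(x_1,x_0)$, can be tried on a concrete contraction. As an assumed example (not from the text), take $X=[0,1]$ with the usual distance and $f=\cos$: it maps $[0,1]$ into $[\cos1,1]\subseteq[0,1]$, and by the mean value theorem it is a contraction with $\alpha=\sin1\approx0.84$. The helper name `banach_iterates` is hypothetical.

```python
import math


def banach_iterates(f, x0: float, steps: int) -> list:
    """Return x_0, x_1, ..., x_steps with x_{k+1} = f(x_k)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(f(xs[-1]))
    return xs


alpha = math.sin(1.0)                 # contraction constant of cos on [0, 1]
xs = banach_iterates(math.cos, 0.0, 200)
x_star = xs[-1]                       # numerically indistinguishable from the fixed point
d10 = abs(xs[1] - xs[0])              # d_X(x_1, x_0)

for j in range(50):
    # a priori estimate d_X(x, x_j) <= alpha^j (1 - alpha)^{-1} d_X(x_1, x_0)
    assert abs(x_star - xs[j]) <= alpha ** j / (1 - alpha) * d10 + 1e-12

print(x_star)   # approximately 0.7390851332
```

The starting point $x_0=0$ is arbitrary; any other point of $[0,1]$ leads to the same limit, as uniqueness requires.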

Remark 7.4.3. The above proof is constructive in the sense that the fixed point is the limit of the iterates given by $x_{k+1}=f(x_k)$, where the initial point $x_0$ is an arbitrary point in $X$. Letting $k\to\infty$ in the previous estimate gives the rapidity of the convergence $x_k\to x$:
$$d_X(x,x_j)\le\alpha^j(1-\alpha)^{-1}d_X\big(f(x_0),x_0\big).$$
Below is an interesting example which shows an application of Theorem 7.4.2 to integral equations.

Example 7.4.4. For a finite interval $[a,b]$ of $\mathbb{R}$, let $k:[a,b]\times[a,b]\to\mathbb{R}$ be continuous and $\phi\in C[a,b]$. An equation of the type
$$f(x)=\lambda\int_a^bk(x,y)f(y)\,dy+\phi(x),\qquad\lambda\in\mathbb{R},$$
where $f$ is an unknown function, is called a Fredholm integral equation of the second kind. This equation, introduced by Ivar Fredholm, has a unique solution $f\in C[a,b]$ whenever $|\lambda|$ is sufficiently small, as we now check. Define a mapping $T:C[a,b]\to C[a,b]$ by
$$T(f)(x)=\lambda\int_a^bk(x,y)f(y)\,dy+\phi(x)\quad\text{for }\lambda\in\mathbb{R}.$$


We write the distance function on $C[a,b]$ as
$$d_{C[a,b]}(f_1,f_2)=\max_{x\in[a,b]}|f_1(x)-f_2(x)|\quad\text{for all } f_1,f_2\in C[a,b].$$
Solving the desired equation is equivalent to showing that $T$ has a fixed point. Now, for each $x\in[a,b]$,
$$|T(f_1)(x)-T(f_2)(x)|\le|\lambda|\int_a^b|k(x,y)|\,|f_1(y)-f_2(y)|\,dy,$$
and hence
$$d_{C[a,b]}\big(T(f_1),T(f_2)\big)\le|\lambda|\max_{x,y\in[a,b]}|k(x,y)|\,(b-a)\,d_{C[a,b]}(f_1,f_2).$$
Thus, if
$$|\lambda|\max_{x,y\in[a,b]}|k(x,y)|\,(b-a)<1,$$
then $T$ is a contraction on the complete space $C[a,b]$ (Theorem 7.3.7(i)) and, therefore, has a unique fixed point by Theorem 7.4.2.

As a consequence of Theorem 7.4.2, we can get a more general fixed point theorem.

Corollary 7.4.5. Let $(X,d_X)$ be a complete metric space, let $f:X\to X$, and let $n-1\in\mathbb{N}$. If the $n$-th iterate $f^{(n)}=\underbrace{f\circ f\circ\cdots\circ f}_{n}$ is a contraction, then $f$ has a unique fixed point.

Proof. Since $f^{(n)}$ is a contraction mapping, it has a unique fixed point $x$ by Theorem 7.4.2. If $\alpha$ is the contraction constant for $f^{(n)}$, then
$$d_X\big(f(x),x\big)=d_X\big(f(f^{(n)}(x)),f^{(n)}(x)\big)=d_X\big(f^{(n)}(f(x)),f^{(n)}(x)\big)\le\alpha\,d_X\big(f(x),x\big),$$
and hence $f(x)=x$. If $y$ is another fixed point of $f$, then it is also a fixed point of $f^{(n)}$, since
$$f^{(n)}(y)=f^{(n-1)}\big(f(y)\big)=f^{(n-1)}(y)=\cdots=y.$$
The uniqueness of the fixed point of $f^{(n)}$ implies $y=x$.

Remark 7.4.6. In Corollary 7.4.5, $f$ need not be a contraction. In fact, there is a discontinuous function $f$ from $[0,1]$ to itself whose second iterate is a contraction. For instance,
$$f(x)=\begin{cases}2^{-2}, & x\in[0,2^{-1}],\\ 2^{-1}, & x\in(2^{-1},1],\end{cases}\qquad\Longrightarrow\qquad f^{(2)}(x)=2^{-2}\quad\text{for all } x\in[0,1],$$
since $f([0,1])=\{2^{-2},2^{-1}\}\subseteq[0,2^{-1}]$.
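The function of Remark 7.4.6 is easy to test directly. The sketch below checks, on a finite grid of $[0,1]$ (a sampling assumption), that $f^{(2)}\equiv2^{-2}$, that $2^{-2}$ is a fixed point of $f$, and that $f$ itself violates every contraction inequality near the jump at $x=2^{-1}$.

```python
def f(x: float) -> float:
    """The discontinuous map of Remark 7.4.6 on [0, 1]."""
    return 0.25 if x <= 0.5 else 0.5


grid = [i / 1000 for i in range(1001)]

# f o f is the constant 1/4, hence a contraction (for any alpha in (0, 1)).
assert all(f(f(x)) == 0.25 for x in grid)

# The unique fixed point of f o f, which by Corollary 7.4.5 is also fixed by f.
assert f(0.25) == 0.25

# f itself is not a contraction: across the jump, distances are magnified enormously.
h = 1e-6
ratio = abs(f(0.5 + h) - f(0.5)) / h
print(ratio)   # about 2.5e5, far larger than any alpha < 1
```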


Example 7.4.7. Consider the following Volterra integral equation (VIE), named after Vito Volterra:
$$f(x)=\lambda\int_a^xk(x,y)f(y)\,dy+\phi(x),$$
for which the notations are the same as in Example 7.4.4. If $T:C[a,b]\to C[a,b]$ is defined by
$$T(f)(x)=\lambda\int_a^xk(x,y)f(y)\,dy+\phi(x),$$
then the VIE has a unique solution in $C[a,b]$ for any $\lambda\in\mathbb{R}$. To see this assertion, it is enough to show that $T$ has a unique fixed point in $C[a,b]$. Let $f_1,f_2\in C[a,b]$ and
$$M=\max_{x,y\in[a,b]}|k(x,y)|.$$
A simple computation (integrating each estimate over $[a,x]$ to obtain the next one) implies that, for $x\in[a,b]$,
$$\begin{aligned}|T(f_1)(x)-T(f_2)(x)|&\le|\lambda|M\,d_{C[a,b]}(f_1,f_2)\,(x-a);\\ |T^{(2)}(f_1)(x)-T^{(2)}(f_2)(x)|&\le2^{-1}(|\lambda|M)^2\,d_{C[a,b]}(f_1,f_2)\,(x-a)^2;\\ &\ \,\vdots\\ |T^{(n)}(f_1)(x)-T^{(n)}(f_2)(x)|&\le(n!)^{-1}(|\lambda|M)^n\,d_{C[a,b]}(f_1,f_2)\,(x-a)^n,\end{aligned}$$
and hence
$$d_{C[a,b]}\big(T^{(n)}(f_1),T^{(n)}(f_2)\big)\le(n!)^{-1}\big(|\lambda|(b-a)M\big)^n\,d_{C[a,b]}(f_1,f_2).$$
Since
$$\lim_{n\to\infty}(n!)^{-1}\big(|\lambda|(b-a)M\big)^n=0,$$
$T^{(n)}$ is a contraction mapping for large $n$, and therefore, by Corollary 7.4.5, $T$ has a unique fixed point for any value of the parameter $\lambda$.

As a consequence of Remark 7.4.3, we consider a family of contractions that vary continuously with respect to a parameter and prove that the corresponding fixed points also vary continuously.

Corollary 7.4.8. Let $(X,d_X)$ be a complete metric space and let $(Y,d_Y)$ be a metric space. Suppose that $f:Y\times X\to X$ is such that $f(\cdot,x)$ is continuous on $Y$ for any point $x\in X$, and there exists $\alpha\in(0,1)$ such that
$$d_X\big(f(y,x_1),f(y,x_2)\big)\le\alpha\,d_X(x_1,x_2)\quad\text{for all } y\in Y\text{ and } x_1,x_2\in X.$$
For $y\in Y$ let $x_y\in X$ be the unique fixed point of the contraction $f(y,\cdot)$. Then the mapping $y\mapsto x_y$ is continuous from $Y$ to $X$.


Proof. For $y\in Y$, consider the equation $x=f(y,x)$. Let $y_0\in Y$. Construct a solution of the equation starting with the initial point $x_0=x_{y_0}$, so that $x_{k+1}=f(y,x_k)\to x_y$. From Remark 7.4.3 (with $j=0$) it follows that
$$d_X(x_{y_0},x_y)\le(1-\alpha)^{-1}d_X\big(x_{y_0},f(y,x_{y_0})\big)=(1-\alpha)^{-1}d_X\big(f(y_0,x_{y_0}),f(y,x_{y_0})\big),$$
so that the mapping $y\mapsto x_y$ is continuous at $y_0$, since the mapping $y\mapsto f(y,x_{y_0})$ is continuous at $y_0$.
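Example 7.4.4 and Corollary 7.4.8 can be illustrated together. The following sketch makes several assumptions of its own: the interval $[0,1]$, the kernel $k(x,y)=xy$, $\phi(x)=x$, and the replacement of the integral by the trapezoidal rule on a uniform grid. With $\max|k|=1$ and $b-a=1$, the map $T$ is a contraction whenever $|\lambda|<1$, and a short calculation (set $c=\int_0^1yf(y)\,dy$) gives the exact solution $f_\lambda(x)=3x/(3-\lambda)$. For $|\lambda|\le2^{-1}$ the contraction constant $2^{-1}$ does not depend on $\lambda$, so Corollary 7.4.8 predicts that $f_\lambda$ depends continuously on $\lambda$, which the last lines check. The function name `solve_fredholm` is hypothetical.

```python
from typing import Callable, List, Tuple


def solve_fredholm(lam: float, kernel: Callable[[float, float], float],
                   phi: Callable[[float], float], a: float, b: float,
                   n: int = 100, iterations: int = 50) -> Tuple[List[float], List[float]]:
    """Approximate the fixed point of T(f)(x) = lam * int_a^b k(x,y) f(y) dy + phi(x)
    by iterating f <- T(f) on n + 1 grid points (trapezoidal rule for the integral)."""
    h = (b - a) / n
    xs = [a + i * h for i in range(n + 1)]
    w = [h] * (n + 1)
    w[0] = w[-1] = h / 2                      # trapezoidal weights
    f = [0.0] * (n + 1)                       # any starting function works
    for _ in range(iterations):
        f = [lam * sum(wj * kernel(x, yj) * fj for wj, yj, fj in zip(w, xs, f)) + phi(x)
             for x in xs]
    return xs, f


def kernel(x: float, y: float) -> float:
    return x * y


def phi(x: float) -> float:
    return x


xs, f_half = solve_fredholm(0.5, kernel, phi, 0.0, 1.0)
err = max(abs(fv - 3 * x / (3 - 0.5)) for x, fv in zip(xs, f_half))
print(err)      # small: essentially only the quadrature error remains

# Continuous dependence on the parameter: nearby lambdas give nearby solutions.
_, f_a = solve_fredholm(0.3, kernel, phi, 0.0, 1.0)
_, f_b = solve_fredholm(0.3001, kernel, phi, 0.0, 1.0)
gap = max(abs(p - q) for p, q in zip(f_a, f_b))
print(gap)
```

The discrete operator is itself a contraction with constant at most $|\lambda|$, so 50 iterations are far more than enough for $|\lambda|\le2^{-1}$.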

As a further application of the contraction mapping theorem, we give the following existence and uniqueness theorem of Charles Émile Picard for a first-order system of nonlinear ordinary differential equations under a Lipschitz condition (named after Rudolf Lipschitz).

Theorem 7.4.9. For $n\in\mathbb{N}$ equip $\mathbb{R}^n$ with the distance
$$d_n(x,y)=\|x-y\|_n=\sqrt{\sum_{k=1}^{n}(x_k-y_k)^2}$$
between $x=(x_1,\dots,x_n)$ and $y=(y_1,\dots,y_n)$ in $\mathbb{R}^n$. Let $O\subseteq\mathbb{R}^n$ be open and $[a,b]\subseteq\mathbb{R}$ be a finite interval. Suppose $f:[a,b]\times O\to\mathbb{R}^n$ is continuous and satisfies the following Lipschitz condition with constant $\kappa>0$:
$$d_n\big(f(t,y_1),f(t,y_2)\big)\le\kappa\,d_n(y_1,y_2)\quad\text{for all }(t,y_1),(t,y_2)\in[a,b]\times O.$$
Then there exists a $\delta\in(a,b]$ such that the initial value problem (IVP)
$$\frac{d}{dt}u(t)=u'(t)=f\big(t,u(t)\big)\quad\text{subject to}\quad u(a)=y_0\in O$$
has a unique solution on the interval $[a,\delta]$.

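Proof. By the fundamental theorem of calculus, solving this IVP is equivalent to solving the integral equation
$$u(t)=y_0+\int_a^tf\big(s,u(s)\big)\,ds.$$
So our aim is to show the existence and uniqueness of a solution to this integral equation. Since $O$ is open, we may choose $r>0$ such that the closed ball $B_r(y_0)$ centered at $y_0$ with radius $r$ lies in $O$. Since $[a,b]\times B_r(y_0)$ is compact and $f$ is continuous, Corollary 7.2.3 gives
$$M=\max_{(t,y)\in[a,b]\times B_r(y_0)}\|f(t,y)\|_n<\infty.$$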


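Now choose $\delta\in(a,b]$ such that $(\delta-a)\kappa<1$ and $(\delta-a)M\le r$. Let $S$ be the class of continuous mappings from $[a,\delta]$ to the closed ball $B_r(y_0)$. Since $B_r(y_0)$, being closed in $\mathbb{R}^n$, is complete, Theorem 7.3.7(i) shows that $S$ is complete under the distance defined by
$$d_S(\varphi_1,\varphi_2)=\max_{t\in[a,\delta]}d_n\big(\varphi_1(t),\varphi_2(t)\big)\quad\text{for all }\varphi_1,\varphi_2\in S.$$
Define a mapping $F:S\to S$ by
$$F(\varphi)(t)=y_0+\int_a^tf\big(s,\varphi(s)\big)\,ds.$$
Obviously, $F(\varphi)$ is continuously differentiable. If $\varphi\in S$ and $t\in[a,\delta]$, then
$$d_n\big(F(\varphi)(t),y_0\big)=\Big\|\int_a^tf\big(s,\varphi(s)\big)\,ds\Big\|_n\le M|t-a|\le(\delta-a)M\le r.$$
This implies $F(\varphi)\in S$. Now our problem amounts to showing that $F$ has a unique fixed point in $S$. Thus, by Theorem 7.4.2, it suffices to verify that $F$ is a contraction. To do so, we note that $t\in[a,\delta]$ and $\varphi_1,\varphi_2\in S$ imply
$$d_n\big(F(\varphi_1)(t),F(\varphi_2)(t)\big)=\Big\|\int_a^t\Big(f\big(s,\varphi_1(s)\big)-f\big(s,\varphi_2(s)\big)\Big)\,ds\Big\|_n\le(\delta-a)\kappa\,d_S(\varphi_1,\varphi_2),$$
which yields
$$d_S\big(F(\varphi_1),F(\varphi_2)\big)\le(\delta-a)\kappa\,d_S(\varphi_1,\varphi_2),\qquad(\delta-a)\kappa<1.$$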

Accordingly, the assertion follows.

Remark 7.4.10.

(i) Note that any existence theorem for the nonlinear IVP above must be local in nature. For example, $y'(t)=y^2(t)$ subject to $y(1)=-1$ has the solution $y(t)=-1/t$, which is not defined at $t=0$, even though $f(t,y)=y^2$ satisfies the required Lipschitz condition on any open and bounded set $O\subseteq\mathbb{R}$.

(ii) If the Lipschitz condition is dropped, then the IVP may still have a solution, or even infinitely many solutions. For instance, the IVP $y'(t)=y^{1/3}(t)$ subject to $y(0)=0$ has infinitely many solutions
$$y_c(t)=\begin{cases}0, & 0\le t\le c,\\ \big(\tfrac{2}{3}(t-c)\big)^{3/2}, & c<t\le1,\end{cases}$$
where $c\in[0,1]$.
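The proof of Theorem 7.4.9 is constructive as well: iterating $F$ from the constant map $\varphi_0(t)=y_0$ produces the classical Picard iterates. The sketch below is a minimal scalar ($n=1$) illustration under our own assumptions: the IVP $u'=u$, $u(0)=1$, whose solution is $e^t$, with $\kappa=1$, $r=1$, $M=2$ and $\delta=2^{-1}$, so that $(\delta-a)\kappa<1$ and $(\delta-a)M\le r$ hold; the integral is approximated by a cumulative trapezoidal rule. The name `picard_iterates` is hypothetical.

```python
import math
from typing import Callable, List, Tuple


def picard_iterates(f: Callable[[float, float], float], a: float, y0: float,
                    delta: float, n: int = 500,
                    iterations: int = 40) -> Tuple[List[float], List[float]]:
    """Iterate F(phi)(t) = y0 + int_a^t f(s, phi(s)) ds on a grid of [a, delta]."""
    h = (delta - a) / n
    ts = [a + i * h for i in range(n + 1)]
    phi = [y0] * (n + 1)                      # phi_0(t) = y0
    for _ in range(iterations):
        g = [f(t, p) for t, p in zip(ts, phi)]
        new = [y0]
        for i in range(n):                    # cumulative trapezoidal rule
            new.append(new[-1] + h * (g[i] + g[i + 1]) / 2)
        phi = new
    return ts, phi


ts, u = picard_iterates(lambda t, y: y, 0.0, 1.0, 0.5)
err = max(abs(ui - math.exp(t)) for t, ui in zip(ts, u))
print(err)   # small: the remaining error comes from the quadrature
```

The iteration error decays at least like $\big((\delta-a)\kappa\big)^k=2^{-k}$, so after 40 steps only the discretization error is visible.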

7.5 Structures of Metric Spaces

It is often necessary to compare two metric spaces and decide in what sense they are equivalent, or to analyze how the structure of a metric space changes when the metric is changed. Note that every metric space is a set with additional topological structure induced by the metric. So, to decide in what sense two metric spaces are equivalent, we have to use continuous mappings between them.

Definition 7.5.1. Let $(X,d_X)$ and $(Y,d_Y)$ be two metric spaces. Then they are called:

(i) topologically isomorphic or homeomorphic provided that there exists a homeomorphism between them, i.e., a mapping $f:X\to Y$ which is bijective and continuous, and has continuous inverse;

(ii) uniformly isomorphic provided that there exists a uniform isomorphism between them, i.e., a mapping $f:X\to Y$ which is bijective and uniformly continuous, and has uniformly continuous inverse;

(iii) isometrically isomorphic provided that there exists a bijective mapping $f:X\to Y$ such that $d_Y\big(f(x_1),f(x_2)\big)=d_X(x_1,x_2)$ for all $x_1,x_2\in X$;

(iv) similar provided that there exist a positive constant $\kappa>0$ and a bijective mapping $f:X\to Y$ such that $d_Y\big(f(x_1),f(x_2)\big)=\kappa\,d_X(x_1,x_2)$ for all $x_1,x_2\in X$;

(v) equivalent provided that $X=Y$ as sets and there exist two positive constants $\kappa_1$ and $\kappa_2$, independent of $x_1,x_2\in X$, such that $\kappa_1d_X(x_1,x_2)\le d_Y(x_1,x_2)\le\kappa_2d_X(x_1,x_2)$ for all $x_1,x_2\in X$.

The following example helps us grasp Definition 7.5.1.


Example 7.5.2. (i) Let $f:[-2^{-1}\pi,2^{-1}\pi]\to[-1,1]$ be defined by $f(x)=\sin x$. Then $f$ is a uniform isomorphism: it is bijective, it satisfies $|\sin x-\sin y|\le|x-y|$, and its inverse $\arcsin$ is continuous on the compact interval $[-1,1]$, hence uniformly continuous by Theorem 7.2.5. More generally, any two nondegenerate finite closed intervals of $\mathbb{R}$ are uniformly isomorphic, for instance via an increasing affine bijection.

(ii) For $n\in\mathbb{N}$ and $x=(x_1,\dots,x_n)$, $y=(y_1,\dots,y_n)\in\mathbb{R}^n$, recall
$$d_n(x,y)=\|x-y\|_n=\sqrt{\sum_{k=1}^{n}|x_k-y_k|^2},$$
and let $d_\infty(x,y)=\|x-y\|_\infty=\max\{|x_k-y_k|:k=1,\dots,n\}$. Then
$$n^{-1/2}\,d_n(x,y)\le d_\infty(x,y)\le d_n(x,y).$$
This means that $(\mathbb{R}^n,d_n)$ and $(\mathbb{R}^n,d_\infty)$ are equivalent. Of course, $d_n$ and $d_\infty$ define the same notions of continuity and convergence and do not need to be distinguished for most purposes. In other words, the identity mapping is a uniform isomorphism from $(\mathbb{R}^n,d_n)$ onto $(\mathbb{R}^n,d_\infty)$.

(iii) In $(\mathbb{R}^n,d_n)$, examples of isometric isomorphisms are a rotation (the movement of a body such that the distance between a certain fixed point and any given point of the body remains constant), a reflection (inverting a figure with respect to a line or a plane), and a translation (moving every point by a fixed distance in the same direction).

(iv) $(\mathbb{R}^n,d_n)$ is similar to itself with any ratio $\kappa>0$. Indeed, the mapping $f(x)=\kappa x+x_0$, where $\kappa>0$ is constant and $x_0\in\mathbb{R}^n$, is bijective and satisfies $d_n\big(f(x),f(y)\big)=\kappa\,d_n(x,y)$ for any $x,y\in\mathbb{R}^n$.

Both a uniform isomorphism and an isometric isomorphism are certainly homeomorphisms, and every homeomorphism is not only an open mapping (sending open sets to open sets) but also a closed mapping (sending closed sets to closed sets). Intuitively, a homeomorphism not only maps points in the first object that are close together to points in the second object that are close together, but also sends points in the first object that are not close together to points in the second object that are not close together. Topology is the study of those properties of objects that do not change when homeomorphisms are applied. Below is a classical and important result.
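Before turning to it, the two-sided comparison in Example 7.5.2(ii) is easy to spot-check. The Python sketch below uses random vectors in $\mathbb{R}^5$ (an arbitrary choice) and also exhibits vectors for which each inequality becomes an equality, so neither constant can be improved.

```python
import math
import random


def d2(x, y):
    """Euclidean distance d_n."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def dinf(x, y):
    """Maximum distance d_infinity."""
    return max(abs(a - b) for a, b in zip(x, y))


n = 5
random.seed(1)
for _ in range(10_000):
    x = [random.uniform(-1.0, 1.0) for _ in range(n)]
    y = [random.uniform(-1.0, 1.0) for _ in range(n)]
    # n^(-1/2) d_n <= d_inf <= d_n, written without division
    assert dinf(x, y) <= d2(x, y) + 1e-12
    assert d2(x, y) <= math.sqrt(n) * dinf(x, y) + 1e-12

zero = [0.0] * n
e1 = [1.0] + [0.0] * (n - 1)     # equality on the right: d_inf = d_n = 1
ones = [1.0] * n                 # equality on the left: d_n = sqrt(n) * d_inf
print(dinf(e1, zero), d2(e1, zero), d2(ones, zero), math.sqrt(n) * dinf(ones, zero))
```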


Jie Xiao

Theorem 7.5.3. For $n + 1 \in \mathbb{N}$ let $\mathbb{R}^{n+1}$ be equipped with the Euclidean distance

$$d_{n+1}(x, y) = \|x - y\|_{n+1} = \sqrt{\sum_{k=1}^{n+1} (x_k - y_k)^2}$$

for $x = (x_1, ..., x_{n+1})$ and $y = (y_1, ..., y_{n+1})$ in $\mathbb{R}^{n+1}$. (i) If $n = 0$, then $\mathbb{R}^{n+1}$ is homeomorphic to the open interval $(-1, 1) \subseteq \mathbb{R}$.

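(ii) If $n \ge 1$, then $\mathbb{R}^n$ is homeomorphic to the punctured sphere $S^n \setminus \{(0, ..., 0, 1)\}$, where $S^n = \{x \in \mathbb{R}^{n+1} : \|x\|_{n+1} = d_{n+1}(x, 0) = 1\}$ is the compact unit sphere in $\mathbb{R}^{n+1}$.

Proof. To check (i), we just take $f(x) = 2\pi^{-1}\arctan x$. This is a continuous bijection of $\mathbb{R}$ onto $(-1, 1)$ with continuous inverse $y \mapsto \tan(2^{-1}\pi y)$. As for (ii), we define the mapping $f : S^n \setminus \{(0, ..., 0, 1)\} \to \mathbb{R}^n$ by letting $y = (y_1, ..., y_n) = f(x_1, ..., x_{n+1}) = f(x)$ be the point in $\mathbb{R}^n$ such that

$$y_k = x_k (1 - x_{n+1})^{-1} \quad \text{for } k = 1, 2, ..., n.$$

Next we define $h : \mathbb{R}^n \to \mathbb{R}^{n+1}$ by letting $x = h(y)$ be the point in $\mathbb{R}^{n+1}$ such that

$$x_k = \begin{cases} \dfrac{2y_k}{1 + d_n^2(y, 0)}, & k = 1, ..., n, \\[2mm] \dfrac{d_n^2(y, 0) - 1}{d_n^2(y, 0) + 1}, & k = n + 1. \end{cases}$$

Note that $d_{n+1}(h(y), 0) = 1$, because $4 d_n^2(y, 0) + (d_n^2(y, 0) - 1)^2 = (d_n^2(y, 0) + 1)^2$. Note also that $x_{n+1} \ne 1$. So $h(\mathbb{R}^n) \subseteq S^n \setminus \{(0, ..., 0, 1)\}$.

Both $f$ and $h$ are clearly continuous. For $x = h(y)$ we have $1 - x_{n+1} = 2(1 + d_n^2(y, 0))^{-1}$, and a direct computation then gives $f \circ h(y) = y$ and $h \circ f(x) = x$. Thus $h$ is the inverse of $f$, and $f$ is a homeomorphism of $S^n \setminus \{(0, ..., 0, 1)\}$ onto $\mathbb{R}^n$.

Regarding the compactness of $S^n$, we define a function $F : \mathbb{R}^{n+1} \to \mathbb{R}$ via $F(x) = \|x\|_{n+1}$. The triangle inequality for $d_{n+1}$ gives $|F(x) - F(y)| \le d_{n+1}(x, y)$, so $F$ is continuous. With this and the closedness of the singleton $\{1\}$, we see that $S^n = F^{-1}(\{1\})$ is closed. Since $d_{n+1}(0, x) = 1$ for $x \in S^n$, $S^n$ is bounded. Being closed and bounded in $\mathbb{R}^{n+1}$, $S^n$ is therefore compact by the Heine-Borel theorem.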
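The maps $f$ and $h$ in the proof can be tried out directly. The Python sketch below uses function names chosen to match the proof. It relies on floating-point arithmetic, so the identities hold only up to rounding. On a sample point of $\mathbb{R}^3$ it checks that $h(y)$ lies on $S^3$ and that $f(h(y)) = y$.

```python
import math

def f(x):
    """Stereographic projection from S^n minus the north pole (0, ..., 0, 1) onto R^n."""
    *head, last = x
    return [xk / (1.0 - last) for xk in head]

def h(y):
    """Inverse map from R^n onto S^n minus the north pole."""
    s = sum(t * t for t in y)  # d_n(y, 0)^2
    return [2.0 * t / (1.0 + s) for t in y] + [(s - 1.0) / (s + 1.0)]

def norm(v):
    """Euclidean norm."""
    return math.sqrt(sum(t * t for t in v))

y = [0.3, -1.2, 2.5]
x = h(y)
print(abs(norm(x) - 1.0) < 1e-12)                         # True: x lies on S^3
print(all(abs(a - b) < 1e-12 for a, b in zip(f(x), y)))  # True: f(h(y)) = y
```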


Problems

7.1. Let $f : \mathbb{R}^3 \to \mathbb{R}^2$ be given by $f(x) = (x_1, x_2)$ for $x = (x_1, x_2, x_3) \in \mathbb{R}^3$. Prove that $f$ is continuous on $\mathbb{R}^3$.

7.2. Given $f(x) = d_X(x, 0)$ for every point $x$ in the subset $X \setminus \{0\}$ of a metric space $X$ with distance $d_X$, define $f(0)$ so that $f$ is continuous at $0$.

7.3. For $n \in \mathbb{N}$ let $f : \mathbb{R}^n \to \mathbb{R}$ be given by $f(x) = \big(\sum_{j=1}^n x_j^2\big)^{2^{-1}}$ for $x = (x_1, ..., x_n) \in \mathbb{R}^n$, and let $S = (-3, -2)$. Find $f^{-1}(S)$.

7.4. Define $D = \{x = (x_1, x_2, x_3) \in \mathbb{R}^3 : 0 < x_3 < x_1^2 + x_2^2\}$ and let $f$ be the mapping from $\mathbb{R}^3$ into $\mathbb{R}$ given by
$$f(x) = \begin{cases} 1, & x \in D, \\ 0, & x \notin D. \end{cases}$$
Show that $f(x^{(k)})$ tends to $f(0)$ as $x^{(k)}$ approaches $0$ along any linear path, but $f$ is discontinuous at $0$.

7.5. Let $T$ be the linear transformation from $\mathbb{R}^2$ into itself such that $T(1, 1) = (3, -1)$ and $T(1, -1) = (1, 7)$. Find the matrix representation of $T$.

7.6. Let $n \in \mathbb{N}$. If $K$ is a compact subset of $\mathbb{R}^n$, $f : K \to \mathbb{R}$ is continuous, and $f(x) > 0$ for all $x \in K$, show that there is a $\delta > 0$ such that $f(x) \ge \delta$ for all $x \in K$.

7.7. Show that the function $f : [0, 1] \to [0, 1]$ defined by
$$f(x) = \begin{cases} x, & x \in \mathbb{Q} \cap [0, 1], \\ 1 - x, & x \in (\mathbb{R} \setminus \mathbb{Q}) \cap [0, 1] \end{cases}$$
is onto, and thus attains every value between $f(0) = 0$ and $f(1) = 1$. Show, however, that $f$ is continuous only at $x = 2^{-1}$.

7.8. Define $T : C[0, 1] \to C[0, 1]$ by $T(f)(x) = \int_0^x f(t)\, dt$. Verify the following three facts:


(i) $T$ is not a contraction; (ii) $T$ has a unique fixed point; (iii) $T \circ T$ is a contraction.

7.9. Let $f : \mathbb{R} \to \mathbb{R}$ be differentiable with $|f'(x)| \le \alpha$ for all $x$, where $\alpha \in (0, 1)$. Prove that $f$ is a contraction mapping.

7.10. For a complete metric space $(X, d_X)$ let $T : X \to X$ be a mapping such that
$$\sum_{k=1}^{\infty} \sup_{x_1 \ne x_2} \frac{d_X\big(T^{(k)}(x_1), T^{(k)}(x_2)\big)}{d_X(x_1, x_2)} < \infty.$$
Prove that $T$ has a unique fixed point, where $T^{(k)} = \underbrace{T \circ \cdots \circ T}_{k}$.

7.11. Let $(X, d_X)$ be a metric space. Suppose that $K$ is a nonempty compact subset of $X$ and the mapping $T : K \to K$ satisfies $d_X(T(x_1), T(x_2)) < d_X(x_1, x_2)$ for all $x_1 \ne x_2$ in $K$. Prove that there exists precisely one point $x \in K$ such that $x = T(x)$.

7.12. Let $h \in C[0, 1]$. Show that there is an $f \in C[0, 1]$ such that
$$f(x) - \int_0^x f(x - t) \exp(-t^2)\, dt = h(x).$$

7.13. Let $(X, d_X)$ be a complete metric space, let $D \subseteq X$, and let $f : D \to X$ be a mapping with $f(D) = X$. Suppose that there is a number $\kappa > 1$ such that $d_X(f(x_1), f(x_2)) \ge \kappa d_X(x_1, x_2)$ for all $x_1, x_2 \in D$. Prove that there exists exactly one point $x \in D$ such that $f(x) = x$.

7.14. Let $(X, d_X)$ and $(Y, d_Y)$ be two metric spaces and $f : X \to Y$ a bijection. Prove that the following three conditions are equivalent: (i) $f$ is a homeomorphism; (ii) $f$ is continuous and maps closed subsets of $X$ to closed subsets of $Y$; (iii) $f$ is continuous and maps open subsets of $X$ to open subsets of $Y$.


7.15. Suppose that $\mathbb{R}$ is equipped with the usual Euclidean distance. Let $f : \mathbb{R} \to \mathbb{R}$ be defined by
$$f(x) = \begin{cases} x + 3 \cdot 2^{-1}, & x \in (-\infty, 2^{-1}], \\ x^{-1}, & x \in (2^{-1}, \infty). \end{cases}$$
Prove that $f$ is not a homeomorphism.

7.16. Let $(X, d_X)$ be a metric space. Prove that there are a complete metric space $(Y, d_Y)$ and a mapping $f : X \to Y$ such that $f$ is an isometry, i.e., $d_Y(f(x_1), f(x_2)) = d_X(x_1, x_2)$ for all $x_1, x_2 \in X$, and $f(X)$ is a dense subset of $Y$.

7.17. Let $(X, d_X)$ be a metric space and $K$ a nonempty closed subset of $X$. Prove that if $f : X \to \mathbb{R}$ is defined by $f(x) = \inf_{y \in K} d_X(x, y)$, then: (i) $f$ is continuous; (ii) $f(x) = 0$ if and only if $x \in K$.

Chapter 8

Normed Linear Spaces

Earlier chapters examined functions individually. In contrast, the theory of functionals studies the common and distinguishing properties of whole spaces of functions. It can be regarded as growing out of the interplay between linear algebra and Euclidean topology, and this interplay leads to the notion of a normed linear space. Accordingly, we begin our discussion with this fundamental concept. Combined with increasingly specialized assumptions, it leads to increasingly specialized spaces and results. We note that our investigation involves only linear functional analysis. Its relation to the developing nonlinear theory resembles, and arises from, the connection between Euclidean spaces and manifolds.

8.1 Linear Spaces, Norms and Quotient Spaces

From now on, $\mathbb{F}$ always denotes either $\mathbb{R}$ or $\mathbb{C} = \mathbb{R} + i\mathbb{R}$, the field of real or complex numbers. Definition 8.1.1. A linear (or vector) space over $\mathbb{F}$ is a set $X$ equipped with two functions $+ : X \times X \to X$ and $\cdot : \mathbb{F} \times X \to X$ with the properties below: (i) x + y = y + x for all x, y ∈ X;

(ii) (x + y) + z = x + (y + z) for all x, y, z ∈ X;


(iii) there is 0 ∈ X such that x + 0 = 0 + x = x for all x ∈ X;

(iv) for each x ∈ X there is −x ∈ X such that x + (−x) = (−x) + x = 0;

(v) α · (β · x) = (αβ) · x for all α, β ∈ F, x ∈ X;

(vi) (α + β) · x = (α · x) + (β · x) for all α, β ∈ F, x ∈ X;

(vii) α · (x + y) = (α · x) + (α · y) for all α ∈ F, x, y ∈ X;

(viii) 1 · x = x for all x ∈ X, where 1 is the multiplicative identity in F.

Since our main purpose is to study linear spaces over $\mathbb{R}$ or $\mathbb{C}$, we say that a space is over $\mathbb{F}$ whenever the real and complex cases can be handled in the same way.

Example 8.1.2. (i) For $n \in \mathbb{N}$, $\mathbb{F}^n$ is a linear space with the usual vector addition and scalar multiplication over $\mathbb{F}$.

(ii) Let $X$ be the set of all polynomials with coefficients in $\mathbb{R}$ of degree less than $n \in \mathbb{N}$. Then $X$ is a linear space with the usual addition of polynomials and scalar multiplication over $\mathbb{R}$.

(iii) For $m, n \in \mathbb{N}$ let $M_{m,n}(\mathbb{C})$ be the set of all complex-valued $m \times n$ matrices. Then $M_{m,n}(\mathbb{C})$ is a linear space with the usual addition of matrices and scalar multiplication over $\mathbb{C}$.

(iv) Recall that $\ell^\infty$ denotes the set of all bounded real-valued sequences $(x_j)_{j=1}^\infty$, that is, those with $\sup_{j \in \mathbb{N}} |x_j| < \infty$. Then $\ell^\infty$ is a linear space over $\mathbb{R}$, because
$$\sup_{j \in \mathbb{N}} |x_j + y_j| \le \sup_{j \in \mathbb{N}} |x_j| + \sup_{j \in \mathbb{N}} |y_j| \quad\text{and}\quad \sup_{j \in \mathbb{N}} |\alpha x_j| = |\alpha| \sup_{j \in \mathbb{N}} |x_j| < \infty.$$

(v) Let $S$ be any nonempty subset of $\mathbb{R}$, and let $C(S, \mathbb{R})$ be the class of all continuous functions $f : S \to \mathbb{R}$ with $(f + g)(x) = f(x) + g(x)$ and $(\alpha \cdot f)(x) = \alpha f(x)$. Then $C(S, \mathbb{R})$ is a linear space over $\mathbb{R}$.

(vi) Let $C^\infty[a, b]$ be the space of all infinitely differentiable real-valued functions on the finite interval $[a, b]$ of $\mathbb{R}$. Then it is a linear space over $\mathbb{R}$ with the same addition and scalar multiplication as in (v).

(vii) For any closed interval $[a, b] \subset \mathbb{R}$ and any increasing function $g : \mathbb{R} \to \mathbb{R}$, the classes $R[a, b]$, $RS_g[a, b]$ and $LRS_g([a, b])$ are also linear spaces over $\mathbb{R}$ with the same addition and scalar multiplication as in (v).


For convenience, we will drop the special notation $+$, $\cdot$ for vector addition and scalar multiplication, and simply refer to $X$ as a linear space over $\mathbb{F}$. Moreover, if $\mathbb{F} = \mathbb{R}$ we say that $X$ is a real linear space, whereas if $\mathbb{F} = \mathbb{C}$ we say that $X$ is a complex linear space. As in the linear algebra of finite-dimensional vector spaces, we have the concept of a linear subspace. It is often called simply a subspace when the context distinguishes it from other kinds of subspaces, and it plays an important role in functional analysis and related fields of mathematics.

Definition 8.1.3. Let $X$ be a linear space over $\mathbb{F}$. (i) A nonempty subset $Y \subseteq X$ is called a linear subspace of $X$ provided that $\alpha, \beta \in \mathbb{F}$ and $x, y \in Y$ imply $\alpha x + \beta y \in Y$. (ii) If $Y = \{x + x_0 : x \in S\}$, where $S$ is a linear subspace of $X$ and $x_0$ is a fixed element of $X$, then $Y$ is called an affine subset of $X$.

Example 8.1.4. (i) For $n - 2 \in \mathbb{N}$, the vectors in $\mathbb{R}^n$ of the form $(x_1, x_2, x_3, 0, ..., 0)$ form a linear subspace of $\mathbb{R}^n$. (ii) For $m, n \in \mathbb{N}$ with $m \le n$, the set of polynomials of degree $\le m$ forms a linear subspace of the set of polynomials of degree $\le n$. (iii) If $Y = \{(x_1, x_2, 1, ..., 1) \in \mathbb{R}^n\}$ for $n - 2 \in \mathbb{N}$, then $Y$ is an affine subset of $\mathbb{R}^n$.

(iv) In Example 8.1.2 (iii), let $Y$ be the set of matrices whose entries in a fixed set of positions all equal $1$, the remaining entries being arbitrary. Then $Y$ is an affine subset of $M_{m,n}(\mathbb{C})$. (v) In $\mathbb{R}^3$ all lines and planes through the origin are linear subspaces. All lines and planes not passing through the origin are affine subsets that are not linear subspaces.

A fundamental concept for linear spaces is that of dimension. To define it, suppose that $X$ is a linear space over $\mathbb{F}$ and $n \in \mathbb{N}$. Elements $e_1, e_2, ..., e_n$ of $X$ are linearly dependent provided there are scalars $\alpha_1, \alpha_2, ..., \alpha_n$, not all zero, such that $\sum_{j=1}^n \alpha_j e_j = 0$. If there is no such set of scalars, then they are linearly independent. The linear span of the vectors $\{e_j\}_{j=1}^n$ is the linear subspace of $X$ given by
$$\mathrm{span}\{e_j\}_{j=1}^n = \mathrm{span}\{e_1, ..., e_n\} = \Big\{\sum_{j=1}^n \alpha_j e_j : \alpha_j \in \mathbb{F}\Big\}.$$
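For finitely many vectors with rational coordinates, linear (in)dependence can be decided mechanically. The vectors are independent exactly when the matrix having them as rows has rank equal to their number. Below is a minimal Python sketch, assuming exact rational input. It uses Gaussian elimination over $\mathbb{Q}$ via the standard `fractions` module, and the helper names `rank` and `linearly_independent` are hypothetical, chosen for illustration only.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of vectors in Q^m, by Gaussian elimination over the rationals."""
    rows = [[Fraction(c) for c in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivot rows found so far
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Clear this column in every other row.
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def linearly_independent(vectors):
    """sum(alpha_j e_j) = 0 forces all alpha_j = 0 iff the rank equals the number of vectors."""
    return rank(vectors) == len(vectors)

print(linearly_independent([(1, 0, 0), (0, 1, 0), (1, 1, 0)]))  # False: e3 = e1 + e2
print(linearly_independent([(1, 0, 0), (0, 1, 0), (0, 0, 1)]))  # True
```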


In general, the linear span of a subset $S$ of $X$ is defined to be the set of all finite linear combinations of elements of $S$. Equivalently, it is the intersection of all linear subspaces of $X$ containing $S$.

Definition 8.1.5. Let $n \in \mathbb{N}$. If the linear space $X$ is equal to the space spanned by a linearly independent set of $n$ vectors in $X$, then $X$ is said to have dimension $n$. If $X \ne \{0\}$ and there is no such $n$, then $X$ is infinite-dimensional. Furthermore, a linearly independent set of vectors that spans $X$ is called a basis for $X$.

Example 8.1.6. (i) For $n \in \mathbb{N}$, the space $\mathbb{R}^n$ has dimension $n$. The standard basis is given by the vectors $e_1 = (1, 0, ..., 0)$, $e_2 = (0, 1, 0, ..., 0)$, ..., $e_n = (0, 0, ..., 0, 1)$. (ii) For $n \in \mathbb{N}$, the set $\{1, t, t^2, ..., t^n\}$ is a basis for the linear space of real-valued polynomials of degree $\le n$, which therefore has dimension $n + 1$. (iii) All linear spaces given in Example 8.1.2 (iv)-(vii) are infinite-dimensional (in (v), provided that $S$ is an infinite set).

Next, we equip a linear space with a norm. A norm on a linear space measures the length of a vector, and hence the distance between two vectors.

Definition 8.1.7. Let $X$ be a linear space over $\mathbb{F}$. A norm on $X$ is a nonnegative function $\|\cdot\| : X \to \mathbb{R}$ with the following three properties: (i) $\|x\| = 0 \Leftrightarrow x = 0$;

(ii) $\|x + y\| \le \|x\| + \|y\|$ for all $x, y \in X$;

(iii) $\|\alpha x\| = |\alpha| \|x\|$ for all $x \in X$ and $\alpha \in \mathbb{F}$.

In this case, $(X, \|\cdot\|)$ is called a normed linear space.

Remark 8.1.8. (i) In the definition, $|\cdot|$ denotes the usual absolute value. If $\|\cdot\|$ is a nonnegative function satisfying only (ii), the triangle inequality, and (iii), homogeneity, then it is called a semi-norm. For instance, $\|\cdot\|_{p, m_g, E}$ defined in Example 6.1.3 (iii) is a semi-norm whenever $p \in [1, \infty)$. (ii) Whenever $(X, \|\cdot\|)$ is a normed linear space over $\mathbb{F}$, define $d_X(x_1, x_2) = \|x_1 - x_2\|$ for $x_1, x_2 \in X$.


Then $(X, d_X)$ is a metric space:

- positivity follows from (i);
- symmetry follows from (iii) with $\alpha = -1$;
- the triangle inequality follows from
$$d_X(x_1, x_3) = \|x_1 - x_3\| \le \|x_1 - x_2\| + \|x_2 - x_3\| = d_X(x_1, x_2) + d_X(x_2, x_3).$$

(iii) Clearly, if $Y$ is a linear subspace of the linear space $X$ over $\mathbb{F}$ equipped with the norm $\|\cdot\|$, then $(Y, \|\cdot\|)$ is also a normed linear space. Hence it can be regarded as a linear subspace of $(X, \|\cdot\|)$.

Example 8.1.9.

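(i) For $n \in \mathbb{N}$ and $x = (x_1, ..., x_n) \in \mathbb{R}^n$ let $d_n(x, 0) = \sqrt{\sum_{j=1}^n |x_j|^2}$. Then $d_n(\cdot, 0)$ defines a norm on $\mathbb{R}^n$. The only difficulty in verifying this fact is the triangle inequality, for which we use the Cauchy-Schwarz inequality
$$\sum_{j=1}^n |x_j y_j| \le \sqrt{\sum_{j=1}^n |x_j|^2}\, \sqrt{\sum_{j=1}^n |y_j|^2}.$$
Indeed, it gives $d_n^2(x + y, 0) \le d_n^2(x, 0) + 2 d_n(x, 0) d_n(y, 0) + d_n^2(y, 0) = \big(d_n(x, 0) + d_n(y, 0)\big)^2$.

(ii) For $n \in \mathbb{N}$, there are many other norms on $\mathbb{R}^n$, called the $(n, p)$-norms. More precisely, if $p \in [1, \infty)$, then
$$\|x\|_{n,p} = \Big(\sum_{j=1}^n |x_j|^p\Big)^{\frac{1}{p}}$$
is a norm on $\mathbb{R}^n$. To see this, it suffices to verify the triangle inequality. This is the following inequality of Hermann Minkowski (see also Problem 8.1 (i)):
$$\|x + y\|_{n,p} \le \|x\|_{n,p} + \|y\|_{n,p} \quad\text{for } x, y \in \mathbb{R}^n.$$
If $p = \infty$, then $\|x\|_{n,\infty} = \sup_{1 \le j \le n} |x_j|$ defines a norm on $\mathbb{R}^n$. It is conventional to write $\ell^n_p$ for these normed spaces. Note that $\ell^n_p$ and $\ell^n_q$ have exactly the same elements, namely the vectors of $\mathbb{R}^n$.

(iii) Recall that $\ell^\infty$ is the linear space of bounded infinite sequences of real numbers. For $x = (x_j)_{j=1}^\infty \in \ell^\infty$, define
$$\|x\|_p = \begin{cases} \big(\sum_{j=1}^\infty |x_j|^p\big)^{\frac{1}{p}}, & 1 \le p < \infty, \\ \sup_{1 \le j < \infty} |x_j|, & p = \infty. \end{cases}$$
… $c > 0$ such that $\|\cdot\|_{(3)} \le c \|\cdot\|$, i.e.,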

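$$\sum_{j=1}^n |\alpha_j| \le c \Big\|\sum_{j=1}^n \alpha_j e_j\Big\|.$$
Write $x = \sum_{j=1}^n \alpha_j e_j \ne 0$. Dividing the last inequality by $\sum_{j=1}^n |\alpha_j|$, we may assume that $\|x\|_{(3)} = 1$. Now suppose the last inequality fails for every $c > 0$. Then there would be a sequence $\{y_k\}_{k=1}^\infty$, where
$$y_k = \sum_{j=1}^n \alpha_{k,j} e_j \quad\text{with}\quad \|y_k\|_{(3)} = \sum_{j=1}^n |\alpha_{k,j}| = 1 \quad\text{but}\quad \lim_{k \to \infty} \|y_k\| = 0.$$
Note that $|\alpha_{k,j}| \le 1$ for $j = 1, ..., n$ and every $k$. By the Bolzano-Weierstrass theorem in $\mathbb{F}^n$, there is a subsequence of $\{y_k\}_{k=1}^\infty$, still denoted by $\{y_k\}_{k=1}^\infty$, such that $\lim_{k \to \infty} \alpha_{k,j}$ exists for $j = 1, 2, ..., n$. Call this limit $\alpha_j$. Then
$$\lim_{k \to \infty} \Big\|y_k - \sum_{j=1}^n \alpha_j e_j\Big\| \le \Big(\max_{1 \le j \le n} \|e_j\|\Big) \lim_{k \to \infty} \sum_{j=1}^n |\alpha_{k,j} - \alpha_j| = 0.$$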


Consequently,
$$\sum_{j=1}^n |\alpha_j| = 1 \quad\text{while}\quad \sum_{j=1}^n \alpha_j e_j = 0.$$
The second identity holds because $\|\sum_{j=1}^n \alpha_j e_j\| \le \|y_k\| + \|y_k - \sum_{j=1}^n \alpha_j e_j\| \to 0$. This is impossible, since $\{e_j\}_{j=1}^n$ are linearly independent. This completes the argument.

(ii) It suffices to show that $X$ is topologically isomorphic to $\mathbb{F}^n$ equipped with the Euclidean norm
$$\|\alpha\|_{n,2} = \Big(\sum_{k=1}^n |\alpha_k|^2\Big)^{2^{-1}} \quad\text{for } \alpha = (\alpha_1, ..., \alpha_n) \in \mathbb{F}^n.$$
By the argument for (i), the coordinate map $T\big(\sum_{j=1}^n \alpha_j e_j\big) = (\alpha_1, ..., \alpha_n)$ is a linear isomorphism from $X$ onto $\mathbb{F}^n$. We can use this map to define a new norm on $X$: for each $x \in X$ let $|||x|||_X = \|T(x)\|_{n,2}$. When $X$ is given this new norm, $T$ becomes a topological isomorphism (indeed, an isometry). Moreover, by (i), $|||\cdot|||_X$ is equivalent to any norm $\|\cdot\|_X$ on $X$. Hence $T$ is a topological isomorphism from $(X, \|\cdot\|_X)$ onto $\mathbb{F}^n$.

Theorem 8.2.3 has some important consequences.

Corollary 8.2.4. Let $n \in \mathbb{N}$ and let $X$ be a normed linear space over $\mathbb{F}$. (i) If $X$ is $n$-dimensional, then it is complete.

(ii) Any $n$-dimensional linear subspace of $X$ is closed. (iii) If $X$ is $n$-dimensional, then every bounded closed subset $S$ of $X$ is compact.

Proof. (i) Let $\{e_k\}_{k=1}^n$ be a basis for $X$. Then each $x \in X$ can be written as $x = \sum_{j=1}^n \alpha_j e_j$ with $\alpha_j \in \mathbb{F}$. Set $\|x\|_{n,2} = \big(\sum_{j=1}^n |\alpha_j|^2\big)^{\frac12}$. This is a norm on $X$, and by Theorem 8.2.3 it is equivalent to the given norm on $X$. Accordingly, suppose $\{x^{(k)} = \sum_{j=1}^n \alpha_j^{(k)} e_j\}_{k=1}^\infty$ is a Cauchy sequence in $X$. Then $\{(\alpha_1^{(k)}, ..., \alpha_n^{(k)})\}_{k=1}^\infty$ is a Cauchy sequence in the complete space $\mathbb{F}^n$. Hence there exists a point $(\alpha_1, ..., \alpha_n) \in \mathbb{F}^n$ such that
$$(\alpha_1^{(k)}, ..., \alpha_n^{(k)}) \to (\alpha_1, ..., \alpha_n) \quad\text{as } k \to \infty.$$
Therefore $\{x^{(k)}\}_{k=1}^\infty$ converges to $x = \sum_{j=1}^n \alpha_j e_j$ in $X$, whence $X$ is complete.

(ii) Suppose that $S$ is an $n$-dimensional linear subspace of $X$. Then $S$ is complete by (i). Take any sequence $\{x^{(k)}\}_{k=1}^\infty$ in $S$ converging to some $x \in X$. It is a Cauchy sequence in $S$, so it converges to a point of $S$. By the uniqueness of limits, $x \in S$, and so $S$ is closed.


(iii) By Theorem 8.2.3 (ii), there is a topological isomorphism $T$ of $X$ onto $\mathbb{F}^n$. Since $T$ and $T^{-1}$ are continuous and linear, $T$ maps every bounded closed subset $S$ of $X$ onto a bounded closed subset of $\mathbb{F}^n$, which is compact by the Heine-Borel theorem. Accordingly, $S = T^{-1}(T(S))$ is compact as a continuous image of a compact set.

More interestingly, the converse of Corollary 8.2.4 (iii) also holds, in the following sense.

Theorem 8.2.5. Let $(X, \|\cdot\|_X)$ be a normed linear space over $\mathbb{F}$.

(i) If $S$ is a proper closed subspace of $X$, then for each number $\theta \in (0, 1)$ there exists an element $x_\theta \in X$ such that $\|x_\theta\|_X = 1$ and
$$d(x_\theta, S) = \inf_{x \in S} \|x_\theta - x\|_X \ge \theta.$$

(ii) If the unit sphere $\{x \in X : \|x\|_X = 1\}$ of $X$ is compact, then $X$ is finite dimensional.

Proof. (i) Since $S$ is closed and not equal to $X$, there exists an $x_1 \in X \setminus S$ such that $\delta = \inf_{x \in S} \|x_1 - x\|_X > 0$. Consequently, for any $\varepsilon > 0$ there is an $x_2 \in S$ such that $\|x_2 - x_1\|_X < \delta + \varepsilon$. Choose $\varepsilon = \delta(\theta^{-1} - 1)$, so that $\delta + \varepsilon = \delta\theta^{-1}$, and set $x_\theta = (x_2 - x_1)\|x_2 - x_1\|_X^{-1}$. Then $\|x_\theta\|_X = 1$ and
$$\|x - x_\theta\|_X = \big\|(\|x_2 - x_1\|_X x + x_1) - x_2\big\|_X \|x_2 - x_1\|_X^{-1} \ge \delta \|x_2 - x_1\|_X^{-1} \ge \theta$$

for any $x \in S$. Here we have used the fact that $S$ is a linear subspace of $X$, so that $x_2 - \|x_2 - x_1\|_X x \in S$.

(ii) Let $x_1$ be any element of $X$ with $\|x_1\|_X = 1$, and let $S_1$ be the subspace spanned by $x_1$. If $S_1 = X$, then $X$ is finite dimensional and we are done. Otherwise, since $S_1$ is finite dimensional, it is closed by Corollary 8.2.4 (ii). So by Theorem 8.2.5 (i) there exists an $x_2 \in X$ such that $\|x_2\|_X = 1$ and $d(x_2, S_1) \ge 2^{-1}$. Let $S_2$ be the linear subspace of $X$ spanned by $x_1$ and $x_2$. Then $S_2$ is a closed linear subspace of $X$. Consequently, if $S_2 \ne X$, then by Theorem 8.2.5 (i) there exists an $x_3 \in X$ such that $\|x_3\|_X = 1$ and $d(x_3, S_2) \ge 2^{-1}$.

Continuing in this manner, we find inductively the following. If the closed linear subspace $S_n$ spanned by $x_1, ..., x_n$ is not equal to $X$, then by Theorem 8.2.5 (i) there exists an $x_{n+1} \in X$ such that $\|x_{n+1}\|_X = 1$ and $d(x_{n+1}, S_n) \ge 2^{-1}$. This process must stop at some step. Otherwise there would be an infinite sequence $\{x_j\}_{j=1}^\infty$ with $\|x_j\|_X = 1$ but $\|x_j - x_k\|_X \ge 2^{-1}$ whenever $j \ne k$. Of course, such a sequence has no


convergent subsequence. This shows that the unit sphere of $X$ is not compact (thanks to Theorem 6.3.7 (iv)), a contradiction. Accordingly, $X$ equals some $S_k$, and the argument is complete.

The foregoing discussion shows that the unit sphere of a normed linear space is compact if and only if the space is finite dimensional. It also leads naturally to the following concept.

Definition 8.2.6. If $(X, \|\cdot\|_X)$ and $(Y, \|\cdot\|_Y)$ are normed linear spaces over $\mathbb{F}$, then either of

(i) $\|(x, y)\| = \big(\|x\|_X^p + \|y\|_Y^p\big)^{\frac{1}{p}}$ with $p \in [1, \infty)$;

(ii) $\|(x, y)\| = \max\{\|x\|_X, \|y\|_Y\}$

may be defined as a norm on the Cartesian product $X \times Y$.

This certainly does not exhaust all possible combinations of the norms $\|x\|_X$ and $\|y\|_Y$, but these are the most commonly used ones. An extension to the Cartesian product of $n$ normed linear spaces is defined in the following manner.

Remark 8.2.7. For $n - 1 \in \mathbb{N}$ let $\{X_k\}_{k=1}^n$ be a collection of linear spaces over $\mathbb{F}$ with norms $\{\|\cdot\|_{X_k}\}_{k=1}^n$. Suppose that their Cartesian product space
$$\prod_{k=1}^n X_k = X_1 \times \cdots \times X_n = \{(x_1, ..., x_n) : x_1 \in X_1, ..., x_n \in X_n\}$$
is equipped with the vector addition
$$(x_1, ..., x_n) + (y_1, ..., y_n) = (x_1 + y_1, ..., x_n + y_n)$$
and the scalar multiplication
$$\alpha(x_1, ..., x_n) = (\alpha x_1, ..., \alpha x_n) \quad\text{for } \alpha \in \mathbb{F} \text{ and } (x_1, ..., x_n), (y_1, ..., y_n) \in \prod_{k=1}^n X_k.$$
Then it is not hard to show that the function $\|\cdot\|_{\prod_{k=1}^n X_k} : \prod_{k=1}^n X_k \to \mathbb{R}$ defined by
$$\|(x_1, ..., x_n)\|_{\prod_{k=1}^n X_k} = \sum_{k=1}^n \|x_k\|_{X_k}$$
is a norm on $\prod_{k=1}^n X_k$.


Example 8.2.8. If $n = m + k$ with $k, m, n \in \mathbb{N}$, then $\mathbb{R}^n$ may be viewed as the Cartesian product of $\mathbb{R}^m$ and $\mathbb{R}^k$.
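The $(n, p)$-norms of Example 8.1.9 (ii) and the product norms of Definition 8.2.6 fit together as Example 8.2.8 suggests. Split $\mathbb{R}^n = \mathbb{R}^m \times \mathbb{R}^k$ and combine the $(m, p)$- and $(k, p)$-norms with the $p$-product norm: the result is the $(n, p)$-norm. The Python sketch below checks this on random test vectors; it is our own illustration. It also spot-checks Minkowski's inequality for several $p \ge 1$. Finally, it shows that the triangle inequality breaks down for $p = 2^{-1}$, the regime of Example 8.4.2.

```python
import math
import random

def p_norm(x, p):
    """The (n, p)-norm of Example 8.1.9 (ii); p = math.inf gives the max norm."""
    if p == math.inf:
        return max(abs(t) for t in x)
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

def product_norm(x, y, p):
    """Definition 8.2.6 with both factors carrying the p-norm:
    (||x||^p + ||y||^p)^(1/p), or max(||x||, ||y||) when p = inf."""
    if p == math.inf:
        return max(p_norm(x, p), p_norm(y, p))
    return (p_norm(x, p) ** p + p_norm(y, p) ** p) ** (1.0 / p)

rng = random.Random(1)
for p in (1.0, 1.5, 2.0, 3.0, math.inf):
    x = [rng.uniform(-1, 1) for _ in range(5)]
    y = [rng.uniform(-1, 1) for _ in range(5)]
    s = [a + b for a, b in zip(x, y)]
    # Minkowski's inequality ||x + y|| <= ||x|| + ||y||, up to rounding.
    assert p_norm(s, p) <= p_norm(x, p) + p_norm(y, p) + 1e-12
    # Example 8.2.8: R^5 = R^2 x R^3 with the product norm reproduces the (5, p)-norm.
    assert abs(product_norm(x[:2], x[2:], p) - p_norm(x, p)) < 1e-12

# For p < 1 the triangle inequality fails, e.g. p = 1/2 in R^2:
print(p_norm([1, 1], 0.5), p_norm([1, 0], 0.5) + p_norm([0, 1], 0.5))  # 4.0 2.0
```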

8.3 Bounded Linear Operators

In this section, we discuss bounded operators (equivalently, continuous operators) and inverse operators. To begin with, we define what it means for a linear operator to be bounded.

Definition 8.3.1. Let $(X, \|\cdot\|_X)$ and $(Y, \|\cdot\|_Y)$ be normed linear spaces over $\mathbb{F}$. A linear operator $T : X \to Y$ is said to be bounded provided there exists a constant $C \ge 0$ such that $\|T(x)\|_Y \le C\|x\|_X$ for all $x \in X$. Define
$$\|T\| = \|T\|_{X \to Y} = \sup_{x \in X,\, x \ne 0} \|T(x)\|_Y \|x\|_X^{-1}.$$
According to this definition, $\|T\| = \sup_{\|x\|_X = 1} \|T(x)\|_Y$. This follows from the linearity of $T$ and the identity $\big\|x\|x\|_X^{-1}\big\|_X = 1$ for all $x \ne 0$.

Example 8.3.2. (i) Equip $\ell^2$ with the 2-norm, and let $T : \ell^2 \to \ell^2$ be given by $T(x) = (0, x_1, x_2, ...)$ when $x = (x_1, x_2, ...)$. Then it is easy to check that $T$ is a bounded linear operator with $\|T\| = 1$; indeed, $\|T(x)\|_2 = \|x\|_2$ for all $x$. (ii) Choose for both $X$ and $Y$ the real space $C[0, 1]$ with the sup-norm. Define $T : X \to Y$ by $T(f)(x) = e^x f(x)$ for $x \in [0, 1]$. Then $T$ is bounded with $\|T\| = e$, since $\|T(f)\|_\infty \le e\|f\|_\infty$ with equality for $f \equiv 1$.

The following result tells us that, for a linear operator, boundedness amounts to continuity.

Theorem 8.3.3. Let $(X, \|\cdot\|_X)$ and $(Y, \|\cdot\|_Y)$ be normed linear spaces over $\mathbb{F}$ and $T : X \to Y$ be a linear operator. Then the following statements are equivalent: (i) $T$ is continuous on $X$; (ii) $T$ is continuous at $0$; (iii) $T$ is bounded; (iv) $T$ maps bounded subsets of $X$ to bounded subsets of $Y$.


Proof. In the sequel, we write A ⇔ B for the statement "if A then B, and vice versa". We also use the following criterion: a linear map $T : X \to Y$ is continuous at a point $x \in X$ if and only if $\|x_n - x\|_X \to 0$ implies $\|T(x_n) - T(x)\|_Y \to 0$.

(i)⇔(ii) The implication (i) ⇒ (ii) is trivial. Regarding (ii) ⇒ (i), suppose that $T$ is continuous at $0 \in X$. If $x_n \to x$, then $x_n - x \to 0$. Hence $T(x_n) - T(x) = T(x_n - x) \to T(0) = 0$, so that $T(x_n) \to T(x)$ in $Y$.

(iii)⇔(i) If $T$ is bounded and $x_n \to x$ in $X$, then $\|T(x_n) - T(x)\|_Y \le C\|x_n - x\|_X \to 0$. It follows that $T$ is continuous at $x$. Conversely, assume that $T$ is continuous on $X$. If $T$ is not bounded, then for any $n \in \mathbb{N}$ there exists a point $x_n \in X$ with $\|T(x_n)\|_Y > n\|x_n\|_X$; in particular $x_n \ne 0$. Let $y_n = x_n(n\|x_n\|_X)^{-1}$, so that $\|y_n\|_X = n^{-1} \to 0$. However, $\|T(y_n)\|_Y > 1$ while $T(0) = 0$, contradicting the continuity of $T$ at $0$.

(iv)⇔(ii) Suppose that (ii) is true. Then, by the equivalences already proved, $T$ is bounded with $\|T\| \le C_1$ for some constant $C_1 > 0$. If $S$ is a bounded subset of $X$, then there is a constant $C_2 > 0$ such that $\|x\|_X \le C_2$ for all $x \in S$. Hence $\|T(x)\|_Y \le C_1\|x\|_X \le C_1 C_2$ for all $x \in S$. That is to say, $T(S)$ is a bounded subset of $Y$.

Conversely, assume that (iv) is true. Given an open ball $B^Y_\varepsilon(0) = \{y \in Y : \|y\|_Y < \varepsilon\}$ in $Y$, let $B^X_1(0) = \{x \in X : \|x\|_X < 1\}$ denote the open unit ball in $X$. By (iv), $T(B^X_1(0))$ is bounded in $Y$. Thus there is a $\lambda > 0$ such that
$$T\big(B^X_1(0)\big) \subseteq \lambda B^Y_\varepsilon(0) = \{y \in Y : \|y\|_Y < \lambda\varepsilon\}.$$
Since $T$ is linear, this implies $T\big(B^X_{\lambda^{-1}}(0)\big) \subseteq B^Y_\varepsilon(0)$, and so $T$ is continuous at $0$.
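For a matrix acting between finite-dimensional spaces, the supremum in Definition 8.3.1 can sometimes be evaluated in closed form. As an illustration, we make our own choice of norms, not one made in the text: give both $\mathbb{R}^n$ and $\mathbb{R}^m$ the maximum norm of Example 8.1.9 (ii). Then $\|T\|$ is the largest absolute row sum of the matrix of $T$. The Python sketch below compares that formula with a direct maximization over the vertices of the unit cube.

```python
from itertools import product

def apply(A, x):
    """Apply the m x n matrix A (a list of rows) to the vector x."""
    return [sum(a * t for a, t in zip(row, x)) for row in A]

def sup_norm(v):
    """The max norm."""
    return max(abs(t) for t in v)

def op_norm_inf(A):
    """||A|| as a map from (R^n, max norm) to (R^m, max norm): the largest absolute row sum."""
    return max(sum(abs(a) for a in row) for row in A)

def op_norm_by_vertices(A):
    """sup of ||Ax|| over ||x|| = 1 in the max norm.  x -> ||Ax|| is convex, so its
    maximum over the cube [-1, 1]^n is attained at a vertex, i.e. a sign vector."""
    n = len(A[0])
    return max(sup_norm(apply(A, s)) for s in product((-1.0, 1.0), repeat=n))

A = [[1.0, -2.0],
     [3.0, 0.5]]
print(op_norm_inf(A), op_norm_by_vertices(A))  # 3.5 3.5
```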


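Example 8.3.4. For $n \in \mathbb{N}$ equip $X = \mathbb{R}^n$ with the $(n, 2)$-norm. Let $\{e_j\}_{j=1}^n$ be the standard basis for $\mathbb{R}^n$, so that the $j$-th coordinate of $e_j$ is $1$ and the others are $0$. Then any $x = (x_1, ..., x_n) \in \mathbb{R}^n$ has the form
$$x = \sum_{j=1}^n x_j e_j \quad\text{with}\quad \|x\|_2^2 = \sum_{j=1}^n |x_j|^2.$$
Let $T : \mathbb{R}^n \to Y$ be a linear transformation, where $(Y, \|\cdot\|_Y)$ is a normed linear space over $\mathbb{F}$. Then the triangle inequality and the Cauchy-Schwarz inequality imply
$$\|T(x)\|_Y = \Big\|\sum_{j=1}^n x_j T(e_j)\Big\|_Y \le \sum_{j=1}^n |x_j| \|T(e_j)\|_Y \le \|x\|_2 \Big(\sum_{j=1}^n \|T(e_j)\|_Y^2\Big)^{2^{-1}}.$$
So $T$ is bounded with
$$\|T\| \le \Big(\sum_{j=1}^n \|T(e_j)\|_Y^2\Big)^{2^{-1}}.$$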
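The estimate of Example 8.3.4 holds for any norm on $Y$. The Python sketch below illustrates it under the following assumptions of ours:

- $Y = \mathbb{R}^2$ with the $(2, 1)$-norm;
- $T$ is given by a $2 \times 3$ matrix whose columns are $T(e_1), T(e_2), T(e_3)$.

The sketch samples random $x$ and compares the ratio $\|T(x)\|_Y / \|x\|_2$ with the bound $\big(\sum_j \|T(e_j)\|_Y^2\big)^{2^{-1}}$.

```python
import math
import random

def apply(A, x):
    """Matrix-vector product; the columns of A are the images T(e_j)."""
    return [sum(a * t for a, t in zip(row, x)) for row in A]

def norm1(v):
    """The (m, 1)-norm, used here as the norm on Y."""
    return sum(abs(t) for t in v)

def norm2(v):
    """The (n, 2)-norm on X = R^n."""
    return math.sqrt(sum(t * t for t in v))

def column_bound(A):
    """(sum_j ||T(e_j)||_Y^2)^(1/2), with Y carrying the (m, 1)-norm."""
    n = len(A[0])
    cols = [[row[j] for row in A] for j in range(n)]
    return math.sqrt(sum(norm1(c) ** 2 for c in cols))

A = [[1.0, -2.0, 0.5],
     [0.0,  3.0, 1.0]]
rng = random.Random(2)
bound = column_bound(A)
worst = 0.0
for _ in range(2000):
    x = [rng.gauss(0, 1) for _ in range(3)]
    worst = max(worst, norm1(apply(A, x)) / norm2(x))
print(worst <= bound)  # True
```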

In fact, Example 8.3.4 is a special case of the following result.

Theorem 8.3.5. Let $(X, \|\cdot\|_X)$ and $(Y, \|\cdot\|_Y)$ be two normed linear spaces over $\mathbb{F}$. If $X$ is finite dimensional, then any linear transformation $T : X \to Y$ is bounded.

Proof. Recall from Theorem 8.2.3 that any two norms on a finite dimensional linear space over $\mathbb{F}$ are equivalent. So we construct a new norm $\|\cdot\|$ from $\|\cdot\|_X$ and $\|\cdot\|_Y$ via $\|x\| = \|x\|_X + \|T(x)\|_Y$ for all $x \in X$. Clearly $\|\cdot\|$ is a norm on $X$, and so it is equivalent to $\|\cdot\|_X$. This yields a constant $C > 0$ with $\|\cdot\| \le C\|\cdot\|_X$. It follows that $\|T(x)\|_Y \le \|x\| \le C\|x\|_X$ for all $x \in X$. In other words, $T : X \to Y$ is bounded.

Next, we consider some basic algebraic properties of the space of all bounded linear operators.

Definition 8.3.6. Let $(X, \|\cdot\|_X)$ and $(Y, \|\cdot\|_Y)$ be two normed linear spaces over $\mathbb{F}$. Denote by $\mathcal{B}(X, Y)$ the set of all bounded linear operators from $X$ to $Y$. In particular, $\mathcal{B}(X) = \mathcal{B}(X, X)$.


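Theorem 8.3.7. Let $(X, \|\cdot\|_X)$, $(Y, \|\cdot\|_Y)$ and $(Z, \|\cdot\|_Z)$ be normed linear spaces over $\mathbb{F}$. Then:

(i) $\mathcal{B}(X, Y)$ is a linear space over $\mathbb{F}$ with respect to the operations $(T + S)(x) = T(x) + S(x)$ and $(\alpha T)(x) = \alpha T(x)$ for $T, S \in \mathcal{B}(X, Y)$, $x \in X$ and $\alpha \in \mathbb{F}$.

(ii) The function $\|\cdot\| : \mathcal{B}(X, Y) \to \mathbb{R}$ defined by
$$\|T\| = \sup_{x \in X,\, \|x\|_X \ne 0} \|T(x)\|_Y \|x\|_X^{-1} \quad\text{for } T \in \mathcal{B}(X, Y)$$
is a norm on $\mathcal{B}(X, Y)$.

(iii) If $T \in \mathcal{B}(X, Y)$ and $S \in \mathcal{B}(Y, Z)$, then the composition $ST = S \circ T$ belongs to $\mathcal{B}(X, Z)$ with $\|ST\| \le \|S\|\|T\|$.

(iv) Suppose $T, S \in \mathcal{B}(X)$ are invertible. That is, there are two elements $T^{-1}, S^{-1} \in \mathcal{B}(X)$ such that $TT^{-1} = T^{-1}T = I$ and $SS^{-1} = S^{-1}S = I$, where $I$ stands for the identity element of $\mathcal{B}(X)$. Then $ST \in \mathcal{B}(X)$ is invertible with $(ST)^{-1} = T^{-1}S^{-1}$.

Proof. (i) This follows by checking the axioms of a linear space for $\mathcal{B}(X, Y)$. Note that $T + S$ and $\alpha T$ are again bounded, by the estimates in (ii) below.

(ii) We have to verify the three conditions required for a norm. First, it is clear that $\|T\| \ge 0$. If $\|T\| = 0$, then $\|T(x)\|_Y = 0$ for all $x \in X$, and hence $T(x) = 0$ for all $x \in X$. This gives $T = 0$. Conversely, $T = 0$ implies $\|T\| = 0$. Next,
$$\|\alpha T\| = \sup_{x \in X,\, \|x\|_X \ne 0} \|\alpha T(x)\|_Y \|x\|_X^{-1} = |\alpha| \|T\|.$$
Finally,
$$\begin{aligned} \|T + S\| &= \sup_{x \in X,\, \|x\|_X \ne 0} \|T(x) + S(x)\|_Y \|x\|_X^{-1} \\ &\le \sup_{x \in X,\, \|x\|_X \ne 0} \|T(x)\|_Y \|x\|_X^{-1} + \sup_{x \in X,\, \|x\|_X \ne 0} \|S(x)\|_Y \|x\|_X^{-1} \\ &= \|T\| + \|S\|. \end{aligned}$$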


(iii) Clearly, $ST$ is a linear transformation from $X$ to $Z$. Since $T$ and $S$ are bounded, we conclude that
$$\|(ST)(x)\|_Z = \|S(T(x))\|_Z \le \|S\|\|T(x)\|_Y \le \|S\|\|T\|\|x\|_X \quad\text{for all } x \in X.$$
Hence $ST$ is bounded, and so $ST \in \mathcal{B}(X, Z)$ with $\|ST\| \le \|S\|\|T\|$, as desired.

(iv) This follows from the formula $(ST)T^{-1}S^{-1} = I = T^{-1}S^{-1}(ST)$, together with $T^{-1}S^{-1} \in \mathcal{B}(X)$ by (iii).

Example 8.3.8. Define $T : \ell^2 \to \ell^2$ by $T(x_1, x_2, ...) = (0, x_1, x_2, ...)$. This map is bounded but not invertible, since it is clearly not onto. Let $S : \ell^2 \to \ell^2$ be given by $S(x_1, x_2, ...) = (x_2, x_3, ...)$. Then $S$ is bounded but not invertible, because it is clearly not one to one. Note that $ST = I \ne TS$.
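The identities in Example 8.3.8 can be watched on sequences with only finitely many nonzero terms. Such sequences are dense in $\ell^2$. We represent them, as an implementation choice of ours, by Python tuples with the trailing zeros omitted. The sketch is only an illustration on that subspace.

```python
def T(x):
    """Right shift (x1, x2, ...) -> (0, x1, x2, ...) on finitely supported sequences."""
    return (0,) + tuple(x)

def S(x):
    """Left shift (x1, x2, ...) -> (x2, x3, ...)."""
    return tuple(x[1:])

def norm_sq(x):
    """Square of the l^2 norm."""
    return sum(t * t for t in x)

x = (1, 2, 3)                       # stands for the sequence (1, 2, 3, 0, 0, ...)
print(S(T(x)))                      # (1, 2, 3): ST = I
print(T(S(x)))                      # (0, 2, 3): TS is not I, since x1 is lost
print(norm_sq(T(x)) == norm_sq(x))  # True: T preserves the l^2 norm
print(S((5,)) == S((7,)))           # True: S is not one to one
```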

8.4 Linear Functionals via Hahn-Banach Extension

Recall the Lebesgue-Radon-Stieltjes integrals. If $g : \mathbb{R} \to \mathbb{R}$ is increasing, then $L_g(f) = \int_{\mathbb{R}} f\, dm_g$ defines a linear function from the real linear space $LRS_g(\mathbb{R})$ into $\mathbb{R}$. This observation suggests that we should work with so-called linear functionals.

Definition 8.4.1. Let $X$ be a linear space over $\mathbb{F}$. A linear map from $X$ to $\mathbb{F}$ is called an $\mathbb{F}$-valued linear functional on $X$. If $X$ is a normed linear space, then $X^* = \mathcal{B}(X, \mathbb{F})$ and $X^{**} = (X^*)^*$ are called the dual space and the second dual space of $X$ over $\mathbb{F}$, respectively. Moreover, $X$ is said to be reflexive whenever the canonical map $x \mapsto \tilde{x}$, given by $\tilde{x}(L) = L(x)$ for $L \in X^*$, maps $X$ onto $X^{**}$. In that case $X$ and $X^{**}$ are isometric through this map.

A natural question is whether the continuous $\mathbb{F}$-valued linear functionals on a space can reduce to the zero functional alone. This can indeed happen outside the normed setting, as the following example of Mahlon Marsh Day type shows.

Example 8.4.2. Let $p \in (0, 1)$ and $I = [0, 1]$. If $L$ is a continuous linear functional on $LRS^p_{id}(I, \mathbb{R})$, then $L$ must be the zero functional. To see this, suppose to the contrary that $L \ne 0$. Then, without loss of generality, we may assume that there exists an $f \in LRS^p_{id}(I, \mathbb{R})$ such that $L(f) = 1$ and $\|f\|_{p, m_{id}, I} \ne 0$.


Now the map $x \mapsto f1_{[0,x]}$ is continuous from $I$ into $LRS^p_{id}(I, \mathbb{R})$ with respect to its metric. Consequently, $\varphi(x) = L(f1_{[0,x]})$ is a continuous function on $I$, since $L$ is continuous. Note that
$$\varphi(0) = 0; \quad \varphi(1) = 1; \quad f = f1_{[0,x]} + f1_{(x,1]}.$$
So, by the intermediate value theorem, there is an $x_0 \in (0, 1)$ such that
$$\varphi(x_0) = 2^{-1} \quad\text{and}\quad L(f1_{[0,x_0]}) = 2^{-1} = L(f1_{(x_0,1]}).$$
Since
$$\|f1_{[0,x_0]}\|^p_{p, m_{id}, I} + \|f1_{(x_0,1]}\|^p_{p, m_{id}, I} = \|f\|^p_{p, m_{id}, I},$$
one of the two terms on the left-hand side is not greater than $2^{-1}\|f\|^p_{p, m_{id}, I}$. Suppose, for example, that $\|f1_{[0,x_0]}\|^p_{p, m_{id}, I} \le 2^{-1}\|f\|^p_{p, m_{id}, I}$. Then
$$L(2f1_{[0,x_0]}) = 1 \quad\text{and}\quad \|2f1_{[0,x_0]}\|^p_{p, m_{id}, I} \le 2^{p-1}\|f\|^p_{p, m_{id}, I}.$$
Continuing the above argument, we obtain a sequence $\{f_j\}_{j=1}^\infty$ in $LRS^p_{id}(I, \mathbb{R})$ such that for any $j \in \mathbb{N}$,
$$L(f_j) = 1 \quad\text{and}\quad \|f_j\|^p_{p, m_{id}, I} \le 2^{j(p-1)}\|f\|^p_{p, m_{id}, I}.$$
But this is impossible. Since $0 < p < 1$, we have $2^{j(p-1)} \to 0$, so $f_j \to 0$ in $LRS^p_{id}(I, \mathbb{R})$. The continuity of $L$ would then force $L(f_j) \to L(0) = 0$.

For normed linear spaces, the above question is answered in great generality by the Hahn-Banach extension theorem stated below, named for Hans Hahn and Stefan Banach.

Theorem 8.4.3. Let $X$ be a linear space over $\mathbb{F}$, and let $p : X \to [0, \infty)$ be a function with $p(x + y) \le p(x) + p(y)$ and $p(\lambda x) = |\lambda| p(x)$ for all $\lambda \in \mathbb{F}$ and $x, y \in X$. Let $Y$ be a linear subspace of $X$ and $f$ an $\mathbb{F}$-valued linear functional on $Y$ with $|f(x)| \le p(x)$ for all $x \in Y$. Then there is an $\mathbb{F}$-valued linear functional $F$ on $X$ such that $F(x) = f(x)$ for all $x \in Y$ and $|F(x)| \le p(x)$ for all $x \in X$.


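Proof. First of all, we prove the theorem for $\mathbb{F} = \mathbb{R}$. In this case it suffices to obtain $F(x) \le p(x)$ for all $x \in X$, as shown at the end of this part.

Denote by $\mathcal{K}$ the set of all pairs $(Y_\alpha, g_\alpha)$ such that:

- $Y_\alpha$ is a linear subspace of $X$ containing $Y$;
- $g_\alpha$ is a real linear functional on $Y_\alpha$ with $g_\alpha(x) = f(x)$ for all $x \in Y$;
- $g_\alpha(x) \le p(x)$ for all $x \in Y_\alpha$.

Note that $(Y, f) \in \mathcal{K}$. Make $\mathcal{K}$ into a partially ordered set by defining $(Y_\alpha, g_\alpha) \preceq (Y_\beta, g_\beta)$ if $Y_\alpha \subseteq Y_\beta$ and $g_\alpha = g_\beta$ on $Y_\alpha$. Let $\{(Y_\lambda, g_\lambda)\}$ be a totally ordered subset of $\mathcal{K}$; that is, for any two of its members, at least one of $(Y_\alpha, g_\alpha) \preceq (Y_\beta, g_\beta)$ and $(Y_\beta, g_\beta) \preceq (Y_\alpha, g_\alpha)$ holds. It has an upper bound in $\mathcal{K}$, namely the subspace $\bigcup_\lambda Y_\lambda$ together with the functional that equals $g_\lambda$ on each $Y_\lambda$. This functional is well defined and linear because the family is totally ordered. By Zorn's lemma there is a maximal element $(Y_0, g_0)$ in $\mathcal{K}$.

We now prove that $Y_0 = X$, so that $F = g_0$ is the desired extension. If $Y_0 \ne X$, then there is a $y_1 \in X \setminus Y_0$. Let $Y_1$ be the linear space spanned by $Y_0$ and $y_1$; that is, $Y_1 = \{y + \lambda y_1 : y \in Y_0, \lambda \in \mathbb{R}\}$. Note that $Y_0$ is a subset of $Y_1$. Moreover, for $x, y \in Y_0$,
$$g_0(y) - g_0(x) = g_0(y - x) \le p(y - x) \le p(y + y_1) + p(-y_1 - x),$$
and therefore
$$-p(-y_1 - x) - g_0(x) \le p(y + y_1) - g_0(y).$$
So it follows that
$$a = \sup_{x \in Y_0}\{-p(-y_1 - x) - g_0(x)\} \le \inf_{y \in Y_0}\{p(y + y_1) - g_0(y)\} = b.$$
Now for any number $c \in [a, b]$ define $g_1(y + \lambda y_1) = g_0(y) + \lambda c$ for all $(y, \lambda) \in Y_0 \times \mathbb{R}$. Since $y_1 \notin Y_0$, the representation $y + \lambda y_1$ is unique. So $g_1$ is well defined, and it is evidently linear. Furthermore, we have the following three cases.

Case 1: If $\lambda = 0$, then $g_1(y) = g_0(y) \le p(y)$.

Case 2: If $\lambda > 0$ and $y \in Y_0$, then
$$g_1(y + \lambda y_1) = \lambda\big(g_0(\lambda^{-1}y) + c\big) \le \lambda\big(g_0(\lambda^{-1}y) + p(\lambda^{-1}y + y_1) - g_0(\lambda^{-1}y)\big) = \lambda p(\lambda^{-1}y + y_1) = p(y + \lambda y_1).$$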


Case 3: If $\lambda < 0$, then
$$g_1(y + \lambda y_1) = |\lambda|\big(g_0(|\lambda|^{-1}y) - c\big) \le |\lambda|\big(g_0(|\lambda|^{-1}y) - g_0(|\lambda|^{-1}y) + p(-y_1 + |\lambda|^{-1}y)\big) = |\lambda| p(-y_1 + |\lambda|^{-1}y) = p(y + \lambda y_1).$$

Accordingly, we get $g_1(y + \lambda y_1) = g_0(y) + \lambda c \le p(y + \lambda y_1)$ for all $(\lambda, y) \in \mathbb{R} \times Y_0$. That is to say, $(Y_1, g_1) \in \mathcal{K}$ and $(Y_0, g_0) \preceq (Y_1, g_1)$ with $Y_0 \ne Y_1$. This contradicts the maximality of $(Y_0, g_0)$. Hence $Y_0 = X$, and $F = g_0$ satisfies $F = f$ on $Y$ and $F(x) \le p(x)$ for all $x \in X$. Moreover, let $x \in X$ and let $\lambda = \operatorname{sgn} F(x)$ be the real number obeying $\lambda F(x) = |F(x)|$. Then $|F(x)| = \lambda F(x) = F(\lambda x) \le p(\lambda x) \le p(x)$.

Next, we prove the theorem for the case $\mathbb{F} = \mathbb{C}$. Let $f$ be a complex linear functional on $Y$ such that $|f(x)| \le p(x)$ for $x \in Y$. Then $u = \Re f$ is clearly real linear on $Y$ and $|u(x)| = |\Re f(x)| \le |f(x)| \le p(x)$. By the real case, there is a real-valued linear functional $U$ on $X$ such that $U(x) = u(x)$ for $x \in Y$ and $|U(x)| \le p(x)$ for $x \in X$. Now let $F(x) = U(x) - iU(ix)$. Then
$$\Im F(x) = -\Re\big(iF(x)\big) = -U(ix); \qquad F(ix) = U(ix) - iU(-x) = U(ix) + iU(x) = iF(x).$$
Consequently, $F$ is a complex linear extension to $X$ of $f(x) = u(x) - iu(ix)$. Furthermore, set
$$\alpha = \operatorname{sgn} F(x) = \begin{cases} \exp\big(-i \arg F(x)\big), & F(x) \ne 0, \\ 0, & F(x) = 0. \end{cases}$$
Then $|F(x)| = \alpha F(x) = F(\alpha x) = U(\alpha x) \le p(\alpha x) \le p(x)$. Here $F(\alpha x) = U(\alpha x)$ because $F(\alpha x)$ is real.
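The one-step extension in the real case can be traced in a small example of our own choosing, not taken from the text:

- $X = \mathbb{R}^2$ and $p$ the $(2, 1)$-norm;
- $Y_0 = \{(t, 0) : t \in \mathbb{R}\}$ with $g_0(t, 0) = t$;
- $y_1 = (0, 1)$.

By hand, $a = -1$ and $b = 1$. So every $c \in [-1, 1]$ gives an extension $g_1(x_1, x_2) = x_1 + c x_2$ dominated by $p$. The Python sketch below approximates $a$ and $b$ on a finite grid of $t$. A grid only approximates a sup or inf in general, but here both are attained at $t = 0$, which the grid contains. The sketch then spot-checks the domination $g_1 \le p$.

```python
import random

def p(v):
    """Sublinear functional on R^2: here the (2, 1)-norm."""
    return abs(v[0]) + abs(v[1])

def g0(t):
    """g0 on Y0 = {(t, 0)}: g0(t, 0) = t, so g0 <= p on Y0."""
    return t

# y1 = (0, 1) lies outside Y0.  Grid approximations of the bounds a <= b from the proof;
# the grid uses multiples of 1/4, so the arithmetic below is exact in floating point.
grid = [k / 4 for k in range(-40, 41)]
a = max(-p((-t, -1.0)) - g0(t) for t in grid)  # sup over x = (t, 0) of -p(-y1 - x) - g0(x)
b = min(p((t, 1.0)) - g0(t) for t in grid)     # inf over y = (t, 0) of  p(y + y1) - g0(y)
print(a, b)  # -1.0 1.0

def g1(v, c):
    """Extension g1(y + lam*y1) = g0(y) + lam*c, i.e. g1(v) = v[0] + c*v[1]."""
    return g0(v[0]) + v[1] * c

rng = random.Random(3)
samples = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(2000)]
for c in (a, 0.0, b):  # any c in [a, b] gives an extension dominated by p
    assert all(g1(v, c) <= p(v) + 1e-12 for v in samples)
print(g1((0.0, 1.0), 1.5) > p((0.0, 1.0)))  # True: c = 1.5 lies outside [a, b]
```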


Here is a series of important and interesting consequences of the Hahn-Banach extension theorem applied to normed linear spaces.

Corollary 8.4.4. Let $(X, \|\cdot\|_X)$ be a normed linear space over $\mathbb{F}$ and $Y$ a linear subspace of $X$.

(i) To any $f \in Y^*$ there corresponds an $F \in X^*$ such that $\|F\| = \|f\|$ and $F(y) = f(y)$ for all $y \in Y$.

(ii) If $x_0 \in X$ satisfies $\inf_{y \in Y} \|y - x_0\|_X = d > 0$, then there is an $F \in X^*$ such that $F(x_0) = 1$, $\|F\| = d^{-1}$ and $F(y) = 0$ for all $y \in Y$.

(iii) If $Y$ is not dense in $X$, then there is a nonzero $F \in X^*$ such that $F(y) = 0$ for all $y \in Y$.

(iv) If $x \ne 0$ in $X$, then there is an $F \in X^*$ such that $\|F\| = 1$ and $F(x) = \|x\|_X$.

(v) If $y, z \in X$ and $y \ne z$, then there exists an $F \in X^*$ such that $F(y) \ne F(z)$.

(vi) For any $x \in X$,
$$\|x\|_X = \sup_{0 \ne F \in X^*} |F(x)| \|F\|^{-1} = \sup_{F \in X^*,\, \|F\| = 1} |F(x)|.$$

(vii) If $F \in X^*$ is nonzero and $N = \{x \in X : F(x) = 0\}$, then there exists a one-dimensional subspace $M$ of $X$ such that $X = N + M = \{x + y : x \in N \text{ and } y \in M\}$ and $N \cap M = \{0\}$.

(viii) If $X^*$ is separable, so is $X$.

(ix) Let $Y$ be closed and $Y^\perp = \{f \in X^* : f(y) = 0 \text{ for all } y \in Y\}$. Then $Y^* = X^*/Y^\perp$ and $(X/Y)^* = Y^\perp$, in the sense that there exist isometries between the corresponding normed spaces.

(x) If $X$ is reflexive and $Y$ is closed, then $Y$ is reflexive.

Proof. (i) Given $f \in Y^*$, let $p(x) = \|f\|\|x\|_X$ for all $x \in X$. Then $|f(x)| \le \|f\|\|x\|_X = p(x)$ for all $x \in Y$. Hence, by Theorem 8.4.3, there exists a linear extension $F$ of $f$ to $X$ with $|F| \le p$ on $X$. In particular, $F \in X^*$ and $\|F\| \le \|f\|$. The equality $\|F\| = \|f\|$ now follows from
$$\|f\| = \sup_{y \in Y,\, \|y\|_X = 1} |f(y)| = \sup_{y \in Y,\, \|y\|_X = 1} |F(y)| \le \|F\|.$$


Jie Xiao

(ii) Let $Y_1$ be the linear space spanned by $Y$ and $x_0$. Since $d>0$, we conclude that $x_0\notin Y$ and every point $x\in Y_1$ may be written uniquely as $x=y+\lambda x_0$ where $y\in Y$ and $\lambda\in\mathbb{F}$. Define a linear functional $f$ on $Y_1$ by $f(y+\lambda x_0)=\lambda$. Then $f(y)=0$ for $y\in Y$ and $f(x_0)=1$. If $\lambda\neq0$ and $x=y+\lambda x_0$, then
$$\|x\|_X=\|y+\lambda x_0\|_X=|\lambda|\,\|\lambda^{-1}y+x_0\|_X\ge|\lambda|d=|f(x)|d,$$
and hence $f\in Y_1^*$ with $\|f\|\le d^{-1}$. Pick a sequence $(y_n)_{n=1}^\infty$ in $Y$ with $\|x_0-y_n\|_X\to d$. Then $1=f(x_0-y_n)\le\|f\|\|x_0-y_n\|_X\to d\|f\|$, so $\|f\|\ge d^{-1}$. Therefore $\|f\|=d^{-1}$. Accordingly, a direct application of (i), with $Y_1$ in place of $Y$, produces an $F\in X^*$ such that $F(x)=f(x)$ for $x\in Y_1$ and $\|F\|=\|f\|$, as desired.

(iii) Since $Y$ is not dense in $X$, there is an $x_0\in X$ such that $\inf_{y\in Y}\|y-x_0\|_X=d>0$. An application of (ii) yields (iii).

(iv) Apply (ii) with $Y=\{0\}$ and $x_0=x$ (so that $d=\|x\|_X$) to get an $f\in X^*$ such that $\|f\|=\|x\|_X^{-1}$ and $f(x)=1$. We may then take $F=\|x\|_Xf$.

(v) Apply (iv) to $x=y-z$.

(vi) If $x=0$, the assertion is trivial. So suppose $x\neq0$. The first equality follows by replacing $F$ with $F/\|F\|$. Moreover,
$$\sup_{F\in X^*,\ \|F\|=1}|F(x)|\le\|x\|_X.$$
Meanwhile, by (iv) there is an $f\in X^*$ such that $f(x)=\|x\|_X$ and $\|f\|=1$; thus
$$\sup_{F\in X^*,\ \|F\|=1}|F(x)|=\|x\|_X.$$

(vii) Since $F\neq0$, there is a point $x_0\neq0$ such that $F(x_0)=1$ (take any $x$ with $F(x)\neq0$ and rescale it). Any element $x\in X$ can then be written as $x=(x-\lambda x_0)+\lambda x_0$ with $\lambda=F(x)$, where $x-\lambda x_0\in N$ because $F(x-\lambda x_0)=F(x)-\lambda=0$. So, if $M=\{\lambda x_0:\lambda\in\mathbb{F}\}$, the one-dimensional space spanned by $x_0$, then $X=N+M$. If $x\in N\cap M$, then $x=\lambda x_0$ and $0=F(x)=\lambda F(x_0)=\lambda$, and hence $x=0$.

(viii) Suppose that $\{f_j\}_{j=1}^\infty$ is a dense subset of the unit sphere $\{f\in X^*:\|f\|=1\}$ and choose $x_j\in\{x\in X:\|x\|_X=1\}$ with $|f_j(x_j)|\ge2^{-1}$. Let $Y$ be the closed linear span of $\{x_j\}_{j=1}^\infty$; it is separable, since the finite linear combinations of the $x_j$ with coefficients whose real and imaginary parts are rational form a countable dense subset of $Y$. If $Y\neq X$, then (iii) gives a nonzero functional in $X^*$ vanishing on $Y$; after normalization we obtain an $f\in X^*$ such that $\|f\|=1$ and $f(x)=0$ for all $x\in Y$. Now if $\varepsilon\in(0,2^{-1})$, there exists $f_{j_0}\in\{f_j\}_{j=1}^\infty$ with $\|f-f_{j_0}\|<\varepsilon$, and since $x_{j_0}\in Y$,
$$2^{-1}\le|f_{j_0}(x_{j_0})|=|f(x_{j_0})-f_{j_0}(x_{j_0})|\le\|f-f_{j_0}\|<\varepsilon,$$
a contradiction. Accordingly, $Y=X$, i.e., $X$ is separable.

(ix) Suppose $f\in Y^*$. By (i) there is an $F\in X^*$ such that $F=f$ on $Y$ and $\|F\|=\|f\|$. Then $\phi(f)=F+Y^\perp$ is a well-defined element of $X^*/Y^\perp$, since any two such extensions differ by an element of $Y^\perp$. It is easy to prove that $\phi$ establishes an isometry between $Y^*$ and $X^*/Y^\perp$. As for the second isometry, we define $\psi(f)(x)=f(x+Y)$ for any $f\in(X/Y)^*$ and $x\in X$, and then find that $\psi:(X/Y)^*\to Y^\perp$ is an isometry; so $(X/Y)^*$ and $Y^\perp$ are isometric.

(x) According to (vi), every $x\in X$ defines a unique element $\tilde x$ of the second dual $X^{**}$ of $X$ via $\tilde x(L)=L(x)$ for $L\in X^*$, with $\|\tilde x\|=\|x\|_X$; consequently, the map $x\mapsto\tilde x$ is always a linear isometry from $X$ into $X^{**}$. Therefore, $X$ can be identified with $\tilde X=\{\tilde x: x\in X\}$ and thus regarded as a linear subspace of $X^{**}$. Since $X$ is reflexive, every functional in $X^{**}$ arises in this fashion; that is, the mapping $x\mapsto\tilde x$ is surjective. Suppose that $Y$ is closed. Then (ix) tells us that $Y^*=X^*/Y^\perp$, and applying (ix) once more to the closed subspace $Y^\perp$ of $X^*$ gives
$$Y^{**}=(X^*/Y^\perp)^*=(Y^\perp)^\perp=\{f\in X^{**}: f(Y^\perp)=\{0\}\}.$$
Since $X$ is viewed as a linear subspace of $X^{**}$, any element $y\in Y$ satisfies $\tilde y(f)=f(y)=0$ for all $f\in Y^\perp=\{f\in X^*: f(Y)=\{0\}\}$, and consequently $y$ is a member of $(Y^\perp)^\perp$. Meanwhile, suppose there is an element $y_0$ in $(Y^\perp)^\perp\setminus Y$; since $X$ is reflexive, $y_0$ may be regarded as an element of $X$. The closedness of $Y$ ensures $\inf_{y\in Y}\|y-y_0\|_X=\delta>0$, and hence by (ii) there exists an $f\in X^*$ such that $f(y_0)=1$, $f(y)=0$ for all $y\in Y$ (this implies $f\in Y^\perp$), and $\|f\|=\delta^{-1}$. But $y_0\in(Y^\perp)^\perp$ and $f\in Y^\perp$ yield $f(y_0)=0$, contradicting $f(y_0)=1$. Therefore $(Y^\perp)^\perp=Y$, and consequently $Y^{**}=Y$, as desired.
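Corollary 8.4.4 (iv) and (vi) can be made concrete on $\mathbb{R}^n$ with the $p$-norm, $1<p<\infty$. If functionals are identified with vectors $y$ acting by $F(v)=\sum_jv_jy_j$, with $\|F\|=\|y\|_q$ and $q=p(p-1)^{-1}$ (the finite-dimensional duality recorded in Theorem 9.3.2 and Remark 9.3.3), then the choice $y_j=\operatorname{sgn}(x_j)|x_j|^{p-1}\|x\|_p^{1-p}$ gives a norming functional for $x\neq0$. The Python sketch below checks $F(x)=\|x\|_p$ and $\|y\|_q=1$ numerically; the sample vector, exponent and helper names are hypothetical.

```python
import math
import random

def p_norm(x, p):
    """(sum |x_j|^p)^(1/p) on R^n for 1 <= p < infinity."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

def norming_functional(x, p):
    """Coefficients y of F(v) = sum_j v_j y_j with F(x) = ||x||_p and
    ||y||_q = 1, q = p/(p-1). Assumes x != 0 and p > 1."""
    nx = p_norm(x, p)
    return [math.copysign(abs(t) ** (p - 1), t) / nx ** (p - 1) for t in x]

p = 3.0
q = p / (p - 1)
x = [3.0, -4.0, 1.0]
y = norming_functional(x, p)

value = sum(a * b for a, b in zip(x, y))   # F(x); should equal ||x||_p
dual_norm = p_norm(y, q)                   # ||F|| = ||y||_q; should equal 1

# Hoelder gives |F(v)| <= ||y||_q ||v||_p = ||v||_p, so the sampled
# ratios |F(v)| / ||v||_p never exceed 1 (consistent with ||F|| = 1).
random.seed(1)
samples = [[random.uniform(-1.0, 1.0) for _ in x] for _ in range(1000)]
worst = max(abs(sum(a * b for a, b in zip(v, y))) / p_norm(v, p) for v in samples)
print(value, p_norm(x, p), dual_norm, worst)
```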

Problems

8.1. Let $p\in(1,\infty)$ and $q=p(p-1)^{-1}$. Prove Hölder's and Minkowski's inequalities in the following discrete forms:


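(i) For vectors $(x_j)_{j=1}^n$ and $(y_j)_{j=1}^n$ in $\mathbb{R}^n$, one has
$$\sum_{j=1}^n|x_jy_j|\le\Big(\sum_{j=1}^n|x_j|^p\Big)^{\frac1p}\Big(\sum_{j=1}^n|y_j|^q\Big)^{\frac1q}$$
and
$$\Big(\sum_{j=1}^n|x_j+y_j|^p\Big)^{\frac1p}\le\Big(\sum_{j=1}^n|x_j|^p\Big)^{\frac1p}+\Big(\sum_{j=1}^n|y_j|^p\Big)^{\frac1p}.$$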

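(ii) For real-valued sequences $(x_j)_{j=1}^\infty\in\ell_p$ and $(y_j)_{j=1}^\infty\in\ell_q$, one has
$$\sum_{j=1}^\infty|x_jy_j|\le\Big(\sum_{j=1}^\infty|x_j|^p\Big)^{\frac1p}\Big(\sum_{j=1}^\infty|y_j|^q\Big)^{\frac1q},$$
and for real-valued sequences $(x_j)_{j=1}^\infty,(y_j)_{j=1}^\infty\in\ell_p$, one has
$$\Big(\sum_{j=1}^\infty|x_j+y_j|^p\Big)^{\frac1p}\le\Big(\sum_{j=1}^\infty|x_j|^p\Big)^{\frac1p}+\Big(\sum_{j=1}^\infty|y_j|^p\Big)^{\frac1p}.$$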

8.2. Determine the dimensions of the following linear spaces:
(i) the set of vectors $x=(x_1,\dots,x_n)$ in $\mathbb{R}^n$ with $\sum_{j=1}^nx_j=0$;
(ii) the set $C[0,1]$ of all continuous functions $f:[0,1]\to\mathbb{R}$;
(iii) the set of all real-valued polynomials on $[0,1]$.

8.3. Prove the following two results:
(i) on $C[0,1]$ the sup-norm is not equivalent to any $p$-norm with $1\le p<\infty$;
(ii) if $C[0,1]$ is equipped with the sup-norm, then the unit sphere of $C[0,1]$ is not compact.

8.4. Prove that if $X$ is a linear space over $\mathbb{F}$, then any norm $\|\cdot\|:X\to\mathbb{R}$ is continuous on $X$, and that vector addition $X\times X\to X$ and scalar multiplication $\mathbb{F}\times X\to X$ are continuous when each product space is equipped with the sum of the norms of its factors, as in $\|(x,y)\|=\|x\|_X+\|y\|_Y$ for a product $X\times Y$ of normed linear spaces.

8.5. Recall that $\ell_\infty$ is the space of all sequences $x=(x_j)_{j=1}^\infty$ of real numbers with $\|x\|_\infty=\sup_{j\in\mathbb{N}}|x_j|<\infty$. Prove that if $\ell_0$ is the class of all elements $(x_j)_{j=1}^\infty\in\ell_\infty$ having only finitely many nonzero coordinates, then $\ell_0$ is not closed in $\ell_\infty$.

8.6. Prove that:
(i) $\lim_{p\to\infty}\|f\|_{p,m_{id},[0,1]}=\|f\|_{\infty,m_{id},[0,1]}$ whenever $\|f\|_{\infty,m_{id},[0,1]}<\infty$;
(ii) $\int_{[0,1]}|f_1f_2|\,dm_{id}\le\|f_1\|_{1,m_{id},[0,1]}\|f_2\|_{\infty,m_{id},[0,1]}$, with equality if and only if $|f_2(x)|=\|f_2\|_{\infty,m_{id},[0,1]}$ for $m_{id}$-a.e. $x$ in the set where $f_1(x)\neq0$.

8.7. Prove that if $C[0,1]$ is equipped with the sup-norm and $T:C[0,1]\to\mathbb{R}$ is given by $T(f)=f(0)$, then $T$ is bounded with $\|T\|=1$.

8.8. (i) Suppose that the real infinite matrix $[a_{kj}]$ satisfies $\sup_{k\in\mathbb{N}}\sum_{j=1}^\infty|a_{kj}|<\infty$. Define
$$T: x=(x_1,x_2,\dots)\mapsto y=T(x)=\Big(\sum_{j=1}^\infty a_{1j}x_j,\ \sum_{j=1}^\infty a_{2j}x_j,\ \dots\Big).$$
Prove that $T:\ell_\infty\to\ell_\infty$ is bounded and $\|T\|=\sup_{k\in\mathbb{N}}\sum_{j=1}^\infty|a_{kj}|$.
(ii) Let $T:\ell_2\to\ell_2$ be defined by $T(x)=(0,x_1,x_2,\dots)$ for $x=(x_1,x_2,\dots)\in\ell_2$. Prove $\|T(x)\|_2=\|x\|_2$.
(iii) For a finite interval $[a,b]$ in $\mathbb{R}$, let $R_1[a,b]$ be the space of all real-valued Riemann integrable functions $f$ on $[a,b]$ with $\|f\|_1=\int_a^b|f(x)|\,dx<\infty$. Define $T(f)(x)=\int_a^xf(t)\,dt$. Prove that $T$ is a bounded linear operator from $R_1[a,b]$ to itself with $\|T\|=b-a$.

8.9. Let $C(0,1)$ be the space of all real-valued continuous functions on $(0,1)$. Equip $C(0,1)$ with the 2-norm $\|f\|_2=\big(\int_0^1|f(t)|^2\,dt\big)^{\frac12}$. Define $T:C(0,1)\to C(0,1)$ by $T(f)(t)=tf(t)$ for $t\in(0,1)$. Prove that $T$ is bounded but not invertible.

8.10. Let $T:\mathbb{R}^2\to\mathbb{R}^2$ be given by $T(x_1,x_2)=(x_1,0)$. Prove that $T$ is linear and bounded, but not onto, and that it does not map open sets to open sets in $\mathbb{R}^2$.

8.11. For a normed linear space $X$ over $\mathbb{F}$, prove the following results:
(i) If $T\in\mathcal{B}(X)$ and $T^{-1}$ exists and belongs to $\mathcal{B}(X)$, then $(T^{-1})^n=(T^n)^{-1}$ holds for any $n\in\mathbb{N}$.


(ii) If $T,S\in\mathcal{B}(X)$ and both $TS$ and $ST$ have inverses in $\mathcal{B}(X)$, then $T$ and $S$ must have inverses in $\mathcal{B}(X)$.

8.12. Let $K$ be a convex set in a normed linear space $(X,\|\cdot\|_X)$ over $\mathbb{R}$; that is, $x_1,x_2\in K$ and $\lambda\in(0,1)$ imply $\lambda x_1+(1-\lambda)x_2\in K$. If $0$ is an interior point of $K$, then the Minkowski functional $p$ of $K$ is defined by $p(x)=\inf\{\lambda>0: x\in\lambda K\}$, where $\lambda K=\{\lambda x: x\in K\}$. Prove the following results:
(i) The Minkowski functional of the unit ball in $X$ is the norm $\|\cdot\|_X$.
(ii) The Minkowski functional $p$ is positively homogeneous, that is, $p(\kappa x)=\kappa p(x)$ for $\kappa>0$ and $x\in X$, and subadditive, that is, $p(x+y)\le p(x)+p(y)$ for all $x,y\in X$.
(iii) For each $x_0\notin K$ there exists a nonzero $L\in\mathcal{B}(X,\mathbb{R})$ such that $L(x)\le L(x_0)$ for $x\in K$.

8.13. Let $X$ be a real normed linear space. By a hyperplane in $X$ is meant a set of the form $H=\{x\in X: f(x)=r\}$, where $f$ is a nonzero continuous linear functional on $X$ and $r\in\mathbb{R}$. Prove the following separation theorem: if $E_1$ and $E_2$ are disjoint convex subsets of $X$ and $E_1$ has an interior point, then there is a hyperplane which separates $E_1$ and $E_2$; that is, there exist a nonzero $f\in\mathcal{B}(X,\mathbb{R})$ and an $r\in\mathbb{R}$ such that $E_1\subseteq\{x\in X: f(x)\le r\}$ and $E_2\subseteq\{x\in X: f(x)\ge r\}$.

8.14. Let $X$ be a normed linear space over $\mathbb{F}$ and $f:X\to\mathbb{F}$ a linear functional. Prove that $f\in\mathcal{B}(X,\mathbb{F})$ if and only if $f^{-1}(\{0\})$ is closed in $X$.

8.15. Let $X$ be a normed linear space over $\mathbb{F}$. Prove that if $M$ is a closed linear subspace of $X$ and $x\in X\setminus M$, then $M+\mathbb{F}x$ is closed.

8.16. If $M$ is a finite-dimensional subspace of a normed linear space $X$ over $\mathbb{F}$, prove that there is a closed linear subspace $N$ such that $M\cap N=\{0\}$ and $M+N=X$.


8.17. Krein's extension theorem. Let $X$ be a normed linear space over $\mathbb{R}$ and $Y\subseteq X$ such that if $y_1,y_2\in Y$ and $c\ge0$, then $y_1+y_2\in Y$ and $cy_1\in Y$. Define a partial order on $X$ by declaring that $x_1\preceq x_2$ if and only if $x_2-x_1\in Y$, and call a linear functional $f$ on $X$ $Y$-positive if $f(x)\ge0$ for $x\in Y$. Let $M$ be a subspace of $X$ such that for each $x\in X$ there exists $y\in M$ with $x\preceq y$. Prove that if $f$ is a $(Y\cap M)$-positive linear functional on $M$, then there is a $Y$-positive linear functional $F$ on $X$ whose restriction to $M$ is $f$.
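For Problem 8.1, a quick numerical sanity check (not a proof) can be useful before attempting the proofs. The Python sketch below tests the discrete Hölder and Minkowski inequalities on random real vectors; the sampling ranges, the rounding tolerance and the helper names are arbitrary assumptions.

```python
import random

def p_norm(v, p):
    """(sum |v_j|^p)^(1/p) for a finite real vector v."""
    return sum(abs(t) ** p for t in v) ** (1.0 / p)

def holds(lhs, rhs, rel_tol=1e-12):
    """lhs <= rhs up to floating-point rounding."""
    return lhs <= rhs * (1 + rel_tol) + rel_tol

def check(x, y, p):
    q = p / (p - 1)  # conjugate exponent q = p(p-1)^{-1}
    holder = holds(sum(abs(a * b) for a, b in zip(x, y)),
                   p_norm(x, p) * p_norm(y, q))
    minkowski = holds(p_norm([a + b for a, b in zip(x, y)], p),
                      p_norm(x, p) + p_norm(y, p))
    return holder and minkowski

random.seed(0)
trials = []
for _ in range(2000):
    n = random.randint(1, 8)
    p = random.uniform(1.1, 6.0)
    x = [random.uniform(-5, 5) for _ in range(n)]
    y = [random.uniform(-5, 5) for _ in range(n)]
    trials.append(check(x, y, p))
print(all(trials))
```

The relative tolerance only absorbs rounding in the equality cases (for instance $n=1$), where both sides agree mathematically.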

Chapter 9

Banach Spaces via Operators and Functionals

Banach spaces, named after Stefan Banach who investigated them, are among the central objects of study in linear functional analysis. Typical examples are infinite-dimensional spaces of functions or sequences. In this chapter we give the definition of a Banach space followed by some examples, and then study some basic algebraic and topological properties of Banach spaces, continuous linear functionals, and bounded and compact linear operators on Banach spaces.

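9.1 Definition and Beginning Examples

As before, $\mathbb{F}$ always means the real field $\mathbb{R}$ or the complex field $\mathbb{C}$.

Definition 9.1.1. A normed linear space over $\mathbb{F}$ which is complete in the metric generated by the norm is called a Banach space over $\mathbb{F}$.

Clearly, if a linear space is a Banach space under one norm, then it is also a Banach space under any equivalent norm, since equivalent norms have the same Cauchy sequences and the same convergent sequences. Hence, for each $n\in\mathbb{N}$, $\mathbb{R}^n$ is complete under any norm, because all norms on $\mathbb{R}^n$ are equivalent and $\mathbb{R}^n$ is complete under the Euclidean norm. Further beginning examples of Banach spaces are listed below.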


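Example 9.1.2. (i) For a compact metric space $(X,d_X)$, suppose that $C(X,\mathbb{F})$ is equipped with the sup-norm $\|f\|_\infty=\sup_{x\in X}|f(x)|$ for $f\in C(X,\mathbb{F})$. Then it is a Banach space over $\mathbb{F}$, because of Theorem 7.3.7 (i).

(ii) For each $p\in[1,\infty]$, let $\ell_p(\mathbb{N},\mathbb{F})$ be the space of all $\mathbb{F}$-valued sequences $x=(x^{(k)})_{k=1}^\infty$ satisfying $\|x\|_p<\infty$, where
$$\|x\|_p=\begin{cases}\Big(\sum_{k=1}^\infty|x^{(k)}|^p\Big)^{\frac1p}, & p\in[1,\infty),\\ \sup_{k\in\mathbb{N}}|x^{(k)}|, & p=\infty.\end{cases}$$
Then $(\ell_p(\mathbb{N},\mathbb{F}),\|\cdot\|_p)$ is a Banach space over $\mathbb{F}$. To see this, assume that $\{x_j\}_{j=1}^\infty$ is a Cauchy sequence in $\ell_p(\mathbb{N},\mathbb{F})$, and write $x_j=(x_j^{(1)},x_j^{(2)},\dots)$. Because $\|\cdot\|_p\ge\|\cdot\|_\infty$, for any $\varepsilon>0$ we may find $N\in\mathbb{N}$ such that $m,n>N$ implies $\|x_n-x_m\|_p<\varepsilon$, which in turn implies $\|x_n-x_m\|_\infty<\varepsilon$, so $|x_n^{(k)}-x_m^{(k)}|<\varepsilon$ for each $k$. In other words, if $\{x_j\}_{j=1}^\infty$ is a Cauchy sequence in $\ell_p(\mathbb{N},\mathbb{F})$, then for each $k$ the sequence $\{x_j^{(k)}\}_{j=1}^\infty$ is a Cauchy sequence in $\mathbb{F}$. Since $\mathbb{F}$ is complete, for each $k$ there is a $y^{(k)}\in\mathbb{F}$ with $|x_j^{(k)}-y^{(k)}|\to0$ as $j\to\infty$. Note that this coordinatewise convergence does not by itself imply $x_j\to y=(y^{(1)},y^{(2)},\dots)$ in $\ell_p(\mathbb{N},\mathbb{F})$. However, since $\{x_j\}_{j=1}^\infty$ is a Cauchy sequence in $\ell_p(\mathbb{N},\mathbb{F})$, it does. We prove this for $p<\infty$; the case $p=\infty$ is similar. Fix $\varepsilon>0$, and use the Cauchy criterion to find an $n_0\in\mathbb{N}$ such that
$$n,m>n_0\ \Rightarrow\ \sum_{k=1}^\infty|x_n^{(k)}-x_m^{(k)}|^p<\varepsilon.$$
Now fix $n>n_0$. For every $K\in\mathbb{N}$ we have $\sum_{k=1}^K|x_n^{(k)}-x_m^{(k)}|^p<\varepsilon$ whenever $m>n_0$; letting $m\to\infty$ and then $K\to\infty$ gives $\sum_{k=1}^\infty|x_n^{(k)}-y^{(k)}|^p\le\varepsilon$. This inequality means $\|x_n-y\|_p\le\varepsilon^{\frac1p}$; in particular $x_n-y\in\ell_p(\mathbb{N},\mathbb{F})$, so $y=x_n-(x_n-y)\in\ell_p(\mathbb{N},\mathbb{F})$ and $x_n\to y$ in $\ell_p(\mathbb{N},\mathbb{F})$.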
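The remark in Example 9.1.2 (ii) that coordinatewise convergence does not by itself give convergence in $\ell_p(\mathbb{N},\mathbb{F})$ can be seen with the unit vectors $e_j$: every fixed coordinate of $e_j$ tends to $0$, yet $\|e_j\|_p=1$ and $\|e_j-e_m\|_p=2^{1/p}$ for $j\neq m$, so $\{e_j\}$ is not even Cauchy. The Python sketch below checks this for $p=2$ on finite truncations of the sequences (an assumption made only so that the vectors fit in lists); the helper names are hypothetical.

```python
def lp_norm(x, p):
    """l_p norm of a finitely supported sequence stored as a list."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

def unit_vector(j, length):
    """First `length` coordinates of e_j (1-based index j)."""
    return [1.0 if k == j else 0.0 for k in range(1, length + 1)]

LENGTH, P, K = 100, 2.0, 5

# Coordinate K of e_j vanishes once j > K: coordinatewise, e_j -> 0.
tail_of_coordinate_K = [unit_vector(j, LENGTH)[K - 1] for j in range(K + 1, LENGTH + 1)]

# But ||e_j - 0||_p = 1 for every j, so e_j does not tend to 0 in l_p,
# and ||e_j - e_m||_p = 2^(1/p) for j != m, so {e_j} is not Cauchy.
norms = [lp_norm(unit_vector(j, LENGTH), P) for j in range(1, LENGTH + 1)]
gap = lp_norm([a - b for a, b in zip(unit_vector(3, LENGTH), unit_vector(7, LENGTH))], P)
print(max(tail_of_coordinate_K), min(norms), max(norms), gap)
```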

To give another important class of Banach spaces, we need one more definition and theorem.

Definition 9.1.3. For a sequence $\{x_j\}_{j=1}^\infty$ in a normed linear space $(X,\|\cdot\|_X)$ over $\mathbb{F}$, we say that the series $\sum_{j=1}^\infty x_j$ is absolutely convergent provided that $\sum_{j=1}^\infty\|x_j\|_X<\infty$. Moreover, the series $\sum_{j=1}^\infty x_j$ is said to converge to $x$ in $X$ provided that $\lim_{n\to\infty}\|\sum_{j=1}^nx_j-x\|_X=0$.


Theorem 9.1.4. Let $(X,\|\cdot\|_X)$ be a normed linear space over $\mathbb{F}$. Then it is a Banach space if and only if every absolutely convergent series $\sum_{j=1}^\infty x_j$ in $X$ converges.

Proof. On the one hand, suppose that $(X,\|\cdot\|_X)$ is a Banach space and that $\sum_{j=1}^\infty x_j$ is absolutely convergent. Consider the sequence of partial sums $s_k=\sum_{j=1}^kx_j$. Then
$$\|s_m-s_k\|_X\le\sum_{j=k+1}^m\|x_j\|_X\to0\quad\text{as }m>k\to\infty.$$
It follows that $\{s_m\}_{m=1}^\infty$ is a Cauchy sequence, and consequently this sequence converges since $X$ is complete. Thus $\sum_{j=1}^\infty x_j$ converges.

On the other hand, assume that every absolutely convergent series in $X$ converges, and let $\{x_j\}_{j=1}^\infty$ be a Cauchy sequence in $(X,\|\cdot\|_X)$. Then for each $k\in\mathbb{N}$ there is an $n_k\in\mathbb{N}$ such that $n,j\ge n_k$ implies $\|x_n-x_j\|_X<2^{-k}$. Without loss of generality, we may assume $n_{k+1}>n_k$, so that $\{x_{n_k}\}_{k=1}^\infty$ is a subsequence of $\{x_k\}_{k=1}^\infty$. Set $y_1=x_{n_1}$ and $y_k=x_{n_k}-x_{n_{k-1}}$ for $k\ge2$. Since $\|y_k\|_X<2^{1-k}$ for $k\ge2$,
$$\sum_{k=1}^l\|y_k\|_X\le\|y_1\|_X+\sum_{k=2}^l2^{1-k}<\|y_1\|_X+1.$$
So $\big(\sum_{k=1}^l\|y_k\|_X\big)_{l=1}^\infty$ is increasing and bounded, and hence convergent. By hypothesis, $\sum_{k=1}^\infty y_k$ is convergent. Since $\sum_{k=1}^ly_k=x_{n_l}$, it follows that $\{x_{n_l}\}_{l=1}^\infty$ is convergent in $(X,\|\cdot\|_X)$. This, together with the fact that $\{x_k\}_{k=1}^\infty$ is a Cauchy sequence, implies that $\{x_k\}_{k=1}^\infty$ is convergent. Therefore $(X,\|\cdot\|_X)$ is a Banach space.

Remark 9.1.5. It is worth pointing out that the property "absolute convergence implies convergence" fails in normed linear spaces that are not complete. For example, for $j\in\mathbb{N}$ and $p\in[1,\infty)$ let
$$h_j(x)=\begin{cases}0, & x\in[-1,-j^{-1}],\\ 1+jx, & x\in[-j^{-1},0],\\ 1, & x\in[0,1],\end{cases}\qquad f_j=h_{j+1}-h_j.$$
Then $f_j\in C([-1,1],\mathbb{R})$ and
$$\sum_{j=1}^\infty\|f_j\|_{p,m_{id},[-1,1]}=\sum_{j=1}^\infty(j+1)^{-1}\big(j(p+1)\big)^{-\frac1p}<\infty,$$
but $\sum_{j=1}^\infty f_j=\lim_{n\to\infty}(h_{n+1}-h_1)$ with respect to $\|\cdot\|_{p,m_{id},[-1,1]}$ lies outside $C([-1,1],\mathbb{R})$, since $\lim_{n\to\infty}\|h_{n+1}-1_{[0,1]}\|_{p,m_{id},[-1,1]}=0$ and $1_{[0,1]}-h_1$ agrees $m_{id}$-a.e. with no continuous function. This example indicates that $C([-1,1],\mathbb{R})$ is not complete under $\|\cdot\|_{p,m_{id},[-1,1]}$. Actually, its completion is $LRS^p_{id}([-1,1],\mathbb{R})$.

Example 9.1.6. Let $LRS^p_g(E,\mathbb{F})$ be as in Example 8.1.14. Then each $LRS^p_g(E,\mathbb{F})$ is a Banach space whenever $p\in[1,\infty]$. In the case $p\in[1,\infty)$, the result can be demonstrated via Theorem 9.1.4. As for $p=\infty$, suppose that $\{f_j\}_{j=1}^\infty$ is a Cauchy sequence in $LRS^\infty_g(E,\mathbb{F})$. By replacing each $f_j$ with an equivalent function (that is, by modifying the $f_j$ on a countable union of $m_g$-null sets), we may assume $|f_j(x)-f_k(x)|\le\|f_j-f_k\|_{\infty,m_g,E}$ for all $x\in E$ and $j,k\in\mathbb{N}$. Then $\{f_j(x)\}_{j=1}^\infty$ is a Cauchy sequence in $\mathbb{F}$ for each $x\in E$, and we thus get an $\mathbb{F}$-valued function $f$ on $E$ such that $\lim_{j\to\infty}f_j(x)=f(x)$ for each $x\in E$. Letting $j\to\infty$ in the last inequality gives $\sup_{x\in E}|f(x)-f_k(x)|\le\sup_{j\ge k}\|f_j-f_k\|_{\infty,m_g,E}$, which in turn yields $\lim_{k\to\infty}\|f-f_k\|_{\infty,m_g,E}=0$.

Interestingly, Theorem 9.1.4 also leads to a criterion for a quotient space to be complete.

Theorem 9.1.7. Let $(X,\|\cdot\|_X)$ be a Banach space over $\mathbb{F}$ and $Y$ a linear subspace of $X$. Then $X/Y$ is a Banach space under the norm $\|x+Y\|_{X/Y}=\inf_{z\in x+Y}\|z\|_X$ if and only if $Y$ is closed.

Proof. If $Y$ is not closed, then Theorem 8.1.13 tells us that $\|x+Y\|_{X/Y}$ is not a norm. So it remains to prove that if $Y$ is closed, then $X/Y$ is complete. To do so, suppose that $\sum_{k=1}^\infty(x_k+Y)$ is an absolutely convergent series in $X/Y$. For each $k\in\mathbb{N}$ choose $z_k\in x_k+Y$ such that $\|z_k\|_X\le\|x_k+Y\|_{X/Y}+2^{-k}$. Then $\sum_{k=1}^\infty z_k$ is absolutely convergent, and since $X$ is a Banach space, it converges to some $x\in X$ thanks to Theorem 9.1.4. Since $\sum_{k=1}^n(x_k+Y)=\sum_{k=1}^nz_k+Y$, we have
$$\Big\|x+Y-\sum_{k=1}^n(x_k+Y)\Big\|_{X/Y}\le\Big\|x-\sum_{k=1}^nz_k\Big\|_X\to0\quad\text{as }n\to\infty.$$
So $\sum_{k=1}^\infty(x_k+Y)$ converges to $x+Y\in X/Y$. Namely, $X/Y$ is complete due to Theorem 9.1.4.

In many situations it makes sense to multiply elements of a Banach space together.


Definition 9.1.8. Let $(X,\|\cdot\|_X)$ be a Banach space over $\mathbb{F}$. If there is a multiplication $(x,y)\mapsto xy$ from $X\times X$ to $X$ such that for any $x,y,z\in X$ and $\alpha\in\mathbb{F}$,
(i) $x(yz)=(xy)z$;
(ii) $x(y+z)=xy+xz$;
(iii) $(x+y)z=xz+yz$;
(iv) $\alpha(xy)=(\alpha x)y=x(\alpha y)$;
(v) $\|xy\|_X\le\|x\|_X\|y\|_X$,
then $X$ is called a Banach algebra.

Example 9.1.9. (i) $(C[0,1],\|\cdot\|_\infty)$ is a Banach algebra with $(fg)(x)=f(x)g(x)$.

(ii) $LRS^1_{id}(\mathbb{R},\mathbb{F})$ is a Banach algebra under the convolution
$$f_1*f_2(x)=\int_{\mathbb{R}}f_1(x-t)f_2(t)\,dm_{id}(t).$$
Here, we should note that for $f\in LRS^1_{id}(\mathbb{R},\mathbb{F})$,
$$\int_{\mathbb{R}}f\,dm_{id}=\int_{\mathbb{R}}\Re f\,dm_{id}+i\int_{\mathbb{R}}\Im f\,dm_{id}.$$

(iii) If $X=\mathbb{R}^n$, $n\in\mathbb{N}$, then by choosing a basis for $\mathbb{R}^n$ we may identify $\mathcal{B}(\mathbb{R}^n)$ with the space of $n\times n$ real matrices, which is a Banach algebra under the usual matrix multiplication (with the operator norm).

(iv) If $(X,\|\cdot\|_X)$ is a Banach space over $\mathbb{F}$, then $(\mathcal{B}(X),\|\cdot\|)$ is a Banach algebra under the composition $ST=S\circ T$.

Beyond Example 9.1.9 (iv), we can derive more information on the bounded linear operators on a given Banach space.

Theorem 9.1.10. Let $(X,\|\cdot\|_X)$ be a Banach space over $\mathbb{F}$.

(i) If $T\in\mathcal{B}(X)$ satisfies $\|T\|<1$ and $I$ is the identity element in $\mathcal{B}(X)$, then $I-T\in\mathcal{B}(X)$ is invertible and
$$(I-T)^{-1}=\lim_{n\to\infty}(I+T+T^2+\cdots+T^n)\quad\text{in }(\mathcal{B}(X),\|\cdot\|),$$
where $T^k=\underbrace{T\circ\cdots\circ T}_{k}$ is the $k$-th composition of $T$ for $k\in\mathbb{N}$.


(ii) If $\mathcal{I}$ stands for the class of all invertible operators in $\mathcal{B}(X)$, then $\mathcal{I}$ is an open set in $\mathcal{B}(X)$.

Proof. (i) Fix $x\in X$. Since $m,n\in\mathbb{N}$ and $m>n$ imply
$$\begin{aligned}\|(I+T+\cdots+T^m)(x)-(I+T+\cdots+T^n)(x)\|_X&=\|T^{n+1}(x)+\cdots+T^m(x)\|_X\\&\le\|T^{n+1}(x)\|_X+\cdots+\|T^m(x)\|_X\\&\le\big(\|T^{n+1}\|+\cdots+\|T^m\|\big)\|x\|_X\\&\le\Big(\sum_{j=n+1}^\infty\|T\|^j\Big)\|x\|_X\\&=\|T\|^{n+1}(1-\|T\|)^{-1}\|x\|_X\to0\quad\text{as }n\to\infty,\end{aligned}$$
we conclude that $\{(I+T+\cdots+T^n)(x)\}_{n=1}^\infty$ is a Cauchy sequence in $X$. Since $(X,\|\cdot\|_X)$ is a Banach space, the sequence converges to a limit $y\in X$. Let $y=A(x)$. It is not hard to show that $A:X\to X$ is a linear operator on $X$. Furthermore, letting $m\to\infty$, we have
$$\|A(x)-(I+T+\cdots+T^n)(x)\|_X\le\|T\|^{n+1}(1-\|T\|)^{-1}\|x\|_X,$$
so that $A-(I+T+\cdots+T^n)\in\mathcal{B}(X)$, and thus $A\in\mathcal{B}(X)$. Because $\|A-(I+T+\cdots+T^n)\|\le\|T\|^{n+1}(1-\|T\|)^{-1}\to0$ as $n\to\infty$, we derive that $I+T+\cdots+T^n\to A$ in $\mathcal{B}(X)$ as $n\to\infty$. It remains to verify $A=(I-T)^{-1}$. For any $x\in X$, the continuity of $I-T$ gives
$$\big((I-T)A\big)(x)=(I-T)\Big(\lim_{n\to\infty}(I+T+\cdots+T^n)(x)\Big)=\lim_{n\to\infty}(I-T)\big(x+T(x)+\cdots+T^n(x)\big)=\lim_{n\to\infty}\big(x-T^{n+1}(x)\big).$$
But $\|T^{n+1}(x)\|_X\le\|T\|^{n+1}\|x\|_X\to0$ as $n\to\infty$, so that $T^{n+1}(x)\to0$, and consequently $(I-T)A(x)=x$. Similarly, $A(I-T)(x)=x$, whence $A=(I-T)^{-1}$.


(ii) Let $T\in\mathcal{I}$ and note that $\|T^{-1}\|\neq0$. To prove that the open ball
$$\{S\in\mathcal{B}(X):\|T-S\|<\|T^{-1}\|^{-1}\}$$
is a subset of $\mathcal{I}$, we just show that every element $S$ of this ball is invertible. Using $\|(T-S)T^{-1}\|\le\|T-S\|\|T^{-1}\|<1$ and (i), we obtain that $ST^{-1}=I-(T-S)T^{-1}$ is invertible, and thus $S=(ST^{-1})T$ is invertible.
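Theorem 9.1.10 (i) can be illustrated in the finite-dimensional Banach space $(\mathbb{R}^2,\|\cdot\|_\infty)$, whose operator norm is the maximal absolute row sum of a matrix. The Python sketch below sums the Neumann series $I+T+\cdots+T^n$ for a sample matrix with $\|T\|=1/2$ and checks both the limit $(I-T)^{-1}$ and the error bound $\|T\|^{n+1}(1-\|T\|)^{-1}$ from the proof; the matrix and the helper names are illustrative assumptions.

```python
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def mat_add(A, B, sign=1.0):
    """A + sign * B, entrywise."""
    return [[a + sign * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def op_norm_inf(A):
    """Operator norm on (R^n, sup-norm): maximal absolute row sum."""
    return max(sum(abs(a) for a in row) for row in A)

def neumann_partial_sum(T, n):
    """I + T + T^2 + ... + T^n."""
    total, power = identity(len(T)), identity(len(T))
    for _ in range(n):
        power = mat_mul(power, T)
        total = mat_add(total, power)
    return total

T = [[0.2, 0.3], [0.1, 0.4]]                       # ||T|| = 0.5 < 1
I_minus_T = mat_add(identity(2), T, sign=-1.0)
# (I - T)^{-1} by Cramer's rule: det(I - T) = 0.8 * 0.6 - 0.3 * 0.1 = 0.45.
exact_inverse = [[0.6 / 0.45, 0.3 / 0.45], [0.1 / 0.45, 0.8 / 0.45]]

A = neumann_partial_sum(T, 60)
residual = op_norm_inf(mat_add(mat_mul(I_minus_T, A), identity(2), sign=-1.0))

n, t = 5, op_norm_inf(T)
error_n = op_norm_inf(mat_add(exact_inverse, neumann_partial_sum(T, n), sign=-1.0))
bound_n = t ** (n + 1) / (1 - t)
print(residual, error_n, bound_n)
```

For this particular $T$ the error bound is attained, because $T$ has nonnegative entries and equal row sums, so $\|T^k\|=\|T\|^k$ for every $k$.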

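Remark 9.1.11. When $(X,\|\cdot\|_X)$ is a Banach space over $\mathbb{F}$ and $T$ belongs to $\mathcal{B}(X)$, we can define an operator
$$e^T=I+T+\frac{T^2}{2!}+\frac{T^3}{3!}+\cdots,$$
which makes sense since the series is absolutely convergent in the Banach space $\mathcal{B}(X)$ (cf. Example 9.1.9 (iv) and Theorem 9.1.4):
$$\|e^T\|\le1+\|T\|+\frac{\|T\|^2}{2!}+\frac{\|T\|^3}{3!}+\cdots=\exp\|T\|.$$
With this notion, we have that if $x(t)=(x_1(t),x_2(t),\dots,x_n(t))\in\mathbb{R}^n$, then the system
$$\frac{dx}{dt}=\Big(\frac{dx_1(t)}{dt},\frac{dx_2(t)}{dt},\dots,\frac{dx_n(t)}{dt}\Big)=x(t)A\quad\text{subject to }x(0)=a\in\mathbb{R}^n,$$
where $A$ is an $n\times n$ matrix, has a solution $x(t)=ae^{tA}$.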
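Remark 9.1.11 can be checked numerically: truncating the series for $e^{tA}$ and setting $x(t)=ae^{tA}$, with $a$ a row vector, should give $x'(t)=x(t)A$. The Python sketch below does this for the sample matrix $A=\begin{pmatrix}0&1\\-1&0\end{pmatrix}$ and $a=(1,0)$, for which $ae^{tA}=(\cos t,\sin t)$; the number of series terms, the finite-difference step and the helper names are illustrative assumptions.

```python
import math

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def expm_series(A, t, terms=40):
    """Truncated series e^{tA} = I + tA + (tA)^2/2! + ... up to (tA)^terms/terms!."""
    n = len(A)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # (tA)^k / k! obtained from (tA)^(k-1) / (k-1)! by multiplying by tA / k
        term = [[entry * t / k for entry in row] for row in mat_mul(term, A)]
        result = [[r + s for r, s in zip(rr, ss)] for rr, ss in zip(result, term)]
    return result

def solution(a, A, t):
    """Row vector x(t) = a e^{tA}."""
    E = expm_series(A, t)
    return [sum(a[i] * E[i][j] for i in range(len(a))) for j in range(len(E[0]))]

A = [[0.0, 1.0], [-1.0, 0.0]]
a = [1.0, 0.0]

x1 = solution(a, A, 1.0)        # expected (cos 1, sin 1)

# Central-difference check of dx/dt = x(t) A at t = 0.7.
t, h = 0.7, 1e-5
x_plus, x_minus, x_mid = solution(a, A, t + h), solution(a, A, t - h), solution(a, A, t)
derivative = [(u - v) / (2 * h) for u, v in zip(x_plus, x_minus)]
rhs = [sum(x_mid[i] * A[i][j] for i in range(2)) for j in range(2)]
print(x1, derivative, rhs)
```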

9.2 Uniform Boundedness - Open Map - Closed Graph

In this section, we proceed to discuss three fundamental results: the uniform boundedness principle (sometimes known as the Banach-Steinhaus theorem after Stefan Banach and Hugo Dyonizy Steinhaus), the open mapping theorem, and the closed graph theorem. Below is the so-called uniform boundedness principle.

Theorem 9.2.1. Let $(X,\|\cdot\|_X)$ be a Banach space and $(Y,\|\cdot\|_Y)$ a normed linear space over $\mathbb{F}$. Let $\{T_\alpha\}_{\alpha\in I}$ be a family of bounded linear operators $T_\alpha:X\to Y$ indexed by the nonempty set $I$. If $\sup_{\alpha\in I}\|T_\alpha(x)\|_Y<\infty$ for each $x\in X$, then $\sup_{\alpha\in I}\|T_\alpha\|<\infty$.


Proof. For each $j\in\mathbb{N}$ let
$$X_j=\Big\{x\in X:\sup_{\alpha\in I}\|T_\alpha(x)\|_Y\le j\Big\}=\bigcap_{\alpha\in I}\{x\in X:\|T_\alpha(x)\|_Y\le j\}.$$
Then each $X_j$ is a closed subset of $X$ (an intersection of closed sets, by the continuity of each $T_\alpha$), and $X=\bigcup_{j=1}^\infty X_j$. Since $X$ is complete, $X$ is of second category. By the Baire category theorem (i.e., Theorem 6.3.17) we see that $X_{j_0}^\circ\neq\emptyset$ for some $j_0\in\mathbb{N}$; consequently, there are an $x_0\in X$ and an $r\in(0,\infty)$ such that the ball $B_r(x_0)\subseteq X_{j_0}$. This means that $\sup_{\alpha\in I}\|T_\alpha(x)\|_Y\le j_0$ for $x\in B_r(x_0)$. Consequently, for any $x$ in the origin-centered ball $B_r(0)$ we have $x_0-x\in B_r(x_0)$, and so for any $\alpha\in I$,
$$\|T_\alpha(x)\|_Y\le\|T_\alpha(x-x_0)\|_Y+\|T_\alpha(x_0)\|_Y\le2j_0.$$
Thus, for each $x\in B_1(0)$, applying this to $2^{-1}rx\in B_r(0)$ gives $\|T_\alpha(x)\|_Y\le4j_0r^{-1}$. This yields $\sup_{\alpha\in I}\|T_\alpha\|\le4j_0r^{-1}$, as required.

The boundedness of a sequence of linear operators is useful in the study of convergence of operator sequences. To see this, we introduce two types of convergence for operator sequences.

Definition 9.2.2. Let $(X,\|\cdot\|_X)$ and $(Y,\|\cdot\|_Y)$ be normed linear spaces over $\mathbb{F}$.

(i) A sequence $\{T_n\}_{n=1}^\infty$ in $\mathcal{B}(X,Y)$ is said to be uniformly convergent provided that there is a $T\in\mathcal{B}(X,Y)$ such that $\lim_{n\to\infty}\|T_n-T\|=0$.

(ii) A sequence $\{T_n\}_{n=1}^\infty$ in $\mathcal{B}(X,Y)$ is said to be strongly (or pointwise) convergent on $X$ provided that the sequence $\{T_n(x)\}_{n=1}^\infty$ is convergent in $Y$ for every $x\in X$. Moreover, if there is a $T\in\mathcal{B}(X,Y)$ such that $\lim_{n\to\infty}\|T_n(x)-T(x)\|_Y=0$ for all $x\in X$, then $\{T_n\}_{n=1}^\infty$ is said to be strongly convergent to $T$.

Example 9.2.3. Clearly, uniform convergence implies strong convergence, but not conversely. Consider $\ell_p(\mathbb{N},\mathbb{F})$, $p\in[1,\infty)$. For each $n\in\mathbb{N}$ define
$$T_n(x)=(x_n,x_{n+1},\dots)\quad\text{for all }x=(x_1,x_2,\dots)\in\ell_p(\mathbb{N},\mathbb{F}).$$
Then $T_n$ is in $\mathcal{B}(\ell_p(\mathbb{N},\mathbb{F}))$ with $\|T_n\|\le1$. Note that if $x=(x_1,x_2,\dots)\in\ell_p(\mathbb{N},\mathbb{F})$, then
$$\lim_{n\to\infty}\|T_n(x)\|_p=\lim_{n\to\infty}\Big(\sum_{j=n}^\infty|x_j|^p\Big)^{\frac1p}=0.$$


So $\{T_n\}_{n=1}^\infty$ is strongly convergent to $0$. Meanwhile, for $n\in\mathbb{N}$ let $e_n=(0,\dots,0,1,0,\dots)$, with $n-1$ zeros before the entry $1$. Then $\|e_n\|_p=1$ and $T_n(e_n)=(1,0,0,\dots)$, and hence $\|T_n\|\ge\|T_n(e_n)\|_p=1$. This shows that $\{T_n\}_{n=1}^\infty$ is not uniformly convergent to $0$.

Here is a special consequence of the uniform boundedness principle.

Theorem 9.2.4. Let $(X,\|\cdot\|_X)$ be a Banach space over $\mathbb{F}$.

(i) Suppose that $(Y,\|\cdot\|_Y)$ is a normed linear space over $\mathbb{F}$. If a sequence $\{T_n\}_{n=1}^\infty$ in $\mathcal{B}(X,Y)$ is strongly convergent, then there exists a $T\in\mathcal{B}(X,Y)$ such that $\{T_n\}_{n=1}^\infty$ is strongly convergent to $T$.

(ii) Suppose that $(Y,\|\cdot\|_Y)$ is a Banach space over $\mathbb{F}$. Then a sequence $\{T_n\}_{n=1}^\infty$ in $\mathcal{B}(X,Y)$ is strongly convergent if and only if $\sup_{n\in\mathbb{N}}\|T_n\|<\infty$ and $\{T_n(x)\}_{n=1}^\infty$ is convergent in $Y$ for every $x$ in a dense subset $D$ of $X$.

Proof. (i) Let $\{T_n\}_{n=1}^\infty$ in $\mathcal{B}(X,Y)$ be strongly convergent. Then for each $x\in X$ the sequence $\{T_n(x)\}_{n=1}^\infty$ is convergent in $Y$, and hence we can define a linear operator $T$ on $X$ by setting $T(x)=\lim_{n\to\infty}T_n(x)$. The key is to show that $T\in\mathcal{B}(X,Y)$. Note that $\{T_n(x)\}_{n=1}^\infty$ is bounded in $Y$ for each $x\in X$. So, by the uniform boundedness principle, there is a constant $C>0$ such that $\sup_{n\in\mathbb{N}}\|T_n\|\le C$. Hence $\|T_n(x)\|_Y\le C\|x\|_X$ for all $x\in X$ and $n\in\mathbb{N}$. This implies $\|T(x)\|_Y\le C\|x\|_X$ for all $x\in X$, showing that $T$ is bounded. The definition of $T$ means that $\{T_n\}_{n=1}^\infty$ converges strongly to $T$.

(ii) The necessity follows from the uniform boundedness principle. As for the sufficiency, assume that $\sup_{n\in\mathbb{N}}\|T_n\|<\infty$ and that $\{T_n(z)\}_{n=1}^\infty$ converges for every $z\in D$. Let $\varepsilon>0$ and $x\in X$. Then there is an element $z\in D$ such that $\|x-z\|_X<\varepsilon$, and there is an $N\in\mathbb{N}$ such that $m>n>N$ implies $\|T_m(z)-T_n(z)\|_Y<\varepsilon$. Consequently, for $m>n>N$,
$$\begin{aligned}\|T_m(x)-T_n(x)\|_Y&\le\|T_m(x)-T_m(z)\|_Y+\|T_m(z)-T_n(z)\|_Y+\|T_n(z)-T_n(x)\|_Y\\&\le\|T_m\|\|x-z\|_X+\varepsilon+\|T_n\|\|x-z\|_X\le\Big(1+2\sup_{n\in\mathbb{N}}\|T_n\|\Big)\varepsilon.\end{aligned}$$


That is to say, $\{T_n(x)\}_{n=1}^\infty$ is a Cauchy sequence in $Y$, and thus convergent thanks to the completeness of $(Y,\|\cdot\|_Y)$.

Recall that a continuous mapping between two normed linear spaces has the property that the pre-image of any open set is open, but in general the image of an open set need not be open. However, a continuous linear operator from a Banach space onto a Banach space does map open sets to open sets. This is the content of the open mapping theorem below.

Theorem 9.2.5. Let $(X,\|\cdot\|_X)$ and $(Y,\|\cdot\|_Y)$ be Banach spaces over $\mathbb{F}$. If $T$ is a continuous linear operator from $X$ onto $Y$, then $T$ is an open map, namely, $T(O)$ is open in $Y$ whenever $O$ is open in $X$.

Proof. We split the argument into three steps. In what follows, we employ $B^X_r(x)$ and $B^Y_r(y)$ to denote the open balls of radius $r>0$ centered at $x\in X$ and $y\in Y$, respectively. In particular, we write $B^X_r=B^X_r(0)$ and $B^Y_r=B^Y_r(0)$.

Step 1. We prove that for any $\varepsilon>0$ there is a $\delta>0$ such that $B^Y_\delta\subseteq\overline{T(B^X_{2\varepsilon})}$. To see this, note that
$$X=\bigcup_{n=1}^\infty nB^X_\varepsilon=\bigcup_{n=1}^\infty\{nx: x\in B^X_\varepsilon\},$$
and $T\in\mathcal{B}(X,Y)$ is surjective. So we have
$$Y=T(X)=\bigcup_{n=1}^\infty nT(B^X_\varepsilon)=\bigcup_{n=1}^\infty n\overline{T(B^X_\varepsilon)}.$$
Since $(Y,\|\cdot\|_Y)$ is a Banach space, we conclude from the Baire category theorem (i.e., Theorem 6.3.17) that some $n\overline{T(B^X_\varepsilon)}$ is not nowhere dense in $Y$, namely $\big(n\overline{T(B^X_\varepsilon)}\big)^\circ\neq\emptyset$, so that
$$B^Y_r(z)=\{y\in Y:\|y-z\|_Y<r\}\subseteq n\overline{T(B^X_\varepsilon)}=\{ny: y\in\overline{T(B^X_\varepsilon)}\}$$
for some $z\in Y$ and some $r>0$. Thus $\overline{T(B^X_\varepsilon)}$ must contain the ball $B^Y_\delta(y_0)$, where $y_0=n^{-1}z$ and $\delta=n^{-1}r$. It follows that
$$V=\{y_1-y_2: y_1,y_2\in B^Y_\delta(y_0)\}\subseteq\overline{T(U)},\quad\text{where }U=\{x_1-x_2: x_1,x_2\in B^X_\varepsilon\}\subseteq B^X_{2\varepsilon}.$$
Thus $\overline{T(B^X_{2\varepsilon})}\supseteq V$. Any point $y\in B^Y_\delta$ can be written as $y=(y+y_0)-y_0$ with $y+y_0,y_0\in B^Y_\delta(y_0)$, so $B^Y_\delta\subseteq V$, as desired.

Step 2. We further prove that for any $\varepsilon>0$ there is a $\delta>0$ such that $B^Y_\delta\subseteq T(B^X_{2\varepsilon})$, without the closure.


To do so, choose $\{\varepsilon_n\}_{n=1}^\infty$ with $\varepsilon_n>0$ and $\sum_{n=1}^\infty\varepsilon_n<\varepsilon$. By Step 1 there is a sequence of positive numbers $\{\delta_n\}_{n=1}^\infty$ such that $B^Y_{\delta_n}\subseteq\overline{T(B^X_{2\varepsilon_n})}$. Without loss of generality, we may assume $\lim_{n\to\infty}\delta_n=0$. Let $y\in B^Y_{\delta_1}$. Then $y\in\overline{T(B^X_{2\varepsilon_1})}$, and hence there is a point $x_1\in B^X_{2\varepsilon_1}$ such that $\|y-T(x_1)\|_Y<\delta_2$. Since $y-T(x_1)\in B^Y_{\delta_2}$, there is a point $x_2\in B^X_{2\varepsilon_2}$ such that $\|y-T(x_1)-T(x_2)\|_Y<\delta_3$. Continuing this process, we obtain a sequence $\{x_n\}_{n=1}^\infty$ such that $x_n\in B^X_{2\varepsilon_n}$ and
$$\Big\|y-T\Big(\sum_{k=1}^nx_k\Big)\Big\|_Y<\delta_{n+1}.$$
Since $\|x_n\|_X<2\varepsilon_n$, the series $\sum_{n=1}^\infty x_n$ is absolutely convergent, and hence by Theorem 9.1.4, $x=\sum_{n=1}^\infty x_n$ makes sense in $(X,\|\cdot\|_X)$. This implies
$$\|x\|_X\le\sum_{n=1}^\infty\|x_n\|_X\le2\sum_{n=1}^\infty\varepsilon_n<2\varepsilon.$$
Since the map $T$ is continuous and $\delta_n\to0$, we get $y=T(x)$. In other words, for any $y\in B^Y_\delta$ with $\delta=\delta_1$, we have found a point $x\in B^X_{2\varepsilon}$ such that $T(x)=y$, whence the desired inclusion.

Step 3. We prove that if $O$ is open in $X$, then for any point $x\in O$ there exists a $\delta>0$ such that $B^Y_\delta(T(x))\subseteq T(O)$. In fact, if $x\in O$, then there exists an $\varepsilon>0$ such that $B^X_{2\varepsilon}(x)\subseteq O$. By Step 2, we have $B^Y_\delta\subseteq T(B^X_{2\varepsilon})$ for some $\delta>0$. Hence
$$B^Y_\delta(T(x))=T(x)+B^Y_\delta\subseteq T(x)+T(B^X_{2\varepsilon})=T(x+B^X_{2\varepsilon})=T(B^X_{2\varepsilon}(x))\subseteq T(O).$$
This step completes the proof of the theorem.

Remark 9.2.6. The open mapping theorem cannot ensure that the image of a closed linear subspace of the domain is closed. For example, let $T:\ell_2(\mathbb{N},\mathbb{R})\to\ell_2(\mathbb{N},\mathbb{R})$ be given by $T(x)_j=x_{2j}$ for $x=\{x_j\}_{j=1}^\infty$, and let $S$ be the linear subspace of $\ell_2(\mathbb{N},\mathbb{R})$ comprising all vectors $x$ with $x_{2j}=x_{2j-1}(2j)^{-1}$, $j\in\mathbb{N}$. Note that $S$ is the intersection of the kernels $\{x\in\ell_2(\mathbb{N},\mathbb{R}): L_k(x)=0\}$ of the countable collection of bounded linear functionals $L_k(x)=x_{2k}-(2k)^{-1}x_{2k-1}$ on $\ell_2(\mathbb{N},\mathbb{R})$ (cf. Theorem 9.3.2). So $S$ is closed. But $T(S)$ is not closed: $T(S)$ contains all points with finitely many nonzero terms, which means $\overline{T(S)}=\ell_2(\mathbb{N},\mathbb{R})$, while $T(S)\neq\ell_2(\mathbb{N},\mathbb{R})$; for instance, $(j^{-1})_{j=1}^\infty\notin T(S)$, since a preimage in $S$ would need $x_{2j-1}=2$ for all $j$.

As an application of the open mapping theorem we establish a general property of inverse mappings.

Theorem 9.2.7. Let $X$ and $Y$ be Banach spaces over $\mathbb{F}$ and let $T\in\mathcal{B}(X,Y)$. If $T$ is bijective, then the inverse operator $T^{-1}$ of $T$ is a bounded linear map.

Proof. The linearity of $T^{-1}$ is clear, so it suffices to prove the continuity of $T^{-1}$. Since $T=(T^{-1})^{-1}$ maps the Banach space $X$ onto the Banach space $Y$, we conclude from Theorem 9.2.5 that $T$ maps open sets in $X$ to open sets in $Y$. This amounts to saying that $T^{-1}$ is continuous.

Corollary 9.2.8. Let $\|\cdot\|_{(1)}$ and $\|\cdot\|_{(2)}$ be two norms on a linear space $X$ over $\mathbb{F}$ such that $X$ is a Banach space under each of them. If there is a constant $C_1>0$ such that $\|x\|_{(1)}\le C_1\|x\|_{(2)}$ for all $x\in X$, then there exists another constant $C_2>0$ such that $\|x\|_{(2)}\le C_2\|x\|_{(1)}$ for all $x\in X$. Consequently, both norms are equivalent.

Proof. Consider the identity operator $I:(X,\|\cdot\|_{(2)})\to(X,\|\cdot\|_{(1)})$, where $I(x)=x$ for all $x\in X$. Clearly, $I$ is bounded. By Theorem 9.2.7, $I^{-1}$ is also bounded, whence the norm inequality in the other direction.

Definition 9.2.9. Given two normed linear spaces $X$ and $Y$ over $\mathbb{F}$, let $T:X\to Y$ be a linear operator. Then the graph of $T$ is defined as $G(T)=\{(x,y)\in X\times Y: y=T(x)\}$. Moreover, $T$ is said to be closed when $G(T)$ is a closed subset of $X\times Y$.

The following result is called the closed graph theorem.

Theorem 9.2.10. Let $(X,\|\cdot\|_X)$ and $(Y,\|\cdot\|_Y)$ be Banach spaces over $\mathbb{F}$. Then a linear operator $T:X\to Y$ is bounded if and only if $T$ is closed.


Proof. We initially observe that $X\times Y$ is a Banach space under the norm $\|(x,y)\|=\|x\|_X+\|y\|_Y$, where addition and scalar multiplication are defined in the expected manner: $(x_1,y_1)+(x_2,y_2)=(x_1+x_2,y_1+y_2)$ and $\alpha(x,y)=(\alpha x,\alpha y)$. The completeness of $(X\times Y,\|\cdot\|)$ follows readily from the completeness of $(X,\|\cdot\|_X)$ and $(Y,\|\cdot\|_Y)$.

On the one hand, suppose that $T:X\to Y$ is bounded. To prove that $T$ is closed, i.e., $G(T)$ is closed, we assume
$$\lim_{n\to\infty}\|x_n-x\|_X=0\quad\text{and}\quad\lim_{n\to\infty}\|T(x_n)-y\|_Y=0.$$
The boundedness of $T$ implies $\lim_{n\to\infty}\|T(x_n)-T(x)\|_Y=0$. These two limits plus the triangle inequality imply $T(x)=y$. This means $(x,y)\in G(T)$. Thus $T$ is closed.

On the other hand, suppose that $T$ is closed. Then $G(T)$ is a closed linear subspace of the Banach space $X\times Y$, and hence is itself a Banach space. To verify $T\in\mathcal{B}(X,Y)$, we consider the projection map $P$ from $G(T)$ onto $X$ given by $P(x,T(x))=x$. Clearly, $P$ is linear, bijective and bounded. From Theorem 9.2.7 it turns out that $P^{-1}$ is a bounded linear map from $X$ onto $G(T)$, so there is a constant $C>0$ such that
$$\|x\|_X+\|T(x)\|_Y=\|(x,T(x))\|=\|P^{-1}(x)\|\le C\|x\|_X\quad\text{for all }x\in X.$$
Consequently, $T$ is bounded.

Remark 9.2.11. It is worth mentioning that the second part of the argument for Theorem 9.2.10 has used the assumption that $(X,\|\cdot\|_X)$ and $(Y,\|\cdot\|_Y)$ are complete. When this assumption is removed, the conclusion can fail; that is, there exist a non-Banach space $(X_0,\|\cdot\|_{X_0})$ and a linear operator $T_0$ on it such that $T_0$ is closed but unbounded; see also Problem 9.8.

9.3 Dual Banach Spaces by Examples

When dealing with a normed linear space $X$ over the base field $\mathbb{F}$, one is typically interested only in the continuous linear functionals from the space into $\mathbb{F}$. According to Theorem 8.3.7 and Definition 8.4.1, these form a normed linear space. In this section, we will not only see that this new normed linear space is indeed a Banach space, but also completely describe the dual spaces of three typical Banach spaces. We begin with a more general result.


Theorem 9.3.1. Let $(X,\|\cdot\|_X)$ be a normed linear space over $\mathbb{F}$. If $(Y,\|\cdot\|_Y)$ is a Banach space over $\mathbb{F}$, then $\mathcal{B}(X,Y)$ is a Banach space over $\mathbb{F}$.

Proof. If $\{T_j\}_{j=1}^\infty$ is a Cauchy sequence in $\mathcal{B}(X,Y)$, then it is bounded, and so there is a constant $C>0$ such that $\|T_j(x)\|_Y\le C\|x\|_X$ for all $x\in X$ and $j\in\mathbb{N}$. Since
$$\|T_j(x)-T_k(x)\|_Y\le\|T_j-T_k\|\|x\|_X\to0\quad\text{as }j\ge k\to\infty,$$
$\{T_j(x)\}_{j=1}^\infty$ is a Cauchy sequence in $Y$. Moreover, since $Y$ is a Banach space, $\{T_j(x)\}_{j=1}^\infty$ converges to an element of $Y$, which we denote by $T(x)=\lim_{j\to\infty}T_j(x)$. Clearly, $T$ is linear, and $\|T(x)\|_Y\le C\|x\|_X$ for all $x\in X$. This means $T\in\mathcal{B}(X,Y)$. Note that we have not yet proved that $\{T_j\}_{j=1}^\infty$ converges to $T$ in norm. But, since $\{T_j\}_{j=1}^\infty$ is a Cauchy sequence, for every $\varepsilon>0$ there is an $n_0\in\mathbb{N}$ such that $j>k>n_0$ implies $\|T_j-T_k\|<\varepsilon$. Consequently, $j>k>n_0$ implies $\|T_j(x)-T_k(x)\|_Y\le\varepsilon\|x\|_X$ for all $x\in X$. Letting $j\to\infty$, we obtain
$$k>n_0\ \Rightarrow\ \|T(x)-T_k(x)\|_Y\le\varepsilon\|x\|_X\quad\text{for all }x\in X.$$
That is to say, $\|T_k-T\|\le\varepsilon$ for $k>n_0$. Thus $\|T_k-T\|\to0$ as $k\to\infty$.

Theorem 9.3.1 in particular says that the dual space $X^*=\mathcal{B}(X,\mathbb{F})$ of a normed linear space $X$ over $\mathbb{F}$ is always a Banach space, no matter whether $X$ is complete or not. To make this picture concrete, we will identify the dual spaces of three Banach spaces. Owing to their importance, we state them as three independent theorems. The first one is about the sequence spaces.

Theorem 9.3.2. Let $p\in[1,\infty)$, with the convention $p(p-1)^{-1}=\infty$ when $p=1$. Then $\ell_p(\mathbb{N},\mathbb{F})^*=\ell_{p(p-1)^{-1}}(\mathbb{N},\mathbb{F})$ in the sense that $L\in\ell_p(\mathbb{N},\mathbb{F})^*$ if and only if there is a unique $y=(y_k)_{k=1}^\infty\in\ell_{p(p-1)^{-1}}(\mathbb{N},\mathbb{F})$ such that
$$L(x)=\sum_{k=1}^\infty x_ky_k\quad\text{for all }x=(x_k)_{k=1}^\infty\in\ell_p(\mathbb{N},\mathbb{F}).$$

Proof. Since
$$\sum_{k=1}^\infty|x_ky_k|\le\|x\|_p\|y\|_{p(p-1)^{-1}},$$


Banach Spaces via Operators and Functionals

we have that the functional $L$ defined above is an element of $\ell^p(\mathbb{N},\mathbb{F})^*$. Conversely, assume $L\in\ell^p(\mathbb{N},\mathbb{F})^*$. To obtain the representation of $L$, we consider two cases.

Case 1: $p\in(1,\infty)$. Suppose that $x=(x_k)_{k=1}^{\infty}\in\ell^p(\mathbb{N},\mathbb{F})$. For each $j\in\mathbb{N}$ let $e_j$ be the vector in $\ell^p(\mathbb{N},\mathbb{F})$ whose $j$th entry equals $1$ and whose other entries equal $0$. Set $z_n=\sum_{j=1}^{n}x_je_j$. Then $z_n\in\ell^p(\mathbb{N},\mathbb{F})$ and
$$\lim_{n\to\infty}\|x-z_n\|_p=\lim_{n\to\infty}\Big(\sum_{j=n+1}^{\infty}|x_j|^p\Big)^{\frac1p}=0.$$
Consequently,
$$L(z_n)=\sum_{j=1}^{n}x_jL(e_j)$$
and $|L(x)-L(z_n)|=|L(x-z_n)|\le\|L\|\,\|x-z_n\|_p\to0$ as $n\to\infty$. Hence
$$L(x)=\sum_{j=1}^{\infty}x_jL(e_j).$$
Let $y_j=L(e_j)$ and $y=(y_j)_{j=1}^{\infty}$. Then
$$L(x)=\sum_{j=1}^{\infty}x_jy_j,$$

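and hence it remains to show $y\in\ell^{p(p-1)^{-1}}(\mathbb{N},\mathbb{F})$. To this end, we choose
$$x_j=\begin{cases}|y_j|^{p(p-1)^{-1}-2}\,\overline{y_j}, & y_j\neq0,\\ 0, & y_j=0,\end{cases}$$
so that $x_jy_j=|y_j|^{p(p-1)^{-1}}$ and $|x_j|^p=|y_j|^{p(p-1)^{-1}}$. With $z_n=\sum_{j=1}^{n}x_je_j$ for this choice of $x_j$, we get
$$\|z_n\|_p^p=\sum_{j=1}^{n}|x_j|^p=\sum_{j=1}^{n}|y_j|^{p(p-1)^{-1}}.$$
Moreover,
$$L(z_n)=\sum_{j=1}^{n}|y_j|^{p(p-1)^{-1}}\quad\text{and}\quad|L(z_n)|\le\|L\|\,\|z_n\|_p=\|L\|\Big(\sum_{j=1}^{n}|y_j|^{p(p-1)^{-1}}\Big)^{\frac1p}.$$
Hence
$$\Big(\sum_{j=1}^{n}|y_j|^{p(p-1)^{-1}}\Big)^{(p-1)p^{-1}}\le\|L\|.$$
Letting $n\to\infty$ yields $\|y\|_{p(p-1)^{-1}}\le\|L\|<\infty$, and so, for any $x\in\ell^p(\mathbb{N},\mathbb{F})$,
$$|L(x)|=\Big|\sum_{j=1}^{\infty}x_jy_j\Big|\le\|x\|_p\|y\|_{p(p-1)^{-1}}.$$
Accordingly, $\|L\|=\|y\|_{p(p-1)^{-1}}$. The uniqueness of $y$ is obvious, since necessarily $y_j=L(e_j)$.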

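Case 2: $p=1$. We follow the foregoing argument up to the definition of the vector $y$. To prove $y\in\ell^{\infty}(\mathbb{N},\mathbb{F})$, we observe that for each $j\in\mathbb{N}$,
$$|y_j|=|L(e_j)|\le\|L\|\,\|e_j\|_1=\|L\|.$$
Additionally, for any $x\in\ell^1(\mathbb{N},\mathbb{F})$,
$$|L(x)|=\Big|\sum_{j=1}^{\infty}x_jy_j\Big|\le\|y\|_{\infty}\|x\|_1.$$
Thus $\|L\|=\|y\|_{\infty}$. Of course, $y$ is uniquely determined.

Remark 9.3.3. (i) The argument for Theorem 9.3.2 also shows that $(\mathbb{F}^n)^*=\mathbb{F}^n$, i.e., $L\in(\mathbb{F}^n)^*$ if and only if there exists a unique $y=(y_1,\dots,y_n)\in\mathbb{F}^n$ such that
$$L(x)=\sum_{k=1}^{n}x_ky_k\quad\text{for all }x=(x_1,\dots,x_n)\in\mathbb{F}^n.$$

(ii) If $p\in(1,\infty)$, then $\ell^p(\mathbb{N},\mathbb{F})$ is reflexive. However, $\ell^1(\mathbb{N},\mathbb{F})$ is not reflexive. Since $\ell^1(\mathbb{N},\mathbb{F})^*=\ell^{\infty}(\mathbb{N},\mathbb{F})$, it is enough to prove $\ell^{\infty}(\mathbb{N},\mathbb{F})^*\neq\ell^1(\mathbb{N},\mathbb{F})$. Let $c_0(\mathbb{N},\mathbb{F})$ stand for the space of all vectors $(x_j)_{j=1}^{\infty}$ in $\ell^{\infty}(\mathbb{N},\mathbb{F})$ with $\lim_{j\to\infty}x_j=0$. Then $c_0(\mathbb{N},\mathbb{F})\neq\ell^{\infty}(\mathbb{N},\mathbb{F})$ and $c_0(\mathbb{N},\mathbb{F})^*=\ell^1(\mathbb{N},\mathbb{F})$, an equality that can be verified in the same way as Theorem 9.3.2 with $p=1$; hence $c_0(\mathbb{N},\mathbb{F})$ is not reflexive either. Now, each $y=(y_j)_{j=1}^{\infty}\in\ell^1(\mathbb{N},\mathbb{F})$ determines a bounded linear functional on $\ell^{\infty}(\mathbb{N},\mathbb{F})$ by
$$L_y(x)=\sum_{j=1}^{\infty}x_jy_j\quad\text{for }x=(x_j)_{j=1}^{\infty}\in\ell^{\infty}(\mathbb{N},\mathbb{F}),$$
with $\|L_y\|_{\ell^{\infty}(\mathbb{N},\mathbb{F})^*}\le\|y\|_1$. Since $c_0(\mathbb{N},\mathbb{F})$ is a closed subspace of $\ell^{\infty}(\mathbb{N},\mathbb{F})$ and the restriction of $L_y$ to $c_0(\mathbb{N},\mathbb{F})$ has norm $\|y\|_1$, we get
$$\|y\|_1\le\|L_y\|_{\ell^{\infty}(\mathbb{N},\mathbb{F})^*}\le\|y\|_1.$$
According to Corollary 8.4.4(ii), there is $0\not\equiv F\in\ell^{\infty}(\mathbb{N},\mathbb{F})^*$ such that $F(x)=0$ for all $x\in c_0(\mathbb{N},\mathbb{F})$. If $F=L_y$ for some $y\in\ell^1(\mathbb{N},\mathbb{F})$, then $y_j=L_y(e_j)=0$ for every $j$, because $e_j\in c_0(\mathbb{N},\mathbb{F})$; thus $y=0$ and $F\equiv0$, a contradiction. Hence $F\notin\ell^1(\mathbb{N},\mathbb{F})$; namely, there is no element $(y_j)_{j=1}^{\infty}\in\ell^1(\mathbb{N},\mathbb{F})$ satisfying
$$F(x)=\sum_{j=1}^{\infty}x_jy_j\quad\text{for all }x=(x_j)_{j=1}^{\infty}\in\ell^{\infty}(\mathbb{N},\mathbb{F}).$$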
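The extremal choice in Case 1 of the proof of Theorem 9.3.2 can be checked numerically in the finite-dimensional setting of Remark 9.3.3(i). The following Python sketch is illustrative only: the vector `y`, the exponent `p = 3`, and the helper names are hypothetical choices, not part of the text. It builds $x_j=|y_j|^{q-2}\overline{y_j}$ with $q=p(p-1)^{-1}$, compares $|L(x)|/\|x\|_p$ with $\|y\|_q$, and samples random vectors to spot-check the Hölder bound $|L(x)|\le\|x\|_p\|y\|_q$.

```python
import math
import random


def lp_norm(v, p):
    """l^p norm of a finite vector; p = math.inf gives the sup-norm."""
    if p == math.inf:
        return max(abs(t) for t in v)
    return sum(abs(t) ** p for t in v) ** (1.0 / p)


def conjugate_exponent(p):
    """Return p(p-1)^{-1}, read as infinity when p = 1."""
    return math.inf if p == 1 else p / (p - 1)


def functional(y, x):
    """L(x) = sum_k x_k y_k for finite vectors x and y of equal length."""
    return sum(a * b for a, b in zip(x, y))


def extremal_vector(y, p):
    """x_j = |y_j|^{q-2} conj(y_j), and x_j = 0 when y_j = 0 (valid for 1 < p < inf)."""
    q = conjugate_exponent(p)
    return [abs(t) ** (q - 2) * t.conjugate() if t != 0 else 0 for t in y]


y = [3 + 4j, -1, 0, 0.5j]  # hypothetical coefficients of L on F^4
p = 3.0
q = conjugate_exponent(p)

x = extremal_vector(y, p)
attained = abs(functional(y, x)) / lp_norm(x, p)  # should equal ||y||_q

rng = random.Random(0)
holder_ok = all(
    abs(functional(y, z)) <= lp_norm(z, p) * lp_norm(y, q) + 1e-12
    for z in (
        [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in y]
        for _ in range(1000)
    )
)
print(attained, lp_norm(y, q), holder_ok)
```

The conjugate $\overline{y_j}$ is what makes $x_jy_j=|y_j|^q$ a nonnegative real number when $\mathbb{F}=\mathbb{C}$; for real data it reduces to $y_j$.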

The second theorem characterizes the Banach dual of the space of all $\mathbb{F}$-valued continuous functions on a given finite closed interval in $\mathbb{R}$.

Theorem 9.3.4. For $-\infty<a<b<\infty$, equip $C([a,b],\mathbb{F})$ with the sup-norm
$$\|f\|_{\infty}=\max_{x\in[a,b]}|f(x)|,$$
and let $BV([a,b],\mathbb{F})$ be the class of all $\mathbb{F}$-valued functions of bounded variation on $[a,b]$. If
$$BV_0([a,b],\mathbb{F})=\Big\{g\in BV([a,b],\mathbb{F}):\int_a^bf(x)\,dg(x)=0\ \text{for all }f\in C([a,b],\mathbb{F})\Big\},$$
then $C([a,b],\mathbb{F})^*=BV([a,b],\mathbb{F})/BV_0([a,b],\mathbb{F})$ in the sense that $L\in C([a,b],\mathbb{F})^*$ if and only if there is a unique equivalence class $g+BV_0([a,b],\mathbb{F})\in BV([a,b],\mathbb{F})/BV_0([a,b],\mathbb{F})$ such that $L$ can be written as the Riemann-Stieltjes integral
$$L(f)=\int_a^bf(x)\,dg(x)\quad\text{for all }f\in C([a,b],\mathbb{F}).$$

Proof. Suppose first that $L$ has the stated form. We may write $g=g_1+ig_2$, where $g_1,g_2\in BV[a,b]$ are real-valued (with $g_2=0$ when $\mathbb{F}=\mathbb{R}$), i.e., $V_a^b(g_k)<\infty$ for $k=1,2$ in the sense of Definition 3.1.1. Consequently,
$$|L(f)|=\Big|\int_a^bf(x)\,d\big(g_1(x)+ig_2(x)\big)\Big|=\Big|\int_a^bf(x)\,dg_1(x)+i\int_a^bf(x)\,dg_2(x)\Big|\le\|f\|_{\infty}\big(V_a^b(g_1)+V_a^b(g_2)\big).$$

Thus, this functional $L$ belongs to $C([a,b],\mathbb{F})^*$.

Conversely, suppose that $L\in C([a,b],\mathbb{F})^*$, and let $E$ be a Hahn-Banach extension of $L$, with $\|E\|=\|L\|$, to the space $BD([a,b],\mathbb{F})$ of all functions $f:[a,b]\to\mathbb{F}$ satisfying $\|f\|_{\infty}=\sup\{|f(x)|:x\in[a,b]\}<\infty$. Put $\chi_a=0$ and $\chi_t=1_{[a,t]}$ for $t\in(a,b]$, and let $g(t)=E(\chi_t)$ for $t\in[a,b]$ (so that $g(a)=0$). For a partition $\{t_0,t_1,\dots,t_n\}$ of $[a,b]$ and each $k\in\{1,\dots,n\}$, let $\lambda_k$ be the sign of $g(t_k)-g(t_{k-1})$, i.e., the number in $\mathbb{F}$ with $|\lambda_k|=1$ obeying
$$|g(t_k)-g(t_{k-1})|=\lambda_k\big(g(t_k)-g(t_{k-1})\big).$$
Since $\chi_{t_k}-\chi_{t_{k-1}}$, $k=1,\dots,n$, are the indicators of the pairwise disjoint sets $[a,t_1],(t_1,t_2],\dots,(t_{n-1},t_n]$, we have the representation
$$1=\sum_{k=1}^{n}\big(\chi_{t_k}-\chi_{t_{k-1}}\big)\quad\text{on }[a,b],$$
and therefore
$$\sum_{k=1}^{n}|g(t_k)-g(t_{k-1})|=\sum_{k=1}^{n}\lambda_k\big(E(\chi_{t_k})-E(\chi_{t_{k-1}})\big)=E\Big(\sum_{k=1}^{n}\lambda_k(\chi_{t_k}-\chi_{t_{k-1}})\Big)\le\|E\|\,\Big\|\sum_{k=1}^{n}\lambda_k(\chi_{t_k}-\chi_{t_{k-1}})\Big\|_{\infty}=\|E\|.$$
This means that $g$ has bounded variation on $[a,b]$. Now let $f\in C([a,b],\mathbb{F})$, let $t_k=a+kn^{-1}(b-a)$ for $k=0,\dots,n$, and define $f_n\in BD([a,b],\mathbb{F})$ via
$$f_n(x)=\sum_{k=1}^{n}f(t_{k-1})\big(\chi_{t_k}(x)-\chi_{t_{k-1}}(x)\big)\quad\text{for all }x\in[a,b].$$

Now, it follows from the foregoing representation of $1$ and the uniform continuity of $f$ on $[a,b]$ that
$$\|f_n-f\|_{\infty}\le\max_{1\le k\le n}\ \sup_{x\in[t_{k-1},t_k]}|f(x)-f(t_{k-1})|\to0\quad\text{as }n\to\infty.$$
Since $E$ is continuous on $BD([a,b],\mathbb{F})$, it follows from the definition of the Riemann-Stieltjes integral that
$$L(f)=E(f)=\lim_{n\to\infty}E(f_n)=\lim_{n\to\infty}\sum_{k=1}^{n}f(t_{k-1})\big(g(t_k)-g(t_{k-1})\big)=\int_a^bf(x)\,dg(x),$$
as required. Finally, if $h+BV_0([a,b],\mathbb{F})\in BV([a,b],\mathbb{F})/BV_0([a,b],\mathbb{F})$ also satisfies
$$L(f)=\int_a^bf(x)\,dh(x)\quad\text{for all }f\in C([a,b],\mathbb{F}),$$
then
$$\int_a^bf(x)\,d\big(g(x)-h(x)\big)=0\quad\text{for all }f\in C([a,b],\mathbb{F}),$$
and hence $g-h\in BV_0([a,b],\mathbb{F})$, as desired.
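The proof of Theorem 9.3.4 rests on the Riemann-Stieltjes sums $\sum_{k=1}^nf(t_{k-1})\big(g(t_k)-g(t_{k-1})\big)$ over uniform partitions. The Python sketch below evaluates these sums on $[a,b]=[0,1]$ for three illustrative choices of $g$, all of them hypothetical examples rather than part of the text: $g(x)=x^2$, which gives $f\mapsto\int_0^1f(x)\,2x\,dx$; the step function $g=1_{[c,1]}$, which represents the point evaluation $f\mapsto f(c)$; and the function equal to $1$ at the single interior point $c$ and $0$ elsewhere, whose sums tend to $0$. That last $g$ has bounded variation but lies in $BV_0([0,1],\mathbb{F})$, which illustrates why the representing function is only determined modulo $BV_0$.

```python
import math


def rs_sum(f, g, a, b, n):
    """Left-endpoint Riemann-Stieltjes sum of f with respect to g on the
    uniform partition t_k = a + k(b - a)/n of [a, b]."""
    t = [a + k * (b - a) / n for k in range(n + 1)]
    return sum(f(t[k - 1]) * (g(t[k]) - g(t[k - 1])) for k in range(1, n + 1))


n = 10_000
c = 0.3  # hypothetical interior point

# g(x) = x^2: the sum approximates int_0^1 cos(x) 2x dx = 2(sin 1 + cos 1 - 1).
smooth = rs_sum(math.cos, lambda x: x * x, 0.0, 1.0, n)
smooth_exact = 2 * (math.sin(1) + math.cos(1) - 1)

# g = indicator of [c, 1]: the represented functional is evaluation at c.
step = rs_sum(math.cos, lambda x: 1.0 if x >= c else 0.0, 0.0, 1.0, n)

# g = indicator of {c}: bounded variation, yet its integrals against continuous f vanish.
spike = rs_sum(math.cos, lambda x: 1.0 if x == c else 0.0, 0.0, 1.0, n)

print(smooth - smooth_exact, step - math.cos(c), spike)
```

All three printed numbers are small and shrink as `n` grows, in line with the limit taken in the proof.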

To reach the third dual Banach space, we need the following inequality of Olof Hanner, which can be viewed as a refinement of the Minkowski inequality.

Theorem 9.3.5. Let $g:\mathbb{R}\to\mathbb{R}$ be increasing, $E\subseteq\mathbb{R}$ $m_g$-measurable, $p\in[1,\infty)$, and $f_1,f_2\in LRS_g^p(E,\mathbb{F})$.

(i) If $p\in[1,2]$, then
$$\|f_1+f_2\|_{p,m_g,E}^p+\|f_1-f_2\|_{p,m_g,E}^p\ge\big(\|f_1\|_{p,m_g,E}+\|f_2\|_{p,m_g,E}\big)^p+\big|\|f_1\|_{p,m_g,E}-\|f_2\|_{p,m_g,E}\big|^p$$
and
$$\big(\|f_1+f_2\|_{p,m_g,E}+\|f_1-f_2\|_{p,m_g,E}\big)^p+\big|\|f_1+f_2\|_{p,m_g,E}-\|f_1-f_2\|_{p,m_g,E}\big|^p\le2^p\big(\|f_1\|_{p,m_g,E}^p+\|f_2\|_{p,m_g,E}^p\big).$$

(ii) If $p\in[2,\infty)$, then
$$\|f_1+f_2\|_{p,m_g,E}^p+\|f_1-f_2\|_{p,m_g,E}^p\le\big(\|f_1\|_{p,m_g,E}+\|f_2\|_{p,m_g,E}\big)^p+\big|\|f_1\|_{p,m_g,E}-\|f_2\|_{p,m_g,E}\big|^p$$
and
$$\big(\|f_1+f_2\|_{p,m_g,E}+\|f_1-f_2\|_{p,m_g,E}\big)^p+\big|\|f_1+f_2\|_{p,m_g,E}-\|f_1-f_2\|_{p,m_g,E}\big|^p\ge2^p\big(\|f_1\|_{p,m_g,E}^p+\|f_2\|_{p,m_g,E}^p\big).$$
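Before the proof, the first inequalities of Theorem 9.3.5 can be spot-checked numerically in the special case where $m_g$ is a finite sum of point masses (for instance, when $g$ is a step function), so that $\|f\|_{p,m_g,E}^p=\sum_iw_i|f(s_i)|^p$. The Python sketch below draws random real-valued $f_1,f_2$ and random positive weights $w_i$ (all hypothetical test data; the helper names are also hypothetical) and records any sign violation of the difference between the two sides. It is a sanity check under these assumptions, not a proof.

```python
import random


def weighted_norm(f, w, p):
    """(sum_i w_i |f_i|^p)^(1/p): the L^p norm for point masses with weights w."""
    return sum(wi * abs(fi) ** p for fi, wi in zip(f, w)) ** (1.0 / p)


def hanner_gap(f1, f2, w, p):
    """Left side minus right side of the first inequality in Theorem 9.3.5.
    Expected >= 0 for p in [1, 2] and <= 0 for p in [2, inf)."""
    s = [a + b for a, b in zip(f1, f2)]
    d = [a - b for a, b in zip(f1, f2)]
    n1, n2 = weighted_norm(f1, w, p), weighted_norm(f2, w, p)
    return (weighted_norm(s, w, p) ** p + weighted_norm(d, w, p) ** p
            - (n1 + n2) ** p - abs(n1 - n2) ** p)


rng = random.Random(1)
violations = 0
for _ in range(2000):
    size = rng.randint(1, 6)
    w = [rng.uniform(0.1, 2.0) for _ in range(size)]
    f1 = [rng.uniform(-1.0, 1.0) for _ in range(size)]
    f2 = [rng.uniform(-1.0, 1.0) for _ in range(size)]
    for p in (1.0, 1.3, 1.7, 2.0):
        if hanner_gap(f1, f2, w, p) < -1e-9:
            violations += 1
    for p in (2.0, 2.5, 4.0):
        if hanner_gap(f1, f2, w, p) > 1e-9:
            violations += 1
print(violations)
```

The small tolerance `1e-9` absorbs floating-point rounding; at $p=2$ the gap is exactly zero in exact arithmetic (parallelogram law).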

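Proof. It suffices to verify the first inequalities in (i) and (ii), since the second ones follow from the first ones by replacing $f_1$ and $f_2$ with $f_1+f_2$ and $f_1-f_2$. The case $p=1$ follows from the triangle inequality and the case $p=2$ (where equality holds) from a direct computation, so it remains to check $p\neq1,2$. Both sides are symmetric in $f_1,f_2$ and homogeneous of degree $p$, and there is nothing to prove if $f_2=0$; hence we may assume $\|f_1\|_{p,m_g,E}=1$ and $0<\|f_2\|_{p,m_g,E}\le1$.

For $r\in(0,1]$ let
$$\varphi(r)=(1+r)^{p-1}+(1-r)^{p-1},\qquad\psi(r)=r^{1-p}\big((1+r)^{p-1}-(1-r)^{p-1}\big).$$
Fix $R\ge0$. If $R\le1$, a computation gives
$$\frac{d\varphi(r)}{dr}+\frac{d\psi(r)}{dr}R^p=(p-1)\big((1+r)^{p-2}-(1-r)^{p-2}\big)\Big(1-\frac{R^p}{r^p}\Big),$$
which on $(0,1)$ vanishes only at $r=R$; hence $\varphi(r)+\psi(r)R^p$ attains its maximum (when $p<2$) or its minimum (when $p>2$) at $r=R$, where it equals $(1+R)^p+(1-R)^p$. If $R>1$, then, since $\psi(r)\le\varphi(r)$ for $p<2$ and $\psi(r)\ge\varphi(r)$ for $p>2$,
$$\varphi(r)+\psi(r)R^p\ \begin{cases}\le\psi(r)+\varphi(r)R^p, & p<2,\\ \ge\psi(r)+\varphi(r)R^p, & p>2,\end{cases}$$
and applying the case $R\le1$ to $R^{-1}$ bounds the right-hand side $R^p\big(\varphi(r)+\psi(r)R^{-p}\big)$ by $R^p\big((1+R^{-1})^p+(1-R^{-1})^p\big)=(R+1)^p+(R-1)^p$, from above when $p<2$ and from below when $p>2$. Accordingly, for $a,b\in\mathbb{F}$ and $r\in(0,1]$ one gets
$$\varphi(r)|a|^p+\psi(r)|b|^p\ \begin{cases}\le|a+b|^p+|a-b|^p, & p<2,\\ \ge|a+b|^p+|a-b|^p, & p>2,\end{cases}$$
with equality when $r=a^{-1}b\le1$, $a>0$, $b\ge0$. This in turn implies
$$\int_E\big(\varphi(r)|f_1|^p+\psi(r)|f_2|^p\big)\,dm_g\ \begin{cases}\le\int_E\big(|f_1+f_2|^p+|f_1-f_2|^p\big)\,dm_g, & p<2,\\ \ge\int_E\big(|f_1+f_2|^p+|f_1-f_2|^p\big)\,dm_g, & p>2.\end{cases}$$
Taking $r=R=\|f_2\|_{p,m_g,E}$, the left-hand side equals $\varphi(R)+\psi(R)R^p=(1+R)^p+(1-R)^p$, and the desired inequalities follow right away.

An immediate application of Theorem 9.3.5 is the following projection result, which will be used later on.

Corollary 9.3.6. Let $g:\mathbb{R}\to\mathbb{R}$ be increasing, $E\subseteq\mathbb{R}$ $m_g$-measurable, $p\in(1,\infty)$, and $\mathcal{K}$ a nonempty closed convex subset of $LRS_g^p(E,\mathbb{F})$. If $f\in LRS_g^p(E,\mathbb{F})\setminus\mathcal{K}$, then there exists a function $h_0\in\mathcal{K}$ such that
$$\|f-h_0\|_{p,m_g,E}=\inf_{h\in\mathcal{K}}\|f-h\|_{p,m_g,E}.$$
Moreover, every function $h\in\mathcal{K}$ satisfies
$$\Re\int_E(h-h_0)\overline{(f-h_0)}\,|f-h_0|^{p-2}\,dm_g\le0,$$
where the integrand is read as $0$ on the set where $f=h_0$.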
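A simple finite-dimensional instance of Corollary 9.3.6 can be computed explicitly. Suppose, purely for illustration, that $m_g$ is a finite sum of point masses with positive weights $w_i$, that the functions are real-valued, and that $\mathcal{K}$ is the box $\{h:l_i\le h_i\le u_i\}$, which is closed and convex. Because $\sum_iw_i|f_i-h_i|^p$ is separable, the nearest point $h_0$ is obtained by clamping each coordinate into $[l_i,u_i]$. The Python sketch below, with hypothetical data and helper names, checks both the minimizing property and the variational inequality against randomly sampled $h\in\mathcal{K}$.

```python
import random


def clamp_to_box(f, lo, hi):
    """Nearest point of the box prod [lo_i, hi_i] to f in any weighted l^p norm."""
    return [min(max(fi, l), u) for fi, l, u in zip(f, lo, hi)]


def dist(f, h, w, p):
    """(sum_i w_i |f_i - h_i|^p)^(1/p)."""
    return sum(wi * abs(a - b) ** p for a, b, wi in zip(f, h, w)) ** (1.0 / p)


def variational_sum(f, h0, h, w, p):
    """sum_i w_i (h_i - h0_i)(f_i - h0_i)|f_i - h0_i|^(p-2); terms with f_i = h0_i count as 0."""
    total = 0.0
    for fi, ai, bi, wi in zip(f, h0, h, w):
        r = fi - ai
        if r != 0:
            total += wi * (bi - ai) * r * abs(r) ** (p - 2)
    return total


rng = random.Random(2)
p = 1.5
w = [1.0, 0.5, 2.0, 1.5]
lo, hi = [-1.0, 0.0, -0.5, 0.2], [1.0, 0.5, 0.5, 0.9]
f = [2.0, -0.3, 0.1, 1.4]  # lies outside the box
h0 = clamp_to_box(f, lo, hi)

samples = [[rng.uniform(l, u) for l, u in zip(lo, hi)] for _ in range(5000)]
is_nearest = all(dist(f, h0, w, p) <= dist(f, h, w, p) + 1e-12 for h in samples)
inequality_holds = all(variational_sum(f, h0, h, w, p) <= 1e-12 for h in samples)
print(h0, is_nearest, inequality_holds)
```

For a box the variational inequality even holds term by term: where $f_i>u_i$ one has $h_{0,i}=u_i\ge h_i$, and where $f_i<l_i$ one has $h_{0,i}=l_i\le h_i$.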

Proof. Replacing $\mathcal{K}$ by $\mathcal{K}-f$ if necessary, we may assume without loss of generality that $f=0$; put $\delta=\inf_{h\in\mathcal{K}}\|h\|_{p,m_g,E}$. For the first result, suppose that $\{h_j\}_{j=1}^{\infty}$ is a sequence in $\mathcal{K}$ with $\lim_{j\to\infty}\|h_j\|_{p,m_g,E}=\delta$. Since $\mathcal{K}$ is convex, $2^{-1}(h_j+h_k)\in\mathcal{K}$, and consequently
$$2\delta\le\|h_j+h_k\|_{p,m_g,E}\le\|h_j\|_{p,m_g,E}+\|h_k\|_{p,m_g,E}\to2\delta\quad\text{as }j,k\to\infty.$$
Suppose that $\{h_j\}_{j=1}^{\infty}$ is not a Cauchy sequence in $LRS_g^p(E,\mathbb{F})$. Then there is a constant $\kappa>0$ such that $\|h_j-h_k\|_{p,m_g,E}\ge\kappa$ for arbitrarily large $j$ and $k$.

Case 1: $p\in(1,2]$. According to the second inequality in Theorem 9.3.5(i),
$$\big(\|h_j+h_k\|_{p,m_g,E}+\|h_j-h_k\|_{p,m_g,E}\big)^p+\big|\|h_j+h_k\|_{p,m_g,E}-\|h_j-h_k\|_{p,m_g,E}\big|^p\le2^p\big(\|h_j\|_{p,m_g,E}^p+\|h_k\|_{p,m_g,E}^p\big)\to2^{p+1}\delta^p$$
as $j,k\to\infty$. Since $u\mapsto|A+u|^p+|A-u|^p$ is even and convex, hence nondecreasing on $[0,\infty)$, letting $j,k\to\infty$ along pairs with $\|h_j-h_k\|_{p,m_g,E}\ge\kappa$ gives
$$|2\delta+\kappa|^p+|2\delta-\kappa|^p\le2^{p+1}\delta^p.$$
Since $\phi(x)=|2\delta+x|^p$ is strictly convex on $\mathbb{R}$, that is, $\phi(tx_1+(1-t)x_2)<t\phi(x_1)+(1-t)\phi(x_2)$ for all $(t,x_1,x_2)\in(0,1)\times\mathbb{R}\times\mathbb{R}$ with $x_1\neq x_2$, taking $t=2^{-1}$, $x_1=\kappa$ and $x_2=-\kappa$ yields the contradiction
$$2^{1+p}\delta^p<|2\delta+\kappa|^p+|2\delta-\kappa|^p\le2^{p+1}\delta^p.$$

Case 2: $p\in[2,\infty)$. According to the first inequality in Theorem 9.3.5(ii),
$$\|h_j+h_k\|_{p,m_g,E}^p+\|h_j-h_k\|_{p,m_g,E}^p\le\big(\|h_j\|_{p,m_g,E}+\|h_k\|_{p,m_g,E}\big)^p+\big|\|h_j\|_{p,m_g,E}-\|h_k\|_{p,m_g,E}\big|^p,$$
whence we derive the contradiction
$$(2\delta)^p<(2\delta)^p+\kappa^p\le(2\delta)^p.$$
Therefore, $\{h_j\}_{j=1}^{\infty}$ must be a Cauchy sequence in $LRS_g^p(E,\mathbb{F})$. Since $LRS_g^p(E,\mathbb{F})$ is complete and $\mathcal{K}$ is closed, $\{h_j\}_{j=1}^{\infty}$ converges to an element $h_0\in\mathcal{K}$ with $\delta=\|h_0\|_{p,m_g,E}$.

For the second result, let $h\in\mathcal{K}$. Since $\mathcal{K}$ is convex, we have

$$h_t=th+(1-t)h_0\in\mathcal{K}\quad\text{for all }t\in[0,1],$$
and consequently
$$\varphi(t)=\|h_t\|_{p,m_g,E}^p\ge\delta^p,\quad\text{while}\quad\varphi(0)=\delta^p.$$
Note both
$$\lim_{t\to0^+}t^{-1}\big(|h_t|^p-|h_0|^p\big)=2^{-1}p|h_0|^{p-2}\big((h-h_0)\overline{h_0}+\overline{(h-h_0)}h_0\big)=p|h_0|^{p-2}\,\Re\big((h-h_0)\overline{h_0}\big)$$
and
$$|h_0|^p-|2h_0-h|^p\le t^{-1}\big(|h_t|^p-|h_0|^p\big)\le|h|^p-|h_0|^p\quad\text{for }t\in(0,1],$$
where the latter follows from the convexity of $t\mapsto|h_0+t(h-h_0)|^p$ on $\mathbb{R}$ (a consequence of the convexity and monotonicity of $x\mapsto x^p$ on $[0,\infty)$). By the dominated convergence theorem, the one-sided derivative $\varphi'(0)$ exists and
$$0\le\varphi'(0)=p\,\Re\int_E(h-h_0)\overline{h_0}\,|h_0|^{p-2}\,dm_g,$$

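which, since $f=0$ and $p>0$, is the asserted inequality. Now we are ready to identify the dual space of each $LRS_g^p(E,\mathbb{F})$ for $p\in[1,\infty)$ by means of the following representation theorem of Frigyes Riesz.

Theorem 9.3.7. Let $g:\mathbb{R}\to\mathbb{R}$ be increasing, $E\subseteq\mathbb{R}$ $m_g$-measurable, and $p\in[1,\infty)$, with $p(p-1)^{-1}$ read as $\infty$ when $p=1$. Then
$$LRS_g^p(E,\mathbb{F})^*=LRS_g^{p(p-1)^{-1}}(E,\mathbb{F})$$
in the sense that $L\in LRS_g^p(E,\mathbb{F})^*$ if and only if there is a unique (up to $m_g$-a.e. equality) element $h\in LRS_g^{p(p-1)^{-1}}(E,\mathbb{F})$ such that
$$L(f)=\int_Ef\,h\,dm_g\quad\text{for all }f\in LRS_g^p(E,\mathbb{F}).$$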

Proof. The Hölder inequality implies that the functional formulated above is an element of $LRS_g^p(E,\mathbb{F})^*$. For the converse, we may assume $0\neq L\in LRS_g^p(E,\mathbb{F})^*$. We consider two cases.

Case 1: $p\in(1,\infty)$. For the given $L$, define
$$\mathcal{K}=\{f\in LRS_g^p(E,\mathbb{F}):L(f)=0\}.$$
Since $L$ is continuous and linear, $\mathcal{K}$ is a closed linear subspace of $LRS_g^p(E,\mathbb{F})$; in particular, it is closed and convex. Because $L\neq0$, there exists an element $f_0\in LRS_g^p(E,\mathbb{F})$ such that $L(f_0)\neq0$, namely, $f_0\notin\mathcal{K}$. According to Corollary 9.3.6, we can find an $h_0\in\mathcal{K}$ such that
$$\Re\int_E(h-h_0)|f_0-h_0|^{p-2}\overline{(f_0-h_0)}\,dm_g\le0\quad\text{for all }h\in\mathcal{K}.$$

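Choosing $h=2h_0$ and $h=0$ in the above estimate (both lie in $\mathcal{K}$), we get
$$\Re\int_Eh_0|f_0-h_0|^{p-2}\overline{(f_0-h_0)}\,dm_g=0.$$
Since $h_0\pm h\in\mathcal{K}$ whenever $h\in\mathcal{K}$, it follows that
$$\Re\int_E(\pm h)|f_0-h_0|^{p-2}\overline{(f_0-h_0)}\,dm_g\le0\quad\text{for all }h\in\mathcal{K}.$$
If $\mathbb{F}=\mathbb{C}$, then $ih\in\mathcal{K}$, so the last inequality still holds when $\pm h$ is replaced by $\pm ih$. Hence
$$\int_Eh|f_0-h_0|^{p-2}\overline{(f_0-h_0)}\,dm_g=0\quad\text{for all }h\in\mathcal{K}.$$
If $f$ is an arbitrary element of $LRS_g^p(E,\mathbb{F})$, write
$$f=f_1+f_2\quad\text{with}\quad f_1=\frac{L(f)}{L(f_0-h_0)}(f_0-h_0),$$
where $L(f_0-h_0)=L(f_0)\neq0$. Then $L(f_2)=0$, i.e., $f_2\in\mathcal{K}$, and hence
$$\int_Ef|f_0-h_0|^{p-2}\overline{(f_0-h_0)}\,dm_g=\int_Ef_1|f_0-h_0|^{p-2}\overline{(f_0-h_0)}\,dm_g+\int_Ef_2|f_0-h_0|^{p-2}\overline{(f_0-h_0)}\,dm_g=\int_Ef_1|f_0-h_0|^{p-2}\overline{(f_0-h_0)}\,dm_g=L(f)\,\frac{\int_E|f_0-h_0|^p\,dm_g}{L(f_0-h_0)}.$$
Thus the desired element $h$ determining the representation of $L$ is
$$h=\frac{|f_0-h_0|^{p-2}\overline{(f_0-h_0)}\,L(f_0-h_0)}{\int_E|f_0-h_0|^p\,dm_g},$$
which lies in $LRS_g^{p(p-1)^{-1}}(E,\mathbb{F})$ because $|h|^{p(p-1)^{-1}}$ is a constant multiple of $|f_0-h_0|^p$. In fact, such an element $h$ is unique: if $\tilde h$ is another one, then
$$\int_E(h-\tilde h)f\,dm_g=0\quad\text{for}\quad f=\overline{(h-\tilde h)}\,|h-\tilde h|^{(2-p)(p-1)^{-1}}\in LRS_g^p(E,\mathbb{F}),$$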

and hence $h=\tilde h$ $m_g$-a.e. on $E$.

Case 2: $p=1$. We handle this case in two situations. The first one is $m_g(E)<\infty$. Under this condition, $L$ also belongs to $LRS_g^q(E,\mathbb{F})^*$ for any $q\in(1,\infty)$, since the Hölder inequality yields
$$|L(f)|\le\|L\|\,\|f\|_{1,m_g,E}\le\|L\|\,m_g(E)^{\frac{q-1}{q}}\|f\|_{q,m_g,E}$$
for all $f\in LRS_g^q(E,\mathbb{F})$. Hence, by Case 1, there exists a unique $h\in LRS_g^{q(q-1)^{-1}}(E,\mathbb{F})$ such that
$$L(f)=\int_Ef\,h\,dm_g\quad\text{for all }f\in LRS_g^q(E,\mathbb{F}).$$
The inclusion $LRS_g^{p_2}(E,\mathbb{F})\subseteq LRS_g^{p_1}(E,\mathbb{F})$, which holds for $1<p_1<p_2<\infty$ because $m_g(E)<\infty$ and follows from Hölder's inequality, together with this uniqueness shows that $h$ does not depend on $q$. Now, if $f=|h|^{q(q-1)^{-1}-2}\,\overline{h}$ (with $f=0$ where $h=0$), then $f\in LRS_g^q(E,\mathbb{F})$, and hence
$$\|h\|_{q(q-1)^{-1},m_g,E}^{q(q-1)^{-1}}=L(f)\le\|L\|\,m_g(E)^{\frac{q-1}{q}}\|f\|_{q,m_g,E}=\|L\|\,m_g(E)^{\frac{q-1}{q}}\|h\|_{q(q-1)^{-1},m_g,E}^{(q-1)^{-1}},$$
that is, $\|h\|_{q(q-1)^{-1},m_g,E}\le\|L\|\,m_g(E)^{\frac{q-1}{q}}$. Accordingly, for any $\varepsilon>0$ we have
$$(\|L\|+\varepsilon)\,m_g\big(\{x\in E:|h(x)|>\|L\|+\varepsilon\}\big)^{\frac{q-1}{q}}\le\|h\|_{q(q-1)^{-1},m_g,E}\le\|L\|\,m_g(E)^{\frac{q-1}{q}}.$$
Since $q$ can be taken arbitrarily close to $1$ (so that $q(q-1)^{-1}$ is arbitrarily large), the last estimate forces
$$m_g\big(\{x\in E:|h(x)|>\|L\|+\varepsilon\}\big)=0,$$
namely, $h\in LRS_g^{\infty}(E,\mathbb{F})$. This further implies $\int_E|h||f|\,dm_g<\infty$ for all $f\in LRS_g^1(E,\mathbb{F})$. For $f\in LRS_g^1(E,\mathbb{F})$ and $j\in\mathbb{N}$, consider
$$f_j(x)=\begin{cases}f(x), & |f(x)|\le j,\\ 0, & |f(x)|>j,\end{cases}$$
and find that
$$f_j\in LRS_g^q(E,\mathbb{F});\quad f_j(x)\to f(x);\quad|f_j(x)|\le|f(x)|.$$
By the dominated convergence theorem,
$$\lim_{j\to\infty}\|f_j-f\|_{1,m_g,E}=0;\quad\lim_{j\to\infty}\|hf_j-hf\|_{1,m_g,E}=0;\quad L(f)=\lim_{j\to\infty}L(f_j)=\lim_{j\to\infty}\int_Ef_jh\,dm_g=\int_Efh\,dm_g.$$
The second situation is $m_g(E)=\infty$. Since
$$E=\bigcup_{j=-\infty}^{\infty}\big(E\cap[j,j+1)\big)\quad\text{and}\quad m_g([j,j+1))<\infty,$$

any function $f\in LRS_g^1(E,\mathbb{F})$ can be written as
$$f(x)=\sum_{j=-\infty}^{\infty}f(x)1_{E\cap[j,j+1)}(x)\quad\text{for all }x\in E.$$
Note that $f\in LRS_g^1\big(E\cap[j,j+1),\mathbb{F}\big)$ if and only if $f1_{E\cap[j,j+1)}\in LRS_g^1(E,\mathbb{F})$. So $L_j(f)=L(f1_{E\cap[j,j+1)})$ defines an element of $LRS_g^1\big(E\cap[j,j+1),\mathbb{F}\big)^*$, and consequently, by the first situation, there exists an element $h_j\in LRS_g^{\infty}\big(E\cap[j,j+1),\mathbb{F}\big)$ such that
$$L(f1_{E\cap[j,j+1)})=\int_{E\cap[j,j+1)}fh_j\,dm_g\quad\text{and}\quad\|h_j\|_{\infty,m_g,E\cap[j,j+1)}\le\|L_j\|\le\|L\|.$$
Now, if the function $h$ is defined on $E$ by setting $h=h_j$ on $E\cap[j,j+1)$ for each $j\in\mathbb{Z}$, then $h\in LRS_g^{\infty}(E,\mathbb{F})$ and
$$L(f)=L\Big(\sum_{j=-\infty}^{\infty}f1_{E\cap[j,j+1)}\Big)=\sum_{j=-\infty}^{\infty}\int_{E\cap[j,j+1)}fh\,dm_g=\int_Efh\,dm_g,$$
owing to the continuity of $L$ (the partial sums of the series converge to $f$ in $LRS_g^1(E,\mathbb{F})$ by the dominated convergence theorem) and the countable additivity of the measure $m_g$. Moreover, if there is another function $\tilde h$ satisfying the last representation, then $\int_E(h-\tilde h)f\,dm_g=0$ for all $f\in LRS_g^1(E,\mathbb{F})$. Choosing $f=1_{E\cap[j,j+1)}\,\overline{\operatorname{sgn}(h-\tilde h)}$, where $\operatorname{sgn}(z)=z|z|^{-1}$ for $z\neq0$ and $\operatorname{sgn}(0)=0$ (this $f$ lies in $LRS_g^1(E,\mathbb{F})$ because $m_g([j,j+1))<\infty$), we get $\int_{E\cap[j,j+1)}|h-\tilde h|\,dm_g=0$ for every $j\in\mathbb{Z}$. This yields $h=\tilde h$ $m_g$-a.e. on $E$. Therefore, the function $h$ is uniquely determined.

(i) Given $\varepsilon>0$ and a finite set $\{L_1,\dots,L_n\}$ in $X^*$, the $(\varepsilon,L_1,\dots,L_n)$-neighborhood of a point $x\in X$, denoted $N(x;\varepsilon,L_1,\dots,L_n)$, is defined by
$$\{y\in X:|L_k(x)-L_k(y)|<\varepsilon,\ k=1,\dots,n\}.$$
(ii) A subset of $X$ is called weakly open provided that it is a union of neighborhoods $N(x;\varepsilon,L_1,\dots,L_n)$. The complement of a weakly open set is called a weakly closed set.
(iii) A subset $E$ of $X$ is called weakly compact provided that every cover of $E$ by weakly open sets has a finite subcover.

Remark 9.4.4. Note that $B_{\varepsilon}^X(x)\subseteq N(x;\varepsilon\|L\|,L)$ for each $L\in X^*$. So Definition 9.4.3 is a natural generalization of the idea of a ball in a metric space. From the definition it turns out that $x_k\xrightarrow{w}x$ is equivalent to saying that to any such weak neighborhood of $x$ there corresponds an $n_0\in\mathbb{N}$ such that $x_k$ lies in this neighborhood for all $k\ge n_0$. Using the second dual, we introduce the following concept.


Definition 9.4.5. Let $(X,\|\cdot\|_X)$ be a normed linear space over $\mathbb{F}$, and $L,L_1,L_2,\dots\in X^*$. Then we say that:
(i) $\{L_n\}_{n=1}^{\infty}$ is strongly convergent to $L$ provided $\lim_{n\to\infty}\|L_n-L\|=0$;
(ii) $\{L_n\}_{n=1}^{\infty}$ is weak* convergent to $L$ provided $\lim_{n\to\infty}|L_n(x)-L(x)|=0$ for all $x\in X$;
(iii) $\{L_n\}_{n=1}^{\infty}$ is weakly convergent to $L$ provided $\lim_{n\to\infty}|F(L_n)-F(L)|=0$ for all $F\in X^{**}$.

Remark 9.4.6. (i) In general, (ii) and (iii) in Definition 9.4.5 are not equivalent. But if $X$ is reflexive, i.e., the canonical embedding of $X$ into $X^{**}$ is onto, then both concepts coincide.
(ii) From Theorem 9.2.4(ii) we can see that if $(X,\|\cdot\|_X)$ is a Banach space over $\mathbb{F}$, then any weak* convergent sequence in $X^*$ must be bounded; conversely, a bounded sequence in $X^*$ that converges at every point of a dense subset of $X$ is weak* convergent.
(iii) Motivated by Definition 9.4.5(iii), we say that $\{T_n\}_{n=1}^{\infty}$ in $\mathcal{B}(X,Y)$ is weakly convergent to $T\in\mathcal{B}(X,Y)$ provided
$$\lim_{n\to\infty}\big|F\big(T_n(x)\big)-F\big(T(x)\big)\big|=0\quad\text{for all }(x,F)\in X\times Y^*.$$
Here $(X,\|\cdot\|_X)$ and $(Y,\|\cdot\|_Y)$ are supposed to be normed linear spaces over $\mathbb{F}$. From Definition 9.2.2 we see that uniform convergence implies strong convergence, which in turn implies weak convergence, but not conversely. Since Example 9.2.3 has shown that strong convergence does not yield uniform convergence, it is enough to check that weak convergence does not yield strong convergence. To do so, consider $X=Y=\ell^2(\mathbb{N},\mathbb{R})$ over $\mathbb{R}$, and
$$T_n(x)=(\underbrace{0,\dots,0}_{n},x_1,x_2,\dots)\quad\text{for }x=(x_1,x_2,\dots)\in\ell^2(\mathbb{N},\mathbb{R}).$$

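Clearly, $\|T_n\|=1$. And, if $y=(y_1,y_2,\dots)\in\ell^2(\mathbb{N},\mathbb{R})$, then
$$\Big|\sum_{k=1}^{\infty}\big(T_n(x)\big)_ky_k\Big|=\Big|\sum_{k=n+1}^{\infty}x_{k-n}y_k\Big|\le\|x\|_2\Big(\sum_{k=n+1}^{\infty}y_k^2\Big)^{2^{-1}}\to0\quad\text{as }n\to\infty.$$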
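Both halves of this example can be observed on finitely supported truncations. In the following Python sketch, which is illustrative only, the sequences $x_k=y_k=k^{-1}$ (truncated) and all helper names are hypothetical choices: the pairing $\sum_k(T_n(x))_ky_k$ shrinks as $n$ grows, while $\|T_n(e_1)-T_m(e_1)\|_2$ stays equal to $\sqrt2$ for $n\neq m$.

```python
import math


def shift(x, n):
    """T_n: prepend n zeros to a finitely supported sequence."""
    return [0.0] * n + list(x)


def pairing(u, v):
    """sum_k u_k v_k, with entries beyond the shorter list treated as 0."""
    return sum(a * b for a, b in zip(u, v))


def l2_distance(u, v):
    """||u - v||_2 after padding the shorter list with zeros."""
    size = max(len(u), len(v))
    u = list(u) + [0.0] * (size - len(u))
    v = list(v) + [0.0] * (size - len(v))
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


N = 20_000
x = [1.0 / k for k in range(1, N + 1)]
y = [1.0 / k for k in range(1, 2 * N + 1)]
weak = [abs(pairing(shift(x, n), y)) for n in (10, 100, 1000, 10_000)]

e1 = [1.0]
strong_gap = l2_distance(shift(e1, 3), shift(e1, 7))
print(weak, strong_gap)
```

The list `weak` decreases toward $0$, mirroring the weak convergence, whereas `strong_gap` is $\sqrt2$ no matter which distinct shifts are compared.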

Thus, by Theorem 9.3.2, $\{T_n\}_{n=1}^{\infty}$ is weakly convergent to $0$. But $\{T_n\}_{n=1}^{\infty}$ is not strongly convergent to $0$, since for $n\neq m$
$$\|T_n(e_1)-T_m(e_1)\|_2=\|e_{n+1}-e_{m+1}\|_2=\sqrt2,$$
so that $\{T_n(e_1)\}_{n=1}^{\infty}$ does not converge in $\ell^2(\mathbb{N},\mathbb{R})$.

Similarly, we can define the weak* topology on $X^*$ via all weak* open subsets of $X^*$.

Definition 9.4.7. Let $(X,\|\cdot\|_X)$ be a normed linear space over $\mathbb{F}$.
(i) Given $\varepsilon>0$ and a finite set $\{x_1,\dots,x_n\}$ in $X$, the $(\varepsilon,x_1,\dots,x_n)$-neighborhood of a functional $F\in X^*$, denoted $N(F;\varepsilon,x_1,\dots,x_n)$, is defined by
$$\{G\in X^*:|G(x_k)-F(x_k)|<\varepsilon,\ k=1,\dots,n\}.$$
(ii) An arbitrary union of such neighborhoods is called a weak* open set. The complement of a weak* open set is called a weak* closed set.
(iii) A subset $A$ of $X^*$ is called weak* compact provided that every cover of $A$ by weak* open sets has a finite subcover.

Definition 9.4.7 immediately yields that a weak* closed subset of a weak* compact set is itself weak* compact. To grasp the essence of weak* compact sets, we extend Cartesian products from two sets to a family of sets and work with their topological compactness.

Definition 9.4.8. (i) A collection $\mathcal{T}$ of subsets of a set $X$ is said to be a topology for $X$ provided that:
(a) $\emptyset,X\in\mathcal{T}$;
(b) $\bigcap_{j=1}^{n}E_j\in\mathcal{T}$ whenever $n\in\mathbb{N}$ and $E_1,E_2,\dots,E_n\in\mathcal{T}$;
(c) $\bigcup_{j\in I}E_j\in\mathcal{T}$ whenever $\{E_j\}_{j\in I}$ is a family in $\mathcal{T}$.

In this case, $(X,\mathcal{T})$ is called a topological space; members of $\mathcal{T}$ and their complements are called open and closed sets of $(X,\mathcal{T})$, respectively. Furthermore, $(X,\mathcal{T})$ is called compact provided that every cover of $X$ by sets of $\mathcal{T}$ has a finite subcover.
(ii) Let $\{(X_j,\mathcal{T}_j)\}_{j\in I}$ be a family of topological spaces indexed by the set $I$. Then the Cartesian product of $\{X_j\}_{j\in I}$ is defined by
$$X=\prod_{j\in I}X_j=\{(x_j)_{j\in I}:x_j\in X_j\}.$$
Furthermore, for each $j\in I$ the projection $p_j:X\to X_j$ is defined by $p_j\big((x_\alpha)_{\alpha\in I}\big)=x_j$. The product topology $\mathcal{T}$ on $X$ is defined to be the topology with the fewest open sets for which every projection $p_j$ is continuous; that is, $p_j^{-1}(O)$ is open in $X$ whenever $O$ is open in $X_j$. The resulting $(X,\mathcal{T})$ is called a product topological space.

Lemma 9.4.9. Let $\mathcal{F}$ be a family of subsets of a given set $X$. If $\mathcal{F}$ enjoys the finite intersection property (FIP), i.e., each finite collection of sets in the family has a nonempty intersection, then $\mathcal{F}$ can be extended to a family which is maximal with respect to having the FIP.

Proof. Let $\tilde{\mathcal{F}}$ be the collection of all families of subsets of $X$ that contain $\mathcal{F}$ and have the FIP. Order $\tilde{\mathcal{F}}$ by inclusion and let $\{\tilde{\mathcal{F}}_j:j\in J\}$ be any chain in $\tilde{\mathcal{F}}$, i.e., a totally ordered subset of $\tilde{\mathcal{F}}$. If $\mathcal{S}=\bigcup_{j\in J}\tilde{\mathcal{F}}_j$, then $\mathcal{S}$ contains $\mathcal{F}$, and if $\{s_1,\dots,s_n\}$ is any finite collection of members of $\mathcal{S}$, then $\{s_1,\dots,s_n\}\subseteq\tilde{\mathcal{F}}_{j_0}$ for some $j_0$, since the chain is ordered by inclusion. Now $\tilde{\mathcal{F}}_{j_0}$ enjoys the FIP, so $\bigcap_{j=1}^{n}s_j\neq\emptyset$; that is to say, $\mathcal{S}$ has the FIP and hence is an upper bound of the chain in $\tilde{\mathcal{F}}$. By Zorn's lemma, $\tilde{\mathcal{F}}$ has a maximal element, which establishes the desired result.

Using this lemma, we can derive the following theorem of Andrey Nikolayevich Tychonoff.

Theorem 9.4.10. Let $\{(X_j,\mathcal{T}_j)\}_{j\in I}$ be a family of compact topological spaces indexed by the set $I$. Then their product topological space $\big(X=\prod_{j\in I}X_j,\mathcal{T}\big)$ is compact.

Proof. By forming the contrapositive statement and taking complements, we immediately see that $(X,\mathcal{T})$ is compact if and only if every collection of closed subsets of $(X,\mathcal{T})$ with the FIP has a nonempty intersection. For this reason, we make the following consideration, in which $\mathcal{T}_j$ and $\mathcal{T}$ are dropped from the notation. Suppose that $\mathcal{F}$ is a collection of closed subsets of $X$ which enjoys the FIP. By Lemma 9.4.9, we can extend $\mathcal{F}$ to a family $\mathcal{G}$ which is maximal with respect to having the FIP. Moreover, for each $j\in I$ the family $\{\overline{p_j(B)}:B\in\mathcal{G}\}$ of closed subsets of $X_j$ has the FIP, since for any finite collection $\{B_1,\dots,B_n\}$ of elements of $\mathcal{G}$ one has
$$\emptyset\neq p_j\Big(\bigcap_{k=1}^{n}B_k\Big)\subseteq\bigcap_{k=1}^{n}p_j(B_k)\subseteq\bigcap_{k=1}^{n}\overline{p_j(B_k)}.$$

Note that each $X_j$ is compact. By the equivalent statement of compactness, there exists an $x_j\in\bigcap_{B\in\mathcal{G}}\overline{p_j(B)}$. Define $f\in X$ by setting $f(j)=x_j$. Since $\bigcap_{B\in\mathcal{G}}\overline{B}\subseteq\bigcap_{B\in\mathcal{F}}B$ (the members of $\mathcal{F}$ being closed and $\mathcal{F}\subseteq\mathcal{G}$), it suffices to verify $f\in\overline{B}$ for all $B\in\mathcal{G}$. To do so, let $O$ be an arbitrary open subset of $X$ containing $f$. Then for some finite set of indices $j_1,\dots,j_n$ and open sets $O_{j_k}\subseteq X_{j_k}$, we have
$$f\in\bigcap_{k=1}^{n}p_{j_k}^{-1}(O_{j_k})\subseteq O.$$
Accordingly, $x_{j_k}\in O_{j_k}$, and thus $O_{j_k}\cap\overline{p_{j_k}(B)}\neq\emptyset$ for all $B\in\mathcal{G}$. Since $O_{j_k}$ is open, this gives $O_{j_k}\cap p_{j_k}(B)\neq\emptyset$, and hence $p_{j_k}^{-1}(O_{j_k})\cap B\neq\emptyset$ for all $B\in\mathcal{G}$. Because $\mathcal{G}$ is maximal with respect to the FIP (and is therefore closed under finite intersections), we must have $p_{j_k}^{-1}(O_{j_k})\in\mathcal{G}$, and similarly $\bigcap_{k=1}^{n}p_{j_k}^{-1}(O_{j_k})\in\mathcal{G}$, whence $O\in\mathcal{G}$. From this we see that $O$ intersects every element of $\mathcal{G}$, and so $f\in\overline{B}$ for all $B\in\mathcal{G}$, because $f\in O$ and $O$ was an arbitrary open set.

Now we are ready to discuss the well-known theorem of Leonidas Alaoglu (also known as the Banach-Alaoglu theorem).

Theorem 9.4.11. Let $(X,\|\cdot\|_X)$ be a normed linear space over $\mathbb{F}$. Then the closed unit ball
$$B_1^{X^*}(0)=\{F\in X^*:\|F\|\le1\}$$
of $X^*$ is weak* compact.

Proof. For $x\in X$ let

$$D_x=\{\lambda\in\mathbb{F}:|\lambda|\le\|x\|_X\}.$$
Note that $|F(x)|\le\|x\|_X$ for all $F\in B_1^{X^*}(0)$. So $B_1^{X^*}(0)$ is a subset of the set of all functions $f$ on $X$ with $f(x)\in D_x$ for each $x\in X$, i.e.,
$$B_1^{X^*}(0)\subseteq\prod_{x\in X}D_x.$$
The product topology on $\prod_{x\in X}D_x$ has as open sets the unions of neighborhoods of the form
$$\{f:|f(x_1)-f_0(x_1)|<\varepsilon,\dots,|f(x_n)-f_0(x_n)|<\varepsilon\}$$
for given $f_0$, $\varepsilon>0$ and $x_1,\dots,x_n\in X$. These open sets, when restricted to $B_1^{X^*}(0)$, are just the weak* open subsets of $B_1^{X^*}(0)$. According to Theorem 9.4.10, the compactness of each $D_x$ implies the compactness of $\prod_{x\in X}D_x$. Thus, to prove that $B_1^{X^*}(0)$ is weak* compact, it is enough to show that $B_1^{X^*}(0)$ is closed in $\prod_{x\in X}D_x$, since $\prod_{x\in X}D_x$ is compact and the argument for Theorem 6.3.7(i) applies.

Suppose that $G$ lies in the closure of $B_1^{X^*}(0)$ in $\prod_{x\in X}D_x$; that is, each such neighborhood of $G$ intersects $B_1^{X^*}(0)$. Then we must check that $G$ is linear and $\|G\|\le1$. Let $x,y\in X$ and $\varepsilon>0$. By the assumption on $G$, the neighborhood $N(G;3^{-1}\varepsilon,x,y,x+y)$ contains some $F\in B_1^{X^*}(0)$. Since $F(x+y)=F(x)+F(y)$, we have
$$|G(x)+G(y)-G(x+y)|\le|G(x)-F(x)|+|G(y)-F(y)|+|G(x+y)-F(x+y)|<3^{-1}\varepsilon+3^{-1}\varepsilon+3^{-1}\varepsilon=\varepsilon.$$
So $G(x+y)=G(x)+G(y)$. In a similar manner, we can prove $G(\lambda x)=\lambda G(x)$ for $\lambda\in\mathbb{F}$. Furthermore, since $\varepsilon>0$ is arbitrary,
$$|F(x)|\le\|x\|_X\Rightarrow|G(x)|\le|G(x)-F(x)|+|F(x)|\le3^{-1}\varepsilon+\|x\|_X\Rightarrow|G(x)|\le\|x\|_X\Rightarrow\|G\|\le1\Rightarrow G\in B_1^{X^*}(0).$$
Accordingly, $B_1^{X^*}(0)$ is closed in $\prod_{x\in X}D_x$ and hence weak* compact.

The theorem just established can be used to derive the weak compactness of the closed unit ball of a reflexive normed linear space.

Corollary 9.4.12. Let $(X,\|\cdot\|_X)$ be a normed linear space over $\mathbb{F}$. If $X$ is reflexive, then the closed unit ball of $X$ is weakly compact.

Proof. Recall

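$$B_1^X(0)=\{x\in X:\|x\|_X\le1\}.$$
Since $X$ is reflexive, the set $\widetilde{B_1^X(0)}=\{\tilde x\in X^{**}:x\in B_1^X(0)\}$, where $\tilde x$ denotes the canonical image of $x$ in $X^{**}$, is exactly the closed unit ball of $X^{**}=(X^*)^*$. From Theorem 9.4.11 it follows that $\widetilde{B_1^X(0)}$ is weak* compact. Since the weak* neighborhoods $N(\tilde x;\varepsilon,f_1,\dots,f_n)$ in $X^{**}$, with $f_1,\dots,f_n\in X^*$, correspond under $x\mapsto\tilde x$ exactly to the weak neighborhoods $N(x;\varepsilon,f_1,\dots,f_n)$ in $X$, this says that $B_1^X(0)$ is weakly compact.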

9.5 Compact and Dual Operators

Having taken a careful look at the concept of a compact (or weakly compact, or weak* compact) set, we find that this concept is a natural generalization of a finite set. In this section we discuss the so-called compact operators and their dual forms on general Banach spaces. Let $X$ and $Y$ be two normed linear spaces over $\mathbb{F}$. If $T\in\mathcal{B}(X,Y)$ has a finite dimensional range $T(X)$, then the image of any bounded set in $X$ is bounded in $T(X)$ and hence has compact closure, owing to $T(X)$ being isomorphic to a Euclidean space. This motivates the following definition of a compact operator.

Definition 9.5.1. Let $(X,\|\cdot\|_X)$ and $(Y,\|\cdot\|_Y)$ be normed linear spaces over $\mathbb{F}$. Then a linear operator $T:X\to Y$ is called compact provided that the closure $\overline{T(B)}$ is a compact subset of $Y$ for each bounded subset $B$ of $X$.

Note that Definition 9.5.1 does not require $T$ to be continuous or bounded. Nevertheless, the following result tells us that the boundedness (equivalently, the continuity) of $T$ is an immediate consequence of the definition.

Theorem 9.5.2. Let $(X,\|\cdot\|_X)$ and $(Y,\|\cdot\|_Y)$ be normed linear spaces over $\mathbb{F}$. If a linear operator $T:X\to Y$ is compact, then it is bounded.

Proof. In this case, $\overline{T(B_1^X)}$ is compact for the closed unit ball $B_1^X=\{x\in X:\|x\|_X\le1\}$, so it is bounded in $Y$. Accordingly, there exists a constant $C>0$ such that $\|y\|_Y\le C$ for all $y\in\overline{T(B_1^X)}$. In particular, $\|T(x)\|_Y\le C$ for $x\in B_1^X$, namely, $\|T\|\le C$.

Remark 9.5.3. Definition 9.5.1 yields three more facts.
(i) A linear combination of compact operators is compact.
(ii) If $T\in\mathcal{B}(X)$ is compact and $S\in\mathcal{B}(X)$, then $TS$ and $ST$ are compact; hence the compact operators in $\mathcal{B}(X)$ form a two-sided ideal in algebraic terms.
(iii) A linear operator $T\in\mathcal{B}(X,Y)$ is compact if and only if for every bounded sequence $\{x_j\}_{j=1}^{\infty}$ in $X$ the sequence $\{T(x_j)\}_{j=1}^{\infty}$ has a convergent subsequence in $Y$. The necessity is obvious. To see the sufficiency, assume this sequential property; then for any bounded set $B\subseteq X$ and any sequence $\{y_k\}_{k=1}^{\infty}$ in $\overline{T(B)}$ there is an $x_k\in B$ such that $\|y_k-T(x_k)\|_Y<k^{-1}$. Consequently, $\{T(x_k)\}_{k=1}^{\infty}$ has, and hence $\{y_k\}_{k=1}^{\infty}$ has, a convergent subsequence with limit in $\overline{T(B)}$, which establishes the compactness of $\overline{T(B)}$.

Using the criterion in Remark 9.5.3(iii), we see that compactness is preserved under uniform limits in the Banach space setting.

Theorem 9.5.4. Let $(X,\|\cdot\|_X)$ be a normed linear space and $(Y,\|\cdot\|_Y)$ a Banach space over $\mathbb{F}$. If each $T_j\in\mathcal{B}(X,Y)$ is compact and $\lim_{j\to\infty}\|T_j-T\|=0$ for some $T\in\mathcal{B}(X,Y)$, then $T$ is compact.

Proof. Suppose that $\{x_j\}_{j=1}^{\infty}$ is a bounded sequence in $X$. Since $T_1$ is compact, there is a subsequence $\{x_{1,m}\}_{m=1}^{\infty}$ of $\{x_j\}_{j=1}^{\infty}$ such that $\{T_1(x_{1,m})\}_{m=1}^{\infty}$ is convergent. Since $T_2$ is compact, there is a subsequence $\{x_{2,m}\}_{m=1}^{\infty}$ of $\{x_{1,m}\}_{m=1}^{\infty}$ such that $\{T_2(x_{2,m})\}_{m=1}^{\infty}$ is convergent. Continuing this process, we obtain for each $n$ a subsequence $\{x_{n+1,m}\}_{m=1}^{\infty}$ of $\{x_{n,m}\}_{m=1}^{\infty}$ such that $\{T_{n+1}(x_{n+1,m})\}_{m=1}^{\infty}$ is convergent. Since $\{x_{m,m}\}_{m\ge n}$ is a subsequence of $\{x_{n,m}\}_{m=1}^{\infty}$, the diagonal sequence $\{T_n(x_{m,m})\}_{m=1}^{\infty}$ is convergent for every $n\in\mathbb{N}$. Note that $\{x_{m,m}\}_{m=1}^{\infty}$ is bounded in $X$, so $c=1+\sup_{m\in\mathbb{N}}\|x_{m,m}\|_X<\infty$. Given $\varepsilon>0$, the assumption $\lim_{j\to\infty}\|T_j-T\|=0$ gives an $n_0\in\mathbb{N}$ such that $\|T_{n_0}-T\|<\varepsilon(3c)^{-1}$. Since $\{T_{n_0}(x_{m,m})\}_{m=1}^{\infty}$ is convergent in $Y$, there is an $n_1\in\mathbb{N}$ such that
$$k,l>n_1\Longrightarrow\|T_{n_0}(x_{k,k})-T_{n_0}(x_{l,l})\|_Y<3^{-1}\varepsilon.$$
With this, for $k,l>n_1$ we achieve
$$\|T(x_{k,k})-T(x_{l,l})\|_Y\le\|T(x_{k,k})-T_{n_0}(x_{k,k})\|_Y+\|T_{n_0}(x_{k,k})-T_{n_0}(x_{l,l})\|_Y+\|T_{n_0}(x_{l,l})-T(x_{l,l})\|_Y\le\|T-T_{n_0}\|\,\|x_{k,k}\|_X+3^{-1}\varepsilon+\|T-T_{n_0}\|\,\|x_{l,l}\|_X<\varepsilon,$$
and so $\{T(x_{m,m})\}_{m=1}^{\infty}$ is a Cauchy sequence, hence convergent, since $(Y,\|\cdot\|_Y)$ is a Banach space.

Example 9.5.5. For $1\le p\le\infty$ let the linear operator $T$ on $\ell^p(\mathbb{N},\mathbb{F})$ be defined by $T:(x_k)_{k=1}^{\infty}\mapsto(k^{-1}x_k)_{k=1}^{\infty}$. Then $T$ is a compact operator on $\ell^p(\mathbb{N},\mathbb{F})$. To check this, set
$$T_n:(x_k)_{k=1}^{\infty}\mapsto(x_1,2^{-1}x_2,\dots,n^{-1}x_n,0,\dots)\quad\text{for all }n\in\mathbb{N}.$$
Clearly, each $T_n\big(\ell^p(\mathbb{N},\mathbb{F})\big)$ is finite dimensional, and so $T_n$ is compact. Note that for any $x=(x_k)_{k=1}^{\infty}\in\ell^p(\mathbb{N},\mathbb{F})$,
$$\big\|(T-T_n)(x_k)_{k=1}^{\infty}\big\|_p=\begin{cases}\Big(\sum_{k=n+1}^{\infty}k^{-p}|x_k|^p\Big)^{\frac1p}\le(n+1)^{-1}\|(x_k)_{k=1}^{\infty}\|_p, & p\in[1,\infty),\\[1ex] \sup_{k\ge n+1}k^{-1}|x_k|\le(n+1)^{-1}\|(x_k)_{k=1}^{\infty}\|_{\infty}, & p=\infty.\end{cases}$$
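The estimate for $\|(T-T_n)x\|_p$ can be checked on finite sections. The Python sketch below is an illustration that assumes vectors of length 60 are representative (the vector length, the random test vectors, and the helper names are hypothetical). It confirms that $(n+1)^{-1}$ bounds $\|(T-T_n)x\|_p/\|x\|_p$ and that the bound is attained at $x=e_{n+1}$, so that in fact $\|T-T_n\|=(n+1)^{-1}$.

```python
import math
import random


def lp_norm(v, p):
    """l^p norm of a finite vector; p = math.inf gives the sup-norm."""
    if p == math.inf:
        return max(abs(t) for t in v)
    return sum(abs(t) ** p for t in v) ** (1.0 / p)


def tail_part(x, n):
    """(T - T_n)x with (Tx)_k = x_k / k: keep x_k / k for k > n, zero for k <= n (1-based k)."""
    return [xk / k if k > n else 0.0 for k, xk in enumerate(x, start=1)]


rng = random.Random(3)
size = 60
bound_holds = True
attained = True
for p in (1.0, 2.0, 3.5, math.inf):
    for n in (1, 5, 20):
        for _ in range(200):
            x = [rng.uniform(-1.0, 1.0) for _ in range(size)]
            if lp_norm(tail_part(x, n), p) > lp_norm(x, p) / (n + 1) + 1e-12:
                bound_holds = False
        e = [0.0] * size
        e[n] = 1.0  # e_{n+1} in the 1-based indexing of the text
        if abs(lp_norm(tail_part(e, n), p) - 1.0 / (n + 1)) > 1e-12:
            attained = False
print(bound_holds, attained)
```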

So $\lim_{n\to\infty}\|T-T_n\|=0$. By Theorem 9.5.4, we can now conclude that $T$ is a compact operator.

Next, we consider the dual operator (also called the adjoint operator) of a bounded operator and see how the compactness of the original operator is reflected in the behavior of its adjoint.

Theorem 9.5.6. Let $(X,\|\cdot\|_X)$ and $(Y,\|\cdot\|_Y)$ be normed linear spaces over $\mathbb{F}$. For $T\in\mathcal{B}(X,Y)$ set $T^*(f)=f\circ T$. Then $T^*\in\mathcal{B}(Y^*,X^*)$ and $\|T^*\|=\|T\|$. Furthermore, if $X$ and $Y$ are Banach spaces, then $T$ is compact if and only if $T^*$ is compact.

Proof. If $f\in Y^*$ and $x\in X$, it is easy to see that $T^*(f)(x)=f\circ T(x)$ defines an $\mathbb{F}$-valued linear function on $X$, since $f$ and $T$ are linear. Moreover,
$$|T^*(f)(x)|\le\|f\|\,\|T(x)\|_Y\le\|f\|\,\|T\|\,\|x\|_X;$$

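that is, $T^*(f)\in X^*$. Clearly, $T^*:Y^*\to X^*$ is linear and bounded. According to Corollary 8.4.4(vi), we evaluate
$$\|T\|=\sup_{\|x\|_X=1}\|T(x)\|_Y=\sup_{\|x\|_X=1}\sup_{\|f\|=1}|f\circ T(x)|=\sup_{\|f\|=1}\sup_{\|x\|_X=1}|T^*(f)(x)|=\sup_{\|f\|=1}\|T^*(f)\|=\|T^*\|.$$
For the second part, suppose first that $T$ is compact. Let $\{f_j\}_{j=1}^{\infty}$ be a sequence in the closed unit ball of $Y^*$, i.e., $\sup_{j\in\mathbb{N}}\|f_j\|\le1$. Then
$$|f_j(y_1)-f_j(y_2)|\le\|y_1-y_2\|_Y\quad\text{for all }y_1,y_2\in Y,$$
and hence $\{f_j\}_{j=1}^{\infty}$ is equicontinuous; it is also uniformly bounded on bounded subsets of $Y$. Since the closure $\overline{T(B)}$ of the image $T(B)\subseteq Y$ of the closed unit ball $B\subseteq X$ is compact, an application of Theorem 7.3.7 yields a subsequence $\{f_{j_k}\}_{k=1}^{\infty}$ which converges uniformly on $\overline{T(B)}$. Because
$$\|T^*(f_{j_k})-T^*(f_{j_l})\|=\sup_{\|x\|_X=1}|f_{j_k}\circ T(x)-f_{j_l}\circ T(x)|,$$
the sequence $\{T^*(f_{j_k})\}_{k=1}^{\infty}$ is Cauchy, and the completeness of $X^*$ implies that it converges in $X^*$. Accordingly, $T^*$ is compact. Conversely, if $T^*$ is compact, the same argument shows that $T^{**}=(T^*)^*$ is compact. Since $T^{**}(\tilde x)=\widetilde{T(x)}$ for $x\in X$, where $\tilde x$ denotes the canonical image of $x$ in the bidual and $x\mapsto\tilde x$ is an isometry, $T(B)$ is isometric to a subset of the relatively compact set $T^{**}\big(B_1^{X^{**}}(0)\big)$ and is therefore totally bounded; as $Y$ is complete, $\overline{T(B)}$ is compact, i.e., $T$ is compact.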
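In finite dimensions Theorem 9.5.6 can be made concrete. Assume, for illustration, that $T:\mathbb{R}^3\to\mathbb{R}^2$ is given by a matrix $A$ and that both spaces carry the $\ell^1$ norm. By Remark 9.3.3(i) with $p=1$, the duals are $\mathbb{R}^2$ and $\mathbb{R}^3$ with the sup-norm, and $T^*(f)=f\circ T$ is represented by the transpose $A^{\mathsf T}$. The Python sketch below (the matrix $A$, the vectors, and the helper names are hypothetical) checks the identity $(f\circ T)(x)=\sum_jx_j(A^{\mathsf T}f)_j$ and compares the largest absolute column sum of $A$, which is $\|T\|$ for the $\ell^1$ norm, with the largest absolute row sum of $A^{\mathsf T}$, which is $\|T^*\|$ for the sup-norm.

```python
def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def norm_l1(A):
    """Operator norm on l^1: the largest column sum of absolute entries."""
    return max(sum(abs(a) for a in col) for col in zip(*A))


def norm_linf(B):
    """Operator norm on l^inf: the largest row sum of absolute entries."""
    return max(sum(abs(b) for b in row) for row in B)


A = [[1.0, -2.0, 0.5],
     [3.0, 0.0, -1.0]]
At = transpose(A)

f = [0.7, -1.2]       # a functional on R^2: f(y) = 0.7 y_1 - 1.2 y_2
x = [2.0, 1.0, -3.0]  # a point of R^3
lhs = sum(fi * yi for fi, yi in zip(f, matvec(A, x)))   # (f o T)(x)
rhs = sum(xj * gj for xj, gj in zip(x, matvec(At, f)))  # (A^T f) paired with x

print(lhs, rhs)
print(norm_l1(A), norm_linf(At))  # 4.0 4.0
```

The equality of the two norms is exactly $\|T\|=\|T^*\|$ in this finite-dimensional model.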

Example 9.5.7. Let $[a, b]$ and $[c, d]$ be two finite closed intervals of $\mathbb{R}$. For $k(\cdot,\cdot) \in C([c, d] \times [a, b], \mathbb{F})$ let $T: C([a, b], \mathbb{F}) \to C([c, d], \mathbb{F})$ be given by
$$T(f)(x) = \int_a^b k(x, y) f(y)\,dy \quad\text{for all } x \in [c, d].$$
Then $T$ is a compact operator, and so, by Theorem 9.5.6, is its adjoint $T^*$, which maps $C([c, d], \mathbb{F})^* = BV([c, d], \mathbb{F})/BV_0([c, d], \mathbb{F})$ into $C([a, b], \mathbb{F})^* = BV([a, b], \mathbb{F})/BV_0([a, b], \mathbb{F})$, where
$$T^*(g)(y) = \int_c^d k(x, y)\,dg(x) \quad\text{for all } y \in [a, b],$$
which can be determined by Theorem 9.3.4. To verify the compactness of $T$, we notice that $k(\cdot,\cdot) \in C([c, d] \times [a, b], \mathbb{F})$ implies
$$M_1 = \|k(\cdot,\cdot)\|_\infty = \sup_{(x,y)\in[c,d]\times[a,b]}|k(x, y)| < \infty.$$
Thus, if $B$ is a bounded subset of $C([a, b], \mathbb{F})$, that is, there is a constant $M_2 > 0$ such that $\|f\|_\infty \le M_2$ for all $f \in B$, then
$$\|T(f)\|_\infty = \sup_{x\in[c,d]}\left|\int_a^b k(x, y) f(y)\,dy\right| \le M_1 M_2 (b - a).$$
This means that $T(B)$ is bounded in $C([c, d], \mathbb{F})$. Since $k(\cdot,\cdot)$ is uniformly continuous on $[c, d] \times [a, b]$, for any $\varepsilon > 0$ there is a $\delta > 0$ such that
$$x_1, x_2 \in [c, d] \text{ and } |x_1 - x_2| < \delta \;\Rightarrow\; |k(x_1, y) - k(x_2, y)| < \varepsilon M_2^{-1}(b - a)^{-1} \quad\text{for all } y \in [a, b].$$
From this implication we further see that
$$x_1, x_2 \in [c, d] \text{ and } |x_1 - x_2| < \delta \;\Rightarrow\; |T(f)(x_1) - T(f)(x_2)| \le \varepsilon \quad\text{for all } f \in B.$$
By Theorem 7.3.7 we can now conclude that the closure $\overline{T(B)}$ is compact in $C([c, d], \mathbb{F})$, and so $T$ is a compact operator.
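The two estimates behind Example 9.5.7 (uniform boundedness of $T(B)$ and its equicontinuity) can be observed numerically. The Python sketch below is only an illustration under assumptions of our own: the kernel $k(x,y)=\cos(xy)$, the intervals $[a,b]=[0,1]$ and $[c,d]=[0,2]$, the midpoint-rule discretization of the integral, and the finite family of test functions $\sin(my)$, $\cos(my)$ with $\|f\|_\infty \le M_2 = 1$ are all hypothetical choices. For this particular kernel $|\partial k/\partial x| \le 1$, so equicontinuity takes the stronger Lipschitz form $|T(f)(x_1)-T(f)(x_2)| \le M_2(b-a)|x_1-x_2|$, which is what the code measures.

```python
import math

A, B = 0.0, 1.0        # [a, b]: illustrative choice
C, D = 0.0, 2.0        # [c, d]: illustrative choice
NUM_NODES = 200        # midpoint-rule nodes on [a, b] (assumption of this sketch)
H = (B - A) / NUM_NODES
NODES = [A + (j + 0.5) * H for j in range(NUM_NODES)]


def kernel(x, y):
    """k(x, y) = cos(x y): |k| <= 1, and |dk/dx| = |y sin(x y)| <= 1 for y in [0, 1]."""
    return math.cos(x * y)


def apply_T(f, x):
    """Midpoint-rule approximation of T(f)(x) = integral_a^b k(x, y) f(y) dy."""
    return H * sum(kernel(x, y) * f(y) for y in NODES)


# A bounded set of test functions with sup-norm at most M2 = 1 on [a, b].
family = [(lambda y, m=m: math.sin(m * y)) for m in range(1, 21)] + \
         [(lambda y, m=m: math.cos(m * y)) for m in range(1, 21)]
M1, M2 = 1.0, 1.0
xs = [C + (D - C) * i / 50 for i in range(51)]   # sample points in [c, d]

sup_norms = []
worst_ratio = 0.0   # largest |T(f)(x1) - T(f)(x2)| / |x1 - x2| over neighbouring sample points
for f in family:
    values = [apply_T(f, x) for x in xs]
    sup_norms.append(max(abs(v) for v in values))
    for x1, x2, v1, v2 in zip(xs, xs[1:], values, values[1:]):
        worst_ratio = max(worst_ratio, abs(v1 - v2) / (x2 - x1))

print("max sup-norm:", max(sup_norms), "bound:", M1 * M2 * (B - A))
print("worst difference quotient:", worst_ratio, "bound:", M2 * (B - A))
```

Both bounds hold for the discretized operator as well, because the midpoint sum obeys the same triangle-inequality estimates as the integral; the sketch therefore illustrates the mechanism of the proof rather than the accuracy of the quadrature.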


Jie Xiao

Problems

9.1. Prove that if $C^n[0, 1]$ is the class of all functions $f: [0, 1] \to \mathbb{R}$ for which the derivatives $f^{(0)}, f^{(1)}, \dots, f^{(n)}$ belong to $C[0, 1]$ and satisfy $\|f\| = \sup_{0\le k\le n}\sup_{t\in[0,1]}|f^{(k)}(t)| < \infty$, then it is a Banach space under this norm $\|\cdot\|$.

9.2. Prove that if $(X, \|\cdot\|_X)$ and $(Y, \|\cdot\|_Y)$ are Banach spaces over $\mathbb{F}$, then so is $X \times Y$ under the norm $\|(x, y)\| = \max\{\|x\|_X, \|y\|_Y\}$ for $(x, y) \in X \times Y$.

9.3. Prove that $(C[0, 1], \|\cdot\|_p)$ with $p \in [1, \infty)$ is not complete.

9.4. Let $L^1(\mathbb{Z}, \mathbb{F})$ comprise all $\mathbb{F}$-valued functions $f$ on $\mathbb{Z}$ obeying $\|f\|_1 = \sum_{n=-\infty}^{\infty}|f(n)| < \infty$. Prove that this space is a Banach algebra under the convolution
$$f * g(n) = \sum_{m=-\infty}^{\infty} f(n - m)g(m) \quad\text{for all } f, g \in L^1(\mathbb{Z}, \mathbb{F}).$$

9.5. Show by an example that the uniform boundedness principle does not hold once completeness is dropped.

9.6. Prove that there is a continuous map $f: \mathbb{R} \to \mathbb{R}$ such that $f(U)$ is not open even though $U$ is open.

9.7. (i) Prove
$$\lim_{n\to\infty}\int_0^{2\pi}\frac{|\sin((n + 2^{-1})x)|}{|\sin(2^{-1}x)|}\,dx = \infty.$$

(ii) Given a Riemann-integrable function $f: [0, 2\pi] \to \mathbb{R}$, let its Fourier series be
$$s(x) = \sum_{m=-\infty}^{\infty} a_m e^{imx}, \quad\text{where}\quad a_m = (2\pi)^{-1}\int_0^{2\pi} f(y)e^{-imy}\,dy.$$
Extend the definition of $f$ to make it $2\pi$-periodic. Define the $n$-th partial sum of the Fourier series to be $s_n(x) = \sum_{m=-n}^{n} a_m e^{imx}$. Prove
$$s_n(x) = (2\pi)^{-1}\int_0^{2\pi}\frac{\sin((n + 2^{-1})y)}{\sin(2^{-1}y)}\,f(x + y)\,dy \quad\text{for all } x \in (0, 2\pi).$$


(iii) Let $X$ be the Banach space of continuous functions $f: [0, 2\pi] \to \mathbb{R}$ with $f(0) = f(2\pi)$, equipped with the sup-norm. Prove that the linear operator $T_n: X \to \mathbb{R}$ defined by
$$T_n(f) = (2\pi)^{-1}\int_0^{2\pi} f(x)\,\frac{\sin((n + 2^{-1})x)}{\sin(2^{-1}x)}\,dx$$
is bounded, and
$$\|T_n\| = (2\pi)^{-1}\int_0^{2\pi}\left|\frac{\sin((n + 2^{-1})x)}{\sin(2^{-1}x)}\right|dx.$$

(iv) Prove that there exists a continuous function $f: [0, 2\pi] \to \mathbb{R}$ with $f(0) = f(2\pi)$ such that its Fourier series diverges at $x = 0$.

9.8. Let $X = C^1[0, 1]$ and $Y = C[0, 1]$ be equipped with the sup-norm. Prove the following results. (i) $X$ is not complete. (ii) The map $d/dx: X \to Y$ is closed but not bounded.

9.9. Suppose that $(X, \|\cdot\|_X)$ and $(Y, \|\cdot\|_Y)$ are Banach spaces over $\mathbb{F}$ and $T \in \mathcal{B}(X,Y)$ satisfies $\|T(x)\|_Y \ge c\|x\|_X$ for all $x \in X$ and for some constant $c > 0$. Verify the following facts: (i) $N(T) = \{x \in X : T(x) = 0\} = \{0\}$; (ii) $R(T) = \{y \in Y : y = T(x) \text{ for some } x \in X\}$ is closed; (iii) $T^{-1} \in \mathcal{B}(R(T), X)$ satisfies $\|T^{-1}\| \le c^{-1}$.

9.10. Prove the following two results:

(i) $\{f_j\}_{j=1}^\infty$ converges weakly to $f$ in the real space $C[0, 1]$ if and only if $\sup_{j\in\mathbb{N}}\|f_j\|_\infty < \infty$ and $\lim_{j\to\infty} f_j(x) = f(x)$ for any $x \in [0, 1]$.

(ii) $\{x_j = (s_k^{(j)})_{k=1}^\infty\}_{j=1}^\infty$ converges weakly to $x = (s_k)_{k=1}^\infty$ in the real sequence space $\ell^p = \ell^p(\mathbb{N}, \mathbb{R})$, where $p \in (1, \infty)$, if and only if $\sup_{j\in\mathbb{N}}\|x_j\|_p < \infty$ and $\lim_{j\to\infty} s_k^{(j)} = s_k$ for any $k \in \mathbb{N}$.

9.11. Let $(X, \|\cdot\|_X)$ be a separable Banach space over $\mathbb{F}$ and $M$ be a bounded subset of $X^*$. Prove that every sequence in $M$ contains a weak* convergent subsequence.


9.12. Let $(X, \|\cdot\|_X)$ be a Banach space over $\mathbb{F}$. Show that $X$ is reflexive if and only if $X^*$ is reflexive.

9.13. Let $c_0 = c_0(\mathbb{N}, \mathbb{R})$ be the linear space consisting of all $\ell^\infty = \ell^\infty(\mathbb{N}, \mathbb{R})$ sequences that converge to 0. Prove the following two facts: (i) $c_0$ is a separable but non-reflexive Banach space over $\mathbb{R}$; (ii) $L \in \mathcal{B}(c_0, \mathbb{R})$ if and only if there is an element $y = (y_k)_{k=1}^\infty \in \ell^1 = \ell^1(\mathbb{N}, \mathbb{R})$ such that $L(x) = \sum_{k=1}^\infty x_k y_k$ for all $x = (x_k)_{k=1}^\infty \in c_0$.

9.14. Let $(X, \|\cdot\|_X)$ be a normed linear space over $\mathbb{F}$ and $K$ be a convex subset of $X$. Prove that the weak closure of $K$ equals its norm closure.

9.15. Let $(X, \|\cdot\|_X)$ be a normed linear space over $\mathbb{F}$ and $A \subseteq X^*$ be weak* compact. Prove that: (i) $A$ is weak* closed; (ii) any weak* closed subset of $A$ is weak* compact.

9.16. Let $(X, \|\cdot\|_X)$ and $(Y, \|\cdot\|_Y)$ be Banach spaces over $\mathbb{F}$. Prove that if $T: X \to Y$ is a linear mapping such that $f \circ T \in X^*$ for all $f \in Y^*$, then $T \in \mathcal{B}(X,Y)$.

9.17. Prove that a topological space $X$ is compact if and only if any collection of closed subsets of $X$ obeying the finite intersection property (FIP) has a nonempty intersection.

9.18. Let $(X, \|\cdot\|_X)$ and $(Y, \|\cdot\|_Y)$ be a normed linear space and a Banach space over $\mathbb{F}$, respectively. Prove that the set $\mathcal{B}_c(X,Y)$ of all compact operators from $X$ to $Y$ is a Banach space under the usual operator norm of $\mathcal{B}(X,Y)$.

9.19. Given a finite interval $[a, b] \subset \mathbb{R}$, let $k(\cdot,\cdot)$ be an $\mathbb{F}$-valued continuous function on $\{(x, y) \in \mathbb{R}^2 : x \in [a, b],\ y \in [a, x]\}$ and let $T: C([a, b], \mathbb{F}) \to C([a, b], \mathbb{F})$ be given by
$$T(f)(x) = \int_a^x k(x, y) f(y)\,dy.$$
Show that $T$ is a compact operator.

9.20. Suppose that $X$ is a Banach space over $\mathbb{F}$ and $T \in \mathcal{B}(X)$ is a compact operator. Prove the following assertions: (i) $TS$ and $ST$ are compact for any $S \in \mathcal{B}(X)$; (ii) $T(X)$ is separable.


9.21. Let $(X, \|\cdot\|_X)$ be a Banach space over $\mathbb{F}$. Prove the following three results.

(i) If $T: X \to X$ is the zero operator, that is, $T(x) = 0$ for all $x \in X$, then $T$ is compact;

(ii) If $T \in \mathcal{B}(X)$ is compact and $X$ is infinite dimensional, then $T$ is not invertible;

(iii) If $T \in \mathcal{B}(X)$ is such that $T(X)$ is closed and infinite dimensional, then $T$ is not compact.
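Problem 9.7 can be explored numerically before it is proved. The Python sketch below, with grid sizes and sample points chosen by us for illustration, checks the closed form $\sum_{m=-n}^{n}e^{imy} = \sin((n+2^{-1})y)/\sin(2^{-1}y)$ that underlies (ii), and approximates $\|T_n\|$ from (iii) by the midpoint rule. The slow growth of these values is the content of (i), and it is what drives the divergence in (iv). The comparison with $4\pi^{-2}\log n$ uses the classical asymptotic behaviour of these integrals (the Lebesgue constants of Fourier series) only as a rough sanity check.

```python
import cmath
import math


def dirichlet_closed_form(n, y):
    """sin((n + 1/2) y) / sin(y / 2), valid when y is not a multiple of 2*pi."""
    return math.sin((n + 0.5) * y) / math.sin(0.5 * y)


def dirichlet_sum(n, y):
    """sum_{m=-n}^{n} e^{imy}; the imaginary parts cancel because the sum is symmetric in m."""
    return sum(cmath.exp(1j * m * y) for m in range(-n, n + 1)).real


def lebesgue_constant(n, num_points=200_000):
    """Midpoint rule for ||T_n|| = (2 pi)^-1 * integral_0^{2 pi} |sin((n + 1/2) x) / sin(x / 2)| dx."""
    h = 2 * math.pi / num_points
    total = 0.0
    for k in range(num_points):
        x = (k + 0.5) * h   # midpoints avoid x = 0 and x = 2*pi, where sin(x/2) vanishes
        total += abs(dirichlet_closed_form(n, x))
    return total * h / (2 * math.pi)


for n in (10, 100, 1000):
    print(n, lebesgue_constant(n), 4 / math.pi ** 2 * math.log(n))
```

The numbers grow roughly like a constant plus $4\pi^{-2}\log n$, which is unbounded but very slow; this is why (i) needs a careful lower estimate rather than a crude one.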

Chapter 10

Hilbert Spaces and Their Operators

Hilbert spaces, named after David Hilbert, are special Banach spaces which share most of the rich geometric structure of the finite-dimensional Euclidean spaces. The geometric wealth of a Hilbert space is produced by an inner product on the space (a bilinear form in the real case and a sesquilinear form in the complex case), which allows the introduction of a notion of perpendicularity. This chapter is devoted to an introductory study of Hilbert spaces and their bounded and compact operators.

10.1 Definition, Examples and Basic Properties

From now on, $\mathbb{F}$ still denotes the field of real or complex numbers.

Definition 10.1.1. Let $X$ be a linear space over $\mathbb{F}$. An inner product on $X$ is a map $(x, y) \mapsto \langle x, y\rangle$ from $X \times X$ to $\mathbb{F}$ such that: (a) $\langle\alpha x + \beta y, z\rangle = \alpha\langle x, z\rangle + \beta\langle y, z\rangle$ for all $x, y, z \in X$ and all $\alpha, \beta \in \mathbb{F}$; (b) $\langle y, x\rangle = \overline{\langle x, y\rangle}$ for all $x, y \in X$; (c) $\langle x, x\rangle \ge 0$ for all $x \in X$, and $\langle x, x\rangle = 0$ if and only if $x = 0$. If $X$ is equipped with $\langle\cdot,\cdot\rangle$, then it is called an inner product space or a pre-Hilbert space over $\mathbb{F}$. Note that the bar signifies complex conjugation; of course, if the scalar field is $\mathbb{R}$, then axiom (b) is simply $\langle y, x\rangle = \langle x, y\rangle$.


Theorem 10.1.2. Let $X$ be a linear space over $\mathbb{F}$ with the inner product $\langle\cdot,\cdot\rangle$. If $\|x\| = \sqrt{\langle x, x\rangle}$ for all $x \in X$, then the following statements hold. (i) Cauchy-Schwarz inequality: $|\langle x, y\rangle| \le \|x\|\|y\|$ for all $x, y \in X$, with equality if and only if $x$ and $y$ are linearly dependent. (ii) $\|\cdot\|$ defines a norm on $X$.

(iii) The parallelogram law $\|x + y\|^2 + \|x - y\|^2 = 2(\|x\|^2 + \|y\|^2)$ is valid for all $x, y \in X$.

Proof. (i) If $\langle x, y\rangle = 0$, then there is nothing to argue. Suppose that $\langle x, y\rangle \ne 0$. Then $x \ne 0$ and $y \ne 0$. Let $\alpha = \langle x, y\rangle$ and $z = \alpha y$. Then
$$0 \le \langle x - tz, x - tz\rangle = \|x\|^2 - 2t|\langle x, y\rangle|^2 + t^2|\langle x, y\rangle|^2\|y\|^2 \quad\text{for all } t \in \mathbb{R}.$$
This quadratic function of $t$ attains its absolute minimum at $t = \|y\|^{-2}$. Substituting this value for $t$, we get
$$0 \le \|x - tz\|^2 = \|x\|^2 - \|y\|^{-2}|\langle x, y\rangle|^2,$$
with equality if and only if $x - tz = x - \alpha t y = 0$, from which the desired result is immediate. (ii) It is obvious that $\|x\| = 0 \Leftrightarrow x = 0$ and $\|\lambda x\| = |\lambda|\|x\|$. As for the triangle inequality, (i) implies
$$\|x + y\|^2 = \|x\|^2 + 2\Re\langle x, y\rangle + \|y\|^2 \le (\|x\| + \|y\|)^2.$$
(iii) This follows directly from expanding $\|x + y\|^2$ and $\|x - y\|^2$.

The above definition and theorem lead to the notion of a Hilbert space.

Definition 10.1.3. A Hilbert space $X$ over $\mathbb{F}$ is a pre-Hilbert space which is complete with respect to the metric
$$d(x, y) = \|x - y\| = \sqrt{\langle x - y, x - y\rangle} \quad\text{for all } x, y \in X.$$

Furthermore, we will refer to real or complex Hilbert spaces according to whether $\mathbb{F}$ is $\mathbb{R}$ or $\mathbb{C}$.


Example 10.1.4. (i) For each $n \in \mathbb{N}$, $\mathbb{F}^n$ is a Hilbert space over $\mathbb{F}$ under the inner product
$$\langle x, y\rangle = \sum_{j=1}^n x_j\overline{y_j} \quad\text{for all } x = (x_1, \dots, x_n),\ y = (y_1, \dots, y_n) \in \mathbb{F}^n.$$
(ii) Regardless of the cardinality of an arbitrarily given set $X$, we can define the sequence space
$$\ell^2(X, \mathbb{F}) = \Big\{f: X \to \mathbb{F} : \sum_{x\in X}|f(x)|^2 < \infty\Big\},$$
where
$$\sum_{x\in X}|f(x)|^2 = \sup\Big\{\sum_{x\in Y}|f(x)|^2 : Y \text{ a nonempty finite subset of } X\Big\}.$$
This space becomes a Hilbert space over $\mathbb{F}$ with the inner product
$$\langle f_1, f_2\rangle = \sum_{x\in X} f_1(x)\overline{f_2(x)} \quad\text{for all } f_1, f_2 \in \ell^2(X, \mathbb{F}).$$
Here the right-hand sum is said to converge provided there is a number $s \in \mathbb{F}$ such that to each $\varepsilon > 0$ there corresponds a finite subset $Y$ of $X$ with the property that if $Z$ is any finite subset of $X$ with $Y \subseteq Z$, then $\big|\sum_{x\in Z} f_1(x)\overline{f_2(x)} - s\big| < \varepsilon$.

Obviously, $\ell^2(\mathbb{N}, \mathbb{F})$ is just the space of square-summable $\mathbb{F}$-valued sequences.

(iii) $LRS^2_{id}([a, b], \mathbb{F})$, often denoted $L^2([a, b], \mathbb{F})$, is a Hilbert space over $\mathbb{F}$ under the inner product
$$\langle f_1, f_2\rangle = \int_{[a,b]} f_1\overline{f_2}\,dm_{id} \quad\text{for all } f_1, f_2 \in LRS^2_{id}([a, b], \mathbb{F}).$$
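As a small sanity check of Example 10.1.4 (i) and Theorem 10.1.2, the following Python sketch implements the inner product on $\mathbb{C}^n$ with Python's built-in complex numbers and tests the axioms, the Cauchy-Schwarz inequality and the parallelogram law on random vectors. The helper names (`inner`, `norm`, `random_vector`) and the fixed random seed are our own choices; random testing illustrates the statements but of course proves nothing.

```python
import math
import random


def inner(x, y):
    """Inner product on C^n from Example 10.1.4 (i): linear in x, conjugate-linear in y."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))


def norm(x):
    """Induced norm ||x|| = sqrt(<x, x>); <x, x> is real up to rounding."""
    return math.sqrt(inner(x, x).real)


def random_vector(n, rng):
    return [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]


rng = random.Random(0)
for _ in range(1000):
    x, y = random_vector(5, rng), random_vector(5, rng)
    alpha = complex(rng.uniform(-1, 1), rng.uniform(-1, 1))
    # Definition 10.1.1 (a) (homogeneity in the first slot) and (b) (conjugate symmetry)
    assert abs(inner([alpha * a for a in x], y) - alpha * inner(x, y)) < 1e-12
    assert abs(inner(y, x) - inner(x, y).conjugate()) < 1e-12
    # Theorem 10.1.2 (i): Cauchy-Schwarz inequality
    assert abs(inner(x, y)) <= norm(x) * norm(y) + 1e-12
    # Theorem 10.1.2 (iii): parallelogram law
    s = [a + b for a, b in zip(x, y)]
    d = [a - b for a, b in zip(x, y)]
    assert abs(norm(s) ** 2 + norm(d) ** 2 - 2 * (norm(x) ** 2 + norm(y) ** 2)) < 1e-12

# Equality in Cauchy-Schwarz for linearly dependent vectors, e.g. x = 2i * y.
y = random_vector(5, rng)
x = [2j * b for b in y]
assert abs(abs(inner(x, y)) - norm(x) * norm(y)) < 1e-12
```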

The following result reveals when a normed linear space is an inner product space, and hence when a Banach space is a Hilbert space. Theorem 10.1.5. Let $(X, \|\cdot\|)$ be a normed linear space over $\mathbb{F}$ satisfying the parallelogram law.


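(i) If $\mathbb{F} = \mathbb{C}$, then
$$\langle x, y\rangle = 2^{-2}\sum_{n=1}^{4} i^n\|x + i^n y\|^2$$
defines an inner product on $X$. (ii) If $\mathbb{F} = \mathbb{R}$, then $\langle x, y\rangle = 2^{-2}(\|x + y\|^2 - \|x - y\|^2)$ defines an inner product on $X$.

Proof. It is enough to check (i). As a matter of fact, this $\langle\cdot,\cdot\rangle$ enjoys
$$\langle x, x\rangle = 2^{-2}\big(4\|x\|^2 + i|1 + i|^2\|x\|^2 - i|1 - i|^2\|x\|^2\big) = \|x\|^2 \quad\text{for all } x \in X.$$
To verify that $\langle\cdot,\cdot\rangle$ is actually an inner product, it suffices to prove that Definition 10.1.1 (a) holds for $\alpha = \beta = 1$ and that $\langle\lambda\cdot,\cdot\rangle = \lambda\langle\cdot,\cdot\rangle$ for any $\lambda \in \mathbb{C}$. For the former, we use the parallelogram law to achieve
$$\begin{cases}\|u + v + w\|^2 + \|u + v - w\|^2 = 2\|u + v\|^2 + 2\|w\|^2;\\ \|u - v + w\|^2 + \|u - v - w\|^2 = 2\|u - v\|^2 + 2\|w\|^2.\end{cases}$$
Hence
$$2(\|u + v\|^2 - \|u - v\|^2) = (\|u + v + w\|^2 - \|u - v + w\|^2) + (\|u + v - w\|^2 - \|u - v - w\|^2).$$
Since $\Re\langle a, b\rangle = 2^{-2}(\|a + b\|^2 - \|a - b\|^2)$, this yields $\Re\langle u + w, v\rangle + \Re\langle u - w, v\rangle = 2\Re\langle u, v\rangle$. In a similar fashion, we can establish the relation with the real part $\Re$ replaced by the imaginary part $\Im$: $\Im\langle u + w, v\rangle + \Im\langle u - w, v\rangle = 2\Im\langle u, v\rangle$. Accordingly, $\langle u + w, v\rangle + \langle u - w, v\rangle = 2\langle u, v\rangle$. When $u = w$, we have $\langle 2u, v\rangle = 2\langle u, v\rangle$ (note that $\langle 0, v\rangle = 0$). Taking $u + w = x$, $u - w = y$, $v = z$, we get
$$\langle x, z\rangle + \langle y, z\rangle = 2\langle 2^{-1}(x + y), z\rangle = \langle x + y, z\rangle.$$
To reach the latter, we observe that for any $m \in \mathbb{N}$,
$$\langle mx, y\rangle = \langle (m - 1)x + x, y\rangle = \langle (m - 1)x, y\rangle + \langle x, y\rangle = \cdots = m\langle x, y\rangle.$$
Thus for any $k \in \mathbb{N}$, $k\langle k^{-1}x, y\rangle = \langle x, y\rangle$, i.e., $\langle k^{-1}x, y\rangle = k^{-1}\langle x, y\rangle$. Consequently, for any $r = k^{-1}m$,
$$r\langle x, y\rangle = m\langle k^{-1}x, y\rangle = \langle mk^{-1}x, y\rangle = \langle rx, y\rangle.$$
Since the norm is continuous, $x \mapsto \langle x, y\rangle$ is continuous, and consequently $\lambda\langle x, y\rangle = \langle\lambda x, y\rangle$ for any $\lambda > 0$, thanks to the density of $\mathbb{Q}$ in $\mathbb{R}$. If $\lambda < 0$, then
$$\lambda\langle x, y\rangle - \langle\lambda x, y\rangle = \lambda\langle x, y\rangle - |\lambda|\langle -x, y\rangle = \lambda(\langle x, y\rangle + \langle -x, y\rangle) = \lambda\langle 0, y\rangle = 0.$$
Also, it is easy to verify that
$$i\langle x, y\rangle = 2^{-2}\sum_{n=1}^{4} i^n\|x + i^{n-1}y\|^2 = \langle ix, y\rangle.$$
Therefore, for any $\lambda = \mu + i\nu \in \mathbb{C}$ we have $\lambda\langle x, y\rangle = \mu\langle x, y\rangle + i\langle\nu x, y\rangle = \langle(\mu + i\nu)x, y\rangle$, as desired.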
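The polarization formula of Theorem 10.1.5 (i) can be tried out directly: feed it a norm, and it returns a candidate inner product. The Python sketch below (all helper names are hypothetical) recovers the standard inner product of $\mathbb{C}^n$ from the Euclidean norm, and shows that the sup-norm on $\mathbb{R}^2$ violates the parallelogram law, so the theorem does not apply to it.

```python
import math

UNITS = [1j, -1, -1j, 1]   # i^1, i^2, i^3, i^4


def euclid_norm(x):
    return math.sqrt(sum(abs(a) ** 2 for a in x))


def sup_norm(x):
    return max(abs(a) for a in x)


def polarization(norm, x, y):
    """<x, y> = 2^-2 * sum_{n=1}^{4} i^n * ||x + i^n y||^2, as in Theorem 10.1.5 (i)."""
    return sum(w * norm([a + w * b for a, b in zip(x, y)]) ** 2 for w in UNITS) / 4


def parallelogram_defect(norm, x, y):
    """||x+y||^2 + ||x-y||^2 - 2(||x||^2 + ||y||^2); zero for all x, y iff the law holds."""
    s = [a + b for a, b in zip(x, y)]
    d = [a - b for a, b in zip(x, y)]
    return norm(s) ** 2 + norm(d) ** 2 - 2 * (norm(x) ** 2 + norm(y) ** 2)


def direct_inner(x, y):
    """Reference value: sum_j x_j * conj(y_j)."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))


x = [1 + 2j, -0.5j, 3]
y = [2 - 1j, 1 + 1j, -1j]
print(polarization(euclid_norm, x, y), direct_inner(x, y))
print(parallelogram_defect(sup_norm, [1, 0], [0, 1]))
```

For the sup-norm, `polarization` still returns a number, but it is not an inner product; the nonzero parallelogram defect is the reason.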

10.2 Orthogonality, Orthogonal Complement and Duality

First of all, let us consider orthogonality.

Definition 10.2.1. Let $X$ be a Hilbert space over $\mathbb{F}$. (i) The angle between two vectors $x, y \in X$ is defined by
$$\theta_{x,y} = \begin{cases} 0, & x = 0 \text{ or } y = 0;\\ \arccos\dfrac{\Re\langle x, y\rangle}{\|x\|\|y\|}, & \text{otherwise}.\end{cases}$$
(ii) Two vectors $x, y \in X$ are called orthogonal, denoted $x \perp y$, provided $\langle x, y\rangle = 0$.

(iii) For any subset $S$ of $X$, the set $S^\perp = \{x \in X : \langle x, y\rangle = 0 \text{ for all } y \in S\}$ is called the orthogonal complement of $S$.


The following result, which has a root in Corollary 9.3.6 and will be used later, has natural geometric and finite-dimensional antecedents.

Theorem 10.2.2. Let $X$ be a Hilbert space over $\mathbb{F}$. If $M$ is a nonempty closed convex subset of $X$, then there exists exactly one $x \in M$ such that $\|x\| = \inf_{y\in M}\|y\|$.

Proof. Let $\delta = \inf_{y\in M}\|y\|$. By definition there is a sequence $\{x_j\}_{j=1}^\infty$ in $M$ such that $\lim_{j\to\infty}\|x_j\| = \delta$. Since $M$ is convex, $2^{-1}(x_j + x_k) \in M$, and consequently $\|x_k + x_j\|^2 = 4\|2^{-1}(x_k + x_j)\|^2 \ge 4\delta^2$. Using the parallelogram law, we obtain
$$4\delta^2 + \|x_k - x_j\|^2 \le \|x_k + x_j\|^2 + \|x_k - x_j\|^2 = 2(\|x_k\|^2 + \|x_j\|^2).$$
Letting $j, k \to \infty$ in the last inequality, we find that $\|x_j - x_k\| \to 0$ as $j, k \to \infty$; that is, $\{x_j\}_{j=1}^\infty$ is a Cauchy sequence, and so there is a point $x \in M$ such that $x_j \to x$ in $X$, since $X$ is complete and $M$ is closed. The continuity of the norm leads to $\|x\| = \delta$. Finally, $x$ is unique: if $y \in M$ satisfies $\|y\| = \delta$, then the foregoing argument applied to the sequence $x, y, x, y, \dots$ shows that this sequence is Cauchy, and hence $x = y$.

Next, we introduce the concept of a direct sum, which has implicitly appeared in Corollary 8.4.4.

Definition 10.2.3. Let $X$ and $Y$ be two linear subspaces of a given linear space $Z$ over $\mathbb{F}$. Then $Z$ is said to be the direct sum of $X$ and $Y$, denoted $Z = X \oplus Y$, provided that $X \cap Y = \{0\}$ and every $z \in Z$ can be expressed uniquely in the form $z = x + y$ with $x \in X$ and $y \in Y$.

We are about to prove a theorem on decomposing a Hilbert space into a direct sum of mutually orthogonal closed subspaces. Before doing so, we need the following result.

Lemma 10.2.4. Let $M$ be a proper closed linear subspace of a Hilbert space $X$ over $\mathbb{F}$. Then there exists a nonzero $z \in X$ such that $\langle z, y\rangle = 0$ for all $y \in M$.


Proof. Given any $x \in X$, the set $x + M$ is a closed convex set. Thus by Theorem 10.2.2 there exists a unique $z \in x + M$ such that $\|z\| = \inf_{y\in M}\|x + y\|$. Since $M \ne X$, we may choose $x \notin M$, which yields $z \ne 0$; we now prove that $\langle z, y\rangle = 0$ for all $y \in M$. Given any $y \in M$ and any $\alpha \in \mathbb{F}$, we have $z + \alpha y \in x + M$. By our choice of $z$ we then have $\|z + \alpha y\|^2 \ge \|z\|^2$, and consequently
$$|\alpha|^2\|y\|^2 + 2\Re(\bar\alpha\langle z, y\rangle) \ge 0 \quad\text{for all } \alpha \in \mathbb{F}.$$
Choose $\theta \in [0, 2\pi)$ with $e^{-i\theta}\langle z, y\rangle = |\langle z, y\rangle|$ and let $\alpha = -te^{i\theta}$, $t > 0$. The inequality obtained by substituting this $\alpha$ is then $t\|y\|^2 \ge 2|\langle z, y\rangle|$. Letting $t \to 0$ we get $\langle z, y\rangle = 0$. Because $y \in M$ was arbitrary, we must have $\langle z, y\rangle = 0$ for all $y \in M$.

Theorem 10.2.5. If $M$ is a proper closed linear subspace of a Hilbert space $X$ over $\mathbb{F}$, then $X = M \oplus M^\perp$ and $M^\perp$ is closed.

Proof. Given $x \in X$, apply the procedure in the proof of Lemma 10.2.4 to obtain $z \in X$ such that $z \in x + M$ and $\langle z, y\rangle = 0$ for all $y \in M$. Then $z \in M^\perp$ and $z = x - y$ for some $y \in M$. Hence $x = y + z$ with $y \in M$ and $z \in M^\perp$. Noting that $M \cap M^\perp = \{0\}$, since $x \in M \cap M^\perp \Rightarrow \langle x, x\rangle = 0 \Rightarrow x = 0$, we thus have $X = M \oplus M^\perp$. To see that $M^\perp$ is closed, suppose that $x_j \in M^\perp$ and $x_j \to x$ in $X$. Then for any $y \in M$ we have, by the Cauchy-Schwarz inequality,
$$|\langle y, x\rangle| \le |\langle y, x_j - x\rangle| + |\langle y, x_j\rangle| \le \|y\|\|x_j - x\| \to 0 \quad\text{as } j \to \infty,$$
whence $x \in M^\perp$.

Finally, we consider the dual space of a Hilbert space. Below is the classical Riesz representation theorem.

Theorem 10.2.6. Given a Hilbert space $X$ over $\mathbb{F}$ and $y \in X$, define $L_y: X \to \mathbb{F}$ by $L_y(x) = \langle x, y\rangle$. Then $L_y \in X^*$ and $\|L_y\| = \|y\|$. Conversely, for every $f \in X^*$ there exists a unique $y \in X$ such that $f = L_y$.

Proof. Clearly, $L_y \in X^*$ with $\|L_y\| \le \|y\|$. Note that $\|y\|^2 = L_y(y) \le \|L_y\|\|y\|$.


So $\|L_y\| = \|y\|$. Conversely, let $f \in X^*$ be given. If $f = 0$, then $f = L_y$ with $y = 0$. Suppose that $f \ne 0$. Then we may assume without loss of generality that $\|f\| = 1$, since $\|f\|^{-1}f = L_y \Rightarrow f = L_{\|f\|y}$. For such an $f$, let $M = \{x \in X : f(x) = 0\}$. Then $M$ is a closed linear subspace of $X$, and $M \ne X$, for otherwise $f = 0$, a contradiction. By Lemma 10.2.4 we can obtain a nonzero $z \in M^\perp$; since $z \notin M$, we have $f(z) \ne 0$. Now for any $x \in X$,
$$x - f(x)f(z)^{-1}z \in M \quad\text{and}\quad \big\langle x - f(x)f(z)^{-1}z, z\big\rangle = 0.$$
Accordingly,
$$\langle x, z\rangle = \big\langle f(x)f(z)^{-1}z, z\big\rangle = f(x)f(z)^{-1}\|z\|^2.$$
Taking $y = \overline{f(z)}\,\|z\|^{-2}z$, we further get $f(x) = \langle x, y\rangle$, and then $f = L_y$. To see the uniqueness, assume there is another point $w \in X$ such that $f = L_w$. Then $\langle x, w - y\rangle = 0$ for all $x \in X$, and hence $\|w - y\|^2 = \langle w - y, w - y\rangle = 0$, i.e., $w = y$.

Here is an easy consequence of Theorem 10.2.6, which says that any Hilbert space is self-dual.

Corollary 10.2.7. Let $X$ be a Hilbert space over $\mathbb{F}$. Then the mapping $L: X \to X^*$ given by $L(x)(\cdot) = \langle\cdot, x\rangle$ is a conjugate-linear isometry from $X$ onto $X^*$. Moreover, $X^*$ is also a Hilbert space over $\mathbb{F}$.

Proof. The first part has just been proved in Theorem 10.2.6. Furthermore, it is easily seen that $L(x_1 + x_2) = L(x_1) + L(x_2)$, so the isometry is additive. Since
$$L(\alpha x)(y) = \langle y, \alpha x\rangle = \bar\alpha\langle y, x\rangle = \bar\alpha L(x)(y) \quad\text{for all } x, y \in X \text{ and } \alpha \in \mathbb{F},$$
we conclude $L(\alpha x) = \bar\alpha L(x)$. If $X^*$ is now equipped with the inner product $\langle L(x), L(y)\rangle = \langle y, x\rangle$, then $X^*$ is a Hilbert space over $\mathbb{F}$.
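The proof of Theorem 10.2.6 is constructive: from the kernel $M$ of $f$ and any nonzero $z \in M^\perp$ it produces $y = \overline{f(z)}\,\|z\|^{-2}z$. The Python sketch below follows these steps in $\mathbb{C}^3$ for one concrete functional; the coefficients `c` and the choice $z = 2i\,\bar c$ are assumptions made for illustration. With this $z$ the value $f(z)$ is purely imaginary, so the complex conjugate in the formula for $y$ genuinely matters.

```python
import math


def inner(x, y):
    """<x, y> = sum_j x_j * conj(y_j) on C^n."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))


# A linear functional on C^3, f(x) = sum_j c_j x_j; the coefficients c are illustrative.
c = [2 - 1j, 0.5j, -3]


def f(x):
    return sum(cj * xj for cj, xj in zip(c, x))


# Any nonzero z in M-perp (M = ker f) will do; this sketch assumes z = 2i * conj(c).
z = [2j * complex(cj).conjugate() for cj in c]

# Representer from the proof of Theorem 10.2.6: y = conj(f(z)) * ||z||^(-2) * z.
y = [complex(f(z)).conjugate() / inner(z, z).real * zj for zj in z]
y_norm = math.sqrt(inner(y, y).real)          # should equal ||f||

x = [1 + 1j, -2, 0.25j]
residual = [xj - f(x) / f(z) * zj for xj, zj in zip(x, z)]   # should lie in M = ker f
print(f(x), inner(x, y), f(residual), y_norm)
```

In $\mathbb{C}^n$ the representer is of course just the conjugated coefficient vector $\bar c$; the point of the sketch is that the kernel-based construction of the proof arrives at the same vector.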

10.3 Orthonormal Sets and Bases

To better understand the structure of a Hilbert space, in this section we consider the concepts of orthonormal sets and bases.


Definition 10.3.1. Let $X$ be a Hilbert space over $\mathbb{F}$. A set $S \subseteq X$ is called orthogonal if any two different elements in $S$ are orthogonal. An orthonormal set is an orthogonal set consisting entirely of elements of norm 1.

A constructive method of orthonormalizing a set of vectors in a Hilbert space is the Gram-Schmidt process, named for Jørgen Pedersen Gram and Erhard Schmidt, as follows.

Theorem 10.3.2. Let $\{x_j\}_{j=1}^\infty$ be a countable linearly independent set of a Hilbert space $X$ over $\mathbb{F}$. Then a countable orthonormal set $\{e_j\}_{j=1}^\infty$ can be constructed so that $\mathrm{span}\{e_j\}_{j=1}^n = \mathrm{span}\{x_j\}_{j=1}^n$ for all $n \in \mathbb{N}$.

Proof. Define $e_1 = \|x_1\|^{-1}x_1$ and proceed inductively. Suppose that $e_1, e_2, \dots, e_n$ have been defined. Then $e_{n+1} = \|y_{n+1}\|^{-1}y_{n+1}$, where
$$y_{n+1} = x_{n+1} - \sum_{j=1}^n\langle x_{n+1}, e_j\rangle e_j.$$
Here $\|y_{n+1}\| \ne 0$, since otherwise we would have $x_{n+1} \in \mathrm{span}\{e_j\}_{j=1}^n = \mathrm{span}\{x_j\}_{j=1}^n$, contradicting the linear independence of $\{x_j\}_{j=1}^\infty$. It is clear that $\mathrm{span}\{e_j\}_{j=1}^{n+1} = \mathrm{span}\{x_j\}_{j=1}^{n+1}$, because the corresponding identity holds for $\{e_j\}_{j=1}^n$ and $\{x_j\}_{j=1}^n$. Finally, for $j \in \{1, 2, \dots, n\}$, we have
$$\langle e_{n+1}, e_j\rangle = \big(\langle x_{n+1}, e_j\rangle - \langle x_{n+1}, e_j\rangle\langle e_j, e_j\rangle\big)\|y_{n+1}\|^{-1} = 0,$$
and consequently the set $\{e_j\}_{j=1}^\infty$ obtained by this inductive construction is an orthonormal set with the desired property.

Example 10.3.3. Given $-\infty \le a < b \le \infty$ and a function $\omega: (a, b) \to (0, \infty)$ with the property that $\int_{(a,b)} t^n\omega(t)\,dm_{id}(t)$ converges for all $n \in \mathbb{N}$, define the Hilbert space $L^{2,\omega}((a, b), \mathbb{F})$ to be the linear space of all $\mathbb{F}$-valued functions $f$ on $(a, b)$ with
$$\int_{(a,b)}|f|^2\omega\,dm_{id} < \infty.$$

Of course, two functions in the space are identified when $f_1 = f_2$ $\omega\,dm_{id}$-a.e. on $(a, b)$. It is not hard to prove that the linearly independent set $\{1, t, t^2, \dots\}$


has a linear span which is dense in $L^{2,\omega}((a, b), \mathbb{F})$. This set, together with the Gram-Schmidt process through the inner product
$$\langle f_1, f_2\rangle_\omega = \int_{(a,b)} f_1\overline{f_2}\,\omega\,dm_{id},$$
yields various families of classical orthonormal functions below. (i) If $\omega = 1$ and $a = -1$, $b = 1$, then the process generates the Legendre polynomials, named after Adrien-Marie Legendre. (ii) If $\omega(t) = (1 - t^2)^{-1/2}$ and $a = -1$, $b = 1$, then the process produces the Chebyshev polynomials, named after Pafnuty Chebyshev. (iii) If $\omega(t) = t^{q-1}(1 - t)^{p-q}$ with $q > 0$, $p - q > -1$ and $a = 0$, $b = 1$, then the process generates the Jacobi polynomials, named after Carl Gustav Jacob Jacobi. (iv) If $\omega(t) = e^{-t^2}$ and $a = -\infty$, $b = \infty$, then the process produces the Hermite polynomials, named after Charles Hermite. (v) If $\omega(t) = e^{-t}$ and $a = 0$, $b = \infty$, then the process gives the Laguerre polynomials, named after Edmond Laguerre.

Lemma 10.3.4. For $n \in \mathbb{N}$ let $\{e_j\}_{j=1}^n$ be an orthonormal set of a Hilbert space $X$ over $\mathbb{F}$. Then: (i) $\sum_{j=1}^n|\langle x, e_j\rangle|^2 \le \|x\|^2$ for all $x \in X$;

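(ii) $\big\langle x - \sum_{j=1}^n\langle x, e_j\rangle e_j,\, e_k\big\rangle = 0$ for all $x \in X$ and $k = 1, 2, \dots, n$.

Proof. (i) This follows from
$$0 \le \Big\|x - \sum_{j=1}^n\langle x, e_j\rangle e_j\Big\|^2 = \|x\|^2 - \sum_{j=1}^n|\langle x, e_j\rangle|^2,$$
where the last equality is obtained by expanding in the usual fashion and using the orthonormality of $\{e_j\}_{j=1}^n$. (ii) This follows from
$$\Big\langle x - \sum_{j=1}^n\langle x, e_j\rangle e_j,\, e_k\Big\rangle = \langle x, e_k\rangle - \langle x, e_k\rangle\langle e_k, e_k\rangle = 0.$$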
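The Gram-Schmidt step of Theorem 10.3.2 can be run exactly on the monomials of Example 10.3.3 (i), where $\omega = 1$ on $(-1, 1)$. The Python sketch below stores real polynomials as lists of `Fraction` coefficients and, to stay in exact rational arithmetic, omits the final normalization $e_{n+1} = \|y_{n+1}\|^{-1}y_{n+1}$; subtracting $\langle x, e\rangle\langle e, e\rangle^{-1}e$ for an unnormalized $e$ is the same as subtracting $\langle x, \hat e\rangle\hat e$ for its normalization $\hat e$. The output is therefore the sequence of monic multiples of the Legendre polynomials, and the orthogonality of each residual to the earlier vectors is exactly Lemma 10.3.4 (ii). The representation and helper names are our own.

```python
from fractions import Fraction


def inner(p, q):
    """<p, q> = integral_{-1}^{1} p(t) q(t) dt for real polynomials given as coefficient lists (index = power)."""
    total = Fraction(0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if (i + j) % 2 == 0:            # odd powers integrate to zero over [-1, 1]
                total += a * b * Fraction(2, i + j + 1)
    return total


def add_scaled(q, alpha, p):
    """Return q + alpha * p, padding the shorter coefficient list with zeros."""
    n = max(len(p), len(q))
    p = p + [Fraction(0)] * (n - len(p))
    q = q + [Fraction(0)] * (n - len(q))
    return [qi + alpha * pi for qi, pi in zip(q, p)]


def gram_schmidt_monic(count):
    """Orthogonalize 1, t, ..., t^(count-1) as in Theorem 10.3.2, skipping the normalization step."""
    basis = []
    for n in range(count):
        x = [Fraction(0)] * n + [Fraction(1)]       # the monomial t^n
        y = x
        for e in basis:
            # y <- y - (<x, e> / <e, e>) e: remove the component of x along e
            y = add_scaled(y, -inner(x, e) / inner(e, e), e)
        basis.append(y)
    return basis


for poly in gram_schmidt_monic(4):
    print([str(a) for a in poly])
```

Normalizing afterwards only requires dividing each polynomial by $\sqrt{\langle y, y\rangle}$, which leaves the rationals; that is why the sketch stops one step short.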


Corollary 10.3.5. Let $I$ be any index set and $\{e_j\}_{j\in I}$ be an orthonormal set of a Hilbert space $X$ over $\mathbb{F}$. Then $S = \{e_j : \langle x, e_j\rangle \ne 0\}$ is countable for any $x \in X$.

Proof. For each $n \in \mathbb{N}$, define $S_n = \{e_j : |\langle x, e_j\rangle|^2 > n^{-1}\|x\|^2\}$. By Lemma 10.3.4 each $S_n$ contains at most $n - 1$ elements. Since $S = \bigcup_{n=1}^\infty S_n$, we conclude that $S$ is countable.

Remark 10.3.6. Given an arbitrary orthonormal set $\{e_j\}_{j\in I}$ indexed by a set $I$, in view of Lemma 10.3.4 and Corollary 10.3.5, the sums $\sum_{j\in I}|\langle x, e_j\rangle|^2$ and $\sum_{j\in I}\langle x, e_j\rangle e_j$ can be written as the series $\sum_{k=1}^\infty|\langle x, e_{j_k}\rangle|^2$ and $\sum_{k=1}^\infty\langle x, e_{j_k}\rangle e_{j_k}$, respectively, where we restrict ourselves to the countably many $e_{j_k}$ for which $\langle x, e_{j_k}\rangle \ne 0$.

The following result assures that the two series in Remark 10.3.6 are well defined.

Theorem 10.3.7. Let $\{x_k\}_{k=1}^\infty$ be an orthonormal sequence in a Hilbert space $X$ over $\mathbb{F}$ and $\{c_k\}_{k=1}^\infty \subseteq \mathbb{F}$. Then $\sum_{k=1}^\infty c_kx_k$ is convergent in $X$ if and only if $\sum_{k=1}^\infty|c_k|^2 < \infty$, in which case
$$\Big\|\sum_{k=1}^\infty c_kx_k\Big\|^2 = \sum_{k=1}^\infty|c_k|^2.$$
Moreover, $\sum_{k=1}^\infty c_kx_k$ is independent of the order in which its terms are arranged.

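Proof. From the orthonormality of $\{x_k\}_{k=1}^\infty$ it follows that for $m, n \in \mathbb{N}$ with $m > n$,
$$\Big\|\sum_{k=n}^m c_kx_k\Big\|^2 = \sum_{k=n}^m|c_k|^2.$$
This, together with the completeness of $X$, implies the "if and only if" part of the theorem. To complete the proof, let $\sum_{k=1}^\infty|c_k|^2 < \infty$ and let $y = \sum_{l=1}^\infty c_{k_l}x_{k_l}$ be a rearrangement of $x = \sum_{k=1}^\infty c_kx_k$. Then
$$\|x - y\|^2 = \langle x, x\rangle + \langle y, y\rangle - \langle x, y\rangle - \langle y, x\rangle \quad\text{and}\quad \langle x, x\rangle = \langle y, y\rangle = \sum_{k=1}^\infty|c_k|^2.$$
Note that, with $s_m = \sum_{k=1}^m c_kx_k$ and $t_m = \sum_{l=1}^m c_{k_l}x_{k_l}$,
$$\langle x, y\rangle = \lim_{m\to\infty}\langle s_m, t_m\rangle = \sum_{k=1}^\infty|c_k|^2 \quad\text{and}\quad \langle y, x\rangle = \overline{\langle x, y\rangle} = \langle x, y\rangle.$$
So it follows that $\|x - y\| = 0$, and hence $x = y$.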


Here is a statement of Friedrich Wilhelm Bessel about the coefficients of an element in a Hilbert space with respect to an orthonormal set.

Theorem 10.3.8. Let $I$ be any index set and $\{e_j\}_{j\in I}$ be an orthonormal set of a Hilbert space $X$ over $\mathbb{F}$. Then the following two statements hold. (i) Bessel's inequality: $\sum_{j\in I}|\langle x, e_j\rangle|^2 \le \|x\|^2$ for all $x \in X$. (ii) $\big\langle x - \sum_{j\in I}\langle x, e_j\rangle e_j,\, e_k\big\rangle = 0$ for all $x \in X$ and $k \in I$.

Proof. (i) With the notation of Remark 10.3.6, Lemma 10.3.4 (i) gives
$$\sum_{j\in I}|\langle x, e_j\rangle|^2 = \lim_{n\to\infty}\sum_{k=1}^n|\langle x, e_{j_k}\rangle|^2 \le \|x\|^2.$$
(ii) Using the continuity of the inner product in its left-hand variable, which follows from the Cauchy-Schwarz inequality, we obtain
$$\Big\langle x - \sum_{j\in I}\langle x, e_j\rangle e_j,\, e_k\Big\rangle = \langle x, e_k\rangle - \lim_{n\to\infty}\Big\langle\sum_{m=1}^n\langle x, e_{j_m}\rangle e_{j_m},\, e_k\Big\rangle = \langle x, e_k\rangle - \langle x, e_k\rangle = 0.$$

Theorem 10.3.8 (ii) leads to the concept of an orthonormal basis for a Hilbert space.

Definition 10.3.9. Let $I$ be an index set and $X$ be a Hilbert space over $\mathbb{F}$. Then an orthonormal set $\{e_j\}_{j\in I}$ of $X$ is called an orthonormal basis for $X$ provided that $x = 0$ follows from $\langle x, e_j\rangle = 0$ for all $j \in I$.

This definition says that an orthonormal set is an orthonormal basis precisely when it is impossible to add one more element to the set while still preserving its orthonormality. An orthonormal basis is not generally a "basis" in the algebraic sense, i.e., it is not generally possible to write every member of the space as a linear combination of finitely many members of an orthonormal basis. In the infinite-dimensional case the distinction really matters; that is to say, the definition given above requires only that the span of an orthonormal basis be dense in the space, not that it equal the entire space. More precisely, we have the following theorem, involving a series identity due to Marc-Antoine Parseval.

Theorem 10.3.10. Given an index set $I$, let $\{e_j\}_{j\in I}$ be an orthonormal set in a Hilbert space $X$ over $\mathbb{F}$. Then the following statements are equivalent. (i) $\{e_j\}_{j\in I}$ is an orthonormal basis.

(ii) The closed linear span of $\{e_j\}_{j\in I}$ is $X$.

(iii) Parseval's identity: $\|x\|^2 = \sum_{j\in I}|\langle x, e_j\rangle|^2$ for all $x \in X$.

Proof. First, we prove that (i) implies (ii) and (iii). If (i) is true, then Definition 10.3.9 and Theorem 10.3.8 (ii) yield $x = \sum_{j\in I}\langle x, e_j\rangle e_j$ for all $x \in X$. Hence (ii) holds, and (iii) follows from Theorem 10.3.7. Next, we prove that each of (ii) and (iii) implies (i). (ii)⇒(i): Suppose that $y \in X$ satisfies $\langle y, e_j\rangle = 0$ for all $j \in I$. To prove $y = 0$, consider $S = \{x \in X : \langle y, x\rangle = 0\}$. It is easy to see that $S$ is a linear subspace of $X$. Since $e_j \in S$, it follows that $S$ must contain the linear span of $\{e_j\}_{j\in I}$. On the other hand, $S$ is closed in view of the continuity of the inner product, and so $S$ must contain the closure of the linear span of $\{e_j\}_{j\in I}$. Hence $S = X$ by (ii). In particular, $y \in S$ and so $\langle y, y\rangle = 0$, whence $y = 0$, as required. (iii)⇒(i): Suppose on the contrary that $\{e_j\}_{j\in I}$ does not form an orthonormal basis of $X$. Then there exists a nonzero $x \in X$ such that $\langle x, e_j\rangle = 0$ for all $j \in I$. Then
$$0 \ne \|x\|^2 = \sum_{j\in I}|\langle x, e_j\rangle|^2 = 0.$$

This is a contradiction.

Example 10.3.11. (i) The set $\{(1, 0, 0), (0, 1, 0), (0, 0, 1)\}$ forms an orthonormal basis of $\mathbb{R}^3$.

(ii) The set $\{e_j : j \in \mathbb{N}\}$, where the $j$-th entry of $e_j \in \ell^2$ is 1 and all other entries are 0, forms an orthonormal basis of the real sequence space $\ell^2$.

(iii) The set $\{(2\pi)^{-1/2}e^{int}\}_{n=-\infty}^\infty$ is an orthonormal basis for $LRS^2_{id}([0, 2\pi], \mathbb{C})$. So $C([0, 2\pi], \mathbb{C})$ is dense in $LRS^2_{id}([0, 2\pi], \mathbb{C})$. In fact, this basis is fundamental to the study of Fourier series, named in honour of Joseph Fourier.
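Bessel's inequality and Parseval's identity can be watched for the basis in Example 10.3.11 (iii). The Python sketch below takes $f(t) = t$ on $[0, 2\pi]$, for which $\|f\|^2 = 8\pi^3/3$, approximates all integrals by one fixed midpoint rule (the node count is our own choice), and sums $|\langle f, e_n\rangle|^2$ over $|n| \le K$. Because the sampled exponentials with $|n| \le 100$ stay exactly orthonormal for the discrete inner product defined by that midpoint rule, the computed partial sums obey Bessel's inequality exactly (up to rounding), while the gap to $\|f\|^2$ shrinks as $K$ grows, in line with Parseval's identity.

```python
import cmath
import math

NUM_NODES = 2000                    # midpoint nodes on [0, 2*pi] (a choice made for this sketch)
H = 2 * math.pi / NUM_NODES
NODES = [(k + 0.5) * H for k in range(NUM_NODES)]


def inner(u, v):
    """Midpoint-rule version of <u, v> = integral_0^{2pi} u(t) * conj(v(t)) dt."""
    return H * sum(u(t) * complex(v(t)).conjugate() for t in NODES)


def basis_vector(n):
    """e_n(t) = (2*pi)^(-1/2) * exp(i n t), the normalized exponentials of Example 10.3.11 (iii)."""
    return lambda t: cmath.exp(1j * n * t) / math.sqrt(2 * math.pi)


def f(t):
    return t                        # sample function; ||f||^2 = 8*pi^3/3 exactly


def bessel_partial_sum(K):
    """sum_{|n| <= K} |<f, e_n>|^2, which Bessel's inequality bounds by ||f||^2."""
    return sum(abs(inner(f, basis_vector(n))) ** 2 for n in range(-K, K + 1))


norm_sq = inner(f, f).real
for K in (1, 10, 100):
    print(K, bessel_partial_sum(K), norm_sq)
```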


Theorem 10.3.12. (i) Every Hilbert space over $\mathbb{F}$ has an orthonormal basis. (ii) Any orthonormal basis in a separable Hilbert space over $\mathbb{F}$ is countable.

Proof. (i) Let $X$ be a Hilbert space over $\mathbb{F}$ with the inner product $\langle\cdot,\cdot\rangle$ and the norm $\|\cdot\|$, and consider the collection $\mathcal{E}$ of orthonormal subsets of $X$. It is clear that $\mathcal{E}$ is nonempty and can be partially ordered under inclusion. If $\mathcal{F}$ is any totally ordered subcollection of $\mathcal{E}$, the set $U = \bigcup_{S\in\mathcal{F}}S$ is a member of $\mathcal{E}$ and an upper bound for $\mathcal{F}$. By Zorn's lemma there is a maximal orthonormal set $M$. Since $M$ is maximal, we conclude from Definition 10.3.9 that $M$ is an orthonormal basis; otherwise, there would be a nonzero $x_0 \in X \setminus M$ such that $\langle x_0, e_j\rangle = 0$ for every $e_j \in M$, and then $\{\|x_0\|^{-1}x_0\} \cup M$ would be an orthonormal set properly containing $M$, contradicting the maximality of $M$. (ii) Suppose that $X$ is separable and that $\{e_j\}_{j\in I}$ is an uncountable orthonormal basis indexed by $I$. For $j, k \in I$ with $j \ne k$, we have $\|e_j - e_k\|^2 = \|e_j\|^2 + \|e_k\|^2 = 2$, and so the open balls $B^X_{2^{-1}}(e_j)$ (with centre $e_j$ and radius $2^{-1}$) in $X$ are mutually disjoint. If $\{x_j\}_{j=1}^\infty$ is a countable dense subset of $X$, then, because $\{e_j\}_{j\in I}$ is uncountable, there is an open ball $B^X_{2^{-1}}(e_{j_0})$ which is disjoint from $\{x_j\}_{j=1}^\infty$. Hence $e_{j_0}$ is not in the closure of $\{x_j\}_{j=1}^\infty$, which contradicts the density of $\{x_j\}_{j=1}^\infty$ in $X$. We are done.

Example 10.3.13. There is a non-separable Hilbert space. As a matter of fact, let $X$ be the linear space of all continuous functions $f: \mathbb{R} \to \mathbb{C}$ satisfying
$$|||f|||_2 = \lim_{n\to\infty}\Big((2n)^{-1}\int_{-n}^{n}|f(x)|^2\,dx\Big)^{2^{-1}} < \infty.$$
Although $|||f|||_2 = 0$ does not ensure $f = 0$ on $\mathbb{R}$, if $Y = \{f \in X : |||f|||_2 = 0\}$, then $X/Y$ is a normed linear space over $\mathbb{C}$. If $H$ is the completion of this quotient space, equipped with the inner product
$$\langle f, g\rangle = \lim_{n\to\infty}(2n)^{-1}\int_{-n}^{n} f(x)\overline{g(x)}\,dx,$$
then $H$ is a Hilbert space over $\mathbb{C}$. Note that $\{e^{it(\cdot)} + Y\}_{t\in\mathbb{R}}$ is an orthonormal set of $H$, and that the sets $\{f + Y \in H : |||f(\cdot) - e^{it(\cdot)}|||_2 < 2^{-1}\}$ and $\{f + Y \in H : |||f(\cdot) - e^{is(\cdot)}|||_2 < 2^{-1}\}$


are disjoint for $t \ne s$. So $H$ is not separable.

Corollary 10.3.14. Any two infinite dimensional separable Hilbert spaces over $\mathbb{F}$ are isometrically isomorphic.

Proof. Suppose that $X$ and $Y$ are two such spaces, equipped with the inner products $\langle\cdot,\cdot\rangle_X$ and $\langle\cdot,\cdot\rangle_Y$. Theorem 10.3.12 tells us that there are sequences $\{x_j\}_{j=1}^\infty$ and $\{y_j\}_{j=1}^\infty$ that form orthonormal bases for $X$ and $Y$, respectively. If $x \in X$ and $y \in Y$, then
$$x = \sum_{j=1}^\infty\langle x, x_j\rangle_Xx_j \quad\text{and}\quad y = \sum_{j=1}^\infty\langle y, y_j\rangle_Yy_j.$$
Define a mapping $T: X \to Y$ by $T(x) = y$ whenever $\langle x, x_j\rangle_X = \langle y, y_j\rangle_Y$ for all $j \in \mathbb{N}$. It is clear that $T$ is linear and one-to-one, and it maps $X$ onto $Y$, since $(\langle x, x_j\rangle_X)_{j=1}^\infty$ and $(\langle y, y_j\rangle_Y)_{j=1}^\infty$ run through all elements of the $\mathbb{F}$-valued sequence space $\ell^2(\mathbb{N}, \mathbb{F})$. In addition,
$$\|T(x)\|^2 = \sum_{j=1}^\infty|\langle y, y_j\rangle_Y|^2 = \sum_{j=1}^\infty|\langle x, x_j\rangle_X|^2 = \|x\|^2.$$
Thus $T$ is an isometric isomorphism. The proof is complete.

10.4 Five Special Bounded Operators

For a Hilbert space $X$ over $\mathbb{F}$, the elements of $\mathcal{B}(X)$ are of particular interest. To see this, suppose that $T \in \mathcal{B}(X)$. For each fixed $y \in X$, the mapping that sends $x$ to $\langle T(x), y\rangle$ is a continuous linear functional on $X$. Thus, by the Riesz representation theorem, there exists a unique $z \in X$ such that $\langle T(x), y\rangle = L_z(x) = \langle x, z\rangle$. This pairing actually produces a new operator on $X$; see also Theorem 9.5.6.

Definition 10.4.1. Let $X$ be a Hilbert space over $\mathbb{F}$. If $T \in \mathcal{B}(X)$, then its adjoint operator $T^*$ is defined by $\langle T(x), y\rangle = \langle x, T^*(y)\rangle$ for all $x, y \in X$.
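In $\mathbb{C}^n$ with the inner product of Example 10.1.4 (i), Definition 10.4.1 forces the matrix of $T^*$ with respect to the standard basis to be the conjugate transpose of the matrix of $T$. The Python sketch below (list-of-rows matrices, helper names and the random seed are our own choices) checks $\langle T(x), y\rangle = \langle x, T^*(y)\rangle$ on random data and $T^{**} = T$ exactly.

```python
import random


def mat_vec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def adjoint(A):
    """Conjugate transpose: (A*)_{jk} = conj(A_{kj})."""
    return [[complex(A[k][j]).conjugate() for k in range(len(A))] for j in range(len(A[0]))]


def inner(x, y):
    """<x, y> = sum_j x_j * conj(y_j)."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))


rng = random.Random(1)


def random_complex():
    return complex(rng.uniform(-1, 1), rng.uniform(-1, 1))


A = [[random_complex() for _ in range(3)] for _ in range(3)]
x = [random_complex() for _ in range(3)]
y = [random_complex() for _ in range(3)]
lhs = inner(mat_vec(A, x), y)             # <T(x), y>
rhs = inner(x, mat_vec(adjoint(A), y))    # <x, T*(y)>
print(lhs, rhs)
```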


The three facts (i)-(iii) in Theorem 10.4.2 below are the required conditions for $*$ to be an involution.

Theorem 10.4.2. Let $X$ be a Hilbert space over $\mathbb{F}$. Then the operation $*$ maps $\mathcal{B}(X)$ to itself and has the following properties for all $T, S \in \mathcal{B}(X)$ and $\alpha, \beta \in \mathbb{F}$: (i) $(\alpha T + \beta S)^* = \bar\alpha T^* + \bar\beta S^*$; (ii) $(TS)^* = S^*T^*$; (iii) $T^{**} = T$; (iv) $\|T^*\| = \|T\|$; (v) $\|T^*T\| = \|T\|^2$.

Proof. First of all, we verify that $*$ maps $\mathcal{B}(X)$ to $\mathcal{B}(X)$. The linearity of $T^*$ follows from the following calculation:
$$\langle x, T^*(\alpha y_1 + \beta y_2)\rangle = \langle T(x), \alpha y_1 + \beta y_2\rangle = \bar\alpha\langle T(x), y_1\rangle + \bar\beta\langle T(x), y_2\rangle = \langle x, \alpha T^*(y_1) + \beta T^*(y_2)\rangle.$$

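By the definition of the operator norm, we have
$$\|T^*\| = \sup_{\|y\|=1}\|T^*(y)\| \le \sup_{\|x\|=1,\,\|y\|=1}|\langle x, T^*(y)\rangle| = \sup_{\|x\|=1,\,\|y\|=1}|\langle T(x), y\rangle| \le \sup_{\|x\|=1}\|T(x)\| = \|T\|.$$
This implies that $T^*$ is bounded, i.e., $T^* \in \mathcal{B}(X)$. Next, we check the five properties. (i) For any $x, y \in X$, we have
$$\langle x, (\alpha T + \beta S)^*(y)\rangle = \alpha\langle T(x), y\rangle + \beta\langle S(x), y\rangle = \langle x, (\bar\alpha T^* + \bar\beta S^*)(y)\rangle.$$
(ii) This follows from $\langle x, (TS)^*(y)\rangle = \langle TS(x), y\rangle = \langle S(x), T^*(y)\rangle = \langle x, S^*T^*(y)\rangle$.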


(iii) This follows from $\langle x, T^{**}(y)\rangle = \langle T^*(x), y\rangle = \overline{\langle y, T^*(x)\rangle} = \overline{\langle T(y), x\rangle} = \langle x, T(y)\rangle$. (iv) It is known that $\|T^*\| \le \|T\|$ and $T^{**} = T$. So $\|T\| = \|T^{**}\| \le \|T^*\|$. This gives $\|T\| = \|T^*\|$. (v) By (iv), we get $\|T^*T\| \le \|T^*\|\|T\| = \|T\|^2$. For the reverse inequality, we note that
$$\|T(x)\|^2 = \langle T^*T(x), x\rangle \le \|T^*T(x)\|\|x\| \le \|T^*T\|\|x\|^2.$$
So $\|T\|^2 \le \|T^*T\|$, which completes the proof.

Example 10.4.3. If $T$ is the operator on $\ell^2(\mathbb{N}, \mathbb{C})$ defined by $T(x_1, x_2, \dots) = (0, x_1, x_2, \dots)$, then $T^*(x_1, x_2, \dots) = (x_2, x_3, \dots)$. Clearly, $\|T\| = \|T^*\| = 1$.

In what follows, we deal with five special types of bounded linear operators on a given Hilbert space: self-adjoint, normal, nonnegative, unitary, and projection operators.

Definition 10.4.4. Let $X$ be a Hilbert space over $\mathbb{F}$. An operator $T \in \mathcal{B}(X)$ is said to be a self-adjoint or symmetric operator provided that $T = T^*$.

Example 10.4.5. (i) On a finite-dimensional Hilbert space over $\mathbb{F}$, a self-adjoint operator is one that is its own adjoint, or, equivalently, one whose matrix with respect to an orthonormal basis is Hermitian, where a Hermitian matrix is one which is equal to its own conjugate transpose. (ii) Given $n \in \mathbb{N}$, the operator $T$ defined by $T(f)(x) = x^nf(x)$ is a self-adjoint operator on $LRS^2_{id}([0, 1], \mathbb{C})$.

The structure of self-adjoint operators on infinite dimensional Hilbert spaces is complicated somewhat by the following characterization and the remark after it.

Theorem 10.4.6. Let $X$ be a Hilbert space over $\mathbb{F}$ and $T \in \mathcal{B}(X)$.


Jie Xiao

(i) If $T$ is self-adjoint, then $\|T\| = \sup_{\|x\|=1}|\langle T(x), x\rangle|$.

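(ii) If $\mathbb{F} = \mathbb{C}$, then $T$ is self-adjoint when and only when $\langle T(x), x\rangle$ is real for all $x \in X$. In particular, $T = 0$ when and only when $\langle T(x), x\rangle = 0$ for all $x \in X$.

Proof. (i) The Cauchy-Schwarz inequality implies $\|T\| \ge \sup_{\|x\|=1}|\langle T(x), x\rangle|$. For the reverse estimate, write $K = \sup_{\|z\|=1}|\langle T(z), z\rangle|$, so that $|\langle T(u), u\rangle| \le K\|u\|^2$ for all $u \in X$. Using this and the parallelogram law, we obtain that for any $x, y \in X$ with $\|x\| = 1$,
$$2|\langle T(x), y\rangle + \langle T(y), x\rangle| = |\langle T(x+y), x+y\rangle - \langle T(x-y), x-y\rangle| \le K\left(\|x+y\|^2 + \|x-y\|^2\right) = 2K\left(1 + \|y\|^2\right).$$
If $T(x) \neq 0$, take $y = T(x)\|T(x)\|^{-1}$; since $T$ is self-adjoint, $\langle T(x), y\rangle = \langle T(y), x\rangle = \|T(x)\|$, and so the above gives $4\|T(x)\| \le 4K$, i.e., $\|T(x)\| \le K$. This inequality is trivially valid for $T(x) = 0$. So $\|T\| \le K = \sup_{\|x\|=1}|\langle T(x), x\rangle|$, and the desired equality follows.

(ii) If $T \in \mathcal{B}(X)$ is self-adjoint, then $\langle T(x), x\rangle = \langle x, T(x)\rangle = \overline{\langle T(x), x\rangle}$ for all $x \in X$. Hence $\langle T(x), x\rangle$ is real for all $x \in X$. Conversely, if $f(x) = \langle T(x), x\rangle$ is real for all $x \in X$, then
$$f(x+y) = f(x) + f(y) + \langle T(y), x\rangle + \langle T(x), y\rangle;\qquad f(x+iy) = f(x) + f(y) + i\langle T(y), x\rangle - i\langle T(x), y\rangle.$$
Since $f$ is real-valued, we conclude from these two identities that there are $r, s \in \mathbb{R}$ such that $\langle T(y), x\rangle + \langle T(x), y\rangle = r$ and $\langle T(y), x\rangle - \langle T(x), y\rangle = is$. This yields
$$\langle T(y), x\rangle = 2^{-1}(r + is)\quad\text{and}\quad\langle T(x), y\rangle = 2^{-1}(r - is),$$
whence
$$\langle T(y), x\rangle = \overline{\langle T(x), y\rangle} = \overline{\langle x, T^*(y)\rangle} = \langle T^*(y), x\rangle.$$
Consequently, for any $y \in X$, taking $x = (T - T^*)(y)$ gives $\langle(T - T^*)(y), (T - T^*)(y)\rangle = 0$, i.e., $T = T^*$.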

For the second half of (ii), it is enough to prove the sufficiency. Suppose that $\langle T(x), x\rangle = 0$ for all $x \in X$. Then $T$ is self-adjoint, and hence, for any $x \in X$ and $y = T(x)$,
$$4\langle T(x), y\rangle = \langle T(x+y), x+y\rangle - \langle T(x-y), x-y\rangle = 0.$$
This gives $\|T(x)\|^2 = 0$ for all $x \in X$, i.e., $T = 0$.

Remark 10.4.7. Unfortunately, Theorem 10.4.6 (ii) fails for real Hilbert spaces. For instance, if $X = \mathbb{R}^2$ and $T(x) = T(x_1, x_2) = (-x_2, x_1)$ for all $x = (x_1, x_2) \in \mathbb{R}^2$, then $\langle T(x), x\rangle = 0$ for all $x$, yet $0 \neq T \in \mathcal{B}(\mathbb{R}^2)$, and $T$ is not self-adjoint since $T^*(x) = T^*(x_1, x_2) = (x_2, -x_1)$ for all $x = (x_1, x_2) \in \mathbb{R}^2$.

Definition 10.4.8. Let $X$ be a Hilbert space over $\mathbb{C}$. An operator $T \in \mathcal{B}(X)$ is said to be a nonnegative operator, denoted $T \ge 0$, provided that $\langle T(x), x\rangle \ge 0$ for all $x \in X$. Moreover, we write $T_1 \le T_2$ if $T_1$ and $T_2$ are self-adjoint operators in $\mathcal{B}(X)$ and $T_2 - T_1 \ge 0$.

Example 10.4.9. (i) Any bounded operator $T$ on a given Hilbert space over $\mathbb{C}$ generates a nonnegative operator $T^*T$, since $\langle T^*T(x), x\rangle = \|T(x)\|^2 \ge 0$.

(ii) The operator defined in Example 10.4.5 (ii) is nonnegative.

(iii) The operator $T$ on $\mathbb{C}^2$ determined by $T(x_1, x_2) = (x_1 + 2x_2, 2x_1 + x_2)$ is self-adjoint but not nonnegative, since $T(1,-1) = (-1,1)$ and the inner product of $(-1,1)$ and $(1,-1)$ is $-2$.

According to Theorem 10.4.6 (ii) and Example 10.4.9 (iii), any nonnegative operator is self-adjoint, but not conversely. Here is the basic structure of a nonnegative operator.

Theorem 10.4.10. Given a Hilbert space $X$ over $\mathbb{C}$, let $I \in \mathcal{B}(X)$ be the identity operator.

(i) If $T$ is self-adjoint with $-I \le T \le I$, then $\|T\| \le 1$; if $0 \le T \le 0$, then $T = 0$.

(ii) If $T \ge 0$, then $T + I$ is invertible, and the generalized Cauchy-Schwarz inequality holds:
$$|\langle T(x), y\rangle| \le \sqrt{\langle T(x), x\rangle\langle T(y), y\rangle}\quad\text{for all } x, y \in X.$$

(iii) If $0 \le T_1 \le T_2 \le \cdots \le I$, then there exists a $T_\infty \in \mathcal{B}(X)$ such that $\lim_{j\to\infty}\|T_j(x) - T_\infty(x)\| = 0$ for all $x \in X$.

(iv) $T \ge 0$ when and only when there exists a unique nonnegative operator $S \in \mathcal{B}(X)$, denoted $S = \sqrt{T}$, such that $S^2 = T$.
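The iteration used for (iv) in the proof below can be previewed numerically. The following Python sketch is only a finite-dimensional illustration under stated assumptions: $X = \mathbb{R}^2$, operators are represented as 2×2 nested lists, $T$ is a hand-picked symmetric matrix with eigenvalues 0.7 and 0.3 (so $0 \le T \le I$), and the helper names `matmul` and `sqrt_by_iteration` are hypothetical. The sketch runs $T_{j+1} = 2^{-1}(U + T_j^2)$ with $U = I - T$ and $T_1 = 0$ for a fixed number of steps and returns $S = I - T_\infty$.

```python
def matmul(A, B):
    """Product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[r][k] * B[k][c] for k in range(n)) for c in range(n)] for r in range(n)]

def sqrt_by_iteration(T, steps=200):
    """Approximate sqrt(T) for symmetric T with 0 <= T <= I.

    Uses T_1 = 0, T_{j+1} = (U + T_j^2)/2 with U = I - T, and returns S = I - T_steps.
    """
    n = len(T)
    I = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    U = [[I[r][c] - T[r][c] for c in range(n)] for r in range(n)]
    Tj = [[0.0] * n for _ in range(n)]          # T_1 = 0
    for _ in range(steps):
        Tj2 = matmul(Tj, Tj)
        Tj = [[0.5 * (U[r][c] + Tj2[r][c]) for c in range(n)] for r in range(n)]
    return [[I[r][c] - Tj[r][c] for c in range(n)] for r in range(n)]

T = [[0.5, 0.2], [0.2, 0.5]]   # eigenvalues 0.7 and 0.3, so 0 <= T <= I
S = sqrt_by_iteration(T)
S2 = matmul(S, S)
error = max(abs(S2[r][c] - T[r][c]) for r in range(2) for c in range(2))
print(S, error)
```

Convergence here is fast because both eigenvalues of $T$ stay away from $0$. On an eigenvector with eigenvalue $\mu$, the recursion becomes $t_{j+1} = 2^{-1}(1 - \mu + t_j^2)$, which converges at rate roughly $1 - \sqrt{\mu}$. The fixed step count is therefore an assumption of this sketch, not a general stopping rule.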

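Proof. (i) If $T$ is self-adjoint, then $I \pm T \ge 0$ implies $\pm\langle T(x), x\rangle \le \|x\|^2$, and so $\|T\| \le 1$ by Theorem 10.4.6 (i). Moreover, $\pm T \ge 0$ implies $\langle T(x), x\rangle = 0$, and so $\|T\| = 0$ by Theorem 10.4.6 (i).

(ii) Since $T \ge 0$, we conclude
$$(T+I)(x) = 0 \Rightarrow \langle(T+I)(x), x\rangle = 0 \Rightarrow 0 \le \langle T(x), x\rangle = -\|x\|^2 \Rightarrow x = 0,$$
and so the injectivity of $T + I$ follows. The surjectivity will follow if we can prove that $M = (T+I)(X)$, the image of $T + I$, is both dense and closed. The argument used for injectivity applies to $x \in M^\perp$: for such $x$ we have $\langle(T+I)(x), x\rangle = 0$, hence $x = 0$, and so $M^\perp = \{0\}$. On the other hand, for any $x \in X$, we have
$$\|(T+I)(x)\|^2 = \|T(x)\|^2 + 2\langle T(x), x\rangle + \|x\|^2,$$
and hence $\|x\| \le \|(T+I)(x)\|$. Consequently, if $\{(T+I)(x_j)\}_{j=1}^\infty$ is a Cauchy sequence in $M$, then $\{x_j\}_{j=1}^\infty$ is a Cauchy sequence in $X$ and hence converges to some $x \in X$; by continuity, $(T+I)(x_j) \to (T+I)(x) \in M$. This shows that $M$ is complete and thereby closed in $X$. From Theorem 10.2.5 it turns out that $X = M \oplus M^\perp = M$; that is, $T + I$ is surjective.

Regarding the generalized Cauchy-Schwarz inequality, fix $x, y \in X$ and choose $\zeta \in \mathbb{C}$ with $|\zeta| = 1$ and $\zeta\langle y, T(x)\rangle = |\langle y, T(x)\rangle|$. For any $t \in \mathbb{R}$ let $\lambda = t\zeta$. Then we get
$$0 \le \langle x + \lambda y, T(x + \lambda y)\rangle = \langle x, T(x)\rangle + 2t|\langle y, T(x)\rangle| + t^2\langle y, T(y)\rangle,$$
and since this quadratic in $t$ is nonnegative, its discriminant is nonpositive, whence $|\langle T(x), y\rangle|^2 - \langle x, T(x)\rangle\langle y, T(y)\rangle \le 0$.

(iii) For $k > j$ and $x \in X$, we apply the generalized Cauchy-Schwarz inequality in (ii) to the nonnegative operator $T_k - T_j$ to get
$$\begin{aligned}\|T_k(x) - T_j(x)\|^2 &= \langle(T_k - T_j)(x), (T_k - T_j)(x)\rangle\\ &\le \sqrt{\langle(T_k - T_j)(x), x\rangle\langle(T_k - T_j)^2(x), (T_k - T_j)(x)\rangle}\\ &\le \sqrt{\langle(T_k - T_j)(x), x\rangle}\,\|x\|\\ &= \sqrt{\langle T_k(x), x\rangle - \langle T_j(x), x\rangle}\,\|x\|\end{aligned}$$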
since $0 \le T_k - T_j \le I$ implies $\|T_k - T_j\| \le 1$ and $\|(T_k - T_j)^2\| \le 1$ by (i). Note that $\{\langle T_j(x), x\rangle\}_{j=1}^\infty$ is convergent since it is increasing and bounded. So $\{T_j(x)\}_{j=1}^\infty$ is a Cauchy sequence in the Hilbert space $X$, and consequently it converges to some $T_\infty(x) \in X$. According to Theorem 9.2.4 (i), $T_\infty \in \mathcal{B}(X)$.

(iv) The sufficiency is trivial. For the necessity, suppose $T \ge 0$. Since $\langle T(x), x\rangle \le \|T\|\|x\|^2$, we have $T \le \|T\|I$; therefore, replacing $T$ by $\|T\|^{-1}T$ when $T \neq 0$, we may assume without loss of generality that $T \le I$. To find a nonnegative operator $S$ such that $S^2 = T$, let $U = I - T$ (so that $0 \le U \le I$), $T_1 = 0$ and $T_{j+1} = 2^{-1}(U + T_j^2)$ for $j \in \mathbb{N}$. By induction, we find three basic facts as follows:

(a) For $j \in \mathbb{N}$, $T_j$ is a polynomial in $U$ with nonnegative coefficients, and hence $T_j \ge 0$;

(b) For $j \in \mathbb{N}$, $T_j \le I$. In fact, $T_j \le I$ implies $\langle T_j^2(x), x\rangle = \|T_j(x)\|^2 \le \|x\|^2$, and so $T_j^2 \le I$, which, along with $U \le I$, yields $T_{j+1} \le I$;

(c) For $j \in \mathbb{N}$, $T_{j+2} - T_{j+1} \ge 0$. This is because the commutativity $T_{j+1}T_j = T_jT_{j+1}$ (both are polynomials in $U$) gives
$$T_{j+2} - T_{j+1} = 2^{-1}(T_{j+1}^2 - T_j^2) = 2^{-1}(T_{j+1} + T_j)(T_{j+1} - T_j),$$
where $T_{j+1} + T_j$ is a polynomial in $U$ with nonnegative coefficients, and so is $T_{j+1} - T_j$, by $T_2 - T_1 = 2^{-1}U$ and induction.

Now, (iii) is applied to produce a $T_\infty \in \mathcal{B}(X)$ with $\lim_{j\to\infty}T_j(x) = T_\infty(x)$ for all $x \in X$. Then $T_\infty = 2^{-1}(U + T_\infty^2)$, and hence the desired square root operator is $S = I - T_\infty$; indeed, $(I - T_\infty)^2 = I - 2T_\infty + T_\infty^2 = I - U = T$, and $S \ge 0$ because $T_\infty \le I$.

It remains to check the uniqueness. Suppose that $S_1 \ge 0$ satisfies $S_1^2 = T$. Then $S_1T = S_1^3 = TS_1$, and hence $S_1T^n = T^nS_1$ for any $n \in \mathbb{N}$. Since $S$ is the strong limit of polynomials in $T$, the previously established limit process implies $SS_1 = S_1S$. Consequently, if $y = (S - S_1)(x)$ for any $x \in X$, then
$$0 = \langle(S^2 - S_1^2)(x), y\rangle = \langle(S + S_1)(y), y\rangle = \langle S(y), y\rangle + \langle S_1(y), y\rangle,$$
and hence $S(y) = 0 = S_1(y)$ (here we use $\langle S(y), y\rangle = \|\sqrt{S}(y)\|^2$ and $\langle S_1(y), y\rangle = \|\sqrt{S_1}(y)\|^2$, the square roots existing by the first part of the proof), which ensures $(S^2 - SS_1)(x) = 0 = (S_1S - S_1^2)(x)$. Accordingly, $\|(S - S_1)(x)\|^2 = \langle(S - S_1)^2(x), x\rangle = 0$, and so $S = S_1$.

Definition 10.4.11. Given a Hilbert space $X$ over $\mathbb{F}$, let $T \in \mathcal{B}(X)$.
Then T is called unitary provided that T is an isometric isomorphism.

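Example 10.4.12. For $f \in LRS^2_{id}(\mathbb{R},\mathbb{C})$ let
$$\mathcal{F}(f)(x) = \left(\sqrt{2\pi}\right)^{-1}\int_{\mathbb{R}}f(t)\exp(itx)\,dm_{id}(t)\quad\text{for all } x \in \mathbb{R}$$
(the integral being understood first for $f$ that are also integrable, and then extended to all of $LRS^2_{id}(\mathbb{R},\mathbb{C})$ by density). Then $\mathcal{F}$ is a unitary operator on $LRS^2_{id}(\mathbb{R},\mathbb{C})$. This operator is called the Fourier transform, and its adjoint operator is determined by
$$\mathcal{F}^*(f)(x) = \left(\sqrt{2\pi}\right)^{-1}\int_{\mathbb{R}}f(t)\exp(-itx)\,dm_{id}(t)\quad\text{for all } x \in \mathbb{R}.$$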
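As a finite-dimensional analogue, offered only as an illustration and not as part of the example above, one can replace the integral by the discrete Fourier transform on $\mathbb{C}^n$ with the same sign convention. This is the matrix with entries $n^{-1/2}\exp(2\pi i\,rs/n)$. The Python sketch below uses hypothetical helper names and an arbitrary test vector. It checks numerically that this matrix $F$ satisfies $FF^* = I$ and preserves norms, which is the discrete counterpart of $\mathcal{F}$ being unitary with $\mathcal{F}^*$ given by the kernel $\exp(-itx)$.

```python
import cmath

def dft_matrix(n):
    """n x n matrix with entries n^{-1/2} exp(2 pi i r s / n), same sign as the kernel exp(itx)."""
    return [[cmath.exp(2j * cmath.pi * r * s / n) / n ** 0.5 for s in range(n)] for r in range(n)]

def adjoint(A):
    """Conjugate transpose of a matrix given as nested lists."""
    return [[A[c][r].conjugate() for c in range(len(A))] for r in range(len(A[0]))]

def matmul(A, B):
    return [[sum(A[r][k] * B[k][c] for k in range(len(B))) for c in range(len(B[0]))]
            for r in range(len(A))]

def apply(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def norm(x):
    return sum(abs(v) ** 2 for v in x) ** 0.5

n = 8
F = dft_matrix(n)
FFs = matmul(F, adjoint(F))
identity_error = max(abs(FFs[r][c] - (1 if r == c else 0)) for r in range(n) for c in range(n))

x = [complex(k, -k * k) for k in range(n)]   # an arbitrary test vector
print(identity_error, norm(apply(F, x)), norm(x))
```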

We can characterize unitary operators in terms of adjoints and inner products. Theorem 10.4.13. Given a Hilbert space X over C let T ∈ B (X). If I is the identity operator in B (X), then the following statements are equivalent: (i) T is unitary; (ii) hT(x), T(y)i = hx, yi for all x, y ∈ X, and T is surjective; (iii) TT∗ = T∗ T = I.

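Proof. (i)⇒(ii) If $T$ is unitary, then $\|T(x)\| = \|x\|$, and hence, by the polarization identity,
$$\langle T(x), T(y)\rangle = 2^{-2}\sum_{n=1}^{4}i^n\|T(x) + i^nT(y)\|^2 = 2^{-2}\sum_{n=1}^{4}i^n\|T(x + i^ny)\|^2 = 2^{-2}\sum_{n=1}^{4}i^n\|x + i^ny\|^2 = \langle x, y\rangle.$$
The surjectivity of $T$ follows from the definition of a unitary operator.

(ii)⇒(iii) Clearly, (ii) implies that $T$ is injective, so $T$ is bijective. Moreover, $\langle T^*T(x), x\rangle = \langle T(x), T(x)\rangle = \langle x, x\rangle$. This, along with Theorem 10.4.6 (ii), yields
$$\langle(T^*T - I)(x), x\rangle = 0 \Rightarrow T^*T = I \Rightarrow T^* = T^{-1} \Rightarrow TT^* = I.$$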

(iii)⇒(i) If (iii) is true, then $T$ is surjective (since $TT^* = I$) and
$$\|T(x)\|^2 = \langle T(x), T(x)\rangle = \langle x, T^*T(x)\rangle = \langle x, x\rangle = \|x\|^2.$$
This implies that $T$ is an injective isometry, and thereby $T$ is unitary.

Theorem 10.4.13 leads to the following consideration.

Definition 10.4.14. Let $X$ be a Hilbert space over $\mathbb{C}$. An operator $T \in \mathcal{B}(X)$ is said to be a normal operator if $TT^* = T^*T$.

Interestingly, the normal operators correspond to the complex numbers, in the sense of the following decomposition into real and imaginary parts.

Theorem 10.4.15. Let $X$ be a Hilbert space over $\mathbb{C}$ and $T \in \mathcal{B}(X)$. Then the following are equivalent.

(i) $T$ is normal.

(ii) $T = T_1 + iT_2$, where $T_1$ and $T_2$ are self-adjoint and $T_1T_2 = T_2T_1$.

(iii) $\|T(x)\| = \|T^*(x)\|$ for all $x \in X$.

Proof. (i)⇒(ii) Put $T_1 = 2^{-1}(T + T^*)$ and $T_2 = (2i)^{-1}(T - T^*)$. It is easy to see that $T = T_1 + iT_2$ holds with $T_1$ and $T_2$ self-adjoint, and that $T_1$ and $T_2$ commute because $T$ commutes with $T^*$.

(ii)⇒(iii) Using the given decomposition of $T$ and $T_1T_2 = T_2T_1$, we obtain
$$\begin{aligned}\|T(x)\|^2 &= \langle x, T^*T(x)\rangle = \langle x, (T_1 - iT_2)(T_1 + iT_2)(x)\rangle\\ &= \langle x, (T_1T_1 + T_2T_2)(x)\rangle = \langle x, (T_1 + iT_2)(T_1 - iT_2)(x)\rangle\\ &= \langle x, TT^*(x)\rangle = \|T^*(x)\|^2.\end{aligned}$$

(iii)⇒(i) For any $x \in X$, we have
$$\langle(TT^* - T^*T)(x), x\rangle = \langle TT^*(x), x\rangle - \langle T^*T(x), x\rangle = \|T^*(x)\|^2 - \|T(x)\|^2 = 0.$$
So Theorem 10.4.6 (ii) is employed to derive that $TT^* - T^*T = 0$, as desired.
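Theorem 10.4.15 is easy to test on matrices. In the following Python sketch, a finite-dimensional illustration with matrices and helper names chosen here rather than taken from the text, $T$ is a real 2×2 matrix that commutes with its transpose. It is therefore normal without being self-adjoint. The code forms $T_1 = 2^{-1}(T + T^*)$ and $T_2 = (2i)^{-1}(T - T^*)$ as in the proof and compares $\|T(x)\|$ with $\|T^*(x)\|$. The nilpotent matrix $N$ shows how (iii) fails for an operator that is not normal.

```python
def adjoint(A):
    """Conjugate transpose of a square matrix given as nested lists."""
    n = len(A)
    return [[complex(A[c][r]).conjugate() for c in range(n)] for r in range(n)]

def matmul(A, B):
    n = len(A)
    return [[sum(A[r][k] * B[k][c] for k in range(n)) for c in range(n)] for r in range(n)]

def apply(A, x):
    return [sum(A[r][k] * x[k] for k in range(len(x))) for r in range(len(A))]

def norm(x):
    return sum(abs(v) ** 2 for v in x) ** 0.5

def close(A, B, tol=1e-12):
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))

T = [[1 + 0j, 2 + 0j], [-2 + 0j, 1 + 0j]]        # normal: T T* = T* T = 5 I
Ts = adjoint(T)
T1 = [[(T[r][c] + Ts[r][c]) / 2 for c in range(2)] for r in range(2)]    # "real part"
T2 = [[(T[r][c] - Ts[r][c]) / 2j for c in range(2)] for r in range(2)]   # "imaginary part"

x = [1 + 2j, -3j]
print(close(matmul(T, Ts), matmul(Ts, T)))                       # T is normal
print(close(adjoint(T1), T1), close(adjoint(T2), T2), close(matmul(T1, T2), matmul(T2, T1)))
print(norm(apply(T, x)), norm(apply(Ts, x)))                     # equal, as in (iii)

N = [[0j, 0j], [1 + 0j, 0j]]                                      # not normal
e1 = [1 + 0j, 0j]
print(norm(apply(N, e1)), norm(apply(adjoint(N), e1)))           # 1.0 versus 0.0
```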

Example 10.4.16. Let $T = 2iI$ on a given Hilbert space $X$ over $\mathbb{C}$, where $I$ is the identity operator in $\mathcal{B}(X)$. Then $T^* = -2iI$ and $TT^* = T^*T = 4I$. Thus $T$ is a normal operator, but it is neither unitary nor self-adjoint.

The fifth special operator that we take into account is the projection operator.

Definition 10.4.17. Let $X$ be a Hilbert space over $\mathbb{F}$. Then an operator $T \in \mathcal{B}(X)$ is called a projection operator provided $T^2 = T$.

Example 10.4.18. Let $M$ be a closed subspace of a Hilbert space $X$ over $\mathbb{F}$. Then $X = M \oplus M^\perp$; that is, for any $x \in X$ there are unique $y \in M$ and $z \in M^\perp$ such that $x = y + z$. If $P_M(x) = y$, then $P_M$ is a projection. In fact,
$$P_M^2(x) = P_M(y) = y = P_M(x) \Rightarrow P_M^2 = P_M.$$
The fact that $P_M \in \mathcal{B}(X)$ follows from
$$\|x\|^2 = \|y + z\|^2 = \|y\|^2 + \|z\|^2 \ge \|y\|^2 = \|P_M(x)\|^2 \Rightarrow \|P_M\| \le 1.$$
Traditionally, $P_M$ is called the orthogonal projection of $X$ onto $M$. Moreover, if $M \neq \{0\}$, then $P_M \neq 0$; indeed, for any $x \in M \setminus \{0\}$ we have $x = x + 0$ and $P_M(x) = x$, giving $\|P_M\| = 1$.

The following theorem shows that the above orthogonal projections form a very important subclass of the projection operators.

Theorem 10.4.19. Let $X$ be a Hilbert space over $\mathbb{C}$. If $T \in \mathcal{B}(X)$ is a projection, then the following statements are equivalent.

(i) $T$ is nonnegative.

(ii) $T$ is self-adjoint.

(iii) $T$ is normal.

(iv) $T$ is the orthogonal projection onto $T(X)$.

Proof. Since (i)⇒(ii)⇒(iii) are straightforward, we only verify (iii)⇒(iv)⇒(i).

(iii)⇒(iv) Assume (iii) holds. To reach (iv), let $M = T(X)$. We first prove that $M$ is closed. Given $\{y_j\}_{j=1}^\infty$ in $M$ with $y_j \to y$, we have $y_j = T(x_j)$, $x_j \in X$. Since $T$ is a projection, we can conclude from $y_j \to y$ that
$$y_j = T(x_j) = T^2(x_j) = T(y_j) \to T(y) \Rightarrow y = T(y) \in M$$
and hence $M$ is closed. Next, given $x \in X$, write $x = y + z$ with $y \in M$ and $z \in M^\perp$. We must verify $T(x) = y$. Because $T(x) = T(y) + T(z)$, it suffices to prove that $T(y) = y$ and $T(z) = 0$. Since $y \in M$, there is $w \in X$ such that $y = T(w)$, and so $T(y) = T^2(w) = T(w) = y$. By the definition of $M$ we have $T(z) \in M$, and if we can also prove that $T(z) \in M^\perp$, then we can conclude that $T(z) = 0$. To do so, take any $u \in M$; then there is $v \in X$ such that $u = T(v)$, and
$$\langle T(z), u\rangle = \langle T(z), T(v)\rangle = \langle z, T^*T(v)\rangle = \langle z, TT^*(v)\rangle = 0$$
due to $z \in M^\perp$ and $TT^*(v) \in M$. In other words, $T(z) \in M^\perp$.

(iv)⇒(i) For any $x \in X$ let $x = y + z$ with $y \in M = T(X)$ and $z \in M^\perp$. Then
$$\langle T(x), x\rangle = \langle y, y + z\rangle = \langle y, y\rangle + \langle y, z\rangle = \langle y, y\rangle \ge 0.$$
Hence $T$ is nonnegative, and the proof is complete.

Remark 10.4.20. It is worth remarking that there exists a natural one-to-one correspondence between projection operators $T$ and direct sum decompositions $X = M \oplus N$ with $M$ and $N$ closed. As a matter of fact, if $T$ is a projection, then $X = M \oplus N$, where $M = T(X)$ and $N = (I - T)(X)$. Conversely, if $X = M \oplus N$, then we define $T$ as was done for the orthogonal projection, but using $N$ instead of $M^\perp$. The orthogonal projections are exactly those that arise when $N = M^\perp$, and these are generally the projections of interest in Hilbert space theory.
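To see Theorem 10.4.19 and Remark 10.4.20 side by side in $\mathbb{C}^2$, the following Python sketch compares two projections. It is a finite-dimensional illustration, and all names in it are hypothetical. The first is the orthogonal projection $P = uu^*$ onto the line spanned by a unit vector $u$. The second is the oblique projection $Q(x_1, x_2) = (x_1 + x_2, 0)$, which corresponds to the decomposition $M = \operatorname{span}\{(1,0)\}$, $N = \operatorname{span}\{(-1,1)\}$. Both satisfy $T^2 = T$, but only $P$ is self-adjoint. $Q$ is neither normal nor nonnegative, in line with the equivalences of the theorem.

```python
def adjoint(A):
    """Conjugate transpose of a square matrix given as nested lists."""
    n = len(A)
    return [[A[c][r].conjugate() for c in range(n)] for r in range(n)]

def matmul(A, B):
    n = len(A)
    return [[sum(A[r][k] * B[k][c] for k in range(n)) for c in range(n)] for r in range(n)]

def apply(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def inner(x, y):
    """<x, y>, linear in x and conjugate-linear in y (the convention of the text)."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def close(A, B, tol=1e-12):
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))

s = 2 ** -0.5
u = [s + 0j, s * 1j]                                                  # a unit vector
P = [[u[r] * u[c].conjugate() for c in range(2)] for r in range(2)]  # orthogonal projection onto span{u}
Q = [[1 + 0j, 1 + 0j], [0j, 0j]]   # projection onto span{(1,0)} along span{(-1,1)}

for name, A in (("P", P), ("Q", Q)):
    print(name, "idempotent:", close(matmul(A, A), A),
          "self-adjoint:", close(adjoint(A), A),
          "normal:", close(matmul(A, adjoint(A)), matmul(adjoint(A), A)))

x = [1 + 0j, -2 + 0j]
print(inner(apply(P, x), x).real, inner(apply(Q, x), x).real)  # P gives a nonnegative value, Q gives -1.0
```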

10.5 Compact Operators via Spectrum

This section is devoted to a spectrum-based study of the structure of a compact operator acting on a Hilbert space.

Definition 10.5.1. Let $(X, \|\cdot\|_X)$ be a Banach space over $\mathbb{F}$. Then a linear operator $T$ is said to be of finite rank provided that it is of the form
$$T(x) = \sum_{k=1}^{n}L_k(x)x_k$$
for $x, x_k \in X$ and $L_k \in X^*$.
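In a Hilbert space every $L_k \in X^*$ has the form $L_k(x) = \langle x, a_k\rangle$ for some $a_k \in X$ (Riesz representation). A finite-rank operator is then determined by finitely many pairs $(a_k, x_k)$. The Python sketch below assumes $X = \mathbb{R}^3$ with the usual dot product and uses hypothetical vectors $a_k$ and $x_k$. It builds $T(x) = \sum_k L_k(x)x_k$ and checks that the images of the standard basis vectors are linearly dependent (zero determinant), since they all lie in the span of $x_1, x_2$.

```python
def finite_rank(functionals, vectors):
    """Return the map x -> sum_k <x, a_k> x_k for lists of vectors a_k and x_k."""
    def T(x):
        out = [0.0] * len(x)
        for a, v in zip(functionals, vectors):
            coeff = sum(xi * ai for xi, ai in zip(x, a))   # L_k(x) = <x, a_k>
            out = [o + coeff * vi for o, vi in zip(out, v)]
        return out
    return T

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

a = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]   # hypothetical vectors a_1, a_2 defining L_1, L_2
v = [[1.0, 1.0, 0.0], [0.0, 2.0, 1.0]]    # hypothetical vectors x_1, x_2
T = finite_rank(a, v)

images = [T(e) for e in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0])]
print(T([1.0, 2.0, 3.0]))   # [7.0, 5.0, -1.0]
print(det3(images))         # 0.0: the range lies in span{x_1, x_2}
```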

Clearly, an operator of finite rank is compact, for it takes a bounded sequence into a bounded sequence in a finite-dimensional subspace of $X$; hence the image of the sequence has a convergent subsequence. By Theorem 9.5.4 we know that in a Banach space the norm limit of a sequence of operators of finite rank is compact. So it is natural to ask whether the converse is true; that is, for each compact operator, can we find a sequence of operators of finite rank that approximates this compact operator in norm? Although the answer is no in general for Banach spaces, we will show that the answer is yes for all complex Hilbert spaces.

Definition 10.5.2. Let $X$ be a Banach space over $\mathbb{C}$ and $T \in \mathcal{B}(X)$.

(i) The null space of T, denoted N(T), is {x ∈ X : T(x) = 0}. The range of T, denoted R(T), is {y ∈ X : y = T(x) for some x ∈ X}.

(ii) A complex number $\lambda$ is called an eigenvalue of $T$ if there exists a nonzero $x \in X$ such that $T(x) = \lambda x$. In this case, $x$ is called an eigenvector associated with $\lambda$.

(iii) The spectrum of $T$, denoted $\sigma(T)$, is the set of all complex numbers $\lambda$ such that $T - \lambda I$ does not have a bounded inverse, where $I$ is the identity operator in $\mathcal{B}(X)$. The resolvent set of $T$, written $\rho(T)$, is $\mathbb{C} \setminus \sigma(T)$. The spectral radius of $T$, denoted $r_\sigma(T)$, is $\sup\{|\lambda| : \lambda \in \sigma(T)\}$. In addition, $\sigma(T)$ can be split into the following three classes:

(a) The point spectrum $\sigma_p(T) = \{\lambda \in \sigma(T) : N(T - \lambda I) \neq \{0\}\}$.

(b) The continuous spectrum $\sigma_c(T) = \{\lambda \in \sigma(T) : N(T - \lambda I) = \{0\},\ \overline{R(T - \lambda I)} = X\}$.

(c) The residual spectrum $\sigma_r(T) = \{\lambda \in \sigma(T) : N(T - \lambda I) = \{0\},\ \overline{R(T - \lambda I)} \neq X\}$.

Example 10.5.3.

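(i) For the closed interval $[0,1]$ of $\mathbb{R}$, consider $LRS^2_{id}([0,1],\mathbb{C})$ and the differential operator $T = -d^2/dx^2$ defined on the linear subspace of all complex-valued functions $f$ having derivatives of any order on $[0,1]$ and satisfying the boundary conditions $f(0) = f(1) = 0$. Then a computation via integration by parts shows that $T$ is self-adjoint in the sense that $\langle T(f), g\rangle = \langle f, T(g)\rangle$ for all such $f, g$. Its eigenfunctions are $f_j(x) = \sin(j\pi x)$, $j \in \mathbb{N}$, with the real eigenvalues $j^2\pi^2$, and the well-known orthogonality of the sine functions follows as a consequence of this self-adjointness.

(ii) For $T \in \mathcal{B}(X)$, $\sigma(T)$ is compact. In fact, according to Theorem 9.1.10 (i), $|\lambda| > \|T\|$ implies $\lambda(T - \lambda I)^{-1} = (\lambda^{-1}T - I)^{-1} \in \mathcal{B}(X)$, and then $\lambda \notin \sigma(T)$. Contrapositively, if $\lambda \in \sigma(T)$ then $|\lambda| \le \|T\|$. Thus $r_\sigma(T) \le \|T\|$, and so $\sigma(T)$ is bounded. To see that $\sigma(T)$ is closed, let $\Phi : \mathbb{C} \to \mathcal{B}(X)$ be $\Phi(\lambda) = T - \lambda I$. This map is continuous on $\mathbb{C}$ since $\|\Phi(\lambda_1) - \Phi(\lambda_2)\| = \|I\||\lambda_1 - \lambda_2|$. Since the set of all invertible elements of $\mathcal{B}(X)$ is open, its preimage $\rho(T)$ under $\Phi$ is open, and hence $\sigma(T) = \mathbb{C} \setminus \rho(T)$ is closed.

(iii) Let $T$ be defined by $T(x_1, x_2, \ldots) = (x_2, x_3, \ldots)$ on $\ell^2(\mathbb{N},\mathbb{C})$. Since $\|T\| = 1$, (ii) gives $\sigma(T) \subseteq \{\lambda \in \mathbb{C} : |\lambda| \le 1\}$. If $|\lambda| < 1$, then $x = (1, \lambda, \lambda^2, \ldots) \in \ell^2(\mathbb{N},\mathbb{C})$ and $T(x) = \lambda x$; consequently $\{\lambda \in \mathbb{C} : |\lambda| < 1\} \subseteq \sigma_p(T)$. Since $\sigma(T)$ is closed, $\{\lambda \in \mathbb{C} : |\lambda| = 1\} \subseteq \sigma(T)$. Also, if $|\lambda| = 1$ and $T(x) = \lambda x$, then $x = (x_1, x_2, \ldots) = x_1(1, \lambda, \lambda^2, \ldots) \notin \ell^2(\mathbb{N},\mathbb{C})$ unless $x_1 = 0$, so $N(T - \lambda I) = \{0\}$. Moreover, $R(T - \lambda I)^\perp = N(T^* - \bar{\lambda}I) = \{0\}$ (cf. Problem 10.8), because the forward shift $T^*$ of Example 10.4.3 has no eigenvalues: $(0, x_1, x_2, \ldots) = \mu(x_1, x_2, \ldots)$ forces $x = 0$. Hence $R(T - \lambda I)$ is dense. This means $\{\lambda \in \mathbb{C} : |\lambda| = 1\} \subseteq \sigma_c(T)$. Accordingly, $\sigma_r(T) = \emptyset$.

The following result, which will be used later, gives much more information on the spectrum of a self-adjoint operator.

Theorem 10.5.4. Let $X$ be a Hilbert space over $\mathbb{C}$ and $T \in \mathcal{B}(X)$ be self-adjoint.

(i) If $\lambda$ is an eigenvalue of $T$, then $\lambda$ is real.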

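(ii) If $x_1$ and $x_2$ are eigenvectors associated with different eigenvalues $\lambda_1$ and $\lambda_2$ of $T$, then $\langle x_1, x_2\rangle = 0$.

(iii) If $\lambda$ is not an eigenvalue of $T$, then $R(T - \lambda I)$ is dense in $X$.

(iv) $\lambda \in \rho(T)$ if and only if $\|(T - \lambda I)(x)\| \ge \kappa\|x\|$ for all $x \in X$ and some $\kappa > 0$.

(v) If $m_T = \inf_{\|x\|=1}\langle T(x), x\rangle$ and $M_T = \sup_{\|x\|=1}\langle T(x), x\rangle$, then $\max\{|m_T|, |M_T|\} = \|T\|$ and $\sigma(T)$ is a nonempty subset of the closed interval $[m_T, M_T]$. In particular, $m_T, M_T \in \sigma(T)$.

(vi) $r_\sigma(T) = \|T\|$.

Proof. (i) This follows from the fact that $T(x) = \lambda x$ for some $x \neq 0$ implies $\lambda\|x\|^2 = \langle T(x), x\rangle \in \mathbb{R}$ due to Theorem 10.4.6.

(ii) In this case, by (i) we have
$$\lambda_1\langle x_1, x_2\rangle = \langle T(x_1), x_2\rangle = \langle x_1, T(x_2)\rangle = \bar{\lambda}_2\langle x_1, x_2\rangle = \lambda_2\langle x_1, x_2\rangle,$$
which yields $\langle x_1, x_2\rangle = 0$ since $\lambda_1 \neq \lambda_2$.

(iii) Assume that $\lambda$ is not an eigenvalue of $T$. If $z \in R(T - \lambda I)^\perp$, then
$$0 = \langle z, T(x) - \lambda x\rangle = \langle T(z), x\rangle - \langle\bar{\lambda}z, x\rangle = \langle(T - \bar{\lambda}I)(z), x\rangle$$
for all $x \in X$. If $z \neq 0$, then $\bar{\lambda}$ is an eigenvalue of $T$, and hence $\bar{\lambda}$ is real by (i). We must then have $\bar{\lambda} = \lambda$, which contradicts the assumption that $\lambda$ is not an eigenvalue of $T$. Thus $z = 0$, i.e., $R(T - \lambda I)^\perp = \{0\}$, and therefore
$$X = \overline{R(T - \lambda I)} \oplus \overline{R(T - \lambda I)}^\perp = \overline{R(T - \lambda I)} \oplus R(T - \lambda I)^\perp = \overline{R(T - \lambda I)}.$$

(iv) If $\lambda \in \rho(T)$, then $(T - \lambda I)^{-1} \in \mathcal{B}(X)$ and we may set $\kappa = \|(T - \lambda I)^{-1}\|^{-1}$. Conversely, suppose $\|(T - \lambda I)(x)\| \ge \kappa\|x\|$ for all $x \in X$ and some $\kappa > 0$. This inequality ensures $N(T - \lambda I) = \{0\}$, so $\lambda$ is not an eigenvalue of $T$. Moreover, $R(T - \lambda I)$ is closed. In fact, suppose $(T - \lambda I)(x_k) \to y$. Then $\{(T - \lambda I)(x_k)\}_{k=1}^\infty$ is a Cauchy sequence, and hence so is $\{x_k\}_{k=1}^\infty$. Since $X$ is complete, there exists a $z \in X$ such that $x_k \to z$. By continuity we have $(T - \lambda I)(x_k) \to (T - \lambda I)(z)$, and therefore $y = (T - \lambda I)(z)$. Also, by (iii) above, $R(T - \lambda I)$ is dense in $X$, and so $R(T - \lambda I) = X$. Hence $T - \lambda I$ is bijective, and its inverse is bounded with norm at most $\kappa^{-1}$. Accordingly, $(T - \lambda I)^{-1} \in \mathcal{B}(X)$, namely, $\lambda \in \rho(T)$.

(v) Write $K = \max\{|m_T|, |M_T|\} = \sup_{\|u\|=1}|\langle T(u), u\rangle|$. Clearly, $K \le \|T\|$. To prove $\|T\| \le K$, assume $\|z\| = 1$ and $T(z) \neq 0$. If
$$b = \|T(z)\|^{1/2},\qquad x = bz + b^{-1}T(z),\qquad y = bz - b^{-1}T(z),$$
then by the parallelogram law,
$$\|x\|^2 + \|y\|^2 = 2^{-1}\left(\|x+y\|^2 + \|x-y\|^2\right) = 2b^2 + 2b^{-2}\|T(z)\|^2 = 4\|T(z)\|.$$
Note that $|\langle T(u), u\rangle| \le K\|u\|^2$ for any $u \in X$. So the last equation and a routine calculation (using $T = T^*$) give
$$4\|T(z)\|^2 = \langle T(x), x\rangle - \langle T(y), y\rangle \le K\left(\|x\|^2 + \|y\|^2\right) = 4K\|T(z)\|,$$
and consequently $\|T(z)\| \le K$; that is, $\|T\| \le K$.

We next prove that $\sigma(T)$ comprises only real numbers. To do so, let $\lambda = \alpha + i\beta$ with $\beta \neq 0$. Then for any $x \in X$ we have
$$\|(T - \lambda I)(x)\|^2 = \|T(x) - \alpha x\|^2 + \beta^2\|x\|^2 \ge \beta^2\|x\|^2,$$
and so $\lambda \in \rho(T)$ by (iv). In other words, if $\lambda \in \sigma(T)$ then $\beta = 0$, and hence $\lambda \in \mathbb{R}$. After that, if $\lambda \in (-\infty, m_T)$, then for any $x \in X$,
$$\|T(x) - \lambda x\|\|x\| \ge \langle T(x) - \lambda x, x\rangle = \langle T(x), x\rangle - \lambda\|x\|^2 \ge (m_T - \lambda)\|x\|^2,$$
and hence $\lambda \in \rho(T)$ by (iv). Similarly, if $\lambda \in (M_T, \infty)$ then $\lambda \in \rho(T)$. Thus $\sigma(T) \subseteq [m_T, M_T]$.

Furthermore, in order to verify $M_T \in \sigma(T)$, we may assume $M_T \ge m_T \ge 0$, and hence $M_T = \|T\|$, by replacing $T$ with $T - m_TI$ (which shifts $m_T$, $M_T$ and $\sigma(T)$ by $-m_T$). By the definition of $M_T$ there exists a sequence $\{x_k\}_{k=1}^\infty$ with $\|x_k\| = 1$ such that $\lim_{k\to\infty}\langle T(x_k), x_k\rangle = M_T$. Accordingly,
$$\|T(x_k) - M_Tx_k\|^2 \le M_T^2 - 2M_T\langle T(x_k), x_k\rangle + M_T^2 \to 0\quad\text{as } k \to \infty.$$
This, along with (iv), implies $M_T \in \sigma(T)$. Finally, in order to check $m_T \in \sigma(T)$, we consider $S = -T$ and obtain $-m_T = M_S \in \sigma(S)$ by the previous argument. Thus $T - m_TI = -(S - M_SI)$ does not have a bounded inverse, and hence $m_T \in \sigma(T)$. In particular, $\sigma(T)$ is nonempty.

(vi) By (v), we have $m_T, M_T \in \sigma(T)$, and so $r_\sigma(T) \ge \max\{|m_T|, |M_T|\} = \|T\|$. This and the general estimate $r_\sigma(T) \le \|T\|$ yield the desired formula.

When working with a compact self-adjoint operator, we find that the structure of its spectrum is particularly simple.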
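Parts (v) and (vi) of Theorem 10.5.4 can be checked numerically on a 2×2 Hermitian matrix, whose eigenvalues have a closed form. The Python sketch below is illustrative only; the matrix, the random seed and the helper names are assumptions of the sketch. It compares random Rayleigh quotients $\langle T(x), x\rangle/\|x\|^2$ with the eigenvalues $m_T \le M_T$ and estimates $\|T\|$ by power iteration. The matrix is chosen with $|m_T| > |M_T|$, so the norm equals $|m_T|$ rather than $M_T$.

```python
import random

def apply(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def inner(x, y):
    """<x, y>, linear in x and conjugate-linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm(x):
    return abs(inner(x, x)) ** 0.5

H = [[-2 + 0j, 1 - 1j],
     [1 + 1j, 1 + 0j]]          # Hermitian: equal to its conjugate transpose

# Closed-form eigenvalues of a 2x2 Hermitian matrix [[a, b], [conj(b), d]]
a, d, b = H[0][0].real, H[1][1].real, H[0][1]
half_gap = (((a - d) / 2) ** 2 + abs(b) ** 2) ** 0.5
m_T, M_T = (a + d) / 2 - half_gap, (a + d) / 2 + half_gap

# Rayleigh quotients <Hx, x>/||x||^2 for random complex vectors x
rng = random.Random(0)
quotients = []
for _ in range(1000):
    x = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2)]
    quotients.append(inner(apply(H, x), x) / inner(x, x))

# Power iteration with H*H = H^2 to estimate the operator norm ||H||
x = [1 + 0j, 0j]
for _ in range(200):
    y = apply(H, apply(H, x))
    length = norm(y)
    x = [c / length for c in y]
op_norm = norm(apply(H, x))

print(m_T, M_T, op_norm)   # op_norm is close to max(|m_T|, |M_T|) = |m_T|
```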

Theorem 10.5.5. Let X be a Hilbert space over C, and let T ∈ B (X) be compact and self-adjoint. (i) If λ ∈ σ(T) is nonzero, then λ is an eigenvalue of T.

(ii) If {eα : α ∈ A} is an orthonormal set of eigenvectors corresponding to eigenvalues λα with |λα| > c > 0, then A is a finite set.

Proof. (i) By Theorem 10.5.4 (iv), if $\lambda \in \sigma(T)$ is not equal to $0$, then there is a sequence $\{x_k\}_{k=1}^\infty$ in $X$ obeying $\|x_k\| = 1$ and $T(x_k) - \lambda x_k \to 0$. Since $T$ is compact, there exists a subsequence $\{x_{k_n}\}_{n=1}^\infty$ such that $T(x_{k_n}) \to y \in X$ as $n \to \infty$. Consequently,
$$x_{k_n} = \lambda^{-1}\left(T(x_{k_n}) - (T(x_{k_n}) - \lambda x_{k_n})\right) \to \lambda^{-1}y\quad\text{as } n \to \infty,$$
and therefore $\lambda y = \lim_{n\to\infty}\lambda T(x_{k_n}) = T(y)$ owing to the continuity of $T$; that is, $\lambda$ is an eigenvalue of $T$ and $y$ is a corresponding eigenvector. Here $y \neq 0$, since $\|y\| = |\lambda|\lim_{n\to\infty}\|x_{k_n}\| = |\lambda| > 0$.

(ii) Assume $A$ is infinite. Then there exists an infinite sequence of orthonormal eigenvectors $\{y_k\}_{k=1}^\infty$ corresponding to eigenvalues $\{\lambda_k\}_{k=1}^\infty$ with $|\lambda_k| > c > 0$ for all $k \in \mathbb{N}$. The orthonormality of $\{y_k\}_{k=1}^\infty$ gives, for $k \neq l$,
$$\|T(y_k) - T(y_l)\|^2 = \|\lambda_ky_k\|^2 + \|\lambda_ly_l\|^2 = |\lambda_k|^2 + |\lambda_l|^2 > 2c^2.$$
So $\{T(y_k)\}_{k=1}^\infty$ has no convergent subsequence, which contradicts the compactness of $T$.

To reach the main result of this section, we introduce the following terminology. Definition 10.5.6. Let X be a Hilbert space over C. If λ is an eigenvalue of T ∈ B (X), then N(T − λI) is called the eigenspace associated with λ. Remark 10.5.7. (i) It is clear that N(T − λI) is invariant under T; that is, if x ∈ N(T − λI) then T(x) ∈ N(T − λI) since T(T(x)) = T(λx) = λT(x).

(ii) If $T$ is compact and self-adjoint, then the eigenspace corresponding to each nonzero eigenvalue must be finite dimensional; moreover, there can be at most finitely many eigenvalues $\lambda$ obeying $|\lambda| > c$ for a given constant $c > 0$. The former clearly follows from Theorem 10.5.5 (ii), applied to an orthonormal set in the eigenspace. To see the latter, assume that there were an infinite sequence of distinct eigenvalues $\{\lambda_k\}_{k=1}^\infty$ obeying $|\lambda_k| > c > 0$. If $\{y_k\}_{k=1}^\infty$ is a corresponding sequence of norm-one eigenvectors, then by Theorem 10.5.4 (ii) the vectors $y_k$ are mutually orthogonal and hence orthonormal. But $\{y_k\}_{k=1}^\infty$ cannot be infinite by Theorem 10.5.5 (ii). Thus there can be only finitely many eigenvalues obeying $|\lambda_k| > c > 0$. Consequently, the set of nonzero eigenvalues of the operator is at most countable. Moreover, if the operator has infinitely many eigenvalues, then they must converge to zero, since at most finitely many eigenvalues can satisfy an inequality of the form $|\lambda| > c > 0$.

Below is the spectral representation of a compact self-adjoint operator on a complex Hilbert space.

Theorem 10.5.8. Let $X$ be a Hilbert space over $\mathbb{C}$, and let $T \in \mathcal{B}(X)$ be compact and self-adjoint. If $\{\lambda_j\}_{j=1}^\infty$ is an enumeration of the distinct nonzero eigenvalues of $T$, then for each $x \in X$ one has $T(x) = \sum_{k=1}^\infty\lambda_kP_k(x)$, where each $P_k$ is the orthogonal projection of $X$ onto the eigenspace $N(T - \lambda_kI)$.

Proof. From the definition of each $P_k$ and Theorem 10.2.2 it follows that for $x \in X$,
$$\|P_k(x) - x\| = \inf\{\|y - x\| : y \in N(T - \lambda_kI)\}.$$

If $\{y_{k,1}, \ldots, y_{k,n_k}\}$ is an orthonormal basis for $N(T - \lambda_kI)$, then by Remark 10.5.7 (ii), and Theorems 10.3.8 and 10.3.10, $P_k(x) = \sum_{j=1}^{n_k}\langle x, y_{k,j}\rangle y_{k,j}$. Because eigenvectors corresponding to different eigenvalues are orthogonal, the set $S = \bigcup_{k=1}^\infty\{y_{k,1}, \ldots, y_{k,n_k}\}$ is an orthonormal set. The vector
$$y = \sum_{k=1}^\infty P_k(x) = \sum_{k=1}^\infty\sum_{j=1}^{n_k}\langle x, y_{k,j}\rangle y_{k,j}$$
is then well defined, since
$$\sum_{k=1}^\infty\sum_{j=1}^{n_k}|\langle x, y_{k,j}\rangle|^2 \le \|x\|^2,$$
which follows from Bessel's inequality. Since $P_k(x) \in N(T - \lambda_kI)$ and $T$ is continuous, we have $T(y) = \sum_{k=1}^\infty\lambda_kP_k(x)$. Writing $x = y + (x - y)$, we find that the proof will be complete upon verifying $x - y \in N(T)$. To this end, let $M$ be the smallest closed linear subspace of $X$ containing $S$. Then $S$ is an orthonormal basis for $M$, and hence $y = P_M(x)$, the orthogonal projection of $x$ onto $M$. From this it turns out that
$$x = (I - P_M)(x) + P_M(x) = P_{M^\perp}(x) + P_M(x).$$

Since $M$ is invariant under $T$, $x \in M^\perp$ implies $\langle T(x), z\rangle = \langle x, T(z)\rangle = 0$ for all $z \in M$; namely, $M^\perp$ is invariant under $T$. Therefore, if $U$ stands for the restriction of $T$ to $M^\perp$, then $U \in \mathcal{B}(M^\perp)$, and it is obviously a compact self-adjoint operator on the Hilbert space $M^\perp$. Now, if $U \neq 0$, then by Theorem 10.5.4 (vi) and Theorem 10.5.5 (i) it has a nonzero eigenvalue, which would also be a nonzero eigenvalue of $T$. In other words, there would be some nonzero $x \in M^\perp$ with $x \in N(T - \lambda I) \subseteq M$ for some nonzero $\lambda \in \sigma(T)$. Noticing $M \cap M^\perp = \{0\}$, we get $x = 0$, a contradiction. Therefore, $U$ must be the zero operator, and consequently $T(x - y) = UP_{M^\perp}(x) = 0$. This completes the proof of the theorem.

Remark 10.5.9. The argument for Theorem 10.5.8 produces a closed linear subspace $M^\perp$ of $X$ comprising elements mapped to zero by $T$. So we can choose an orthonormal basis for $M^\perp$, each element of which is an eigenvector of $T$ with eigenvalue $0$. If these new eigenvectors are included in the sequence of eigenvectors constructed in Theorem 10.5.8, we obtain an orthonormal basis $\{e_j\}$ for $X$. Since the new eigenvalues are zero, their inclusion does not affect the convergence of the sequence $\{\lambda_k\}_{k=1}^\infty$ to zero. Consequently, if $x = \sum_{j=1}^\infty c_je_j \in X$ and $\{\lambda_j\}$ are the eigenvalues corresponding to $\{e_j\}$, then
$$T(x) = \sum_{j=1}^\infty c_j\lambda_je_j = \sum_{k=1}^\infty\lambda_kP_k(x),$$
where the last sum is the one in Theorem 10.5.8.
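The formula $T(x) = \sum_k\lambda_kP_k(x)$ can be checked by hand on a small symmetric matrix. The Python sketch below is a finite-dimensional illustration. Its matrix and orthonormal eigenvectors were worked out for this example rather than computed by the code. $T$ has the distinct nonzero eigenvalues 4 (with a two-dimensional eigenspace) and 2, plus a one-dimensional kernel. Each $P_k$ is assembled from an orthonormal basis of its eigenspace as in the proof, and the kernel component of $x$ simply does not appear, as in Remark 10.5.9.

```python
def apply(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def project(basis, x):
    """P(x) = sum_j <x, y_j> y_j for an orthonormal basis {y_j} (real case)."""
    out = [0.0] * len(x)
    for y in basis:
        c = sum(a * b for a, b in zip(x, y))
        out = [o + c * yi for o, yi in zip(out, y)]
    return out

T = [[3.0, 1.0, 0.0, 0.0],
     [1.0, 3.0, 0.0, 0.0],
     [0.0, 0.0, 4.0, 0.0],
     [0.0, 0.0, 0.0, 0.0]]
s = 2 ** -0.5
eigenspaces = {   # distinct nonzero eigenvalue -> orthonormal basis of N(T - lambda I)
    4.0: [[s, s, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]],
    2.0: [[s, -s, 0.0, 0.0]],
}

# Residuals of T(y) = lambda y for every listed basis vector (should be ~0)
residuals = [max(abs(a - lam * b) for a, b in zip(apply(T, y), y))
             for lam, basis in eigenspaces.items() for y in basis]

x = [1.0, -2.0, 0.5, 7.0]
lhs = apply(T, x)
rhs = [0.0] * 4
for lam, basis in eigenspaces.items():
    rhs = [r + lam * p for r, p in zip(rhs, project(basis, x))]
print(lhs)   # [1.0, -5.0, 2.0, 0.0]
print(rhs)   # agrees with lhs up to rounding
```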

In some ways, Theorem 10.5.8 tells us that the compact self-adjoint operators on a Hilbert space resemble the symmetric matrices on $\mathbb{R}^n$. Moreover, as an interesting consequence of Theorem 10.5.8, we can approximate any compact operator on a Hilbert space by operators of finite rank.

Theorem 10.5.10. Let $X$ be a Hilbert space over $\mathbb{C}$. Then for each compact operator $T \in \mathcal{B}(X)$ there is a sequence of operators of finite rank converging to $T$ in norm.

Proof. First, assume $X$ is separable. Then by Theorem 10.3.12 we see that $X$ has a countable orthonormal basis $\{e_k\}_{k=1}^\infty$. For each $n \in \mathbb{N}$ define the operator $P_n$ via
$$P_n(x) = \sum_{k=1}^{n}\langle x, e_k\rangle e_k\quad\text{for all } x \in X.$$

From Theorem 10.3.7 or Remark 10.4.20 it follows that $\max\{\|P_n\|, \|I - P_n\|\} \le 1$ for all $n \in \mathbb{N}$ and
$$\lim_{n\to\infty}\|P_n(x) - x\| = 0.$$
Now, if the assertion were false, then there would be a compact operator $T \in \mathcal{B}(X)$ and a number $\delta > 0$ such that $\|T - F\| \ge \delta$ for all operators $F$ of finite rank. Hence, for each $n \in \mathbb{N}$ there is an $x_n \in X$ such that $\|x_n\| = 1$ and $\|(T - F_n)(x_n)\| \ge 2^{-1}\delta$, where the operator $F_n$ defined by
$$F_n(x) = P_nT(x) = \sum_{k=1}^{n}\langle T(x), e_k\rangle e_k\quad\text{for all } x \in X$$
is an operator of finite rank. Since $\{x_n\}_{n=1}^\infty$ is bounded and $T$ is compact, there is a subsequence $\{x_{n_j}\}_{j=1}^\infty$ such that $\{T(x_{n_j})\}_{j=1}^\infty$ converges to some element $w \in X$. But
$$\|(T - F_{n_j})(x_{n_j})\| \le \|(I - P_{n_j})(T(x_{n_j}) - w)\| + \|(I - P_{n_j})(w)\| \le \|T(x_{n_j}) - w\| + \|(I - P_{n_j})(w)\|.$$
Now, for $n_j$ big enough, we have $\|(I - P_{n_j})(w)\| < 2^{-2}\delta$ and $\|T(x_{n_j}) - w\| < 2^{-2}\delta$. The foregoing estimates therefore produce the contradiction $2^{-1}\delta \le \|(T - F_{n_j})(x_{n_j})\| < 2^{-1}\delta$, which proves the assertion in the case when $X$ is separable.

Next, assume that $X$ is an arbitrary Hilbert space over $\mathbb{C}$. Set $S = T^*T$. Then $S$ is self-adjoint and compact. By Theorem 10.5.8, $S$ can be written in the form
$$S(x) = \sum_{k=1}^\infty\lambda_k\langle x, e_k\rangle e_k,\quad x \in X,$$
for some orthonormal sequence $\{e_k\}_{k=1}^\infty$. Suppose now that $X_0$ is the closure of the set of all linear combinations of elements of the form $T^n(e_k)$ with $n + 1, k \in \mathbb{N}$. Then $X_0$ is a separable closed linear subspace of $X$, and $T$ maps $X_0$ into itself. By the first part, there exists a sequence $\{F_n\}_{n=1}^\infty$ of finite-rank operators on $X_0$ converging in norm to the restriction $\breve{T}$ of $T$ to $X_0$. Now every element $x \in X$ can be decomposed into the form $x = y + z$, where $y \in X_0$ and $z \in X_0^\perp$ is orthogonal to $X_0$. With this, we get that $z$ is orthogonal to each $e_k$, and so $z \in N(S)$. Accordingly, $0 = \langle S(z), z\rangle = \langle T^*T(z), z\rangle = \|T(z)\|^2$; that is, $T(z) = 0$. Thus, if $G_n(x) = F_n(y)$ for $n \in \mathbb{N}$, then $G_n$ is of finite rank and $\lim_{n\to\infty}\|G_n - T\| = 0$, thanks to
$$\|G_n(x) - T(x)\| = \|F_n(y) - T(y)\| = \|F_n(y) - \breve{T}(y)\| \le \|F_n - \breve{T}\|\|y\| \le \|F_n - \breve{T}\|\|x\|.$$
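The separable case of the proof approximates $T$ by $F_n = P_nT$. The Python sketch below illustrates this on a finite section of the infinite matrix with entries $(j+k)^{-2}$, $j, k \in \mathbb{N}$; the truncation is an assumption made purely for computation. This matrix defines a Hilbert-Schmidt, hence compact, operator on $\ell^2$. For several $n$, the sketch estimates $\|T - P_nT\|$ by power iteration. It compares the estimate with the Hilbert-Schmidt (Frobenius) norm of $T - P_nT$, which is an upper bound. The helper names are hypothetical.

```python
def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def matvec_t(A, y):
    """Compute A^T y for a real matrix A."""
    return [sum(A[r][c] * y[r] for r in range(len(A))) for c in range(len(A[0]))]

def op_norm_estimate(A, iterations=100):
    """Power iteration on A^T A; returns ||A x|| for the final unit vector x (a lower bound for ||A||)."""
    x = [1.0] * len(A[0])
    for _ in range(iterations):
        y = matvec_t(A, matvec(A, x))
        length = sum(c * c for c in y) ** 0.5
        x = [c / length for c in y]
    return sum(c * c for c in matvec(A, x)) ** 0.5

N = 40                                                                # truncation size (assumption)
A = [[1.0 / (j + k + 2) ** 2 for k in range(N)] for j in range(N)]   # entries (j+k)^{-2}, j, k >= 1

ns = (1, 2, 4, 8, 16)
op_errors, hs_errors = [], []
for n in ns:
    # (I - P_n)T: zero the first n rows, i.e. subtract the finite-rank operator F_n = P_n T
    R = [[0.0] * N if j < n else A[j][:] for j in range(N)]
    op_errors.append(op_norm_estimate(R))
    hs_errors.append(sum(v * v for row in R for v in row) ** 0.5)

for n, op, hs in zip(ns, op_errors, hs_errors):
    print(n, round(op, 6), round(hs, 6))
```

In this example the error decreases as $n$ grows. This agrees with the theorem, although a finite section of course proves nothing about the infinite-dimensional operator.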

To close this section, we use Remark 10.5.9 to produce the forthcoming Fredholm alternative, which arises in the study of the following homogeneous and inhomogeneous Fredholm integral equations:
$$f(x) - \int_{[a,b]}\kappa(x,y)f(y)\,dm_{id}(y) = 0;\qquad f(x) - \int_{[a,b]}\kappa(x,y)f(y)\,dm_{id}(y) = g(x)$$
for a given pair $\{g, \kappa\}$ with $g \in LRS^2_{id}([a,b],\mathbb{C})$ and $|\kappa(\cdot,\cdot)|^2 \in LRS_{id}([a,b]\times[a,b])$.

Theorem 10.5.11. Let $X$ be a Hilbert space over $\mathbb{C}$ and let $T$ be a self-adjoint compact linear operator on $X$.

(i) If the only solution to the homogeneous equation $f - T(f) = 0$ is $f = 0$, then the inhomogeneous equation $f - T(f) = g$ has a unique solution $f$ for each $g \in X$.

(ii) If $f - T(f) = 0$ has nonzero solutions, then $f - T(f) = g$ has a solution for a given $g \in X$ only if $\langle g, f\rangle = 0$ for every solution $f$ to $f - T(f) = 0$, in which case $f - T(f) = g$ has infinitely many solutions, the difference of any two of them being a solution of $f - T(f) = 0$.

Proof. (i) From Remark 10.5.9 it follows that $X$ has an orthonormal basis $\{e_j\}_{j=1}^\infty$ comprising eigenvectors of $T$ with eigenvalues $\{\lambda_j\}_{j=1}^\infty$. For $g \in X$, let $g = \sum_{j=1}^\infty c_je_j$, and search for a solution to $f - T(f) = g$ in the form $f = \sum_{j=1}^\infty d_je_j$. Then
$$\sum_{j=1}^\infty d_je_j - \sum_{j=1}^\infty d_j\lambda_je_j = \sum_{j=1}^\infty c_je_j,$$
and hence $d_j = c_j(1 - \lambda_j)^{-1}$ provided $\lambda_j \neq 1$. Since $f - T(f) = 0$ has no nonzero solution, $1$ is not an eigenvalue of $T$, and consequently all $d_j$ exist. This indicates that the solution to $f - T(f) = g$ must be of the form $\sum_{j=1}^\infty c_j(1 - \lambda_j)^{-1}e_j$. Of course, this shows that if $f - T(f) = g$ has a solution, then it is unique. Now it remains to check that the last series converges in $X$ and also produces a solution to $f - T(f) = g$. If $1$ were a cluster point of the set of eigenvalues of $T$, then $\sum_{j=1}^\infty c_j(1 - \lambda_j)^{-1}e_j$ would not converge in general, because its coefficients would be very large when $\lambda_j$ is close to $1$. But we know from Remark 10.5.7 (ii) that $\lambda_j \to 0$ as $j \to \infty$, and so no subsequence of $\{\lambda_j\}$ can approach $1$. This means that $1$ is not a cluster point of the set $\{\lambda_j\}$, and so $\sup_{j\in\mathbb{N}}|1 - \lambda_j|^{-2} < \infty$. Consequently,

$$\sum_{j=1}^\infty|c_j(1 - \lambda_j)^{-1}|^2 \le \sup_{j\in\mathbb{N}}|1 - \lambda_j|^{-2}\sum_{j=1}^\infty|c_j|^2 < \infty.$$

This finiteness, along with Theorem 10.3.10, implies that $\sum_{j=1}^\infty c_j(1 - \lambda_j)^{-1}e_j$ is convergent in $X$. Next, let us verify that this series is a solution to $f - T(f) = g$. Since $g \in X$ and $\{e_j\}$ is an orthonormal basis for $X$, we have $c_j = \langle g, e_j\rangle$ and $\sum_{j=1}^\infty|\langle g, e_j\rangle|^2 < \infty$. Also, as just shown, $\lambda = \inf_k|1 - \lambda_k| > 0$, and then
$$\sum_{j=1}^\infty|\langle g, e_j\rangle|^2|1 - \lambda_j|^{-2} \le \lambda^{-2}\sum_{j=1}^\infty|\langle g, e_j\rangle|^2 < \infty.$$
This yields that $f = \sum_{j=1}^\infty\langle g, e_j\rangle(1 - \lambda_j)^{-1}e_j$ is convergent in $X$ thanks to Theorem 10.3.10. Accordingly,
$$\begin{aligned}T(f) &= \sum_{j=1}^\infty\langle g, e_j\rangle(1 - \lambda_j)^{-1}T(e_j) = \sum_{j=1}^\infty\langle g, e_j\rangle(1 - \lambda_j)^{-1}\lambda_je_j\\ &= \sum_{j=1}^\infty c_j(1 - \lambda_j)^{-1}e_j - \sum_{j=1}^\infty\langle g, e_j\rangle e_j = f - g.\end{aligned}$$

(ii) Given $g \in X$, assume now that $f_1$ satisfies $f - T(f) = g$ and $f_2$ is a nonzero solution to $f - T(f) = 0$. This assumption, plus $T^* = T$, gives
$$\langle f_1, f_2\rangle = \langle T(f_1), f_2\rangle + \langle g, f_2\rangle = \langle f_1, T(f_2)\rangle + \langle g, f_2\rangle = \langle f_1, f_2\rangle + \langle g, f_2\rangle,$$
whence $\langle g, f_2\rangle = 0$. Moreover, since $f - T(f) = g$ implies $(f + cf_2) - T(f + cf_2) = g$ for any $f_2$ with $f_2 - T(f_2) = 0$ and any $c \in \mathbb{C}$, there are infinitely many solutions to $f - T(f) = g$ once one exists. The final assertion follows simply from taking the difference of two solutions to $f - T(f) = g$.

Clearly, the above Fredholm alternative is a simple generalization of facts about systems of linear algebraic equations: an inhomogeneous system has a unique solution when and only when the determinant of the coefficients is nonzero, i.e., the corresponding homogeneous system has no nonzero solution. In addition, we show by an example that the compactness of the operator $T$ in the Fredholm alternative is essential.

Example 10.5.12. On $LRS^2_{id}([0,1],\mathbb{C})$ define $T(f)(x) = (1 - x)f(x)$. Then $T$ is self-adjoint but not compact. The homogeneous equation $f - T(f) = 0$, i.e., $xf(x) = 0$, has no solution other than $f = 0$. Nevertheless, the inhomogeneous equation $f - T(f) = g$ has no solution in $LRS^2_{id}([0,1],\mathbb{C})$ for any nonzero constant function $g$. Indeed, if $g \equiv c \neq 0$, the solution to $f - T(f) = g$ is then $f(x) = cx^{-1}$ for $x \in (0,1]$, which is certainly not in $LRS^2_{id}([0,1],\mathbb{C})$.
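The proof of (i) solves $f - T(f) = g$ coefficientwise via $d_j = c_j(1 - \lambda_j)^{-1}$. Part (ii) requires $\langle g, e\rangle = 0$ for eigenvectors $e$ with eigenvalue $1$. The Python sketch below mirrors both cases in $\mathbb{R}^2$, assuming a hand-picked orthonormal eigenbasis and hypothetical eigenvalues; every operator on a finite-dimensional space is compact. The function `solve` (a hypothetical name) returns `None` when the solvability condition of (ii) fails. Otherwise it returns one particular solution, to which any multiple of an eigenvector with eigenvalue $1$ may be added.

```python
s = 2 ** -0.5
basis = [[s, s], [s, -s]]   # orthonormal eigenvectors e_1, e_2 (assumed known)

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def make_operator(eigs):
    """T(x) = sum_j lambda_j <x, e_j> e_j: a self-adjoint operator on R^2."""
    def T(x):
        out = [0.0, 0.0]
        for lam, e in zip(eigs, basis):
            c = dot(x, e)
            out = [o + lam * c * ei for o, ei in zip(out, e)]
        return out
    return T

def solve(eigs, g, tol=1e-12):
    """One solution of f - T(f) = g, or None if <g, e_j> != 0 for some e_j with lambda_j = 1."""
    f = [0.0, 0.0]
    for lam, e in zip(eigs, basis):
        c = dot(g, e)                      # c_j = <g, e_j>
        if abs(1.0 - lam) < tol:
            if abs(c) > tol:
                return None                # condition of part (ii) violated
            continue                       # d_j is free here; take d_j = 0
        f = [fi + c / (1.0 - lam) * ei for fi, ei in zip(f, e)]
    return f

# Case (i): 1 is not an eigenvalue, so a unique solution exists.
eigs = (0.5, -0.25)
T = make_operator(eigs)
g = [1.0, 2.0]
f = solve(eigs, g)
residual = [fi - ti - gi for fi, ti, gi in zip(f, T(f), g)]
print(residual)                          # approximately [0.0, 0.0]

# Case (ii): lambda_1 = 1, so g must be orthogonal to e_1.
print(solve((1.0, 0.5), [1.0, 2.0]))    # None: <g, e_1> != 0
print(solve((1.0, 0.5), [1.0, -1.0]))   # approximately [2.0, -2.0]
```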

Problems

10.1. Equip $C([-1,1],\mathbb{C})$ with the 2-norm and the inner product $\langle f,g\rangle=\int_{-1}^{1}f(x)\overline{g(x)}\,dx$. Prove that $C([-1,1],\mathbb{C})$ is an inner product space but not a Hilbert space.

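10.2. Equip $\ell^p(\mathbb{N},\mathbb{C})$, $1\le p<\infty$, with the $p$-norm. Prove that $\ell^2(\mathbb{N},\mathbb{C})$ is a Hilbert space under the inner product
$$\langle x,y\rangle=\sum_{j=1}^{\infty}x_j\overline{y_j}\quad\text{for all }x=(x_j)_{j=1}^{\infty},\ y=(y_j)_{j=1}^{\infty}\in\ell^2(\mathbb{N},\mathbb{C}),$$
but $\ell^p(\mathbb{N},\mathbb{C})$ is not a Hilbert space under the $p$-norm whenever $p\neq2$.

10.3. Let $X$ be an inner product space over $\mathbb{F}$. (i) If $\mathbb{F}=\mathbb{R}$, prove that $x\perp y$ is equivalent to $\|x+y\|^2=\|x\|^2+\|y\|^2$. Is this equivalence true for $\mathbb{F}=\mathbb{C}$? (ii) Prove that $x\perp y$ if and only if $\|x+\beta y\|\ge\|x\|$ for all $\beta\in\mathbb{F}$.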


Hilbert Spaces and Their Operators

10.4. Let $X$ be a Hilbert space over $\mathbb{F}$. Prove that if $M$ is a closed linear subspace of $X$, then $(M^{\perp})^{\perp}=M$.

10.5. Equip $LRS^2_{id}([-1,1],\mathbb{C})$ with the inner product $\langle f,g\rangle=\int_{[-1,1]}f\overline{g}\,dm_{id}$.
(i) Let $M=\{f\in LRS^2_{id}([-1,1],\mathbb{C}):f(x)=0\text{ for all }x\in[-1,0]\}$. Find $M^{\perp}$.
(ii) Let
$$M_{odd}=\{f\in LRS^2_{id}([-1,1],\mathbb{C}):f(-x)=-f(x)\text{ for all }x\in[-1,1]\};$$
$$M_{even}=\{f\in LRS^2_{id}([-1,1],\mathbb{C}):f(-x)=f(x)\text{ for all }x\in[-1,1]\}.$$
Prove $LRS^2_{id}([-1,1],\mathbb{C})=M_{odd}\oplus M_{even}$.

10.6. Equip $LRS^2_{id}([-\pi,\pi],\mathbb{C})$ with the inner product $\langle f,g\rangle=\int_{[-\pi,\pi]}f\overline{g}\,dm_{id}$. Prove the following two results.
(i) $\{(2\pi)^{-1/2}e^{inx}\}_{n=-\infty}^{\infty}$ is an orthonormal set.
(ii) $\{(2\pi)^{-1/2},\ \pi^{-1/2}\cos t,\ \pi^{-1/2}\sin t,\ \pi^{-1/2}\cos2t,\ \pi^{-1/2}\sin2t,\ \dots\}$ is an orthonormal basis of $LRS^2_{id}([-\pi,\pi],\mathbb{C})$.

10.7. Let $LRS^2_{id}([-\pi,\pi],\mathbb{C})$ be as in Problem 10.6. For $\varphi\in C([-\pi,\pi],\mathbb{C})$ let $T(f)=\varphi f$ be defined on $LRS^2_{id}([-\pi,\pi],\mathbb{C})$. (i) Calculate $T^*$ using the inner product defined above. (ii) Prove that if $\varphi$ is real-valued then $T$ is self-adjoint. (iii) Find a condition on $\varphi$ such that $T$ is respectively unitary, nonnegative, or a projection.

10.8. Let $X$ be a Hilbert space over $\mathbb{C}$ and $T\in\mathcal{B}(X)$. Prove the following results. (i) $R(T)^{\perp}=N(T^*)$ and $N(T)^{\perp}=\overline{R(T^*)}$. (ii) $R(T^*)^{\perp}=N(T)$ and $N(T^*)^{\perp}=\overline{R(T)}$.

10.9. Equip $C([0,2\pi],\mathbb{C})$ with the sup-norm $\|f\|_{\infty}=\sup_{x\in[0,2\pi]}|f(x)|<\infty$.


(i) Define $T(f)(x)=xf(x)$ for $f\in C([0,2\pi],\mathbb{C})$. Find $\sigma(T)$.
(ii) Define $T(f)(x)=e^{ix}f(x)$ for $f\in C([0,2\pi],\mathbb{C})$. Prove $\sigma(T)=\{\lambda\in\mathbb{C}:|\lambda|=1\}$.

10.10. Equip $\ell^2(\mathbb{N},\mathbb{C})$ with the 2-norm. (i) If $T(x)=T(x_1,x_2,\dots)=(x_2,x_3,\dots)$, then find $\sigma(T)$. (ii) If $T(x)=T(x_1,x_2,\dots)=(0,-x_1,-x_2,-x_3,\dots)$, then prove the following results: (a) $T$ has no eigenvalue; (b) $\rho(T)=\{\lambda\in\mathbb{C}:|\lambda|>1\}$; (c) $\|(T-\lambda I)^{-1}\|=(|\lambda|-1)^{-1}$.

10.11. Suppose that $X$ is a Hilbert space over $\mathbb{C}$, and that $T\in\mathcal{B}(X)$. Prove $\sigma(T^*)=\{\overline{\lambda}:\lambda\in\sigma(T)\}$.

10.12. Let $(X,\|\cdot\|)$ be a Hilbert space over $\mathbb{C}$. If $T\in\mathcal{B}(X)$ is compact and self-adjoint and $x\in X$ satisfies $\|x\|=1$, then prove the following two results. (i) $\lambda\in\mathbb{R}$ obeys $\|T(x)-\lambda x\|^2=\|T(x)\|^2-\langle T(x),x\rangle^2$ when and only when $\lambda=\langle T(x),x\rangle$. (ii) There exists a $\lambda\in\sigma(T)$ such that $|\lambda-\langle T(x),x\rangle|\le\sqrt{\|T(x)\|^2-\langle T(x),x\rangle^2}$.

10.13. Let $T(f)(x)=\int_0^1e^{x+y}f(y)\,dy$. Find the eigenvalues and eigenvectors of $T$.

10.14. Let $X$ be a Hilbert space over $\mathbb{C}$. If $T=T_1+iT_2$ is a normal operator on $X$, then prove $\|T\|^2=\|T_1^2+T_2^2\|$ and $\|T^2\|=\|T\|^2$.

10.15. Let $X$ be a Hilbert space over $\mathbb{C}$, $\lambda\in\mathbb{C}$, and $T\in\mathcal{B}(X)$ normal. Prove the following results: (i) $\|(T^*-\overline{\lambda}I)(x)\|=\|(T-\lambda I)(x)\|$ for $x\in X$; (ii) $T(x)=\lambda x\Rightarrow T^*(x)=\overline{\lambda}x$; (iii) $r_{\sigma}(T)=\|T\|$.

10.16. Let $X$ be a Hilbert space over $\mathbb{C}$. Prove that if $T\in\mathcal{B}(X)$ and $T^*T$ is compact, then $T$ is compact.


10.17. Suppose that $[a_{mn}]$ is an infinite matrix of real numbers obeying $\sum_{m,n=1}^{\infty}a_{mn}^2<\infty$. Let the linear operator $T$ on $\ell^2$ be defined by $T(x)=y$, where $x=(x_1,x_2,\dots)$ and $y=\left(\sum_{n=1}^{\infty}a_{1n}x_n,\sum_{n=1}^{\infty}a_{2n}x_n,\dots\right)$. Prove that $T$ is a compact operator on $\ell^2$.

10.18. Equip the real-valued Lebesgue space $LRS^2_{id}[0,1]$ with the inner product $\langle f,g\rangle=\int_{[0,1]}fg\,dm_{id}$. Suppose that
$$K(f)(x)=\int_{[0,1]}\kappa(x,y)f(y)\,dy\quad\text{for all }f\in LRS^2_{id}[0,1].$$
(i) If
$$\kappa(y,x)=\sum_{k=1}^{n}g_k(x)f_k(y),\quad\text{where }f_k,g_k\in LRS^2_{id}[0,1]\text{ and }\{g_k\}_{k=1}^{n}\text{ are linearly independent},$$
then prove that $K$ is compact.
(ii) If $\lambda$ is a nonzero eigenvalue, prove that its corresponding eigenvector has the form
$$f(x)=\sum_{k=1}^{n}c_kg_k(x),\quad\text{where each }c_k\text{ is a constant};$$
moreover, if $h_{k,j}=\int_{[0,1]}f_kg_j\,dm_{id}$, then $c_k=\lambda^{-1}\sum_{j=1}^{n}h_{j,k}c_j$ for $k=1,2,\dots,n$.

(iii) If $f_k=g_k$ and $\langle f_j,g_k\rangle=0$ for $j\neq k$, evaluate the eigenvalues and eigenvectors of $K$. (iv) If $\kappa(y,x)=\cos\pi^{-1}(y+x)$ for $x,y\in[0,1]$, evaluate the eigenvalues and eigenvectors of $K$.

10.19. Suppose that $T$ is a compact operator on an infinite-dimensional Hilbert space $(X,\|\cdot\|)$ over $\mathbb{C}$. If $\{e_j\}_{j=1}^{\infty}$ is an orthonormal basis of $X$, then prove $\lim_{j\to\infty}\|T(e_j)\|=0$.

10.20. Let $(X,\|\cdot\|_X)$ and $(Y,\|\cdot\|_Y)$ be Hilbert spaces over $\mathbb{C}$. A bounded linear operator $T:X\to Y$ is called a Hilbert-Schmidt (Erhard Schmidt) operator provided that there exists an orthonormal basis $\{e_j\}_{j=1}^{\infty}$ in $X$ such that $\sum_{j=1}^{\infty}\|T(e_j)\|_Y^2<\infty$. (i) If $T^*:Y\to X$ is defined by
$$\langle T(x),y\rangle_Y=\langle x,T^*(y)\rangle_X\quad\text{for all }(x,y)\in X\times Y,$$


where $\langle\cdot,\cdot\rangle_X$ and $\langle\cdot,\cdot\rangle_Y$ stand for the inner products equipped with $X$ and $Y$ respectively, prove that $T$ is a Hilbert-Schmidt operator when and only when $T^*$ is a Hilbert-Schmidt operator. (ii) Prove that any Hilbert-Schmidt operator is compact. (iii) If $\kappa(\cdot,\cdot):(a,b)\times(c,d)\to\mathbb{C}$ satisfies $\int_{(a,b)}\int_{(c,d)}|\kappa(x,y)|^2\,dm_{id}(y)\,dm_{id}(x)<\infty$, where $(a,b)\times(c,d)\subseteq\mathbb{R}^2$, then prove that the integral operator
$$T(f)(x)=\int_{(c,d)}\kappa(x,y)f(y)\,dm_{id}(y)\quad\text{for all }x\in(a,b)$$
is a Hilbert-Schmidt operator from $LRS^2_{id}((c,d),\mathbb{C})$ to $LRS^2_{id}((a,b),\mathbb{C})$ and therefore compact.

10.21. Let $X$ and $Y$ be Hilbert spaces over $\mathbb{C}$. Suppose $\{e_j\}_{j=1}^{\infty}$ and $\{f_k\}_{k=1}^{\infty}$ are orthonormal bases of $X$ and $Y$ respectively. For a sequence $\{c_k\}_{k=1}^{\infty}$ in $\mathbb{C}$ define a linear operator $T:X\to Y$ by
$$T(x)=\sum_{k=1}^{\infty}c_k\langle x,e_k\rangle_Xf_k\quad\text{for all }x\in X,$$
where $\langle\cdot,\cdot\rangle_X$ and $\langle\cdot,\cdot\rangle_Y$ stand for the inner products equipped with $X$ and $Y$ respectively. Prove the following results:
(i) $T$ is a bounded operator when and only when $\sup_{k\in\mathbb{N}}|c_k|<\infty$;
(ii) $T$ is a compact operator when and only when $\lim_{k\to\infty}|c_k|=0$;
(iii) $T$ is a Hilbert-Schmidt operator when and only when $\sum_{k=1}^{\infty}|c_k|^2<\infty$;
(iv) $T$ is an operator of finite rank when and only when there is an $N\in\mathbb{N}$ such that $c_n=0$ for $n>N$.

Hints or Solutions

This part comprises hints or solutions to those problems included at the end of each chapter in the text.

1 Preliminaries

1.1 If $x\in(X\cup Y)\setminus(X\cap Y)$ then $x\in X\cup Y$ but $x\notin X\cap Y$, and hence we must have: $x\in X$ but $x\notin Y$, or $x\in Y$ but $x\notin X$; that is, $x\in(X\setminus Y)\cup(Y\setminus X)$. Thus $(X\cup Y)\setminus(X\cap Y)\subseteq(X\setminus Y)\cup(Y\setminus X)$. The reverse inclusion can be proved similarly. Therefore, the desired equality follows.
1.2 $\cup_{j\in\mathbb{N}}X_j=[0,1]$ and $\cap_{j\in\mathbb{N}}X_j=\{0\}$.
1.3 If $x\in X\cap(\cup_{j\in I}X_j)$ then $x\in X$ and $x\in X_j$ for some $j\in I$, and hence $x\in\cup_{j\in I}(X\cap X_j)$. Consequently, $X\cap(\cup_{j\in I}X_j)\subseteq\cup_{j\in I}(X\cap X_j)$. The reverse inclusion can be proved similarly. So the desired equality follows. The second equality can be checked in a similar manner.
1.4 This follows from the related definitions.
1.5 $f(x)=a+(b-a)x:(0,1)\to(a,b)$ and $g(x)=\tan(2^{-1}\pi x):(0,1)\to(0,\infty)$ are bijective.
1.6 If $x\in(g\circ f)^{-1}(C)$ then $g(f(x))\in C$, hence $f(x)\in g^{-1}(C)$ and $x\in f^{-1}\big(g^{-1}(C)\big)$. Accordingly, $(g\circ f)^{-1}(C)\subseteq f^{-1}\big(g^{-1}(C)\big)$. Similarly, we have $(g\circ f)^{-1}(C)\supseteq f^{-1}\big(g^{-1}(C)\big)$, thereby getting the desired equality.

1.7 Since $X$ is infinite, $X\setminus\{x\}$ is infinite and hence there is an $x_1\in X\setminus\{x\}$. Of course, $X\setminus\{x,x_1\}$ is infinite, and then there is an $x_2\in X\setminus\{x,x_1\}$. This process produces a sequence of distinct points $\{x_j\}_{j=1}^{\infty}$ in $X\setminus\{x\}$. The map $x\mapsto x_1$, $x_j\mapsto x_{j+1}$ ($j\in\mathbb{N}$), $y\mapsto y$ otherwise, is a bijection of $X$ onto $X\setminus\{x\}$.


1.8 Since $f(X)$ is a subset of $Y$, $f(X)$ is countable. Since $f$ is one-to-one, $X$ is countable too.
1.9 If there is a function $\varphi$ from $[0,1]$ onto $X$, then for each $x\in[0,1]$ we have $\varphi_x=\varphi(x)\in X$, which is a function from $[0,1]$ to $\mathbb{R}$. For any $x\in[0,1]$ let
$$f(x)=\begin{cases}0,&\varphi_x(x)\neq0,\\1,&\varphi_x(x)=0.\end{cases}$$
Since $\varphi$ is surjective, there is an $x_0\in[0,1]$ such that $f=\varphi_{x_0}$. But $f(x_0)\neq\varphi_{x_0}(x_0)$, a contradiction. Therefore, no such $\varphi$ exists.
1.10 (i) $a^{-1}ax=a^{-1}b$ gives $x=a^{-1}b$, and conversely. (ii) Use $a^{-1}(a^{-1})^{-1}=1$. (iii) Use $(ab)(ab)^{-1}=1$ and $aba^{-1}b^{-1}=1$.
1.11 Obviously, $|a_1|\le|a_1|$. Assuming the inequality for $n=k$, we get by the triangle inequality that
$$\left|\sum_{j=1}^{k+1}a_j\right|\le\left|\sum_{j=1}^{k}a_j\right|+|a_{k+1}|\le\sum_{j=1}^{k}|a_j|+|a_{k+1}|,$$
as desired.

1.12 Use $b^3-a^3=(b-a)(b^2+ba+a^2)$ and $b^2+ba+a^2\ge0$ (for $a,b\ge0$).
1.13 Clearly, $\sup(X+Y)\le\sup X+\sup Y$. For any $\varepsilon>0$ there are $x\in X$ and $y\in Y$ such that $\sup X<x+\varepsilon$ and $\sup Y<y+\varepsilon$. So $\sup X+\sup Y<x+y+2\varepsilon\le\sup(X+Y)+2\varepsilon$. This gives $\sup X+\sup Y\le\sup(X+Y)$. Thus $\sup(X+Y)=\sup X+\sup Y$. The proof of $\inf(X+Y)=\inf X+\inf Y$ is similar.
1.14 (i) Since $(a_n)_{n=1}^{\infty}$ is increasing and bounded from above by any $b_k$, $a=\lim_{n\to\infty}a_n$ exists and belongs to $\cap_{n\in\mathbb{N}}I_n$. Similarly, $b=\lim_{n\to\infty}b_n$ exists and belongs to $\cap_{n\in\mathbb{N}}I_n$. (ii) This follows from (i). (iii) This follows from (i) and (ii).
1.15 Use Definition 1.2.6 and some simple estimates.
1.16 Use the fact that $s_n>a>0$ if and only if $0<s_n^{-1}<a^{-1}$.


1.18 (i) It suffices to prove the first equality. Since $\{s_n\}_{n\in\mathbb{N}}$ is bounded, $a=\limsup s_n\in\mathbb{R}$, and for any $\varepsilon>0$ there is an $n_0\in\mathbb{N}$ such that $n>n_0$ implies $s_n<a+\varepsilon$. Thus for all $m>n_0$ we have $\sup\{s_n:n>m\}\le a+\varepsilon$. Since $\varepsilon>0$ is arbitrary, this yields $\lim_{m\to\infty}\sup\{s_n:n>m\}\le a$. For the reverse inequality, let $m\in\mathbb{N}$ be fixed. Then for any $\varepsilon>0$ there exists $n_0>m$ such that $s_{n_0}>a-\varepsilon$. Hence $\sup\{s_n:n>m\}\ge a$. This is valid for any $m\in\mathbb{N}$. So $a\le\lim_{m\to\infty}\sup\{s_n:n>m\}$. (ii) If $\lim_{n\to\infty}s_n=s\in\mathbb{R}$ exists, then all subsequences of $\{s_n\}_{n\in\mathbb{N}}$ have limit $s$. Thus $S=\{s\}$ and consequently $\limsup s_n=s=\liminf s_n$. Conversely, assume $\limsup s_n=s=\liminf s_n$. If $\{s_n\}_{n=1}^{\infty}$ does not converge to $s$, then there are an $\varepsilon_0>0$ and a subsequence $(s_{n_k})_{k\in\mathbb{N}}$ of $\{s_n\}_{n=1}^{\infty}$ such that $|s_{n_k}-s|\ge\varepsilon_0$. Since $\{s_{n_k}\}_{k=1}^{\infty}$ is bounded, the Bolzano-Weierstrass property implies that this subsequence has a subsequence which converges to some $t\in\mathbb{R}$. Of course, $|t-s|\ge\varepsilon_0$. This indicates $t\in S$, and so $\sup S$ and $\inf S$ cannot both be $s$, contradicting the assumption.

2 Riemann Integrals

2.1 (i) Given $\varepsilon>0$, choose $\delta=2^{-1}\varepsilon$. If $P$ is any partition of $[0,1]$, then in $S(f,P,\xi)$ there are at most two nonzero terms ($2^{-1}$ could be a partition point, say $x_j=2^{-1}$, and we could have $\xi_j=x_j=\xi_{j+1}$). Thus $\|P\|<\delta$ implies $0\le S(f,P,\xi)\le2\|P\|<\varepsilon$. (ii) Given $\varepsilon>0$, there are at most $2\varepsilon^{-1}$ of the numbers $k^{-1}$ that satisfy $k^{-1}>2^{-1}\varepsilon$. Thus in any sum $S(f,P,\xi)$, those numbers contribute at most $(2/\varepsilon)\,2\left(\sup_{x\in[0,1]}|f(x)|\right)\|P\|=4\varepsilon^{-1}\|P\|$. The remaining terms in $S(f,P,\xi)$ contribute at most $\sup_{x\in[0,1]}|f(x)|$ times the total length of those subintervals in $[0,2^{-1}\varepsilon]$. Therefore $0\le S(f,P,\xi)\le4\varepsilon^{-1}\|P\|+2^{-1}\varepsilon$. Define $\delta=2^{-3}\varepsilon^2$ to get $|S(f,P,\xi)|<\varepsilon$ whenever $\|P\|<\delta$. (iii) Let $P$ be a partition of $[0,2]$ and suppose $x_j\le1\le x_{j+1}$. In $S(f,P,\xi)$, if $k<j$ then $f(\xi_k)=1$; and if $k>j+1$ then $f(\xi_k)=-2$. Thus, if $\Delta x_k=x_k-x_{k-1}$, then
$$\begin{aligned}S(f,P,\xi)&=\sum_{k=1}^{j-1}\Delta x_k+f(\xi_j)\Delta x_j+f(\xi_{j+1})\Delta x_{j+1}-2\sum_{k=j+2}^{n}\Delta x_k\\&=x_{j-1}+f(\xi_j)\Delta x_j+f(\xi_{j+1})\Delta x_{j+1}-2(2-x_{j+1}).\end{aligned}$$
Given $\varepsilon>0$, choose $\delta=2^{-2}\varepsilon$. Then for $\|P\|<\delta$, one has
$$|S(f,P,\xi)+1|=|x_{j-1}-1+f(\xi_j)\Delta x_j+f(\xi_{j+1})\Delta x_{j+1}-2(1-x_{j+1})|\le\|P\|+\|P\|+2\|P\|=4\|P\|<\varepsilon.$$

(iv) Let $P$ be a partition of $[0,2]$ and suppose $x_j\le1\le x_{j+1}$. In $S(f,P,\xi)$, if $k<j$ then $f(\xi_k)=2$; and if $k>j+1$ then $f(\xi_k)=1$. Thus, if $\Delta x_k=x_k-x_{k-1}$, then
$$\begin{aligned}S(f,P,\xi)&=2\sum_{k=1}^{j-1}\Delta x_k+f(\xi_j)\Delta x_j+f(\xi_{j+1})\Delta x_{j+1}+\sum_{k=j+2}^{n}\Delta x_k\\&=2x_{j-1}+f(\xi_j)\Delta x_j+f(\xi_{j+1})\Delta x_{j+1}+(2-x_{j+1}).\end{aligned}$$
Given $\varepsilon>0$, choose $\delta=\varepsilon/14$. Then for $\|P\|<\delta$, one has
$$\begin{aligned}|S(f,P,\xi)-3|&=|2x_{j-1}+f(\xi_j)\Delta x_j+f(\xi_{j+1})\Delta x_{j+1}+2-x_{j+1}-3|\\&\le|f(\xi_j)\Delta x_j|+|f(\xi_{j+1})\Delta x_{j+1}|+|x_{j-1}-1|+|x_{j-1}-x_{j+1}|\\&\le5\|P\|+5\|P\|+2\|P\|+2\|P\|=14\|P\|<\varepsilon.\end{aligned}$$

2.2 Let $P=\{0,1/n,2/n,\dots,1\}$ and $\xi_k=k/n$; then $S(f,P,\xi)=n^{-1}\sum_{k=1}^{n}f(k/n)$. And by the geometric meaning of $\int_0^1f(x)\,dx$ (here, the area of a quarter of the unit disk), one has $\int_0^1\sqrt{1-x^2}\,dx=\pi/4$.
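The Riemann sums in 2.2 can be evaluated numerically as a sanity check. This minimal sketch (the function names are made up for the example) uses the partition $P=\{k/n\}$ with tags $\xi_k=k/n$ and compares the result with $\pi/4$.

```python
import math

def right_riemann_sum(f, n):
    """S(f, P, xi) on [0, 1] with P = {0, 1/n, ..., 1} and xi_k = k/n."""
    return sum(f(k / n) for k in range(1, n + 1)) / n

def quarter_circle(x):
    # max(0.0, ...) guards against tiny negative values from rounding near x = 1
    return math.sqrt(max(0.0, 1.0 - x * x))

for n in (10, 100, 10_000):
    print(n, right_riemann_sum(quarter_circle, n), math.pi / 4)
```

Because $\sqrt{1-x^2}$ is decreasing on $[0,1]$, these right-endpoint sums approach $\pi/4$ from below, with error at most $1/n$.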

2.3 $f$ is integrable on $[-1,0]$; but $f$ is unbounded on both $[0,1]$ and $[-1,1]$, so it is not integrable there.
2.4 This follows immediately from Corollary 2.2.2.
2.5 Since $f$ is increasing,
$$m_k=f(x_{k-1})=\frac{k-1}{n}\quad\text{and}\quad L(f,P_n)=\sum_{k=1}^{n}\frac{k-1}{n}\left(\frac{k}{n}-\frac{k-1}{n}\right)=\frac{n-1}{2n}.$$
Similarly,
$$M_k=f(x_k)=\frac{k}{n}\quad\text{and}\quad U(f,P_n)=\sum_{k=1}^{n}\frac{k}{n}\left(\frac{k}{n}-\frac{k-1}{n}\right)=\frac{n+1}{2n}.$$
Thus $U(f,P_n)-L(f,P_n)=n^{-1}$, and it is clear that for any $\varepsilon>0$ we can choose $n>\varepsilon^{-1}$ and get $U(f,P_n)-L(f,P_n)<\varepsilon$. Therefore $f$ is integrable on $[0,1]$. Also $\int_0^1f\,dx=\lim_{n\to\infty}U(f,P_n)=2^{-1}$.
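To see the identities in 2.5 concretely, the sketch below computes the lower and upper Darboux sums exactly with rational arithmetic, assuming $f(x)=x$ on $[0,1]$ (the function that matches the values $m_k=(k-1)/n$ and $M_k=k/n$ in the hint). The function name is hypothetical.

```python
from fractions import Fraction

def darboux_sums(n):
    """Exact L(f, P_n) and U(f, P_n) for f(x) = x on [0, 1], P_n = {k/n}."""
    width = Fraction(1, n)
    # f is increasing, so m_k = f(x_{k-1}) and M_k = f(x_k).
    lower = sum(Fraction(k - 1, n) * width for k in range(1, n + 1))
    upper = sum(Fraction(k, n) * width for k in range(1, n + 1))
    return lower, upper

for n in (1, 2, 5, 10):
    L, U = darboux_sums(n)
    print(n, L, U, U - L)
```

Using `Fraction` instead of floats keeps the identities $L=(n-1)/(2n)$, $U=(n+1)/(2n)$ and $U-L=1/n$ exact, so they can be compared with `==`.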

2.6 Define $f^+(x)=\max\{f(x),0\}$ and $f^-(x)=-\min\{f(x),0\}$. Then $f(x)=f^+(x)-f^-(x)$ and $|f(x)|=f^+(x)+f^-(x)$. So $f\in R[a,b]$ implies $f^{\pm}\in R[a,b]$, and hence $|f|\in R[a,b]$ with
$$\left|\int_a^bf(x)\,dx\right|\le\int_a^bf^+(x)\,dx+\int_a^bf^-(x)\,dx=\int_a^b|f(x)|\,dx.$$
For the converse, taking $f(x)=1$ when $x\in[a,b]\cap\mathbb{Q}$ and $f(x)=-1$ otherwise, one gets that $|f|=1$ but $f\notin R[a,b]$.

2.7 Let $f\in R[a,b]$. Then for any partition $P$ of $[c,d]$, the set $P'=P\cup\{a,b\}$ is a partition of $[a,b]$ and $U(f,P)-L(f,P)\le U(f,P')-L(f,P')$, and hence $f\in R[c,d]$. Conversely, let $f\in R[c,d]$ for any $[c,d]\subset(a,b)$. Then for any $\varepsilon>0$ we may take $[c,d]$ such that $a<c\le a+\varepsilon/(6K)$ and $b-\varepsilon/(6K)\le d<b$, where $K=\sup_{x\in[a,b]}|f(x)|+1$. Because $f$ is integrable on $[c,d]$, one can choose a partition $P^*$ of $[c,d]$ such that $U(f,P^*)-L(f,P^*)<3^{-1}\varepsilon$. Now let $P=\{a\}\cup P^*\cup\{b\}$. Then $P$ is a partition of $[a,b]$, and
$$\begin{aligned}U(f,P)-L(f,P)&=(M_1-m_1)(c-a)+U(f,P^*)-L(f,P^*)+(M_n-m_n)(b-d)\\&\le2K(c-a)+U(f,P^*)-L(f,P^*)+2K(b-d)\\&<2K\big(\varepsilon/(6K)\big)+3^{-1}\varepsilon+2K\big(\varepsilon/(6K)\big)=\varepsilon.\end{aligned}$$
Hence $f\in R[a,b]$.
2.8 This follows from the Cauchy-Schwarz inequality:
$$\int_0^{\pi}\sqrt{x\sin x}\,dx\le\left(\int_0^{\pi}x\,dx\right)^{2^{-1}}\left(\int_0^{\pi}\sin x\,dx\right)^{2^{-1}}=\pi.$$

2.9 Since $(f+g)^2=f^2+2fg+g^2$, by the Cauchy-Schwarz inequality
$$\begin{aligned}\int_a^b\big(f(x)+g(x)\big)^2dx&\le\int_a^bf^2(x)\,dx+2\left(\int_a^bf^2(x)\,dx\right)^{2^{-1}}\left(\int_a^bg^2(x)\,dx\right)^{2^{-1}}+\int_a^bg^2(x)\,dx\\&=\left(\left(\int_a^bf^2(x)\,dx\right)^{2^{-1}}+\left(\int_a^bg^2(x)\,dx\right)^{2^{-1}}\right)^2,\end{aligned}$$
as desired.
2.10 It is easy to see that if $f(x)\ge\delta>0$, then on each subinterval $\inf(1/f)=1/\sup(f)$ and $\sup(1/f)=1/\inf(f)$, and hence
$$U(1/f,P)-L(1/f,P)=\sum_{k=1}^{n}\frac{M_{f,k}-m_{f,k}}{M_{f,k}m_{f,k}}(x_k-x_{k-1})\le\delta^{-2}\big(U(f,P)-L(f,P)\big).$$
This implies $1/f\in R[a,b]$.

2.11 Given $\varepsilon>0$, choose $\delta=2^{-2}\varepsilon^2$ and $x_1,x_2\in[0,\infty)$. If both $x_1,x_2<2^{-2}\varepsilon^2$, then $|\sqrt{x_1}-\sqrt{x_2}|\le\sqrt{x_1}+\sqrt{x_2}<\varepsilon$. If either $x_1\ge2^{-2}\varepsilon^2$ or $x_2\ge2^{-2}\varepsilon^2$, then $\sqrt{x_1}+\sqrt{x_2}\ge2^{-1}\varepsilon$. Thus
$$|x_1-x_2|<\delta\Rightarrow|\sqrt{x_1}-\sqrt{x_2}|=|x_1-x_2|\big(\sqrt{x_1}+\sqrt{x_2}\big)^{-1}<\varepsilon.$$
We note that an alternate proof exists using $|\sqrt{x_1}-\sqrt{x_2}|\le\sqrt{|x_1-x_2|}$ and the choice $\delta=\varepsilon^2$.
2.12 Suppose that $f$ is unbounded on $(a,b)$. For each $n\in\mathbb{N}$ choose $s_n\in(a,b)$ such that $|f(s_n)|\ge n$. Then $\{s_n\}$ has a convergent subsequence $\{s_{n(k)}\}$, which means that $\{s_{n(k)}\}$ is a Cauchy sequence in $(a,b)$, while the unbounded sequence $\{f(s_{n(k)})\}$ is not a Cauchy sequence. Hence $f$ is not uniformly continuous on $(a,b)$.
2.13 Taking $x_{1,n}=(2n\pi)^{-1}$ and $x_{2,n}=(2n\pi+2^{-1}\pi)^{-1}$, we have $|x_{1,n}-x_{2,n}|\to0$ as $n\to\infty$ but $|f(x_{1,n})-f(x_{2,n})|=1>\varepsilon$ for any small enough number $\varepsilon>0$.
2.14 The function $f$ is piecewise linear and is therefore continuous and bounded on $[0,k-2^{-1}]\setminus\{j\}_{j=1}^{k-1}$. By defining a continuous function $g$ on $[0,k-2^{-1}]$ with $g=f$ on $[0,k-2^{-1}]\setminus\{j\}_{j=1}^{k-1}$, we see that $f\in R[0,k-2^{-1}]$.
2.15 Define $g(x)=x^{-1}\sin x$ if $x\neq0$ and $g(0)=1$. Then $g$ is continuous on $[0,5]$ and so $g\in R[0,5]$. Consequently, $f\in R[0,5]$.
2.16 Since $f\in R[a,b]$, $f$ is bounded and $F(x)$ exists for any $x\in[a,b]$. The continuity of $F$ follows from
$$|F(x)-F(x_0)|=\left|\int_{x_0}^{x}f(t)\,dt\right|\le\sup_{x\in[a,b]}|f(x)|\,|x-x_0|.$$
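The choice $\delta=2^{-2}\varepsilon^2$ in 2.11 can be spot-checked by random sampling. This sketch is only an empirical illustration, not a proof: it samples pairs with $|x_1-x_2|<\delta$, half of them near $0$ where $\sqrt{x}$ is steepest, and checks $|\sqrt{x_1}-\sqrt{x_2}|<\varepsilon$. The function name and the sampling ranges are arbitrary choices.

```python
import math
import random

def sqrt_modulus_holds(eps, trials=20_000, seed=0):
    """Check |sqrt(x1) - sqrt(x2)| < eps for random x1, x2 >= 0 with |x1 - x2| < delta."""
    rng = random.Random(seed)
    delta = eps * eps / 4  # delta = 2^{-2} eps^2, as in the hint
    for _ in range(trials):
        # Alternate between points near 0 and points far from 0.
        x1 = rng.uniform(0, delta) if rng.random() < 0.5 else rng.uniform(0, 100)
        x2 = max(0.0, x1 + rng.uniform(-delta, delta))
        if abs(math.sqrt(x1) - math.sqrt(x2)) >= eps:
            return False
    return True

print(all(sqrt_modulus_holds(eps) for eps in (1.0, 0.1, 0.01)))  # True
```

The check cannot fail here, since $|\sqrt{x_1}-\sqrt{x_2}|\le\sqrt{|x_1-x_2|}\le\sqrt\delta=\varepsilon/2$, which is the alternate argument mentioned at the end of 2.11.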

2.17 If $x_1<x_2$, then there is a $\mu\in[a,b]$ such that
$$F(x_2)-F(x_1)=\int_{x_1}^{x_2}f(t)\,dt\ge\min_{x\in[a,b]}f(x)\,(x_2-x_1)=f(\mu)(x_2-x_1)>0,$$
since $f\in C[a,b]$ and $f>0$.
2.18 Because $f\big(2/(\pi k)\big)=\sin(\pi k/2)$ has no limit as $k\to\infty$, $f$ is not continuous at $0$, and of course not continuous on $[0,1/\pi]$. But $f$ is bounded on $[0,1/\pi]$ and continuous on $[\varepsilon,1/\pi]$ for every $\varepsilon>0$. So $f\in R[0,1/\pi]$ by Problem 2.7.
2.19 $\int_1^{\infty}x^p\,dx$ is convergent if and only if $p<-1$, and $\int_0^1x^p\,dx$ is convergent if and only if $p>-1$.

2.20 (i) convergent; (ii) convergent; (iii) divergent; (iv) divergent: $\left(\frac{1}{\ln x}\right)'=-\frac{1}{x(\ln x)^2}$ and $\lim_{t\to1+}\frac{-1}{\ln t}=-\infty$. This implies that $\int_{1+}^{2}\frac{dx}{x(\ln x)^2}$ is divergent. Note that $\int_2^{\infty}\frac{dx}{x(\ln x)^2}$ is convergent, but this is not sufficient to infer convergence over $[1,\infty)$.
2.21 Use $(\ln x)'=1/x$ for $x>0$.
2.22 Let $F(x)=\int_x^{\infty}e^{-a|x-y|-b|y|}\,dy$. Then $f*g(x)=F(x)+F(-x)$, and it suffices to evaluate $F(x)$. If $x\ge0$ then $F(x)=\frac{1}{(a+b)e^{bx}}$, and if $x<0$ then we have two subcases: firstly, $a\neq b\Rightarrow F(x)=\frac{e^{ax}}{a+b}+\frac{e^{ax}-e^{bx}}{b-a}$; secondly, $a=b\Rightarrow F(x)=-xe^{ax}+\frac{e^{ax}}{2a}$.
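The closed forms for $F$ in 2.22 can be compared with a direct quadrature. The sketch below assumes $a,b>0$ (so the integrand decays), truncates the integral at an arbitrary cutoff `upper`, and applies the midpoint rule; the function names are made up for the example.

```python
import math

def F_closed(x, a, b):
    """Closed form of F(x) = int_x^inf exp(-a|x - y| - b|y|) dy from 2.22 (a, b > 0)."""
    if x >= 0:
        return math.exp(-b * x) / (a + b)
    if a != b:
        return math.exp(a * x) / (a + b) + (math.exp(a * x) - math.exp(b * x)) / (b - a)
    return -x * math.exp(a * x) + math.exp(a * x) / (2 * a)

def F_numeric(x, a, b, upper=60.0, n=100_000):
    """Midpoint-rule estimate of the same integral over [x, upper]."""
    h = (upper - x) / n
    total = 0.0
    for k in range(n):
        y = x + (k + 0.5) * h
        total += math.exp(-a * abs(x - y) - b * abs(y))
    return total * h

for x, a, b in [(1.0, 1.0, 2.0), (-2.0, 1.0, 2.0), (-1.0, 1.5, 1.5)]:
    print(x, a, b, F_closed(x, a, b), F_numeric(x, a, b))
```

The three sample points cover the case $x\ge0$ and both subcases $a\neq b$ and $a=b$ for $x<0$.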

2.23 Since $\int_0^1f(x)p(x)\,dx=0$ for any polynomial $p$ and the polynomials are dense in $C[0,1]$, for any $\varepsilon>0$ there is a polynomial $p$ such that $\max_{x\in[0,1]}|f(x)-p(x)|<\varepsilon$, and hence
$$\int_0^1f^2(x)\,dx=\int_0^1\big(f(x)-p(x)\big)f(x)\,dx\le\varepsilon\int_0^1|f(x)|\,dx,$$
implying $\int_0^1f^2(x)\,dx=0$ and so $f=0$.

2.24 Substitute $u=e^{-t}$ and $du=-e^{-t}\,dt$, which gives $-\ln u=t$. Hence
$$\int_{0+}^{1-}\left(\ln u^{-1}\right)^{-2^{-1}}du=\int_{0+}^{\infty}e^{-t}t^{-2^{-1}}\,dt=\Gamma(2^{-1})=\sqrt{\pi}.$$

2.25 If $n=0$ then $\Gamma(2^{-1})=\sqrt{\pi}$, as obtained above. Suppose that the formula is valid for $n=k$. When $n=k+1$, we have
$$\Gamma(k+1+2^{-1})=(k+2^{-1})\Gamma(k+2^{-1})=\frac{\big(2(k+1)\big)!\sqrt{\pi}}{4^{k+1}(k+1)!},$$
as desired. The induction concludes the argument.
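The formula proved in 2.25, $\Gamma(n+2^{-1})=\frac{(2n)!\sqrt{\pi}}{4^nn!}$, can be compared against the standard-library function `math.gamma`. The helper name below is hypothetical.

```python
import math

def gamma_half_integer(n):
    """Gamma(n + 1/2) = (2n)! sqrt(pi) / (4^n n!) for n = 0, 1, 2, ..."""
    return math.factorial(2 * n) * math.sqrt(math.pi) / (4 ** n * math.factorial(n))

for n in range(6):
    print(n, gamma_half_integer(n), math.gamma(n + 0.5))
```

The case $n=0$ reproduces $\Gamma(2^{-1})=\sqrt{\pi}$ from 2.24.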


2.26 (i) This follows from easy calculations. (ii) $L\big(af(x)+bg(x)\big)=a\int_{0+}^{\infty}e^{-xt}f(t)\,dt+b\int_{0+}^{\infty}e^{-xt}g(t)\,dt=aL(f(x))+bL(g(x))$. (iii) $L(e^{ax}f(x))=\int_{0+}^{\infty}e^{-(x-a)t}f(t)\,dt=L(f)(x-a)$, the transform of $f$ evaluated at $x-a$. (iv) Integration by parts yields
$$\int_{0+}^{b}e^{-xt}f'(t)\,dt=e^{-xb}f(b)-f(0)+\int_{0+}^{b}xe^{-xt}f(t)\,dt.$$
Since $\lim_{b\to\infty}e^{-xb}f(b)=0$, one has
$$L(f'(x))=\int_{0+}^{\infty}e^{-xt}f'(t)\,dt=-f(0)+\int_{0+}^{\infty}xe^{-xt}f(t)\,dt=-f(0)+xL(f(x)).$$
(v) By (iii), $L(x^ne^{ax})=\frac{n!}{(x-a)^{n+1}}$ for $n\in\mathbb{N}$ and $x>a$, thanks to $L(x^n)=\frac{n!}{x^{n+1}}$, $x>0$.

3 Riemann-Stieltjes Integrals

3.1 Note that the total variation of a monotonic function $f$ on $[x,y]$ is given by $V_x^yf=|f(y)-f(x)|$. So we can find $V_a^bf$ in general by partitioning $[a,b]$ into subintervals on which $f$ is monotonic. (i)
$$V_0^{3\pi}f=|f(\pi/2)-f(0)|+|f(3\pi/2)-f(\pi/2)|+|f(5\pi/2)-f(3\pi/2)|+|f(3\pi)-f(5\pi/2)|=6.$$
(ii) $f'(x)=6x(x-1)=0$ implies $x=0$ and $x=1$. Hence
$$V_{-1}^2f=|f(0)-f(-1)|+|f(1)-f(0)|+|f(2)-f(1)|=11.$$
(iii) If $1/(n+1)<\delta<1/n$, then on $[\delta,1]$ the only contributions to $V_\delta^1f$ occur at $x=1/(k+1)$, $k=1,\dots,n-1$, where
$$\left|f\left(\frac{1}{k+1}\right)-f\left(\frac{1}{k+1}-\varepsilon_k\right)\right|+\left|f\left(\frac{1}{k+1}+\varepsilon_k\right)-f\left(\frac{1}{k+1}\right)\right|=2|a_k|.$$
Here $\varepsilon_k>0$ is such that
$$1/(k+2)<1/(k+1)-\varepsilon_k<1/(k+1)<1/(k+1)+\varepsilon_k<k^{-1}.$$
Therefore $V_\delta^1f=\sum_{k=1}^{n-1}2|a_k|$. Letting $\delta\to0$, we get $V_0^1f=\sum_{k=1}^{\infty}2|a_k|$.
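The variations in 3.1 (i) and (ii) can be reproduced by summing $|f(x_k)-f(x_{k-1})|$ over a uniform partition that contains all turning points, because $f$ is then monotone on every subinterval and the sum equals $V_a^bf$. The sketch assumes $f=\sin$ on $[0,3\pi]$ for (i) and $f(x)=2x^3-3x^2$ on $[-1,2]$ for (ii) (any antiderivative of $6x(x-1)$ gives the same variation); both choices are inferred from the partition points in the hint, and the function name is made up.

```python
import math

def variation_on_grid(f, a, b, n):
    """Sum of |f(x_k) - f(x_{k-1})| over the uniform partition of [a, b] into n pieces.
    This never exceeds V_a^b f, and equals it when every turning point of f
    is a partition point (f is then monotone on each subinterval)."""
    xs = [a + (b - a) * k / n for k in range(n + 1)]
    return sum(abs(f(xs[k]) - f(xs[k - 1])) for k in range(1, n + 1))

# (i) The turning points pi/2, 3pi/2, 5pi/2 are grid points when n is a multiple of 6.
print(variation_on_grid(math.sin, 0.0, 3 * math.pi, 600))  # close to 6
# (ii) The turning points 0 and 1 are grid points when n is a multiple of 3.
print(variation_on_grid(lambda x: 2 * x**3 - 3 * x**2, -1.0, 2.0, 300))  # close to 11
```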

3.2 First of all, let us prove that if $f$ is monotonic on $[a,b]$ then $v_f(x)=|f(x)-f(a)|$ for $x\in[a,b]$. It suffices to prove this for increasing $f$. When $P$ is any partition of $[a,x]$, one has
$$S(f,P)=\sum_{k=1}^{n}|f(x_k)-f(x_{k-1})|=\sum_{k=1}^{n}\big(f(x_k)-f(x_{k-1})\big)=f(x)-f(a).$$
Hence $v_f(x)=\sup\{S(f,P):P\text{ a partition of }[a,x]\}=|f(x)-f(a)|$. Since $\sin x$ increases on $[0,\pi/2]$ and $[3\pi/2,2\pi]$ and decreases on $[\pi/2,3\pi/2]$: if $x\in[0,\pi/2]$ then $v_f(x)=\sin x$; if $x\in[\pi/2,3\pi/2]$ then
$$v_f(x)=V_0^{\pi/2}f+V_{\pi/2}^{x}f=2-\sin x;$$
and if $x\in[3\pi/2,2\pi]$ then
$$v_f(x)=V_0^{\pi/2}f+V_{\pi/2}^{3\pi/2}f+V_{3\pi/2}^{x}f=4+\sin x.$$
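The piecewise formula for $v_f$ in 3.2 can be compared with a grid approximation of $V_0^xf$ for $f=\sin$. In this sketch (names hypothetical) the uniform grid generally misses the turning points, so the grid sum is a lower approximation that is accurate only up to a small discretization error.

```python
import math

def v_sin(x):
    """v_f(x) = V_0^x sin on [0, 2*pi], following the formula in 3.2."""
    if x <= math.pi / 2:
        return math.sin(x)
    if x <= 3 * math.pi / 2:
        return 2 - math.sin(x)
    return 4 + math.sin(x)

def variation_estimate(f, a, x, n=20_000):
    """Grid approximation of V_a^x f (a lower bound for it)."""
    pts = [a + (x - a) * k / n for k in range(n + 1)]
    return sum(abs(f(pts[k]) - f(pts[k - 1])) for k in range(1, n + 1))

for x in (1.0, 2.5, 4.0, 5.5, 2 * math.pi):
    print(x, v_sin(x), variation_estimate(math.sin, 0.0, x))
```

At $x=2\pi$ both columns give the total variation $4$ of $\sin$ over one period.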

3.3 This follows from $L(P)=\sum_{k=1}^{n}\Big(\big(x(t_k)-x(t_{k-1})\big)^2+\big(y(t_k)-y(t_{k-1})\big)^2\Big)^{2^{-1}}$.

3.4 If $C(f_1)$ and $C(f_2)$ are the sets of all continuity points of $f_1$ and $f_2$ respectively, then for $x_0\in C(f_1)$ we have $f_1(x_0^+)=f_1(x_0)=f_1(x_0^-)$. Since $D$ is dense in $[a,b]$, there are points $x_j\in D$ such that $x_j<x_0$ and $\lim_{j\to\infty}x_j=x_0$, and consequently,
$$f_1(x_0^-)=\lim_{j\to\infty}f_1(x_j)=\lim_{j\to\infty}f_2(x_j)=f_2(x_0^-).$$
In a similar manner, we have $f_1(x_0^+)=f_2(x_0^+)$ and $x_0\in C(f_2)$. Accordingly, $C(f_1)\subseteq C(f_2)$, and by symmetry $C(f_2)\subseteq C(f_1)$.

3.5 Let $f=f_1$ and $g=f_2$. Within the $k$th term of the sum
$$S(gf,P)=\sum_{k=1}^{n}|g(x_k)f(x_k)-g(x_{k-1})f(x_{k-1})|$$
add $-g(x_k)f(x_{k-1})+g(x_k)f(x_{k-1})$, and factor by pairs. This leads to
$$S(gf,P)\le\sum_{k=1}^{n}|g(x_k)||f(x_k)-f(x_{k-1})|+\sum_{k=1}^{n}|f(x_{k-1})||g(x_k)-g(x_{k-1})|,$$
and so $S(gf,P)\le M_gS(f,P)+M_fS(g,P)$, where $M_g=\sup_{x\in[a,b]}|g(x)|$ and $M_f=\sup_{x\in[a,b]}|f(x)|$.
3.6 Refer to Example 3.1.4.
3.7 Let $P=\{x_k\}_{k=0}^{n}$ be any partition of $[a,b]$. Then
$$\sum_{k=1}^{n}|f(x_k)-f(x_{k-1})|=\sum_{k=1}^{n}\lim_{j\to\infty}|f_j(x_k)-f_j(x_{k-1})|=\lim_{j\to\infty}\sum_{k=1}^{n}|f_j(x_k)-f_j(x_{k-1})|\le\sup_{j\in\mathbb{N}}V_a^b(f_j)<\infty,$$
and hence $f$ is of bounded variation on $[a,b]$.
3.8 (i) Since $[x]$ is a step function, each sum reduces to those terms consisting of $x^2$ evaluated near the jump points times the jump of $[x]$ at such an $x$. Thus
$$\int_0^4x^2\,d([x])=1^2(1-0)+2^2(2-1)+3^2(3-2)+4^2(4-3)=30.$$
(ii) $\int_0^1x^3\,d(x^2)=2\int_0^1x^4\,dx=2/5$. (iii) $\int_a^cf\,dg=f(b)\big(g(b+)-g(b)\big)=f(b)(1-0)=f(b)$.

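3.9 It is enough to prove Theorem 3.2.4 (i):
$$\begin{aligned}\int_a^b\big(f_1(x)+f_2(x)\big)\,dg_1(x)&=\lim_{\|P\|\to0}\sum_{k=1}^{n}\big(f_1(\xi_k)+f_2(\xi_k)\big)\big(g_1(x_k)-g_1(x_{k-1})\big)\\&=\lim_{\|P\|\to0}\left(\sum_{k=1}^{n}f_1(\xi_k)\big(g_1(x_k)-g_1(x_{k-1})\big)+\sum_{k=1}^{n}f_2(\xi_k)\big(g_1(x_k)-g_1(x_{k-1})\big)\right)\\&=\int_a^bf_1\,dg_1+\int_a^bf_2\,dg_1.\end{aligned}$$
3.10 Integration by parts gives
$$\int_0^{\pi/2}x\,d(\sin x)=\pi/2-\int_0^{\pi/2}\sin x\,dx=\pi/2-1.$$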
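Riemann-Stieltjes sums $\sum_kf(\xi_k)\big(g(x_k)-g(x_{k-1})\big)$ give a quick numerical check of the values in 3.8 (i), (ii) and 3.10. The sketch uses uniform partitions with midpoint tags; the function name `rs_sum` and the mesh sizes are arbitrary choices for illustration.

```python
import math

def rs_sum(f, g, a, b, n):
    """Riemann-Stieltjes sum of f with respect to g on [a, b]:
    uniform partition into n pieces, midpoint tags xi_k."""
    total = 0.0
    for k in range(1, n + 1):
        x_prev = a + (b - a) * (k - 1) / n
        x_k = a + (b - a) * k / n
        total += f((x_prev + x_k) / 2) * (g(x_k) - g(x_prev))
    return total

print(rs_sum(lambda x: x * x, math.floor, 0.0, 4.0, 40_000))     # near 30 (3.8 (i))
print(rs_sum(lambda x: x**3, lambda x: x * x, 0.0, 1.0, 10_000))  # near 0.4 (3.8 (ii))
print(rs_sum(lambda x: x, math.sin, 0.0, math.pi / 2, 10_000))    # near pi/2 - 1 (3.10)
```

For the step integrator $[x]$ (here `math.floor`) only the four subintervals ending at the jumps $1,2,3,4$ contribute, each with $\xi_k^2$ close to $j^2$, which is the mechanism described in 3.8 (i).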


3.11 If $P$ is a partition of $[a,b]$ such that $c\notin P$, then the Riemann-Stieltjes sums are the same for $\int_a^bf\,dh$ as for $\int_a^bf\,dg$. Consider a partition $P$ of $[a,b]$ such that $c=x_m\in P$. With the abbreviation $\Delta h_k=h(x_k)-h(x_{k-1})$, we have
$$\begin{aligned}\sum_{k=1}^{n}f(t_k)\Delta h_k-\sum_{k=1}^{n}f(t_k)\Delta g_k&=f(t_m)\Delta h_m+f(t_{m+1})\Delta h_{m+1}-f(t_m)\Delta g_m-f(t_{m+1})\Delta g_{m+1}\\&=f(t_m)\big(h(c)-g(c)\big)+f(t_{m+1})\big(g(c)-h(c)\big)\\&=\big(f(t_m)-f(t_{m+1})\big)\big(h(c)-g(c)\big).\end{aligned}$$
As $\|P\|\to0$, $t_m$ and $t_{m+1}$ both tend to $c$, and the continuity of $f$ yields a limit of $0$. Hence the Riemann-Stieltjes sums for $\int_a^bf\,dh$ approach the same limit as the Riemann-Stieltjes sums for $\int_a^bf\,dg$. Note that this assertion fails if $c$ is allowed to be an endpoint: e.g., on $[0,1]$ take $f(x)=1$, $g(x)=0$ and
$$h(x)=\begin{cases}0,&x\in[0,1),\\1,&x=1.\end{cases}$$
This gives $\int_0^1f(x)\,dg(x)=0$ and $\int_0^1f(x)\,dh(x)=1$.

3.12 If $f$ has bounded variation on $[a,b]$, then it is Riemann integrable there. The converse is false, however, as can be seen from the function
$$f(x)=\begin{cases}x\sin\frac{\pi}{2x},&x\neq0,\\0,&x=0.\end{cases}$$
Clearly, this continuous (and therefore integrable) function $f$ fails to have bounded variation.
3.13 Just choose $f=1$ on $[0,1]$ and
$$g(x)=\begin{cases}1,&x\in(0,1],\\0,&x=0.\end{cases}$$
3.14 (i) Let $P=\{x_k\}_{k=0}^{n}$ be a partition of $[a,b]$ and $\xi_k\in[x_{k-1},x_k]$ for $k\in\{1,2,\dots,n\}$. Then
$$\begin{aligned}\left|\sum_{k=1}^{n}f(\xi_k)\big(g(x_k)-g(x_{k-1})\big)\right|&\le\sum_{k=1}^{n}|f(\xi_k)||g(x_k)-g(x_{k-1})|\\&\le\sum_{k=1}^{n}|f(\xi_k)|\big(v_g(x_k)-v_g(x_{k-1})\big)\\&\le\sup_{x\in[a,b]}|f(x)|\sum_{k=1}^{n}\big(v_g(x_k)-v_g(x_{k-1})\big),\end{aligned}$$

thereby giving the required inequalities. (ii) and (iii) These two limits follow from (i) above.
3.15 (i) This follows from 3.14 (i). (ii) For $x,x_0\in[a,b]$, use
$$|F(x)-F(x_0)|=\left|\int_{x_0}^{x}f(t)\,dg(t)\right|\le\sup_{t\in[a,b]}|f(t)|\,|g(x)-g(x_0)|.$$
(iii) For $x,x_0\in[a,b]$ with $x\neq x_0$, using Theorem 3.4.3 (i) we get a point $\xi$ between $x_0$ and $x$ such that
$$F(x)-F(x_0)=\int_{x_0}^{x}f(t)\,dg(t)=f(\xi)\big(g(x)-g(x_0)\big),$$
and consequently,
$$F'(x_0)=\lim_{x\to x_0}f(\xi)\frac{g(x)-g(x_0)}{x-x_0}=f(x_0)g'(x_0),$$
provided that $f$ is continuous at $x_0$ and $g$ is differentiable at $x_0$. (iv) By (i) above it follows that $h\in RS_F[a,b]$ because $h\in C[a,b]$. To get the formula, let $P=\{x_k\}_{k=0}^{n}$ be a partition of $[a,b]$ and $\xi_k\in[x_{k-1},x_k]$ for $k=1,2,\dots,n$. Then
$$\sum_{k=1}^{n}h(\xi_k)\big(F(x_k)-F(x_{k-1})\big)=\sum_{k=1}^{n}h(\xi_k)\int_{x_{k-1}}^{x_k}f(x)\,dg(x)$$
and
$$\int_a^bh(x)f(x)\,dg(x)=\sum_{k=1}^{n}\int_{x_{k-1}}^{x_k}h(x)f(x)\,dg(x).$$


Suppose that $M=1+\sup_{x\in[a,b]}|f(x)|$. Since $h$ is uniformly continuous on $[a,b]$, for any $\varepsilon>0$ there is a $\delta>0$ such that
$$x,y\in[a,b]\text{ and }|x-y|<\delta\Rightarrow|h(x)-h(y)|<\eta=\frac{\varepsilon}{2M(1+V_a^bg)}.$$
Accordingly, if $\|P\|<\delta$, then Problem 3.14 and Lemma 3.4 (i) give
$$\begin{aligned}\left|\int_a^bh(x)f(x)\,dg(x)-\sum_{k=1}^{n}h(\xi_k)\big(F(x_k)-F(x_{k-1})\big)\right|&\le\sum_{k=1}^{n}\left|\int_{x_{k-1}}^{x_k}\big(h(x)-h(\xi_k)\big)f(x)\,dg(x)\right|\\&\le\sum_{k=1}^{n}\eta MV_{x_{k-1}}^{x_k}g=\eta MV_a^bg<2^{-1}\varepsilon<\varepsilon,\end{aligned}$$
and the desired formula follows.
3.16 If $g\in C[0,1]$ and $\int_0^1f(x)\,dg(x)=0$ for all increasing functions $f$ on $[0,1]$, then $g$ must be a constant. In fact, first, $f=1$ implies $g(0)=g(1)$; second, integrating by parts, we have $\int_0^1g(x)\,df(x)=\big(f(1)-f(0)\big)g(0)$ and so $\int_0^1\big(g(x)-g(0)\big)\,df(x)=0$. If $f(x)=x^n$, $n\in\mathbb{N}$, then $\int_0^1\big(g(x)-g(0)\big)nx^{n-1}\,dx=0$. Since $g\in C[0,1]$, $g(x)-g(0)=0$ follows from Problem 2.23.

3.17 Just integrate by parts and use Remark 3.3.5. 3.18 (i) Modify the argument for Theorem 2.2.8 with $dg(x)$ replacing $dx$; (ii) Expand the left-hand integral; (iii) Write the inside integral of the left-hand side of the formula in (ii) as $\int_{y\ge x}+\int_{y<x}$ … 0; (iii) Use the substitution $t=x^{-1}$ and Example 3.5.10.

4 Lebesgue-Radon-Stieltjes Integrals

4.1 $m_g\big((0,1)\big)=1-e^{-1}$; $m_g\big([0,1]\big)=3-e^{-1}$; $m_g\big((-1,1)\big)=4-e^{-1}$; $m_g\big([0,0]\big)=1-e^{-1}$.


4.2 (i) Let $S_1=\cup_{k=1}^{n_1}I_{1,k}$ and $S_2=\cup_{k=1}^{n_2}I_{2,k}$ be disjoint simple subsets of $\mathbb{R}$, where $\{I_{1,k}\}_{k=1}^{n_1}$ and $\{I_{2,k}\}_{k=1}^{n_2}$ are two sets of disjoint intervals. Then $S_1\cup S_2=\cup_{k=1}^{n_1+n_2}I_k$, where $I_k=I_{1,k}$ for $k=1,2,\dots,n_1$ and $I_{k+n_1}=I_{2,k}$ for $k=1,2,\dots,n_2$. Hence
$$m_g(S_1\cup S_2)=\sum_{k=1}^{n_1+n_2}m_g(I_k)=m_g(S_1)+m_g(S_2).$$
But let $g(x)=x$. Then $S_1=[-1,2^{-1}]$ and $S_2=[-2^{-1},1]$ yield $S_1\cap S_2\neq\emptyset$ and $m_g(S_1\cup S_2)=2\neq3=m_g(S_1)+m_g(S_2)$. Meanwhile, $S_1=[-1,0]$ and $S_2=[0,1]$ give $S_1\cap S_2=\{0\}$ and $m_g(S_1\cup S_2)=2=m_g(S_1)+m_g(S_2)$. (ii) In this case $S_2\setminus S_1$ is a simple subset of $\mathbb{R}$ and $S_2=S_1\cup(S_2\setminus S_1)$. Thus (i) is used to produce $m_g(S_2)=m_g(S_1)+m_g(S_2\setminus S_1)$ and the required equality. But, under $g(x)=x$, $S_1=(1,3)$ and $S_2=[2,3]$ give $m_g(S_2\setminus S_1)=0\neq-1=m_g(S_2)-m_g(S_1)$, and $S_1=\{0\}$ and $S_2=(0,2]$ give $m_g(S_2\setminus S_1)=2=m_g(S_2)-m_g(S_1)$.
4.3 (i) Yes, $A_g(s_1)=1$. (ii) Yes, $A_g(s_2)=0$. (iii) Yes, $A_g(s_3)=3$. (iv) Yes, $A_g(s_4)=-3$. (v) No.
4.4 Write $(0,1)\cap\mathbb{Q}$ as $\{r_j\}_{j=1}^{\infty}$ and define
$$s_j(x)=\begin{cases}0,&x\in[0,r_j),\\1,&x\in[r_j,r_j],\\0,&x\in(r_j,1],\\0,&x\in\mathbb{R}\setminus[0,1].\end{cases}$$

If $f_k=\sum_{j=1}^{k}s_j$, then it is a step function on $\mathbb{R}$, but $f=\lim_{k\to\infty}f_k$ is not a step function on $\mathbb{R}$, since it cannot be described as taking constant values on a finite set of subintervals of $[0,1]$. Actually, if $x$ is irrational in $(0,1)$, equals $0$ or $1$, or belongs to $\mathbb{R}\setminus[0,1]$, then $f_k(x)=0=f(x)$. If $x\in(0,1)$ equals some $r_{n_0}$, then $f_k(r_{n_0})=1=f(r_{n_0})$ when $k\ge n_0$. Note that $A_g(s_j)=r_j-r_j=0$. So $S_g(f)=0$.
4.5 (i)
$$\int_0^bf(x)\,dx=\begin{cases}\dfrac{b-[b]}{1+[b]},&[b]\le b<[b]+2^{-1},\\[1ex]\dfrac{1+[b]-b}{1+[b]},&[b]+2^{-1}\le b<[b]+1.\end{cases}$$

(ii) Use (i) to verify $\left|\int_0^bf(x)\,dx\right|\le\big(2(1+[b])\big)^{-1}$. (iii) Note that if $[b]>1$ then
$$\int_0^b|f(x)|\,dx=\sum_{k=1}^{[b]}k^{-1}+\frac{b-[b]}{1+[b]}.$$
4.6 Use Theorem 4.2.3 (ii).
4.7 (i) Use Theorems 4.2.3 (ii) and 4.2.1. (ii) Use $m_{g_1+g_2}(I)=m_{g_1}(I)+m_{g_2}(I)$ for any finite interval $I\subseteq\mathbb{R}$.
4.8 (i) Use $1_{E_1\cup E_2}\le1_{E_1}+1_{E_2}$ for $E_1,E_2\subseteq\mathbb{R}$. (ii) $E=\{0\}$ is not an $m_g$-null set, where
$$g(x)=\begin{cases}1,&x\in(-\infty,0),\\0,&x\in[0,\infty).\end{cases}$$
(iii) Use Theorem 4.2.3 (ii) and the definition of $m_g$-a.e. on $\mathbb{R}$.
4.9 (i) Use Theorems 4.3.1 and 4.2.3 (ii) for $\sum_{k=1}^{n}f_k$. (ii) Use Theorem 4.3.2.
4.10 (i) $f=0$. (ii) $\lim_{n\to\infty}\int_{\mathbb{R}}f_n(x)\,dm_{id}(x)=1\neq0=\int_{\mathbb{R}}f(x)\,dm_{id}(x)$. (iii) $\{f_n\}_{n=1}^{\infty}$ is neither increasing nor decreasing.

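4.11 (i) Since $1-\frac{x}{n}\le e^{-\frac{x}{n}}$, the sequence of functions
$$f_n(x)=\begin{cases}\left(1-\frac{x}{n}\right)^n,&x\in[0,n],\\0,&x\in(n,\infty)\cup(-\infty,0)\end{cases}$$
is increasing and bounded from above by $e^{-x}$. The monotone convergence theorem ensures the desired limit formula. (ii) By the dominated convergence theorem we have
$$\begin{aligned}\int_{\mathbb{R}}1_{(0,\infty)}(x)|x|^{\alpha}(e^x-1)^{-1}\,dm_{id}(x)&=\int_{\mathbb{R}}1_{(0,\infty)}(x)|x|^{\alpha}\sum_{j=1}^{\infty}e^{-jx}\,dm_{id}(x)\\&=\sum_{j=1}^{\infty}\int_{\mathbb{R}}1_{(0,\infty)}(x)|x|^{\alpha}e^{-jx}\,dm_{id}(x)\\&=\sum_{j=1}^{\infty}j^{-(1+\alpha)}\Gamma(1+\alpha).\end{aligned}$$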
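For $\alpha=1$ and $\alpha=3$ the series in 4.11 (ii) takes the classical values $\Gamma(2)\sum_jj^{-2}=\pi^2/6$ and $\Gamma(4)\sum_jj^{-4}=\pi^4/15$, so the identity can be checked against a quadrature of the left-hand side. This sketch truncates the integral at an arbitrary cutoff and uses the midpoint rule; `math.expm1` computes $e^x-1$ accurately for small $x$. The function names are made up for the example.

```python
import math

def lhs_integral(alpha, upper=50.0, n=200_000):
    """Midpoint-rule estimate of int_0^upper x^alpha / (e^x - 1) dx."""
    h = upper / n
    total = 0.0
    for k in range(n):
        x = (k + 0.5) * h
        total += x ** alpha / math.expm1(x)
    return total * h

def rhs_series(alpha, terms=100_000):
    """Gamma(1 + alpha) * sum_{j=1}^{terms} j^{-(1 + alpha)}, a truncation of the series."""
    return math.gamma(1 + alpha) * sum(j ** -(1 + alpha) for j in range(1, terms + 1))

print(lhs_integral(1), rhs_series(1), math.pi ** 2 / 6)
print(lhs_integral(3), rhs_series(3), math.pi ** 4 / 15)
```

The truncated series for $\alpha=1$ converges slowly (its tail is about $1/\text{terms}$), which is why it agrees with $\pi^2/6$ only to roughly five digits, while the quadrature is much closer.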


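4.12 (i) For each $n\in\mathbb{N}$ let
$$f_n(x)=\begin{cases}0,&x\in(-\infty,n^{-1/p}),\\x^{-p},&x\in[n^{-1/p},1].\end{cases}$$
Then $\{f_n\}_{n=1}^{\infty}$ is increasing and $\lim_{n\to\infty}f_n(x)=f(x)=1_{(0,1]}(x)|x|^{-p}$ for each $x\in\mathbb{R}$. By the monotone convergence theorem, $\int_{\mathbb{R}}1_{(0,1]}(x)|x|^{-p}\,dm_{id}(x)$ equals
$$\lim_{n\to\infty}\int_{\mathbb{R}}1_{(0,1]}(x)f_n(x)\,dm_{id}(x)=\begin{cases}(1-p)^{-1},&p\in(0,1),\\\infty,&p\in[1,\infty).\end{cases}$$
(ii) For each $n-1\in\mathbb{N}$ we have the following two estimates. Case 1: $x\in(0,1)$ implies
$$x^{-n^{-1}}\left(1+\frac{x}{n}\right)^{-n}\le x^{-n^{-1}}\le x^{-2^{-1}}.$$
Case 2: $x\in[1,\infty)$ implies
$$x^{-n^{-1}}\left(1+\frac{x}{n}\right)^{-n}\le\left(1+x+\frac{n-1}{2n}x^2\right)^{-1}\le4x^{-2}.$$
Now, if
$$h(x)=\begin{cases}0,&x\in(-\infty,0],\\x^{-2^{-1}},&x\in(0,1),\\4x^{-2},&x\in[1,\infty),\end{cases}$$
then
$$\int_{\mathbb{R}}1_{(0,\infty)}(x)h(x)\,dm_{id}(x)=\int_0^1x^{-2^{-1}}\,dx+4\int_1^{\infty}x^{-2}\,dx=6.$$
Using the dominated convergence theorem we obtain
$$\begin{aligned}\lim_{n\to\infty}\int_{\mathbb{R}}1_{(0,\infty)}(x)|x|^{-n^{-1}}\left(1+\frac{|x|}{n}\right)^{-n}dm_{id}(x)&=\int_{\mathbb{R}}1_{(0,\infty)}(x)\lim_{n\to\infty}|x|^{-n^{-1}}\left(1+\frac{|x|}{n}\right)^{-n}dm_{id}(x)\\&=\int_{\mathbb{R}}1_{(0,\infty)}(x)e^{-x}\,dm_{id}(x)=1.\end{aligned}$$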

4.13 (i) If $x>0$ then
$$f(x)=\sum_{n=1}^{\infty}e^{-nx}-2\sum_{n=1}^{\infty}e^{-2nx}=\frac{e^{-x}-e^{-2x}}{1-e^{-2x}}.$$
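The closed form in 4.13 (i) can be compared with partial sums of the two series, and a quadrature illustrates (iii): the closed form simplifies to $(e^x+1)^{-1}$, whose integral over $(0,\infty)$ is $\ln2\neq0$. The function names and cutoffs in this sketch are arbitrary.

```python
import math

def f_partial(x, terms=400):
    """Partial sum of sum_n e^{-nx} - 2 sum_n e^{-2nx} for x > 0."""
    return sum(math.exp(-n * x) - 2 * math.exp(-2 * n * x) for n in range(1, terms + 1))

def f_closed(x):
    """Closed form (e^{-x} - e^{-2x}) / (1 - e^{-2x}) from 4.13 (i)."""
    return (math.exp(-x) - math.exp(-2 * x)) / (1 - math.exp(-2 * x))

def integral_f(upper=40.0, n=100_000):
    """Midpoint-rule estimate of int_0^upper f(x) dx."""
    h = upper / n
    return h * sum(f_closed((k + 0.5) * h) for k in range(n))

for x in (0.5, 1.0, 2.0):
    print(x, f_partial(x), f_closed(x))
print(integral_f(), math.log(2))
```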


(ii) Since $\int_0^{\infty}|f_n(x)|\,dx$ and $\int_0^{\infty}|f(x)|\,dx$ are convergent in the sense of improper Riemann integrals, $f_n$ and $f$ are Lebesgue integrable on $(0,\infty)$. (iii) $\int_{\mathbb{R}}1_{(0,\infty)}(x)f(x)\,dm_{id}(x)\neq0=\sum_{n=1}^{\infty}\int_{\mathbb{R}}1_{(0,\infty)}(x)f_n(x)\,dm_{id}(x)$.
4.14 (i) $N_1=1/4$; $N_2=0$; $N_3=3/4$; $N_4=1$. (ii) We have $N_2\le N_1\le N_3\le N_4$, which is a version of Fatou's lemma.

4.15 Assume that $f(x)\not\to0$ as $x\to\infty$. Then there is an $\varepsilon_0>0$ such that for each $n\in\mathbb{N}$ there is an $x_n\in[n,\infty)$ obeying $|f(x_n)|\ge\varepsilon_0$. Since $f$ is uniformly continuous on $(0,\infty)$, there exists a $\delta>0$ such that $|f(t_1)-f(t_2)|<2^{-1}\varepsilon_0$ whenever $t_1,t_2\in(0,\infty)$ and $|t_1-t_2|<\delta$. Accordingly, $|f(x)-f(x_n)|<2^{-1}\varepsilon_0$ whenever $x\in(x_n-\delta,x_n+\delta)$. Consequently, $|f(x)|\ge|f(x_n)|-2^{-1}\varepsilon_0\ge2^{-1}\varepsilon_0$ there. Note that $x_n\ge n$. So there is a subsequence $(x_{n_k})_{k=1}^{\infty}$ of $\{x_n\}_{n=1}^{\infty}$ such that $x_{n_k}\le x_{n_{k+1}}$ and $|x_{n_{k+1}}-x_{n_k}|>2\delta$ for all $k\in\mathbb{N}$. If $I_k=(x_{n_k}-\delta,x_{n_k}+\delta)$, then $f\in LRS_g(0,\infty)$ implies
$$\infty>\int_{(0,\infty)}|f|\,dm_g\ge\int_{\cup_{k=1}^{\infty}I_k}|f|\,dm_g=\sum_{k=1}^{\infty}\int_{I_k}|f|\,dm_g\ge2^{-1}\varepsilon_0\sum_{k=1}^{\infty}m_g(I_k)=\infty,$$
a contradiction.
4.16 (i) For any sequence $y_k\to y_0\in[c,d]$, apply Theorem 4.3.1 to $f_k(\cdot)=f(\cdot,y_k)$. (ii) Let $y_k\to y_0\in[c,d]$, so for $x\in[a,b]$,
$$h_k(x)=\frac{f(x,y_k)-f(x,y_0)}{y_k-y_0}\to\frac{\partial f(x,y)}{\partial y}\Big|_{y=y_0}.$$
By the mean value theorem for derivatives, we obtain
$$|h_k(x)|\le\sup_{y\in[c,d]}\left|\frac{\partial f(x,y)}{\partial y}\right|\le h(x),$$
thereby deriving the required formula via Theorem 4.3.1.
4.17 It is enough to consider the case $\sum_{j=1}^{\infty}m_g(E_j)<\infty$. Apply the monotone


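convergence theorem with $f_n=\sum_{j=1}^{n}1_{E_j}$ to obtain
$$m_g(\cup_{j=1}^{\infty}E_j)=\int_{\mathbb{R}}1_{\cup_{j=1}^{\infty}E_j}\,dm_g\le\int_{\mathbb{R}}\sum_{j=1}^{\infty}1_{E_j}\,dm_g=\lim_{n\to\infty}\int_{\mathbb{R}}\sum_{j=1}^{n}1_{E_j}\,dm_g=\sum_{j=1}^{\infty}m_g(E_j).$$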

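4.18 If $\int_E|f|\,dm_g<\infty$, then
$$\begin{aligned}\infty>\int_E|f|\,dm_g&=\sum_{j=-\infty}^{0}\int_{E_j}|f|\,dm_g+\sum_{j=1}^{\infty}\int_{E_j}|f|\,dm_g\\&\ge\sum_{j=-\infty}^{0}|j|m_g(E_j)+\sum_{j=1}^{\infty}(j-1)m_g(E_j)\\&=\sum_{j=-\infty}^{\infty}|j|m_g(E_j)-\sum_{j=1}^{\infty}m_g(E_j)\\&=\sum_{j=-\infty}^{\infty}|j|m_g(E_j)-m_g(\cup_{j=1}^{\infty}E_j),\end{aligned}$$
and hence
$$\sum_{j=-\infty}^{\infty}|j|m_g(E_j)\le\int_E|f|\,dm_g+m_g(E)<\infty.$$
Conversely, if $\sum_{j=-\infty}^{\infty}|j|m_g(E_j)<\infty$, then
$$\begin{aligned}\int_E|f|\,dm_g&=\sum_{j=-\infty}^{0}\int_{E_j}|f|\,dm_g+\sum_{j=1}^{\infty}\int_{E_j}|f|\,dm_g\\&\le\sum_{j=-\infty}^{0}|j-1|m_g(E_j)+\sum_{j=1}^{\infty}jm_g(E_j)\\&\le\sum_{j=-\infty}^{\infty}|j|m_g(E_j)+\sum_{j=-\infty}^{0}m_g(E_j)\\&\le\sum_{j=-\infty}^{\infty}|j|m_g(E_j)+m_g(E)<\infty.\end{aligned}$$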

4.19 If $f$ is $m_g$-measurable, then for any $c\in\mathbb{R}$,
$$\{x\in\mathbb{R}:f(x)\ge c\}=\cap_{j=1}^{\infty}\{x\in\mathbb{R}:f(x)>c-j^{-1}\}$$
is $m_g$-measurable. Conversely, suppose $\{x\in\mathbb{R}:f(x)\ge c\}$ is $m_g$-measurable for any $c\in\mathbb{R}$. Then $\{x\in\mathbb{R}:f(x)>c\}=\cup_{j=1}^{\infty}\{x\in\mathbb{R}:f(x)\ge c+j^{-1}\}$ is $m_g$-measurable, and so is $f$.
4.20 (iii) For $n\in\mathbb{N}$ and $\varepsilon>0$ let $E_{n,\varepsilon}=\{x\in\mathbb{R}:|f_n(x)-f(x)|\ge\varepsilon\}$. Then
$$\varepsilon m_g(E_{n,\varepsilon})\le\int_{E_{n,\varepsilon}}|f_n-f|\,dm_g\le\int_{\mathbb{R}}|f_n-f|\,dm_g$$

gives the result. (iv) For a Cauchy sequence { f n }∞ n=1 in the mg -measure, we choose a sub−k such that if E sequence { f nk }∞ k = {x ∈ R : | f k+1(x) − f k (x)| ≥ 2 } then k=1 −k mg (Ek ) ≤ 2 and hence mg (∪∞j=k E j ) ≤





j=k

j=k

∑ mg (E j) ≤ ∑ 2− j = 21−k

and for x ∈ R \ ∪∞j=k E j and p > q > k, q−1

| f n p (x) − f nq (x)| ≤

∑ | fn

l=p

q−1 l+1

(x) − f nl (x)| ≤

∑ 2−l ≤ 21−p.

l=p

 ∞ The last estimate indicates that f nk (x) k=1 is a Cauchy sequence for x ∈ R \ ∪∞j=k E j . Now if  limk→∞ f nk (x) , x ∈ R \ ∩∞ k=1 ∪ j=k E j , f (x) = 0 , x ∈ ∩∞ ∪ k=1 j=k E j then f nk → f mg -a.e. since mg (∩∞ k=1 ∪ j=k E j ) = 0; moreover | f n p (x) − f (x)| ≤ 22−p for p ≥ k and x ∈ / R \ ∩∞ k=1 ∪ j=k E j . Consequently, f nk → f in the mg measure and then f n → f in the mg -measure since {x ∈ R : | f n (x) − f (x)| ≥ ε}

⊆ {x ∈ R : | f n(x) − f nk (x)| ≥ 2−1 ε} ∪ {x ∈ R : | f nk (x) − f (x)| ≥ 2−1 ε}.


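The uniqueness in the sense of $m_g$-a.e. follows from the fact that if $f_n\to h$ in $m_g$-measure, then
$$\{x\in\mathbb R:|h(x)-f(x)|\ge\varepsilon\}\subseteq\{x\in\mathbb R:|f_n(x)-h(x)|\ge 2^{-1}\varepsilon\}\cup\{x\in\mathbb R:|f_n(x)-f(x)|\ge 2^{-1}\varepsilon\},$$
and hence $m_g(\{x\in\mathbb R:|h(x)-f(x)|\ge\varepsilon\})=0$ for any $\varepsilon>0$. (v) By Fatou's lemma we have
$$0\le\int_{\mathbb R}f\,dm_g\le\lim_{n\to\infty}\inf_{k\ge n}\int_{\mathbb R}f_k\,dm_g=0,$$
and thus $f=0$ $m_g$-a.e. on $\mathbb R$ thanks to $f\ge 0$ $m_g$-a.e. on $\mathbb R$. (vi) If $f_n=1_{(n,n+1]}$ and $m_g$ is the Lebesgue measure on $\mathbb R$, then $f_n\to 0$ $m_{\mathrm{id}}$-a.e. on $\mathbb R$, but $f_n$ does not tend to $0$ in Lebesgue measure, since $m_{\mathrm{id}}(\{x\in\mathbb R:|f_n(x)|\ge 1\})=1$ for every $n$. 4.21 (i)
$$\int_{\mathbb R}f_1\,dm_{\mathrm{id}}=\int_{-1}^{1}|x|^{-p}\,dx=2\int_0^1x^{-p}\,dx=2(1-p)^{-1}$$
and (via integration by parts)
$$\int_{\mathbb R}f_2\,dm_{\mathrm{id}}=-\int_{-1}^{1}\ln|x|\,dx=-2\int_{0+}^{1}\ln x\,dx=2.$$
(ii)
$$\int_{\mathbb R}f_1\,dm_{\mathrm{id}}=\int_0^{\infty}m_{\mathrm{id}}\big(\{x\in\mathbb R:|x|^{-p}1_{\{|x|<1\}}(x)>t\}\big)\,dt=2\int_0^{\infty}\min\{1,t^{-1/p}\}\,dt=2\Big(1+\int_1^{\infty}t^{-1/p}\,dt\Big)=\frac{2}{1-p}$$
and
$$\int_{\mathbb R}f_2\,dm_{\mathrm{id}}=\int_0^{\infty}m_{\mathrm{id}}\big(\{x\in\mathbb R:(-\ln|x|)1_{\{|x|<1\}}(x)>t\}\big)\,dt=2\int_0^{\infty}e^{-t}\,dt=2.$$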
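The two values in 4.21 can be spot-checked numerically. The sketch below assumes, as part (i) suggests, that $f_1(x)=|x|^{-p}1_{\{|x|<1\}}(x)$ and $f_2(x)=-\ln|x|\,1_{\{|x|<1\}}(x)$, and takes the hypothetical exponent $p=1/2$. It estimates the distribution functions $m_{\mathrm{id}}(\{f>t\})$ by counting grid midpoints, compares them with the closed forms from part (ii), and integrates those closed forms in $t$ with a midpoint rule. The grid sizes and the cutoff of the $t$-integral are arbitrary choices; for $f_1$ the dropped tail beyond $t=1000$ contributes about $0.002$.

```python
import math

P = 0.5  # hypothetical exponent, 0 < p < 1


def f1(x):
    return abs(x) ** (-P) if abs(x) < 1 else 0.0


def f2(x):
    return -math.log(abs(x)) if abs(x) < 1 else 0.0


def grid_measure(f, t, n=100_000):
    """Estimate m({x in (-1, 1) : f(x) > t}) by counting midpoints of a uniform grid."""
    h = 2.0 / n
    return h * sum(1 for k in range(n) if f(-1.0 + (k + 0.5) * h) > t)


def dist_f1(t):
    """Closed form from 4.21(ii): m({f1 > t}) = 2 min{1, t^(-1/p)}."""
    return 2.0 * min(1.0, t ** (-1.0 / P))


def dist_f2(t):
    """Closed form from 4.21(ii): m({f2 > t}) = 2 e^(-t)."""
    return 2.0 * math.exp(-t)


def layer_cake(dist, upper, steps=100_000):
    """Midpoint rule for int_0^upper dist(t) dt; the tail beyond `upper` is dropped."""
    h = upper / steps
    return h * sum(dist((k + 0.5) * h) for k in range(steps))


print(grid_measure(f1, 2.0), dist_f1(2.0))
print(grid_measure(f2, 1.0), dist_f2(1.0))
print(layer_cake(dist_f1, 1000.0), 2.0 / (1.0 - P))
print(layer_cake(dist_f2, 40.0), 2.0)
```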


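5 Absolute Continuities in Lebesgue Integrals

5.1 (i) and (ii) follow from considering $[a,b]\setminus E$ in Theorem 5.1.1. The equation of (iii) follows from (ii) and Theorem 5.1.1(iii). For the converse part, we use the fact that there are decreasing open sets $\{O_j\}_{j=1}^{\infty}$ and increasing closed sets $\{C_j\}_{j=1}^{\infty}$ such that $\bigcap_{j=1}^{\infty}O_j\supseteq E\supseteq\bigcup_{j=1}^{\infty}C_j$ and (because the inner and outer measures of $E$ agree) $m_{\mathrm{id}}(O_j)-m_{\mathrm{id}}(C_j)\to 0$, to obtain
$$m_{\mathrm{id}}\Big(\bigcap_{j=1}^{\infty}(O_j\setminus E)\Big)\le m_{\mathrm{id}}\Big(\bigcap_{j=1}^{\infty}\Big(O_j\setminus\bigcup_{k=1}^{\infty}C_k\Big)\Big)=0.$$
Therefore $E=\big(\bigcap_{j=1}^{\infty}O_j\big)\setminus\big(\bigcap_{j=1}^{\infty}(O_j\setminus E)\big)$ is $m_{\mathrm{id}}$-measurable. 5.2 If $E,F\subseteq\mathbb R$ are $m_{\mathrm{id}}$-measurable, then $E\cup F=(E\setminus F)\cup(E\cap F)\cup(F\setminus E)$, where $E\setminus F$, $F\setminus E$, $E\cap F$ are pairwise disjoint, and hence $m_{\mathrm{id}}(E\cup F)=m_{\mathrm{id}}(E\cap F)+m_{\mathrm{id}}(F\setminus E)+m_{\mathrm{id}}(E\setminus F)$. In general, let $E,F\subseteq\mathbb R$ and let $\varepsilon>0$ be sufficiently small. Then according to Theorem 5.1.3(ii) there are two open sets $O_e\supseteq E$ and $O_f\supseteq F$ such that $0\le m_{\mathrm{id}}(O_e)-m_{\mathrm{id},*}(E)<\varepsilon$ and $0\le m_{\mathrm{id}}(O_f)-m_{\mathrm{id},*}(F)<\varepsilon$. Since $O_e\cap O_f\supseteq E\cap F$ and $O_e\cup O_f\supseteq E\cup F$, it follows that
$$m_{\mathrm{id},*}(E\cup F)+m_{\mathrm{id},*}(E\cap F)\le m_{\mathrm{id}}(O_e\cup O_f)+m_{\mathrm{id}}(O_e\cap O_f)=m_{\mathrm{id}}(O_e)+m_{\mathrm{id}}(O_f)\le m_{\mathrm{id},*}(E)+m_{\mathrm{id},*}(F)+2\varepsilon.$$
5.3
$$m_{*,\mathrm{id}}(E)=\sup\big\{b-a-m_{\mathrm{id}}([a,b]\setminus C):\ C\subseteq E\ \text{closed}\big\}$$
$$=b-a-\inf\big\{m_{\mathrm{id}}([a,b]\setminus C):\ C\subseteq E\ \text{closed}\big\}$$
$$=b-a-\inf\big\{m_{\mathrm{id}}(O):\ O\supseteq[a,b]\setminus E\ \text{open}\big\}$$
$$=b-a-m_{\mathrm{id},*}([a,b]\setminus E).$$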


5.4 This follows from Definition 5.1.2. 5.5 Let $\{a=x_0<x_1<\dots<x_n=x\}$ be a partition of the interval $[a,x]$. If $x<y$, then $\{a=x_0<x_1<\dots<x_n=x<x_{n+1}=y\}$ is a partition of the interval $[a,y]$, and hence
$$\sum_{j=1}^{n}\big(f(x_j)-f(x_{j-1})\big)^{\pm}\le\sum_{j=1}^{n+1}\big(f(x_j)-f(x_{j-1})\big)^{\pm}\quad\text{and}\quad\langle V^{\pm}\rangle_a^x f\le\langle V^{\pm}\rangle_a^y f.$$
Note that $u=u^+-u^-$ holds for all $u\in\mathbb R$. So
$$\sum_{j=1}^{n}\big(f(x_j)-f(x_{j-1})\big)^{+}=\sum_{j=1}^{n}\big(f(x_j)-f(x_{j-1})\big)^{-}+f(x)-f(a);$$
$$\sum_{j=1}^{n}\big(f(x_j)-f(x_{j-1})\big)^{-}=\sum_{j=1}^{n}\big(f(x_j)-f(x_{j-1})\big)^{+}+f(a)-f(x).$$
Accordingly,
$$\langle V^{+}\rangle_a^x f\le\langle V^{-}\rangle_a^x f+f(x)-f(a);\qquad\langle V^{-}\rangle_a^x f\le\langle V^{+}\rangle_a^x f+f(a)-f(x),$$
so (ii-3) holds. To verify (ii-2), just use $|u|=2u^+-u$ and (ii-3) to derive
$$\sum_{j=1}^{n}\big|f(x_j)-f(x_{j-1})\big|=2\sum_{j=1}^{n}\big(f(x_j)-f(x_{j-1})\big)^{+}+f(a)-f(x)$$
and
$$V_a^x f=2\langle V^{+}\rangle_a^x f+f(a)-f(x)=\langle V^{+}\rangle_a^x f+\langle V^{-}\rangle_a^x f,$$
whose case $x=b$ implies (ii-2). For (iii), note that
$$\int_a^b\big(\langle V^{+}\rangle_a^x f\big)'\,dm_{\mathrm{id}}\le\langle V^{+}\rangle_a^b f;\qquad\int_a^b\big(\langle V^{-}\rangle_a^x f\big)'\,dm_{\mathrm{id}}\le\langle V^{-}\rangle_a^b f;$$
$$f'(x)=\big(\langle V^{+}\rangle_a^x f\big)'-\big(\langle V^{-}\rangle_a^x f\big)'\quad\text{for }m_{\mathrm{id}}\text{-a.e. }x;$$
and (ii-2) holds. So we get
$$f'\in\mathrm{LRS}_{\mathrm{id}}[a,b]\quad\text{with}\quad\int_a^b|f'|\,dm_{\mathrm{id}}\le V_a^b f.$$
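The two partition identities used for (ii-2) and (ii-3) hold for every single partition, so they are easy to spot-check. The sketch below uses the hypothetical test function $f(t)=\sin 3t$ on $[0,2]$ and a uniform partition (both arbitrary choices). It confirms that the sums of the positive and negative parts of the increments differ by $f(x)-f(a)$ and add up to the variation sum. For this $f$ the variation sum approaches $V_0^2f=\int_0^6|\cos u|\,du=4+\sin 6$.

```python
import math


def variation_sums(f, a, x, n):
    """Sums of the positive parts, negative parts and absolute values of the
    increments of f over the uniform partition of [a, x] into n pieces."""
    pts = [a + (x - a) * j / n for j in range(n + 1)]
    incs = [f(pts[j]) - f(pts[j - 1]) for j in range(1, n + 1)]
    pos = sum(max(d, 0.0) for d in incs)   # sum of u^+
    neg = sum(max(-d, 0.0) for d in incs)  # sum of u^-
    tot = sum(abs(d) for d in incs)        # sum of |u|
    return pos, neg, tot


def f(t):
    return math.sin(3 * t)  # hypothetical test function


pos, neg, tot = variation_sums(f, 0.0, 2.0, 10_000)
print(pos - neg, f(2.0) - f(0.0))  # u = u^+ - u^-, and the increments telescope
print(tot, pos + neg)              # |u| = u^+ + u^-
print(tot, 4.0 + math.sin(6.0))    # total variation of sin(3t) on [0, 2]
```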




Moreover, if $f\in AC[a,b]$ and $\{a=x_0<x_1<\dots<x_n=b\}$ is any partition of $[a,b]$, then Theorem 5.3.4 ensures $f'\in\mathrm{LRS}_{\mathrm{id}}[x_{j-1},x_j]$ and
$$\sum_{j=1}^{n}|f(x_j)-f(x_{j-1})|=\sum_{j=1}^{n}\Big|\int_{x_{j-1}}^{x_j}f'\,dm_{\mathrm{id}}\Big|\le\sum_{j=1}^{n}\int_{x_{j-1}}^{x_j}|f'|\,dm_{\mathrm{id}}=\int_a^b|f'|\,dm_{\mathrm{id}},$$
and hence $V_a^b f\le\int_a^b|f'|\,dm_{\mathrm{id}}$, which implies the desired equality. (iv) Since the everywhere differentiable function $f$ enjoys $f'\in\mathrm{LRS}_{\mathrm{id}}[a,b]$, for $\varepsilon>0$ there is $\delta>0$ such that $m_{\mathrm{id},*}(E)<\delta\Rightarrow\int_E|f'|\,dm_{\mathrm{id}}<\varepsilon$. Now, if $\{[a_j,b_j]\}_{j=1}^{n}$ is a family of pairwise disjoint subintervals of $[a,b]$ with $\sum_{j=1}^{n}(b_j-a_j)<\delta$, then
$$\sum_{j=1}^{n}|f(b_j)-f(a_j)|\le\sum_{j=1}^{n}m_{\mathrm{id}}\big(f([a_j,b_j])\big)\le\sum_{j=1}^{n}\int_{a_j}^{b_j}|f'|\,dm_{\mathrm{id}}=\int_{\bigcup_{j=1}^{n}[a_j,b_j]}|f'|\,dm_{\mathrm{id}}<\varepsilon,$$
and hence $f\in AC[a,b]$. 5.6 If $E\subseteq[a,b]$ is $m_{\mathrm{id}}$-measurable, then there is a sequence $\{C_j\}$ of closed sets contained in $E$ with $m_{\mathrm{id}}(E)=\lim_{j\to\infty}m_{\mathrm{id}}(C_j)$, and hence there is an $m_{\mathrm{id}}$-null set $Z$ such that $E=Z\cup\big(\bigcup_{j=1}^{\infty}C_j\big)$. This, plus $f\in AC[a,b]$, implies $m_{\mathrm{id}}(f(Z))=0$ and $f(E)=f(Z)\cup f\big(\bigcup_{j=1}^{\infty}C_j\big)=f(Z)\cup\big(\bigcup_{j=1}^{\infty}f(C_j)\big)$. Since $f\in AC[a,b]$ is continuous and $C_j$ is compact, $f(C_j)$ is compact and consequently $m_{\mathrm{id}}$-measurable. Accordingly, $f(E)$ is $m_{\mathrm{id}}$-measurable.

5.7 Define $g(x)=\int_a^x f'\,dm_{\mathrm{id}}$ and $h=f-g$. Since $f$ is increasing on $[a,b]$, $g\in AC[a,b]$ and $h'=f'-g'=0$ $m_{\mathrm{id}}$-a.e. on $[a,b]$. If $\tilde g,\tilde h$ obey $f=g+h=\tilde g+\tilde h$,


then $(g-\tilde g)'=(\tilde h-h)'=0$, and hence $\tilde g=g+\kappa$ and $\tilde h=h+\tau$ for a constant pair $\{\kappa,\tau\}$ (necessarily $\tau=-\kappa$). 5.8 Under $f\in\mathrm{LRS}_{\mathrm{id}}[a,b]$, the if-part is trivial, but for the only-if part we can differentiate $\int_a^x f\,dm_{\mathrm{id}}=0$ to get $f(x)=0$ for $m_{\mathrm{id}}$-a.e. $x\in[a,b]$. 5.9 It follows from a modification of the argument for Theorem 5.5.1(ii).

5.10 (i) follows from a direct computation. (ii) follows from the following calculation with $0<x<2^{-1}$:
$$Mf_2(x)\ge(2x)^{-1}\int_0^{2x}|f_2(y)|\,dy\ge(2x)^{-1}\int_0^{x}(y\ln^2y)^{-1}\,dy=(2x|\ln x|)^{-1}.$$
To reach (iii), we apply the Hardy-Littlewood maximal theorem and the Fubini theorem to get
$$\int_{\mathbb R}(Mf)^p\,dm_{\mathrm{id}}=p\int_0^{\infty}m_{\mathrm{id}}\big(\{x\in\mathbb R:Mf(x)>t\}\big)t^{p-1}\,dt\le(4p)\int_0^{\infty}\Big(\int_{\{x\in\mathbb R:|f(x)|>2^{-1}t\}}|f|\,dm_{\mathrm{id}}\Big)t^{p-2}\,dt$$
$$=(4p)\int_{\mathbb R}\Big(\int_0^{2|f|}t^{p-2}\,dt\Big)|f|\,dm_{\mathrm{id}}=p(p-1)^{-1}2^{p+1}\int_{\mathbb R}|f|^p\,dm_{\mathrm{id}}.$$
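For completeness, the integral in the lower bound of 5.10(ii) can be evaluated with the antiderivative $-1/\ln y$ of $(y\ln^2y)^{-1}$. The second line is a standard computation added here for illustration; it shows that the resulting lower bound $(2x|\ln x|)^{-1}$ is not integrable on $(0,2^{-1})$:

$$\int_0^{x}\frac{dy}{y\ln^2y}=\Big[-\frac{1}{\ln y}\Big]_{y\to0^+}^{y=x}=\frac{1}{|\ln x|}\qquad(0<x<2^{-1}),$$
$$\int_{\eta}^{2^{-1}}\frac{dx}{2x|\ln x|}=\frac12\big(\ln|\ln\eta|-\ln\ln 2\big)\to\infty\qquad(\eta\to0^+).$$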

6 Metric Spaces

6.1 (i) Properties (i), (ii), and (iii) of Definition 6.1.1 hold. (ii) Properties (i), (ii), and (iii) of Definition 6.1.1 hold. (iii) Properties (i) and (iii) of Definition 6.1.1 hold, but (ii) does not hold, since taking $x=(1,2)$ and $y=(0,4)$ yields $d(x,y)=3\neq\sqrt5=d(y,x)$. Verifying (iii) takes some effort; the details are as follows. Given $x=(x_1,x_2)$, $y=(y_1,y_2)$, $z=(z_1,z_2)$, consider the following cases.

Case 1. $x_2\ge z_2$: This implies $d(x,z)=\sqrt{(x_1-z_1)^2+(x_2-z_2)^2}$.

Subcase 1.1. $y_2\ge z_2$: This gives $d(y,z)=\sqrt{(y_1-z_1)^2+(y_2-z_2)^2}$, and $d(x,y)$ equals $\sqrt{(x_1-y_1)^2+(x_2-y_2)^2}$ or $|x_1-y_1|+|x_2-y_2|$. Then by the triangle inequality for the Euclidean space $(\mathbb R^2,d_2)$ and
$$|x_1-y_1|+|x_2-y_2|\ge\sqrt{(x_1-y_1)^2+(x_2-y_2)^2},$$


one has $d(x,y)+d(y,z)\ge d(x,z)$.

Subcase 1.2. $y_2<z_2$: This yields $d(y,z)=|y_1-z_1|+|y_2-z_2|$, and $d(x,y)$ equals $\sqrt{(x_1-y_1)^2+(x_2-y_2)^2}$ or $|x_1-y_1|+|x_2-y_2|$. Then by the triangle inequalities for both the Euclidean distance $d_2$ and the distance defined in Problem 6.1(i), as well as
$$|y_1-z_1|+|y_2-z_2|\ge\sqrt{(y_1-z_1)^2+(y_2-z_2)^2};\qquad|x_1-z_1|+|x_2-z_2|\ge\sqrt{(x_1-z_1)^2+(x_2-z_2)^2},$$
one has $d(x,y)+d(y,z)\ge d(x,z)$.

Case 2. $x_2<z_2$: This gives $d(x,z)=|x_1-z_1|+|x_2-z_2|$.

Subcase 2.1. $y_2\ge z_2$: This means $d(y,z)=\sqrt{(y_1-z_1)^2+(y_2-z_2)^2}$. Note that $x_2<z_2$ and $z_2\le y_2$ imply $x_2<y_2$ and $|x_2-y_2|=y_2-x_2\ge z_2-x_2=|x_2-z_2|$. So $d(x,y)=|x_1-y_1|+|x_2-y_2|$. Hence, by the triangle inequality for the distance on $\mathbb R$,
$$d(x,y)+d(y,z)\ge|x_1-y_1|+|x_2-y_2|+|y_1-z_1|\ge|x_1-z_1|+|x_2-z_2|=d(x,z).$$

Subcase 2.2. $y_2<z_2$: This tells us that $d(y,z)=|y_1-z_1|+|y_2-z_2|$, and $d(x,y)$ equals $\sqrt{(x_1-y_1)^2+(x_2-y_2)^2}$ or $|x_1-y_1|+|x_2-y_2|$. For the second case, one applies the triangle inequality for the distance on $\mathbb R$ to get $d(x,y)+d(y,z)\ge|x_1-z_1|+|x_2-z_2|=d(x,z)$. Concerning the first case, $x_2\ge y_2$; note that $d(x,y)+d(y,z)\ge d(x,z)$ is equivalent to
$$\sqrt{(x_1-y_1)^2+(x_2-y_2)^2}+|y_1-z_1|+z_2-y_2\ge|x_1-z_1|+z_2-x_2,$$
which is
$$|y_1-z_1|+x_2-y_2\ge|x_1-z_1|-\sqrt{(x_1-y_1)^2+(x_2-y_2)^2}.$$
Since $\sqrt{(x_1-y_1)^2+(x_2-y_2)^2}\ge|x_1-y_1|$, one has
$$|x_1-z_1|-\sqrt{(x_1-y_1)^2+(x_2-y_2)^2}\le|x_1-z_1|-|x_1-y_1|\le|y_1-z_1|.$$


Note that $x_2\ge y_2$, so the desired estimate follows. 6.2 Let $S$ be the set of all positive integers $m\ge 2$ such that
$$d(x^{(1)},x^{(m)})\le\sum_{j=2}^{m}d(x^{(j-1)},x^{(j)})$$
for any $m$ points $x^{(1)},\dots,x^{(m)}$ in $X$. For $m=2$ the result is trivial, and for $m=3$ the assertion is the triangle inequality for $(X,d)$. Assume $m\in S$ and let $\{x^{(j)}\}_{j=1}^{m+1}$ be any $m+1$ points in $X$. By the triangle inequality for $(X,d)$ and the above assumption, one has
$$d(x^{(1)},x^{(m+1)})\le d(x^{(1)},x^{(m)})+d(x^{(m)},x^{(m+1)})\le\sum_{j=2}^{m+1}d(x^{(j-1)},x^{(j)}).$$
Therefore $m+1\in S$, and mathematical induction ensures that $S=\mathbb N\setminus\{1\}$.
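The conclusions of 6.1(iii) can also be spot-checked by machine. The sketch below encodes the distance as read off from the case analysis in 6.1(iii) (Euclidean when $x_2\ge y_2$, taxicab when $x_2<y_2$; the function name `d` is ours). It evaluates the counterexample to symmetry and tests the triangle inequality on random triples. Random testing only illustrates the case-by-case proof and does not replace it.

```python
import math
import random


def d(x, y):
    """Distance of 6.1(iii) as read off from the case analysis:
    Euclidean if x[1] >= y[1], taxicab (l^1) if x[1] < y[1]."""
    if x[1] >= y[1]:
        return math.hypot(x[0] - y[0], x[1] - y[1])
    return abs(x[0] - y[0]) + abs(x[1] - y[1])


# Failure of symmetry: d((1,2),(0,4)) is 3, while d((0,4),(1,2)) is sqrt(5).
print(d((1, 2), (0, 4)), d((0, 4), (1, 2)))

# Random spot-check of d(x, z) <= d(x, y) + d(y, z); `worst` records the largest excess.
rng = random.Random(0)
worst = 0.0
for _ in range(100_000):
    x, y, z = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(3)]
    worst = max(worst, d(x, z) - d(x, y) - d(y, z))
print(worst)
```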

6.3 (i) It is enough to check the second result. Let $z^{(0)}=(x_1,x_2,\dots,x_n)=x$, $z^{(1)}=(y_1,x_2,\dots,x_n)$, $z^{(2)}=(y_1,y_2,x_3,\dots,x_n)$, ..., and $z^{(n)}=(y_1,y_2,\dots,y_n)=y$. Then for $j=1,2,\dots,n$ one has $d_n^*(z^{(j-1)},z^{(j)})=|x_j-y_j|=d_n(z^{(j-1)},z^{(j)})$, and adding these equations and using Problem 6.2, one gets
$$d_n^*(z^{(0)},z^{(n)})=\sum_{j=1}^{n}d_n^*(z^{(j-1)},z^{(j)})=\sum_{j=1}^{n}d_n(z^{(j-1)},z^{(j)})\ge d_n(z^{(0)},z^{(n)}).$$
(ii) This is a routine argument: clearly, $d_{*,n}(x,y)\le d_n(x,y)\le\sqrt n\,d_{*,n}(x,y)$.
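The comparison inequalities of 6.3 are easy to spot-check on random points. The sketch reads $d_n$ as the Euclidean distance. As an assumption consistent with the staircase argument in (i) and the factor $\sqrt n$ in (ii), it reads $d_n^*$ as the sum (taxicab) distance and $d_{*,n}$ as the maximum distance. The function names are hypothetical.

```python
import math
import random


def d_euclid(x, y):  # d_n
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def d_sum(x, y):     # assumed meaning of d_n^*: sum of coordinate gaps
    return sum(abs(a - b) for a, b in zip(x, y))


def d_max(x, y):     # assumed meaning of d_{*,n}: largest coordinate gap
    return max(abs(a - b) for a, b in zip(x, y))


def staircase(x, y):
    """Points z^(0), ..., z^(n) of 6.3(i): swap in the coordinates of y one at a time."""
    pts = [list(x)]
    for j in range(len(x)):
        z = list(pts[-1])
        z[j] = y[j]
        pts.append(z)
    return pts


rng = random.Random(1)
ok = True
for _ in range(10_000):
    n = rng.randint(1, 8)
    x = [rng.uniform(-1, 1) for _ in range(n)]
    y = [rng.uniform(-1, 1) for _ in range(n)]
    e, s, m = d_euclid(x, y), d_sum(x, y), d_max(x, y)
    zs = staircase(x, y)
    legs = sum(d_euclid(a, b) for a, b in zip(zs, zs[1:]))  # each leg changes one coordinate
    ok = ok and abs(legs - s) < 1e-12 and e <= s + 1e-12
    ok = ok and m <= e + 1e-12 and e <= math.sqrt(n) * m + 1e-12
print(ok)
```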

6.4 First, let us prove that $A^\circ$ is the largest open subset of $A$. If $O\subseteq A$ is open, then any point of $O$ is an interior point of $A$, and so this point belongs to $A^\circ$. This gives $O\subseteq A^\circ$. Second, let $K\supseteq A$ be closed. If $x\in\overline A$, then it suffices to prove that $x\in K$ when $x$ is a limit point of $A$. In this case, $x$ is a limit point of $K$. But since $K$ is closed, $x\in K$. We are done. 6.5 Let $x$ satisfy $d_2(x,0)\le 1$ and let $\varepsilon>0$. We must show that $B_\varepsilon(x)\cap B_1(0)\neq\emptyset$. Choose a number $c$ such that $\max\{0,1-\varepsilon\}<c<1$, and consider the point $z=cx=(cx_1,cx_2)$; this gives $d_2(z,0)=c\,d_2(x,0)<1$, so $z\in B_1(0)$. Also, $d_2(x,z)=(1-c)\,d_2(x,0)<\varepsilon$, so $z\in B_\varepsilon(x)$. Hence every point $x$ with $d_2(x,0)\le 1$ is in $\overline{B_1(0)}$, the closure of $B_1(0)$. Conversely, if $d_2(y,0)>1$ and $\delta=d_2(y,0)-1$, then by the


triangle inequality, $B_\delta(y)\cap B_1(0)=\emptyset$. Hence only the points $x$ with $d_2(x,0)\le 1$ are in $\overline{B_1(0)}$. 6.6 Let $p\in S$ and define $S_1=\{p\}$ and $S_2=S\setminus\{p\}$. Since finite sets have no limit points, the definition of disconnectedness is satisfied. 6.7 This follows from the facts that $\overline{G_1}\cap G_2=\emptyset$ and $G_1\cap\overline{G_2}=\emptyset$. 6.8 (i) The sequence converges to $(1,1)$; (ii) the sequence converges to $(0,0)$; (iii) the sequence diverges since $k\to\infty$.

6.9 For $k\in\mathbb N$ take $x_k=(2k)^{-1}$. Then $x_k\in(0,1)$ and $x_k\to 0$, but $0\notin(0,1)$.

6.10 (i) If $A=\{x^{(k)}\}_{k=1}^{\infty}$ is bounded, then there is an $M>0$ such that $d_n(x,0)\le M$ for all $x=(x_1,\dots,x_n)\in A$, and hence $\sup_{1\le j\le n}\sup_{k\in\mathbb N}|x_j^{(k)}|\le M$. The converse is trivial. (ii) This follows easily from the inclusion between cubes and balls in $\mathbb R^n$. 6.11 Assume that $\{x^{(k)}\}_{k=1}^{\infty}$ is a Cauchy sequence and apply the definition of a Cauchy sequence with $\varepsilon\in(0,1)$. For this distance function, $d(x,y)<1$ implies $d(x,y)=0$ and hence $x=y$, so there is an $n_0$ such that $k>m>n_0$ implies $x^{(k)}=x^{(m)}$. Hence $k>n_0$ implies $x^{(k)}=x^{(n_0+1)}$.

6.12 Let $B_r(p)$ be an arbitrary open ball in $\mathbb R^n$. Then the open cube $C_{r/\sqrt n}(p)=\{x\in\mathbb R^n:\max_{1\le j\le n}|x_j-p_j|<r/\sqrt n\}$ centered at $p$ is contained in $B_r(p)$. Since the irrational numbers $\mathbb R\setminus\mathbb Q$ are dense in $\mathbb R$, for each $j=1,\dots,n$ the interval $(p_j-r/\sqrt n,\,p_j+r/\sqrt n)$ contains an $x_j\in\mathbb R\setminus\mathbb Q$. Thus $|x_j-p_j|<r/\sqrt n$, which defines a point $x\in C_{r/\sqrt n}(p)\subseteq B_r(p)$ all of whose coordinates are irrational. 6.13 This follows from the triangle inequality for $d$. 6.14 Let $\{x_k\}_{k=1}^{\infty}$ be a Cauchy sequence in $X$. Then there is $n_1\in\mathbb N$ such that $d(x_n,x_{n_1})<2^{-1}$ for $n\ge n_1$. Suppose $x_{n_1},\dots,x_{n_k}$ have been chosen such that $n_{j+1}>n_j$ for $j=1,\dots,k-1$ and $d(x_n,x_{n_j})<2^{-j}$ for $n\ge n_j$. Take $n_{k+1}>n_k$ so that $d(x_n,x_{n_{k+1}})<2^{-(k+1)}$ for $n\ge n_{k+1}$. Consider the closed balls $\overline B_{2^{-(k-1)}}(x_{n_k})$. We claim that this sequence of closed balls is nested. If $x\in\overline B_{2^{-k}}(x_{n_{k+1}})$, then $d(x,x_{n_{k+1}})\le 2^{-k}$ and
$$d(x,x_{n_k})\le d(x,x_{n_{k+1}})+d(x_{n_k},x_{n_{k+1}})<2^{-k}+2^{-k}=2^{-(k-1)}.$$
So $x\in\overline B_{2^{-(k-1)}}(x_{n_k})$. By hypothesis there is $x\in\bigcap_{k=1}^{\infty}\overline B_{2^{-(k-1)}}(x_{n_k})$. Obviously, $x_{n_k}\to x$, and so $x_n\to x$, since a Cauchy sequence with a convergent subsequence converges. That is, $X$ is complete. Conversely, suppose that $X$ is complete. If $\{\overline B_{r_k}(x_k)\}$ with $r_k\to 0$ is nested, then $\{x_k\}_{k=1}^{\infty}$ is a Cauchy sequence in $X$. In fact, $k>m$ implies $\overline B_{r_k}(x_k)\subseteq$


$\overline B_{r_m}(x_m)$, and so $x_k\in\overline B_{r_m}(x_m)$, i.e., $d(x_k,x_m)\le r_m$. For any $\varepsilon>0$ there is an $n_0\in\mathbb N$ such that $m>n_0$ implies $r_m<\varepsilon$. Consequently, $k>m>n_0$ yields $d(x_k,x_m)<\varepsilon$. Accordingly, $\{x_k\}$ converges to a point $x\in X$. But each $\overline B_{r_k}(x_k)$ contains $x_m$ for all $m\ge k$, and since $\overline B_{r_k}(x_k)$ is closed, it follows that $x\in\overline B_{r_k}(x_k)$; so $x$ is in every $\overline B_{r_k}(x_k)$. 6.15 Since the $S_j$'s are nonempty, there is a sequence $\{x_j\}_{j=1}^{\infty}$ such that $x_j\in S_j$ for every $j\in\mathbb N$. Given $\varepsilon>0$, there exists an $N\in\mathbb N$ depending only on $\varepsilon$ such that $\operatorname{diam}(S_j)<\varepsilon$ for $j\ge N$. Since the $S_j$ are nested, $x_m,x_n\in S_N$ whenever $m,n\ge N$, and thus $d(x_m,x_n)<\varepsilon$. This means that $\{x_j\}_{j=1}^{\infty}$ is a Cauchy sequence in the complete metric space $(X,d)$, and so it converges, to $x$, say. Since $x_j\in S_k$ for all $j\ge k$ and $S_k$ is closed, $x\in S_k$ for every $k$, that is, $x\in\bigcap_{j=1}^{\infty}S_j$.

6.16 If $S$ is nowhere dense, then $\overline S$ contains no ball, i.e., $(\overline S)^\circ=\emptyset$. Suppose that $O$ is any open ball in $X$. Then $\overline S$ does not contain $O$, and hence the open set $O\setminus\overline S$ is not $\emptyset$. Consequently, there is an open ball $O'$ with $O'\subseteq O\setminus\overline S\subseteq O$. This implies $O'\cap S=\emptyset$ since $S\subseteq\overline S$. Conversely, assume that every open ball $O$ contains an open ball $O'$ such that $O'\cap S=\emptyset$. If $(\overline S)^\circ\neq\emptyset$, then there would be a point $x$ in the open set $(\overline S)^\circ$ and hence an open ball $B_r(x)\subseteq\overline S$ centered at $x$ with radius $r>0$. As a by-product, every point $p$ in $B_r(x)$ lies in $\overline S$, and then any open ball centered at $p$ and contained in $B_r(x)$ has a nonvoid intersection with $S$. This contradicts the above assumption applied to $O=B_r(x)$. Therefore $(\overline S)^\circ=\emptyset$. 6.17 Suppose that $\{O_\beta\}$ is an open covering of $\bigcap_{\alpha\in I}E_\alpha$. Since each $E_\alpha$ is compact, it is closed; so $\bigcap_{\alpha\in I}E_\alpha$ is closed and $X\setminus\bigcap_{\alpha\in I}E_\alpha$ is open. Fix $\alpha_0\in I$. Now $E_{\alpha_0}\subseteq\big(X\setminus\bigcap_{\alpha\in I}E_\alpha\big)\cup\bigcup_\beta O_\beta$. By the compactness of $E_{\alpha_0}$, there are finitely many sets $O_{\beta_1},\dots,O_{\beta_n}$ such that $E_{\alpha_0}\subseteq\big(X\setminus\bigcap_{\alpha\in I}E_\alpha\big)\cup\bigcup_{j=1}^{n}O_{\beta_j}$, and consequently $\bigcap_{\alpha\in I}E_\alpha\subseteq\bigcup_{j=1}^{n}O_{\beta_j}$. Therefore $\bigcap_{\alpha\in I}E_\alpha$ is compact. 6.18 Assume $\mathcal O$ is an open cover of $\{x_j\}_{j=0}^{\infty}$. Then there exists an open set $O\in\mathcal O$ such that $x_0\in O$. Hence $\lim_{n\to\infty}d(x_n,x_0)=0$ indicates that there is an $n_0\in\mathbb N$ such that $x_n\in O$ for $n>n_0$. For each $n\in\{1,\dots,n_0\}$ there exists an open set $O_n\in\mathcal O$ such that $x_n\in O_n$. Now $\{O,O_1,\dots,O_{n_0}\}$ forms a finite subcover of $\{x_j\}_{j=0}^{\infty}$. So $\{x_j\}_{j=0}^{\infty}$ is compact.


7 Continuous Mappings

7.1 Given $p=(p_1,p_2,p_3)\in\mathbb R^3$ and $\varepsilon>0$, choose $\delta=\varepsilon$. Now $x=(x_1,x_2,x_3)\in\mathbb R^3$ and $d_3(x,p)<\delta$ imply
$$d_3\big(f(x),f(p)\big)=\sqrt{(x_1-p_1)^2+(x_2-p_2)^2}\le d_3(x,p)<\varepsilon.$$
7.2 The natural choice is $f(0)=0$ due to $d_2(0,0)=0$. We must now prove that $f$ is continuous at $0$. Given $\varepsilon>0$, choose $\delta=\varepsilon$. Now $d_2(x,0)<\delta$ implies $|f(x)-f(0)|=d_2(x,0)<\varepsilon$. 7.3 The range of $f$ is $[0,\infty)$, so no point is mapped into $(-3,-2)$; hence $f^{-1}(S)=\emptyset$. 7.4 Since $0\notin D$, $f(0)=0$. For a fixed $x=(x_1,x_2,x_3)\in\mathbb R^3$, consider the points on the line segment $\{tx=(tx_1,tx_2,tx_3)\in\mathbb R^3:t\in[0,1]\}$. We want to find $\lim_{t\to0+}f(tx)$. If $x_3\le 0$, then $tx\notin D$ and $f(tx)=0$ for all $t\in[0,1]$. If $x_3>0$, then we claim that for $t$ sufficiently small, $tx\notin D$ and therefore $f(tx)=0$. This is seen by checking the coordinate inequality for $tx\in D$: $tx\in D$ if and only if $tx_3<(tx_1)^2+(tx_2)^2$, or $x_3<t(x_1^2+x_2^2)$. For any $x_3>0$, $t$ can be chosen small enough to violate this. Therefore, in either case, $f(tx)=0$ for $t$ sufficiently small. The discontinuity of $f$ at $0$ is shown by finding points of $D$ that are in $B_\delta(0)$ for any $\delta>0$. 7.5 $T(1,1)=(3,-1)$ yields $a_{11}+a_{12}=3$ and $a_{21}+a_{22}=-1$; $T(1,-1)=(1,7)$ yields $a_{11}-a_{12}=1$ and $a_{21}-a_{22}=7$. Solving this system of equations, we get $a_{11}=2$, $a_{12}=1$, $a_{21}=3$ and $a_{22}=-4$. 7.6 Since $f$ is continuous on the compact subset $K$, $f$ attains its minimum, i.e., $\min_{x\in K}f(x)=f(m)$ for some $m\in K$. Taking $\delta=f(m)>0$ gives the desired result. 7.7 Let $y\in[0,1]$. If $y$ is rational (resp. irrational), then taking $x=y$ (resp. $x=1-y$) gives $f(x)=y$. So $f$ is onto. But $f$ is continuous only at $x=2^{-1}$. In fact, $|f(x)-f(2^{-1})|=|f(x)-2^{-1}|=|x-2^{-1}|$ gives the continuity there. Let $x\in[0,1]\setminus\{2^{-1}\}$. If $x$ is rational, then $|f(x_k)-f(x)|=|f(x_k)-x|$. When $x_k$ is rational (resp. irrational), one has $|f(x_k)-x|=|x_k-x|$ (resp. $|1-x_k-x|$). Thus, choosing $x_k\to x$ with $x_k$ alternately rational and irrational, one cannot get $|f(x_k)-x|\to 0$, since $|1-x_k-x|\to|1-2x|\neq 0$; the case of irrational $x$ is similar. 7.8 (i) Taking $f_1=1$ and $f_2=0$ we have $d_{C[0,1]}\big(T(f_1),T(f_2)\big)=1=d_{C[0,1]}(f_1,f_2)$;


(ii) if $T(f)=f$, then $\int_0^x f(t)\,dt=f(x)$, and hence $f=0$, giving the uniqueness; (iii) since
$$T\circ T(f)(x)=\int_0^x\Big(\int_0^y f(t)\,dt\Big)dy=\int_0^x(x-t)f(t)\,dt,$$
it follows that $d_{C[0,1]}\big(T\circ T(f_1),T\circ T(f_2)\big)\le 2^{-1}d_{C[0,1]}(f_1,f_2)$.
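Returning briefly to 7.5, the small linear system there can be solved and checked exactly with rational arithmetic. This is only a sketch. The helper names `solve_pair` and `apply_matrix` are ours, and the matrix is taken to act on column vectors, $T(v)=Av$.

```python
from fractions import Fraction


def solve_pair(s, t):
    """Solve u + v = s, u - v = t exactly."""
    return Fraction(s + t, 2), Fraction(s - t, 2)


a11, a12 = solve_pair(3, 1)    # first coordinates of T(1,1) = (3,-1) and T(1,-1) = (1,7)
a21, a22 = solve_pair(-1, 7)   # second coordinates
A = [[a11, a12], [a21, a22]]


def apply_matrix(A, v):
    """T(v) = Av for a 2x2 matrix acting on column vectors."""
    return tuple(sum(A[i][j] * v[j] for j in range(2)) for i in range(2))


print(A)
print(apply_matrix(A, (1, 1)), apply_matrix(A, (1, -1)))
```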

7.9 The result follows from $|f(b)-f(a)|=|f'(c)(b-a)|\le\alpha|b-a|$ for some $c\in[a,b]$ (mean value theorem). 7.10 Since the series is convergent, there is an $n_0\in\mathbb N$ such that
$$\alpha_{n_0}=\sup_{x_1\neq x_2}\frac{d_X\big(T^{(n_0)}(x_1),T^{(n_0)}(x_2)\big)}{d_X(x_1,x_2)}<1.$$
This means that $T^{(n_0)}$ is a contraction, and consequently it has a unique fixed point thanks to Corollary 7.4.5, and so does $T$. 7.11 If $f(x)=d_X\big(x,T(x)\big)$, then $f:K\to\mathbb R$ is continuous. Since $K$ is compact and nonempty, there is a point $x_0\in K$ such that $f(x_0)=\min_{x\in K}f(x)$. Consequently, $T(x_0)=x_0$; otherwise we would have $f\big(T(x_0)\big)<f(x_0)$, which shows that $f(x_0)$ is not the minimum of $f$ over $K$, a contradiction. The uniqueness is trivial. 7.12 Consider the map $T:C[0,1]\to C[0,1]$ given by
$$T(f)(x)=h(x)+\int_0^x f(x-t)\exp(-t^2)\,dt.$$
For $f_1,f_2\in C[0,1]$, we have $d_{C[0,1]}\big(T(f_1),T(f_2)\big)\le 2^{-1}\sqrt\pi\,d_{C[0,1]}(f_1,f_2)$, since $\int_0^x\exp(-t^2)\,dt\le\int_0^\infty\exp(-t^2)\,dt=2^{-1}\sqrt\pi$. Since $2^{-1}\sqrt\pi<1$, $T$ is a contraction, and since $C[0,1]$ is complete under $d_{C[0,1]}$, there exists an $f\in C[0,1]$ such that $T(f)=f$.
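Because $T$ in 7.12 is a contraction, its fixed point can be approximated by iterating $f\mapsto T(f)$, as in the usual proof of the Banach contraction theorem. The sketch below is a discretized illustration only. It takes the hypothetical choice $h(x)=x$ (the hint leaves $h$ general), samples functions on a uniform grid of $[0,1]$, and replaces the integral by the trapezoidal rule. It therefore computes the fixed point of a discrete analogue of $T$, whose Lipschitz constant in the sup distance stays below $2^{-1}\sqrt\pi$.

```python
import math


def make_T(h, n=100):
    """Discretize T(f)(x) = h(x) + int_0^x f(x - t) exp(-t^2) dt on the grid x_i = i/n
    of [0, 1], replacing the integral by the trapezoidal rule."""
    dx = 1.0 / n
    xs = [i * dx for i in range(n + 1)]
    kern = [math.exp(-(j * dx) ** 2) for j in range(n + 1)]

    def T(f):
        out = []
        for i in range(n + 1):
            if i == 0:
                integral = 0.0
            else:
                vals = [f[i - j] * kern[j] for j in range(i + 1)]  # f(x_i - t_j) exp(-t_j^2)
                integral = dx * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
            out.append(h(xs[i]) + integral)
        return out

    return T, xs


def sup_dist(f, g):
    """Discrete analogue of d_{C[0,1]}."""
    return max(abs(a - b) for a, b in zip(f, g))


T, xs = make_T(lambda x: x)   # h(x) = x is a hypothetical choice
f = [0.0] * len(xs)           # iteration started at f = 0
for it in range(500):
    g = T(f)
    step = sup_dist(f, g)
    f = g
    if step < 1e-12:
        break
print(it, sup_dist(T(f), f))
```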

7.13 Show that the inverse $f^{-1}$ of $f$ exists and apply the Banach contraction theorem to $f^{-1}$. 7.14 The equivalences can be readily checked via the related definitions. 7.15 Check that $f$ is not injective.

7.16 Outline of the proof: For $(X,d_X)$ define two Cauchy sequences $\{x_k^{(1)}\}_{k=1}^{\infty}$ and $\{x_k^{(2)}\}_{k=1}^{\infty}$ in $X$ to be equivalent provided $\lim_{k\to\infty}d_X(x_k^{(1)},x_k^{(2)})=0$. Let $Y$ be the space of all equivalence classes of Cauchy sequences in $X$ with metric
$$d_Y\big(\{x_k^{(1)}\}_{k=1}^{\infty},\{x_k^{(2)}\}_{k=1}^{\infty}\big)=\lim_{k\to\infty}d_X(x_k^{(1)},x_k^{(2)}).$$


Then $(Y,d_Y)$ is complete. Define a mapping $f$ from $X$ into $Y$ by $f(x)=\{x_k\}_{k=1}^{\infty}\in Y$, where $x_k=x$ for each $k\in\mathbb N$. Then $f$ is an isometry and, moreover, $f(X)$ is dense in $Y$. 7.17 Use the triangle inequality: $d_X(x,y)\le d_X(x_0,y)+d_X(x,x_0)$.

8 Normed Linear Spaces

8.1 Let $\Phi(t)=t^{1/p}$ for $t\ge 0$. Since $p^{-1}\in(0,1)$, $\Phi''(s)<0$ for all $s>0$ and $\Phi$ is concave. Hence $\Phi(t)\le\Phi(1)+\Phi'(1)(t-1)$, or $t^{1/p}\le 1+\frac{t-1}{p}=\frac tp+\frac1q$. Setting $t=u^pv^{-q}$, where $u\ge 0$ and $v>0$, we find $uv\le\frac{u^p}{p}+\frac{v^q}{q}$, because $1-q=-\frac qp$. Obviously, this inequality also holds when $v=0$. The last inequality implies the Hölder inequality which, together with $|a+b|^p\le|a+b|^{p-1}|a|+|a+b|^{p-1}|b|$ (for $a,b\in\mathbb R$), yields the Minkowski inequality. 8.2 (i) $n-1$, since the following $n-1$ vectors $(1,-1,0,\dots,0)$, $(0,1,-1,0,\dots,0)$, ..., $(0,\dots,0,1,-1)$ are linearly independent; (ii) $\infty$, since $1,t,t^2,\dots,t^n$ are linearly independent for any $n\in\mathbb N$; (iii) $\infty$, since $1,t,t^2,\dots,t^n$ are linearly independent for any $n\in\mathbb N$. 8.3 (i) Obviously, $\|f\|_p\le\|f\|_\infty$ when $p\in[1,\infty)$ and $f\in C[0,1]$, but there is no constant $\kappa>0$ such that $\|f\|_\infty\le\kappa\|f\|_p$ for all $f\in C[0,1]$. For $n-2\in\mathbb N$ let
$$f_n(t)=\begin{cases}n^{2/p}t^{1/p}, & 0\le t\le\frac1n,\\ n^{2/p}\big(\frac2n-t\big)^{1/p}, & \frac1n\le t\le\frac2n,\\ 0, & \frac2n\le t\le 1.\end{cases}$$
It is easy to see that $\|f_n\|_\infty=n^{1/p}\to\infty$ (as $n\to\infty$) and $\|f_n\|_p=1$. This verifies the nonexistence of the above constant $\kappa>0$. (ii) For $n\in\mathbb N$ let $f_n(x)=x^n$. Then $\|f_n\|_\infty=1$ and the $f_n$ are linearly independent, so $(C[0,1],\|\cdot\|_\infty)$ is not finite dimensional, and Theorem 8.2.5(ii) gives that its unit sphere is not compact.
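The two norms claimed for the functions $f_n$ in 8.3(i) can be checked numerically. In this sketch the parameters $n=10$ and $p=3$ are arbitrary, and the helper names are hypothetical. $\|f_n\|_p^p$ is computed with a midpoint rule, which is exact here up to rounding, because $|f_n|^p$ is piecewise linear with kinks on grid points. $\|f_n\|_\infty$ is computed by sampling the grid, which contains the peak $t=1/n$.

```python
def make_f(n, p):
    """The function f_n of 8.3(i), for n >= 3 and 1 <= p < infinity."""
    def f(t):
        if t <= 1.0 / n:
            return n ** (2.0 / p) * t ** (1.0 / p)
        if t <= 2.0 / n:
            return n ** (2.0 / p) * (2.0 / n - t) ** (1.0 / p)
        return 0.0
    return f


def p_norm(f, p, steps=100_000):
    """(int_0^1 |f|^p dt)^(1/p) via the composite midpoint rule."""
    h = 1.0 / steps
    return (h * sum(abs(f((k + 0.5) * h)) ** p for k in range(steps))) ** (1.0 / p)


def sup_norm(f, steps=100_000):
    """max |f| over the grid points k/steps."""
    return max(abs(f(k / steps)) for k in range(steps + 1))


n, p = 10, 3.0
f = make_f(n, p)
print(p_norm(f, p), sup_norm(f), n ** (1.0 / p))
```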

8.4 Note that $\big|\|x\|-\|a\|\big|\le\|x-a\|$. So $\|\cdot\|:X\to\mathbb R$ is continuous. Also, regarding the continuity of the vector addition and the scalar multiplication, we naturally assume that the norms defined on $X\times X$ and $\mathbb F\times X$ are determined by $\|(x,y)\|_{X\times X}=\|x\|+\|y\|$ and $\|(\lambda,x)\|_{\mathbb F\times X}=|\lambda|+\|x\|$. Therefore the desired continuities follow from
$$\|(x+y)-(x_0+y_0)\|\le\|x-x_0\|+\|y-y_0\|=\|(x,y)-(x_0,y_0)\|_{X\times X};$$
$$\|\lambda x-\lambda_0x_0\|\le|\lambda-\lambda_0|\|x\|+|\lambda_0|\|x-x_0\|\le\|(\lambda,x)-(\lambda_0,x_0)\|_{\mathbb F\times X}\big(\|x\|+|\lambda_0|\big).$$
8.5 It is clear that $\ell_0$ is a linear subspace of $\ell_\infty$. Note that $a=(1,2^{-1},3^{-1},\dots)\in\ell_\infty$. For every $n\in\mathbb N$, let $x_n=(1,2^{-1},3^{-1},\dots,n^{-1},0,0,\dots)\in\ell_0$. Then
$$\|x_n-a\|_\infty=\big\|\big(0,\dots,0,(n+1)^{-1},(n+2)^{-1},\dots\big)\big\|_\infty=(n+1)^{-1}\to 0.$$
It follows that $\{x_n\}_{n=1}^{\infty}$ converges in $\ell_\infty$, but the limit $a$ does not belong to $\ell_0$. Hence $\ell_0$ is not closed in $\ell_\infty$. 8.6 (i) By Hölder's inequality it follows that for $1<p<q<\infty$,
$$\Big(\int_{[0,1]}|f|^p\,dm_{\mathrm{id}}\Big)^{q/p}\le\int_{[0,1]}|f|^q\,dm_{\mathrm{id}},$$
and so $\|f\|_{p,m_{\mathrm{id}},[0,1]}$ increases with $p$. Since $\|f\|_{p,m_{\mathrm{id}},[0,1]}\le\|f\|_{\infty,m_{\mathrm{id}},[0,1]}<\infty$, the limit $a=\lim_{p\to\infty}\|f\|_{p,m_{\mathrm{id}},[0,1]}$ is finite and $a\le\|f\|_{\infty,m_{\mathrm{id}},[0,1]}$. It is clear that
$$t\,m_{\mathrm{id}}\big(\{x\in[0,1]:|f(x)|>t\}\big)^{1/p}\le\|f\|_{p,m_{\mathrm{id}},[0,1]}\le a$$
for all $t>0$. Accordingly, if $m_{\mathrm{id}}(\{x\in[0,1]:|f(x)|>t\})>0$, then $\lim_{p\to\infty}m_{\mathrm{id}}(\{x\in[0,1]:|f(x)|>t\})^{1/p}=1$, and so $t\le a$; that is, $m_{\mathrm{id}}(\{x\in[0,1]:|f(x)|>t\})=0$ for $t>a$, and hence $a=\|f\|_{\infty,m_{\mathrm{id}},[0,1]}$. (ii) It suffices to prove the result under $0<\|f_1\|_{1,m_{\mathrm{id}},[0,1]}<\infty$ and $0<\|f_2\|_{\infty,m_{\mathrm{id}},[0,1]}<\infty$. For the inequality, one has
$$\int_{[0,1]}|f_1f_2|\,dm_{\mathrm{id}}=\int_{\{x\in[0,1]:|f_2(x)|\le\|f_2\|_{\infty,m_{\mathrm{id}},[0,1]}\}}|f_1f_2|\,dm_{\mathrm{id}}\le\|f_1\|_{1,m_{\mathrm{id}},[0,1]}\|f_2\|_{\infty,m_{\mathrm{id}},[0,1]}.$$
As for the equality, note that $|f_2|\le\|f_2\|_{\infty,m_{\mathrm{id}},[0,1]}$ holds $m_{\mathrm{id}}$-a.e. on $[0,1]$. So, if the equality is valid, then
$$\int_{[0,1]}|f_1|\Big(1-|f_2|\,\|f_2\|_{\infty,m_{\mathrm{id}},[0,1]}^{-1}\Big)\,dm_{\mathrm{id}}=0$$


and hence $|f_2(x)|=\|f_2\|_{\infty,m_{\mathrm{id}},[0,1]}$ for $m_{\mathrm{id}}$-a.e. $x\in[0,1]$ with $f_1(x)\neq 0$. The reverse assertion is trivial. 8.7 It is clear that $T$ is linear. Furthermore, if $f\in C[0,1]$, then $|T(f)|=|f(0)|\le\|f\|_\infty$, giving the boundedness of $T$ with $\|T\|\le 1$. If $f(x)=1$, then $\|f\|_\infty=1$ and $T(f)=1$. This yields $\|T\|=1$. 8.8 (i) Clearly,
$$\|T(x)\|_\infty\le\sup_{1\le k\le n}\sum_{j=1}^{n}|a_{kj}||x_j|\le\|x\|_\infty\sup_{1\le k\le n}\sum_{j=1}^{n}|a_{kj}|,\quad\text{so}\quad\|T\|\le\sup_{1\le k\le n}\sum_{j=1}^{n}|a_{kj}|.$$
To get the equality, take $x_j=\operatorname{sgn}a_{k_0j}$, where $\sum_{j=1}^{n}|a_{k_0j}|=\sup_{1\le k\le n}\sum_{j=1}^{n}|a_{kj}|$. Then $\|x\|_\infty=1$ (unless the $k_0$-th row vanishes, in which case $T=0$) and $y_{k_0}=\sum_{j=1}^{n}|a_{k_0j}|$, where $y=T(x)$. So
$$\|T\|\ge\|\{y_k\}_{k=1}^{n}\|_\infty=\sup_{1\le k\le n}\Big|\sum_{j=1}^{n}a_{kj}\operatorname{sgn}a_{k_0j}\Big|\ge\sum_{j=1}^{n}a_{k_0j}\operatorname{sgn}a_{k_0j}=\sup_{1\le k\le n}\sum_{j=1}^{n}|a_{kj}|,$$
giving the desired equality. (ii) It follows from the definition of $\|\cdot\|_2$. (iii) It is clear that for $f\in R_1[a,b]$,
$$\|T(f)\|_1\le\int_a^b\Big(\int_a^x|f(t)|\,dt\Big)dx\le\int_a^b\Big(\int_a^b|f(t)|\,dt\Big)dx=(b-a)\|f\|_1,$$
and so $T$ is bounded with $\|T\|\le b-a$. To see the equality, for any $n\in\mathbb N$ with $a+n^{-1}<b$ let
$$f_n(t)=\begin{cases}n, & t\in[a,a+n^{-1}],\\ 0, & t\in(a+n^{-1},b].\end{cases}$$
It is easy to check that $\|f_n\|_1=1$ and
$$\|T(f_n)\|_1=\int_a^{a+n^{-1}}n(x-a)\,dx+\int_{a+n^{-1}}^{b}dx=(b-a)-(2n)^{-1}.$$
So $\|T\|\ge\sup_{n}\|T(f_n)\|_1=b-a$. Therefore $\|T\|=b-a$. 8.9 The boundedness is obvious. Since $T(f_1)=T(f_2)$ implies $f_1=f_2$, we conclude that $T$ is one-to-one. But $T$ is not onto. In fact, if $T(f)=1$ on $(0,1)$, then the only possible candidate for $f$ is $f(t)=t^{-1}$. It is clear that $f\in C(0,1)$ with $\|f\|_2=\infty$. From this it follows that $T$ is not invertible.
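Returning to 8.8(i): the norm of $x\mapsto Ax$ on $(\mathbb F^n,\|\cdot\|_\infty)$ is the largest absolute row sum, attained at $x_j=\operatorname{sgn}a_{k_0j}$. The sketch below checks this for one arbitrarily chosen real $3\times 3$ matrix (the matrix and the helper names are ours). It also confirms that random vectors in the unit ball never exceed the bound.

```python
import random


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def sup_norm(v):
    return max(abs(c) for c in v)


def max_row_sum(A):
    """sup_k sum_j |a_kj|, the operator norm from 8.8(i)."""
    return max(sum(abs(a) for a in row) for row in A)


def sgn(a):
    return float((a > 0) - (a < 0))


A = [[1.0, -2.0, 0.5],
     [-3.0, 1.0, 1.0],
     [0.0, 2.0, -2.5]]   # arbitrary example matrix

k0 = max(range(len(A)), key=lambda k: sum(abs(a) for a in A[k]))
x_star = [sgn(a) for a in A[k0]]   # x_j = sgn a_{k0 j}
print(max_row_sum(A), sup_norm(matvec(A, x_star)))

rng = random.Random(2)
worst = max(sup_norm(matvec(A, [rng.uniform(-1, 1) for _ in range(3)]))
            for _ in range(10_000))
print(worst, max_row_sum(A))
```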


8.10 Here, suppose that the norm on $\mathbb R^2$ is the $(2,2)$-norm. Linear: $T\big(\alpha(x_1,y_1)+\beta(x_2,y_2)\big)=(\alpha x_1+\beta x_2,0)=\alpha T(x_1,y_1)+\beta T(x_2,y_2)$. Bounded: $\|T(x_1,x_2)\|_{2,2}=|x_1|\le\|(x_1,x_2)\|_{2,2}$. Not onto: $(x_1,1)$ has no preimage under $T$. An example: $S=(0,1)\times(0,1)$ is an open set of $\mathbb R^2$, but $T(S)=\{(x_1,0):x_1\in(0,1)\}$ is not an open set of $\mathbb R^2$. 8.11 These results just follow from the definition. 8.12 (i) If $x\in X$ and $0<\varepsilon<\|x\|_X$, then $x\in B_{\|x\|_X+\varepsilon}(0)$ and $x\notin B_{\|x\|_X-\varepsilon}(0)$. This readily implies the desired result. (ii) Clearly, for $\kappa>0$,
$$p(\kappa x)=\inf\{\lambda>0:\kappa x\in\lambda K\}=\kappa\inf\{\kappa^{-1}\lambda>0:x\in\kappa^{-1}\lambda K\}=\kappa p(x).$$
Moreover, if $\varepsilon,\lambda,\mu$ are positive and satisfy $x=\lambda x_1$, $y=\mu y_1$, $\lambda<p(x)+2^{-1}\varepsilon$ and $\mu<p(y)+2^{-1}\varepsilon$ for some $x_1,y_1\in K$, then, by the convexity of $K$,
$$x+y=(\lambda+\mu)\Big(\frac{\lambda}{\lambda+\mu}x_1+\frac{\mu}{\lambda+\mu}y_1\Big)\in(\lambda+\mu)K.$$
So $p(x+y)\le\lambda+\mu<p(x)+p(y)+\varepsilon$, and hence $p(x+y)\le p(x)+p(y)$. (iii) Consider $L_0(\lambda x_0)=\lambda$ for $\lambda\in\mathbb R$, which is a linear functional on the linear subspace $\langle x_0\rangle=\{\lambda x_0:\lambda\in\mathbb R\}$. Since $p$ is the Minkowski functional of $K$, we use the fact that $x_0\notin K$ to obtain $\lambda x_0\notin\lambda K$ for $\lambda>0$, and thus $p(\lambda x_0)\ge\lambda=L_0(\lambda x_0)$; for $\lambda\le 0$ this inequality is trivial since $p\ge 0$. Consequently, there exists a linear extension $L$ of $L_0$ to $X$ obeying $L(x)\le p(x)$ for any $x\in X$. When $x\in K$, we have $L(x)\le p(x)\le 1=L_0(x_0)$. Note that $0\in K^\circ$. So $\pm\varepsilon^{-1}y\in K$ for a given $\varepsilon>0$ and any $y$ with $\|y\|_X$ small enough. Accordingly, $|L(y)|\le\varepsilon$ for $\|y\|_X$ small enough, say $\|y\|_X<\delta$. Finally, the linearity of $L$ gives $|L(x_1)-L(x_2)|=|L(x_1-x_2)|\le\varepsilon$ whenever $\|x_1-x_2\|_X<\delta$; namely, $L$ is continuous. 8.13 Note that $E_1-E_2=\{a-b:a\in E_1,\,b\in E_2\}$ is convex and $0\notin E_1-E_2$. So, by hypothesis, there is an $x_0\in(E_1-E_2)^\circ$. Consequently, $-x_0+E_1-E_2$ is convex and contains $0$ in its interior. Of course, $-x_0\notin-x_0+E_1-E_2$. Thus an application of Problem 8.12(iii) yields a continuous linear functional $L$ such that $L(-x_0+a-b)\le L(-x_0)$ for any $a\in E_1$ and $b\in E_2$. Consequently, $\sup_{a\in E_1}L(a)\le\inf_{b\in E_2}L(b)$. Now any $\alpha\in[\sup_{a\in E_1}L(a),\inf_{b\in E_2}L(b)]$ gives the two inclusions.


8.14 Let $f\in\mathcal B(X,\mathbb F)$. If a sequence $\{x_n\}_{n=1}^{\infty}$ in $f^{-1}(0)$ converges to $x\in X$, then $|f(x_n)-f(x)|\le\|f\|\|x_n-x\|_X\to 0$, and hence $f(x)=0$, i.e., $x\in f^{-1}(0)$. Thus $f^{-1}(0)$ is closed. Conversely, assume that $f^{-1}(0)$ is closed. If $f$ is not in $\mathcal B(X,\mathbb F)$, then there is a sequence of points $\{x_n\}_{n=1}^{\infty}$ in $X$ such that $\|x_n\|_X=1$ but $c_n=f(x_n)$ satisfies $|c_n|\to\infty$ (and we may assume all $c_n\neq 0$). Since $f$ is linear, $f(x_n/c_n)=1$ and consequently $f(x_n/c_n-x_1/c_1)=0$; that is, $x_n/c_n-x_1/c_1\in f^{-1}(0)$. Note that $x_n/c_n\to 0$ in $X$. So $x_n/c_n-x_1/c_1\to-x_1/c_1$. Now the assumption gives $x_1/c_1\in f^{-1}(0)$, contradicting $f(x_1/c_1)=1$. Therefore $f\in\mathcal B(X,\mathbb F)$. 8.15 Suppose $x_k=m_k+\lambda_kx\in M+\mathbb Fx$ converges to $y\in X$. Then $\{\lambda_k\}_{k=1}^{\infty}$ is bounded. Otherwise, there exists a subsequence $\{\lambda_{k_j}\}$ such that $0<|\lambda_{k_j}|\to\infty$, and hence $x+m_{k_j}/\lambda_{k_j}=x_{k_j}/\lambda_{k_j}\to 0$ in $X$; since $M$ is closed and $-m_{k_j}/\lambda_{k_j}\in M$, we see that $x\in M$, contradicting the assumption $x\in X\setminus M$. Now the boundedness of $\{\lambda_k\}_{k=1}^{\infty}$ in $\mathbb F$ yields a subsequence, say $\{\lambda_{k_j}\}_{j=1}^{\infty}$, which converges to a point $\lambda\in\mathbb F$. Consequently, $m_{k_j}=x_{k_j}-\lambda_{k_j}x\to y-\lambda x\in M$; this implies $y\in M+\mathbb Fx$. Therefore $M+\mathbb Fx$ is closed.

8.16 If $M=X$, then $N$ is taken to be $\{0\}$. Now suppose $M$ is a closed proper subspace of $X$. For any $x\in X\setminus M$, let $L(y+\lambda x)=\lambda d(x,M)$ for all $y\in M$ and $\lambda\in\mathbb F$; then $L$ is a linear functional defined on $M+\mathbb Fx$ over $\mathbb F$. Clearly, $L(x)=d(x,M)$, $L(y)=0$ for all $y\in M$, and $\|L\|=1$. If $L_x$ is the Hahn-Banach extension of $L$ to $X$ associated with $x$, then $L_x\in\mathcal B(X,\mathbb F)$ with $\|L_x\|=1$, $L_x(x)=d(x,M)$ and $M\subseteq N_{L_x}=\{z\in X:L_x(z)=0\}$. Next, if $M$ is finite dimensional, then it is closed. Moreover, by Corollary 8.4.4(vii), for every $x\in X\setminus M$ there is a one-dimensional subspace $Y_x$ of $X$ such that $X=N_{L_x}+Y_x$ and $N_{L_x}\cap Y_x=\{0\}$. Accordingly, $M=\bigcap_{x\in X\setminus M}N_{L_x}$. To see this equation, assume that it is not true. By the first part of the argument, there exists an $x_0\in\bigcap_{x\in X\setminus M}N_{L_x}\setminus M$ and hence an $L_{x_0}\in\mathcal B(X,\mathbb F)$ such that $L_{x_0}(x_0)=d(x_0,M)$ and $M\subseteq N_{L_{x_0}}$. Note that $L_x(x_0)=0$ for all $x\in X\setminus M$. So $d(x_0,M)=L_{x_0}(x_0)=0$, i.e., $x_0\in M$ thanks to the closedness of $M$. This is a contradiction. Let $N=\bigcap_{x\in X\setminus M}Y_x$, which is closed. Since
$$X=\bigcap_{x\in X\setminus M}(N_{L_x}+Y_x)=\bigcap_{x\in X\setminus M}N_{L_x}+\bigcap_{x\in X\setminus M}Y_x,$$
we conclude $X=M+N$ and $M\cap N=\{0\}$. 8.17 The result follows from considering $p(x)=\inf\{f(y):y\in M\text{ and }x\le y\}$ and adapting the argument for Theorem 8.4.3.


9 Banach Spaces via Operators and Functionals

9.1 If $\{f_j\}_{j=1}^{\infty}$ is a Cauchy sequence in $C^n[0,1]$, then for any $\varepsilon>0$ there exists $n_0\in\mathbb N$ such that for all $x\in[0,1]$ and each integer $k\in[0,n]$,
$$i,l\ge n_0\ \Rightarrow\ |f_i^{(k)}(x)-f_l^{(k)}(x)|\le\|f_i-f_l\|_\infty<\varepsilon.$$
So for each integer $k\in[0,n]$, $\{f_j^{(k)}(x)\}_{j=1}^{\infty}$ is a Cauchy sequence in $\mathbb R$. Therefore it converges to some real number $g_k(x)$ for every $x\in[0,1]$; this defines a new function $g_k$ such that $\{f_j^{(k)}\}_{j=1}^{\infty}$ converges to $g_k$ pointwise on $[0,1]$. Moreover, $\{f_j^{(k)}\}_{j=1}^{\infty}$ converges to $g_k$ uniformly on $[0,1]$, and $g_k\in C[0,1]$; this follows readily from the last estimate by letting $l\to\infty$. In particular, $k=0$ implies that $\{f_j\}_{j=1}^{\infty}$ converges uniformly on $[0,1]$ to $g_0=f$, and $f$ is differentiable with $f'=g_1$. Furthermore, the same reasoning yields $f^{(k)}=g_k$ on $[0,1]$. So $f\in C^n[0,1]$. 9.2 The result follows from the estimate, for $m,n\in\mathbb N$,
$$\|x_m-x_n\|_X,\ \|y_m-y_n\|_Y\le\|(x_m,y_m)-(x_n,y_n)\|\le\|x_m-x_n\|_X+\|y_m-y_n\|_Y.$$
9.3 Consider $f_n(x)=x^n$ for each $n\in\mathbb N$. It is clear that $f_n\in C[0,1]$, and the function
$$f(x)=\begin{cases}1, & x=1,\\ 0, & x\in[0,1),\end{cases}$$
is not in $C[0,1]$. But
$$\|f_n-f\|_p^p=\int_0^1|x^n-f(x)|^p\,dx=\int_0^1x^{pn}\,dx=(1+pn)^{-1}\to 0.$$
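The value of the integral in 9.3 can be confirmed numerically. The sketch takes $p=2$ (an arbitrary choice), evaluates $\int_0^1x^{pn}\,dx$ with a midpoint rule for a few values of $n$, and compares the result with $(1+pn)^{-1}$. The single point $x=1$, where $f=1$, does not affect the integral.

```python
def lp_dist_pow(n, p, steps=100_000):
    """Midpoint approximation of ||f_n - f||_p^p = int_0^1 |x^n - f(x)|^p dx.
    Since f = 0 on [0, 1), the integrand equals x^(pn) except at the point x = 1."""
    h = 1.0 / steps
    return h * sum(((k + 0.5) * h) ** (n * p) for k in range(steps))


p = 2.0
for n in (1, 5, 25, 125):
    print(n, lp_dist_pow(n, p), 1.0 / (1.0 + p * n))
```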

9.4 It suffices to check the inequality $\|f*g\|_1\le\|f\|_1\|g\|_1$ for all $f,g\in L^1(\mathbb Z,\mathbb F)$. Since $f,g\in L^1(\mathbb Z,\mathbb F)$, it follows that
$$\sum_{m=-\infty}^{\infty}|f(n-m)||g(m)|\le\|f\|_1\|g\|_1<\infty,$$
and so
$$\|f*g\|_1=\lim_{k\to\infty,\,l\to-\infty}\sum_{n=l}^{k}|f*g(n)|\le\lim_{k\to\infty,\,l\to-\infty}\sum_{n=l}^{k}\lim_{j\to\infty}\sum_{m=-j}^{j}|f(n-m)||g(m)|$$
$$=\lim_{k\to\infty,\,l\to-\infty}\lim_{j\to\infty}\sum_{m=-j}^{j}\sum_{n=l}^{k}|f(n-m)||g(m)|\le\|f\|_1\lim_{j\to\infty}\sum_{m=-j}^{j}|g(m)|=\|f\|_1\|g\|_1.$$
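A finite-support illustration of 9.4: for sequences on $\mathbb Z$ with finitely many nonzero terms (stored as dictionaries, an implementation choice of ours), the convolution $(f*g)(n)=\sum_m f(n-m)g(m)$ satisfies $\|f*g\|_1\le\|f\|_1\|g\|_1$. The inequality can be strict when terms cancel, and it becomes an equality when both sequences are nonnegative. The example sequences are arbitrary.

```python
def convolve(f, g):
    """(f*g)(n) = sum_m f(n - m) g(m) for finitely supported f, g given as {index: value}."""
    out = {}
    for i, fi in f.items():
        for m, gm in g.items():
            out[i + m] = out.get(i + m, 0.0) + fi * gm   # i = n - m, so n = i + m
    return out


def l1(f):
    return sum(abs(v) for v in f.values())


f = {-2: 1.0, 0: -3.0, 5: 0.5}
g = {-1: 2.0, 3: -1.0, 4: 4.0}
h = convolve(f, g)
print(l1(h), l1(f) * l1(g))   # strict here: the two terms landing at n = 4 partly cancel

f_abs = {k: abs(v) for k, v in f.items()}
g_abs = {k: abs(v) for k, v in g.items()}
print(l1(convolve(f_abs, g_abs)), l1(f_abs) * l1(g_abs))   # equal for nonnegative data
```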

9.5 Take $X=Y=\ell_0$ and equip it with the 2-norm $\|x\|_2=\big(\sum_{j=1}^{\infty}|x_j|^2\big)^{1/2}$. Here $x\in\ell_0$ if and only if $x\in\ell_\infty$ and it has only finitely many nonzero entries. Then $X$ and $Y$ are not complete. Now define $T_n(x)=(0,\dots,0,nx_n,0,\dots)$ for $x=(x_1,x_2,\dots)$. Then $T_n$ is bounded with $\|T_n\|=n\to\infty$. On the other hand, if $x\in\ell_0$, then there is an $n_0\in\mathbb N$ such that $x_n=0$ for $n>n_0$, and hence $\|T_n(x)\|_2=n|x_n|=0$ for $n>n_0$, while $\|T_n(x)\|_2=n|x_n|\le n\|x\|_2\le n_0\|x\|_2$ for $n\le n_0$. Hence $\sup_{n\in\mathbb N}\|T_n(x)\|_2<\infty$ for each $x\in X$. Clearly, the uniform boundedness principle fails. 9.6 If $f(x)=(1+x^2)^{-1}$, then $f((-1,1))=(2^{-1},1]$, which is not open though $(-1,1)$ is open. 9.7 (i) Since $|\sin x|\le|x|$, we conclude that the integral is not less than
$$\int_0^{2\pi}2x^{-1}|\sin(n+2^{-1})x|\,dx.$$
Note that, for $k=0,1,\dots,2n$,
$$k\pi+6^{-1}\pi\le(n+2^{-1})x\le k\pi+3^{-1}\pi\ \Rightarrow\ |\sin(n+2^{-1})x|\ge 2^{-1},$$
and each such $x$-interval has length $\pi/(6(n+2^{-1}))$ and lies in $(0,2\pi)$. So
$$\int_0^{2\pi}2x^{-1}|\sin(n+2^{-1})x|\,dx\ge\sum_{k=0}^{2n}\frac{\pi}{6(n+2^{-1})}\Big(\frac{\pi(k+3^{-1})}{n+2^{-1}}\Big)^{-1}=\sum_{k=0}^{2n}\frac{1}{6(k+3^{-1})}\to\infty\quad\text{as }n\to\infty.$$
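The divergence in 9.7(i) can be watched numerically. The sketch below computes $(2\pi)^{-1}\int_0^{2\pi}|\sin(n+2^{-1})x/\sin 2^{-1}x|\,dx$, the quantity that (iii) identifies with $\|T_n\|$, by a midpoint rule. It checks the exact value $3^{-1}+2\sqrt3/\pi$ at $n=1$ (from $\sin\frac32x/\sin\frac12x=1+2\cos x$). It also compares the unnormalized integral with the lower bound $\sum_{k=0}^{2n}\big(6(k+3^{-1})\big)^{-1}$ that results from the interval estimate in (i). The step count is an arbitrary choice.

```python
import math


def dirichlet_abs_integral(n, steps=100_000):
    """Midpoint rule for int_0^{2pi} |sin((n + 1/2)x) / sin(x/2)| dx.
    Midpoints avoid the removable singularities at x = 0 and x = 2pi."""
    h = 2 * math.pi / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        total += abs(math.sin((n + 0.5) * x) / math.sin(0.5 * x))
    return h * total


def lebesgue_constant(n):
    return dirichlet_abs_integral(n) / (2 * math.pi)


def lower_bound(n):
    return sum(1.0 / (6 * (k + 1.0 / 3)) for k in range(2 * n + 1))


for n in (1, 10, 100, 1000):
    print(n, lebesgue_constant(n), lower_bound(n))
```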

(ii) It follows from a change of variable that
$$s_n(x) = (2\pi)^{-1} \int_{-x}^{2\pi - x} f(x + z) \Big(\sum_{m=-n}^{n} e^{-imz}\Big)\,dz.$$


Jie Xiao

This yields $\overline{s_n(x)} = s_n(x)$, and hence $s_n(x)$ is real-valued. Note that $e^{ix} = \cos x + i\sin x$ and $f$ is $2\pi$-periodic. So
$$\begin{aligned}
s_n(x) &= (2\pi)^{-1} \int_{-x}^{2\pi - x} f(x + z) \Big(\sum_{m=-n}^{n} \cos mz\Big)\,dz
= (2\pi)^{-1} \int_{-x}^{2\pi - x} f(x + z) \Big(1 + 2\sum_{m=1}^{n} \cos mz\Big)\,dz \\
&= (2\pi)^{-1} \int_{-x}^{2\pi - x} f(x + z)\, \frac{\sin(n + 2^{-1})z}{\sin 2^{-1}z}\,dz
= (2\pi)^{-1} \int_{0}^{2\pi} \frac{\sin(n + 2^{-1})y}{\sin 2^{-1}y}\, f(x + y)\,dy.
\end{aligned}$$

(iii) For any $f \in X$, we have
$$|T_n(f)| \le (2\pi)^{-1} \|f\|_\infty \int_0^{2\pi} \left|\frac{\sin(n + 2^{-1})x}{\sin 2^{-1}x}\right| dx,$$
and then
$$\|T_n\| \le (2\pi)^{-1} \int_0^{2\pi} \left|\frac{\sin(n + 2^{-1})x}{\sin 2^{-1}x}\right| dx.$$
To prove that equality actually holds, let
$$q_n(x) = \frac{\sin(n + 2^{-1})x}{\sin 2^{-1}x} \quad\text{and}\quad g_n(x) = \operatorname{sgn} q_n(x) = \begin{cases} 1, & q_n(x) > 0,\\ 0, & q_n(x) = 0,\\ -1, & q_n(x) < 0.\end{cases}$$
Then $|q_n(x)| = g_n(x) q_n(x)$. Although $g_n$ is not continuous, for any $\varepsilon > 0$ there is a continuous function $f_n$ with $|f_n| \le 1$ such that
$$(2\pi)^{-1} \int_0^{2\pi} |f_n(x) - g_n(x)|\,|q_n(x)|\,dx < \varepsilon.$$
This is easily realized because $q_n$ (extended by continuity at $0$ and $2\pi$) is continuous, hence bounded, on $[0, 2\pi]$, and $g_n$ has only finitely many jumps: it is enough to replace $g_n$ near each jump by a short line segment joining its values on either side, so that $f_n$ differs from $g_n$ only on a set of arbitrarily small measure. Then $\|f_n\|_\infty = \max_{x\in[0,2\pi]} |f_n(x)| = 1$, but
$$\begin{aligned}
|T_n(f_n)| &= (2\pi)^{-1} \left|\int_0^{2\pi} f_n(x) q_n(x)\,dx\right|
= (2\pi)^{-1} \left|\int_0^{2\pi} \big(f_n(x) - g_n(x)\big) q_n(x)\,dx + \int_0^{2\pi} g_n(x) q_n(x)\,dx\right| \\
&\ge (2\pi)^{-1} \int_0^{2\pi} g_n(x) q_n(x)\,dx - (2\pi)^{-1} \int_0^{2\pi} |f_n(x) - g_n(x)|\,|q_n(x)|\,dx
\ge (2\pi)^{-1} \int_0^{2\pi} |q_n(x)|\,dx - \varepsilon.
\end{aligned}$$
Since $\varepsilon > 0$ is arbitrary, this implies $\|T_n\| \ge (2\pi)^{-1} \int_0^{2\pi} |q_n(x)|\,dx$, and so
$$\|T_n\| = (2\pi)^{-1} \int_0^{2\pi} \left|\frac{\sin(n + 2^{-1})x}{\sin 2^{-1}x}\right| dx.$$

(iv) It is clear that $T_n(f) = s_n(0)$ for all $f \in X$. Moreover, if the Fourier series of a function $f \in X$ converges at $0$, then $\{T_n(f)\}_{n=1}^{\infty}$ is bounded, since each element is just a partial sum of a convergent series. Thus, if the Fourier series of $f$ converged at $0$ for every $f \in X$, then the sequence $\{T_n(f)\}_{n=1}^{\infty}$ would be bounded for each $f \in X$. By the uniform boundedness principle, $\{\|T_n\|\}_{n=1}^{\infty}$ would then be bounded, contradicting (i) and (iii).

9.8 (i) Consider $f_n(x) = \sqrt{(x - 2^{-1})^2 + n^{-2}}$. Then $\{f_n\}_{n=1}^{\infty}$ converges to the function $f(x) = |x - 2^{-1}|$ uniformly on $[0, 1]$, since
$$0 \le f_n(x) - |x - 2^{-1}| = \frac{n^{-2}}{\sqrt{(x - 2^{-1})^2 + n^{-2}} + |x - 2^{-1}|} \le n^{-1}.$$
So $\lim_{n\to\infty} \|f_n - f\|_\infty = 0$. However, $f$ is not in $C^1[0, 1]$. Namely, $X$ is not complete under the sup-norm.

(ii) To prove that $T = d/dx$ is closed, let $\{x_n\}_{n=1}^{\infty}$ be a sequence in $C^1[0, 1]$ such that $x_n \to x$ and $T(x_n) \to y$. Since convergence in $C[0, 1]$ means uniform convergence, $T(x_n)$ converges to $y$ uniformly on $[0, 1]$ and $y \in C[0, 1]$. Consequently, $x \in C^1[0, 1]$ and $y(t) = x'(t)$; that is, $T$ is closed. But $T$ is not bounded: if $x_n(t) = \sin n\pi t$, then $\|x_n\|_\infty \le 1$ while
$$\|T(x_n)\|_\infty = \max_{t\in[0,1]} n\pi|\cos n\pi t| = n\pi \to \infty.$$
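Both estimates of 9.8 can be observed on a grid. In the sketch below (grid size is our own choice, and a grid maximum only approximates a supremum), `sup_gap` measures $\max (f_n - f)$ for the smoothings of $|x - 2^{-1}|$ from (i), and `sup_derivative_ratio` measures $\|x_n'\|_\infty / \|x_n\|_\infty$ for $x_n(t) = \sin n\pi t$ from (ii).

```python
import math


def sup_gap(n, grid=10001):
    """Grid maximum on [0, 1] of f_n(x) - |x - 1/2|, where f_n(x) = sqrt((x - 1/2)^2 + n^(-2))."""
    worst = 0.0
    for i in range(grid):
        d = i / (grid - 1) - 0.5
        worst = max(worst, math.sqrt(d * d + n ** -2) - abs(d))
    return worst


def sup_derivative_ratio(n, grid=10001):
    """Grid approximation of ||x_n'||_sup / ||x_n||_sup for x_n(t) = sin(n*pi*t)."""
    ts = [i / (grid - 1) for i in range(grid)]
    num = max(abs(n * math.pi * math.cos(n * math.pi * t)) for t in ts)
    den = max(abs(math.sin(n * math.pi * t)) for t in ts)
    return num / den


for n in (1, 10, 100):
    print(n, sup_gap(n), 1 / n, sup_derivative_ratio(n))
```

The gap is largest at $x = 2^{-1}$, where it equals exactly $n^{-1}$, while the derivative ratio grows like $n\pi$, matching the unboundedness of $d/dx$.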

9.9 The result follows from the related definitions.

9.10 (i) Assume the statement after "if and only if". If $L \in \mathcal{B}(C[0,1], \mathbb{R})$, then by Theorem 9.3.4 there is a function $g$ of bounded variation on $[0,1]$ such that $L(f) = \int_0^1 f(x)\,dg(x)$ for all $f \in C[0,1]$. Note that $\sup_{j\in\mathbb{N}} \|f_j\|_\infty < \infty$. So the dominated convergence theorem yields
$$\lim_{j\to\infty} L(f_j) = \lim_{j\to\infty} \int_0^1 f_j(x)\,dg(x) = \int_0^1 f(x)\,dg(x) = L(f);$$
namely, $\{f_j\}_{j=1}^{\infty}$ converges weakly to $f$. Conversely, if $\{f_j\}_{j=1}^{\infty}$ converges weakly to $f$, then $\sup_{j\in\mathbb{N}} \|f_j\|_\infty < \infty$ by the uniform boundedness principle. For any $t \in [0,1]$ define $L_t(f) = f(t)$ for $f \in C[0,1]$. Clearly, $L_t \in \mathcal{B}(C[0,1], \mathbb{R})$, and so $f_j(t) = L_t(f_j) \to L_t(f) = f(t)$ as $j \to \infty$.

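(ii) Assume the statement after "if and only if". Let $L \in \mathcal{B}(\ell^p, \mathbb{R})$. Then there exists a $y = (y_j)_{j=1}^{\infty} \in \ell^{p(p-1)^{-1}}$ such that
$$L(x) = \sum_{k=1}^{\infty} x_k y_k \quad\text{for all } x = \{x_k\}_{k=1}^{\infty} \in \ell^p.$$
Moreover, for any $\varepsilon > 0$ there exists an $n_0 \in \mathbb{N}$ such that $n > n_0$ implies
$$\sum_{k=1}^{n_0} |s_k^{(n)} - s_k|\,|y_k| < 2^{-1}\varepsilon$$
and
$$\sum_{k=n_0+1}^{\infty} |y_k|^{p(p-1)^{-1}}$$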
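0 there is a $j_0 \in \mathbb{N}$ such that $j > j_0$ implies $\|x^{(j)} - x\|_\infty < 3^{-1}\varepsilon$. Note that $x^{(j_0+1)} \in c_0$. So there is a $k_0 \in \mathbb{N}$ such that $m, n > k_0$ implies $|x_m^{(j_0+1)} - x_n^{(j_0+1)}| < 3^{-1}\varepsilon$. Accordingly,
$$|x_m - x_n| \le |x_m - x_m^{(j_0+1)}| + |x_m^{(j_0+1)} - x_n^{(j_0+1)}| + |x_n - x_n^{(j_0+1)}| < \varepsilon.$$
Namely, $\{x_k\}_{k=1}^{\infty}$ is a Cauchy sequence in $\mathbb{R}$ and hence convergent; its limit is $0$, because $x_n^{(j_0+1)} \to 0$ as $n \to \infty$ and $|x_n| \le |x_n - x_n^{(j_0+1)}| + |x_n^{(j_0+1)}| < 3^{-1}\varepsilon + |x_n^{(j_0+1)}|$. Next, we prove that $c_0$ is separable. For $n \in \mathbb{N}$ let
$$A_n = \{x \in c_0 : x = (r_1, r_2, \dots, r_n, 0, 0, \dots),\ r_j \in \mathbb{Q}\}.$$
Then the countable set $\bigcup_{n=1}^{\infty} A_n$ is a subset of $c_0$. Now, let $x = (x_j)_{j=1}^{\infty} \in c_0$. Then for any $\varepsilon > 0$ there is an $n_0 \in \mathbb{N}$ such that $j > n_0$ implies $|x_j| < 2^{-1}\varepsilon$. Taking $r_1, \dots, r_{n_0} \in \mathbb{Q}$ such that $\sup_{1\le j\le n_0} |x_j - r_j| < 2^{-1}\varepsilon$, we obtain $y = (r_1, \dots, r_{n_0}, 0, 0, \dots) \in \bigcup_{n=1}^{\infty} A_n$ and
$$\|x - y\|_\infty \le \sup_{1\le j\le n_0} |x_j - r_j| + \sup_{j > n_0} |x_j| < \varepsilon.$$
Thus $\bigcup_{n=1}^{\infty} A_n$ is a dense subset of $c_0$, and $c_0$ is separable. Finally, if $c_0$ were reflexive, then $c_0$ would be isometrically isomorphic to $c_0^{**} = \ell^\infty$, which is impossible since $c_0$ is separable while $\ell^\infty$ is not.

(ii) It suffices to verify the necessity. For each $j \in \mathbb{N}$, let $e_j = (0, \dots, 0, 1, 0, \dots)$, with $j - 1$ zeros before the $1$. Then $e_j \in c_0$. For any $L \in \mathcal{B}(c_0, \mathbb{R})$ set $y_j = L(e_j)$. For every $n \in \mathbb{N}$ we have
$$\sum_{j=1}^{n} |y_j| = \sum_{j=1}^{n} L(e_j)\operatorname{sgn} L(e_j) = L\Big(\sum_{j=1}^{n} e_j \operatorname{sgn} L(e_j)\Big) \le \|L\| \Big\|\sum_{j=1}^{n} e_j \operatorname{sgn} L(e_j)\Big\|_\infty \le \|L\|,$$
so $\sum_{j=1}^{\infty} |y_j| \le \|L\|$; and, since $x = \sum_{j=1}^{\infty} x_j e_j$ converges in $c_0$,
$$L(x) = L\Big(\sum_{j=1}^{\infty} x_j e_j\Big) = \sum_{j=1}^{\infty} x_j y_j \quad\text{for } x = (x_k)_{k=1}^{\infty} \in c_0.$$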
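The density argument for $c_0$ in 9.13 (i) is constructive: cut the sequence off where its tail is below $2^{-1}\varepsilon$ and round the remaining entries to rationals. The sketch below carries this out for one hypothetical sample element, $x_j = \sqrt{2}/j$ (our own choice, picked so that the entries are irrational); the cutoff $n_0$ and the rounding denominator $D$ are computed for this particular $x$ only.

```python
import math
from fractions import Fraction


def x(j):
    """Hypothetical sample element of c_0: x_j = sqrt(2) / j, j >= 1."""
    return math.sqrt(2) / j


def rational_approximant(eps):
    """Rationals (r_1, ..., r_{n0}) with ||x - (r_1, ..., r_{n0}, 0, 0, ...)||_sup < eps."""
    n0 = math.ceil(2 * math.sqrt(2) / eps)  # j > n0 implies |x_j| < eps/2 for this x
    D = math.ceil(1 / eps) + 1              # rounding to a multiple of 1/D errs by at most 1/(2D) < eps/2
    return [Fraction(round(x(j) * D), D) for j in range(1, n0 + 1)]


def sup_distance(y, check_up_to):
    """max |x_j - y_j| over 1 <= j <= check_up_to, with y_j = 0 beyond len(y)."""
    return max(abs(x(j) - (float(y[j - 1]) if j <= len(y) else 0.0))
               for j in range(1, check_up_to + 1))


eps = 0.01
y = rational_approximant(eps)
print(len(y), sup_distance(y, 10 * len(y)))
```

Only finitely many indices can be inspected, so `sup_distance` checks a finite window; beyond $n_0$ the bound $|x_j| < 2^{-1}\varepsilon$ holds by the choice of $n_0$.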

9.14 The weak closure clearly contains the norm closure. So it remains to check that if $x$ is not in the norm closure of $K$, then there exists a weak neighborhood of $x$ that is disjoint from $K$. But this follows right away from the convex separation theorem presented in Problem 8.13.

9.15 (i) It suffices to consider $A \ne X^*$. Note that for $f \in A$ and $g \in X^* \setminus A$ there is a point $x_0 \in X$ such that $f(x_0) \ne g(x_0)$, and hence for $0 < \varepsilon < |f(x_0) - g(x_0)|$ there are two disjoint neighborhoods
$$O_f = \{F \in X^* : |F(x_0) - f(x_0)| < 2^{-1}\varepsilon\}, \qquad O_g = \{F \in X^* : |F(x_0) - g(x_0)| < 2^{-1}\varepsilon\}.$$

So a combination of the compactness of $A$, which is covered by $\bigcup_{f\in A} O_f$, and the last part of the argument for Example 6.3.1 (iii) completes the proof. (ii) Just follow the argument for Theorem 6.3.7 (i).

9.16 For any $x \in X$, we may suppose $T(x) \ne 0$; otherwise there is nothing to argue. Then
$$\|T(x)\|_Y = \sup_{\|f\|=1} |f \circ T(x)| \le \sup_{\|f\|=1} \|f \circ T\|\,\|x\|_X,$$
thereby obtaining $T \in \mathcal{B}(X, Y)$.

9.17 Suppose $X$ is compact. Let $\{F_\alpha\}_{\alpha\in I}$ be a family of closed subsets of $X$ with the FIP, so that any finitely many sets $\{F_j\}_{j=1}^{n}$ from $\{F_\alpha\}_{\alpha\in I}$ obey $\bigcap_{j=1}^{n} F_j \ne \emptyset$. Then $\bigcap_{\alpha\in I} F_\alpha \ne \emptyset$; otherwise $X = \bigcup_{\alpha\in I} (X \setminus F_\alpha)$, and since $X$ is compact this yields finitely many sets $\{X \setminus F_j\}_{j=1}^{n}$ such that $X = \bigcup_{j=1}^{n} (X \setminus F_j)$. Consequently, $\bigcap_{j=1}^{n} F_j = \emptyset$, a contradiction. Conversely, assume the statement after "if and only if". Suppose $X = \bigcup_{\alpha\in I} O_\alpha$, where each $O_\alpha \subseteq X$ is open. Then $\bigcap_{\alpha\in I} (X \setminus O_\alpha) = \emptyset$, and hence, by the contrapositive of the assumption, there are finitely many sets $\{X \setminus O_j\}_{j=1}^{n}$ such that $\bigcap_{j=1}^{n} (X \setminus O_j) = \emptyset$; consequently, $X = \bigcup_{j=1}^{n} O_j$. Accordingly, $X$ is compact.

9.18 Since $\mathcal{B}_c(X, Y)$ is a closed linear subspace of the Banach space $\mathcal{B}(X, Y)$, it is a Banach space, too.

9.19 The result follows from a modification of the argument for Example 9.5.7; in fact, the proof does not use the continuity of $k(\cdot, \cdot)$ on the region $\{(x, y) \in \mathbb{R}^2 : x < y,\ a \le x, y \le b\}$.

9.20 (i) Consider a bounded sequence $\{x_j\}_{j=1}^{\infty}$ in $X$. Since $T$ is compact, the sequence $\{T(x_j)\}_{j=1}^{\infty}$ has a subsequence $\{T(x_{j_k})\}_{k=1}^{\infty}$ converging to some $y$ in $X$. Then
$$\|ST(x_{j_k}) - S(y)\| \le \|S\|\,\|T(x_{j_k}) - y\| \to 0 \quad\text{as } k \to \infty.$$
This shows that $ST$ is compact. Note that $\{S(x_j)\}_{j=1}^{\infty}$ is bounded. So $\{TS(x_j)\}_{j=1}^{\infty}$ has a convergent subsequence, showing that $TS$ is compact. (ii) For any $n \in \mathbb{N}$, let $B_n^X \subseteq X$ be the open ball centered at $0$ with radius $n$. Then $T(X) = \bigcup_{n=1}^{\infty} T(B_n^X)$. Since $\overline{T(B_n^X)}$ is compact, $T(B_n^X)$ is separable by Theorem 6.3.12, and hence $T(X)$ is separable.

9.21 (i) For any bounded set $B \subseteq X$, we have $T(B) = \{0\}$, and so $T$ is compact.

(ii) Assume that $T$ is invertible. Then the identity operator $I = T^{-1}T$ is compact on $X$. But since $X$ is infinite dimensional, the unit sphere $S$ of $X$ is closed but not compact, and consequently $I(S) = S$ is not relatively compact, i.e., $I$ is not compact on $X$, a contradiction. Thus, $T$ is not invertible. (iii) Note that $T(X)$ is a closed linear subspace of the Banach space $X$. If $T$ were compact, then the closed unit ball of $T(X)$ would be compact, and hence $T(X)$ would be finite dimensional by Theorem 8.2.5 (ii), contradicting the assumption that $T(X)$ is infinite dimensional.

10 Hilbert Spaces and Their Operators

10.1 It is easy to check that the inner product $\langle f, g\rangle = \int_{-1}^{1} f(x)\overline{g(x)}\,dx$ satisfies (a), (b) and (c) of Definition 5.1.1, and so $C([-1,1], \mathbb{C})$ is an inner product space. Since the norm on $C([-1,1], \mathbb{C})$ is the 2-norm $\|f\|_2 = \big(\int_{-1}^{1} |f(x)|^2\,dx\big)^{1/2}$, we can conclude that this space is not complete. In fact, for each $j \in \mathbb{N}$ let
$$f_j(x) = \begin{cases} -1, & x \in [-1, -j^{-1}),\\ jx, & x \in [-j^{-1}, j^{-1}],\\ 1, & x \in [j^{-1}, 1]; \end{cases}$$
then $f_j \in C([-1,1], \mathbb{C})$. Observe that for each $m, n \in \mathbb{N}$ satisfying $m > n$, one has $\|f_m - f_n\|_2^2 = 2\int_0$

= 2
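0. Thus, there is a $\delta > 0$ such that $f(x) > 2^{-1} f(x_0)$ for $x \in (x_0 - \delta, x_0 + \delta)$. If $g(x) = 1 + \cos(x_0 - x) - \cos\delta$, then $g(x) > 1$ for $x \in (x_0 - \delta, x_0 + \delta)$ and $|g(x)| \le 1$ for $x \in [-\pi, \pi] \setminus (x_0 - \delta, x_0 + \delta)$. Note that $\langle f, e_j\rangle = 0$ for all $j$, and each $g^n$ is a trigonometric polynomial. So for any $n \in \mathbb{N}$,
$$0 = \langle f, g^n\rangle = \int_{-\pi}^{\pi} f(x) g^n(x)\,dx = \int_{-\pi}^{x_0-\delta} f(x) g^n(x)\,dx + \int_{x_0-\delta}^{x_0+\delta} f(x) g^n(x)\,dx + \int_{x_0+\delta}^{\pi} f(x) g^n(x)\,dx.$$
Using the properties of $g$ above, we obtain
$$\left|\int_{-\pi}^{x_0-\delta} f(x) g^n(x)\,dx\right| \le 2\pi f(x_0), \qquad \left|\int_{x_0+\delta}^{\pi} f(x) g^n(x)\,dx\right| \le 2\pi f(x_0), \qquad \int_{x_0-\delta}^{x_0+\delta} f(x) g^n(x)\,dx \ge \int_a^b f(x) g^n(x)\,dx$$
for all $[a, b] \subseteq (x_0 - \delta, x_0 + \delta)$. Since $g$ is continuous on $[a, b]$, it achieves a minimum value $\kappa > 1$ there. This implies
$$4\pi f(x_0) \ge \int_a^b f(x) g^n(x)\,dx \ge 2^{-1} f(x_0)\,\kappa^n (b - a) \to \infty \quad\text{as } n \to \infty.$$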


This is a contradiction. Thus, $f = 0$ on $[-\pi, \pi]$. If $f$ is continuous but not real-valued, then our hypothesis gives, for all $k = 0, \pm1, \pm2, \dots$,
$$\int_{-\pi}^{\pi} f(x) e^{-ikx}\,dx = 0 \quad\text{and}\quad \int_{-\pi}^{\pi} \overline{f(x)} e^{-ikx}\,dx = 0.$$
Hence
$$\int_{-\pi}^{\pi} \Re f(x)\, e_j(x)\,dx = 0 \quad\text{and}\quad \int_{-\pi}^{\pi} \Im f(x)\, e_j(x)\,dx = 0.$$
By the first part, $\Re f = 0 = \Im f$, and so $f = 0$ on $[-\pi, \pi]$. Finally, we no longer assume that $f$ is continuous. But $f$ generates a continuous function $F(x) = \int_{-\pi}^{x} f(t)\,dt$. An integration by parts yields $\int_{-\pi}^{\pi} F(x)\sin(kx)\,dx = 0$. Similarly, $\int_{-\pi}^{\pi} F(x)\cos(kx)\,dx = 0$. This implies that $F$, and $F - C$ for every constant $C$, is orthogonal to each of the non-constant members of $\{e_j\}$. For the constant member of $\{e_j\}$, let $C_0 = (2\pi)^{-1} \int_{-\pi}^{\pi} F(x)\,dx$. Then $\langle F - C_0, e_j\rangle = 0$ for all $j \in \mathbb{N}$. Since $F$ is continuous, we conclude from the first two parts that $F - C_0 = 0$. Of course, $f = F' = 0$ a.e. on $[-\pi, \pi]$.

10.7 (i) From $\langle T(f), g\rangle = \langle f, T^*(g)\rangle$ we have
$$\int_{[-\pi,\pi]} \phi f \overline{g}\,dm_{id} = \int_{[-\pi,\pi]} f\, \overline{T^*(g)}\,dm_{id},$$
and so $T^*(g) = \overline{\phi} g$ by the uniqueness of $T^*$. (ii) It is clear that if $\phi$ is real-valued, then $\overline{\phi} = \phi$, and hence $T^* = T$. (iii) If $|\phi| = 1$ $m_{id}$-a.e. on $[-\pi, \pi]$, then $T$ is unitary; if $\phi \ge 0$ $m_{id}$-a.e. on $[-\pi, \pi]$, then $\langle T(f), f\rangle \ge 0$, and hence $T$ is positive; if $\phi = 1$ or $0$ $m_{id}$-a.e. on $[-\pi, \pi]$, then $T^2(f) = \phi^2 f = \phi f = T(f)$, and hence $T$ is a projection.

10.8 If $x \in N(T)$, then $T(x) = 0$, and hence $\langle x, T^*(y)\rangle = \langle T(x), y\rangle = 0$ for all $y \in X$, i.e., $x$ is orthogonal to every element $T^*(y)$ of $R(T^*)$. This gives $x \in R(T^*)^\perp$. Conversely, if $x \in R(T^*)^\perp$, then $\langle T(x), y\rangle = \langle x, T^*(y)\rangle = 0$ for any $y \in X$, and hence $T(x) = 0$, i.e., $x \in N(T)$. Therefore $N(T) = R(T^*)^\perp$. Replacing $T$ by $T^*$, we get $N(T^*) = R(T)^\perp$ and $N(T^*)^\perp = R(T)^{\perp\perp}$. Since $R(T) \subseteq \overline{R(T)}$, the continuity of $\langle\cdot,\cdot\rangle$ and the closedness of $\overline{R(T)}$ give $R(T)^{\perp\perp} = \overline{R(T)}^{\perp\perp} = \overline{R(T)}$. Thus $N(T^*)^\perp = \overline{R(T)}$. Replacing $T^*$ by $T$, we also obtain $\overline{R(T^*)} = N(T)^\perp$.


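10.9 (i) Consider $(T - \lambda I)(x) = y$. Then $x(t) = y(t)/(t - \lambda)$, provided that $t \ne \lambda$ for all $t \in [0, 2\pi]$. So there is a unique solution $x \in C[0, 2\pi]$ for every $y$ unless $\lambda \in [0, 2\pi]$, and so $\sigma(T) = [0, 2\pi]$. (ii) The argument is similar to that for (i).

10.10 (i) For $\lambda \in \mathbb{C}$ with $|\lambda| < 1$ let $x_\lambda = (1, \lambda, \dots, \lambda^n, \dots)$. Then $x_\lambda \in \ell^2(\mathbb{N}, \mathbb{C})$ and $T(x_\lambda) = (\lambda, \lambda^2, \dots, \lambda^{n}, \dots) = \lambda x_\lambda$. This means $\{\lambda \in \mathbb{C} : |\lambda| < 1\} \subseteq \sigma(T)$, and since $\sigma(T)$ is closed, $\{\lambda \in \mathbb{C} : |\lambda| \le 1\} \subseteq \sigma(T)$. On the other hand, $\|T(x)\|_2 \le \|x\|_2$ for any $x \in \ell^2(\mathbb{N}, \mathbb{C})$, and thus $\|T\| \le 1$, giving $\sigma(T) \subseteq \{\lambda \in \mathbb{C} : |\lambda| \le \|T\|\} \subseteq \{\lambda \in \mathbb{C} : |\lambda| \le 1\}$. Therefore, $\sigma(T) = \{\lambda \in \mathbb{C} : |\lambda| \le 1\}$. (ii) Refer to the argument for (i).

10.11 The formula follows from $(\lambda I - T)^* = \overline{\lambda} I - T^*$.

10.12 (i) This follows from the fact that, since $\langle T(x), x\rangle, \lambda \in \mathbb{R}$ and $\|x\| = 1$,
$$\|T(x) - \lambda x\|^2 = \big(\lambda - \langle T(x), x\rangle\big)^2 + \|T(x)\|^2 - \langle T(x), x\rangle^2.$$
(ii) By Theorem 10.5.8 we can write $x = x_0 + \sum_{k=1}^{\infty} P_k(x)$ and $T(x) = \sum_{k=1}^{\infty} \lambda_k P_k(x)$, where $x_0 \in N(T)$ and $\{\lambda_k\}_{k=1}^{\infty}$ are the nonzero eigenvalues of $T$. Then
$$T(x) - \langle T(x), x\rangle x = \sum_{k=1}^{\infty} \big(\lambda_k - \langle T(x), x\rangle\big) P_k(x) - \langle T(x), x\rangle x_0.$$
Since $\sigma(T)$ is closed, we can choose a point $\lambda \in \sigma(T)$ nearest to $\langle T(x), x\rangle$. Every $\lambda_k$ lies in $\sigma(T)$, and so does $0$ whenever $x_0 \ne 0$; hence the Pythagorean property gives
$$\begin{aligned}
\|T(x) - \langle T(x), x\rangle x\|^2 &= \sum_{k=1}^{\infty} \big|\lambda_k - \langle T(x), x\rangle\big|^2 \|P_k(x)\|^2 + \langle T(x), x\rangle^2 \|x_0\|^2 \\
&\ge |\lambda - \langle T(x), x\rangle|^2 \Big(\sum_{k=1}^{\infty} \|P_k(x)\|^2 + \|x_0\|^2\Big) = |\lambda - \langle T(x), x\rangle|^2 \|x\|^2 = |\lambda - \langle T(x), x\rangle|^2.
\end{aligned}$$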
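A finite-dimensional analogue of 10.12 (ii) can be checked directly. The sketch below assumes a real self-adjoint operator written in an orthonormal eigenbasis, i.e. a diagonal matrix `diag(eigs)` (our own sample spectrum), so that $\sigma(T)$ is just the set of diagonal entries; it compares $\|T(x) - \langle T(x), x\rangle x\|$ with the distance from $\langle T(x), x\rangle$ to $\sigma(T)$ for random unit vectors $x$.

```python
import math
import random


def residual_and_distance(eigs, x):
    """For T = diag(eigs) and a unit vector x, return
    (||T(x) - <T(x), x> x||, distance from <T(x), x> to sigma(T) = set(eigs))."""
    tx = [lam * xi for lam, xi in zip(eigs, x)]
    r = sum(a * b for a, b in zip(tx, x))  # the number <T(x), x>
    residual = math.sqrt(sum((a - r * b) ** 2 for a, b in zip(tx, x)))
    distance = min(abs(lam - r) for lam in eigs)
    return residual, distance


def random_unit_vector(m, rng):
    """A random unit vector in R^m (normalized Gaussian sample)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(m)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


rng = random.Random(0)
eigs = [3.0, -1.0, 0.5, 0.0]
for _ in range(5):
    res, dist = residual_and_distance(eigs, random_unit_vector(len(eigs), rng))
    print(f"{res:.4f} >= {dist:.4f}")
```

Equality occurs when $x$ is an eigenvector, where both quantities vanish.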

10.13 Let $c = \int_0^1 e^y f(y)\,dy$ and let $f$ obey $T(f) = \lambda f$. Then $c e^x = \lambda f(x)$. If $\lambda \ne 0$, then $f(x) = (c/\lambda)e^x$, and hence $c = \int_0^1 e^y (c/\lambda) e^y\,dy$; since $c \ne 0$ (otherwise $f = 0$), this yields $\lambda = (e^2 - 1)/2$ and $f(x) = c' e^x$, where $c'$ is any nonzero constant. If $\lambda = 0$, then $\int_0^1 e^x f(x)\,dx = 0$, and hence the eigenvectors are all the nonzero elements in $\{e^x\}^\perp$.
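The nonzero eigenvalue $(e^2 - 1)/2$ found in 10.13 can be recovered from a discretization of $T(f)(x) = e^x \int_0^1 e^y f(y)\,dy$. The sketch below is an illustration under our own choices: a midpoint-rule matrix on $N$ nodes and plain power iteration (a swapped-in numerical method, not the book's argument). Because the discretized operator has rank one, the iterate is proportional to $e^x$ after a single step.

```python
import math


def discretize(N):
    """Nodes and the midpoint-rule matrix A[i][j] = e^{x_i + x_j} / N of T(f)(x) = e^x * int_0^1 e^y f(y) dy."""
    h = 1.0 / N
    nodes = [(i + 0.5) * h for i in range(N)]
    A = [[math.exp(xi + xj) * h for xj in nodes] for xi in nodes]
    return nodes, A


def power_iteration(A, iters=10):
    """Dominant eigenvalue and a unit eigenvector of a symmetric positive semidefinite matrix."""
    v = [1.0] * len(A)
    lam = 0.0
    for _ in range(iters):
        w = [sum(a * b for a, b in zip(row, v)) for row in A]
        norm_w = math.sqrt(sum(c * c for c in w))
        norm_v = math.sqrt(sum(c * c for c in v))
        lam = norm_w / norm_v
        v = [c / norm_w for c in w]
    return lam, v


nodes, A = discretize(200)
lam, v = power_iteration(A)
print(lam, (math.e ** 2 - 1) / 2)
```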


10.14 Since $T_1 = (T + T^*)/2$ and $T_2 = (T - T^*)/(2i)$, we use $TT^* = T^*T$ to obtain
$$T_1^2 + T_2^2 = 2^{-2}(T + T^*)(T + T^*) - 2^{-2}(T - T^*)(T - T^*) = TT^*,$$
thereby achieving $\|T_1^2 + T_2^2\| = \|TT^*\| = \|T\|^2$ and $\|T^2\|^2 = \|(T^2)^* T^2\| = \|TT^*\|^2 = \|T\|^4$, as desired.

10.15 (i) This follows from $\|(T^* - \overline{\lambda} I)(x)\|^2 = \|(T - \lambda I)(x)\|^2$. (ii) and (iii) These follow from a simple calculation via the definition.

10.16 Use $\|T(x_m) - T(x_n)\|^2 \le \|T^*T(x_m) - T^*T(x_n)\|\,\|x_m - x_n\|$.

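10.17 Given $j \in \mathbb{N}$, set $e_j = (0, \dots, 0, 1, 0, \dots)$, with $j - 1$ zeros before the $1$. For $x = \{x_j\}_{j=1}^{\infty} \in \ell^2(\mathbb{N}, \mathbb{C})$ and $n \in \mathbb{N}$ let $T_n(x) = \sum_{j=1}^{n} \big(\sum_{k=1}^{\infty} x_k a_{jk}\big) e_j$. Then $T_n$ has finite rank, and hence it is compact. Note that
$$\|(T - T_n)(x)\|_2^2 = \sum_{j=n+1}^{\infty} \Big|\sum_{k=1}^{\infty} x_k a_{jk}\Big|^2 \le \|x\|_2^2 \sum_{j=n+1}^{\infty} \sum_{k=1}^{\infty} |a_{jk}|^2$$
thanks to the Cauchy–Schwarz inequality. So
$$\|T - T_n\| \le \sqrt{\sum_{j=n+1}^{\infty} \sum_{k=1}^{\infty} |a_{jk}|^2} \to 0 \quad\text{as } n \to \infty.$$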
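The tail estimate of 10.17 can be made concrete. The sketch below uses the hypothetical sample matrix $a_{jk} = 2^{-(j+k)}$ (our own choice), for which the Hilbert–Schmidt tail has the closed form $\big(\sum_{j>n}\sum_{k\ge1} |a_{jk}|^2\big)^{1/2} = 2^{-n}/3$, and checks $\|(T - T_n)(x)\|_2 \le \|x\|_2 \cdot (\text{tail})$ for a random finitely supported $x$. All sums are truncated at index `CUTOFF`, beyond which the entries are negligible in double precision.

```python
import math
import random


def a(j, k):
    """Hypothetical sample entries a_{jk} = 2^{-(j+k)}, j, k >= 1."""
    return 2.0 ** (-(j + k))


CUTOFF = 60  # truncation index for the infinite sums


def hs_tail(n):
    """sqrt(sum_{j>n} sum_{k>=1} |a_{jk}|^2): the bound on ||T - T_n|| in 10.17 (truncated)."""
    return math.sqrt(sum(a(j, k) ** 2
                         for j in range(n + 1, CUTOFF + 1)
                         for k in range(1, CUTOFF + 1)))


def tail_norm(x, n):
    """||(T - T_n)(x)||_2 for x = (x_1, ..., x_m, 0, 0, ...), rows truncated at CUTOFF."""
    rows = [sum(xk * a(j, k) for k, xk in enumerate(x, start=1))
            for j in range(n + 1, CUTOFF + 1)]
    return math.sqrt(sum(r * r for r in rows))


rng = random.Random(1)
x = [rng.uniform(-1.0, 1.0) for _ in range(20)]
x_norm = math.sqrt(sum(c * c for c in x))
for n in (1, 3, 6):
    print(n, tail_norm(x, n), "<=", x_norm * hs_tail(n), "; closed form:", 2.0 ** -n / 3)
```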

This implies the compactness of $T$.

10.18 (i) Note that
$$K(f)(x) = \int_{[0,1]} \Big(\sum_{k=1}^{n} g_k(x) f_k(y)\Big) f(y)\,dm_{id}(y) = \sum_{k=1}^{n} \Big(\int_{[0,1]} f_k f\,dm_{id}\Big) g_k(x).$$
So the range of $K$ is finite-dimensional, and consequently $K$ is compact. (ii) Just use the formula in (i). (iii) In this case, we have
$$h_{j,k} = \begin{cases} 0, & j \ne k,\\ \int_{[0,1]} g_j^2\,dm_{id}, & j = k. \end{cases}$$

Moreover, $(h_{j,j} - \lambda)c_j = 0$ for $j = 1, 2, \dots, n$. This means that $\{h_{j,j}\}_{j=1}^{n}$ is the set of all nonzero eigenvalues, and $g_j$ is an eigenvector corresponding to $h_{j,j}$ for each $j$. It is clear that if $M = \operatorname{span}\{g_j\}_{j=1}^{n}$, then any nonzero element in $M^\perp$ is an eigenvector corresponding to $0$. (iv) Since $\cos\pi(y + x) = \cos\pi y\cos\pi x + (i\sin\pi y)(i\sin\pi x)$, we can take $g_1(x) = f_1(x) = \cos(\pi x)$ and $g_2(x) = f_2(x) = i\sin(\pi x)$, obtaining $\langle g_1, g_2\rangle = 0$ and $h_{1,1} = 2^{-1}$, $h_{2,2} = -2^{-1}$. Accordingly, $K$ has two nonzero eigenvalues $\lambda_1 = 2^{-1}$ and $\lambda_2 = -2^{-1}$, with corresponding eigenvectors $c\cos\pi x$ and $c\sin\pi x$ ($c \ne 0$), respectively. Of course, the eigenvectors corresponding to $0$ are the nonzero functions in $\{\cos\pi x, \sin\pi x\}^\perp$.

10.19 Assuming that $\lim_{j\to\infty} \|T(e_j)\| = 0$ is not true, we use the compactness of $T$ to get a subsequence $\{e_{j_k}\}_{k=1}^{\infty}$ and a nonzero $x \in X$ such that $\|T(e_{j_k}) - x\| < k^{-1}$ for every $k \in \mathbb{N}$. Letting $x_n = \sum_{k=1}^{n} k^{-1} e_{j_k}$, we find that $\{x_n\}_{n=1}^{\infty}$ is convergent in $X$, and hence $\{T(x_n)\}_{n=1}^{\infty}$ is bounded, since $\sum_{k=1}^{\infty} k^{-2} < \infty$ and $\{e_{j_k}\}_{k=1}^{\infty}$ is orthonormal; but also, by the triangle inequality,
$$\|T(x_n)\| = \Big\|\sum_{k=1}^{n} k^{-1} T(e_{j_k})\Big\| = \Big\|\sum_{k=1}^{n} k^{-1}\big(T(e_{j_k}) - x\big) + \Big(\sum_{k=1}^{n} k^{-1}\Big) x\Big\| \ge \|x\| \sum_{k=1}^{n} k^{-1} - \sum_{k=1}^{n} k^{-1}\|T(e_{j_k}) - x\| \ge \|x\| \sum_{k=1}^{n} k^{-1} - \sum_{k=1}^{n} k^{-2} \to \infty,$$
contradicting the boundedness of $\{T(x_n)\}_{n=1}^{\infty}$.
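The eigenpairs claimed in 10.18 (iv) for the kernel $\cos\pi(x + y)$ on $[0, 1]$ can be checked numerically. The sketch below is our own illustration: it approximates $K(f)(x) = \int_0^1 \cos\pi(x + y) f(y)\,dy$ by a midpoint rule on $N$ nodes and measures how far $K(\cos\pi\cdot)$ and $K(\sin\pi\cdot)$ are from $2^{-1}\cos\pi x$ and $-2^{-1}\sin\pi x$. For these trigonometric integrands the equally spaced midpoint rule happens to be exact up to rounding.

```python
import math

N = 400
h = 1.0 / N
nodes = [(i + 0.5) * h for i in range(N)]


def apply_K(f):
    """Midpoint-rule values of K(f)(x) = int_0^1 cos(pi (x + y)) f(y) dy at the nodes."""
    fy = [f(y) for y in nodes]
    return [h * sum(math.cos(math.pi * (x + y)) * fv for y, fv in zip(nodes, fy)) for x in nodes]


def max_eigen_defect(f, lam):
    """Max over the nodes of |K(f)(x) - lam * f(x)|; near 0 means (lam, f) is an eigenpair."""
    return max(abs(kf - lam * f(x)) for kf, x in zip(apply_K(f), nodes))


def cos_pi(t):
    return math.cos(math.pi * t)


def sin_pi(t):
    return math.sin(math.pi * t)


print(max_eigen_defect(cos_pi, 0.5), max_eigen_defect(sin_pi, -0.5))
```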

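10.20 (i) Suppose that $\{f_k\}_{k=1}^{\infty}$ is an orthonormal basis of $Y$. Then the result follows from
$$\sum_{j=1}^{\infty} \|T(e_j)\|_Y^2 = \sum_{j=1}^{\infty} \sum_{k=1}^{\infty} |\langle T(e_j), f_k\rangle_Y|^2 = \sum_{k=1}^{\infty} \sum_{j=1}^{\infty} |\langle e_j, T^*(f_k)\rangle_X|^2 = \sum_{k=1}^{\infty} \|T^*(f_k)\|_X^2.$$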
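The identity in 10.20 (i) can be seen in finite dimensions, where $T^*$ is the conjugate transpose. The sketch below takes a sample matrix $T : \mathbb{C}^3 \to \mathbb{C}^2$ (our own choice), the standard basis of $\mathbb{C}^3$ and a rotated orthonormal basis of $\mathbb{C}^2$, and compares $\sum_j \|T(e_j)\|^2$ with $\sum_k \|T^*(f_k)\|^2$; the rotated basis shows that the value does not depend on which orthonormal basis of $Y$ is used.

```python
import math


def apply(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]


def adjoint(M):
    """Conjugate transpose: the matrix of T* for the standard inner products."""
    return [[M[i][j].conjugate() for i in range(len(M))] for j in range(len(M[0]))]


def norm_sq(v):
    """Squared Euclidean norm of a complex vector."""
    return sum(abs(c) ** 2 for c in v)


T = [[1 + 2j, 0.5, -1j],
     [3.0 + 0j, -2 + 1j, 0.25]]                        # sample T : C^3 -> C^2
e = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]                  # orthonormal basis of X = C^3
s = 1 / math.sqrt(2)
f = [[s, s], [s, -s]]                                  # non-standard orthonormal basis of Y = C^2

lhs = sum(norm_sq(apply(T, ej)) for ej in e)           # sum_j ||T(e_j)||^2
rhs = sum(norm_sq(apply(adjoint(T), fk)) for fk in f)  # sum_k ||T*(f_k)||^2
print(lhs, rhs)
```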



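(ii) For each $k \in \mathbb{N}$ and $x = \sum_{j=1}^{\infty} \langle x, e_j\rangle_X e_j \in X$ let
$$T_k(x) = T\Big(\sum_{j=1}^{k} \langle x, e_j\rangle_X e_j\Big) = \sum_{j=1}^{k} \langle x, e_j\rangle_X T(e_j).$$
Then $T_k$ is a finite-rank, hence compact, operator from $X$ to $Y$. This fact, together with
$$\|(T_k - T)(x)\|_Y \le \sum_{j=k+1}^{\infty} |\langle x, e_j\rangle_X|\,\|T(e_j)\|_Y \le \Big(\sum_{j=k+1}^{\infty} |\langle x, e_j\rangle_X|^2\Big)^{1/2} \Big(\sum_{j=k+1}^{\infty} \|T(e_j)\|_Y^2\Big)^{1/2} \le \|x\|_X \Big(\sum_{j=k+1}^{\infty} \|T(e_j)\|_Y^2\Big)^{1/2},$$
yields
$$\|T_k - T\| \le \Big(\sum_{j=k+1}^{\infty} \|T(e_j)\|_Y^2\Big)^{1/2} \to 0 \quad\text{as } k \to \infty,$$
whence the compactness of $T$.

(iii) Choosing any orthonormal basis $\{e_j\}_{j=1}^{\infty}$ in $L^2_{RS_{id}}((c, d), \mathbb{C})$, we get
$$T(e_j)(x) = \int_{(c,d)} k(x, y) e_j(y)\,dm_{id}(y) = \langle k(x, \cdot), \overline{e_j}\rangle_{(c,d)},$$
where the right-hand inner product is taken in $L^2_{RS_{id}}((c, d), \mathbb{C})$. Since $\{\overline{e_j}\}_{j=1}^{\infty}$ is also orthonormal, Bessel's inequality gives
$$\begin{aligned}
\sum_{j=1}^{\infty} \|T(e_j)\|^2 &= \lim_{k\to\infty} \sum_{j=1}^{k} \|T(e_j)\|^2 = \lim_{k\to\infty} \sum_{j=1}^{k} \int_{(a,b)} |\langle k(x, \cdot), \overline{e_j}\rangle_{(c,d)}|^2\,dm_{id}(x) \\
&= \lim_{k\to\infty} \int_{(a,b)} \sum_{j=1}^{k} |\langle k(x, \cdot), \overline{e_j}\rangle_{(c,d)}|^2\,dm_{id}(x) \le \int_{(a,b)} \int_{(c,d)} |k(x, y)|^2\,dm_{id}(y)\,dm_{id}(x) < \infty.
\end{aligned}$$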

10.21 (i) Let $M = \sup_{k\in\mathbb{N}} |c_k|$. If $M < \infty$, then $\|T(x)\|_Y^2 \le M^2 \|x\|_X^2$ for any $x \in X$, and hence $T$ is bounded. Conversely, if $T$ is bounded but $M = \infty$, then for any $n \in \mathbb{N}$ there is a $k_n$ such that $|c_{k_n}| \ge n$, and hence $\|T(e_{k_n})\|_Y = \|c_{k_n} f_{k_n}\|_Y = |c_{k_n}| \ge n$, contradicting the boundedness of $T$. (ii) Let $\lim_{k\to\infty} |c_k| = 0$. For each $n \in \mathbb{N}$ and $x \in X$ define $T_n(x) = \sum_{k=1}^{n} c_k \langle x, e_k\rangle_X f_k$. Then $T_n$ has finite rank and is therefore compact. Since $\|(T - T_n)(x)\|_Y^2 = \sum_{k>n} |c_k|^2 |\langle x, e_k\rangle_X|^2 \le \sup_{k>n} |c_k|^2\, \|x\|_X^2$, we get $\lim_{n\to\infty} \|T_n - T\| = 0$. So $T$ is compact. Conversely, if $T$ is compact but $(|c_k|)_{k=1}^{\infty}$ does not approach $0$, then there are an $\varepsilon_0 > 0$ and a subsequence $\{c_{k_j}\}_{j=1}^{\infty}$ such that $|c_{k_j}| \ge \varepsilon_0$ for all $j \in \mathbb{N}$. Consequently, $j, l \in \mathbb{N}$ with $j \ne l$ implies
$$\|T(e_{k_j}) - T(e_{k_l})\|_Y^2 = \|c_{k_j} f_{k_j} - c_{k_l} f_{k_l}\|_Y^2 = |c_{k_j}|^2 + |c_{k_l}|^2 \ge 2\varepsilon_0^2,$$
and so no subsequence of $\{T(e_{k_j})\}_{j=1}^{\infty}$ is convergent, i.e., $T$ is not compact, a contradiction. (iii) This follows from $T(e_j) = c_j f_j$. (iv) Since $f_k \in R(T)$ for any $k \in \mathbb{N}$ with $c_k \ne 0$, the desired equivalence follows from the linear independence of $\{f_k\}_{k=1}^{\infty}$.
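The key estimate in 10.21 (ii) is $\|T - T_n\| \le \sup_{k>n} |c_k|$. The sketch below illustrates it for the hypothetical multipliers $c_k = 2^0/k = 1/k$ (our own choice), working only with the coordinates $\langle x, e_k\rangle_X$ of a finitely supported $x$, since orthonormality of $\{f_k\}$ turns $\|(T - T_n)(x)\|_Y$ into a weighted $\ell^2$ norm of those coordinates. For decreasing $|c_k|$ the bound is attained at $x = e_{n+1}$.

```python
import math
import random


def c(k):
    """Hypothetical sample multipliers c_k = 1/k (k >= 1), which tend to 0."""
    return 1.0 / k


def tail_action_norm(x, n):
    """||(T - T_n)(x)||_Y for T(x) = sum_k c_k <x, e_k> f_k with {e_k}, {f_k} orthonormal;
    x is given by its coordinates <x, e_k>, k = 1, ..., len(x)."""
    return math.sqrt(sum((c(k) * xk) ** 2 for k, xk in enumerate(x, start=1) if k > n))


rng = random.Random(2)
m = 50
for n in (1, 5, 20):
    x = [rng.gauss(0.0, 1.0) for _ in range(m)]
    ratio = tail_action_norm(x, n) / math.sqrt(sum(t * t for t in x))
    print(n, ratio, "<=", c(n + 1))  # sup_{k>n} |c_k| = c(n + 1) for these multipliers
```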


About the Author

Jie Xiao
University Research Professor, Memorial University

Education-Employment-Honor:
o PhD Peking, 1992; Alexander von Humboldt Research Fellow, 1999-2000;
o Memorial's Dean of Science Distinguished Scholar Medal, 2009;
o Memorial's University Research Professor, 2016-present;
o Recognized on Stanford University List of World's Top 2% Scientists, 2020

199 Publications including:
o 9 papers in Adv. Math.;
o 4 papers in Math. Ann.;
o 5 papers in Calc. Var.;
o 2 papers in J. Math. Pures Appl.;
o 1 paper in J. Euro. Math. Soc.;
o 1 paper in Ann. Inst. H. Poincaré Anal. Non Linéaire;
o 3 monographs;
o 2 textbooks

Services including:
o Editor-in-Chief, Advances in Analysis and Geometry, 2018-present;
o Editor-in-Chief of the Canadian Mathematical Bulletin, 2014-2019;
o Associate Editor of the Journal of Mathematical Analysis and Applications, 2007-present

Index ¯-measurable, 127 σ-ring, 110 g-null set, 71 $m_g$-measurable, 107 $m_g$-null function, 98 $m_g$-null set, 98 $g_1 \otimes g_2$-measure, 113 absolutely convergent, 80, 256 adjoint operator, 290, 311 admissible sequences, 90 affine subset, 227 Alaoglu's theorem, 286 algebraic closure, 33, 63 algebraically isomorphic, 233 angle, 301 antisymmetry, 8 Arzela-Ascoli theorem, 210 Ascoli's theorem, 212 associativity, 10 axiom of choice, 10 Banach algebra, 259, 292 Banach fixed point theorem, 212 Banach-Zarecki theorem, 151 basis, 228, 308 Bessel's inequality, 308 Bonnet's form, 77 braces, 1

cardinality, 6 Cartesian product, 3, 238 Cauchy's convergence criterion, 45 Cauchy's criterion, 29 Cauchy's sequence, 17 Cauchy-Schwarz inequality, 34 chain, 8 Chebychev polynomials, 306 closed graph theorem, 266 closed set, 13 cluster point, 13 codimension, 230 commutativity, 10 compact metric space, 188, 204 compact self-adjoint, 325 comparison test, 46 completeness of $\mathbb{R}$, 18 complex linear space, 227 composition, 69 conditionally convergent, 80 connected metric space, 207 continuous spectrum, 322 convergence theorem in $L^p_{RS_g}(\mathbb{R})$, 122 convolution, 50, 292 countable additivity, 89 countably additive, 143 countably subadditive, 110

Darboux characterization of integrability, 32 Darboux's criterion, 29 DeMorgan law, 2 differential operator, 322 Dirac sequence, 50 direct sum, 302 disconnected, 179 distributivity, 10 domain, 4 dominated convergence, 100 eigenspace, 326 eigenvalue, 322 eigenvector, 322 equivalence relation, 3 equivalent, 218, 234 existence of inverses, 10 existence of neutrals, 10 factor space, 230 Fatou's lemma, 104 finite intersection property, 285 finite rank, 321 finite subcover, 187 first and second mean value theorem, 75 first category, 194 four one-sided derivatives, 144 Fourier series, 309 Fourier transform, 318 Fredholm integral equation of the second kind, 213 Fubini-Tonelli's theorem, 114 fundamental theorem of Lebesgue integration, 151 Gamma function, 49

general fixed point theorem, 214 general form of Minkowski's inequality, 120 generalized Cauchy-Schwarz inequality, 315 generalized mean value theorem, 41 Gram-Schmidt process, 305 Hölder inequality, 118 Hahn-Banach extension theorem, 244 Hanner's inequality, 273 Hardy-Littlewood's maximal theorem, 166 Hermite polynomials, 306 homogeneity, 228 image, 4 imaginary, 2 infima, 9 infinite-dimensional, 228 infinitely differentiable, 226 injective, 4 inner product, 297 integrand, 61 integrator, 61 interior point, 13 inverse mappings, 266 isometric, 234 isometrically isomorphic, 218 Jacobi polynomials, 306 kernel, 51 Krein's extension theorem, 253 Laguerre polynomials, 306 Laplace transform, 55 layer cake representation, 125

Lebesgue's decomposition theorem, 170 Lebesgue's differentiation theorem, 165 Lebesgue's integration-by-part formula, 157 Lebesgue's property, 193 Lebesgue's theorem on the derivative, 146 Lebesgue-Radon-Stieltjes space, 121 Legendre polynomials, 306 linear span, 227 Lipschitz class, 148 Lipschitz condition, 216 LUB-axiom, 11 maximal ideal, 10 Minkowski's inequality, 118 monotone convergence, 103 monotone convergence theorem, 17 nested downward (respectively upward) sequence, 13 nonnegative operator, 315 normal operator, 319 nowhere dense, 194 objects, 1 open mapping theorem, 264 open set, 13 orthogonal, 301 orthogonal complement, 301 orthogonal projection, 321 orthonormal basis, 308 parallelogram law, 298 Parseval's identity, 309

partition, 21 Picard's existence and uniqueness theorem, 216 point spectrum, 322 polynomial approximation, 50 pre-image, 4 product topological space, 285 projection, 320 projection operator, 320 range, 4 real linear space, 227 rectangle, 113 rectifiable, 83 refinement, 30 reflexive, 243, 287 reflexivity, 3, 8 relation, 3 residual spectrum, 322 Riemann integrable, 22 Riemann sum, 22 Riemann-Stieltjes integrable, 61 Riemann-Stieltjes sum, 61 Riesz representation theorem, 277, 303 second category, 194 self-adjoint or symmetric operator, 313 self-reflexive, 304 semi-norm, 228 similar, 218 simple sets, 88 spectral radius, 322 spectrum, 322 strongly convergent, 283 summable functions, 90 suprema, 9

surjective, 4 symmetry, 3 The Bolzano-Weierstrass property, 14, 184 The bounded sequence property, 184 The Heine-Borel property, 14, 184 The nested interval property, 14 The nested set property, 184 topologically isomorphic, 234 topologically isomorphic or homeomorphic, 218 total/linear order, 8 totally bounded, 189 totalness, 8 transitivity, 3, 8 triangle inequality, 228 Tychonoff's theorem, 285 uniform boundedness principle, 261 uniform convergence, 208 uniform limit, 289 uniformly continuous, 36 uniformly isomorphic, 218 unit sphere, 237 unitary, 317 unitary operator, 318 variation, 57 Volterra integral equation, 215 weak compactness, 287 weak topology, 282 weak* closed, 284 weak* compact, 284 weak* convergent, 283

weak* open, 284 weakly closed, 282 weakly compact, 282 weakly convergent, 283 weakly open, 282 Weierstrass approximation, 52 Zorn's lemma, 9