# Fourier Series, Fourier Transforms, and Function Spaces: A Second Course in Analysis (AMS/MAA Textbooks, Vol. 59)

Fourier Series, Fourier Transforms, and Function Spaces is designed as a textbook for a second course or capstone course in analysis.


English · 354 pages · 2020

Cover
Title page
Contents
Introduction
Chapter 1. Overture
1.1. Mathematical motivation: Series of functions
1.2. Physical motivation: Acoustics
Part 1 Complex functions of a real variable
Chapter 2. Real and complex numbers
2.1. Axioms for the real numbers
2.2. Complex numbers
2.3. Metrics and metric spaces
2.4. Sequences in C and other metric spaces
2.5. Completeness in metric spaces
2.6. The topology of metric spaces
Chapter 3. Complex-valued calculus
3.1. Continuity and limits
3.2. Differentiation
3.3. The Riemann integral: Definition
3.4. The Riemann integral: Properties
3.5. The Fundamental Theorem of Calculus
3.6. Other results from calculus
Chapter 4. Series of functions
4.1. Infinite series
4.2. Sequences and series of functions
4.3. Uniform convergence
4.4. Power series
4.5. Exponential and trigonometric functions
4.6. More about exponential functions
4.7. The Schwartz space
4.8. Integration on R
Part 2 Fourier series and Hilbert spaces
Chapter 5. The idea of a function space
5.1. Which clock keeps better time?
5.2. Function spaces and metrics
5.3. Dot products
Chapter 6. Fourier series
6.1. Fourier polynomials
6.2. Fourier series
6.3. Real Fourier series
6.4. Convergence of Fourier series of differentiable functions
Chapter 7. Hilbert spaces
7.1. Inner product spaces
7.2. Normed spaces
7.3. Orthogonal sets and bases
7.4. The Lebesgue integral: Measure zero
7.5. The Lebesgue integral: Axioms
7.6. Hilbert spaces
Chapter 8. Convergence of Fourier series
8.1. Fourier series in 𝐿²(𝑆¹)
8.2. Convolutions
8.3. Dirac kernels
8.4. Proof of the Inversion Theorem
8.5. Applications of Fourier series
Part 3 Operators and differential equations
Chapter 9. PDEs and diagonalization
9.1. Some PDEs from classical physics
9.2. Schrödinger’s equation
9.3. Diagonalization
Chapter 10. Operators on Hilbert spaces
10.1. Operators on Hilbert spaces
10.2. Hermitian and positive operators
10.3. Eigenvectors and eigenvalues
10.4. Eigenbases
Chapter 11. Eigenbases and differential equations
11.1. The heat equation on the circle
11.2. The eigenbasis method
11.3. The wave equation on the circle
11.4. Boundary value problems
11.5. Legendre polynomials
11.6. Hermite functions
11.7. The quantum harmonic oscillator
11.8. Sturm-Liouville theory
Part 4 The Fourier transform and beyond
Chapter 12. The Fourier transform
12.1. The big picture
12.2. Convolutions, Dirac kernels, and calculus on R
12.3. The Fourier transform on 𝒮(𝐑)
12.4. Inversion and the Plancherel theorem
12.5. The 𝐿² Fourier transform
Chapter 13. Applications of the Fourier transform
13.1. A table of Fourier transforms
13.2. Linear differential equations with constant coefficients
13.3. The heat and wave equations on R
13.4. An eigenbasis for the Fourier transform
13.5. Continuous-valued quantum observables
13.6. Poisson summation and theta functions
13.7. Miscellaneous applications of the Fourier transform
Chapter 14. What’s next?
14.1. What’s next: More analysis
14.2. What’s next: Signal processing and distributions
14.3. What’s next: Wavelets
14.4. What’s next: Quantum mechanics
14.5. What’s next: Spectra and number theory
14.6. What’s next: Harmonic analysis on groups
Appendices
Appendix A. Rearrangements of series
Appendix B. Linear algebra
Appendix C. Bump functions
Appendix D. Suggestions for problems
Bibliography
Index of Selected Notation
Index
Back Cover


AMS/MAA Textbooks, Vol. 59

Fourier Series, Fourier Transforms, and Function Spaces: A Second Course in Analysis

Tim Hsu

MAA Textbooks Editorial Board

Stanley E. Seltzer, Editor

Matthias Beck · Debra Susan Carney · Heather Ann Dye · William Robert Green · Suzanne Lynne Larson · Michael J. McAsey · Virginia A. Noonburg · Thomas C. Ratliff · Jeffrey L. Stuart · Ron D. Taylor, Jr. · Elizabeth Thoren · Ruth Vanderpool

2010 Mathematics Subject Classification. Primary 26-01, 42-01.

Library of Congress Cataloging-in-Publication Data
Names: Hsu, Tim (Timothy Ming-Jeng), 1969- author.
Title: Fourier series, Fourier transforms, and function spaces : a second course in analysis / Tim Hsu.
Description: Providence, Rhode Island : MAA Press, an imprint of the American Mathematical Society, [2020] | Series: AMS/MAA textbooks ; volume 59 | Includes bibliographical references and index.
Identifiers: LCCN 2019040897 | ISBN 9781470451455 (hardback) | ISBN 9781470455194 (ebook)
Subjects: LCSH: Fourier analysis. | Fourier series. | Fourier transformations. | Function spaces. | AMS: Real functions [See also 54C30] – Instructional exposition (textbooks, tutorial papers, etc.). | Harmonic analysis on Euclidean spaces – Instructional exposition (textbooks, tutorial papers, etc.).
Classification: LCC QA403.5 .H785 2020 | DDC 515/.2433–dc23
LC record available at https://lccn.loc.gov/2019040897

Visit the AMS home page at https://www.ams.org/

For my parents, Yu Kao and Martha Hsu


Introduction

Life is uncertain. Eat dessert first. — Ernestine Ulmer

This book gives a different answer to the question: What should be covered in a second undergraduate course in analysis?

One standard approach to undergraduate Analysis II is to study the theory of integration (either Riemann, Lebesgue, or both). This has the virtue of preparing students for graduate study, or even giving them a head start on their graduate classes. However, for students not headed to a doctoral program, and especially for students thinking about Analysis II as the last analysis class they’ll ever take (or even the last math class they’ll ever take), studying the theory of integration runs the risk of getting involved in a significant amount of technical detail without ever seeing why we would find Lebesgue integration useful. Instead, in this course, we eat dessert first, with the following benefits:

• Applications: One of our main goals is to show that the theory you learn in analysis can be used to solve “applied” problems rigorously in many different subjects, ranging from partial differential equations, to mathematical quantum mechanics, to signal processing (and also number theory, if you’re willing to call that an application).

• Theory: Nevertheless, this is definitely a theory course, in which everything is proven rigorously (with one major exception; see below). Furthermore, students will encounter big, fundamental ideas that show up everywhere in mathematics (and statistics!): metrics, function spaces, problems in convergence, solving analysis problems in terms of linear algebra, and different kinds of approximations.

• Future motivation: And finally, even though the design of this book comes from thinking about “the last analysis course you’ll ever take,” students who continue on to graduate study in analysis will actually learn things that are complementary to their first graduate analysis courses, in that they’ll learn why the Lebesgue integral is necessary before they dig in to see how it’s defined.

Of course, there’s a catch: To get to dessert, we skip the dinner of developing the theory of the Lebesgue integral. Instead, we axiomatize the properties that we need it to have and simply stipulate that it exists; see Section 7.5 if you want to check out how this works. Note that we lose no applications, since all of the concrete calculations we need can be done using Riemann integration, and the reader looking for rigor can fill in the (considerable!) gap later in a course on Lebesgue integration.

We now briefly describe the rest of this book. After the equivalent of the overture of a musical or opera (Chapter 1), Part 1 of the book proper can be thought of as a “reboot” of Analysis I: We review the fundamental theory of functions of one real variable, but we revamp the material to allow complex-valued outputs (and occasionally, inputs). Sometimes the content on real-valued functions carries over almost intact, and sometimes we have to repeat or apply the old arguments twice, but the punchline is that we obtain much the same results, with greater generality. Note that of the material in Part 1, Chapter 4 is the chapter most likely to be new to the reader and is also the chapter referred to most often in the rest of the book, as one of our main concerns is the convergence of series of functions.

Part 2 starts our new material by asking the question: How can we determine the “best” approximation of some particular type to a specified function? This leads us to framing the approximation problem not just in terms of (pointwise) convergence, but in terms of function spaces and the 𝐿² metric. We then present one solution to the approximation problem in the form of Fourier series, which are, in a sense to be made precise, the best possible approximations to a (sufficiently nice) periodic function as an infinite series of trigonometric functions.

In Part 3, we apply the theory of function spaces and Fourier series to solve problems from partial differential equations and quantum mechanics. For much of this material, the basic idea is to express derivatives and integrals in terms of operators on function spaces and to express a given problem in terms of linear algebra, or, specifically, as an eigenvalue problem.

Part 4 gives an introduction to the Fourier transform, which one can think of as a continuous analogue of Fourier series. After establishing the fundamental theory, we return to applications, including second looks at PDEs and quantum mechanics and a look at the interplay between the Fourier transform and Fourier series. We conclude the book with a brief survey of what the reader might choose to learn next.

About problems in this book.

Q: What did one math book say to the other math book?
A: I’ve got a lot of problems.
— National Geographic Kids Almanac 2017

The reader will quickly notice that many, even most, of the results in this book do not come with proofs. Instead, the proofs are left to be done either by the reader or in class, by the instructor; indeed, proving important results of the book is the goal of the great majority of its 400 problems. The idea is that students can cement their mastery of Analysis I by applying it to build what comes next. To suggest some ground rules:

• Problems that prove results of the text are marked with a note like (Proves Theorem x.y.z). To maintain logical consistency, students should only use results appearing before Theorem x.y.z in the book in the proof of Theorem x.y.z. An exception to this rule comes when, for motivational purposes, we introduce the statement of a theorem long before its proof; in those cases, the reader can use results up to the point where the proof occurs, except, of course, for the result to be proven.

• For the most part, we provide structured approaches for solving problems, and Appendix D also has suggestions for key ideas to apply in many problems. We hope we have included enough scaffolding and suggestions to make the problems tractable for students who understand Analysis I well; we especially hope that readers using this book for self-study find the suggestions to be helpful.

• On the other hand, some problems, marked by (*), have less scaffolding and have no corresponding suggestions in Appendix D. These starred problems are not necessarily more difficult per se, though some of them definitely take some work and a few are more open-ended, but they do require more independence on the part of the reader. Note that nearly all of the problems in Sections 2.6, 8.5, 11.3, 11.5–11.6, 13.3–13.4, and 13.6–13.7 are starred, and note that those sections are mostly not required for later material. Therefore, once the main theory is established, instructors looking to challenge students’ understanding and independence can select a few topics from these sections to cover, possibly only lightly, and then assign some of the accompanying starred problems.

How to use this book in a course.

One reasonable approach to using the problems in this book in a lecture-based class is to do a few proofs (i.e., a few problems) from a given section in lecture, as models; assign some problems as homework, perhaps saving some for in-class exams; and omit the others for the sake of time. The structure of the book also makes it well suited for an undergraduate capstone course or an independent study/reading course.

The amount of the book that can be covered in one semester depends on how much background the instructor chooses to rely on, as this book is designed to be accessible to students with a range of preparation from Analysis I. Specifically, most first courses will have covered through differentiation and the Mean Value Theorem (Section 3.2), albeit only in 𝐑 and not in 𝐂; still others will have reached the Fundamental Theorems of Calculus (Section 3.5); and ambitious courses may have covered series of functions, perhaps up to power series (Section 4.4). In any case, we recommend covering Chapters 2 and 3 at least at the level of review, so students can become familiar with complex-valued versions of the usual Analysis I material, and again, Chapter 4 is likely to be less familiar to even well-prepared students.

With that in mind, even if the review material is included, one should still be able to get to the heat and wave equations in Chapter 11 comfortably. Alternatively, one should also be able to cover the Fourier transform (Chapter 12) and possibly some of the further applications (Chapter 13) by either covering the early material more lightly (for students with more background) or by reducing the amount of coverage of Chapters 9–11; indeed, from Chapters 9–11, really only Section 10.1 is necessary for Chapters 12 and 13. A third option would be to cover only Parts 1 and 2 (i.e., Chapters 1–8), but in greater depth, and treat the Inversion Theorem for Fourier Series as the culmination of the course; one could then also spend more time on optional material like Sections 2.6 and 8.5.

As for which sections can be skipped without harming subsequent material, from Part 1, Section 2.6 is optional, and Sections 4.7 and 4.8 are only used for the Fourier transform. From Part 2, all of Section 8.5 except Subsection 8.5.1 is optional. In Part 3, Sections 11.1 and 11.2 establish the basic methods of Chapter 11, and after that, instructors or readers can choose applications suited to their interests. Similarly, in Part 4, the applications in Chapter 13 are fairly independent of each other, as are the sections of Chapter 14.

Acknowledgements.

I would first like to thank my students who have used early versions of this book over the years. I would particularly like to thank the students who made helpful suggestions about exposition and content, including Jeff Cavallaro, Charles Petersen, Roy Araiza, James Horine, Albert Chang, Liliana Gonzalez, Khai Le, and Dylan ArceJaeger. Extra thanks to Paul Aoki for a very thorough list of suggestions. Thanks also to my physics colleague Ken Wharton for sanity-checking the material on quantum mechanics, and thanks to William Green of the Rose-Hulman Institute of Technology and my colleague Dashiell Fryer for test-driving this book in their own classes. And thanks to MAA/AMS editors Stephen Kennedy, Stan Seltzer, and Christine Thivierge, and the anonymous reviewers of this book, for about as great an editorial process as I could ask for. Finally, thanks especially to my family for their patience and support through the long process of writing this book.

1 Overture

In this chapter, we briefly introduce some of the main themes found throughout this book. Specifically, in Section 1.1, we introduce some mathematical problems that motivate what we study in this book, and in Section 1.2, we introduce just one of the physical applications that motivate what we study.

1.1 Mathematical motivation: Series of functions

As you may have seen in Analysis I, or in calculus, the exponential function 𝑒^𝑥 is equal to the following infinite series for all real 𝑥:

$$e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!} = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots. \tag{1.1.1}$$
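As a quick numerical sanity check (our own illustration, not part of the text; the helper name `exp_partial` is hypothetical), one can watch the partial sums of (1.1.1) approach the exponential function:

```python
import math

def exp_partial(x, N):
    """Partial sum 1 + x + x^2/2! + ... + x^N/N! of the series (1.1.1)."""
    return sum(x ** n / math.factorial(n) for n in range(N + 1))

# The partial sums at x = 1 approach e = 2.71828...
for N in (2, 5, 10, 20):
    print(N, exp_partial(1.0, N))
```

Already at N = 20 the partial sum agrees with math.exp(1.0) to roughly machine precision, since the tail of the series is bounded by a rapidly shrinking factorial term.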

If you’ve forgotten from analysis (or calculus) what it means for a function to be equal to an infinite series, we’ll go over all of that again in Section 4.1; in fact, in Section 4.5, we will actually use (1.1.1) to give a rigorous definition of the exponential function. For now, it’s enough to remember that an infinite series is the limit of its partial sums, that is, the limit as 𝑁 → ∞ of the sum of its terms up through term number 𝑁. In some sense, one main goal of this course is to make sense of the somewhat similar-looking

$$x = \sum_{n=1}^{\infty} (-1)^{n+1} \frac{\sin(2\pi n x)}{n\pi} = \frac{\sin(2\pi x)}{\pi} - \frac{\sin(4\pi x)}{2\pi} + \frac{\sin(6\pi x)}{3\pi} - \frac{\sin(8\pi x)}{4\pi} + \cdots, \tag{1.1.2}$$

which, as it turns out, holds for −1/2 < 𝑥 < 1/2. However, at this point, there are several questions you might ask about (1.1.2):

• Why on earth would you want to replace the function 𝑥 with the stuff on the right-hand side of (1.1.2)?

• If you think back to what you know about trig functions, the right-hand side of (1.1.2) is periodic with period 1; how could it be equal to 𝑥? Moreover, it can be shown (Problem 1.1.1) that the right-hand side must be equal to 0 when 𝑥 is any integer multiple of 1/2. (At least that explains why (1.1.2) fails for 𝑥 = ±1/2.)

• How on earth can we prove something like (1.1.2)?

Figure 1.1.1. The two sides of (1.1.2), with partial sums 10, 15, and 20

See Section 1.2 for an application that gives one answer to the first question. An answer to the second question, and another partial answer to the first, comes from comparing the graphs of 𝑥 and the partial sums of the series in (1.1.2), as shown in Figure 1.1.1. We see that the mysterious series of (1.1.2) is not really approximating the function 𝑓(𝑥) = 𝑥; it’s really approximating the function you get by taking 𝑓(𝑥) = 𝑥 for −1/2 < 𝑥 < 1/2 and repeating it along the real line with period 1.

The last question is harder, so for the moment, we’ll answer it with a question few people would ask at this point:

• In what sense do we mean “=” in (1.1.2)?

The idea that there might be different ways to say that functions are equal, or approximately equal, is one of the central ideas of this book and also leads to the very useful idea of function space that is the focus of Part 2. But we’re getting ahead of ourselves here, so let’s return to another motivating problem.

Again, as you may have seen in Analysis I, or calculus, we can take the derivative of the right-hand side of (1.1.1) using term-by-term differentiation:

$$\frac{d}{dx}\left(\sum_{n=0}^{\infty} \frac{x^n}{n!}\right) = \sum_{n=0}^{\infty} \frac{d}{dx}\left(\frac{x^n}{n!}\right) = \sum_{n=1}^{\infty} \frac{x^{n-1}}{(n-1)!} = \sum_{k=0}^{\infty} \frac{x^k}{k!} = e^x. \tag{1.1.3}$$


Note that the 𝑛 = 0 term drops out because we are taking the derivative of a constant, and we make the substitution 𝑘 = 𝑛 − 1 to get the final equality. In any case, assuming we can push 𝑑/𝑑𝑥 into an infinite sum the same as we can push 𝑑/𝑑𝑥 into a finite sum, we see that 𝑒^𝑥 is its own derivative, or in other words, 𝑒^𝑥 is a solution to the differential equation 𝑦′ = 𝑦.

As we will see in Part 3, one of the main applications of series like the one in (1.1.2) is in solving differential equations, which means that it would be very helpful to have term-by-term differentiation available to us. However, operations that work well with power series can go very wrong with other series. For example, if we try to take the derivative of the series in (1.1.2) term by term, we get

$$\frac{d}{dx}\sum_{n=1}^{\infty} (-1)^{n+1}\left(\frac{\sin(2\pi n x)}{n\pi}\right) \stackrel{?}{=} \sum_{n=1}^{\infty} (-1)^{n+1}\,\frac{d}{dx}\left(\frac{\sin(2\pi n x)}{n\pi}\right) = \sum_{n=1}^{\infty} (-1)^{n+1}\, 2\cos(2\pi n x). \tag{1.1.4}$$

As it turns out, the right-hand side of (1.1.4) diverges for every value of 𝑥 (see Problem 1.1.2); in any case, it certainly bears no resemblance to the “correct” derivative of 1 (for 𝑥 not an odd integer multiple of 1/2).

The moral here is that even if we have a series expansion for a function, as in (1.1.2), there is no reason to think we can take term-by-term derivatives of that series and still be sure that everything works. Again, finding conditions under which series like (1.1.2) are “durable” enough to survive taking derivatives is another one of the central problems of this course.
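Both phenomena are easy to see numerically. The sketch below (our own illustration; the helper names are not from the text) evaluates partial sums of (1.1.2) at a point of (−1/2, 1/2), where they settle down toward 𝑥, and partial sums of the term-by-term “derivative” (1.1.4), which just keep oscillating:

```python
import math

def sawtooth_partial(x, N):
    """Partial sum of (1.1.2): sum of (-1)^(n+1) sin(2 pi n x)/(n pi), n = 1..N."""
    return sum((-1) ** (n + 1) * math.sin(2 * math.pi * n * x) / (n * math.pi)
               for n in range(1, N + 1))

def diff_partial(x, N):
    """Partial sum of (1.1.4): sum of (-1)^(n+1) * 2 cos(2 pi n x), n = 1..N."""
    return sum((-1) ** (n + 1) * 2 * math.cos(2 * math.pi * n * x)
               for n in range(1, N + 1))

x = 0.25
for N in (10, 100, 1000):
    print(N, sawtooth_partial(x, N))  # slowly approaches x = 0.25

for N in (100, 101, 102, 103):
    print(N, diff_partial(x, N))      # oscillates; no limit as N grows
```

At 𝑥 = 0.25 the surviving terms of (1.1.2) form an alternating series summing to 1/4, so the partial sums converge (slowly), while the partial sums of (1.1.4) bounce between values about 2 apart forever.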

Problems.

1.1.1. Use your calculus knowledge of trig functions to prove that the right-hand side of (1.1.2) is equal to 0 whenever 𝑥 = 𝑘/2 for some 𝑘 ∈ 𝐙.

1.1.2. Use your calculus knowledge of trig functions to prove that if 𝑥 = 𝑝/𝑞, where 𝑝, 𝑞 ∈ 𝐙, 𝑞 ≠ 0, then cos(2𝜋𝑛𝑥) = 1 for infinitely many 𝑛 > 0. (It follows that lim_{𝑛→∞} cos(2𝜋𝑛𝑥) ≠ 0, and therefore, by the 𝑛th term test from calculus, Analysis I, or Corollary 4.1.10, the series in (1.1.4) diverges.) Note: If 𝑥 is irrational, one can also show, with more effort, that cos(2𝜋𝑛𝑥) is very close to 1 for infinitely many 𝑛 > 0, with much the same consequences.

1.2 Physical motivation: Acoustics

For real-valued functions, the fundamental mathematical problem of Part 2 of this book can be expressed in the following (slightly simplified) manner: Suppose we have a function 𝑓 on 𝐑 that is periodic with period 1 (i.e., 𝑓(𝑡 + 1) = 𝑓(𝑡) for all 𝑡 ∈ 𝐑). When can we express 𝑓 as a series of the form

$$f(t) = a_0 + \sum_{n=1}^{\infty} \bigl(a_n \cos(2\pi n t) + b_n \sin(2\pi n t)\bigr)? \tag{1.2.1}$$


A series of the form (1.2.1), or its complex-valued generalization that we will see in Chapter 6, is called a Fourier series. While the above question is both concise and relatively precise, it may seem somewhat unmotivated to the first-time reader. One motivation for this question comes from the study of acoustics, or the study of sound waves. In mathematical acoustics, we model an idealized periodic sound wave, or tone, as a function 𝑓 ∶ 𝐑 → 𝐑 of time 𝑡 that is periodic with period 1. See Figure 1.2.1 for a simulation of what this might look like.

Figure 1.2.1. Two cycles of a simulated sound wave

Acoustically, we may then interpret (1.2.1) as follows:

• The fact that 𝑓 has period 1 means that the fundamental frequency of the tone represented by 𝑓 is 1. (The reader should not worry that this somehow limits our discussion, as we may use this same setting to study tones having any fundamental frequency we want, by adjusting the units of time 𝑡 to be the reciprocal of the fundamental frequency.)

• The summand 𝑎₁ cos(2𝜋𝑡) + 𝑏₁ sin(2𝜋𝑡) is called the first harmonic of the tone, and the quantity 𝑎₁² + 𝑏₁² represents the amount of energy of the tone contained in its first harmonic.

• Similarly, the summand 𝑎₂ cos(4𝜋𝑡) + 𝑏₂ sin(4𝜋𝑡) is called the second harmonic of the tone, and 𝑎₂² + 𝑏₂² represents the amount of energy contained in the second harmonic. Using calculations that the reader may recall from precalculus, we see that the frequency of the second harmonic is twice the fundamental frequency.

• In general, the summand 𝑎ₙ cos(2𝜋𝑛𝑡) + 𝑏ₙ sin(2𝜋𝑛𝑡) is called the 𝑛th harmonic of the tone, 𝑎ₙ² + 𝑏ₙ² represents the amount of energy contained in the 𝑛th harmonic, and the frequency of the 𝑛th harmonic is 𝑛 times the fundamental frequency.

• The infinite sum ∑_{𝑛=1}^∞ (𝑎ₙ² + 𝑏ₙ²), which, as we shall see, converges given only mild assumptions on 𝑓, represents the total energy of the tone.

One of the central ideas of acoustics (see the epigraph to Chapter 6) is that the distinctive sound quality, or timbre, of a given tone is determined by the relative strengths of its harmonics.
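To make the energy interpretation concrete, here is a small numerical sketch (our own; `harmonic_energy` and `tone` are hypothetical names, not from the text). It estimates 𝑎ₙ and 𝑏ₙ by Riemann sums over one period, using the standard coefficient formulas 𝑎ₙ = 2∫₀¹ 𝑓(𝑡) cos(2𝜋𝑛𝑡) 𝑑𝑡 and 𝑏ₙ = 2∫₀¹ 𝑓(𝑡) sin(2𝜋𝑛𝑡) 𝑑𝑡 for 𝑛 ≥ 1, and reports the energy 𝑎ₙ² + 𝑏ₙ² of each harmonic:

```python
import math

def harmonic_energy(f, n, samples=4096):
    """Energy a_n^2 + b_n^2 of the n-th harmonic of a period-1 tone f,
    with a_n, b_n estimated by Riemann sums over one period."""
    dt = 1.0 / samples
    a_n = 2 * dt * sum(f(k * dt) * math.cos(2 * math.pi * n * k * dt)
                       for k in range(samples))
    b_n = 2 * dt * sum(f(k * dt) * math.sin(2 * math.pi * n * k * dt)
                       for k in range(samples))
    return a_n ** 2 + b_n ** 2

# A tone with a full-strength first harmonic and a weaker second harmonic:
def tone(t):
    return math.cos(2 * math.pi * t) + 0.5 * math.sin(4 * math.pi * t)

print(harmonic_energy(tone, 1))  # ~1.0  (a_1 = 1, b_1 = 0)
print(harmonic_energy(tone, 2))  # ~0.25 (a_2 = 0, b_2 = 0.5)
print(harmonic_energy(tone, 3))  # ~0.0
```

For trigonometric polynomials of low degree, equally spaced Riemann sums recover the coefficients essentially exactly (discrete orthogonality), so the computed energies match 1, 1/4, and 0 to machine precision.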


Higher harmonics can be exhibited physically on almost any musical stringed instrument. For example, the first picture in Figure 1.2.2 shows the C string of a cello being played at a fundamental frequency of roughly 65.4 Hz (cycles per second). In the second picture, we see how lightly placing a finger exactly halfway down the C string suppresses the odd harmonics, leaving only the natural even harmonics of the C string and producing a sound that not only has a fundamental frequency of 130.8 Hz (in musical terms, up one octave) but also has a timbre that is purer, or at least less complex, than the ordinary sound of the cello C string. For mathematical details of this explanation, see Remark 11.4.6.

Figure 1.2.2. Harmonics in action on a cello

We hope this brief discussion gives some idea of how obtaining a decomposition like (1.2.1) might provide a lot of information about a given tone. For a readable and mathematically sound introduction to the rest of the subject of Fourier series and acoustics, much of which should be accessible to a reader who understands Part 2 of this book, see Alm and Walker [AW02].

Part 1

Complex functions of a real variable

2 Real and complex numbers

Feynman used to tell the most complex stories — part real, part imaginary. — Gen. Donald Kutyna, quoted in What Do You Care What People Think? by Richard Feynman

In this chapter, after briefly reviewing the axiomatic characterization of the real numbers (Section 2.1), we define the complex numbers (Section 2.2) and establish the complex-valued versions of sequences and their key properties (Section 2.4). We also introduce the general concept of a metric space (Section 2.3), including completeness (Section 2.5) and the topology of metric spaces (Section 2.6).

2.1 Axioms for the real numbers

A first course in analysis explains how the axioms for the real numbers (Definition 2.1.2) imply the usual properties of the real numbers, including calculus. In this section, we briefly summarize those axioms and some of their immediate consequences. Note that we present this material in condensed form to serve as a quick reference, so the reader should feel free to skim this section for now and refer back to it as necessary. For a more in-depth discussion, see, for example, Ross [Ros13] or Rudin [Rud76].

Notation 2.1.1. Throughout this book, 𝐍 denotes the natural numbers, 𝐙 denotes the integers, and 𝐐 denotes the rationals. We also choose the convention that the natural numbers are the positive integers, as opposed to the nonnegative integers.

Definition 2.1.2. Let 𝑅 be a set on which two binary operations + and ⋅ and a binary relation ≤ are defined. Consider the following axioms on the system (𝑅, +, ⋅, ≤).

(A1) For all 𝑎, 𝑏, 𝑐 ∈ 𝑅, (𝑎 + 𝑏) + 𝑐 = 𝑎 + (𝑏 + 𝑐).
(A2) For all 𝑎, 𝑏 ∈ 𝑅, 𝑎 + 𝑏 = 𝑏 + 𝑎.
(A3) There exists 0 ∈ 𝑅 such that for all 𝑎 ∈ 𝑅, 𝑎 + 0 = 𝑎.


(A4) For all 𝑎 ∈ 𝑅, there exists (−𝑎) ∈ 𝑅 such that 𝑎 + (−𝑎) = 0.
(M1) For all 𝑎, 𝑏, 𝑐 ∈ 𝑅, (𝑎 ⋅ 𝑏) ⋅ 𝑐 = 𝑎 ⋅ (𝑏 ⋅ 𝑐).
(M2) For all 𝑎, 𝑏 ∈ 𝑅, 𝑎 ⋅ 𝑏 = 𝑏 ⋅ 𝑎.
(M3) There exists 1 ∈ 𝑅 such that for all 𝑎 ∈ 𝑅, 𝑎 ⋅ 1 = 𝑎.
(DL) For all 𝑎, 𝑏, 𝑐 ∈ 𝑅, 𝑎 ⋅ (𝑏 + 𝑐) = 𝑎 ⋅ 𝑏 + 𝑎 ⋅ 𝑐.
(F1) For all 𝑎 ≠ 0 in 𝑅, there exists 𝑎⁻¹ ∈ 𝑅 such that 𝑎 ⋅ 𝑎⁻¹ = 1.
(F2) 1 ≠ 0.
(O1) For all 𝑎, 𝑏 ∈ 𝑅, either 𝑎 ≤ 𝑏 or 𝑏 ≤ 𝑎.
(O2) For all 𝑎, 𝑏 ∈ 𝑅, if 𝑎 ≤ 𝑏 and 𝑏 ≤ 𝑎, then 𝑎 = 𝑏.
(O3) For all 𝑎, 𝑏, 𝑐 ∈ 𝑅, if 𝑎 ≤ 𝑏 and 𝑏 ≤ 𝑐, then 𝑎 ≤ 𝑐.
(O4) For all 𝑎, 𝑏, 𝑐 ∈ 𝑅, if 𝑎 ≤ 𝑏, then 𝑎 + 𝑐 ≤ 𝑏 + 𝑐.
(O5) For all 𝑎, 𝑏, 𝑐 ∈ 𝑅, if 𝑎 ≤ 𝑏 and 0 ≤ 𝑐, then 𝑎𝑐 ≤ 𝑏𝑐.
(OC) (Order completeness) Every nonempty subset of 𝑅 that has an upper bound also has a least upper bound, or supremum.

To give a bit more detail about (OC), for 𝑆 ⊆ 𝑅, 𝑆 ≠ ∅, to say that 𝑢 is an upper bound for 𝑆 means that for all 𝑥 ∈ 𝑆, 𝑥 ≤ 𝑢; and to say that 𝑈 is the supremum of 𝑆, or 𝑈 = sup 𝑆, means both that 𝑈 is an upper bound for 𝑆 and also that for any other upper bound 𝑢 for 𝑆, 𝑈 ≤ 𝑢. Similarly, to say that ℓ is a lower bound for 𝑆 means that for all 𝑥 ∈ 𝑆, ℓ ≤ 𝑥; and to say that 𝐿 is the infimum, or greatest lower bound, of 𝑆 (written 𝐿 = inf 𝑆) means both that 𝐿 is a lower bound for 𝑆 and also that for any other lower bound ℓ for 𝑆, ℓ ≤ 𝐿. We also extend the meaning of sup and inf to unbounded sets: To say that sup 𝑆 = +∞ means that 𝑆 is not bounded above, and to say that inf 𝑆 = −∞ means that 𝑆 is not bounded below.

If 𝑅 satisfies axioms (A1)–(A4), we say that 𝑅 is an additive abelian group, with additive identity 0. If 𝑅 satisfies (A1)–(A4), (M1)–(M3), and (DL), we say that 𝑅 is a commutative ring with unity, where the unity is 1. If 𝑅 satisfies (A1)–(A4), (M1)–(M3), (DL), and (F1)–(F2), we say that 𝑅 is a field. As the reader may know, the formal study of groups, rings, and fields is typically the focus of a year-long course in abstract algebra; see, for example, Gallian [Gal12].

Suppose 𝑅 is a field (i.e., 𝑅 satisfies (A1)–(A4), (M1)–(M3), (DL), and (F1)–(F2)). A relation ≤ on 𝑅 that satisfies (O1)–(O3) is called a total order on 𝑅; (O4) means that the ordering is preserved under addition, and (O5) means that the ordering is preserved under multiplication by “nonnegative” elements. If there exists some relation ≤ on 𝑅 that satisfies (O1)–(O5), we say that 𝑅 is orderable, and we say that 𝑅 with ≤ is an ordered field; if no such relation exists, we say that 𝑅 is not orderable.

Given the usual properties of the integers, it can be shown that there exists a unique algebraic object that satisfies all of the axioms of Definition 2.1.2; for example, this can be done by means of Dedekind cuts (see Rudin [Rud76, App. to Ch. 1]). We call this (unique) algebraic object 𝐑, the field of real numbers.


We note some other facts about Definition 2.1.2.

• The element 0 is unique in an additive abelian group, and the element 1 is unique in a commutative ring with unity (Problem 2.1.1).

• The classes of commutative rings with unity, fields, ordered fields, and 𝐑 form a sequence of proper containments (Problem 2.1.2). In other words, every field is a commutative ring with unity, but not every commutative ring with unity is a field, and so on.

• We use the relation ≤ to define all of the other usual kinds of inequalities. For example, to say 𝑎 ≥ 𝑏 means that 𝑏 ≤ 𝑎, and to say that 𝑎 < 𝑏 means that 𝑎 ≤ 𝑏 and 𝑎 ≠ 𝑏.

• The reader may have previously seen axiom (OC) called simply completeness, and not order completeness. We use the term "order completeness" to distinguish from a more general notion of completeness that will be important later (Definition 2.5.4).

Returning to our main discussion, we have the following consequences of the axioms of an ordered field.

Theorem 2.1.3. Let 𝑅 be an ordered field and 𝑎, 𝑏, 𝑐 ∈ 𝑅.
(1) 𝑎 ≥ 0 if and only if −𝑎 ≤ 0.
(2) If 𝑎 ≤ 𝑏 and 𝑐 ≤ 0, then 𝑎𝑐 ≥ 𝑏𝑐.
(3) 𝑎² ≥ 0.
(4) −1 < 0 < 1.

Proof. Problem 2.1.3.

Note that Theorem 2.1.3 and axiom (OC) together imply the inf version of axiom (OC) in 𝐑: Every nonempty subset of 𝐑 that has a lower bound also has an infimum (Problem 2.1.4). As mentioned earlier, much of a first course in analysis is concerned with consequences of (order) completeness. First, we recall, without proof, two initial consequences of completeness: the Archimedean Property and the density of 𝐐 in the reals.

Theorem 2.1.4. In 𝐑, we have that:
(1) (Archimedean Property) For any 𝑎 ∈ 𝐑, there exists an integer 𝑛 such that 𝑛 > 𝑎.
(2) (Density of 𝐐 in 𝐑) For 𝑎, 𝑏 ∈ 𝐑 such that 𝑎 < 𝑏, there exists some 𝑟 ∈ 𝐐 such that 𝑎 < 𝑟 < 𝑏.

We also note two characterizations of suprema that will be useful later.

Theorem 2.1.5 (Arbitrarily Close Criterion). Suppose 𝑆 is a nonempty subset of 𝐑, and suppose 𝑢 is an upper bound for 𝑆. Then the following are equivalent:
(1) For every 𝜖 > 0, there exists some 𝑠 ∈ 𝑆 such that 𝑢 − 𝑠 < 𝜖.
(2) 𝑢 = sup 𝑆.


Proof. Problem 2.1.5.

Lemma 2.1.6 (Sup Inequality Lemma). If 𝑆 is a nonempty bounded subset of 𝐑, then sup 𝑆 ≤ 𝑢 if and only if 𝑢 is an upper bound for 𝑆.

Proof. Problem 2.1.6.

Note that by symmetry, the inf versions of Theorem 2.1.5 and Lemma 2.1.6 follow immediately.

Corollary 2.1.7 (Arbitrarily Close Criterion for Inf). Suppose 𝑆 is a nonempty subset of 𝐑, and suppose ℓ is a lower bound for 𝑆. Then the following are equivalent:
(1) For every 𝜖 > 0, there exists some 𝑠 ∈ 𝑆 such that 𝑠 − ℓ < 𝜖.
(2) ℓ = inf 𝑆.

Lemma 2.1.8 (Inf Inequality Lemma). If 𝑆 is a nonempty bounded subset of 𝐑, then ℓ ≤ inf 𝑆 if and only if ℓ is a lower bound for 𝑆.
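The Arbitrarily Close Criterion lends itself to a quick numerical illustration (ours, not from the text): for 𝑆 = {1 − 1/𝑛 ∣ 𝑛 ∈ 𝐍}, the supremum is 𝑢 = 1, and for every 𝜖 > 0 we can exhibit an element of 𝑆 within 𝜖 of 𝑢.

```python
# Illustration of Theorem 2.1.5 for S = {1 - 1/n : n in N}, sup S = 1.
# For each epsilon > 0 we exhibit s in S with u - s < epsilon.

def element_within(epsilon):
    n = int(1 / epsilon) + 1      # any integer n > 1/epsilon works
    return 1 - 1 / n              # an element of S

u = 1.0
for eps in [0.5, 0.1, 1e-3, 1e-6]:
    s = element_within(eps)
    assert s < u                  # u is an upper bound for S
    assert u - s < eps            # s is within eps of u
```

Since no single element of 𝑆 equals 1, this also shows that a supremum need not be attained.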

Problems.

2.1.1. (a) Prove that if 𝑅 is an additive abelian group and 0 and 0′ both satisfy (A3), then 0 = 0′.
(b) Prove that if 𝑅 is a commutative ring with unity and 1 and 1′ both satisfy (M3), then 1 = 1′.

2.1.2. This problem shows that the containments of classes of algebraic objects in Definition 2.1.2 are proper.
(a) Prove that the integers 𝐙 are a commutative ring with unity, but not a field.
(b) Prove that the set {0, 1}, with addition and multiplication (mod 2), is a field but is not orderable.
(c) Prove that the rational numbers 𝐐 are an ordered field but do not satisfy the order-completeness axiom (OC).

2.1.3. (Proves Theorem 2.1.3) Let 𝑅 be an ordered field and 𝑎, 𝑏, 𝑐 ∈ 𝑅. In the following, you may use the field axioms freely (i.e., you may assume that arithmetic operations work in the usual way in 𝑅).
(a) Prove that 𝑎 ≥ 0 if and only if −𝑎 ≤ 0.
(b) Prove that if 𝑎 ≤ 𝑏 and 𝑐 ≤ 0, then 𝑎𝑐 ≥ 𝑏𝑐.
(c) Prove that 𝑎² ≥ 0.
(d) Prove that −1 < 0 < 1.

2.1.4. Prove that any nonempty subset of 𝐑 that is bounded below has a greatest lower bound (i.e., an infimum).

2.1.5. (Proves Theorem 2.1.5) Suppose 𝑆 is a nonempty subset of 𝐑, and suppose 𝑢 is an upper bound for 𝑆. Prove that the following are equivalent:
• For every 𝜖 > 0, there exists some 𝑠 ∈ 𝑆 such that 𝑢 − 𝑠 < 𝜖.
• 𝑢 = sup 𝑆.

2.1.6. (Proves Lemma 2.1.6) Prove that if 𝑆 is a nonempty bounded subset of 𝐑, then sup 𝑆 ≤ 𝑢 if and only if 𝑢 is an upper bound for 𝑆.


2.2 Complex numbers

Compared with the logical jump from rational numbers to real numbers, the jump from real numbers to complex numbers is relatively straightforward.

Definition 2.2.1. The complex numbers 𝐂 are defined as follows:
(1) Set: Formally, 𝐂 is the set of all pairs (𝑎, 𝑏), where 𝑎, 𝑏 ∈ 𝐑. However, instead of (𝑎, 𝑏), we write 𝑎 + 𝑏𝑖.
(2) Operations: Addition and multiplication of complex numbers is defined like the addition and multiplication of polynomials in the variable 𝑖, but with the additional rule 𝑖² = −1. Formally, that means
(𝑎, 𝑏) + (𝑐, 𝑑) = (𝑎 + 𝑐, 𝑏 + 𝑑),  (𝑎, 𝑏) ⋅ (𝑐, 𝑑) = (𝑎𝑐 − 𝑏𝑑, 𝑎𝑑 + 𝑏𝑐),  (2.2.1)
or in other words,
(𝑎 + 𝑏𝑖) + (𝑐 + 𝑑𝑖) = (𝑎 + 𝑐) + (𝑏 + 𝑑)𝑖,
(𝑎 + 𝑏𝑖) ⋅ (𝑐 + 𝑑𝑖) = 𝑎𝑐 + 𝑎𝑑𝑖 + 𝑏𝑐𝑖 + 𝑏𝑑𝑖² = (𝑎𝑐 − 𝑏𝑑) + (𝑎𝑑 + 𝑏𝑐)𝑖.  (2.2.2)

Since, formally speaking, elements of 𝐂 are pairs of real numbers, we draw 𝑥 + 𝑦𝑖 ∈ 𝐂 as the point (𝑥, 𝑦) ∈ 𝐑². This picture is called the complex plane (Figure 2.2.1).

Figure 2.2.1. The point 3 − 2𝑖 in the complex plane

We can verify by direct calculation that Definition 2.2.1 gives 𝐂 the structure of a commutative ring with zero element 0 = 0 + 0𝑖 and unity 1 = 1 + 0𝑖 (Problem 2.2.1). Verifying that 𝐂 is a field is a bit more interesting and uses the following important ideas.

Definition 2.2.2. Let 𝑧 = 𝑎 + 𝑏𝑖 be a complex number. The complex conjugate, or simply conjugate, of 𝑧, written 𝑧̄, is defined to be
𝑧̄ = 𝑎 − 𝑏𝑖.  (2.2.3)
The absolute value, or norm, of 𝑎 + 𝑏𝑖 is defined to be
|𝑎 + 𝑏𝑖| = √(𝑎² + 𝑏²).  (2.2.4)

Note that if 𝑥 is a real number, then the absolute value |𝑥| = √(𝑥²) is consistent with the usual absolute value in the reals.
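As a concrete illustration of ours (not part of the text), the pair formulas (2.2.1) can be checked against Python's built-in complex arithmetic:

```python
# Multiplication rule (2.2.1): (a, b) . (c, d) = (ac - bd, ad + bc),
# compared with Python's built-in complex type.

def add(p, q):
    (a, b), (c, d) = p, q
    return (a + c, b + d)

def mul(p, q):
    (a, b), (c, d) = p, q
    return (a * c - b * d, a * d + b * c)

z, w = (3.0, -2.0), (1.0, 4.0)                     # 3 - 2i and 1 + 4i
builtin = complex(*z) * complex(*w)
assert mul(z, w) == (builtin.real, builtin.imag)   # agrees with built-in
assert mul((0.0, 1.0), (0.0, 1.0)) == (-1.0, 0.0)  # i . i = -1
```

In particular, the last line is exactly the defining rule 𝑖² = −1 expressed in pair notation.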


Complex conjugation and absolute values have the following straightforward but crucial properties.

Theorem 2.2.3. For 𝑧, 𝑤 ∈ 𝐂, we have that
(1) the conjugate of 𝑧̄ is 𝑧;
(2) the conjugate of 𝑧 + 𝑤 is 𝑧̄ + 𝑤̄, and the conjugate of 𝑧𝑤 is 𝑧̄ ⋅ 𝑤̄;
(3) 𝑧𝑧̄ = |𝑧|²;
(4) |𝑧𝑤| = |𝑧| |𝑤|;
(5) |𝑧̄| = |𝑧|;
(6) |𝑧| ≥ 0, and |𝑧| = 0 if and only if 𝑧 = 0; and
(7) if 𝑧 ≠ 0, then 𝑧 ⋅ (𝑧̄/|𝑧|²) = 1.
Consequently, 𝐂 is a field.

Proof. Problem 2.2.2.

On a related note, we introduce the following terminology and notation.

Definition 2.2.4. For 𝑧 = 𝑎 + 𝑏𝑖 ∈ 𝐂, we define ℜ(𝑧) = 𝑎 to be the real part of 𝑧 and ℑ(𝑧) = 𝑏 to be the imaginary part of 𝑧. Note that the following formulas are then immediate:
ℜ(𝑧) = (𝑧 + 𝑧̄)/2,  ℑ(𝑧) = (𝑧 − 𝑧̄)/(2𝑖).  (2.2.5)

Convention 2.2.5. Since 𝐂 is not orderable (Problem 2.2.3), we hereby set the convention that if we introduce a number as part of an inequality (e.g., "for 𝜖 > 0. . . "), we assume that number is real.

Problems. 2.2.1. Check that 𝐂 satisfies the axioms of a commutative ring with zero element 0 = 0 + 0𝑖 and unity 1 = 1 + 0𝑖. 2.2.2. (Proves Theorem 2.2.3) (a) Prove statements (1)–(7) of Theorem 2.2.3. (b) Use (1)–(7) of Theorem 2.2.3 to prove that 𝐂 is a field. 2.2.3. Prove that 𝐂 is not orderable.

2.3 Metrics and metric spaces

As mentioned earlier, the main goal of Part 1 of this text is to review real analysis by generalizing it to functions 𝑓 ∶ 𝐑 → 𝐂. However, we immediately face the conundrum that order completeness is the foundation for the analytic properties of real-valued functions and 𝐂 is not orderable (Problem 2.2.3). For that and other reasons, we need to define a replacement for order completeness as the foundation for analysis, and we begin that pursuit by defining the idea of a metric.


Definition 2.3.1. Let 𝑋 be a nonempty set. A metric on 𝑋 is a function 𝑑 ∶ 𝑋 × 𝑋 → 𝐑 such that for all 𝑥, 𝑦, 𝑧 ∈ 𝑋, we have:
(1) 𝑑(𝑥, 𝑦) ≥ 0.
(2) 𝑑(𝑥, 𝑦) = 0 if and only if 𝑥 = 𝑦.
(3) 𝑑(𝑥, 𝑦) = 𝑑(𝑦, 𝑥).
(4) (Triangle inequality) 𝑑(𝑥, 𝑦) ≤ 𝑑(𝑥, 𝑧) + 𝑑(𝑧, 𝑦).
A metric space is a pair (𝑋, 𝑑) where 𝑋 is a set and 𝑑 is a particular metric on 𝑋. If 𝑑 is a standard metric on 𝑋, we often refer to (𝑋, 𝑑) simply as 𝑋.

Example 2.3.2. For 𝑥, 𝑦 ∈ 𝐑 we define 𝑑(𝑥, 𝑦) = |𝑥 − 𝑦|.

Example 2.3.3. For 𝑧, 𝑤 ∈ 𝐂 we define 𝑑(𝑧, 𝑤) = |𝑧 − 𝑤| = √((𝑧 − 𝑤)(𝑧̄ − 𝑤̄)). Note that Example 2.3.2 is precisely this same formula, restricted to the real numbers. Note also that in the complex plane, |𝑧 − 𝑤| is precisely the usual Euclidean distance between 𝑧 and 𝑤.

Assuming for the moment that Example 2.3.3 defines a metric on 𝐂, we can use 𝐂 to convey some of the intuition behind Definition 2.3.1. The point is that if 𝑋 is a metric space, then for 𝑥, 𝑦 ∈ 𝑋, 𝑑(𝑥, 𝑦) is supposed to represent the "shortest distance from 𝑥 to 𝑦." This idea is reflected in the axioms of Definition 2.3.1 as follows:
(1) Distances are always nonnegative.
(2) Points 𝑥, 𝑦 ∈ 𝑋 are at distance 0 exactly when 𝑥 = 𝑦.
(3) The distance from 𝑥 to 𝑦 is the same as the distance from 𝑦 to 𝑥 (travel is reversible).
(4) The triangle inequality is the most interesting axiom in the definition of metric. The point here is really that 𝑑(𝑥, 𝑦) ≯ 𝑑(𝑥, 𝑧) + 𝑑(𝑧, 𝑦), or in other words, one cannot get a shortcut from 𝑥 to 𝑦 by traveling through a third point 𝑧. The truth of this property for triangles in the (complex) plane is shown in Figure 2.3.1.


Figure 2.3.1. The triangle inequality: no shortcuts via extra destinations

Returning to our main discussion, as mentioned above, we must prove that Examples 2.3.2 and 2.3.3 actually define metrics, which we do in the following theorem.

Theorem 2.3.4. For 𝑧, 𝑤 ∈ 𝐂, we have the following.
(1) (Cauchy-Schwarz inequality) |ℜ(𝑧𝑤̄)| ≤ |𝑧| |𝑤|.
(2) (Triangle inequality) |𝑧 + 𝑤| ≤ |𝑧| + |𝑤|.


Proof. Problems 2.3.1 and 2.3.2.

Remark 2.3.5. For the reader who finds the sudden appearance of ℜ(𝑧𝑤̄) to be a bit mysterious, note that if 𝑧 = 𝑎 + 𝑏𝑖 and 𝑤 = 𝑐 + 𝑑𝑖, then ℜ(𝑧𝑤̄) = 𝑎𝑐 + 𝑏𝑑; in other words, ℜ(𝑧𝑤̄) is precisely the 2-dimensional dot product of the vectors (𝑎, 𝑏) and (𝑐, 𝑑). For a generalization also relying on a generalized dot product, see Theorem 7.1.12.

Corollary 2.3.6. Example 2.3.3 defines a metric on 𝐂 (and therefore, Example 2.3.2 defines a metric on 𝐑).

Proof. Problem 2.3.3.

As we shall see, the abstract concept of a metric actually unifies several disparate ideas we will encounter and also sometimes leads us to use better arguments. For example, we have the following result, which can be annoying to consider in terms of absolute values.


Figure 2.3.2. 𝑑(𝑎, 𝑥) differs from 𝑑(𝑎, 𝑏) by at most 𝑑(𝑏, 𝑥).

Lemma 2.3.7. Let 𝑋 be a metric space. For 𝑎, 𝑏, 𝑥 ∈ 𝑋, we have that
𝑑(𝑎, 𝑏) − 𝑑(𝑏, 𝑥) ≤ 𝑑(𝑎, 𝑥) ≤ 𝑑(𝑎, 𝑏) + 𝑑(𝑏, 𝑥).  (2.3.1)
In other words, if we start at a given point 𝑏 and move to 𝑥, then our distance to another given point 𝑎 changes by at most 𝑑(𝑏, 𝑥) (the distance moved), as shown in Figure 2.3.2.

Proof. Problem 2.3.5.

Remark 2.3.8. We take this opportunity to make a simple observation that turns out to be quite useful: namely, that for 𝑧, 𝑤 ∈ 𝐂, we have that 𝑑(𝑧̄, 𝑤̄) = 𝑑(𝑧, 𝑤) (Problem 2.3.4). As we shall see, this means that once we understand the analytic properties of a given function 𝑓 ∶ 𝐑 → 𝐂, we can derive the properties of the function 𝑓̄ ∶ 𝐑 → 𝐂 defined by letting 𝑓̄(𝑥) be the conjugate of 𝑓(𝑥). (See, for example, Theorems 3.1.5 and 3.1.21.)
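Lemma 2.3.7 is easy to sanity-check numerically in the metric space (𝐂, 𝑑) with 𝑑(𝑧, 𝑤) = |𝑧 − 𝑤|; the following sketch of ours (not from the text) verifies the two-sided bound for a few sample triples:

```python
# Spot-check of Lemma 2.3.7 in (C, d), d(z, w) = |z - w|:
# moving from b to x changes the distance to a by at most d(b, x).

def d(z, w):
    return abs(z - w)

triples = [(1 + 1j, -2j, 3.5 + 0j), (0j, 1j, 1 + 1j), (-4 + 0.5j, 2 - 2j, 0.25j)]
for a, b, x in triples:
    assert d(a, b) - d(b, x) <= d(a, x) <= d(a, b) + d(b, x)
```

Both inequalities are instances of the triangle inequality: the upper bound directly, and the lower bound after rearranging 𝑑(𝑎, 𝑏) ≤ 𝑑(𝑎, 𝑥) + 𝑑(𝑥, 𝑏).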

Problems.

2.3.1. (Proves Theorem 2.3.4) For 𝑧, 𝑤 ∈ 𝐂, prove that |ℜ(𝑧𝑤̄)| ≤ |𝑧| |𝑤|.

2.3.2. (Proves Theorem 2.3.4) For 𝑧, 𝑤 ∈ 𝐂, prove that |𝑧 + 𝑤| ≤ |𝑧| + |𝑤|.

2.3.3. (Proves Corollary 2.3.6) Prove that Example 2.3.3 defines a metric on 𝐂. (You can use Theorem 2.3.4 for this.)

2.3.4. (Proves Remark 2.3.8) Prove that for 𝑧, 𝑤 ∈ 𝐂, 𝑑(𝑧̄, 𝑤̄) = 𝑑(𝑧, 𝑤).

2.3.5. (Proves Lemma 2.3.7) Prove that if 𝑋 is a metric space and 𝑎, 𝑏, 𝑥 ∈ 𝑋, then
𝑑(𝑎, 𝑏) − 𝑑(𝑏, 𝑥) ≤ 𝑑(𝑎, 𝑥) ≤ 𝑑(𝑎, 𝑏) + 𝑑(𝑏, 𝑥).  (2.3.2)


2.4 Sequences in 𝐂 and other metric spaces

We now consider complex-valued sequences and their limits, and later on, sequences in general metric spaces. The punchline is that everything works almost exactly the same as with real-valued sequences. We begin with some definitions that may be slightly generalized from what the reader saw in Analysis I.

Definition 2.4.1. Let 𝑋 be a set. A sequence in 𝑋 is a function 𝑎 ∶ 𝑁 → 𝑋, where 𝑁 = {𝑛 ∈ 𝐙 ∣ 𝑛 ≥ 𝑛0} for some 𝑛0 ∈ 𝐙; in other words, a sequence is a function with values in 𝑋 whose domain is all integers starting with 𝑛0 (usually 𝑛0 = 0 or 1). If 𝑎 is a sequence, we usually write 𝑎𝑛 instead of 𝑎(𝑛).

Definition 2.4.2. Let 𝑎𝑛 be a sequence in 𝑋. A subsequence of 𝑎𝑛 is a sequence of the form 𝑏𝑘 = 𝑎𝑛𝑘, where 𝑛𝑘 is a strictly increasing sequence of indices (e.g., 𝑛0 < 𝑛1 < 𝑛2 < ⋯). Note that a straightforward induction argument yields the useful fact that if 𝑛1 ≥ 1, then 𝑛𝑘 ≥ 𝑘.

In other words, taking 𝑛0 = 1 for concreteness, a sequence in 𝑋 is an infinite list 𝑎1, 𝑎2, 𝑎3, … of elements of 𝑋, indexed by integers 𝑛 ≥ 1. In these same terms, the subsequence of 𝑎𝑛 defined by 𝑛𝑘 is an infinite sublist of the list 𝑎𝑛, indexed by 𝑘 ≥ 1 (or 𝑘 ≥ 0, etc.). For example, if 𝑛𝑘 = 2^𝑘 (𝑘 ≥ 0) and 𝑏𝑘 = 𝑎𝑛𝑘, then the corresponding subsequence of 𝑎𝑛 is 𝑏0 = 𝑎1, 𝑏1 = 𝑎2, 𝑏2 = 𝑎4, 𝑏3 = 𝑎8, ….

Definition 2.4.3. For a complex-valued sequence 𝑎𝑛 and 𝐿 ∈ 𝐂, to say that lim_{𝑛→∞} 𝑎𝑛 = 𝐿 means that for every 𝜖 > 0, there exists some 𝑁(𝜖) ∈ 𝐑 such that if 𝑛 > 𝑁(𝜖), then |𝑎𝑛 − 𝐿| < 𝜖. For a given sequence 𝑎𝑛, if lim_{𝑛→∞} 𝑎𝑛 = 𝐿 for some 𝐿 ∈ 𝐂, then we say that 𝑎𝑛 converges, or is convergent; otherwise, we say that 𝑎𝑛 diverges, or is divergent.

Note that Definition 2.4.3 is precisely the definition of the limit of a real-valued sequence, except that the absolute values are interpreted as complex absolute values. The limit properties of complex sequences can now be proven in essentially the same manner as those of real sequences. For example, we have the following familiar results about boundedness.

Definition 2.4.4. To say that a nonempty subset 𝑆 of 𝐂 is bounded means that there exists some 𝑀 > 0 such that for 𝑧 ∈ 𝑆, |𝑧| < 𝑀. To say that a sequence 𝑎𝑛 in 𝐂 is bounded means that {𝑎𝑛} (the set of its values) is bounded.

Theorem 2.4.5. Let 𝑎𝑛 be a sequence in 𝐂.
(1) If 𝑎𝑛 converges, then 𝑎𝑛 is bounded.
(2) If lim_{𝑛→∞} 𝑎𝑛 = 𝐿 ≠ 0, then there exists some real number 𝐾 such that if 𝑛 > 𝐾, then |𝑎𝑛| > |𝐿|/2.

Proof. Suppose lim_{𝑛→∞} 𝑎𝑛 = 𝐿; by definition, this means that for every 𝜖 > 0, there exists some 𝑁(𝜖) such that if 𝑛 > 𝑁(𝜖), then |𝑎𝑛 − 𝐿| < 𝜖. To prove property (1), choose some integer 𝐾 > 𝑁(1). By the definition of limit, for 𝑛 > 𝐾, we know that |𝑎𝑛 − 𝐿| < 1, which means that
|𝑎𝑛| ≤ |𝑎𝑛 − 𝐿| + |𝐿| < |𝐿| + 1,  (2.4.1)


by the triangle inequality. Therefore, since {|𝑎1|, …, |𝑎𝐾|} is a finite set, we see that for all 𝑛, |𝑎𝑛| < 𝑀, where 𝑀 = max{|𝑎1|, …, |𝑎𝐾|, |𝐿| + 1}. For property (2), see Problem 2.4.1.

The limit laws for sequences are also proven in a manner similar to their real-valued versions.

Theorem 2.4.6. Let 𝑎𝑛 and 𝑏𝑛 be sequences in 𝐂, and suppose that lim_{𝑛→∞} 𝑎𝑛 = 𝐿, lim_{𝑛→∞} 𝑏𝑛 = 𝑀, and 𝑐 ∈ 𝐂. Then we have that
(1) lim_{𝑛→∞} 𝑐𝑎𝑛 = 𝑐𝐿;
(2) lim_{𝑛→∞} (𝑎𝑛 + 𝑏𝑛) = 𝐿 + 𝑀;
(3) lim_{𝑛→∞} 𝑎̄𝑛 = 𝐿̄;
(4) lim_{𝑛→∞} 𝑎𝑛𝑏𝑛 = 𝐿𝑀;
(5) if 𝐿 ≠ 0, then lim_{𝑛→∞} 1/𝑎𝑛 = 1/𝐿; and
(6) if 𝑎𝑛 is real-valued and 𝑎𝑛 ≤ 𝐾 for all 𝑛, then lim_{𝑛→∞} 𝑎𝑛 = 𝐿 ≤ 𝐾.

Proof. The proofs of properties (1) and (2) are in Problems 2.4.2 and 2.4.3. Property (3) follows by Remark 2.3.8 and the definition of limit.

For property (4), fix 𝜖 > 0. By Theorem 2.4.5(1), we know that there exists some 𝐾 such that |𝑏𝑛| < 𝐾 for all 𝑛. Furthermore, by the definition of lim_{𝑛→∞} 𝑎𝑛 = 𝐿, there exists some 𝑁𝑎 = 𝑁𝑎(𝜖/(2𝐾)) such that |𝑎𝑛 − 𝐿| < 𝜖/(2𝐾) for all 𝑛 > 𝑁𝑎, and by the definition of lim_{𝑛→∞} 𝑏𝑛 = 𝑀, there exists some 𝑁𝑏 = 𝑁𝑏(𝜖/(2|𝐿| + 1)) such that |𝑏𝑛 − 𝑀| < 𝜖/(2|𝐿| + 1) for all 𝑛 > 𝑁𝑏. Therefore, for 𝑛 > 𝑁(𝜖) = max(𝑁𝑎, 𝑁𝑏), we have that
|𝑎𝑛𝑏𝑛 − 𝐿𝑀| = |𝑎𝑛𝑏𝑛 − 𝐿𝑏𝑛 + 𝐿𝑏𝑛 − 𝐿𝑀|
≤ |𝑎𝑛 − 𝐿| |𝑏𝑛| + |𝐿| |𝑏𝑛 − 𝑀|  (triangle inequality)
< (𝜖/(2𝐾))𝐾 + |𝐿| ⋅ 𝜖/(2|𝐿| + 1)
< 𝜖/2 + 𝜖/2 = 𝜖.  (2.4.2)

Finally, for property (5), again fix 𝜖 > 0. By Theorem 2.4.5(2), there exists some 𝐾 such that if 𝑛 > 𝐾, then |𝑎𝑛| > |𝐿|/2; and by definition, there exists some 𝑁𝑎 = 𝑁𝑎(𝜖|𝐿|²/2) such that |𝑎𝑛 − 𝐿| < 𝜖|𝐿|²/2 for all 𝑛 > 𝑁𝑎. Then for 𝑛 > 𝑁(𝜖) = max(𝐾, 𝑁𝑎), we see that
|1/𝑎𝑛 − 1/𝐿| = |𝐿 − 𝑎𝑛|/(|𝑎𝑛| |𝐿|) < (𝜖|𝐿|²/2)/((|𝐿|/2)|𝐿|) = 𝜖.
(Property (6) is a special case of Theorem 2.4.10(2), below.)

Limits also interact well with subsequences.

Theorem 2.4.7. Let 𝑎𝑛 be a sequence in 𝐂 such that lim_{𝑛→∞} 𝑎𝑛 = 𝐿. Then any subsequence 𝑏𝑘 = 𝑎𝑛𝑘 of 𝑎𝑛 also satisfies lim_{𝑘→∞} 𝑏𝑘 = 𝐿.

Proof. Problem 2.4.4.

Next, we consider what happens to limits of sequences lying in certain subsets of 𝐂, for which we need the following terminology.

Definition 2.4.8. For 𝑧 ∈ 𝐂 and 𝜖 > 0, the open disc of radius 𝜖 around 𝑧 is defined to be 𝒩𝜖(𝑧) = {𝑤 ∈ 𝐂 ∣ |𝑤 − 𝑧| < 𝜖}; and for 𝑟 ≥ 0, the closed disc of radius 𝑟 around 𝑧 is defined to be 𝒩̄𝑟(𝑧) = {𝑤 ∈ 𝐂 ∣ |𝑤 − 𝑧| ≤ 𝑟}.

Definition 2.4.9. To say that a subset 𝑉 of 𝐂 is open in 𝐂 means that for every 𝑧 ∈ 𝑉, there exists some 𝜖 > 0 such that the open disc 𝒩𝜖(𝑧) is contained in 𝑉; and to say that 𝑉 is closed in 𝐂 means that 𝑉ᶜ is open. See Figure 2.4.1, where the shaded area and its boundary represent a closed set 𝑉 and the unshaded area represents the open set 𝑉ᶜ.


Figure 2.4.1. A closed set in 𝐂 and a point 𝑥 in its complement

Theorem 2.4.10. If 𝑉 is a closed subset of 𝐂 and 𝑎𝑛 is a sequence in 𝑉 such that lim_{𝑛→∞} 𝑎𝑛 = 𝐿, then 𝐿 is contained in 𝑉. In particular:
(1) If 𝑎𝑛 is real-valued, 𝐿 is real.
(2) If 𝑎𝑛 is real-valued and 𝑎𝑛 ≤ 𝐾 for all 𝑛, then 𝐿 ≤ 𝐾.
(3) For 𝑧 ∈ 𝐂 and 𝑟 > 0, if |𝑎𝑛 − 𝑧| ≤ 𝑟 for all 𝑛, then |𝐿 − 𝑧| ≤ 𝑟.
(4) If 𝑎𝑛 is real-valued, 𝑎 < 𝑏, and 𝑎𝑛 ∈ [𝑎, 𝑏] for all 𝑛, then 𝐿 ∈ [𝑎, 𝑏].
(5) For 𝑏1 < 𝑏2 and 𝑐1 < 𝑐2, let 𝑉 be the set of all 𝑥 + 𝑦𝑖 ∈ 𝐂 such that 𝑥 ∈ [𝑏1, 𝑏2] and 𝑦 ∈ [𝑐1, 𝑐2]. If 𝑎𝑛 is a sequence in 𝑉, then 𝐿 ∈ 𝑉.


The set in statement (5), above, is known as a rectangle, as it is drawn as the rectangle [𝑏1, 𝑏2] × [𝑐1, 𝑐2] in the complex plane (or in 𝐑²).

Proof. The first statement is proved in Problem 2.4.5. The other claims then reduce to showing that certain subsets of 𝐂 are closed, and this is found in Problem 2.4.6.

Example 2.4.11. In contrast with Theorem 2.4.10, the rational numbers 𝐐 are not a closed subset of 𝐂, as there exists a convergent sequence of rational numbers whose limit is not rational (e.g., the sequence whose 𝑛th term is 𝜋 truncated to its first 𝑛 decimal digits).

It will also occasionally be useful for us to consider the convergence of a complex sequence in terms of its real and imaginary parts, and vice versa. The following theorem describes the necessary equivalence.

Theorem 2.4.12. Let 𝑧𝑛 = 𝑥𝑛 + 𝑦𝑛𝑖 be a complex sequence with real and imaginary parts 𝑥𝑛 and 𝑦𝑛, respectively, and let 𝐿 = 𝑎 + 𝑏𝑖 ∈ 𝐂 have real and imaginary parts 𝑎 and 𝑏, respectively. Then lim_{𝑛→∞} 𝑧𝑛 = 𝐿 if and only if lim_{𝑛→∞} 𝑥𝑛 = 𝑎 and lim_{𝑛→∞} 𝑦𝑛 = 𝑏.

Proof. Problem 2.4.7.

Staying with real-valued sequences for a moment, the idea of supremum is also related to the limit of a real-valued sequence in several ways. First, we have the following refinement of the Arbitrarily Close Criterion (Theorem 2.1.5).

Theorem 2.4.13 (Arbitrarily Close Criterion, redux). Suppose 𝑆 is a nonempty subset of 𝐑, and suppose 𝑢 is an upper bound for 𝑆. Then the following are equivalent:
(1) 𝑢 = sup 𝑆.
(2) For every 𝜖 > 0, there exists some 𝑠 ∈ 𝑆 such that 𝑢 − 𝑠 < 𝜖.
(3) There exists a sequence 𝑥𝑛 in 𝑆 such that lim_{𝑛→∞} 𝑥𝑛 = 𝑢.

Proof. Problem 2.4.8.

We also recall the following fact.

Theorem 2.4.14 (Convergence of Monotone Sequences). Let 𝑎𝑛 be a real-valued increasing sequence that is bounded above, and let 𝑆 = {𝑎𝑛} (the set of all values attained by 𝑎𝑛). Then 𝑎𝑛 converges to sup 𝑆.

Proof. Problem 2.4.9.

Later, we will find it useful to extend the definition of the limit of a sequence (Definition 2.4.3) to an arbitrary metric space, as follows.

Definition 2.4.15. For a sequence 𝑎𝑛 in a metric space 𝑋 and 𝐿 ∈ 𝑋, to say that lim_{𝑛→∞} 𝑎𝑛 = 𝐿 means that for every 𝜖 > 0, there exists some 𝑁(𝜖) ∈ 𝐑 such that if 𝑛 > 𝑁(𝜖), then 𝑑(𝑎𝑛, 𝐿) < 𝜖. The terms convergent, divergent, and so on are also used as they are with complex-valued sequences.
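The componentwise convergence of Theorem 2.4.12 can be illustrated with a concrete sequence (our example, not from the text): 𝑧𝑛 = (1 + 1/𝑛) + (2 − 1/𝑛²)𝑖 converges to 𝐿 = 1 + 2𝑖 exactly because its real parts converge to 1 and its imaginary parts converge to 2.

```python
# Illustration of Theorem 2.4.12: z_n -> L in C exactly when the real and
# imaginary parts converge separately in R.

def z(n):
    return complex(1 + 1 / n, 2 - 1 / n ** 2)

L = 1 + 2j
for n in [10, 100, 10_000]:
    # Componentwise control of |z_n - L|, as in Problem 2.4.7(c).
    assert abs(z(n) - L) <= abs(z(n).real - L.real) + abs(z(n).imag - L.imag)
assert abs(z(10_000) - L) < 1e-3
```

The inequality in the loop is the elementary estimate behind Problem 2.4.7: the complex distance is controlled by the sum of the coordinate errors.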


Note that Definition 2.4.3 is precisely the special case of Definition 2.4.15 where 𝑑(𝑎𝑛, 𝐿) = |𝑎𝑛 − 𝐿|. When considering limits of sequences in a general metric space, we often need a tool like the following to prove the convergence of particular examples.

Lemma 2.4.16 (Metric Squeeze Lemma). Let 𝑥𝑛 be a sequence in a metric space 𝑋, and suppose that for some 𝐿 ∈ 𝑋, there exists a sequence 𝑑𝑛 in 𝐑 such that 𝑑(𝑥𝑛, 𝐿) < 𝑑𝑛 for all 𝑛 and lim_{𝑛→∞} 𝑑𝑛 = 0. Then lim_{𝑛→∞} 𝑥𝑛 = 𝐿.

Proof. Problem 2.4.10.

Finally, it will be useful to be able to describe a subset 𝑌 of a metric space 𝑋 that contains points arbitrarily close to every point of 𝑋. The following theorem and definition make this idea precise.

Theorem 2.4.17. Let 𝑋 be a metric space and 𝑌 a subset of 𝑋. Then the following conditions are equivalent:
(1) For every 𝑥 ∈ 𝑋 and every 𝜖 > 0, there exists some 𝑦 ∈ 𝑌 such that 𝑑(𝑥, 𝑦) < 𝜖.
(2) For every 𝑥 ∈ 𝑋, there exists some sequence 𝑦𝑛 in 𝑌 such that lim_{𝑛→∞} 𝑦𝑛 = 𝑥.

Proof. Problem 2.4.11.

Definition 2.4.18. To say that a subset 𝑌 of a metric space 𝑋 is dense in 𝑋 means that either (and therefore, both) of the conditions of Theorem 2.4.17 hold.

Example 2.4.19. The rationals 𝐐 are a dense subset of the metric space 𝐑 (Problem 2.4.12).

As we shall see starting in Chapter 7, the reason we are interested in dense subsets of metric spaces is that when we work with a function space 𝑉 (see Chapter 5), we can closely approximate an arbitrary 𝑓 ∈ 𝑉 by an element of a dense subset of 𝑉 in the same way that we can closely approximate an arbitrary real number by a rational number (Example 2.4.19).
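Example 2.4.19 can be made concrete (our illustration, not from the text): truncating the decimal expansion of √2 produces rational numbers 𝑦𝑛 with 𝑑(√2, 𝑦𝑛) < 10⁻ⁿ, which is exactly the "arbitrarily close" condition of Theorem 2.4.17(1).

```python
# Rational approximations from Example 2.4.19: y_n = floor(10^n * x) / 10^n
# is rational and within 10^(-n) of x = sqrt(2).
import math

x = math.sqrt(2)

def y(n):
    return math.floor(10 ** n * x) / 10 ** n   # a rational number

for n in range(1, 8):
    assert 0 <= x - y(n) < 10 ** (-n)          # d(x, y_n) -> 0
```

The same truncation idea, applied to 𝜋, gives the sequence of rationals used in Example 2.4.11.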

Problems. In Problems 2.4.1–2.4.3, let 𝑎𝑛 and 𝑏𝑛 be sequences in 𝐂.

2.4.1. (Proves Theorem 2.4.5) Prove that if lim_{𝑛→∞} 𝑎𝑛 = 𝐿 ≠ 0, then there exists some real number 𝐾 such that if 𝑛 > 𝐾, then |𝑎𝑛| > |𝐿|/2.

2.4.2. (Proves Theorem 2.4.6) Prove that if lim_{𝑛→∞} 𝑎𝑛 = 𝐿 and 𝑐 ∈ 𝐂, then lim_{𝑛→∞} 𝑐𝑎𝑛 = 𝑐𝐿.

2.4.3. (Proves Theorem 2.4.6) Prove that if lim_{𝑛→∞} 𝑎𝑛 = 𝐿 and lim_{𝑛→∞} 𝑏𝑛 = 𝑀, then
lim_{𝑛→∞} (𝑎𝑛 + 𝑏𝑛) = 𝐿 + 𝑀.  (2.4.6)

2.4.4. (Proves Theorem 2.4.7) Let 𝑎𝑛 be a sequence in 𝐂 such that lim_{𝑛→∞} 𝑎𝑛 = 𝐿. Prove that if 𝑏𝑘 = 𝑎𝑛𝑘 is a subsequence of 𝑎𝑛, then lim_{𝑘→∞} 𝑏𝑘 = 𝐿.


2.4.5. (Proves Theorem 2.4.10) Prove that if 𝑉 is a closed subset of 𝐂 (Definition 2.4.9) and 𝑎𝑛 is a sequence in 𝑉 such that lim_{𝑛→∞} 𝑎𝑛 = 𝐿, then 𝐿 ∈ 𝑉.

2.4.6. (Proves Theorem 2.4.10) Prove that the following subsets of 𝐂 are closed (Definition 2.4.9):
(a) the real line 𝐑 (i.e., the 𝑥-axis in the complex plane),
(b) for a fixed 𝐾 > 0, the set of all 𝑥 ∈ 𝐑 such that 𝑥 ≥ 𝐾,
(c) for a fixed 𝑧 ∈ 𝐂 and 𝑟 > 0, the closed disc 𝒩̄𝑟(𝑧) (i.e., the set of all 𝑤 ∈ 𝐂 such that |𝑤 − 𝑧| ≤ 𝑟),
(d) for 𝑎 < 𝑏, the real interval [𝑎, 𝑏],
(e) for 𝑏1 < 𝑏2 and 𝑐1 < 𝑐2, the set of all 𝑥 + 𝑦𝑖 ∈ 𝐂 such that 𝑥 ∈ [𝑏1, 𝑏2] and 𝑦 ∈ [𝑐1, 𝑐2].

2.4.7. (Proves Theorem 2.4.12) Let 𝑧𝑛 = 𝑥𝑛 + 𝑦𝑛𝑖 be a complex sequence with real and imaginary parts 𝑥𝑛 and 𝑦𝑛, respectively, and let 𝐿 = 𝑎 + 𝑏𝑖 ∈ 𝐂 have real and imaginary parts 𝑎 and 𝑏, respectively.
(a) Prove that if |𝑧𝑛 − 𝐿| < 𝜖, then |𝑥𝑛 − 𝑎| < 𝜖 and |𝑦𝑛 − 𝑏| < 𝜖.
(b) Prove that if lim_{𝑛→∞} 𝑧𝑛 = 𝐿, then lim_{𝑛→∞} 𝑥𝑛 = 𝑎 and lim_{𝑛→∞} 𝑦𝑛 = 𝑏.
(c) Prove that if |𝑥𝑛 − 𝑎| < 𝜖/2 and |𝑦𝑛 − 𝑏| < 𝜖/2, then |𝑧𝑛 − 𝐿| < 𝜖.
(d) Prove that if lim_{𝑛→∞} 𝑥𝑛 = 𝑎 and lim_{𝑛→∞} 𝑦𝑛 = 𝑏, then lim_{𝑛→∞} 𝑧𝑛 = 𝐿.

2.4.8. (Proves Theorem 2.4.13) Let 𝑆 be a nonempty subset of 𝐑, and let 𝑢 be an upper bound for 𝑆. Given Theorem 2.1.5, the following suffices to obtain Theorem 2.4.13.
(a) Suppose that for every 𝜖 > 0, there exists some 𝑠 ∈ 𝑆 such that 𝑢 − 𝑠 < 𝜖. Prove that there exists a sequence 𝑥𝑛 in 𝑆 such that lim_{𝑛→∞} 𝑥𝑛 = 𝑢.
(b) Now suppose that 𝑢 ≠ sup 𝑆 and 𝑥𝑛 is a convergent sequence in 𝑆. Prove that lim_{𝑛→∞} 𝑥𝑛 < 𝑢.

2.4.9. (Proves Theorem 2.4.14) Let 𝑎𝑛 be a real-valued increasing sequence that is bounded above, let 𝑆 = {𝑎𝑛} (the set of all values attained by 𝑎𝑛), and let 𝑢 = sup 𝑆. Prove that lim_{𝑛→∞} 𝑎𝑛 = 𝑢.

2.4.10. (Proves Lemma 2.4.16) Let 𝑥𝑛 be a sequence in a metric space 𝑋, and suppose that for some 𝐿 ∈ 𝑋, there exists a real-valued sequence 𝑑𝑛 such that 𝑑(𝑥𝑛, 𝐿) < 𝑑𝑛 for all 𝑛 and lim_{𝑛→∞} 𝑑𝑛 = 0. Prove that lim_{𝑛→∞} 𝑥𝑛 = 𝐿.

2.4.11. (Proves Theorem 2.4.17) Prove that the conditions of Theorem 2.4.17 are equivalent.

2.4.12. Prove that 𝐐 is a dense subset of the metric space 𝐑.


2.5 Completeness in metric spaces

As mentioned in Section 2.3, to study the analysis of complex-valued functions, we need a replacement for the idea of order completeness, and here it is.

Definition 2.5.1. Let 𝑎𝑛 be a complex-valued sequence. To say that 𝑎𝑛 is Cauchy means that for every 𝜖 > 0, there exists some 𝑁(𝜖) ∈ 𝐑 such that if 𝑛, 𝑘 > 𝑁(𝜖), then |𝑎𝑛 − 𝑎𝑘| < 𝜖. More generally, if 𝑎𝑛 is a sequence in a metric space 𝑋, to say that 𝑎𝑛 is Cauchy means that for every 𝜖 > 0, there exists some 𝑁(𝜖) ∈ 𝐑 such that if 𝑛, 𝑘 > 𝑁(𝜖), then 𝑑(𝑎𝑛, 𝑎𝑘) < 𝜖.

Now, on the one hand, any convergent sequence is Cauchy; in fact, this holds in any metric space.

Theorem 2.5.2. Let 𝑎𝑛 be a convergent sequence in a metric space 𝑋. Then 𝑎𝑛 must be Cauchy.

Proof. If lim_{𝑛→∞} 𝑎𝑛 = 𝐿, then by definition, for any 𝜖𝑎 > 0, there exists some 𝑁𝑎(𝜖𝑎) ∈ 𝐑 such that if 𝑛 > 𝑁𝑎(𝜖𝑎), then 𝑑(𝑎𝑛, 𝐿) < 𝜖𝑎. For 𝜖 > 0, let 𝑁(𝜖) = 𝑁𝑎(𝜖/2). Then for 𝑛, 𝑘 > 𝑁(𝜖), by the triangle inequality,
𝑑(𝑎𝑛, 𝑎𝑘) ≤ 𝑑(𝑎𝑛, 𝐿) + 𝑑(𝐿, 𝑎𝑘) < 𝜖/2 + 𝜖/2 = 𝜖.

On the other hand, whether every Cauchy sequence converges depends on the space in question, which leads to the following definition.

Definition 2.5.4. To say that a metric space 𝑋 is complete means that every Cauchy sequence in 𝑋 converges to a limit in 𝑋.

The completeness of 𝐑 rests on the following two results.

Lemma 2.5.5. Let 𝑎𝑛 be a Cauchy sequence in 𝐂. Then 𝑎𝑛 is bounded.

Proof. Problem 2.5.1.

Theorem 2.5.6 (Bolzano-Weierstrass in 𝐑). Every bounded sequence in 𝐑 has a convergent subsequence.

Proof. Problem 2.5.5.

Theorem 2.5.7. The real numbers are a complete metric space.

Proof. Let 𝑎𝑛 be a Cauchy sequence in 𝐑; by definition, for every 𝜖𝑎 > 0, there exists some 𝑁𝑎(𝜖𝑎) ∈ 𝐑 such that if 𝑛, 𝑘 > 𝑁𝑎(𝜖𝑎), then |𝑎𝑛 − 𝑎𝑘| < 𝜖𝑎. By Lemma 2.5.5, 𝑎𝑛 is bounded, so by Bolzano-Weierstrass (Theorem 2.5.6), there exists some subsequence 𝑎𝑛𝑘 such that lim_{𝑘→∞} 𝑎𝑛𝑘 = 𝐿 for some 𝐿 ∈ 𝐑, which means that there exists some 𝑁𝑏(𝜖𝑏) ∈ 𝐑 such that if 𝑘 > 𝑁𝑏(𝜖𝑏), then |𝑎𝑛𝑘 − 𝐿| < 𝜖𝑏.

Given 𝜖 > 0, let 𝑁(𝜖) = max(𝑁𝑎(𝜖/2), 𝑁𝑏(𝜖/2)), and suppose that 𝑚 > 𝑁(𝜖). Then choosing any 𝑘 > 𝑚 and using the fact that 𝑛𝑘 ≥ 𝑘 > 𝑁𝑎(𝜖/2), we see that
|𝑎𝑚 − 𝐿| ≤ |𝑎𝑚 − 𝑎𝑛𝑘| + |𝑎𝑛𝑘 − 𝐿| < 𝜖/2 + 𝜖/2 = 𝜖.  (2.5.2)
The theorem follows.

We may then bootstrap the completeness of 𝐑 up to the completeness of 𝐂.

Corollary 2.5.8. The complex numbers are a complete metric space.

Proof. Problem 2.5.2.

Alternatively, we may prove Corollary 2.5.8 by first proving that Bolzano-Weierstrass holds for sequences in 𝐂, a result of independent interest.

Theorem 2.5.9 (Bolzano-Weierstrass in 𝐂). Every bounded sequence in 𝐂 has a convergent subsequence.

Proof. Problem 2.5.3.

We may then use the argument from Theorem 2.5.7 to obtain Corollary 2.5.8; see Problem 2.5.4.

Remark 2.5.10. Finally, we note that even though we have used Bolzano-Weierstrass to prove that 𝐑 and 𝐂 are complete metric spaces, Bolzano-Weierstrass may not hold even in a complete metric space; see Example 7.2.21.
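As a quick numerical illustration of Theorem 2.5.2 (ours, not from the text): for the convergent sequence 𝑎𝑛 = 1/𝑛, an explicit threshold witnessing the Cauchy condition is 𝑁(𝜖) = 2/𝜖, since |𝑎𝑛 − 𝑎𝑘| ≤ 1/𝑛 + 1/𝑘 < 𝜖/2 + 𝜖/2 for 𝑛, 𝑘 > 𝑁(𝜖).

```python
# a_n = 1/n converges (to 0), hence is Cauchy: if n, k > N = 2/eps,
# then |a_n - a_k| <= 1/n + 1/k < eps/2 + eps/2 = eps.

def a(n):
    return 1 / n

eps = 1e-3
N = 2 / eps
for n in [2001, 5000]:
    for k in [2500, 100_000]:
        assert n > N and k > N
        assert abs(a(n) - a(k)) < eps
```

Note that the same estimate works for any 𝜖 > 0 by adjusting 𝑁; the code merely checks one choice of 𝜖 at a few sample indices.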


Problems.

2.5.1. (Proves Lemma 2.5.5) Let 𝑎𝑛 be a Cauchy sequence in 𝐂. Prove that 𝑎𝑛 is bounded.

2.5.2. (Proves Corollary 2.5.8) Let 𝑧𝑛 be a Cauchy sequence in 𝐂, and let 𝑎𝑛 and 𝑏𝑛 be the real and imaginary parts of 𝑧𝑛 (i.e., 𝑧𝑛 = 𝑎𝑛 + 𝑏𝑛𝑖).
(a) Prove that 𝑎𝑛 and 𝑏𝑛 are (real) Cauchy sequences.
(b) Prove that 𝑧𝑛 converges to some limit 𝐿 ∈ 𝐂.

2.5.3. (Proves Theorem 2.5.9) Let 𝑧𝑛 be a sequence in 𝐂, and let 𝑎𝑛 and 𝑏𝑛 be the real and imaginary parts of 𝑧𝑛 (i.e., 𝑧𝑛 = 𝑎𝑛 + 𝑏𝑛𝑖). Assume that 𝑧𝑛 is bounded, i.e., that there exists some 𝑀 ∈ 𝐑 such that |𝑧𝑛| < 𝑀 for all 𝑛.
(a) Prove that 𝑎𝑛 and 𝑏𝑛 are bounded.
(b) Prove that there exists some subsequence 𝑧𝑛𝑘 such that lim_{𝑘→∞} 𝑧𝑛𝑘 = 𝐿 for some 𝐿 ∈ 𝐂.

2.5.4. (Proves Corollary 2.5.8) Let 𝑧𝑛 be a Cauchy sequence in 𝐂. Use Problem 2.5.3 to prove that 𝑧𝑛 converges to some limit 𝐿 ∈ 𝐂.

2.5.5. (*) (Proves Theorem 2.5.6) Let 𝑐𝑛 be a sequence in the closed interval [𝑎0, 𝑏0].
(a) Suppose an interval [𝑎, 𝑏] contains 𝑐𝑛 for infinitely many 𝑛. Explain why at least one of the closed intervals [𝑎, (𝑎 + 𝑏)/2] or [(𝑎 + 𝑏)/2, 𝑏] must contain 𝑐𝑛 for infinitely many 𝑛.
(b) Starting with our given interval [𝑎0, 𝑏0], prove inductively that there exists a sequence of closed subintervals [𝑎𝑘, 𝑏𝑘] such that [𝑎𝑘+1, 𝑏𝑘+1] ⊆ [𝑎𝑘, 𝑏𝑘], 𝑏𝑘+1 − 𝑎𝑘+1 = (𝑏𝑘 − 𝑎𝑘)/2, and [𝑎𝑘+1, 𝑏𝑘+1] contains 𝑐𝑛 for infinitely many 𝑛.
(c) Prove that there exists a subsequence 𝑐𝑛𝑘 such that 𝑐𝑛𝑘 ∈ [𝑎𝑘+1, 𝑏𝑘+1]. (Note that by definition of subsequence, we need the subscripts 𝑛𝑘 to be a strictly increasing sequence.)
(d) Prove that lim_{𝑘→∞} 𝑐𝑛𝑘 exists and is contained in [𝑎0, 𝑏0].
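The bisection strategy of Problem 2.5.5 can be run numerically (a sketch of ours, not from the text, and only an approximation: "contains infinitely many terms" is replaced by "contains most of a finite sample of terms"). The midpoints of the nested halves home in on a cluster point of the sequence.

```python
# Bisection sketch of Problem 2.5.5: repeatedly keep a half-interval of
# [a, b] containing the majority of the sampled terms of the sequence.

def cluster_point(seq, a, b, n_terms=5000, steps=30):
    values = [seq(n) for n in range(n_terms)]   # finite sample of the sequence
    for _ in range(steps):
        mid = (a + b) / 2
        left = [v for v in values if a <= v <= mid]
        right = [v for v in values if mid < v <= b]
        if len(left) >= len(right):
            values, b = left, mid               # keep the left half
        else:
            values, a = right, mid              # keep the right half
    return (a + b) / 2

seq = lambda n: (-1) ** n * n / (n + 1)   # bounded in [-1, 1], clusters at +/-1
c = cluster_point(seq, -1.0, 1.0)
assert abs(abs(c) - 1.0) < 1e-3           # the bisection homes in near a cluster point
```

This is only a heuristic stand-in for the proof: with finitely many sampled terms, "majority" can differ from "infinitely many," so the result is an approximate cluster point rather than an exact one.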

2.6 The topology of metric spaces

The material in this section will not be used in the rest of this book. However, we have already encountered, and will continue to use, special cases of some of the main ideas (e.g., closed subsets and dense subsets), so we present this digression for the reader who is interested in further context for those ideas. We begin with the following terminology.

Definition 2.6.1. Let 𝑋 be a metric space, and fix 𝑥 ∈ 𝑋. For a real number 𝑟 > 0, the open 𝑟-neighborhood of 𝑥, written 𝒩𝑟(𝑥), is defined to be
𝒩𝑟(𝑥) = {𝑦 ∈ 𝑋 ∣ 𝑑(𝑥, 𝑦) < 𝑟};  (2.6.1)
and for 𝑟 ≥ 0, the closed 𝑟-neighborhood of 𝑥, written 𝒩̄𝑟(𝑥), is defined to be
𝒩̄𝑟(𝑥) = {𝑦 ∈ 𝑋 ∣ 𝑑(𝑥, 𝑦) ≤ 𝑟}.  (2.6.2)

Recall that in the special case 𝑋 = 𝐂 from Definition 2.4.8, 𝒩𝑟(𝑥) is called the open disc of radius 𝑟 around 𝑥, and 𝒩̄𝑟(𝑥) is called the closed disc of radius 𝑟 around 𝑥.


Let 𝑋 be a metric space. The following ideas define what is known as the topology of 𝑋. (See Definition 2.6.9 and the material afterwards for a brief discussion of topology in general.)

Definition 2.6.2. We define an open subset 𝑈 of 𝑋 to be a set such that, for every 𝑥 ∈ 𝑈, there exists some 𝜖 > 0 such that 𝑥 ∈ 𝒩𝜖(𝑥) ⊆ 𝑈.

Definition 2.6.3. We define a closed subset 𝑉 of 𝑋 to be a set such that, for every convergent sequence 𝑥𝑛 in 𝑉, lim_{𝑛→∞} 𝑥𝑛 ∈ 𝑉.

The above definitions are complementary (pun intended) in the following precise sense.

Theorem 2.6.4. Let 𝑋 be a metric space. Then 𝑈 ⊆ 𝑋 is open if and only if its complement 𝑋 − 𝑈 is closed.

Proof. Problems 2.6.1 and 2.6.2.

It follows that Definitions 2.4.9 and 2.6.3 are consistent in the case 𝑋 = 𝐂. (Compare Theorem 2.4.10.)

Example 2.6.5. As one might hope, for a metric space 𝑋 and 𝑥 ∈ 𝑋, the open 𝑟-neighborhood 𝒩𝑟(𝑥) (for 𝑟 > 0) is an open subset of 𝑋, and the closed 𝑟-neighborhood 𝒩̄𝑟(𝑥) (for 𝑟 ≥ 0) is a closed subset of 𝑋 (Problems 2.6.3 and 2.6.4).

As discussed in Section 2.5, the Bolzano-Weierstrass Theorem (Theorem 2.5.9) is one of the key results of analysis. It can be generalized to a more abstract setting using the following idea.

Definition 2.6.6. To say that a subset 𝐶 of a metric space 𝑋 is compact means that every sequence in 𝐶 has a subsequence that converges to an element of 𝐶.

In those terms, we have the following corollary of Theorem 2.5.9.

Corollary 2.6.7 (Generalized Bolzano-Weierstrass). Every closed and bounded subset of 𝐂 is compact.

Proof. Problem 2.6.5.

Remark 2.6.8. Note that while Corollary 2.6.7 gives valuable insight into what we might expect a compact set to be like, the reader may be wondering why it is only stated for subsets of 𝐂 and not for metric spaces in general. The perhaps surprising reason is that Corollary 2.6.7 does not hold in a general metric space! See Example 7.2.21.

In the rest of this section, we digress from our previous digression to give a very brief taste of point-set topology. We begin by observing that if 𝑋 is a metric space, then the collection of open subsets of 𝑋 has the following properties:

• Both the empty set ∅ and 𝑋 are open.


• If {𝑈𝛼} is a (possibly uncountable) collection of open subsets of 𝑋, then the union ⋃ 𝑈𝛼 is also an open subset of 𝑋 (Problem 2.6.6).

• If {𝑈1, …, 𝑈𝑛} is a finite collection of open subsets of 𝑋, then the intersection 𝑈1 ∩ ⋯ ∩ 𝑈𝑛 is also an open subset of 𝑋 (Problem 2.6.7).

We can generalize the above observations in the following abstraction.

Definition 2.6.9. A topology on an arbitrary set 𝑋 is a choice of a collection 𝒯 of subsets of 𝑋, called the open subsets of 𝑋, such that the following three axioms hold.

(1) Both ∅ and 𝑋 are in 𝒯.

(2) If {𝑈𝛼} is a (possibly uncountable) subcollection of 𝒯, then the union ⋃ 𝑈𝛼 is also in 𝒯.

(3) If {𝑈1, …, 𝑈𝑛} ⊆ 𝒯, then the intersection 𝑈1 ∩ ⋯ ∩ 𝑈𝑛 is also in 𝒯.
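The three axioms of Definition 2.6.9 can be checked mechanically when 𝑋 is a small finite set. A hedged Python sketch (the example collection 𝒯 is ours, not from the text):

```python
# Sketch of Definition 2.6.9 on a small finite set (our example): check the
# three topology axioms for a candidate collection T of subsets of X.  For a
# finite collection, closure under pairwise unions and intersections is
# equivalent to closure under arbitrary unions and finite intersections.
X = frozenset({1, 2, 3})
T = {frozenset(), frozenset({1}), frozenset({1, 2}), X}

def is_topology(X, T):
    if frozenset() not in T or X not in T:
        return False            # axiom (1)
    for A in T:
        for B in T:
            if (A | B) not in T:
                return False    # axiom (2), pairwise
            if (A & B) not in T:
                return False    # axiom (3), pairwise
    return True

assert is_topology(X, T)
# {1} and {2} without their union {1, 2} violates axiom (2):
assert not is_topology(X, {frozenset(), frozenset({1}), frozenset({2}), X})
```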

The closed subsets of 𝑋 are then defined to be those sets whose complements are open.

The subject of point-set topology studies how ideas like compactness and dense subspaces can be developed using the above axiomatic properties, instead of using (say) a metric. For example, the following is the point-set version of the definition of compactness.

Definition 2.6.10. Let 𝐶 be a subset of a metric space 𝑋. An open cover of 𝐶 is a (possibly uncountable) collection {𝑈𝛼} of open subsets of 𝑋 such that 𝐶 ⊆ ⋃ 𝑈𝛼, and a finite subcover of an open cover {𝑈𝛼} of 𝐶 is a subcollection {𝑈1, …, 𝑈𝑛} of {𝑈𝛼} such that {𝑈1, …, 𝑈𝑛} also covers 𝐶. To say that 𝐶 is point-set compact means that every open cover of 𝐶 has a finite subcover.

The reader should be warned that our original definition of compactness (Definition 2.6.6) is usually called sequential compactness, and point-set compactness (Definition 2.6.10) is usually just called compactness. We justify our usage by the fact that we only really need sequential compactness and the fact that for subsets of 𝐂, we have the following equivalence.

Theorem 2.6.11. Let 𝑋 be a subset of 𝐂. Then the following conditions are equivalent:

(1) 𝑋 is point-set compact (Definition 2.6.10).

(2) 𝑋 is sequentially compact (Definition 2.6.6).

(3) 𝑋 is closed and bounded.

Proof. Problems 2.6.8 and 2.6.9 show that (1) implies (3); Corollary 2.6.7 shows that (3) implies (2); and Problem 2.6.10 shows that (2) implies (1).

In fact, it can be shown that:

• In any metric space, sequential compactness and point-set compactness are equivalent; see, for example, Munkres [Mun00, Ch. 3].


• In an arbitrary metric space 𝑋, if 𝐶 ⊆ 𝑋 is sequentially compact, then 𝐶 is closed and bounded (Problem 2.6.11), but the converse is false (see Example 7.2.21).

• In a nonmetric topological space, point-set compactness need not imply sequential compactness, and the converse can fail as well; see Munkres [Mun00, Ch. 3].

Again, we recommend Munkres [Mun00] for the reader interested in learning more about point-set topology.
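To make the failure of "closed and bounded" intuition concrete in the simplest setting, here is a small numeric illustration (ours, not from the text) of a bounded but non-closed set that is not sequentially compact:

```python
# A numeric illustration (ours, not from the text): the sequence a_n = 1/n
# lies in the bounded, non-closed set (0, 1] and converges to 0, so (0, 1]
# is not sequentially compact -- every subsequence also converges to 0,
# which escapes the set.  The closed set [0, 1] does contain the limit.
a = [1 / n for n in range(1, 10001)]
limit = 0.0
assert all(0 < x <= 1 for x in a)   # the sequence stays in (0, 1]
assert abs(a[-1] - limit) < 1e-3    # terms approach the limit 0
assert not (0 < limit <= 1)         # but 0 is not a point of (0, 1]
assert 0 <= limit <= 1              # while [0, 1] contains it
```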

Problems. In Problems 2.6.1–2.6.4, we use open and closed in the sense of Definitions 2.6.2 and 2.6.3.

2.6.1. (*) (Proves Theorem 2.6.4) Let 𝑋 be a metric space and 𝑈 a subset of 𝑋. Prove that if 𝑈 is not open, then 𝑉 = 𝑋 − 𝑈 is not closed.

2.6.2. (*) (Proves Theorem 2.6.4) Let 𝑋 be a metric space and 𝑉 a subset of 𝑋. Prove that if 𝑉 is not closed, then 𝑈 = 𝑋 − 𝑉 is not open.

2.6.3. (*) Let 𝑋 be a metric space, 𝑥 ∈ 𝑋, and 𝑟 ≥ 0 a real number. Prove that the open 𝑟-neighborhood 𝒩𝑟(𝑥) (Definition 2.6.1) is an open subset of 𝑋.

2.6.4. (*) Let 𝑋 be a metric space, 𝑥 ∈ 𝑋, and 𝑟 ≥ 0 a real number. Prove that the closed 𝑟-neighborhood 𝒩𝑟(𝑥) (Definition 2.6.1) is a closed subset of 𝑋.

2.6.5. (*) (Proves Corollary 2.6.7) Prove that if 𝐶 is a closed and bounded subset of 𝐂, then 𝐶 is compact.

2.6.6. (*) Let 𝑋 be a metric space and let {𝑈𝛼} be a (possibly uncountable) collection of open subsets of 𝑋. Prove that the union ⋃ 𝑈𝛼 is also an open subset of 𝑋.

2.6.7. (*) Let 𝑋 be a metric space and let {𝑈1, …, 𝑈𝑛} be a finite collection of open subsets of 𝑋. Prove that the intersection 𝑈1 ∩ ⋯ ∩ 𝑈𝑛 is also an open subset of 𝑋.

2.6.8. (*) (Proves Theorem 2.6.11) Let 𝑋 be a subset of 𝐂, and suppose 𝑎𝑛 is a sequence in 𝑋 such that lim𝑛→∞ 𝑎𝑛 = 𝐿 is not contained in 𝑋. Use Problem 2.6.4 to prove that there exists an open cover of 𝑋 with no finite subcover.

2.6.9. (*) (Proves Theorem 2.6.11) Let 𝑋 be a subset of 𝐂, and suppose 𝑋 is not bounded (Definition 2.4.4). Prove that there exists an open cover of 𝑋 with no finite subcover.

2.6.10. (*) (Proves Theorem 2.6.11) Let 𝑋 be a subset of 𝐂, and suppose that {𝑈𝛼} is an open cover of 𝑋 with no finite subcover. In this problem, we show that 𝑋 is not sequentially compact. (a) Let ℛℬ be the collection of all open 𝑟-neighborhoods 𝒩𝑟(𝑧) such that 𝑟 ∈ 𝐐 and 𝑧 = 𝑥 + 𝑦𝑖 for 𝑥, 𝑦 ∈ 𝐐. Prove that ℛℬ is a countable collection of open sets such that for any open set 𝑈 ⊆ 𝐂 and any 𝑤 ∈ 𝑈, there exists some 𝑉 ∈ ℛℬ such that 𝑤 ∈ 𝑉 ⊆ 𝑈. (b) Prove that there exists a countable subcollection {𝑈𝑛 ∣ 𝑛 ∈ 𝐍} of {𝑈𝛼} such that {𝑈𝑛 ∣ 𝑛 ∈ 𝐍} also covers 𝑋. (We say that {𝑈𝑛 ∣ 𝑛 ∈ 𝐍} is a countable subcover of 𝑋.)


(c) Explain why there must exist a sequence 𝑎𝑛 in 𝑋 such that for each 𝑛 ∈ 𝐍, 𝑎𝑛 ∉ 𝑈1 ∪ ⋯ ∪ 𝑈𝑛 . (d) Prove that if 𝑎𝑛 is a sequence as described in (c), then 𝑎𝑛 has no convergent subsequence whose limit is in 𝑋. 2.6.11. (*) Let 𝑋 be an arbitrary metric space (not necessarily 𝑋 = 𝐂), and suppose that 𝐶 ⊆ 𝑋 is sequentially compact. (a) Prove that 𝐶 must be closed. (b) Prove that 𝐶 must be bounded.
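In the spirit of Problems 2.6.8–2.6.10, the standard cover 𝑈𝑛 = (1/𝑛, 2) of the interval (0, 1] has no finite subcover; a Python sketch of that bookkeeping (our example, not one of the problems above):

```python
import math

# Our standard example, not from the text: the open intervals
# U_n = (1/n, 2) cover (0, 1], but any finite subfamily
# {U_1, ..., U_N} misses every point below 1/N.
def covered_by_first(N, x):
    return any(1 / n < x < 2 for n in range(1, N + 1))

# every x in (0, 1] lies in some U_n once n > 1/x ...
for x in [1.0, 0.3, 1e-6]:
    n = math.floor(1 / x) + 1
    assert 1 / n < x < 2

# ... but for each finite N, the point 1/(2N) is left uncovered
for N in [1, 5, 50]:
    assert not covered_by_first(N, 1 / (2 * N))
```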

3 Complex-valued calculus

But neither Fourier nor anyone else in the early 1820s can prove that Fourier Integrals work for all 𝑓(𝑥)'s, in part because there's still deep confusion in math about how to define the integral. . . but anyway, the reason we're even mentioning the F.I. problem is that A.-L. Cauchy's work on it leads him to most of the quote-unquote rigorizing of analysis he gets credit for, some of which rigor involves defining the integral as 'the limit of a sum' but most (= most of the rigor) concerns the convergence problems mentioned in (b) and its little [Quick Embedded Interpolation] in the Differential Equations part of [Emergency Glossary II], specifically as those problems pertain to Fourier Series∗.

* — There's really nothing to be done about the preceding sentence except apologize.

— David Foster Wallace, Everything and More

Having established the fundamentals of real and complex numbers in Chapter 2, in this chapter, we redo the march through calculus from Analysis I, from continuity and limits (Section 3.1), to differentiation (Section 3.2), to integration (Sections 3.3 and 3.4), and finishing with the Fundamental Theorems of Calculus (Section 3.5). The attentive reader should notice that even though our primary concern is functions 𝑓 ∶ 𝐑 → 𝐂 (i.e., real inputs and complex outputs), in this chapter and the next, we will often be as general as possible without extra effort. Specifically, for continuity, limits, and differentiation, we consider functions 𝑓 ∶ 𝐂 → 𝐂, or more generally, 𝑓 ∶ 𝑋 → 𝐂 for some 𝑋 ⊆ 𝐂, as functions of a complex variable can be treated just like functions of a real variable in those respects. (In fact, we will occasionally find the extra generality to be useful.) On the other hand, when we turn to integration, we see that the definition of the Riemann integral is essentially a matter of functions 𝑓 ∶ 𝐑 → 𝐑,


though by handling real and imaginary parts separately, we may extend integration to functions 𝑓 ∶ 𝐑 → 𝐂, as desired.

3.1 Continuity and limits

We first note the following convention.

Convention 3.1.1. Throughout this chapter and the next, we will use 𝑓(𝑧) to denote a function of a complex variable 𝑧 and 𝑓(𝑥) to denote either a function of a real variable 𝑥 or a function whose domain is an arbitrary metric space 𝑋.

Continuity and limits generalize from real-valued functions of a real variable to complex-valued functions of a complex variable almost without change. We begin with the definition of continuity.

Definition 3.1.2. Let 𝑋 be a nonempty subset of 𝐂, let 𝑓 ∶ 𝑋 → 𝐂 be a function, and let 𝑎 be a point in 𝑋. To say that 𝑓 is continuous at 𝑎 means that one of the following conditions holds:

• (Sequential continuity) For every sequence 𝑧𝑛 in 𝑋 such that lim𝑛→∞ 𝑧𝑛 = 𝑎, we have that lim𝑛→∞ 𝑓(𝑧𝑛) = 𝑓(𝑎).

• (𝜖-𝛿 continuity) For every 𝜖 > 0, there exists some 𝛿(𝜖) > 0 such that if |𝑧 − 𝑎| < 𝛿(𝜖), then |𝑓(𝑧) − 𝑓(𝑎)| < 𝜖.

To say that 𝑓 is continuous on 𝑋 means that 𝑓 is continuous at 𝑎 for all 𝑎 ∈ 𝑋.

Note that Definition 3.1.2 uses only the metric properties of 𝐑 and 𝐂 and not any other features. We may therefore generalize Definition 3.1.2 to arbitrary metric spaces:

Definition 3.1.3. Let 𝑋 and 𝑌 be metric spaces, let 𝑓 ∶ 𝑋 → 𝑌 be a function, and let 𝑎 be a point in 𝑋. To say that 𝑓 is continuous at 𝑎 means that one of the following conditions holds:

• (Sequential continuity) For every sequence 𝑥𝑛 in 𝑋 such that lim𝑛→∞ 𝑥𝑛 = 𝑎, we have that lim𝑛→∞ 𝑓(𝑥𝑛) = 𝑓(𝑎).

• (𝜖-𝛿 continuity) For every 𝜖 > 0, there exists some 𝛿(𝜖) > 0 such that if 𝑑(𝑥, 𝑎) < 𝛿(𝜖), then 𝑑(𝑓(𝑥), 𝑓(𝑎)) < 𝜖.

To say that 𝑓 is continuous on 𝑋 means that 𝑓 is continuous at 𝑎 for all 𝑎 ∈ 𝑋. Note that the sequential continuity of 𝑓 ∶ 𝑋 → 𝑌 is equivalent to saying that

lim𝑛→∞ 𝑓(𝑥𝑛) = 𝑓(lim𝑛→∞ 𝑥𝑛)    (3.1.1)

for every convergent sequence 𝑥𝑛 in 𝑋.
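The 𝜖-𝛿 clause of Definition 3.1.2 asks for an explicit function 𝛿(𝜖). For 𝑓(𝑧) = 𝑧² near a fixed 𝑎, one workable (if non-optimal) choice is 𝛿(𝜖) = min(1, 𝜖/(2|𝑎| + 1)); the following Python sketch (our construction, not the text's) spot-checks it at random points:

```python
import random

# Hedged numeric sketch for Definition 3.1.2 (our construction): for
# f(z) = z^2 near a fixed a, the choice delta(eps) = min(1, eps/(2|a| + 1))
# works, since |z - a| < 1 gives |z + a| <= |z - a| + 2|a| < 2|a| + 1, so
# |z^2 - a^2| = |z - a| |z + a| < delta * (2|a| + 1) <= eps.
def delta(eps, a):
    return min(1.0, eps / (2 * abs(a) + 1))

random.seed(0)
a = 1.5 + 0.5j
for eps in [1.0, 0.1, 0.01]:
    d = delta(eps, a)
    for _ in range(1000):
        z = a + 0.5 * d * complex(random.uniform(-1, 1), random.uniform(-1, 1))
        assert abs(z - a) < d            # z lies in the delta-neighborhood
        assert abs(z * z - a * a) < eps  # so f(z) is within eps of f(a)
```

Random sampling only spot-checks the inequality, of course; the comment's triangle-inequality estimate is what proves it for all 𝑧.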


Now, it would be perverse to have two ideas called "continuity" (sequential and 𝜖-𝛿) if they were not equivalent, and indeed they are:

Theorem 3.1.4. Let 𝑋 and 𝑌 be metric spaces, let 𝑓 ∶ 𝑋 → 𝑌 be a function, and let 𝑎 be a point in 𝑋. Then 𝑓 is sequentially continuous at 𝑎 if and only if 𝑓 is 𝜖-𝛿 continuous at 𝑎. In particular, sequential continuity is equivalent to 𝜖-𝛿 continuity for complex-valued functions on subsets of 𝐑.

Proof. On the one hand, suppose 𝑓 is 𝜖-𝛿 continuous at 𝑎 and 𝑥𝑛 is a sequence in 𝑋 such that lim𝑛→∞ 𝑥𝑛 = 𝑎. By the definition of 𝜖-𝛿 continuity, for every 𝜖 > 0, there exists some 𝛿(𝜖) > 0 such that if 𝑑(𝑥, 𝑎) < 𝛿(𝜖), then 𝑑(𝑓(𝑥), 𝑓(𝑎)) < 𝜖; and by the definition of convergent sequence, there exists some 𝑁𝑥(𝜖𝑥) such that if 𝑛 > 𝑁𝑥(𝜖𝑥), then 𝑑(𝑥𝑛, 𝑎) < 𝜖𝑥. So, given 𝜖 > 0, let 𝑁 = 𝑁(𝜖) = 𝑁𝑥(𝛿(𝜖)). Then given 𝑛 > 𝑁 = 𝑁𝑥(𝛿(𝜖)), we have 𝑑(𝑥𝑛, 𝑎) < 𝛿(𝜖); and because 𝑑(𝑥𝑛, 𝑎) < 𝛿(𝜖), 𝑑(𝑓(𝑥𝑛), 𝑓(𝑎)) < 𝜖.

Conversely, suppose 𝑓 is not 𝜖-𝛿 continuous at 𝑎, or in other words:

(*) There exists 𝜖0 > 0 such that for every 𝛿 > 0, there exists 𝑥 ∈ 𝑋 such that 𝑑(𝑥, 𝑎) < 𝛿 and 𝑑(𝑓(𝑥), 𝑓(𝑎)) ≥ 𝜖0.

We construct a sequence 𝑥𝑛 that violates sequential continuity (a "bad sequence") as follows. For each 𝑛 ∈ 𝐍, by (*), we may choose 𝑥𝑛 ∈ 𝑋 such that 𝑑(𝑥𝑛, 𝑎) < 1/𝑛 and 𝑑(𝑓(𝑥𝑛), 𝑓(𝑎)) ≥ 𝜖0. Then lim𝑛→∞ 𝑥𝑛 = 𝑎, by the Metric Squeeze Lemma (Lemma 2.4.16), but the sequence 𝑓(𝑥𝑛) never comes within 𝜖0 of 𝑓(𝑎), so 𝑓(𝑥𝑛) cannot converge to 𝑓(𝑎). The theorem follows.

We will find both the sequential and 𝜖-𝛿 versions of continuity to be useful for different purposes. For example, 𝜖-𝛿 continuity will later be useful when considering the properties of a continuous function over an entire interval. Conversely, sequential continuity makes the proof of the algebraic properties of continuity straightforward:

Theorem 3.1.5. Let 𝑋 be a subset of 𝐂, let 𝑓, 𝑔 ∶ 𝑋 → 𝐂 be functions, and for some 𝑎 ∈ 𝑋, suppose that 𝑓 and 𝑔 are continuous at 𝑎. Then:

(1) For 𝑐 ∈ 𝐂, 𝑐𝑓(𝑧) is continuous at 𝑎.

(2) 𝑓(𝑧) + 𝑔(𝑧) is continuous at 𝑎.

(3) 𝑓(𝑧)̄ (the complex conjugate of 𝑓(𝑧)) is continuous at 𝑎.

(4) 𝑓(𝑧)𝑔(𝑧) is continuous at 𝑎.

(5) If 𝑔(𝑧) ≠ 0 for all 𝑧 ∈ 𝑋, then 𝑓(𝑧)/𝑔(𝑧) is continuous at 𝑎.

Proof. This essentially follows from Theorem 2.4.6. We give the details for property (4) as an example and leave the rest for Problem 3.1.1.

For 𝑋 ⊆ 𝐂, functions 𝑓, 𝑔 ∶ 𝑋 → 𝐂, and 𝑎 ∈ 𝑋, suppose that 𝑓 and 𝑔 are continuous at 𝑎, and suppose 𝑧𝑛 is a sequence in 𝑋 such that lim𝑛→∞ 𝑧𝑛 = 𝑎. By the definition of sequential continuity, lim𝑛→∞ 𝑓(𝑧𝑛) = 𝑓(𝑎) and lim𝑛→∞ 𝑔(𝑧𝑛) = 𝑔(𝑎), which means that by Theorem 2.4.6, part (4), lim𝑛→∞ 𝑓(𝑧𝑛)𝑔(𝑧𝑛) = 𝑓(𝑎)𝑔(𝑎). Property (4) then follows by definition of sequential continuity.
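The sequential proof of property (4) can be mimicked numerically; in this sketch (our choice of 𝑓, 𝑔, and the sequence 𝑧𝑛, not the text's) the products 𝑓(𝑧𝑛)𝑔(𝑧𝑛) track 𝑓(𝑎)𝑔(𝑎):

```python
# Sequential-continuity sketch for Theorem 3.1.5(4) (our choice of functions):
# with f(z) = z and g(z) = z + 1, along any sequence z_n -> a the products
# f(z_n) g(z_n) must approach f(a) g(a) = a (a + 1).
a = 2 + 1j
zn = [a + (0.5 ** n) * (1 + 1j) for n in range(1, 40)]
prod_limit = a * (a + 1)
errors = [abs(z * (z + 1) - prod_limit) for z in zn]
assert errors == sorted(errors, reverse=True)  # errors shrink as z_n -> a
assert errors[-1] < 1e-9
```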


Next, in exactly the same manner as Theorem 2.4.6 implies Theorem 3.1.5, Theorem 2.4.12 implies: Theorem 3.1.6. Let 𝑋 be a subset of 𝐂, let 𝑓 ∶ 𝑋 → 𝐂 be a function, let 𝑢 and 𝑣 be the real and imaginary parts of 𝑓 (i.e., 𝑓(𝑧) = 𝑢(𝑧) + 𝑖𝑣(𝑧)), and let 𝑎 be a point of 𝑋. Then 𝑓 is continuous at 𝑎 if and only if both 𝑢 and 𝑣 are continuous at 𝑎. Sequential continuity is also useful in proving that the composition of continuous functions is continuous, which we do for general metric spaces. Theorem 3.1.7. Let 𝑋, 𝑌 , and 𝑍 be metric spaces, let 𝑓 ∶ 𝑋 → 𝑌 and 𝑔 ∶ 𝑌 → 𝑍 be functions, let 𝑎 be a point in 𝑋, and suppose that 𝑓 is continuous at 𝑎 and 𝑔 is continuous at 𝑓(𝑎). Then 𝑔 ∘ 𝑓 is continuous at 𝑎. Proof. Problem 3.1.2. We briefly pause to note some examples of continuous functions, one concrete, one quite abstract. Example 3.1.8. For 𝑋 = 𝐑 or 𝐂, any polynomial function 𝑝 ∶ 𝑋 → 𝑋 is continuous (Problem 3.1.3). Example 3.1.9. Let 𝑋 be any metric space, and fix a point 𝑎 ∈ 𝑋. Then the function 𝑓 ∶ 𝑋 → 𝐑 given by 𝑓(𝑥) = 𝑑(𝑎, 𝑥) is continuous (Problem 3.1.4). As a special case of Example 3.1.9, we have the following useful fact: Corollary 3.1.10. The function 𝑓 ∶ 𝐂 → 𝐑 given by 𝑓(𝑧) = |𝑧| is continuous. Proof. Problem 3.1.4. We now come to the version of continuity known as uniform continuity, which we will use, for example, to study integration. Definition 3.1.11. Let 𝑋 be a nonempty subset of 𝐂 and let 𝑓 ∶ 𝑋 → 𝐂 be a function. To say that 𝑓 is uniformly continuous on 𝑋 means that for every 𝜖 > 0, there exists some 𝛿(𝜖) > 0 such that if 𝑧, 𝑤 ∈ 𝑋 and |𝑧 − 𝑤| < 𝛿(𝜖), then |𝑓(𝑧) − 𝑓(𝑤)| < 𝜖. Note that to say that 𝑓 is continuous at every 𝑎 ∈ 𝑋 means precisely that: For every 𝑎 ∈ 𝑋 and 𝜖 > 0, there exists some 𝛿(𝜖, 𝑎) > 0 such that if 𝑧 ∈ 𝑋 and |𝑧 − 𝑎| < 𝛿(𝜖, 𝑎), then |𝑓(𝑧) − 𝑓(𝑎)| < 𝜖. Uniform continuity on 𝑋 is therefore generally a stricter condition than continuity on 𝑋, as the “degree of continuity” 𝛿(𝜖, 𝑎) is no longer allowed to vary with 𝑎. 
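The dependence of 𝛿 on 𝑎 is not a technicality: for 𝑓(𝑥) = 1/𝑥 on (0, 1), no single 𝛿(𝜖) works at every point. A numeric illustration (ours, not from the text):

```python
# A numeric illustration (ours): f(x) = 1/x is continuous at each point of
# (0, 1) but not uniformly continuous there -- the pairs x_n = 1/n and
# y_n = 1/(2n) get arbitrarily close while |f(x_n) - f(y_n)| = n grows.
f = lambda x: 1 / x
for n in [10, 100, 1000]:
    x, y = 1 / n, 1 / (2 * n)
    assert abs(x - y) <= 1 / (2 * n) + 1e-12  # inputs squeeze together
    assert abs(f(x) - f(y)) >= 0.999 * n      # outputs stay far apart
```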
However, we have the following result, whose proof is a consequence of Bolzano-Weierstrass: Theorem 3.1.12. If 𝑋 is a closed and bounded subset of 𝐂 and 𝑓 ∶ 𝑋 → 𝐂 is continuous, then 𝑓 is uniformly continuous on 𝑋. Proof. Problem 3.1.5.

3.1. Continuity and limits

35

Remark 3.1.13. For the reader familiar with Section 2.6, we note that since Theorem 3.1.12 really only relies on Bolzano-Weierstrass (see Problem 3.1.5), by Corollary 2.6.7, the theorem can be extended to compact subsets of metric spaces (Definition 2.6.6).

We also have the Extreme Value Theorem, which is most definitely a result on real-valued functions, as the statement makes no sense for complex-valued functions. Before we get to that statement, however, it will be helpful to have the following definition.

Definition 3.1.14. Let 𝑋 be a nonempty subset of 𝐂. To say that 𝑓 ∶ 𝑋 → 𝐂 is bounded means that there exists some 𝑀 > 0 such that |𝑓(𝑥)| ≤ 𝑀 for all 𝑥 ∈ 𝑋.

Theorem 3.1.15 (Extreme Value Theorem). Let 𝑋 be a closed and bounded subset of 𝐂, and let 𝑓 ∶ 𝑋 → 𝐑 be continuous. Then 𝑓 attains both an absolute maximum and an absolute minimum on 𝑋; that is, there exist 𝑐, 𝑑 ∈ 𝑋 such that 𝑓(𝑐) ≤ 𝑓(𝑧) ≤ 𝑓(𝑑) for all 𝑧 ∈ 𝑋.

Proof. We first show that 𝑓 must be bounded. Proceeding by contradiction, suppose that for each 𝑛 ∈ 𝐍 there exists some 𝑧𝑛 ∈ 𝑋 such that |𝑓(𝑧𝑛)| > 𝑛. By Bolzano-Weierstrass (Theorem 2.5.9), there exists a subsequence 𝑧𝑛𝑘 of 𝑧𝑛 such that lim𝑘→∞ 𝑧𝑛𝑘 = 𝐿 ∈ 𝑋. Since 𝑓 is (sequentially) continuous, it follows that lim𝑘→∞ 𝑓(𝑧𝑛𝑘) = 𝑓(𝐿). However, since |𝑓(𝑧𝑛𝑘)| > 𝑛𝑘, the sequence 𝑓(𝑧𝑛𝑘) is unbounded; contradiction. It follows that 𝑓 is bounded. By symmetry, it now suffices to show that 𝑓 has a global maximum at some 𝑑 ∈ 𝑋; see Problem 3.1.6 for the proof.

The Extreme Value Theorem does, however, have the following useful corollary for complex-valued functions.

Corollary 3.1.16. Let 𝑋 be a closed and bounded subset of 𝐂, and let 𝑓 ∶ 𝑋 → 𝐂 be continuous. Then 𝑓 is bounded.

Proof. Problem 3.1.7.

We also state the following important complement to the Extreme Value Theorem.

Theorem 3.1.17 (Intermediate Value Theorem). Let 𝑓 ∶ [𝑎, 𝑏] → 𝐑 be continuous, and suppose that either 𝑓(𝑎) < 𝑑 < 𝑓(𝑏) or 𝑓(𝑎) > 𝑑 > 𝑓(𝑏). Then there exists some 𝑐 ∈ [𝑎, 𝑏] such that 𝑓(𝑐) = 𝑑.

Proof. Problem 3.1.8.

We now come to limits of functions, which we need to define derivatives. Most properties of limits of functions can be defined and proven in almost exactly the same way as the analogous properties of continuous functions, so we leave all proofs to the reader in the problems. First, as the reader may recall from calculus, the most useful case of the limit lim𝑥→𝑎 𝑓(𝑥) is when 𝑎 is not in the domain of 𝑓, which makes the following


definition helpful when dealing with limits:

Definition 3.1.18. For 𝑋 a nonempty subset of 𝐂, to say that 𝑎 is a limit point of 𝑋 means that there exists some sequence 𝑧𝑛 in 𝑋 such that lim𝑛→∞ 𝑧𝑛 = 𝑎 and 𝑧𝑛 ≠ 𝑎 for all 𝑛. (Note that this is possible both for 𝑎 ∈ 𝑋 and 𝑎 ∉ 𝑋.)

Turning to the formal definition of limit, we emphasize in bold the few places where the definition of continuity needs to be changed to get the definition of limit:

Definition 3.1.19. Let 𝑋 be a nonempty subset of 𝐂, let 𝑓 ∶ 𝑋 → 𝐂 be a function, and let 𝑎 be a limit point of 𝑋. To say that lim𝑧→𝑎 𝑓(𝑧) = 𝐿 means that one of the following conditions holds:

• (Sequential limit) For every sequence 𝑧𝑛 in 𝑋 such that lim𝑛→∞ 𝑧𝑛 = 𝑎 and 𝑧𝑛 ≠ 𝑎 for all 𝑛, we have that lim𝑛→∞ 𝑓(𝑧𝑛) = 𝐿.

• (𝜖-𝛿 limit) For every 𝜖 > 0, there exists some 𝛿(𝜖) > 0 such that if |𝑧 − 𝑎| < 𝛿(𝜖) and 𝑧 ≠ 𝑎, then |𝑓(𝑧) − 𝐿| < 𝜖.

Again, the two versions of the definition of limit are equivalent:

Theorem 3.1.20. Let 𝑋 be a nonempty subset of 𝐂, let 𝑓 ∶ 𝑋 → 𝐂 be a function, and let 𝑎 be a limit point of 𝑋. For 𝐿 ∈ 𝐂, the sequential and 𝜖-𝛿 definitions of lim𝑧→𝑎 𝑓(𝑧) = 𝐿 are equivalent.

Proof. Problem 3.1.9.

Furthermore, note that comparing Definitions 3.1.2 and 3.1.19, we see that 𝑓 is continuous at 𝑎 if and only if lim𝑧→𝑎 𝑓(𝑧) = 𝑓(𝑎), a fact we will find useful later. In any case, we now state the algebraic limit laws and the fact that limits can be broken down to their real and imaginary parts; the proofs of those results are left to the reader.

Theorem 3.1.21. Let 𝑋 be a nonempty subset of 𝐂, let 𝑓, 𝑔 ∶ 𝑋 → 𝐂 be functions, and for some limit point 𝑎 of 𝑋, suppose that lim𝑧→𝑎 𝑓(𝑧) = 𝐿 and lim𝑧→𝑎 𝑔(𝑧) = 𝑀. Then:

(1) For 𝑐 ∈ 𝐂, lim𝑧→𝑎 𝑐𝑓(𝑧) = 𝑐𝐿.

(2) lim𝑧→𝑎 (𝑓(𝑧) + 𝑔(𝑧)) = 𝐿 + 𝑀.

(3) lim𝑧→𝑎 𝑓(𝑧)̄ = 𝐿̄ (complex conjugates).

(4) lim𝑧→𝑎 (𝑓(𝑧)𝑔(𝑧)) = 𝐿𝑀.

(5) If 𝑔(𝑧) ≠ 0 for all 𝑧 ∈ 𝑋 and 𝑀 ≠ 0, then lim𝑧→𝑎 𝑓(𝑧)/𝑔(𝑧) = 𝐿/𝑀.

Proof. Problem 3.1.10.


Theorem 3.1.22. Let 𝑋 be a nonempty subset of 𝐂, let 𝑓 ∶ 𝑋 → 𝐂 be a function, let 𝑢 and 𝑣 be the real and imaginary parts of 𝑓, let 𝐿 = 𝑟 + 𝑠𝑖 (𝑟, 𝑠 ∈ 𝐑), and let 𝑎 be a limit point of 𝑋. Then lim𝑧→𝑎 𝑓(𝑧) = 𝐿 if and only if lim𝑧→𝑎 𝑢(𝑧) = 𝑟 and lim𝑧→𝑎 𝑣(𝑧) = 𝑠.

Proof. Problem 3.1.11.

We will also need the Squeeze Lemma for functions, which the reader may recall from calculus.

Lemma 3.1.23 (Squeeze Lemma). Let 𝑋 be a nonempty subset of 𝐂, let 𝑓, 𝑔, ℎ ∶ 𝑋 → 𝐑 be real-valued functions such that 𝑓(𝑧) ≤ 𝑔(𝑧) ≤ ℎ(𝑧) for all 𝑧 ∈ 𝑋, and for some limit point 𝑎 of 𝑋, suppose

lim𝑧→𝑎 𝑓(𝑧) = 𝐿 = lim𝑧→𝑎 ℎ(𝑧).    (3.1.2)

Then lim𝑧→𝑎 𝑔(𝑧) = 𝐿.

Proof. Problem 3.1.12.

We also have the following result on composition and limits. (Compare Theorem 3.1.7.)

Theorem 3.1.24. Let 𝑋 and 𝑌 be nonempty subsets of 𝐂, let 𝑓 ∶ 𝑋 → 𝑌 and 𝑔 ∶ 𝑌 → 𝐂 be functions, let 𝑎 be a limit point of 𝑋, and suppose that lim𝑧→𝑎 𝑓(𝑧) = 𝑏 ∈ 𝑌 and 𝑔 is continuous at 𝑏. Then lim𝑧→𝑎 𝑔(𝑓(𝑧)) = 𝑔(𝑏).

Proof. Problem 3.1.13.

We conclude this section by defining piecewise properties of functions precisely.

Definition 3.1.25. To say that 𝑓 ∶ [𝑎, 𝑏] → 𝐂 is piecewise continuous means that there exist 𝑎0, …, 𝑎𝑛 ∈ [𝑎, 𝑏] with 𝑎 = 𝑎0 < 𝑎1 < ⋯ < 𝑎𝑛 = 𝑏 such that on each subinterval [𝑎𝑖−1, 𝑎𝑖] (1 ≤ 𝑖 ≤ 𝑛):

(1) the limits lim𝑥→𝑎𝑖−1+ 𝑓(𝑥) and lim𝑥→𝑎𝑖− 𝑓(𝑥), which we denote by 𝑓(𝑎𝑖−1+) and 𝑓(𝑎𝑖−), respectively, both exist; and

(2) the function

𝑓𝑖(𝑥) = { 𝑓(𝑎𝑖−1+)  if 𝑥 = 𝑎𝑖−1;  𝑓(𝑥)  if 𝑎𝑖−1 < 𝑥 < 𝑎𝑖;  𝑓(𝑎𝑖−)  if 𝑥 = 𝑎𝑖 }    (3.1.3)

is continuous on [𝑎𝑖−1, 𝑎𝑖].

Similarly, to say that 𝑓 is piecewise 𝑃 (where later, 𝑃 will be “differentiable” or “Lipschitz”) means that the above holds with “continuous” replaced by 𝑃. In all of these cases, the intervals [𝑎𝑖−1 , 𝑎𝑖 ] are called the intervals of continuity of 𝑓. Remark 3.1.26. For the reader who has gone through the material on point-set topology in Section 2.6, it is worth mentioning how point-set topology allows us to unify the different versions of the Extreme Value Theorem (Theorem 3.1.15 and Corollary 3.1.16).


To start, the point-set definition of continuity is as follows. To say that a function 𝑓 ∶ 𝑋 → 𝑌 between topological spaces is continuous means that for any open subset 𝑈 of 𝑌 , the preimage 𝑓−1 (𝑈) is an open subset of 𝑋. Now, this definition is far from obviously related to either the 𝜖-𝛿 or sequential definitions of continuity (Definition 3.1.3), and the reader may at first be especially puzzled to see preimages playing such a fundamental role here. Nevertheless, Problem 3.1.14 shows that when 𝑋 and 𝑌 are metric spaces, the point-set definition of continuity is equivalent to the metric definition of continuity. Turning to the Extreme Value Theorem, the point-set version is: Extreme Value Theorem (point-set version). Let 𝑓 ∶ 𝑋 → 𝑌 be continuous. If 𝐶 is a compact subset of 𝑋, then 𝑓(𝐶) is a compact subset of 𝑌 . Problem 3.1.15 explains why this generalizes the real-valued Extreme Value Theorem. Unlike the real-valued theorem, however, the point-set version of the Extreme Value Theorem still makes sense when, for example, 𝑋 = 𝑌 = 𝐂. Problem 3.1.16 gives a proof using sequential compactness (Definition 2.6.6), and Problem 3.1.17 gives a proof using point-set compactness (Definition 2.6.10).
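Returning to Definition 3.1.25, a step function shows the role of the one-sided limits 𝑓(𝑎𝑖±). The following Python sketch uses a hypothetical example of ours (𝑎0 = −1, 𝑎1 = 0, 𝑎2 = 1), not one from the text:

```python
# Sketch of Definition 3.1.25 with a hypothetical step function (our example):
# on [-1, 1] take a_0 = -1, a_1 = 0, a_2 = 1 and f = 0 on [-1, 0), f = 1 on
# [0, 1].  The one-sided limits f(0^-) = 0 and f(0^+) = 1 both exist, so f is
# piecewise continuous even though it is not continuous at 0.
def f(x):
    return 0.0 if x < 0 else 1.0

assert all(f(-h) == 0.0 for h in (0.1, 0.01, 0.001))  # f(0^-) = 0
assert all(f(+h) == 1.0 for h in (0.1, 0.01, 0.001))  # f(0^+) = 1
assert f(0) != 0.0  # the value at 0 need not agree with f(0^-)
```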

Problems.

3.1.1. (Proves Theorem 3.1.5) Prove the parts of Theorem 3.1.5 other than property (4).

3.1.2. (Proves Theorem 3.1.7) Let 𝑋, 𝑌 , and 𝑍 be metric spaces, let 𝑓 ∶ 𝑋 → 𝑌 and 𝑔 ∶ 𝑌 → 𝑍 be functions, let 𝑎 be a point in 𝑋, and suppose that 𝑓 is continuous at 𝑎 and 𝑔 is continuous at 𝑓(𝑎). Prove that 𝑔 ∘ 𝑓 is continuous at 𝑎.

3.1.3. This problem proves that any complex polynomial function is continuous. (The real case follows by restriction.) (a) Prove that 𝑓 ∶ 𝐂 → 𝐂 defined by 𝑓(𝑧) = 𝑧 is continuous. (b) Prove that if 𝑝(𝑧) = 𝑎𝑛𝑧𝑛 + ⋯ + 𝑎0 is a polynomial function with coefficients in 𝐂, then 𝑝 ∶ 𝐂 → 𝐂 is continuous.

3.1.4. (Proves Corollary 3.1.10) Let 𝑋 be a metric space. (a) Fix some 𝑎 ∈ 𝑋, and define 𝑓 ∶ 𝑋 → 𝐑 by 𝑓(𝑥) = 𝑑(𝑎, 𝑥). Prove that 𝑓 is continuous on 𝑋, or in other words, prove that for each 𝑏 ∈ 𝑋, 𝑓 is continuous at 𝑏. (b) Prove that the function 𝑓 ∶ 𝐂 → 𝐑 given by 𝑓(𝑧) = |𝑧| is continuous.

3.1.5. (Proves Theorem 3.1.12) Suppose 𝑋 is a closed and bounded subset of 𝐂 and 𝑓 ∶ 𝑋 → 𝐂 is continuous. (a) Prove that if 𝑓 is not uniformly continuous on 𝑋, then there exists a constant 𝜖0 > 0 and sequences 𝑧𝑛 and 𝑤𝑛 in 𝑋 such that |𝑧𝑛 − 𝑤𝑛| < 1/𝑛 and |𝑓(𝑧𝑛) − 𝑓(𝑤𝑛)| ≥ 𝜖0 for all 𝑛. (b) Use contradiction to prove that 𝑓 is uniformly continuous on 𝑋.

3.1.6. (Proves Theorem 3.1.15) Let 𝑋 be a closed and bounded subset of 𝐂, and let 𝑓 ∶ 𝑋 → 𝐑 be continuous. Prove that there exists some 𝑑 ∈ 𝑋 such that 𝑓(𝑧) ≤ 𝑓(𝑑) for all 𝑧 ∈ 𝑋.


3.1.7. (Proves Corollary 3.1.16) Let 𝑋 be a closed and bounded subset of 𝐂, and let 𝑓 ∶ 𝑋 → 𝐂 be continuous. Prove that there exists some 𝑀 > 0 such that |𝑓(𝑧)| < 𝑀 for all 𝑧 ∈ 𝑋.

3.1.8. (Proves Theorem 3.1.17) Let 𝑓 ∶ [𝑎, 𝑏] → 𝐑 be continuous, and suppose 𝑓(𝑎) < 𝑑 < 𝑓(𝑏). (a) Let 𝑐 = sup {𝑥 ∈ [𝑎, 𝑏] ∣ 𝑓(𝑥) ≤ 𝑑}. Prove that 𝑓(𝑐) ≤ 𝑑. (b) Prove that if 𝑓(𝑐) < 𝑑, then there exists some 𝑥 such that 𝑐 < 𝑥 < 𝑏 and 𝑓(𝑥) < 𝑑. Conclude that 𝑓(𝑐) = 𝑑.

3.1.9. Let 𝑋 be a nonempty subset of 𝐑, let 𝑓 ∶ 𝑋 → 𝐂 be a function, let 𝑎 be a limit point of 𝑋, and fix 𝐿 ∈ 𝐂. (a) Suppose lim𝑧→𝑎 𝑓(𝑧) = 𝐿 by the 𝜖-𝛿 definition of limit. Prove that lim𝑧→𝑎 𝑓(𝑧) = 𝐿 by the sequential definition of limit. (b) Now suppose that it is not the case that lim𝑧→𝑎 𝑓(𝑧) = 𝐿 using the 𝜖-𝛿 definition of limit. Prove that it is not the case that lim𝑧→𝑎 𝑓(𝑧) = 𝐿 using the sequential definition of limit.

3.1.10. (Proves Theorem 3.1.21) Prove Theorem 3.1.21.

3.1.11. (Proves Theorem 3.1.22) Prove Theorem 3.1.22.

3.1.12. (Proves Lemma 3.1.23) Let 𝑋 be a nonempty subset of 𝐂, let 𝑓, 𝑔, ℎ ∶ 𝑋 → 𝐑 be real-valued functions such that 𝑓(𝑧) ≤ 𝑔(𝑧) ≤ ℎ(𝑧) for all 𝑧 ∈ 𝑋, and for some limit point 𝑎 of 𝑋, suppose

lim𝑧→𝑎 𝑓(𝑧) = 𝐿 = lim𝑧→𝑎 ℎ(𝑧).    (3.1.4)

Prove that lim𝑧→𝑎 𝑔(𝑧) = 𝐿.

3.1.13. (Proves Theorem 3.1.24) Let 𝑋 and 𝑌 be nonempty subsets of 𝐂, let 𝑓 ∶ 𝑋 → 𝑌 and 𝑔 ∶ 𝑌 → 𝐂 be functions, and let 𝑎 be a limit point of 𝑋. (a) Suppose lim𝑧→𝑎 𝑓(𝑧) = 𝑏 ∈ 𝑌 and 𝑔 is continuous at 𝑏. Prove that lim𝑧→𝑎 𝑔(𝑓(𝑧)) = 𝑔(𝑏). (b) In contrast, find 𝑓 and 𝑔 such that 𝑓 is continuous at 𝑎 ∈ 𝑋, 𝑓(𝑎) = 𝑏, and lim𝑤→𝑏 𝑔(𝑤) exists, but lim𝑧→𝑎 𝑔(𝑓(𝑧)) does not exist.

3.1.14. (*) Let 𝑋 and 𝑌 be metric spaces, and let 𝑓 ∶ 𝑋 → 𝑌 be a function. Prove that 𝑓 is continuous in the sense defined in Remark 3.1.26 if and only if 𝑓 is continuous in the sense of Definition 3.1.3. 3.1.15. (*) Prove that if 𝐶 is a compact subset of 𝐑, then 𝐶 has a maximum value. (In other words, prove that sup 𝐶 is finite and contained in 𝐶.) 3.1.16. (*) Let 𝑋 and 𝑌 be metric spaces, and let 𝑓 ∶ 𝑋 → 𝑌 be continuous. Prove that if 𝐶 is a (sequentially) compact subset of 𝑋 (Definition 2.6.6), then 𝑓(𝐶) is a (sequentially) compact subset of 𝑌 . 3.1.17. (*) Let 𝑋 and 𝑌 be metric spaces, and let 𝑓 ∶ 𝑋 → 𝑌 be continuous. Prove that if 𝐶 is a point-set compact subset of 𝑋 (Definition 2.6.10), then 𝑓(𝐶) is a point-set compact subset of 𝑌 .


3.2 Differentiation

As with continuity and limits, derivatives extend to complex-valued functions with complex domains with little change, except that one of the most important results in differentiation, the Mean Value Theorem, does not work as it did before (Example 3.2.11). That fact will sometimes force us to do calculus of a complex-valued function by doing calculus on its real and imaginary parts; however, once we make that adjustment, everything works just fine. We begin with a space- and time-saving convention, followed by the basic definition.

Convention 3.2.1. For the rest of this section, we let 𝑋 denote a subset of 𝐂 such that every point of 𝑋 is a limit point of 𝑋 (Definition 3.1.18), and similarly for 𝑌 . This ensures that lim𝑧→𝑎 makes sense for all points 𝑎 ∈ 𝑋 and 𝑎 ∈ 𝑌 .

Definition 3.2.2. Let 𝑋 be as in Convention 3.2.1, let 𝑓 ∶ 𝑋 → 𝐂 be a function, and let 𝑎 be a point in 𝑋. To say that 𝑓 is differentiable at 𝑎 means that the limit

𝑓′(𝑎) = lim𝑧→𝑎 (𝑓(𝑧) − 𝑓(𝑎))/(𝑧 − 𝑎) = limℎ→0 (𝑓(𝑎 + ℎ) − 𝑓(𝑎))/ℎ    (3.2.1)

exists (where ℎ = 𝑧 − 𝑎). To say that 𝑓 is differentiable on 𝑋 means that for all 𝑎 ∈ 𝑋, 𝑓 is differentiable at 𝑎; and to say that 𝑓 is continuously differentiable on 𝑋 means that 𝑓 is differentiable on 𝑋 and 𝑓′ ∶ 𝑋 → 𝐂 is continuous.

Example 3.2.3. For 𝛼 ∈ 𝐂 and 𝑓 ∶ 𝐂 → 𝐂 defined by 𝑓(𝑧) = 𝛼𝑧, 𝑓 is differentiable at all 𝑧 ∈ 𝐂 and 𝑓′(𝑧) = 𝛼 for all 𝑧 ∈ 𝐂 (Problem 3.2.1). Similarly, if we define 𝑔 ∶ (𝐂 − {0}) → 𝐂 by 𝑔(𝑧) = 𝑧⁻¹, then for all 𝑧 ∈ 𝐂 − {0}, 𝑔 is differentiable at 𝑧 and 𝑔′(𝑧) = −𝑧⁻² (Problem 3.2.1).

Note that if 𝑢(𝑧) and 𝑣(𝑧) are the real and imaginary parts of 𝑓(𝑧), then

lim𝑧→𝑎 (𝑓(𝑧) − 𝑓(𝑎))/(𝑧 − 𝑎) = lim𝑧→𝑎 ((𝑢(𝑧) − 𝑢(𝑎))/(𝑧 − 𝑎) + 𝑖 (𝑣(𝑧) − 𝑣(𝑎))/(𝑧 − 𝑎)).    (3.2.2)
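The limit (3.2.1) can be probed numerically. For 𝑓(𝑧) = 𝑧², the difference quotient equals 2𝑎 + ℎ exactly, so it approaches 𝑓′(𝑎) = 2𝑎 along real, imaginary, and diagonal directions alike; a sketch (ours, not part of the text):

```python
# Numeric sketch of Definition 3.2.2 (our check): for f(z) = z^2 the
# difference quotient (f(a+h) - f(a))/h equals 2a + h, so it approaches
# f'(a) = 2a no matter which direction h -> 0 takes in C.
def dq(f, a, h):
    return (f(a + h) - f(a)) / h

f = lambda z: z * z
a = 1 + 2j
for h in [1e-6, 1e-6j, 1e-6 + 1e-6j]:  # real, imaginary, diagonal directions
    assert abs(dq(f, a, h) - 2 * a) < 1e-4
```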

Theorem 3.1.22 therefore immediately yields the following corollary.

Corollary 3.2.4. Let 𝑋 be as in Convention 3.2.1 and let 𝑓 ∶ 𝑋 → 𝐂 be a function. If 𝑢(𝑧) and 𝑣(𝑧) are the real and imaginary parts of 𝑓(𝑧), then 𝑓 is differentiable at 𝑎 ∈ 𝑋 if and only if both 𝑢(𝑧) and 𝑣(𝑧) are differentiable at 𝑎.

One helpful way to think of differentiability is to observe that being differentiable at 𝑎 is equivalent to having a sufficiently good linear approximation near 𝑎. More precisely:

Lemma 3.2.5. Let 𝑋 be as in Convention 3.2.1 and let 𝑓 ∶ 𝑋 → 𝐂 be a function. For 𝑎 ∈ 𝑋, the following are equivalent:

• 𝑓 is differentiable at 𝑎.

• There exists some 𝑚 ∈ 𝐂 such that if we define 𝐸𝑓 ∶ 𝑋 → 𝐂 by

𝐸𝑓(𝑧) = { (𝑓(𝑧) − 𝑓(𝑎))/(𝑧 − 𝑎) − 𝑚  for 𝑧 ≠ 𝑎;  0  for 𝑧 = 𝑎 }    (3.2.3)

for all 𝑧 ∈ 𝑋, then 𝐸𝑓(𝑧) is continuous at 𝑎 (i.e., lim𝑧→𝑎 𝐸𝑓(𝑧) = 0).

Furthermore, if either (and therefore both) of these conditions hold, 𝑚 = 𝑓′(𝑎).

Proof. Problem 3.2.2.

Lemma 3.2.5 is most often used in the form of the following immediate corollary.

Corollary 3.2.6 (Local Linearity). Let 𝑋 be as in Convention 3.2.1 and let 𝑓 ∶ 𝑋 → 𝐂 be differentiable at 𝑎 ∈ 𝑋. Then there exists a function 𝐸𝑓 ∶ 𝑋 → 𝐂 such that 𝐸𝑓 is continuous at 𝑎, 𝐸𝑓(𝑎) = 0, and

𝑓(𝑧) = 𝑓(𝑎) + (𝑓′(𝑎) + 𝐸𝑓(𝑧))(𝑧 − 𝑎)    (3.2.4)

for all 𝑧 ∈ 𝑋.

If we replace 𝐸𝑓(𝑧) with 0 in (3.2.4), we get the approximation

𝑓(𝑧) ≈ 𝑓(𝑎) + 𝑓′(𝑎)(𝑧 − 𝑎).    (3.2.5)

As the reader may recall from calculus, (3.2.5) is known as the local linear approximation to 𝑓 at 𝑎. The function 𝐸𝑓(𝑧) in Corollary 3.2.6 is therefore sometimes called the relative error (i.e., error in the slope) in the local linear approximation. To give one example of the usefulness of local linearity, it quickly follows that:

Corollary 3.2.7. Let 𝑋 be as in Convention 3.2.1, let 𝑓 ∶ 𝑋 → 𝐂 be a function, and suppose 𝑓 is differentiable at 𝑎 ∈ 𝑋. Then 𝑓 is continuous at 𝑎.

Proof. Let 𝐸𝑓(𝑧) be as in Corollary 3.2.6. Then by (3.2.4) and the limit laws for functions, we see that

lim𝑧→𝑎 𝑓(𝑧) = 𝑓(𝑎) + (𝑓′(𝑎) + lim𝑧→𝑎 𝐸𝑓(𝑧))(lim𝑧→𝑎 𝑧 − 𝑎) = 𝑓(𝑎) + (𝑓′(𝑎) + 0)(𝑎 − 𝑎) = 𝑓(𝑎).    (3.2.6)

The corollary follows.

The algebraic properties of the derivative can now be proven in the same manner as they were for real-valued functions.

Theorem 3.2.8. Let 𝑋 be as in Convention 3.2.1 and let 𝑎 be a point in 𝑋. Suppose that 𝑓, 𝑔 ∶ 𝑋 → 𝐂 are differentiable at 𝑎. Then:

(1) For 𝑐 ∈ 𝐂, 𝑐𝑓 is differentiable at 𝑎, with derivative (𝑐𝑓)′(𝑎) = 𝑐𝑓′(𝑎).

(2) 𝑓 + 𝑔 is differentiable at 𝑎, with derivative (𝑓 + 𝑔)′(𝑎) = 𝑓′(𝑎) + 𝑔′(𝑎).

(3) The complex conjugate 𝑓̄ is differentiable at 𝑎, with derivative (𝑓̄)′(𝑎) = 𝑓′(𝑎)̄.

(4) 𝑓𝑔 is differentiable at 𝑎, with derivative (𝑓𝑔)′(𝑎) = 𝑓′(𝑎)𝑔(𝑎) + 𝑓(𝑎)𝑔′(𝑎).


(5) If 𝑔(𝑧) ≠ 0 for all 𝑧 ∈ 𝑋, then 𝑓/𝑔 is differentiable at 𝑎, with derivative

(𝑓/𝑔)′(𝑎) = (𝑔(𝑎)𝑓′(𝑎) − 𝑓(𝑎)𝑔′(𝑎))/𝑔(𝑎)².

Proof. Properties (1)–(3) follow from the corresponding limit laws in an entirely straightforward way, so we omit their proofs. Properties (4) and (5) are more interesting; see Problems 3.2.3 and 3.2.5.

Extending the chain rule is also relatively straightforward.

Theorem 3.2.9. Let 𝑋 and 𝑌 be as in Convention 3.2.1 and let 𝑎 be a point in 𝑋. Suppose that 𝑓 ∶ 𝑋 → 𝑌 is differentiable at 𝑎 and 𝑔 ∶ 𝑌 → 𝐂 is differentiable at 𝑓(𝑎). Then 𝑔 ∘ 𝑓 ∶ 𝑋 → 𝐂 is differentiable at 𝑎, and (𝑔 ∘ 𝑓)′(𝑎) = 𝑔′(𝑓(𝑎))𝑓′(𝑎).

Proof. Problem 3.2.4.

We now quote the Mean Value Theorem, which is perhaps the most important theoretical result in differentiation. The key new point to realize is that the Mean Value Theorem only holds for real-valued functions, and not complex-valued functions (Example 3.2.11).

Theorem 3.2.10 (Mean Value Theorem). Let 𝑓 ∶ [𝑎, 𝑏] → 𝐑 be a real-valued function that is continuous on [𝑎, 𝑏] and differentiable on (𝑎, 𝑏). Then there exists some 𝑐 ∈ (𝑎, 𝑏) such that

(𝑓(𝑏) − 𝑓(𝑎))/(𝑏 − 𝑎) = 𝑓′(𝑐).    (3.2.7)

Example 3.2.11. Define 𝑓 ∶ 𝐑 → 𝐂 by 𝑓(𝑥) = cos 𝑥 + 𝑖 sin 𝑥. Then (𝑓(2𝜋) − 𝑓(0))/(2𝜋 − 0) = 0, but for 𝑥 ∈ [0, 2𝜋], 𝑓′(𝑥) = − sin 𝑥 + 𝑖 cos 𝑥 is never equal to the complex number 0.

Fortunately, the real-valued Mean Value Theorem is good enough to get what we need for complex-valued functions. For example, the reader may recall from calculus that a function whose derivative is 0 must be constant, and the same is true for complex-valued functions on the following kind of domain.

Definition 3.2.12. We define a path segment in 𝐂 to be a differentiable function 𝛾 ∶ [0, 1] → 𝐂, and we define a path to be a concatenation of finitely many path segments (i.e., the end of each segment is the beginning of the next). To say that 𝑋 ⊆ 𝐂 is path-connected means that for any 𝑧, 𝑤 ∈ 𝑋, there is a path in 𝑋 starting at 𝑧 and ending at 𝑤 (Figure 3.2.1).
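Before continuing, Example 3.2.11 is easy to confirm in floating point (our check, not part of the text): the secant slope of 𝑓(𝑥) = cos 𝑥 + 𝑖 sin 𝑥 over [0, 2𝜋] vanishes, while |𝑓′| ≡ 1.

```python
import math

# Numeric check of Example 3.2.11 (ours): for f(x) = cos x + i sin x, the
# secant slope over [0, 2*pi] is (f(2*pi) - f(0))/(2*pi) = 0, yet
# |f'(x)| = |-sin x + i cos x| = 1 everywhere, so no c can satisfy the
# Mean Value Theorem conclusion for this complex-valued function.
f = lambda x: complex(math.cos(x), math.sin(x))
fprime = lambda x: complex(-math.sin(x), math.cos(x))

secant = (f(2 * math.pi) - f(0)) / (2 * math.pi)
assert abs(secant) < 1e-15
for c in [0.1, 1.0, 2.5, 5.0]:
    assert abs(abs(fprime(c)) - 1.0) < 1e-12
```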
For example, open and closed discs (Definition 2.4.8) are path-connected, as are intervals in 𝐑.

Corollary 3.2.13 (Zero Derivative Theorem). Let 𝑋 be a path-connected subset of 𝐂, and let 𝑓 ∶ 𝑋 → 𝐂 be a function. Suppose either that 𝑓′(𝑧) = 0 for all 𝑧 ∈ 𝑋, or that 𝑋 = [𝑎, 𝑏], 𝑓 is continuous on [𝑎, 𝑏], and 𝑓′(𝑥) = 0 for all 𝑥 ∈ (𝑎, 𝑏). Then 𝑓 is constant on 𝑋.


Figure 3.2.1. A path-connected subset of 𝐂

Proof. By the usual 𝑓(𝑧) = 𝑢(𝑧) + 𝑖𝑣(𝑧) argument, it suffices to consider the case of a real-valued 𝑢 ∶ 𝑋 → 𝐑. In fact, it suffices to show that if 𝛾 ∶ [0, 1] → 𝑋 is a path such that 𝑢′(𝛾(𝑡)) = 0 for all 𝑡 ∈ (0, 1) and 𝑢 is continuous at 𝛾(0) and 𝛾(1), then 𝑢(𝛾(0)) = 𝑢(𝛾(1)), for in that case, it will follow that 𝑢(𝑧) = 𝑢(𝑤) for all 𝑧, 𝑤 ∈ 𝑋. So let 𝛾 be such a path, and let 𝑓 ∶ [0, 1] → 𝐑 be defined by 𝑓(𝑡) = 𝑢(𝛾(𝑡)). Then 𝑓 is continuous on [0, 1], and the chain rule implies that for all 𝑡 ∈ (0, 1),

𝑓′(𝑡) = 𝑢′(𝛾(𝑡))𝛾′(𝑡) = 0. (3.2.8)

Applying the Mean Value Theorem 3.2.10 to 𝑓 then shows that 𝑢(𝛾(0)) = 𝑓(0) = 𝑓(1) = 𝑢(𝛾(1)), and the corollary follows.

On a similar note:

Corollary 3.2.14. Let 𝑋 be a subinterval of 𝐑 (possibly 𝑋 = 𝐑), and suppose that 𝑓 ∶ 𝑋 → 𝐂 is differentiable and 𝑓′ is bounded. Then 𝑓 is uniformly continuous on 𝑋.

Proof. Problem 3.2.7.

As one more application of the Mean Value Theorem, we prove the following version of the derivative of an inverse function. Since we will only use this result once, and only as background in Section 4.6, we assume unnecessary hypotheses that simplify the proof; see Ross [Ros13, Thm. 29.9] for a more general version.

Theorem 3.2.15. Let 𝑓 ∶ [𝑎, 𝑏] → 𝐑 be differentiable, and suppose also that 𝑓′ is continuous and positive on [𝑎, 𝑏]. Then

(1) 𝑓 is strictly increasing and maps [𝑎, 𝑏] bijectively onto a closed interval [𝑐, 𝑑]; and

(2) if 𝑔 ∶ [𝑐, 𝑑] → [𝑎, 𝑏] is the inverse function of 𝑓, then 𝑔 is differentiable on [𝑐, 𝑑], and for 𝑦 ∈ [𝑐, 𝑑] such that 𝑔(𝑦) = 𝑥, we have 𝑔′(𝑦) = 1/𝑓′(𝑔(𝑦)).

Proof. Problem 3.2.8.

Remark 3.2.16. After this section, we will generally focus on complex-valued functions of one real variable. However, to give the reader some perspective, when 𝑋 is an open subset of 𝐂 (Definition 2.6.2), a differentiable function 𝑓 ∶ 𝑋 → 𝐂 is more often called a holomorphic function. Holomorphic functions are the main topic of complex analysis, and they have remarkable and highly nonobvious properties above and beyond those held by differentiable complex-valued functions of a real variable. To take just one example that is easy to explain but not easy to prove, the derivative of a holomorphic function of a complex variable is itself always holomorphic (!). For this and much more, see, for example, Ahlfors [Ahl79] or Conway [Con78].
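The inverse-derivative formula of Theorem 3.2.15 can be seen concretely. The following Python sketch (an illustration only; the function 𝑓(𝑥) = 𝑥³ + 𝑥 and the bisection-based inverse are our own choices, not the text's) compares a difference quotient for 𝑔′ with 1/𝑓′(𝑔(𝑦)).

```python
def f(x):
    return x**3 + x          # f'(x) = 3x^2 + 1 > 0, so Theorem 3.2.15 applies

def f_prime(x):
    return 3 * x**2 + 1

def g(y, lo=0.0, hi=1.0):
    # invert f on [0, 1] by bisection (valid because f is strictly increasing)
    for _ in range(60):
        mid = (lo + hi) / 2
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

y0 = f(0.5)                  # a point in the range [c, d] = [f(0), f(1)]
h = 1e-6
numeric = (g(y0 + h) - g(y0 - h)) / (2 * h)   # difference quotient for g'(y0)
formula = 1 / f_prime(g(y0))                  # g'(y) = 1 / f'(g(y))
```
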


Problems. In all of the problems in this section, let 𝑋 and 𝑌 be as in Convention 3.2.1.

3.2.1. (a) For 𝛼 ∈ 𝐂, prove that 𝑓 ∶ 𝐂 → 𝐂 given by 𝑓(𝑧) = 𝛼𝑧 is differentiable and that 𝑓′(𝑧) = 𝛼.
(b) Prove that 𝑔 ∶ 𝐂 − {0} → 𝐂 given by 𝑔(𝑧) = 1/𝑧 is differentiable on its domain and that 𝑔′(𝑎) = −𝑎⁻² for all 𝑎 ∈ 𝐂 − {0}.

3.2.2. (Proves Lemma 3.2.5) Fix 𝑓 ∶ 𝑋 → 𝐂 and 𝑎 ∈ 𝑋, and define

𝐸𝑓(𝑧) = (𝑓(𝑧) − 𝑓(𝑎))/(𝑧 − 𝑎) − 𝑚 for 𝑧 ≠ 𝑎, and 𝐸𝑓(𝑎) = 0. (3.2.9)

(a) Assume that 𝑓 is differentiable at 𝑎. Prove that if 𝑚 = 𝑓′(𝑎), then 𝐸𝑓(𝑧) is continuous at 𝑎.
(b) Assume 𝑚 ∈ 𝐂 and 𝐸𝑓(𝑧) is continuous at 𝑎 (i.e., lim_{𝑧→𝑎} 𝐸𝑓(𝑧) = 0). Prove that 𝑓 is differentiable at 𝑎 and 𝑓′(𝑎) = 𝑚.

3.2.3. (Proves Theorem 3.2.8, part (4)) Let 𝑎 be a point in 𝑋, and suppose that 𝑓, 𝑔 ∶ 𝑋 → 𝐂 are differentiable at 𝑎. Prove that (𝑓𝑔)(𝑧) = 𝑓(𝑧)𝑔(𝑧) is differentiable at 𝑎 and that (𝑓𝑔)′(𝑎) = 𝑓′(𝑎)𝑔(𝑎) + 𝑓(𝑎)𝑔′(𝑎).

3.2.4. (Proves Theorem 3.2.9) Let 𝑎 be a point in 𝑋. Suppose that 𝑓 ∶ 𝑋 → 𝑌 is differentiable at 𝑎 and 𝑔 ∶ 𝑌 → 𝐂 is differentiable at 𝑓(𝑎). Prove that 𝑔 ∘ 𝑓 ∶ 𝑋 → 𝐂 is differentiable at 𝑎 and (𝑔 ∘ 𝑓)′(𝑎) = 𝑔′(𝑓(𝑎))𝑓′(𝑎).

3.2.5. (Proves Theorem 3.2.8, part (5)) Let 𝑎 be a point in 𝑋, suppose that 𝑓, 𝑔 ∶ 𝑋 → 𝐂 are differentiable at 𝑎, and suppose that 𝑔(𝑧) ≠ 0 for all 𝑧 ∈ 𝑋.
(a) Let ℎ(𝑧) = 1/𝑔(𝑧). Use the chain rule and Problem 3.2.1 to prove that ℎ′(𝑎) = −𝑔′(𝑎)/𝑔(𝑎)².
(b) Now use the product rule to deduce that (𝑓/𝑔)′(𝑎) = (𝑔(𝑎)𝑓′(𝑎) − 𝑓(𝑎)𝑔′(𝑎))/𝑔(𝑎)².

3.2.6. For 𝛼 ≥ 0, define 𝑓𝛼 ∶ 𝐑 → 𝐑 by 𝑓𝛼(𝑥) = |𝑥|^(1+𝛼).
(a) Prove that 𝑓0(𝑥) = |𝑥| is not differentiable at 𝑥 = 0.
(b) Prove that for 𝛼 > 0, 𝑓𝛼′(0) = 0.

3.2.7. (Proves Corollary 3.2.14) Let 𝑋 be a subinterval of 𝐑 (possibly 𝑋 = 𝐑), let 𝑓 ∶ 𝑋 → 𝐂 be differentiable, and suppose that for some 𝐶 > 0 and all 𝑥 ∈ 𝑋, we have |𝑓′(𝑥)| ≤ 𝐶.
(a) Let 𝑓(𝑥) = 𝑢(𝑥) + 𝑖𝑣(𝑥) as usual. Prove that for any 𝑥, 𝑦 ∈ 𝑋, |𝑢(𝑥) − 𝑢(𝑦)| ≤ 𝐶 |𝑥 − 𝑦|, and similarly for 𝑣(𝑥).
(b) Prove that 𝑓 is uniformly continuous on 𝑋.

3.2.8. (Proves Theorem 3.2.15) Let 𝑓 ∶ [𝑎, 𝑏] → 𝐑 be differentiable, and suppose also that 𝑓′ is continuous and positive on [𝑎, 𝑏].
(a) Prove that 𝑓 is strictly increasing on [𝑎, 𝑏].
(b) Prove that 𝑓 maps [𝑎, 𝑏] bijectively onto a closed interval [𝑐, 𝑑].


(c) Prove that there exist 𝑚, 𝑀 ∈ 𝐑 such that for 𝑎 ≤ 𝑥0 < 𝑥1 ≤ 𝑏, we have

0 < 𝑚 ≤ (𝑓(𝑥1) − 𝑓(𝑥0))/(𝑥1 − 𝑥0) ≤ 𝑀.

(3) For any 𝜖 > 0, there exists a partition 𝑃 such that 𝑈(𝑣; 𝑃) − 𝐿(𝑣; 𝑃) < 𝜖.

Furthermore, if condition (2) holds, then

lim_{𝑛→∞} 𝐿(𝑣; 𝑃𝑛) = ∫_𝑎^𝑏 𝑣(𝑥) 𝑑𝑥 = lim_{𝑛→∞} 𝑈(𝑣; 𝑃𝑛). (3.3.11)
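The limit statement (3.3.11) can be watched happening for a concrete function. The following Python sketch (our own illustration, not part of the text) computes lower and upper sums of 𝑣(𝑥) = 𝑥² on [0, 1] for the standard 𝑛-piece partition; since 𝑣 is increasing, the inf and sup on each subinterval sit at its endpoints, so the sums are exactly computable and both converge to ∫₀¹ 𝑥² 𝑑𝑥 = 1/3, with 𝑈 − 𝐿 = 1/𝑛.

```python
def lower_upper(n):
    # standard partition of [0, 1] into n equal subintervals;
    # v(x) = x^2 is increasing, so m_i and M_i sit at the endpoints
    xs = [i / n for i in range(n + 1)]
    L = sum(xs[i - 1] ** 2 * (1 / n) for i in range(1, n + 1))  # lower sum
    U = sum(xs[i] ** 2 * (1 / n) for i in range(1, n + 1))      # upper sum
    return L, U

L1000, U1000 = lower_upper(1000)
# U - L telescopes to (1^2 - 0^2)/n = 1/n, and both sums squeeze 1/3
```
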

Note that, as the name implies, Lemma 3.3.10 once again reduces integrability and computing the integral to the problem of computing the limit of some sequence of Riemann sums.

Proof. Let 𝒫 be the set of all partitions of [𝑎, 𝑏], let ℒ = {𝐿(𝑣; 𝑃) ∣ 𝑃 ∈ 𝒫}, and let 𝒰 = {𝑈(𝑣; 𝑃) ∣ 𝑃 ∈ 𝒫}.

(1)⇒(2): Let 𝑛 be a positive integer. By the Arbitrarily Close Criterion (Theorem 2.4.13), applied to sup ℒ and inf 𝒰 (both of which equal the integral, by condition (1)), there exists some 𝑄𝑛 ∈ 𝒫 such that

∫_𝑎^𝑏 𝑣(𝑥) 𝑑𝑥 − 𝐿(𝑣; 𝑄𝑛) < 1/(2𝑛), (3.3.12)

and there exists some 𝑄𝑛′ ∈ 𝒫 such that

𝑈(𝑣; 𝑄𝑛′) − ∫_𝑎^𝑏 𝑣(𝑥) 𝑑𝑥 < 1/(2𝑛). (3.3.13)

By Lemma 3.3.8, taking the common refinement 𝑃𝑛 = 𝑄𝑛 ∪ 𝑄𝑛′ pushes the quantities 𝑈(𝑣; 𝑄𝑛′) and 𝐿(𝑣; 𝑄𝑛) closer together, and so

𝑈(𝑣; 𝑃𝑛) − 𝐿(𝑣; 𝑃𝑛) ≤ 𝑈(𝑣; 𝑄𝑛′) − 𝐿(𝑣; 𝑄𝑛)
  = 𝑈(𝑣; 𝑄𝑛′) − ∫_𝑎^𝑏 𝑣(𝑥) 𝑑𝑥 + ∫_𝑎^𝑏 𝑣(𝑥) 𝑑𝑥 − 𝐿(𝑣; 𝑄𝑛) < 1/𝑛. (3.3.14)

Condition (2) follows by the Metric Squeeze Lemma (Lemma 2.4.16).


(2)⇒(3): Given 𝜖 > 0, by the definition of limit, we may let 𝑃 = 𝑃𝑛 for any sufficiently large 𝑛.

(3)⇒(1): Let Δ be the upper integral of 𝑣 minus the lower integral of 𝑣, so that Δ ≥ 0 (Theorem 3.3.9). If condition (3) holds, by (3.3.10), Δ < 𝜖 for any 𝜖 > 0, and so Δ = 0.

Finally, suppose condition (2) holds. In that case, by (3.3.10), we have that for any 𝑛,

∫_𝑎^𝑏 𝑣(𝑥) 𝑑𝑥 − 𝐿(𝑣; 𝑃𝑛) ≤ 𝑈(𝑣; 𝑃𝑛) − 𝐿(𝑣; 𝑃𝑛). (3.3.15)

So by the Metric Squeeze Lemma (Lemma 2.4.16),

lim_{𝑛→∞} 𝐿(𝑣; 𝑃𝑛) = ∫_𝑎^𝑏 𝑣(𝑥) 𝑑𝑥, (3.3.16)

and similarly for lim_{𝑛→∞} 𝑈(𝑣; 𝑃𝑛),

and the theorem follows.

Turning to complex-valued functions, we define integrability for such functions in terms of their real and imaginary parts.

Definition 3.3.11. Let 𝑓 ∶ [𝑎, 𝑏] → 𝐂 be bounded, and let 𝑢 and 𝑣 be the real and imaginary parts of 𝑓. To say that 𝑓 is integrable means that both 𝑢 and 𝑣 are integrable, in which case we define

∫_𝑎^𝑏 𝑓(𝑥) 𝑑𝑥 = ∫_𝑎^𝑏 𝑢(𝑥) 𝑑𝑥 + 𝑖 ∫_𝑎^𝑏 𝑣(𝑥) 𝑑𝑥. (3.3.17)
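Definition 3.3.11 is easy to test numerically. The sketch below (our illustration, with midpoint Riemann sums standing in for the two real integrals) assembles ∫₀^𝜋 𝑒^{𝑖𝑥} 𝑑𝑥 from its real and imaginary parts; the exact value is (𝑒^{𝑖𝜋} − 1)/𝑖 = 2𝑖.

```python
import math

def riemann(h, a, b, n):
    # midpoint Riemann sum of a real-valued h on [a, b]
    dx = (b - a) / n
    return sum(h(a + (i + 0.5) * dx) * dx for i in range(n))

a, b, n = 0.0, math.pi, 20000
u = lambda x: math.cos(x)   # real part of f(x) = e^{ix}
v = lambda x: math.sin(x)   # imaginary part of f(x) = e^{ix}

# integrate u and v separately, then recombine as in (3.3.17)
integral = complex(riemann(u, a, b, n), riemann(v, a, b, n))
```
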

We will also need a fully complex version of Lemma 3.3.10.

Lemma 3.3.12 (Sequential criterion for complex integrability). Let 𝑓 ∶ [𝑎, 𝑏] → 𝐂 be bounded, and for any partition 𝑃 = {𝑥0, … , 𝑥𝑛} of [𝑎, 𝑏] and 1 ≤ 𝑖 ≤ 𝑛, define

𝜇(𝑓; 𝑃, 𝑖) = sup {|𝑓(𝑥) − 𝑓(𝑦)| ∣ 𝑥, 𝑦 ∈ [𝑥𝑖−1, 𝑥𝑖]}, (3.3.18)
𝐸(𝑓; 𝑃) = ∑_{𝑖=1}^{𝑛} 𝜇(𝑓; 𝑃, 𝑖)(Δ𝑥)𝑖. (3.3.19)

Then the following are equivalent:

(1) 𝑓 is integrable on [𝑎, 𝑏].

(2) For any 𝜖 > 0, there exists a partition 𝑃 such that 𝐸(𝑓; 𝑃) < 𝜖.

Proof. Problem 3.3.3.

Remark 3.3.13. We may interpret Lemma 3.3.12 as saying that 𝑓 ∶ [𝑎, 𝑏] → 𝐂 is integrable if and only if 𝑓 is "uniformly continuous on average," in the following sense. If we think of 𝜇(𝑓; 𝑃, 𝑖) from (3.3.18) as the supremum of the "variation" of 𝑓 on subinterval 𝑖, then 𝜇(𝑓; 𝑃, 𝑖)(Δ𝑥)𝑖 is the variation on subinterval 𝑖, weighted by the size of subinterval 𝑖, and 𝐸(𝑓; 𝑃) is the "total weighted variation" of 𝑓 on [𝑎, 𝑏]. In these terms, uniform continuity (Definition 3.1.11) gives a uniform bound on the variation of 𝑓, upon subdividing [𝑎, 𝑏] into sufficiently small subintervals, whereas the condition


𝐸(𝑓; 𝑃) < 𝜖 does something similar, but with the possibility of there being some very small subintervals where the variation is large. For a more precise expression of the idea of "integrable means uniformly continuous except on some very small subintervals", see the proof of Lemma 3.4.7. Later, we will describe (but not prove) a necessary and sufficient condition for integrability, expressed in terms of continuity and some more sophisticated ideas; see Remark 7.5.8.

We conclude with a single, admittedly silly example. Nevertheless, the reader should carry out this computation, not just because this result actually becomes useful later, but also as an exercise in understanding the many layers of definitions established in this section.

Lemma 3.3.14. Constant functions are integrable, and for 𝑐 ∈ 𝐂, ∫_𝑎^𝑏 𝑐 𝑑𝑥 = 𝑐(𝑏 − 𝑎).

Proof. This follows from the real-valued case; see Problem 3.3.4.

See also Problem 3.3.5 for an example of a nonintegrable function.
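The "total weighted variation" 𝐸(𝑓; 𝑃) of Lemma 3.3.12 and Remark 3.3.13 can be computed for a concrete complex-valued function. In the Python sketch below (our illustration only), 𝑓(𝑥) = 𝑒^{𝑖𝑥} on [0, 2𝜋]; for this particular 𝑓, on a subinterval of length at most 𝜋 the supremum in 𝜇(𝑓; 𝑃, 𝑖) is attained at the endpoints, so 𝜇 can be evaluated exactly, and 𝐸(𝑓; 𝑃𝑛) shrinks to 0 as the standard partitions refine, as condition (2) predicts for an integrable function.

```python
import cmath, math

def E(n):
    # E(f; P_n) from (3.3.19) for f(x) = e^{ix} on [0, 2*pi], with the
    # standard partition into n >= 2 equal pieces. Since |e^{ix} - e^{iy}|
    # = 2|sin((x - y)/2)| is increasing in |x - y| on arcs of length <= pi,
    # the sup defining mu(f; P, i) is attained at the subinterval endpoints.
    dx = 2 * math.pi / n
    return sum(abs(cmath.exp(1j * i * dx) - cmath.exp(1j * (i - 1) * dx)) * dx
               for i in range(1, n + 1))
```

Analytically, E(n) = 4𝜋 sin(𝜋/𝑛), which indeed tends to 0.
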

Problems.

3.3.1. (Proves Lemma 3.3.8) Let 𝑣 ∶ [𝑎, 𝑏] → 𝐑 be bounded and let 𝑃 be a partition of [𝑎, 𝑏]. Prove that 𝐿(𝑣; 𝑃) ≤ 𝑈(𝑣; 𝑃).

3.3.2. (Proves Theorem 3.3.9) Let 𝑣 ∶ [𝑎, 𝑏] → 𝐑 be bounded, let 𝒫 be the set of all partitions of [𝑎, 𝑏], let ℒ = {𝐿(𝑣; 𝑃) ∣ 𝑃 ∈ 𝒫}, and let 𝒰 = {𝑈(𝑣; 𝑃) ∣ 𝑃 ∈ 𝒫}.
(a) For 𝑃, 𝑄 ∈ 𝒫, prove that 𝐿(𝑣; 𝑃) ≤ 𝑈(𝑣; 𝑄).
(b) Prove that

𝐿(𝑣; 𝑃) ≤ (lower integral of 𝑣) ≤ (upper integral of 𝑣) ≤ 𝑈(𝑣; 𝑃) (3.3.20)

for any 𝑃 ∈ 𝒫.

3.3.3. (Proves Lemma 3.3.12) Let 𝑓 ∶ [𝑎, 𝑏] → 𝐂 be bounded, let 𝑢(𝑥) and 𝑣(𝑥) be the real and imaginary parts of 𝑓(𝑥), let 𝑃 = {𝑥0, … , 𝑥𝑛} be a partition of [𝑎, 𝑏], and let 1 ≤ 𝑖 ≤ 𝑛. We also retain the notation of Definition 3.3.4 and Lemma 3.3.12.
(a) Prove that if 𝑧 = 𝑎 + 𝑏𝑖 and 𝑤 = 𝑐 + 𝑑𝑖 are complex numbers, then |𝑎 − 𝑐| ≤ |𝑧 − 𝑤| and |𝑧 − 𝑤| ≤ |𝑎 − 𝑐| + |𝑏 − 𝑑|.
(b) Prove that if 𝑆 is a nonempty bounded subset of 𝐑, then

sup {|𝑠 − 𝑡| ∣ 𝑠, 𝑡 ∈ 𝑆} = sup 𝑆 − inf 𝑆. (3.3.21)

(c) Prove that sup {|𝑢(𝑥) − 𝑢(𝑦)| ∣ 𝑥, 𝑦 ∈ [𝑥𝑖−1, 𝑥𝑖]} = 𝑀(𝑢; 𝑃, 𝑖) − 𝑚(𝑢; 𝑃, 𝑖).
(d) Prove that

𝑀(𝑢; 𝑃, 𝑖) − 𝑚(𝑢; 𝑃, 𝑖) ≤ 𝜇(𝑓; 𝑃, 𝑖) (3.3.22)

and

𝜇(𝑓; 𝑃, 𝑖) ≤ (𝑀(𝑢; 𝑃, 𝑖) − 𝑚(𝑢; 𝑃, 𝑖)) + (𝑀(𝑣; 𝑃, 𝑖) − 𝑚(𝑣; 𝑃, 𝑖)). (3.3.23)

(e) Prove that 𝑓 is integrable on [𝑎, 𝑏] if and only if for any 𝜖 > 0, there exists a partition 𝑃 such that 𝐸(𝑓; 𝑃) < 𝜖.


3.3.4. (Proves Lemma 3.3.14) For 𝑐 ∈ 𝐑, prove that the constant function 𝑣(𝑥) = 𝑐 is integrable, and prove that ∫_𝑎^𝑏 𝑐 𝑑𝑥 = 𝑐(𝑏 − 𝑎).

3.3.5. Define 𝑓 ∶ [0, 1] → 𝐑 by

𝑓(𝑥) = 1 if 𝑥 is rational, and 𝑓(𝑥) = 0 if 𝑥 is irrational. (3.3.24)

Prove that 𝑓 is not integrable.

3.4 The Riemann integral: Properties

In this section, we prove the "ordinary" properties of the integral, that is, the ones that are not directly related to the Fundamental Theorem of Calculus. We begin with some useful estimates.

Lemma 3.4.1. Let 𝑣, 𝑤 ∶ [𝑎, 𝑏] → 𝐑 be bounded, let 𝑃 be a partition of [𝑎, 𝑏], and let 𝑐 > 0. Then, in the notation of Definition 3.3.4, we have

𝑚(𝑣; 𝑃, 𝑖) + 𝑚(𝑤; 𝑃, 𝑖) ≤ 𝑚(𝑣 + 𝑤; 𝑃, 𝑖) ≤ 𝑀(𝑣 + 𝑤; 𝑃, 𝑖) ≤ 𝑀(𝑣; 𝑃, 𝑖) + 𝑀(𝑤; 𝑃, 𝑖), (3.4.1)
𝑐𝑚(𝑣; 𝑃, 𝑖) = 𝑚(𝑐𝑣; 𝑃, 𝑖) ≤ 𝑀(𝑐𝑣; 𝑃, 𝑖) = 𝑐𝑀(𝑣; 𝑃, 𝑖), (3.4.2)
𝑚(−𝑣; 𝑃, 𝑖) = −𝑀(𝑣; 𝑃, 𝑖) ≤ −𝑚(𝑣; 𝑃, 𝑖) = 𝑀(−𝑣; 𝑃, 𝑖). (3.4.3)

Furthermore, if 𝑣(𝑥) ≤ 𝑤(𝑥) for all 𝑥 ∈ [𝑎, 𝑏], then

𝑀(𝑣; 𝑃, 𝑖) ≤ 𝑀(𝑤; 𝑃, 𝑖). (3.4.4)

Proof. We prove only (3.4.1) and leave the other estimates as Problem 3.4.1. Let [𝑥𝑖−1, 𝑥𝑖] be the 𝑖th subinterval of 𝑃. By the definitions of 𝑀(𝑣; 𝑃, 𝑖) and 𝑀(𝑤; 𝑃, 𝑖), for any 𝑥 ∈ [𝑥𝑖−1, 𝑥𝑖], we have 𝑣(𝑥) + 𝑤(𝑥) ≤ 𝑀(𝑣; 𝑃, 𝑖) + 𝑀(𝑤; 𝑃, 𝑖). In other words, 𝑀(𝑣; 𝑃, 𝑖) + 𝑀(𝑤; 𝑃, 𝑖) is an upper bound for 𝑆 = {𝑣(𝑥) + 𝑤(𝑥) ∣ 𝑥 ∈ [𝑥𝑖−1, 𝑥𝑖]}, which means that 𝑀(𝑣 + 𝑤; 𝑃, 𝑖) ≤ 𝑀(𝑣; 𝑃, 𝑖) + 𝑀(𝑤; 𝑃, 𝑖), since 𝑀(𝑣 + 𝑤; 𝑃, 𝑖) = sup 𝑆 (Sup Inequality Lemma 2.1.6). The last inequality follows, and the first inequality follows by an analogous argument.

Theorem 3.4.2. Let 𝑣, 𝑤 ∶ [𝑎, 𝑏] → 𝐑 be bounded and integrable on [𝑎, 𝑏] and let 𝑐 > 0. Then 𝑣 + 𝑤, 𝑐𝑣, and −𝑣 are integrable on [𝑎, 𝑏], and

∫_𝑎^𝑏 (𝑣(𝑥) + 𝑤(𝑥)) 𝑑𝑥 = ∫_𝑎^𝑏 𝑣(𝑥) 𝑑𝑥 + ∫_𝑎^𝑏 𝑤(𝑥) 𝑑𝑥, (3.4.5)
∫_𝑎^𝑏 𝑐𝑣(𝑥) 𝑑𝑥 = 𝑐 ∫_𝑎^𝑏 𝑣(𝑥) 𝑑𝑥, (3.4.6)
∫_𝑎^𝑏 (−𝑣(𝑥)) 𝑑𝑥 = − ∫_𝑎^𝑏 𝑣(𝑥) 𝑑𝑥. (3.4.7)

Furthermore, if 𝑣(𝑥) ≤ 𝑤(𝑥) for all 𝑥 ∈ [𝑎, 𝑏], then

∫_𝑎^𝑏 𝑣(𝑥) 𝑑𝑥 ≤ ∫_𝑎^𝑏 𝑤(𝑥) 𝑑𝑥. (3.4.8)
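The additivity and scaling properties (3.4.5)–(3.4.6) can be checked against exact values for concrete functions. The sketch below (our own illustration, using midpoint Riemann sums as a numerical stand-in for the integral) uses 𝑣(𝑥) = 𝑥² and 𝑤(𝑥) = cos 𝑥 on [0, 1], whose integrals are 1/3 and sin 1.

```python
import math

def midpoint(h, a, b, n=20000):
    # midpoint Riemann sum as a numerical stand-in for the integral
    dx = (b - a) / n
    return sum(h(a + (k + 0.5) * dx) * dx for k in range(n))

v = lambda x: x**2          # integral over [0, 1] is 1/3
w = lambda x: math.cos(x)   # integral over [0, 1] is sin(1)
c = 3.0

sum_side = midpoint(lambda x: v(x) + w(x), 0.0, 1.0)    # left side of (3.4.5)
scaled_side = midpoint(lambda x: c * v(x), 0.0, 1.0)    # left side of (3.4.6)
```
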


The basic idea for each property is to sum Lemma 3.4.1 over all of the subintervals of a partition to obtain the theorem for Riemann sums and then take a limit of a well-chosen sequence of Riemann sums.

Proof. By Lemma 3.3.10, there exist sequences 𝑄𝑛, 𝑄𝑛′ of partitions of [𝑎, 𝑏] such that

lim_{𝑛→∞} (𝑈(𝑣; 𝑄𝑛) − 𝐿(𝑣; 𝑄𝑛)) = lim_{𝑛→∞} (𝑈(𝑤; 𝑄𝑛′) − 𝐿(𝑤; 𝑄𝑛′)) = 0. (3.4.9)

In fact, by taking the common refinement 𝑃𝑛 = 𝑄𝑛 ∪ 𝑄𝑛′, we may replace both 𝑄𝑛 and 𝑄𝑛′ with 𝑃𝑛, since refinements only make these sequences converge faster (Lemma 3.3.8).

Turning first to (3.4.5), summing (3.4.1) over all subintervals of 𝑃𝑛, we see that

𝐿(𝑣; 𝑃𝑛) + 𝐿(𝑤; 𝑃𝑛) ≤ 𝐿(𝑣 + 𝑤; 𝑃𝑛) ≤ 𝑈(𝑣 + 𝑤; 𝑃𝑛) ≤ 𝑈(𝑣; 𝑃𝑛) + 𝑈(𝑤; 𝑃𝑛). (3.4.10)

It follows that

0 ≤ 𝑈(𝑣 + 𝑤; 𝑃𝑛) − 𝐿(𝑣 + 𝑤; 𝑃𝑛) ≤ (𝑈(𝑣; 𝑃𝑛) − 𝐿(𝑣; 𝑃𝑛)) + (𝑈(𝑤; 𝑃𝑛) − 𝐿(𝑤; 𝑃𝑛)), (3.4.11)

and since the right-hand side converges to 0, the Metric Squeeze Lemma (Lemma 2.4.16) implies that

lim_{𝑛→∞} (𝑈(𝑣 + 𝑤; 𝑃𝑛) − 𝐿(𝑣 + 𝑤; 𝑃𝑛)) = 0. (3.4.12)

Therefore, by Lemma 3.3.10, 𝑣 + 𝑤 is integrable. Furthermore, since 𝑈(𝑣 + 𝑤; 𝑃𝑛) ≤ 𝑈(𝑣; 𝑃𝑛) + 𝑈(𝑤; 𝑃𝑛) for all 𝑛, taking the limit as 𝑛 → ∞ of both sides, we see that

∫_𝑎^𝑏 (𝑣(𝑥) + 𝑤(𝑥)) 𝑑𝑥 ≤ ∫_𝑎^𝑏 𝑣(𝑥) 𝑑𝑥 + ∫_𝑎^𝑏 𝑤(𝑥) 𝑑𝑥;

and since 𝐿(𝑣; 𝑃𝑛) + 𝐿(𝑤; 𝑃𝑛) ≤ 𝐿(𝑣 + 𝑤; 𝑃𝑛) for all 𝑛,

∫_𝑎^𝑏 𝑣(𝑥) 𝑑𝑥 + ∫_𝑎^𝑏 𝑤(𝑥) 𝑑𝑥 ≤ ∫_𝑎^𝑏 (𝑣(𝑥) + 𝑤(𝑥)) 𝑑𝑥.

Equation (3.4.5) follows. Similar arguments (Problem 3.4.2) give the other integrability results and formulas.

A computation then yields the full complex version of the algebraic parts of Theorem 3.4.2.

Corollary 3.4.3. Let 𝑓, 𝑔 ∶ [𝑎, 𝑏] → 𝐂 be bounded and integrable on [𝑎, 𝑏] and let 𝑐, 𝑑 ∈ 𝐂. Then 𝑐𝑓 + 𝑑𝑔 is integrable on [𝑎, 𝑏] and

∫_𝑎^𝑏 (𝑐𝑓(𝑥) + 𝑑𝑔(𝑥)) 𝑑𝑥 = 𝑐 ∫_𝑎^𝑏 𝑓(𝑥) 𝑑𝑥 + 𝑑 ∫_𝑎^𝑏 𝑔(𝑥) 𝑑𝑥. (3.4.13)

Proof. We observe that integrability of 𝑐𝑓 + 𝑑𝑔 and the complex versions of (3.4.5) and (3.4.7) hold by decompositions into real and imaginary parts. It will therefore suffice to prove the complex version of (3.4.6).


So suppose that 𝑓(𝑥) = 𝑢(𝑥) + 𝑖𝑣(𝑥) is integrable on [𝑎, 𝑏] and 𝑐 + 𝑑𝑖 ∈ 𝐂. Then by Theorem 3.4.2,

∫_𝑎^𝑏 (𝑐 + 𝑑𝑖)(𝑢(𝑥) + 𝑖𝑣(𝑥)) 𝑑𝑥 = ∫_𝑎^𝑏 (𝑐𝑢(𝑥) − 𝑑𝑣(𝑥) + 𝑖(𝑑𝑢(𝑥) + 𝑐𝑣(𝑥))) 𝑑𝑥
  = 𝑐 ∫_𝑎^𝑏 𝑢(𝑥) 𝑑𝑥 − 𝑑 ∫_𝑎^𝑏 𝑣(𝑥) 𝑑𝑥 + 𝑖𝑑 ∫_𝑎^𝑏 𝑢(𝑥) 𝑑𝑥 + 𝑖𝑐 ∫_𝑎^𝑏 𝑣(𝑥) 𝑑𝑥 (3.4.14)
  = (𝑐 + 𝑑𝑖) ∫_𝑎^𝑏 (𝑢(𝑥) + 𝑖𝑣(𝑥)) 𝑑𝑥.

The corollary follows.

Now, up to this point, we have not really used the full flexibility of allowing arbitrary (uneven) partitions. It is precisely in the proof of the next result where we see that flexibility come to fruition.

Theorem 3.4.4. For 𝑎 < 𝑏 < 𝑐, let 𝑓 ∶ [𝑎, 𝑐] → 𝐂 be integrable on [𝑎, 𝑏] and [𝑏, 𝑐]. Then 𝑓 is integrable on [𝑎, 𝑐] and

∫_𝑎^𝑐 𝑓(𝑥) 𝑑𝑥 = ∫_𝑎^𝑏 𝑓(𝑥) 𝑑𝑥 + ∫_𝑏^𝑐 𝑓(𝑥) 𝑑𝑥. (3.4.15)

Proof. This follows from the real-valued case; see Problem 3.4.3 for the proof.

Next, as previously promised, we use uniform continuity to obtain the following result.

Theorem 3.4.5. Let 𝑓 ∶ [𝑎, 𝑏] → 𝐂 be continuous. Then 𝑓 is integrable on [𝑎, 𝑏].

Proof. This follows from the real-valued case; see Problem 3.4.4 for the proof.

A little more work then yields the following improved version of Theorem 3.4.5.

Corollary 3.4.6. Let 𝑓 ∶ [𝑎, 𝑏] → 𝐂 be bounded and continuous except at finitely many points in [𝑎, 𝑏]. Then 𝑓 is integrable on [𝑎, 𝑏].

We only sketch the proof; the interested reader may fill in the missing details.

Sketch of proof. This follows from the real-valued case, so suppose 𝑣 ∶ [𝑎, 𝑏] → 𝐑 is continuous except possibly at the points 𝑎 = 𝑥0 < ⋯ < 𝑥𝑛 = 𝑏 in [𝑎, 𝑏], and suppose that for some 𝑀 ∈ 𝐑, |𝑣(𝑥)| < 𝑀 for all 𝑥 ∈ [𝑎, 𝑏]. Given 𝜖 > 0, choose a partition 𝑃 of [𝑎, 𝑏] as follows:

• (Small subintervals) Make sure that each of the points 𝑥0, … , 𝑥𝑛 is contained in only one "small" subinterval of 𝑃, either by putting 𝑥𝑖 in the center of its subinterval, by putting it at the left endpoint (for 𝑥0), or by putting it at the right endpoint (for 𝑥𝑛). Also make sure that these small subintervals are disjoint and have total length at most 𝜖/(4𝑀).


• (Continuous subintervals) The remainder of [𝑎, 𝑏], plus boundary points, is the union of 𝑛 disjoint closed intervals, and 𝑣 is continuous on each of those intervals. By Theorem 3.4.5, 𝑣 is integrable on each of those intervals, so by Lemma 3.3.10, we may choose a partition 𝑃𝑖 on the 𝑖th such interval such that 𝑈(𝑣; 𝑃𝑖) − 𝐿(𝑣; 𝑃𝑖) < 𝜖/(2𝑛).

Then the small subintervals and continuous subintervals each contribute at most 𝜖/2 to the total 𝑈(𝑣; 𝑃) − 𝐿(𝑣; 𝑃), and so 𝑈(𝑣; 𝑃) − 𝐿(𝑣; 𝑃) < 𝜖.

We will also need a "combination of integrable functions is integrable" result beyond the linear combinations of Corollary 3.4.3. The result we want (Theorem 3.4.8) is mostly a consequence of the following somewhat technical lemma.

Lemma 3.4.7. If 𝑓 ∶ [𝑎, 𝑏] → 𝐂 is integrable and 𝜙 ∶ 𝐂 → 𝐂 is continuous, then 𝜙 ∘ 𝑓 ∶ [𝑎, 𝑏] → 𝐂 is integrable.

Now, while the proof of this lemma requires somewhat fancier epsilonics than usual, we include the details because it also gives a more precise formulation of the idea that an integrable function is "uniformly continuous on average" (Remark 3.3.13).

Proof. Fix 𝜖 > 0. First, since 𝑓 is bounded (by definition of integrable), there exists some 𝑀1 > 0 such that |𝑓(𝑥)| < 𝑀1 for all 𝑥 ∈ [𝑎, 𝑏]. Furthermore, since the closed disk 𝐷 = 𝒩𝑀1(0) is closed and bounded (Theorem 2.4.10), we see that:

• 𝜙 is uniformly continuous on 𝐷 (Theorem 3.1.12), or more specifically, there exists some 𝛿 > 0 such that if |𝑧 − 𝑤| < 𝛿, then |𝜙(𝑧) − 𝜙(𝑤)| < 𝜖/(3(𝑏 − 𝑎)).

• 𝜙 is bounded on 𝐷 (Corollary 3.1.16), or more specifically, there exists some 𝑀 > 0 such that |𝜙(𝑧)| < 𝑀 for all 𝑧 ∈ 𝐷, and therefore, |𝜙(𝑓(𝑥))| < 𝑀 for all 𝑥 ∈ [𝑎, 𝑏].

Recall the notation of Lemma 3.3.12:

𝜇(𝑓; 𝑃, 𝑖) = sup {|𝑓(𝑥) − 𝑓(𝑦)| ∣ 𝑥, 𝑦 ∈ [𝑥𝑖−1, 𝑥𝑖]}, (3.4.16)
𝐸(𝑓; 𝑃) = ∑_{𝑖=1}^{𝑛} 𝜇(𝑓; 𝑃, 𝑖)(Δ𝑥)𝑖. (3.4.17)

In those terms, choose a partition 𝑃 of [𝑎, 𝑏] such that 𝐸(𝑓; 𝑃) < 𝜖0 = 𝜖𝛿/(6𝑀). We divide the subintervals 𝜎𝑖 = [𝑥𝑖−1, 𝑥𝑖] of 𝑃 into two disjoint categories.

(1) We say that 𝜎𝑖 is in the class 𝒮 of stable subintervals if 𝜇(𝑓; 𝑃, 𝑖) < 𝛿.

(2) Otherwise, if 𝜇(𝑓; 𝑃, 𝑖) ≥ 𝛿, then we say that 𝜎𝑖 is in the class 𝒯 of thin subintervals (a name to be justified momentarily).

The reader may find it helpful to consider a heuristic picture like Figure 3.4.1, in which the shaded regions represent the area between upper and lower Riemann sums. So now, let 𝑤(𝑃) be the sum of the widths of all thin subintervals, or in other words,

𝑤(𝑃) = ∑_{𝜎𝑖 ∈ 𝒯} (Δ𝑥)𝑖. (3.4.18)

It follows that

𝜖0 > 𝐸(𝑓; 𝑃) ≥ ∑_{𝜎𝑖 ∈ 𝒯} 𝜇(𝑓; 𝑃, 𝑖)(Δ𝑥)𝑖 ≥ ∑_{𝜎𝑖 ∈ 𝒯} 𝛿(Δ𝑥)𝑖 = 𝛿𝑤(𝑃), (3.4.19)


Figure 3.4.1. Stable and thin subintervals

which means that 𝑤(𝑃) < 𝜖0/𝛿 = 𝜖/(6𝑀).
𝑓(𝑐) > 0 for 𝑐 ∈ (𝑎, 𝑏); the cases 𝑐 = 𝑎 and 𝑐 = 𝑏 are similar.

Problems.

3.4.1. (Proves Lemma 3.4.1) Let 𝑣, 𝑤 ∶ [𝑎, 𝑏] → 𝐑 be bounded, let 𝑃 be a partition of [𝑎, 𝑏], and let 𝑐 ∈ 𝐑 be positive.
(a) Prove that 𝑐𝑚(𝑣; 𝑃, 𝑖) = 𝑚(𝑐𝑣; 𝑃, 𝑖) ≤ 𝑀(𝑐𝑣; 𝑃, 𝑖) = 𝑐𝑀(𝑣; 𝑃, 𝑖).
(b) Prove that 𝑚(−𝑣; 𝑃, 𝑖) = −𝑀(𝑣; 𝑃, 𝑖) ≤ −𝑚(𝑣; 𝑃, 𝑖) = 𝑀(−𝑣; 𝑃, 𝑖).
(c) Now assume that 𝑣(𝑥) ≤ 𝑤(𝑥) for all 𝑥 ∈ [𝑎, 𝑏]. Prove that 𝑀(𝑣; 𝑃, 𝑖) ≤ 𝑀(𝑤; 𝑃, 𝑖).

3.4.2. (Proves Theorem 3.4.2) Let 𝑣, 𝑤 ∶ [𝑎, 𝑏] → 𝐑 be bounded and integrable and let 𝑐 ∈ 𝐑 be positive.
(a) Prove that 𝑐𝑣 is integrable and ∫_𝑎^𝑏 𝑐𝑣(𝑥) 𝑑𝑥 = 𝑐 ∫_𝑎^𝑏 𝑣(𝑥) 𝑑𝑥.
(b) Prove that −𝑣 is integrable and ∫_𝑎^𝑏 (−𝑣(𝑥)) 𝑑𝑥 = − ∫_𝑎^𝑏 𝑣(𝑥) 𝑑𝑥.
(c) Now assume that 𝑣(𝑥) ≤ 𝑤(𝑥) for all 𝑥 ∈ [𝑎, 𝑏]. Prove that ∫_𝑎^𝑏 𝑣(𝑥) 𝑑𝑥 ≤ ∫_𝑎^𝑏 𝑤(𝑥) 𝑑𝑥.


3.4.3. (Proves Theorem 3.4.4) For 𝑎 < 𝑏 < 𝑐 in 𝐑, let 𝑣 ∶ [𝑎, 𝑐] → 𝐑 be integrable on [𝑎, 𝑏] and [𝑏, 𝑐]. Prove that 𝑣 is integrable on [𝑎, 𝑐] and

∫_𝑎^𝑐 𝑣(𝑥) 𝑑𝑥 = ∫_𝑎^𝑏 𝑣(𝑥) 𝑑𝑥 + ∫_𝑏^𝑐 𝑣(𝑥) 𝑑𝑥. (3.4.26)

3.4.4. (Proves Theorem 3.4.5) Let 𝑣 ∶ [𝑎, 𝑏] → 𝐑 be continuous.
(a) Suppose 𝑣 is continuous on [𝑥0, 𝑥1] and satisfies the condition that for 𝑥, 𝑦 ∈ [𝑥0, 𝑥1], we have that |𝑣(𝑥) − 𝑣(𝑦)| < 𝜖0. Prove that if 𝑀 = sup {𝑣(𝑥) ∣ 𝑥 ∈ [𝑥0, 𝑥1]} and 𝑚 = inf {𝑣(𝑥) ∣ 𝑥 ∈ [𝑥0, 𝑥1]}, then |𝑀 − 𝑚| < 𝜖0.
(b) Use the uniform continuity of 𝑣 to show that given 𝜖 > 0, for sufficiently large 𝑛, if 𝑃𝑛 is the 𝑛th standard partition of [𝑎, 𝑏] (Example 3.3.2), then for 𝑥 and 𝑦 contained in the same subinterval of 𝑃𝑛, we have |𝑣(𝑥) − 𝑣(𝑦)| < 𝜖/(𝑏 − 𝑎).
(c) Prove that 𝑣 is integrable on [𝑎, 𝑏].

3.4.5. (Proves Theorem 3.4.8) Let 𝑓, 𝑔 ∶ [𝑎, 𝑏] → 𝐂 be integrable.
(a) Prove that |𝑓(𝑥)| and 𝑓(𝑥)² are integrable on [𝑎, 𝑏].
(b) Prove that 𝑓(𝑥)𝑔(𝑥) is integrable on [𝑎, 𝑏].
(c) Prove that for 𝑐, 𝑑 ∈ 𝐑, max(𝑐, 𝑑) = (1/2)(𝑐 + 𝑑 + |𝑐 − 𝑑|), and find and prove a similar formula for min(𝑐, 𝑑).
(d) Now suppose also that 𝑓 and 𝑔 are real-valued. Prove that min(𝑓(𝑥), 𝑔(𝑥)) and max(𝑓(𝑥), 𝑔(𝑥)) are integrable on [𝑎, 𝑏].
(e) Suppose again that 𝑓 is real-valued. Prove that

|∫_𝑎^𝑏 𝑓(𝑥) 𝑑𝑥| ≤ ∫_𝑎^𝑏 |𝑓(𝑥)| 𝑑𝑥. (3.4.27)

3.4.6. (Proves Lemma 3.4.9) Suppose 𝑓 ∶ [𝑎, 𝑏] → 𝐑 is continuous and nonnegative (i.e., 𝑓(𝑥) ≥ 0 for all 𝑥 ∈ [𝑎, 𝑏]), and suppose that 𝑓(𝑐) > 0 for some 𝑐 ∈ (𝑎, 𝑏).
(a) Prove that there exists some 𝛿 > 0 such that for 𝑐 − 𝛿 ≤ 𝑥 ≤ 𝑐 + 𝛿, we have 𝑓(𝑥) > 𝑓(𝑐)/2.
(b) Prove that ∫_𝑎^𝑏 𝑓(𝑥) 𝑑𝑥 > 0.

3.5 The Fundamental Theorem of Calculus

Our complexified review/recovery/reboot of calculus now culminates in the Fundamental Theorems of Calculus. First, we need two definitions.

Definition 3.5.1. For 𝑏 < 𝑎, if 𝑓(𝑥) is integrable on [𝑏, 𝑎], we define the symbol ∫_𝑎^𝑏 𝑓(𝑥) 𝑑𝑥 to be

∫_𝑎^𝑏 𝑓(𝑥) 𝑑𝑥 = − ∫_𝑏^𝑎 𝑓(𝑥) 𝑑𝑥. (3.5.1)


In other words, an integral "traveled backwards" is defined to be the negative of the corresponding forwards integral. We also define

∫_𝑎^𝑎 𝑓(𝑥) 𝑑𝑥 = 0. (3.5.2)

Definition 3.5.2. Let 𝐼 be a subinterval (not necessarily closed) of 𝐑. For 𝑓 ∶ 𝐼 → 𝐂 such that 𝑓 is integrable on any closed subinterval of 𝐼, we define an indefinite integral of 𝑓 to be any function of the form

𝐹(𝑥) = ∫_𝑎^𝑥 𝑓(𝑡) 𝑑𝑡, (3.5.3)

where 𝑎 ∈ 𝐼 is fixed. Note that we only use 𝑡 as the "inner" variable to distinguish it from the "outer" variable 𝑥 that appears as a limit of integration in (3.5.3). Note also that if 𝑥 < 𝑎, then (3.5.3) must be interpreted in the sense of Definition 3.5.1.

We now state and prove the Fundamental Theorems of Calculus.

Theorem 3.5.3 (FTC I). Let 𝐼 be an interval, let 𝑎 ∈ 𝐼, let 𝑓 ∶ 𝐼 → 𝐂 be integrable on any closed subinterval of 𝐼, and let

𝐹(𝑥) = ∫_𝑎^𝑥 𝑓(𝑡) 𝑑𝑡. (3.5.4)

Then 𝐹 is (uniformly) continuous on 𝐼, and furthermore, if 𝑓 is continuous at some 𝑏 ∈ 𝐼, then 𝐹 is differentiable at 𝑏 and 𝐹′(𝑏) = 𝑓(𝑏).

Proof. Turning first to the continuity of 𝐹, since 𝑓 is bounded, choose 𝑀 > 0 such that |𝑓(𝑥)| < 𝑀 for all 𝑥 ∈ 𝐼. Fixing 𝜖 > 0, let 𝛿(𝜖) = 𝜖/𝑀, and suppose 𝑥, 𝑦 ∈ 𝐼 and |𝑥 − 𝑦| < 𝛿. By symmetry, we may assume 𝑥 < 𝑦, in which case

|𝐹(𝑦) − 𝐹(𝑥)| = |∫_𝑎^𝑦 𝑓(𝑡) 𝑑𝑡 − ∫_𝑎^𝑥 𝑓(𝑡) 𝑑𝑡| = |∫_𝑥^𝑦 𝑓(𝑡) 𝑑𝑡| ≤ ∫_𝑥^𝑦 |𝑓(𝑡)| 𝑑𝑡 < 𝑀(𝑦 − 𝑥) < 𝜖, (3.5.5)

by Theorems 3.4.4, 3.4.8, and 3.4.2 and Lemma 3.3.14. It follows that 𝐹 is uniformly continuous on 𝐼.

As for differentiability, suppose that 𝑓 is continuous at 𝑏; more specifically, suppose that for any 𝜖0 > 0, there exists some 𝛿0(𝜖0) > 0 such that if |𝑥 − 𝑏| < 𝛿0(𝜖0), then |𝑓(𝑥) − 𝑓(𝑏)| < 𝜖0. To prove that 𝐹′(𝑏) = 𝑓(𝑏), we will prove the local linearity condition (Lemma 3.2.5) for 𝑚 = 𝑓(𝑏) by showing that if

𝐸(𝑥) = (𝐹(𝑥) − 𝐹(𝑏))/(𝑥 − 𝑏) − 𝑓(𝑏), (3.5.6)

then lim_{𝑥→𝑏} 𝐸(𝑥) = 0.

Now, by Theorem 3.4.4 (possibly in the extended sense of Definition 3.5.1; see Problem 3.5.1), we have that

𝐹(𝑥) − 𝐹(𝑏) = ∫_𝑎^𝑥 𝑓(𝑡) 𝑑𝑡 − ∫_𝑎^𝑏 𝑓(𝑡) 𝑑𝑡 = ∫_𝑏^𝑥 𝑓(𝑡) 𝑑𝑡. (3.5.7)


Furthermore, by Lemma 3.3.14, we know that

∫_𝑏^𝑥 𝑓(𝑏) 𝑑𝑡 = 𝑓(𝑏)(𝑥 − 𝑏). (3.5.8)

So given 𝜖 > 0, let 𝛿(𝜖) = 𝛿0(𝜖/2). Then for |𝑥 − 𝑏| < 𝛿(𝜖), 𝑥 ≠ 𝑏, we have

|𝐸(𝑥)| = |(𝐹(𝑥) − 𝐹(𝑏))/(𝑥 − 𝑏) − 𝑓(𝑏)|
  = |∫_𝑏^𝑥 𝑓(𝑡) 𝑑𝑡 − ∫_𝑏^𝑥 𝑓(𝑏) 𝑑𝑡| / |𝑥 − 𝑏|
  ≤ (∫_𝑏^𝑥 |𝑓(𝑡) − 𝑓(𝑏)| 𝑑𝑡) / |𝑥 − 𝑏|
  ≤ (∫_𝑏^𝑥 (𝜖/2) 𝑑𝑡) / |𝑥 − 𝑏| = 𝜖/2 < 𝜖, (3.5.9)

where the first inequality holds by Theorem 3.4.8, and the second holds because |𝑥 − 𝑏| < 𝛿0(𝜖/2). The theorem follows by the 𝜖-𝛿 definition of the limit of a function.

In the following, we use the notation 𝑑𝐹/𝑑𝑥 = 𝐹′(𝑥) to create a greater visual contrast with 𝐹(𝑥) and emphasize that the right-hand side of (3.5.10), below, is the integral of the derivative of 𝐹(𝑥).

Theorem 3.5.4 (FTC II). Let 𝐹 ∶ [𝑎, 𝑏] → 𝐂 be continuously differentiable. Then

𝐹(𝑏) − 𝐹(𝑎) = ∫_𝑎^𝑏 (𝑑𝐹/𝑑𝑥) 𝑑𝑥. (3.5.10)
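Before the proof, a quick numerical illustration of FTC II (our own sketch, not part of the text): with 𝐹(𝑥) = sin 𝑥 + 𝑖𝑥², a midpoint Riemann sum of 𝐹′ over [0, 1] should match 𝐹(1) − 𝐹(0) = sin 1 + 𝑖.

```python
import math

def F(x):
    return math.sin(x) + 1j * x**2       # a complex-valued antiderivative

def F_prime(x):
    return math.cos(x) + 2j * x          # dF/dx

a, b, n = 0.0, 1.0, 20000
dx = (b - a) / n
# midpoint Riemann sum of F' over [a, b]
integral = sum(F_prime(a + (k + 0.5) * dx) for k in range(n)) * dx
difference = F(b) - F(a)                 # left-hand side of (3.5.10)
```
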

Proof. Problem 3.5.2.

One familiar and useful consequence of FTC II is integration by substitution, which we state as follows.

Theorem 3.5.5. Let 𝑋 be a subset of 𝐂, and let 𝑢 ∶ [𝑎, 𝑏] → 𝑋 and 𝑓 ∶ 𝑋 → 𝐂 be continuously differentiable. Then

∫_𝑎^𝑏 𝑓′(𝑢(𝑥))𝑢′(𝑥) 𝑑𝑥 = 𝑓(𝑢(𝑏)) − 𝑓(𝑢(𝑎)). (3.5.11)

If we further assume that 𝑋 is a subinterval of 𝐑 and 𝑔 ∶ 𝑋 → 𝐂 is continuous, then

∫_𝑎^𝑏 𝑔(𝑢(𝑥))𝑢′(𝑥) 𝑑𝑥 = ∫_{𝑢(𝑎)}^{𝑢(𝑏)} 𝑔(𝑢) 𝑑𝑢. (3.5.12)

Proof. Problem 3.5.3.
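The substitution formula (3.5.12) can also be checked numerically; the sketch below (our illustration, with midpoint sums standing in for both integrals) uses 𝑢(𝑥) = 𝑥² and 𝑔(𝑢) = cos 𝑢, so both sides should equal sin 1.

```python
import math

def midpoint(h, a, b, n=20000):
    # midpoint Riemann sum as a numerical stand-in for the integral
    dx = (b - a) / n
    return sum(h(a + (k + 0.5) * dx) * dx for k in range(n))

u = lambda x: x**2              # u(0) = 0, u(1) = 1
u_prime = lambda x: 2 * x
g = lambda t: math.cos(t)

lhs = midpoint(lambda x: g(u(x)) * u_prime(x), 0.0, 1.0)   # ∫ g(u(x)) u'(x) dx
rhs = midpoint(g, 0.0, 1.0)     # ∫ from u(0) to u(1) of g(u) du = sin(1)
```
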


As the reader may (fondly?) recall, (3.5.12) provides a useful notation for doing multiple substitutions; e.g., we might next substitute 𝑤 = 𝑤(𝑢), and so on.

We conclude our discussion of FTC by recovering (complex-valued) integration by parts. While this may seem like an afterthought, the reader unfamiliar with our subsequent material may be surprised at how useful, or even crucial, it turns out to be for us.

Theorem 3.5.6 (Integration by parts). Let 𝑓, 𝑔 ∶ [𝑎, 𝑏] → 𝐂 be continuously differentiable. Then

∫_𝑎^𝑏 𝑓(𝑥)𝑔′(𝑥) 𝑑𝑥 = 𝑓(𝑏)𝑔(𝑏) − 𝑓(𝑎)𝑔(𝑎) − ∫_𝑎^𝑏 𝑔(𝑥)𝑓′(𝑥) 𝑑𝑥. (3.5.13)
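A complex-valued instance of (3.5.13) that foreshadows its later use: with 𝑓(𝑥) = 𝑥 and 𝑔(𝑥) = 𝑒^{𝑖𝑥} on [0, 𝜋], integration by parts gives ∫₀^𝜋 𝑥 · 𝑖𝑒^{𝑖𝑥} 𝑑𝑥 = 𝜋𝑒^{𝑖𝜋} − ∫₀^𝜋 𝑒^{𝑖𝑥} 𝑑𝑥 = −𝜋 − 2𝑖. The Python sketch below (our illustration only) checks both sides numerically.

```python
import cmath, math

def midpoint(h, a, b, n=20000):
    # midpoint Riemann sum as a numerical stand-in for the integral
    dx = (b - a) / n
    return sum(h(a + (k + 0.5) * dx) * dx for k in range(n))

f = lambda x: x
f1 = lambda x: 1.0                       # f'
g = lambda x: cmath.exp(1j * x)
g1 = lambda x: 1j * cmath.exp(1j * x)    # g'

a, b = 0.0, math.pi
lhs = midpoint(lambda x: f(x) * g1(x), a, b)
rhs = f(b) * g(b) - f(a) * g(a) - midpoint(lambda x: g(x) * f1(x), a, b)
```
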

Proof. Problem 3.5.4.

Remark 3.5.7. One generalization of the integral of a complex function 𝑓 ∶ 𝐑 → 𝐂 that we will not use directly, but which does have an indirect influence on our main discussion, is the notion of a path integral of a function 𝑓 ∶ 𝐂 → 𝐂. Define a path segment to be a continuously differentiable function 𝛾 ∶ [𝑎, 𝑏] → 𝐂. In those terms, for continuous 𝑓 ∶ 𝐂 → 𝐂 and a path segment 𝛾 ∶ [𝑎, 𝑏] → 𝐂, we can define the integral ∫_𝛾 𝑓(𝑧) 𝑑𝑧 by

∫_𝛾 𝑓(𝑧) 𝑑𝑧 = ∫_𝑎^𝑏 𝑓(𝛾(𝑡))𝛾′(𝑡) 𝑑𝑡. (3.5.14)

Similarly, if we define a path to be a concatenation of path segments (see Definition 3.2.12 and Figure 3.2.1), for a path 𝛾, we can define the integral ∫_𝛾 𝑓(𝑧) 𝑑𝑧 to be the sum of the integrals along the segments making up 𝛾.

In complex analysis, it is important to consider ∫_𝛾 𝑓(𝑧) 𝑑𝑧 as depending only on the image and direction of 𝛾, in the following sense: If 𝑢 ∶ [𝑐, 𝑑] → [𝑎, 𝑏] is a continuously differentiable bijection such that 𝑢′(𝑥) > 0 for all 𝑥 ∈ [𝑐, 𝑑], then

∫_{𝛾∘𝑢} 𝑓(𝑧) 𝑑𝑧 = ∫_𝛾 𝑓(𝑧) 𝑑𝑧. (3.5.15)

See Problem 3.5.5 for a proof. As for the properties of path integrals, we can, for example, generalize FTC II to path integrals; see Problem 3.5.6 for a discussion. On the other hand, it turns out that generalizing FTC I is much more challenging, because when we try to define indefinite integration for path integrals, we run into the problem that ∫_𝛾 𝑓(𝑧) 𝑑𝑧 may depend on the specific path 𝛾 takes and not just on its endpoints. In fact, describing the path-dependency of ∫_𝛾 𝑓(𝑧) 𝑑𝑧 turns out to be one of the central problems of a first course in complex analysis; see Ahlfors [Ahl79] and Conway [Con78] for much more on this question.
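The path-integral definition (3.5.14) is easy to discretize. In the Python sketch below (our own illustration; the parabolic path 𝛾(𝑡) = 𝑡 + 𝑖𝑡² is our choice), we integrate 𝑓(𝑧) = 2𝑧 along 𝛾 from 0 to 1 + 𝑖; a FTC-II-style argument in the spirit of Problem 3.5.6 predicts the value 𝛾(1)² − 𝛾(0)² = (1 + 𝑖)² = 2𝑖. We also check a reparametrized copy, illustrating (3.5.15). (The reparametrization 𝑢(𝑠) = 𝑠² has 𝑢′(0) = 0, so it only satisfies the hypotheses of (3.5.15) away from the left endpoint; numerically this does not matter.)

```python
def path_integral(f, gamma, gamma_prime, a, b, n=20000):
    # numerical version of (3.5.14): integrate f(gamma(t)) * gamma'(t) over
    # [a, b] by a midpoint sum
    dt = (b - a) / n
    total = 0j
    for k in range(n):
        t = a + (k + 0.5) * dt
        total += f(gamma(t)) * gamma_prime(t) * dt
    return total

f = lambda z: 2 * z
gamma = lambda t: t + 1j * t**2            # a path segment from 0 to 1 + i
gamma_prime = lambda t: 1 + 2j * t

val = path_integral(f, gamma, gamma_prime, 0.0, 1.0)

# the same path traversed via u(s) = s^2, illustrating (3.5.15)
val2 = path_integral(f, lambda s: gamma(s**2),
                     lambda s: gamma_prime(s**2) * 2 * s, 0.0, 1.0)
```
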


Problems.

3.5.1. (Proves Theorem 3.5.3) We know that Theorem 3.4.4 holds when 𝑎 < 𝑏 < 𝑐. Using Definition 3.5.1, prove that it also holds for the other five possible orderings of {𝑎, 𝑏, 𝑐}.

3.5.2. (Proves Theorem 3.5.4) Let 𝐹 ∶ [𝑎, 𝑏] → 𝐂 be continuously differentiable.
(a) Let 𝐺(𝑥) = ∫_𝑎^𝑥 𝐹′(𝑡) 𝑑𝑡 and let 𝐻(𝑥) = 𝐹(𝑥) − 𝐺(𝑥). Find the value of 𝐻′(𝑥) for 𝑥 ∈ [𝑎, 𝑏] (with proof). What conclusion can you draw, and why?
(b) Prove that 𝐹(𝑏) − 𝐹(𝑎) = ∫_𝑎^𝑏 (𝑑𝐹/𝑑𝑥) 𝑑𝑥.

3.5.3. (Proves Theorem 3.5.5) Let 𝑋 be a subset of 𝐂, and let 𝑢 ∶ [𝑎, 𝑏] → 𝑋 and 𝑓 ∶ 𝑋 → 𝐂 be continuously differentiable.
(a) Prove that ∫_𝑎^𝑏 𝑓′(𝑢(𝑥))𝑢′(𝑥) 𝑑𝑥 = 𝑓(𝑢(𝑏)) − 𝑓(𝑢(𝑎)).
(b) Now further assume that 𝑋 is a subinterval of 𝐑 and 𝑔 ∶ 𝑋 → 𝐂 is continuous. Prove that ∫_𝑎^𝑏 𝑔(𝑢(𝑥))𝑢′(𝑥) 𝑑𝑥 = ∫_{𝑢(𝑎)}^{𝑢(𝑏)} 𝑔(𝑢) 𝑑𝑢.

3.5.4. (Proves Theorem 3.5.6) Let 𝑓, 𝑔 ∶ [𝑎, 𝑏] → 𝐂 be continuously differentiable. Prove that

∫_𝑎^𝑏 𝑓(𝑥)𝑔′(𝑥) 𝑑𝑥 = 𝑓(𝑏)𝑔(𝑏) − 𝑓(𝑎)𝑔(𝑎) − ∫_𝑎^𝑏 𝑔(𝑥)𝑓′(𝑥) 𝑑𝑥. (3.5.16)

3.5.5. (*) Let 𝑓 ∶ 𝐂 → 𝐂 be continuous, let 𝛾 ∶ [𝑎, 𝑏] → 𝐂 be continuously differentiable (i.e., 𝛾 is a path segment), and suppose that 𝑢 ∶ [𝑐, 𝑑] → [𝑎, 𝑏] is a continuously differentiable bijection such that 𝑢′(𝑥) > 0 for all 𝑥 ∈ [𝑐, 𝑑]. Prove that

∫_{𝛾∘𝑢} 𝑓(𝑧) 𝑑𝑧 = ∫_𝛾 𝑓(𝑧) 𝑑𝑧. (3.5.17)

3.5.6. (*) State and prove a version of the Second Fundamental Theorem of Calculus (Theorem 3.5.4) for path integrals. (Some details to consider: How differentiable do we need 𝐹(𝑧) to be? What kind of a domain can we allow 𝐹 to have?)

3.6 Other results from calculus

In this section, we discuss miscellaneous results from calculus that are tangential to our main story but will nevertheless be useful. The key definitions and results are

• the asymptotic behavior of functions and sequences (Definition 3.6.7 and Theorems 3.6.9 and 3.6.12),
• Fubini's Theorem (Theorem 3.6.21), and
• differentiating under the integral sign (Theorem 3.6.23).


Because Fubini's Theorem and differentiating under the integral sign both involve functions of two variables, we also briefly review some facts from multivariable calculus. In any case, we recommend, at least on a first reading, that the reader merely absorb the definitions and theorem statements listed above and not worry about the details and proofs, other than perhaps out of sheer curiosity.

3.6.1 Asymptotics and L'Hôpital's Rule. We begin our discussion of asymptotics by defining infinite limits of various types.

Definition 3.6.1. For some 𝑎 ∈ 𝐑, let 𝑓 ∶ (𝑎, +∞) → 𝐂 be a function, and let 𝐿 be a complex number. To say that lim_{𝑥→+∞} 𝑓(𝑥) = 𝐿 means that for every 𝜖 > 0, there exists some 𝑁(𝜖) > 0 such that if 𝑥 > 𝑁(𝜖), then |𝑓(𝑥) − 𝐿| < 𝜖.

Comparing Definition 2.4.3, we see that if lim_{𝑥→+∞} 𝑓(𝑥) = 𝐿, then lim_{𝑛→∞} 𝑓(𝑛) = 𝐿 a fortiori. We will also need to consider infinite-valued limits.

Definition 3.6.2. For a real-valued sequence 𝑎𝑛, to say that lim_{𝑛→∞} 𝑎𝑛 = +∞ means that for every 𝑀 > 0, there exists some 𝑁(𝑀) ∈ 𝐑 such that if 𝑛 > 𝑁(𝑀), then 𝑎𝑛 > 𝑀.

Definition 3.6.3. For 𝑓 ∶ 𝑋 → 𝐑, with 𝑋 ⊆ 𝐑 and 𝑎 a limit point of 𝑋, to say that lim_{𝑥→𝑎} 𝑓(𝑥) = +∞ means that for every 𝑀 > 0, there exists some 𝛿(𝑀) > 0 such that if |𝑥 − 𝑎| < 𝛿(𝑀) and 𝑥 ≠ 𝑎, then 𝑓(𝑥) > 𝑀. Similarly, for 𝑎 ∈ 𝐑 and 𝑓 ∶ (𝑎, +∞) → 𝐑, to say that lim_{𝑥→+∞} 𝑓(𝑥) = +∞ means that for every 𝑀 > 0, there exists some 𝑁(𝑀) > 0 such that if 𝑥 > 𝑁(𝑀), then 𝑓(𝑥) > 𝑀.

We observe that:

Lemma 3.6.4. Retaining the notation of Definition 3.6.3, for 𝑎 ∈ 𝐑 or 𝑎 = +∞, if lim_{𝑥→𝑎} 𝑓(𝑥) = +∞, then lim_{𝑥→𝑎} 1/𝑓(𝑥) = 0.

Proof. Problem 3.6.1.

L'Hôpital's Rule is quite general, but we will only use the following cases. Note that in any case, L'Hôpital's Rule is definitely a result about real-valued functions.

Theorem 3.6.5 (L'Hôpital's Rule). Let 𝑓 and 𝑔 be real-valued differentiable functions on some 𝑋 ⊆ 𝐑 such that 𝑔′(𝑥) ≠ 0 for all 𝑥 ∈ 𝑋 and 𝑔(𝑥) is strictly monotone (i.e., either strictly increasing or strictly decreasing) on 𝑋.

(1) If 𝑋 = (𝑎, 𝑏) and for some 𝐿 ∈ 𝐑,

lim_{𝑥→𝑎⁺} 𝑓(𝑥) = 0, lim_{𝑥→𝑎⁺} 𝑔(𝑥) = 0, lim_{𝑥→𝑎⁺} 𝑓′(𝑥)/𝑔′(𝑥) = 𝐿, (3.6.1)

then lim_{𝑥→𝑎⁺} 𝑓(𝑥)/𝑔(𝑥) = 𝐿.


(2) If 𝑋 = (𝑎, +∞) and for some 𝐿 ∈ 𝐑, lim 𝑓(𝑥) = +∞,

𝑥→+∞

then lim+ 𝑥→𝑎

lim 𝑔(𝑥) = +∞,

𝑥→+∞

𝑓′ (𝑥) = 𝐿, 𝑥→+∞ 𝑔′ (𝑥) lim

(3.6.2)

𝑓(𝑥) = 𝐿. 𝑔(𝑥)

The reader may safely skip the proof of L’Hôpital’s Rule, but we include it for completeness.

Proof. We will require the following extension of the Mean Value Theorem: Let 𝑓, 𝑔 ∶ [𝑎, 𝑏] → 𝐑 be differentiable on (𝑎, 𝑏) and continuous on [𝑎, 𝑏]. Then there exists some 𝑐 ∈ (𝑎, 𝑏) such that

  (𝑓(𝑏) − 𝑓(𝑎))𝑔′(𝑐) = (𝑔(𝑏) − 𝑔(𝑎))𝑓′(𝑐).  (3.6.3)

See Problem 3.6.2 for a proof. Note that under our hypotheses, (3.6.3) becomes

  (𝑓(𝑏) − 𝑓(𝑎))/(𝑔(𝑏) − 𝑔(𝑎)) = 𝑓′(𝑐)/𝑔′(𝑐).  (3.6.4)

In case (1), given 𝜖 > 0, since lim_{𝑥→𝑎⁺} 𝑓′(𝑥)/𝑔′(𝑥) = 𝐿, there exists some 𝛿 > 0 such that

  |𝑓′(𝑡)/𝑔′(𝑡) − 𝐿| < 𝜖/2  (3.6.5)

for all 𝑡 ∈ (𝑎, 𝑎 + 𝛿). Therefore, by (3.6.4), we see that for 𝑎 < 𝑥 < 𝑦 < 𝑎 + 𝛿, we have

  |(𝑓(𝑦) − 𝑓(𝑥))/(𝑔(𝑦) − 𝑔(𝑥)) − 𝐿| = |𝑓′(𝑡)/𝑔′(𝑡) − 𝐿| < 𝜖/2  (3.6.6)

for some 𝑡 ∈ (𝑥, 𝑦). Therefore, taking lim_{𝑥→𝑎⁺} of both sides of (3.6.6), we see that

  |𝑓(𝑦)/𝑔(𝑦) − 𝐿| ≤ 𝜖/2 < 𝜖,  (3.6.7)

and case (1) follows.

Similarly, for case (2), given 𝜖 > 0, there exists some 𝑀₁ > 0 such that

  |𝑓′(𝑡)/𝑔′(𝑡) − 𝐿| < 𝜖/3  (3.6.8)

for all 𝑡 > 𝑀₁. Therefore, again by (3.6.4), we see that for 𝑀₁ < 𝑀₂ < 𝑦, we have

  |(𝑓(𝑦) − 𝑓(𝑀₂))/(𝑔(𝑦) − 𝑔(𝑀₂)) − 𝐿| < 𝜖/3.  (3.6.9)

We now fix 𝑀₂ > 𝑀₁ as above and let

  ℎ(𝑦) = (𝑔(𝑦) − 𝑔(𝑀₂))/𝑔(𝑦),  (3.6.10)

observing that lim_{𝑦→+∞} ℎ(𝑦) = 1. In particular, there exists some 𝑀₃ > 0 such that ℎ(𝑦) < 3/2 for 𝑦 > 𝑀₃. So now, multiplying (3.6.9) on both sides by |ℎ(𝑦)|, we see that

  |(𝑓(𝑦) − 𝑓(𝑀₂))/𝑔(𝑦) − 𝐿ℎ(𝑦)| < (𝜖/3)|ℎ(𝑦)|  (3.6.11)

for all 𝑦 > 𝑀₂. Therefore, if we let

  𝐸(𝑦) = |𝑓(𝑀₂)/𝑔(𝑦)| + |𝐿| |ℎ(𝑦) − 1|,  (3.6.12)

then because lim_{𝑦→+∞} 𝐸(𝑦) = 0, there exists some 𝑀₄ > 0 such that 𝐸(𝑦) < 𝜖/3 for all 𝑦 > 𝑀₄. Therefore, if 𝑦 > max {𝑀₁, 𝑀₂, 𝑀₃, 𝑀₄}, by the triangle inequality, we see that

  |𝑓(𝑦)/𝑔(𝑦) − 𝐿| ≤ |(𝑓(𝑦) − 𝑓(𝑀₂))/𝑔(𝑦) − 𝐿ℎ(𝑦)| + |𝑓(𝑀₂)/𝑔(𝑦)| + |𝐿ℎ(𝑦) − 𝐿|
    < (𝜖/3)|ℎ(𝑦)| + 𝐸(𝑦) < (𝜖/3)(3/2) + 𝜖/3 < 𝜖.  (3.6.13)

The theorem follows.

Remark 3.6.6. In fact, the hypothesis that 𝑔(𝑥) is strictly monotone on 𝑋 is redundant, but harmless for our purposes. See Ross [Ros13, Thm. 30.2] for the details and for a more complete statement and proof of L’Hôpital’s Rule.

The main reason we are interested in L’Hôpital’s Rule is to study the following phenomenon.

Definition 3.6.7. For positive-valued functions 𝑓, 𝑔 ∶ (𝑎, +∞) → 𝐑, to say 𝑓(𝑥) ≪ 𝑔(𝑥) means that lim_{𝑥→+∞} 𝑓(𝑥)/𝑔(𝑥) = 0.

The notation ≪ is justified by the following fact:

Lemma 3.6.8. For positive-valued functions 𝑓, 𝑔, ℎ ∶ (𝑎, +∞) → 𝐑, if 𝑓(𝑥) ≪ 𝑔(𝑥) and 𝑔(𝑥) ≪ ℎ(𝑥), then 𝑓(𝑥) ≪ ℎ(𝑥).

Proof. Problem 3.6.3.

We also note that we will mainly apply asymptotics in the following way.

Theorem 3.6.9. For positive-valued functions 𝑓, 𝑔 ∶ (𝑎, +∞) → 𝐑, if 𝑓(𝑥) ≪ 𝑔(𝑥), 𝑎𝑛 = 𝑓(𝑛), and 𝑏𝑛 = 𝑔(𝑛), then there exists some 𝐾 ∈ 𝐑 such that 1/𝑏𝑛 ≤ 𝐾/𝑎𝑛 for all 𝑛.

Proof. Problem 3.6.4.

Now, formally speaking, we have not yet proven facts about exponentials or even defined them rigorously. (See Chapter 4, starting with Section 4.5, for that material.) However, for the moment, the following calculus-level facts, to be proven later, will suffice.

Lemma 3.6.10. For 𝑎 > 0 and 𝑐 > 1, we have

  𝑑/𝑑𝑥 (ln 𝑥) = 1/𝑥,  𝑑/𝑑𝑥 (𝑥^𝑎) = 𝑎𝑥^{𝑎−1},  𝑑/𝑑𝑥 (𝑐^𝑥) = (ln 𝑐)𝑐^𝑥.  (3.6.14)

Proof. See Problem 4.6.6.

Lemma 3.6.11. We have that 1 ≪ ln 𝑥; for 𝑎 > 0, we have 1 ≪ 𝑥^𝑎; and for 𝑐 > 1, we have 1 ≪ 𝑐^𝑥.

Proof. First, by Lemma 3.6.10, ln 𝑥, 𝑥^𝑎, and 𝑐^𝑥 are all increasing for 𝑥 > 0 and positive for large 𝑥, so 1/ln 𝑥, 1/𝑥^𝑎, and 1/𝑐^𝑥 are all eventually positive and decreasing. Therefore, in the notation of Definition 3.6.1, the choices 𝑁(𝜖) = 𝑒^{1/𝜖}, 𝑁(𝜖) = (1/𝜖)^{1/𝑎}, and 𝑁(𝜖) = −(ln 𝜖)/(ln 𝑐) suffice to show that as 𝑥 → +∞, each of 1/ln 𝑥, 1/𝑥^𝑎, and 1/𝑐^𝑥 approaches 0, respectively.

Theorem 3.6.12 (Asymptotics). For 0 < 𝑎 < 𝑏 and 1 < 𝑐 < 𝑑, we have that

  1 ≪ ln 𝑥 ≪ 𝑥^𝑎 ≪ 𝑥^𝑏 ≪ 𝑐^𝑥 ≪ 𝑑^𝑥.  (3.6.15)

Proof. By Lemma 3.6.11 and the fact that 𝑥^𝑎/𝑥^𝑏 = 1/𝑥^{𝑏−𝑎}, we see that 1 ≪ ln 𝑥 and 𝑥^𝑎 ≪ 𝑥^𝑏. By the transitivity of ≪, it therefore suffices to show that ln 𝑥 ≪ 𝑥^𝑎 and 𝑥^𝑘 ≪ 𝑐^𝑥 for 𝑘 a nonnegative integer; see Problem 3.6.5.
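As a quick numerical sanity check (our own illustration, not part of the text), one can watch the ratios 𝑓(𝑥)/𝑔(𝑥) of Definition 3.6.7 shrink for two of the pairs compared in Theorem 3.6.12; all names below are our own.

```python
import math

# Probe f << g numerically by sampling f(x)/g(x) at growing x.
# This only illustrates Theorem 3.6.12; it proves nothing.
def dominance_ratios(f, g, xs):
    return [f(x) / g(x) for x in xs]

xs = [10.0, 100.0, 1000.0]
log_vs_sqrt = dominance_ratios(math.log, math.sqrt, xs)            # ln x vs x^(1/2)
cube_vs_exp = dominance_ratios(lambda x: x**3, lambda x: 1.1**x, xs)  # x^3 vs 1.1^x

# Both ratio sequences decrease toward 0, even though x^3 starts far
# ahead of 1.1^x for moderate x.
print(log_vs_sqrt)
print(cube_vs_exp)
```

Note how slowly the exponential overtakes the power: at 𝑥 = 100 the ratio 𝑥³/1.1^𝑥 is still about 70, but by 𝑥 = 1000 it is astronomically small, matching the "sufficiently large 𝑛" hedges in the comparison-test applications.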

3.6.2 Functions of two real variables. Before we turn to Fubini’s Theorem and differentiating under the integral sign, we briefly review some of the fundamentals of functions of two real variables. This material should be familiar (though perhaps expressed in fancier language) to the reader who has taken multivariable calculus, with the possible exception of Corollary 3.6.14.

Our first observation is that if 𝑧𝑛 = 𝑥𝑛 + 𝑦𝑛𝑖 (𝑛 = 0, 1), then

  |𝑧₁ − 𝑧₀| = √((𝑥₁ − 𝑥₀)² + (𝑦₁ − 𝑦₀)²),  (3.6.16)

the usual Euclidean distance between (𝑥₀, 𝑦₀) and (𝑥₁, 𝑦₁) in 𝐑² = 𝐑 × 𝐑. It follows that 𝐂 and 𝐑² are essentially the same as metric spaces, which means that our results on continuity and limits for functions in 𝐂 carry over to functions in 𝐑². For example, Theorem 3.1.5 immediately implies:

Corollary 3.6.13. Let 𝑋 be a subset of 𝐑², and let 𝑓, 𝑔 ∶ 𝑋 → 𝐂 be functions that are continuous at some (𝑎, 𝑏) ∈ 𝑋. Then 𝑐𝑓(𝑥, 𝑦) (𝑐 ∈ 𝐂), 𝑓(𝑥, 𝑦) + 𝑔(𝑥, 𝑦), 𝑓(𝑥, 𝑦), and 𝑓(𝑥, 𝑦)𝑔(𝑥, 𝑦) are continuous at (𝑎, 𝑏), and if 𝑔(𝑥, 𝑦) ≠ 0 for all (𝑥, 𝑦) ∈ 𝑋, then 𝑓(𝑥, 𝑦)/𝑔(𝑥, 𝑦) is continuous at (𝑎, 𝑏).

Similarly, Theorem 3.1.12 immediately implies:

Corollary 3.6.14. If 𝑋 is a closed and bounded subset of 𝐑² and 𝑓 ∶ 𝑋 → 𝐑 is continuous, then 𝑓 is uniformly continuous on 𝑋.

Theorem 3.1.15 and Corollary 3.1.16 also carry over to functions of two real variables in exactly the same way. The continuity of one-variable functions relates to the continuity of two-variable functions much as one might expect.

Theorem 3.6.15. Let 𝑋 and 𝑌 be subsets of 𝐑.

(a) If 𝑓 ∶ 𝑋 → 𝐂 is continuous, then the function 𝐹 ∶ 𝑋 × 𝐑 → 𝐂 given by 𝐹(𝑥, 𝑦) = 𝑓(𝑥) is continuous (as a function of two variables).

(b) If 𝐹 ∶ 𝑋 × 𝑌 → 𝐂 is continuous (as a function of two variables) and 𝑦₀ ∈ 𝑌, then 𝑓(𝑥) = 𝐹(𝑥, 𝑦₀) is continuous as a function of 𝑥 ∈ 𝑋.


Proof. Problem 3.6.6.

We will also need partial derivatives on the following kind of domain.

Definition 3.6.16. We define a rectangle to be the product [𝑎, 𝑏] × [𝑐, 𝑑] ⊆ 𝐑² of two closed and bounded intervals. To say that 𝑋 ⊆ 𝐑² is locally rectangular means that for every 𝑥 ∈ 𝑋, there is some rectangle 𝑅 such that 𝑥 ∈ 𝑅 ⊆ 𝑋.

For example, open discs (Definition 2.4.8) are locally rectangular because every point in an open disc 𝐷 is contained in a small square that is in turn contained in 𝐷.

Definition 3.6.17. Let 𝑋 be a locally rectangular subset of 𝐑². For fixed (𝑎, 𝑏) ∈ 𝑋, we define the partial derivatives ∂𝑓/∂𝑥 (𝑎, 𝑏) and ∂𝑓/∂𝑦 (𝑎, 𝑏) of 𝑓 ∶ 𝑋 → 𝐂 by

  ∂𝑓/∂𝑥 (𝑎, 𝑏) = lim_{𝑥→𝑎} (𝑓(𝑥, 𝑏) − 𝑓(𝑎, 𝑏))/(𝑥 − 𝑎),  ∂𝑓/∂𝑦 (𝑎, 𝑏) = lim_{𝑦→𝑏} (𝑓(𝑎, 𝑦) − 𝑓(𝑎, 𝑏))/(𝑦 − 𝑏),  (3.6.17)

respectively, when the limits exist.

One tricky aspect of partial derivatives is that “partial differentiability” need not imply continuity. For example, the function

  𝑓(𝑥, 𝑦) = 𝑥𝑦/(𝑥² + 𝑦²) for (𝑥, 𝑦) ≠ (0, 0),  𝑓(𝑥, 𝑦) = 0 for (𝑥, 𝑦) = (0, 0)  (3.6.18)

has ∂𝑓/∂𝑥 (0, 0) = ∂𝑓/∂𝑦 (0, 0) = 0 but is not continuous at (0, 0) (Problem 3.6.7). However, having continuous partial derivatives does have the expected consequence.

Theorem 3.6.18. Let 𝑋 ⊆ 𝐑² be locally rectangular. For 𝑓 ∶ 𝑋 → 𝐂, if ∂𝑓/∂𝑥 and ∂𝑓/∂𝑦 are continuous on 𝑋, then 𝑓 is continuous on 𝑋.

Proof. It suffices to assume that 𝑋 is a rectangle; for a proof, see Problem 3.6.8.

Remark 3.6.19. The example in (3.6.18) really shows that partial derivatives are not the right definition of differentiability for a function of more than one variable. The correct definition of differentiability, sometimes called total differentiability, actually generalizes local linearity (Lemma 3.2.5) to higher-dimensional inputs, and in fact, the proof of Theorem 3.6.18 in Problem 3.6.8 can be refined to show that having continuous partial derivatives implies total differentiability. See, for example, Munkres [Mun97, Ch. 2] for a discussion.
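The discontinuity in (3.6.18) is easy to see numerically. The following sketch (ours, not the text’s) evaluates the difference quotients along the axes, which vanish identically, and then samples 𝑓 along the diagonal 𝑦 = 𝑥, where 𝑓 is constantly 1/2 no matter how close we get to the origin.

```python
# f(x, y) = xy/(x^2 + y^2), with f(0, 0) = 0, as in (3.6.18).
def f(x, y):
    if (x, y) == (0.0, 0.0):
        return 0.0
    return x * y / (x**2 + y**2)

# Difference quotients along the axes are exactly 0 for every h != 0,
# so df/dx(0,0) = df/dy(0,0) = 0.
h = 1e-6
print((f(h, 0.0) - f(0.0, 0.0)) / h)   # 0.0
print((f(0.0, h) - f(0.0, 0.0)) / h)   # 0.0

# Along the diagonal, f(t, t) = t^2 / (2 t^2) = 1/2 for every t != 0,
# so f has no limit at the origin.
print([f(t, t) for t in (0.1, 0.01, 0.001)])   # [0.5, 0.5, 0.5]
```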

3.6.3 Fubini’s Theorem. Consider a continuous function of two real variables 𝑓 ∶ [𝑎, 𝑏] × [𝑐, 𝑑] → 𝐂. Our goal is to prove that the iterated integrals

  ∫_𝑎^𝑏 (∫_𝑐^𝑑 𝑓(𝑥, 𝑦) 𝑑𝑦) 𝑑𝑥  and  ∫_𝑐^𝑑 (∫_𝑎^𝑏 𝑓(𝑥, 𝑦) 𝑑𝑥) 𝑑𝑦  (3.6.19)

exist and are equal. It suffices to consider the real-valued case, which we do for the rest of this discussion.


First, we recall that if 𝑓 ∶ [𝑎, 𝑏] × [𝑐, 𝑑] → 𝐂 is continuous as a function of two variables, then 𝑓 is continuous in each variable (Theorem 3.6.15), which means that we can integrate 𝑓 in each variable if we hold the other constant. We next have the following lemma.

Lemma 3.6.20. If 𝑓 ∶ [𝑎, 𝑏] × [𝑐, 𝑑] → 𝐂 is continuous, then

  𝐼(𝑥) = ∫_𝑐^𝑑 𝑓(𝑥, 𝑦) 𝑑𝑦  (3.6.20)

is continuous as a function on [𝑎, 𝑏]. It follows that both integrals in (3.6.19) are well-defined.

Proof. Problem 3.6.9.

Theorem 3.6.21 (Fubini’s Theorem). Let 𝑓 ∶ [𝑎, 𝑏] × [𝑐, 𝑑] → 𝐑 be continuous. Then

  ∫_𝑎^𝑏 (∫_𝑐^𝑑 𝑓(𝑥, 𝑦) 𝑑𝑦) 𝑑𝑥 = ∫_𝑐^𝑑 (∫_𝑎^𝑏 𝑓(𝑥, 𝑦) 𝑑𝑥) 𝑑𝑦.  (3.6.21)
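Before the proof, here is a numerical illustration (our own; the integrand is an arbitrary smooth sample, not one from the text) of the equality (3.6.21): we approximate both iterated integrals by midpoint Riemann sums, integrating in 𝑦 first and then in 𝑥, and vice versa.

```python
import math

# Approximate an iterated integral over [a,b] x [c,d] with midpoint sums.
def iterated_midpoint(f, a, b, c, d, n, k, y_first=True):
    dx, dy = (b - a) / n, (d - c) / k
    xs = [a + (i + 0.5) * dx for i in range(n)]
    ys = [c + (j + 0.5) * dy for j in range(k)]
    if y_first:   # approximates int_a^b ( int_c^d f dy ) dx
        return sum(sum(f(x, y) * dy for y in ys) * dx for x in xs)
    else:         # approximates int_c^d ( int_a^b f dx ) dy
        return sum(sum(f(x, y) * dx for x in xs) * dy for y in ys)

f = lambda x, y: math.sin(x) * math.exp(y) + x * y
I1 = iterated_midpoint(f, 0.0, 1.0, 0.0, 2.0, 100, 100, y_first=True)
I2 = iterated_midpoint(f, 0.0, 1.0, 0.0, 2.0, 100, 100, y_first=False)
print(abs(I1 - I2))   # the two orders agree up to floating-point rounding
```

For this particular 𝑓, both sums approach the exact value (1 − cos 1)(𝑒² − 1) + 1, which one can compute by hand from the one-variable Fundamental Theorem of Calculus.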

Again, the reader may safely skip the proof, but we include it for completeness.

Proof. We first establish notation. For 𝑛, 𝑘 ∈ 𝐍, we define

  Δ𝑥 = (𝑏 − 𝑎)/𝑛,  Δ𝑦 = (𝑑 − 𝑐)/𝑘,  (3.6.22)
  𝑥ᵢ = 𝑎 + 𝑖Δ𝑥,  𝑦ⱼ = 𝑐 + 𝑗Δ𝑦.  (3.6.23)

In other words, {𝑥₀, …, 𝑥ₙ} is the standard partition of [𝑎, 𝑏] into 𝑛 uniform subintervals (Example 3.3.2), and analogously for {𝑦₀, …, 𝑦ₖ}. Since 𝑓 is bounded, we also define

  𝑚(𝑖, 𝑗) = inf {𝑓(𝑥, 𝑦) ∣ 𝑥 ∈ [𝑥ᵢ₋₁, 𝑥ᵢ], 𝑦 ∈ [𝑦ⱼ₋₁, 𝑦ⱼ]},  (3.6.24)
  𝑀(𝑖, 𝑗) = sup {𝑓(𝑥, 𝑦) ∣ 𝑥 ∈ [𝑥ᵢ₋₁, 𝑥ᵢ], 𝑦 ∈ [𝑦ⱼ₋₁, 𝑦ⱼ]}.  (3.6.25)

It will suffice to show that

  ∑_{𝑖=1}^{𝑛} ∑_{𝑗=1}^{𝑘} 𝑚(𝑖, 𝑗)Δ𝑥Δ𝑦 ≤ ∫_𝑎^𝑏 (∫_𝑐^𝑑 𝑓(𝑥, 𝑦) 𝑑𝑦) 𝑑𝑥 ≤ ∑_{𝑖=1}^{𝑛} ∑_{𝑗=1}^{𝑘} 𝑀(𝑖, 𝑗)Δ𝑥Δ𝑦  (3.6.26)

and that for every 𝜖 > 0, there exist 𝑛, 𝑘 ∈ 𝐍 such that the beginning and end of (3.6.26) differ by at most 𝜖, for then by symmetry between 𝑥 and 𝑦, the two sides of (3.6.21) differ by at most 𝜖 for every 𝜖 > 0.

So fix 𝜖 > 0. By the uniform continuity of 𝑓 (Corollary 3.6.14), there exists some 𝛿 > 0 such that if 𝑑((𝑥₀, 𝑦₀), (𝑥₁, 𝑦₁)) < 𝛿, then

  |𝑓(𝑥₁, 𝑦₁) − 𝑓(𝑥₀, 𝑦₀)| < 𝜖/((𝑏 − 𝑎)(𝑑 − 𝑐)).  (3.6.27)

Choose 𝑛, 𝑘 ∈ 𝐍 such that Δ𝑥 + Δ𝑦 < 𝛿. By reasoning similar to that of Problem 3.4.4, we see that for all 𝑖, 𝑗, we have that

  𝑀(𝑖, 𝑗) − 𝑚(𝑖, 𝑗) ≤ 𝜖/((𝑏 − 𝑎)(𝑑 − 𝑐)).  (3.6.28)

So again, let

  𝐼(𝑥) = ∫_𝑐^𝑑 𝑓(𝑥, 𝑦) 𝑑𝑦.  (3.6.29)

Since 𝑃 = {𝑦₀, …, 𝑦ₖ} is a partition of [𝑐, 𝑑], for 𝑥 ∈ [𝑥ᵢ₋₁, 𝑥ᵢ], in the notation of Section 3.3, we have

  ∑_{𝑗=1}^{𝑘} 𝑚(𝑖, 𝑗)Δ𝑦 ≤ 𝐿(𝑓; 𝑃) ≤ 𝐼(𝑥) ≤ 𝑈(𝑓; 𝑃) ≤ ∑_{𝑗=1}^{𝑘} 𝑀(𝑖, 𝑗)Δ𝑦.  (3.6.30)

So let

  ℓⱼ(𝑥) = 𝑚(𝑖, 𝑗) if 𝑥ᵢ₋₁ ≤ 𝑥 < 𝑥ᵢ, and ℓⱼ(𝑥) = 𝑚(𝑛, 𝑗) if 𝑥 = 𝑥ₙ = 𝑏;
  𝑢ⱼ(𝑥) = 𝑀(𝑖, 𝑗) if 𝑥ᵢ₋₁ ≤ 𝑥 < 𝑥ᵢ, and 𝑢ⱼ(𝑥) = 𝑀(𝑛, 𝑗) if 𝑥 = 𝑥ₙ = 𝑏.  (3.6.31)

From (3.6.30), we now see that

  ∑_{𝑗=1}^{𝑘} ℓⱼ(𝑥)Δ𝑦 ≤ 𝐼(𝑥) ≤ ∑_{𝑗=1}^{𝑘} 𝑢ⱼ(𝑥)Δ𝑦,  (3.6.32)

and since ℓⱼ(𝑥) and 𝑢ⱼ(𝑥) are piecewise constant, by Lemma 3.3.14 and Theorem 3.4.4,

  ∫_𝑎^𝑏 ℓⱼ(𝑥) 𝑑𝑥 = ∑_{𝑖=1}^{𝑛} 𝑚(𝑖, 𝑗)Δ𝑥,  ∫_𝑎^𝑏 𝑢ⱼ(𝑥) 𝑑𝑥 = ∑_{𝑖=1}^{𝑛} 𝑀(𝑖, 𝑗)Δ𝑥.  (3.6.33)

Therefore, integrating (3.6.32), applying Theorem 3.4.2, and rearranging the finite double sums as necessary, we get (3.6.26), as desired. Finally, by (3.6.28), we see that

  ∑_{𝑖=1}^{𝑛} ∑_{𝑗=1}^{𝑘} 𝑀(𝑖, 𝑗)Δ𝑥Δ𝑦 − ∑_{𝑖=1}^{𝑛} ∑_{𝑗=1}^{𝑘} 𝑚(𝑖, 𝑗)Δ𝑥Δ𝑦 = ∑_{𝑖=1}^{𝑛} ∑_{𝑗=1}^{𝑘} (𝑀(𝑖, 𝑗) − 𝑚(𝑖, 𝑗))Δ𝑥Δ𝑦
    ≤ ∑_{𝑖=1}^{𝑛} ∑_{𝑗=1}^{𝑘} (𝜖/((𝑏 − 𝑎)(𝑑 − 𝑐)))Δ𝑥Δ𝑦 = 𝜖,  (3.6.34)

as desired.

Problems.

3.6.5. Let 𝑎 > 0 and 𝑐 > 1.
(a) Prove that ln 𝑥 ≪ 𝑥^𝑎.
(b) Prove that for any nonnegative integer 𝑘, 𝑥^𝑘 ≪ 𝑐^𝑥.

3.6.6. (Proves Theorem 3.6.15) Let 𝑋 and 𝑌 be subsets of 𝐑.
(a) Suppose 𝑓 ∶ 𝑋 → 𝐂 is continuous. Prove that the function 𝐹 ∶ 𝑋 × 𝐑 → 𝐂 given by 𝐹(𝑥, 𝑦) = 𝑓(𝑥) is continuous (as a function of two variables).
(b) Suppose 𝐹 ∶ 𝑋 × 𝑌 → 𝐂 is continuous (as a function of two variables) and fix 𝑦₀ ∈ 𝑌. Prove that 𝑓(𝑥) = 𝐹(𝑥, 𝑦₀) is continuous as a function of 𝑥 ∈ 𝑋.

3.6.7. Let

  𝑓(𝑥, 𝑦) = 𝑥𝑦/(𝑥² + 𝑦²) for (𝑥, 𝑦) ≠ (0, 0),  𝑓(𝑥, 𝑦) = 0 for (𝑥, 𝑦) = (0, 0).  (3.6.40)

(a) Prove that ∂𝑓/∂𝑥 (0, 0) = ∂𝑓/∂𝑦 (0, 0) = 0.
(b) Prove that 𝑓 is not continuous at (0, 0) by finding a sequence (𝑎ₙ, 𝑏ₙ) such that lim_{𝑛→∞} 𝑓(𝑎ₙ, 𝑏ₙ) ≠ 0.

3.6.8. (Proves Theorem 3.6.18) Let 𝑋 = [𝑎, 𝑏] × [𝑐, 𝑑], suppose 𝑓 ∶ 𝑋 → 𝐂 is a function such that ∂𝑓/∂𝑥 and ∂𝑓/∂𝑦 exist and are continuous on 𝑋, and consider (𝑥₀, 𝑦₀) ∈ 𝑋.
(a) For (𝑥₁, 𝑦₁) ∈ 𝑋, prove that

  𝑓(𝑥₁, 𝑦₁) = 𝑓(𝑥₀, 𝑦₀) + ∫_{𝑥₀}^{𝑥₁} ∂𝑓/∂𝑥 (𝑥, 𝑦₀) 𝑑𝑥 + ∫_{𝑦₀}^{𝑦₁} ∂𝑓/∂𝑦 (𝑥₁, 𝑦) 𝑑𝑦.  (3.6.41)

(b) Prove that 𝑓 is continuous at (𝑥₀, 𝑦₀).

3.6.9. (Proves Lemma 3.6.20) Suppose 𝑓 ∶ [𝑎, 𝑏] × [𝑐, 𝑑] → 𝐂 is continuous. Prove that

  𝐼(𝑥) = ∫_𝑐^𝑑 𝑓(𝑥, 𝑦) 𝑑𝑦  (3.6.42)

is uniformly continuous as a function of 𝑥 ∈ [𝑎, 𝑏].

4 Series of functions

When the various terms of series [𝑢₀, 𝑢₁, 𝑢₂, …] are functions of the same variable 𝑥, continuous with respect to this variable in the neighborhood of a particular value for which the series converges, the sum 𝑠 of the series is also a continuous function of 𝑥 in the neighborhood of this particular value.
— the incorrect (or at least imprecise) “sum theorem” of Augustin-Louis Cauchy, Cours d’Analyse

“He’ll never catch up!” the Sicilian cried. “Inconceivable!”
“You keep using that word!” the Spaniard snapped. “I don’t think it means what you think it does.”
— William Goldman, The Princess Bride

In this chapter, we develop the theory of series of (complex-valued) functions, starting with infinite series in general (Section 4.1) and then moving on to series of functions (Section 4.2), pointwise vs. uniform convergence (Section 4.3), power series (Section 4.4), and the exponential and trig functions (Sections 4.5 and 4.6). We conclude the chapter, and Part 1 of this book, with some material from calculus on the real line (Sections 4.7 and 4.8) that we will need later, mainly for the study of the Fourier transform (Chapters 12 and 13).

The central difficulty to watch out for in this chapter is exactly the problem with Cauchy’s famously incorrect “sum theorem”: a convergent series of continuous (or integrable or differentiable) functions may not converge to a continuous (or integrable or differentiable) function. On the plus side, one satisfying consequence of all of our efforts to establish “real analysis with complex numbers” is that by the end of this chapter, we will be able to define the complex exponential function

  𝑒^𝑧 = ∑_{𝑛=0}^{∞} 𝑧ⁿ/𝑛! = 1 + 𝑧 + 𝑧²/2! + 𝑧³/3! + ⋯  (4.0.1)

for all 𝑧 ∈ 𝐂 and recover all of its usual properties, including 𝑒^{𝑖𝑥} = cos 𝑥 + 𝑖 sin 𝑥 for 𝑥 ∈ 𝐑.
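As a preview of where the chapter is headed, the partial sums of (4.0.1) can be computed directly. The sketch below (ours, not the text’s) evaluates them at 𝑧 = 𝑖𝜋, where the sum should approach 𝑒^{𝑖𝜋} = −1 once Euler’s formula is established.

```python
import cmath
import math

# N-th partial sum of the exponential series (4.0.1): sum of z^n/n!
# for n = 0, ..., N, built up without computing factorials directly.
def exp_partial_sum(z, N):
    term, total = 1.0 + 0.0j, 0.0 + 0.0j
    for n in range(N + 1):
        total += term          # add z^n / n!
        term *= z / (n + 1)    # advance to z^(n+1) / (n+1)!
    return total

z = 1j * math.pi
print(exp_partial_sum(z, 30))                       # very close to -1
print(abs(exp_partial_sum(z, 30) - cmath.exp(z)))   # tiny
```

Since the terms are eventually dominated by a convergent geometric series, thirty terms already pin down 𝑒^{𝑖𝜋} to machine precision; the comparison with `cmath.exp` is only a sanity check against the library implementation.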

4.1 Infinite series

As in the real-valued case, we begin with basic definitions. Overall, the point is that the fundamental results for complex-valued series work pretty much the same as those for real-valued series.

Definition 4.1.1. Let 𝑎𝑛 (𝑛 ≥ 𝑘) be a sequence in 𝐂. We define the corresponding (infinite) series ∑_{𝑛=𝑘}^{∞} 𝑎𝑛 as follows. First, we recursively define the sequence of partial sums 𝑠𝑁 by setting 𝑠𝑘 = 𝑎𝑘 and, for 𝑁 ≥ 𝑘, setting 𝑠𝑁+1 = 𝑠𝑁 + 𝑎𝑁+1. In other words,

  𝑠𝑘 = 𝑎𝑘,
  𝑠𝑘+1 = 𝑠𝑘 + 𝑎𝑘+1 = 𝑎𝑘 + 𝑎𝑘+1,
  𝑠𝑘+2 = 𝑠𝑘+1 + 𝑎𝑘+2 = 𝑎𝑘 + 𝑎𝑘+1 + 𝑎𝑘+2,
  𝑠𝑘+3 = 𝑠𝑘+2 + 𝑎𝑘+3 = 𝑎𝑘 + 𝑎𝑘+1 + 𝑎𝑘+2 + 𝑎𝑘+3,
  ⋮
  𝑠𝑁+1 = 𝑠𝑁 + 𝑎𝑁+1 = 𝑎𝑘 + 𝑎𝑘+1 + ⋯ + 𝑎𝑁+1,
  ⋮  (4.1.1)

To say that ∑_{𝑛=𝑘}^{∞} 𝑎𝑛 converges means that the sequence of partial sums 𝑠𝑁 converges, and similarly for divergence. Furthermore, if ∑_{𝑛=𝑘}^{∞} 𝑎𝑛 converges, we define

  ∑_{𝑛=𝑘}^{∞} 𝑎𝑛 = lim_{𝑁→∞} 𝑠𝑁.  (4.1.2)

Note that ∑_{𝑛=𝑘}^{−∞} 𝑎𝑛 can be defined similarly by having the indices and partial sums in Definition 4.1.1 go backwards instead of forwards; we leave the details to the reader.

Since the sum of a series is defined in terms of its partial sums, some straightforward algebra shows that series inherit properties (1)–(3) of the limit laws for sequences (Theorem 2.4.6). For example, if ∑_{𝑛=1}^{∞} 𝑎𝑛 and ∑_{𝑛=1}^{∞} 𝑏𝑛 both converge and 𝑐, 𝑑 ∈ 𝐂, then

  ∑_{𝑛=1}^{∞} (𝑐𝑎𝑛 + 𝑑𝑏𝑛) = 𝑐 ∑_{𝑛=1}^{∞} 𝑎𝑛 + 𝑑 ∑_{𝑛=1}^{∞} 𝑏𝑛  (4.1.3)

converges as well.

For a more substantial study of series, just as in the real-valued case, Cauchy completeness (Theorem 2.5.7) comes in quite handy. First, rewriting the Cauchy criterion

using the fact that 𝑠𝑚 − 𝑠𝑘−1 = ∑_{𝑛=𝑘}^{𝑚} 𝑎𝑛, we immediately see that:

Corollary 4.1.2 (Cauchy Criterion for Series). The series ∑ 𝑎𝑛 converges if and only if for every 𝜖 > 0, there exists some 𝑁(𝜖) such that if 𝑚, 𝑘 ∈ 𝐙 and 𝑚, 𝑘 > 𝑁(𝜖), then |∑_{𝑛=𝑘}^{𝑚} 𝑎𝑛| < 𝜖.

Remark 4.1.3. Note that by Corollary 4.1.2, the convergence or divergence of a series does not change if we omit or add finitely many terms, so it makes sense to speak of the convergence of ∑ 𝑎𝑛 without indicating a starting point.

As a further consequence of Corollary 4.1.2, we come to our principal tool for understanding series, namely, the comparison test.

Corollary 4.1.4 (Comparison Test). Let 𝑎𝑛 and 𝑏𝑛 be sequences, with 𝑏𝑛 ≥ 0.

(1) If ∑ 𝑏𝑛 converges and |𝑎𝑛| ≤ 𝑏𝑛 for all 𝑛 (or sufficiently large 𝑛), then ∑ 𝑎𝑛 converges.

(2) If ∑ 𝑏𝑛 diverges, 𝑎𝑛 ≥ 0, and 𝑏𝑛 ≤ 𝑎𝑛 for all 𝑛 (or sufficiently large 𝑛), then ∑ 𝑎𝑛 diverges.

Proof. Note that by Remark 4.1.3, the claims for “sufficiently large 𝑛” follow from the claims for all 𝑛. Furthermore, the second claim follows by the contrapositive of the first, so it remains only to prove the first claim. So suppose that ∑ 𝑏𝑛 converges and |𝑎𝑛| ≤ 𝑏𝑛 for all 𝑛. Then since the triangle inequality implies

  |∑_{𝑛=𝑘}^{𝑚} 𝑎𝑛| ≤ ∑_{𝑛=𝑘}^{𝑚} |𝑎𝑛| ≤ ∑_{𝑛=𝑘}^{𝑚} 𝑏𝑛,  (4.1.4)

the Cauchy criterion (Corollary 4.1.2) for ∑ 𝑏𝑛 implies the Cauchy criterion for ∑ 𝑎𝑛, with the same 𝑁(𝜖). The corollary follows.

In particular, applying Corollary 4.1.4 in the case where 𝑏𝑛 = |𝑎𝑛|, we have:

Corollary 4.1.5. If ∑ |𝑎𝑛| converges, then so does ∑ 𝑎𝑛.

Definition 4.1.6. To say that ∑ 𝑎𝑛 converges absolutely means that ∑ |𝑎𝑛| converges (and therefore, so does ∑ 𝑎𝑛).

In this book, we will frequently use two-sided series, which can be defined as follows.

Definition 4.1.7. Let 𝑎𝑛 ∶ 𝐙 → 𝐂 be a two-sided sequence, which we can think of as a sequence 𝑎𝑛 where 𝑛 goes to both +∞ and −∞. To say that the corresponding two-sided series ∑_{𝑛∈𝐙} 𝑎𝑛 converges means that for some 𝑘 ∈ 𝐙, both ∑_{𝑛=𝑘}^{∞} 𝑎𝑛 and ∑_{𝑛=𝑘−1}^{−∞} 𝑎𝑛 converge, in which case we define

  ∑_{𝑛∈𝐙} 𝑎𝑛 = ∑_{𝑛=𝑘}^{∞} 𝑎𝑛 + ∑_{𝑛=𝑘−1}^{−∞} 𝑎𝑛.  (4.1.5)

Conversely, if either of the series on the right-hand side of (4.1.5) diverges, we say that ∑_{𝑛∈𝐙} 𝑎𝑛 diverges.

Note that changing the value of 𝑘 just shifts a finite number of terms from one of the sums on the right-hand side of (4.1.5) to the other, which does not change the convergence of either series (Remark 4.1.3), so if (4.1.5) converges for one value of 𝑘, it converges for any value of 𝑘. In other words, the convergence of ∑_{𝑛∈𝐙} 𝑎𝑛 does not depend on the above choice of 𝑘. Sums like ∑_{𝑛≠0} 𝑎𝑛 are defined similarly, replacing (4.1.5) with

  ∑_{𝑛≠0} 𝑎𝑛 = ∑_{𝑛=−1}^{−∞} 𝑎𝑛 + ∑_{𝑛=1}^{∞} 𝑎𝑛.  (4.1.6)

See Example 4.1.15 for one such series arising naturally.

While Definition 4.1.7 gives the general definition of the convergence of ∑_{𝑛∈𝐙} 𝑎𝑛, the order of summation of ∑_{𝑛∈𝐙} 𝑎𝑛 that arises naturally with Fourier series is actually the following one. (See Definition 6.2.1.)

Definition 4.1.8. We define the synchronous convergence of the two-sided series ∑_{𝑛∈𝐙} 𝑎𝑛 as follows.

• First, we recursively define the sequence of synchronous partial sums 𝑠𝑁 by

  𝑠₀ = 𝑎₀,
  𝑠₁ = 𝑠₀ + 𝑎₋₁ + 𝑎₁ = 𝑎₋₁ + 𝑎₀ + 𝑎₁,
  𝑠₂ = 𝑠₁ + 𝑎₋₂ + 𝑎₂ = 𝑎₋₂ + 𝑎₋₁ + 𝑎₀ + 𝑎₁ + 𝑎₂,
  ⋮
  𝑠𝑁 = 𝑠𝑁−1 + 𝑎₋𝑁 + 𝑎𝑁 = ∑_{𝑛=−𝑁}^{𝑁} 𝑎𝑛,
  ⋮  (4.1.7)

• We then say that ∑_{𝑛∈𝐙} 𝑎𝑛 converges or diverges synchronously as the sequence of partial sums 𝑠𝑁 does. Furthermore, if ∑_{𝑛∈𝐙} 𝑎𝑛 converges synchronously, we define the synchronous sum of ∑_{𝑛∈𝐙} 𝑎𝑛 to be

  ∑_{𝑛∈𝐙} 𝑎𝑛 = lim_{𝑁→∞} 𝑠𝑁 = lim_{𝑁→∞} ∑_{𝑛=−𝑁}^{𝑁} 𝑎𝑛.  (4.1.8)
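The synchronous partial sums of Definition 4.1.8 are easy to compute directly. The sketch below (our own illustration) applies them to the two-sided geometric series ∑_{𝑛∈𝐙} 𝑟^{|𝑛|} of Example 4.1.12, whose sum for |𝑟| < 1 is (1 + 𝑟)/(1 − 𝑟), as one checks by adding the two one-sided geometric series.

```python
# s_N = sum_{n = -N}^{N} a(n), the synchronous partial sum of (4.1.7).
def synchronous_partial_sum(a, N):
    return sum(a(n) for n in range(-N, N + 1))

r = 0.5
a = lambda n: r ** abs(n)          # terms of the two-sided geometric series
s = synchronous_partial_sum(a, 60)
print(s)                           # close to (1 + r)/(1 - r) = 3.0
print(abs(s - (1 + r) / (1 - r)))  # remainder of the two geometric tails
```

Because this series converges absolutely, the synchronous sum agrees with the two-sided sum of Definition 4.1.7; the distinction only matters for conditionally convergent two-sided series.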

Remark 4.1.9. In general, the order of summation of a series can be quite a subtle matter; see Appendix A for more on this point. Fortunately, it can be shown that when a series converges absolutely, the order of summation does not matter (Corollary A.3), and we will almost exclusively deal with absolute convergence. Nevertheless, it is worth knowing that the distinction between synchronous and ordinary convergence of two-sided series exists.

Taking 𝑘 = 𝑚 in Corollary 4.1.2 also gives the following result, which is useful as long as the reader avoids its tempting, but false, converse. (The name of the result therefore emphasizes the fact that it is useful in proving divergence, but not convergence.)

Corollary 4.1.10 (𝑛th Term Test for Divergence). If ∑ 𝑎𝑛 converges, then lim_{𝑛→∞} 𝑎𝑛 = 0. Equivalently, if lim_{𝑛→∞} 𝑎𝑛 either does not exist or has a nonzero value, then ∑ 𝑎𝑛 diverges.

Essentially everything we need to know about infinite series then boils down to comparison with particular examples. For instance:

Example 4.1.11 (Geometric series). For 𝑟 ∈ 𝐂, the series ∑_{𝑛=0}^{∞} 𝑟ⁿ is known as a geometric series. We have that (Problem 4.1.2) ∑ 𝑟ⁿ converges if and only if |𝑟| < 1, in which case

  ∑_{𝑛=0}^{∞} 𝑟ⁿ = 1/(1 − 𝑟).  (4.1.9)

Example 4.1.12 (Two-sided geometric series). Similarly, for 𝑟 ∈ 𝐂, we may think of the series ∑_{𝑛∈𝐙} 𝑟^{|𝑛|} as a two-sided geometric series. Applying Example 4.1.11 in both directions, we see that ∑_{𝑛∈𝐙} 𝑟^{|𝑛|} also converges if and only if |𝑟| < 1.

Example 4.1.12 is perhaps not obviously useful on its own but can be used to prove the following helpful fact:

Theorem 4.1.13. For |𝑟| < 1, 𝑎 > 0, 𝑏 ∈ 𝐑, and 𝑘 > 0, we have that

  ∑_{𝑛∈𝐙} 𝑛^𝑘 𝑟^{𝑎𝑛²+𝑏𝑛}  (4.1.10)

converges absolutely.

Proof. Problem 4.1.3.

By comparison with geometric series, we also see that:

Theorem 4.1.14 (Ratio Test). Suppose 𝑎𝑛 is a sequence such that 𝑎𝑛 ≠ 0 and lim_{𝑛→∞} |𝑎𝑛+1/𝑎𝑛| = 𝑟. Then:

(1) If 𝑟 < 1, then ∑ 𝑎𝑛 converges absolutely.

(2) If 𝑟 > 1, then ∑ 𝑎𝑛 diverges.

Proof. Problem 4.1.4.
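To see the Ratio Test in action numerically, consider 𝑎𝑛 = 𝑛³/2ⁿ: the ratios |𝑎𝑛+1/𝑎𝑛| tend to 1/2 < 1, so the series converges absolutely, and the partial sums settle down quickly. This sketch is ours, not the text’s; the closed-form value 26 in the comment is a standard identity for ∑ 𝑛³𝑥ⁿ at 𝑥 = 1/2, quoted only as a sanity check.

```python
# Terms a_n = n^3 / 2^n, to which the Ratio Test applies with r = 1/2.
def a(n):
    return n**3 / 2**n

# The successive ratios a_{n+1}/a_n = ((n+1)/n)^3 / 2 approach 1/2.
ratios = [a(n + 1) / a(n) for n in range(1, 60)]
print(ratios[-1])    # close to 0.5

# Partial sums converge; sum_{n>=1} n^3 / 2^n = 26 exactly.
partial = sum(a(n) for n in range(1, 200))
print(partial)       # close to 26
```

Note that the first few terms grow (𝑎₁ = 0.5, 𝑎₂ = 2, 𝑎₃ = 3.375, …); the Ratio Test only cares about the limiting behavior of the ratios, which is why convergence statements keep hedging with “sufficiently large 𝑛.”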

Example 4.1.15 (𝑝-series). For a real number 𝑝 > 0, we call ∑_{𝑛=1}^{∞} 1/𝑛^𝑝 a 𝑝-series. We recall from Analysis I (or from calculus!) that ∑_{𝑛=1}^{∞} 1/𝑛^𝑝 converges if 𝑝 > 1 and diverges if 0 < 𝑝 ≤ 1. Since

  ∑_{𝑛≠0} 1/|𝑛|^𝑝 = ∑_{𝑛=−1}^{−∞} 1/|𝑛|^𝑝 + ∑_{𝑛=1}^{∞} 1/|𝑛|^𝑝 = 2 ∑_{𝑛=1}^{∞} 1/𝑛^𝑝,  (4.1.11)

we see that the same result holds for ∑_{𝑛≠0} 1/|𝑛|^𝑝.

Example 4.1.16 (Alternating 𝑝-series). To remind the reader that convergence does not imply absolute convergence, recall that the alternating series ∑_{𝑛=1}^{∞} (−1)^{𝑛+1}/𝑛^𝑝 converges for all 𝑝 > 0, even though ∑_{𝑛=1}^{∞} 1/𝑛^𝑝 diverges for 𝑝 ≤ 1. (See Ross [Ros13, Thm. 15.3] for details.)

Remark 4.1.17 (Root test). While we will not need the root test, we briefly sketch the idea here for completeness. The tricky part of stating the root test is that it requires the following somewhat technical definition. Let 𝑎𝑛 be a bounded sequence in 𝐑, and let 𝑠𝑘 = sup {𝑎𝑛 ∣ 𝑛 ≥ 𝑘} be the sup of the 𝑘-tail of the sequence. We define the lim sup of the sequence 𝑎𝑛 to be

  lim sup_{𝑛→∞} 𝑎𝑛 = lim_{𝑘→∞} 𝑠𝑘 = lim_{𝑘→∞} sup {𝑎𝑛 ∣ 𝑛 ≥ 𝑘}.  (4.1.12)

It can be shown that 𝑠𝑘 is a monotone decreasing sequence that is bounded below (Problem 4.1.5), so lim_{𝑘→∞} 𝑠𝑘 exists.

Having defined lim sup 𝑎𝑛, let 𝑟 = lim sup_{𝑛→∞} |𝑎𝑛|^{1/𝑛}. The root test has much the same kind of near-dichotomy as the ratio test, in that if 𝑟 < 1, then ∑ 𝑎𝑛 converges absolutely, and if 𝑟 > 1, then ∑ 𝑎𝑛 diverges; in fact, if the ratio test applies, the two “𝑟” values can be shown to be equal (see Ross [Ros13, Thms. 14.8–14.9]). The difference is that, if we include the possibility 𝑟 = ∞, the quantity 𝑟 is guaranteed to exist, a fact that comes in handy when considering the radius of convergence of a power series. See Remark 4.4.4 for more on this point.

Problems.

4.1.1. Let 𝑎𝑛, 𝑏𝑛 be sequences in 𝐂 such that ∑ 𝑎𝑛 converges absolutely and 𝑏𝑛 is bounded. Prove that ∑ 𝑎𝑛𝑏𝑛 converges.

4.1.2. This problem generalizes the standard real results on geometric series to complex numbers. (Note that the reader should avoid using Theorem 3.6.12 here, because we actually need geometric series to prove the results on power series needed to prove Lemma 3.6.10.)
(a) Fix 𝑥 ≥ 0. Prove by induction that for 𝑛 ∈ 𝐍,

  (1 + 𝑥)ⁿ ≥ 1 + 𝑛𝑥.  (4.1.13)

(b) Prove that for 𝑏 ≥ 0,

  lim_{𝑛→∞} 𝑏ⁿ = 0 if 𝑏 < 1;  lim_{𝑛→∞} 𝑏ⁿ = 1 if 𝑏 = 1;  and 𝑏ⁿ diverges if 𝑏 > 1.  (4.1.14)

(c) For 𝑟 ∈ 𝐂, 𝑟 ≠ 1, use induction on 𝑁 to prove that

  ∑_{𝑛=0}^{𝑁} 𝑟ⁿ = 1 + 𝑟 + ⋯ + 𝑟^𝑁 = (1 − 𝑟^{𝑁+1})/(1 − 𝑟).  (4.1.15)

(d) For 𝑟 ∈ 𝐂, |𝑟| < 1, prove that ∑_{𝑛=0}^{∞} 𝑟ⁿ converges.
(e) For 𝑟 ∈ 𝐂, |𝑟| ≥ 1, prove that ∑_{𝑛=0}^{∞} 𝑟ⁿ diverges.

4.1.3. (Proves Theorem 4.1.13) Suppose 𝑟 ∈ 𝐂 and 𝑎, 𝑏, 𝑘 ∈ 𝐑 such that |𝑟| < 1 and 𝑎, 𝑘 > 0.
(a) Prove that there exists some 𝑁 such that 𝑎𝑛² + 𝑏𝑛 > |𝑛| for all 𝑛 ∈ 𝐙 such that |𝑛| > 𝑁.
(b) Prove that the series

  ∑_{𝑛∈𝐙} 𝑟^{𝑎𝑛²+𝑏𝑛}  (4.1.16)

converges absolutely.
(c) Prove that the series

  ∑_{𝑛∈𝐙} 𝑛^𝑘 𝑟^{𝑎𝑛²+𝑏𝑛}  (4.1.17)

converges absolutely.

4.1.4. (Proves Theorem 4.1.14) Suppose 𝑎𝑛 is a sequence such that 𝑎𝑛 ≠ 0 and lim_{𝑛→∞} |𝑎𝑛+1/𝑎𝑛| = 𝑟₀.
(a) Suppose 𝑟₀ < 1, and let 𝑟 = (𝑟₀ + 1)/2. Prove there exists some 𝑁 ∈ 𝐙 such that if 𝑛 ≥ 𝑁, then |𝑎𝑛+1/𝑎𝑛| < 𝑟 < 1. Also, prove by induction that for 𝑛 ≥ 𝑁, |𝑎𝑛| ≤ |𝑎𝑁| 𝑟^{𝑛−𝑁}.
(b) Prove that if 𝑟₀ < 1, then ∑ 𝑎𝑛 converges absolutely.
(c) Suppose 𝑟₀ > 1, and let 𝑟 = (1 + 𝑟₀)/2. Prove there exists some 𝑁 ∈ 𝐙 such that if 𝑛 ≥ 𝑁, then |𝑎𝑛+1/𝑎𝑛| > 𝑟 > 1. Also, prove by induction that for 𝑛 ≥ 𝑁, |𝑎𝑛| ≥ |𝑎𝑁|.
(d) Prove that if 𝑟₀ > 1, then ∑ 𝑎𝑛 diverges.

4.1.5. Let 𝑎𝑛 be a sequence in 𝐑 such that −𝑀 ≤ 𝑎𝑛 ≤ 𝑀 for all 𝑛, let 𝑇𝑘 = {𝑎𝑛 ∣ 𝑛 ≥ 𝑘} be the 𝑘-tail of the sequence 𝑎𝑛, and let 𝑠𝑘 = sup 𝑇𝑘.
(a) Prove that −𝑀 ≤ 𝑠𝑘 for all 𝑘.
(b) Prove that 𝑠𝑘+1 ≤ 𝑠𝑘.


4.2 Sequences and series of functions

We begin with a definition that, to paraphrase the second epigraph from the beginning of the chapter, may not mean what you think it means.

Definition 4.2.1. Let 𝑋 be a nonempty subset of 𝐂, let 𝑓𝑛 ∶ 𝑋 → 𝐂 be a sequence of functions, and let 𝑓 ∶ 𝑋 → 𝐂 be a function. To say that the sequence 𝑓𝑛 converges pointwise to 𝑓 means that for any fixed 𝑧 ∈ 𝑋, lim_{𝑛→∞} 𝑓𝑛(𝑧) = 𝑓(𝑧). Pointwise convergence then defines series of functions in the same way that convergence of sequences is used to define ordinary series: namely, to say that ∑_{𝑛=0}^{∞} 𝑔𝑛(𝑧) converges pointwise on 𝑋 means that for each fixed 𝑧 ∈ 𝑋, ∑_{𝑛=0}^{∞} 𝑔𝑛(𝑧) is a convergent series, or in other words, the partial sums 𝑓𝑁(𝑧) = ∑_{𝑛=0}^{𝑁} 𝑔𝑛(𝑧) converge pointwise to some 𝑓 ∶ 𝑋 → 𝐂. Two-sided series of functions are defined analogously.

Note that in fact, every sequence of functions is the sequence of partial sums of some series (Problem 4.2.2), so for theoretical purposes, studying the convergence of series of functions is equivalent to studying the convergence of sequences of functions.

Now, in the notation of Definition 2.4.3, the tricky part of Definition 4.2.1 is that the expression 𝑁(𝜖) that governs the “rate of convergence” of lim_{𝑛→∞} 𝑓𝑛(𝑧) = 𝑓(𝑧) can vary with 𝑧. This turns out to mean that pointwise convergence does not work as well as the reader might anticipate. (To be fair, the first epigraph of the chapter shows that the very first published textbook in analysis, by one of the great mathematicians of the 19th century, made the same mistake.) To be precise, if 𝑓𝑛 converges pointwise to 𝑓 on a domain 𝑋, the answer to each of the following six questions is NO:

QB: If the 𝑓𝑛 are all bounded on 𝑋, must 𝑓 be bounded on 𝑋?
QC: If the 𝑓𝑛 are all continuous on 𝑋, must 𝑓 be continuous on 𝑋?
QD1: If the 𝑓𝑛 are all differentiable on 𝑋, must 𝑓 be differentiable on 𝑋?
QD2: If the 𝑓𝑛 and 𝑓 are all differentiable on 𝑋, must it be the case that 𝑓𝑛′ converges pointwise to 𝑓′ on 𝑋?
QI1: If the 𝑓𝑛 are all integrable on 𝑋, must 𝑓 be integrable on 𝑋?
QI2: If the 𝑓𝑛 and 𝑓 are all integrable on [𝑎, 𝑏], must it be the case that lim_{𝑛→∞} ∫_𝑎^𝑏 𝑓𝑛(𝑥) 𝑑𝑥 = ∫_𝑎^𝑏 𝑓(𝑥) 𝑑𝑥?

Here are the six NO’s, in detail.

Example 4.2.2 (NO: QB). Let 𝑋 = (−1, 1), and consider (Figure 4.2.1)

  𝑓𝑁(𝑥) = ∑_{𝑛=0}^{𝑁} 𝑥ⁿ = (1 − 𝑥^{𝑁+1})/(1 − 𝑥),  𝑓(𝑥) = 1/(1 − 𝑥).  (4.2.1)

Then by geometric series (Example 4.1.11), lim_{𝑁→∞} 𝑓𝑁(𝑥) = 𝑓(𝑥) for all 𝑥 ∈ (−1, 1). However, each 𝑓𝑁(𝑥) is continuous on [−1, 1] and therefore bounded on (−1, 1); and 𝑓(𝑥) is not bounded on (−1, 1).

Figure 4.2.1. Bounded functions with an unbounded limit

Example 4.2.3 (NO: QC, QD1). Let 𝑋 = [0, 1], and consider (Figure 4.2.2)

  𝑓𝑛(𝑥) = 𝑥ⁿ,  𝑓(𝑥) = 0 if 0 ≤ 𝑥 < 1, and 𝑓(𝑥) = 1 if 𝑥 = 1.  (4.2.2)

Then for 𝑥 < 1, lim_{𝑛→∞} 𝑥ⁿ = 0, and for 𝑥 = 1, 1ⁿ = 1, so lim_{𝑛→∞} 𝑓𝑛(𝑥) = 𝑓(𝑥). However, each 𝑓𝑛(𝑥) is differentiable on [0, 1], and 𝑓(𝑥) is neither continuous nor differentiable.

Recall that in Section 1.1, we discussed (without proof) that the series in (1.1.2) converges for −1/2 < 𝑥 < 1/2, but term-by-term differentiation fails, providing an interesting but somewhat complicated “NO” answer to QD2. The following example is a much more straightforward “NO”.

𝑓𝑛 (𝑥) is differentiable on [0, 1] and 𝑓(𝑥) is neither continuous nor differentiable. Recall that in Section 1.1, we discussed (without proof) that the series in (1.1.2) 1 1 converges for − < 𝑥 < , but term-by-term differentiation fails, providing an inter2 2 esting but somewhat complicated “NO” answer to QD2. The following example is a much more straightforward “NO”. Example 4.2.4 (NO: QD2). Let 𝑋 = [0, 1], and consider (Figure 4.2.3) 𝑓𝑛 (𝑥) =

𝑥𝑛+1 , 𝑛+1

(4.2.3)

𝑓(𝑥) = 0.

1 for 𝑥 ∈ 𝑋, 𝑓𝑛 converges to 𝑓 on 𝑋, and in fact, the rate 𝑛+1 of convergence is indepenent of 𝑥. However, 𝑓𝑛′ (𝑥) = 𝑥𝑛 , so lim 𝑓𝑛′ (1) = 1 ≠ 0 = 𝑓′ (1).

Because |𝑓𝑛 (𝑥) − 𝑓(𝑥)| ≤

𝑛→∞
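The numerics behind Examples 4.2.3 and 4.2.4 are easy to reproduce; the sketch below (our own, not the text’s) samples 𝑓𝑛(𝑥) = 𝑥ⁿ near and at 𝑥 = 1, and checks that the derivatives in Example 4.2.4 misbehave at 𝑥 = 1.

```python
# Example 4.2.3: f_n(x) = x^n on [0, 1] converges pointwise to 0 for
# x < 1 but to 1 at x = 1, so the pointwise limit is discontinuous.
f = lambda n, x: x**n

print([f(n, 0.9) for n in (10, 100, 1000)])  # shrinking toward 0
print(f(1000, 1.0))                          # exactly 1

# Example 4.2.4: g_n(x) = x^(n+1)/(n+1) converges to 0 (uniformly, even),
# but g_n'(x) = x^n, so g_n'(1) = 1 for every n, not the derivative 0
# of the limit function.
g_prime = lambda n, x: x**n
print(g_prime(1000, 1.0))                    # 1.0, not 0
```

The slow decay at 𝑥 = 0.9 versus the instant failure at 𝑥 = 1 is exactly the 𝑧-dependence of 𝑁(𝜖) that Section 4.3 will remove with uniform convergence.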

Figure 4.2.2. Differentiable functions with a discontinuous limit

Figure 4.2.3. The limit of the derivatives is not the derivative of the limit.

Example 4.2.5 (NO: QI1). Let 𝑋 = [0, 1], and consider

  𝑓𝑛(𝑥) = 1 if 𝑥 = 𝑝/𝑞 in lowest terms and 𝑞 ≤ 𝑛, and 𝑓𝑛(𝑥) = 0 otherwise;
  𝑓(𝑥) = 1 if 𝑥 is rational, and 𝑓(𝑥) = 0 if 𝑥 is irrational.  (4.2.4)

We see that for any 𝑛, since 𝑓𝑛 is 0 except at finitely many points, 𝑓𝑛 is continuous except at finitely many points and therefore integrable (Corollary 3.4.6). However, 𝑓 is not integrable (Problem 3.3.5).

Example 4.2.6 (NO: QI2) (Witch’s hats). Let 𝑋 = [0, 1], and consider

  𝑓𝑛(𝑥) = 2^{2𝑛+2} 𝑥 if 0 ≤ 𝑥 < 1/2^{𝑛+1},
  𝑓𝑛(𝑥) = 2^{2𝑛+2} (1/2ⁿ − 𝑥) if 1/2^{𝑛+1} ≤ 𝑥 < 1/2ⁿ,
  𝑓𝑛(𝑥) = 0 otherwise,  (4.2.5)
  𝑓(𝑥) = 0.  (4.2.6)

If the above formulas seem somewhat opaque, consider the graphs in Figure 4.2.4. The point is that the nonzero part of the graph of each 𝑓𝑛(𝑥) is a triangle of area 1 (base 1/2ⁿ, height 2^{𝑛+1}), which means that ∫₀¹ 𝑓𝑛(𝑥) 𝑑𝑥 = 1 for all 𝑛 ∈ 𝐍. However, 𝑓𝑛(0) = 0, and for any 𝑥 ∈ (0, 1], if we choose 𝑁 such that 1/2^𝑁 < 𝑥, then for 𝑛 > 𝑁, 𝑓𝑛(𝑥) = 0. It follows that lim_{𝑛→∞} 𝑓𝑛(𝑥) = 0.

Figure 4.2.4. The witch’s hat sequence

For a variation on Example 4.2.6 that has bounded outputs, but not a bounded domain, see Problem 4.2.1. As we shall see, for a function with bounded domain and bounded outputs, the answer to QI2 becomes YES; see Section 7.5.
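The witch’s hat computation is worth checking by machine. The sketch below (ours, not the text’s) implements (4.2.5) and verifies, with a fine midpoint Riemann sum, that each spike has area 1 even though every fixed point eventually lies outside the spike.

```python
# Witch's hat f_n from (4.2.5): a triangle of base 1/2^n and height
# 2^(n+1), hence area 1, squeezed against x = 0 as n grows.
def witch_hat(n, x):
    peak = 1.0 / 2**(n + 1)               # apex location; height 2^(n+1)
    if 0.0 <= x < peak:
        return 2.0**(2 * n + 2) * x
    if peak <= x < 1.0 / 2**n:
        return 2.0**(2 * n + 2) * (1.0 / 2**n - x)
    return 0.0

# Midpoint Riemann sum of f_n over [0, 1].
def integral(n, samples=200_000):
    h = 1.0 / samples
    return sum(witch_hat(n, (i + 0.5) * h) for i in range(samples)) * h

print(integral(3))                              # close to 1
print([witch_hat(3, x) for x in (0.5, 0.9)])    # 0 away from the spike
```

So ∫₀¹ 𝑓𝑛 = 1 for every 𝑛 while lim 𝑓𝑛(𝑥) = 0 pointwise: the “mass” escapes into an ever taller, thinner spike, which is exactly why QI2 fails without some boundedness hypothesis.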

Problems.

4.2.1. In this problem, for 𝑔 continuous on [0, +∞) except at finitely many points, we define

  ∫₀^∞ 𝑔(𝑥) 𝑑𝑥 = lim_{𝑏→∞} ∫₀^𝑏 𝑔(𝑥) 𝑑𝑥.  (4.2.7)

(See Definition 4.8.2 for more on improper integrals.) Let 𝑋 = [0, +∞), and consider

  𝑓𝑛(𝑥) = 1 if 𝑛 ≤ 𝑥 ≤ 𝑛 + 1, and 𝑓𝑛(𝑥) = 0 otherwise;  𝑓(𝑥) = 0.  (4.2.8)

Prove that 𝑓𝑛 converges pointwise to 𝑓, but ∫₀^∞ 𝑓𝑛(𝑥) 𝑑𝑥 does not converge to ∫₀^∞ 𝑓(𝑥) 𝑑𝑥.

4.2.2. Let 𝑓𝑛 ∶ 𝑋 → 𝐂 be a sequence of functions. Prove that there exists a series ∑_{𝑛=0}^{∞} 𝑔𝑛(𝑧) whose sequence of partial sums is precisely 𝑓𝑛.

4.3 Uniform convergence

As we just saw, pointwise convergence is not sufficient to provide the properties of a sequence of functions that we need to do calculus. However, the following notion is one that often suffices.

Definition 4.3.1. Let 𝑋 be a nonempty subset of 𝐂, let 𝑓𝑛 ∶ 𝑋 → 𝐂 be a sequence of functions, and let 𝑓 ∶ 𝑋 → 𝐂 be a function. To say that the sequence 𝑓𝑛 converges uniformly to 𝑓 on 𝑋 means that for any 𝜖 > 0, there exists some 𝑁(𝜖) independent of 𝑧 ∈ 𝑋 such that for any 𝑧 ∈ 𝑋 and 𝑛 ∈ 𝐙 such that 𝑛 > 𝑁(𝜖), we have |𝑓(𝑧) − 𝑓𝑛(𝑧)| < 𝜖.

Also, as usual, uniform convergence of a series ∑_{𝑛=0}^∞ 𝑔𝑛(𝑧) is defined in terms of the uniform convergence of its sequence of partial sums 𝑓_𝑁(𝑧) = ∑_{𝑛=0}^𝑁 𝑔𝑛(𝑧), and similarly for two-sided series.

For comparison, to say that 𝑓𝑛 converges pointwise to 𝑓 on 𝑋 means that for any 𝑧 ∈ 𝑋 and any 𝜖 > 0, there exists some 𝑁(𝜖, 𝑧) such that for any 𝑛 ∈ 𝐙 such that 𝑛 > 𝑁(𝜖, 𝑧), we have |𝑓(𝑧) − 𝑓𝑛(𝑧)| < 𝜖. In other words, the difference between pointwise and uniform convergence is that in uniform convergence, there is some worst-case “rate of convergence” 𝑁(𝜖), independent of 𝑧, that holds for all 𝑧 ∈ 𝑋 simultaneously. More conceptually, to prove that 𝑓𝑛 converges uniformly to 𝑓, it suffices to find an argument proving that lim_{𝑛→∞} 𝑓𝑛(𝑧) = 𝑓(𝑧) that does not rely (directly or indirectly) on the specific value of 𝑧.

To start our discussion of uniform convergence, we first note that the usual algebraic rules apply to uniform convergence.

Theorem 4.3.2. For a nonempty 𝑋 ⊆ 𝐂, let 𝑓𝑛, 𝑔𝑛 ∶ 𝑋 → 𝐂 be sequences of functions, let 𝑓, 𝑔 ∶ 𝑋 → 𝐂 be functions, and suppose that 𝑓𝑛 and 𝑔𝑛 converge uniformly to 𝑓 and 𝑔, respectively. Then 𝑓𝑛 + 𝑔𝑛 converges uniformly to 𝑓 + 𝑔, and for 𝑐 ∈ 𝐂, 𝑐𝑓𝑛 converges uniformly to 𝑐𝑓.

Proof. Problem 4.3.1.

Also as usual, we can show that uniform convergence can be broken down into real and imaginary parts.

Lemma 4.3.3. Let 𝑋 be a nonempty subset of 𝐂, let 𝑓𝑛 ∶ 𝑋 → 𝐂 be a sequence of functions, and let 𝑓 ∶ 𝑋 → 𝐂 be a function. Let 𝑓𝑛(𝑧) = 𝑢𝑛(𝑧) + 𝑖𝑣𝑛(𝑧) and 𝑓(𝑧) = 𝑢(𝑧) + 𝑖𝑣(𝑧) be the respective real-imaginary decompositions. Then 𝑓𝑛 converges uniformly to 𝑓 if and only if 𝑢𝑛 converges uniformly to 𝑢 and 𝑣𝑛 converges uniformly to 𝑣.

Proof. Problem 4.3.2.

The completeness of 𝐂 gives a necessary and sufficient condition for uniform convergence, for which we need the following analogue to Definition 4.3.1.

Definition 4.3.4. Let 𝑋 be a nonempty subset of 𝐂 and let 𝑓𝑛 ∶ 𝑋 → 𝐂 be a sequence of functions. To say that the sequence 𝑓𝑛 is uniformly Cauchy means that for any 𝜖 > 0, there exists some 𝑁(𝜖) independent of 𝑧 ∈ 𝑋 such that for any 𝑧 ∈ 𝑋 and 𝑛, 𝑘 ∈ 𝐙 such that 𝑛, 𝑘 > 𝑁(𝜖), we have |𝑓𝑛(𝑧) − 𝑓𝑘(𝑧)| < 𝜖.

Theorem 4.3.5. Let 𝑋 be a nonempty subset of 𝐂 and let 𝑓𝑛 ∶ 𝑋 → 𝐂 be a sequence of functions. Then 𝑓𝑛 converges uniformly to some 𝑓 ∶ 𝑋 → 𝐂 if and only if 𝑓𝑛 is uniformly Cauchy.

Proof. On the one hand, if 𝑓𝑛 converges uniformly to some 𝑓 ∶ 𝑋 → 𝐂, then applying the proof of Theorem 2.5.2 independently of 𝑧 ∈ 𝑋, we see that 𝑓𝑛 must be uniformly Cauchy. For the converse, see Problem 4.3.3.

The following necessary and sufficient condition is also sometimes useful in proving uniform or nonuniform convergence.

Lemma 4.3.6. Let 𝑋 be a nonempty subset of 𝐂, let 𝑓𝑛 ∶ 𝑋 → 𝐂 be a sequence of functions, let 𝑓 ∶ 𝑋 → 𝐂 be a function, and let

𝑑𝑛 = sup {|𝑓𝑛(𝑧) − 𝑓(𝑧)| ∣ 𝑧 ∈ 𝑋}.  (4.3.1)

Then 𝑓𝑛 converges uniformly to 𝑓 if and only if lim_{𝑛→∞} 𝑑𝑛 = 0.

As we shall see, Lemma 4.3.6 is really a rewriting of the definition of uniform convergence. The advantage in particular examples is that when 𝑓𝑛 and 𝑓 are differentiable on a closed and bounded 𝑋, then 𝑑𝑛 becomes a maximum that we can compute using calculus; see Problem 4.3.7 for an example. See also Sections 5.2 and 7.2 for an interpretation of 𝑑𝑛 as the 𝐿∞ metric on a space of functions.

Proof. On the one hand, if lim_{𝑛→∞} 𝑑𝑛 = 0, since |𝑓𝑛(𝑧) − 𝑓(𝑧)| ≤ 𝑑𝑛 for all 𝑧 ∈ 𝑋, 𝑓𝑛(𝑧) converges to 𝑓(𝑧) with rate of convergence independent of 𝑧. Conversely, suppose that for any 𝜖 > 0, there exists some 𝑁(𝜖) such that for any 𝑧 ∈ 𝑋 and 𝑛 ∈ 𝐙 such that 𝑛 > 𝑁(𝜖), we have |𝑓(𝑧) − 𝑓𝑛(𝑧)| < 𝜖. Then if 𝑛 > 𝑁(𝜖/2), we have that 𝜖/2 is an upper bound for the set 𝐷𝑛 = {|𝑓𝑛(𝑧) − 𝑓(𝑧)| ∣ 𝑧 ∈ 𝑋}, so because 𝑑𝑛 is the least upper bound of 𝐷𝑛, 𝑑𝑛 ≤ 𝜖/2 < 𝜖. The lemma follows.

The criterion we will use most often to prove uniform convergence is the Weierstrass 𝑀-test for uniform convergence of series. As a bonus, we will see that the 𝑀-test also yields absolute convergence, which will prove useful later.

Theorem 4.3.7 (Weierstrass 𝑀-test). Let 𝑋 be a nonempty subset of 𝐂, let 𝑔𝑛 ∶ 𝑋 → 𝐂 be a sequence of functions, and suppose that 𝑀𝑛 is a sequence of nonnegative real numbers such that ∑ 𝑀𝑛 converges (absolutely) and

|𝑔𝑛(𝑧)| ≤ 𝑀𝑛  (4.3.2)

for all 𝑧 ∈ 𝑋. Then ∑_{𝑛=0}^∞ 𝑔𝑛(𝑧) converges absolutely and uniformly to some 𝑓 ∶ 𝑋 → 𝐂.

Note that a key feature of the 𝑀-test is that we do not need to know anything about the sum 𝑓(𝑧) beforehand to prove uniform convergence.

Proof. For any 𝑘, 𝑚 ∈ 𝐙, 𝑘 < 𝑚, and any 𝑧 ∈ 𝑋, we have that

|∑_{𝑛=𝑘}^𝑚 𝑔𝑛(𝑧)| ≤ ∑_{𝑛=𝑘}^𝑚 |𝑔𝑛(𝑧)| ≤ ∑_{𝑛=𝑘}^𝑚 𝑀𝑛 = |∑_{𝑛=𝑘}^𝑚 𝑀𝑛|.  (4.3.3)

By Corollary 4.1.2, we know that ∑ 𝑀𝑛 satisfies the Cauchy criterion for series, so (4.3.3) implies that ∑ 𝑔𝑛(𝑧) satisfies the Cauchy criterion as well. Furthermore, the estimate (4.3.3) is independent of 𝑧, so ∑ 𝑔𝑛(𝑧) is uniformly Cauchy and therefore, by Theorem 4.3.5, uniformly convergent. Absolute convergence also follows because (4.3.2) relies only on |𝑔𝑛(𝑧)|, and the theorem follows.

Applying Theorem 4.3.7 twice, we immediately get the following two-sided 𝑀-test:

Corollary 4.3.8. Let 𝑋 be a nonempty subset of 𝐂, let 𝑔𝑛 ∶ 𝑋 → 𝐂 be a two-sided sequence of functions, and suppose that 𝑀𝑛 is a two-sided sequence of nonnegative real numbers such that ∑_{𝑛∈𝐙} 𝑀𝑛 converges (absolutely) and

|𝑔𝑛(𝑧)| ≤ 𝑀𝑛  (4.3.4)

for all 𝑧 ∈ 𝑋. Then ∑_{𝑛∈𝐙} 𝑔𝑛(𝑧) converges absolutely and uniformly to some 𝑓 ∶ 𝑋 → 𝐂.

Example 4.3.9. Let 𝑋 = 𝒩_{1/2}(0) be the closed ball of radius 1/2 around 0, and let 𝑔𝑛(𝑧) = 𝑧^𝑛. Since

|𝑔𝑛(𝑧)| ≤ 1/2^𝑛  (4.3.5)

for all 𝑧 ∈ 𝑋 and the geometric series ∑_{𝑛=0}^∞ 1/2^𝑛 converges, the 𝑀-test implies that ∑_{𝑛=0}^∞ 𝑔𝑛(𝑧) converges absolutely and uniformly on 𝑋.

Remark 4.3.10. The reader should note that in general, absolute convergence and uniform convergence are independent; that is, it is possible that ∑_{𝑛=0}^∞ 𝑔𝑛(𝑧) converges absolutely but not uniformly, or vice versa. It is therefore a notable and useful feature of the 𝑀-test that it implies both absolute and uniform convergence.

In any case, given uniform convergence, we can ensure a “yes” answer to many of the questions of Section 4.2. For example:

Theorem 4.3.11 (Uniform YES: QB). Let 𝑋 be a nonempty subset of 𝐂 and let 𝑓𝑛 ∶ 𝑋 → 𝐂 be a sequence of functions, each bounded on 𝑋, such that 𝑓𝑛 converges uniformly on 𝑋 to some 𝑓 ∶ 𝑋 → 𝐂. Then 𝑓 is bounded on 𝑋.

Proof. Problem 4.3.4.
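Readers who like to experiment can see the 𝑀-test bound of Example 4.3.9 numerically. In the following Python sketch (ours, not the text's), the supremum of |𝑓(𝑧) − 𝑓_𝑁(𝑧)| over sample points with |𝑧| = 1/2 is compared against the geometric tail ∑_{𝑛>𝑁} 2^{−𝑛} = 2^{−𝑁}:

```python
# Example 4.3.9 numerically: on |z| <= 1/2, the error of the partial sums of
# sum z^n is dominated, uniformly in z, by the geometric tail 2^(-N).
import cmath

def partial_sum(z, N):
    """f_N(z) = sum_{n=0}^{N} z^n."""
    return sum(z ** n for n in range(N + 1))

def sup_error(N, samples=200):
    """Max of |f(z) - f_N(z)| over sample points on the circle |z| = 1/2,
    where f(z) = 1/(1 - z) is the sum of the geometric series."""
    worst = 0.0
    for k in range(samples):
        z = 0.5 * cmath.exp(2j * cmath.pi * k / samples)
        worst = max(worst, abs(1 / (1 - z) - partial_sum(z, N)))
    return worst

# Each observed sup error is at most the M-test tail bound 2^(-N),
# and the errors shrink as N grows.
errs = [(N, sup_error(N), 2.0 ** (-N)) for N in (5, 10, 15)]
```

Sampling only the boundary circle suffices here, since the error |𝑧^{𝑁+1}/(1 − 𝑧)| is largest when |𝑧| = 1/2.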


Theorem 4.3.12 (Uniform YES: QC). Let 𝑋 be a nonempty subset of 𝐂 and let 𝑓𝑛 ∶ 𝑋 → 𝐂 be a sequence of functions, each continuous on 𝑋, such that 𝑓𝑛 converges uniformly on 𝑋 to some 𝑓 ∶ 𝑋 → 𝐂. Then 𝑓 is continuous on 𝑋.

Proof. Problem 4.3.5.

Theorem 4.3.13 (Uniform YES: QI1). Let 𝑓𝑛 ∶ [𝑎, 𝑏] → 𝐂 be a sequence of functions, each integrable on [𝑎, 𝑏], such that 𝑓𝑛 converges uniformly on [𝑎, 𝑏] to some 𝑓 ∶ [𝑎, 𝑏] → 𝐂. Then 𝑓 is integrable on [𝑎, 𝑏].

Before reading this proof, the reader may wish to review Lemma 3.3.12 and the definitions of 𝜇(𝑓; 𝑃, 𝑖) and 𝐸(𝑓; 𝑃) given there.

Proof. Fix 𝜖 > 0. First, since 𝑓𝑛 converges uniformly to 𝑓, for any 𝜖0 > 0, there exists 𝑁(𝜖0) such that if 𝑛 > 𝑁(𝜖0), then |𝑓𝑛(𝑥) − 𝑓(𝑥)| < 𝜖0 for any 𝑥 ∈ [𝑎, 𝑏]. Therefore, we may choose some 𝑛 > 𝑁(𝜖/(3(𝑏 − 𝑎))) such that for any 𝑥 ∈ [𝑎, 𝑏], |𝑓𝑛(𝑥) − 𝑓(𝑥)| < 𝜖/(3(𝑏 − 𝑎)). Next, since 𝑓𝑛 is integrable, by Lemma 3.3.12, we may choose a partition 𝑃 = {𝑥0 = 𝑎, … , 𝑥𝑚 = 𝑏} of [𝑎, 𝑏] such that 𝐸(𝑓𝑛; 𝑃) < 𝜖/3.

Figure 4.3.1. A three-piece path from 𝑓(𝑥) to 𝑓(𝑦)

The key observation is that for 𝑥, 𝑦 ∈ [𝑥_{𝑖−1}, 𝑥𝑖] (the 𝑖th subinterval of 𝑃),

|𝑓(𝑥) − 𝑓(𝑦)| ≤ |𝑓(𝑥) − 𝑓𝑛(𝑥)| + |𝑓𝑛(𝑥) − 𝑓𝑛(𝑦)| + |𝑓𝑛(𝑦) − 𝑓(𝑦)|
             < 𝜖/(3(𝑏 − 𝑎)) + 𝜇(𝑓𝑛; 𝑃, 𝑖) + 𝜖/(3(𝑏 − 𝑎))
             = 𝜇(𝑓𝑛; 𝑃, 𝑖) + 2𝜖/(3(𝑏 − 𝑎)),  (4.3.6)

where the first inequality follows by the triangle inequality (see Figure 4.3.1), and the second by our choice of 𝑓𝑛 and the definition of 𝜇(𝑓𝑛; 𝑃, 𝑖). It follows that 𝜇(𝑓𝑛; 𝑃, 𝑖) + 2𝜖/(3(𝑏 − 𝑎)) is an upper bound for {|𝑓(𝑥) − 𝑓(𝑦)| ∣ 𝑥, 𝑦 ∈ [𝑥_{𝑖−1}, 𝑥𝑖]}, which means that

𝜇(𝑓; 𝑃, 𝑖) ≤ 𝜇(𝑓𝑛; 𝑃, 𝑖) + 2𝜖/(3(𝑏 − 𝑎)).  (4.3.7)

Therefore,

𝐸(𝑓; 𝑃) = ∑_{𝑖=1}^𝑚 𝜇(𝑓; 𝑃, 𝑖)(Δ𝑥)_𝑖
        ≤ ∑_{𝑖=1}^𝑚 (𝜇(𝑓𝑛; 𝑃, 𝑖) + 2𝜖/(3(𝑏 − 𝑎)))(Δ𝑥)_𝑖
        = ∑_{𝑖=1}^𝑚 𝜇(𝑓𝑛; 𝑃, 𝑖)(Δ𝑥)_𝑖 + (2𝜖/(3(𝑏 − 𝑎))) ∑_{𝑖=1}^𝑚 (Δ𝑥)_𝑖  (4.3.8)
        = 𝐸(𝑓𝑛; 𝑃) + 2𝜖/3 < 𝜖/3 + 2𝜖/3 = 𝜖.

Since 𝜖 > 0 was arbitrary, it follows from Lemma 3.3.12 that 𝑓 is integrable on [𝑎, 𝑏].

Theorem 4.3.14 (Uniform YES: QI2). Let 𝑓𝑛 ∶ [𝑎, 𝑏] → 𝐂 be a sequence of functions, each integrable on [𝑎, 𝑏], such that 𝑓𝑛 converges uniformly on [𝑎, 𝑏] to some 𝑓 ∶ [𝑎, 𝑏] → 𝐂. Then

∫_𝑎^𝑏 𝑓(𝑥) 𝑑𝑥 = lim_{𝑛→∞} ∫_𝑎^𝑏 𝑓𝑛(𝑥) 𝑑𝑥.

Proof. Problem 4.3.6.

Theorem 4.3.16. Let 𝑋 be a nonempty open subset of 𝐂, let 𝑓𝑛 ∶ 𝑋 → 𝐂 be a sequence of differentiable functions that converges pointwise to some 𝑓 ∶ 𝑋 → 𝐂, and suppose that each 𝑓𝑛′ is continuous and the sequence 𝑓𝑛′ converges uniformly to some 𝑔 ∶ 𝑋 → 𝐂. Then 𝑓 is differentiable on 𝑋 and 𝑓′(𝑧) = 𝑔(𝑧) for all 𝑧 ∈ 𝑋.

Proof. Fix 𝑎 ∈ 𝑋. Since 𝑋 is open, to compute lim_{𝑧→𝑎} (𝑓(𝑧) − 𝑓(𝑎))/(𝑧 − 𝑎), it suffices to consider 𝑧 in some neighborhood 𝒩_𝑟(𝑎) ⊆ 𝑋 with 𝑟 > 0. We may therefore fix 𝑟 > 0 and assume 𝑋 = 𝒩_𝑟(𝑎) for the rest of the proof.

Figure 4.3.3. A path inside the 𝑟-neighborhood of 𝑎

Next, for fixed 𝑧 ∈ 𝒩_𝑟(𝑎), observe that the function 𝑢𝑧 ∶ [0, 1] → 𝐂 given by

𝑢𝑧(𝑡) = 𝑡𝑧 + (1 − 𝑡)𝑎  (4.3.12)


is a function such that for all 𝑡 ∈ [0, 1], 𝑢𝑧′(𝑡) = 𝑧 − 𝑎 and |𝑢𝑧(𝑡) − 𝑎| ≤ |𝑧 − 𝑎| (Problem 4.3.8), with 𝑢𝑧(0) = 𝑎 and 𝑢𝑧(1) = 𝑧. It follows that the image of 𝑢𝑧 is contained entirely within 𝒩_𝑟(𝑎), as shown in Figure 4.3.3. We now come to the key idea, which is to compute

lim_{𝑛→∞} ∫_0^1 𝑓𝑛′(𝑢𝑧(𝑡))𝑢𝑧′(𝑡) 𝑑𝑡  (4.3.13)

in two different ways. If we first apply substitution (Theorem 3.5.5), we get

lim_{𝑛→∞} ∫_0^1 𝑓𝑛′(𝑢𝑧(𝑡))𝑢𝑧′(𝑡) 𝑑𝑡 = lim_{𝑛→∞} [𝑓𝑛(𝑢𝑧(1)) − 𝑓𝑛(𝑢𝑧(0))] = lim_{𝑛→∞} (𝑓𝑛(𝑧) − 𝑓𝑛(𝑎)) = 𝑓(𝑧) − 𝑓(𝑎).  (4.3.14)

On the other hand, since the uniform convergence of 𝑓𝑛′(𝑧) means that the rate of convergence 𝑁(𝜖) of lim_{𝑛→∞} 𝑓𝑛′(𝑧) is independent of 𝑧, we see that the rate of convergence of lim_{𝑛→∞} 𝑓𝑛′(𝑢𝑧(𝑡)) is independent of 𝑡, or in other words, 𝑓𝑛′(𝑢𝑧(𝑡)) converges uniformly on [0, 1]. Therefore, by Theorem 4.3.14, we may exchange lim_{𝑛→∞} and ∫_0^1 in (4.3.13) and obtain

∫_0^1 lim_{𝑛→∞} 𝑓𝑛′(𝑢𝑧(𝑡))𝑢𝑧′(𝑡) 𝑑𝑡 = ∫_0^1 𝑔(𝑢𝑧(𝑡))(𝑧 − 𝑎) 𝑑𝑡 = (𝑧 − 𝑎) ∫_0^1 𝑔(𝑢𝑧(𝑡)) 𝑑𝑡.  (4.3.15)

Equating (4.3.14) and (4.3.15), we see that for 𝑧 ≠ 𝑎,

(𝑓(𝑧) − 𝑓(𝑎))/(𝑧 − 𝑎) − 𝑔(𝑎) = (∫_0^1 𝑔(𝑢𝑧(𝑡)) 𝑑𝑡) − 𝑔(𝑎)
                            = ∫_0^1 𝑔(𝑢𝑧(𝑡)) 𝑑𝑡 − ∫_0^1 𝑔(𝑎) 𝑑𝑡  (4.3.16)
                            = ∫_0^1 (𝑔(𝑢𝑧(𝑡)) − 𝑔(𝑎)) 𝑑𝑡.

It remains to prove that lim_{𝑧→𝑎} (𝑓(𝑧) − 𝑓(𝑎))/(𝑧 − 𝑎) = 𝑔(𝑎). First, by Theorem 4.3.12 and the uniform convergence of the continuous functions 𝑓𝑛′ to 𝑔, we see that 𝑔 is continuous at 𝑎, and therefore, for any 𝜖 > 0, there exists some 𝛿(𝜖) > 0 such that if |𝑧 − 𝑎| < 𝛿(𝜖), then |𝑔(𝑧) − 𝑔(𝑎)| < 𝜖. It follows from (4.3.16) and Theorem 3.4.8 that whenever |𝑧 − 𝑎| < 𝛿(𝜖/2) and 𝑧 ≠ 𝑎, we have

|(𝑓(𝑧) − 𝑓(𝑎))/(𝑧 − 𝑎) − 𝑔(𝑎)| = |∫_0^1 (𝑔(𝑢𝑧(𝑡)) − 𝑔(𝑎)) 𝑑𝑡|
                              ≤ ∫_0^1 |𝑔(𝑢𝑧(𝑡)) − 𝑔(𝑎)| 𝑑𝑡  (4.3.17)
                              ≤ ∫_0^1 (𝜖/2) 𝑑𝑡 = 𝜖/2 < 𝜖.

The theorem follows.


Remark 4.3.17. Note that the case of Theorem 4.3.16 that we will use most often is the case where ∑ 𝑔𝑛′(𝑧) converges uniformly, each 𝑔𝑛′(𝑧) is continuous, and ∑ 𝑔𝑛(𝑧) converges, in which case we have

(𝑑/𝑑𝑧) (∑ 𝑔𝑛(𝑧)) = ∑ 𝑔𝑛′(𝑧).  (4.3.18)

We call the interchange of infinite sum and derivative in (4.3.18) term-by-term differentiation, and we can paraphrase Theorem 4.3.16 as saying that uniform convergence of the derivative series plus convergence of the original series imply that term-by-term differentiation is valid.

As an alternative to Theorem 4.3.16, we have the following real-domain version.

Theorem 4.3.18. Let 𝐼 be an interval in 𝐑, and for fixed 𝑐 ∈ 𝐼 and 𝐿 ∈ 𝐂, let 𝑓𝑛 ∶ 𝐼 → 𝐂 be a sequence of differentiable functions such that lim_{𝑛→∞} 𝑓𝑛(𝑐) = 𝐿. Suppose each 𝑓𝑛′ is continuous and the sequence 𝑓𝑛′ converges uniformly to some 𝑔 ∶ 𝐼 → 𝐂. Then 𝑓𝑛 converges on 𝐼 to some 𝑓 ∶ 𝐼 → 𝐂, 𝑓 is differentiable on 𝐼, and 𝑓′(𝑥) = 𝑔(𝑥).

Proof. Problem 4.3.9.

Remark 4.3.19. Note that Theorems 4.3.16 and 4.3.18 each have their virtues: Theorem 4.3.16 allows for complex domains but assumes the pointwise convergence of 𝑓𝑛 on the entire domain, whereas Theorem 4.3.18 only makes sense for real domains but also only assumes the convergence of 𝑓𝑛 at a single point 𝑐. We could try to combine the two theorems to allow complex domains while only assuming convergence of 𝑓𝑛 at a single point, but we would need to be able to define indefinite path integrals, a task whose difficulty we have discussed previously (Remark 3.5.7). Again, see Ahlfors [Ahl79] and Conway [Con78, Con96].

When faced with a particular sequence or series of functions, the reader should keep in mind that we have developed essentially three techniques for analyzing whether the convergence of 𝑓𝑛(𝑥) to 𝑓(𝑥) is uniform.

(1) By Theorems 4.3.11–4.3.14 and 4.3.16, if 𝑓 lacks certain properties held by the 𝑓𝑛(𝑥) (e.g., continuity), then convergence must be nonuniform. (Uniform convergence may not prevent problems with differentiability, however; see Example 4.3.15.)

(2) If 𝑓_𝑁(𝑥) = ∑_{𝑛=0}^𝑁 𝑔𝑛(𝑥) is the sequence of partial sums of a series ∑_{𝑛=0}^∞ 𝑔𝑛(𝑥), then we may be able to apply the 𝑀-test (Theorem 4.3.7) by choosing a suitable 𝑀𝑛.

(3) We can apply hard work. The 𝑑𝑛 technique of Example 4.3.15 gives some idea of how to approach an example where the limit preserves properties like continuity and the 𝑀-test does not give uniform convergence, but in the end, each such example is handled by a different ad hoc argument.

See Problems 4.3.10–4.3.16 for more practice in analyzing whether a given sequence or series of functions converges uniformly.
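The 𝑑𝑛 criterion of Lemma 4.3.6 is also easy to explore numerically. The following Python sketch (ours, not the text's) approximates 𝑑𝑛 for 𝑓𝑛(𝑥) = 𝑥^𝑛, the sequence of Problem 4.3.10, whose pointwise limit on [0, 1) is 𝑓 = 0:

```python
# Approximate d_n = sup |f_n(x) - f(x)| for f_n(x) = x^n and f = 0,
# by sampling the interval [0, right_end).

def d_n(n, right_end, samples=1000):
    """Max of x^n over sample points x = k/samples * right_end, k < samples."""
    return max((k / samples * right_end) ** n for k in range(samples))

# On [0, 1/2], d_n is essentially (1/2)^n -> 0: convergence looks uniform.
on_half = [d_n(n, 0.5) for n in (1, 5, 10)]
# On [0, 1), d_n stays near 1 for every n: convergence is not uniform.
on_one = [d_n(n, 1.0) for n in (1, 5, 10)]
```

Of course, a sampled maximum is only suggestive; the actual proofs are requested in Problem 4.3.10.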


Problems.

4.3.1. (Proves Theorem 4.3.2) Suppose 𝑋 ⊆ 𝐂 is nonempty, let 𝑓𝑛, 𝑔𝑛 ∶ 𝑋 → 𝐂 be sequences of functions, let 𝑓, 𝑔 ∶ 𝑋 → 𝐂 be functions, and suppose that 𝑓𝑛 and 𝑔𝑛 converge uniformly to 𝑓 and 𝑔, respectively.
(a) Prove that 𝑓𝑛 + 𝑔𝑛 converges uniformly to 𝑓 + 𝑔.
(b) Prove that for 𝑐 ∈ 𝐂, 𝑐𝑓𝑛 converges uniformly to 𝑐𝑓.

4.3.2. (Proves Lemma 4.3.3) Let 𝑋 be a nonempty subset of 𝐂, let 𝑓𝑛 ∶ 𝑋 → 𝐂 be a sequence of functions, and let 𝑓 ∶ 𝑋 → 𝐂 be a function. Let 𝑓𝑛(𝑧) = 𝑢𝑛(𝑧) + 𝑖𝑣𝑛(𝑧) and 𝑓(𝑧) = 𝑢(𝑧) + 𝑖𝑣(𝑧) be the respective real-imaginary decompositions. Prove that 𝑓𝑛 converges uniformly to 𝑓 if and only if 𝑢𝑛 converges uniformly to 𝑢 and 𝑣𝑛 converges uniformly to 𝑣.

4.3.3. (Proves Theorem 4.3.5) Let 𝑋 be a nonempty subset of 𝐂 and let 𝑓𝑛 ∶ 𝑋 → 𝐂 be a uniformly Cauchy sequence of functions.
(a) Prove that 𝑓𝑛(𝑧) converges pointwise to some 𝑓 ∶ 𝑋 → 𝐂; in other words, prove that for any 𝑧 ∈ 𝑋, lim_{𝑛→∞} 𝑓𝑛(𝑧) exists.
(b) Fix 𝑧 ∈ 𝑋, 𝑘 ∈ 𝐙, and 𝜖0 > 0. Prove that if |𝑓𝑛(𝑧) − 𝑓𝑘(𝑧)| < 𝜖0 for all 𝑛 > 𝑘, then |𝑓(𝑧) − 𝑓𝑘(𝑧)| ≤ 𝜖0.
(c) Prove that 𝑓𝑛 converges uniformly to 𝑓.

4.3.4. (Proves Theorem 4.3.11) Let 𝑋 be a nonempty subset of 𝐂 and let 𝑓𝑛 ∶ 𝑋 → 𝐂 be a sequence of functions, each bounded on 𝑋, such that 𝑓𝑛 converges uniformly on 𝑋 to some 𝑓 ∶ 𝑋 → 𝐂. Prove that 𝑓 is bounded on 𝑋.

4.3.5. (Proves Theorem 4.3.12) Let 𝑋 be a nonempty subset of 𝐂 and let 𝑓𝑛 ∶ 𝑋 → 𝐂 be a sequence of functions, each continuous on 𝑋, such that 𝑓𝑛 converges uniformly on 𝑋 to some 𝑓 ∶ 𝑋 → 𝐂.
(a) Suppose there exists some 𝑈 ⊂ 𝑋, some 𝑎 ∈ 𝑈, and some 𝛼, 𝛽 > 0 such that |𝑓(𝑧) − 𝑓𝑛(𝑧)| < 𝛼 and |𝑓𝑛(𝑧) − 𝑓𝑛(𝑎)| < 𝛽 for all 𝑧 ∈ 𝑈. Find the best possible upper bound for |𝑓(𝑧) − 𝑓(𝑎)| that applies to all 𝑧 ∈ 𝑈.
(b) Prove that for any 𝑎 ∈ 𝑋, 𝑓 is continuous at 𝑎.

4.3.6. (Proves Theorem 4.3.14) Let 𝑓𝑛 ∶ [𝑎, 𝑏] → 𝐂 be a sequence of functions, each integrable on [𝑎, 𝑏], such that 𝑓𝑛 converges uniformly on [𝑎, 𝑏] to some 𝑓 ∶ [𝑎, 𝑏] → 𝐂. Prove that

∫_𝑎^𝑏 𝑓(𝑥) 𝑑𝑥 = lim_{𝑛→∞} ∫_𝑎^𝑏 𝑓𝑛(𝑥) 𝑑𝑥.  (4.3.19)

4.3.7. Let 𝑑𝑛 = max {𝑥 − 𝑥^{1+(1/𝑛)} ∣ 𝑥 ∈ [0, 1]}.
(a) Fix 𝑛, and use ordinary calculus to find an expression for 𝑑𝑛.
(b) Prove that lim_{𝑛→∞} 𝑑𝑛 = 0.

4.3.8. (Proves Theorem 4.3.16) Fix 𝑎, 𝑧 ∈ 𝐂 and let 𝑢𝑧 ∶ [0, 1] → 𝐂 be given by 𝑢𝑧(𝑡) = 𝑡𝑧 + (1 − 𝑡)𝑎. Prove that for all 𝑡 ∈ [0, 1], 𝑢𝑧′(𝑡) = 𝑧 − 𝑎 and |𝑢𝑧(𝑡) − 𝑎| ≤ |𝑧 − 𝑎|.


4.3.9. (Proves Theorem 4.3.18) Let 𝐼 be an interval in 𝐑, and for fixed 𝑐 ∈ 𝐼 and 𝐿 ∈ 𝐂, let 𝑓𝑛 ∶ 𝐼 → 𝐂 be a sequence of differentiable functions such that lim_{𝑛→∞} 𝑓𝑛(𝑐) = 𝐿. Suppose each 𝑓𝑛′ is continuous and the sequence 𝑓𝑛′ converges uniformly to some 𝑔 ∶ 𝐼 → 𝐂.
(a) Simplify ∫_𝑐^𝑥 𝑓𝑛′(𝑡) 𝑑𝑡, with proof.
(b) Use Theorem 4.3.14 to prove that 𝑓𝑛 converges to some 𝑓 ∶ 𝐼 → 𝐂.
(c) Prove that for all 𝑥 ∈ 𝐼, 𝑓′(𝑥) = 𝑔(𝑥).

4.3.10. (*) Let

𝑓𝑛(𝑥) = 𝑥^𝑛,  𝑓(𝑥) = { 0 if 0 ≤ 𝑥 < 1, 1 if 𝑥 = 1 }.  (4.3.20)

It can be shown that 𝑓𝑛(𝑥) converges pointwise to 𝑓(𝑥) on [0, 1].
(a) Does 𝑓𝑛(𝑥) converge uniformly to 𝑓(𝑥) on the closed interval [0, 1/2]? Prove your answer.
(b) Same, but on the closed interval [0, 1].
(c) Same, but on the open interval (0, 1).

4.3.11. (*) Let

𝑓𝑛(𝑥) = 𝑥^𝑛/𝑛,  𝑓(𝑥) = 0.  (4.3.21)

It can be shown that 𝑓𝑛(𝑥) converges pointwise to 𝑓(𝑥) on [0, 1].
(a) Does 𝑓𝑛(𝑥) converge uniformly to 𝑓(𝑥) on the closed interval [0, 1/2]? Prove your answer.
(b) Same, but on the closed interval [0, 1].
(c) Same, but on the open interval (0, 1).

4.3.12. (*) Let

𝑓_𝑁(𝑥) = ∑_{𝑛=0}^𝑁 𝑥^𝑛.  (4.3.22)

It can be shown that 𝑓_𝑁(𝑥) converges pointwise (i.e., the series ∑_{𝑛=0}^∞ 𝑥^𝑛 converges pointwise) to a function 𝑓(𝑥) on the half-open interval [0, 1).
(a) Does 𝑓_𝑁(𝑥) converge uniformly to 𝑓(𝑥) on the closed interval [0, 1/2]? Prove your answer.
(b) Same, but on the open interval (0, 1).
(c) Does lim_{𝑁→∞} 𝑓_𝑁(𝑥) converge at 𝑥 = 1? If so, is convergence uniform on [0, 1]?

4.3.13. (*) Let

𝑓_𝑁(𝑥) = ∑_{𝑛=1}^𝑁 𝑛𝑥^𝑛.  (4.3.23)

It can be shown that 𝑓_𝑁(𝑥) converges pointwise (i.e., the series ∑_{𝑛=1}^∞ 𝑛𝑥^𝑛 converges pointwise) to a function 𝑓(𝑥) on the half-open interval [0, 1).
(a) Does 𝑓_𝑁(𝑥) converge uniformly to 𝑓(𝑥) on the closed interval [0, 1/2]? Prove your answer.
(b) Same, but on the open interval (0, 1).
(c) Does lim_{𝑁→∞} 𝑓_𝑁(𝑥) converge at 𝑥 = 1? If so, is convergence uniform on [0, 1]?

4.3.14. (*) Let

𝑓_𝑁(𝑥) = ∑_{𝑛=1}^𝑁 𝑥^𝑛/𝑛.  (4.3.24)

It can be shown that 𝑓_𝑁(𝑥) converges pointwise (i.e., the series ∑_{𝑛=1}^∞ 𝑥^𝑛/𝑛 converges pointwise) to a function 𝑓(𝑥) on the half-open interval [0, 1).
(a) Does 𝑓_𝑁(𝑥) converge uniformly to 𝑓(𝑥) on the closed interval [0, 1/2]? Prove your answer.
(b) Same, but on the open interval (0, 1).
(c) Does lim_{𝑁→∞} 𝑓_𝑁(𝑥) converge at 𝑥 = 1? If so, is convergence uniform on [0, 1]?

4.3.15. (*) Let

𝑓_𝑁(𝑥) = ∑_{𝑛=1}^𝑁 𝑥^𝑛/𝑛².  (4.3.25)

It can be shown that 𝑓_𝑁(𝑥) converges pointwise (i.e., the series ∑_{𝑛=1}^∞ 𝑥^𝑛/𝑛² converges pointwise) to a function 𝑓(𝑥) on the half-open interval [0, 1).
(a) Does 𝑓_𝑁(𝑥) converge uniformly to 𝑓(𝑥) on the closed interval [0, 1/2]? Prove your answer.
(b) Same, but on the open interval (0, 1).
(c) Does lim_{𝑁→∞} 𝑓_𝑁(𝑥) converge at 𝑥 = 1? If so, is convergence uniform on [0, 1]?

4.3.16. (*) Let

𝑓_𝑁(𝑥) = ∑_{𝑛=0}^𝑁 𝑥^{2𝑛}/(2𝑛)!.  (4.3.26)

It can be shown that 𝑓_𝑁(𝑥) converges pointwise (i.e., the series ∑_{𝑛=0}^∞ 𝑥^{2𝑛}/(2𝑛)! converges pointwise) to a function 𝑓(𝑥) on all of 𝐑.
(a) For which 𝑅 > 0 does 𝑓_𝑁(𝑥) converge uniformly on [−𝑅, 𝑅]? Prove your answer.
(b) Does 𝑓_𝑁(𝑥) converge uniformly on all of 𝐑? Prove your answer.


4.4 Power series

We now apply what we have developed in this chapter so far to the following important special case.

Definition 4.4.1. A power series is a (complex-valued) series of the form 𝑓(𝑧) = ∑_{𝑛=0}^∞ 𝑎𝑛 𝑧^𝑛, where the 𝑎𝑛 ∈ 𝐂 are the coefficients of the power series, and we interpret 𝑧^0 as the constant function 1.

The reader may recall that a power series is governed by its radius of convergence. For our purposes, we will only require the following “ratio test” version of the radius of convergence. (See Remark 4.4.4 for a description of the full version.)

Theorem 4.4.2. Let 𝑓(𝑧) = ∑_{𝑛=0}^∞ 𝑎𝑛 𝑧^𝑛 be a power series such that 𝜌 = lim_{𝑛→∞} |𝑎_{𝑛+1}/𝑎𝑛| exists, and let 𝑅 = 1/𝜌, where we define 𝑅 = ∞ when 𝜌 = 0. Then:

(1) For any 𝑅0 such that 0 ≤ 𝑅0 < 𝑅, the power series 𝑓(𝑧) converges uniformly on the closed disc 𝒩_{𝑅0}(0).
(2) It follows that 𝑓(𝑧) converges pointwise (but not necessarily uniformly) on the open disc 𝒩_𝑅(0).
(3) Let 𝑏𝑛 = 𝑛𝑎𝑛. Then lim_{𝑛→∞} |𝑏_{𝑛+1}/𝑏𝑛| = 𝜌 as well.
(4) It follows that 𝑓(𝑧) is differentiable on 𝒩_𝑅(0) and that

𝑓′(𝑧) = ∑_{𝑛=1}^∞ 𝑛𝑎𝑛 𝑧^{𝑛−1} = ∑_{𝑘=0}^∞ (𝑘 + 1)𝑎_{𝑘+1} 𝑧^𝑘  (4.4.1)

for any 𝑧 ∈ 𝒩_𝑅(0).

The reader may find it instructive to revisit Problems 4.3.12–4.3.16 in light of the above results.

Proof. Claim (1) is proven in Problem 4.4.1. Therefore, for |𝑧| < 𝑅, if we let 𝑅0 = (|𝑧| + 𝑅)/2, 𝑓(𝑧) converges uniformly on 𝒩_{𝑅0}(0), and in particular, pointwise at 𝑧. Claim (3) is proven in Problem 4.4.2. If we let 𝑐𝑛 = (𝑛 + 1)𝑎_{𝑛+1}, it then follows from Claims (1)–(3) that ∑_{𝑛=0}^∞ 𝑐𝑛 𝑧^𝑛 converges pointwise on 𝒩_𝑅(0) and uniformly on 𝒩_{𝑅0}(0) for any 𝑅0 such that 0 ≤ 𝑅0 < 𝑅. Therefore, for any fixed 𝑧 such that |𝑧| < 𝑅, taking 𝑅0 = (|𝑧| + 𝑅)/2, we may apply term-by-term differentiation (Theorem 4.3.16) on the open disc {𝑤 ∈ 𝐂 ∣ |𝑤| < 𝑅0} ⊆ 𝒩_{𝑅0}(0) to obtain (4.4.1). The theorem follows.

Definition 4.4.3. The quantity 𝑅 in Theorem 4.4.2 is called the radius of convergence of 𝑓(𝑧).


Remark 4.4.4. As mentioned in Remark 4.1.17, the root test gives an unconditional version of Theorem 4.4.2; in particular, every power series has a radius of convergence. More precisely, for the power series 𝑓(𝑧) = ∑_{𝑛=0}^∞ 𝑎𝑛 𝑧^𝑛, let 𝜌 = lim sup_{𝑛→∞} |𝑎𝑛|^{1/𝑛}, which always exists if we allow the possibility of 𝜌 = ∞ (Remark 4.1.17). Then the rest of Theorem 4.4.2 still holds and is proven in the same way, substituting the root test for the ratio test; we leave the details to the interested reader (or see Ross [Ros13, Thm. 23.1]).

Remark 4.4.5. The reader may also wonder why we are making a distinction between closed and open discs in the statement of Theorem 4.4.2, when we could just make all of the discs open without really changing the theorem. The reason is that the phenomenon of converging uniformly on compact sets (see Definition 2.6.6 and Corollary 2.6.7) is useful elsewhere in analysis; see, for example, the study of families of holomorphic functions in complex analysis (Ahlfors [Ahl79, Ch. 5], Conway [Con78, Ch. VII]).
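The ratio 𝜌 of Theorem 4.4.2 can also be estimated numerically by evaluating |𝑎_{𝑛+1}/𝑎𝑛| at a large index. The following Python sketch (ours, not the text's) does this for two coefficient sequences that appear in this chapter:

```python
# Estimate rho = lim |a_{n+1}/a_n| of Theorem 4.4.2 by a single large index.
import math

def rho_estimate(a, n=100):
    """|a(n+1)/a(n)| for large n, as a crude estimate of rho."""
    return abs(a(n + 1) / a(n))

# a_n = 1/n!: rho = lim 1/(n+1) = 0, so R = infinity (the series defining E(z)).
rho_exp = rho_estimate(lambda n: 1 / math.factorial(n))
# a_n = n (cf. Problem 4.3.13): rho = lim (n+1)/n = 1, so R = 1.
rho_lin = rho_estimate(lambda n: n)
```

A single large-index ratio is of course only an estimate of the limit; for these two sequences the ratios are 1/(𝑛 + 1) and (𝑛 + 1)/𝑛 exactly, so the estimates are already close to 0 and 1, respectively.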

Problems.

4.4.1. (Proves Theorem 4.4.2) Let 𝑓(𝑧) = ∑_{𝑛=0}^∞ 𝑎𝑛 𝑧^𝑛 be a power series such that 𝜌 = lim_{𝑛→∞} |𝑎_{𝑛+1}/𝑎𝑛| ∈ 𝐑 (i.e., 𝜌 < ∞).
(a) Suppose 𝜌 > 0, let 𝑅 = 1/𝜌, and choose 𝑅0 ∈ 𝐑 such that 0 ≤ 𝑅0 < 𝑅. Prove that 𝑓(𝑧) converges absolutely and uniformly on the closed disc 𝒩_{𝑅0}(0) = {𝑧 ∈ 𝐂 ∣ |𝑧| ≤ 𝑅0}.
(b) Suppose 𝜌 = 0, and choose any real 𝑅0 > 0. Prove that 𝑓(𝑧) converges absolutely and uniformly on the closed disc 𝒩_{𝑅0}(0).

4.4.2. (Proves Theorem 4.4.2) Prove that if 𝑎𝑛 is a sequence in 𝐂 such that 𝜌 = lim_{𝑛→∞} |𝑎_{𝑛+1}/𝑎𝑛| exists, and 𝑏𝑛 = 𝑛𝑎𝑛, then lim_{𝑛→∞} |𝑏_{𝑛+1}/𝑏𝑛| = 𝜌 as well.

4.5 Exponential and trigonometric functions

As a benchmark of the progress we have made in understanding series of functions, we now define the complex exponential function and derive its usual properties, including the famous Euler formula 𝑒^{𝑖𝑥} = cos 𝑥 + 𝑖 sin 𝑥. We begin with a definition.

Definition 4.5.1. For 𝑧 ∈ 𝐂, we define 𝐸(𝑧) to be the power series

𝐸(𝑧) = ∑_{𝑛=0}^∞ 𝑧^𝑛/𝑛! = ∑_{𝑛=0}^∞ (1/𝑛!) 𝑧^𝑛.  (4.5.1)

In other words, instead of thinking of (4.5.1) as a formula derived from the study of Taylor series, we use (4.5.1) to define the exponential function; then, once the standard properties of 𝐸(𝑧) are fully established, we will write 𝐸(𝑧) as 𝑒^𝑧. However, note that unlike the exponential function from ordinary calculus, we allow arbitrary complex exponents in 𝑒^𝑧, as the exponential functions we will use most often are the functions that we will later call 𝑒𝑛(𝑥) = 𝑒^{2𝜋𝑖𝑛𝑥}.

We now turn to the basic properties of 𝐸(𝑧); all of the most interesting proofs are left as problems for the reader. We first apply our results on power series from Section 4.4.

Theorem 4.5.2. The power series 𝐸(𝑧) has radius of convergence 𝑅 = ∞. Furthermore, 𝐸(0) = 1, 𝐸(\overline{𝑧}) = \overline{𝐸(𝑧)}, and for all 𝑧 ∈ 𝐂, 𝐸′(𝑧) = 𝐸(𝑧).

Proof. Problem 4.5.1.

We may therefore think of 𝐸(𝑧) as a function 𝐸 ∶ 𝐂 → 𝐂 that is infinitely differentiable, or smooth, meaning that the 𝑛th derivative 𝐸^{(𝑛)}(𝑧) exists for every natural number 𝑛 (and is equal to 𝐸(𝑧)). Note also that by the (complex-valued) chain rule and Problem 3.2.1, for any 𝛼 ∈ 𝐂, we have the innocent-looking but important formula

(𝑑/𝑑𝑧)(𝐸(𝛼𝑧)) = 𝛼𝐸(𝛼𝑧).  (4.5.2)

Next, we show that 𝐸(𝑧) only takes on nonzero values.

Theorem 4.5.3. For any 𝑧 ∈ 𝐂, 𝐸(𝑧) ≠ 0.

Proof. Problem 4.5.2 shows that for any 𝑧 ∈ 𝐂, 𝐸(𝑧)𝐸(−𝑧) = 1, and the theorem follows immediately.

Theorem 4.5.4. For 𝑧, 𝑤 ∈ 𝐂, we have that 𝐸(𝑧 + 𝑤) = 𝐸(𝑧)𝐸(𝑤).

Proof. Problem 4.5.3.

Turning to trigonometric functions, instead of deriving the Euler formula from the properties of the sine and cosine functions, we will use the Euler formula to define the cosine and sine functions and derive their properties. To avoid circular reasoning, however, we will only call these functions 𝐶(𝑥) and 𝑆(𝑥) until their usual properties have been proven.

Definition 4.5.5. We define functions 𝐶 ∶ 𝐑 → 𝐑 and 𝑆 ∶ 𝐑 → 𝐑 to be the real and imaginary parts of 𝐸(𝑖𝑥), or in other words, by definition,

𝐸(𝑖𝑥) = 𝐶(𝑥) + 𝑖𝑆(𝑥)  (4.5.3)

for all 𝑥 ∈ 𝐑.

Remark 4.5.6. By (2.2.5) and Theorem 4.5.2, we see that

𝐶(𝑥) = (𝑒^{𝑖𝑥} + 𝑒^{−𝑖𝑥})/2,  𝑆(𝑥) = (𝑒^{𝑖𝑥} − 𝑒^{−𝑖𝑥})/(2𝑖).  (4.5.4)

We will mostly use these formulas to make it clear that certain a priori complex expressions are actually real, though the reader may later find it helpful in complex analysis to use (4.5.4) to extend 𝐶 and 𝑆 to all of 𝐂.

Theorem 4.5.7. For any 𝑥 ∈ 𝐑, we have that:

(1) 𝐶(𝑥) = ∑_{𝑛=0}^∞ (−1)^𝑛 𝑥^{2𝑛}/(2𝑛)! and 𝑆(𝑥) = ∑_{𝑛=0}^∞ (−1)^𝑛 𝑥^{2𝑛+1}/(2𝑛 + 1)!.
(2) 𝐶(0) = 1 and 𝑆(0) = 0.
(3) 𝐶 and 𝑆 are even and odd functions, respectively; that is, 𝐶(−𝑥) = 𝐶(𝑥) and 𝑆(−𝑥) = −𝑆(𝑥).
(4) |𝐸(𝑖𝑥)| = 1 and 𝐶(𝑥)² + 𝑆(𝑥)² = 1.
(5) 𝐶′(𝑥) = −𝑆(𝑥) and 𝑆′(𝑥) = 𝐶(𝑥).

Proof. Since

𝐸(𝑖𝑥) = ∑_{𝑛=0}^∞ (𝑖𝑥)^𝑛/𝑛! = 1 + 𝑖𝑥 − 𝑥²/2! − 𝑖𝑥³/3! + …,  (4.5.5)

claim (1) follows from the pattern +1, +𝑖, −1, −𝑖 in 𝑖^𝑛, and claim (2) is equivalent to the fact that 𝐸(0) = 1 + 0𝑖. The other claims are proven in Problem 4.5.4.

Definition 4.5.8. We define 𝑉 = {𝑥 ∈ 𝐑 ∣ 𝑥 > 0 and 𝐶(𝑥) = 0}; i.e., 𝑉 is the set of all positive zeros of 𝐶(𝑥).

Lemma 4.5.9. The set 𝑉 is nonempty; i.e., there exists some 𝑥 > 0 such that 𝐶(𝑥) = 0.

Proof. Problem 4.5.5.

Definition 4.5.10. We define 𝜋 = 2 inf 𝑉, or in other words, we define 𝜋/2 to be the infimum of all positive zeros of 𝐶(𝑥).

Theorem 4.5.11. We have that:

(1) 𝐶(𝜋/2) = 0.
(2) 𝑆(𝜋/2) = 1, and therefore, 𝐸(𝜋𝑖/2) = 𝑖.
(3) 𝐸(2𝜋𝑖) = 1.
(4) For any 𝑥 ∈ 𝐑, 𝐸(𝑖(𝑥 + 2𝜋)) = 𝐸(𝑖𝑥).

Note that (3) is precisely Euler’s identity 𝑒^{2𝜋𝑖} = 1, and (4) says that 𝐸(𝑖𝑥) is periodic with period 2𝜋. Note also that after proving Theorem 4.5.11, we are now justified in using the name cos 𝑥 for 𝐶(𝑥) and sin 𝑥 for 𝑆(𝑥), and we do so in the sequel.

Proof. Once we know that 𝐶(𝜋/2) = 0 and 𝑆(𝜋/2) = 1, it follows that 𝐸(𝜋𝑖/2) = 0 + 1𝑖 = 𝑖. Also, given claim (3), claim (4) follows by Theorem 4.5.4. The rest of the theorem is proven in Problem 4.5.6.

For the reader seeing complex exponentials for the first time, it may be helpful to think of complex exponentials like 𝑒^{𝑖𝑥} as points on the unit circle in the complex plane. More precisely, if we temporarily switch our variable from 𝑥 to 𝜃, we can picture 𝑒^{𝑖𝜃} as the point cos 𝜃 + 𝑖 sin 𝜃 in the complex plane, forming an angle 𝜃 with the positive real axis, as shown in Figure 4.5.1. This interpretation is particularly useful because the formula 𝑒^{𝑖𝜃1} 𝑒^{𝑖𝜃2} = 𝑒^{𝑖(𝜃1 + 𝜃2)} means that multiplying points on the unit circle is the same as adding their angles. (For more about using 𝑒^{𝑖𝜃} to reconstruct unit circle trigonometry, see Problem 4.5.7.)
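The definitions above can be checked numerically: a truncated version of the defining series (4.5.1) already agrees with the standard library's cos and sin to machine precision. The following Python sketch is ours, not part of the text:

```python
# Check the Euler formula E(ix) = C(x) + iS(x) = cos x + i sin x numerically,
# using a partial sum of the defining series (4.5.1) for E(z).
import math

def E(z, terms=40):
    """Partial sum of E(z) = sum_{n=0}^infty z^n / n!."""
    total, term = 0 + 0j, 1 + 0j
    for n in range(terms):
        total += term
        term *= z / (n + 1)  # next term z^{n+1}/(n+1)! from z^n/n!
    return total

x = 1.234
euler_err = abs(E(1j * x) - complex(math.cos(x), math.sin(x)))
# Euler's identity e^{2 pi i} = 1, Theorem 4.5.11(3):
identity_err = abs(E(2j * math.pi) - 1)
```

Forty terms suffice here because the factorials in the denominators make the series remainder negligible for these inputs.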

Figure 4.5.1. 𝑒^{𝑖𝜃} pictured on the unit circle

Figure 4.5.2. Polar coordinates 𝑧 = 𝑟𝑒^{𝑖𝜃}

On a related note, complex exponentials also allow us to express any complex number 𝑧 = 𝑥 + 𝑦𝑖 ≠ 0 in “polar coordinate” form 𝑧 = 𝑟𝑒^{𝑖𝜃} for some 𝑟, 𝜃 ∈ 𝐑. Specifically, as shown in Figure 4.5.2, we can choose 𝑟 = |𝑧| and 𝜃 ∈ 𝐑 such that 𝑒^{𝑖𝜃} = 𝑧/|𝑧| is the point on the unit circle obtained by thinking of 𝑧 as a vector and scaling 𝑧 to a unit vector pointed in the same direction.
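The polar form is available directly in Python's standard library, which can serve as a quick sanity check (this sketch is ours, not the text's):

```python
# The polar form z = r e^{i theta} via the standard library:
# cmath.polar returns (r, theta) = (|z|, arg z); cmath.rect rebuilds z.
import cmath

z = 3 - 4j
r, theta = cmath.polar(z)      # r = |z| = 5, theta = arg(z) in (-pi, pi]
back = cmath.rect(r, theta)    # r * (cos theta + i sin theta), recovers z
unit = z / abs(z)              # the point e^{i theta} on the unit circle
```

Here `unit` agrees with `cmath.exp(1j * theta)`, matching the scaling description above.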

Problems.

4.5.1. (Proves Theorem 4.5.2) Define 𝐸 ∶ 𝐂 → 𝐂 by

𝐸(𝑧) = ∑_{𝑛=0}^∞ 𝑧^𝑛/𝑛! = ∑_{𝑛=0}^∞ (1/𝑛!) 𝑧^𝑛.  (4.5.6)

(a) Prove that 𝐸(0) = 1.
(b) Prove that the radius of convergence of 𝐸 is 𝑅 = ∞.
(c) Prove that for all 𝑧 ∈ 𝐂, 𝐸(\overline{𝑧}) = \overline{𝐸(𝑧)}.
(d) Prove that for all 𝑧 ∈ 𝐂, 𝐸′(𝑧) = 𝐸(𝑧). (Justify convergence carefully.)

Suggestion for subsequent problems: From here on out, you should be able to proceed using only the properties of 𝐸(𝑧) proved in Problem 4.5.1, without having to refer to the power series that defines 𝐸(𝑧).


4.5.2. (Proves Theorem 4.5.3) Let 𝑓(𝑧) = 𝐸(𝑧)𝐸(−𝑧).
(a) Calculate 𝑓′(𝑧). What can you conclude?
(b) Prove that 𝐸(𝑧)𝐸(−𝑧) = 1.

4.5.3. (Proves Theorem 4.5.4) Fix 𝑏 ∈ 𝐂, and for any 𝑧 ∈ 𝐂, let 𝑓(𝑧) = 𝐸(𝑧 + 𝑏)/𝐸(𝑧). (Note that 𝑓 is differentiable on 𝐂 because of Theorem 4.5.3.)
(a) Calculate 𝑓′(𝑧).
(b) Prove that for all 𝑧 ∈ 𝐂, 𝑓(𝑧) = 𝐸(𝑏). (Theorem 4.5.4 follows.)

4.5.4. (Proves Theorem 4.5.7) Assume 𝑥 ∈ 𝐑.
(a) Prove that |𝐸(𝑖𝑥)|² = 1 and 𝐶(𝑥)² + 𝑆(𝑥)² = 1.
(b) Prove that 𝐶(−𝑥) = 𝐶(𝑥) and 𝑆(−𝑥) = −𝑆(𝑥).
(c) Prove that 𝐶′(𝑥) = −𝑆(𝑥) and 𝑆′(𝑥) = 𝐶(𝑥).

Suggestion for subsequent problems: Again, from here on out, you should be able to proceed using only the properties of 𝐶(𝑥) and 𝑆(𝑥) proved in Problem 4.5.4, without having to refer to their power series.

4.5.5. (Proves Lemma 4.5.9) Proceeding by contradiction in the proof of Lemma 4.5.9, assume for the entirety of this problem that 𝐶(𝑥) > 0 for all 𝑥 > 0.
(a) Let 𝑚 = 𝑆(1). Prove that 𝑚 = 𝑆(1) > 0.
(b) Prove that 𝐶′(𝑥) < −𝑚 for all 𝑥 > 1.
(c) Prove that for 𝑥 > 1, 𝐶(𝑥) − 𝐶(1) < −𝑚(𝑥 − 1). Obtain a contradiction with the assumption that 𝐶(𝑥) > 0 for all 𝑥 > 0.

4.5.6. (Proves Theorem 4.5.11)
(a) Prove that there exists a sequence 𝑥𝑛 in 𝐑 such that 𝐶(𝑥𝑛) = 0 for all 𝑛, 𝑥𝑛 ≥ 𝜋/2, and lim_{𝑛→∞} 𝑥𝑛 = 𝜋/2.
(b) Prove that 𝐶(𝜋/2) = 0.
(c) Prove that for 0 ≤ 𝑥 < 𝜋/2, 𝐶(𝑥) > 0.
(d) Prove that 𝑆(𝜋/2) = 1.
(e) Prove that 𝐸(2𝜋𝑖) = 1.

4.5.7. (*) The goal of this problem is to prove a few more familiar precalculus facts about 𝑒^{𝑖𝑥}, cos 𝑥, and sin 𝑥.
(a) Prove the facts in the following chart of signs for cos 𝑥 and sin 𝑥.

𝑥                cos 𝑥   sin 𝑥
0 < 𝑥 < 𝜋/2        +       +
𝜋/2 < 𝑥 < 𝜋        −       +
𝜋 < 𝑥 < 3𝜋/2       −       −
3𝜋/2 < 𝑥 < 2𝜋      +       −

(b) Prove that the image of the function 𝑓 ∶ 𝐑 → 𝐂 defined by 𝑓(𝑥) = 𝑒^{𝑖𝑥} is the unit circle in 𝐂. In other words, prove that for every 𝑧 ∈ 𝐂 such that |𝑧| = 1, there exists some 𝑥 ∈ 𝐑 such that 𝑓(𝑥) = 𝑧.


4.6 More about exponential functions

In this section, building on Section 4.5, we establish some notation and some results that we will need later, as well as a few results we used earlier without proof.

Definition 4.6.1. For 𝑛 ∈ 𝐙, we define the function 𝑒𝑛 ∶ 𝐑 → 𝐂 by

𝑒𝑛(𝑥) = 𝑒^{2𝜋𝑖𝑛𝑥}.  (4.6.1)

Note that by Theorem 4.5.2, \overline{𝑒𝑛(𝑥)} = \overline{𝑒^{2𝜋𝑖𝑛𝑥}} = 𝑒^{−2𝜋𝑖𝑛𝑥} = 𝑒_{−𝑛}(𝑥).

Remark 4.6.2. The reader may not realize this, but we have just made an important choice of conventions that will affect everything else we do. For example, by Theorem 4.5.7, we have that

𝑒𝑛(𝑥 + 1) = 𝑒^{2𝜋𝑖𝑛(𝑥+1)} = 𝑒^{2𝜋𝑖𝑛𝑥 + 𝑛(2𝜋𝑖)} = 𝑒^{2𝜋𝑖𝑛𝑥} = 𝑒𝑛(𝑥),  (4.6.2)

or in other words, 𝑒𝑛 is periodic with period 1. On the other hand, (4.5.2) tells us that

𝑒𝑛′(𝑥) = (2𝜋𝑖𝑛)𝑒𝑛(𝑥),  (4.6.3)

which means that the constant 2𝜋𝑖 will appear in many of our derivative and integral formulas involving 𝑒𝑛(𝑥). In contrast, other authors and practitioners of Fourier analysis may prefer to use 𝑒^{𝑖𝑛𝑥} or even 𝑒^{𝜋𝑖𝑛𝑥}, so be warned: In other sources, those 2𝜋, 𝜋, and 2𝜋𝑖 constants will appear in different places than they do here.

We have the following indefinite integrals of functions related to 𝑒𝑛(𝑥):

∫ \overline{𝑒𝑛(𝑥)} 𝑑𝑥 = −𝑒_{−𝑛}(𝑥)/(2𝜋𝑖𝑛) + 𝐶,  (4.6.4)

∫ 𝑥 \overline{𝑒𝑛(𝑥)} 𝑑𝑥 = −𝑥𝑒_{−𝑛}(𝑥)/(2𝜋𝑖𝑛) − 𝑒_{−𝑛}(𝑥)/(2𝜋𝑖𝑛)² + 𝐶,  (4.6.5)

∫ 𝑥² \overline{𝑒𝑛(𝑥)} 𝑑𝑥 = −𝑥²𝑒_{−𝑛}(𝑥)/(2𝜋𝑖𝑛) − 2𝑥𝑒_{−𝑛}(𝑥)/(2𝜋𝑖𝑛)² − 2𝑒_{−𝑛}(𝑥)/(2𝜋𝑖𝑛)³ + 𝐶,  (4.6.6)

or in general,

∫ 𝑥^𝑘 \overline{𝑒𝑛(𝑥)} 𝑑𝑥 = −(𝑘!/𝑘!) 𝑥^𝑘 𝑒_{−𝑛}(𝑥)/(2𝜋𝑖𝑛) − (𝑘!/(𝑘 − 1)!) 𝑥^{𝑘−1} 𝑒_{−𝑛}(𝑥)/(2𝜋𝑖𝑛)² − ⋯ − (𝑘!/1!) 𝑥𝑒_{−𝑛}(𝑥)/(2𝜋𝑖𝑛)^𝑘 − (𝑘!/0!) 𝑒_{−𝑛}(𝑥)/(2𝜋𝑖𝑛)^{𝑘+1} + 𝐶.  (4.6.7)

Note that we will often integrate functions of the form 𝑓(𝑥)\overline{𝑒𝑛(𝑥)} = 𝑓(𝑥)𝑒_{−𝑛}(𝑥), for reasons that will become clear. We also have the following calculations, which turn out to be key to the foundations of Fourier analysis:

∫_0^1 𝑒𝑛(𝑥) \overline{𝑒𝑘(𝑥)} 𝑑𝑥 = { 1 if 𝑛 = 𝑘, 0 otherwise. }  (4.6.8)

See Problems 4.6.1–4.6.2 for proofs of the above integration formulas.


e π i/2 eπ i

e0 e3 π i/2

Figure 4.6.1. Special values of 𝑒𝑖𝑥 on the unit cirle Next, we have the following useful formulas and special values of 𝑒𝑛 and 𝑒−𝑛 , for any 𝑛, 𝑘 ∈ 𝐙 and any 𝑥 ∈ 𝐑 (Figure 4.6.1 illustrates the case 𝑛 = 1): |𝑒𝑛 (𝑥)| = 1, 2𝜋𝑖𝑛𝑘

(4.6.9) −2𝜋𝑖𝑛𝑘

𝑒𝑛 (𝑘) = 𝑒 = 𝑒−𝑛 (𝑘) = 𝑒 = 1, (4.6.10) 1 1 𝑒𝑛 ( ) = 𝑒𝜋𝑖𝑛 = 𝑒−𝑛 ( ) = 𝑒−𝜋𝑖𝑛 = (−1)𝑛 , (4.6.11) 2 2 1 1 𝑒𝑛 ( ) = 𝑒−𝑛 (− ) = 𝑒𝜋𝑖𝑛/2 = 𝑖𝑛 , (4.6.12) 4 4 1 1 𝑒𝑛 (− ) = 𝑒−𝑛 ( ) = 𝑒−𝜋𝑖𝑛/2 = (−𝑖)𝑛 . (4.6.13) 4 4 We also note an important nonequality, in that (𝑒𝑖𝑎 )𝑏 may not be equal to 𝑒𝑖𝑎𝑏 if 𝑏 is not an integer. (More generally, there is no consistent way to define 𝑧𝑏 for all 𝑧 ∈ 𝐂; again, see Ahlfors [Ahl79] or Conway [Con78] for a discussion.) As the reader may recall from calculus or differential equations, linear differential equations with constant coefficients often have solutions expressed in terms of exponential and trig functions. To be specific: Theorem 4.6.3. Consider the interval 𝐼 = [0, 𝑏] or 𝐼 = [0, +∞) and its interior 𝐼0 = (0, 𝑏) or (0, +∞). (1) For 𝛼, 𝐶 ∈ 𝐂, the differential equation 𝑓′ (𝑥) = 𝛼𝑓(𝑥) has exactly one solution that is continuous on 𝐼, differentiable on 𝐼0 , and satisfies 𝑓(0) = 𝐶, namely, 𝑓(𝑥) = 𝐶𝑒𝛼𝑥 . (2) For 𝐶0 , 𝐶1 ∈ 𝐂 and 𝛼 > 0, the differential equation 𝑓″ (𝑥) = −𝛼2 𝑓(𝑥) has exactly one solution that is continuously differentiable on 𝐼, twice differentiable on 𝐼0 , and satisfies 𝑓(0) = 𝐶0 and 𝑓′ (0) = 𝐶1 , namely, 𝐶 𝑓(𝑥) = 𝐶0 cos(𝛼𝑥) + ( 1 ) sin(𝛼𝑥). (4.6.14) 𝛼 Proof. Problems 4.6.3 and 4.6.4. Finally, for the completist, we recover the fundamental facts about the log, power, and exponential functions promised in Lemma 3.6.10. We begin with some observations about the exponential function restricted to the domain 𝐑. Theorem 4.6.4. Let 𝐸 ∶ 𝐑 → 𝐑 be the restriction of 𝑒𝑥 to the real numbers. Then: (1) For all 𝑥 ∈ 𝐑, 𝐸 ′ (𝑥) = 𝐸(𝑥) > 0.


(2) $E$ is a strictly increasing function on $\mathbf{R}$.
(3) For all $b > 0$, $E(b) \geq 1 + b$.
(4) The image of $E$ is precisely $\{y \in \mathbf{R} \mid y > 0\}$.
Proof. Problem 4.6.5.

The main point, then, is the definition of $\ln x$ and $a^b$.

Definition 4.6.5. Let $X = \{x \in \mathbf{R} \mid x > 0\}$. We define $\ln : X \to \mathbf{R}$ to be the inverse of the function $E : \mathbf{R} \to X$ defined in Theorem 3.2.15. (Note that $\ln$ is well-defined by Theorems 3.2.15 and 4.6.4.) Also, for $a > 0$ and $b \in \mathbf{R}$, we define
$$a^b = e^{b \ln a}. \tag{4.6.15}$$
In particular, for $a, c > 0$, the functions $x^a$ and $c^x$ are well-defined on the domains $X$ and $\mathbf{R}$, respectively. Lemma 3.6.10 then becomes a calculus problem (Problem 4.6.6).
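The orthogonality relation (4.6.8) and the special values above are easy to sanity-check numerically. The following sketch is ours (the helper names `e` and `inner` are not from the text), and it approximates the integral by a Riemann sum rather than computing it exactly:

```python
import cmath, math

def e(n, x):
    # e_n(x) = exp(2*pi*i*n*x), as in Definition 4.6.1
    return cmath.exp(2j * math.pi * n * x)

def inner(n, k, num=20000):
    # Riemann-sum approximation of \int_0^1 e_n(x) * conj(e_k(x)) dx
    h = 1.0 / num
    return sum(e(n, j * h) * e(k, j * h).conjugate() for j in range(num)) * h

print(abs(inner(3, 3)))   # approximately 1 (n = k)
print(abs(inner(3, 5)))   # approximately 0 (n != k)
print(e(7, 0.5))          # e_7(1/2) = (-1)^7 = -1, up to rounding
```

The approximate vanishing for $n \neq k$ is exactly the cancellation around the unit circle pictured in Figure 4.6.1.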

Problems.
4.6.1. Use induction on $k$ and integration by parts to prove
$$\int x^k \overline{e_n(x)}\,dx = -\frac{x^k e_{-n}(x)}{2\pi i n} - \left(\frac{k!}{(k-1)!}\right)\frac{x^{k-1} e_{-n}(x)}{(2\pi i n)^2} - \cdots - \left(\frac{k!}{1!}\right)\frac{x\,e_{-n}(x)}{(2\pi i n)^k} - \left(\frac{k!}{0!}\right)\frac{e_{-n}(x)}{(2\pi i n)^{k+1}} + C, \tag{4.6.16}$$
starting with the base case
$$\int \overline{e_n(x)}\,dx = -\frac{e_{-n}(x)}{2\pi i n} + C. \tag{4.6.17}$$
4.6.2. Prove that
$$\int_0^1 e_n(x)\,\overline{e_k(x)}\,dx = \begin{cases} 1 & \text{if } n = k, \\ 0 & \text{otherwise.} \end{cases} \tag{4.6.18}$$

4.6.3. (Proves Theorem 4.6.3) Suppose $\alpha \in \mathbf{C}$, and suppose $f$ is a function that is continuous on $[0, b]$, differentiable on $(0, b)$, and satisfies $f'(x) = \alpha f(x)$ on $(0, b)$.
(a) Let $g(x) = f(x)e^{-\alpha x}$. Calculate $g'(x)$ for $x \in (0, b)$ (with proof).
(b) Prove that $f(x) = f(0)e^{\alpha x}$ for all $x \in [0, b]$.
4.6.4. (Proves Theorem 4.6.3) Fix $C_0, C_1 \in \mathbf{C}$ and $\alpha > 0$.
(a) Prove that
$$f(x) = C_0 \cos(\alpha x) + \left(\frac{C_1}{\alpha}\right)\sin(\alpha x) \tag{4.6.19}$$
satisfies $f''(x) = -\alpha^2 f(x)$, $f(0) = C_0$, and $f'(0) = C_1$.
(b) Suppose $f$ is given by (4.6.19) and $g$ is continuously differentiable on $[0, b]$, twice differentiable on $(0, b)$, and also satisfies $g''(x) = -\alpha^2 g(x)$, $g(0) = C_0$, and $g'(0) = C_1$. Let $h(x) = f(x) - g(x)$. Compute $h(0)$, $h'(0)$, and $h''(x)$ in terms of $h(x)$.

(c) Let $h$ be the function from part (b). By considering the function $k(x) = \alpha^2 (h(x))^2 + (h'(x))^2$, prove that $h(x) = 0$ for all $x \in [0, b]$.

4.6.5. (Proves Theorem 4.6.4) Let $E : \mathbf{R} \to \mathbf{R}$ be the restriction of $e^x$ to the real numbers.
(a) Prove that for all $x \in \mathbf{R}$, $E'(x) = E(x) > 0$.
(b) Prove that for all $b > 0$, $E(b) \geq 1 + b$.
(c) Prove that the image of $E$ is precisely $\{y \in \mathbf{R} \mid y > 0\}$.
4.6.6. (Proves Lemma 3.6.10) Using Definition 4.6.5:
(a) Prove that $\dfrac{d}{dx}(\ln x) = \dfrac{1}{x}$.
(b) Prove that $\dfrac{d}{dx}(x^a) = a x^{a-1}$.
(c) Prove that $\dfrac{d}{dx}(c^x) = (\ln c)c^x$.

4.7 The Schwartz space

In most of this book, we study functions on some closed and bounded interval in $\mathbf{R}$. Sometimes, however, as in Sections 7.5.2 and 8.5.2, occasionally in Chapters 10 and 11, and heavily in Chapters 12 and 13, we wish to consider functions on all of $\mathbf{R}$. However, because functions need to decay as $x \to \pm\infty$ in some sense in order to have well-defined integrals on $\mathbf{R}$, the following class of functions turns out to be very useful.

Definition 4.7.1. To say that a continuous function $f : \mathbf{R} \to \mathbf{C}$ is rapidly decaying means that one of the following equivalent conditions (see Problem 4.7.1) holds:
(1) For any $n \geq 0$, $x^n f(x)$ is bounded on $\mathbf{R}$.
(2) For any $n \geq 0$, $\lim_{x \to \pm\infty} x^n f(x) = 0$.

In the language of asymptotics (Definition 3.6.7), the latter condition can be written as 𝑓(𝑥) ≪ (1/𝑥𝑛 ), or in other words, as 𝑥 → ±∞, 𝑓(𝑥) goes to 0 faster than any function 1/𝑥𝑛 . The Schwartz space 𝒮(𝐑) is the space of all 𝑓 ∶ 𝐑 → 𝐂 such that for all 𝑘 ≥ 0, the 𝑘th derivative 𝑓(𝑘) (𝑥) of 𝑓 exists for all 𝑥 ∈ 𝐑 and is rapidly decaying. We have the following basic properties of Schwartz functions. Theorem 4.7.2. Suppose 𝑓, 𝑔 ∈ 𝒮(𝐑) and 𝑝(𝑥) is a polynomial. Then 𝑓′ (𝑥), 𝑓(𝑥)+𝑔(𝑥), 𝑓(𝑥)𝑔(𝑥), and 𝑝(𝑥)𝑓(𝑥) are all in 𝒮(𝐑). Proof. Problem 4.7.2. We also observe that for any 𝑓 ∈ 𝒮(𝐑), since 𝑓 is differentiable and 𝑓′ is bounded, Corollary 3.2.14 implies: Corollary 4.7.3. If 𝑓 ∈ 𝒮(𝐑), then 𝑓 is uniformly continuous on 𝐑. We also have the following key example of a Schwartz function.


Theorem 4.7.4. For $a > 0$, $b \in \mathbf{R}$, and $p(x)$ a polynomial, the function
$$f(x) = p(x)e^{-ax^2 + bx} \tag{4.7.1}$$
is in $\mathcal{S}(\mathbf{R})$.
Proof. Problem 4.7.3.

Example 4.7.5. Two functions not in $\mathcal{S}(\mathbf{R})$ are $g(x) = e^{-|x|}$ and $h(x) = \dfrac{1}{1 + x^2}$. With $g(x)$, even though $g$ satisfies the limit condition in Definition 4.7.1, $g \notin \mathcal{S}(\mathbf{R})$ because $g$ is not differentiable at $0$. As for $h(x)$, even though $h^{(k)}(x)$ exists for every $k \geq 0$ and $x \in \mathbf{R}$, $h \notin \mathcal{S}(\mathbf{R})$ because (for example) $\lim_{x \to \pm\infty} x^4 h(x) = +\infty$.
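To make the contrast in Example 4.7.5 concrete, here is a small numerical sketch of our own (the function names are ours, and `gauss` is just one poly-Gaussian instance of Theorem 4.7.4): multiplying by $x^4$ leaves the Gaussian-type function bounded, while $x^4 h(x)$ grows without bound.

```python
import math

def gauss(x):
    # a poly-Gaussian (Theorem 4.7.4) with p(x) = x^3, a = pi, b = 0
    return x ** 3 * math.exp(-math.pi * x * x)

def h(x):
    # h(x) = 1 / (1 + x^2): infinitely differentiable, but not rapidly decaying
    return 1.0 / (1.0 + x * x)

# x^4 * gauss(x) stays bounded (in fact tends to 0), while x^4 * h(x) blows up:
for x in [5.0, 10.0, 20.0]:
    print(x, x ** 4 * gauss(x), x ** 4 * h(x))
```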

Problems.
4.7.1. Suppose $f : \mathbf{R} \to \mathbf{C}$ is continuous. This problem proves the equivalence of the two conditions in Definition 4.7.1 that define what it means for $f$ to decay rapidly.
(a) Prove that if $|x^{n+1} f(x)| \leq M$ for all $x \in \mathbf{R}$, then $\lim_{x \to \pm\infty} x^n f(x) = 0$.
(b) Prove that if $\lim_{x \to \pm\infty} x^n f(x) = 0$, then there exists some $M$ such that $|x^n f(x)| \leq M$ for all $x \in \mathbf{R}$.
4.7.2. (Proves Theorem 4.7.2) Suppose $f, g \in \mathcal{S}(\mathbf{R})$ and $p$ is a polynomial.
(a) Prove that $f'(x) \in \mathcal{S}(\mathbf{R})$.
(b) Prove that $f(x) + g(x) \in \mathcal{S}(\mathbf{R})$.
(c) Prove that $f(x)g(x) \in \mathcal{S}(\mathbf{R})$.
(d) Prove that $p(x)f(x) \in \mathcal{S}(\mathbf{R})$.
4.7.3. (Proves Theorem 4.7.4) For the purposes of this problem only, we define a poly-Gaussian to be a function of the form $f(x) = p(x)e^{-ax^2 + bx}$, where $p(x)$ is a polynomial, $a > 0$, and $b \in \mathbf{R}$. The following shows that every poly-Gaussian function is in $\mathcal{S}(\mathbf{R})$.
(a) Prove that the derivative of a poly-Gaussian function is poly-Gaussian.
(b) Prove that if $f(x)$ is poly-Gaussian, then $\lim_{x \to \pm\infty} f(x) = 0$.

4.8 Integration on 𝐑

Starting occasionally in Chapters 10 and 11 and everywhere in Chapters 12 and 13, we will need to integrate functions over the entire real line. Therefore, in this section, we extend several results from Section 3.6 to (improper) integration over all of $\mathbf{R}$. In order of increasing difficulty, these are
• integration by parts (Theorem 4.8.7),
• differentiating under the integral sign (Theorem 4.8.8), and
• Fubini’s Theorem (Theorem 4.8.11).

We also establish the integral $\int_{-\infty}^{\infty} e^{-\pi x^2}\,dx = 1$ (Theorem 4.8.6).

As with Section 3.6, the first-time reader is encouraged to skip the proofs and absorb only Definition 4.8.2 and the statements of the main results; and as with Section 4.7, the reader only interested in Fourier series and not the Fourier transform is encouraged to skip this section altogether. In any case, we begin with the standard definitions, which the reader may recall from calculus.

Definition 4.8.1. To say that $f : \mathbf{R} \to \mathbf{C}$ is locally integrable means that $f$ is integrable on any closed and bounded interval in $\mathbf{R}$.

Definition 4.8.2. Let $f : \mathbf{R} \to \mathbf{C}$ be locally integrable. For $a \in \mathbf{R}$, to say that the improper integral $\int_a^{\infty} f(x)\,dx$ converges (or exists) means that the limit
$$\lim_{b \to \infty} \int_a^b f(x)\,dx = \int_a^{\infty} f(x)\,dx \tag{4.8.1}$$
exists; to say that $\int_{-\infty}^{b} f(x)\,dx$ converges means that
$$\lim_{a \to -\infty} \int_a^b f(x)\,dx = \int_{-\infty}^{b} f(x)\,dx \tag{4.8.2}$$
exists; and to say that $\int_{-\infty}^{\infty} f(x)\,dx$ converges means that both (4.8.1) and (4.8.2) exist for some (and therefore, any) fixed $a$ and $b$, in which case, we define
$$\int_{-\infty}^{\infty} f(x)\,dx = \int_{-\infty}^{c} f(x)\,dx + \int_{c}^{\infty} f(x)\,dx \tag{4.8.3}$$
for some (and therefore, any) $c$. (Note that it follows from the usual properties of integration that (4.8.3) is independent of $c$.) If (4.8.3) converges, we say that $f$ is integrable on $\mathbf{R}$; for clarity, we sometimes add a phrase like “in the sense of an improper Riemann integral.”

Note that it follows in a straightforward manner from Definition 4.8.2 that if $f$ is integrable on $\mathbf{R}$,
$$\int_{-\infty}^{\infty} f(x)\,dx = \lim_{N \to \infty} \int_{-N}^{N} f(x)\,dx. \tag{4.8.4}$$
Thanks to (4.8.4), we will later be able to apply our theory of sequences of functions to improper integrals. Returning to the fundamentals of improper integrals, by applying the limit laws to Corollary 3.4.3 and Theorem 3.4.8, we see that improper integrals inherit the linearity and absolute value properties of proper integrals. Moreover, improper integrals satisfy the following analogues of Corollaries 4.1.2 and 4.1.4.


Theorem 4.8.3 (Cauchy Criterion for Improper Integrals). Suppose $f : \mathbf{R} \to \mathbf{C}$ is locally integrable. Then $f$ is integrable on $\mathbf{R}$ if and only if for any $\epsilon > 0$, there exists $N(\epsilon) > 0$ such that if $b, c > N(\epsilon)$ or $b, c < -N(\epsilon)$, then $\left|\int_b^c f(x)\,dx\right| < \epsilon$.
Proof. It suffices to prove the one-sided analogue for $\int_a^{\infty} f(x)\,dx$, which is done in Problem 4.8.1.

Theorem 4.8.4 (Comparison Test for Improper Integrals). Suppose $f, g : \mathbf{R} \to \mathbf{C}$ are locally integrable and $|f(x)| \leq g(x)$ for all $x \in \mathbf{R}$. If $g$ is integrable on $\mathbf{R}$, then so is $f$, and
$$\left|\int_{-\infty}^{\infty} f(x)\,dx\right| \leq \int_{-\infty}^{\infty} g(x)\,dx. \tag{4.8.5}$$

Proof. It again suffices to prove the one-sided version; see Problem 4.8.2.

Example 4.8.5. If $f$ is locally integrable, it follows from Theorem 4.8.4 and standard results of calculus that if $f$ is bounded and there exist constants $C > 0$ and $k > 1$ such that $|f(x)| \leq \dfrac{C}{|x|^k}$ for all $x \neq 0$, then $f$ is integrable on $\mathbf{R}$. In particular, this holds for all $f \in \mathcal{S}(\mathbf{R})$.

We are now ready to begin tackling our main results. First, the following integral is used at a few critical junctures in Chapters 11 and 12.

Theorem 4.8.6. We have that $\int_{-\infty}^{\infty} e^{-\pi x^2}\,dx = 1$.
Proof. Problem 4.8.3.

The next result is a relatively straightforward matter of taking limits.

Theorem 4.8.7 (Integration by Parts on $\mathbf{R}$). Suppose that $f, g : \mathbf{R} \to \mathbf{C}$ are differentiable, $f'$ and $g'$ are continuous on $\mathbf{R}$, $f(x)g'(x)$ is integrable on $\mathbf{R}$, and both $\lim_{a \to -\infty} f(a)g(a)$ and $\lim_{b \to \infty} f(b)g(b)$ exist. Then $g(x)f'(x)$ is integrable on $\mathbf{R}$, and
$$\int_{-\infty}^{\infty} f(x)g'(x)\,dx = \lim_{b \to \infty} f(b)g(b) - \lim_{a \to -\infty} f(a)g(a) - \int_{-\infty}^{\infty} g(x)f'(x)\,dx. \tag{4.8.6}$$
For brevity, we define
$$F(x)\Big|_{-\infty}^{\infty} = \lim_{b \to \infty} F(b) - \lim_{a \to -\infty} F(a), \tag{4.8.7}$$
so we may rewrite (4.8.6) as
$$\int_{-\infty}^{\infty} f(x)g'(x)\,dx = f(x)g(x)\Big|_{-\infty}^{\infty} - \int_{-\infty}^{\infty} g(x)f'(x)\,dx. \tag{4.8.8}$$
Proof. Again, the one-sided case suffices; see Problem 4.8.4.
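The statement of Theorem 4.8.6 can be sketched numerically through the symmetric truncations of (4.8.4) (the quadrature helper below is ours; it is a midpoint-rule approximation, not an exact evaluation):

```python
import math

def riemann(f, a, b, num=100000):
    # midpoint-rule approximation of \int_a^b f(x) dx
    h = (b - a) / num
    return sum(f(a + (j + 0.5) * h) for j in range(num)) * h

# The symmetric truncations \int_{-N}^{N} e^{-pi x^2} dx approach 1 as N grows:
for N in [1, 2, 4]:
    print(N, riemann(lambda x: math.exp(-math.pi * x * x), -N, N))
```

The rapid decay of the Gaussian is what makes the truncated integrals converge so quickly.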


Our next main result, the improper version of differentiating under the integral sign, comes from an application of our results on uniform convergence (Section 4.3).

Theorem 4.8.8. Let $f : [a, b] \times \mathbf{R} \to \mathbf{C}$ be a continuous function such that the partial derivative of $f$ in the first variable, $\dfrac{\partial f}{\partial x}$, is continuous on $[a, b] \times \mathbf{R}$ (as a function of two variables), and for any fixed $x_0 \in [a, b]$, both $f(x_0, y)$ and $\dfrac{\partial f}{\partial x}(x_0, y)$ are integrable on $\mathbf{R}$ (as functions of $y$). Suppose also that the sequences
$$F_N(x) = \int_{-N}^{N} f(x, y)\,dy, \qquad g_N(x) = \int_{-N}^{N} \frac{\partial f}{\partial x}\,dy \tag{4.8.9}$$
of functions $F_N, g_N : [a, b] \to \mathbf{C}$ converge uniformly (i.e., independently of $x$) on $[a, b]$ to
$$F(x) = \int_{-\infty}^{\infty} f(x, y)\,dy, \qquad g(x) = \int_{-\infty}^{\infty} \frac{\partial f}{\partial x}\,dy, \tag{4.8.10}$$
respectively. Then for all $x \in [a, b]$,
$$F'(x) = \frac{d}{dx} \int_{-\infty}^{\infty} f(x, y)\,dy = \int_{-\infty}^{\infty} \frac{\partial f}{\partial x}\,dy = g(x). \tag{4.8.11}$$
To give an example of how the above uniform convergence conditions on $F_N$ and $g_N$ can be used in practice, if $f(x, y) = h(x, y)k(y)$, where $|h(x, y)| \leq C$ for all $x, y \in \mathbf{R}$ and $k(y) \in \mathcal{S}(\mathbf{R})$, then the uniform convergence condition on $F_N$ holds because
$$\left|\int_N^{\infty} f(x, y)\,dy\right| \leq \int_N^{\infty} |h(x, y)k(y)|\,dy \leq \int_N^{\infty} C|k(y)|\,dy \tag{4.8.12}$$
converges to $0$ as $N \to \infty$, independently of $x$.
Proof. Problem 4.8.5.

Our last main result is Fubini’s Theorem for integration on $\mathbf{R}$ (Theorem 4.8.11). In our version of Fubini, we will assume unnecessary hypotheses, like differentiability, to keep the proof both accessible and relatively brief. To be precise, we consider only the following kind of integrand.

Definition 4.8.9. To say that $F : \mathbf{R} \times \mathbf{R} \to \mathbf{C}$ is integrable by separation means that $F$ is continuous (as a function of two variables) and there exists a set of nonnegative real bounded functions $f_i, g_i : \mathbf{R} \to \mathbf{R}$ ($1 \leq i \leq k$), each integrable on $\mathbf{R}$, such that
$$|F(x, y)| \leq \sum_{i=1}^{k} f_i(x)g_i(y) \tag{4.8.13}$$
for all $x, y \in \mathbf{R}$.

The following lemma then sums up some useful convergence properties of functions that are integrable by separation.


Lemma 4.8.10. Suppose $F : \mathbf{R} \times \mathbf{R} \to \mathbf{C}$ is a function such that $F$ and $\dfrac{\partial F}{\partial x}$ are both integrable by separation. Then:
(1) The integral
$$F^y(x) = \int_{-\infty}^{\infty} F(x, y)\,dy \tag{4.8.14}$$
converges for every $x \in \mathbf{R}$.
(2) The sequence of functions $F_N^y(x) = \int_{-N}^{N} F(x, y)\,dy$ converges uniformly (i.e., independently of $x$) on $\mathbf{R}$ to $F^y$.
(3) The function $F^y(x)$ is differentiable in $x$ and integrable on $\mathbf{R}$. In particular, the double integral $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} F(x, y)\,dy\,dx$ is well-defined.
(4) We have that
$$\lim_{N \to \infty} \int_{-\infty}^{\infty} F_N^y(x)\,dx = \int_{-\infty}^{\infty} F^y(x)\,dx = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} F(x, y)\,dy\,dx. \tag{4.8.15}$$

Also, by symmetry, the same statements all hold with $x$ and $y$ switched.

Note that statement (4) does not follow from statement (2), as it is possible to find a sequence $f_n : \mathbf{R} \to \mathbf{R}$ that converges uniformly to some $f : \mathbf{R} \to \mathbf{R}$ such that the integrals of the $f_n$ do not converge to the integral of $f$ (Problem 4.8.6).

Proof. By Definition 4.8.9, suppose $F$ is continuous and that $f_i, g_i : \mathbf{R} \to \mathbf{R}$ ($1 \leq i \leq k$) are continuous nonnegative real bounded functions, each integrable on $\mathbf{R}$, such that
$$|F(x, y)| \leq \sum_{i=1}^{k} f_i(x)g_i(y) \tag{4.8.16}$$
for all $x, y \in \mathbf{R}$. We first see that statement (1) follows from an application of the comparison test for improper integrals (Problem 4.8.7). Next, let $f(x) = \sum_{i=1}^{k} f_i(x)$. The key point here is to prove the following claim (Problem 4.8.8).

Claim: For every $\epsilon > 0$, there exists $K(\epsilon)$ (not depending on $x$) such that if $N > K(\epsilon)$, then $\left|F_N^y(x) - F^y(x)\right| < \epsilon f(x)$.

Once the claim is established, since the $f_i$ are bounded, so is $f$, and the claim implies statement (2). Repeating the proof so far for $\dfrac{\partial F}{\partial x}$ instead of $F$ also shows that the sequence $\int_{-N}^{N} \dfrac{\partial F}{\partial x}(x, y)\,dy$ converges uniformly to the convergent integral $\int_{-\infty}^{\infty} \dfrac{\partial F}{\partial x}(x, y)\,dy$, so Theorem 4.8.8 implies the differentiability part of statement (3). Then, if we let
$$C_i = \int_{-\infty}^{\infty} g_i(y)\,dy, \tag{4.8.17}$$


since $F^y$ is continuous on $\mathbf{R}$ and
$$|F^y(x)| \leq \sum_{i=1}^{k} C_i f_i(x), \tag{4.8.18}$$
the integrability part of statement (3) follows by Theorem 4.8.4. Finally, it remains to show that the claim implies statement (4), and this is again Problem 4.8.8.

Theorem 4.8.11 (Fubini’s Theorem on $\mathbf{R}$). Suppose $F : \mathbf{R} \times \mathbf{R} \to \mathbf{C}$ is a function such that $F$, $\dfrac{\partial F}{\partial x}$, and $\dfrac{\partial F}{\partial y}$ are all integrable by separation. Then both sides of
$$\int_{-\infty}^{\infty} \left(\int_{-\infty}^{\infty} F(x, y)\,dy\right) dx = \int_{-\infty}^{\infty} \left(\int_{-\infty}^{\infty} F(x, y)\,dx\right) dy \tag{4.8.19}$$

converge and are equal.
Proof. Lemma 4.8.10(3) shows that both sides of (4.8.19) converge, so it remains to show that they are equal. By the bounded version of Fubini’s Theorem 3.6.21, for all $K, N \in \mathbf{N}$,
$$\int_{-K}^{K} \left(\int_{-N}^{N} F(x, y)\,dy\right) dx = \int_{-N}^{N} \left(\int_{-K}^{K} F(x, y)\,dx\right) dy. \tag{4.8.20}$$
We now evaluate the double limit $\lim_{K \to \infty} \lim_{N \to \infty}$ of both sides.

On the one hand,
$$\begin{aligned} \lim_{K \to \infty} \lim_{N \to \infty} \int_{-K}^{K} \left(\int_{-N}^{N} F(x, y)\,dy\right) dx &\overset{(*)}{=} \lim_{K \to \infty} \int_{-K}^{K} \left(\int_{-\infty}^{\infty} F(x, y)\,dy\right) dx \\ &= \int_{-\infty}^{\infty} \left(\int_{-\infty}^{\infty} F(x, y)\,dy\right) dx, \end{aligned} \tag{4.8.21}$$
where (*) follows by the uniform convergence of $\int_{-N}^{N} F(x, y)\,dy$ (Lemma 4.8.10(2)) and Theorem 4.3.14. On the other hand,
$$\begin{aligned} \lim_{K \to \infty} \lim_{N \to \infty} \int_{-N}^{N} \left(\int_{-K}^{K} F(x, y)\,dx\right) dy &= \lim_{K \to \infty} \int_{-\infty}^{\infty} \int_{-K}^{K} F(x, y)\,dx\,dy \\ &= \int_{-\infty}^{\infty} \left(\int_{-\infty}^{\infty} F(x, y)\,dx\right) dy, \end{aligned} \tag{4.8.22}$$
where the last step follows by Lemma 4.8.10(4). The theorem follows.
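The conclusion of Fubini’s Theorem can be illustrated numerically for one function that is integrable by separation. This is our own example (the names `F` and `integrate` are ours); truncating at $|x|, |y| \leq 4$ stands in for the improper integrals, which is harmless here because the Gaussian bound makes the tails negligible:

```python
import math

def F(x, y):
    # |F(x, y)| <= e^{-pi x^2} * e^{-pi y^2}, so F is integrable by separation
    return math.exp(-math.pi * (x * x + y * y)) * math.cos(x + 2 * y)

def integrate(f, a, b, num=400):
    # midpoint-rule approximation of \int_a^b f
    h = (b - a) / num
    return sum(f(a + (j + 0.5) * h) for j in range(num)) * h

N = 4.0  # truncation level; the tails beyond |x|, |y| > 4 are negligible here
dy_dx = integrate(lambda x: integrate(lambda y: F(x, y), -N, N), -N, N)
dx_dy = integrate(lambda y: integrate(lambda x: F(x, y), -N, N), -N, N)
print(dy_dx, dx_dy)   # the two iterated integrals agree
```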

Problems.
4.8.1. (Proves Theorem 4.8.3) Suppose $f : \mathbf{R} \to \mathbf{C}$ is locally integrable.
(a) Prove that if $\int_a^{\infty} f(x)\,dx$ converges, then for any $\epsilon > 0$, there exists $N(\epsilon) > 0$ such that if $b, c > N(\epsilon)$, then $\left|\int_b^c f(x)\,dx\right| < \epsilon$.


(b) Suppose that for any $\epsilon > 0$, there exists $N(\epsilon) > 0$ such that if $b, c > N(\epsilon)$, then $\left|\int_b^c f(x)\,dx\right| < \epsilon$. Prove that $\int_a^{\infty} f(x)\,dx$ converges.
4.8.2. (Proves Theorem 4.8.4) Suppose $f, g : \mathbf{R} \to \mathbf{C}$ are locally integrable, $|f(x)| \leq g(x)$ for all $x \in \mathbf{R}$, and $\int_0^{\infty} g(x)\,dx$ converges. Prove that $\int_0^{\infty} f(x)\,dx$ converges.
4.8.3. (Proves Theorem 4.8.6) In this problem, we cheat a little and use standard facts about change of variables from multivariable calculus.
(a) Prove that $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-\pi(x^2 + y^2)}\,dx\,dy = 1$.
(b) Prove that $\int_{-\infty}^{\infty} e^{-\pi x^2}\,dx = 1$.

4.8.4. (Proves Theorem 4.8.7) Fix $a \in \mathbf{R}$, and suppose that $f, g$ are differentiable on $\mathbf{R}$, $f', g'$ are continuous on $\mathbf{R}$, $f(x)g'(x)$ is integrable on $\mathbf{R}$, and $\lim_{b \to \infty} f(b)g(b)$ exists. Prove that
$$\int_a^{\infty} f(x)g'(x)\,dx = \lim_{b \to \infty} f(b)g(b) - f(a)g(a) - \int_a^{\infty} g(x)f'(x)\,dx. \tag{4.8.23}$$
In particular, prove that the improper integral on the right-hand side exists.
4.8.5. (Proves Theorem 4.8.8) Let $f : [a, b] \times \mathbf{R} \to \mathbf{C}$ be a continuous function such that $\dfrac{\partial f}{\partial x}$ is continuous on $[a, b] \times \mathbf{R}$ (as a function of two variables), and for any fixed $x_0 \in [a, b]$, both $f(x_0, y)$ and $\dfrac{\partial f}{\partial x}(x_0, y)$ are integrable on $\mathbf{R}$ (as functions of $y$). Suppose also that the sequences
$$F_N(x) = \int_{-N}^{N} f(x, y)\,dy, \qquad g_N(x) = \int_{-N}^{N} \frac{\partial f}{\partial x}\,dy \tag{4.8.24}$$
of functions $F_N, g_N : [a, b] \to \mathbf{C}$ converge uniformly on $[a, b]$ to
$$F(x) = \int_{-\infty}^{\infty} f(x, y)\,dy, \qquad g(x) = \int_{-\infty}^{\infty} \frac{\partial f}{\partial x}\,dy, \tag{4.8.25}$$
respectively. Then prove that for all $x \in [a, b]$,
$$F'(x) = \frac{d}{dx} \int_{-\infty}^{\infty} f(x, y)\,dy = \int_{-\infty}^{\infty} \frac{\partial f}{\partial x}\,dy = g(x). \tag{4.8.26}$$

4.8.6. Define $f_n, f : \mathbf{R} \to \mathbf{C}$ by
$$f_n(x) = \begin{cases} \dfrac{1}{n} & \text{for } n \leq x \leq 2n, \\ 0 & \text{otherwise,} \end{cases} \tag{4.8.27}$$
and $f(x) = 0$. Prove that $f_n$ converges uniformly to $f$ on $\mathbf{R}$, but
$$\lim_{n \to \infty} \int_{-\infty}^{\infty} f_n(x)\,dx \neq \int_{-\infty}^{\infty} f(x)\,dx. \tag{4.8.28}$$
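The counterexample of Problem 4.8.6 is worth tabulating (a small sketch with our own names; the proofs are still left to the problem): the sup norm of $f_n$ tends to 0, so $f_n \to 0$ uniformly, yet every $f_n$ has integral 1.

```python
# f_n(x) = 1/n on [n, 2n] and 0 elsewhere, as in (4.8.27):
def f_n(n, x):
    return 1.0 / n if n <= x <= 2 * n else 0.0

for n in [1, 10, 100]:
    sup_norm = 1.0 / n        # sup |f_n(x) - 0| over all x, which tends to 0
    integral = n * (1.0 / n)  # \int f_n = (width 2n - n) * (height 1/n) = 1
    print(n, sup_norm, integral)
```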

For Problems 4.8.7 and 4.8.8, suppose $F : \mathbf{R} \times \mathbf{R} \to \mathbf{C}$ is continuous as a function of two variables and there exists a set of nonnegative real functions $f_i, g_i : \mathbf{R} \to \mathbf{R}$ ($1 \leq i \leq k$), each integrable on $\mathbf{R}$, such that
$$|F(x, y)| \leq \sum_{i=1}^{k} f_i(x)g_i(y) \tag{4.8.29}$$

for all $x, y \in \mathbf{R}$.
4.8.7. (Proves Lemma 4.8.10) Prove that the integral
$$F^y(x) = \int_{-\infty}^{\infty} F(x, y)\,dy \tag{4.8.30}$$
converges for every $x \in \mathbf{R}$.
4.8.8. (Proves Lemma 4.8.10) Let
$$F_N^y(x) = \int_{-N}^{N} F(x, y)\,dy, \qquad F^y(x) = \int_{-\infty}^{\infty} F(x, y)\,dy. \tag{4.8.31}$$
(a) Let $f(x) = \sum_{i=1}^{k} f_i(x)$. Prove that for every $\epsilon > 0$, there exists $K(\epsilon)$ (not depending on $x$) such that if $N > K(\epsilon)$, then $\left|F_N^y(x) - F^y(x)\right| < \epsilon f(x)$.
(b) Assuming that $F_N^y, F^y : \mathbf{R} \to \mathbf{C}$ are integrable on $\mathbf{R}$, prove that
$$\lim_{N \to \infty} \int_{-\infty}^{\infty} F_N^y(x)\,dx = \int_{-\infty}^{\infty} F^y(x)\,dx. \tag{4.8.32}$$

Part 2

Fourier series and Hilbert spaces

5 The idea of a function space

. . . [T]here is a difference between numbers and numbers that matter. This is what separates data from metrics. You can’t pick your data, but you must pick your metrics. — Jeff Bladt and Bob Filbin, “Know the Difference Between Your Data and Your Metrics,” Harvard Business Review, March 4, 2013

In this brief chapter, we motivate some of the main ideas of Part 2 of this book. First, starting from an old conundrum due to Lewis Carroll, in Section 5.1, we introduce the question: What is a reasonable way to determine how close two functions are? More precisely, in Section 5.2, we argue that, when trying to find a “good” or “best” approximation to a given function 𝑓, we should choose a suitable function space 𝑉 and look at a reasonable metric (see Section 2.3) on 𝑉 to determine how good that approximation is. Finally, as it turns out, our favorite metric on function spaces is best described in terms of an abstract version of dot products, so in Section 5.3, we review the properties of ordinary Euclidean dot products, which the reader may have seen in multivariable calculus or physics.

5.1 Which clock keeps better time?

In “The Two Clocks,” Lewis Carroll asked: Which is better, a clock that is right only once a year, or a clock that is right twice every day? “The latter,” you reply, “unquestionably.” Very good, now attend. I have two clocks: one doesn’t go at all, and the other loses a minute a day: which would you prefer? Now the answer to the second question may seem obvious, but (as Mr. Carroll puts it) attend: If clock #2 loses a minute a day and no one realizes that or fixes clock #2, then after several months or years, clock #2 will be quite far off from the actual time.


Of course, after several more months or years, clock #2 will be back to being nearly on time. So which clock is better? More precisely, what’s a reasonable way to quantify the question of which clock is better? One way to resolve this dilemma is to ask:

Question 5.1.1. Which clock is, on average, less wrong?

The reason this version of the clock question is convenient is that, as the reader may recall from calculus, the average value of an integrable $f : [a, b] \to \mathbf{R}$ is defined to be
$$\frac{1}{b - a} \int_a^b f(t)\,dt. \tag{5.1.1}$$
To quantify this fully, let $t$ be time in days, and assuming 24-hour clocks, let $f(t)$ be the actual time, let $s(t) = 0$ be the time on the stopped clock, and let $\ell(t)$ be the time on the lagging clock. Then assuming both clocks are correct at midnight ($t = 0$) of some particular day, we see that the magnitude (absolute value) of the errors of the stopped and lagging clocks are
$$|f(t) - s(t)| = |24t| \quad \text{for } -\tfrac{1}{2} \leq t \leq \tfrac{1}{2}, \tag{5.1.2}$$
$$|f(t) - \ell(t)| = \left|\frac{t}{60}\right| \quad \text{for } -720 \leq t \leq 720, \tag{5.1.3}$$
respectively, where $|f(t) - s(t)|$ is periodic with period 1 and $|f(t) - \ell(t)|$ is periodic with period 1440. It therefore makes sense to take the average of $|f(t) - s(t)|$ on the interval $[-\frac{1}{2}, \frac{1}{2}]$ and the average of $|f(t) - \ell(t)|$ on the interval $[-720, 720]$. (The finicky reader may prefer to take both averages on $[-720, 720]$, but the periodicity of $|f(t) - s(t)|$ implies that we will get the same answer.) So we may now ask, precisely:

Question 5.1.2. Which average error is greater:
$$\frac{1}{\frac{1}{2} - \left(-\frac{1}{2}\right)} \int_{-1/2}^{1/2} |f(t) - s(t)|\,dt \quad \text{or} \quad \frac{1}{720 - (-720)} \int_{-720}^{720} |f(t) - \ell(t)|\,dt? \tag{5.1.4}$$

Now, there are many other possible measures of average error. For example, suppose large errors concern us more than small ones; in that case we might, for example, replace the average of the error with the average of the squared error. More generally:

Question 5.1.3. For $p = 2$, which average error is greater:
$$\frac{1}{\frac{1}{2} - \left(-\frac{1}{2}\right)} \int_{-1/2}^{1/2} |f(t) - s(t)|^p\,dt \quad \text{or} \quad \frac{1}{720 - (-720)} \int_{-720}^{720} |f(t) - \ell(t)|^p\,dt? \tag{5.1.5}$$
How about $p > 1$ other than $p = 2$?

The answers to the above questions will be left as problems, but we hope that the reader has at least gotten a flavor of what it means to measure the distance between two functions.
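Before doing the exact computations (which are left as Problems 5.1.1–5.1.3), one can estimate the averages numerically. This is our own sketch (the helper `avg_error` and the two error functions are our names, with all errors measured in hours); by this numerical estimate, the two clocks come out essentially equal for each $p$:

```python
def avg_error(err, a, b, p=1, num=200000):
    # midpoint-rule approximation of (1/(b-a)) * \int_a^b |err(t)|^p dt
    h = (b - a) / num
    return sum(abs(err(a + (j + 0.5) * h)) ** p for j in range(num)) * h / (b - a)

stopped = lambda t: 24 * t   # f(t) - s(t) on [-1/2, 1/2], in hours
lagging = lambda t: t / 60   # f(t) - l(t) on [-720, 720], in hours

for p in (1, 2):
    print(p, avg_error(stopped, -0.5, 0.5, p), avg_error(lagging, -720, 720, p))
```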


Problems.
5.1.1. Calculate the average errors
$$\frac{1}{\frac{1}{2} - \left(-\frac{1}{2}\right)} \int_{-1/2}^{1/2} |f(t) - s(t)|\,dt, \qquad \frac{1}{720 - (-720)} \int_{-720}^{720} |f(t) - \ell(t)|\,dt. \tag{5.1.6}$$
Which clock is more accurate, on average?
5.1.2. Calculate the “average squared errors”
$$\frac{1}{\frac{1}{2} - \left(-\frac{1}{2}\right)} \int_{-1/2}^{1/2} |f(t) - s(t)|^2\,dt, \qquad \frac{1}{720 - (-720)} \int_{-720}^{720} |f(t) - \ell(t)|^2\,dt. \tag{5.1.7}$$
Which clock is more accurate, on average?
5.1.3. For $p > 1$, $p \neq 2$, calculate the two average “errors-raised-to-the-$p$th-power”
$$\frac{1}{\frac{1}{2} - \left(-\frac{1}{2}\right)} \int_{-1/2}^{1/2} |f(t) - s(t)|^p\,dt, \qquad \frac{1}{720 - (-720)} \int_{-720}^{720} |f(t) - \ell(t)|^p\,dt. \tag{5.1.8}$$
Which clock is more accurate, on average, and how does that depend on $p$?

5.2 Function spaces and metrics As we saw in Section 5.1, one way to make the question “Which function is closer?” (or later, “Which approximation is better?”) precise is to define what will turn out to be a metric on functions (see Section 2.3). However, to ensure that these kinds of “metrics” actually satisfy the axioms of a metric (Definition 2.3.1), we need to do several things; most prominently, we must specify a function space on which such a metric is to be defined. We therefore come to the following definition. Definition 5.2.1. Let 𝑋 be a set. We define a function space on 𝑋 to be a collection 𝑉 of (complex-valued) functions, all with the same domain 𝑋, such that the following properties hold: (1) (Nonempty) 𝑉 contains the zero function 0(𝑥) = 0. (2) (Closed under addition) For 𝑓, 𝑔 ∈ 𝑉, 𝑓 + 𝑔 ∈ 𝑉. (3) (Closed under scalar multiplication) For 𝑓 ∈ 𝑉 and 𝑐 ∈ 𝐂, 𝑐𝑓 ∈ 𝑉. A subset of a function space 𝑉 that is itself a function space is called a function subspace, or simply a subspace, of 𝑉. If 𝑉 is a function space, often we call 𝑓 ∈ 𝑉 a function when 𝑓 is being considered in terms of analysis and call 𝑓 a vector when we think of 𝑓 as an element of an unspecified abstract function space. If the reader has any familiarity with linear algebra, then the following example will show how Definition 5.2.1 encompasses the standard examples of a first course in linear algebra.


Example 5.2.2. Let 𝑉 be the set of all complex-valued functions on 𝑋 = {1, … , 𝑛}. Then certainly 𝑉 satisfies the axioms of Definition 5.2.1; moreover, if we write the values of 𝑓 ∶ 𝑋 → 𝐂 as the vector (𝑓(1), … , 𝑓(𝑛)), we see that 𝑉 can be identified with 𝐂𝑛 , the space of all complex row vectors of length 𝑛. In this context, the zero function (0, … , 0) is also called the zero vector. Remark 5.2.3. The reader familiar with axiom-based linear algebra will note that, by the Subspace Theorem (Theorem B.4), a function space on 𝑋 is precisely a subspace of 𝐅(𝑋, 𝐂); see Appendix B for details. In any case, the main point of Definition 5.2.1 is that functions in a function space can be manipulated algebraically like vectors in a vector space; for example, we can form arbitrary linear combinations of functions in a function space 𝑉 and still stay in 𝑉. Some familiar results can be rephrased in terms of function spaces as follows. Example 5.2.4. For 𝑋 ⊆ 𝐂, it follows immediately from Theorem 3.1.5 that the set of all continuous functions on 𝑋 is a function space on 𝑋. Similarly, if 𝑋 ⊆ 𝐂 is a set such that every point of 𝑋 is a limit point of 𝑋, then by Theorem 3.2.8, the set of all differentiable functions on 𝑋 is a function space on 𝑋; and by Corollary 3.4.3, the set of all integrable functions on a closed interval [𝑎, 𝑏] is a function space on [𝑎, 𝑏] denoted by ℛ(𝑋). In the rest of this book, we will be particularly interested in spaces of functions defined by their degrees of smoothness (continuity and differentiability), bringing us to the following definition. Definition 5.2.5. Let 𝑋 be a nonempty subset of 𝐂 such that every point of 𝑋 is a limit point. We define 𝐶 0 (𝑋) to be the set of all continuous 𝑓 ∶ 𝑋 → 𝐂, which is a function space by Theorem 3.1.5. Similarly, for any positive integer 𝑟, we define 𝐶 𝑟 (𝑋) to be the set of all 𝑓 ∶ 𝑋 → 𝐂 with continuous 𝑟th derivatives, which is a function space by Theorems 3.2.8 and 3.1.5. 
We also define 𝐶 ∞ (𝑋) to be the set of all 𝑓 ∶ 𝑋 → 𝐂 with 𝑟th derivatives for every positive integer 𝑟, which is a function space for analogous reasons. Finally, when 𝑋 is an interval in 𝐑, we define ℛ(𝑋) to be the set of all integrable functions on 𝑋, which is a function space by Corollary 3.4.3. Note that by Corollary 3.2.7 and Theorem 3.4.5, we have that
$$\mathcal{R}(X) \supset C^0(X) \supset C^1(X) \supset C^2(X) \supset \cdots \supset C^{\infty}(X). \tag{5.2.1}$$

Definition 5.2.6. We will also occasionally use multivariable versions of 𝐶 𝑟 (𝑋). Specifically, for a function 𝑓(𝑥, 𝑦) of two variables on a domain 𝑋 ⊆ 𝐑 × 𝐑, to say that 𝑓 ∈ 𝐶𝑥𝑟 (𝑋) means that for any fixed 𝑦0 ∈ 𝐑, 𝑓(𝑥, 𝑦0 ) is in 𝐶 𝑟 (𝑋) as a function of 𝑥, and similarly for 𝑓 ∈ 𝐶𝑦𝑟 (𝑋). Finally, to say that 𝑓 ∈ 𝐶 𝑟 (𝑋) means that every 𝑟th partial derivative of 𝑓 exists and is continuous; in particular, by Theorem 3.6.18, if 𝑓 ∈ 𝐶 1 (𝑋), then 𝑓 is itself continuous. Example 5.2.7. To give some key examples, Theorem 4.5.2 implies that 𝑒𝑥 ∈ 𝐶 ∞ (𝐑). More generally, the chain rule and other derivative laws then imply that for any 𝛼 ∈ 𝐂, 𝑘 ∈ 𝐑, 𝑒𝛼𝑥 , cos(𝑘𝑥), and sin(𝑘𝑥) are all in 𝐶 ∞ (𝐑).


Example 5.2.8. For the reader who has seen the Schwartz space 𝒮(𝐑) from Section 4.7, Theorem 4.7.2 implies that 𝒮(𝐑) is a function space on 𝐑.

We also define a special case of function spaces that will be the source of most of our main examples, at least until Part 4 of this book.

Definition 5.2.9. To say that the domain of a function $f$ is the circle $S^1$ means that:
• The domain of $f$ is $\mathbf{R}$.
• For all $x \in \mathbf{R}$, $f(x + 1) = f(x)$.
We think of such functions as being defined on the circle because they are determined by their values on $[0, 1]$, with $f(0) = f(1)$, and identifying the ends of a closed interval gives a circle. Continuity, limits, and derivatives are all defined as usual for functions on $S^1$, but integrals are defined differently: To say that $f : S^1 \to \mathbf{C}$ is integrable means that
$$\int_{S^1} f(x)\,dx = \int_0^1 f(x)\,dx = \int_{-1/2}^{1/2} f(x)\,dx \tag{5.2.2}$$
exists. Note that if $f$ is integrable on either $[0, 1]$ or $[-\frac{1}{2}, \frac{1}{2}]$, (5.2.2) holds by periodicity and additivity of domain; in fact, the integral of $f$ may be computed on any interval $[a, a + 1]$ in $\mathbf{R}$. As in (5.2.1), we have
$$\mathcal{R}(S^1) \supset C^0(S^1) \supset C^1(S^1) \supset C^2(S^1) \supset \cdots \supset C^{\infty}(S^1). \tag{5.2.3}$$

Example 5.2.10. To give a key example, by Theorem 4.5.7, for any $n \in \mathbf{Z}$, the functions $e_n(x) = e^{2\pi i n x}$ (Definition 4.6.1) are all in $C^{\infty}(S^1)$.

Convention 5.2.11. Since every $x \in \mathbf{R}$ differs from some $x_0 \in [0, 1)$ by an integer, a function $f : S^1 \to \mathbf{C}$ is determined by its values on $[0, 1)$. We will therefore often describe such an $f$ by a formula that is only meant to apply when $0 \leq x < 1$, or similarly, is only meant to apply when $-\frac{1}{2} \leq x < \frac{1}{2}$, and so on.

Example 5.2.12. It is not too hard to find functions in $C^0(S^1)$ that are not in $C^1(S^1)$; for example, we have the “periodized” absolute value function given by $f(x) = |x|$ for $-\frac{1}{2} \leq x \leq \frac{1}{2}$, as per Convention 5.2.11. We can then construct functions contained in $C^k(S^1)$ but not in $C^{k+1}(S^1)$ by repeated indefinite integration; see Problem 5.2.1.

To better connect the idea of a function space back to Section 5.1, we now give one example of a metric on a space of functions. (The reader interested in our most prominent examples of metrics on function spaces may want to glance ahead at Sections 7.2 and 7.6.)

Definition 5.2.13. For $X$ a closed and bounded subset of $\mathbf{C}$ and $f, g \in C^0(X)$, we define
$$d(f, g) = \sup\{|f(x) - g(x)| \mid x \in X\}. \tag{5.2.4}$$
(Note that Corollary 3.1.16 ensures that $d(f, g)$ is finite.)


Theorem 5.2.14. For $X$ a closed and bounded subset of $\mathbf{C}$, the function $d(f, g)$ given by (5.2.4) defines a metric on $C^0(X)$. We call $d(f, g)$ the $L^\infty$ metric on $C^0(X)$.

Proof. First, $d(f, g) \geq 0$ because $d(f, g)$ is the supremum of a set of nonnegative numbers. It is also clear that $d(f, g) = d(g, f)$ and that $d(f, g) = 0$ if and only if $f = g$, so it remains only to verify the triangle inequality, which we do in Problem 5.2.2.

Note that the quantity $d_n$ that appears in the “alternative definition” of uniform convergence (Lemma 4.3.6) is, in the terms of (5.2.4), precisely $d_n = d(f_n, f)$. In fact, it follows from Lemmas 2.4.16 and 4.3.6 that $f_n$ converges to $f$ uniformly on $X$ if and only if $f_n$ converges to $f$ in the $L^\infty$ metric on $C^0(X)$. Along those lines, we have:

Theorem 5.2.15. For $X$ a closed and bounded subset of $\mathbf{C}$, we have that $C^0(X)$ is a complete metric space (Definition 2.5.4) under the $L^\infty$ metric.

Proof. First, we observe that if $f_n$ is a sequence in $C^0(X)$ that is Cauchy with respect to the $L^\infty$ metric, then $f_n$ is a uniformly Cauchy sequence of functions on $X$ (Problem 5.2.3). Therefore, by Theorem 4.3.5, $f_n$ must converge uniformly to some $f : X \to \mathbf{C}$. By Theorem 4.3.12, a sequence of continuous functions that converges uniformly must converge to a continuous function, and the theorem follows.

We also take this opportunity to introduce (or review) some terminology from linear algebra that we will use occasionally.

Definition 5.2.16. For a function space $V$, to say that a finite subset $\{f_1, \ldots, f_n\} \subseteq V$ is linearly independent means that if $a_1 f_1 + \cdots + a_n f_n = 0$ for some coefficients $a_i \in \mathbf{C}$, then every coefficient $a_i = 0$.

Definition 5.2.17. For a function space $V$ and a finite subset $S = \{f_1, \ldots, f_n\} \subseteq V$, the span of $S$ is defined to be the set $\{a_1 f_1 + \cdots + a_n f_n \mid a_i \in \mathbf{C}\} \subseteq V$.

For more on function spaces in the context of axiom-based linear algebra, see Appendix B.
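The $L^\infty$ metric of Definition 5.2.13 can be approximated by taking the maximum over a fine grid. The sketch below is ours (the name `d_inf` and the sample functions are not from the text, and a grid maximum only approximates the true supremum); it also illustrates the triangle inequality verified in Problem 5.2.2:

```python
import math

def d_inf(f, g, a=0.0, b=1.0, num=10001):
    # approximate sup_{x in [a,b]} |f(x) - g(x)| by a maximum over a fine grid
    pts = (a + (b - a) * j / (num - 1) for j in range(num))
    return max(abs(f(x) - g(x)) for x in pts)

f = lambda x: math.sin(2 * math.pi * x)
g = lambda x: 0.0
h = lambda x: x

print(d_inf(f, g))   # approximately 1.0
print(d_inf(g, h))   # 1.0, attained at x = 1
# triangle inequality of Theorem 5.2.14, up to grid error:
print(d_inf(f, h) <= d_inf(f, g) + d_inf(g, h))
```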

Problems.
5.2.1. Suppose $f \in C^k(S^1)$ for some $k \in \mathbf{Z}$, $k \geq 0$, and let
$$a = \int_0^1 f(x)\,dx, \qquad g(x) = f(x) - a, \qquad F(x) = \int_0^x g(t)\,dt. \tag{5.2.5}$$
(a) Prove that $F(x + 1) = F(x)$.
(b) Prove that $F \in C^{k+1}(S^1)$.
5.2.2. (Proves Theorem 5.2.14) Let $X$ be a subset of $\mathbf{C}$, and for $f, g \in C^0(X)$, define
$$d(f, g) = \sup\{|f(x) - g(x)| \mid x \in X\}. \tag{5.2.6}$$
(a) Find real-valued $f, g, h \in C^0([0, 1])$ such that $f(x) \leq g(x) \leq h(x)$ for all $x \in [0, 1]$ and $d(f, h) \neq d(f, g) + d(g, h)$.
(b) For $f, g, h \in C^0(X)$, prove that $d(f, h) \leq d(f, g) + d(g, h)$.


5.2.3. (Proves Theorem 5.2.15) Let X ⊆ 𝐂 and let fₙ be a sequence in C⁰(X) that is Cauchy with respect to the L∞ metric. Prove that fₙ is a uniformly Cauchy sequence of functions on X.

5.2.4. Let X = [a, b], and for f, g ∈ C⁰(X), define

    d(f, g) = sup {|f(x) − g(x)| ∣ x ∈ X}.   (5.2.7)

Prove for a ∈ 𝐂 that d(af, 0) = |a| d(f, 0).

5.3 Dot products

The previous sections of this chapter introduced the ideas behind two of the key tools we will use to study Fourier series, namely, function spaces and metrics defined upon them. Our favorite metric on a function space will be what is known as the L² metric, as that metric allows us to introduce geometry through a generalization of the dot product known as an inner product. Therefore, in this section, we briefly review the geometry of dot products.

Recall that the dot product ⋅ : 𝐑ⁿ × 𝐑ⁿ → 𝐑 is defined to be

    (x₁, …, xₙ) ⋅ (y₁, …, yₙ) = x₁y₁ + ⋯ + xₙyₙ   (5.3.1)

for all (x₁, …, xₙ), (y₁, …, yₙ) ∈ 𝐑ⁿ. The dot product has the following algebraic properties, as the reader may recall (or check by calculation).

Theorem 5.3.1. For v, w, x ∈ 𝐑ⁿ and c ∈ 𝐑, we have the following properties:
(1) v ⋅ w = w ⋅ v.
(2) (v + w) ⋅ x = v ⋅ x + w ⋅ x.
(3) (cv) ⋅ w = c(v ⋅ w).
(4) If x = (x₁, …, xₙ), then x ⋅ x = x₁² + ⋯ + xₙ².

Proof. Problem 5.3.1.

What may be less familiar to the reader is that many key features of Euclidean geometry can be expressed in terms of dot products. For example, Theorem 5.3.1(4) shows that the standard Euclidean length of a vector v is ‖v‖ = √(v ⋅ v). It is also a fact from 2- and 3-dimensional geometry that if θ is the angle between vectors v and w, then

    cos θ = (v ⋅ w)/(‖v‖ ‖w‖).   (5.3.2)

Generalizing to 𝐑ⁿ, we may instead define the angle θ between nonzero v, w ∈ 𝐑ⁿ by θ = cos⁻¹((v ⋅ w)/(‖v‖ ‖w‖)). In particular, if v ⋅ w = 0, then θ = π/2, and we say that v and w are orthogonal. Orthogonality turns out to be useful for many reasons. For example, to say that {v₁, …, vₖ} ⊆ 𝐑ⁿ is orthonormal means that

    vᵢ ⋅ vⱼ = 1 if i = j, and vᵢ ⋅ vⱼ = 0 if i ≠ j.   (5.3.3)


Orthonormal bases also give conveniently computed coordinates, in that if {v₁, …, vₙ} is an orthonormal set in 𝐑ⁿ and

    w = a₁v₁ + ⋯ + aₙvₙ   (5.3.4)

for some w ∈ 𝐑ⁿ, then aᵢ = w ⋅ vᵢ (Problem 5.3.2). As we shall see in Section 7.1, the dot product also generalizes to 𝐂ⁿ in a straightforward manner (Example 7.1.5). For now, however, we generalize the dot product in a different manner, so we can introduce an example that is quite important later.

Definition 5.3.2. Let X = 𝐍 or 𝐙. While a function a : X → 𝐂 is, technically speaking, either a sequence or a "two-sided sequence," we will later need to work with sequences of such functions, so in the context we are about to consider, we use the notation a(x) to refer to a : X → 𝐂. In any case, for X = 𝐍 or 𝐙, we define ℓ²(X) to be the set of all a : X → 𝐂 such that

    ‖a(x)‖² = ∑_{x∈X} |a(x)|²   (5.3.5)

is finite. Note that since (5.3.5) is a series with nonnegative terms, it can be shown that the order of summation does not matter (see Appendix A), so the same definition actually works for any countable set X.

It turns out that we can think of ℓ²(X) (X = 𝐍 or 𝐙) as a function space with a dot product on it, in the following sense.

Theorem 5.3.3. For X = 𝐍 or 𝐙, ℓ²(X) is a function space, and for a(x), b(x) ∈ ℓ²(X),

    ⟨a(x), b(x)⟩ = ∑_{x∈X} a(x) b̄(x)   (5.3.6)

converges absolutely, where b̄(x) denotes the complex conjugate of b(x).

Proof. For a(x), b(x) ∈ ℓ²(X) and c ∈ 𝐂, standard properties of series imply that ‖ca(x)‖² = |c|² ‖a(x)‖², and the fact that a(x) + b(x) ∈ ℓ²(X) is proved in Problem 5.3.3. It follows that ℓ²(X) is a function space on X. As for the absolute convergence of (5.3.6), again, see Problem 5.3.3.

See Problem 7.1.9 for an alternative proof of Theorem 5.3.3.

Problems.

5.3.1. (Proves Theorem 5.3.1) Fix v, w, x ∈ 𝐑ⁿ and c ∈ 𝐑.
(a) Prove that v ⋅ w = w ⋅ v.
(b) Prove that (v + w) ⋅ x = v ⋅ x + w ⋅ x.
(c) Prove that (cv) ⋅ w = c(v ⋅ w).
(d) Prove that if v = (v₁, …, vₙ), then v ⋅ v = v₁² + ⋯ + vₙ².

5.3.2. Suppose {v₁, …, vₙ} is an orthonormal set in 𝐑ⁿ and w = a₁v₁ + ⋯ + aₙvₙ for some w ∈ 𝐑ⁿ. Prove that aᵢ = vᵢ ⋅ w.


5.3.3. (Proves Theorem 5.3.3) Let X = 𝐍 or 𝐙 and suppose that a(x), b(x) ∈ ℓ²(X).
(a) Prove that for all x ∈ X,

    |a(x)b(x)| ≤ (1/2)|a(x)|² + (1/2)|b(x)|².   (5.3.7)

(b) Prove that

    ⟨a(x), b(x)⟩ = ∑_{x∈X} a(x) b̄(x)   (5.3.8)

converges absolutely (b̄ denoting the complex conjugate).
(c) Prove that a(x) + b(x) ∈ ℓ²(X).

6 Fourier series

    The enigma which, about 2,500 years ago, Pythagoras proposed to science, which investigates the reasons of things, "Why is consonance determined by the ratios of small whole numbers?" has been solved. . . . The resolution into partial tones, mathematically expressed, is effected by Fourier's law, which [shows] how any periodically variable magnitude, whatever be its nature, can be expressed by a sum of the simplest periodic magnitudes. . . . Ultimately, then, the reason of the rational numerical relations of Pythagoras is to be found in the theorem of Fourier, and in one sense this theorem may be considered as the prime source of the theory of harmony.
    — Hermann von Helmholtz, On the Sensations of Tone

In this chapter, we introduce Fourier series and prove some initial results. To begin with, we define Fourier polynomials (Section 6.1) and then Fourier series (Section 6.2). After examining the special case of real-valued functions (Section 6.3), we discuss what we can prove about pointwise convergence of Fourier series with relatively straightforward methods and why it will be useful to have fancier and better methods available to us (Section 6.4).

6.1 Fourier polynomials

The goal of the rest of Part 2 is to understand how we may "best" (in a sense to be made precise later) approximate a given function f with domain S¹ with functions of the following type.

Definition 6.1.1. A trigonometric polynomial of degree N is a function p : S¹ → 𝐂 of the form

    p(x) = ∑_{n=−N}^{N} cₙeₙ(x)   (6.1.1)

for some coefficients cₙ ∈ 𝐂, where again, eₙ(x) = e^{2πinx}.


We describe the function p(x) in (6.1.1) as trigonometric because each eₙ(x) can be expanded as

    eₙ(x) = e^{2πinx} = cos(2πnx) + i sin(2πnx).   (6.1.2)

On the other hand, p(x) is also a polynomial, because if we let q = e^{2πix}, then eₙ = qⁿ, and the sum in (6.1.1) becomes ∑_{n=−N}^{N} cₙqⁿ, a Laurent polynomial (polynomial with both positive and negative integer power terms) in q. Note that we would usually only say that the degree of such a polynomial is N if either c_N ≠ 0 or c₋N ≠ 0, but out of laziness, we allow the possibility of c_N = c₋N = 0, to avoid having to say "degree at most N" repeatedly.

In any case, we may now ask the imprecise question: Which trigonometric polynomial(s) best approximate a given f : S¹ → 𝐂? Better yet, keeping Chapter 5 in mind, we may ask (still imprecisely):

Question 6.1.2. For a given f : S¹ → 𝐂, which trigonometric polynomials best approximate f on average?

As in Chapter 5, by "on average" we mean something like "having an absolute error function with the smallest possible integral on S¹."

Now, we hope the reader finds it plausible that for the trigonometric polynomial p(x) in (6.1.1) to approximate f : S¹ → 𝐂 well, it should at least have the same average behavior as f(x). We therefore spend the rest of this section examining the average behavior of p(x) on S¹. For example:

Theorem 6.1.3. Let p(x) be a trigonometric polynomial given by (6.1.1). Then

    ∫_{S¹} p(x) dx = ∫₀¹ p(x) dx = c₀.   (6.1.3)

Proof. Problem 6.1.1.

Emboldened by the success of Theorem 6.1.3 in extracting the constant term of p(x) based on its average behavior, we may ask: Can we do the same for the other coefficients of p(x)? The following theorem shows that the answer is yes.

Theorem 6.1.4. Let p(x) be a trigonometric polynomial given by (6.1.1). Then for any n such that −N ≤ n ≤ N, we have

    ∫₀¹ p(x) ēₙ(x) dx = cₙ,   (6.1.4)

where ēₙ(x) = e^{−2πinx} denotes the complex conjugate of eₙ(x).

Proof. Problem 6.1.2.

We may therefore (a bit presumptuously) conclude: If a trigonometric polynomial p(x) is to approximate some integrable f : S¹ → 𝐂 well on average, then we should have

    ∫₀¹ f(x) ēₙ(x) dx = ∫₀¹ p(x) ēₙ(x) dx = cₙ.   (6.1.5)

This leads to the following definition.


Definition 6.1.5. Let f : S¹ → 𝐂 be integrable. For n ∈ 𝐙, we define

    f̂(n) = ∫₀¹ f(x) ēₙ(x) dx   (6.1.6)

to be the nth Fourier coefficient of f, where ēₙ(x) = e^{−2πinx} denotes the complex conjugate of eₙ(x). We define the Nth Fourier polynomial f_N of f to be

    f_N(x) = ∑_{n=−N}^{N} f̂(n)eₙ(x).   (6.1.7)

In other words, f_N(x) is the trigonometric polynomial of degree N whose coefficients are the Fourier coefficients f̂(n).
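The coefficient-extraction property of Theorem 6.1.4 can be observed numerically. The sketch below is our own illustration (the test coefficients and the grid size are arbitrary choices, not from the text): it approximates f̂(n) = ∫₀¹ f(x)e^{−2πinx} dx by a Riemann sum and recovers the coefficients of a hand-built trigonometric polynomial.

```python
# A sketch of Theorem 6.1.4: integrating p(x) against conj(e_n(x)) over
# [0, 1] recovers the coefficient c_n of a trigonometric polynomial.
import cmath

def e(n, x):
    """e_n(x) = exp(2 pi i n x)."""
    return cmath.exp(2j * cmath.pi * n * x)

def fourier_coeff(f, n, samples=2048):
    """Riemann-sum approximation of int_0^1 f(x) * conj(e_n(x)) dx."""
    return sum(f(k / samples) * e(n, k / samples).conjugate()
               for k in range(samples)) / samples

c = {-2: 1 - 2j, 0: 0.5, 3: 4.0}                 # arbitrary test coefficients
p = lambda x: sum(cn * e(n, x) for n, cn in c.items())

# Each coefficient comes back, and absent frequencies come back as 0.
for n in range(-3, 4):
    assert abs(fourier_coeff(p, n) - c.get(n, 0)) < 1e-9
```

(For a trigonometric polynomial, an equally spaced Riemann sum with more points than twice the degree is exact up to rounding, which is why such a tight tolerance works here.)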

Problems.

6.1.1. (Proves Theorem 6.1.3) Let

    p(x) = ∑_{n=−N}^{N} cₙeₙ(x).   (6.1.8)

Prove that

    ∫_{S¹} p(x) dx = ∫₀¹ p(x) dx = c₀.   (6.1.9)

6.1.2. (Proves Theorem 6.1.4) Let

    p(x) = ∑_{n=−N}^{N} cₙeₙ(x).   (6.1.10)

Prove that

    ∫₀¹ p(x) ēₖ(x) dx = cₖ,   (6.1.11)

where −N ≤ k ≤ N. (Note the change of subscript from n in (6.1.4) to k in (6.1.11), which does not change the meaning of the equation but will help to avoid confusion between the constant subscript k and the variable subscript n appearing in the definition of p(x).)

6.2 Fourier series

Continuing our chain of plausibilities from Section 6.1, we may reason that if the Fourier polynomials of f : S¹ → 𝐂 are good approximations of f, then they will converge to f as N → ∞. Put another way, we may make the idea of "good approximation" precise by saying that the f_N are good approximations of f if lim_{N→∞} f_N = f in some appropriate sense: pointwise, uniform, or "on average" (a term that, again, we will make precise later). For now, we will be content to introduce one of the two principal objects of study in Part 2.

Definition 6.2.1. Let f : S¹ → 𝐂 be integrable, and recall that for n ∈ 𝐙, we define

    f̂(n) = ∫₀¹ f(x) ēₙ(x) dx.   (6.2.1)

(6.2.1)


We define the Fourier series of f to be the limit of its Fourier polynomials as N → ∞:

    f(x) ∼ lim_{N→∞} f_N(x) = ∑_{n=−∞}^{∞} f̂(n)eₙ(x) = ∑_{n∈𝐙} f̂(n)eₙ(x).   (6.2.2)

Note that the notation ∼ indicates merely that what is on the right-hand side is the Fourier series of f and need not have any implications in terms of convergence (uniform, pointwise, or otherwise). Note also that by definition, the series in (6.2.2) is summed synchronously (Definition 4.1.8).

As discussed in Section 6.1, we choose the coefficients f̂(n) so that the Fourier polynomials f_N(x) approximate f(x) well on average. As further motivation for the definition of f̂(n) in (6.2.1), we have that the Fourier series of a trigonometric polynomial p(x) is p(x) itself (Problem 6.2.1). More generally, if we have a trigonometric series that converges uniformly to f(x), then its coefficients must be the Fourier coefficients f̂(n):

Theorem 6.2.2. Let f : S¹ → 𝐂 be integrable and let

    g_N(x) = ∑_{n=−N}^{N} cₙeₙ(x)   (6.2.3)

be a sequence of trigonometric polynomials such that g_N converges to f uniformly on [0, 1] (i.e., on S¹). Then

    cₙ = f̂(n) = ∫₀¹ f(x) ēₙ(x) dx.   (6.2.4)

Proof. Problem 6.2.2.

Remark 6.2.3. As mentioned above, for Fourier series, synchronous summation (Definition 4.1.8) is the natural order of summation coming from (6.2.2). Note also, however, that Fourier series provide natural examples where we need to be careful about the order of summation, in that we can choose coefficients f̂(n) ∈ 𝐑 such that for fixed x (say, x = 0), by allowing N to go to +∞ and −∞ at different rates, we can get (6.2.2) to converge to any real number that we want, or even +∞ or −∞! See Appendix A, and Example A.5 in particular, for details; again, the point to keep in mind is that when a series converges absolutely, the order of summation does not matter (Corollary A.3).

Remark 6.2.4. We will later see a fancier version of Definition 6.2.1 in (8.1.3), but rest assured, Definition 6.2.1 will always work for an integrable f.

We now present several examples, leaving computations to the reader (Problems 6.2.3–6.2.7). Note that the one tricky aspect of computing Fourier coefficients is that we often have to handle f̂(0) separately from f̂(n) with n ≠ 0.

Example 6.2.5 (Square wave). Let f : S¹ → 𝐂 be given by

    f(x) = { 1 if 0 ≤ x < 1/2,
             0 if 1/2 ≤ x < 1.   (6.2.5)

Then

    f̂(0) = 1/2,  f̂(n) = (1 − (−1)ⁿ)/(2πin) for n ≠ 0.   (6.2.6)

To give an idea of what the Fourier series of f looks like, we have that the 3rd Fourier polynomial f₃(x) is

    f₃(x) = −(1/(3πi))e₋₃(x) − (1/(πi))e₋₁(x) + 1/2 + (1/(πi))e₁(x) + (1/(3πi))e₃(x),   (6.2.7)

and one might imagine the Fourier series of f as continuing similarly in both directions. Looking further on in the sequence of Fourier polynomials, Figure 6.2.1 compares f(x) and f₁₉(x).

[Figure 6.2.1. Fourier approximation of square wave, N = 19]
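The closed forms in (6.2.6) can be cross-checked by direct numerical integration, using the convention f̂(n) = ∫₀¹ f(x)e^{−2πinx} dx. The sketch below is our own check (the grid size is an arbitrary choice, and the value of f at the jump point does not affect the integrals).

```python
# Cross-check of the square-wave coefficients (6.2.6) against a Riemann-sum
# approximation of fhat(n) = int_0^1 f(x) exp(-2 pi i n x) dx, where
# f = 1 on [0, 1/2) and 0 on [1/2, 1).
import cmath

def fhat(n, samples=100000):
    # f vanishes on [1/2, 1), so only the first half of the grid contributes.
    return sum(cmath.exp(-2j * cmath.pi * n * k / samples)
               for k in range(samples // 2)) / samples

assert abs(fhat(0) - 0.5) < 1e-9          # fhat(0) = 1/2
for n in range(1, 6):
    closed_form = (1 - (-1) ** n) / (2j * cmath.pi * n)
    assert abs(fhat(n) - closed_form) < 1e-4
```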

Example 6.2.6 (Sawtooth wave). Let f : S¹ → 𝐂 be given by

    f(x) = x for −1/2 ≤ x < 1/2.   (6.2.8)

Then

    f̂(0) = 0,  f̂(n) = −(−1)ⁿ/(2πin) for n ≠ 0.   (6.2.9)

Again, to give an idea of what the Fourier series of f looks like, we have

    f₃(x) = −(1/(6πi))e₋₃(x) + (1/(4πi))e₋₂(x) − (1/(2πi))e₋₁(x)
            + (1/(2πi))e₁(x) − (1/(4πi))e₂(x) + (1/(6πi))e₃(x).   (6.2.10)

Note that since (1/(2i))(eₙ(x) − e₋ₙ(x)) = sin(2πnx), we also have that

    f₃(x) = sin(2πx)/π − sin(4πx)/(2π) + sin(6πx)/(3π).   (6.2.11)
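The agreement between the complex form (6.2.10) and the sine form (6.2.11) of f₃ is an exact identity, which the following sketch (our own check, sampling a grid of points) confirms numerically.

```python
# Checks that the complex partial sum f_3 of the sawtooth wave, built from
# the coefficients (6.2.9), agrees pointwise with its sine form (6.2.11).
import cmath
import math

def e(n, x):
    return cmath.exp(2j * cmath.pi * n * x)

def fhat(n):
    """Sawtooth coefficients from (6.2.9)."""
    return 0 if n == 0 else -((-1) ** n) / (2j * cmath.pi * n)

def f3_complex(x):
    return sum(fhat(n) * e(n, x) for n in range(-3, 4))

def f3_sines(x):
    return (math.sin(2 * math.pi * x) / math.pi
            - math.sin(4 * math.pi * x) / (2 * math.pi)
            + math.sin(6 * math.pi * x) / (3 * math.pi))

for k in range(50):
    x = k / 50
    assert abs(f3_complex(x) - f3_sines(x)) < 1e-12
```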


In fact, the reader may recognize the Fourier series of f as our initial motivating example (1.1.2) from Section 1.1; see Figure 1.1.1 for what convergence looks like.

Example 6.2.7 (Triangle wave). Let f : S¹ → 𝐂 be given by

    f(x) = |x| for −1/2 ≤ x < 1/2.   (6.2.12)

Then

    f̂(0) = 1/4,  f̂(n) = (2 − 2(−1)ⁿ)/(2πin)² for n ≠ 0.   (6.2.13)

[Figure 6.2.2. Fourier approximation of triangle wave, N = 5]

To give an idea of what the Fourier series of f looks like, we have

    f₃(x) = (4/(6πi)²)e₋₃(x) + (4/(2πi)²)e₋₁(x) + 1/4 + (4/(2πi)²)e₁(x) + (4/(6πi)²)e₃(x).   (6.2.14)

Looking further on in the sequence of Fourier polynomials, Figure 6.2.2 compares f(x) and f₅(x). (Note that we only go as far as N = 5 because the Fourier series of f happens to converge relatively quickly, and going much further makes it difficult to see the difference between f(x) and f_N(x).)

Example 6.2.8 (x² periodized). Let f : S¹ → 𝐂 be given by

    f(x) = x² for −1/2 ≤ x < 1/2.   (6.2.15)

Then

    f̂(0) = 1/12,  f̂(n) = −2(−1)ⁿ/(2πin)² for n ≠ 0.   (6.2.16)


Example 6.2.9 (x³ periodized). Let f : S¹ → 𝐂 be given by

    f(x) = x³ for −1/2 ≤ x < 1/2.   (6.2.17)

Then

    f̂(0) = 0,  f̂(n) = −(1/4)·(−1)ⁿ/(2πin) − 6(−1)ⁿ/(2πin)³ for n ≠ 0.   (6.2.18)

We note one last set of formulas, proved in Problem 6.2.8: For integrable f, g : S¹ → 𝐂 and a ∈ 𝐂, we have

    (af)^(n) = a f̂(n),  (f + g)^(n) = f̂(n) + ĝ(n).   (6.2.19)

Problems.

6.2.1. Prove that if p(x) = ∑_{n=−N}^{N} cₙeₙ(x), then p̂(n) = cₙ. (In other words, the Fourier series of a trigonometric polynomial p(x) is p(x) itself.)

6.2.2. (Proves Theorem 6.2.2) Let f : S¹ → 𝐂 be integrable and let

    g_N(x) = ∑_{n=−N}^{N} cₙeₙ(x)   (6.2.20)

be a sequence of trigonometric polynomials such that g_N converges to f uniformly on [0, 1]. Prove that

    cₙ = ∫₀¹ f(x) ēₙ(x) dx.   (6.2.21)

6.2.3. Show that the Fourier coefficients of

    f(x) = { 1 if 0 ≤ x < 1/2,
             0 if 1/2 ≤ x < 1   (6.2.22)

are as described in Example 6.2.5.

6.2.4. Show that the Fourier coefficients of

    f(x) = x for −1/2 ≤ x < 1/2   (6.2.23)

are as described in Example 6.2.6.

6.2.5. Show that the Fourier coefficients of

    f(x) = |x| for −1/2 ≤ x < 1/2   (6.2.24)

are as described in Example 6.2.7.

6.2.6. Show that the Fourier coefficients of

    f(x) = x² for −1/2 ≤ x < 1/2   (6.2.25)

are as described in Example 6.2.8.


6.2.7. Show that the Fourier coefficients of

    f(x) = x³ for −1/2 ≤ x < 1/2   (6.2.26)

are as described in Example 6.2.9.

6.2.8. For integrable f, g : S¹ → 𝐂 and a, b ∈ 𝐂, prove that

    (af)^(n) = a f̂(n),  (f + g)^(n) = f̂(n) + ĝ(n).   (6.2.27)

6.3 Real Fourier series

When we take the Fourier series of a real-valued function f : S¹ → 𝐑, it turns out that we get cancellation that allows us to express that series as an infinite sum of sines and cosines. In this section, we show that any Fourier series can be rewritten in terms of sines and cosines; we show that a real-valued function has a Fourier sine/cosine series with real coefficients; and we look at the real Fourier series of odd and even extensions of functions on [0, 1/2].

Note that this section is later used only in Section 11.4 and may otherwise be skipped. Nevertheless, the reader should be aware that many users of Fourier series express them in terms of sines and cosines, making this section useful for any reader interested in practical applications.

6.3.1 Fourier series in sines and cosines. First, we observe that since

    eₙ(x) = cos(2πnx) + i sin(2πnx),  e₋ₙ(x) = cos(2πnx) − i sin(2πnx),   (6.3.1)

we have that

    cos(2πnx) = (1/2)(eₙ(x) + e₋ₙ(x)),  sin(2πnx) = (1/(2i))(eₙ(x) − e₋ₙ(x)).   (6.3.2)

It follows that the span of {eₙ, e₋ₙ} is equal to the span of {cos(2πnx), sin(2πnx)}. More precisely (Problem 6.3.1), for any cₙ ∈ 𝐂, there exist aₙ, bₙ ∈ 𝐂 such that

    ∑_{n=−N}^{N} cₙeₙ(x) = a₀/2 + ∑_{n=1}^{N} (aₙ cos(2πnx) + bₙ sin(2πnx)).   (6.3.3)

Note that letting N → ∞ in (6.3.3) gives precisely the synchronous summation order of a two-sided series from Definition 4.1.8, so the synchronous order of summation of the complex form of a Fourier series precisely matches the standard order of summation of its sine/cosine form.

6.3.2 Real Fourier series of real-valued functions. The rewriting in (6.3.3) is particularly interesting in the case of the Fourier series of a real-valued function.

Theorem 6.3.1. Let f : S¹ → 𝐑 be integrable and real-valued. If

    ∑_{n=−N}^{N} f̂(n)eₙ(x) = a₀/2 + ∑_{n=1}^{N} (aₙ cos(2πnx) + bₙ sin(2πnx)),   (6.3.4)

then

    aₙ = f̂(n) + f̂(−n),  bₙ = i(f̂(n) − f̂(−n)),   (6.3.5)

and both aₙ and bₙ are real valued.


Proof. Problem 6.3.2. Note that the 1/2 in the constant term of (6.3.4) is chosen so that (6.3.5) still holds for n = 0.

Definition 6.3.2. Let f : S¹ → 𝐑 be integrable and real valued. We define the real Fourier series of f to be

    a₀/2 + ∑_{n=1}^{∞} (aₙ cos(2πnx) + bₙ sin(2πnx)),   (6.3.6)

where aₙ, bₙ ∈ 𝐑 are given by (6.3.5). Note that since (6.3.6) has partial sums that are equal to those of ∑_{n∈𝐙} f̂(n)eₙ(x), the real Fourier series of f is equal to the (complex) Fourier series of f by definition; (6.3.6) is just a different way to write the sum.

As the reader may find elsewhere, it is certainly possible to define real Fourier series without starting from complex Fourier series. To do so, we would begin with formulas for real trigonometric functions analogous to (4.6.8); namely,

    ∫₀¹ cos(2πnx) cos(2πkx) dx = 1/2 if n = k ≠ 0; 1 if n = k = 0; 0 otherwise,   (6.3.7)

    ∫₀¹ sin(2πnx) sin(2πkx) dx = 1/2 if n = k; 0 otherwise,   (6.3.8)

    ∫₀¹ sin(2πnx) cos(2πkx) dx = 0.   (6.3.9)

See Problem 6.3.3. Proceeding analogously to Sections 6.1–6.2, we would end up defining

    aₙ = 2∫₀¹ f(x) cos(2πnx) dx,  bₙ = 2∫₀¹ f(x) sin(2πnx) dx,   (6.3.10)

a set of formulas that in our approach is a result of combining (6.2.1) and Theorem 6.3.1 (Problem 6.3.4).

Remark 6.3.3. Note that the aesthetically unappealing factors of 2 in (6.3.10) are nevertheless correct and are forced upon us by the 1/2's appearing in (6.3.7) and (6.3.8). In treatments where functions on S¹ have period 2π and not 1, they can be better hidden by changing a factor of 1/(2π) to 1/π.
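The two routes to the real coefficients, via (6.3.5) from the complex coefficients and via the direct integrals (6.3.10), can be compared numerically. The sketch below is our own check on the sawtooth wave of Example 6.2.6 (the test function and grid size are our choices), using the convention f̂(n) = ∫₀¹ f(x)e^{−2πinx} dx.

```python
# Cross-check of (6.3.5) against (6.3.10) for the sawtooth wave: the
# combinations fhat(n) + fhat(-n) and i(fhat(n) - fhat(-n)) should be real
# and should equal the cosine and sine integrals, respectively.
import cmath
import math

M = 20000
def f(x):
    """Sawtooth viewed on [0, 1): x on [0, 1/2), x - 1 on [1/2, 1)."""
    return x if x < 0.5 else x - 1.0

def fhat(n):
    return sum(f(k / M) * cmath.exp(-2j * math.pi * n * k / M)
               for k in range(M)) / M

for n in (1, 2, 3):
    a_n = fhat(n) + fhat(-n)
    b_n = 1j * (fhat(n) - fhat(-n))
    a_int = 2 * sum(f(k / M) * math.cos(2 * math.pi * n * k / M)
                    for k in range(M)) / M
    b_int = 2 * sum(f(k / M) * math.sin(2 * math.pi * n * k / M)
                    for k in range(M)) / M
    assert abs(a_n.imag) < 1e-9 and abs(b_n.imag) < 1e-9   # both are real
    assert abs(a_n.real - a_int) < 1e-4
    assert abs(b_n.real - b_int) < 1e-4
```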

6.3.3 Real Fourier series of odd and even extensions. We will later find it useful to consider variations on real Fourier series that only use cosines or sines. Recall from calculus that a function g : 𝐑 → 𝐑 is even if g(−x) = g(x) for all x and odd if g(−x) = −g(x) for all x. Recall also that:

• The product of two even functions is even.
• The product of two odd functions is even.
• The product of an even and an odd function is odd.
• If g is even, then ∫_{−b}^{b} g(x) dx = 2∫₀^{b} g(x) dx.
• If g is odd, then ∫_{−b}^{b} g(x) dx = 0.

Definition 6.3.4. For f : [0, 1/2] → 𝐑, define the even and odd extensions f_even, f_odd : S¹ → 𝐑 of f by

    f_even(x) = { f(x) if 0 ≤ x ≤ 1/2,
                  f(−x) if −1/2 ≤ x < 0,

    f_odd(x) = { f(x) if 0 < x < 1/2,
                 0 if x = 0,
                 −f(−x) if −1/2 < x < 0,
                 0 if x = ±1/2.   (6.3.11)

In other words, the even (resp. odd) extension of f is the even (resp. odd) function on S¹ that agrees with f on [0, 1/2], with the possible exceptions of f_odd(0) and f_odd(1/2). More visually, as shown in Figure 6.3.1, we may think of the even and odd extensions of a given f as "extending by reflection" and "extending by rotation around the origin," respectively. Figure 6.3.1 also shows that the definitions f_odd(0) = 0 = f_odd(±1/2) are chosen to be rotation-symmetric, or alternatively, to preserve continuity when f(0) = 0 and f(1/2) = 0.

[Figure 6.3.1. Even and odd extensions of the same function on [0, 1/2]]

Remark 6.3.5. We observe that if f : [0, 1/2] → 𝐑 is in C¹([0, 1/2]) and f(0) = f(1/2) = 0, then f_odd is in C¹(S¹); and similarly, if f is in C¹([0, 1/2]) and f′(0) = f′(1/2) = 0, then f_even is in C¹(S¹). See Problem 6.3.5.

Theorem 6.3.6. For integrable f : [0, 1/2] → 𝐑, let f_even and f_odd be the even and odd extensions of f, respectively. Then the real Fourier series of f_even and f_odd have the form

    f_even(x) ∼ a₀/2 + ∑_{n=1}^{∞} aₙ cos(2πnx),   (6.3.12)

    f_odd(x) ∼ ∑_{n=1}^{∞} bₙ sin(2πnx),   (6.3.13)

where

    aₙ = 4∫₀^{1/2} f(x) cos(2πnx) dx,  bₙ = 4∫₀^{1/2} f(x) sin(2πnx) dx.   (6.3.14)

The series in (6.3.12) and (6.3.13) are called the Fourier cosine and Fourier sine series of f, respectively. Again, they are actually just equal to the Fourier series of f_even and f_odd; the point is that cancellation allows us to write them in simpler form.

Proof. Problem 6.3.6.

Remark 6.3.7. We will later show that if f is a function on [0, 1/2], under reasonable conditions, the Fourier sine and cosine series of f converge to f for all but finitely many values of x ∈ [0, 1/2]. (For example, this holds if f is "piecewise Lipschitz"; see Section 8.5.6.) As a result, if f(0) ≠ 0, it may seem paradoxical for f to be expressed as an infinite sum of sine functions, each of which is equal to 0 at x = 0; similarly, if f(1/2) ≠ 0, we seem to obtain much the same paradox.

The explanation for this phenomenon comes from the fact that the sine series of f is really the Fourier series of f_odd, and f_odd(0) = f_odd(1/2) = 0 no matter what the value of f(0) is. An analogous explanation accounts for the apparent paradox of a function with f′(0) ≠ 0 being expressed as an infinite sum of cosine functions; in that case, since f_even is not differentiable at x = 0, term-by-term differentiation at x = 0 must fail.
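A sine series of the kind described in Theorem 6.3.6 can be computed and summed numerically. The sketch below uses f(x) = x(1/2 − x) on [0, 1/2] (our own choice of a function with f(0) = f(1/2) = 0, so its odd extension is well behaved), with the quadrature grid and truncation level also our choices.

```python
# Sine coefficients b_n = 4 * int_0^{1/2} f(x) sin(2 pi n x) dx from (6.3.14),
# and a truncated sine series, for f(x) = x(1/2 - x) on [0, 1/2].
import math

M = 5000                         # quadrature points on [0, 1/2]
f = lambda x: x * (0.5 - x)

def b(n):
    h = 0.5 / M
    return 4 * sum(f(k * h) * math.sin(2 * math.pi * n * k * h)
                   for k in range(M)) * h

def sine_series(x, N=60):
    return sum(b(n) * math.sin(2 * math.pi * n * x) for n in range(1, N + 1))

# The truncated sine series already matches f well in the interior.
assert abs(sine_series(0.25) - f(0.25)) < 1e-3
```

Since f here is even about x = 1/4, the even-indexed coefficients vanish, and the odd ones decay like 1/n³, which is why so few terms suffice.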

Problems.

6.3.1. (a) Given coefficients cₙ ∈ 𝐂, find formulas for aₙ, bₙ ∈ 𝐂 such that

    ∑_{n=−N}^{N} cₙeₙ(x) = a₀/2 + ∑_{n=1}^{N} (aₙ cos(2πnx) + bₙ sin(2πnx)).   (6.3.15)

(b) Given aₙ, bₙ ∈ 𝐂, find formulas for cₙ ∈ 𝐂 such that (6.3.15) holds.

6.3.2. (Proves Theorem 6.3.1) Let f : S¹ → 𝐑 be integrable and real valued.
(a) For n ∈ 𝐙, prove that f̂(−n) equals the complex conjugate of f̂(n).
(b) Prove that if n ∈ 𝐙, then

    f̂(−n)e₋ₙ(x) + f̂(n)eₙ(x) = aₙ cos(2πnx) + bₙ sin(2πnx),

where

    aₙ = f̂(n) + f̂(−n),  bₙ = i(f̂(n) − f̂(−n)),   (6.3.16)

and aₙ, bₙ ∈ 𝐑.

6.3.3. Prove the following formulas by writing everything in terms of complex exponentials.
(a) Prove that ∫₀¹ cos(2πnx) cos(2πkx) dx = 1/2 if n = k ≠ 0; 1 if n = k = 0; 0 otherwise.
(b) Prove that ∫₀¹ sin(2πnx) sin(2πkx) dx = 1/2 if n = k; 0 otherwise.
(c) Prove that ∫₀¹ sin(2πnx) cos(2πkx) dx = 0.

6.3.4. Assuming Theorem 6.3.1, prove that

    aₙ = 2∫₀¹ f(x) cos(2πnx) dx,  bₙ = 2∫₀¹ f(x) sin(2πnx) dx   (6.3.17)

are the coefficients of the real Fourier series of f.

6.3.5. Suppose f : [0, 1/2] → 𝐑 has continuous f′ : [0, 1/2] → 𝐑.
(a) Prove that if f(0) = f(1/2) = 0, then f_odd is continuous on S¹.
(b) Prove that if f(0) = f(1/2) = 0, then f′_odd is continuous on S¹.
(c) Prove that f_even is continuous on S¹.
(d) Prove that if f′(0) = f′(1/2) = 0, then f′_even is continuous on S¹.

6.3.6. (Proves Theorem 6.3.6) Let f : [0, 1/2] → 𝐑 be integrable, and let f_even and f_odd be the even and odd extensions of f, respectively. Prove that the real Fourier series of f_even and f_odd have the form

    f_even(x) ∼ a₀/2 + ∑_{n=1}^{∞} aₙ cos(2πnx),   (6.3.18)

    f_odd(x) ∼ ∑_{n=1}^{∞} bₙ sin(2πnx),   (6.3.19)

where

    aₙ = 4∫₀^{1/2} f(x) cos(2πnx) dx,  bₙ = 4∫₀^{1/2} f(x) sin(2πnx) dx.   (6.3.20)

6.4 Convergence of Fourier series of differentiable functions

To recap: so far, we have defined the Fourier series of an integrable f : S¹ → 𝐂, but we have discussed neither the question of when that series converges nor the question of what it converges to. The reason is that these questions are extremely subtle and difficult! For example, there exist continuous f : S¹ → 𝐂 whose Fourier series diverge at uncountably many points (!) in S¹. (See Remark 8.5.20 for a discussion.) As it turns out, to get the convergence results we want, we will need both more sophisticated ideas (Chapters 7 and 8) and more honest-to-goodness hard work (Chapter 8). Instead, in this section, we will restrict ourselves to showing how close we can get to convergence with only a moderate amount of effort.

We begin with a seemingly unremarkable, but actually surprisingly important and deep, formula describing the Fourier coefficients of f′ in terms of the Fourier coefficients of f.

Theorem 6.4.1. For f ∈ C¹(S¹) and n ∈ 𝐙, we have that

    (f′)^(n) = (2πin)f̂(n).   (6.4.1)

Proof. Problem 6.4.1.

Note that Theorem 6.4.1 means that Fourier coefficients have a useful feature common to many similar transforms: Taking Fourier coefficients turns a differential operation (the derivative) into an algebraic operation (multiplication by 2πin).

Theorem 6.4.2. For f : S¹ → 𝐂, we have that:
(1) If f is continuous (i.e., f ∈ C⁰(S¹)), then there exists some constant K₀ > 0, independent of n, such that |f̂(n)| ≤ K₀ for all n ∈ 𝐙.
(2) If f ∈ C¹(S¹), then there exists some constant K₁ > 0, independent of n, such that |f̂(n)| ≤ K₁/|n| for all n ∈ 𝐙, n ≠ 0.
(3) For any integer r ≥ 2, if f ∈ Cʳ(S¹), then there exists some constant Kᵣ > 0, independent of n, such that |f̂(n)| ≤ Kᵣ/|n|ʳ for all n ∈ 𝐙, n ≠ 0.

Proof. Problem 6.4.2.

Theorem 6.4.2 also illustrates another feature of Fourier coefficients: Local behavior of f(x) (here, continuity or differentiability) determines the global behavior of f̂(n). In fact, Theorem 6.4.2 immediately implies that if f ∈ C^∞(S¹), then the Fourier coefficients of f are rapidly decaying in the following precise sense.

Corollary 6.4.3. If f ∈ C^∞(S¹), then lim_{n→∞} nᵏ f̂(n) = 0 for any integer k ≥ 0.

Remarkably, the converse to Corollary 6.4.3 is also true, and there is also a partial converse to Theorem 6.4.2. However, the proof requires us to understand convergence of Fourier series much better and therefore must wait until Section 8.5.

Returning to our original question of the convergence of the Fourier series of f, we can use Theorem 6.4.2 to get the following result.

Theorem 6.4.4. If f ∈ C²(S¹), then the Fourier series of f converges uniformly to some continuous function g such that for all n ∈ 𝐙, ĝ(n) = f̂(n).

Proof. Recall that by definition, the Fourier series of f is equal to

    lim_{N→∞} ∑_{n=−N}^{N} f̂(n)eₙ(x).   (6.4.2)

Problem 6.4.4 shows that (6.4.2) converges uniformly to some g, which must be continuous by Theorem 4.3.12; and Theorem 6.2.2 then implies that for all n ∈ 𝐙, ĝ(n) = f̂(n).

Now, on the one hand, Theorem 6.4.4 gives a substantial result (the Fourier series of f converges to something that seems similar to f) while remaining relatively close to the ground, as proofs go. However, Theorem 6.4.4 has an unfortunate shortcoming in that it does not actually apply to natural, concrete examples like Examples 6.2.5–6.2.9, as those examples are not in C¹(S¹). Even more frustratingly, we are left with the highly unsatisfying situation that although the Fourier series of f converges to some function g with the same Fourier coefficients, we cannot yet be sure that g = f. In fact, by the linearity of Fourier coefficients (6.2.19), this comes down to the question:

Question 6.4.5. For a continuous function k : S¹ → 𝐂, if k̂(n) = 0 for all n ∈ 𝐙, is k = 0?

Question 6.4.5 turns out to be not so easy — could there be some magical continuous k(x) that achieves cancellation, on average, with every function eₙ(x) on [0, 1]? As we will see, the answer is no, but it will take us two chapters of new ideas and results to get there (Chapters 7 and 8).
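Both the derivative rule (6.4.1) and the rapid decay of Corollary 6.4.3 can be observed numerically. The sketch below uses the smooth 1-periodic function f(x) = exp(cos(2πx)) (our own choice of test function; the grid size is also our choice), with the convention f̂(n) = ∫₀¹ f(x)e^{−2πinx} dx approximated by an equally spaced sum.

```python
# Numerical sketch of (6.4.1), fhat'(n) = 2 pi i n fhat(n), and of the rapid
# decay of the coefficients of a C-infinity function (Corollary 6.4.3).
import cmath
import math

M = 4096
def coeff(g, n):
    """Equally spaced approximation of int_0^1 g(x) exp(-2 pi i n x) dx."""
    return sum(g(k / M) * cmath.exp(-2j * math.pi * n * k / M)
               for k in range(M)) / M

f  = lambda x: math.exp(math.cos(2 * math.pi * x))            # f in C^inf(S^1)
df = lambda x: -2 * math.pi * math.sin(2 * math.pi * x) * f(x)  # its derivative

# Derivative rule (6.4.1), checked coefficient by coefficient:
for n in range(-5, 6):
    assert abs(coeff(df, n) - 2j * math.pi * n * coeff(f, n)) < 1e-8

# Rapid decay: n^5 * |fhat(n)| still shrinks as n grows.
assert abs(coeff(f, 20)) * 20 ** 5 < abs(coeff(f, 5)) * 5 ** 5
```

(For smooth periodic functions, equally spaced sums converge to the true coefficients extremely quickly, which is why the tolerances above can be so small.)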

Problems.

6.4.1. (Proves Theorem 6.4.1) For f ∈ C¹(S¹) and n ∈ 𝐙, prove that

    (df/dx)^(n) = (2πin)f̂(n).

6.4.2. (Proves Theorem 6.4.2)
(a) Let f : S¹ → 𝐂 be continuous. Prove that there exists some constant K₀ > 0 such that |f̂(n)| ≤ K₀ for all n ∈ 𝐙.
(b) For f ∈ C¹(S¹), prove that there exists some constant K₁ > 0 such that |f̂(n)| ≤ K₁/|n| for all n ∈ 𝐙, n ≠ 0.
(c) Prove by induction on r that for any integer r ≥ 2, if f ∈ Cʳ(S¹), then there exists some constant Kᵣ > 0, independent of n, such that |f̂(n)| ≤ Kᵣ/|n|ʳ for all n ∈ 𝐙, n ≠ 0.

6.4.3. (Proves Corollary 6.4.3) Assume f ∈ C^∞(S¹). Prove that for any integer k ≥ 0, lim_{n→∞} nᵏ f̂(n) = 0.

6.4.4. (Proves Theorem 6.4.4) If f ∈ C²(S¹), prove that

    ∑_{n∈𝐙} f̂(n)eₙ(x) = lim_{N→∞} ∑_{n=−N}^{N} f̂(n)eₙ(x)   (6.4.3)

converges absolutely and uniformly to some function g.

7 Hilbert spaces

    The method of "postulating" what we want has many advantages; they are the same as the advantages of theft over honest toil.
    — Bertrand Russell, Introduction to Mathematical Philosophy

    Engineer 1: I can't seem to fix my fine Swiss watch with this hammer.
    Engineer 2: Have you tried a bigger hammer?
    — old engineering joke

As we saw in Chapter 6, we will need a bigger hammer to understand the convergence of Fourier series. Therefore, in this chapter, we introduce two new, more abstract ideas: an algebraic structure known as an inner product space (Sections 7.1–7.3) and an extension of the Riemann integral known as the Lebesgue integral (Sections 7.4–7.5). Combining inner product spaces with a notion of (metric) completeness fulfilled by the Lebesgue integral, we obtain one of the central ideas of this book, namely, a Hilbert space (Section 7.6).

The reader should be advised up front that to avoid spending half of the book talking about Lebesgue integration, we will simply axiomatize the necessary properties of the Lebesgue integral and leave the proof of its existence to a later text or class. (See the Introduction for an explanation of our "eat dessert first" philosophy.)

7.1 Inner product spaces

We briefly pause our study of analysis for an excursion into linear algebra. Specifically, our first new idea is to define the following notion. (The reader new to these ideas may find it helpful to focus on the example V = 𝐂ⁿ (Example 5.2.2), at least at first.)

Definition 7.1.1. Let V be a function space (Definition 5.2.1). We define an inner product on V to be a function ⟨⋅, ⋅⟩ : V × V → 𝐂 that satisfies the following axioms:

(1) (Linear in first variable) For any f, g, h ∈ V and a, b ∈ 𝐂, we have that ⟨af + bg, h⟩ = a⟨f, h⟩ + b⟨g, h⟩.


(2) (Hermitian) For any f, g ∈ V, ⟨g, f⟩ is the complex conjugate of ⟨f, g⟩. Note that consequently, for any f ∈ V, ⟨f, f⟩ is equal to its own complex conjugate and must therefore be in 𝐑.

(3) (Positive definite) For any f ∈ V, ⟨f, f⟩ ≥ 0, and if ⟨f, f⟩ = 0, then f = 0.

We also define an inner product space to be a function space V along with a particular choice of inner product. For brevity, to say that V is an inner product space means that its (unspecified) inner product will be denoted by ⟨⋅, ⋅⟩. Note that since V is assumed to be a function space in the above definition, we will generally use names like f and g for elements of V.

Definition 7.1.2. Let V be an inner product space. For f ∈ V, we define the norm of f to be ‖f‖ = √⟨f, f⟩. When there is the possibility of other norms on V (see Section 7.2), we call ‖f‖ = √⟨f, f⟩ the inner product norm, or L² norm, on V.

As we shall see in Section 7.2, we can think of ‖f‖ as the length of f, which will allow us to define a metric on V. We collect a few miscellaneous straightforward properties of inner products in the following theorem.

Theorem 7.1.3. Let V be an inner product space.

(1) (Antilinear in second variable) For f, g, h ∈ V and a, b ∈ 𝐂,

    ⟨f, ag + bh⟩ = ā⟨f, g⟩ + b̄⟨f, h⟩,   (7.1.1)

where ā and b̄ denote the complex conjugates of a and b.

(2) (Absolute homogeneity) For f ∈ V and a ∈ 𝐂,

    ‖af‖ = |a| ‖f‖.   (7.1.2)

Proof. Problem 7.1.1. Remark 7.1.4. The reader interested in physics should know that in contrast with Definition 7.1.1 and Theorem 7.1.3, the physics convention is usually that inner products are linear in the second variable and antilinear in the first. It will be helpful for the reader to keep two examples in mind: first, the example that is most likely to be familiar and, second, the example that is of greatest importance to us (Example 7.1.6). Example 7.1.5. For 𝑉 = 𝐂𝑛 , the dot product ⟨(𝑣1 , … , 𝑣𝑛 ), (𝑤1 , … , 𝑤𝑛 )⟩ = 𝑣1 𝑤1 + ⋯ + 𝑣𝑛 𝑤𝑛

(7.1.3)

is an inner product on 𝑉 (Problem 7.1.2). (Compare the real-valued version from Section 5.3.) Example 7.1.6. Let 𝑋 = [𝑎, 𝑏] or 𝑆 1 , and let 𝑉 = 𝐶 0 (𝑋). Then for 𝑓, 𝑔 ∈ 𝑉, ⟨𝑓, 𝑔⟩ = ∫ 𝑓(𝑥)𝑔(𝑥) 𝑑𝑥

(7.1.4)

𝑋

defines an inner product on 𝑉 (Problem 7.1.3), which we call the 𝐿2 inner product on 𝐶 0 (𝑋). Note that the 𝐿2 inner product is useful to us because when 𝑋 = 𝑆 1 and 𝑓 ∈ ̂ = ⟨𝑓, 𝑒𝑛 ⟩; that is, Fourier coefficients are defined to be inner products. 𝐶 0 (𝑆1 ), 𝑓(𝑛)
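As a numerical sketch (not from the text, purely illustrative), the axioms of Definition 7.1.1 can be checked for the dot product of Example 7.1.5 on 𝐂³:

```python
# Illustrative check of the inner product axioms (Definition 7.1.1) for the
# dot product on C^3: <v, w> = v1*conj(w1) + v2*conj(w2) + v3*conj(w3).

def inner(v, w):
    """Dot product on C^n, with the conjugate on the second variable."""
    return sum(a * b.conjugate() for a, b in zip(v, w))

f = [1 + 2j, 0 - 1j, 3 + 0j]
g = [2 - 1j, 1 + 1j, 0 + 4j]
h = [0 + 1j, 5 - 2j, 1 + 1j]
a, b = 2 - 3j, 1 + 1j

# (1) Linear in the first variable.
lhs = inner([a * x + b * y for x, y in zip(f, g)], h)
rhs = a * inner(f, h) + b * inner(g, h)
assert abs(lhs - rhs) < 1e-12

# (2) Hermitian: <g, f> is the conjugate of <f, g>, so <f, f> is real.
assert abs(inner(g, f) - inner(f, g).conjugate()) < 1e-12
assert abs(inner(f, f).imag) < 1e-12

# (3) Positive definite: <f, f> >= 0, with equality only for f = 0.
assert inner(f, f).real > 0
```

The conjugate on the second slot is exactly what makes ⟨𝑓, 𝑓⟩ real and nonnegative, so the norm ‖𝑓‖ = √⟨𝑓, 𝑓⟩ of Definition 7.1.2 makes sense.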


Taken together, Examples 7.1.5 and 7.1.6 point out that when we want to understand the 𝐿² inner product, we can think of it as being a generalization of the ordinary dot product.

The benefit of having an inner product on a function space 𝑉 is that it allows us to define geometry on 𝑉. Primarily, we have the following crucial idea.

Definition 7.1.7. Let 𝑉 be an inner product space. For 𝑓, 𝑔 ∈ 𝑉, to say that 𝑓 is orthogonal to 𝑔 means that ⟨𝑓, 𝑔⟩ = 0.

We begin with some straightforward properties of orthogonality.

Theorem 7.1.8. Let 𝑉 be an inner product space, 𝑓, 𝑔, ℎ ∈ 𝑉, and 𝑎, 𝑏 ∈ 𝐂.

(1) If 𝑓 is orthogonal to 𝑔, then 𝑔 is orthogonal to 𝑓. (We may therefore say simply that 𝑓 and 𝑔 are orthogonal.)
(2) 𝑓 is orthogonal to the zero vector/zero function 0.
(3) If each of 𝑓 and 𝑔 is orthogonal to ℎ, then 𝑎𝑓 + 𝑏𝑔 is also orthogonal to ℎ.

Proof. Problem 7.1.4.

If nonzero 𝑓, 𝑔 are orthogonal, it is helpful to picture 𝑓 and 𝑔 as “vectors at right angles.” For example:

Theorem 7.1.9 (Pythagorean Theorem). Let 𝑉 be an inner product space. If 𝑓, 𝑔 ∈ 𝑉 are orthogonal, then ‖𝑓 + 𝑔‖² = ‖𝑓‖² + ‖𝑔‖².

Proof. Problem 7.1.5.

A closely related idea is that of orthogonal projection:

Definition 7.1.10. Let 𝑉 be an inner product space, and let 𝑔 be a nonzero element of 𝑉. For 𝑓 ∈ 𝑉, we define the projection of 𝑓 onto 𝑔 to be
proj_𝑔(𝑓) = (⟨𝑓, 𝑔⟩ / ⟨𝑔, 𝑔⟩) 𝑔. (7.1.5)
(Note that ⟨𝑔, 𝑔⟩ ≠ 0 by positive definiteness.)


Figure 7.1.1. The projection of 𝑓 onto 𝑔

The idea of proj_𝑔(𝑓) is that 𝑓 can be expressed as the sum of a vector parallel to 𝑔 and a vector perpendicular to 𝑔, and the parallel part is proj_𝑔(𝑓). This idea is shown in Figure 7.1.1 and verified algebraically as follows.


Theorem 7.1.11. Let 𝑉 be an inner product space, and let 𝑔 be a nonzero element of 𝑉. For 𝑓 ∈ 𝑉, we have
⟨proj_𝑔(𝑓), 𝑔⟩ = ⟨𝑓, 𝑔⟩, (7.1.6)
⟨𝑓 − proj_𝑔(𝑓), 𝑔⟩ = 0, (7.1.7)
⟨𝑓 − proj_𝑔(𝑓), proj_𝑔(𝑓)⟩ = 0, (7.1.8)
‖proj_𝑔(𝑓)‖ ≤ ‖𝑓‖. (7.1.9)

Note that (7.1.8) says that 𝑓 is the sum of proj_𝑔(𝑓) and a vector 𝑓 − proj_𝑔(𝑓) orthogonal to proj_𝑔(𝑓), and (7.1.9) says that proj_𝑔(𝑓) is never longer than 𝑓.

Proof. Problem 7.1.6.

We can now prove two of the fundamental properties of inner product spaces. (Compare Theorem 2.3.4.)

Theorem 7.1.12. Let 𝑉 be an inner product space. For 𝑓, 𝑔 ∈ 𝑉, we have:

(1) (Cauchy-Schwarz inequality) |⟨𝑓, 𝑔⟩| ≤ ‖𝑓‖ ‖𝑔‖.
(2) (Triangle inequality) ‖𝑓 + 𝑔‖ ≤ ‖𝑓‖ + ‖𝑔‖.

Proof. For Cauchy-Schwarz, note that when 𝑔 = 0, both sides of the inequality are 0; otherwise, see Problem 7.1.7. For the triangle inequality, see Problem 7.1.8.

Recall (Definition 5.3.2) that for 𝑋 = 𝐍 or 𝐙, ℓ²(𝑋) is the set of all 𝑎 ∶ 𝑋 → 𝐂 such that
‖𝑎(𝑥)‖² = ∑_{𝑥∈𝑋} |𝑎(𝑥)|² (7.1.10)
is finite (converges). In these terms, we may now use Cauchy-Schwarz to provide an alternative proof of a fact previously mentioned in Section 5.3.

Theorem 7.1.13. For 𝑋 = 𝐍 or 𝐙, ℓ²(𝑋) is a function space on 𝑋, and for 𝑎(𝑥), 𝑏(𝑥) ∈ ℓ²(𝑋),
⟨𝑎(𝑥), 𝑏(𝑥)⟩ = ∑_{𝑥∈𝑋} 𝑎(𝑥)\overline{𝑏(𝑥)} (7.1.11)
converges absolutely.

Proof. Since absolute convergence implies that the order of summation in (7.1.11) does not matter (see Appendix A), it suffices to consider the case 𝑋 = 𝐍, which is proved in Problem 7.1.9.

We therefore have the following very useful example of an inner product space.

Theorem 7.1.14. For 𝑋 = 𝐍 or 𝐙, (7.1.11) defines an inner product on ℓ²(𝑋).

Proof. Problem 7.1.10.
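The following is an illustrative numerical sketch (not from the text) of what Theorem 7.1.13 asserts: for two real sequences in ℓ²(𝐍), each partial sum of ∑ |𝑎(𝑥)𝑏(𝑥)| is bounded via Cauchy-Schwarz by ‖𝑎‖ ‖𝑏‖, so the series converges absolutely.

```python
# Sketch: a(x) = 1/x and b(x) = 1/(x+1) both lie in l^2(N), since
# sum 1/x^2 converges. Each partial sum of sum |a(x) b(x)| is bounded by
# the product of the (partial) norms, as in Problem 7.1.9(a).
import math

def a(x):
    return 1.0 / x

def b(x):
    return 1.0 / (x + 1.0)

N = 10_000
norm_a = math.sqrt(sum(a(x) ** 2 for x in range(1, N + 1)))
norm_b = math.sqrt(sum(b(x) ** 2 for x in range(1, N + 1)))
abs_partial = sum(abs(a(x) * b(x)) for x in range(1, N + 1))

# Cauchy-Schwarz bound (Theorem 7.1.12) on the partial sums.
assert abs_partial <= norm_a * norm_b + 1e-12
```

Here the bound can even be checked by hand: ∑_{𝑥=1}^{𝑁} 1/(𝑥(𝑥+1)) telescopes to 1 − 1/(𝑁+1) < 1.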


Problems.

7.1.1. (Proves Theorem 7.1.3) Let 𝑉 be an inner product space, 𝑓, 𝑔, ℎ ∈ 𝑉, and 𝑎, 𝑏 ∈ 𝐂.
(a) Prove that ⟨𝑓, 𝑎𝑔 + 𝑏ℎ⟩ = \overline{𝑎} ⟨𝑓, 𝑔⟩ + \overline{𝑏} ⟨𝑓, ℎ⟩.
(b) Prove that ‖𝑎𝑓‖ = |𝑎| ‖𝑓‖.

7.1.2. Verify that the dot product
⟨(𝑣₁, …, 𝑣𝑛), (𝑤₁, …, 𝑤𝑛)⟩ = 𝑣₁\overline{𝑤₁} + ⋯ + 𝑣𝑛\overline{𝑤𝑛} (7.1.12)
satisfies the axioms of an inner product on 𝑉 = 𝐂ⁿ.

7.1.3. Let 𝑋 = [𝑎, 𝑏] or 𝑆¹. Verify that the 𝐿² inner product
⟨𝑓, 𝑔⟩ = ∫_𝑋 𝑓(𝑥)\overline{𝑔(𝑥)} 𝑑𝑥 (7.1.13)
satisfies the axioms of an inner product on 𝐶⁰(𝑋).

7.1.4. (Proves Theorem 7.1.8) Let 𝑉 be an inner product space, 𝑓, 𝑔, ℎ ∈ 𝑉, and 𝑎, 𝑏 ∈ 𝐂.
(a) Prove that if 𝑓 is orthogonal to 𝑔, then 𝑔 is orthogonal to 𝑓.
(b) Prove that 𝑓 is orthogonal to the zero vector/zero function 0.
(c) Prove that if each of 𝑓 and 𝑔 is orthogonal to ℎ, then 𝑎𝑓 + 𝑏𝑔 is also orthogonal to ℎ.

7.1.5. (Proves Theorem 7.1.9) Let 𝑉 be an inner product space, and suppose 𝑓, 𝑔 ∈ 𝑉 are orthogonal. Prove ‖𝑓 + 𝑔‖² = ‖𝑓‖² + ‖𝑔‖².

7.1.6. (Proves Theorem 7.1.11) Let 𝑉 be an inner product space, let 𝑔 be a nonzero element of 𝑉, and let 𝑓 be in 𝑉.
(a) Prove ⟨proj_𝑔(𝑓), 𝑔⟩ = ⟨𝑓, 𝑔⟩.
(b) Prove ⟨𝑓 − proj_𝑔(𝑓), 𝑔⟩ = ⟨𝑓 − proj_𝑔(𝑓), proj_𝑔(𝑓)⟩ = 0.
(c) Prove ‖proj_𝑔(𝑓)‖ ≤ ‖𝑓‖.

7.1.7. (Proves Theorem 7.1.12) Let 𝑉 be an inner product space, let 𝑓 ∈ 𝑉, and let 𝑔 be a nonzero element of 𝑉.
(a) For 𝑎, 𝑏 ∈ 𝐂, prove that |⟨𝑎𝑔, 𝑏𝑔⟩| = ‖𝑎𝑔‖ ‖𝑏𝑔‖.
(b) Prove that |⟨𝑓, 𝑔⟩| = ‖proj_𝑔(𝑓)‖ ‖𝑔‖.
(c) Prove that |⟨𝑓, 𝑔⟩| ≤ ‖𝑓‖ ‖𝑔‖.

7.1.8. (Proves Theorem 7.1.12) Let 𝑉 be an inner product space. Prove that if 𝑓, 𝑔 ∈ 𝑉, then ‖𝑓 + 𝑔‖ ≤ ‖𝑓‖ + ‖𝑔‖.


7.1.9. (Proves Theorem 7.1.13) Recall that ℓ²(𝐍) is the set of all 𝑎(𝑥) (𝑥 ∈ 𝐍) such that ‖𝑎(𝑥)‖² = ∑_{𝑥∈𝐍} |𝑎(𝑥)|² is finite (converges). Suppose 𝑎(𝑥), 𝑏(𝑥) ∈ ℓ²(𝐍).
(a) Prove that for all 𝑁 ∈ 𝐍,
|∑_{𝑥=1}^{𝑁} 𝑎(𝑥)\overline{𝑏(𝑥)}| ≤ ‖𝑎(𝑥)‖ ‖𝑏(𝑥)‖, (7.1.14)
where again, ‖𝑎(𝑥)‖ = √(∑_{𝑥=1}^{∞} |𝑎(𝑥)|²), and similarly for ‖𝑏(𝑥)‖.
(b) Prove that the product
⟨𝑎(𝑥), 𝑏(𝑥)⟩ = ∑_{𝑥∈𝐍} 𝑎(𝑥)\overline{𝑏(𝑥)} (7.1.15)
converges absolutely.
(c) Prove that if 𝑎(𝑥), 𝑏(𝑥) ∈ ℓ²(𝐍), then 𝑎(𝑥) + 𝑏(𝑥) ∈ ℓ²(𝐍).

7.1.10. (Proves Theorem 7.1.14) Let 𝑋 = 𝐍 or 𝐙. Verify that the product
⟨𝑎(𝑥), 𝑏(𝑥)⟩ = ∑_{𝑥∈𝑋} 𝑎(𝑥)\overline{𝑏(𝑥)} (7.1.16)
satisfies the axioms of an inner product on ℓ²(𝑋).

7.2 Normed spaces

To talk about ideas like limits and continuity in an inner product space, it is helpful to consider the following more general idea.

Definition 7.2.1. Let 𝑉 be a function space. A norm on 𝑉 is a function ‖⋅‖ ∶ 𝑉 → 𝐑 that satisfies the following axioms:

(1) (Positive definite) For all 𝑓 ∈ 𝑉, ‖𝑓‖ ≥ 0, and if ‖𝑓‖ = 0, then 𝑓 = 0.
(2) (Absolute homogeneity) For all 𝑓 ∈ 𝑉 and 𝑎 ∈ 𝐂, ‖𝑎𝑓‖ = |𝑎| ‖𝑓‖.
(3) (Triangle inequality) For all 𝑓, 𝑔 ∈ 𝑉, ‖𝑓 + 𝑔‖ ≤ ‖𝑓‖ + ‖𝑔‖.

Analogously to Definition 7.1.1, a normed space is a function space with a (possibly unspecified) choice of norm. As before, we continue to use names like 𝑓 and 𝑔 for “vectors” in 𝑉.

Example 7.2.2. If 𝑉 is an inner product space (Definition 7.1.1), then the inner product norm/𝐿² norm on 𝑉 (Definition 7.1.2) is a norm in the sense of Definition 7.2.1: Positive definiteness follows by definition, homogeneity follows by Theorem 7.1.3, and the triangle inequality follows by Theorem 7.1.12. In particular, 𝐂 is a normed space with ‖𝑧‖ = |𝑧|.

Example 7.2.3. Let 𝑋 = [𝑎, 𝑏] or 𝑆¹, and consider the 𝐿∞ metric on 𝑉 = 𝐶⁰(𝑋) (Definition 5.2.13). If we define
‖𝑓‖ = 𝑑(𝑓, 0) = sup {|𝑓(𝑥)| ∣ 𝑥 ∈ 𝑋}, (7.2.1)

then ‖⋅‖ is a norm on 𝑉. Positive definiteness and the triangle inequality follow by Theorem 5.2.14, as they are part of the definition of metric, and homogeneity follows by Problem 5.2.4. We call this norm the 𝐿∞ norm on 𝑉.

Example 7.2.4. Let 𝑋 = [𝑎, 𝑏] or 𝑆¹, let 𝑉 = 𝐶⁰(𝑋), and define
‖𝑓‖ = ∫_𝑋 |𝑓(𝑥)| 𝑑𝑥. (7.2.2)
(Compare Section 5.1.) Positive definiteness follows by Lemma 3.4.9, absolute homogeneity follows by the linearity of the integral, and the norm triangle inequality follows by the triangle inequality |𝑓(𝑥) + 𝑔(𝑥)| ≤ |𝑓(𝑥)| + |𝑔(𝑥)| in 𝐂 and Theorem 3.4.2. We call this norm the 𝐿¹ norm on 𝑉.

Notation 7.2.5. When there is the possibility of using the 𝐿² norm (Example 7.2.2), the 𝐿∞ norm (Example 7.2.3), and the 𝐿¹ norm (Example 7.2.4) on the same function space 𝑉, for 𝑓 ∈ 𝑉, we use ‖𝑓‖ to denote the 𝐿² norm, ‖𝑓‖∞ to denote the 𝐿∞ norm, and ‖𝑓‖₁ to denote the 𝐿¹ norm. See also Remark 7.5.2, below.

For us, the most interesting thing about a normed space 𝑉 is its associated metric, which will allow us to define convergence and continuity in 𝑉.

Definition 7.2.6. Let 𝑉 be a normed space. We define the norm metric on 𝑉 by 𝑑(𝑓, 𝑔) = ‖𝑓 − 𝑔‖. Note that all of the axioms of a metric follow immediately from the definition of norm, except perhaps 𝑑(𝑓, 𝑔) = 𝑑(𝑔, 𝑓), which follows because ‖𝑓 − 𝑔‖ = |−1| ‖𝑔 − 𝑓‖ by homogeneity.

Recall that in Definition 2.4.15, we defined the limit of a sequence in any metric space, and as normed spaces are metric spaces, this definition applies to sequences in normed spaces. Nevertheless, the reader may find it helpful to see Definition 2.4.15 rewritten in our new setting, as follows.

Definition 7.2.7. For a sequence 𝑓𝑛 in a normed space 𝑉 and 𝑓 ∈ 𝑉, to say that lim_{𝑛→∞} 𝑓𝑛 = 𝑓 means that for every 𝜖 > 0, there exists some 𝑁(𝜖) ∈ 𝐑 such that if 𝑛 > 𝑁(𝜖), then ‖𝑓𝑛 − 𝑓‖ < 𝜖. The terms convergent, divergent, and so on are again used as before.

Remark 7.2.8. Let 𝑉 = 𝐶⁰([0, 1]), and consider a sequence 𝑓𝑛 in 𝑉. Note that we have now defined lim_{𝑛→∞} 𝑓𝑛 = 𝑓 in four different ways:

• Pointwise convergence: For every 𝑥 ∈ [0, 1], lim_{𝑛→∞} 𝑓𝑛(𝑥) = 𝑓(𝑥).
• Uniform, or 𝐿∞, convergence: If ‖⋅‖∞ is the 𝐿∞ norm on 𝐶⁰([0, 1]), then lim_{𝑛→∞} ‖𝑓𝑛 − 𝑓‖∞ = 0, or in other words (see Lemma 4.3.6), 𝑓𝑛 converges uniformly to 𝑓 on [0, 1].
• 𝐿¹ convergence: lim_{𝑛→∞} ∫₀¹ |𝑓𝑛(𝑥) − 𝑓(𝑥)| 𝑑𝑥 = 0.
• 𝐿² convergence/inner product norm: lim_{𝑛→∞} ∫₀¹ |𝑓𝑛(𝑥) − 𝑓(𝑥)|² 𝑑𝑥 = 0. (Note that this is the norm squared converging to 0, which is equivalent to the norm converging to 0.)
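These four senses of convergence really are different. As an illustrative sketch (not from the text), consider 𝑓𝑛(𝑥) = 𝑥ⁿ on [0, 1]: its 𝐿¹ and 𝐿² distances to the zero function can be computed in closed form and go to 0, while its sup distance to 0 is always 1 (and its pointwise limit at 𝑥 = 1 is 1, not 0).

```python
# Sketch: f_n(x) = x^n on [0,1]. Exact integrals: int_0^1 x^(np) dx = 1/(np+1),
# so the L^1 and L^2 distances from f_n to the zero function are computable
# without numerical integration.

def l1_dist(n):
    # ||f_n - 0||_1 = int_0^1 x^n dx = 1/(n+1)
    return 1.0 / (n + 1)

def l2_dist(n):
    # ||f_n - 0||_2 = sqrt(int_0^1 x^(2n) dx) = sqrt(1/(2n+1))
    return (1.0 / (2 * n + 1)) ** 0.5

def sup_dist(n):
    # ||f_n - 0||_inf = sup over [0,1] of x^n = 1 (attained at x = 1)
    return 1.0

# L^1 and L^2 convergence to 0, but no uniform convergence to 0.
assert l1_dist(1000) < 0.01 and l2_dist(1000) < 0.05
assert sup_dist(1000) == 1.0
```

This is the same phenomenon Example 7.2.19 below exploits: the 𝐿² limit of a sequence of continuous functions can fail to be continuous.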


On the one hand, uniform convergence implies the other three senses of convergence, a fortiori in the case of pointwise convergence, and by Theorem 4.3.14 in the case of 𝐿¹ and 𝐿² convergence. However, the converses of those statements do not hold. Specifically, Example 4.2.6 gives an example where 𝑓𝑛 converges to 0 pointwise, but not in either 𝐿¹ or 𝐿²; and a slight modification of Problem 7.2.1 gives an example of a sequence 𝑓𝑛 in 𝐶⁰([0, 1]) that converges to 0 in both 𝐿¹ and 𝐿², but not pointwise at any 𝑥 ∈ [0, 1]. (For the remaining possible implications, see Problem 7.2.8.) The upshot is that when we say that 𝑓𝑛 converges to 𝑓, we must make sure that the sense in which we mean that statement is clear.

In fact, since normed spaces have the operations of addition and scalar multiplication, we may also prove the corresponding limit theorems for such sequences by taking our proofs in 𝐂 and replacing |⋅| with ‖⋅‖. To be precise, in the following, let 𝑉 be a normed space.

Definition 7.2.9. To say that a nonempty subset 𝑆 of 𝑉 is bounded means that there exists some 𝑀 > 0 such that for 𝑓 ∈ 𝑆, ‖𝑓‖ < 𝑀. To say that a sequence 𝑓𝑛 in 𝑉 is bounded means that {𝑓𝑛} (the set of its values) is bounded.

Theorem 7.2.10. If 𝑓𝑛 is a convergent sequence in 𝑉, then 𝑓𝑛 is bounded.

Proof. Problem 7.2.2.

Theorem 7.2.11. Let 𝑓𝑛 and 𝑔𝑛 be sequences in 𝑉, and suppose that lim_{𝑛→∞} 𝑓𝑛 = 𝑓, lim_{𝑛→∞} 𝑔𝑛 = 𝑔, and 𝑐 ∈ 𝐂. Then we have that:

(1) lim_{𝑛→∞} 𝑐𝑓𝑛 = 𝑐𝑓.
(2) lim_{𝑛→∞} (𝑓𝑛 + 𝑔𝑛) = 𝑓 + 𝑔.

Proof. Problem 7.2.3.

We shall also occasionally use the idea of continuous functions between normed spaces. In principle, this has been defined in Definition 3.1.3, but again, the reader may find it convenient to see it repeated in this setting:

Definition 7.2.12. Let 𝑇 ∶ 𝑉 → 𝑊 be a function, where 𝑉 and 𝑊 are normed spaces (e.g., 𝑊 = 𝐂). For 𝑔 ∈ 𝑉, to say that 𝑇 is continuous at 𝑔 means that one of the following conditions holds:

• (Sequential continuity) For every sequence 𝑓𝑛 in 𝑉 such that lim_{𝑛→∞} 𝑓𝑛 = 𝑔, we have that lim_{𝑛→∞} 𝑇(𝑓𝑛) = 𝑇(𝑔).
• (𝜖-𝛿 continuity) For every 𝜖 > 0, there exists some 𝛿(𝜖) > 0 such that if 𝑓 ∈ 𝑉 and ‖𝑓 − 𝑔‖ < 𝛿(𝜖), then ‖𝑇(𝑓) − 𝑇(𝑔)‖ < 𝜖.

To say that 𝑇 is continuous on 𝑉 means that 𝑇 is continuous at 𝑓 for all 𝑓 ∈ 𝑉. One immediate application for continuity is the following handy result.


Theorem 7.2.13. Let 𝑉 be an inner product space, and fix 𝑔 ∈ 𝑉. Then the function 𝑇𝑔 ∶ 𝑉 → 𝐂 defined by 𝑇𝑔(𝑓) = ⟨𝑓, 𝑔⟩ is continuous on 𝑉, and similarly for 𝑇ᵍ(𝑓) = ⟨𝑔, 𝑓⟩. In other words, an inner product is continuous in each variable.

Proof. For 𝑔 = 0, 𝑇𝑔 is the zero function, so it suffices to consider the 𝑔 ≠ 0 case; see Problem 7.2.4.

We will usually apply Theorem 7.2.13 in the following form.

Corollary 7.2.14. Let 𝑉 be an inner product space, and suppose that ∑_{𝑛=1}^{∞} 𝑓𝑛 converges to 𝑓 in the inner product norm. Then
⟨𝑓, 𝑔⟩ = ∑_{𝑛=1}^{∞} ⟨𝑓𝑛, 𝑔⟩,  ⟨𝑔, 𝑓⟩ = ∑_{𝑛=1}^{∞} ⟨𝑔, 𝑓𝑛⟩. (7.2.3)
In particular, the series on the right-hand side of each equation of (7.2.3) converges.

Note that as a special case of (7.2.3), if ∑_{𝑛=1}^{∞} 𝑓𝑛 converges to 𝑓 in the 𝐿² norm on 𝐶⁰([0, 1]), then
∫₀¹ 𝑓(𝑥)\overline{𝑔(𝑥)} 𝑑𝑥 = ∑_{𝑛=1}^{∞} ∫₀¹ 𝑓𝑛(𝑥)\overline{𝑔(𝑥)} 𝑑𝑥, (7.2.4)
and the series on the right-hand side of (7.2.4) converges.

Proof. Problem 7.2.5.

Finally, we remind the reader that the ideas of Cauchy sequences and Cauchy completeness hold in any metric space. Again, we repeat the definitions in our new setting.

Definition 7.2.15. Let 𝑉 be a normed space, and let 𝑓𝑛 be a sequence in 𝑉. To say that 𝑓𝑛 is Cauchy means that for every 𝜖 > 0, there exists some 𝑁(𝜖) ∈ 𝐑 such that if 𝑛, 𝑘 > 𝑁(𝜖), then ‖𝑓𝑛 − 𝑓𝑘‖ < 𝜖.

The analogue of Lemma 2.5.5 still holds in a normed space:

Lemma 7.2.16. If 𝑓𝑛 is a Cauchy sequence in a normed space 𝑉, then 𝑓𝑛 is bounded.

Proof. Problem 7.2.6.

Theorem 2.5.2 still implies that a convergent sequence in a normed space is Cauchy, but the converse need not be true, as shown by the following example.

Example 7.2.17. Consider the sequence 𝑓𝑛(𝑥) = |𝑥|^{1+(1/𝑛)} in 𝑉 = 𝐶¹([−1, 1]) from Problem 4.3.7 and 𝑓(𝑥) = |𝑥| in 𝐶⁰([−1, 1]), noting that 𝑓 ∉ 𝑉. As shown in Example 4.3.15, 𝑓𝑛 converges to 𝑓 uniformly on [−1, 1], which means that 𝑓𝑛 converges to 𝑓 in the 𝐿∞ norm (Example 7.2.3). Therefore, by Theorem 2.5.2, 𝑓𝑛 is a Cauchy sequence in the 𝐿∞ norm, even in the smaller space 𝑉. However, since the 𝐿∞ limit of 𝑓𝑛 is not in 𝑉, 𝑉 is not complete.


We therefore come back to the following definition, which we again restate in our new setting.

Definition 7.2.18. To say that a normed space 𝑉 is complete means that any Cauchy sequence in 𝑉 converges to some limit in 𝑉.

Now, on the one hand, we shall see that (Cauchy) completeness is a useful quality for a function space to possess. Unfortunately, the following example shows that the function space in which we are most interested so far is not complete under the 𝐿² norm. (We omit some details for brevity.)

Example 7.2.19 (Example 4.2.3 revisited). Let 𝑉 = 𝐶⁰([0, 2]), and consider the following sequence in 𝑉:
𝑓𝑛(𝑥) = { 𝑥ⁿ if 0 ≤ 𝑥 ≤ 1, 1 if 𝑥 > 1. (7.2.5)
A calculation shows that
‖𝑓𝑛 − 𝑓𝑘‖² = ∫₀¹ (𝑥ⁿ − 𝑥ᵏ)² 𝑑𝑥 + ∫₁² 0 𝑑𝑥
= ∫₀¹ (𝑥^{2𝑛} − 2𝑥^{𝑛+𝑘} + 𝑥^{2𝑘}) 𝑑𝑥
= [𝑥^{2𝑛+1}/(2𝑛+1) − 2𝑥^{𝑛+𝑘+1}/(𝑛+𝑘+1) + 𝑥^{2𝑘+1}/(2𝑘+1)]₀¹
= 1/(2𝑛+1) − 2/(𝑛+𝑘+1) + 1/(2𝑘+1). (7.2.6)
Therefore, for 𝜖 > 0, if 𝑛, 𝑘 > 𝑁(𝜖) = 2/𝜖², (7.2.6) plus the triangle inequality shows that ‖𝑓𝑛 − 𝑓𝑘‖² < 𝜖², or in other words, 𝑓𝑛 is Cauchy. However, if there were some 𝑓 ∈ 𝐶⁰([0, 2]) to which 𝑓𝑛 converged under the 𝐿² norm, one can show (for example, using ideas from Sections 7.4 and 7.5) that essentially, the only possibility is
𝑓(𝑥) = { 0 if 0 ≤ 𝑥 < 1, 1 if 𝑥 ≥ 1, (7.2.7)
which is not continuous. It follows that 𝑉 is not complete under the 𝐿² norm.

One might then try to fix the problem in Example 7.2.19 by looking at the space of all functions that are continuous on [0, 2] except at finitely many points, or even the space of (Riemann) integrable functions on [0, 2]. However, this allows us to construct sequences whose limits are even more ill-behaved, as in Example 4.2.5. Taking this process to its logical extent, we are naturally led to expand the definition of integration to what is known as the Lebesgue integral; see Sections 7.4 and 7.5 for (much) more on this idea.

As a final aside, for the reader who has seen the fundamentals of the topology of metric spaces (Section 2.6), we present an example of a space, namely, ℓ²(𝐍) (Theorem 7.1.14), where the most natural generalization of Bolzano-Weierstrass (Corollary 2.6.7) in 𝐂 does not hold. We first introduce some notation for sequences in ℓ²(𝐍).
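The closed form (7.2.6) can be sanity-checked numerically; the following sketch (illustrative, not from the text) compares it against a Riemann-sum approximation of ∫₀¹ (𝑥ⁿ − 𝑥ᵏ)² 𝑑𝑥 and confirms that the distance is small once 𝑛 and 𝑘 are both large, which is the Cauchy property in action.

```python
# Sketch: the exact L^2 distance squared between f_n and f_k from
# Example 7.2.19, per the computation (7.2.6), versus a left Riemann sum.

def dist_sq(n, k):
    return 1.0 / (2 * n + 1) - 2.0 / (n + k + 1) + 1.0 / (2 * k + 1)

def riemann(n, k, steps=200_000):
    # Left Riemann sum for int_0^1 (x^n - x^k)^2 dx.
    h = 1.0 / steps
    return sum(((i * h) ** n - (i * h) ** k) ** 2 * h for i in range(steps))

assert abs(dist_sq(3, 7) - riemann(3, 7)) < 1e-4
assert dist_sq(10_000, 20_000) < 1e-4   # tails of a Cauchy sequence are close
```

Note that the formula is symmetric in 𝑛 and 𝑘, and vanishes when 𝑛 = 𝑘, as it must.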


Notation 7.2.20. Because the elements of ℓ²(𝐍) are themselves sequences, a sequence in ℓ²(𝐍) is, by definition, a sequence of sequences. As mentioned in Definition 5.3.2, we therefore use the following somewhat unorthodox notation when discussing sequences in ℓ²(𝐍): We write single elements of ℓ²(𝐍) in the form 𝑎(𝑥), where the variable 𝑥 ranges over 𝐍, and we write sequences in ℓ²(𝐍) in the form 𝑎𝑛(𝑥), where each 𝑎𝑛(𝑥) for fixed 𝑛 is a sequence in the variable 𝑥.

Example 7.2.21. We can now describe a counterexample to the most natural generalization of Corollary 2.6.7, as promised back in Section 2.6. Let 𝑉 = ℓ²(𝐍), and for 𝑘 ∈ 𝐍, following Notation 7.2.20, let 𝑒𝑘(𝑥) be the element of ℓ²(𝐍) defined by 𝑒𝑘(𝑘) = 1 and 𝑒𝑘(𝑥) = 0 for 𝑥 ≠ 𝑘; in other words, let 𝑒₁ = (1, 0, 0, …), 𝑒₂ = (0, 1, 0, …), and so on. Then 𝒩₁(0), the closed 1-neighborhood of 0 (Definition 2.6.1), is closed and bounded (Problem 2.6.4), but the sequence 𝑒𝑘 in 𝒩₁(0) has no convergent subsequence (Problem 7.2.7).
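The reason no subsequence of 𝑒𝑘 can converge is that any two distinct 𝑒𝑗, 𝑒𝑘 sit at distance exactly √2 from each other, so no subsequence is Cauchy. A small numerical sketch (illustrative, not from the text), truncating each 𝑒𝑘 to finitely many coordinates:

```python
# Sketch: in l^2(N), ||e_j - e_k|| = sqrt(2) whenever j != k, since
# e_j - e_k has exactly two nonzero entries, +1 and -1. Truncating to
# finitely many coordinates loses nothing here because e_j - e_k is
# supported on positions j and k only.
import math

def e(k, length):
    """First `length` terms of e_k: 1 in position k, 0 elsewhere (1-indexed)."""
    return [1.0 if x == k else 0.0 for x in range(1, length + 1)]

def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

for j in range(1, 6):
    for k in range(1, 6):
        if j != k:
            assert abs(dist(e(j, 10), e(k, 10)) - math.sqrt(2)) < 1e-12
```

So the terms of the sequence never cluster, even though the sequence lives in a closed, bounded set: closed-and-bounded does not imply sequential compactness in infinite dimensions.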

Problems.

7.2.1. For 𝑘 ≥ 0 and 2ᵏ ≤ 𝑛 ≤ 2ᵏ⁺¹ − 1, define 𝑓𝑛 ∶ [0, 1] → 𝐂 by
𝑓𝑛(𝑥) = { 1 if (𝑛 − 2ᵏ)/2ᵏ ≤ 𝑥 ≤ (𝑛 + 1 − 2ᵏ)/2ᵏ, 0 otherwise. (7.2.8)
Note that for any positive integer 𝑛, 2ᵏ ≤ 𝑛 ≤ 2ᵏ⁺¹ − 1 exactly when 𝑘 = ⌊log₂(𝑛)⌋ (the greatest integer ≤ log₂(𝑛)), so (7.2.8) gives a well-defined sequence of functions 𝑓𝑛.
(a) Draw the graph of 𝑓𝑛 for 1 ≤ 𝑛 ≤ 7.
(b) Prove that lim_{𝑛→∞} ∫₀¹ 𝑓𝑛(𝑥) 𝑑𝑥 = 0, or in other words, 𝑓𝑛 converges to 0 in the 𝐿¹ norm.
(c) Prove that for any 𝑥 ∈ [0, 1], lim_{𝑛→∞} 𝑓𝑛(𝑥) does not exist.
Note: The above example does not quite live up to its billing in Remark 7.2.8, as the functions 𝑓𝑛 are not continuous. We leave it to the reader to modify the 𝑓𝑛 slightly to get the same result with continuous functions, as suggested by Figure 7.2.1. (We also suggest the reader try to understand only the basic idea and avoid formulas and other details.)

Figure 7.2.1. Adjusting (7.2.8) to be continuous

7.2.2. (Proves Theorem 7.2.10) Let 𝑉 be a normed space, and suppose that lim_{𝑛→∞} 𝑓𝑛 = 𝑓 in 𝑉. Prove that 𝑓𝑛 is bounded in 𝑉.


7.2.3. (Proves Theorem 7.2.11) Let 𝑉 be a normed space, let 𝑎 ∈ 𝐂, and suppose that lim_{𝑛→∞} 𝑓𝑛 = 𝑓 and lim_{𝑛→∞} 𝑔𝑛 = 𝑔 in 𝑉.
(a) Prove that lim_{𝑛→∞} 𝑎𝑓𝑛 = 𝑎𝑓 in 𝑉.
(b) Prove that lim_{𝑛→∞} (𝑓𝑛 + 𝑔𝑛) = 𝑓 + 𝑔 in 𝑉.

7.2.4. (Proves Theorem 7.2.13) Let 𝑉 be an inner product space, fix 𝑔 ≠ 0 in 𝑉, and let 𝑇𝑔 ∶ 𝑉 → 𝐂 be defined by 𝑇𝑔(𝑓) = ⟨𝑓, 𝑔⟩. Prove that 𝑇𝑔 is continuous on 𝑉.

7.2.5. (Proves Corollary 7.2.14) Let 𝑉 be an inner product space, fix 𝑔 ≠ 0 in 𝑉, and let 𝑇𝑔 ∶ 𝑉 → 𝐂 be defined by 𝑇𝑔(𝑓) = ⟨𝑓, 𝑔⟩. Suppose ∑_{𝑛=1}^{∞} 𝑓𝑛 converges to 𝑓 in the inner product norm.
(a) Prove that 𝑇𝑔(∑_{𝑛=1}^{𝑁} 𝑓𝑛) = ∑_{𝑛=1}^{𝑁} ⟨𝑓𝑛, 𝑔⟩.
(b) Prove that 𝑇𝑔(𝑓) = ∑_{𝑛=1}^{∞} ⟨𝑓𝑛, 𝑔⟩; in particular, prove the series on the right-hand side converges.

7.2.6. (Proves Lemma 7.2.16) Let 𝑉 be a normed space, and suppose that 𝑓𝑛 is Cauchy with respect to the norm metric. Prove that 𝑓𝑛 is bounded in 𝑉.

7.2.7. Let 𝑉 = ℓ²(𝐍), and for 𝑘 ∈ 𝐍, following Notation 7.2.20, let 𝑒𝑘(𝑥) be the element of ℓ²(𝐍) defined by 𝑒𝑘(𝑘) = 1 and 𝑒𝑘(𝑥) = 0 for 𝑥 ≠ 𝑘 (i.e., 𝑒₁ = (1, 0, 0, …), 𝑒₂ = (0, 1, 0, …), etc.). Let ℬ = {𝑒𝑘}. Prove that any subsequence of the sequence 𝑒𝑘 is not Cauchy, and therefore, not convergent.

7.2.8. (*) Consider the following types of convergence for 𝑓𝑛, 𝑓 ∈ ℛ([0, 1]).
• Pointwise convergence: For every 𝑥 ∈ [0, 1], lim_{𝑛→∞} 𝑓𝑛(𝑥) = 𝑓(𝑥).
• 𝐿¹ convergence: lim_{𝑛→∞} ∫₀¹ |𝑓𝑛(𝑥) − 𝑓(𝑥)| 𝑑𝑥 = 0.
• 𝐿² convergence/inner product norm: lim_{𝑛→∞} ∫₀¹ |𝑓𝑛(𝑥) − 𝑓(𝑥)|² 𝑑𝑥 = 0.
Of the six possible implications between two of these conditions, exactly one always holds. Find counterexamples for the other five possible implications. In particular, find sequences 𝑓𝑛 that converge to 𝑓 in the 𝐿¹ and 𝐿² norms but diverge at every value of 𝑥 ∈ [0, 1].

7.3 Orthogonal sets and bases

Returning to the setting of inner product spaces, recall that in Section 7.1, we saw how orthogonality in an inner product space 𝑉 allows us to decompose a given 𝑓 ∈ 𝑉 into pieces that are parallel and orthogonal to some other given 𝑔 ∈ 𝑉 (Theorem 7.1.11). In this section, we describe how to decompose an entire space relative to a certain type of set we call an orthogonal basis (Definition 7.3.14).


Notation 7.3.1. It will be convenient to have a uniform notation to describe sets of vectors like {𝑢₁, …, 𝑢𝑁}, {𝑢𝑖 ∣ 𝑖 ∈ 𝐍}, and {𝑢𝑖 ∣ 𝑖 ∈ 𝐙}. We do so by calling the set of subscripts that appear in such a set of vectors an index set. For example, for the above three sets of vectors, the index sets are {1, …, 𝑁}, 𝐍, and 𝐙, respectively.

We begin with the following idea.

Definition 7.3.2. Let 𝑉 be an inner product space and let 𝐼 be an index set. To say that ℬ = {𝑢𝑖 ∣ 𝑖 ∈ 𝐼} ⊂ 𝑉 is an orthogonal set means that for 𝑖 ≠ 𝑗, 𝑢𝑖 and 𝑢𝑗 are orthogonal (i.e., ⟨𝑢𝑖, 𝑢𝑗⟩ = 0). To say that ℬ = {𝑒𝑖 ∣ 𝑖 ∈ 𝐼} ⊂ 𝑉 is an orthonormal set means that ℬ is an orthogonal set and also, for every 𝑖 ∈ 𝐼, ⟨𝑒𝑖, 𝑒𝑖⟩ = 1. (Note that in general we use 𝑒𝑖 to denote a vector of norm 1 and 𝑢𝑖 to denote a vector of arbitrary norm.)

Example 7.3.3. In Example 7.1.5, we saw that 𝐂ᴺ with the dot product (7.1.3) is an inner product space. Then if 𝑒𝑛 is the standard 𝑛th basis vector (i.e., 𝑛th coordinate equal to 1 and all other coordinates equal to 0), we see that ℬ = {𝑒₁, …, 𝑒𝑁} is an orthonormal set.

Example 7.3.4. In Example 7.1.6, we saw that 𝐶⁰(𝑆¹) with the 𝐿² inner product (7.1.4) is an inner product space. Then if 𝑒𝑛(𝑥) = 𝑒^{2𝜋𝑖𝑛𝑥}, as in (4.6.1), we see that (4.6.8) says precisely that {𝑒𝑛(𝑥) ∣ 𝑛 ∈ 𝐙} is an orthonormal set.

Remark 7.3.5. The reader may note that in Definition 7.3.2 and Examples 7.3.3 and 7.3.4, we have introduced a notational ambiguity, in that 𝑒𝑛 could either refer to a vector in an arbitrary orthonormal set in an inner product space, to a standard basis vector in 𝐂ᴺ, or to 𝑒𝑛(𝑥) = 𝑒^{2𝜋𝑖𝑛𝑥} in the inner product space 𝐶⁰(𝑆¹). Rest assured, we have introduced this ambiguity intentionally to help the reader keep in mind that the orthonormal set {𝑒𝑛(𝑥) ∣ 𝑛 ∈ 𝐙} is the main reason we are interested in orthonormal sets in the first place and that the “vectors” 𝑒𝑛(𝑥) in 𝐶⁰(𝑆¹) are analogous to the standard basis vectors in 𝐂ᴺ. In any case, in the sequel, if we want to use 𝑒𝑛 to refer to a vector in an arbitrary orthonormal set, we will say so explicitly; otherwise, the reader should assume that 𝑒𝑛 means 𝑒𝑛(𝑥).

Example 7.3.6. We also see from (6.3.7)–(6.3.9) that
ℬ = {1, cos(2𝜋𝑛𝑥), sin(2𝜋𝑛𝑥) ∣ 𝑛 ∈ 𝐙, 𝑛 > 0} (7.3.1)
is an orthogonal set, but not an orthonormal one, as ‖cos(2𝜋𝑛𝑥)‖² = ‖sin(2𝜋𝑛𝑥)‖² = 1/2.

Many facts about orthogonality can be generalized from our discussion of orthogonality in 𝐑ⁿ from Section 5.3. For example, we have Problems 7.3.1 and 7.3.2 and the following.

Theorem 7.3.7 (Generalized Pythagorean Theorem). If {𝑢₁, …, 𝑢𝑛} is an orthogonal set in an inner product space 𝑉, then
‖∑_{𝑘=1}^{𝑛} 𝑢𝑘‖² = ∑_{𝑘=1}^{𝑛} ‖𝑢𝑘‖². (7.3.2)


Proof. Problem 7.3.3.

Our next step is to define the following geometric generalization of Fourier polynomials.

Definition 7.3.8. Let 𝑉 be an inner product space and let 𝐼 be an index set. If ℬ = {𝑢𝑛 ∣ 𝑛 ∈ 𝐼} is an orthogonal set of nonzero vectors in 𝑉, then for 𝑓 ∈ 𝑉 and 𝑛 ∈ 𝐼, we define the 𝑛th generalized Fourier coefficient of 𝑓 with respect to ℬ to be
𝑓̂(𝑛) = ⟨𝑓, 𝑢𝑛⟩ / ⟨𝑢𝑛, 𝑢𝑛⟩ = ⟨𝑓, 𝑢𝑛⟩ / ‖𝑢𝑛‖². (7.3.3)
If ℬ = {𝑢₁, …, 𝑢𝑁}, then we also define
proj_ℬ 𝑓 = ∑_{𝑛=1}^{𝑁} 𝑓̂(𝑛)𝑢𝑛 = ∑_{𝑛=1}^{𝑁} (⟨𝑓, 𝑢𝑛⟩ / ⟨𝑢𝑛, 𝑢𝑛⟩) 𝑢𝑛 (7.3.4)
to be the projection of 𝑓 onto the span of ℬ.

Comparing Definition 7.1.10, we see that proj_ℬ(𝑓) is precisely the sum of the projections of 𝑓 onto each 𝑢𝑛. Definition 7.3.8 also generalizes Definition 7.1.10 geometrically, in that, as we shall see in Theorem 7.3.12, 𝑓 can be expressed as the sum of a linear combination of vectors in ℬ and a vector perpendicular to each vector of ℬ, and the parallel part is proj_ℬ(𝑓), as shown in Figure 7.3.1.


Figure 7.3.1. The projection of 𝑓 onto the span of ℬ

Note that for an orthonormal set ℬ = {𝑒₁, …, 𝑒𝑁}, (7.3.3) becomes 𝑓̂(𝑛) = ⟨𝑓, 𝑒𝑛⟩. (Compare Problem 7.3.1.) In any case, letting 𝑁 go to infinity, we also have the following generalization.

Definition 7.3.9. Let 𝑉 be an inner product space, and suppose ℬ = {𝑢𝑖 ∣ 𝑖 ∈ 𝐍} is an orthogonal set of nonzero vectors in 𝑉. We define
𝑓 ∼ lim_{𝑁→∞} ∑_{𝑛=1}^{𝑁} 𝑓̂(𝑛)𝑢𝑛 = ∑_{𝑛=1}^{∞} 𝑓̂(𝑛)𝑢𝑛 (7.3.5)
to be the generalized Fourier series of 𝑓 with respect to ℬ. Note that as with ordinary Fourier series, the symbol ∼ does not indicate any assumptions about convergence. Similarly, if ℬ = {𝑢𝑖 ∣ 𝑖 ∈ 𝐙} is an orthogonal set of nonzero vectors in 𝑉, the generalized Fourier series of 𝑓 is
𝑓 ∼ ∑_{𝑛∈𝐙} 𝑓̂(𝑛)𝑢𝑛. (7.3.6)


(We interpret the two-sided series in (7.3.6) as per Definition 4.1.7, though for our most important case, the order of summation actually does not matter; see Theorem 7.6.4 and the discussion afterwards.)

Example 7.3.10. For 𝑉 = 𝐶⁰(𝑆¹) with the 𝐿² inner product and
ℬ𝑁 = {𝑒₀, 𝑒₁, 𝑒₋₁, 𝑒₂, 𝑒₋₂, …, 𝑒𝑁, 𝑒₋𝑁}, (7.3.7)
the 𝑛th Fourier coefficient 𝑓̂(𝑛) is exactly as defined in (6.2.1) (adjusting our numbering appropriately), the projection of 𝑓 onto the span of ℬ𝑁 is precisely the 𝑁th Fourier polynomial of 𝑓, and the generalized Fourier series with respect to the (infinite) orthonormal set ℬ = {𝑒₀, 𝑒₁, 𝑒₋₁, 𝑒₂, 𝑒₋₂, …} is precisely the usual Fourier series of 𝑓.

Example 7.3.11. For 𝑉 = 𝐶⁰(𝑆¹) (real valued) with the 𝐿² inner product and
ℬ𝑁 = {1, cos(2𝜋𝑥), sin(2𝜋𝑥), cos(4𝜋𝑥), sin(4𝜋𝑥), …, cos(2𝑁𝜋𝑥), sin(2𝑁𝜋𝑥)}, (7.3.8)
the generalized Fourier coefficients are 𝑎𝑛 and 𝑏𝑛 as defined in (6.3.10), and the projection of 𝑓 onto the span of ℬ𝑁 is given in (6.3.4). Note that the factors of 2 found in the formulas for 𝑎𝑛 and 𝑏𝑛 in (6.3.10) can be thought of as coming from the ⟨𝑢𝑛, 𝑢𝑛⟩ factor in (7.3.3) and the fact that ⟨cos(2𝜋𝑛𝑥), cos(2𝜋𝑛𝑥)⟩ = 1/2 = ⟨sin(2𝜋𝑛𝑥), sin(2𝜋𝑛𝑥)⟩.

The key feature of the projection proj_ℬ 𝑓 is that, in the linear-algebraic terminology of Definition 5.2.17, it is the vector in the span of ℬ that best approximates 𝑓, as measured by the 𝐿² norm. To be precise, we have the following crucial theorem.

Theorem 7.3.12 (Best Approximation Theorem). Let 𝑉 be an inner product space, let ℬ = {𝑢₁, …, 𝑢𝑁} be an orthogonal set of nonzero vectors in 𝑉, and let 𝑓 be in 𝑉.

(1) For 1 ≤ 𝑛 ≤ 𝑁, the vector 𝑓 − proj_ℬ 𝑓 is orthogonal to 𝑢𝑛.
(2) For any 𝑐₁, …, 𝑐𝑁 ∈ 𝐂, we have
‖𝑓 − ∑_{𝑛=1}^{𝑁} 𝑐𝑛𝑢𝑛‖² = ∑_{𝑛=1}^{𝑁} |𝑓̂(𝑛) − 𝑐𝑛|² ⟨𝑢𝑛, 𝑢𝑛⟩ + ‖𝑓 − proj_ℬ 𝑓‖². (7.3.9)
(3) The vector proj_ℬ 𝑓 is the unique element in the span of ℬ that is closest to 𝑓 in the 𝐿² metric.
(4) (Bessel’s inequality) We have that ‖proj_ℬ 𝑓‖ ≤ ‖𝑓‖, or in other words,
∑_{𝑛=1}^{𝑁} |𝑓̂(𝑛)|² ⟨𝑢𝑛, 𝑢𝑛⟩ ≤ ‖𝑓‖². (7.3.10)

In particular, claim (3) means that the 𝑁th Fourier polynomial of 𝑓 is the unique trigonometric polynomial of degree 𝑁 that is closest to 𝑓 in the 𝐿² metric, providing a complete answer to Question 6.1.2. Compare also Theorem 7.1.11.

Proof. Claims (1) and (2) are proven in Problem 7.3.4. As for claim (3), since the right-hand side of (7.3.9) is a sum of nonnegative terms and ‖𝑓 − proj_ℬ 𝑓‖² is independent of the 𝑐𝑛, we see that ‖𝑓 − ∑_{𝑛=1}^{𝑁} 𝑐𝑛𝑢𝑛‖ is minimized if and only if 𝑐𝑛 = 𝑓̂(𝑛). Bessel’s inequality is then a special case of (7.3.9) (Problem 7.3.5).
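As an illustrative numerical sketch (not from the text), the conclusions of Theorem 7.3.12 can be checked in 𝐑³ with an orthogonal (but not orthonormal) set ℬ = {𝑢₁, 𝑢₂}: the residual 𝑓 − proj_ℬ(𝑓) is orthogonal to each 𝑢𝑛, and Bessel's inequality holds.

```python
# Sketch: projection onto the span of an orthogonal set in R^3, using the
# generalized Fourier coefficients (7.3.3) and the projection formula (7.3.4).
import math

def inner(v, w):
    return sum(a * b for a, b in zip(v, w))

def proj(B, f):
    """proj_B(f): sum of the projections of f onto each u in B."""
    out = [0.0] * len(f)
    for u in B:
        c = inner(f, u) / inner(u, u)   # generalized Fourier coefficient
        out = [o + c * x for o, x in zip(out, u)]
    return out

u1, u2 = [1.0, 1.0, 0.0], [1.0, -1.0, 0.0]   # orthogonal, not orthonormal
f = [3.0, 5.0, 7.0]
p = proj([u1, u2], f)
r = [a - b for a, b in zip(f, p)]            # residual f - proj_B(f)

# Theorem 7.3.12(1): the residual is orthogonal to each u_n.
assert abs(inner(r, u1)) < 1e-12 and abs(inner(r, u2)) < 1e-12
# Theorem 7.3.12(4), Bessel's inequality: ||proj_B(f)|| <= ||f||.
assert math.sqrt(inner(p, p)) <= math.sqrt(inner(f, f))
```

Here the span of ℬ is the 𝑥𝑦-plane, so proj_ℬ(𝑓) is simply (3, 5, 0), and the residual (0, 0, 7) is visibly perpendicular to both basis vectors.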


We note here a strange fact about convergence of orthogonal series in the inner product metric that will come in handy later: Unlike the usual situation of convergence, where a series can get closer to and then farther away from its limit, orthogonal series always get closer. To be precise:

Corollary 7.3.13 (Always Better Theorem). Let 𝑉 be an inner product space, and let ℬ = {𝑢𝑛 ∣ 𝑛 ∈ 𝐍} be an orthogonal set of nonzero vectors in 𝑉. Then for 𝑓 ∈ 𝑉 and 1 ≤ 𝐾 ≤ 𝑁, we have that
‖𝑓 − ∑_{𝑛=1}^{𝑁} 𝑓̂(𝑛)𝑢𝑛‖ ≤ ‖𝑓 − ∑_{𝑛=1}^{𝐾} 𝑓̂(𝑛)𝑢𝑛‖. (7.3.11)
In other words, as 𝑁 increases in the (generalized) Fourier series (7.3.5) (Definition 7.3.9), the 𝐿² approximation to 𝑓 never gets worse, only better.

Proof. Problem 7.3.6.

We now finally define the key idea of this section.

Definition 7.3.14. Let 𝑉 be an inner product space. To say that ℬ = {𝑢𝑛 ∣ 𝑛 ∈ 𝐍} ⊂ 𝑉 is an orthogonal basis means that ℬ is an orthogonal set of nonzero vectors and for any 𝑓 ∈ 𝑉, the generalized Fourier series of 𝑓 converges to 𝑓 in the inner product metric. In other words, the latter condition says that for any 𝑓 ∈ 𝑉,
∑_{𝑛=1}^{∞} 𝑓̂(𝑛)𝑢𝑛 = lim_{𝑁→∞} ∑_{𝑛=1}^{𝑁} 𝑓̂(𝑛)𝑢𝑛 = 𝑓, (7.3.12)
where convergence is in the inner product norm (𝐿² norm). Similarly, to say that ℬ = {𝑢𝑛 ∣ 𝑛 ∈ 𝐙} ⊂ 𝑉 is an orthogonal basis means that ℬ is an orthogonal set of nonzero vectors, and, for any 𝑓 ∈ 𝑉,
∑_{𝑛∈𝐙} 𝑓̂(𝑛)𝑢𝑛 = 𝑓. (7.3.13)

Finally, orthonormal bases are defined analogously, replacing “orthogonal set of nonzero vectors” with “orthonormal set.”

Remark 7.3.15. Since Definition 7.3.14 is given for the general abstract setting of an inner product space, the only way to make sense of convergence in (7.3.12) and (7.3.13) is in the inner product norm. Nevertheless, since there will be other kinds of convergence possible in examples (see Remark 7.2.8), we carefully (if somewhat redundantly) specify convergence in the inner product norm in Definition 7.3.14.

Remark 7.3.16. We will later show that if ℬ = {𝑢𝑛 ∣ 𝑛 ∈ 𝐍} ⊂ 𝑉 is an orthogonal basis for 𝑉, then as 𝑁 → ∞, Bessel’s inequality (Theorem 7.3.12) converges to an equality known as Parseval’s identity (Theorem 7.6.8). However, if we do not know that ℬ is an orthogonal basis, then Bessel’s inequality is the best we can do, even as 𝑁 → ∞.

Example 7.3.17. Let 𝑉 = 𝐂ᴺ with the dot product, and let 𝑒𝑛 be the 𝑛th standard basis vector (1 ≤ 𝑛 ≤ 𝑁). Then ℬ = {𝑒𝑛 ∣ 1 ≤ 𝑛 ≤ 𝑁} is an orthonormal basis for 𝑉 (Problem 7.3.7).


Remark 7.3.18. More generally, the reader may recall from linear algebra that if ℬ is a finite subset of an inner product space 𝑉, then to say that ℬ is a basis for 𝑉 means that 𝑉 equals the span of ℬ and ℬ is linearly independent (Definition 5.2.16). However, since orthogonal sets of nonzero vectors are linearly independent (Problem 7.3.2), and for finite ℬ, the “series span” of ℬ defined by (7.3.12) is the same as the algebraic span from Definition 5.2.17, the two definitions are equivalent. See Appendix B, especially Remark B.10, for more about the relationship between linear-algebraic bases and orthogonal bases in general.

Example 7.3.19. Let 𝑋 = 𝐍 or 𝐙, let 𝑉 = ℓ²(𝑋) (Definition 5.3.2), and let 𝑒𝑛 ∈ 𝑉 be the sequence defined by
𝑒𝑛(𝑥) = { 1 if 𝑥 = 𝑛, 0 otherwise. (7.3.14)
In other words, let 𝑒𝑛 be the analogue in 𝑉 of the 𝑛th standard basis vector. Then ℬ = {𝑒𝑛 ∣ 𝑛 ∈ 𝑋} is an orthonormal basis for 𝑉 (Problem 7.3.8).

We will soon have much more to say about orthogonal bases. However, to take full advantage of them, we will need to consider not just inner product spaces, but inner product spaces with all of their “holes” filled in (see Example 7.2.19). This means that we can no longer put off discussing the Lebesgue integral, and we do so in the next section.
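The Always Better Theorem and Definition 7.3.14 can be seen together in a finite-dimensional miniature. The following sketch (illustrative, not from the text; the Walsh-like orthogonal basis of 𝐑⁴ is our choice, not the book's) shows the 𝐿² error of the partial sums of a generalized Fourier series never increasing, and reaching 0 once the whole orthogonal basis is used.

```python
# Sketch: partial sums of a generalized Fourier series with respect to an
# orthogonal basis of R^4. Errors are nonincreasing (Corollary 7.3.13) and
# reach 0 with the full basis (Definition 7.3.14).
import math

def inner(v, w):
    return sum(a * b for a, b in zip(v, w))

def partial_sum(B, f, N):
    """Sum of the first N terms f^(n) u_n of the generalized Fourier series."""
    out = [0.0] * len(f)
    for u in B[:N]:
        c = inner(f, u) / inner(u, u)
        out = [o + c * x for o, x in zip(out, u)]
    return out

# A pairwise-orthogonal (Walsh-like) basis of R^4.
B = [[1, 1, 1, 1], [1, 1, -1, -1], [1, -1, 1, -1], [1, -1, -1, 1]]
f = [2.0, -1.0, 0.5, 4.0]

errors = []
for N in range(1, 5):
    p = partial_sum(B, f, N)
    errors.append(math.sqrt(sum((a - b) ** 2 for a, b in zip(f, p))))

assert all(errors[i + 1] <= errors[i] + 1e-12 for i in range(3))  # Always Better
assert errors[-1] < 1e-9   # full orthogonal basis recovers f exactly
```

The same monotone decrease of 𝐿² error is what Chapter 8 exploits for the trigonometric system on 𝑆¹, where the basis is infinite and convergence to 0 error is exactly the content of the Inversion Theorem.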

Problems.

7.3.1. Let 𝑉 be an inner product space, and let ℬ = {𝑢𝑛 ∣ 𝑛 ∈ 𝐍} be an orthonormal set in 𝑉. Suppose that 𝑓 = ∑_{𝑛=1}^{∞} 𝑎𝑛 𝑢𝑛 , where 𝑎𝑛 ∈ 𝐂 and convergence is in the inner product norm on 𝑉. Prove that 𝑎𝑘 = ⟨𝑓, 𝑢𝑘⟩.

7.3.2. Let 𝑉 be an inner product space, and let ℬ = {𝑢1 , … , 𝑢𝑁 } be an orthogonal set of nonzero vectors in 𝑉. Prove that ℬ is linearly independent (Definition 5.2.16).

7.3.3. (Proves Theorem 7.3.7) Prove that if {𝑢1 , … , 𝑢𝑛 } is an orthogonal set in an inner product space 𝑉, then

    ‖∑_{𝑘=1}^{𝑛} 𝑢𝑘‖² = ∑_{𝑘=1}^{𝑛} ‖𝑢𝑘‖².   (7.3.15)

7.3.4. (Proves Theorem 7.3.12) Let 𝑉 be an inner product space, let ℬ = {𝑢1 , … , 𝑢𝑁 } be an orthogonal set of nonzero vectors in 𝑉, and let 𝑓 be in 𝑉.
(a) Prove that for 1 ≤ 𝑘 ≤ 𝑁, the vector 𝑓 − projℬ 𝑓 is orthogonal to 𝑢𝑘 .
(b) For 𝑐1 , … , 𝑐𝑁 ∈ 𝐂, prove that

    ‖𝑓 − ∑_{𝑛=1}^{𝑁} 𝑐𝑛 𝑢𝑛‖² = ∑_{𝑛=1}^{𝑁} |𝑓̂(𝑛) − 𝑐𝑛|² ⟨𝑢𝑛 , 𝑢𝑛⟩ + ‖𝑓 − projℬ 𝑓‖².   (7.3.16)

7.3.5. (Proves Theorem 7.3.12) Let 𝑉 be an inner product space, let ℬ = {𝑢1 , … , 𝑢𝑁 } be an orthogonal set of nonzero vectors in 𝑉, and let 𝑓 be in 𝑉. Prove Bessel’s inequality ‖projℬ 𝑓‖ ≤ ‖𝑓‖.


7.3.6. (Proves Corollary 7.3.13) Let 𝑉 be an inner product space, and let ℬ = {𝑢𝑛 } be an orthogonal set of nonzero vectors in 𝑉. Prove that for any 𝑓 ∈ 𝑉 and 1 ≤ 𝐾 ≤ 𝑁,

    ‖𝑓 − ∑_{𝑛=1}^{𝑁} 𝑓̂(𝑛)𝑢𝑛‖ ≤ ‖𝑓 − ∑_{𝑛=1}^{𝐾} 𝑓̂(𝑛)𝑢𝑛‖.   (7.3.17)

7.3.7. Let 𝑉 = 𝐂^𝑁 with the dot product, and let 𝑒𝑛 be the 𝑛th standard basis vector (1 ≤ 𝑛 ≤ 𝑁). Prove that ℬ = {𝑒𝑛 ∣ 1 ≤ 𝑛 ≤ 𝑁} is an orthonormal basis for 𝑉.

7.3.8. (*) Let 𝑉 = ℓ2 (𝐍) (Definition 5.3.2), and let 𝑒𝑛 ∈ 𝑉 be defined by

    𝑒𝑛(𝑥) = { 1  if 𝑥 = 𝑛,
              0  otherwise.   (7.3.18)

Prove that ℬ = {𝑒𝑛 ∣ 𝑛 ∈ 𝐍} is an orthonormal basis for 𝑉, as follows. Fix 𝑓(𝑥) ∈ 𝑉.
(a) Describe the generalized Fourier polynomial 𝑓𝑁(𝑥) in terms of the coordinates of 𝑓(𝑥), and prove your description.
(b) Prove that lim_{𝑁→∞} ‖𝑓 − 𝑓𝑁‖ = 0.
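The finite-dimensional case of these problems can be checked by direct computation. The following short script (ours, not the book’s; the example vectors are an arbitrary choice) verifies Bessel’s inequality and the monotone error decrease of Corollary 7.3.13 for one orthogonal set in 𝐂^3:

```python
# Sanity check (editor's illustration): generalized Fourier coefficients,
# Bessel's inequality, and Corollary 7.3.13 in V = C^3 with the dot product.

def inner(f, g):
    """Standard inner product on C^3: <f, g> = sum_k f_k * conj(g_k)."""
    return sum(a * b.conjugate() for a, b in zip(f, g))

def norm(f):
    return abs(inner(f, f)) ** 0.5

u1 = (1, 1j, 0)
u2 = (1j, 1, 0)          # orthogonal to u1: <u1, u2> = -1j + 1j = 0
f = (2, 3, 4j)

def fhat(f, u):
    """Generalized Fourier coefficient <f, u> / <u, u>."""
    return inner(f, u) / inner(u, u)

def proj(f, us):
    """proj_B f = sum over the orthogonal set us of fhat * u."""
    out = [0, 0, 0]
    for u in us:
        c = fhat(f, u)
        out = [o + c * x for o, x in zip(out, u)]
    return tuple(out)

assert abs(inner(u1, u2)) < 1e-12                  # {u1, u2} is orthogonal
p = proj(f, [u1, u2])
assert norm(p) <= norm(f) + 1e-12                  # Bessel's inequality
# Corollary 7.3.13: using more terms can only shrink the error.
err1 = norm([a - b for a, b in zip(f, proj(f, [u1]))])
err2 = norm([a - b for a, b in zip(f, p)])
assert err2 <= err1 + 1e-12
```

Here proj(f, [u1, u2]) = (2, 3, 0), so the error ‖𝑓 − projℬ 𝑓‖ = 4 comes entirely from the third coordinate, exactly as the orthogonality of the error in Problem 7.3.4(a) predicts.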

7.4 The Lebesgue integral: Measure zero

Recall from Analysis I that the real numbers 𝐑 are what we get when we start with the rational numbers 𝐐 and fill in “holes” like √2 or 𝜋 using the completeness axiom (supremum property). Equivalently, but more abstractly, one can fill in the holes of 𝐐 by forcing Cauchy completeness (Definition 2.5.4) to hold; see, for example, Hewitt and Stromberg [HS75, Sec. 5]. As we have seen, like 𝐐, the space of Riemann integrable functions also has “holes”:

• It is possible to have a sequence of Riemann integrable functions whose pointwise limit is not Riemann integrable (Example 4.2.5).

• If we look at the space 𝑉 = 𝐶 0 ([𝑎, 𝑏]) of continuous functions on a closed and bounded interval under the 𝐿2 metric, we see that 𝑉 is not complete as a metric space, just like 𝐐 (Example 7.2.19).

We can fill in those “holes” of the Riemann integral by defining what is known as the Lebesgue integral. However, to cover the Lebesgue integral properly requires a book of its own; see, for example, Johnston [Joh15], Nelson [Nel15], and Royden and Fitzpatrick [RF10], or at a more sophisticated level, Rudin [Rud86]. Therefore, instead of developing the Lebesgue integral in detail, we will simply assume the properties of the Lebesgue integral that we will require as axioms and leave the details of its definition and the proof of its properties for the reader to learn elsewhere. (Again: “Dessert first.”)

Now, truth be told, the reader does need to know a few details to understand the value of the Lebesgue integral. In particular, while the reader does not need to understand the measure of a subset of 𝐑, which would be the first step in most full developments of the Lebesgue integral, we do need to discuss the idea of a set of measure zero, that is, a subset of the domain of a function 𝑓 that “doesn’t matter” when we take the integral of 𝑓.


We begin with the following definitions.

Definition 7.4.1. We define the length of an open interval (𝑎, 𝑏) to be ℓ((𝑎, 𝑏)) = 𝑏 − 𝑎. For 𝐸 ⊆ 𝐑, we define a countable open cover of 𝐸 to be a countable collection {𝑈𝑖 } of open intervals whose union contains 𝐸 (i.e., 𝐸 ⊆ ⋃_{𝑖∈𝐍} 𝑈𝑖 ).

Definition 7.4.2. To say that 𝐸 ⊆ 𝐑 has measure zero means that for any 𝜖 > 0, there exists some open cover {𝑈𝑖 } of 𝐸 such that ∑_{𝑖=1}^{∞} ℓ(𝑈𝑖 ) < 𝜖. (Note that since ∑_{𝑖=1}^{∞} ℓ(𝑈𝑖 ) is a sum with nonnegative terms, the order of summation does not matter; see Appendix A.)

Definition 7.4.3. For 𝑋 ⊆ 𝐑, to say that a statement is true almost everywhere, or a.e., in 𝑋, means that the set of points in 𝑋 where the statement does not hold has measure 0. Phrases like almost all points in 𝑋 are defined similarly. For example, for 𝑓, 𝑔 ∶ 𝑋 → 𝐂, to say that 𝑓(𝑥) = 𝑔(𝑥) a.e. means that the set {𝑥 ∈ 𝑋 ∣ 𝑓(𝑥) ≠ 𝑔(𝑥)} has measure zero.

The reason the reader will need some idea of what a set of measure zero is like is that if we use Lebesgue integration to prove a result about functions with domain 𝑋, then by the very nature of Lebesgue integration, those results hold only “up to sets of measure zero” in 𝑋. Moreover, with respect to Lebesgue integration, functions themselves are more formally considered as equivalence classes under the relation of being equal almost everywhere. We therefore present several results (Theorems 7.4.5 and 7.4.8, Corollary 7.4.9) designed to convey some intuition about sets of measure zero. We begin with a few concrete examples.

Example 7.4.4. A set consisting of a single point of 𝐑 has measure zero (Problem 7.4.1). More generally, any finite subset of 𝐑 has measure zero (Problem 7.4.2).

Theorem 7.4.5. Let 𝐸 = {𝑥𝑖 ∈ 𝐑 ∣ 𝑖 ∈ 𝐍} be a countably infinite subset of 𝐑. Then 𝐸 has measure zero.

Proof. Suppose 𝜖 > 0. For 𝑖 ∈ 𝐍, let 𝑈𝑖 be an interval of length 𝜖/2^{𝑖+1} containing 𝑥𝑖 ; i.e., 𝑈𝑖 = (𝑥𝑖 − 𝜖/2^{𝑖+2}, 𝑥𝑖 + 𝜖/2^{𝑖+2}). Then {𝑈𝑖 } is an open cover of 𝐸 with

    ∑_{𝑖=1}^{∞} ℓ(𝑈𝑖 ) = ∑_{𝑖=1}^{∞} 𝜖/2^{𝑖+1} = 𝜖/2 < 𝜖.   (7.4.1)

The theorem follows.

So for example, any subset of the rationals has measure zero. The following example, however, shows that a set of measure zero need not be countable.

Example 7.4.6 (Cantor “middle thirds” set). Let 𝐸0 = [0, 1], 𝐸1 = [0, 1/3] ∪ [2/3, 1], 𝐸2 = [0, 1/9] ∪ [2/9, 1/3] ∪ [2/3, 7/9] ∪ [8/9, 1], and in general, given 𝐸𝑛 , let 𝐸𝑛+1 be the set obtained from 𝐸𝑛 by deleting the middle third of each closed interval in 𝐸𝑛 , as shown in Figure 7.4.1. Then for any 𝑛 ∈ 𝐍, 𝐸𝑛 can be covered by a collection of


[Figure 7.4.1. Construction of the Cantor “middle thirds” set: the stages 𝐸0 through 𝐸3 .]

open intervals, each slightly larger than a corresponding closed interval in 𝐸𝑛 , whose lengths total at most 2(2/3)^𝑛 (say). Taking the limit as 𝑛 → ∞, we see that 𝐸 = ⋂_{𝑛=0}^{∞} 𝐸𝑛 has measure zero. Nevertheless, it can be shown that 𝐸 is uncountable; more precisely, we can define a bijection from the uncountable set {0, 2}^𝐍 to 𝐸 (Problem 7.4.6).

In addition, an argument similar to the proof of Theorem 7.4.5 shows that a countable union of (not necessarily countable) sets of measure zero still has measure zero (Problem 7.4.3). It follows that sets of measure zero can be spread everywhere in 𝐑, and at the same time, quite large in cardinality. Nevertheless, it is still true that any set of measure zero amounts to “almost nothing.” In a course on measure theory, this is shown by defining the measure, or generalized length, of a large class of subsets of 𝐑 and proving that the measure of any set is unaffected by removing a set of measure zero. We will instead settle for a much weaker statement (Theorem 7.4.8), for which we need the following “obvious” but not so conveniently proven lemma.

Lemma 7.4.7. Let (𝐴, 𝐵) be an open interval in 𝐑. If {𝑈𝑖 } is a countable collection of open intervals, each contained in (𝐴, 𝐵), then there exists a countable collection {𝑉𝑗 } of bounded open intervals such that:

(1) The 𝑉𝑗 are pairwise disjoint, or in other words, for 𝑗 ≠ 𝑘, 𝑉𝑗 ∩ 𝑉𝑘 = ∅.

(2) ⋃_{𝑗=1}^{∞} 𝑉𝑗 = ⋃_{𝑖=1}^{∞} 𝑈𝑖 .

(3) ∑_{𝑗=1}^{∞} ℓ(𝑉𝑗 ) ≤ ∑_{𝑖=1}^{∞} ℓ(𝑈𝑖 ).

We hope that the idea of Lemma 7.4.7 seems “obvious”: By merging intervals of {𝑈𝑖 } with nonempty intersection, we may assume that the intervals in a countable open cover are disjoint without increasing the total length. And indeed, the reader should feel comfortable skipping the proof on a first reading. However, we include the proof here to illustrate one of the difficulties of measure theory: It can be tricky to prove “obvious” statements about things like countable collections of intervals for several reasons — not least of which is that occasionally those “obvious” statements are not true.

Proof. Fix a countable collection of open intervals {𝑈𝑖 }. For the purposes of this proof only, we define an interval chain of length 𝑘 ≥ 0 to be a finite sequence of open intervals


{𝑊0 , … , 𝑊𝑘 } ⊆ {𝑈𝑖 } such that for 0 ≤ 𝑖 ≤ 𝑘 − 1, 𝑊𝑖 ∩ 𝑊𝑖+1 ≠ ∅. To say that open intervals 𝑈 and 𝑈′ are connected by an interval chain means that there exists an interval chain {𝑊0 , … , 𝑊𝑘 } such that 𝑈 = 𝑊0 and 𝑊𝑘 = 𝑈′. Define a relation ∼ on {𝑈𝑖 } by saying that 𝑈 ∼ 𝑈′ exactly when 𝑈 and 𝑈′ are connected by an interval chain. We claim that:

Claim 1. ∼ is an equivalence relation on {𝑈𝑖 }. Furthermore, if 𝒞 and 𝒞′ are different equivalence classes under ∼, then

    (⋃_{𝑈∈𝒞} 𝑈) ∩ (⋃_{𝑈′∈𝒞′} 𝑈′) = ∅.   (7.4.2)

Claim 1 is proven in Problem 7.4.4. We next claim:

Claim 2. If {𝑊0 , … , 𝑊𝑘 } is an interval chain, where 𝑊𝑖 = (𝑎𝑖 , 𝑏𝑖 ), 𝑎 = min(𝑎0 , … , 𝑎𝑘 ), and 𝑏 = max(𝑏0 , … , 𝑏𝑘 ), then ⋃_{𝑖=0}^{𝑘} 𝑊𝑖 is precisely the open interval (𝑎, 𝑏), and

    ∑_{𝑖=0}^{𝑘} ℓ(𝑊𝑖 ) ≥ 𝑏 − 𝑎,   (7.4.3)

the length of the single interval (𝑎, 𝑏). Claim 2 is proven in Problem 7.4.5. Our final claim is:

Claim 3. Suppose {𝑈𝑖 } is a countable collection of open intervals 𝑈𝑖 = (𝑎𝑖 , 𝑏𝑖 ), all of which are equivalent under ∼. Let 𝑎 = inf {𝑎𝑖 } and 𝑏 = sup {𝑏𝑖 }, both of which are finite because 𝐴 and 𝐵 are lower and upper bounds for {𝑎𝑖 } and {𝑏𝑖 }, respectively. Then

    ⋃_{𝑛=1}^{∞} 𝑈𝑛 = (𝑎, 𝑏),   (7.4.4)

    ∑_{𝑛=1}^{∞} ℓ(𝑈𝑛 ) ≥ 𝑏 − 𝑎.   (7.4.5)

To prove Claim 3, on the one hand, since 𝑎 ≤ 𝑎𝑛 < 𝑏𝑛 ≤ 𝑏 for all 𝑛 ∈ 𝐍, we have ⋃_{𝑛=1}^{∞} 𝑈𝑛 ⊆ (𝑎, 𝑏). On the other hand, suppose 𝑎 < 𝑥 < 𝑦 < 𝑏. By the Arbitrarily Close Criterion, Theorem 2.1.5, there exist 𝑘, 𝑛 ∈ 𝐍 such that 𝑎 ≤ 𝑎𝑘 < 𝑥 < 𝑦 < 𝑏𝑛 ≤ 𝑏. Since all of the 𝑈𝑖 are equivalent, let {𝑊0 , … , 𝑊𝑚 } be an interval chain from 𝑊0 = (𝑎𝑘 , 𝑏𝑘 ) to 𝑊𝑚 = (𝑎𝑛 , 𝑏𝑛 ). By Claim 2, ⋃_{𝑖=0}^{𝑚} 𝑊𝑖 = (𝑎′, 𝑏′), where 𝑎 ≤ 𝑎′ ≤ 𝑎𝑘 < 𝑥 < 𝑦 < 𝑏𝑛 ≤ 𝑏′ ≤ 𝑏. Then [𝑥, 𝑦] ⊂ (𝑎′, 𝑏′) ⊆ ⋃_{𝑛=1}^{∞} 𝑈𝑛 , and since 𝑎 < 𝑥 < 𝑦 < 𝑏 were arbitrary, (7.4.4) follows. Moreover, since the distinct intervals of the chain all come from {𝑈𝑛 }, by (7.4.3),

    ∑_{𝑛=1}^{∞} ℓ(𝑈𝑛 ) ≥ 𝑏′ − 𝑎′ > 𝑦 − 𝑥 > 0,

and since 𝑥 and 𝑦 may be taken arbitrarily close to 𝑎 and 𝑏, (7.4.5) follows.

Turning to the full statement of Lemma 7.4.7, let the 𝑉𝑗 be the unions of the (countably many) equivalence classes of {𝑈𝑖 }. Since each 𝑉𝑗 is a bounded open interval by (7.4.4) in Claim 3, it remains to verify the various statements in the conclusion of the lemma. However, statement (1) of the lemma follows because unions of different equivalence classes are disjoint (Claim 1), statement (2) follows by construction of the 𝑉𝑗 , and statement (3) follows by (7.4.5) in Claim 3 and rearranging the order of summation in ∑_{𝑖=1}^{∞} ℓ(𝑈𝑖 ). The lemma follows.

Theorem 7.4.8. If 𝐸 is a set of measure zero and (𝑎, 𝑏) is any open interval in 𝐑, then (𝑎, 𝑏) is not contained in 𝐸.

Proof. Proceeding by contradiction, suppose (𝑎, 𝑏) ⊆ 𝐸, and suppose {𝑈𝑖 } is an open cover of 𝐸 (and therefore, of (𝑎, 𝑏)) with total length less than (𝑏 − 𝑎)/2. Without either changing the fact that {𝑈𝑖 } is an open cover of (𝑎, 𝑏) or increasing the length of any interval 𝑈𝑖 , we may assume that no left endpoint of an interval 𝑈𝑖 is less than 𝑎 and no right endpoint of an interval 𝑈𝑖 is greater than 𝑏. Therefore, applying Lemma 7.4.7, we may also assume that the elements of {𝑈𝑖 } are pairwise disjoint. Then on the one hand, if we have some 𝑈𝑖 = (𝑎𝑖 , 𝑏𝑖 ) with 𝑏𝑖 < 𝑏, then the fact that the 𝑈𝑖 are pairwise disjoint means that the point 𝑏𝑖 cannot be contained in any other 𝑈𝑗 , and similarly, if 𝑎 < 𝑎𝑖 , then the point 𝑎𝑖 cannot be contained in any other 𝑈𝑗 ; in either case, {𝑈𝑖 } fails to cover all of (𝑎, 𝑏). On the other hand, if some 𝑈𝑖 = (𝑎, 𝑏), that contradicts our assumption of total length less than (𝑏 − 𝑎)/2. Either way, we have a contradiction, and the theorem follows.

Corollary 7.4.9. Suppose 𝑋 = [𝑎, 𝑏] or 𝐑 and for some 𝑓, 𝑔 ∶ 𝑋 → 𝐂, we have that 𝑓(𝑥) = 𝑔(𝑥) for almost all 𝑥 ∈ 𝑋. Then for 𝑐 ∈ 𝑋, if 𝑓 and 𝑔 are continuous at 𝑐, then 𝑓(𝑐) = 𝑔(𝑐). In particular, if 𝑓 and 𝑔 are both continuous on 𝑋 and equal a.e. on 𝑋, then they are equal at every point of 𝑋.

Note that Corollary 7.4.9 also applies to functions on 𝑋 = 𝑆 1 , as such functions are really defined on 𝐑.

Proof. Considering ℎ(𝑥) = 𝑓(𝑥) − 𝑔(𝑥), it suffices to show that if ℎ(𝑥) = 0 a.e. on 𝑋 and ℎ is continuous at 𝑐, then ℎ(𝑐) = 0. However, in that case, by Theorem 7.4.8, for every 𝑛, there is a point 𝑐𝑛 ∈ (𝑐 − 1/𝑛, 𝑐 + 1/𝑛) such that ℎ(𝑐𝑛 ) = 0, and so by the sequential definition of continuity,

    ℎ(𝑐) = lim_{𝑛→∞} ℎ(𝑐𝑛 ) = 0.   (7.4.8)

The corollary follows.
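The cover constructed in the proof of Theorem 7.4.5 can be built explicitly for any finite number of points. The script below (ours, not the book’s) does this for the first 50 rationals in [0, 1] with exact arithmetic, confirming that the total length is 𝜖/2 minus a geometric tail, hence less than 𝜖:

```python
# Editor's illustration of the proof of Theorem 7.4.5: around the i-th point
# of a countable set, place an open interval of length eps / 2**(i + 1);
# the total length is then at most eps/2 < eps, however many points we cover.

from fractions import Fraction

def rationals_in_unit_interval(n):
    """A simple enumeration of the first n distinct rationals p/q in [0, 1]."""
    seen, out, q = set(), [], 1
    while len(out) < n:
        for p in range(q + 1):
            x = Fraction(p, q)
            if x not in seen:
                seen.add(x)
                out.append(x)
                if len(out) == n:
                    break
        q += 1
    return out

def cover(points, eps):
    """Open intervals U_i = (x_i - eps/2**(i+2), x_i + eps/2**(i+2)), i >= 1."""
    return [(x - eps / 2 ** (i + 2), x + eps / 2 ** (i + 2))
            for i, x in enumerate(points, start=1)]

eps = Fraction(1, 10)
pts = rationals_in_unit_interval(50)
U = cover(pts, eps)
total = sum(b - a for a, b in U)
assert all(a < x < b for x, (a, b) in zip(pts, U))   # each U_i contains x_i
assert total < eps                                   # total = eps/2 - eps/2**51
```

Because the lengths form a geometric series, the bound is independent of how many points are covered, which is exactly why every countable set has measure zero.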


Problems.

7.4.1. For 𝐸 = {𝑥0 } ⊂ 𝐑, prove that 𝐸 has measure zero.

7.4.2. For 𝐸 = {𝑥1 , … , 𝑥𝑛 } ⊂ 𝐑, prove that 𝐸 has measure zero.

7.4.3. Suppose that 𝐸𝑖 ⊂ 𝐑 has measure zero for all 𝑖 ∈ 𝐍. Prove that ⋃_{𝑖∈𝐍} 𝐸𝑖 has measure zero (i.e., a countable union of sets of measure zero still has measure zero).

7.4.4. (Proves Lemma 7.4.7) Let {𝑈𝑖 } be a collection of open intervals, and define a relation ∼ on {𝑈𝑖 } as in the proof of Lemma 7.4.7.
(a) Prove that ∼ is an equivalence relation. (Recall that this means proving that ∼ is reflexive, symmetric, and transitive.)
(b) Prove that if 𝒞 and 𝒞′ are equivalence classes under ∼ and (⋃_{𝑈∈𝒞} 𝑈) ∩ (⋃_{𝑈′∈𝒞′} 𝑈′) is nonempty, then 𝒞 = 𝒞′.

7.4.5. (Proves Lemma 7.4.7) Let 𝑊𝑖 = (𝑎𝑖 , 𝑏𝑖 ) for 0 ≤ 𝑖 ≤ 𝑘, and suppose that 𝑊𝑖 ∩ 𝑊𝑖+1 ≠ ∅ for 0 ≤ 𝑖 ≤ 𝑘 − 1.
(a) Prove that if 𝑊′ = (𝑎′, 𝑏′) and 𝑊″ = (𝑎″, 𝑏″) are open intervals such that 𝑊′ ∩ 𝑊″ ≠ ∅, then 𝑊′ ∪ 𝑊″ = (𝑎, 𝑏), where 𝑎 = min(𝑎′, 𝑎″) and 𝑏 = max(𝑏′, 𝑏″), and

    𝑏 − 𝑎 ≤ (𝑏′ − 𝑎′) + (𝑏″ − 𝑎″).   (7.4.9)

(b) Now let 𝑎 = min(𝑎0 , … , 𝑎𝑘 ) and 𝑏 = max(𝑏0 , … , 𝑏𝑘 ). Prove that ⋃_{𝑖=0}^{𝑘} 𝑊𝑖 is precisely the open interval (𝑎, 𝑏) and

    ∑_{𝑖=0}^{𝑘} ℓ(𝑊𝑖 ) ≥ 𝑏 − 𝑎.   (7.4.10)

7.4.6. (*) Let

    𝐸0 = [0, 1],
    𝐸1 = [0, 1/3] ∪ [2/3, 1],
    𝐸2 = [0, 1/9] ∪ [2/9, 1/3] ∪ [2/3, 7/9] ∪ [8/9, 1],
    ⋮                                               (7.4.11)

and in general, let 𝐸𝑛+1 be obtained from 𝐸𝑛 by deleting the middle third of each closed interval in 𝐸𝑛 . Define the Cantor set 𝐸 to be 𝐸 = ⋂_{𝑛=0}^{∞} 𝐸𝑛 .

Let {0, 2}^𝐍 be the set of all sequences in the set {0, 2}, and define 𝑓 ∶ {0, 2}^𝐍 → 𝐑 by

    𝑓(𝑎𝑛 ) = ∑_{𝑛=1}^{∞} 𝑎𝑛/3^𝑛 = 𝑎1/3 + 𝑎2/3² + 𝑎3/3³ + ⋯ .   (7.4.12)

(a) Prove that the series (7.4.12) converges for every 𝑎𝑛 ∈ {0, 2}^𝐍.
(b) Prove that for any 𝑎𝑛 ∈ {0, 2}^𝐍 and any 𝑁 ∈ 𝐍, the partial sum ∑_{𝑛=1}^{𝑁} 𝑎𝑛/3^𝑛 is the left-hand endpoint of a closed interval in 𝐸𝑁 .
(c) Prove that for any 𝑎𝑛 ∈ {0, 2}^𝐍, 𝑓(𝑎𝑛 ) ∈ 𝐸.
(d) Prove that for 𝑎𝑛 , 𝑏𝑛 ∈ {0, 2}^𝐍, if 𝑎𝑁 ≠ 𝑏𝑁 for some 𝑁 ∈ 𝐍, then 𝑓(𝑎𝑛 ) ≠ 𝑓(𝑏𝑛 ). (In other words, 𝑓 is one-to-one.)
(e) Prove that for any 𝑦 ∈ 𝐸, there exists some 𝑎𝑛 ∈ {0, 2}^𝐍 such that 𝑓(𝑎𝑛 ) = 𝑦. (In other words, 𝑓 is onto.)
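The finite stages of this construction can be computed exactly. The sketch below (ours, not the book’s) builds 𝐸1 through 𝐸5 with exact fractions, checks that the total length of 𝐸𝑛 is (2/3)^𝑛, and checks part (b) for one choice of ternary digits:

```python
# Editor's illustration of Problem 7.4.6: the middle-thirds stages E_n,
# their total lengths (2/3)**n, and the left-endpoint property of the
# partial sums of sum a_n / 3**n with a_n in {0, 2}.

from fractions import Fraction

def next_stage(intervals):
    """Delete the open middle third of each closed interval [a, b]."""
    out = []
    for a, b in intervals:
        t = (b - a) / 3
        out += [(a, a + t), (b - t, b)]
    return out

E = [(Fraction(0), Fraction(1))]          # E_0 = [0, 1]
for n in range(1, 6):
    E = next_stage(E)
    assert sum(b - a for a, b in E) == Fraction(2, 3) ** n

# Part (b) for one sequence: the partial sum lands on a left endpoint of E_5.
a = [2, 0, 2, 2, 0]
s = sum(Fraction(an, 3 ** n) for n, an in enumerate(a, start=1))
assert s in [left for left, _ in E]
```

Since each stage doubles the number of intervals while scaling lengths by 1/3, the lengths (2/3)^𝑛 → 0, which is the measure-zero claim of Example 7.4.6 in miniature.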

7.5 The Lebesgue integral: Axioms

With some intuition about sets of measure zero in hand, we now state the axiomatic properties of Lebesgue integration that we require. Throughout, let 𝑋 = [𝑎, 𝑏] or 𝐑, and while we will get into a fair amount of detail, keep in mind that our end goal is to extend the definition of the Riemann integral to a much wider class of functions.

7.5.1 Axioms about integration. Our first axiom describes the class of functions we can consider.

Lebesgue Axiom 1 (Measurable functions). There exists a function space ℳ(𝑋) on 𝑋, called the space of measurable functions on 𝑋, with the following properties.

(1) (Riemann integrable implies measurable) If 𝑓 ∶ 𝑋 → 𝐂 is Riemann integrable on every closed and bounded subinterval of 𝑋, then 𝑓 is measurable. In particular, if 𝑓 is continuous, then 𝑓 is measurable, and if 𝑋 = [𝑎, 𝑏], then any Riemann integrable function is measurable.

(2) (Closed under algebraic operations) If 𝑓, 𝑔 ∈ ℳ(𝑋), then 𝑓(𝑥)𝑔(𝑥), the complex conjugate of 𝑓(𝑥), and |𝑓(𝑥)| are measurable. Note that since ℳ(𝑋) is a function space, 𝑓(𝑥) + 𝑔(𝑥) and 𝑐𝑓(𝑥) (𝑐 ∈ 𝐂) are also measurable, as are the real and imaginary parts of a measurable function.

(3) (Closed under limits) If 𝑓(𝑥) = lim_{𝑛→∞} 𝑓𝑛 (𝑥) and each 𝑓𝑛 ∈ ℳ(𝑋), then 𝑓 is measurable.

(4) (Nonnegative integral) If 𝑓 ∈ ℳ(𝑋) and 𝑓 is real-valued and nonnegative, then there exists an extended nonnegative real number (i.e., either a nonnegative real number or the symbol +∞) ∫_𝑋 𝑓, called the Lebesgue integral of 𝑓 on 𝑋. In particular, for any measurable function 𝑓 ∈ ℳ(𝑋), ∫_𝑋 |𝑓| is well-defined.

(5) (Monotonicity) For real-valued nonnegative 𝑓, 𝑔 ∈ ℳ(𝑋), if 𝑓(𝑥) ≤ 𝑔(𝑥) for all 𝑥 ∈ 𝑋, then ∫_𝑋 𝑓 ≤ ∫_𝑋 𝑔.

Note that one of the main advantages of considering measurable functions, and the Lebesgue integral in general, is that property (3) of Axiom 1 does not hold for differentiable functions, continuous functions, or Riemann integrable functions. (See the six NO’s in Section 4.2.) In any case, we may now define the analogue of (Riemann) integrability for the Lebesgue integral.

Definition 7.5.1. To say that 𝑓 ∶ 𝑋 → 𝐂 is Lebesgue integrable, or simply integrable, means that 𝑓 ∈ ℳ(𝑋) and ∫_𝑋 |𝑓| is finite. We also define 𝐿1 (𝑋) to be the set of all


Lebesgue integrable functions on 𝑋. More generally, for finite 𝑝 ≥ 1, we define 𝐿𝑝 (𝑋) to be the set of all 𝑓 ∈ ℳ(𝑋) such that ∫_𝑋 |𝑓|^𝑝 is finite.

Remark 7.5.2. The reader may recall that we first encountered ∫_𝑋 |𝑓|^𝑝 as a way of measuring the size of a function back in Section 5.1, where it was used to measure the average difference between two clocks. More precisely, for 1 ≤ 𝑝 < ∞, if we define

    ‖𝑓‖_𝑝 = (∫_𝑋 |𝑓|^𝑝)^{1/𝑝}   (7.5.1)

to be the 𝐿𝑝 norm of 𝑓, it can be shown that ‖𝑓‖𝑝 is indeed a norm (Definition 7.2.1) on 𝐿𝑝 (𝑋). (For a proof, see, for example, Rudin [Rud86, Ch. 3].) We also note that we can define 𝐿∞ (𝑋) by extending (7.2.1) (Example 7.2.3) from 𝐶 0 (𝑋) to ℳ(𝑋) by, roughly speaking, ignoring subsets of 𝑋 of measure zero when computing ‖𝑓‖∞ = sup {|𝑓(𝑥)| ∣ 𝑥 ∈ 𝑋}. (Again, see Rudin [Rud86, Ch. 3] for details.) We can also now justify the notation ‖𝑓‖∞ by the fact that when 𝑓 ∈ 𝐿𝑞 (𝑋) for some 1 ≤ 𝑞 < ∞, then

    lim_{𝑝→∞} ‖𝑓‖_𝑝 = ‖𝑓‖_∞.   (7.5.2)

See Problem 7.5.1 for a proof in a special case.

Our next main axiom is that for functions in 𝐿1 (𝑋), the Lebesgue integral has many of the same properties as the Riemann integral. To be precise:

Lebesgue Axiom 2 (Integral properties). The set 𝐿1 (𝑋) (Definition 7.5.1) is a subspace of ℳ(𝑋), and for any 𝑓 ∈ 𝐿1 (𝑋), there exists a complex number ∫_𝑋 𝑓 called the Lebesgue integral of 𝑓. Furthermore, 𝐿1 (𝑋) and ∫_𝑋 𝑓 satisfy the following properties.

(1) (Extends nonnegative integral) If 𝑓 ∈ 𝐿1 (𝑋) and 𝑓 is real and nonnegative, then ∫_𝑋 𝑓 has the same value as the Lebesgue integral from Lebesgue Axiom 1.

(2) (Extends Riemann integral) If 𝑓 ∶ 𝑋 → 𝐂 is Riemann integrable on some [𝑎, 𝑏] ⊆ 𝑋 and 𝑓(𝑥) = 0 for all 𝑥 ∉ [𝑎, 𝑏], then 𝑓 ∈ 𝐿1 (𝑋) and

    ∫_𝑋 𝑓 = ∫_𝑎^𝑏 𝑓(𝑥) 𝑑𝑥,   (7.5.3)

where the right-hand side is the Riemann integral. In particular, if 𝑋 = [𝑎, 𝑏], (7.5.3) holds for all Riemann integrable 𝑓.

(3) (Linearity) If 𝑓, 𝑔 ∈ 𝐿1 (𝑋) and 𝑎, 𝑏 ∈ 𝐂, then ∫_𝑋 (𝑎𝑓 + 𝑏𝑔) = 𝑎 ∫_𝑋 𝑓 + 𝑏 ∫_𝑋 𝑔.

(4) (Additivity of domain) If 𝑋 = [𝑎, 𝑏], 𝑌 = [𝑏, 𝑐], and 𝑍 = [𝑎, 𝑐], and 𝑓 ∈ 𝐿1 (𝑋) and 𝑓 ∈ 𝐿1 (𝑌 ) when restricted to those domains, then ∫_𝑍 𝑓 = ∫_𝑋 𝑓 + ∫_𝑌 𝑓.

(5) (Conjugates and absolute value) If 𝑓 ∈ 𝐿1 (𝑋), then the integral of the complex conjugate of 𝑓 is the complex conjugate of ∫_𝑋 𝑓, and |∫_𝑋 𝑓| ≤ ∫_𝑋 |𝑓|.

Note that Lebesgue Axioms 1 and 2 together imply that monotonicity (Lebesgue Axiom 1, property (5)) holds for all real-valued (but not necessarily nonnegative) 𝑓, 𝑔 ∈ 𝐿1 (𝑋); see Problem 7.5.2. We also assume that the Lebesgue integral ignores values on any set of measure zero in the domain, or more precisely:

Lebesgue Axiom 3 (Measure zero). For any 𝑓 ∈ ℳ(𝑋), we have the following properties of ∫_𝑋 𝑓, either in the complex sense or the nonnegative sense.

(1) (Up to measure zero) If 𝑓 = 𝑔 almost everywhere in 𝑋, then 𝑔 is also measurable; and if we also have that 𝑓 ∈ 𝐿1 (𝑋), then 𝑔 ∈ 𝐿1 (𝑋) and ∫_𝑋 𝑓 = ∫_𝑋 𝑔. In other words, ∫_𝑋 𝑓 is “only defined up to sets of measure zero.”

(2) (Zero integral of nonnegative implies zero a.e.) If 𝑓 is real-valued and nonnegative and ∫_𝑋 𝑓 = 0, then 𝑓 = 0 almost everywhere in 𝑋.

We further assume that the Lebesgue integral has the following convergence properties.

Lebesgue Axiom 4 (Convergence properties). Let 𝑓𝑛 ∶ 𝑋 → 𝐂 be a sequence in ℳ(𝑋), and let 𝑓 ∶ 𝑋 → 𝐂 be a function such that lim_{𝑛→∞} 𝑓𝑛 (𝑥) = 𝑓(𝑥) a.e. in 𝑋.

(1) (Monotone convergence) If the 𝑓𝑛 are nonnegative real measurable and 𝑓𝑛 (𝑥) ≤ 𝑓𝑛+1 (𝑥) for all 𝑥 ∈ 𝑋, then ∫_𝑋 𝑓 = lim_{𝑛→∞} ∫_𝑋 𝑓𝑛 .

(2) (Dominated convergence) If there exists some Lebesgue integrable 𝑔 ∶ 𝑋 → 𝐑 such that |𝑓𝑛 (𝑥)| ≤ 𝑔(𝑥) for all 𝑛 ∈ 𝐍 and 𝑥 ∈ 𝑋, then ∫_𝑋 𝑓 = lim_{𝑛→∞} ∫_𝑋 𝑓𝑛 . (This is called “dominated convergence” because 𝑔(𝑥) dominates each 𝑓𝑛 (𝑥).)
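A concrete instance of dominated convergence (our example, not the book’s): 𝑓𝑛(𝑥) = 𝑥^𝑛 on [0, 1] is dominated by 𝑔 = 1 and converges to 0 a.e. (everywhere except 𝑥 = 1, a set of measure zero), and its integrals 1/(𝑛 + 1) do tend to 0, the integral of the a.e. limit:

```python
# Editor's illustration of dominated convergence: f_n(x) = x**n on [0, 1],
# dominated by g = 1, converges to 0 a.e.; the integrals 1/(n + 1) -> 0.

def midpoint_integral(f, a, b, steps=50_000):
    """Midpoint-rule approximation of the Riemann integral of f on [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))

for n in (1, 5, 25):
    approx = midpoint_integral(lambda x: x ** n, 0.0, 1.0)
    assert abs(approx - 1.0 / (n + 1)) < 1e-6    # exact value is 1/(n + 1)

# The integrals decrease to 0, matching the integral of the a.e. limit.
vals = [1.0 / (n + 1) for n in (1, 5, 25, 125, 625)]
assert all(u > v for u, v in zip(vals, vals[1:])) and vals[-1] < 2e-3
```

Note that 𝑓𝑛 does not converge to 0 at 𝑥 = 1, but property (1) of Lebesgue Axiom 3 lets us ignore that single point.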

At this point, we pause in our axiomatic description of the Lebesgue integral to observe the following computationally useful consequence of our axioms so far: Lebesgue integrals on 𝐑 can often be computed as improper Riemann integrals (Section 4.8).

Theorem 7.5.3. Suppose 𝑓 ∶ 𝐑 → 𝐂 is locally integrable (Definition 4.8.1) and 𝑓 ∈ 𝐿1 (𝐑). Then the Lebesgue integral of 𝑓 on 𝐑 may be computed by an improper Riemann integral:

    ∫_𝐑 𝑓 = ∫_{−∞}^{∞} 𝑓(𝑥) 𝑑𝑥,   (7.5.4)

in the sense of Definition 4.8.2.

Again, as with Lebesgue Axiom 2, the main point is that the Lebesgue integral on 𝐑 is often the same as the Riemann integral on 𝐑, in terms of practical computation; the advantage of the Lebesgue integral is its superior theoretical properties.


Proof. Fix 𝑎 ∈ 𝐑. Since ∫_{[𝑎,+∞)} |𝑓| ≤ ∫_𝐑 |𝑓|, by nonnegativity and additivity of domain (Lebesgue Axioms 1 and 2), we see that 𝑓 ∈ 𝐿1 ([𝑎, +∞)). Let

    𝑓𝑛(𝑥) = { 𝑓(𝑥)  if 𝑎 ≤ 𝑥 ≤ 𝑛,
              0      if 𝑥 > 𝑛.       (7.5.5)

Since |𝑓𝑛 (𝑥)| converges pointwise to |𝑓(𝑥)| and |𝑓𝑛 (𝑥)| ≤ |𝑓(𝑥)|, Lebesgue Axiom 4 implies that

    ∫_{[𝑎,+∞)} |𝑓| = lim_{𝑛→∞} ∫_{[𝑎,+∞)} |𝑓𝑛 | .   (7.5.6)

Therefore, given 𝜖 > 0, there exists some 𝑁 such that

    𝜖 > ∫_{[𝑎,+∞)} (|𝑓| − |𝑓𝑁 |) = ∫_{[𝑁,+∞)} |𝑓| .   (7.5.7)

For 𝑏 > 𝑁, by Lebesgue Axiom 2, we then have

    𝜖 > ∫_{[𝑁,+∞)} |𝑓| ≥ ∫_{[𝑏,+∞)} |𝑓| ≥ |∫_{[𝑏,+∞)} 𝑓| = |∫_{[𝑎,+∞)} 𝑓 − ∫_𝑎^𝑏 𝑓(𝑥) 𝑑𝑥| .   (7.5.8)

It follows by the definition of limit that ∫_{[𝑎,+∞)} 𝑓 = lim_{𝑏→+∞} ∫_𝑎^𝑏 𝑓. The theorem follows by applying a similar argument with 𝑎 → −∞.

The reader may also find it helpful to see the following examples (Examples 7.5.4–7.5.6) of functions that are not Riemann integrable on 𝑋 but, because of our axioms, must be in 𝐿1 (𝑋).

Example 7.5.4. Let 𝑋 = [0, 1], and recall the following non-Riemann integrable function 𝑓 ∶ 𝑋 → 𝐑 from Problem 3.3.5:

    𝑓(𝑥) = { 1  if 𝑥 is rational,
             0  if 𝑥 is irrational.   (7.5.9)

Problem 7.5.3 shows that 𝑓 is Lebesgue integrable and ∫_𝑋 𝑓 = 0.

Example 7.5.5. Let 𝑋 = [0, 1], and consider 𝑓 ∶ 𝑋 → 𝐑 defined by 𝑓(𝑥) = 1/√𝑥. Note that 𝑓 is not actually defined at 𝑥 = 0, but since a single point has measure zero, that makes no difference. Since 𝑓 is not bounded, it is not Riemann integrable, but as shown in Problem 7.5.4, by our axioms, the Lebesgue integral of 𝑓 on 𝑋 is

    ∫_𝑋 𝑓 = lim_{𝑎→0⁺} ∫_𝑎^1 𝑥^{−1/2} 𝑑𝑥 = 2.   (7.5.10)

Example 7.5.6. Similarly, let 𝑋 = [1, +∞), and consider 𝑓 ∶ 𝑋 → 𝐑 defined by 𝑓(𝑥) = 1/𝑥². Since 𝑋 is not a finite interval, the Riemann integral of 𝑓 on 𝑋 is not defined, but as shown in Problem 7.5.5, by our axioms, the Lebesgue integral of 𝑓 on 𝑋 is

    ∫_𝑋 𝑓 = lim_{𝑏→+∞} ∫_1^𝑏 𝑥^{−2} 𝑑𝑥 = 1.   (7.5.11)
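The truncated integrals in Examples 7.5.5 and 7.5.6 can also be checked numerically. The sketch below (ours, not the book’s) approximates them with a midpoint rule and compares against the closed forms 2 − 2/√𝑛 and 1 − 1/𝑛, whose limits are the Lebesgue values 2 and 1:

```python
# Editor's numerical companion to Examples 7.5.5 and 7.5.6: the truncated
# Riemann integrals increase monotonically to the Lebesgue values 2 and 1,
# exactly as monotone convergence (Lebesgue Axiom 4) predicts.

import math

def riemann(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the Riemann integral of f on [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))

# Example 7.5.5: integral over [1/n, 1] of x**-0.5 equals 2 - 2/sqrt(n) -> 2
for n in (4, 16, 64):
    approx = riemann(lambda x: x ** -0.5, 1 / n, 1)
    assert abs(approx - (2 - 2 / math.sqrt(n))) < 1e-4

# Example 7.5.6: integral over [1, n] of x**-2 equals 1 - 1/n -> 1
for n in (4, 16, 64):
    approx = riemann(lambda x: x ** -2, 1, n)
    assert abs(approx - (1 - 1 / n)) < 1e-4
```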


We hope Examples 7.5.4–7.5.6 give the reader an idea of the kinds of singularities allowed in Lebesgue integrable functions. For other results along the same lines, see Problem 7.5.3 and Theorem 7.5.13 below.

Remark 7.5.7. In conclusion, the reader seeing this material for the first time should take away two main ideas from this section:

(1) If a function is Riemann integrable on 𝑋, then its Lebesgue integral on 𝑋 exists and has the same value.

(2) The main advantage of Lebesgue integration that we have so far is that the monotone convergence and dominated convergence properties give conditions under which ∫_𝑋 commutes with lim_{𝑛→∞}.

Remark 7.5.8. Interestingly, using the ideas developed in this section, we can now state a precise description of the relationship between Riemann integrability and continuity: Let 𝑓 ∶ [𝑎, 𝑏] → 𝐂 be bounded. Then 𝑓 is Riemann integrable if and only if 𝑓 is continuous almost everywhere on [𝑎, 𝑏]. The proof relies heavily on the details of Lebesgue integration, so we refer the reader to Rudin [Rud76, Thm. 11.33]. (Though see the proof of Lemma 3.4.7 and Figure 3.4.1 for an idea of what this looks like.)

7.5.2 Axioms about 𝐿2. Our primary motivation for introducing Lebesgue integration is to be able to use the function space 𝐿2 (𝑋) (Definition 7.5.1). Note that for 𝑋 = [𝑎, 𝑏] or 𝑆 1 , 𝐿2 (𝑋) includes all Riemann integrable functions (Lebesgue Axiom 2), and for 𝑋 = 𝐑, 𝐿2 (𝑋) includes all functions such that the improper (Riemann) integral ∫_{−∞}^{∞} |𝑓(𝑥)|² 𝑑𝑥 is finite (Theorem 7.5.3). More precisely, if 𝑋 = [𝑎, 𝑏] or 𝑆 1 and ℛ(𝑋) denotes the subspace of all Riemann integrable functions on 𝑋, then

    𝐿2 (𝑋) ⊃ ℛ(𝑋) ⊃ 𝐶 0 (𝑋) ⊃ 𝐶 1 (𝑋) ⊃ 𝐶 2 (𝑋) ⊃ ⋯ ⊃ 𝐶 ∞ (𝑋).   (7.5.12)

In other words, spaces of the form 𝐿2 (𝑋) (𝑋 = [𝑎, 𝑏], 𝑆 1 , or 𝐑) include nearly every specific example of a function that we have discussed so far, including every function in the Schwartz space 𝒮(𝐑) (Problem 7.5.6). The difference is that 𝐿2 (𝑋) also includes other functions that fill the “holes” in (for example) 𝐶 0 (𝑋).

Remark 7.5.9. We again remind the reader that functions in 𝐿2 (𝑋) are only defined “up to values on a set of measure zero.” More precisely, when we discuss 𝑓 ∈ 𝐿2 (𝑋), we actually consider an equivalence class of functions that are equal a.e. The algebraically inclined reader may wish to think of 𝐿2 (𝑋) as the quotient space of square-integrable functions mod the subspace of functions equal to 0 a.e.; the point is that by the “up to measure zero” axiom (Lebesgue Axiom 3), ∫_𝑋 𝑓 is still well-defined on an equivalence class, or element of the quotient space.

Definition 7.5.10. We take this opportunity to introduce some notation for 𝐿2 spaces of two-variable functions: Specifically, to say that 𝑓(𝑥, 𝑦) ∈ 𝐿2𝑥 (𝑋) means that for fixed 𝑦0 , 𝑓(𝑥, 𝑦0 ) is in 𝐿2 (𝑋) as a function of 𝑥, and 𝐿2𝑦 (𝑋) is defined analogously.


The main feature that makes 𝐿2 (𝑋) special is that 𝐿2 (𝑋) has a natural structure as an inner product space. More precisely:

Theorem 7.5.11. Let 𝑋 = [𝑎, 𝑏], 𝑆 1 , or 𝐑. Then 𝐿2 (𝑋) is a function space, and

    ⟨𝑓, 𝑔⟩ = ∫_𝑋 𝑓(𝑥) \overline{𝑔(𝑥)}   (7.5.13)

is an inner product on 𝐿2 (𝑋).

Proof. First, we must show that (7.5.13) is well-defined. For any 𝑓, 𝑔 ∈ 𝐿2 (𝑋), since (|𝑓(𝑥)| − |𝑔(𝑥)|)² ≥ 0, we see that

    |𝑓(𝑥) \overline{𝑔(𝑥)}| = |𝑓(𝑥)𝑔(𝑥)| ≤ (1/2)|𝑓(𝑥)|² + (1/2)|𝑔(𝑥)|².   (7.5.14)

Therefore, if 𝑓, 𝑔 ∈ 𝐿2 (𝑋) and 𝑎 ∈ 𝐂, then by linearity and monotonicity (Lebesgue Axiom 2) and the fact that

    |𝑎𝑓(𝑥)|² = |𝑎|² |𝑓(𝑥)|²,   |𝑓(𝑥) + 𝑔(𝑥)|² ≤ |𝑓(𝑥)|² + 2|𝑓(𝑥)𝑔(𝑥)| + |𝑔(𝑥)|²,   (7.5.15)

we see that 𝑎𝑓 ∈ 𝐿2 (𝑋) and 𝑓 + 𝑔 ∈ 𝐿2 (𝑋). It also follows that 𝑓(𝑥) \overline{𝑔(𝑥)} ∈ 𝐿1 (𝑋), which means that (7.5.13) is well-defined on 𝐿2 (𝑋). It remains only to verify the axioms of an inner product, and this works exactly as in Problem 7.1.3, with the exception of positive definiteness. There, instead of Lemma 3.4.9, we use Lebesgue Axiom 3 to conclude that if ∫_𝑋 |𝑓|² = 0, then 𝑓(𝑥) = 0 a.e.; otherwise, the proof stays the same.

Now, since 𝐿2 (𝑋) is an inner product space, it is also a normed space (Definition 7.2.1) under the inner product norm (Definition 7.1.2), and therefore, also a metric space. In these terms, we can now describe the two remaining Lebesgue Axioms, each of which is a key property of 𝐿2 (𝑋) as a metric space. The first is:

Lebesgue Axiom 5 (Completeness of 𝐿2). Let 𝑋 = [𝑎, 𝑏], 𝑆 1 , or 𝐑. Then 𝐿2 (𝑋) is complete in the 𝐿2 metric.

To review what Lebesgue Axiom 5 means, suppose that 𝑓𝑛 ∶ 𝑋 → 𝐂 is a sequence of 𝐿2 functions that is Cauchy in the 𝐿2 (norm) metric, or to be precise, that for every 𝜖 > 0, there exists some 𝑁(𝜖) such that if 𝑛, 𝑘 ∈ 𝐙, 𝑛, 𝑘 > 𝑁(𝜖), then

    ‖𝑓𝑛 − 𝑓𝑘‖ = (∫_𝑋 |𝑓𝑛(𝑥) − 𝑓𝑘(𝑥)|²)^{1/2} < 𝜖.   (7.5.16)

In other words, suppose 𝑓𝑛 and 𝑓𝑘 are eventually always very close on average. Then Lebesgue Axiom 5 posits that there exists some 𝑓 ∈ 𝐿2 (𝑋) such that

    lim_{𝑛→∞} ‖𝑓𝑛 − 𝑓‖ = lim_{𝑛→∞} (∫_𝑋 |𝑓𝑛(𝑥) − 𝑓(𝑥)|²)^{1/2} = 0.   (7.5.17)

The final metric property of 𝐿2 (𝑋) that we will assume is:

Lebesgue Axiom 6 (Continuous functions are dense in 𝐿2). If 𝑋 = [𝑎, 𝑏] or 𝑆 1 , then 𝐶 0 (𝑋) is a dense (Definition 2.4.18) subset of 𝐿2 (𝑋). In other words, for every 𝑓 ∈ 𝐿2 (𝑋) and every 𝜖 > 0, there exists some 𝑔 ∈ 𝐶 0 (𝑋) with ‖𝑓 − 𝑔‖ < 𝜖.
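To see Lebesgue Axiom 6 at work in one concrete case (our example, not the book’s): the step function that is 0 on [0, 1/2) and 1 on [1/2, 1] is discontinuous but lies in 𝐿2 ([0, 1]), and a continuous linear ramp of width 𝛿 approximates it with ‖𝑓 − 𝑔‖² = 𝛿/3, which can be made smaller than any 𝜖:

```python
# Editor's illustration of Lebesgue Axiom 6: approximating a step function
# in the L^2 metric by a continuous ramp g of width delta; the squared
# L^2 distance is exactly delta / 3.

def l2_dist_sq(delta, steps=100_000):
    """Midpoint approximation of the integral over [0, 1] of |f - g|**2."""
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        f = 1.0 if x >= 0.5 else 0.0
        if x < 0.5 - delta:
            g = 0.0
        elif x < 0.5:
            g = (x - (0.5 - delta)) / delta   # linear ramp from 0 up to 1
        else:
            g = 1.0
        total += (f - g) ** 2 * h
    return total

for delta in (0.1, 0.01, 0.001):
    assert abs(l2_dist_sq(delta) - delta / 3) < 1e-4
```

No continuous 𝑔 can match the step function pointwise at the jump, but in the 𝐿2 metric the mismatch is confined to an interval of vanishing length, which is the whole point of measuring distance by an integral.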


We next show that Lebesgue Axiom 6 implies the analogous fact for 𝐿2 (𝐑) (Theorem 7.5.14). However, the precise statement of this analogue is complicated by the fact that continuous functions (such as the constant function 1) need not be in 𝐿2 (𝐑). We therefore come to the following definition.

Definition 7.5.12. The support of a function 𝑓 ∶ 𝑋 → 𝐂 is defined to be the set of all 𝑥 ∈ 𝑋 such that 𝑓(𝑥) ≠ 0. The space 𝐶𝑐0 (𝑋) of continuous functions with compact support is therefore defined to be the set of all 𝑓 ∈ 𝐶 0 (𝐑) such that there exists some closed and bounded interval [𝑎, 𝑏] such that for any 𝑥 ∉ [𝑎, 𝑏], 𝑓(𝑥) = 0.

Before we can prove Theorem 7.5.14, we need the following result, which also sheds some more light on the nature of integrable functions on 𝐑.

Theorem 7.5.13. Suppose 𝑓 ∈ 𝐿2 (𝐑), and for 𝑛 ∈ 𝐍, let

    𝑓𝑛(𝑥) = { 𝑓(𝑥)  if −𝑛 ≤ 𝑥 ≤ 𝑛,
              0      otherwise.       (7.5.18)

Then 𝑓𝑛 converges to 𝑓 both pointwise and in the 𝐿2 metric.

We include the proof of Theorem 7.5.13 here as an example of how to use dominated convergence (Lebesgue Axiom 4).

Proof. First, given 𝑥 ∈ 𝐑 and 𝜖 > 0, let 𝑁(𝜖) = |𝑥|. Then 𝑓𝑛 (𝑥) = 𝑓(𝑥) for 𝑛 > 𝑁(𝜖), so 𝑓𝑛 converges to 𝑓 pointwise. As for convergence in the 𝐿2 norm, since |𝑓𝑛 (𝑥) − 𝑓(𝑥)|² ≤ |𝑓(𝑥)|² and |𝑓(𝑥)|² is Lebesgue integrable, by dominated convergence, we see that

    lim_{𝑛→∞} ∫_𝐑 |𝑓𝑛(𝑥) − 𝑓(𝑥)|² = ∫_𝐑 lim_{𝑛→∞} |𝑓𝑛(𝑥) − 𝑓(𝑥)|² = ∫_𝐑 0 = 0.   (7.5.19)

The theorem follows.

Note that if we think of 𝑓𝑛 , as given by (7.5.18), as a “horizontal truncation” of 𝑓, then one might imagine a “vertical” analogue of Theorem 7.5.13; see Problem 7.5.7 for a statement and proof. In any case, we now come to the analogue of Lebesgue Axiom 6 for 𝐿2 (𝐑).

Theorem 7.5.14. The space of continuous functions with compact support is a dense subspace of 𝐿2 (𝐑).

Again, comparing Definition 2.4.18, this means that for every 𝑓 ∈ 𝐿2 (𝐑) and every 𝜖 > 0, there exists some 𝑔 ∈ 𝐶𝑐0 (𝐑) with ‖𝑓 − 𝑔‖ < 𝜖.

Proof. Problem 7.5.8.

We also have the following facts about 𝐿1 and 𝐿2 for later use.

Theorem 7.5.15. For a closed interval 𝑋 ⊆ 𝐑 (including the possibility 𝑋 = 𝐑), the set 𝐿1 (𝑋) is a function space on 𝑋.

Proof. Problem 7.5.9.
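For one concrete 𝑓 (our choice, 𝑓(𝑥) = 1/(1 + |𝑥|)), the 𝐿2 error in Theorem 7.5.13 can be computed directly: ‖𝑓 − 𝑓𝑛‖² is the tail integral of |𝑓|² over |𝑥| > 𝑛, namely 2 ∫_𝑛^∞ (1 + 𝑥)^{−2} 𝑑𝑥 = 2/(1 + 𝑛), which tends to 0 as the theorem predicts:

```python
# Editor's illustration of Theorem 7.5.13 for f(x) = 1/(1 + |x|):
# ||f - f_n||^2 is the tail integral of |f|^2 over |x| > n, with closed
# form 2/(1 + n) (up to the numerical cutoff used below), tending to 0.

def tail_l2_sq(n, cutoff=10_000.0, steps=100_000):
    """Midpoint approximation of 2 * integral over [n, cutoff] of (1+x)**-2."""
    h = (cutoff - n) / steps
    return 2 * h * sum((1.0 + n + (k + 0.5) * h) ** -2 for k in range(steps))

tails = []
for n in (1, 10, 100):
    t = tail_l2_sq(n)
    # Closed form for the truncated tail: 2/(1 + n) - 2/(1 + cutoff).
    assert abs(t - (2.0 / (1 + n) - 2.0 / 10_001.0)) < 1e-3
    tails.append(t)
assert tails[0] > tails[1] > tails[2]    # the L^2 truncation error decreases
assert tails[-1] < 0.02                  # and is already small at n = 100
```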


Theorem 7.5.16. If 𝑓 ∈ 𝐿2 ([𝑎, 𝑏]), then 𝑓 ∈ 𝐿1 ([𝑎, 𝑏]). Moreover, if lim_{𝑛→∞} 𝑓𝑛 = 𝑓 in 𝐿2 ([𝑎, 𝑏]), then

    lim_{𝑛→∞} ∫_𝑎^𝑏 |𝑓𝑛(𝑥) − 𝑓(𝑥)| 𝑑𝑥 = 0.   (7.5.20)

Proof. Problem 7.5.10.

We conclude our introduction of the Lebesgue integral on the domain 𝑋 by briefly recapitulating our axioms in paraphrase.

• Lebesgue Axiom 1. The function space ℳ(𝑋) contains almost all examples encountered in practice. For any 𝑓 ∈ ℳ(𝑋), ∫_𝑋 |𝑓| is a well-defined nonnegative extended real number.

• Lebesgue Axiom 2. The Lebesgue integral ∫_𝑋 𝑓 is well-defined on the space 𝐿1 (𝑋) of all 𝑓 ∈ ℳ(𝑋) such that ∫_𝑋 |𝑓| < ∞. It extends the Riemann integral and has similar formal properties.

• Lebesgue Axiom 3. The Lebesgue integral ∫_𝑋 𝑓 is unaffected by changing the values of 𝑓 on a set of measure zero.

• Lebesgue Axiom 4. Unlike the Riemann integral, the Lebesgue integral satisfies the monotone and dominated convergence properties.

• Lebesgue Axiom 5. The function space 𝐿2 (𝑋) is an inner product space that is complete in the 𝐿2 metric.

• Lebesgue Axiom 6. Continuous functions (or continuous functions with compact support, for 𝑋 = 𝐑) are dense in 𝐿2 (𝑋).

The reader may also find it helpful to know that of these axioms, Lebesgue Axioms 4, 5, and 6 are the ones we will most often use directly, with Axioms 5 and 6 playing an especially important role in Chapter 8.

Problems.

7.5.1. (*) The goal of this problem is to show that $\lim_{p\to\infty} \|f\|_p = \|f\|_\infty$ when (for simplicity) 𝑋 = [0, 1], 𝑓 ∈ 𝐶0(𝑋), and ‖𝑓‖𝑝 and ‖𝑓‖∞ are both computed in ℳ(𝑋).
(a) For 0 ≤ 𝑎 ≤ 𝑏 ≤ 1 and 𝑐 > 0, let
\[
g(x) = \begin{cases} c & \text{if } a \le x \le b, \\ 0 & \text{otherwise.} \end{cases} \tag{7.5.21}
\]
Prove that $\lim_{p\to\infty} \|g\|_p = c$.
(b) Prove that for 𝑓 ∈ 𝐶0(𝑋), we have that $\lim_{p\to\infty} \|f\|_p = \|f\|_\infty$.


7.5.2. Use Lebesgue Axioms 1 and 2 to prove that if 𝑓, 𝑔 ∈ 𝐿1(𝑋) are real valued and 𝑓(𝑥) ≤ 𝑔(𝑥) for all 𝑥 ∈ 𝑋, then $\int_X f \le \int_X g$.

7.5.3. Let 𝑋 = [0, 1], and let 𝑓, 𝑓𝑛 ∶ 𝑋 → 𝐑 (𝑛 ∈ 𝐍) be defined by
\[
f(x) = \begin{cases} 1 & \text{if } x \text{ is rational,} \\ 0 & \text{if } x \text{ is irrational.} \end{cases} \tag{7.5.22}
\]
Use our axioms for the Lebesgue integral to prove that 𝑓 is measurable, 𝑓 is Lebesgue integrable on 𝑋, and $\int_X f = 0$.

7.5.4. Let 𝑋 = [0, 1], and let 𝑓, 𝑓𝑛 ∶ 𝑋 → 𝐑 (𝑛 ∈ 𝐍) be defined by
\[
f(x) = \frac{1}{\sqrt{x}}, \qquad
f_n(x) = \begin{cases} 0 & \text{if } x < \dfrac{1}{n}, \\[1ex] \dfrac{1}{\sqrt{x}} & \text{if } x \ge \dfrac{1}{n}. \end{cases} \tag{7.5.23}
\]
(a) Prove that for all 𝑥 ∈ [0, 1], 𝑥 ≠ 0, $\lim_{n\to\infty} f_n(x) = f(x)$.
(b) For 𝑛 ∈ 𝐍, prove that 𝑓𝑛(𝑥) is Riemann integrable on 𝑋.
(c) Use our axioms for the Lebesgue integral to prove that 𝑓 is measurable, 𝑓 is Lebesgue integrable on 𝑋, and $\int_X f = 2$.

7.5.5. Let 𝑋 = [1, +∞], and let 𝑓, 𝑓𝑛 ∶ 𝑋 → 𝐑 (𝑛 ∈ 𝐍) be defined by
\[
f(x) = \frac{1}{x^2}, \qquad
f_n(x) = \begin{cases} \dfrac{1}{x^2} & \text{if } x \le n, \\[1ex] 0 & \text{if } x > n. \end{cases} \tag{7.5.24}
\]
(a) Prove that for all 𝑥 ∈ [1, +∞], $\lim_{n\to\infty} f_n(x) = f(x)$.
(b) For 𝑛 ∈ 𝐍, prove that 𝑓𝑛(𝑥) is Riemann integrable on any closed and bounded interval [𝑎, 𝑏] ⊆ 𝑋.
(c) Use our axioms for the Lebesgue integral to prove that 𝑓 is measurable, 𝑓 is Lebesgue integrable on 𝑋, and $\int_X f = 1$.
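The limiting values in Problems 7.5.4(c) and 7.5.5(c) can be previewed numerically: the truncated integrals are $\int_{1/n}^1 x^{-1/2}\,dx = 2 - 2/\sqrt{n}$ and $\int_1^n x^{-2}\,dx = 1 - 1/n$, which increase to 2 and 1 respectively, exactly as monotone convergence predicts. A quick sketch (the function names here are ours, not the text's):

```python
import math

def midpoint(f, a, b, steps=100_000):
    # Midpoint Riemann sum of f on [a, b].
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

for n in (10, 1000):
    # Problem 7.5.4: integral of f_n is 2 - 2/sqrt(n), increasing to 2.
    i4 = midpoint(lambda x: 1 / math.sqrt(x), 1 / n, 1)
    # Problem 7.5.5: integral of f_n is 1 - 1/n, increasing to 1.
    i5 = midpoint(lambda x: 1 / x ** 2, 1, n)
    print(n, i4, i5)
```

The printed values approach 2 and 1 as 𝑛 grows, matching the exact antiderivative computations.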

7.5.6. Prove that if 𝑓 ∈ 𝒮(𝐑) (Definition 4.7.1), then 𝑓 ∈ 𝐿2(𝐑).

7.5.7. Let 𝑋 = [𝑎, 𝑏], 𝑆1, or 𝐑, suppose 𝑓 ∈ 𝐿2(𝑋), and for 𝑛 ∈ 𝐍, let
\[
f_n(x) = \begin{cases} f(x) & \text{if } |f(x)| \le n, \\ 0 & \text{otherwise.} \end{cases} \tag{7.5.25}
\]
Prove that 𝑓𝑛 converges to 𝑓 both pointwise and in the 𝐿2 metric.

7.5.8. (Proves Theorem 7.5.14) Suppose 𝑓 ∈ 𝐿2(𝐑) and 𝜖 > 0. Prove that there exists some 𝑔 ∈ 𝐶𝑐0(𝐑) with ‖𝑓 − 𝑔‖ < 𝜖.

7.5.9. (Proves Theorem 7.5.15) Suppose 𝑋 is a closed interval in 𝐑, 𝑓, 𝑔 ∈ 𝐿1(𝑋), and 𝑐 ∈ 𝐂. Prove that 𝑓 + 𝑔 ∈ 𝐿1(𝑋) and 𝑐𝑓 ∈ 𝐿1(𝑋).


7.5.10. (Proves Theorem 7.5.16) Let ‖𝑓‖ denote the 𝐿2([𝑎, 𝑏]) norm.
(a) Prove that $\int_a^b |f|\,dx \le \sqrt{b - a}\,\|f\|$.
(b) Prove that if 𝑓 ∈ 𝐿2([𝑎, 𝑏]), then 𝑓 ∈ 𝐿1([𝑎, 𝑏]).
(c) Prove that if $\lim_{n\to\infty} f_n = f$ in 𝐿2([𝑎, 𝑏]), then
\[
\lim_{n\to\infty} \int_a^b |f_n(x) - f(x)|\,dx = 0. \tag{7.5.26}
\]
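Part (a) of Problem 7.5.10 is the Cauchy-Schwarz inequality applied to |𝑓| and the constant function 1. A numerical sanity check on a sample function of our own choosing (nothing in this sketch comes from the text itself):

```python
import math
import random

# Check the bound  ∫_a^b |f| dx  <=  sqrt(b - a) * ||f||_{L^2}
# on [0, 2] for a randomly generated trigonometric sample function.
random.seed(0)
a, b, steps = 0.0, 2.0, 10_000
cs = [random.uniform(-1, 1) for _ in range(4)]
f = lambda x: cs[0] + cs[1] * math.sin(x) + cs[2] * math.cos(2 * x) + cs[3] * math.sin(3 * x)

h = (b - a) / steps
xs = [a + (i + 0.5) * h for i in range(steps)]
l1 = sum(abs(f(x)) for x in xs) * h              # approximates the L^1 norm
l2 = math.sqrt(sum(f(x) ** 2 for x in xs) * h)   # approximates the L^2 norm
print(l1, math.sqrt(b - a) * l2)                 # first value never exceeds second
```

The inequality also holds exactly for the discrete sums themselves, by the finite-dimensional Cauchy-Schwarz inequality, so the check is robust to quadrature error.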

7.6 Hilbert spaces

We can now finally define the main idea of this chapter.

Definition 7.6.1. A Hilbert space is an inner product space that is complete in the inner product metric.

As the reader may have guessed, the Hilbert spaces in which we are most interested are of the form 𝐿2(𝑋), where 𝑋 = [𝑎, 𝑏], 𝑆1, or 𝐑; these spaces are Hilbert spaces by Theorem 7.5.11 and Lebesgue Axiom 5. To give another immediate example:

Theorem 7.6.2. For 𝑋 = 𝐍 or 𝐙, the inner product space ℓ2(𝑋) is a Hilbert space.

We use Notation 7.2.20 throughout the proof of Theorem 7.6.2. By our summation conventions, we also see that it suffices to consider the case 𝑋 = 𝐍, though we continue to use 𝑋 to distinguish between 𝑥 ∈ 𝑋 and the indices 𝑘, 𝑛 ∈ 𝐍 of a sequence in ℓ2(𝑋).

Proof. Suppose 𝑎𝑛(𝑥) is a Cauchy sequence in ℓ2(𝑋); in other words, suppose that for every 𝜖0 > 0, there exists some 𝑁0(𝜖0) such that for 𝑘, 𝑛 ∈ 𝐍 with 𝑘, 𝑛 > 𝑁0(𝜖0), we have that ‖𝑎𝑘 − 𝑎𝑛‖ < 𝜖0. First, we observe that for fixed 𝑥0 ∈ 𝑋,
\[
|a_k(x_0) - a_n(x_0)|^2 \le \sum_{x\in X} |a_k(x) - a_n(x)|^2 = \|a_k - a_n\|^2, \tag{7.6.1}
\]
which implies that 𝑎𝑘(𝑥0) is a Cauchy sequence (in the variable 𝑘). By the completeness of 𝐂, we see that $\lim_{k\to\infty} a_k(x_0) = a(x_0)$ for some 𝑎(𝑥0) ∈ 𝐂; in other words, there exists some 𝑎 ∶ 𝑋 → 𝐂 such that $\lim_{k\to\infty} a_k(x) = a(x)$ pointwise. It remains to show that 𝑎 ∈ ℓ2(𝑋) and that 𝑎𝑛 converges to 𝑎 in the inner product metric.

So for 𝜖 > 0, let 𝑁(𝜖) = 𝑁0(𝜖/2). Fix 𝑀 ∈ 𝑋 = 𝐍. By the definition of Cauchy sequence, for 𝑘, 𝑛 > 𝑁(𝜖), we have
\[
\sum_{x=1}^{M} |a_k(x) - a_n(x)|^2 \le \sum_{x\in X} |a_k(x) - a_n(x)|^2 < \epsilon^2/4. \tag{7.6.2}
\]
Taking $\lim_{k\to\infty}$ on both sides of (7.6.2) and applying the limit laws for sequences in 𝑘 and Theorem 2.4.10, we see that
\[
\sum_{x=1}^{M} |a(x) - a_n(x)|^2 = \lim_{k\to\infty} \sum_{x=1}^{M} |a_k(x) - a_n(x)|^2 \le \epsilon^2/4. \tag{7.6.3}
\]


Therefore, since (7.6.3) holds for all 𝑀 ∈ 𝐍, we see that
\[
\|a - a_n\| = \left( \sum_{x=1}^{\infty} |a(x) - a_n(x)|^2 \right)^{1/2} \le \epsilon/2 < \epsilon \tag{7.6.4}
\]
for all 𝑛 > 𝑁(𝜖). Therefore, if 𝑎 ∈ ℓ2(𝑋), then 𝑎𝑛 converges to 𝑎 in the inner product norm, so it remains only to show that ‖𝑎‖ is finite. Note that if we already knew that 𝑎 ∈ ℓ2(𝑋), it would follow that ‖𝑎‖ ≤ ‖𝑎 − 𝑎𝑛‖ + ‖𝑎𝑛‖, but to avoid circular logic, we need to be more careful. So fix some 𝑛 > 𝑁(1). For fixed 𝑀 ∈ 𝑋 = 𝐍, applying the triangle inequality to the inner product norm on 𝐂𝑀, we see that
\[
\left( \sum_{x=1}^{M} |a(x)|^2 \right)^{1/2}
\le \left( \sum_{x=1}^{M} |a(x) - a_n(x)|^2 \right)^{1/2} + \left( \sum_{x=1}^{M} |a_n(x)|^2 \right)^{1/2}
\le \|a - a_n\| + \|a_n\| < 1 + \|a_n\|. \tag{7.6.5}
\]
Then, since the right-hand side of (7.6.5) is finite by hypothesis and independent of 𝑀, we see that ‖𝑎‖ must be finite as well. The theorem follows.

Remark 7.6.3. While it may not be obvious how ℓ2(𝑋) relates to our main problem of the convergence of Fourier series, we will see in Theorem 7.6.8 that ℓ2(𝑋) is actually a canonical Hilbert space in exactly the same way that 𝐑𝑛 is a canonical finite-dimensional (real) vector space. See Remark 7.6.9 for a precise statement. Also, while the proof of Theorem 7.6.2 is somewhat tangential to our main story, we include it because it is relatively brief but still includes many of the main ingredients in proving that any function space 𝑉 is complete: Given a Cauchy sequence 𝑓𝑛 in 𝑉, we first find some 𝑓 to which 𝑓𝑛 converges pointwise and then verify both that 𝑓 is in 𝑉 and also that 𝑓𝑛 converges to 𝑓 in the relevant norm.

We now return to our discussion of orthogonal bases from Section 7.3 with the notion of Hilbert space at our disposal. For notational convenience, in the rest of this section, we always take our index set for orthogonal sets to be 𝐍, but the analogous results hold for bases indexed by 𝐙 or {1, …, 𝑛}. In any case, the point is that the advantage of studying orthogonality on a Hilbert space ℋ is that the completeness of ℋ allows us to get more information out of the existence of an orthogonal basis, mainly via the following useful result.

Theorem 7.6.4 (Hilbert Space Absolute Convergence Theorem). Let ℋ be a Hilbert space, let {𝑢𝑛 ∣ 𝑛 ∈ 𝐍} be an orthogonal set of nonzero vectors in ℋ, and let 𝑐𝑛 ∈ 𝐂 be coefficients. Then the following are equivalent:

(1) The infinite series $\sum_{n=1}^{\infty} c_n u_n$ converges to some element of ℋ.

(2) The infinite series $\sum_{n=1}^{\infty} |c_n|^2 \|u_n\|^2$ converges in 𝐑.

We can think of this as saying that any infinite sum of orthogonal vectors converges in a Hilbert space if and only if it converges absolutely. (Compare Corollary 4.1.5 and Definition 4.1.6.) Note that as a consequence, if {𝑢𝑛 ∣ 𝑛 ∈ 𝐙} is an orthogonal subset of a Hilbert space ℋ, then we may sum the series $\sum_{n\in\mathbf{Z}} c_n u_n$ in any order we please, including the synchronous summation order (Definition 4.1.8) that arises naturally with Fourier series.

Proof. Problem 7.6.2.

Absolute convergence in Hilbert spaces has several immediate applications. For example, we can show that a generalized Fourier series (Definition 7.3.9) of some 𝑓 in a Hilbert space always converges, though not necessarily to 𝑓.

Corollary 7.6.5. Let ℋ be a Hilbert space, let ℬ = {𝑢𝑛 ∣ 𝑛 ∈ 𝐍} be an orthogonal set of nonzero vectors in ℋ, and for 𝑓 ∈ ℋ, let $\hat f(n) = \dfrac{\langle f, u_n \rangle}{\langle u_n, u_n \rangle}$ be the 𝑛th generalized Fourier coefficient of 𝑓 relative to ℬ. Then $\sum_{n=1}^{\infty} \hat f(n) u_n$ converges to some 𝑔 ∈ ℋ.

2

̂ || ‖𝑢𝑛 ‖2 ≤ ‖𝑓‖ for all 𝑁 ∈ 𝐍. It follows Proof. By Bessel’s inequality (7.3.10), ∑ ||𝑓(𝑛) 𝑛=1 𝑁

2

̂ || ‖𝑢𝑛 ‖2 is a monotone sequence (in the variable 𝑁) that is bounded above that ∑ ||𝑓(𝑛) 𝑛=1 ∞

2

̂ || ‖𝑢𝑛 ‖2 converges by the convergence of monotone by ‖𝑓‖, which means that ∑ ||𝑓(𝑛) 𝑛=1 ∞

̂ sequences (Theorem 2.4.14). Then by Theorem 7.6.4, ∑ 𝑓(𝑛)𝑢 𝑛 converges to some 𝑛=1

𝑔 ∈ ℋ. We pause to note the following nonobvious consequence of Corollary 7.6.5: the Riemann-Lebesgue Lemma for Fourier series. Corollary 7.6.6 (Riemann-Lebesgue Lemma). Let ℋ be a Hilbert space, let 𝐼 = 𝐍 or 𝐙, ̂ be the 𝑛th let ℬ = {𝑢𝑛 ∣ 𝑛 ∈ 𝐍} be an orthogonal set of nonzero vectors in ℋ, and let 𝑓(𝑛) ̂ generalized Fourier coefficient of 𝑓 relative to ℬ. Then for 𝑓 ∈ ℋ, lim 𝑓(𝑛) ‖𝑢𝑛 ‖ = 0, 𝑛→∞

̂ ‖𝑢𝑛 ‖ = 0 as well. In particular, if ℬ is orthonormal, then and if 𝐼 = 𝐙, then lim 𝑓(𝑛) 𝑛→−∞

̂ = 0. lim 𝑓(𝑛)

𝑛→±∞

Proof. See Problem 7.6.3 for the case 𝐼 = 𝐍; the case 𝐼 = 𝐙 is similar. We also have the following corollary of, and analogue of, the comparison test for series (Corollary 4.1.4). Corollary 7.6.7 (Hilbert Space Comparision Test). Let ℋ be a Hilbert space, and let ℬ = {𝑢𝑛 ∣ 𝑛 ∈ 𝐍}

(7.6.6)

174

Chapter 7. Hilbert spaces ∞

be an orthogonal set of nonzero vectors in ℋ. If 𝑏𝑛 , 𝑐𝑛 ∈ 𝐂, ∑ 𝑐𝑛 𝑢𝑛 converges in ℋ, and 𝑛=1 ∞

|𝑏𝑛 | ≤ |𝑐𝑛 | for all 𝑛 ∈ 𝐍, then ∑ 𝑏𝑛 𝑢𝑛 also converges in ℋ. 𝑛=1

Proof. Problem 7.6.4.

For another application of Theorem 7.6.4, see Problem 7.6.5. The full power of the existence of an orthogonal basis can now be demonstrated in the following result, which we call the Isomorphism Theorem for Fourier Series.

Theorem 7.6.8 (Isomorphism Theorem for Fourier Series). Let ℋ be a Hilbert space, and let ℬ = {𝑢𝑛 ∣ 𝑛 ∈ 𝐍} ⊂ ℋ be an orthogonal set of nonzero vectors. Then the following are equivalent:

(1) ℬ is an orthogonal basis for ℋ.

(2) (Parseval's identity) For any 𝑓, 𝑔 ∈ ℋ,
\[
\langle f, g \rangle = \sum_{n=1}^{\infty} \hat f(n)\,\overline{\hat g(n)}\,\langle u_n, u_n \rangle. \tag{7.6.7}
\]

(3) (Parseval's identity) For any 𝑓 ∈ ℋ,
\[
\|f\|^2 = \sum_{n=1}^{\infty} |\hat f(n)|^2\,\langle u_n, u_n \rangle. \tag{7.6.8}
\]

(4) For any 𝑓 ∈ ℋ, if ⟨𝑓, 𝑢𝑛⟩ = 0 for all 𝑛 ∈ 𝐍, then 𝑓 = 0.

In particular, if {𝑒𝑛 ∣ 𝑛 ∈ 𝐙} is an orthonormal basis for a Hilbert space ℋ, then for any 𝑓 ∈ ℋ,
\[
\|f\|^2 = \sum_{n\in\mathbf{Z}} |\hat f(n)|^2. \tag{7.6.9}
\]

Proof. (1) implies (2): Problem 7.6.6. (2) implies (3): Take 𝑔 = 𝑓. (3) implies (4): If ⟨𝑓, 𝑢𝑛⟩ = 0 for all 𝑛 ∈ 𝐍, then
\[
\|f\|^2 = \sum_{n=1}^{\infty} |\hat f(n)|^2\,\langle u_n, u_n \rangle = \sum_{n=1}^{\infty} 0 \cdot \langle u_n, u_n \rangle = 0. \tag{7.6.10}
\]
By the axioms of an inner product, 𝑓 = 0. (4) implies (1): Problem 7.6.7.

Remark 7.6.9. If we can find an orthogonal basis for a Hilbert space ℋ, Theorem 7.6.8 tells us that not only does the Fourier series of any 𝑓 ∈ ℋ converge to 𝑓 in the inner product metric, but also, we can compute inner products and norms purely in terms of Fourier coefficients by (7.6.7) and (7.6.8). Put another way, if {𝑒𝑛 ∣ 𝑛 ∈ 𝐍} is an orthonormal basis for ℋ, then the mapping 𝑓 ↦ 𝑓̂(𝑛) is a linear transformation from ℋ to ℓ2(𝐍) that preserves inner products and norms, which we might naturally call an isomorphism of Hilbert spaces. (See Problem 7.6.8 for a precise statement and proof.) As promised in Remark 7.6.3, this is exactly the sense in which ℓ2(𝐍) is the canonical Hilbert space with a countably infinite orthogonal basis.
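In finite dimensions, where an orthogonal set of nonzero vectors spanning the space is automatically an orthogonal basis, Parseval's identity (7.6.7) can be checked by direct computation. A small sketch in 𝐂³ (the vectors and test data below are our own choices, used only for illustration):

```python
# Finite-dimensional sanity check of Parseval's identity (7.6.7): for an
# orthogonal (not orthonormal) basis u_n of C^3, the generalized coefficients
# fhat(n) = <f, u_n>/<u_n, u_n> satisfy
#     <f, g> = sum_n fhat(n) * conj(ghat(n)) * <u_n, u_n>.
def inner(v, w):
    # <v, w>, conjugate-linear in the second argument.
    return sum(a * b.conjugate() for a, b in zip(v, w))

u = [(1, 1, 0), (1, -1, 0), (0, 0, 2)]   # pairwise orthogonal, nonzero
f = (2 + 1j, -1, 3)
g = (0.5, 2j, 1 - 1j)

def hat(v, n):
    return inner(v, u[n]) / inner(u[n], u[n])

lhs = inner(f, g)
rhs = sum(hat(f, n) * hat(g, n).conjugate() * inner(u[n], u[n]) for n in range(3))
print(lhs, rhs)  # agree up to floating-point rounding
```

Note the conjugate on the second coefficient, which matches the conjugate-linearity of the inner product in its second slot.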


Remark 7.6.10. The reader may also have noticed that we have yet to demonstrate an example of an orthogonal basis for any 𝐿2 (𝑋) arising “in nature”! This is where pure abstraction reaches its limit and hard work comes in, and so proving that {𝑒𝑛 ∣ 𝑛 ∈ 𝐙} is an orthonormal basis for 𝐿2 (𝑆1 ) becomes the main topic of the next chapter.

Problems.

7.6.1. Let 𝑉 be an inner product space and let {𝑢𝑛 ∣ 𝑛 ∈ 𝐍} be an orthogonal set of nonzero vectors in 𝑉.
(a) Prove that for 𝑐𝑛 ∈ 𝐂, if $\sum_{n=1}^{\infty} c_n u_n$ converges to 𝑓 in the 𝐿2 metric on 𝑉, then the generalized Fourier coefficient 𝑓̂(𝑘) = 𝑐𝑘.
(b) Prove that for 𝑏𝑛, 𝑐𝑛 ∈ 𝐂, if $\sum_{n=1}^{\infty} b_n u_n = \sum_{n=1}^{\infty} c_n u_n$, with both series converging in the 𝐿2 metric on 𝑉, then 𝑏𝑛 = 𝑐𝑛 for all 𝑛 ∈ 𝐍.

7.6.2. (Proves Theorem 7.6.4) Let ℋ be a Hilbert space, let {𝑢𝑛 ∣ 𝑛 ∈ 𝐍} be an orthogonal set of nonzero vectors in ℋ, and let 𝑐𝑛 ∈ 𝐂 be coefficients.
(a) Prove that if $\sum_{n=1}^{\infty} c_n u_n = f \in \mathcal{H}$, then $\sum_{n=1}^{\infty} |c_n|^2 \|u_n\|^2$ converges.
(b) Prove that if $\sum_{n=1}^{\infty} |c_n|^2 \|u_n\|^2$ converges, then $\sum_{n=1}^{\infty} c_n u_n$ converges to some $f \in \mathcal{H}$.

7.6.3. (Proves Corollary 7.6.6) Let ℋ be a Hilbert space, let ℬ = {𝑢𝑛 ∣ 𝑛 ∈ 𝐍} be an orthogonal set of nonzero vectors in ℋ, and let 𝑓̂(𝑛) be the 𝑛th generalized Fourier coefficient of 𝑓 with respect to ℬ. Prove that $\lim_{n\to\infty} \hat f(n)\,\|u_n\| = 0$.

7.6.4. (Proves Corollary 7.6.7) Let ℋ be a Hilbert space, and let ℬ = {𝑢𝑛 ∣ 𝑛 ∈ 𝐍} be an orthogonal set of nonzero vectors in ℋ. Prove that if 𝑏𝑛, 𝑐𝑛 ∈ 𝐂, $\sum_{n=1}^{\infty} c_n u_n$ converges in ℋ, and |𝑏𝑛| ≤ |𝑐𝑛| for all 𝑛 ∈ 𝐍, then $\sum_{n=1}^{\infty} b_n u_n$ also converges in ℋ.

7.6.5. Let ℋ be a Hilbert space, and let ℬ = {𝑢𝑛 ∣ 𝑛 ∈ 𝐍} be an orthonormal set of vectors in ℋ. Prove that if there exists some 𝐾 ∈ 𝐑 such that $|c_n| \le \dfrac{K}{n}$ for all 𝑛 ∈ 𝐍, then $\sum_{n=1}^{\infty} c_n u_n$ converges in ℋ.

7.6.6. (Proves Theorem 7.6.8) Let 𝑉 be an inner product space, and let ℬ = {𝑢𝑛 ∣ 𝑛 ∈ 𝐍} ⊂ 𝑉 be an orthogonal basis for 𝑉, which, we recall, means that for any 𝑓 ∈ 𝑉,
\[
\sum_{n=1}^{\infty} \hat f(n) u_n = \lim_{N\to\infty} \sum_{n=1}^{N} \hat f(n) u_n = f \tag{7.6.11}
\]
in the inner product metric.

(a) Prove that if 𝑓, 𝑔 ∈ 𝑉 and 𝑁, 𝐾 ∈ 𝐍, then
\[
\left\langle \sum_{n=1}^{N} \hat f(n) u_n, \sum_{k=1}^{K} \hat g(k) u_k \right\rangle = \sum_{n=1}^{\min(N,K)} \hat f(n)\,\overline{\hat g(n)}\,\langle u_n, u_n \rangle. \tag{7.6.12}
\]
(b) Prove Parseval's identity
\[
\langle f, g \rangle = \sum_{n=1}^{\infty} \hat f(n)\,\overline{\hat g(n)}\,\langle u_n, u_n \rangle. \tag{7.6.13}
\]

7.6.7. (Proves Theorem 7.6.8) Let ℋ be a Hilbert space, let {𝑢𝑛 ∣ 𝑛 ∈ 𝐍} be an orthogonal set of nonzero vectors in ℋ, and suppose we know that if ⟨𝑓, 𝑢𝑛⟩ = 0 for all 𝑛 ∈ 𝐍, then 𝑓 = 0. Prove that $\sum_{n=1}^{\infty} \hat f(n) u_n = f$.

7.6.8. (*) Let ℋ be a Hilbert space, and suppose {𝑒𝑛 ∣ 𝑛 ∈ 𝐍} is an orthonormal basis for ℋ. Define Φ ∶ ℋ → ℓ2(𝐍) by Φ(𝑓) = 𝑓̂(𝑛).
(a) Prove that Φ is well-defined (what could potentially go wrong with that definition?), linear, and bijective.
(b) An isomorphism of Hilbert spaces is a linear bijection between Hilbert spaces that "preserves inner products." Make this idea precise and prove that Φ is an isomorphism of Hilbert spaces.

8 Convergence of Fourier series

. . . [I]n Analytic Theory of Heat Fourier supplies the first modern definition of convergence, as well as introduces the vital idea of convergence in an interval. But. . . Fourier fails to give a rigorous proof, or even to spell out convergence-criteria that would make such a proof possible. So the thing is now we're talking about convergence.
— David Foster Wallace, Everything and More

In this chapter, we complete the task we set for ourselves in Part 2 of this book, namely, to establish natural conditions under which the Fourier series of 𝑓 ∶ 𝑆1 → 𝐂 converges to 𝑓. We begin by briefly reviewing our story so far (Section 8.1) and stating our main result: the Inversion Theorem for Fourier Series (Theorem 8.1.1). Our approach to proving the Inversion Theorem, taken from Stein and Shakarchi [SS03], requires the introduction of two new concepts: convolutions (Section 8.2) and Dirac kernels (Section 8.3). Armed with these new tools, we then prove the Inversion Theorem and related results (Section 8.4) and discuss some consequences (Section 8.5).

8.1 Fourier series in 𝐿2(𝑆1)

In this section we review what happens when we apply Sections 7.3 and 7.6 in the case of the Hilbert space 𝐿2(𝑆1). Nothing new will be presented, but it may help the reader to have it all in one place.

Let ℬ be the orthonormal set {𝑒𝑛 ∣ 𝑛 ∈ 𝐙} in 𝐿2(𝑆1), where 𝑒𝑛(𝑥) = 𝑒^{2𝜋𝑖𝑛𝑥}. For 𝑓 ∈ 𝐿2(𝑆1) and 𝑛 ∈ 𝐙, since 𝑒𝑛 ∈ 𝐿2(𝑆1), we define
\[
\hat f(n) = \langle f, e_n \rangle = \int_0^1 f(x)\,\overline{e_n(x)}\,dx. \tag{8.1.1}
\]
We define the 𝑁th Fourier polynomial of 𝑓 to be the projection of 𝑓 onto {𝑒−𝑁, …, 𝑒𝑁}, or
\[
f_N(x) = \sum_{n=-N}^{N} \hat f(n)\,e_n(x), \tag{8.1.2}
\]


and we define the Fourier series of 𝑓 to be
\[
f(x) \sim \lim_{N\to\infty} f_N(x) = \sum_{n\in\mathbf{Z}} \hat f(n)\,e_n(x). \tag{8.1.3}
\]
As before (Definition 6.2.1), ∼ indicates merely that what is on the right-hand side is the Fourier series of 𝑓 and, so far as we have proven until now, need not have any implications in terms of convergence. We have proven a few things about Fourier polynomials and series, however.

• Comparing (8.1.1), (8.1.2), and the Best Approximation Theorem 7.3.12, we see that for any 𝑓 ∈ 𝐿2(𝑆1), the 𝑁th Fourier polynomial of 𝑓 is the trigonometric polynomial of degree 𝑁 that is closest to 𝑓 in the 𝐿2 metric. Furthermore, we always have ‖𝑓𝑁‖ ≤ ‖𝑓‖ (Bessel's inequality).

• By the Riemann-Lebesgue Lemma (Corollary 7.6.6), for any 𝑓 ∈ 𝐿2(𝑆1), we have that $\lim_{n\to\pm\infty} \hat f(n) = 0$.

• Suppose we can prove that ℬ = {𝑒𝑛 ∣ 𝑛 ∈ 𝐙} is an orthonormal basis for 𝐿2(𝑆1), or in other words, that for any 𝑓 ∈ 𝐿2(𝑆1), the Fourier series of 𝑓 converges to 𝑓 in the 𝐿2 metric. If we can manage to do that, we will know that all of the equivalent conditions of the Isomorphism Theorem 7.6.8 hold. For example, we will know that ⟨𝑓, 𝑔⟩ can be computed by taking the "infinite series dot product" (7.6.7) of 𝑓̂(𝑛) and 𝑔̂(𝑛); in particular, we will know that
\[
\|f\|^2 = \sum_{n\in\mathbf{Z}} |\hat f(n)|^2. \tag{8.1.4}
\]

Therefore, the principal remaining problem in establishing a satisfactory theory of Fourier series in 𝐿2 is to prove the following theorem.

Theorem 8.1.1 (Inversion Theorem for Fourier Series). Let ℬ = {𝑒𝑛 ∣ 𝑛 ∈ 𝐙}, where 𝑒𝑛(𝑥) = 𝑒^{2𝜋𝑖𝑛𝑥}. Then ℬ is an orthonormal basis for 𝐿2(𝑆1). In other words, for any 𝑓 ∈ 𝐿2(𝑆1),
\[
f = \sum_{n\in\mathbf{Z}} \hat f(n)\,e_n, \tag{8.1.5}
\]
where convergence on the right-hand side of (8.1.5) is in the 𝐿2 metric.

The Inversion Theorem is so named because we can also think of it as indicating that any 𝑓 ∈ 𝐿2(𝑆1) can be recovered from its Fourier coefficients 𝑓̂(𝑛), or in other words, the transformation 𝑓(𝑥) ↦ 𝑓̂(𝑛) has a well-defined inverse.

Now, a sequence of functions may converge in 𝐿2 without converging pointwise anywhere (see Problem 7.2.1). Nevertheless, we can use convergence in the 𝐿2 metric to obtain results about pointwise, or even uniform, convergence for sufficiently smooth functions. For example, recall that in Theorem 6.4.4, we reached the frustrating conclusion that if 𝑓 ∈ 𝐶2(𝑆1), then the Fourier series of 𝑓 converges uniformly to some continuous function 𝑔 with the same Fourier coefficients as 𝑓. However, thanks to Theorem 8.1.1 and what we call the Extra Derivative Lemma (Lemma 8.4.7), we will show that:

Theorem 8.1.2. If 𝑓 ∈ 𝐶1(𝑆1), then the Fourier series of 𝑓 converges absolutely and uniformly to 𝑓.
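As a numerical illustration of (8.1.1), (8.1.2), and (8.1.4), and a preview of Theorem 8.1.2, one can approximate the coefficient integrals by Riemann sums and watch 𝑓𝑁 approach a continuous, piecewise-𝐶1 function. The sample function and grid size below are our own choices:

```python
import cmath
import math

S = 2048  # grid size for Riemann-sum approximations on S^1

def coeffs(f, N):
    # Approximate fhat(n) = integral_0^1 f(x) e^{-2 pi i n x} dx, as in (8.1.1).
    return {n: sum(f(k / S) * cmath.exp(-2j * math.pi * n * k / S)
                   for k in range(S)) / S
            for n in range(-N, N + 1)}

tri = lambda x: 1 - 4 * abs((x % 1.0) - 0.5)  # continuous triangle wave on S^1

N = 32
c = coeffs(tri, N)
fN = lambda x: sum(cn * cmath.exp(2j * math.pi * n * x) for n, cn in c.items()).real
sup_err = max(abs(fN(i / 101) - tri(i / 101)) for i in range(101))
parseval = sum(abs(cn) ** 2 for cn in c.values())
print(sup_err)   # small: the Fourier polynomials approach this f uniformly
print(parseval)  # approaches ||f||^2 = integral of tri^2 = 1/3, as in (8.1.4)
```

The triangle wave is continuous and piecewise 𝐶1, so its coefficients decay like 1/𝑛², and both the sup-norm error and the Parseval defect are already small at 𝑁 = 32.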


In fact, we will even show that if 𝑓 ∈ 𝐶 0 (𝑆1 ), then the Fourier series of 𝑓 converges uniformly to 𝑓, if we sum it in a different way; see Corollary 8.4.4.

8.2 Convolutions

We now switch back, for the moment, from the fancy-pants world of 𝐿2(𝑆1) to the world of continuous functions and ordinary calculus. We begin by observing that integrals on 𝑆1 are translation invariant, or to be precise:

Lemma 8.2.1. For 𝑓 ∈ 𝐶0(𝑆1) and 𝑎 ∈ 𝐑, we have that
\[
\int_0^1 f(x + a)\,dx = \int_0^1 f(x)\,dx. \tag{8.2.1}
\]

Proof. Problem 8.2.1.

To prove our main results (Theorems 8.1.1 and 8.1.2), we will use two new ideas. The first of these ideas, convolution, turns out to be quite useful in Fourier analysis and other situations. While convolutions can be defined on many different domains, we will only use two cases, one of which is the following.

Definition 8.2.2. For 𝑓, 𝑔 ∈ 𝐶0(𝑆1), the convolution 𝑓 ∗ 𝑔 ∶ 𝑆1 → 𝐂 is defined by the formula
\[
(f * g)(x) = \int_0^1 f(x - t)\,g(t)\,dt. \tag{8.2.2}
\]

Remark 8.2.3. Definition 8.2.2 actually works for 𝑓, 𝑔 ∈ 𝐿2(𝑆1), or even (much less obviously) 𝑓, 𝑔 ∈ 𝐿1(𝑆1), but for simplicity, for now we stick to the case of continuous 𝑓 and 𝑔. The reader should also check that the integrand 𝑓(𝑥 − 𝑡)𝑔(𝑡) is periodic of period 1 in the variable 𝑡, making it a genuine function on 𝑆1.

Theorem 8.2.4. Convolution has the following properties:

(1) For 𝑓, 𝑔 ∈ 𝐶0(𝑆1), 𝑓 ∗ 𝑔 ∈ 𝐶0(𝑆1).

(2) Convolution is linear in each variable. That is, for 𝑐1, 𝑐2 ∈ 𝐂 and 𝑓1, 𝑓2, 𝑓, 𝑔1, 𝑔2, 𝑔 ∈ 𝐶0(𝑆1), we have that
\[
(c_1 f_1 + c_2 f_2) * g = c_1 (f_1 * g) + c_2 (f_2 * g), \qquad
f * (c_1 g_1 + c_2 g_2) = c_1 (f * g_1) + c_2 (f * g_2). \tag{8.2.3}
\]

(3) For 𝑓, 𝑔 ∈ 𝐶0(𝑆1), (𝑓 ∗ 𝑔)(𝑥) = (𝑔 ∗ 𝑓)(𝑥).

(4) For 𝑓, 𝑔, ℎ ∈ 𝐶0(𝑆1), ((𝑓 ∗ 𝑔) ∗ ℎ)(𝑥) = (𝑓 ∗ (𝑔 ∗ ℎ))(𝑥).

(5) For 𝑓 ∈ 𝐶1(𝑆1) and 𝑔 ∈ 𝐶0(𝑆1), we have 𝑓 ∗ 𝑔 ∈ 𝐶1(𝑆1) and $\dfrac{d}{dx}\bigl((f * g)(x)\bigr) = \Bigl(\dfrac{df}{dx} * g\Bigr)(x)$.

(6) For 𝑓, 𝑔 ∈ 𝐶0(𝑆1), $\widehat{f * g}(n) = \hat f(n)\,\hat g(n)$.

Proof. Problems 8.2.2–8.2.7.
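Property (6) has an exact discrete analogue: if the integral in (8.2.2) is replaced by an average over an equispaced grid, the resulting circular convolution obeys the same coefficient multiplication rule, so a direct computation reproduces property (6) to rounding error. A sketch (grid size and sample functions are our own choices):

```python
import cmath
import math

S = 128  # equispaced sample points on S^1

def coeff(vals, n):
    # Riemann-sum Fourier coefficient of a function sampled at k/S.
    return sum(v * cmath.exp(-2j * math.pi * n * k / S)
               for k, v in enumerate(vals)) / S

def convolve(fv, gv):
    # Discrete analogue of (8.2.2): (f*g)(x) = integral f(x - t) g(t) dt.
    return [sum(fv[(k - j) % S] * gv[j] for j in range(S)) / S for k in range(S)]

f = [math.cos(2 * math.pi * k / S) + 0.5 * math.sin(4 * math.pi * k / S) for k in range(S)]
g = [1.0 + math.cos(4 * math.pi * k / S) for k in range(S)]
fg = convolve(f, g)

# Coefficient of the convolution equals the product of coefficients.
err = max(abs(coeff(fg, n) - coeff(f, n) * coeff(g, n)) for n in range(-3, 4))
print(err)  # tiny: floating-point rounding only
```

Expanding the double sum shows the discrete identity holds exactly, independent of the grid size, which is one reason the fast Fourier transform is used to compute convolutions in practice.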


Remark 8.2.5. Property (6) of Theorem 8.2.4 is perhaps the signature property of convolution, in that it states that convolution of functions corresponds to multiplication of the Fourier coefficients. For the reader interested in acoustics (Section 1.2) or signal processing (Section 14.2), this means that if 𝑓, 𝑔 ∈ 𝐶0(𝑆1) represent tones (idealized periodic sound waves), then 𝑓 ∗ 𝑔 is the tone 𝑓 enhanced in the strong harmonics of 𝑔 and diminished in the weak harmonics of 𝑔. Somewhat surprisingly, by Theorem 8.2.4, the opposite is true as well: 𝑓 ∗ 𝑔 = 𝑔 ∗ 𝑓 is 𝑔 enhanced in the strong harmonics of 𝑓 and diminished in the weak harmonics of 𝑓. See Sections 12.2, 13.3, 14.2, and 14.6 for other interpretations of convolution and further related discussion.

Remark 8.2.6. It is also a useful fact that property (5) of Theorem 8.2.4 means that differentiability can be transferred from 𝑓 to 𝑔 by taking the convolution 𝑓 ∗ 𝑔. Property (6) of Theorem 8.2.4 gives another explanation for this smoothing effect: By Theorem 6.4.2, if 𝑓 ∈ 𝐶𝑟(𝑆1), then 𝑓̂(𝑛) decays at least as fast as $K/|n|^r$, so convolution with 𝑓 multiplies the 𝑛th Fourier coefficient by a factor of size at most $K/|n|^r$, which has the effect of dampening higher frequencies.

Problems.

8.2.1. (Proves Lemma 8.2.1) For 𝑓 ∈ 𝐶0(𝑆1) and 𝑎 ∈ 𝐑, prove that $\int_0^1 f(x + a)\,dx = \int_0^1 f(x)\,dx$.

8.2.2. (Proves Theorem 8.2.4) For 𝑓 ∈ 𝐶0(𝑆1) and 𝑔 integrable on 𝑆1, prove that 𝑓 ∗ 𝑔 is continuous on 𝑆1.

8.2.3. (Proves Theorem 8.2.4) Suppose 𝑐1, 𝑐2 ∈ 𝐂 and 𝑓1, 𝑓2, 𝑔 ∈ 𝐶0(𝑆1). Prove that if 𝑥 ∈ 𝑆1, then
\[
((c_1 f_1 + c_2 f_2) * g)(x) = c_1 (f_1 * g)(x) + c_2 (f_2 * g)(x). \tag{8.2.4}
\]

8.2.4. (Proves Theorem 8.2.4) For 𝑓, 𝑔 ∈ 𝐶0(𝑆1), prove that (𝑓 ∗ 𝑔)(𝑥) = (𝑔 ∗ 𝑓)(𝑥).

8.2.5. (Proves Theorem 8.2.4) For 𝑓, 𝑔, ℎ ∈ 𝐶0(𝑆1), prove that
\[
((f * g) * h)(x) = (f * (g * h))(x). \tag{8.2.5}
\]

8.2.6. (Proves Theorem 8.2.4) For 𝑓 ∈ 𝐶1(𝑆1) and 𝑔 ∈ 𝐶0(𝑆1), prove that
\[
\frac{d}{dx}\bigl((f * g)(x)\bigr) = \left(\frac{df}{dx} * g\right)(x). \tag{8.2.6}
\]

8.2.7. (Proves Theorem 8.2.4) For 𝑓, 𝑔 ∈ 𝐶0(𝑆1), prove that $\widehat{f * g}(n) = \hat f(n)\,\hat g(n)$.

8.3 Dirac kernels

The other main new idea we will use to prove our convergence theorems is motivated by the following wishful thinking.

ENTERING THE LAND OF WISHFUL THINKING

Suppose we have a function 𝛿(𝑥), called the Dirac delta function, with the property that for 𝑓 ∈ 𝐶0(𝑆1), we have
\[
\int_{-1/2}^{1/2} \delta(x)\,f(x)\,dx = f(0). \tag{8.3.1}
\]
If we could imagine the graph of 𝛿, it would be at 0 for all nonzero values of 𝑥, with a "spike to infinity" at 𝑥 = 0, as shown in Figure 8.3.1. Now, you may object to this spike to infinity; moreover, you may object that since an integral (in either the Riemann or Lebesgue sense) is not affected by changing the value of the integrand at one value of 𝑥, no such function 𝛿 could exist. To which we reply: Give us a break; we're in the land of wishful thinking!

Figure 8.3.1. Graph of the Dirac delta "function"

Continuing our wishful thinking, given 𝑓 ∈ 𝐶0(𝑆1), we now compute the convolution 𝑓 ∗ 𝛿:
\[
(f * \delta)(x) = \int_{-1/2}^{1/2} f(x - t)\,\delta(t)\,dt = f(x - 0) = f(x). \tag{8.3.2}
\]
In other words, if we think of convolution as a sort of multiplication, then 𝛿(𝑥) is the identity element with respect to convolution.

Suppose that now we have a sequence of functions 𝑓𝑛 and we want to prove that $\lim_{n\to\infty} f_n = f$. If we just so happen to come across a sequence of functions 𝐾𝑛 such that 𝑓 ∗ 𝐾𝑛 = 𝑓𝑛 and $\lim_{n\to\infty} K_n = \delta$, then it follows that
\[
\lim_{n\to\infty} f_n = \lim_{n\to\infty} (f * K_n) = f * \Bigl(\lim_{n\to\infty} K_n\Bigr) = f * \delta = f. \tag{8.3.3}
\]
(The second equality is explained by the fact that in the land of wishful thinking, limits commute with every operation.)

EXITING THE LAND OF WISHFUL THINKING

While the above discussion is truly a matter of wishful thinking, remarkably, the following definition (Definition 8.3.1) actually gives a way to make that discussion rigorous.

Definition 8.3.1. A Dirac kernel on 𝑆1 is a sequence of continuous functions 𝐾𝑛 ∶ [−1/2, 1/2] → 𝐑 such that:

(1) For all 𝑛 and all 𝑥 ∈ [−1/2, 1/2], 𝐾𝑛(𝑥) ≥ 0.

(2) For all 𝑛, $\int_{-1/2}^{1/2} K_n(x)\,dx = 1$.

(3) For any fixed 𝛿 > 0, we have
\[
\lim_{n\to\infty} \int_{\delta \le |x| \le 1/2} K_n(x)\,dx = 0. \tag{8.3.4}
\]
In other words, for any 𝛿 > 0 and 𝜖 > 0, there exists some 𝑁(𝜖) such that for 𝑛 > 𝑁(𝜖), we have
\[
1 - \epsilon < \int_{-\delta}^{\delta} K_n(x)\,dx \le 1. \tag{8.3.5}
\]

Again remarkably, we have the following useful examples. (Only the second is a Dirac kernel, but the first is close to being one, as we shall see in Remark 8.3.9 and Subsection 8.5.6.)

Example 8.3.2. The Dirichlet kernel {𝐷𝑁 ∣ 𝑁 ≥ 0} is given by
\[
D_N(x) = \sum_{n=-N}^{N} e_n(x). \tag{8.3.6}
\]

Example 8.3.3. The Fejér kernel {𝐹𝑁 ∣ 𝑁 ≥ 1} is given by
\[
F_N(x) = \frac{D_0(x) + \cdots + D_{N-1}(x)}{N}. \tag{8.3.7}
\]

Figure 8.3.2. Dirichlet kernel, 𝑁 = 6, 21, 36

Figure 8.3.3. Fejér kernel, 𝑁 = 6, 21, 36

The Dirichlet and Fejér kernels (Figures 8.3.2 and 8.3.3, respectively) are useful to us because of two remarkable facts. First, convolution with each of those sequences gives a partial sum of a Fourier series, in the following sense:

Theorem 8.3.4. For 𝑓 ∈ 𝐶0(𝑆1), we have that 𝑓 ∗ 𝐷𝑁 = 𝑓𝑁, the 𝑁th Fourier polynomial of 𝑓, and
\[
(f * F_N)(x) = \frac{f_0(x) + \cdots + f_{N-1}(x)}{N}, \tag{8.3.8}
\]
the average of the Fourier polynomials 𝑓0, …, 𝑓𝑁−1.

Proof. Problem 8.3.1.

Definition 8.3.5. The sum 𝑠𝑁(𝑥) = (𝑓 ∗ 𝐹𝑁)(𝑥) from (8.3.8) is called the 𝑁th Cesàro sum of the Fourier series of 𝑓.

Another remarkable fact is that the (finite) series defining 𝐷𝑁 and 𝐹𝑁 can be summed usefully:

Lemma 8.3.6. For 𝑥 ∈ 𝑆1, 𝑛 ≥ 0, and 𝑁 ≥ 1, we have that
\[
D_n(x) = \begin{cases} \dfrac{\sin((2n+1)\pi x)}{\sin(\pi x)} & \text{if } x \ne 0, \\[1ex] 2n+1 & \text{if } x = 0, \end{cases} \tag{8.3.9}
\]
\[
F_N(x) = \begin{cases} \dfrac{1}{N}\,\dfrac{\sin^2(N\pi x)}{\sin^2(\pi x)} & \text{if } x \ne 0, \\[1ex] N & \text{if } x = 0. \end{cases} \tag{8.3.10}
\]

Proof. Let 𝑞 = 𝑒^{𝜋𝑖𝑥}. By the formula for a finite geometric series,
\[
D_n(x) = \sum_{k=-n}^{n} q^{2k} = \begin{cases} \dfrac{q^{-2n} - q^{2n+2}}{1 - q^2} & \text{if } x \ne 0, \\[1ex] 2n+1 & \text{if } x = 0. \end{cases} \tag{8.3.11}
\]
However, since
\[
\frac{q^{-2n} - q^{2n+2}}{1 - q^2} = \frac{q^{2n+1} - q^{-2n-1}}{q - q^{-1}}
\]
and $q^k - q^{-k} = 2i\sin(k\pi x)$, we obtain (8.3.9).

Next, averaging (8.3.11) as 𝑛 goes from 0 to 𝑁 − 1, we first see that
\[
F_N(0) = \frac{1}{N} \cdot \frac{N(1 + 2N - 1)}{2} = N. \tag{8.3.12}
\]
For 𝑥 ≠ 0, averaging (8.3.11) and applying geometric series again, we get
\[
\begin{aligned}
F_N(x) &= \frac{1}{N} \left( \frac{1}{1 - q^2} \right) \left( \sum_{n=0}^{N-1} q^{-2n} - \sum_{n=0}^{N-1} q^{2n+2} \right) \\
&= \frac{1}{N} \left( \frac{1}{1 - q^2} \right) \left( \frac{1 - q^{-2N}}{1 - q^{-2}} - \frac{q^2 - q^{2N+2}}{1 - q^2} \right) \\
&= \frac{1}{N} \left( \frac{1 - q^{-2N}}{1 - q^{-2}} + \frac{1 - q^{2N}}{1 - q^{-2}} \right) \left( \frac{1}{1 - q^2} \right) \\
&= \frac{1}{N} \left( \frac{-q^{-2N} + 2 - q^{2N}}{-q^2 + 2 - q^{-2}} \right).
\end{aligned} \tag{8.3.13}
\]
However, since
\[
\frac{(q^N - q^{-N})^2}{(q - q^{-1})^2} = \frac{q^{2N} - 2 + q^{-2N}}{q^2 - 2 + q^{-2}} = \frac{\sin^2(N\pi x)}{\sin^2(\pi x)}, \tag{8.3.14}
\]
(8.3.10) follows.

As promised, we now prove that we have at least one example of our theory.

Theorem 8.3.7. The Fejér kernel 𝐹𝑁 is a Dirac kernel.

Proof. First, the fact that 𝐹𝑁(𝑥) ≥ 0 follows from (8.3.10). Problem 8.3.2 shows that for all 𝑁, $\int_{-1/2}^{1/2} F_N(x)\,dx = 1$. Finally, Problem 8.3.3 shows that for any fixed 𝛿 > 0,
\[
\lim_{N\to\infty} \int_{\delta \le |x| \le 1/2} F_N(x)\,dx = 0. \tag{8.3.15}
\]

The theorem follows.

Remark 8.3.8. The reader should be aware that by an unfortunate linguistic coincidence, kernels in analysis share their name with an unrelated idea in linear algebra and abstract algebra. Alas, it is too late to change the names in either subject now.

Remark 8.3.9. Returning briefly to the land of wishful thinking, if 𝛿(𝑥) really were a function, then (8.3.1) would imply that every Fourier coefficient of 𝛿(𝑥) would be equal to 1, which would mean that the Dirichlet kernel function 𝐷𝑁(𝑥) would be precisely the 𝑁th Fourier polynomial of 𝛿. It is therefore reasonable to hope that 𝐷𝑁(𝑥) approaches the delta function in the manner of a Dirac kernel. Unfortunately, 𝐷𝑁(𝑥) fails to satisfy the nonnegativity condition of Definition 8.3.1 and is therefore not a Dirac kernel. Nevertheless, it turns out that we can still use convolution with 𝐷𝑁(𝑥) to obtain pointwise convergence results for Fourier series; again, see Subsection 8.5.6.
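The closed forms (8.3.9) and (8.3.10) make the Dirac kernel conditions, and the failure of 𝐷𝑁 to satisfy nonnegativity, easy to check numerically. A sketch (the grid sizes and the choice 𝛿 = 0.1 are ours):

```python
import math

def dirichlet(N, x):
    # Closed form (8.3.9) for the Dirichlet kernel.
    if x == 0:
        return 2 * N + 1
    return math.sin((2 * N + 1) * math.pi * x) / math.sin(math.pi * x)

def fejer(N, x):
    # Closed form (8.3.10) for the Fejér kernel; manifestly >= 0.
    if x == 0:
        return float(N)
    return math.sin(N * math.pi * x) ** 2 / (N * math.sin(math.pi * x) ** 2)

def mass(kernel, N, lo, hi, steps=20_000):
    # Midpoint Riemann sum of the kernel over [lo, hi].
    h = (hi - lo) / steps
    return sum(kernel(N, lo + (i + 0.5) * h) for i in range(steps)) * h

for N in [5, 20, 80]:
    total = mass(fejer, N, -0.5, 0.5)   # Dirac condition (2): close to 1
    tail = 2 * mass(fejer, N, 0.1, 0.5) # Dirac condition (3): shrinks as N grows
    dmin = min(dirichlet(N, i / 1000 - 0.5) for i in range(1001))
    print(N, round(total, 4), round(tail, 4), dmin < 0)  # D_N dips negative
```

The total mass stays near 1 while the mass away from 0 decays roughly like 1/𝑁, and the Dirichlet minimum is negative for every 𝑁 shown, consistent with Remark 8.3.9.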


Problems.

8.3.1. (Proves Theorem 8.3.4) Suppose 𝑓 ∈ 𝐶0(𝑆1).
(a) Prove that 𝐷𝑁 ∗ 𝑓 = 𝑓𝑁.
(b) Prove that $F_N * f = \dfrac{f_0(x) + \cdots + f_{N-1}(x)}{N}$.

8.3.2. (Proves Theorem 8.3.7)
(a) Prove that for all 𝑛, $\int_{-1/2}^{1/2} D_n(x)\,dx = 1$.
(b) Prove that for all 𝑁, $\int_{-1/2}^{1/2} F_N(x)\,dx = 1$.

8.3.3. (Proves Theorem 8.3.7) Fix some 𝛿 such that 0 < 𝛿 < 1/2.
(a) Prove that for 𝛿 ≤ |𝑥| ≤ 1/2,
\[
F_N(x) \le \frac{1}{N \sin^2(\pi\delta)}. \tag{8.3.16}
\]
(b) Prove that $\lim_{N\to\infty} \int_{\delta \le |x| \le 1/2} F_N(x)\,dx = 0$.

8.4 Proof of the Inversion Theorem

The main results of this chapter now stem from the following general result, which makes the wishful thinking of Section 8.3 come true.

Theorem 8.4.1. If {𝐾𝑁} is a Dirac kernel and 𝑓 ∈ 𝐶0(𝑆1), then
\[
\lim_{N\to\infty} (f * K_N)(x) = f(x), \tag{8.4.1}
\]
where convergence is uniform on 𝑆1.

The proof of Theorem 8.4.1 is somewhat intricate, so we break it down into several lemmas, always assuming that {𝐾𝑁} is a Dirac kernel and 𝑓 ∶ 𝑆1 → 𝐂 is continuous. The first lemma states that by keeping 𝑡 near 0, we can force the integral of |𝑓(𝑥 − 𝑡) − 𝑓(𝑥)| |𝐾𝑛(𝑡)| with respect to 𝑡 to be as small as we like, independent of 𝑥 and 𝑛.

Lemma 8.4.2. For any 𝜖1 > 0, there exists some 𝛿1(𝜖1) < 1/2 such that for any 𝛿 < 𝛿1(𝜖1), any 𝑥 ∈ 𝑆1, and any 𝑛 ∈ 𝐍,
\[
\int_{-\delta}^{\delta} |f(x - t) - f(x)|\,|K_n(t)|\,dt < \epsilon_1.
\]

Proof. Problem 8.4.1.

The second lemma states that by keeping 𝑡 away from 0 and letting 𝑛 → +∞, we can also force the integral of |𝑓(𝑥 − 𝑡) − 𝑓(𝑥)| |𝐾𝑛(𝑡)| to be as small as we like, independent of 𝑥.


Lemma 8.4.3. For any fixed 𝛿 > 0 and 𝜖2 > 0, there exists some 𝑁2(𝛿, 𝜖2) such that for 𝑛 > 𝑁2(𝛿, 𝜖2) and any 𝑥 ∈ 𝑆1, we have
\[
\int_{\delta \le |t| \le 1/2} |f(x - t) - f(x)|\,|K_n(t)|\,dt < \epsilon_2. \tag{8.4.2}
\]

Proof. Problem 8.4.2.

Combining Lemmas 8.4.2 and 8.4.3 carefully, we get:

Proof of Theorem 8.4.1. Problem 8.4.3.

For the case where 𝐾𝑁 is the Fejér kernel 𝐹𝑁, combining Theorem 8.4.1 with Theorem 8.3.7, we have the following result.

Corollary 8.4.4. For 𝑓 ∈ 𝐶0(𝑆1), let 𝑠𝑁(𝑥) be the 𝑁th Cesàro sum
\[
s_N(x) = (f * F_N)(x) = \frac{f_0(x) + \cdots + f_{N-1}(x)}{N} \tag{8.4.3}
\]
(Definition 8.3.5) of the Fourier series of 𝑓. Then 𝑠𝑁 converges uniformly to 𝑓 on 𝑆1.

Figure 8.4.1. Standard and Cesàro sums, 𝑁 = 5, 15, 30

Remark 8.4.5. It is interesting to compare the virtues and vices of Cesàro summation versus ordinary summation of Fourier series. The differences can perhaps be seen more clearly in a discontinuous example, such as Figure 8.4.1, where we see that the Cesàro sums of the Fourier series of 𝑓 converge more slowly to 𝑓, overall, but have less dramatic errors, whereas the usual Fourier sums converge more quickly to 𝑓, on average, but for a small number of values of 𝑥, have more dramatic errors. Indeed, this is not just some tactical error, but actually a natural consequence of trying to obtain the best approximation “on average”; see Davidson and Donsig [DD10, 14.4] for a discussion.
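The contrast described in Remark 8.4.5 can be seen numerically on a square wave: the ordinary partial sums overshoot near the jump (the Gibbs phenomenon), while the Cesàro sums, being averages of 𝑓 against the nonnegative kernel 𝐹𝑁, never exceed max 𝑓. A sketch (the sample function and grid are our own choices):

```python
import cmath
import math

S = 2048  # grid size for Riemann-sum approximations of Fourier coefficients

def coeffs(f, N):
    return {n: sum(f(k / S) * cmath.exp(-2j * math.pi * n * k / S)
                   for k in range(S)) / S
            for n in range(-N, N + 1)}

square = lambda x: 1.0 if (x % 1.0) < 0.5 else -1.0
N = 30
c = coeffs(square, N)

def partial(x):
    # Ordinary partial sum f_N(x).
    return sum(cn * cmath.exp(2j * math.pi * n * x) for n, cn in c.items()).real

def cesaro(x):
    # Cesàro sum s_N(x) = sum over |n| < N of (1 - |n|/N) fhat(n) e_n(x),
    # the averaged form of (f_0 + ... + f_{N-1})/N.
    return sum((1 - abs(n) / N) * cn * cmath.exp(2j * math.pi * n * x)
               for n, cn in c.items() if abs(n) < N).real

xs = [i / 400 for i in range(400)]
gibbs = max(partial(x) for x in xs)     # overshoots max f = 1 near the jumps
fejer_max = max(cesaro(x) for x in xs)  # stays <= max f, up to rounding
print(gibbs, fejer_max)
```

The weight $1 - |n|/N$ appears because averaging the partial sums 𝑓0, …, 𝑓𝑁−1 counts the 𝑛th coefficient exactly 𝑁 − |𝑛| times.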

8.4. Proof of the Inversion Theorem


We are now finally ready to prove the Inversion Theorem for Fourier Series (Theorem 8.1.1). We have done enough preparation to make the proof relatively short, though it still involves some subtle points. (The skeptical reader who slogged patiently through the abstraction of Chapter 7 may regard this proof as their reward.) For clarity, we first restate the Inversion Theorem in the following form.

Inversion Theorem. For 𝑓 ∈ 𝐿^2(𝑆^1), if 𝑓𝑁 is the 𝑁th Fourier polynomial of 𝑓, then

lim_{𝑁→∞} ‖𝑓 − 𝑓𝑁‖ = 0,   (8.4.4)

where convergence in (8.4.4) is in the 𝐿^2 norm.

Proof of the Inversion Theorem for Fourier Series. Suppose 𝑓 ∈ 𝐿^2(𝑆^1) and 𝜖 > 0. By Lebesgue Axiom 6, there exists some 𝑔 ∈ 𝐶^0(𝑆^1) such that ‖𝑓 − 𝑔‖ < 𝜖/2. By Corollary 8.4.4 and Theorem 4.3.14, since the Cesàro sums 𝑠𝑁 of the Fourier series of 𝑔 converge uniformly to 𝑔, 𝑠𝑁 also converges to 𝑔 in the 𝐿^2 norm, which means that there exists some Cesàro sum 𝑠𝑀 such that ‖𝑔 − 𝑠𝑀‖ < 𝜖/2. It follows by the triangle inequality that there exists some trigonometric polynomial 𝑠𝑀 of degree 𝑀 = 𝑀(𝜖) such that

‖𝑓 − 𝑠𝑀‖ ≤ ‖𝑓 − 𝑔‖ + ‖𝑔 − 𝑠𝑀‖ < 𝜖.   (8.4.5)

However, by the Best Approximation Theorem 7.3.12, the Fourier polynomial of degree 𝑀 is the best approximation of 𝑓 among trigonometric polynomials of degree 𝑀, which means that ‖𝑓 − 𝑓𝑀‖ ≤ ‖𝑓 − 𝑠𝑀‖ < 𝜖. Therefore, since Fourier polynomials always give better approximations as 𝑁 → ∞ (the Always Better Theorem, Corollary 7.3.13), for 𝑁 > 𝑀(𝜖), we have that ‖𝑓 − 𝑓𝑁‖ ≤ ‖𝑓 − 𝑓𝑀‖ < 𝜖. The theorem follows.

Now that we have finally shown that {𝑒𝑛 ∣ 𝑛 ∈ 𝐙} is an orthonormal basis for 𝐿^2(𝑆^1), we can reap the rewards. For example, reading off the Isomorphism Theorem for Fourier Series (Theorem 7.6.8), we see that:

• For any 𝑓, 𝑔 ∈ 𝐿^2(𝑆^1), ⟨𝑓, 𝑔⟩ = ∑_{𝑛∈𝐙} 𝑓̂(𝑛)\overline{𝑔̂(𝑛)}. In particular, ‖𝑓‖^2 = ∑_{𝑛∈𝐙} |𝑓̂(𝑛)|^2.

• For any 𝑓 ∈ 𝐿^2(𝑆^1), if ⟨𝑓, 𝑒𝑛⟩ = 0 for all 𝑛 ∈ 𝐙, then 𝑓 = 0.

As another example, we have the following result.

Corollary 8.4.6. For 𝑓, 𝑔 ∈ 𝐿^2(𝑆^1), the following are equivalent:

(1) 𝑓 = 𝑔 almost everywhere on 𝑆^1.

(2) 𝑓̂(𝑛) = 𝑔̂(𝑛) for all 𝑛 ∈ 𝐙.

Proof. Since 𝑓̂(𝑛) is defined by a Lebesgue integral that is not affected by changing 𝑓 on a set of measure zero, (1) implies (2). For (2) implies (1), see Problem 8.4.4.

Interestingly, we can now use Corollary 8.4.6, a “soft” convergence result, to complete and improve upon the “hard” convergence results we first attempted to prove back in Section 6.4. We begin with a technical result that will be useful to us several times.


Lemma 8.4.7 (Extra Derivative Lemma). If 𝑔 ∈ 𝐿^2(𝑆^1), then the two-sided series

∑_{𝑛≠0} (1/(2𝜋𝑛)) 𝑔̂(𝑛)   (8.4.6)

converges absolutely (as a series of complex numbers).

The name of Lemma 8.4.7 comes from the fact that, as we shall see shortly, we can sometimes use it to show that having an extra degree of differentiability allows us to obtain uniform convergence of a Fourier series. For example, we will use Lemma 8.4.7 to show that the Fourier series of any 𝑓 ∈ 𝐶^1(𝑆^1) converges uniformly to 𝑓 (Theorem 8.1.2), though the same need not hold for 𝑓′.

Proof. Problem 8.4.5.

We now prove our principal “hard” convergence result (Theorem 8.1.2). (See Subsection 8.5.6 for a discussion of other pointwise convergence results, especially Theorem 8.5.17.)

Proof of Theorem 8.1.2. Suppose 𝑓 ∈ 𝐶^1(𝑆^1). We claim:

Claim. The Fourier series of 𝑓 converges absolutely and uniformly to some 𝑔 ∈ 𝐶^0(𝑆^1).

The proof is in Problem 8.4.6. Theorem 6.2.2 then implies that for all 𝑛 ∈ 𝐙, 𝑔̂(𝑛) = 𝑓̂(𝑛). However, Corollary 8.4.6 then implies that 𝑓 = 𝑔 a.e., and since both 𝑓 and 𝑔 are continuous, Corollary 7.4.9 implies that 𝑓 = 𝑔 everywhere.

Theorem 8.1.2 also implies the following term-by-term differentiation property for Fourier series.

Corollary 8.4.8. For 𝑘 ≥ 1, if 𝑓 ∈ 𝐶^𝑘(𝑆^1), then for 0 ≤ 𝑗 ≤ 𝑘 − 1, the Fourier series of 𝑓^{(𝑗)},

𝑓^{(𝑗)}(𝑥) = ∑_{𝑛∈𝐙} (2𝜋𝑖𝑛)^𝑗 𝑓̂(𝑛)𝑒𝑛(𝑥),   (8.4.7)

converges absolutely and uniformly to 𝑓^{(𝑗)}. In particular, for 𝑓 ∈ 𝐶^∞(𝑆^1), we may take arbitrary term-by-term derivatives of the Fourier series of 𝑓.

In other words, if 𝑓 has a continuous 𝑘th derivative, then we may take up to 𝑘 − 1 term-by-term derivatives of the Fourier series of 𝑓, as per (8.4.7).

Proof of Corollary 8.4.8. Because 𝑓^{(𝑗)} ∈ 𝐶^1(𝑆^1), by Theorem 8.1.2, the Fourier series of 𝑓^{(𝑗)} converges absolutely and uniformly to 𝑓^{(𝑗)}. Applying Theorem 6.4.1 𝑗 times, we get (8.4.7).

Note that because there exists an 𝑓 ∈ 𝐶^0(𝑆^1) such that the Fourier series of 𝑓 diverges at uncountably many points (see Subsection 8.5.6), Corollary 8.4.8 is, in some sense, an optimal result for term-by-term differentiation, as we cannot expect to recover the 0th derivative of a 𝐶^0 function (though again, see Theorem 8.5.17 for one class of continuous functions where we can guarantee pointwise convergence).


Remark 8.4.9. As we saw previously in Theorem 6.4.2, the smoother (more differentiable) a function 𝑓 ∈ 𝐿^2(𝑆^1) is, the faster its coefficients 𝑓̂(𝑛) must decay, or converge to 0, as 𝑛 → ∞. Since Theorem 8.1.2 and Corollary 8.4.8 ultimately depend on facts about the decay rate of Fourier coefficients of a 𝐶^𝑘 function, they can also be seen as a sort of converse to Theorem 6.4.2 and Corollary 6.4.3. For converses to Theorem 6.4.2 and Corollary 6.4.3 that use the decay rate more explicitly, see Subsection 8.5.1.

Problems. For problems 8.4.1–8.4.3, assume that {𝐾𝑁 } is a Dirac kernel (Definition 8.3.1) and 𝑓 ∶ 𝑆 1 → 𝐂 is continuous.

8.4.1. (Proves Lemma 8.4.2) Prove that for any 𝜖1 > 0, there exists some 𝛿1(𝜖1) < 1/2 such that for any 𝛿 < 𝛿1(𝜖1), any 𝑥 ∈ 𝑆^1, and any 𝑛 ∈ 𝐍,

∫_{−𝛿}^{𝛿} |𝑓(𝑥 − 𝑡) − 𝑓(𝑥)| |𝐾𝑛(𝑡)| 𝑑𝑡 < 𝜖1.   (8.4.8)

8.4.2. (Proves Lemma 8.4.3) Prove that for any fixed 𝛿 > 0 and 𝜖2 > 0, there exists some 𝑁2(𝛿, 𝜖2) such that for 𝑛 > 𝑁2(𝛿, 𝜖2) and any 𝑥 ∈ 𝑆^1, we have

∫_{𝛿≤|𝑡|≤1/2} |𝑓(𝑥 − 𝑡) − 𝑓(𝑥)| |𝐾𝑛(𝑡)| 𝑑𝑡 < 𝜖2.   (8.4.9)

8.4.3. (Proves Theorem 8.4.1) Prove that for 𝜖 > 0, there exists some 𝑁(𝑓, 𝜖), not depending on 𝑥 ∈ 𝑆^1, such that for all 𝑥 ∈ 𝑆^1 and all 𝑛 ∈ 𝐙, if 𝑛 > 𝑁(𝑓, 𝜖), then |(𝑓 ∗ 𝐾𝑛)(𝑥) − 𝑓(𝑥)| < 𝜖. In other words, prove that 𝑓 ∗ 𝐾𝑛 converges uniformly to 𝑓 on 𝑆^1.

8.4.4. (Proves Corollary 8.4.6) Suppose 𝑓, 𝑔 ∈ 𝐿^2(𝑆^1) and 𝑓̂(𝑛) = 𝑔̂(𝑛) for all 𝑛 ∈ 𝐙. Prove that 𝑓 = 𝑔 a.e.

8.4.5. (Proves Lemma 8.4.7) Suppose 𝑔 ∈ 𝐿^2(𝑆^1). Let the double-sided sequence 𝑎𝑛 be defined by

𝑎𝑛 = { 1/|2𝜋𝑛| if 𝑛 ≠ 0, 0 if 𝑛 = 0.   (8.4.10)

(a) Prove that 𝑎𝑛 ∈ ℓ^2(𝐙).

(b) Prove that the two-sided series ∑_{𝑛≠0} |1/(2𝜋𝑛)| |𝑔̂(𝑛)| converges.

8.4.6. (Proves Theorem 8.1.2) Suppose 𝑓 ∈ 𝐶 1 (𝑆 1 ). Prove that the Fourier series of 𝑓 converges absolutely and uniformly to some 𝑔 ∈ 𝐶 0 (𝑆1 ).

8.5 Applications of Fourier series

Before we get to our main application in Part 3, we conclude this chapter by considering several miscellaneous applications and other results about Fourier series. Note that of these results, Subsection 8.5.1 is used in Chapter 11, Theorem 8.5.6 is used in Chapter 12, and certainly on a first reading, it will be enough just to quote those results. All other material is optional for what follows.


8.5.1 Decay of Fourier coefficients and smoothness. Recall that Theorem 6.4.2 and Corollary 6.4.3 state that the more differentiable a function is, the faster its Fourier coefficients must decay as 𝑛 → ∞. As a sort of converse, Theorem 8.1.2 and Corollary 8.4.8 tell us that we can take (𝑘 − 1) term-by-term derivatives of the Fourier series of a 𝐶^𝑘 function, with absolute and uniform convergence. However, the term-by-term results in Theorem 8.1.2 and Corollary 8.4.8 rely on knowing somewhat more information about the coefficients of a Fourier series than their decay rate. The following partial converse of Theorem 6.4.2 is a similar result that assumes only knowledge of the decay rate.

Theorem 8.5.1. For 𝑗 ≥ 1, let 𝑓 ∈ 𝐿^2(𝑆^1) have the property that for some constants 𝐾 and 𝑝 > 𝑗 + 1 and for all 𝑛 ∈ 𝐙, 𝑛 ≠ 0, we have |𝑓̂(𝑛)| ≤ 𝐾/|𝑛|^𝑝. Then the Fourier series of 𝑓 converges absolutely and uniformly to some 𝑔 ∈ 𝐶^𝑗(𝑆^1) such that

𝑔^{(𝑗)}(𝑥) = ∑_{𝑛∈𝐙} (2𝜋𝑖𝑛)^𝑗 𝑓̂(𝑛)𝑒𝑛(𝑥)   (8.5.1)

and 𝑓 = 𝑔 a.e.

Note that if 𝑝 = 𝑗 + 1, the conclusion of Theorem 8.5.1 may not hold. For example, if 𝑓 is the triangle wave of Example 6.2.7, we see that even though |𝑓̂(𝑛)| ≤ 1/(𝜋^2𝑛^2) for 𝑛 ≠ 0 (i.e., 𝑝 = 2), 𝑓 is not contained in 𝐶^1(𝑆^1).

Proof. Problem 8.5.1.

Applying Theorem 8.5.1 for all 𝑗 ≥ 1, we obtain the following converse of Corollary 6.4.3.

Corollary 8.5.2. Let 𝑓 ∈ 𝐿^2(𝑆^1) have the property that for all 𝑝 > 2, there exists some constant 𝐾𝑝 such that for all 𝑛 ∈ 𝐙, 𝑛 ≠ 0, we have |𝑓̂(𝑛)| ≤ 𝐾𝑝/|𝑛|^𝑝. Then the Fourier series of 𝑓 converges uniformly to some 𝑔 ∈ 𝐶^∞(𝑆^1) such that 𝑓 = 𝑔 a.e.

Proof. Problem 8.5.2.
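The borderline case 𝑝 = 𝑗 + 1 can be observed numerically. The following sketch (using an illustrative triangle wave, not necessarily the book's Example 6.2.7 normalization) estimates Fourier coefficients by a Riemann sum and confirms that a continuous, piecewise-linear, non-𝐶^1 function has coefficients decaying exactly like 1/𝑛^2:

```python
import cmath
import math

def triangle(x):
    # a triangle wave on S^1: continuous and piecewise linear, but not C^1
    return 0.5 - abs((x % 1.0) - 0.5)

def fourier_coeff(f, n, M=4000):
    # Riemann-sum approximation of f^(n) = int_0^1 f(x) e^{-2 pi i n x} dx
    return sum(f(m / M) * cmath.exp(-2j * math.pi * n * m / M) for m in range(M)) / M

for n in (1, 3, 5, 9):
    c = abs(fourier_coeff(triangle, n))
    print(n, c * (math.pi * n) ** 2)   # roughly constant: |f^(n)| decays like 1/n^2
```

For odd 𝑛 the printed products stay near 1, i.e. |𝑓̂(𝑛)| ≈ 1/(𝜋^2𝑛^2), matching the bound in the text; so 𝑝 = 2 is achieved but not beaten, as Theorem 8.5.1 with 𝑗 = 1 would require.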

8.5.2 Approximation by smooth functions. Since all trigonometric polynomials are infinitely differentiable, Theorem 8.1.1 immediately gives the following corollary.

Corollary 8.5.3. The space 𝐶^∞(𝑆^1) is dense in 𝐿^2(𝑆^1). (In fact, the space of trigonometric polynomials is dense in 𝐿^2(𝑆^1).)

Similarly, Corollary 8.4.4 shows that trigonometric polynomials are dense in 𝐶^0(𝑆^1) with respect to the 𝐿^∞ metric. We may also use Corollary 8.4.4 to show:

Theorem 8.5.4 (Weierstrass Approximation Theorem). For every 𝑓 ∈ 𝐶^0([0, 1/2]) and 𝜖 > 0, there exists some polynomial 𝑝(𝑥) such that |𝑓(𝑥) − 𝑝(𝑥)| < 𝜖 for all 𝑥 ∈ [0, 1/2].

Proof. Problem 8.5.3.


When studying functions on 𝐑, in Chapter 10 and afterwards, we will also use a refinement of Theorem 7.5.14 that states that the space of Schwartz functions (Section 4.7) is dense in 𝐿2 (𝐑). In fact, we will prove the even stronger Theorem 8.5.6, which states that not only are continuous functions with compact support dense in 𝐿2 (𝐑), but also smooth functions with compact support are. The first-time reader may effectively treat Theorem 8.5.6, or its main consequence Corollary 8.5.7, as another axiom, but the proof involves some interesting new ideas, so we include part of it here and more details in Appendix C. Perhaps the most remarkable thing about smooth functions with compact support is that there are any such functions at all! The key ingredient is the following result. Theorem 8.5.5. For 𝑎 < 𝑏 and 𝛿 > 0, there exists some 𝜙 ∶ 𝐑 → 𝐑 such that: (1) 𝜙 ∈ 𝐶 ∞ (𝐑). (2) For 𝑎 ≤ 𝑥 ≤ 𝑏, 𝜙(𝑥) = 1. (3) For 𝑎 − 𝛿 ≤ 𝑥 ≤ 𝑎 and 𝑏 ≤ 𝑥 ≤ 𝑏 + 𝛿, we have 0 ≤ 𝜙(𝑥) ≤ 1. (4) For 𝑥 ≤ 𝑎 − 𝛿 and 𝑏 + 𝛿 ≤ 𝑥, 𝜙(𝑥) = 0. A function 𝜙(𝑥) satisfying the conditions of Theorem 8.5.5 is known as a bump function; one such function is shown in Figure 8.5.1. The proof that bump functions exist is not deep, but it does take some effort and is somewhat of a tangent to our main story, so we relegate it to Appendix C.

Figure 8.5.1. Graph of a bump function 𝜙(𝑥) In any case, we may now combine Lebesgue Axiom 6, Corollary 8.5.3, and bump functions to prove the following theorem. Theorem 8.5.6. The space 𝐶𝑐∞ (𝐑) of smooth functions with compact support is dense in 𝐿2 (𝐑). Proof. Problem 8.5.4. Moreover, since functions with compact support certainly have a limit of 0 as 𝑥 → ±∞, we see that 𝐶𝑐∞ (𝐑) ⊆ 𝒮(𝐑) ⊆ 𝐿2 (𝐑), and therefore: Corollary 8.5.7. The Schwartz space 𝒮(𝐑) is dense in 𝐿2 (𝐑).
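Although the existence proof for bump functions is deferred to Appendix C, one standard construction (an assumption here, not necessarily the book's) starts from the smooth-but-flat function exp(−1/𝑥) and satisfies properties (1)-(4) of Theorem 8.5.5:

```python
import math

def psi(x):
    # psi(x) = exp(-1/x) for x > 0 and 0 otherwise: C^infinity on R, flat at 0
    return math.exp(-1.0 / x) if x > 0 else 0.0

def step(x):
    # smooth step: equals 0 for x <= 0, equals 1 for x >= 1, strictly between on (0, 1)
    return psi(x) / (psi(x) + psi(1.0 - x))

def bump(x, a, b, delta):
    # C^infinity bump: equals 1 on [a, b], equals 0 outside [a - delta, b + delta]
    return step((x - (a - delta)) / delta) * step(((b + delta) - x) / delta)

print(bump(0.5, 0.0, 1.0, 0.25))   # inside [a, b]: equals 1
print(bump(1.5, 0.0, 1.0, 0.25))   # outside [a - delta, b + delta]: equals 0
print(bump(-0.1, 0.0, 1.0, 0.25))  # on the ramp: strictly between 0 and 1
```

The denominator in `step` never vanishes, since 𝜓(𝑥) and 𝜓(1 − 𝑥) cannot both be zero; all the smoothness of the construction is concentrated in the classical fact that exp(−1/𝑥) extends to a 𝐶^∞ function vanishing to infinite order at 0.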


8.5.3 The Riemann zeta function. For real 𝑠 > 1, the Riemann zeta function 𝜁(𝑠) is defined by

𝜁(𝑠) = ∑_{𝑛=1}^{∞} 1/𝑛^𝑠.   (8.5.2)

This can be extended in a relatively straightforward manner to 𝑠 ∈ 𝐂 with ℜ(𝑠) > 1, and with more effort, to all complex 𝑠 ≠ 1. The conjectured location of the zeros of the zeta function, famously known as the Riemann hypothesis, contains a wealth of information about the distribution of prime numbers and has therefore been one of the fundamental problems in number theory for the last 150 years (as of this writing). Similarly, finding the value of 𝜁(𝑠) for particular values of 𝑠 has long been of interest in number theory, and it turns out that one of the main methods for investigating the zeta function and related ideas is Fourier analysis; see, for example, Edwards [Edw01, Ch. 10]. To give just one example, by applying Parseval's identity (Theorem 7.6.8) to carefully chosen examples of 𝑓 ∈ 𝐿^2(𝑆^1), we obtain the following result.

Theorem 8.5.8. We have that

𝜁(2) = 𝜋^2/6,   𝜁(4) = 𝜋^4/90.   (8.5.3)

Proof. Problem 8.5.5. See Section 14.5 for more about the Riemann hypothesis, including some recent developments related to the material in this book.
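The values in Theorem 8.5.8 are easy to check numerically by comparing partial sums of (8.5.2) against the closed forms:

```python
import math

def zeta(s, terms=200000):
    # direct partial sum of (8.5.2); the omitted tail is of order terms^(1 - s)
    return sum(1.0 / float(n) ** s for n in range(1, terms + 1))

print(abs(zeta(2) - math.pi ** 2 / 6))    # small: tail error of order 1/terms
print(abs(zeta(4) - math.pi ** 4 / 90))   # far smaller: tail of order 1/terms^3
```

The 𝜁(4) error is much smaller because the tail of ∑ 1/𝑛^4 decays like 1/𝑁^3, versus 1/𝑁 for ∑ 1/𝑛^2.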

8.5.4 Discrete time series and the Wiener-Khinchin theorem. As mentioned in slightly different language in Section 1.2, physically we can think of Fourier series as characterizing a periodic signal 𝑓 ∈ 𝐿^2(𝑆^1) in terms of its “frequency response” 𝑓̂(𝑛) at all (discrete) frequencies 𝑛 ∈ 𝐙. The Inversion Theorem for Fourier Series then tells us that, as long as we only require convergence in 𝐿^2, we can recover 𝑓 from 𝑓̂(𝑛). In some fields, such as statistics, we sometimes consider the reverse of that process. That is, we want to understand a discrete-time signal, or time series, 𝑥(𝑡), which is by definition a function 𝑥 ∶ 𝐙 → 𝐂. We can think of such an 𝑥(𝑡) as a complex signal sampled at regular time intervals, which after scaling we may take to be the integers. In this case, we reverse our usual procedure and define the discrete-time Fourier transform (DTFT) of 𝑥(𝑡) to be its frequency response at any (continuous) frequency 𝛾 ∈ 𝐑. In other words, for 𝑥 ∈ ℓ^2(𝐙) and 𝛾 ∈ 𝐑, we define

𝑥̂(𝛾) = ∑_{𝑡∈𝐙} 𝑥(𝑡)𝑒_{−𝑡}(𝛾),   (8.5.4)

which converges in 𝐿^2(𝑆^1) by Problem 8.5.6. Note that by definition, 𝑥̂ naturally has domain 𝑆^1, or in other words, the frequency response of 𝑥(𝑡) is periodic with period 1. Note also that in contrast with Fourier series, where coefficients are defined by an integral (8.1.1) and series are inverted by summation (8.1.3), for the discrete-time Fourier transform, coefficients are defined by summation (8.5.4) and, as we shall see in Theorem 8.5.9, series are inverted by integration.


To be precise, since the DTFT is precisely Fourier series inversion with a sign change, the Inversion Theorem for Fourier Series also shows that for 𝑥(𝑡) ∈ ℓ^2(𝐙), we may recover 𝑥(𝑡) from 𝑥̂(𝛾) as follows.

Theorem 8.5.9. For 𝑥(𝑡) ∈ ℓ^2(𝐙), we may recover 𝑥(𝑡) by

𝑥(𝑡) = ∫_0^1 𝑥̂(𝛾)𝑒𝑡(𝛾) 𝑑𝛾.   (8.5.5)

Furthermore,

∑_{𝑡∈𝐙} |𝑥(𝑡)|^2 = ‖𝑥̂‖^2 = ∫_0^1 |𝑥̂(𝛾)|^2 𝑑𝛾.   (8.5.6)

Proof. Problem 8.5.7.

Definition 8.5.10. Because (8.5.6) can be interpreted as saying that taking the DTFT preserves the total power of the signal 𝑥(𝑡), we can think of the function

𝑆𝑥(𝛾) = |𝑥̂(𝛾)|^2   (8.5.7)

as describing the (continuous) distribution of signal power among all frequencies. We therefore call 𝑆𝑥(𝛾) the power spectrum of 𝑥(𝑡).

Definition 8.5.11. In statistics, we may interpret the ℓ^2(𝐙) inner product ⟨𝑥(𝑡), 𝑦(𝑡)⟩ = ∑_{𝑡∈𝐙} 𝑥(𝑡)\overline{𝑦(𝑡)} as describing the extent to which 𝑥 and 𝑦 are “pointed in the same direction,” or correlated. For 𝜏 ∈ 𝐙 and 𝑥(𝑡) ∈ ℓ^2(𝐙), it is therefore reasonable to define

𝑟𝑥(𝜏) = ⟨𝑥(𝑡), 𝑥(𝑡 − 𝜏)⟩ = ∑_{𝑡∈𝐙} 𝑥(𝑡)\overline{𝑥(𝑡 − 𝜏)}   (8.5.8)

to be the autocorrelation function of 𝑥(𝑡) with time lag 𝜏, since 𝑟𝑥(𝜏) describes how 𝑥(𝑡) is correlated with 𝑥(𝑡 − 𝜏) (i.e., 𝑥(𝑡) shifted by time 𝜏).

In the above terms, the following theorem describes the relationship between the autocorrelation function and the power spectrum of a time series 𝑥(𝑡).

Theorem 8.5.12 (Discrete-time Wiener-Khinchin). For 𝑥(𝑡) ∈ ℓ^2(𝐙), we have that

𝑟𝑥(𝜏) = ∫_0^1 𝑆𝑥(𝛾)𝑒𝜏(𝛾) 𝑑𝛾.   (8.5.9)

In other words, the autocorrelation function value 𝑟𝑥 (𝜏) is precisely the (−𝜏)th Fourier coefficient of the power spectrum 𝑆𝑥 (𝛾). Proof. Problem 8.5.8. Remark 8.5.13. Theorem 8.5.12 is actually the “easy” case of the discrete-time WienerKhinchin theorem, as the point of Wiener’s theorem is to extend (8.5.9) to a situation where Fourier inversion (Theorem 8.5.9) may not actually hold, and Khinchin’s contribution is to extend Wiener’s result to the case where 𝑥(𝑡) is not a fixed signal, but a random variable. Nevertheless, we hope that Theorem 8.5.12 gives some idea of what Wiener-Khinchin says.
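For a finitely supported signal, Theorem 8.5.12 can be verified directly. In the sketch below (made-up signal values; the DTFT is written explicitly as 𝑥̂(𝛾) = ∑ 𝑥(𝑡)𝑒^{−2𝜋𝑖𝑡𝛾}), the integral in (8.5.9) reduces to a finite Riemann sum, which is exact here because the integrand is a trigonometric polynomial:

```python
import cmath
import math

x = {0: 1.0, 1: 2.0 - 1.0j, 2: 0.5j}   # a finite-support signal in l^2(Z)

def dtft(gamma):
    # x^(gamma) = sum_t x(t) e^{-2 pi i t gamma}; only finitely many nonzero terms
    return sum(v * cmath.exp(-2j * math.pi * t * gamma) for t, v in x.items())

def autocorr(tau):
    # r_x(tau) = sum_t x(t) * conj(x(t - tau))
    return sum(v * complex(x.get(t - tau, 0)).conjugate() for t, v in x.items())

M = 64   # Riemann sum on [0, 1); exact for trig polynomials of degree < M
errs = []
for tau in range(-2, 3):
    integral = sum(abs(dtft(m / M)) ** 2 * cmath.exp(2j * math.pi * tau * m / M)
                   for m in range(M)) / M
    errs.append(abs(integral - autocorr(tau)))
print(max(errs))   # agreement up to floating-point error
```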


8.5.5 A nowhere differentiable function. In this subsection, we present a version of Weierstrass's construction of a nondifferentiable function (Theorem 8.5.15), using lacunary Fourier series, or in other words, Fourier series with large “gaps” between nonzero coefficients. Our presentation is based on Davidson and Donsig [DD10, Ex. 8.4.9]; see also Stein and Shakarchi [SS03, Ch. 4, Sect. 3] for a more conceptual approach. We begin with a convenient “two-sided” characterization of differentiability.

Lemma 8.5.14. If 𝑓 ∶ 𝐑 → 𝐂 is differentiable at 𝑐 ∈ 𝐑 and 𝑥𝑛^− and 𝑥𝑛^+ are sequences such that 𝑥𝑛^− ≤ 𝑐 ≤ 𝑥𝑛^+ and 𝑥𝑛^− < 𝑥𝑛^+ for all 𝑛 ∈ 𝐍 and lim_{𝑛→∞} 𝑥𝑛^− = lim_{𝑛→∞} 𝑥𝑛^+ = 𝑐, then

lim_{𝑛→∞} (𝑓(𝑥𝑛^+) − 𝑓(𝑥𝑛^−))/(𝑥𝑛^+ − 𝑥𝑛^−) = 𝑓′(𝑐).   (8.5.10)

Proof. Problem 8.5.9.

Theorem 8.5.15. Let 𝑎 ≥ 4, let 𝑏 be an even integer such that 𝑏 ≥ 𝑎(𝑎 + 1), and let 𝑓𝑘(𝑥) = 𝑎^{−𝑘} cos(𝑏^𝑘𝜋𝑥). Then

𝑔(𝑥) = ∑_{𝑘=1}^{∞} 𝑓𝑘(𝑥) = ∑_{𝑘=1}^{∞} cos(𝑏^𝑘𝜋𝑥)/𝑎^𝑘   (8.5.11)

converges uniformly to a continuous function 𝑔 ∶ 𝑆 1 → 𝐑 that is not differentiable for any 𝑐 ∈ 𝐑.

Figure 8.5.2. The function from (8.5.11), 𝑎 = 4, 𝑏 = 20

The function 𝑔 defined in (8.5.11) is pictured in Figure 8.5.2 for the case 𝑎 = 4, 𝑏 = 20. Note that the constraints on 𝑎 and 𝑏 in Theorem 8.5.15 are certainly not optimal; however, they do simplify the proof.

Proof. The uniform convergence of (8.5.11) to some function 𝑔 is proven in Problem 8.5.10. By Theorem 4.3.12, 𝑔 is continuous. By Lemma 8.5.14, given 𝑐 ∈ 𝐑, to prove that 𝑔 is not differentiable at 𝑐, it suffices to construct sequences 𝑥𝑛^− and 𝑥𝑛^+ such that 𝑥𝑛^− ≤ 𝑐 ≤ 𝑥𝑛^+ and 𝑥𝑛^− < 𝑥𝑛^+ for all 𝑛 ∈ 𝐍


and lim_{𝑛→∞} 𝑥𝑛^− = lim_{𝑛→∞} 𝑥𝑛^+ = 𝑐, but

lim_{𝑛→∞} (𝑔(𝑥𝑛^+) − 𝑔(𝑥𝑛^−))/(𝑥𝑛^+ − 𝑥𝑛^−)   (8.5.12)

does not exist. So for 𝑛 ∈ 𝐍, let 𝑑𝑛 be the greatest integer less than or equal to 𝑏^𝑛𝑐, which means that

𝑑𝑛/𝑏^𝑛 ≤ 𝑐 ≤ (𝑑𝑛 + 1)/𝑏^𝑛,   (8.5.13)

and let

𝑥𝑛^− = 𝑑𝑛/𝑏^𝑛,   𝑥𝑛^+ = (𝑑𝑛 + 1)/𝑏^𝑛.   (8.5.14)

Since 𝑥𝑛^+ − 𝑥𝑛^− = 1/𝑏^𝑛, we see that 𝑥𝑛^− ≤ 𝑐 ≤ 𝑥𝑛^+, 𝑥𝑛^− < 𝑥𝑛^+, and lim_{𝑛→∞} 𝑥𝑛^− = lim_{𝑛→∞} 𝑥𝑛^+ = 𝑐.

It remains to show that the limit (8.5.12) does not exist. The basic idea is to show that in

𝑔(𝑥𝑛^+) − 𝑔(𝑥𝑛^−) = ∑_{𝑘=1}^{∞} (𝑓𝑘(𝑥𝑛^+) − 𝑓𝑘(𝑥𝑛^−)),   (8.5.15)

the 𝑘 = 𝑛 summand

|𝑓𝑛(𝑥𝑛^+) − 𝑓𝑛(𝑥𝑛^−)| = 𝑎^{−𝑛} |cos((𝑑𝑛 + 1)𝜋) − cos(𝑑𝑛𝜋)| = 2/𝑎^𝑛   (8.5.16)

dominates all of the others put together. Starting with the 𝑘 < 𝑛 terms, since

|𝑓𝑘′(𝑥)| = |(𝜋𝑏^𝑘/𝑎^𝑘) sin(𝑏^𝑘𝜋𝑥)| ≤ 𝜋𝑏^𝑘/𝑎^𝑘,   (8.5.17)

for 𝑘 < 𝑛, the Mean Value Theorem implies that

|𝑓𝑘(𝑥𝑛^+) − 𝑓𝑘(𝑥𝑛^−)| ≤ (𝜋𝑏^𝑘/𝑎^𝑘) |𝑥𝑛^+ − 𝑥𝑛^−| = (𝜋/𝑏^𝑛)(𝑏/𝑎)^𝑘.   (8.5.18)

Therefore, if 𝑟 = 𝑏/𝑎 ≥ 𝑎 + 1 ≥ 5, we see that

∑_{𝑘=1}^{𝑛−1} |𝑓𝑘(𝑥𝑛^+) − 𝑓𝑘(𝑥𝑛^−)| ≤ (𝜋/𝑏^𝑛) ∑_{𝑘=1}^{𝑛−1} 𝑟^𝑘 = (𝜋/𝑏^𝑛) (𝑟/(𝑟 − 1)) (𝑟^{𝑛−1} − 1)
≤ (5/4)(𝜋/𝑏)(1/𝑎^{𝑛−1} − 1/𝑏^{𝑛−1}) < (5/(𝑎 + 1))(1/4)(𝜋/𝑎^𝑛) ≤ 1/𝑎^𝑛,   (8.5.19)

where the next-to-last inequality discards the 1/𝑏^{𝑛−1} term and uses the fact that 𝑏 ≥ 𝑎(𝑎 + 1). For 𝑘 > 𝑛, by the triangle inequality,

|𝑓𝑘(𝑥𝑛^+) − 𝑓𝑘(𝑥𝑛^−)| ≤ |𝑓𝑘(𝑥𝑛^+)| + |𝑓𝑘(𝑥𝑛^−)| ≤ 2𝑎^{−𝑘},   (8.5.20)

so, since 𝑎 ≥ 4,

∑_{𝑘=𝑛+1}^{∞} |𝑓𝑘(𝑥𝑛^+) − 𝑓𝑘(𝑥𝑛^−)| ≤ ∑_{𝑘=𝑛+1}^{∞} 2/𝑎^𝑘 = (2/𝑎^{𝑛+1}) (1/(1 − (1/𝑎))) = (2/(𝑎 − 1))(1/𝑎^𝑛) ≤ (2/3)(1/𝑎^𝑛).   (8.5.21)


It follows by the triangle inequality, (8.5.16), (8.5.19), and (8.5.21) that

|𝑔(𝑥𝑛^+) − 𝑔(𝑥𝑛^−)| ≥ 2/𝑎^𝑛 − 1/𝑎^𝑛 − 2/(3𝑎^𝑛) = 1/(3𝑎^𝑛).   (8.5.22)

Therefore,

|(𝑔(𝑥𝑛^+) − 𝑔(𝑥𝑛^−))/(𝑥𝑛^+ − 𝑥𝑛^−)| ≥ 𝑏^𝑛/(3𝑎^𝑛).   (8.5.23)

However, since the right-hand side of (8.5.23) goes to +∞ as 𝑛 → ∞, the limit (8.5.12) cannot exist, and the theorem follows.
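The blow-up of the difference quotients in (8.5.23) is visible numerically. The sketch below truncates the series (8.5.11) at a fixed number of terms (an approximation; the truncation error in each quotient is small compared to the guaranteed lower bound 𝑏^𝑛/(3𝑎^𝑛) = 5^𝑛/3):

```python
import math

A, B, K = 4, 20, 12   # a = 4, b = 20 as in Figure 8.5.2; truncate the series at K terms

def g(x):
    # K-term truncation of g(x) = sum_{k>=1} cos(B^k pi x) / A^k
    return sum(math.cos(B ** k * math.pi * x) / A ** k for k in range(1, K + 1))

c = 1.0 / 3.0
quots = []
for n in range(1, 5):
    d = math.floor(B ** n * c)                  # d_n = floor(b^n c)
    xm, xp = d / B ** n, (d + 1) / B ** n       # x_n^- <= c <= x_n^+
    quots.append((g(xp) - g(xm)) / (xp - xm))   # difference quotient across c
    print(n, abs(quots[-1]))   # grows at least like (B/A)^n / 3 = 5^n / 3
```

Since the quotients grow without bound, no single slope 𝑔′(𝑐) can be their limit, which is exactly the mechanism of the proof.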

8.5.6 More on pointwise convergence. While Theorem 8.1.2 and related results in Section 8.4 and Subsection 8.5.1 are the results on pointwise convergence of Fourier series that we will use most often, in this subsection, we describe a pointwise convergence result (Theorem 8.5.17) that holds for many examples occurring in practice, including all of our examples from Section 6.2. We also briefly survey what else is known about pointwise convergence in general (Remark 8.5.20).

To set the reader's expectations properly, we first note that there are examples of continuous 𝑓 ∶ 𝑆^1 → 𝐂 whose Fourier series diverge at uncountably many points of 𝑆^1 (see Remark 8.5.20, below), so we will need to make a stronger assumption than continuity. Conversely, many natural examples (see Section 6.2) have a finite number of jump discontinuities and other points of nondifferentiability. We therefore arrive at the following class of functions.

Definition 8.5.16. For an interval 𝐼 in 𝐑, to say that 𝑓 ∶ 𝐼 → 𝐂 is Lipschitz means that there exists some 𝐿 > 0 such that for all 𝑥, 𝑦 ∈ 𝐼,

|𝑓(𝑥) − 𝑓(𝑦)| ≤ 𝐿|𝑥 − 𝑦|.   (8.5.24)

Piecewise Lipschitz functions are then defined analogously to piecewise continuous functions (see Definition 3.1.25). For example, if 𝑓 is piecewise differentiable with bounded 𝑓′, then 𝑓 is piecewise Lipschitz (Problem 8.5.11).

To lower expectations one more time, recall that if a sequence of continuous functions converges uniformly to some 𝑓, then 𝑓 must be continuous (Theorem 4.3.12). Therefore, for discontinuous 𝑓, we can, at best, only hope for pointwise convergence. Consequently, the following theorem (Theorem 8.5.17) is about as good a result as one could hope for, as it gives not only convergence at points of continuity, but also “convergence to the average limit” at jump discontinuities. Note that for brevity in the subsequent discussion, we define

𝑓(𝑎^+) = lim_{𝑥→𝑎^+} 𝑓(𝑥),   𝑓(𝑎^−) = lim_{𝑥→𝑎^−} 𝑓(𝑥).   (8.5.25)

Theorem 8.5.17. If 𝑓 ∶ 𝑆^1 → 𝐂 is piecewise Lipschitz, then for any 𝑎 ∈ 𝑆^1, we have

lim_{𝑁→∞} 𝑓𝑁(𝑎) = (𝑓(𝑎^+) + 𝑓(𝑎^−))/2.   (8.5.26)

In particular, if 𝑓 is also continuous at 𝑎, then lim_{𝑁→∞} 𝑓𝑁(𝑎) = 𝑓(𝑎).

For an example of what (8.5.26) looks like at jump discontinuities of 𝑓, see our very first example (Figure 1.1.1) back in Section 1.1. See also the examples in Remark 11.4.5. Before we can prove Theorem 8.5.17, we will need two more lemmas.


Lemma 8.5.18. For any 𝛼 ∈ 𝐑, [𝑎, 𝑏] ⊆ [0, 1], and 𝑓 ∈ 𝐿^2([𝑎, 𝑏]), we have

lim_{𝑛→∞} ∫_𝑎^𝑏 𝑓(𝑥) sin((2𝑛𝜋 + 𝛼)𝑥) 𝑑𝑥 = lim_{𝑛→∞} ∫_𝑎^𝑏 𝑓(𝑥) cos((2𝑛𝜋 + 𝛼)𝑥) 𝑑𝑥 = 0.   (8.5.27)

Proof. For (8.5.27), we may assume [𝑎, 𝑏] = [0, 1] because we can extend 𝑓 ∶ [𝑎, 𝑏] → 𝐂 to the domain [0, 1] by defining 𝑓 to be 0 outside [𝑎, 𝑏]. The rest is proved in Problem 8.5.12.

We also need the following Riemann integrability criterion.

Lemma 8.5.19. Suppose 𝑓 ∶ [𝑎, 𝑏] → 𝐂 is bounded, and suppose that for any 𝛿 such that 0 < 𝛿 < 𝑏 − 𝑎, 𝑓 is Riemann integrable on [𝑎 + 𝛿, 𝑏]. Then 𝑓 is Riemann integrable on [𝑎, 𝑏].

Proof. Problem 8.5.13.

Proof of Theorem 8.5.17. Fix 𝑎 ∈ 𝑆^1. Recall from Example 8.3.2, Theorem 8.3.4, and Lemma 8.3.6 that the Dirichlet kernel

𝐷𝑁(𝑥) = ∑_{𝑛=−𝑁}^{𝑁} 𝑒𝑛(𝑥) = { sin((2𝑁 + 1)𝜋𝑥)/sin(𝜋𝑥) if 𝑥 ≠ 0, 2𝑁 + 1 if 𝑥 = 0 }   (8.5.28)

has the property that

𝑓𝑁(𝑎) = (𝑓 ∗ 𝐷𝑁)(𝑎) = ∫_{−1/2}^{1/2} 𝑓(𝑎 − 𝑡)𝐷𝑁(𝑡) 𝑑𝑡.   (8.5.29)

Applying the substitution 𝑢 = −𝑡 and the fact that 𝐷𝑁 is an even function shows that

𝑓𝑁(𝑎) = ∫_0^{1/2} 𝑓(𝑎 − 𝑡)𝐷𝑁(𝑡) 𝑑𝑡 + ∫_{−1/2}^0 𝑓(𝑎 − 𝑡)𝐷𝑁(𝑡) 𝑑𝑡
= ∫_0^{1/2} 𝑓(𝑎 − 𝑡)𝐷𝑁(𝑡) 𝑑𝑡 − ∫_{1/2}^0 𝑓(𝑎 + 𝑢)𝐷𝑁(−𝑢) 𝑑𝑢   (8.5.30)
= ∫_0^{1/2} (𝑓(𝑎 + 𝑡) + 𝑓(𝑎 − 𝑡))𝐷𝑁(𝑡) 𝑑𝑡.

Therefore, since

(1/2)𝑓(𝑎^±) = ∫_0^{1/2} 𝑓(𝑎^±)𝐷𝑁(𝑡) 𝑑𝑡,   (8.5.31)

we see that

𝑓𝑁(𝑎) − (𝑓(𝑎^+) + 𝑓(𝑎^−))/2 = ∫_0^{1/2} (𝑓(𝑎 + 𝑡) − 𝑓(𝑎^+))𝐷𝑁(𝑡) 𝑑𝑡 + ∫_0^{1/2} (𝑓(𝑎 − 𝑡) − 𝑓(𝑎^−))𝐷𝑁(𝑡) 𝑑𝑡.   (8.5.32)


It will therefore suffice to show that

lim_{𝑁→∞} ∫_0^{1/2} (𝑓(𝑎 + 𝑡) − 𝑓(𝑎^+))𝐷𝑁(𝑡) 𝑑𝑡 = 0,   (8.5.33)

as an analogous argument proves the same fact for 𝑓(𝑎 − 𝑡) − 𝑓(𝑎^−). For 0 < 𝑡 ≤ 1/2, let

𝐹(𝑡) = (𝑓(𝑎 + 𝑡) − 𝑓(𝑎^+))/sin(𝜋𝑡) = ((𝑓(𝑎 + 𝑡) − 𝑓(𝑎^+))/𝑡) (𝑡/sin(𝜋𝑡)).   (8.5.34)

Since the value of an integral on [0, 1/2] is not affected by changing the value of the integrand only at 𝑡 = 0, we see that

∫_0^{1/2} (𝑓(𝑎 + 𝑡) − 𝑓(𝑎^+))𝐷𝑁(𝑡) 𝑑𝑡 = ∫_0^{1/2} 𝐹(𝑡) sin((2𝑁 + 1)𝜋𝑡) 𝑑𝑡.   (8.5.35)

By Lemma 8.5.18, the desired limit (8.5.33) will follow if we can show that 𝐹(𝑡) is Riemann integrable on [0, 1/2].

Note that for 0 < 𝛿 < 1/2, 𝐹(𝑡) is piecewise continuous (and therefore Riemann integrable) on [𝛿, 1/2]. Therefore, by Lemma 8.5.19, it suffices to show that 𝐹(𝑡) is bounded on [0, 1/2] (again ignoring 𝐹(0)). Furthermore, suppose [0, 𝑏] is the interval of (Lipschitz) continuity containing 𝑡 = 0 for the piecewise Lipschitz function 𝑓(𝑎 + 𝑡). Then since 𝐹(𝑡) is bounded on any interval of continuity of 𝑓(𝑎 + 𝑡), except possibly for [0, 𝑏], it suffices to show that 𝐹(𝑡) is bounded on [0, 𝑏]. However, by the definition of piecewise Lipschitz (Definitions 3.1.25 and 8.5.16), there exists some 𝐿 > 0 such that for 𝑡 ∈ [0, 𝑏],

|𝑓(𝑎 + 𝑡) − 𝑓(𝑎^+)| ≤ 𝐿|𝑡|.   (8.5.36)

Furthermore, if 𝑔(𝑡) = sin(𝜋𝑡), then because 𝑡/𝑔(𝑡) is continuous for 0 < 𝑡 ≤ 1/2 and

lim_{𝑡→0^+} 𝑡/𝑔(𝑡) = 1/𝑔′(0) = 1/𝜋,   (8.5.37)

we see that 𝑡/sin(𝜋𝑡) is also bounded on [0, 𝑏]. By (8.5.34), we then see that 𝐹(𝑡) is the product of two bounded functions on [0, 𝑏], and therefore bounded. The theorem follows.

Remark 8.5.20. In general, the pointwise convergence of Fourier series is subtle and difficult, so we will be content to list a few important results.

• On the one hand, Carleson [Car66] showed that if 𝑓 ∈ 𝐿^2(𝑆^1), then the Fourier series of 𝑓 converges almost everywhere. In particular, this holds if 𝑓 is continuous.

• On the other hand, Katznelson [Kat66] showed that given any set 𝐴 ⊆ 𝑆^1 of measure zero, even an uncountable one, there exists a continuous 𝑓 whose Fourier series diverges everywhere on 𝐴 (and possibly other points).

• At the other extreme from our 𝐿^2 convergence results, Kolmogorov [Kol26] showed that there even exists an 𝑓 ∈ 𝐿^1(𝑆^1) whose Fourier series diverges everywhere on 𝑆^1.
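Returning to Theorem 8.5.17, the “convergence to the average” at a jump can be seen concretely for the sawtooth 𝑓(𝑥) = 𝑥 on [0, 1), a piecewise Lipschitz function with a jump at 0, whose partial sums have the classical closed form used below:

```python
import math

def f_N(x, N):
    # Nth Fourier partial sum of the sawtooth f(x) = x on [0, 1):
    # f_N(x) = 1/2 - sum_{n=1}^{N} sin(2 pi n x) / (pi n)
    return 0.5 - sum(math.sin(2 * math.pi * n * x) / (math.pi * n)
                     for n in range(1, N + 1))

# At the jump a = 0: f(0+) = 0 and f(0-) = 1, and every partial sum equals the
# average (f(0+) + f(0-))/2 = 1/2, since all the sine terms vanish there.
print(f_N(0.0, 50))
# At a point of continuity, f_N(a) -> f(a):
print(abs(f_N(0.25, 50) - 0.25), abs(f_N(0.25, 500) - 0.25))
```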


Problems. Since the only problems in this section that will be used later are Problems 8.5.1 and 8.5.2, the other problems are all of the starred variety described in the Introduction.

8.5.1. (Proves Theorem 8.5.1) Suppose 𝑓 ∈ 𝐿^2(𝑆^1), 𝑗 ≥ 1, and 𝐾 and 𝑝 are real constants such that 𝑝 > 𝑗 + 1 and for all 𝑛 ≠ 0, |𝑓̂(𝑛)| < 𝐾/|𝑛|^𝑝.

(a) Prove that for 0 ≤ ℓ ≤ 𝑗, the series

𝑔ℓ(𝑥) = ∑_{𝑛∈𝐙} (2𝜋𝑖𝑛)^ℓ 𝑓̂(𝑛)𝑒𝑛(𝑥)   (8.5.38)

converges absolutely and uniformly to 𝑔ℓ ∈ 𝐶^0(𝑆^1), and prove that 𝑔0 = 𝑓 a.e.

(b) Let 𝑔 = 𝑔0. Prove that for 0 ≤ ℓ ≤ 𝑗, we have that

𝑔^{(ℓ)}(𝑥) = ∑_{𝑛∈𝐙} (2𝜋𝑖𝑛)^ℓ 𝑓̂(𝑛)𝑒𝑛(𝑥).   (8.5.39)

8.5.2. (Proves Corollary 8.5.2) Suppose 𝑓 ∈ 𝐿^2(𝑆^1) and, for all 𝑝 > 2, there exists some constant 𝐾𝑝 such that for all 𝑛 ≠ 0, |𝑓̂(𝑛)| ≤ 𝐾𝑝/|𝑛|^𝑝. Prove that the Fourier series of 𝑓 converges uniformly to some 𝑔 ∈ 𝐶^∞(𝑆^1) such that 𝑓 = 𝑔 a.e.

8.5.3. (*) (Proves Theorem 8.5.4) Suppose 𝑓 ∈ 𝐶^0([0, 1/2]) and 𝜖 > 0.

(a) Explain why 𝑓even, the even extension of 𝑓, is continuous on 𝑆^1 (thinking of 𝑆^1 as the interval [−1/2, 1/2]).

(b) Prove that if 𝑔𝑁(𝑥) = ∑_{𝑛=−𝑁}^{𝑁} 𝑎𝑛𝑒𝑛(𝑥) is a trigonometric polynomial of degree 𝑁 and 𝜖1 > 0, then there exists some polynomial function 𝑝(𝑥) such that |𝑔𝑁(𝑥) − 𝑝(𝑥)| < 𝜖1 for all 𝑥 ∈ 𝑆^1.

(c) Prove that if 𝑔 ∈ 𝐶^0(𝑆^1), then there exists some polynomial 𝑝(𝑥) such that for all 𝑥 ∈ 𝑆^1, we have |𝑔(𝑥) − 𝑝(𝑥)| < 𝜖. In particular, if 𝑔 = 𝑓even, then |𝑓(𝑥) − 𝑝(𝑥)| < 𝜖 for all 𝑥 ∈ [0, 1/2].

8.5.4. (*) (Proves Theorem 8.5.6) Suppose 𝑓 ∈ 𝐿^2(𝐑) and 𝜖 > 0. Prove that there exists some 𝑔 ∈ 𝐶^∞(𝐑) with compact support such that ‖𝑓 − 𝑔‖ < 𝜖.

8.5.5. (*) (Proves Theorem 8.5.8)

(a) For 𝑓 as defined in Example 6.2.6, compute the value of 𝜁(2) by computing ‖𝑓‖^2 in two different ways.

(b) For 𝑓 as defined in Example 6.2.8, compute the value of 𝜁(4) by computing ‖𝑓‖^2 in two different ways.

8.5.6. (*) Suppose 𝑥 ∈ ℓ^2(𝐙). Prove that the discrete-time Fourier transform

𝑥̂(𝛾) = ∑_{𝑡∈𝐙} 𝑥(𝑡)𝑒_{−𝑡}(𝛾)   (8.5.40)

converges as a series in 𝐿^2(𝑆^1).


8.5.7. (*) (Proves Theorem 8.5.9) Suppose 𝑥 ∈ ℓ^2(𝐙), and let 𝑥̂(𝛾) be the discrete-time Fourier transform of 𝑥 (Problem 8.5.6).

(a) Prove that

𝑥(𝑡) = ∫_0^1 𝑥̂(𝛾)𝑒𝑡(𝛾) 𝑑𝛾.   (8.5.41)

(b) Prove that

∑_{𝑡∈𝐙} |𝑥(𝑡)|^2 = ∫_0^1 |𝑥̂(𝛾)|^2 𝑑𝛾.   (8.5.42)

8.5.8. (*) (Proves Theorem 8.5.12) Suppose 𝑥 ∈ ℓ^2(𝐙). Define

𝑆𝑥(𝛾) = |𝑥̂(𝛾)|^2,   (8.5.43)

𝑟𝑥(𝜏) = ⟨𝑥(𝑡), 𝑥(𝑡 − 𝜏)⟩ = ∑_{𝑡∈𝐙} 𝑥(𝑡)\overline{𝑥(𝑡 − 𝜏)}.   (8.5.44)

Prove that

𝑟𝑥(𝜏) = ∫_0^1 𝑆𝑥(𝛾)𝑒𝜏(𝛾) 𝑑𝛾.   (8.5.45)

8.5.9. (*) (Proves Lemma 8.5.14) Suppose 𝑓 ∶ 𝐑 → 𝐂 is differentiable at 𝑐 ∈ 𝐑.

(a) Prove that for any 𝜖 > 0, there exists some 𝛿(𝜖) > 0 such that for all 𝑥 ≤ 𝑐 ≤ 𝑦, 𝑥 < 𝑦 such that |𝑥 − 𝑐| < 𝛿(𝜖) and |𝑦 − 𝑐| < 𝛿(𝜖), we have that

|(𝑓(𝑦) − 𝑓(𝑥))/(𝑦 − 𝑥) − 𝑓′(𝑐)| < 𝜖.   (8.5.46)

(b) Prove that if 𝑥𝑛^− and 𝑥𝑛^+ are sequences such that 𝑥𝑛^− ≤ 𝑐 ≤ 𝑥𝑛^+ and 𝑥𝑛^− < 𝑥𝑛^+ for all 𝑛 ∈ 𝐍 and lim_{𝑛→∞} 𝑥𝑛^− = lim_{𝑛→∞} 𝑥𝑛^+ = 𝑐, then

lim_{𝑛→∞} (𝑓(𝑥𝑛^+) − 𝑓(𝑥𝑛^−))/(𝑥𝑛^+ − 𝑥𝑛^−) = 𝑓′(𝑐).   (8.5.47)

8.5.10. (*) (Proves Theorem 8.5.15) For 𝑎 > 1, prove that

𝑔(𝑥) = ∑_{𝑘=1}^{∞} cos(𝑏^𝑘𝜋𝑥)/𝑎^𝑘   (8.5.48)

converges uniformly on 𝐑.

8.5.11. (*) Let 𝐼 be an interval in 𝐑. Prove that if 𝑓 is differentiable on 𝐼 and 𝑓′ is bounded, then 𝑓 is Lipschitz.

8.5.12. (*) (Proves Lemma 8.5.18) Suppose 𝑓 ∈ 𝐿^2(𝑆^1). Use the Riemann-Lebesgue Lemma (Corollary 7.6.6) to prove that

lim_{𝑛→∞} ∫_0^1 𝑓(𝑥) sin((2𝑛𝜋 + 𝛼)𝑥) 𝑑𝑥 = lim_{𝑛→∞} ∫_0^1 𝑓(𝑥) cos((2𝑛𝜋 + 𝛼)𝑥) 𝑑𝑥 = 0.   (8.5.49)

8.5.13. (*) (Proves Lemma 8.5.19) Suppose 𝑓 ∶ [𝑎, 𝑏] → 𝐂 is bounded, and suppose that for any 𝛿 > 0, 𝑓 is (Riemann) integrable on [𝑎 + 𝛿, 𝑏]. Prove that 𝑓 is integrable on [𝑎, 𝑏].

Part 3

Operators and differential equations

9 PDEs and diagonalization

The same expression whose abstract properties geometers had considered, and which in this respect belongs to general analysis, represents as well the motion of light in the atmosphere, as it determines the laws of diffusion of heat in solid matter, and enters into all the chief problems of the theory of probability. — Joseph Fourier, The Analytical Theory of Heat

[Improbability] generators were often used to break the ice at parties by making all the molecules in the hostess's undergarments leap simultaneously one foot to the left, in accordance with the Theory of Indeterminacy. Many respectable physicists said that they weren't going to stand for this, partly because it was a debasement of science, but mostly because they didn't get invited to those sorts of parties. — Douglas Adams, The Hitchhiker's Guide to the Galaxy

In this chapter, we give a brief introduction to the partial differential equations (PDEs) that we will solve as our main application of Fourier series, both coming from classical physics (Section 9.1) and quantum mechanics (Section 9.2). We also survey some finite-dimensional linear algebra (Section 9.3) that we hope will give the reader some context for the infinite-dimensional linear algebra in Chapter 10.

9.1 Some PDEs from classical physics

Joseph Fourier invented what is now known as Fourier analysis in order to solve certain partial differential equations, or PDEs, arising from physics. In this section, we introduce and give brief derivations of the heat and wave equations from classical physics, and in the next section, we discuss a PDE coming from quantum mechanics. For reasons that will become clear (see Section 10.2), the key examples all involve the second derivative in some way, so we first discuss how to approximate 𝑓″(𝑥) for a


given 𝑥. Now, by Definition 3.2.2, if 𝑓 is differentiable at 𝑥, for small Δ𝑥, we have

𝑓′(𝑥) ≈ (𝑓(𝑥 + Δ𝑥) − 𝑓(𝑥))/Δ𝑥,   (9.1.1)

with equality as Δ𝑥 → 0. So if we only know the value of 𝑓(𝑥) at evenly spaced intervals of length Δ𝑥 > 0, then the right-hand side of (9.1.1), which we hereby call 𝑚_+, gives a reasonable approximation of 𝑓′(𝑥). More symmetrically, since 𝑥 + Δ𝑥/2 is the midpoint of 𝑥 and 𝑥 + Δ𝑥, we can think of 𝑚_+ as approximating 𝑓′(𝑥 + Δ𝑥/2); and similarly, the backwards secant slope

𝑚_− = (𝑓(𝑥) − 𝑓(𝑥 − Δ𝑥))/Δ𝑥   (9.1.2)

gives a reasonable approximation of 𝑓′(𝑥 − Δ𝑥/2). We might therefore reasonably approximate

𝑓″(𝑥) ≈ (𝑚_+ − 𝑚_−)/Δ𝑥 = (𝑓(𝑥 + Δ𝑥) − 2𝑓(𝑥) + 𝑓(𝑥 − Δ𝑥))/(Δ𝑥)^2,   (9.1.3)

and in fact, one can prove that we get equality as Δ𝑥 → 0 (Problem 9.1.1).

Turning now to the problem that first inspired Fourier, we consider the following initial value problem:

Question 9.1.1. Suppose we have a 1-dimensional wire, possibly circular, such that the temperature at position 𝑥 and time 𝑡 = 0 is a given function 𝑢(𝑥, 0) = 𝑓(𝑥). Solve for 𝑢(𝑥, 𝑡), the temperature at position 𝑥 and time 𝑡 > 0.
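The second-difference approximation (9.1.3) above is easy to sanity-check numerically; in the sketch below, the error for a smooth test function shrinks like (Δ𝑥)^2:

```python
import math

def second_diff(f, x, dx):
    # centered second difference (9.1.3): (f(x+dx) - 2 f(x) + f(x-dx)) / dx^2
    return (f(x + dx) - 2 * f(x) + f(x - dx)) / dx ** 2

# For f = sin, f'' = -sin; each tenfold reduction of dx cuts the error ~100-fold.
for dx in (0.1, 0.01, 0.001):
    print(dx, abs(second_diff(math.sin, 1.0, dx) + math.sin(1.0)))
```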

[Figure 9.1.1. Model of a heated wire: a segment of length Δ𝑥 centered at 𝑥, with heat flowing in from the neighboring segments centered at 𝑥 − Δ𝑥 and 𝑥 + Δ𝑥]

We can model Question 9.1.1 using Fourier's Law of heat conduction, which states that the rate of heat flow in/out of an object in a given direction is proportional to the temperature gradient (rate of temperature change) in that direction. For example, as shown in Figure 9.1.1, the rate of heat flowing from a piece of the wire of length Δ𝑥 centered at 𝑥 + Δ𝑥 to a piece of the wire of length Δ𝑥 centered at 𝑥 is

  𝑘₁ [𝑢(𝑥 + Δ𝑥, 𝑡) − 𝑢(𝑥, 𝑡)] / Δ𝑥    (9.1.4)

for some constant 𝑘₁ > 0. (The reader should check that the signs in (9.1.4) make sense; e.g., if the piece at 𝑥 + Δ𝑥 is warmer than the piece at 𝑥, then the piece at 𝑥 gets warmer over time.)

Let 𝑄 be the total heat contained in a piece of the wire of length Δ𝑥 centered at 𝑥. Assuming that heat is only transferred along the wire, and not through the surrounding material, by summing the heat flow from the pieces at 𝑥 + Δ𝑥 and 𝑥 − Δ𝑥 to the piece at 𝑥, we see that (Figure 9.1.1)

  Δ𝑄/Δ𝑡 = 𝑘₁ ([𝑢(𝑥 + Δ𝑥, 𝑡) − 𝑢(𝑥, 𝑡)] / Δ𝑥 + [𝑢(𝑥 − Δ𝑥, 𝑡) − 𝑢(𝑥, 𝑡)] / Δ𝑥)
        = 𝑘₁ [𝑢(𝑥 + Δ𝑥, 𝑡) − 2𝑢(𝑥, 𝑡) + 𝑢(𝑥 − Δ𝑥, 𝑡)] / Δ𝑥.    (9.1.5)

Furthermore, since temperature is defined to be (up to a constant depending only on the material in the wire) the average amount of heat per unit mass, we have that Δ𝑄 = 𝑘₂(𝜌Δ𝑥)Δ𝑢 for some constant 𝑘₂ > 0, where 𝜌 is the (uniform) density of the wire in mass per unit length. Therefore,

  Δ𝑢/Δ𝑡 = (𝑘₁/(𝑘₂𝜌)) [𝑢(𝑥 + Δ𝑥, 𝑡) − 2𝑢(𝑥, 𝑡) + 𝑢(𝑥 − Δ𝑥, 𝑡)] / (Δ𝑥)².    (9.1.6)

Taking the limit as Δ𝑡, Δ𝑥 → 0, applying (9.1.3), and changing units to eliminate constants, we get the heat equation:

  ∂𝑢/∂𝑡 = ∂²𝑢/∂𝑥².    (9.1.7)

The heat equation (9.1.7) has several other physical interpretations. For example, if 𝑢(𝑥, 𝑡) describes the concentration of (say) a gas along a linear pipe, a mathematically similar derivation shows that under simplified conditions, (9.1.7) describes the diffusion of that gas over time. Less straightforwardly, the Black-Scholes equation is a model in mathematical finance that determines fair pricing for what is known as a European (stock) option:

  ∂𝑉/∂𝑡 + 𝑟𝑠 ∂𝑉/∂𝑠 + (𝜎²𝑠²/2) ∂²𝑉/∂𝑠² − 𝑟𝑉 = 0.    (9.1.8)

It turns out that (9.1.8) is equivalent to the heat equation under a change of variables; see, for example, Stein and Shakarchi [SS03, p. 170].

Another of our motivating PDEs comes from the following situation.

Question 9.1.2. Suppose we have a 1-dimensional wire, held taut at both ends so that it vibrates a small amount vertically, as compared to its length. (Think of a string on a stringed instrument.) Suppose also that we know the initial height 𝑢(𝑥, 0) = 𝑓(𝑥) and the initial (vertical) velocity ∂𝑢/∂𝑡(𝑥, 0) = 𝑔(𝑥) of the wire at position 𝑥. Solve for 𝑢(𝑥, 𝑡), the height of the string at position 𝑥 and time 𝑡 > 0.
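Returning to the heat equation for a moment: the discrete balance law (9.1.6) that produced (9.1.7) can be simulated directly. The following Python sketch (ours; grid size, time step, and initial profile are illustrative choices) evolves a temperature profile on a circular wire and exhibits the characteristic smoothing of heat flow:

```python
# A discrete sketch of the heat equation (9.1.7) on a circular wire,
# iterating the update u <- u + dt * (second difference of u) / dx^2,
# which is exactly the discrete balance law (9.1.6) with constants scaled out.
import math

N = 64                        # number of grid points around the ring
dx = 1.0 / N
dt = 0.2 * dx * dx            # small enough for stability of this explicit scheme
u = [math.sin(2 * math.pi * k * dx) for k in range(N)]   # initial temperature f(x)

def heat_step(u, dt, dx):
    N = len(u)
    # indices wrap around modulo N because the wire is circular
    return [u[k] + dt * (u[(k + 1) % N] - 2 * u[k] + u[(k - 1) % N]) / dx**2
            for k in range(N)]

for _ in range(200):
    u = heat_step(u, dt, dx)

# Heat flow smooths the profile: the peak temperature decays toward the
# mean (here 0), while the total heat (the sum over the ring) is conserved.
print(max(u), sum(u))
```

Note that conservation of total heat is automatic in the discrete model: the flow terms in (9.1.5) cancel in pairs when summed around the ring.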

[Figure 9.1.2. Ball-and-spring model of a vibrating wire]

We can model Question 9.1.2 with the following "ball-and-spring" approximation. Imagine that our 1-dimensional wire is made of individual particles ("balls"), linked to each other by springs representing the tension of the wire, and assume that the wire


is pulled tight enough that the balls are effectively constrained to move only vertically, as shown in Figure 9.1.2. A real-life version of the ball-and-spring model from San Francisco’s Exploratorium museum can also be seen in Figure 9.1.3; here, the rods are attached to a central axle so that the “balls” on the ends of the rods only move vertically.

[Figure 9.1.3. A real-life ball-and-spring model, operated by the author's daughter]

The ball-and-spring approximation can be turned into a PDE as follows. Suppose the wire is being pulled with a uniform tension of magnitude 𝜏. Then by action-reaction (Newton's Third Law), at time 𝑡, each piece of the wire of length Δ𝑥 is pulled at both ends by a force with magnitude 𝜏 and direction determined by (roughly speaking) the slope of 𝑢(𝑥, 𝑡) in the 𝑥 direction. Specifically, the vertical force on the piece of wire of length Δ𝑥 centered at 𝑥 coming from the tension pulling in the positive 𝑥 direction is

  𝜏 (Δ𝑢₊ / √((Δ𝑥)² + (Δ𝑢₊)²)) ≈ 𝜏 [𝑢(𝑥 + Δ𝑥, 𝑡) − 𝑢(𝑥, 𝑡)] / Δ𝑥,    (9.1.9)

where the approximation in (9.1.9) again comes from the assumption that the wire is pulled tight enough that vertical motion is much smaller than horizontal motion. Adding the analogous force coming from the negative 𝑥 direction and applying 𝐹 = 𝑚𝑎 (Newton's Second Law), we have

  𝜏 ([𝑢(𝑥 + Δ𝑥, 𝑡) − 𝑢(𝑥, 𝑡)] / Δ𝑥 + [𝑢(𝑥 − Δ𝑥, 𝑡) − 𝑢(𝑥, 𝑡)] / Δ𝑥) = 𝜌Δ𝑥 ∂²𝑢/∂𝑡²,    (9.1.10)

where 𝜌 is again the uniform density of the wire in mass per unit length. Dividing by Δ𝑥, taking limits, and rescaling units to eliminate constants, we obtain the wave equation:

  ∂²𝑢/∂𝑥² = ∂²𝑢/∂𝑡².    (9.1.11)

Like the heat equation, the wave equation (9.1.11) has other useful interpretations. One important application comes from Maxwell's equations for the electric field 𝐄(𝑥, 𝑦, 𝑧) and the magnetic field 𝐁(𝑥, 𝑦, 𝑧) in a region of empty space:

  ∇ × 𝐄 = −(1/𝑐) ∂𝐁/∂𝑡,  ∇ × 𝐁 = (1/𝑐) ∂𝐄/∂𝑡,
  ∇ ⋅ 𝐄 = 0,  ∇ ⋅ 𝐁 = 0.    (9.1.12)

One can use multivariable calculus to show that up to constants, (9.1.12) reduces to a system of three 3-dimensional wave equations (Problem 9.1.2). We may therefore think of (9.1.11) as modelling electromagnetic waves in (1-dimensional) space over time, giving, for example, plane wave solutions of the form 𝑒^{𝑖𝑘(𝑥−𝑡)} (Problem 9.1.3).

Now, both the heat and wave equations can be stated for 𝑢 ∶ 𝑆¹ → 𝐂, though for the wave equation, 𝑆¹ only makes physical sense as a domain if we think of 𝑢 as, for example, a periodic electromagnetic wave. For applications, however, it is perhaps more interesting to consider the following boundary value problems:

Question 9.1.3. Solve the heat and wave equations on the domain [𝑎, 𝑏] under the following boundary conditions:

(1) Dirichlet boundary conditions: We require that 𝑢(𝑎, 𝑡) = 𝑢(𝑏, 𝑡) = 0 for all 𝑡. For the heat equation, this means holding the temperature of the wire at 0 at both ends; for the wave equation, this means holding the wire fixed at height 0 at both ends.

(2) Neumann boundary conditions: We require that ∂𝑢/∂𝑥(𝑎, 𝑡) = ∂𝑢/∂𝑥(𝑏, 𝑡) = 0 for all 𝑡. For the heat equation, by Fourier's Law, this means that there is no heat flow in or out of the wire at its ends (i.e., perfect insulation). For the wave equation, this is perhaps somewhat less natural, but one might imagine the ends of the wire sliding up and down fixed vertical rods.
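The Dirichlet case of Question 9.1.3 can also be sketched numerically. The following Python snippet (ours; the leapfrog scheme and all parameters are illustrative choices, not from the text) evolves a plucked string pinned at both ends:

```python
# A sketch of the wave equation (9.1.11) on [0, 1] with Dirichlet boundary
# conditions u(0, t) = u(1, t) = 0, discretized by the standard leapfrog
# scheme: u_new = 2u - u_old + (dt/dx)^2 * (second difference of u).
import math

N = 64
dx = 1.0 / N
dt = 0.5 * dx                                  # satisfies the CFL condition dt <= dx
xs = [k * dx for k in range(N + 1)]
u = [math.sin(math.pi * x) for x in xs]        # initial height f(x), zero at both ends
u_old = u[:]                                   # crude encoding of zero initial velocity

def wave_step(u, u_old, r2):
    new = [0.0] * len(u)                       # endpoints stay pinned at 0 (Dirichlet)
    for k in range(1, len(u) - 1):
        new[k] = 2 * u[k] - u_old[k] + r2 * (u[k + 1] - 2 * u[k] + u[k - 1])
    return new

r2 = (dt / dx) ** 2
for _ in range(100):
    u, u_old = wave_step(u, u_old, r2), u

# The string oscillates, but the ends never move:
print(u[0], u[-1], max(abs(v) for v in u))
```

In contrast with the heat simulation, the profile here oscillates instead of decaying: the exact solution for this initial data is 𝑢(𝑥, 𝑡) = sin(𝜋𝑥) cos(𝜋𝑡).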

Problems.

9.1.1. Prove that if 𝑓″(𝑥) exists, then

  lim_{ℎ→0} [𝑓(𝑥 + ℎ) − 2𝑓(𝑥) + 𝑓(𝑥 − ℎ)] / ℎ² = 𝑓″(𝑥).    (9.1.13)

9.1.2. For a vector field 𝐹(𝑥, 𝑦, 𝑧) = (𝐹₁(𝑥, 𝑦, 𝑧), 𝐹₂(𝑥, 𝑦, 𝑧), 𝐹₃(𝑥, 𝑦, 𝑧)) and a scalar function 𝑓(𝑥, 𝑦, 𝑧), we define the grad of 𝑓 and the curl and div of 𝐹 by

  ∇𝑓 = (∂𝑓/∂𝑥, ∂𝑓/∂𝑦, ∂𝑓/∂𝑧),    (9.1.14)
  ∇ × 𝐹 = (∂𝐹₃/∂𝑦 − ∂𝐹₂/∂𝑧, ∂𝐹₁/∂𝑧 − ∂𝐹₃/∂𝑥, ∂𝐹₂/∂𝑥 − ∂𝐹₁/∂𝑦),    (9.1.15)
  ∇ ⋅ 𝐹 = ∂𝐹₁/∂𝑥 + ∂𝐹₂/∂𝑦 + ∂𝐹₃/∂𝑧,    (9.1.16)

respectively. We also define ∇²𝑓 = ∂²𝑓/∂𝑥² + ∂²𝑓/∂𝑦² + ∂²𝑓/∂𝑧², and we define ∇²𝐹 componentwise:

  ∇²𝐹 = (∇²𝐹₁, ∇²𝐹₂, ∇²𝐹₃).    (9.1.17)

(a) For a vector field 𝐹(𝑥, 𝑦, 𝑧), as above, prove that

  ∇ × (∇ × 𝐹) = ∇(∇ ⋅ 𝐹) − ∇²𝐹.    (9.1.18)

(You may assume that partial derivatives commute; e.g., (∂/∂𝑥)(∂/∂𝑦) = (∂/∂𝑦)(∂/∂𝑥).)

(b) Prove that if 𝐄 and 𝐁 satisfy Maxwell's equations

  ∇ × 𝐄 = −(1/𝑐) ∂𝐁/∂𝑡,  ∇ × 𝐁 = (1/𝑐) ∂𝐄/∂𝑡,
  ∇ ⋅ 𝐄 = 0,  ∇ ⋅ 𝐁 = 0    (9.1.19)

in empty space, then

  ∇²𝐄 = (1/𝑐²) ∂²𝐄/∂𝑡²,    (9.1.20)

a system of three 3-dimensional wave equations. (Again, assume partial derivatives commute.)

9.1.3. Verify by computation that for any 𝑘 ∈ 𝐂, 𝑒^{𝑖𝑘(𝑥−𝑡)} is a solution to the wave equation ∂²𝑢/∂𝑥² = ∂²𝑢/∂𝑡².
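As a numerical spot check of Problem 9.1.3 (not a substitute for the computation, and only for an arbitrary real value of 𝑘), the second differences of 𝑒^{𝑖𝑘(𝑥−𝑡)} in 𝑥 and in 𝑡 agree to high accuracy:

```python
# Spot check: for u(x, t) = exp(i k (x - t)), centered second differences
# in x and t should agree, consistent with d2u/dx2 = d2u/dt2.
import cmath

k = 1.5                     # an arbitrary (real) choice of k for this check
def u(x, t):
    return cmath.exp(1j * k * (x - t))

x, t, h = 0.3, 0.8, 1e-4
u_xx = (u(x + h, t) - 2 * u(x, t) + u(x - h, t)) / h**2
u_tt = (u(x, t + h) - 2 * u(x, t) + u(x, t - h)) / h**2
print(abs(u_xx - u_tt))     # ≈ 0: both sides approximate -k**2 * u(x, t)
```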

9.2 Schrödinger's equation

Our third and final motivating PDE comes from quantum mechanics and addresses the following question.

Question 9.2.1. Explain why, when the energy levels of oxygen (O₂) molecules are measured, those energy levels are distributed not continuously, but discretely, or in other words, in a quantized manner.

The phenomenon from Question 9.2.1 is illustrated in Figure 9.2.1, which gives a rough idea of what the energy levels (bright lines) look like in the visible part of the spectrum for photons coming from a sample of oxygen molecules that are excited (e.g., heated) and then transition to a lower energy state by emitting a photon. Note that these energy levels (or, actually, differences between energy levels) appear to be concentrated in narrow bands, as opposed to, say, some kind of more even or continuous distribution.

[Figure 9.2.1. Sketch of emission spectrum of oxygen molecules, running from bluer to redder wavelengths]

Now, the answer to Question 9.2.1 cannot be obtained from ordinary mechanics in the straightforward manner that we obtained the heat and wave equations. Indeed, no less an authority than Feynman [Fey11, III.16] calls Schrödinger's equation a matter of inspired guesswork. Nevertheless, following a number of sources (such as Eisberg and Resnick [ER85, Ch. 5] and Holland [Hol07, pp. 284–285]), with hindsight, we can approximate Schrödinger's heuristic reasoning as follows.

First off, we note that the bond of a diatomic molecule like O₂ can be modeled as a "spring," as shown in Figure 9.2.2. In that model, if 𝑥 is the displacement from equilibrium bond length (distance between the two oxygen nuclei), then the corresponding force 𝐹(𝑥) applied to the system is

  𝐹(𝑥) = −𝑘₁𝑥    (9.2.1)

for some constant 𝑘₁ > 0. Note that if the bond length is longer than equilibrium (𝑥 > 0), then (9.2.1) indicates that attracting forces shorten it, and vice versa.

[Figure 9.2.2. A diatomic molecule is like a spring]

Applying 𝐹 = 𝑚𝑎 to rewrite (9.2.1) as 𝑚𝑎 + 𝑘₁𝑥 = 0, multiplying by 𝑣 = 𝑑𝑥/𝑑𝑡, and integrating with respect to 𝑡 gives a potential energy function 𝑉(𝑥) = (1/2)𝑘₁𝑥² that satisfies the conservation of energy equation

  (1/2)𝑚𝑣² + (1/2)𝑘₁𝑥² = 𝐸,    (9.2.2)

where (1/2)𝑚𝑣² represents the kinetic energy of the electron and 𝐸 is the (constant) total energy in the system. Rewriting (9.2.2) in terms of momentum 𝑝 = 𝑚𝑣, we then get

  𝑘₂𝑝² + (1/2)𝑘₁𝑥² = 𝐸.    (9.2.3)

Following Schrödinger, with the benefit of hindsight, we can now turn (9.2.3) into an operator equation using de Broglie's theory of "matter waves." In that theory, de Broglie posited that a particle expressing wave-like properties in one spatial dimension can be modeled by a plane wave

  𝑢(𝑥, 𝑡) = 𝑒^{𝑖(𝑝𝑥−𝐸𝑡)/𝑘₃},    (9.2.4)

where 𝑝 and 𝐸 are the (constant) momentum and energy of the particle, respectively. (Compare Problem 9.1.3.) Differentiating (9.2.4) and rearranging terms, we see that

  −𝑖𝑘₃ ∂𝑢/∂𝑥 = 𝑝𝑢,  𝑖𝑘₃ ∂𝑢/∂𝑡 = 𝐸𝑢.    (9.2.5)

If we then think of (9.2.5) as equations among operators on 𝑢, we have that

  −𝑖𝑘₃ ∂/∂𝑥 = 𝑝,  𝑖𝑘₃ ∂/∂𝑡 = 𝐸.    (9.2.6)

Substituting into (9.2.3), we get the operator equation

  −𝑘₄ ∂²/∂𝑥² + (1/2)𝑘₁𝑥² = 𝑖𝑘₃ ∂/∂𝑡.    (9.2.7)

So what do the two sides of (9.2.7) operate on? Well, the final quantum leap (pun intended) is to suppose that our particle is represented by a state function Ψ(𝑥, 𝑡) whose interpretation will be discussed later. For now, we merely note that after adjusting constants, (9.2.7) gives

  −∂²Ψ/∂𝑥² + 4𝜋²𝑥²Ψ = 𝑖 ∂Ψ/∂𝑡.    (9.2.8)

This is Schrödinger's equation for the quantum harmonic oscillator and is our final motivating PDE problem:

Question 9.2.2. Solve Schrödinger's equation for the quantum harmonic oscillator (9.2.8), given some initial complex-valued state Ψ(𝑥, 0).

Remark 9.2.3. Note that in (9.2.8), the specific forces governing the quantum harmonic oscillator only appear via the potential energy function 𝑉(𝑥) = (1/2)𝑘₁𝑥². We may therefore obtain Schrödinger's equation for any number of other physical situations by changing the potential 𝑉(𝑥). For many examples, and much more about the general theory of Schrödinger's equation, see Teschl [Tes09].
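As a concrete preview of Question 9.2.2, one can check by hand (and numerically, as in the following sketch of ours) that the Gaussian Ψ(𝑥, 𝑡) = 𝑒^{−𝜋𝑥²}𝑒^{−2𝜋𝑖𝑡}, a standard candidate for the lowest-energy state in these units, satisfies (9.2.8); the particular solution and test point are our choices, not from the text:

```python
# Numerical check that Psi(x, t) = exp(-pi x^2) * exp(-2 pi i t) satisfies
# the harmonic-oscillator equation (9.2.8):
#   -d2Psi/dx2 + 4 pi^2 x^2 Psi = i dPsi/dt.
# Derivatives are approximated by centered differences.
import cmath, math

def psi(x, t):
    return math.exp(-math.pi * x * x) * cmath.exp(-2j * math.pi * t)

x, t, h = 0.4, 0.1, 1e-4
d2psi_dx2 = (psi(x + h, t) - 2 * psi(x, t) + psi(x - h, t)) / h**2
dpsi_dt = (psi(x, t + h) - psi(x, t - h)) / (2 * h)
lhs = -d2psi_dx2 + 4 * math.pi**2 * x * x * psi(x, t)
rhs = 1j * dpsi_dt
print(abs(lhs - rhs))   # small: both sides equal 2*pi*psi(x, t)
```

Applying each side to Ψ gives 2𝜋Ψ, so this Ψ is an "eigenfunction" of the operator on the left with eigenvalue 2𝜋, foreshadowing the diagonalization strategy of Section 9.3.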

9.3 Diagonalization

In this section, we review some relevant material from linear algebra to give the reader some motivation for the material on operators in Chapter 10 and beyond.

As the reader may recall, the most common application of linear algebra is to be able to solve matrix equations of the form 𝐴𝑥 = 𝑏. (Here, we restrict our attention to the case where 𝐴 is an 𝑛 × 𝑛 matrix and 𝑥 and 𝑏 are unknown and known 𝑛 × 1 column vectors, respectively.) Solving 𝐴𝑥 = 𝑏 in general takes some effort (e.g., Gaussian reduction), but one easy case happens when 𝐴 is diagonal, i.e., of the form 𝐴 = diag(𝑎₁₁, …, 𝑎ₙₙ), with all off-diagonal entries equal to 0, for then 𝐴𝑥 = 𝑏 just becomes 𝑛 independent one-variable equations. It would therefore certainly be convenient to be able to reduce the general case of 𝐴𝑥 = 𝑏 to the diagonal case, in the sense that if 𝑃⁻¹𝐴𝑃 = 𝐷, where 𝐷 is diagonal, then a solution 𝑦 of 𝐷𝑦 = 𝑃⁻¹𝑏 gives a solution 𝑥 = 𝑃𝑦 to 𝐴𝑥 = 𝑏 (Problem 9.3.1). Perhaps more importantly, we can obtain much information, both quantitative and qualitative, about solutions to 𝐴𝑥 = 𝑏 from the entries of 𝐷. (See below for some examples.)

To get more traction on this problem of diagonalization, it is helpful to have the following abstraction available.

Definition 9.3.1. A linear operator on 𝐂ⁿ is a linear transformation 𝑇 ∶ 𝐂ⁿ → 𝐂ⁿ (i.e., a map such that for all 𝑥, 𝑦 ∈ 𝐂ⁿ and 𝑎, 𝑏 ∈ 𝐂, 𝑇(𝑎𝑥 + 𝑏𝑦) = 𝑎𝑇(𝑥) + 𝑏𝑇(𝑦)).

For example, if 𝐴 is an 𝑛 × 𝑛 matrix with entries in 𝐂, then 𝑇(𝑥) = 𝐴𝑥 defines a linear operator on 𝐂ⁿ; indeed, for the rest of this section, we assume that 𝑇(𝑥) = 𝐴𝑥. We may also therefore restate the problem of solving 𝐴𝑥 = 𝑏 as solving 𝑇(𝑥) = 𝑏. In these terms, the key to diagonalization is the following circle of ideas.

Definition 9.3.2. Let 𝑇 be a linear operator on 𝐂ⁿ. To say that 𝑣 ∈ 𝐂ⁿ is an eigenvector of 𝑇 means that 𝑣 ≠ 𝟎 and 𝑇(𝑣) = 𝜆𝑣 for some 𝜆 ∈ 𝐂. To say that 𝜆 ∈ 𝐂 is an eigenvalue of 𝑇 means that 𝑇(𝑣) = 𝜆𝑣 for some eigenvector 𝑣 of 𝑇. Jointly, if 𝑣 ≠ 𝟎 and 𝑇(𝑣) = 𝜆𝑣, we say that 𝑣 is an eigenvector of 𝑇 with eigenvalue 𝜆.
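The solution recipe described above (solve 𝐷𝑦 = 𝑃⁻¹𝑏 coordinate-by-coordinate, then set 𝑥 = 𝑃𝑦) can be sketched in a few lines of Python. The matrix below and its eigenvectors, found by hand, are our illustrative choices:

```python
# Solving A x = b by diagonalization: A = [[2, 1], [1, 2]] has eigenvectors
# (1, 1) and (1, -1) with eigenvalues 3 and 1; P has the eigenvectors as
# columns, so D = P^{-1} A P = diag(3, 1).

A = [[2.0, 1.0], [1.0, 2.0]]
P = [[1.0, 1.0], [1.0, -1.0]]          # columns form an eigenbasis
P_inv = [[0.5, 0.5], [0.5, -0.5]]      # inverse of P, computed by hand
eigenvalues = [3.0, 1.0]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

b = [5.0, 1.0]
c = matvec(P_inv, b)                             # c = P^{-1} b
y = [c[i] / eigenvalues[i] for i in range(2)]    # D y = c: n independent equations
x = matvec(P, y)                                 # x = P y solves A x = b

print(x, matvec(A, x))   # matvec(A, x) reproduces b
```

Here the diagonal step is the whole point: once in the eigenbasis, the 2 × 2 system splits into two one-variable equations, exactly as described above.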

The following term is nonstandard, but we introduce it here because we will find it to be quite useful.

Definition 9.3.3. Let 𝑇 be a linear operator on 𝐂ⁿ. An eigenbasis for 𝑇 is a basis {𝑢₁, …, 𝑢ₙ} for 𝐂ⁿ such that each 𝑢ᵢ is an eigenvector of 𝑇.

Suppose 𝑃 is an 𝑛 × 𝑛 matrix whose columns are {𝑢₁, …, 𝑢ₙ}. We recall from linear algebra that {𝑢₁, …, 𝑢ₙ} is a basis for 𝐂ⁿ if and only if 𝑃 is invertible. The following is then a necessary and sufficient condition for diagonalization.

Theorem 9.3.4. Let 𝐴 be an 𝑛 × 𝑛 matrix, and let 𝑇(𝑥) = 𝐴𝑥. If 𝑃 is an invertible 𝑛 × 𝑛 matrix with columns {𝑢₁, …, 𝑢ₙ}, then the following are equivalent:

(1) 𝑃⁻¹𝐴𝑃 is diagonal.
(2) {𝑢₁, …, 𝑢ₙ} is an eigenbasis for 𝑇.

Furthermore, if those conditions hold, then 𝐷 = 𝑃⁻¹𝐴𝑃 = diag(𝜆₁, …, 𝜆ₙ), where 𝜆ᵢ is the eigenvalue of 𝑢ᵢ.

Proof. Problem 9.3.2.

If 𝐴 (or equivalently, 𝑇) can be diagonalized, we obtain a number of consequences. For example:

• The eigenvalues {𝜆₁, …, 𝜆ₙ} are the only eigenvalues of 𝑇 (Problem 9.3.4).
• 𝑇 is invertible if and only if all of the 𝜆ᵢ ≠ 0 (Problem 9.3.5).

In the above light, the reader can think of the material that we will consider in Chapter 10 as analogous to the finite-dimensional diagonalization theory we just described. More specifically, our basic strategy in studying operators is:

(1) We look at a linear operator 𝑇 on a function space. Often 𝑇 is defined in terms of derivatives, like 𝑇(𝑓) = −𝑓″. (The precise definition of operator is more subtle than one might think; see Section 10.1.)

(2) We find an orthogonal eigenbasis for 𝑇 that we use to define (generalized) Fourier series. For 𝑇(𝑓) = −𝑓″, one such eigenbasis is {𝑒ₙ}, where 𝑒ₙ(𝑥) = 𝑒^{2𝜋𝑖𝑛𝑥}. (See Section 10.4.)

(3) As a consequence, we see (Theorem 10.4.3) that

  𝑇 (∑_{𝑛=1}^∞ 𝑐ₙ𝑒ₙ) = ∑_{𝑛=1}^∞ 𝜆ₙ𝑐ₙ𝑒ₙ,    (9.3.1)

where 𝜆ₙ is the eigenvalue of 𝑒ₙ. Note that in the finite-dimensional case, (9.3.1) follows from the linearity of 𝑇, but in the infinite-dimensional case, (9.3.1) is far from obvious.

Looking ahead even further, the main idea of Chapter 11 can then be described as applying diagonalization to solve problems like those from Sections 9.1 and 9.2.

Problems.

9.3.1. Suppose 𝐴, 𝐷, and 𝑃 are 𝑛 × 𝑛 matrices, 𝑃 is invertible, and 𝑃⁻¹𝐴𝑃 = 𝐷. Prove that for column vectors 𝑦, 𝑏 ∈ 𝐂ⁿ, 𝐷𝑦 = 𝑃⁻¹𝑏 if and only if 𝐴(𝑃𝑦) = 𝑏.

9.3.2. (Proves Theorem 9.3.4) Let 𝐴 be an 𝑛 × 𝑛 matrix, let 𝑇(𝑥) = 𝐴𝑥, and let 𝑃 be an invertible 𝑛 × 𝑛 matrix with columns {𝑢₁, …, 𝑢ₙ}. Let 𝐷 = 𝑃⁻¹𝐴𝑃.

(a) Prove that 𝐷 is diagonal if and only if {𝑢₁, …, 𝑢ₙ} is an eigenbasis for 𝑇.
(b) Prove that if 𝐷 is diagonal, then 𝐷 = diag(𝜆₁, …, 𝜆ₙ), where 𝜆ᵢ is the eigenvalue of 𝑢ᵢ.

9.3.3. Let 𝐴 = [0 0; 1 0] (the 2 × 2 matrix whose only nonzero entry is 𝑎₂₁ = 1). Prove that if 𝑃 is an invertible 2 × 2 matrix, then 𝑃⁻¹𝐴𝑃 is not a diagonal matrix.

9.3.4. Suppose 𝑇 is a linear operator on 𝐂ⁿ and {𝑢₁, …, 𝑢ₙ} is an eigenbasis for 𝑇 with corresponding eigenvalues {𝜆₁, …, 𝜆ₙ}. Prove that if 𝜆 ∈ 𝐂 is not equal to any of the 𝜆ᵢ and 𝑇(𝑣) = 𝜆𝑣, then 𝑣 = 0.

9.3.5. Suppose 𝑇 is a linear operator on 𝐂ⁿ and ℬ = {𝑒₁, …, 𝑒ₙ} is an eigenbasis for 𝑇 with corresponding eigenvalues {𝜆₁, …, 𝜆ₙ}.

(a) Prove that if 𝜆ᵢ = 0 for some 𝑖, then 𝑇 is not invertible.
(b) Prove that if 𝜆ᵢ ≠ 0 for all 𝑖, then 𝑇 is invertible.

10 Operators on Hilbert spaces

Operator, this is an emergency / Operator, baby, burning up on me / Operator, this is an emergency / Operator, operator
— Midnight Star, "Operator"

In mathematics you don't understand things. You just get used to them.
— John von Neumann

In this chapter, we develop the Hilbert space analogue of the finite-dimensional theory of operator diagonalization described in Section 9.3. Beginning with the (somewhat subtle) definition of operator (Section 10.1), we define the geometric properties of being Hermitian and positive (Section 10.2) and extend the definitions of eigenvectors and eigenvalues to the Hilbert space setting (Section 10.3). We then conclude the chapter by extending the finite-dimensional theory of diagonalization to Hilbert spaces (Section 10.4).

10.1 Operators on Hilbert spaces

The first step in generalizing Section 9.3 is to define what a linear operator is. One surprise is that our very first definition is, by necessity, trickier than Definition 9.3.1.

Definition 10.1.1. Let ℋ be a Hilbert space, or more generally, a function space. We define a linear operator, or simply operator, in ℋ to be a linear map 𝑇 ∶ 𝔇(𝑇) → ℋ such that 𝔇(𝑇) is a subspace of ℋ. In other words, we require the domain 𝔇(𝑇) of 𝑇 to contain the zero function and be closed under addition and scalar multiplication in ℋ (Definition 5.2.1); and for all 𝑓, 𝑔 ∈ 𝔇(𝑇) and 𝑐 ∈ 𝐂, we require

  𝑇(𝑐𝑓) = 𝑐𝑇(𝑓),  𝑇(𝑓 + 𝑔) = 𝑇(𝑓) + 𝑇(𝑔).    (10.1.1)

(Compare Definition B.11.)


As we shall see momentarily, in contrast with the finite-dimensional definition (Definition 9.3.1), we do not require 𝑇 to be defined on all of ℋ in Definition 10.1.1 because some of our most important examples cannot be extended to all of ℋ in a natural manner. We reinforce this distinction by a careful use of prepositions: An operator on ℋ is an operator whose domain is all of ℋ, whereas in the general case, where the domain may be strictly smaller than ℋ, we refer to an operator in ℋ.

Example 10.1.2. Let 𝑋 = 𝑆¹, [𝑎, 𝑏], or 𝐑, and let ℋ = 𝐿²(𝑋). For 𝜆 ∈ 𝐂, the map 𝜆𝐼 ∶ ℋ → ℋ defined by 𝜆𝐼(𝑓) = 𝜆𝑓 is a linear operator on ℋ with 𝔇(𝜆𝐼) = ℋ. (The notation 𝜆𝐼 is meant to suggest a generalization of 𝜆 times the identity matrix.)

Example 10.1.3. Let 𝑋 = 𝑆¹ or [𝑎, 𝑏], and let ℋ = 𝐿²(𝑋). The map 𝐷 ∶ 𝐶¹(𝑋) → ℋ defined by 𝐷(𝑓) = −𝑖𝑓′ is a linear operator in ℋ with 𝔇(𝐷) = 𝐶¹(𝑋). (Note the mysterious scalar factor −𝑖, which we will explain later; see Remarks 10.2.3 and 10.3.3.)

Example 10.1.4. Similarly, for ℋ = 𝐿²(𝐑), the map 𝐷 ∶ 𝐶¹_𝑐(𝐑) → ℋ defined by 𝐷(𝑓) = −𝑖𝑓′ is a linear operator in ℋ, with 𝔇(𝐷) = 𝐶¹_𝑐(𝐑), the space of continuously differentiable functions with compact support. (See Subsection 8.5.2.) Another useful domain for 𝐷 is the Schwartz space 𝒮(𝐑) of smooth functions that "decay rapidly at infinity" (Section 4.7); note that 𝐷 actually maps 𝒮(𝐑) into itself.

Note that for 𝑓 ∈ 𝐶¹(𝑆¹), since 𝑓′ ∈ 𝐶⁰(𝑆¹), by Theorem 6.4.1 and the Fundamental Theorem of Fourier Series (Theorem 8.1.1), we see that

  𝐷(𝑓) = ∑_{𝑛∈𝐙} 2𝜋𝑛 𝑓̂(𝑛) 𝑒ₙ(𝑥),    (10.1.2)

with convergence in 𝐿²(𝑆¹), though not necessarily pointwise (see Remark 8.5.20). In other words, 𝐷 essentially multiplies the 𝑛th Fourier coefficient of 𝑓 by 2𝜋𝑛.

Example 10.1.5. If ℋ = 𝐿²([𝑎, 𝑏]), then the map 𝑀ₓ ∶ ℋ → ℋ defined by 𝑀ₓ(𝑓) = 𝑥𝑓(𝑥) is a linear operator with domain 𝔇(𝑀ₓ) = ℋ. More generally, for 𝑋 = [𝑎, 𝑏] or 𝑆¹, ℋ = 𝐿²(𝑋), and a piecewise continuous function 𝑔 ∶ 𝑋 → 𝐂, the map 𝑀_𝑔 ∶ ℋ → ℋ defined by 𝑀_𝑔(𝑓) = 𝑔(𝑥)𝑓(𝑥) is a linear operator with domain 𝔇(𝑀_𝑔) = ℋ. (This domain works because 𝑔(𝑥) is bounded, which means that 𝑔(𝑥)𝑓(𝑥) is bounded by a scalar multiple of |𝑓(𝑥)| and, therefore, is in 𝐿²(𝑋); in fact, the same holds if we only require 𝑔(𝑥) to be bounded and measurable.)

Example 10.1.6. Let ℋ = 𝐿²(𝐑). The map 𝑀ₓ ∶ 𝐶⁰_𝑐(𝐑) → ℋ defined by 𝑀ₓ(𝑓) = 𝑥𝑓(𝑥), or more generally, for a piecewise continuous function 𝑔 ∶ 𝐑 → 𝐂, the map 𝑀_𝑔 ∶ 𝐶⁰_𝑐(𝐑) → ℋ defined by 𝑀_𝑔(𝑓) = 𝑔(𝑥)𝑓(𝑥), is a linear operator with domain 𝔇(𝑀_𝑔) = 𝐶⁰_𝑐(𝐑), the space of all continuous functions with compact support (see Definition 7.5.12), by reasoning similar to that of Example 10.1.5.

We next consider some more abstract examples.

Example 10.1.7. Let ℋ be a Hilbert space with orthonormal basis ℬ = {𝑒ₙ ∣ 𝑛 ∈ 𝐍}, and let

  ℋ₀ = {∑_{𝑛=1}^∞ 𝑐ₙ𝑒ₙ ∈ ℋ ∣ all but finitely many 𝑐ₙ = 0}.    (10.1.3)


Then we may define linear operators 𝜇 and 𝜄 in ℋ by the formulas

  𝜇 (∑_{𝑛=1}^∞ 𝑐ₙ𝑒ₙ) = ∑_{𝑛=1}^∞ 𝑛𝑐ₙ𝑒ₙ,    (10.1.4)
  𝜄 (∑_{𝑛=1}^∞ 𝑐ₙ𝑒ₙ) = ∑_{𝑛=1}^∞ (𝑐ₙ/𝑛)𝑒ₙ,    (10.1.5)

where we take the domains of 𝜇 and 𝜄 to be 𝔇(𝜇) = ℋ₀ and 𝔇(𝜄) = ℋ. Of course, we need to prove that (10.1.4) and (10.1.5) actually produce convergent elements of ℋ on the corresponding domains. For 𝜄, this is Problem 10.1.2, and for 𝜇, this is a special case of the following theorem:

Theorem 10.1.8. Let ℋ be a Hilbert space with orthonormal basis ℬ = {𝑒ₙ ∣ 𝑛 ∈ 𝐍}, let ℋ₀ be as defined in (10.1.3), and let 𝑎(𝑛) be any function 𝑎 ∶ 𝐍 → 𝐂. Then

  𝛼 (∑_{𝑛=1}^∞ 𝑐ₙ𝑒ₙ) = ∑_{𝑛=1}^∞ 𝑎(𝑛)𝑐ₙ𝑒ₙ    (10.1.6)

defines a linear operator in ℋ with domain ℋ₀.

Proof. Problem 10.1.3.

We call an operator of the form described in Theorem 10.1.8 (possibly with a domain larger than ℋ₀) a diagonal operator with respect to the basis {𝑒ₙ}, as 𝛼 is an infinite-dimensional version of multiplication by a diagonal matrix. Equivalently, we say that the basis {𝑒ₙ} diagonalizes the operator 𝛼. For example, (10.1.2) shows that the operator 𝐷 is a diagonal operator with respect to the usual basis {𝑒ₙ ∣ 𝑛 ∈ 𝐙} for ℋ = 𝐿²(𝑆¹); equivalently, {𝑒ₙ ∣ 𝑛 ∈ 𝐙} diagonalizes 𝐷. (See Section 10.4 for much more on diagonalization.)

Example 10.1.9. Let ℋ be a Hilbert space with orthonormal basis ℬ = {𝑒ₙ ∣ 𝑛 ∈ 𝐍}. We define the shift operator 𝜎 ∶ ℋ → ℋ by

  𝜎 (∑_{𝑛=1}^∞ 𝑐ₙ𝑒ₙ) = ∑_{𝑛=1}^∞ 𝑐ₙ𝑒ₙ₊₁.    (10.1.7)

In other words, 𝜎 is the "shift" of each basis vector 𝑒ₙ to 𝑒ₙ₊₁, extended by (series) linearity.

One surprise in the basic definitions of operator theory is that, in contrast to the usual situation of calculus, where continuous functions form the principal class of examples, some of our most important examples of operators are not continuous. To be specific, if 𝑇 is an operator in the Hilbert space ℋ, then Definition 7.2.12 defines what it means for 𝑇 to be continuous, and we also have the following idea.

Definition 10.1.10. Let 𝑇 be an operator on the Hilbert space ℋ. To say that 𝑇 is bounded means that there exists some 𝑀 > 0 such that for all 𝑓 ∈ ℋ, we have ‖𝑇(𝑓)‖ ≤ 𝑀‖𝑓‖. Note that a bounded operator 𝑇 is not bounded in the sense of a bounded function on 𝐑; rather, 𝑇 is relatively bounded, in that elements of ℋ are magnified by a factor of at most 𝑀.
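The difference between 𝜇, 𝜄, and 𝜎 is easy to see on truncated coefficient sequences. The following Python sketch (ours; it works in a finite-dimensional stand-in for ℋ₀, with the ℓ² norm on the coefficients) shows 𝜇 magnifying 𝑒ₙ by a factor of 𝑛, while 𝜄 and 𝜎 never increase the norm:

```python
# Finite-dimensional sketch of the operators mu, iota (Example 10.1.7) and
# sigma (Example 10.1.9), acting on truncated coefficient lists (c_1, ..., c_N).
import math

def norm(c):
    return math.sqrt(sum(abs(x) ** 2 for x in c))

def mu(c):     # multiplies c_n by n: ||mu(e_n)|| / ||e_n|| = n, so mu is unbounded
    return [n * x for n, x in enumerate(c, start=1)]

def iota(c):   # divides c_n by n: bounded, with constant M = 1
    return [x / n for n, x in enumerate(c, start=1)]

def sigma(c):  # shift of e_n to e_{n+1}: bounded, in fact norm-preserving
    return [0.0] + c

e_5 = [0.0] * 4 + [1.0] + [0.0] * 5              # the basis vector e_5 in a 10-term truncation
print(norm(mu(e_5)))                             # 5.0: the magnification factor n
c = [1.0 / n for n in range(1, 11)]
print(norm(iota(c)) <= norm(c), norm(sigma(c)) == norm(c))
```

Since ‖𝜇(𝑒ₙ)‖/‖𝑒ₙ‖ = 𝑛 grows without bound, no single constant 𝑀 can work for 𝜇, matching the fact that 𝔇(𝜇) = ℋ₀ rather than all of ℋ.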


For example, the operator 𝜄 of Example 10.1.7 is bounded (Problem 10.1.4), but the differentiation operator 𝐷 of Example 10.1.3 is not bounded (Problem 10.1.5). The reader new to operators may be surprised to discover that the ideas of continuity and boundedness turn out to be equivalent for operators in a Hilbert space. In fact, we will need the following more general result.

Theorem 10.1.11. Let ℋ and ℋ′ be Hilbert spaces, let 𝔇(𝑇) be a subspace of ℋ, and suppose that 𝑇 ∶ 𝔇(𝑇) → ℋ′ is linear (i.e., 𝑇 satisfies (10.1.1)). Then the following conditions are equivalent:

(UC): 𝑇 is uniformly continuous over all of 𝔇(𝑇), or in other words, for any 𝜖 > 0, there exists some 𝛿(𝜖) > 0 such that if 𝑓, 𝑔 ∈ 𝔇(𝑇) and ‖𝑓 − 𝑔‖ < 𝛿(𝜖), then ‖𝑇(𝑓) − 𝑇(𝑔)‖ < 𝜖.

(C0): 𝑇 is continuous at 0 ∈ 𝔇(𝑇).

(B): 𝑇 is bounded; i.e., there exists some 𝑀 > 0 such that for all 𝑓 ∈ 𝔇(𝑇), we have ‖𝑇(𝑓)‖ ≤ 𝑀‖𝑓‖.

The reader should note that in the above statements, norms like ‖𝑓‖ are calculated in ℋ and norms like ‖𝑇(𝑓)‖ are calculated in ℋ′.

Proof. Problem 10.1.6.

In any case, since the differentiation operator 𝐷 and other differential operators are our most important examples of an operator in a Hilbert space and since 𝐷 and similar operators are not continuous, we must be very careful not to assume that operators are continuous. Put another way, we cannot assume that all operators commute with lim_{𝑛→∞}.

Remark 10.1.12. As an aside, another application of Theorem 10.1.11 is a proof of what is known as the Riesz Representation Theorem; see Problem 10.1.8.

Finally, we describe how to take linear combinations and products of operators. Again, the key point is to be careful with definitions, especially about domains.

Definition 10.1.13. Suppose 𝑆 and 𝑇 are operators in a Hilbert space ℋ. For 𝑎, 𝑏 ∈ 𝐂, we define a function (𝑎𝑆 + 𝑏𝑇) with domain 𝔇(𝑆) ∩ 𝔇(𝑇) by

  (𝑎𝑆 + 𝑏𝑇)(𝑓) = 𝑎𝑆(𝑓) + 𝑏𝑇(𝑓).    (10.1.8)

If we also have that 𝑇(𝔇(𝑇)) ⊆ 𝔇(𝑆), then we define a function 𝑆𝑇 with domain 𝔇(𝑇) by (𝑆𝑇)(𝑓) = 𝑆(𝑇(𝑓)). (10.1.9) In other words, we write composition as a product 𝑆𝑇. Of course, we need to verify that 𝑎𝑆 + 𝑏𝑇 and 𝑆𝑇 are indeed operators: Theorem 10.1.14. Let ℋ be a Hilbert space, and let 𝑆 and 𝑇 be operators in ℋ. (1) If 𝑎, 𝑏 ∈ 𝐂, then 𝑎𝑆 + 𝑏𝑇 is an operator in ℋ with domain 𝔇(𝑆) ∩ 𝔇(𝑇). (2) If 𝑇(𝔇(𝑇)) ⊆ 𝔇(𝑆), then the composition 𝑆𝑇 is an operator in ℋ with domain 𝔇(𝑇). Proof. Problem 10.1.7.


Problems.

10.1.1. (a) Prove that the operator 𝐷(𝑓) = −𝑖𝑓′ (Example 10.1.3) is linear.
(b) Prove that the operator 𝑀ₓ(𝑓) = 𝑥𝑓(𝑥) (Example 10.1.5) is linear.

10.1.2. Let ℋ be a Hilbert space with orthonormal basis ℬ = {𝑒ₙ ∣ 𝑛 ∈ 𝐍}. Recall that if 𝑓 ∈ ℋ, then 𝑓̂(𝑛) = ⟨𝑓, 𝑒ₙ⟩ is the 𝑛th generalized Fourier coefficient.

(a) Prove that if 𝑓 ∈ ℋ, then ∑_{𝑛=1}^∞ (𝑓̂(𝑛)/𝑛) 𝑒ₙ converges in ℋ (under the 𝐿² metric).
(b) Prove that

  𝜄 (∑_{𝑛=1}^∞ 𝑓̂(𝑛)𝑒ₙ) = ∑_{𝑛=1}^∞ (𝑓̂(𝑛)/𝑛) 𝑒ₙ    (10.1.10)

is an operator on ℋ that is well-defined on all of ℋ.

10.1.3. (Proves Theorem 10.1.8) Let ℋ be a Hilbert space with orthonormal basis ℬ = {𝑒ₙ ∣ 𝑛 ∈ 𝐍}, let

  ℋ₀ = {∑_{𝑛=1}^∞ 𝑐ₙ𝑒ₙ ∈ ℋ ∣ all but finitely many 𝑐ₙ = 0},    (10.1.11)

and let 𝑎 ∶ 𝐍 → 𝐂 be a function. Prove that the formula

  𝛼 (∑_{𝑛=1}^∞ 𝑐ₙ𝑒ₙ) = ∑_{𝑛=1}^∞ 𝑎(𝑛)𝑐ₙ𝑒ₙ    (10.1.12)

gives a well-defined linear map with domain ℋ₀.

10.1.4. Continuing Problem 10.1.2, prove that 𝜄 is a bounded operator.

10.1.5. Prove that the operator 𝐷(𝑓) = −𝑖𝑓′, with domain 𝔇(𝐷) = 𝐶¹(𝑆¹), is unbounded.

10.1.6. (Proves Theorem 10.1.11) Let ℋ and ℋ′ be Hilbert spaces, let 𝔇(𝑇) be a subspace of ℋ, and suppose that 𝑇 ∶ 𝔇(𝑇) → ℋ′ is linear (i.e., 𝑇 satisfies (10.1.1)). Consider the following properties:

(UC): 𝑇 is uniformly continuous over all of 𝔇(𝑇), or in other words, for any 𝜖 > 0, there exists some 𝛿(𝜖) > 0 such that if ‖𝑓 − 𝑔‖ < 𝛿(𝜖), then ‖𝑇(𝑓) − 𝑇(𝑔)‖ < 𝜖.

(C0): 𝑇 is continuous at 0 ∈ 𝔇(𝑇).

(B): 𝑇 is bounded; i.e., there exists some 𝑀 > 0 such that for all 𝑓 ∈ 𝔇(𝑇), we have ‖𝑇(𝑓)‖ ≤ 𝑀‖𝑓‖.

Since (UC) implies (C0) a fortiori, the following completes the proof of Theorem 10.1.11.

(a) Prove that (C0) implies (B).
(b) Prove that (B) implies (UC).

10.1.7. (Proves Theorem 10.1.14) Let ℋ be a Hilbert space, let 𝑆 and 𝑇 be operators in ℋ and 𝑎, 𝑏 ∈ 𝐂, and define 𝑎𝑆 + 𝑏𝑇 and 𝑆𝑇 as in Definition 10.1.13.

(a) Prove that 𝑎𝑆 + 𝑏𝑇 is an operator in ℋ with domain 𝔇(𝑆) ∩ 𝔇(𝑇).

(b) Prove that if 𝑇(𝔇(𝑇)) ⊆ 𝔇(𝑆), then 𝑆𝑇 is an operator in ℋ with domain 𝔇(𝑇).

10.1.8. (*) Let ℋ be a Hilbert space with an orthonormal basis {𝑒ₙ ∣ 𝑛 ∈ 𝐍}, and let 𝜆 ∶ ℋ → 𝐂 be a bounded linear map. Here we think of 𝐂 as a Hilbert space with ‖𝑧‖ = |𝑧| for all 𝑧 ∈ 𝐂, and 𝜆 is bounded in the sense that there exists some 𝑀 > 0 such that for all 𝑓 ∈ ℋ, we have ‖𝜆(𝑓)‖ = |𝜆(𝑓)| ≤ 𝑀‖𝑓‖. The goal of this problem is to prove the Riesz Representation Theorem, which says that there must exist some fixed 𝑔 ∈ ℋ such that 𝜆(𝑓) = ⟨𝑓, 𝑔⟩ for all 𝑓 ∈ ℋ.

(a) Suppose we have 𝑔 ∈ ℋ such that 𝜆(𝑓) = ⟨𝑓, 𝑔⟩. Find a formula for 𝑔̂(𝑛) in terms of 𝜆.
(b) Now suppose that 𝑔̂(𝑛) is described by the formula found in part (a). Prove that for any 𝑁 ∈ 𝐍, we have

  √(∑_{𝑛=1}^𝑁 |𝑔̂(𝑛)|²) ≤ 𝑀.    (10.1.13)

(c) Prove that there exists some 𝑔 ∈ ℋ such that 𝜆(𝑓) = ⟨𝑓, 𝑔⟩ for all 𝑓 ∈ ℋ.
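In finite dimensions, the Riesz recipe of Problem 10.1.8(a) can be tested directly. The following Python sketch (ours; the functional below is an arbitrary choice) recovers the representing vector 𝑔 from the values 𝜆(𝑒ₙ) in 𝐂⁴ with the inner product ⟨𝑓, 𝑔⟩ = ∑ 𝑓ₙ g̅ₙ:

```python
# Finite-dimensional sketch of the Riesz Representation Theorem: a linear
# functional lam on C^4 is represented by g with g_n = conj(lam(e_n)),
# so that lam(f) = <f, g> for every f.

N = 4
w = [2 + 1j, -1j, 0.5, 3.0]                       # defines lam(f) = sum w_n f_n

def lam(f):
    return sum(w[n] * f[n] for n in range(N))

def inner(f, g):                                  # <f, g> = sum f_n * conj(g_n)
    return sum(f[n] * g[n].conjugate() for n in range(N))

basis = [[1.0 if k == n else 0.0 for k in range(N)] for n in range(N)]
g = [complex(lam(e)).conjugate() for e in basis]  # g_n = conj(lam(e_n)), as in part (a)

f = [1.0, 2j, -1.0, 0.5 + 0.5j]
print(abs(lam(f) - inner(f, g)))                  # 0: lam is represented by g
```

The infinite-dimensional content of the problem is precisely that the bound (10.1.13) lets this coordinate-by-coordinate recipe converge to an honest element 𝑔 ∈ ℋ.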

10.2 Hermitian and positive operators

In this section, we examine two useful geometric properties that many of our favorite examples possess: namely, the properties of being Hermitian and positive. We begin with the first property.

Definition 10.2.1. Let 𝑇 be a linear operator in a Hilbert space ℋ with domain 𝔇(𝑇). To say that 𝑇 is Hermitian means that for all 𝑓, 𝑔 ∈ 𝔇(𝑇), we have that ⟨𝑇(𝑓), 𝑔⟩ = ⟨𝑓, 𝑇(𝑔)⟩.

As it turns out, most, but not all, of the examples discussed in Section 10.1 are Hermitian.

Example 10.2.2. Let 𝐷 be the operator 𝐷(𝑓) = −𝑖𝑓′ on ℋ = 𝐿²(𝑆¹) defined in Example 10.1.3, where 𝔇(𝐷) = 𝐶¹(𝑆¹). For 𝑓, 𝑔 ∈ 𝔇(𝐷), we have

  ⟨𝐷(𝑓), 𝑔⟩ = ∫₀¹ (−𝑖)𝑓′(𝑥)\overline{𝑔(𝑥)} 𝑑𝑥
            = (−𝑖)𝑓(𝑥)\overline{𝑔(𝑥)} |₀¹ − ∫₀¹ (−𝑖)𝑓(𝑥)\overline{𝑔′(𝑥)} 𝑑𝑥    (*)
            = (−𝑖)(𝑓(1)\overline{𝑔(1)} − 𝑓(0)\overline{𝑔(0)}) + ∫₀¹ 𝑖𝑓(𝑥)\overline{𝑔′(𝑥)} 𝑑𝑥    (**)
            = ∫₀¹ 𝑓(𝑥)\overline{(−𝑖)𝑔′(𝑥)} 𝑑𝑥
            = ⟨𝑓, 𝐷(𝑔)⟩,    (10.2.1)

where step (*) follows by integration by parts and the 𝑓(𝑥)\overline{𝑔(𝑥)} term from step (**) cancels because 𝑓(0) = 𝑓(1) and 𝑔(0) = 𝑔(1). It follows that 𝐷 is Hermitian.
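The identity (10.2.1) is easy to check numerically for particular periodic functions. The following Python sketch (ours; the test functions are arbitrary trigonometric polynomials, and the integrals are approximated by Riemann sums) exhibits ⟨𝐷(𝑓), 𝑔⟩ = ⟨𝑓, 𝐷(𝑔)⟩:

```python
# Numerical check of Example 10.2.2: for smooth periodic f and g,
# <D(f), g> = <f, D(g)>, where D(f) = -i f' and <f, g> = int_0^1 f conj(g).
import cmath, math

N = 1000
def inner(F, G):   # Riemann sum for the inner product int_0^1 F(x) conj(G(x)) dx
    return sum(F(k / N) * G(k / N).conjugate() for k in range(N)) / N

f  = lambda x: cmath.exp(2j * math.pi * x) + 2.0        # periodic test functions
fp = lambda x: 2j * math.pi * cmath.exp(2j * math.pi * x)
g  = lambda x: cmath.exp(-4j * math.pi * x)
gp = lambda x: -4j * math.pi * cmath.exp(-4j * math.pi * x)

lhs = inner(lambda x: -1j * fp(x), g)   # <D(f), g>
rhs = inner(f, lambda x: -1j * gp(x))   # <f, D(g)>
print(abs(lhs - rhs))                   # ≈ 0: D is Hermitian on C^1(S^1)
```

Note that the conjugate on the second slot of the inner product is exactly what makes the two factors of −𝑖 in (10.2.1) cooperate; without it, the boundary-free integration by parts would produce a sign mismatch.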


Remark 10.2.3. Note that the factor of −𝑖 in 𝐷(𝑓) = −𝑖𝑓′ makes the signs in step (**) of (10.2.1) work out correctly, though replacing −𝑖 with any imaginary number would work just as well. See Remark 10.3.3 for a justification of our particular choice.

Example 10.2.4. Let 𝐷 be the operator 𝐷(𝑓) = −𝑖𝑓′ on ℋ = 𝐿²(𝐑) defined in Example 10.1.4, where 𝔇(𝐷) is either 𝐶¹_𝑐(𝐑) or 𝒮(𝐑). For 𝑓, 𝑔 ∈ 𝔇(𝐷), we have

  ⟨𝐷(𝑓), 𝑔⟩ = ∫_{−∞}^{∞} (−𝑖)𝑓′(𝑥)\overline{𝑔(𝑥)} 𝑑𝑥
            = (−𝑖)𝑓(𝑥)\overline{𝑔(𝑥)} |_{−∞}^{∞} − ∫_{−∞}^{∞} (−𝑖)𝑓(𝑥)\overline{𝑔′(𝑥)} 𝑑𝑥    (*)
            = ∫_{−∞}^{∞} 𝑓(𝑥)\overline{(−𝑖)𝑔′(𝑥)} 𝑑𝑥
            = ⟨𝑓, 𝐷(𝑔)⟩,    (10.2.2)

where (*) is again integration by parts (in the version from Theorem 4.8.7) and the 𝑓(𝑥)\overline{𝑔(𝑥)} term from step (*) cancels because

  lim_{𝑥→±∞} 𝑓(𝑥) = 0 = lim_{𝑥→±∞} 𝑔(𝑥),    (10.2.3)

which holds either because 𝑓 and 𝑔 have compact support or because 𝑓, 𝑔 ∈ 𝒮(𝐑). Again, in either case, 𝐷 is Hermitian.

Example 10.2.5. Let 𝑋 = [𝑎, 𝑏], and consider the operator 𝑀ₓ(𝑓) = 𝑥𝑓(𝑥) in ℋ = 𝐿²(𝑋) defined in Example 10.1.5. Then 𝑀ₓ is Hermitian (Problem 10.2.1).

Example 10.2.6. Similarly, let 𝑀ₓ be the operator in 𝐿²(𝐑) with domain 𝔇(𝑀ₓ) = 𝒮(𝐑) (the Schwartz space) given by 𝑀ₓ(𝑓) = 𝑥𝑓(𝑥). Then 𝑀ₓ is well-defined and Hermitian (Problem 10.2.2).

Example 10.2.7. Let ℋ be a Hilbert space with orthonormal basis ℬ = {𝑒ₙ ∣ 𝑛 ∈ 𝐍}, and let 𝛼 be the diagonal operator defined by

  𝛼 (∑_{𝑛=1}^∞ 𝑐ₙ𝑒ₙ) = ∑_{𝑛=1}^∞ 𝑎(𝑛)𝑐ₙ𝑒ₙ,    (10.2.4)

where 𝑎(𝑛) is a real-valued function 𝑎 ∶ 𝐍 → 𝐑 and 𝔇(𝛼) = ℋ₀ as defined in (10.1.3) (Theorem 10.1.8). Then 𝛼 is Hermitian (Problem 10.2.3).

Example 10.2.8. Let ℋ be a Hilbert space with orthonormal basis ℬ = {𝑒ₙ ∣ 𝑛 ∈ 𝐍}. The shift operator 𝜎 (Example 10.1.9) on ℋ is not Hermitian (Problem 10.2.4).

To define what it means to be a positive operator, we begin with the following observation.

Theorem 10.2.9. Let 𝑇 be a Hermitian operator in a Hilbert space ℋ. Then for all 𝑓 ∈ 𝔇(𝑇), ⟨𝑇(𝑓), 𝑓⟩ is a real number.

Proof. Problem 10.2.5.

Because of Theorem 10.2.9, the inequality in the following definition makes sense.


Chapter 10. Operators on Hilbert spaces

Definition 10.2.10. Let 𝑇 be an operator in a Hilbert space ℋ with domain 𝔇(𝑇). To say that 𝑇 is positive means that 𝑇 is Hermitian and for all 𝑓 ∈ 𝔇(𝑇), we have that ⟨𝑇(𝑓), 𝑓⟩ ≥ 0.

As the following examples show, some of the most natural examples of positive operators come from −𝑑²/𝑑𝑥². (This also partly explains why the inequality in Definition 10.2.10 is not strict, as an operator 𝑇 defined by derivatives will often have 𝑇(𝑓) = 0.)

Example 10.2.11. Let ℋ = 𝐿2 (𝑆1 ), and let Δ(𝑓) = −𝑓″ be the operator in ℋ with domain 𝐶 2 (𝑆1 ). We have

    ⟨Δ(𝑓), 𝑔⟩ = −∫_0^1 𝑓″(𝑥)𝑔̄(𝑥) 𝑑𝑥
              = −𝑓′(𝑥)𝑔̄(𝑥) |_0^1 + ∫_0^1 𝑓′(𝑥)𝑔̄′(𝑥) 𝑑𝑥        (*)
              = −(𝑓′(1)𝑔̄(1) − 𝑓′(0)𝑔̄(0)) + ∫_0^1 𝑓′(𝑥)𝑔̄′(𝑥) 𝑑𝑥        (10.2.5)
              = ∫_0^1 𝑓′(𝑥)𝑔̄′(𝑥) 𝑑𝑥        (**)
              = ⟨𝑓′, 𝑔′⟩,

where step (*) follows by integration by parts and the 𝑓′(𝑥)𝑔̄(𝑥) term cancels in step (**) because 𝑓′(0) = 𝑓′(1) and 𝑔(0) = 𝑔(1). The same idea shows that ⟨𝑓, Δ(𝑔)⟩ = ⟨𝑓′, 𝑔′⟩, and therefore, Δ is Hermitian; moreover, (10.2.5) also shows that ⟨Δ(𝑓), 𝑓⟩ = ⟨𝑓′, 𝑓′⟩ ≥ 0, and therefore, Δ is positive. The operator Δ is called the Laplacian on 𝑆1 .

Example 10.2.12. Let ℋ = 𝐿2 (𝐑), and let Δ(𝑓) = −𝑓″ be the operator in ℋ with domain either 𝐶𝑐2 (𝐑), the space of twice continuously differentiable functions with compact support, or 𝒮(𝐑) (see Example 10.1.4). Again, Δ is positive (Problem 10.2.6).

Example 10.2.13. Let ℋ = 𝐿2 ([𝑎, 𝑏]), and let Δ(𝑓) = −𝑓″ be the operator in ℋ with one of the following domains:

    𝔇(Δ)Dir = {𝑓 ∈ 𝐶 2 ([𝑎, 𝑏]) ∣ 𝑓(𝑎) = 0 = 𝑓(𝑏)},        (10.2.6)
    𝔇(Δ)Neu = {𝑓 ∈ 𝐶 2 ([𝑎, 𝑏]) ∣ 𝑓′(𝑎) = 0 = 𝑓′(𝑏)}.        (10.2.7)

Note that (10.2.6) is the space of smooth functions satisfying the Dirichlet boundary conditions and (10.2.7) is the space of smooth functions satisfying the Neumann boundary conditions (see Question 9.1.3). The operator Δ, with either domain, is called the Laplacian on [𝑎, 𝑏], and Δ is once again positive (Problem 10.2.7).

One may also consider more abstract examples of positive operators.

Example 10.2.14. Let ℋ be a Hilbert space with orthonormal basis ℬ = {𝑒𝑛 ∣ 𝑛 ∈ 𝐍}, let 𝑎(𝑛) be real valued, and let 𝛼 be the diagonal operator defined by (10.2.4) with domain 𝔇(𝛼) = ℋ0 as defined in (10.1.3) (Theorem 10.1.8). Then 𝛼 is positive if and only if 𝑎(𝑛) ≥ 0 for all 𝑛 ∈ 𝐍 (Problem 10.2.8).
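A numerical sanity check of Example 10.2.11, under choices of ours (a random trigonometric polynomial and an equally spaced quadrature grid): for 𝑓 = ∑ 𝑐𝑛 𝑒𝑛 , the identity ⟨Δ(𝑓), 𝑓⟩ = ⟨𝑓′, 𝑓′⟩ becomes ∑ 4𝜋²𝑛²|𝑐𝑛 |² ≥ 0, and quadrature on 𝑆1 reproduces the coefficient formula.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 5, 4096                       # max frequency and number of quadrature points
ns = np.arange(-N, N + 1)
c = rng.normal(size=ns.size) + 1j * rng.normal(size=ns.size)  # random f = sum c_n e_n

x = np.arange(M) / M                 # equally spaced points on S^1 = [0, 1)
E = np.exp(2j * np.pi * np.outer(ns, x))     # row n is e_n(x) = e^{2 pi i n x}
fprime = (2j * np.pi * ns * c) @ E           # f', differentiated term by term

quad = np.sum(np.abs(fprime)**2) / M         # <f', f'> by quadrature
coef = np.sum(4 * np.pi**2 * ns**2 * np.abs(c)**2)  # sum of 4 pi^2 n^2 |c_n|^2
print(quad, coef)                            # both compute <Delta(f), f> >= 0
```

For trigonometric polynomials of degree at most 𝑁, the equally spaced quadrature is exact as long as 𝑀 > 4𝑁, which is why the two numbers agree to rounding error.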


Recall from Definition 10.1.13 that under suitable circumstances, for 𝑎, 𝑏 ∈ 𝐂, we may combine operators 𝑆 and 𝑇 in ℋ to form new operators 𝑎𝑆 + 𝑏𝑇 and 𝑆𝑇. As one might hope, suitable combinations of Hermitian operators are Hermitian, and suitable combinations of positive operators are positive.

Theorem 10.2.15. Let ℋ be a Hilbert space, and let 𝑆 and 𝑇 be Hermitian operators in ℋ.

(1) If 𝑎, 𝑏 ∈ 𝐑, then 𝑎𝑆 + 𝑏𝑇, with domain 𝔇(𝑆) ∩ 𝔇(𝑇), is a Hermitian operator in ℋ.

(2) If 𝔇(𝑆) = 𝔇(𝑇) = ℋ0 , 𝑆(ℋ0 ) ⊆ ℋ0 , 𝑇(ℋ0 ) ⊆ ℋ0 , and 𝑆𝑇 = 𝑇𝑆, then 𝑆𝑇 is a Hermitian operator in ℋ with domain ℋ0 .

Proof. Problem 10.2.9.

Theorem 10.2.16. Let ℋ be a Hilbert space, and let 𝑆 and 𝑇 be positive operators in ℋ. If 𝑎, 𝑏 ∈ 𝐑 and 𝑎, 𝑏 ≥ 0, then 𝑎𝑆 + 𝑏𝑇, with domain 𝔇(𝑆) ∩ 𝔇(𝑇), is a positive operator in ℋ.

Proof. Problem 10.2.10.

Remark 10.2.17. The question of when the product 𝑆𝑇 of positive operators 𝑆 and 𝑇 is positive is much more involved and can be analyzed using the so-called square root of a positive operator; see, for example, Reed and Simon [RS80, VI.4].

Problems.

10.2.1. Let 𝑋 = 𝑆1 or [𝑎, 𝑏], and let 𝑀𝑥 be the operator 𝑀𝑥 (𝑓) = 𝑥𝑓(𝑥) in ℋ = 𝐿2 (𝑋) defined in Example 10.1.5. Prove that 𝑀𝑥 is Hermitian.

10.2.2. Let 𝑀𝑥 be the operator 𝑀𝑥 (𝑓) = 𝑥𝑓(𝑥) in ℋ = 𝐿2 (𝐑) with domain 𝔇(𝑀𝑥 ) = 𝒮(𝐑) (the Schwartz space), as defined in Example 10.2.6.

(a) Prove that if 𝑓 ∈ 𝒮(𝐑), then 𝑥𝑓(𝑥) ∈ 𝒮(𝐑) ⊂ 𝐿2 (𝐑).

(b) Prove that 𝑀𝑥 is Hermitian.

10.2.3. Let ℋ be a Hilbert space with orthonormal basis ℬ = {𝑒𝑛 ∣ 𝑛 ∈ 𝐍}, and let

    𝛼( ∑_{𝑛=1}^{∞} 𝑐𝑛 𝑒𝑛 ) = ∑_{𝑛=1}^{∞} 𝑎(𝑛)𝑐𝑛 𝑒𝑛        (10.2.8)

be the operator from Theorem 10.1.8, where 𝑎(𝑛) is a complex-valued function on 𝐍. Prove that 𝛼 is Hermitian if and only if 𝑎(𝑛) ∈ 𝐑 for all 𝑛 ∈ 𝐍.

10.2.4. Let ℋ be a Hilbert space with orthonormal basis ℬ = {𝑒𝑛 ∣ 𝑛 ∈ 𝐍}. Prove that the shift operator

    𝜎( ∑_{𝑛=1}^{∞} 𝑐𝑛 𝑒𝑛 ) = ∑_{𝑛=1}^{∞} 𝑐𝑛 𝑒𝑛+1        (10.2.9)

on ℋ is not Hermitian.

10.2.5. (Proves Theorem 10.2.9) Prove that if 𝑇 is a Hermitian linear operator in a Hilbert space ℋ, then for all 𝑓 ∈ 𝔇(𝑇), ⟨𝑇(𝑓), 𝑓⟩ is a real number.

10.2.6. Let ℋ = 𝐿2 (𝐑), and let Δ(𝑓) = −𝑓″ be the operator in ℋ with domain either 𝐶𝑐2 (𝐑) or 𝒮(𝐑). Prove that Δ is Hermitian and positive.


10.2.7. Let ℋ = 𝐿2 ([𝑎, 𝑏]), and let Δ(𝑓) = −𝑓″.

(a) Prove that Δ, with the domain

    𝔇(Δ)Dir = {𝑓 ∈ 𝐶 2 ([𝑎, 𝑏]) ∣ 𝑓(𝑎) = 0 = 𝑓(𝑏)},        (10.2.10)

is Hermitian and positive.

(b) Prove that Δ, with the domain

    𝔇(Δ)Neu = {𝑓 ∈ 𝐶 2 ([𝑎, 𝑏]) ∣ 𝑓′(𝑎) = 0 = 𝑓′(𝑏)},        (10.2.11)

is Hermitian and positive.

10.2.8. Let ℋ be a Hilbert space with orthonormal basis ℬ = {𝑒𝑛 ∣ 𝑛 ∈ 𝐍}, and let 𝛼 be the diagonal operator from Problem 10.2.3, where we now assume that 𝑎(𝑛) is a real-valued function on 𝐍. Prove that 𝛼 is positive if and only if 𝑎(𝑛) ≥ 0 for all 𝑛 ∈ 𝐍.

10.2.9. (Proves Theorem 10.2.15) Let ℋ be a Hilbert space, and let 𝑆 and 𝑇 be Hermitian operators in ℋ.

(a) Suppose 𝑎, 𝑏 ∈ 𝐑. Prove that 𝑎𝑆 + 𝑏𝑇, with domain 𝔇(𝑆) ∩ 𝔇(𝑇), is a Hermitian operator in ℋ.

(b) Suppose 𝔇(𝑆) = 𝔇(𝑇), 𝑆(𝔇(𝑆)) ⊆ 𝔇(𝑆), 𝑇(𝔇(𝑇)) ⊆ 𝔇(𝑇), and 𝑆𝑇 = 𝑇𝑆. Prove that 𝑆𝑇, with domain 𝔇(𝑆) = 𝔇(𝑇), is a Hermitian operator in ℋ.

10.2.10. (Proves Theorem 10.2.16) Let ℋ be a Hilbert space, let 𝑆 and 𝑇 be positive operators in ℋ, and suppose 𝑎, 𝑏 ∈ 𝐑 and 𝑎, 𝑏 ≥ 0. Prove that 𝑎𝑆 + 𝑏𝑇, with domain 𝔇(𝑆) ∩ 𝔇(𝑇), is a positive operator in ℋ.

10.3 Eigenvectors and eigenvalues

Continuing our Hilbert space generalization of Section 9.3, we next generalize Definition 9.3.2 in a relatively straightforward manner.

Definition 10.3.1. Let 𝑇 be a linear operator in a Hilbert space ℋ. An eigenvector of 𝑇 is defined to be some 𝑓 ≠ 0 in 𝔇(𝑇) such that 𝑇(𝑓) = 𝜆𝑓 for some 𝜆 ∈ 𝐂. To say that 𝜆 ∈ 𝐂 is an eigenvalue of 𝑇 means that 𝑇(𝑓) = 𝜆𝑓 for some eigenvector 𝑓 of 𝑇. Jointly, if 𝑓 ≠ 0 in 𝔇(𝑇) and 𝑇(𝑓) = 𝜆𝑓, we say that 𝑓 is an eigenvector of 𝑇 with eigenvalue 𝜆. Since our Hilbert spaces are all function spaces, we also call eigenvectors of 𝑇 eigenfunctions of 𝑇.

As befits a new idea, we immediately apply Definition 10.3.1 to our favorite examples.

Example 10.3.2. Consider the operator 𝐷(𝑓) = −𝑖𝑓′ in 𝐿2 (𝑆1 ), with domain 𝔇(𝐷) = 𝐶 ∞ (𝑆1 ) (Example 10.1.3). Then for any 𝑛 ∈ 𝐙, 𝑒𝑛 = 𝑒^{2𝜋𝑖𝑛𝑥} is an eigenfunction of 𝐷 with eigenvalue 2𝜋𝑛, since 𝑒𝑛 ∈ 𝐶 ∞ (𝑆1 ) and

    (−𝑖) 𝑑/𝑑𝑥 (𝑒𝑛 ) = (−𝑖)(2𝜋𝑖𝑛)𝑒𝑛 = (2𝜋𝑛)𝑒𝑛 .        (10.3.1)

Remark 10.3.3. Note that the choice of the factor −𝑖 in 𝐷(𝑓) = −𝑖𝑓′, as opposed to some other imaginary factor, makes 2𝜋𝑛 the 𝑛th eigenvalue of 𝐷.
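The eigenfunction equation of Example 10.3.2 can be spot-checked numerically; the sketch below (with a frequency and a step size chosen by us, purely for illustration) approximates 𝑓′ by a central difference.

```python
import numpy as np

n, h = 3, 1e-6                     # frequency and difference step (our choices)
x = np.linspace(0.0, 1.0, 11)      # sample points on S^1 = [0, 1)

def e_n(t):
    return np.exp(2j * np.pi * n * t)   # e_n(t) = e^{2 pi i n t}

# D(e_n) = -i e_n', with the derivative approximated by a central difference
D_en = -1j * (e_n(x + h) - e_n(x - h)) / (2 * h)
print(np.max(np.abs(D_en - 2 * np.pi * n * e_n(x))))   # O(h^2), essentially zero
```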


Example 10.3.4. Consider the operator 𝑑/𝑑𝑥 (𝑓) = 𝑓′ in 𝐿2 (𝐑), with domain 𝔇(𝑑/𝑑𝑥) = 𝐶𝑐1 (𝐑) or 𝒮(𝐑) (Example 10.1.3). Then 𝑑/𝑑𝑥 has no eigenvectors or eigenvalues, because if 𝜆 ∈ 𝐂 and 𝑓 ∈ 𝐶 1 (𝐑) such that 𝑓′(𝑥) = 𝜆𝑓(𝑥) for all 𝑥 ∈ 𝐑, we must have 𝑓(𝑥) = 𝑎𝑒^{𝜆𝑥} for some 𝑎 ∈ 𝐂 (Theorem 4.6.3); and 𝑒^{𝜆𝑥} is not in 𝐿2 (𝐑), since

    |𝑒^{𝜆𝑥}|² = 𝑒^{𝜆𝑥}𝑒^{𝜆̄𝑥} = 𝑒^{𝑏𝑥},        (10.3.2)

where 𝑏 = 𝜆 + 𝜆̄ ∈ 𝐑 and ∫_{−∞}^{∞} 𝑒^{𝑏𝑥} 𝑑𝑥 diverges (even if 𝑏 = 0).

Example 10.3.5. In contrast, consider 𝐷(𝑓) = −𝑖𝑓′ in 𝐿2 ([𝑎, 𝑏]), with domain 𝔇(𝐷) = 𝐶 1 ([𝑎, 𝑏]). Then for any 𝜆 ∈ 𝐂, 𝑒^{𝑖𝜆𝑥} is an eigenvector of 𝐷 with eigenvalue 𝜆, since

    𝐷(𝑒^{𝑖𝜆𝑥}) = (−𝑖) 𝑑/𝑑𝑥 (𝑒^{𝑖𝜆𝑥}) = (−𝑖)(𝑖𝜆)𝑒^{𝑖𝜆𝑥} = 𝜆𝑒^{𝑖𝜆𝑥}.        (10.3.3)

Example 10.3.6. Consider the operator Δ(𝑓) = −𝑓″ in 𝐿2 ([0, 1/2]) (Example 10.2.13). If we use the domain 𝔇(Δ)Dir = {𝑓 ∈ 𝐶 2 ([0, 1/2]) ∣ 𝑓(0) = 0 = 𝑓(1/2)}, then for every 𝑛 ∈ 𝐍, 4𝜋²𝑛² is an eigenvalue of Δ (Problem 10.3.1). If we use the domain 𝔇(Δ)Neu = {𝑓 ∈ 𝐶 2 ([0, 1/2]) ∣ 𝑓′(0) = 0 = 𝑓′(1/2)}, then for every integer 𝑛 ≥ 0, 4𝜋²𝑛² is an eigenvalue of Δ (Problem 10.3.1).

Note also that even though 𝑒𝑛 is an eigenvector of Δ if we use the domain 𝐶 ∞ (𝑆1 ), 𝑒𝑛 is not an eigenvector of Δ using either the domain 𝔇(Δ)Dir or the domain 𝔇(Δ)Neu , as 𝑒𝑛 satisfies neither the Dirichlet nor the Neumann boundary conditions.

Next, consider the operator 𝑀𝑥 (𝑓) = 𝑥𝑓(𝑥) in the Hilbert space ℋ = 𝐿2 (𝐑) with domain 𝔇(𝑀𝑥 ) = 𝐶𝑐0 (𝐑), the space of all continuous functions with compact support, as described in Example 10.1.6. Even though 𝑀𝑥 is Hermitian (Example 10.2.5), we have that:

Theorem 10.3.7. The operator 𝑀𝑥 has no eigenvalues.

Proof. Suppose 𝜆 ∈ 𝐂 and 𝑀𝑥 (𝑓) = 𝑥𝑓 = 𝜆𝑓 for some 𝑓 ∈ 𝐶𝑐0 (𝐑). Now, a priori, the equation 𝑥𝑓(𝑥) = 𝜆𝑓(𝑥) in 𝐿2 (𝐑) only holds a.e. in 𝐑, but since both 𝑥𝑓(𝑥) and 𝜆𝑓(𝑥) are continuous functions on 𝐑, by Corollary 7.4.9, we have that 𝑥𝑓(𝑥) = 𝜆𝑓(𝑥) for all 𝑥 ∈ 𝐑, which means that (𝑥 − 𝜆)𝑓(𝑥) = 0 for all 𝑥 ∈ 𝐑. It follows that 𝑓(𝑥) = 0 unless 𝑥 = 𝜆, and since a single value of 𝑥 ∈ 𝐑 is a set of measure zero, we see that 𝑓(𝑥) = 0 in ℋ = 𝐿2 (𝐑).

Note that Example 10.3.4 and Theorem 10.3.7 are in marked contrast with the finite-dimensional theory, in which every operator has at least one eigenvalue. (See, for example, Messer [Mes97, Ch. 8].)

Example 10.3.8. Let ℋ be a Hilbert space with orthonormal basis ℬ = {𝑒𝑛 ∣ 𝑛 ∈ 𝐍}, let ℋ0 be as defined in (10.1.3), let 𝑎(𝑛) be any function 𝑎 ∶ 𝐍 → 𝐂, and let 𝛼 be the operator

    𝛼( ∑_{𝑛=1}^{∞} 𝑐𝑛 𝑒𝑛 ) = ∑_{𝑛=1}^{∞} 𝑎(𝑛)𝑐𝑛 𝑒𝑛 ,        (10.3.4)


with domain 𝔇(𝛼) = ℋ0 from Theorem 10.1.8. Then each 𝑒𝑛 is an eigenvector with eigenvalue 𝑎(𝑛) (Problem 10.3.2).

Example 10.3.9. Let ℋ be a Hilbert space with orthonormal basis ℬ = {𝑒𝑛 ∣ 𝑛 ∈ 𝐍}. The shift operator 𝜎 (Example 10.1.9) on ℋ has no eigenvectors or eigenvalues (Problem 10.3.3).

We conclude this section with several useful properties of eigenvectors and eigenvalues (Theorems 10.3.10–10.3.13).

Theorem 10.3.10. Let 𝑇 be a Hermitian operator in a Hilbert space ℋ, and let 𝜆 ∈ 𝐂 be an eigenvalue of 𝑇. Then 𝜆 must actually be real; in other words, a Hermitian operator has only real eigenvalues.

Proof. Problem 10.3.4.

Theorem 10.3.11. Let 𝑇 be a positive operator in a Hilbert space ℋ, and let 𝜆 ∈ 𝐂 be an eigenvalue of 𝑇. Then 𝜆 ≥ 0; in other words, a positive operator has only nonnegative eigenvalues.

Proof. Problem 10.3.5.

Theorem 10.3.12. Let 𝑇 be a Hermitian operator in a Hilbert space ℋ, and let {𝑢1 , … , 𝑢𝑛 } be a set of eigenvectors of 𝑇 with distinct eigenvalues 𝜆1 , … , 𝜆𝑛 (i.e., for 𝑖 ≠ 𝑗, 𝜆𝑖 ≠ 𝜆𝑗 ). Then {𝑢1 , … , 𝑢𝑛 } is an orthogonal set.

Proof. Problem 10.3.6.

Theorem 10.3.13. Let 𝑇 be a linear operator (not necessarily Hermitian) in a Hilbert space ℋ, and let {𝑢1 , … , 𝑢𝑛 } be a set of eigenvectors of 𝑇 with distinct eigenvalues 𝜆1 , … , 𝜆𝑛 (i.e., for 𝑖 ≠ 𝑗, 𝜆𝑖 ≠ 𝜆𝑗 ). Then {𝑢1 , … , 𝑢𝑛 } is linearly independent.

Proof. Problem 10.3.7.

The reader should compare Theorems 10.3.12 and 10.3.13 with our previously proven fact that any orthogonal set of nonzero vectors is linearly independent (Problem 7.3.2) and work out why none of these results are redundant.
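Theorems 10.3.10 and 10.3.12 are already visible in finite dimensions, where they can be checked in a few lines. The snippet below is a numerical illustration with a random Hermitian matrix (our own toy example), not a proof; we deliberately use the generic eigensolver rather than one specialized to Hermitian matrices.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
T = (A + A.conj().T) / 2            # T = T*, a Hermitian operator on C^4

w, V = np.linalg.eig(T)             # generic eigensolver: no Hermitian shortcut

# Theorem 10.3.10: the eigenvalues come out (numerically) real.
print(np.max(np.abs(w.imag)))

# Theorem 10.3.12: with distinct eigenvalues, the eigenvectors are orthogonal,
# so the Gram matrix of the eigenvectors is (numerically) diagonal.
G = V.conj().T @ V
print(np.max(np.abs(G - np.diag(np.diag(G)))))
```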

Problems.

10.3.1. Consider the operator Δ(𝑓) = −𝑓″ in 𝐿2 ([0, 1/2]) (Example 10.2.13).

(a) Suppose we use 𝔇(Δ)Dir = {𝑓 ∈ 𝐶 2 ([0, 1/2]) ∣ 𝑓(0) = 0 = 𝑓(1/2)}. Prove that for every 𝑛 ∈ 𝐍, 4𝜋²𝑛² is an eigenvalue of Δ.

(b) Suppose we use 𝔇(Δ)Neu = {𝑓 ∈ 𝐶 2 ([0, 1/2]) ∣ 𝑓′(0) = 0 = 𝑓′(1/2)}. Prove that for every integer 𝑛 ≥ 0, 4𝜋²𝑛² is an eigenvalue of Δ.

10.3.2. Let ℋ be a Hilbert space with orthonormal basis ℬ = {𝑒𝑛 ∣ 𝑛 ∈ 𝐍}, let 𝑎(𝑛) be any function 𝑎 ∶ 𝐍 → 𝐂, and let

    𝛼( ∑_{𝑛=1}^{∞} 𝑐𝑛 𝑒𝑛 ) = ∑_{𝑛=1}^{∞} 𝑎(𝑛)𝑐𝑛 𝑒𝑛 ,        (10.3.5)


the operator with domain 𝔇(𝛼) = ℋ0 from Theorem 10.1.8. For 𝑛 ∈ 𝐍, prove 𝑒𝑛 is an eigenvector with eigenvalue 𝑎(𝑛).

10.3.3. Let ℋ be a Hilbert space with orthonormal basis ℬ = {𝑒𝑛 ∣ 𝑛 ∈ 𝐍}, and let

    𝜎( ∑_{𝑛=1}^{∞} 𝑐𝑛 𝑒𝑛 ) = ∑_{𝑛=1}^{∞} 𝑐𝑛 𝑒𝑛+1        (10.3.6)

be the shift operator on ℋ. Prove that if 𝜎(𝑓) = 𝜆𝑓 for some 𝜆 ∈ 𝐂, then 𝑓 = 0.

10.3.4. (Proves Theorem 10.3.10) Let 𝑇 be a Hermitian operator in a Hilbert space ℋ, and let 𝜆 ∈ 𝐂 be an eigenvalue of 𝑇. Prove that 𝜆 ∈ 𝐑.

10.3.5. (Proves Theorem 10.3.11) Let 𝑇 be a positive operator in a Hilbert space ℋ, and let 𝜆 ∈ 𝐑 be an eigenvalue of 𝑇. Prove that 𝜆 ≥ 0.

10.3.6. (Proves Theorem 10.3.12) Let 𝑇 be a Hermitian operator in a Hilbert space ℋ, and let {𝑢1 , … , 𝑢𝑛 } be a set of eigenvectors of 𝑇 with distinct eigenvalues 𝜆1 , … , 𝜆𝑛 (i.e., for 𝑖 ≠ 𝑗, 𝜆𝑖 ≠ 𝜆𝑗 ). Prove that {𝑢1 , … , 𝑢𝑛 } is an orthogonal set.

10.3.7. (Proves Theorem 10.3.13) Let 𝑇 be a linear operator (not necessarily Hermitian) in a Hilbert space ℋ, and let {𝑢1 , … , 𝑢𝑛 } be a set of eigenvectors of 𝑇 with distinct eigenvalues 𝜆1 , … , 𝜆𝑛 (i.e., for 𝑖 ≠ 𝑗, 𝜆𝑖 ≠ 𝜆𝑗 ). Prove that {𝑢1 , … , 𝑢𝑛 } is linearly independent.

10.4 Eigenbases

In some sense, all of the theory in this chapter is aimed towards establishing the following concept, which generalizes Definition 9.3.3 to infinite-dimensional Hilbert spaces.

Definition 10.4.1. Let 𝑇 be an operator in a Hilbert space ℋ. An eigenbasis for 𝑇 is an orthogonal basis {𝑢𝑛 } for ℋ such that every 𝑢𝑛 is an eigenvector of 𝑇. (In particular, each 𝑢𝑛 must be contained in 𝔇(𝑇).)

In other words, if 𝑇 is an operator in a Hilbert space ℋ, then an eigenbasis {𝑢𝑛 } for 𝑇 has the following properties:

(1) Each 𝑢𝑛 ∈ 𝔇(𝑇).

(2) The set ℬ = {𝑢𝑛 } is an orthogonal set of nonzero vectors.

(3) For any 𝑓 ∈ ℋ, if 𝑓̂(𝑛) = ⟨𝑓, 𝑢𝑛 ⟩/⟨𝑢𝑛 , 𝑢𝑛 ⟩, then 𝑓 = ∑_{𝑛=1}^{∞} 𝑓̂(𝑛)𝑢𝑛 , where convergence is in the norm metric on ℋ.

(4) For each 𝑢𝑛 , 𝑇(𝑢𝑛 ) = 𝜆𝑛 𝑢𝑛 for some 𝜆𝑛 ∈ 𝐂.

We sometimes summarize the above data by saying that {𝑢𝑛 } is an eigenbasis for 𝑇 with associated eigenvalues {𝜆𝑛 }.

Now, there are a few notable theorems that give sufficient conditions on an operator 𝑇 that ensure the existence of an eigenbasis for 𝑇; see Section 11.8 for a statement of one such result. However, we will not seek such results; instead, for the most part, we will be content with merely producing examples of eigenbases of operators and describing what happens when such an eigenbasis exists.

We begin with our canonical, and most important, example(s).

Example 10.4.2. Let ℋ = 𝐿2 (𝑆1 ), and let 𝐷(𝑓) = −𝑖𝑓′ and Δ(𝑓) = 𝐷²(𝑓) = −𝑓″ be the operators in ℋ described previously (Examples 10.1.3 and 10.2.11). Then {𝑒𝑛 ∣ 𝑛 ∈ 𝐙}, where 𝑒𝑛 (𝑥) = 𝑒^{2𝜋𝑖𝑛𝑥}, is an orthonormal basis for ℋ (Theorem 8.1.1), and each 𝑒𝑛 is an eigenfunction for both 𝐷 and Δ, with eigenvalues 2𝜋𝑛 and 4𝜋²𝑛², respectively.

It will be helpful to keep Example 10.4.2 in mind in the subsequent discussion of the consequences of the existence of an eigenbasis. The fundamental such consequence is the following theorem.

Theorem 10.4.3 (Diagonalization Theorem). Let 𝑇 be a Hermitian operator in a Hilbert space ℋ, let {𝑢𝑛 } be an eigenbasis for 𝑇, and let 𝑇(𝑢𝑛 ) = 𝜆𝑛 𝑢𝑛 . Then for any 𝑓 ∈ 𝔇(𝑇), we have

    𝑇(𝑓) = 𝑇( ∑_{𝑛=1}^{∞} 𝑓̂(𝑛)𝑢𝑛 ) = ∑_{𝑛=1}^{∞} 𝜆𝑛 𝑓̂(𝑛)𝑢𝑛 .        (10.4.1)

In other words, relative to the eigenbasis {𝑢𝑛 }, 𝑇 acts like a diagonal operator. (See Theorem 10.1.8; also compare Theorem 9.3.4.) Because of that fact, we also say that the eigenbasis {𝑢𝑛 } diagonalizes 𝑇.

Proof. Problem 10.4.1.

Note that (10.4.1) is a straightforward consequence of linearity in the finite-dimensional case; the interesting point is that 𝑇 is linear with respect to an infinite sum. Put another way, as long as we only require convergence in 𝐿2 , the operator 𝑇 can be applied term by term to a generalized Fourier series with respect to an eigenbasis for 𝑇. This is particularly notable if 𝑇 is defined in terms of derivatives; compare Examples 4.2.3 and 4.2.4.

We close this chapter with a few other consequences of diagonalization. For example, we have the following result, which may seem obvious until you think about it.

Theorem 10.4.4. Let 𝑇 be a Hermitian operator in a Hilbert space ℋ, let {𝑢𝑛 } be an eigenbasis for 𝑇, and let 𝑇(𝑢𝑛 ) = 𝜆𝑛 𝑢𝑛 . If 𝑓 ∈ 𝔇(𝑇) is an eigenvector of 𝑇 with eigenvalue 𝜆, then 𝜆 must be equal to 𝜆𝑛 for at least one 𝑛, and 𝑓 is a linear combination (possibly an infinite one) of eigenvectors 𝑢𝑛 with 𝜆𝑛 = 𝜆.

Proof. Problem 10.4.2.

Building on Theorem 10.4.4, we also have the following result.

Theorem 10.4.5 (Simultaneous Diagonalization). Let 𝑇 be a Hermitian operator in a Hilbert space ℋ, let {𝑢𝑛 } be an eigenbasis for 𝑇, and let 𝑇(𝑢𝑛 ) = 𝜆𝑛 𝑢𝑛 . Suppose 𝑆 is a Hermitian operator in ℋ with 𝔇(𝑆) = 𝔇(𝑇) = ℋ0 , 𝑆(ℋ0 ) ⊆ ℋ0 , 𝑇(ℋ0 ) ⊆ ℋ0 , and 𝑆𝑇 = 𝑇𝑆; and suppose also that the eigenvalues of 𝑇 are distinct, i.e., 𝜆𝑘 ≠ 𝜆𝑛 for 𝑘 ≠ 𝑛. Then {𝑢𝑛 } is also an eigenbasis for 𝑆.

Proof. Problem 10.4.3.

It is possible to obtain results similar to Theorem 10.4.5 under weaker hypotheses; see Problems 10.4.4 and 10.4.5.
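In finite dimensions, (10.4.1) is just matrix diagonalization, which makes a concrete sandbox. The sketch below (a toy example of ours) expands a vector 𝑓 in an eigenbasis, multiplies each coefficient by 𝜆𝑛 , and reassembles; the result matches applying 𝑇 directly.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(5, 5)) + 1j * rng.normal(size=(5, 5))
T = (A + A.conj().T) / 2                 # Hermitian operator on C^5
lam, U = np.linalg.eigh(T)               # eigenbasis: columns u_n, with T u_n = lam_n u_n

f = rng.normal(size=5) + 1j * rng.normal(size=5)
fhat = U.conj().T @ f                    # coefficients <f, u_n> (basis is orthonormal)
Tf_diag = U @ (lam * fhat)               # sum over n of lam_n * fhat(n) * u_n
print(np.max(np.abs(T @ f - Tf_diag)))   # agrees with T applied directly
```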


Problems.

10.4.1. (Proves Theorem 10.4.3) Let 𝑇 be a Hermitian operator in a Hilbert space ℋ, let {𝑢𝑛 } be an eigenbasis for 𝑇, and let 𝑇(𝑢𝑛 ) = 𝜆𝑛 𝑢𝑛 . Prove that if 𝑓 ∈ 𝔇(𝑇), then

    𝑇(𝑓) = 𝑇( ∑_{𝑛=1}^{∞} 𝑓̂(𝑛)𝑢𝑛 ) = ∑_{𝑛=1}^{∞} 𝜆𝑛 𝑓̂(𝑛)𝑢𝑛 .        (10.4.2)

10.4.2. (Proves Theorem 10.4.4) Let 𝑇 be a Hermitian operator in a Hilbert space ℋ, let {𝑢𝑛 } be an eigenbasis for 𝑇, and let 𝑇(𝑢𝑛 ) = 𝜆𝑛 𝑢𝑛 . Suppose 𝑓 ∈ 𝔇(𝑇) and 𝑇(𝑓) = 𝜆𝑓 (𝜆 ∈ 𝐂).

(a) Prove that if 𝜆𝑛 ≠ 𝜆, then 𝑓̂(𝑛) = 0.

(b) Prove that if 𝑓 ≠ 0 (i.e., 𝑓 is an eigenvector and 𝜆 is an eigenvalue), then

    𝑓 = ∑_{𝑘=1}^{𝑁} 𝑓̂(𝑛𝑘 )𝑢𝑛𝑘 ,        (10.4.3)

where 𝑁 is either a positive integer or ∞ and {𝑛𝑘 } is the set of all 𝑛 such that 𝜆𝑛 = 𝜆.

10.4.3. (Proves Theorem 10.4.5) Let 𝑇 be a Hermitian operator in a Hilbert space ℋ, let {𝑢𝑛 } be an eigenbasis for 𝑇, and let 𝑇(𝑢𝑛 ) = 𝜆𝑛 𝑢𝑛 . Suppose 𝑆 is a Hermitian operator in ℋ with 𝔇(𝑆) = 𝔇(𝑇) = ℋ0 , 𝑆(ℋ0 ) ⊆ ℋ0 , 𝑇(ℋ0 ) ⊆ ℋ0 , and 𝑆𝑇 = 𝑇𝑆; and suppose also that 𝜆𝑘 ≠ 𝜆𝑛 for 𝑘 ≠ 𝑛.

(a) Prove that 𝑆(𝑢𝑛 ) is an eigenvector of 𝑇 with eigenvalue 𝜆𝑛 .

(b) Prove that {𝑢𝑛 } is an eigenbasis for 𝑆.

10.4.4. (*) This problem requires some background knowledge from linear algebra. For example, we will need to use the fact that if 𝑇 ∶ 𝑉 → 𝑉 is an operator on a finite-dimensional (complex) vector space 𝑉, then 𝑇 will always have at least one eigenvalue 𝜆 ∈ 𝐂 (see Messer [Mes97, Ch. 8]). Let 𝑇 ∶ 𝑉 → 𝑉 be a Hermitian operator on an 𝑛-dimensional vector space 𝑉. Prove by induction on 𝑛 that 𝑇 is diagonalizable, or in other words, that there exists an eigenbasis {𝑢1 , … , 𝑢𝑛 } for 𝑇.

10.4.5. (*) Let 𝑇 be a Hermitian operator in a Hilbert space ℋ, let {𝑢𝑛 } be an eigenbasis for 𝑇, and let 𝑇(𝑢𝑛 ) = 𝜆𝑛 𝑢𝑛 . Suppose also that 𝑆 is a Hermitian operator in ℋ with 𝔇(𝑆) = 𝔇(𝑇) = ℋ0 , 𝑆(ℋ0 ) ⊆ ℋ0 , 𝑇(ℋ0 ) ⊆ ℋ0 , and 𝑆𝑇 = 𝑇𝑆. Finally, in contrast with Problem 10.4.3, instead of assuming that the 𝜆𝑛 are distinct, assume that any particular eigenvalue appears only finitely many times in the sequence 𝜆𝑛 . Prove that there exists an orthogonal basis {𝑣𝑛 }, not necessarily the same as {𝑢𝑛 }, that is an eigenbasis for both 𝑆 and 𝑇, simultaneously.

11 Eigenbases and differential equations

The point is this: if from the outset we demand that our solutions be very regular, say 𝑘-times continuously differentiable, then we are usually going to have a really hard time finding them, as our proofs must then necessarily include possibly intricate demonstrations that the functions we are building are in fact smooth enough. A far more reasonable strategy is to consider as separate the existence and the smoothness (or regularity) problems. The idea is to define for a given PDE a reasonably wide notion of a weak solution, with the expectation that since we are not asking too much by way of smoothness of this weak solution, it may be easier to establish its existence, uniqueness, and continuous dependence on the given data.

— Lawrence C. Evans, Partial Differential Equations [Eva10]

And now, a word from our sponsors. That is, as promised in Chapter 9, in this chapter, we return to one of the main reasons Fourier series were invented: solving differential equations. As described in the epigraph above, the basic idea is to use the theory of "derivatives-as-operators" from Chapter 10 to turn a PDE into an operator equation, find solutions in 𝐿2 , and then prove those solutions actually have the desired derivatives.

We begin with Fourier's original motivating example of the heat equation (Section 11.1) and then generalize our solution of the heat equation to what we call the eigenbasis method (Section 11.2). The rest of the chapter consists of applications of the eigenbasis method to the wave equation (Section 11.3), boundary value problems (Section 11.4), Legendre polynomials (Section 11.5), and quantum mechanics (Sections 11.6 and 11.7). We conclude by describing a class of differential equations whose solution spaces naturally produce corresponding eigenbases, namely, the class of Sturm-Liouville equations (Section 11.8).


11.1 The heat equation on the circle

Before getting to a general description of the eigenbasis method for solving PDEs, we begin with a specific example, namely, the heat equation. To review, we recall that Question 9.1.1, together with (9.1.7), yields the following precise mathematical problem. (The reader may wish to review the notation from Definitions 5.2.6 and 7.5.10.)

Question 11.1.1. Given an initial value 𝑓(𝑥) ∈ 𝐿2 (𝑆1 ), find 𝑢(𝑥, 𝑡) (𝑡 > 0) such that:

(1) (Differentiable) For fixed 𝑡0 > 0, 𝑢(𝑥, 𝑡0 ) ∈ 𝐶𝑥2 (𝑆1 ), and for fixed 𝑥0 ∈ 𝑆1 , 𝑢(𝑥0 , 𝑡) ∈ 𝐶𝑡1 ((0, +∞)).

(2) (Initial value) For any 𝑥 ∈ 𝑆1 , lim_{𝑡→0⁺} 𝑢(𝑥, 𝑡) = 𝑓(𝑥).

(3) (PDE) For all 𝑡 > 0,

    −∂²𝑢/∂𝑥² = −∂𝑢/∂𝑡.        (11.1.1)

Note that in (11.1.1), we have multiplied both sides of the heat equation by −1 to make the operator on the left-hand side equal to the positive operator Δ = −∂²/∂𝑥² (Example 10.2.11). Note also that the initial value condition could be satisfied a fortiori by finding 𝑢(𝑥, 𝑡) that is continuous on 𝑆1 × [0, +∞) and satisfies 𝑢(𝑥, 0) = 𝑓(𝑥); however, we use the looser limit condition here to allow for the possibility that the initial value 𝑓(𝑥) is not continuous. In any case, it is now time for another visit to

THE LAND OF WISHFUL THINKING

We start by recalling that ℬ = {𝑒𝑛 ∣ 𝑛 ∈ 𝐙} is an eigenbasis for the Hermitian operator Δ (Example 10.4.2). By the definition of basis, since 𝑓 ∈ 𝐿2 (𝑆1 ), 𝑓(𝑥) = ∑_{𝑛∈𝐙} 𝑓̂(𝑛)𝑒𝑛 (𝑥) in 𝐿2 . Similarly, since we want 𝑢(𝑥, 𝑡0 ) ∈ 𝐶𝑥2 (𝑆1 ) ⊆ 𝐿𝑥2 (𝑆1 ) for fixed 𝑡0 > 0, if we let 𝜓𝑛 (𝑡) be the 𝑛th Fourier coefficient of 𝑢(𝑥, 𝑡) at time 𝑡, then we must have

    𝑢(𝑥, 𝑡) = ∑_{𝑛∈𝐙} 𝜓𝑛 (𝑡)𝑒𝑛 (𝑥)        (11.1.2)

for all 𝑡 > 0. The idea of looking for solutions of the form (11.1.2) is called separation of variables.

Because {𝑒𝑛 } is an eigenbasis, we know that Δ diagonalizes the expression in (11.1.2) (Theorem 10.4.3). We similarly hope that (11.1.2) is also diagonalized by −∂/∂𝑡 (unjustified leap #1). In that case, since Δ(𝑒𝑛 ) = 4𝜋²𝑛² 𝑒𝑛 , we get that

    Δ(𝑢)(𝑥, 𝑡) = ∑_{𝑛∈𝐙} 4𝜋²𝑛² 𝑒𝑛 (𝑥)𝜓𝑛 (𝑡),        (11.1.3)

    −∂𝑢/∂𝑡 (𝑥, 𝑡) = ∑_{𝑛∈𝐙} (−1)𝑒𝑛 (𝑥)𝜓𝑛′ (𝑡).        (11.1.4)

Comparing (11.1.3) and (11.1.4), we see that it suffices to find functions 𝜓𝑛 (𝑡) continuous at 0 such that

    −𝜓𝑛′ (𝑡) = 4𝜋²𝑛² 𝜓𝑛 (𝑡) and 𝜓𝑛 (0) = 𝑓̂(𝑛),        (11.1.5)


where the latter condition comes from the further assumption that lim_{𝑡→0⁺} and ∑_{𝑛∈𝐙} commute (unjustified leap #2). By Theorem 4.6.3, we must then have 𝜓𝑛 (𝑡) = 𝑓̂(𝑛)𝑒^{−4𝜋²𝑛²𝑡}, which means that, at least in the Land of Wishful Thinking, we have the solution

    𝑢(𝑥, 𝑡) = ∑_{𝑛∈𝐙} 𝑓̂(𝑛)𝑒𝑛 (𝑥)𝑒^{−4𝜋²𝑛²𝑡}        (11.1.6)

to (11.1.1), satisfying the initial condition lim_{𝑡→0⁺} 𝑢(𝑥, 𝑡) = 𝑓(𝑥).

EXITING THE LAND OF WISHFUL THINKING

Outside of the Land of Wishful Thinking, we may regard (11.1.6) as an educated guess of a solution 𝑢(𝑥, 𝑡) of the heat equation. To prove that (11.1.6) is an actual solution of the heat equation on 𝑆1 , we need to justify not only the above formal manipulations, but also the convergence and differentiability of (11.1.6). We may also ask: Is (11.1.6) the only solution to the heat equation on 𝑆1 ?

Remark 11.1.2. To give an overview of what follows, it turns out that much of what we have to do to confirm our guessed solution 𝑢(𝑥, 𝑡) boils down to proving that ∑_{𝑛∈𝐙} commutes with either lim_{𝑡→0⁺} or differentiation. Looking back, we have developed two main methods to solve such a problem:

(1) We can use the 𝑀-test (Theorem 4.3.7) to prove that the relevant function series converge uniformly, justifying operations like term-by-term differentiation (Theorem 4.3.16).

(2) By the Isomorphism Theorem for Fourier Series (Theorem 7.6.8), we can show that (for example) 𝑢(𝑥, 𝑡) converges to 𝑓(𝑥) in the 𝐿2 norm as 𝑡 → 0⁺ by showing that the Fourier coefficients of 𝑢(𝑥, 𝑡), taken in the variable 𝑥, converge appropriately to the Fourier coefficients of 𝑓(𝑥).

As is our usual practice, we will do only one example of each method in detail and leave the others as problems. Rest assured, however, that much of the difficulty in this section, and later, in Section 11.3, can be handled by these two methods.

Regularity. First, convergence to a genuine differentiable solution is relatively straightforward and, in fact, works unexpectedly well.

Theorem 11.1.3. Consider the heat equation with initial values 𝑢(𝑥, 0) = 𝑓(𝑥) ∈ 𝐿2 (𝑆1 ), and suppose 𝑢(𝑥, 𝑡) is given by (11.1.6).

(1) For fixed 𝑡0 > 0, the function 𝑢(𝑥, 𝑡0 ) given by (11.1.6) is in 𝐶𝑥∞ (𝑆1 ), and (11.1.3) holds.

(2) For fixed 𝑥0 ∈ 𝑆1 and variable 𝑡 > 0, the function 𝑢(𝑥0 , 𝑡) is in 𝐶𝑡∞ ((0, +∞)), and (11.1.4) holds.

Proof. Let

    𝑢(𝑥, 𝑡) = ∑_{𝑛∈𝐙} 𝑓̂(𝑛)𝑒𝑛 (𝑥)𝑒^{−4𝜋²𝑛²𝑡}.        (11.1.7)


Fixing 𝑡0 > 0 and thinking of 𝑢(𝑥, 𝑡0 ) as a function in 𝐿𝑥2 (𝑆1 ), we see that the 𝑛th Fourier coefficient of 𝑢(𝑥, 𝑡0 ) is 𝑓̂(𝑛)𝑒^{−4𝜋²𝑛²𝑡0}. Since 𝑓 ∈ 𝐿2 (𝑆1 ), the Riemann-Lebesgue Lemma (Corollary 7.6.6) tells us that lim_{𝑛→±∞} 𝑓̂(𝑛) = 0; in particular, there exists a constant 𝐾 > 0 such that |𝑓̂(𝑛)| ≤ 𝐾 for all 𝑛 ∈ 𝐍. Therefore, if we let 𝑎 = 𝑒^{−4𝜋²𝑡0} < 1, we see that

    |𝑓̂(𝑛)𝑒^{−4𝜋²𝑛²𝑡0}| ≤ 𝐾𝑎^{𝑛²} ≪ 1/|𝑛|^𝑘        (11.1.8)

for every 𝑘 ∈ 𝐍, where the asymptotic estimate ≪ follows from Theorem 3.6.12. Corollary 8.5.2 then implies that 𝑢(𝑥, 𝑡0 ) ∈ 𝐶𝑥∞ (𝑆1 ).

Alternatively, instead of quoting Corollary 8.5.2, we can imitate its proof and apply method (1) of Remark 11.1.2. Again fixing 𝑡0 > 0 and writing out 𝑒𝑛 (𝑥) to make term-by-term differentiation easier to see, for 𝑘 ∈ 𝐍, let

    𝑔𝑘 (𝑥) = ∑_{𝑛∈𝐙} 𝑓̂(𝑛)(2𝜋𝑖𝑛)^𝑘 𝑒^{2𝜋𝑖𝑛𝑥} 𝑒^{−4𝜋²𝑛²𝑡0}.        (11.1.9)

Again using the bound |𝑓̂(𝑛)| ≤ 𝐾 and taking 𝑎 = 𝑒^{−4𝜋²𝑡0} < 1, we see that

    |𝑓̂(𝑛)(2𝜋𝑖𝑛)^𝑘 𝑒^{2𝜋𝑖𝑛𝑥} 𝑒^{−4𝜋²𝑛²𝑡0}| ≤ 𝐾(2𝜋)^𝑘 |𝑛|^𝑘 𝑎^{𝑛²} ≤ 𝐶|𝑛|^𝑘 𝑎^{𝑛²}        (11.1.10)

for some constant 𝐶 > 0. Therefore, if we take 𝑀𝑛 = 𝐶|𝑛|^𝑘 𝑎^{𝑛²}, since ∑ 𝑀𝑛 converges absolutely (Theorem 4.1.13), the series 𝑔𝑘 (𝑥) converges absolutely and uniformly. Theorem 4.3.16 and a straightforward induction then show that 𝑔𝑘 (𝑥) = (𝑑/𝑑𝑥)^𝑘 𝑢(𝑥, 𝑡0 ) for every 𝑘 ∈ 𝐍, and therefore, 𝑢(𝑥, 𝑡0 ) ∈ 𝐶𝑥∞ (𝑆1 ). For the smoothness of 𝑢(𝑥0 , 𝑡) in the variable 𝑡, see Problem 11.1.1.

The remarkable thing about Theorem 11.1.3 is that even when the initial values 𝑓(𝑥) are quite discontinuous, as functions in 𝐿2 (𝑆1 ) can be, the dampening factors 𝑒^{−4𝜋²𝑛²𝑡} force the corresponding solution 𝑢(𝑥, 𝑡) to be smooth for 𝑡 > 0. This smoothing effect is characteristic of what are called parabolic PDEs, of which the heat equation is perhaps the most fundamental example. For more on parabolic PDEs, see Friedman [Fri08].

Initial values. If we only require convergence in 𝐿2 , then the initial value condition of Question 11.1.1 always holds. More precisely:

Theorem 11.1.4. Suppose 𝑓(𝑥) ∈ 𝐿2 (𝑆1 ) and 𝑢(𝑥, 𝑡) is defined by (11.1.6). Then

    lim_{𝑡→0⁺} ‖𝑢(𝑥, 𝑡) − 𝑓(𝑥)‖ = 0,        (11.1.11)

where convergence is in 𝐿𝑥2 (𝑆1 ).

Note that since we are trying to prove convergence in 𝐿𝑥2 (𝑆1 ), we can apply method (2) of Remark 11.1.2. As we shall see, the 𝑀-test also plays a role here.

Proof. We first observe that

    𝑢(𝑥, 𝑡) − 𝑓(𝑥) = ∑_{𝑛∈𝐙} 𝑓̂(𝑛)𝑒𝑛 (𝑥)𝑒^{−4𝜋²𝑛²𝑡} − ∑_{𝑛∈𝐙} 𝑓̂(𝑛)𝑒𝑛 (𝑥)
                   = ∑_{𝑛∈𝐙} (𝑒^{−4𝜋²𝑛²𝑡} − 1)𝑓̂(𝑛)𝑒𝑛 (𝑥).        (11.1.12)


Therefore, if we define 𝑔(𝑡) = ‖𝑢(𝑥, 𝑡) − 𝑓(𝑥)‖², by the Isomorphism Theorem for Fourier Series (Theorem 7.6.8), we have that

    𝑔(𝑡) = ∑_{𝑛∈𝐙} |1 − 𝑒^{−4𝜋²𝑛²𝑡}|² |𝑓̂(𝑛)|².        (11.1.13)

For 𝑡 ∈ [0, +∞), we have that 0 < 𝑒^{−4𝜋²𝑛²𝑡} ≤ 1, so

    |1 − 𝑒^{−4𝜋²𝑛²𝑡}|² |𝑓̂(𝑛)|² ≤ |𝑓̂(𝑛)|².        (11.1.14)

Let 𝑀𝑛 = |𝑓̂(𝑛)|². Since 𝑓 ∈ 𝐿2 (𝑆1 ), by the Isomorphism Theorem, we see that

    ∑_{𝑛∈𝐙} 𝑀𝑛 = ∑_{𝑛∈𝐙} |𝑓̂(𝑛)|² = ‖𝑓‖²        (11.1.15)

converges. Therefore, by the 𝑀-test, the series 𝑔(𝑡) converges uniformly to a continuous function on [0, +∞), and by continuity, we see that

    lim_{𝑡→0⁺} ‖𝑢(𝑥, 𝑡) − 𝑓(𝑥)‖² = 𝑔(0) = ∑_{𝑛∈𝐙} |1 − 𝑒⁰|² |𝑓̂(𝑛)|² = 0.        (11.1.16)

The theorem follows.

If we make additional assumptions about the smoothness of 𝑓, we can actually show that 𝑢 converges uniformly to 𝑓 as 𝑡 → 0, in the following sense.

Theorem 11.1.5. Consider Question 11.1.1, with the additional hypothesis that 𝑓 ∈ 𝐶 1 (𝑆1 ). Then 𝑢(𝑥, 𝑡) converges to 𝑓(𝑥) uniformly on 𝑆1 , or more precisely,

    lim_{𝑡→0⁺} ‖𝑢(𝑥, 𝑡) − 𝑓(𝑥)‖∞ = 0,        (11.1.17)

where convergence is in the 𝐿∞ norm (Example 7.2.3).

Proof. Since 𝜂(𝑡) = ‖𝑢(𝑥, 𝑡) − 𝑓(𝑥)‖∞ is a nonnegative function of 𝑡, by the Squeeze Lemma 3.1.23, it suffices to show that 𝜂(𝑡) is bounded above by a nonnegative continuous function ℎ ∶ [0, +∞) → 𝐑 such that ℎ(0) = 0. This is done in Problem 11.1.2.

We will later be able to relax the hypothesis of Theorem 11.1.5 to 𝑓 ∈ 𝐶 0 (𝑆1 ); see Section 13.6.

Uniqueness. We can use a more careful version of our "wishful thinking" argument to show that (11.1.6) is the only solution to the heat equation on the circle. For simplicity, we assume continuous initial values 𝑓(𝑥) and replace the limit condition of Question 11.1.1 with continuity of 𝑢(𝑥, 𝑡) at 𝑡 = 0.

Theorem 11.1.6. Suppose 𝑓 ∈ 𝐶 0 (𝑆1 ) and 𝑢 ∶ 𝑆1 × [0, +∞) → 𝐂 is such that:

(1) ∂𝑢/∂𝑡 and ∂²𝑢/∂𝑥² exist and are continuous on 𝑆1 × (0, +∞).

(2) 𝑢(𝑥, 𝑡) is continuous (including at 𝑡 = 0) and 𝑢(𝑥, 0) = 𝑓(𝑥) for any 𝑥 ∈ 𝑆1 .

(3) For all 𝑡 > 0,

    −∂²𝑢/∂𝑥² = −∂𝑢/∂𝑡.        (11.1.18)


Then

    𝑢(𝑥, 𝑡) = ∑_{𝑛∈𝐙} 𝑓̂(𝑛)𝑒𝑛 (𝑥)𝑒^{−4𝜋²𝑛²𝑡}.        (11.1.19)

Proof. Since, for fixed 𝑡0 > 0, 𝑢(𝑥, 𝑡0 ) ∈ 𝐶𝑥2 (𝑆1 ), by Theorem 8.1.2, 𝑢(𝑥, 𝑡0 ) converges uniformly to its Fourier series (in 𝑥). Therefore, if we let 𝜓𝑛 (𝑡) be the 𝑛th Fourier coefficient (in 𝑥) of 𝑢(𝑥, 𝑡), we see that for all 𝑡 > 0,

    𝑢(𝑥, 𝑡) = ∑_{𝑛∈𝐙} 𝜓𝑛 (𝑡)𝑒𝑛 (𝑥).        (11.1.20)

Problem 11.1.3 shows that for 𝑡 > 0,

    𝜓𝑛′ (𝑡) = −4𝜋²𝑛² 𝜓𝑛 (𝑡).        (11.1.21)

Since Lemma 3.6.20 implies that

    𝜓𝑛 (𝑡) = ∫_0^1 𝑢(𝑥, 𝑡) 𝑒̄𝑛 (𝑥) 𝑑𝑥        (11.1.22)

is continuous for 𝑡 ∈ [0, +∞), Theorem 4.6.3 then shows that 𝜓𝑛 (𝑡) = 𝑓̂(𝑛)𝑒^{−4𝜋²𝑛²𝑡}. The theorem follows.
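The solution formula (11.1.19) can be sanity-checked numerically. The sketch below uses choices of ours (square-wave initial data and a series truncated at |𝑛| ≤ 50): at an interior point, the truncated series satisfies the heat equation 𝑢𝑡 = 𝑢𝑥𝑥 up to finite-difference error.

```python
import numpy as np

# Fourier coefficients of the square wave f = 1 on [0, 1/2), 0 on [1/2, 1):
# fhat(0) = 1/2, fhat(n) = 1/(pi i n) for odd n, and 0 for even n != 0.
ns = np.arange(-50, 51)
fhat = np.zeros(ns.size, dtype=complex)
odd = ns % 2 != 0
fhat[odd] = 1.0 / (1j * np.pi * ns[odd])
fhat[ns == 0] = 0.5

def u(x, t):
    # Truncation of the series solution (11.1.19); real by conjugate symmetry
    return np.sum(fhat * np.exp(2j * np.pi * ns * x)
                  * np.exp(-4 * np.pi**2 * ns**2 * t)).real

x0, t0, dx, dt = 0.3, 0.01, 1e-4, 1e-6
ut = (u(x0, t0 + dt) - u(x0, t0 - dt)) / (2 * dt)
uxx = (u(x0 + dx, t0) - 2 * u(x0, t0) + u(x0 - dx, t0)) / dx**2
print(ut, uxx)   # u_t = u_xx holds up to finite-difference error
```

The damping factors 𝑒^{−4𝜋²𝑛²𝑡} are also visible here: even though the square-wave coefficients decay only like 1/𝑛, at 𝑡 = 0.01 all but a handful of terms are numerically zero, which is the smoothing effect of Theorem 11.1.3 in action.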

Remark 11.1.7. The reader may find it interesting to know that the smoothing effect of Theorem 11.1.3 is not only mathematical, but also confirmed by physical experiment. For example, historically, the smoothing effect of heat transfer actually created difficulties when trying to transmit signals by wires across the Atlantic Ocean, as an initial discontinuous “pulse” will smooth out in space, slowing the possible transmission rate. See Körner [Kör89, Sects. 65–66] for an enlightening (and entertaining) analysis of this phenomenon.

Problems. Note that Theorems 3.6.12 and 4.1.13 and the Riemann-Lebesgue Lemma (Corollary 7.6.6) may be useful for several of the following problems.

11.1.1. (Proves Theorem 11.1.3) Suppose 𝑓 ∈ 𝐿2 (𝑆1 ). Prove that for fixed 𝑥0 ∈ 𝑆1 and variable 𝑡 > 0, the function

    𝑢(𝑥0 , 𝑡) = ∑_{𝑛∈𝐙} 𝑓̂(𝑛)𝑒𝑛 (𝑥0 )𝑒^{−4𝜋²𝑛²𝑡}        (11.1.23)

is infinitely differentiable in the variable 𝑡 and term-by-term differentiation

    −∂𝑢/∂𝑡 (𝑥0 , 𝑡) = ∑_{𝑛∈𝐙} (4𝜋²𝑛²)𝑓̂(𝑛)𝑒𝑛 (𝑥0 )𝑒^{−4𝜋²𝑛²𝑡}        (11.1.24)

is valid.

11.1.2. (Proves Theorem 11.1.5) Consider Question 11.1.1, with the additional hypothesis that 𝑓 ∈ 𝐶 1 (𝑆1 ). Define

    𝜂(𝑡) = ‖𝑢(𝑥, 𝑡) − 𝑓(𝑥)‖∞ ,
    ℎ(𝑡) = ∑_{𝑛∈𝐙} |1 − 𝑒^{−4𝜋²𝑛²𝑡}| |𝑓̂(𝑛)|.        (11.1.25)

(a) Prove that ℎ(𝑡) converges uniformly on [0, +∞) to a continuous function.

(b) Prove that 𝜂(𝑡) ≤ ℎ(𝑡) for 𝑡 ∈ [0, +∞).


11.1.3. (Proves Theorem 11.1.6) Suppose that u : S¹ × (0, +∞) → 𝐂 is such that ∂²u/∂x² = ∂u/∂t is continuous (and therefore, so is u). Let ψₙ(t) be the nth Fourier coefficient (in x) of u(x, t). Prove that ψₙ′(t) = −4π²n² ψₙ(t).

11.2 The eigenbasis method

We may generalize the heuristic ("wishful thinking") discussion of Section 11.1 as follows. Suppose L is an operator (e.g., some kind of derivative) in the variable x and T is an operator in the variable t. Suppose also that we want to find one particular solution to the partial differential equation L(u) = T(u) for u(x, t), given boundary conditions in x and initial values u(x, 0), (∂u/∂t)(x, 0), etc. In many interesting cases, we can find such a solution u(x, t) by separation of variables, just as we did in Section 11.1. Specifically, we look for a solution of the form u(x, t) = ∑ ϕₙ(x)ψₙ(t) as follows:

(1) The boundary conditions in x define a domain 𝔇(L) for L as an operator in some Hilbert space ℋ. We first prove that L is Hermitian, which often relies on the fact that functions in 𝔇(L) satisfy those boundary conditions (see Section 10.2).

(2) The critical (and most difficult) step is to find an eigenbasis for L (Definition 10.4.1). To review, this means that we want to find an orthogonal basis {ϕₙ(x)} for ℋ such that, for each n, (a) ϕₙ ∈ 𝔇(L) and (b) L(ϕₙ) = λₙϕₙ for some λₙ ∈ 𝐑 (since L is Hermitian). Note that again, satisfying the condition ϕₙ ∈ 𝔇(L) often comes down to satisfying the boundary conditions in the definition of 𝔇(L).

(3) Next, for each n, we solve the ordinary differential equation T(ψ) = λₙψ. For example, if T = a(d/dt) for some a ∈ 𝐂, a ≠ 0, then we get a solution space spanned by e^{λₙt/a} (Theorem 4.6.3). If T = −d²/dt², then for λₙ = κₙ² > 0 (i.e., κₙ = √λₙ > 0), we get a solution space spanned by {cos κₙt, sin κₙt} (Theorem 4.6.3); and for λₙ = 0, we get a solution space spanned by {1, t} (a calculus exercise). Note that {e^{iκₙt}, e^{−iκₙt}} has the same span as {cos κₙt, sin κₙt}, but sines and cosines are more useful for initial conditions given at t = 0.
(4) To complete our solution, recall that by the Diagonalization Theorem 10.4.3, L is diagonalizable with respect to the eigenbasis {ϕₙ}, or in other words, L can be applied term by term (with convergence in L²). We also temporarily assume that T is diagonalizable with respect to the solutions of T(ψ) = λₙψ described in the previous step, where L(ϕₙ) = λₙϕₙ. We then have two cases of particular interest.

(a) Suppose T(u) = a(∂u/∂t) for a ∈ 𝐂, a ≠ 0. Then as with the heat equation,

    u(x, t) = ∑_{n=1}^∞ Aₙ ϕₙ(x) e^{λₙt/a}    (11.2.1)

is a solution to L(u) = T(u), assuming convergence of the right-hand side of (11.2.1). Furthermore, if we express f(x) in the eigenbasis {ϕₙ} as

    f(x) = ∑_{n=1}^∞ f̂(n) ϕₙ,    (11.2.2)

then

    u(x, t) = ∑_{n=1}^∞ f̂(n) ϕₙ(x) e^{λₙt/a}    (11.2.3)

is a solution to L(u) = T(u) with the initial condition u(x, 0) = f(x). (See Problem 11.2.1.)

(b) Suppose T(u) = −∂²u/∂t², λ₀ = 0, and for n > 0, λₙ = κₙ² > 0 (i.e., κₙ = √λₙ). (Note that having all λₙ ≥ 0 is equivalent to L being positive.) Assume that ϕ₀ = 1. Then

    u(x, t) = A₀ + B₀t + ∑_{n=1}^∞ (Aₙ ϕₙ(x) cos κₙt + Bₙ ϕₙ(x) sin κₙt)    (11.2.4)

is a solution to L(u) = T(u), assuming convergence of the right-hand side of (11.2.4). Furthermore, if we express f(x) and g(x) in the eigenbasis {ϕₙ} as

    f(x) = ∑_{n=0}^∞ f̂(n) ϕₙ,    g(x) = ∑_{n=0}^∞ ĝ(n) ϕₙ,    (11.2.5)

then

    u(x, t) = f̂(0) + ĝ(0)t + ∑_{n=1}^∞ (f̂(n) ϕₙ(x) cos(κₙt) + (ĝ(n)/κₙ) ϕₙ(x) sin(κₙt))    (11.2.6)

is a solution to L(u) = T(u) with the initial conditions u(x, 0) = f(x) and (∂u/∂t)(x, 0) = g(x). (See Problem 11.2.2.)

We define a formal solution to L(u) = T(u) to be a solution that is valid if we ignore questions of convergence and diagonalizability, like (11.2.1) and (11.2.4). Note that while formal solutions are useful, if we want (for example) actual suitably differentiable solutions, we must still address questions such as the convergence of the right-hand sides of (11.2.1), (11.2.3), (11.2.4), and (11.2.6), the validity of applying term-by-term operations, and the uniqueness (or lack thereof) of those solutions.
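Steps (1)–(4) have an exact finite-dimensional analogue, which the sketch below (our own illustration, not part of the text) carries out for the semidiscrete heat equation u′ = −Lu, where L is the standard discrete Dirichlet Laplacian matrix. Its well-known eigenpairs φₖ(j) = sin(πkj/N), λₖ = 2 − 2cos(πk/N) play the role of the eigenbasis {ϕₙ}, and the solution is assembled exactly as in (11.2.3) with a = −1:

```python
import math

def apply_discrete_laplacian(v):
    """(L v)_j = 2 v_j - v_{j-1} - v_{j+1}, with zero (Dirichlet) boundary values."""
    n = len(v)
    return [2 * v[j]
            - (v[j - 1] if j > 0 else 0.0)
            - (v[j + 1] if j < n - 1 else 0.0)
            for j in range(n)]

def heat_eigenbasis_solution(f, t):
    """u(t) = sum_k f-hat(k) e^{-lambda_k t} phi_k, the eigenbasis-method
    solution of u' = -L u with u(0) = f, for f a vector of length N - 1."""
    N = len(f) + 1
    u = [0.0] * len(f)
    for k in range(1, N):
        phi = [math.sin(math.pi * k * j / N) for j in range(1, N)]
        lam = 2.0 - 2.0 * math.cos(math.pi * k / N)
        # generalized Fourier coefficient <f, phi_k> / <phi_k, phi_k>,
        # using the standard identity <phi_k, phi_k> = N/2
        fhat = sum(a * b for a, b in zip(f, phi)) / (N / 2.0)
        damp = math.exp(-lam * t)
        u = [uj + fhat * damp * pj for uj, pj in zip(u, phi)]
    return u
```

At t = 0 the expansion reconstructs f exactly (the φₖ form an orthogonal basis of 𝐑^{N−1}), and applying L confirms the eigenvalue equation L(φₖ) = λₖφₖ.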

Problems. For Problems 11.2.1 and 11.2.2, let {ϕₙ} be an eigenbasis for L with associated eigenvalues {λₙ}, and assume diagonalizability and convergence freely, as for the moment, we are only interested in formal solutions.

11.2.1. Let T(u) = a(∂u/∂t) for a ∈ 𝐂, a ≠ 0.

(a) Show that

    u(x, t) = ∑_{n=1}^∞ Aₙ ϕₙ(x) e^{λₙt/a}    (11.2.7)

is a solution to L(u) = T(u).


(b) Show that

    u(x, t) = ∑_{n=1}^∞ f̂(n) ϕₙ(x) e^{λₙt/a}    (11.2.8)

satisfies the initial condition u(x, 0) = f(x).

11.2.2. Let T(u) = −∂²u/∂t² and λₙ = κₙ² ≥ 0.

(a) Show that

    u(x, t) = A₀ + B₀t + ∑_{n=1}^∞ (Aₙ ϕₙ(x) cos κₙt + Bₙ ϕₙ(x) sin κₙt)    (11.2.9)

is a solution to L(u) = T(u).

(b) Now assuming ϕ₀ = 1, show that

    u(x, t) = f̂(0) + ĝ(0)t + ∑_{n=1}^∞ (f̂(n) ϕₙ(x) cos(κₙt) + (ĝ(n)/κₙ) ϕₙ(x) sin(κₙt))    (11.2.10)

satisfies the initial conditions u(x, 0) = f(x) and (∂u/∂t)(x, 0) = g(x).

11.3 The wave equation on the circle

As with the heat equation, Question 9.1.2 and (9.1.11) together yield the following precise mathematical problem.

Question 11.3.1. Given initial values f(x), g(x) ∈ L²(S¹), find u(x, t) such that:

(1) (Differentiable) For fixed t₀ > 0, u(x, t₀) ∈ C²ₓ(S¹), and for fixed x₀ ∈ S¹, u(x₀, t) ∈ C²ₜ((0, +∞)).

(2) (Initial value) For any x ∈ S¹,

    lim_{t→0⁺} u(x, t) = f(x),    lim_{t→0⁺} (∂u/∂t)(x, t) = g(x).    (11.3.1)

(3) (PDE) For all t > 0,

    −∂²u/∂x² = −∂²u/∂t².    (11.3.2)

The reader who is (rightfully) concerned about the usefulness of a wave traveling in a circle should regard Question 11.3.1 as looking for solutions with a given period in the x direction (here, period 1). We can apply the eigenbasis method of Section 11.2 to answer Question 11.3.1 as follows.

Eigenbasis and formal solution. By Example 10.4.2, ℬ = {eₙ ∣ n ∈ 𝐙} is an eigenbasis for Δ. Furthermore, since f, g ∈ L²(S¹), f(x) = ∑_{n∈𝐙} f̂(n)eₙ(x) and g(x) = ∑_{n∈𝐙} ĝ(n)eₙ(x) in L². So by (11.2.6), since eₙ is an eigenfunction for Δ with eigenvalue 4π²n²,

    u(x, t) = f̂(0) + ĝ(0)t + ∑_{n≠0} (f̂(n)eₙ(x) cos(2πnt) + (ĝ(n)/(2πn)) eₙ(x) sin(2πnt))    (11.3.3)

is a formal solution to (11.3.2) that satisfies the initial conditions u(x, 0) = f(x) and (∂u/∂t)(x, 0) = g(x).

As in Section 11.1, we verify our guess (11.3.3) by checking convergence, initial values, and uniqueness.

"Drift" and domain. Note that our guess (11.3.3) is periodic in t with period 1, except for the term ĝ(0)t, which is sometimes known as a "drift" term (imagine a wave descending at a constant rate). However, let g₀(x) = g(x) − ĝ(0) (a drift-free initial velocity) and

    u₀(x, t) = ∑_{n∈𝐙} f̂(n)eₙ(x) cos(2πnt) + ∑_{n≠0} (ĝ(n)/(2πn)) eₙ(x) sin(2πnt).    (11.3.4)

If u₀(x, t) is sufficiently differentiable,

    lim_{t→0⁺} u₀(x, t) = f(x),    lim_{t→0⁺} (∂u₀/∂t)(x, t) = g₀(x),    (11.3.5)

and

    −∂²u₀/∂x² = −∂²u₀/∂t²    (11.3.6)

for t > 0, then it is certainly reasonable to say that u(x, t) = u₀(x, t) + ĝ(0)t is a solution to Question 11.3.1, with the same regularity properties as u₀. For the rest of this section, then, we assume that ĝ(0) = 0 and u₀(x, t), as given by (11.3.4), is our guessed solution.

As a bonus, we note that in fact, u₀(x, t) is periodic in both x and t. (In fact, u₀ is in L²(S¹) in both variables; see Problem 11.3.1.) Physically, we may interpret this as saying that for a solution of the wave equation, periodicity in space forces periodicity in time; see Section 11.4 for more on this point. Furthermore, if we assume that f ∈ C¹(S¹), we get the following formal term-by-term calculation of ∂u₀/∂t.

Lemma 11.3.2. Suppose f ∈ C¹(S¹), g ∈ L²(S¹), and u₀ is given by (11.3.4). Then (formal) term-by-term differentiation gives

    (∂u₀/∂t)(x, t) = −∑_{n∈𝐙} (2πn)f̂(n)eₙ(x) sin(2πnt) + ∑_{n≠0} ĝ(n)eₙ(x) cos(2πnt),    (11.3.7)

and the right-hand side of (11.3.7) converges in both L²ₓ(S¹) and L²ₜ(S¹).

Proof. Problem 11.3.2.

Initial values. In terms of regularity of solutions, we have good news and bad news. The good news is that after we assume an extra degree of differentiability to account for one of the initial values being a derivative, the initial value properties of (11.3.4) hold in a manner analogous to the initial value properties of the solution to the heat equation. We begin with the analogue of Theorem 11.1.4.


Theorem 11.3.3. Suppose f(x) ∈ C¹(S¹), g(x) ∈ L²(S¹), and u₀(x, t) is defined by (11.3.4). Then

    lim_{t→0⁺} ‖u₀(x, t) − f(x)‖ = 0,    lim_{t→0⁺} ‖(∂u₀/∂t)(x, t) − g(x)‖ = 0,    (11.3.8)

where convergence is in L²ₓ(S¹).

Proof. Problems 11.3.3 and 11.3.4.

Theorem 11.3.4. Consider Question 11.3.1, with the additional hypotheses that f ∈ C²(S¹) and g ∈ C¹(S¹). Then u₀(x, t) and (∂u₀/∂t)(x, t) converge to f(x) and g(x) uniformly on S¹, or more precisely,

    lim_{t→0⁺} ‖u₀(x, t) − f(x)‖∞ = 0,    lim_{t→0⁺} ‖(∂u₀/∂t)(x, t) − g(x)‖∞ = 0.    (11.3.9)

Proof. Define

    η₀(t) = ‖u₀(x, t) − f(x)‖∞,    η₁(t) = ‖(∂u₀/∂t)(x, t) − g(x)‖∞.    (11.3.10)

As in the proof of Theorem 11.1.5, by the Squeeze Lemma 3.1.23, it suffices to show that for i = 0, 1, ηᵢ(t) is bounded above by a nonnegative continuous function hᵢ : [0, +∞) → 𝐑 such that hᵢ(0) = 0. See Problems 11.3.5 and 11.3.6.

Pointwise convergence. One significant qualitative difference between the heat and wave equations is that the solution (11.3.4) to the wave equation lacks the exponentially decaying factors found in the solution (11.1.6) to the heat equation, so there is no wave equation analogue to the smoothing results of Theorem 11.1.3. However, as long as we assume one extra degree of differentiability, we can use the Extra Derivative Lemma 8.4.7 to obtain pointwise (or actually, uniform) convergence properties. (As a bonus, convergence also extends to t = 0.)

Theorem 11.3.5. Consider Question 11.3.1, with the additional hypotheses on the initial values f and g that f ∈ C³(S¹) and g ∈ C²(S¹). Let u₀(x, t) be given by (11.3.4). Then for fixed t₀ ≥ 0, u₀(x, t₀) ∈ C²ₓ(S¹); for fixed x₀ ∈ S¹, u₀(x₀, t) ∈ C²ₜ(S¹); and (11.3.2) holds for all t ≥ 0.

Proof. Problem 11.3.7.

L² convergence. If we are willing to settle for L² convergence of the solution, and not just pointwise/uniform convergence, we can weaken the hypotheses of Theorem 11.3.5 to f ∈ C²(S¹) and g ∈ C¹(S¹). We begin by extending the domain of Δ.

Definition 11.3.6. We define the operator Δ⁺ in ℋ = L²(S¹) to be the operator given by

    Δ⁺(∑_{n∈𝐙} f̂(n)eₙ(x)) = ∑_{n∈𝐙} 4π²n² f̂(n)eₙ(x),    (11.3.11)

where the domain of Δ⁺ is defined by

    𝔇(Δ⁺) = { ∑_{n∈𝐙} f̂(n)eₙ(x) ∣ ∑_{n∈𝐙} 4π²n² f̂(n)eₙ(x) converges in L² }.    (11.3.12)


In other words, 𝔇(Δ⁺) is literally the largest possible subspace of L²ₓ(S¹) on which Δ⁺ could possibly converge.

Theorem 11.3.7. Let Δ be the Laplacian on S¹, and let Δ⁺ be the operator from Definition 11.3.6.

(1) The domain 𝔇(Δ⁺) contains C²(S¹).
(2) For every f ∈ 𝔇(Δ), Δ⁺(f) = Δ(f), or in other words, Δ⁺ extends Δ.

Proof. Problem 11.3.8.

We may therefore rewrite the wave equation (11.3.6), in an extended sense, as

    Δ⁺ₓ(u₀) = Δ⁺ₜ(u₀),    (11.3.13)

where Δ⁺ₓ and Δ⁺ₜ are the extended Laplacian Δ⁺ defined above, in the variables x and t, respectively. Note that if instead of the basis {eₙ(t)} for L²(S¹), we use {1, cos(2πnt), sin(2πnt)}, by grouping the ±n terms in (11.3.11), we see that for any h ∈ L²ₜ(S¹) with

    h(t) = a₀/2 + ∑_{n=1}^∞ (aₙ cos(2πnt) + bₙ sin(2πnt)),    (11.3.14)

we have

    Δ⁺(h) = ∑_{n=1}^∞ 4π²n² (aₙ cos(2πnt) + bₙ sin(2πnt)).    (11.3.15)

Putting it all together, we have the following theorem.

Theorem 11.3.8. If f ∈ C²(S¹) and g ∈ C¹(S¹), then for t > 0, (11.3.4) is a solution to (11.3.13) as an equation in either L²ₓ(S¹) or L²ₜ(S¹).

Proof. Problem 11.3.9.

Uniqueness. Uniqueness of the solution to the wave equation follows for much the same reasons that it does for the heat equation. We again make extra assumptions about continuity to simplify our discussion.

Theorem 11.3.9. Suppose f ∈ C⁰(S¹), g ∈ C⁰(S¹), and u : S¹ × [0, +∞) → 𝐂 is such that:

(1) ∂²u/∂x² and ∂²u/∂t² exist and are continuous on S¹ × (0, +∞).
(2) u and ∂u/∂t are continuous (including at t = 0) and for any x ∈ S¹, u(x, 0) = f(x) and (∂u/∂t)(x, 0) = g(x).
(3) For all t > 0,

    −∂²u/∂x² = −∂²u/∂t².    (11.3.16)

Then

    u(x, t) = f̂(0) + ĝ(0)t + ∑_{n≠0} (f̂(n)eₙ(x) cos(2πnt) + (ĝ(n)/(2πn)) eₙ(x) sin(2πnt)).    (11.3.17)


Proof. As in the proof of Theorem 11.1.6, if we let ψₙ(t) be the nth Fourier coefficient (in x) of u(x, t), we see that for all t > 0,

    u(x, t) = ∑_{n∈𝐙} ψₙ(t)eₙ(x).    (11.3.18)

Problem 11.3.10 shows that for t > 0,

    ψₙ″(t) = −4π²n² ψₙ(t).    (11.3.19)

Since Lemma 3.6.20 and Theorem 3.6.23 together imply that

    ψₙ(t) = ∫₀¹ u(x, t) e̅ₙ(x) dx,    ψₙ′(t) = ∫₀¹ (∂u/∂t) e̅ₙ(x) dx    (11.3.20)

are continuous for t ∈ [0, +∞), Theorem 4.6.3 then shows for n ≠ 0 that

    ψₙ(t) = f̂(n) cos(2πnt) + (ĝ(n)/(2πn)) sin(2πnt).    (11.3.21)

As for n = 0, it is a straightforward fact from calculus that if ψ₀ ∈ C²((0, +∞)), ψ₀′ is continuous at t = 0, and ψ₀″(t) = 0 for t > 0, then ψ₀(t) = ψ₀(0) + ψ₀′(0)t. The theorem follows.

d'Alembert's formula. The wave equation was actually first solved in 1746, quite some time before Fourier's work, by d'Alembert, who found the solution

    u(x, t) = (1/2)(f(x + t) + f(x − t)) + (1/2) ∫_{x−t}^{x+t} g(y) dy.    (11.3.22)

We can use calculus to verify directly that if f ∈ C²(S¹) and g ∈ C¹(S¹), then (11.3.22) is a solution to (9.1.11); see Problem 11.3.11. For our purposes, then, the Fourier series solution to the wave equation is therefore most notable as a second example of how eigenbases give a general method for solving differential equations. We also note that (11.3.3) actually gives the same answer as (11.3.22); see Problem 11.3.12.

Remark 11.3.10. We note that d'Alembert's formula (11.3.22) still makes sense when we only assume that (for example) f and g are integrable, and we therefore still think of (11.3.22) as a solution to the wave equation, even when u(x, t) is no longer differentiable. We also note that (11.3.22) preserves, and even propagates, any singularities present in the initial conditions f and g, a notable characteristic of hyperbolic PDEs (see Evans [Eva10]). In some applications, we actually regard this lack of smoothing as a feature, as, for example, the "broadcast" of initial singularities helps to make the transmission of information by electromagnetic waves work in practice.

Remark 11.3.11. In a similar manner, we can extend our "drift-free" solution u₀(x, t) from (11.3.4) to any (drift-free) f, g ∈ L²(S¹) by thinking of (11.3.4) as defining an operator in the variables f and g. Since

    ‖u₀(x, t)‖ ≤ ‖f(x)‖ + ‖g(x)‖,    (11.3.23)

where ‖u₀(x, t)‖ is calculated in either L²ₓ(S¹) or L²ₜ(S¹) (Problem 11.3.13), this operator is continuous in both variables, and since any function in L²(S¹) is the limit of a sequence in C^∞(S¹), we can think of the solution u₀(x, t) as extending to any initial values f, g ∈ L²(S¹) by continuity. Again, however, this approach at best really only gives the same answer as d'Alembert's formula, albeit from a different point of view.
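The agreement between (11.3.3) and d'Alembert's formula is easy to spot-check numerically. In the sketch below (our own illustration, not part of the text), we take the drift-free initial data f(x) = cos(2πx) and g(x) = sin(4πx), for which the Fourier series solution collapses to two real terms (only n = ±1 and n = ±2 survive), and compare it against (11.3.22) with a midpoint-rule integral:

```python
import math

def dalembert(f, g, x, t, steps=4000):
    """d'Alembert's formula (11.3.22), with the integral of g done by the midpoint rule."""
    h = 2.0 * t / steps
    integral = sum(g(x - t + (i + 0.5) * h) for i in range(steps)) * h
    return 0.5 * (f(x + t) + f(x - t)) + 0.5 * integral

def fourier_wave(x, t):
    """(11.3.3) for f(x) = cos(2*pi*x), g(x) = sin(4*pi*x), written in real form."""
    return (math.cos(2 * math.pi * x) * math.cos(2 * math.pi * t)
            + math.sin(4 * math.pi * x) * math.sin(4 * math.pi * t) / (4 * math.pi))
```

The two formulas agree to the accuracy of the quadrature at every point tested, exactly as Problem 11.3.12 predicts.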


Problems. For Problems 11.3.1–11.3.9 and 11.3.13, we make the "drift-free" assumption ĝ(0) = 0. While we do not include suggestions for the individual problems in this section, the reader looking for guidance should keep in mind the methods described in Remark 11.1.2 and exhibited in the proofs of Theorems 11.1.3 and 11.1.4.

11.3.1. (*) Suppose f, g ∈ L²(S¹), and let

    u₀(x, t) = ∑_{n∈𝐙} f̂(n)eₙ(x) cos(2πnt) + ∑_{n≠0} (ĝ(n)/(2πn)) eₙ(x) sin(2πnt).    (11.3.24)

(a) Prove that for fixed t, the right-hand side of (11.3.24) converges in L²ₓ(S¹).
(b) Same, but for fixed x, in L²ₜ(S¹).

11.3.2. (*) (Proves Lemma 11.3.2) Suppose f ∈ C¹(S¹) and u₀ is given by (11.3.24), and let

    (∂u₀/∂t)(x, t) = −∑_{n∈𝐙} (2πn)f̂(n)eₙ(x) sin(2πnt) + ∑_{n≠0} ĝ(n)eₙ(x) cos(2πnt).    (11.3.25)

Note that as long as the right-hand side of (11.3.25) converges, we can think of it as ∂u₀/∂t, either in terms of term-by-term differentiation or as an operator extension of ∂/∂t.

(a) Prove that for fixed t, the right-hand side of (11.3.25) converges in L²ₓ(S¹).
(b) Same, but for fixed x, in L²ₜ(S¹).

11.3.3. (*) (Proves Theorem 11.3.3) Suppose f ∈ C¹(S¹), g ∈ L²(S¹), and u₀(x, t) is defined by (11.3.24) (and therefore, ∂u₀/∂t is given by (11.3.25); see Problem 11.3.2). Define h₁, h₂ : [0, +∞) → 𝐑 by

    h₁(t) = ∑_{n∈𝐙} |1 − cos(2πnt)|² |f̂(n)|²,    h₂(t) = ∑_{n≠0} |ĝ(n)/(2πn)|² |sin(2πnt)|²,    (11.3.26)

which we think of as function series in the variable t.

(a) Prove that for i = 1, 2, hᵢ(t) converges uniformly to a continuous function on [0, +∞) such that hᵢ(0) = 0.
(b) Now let η₀(t) = ‖u₀(x, t) − f(x)‖, where the norm is computed in L²ₓ(S¹). Prove that 0 ≤ η₀(t) ≤ √h₁(t) + √h₂(t), and therefore, that lim_{t→0⁺} η₀(t) = 0.

11.3.4. (*) (Proves Theorem 11.3.3) Under the assumptions of Problem 11.3.3, define h₁, h₂ : [0, +∞) → 𝐑 by

    h₁(t) = ∑_{n∈𝐙} |(2πn)f̂(n)|² |sin(2πnt)|²,    h₂(t) = ∑_{n≠0} |ĝ(n)|² |1 − cos(2πnt)|²,    (11.3.27)

which we think of as function series in the variable t.

(a) Prove that for i = 1, 2, hᵢ(t) converges uniformly to a continuous function on [0, +∞) such that hᵢ(0) = 0.


(b) Now let η₁(t) = ‖(∂u₀/∂t)(x, t) − g(x)‖, where the norm is computed in L²ₓ(S¹). Prove that 0 ≤ η₁(t) ≤ √h₁(t) + √h₂(t), and therefore, that lim_{t→0⁺} η₁(t) = 0.

11.3.5. (*) (Proves Theorem 11.3.4) Suppose u₀(x, t) is given by (11.3.24) and that f ∈ C²(S¹) and g ∈ C¹(S¹). Define

    η₀(t) = ‖u₀(x, t) − f(x)‖∞,    (11.3.28)
    h₀(t) = ∑_{n∈𝐙} |1 − cos(2πnt)| |f̂(n)| + ∑_{n≠0} |ĝ(n)/(2πn)| |sin(2πnt)|,    (11.3.29)

where the L^∞ norm is computed over all x ∈ S¹, fixing t.

(a) Prove that h₀(t) converges uniformly on [0, +∞) to a continuous function.
(b) Prove that η₀(t) ≤ h₀(t) for t ∈ [0, +∞).

11.3.6. (*) (Proves Theorem 11.3.4) Suppose ∂u₀/∂t is given by (11.3.25) and that f ∈ C²(S¹) and g ∈ C¹(S¹). Define

    η₁(t) = ‖(∂u₀/∂t)(x, t) − g(x)‖∞,    (11.3.30)
    h₁(t) = ∑_{n∈𝐙} |(2πn)f̂(n)| |sin(2πnt)| + ∑_{n≠0} |ĝ(n)| |1 − cos(2πnt)|,    (11.3.31)

where the L^∞ norm is computed over all x ∈ S¹, fixing t.

(a) Prove that h₁(t) converges uniformly on [0, +∞) to a continuous function.
(b) Prove that η₁(t) ≤ h₁(t) for t ∈ [0, +∞).

11.3.7. (*) (Proves Theorem 11.3.5) Suppose u₀(x, t) is given by (11.3.24) and assume now that f ∈ C³(S¹) and g ∈ C²(S¹).

(a) Prove that for k ≤ 2, the series

    ∑_{n∈𝐙} |2πn|^k |f̂(n)| + ∑_{n≠0} |2πn|^{k−1} |ĝ(n)|    (11.3.32)

converges.

(b) Prove that the series (11.3.24) for u₀(x, t) can be differentiated term by term twice in both x and t.
(c) Prove that (∂²u₀/∂x²)(x, t) = (∂²u₀/∂t²)(x, t) for all x ∈ S¹ and t ≥ 0.

11.3.8. (*) (Proves Theorem 11.3.7) Let Δ be the Laplacian on S¹, and let Δ⁺ be the extended operator

    Δ⁺(∑_{n∈𝐙} f̂(n)eₙ(x)) = ∑_{n∈𝐙} 4π²n² f̂(n)eₙ(x)    (11.3.33)

from Definition 11.3.6.

(a) Prove that if f ∈ C²(S¹), then f is in

    𝔇(Δ⁺) = { ∑_{n∈𝐙} f̂(n)eₙ(x) ∣ ∑_{n∈𝐙} 4π²n² f̂(n)eₙ(x) converges in L² }.    (11.3.34)

(b) Prove that if f ∈ C²(S¹), then Δ⁺(f) = Δ(f).

11.3.9. (Proves Theorem 11.3.8) Prove that if f ∈ C²(S¹) and g ∈ C¹(S¹), then (11.3.24) is a solution to

    Δ⁺ₓ(u₀) = Δ⁺ₜ(u₀),    (11.3.35)

as an equation in either L²ₓ(S¹) or L²ₜ(S¹).

11.3.10. (*) (Proves Theorem 11.3.9) Suppose that u : S¹ × (0, +∞) → 𝐂 is such that ∂²u/∂x² = ∂²u/∂t² is continuous (and therefore, so is u). Let ψₙ(t) be the nth Fourier coefficient (in x) of u(x, t). Prove that ψₙ″(t) = −4π²n² ψₙ(t).

11.3.11. (*) Prove that if f ∈ C²(S¹) and g ∈ C¹(S¹), then d'Alembert's formula

    u(x, t) = (1/2)(f(x + t) + f(x − t)) + (1/2) ∫_{x−t}^{x+t} g(y) dy    (11.3.36)

gives a valid solution to Question 11.3.1.

11.3.12. (*) Working formally (i.e., without worrying about convergence of infinite series, assuming term-by-term operations work, etc.), prove that d'Alembert's formula (11.3.36) gives the same solution as

    u(x, t) = f̂(0) + ĝ(0)t + ∑_{n≠0} (f̂(n)eₙ(x) cos(2πnt) + (ĝ(n)/(2πn)) eₙ(x) sin(2πnt)).    (11.3.37)

11.3.13. (*) Suppose f, g ∈ L²(S¹), and let

    u₀(x, t) = ∑_{n∈𝐙} f̂(n)eₙ(x) cos(2πnt) + ∑_{n≠0} (ĝ(n)/(2πn)) eₙ(x) sin(2πnt).    (11.3.38)

(a) Prove that ‖u₀(x, t)‖ ≤ ‖f(x)‖ + ‖g(x)‖, where ‖u₀(x, t)‖ is calculated in L²ₓ(S¹). In particular, u₀(x, t) converges in L²ₓ(S¹).
(b) Same, but in L²ₜ(S¹).

11.4 Boundary value problems

We next turn to a situation that, in many ways, represents the most "applied" of the various ways in which we will solve the heat and wave equations: namely, on a closed and bounded interval, with various boundary conditions. For reasons that will become clear, we will stick to the interval [0, 1/2], but to be sure, our results hold for any interval [a, b] after suitable scaling and translation. We first restate the heat and wave equations as boundary value problems, making Question 9.1.3 precise.

Question 11.4.1. Given an initial value f(x) ∈ L²([0, 1/2]), find u(x, t) (t ≥ 0) such that:

(1) (Differentiable) For fixed t₀ > 0, u(x, t₀) ∈ C²ₓ((0, 1/2)), and for fixed x₀ ∈ (0, 1/2), u(x₀, t) ∈ C¹ₜ((0, +∞)).

(2) (Initial value) For any x ∈ [0, 1/2], lim_{t→0⁺} u(x, t) = f(x).

(3) (PDE) For all t > 0 and x ∈ (0, 1/2),

    −∂²u/∂x² = −∂u/∂t.    (11.4.1)

(4) (Boundary values) u(x, t) satisfies one of the following sets of boundary conditions:

(a) Dirichlet boundary conditions: u(0, t) = u(1/2, t) = 0 for all t.
(b) Neumann boundary conditions: (∂u/∂x)(0, t) = (∂u/∂x)(1/2, t) = 0 for all t.

Question 11.4.2. Given initial values f(x), g(x) ∈ L²([0, 1/2]), find u(x, t) (t ≥ 0) such that:

(1) (Differentiable) For fixed t₀ > 0, u(x, t₀) ∈ C²ₓ((0, 1/2)), and for fixed x₀ ∈ (0, 1/2), u(x₀, t) ∈ C²ₜ((0, +∞)).

(2) (Initial value) For any x ∈ [0, 1/2],

    lim_{t→0⁺} u(x, t) = f(x),    lim_{t→0⁺} (∂u/∂t)(x, t) = g(x).    (11.4.2)

(3) (PDE) For all t > 0,

    −∂²u/∂x² = −∂²u/∂t².    (11.4.3)

(4) (Boundary values) u(x, t) satisfies one of the following sets of boundary conditions:

(a) Dirichlet boundary conditions: u(0, t) = u(1/2, t) = 0 for all t.
(b) Neumann boundary conditions: (∂u/∂x)(0, t) = (∂u/∂x)(1/2, t) = 0 for all t.

As in Sections 11.1 and 11.3, to solve the heat and wave equation boundary value problems fully, we would begin by finding a formal solution and consider regularity later. However, since the regularity proofs for the boundary value problems greatly resemble the proofs we have seen previously, to avoid repetition, we restrict our attention to the formal problem and leave regularity to the reader (perhaps in another text or class).

As previously discussed, solving the heat and wave boundary value problems (Questions 11.4.1 and 11.4.2) formally is another application of the eigenbasis method (Section 11.2). The principal new content lies in expressing an arbitrary f ∈ L²([0, 1/2]) in terms of an eigenbasis corresponding to the desired boundary conditions. Toward this end, for f ∈ L²([0, 1/2]), recall (Subsection 6.3.3) that the even extension and odd extension of f are, respectively, the functions f_even, f_odd ∈ L²(S¹) given by

    f_even(x) = { f(x)    if 0 ≤ x ≤ 1/2,
                  f(−x)   if −1/2 ≤ x < 0,

    f_odd(x) =  { f(x)    if 0 < x < 1/2,
                  0       if x = 0,
                  −f(−x)  if −1/2 < x < 0,
                  0       if x = ±1/2.    (11.4.4)

Recall also (Theorem 6.3.6) that the (complex) Fourier series of the even and odd extensions of f become cosine and sine series, respectively:

    f_even(x) = ∑_{n∈𝐙} f̂_even(n)eₙ(x) = a₀/2 + ∑_{n=1}^∞ aₙ cos(2πnx),    (11.4.5)

    f_odd(x) = ∑_{n∈𝐙} f̂_odd(n)eₙ(x) = ∑_{n=1}^∞ bₙ sin(2πnx),    (11.4.6)

where

    aₙ = 4 ∫₀^{1/2} f(x) cos(2πnx) dx,    bₙ = 4 ∫₀^{1/2} f(x) sin(2πnx) dx,    (11.4.7)

and convergence in L² holds by Theorem 8.1.1. We also recall that by (6.3.7)–(6.3.9), the sets ℬ_even = {cos(2πnx) ∣ n ≥ 0} and ℬ_odd = {sin(2πnx) ∣ n ≥ 1} are each orthogonal (but not orthonormal) sets of nonzero vectors. Finally, we observe that if a series converges to f in L²(S¹), it must also converge to f in L²([0, 1/2]), as

    ∫₀^{1/2} |f(x) − fₙ(x)|² dx ≤ ∫_{−1/2}^{1/2} |f(x) − fₙ(x)|² dx.    (11.4.8)

Therefore, by the definition of orthogonal basis, we see that:

Corollary 11.4.3. Each of the sets

    ℬ_even = {cos(2πnx) ∣ n ≥ 0},    (11.4.9)
    ℬ_odd = {sin(2πnx) ∣ n ≥ 1}    (11.4.10)

is an orthogonal basis for L²([0, 1/2]).

Next, we consider the operator Δ(f) = −f″ in L²([0, 1/2]), with one of the following domains (Example 10.2.13):

    𝔇(Δ)_Dir = {f ∈ C²([0, 1/2]) ∣ f(0) = 0 = f(1/2)},    (11.4.11)
    𝔇(Δ)_Neu = {f ∈ C²([0, 1/2]) ∣ f′(0) = 0 = f′(1/2)}.    (11.4.12)

Then we see that:

Theorem 11.4.4. The set ℬ_odd = {sin(2πnx) ∣ n ≥ 1} is an eigenbasis for Δ with domain 𝔇(Δ)_Dir, and the set ℬ_even = {cos(2πnx) ∣ n ≥ 0} is an eigenbasis for Δ with domain 𝔇(Δ)_Neu.


Proof. Problems 11.4.1 and 11.4.2.

Therefore, if

    f(x) = a₀/2 + ∑_{n=1}^∞ aₙ cos(2πnx) = ∑_{n=1}^∞ bₙ sin(2πnx)    (11.4.13)

gives the cosine and sine expansions of f, respectively, with convergence in L², the eigenbasis method (Section 11.2) gives formal solutions

    u(x, t) = ∑_{n=1}^∞ bₙ sin(2πnx) e^{−4π²n²t},    (11.4.14)

    u(x, t) = a₀/2 + ∑_{n=1}^∞ aₙ cos(2πnx) e^{−4π²n²t}    (11.4.15)

for the heat equation on [0, 1/2], with Dirichlet and Neumann boundary conditions, respectively. Similarly, if

    f(x) = a₀/2 + ∑_{n=1}^∞ aₙ cos(2πnx) = ∑_{n=1}^∞ bₙ sin(2πnx),    (11.4.16)

    g(x) = c₀/2 + ∑_{n=1}^∞ cₙ cos(2πnx) = ∑_{n=1}^∞ dₙ sin(2πnx)    (11.4.17)

are the cosine and sine expansions of f and g, respectively, with convergence in L², the eigenbasis method gives formal solutions

    u(x, t) = ∑_{n=1}^∞ (bₙ sin(2πnx) cos(2πnt) + (dₙ/(2πn)) sin(2πnx) sin(2πnt)),    (11.4.18)

    u(x, t) = a₀/2 + (c₀/2)t + ∑_{n=1}^∞ (aₙ cos(2πnx) cos(2πnt) + (cₙ/(2πn)) cos(2πnx) sin(2πnt))    (11.4.19)

for the wave equation on [0, 1/2], with Dirichlet and Neumann boundary conditions, respectively.

Remark 11.4.5. As we discussed in Remark 6.3.7, it may strike some readers as strange that we can have initial values f, g, etc., that do not satisfy the desired boundary conditions but nonetheless produce solutions u(x, t) that do. The reason is that the sine/cosine series for f, g, etc., converge in L², but not necessarily pointwise, which means that, in effect, we change f and g at the boundary to get functions that satisfy the boundary conditions.

For example, if our problem involves Dirichlet boundary conditions and an initial value f ∈ L²([0, 1/2]) given by f(x) = 1, the corresponding sine series is

    f(x) ∼ ∑_{n=1}^∞ (2(1 − (−1)ⁿ)/(πn)) sin(2πnx).    (11.4.20)


(See Problem 11.4.3.) By Theorem 8.5.17, (11.4.20) converges to 1 for x ∈ (0, 1/2) and to 0 for x = 0 and x = 1/2; see Figure 11.4.1 for what this looks like.

Figure 11.4.1. Sine series force Dirichlet boundary conditions.

Figure 11.4.2. Cosine series force Neumann boundary conditions.


Similarly, if our problem involves Neumann boundary conditions and an initial value g ∈ L²([0, 1/2]) given by g(x) = x, the corresponding cosine series is

    g(x) ∼ 1/4 − ∑_{n=1}^∞ ((1 − (−1)ⁿ)/(π²n²)) cos(2πnx).    (11.4.21)

(See Problem 11.4.4.) In fact, (11.4.21) converges to x for x ∈ [0, 1/2] (Problem 11.4.5); more relevantly, since the term-by-term derivative of (11.4.21) is precisely (11.4.20), the "formal derivative" of g converges to 1 for x ∈ (0, 1/2) and to 0 for x = 0, 1/2. See Figure 11.4.2 for some intuition as to what this looks like.

Remark 11.4.6. As promised back in Section 1.2, we can now supply some mathematical details of what happens when one presses down at the exact midpoint of a string on a stringed instrument. In one (highly simplified) standard model of a stringed instrument, a string of length (say) 1/2 held fixed at both ends is modeled by the wave equation with Dirichlet boundary conditions u(0, t) = u(1/2, t) = 0 (left-hand side of Figure 11.4.3). It follows that the height of the string at time t must have the form

    u(x, t) = ∑_{n=1}^∞ (aₙ cos(2πnt) + bₙ sin(2πnt)) sin(2πnx)    (11.4.22)

for some coefficients aₙ, bₙ ∈ 𝐑.

Figure 11.4.3. Pressing halfway on a string suppresses odd harmonics.

Pressing the string at the halfway point x = 1/4 (right-hand side of Figure 11.4.3) imposes an extra "boundary" condition u(1/4, t) = 0, which suppresses the odd harmonics n = 2k + 1, leaving the waveform

    u(x, t) = ∑_{k=1}^∞ (a₂ₖ cos(2π(2k)t) + b₂ₖ sin(2π(2k)t)) sin(2π(2k)x).    (11.4.23)

In particular, the lowest remaining frequency is n = 2(1) = 2, twice the usual ground frequency, which a musician hears as being an "octave up." Moreover, removing the odd harmonics that make the string sound more complex also results in a purer tone quality.
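The boundary-forcing behavior of Remark 11.4.5 can be seen numerically. The sketch below (our own illustration, not part of the text) sums the sine series (11.4.20) for f(x) = 1: every partial sum vanishes identically at x = 0 and x = 1/2, while at interior points the partial sums creep toward 1:

```python
import math

def sine_series_partial(x, N):
    """Partial sum of (11.4.20): sum over n = 1..N of
    (2(1 - (-1)^n)/(pi*n)) * sin(2*pi*n*x); only odd n contribute."""
    return sum(2.0 * (1 - (-1) ** n) / (math.pi * n) * math.sin(2 * math.pi * n * x)
               for n in range(1, N + 1))
```

Convergence at interior points is slow (the coefficients decay only like 1/n), and near the endpoints the partial sums overshoot, which is the familiar Gibbs phenomenon for the discontinuous odd extension of f.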

Problems.

11.4.1. (Proves Theorem 11.4.4) Prove that ℬ_odd = {sin(2πnx) ∣ n ≥ 1} is an eigenbasis for Δ with domain 𝔇(Δ)_Dir.

11.4.2. (Proves Theorem 11.4.4) Prove that ℬ_even = {cos(2πnx) ∣ n ≥ 0} is an eigenbasis for Δ with domain 𝔇(Δ)_Neu.


11.4.3. Define f ∈ L²([0, 1/2]) by f(x) = 1. Prove that the sine series of f is given by

    ∑_{n=1}^∞ (2(1 − (−1)ⁿ)/(πn)) sin(2πnx).    (11.4.24)

11.4.4. Define g ∈ L²([0, 1/2]) by g(x) = x. Prove that the cosine series of g is given by

    1/4 − ∑_{n=1}^∞ ((1 − (−1)ⁿ)/(π²n²)) cos(2πnx).    (11.4.25)

11.4.5. Prove that the cosine series of g(x) = x on [0, 1/2] converges uniformly to g on [0, 1/2].
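The eigenfunction claims behind Problems 11.4.1 and 11.4.2 admit a quick numerical sanity check (our own sketch, not part of the text): each sin(2πnx) vanishes at 0 and 1/2, each cos(2πnx) has vanishing derivative there, and a centered second difference recovers Δφ = −φ″ = 4π²n²φ at a sample point:

```python
import math

def rayleigh_quotient(phi, x, h=1e-5):
    """Approximate (-phi'')(x) / phi(x) by a centered second difference;
    for an eigenfunction of Delta = -d^2/dx^2 this should be close to the eigenvalue."""
    minus_second = -(phi(x + h) - 2.0 * phi(x) + phi(x - h)) / h ** 2
    return minus_second / phi(x)

def sin_mode(n):
    return lambda x: math.sin(2 * math.pi * n * x)

def cos_mode(n):
    return lambda x: math.cos(2 * math.pi * n * x)
```

The ratios come out close to 4π²n² for both families, matching the eigenvalues used throughout (11.4.14)–(11.4.19).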

11.5 Legendre polynomials

In this section, we consider the set ℬ of Legendre polynomials, which have applications in a number of physical problems, including the Laplacian in spherical coordinates (see Dym and McKean [DM85, 4.12]). However, reversing our usual practice of starting with a differential equation and finding an eigenbasis we can use to solve it, we start with ℬ and find a differential equation we can use to prove that ℬ is an orthogonal basis for L²([−1, 1]), or more precisely, an eigenbasis for a particular differential operator. We begin by defining the polynomials in question.

Definition 11.5.1. For n ≥ 0, the nth Legendre polynomial Pₙ(x) is defined to be

    Pₙ(x) = (1/(2ⁿn!)) (d/dx)ⁿ ((x² − 1)ⁿ).    (11.5.1)

We have the following immediate observation.

Theorem 11.5.2. The Legendre polynomial Pₙ(x) is a polynomial of degree n with leading coefficient (2n)!/(2ⁿ(n!)²).

Proof. Problem 11.5.1.

For example, the first six Legendre polynomials are:

    P₀(x) = 1,    P₁(x) = x,
    P₂(x) = (1/2)(3x² − 1),    P₃(x) = (1/2)(5x³ − 3x),
    P₄(x) = (1/8)(35x⁴ − 30x² + 3),    P₅(x) = (1/8)(63x⁵ − 70x³ + 15x).    (11.5.2)
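Definition 11.5.1 can be implemented directly with coefficient lists (our own sketch, not part of the text): expand (x² − 1)ⁿ by the binomial theorem, differentiate n times term by term, and divide by 2ⁿn!. The result reproduces the table (11.5.2), the leading coefficient of Theorem 11.5.2, and, numerically, orthogonality on [−1, 1]:

```python
from math import comb, factorial

def differentiate(coeffs):
    """d/dx on a polynomial stored as [c0, c1, ...], c_k the coefficient of x^k."""
    return [k * coeffs[k] for k in range(1, len(coeffs))]

def legendre(n):
    """P_n(x) via Rodrigues' formula (11.5.1)."""
    coeffs = [0] * (2 * n + 1)
    for k in range(n + 1):
        coeffs[2 * k] = comb(n, k) * (-1) ** (n - k)  # binomial expansion of (x^2 - 1)^n
    for _ in range(n):
        coeffs = differentiate(coeffs)
    scale = 2 ** n * factorial(n)
    return [c / scale for c in coeffs]

def inner(p, q, steps=2000):
    """Midpoint-rule approximation of the L^2([-1, 1]) inner product of two real polynomials."""
    val = lambda c, x: sum(ck * x ** k for k, ck in enumerate(c))
    h = 2.0 / steps
    return sum(val(p, -1.0 + (i + 0.5) * h) * val(q, -1.0 + (i + 0.5) * h) * h
               for i in range(steps))
```

For instance, `legendre(2)` returns the coefficient list of (1/2)(3x² − 1), and `inner(legendre(m), legendre(n))` is numerically negligible for m ≠ n.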

Our first main task is to show that {Pₙ(x)} is an orthogonal subset of L²([−1, 1]). As promised, our proof uses the following operator.

Definition 11.5.3. We define an operator L in L²([−1, 1]) with domain 𝔇(L) = C²([−1, 1]) by the formula

    L(f) = (d/dx)((1 − x²)(df/dx)) = (1 − x²)f″ − 2xf′.    (11.5.3)


To use the theory of Chapter 10 fully, we need to know that:

Theorem 11.5.4. The operator L of Definition 11.5.3 is Hermitian.

Proof. Problem 11.5.2.

We next digress slightly to make an algebraic observation about multiplication by x and differentiation, considered as operators.

Lemma 11.5.5. Let I be an interval in 𝐑 (possibly I = 𝐑), and suppose f ∈ Cⁿ(I) for some n ≥ 1. Then
$$\left(\frac{d}{dx}\right)^n (xf(x)) = x\left(\frac{d}{dx}\right)^n(f(x)) + n\left(\frac{d}{dx}\right)^{n-1}(f(x)), \tag{11.5.4}$$
$$\left(\frac{d}{dx}\right)^n\bigl((x^2-1)f(x)\bigr) = (x^2-1)\left(\frac{d}{dx}\right)^n(f(x)) + 2nx\left(\frac{d}{dx}\right)^{n-1}(f(x)) + n(n-1)\left(\frac{d}{dx}\right)^{n-2}(f(x)), \tag{11.5.5}$$
where (11.5.4) holds for n ≥ 1 and (11.5.5) holds for n ≥ 2.

In other words, if we consider d/dx and X(f(x)) = xf(x) as operators, then (11.5.4) and (11.5.5) become
$$\left(\frac{d}{dx}\right)^n X = X\left(\frac{d}{dx}\right)^n + n\left(\frac{d}{dx}\right)^{n-1}, \tag{11.5.6}$$
$$\left(\frac{d}{dx}\right)^n (X^2-1) = (X^2-1)\left(\frac{d}{dx}\right)^n + 2nX\left(\frac{d}{dx}\right)^{n-1} + n(n-1)\left(\frac{d}{dx}\right)^{n-2}, \tag{11.5.7}$$
where multiplication denotes composition of operators.

Proof. Problem 11.5.3.

Orthogonality then comes from the following result.

Theorem 11.5.6. For n ≥ 0, the nth Legendre polynomial P_n(x) is an eigenvector of the operator L of Definition 11.5.3 with eigenvalue −n(n+1). Consequently, {P_n(x)} is an orthogonal subset of L²([−1, 1]).

Proof. Problem 11.5.4.

To finish the proof that ℬ = {P_n(x)} is an orthogonal basis (Definition 7.3.14), it remains to show that the generalized Fourier series of any f ∈ L²([−1, 1]) with respect to ℬ (Definition 7.3.9) converges to f in L². Again, we first require an algebraic digression.

Lemma 11.5.7. Suppose that for each n ≥ 0, p_n(x) is a polynomial of degree n. Then any polynomial q(x) of degree N can be expressed as a linear combination of p_0, …, p_N; in other words, for every polynomial q(x) of degree N, there exist a_n ∈ 𝐂 such that
$$q(x) = \sum_{n=0}^{N} a_n p_n(x). \tag{11.5.8}$$


Chapter 11. Eigenbases and differential equations

Proof. Problem 11.5.5.

We come to the main result of this section.

Theorem 11.5.8. The set ℬ = {P_n(x)} of Legendre polynomials is an orthogonal basis for L²([−1, 1]).

Proof. Comparing Definition 7.3.14 and Theorem 11.5.6, we see that it remains to show that for any f ∈ L²([−1, 1]), the generalized Fourier series of f with respect to ℬ converges to f in L². This is proved in Problem 11.5.6.

In principle, we now know that ℬ = {P_n(x)} is an orthogonal basis for L²([−1, 1]). However, to use ℬ for calculations, we need to know ‖P_n(x)‖. We therefore conclude this section with the following result.

Theorem 11.5.9. We have that
$$\langle P_n, P_n\rangle = \frac{2}{2n+1}.$$

Proof. Problem 11.5.7.

Remark 11.5.10. We should mention that in other sources, our definition of Legendre polynomials (Definition 11.5.1) may appear as a theorem known as Rodrigues's formula. Legendre polynomials can alternatively be defined as polynomial eigenfunctions of L (Holland [Hol07, 2.8]) or as coefficients of the power series expansion
$$\frac{1}{\sqrt{1-2xt+t^2}} = \sum_{n=0}^{\infty} P_n(x)t^n. \tag{11.5.9}$$
The latter approach occurs naturally when studying eigenfunctions of the Laplacian in spherical coordinates (Dym and McKean [DM85, 4.12]). Legendre polynomials can also be obtained by applying the Gram-Schmidt orthogonalization process to the set {xⁿ} (Dym and McKean [DM85, 1.3]).
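For small n, the eigenvalue equation of Theorem 11.5.6, the orthogonality of the P_n, and the norm of Theorem 11.5.9 can all be verified exactly by symbolic computation. The following sketch assumes Python's sympy library (not something the text itself uses):

```python
import sympy as sp

x = sp.symbols('x')

def P(n):
    # Formula (11.5.1)
    return sp.diff((x**2 - 1)**n, x, n) / (2**n * sp.factorial(n))

for n in range(4):
    # Theorem 11.5.6: L(P_n) = -n(n+1) P_n for L(f) = ((1 - x^2) f')'
    LPn = sp.diff((1 - x**2) * sp.diff(P(n), x), x)
    assert sp.expand(LPn + n*(n + 1)*P(n)) == 0
    for m in range(4):
        # <P_m, P_n> = 0 for m != n; <P_n, P_n> = 2/(2n+1)  (Theorem 11.5.9)
        ip = sp.integrate(sp.expand(P(m) * P(n)), (x, -1, 1))
        assert ip == (sp.Rational(2, 2*n + 1) if m == n else 0)
```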

Problems.

11.5.1. (*) (Proves Theorem 11.5.2) Prove that the nth Legendre polynomial
$$P_n(x) = \frac{1}{2^n\,n!}\left(\frac{d}{dx}\right)^n\bigl((x^2-1)^n\bigr) \tag{11.5.10}$$
is a polynomial of degree n with leading coefficient (2n)!/(2ⁿ(n!)²).

11.5.2. (*) (Proves Theorem 11.5.4) Recall that the inner product on L²([−1, 1]) is given by
$$\langle f, g\rangle = \int_{-1}^{1} f(x)\overline{g(x)}\,dx. \tag{11.5.11}$$
Prove that the operator
$$L(f) = \frac{d}{dx}\left((1-x^2)\frac{df}{dx}\right) = (1-x^2)f'' - 2xf' \tag{11.5.12}$$
with domain 𝔇(L) = C²([−1, 1]) is Hermitian.


11.5.3. (*) (Proves Lemma 11.5.5) Consider d/dx and X(f(x)) = xf(x) as operators, and let multiplication denote composition of operators.
(a) For n ≥ 1, prove that
$$\left(\frac{d}{dx}\right)^n X = X\left(\frac{d}{dx}\right)^n + n\left(\frac{d}{dx}\right)^{n-1}. \tag{11.5.13}$$
(b) For n ≥ 2, prove that
$$\left(\frac{d}{dx}\right)^n (X^2-1) = (X^2-1)\left(\frac{d}{dx}\right)^n + 2nX\left(\frac{d}{dx}\right)^{n-1} + n(n-1)\left(\frac{d}{dx}\right)^{n-2}. \tag{11.5.14}$$

11.5.4. (*) (Proves Theorem 11.5.6) Consider the operator L from (11.5.12).
(a) Check by direct calculation that P_0(x) = 1 and P_1(x) = x are eigenfunctions of L.
(b) Check by direct calculation that
$$(x^2-1)\frac{d}{dx}(x^2-1)^n = 2nx(x^2-1)^n. \tag{11.5.15}$$
(c) For n ≥ 2, by differentiating both sides of (11.5.15) n + 1 times, prove that
$$y_n = \left(\frac{d}{dx}\right)^n (x^2-1)^n \tag{11.5.16}$$
is an eigenvector of L with eigenvalue −n(n+1), and prove that P_n(x) = y_n/(2ⁿn!) is also an eigenvector of L with eigenvalue −n(n+1).
(d) Prove that {P_n(x)} is an orthogonal subset of L²([−1, 1]).

11.5.5. (*) (Proves Lemma 11.5.7) Suppose that for each n ≥ 0, p_n(x) is a polynomial of degree n. Prove that for every polynomial q(x) of degree N, there exist a_n ∈ 𝐂 such that
$$q(x) = \sum_{n=0}^{N} a_n p_n(x). \tag{11.5.17}$$

11.5.6. (*) (Proves Theorem 11.5.8) Let ℬ = {P_n(x)} be the set of Legendre polynomials, and suppose f ∈ L²([−1, 1]). Prove that
$$\lim_{N\to\infty}\sum_{n=1}^{N}\hat{f}(n)u_n = f. \tag{11.5.18}$$

11.5.7. (*) (Proves Theorem 11.5.9) The goal of this problem is to calculate ⟨P_n, P_n⟩, where again,
$$P_n(x) = \frac{1}{2^n\,n!}\left(\frac{d}{dx}\right)^n\bigl((x^2-1)^n\bigr). \tag{11.5.19}$$
(a) Prove that for 0 ≤ k ≤ n − 1, ⟨xᵏ, P_n⟩ = 0.
(b) Prove that for 0 ≤ k ≤ n − 1, $\left(\frac{d}{dx}\right)^k\bigl((x^2-1)^n\bigr) = q_k(x)(1-x^2)^{n-k}$, where q_k(x) is a polynomial.
(c) Prove that ⟨P_n, P_n⟩ = ⟨a_n xⁿ, P_n⟩, where a_n xⁿ is the leading term of P_n(x).

(d) For n ≥ 0, let $A_n = \int_0^1 (1-x^2)^n\,dx$. Prove that $A_n = \left(\frac{2n}{2n+1}\right)A_{n-1}$ for n ≥ 1, and find a general formula for A_n.
(e) Prove that $\langle P_n, P_n\rangle = \frac{2}{2n+1}$.

11.6 Hermite functions

In this section, we introduce the eigenbasis of Hermite functions {h_n(x)} for a differential operator in L²(𝐑) coming from Schrödinger's equation (9.2.8) for the quantum harmonic oscillator (see Section 9.2). As we shall see, the orthogonal basis {h_n(x)} is not just useful for solving Schrödinger's equation, to which we return in Section 11.7, but also turns out to be an eigenbasis for the Fourier transform, the subject of Chapters 12 and 13.

As in the previous section, we begin by defining the functions in question. In keeping with our choices of conventions and constants, we follow the presentation and somewhat unusual conventions of Hermite functions and polynomials from Dym and McKean [DM85, 2.5]; see Remark 11.6.11 for a comparison with other definitions.

Definition 11.6.1. For n ≥ 0, the nth Hermite function h_n(x) is defined to be
$$h_n(x) = \frac{(-1)^n}{n!}\,e^{\pi x^2}\left(\frac{d}{dx}\right)^n e^{-2\pi x^2}. \tag{11.6.1}$$

We again begin with an immediate observation.

Theorem 11.6.2. The Hermite function h_n(x) satisfies
$$h_n(x) = H_n(x)e^{-\pi x^2}, \tag{11.6.2}$$
where H_n(x) is a polynomial of degree n with leading coefficient (4π)ⁿ/n!.

Proof. Problem 11.6.1.

Definition 11.6.3. We define the nth Hermite polynomial to be the polynomial H_n(x) described in Theorem 11.6.2.

For example, the first six Hermite polynomials are
$$\begin{aligned}
H_0(x) &= 1, & H_3(x) &= \tfrac{1}{3!}(64\pi^3x^3 - 48\pi^2x),\\
H_1(x) &= 4\pi x, & H_4(x) &= \tfrac{1}{4!}(256\pi^4x^4 - 384\pi^3x^2 + 48\pi^2),\\
H_2(x) &= \tfrac{1}{2!}(16\pi^2x^2 - 4\pi), & H_5(x) &= \tfrac{1}{5!}(1024\pi^5x^5 - 2560\pi^4x^3 + 960\pi^3x).
\end{aligned} \tag{11.6.3}$$

As mentioned above, our main application of Hermite functions in this chapter is to study the following differential operator.

Definition 11.6.4. We define the operator K in L²(𝐑) with domain 𝔇(K) = 𝒮(𝐑) (the space of Schwartz functions from Section 4.7) by
$$K(f) = -\frac{d^2f}{dx^2} + 4\pi^2x^2 f. \tag{11.6.4}$$
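Theorem 11.6.2 and the list (11.6.3) can be spot-checked symbolically. The sketch below assumes Python's sympy library (an assumption, not a tool the text uses); it strips the Gaussian factor from h_n and inspects the resulting polynomial:

```python
import sympy as sp

x = sp.symbols('x')

def H(n):
    # H_n(x) = h_n(x) e^{pi x^2} = ((-1)^n / n!) e^{2 pi x^2} (d/dx)^n e^{-2 pi x^2}
    expr = (-1)**n / sp.factorial(n) * sp.exp(2*sp.pi*x**2) \
           * sp.diff(sp.exp(-2*sp.pi*x**2), x, n)
    return sp.expand(sp.simplify(expr))

# Matches the examples in (11.6.3):
assert sp.simplify(H(1) - 4*sp.pi*x) == 0
assert sp.simplify(H(2) - (16*sp.pi**2*x**2 - 4*sp.pi)/2) == 0

# Degree n and leading coefficient (4 pi)^n / n!  (Theorem 11.6.2), for n = 4:
n = 4
p = sp.Poly(H(n), x)
assert p.degree() == n
assert sp.simplify(p.LC() - (4*sp.pi)**n / sp.factorial(n)) == 0
```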

Note that since the Schwartz space 𝒮(𝐑) contains all functions of the form p(x)e^{−ax²}, where p(x) is a polynomial and a > 0 (Theorem 4.7.4), the Hermite functions h_n(x) are contained in the domain of K. Note also that by Examples 10.2.6 and 10.2.12 and Theorem 10.2.15, K is Hermitian.

In any case, as with the Legendre polynomials, our first main task is to prove that the h_n(x) are eigenfunctions of K, which we do using the following formulas.

Lemma 11.6.5. Setting h₋₁(x) = 0, for n ≥ 0, the Hermite functions h_n(x) satisfy the following identities:
$$\left(\frac{d}{dx} - 2\pi x\right)h_n = h_n' - 2\pi xh_n = -(n+1)h_{n+1}, \tag{11.6.5}$$
$$\left(\frac{d}{dx} + 2\pi x\right)h_n = h_n' + 2\pi xh_n = 4\pi h_{n-1}. \tag{11.6.6}$$

Consequently, if X(f(x)) = xf(x), we sometimes call the operator d/dx − 2πX a raising operator, and the operator d/dx + 2πX a lowering operator.

Proof. Problems 11.6.2 and 11.6.3.

Theorem 11.6.6. For n ≥ 0, the nth Hermite function h_n(x) is an eigenfunction of the operator K of Definition 11.6.4 with eigenvalue 4π(n + 1/2). Consequently, {h_n(x)} is an orthogonal subset of L²(𝐑).

Proof. Problem 11.6.4.
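The ladder identities (11.6.5)–(11.6.6) and the eigenvalue equation of Theorem 11.6.6 can be verified symbolically for small n. The sketch below assumes Python's sympy library (not part of the text):

```python
import sympy as sp

x = sp.symbols('x')

def h(n):
    # Definition 11.6.1; h_{-1} = 0 by convention
    if n < 0:
        return sp.Integer(0)
    return sp.simplify((-1)**n / sp.factorial(n) * sp.exp(sp.pi*x**2)
                       * sp.diff(sp.exp(-2*sp.pi*x**2), x, n))

for n in range(4):
    hn = h(n)
    # (11.6.5): h_n' - 2 pi x h_n = -(n+1) h_{n+1}   (raising)
    assert sp.simplify(sp.diff(hn, x) - 2*sp.pi*x*hn + (n + 1)*h(n + 1)) == 0
    # (11.6.6): h_n' + 2 pi x h_n = 4 pi h_{n-1}     (lowering)
    assert sp.simplify(sp.diff(hn, x) + 2*sp.pi*x*hn - 4*sp.pi*h(n - 1)) == 0
    # Theorem 11.6.6: K(h_n) = 4 pi (n + 1/2) h_n
    Khn = -sp.diff(hn, x, 2) + 4*sp.pi**2*x**2*hn
    assert sp.simplify(Khn - 4*sp.pi*(n + sp.Rational(1, 2))*hn) == 0
```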

It therefore remains to show that for any f ∈ L²(𝐑), the generalized Fourier series of f with respect to {h_n(x)} converges to f. However, we will be able to prove this fact more naturally once we have the Fourier transform available to us, so we will delay the proof of the following theorem until Section 13.4.

Theorem 11.6.7. The set ℬ = {h_n(x)} of Hermite functions is an orthogonal basis for L²(𝐑); moreover, it is an eigenbasis for the operator K of Definition 11.6.4.

Remark 11.6.8. Recall that 𝒮(𝐑) contains all functions of the form p(x)e^{−ax²}, where p(x) is a polynomial and a > 0 (Theorem 4.7.4). Actually, Theorem 11.6.7 implies that such functions are dense in L²(𝐑), so we may think of functions p(x)e^{−ax²} as being, in some sense, typical elements of 𝒮(𝐑), or more loosely speaking, (approximately) typical elements of L²(𝐑).

Again, it will be helpful for calculations to know the values of ‖h_n‖:

Theorem 11.6.9. We have that
$$\langle h_n, h_n\rangle = \frac{(4\pi)^n}{\sqrt{2}\,n!}.$$

Proof. Problem 11.6.5.

Definition 11.6.10. We define the normalized Hermite functions ψ_n(x) to be
$$\psi_n(x) = \frac{2^{1/4}\sqrt{n!}}{(4\pi)^{n/2}}\,h_n(x). \tag{11.6.7}$$


Note that Theorems 11.6.7 and 11.6.9 together imply that {ψ_n(x)} is an orthonormal basis for L²(𝐑).

Remark 11.6.11. As mentioned at the beginning of this section, our definition of the Hermite polynomial includes some unusual choices. For comparison, two more common definitions for the Hermite polynomials (see [DLM, Sect. 18]) are
$$H1_n(x) = (-1)^n e^{x^2/2}\left(\frac{d}{dx}\right)^n e^{-x^2/2}, \tag{11.6.8}$$
$$H2_n(x) = (-1)^n e^{x^2}\left(\frac{d}{dx}\right)^n e^{-x^2}. \tag{11.6.9}$$
These polynomials H1_n(x) and H2_n(x) are related to our polynomials H_n(x) by
$$H1_n(x) = \frac{n!}{(4\pi)^{n/2}}\,H_n\!\left(\frac{x}{\sqrt{4\pi}}\right), \tag{11.6.10}$$
$$H2_n(x) = \frac{n!}{(2\pi)^{n/2}}\,H_n\!\left(\frac{x}{\sqrt{2\pi}}\right). \tag{11.6.11}$$

Problems.

11.6.1. (*) (Proves Theorem 11.6.2) Prove that the nth Hermite function
$$h_n(x) = \frac{(-1)^n}{n!}\,e^{\pi x^2}\left(\frac{d}{dx}\right)^n e^{-2\pi x^2} \tag{11.6.12}$$
has the form h_n(x) = H_n(x)e^{−πx²}, where H_n(x) is a polynomial of degree n with leading coefficient (4π)ⁿ/n!.

11.6.2. (*) (Proves Lemma 11.6.5) Prove that for n ≥ 0, the Hermite functions h_n(x) satisfy h_n′ − 2πxh_n = −(n+1)h_{n+1}.

11.6.3. (*) (Proves Lemma 11.6.5) Let h₋₁(x) = 0.
(a) Prove that for n ≥ 1,
$$4\pi x\left(\frac{d}{dx}\right)^n\!\left(e^{-2\pi x^2}\right) = -\left(\frac{d}{dx}\right)^{n+1}\!\left(e^{-2\pi x^2}\right) - 4\pi n\left(\frac{d}{dx}\right)^{n-1}\!\left(e^{-2\pi x^2}\right). \tag{11.6.13}$$
(b) Prove that for n ≥ 0, h_n′ + 2πxh_n = 4πh_{n−1}.

11.6.4. (*) (Proves Theorem 11.6.6) Let h_n(x) be the nth Hermite function, and let
$$K(f) = -\frac{d^2f}{dx^2} + 4\pi^2x^2 f. \tag{11.6.14}$$
(a) Prove that h_n(x) is an eigenfunction of K with eigenvalue 4π(n + 1/2).
(b) Prove that {h_n(x)} is an orthogonal subset of L²(𝐑).

11.6.5. (*) (Proves Theorem 11.6.9) The goal of this problem is to calculate ⟨h_n, h_n⟩.
(a) Prove that for 0 ≤ k ≤ n − 1, ⟨xᵏe^{−πx²}, h_n⟩ = 0.
(b) Prove that ⟨h_n, h_n⟩ = ⟨a_n xⁿe^{−πx²}, h_n⟩, where a_n xⁿ is the leading term of H_n(x).
(c) Prove that $\langle h_n, h_n\rangle = \frac{a_n}{\sqrt{2}}$, starting with the fact that $\int_{-\infty}^{\infty} e^{-\pi x^2}\,dx = 1$ (Theorem 4.8.6).


11.7 The quantum harmonic oscillator

With the Hermite functions in hand, we now return to the solution of Schrödinger's equation for the quantum harmonic oscillator (Question 9.2.2), restated here more precisely.

Question 11.7.1. Given an initial value f(x) ∈ L²(𝐑), find Ψ(x, t) (t > 0) such that:

(1) (Differentiable) For fixed t₀ > 0, Ψ(x, t₀) is a twice differentiable function on 𝐑, and for fixed x₀ ∈ 𝐑, Ψ(x₀, t) is differentiable on (0, +∞).

(2) (Initial value) For any x ∈ 𝐑, lim_{t→0⁺} Ψ(x, t) = f(x).

(3) (PDE) For all t > 0,
$$-\frac{\partial^2\Psi}{\partial x^2} + 4\pi^2x^2\Psi = i\frac{\partial\Psi}{\partial t}. \tag{11.7.1}$$

Note that (11.7.1) can be written as
$$K(\Psi) = i\frac{\partial\Psi}{\partial t}, \tag{11.7.2}$$

where K is the operator K(f) = −d²f/dx² + 4π²x²f from Definition 11.6.4.

Again confining our interests to formal solutions, as we saw in Section 11.6, the normalized Hermite functions {ψ_n(x)} form an orthonormal eigenbasis for K, where ψ_n has the eigenvalue 4π(n + 1/2). In particular, for any f ∈ L²(𝐑),
$$f(x) = \sum_{n=0}^{\infty}\hat{f}(n)\psi_n(x), \tag{11.7.3}$$
where $\hat{f}(n) = \langle f, \psi_n\rangle$ is the generalized Fourier coefficient and convergence is in L². The eigenbasis method (Section 11.2), in the case T(Ψ) = i ∂Ψ/∂t, then gives the solution
$$\Psi(x,t) = \sum_{n=0}^{\infty}\hat{f}(n)\psi_n(x)e^{-4\pi i(n+1/2)t} \tag{11.7.4}$$

to Question 11.7.1, as the reader should reverify (Problem 11.7.1).

Having solved Schrödinger's equation, we turn to what in many ways is a much more difficult question: What does the state function Ψ mean? In fact, this question is not at all straightforward and is even still, as of this writing, a matter of quite some debate. To give one common interpretation, we first rephrase the question in the following way.

Question 11.7.2. If the state of a given particle is given by Ψ(x, t), what can we observe, or measure, about that particle?

The following axiom then provides a mathematically precise answer to Question 11.7.2. (See Section 14.4 for a further refinement.)

Axiom 11.7.3. An observable quantity of our system is represented by a Hermitian operator M in L²(𝐑). If {ψ_n} is an orthonormal eigenbasis for M with M(ψ_n) = λ_nψ_n


and the state of our particle is Ψ = Σ c_nψ_n, with ‖Ψ‖ = 1, then the only possible values of the observable quantity are the eigenvalues λ_n of M, and upon measurement, the state of the system collapses into the single state ψ_n corresponding to the observed value λ_n with probability |c_n|².

For example, the energy of the particle in the quantum harmonic oscillator is represented by the operator K, so if we try to measure the energy of a particle in the state Ψ(x, t) from (11.7.4), the particle will enter the state ψ_n with probability |f̂(n)|².

Axiom 11.7.3 has the following notable features:

• Perhaps the most famous, and most unsettling, aspect of this interpretation of quantum mechanics is the randomized nature of observation, famously described by Einstein as "God rolling dice." The question of whether this feature is desirable or defensible is beyond the scope of this book; we note only that the great majority of experiments involve taking averages over an enormous number of particles, and in such cases, a probabilistic approach gives useful predictions that have an excellent record of being verified by experiment.

• More concretely, the reader should verify that the collapsing event represents a genuine probability, or in other words, that the sum of the probabilities of all possible post-collapse states is actually 1; see Problem 11.7.2.

• Returning to the energy levels of an oxygen molecule (Question 9.2.1), Axiom 11.7.3 and Theorems 11.6.6 and 11.6.7 predict that (up to scaling constants) the only possible total energy levels of a quantum harmonic oscillator are 4π(n + 1/2). This quantization is the reason for the "quantum" in quantum mechanics and explains why we only observe discrete energy levels in oxygen.

For more about quantum mechanics, including a more general mathematical formulation of quantum mechanics and further references, see Sections 13.5 and 14.4.

Problems.

11.7.1. Working formally (i.e., assuming that all series converge and all operators commute with infinite sums), prove that
$$\Psi(x,t) = \sum_{n=0}^{\infty}\hat{f}(n)\psi_n(x)e^{-4\pi i(n+1/2)t} \tag{11.7.5}$$
is a solution to K(Ψ) = i ∂Ψ/∂t, where {ψ_n(x)} is an orthonormal eigenbasis for K.

11.7.2. Suppose the state of a particle at time t is given by
$$\Psi(x,t) = \sum_{n=0}^{\infty} c_n\psi_n(x)e^{-4\pi i(n+1/2)t}, \tag{11.7.6}$$
where ‖Ψ‖ = 1 and {ψ_n(x)} is an orthonormal eigenbasis for K. Prove that $\sum_{n=0}^{\infty}|c_n|^2 = 1$. (In other words, the collapse event of Axiom 11.7.3 is governed by a genuine probability distribution.)


11.8 Sturm-Liouville theory

In this chapter, we have used several different eigenbases to solve differential equations:

(1) {e_n(x)} for the Laplacian Δ = −d²/dx² in L²(S¹).

(2) {sin(2πnx)} for the Laplacian in L²([0, 1/2]) with Dirichlet boundary conditions.

(3) {cos(2πnx)} for the Laplacian in L²([0, 1/2]) with Neumann boundary conditions.

(4) {P_n(x)} (Legendre polynomials) for the operator L(f) = d/dx((1 − x²) df/dx) in L²([−1, 1]).

(5) {h_n(x)} (Hermite functions) for the operator K(f) = −d²f/dx² + 4π²x²f in L²(𝐑).

The reader may notice that all of the operators in question are some variation on the Laplacian; more precisely, they all have the following form.

Definition 11.8.1. Let X be either a closed (possibly infinite) interval in 𝐑 or S¹. We define a Sturmian operator to be an operator in L²(X) of the form
$$L(f(x)) = \frac{d}{dx}\left(p(x)\frac{df}{dx}\right) + r(x)f(x) \tag{11.8.1}$$
for some real-valued p ∈ C¹(X) and r ∈ C⁰(X). If X is closed and bounded and p(x) ≠ 0 for x ∈ X, we say that L is regular; otherwise, we say that L is singular.

Specifically, p(x) = −1 and r(x) = 0 gives the Laplacian; p(x) = (1 − x²) and r(x) = 0 gives the Legendre operator; and p(x) = −1 and r(x) = 4π²x² gives the operator associated with the Hermite functions.

A Sturmian operator L will be Hermitian under conditions on 𝔇(L) that are often met in practice. More precisely, in the case where X is a closed and bounded interval, we have:

Theorem 11.8.2. Let
$$L(f(x)) = \frac{d}{dx}\left(p(x)\frac{df}{dx}\right) + r(x)f(x) \tag{11.8.2}$$
be a Sturmian operator in L²([a, b]). Then L is Hermitian if and only if for every f, g ∈ 𝔇(L),
$$p(x)\left(f(x)\overline{g}'(x) - f'(x)\overline{g}(x)\right)\Big|_a^b = 0. \tag{11.8.3}$$

We can rewrite (11.8.3) using the Wronskian
$$W(f_1, f_2) = \det\begin{pmatrix} f_1(x) & f_2(x)\\ f_1'(x) & f_2'(x)\end{pmatrix} = f_1(x)f_2'(x) - f_2(x)f_1'(x). \tag{11.8.4}$$
In these terms, (11.8.3) becomes
$$p(x)W(f, \overline{g})\Big|_a^b = 0. \tag{11.8.5}$$
For more on the Wronskian, see Hartman [Har02, IV.8].
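To illustrate Theorem 11.8.2, the following sympy sketch (sympy is assumed; f and g below are arbitrary sample polynomials chosen for illustration) takes the Legendre choice p(x) = 1 − x², for which p(±1) = 0, and checks that both ⟨Lf, g⟩ − ⟨f, Lg⟩ and the boundary term p·W(f, g)|₋₁¹ vanish:

```python
import sympy as sp

x = sp.symbols('x')
a, b = -1, 1
p = 1 - x**2                 # Legendre operator: p(a) = p(b) = 0
r = sp.Integer(0)

def L(u):
    # Sturmian operator (11.8.1): L(u) = (p u')' + r u
    return sp.diff(p * sp.diff(u, x), x) + r*u

f = x**3 + 1                 # sample real functions in C^2([-1, 1])
g = x**2 - x

# <Lf, g> - <f, Lg>  (no conjugates needed: f and g are real)
lhs = sp.integrate(sp.expand(L(f)*g - f*L(g)), (x, a, b))

# Boundary term via the Wronskian (11.8.4)-(11.8.5)
W = f*sp.diff(g, x) - g*sp.diff(f, x)
boundary = (p*W).subs(x, b) - (p*W).subs(x, a)

assert lhs == 0 and boundary == 0   # Hermitian, as Theorem 11.8.2 predicts
```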


Note that (11.8.3) holds if 𝔇(L) satisfies Dirichlet or Neumann boundary conditions or if p(a) = p(b) = 0 (as is true for the Legendre operator). We also have the analogue of (11.8.3) if X = S¹, by periodicity, or if X = 𝐑 and 𝔇(L) = 𝒮(𝐑) (the Schwartz space), by taking limits.

Proof. Problem 11.8.1.

Sturm-Liouville theory studies the cases and conditions under which a Sturmian operator can be guaranteed to have an eigenbasis and also studies the resulting eigenbases. For example, the following result, whose proof is beyond the scope of this book, has a relatively straightforward statement.

Theorem 11.8.3. Let L be a regular (i.e., p(x) > 0) Sturmian operator in L²([a, b]), suppose that 𝔇(L) ⊆ C¹([a, b]), and suppose that any f ∈ 𝔇(L) satisfies the boundary conditions
$$\alpha_0 f(a) + \alpha_1 f'(a) = 0, \qquad \beta_0 f(b) + \beta_1 f'(b) = 0, \tag{11.8.6}$$
for fixed α_i, β_i ∈ 𝐑 such that (α₀, α₁) and (β₀, β₁) ≠ (0, 0). Then there exists an orthonormal eigenbasis {φ_n} for L with associated eigenvalues {−λ_n} such that the sequence λ_n is strictly increasing and lim_{n→∞} λ_n = +∞.

Note that the boundary conditions (11.8.6) imply that L is Hermitian (Problem 11.8.2). For a proof of Theorem 11.8.3 and much more about Sturm-Liouville theory, see Al-Gwaiz [AG08].

Problems.

11.8.1. Let
$$L(f(x)) = \frac{d}{dx}\left(p(x)\frac{df}{dx}\right) + r(x)f(x) \tag{11.8.7}$$
be a Sturmian operator in L²([a, b]). Prove that L is Hermitian if and only if for every f, g ∈ 𝔇(L),
$$p(x)\left(f(x)\overline{g}'(x) - f'(x)\overline{g}(x)\right)\Big|_a^b = 0. \tag{11.8.8}$$

11.8.2. Let L be a Sturmian operator in L²([a, b]), and suppose that for some α_i, β_i ∈ 𝐑 such that (α₀, α₁) and (β₀, β₁) ≠ (0, 0), we have
$$\alpha_0 f(a) + \alpha_1 f'(a) = 0, \qquad \beta_0 f(b) + \beta_1 f'(b) = 0, \tag{11.8.9}$$
for all f ∈ 𝔇(L). Prove that L is Hermitian.

Part 4

The Fourier transform and beyond

12 The Fourier transform

The integrals which we have obtained are not only general expressions which satisfy the differential equation, they represent in the most distinct manner the natural effect which is the object of the phenomenon. . . . [W]hen this condition is fulfilled, the integral is, properly speaking, the equation of the phenomenon; it expresses clearly the character and progress of it, in the same manner as the finite equation of a line or curved surface makes known all the properties of those forms.
— Joseph Fourier, The Analytical Theory of Heat

In this chapter, we introduce the Fourier transform, which one may view as the continuous analogue of Fourier series; that is, instead of Fourier coefficients f̂(n) for n ∈ 𝐙, we look at the transform f̂(γ) for γ ∈ 𝐑. After discussing some context (Section 12.1) and establishing fundamental tools (Section 12.2), we first establish the Fourier transform in the friendly confines of 𝒮(𝐑) (Sections 12.3 and 12.4) and then extend it to all of L²(𝐑) (Section 12.5).

12.1 The big picture

At this point, it seems appropriate to take stock, with the benefit of hindsight, of what we have done so far. In terms of overall theory, one way to look at Part 2 of this book is that we answered the following question:

Question 12.1.1. Given f ∈ L²(S¹), to what extent can we recover f from its Fourier coefficients
$$\hat{f}(n) = \langle f, e_n\rangle = \int_0^1 f(x)\overline{e_n(x)}\,dx? \tag{12.1.1}$$


As the reader may recall, Question 12.1.1 is not a question we initially set out to answer in Chapter 6, but it is a question for which we found a complete answer in the Inversion Theorem for Fourier Series (Theorem 8.1.1), which we repeat here in slightly different terms.

Theorem 12.1.2. If f ∈ L²(S¹), then $f = \sum_{n\in\mathbf{Z}}\hat{f}(n)e_n(x)$ in the L² metric. In particular, we can recover f completely from its Fourier coefficients f̂(n).

From a theoretical point of view, the Fourier transform is the analogue of the Fourier coefficient mapping f ↦ f̂(n) that we get when we replace S¹ with 𝐑 and 𝐙 with 𝐑. More precisely, consider the following question.

Question 12.1.3. Given f ∈ L²(𝐑), to what extent can we recover f from the function f̂ : 𝐑 → 𝐂 defined by
$$\hat{f}(\gamma) = \int_{-\infty}^{\infty} f(x)e^{-2\pi i\gamma x}\,dx? \tag{12.1.2}$$

The function f̂ defined by (12.1.2) is called the Fourier transform of f. Note that with the Fourier transform, instead of having coefficients f̂(n) that define a function on 𝐙, we have coefficients f̂(γ) that define a function on 𝐑. Reasoning analogously, since f̂(n) represents the "frequency response" of a wave of known period 1 in each discrete-valued frequency n ∈ 𝐙, we can think of f̂(γ) as representing the frequency response of a possibly nonperiodic wave in each continuously valued frequency γ ∈ 𝐑. In any case, we will eventually end up with much the same result:

Theorem 12.1.4 (Inversion Theorem for the Fourier Transform). If f ∈ L²(𝐑) and f̂(γ) is defined by (12.1.2), then f can be recovered from f̂(γ) by the inverse Fourier transform
$$f(x) = \int_{-\infty}^{\infty}\hat{f}(\gamma)e^{2\pi ix\gamma}\,d\gamma. \tag{12.1.3}$$
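A concrete instance that can be checked numerically: under the convention (12.1.2), the Gaussian f(x) = e^{−πx²} satisfies f̂(γ) = e^{−πγ²}, i.e., it is its own Fourier transform. The sketch below (Python with numpy, an assumption of this sketch and not part of the text) approximates (12.1.2) by a Riemann sum:

```python
import numpy as np

x = np.linspace(-10.0, 10.0, 40001)
dx = x[1] - x[0]
f = np.exp(-np.pi * x**2)

def f_hat(gamma):
    # Riemann-sum approximation of (12.1.2); the tails beyond |x| = 10 are negligible
    return np.sum(f * np.exp(-2j * np.pi * gamma * x)) * dx

for gamma in (0.0, 0.5, 1.3):
    assert abs(f_hat(gamma) - np.exp(-np.pi * gamma**2)) < 1e-8
```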

By now, the reader may be used to our delaying the proof of key results like the Inversion Theorem until later (Theorem 12.5.5). What may not be as apparent is that not only are there problems to overcome in the proof of Theorem 12.1.4, there are actually problems in the statement of Question 12.1.3. Most notably, if f ∈ L¹(𝐑), then the integral (12.1.2) is well-defined (Problem 12.1.1), but if we only know that f ∈ L²(𝐑), then (12.1.2) might not be well-defined (Problem 12.1.2). (In particular, note that (12.1.2) is not an inner product in L², because e^{2πiγx} ∉ L²(𝐑).) Conversely, if we only know that f ∈ L¹(𝐑), f̂ may not be in L¹(𝐑) (Problem 12.1.3), causing problems defining the inverse transform (12.1.3).

We therefore need to find a way to extend the definition of (12.1.2) to all of L²(𝐑). Our basic strategy will be to develop the necessary theory in the space of Schwartz functions 𝒮(𝐑) (Sections 4.7 and 4.8) first and then, since 𝒮(𝐑) is dense in L²(𝐑), extend to L²(𝐑) by taking limits. Exactly how this works will unfold in the rest of this chapter, but in any case, we hope the reader now has some idea of why we will need to prove many of our main results twice.


We also take this opportunity to review/recap material we will need about calculus on 𝒮(𝐑). Specifically, we remind the reader:

• Section 4.7 describes 𝒮(𝐑) and its basic properties.

• Section 4.8 carries over results for integrals on finite intervals to improper integrals on 𝐑. In particular, we have improper versions of integration by parts (Theorem 4.8.7), differentiating an integral (Theorem 4.8.8), and Fubini's Theorem (Theorem 4.8.11). We also have the integral $\int_{-\infty}^{\infty} e^{-\pi x^2}\,dx = 1$ (Theorem 4.8.6).

In connection with the latter material, we will need several background lemmas. As with Section 4.8 itself, the first-time reader may choose to take these lemmas as given and return to their proofs later. Our first lemma is a "separation of variables" technique taken from Stein and Shakarchi [SS03, Ch. 5, Prop. 1.11].

Lemma 12.1.5. If f ∈ C⁰(𝐑) is rapidly decaying (Definition 4.7.1), then for any k ≥ 0, there exists some C_k > 0 such that
$$|f(x-y)| \le \frac{C_k}{|x|^k}(1+|y|)^k \tag{12.1.4}$$
for all x, y ∈ 𝐑.

Proof. Problem 12.1.4.

Our other lemma describes some specific cases when Fubini's Theorem 4.8.11 applies.

Lemma 12.1.6. Suppose f, g ∈ 𝒮(𝐑) and G ∈ C¹(𝐑²) is such that G, ∂G/∂x, and ∂G/∂y are all bounded. Then the following functions satisfy the hypotheses of Fubini's Theorem 4.8.11.

(1) The function F(x, y) = G(x, y)f(x)g(y).
(2) The function F(x, y) = G(x, y)f(x − y)g(y).

Proof. Problem 12.1.5.

Problems.

12.1.1. Prove that if f ∈ L¹(𝐑), then
$$\hat{f}(\gamma) = \int_{-\infty}^{\infty} f(x)e^{-2\pi i\gamma x}\,dx \tag{12.1.5}$$
is well-defined.

12.1.2. Define f : 𝐑 → 𝐑 by
$$f(x) = \begin{cases} 0 & \text{if } x < 1,\\ 1/x & \text{if } x \ge 1.\end{cases} \tag{12.1.6}$$
Prove that f ∈ L²(𝐑), but for any γ ∈ 𝐑, f(x)e^{2πiγx} is not Lebesgue integrable on 𝐑.


12.1.3. Define f : 𝐑 → 𝐑 by
$$f(x) = \begin{cases} 1 & \text{if } -1 \le x \le 1,\\ 0 & \text{otherwise.}\end{cases} \tag{12.1.7}$$
(a) Prove that for γ ≠ 0,
$$\hat{f}(\gamma) = \int_{-\infty}^{\infty} f(x)e^{-2\pi i\gamma x}\,dx = \int_{-1}^{1} e^{-2\pi i\gamma x}\,dx = \frac{\sin(2\pi\gamma)}{\pi\gamma}. \tag{12.1.8}$$
(b) Let $g(x) = \frac{\sin(\pi x)}{\pi x}$. Prove that for n ∈ 𝐍,
$$\int_n^{n+1}|g(x)|\,dx \ge \frac{1}{\pi(n+1)}\left(\frac{2}{\pi}\right). \tag{12.1.9}$$
(c) Prove that the improper integral $\int_1^{\infty}\left|\frac{\sin(\pi x)}{\pi x}\right|dx$ diverges, and therefore, f̂ ∉ L¹(𝐑).

12.1.4. (Proves Lemma 12.1.5) Prove that if f ∈ C⁰(𝐑) is rapidly decaying (Definition 4.7.1), then for any k ≥ 0, there exists some C_k > 0 such that
$$|f(x-y)|\,|x|^k \le C_k(1+|y|)^k \tag{12.1.10}$$
for all x, y ∈ 𝐑. In other words, prove that $|f(x-y)|\frac{|x|^k}{(1+|y|)^k}$ is bounded on 𝐑 × 𝐑.

12.1.5. (Proves Lemma 12.1.6) Suppose f, g ∈ 𝒮(𝐑) and G(x, y) ∈ C¹(𝐑²) is such that G, ∂G/∂x, and ∂G/∂y are all bounded.
(a) Let F(x, y) = G(x, y)f(x)g(y). Prove that F, ∂F/∂x, and ∂F/∂y are all integrable by separation (Definition 4.8.9).
(b) Let F(x, y) = G(x, y)f(x − y)g(y). Prove that F, ∂F/∂x, and ∂F/∂y are all integrable by separation.
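The closed form (12.1.8) in Problem 12.1.3(a) is easy to spot-check numerically; the following numpy sketch (numpy assumed, not part of the text) compares a trapezoidal approximation of the integral with sin(2πγ)/(πγ):

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 200001)
dx = x[1] - x[0]

def f_hat(gamma):
    # Trapezoidal approximation of int_{-1}^{1} e^{-2 pi i gamma x} dx
    vals = np.exp(-2j * np.pi * gamma * x)
    return (vals.sum() - 0.5*vals[0] - 0.5*vals[-1]) * dx

for gamma in (0.25, 0.7, 2.0):
    expected = np.sin(2*np.pi*gamma) / (np.pi*gamma)
    assert abs(f_hat(gamma) - expected) < 1e-6
```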

12.2 Convolutions, Dirac kernels, and calculus on 𝐑

Recall that convolution (Section 8.2), Dirac kernels (Section 8.3), and well-chosen substitutions (such as Lemma 8.2.1) were key tools in proving the Inversion Theorem for Fourier Series in Section 8.4. In this section, we establish the analogous results for 𝒮(𝐑), replacing integrals on S¹ with (improper) integrals on 𝐑. The main idea is that because functions in 𝒮(𝐑) are so well-behaved, both in terms of differentiability and in terms of decay at infinity, we can more or less reuse the arguments of Sections 8.2–8.4, replacing integrals on S¹ with integrals on 𝐑.

We begin with the 𝐑 versions of translation invariance and scaling.


Lemma 12.2.1. For f ∈ 𝒮(𝐑), a ∈ 𝐑, a ≠ 0, we have
$$\int_{-\infty}^{\infty} f(x+a)\,dx = \int_{-\infty}^{\infty} f(x)\,dx, \tag{12.2.1}$$
$$\int_{-\infty}^{\infty} f(ax)\,dx = \frac{1}{|a|}\int_{-\infty}^{\infty} f(x)\,dx. \tag{12.2.2}$$

Note the sign when a < 0 in (12.2.2), which is perhaps the only surprise here.

Proof. Problem 12.2.1.

Our first big idea to consider is convolution on 𝐑.

Definition 12.2.2. For f, g ∈ L²(𝐑), the convolution f ∗ g : 𝐑 → 𝐂 is defined by the formula
$$(f*g)(x) = \int_{-\infty}^{\infty} f(x-t)g(t)\,dt. \tag{12.2.3}$$

Note that if f(t) ∈ L²(𝐑) as a function of t, so are f(−t) and f(x − t), which means that (12.2.3) is well-defined; in fact, for fixed x, (12.2.3) is the inner product of f(x − t) and $\overline{g(t)}$ as functions of t.

Convolutions on 𝐑 have the same properties as convolutions on S¹ (Theorem 8.2.4), though again, the new wrinkle is that we need to be careful about convergence of integrals on 𝐑. In addition, we have the key additional property that convolution preserves 𝒮(𝐑), or more generally, the property of rapid decay.

Theorem 12.2.3. If f, g ∈ C⁰(𝐑) are rapidly decaying (Definition 4.7.1), then f ∗ g is rapidly decaying. Moreover, suppose f, g, h ∈ 𝒮(𝐑). Then:

(1) (f ∗ g)(x) = (g ∗ f)(x).
(2) ((f ∗ g) ∗ h)(x) = (f ∗ (g ∗ h))(x).
(3) d/dx((f ∗ g)(x)) = (df/dx ∗ g)(x).
(4) f ∗ g ∈ 𝒮(𝐑).

Proof. The rapid decay of f ∗ g is the most interesting part of this theorem; see Problem 12.2.2. The other statements are proven by arguments that mirror the analogous arguments of Section 8.2; see Problems 12.2.3–12.2.6.

Next, we define Dirac kernels in much the same way as the analogous kernels defined on S¹. The main difference is that instead of an integer parameter n → ∞, we use a continuous parameter t → 0.

Definition 12.2.4. A Dirac kernel on 𝐑 is a one-parameter family of continuous functions K_t : 𝐑 → 𝐑 (t ∈ 𝐑, t > 0) that are integrable on 𝐑 such that:

(1) For all t > 0 and all x ∈ 𝐑, K_t(x) ≥ 0.

(2) For all t > 0, $\int_{-\infty}^{\infty} K_t(x)\,dx = 1$.


(3) For any fixed η > 0, we have
$$\lim_{t\to 0^+}\int_{|x|\ge\eta} K_t(x)\,dx = 0. \tag{12.2.4}$$

In other words, for any η > 0 and ε > 0, there exists some δ(η, ε) > 0 such that for 0 < t < δ(η, ε), we have
$$1-\epsilon < \int_{-\eta}^{\eta} K_t(x)\,dx \le 1. \tag{12.2.5}$$

As in Section 8.4, the key result is:

Theorem 12.2.5. If {K_t} is a Dirac kernel and f ∈ 𝒮(𝐑), then
$$\lim_{t\to 0^+}(f*K_t)(x) = f(x) \tag{12.2.6}$$
uniformly on 𝐑 (i.e., with convergence independent of x ∈ 𝐑).

To prove Theorem 12.2.5, following the proof of Theorem 8.4.1, we first bound the integral of |f(x − y) − f(x)| |K_t(y)| for y close to 0.

Lemma 12.2.6. For any ε₁ > 0, there exists some η₁(ε₁) > 0 such that for 0 < η < η₁(ε₁), any x ∈ 𝐑, and any t > 0, we have
$$\int_{-\eta}^{\eta}|f(x-y)-f(x)|\,|K_t(y)|\,dy < \epsilon_1. \tag{12.2.7}$$

Proof. Problem 12.2.7.

Secondly, for fixed η > 0, by keeping y away from 0 and letting t → 0, we can also force the integral of |f(x − y) − f(x)| |K_t(y)| on |y| ≥ η to be as small as we like.

Lemma 12.2.7. For any fixed η > 0 and ε₂ > 0, there exists some δ₂(η, ε₂) such that for 0 < t < δ₂(η, ε₂) and any x ∈ 𝐑, we have
$$\int_{|y|\ge\eta}|f(x-y)-f(x)|\,|K_t(y)|\,dy < \epsilon_2. \tag{12.2.8}$$

Proof. Problem 12.2.8.

As in the proof of Theorem 8.4.1, Lemmas 12.2.6 and 12.2.7 combine to prove the desired theorem.

Proof of Theorem 12.2.5. Problem 12.2.9.

−𝜋𝑥2 1 exp ( 2 ) , 𝑡 𝑡

(12.2.9)

12.2. Convolutions, Dirac kernels, and calculus on 𝐑

269

1 1

Figure 12.2.1. The Gauss kernel 𝐺𝑡 (𝑥) (𝑡 = 1, , ) 2 4

Theorem 12.2.9. The Gauss kernel is a Dirac kernel.

Proof. Problem 12.2.10.

Remark 12.2.10. We pause here to give an interpretation of the convolution f ∗ g that complements the one given in Remark 8.2.5. Suppose $\int_{-\infty}^{\infty} g(t)\,dt = 1$ and g(t) ≥ 0. If we think of the integrand f(x − t)g(t) dt as being the value of f taken from x − t with weight g(t), then
$$(f*g)(x) = \int_{-\infty}^{\infty} f(x-t)g(t)\,dt, \tag{12.2.10}$$
the value of f ∗ g at x, is obtained by averaging the values of f taken from x − t with weight g(t). Very loosely speaking, keeping Figure 12.2.1 and the example g(x) = G_t(x) in mind, the values of (f ∗ g)(x) are obtained by taking each f(x) and smearing it out to nearby values x + t with weight g(t). (Note that the + in x + t is not a mistake: If the value of f is taken from x − t with weight g(t), then the value of f is sent to x + t with weight g(t).) Again, see Sections 13.3 and 14.2 for further related discussion and other interpretations of convolution.
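The defining properties of the Gauss kernel and the convergence in Theorem 12.2.5 can be illustrated numerically. In the numpy sketch below (numpy is assumed), the convolution is evaluated at x = 0 for f(x) = cos(x); cos is not in 𝒮(𝐑), but it is bounded and continuous, which suffices for this pointwise illustration:

```python
import numpy as np

y = np.linspace(-10.0, 10.0, 40001)
dy = y[1] - y[0]

def G(t):
    # Gauss kernel (12.2.9)
    return np.exp(-np.pi * y**2 / t**2) / t

# Property (2) of Definition 12.2.4: total integral 1 for each t
for t in (1.0, 0.5, 0.25):
    assert abs(np.sum(G(t)) * dy - 1.0) < 1e-6

# Theorem 12.2.5 at x = 0: (f * G_t)(0) = int f(0 - y) G_t(y) dy -> f(0) = 1
f_flipped = np.cos(-y)                    # f(0 - y) for f(x) = cos(x)
for t, tol in ((0.5, 0.05), (0.05, 1e-3)):
    conv0 = np.sum(f_flipped * G(t)) * dy
    assert abs(conv0 - 1.0) < tol         # approaches f(0) = 1 as t -> 0+
```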

Problems.

12.2.1. (Proves Lemma 12.2.1) Suppose 𝑓 ∈ 𝒮(𝐑), 𝑎 ∈ 𝐑.
(a) Prove that ∫_{−∞}^{∞} 𝑓(𝑥 + 𝑎) 𝑑𝑥 = ∫_{−∞}^{∞} 𝑓(𝑥) 𝑑𝑥.
(b) For 𝑎 > 0, prove that ∫_{−∞}^{∞} 𝑓(𝑎𝑥) 𝑑𝑥 = (1/𝑎) ∫_{−∞}^{∞} 𝑓(𝑥) 𝑑𝑥.
(c) For 𝑎 < 0, prove that ∫_{−∞}^{∞} 𝑓(𝑎𝑥) 𝑑𝑥 = −(1/𝑎) ∫_{−∞}^{∞} 𝑓(𝑥) 𝑑𝑥.

12.2.2. (Proves Theorem 12.2.3) Suppose 𝑓, 𝑔 ∈ 𝐶⁰(𝐑) are rapidly decaying (Definition 4.7.1). Prove that (𝑓 ∗ 𝑔)(𝑥) is rapidly decaying.

12.2.3. (Proves Theorem 12.2.3) For 𝑓, 𝑔 ∈ 𝒮(𝐑), prove that (𝑓 ∗ 𝑔)(𝑥) = (𝑔 ∗ 𝑓)(𝑥).

12.2.4. (Proves Theorem 12.2.3) For 𝑓, 𝑔, ℎ ∈ 𝒮(𝐑), prove that

((𝑓 ∗ 𝑔) ∗ ℎ)(𝑥) = (𝑓 ∗ (𝑔 ∗ ℎ))(𝑥).  (12.2.11)

12.2.5. (Proves Theorem 12.2.3) For 𝑓, 𝑔 ∈ 𝒮(𝐑), prove that

(𝑑/𝑑𝑥)((𝑓 ∗ 𝑔)(𝑥)) = ((𝑑𝑓/𝑑𝑥) ∗ 𝑔)(𝑥).  (12.2.12)

12.2.6. (Proves Theorem 12.2.3) Suppose 𝑓, 𝑔 ∈ 𝒮(𝐑). Prove that 𝑓 ∗ 𝑔 ∈ 𝒮(𝐑).

For Problems 12.2.7–12.2.9, assume that {𝐾_𝑡} is a Dirac kernel (Definition 12.2.4) and 𝑓 ∈ 𝒮(𝐑).

12.2.7. (Proves Lemma 12.2.6) Prove that for any 𝜖_1 > 0, there exists some 𝜂_1(𝜖_1) > 0 such that for 0 < 𝜂 < 𝜂_1(𝜖_1), any 𝑥 ∈ 𝐑, and any 𝑡 > 0, we have

∫_{−𝜂}^{𝜂} |𝑓(𝑥 − 𝑦) − 𝑓(𝑥)| |𝐾_𝑡(𝑦)| 𝑑𝑦 < 𝜖_1.  (12.2.13)

12.2.8. (Proves Lemma 12.2.7) Prove that for any fixed 𝜂 > 0 and 𝜖_2 > 0, there exists some 𝛿_2(𝜂, 𝜖_2) such that for 0 < 𝑡 < 𝛿_2(𝜂, 𝜖_2) and any 𝑥 ∈ 𝐑, we have

∫_{|𝑦|≥𝜂} |𝑓(𝑥 − 𝑦) − 𝑓(𝑥)| |𝐾_𝑡(𝑦)| 𝑑𝑦 < 𝜖_2.  (12.2.14)

12.2.9. (Proves Theorem 12.2.5) Prove that for any 𝜖 > 0, there exists some 𝛿(𝑓, 𝜖), not depending on 𝑥 ∈ 𝐑, such that for all 𝑥 ∈ 𝐑 and all 𝑡 > 0, if 𝑡 < 𝛿(𝑓, 𝜖), then |(𝑓 ∗ 𝐾_𝑡)(𝑥) − 𝑓(𝑥)| < 𝜖. In other words, prove that 𝑓 ∗ 𝐾_𝑡 converges uniformly to 𝑓 on 𝐑.

12.2.10. (Proves Theorem 12.2.9) Define

𝐺_𝑡(𝑥) = (1/𝑡) exp(−𝜋𝑥²/𝑡²).  (12.2.15)

(a) Prove that for any 𝑡 > 0, ∫_{−∞}^{∞} 𝐺_𝑡(𝑥) 𝑑𝑥 = 1.
(b) Fix 𝜂 > 0. Prove that lim_{𝑡→0} ∫_{−𝜂}^{𝜂} 𝐺_𝑡(𝑥) 𝑑𝑥 = 1.

12.3 The Fourier transform on 𝒮(𝐑)

We can now finally define the Fourier transform on 𝒮(𝐑).

Definition 12.3.1. For 𝑓 ∈ 𝒮(𝐑), we define the Fourier transform of 𝑓 to be

𝑓̂(𝛾) = ∫_{−∞}^{∞} 𝑓(𝑥)e^{−2𝜋𝑖𝛾𝑥} 𝑑𝑥.  (12.3.1)

Note that the integral in (12.3.1) is well-defined because 𝑓 is rapidly decaying and continuous (see Example 4.8.5).

Remark 12.3.2. We will sometimes write the Fourier transform of 𝑓 as 𝑈(𝑓) = 𝑓̂. When using this notation, it will sometimes be useful to let the transform variable also be 𝑥, or in other words,

(𝑈(𝑓))(𝑥) = 𝑓̂(𝑥) = ∫_{−∞}^{∞} 𝑓(𝑦)e^{−2𝜋𝑖𝑥𝑦} 𝑑𝑦.  (12.3.2)

As we shall see, the alternate choice of variables in (12.3.2) is especially useful when we consider the Fourier transform as an operator, so we call this choice of variables and the notation 𝑈(𝑓) for 𝑓̂ the operator notation for the Fourier transform.

We collect some important properties of the Fourier transform in the following theorem.

Theorem 12.3.3. If the Fourier transform of 𝑓 ∈ 𝒮(𝐑) is 𝑓̂(𝛾) and 𝑎, 𝑏 ∈ 𝐑, 𝑏 > 0, then the Fourier transforms of certain transformations of 𝑓 are given by Table 12.3.1. In particular, the Fourier transform of 𝑓 ∈ 𝒮(𝐑) is differentiable.

Proof. Problems 12.3.1–12.3.3.

Function (in 𝑥)      | Fourier transform (in 𝛾)
𝑓(𝑥 + 𝑎)             | e^{2𝜋𝑖𝑎𝛾} 𝑓̂(𝛾)
e^{2𝜋𝑖𝑎𝑥} 𝑓(𝑥)        | 𝑓̂(𝛾 − 𝑎)
𝑓(𝑏𝑥)                | (1/𝑏) 𝑓̂(𝛾/𝑏)
𝑓(−𝑥)                | 𝑓̂(−𝛾)
𝑓′(𝑥)                | (2𝜋𝑖𝛾) 𝑓̂(𝛾)
(−2𝜋𝑖𝑥)𝑓(𝑥)          | 𝑓̂′(𝛾)

Table 12.3.1. Some Fourier transform identities
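Two rows of Table 12.3.1 can be spot-checked numerically. The sketch below (the test function 𝑓(𝑥) = e^{−𝜋𝑥²} and the particular values of 𝑎 and 𝛾 are illustrative choices, not the text's) compares a Riemann-sum approximation of each side:

```python
import numpy as np

x = np.linspace(-10, 10, 4001)
dx = x[1] - x[0]

def ft(samples, gamma):
    # Riemann-sum approximation of int f(x) e^{-2 pi i gamma x} dx
    return np.sum(samples * np.exp(-2j * np.pi * gamma * x)) * dx

f = np.exp(-np.pi * x**2)
fprime = -2 * np.pi * x * f          # derivative of exp(-pi x^2)
a, gamma = 0.7, 1.3

# Row 1: the transform of f(x + a) is e^{2 pi i a gamma} fhat(gamma).
diff_shift = abs(ft(np.exp(-np.pi * (x + a)**2), gamma)
                 - np.exp(2j * np.pi * a * gamma) * ft(f, gamma))

# Row 5: the transform of f'(x) is (2 pi i gamma) fhat(gamma).
diff_deriv = abs(ft(fprime, gamma) - 2j * np.pi * gamma * ft(f, gamma))

print(diff_shift, diff_deriv)   # both negligible (near machine precision)
```

Because the integrand is smooth and rapidly decaying, the equispaced Riemann sum is extremely accurate here, so both differences come out essentially zero.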

Remark 12.3.4. It will also be useful to restate Table 12.3.1 using operator notation (Remark 12.3.2). First, for any 𝑎, 𝑏 ∈ 𝐑 and any polynomial 𝑝(𝑥), we define operators 𝜏_𝑎, 𝜇_𝑎, 𝑠_𝑏, and 𝑀_{𝑝(𝑥)} on 𝒮(𝐑) by

(𝜏_𝑎(𝑓))(𝑥) = 𝑓(𝑥 + 𝑎),    (𝜇_𝑎(𝑓))(𝑥) = e^{2𝜋𝑖𝑎𝑥} 𝑓(𝑥),
(𝑠_𝑏(𝑓))(𝑥) = 𝑓(𝑏𝑥),       (𝑀_{𝑝(𝑥)}(𝑓))(𝑥) = 𝑝(𝑥)𝑓(𝑥).  (12.3.3)

Note that one must be careful when considering compositions of operators; for example, since (𝑠_{−1}(𝑓))(𝑥) = 𝑓(−𝑥),

(𝜏_{−𝑎}(𝑠_{−1}(𝑓)))(𝑥) = (𝑠_{−1}(𝑓))(𝑥 − 𝑎) = 𝑓(−(𝑥 − 𝑎)) = 𝑓(𝑎 − 𝑥).  (12.3.4)

In any case, we may restate Table 12.3.1 as

𝑈(𝜏_𝑎(𝑓)) = 𝜇_𝑎(𝑈(𝑓)),              𝑈(𝜇_𝑎(𝑓)) = 𝜏_{−𝑎}(𝑈(𝑓)),
𝑈(𝑠_𝑏(𝑓)) = (1/𝑏) 𝑠_{1/𝑏}(𝑈(𝑓)),     𝑈(𝑠_{−1}(𝑓)) = 𝑠_{−1}(𝑈(𝑓)),  (12.3.5)
𝑈((𝑑/𝑑𝑥)(𝑓)) = 𝑀_{2𝜋𝑖𝑥}(𝑈(𝑓)),      𝑈(𝑀_{−2𝜋𝑖𝑥}(𝑓)) = (𝑑/𝑑𝑥) 𝑈(𝑓),

where again we assume that 𝑏 > 0.

As a result of Theorem 12.3.3, we see that the Fourier transform preserves 𝒮(𝐑):

Corollary 12.3.5. If 𝑓 ∈ 𝒮(𝐑), then 𝑓̂ ∈ 𝒮(𝐑).

Proof. Problem 12.3.4.

The following formula should be familiar from its 𝑆¹ analogue (Theorem 8.2.4) and is proven similarly. (Note that 𝑓 ∗ 𝑔 ∈ 𝒮(𝐑) by Theorem 12.2.3.)

Theorem 12.3.6. If 𝑓, 𝑔 ∈ 𝒮(𝐑), then (𝑓 ∗ 𝑔)^(𝛾) = 𝑓̂(𝛾) 𝑔̂(𝛾).

Proof. Problem 12.3.5.

We also have the following handy theorem, which we call the "Pass the Hat" formula.

Theorem 12.3.7 (Pass the Hat). For 𝑓, 𝑔 ∈ 𝒮(𝐑), we have that

∫_{−∞}^{∞} 𝑓̂(𝑥)𝑔(𝑥) 𝑑𝑥 = ∫_{−∞}^{∞} 𝑓(𝑥)𝑔̂(𝑥) 𝑑𝑥.  (12.3.6)

Proof. Problem 12.3.6.

The reader may have noticed that we have not yet calculated any specific examples of Fourier transforms. There are two good reasons: First, many natural examples are not in 𝒮(𝐑); and second, even for functions in 𝒮(𝐑), this calculation is not easy. We do, however, have the following crucial example.

Theorem 12.3.8. The Fourier transform of 𝑓(𝑥) = e^{−𝜋𝑥²} is 𝑓̂(𝛾) = e^{−𝜋𝛾²}; in other words, 𝑓 is its own Fourier transform, or 𝑈(𝑓) = 𝑓. More generally, for 𝑡 > 0, let 𝐺_𝑡(𝑥) = (1/𝑡) exp(−𝜋𝑥²/𝑡²) be the Gauss kernel. Then

𝐺̂_𝑡(𝛾) = e^{−𝜋𝑡²𝛾²},    𝑈(𝑈(𝐺_𝑡)) = 𝐺̂̂_𝑡 = 𝐺_𝑡.  (12.3.7)

Proof. Problem 12.3.7.
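The self-duality of the Gaussian can also be seen numerically: sampling 𝑓(𝑥) = e^{−𝜋𝑥²} on a grid and approximating the transform by a Riemann sum (a sketch with grid parameters of our choosing) reproduces e^{−𝜋𝛾²} to high accuracy:

```python
import numpy as np

x = np.linspace(-10, 10, 4001)
dx = x[1] - x[0]
f = np.exp(-np.pi * x**2)

gammas = np.linspace(-3, 3, 25)
# Riemann-sum transform evaluated at each frequency gamma
fhat = np.array([np.sum(f * np.exp(-2j * np.pi * g * x)) * dx for g in gammas])
max_err = np.max(np.abs(fhat - np.exp(-np.pi * gammas**2)))
print(max_err)   # negligible (near machine precision)
```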


Problems.

12.3.1. (Proves Theorem 12.3.3) Suppose 𝑓 ∈ 𝒮(𝐑).
(a) Let 𝑔(𝑥) = 𝑓(𝑥 + 𝑎). Prove that 𝑔̂(𝛾) = e^{2𝜋𝑖𝑎𝛾} 𝑓̂(𝛾).
(b) Let ℎ(𝑥) = e^{2𝜋𝑖𝑎𝑥} 𝑓(𝑥). Prove that ℎ̂(𝛾) = 𝑓̂(𝛾 − 𝑎).

12.3.2. (Proves Theorem 12.3.3) Suppose 𝑓 ∈ 𝒮(𝐑) and 𝑏 > 0.
(a) Let 𝑔(𝑥) = 𝑓(𝑏𝑥). Prove that 𝑔̂(𝛾) = (1/𝑏) 𝑓̂(𝛾/𝑏).
(b) Let ℎ(𝑥) = 𝑓(−𝑥). Prove that ℎ̂(𝛾) = 𝑓̂(−𝛾).

12.3.3. (Proves Theorem 12.3.3) Suppose 𝑓 ∈ 𝒮(𝐑).
(a) Let 𝑔(𝑥) = 𝑓′(𝑥). Prove that 𝑔̂(𝛾) = (2𝜋𝑖𝛾) 𝑓̂(𝛾).
(b) Let ℎ(𝑥) = (−2𝜋𝑖𝑥)𝑓(𝑥). Prove that ℎ̂(𝛾) = 𝑓̂′(𝛾).

12.3.4. (Proves Corollary 12.3.5) Suppose 𝑓 ∈ 𝒮(𝐑).
(a) Prove that for any 𝛾 ∈ 𝐑, |𝑓̂(𝛾)| ≤ ∫_{−∞}^{∞} |𝑓(𝑥)| 𝑑𝑥. (It follows that 𝑓̂ is bounded.)
(b) Prove that for any 𝑛, 𝑘 ≥ 0, 𝛾ⁿ 𝑓̂^{(𝑘)}(𝛾) is the Fourier transform of ((1/(2𝜋𝑖)) 𝑑/𝑑𝑥)ⁿ ((−2𝜋𝑖𝑥)^𝑘 𝑓(𝑥)).
(c) Prove that for any 𝑛, 𝑘 ≥ 0, 𝛾ⁿ 𝑓̂^{(𝑘)}(𝛾) is bounded.

12.3.5. (Proves Theorem 12.3.6) For 𝑓, 𝑔 ∈ 𝒮(𝐑), prove that (𝑓 ∗ 𝑔)^(𝛾) = 𝑓̂(𝛾) 𝑔̂(𝛾).

12.3.6. (Proves Theorem 12.3.7) Prove that if 𝑓, 𝑔 ∈ 𝒮(𝐑), then ∫_{−∞}^{∞} 𝑓̂(𝑥)𝑔(𝑥) 𝑑𝑥 = ∫_{−∞}^{∞} 𝑓(𝑥)𝑔̂(𝑥) 𝑑𝑥.

12.3.7. (Proves Theorem 12.3.8) Let 𝑓(𝑥) = e^{−𝜋𝑥²}, and let 𝑦 = 𝐹(𝛾) = 𝑓̂(𝛾).
(a) Prove that 𝐹′(𝛾) = −2𝜋𝛾𝐹(𝛾).
(b) Find the value of 𝐹(0) by direct calculation.
(c) Prove that 𝑓̂ = 𝑓 by solving the differential equation 𝐹′(𝛾) = −(2𝜋𝛾)𝐹(𝛾).
(d) For 𝑡 ∈ 𝐑, 𝑡 > 0, let 𝐺_𝑡(𝑥) be the Gauss kernel 𝐺_𝑡(𝑥) = (1/𝑡) exp(−𝜋𝑥²/𝑡²). Prove that 𝐺̂_𝑡(𝛾) = e^{−𝜋𝑡²𝛾²}.
(e) Prove that 𝐺̂̂_𝑡 = 𝐺_𝑡.

12.4 Inversion and the Plancherel theorem

In this section, we prove two important theorems about Fourier transforms on 𝒮(𝐑). We begin with the Inversion Theorem in 𝒮(𝐑) (Theorem 12.4.2), for which we need the following definition.

Definition 12.4.1. For 𝑔(𝛾) ∈ 𝒮(𝐑), we define the inverse Fourier transform of 𝑔 to be

(𝑈*(𝑔))(𝑥) = ∫_{−∞}^{∞} 𝑔(𝛾)e^{2𝜋𝑖𝛾𝑥} 𝑑𝛾.  (12.4.1)

Note that the integral in (12.4.1) is defined for the same reasons as the integral in Definition 12.3.1. Note also that because of our conventions of where we place 2𝜋, our inverse transform greatly resembles our forward transform; with other conventions, a factor of 2𝜋, or sometimes √2𝜋, appears more prominently in the inverse (see Remark 13.1.1 below). The reader may also find our use of the name "inverse transform" a bit presumptuous; fortunately, that name is justified by the following result.

Theorem 12.4.2 (Inversion Theorem in 𝒮(𝐑)). For 𝑓 ∈ 𝒮(𝐑), we have that

𝑓̂̂(𝑥) = ∫_{−∞}^{∞} 𝑓̂(𝛾)e^{−2𝜋𝑖𝛾𝑥} 𝑑𝛾 = 𝑓(−𝑥).  (12.4.2)

In other words, replacing 𝑥 with −𝑥, for 𝑓 ∈ 𝒮(𝐑), we have that 𝑈*(𝑈(𝑓)) = 𝑓; i.e., the inverse Fourier transform is indeed the inverse of the Fourier transform (𝑈* = 𝑈^{−1}).

As indicated by its title, Theorem 12.4.2 (or rather, its 𝐿² analogue yet to come) is the Fourier transform analogue of the Inversion Theorem for Fourier Series, Theorem 8.1.1. Note also that in operator notation (Remarks 12.3.2 and 12.3.4), Theorem 12.4.2 is equivalent to saying that 𝑈(𝑈(𝑓)) = 𝑠_{−1}(𝑓) for all 𝑓 ∈ 𝒮(𝐑).

Our second theorem is the transform version of the Isomorphism Theorem for Fourier Series, Theorem 7.6.8, though the statement may not appear analogous at first.

Theorem 12.4.3 (Isomorphism Theorem in 𝒮(𝐑)). For 𝑓, 𝑔 ∈ 𝒮(𝐑), we have that ⟨𝑓̂, 𝑔̂⟩ = ⟨𝑓, 𝑔⟩, or in operator terms,

⟨𝑈(𝑓), 𝑈(𝑔)⟩ = ⟨𝑓, 𝑔⟩.  (12.4.3)

In particular, ‖𝑓‖ = ‖𝑓̂‖.

In operator terms, Theorem 12.4.3 (or again, its 𝐿² analogue to come) says that the operator 𝑈 is an isomorphism of Hilbert spaces. (Compare Remark 7.6.9.) Operators that satisfy (12.4.3) are also known as unitary operators.

Turning to the proof of Theorems 12.4.2 and 12.4.3, we begin with the following lemmas, which we use to hide a bit of grunge.

Lemma 12.4.4. For 𝑓 ∈ 𝒮(𝐑) and constant 𝑥 ∈ 𝐑, let ℎ_𝑥(𝑦) = 𝑓(−𝑥 − 𝑦). Then 𝑓̂̂(𝑥 − 𝑦) = ℎ̂̂_𝑥(𝑦), where the Fourier transforms are calculated in the variable 𝑦.


Proof. Using operator notation (Remark 12.3.4) in the variable 𝑦, we first observe that since (𝑠_{−1}(𝑓))(𝑦) = 𝑓(−𝑦),

(𝜏_𝑥(𝑠_{−1}(𝑓)))(𝑦) = 𝑓(−(𝑦 + 𝑥)) = 𝑓(−𝑥 − 𝑦) = ℎ_𝑥(𝑦).  (12.4.4)

Also recall that by (12.3.5), we have that

𝜏_{−𝑥} 𝑠_{−1} 𝑈𝑈 = 𝜏_{−𝑥} 𝑈𝑈 𝑠_{−1} = 𝑈 𝜇_𝑥 𝑈 𝑠_{−1} = 𝑈𝑈 𝜏_𝑥 𝑠_{−1}.  (12.4.5)

It follows that, still working with operator notation in the variable 𝑦,

𝑓̂̂(𝑥 − 𝑦) = (𝜏_{−𝑥}(𝑠_{−1}(𝑓̂̂)))(𝑦) = (𝑈(𝑈(𝜏_𝑥(𝑠_{−1}(𝑓)))))(𝑦) = ℎ̂̂_𝑥(𝑦).  (12.4.6)

The lemma follows.

Lemma 12.4.5. For 𝑔 ∈ 𝒮(𝐑), let ℎ(𝑥) = 𝑔̄(𝑥), the complex conjugate of 𝑔(𝑥). Then 𝑔̂(𝛾)‾ = ℎ̂(−𝛾), where the bar denotes complex conjugation.

Proof. Problem 12.4.1.

We now prove our main results.

Proof of Theorem 12.4.2. Suppose 𝑓 ∈ 𝒮(𝐑). Define 𝑔(𝑥) = 𝑓(−𝑥) = (𝑠_{−1}(𝑓))(𝑥), and for fixed 𝑥 ∈ 𝐑, define ℎ_𝑥(𝑦) = 𝑓(−𝑥 − 𝑦) = 𝑔(𝑥 + 𝑦), as in Lemma 12.4.4. Then for any fixed 𝑡 > 0, we have

(𝑓̂̂ ∗ 𝐺_𝑡)(𝑥) = ∫_{−∞}^{∞} 𝑓̂̂(𝑥 − 𝑦)𝐺_𝑡(𝑦) 𝑑𝑦
  = ∫_{−∞}^{∞} ℎ̂̂_𝑥(𝑦)𝐺_𝑡(𝑦) 𝑑𝑦      (Lemma 12.4.4)
  = ∫_{−∞}^{∞} ℎ_𝑥(𝑦)𝐺̂̂_𝑡(𝑦) 𝑑𝑦      (Pass the Hat, Theorem 12.3.7, twice)
  = ∫_{−∞}^{∞} 𝑔(𝑥 + 𝑦)𝐺_𝑡(𝑦) 𝑑𝑦     (Theorem 12.3.8)
  = ∫_{−∞}^{∞} 𝑔(𝑥 − 𝑦)𝐺_𝑡(−𝑦) 𝑑𝑦    (Lemma 12.2.1)
  = ∫_{−∞}^{∞} 𝑔(𝑥 − 𝑦)𝐺_𝑡(𝑦) 𝑑𝑦     (𝐺_𝑡 is an even function)
  = (𝑔 ∗ 𝐺_𝑡)(𝑥).  (12.4.7)

The theorem follows by taking lim_{𝑡→0} on both sides and applying Theorem 12.2.5.

As for Theorem 12.4.3, see Problem 12.4.2 for a proof.

We end this section with some consequences of the Inversion Theorem in 𝒮(𝐑). For example, reading Table 12.3.1 in reverse, we get a table of inverse transforms (Table 12.4.1). We can also add the following identities to the list in (12.3.5):

𝑈² = 𝑠_{−1},    𝑈* = 𝑈^{−1} = 𝑈𝑠_{−1} = 𝑠_{−1}𝑈.  (12.4.8)

Function (in 𝛾)      | Inverse transform (in 𝑥)
𝑓̂(𝛾 − 𝑎)             | e^{2𝜋𝑖𝑎𝑥} 𝑓(𝑥)
e^{2𝜋𝑖𝑎𝛾} 𝑓̂(𝛾)        | 𝑓(𝑥 + 𝑎)
𝑓̂(𝑏𝛾)                | (1/𝑏) 𝑓(𝑥/𝑏)
𝑓̂(−𝛾)                | 𝑓(−𝑥)
(2𝜋𝑖𝛾) 𝑓̂(𝛾)          | 𝑓′(𝑥)
𝑓̂′(𝛾)                | (−2𝜋𝑖𝑥)𝑓(𝑥)

Table 12.4.1. Some inverse Fourier transform identities
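The Inversion Theorem can be illustrated numerically. The sketch below (the non-even test function 𝑓(𝑥) = (𝑥 + 1)e^{−𝜋𝑥²} and the grid are our choices) applies a discretized 𝑈 and 𝑈* on a symmetric grid and checks both 𝑈*(𝑈(𝑓)) = 𝑓 and 𝑈(𝑈(𝑓)) = 𝑠_{−1}(𝑓):

```python
import numpy as np

x = np.linspace(-10, 10, 2001)
dx = x[1] - x[0]
f = (x + 1) * np.exp(-np.pi * x**2)   # a non-even Schwartz function

def transform(samples, sign):
    # sign=-1 gives the forward transform U, sign=+1 the inverse U*,
    # both as Riemann sums evaluated on the same symmetric grid
    return np.exp(sign * 2j * np.pi * np.outer(x, x)) @ samples * dx

fhat = transform(f, -1)
err_inverse = np.max(np.abs(transform(fhat, +1) - f))        # U*(U(f)) = f
err_double = np.max(np.abs(transform(fhat, -1) - f[::-1]))   # U(U(f)) = f(-x)
print(err_inverse, err_double)   # both negligible
```

Since the grid is symmetric about 0, reversing the sample array (`f[::-1]`) is exactly the sampled 𝑓(−𝑥).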

Problems.

12.4.1. For 𝑔 ∈ 𝒮(𝐑), let ℎ(𝑥) = 𝑔̄(𝑥), the complex conjugate of 𝑔(𝑥). Prove that 𝑔̂(𝛾)‾ = ℎ̂(−𝛾).

12.4.2. Suppose 𝑓, 𝑔 ∈ 𝒮(𝐑). Prove that ⟨𝑓̂, 𝑔̂⟩ = ⟨𝑓, 𝑔⟩.

12.5 The 𝐿² Fourier transform

In this section, we extend the definition of the Fourier transform and the results of Sections 12.3 and 12.4 from the Schwartz space 𝒮(𝐑) to 𝐿²(𝐑). Instead of hard work, we will achieve this mainly via total abstract nonsense.

Recall that by Corollary 8.5.7 and Theorem 2.4.17, every 𝑓 ∈ 𝐿²(𝐑) is the limit in 𝐿² of some sequence of functions in 𝒮(𝐑). The following definition therefore at least makes some initial sense.

Definition 12.5.1. For 𝑓 ∈ 𝐿²(𝐑), choose some sequence 𝑓_𝑛 in 𝒮(𝐑) such that lim_{𝑛→∞} 𝑓_𝑛 = 𝑓. We define the Fourier transform 𝑓̂ of 𝑓 to be

𝑓̂ = lim_{𝑛→∞} 𝑓̂_𝑛,  (12.5.1)

where 𝑓̂_𝑛 is the Fourier transform of 𝑓_𝑛 as a function in 𝒮(𝐑) (Definition 12.3.1) and convergence in (12.5.1) is in the 𝐿² metric.

Theorem 12.5.2. For 𝑓 ∈ 𝐿²(𝐑), the Fourier transform 𝑓̂ from Definition 12.5.1 is a well-defined function in 𝐿²(𝐑). Specifically:

(1) If 𝑓_𝑛 is a sequence of functions in 𝒮(𝐑) such that lim_{𝑛→∞} 𝑓_𝑛 = 𝑓 in the 𝐿² metric, then the sequence 𝑓̂_𝑛 converges to some 𝑓̂ ∈ 𝐿²(𝐑).

(2) If 𝑓_𝑛 and 𝑔_𝑛 are two sequences in 𝒮(𝐑) such that lim_{𝑛→∞} 𝑓_𝑛 = 𝑓 = lim_{𝑛→∞} 𝑔_𝑛, then lim_{𝑛→∞} 𝑓̂_𝑛 = lim_{𝑛→∞} 𝑔̂_𝑛.


Proof. Problems 12.5.1 and 12.5.2.

To make sure the theory works correctly, we also need to extend the inversion and isomorphism theorems to 𝐿² functions; and for applications, we need to extend some of the formal properties of the Fourier transform (Theorems 12.3.3 and 12.3.6) to 𝐿² functions. We begin with the following lemma.

Lemma 12.5.3. As in Remark 12.3.4, let 𝑠_{−1} : 𝐿²(𝐑) → 𝐿²(𝐑) be defined by (𝑠_{−1}(𝑓))(𝑥) = 𝑓(−𝑥). Then for all 𝑓, 𝑔 ∈ 𝐿²(𝐑), we have ⟨𝑠_{−1}(𝑓), 𝑠_{−1}(𝑔)⟩ = ⟨𝑓, 𝑔⟩; consequently, 𝑠_{−1} is bounded and continuous.

Proof. Problem 12.5.3.

Curiously, it will be convenient to reverse the order of proof of the inversion and isomorphism theorems that we used in Section 12.4.

Theorem 12.5.4 (Isomorphism Theorem for the Fourier Transform). For 𝑓, 𝑔 ∈ 𝐿²(𝐑), we have that ⟨𝑓̂, 𝑔̂⟩ = ⟨𝑓, 𝑔⟩. In particular, ‖𝑓‖ = ‖𝑓̂‖.

Proof. Problem 12.5.4.

Theorem 12.5.5 (Inversion Theorem for the Fourier Transform). For 𝑓 ∈ 𝐿²(𝐑), we have that 𝑓̂̂(𝑥) = (𝑠_{−1}(𝑓))(𝑥), or in other words, 𝑓̂̂(𝑥) = 𝑓(−𝑥).

Proof. Problem 12.5.5.

Perhaps the trickiest part of defining the 𝐿² Fourier transform is actually making sure that concrete formulas like (12.3.1) and (12.4.1) in our original definitions (Definitions 12.3.1 and 12.4.1) make sense. The main technical issue is that if 𝑓 ∈ 𝐿²(𝐑) but 𝑓 ∉ 𝐿¹(𝐑), then because

∫_{−∞}^{∞} |𝑓(𝑥)e^{−2𝜋𝑖𝛾𝑥}| 𝑑𝑥 = ∫_{−∞}^{∞} |𝑓(𝑥)| 𝑑𝑥 = +∞,  (12.5.2)

the formula (12.3.1) is not well-defined as an ordinary Lebesgue integral (see Definition 7.5.1). We therefore need to use "improper Lebesgue integrals" in the following sense.

Theorem 12.5.6. For 𝑓 ∈ 𝐿²(𝐑) and 𝑏 > 0, define

𝑓_𝑏(𝑥) = { 𝑓(𝑥) if |𝑥| ≤ 𝑏,
         { 0     otherwise.  (12.5.3)

Then

𝑓̂(𝛾) = lim_{𝑏→∞} 𝑓̂_𝑏(𝛾) = lim_{𝑏→∞} ∫_{−𝑏}^{𝑏} 𝑓(𝑥)e^{−2𝜋𝑖𝛾𝑥} 𝑑𝑥.  (12.5.4)

Proof. We first consider the special case of 𝑔 ∈ 𝐿²(𝐑) with support contained in [−𝑏, 𝑏]; that is, 𝑔(𝑥) = 0 for |𝑥| > 𝑏. In that case, the theorem reduces to proving that

𝑔̂(𝛾) = ∫_{−∞}^{∞} 𝑔(𝑥)e^{−2𝜋𝑖𝛾𝑥} 𝑑𝑥 = ∫_{−𝑏}^{𝑏} 𝑔(𝑥)e^{−2𝜋𝑖𝛾𝑥} 𝑑𝑥.  (12.5.5)

Note that while (12.5.5) certainly seems reasonable, 𝑔̂(𝛾) is only defined to be the limit of Fourier transforms of functions in 𝒮(𝐑), so we need to find a sequence providing the desired result. The idea is that we want to find functions 𝑔_𝑛(𝑥) ∈ 𝐶_𝑐^∞(𝐑) (see Theorem 8.5.6) that approximate 𝑔(𝑥) closely on [−𝑏, 𝑏] and vanish outside a small neighborhood of [−𝑏, 𝑏]; see Figure 12.5.1, in which the dashed lines represent 𝑔(𝑥) and the solid lines represent 𝑔_𝑛(𝑥), for what this might look like.

Figure 12.5.1. Smoothly approximating a function with compact support

To be precise, for each 𝑛 ∈ 𝐍, by Corollary 8.5.3, choose ℎ_𝑛(𝑥) ∈ 𝐶^∞(𝐑) (for example, a trigonometric polynomial) such that

∫_{−𝑏}^{𝑏} |ℎ_𝑛(𝑥) − 𝑔(𝑥)|² 𝑑𝑥 < 1/(2𝑛).  (12.5.6)

Note that we are not yet done because it may be the case that ℎ_𝑛(𝑥) is large outside [−𝑏, 𝑏]. Next, let

𝑀_𝑛 = max{|ℎ_𝑛(𝑥)| : −𝑏 − 1 ≤ 𝑥 ≤ 𝑏 + 1},    𝛿_𝑛 = 1/(4𝑛(𝑀_𝑛 + 1)²) < 1,  (12.5.7)

and by Theorem 8.5.5, choose a bump function 𝜙_𝑛(𝑥) ∈ 𝐶^∞(𝐑) with 𝜙_𝑛(𝑥) = 1 on [−𝑏, 𝑏] and 𝜙_𝑛(𝑥) = 0 for 𝑥 ∉ [−𝑏 − 𝛿_𝑛, 𝑏 + 𝛿_𝑛]. Finally, let 𝑔_𝑛(𝑥) = ℎ_𝑛(𝑥)𝜙_𝑛(𝑥), which is in 𝒮(𝐑) because 𝜙_𝑛(𝑥) has compact support. Then

∫_{−∞}^{∞} |𝑔_𝑛(𝑥) − 𝑔(𝑥)|² 𝑑𝑥 = ⋯

The Heaviside function

𝑢(𝑥) = { 1 if 𝑥 ≥ 0,
       { 0 otherwise  (13.1.2)

is also useful for the sake of brevity, as

𝑢(𝑥)𝑓(𝑥) = { 𝑓(𝑥) if 𝑥 ≥ 0,      𝑢(−𝑥)𝑓(𝑥) = { 𝑓(𝑥) if 𝑥 ≤ 0,
           { 0    otherwise,                 { 0    otherwise.  (13.1.3)
In any case, without further ado, see Table 13.1.1 for a brief list of transforms we will use, along with the problems where they are computed. For constants, we assume 𝛼 = 𝑎 + 𝑏𝑖 ∈ 𝐂, with 𝑎 > 0, 𝑡 > 0, and 𝑛 ≥ 1.

Function (in 𝑥)                       | Fourier transform (in 𝛾)  | Source
𝐺_𝑡(𝑥) = (1/𝑡) exp(−𝜋𝑥²/𝑡²)           | exp(−𝜋𝑡²𝛾²)               | Problem 12.3.7
𝑢(𝑥)e^{−𝛼𝑥}                           | 1/(𝛼 + 2𝜋𝑖𝛾)              | Problem 13.1.1
𝑢(𝑥)(𝑥^{𝑛−1}/(𝑛 − 1)!)e^{−𝛼𝑥}         | 1/(𝛼 + 2𝜋𝑖𝛾)ⁿ             | Problem 13.1.2
𝑢(−𝑥)((−𝑥)^{𝑛−1}/(𝑛 − 1)!)e^{𝛼𝑥}      | 1/(𝛼 − 2𝜋𝑖𝛾)ⁿ             | Problem 13.1.3
e^{−𝛼|𝑥|}                             | 2𝛼/(𝛼² + 4𝜋²𝛾²)           | Problem 13.1.4
𝜒_𝑡(𝑥)                                | 2𝑡 sinc(2𝜋𝑡𝛾)             | Problem 13.1.5

Table 13.1.1. Some Fourier transform examples
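Any entry of Table 13.1.1 can be spot-checked numerically. The sketch below checks the decaying-exponential row 𝑢(𝑥)e^{−𝛼𝑥} ↦ 1/(𝛼 + 2𝜋𝑖𝛾) for one complex 𝛼 with positive real part (the value 𝛼 = 1 + 2𝑖 and the sample frequencies are our choices):

```python
import numpy as np

alpha = 1 + 2j                     # a = Re(alpha) = 1 > 0
x = np.linspace(0, 40, 200001)     # u(x) kills the negative half-line
dx = x[1] - x[0]
f = np.exp(-alpha * x)

errs = []
for gamma in (-1.0, 0.0, 0.5, 2.0):
    # Riemann-sum transform vs. the closed form 1/(alpha + 2 pi i gamma)
    approx = np.sum(f * np.exp(-2j * np.pi * gamma * x)) * dx
    exact = 1 / (alpha + 2j * np.pi * gamma)
    errs.append(abs(approx - exact))
print(errs)   # all small
```

The tail beyond 𝑥 = 40 is of size e^{−40} and is negligible, so the remaining error is just the O(𝑑𝑥) Riemann-sum error.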

Remark 13.1.1. The reader interested in applications should note that 𝜔 = 2𝜋𝛾 is a more commonly used variable for the Fourier transform and that many tables use 𝑗 instead of 𝑖 for √−1. Note that in the variable 𝜔, the Fourier transform becomes ∞

̂ 𝑓(𝜔) = ∫ 𝑓(𝑥)𝑒−𝑖𝜔𝑥 𝑑𝑥,

(13.1.4)

−∞

and the inverse transform becomes ∞

𝑓(𝑥) =

1 𝑖𝜔𝑥 ̂ ∫ 𝑓(𝜔)𝑒 𝑑𝜔. 2𝜋 −∞

(13.1.5)


Problems.

13.1.1. For 𝛼 = 𝑎 + 𝑏𝑖 ∈ 𝐂, 𝑎 > 0, let

𝑓(𝑥) = { e^{−𝛼𝑥} if 𝑥 ≥ 0,
       { 0      if 𝑥 < 0.  (13.1.6)

Compute the Fourier transform 𝑓̂(𝛾).

13.1.2. Let

𝑓(𝑥) = { (𝑥^{𝑛−1}/(𝑛 − 1)!)e^{−𝛼𝑥} if 𝑥 ≥ 0,
       { 0                       if 𝑥 < 0.  (13.1.7)

Compute the Fourier transform 𝑓̂(𝛾).

13.1.3. Let

𝑓(𝑥) = { ((−𝑥)^{𝑛−1}/(𝑛 − 1)!)e^{𝛼𝑥} if 𝑥 ≤ 0,
       { 0                         if 𝑥 > 0.  (13.1.8)

Compute the Fourier transform 𝑓̂(𝛾).

13.1.4. Let 𝑓(𝑥) = e^{−𝛼|𝑥|}. Compute the Fourier transform 𝑓̂(𝛾).

13.1.5. For 𝑡 ∈ 𝐑, 𝑡 > 0, let

𝑓(𝑥) = { 1 if |𝑥| ≤ 𝑡,
       { 0 otherwise.  (13.1.9)

Compute the Fourier transform 𝑓̂(𝛾). (Note that 𝛾 = 0 is a separate case.)

13.2 Linear differential equations with constant coefficients

Recall that Table 12.3.1 shows that the Fourier transform turns 𝑑/𝑑𝑥 into multiplication by 2𝜋𝑖𝛾 and, therefore, turns constant-coefficient differential equations in 𝑥 into algebraic equations in 𝛾. We may therefore use the Fourier transform to find particular solutions, naturally expressed as convolutions, of linear differential equations with constant coefficients. To be able to discuss solutions coming from applications, we will concentrate on formal solutions and use results about convolution described (but not proven) in Remark 12.5.9.

To be specific, consider the differential equation

𝑐_𝑛 𝑦^{(𝑛)} + 𝑐_{𝑛−1} 𝑦^{(𝑛−1)} + ⋯ + 𝑐_1 𝑦′ + 𝑐_0 𝑦 = ℎ(𝑥),  (13.2.1)

where each 𝑐_𝑖 ∈ 𝐂 and ℎ ∈ 𝐿¹(𝐑). (Note that our initial data is in 𝐿¹(𝐑) and not 𝐿²(𝐑), as has been more typical for us.) Let 𝑝(𝑡) = 𝑐_𝑛 𝑡ⁿ + 𝑐_{𝑛−1} 𝑡^{𝑛−1} + ⋯ + 𝑐_0 and 𝑣 = 2𝜋𝑖𝛾. Taking the Fourier transform of both sides of (13.2.1), we get (Problem 13.2.1)

𝑐_𝑛 𝑣ⁿ 𝑦̂(𝛾) + 𝑐_{𝑛−1} 𝑣^{𝑛−1} 𝑦̂(𝛾) + ⋯ + 𝑐_1 𝑣 𝑦̂(𝛾) + 𝑐_0 𝑦̂(𝛾) = ℎ̂(𝛾).  (13.2.2)

In other words, 𝑝(𝑣)𝑦̂(𝛾) = ℎ̂(𝛾).
Formally, at least, we may then solve for 𝑦̂(𝛾) to get 𝑦̂(𝛾) = (1/𝑝(𝑣)) ℎ̂(𝛾) and find the solution

𝑦 = 𝑈^{−1}((1/𝑝(𝑣)) ℎ̂(𝛾)).  (13.2.3)

If we happen to know the inverse Fourier transform 𝑈^{−1}(1/𝑝(𝑣)), then by Theorem 12.3.6, we may express 𝑦 as the convolution

𝑦 = (𝑈^{−1}(1/𝑝(2𝜋𝑖𝛾))) ∗ ℎ,  (13.2.4)

at least in the case where both 𝑈^{−1}(1/𝑝(2𝜋𝑖𝛾)) and ℎ(𝑥) are in 𝐿¹(𝐑).

As it turns out, as long as 𝑝(𝑣) has no zeros of the form 𝑖𝑡, where 𝑡 ∈ 𝐑 (including 𝑡 = 0), then we can use Table 13.1.1 and the method of partial fractions to find 𝑈^{−1}(1/𝑝(𝑣)), as follows:

(1) Factor 𝑝(𝑣) into linear terms, which is always possible (at least in principle) over 𝐂.

(2) As the reader may recall from calculus, the method of partial fractions shows that, if 𝑝(𝑣) has no repeated zeros, then

1/𝑝(𝑣) = 𝐴_1/(𝛼_1 ± 𝑣) + 𝐴_2/(𝛼_2 ± 𝑣) + 𝐴_3/(𝛼_3 ± 𝑣) + ⋯  (13.2.5)

for some 𝐴_𝑖 ∈ 𝐂, where the signs of the 𝐴_𝑖 and the ±𝑣 are chosen so that 𝛼_𝑖 has positive real part. (Here is where we assume that 𝑝(𝑣) has no purely imaginary zeros.) If 𝑝(𝑣) has some 𝑘-fold zero ±𝛼, replace the corresponding linear term in (13.2.5) with

𝐶_1/(𝛼 ± 𝑣)¹ + ⋯ + 𝐶_𝑘/(𝛼 ± 𝑣)^𝑘 = (𝐵_0 + 𝐵_1 𝑣 + ⋯ + 𝐵_{𝑘−1} 𝑣^{𝑘−1})/(𝛼 ± 𝑣)^𝑘.  (13.2.6)

(3) Solve for each coefficient 𝐴_𝑖 of a multiplicity 1 term using Heaviside's method: Multiply both sides by (𝛼_𝑖 ± 𝑣) and plug in 𝑣 = ∓𝛼_𝑖 to erase most of the terms and solve for 𝐴_𝑖. For higher multiplicity, one may need to do honest linear algebra, though if we have only one term of higher multiplicity, we can solve for everything else with Heaviside and determine the remaining term by subtraction.

(4) In any case, we can now express 1/𝑝(𝑣) as a linear combination of terms found in the second column of Table 13.1.1, so we can use Table 13.1.1 to calculate 𝑈^{−1}(1/𝑝(𝑣)) as a corresponding linear combination of inverse transforms.

See Problems 13.2.2–13.2.5 for some examples.

Remark 13.2.1. In fact, it can be shown that in a suitably generalized sense, for any polynomial 𝑝(𝑡), (13.2.4) exists and is a solution to (13.2.1). A complete account can be found in Hörmander [Hör90, Ch. 7], but see also Section 14.2 for more on this point.
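As a hypothetical worked instance of this method (the equation 𝑦″ + 3𝑦′ + 2𝑦 = ℎ is our own example, not one of the text's problems): here 𝑝(𝑣) = 𝑣² + 3𝑣 + 2 = (𝑣 + 1)(𝑣 + 2), partial fractions give 1/𝑝(𝑣) = 1/(𝑣 + 1) − 1/(𝑣 + 2), and Table 13.1.1 then yields 𝑈^{−1}(1/𝑝(2𝜋𝑖𝛾)) = 𝑢(𝑥)(e^{−𝑥} − e^{−2𝑥}), so 𝑦 = 𝑢(𝑥)(e^{−𝑥} − e^{−2𝑥}) ∗ ℎ. The sketch below checks this numerically for ℎ(𝑥) = e^{−𝑥²} (grid parameters are our choices):

```python
import numpy as np

x = np.linspace(-20, 20, 8001)
dx = x[1] - x[0]
h = np.exp(-x**2)
# k = U^{-1}(1/p(2 pi i gamma)) = u(x)(e^{-x} - e^{-2x}) for p(v) = (v+1)(v+2)
k = np.where(x >= 0, np.exp(-x) - np.exp(-2 * x), 0.0)

y = np.convolve(k, h, mode="same") * dx   # candidate solution y = k * h
yp = np.gradient(y, dx)                   # y'
ypp = np.gradient(yp, dx)                 # y''
residual = ypp + 3 * yp + 2 * y - h       # should vanish if y solves the ODE

interior = slice(400, -400)               # avoid finite-grid edge effects
peak = np.max(np.abs(residual[interior]))
print(peak)   # small
```

The residual of the ODE is small away from the grid edges, consistent with 𝑘 ∗ ℎ being a particular solution.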


Problems.

13.2.1. Show that if we take the Fourier transform of

𝑐_𝑛 𝑦^{(𝑛)} + 𝑐_{𝑛−1} 𝑦^{(𝑛−1)} + ⋯ + 𝑐_1 𝑦′ + 𝑐_0 𝑦 = ℎ(𝑥),  (13.2.7)

we get

𝑐_𝑛 𝑣ⁿ 𝑦̂(𝛾) + 𝑐_{𝑛−1} 𝑣^{𝑛−1} 𝑦̂(𝛾) + ⋯ + 𝑐_1 𝑣 𝑦̂(𝛾) + 𝑐_0 𝑦̂(𝛾) = ℎ̂(𝛾).  (13.2.8)

13.2.2. For ℎ ∈ 𝐿¹(𝐑), find a formal solution to the differential equation

𝑦″ − 𝑦′ − 2𝑦 = ℎ.  (13.2.9)

Express your answer as a convolution with ℎ.

13.2.3. Same as Problem 13.2.2, but for

𝑦″ + 2𝑦′ + 𝑦 = ℎ.  (13.2.10)

13.2.4. Same as Problem 13.2.2, but for

𝑦‴ − 𝑦 = ℎ.  (13.2.11)

13.2.5. Same as Problem 13.2.2, but for

𝑦‴ − 𝑦″ − 𝑦′ + 𝑦 = ℎ.  (13.2.12)
13.3 The heat and wave equations on 𝐑

Back in Chapter 11, we solved the heat and wave equations both on the circle and on an interval [𝑎, 𝑏] with suitable boundary conditions. With the Fourier transform, we can consider these problems on 𝐑, stated precisely as follows.

Question 13.3.1. Given an initial value 𝑓(𝑥) ∈ 𝐿²(𝐑), find 𝑢(𝑥, 𝑡) (𝑡 > 0) such that:

(1) (Differentiable) For fixed 𝑡_0 > 0, 𝑢(𝑥, 𝑡_0) ∈ 𝐶²(𝐑), and for fixed 𝑥_0 ∈ 𝐑, 𝑢(𝑥_0, 𝑡) ∈ 𝐶¹((0, +∞)).

(2) (Initial value) For any 𝑥 ∈ 𝐑, lim_{𝑡→0⁺} 𝑢(𝑥, 𝑡) = 𝑓(𝑥).

(3) (PDE) For all 𝑡 > 0,

∂𝑢/∂𝑡 = ∂²𝑢/∂𝑥².  (13.3.1)

Question 13.3.2. Given initial values 𝑓(𝑥), 𝑔(𝑥) ∈ 𝐿²(𝐑), find 𝑢(𝑥, 𝑡) (𝑡 > 0) such that:

(1) (Differentiable) For fixed 𝑡_0 > 0, 𝑢(𝑥, 𝑡_0) ∈ 𝐶²(𝐑), and for fixed 𝑥_0 ∈ 𝐑, 𝑢(𝑥_0, 𝑡) ∈ 𝐶²((0, +∞)).

(2) (Initial value) For any 𝑥 ∈ 𝐑,

lim_{𝑡→0⁺} 𝑢(𝑥, 𝑡) = 𝑓(𝑥),    lim_{𝑡→0⁺} (∂𝑢/∂𝑡)(𝑥, 𝑡) = 𝑔(𝑥).  (13.3.2)

(3) (PDE) For all 𝑡 > 0,

∂²𝑢/∂𝑥² = ∂²𝑢/∂𝑡².  (13.3.3)

Our first observation is that, under the following conditions, taking the Fourier transform of 𝐹(𝑥, 𝑡) in the variable 𝑥 commutes with differentiation in the variable 𝑡.

Theorem 13.3.3. Suppose 𝑢 : 𝐑 × [𝑎, 𝑏] → 𝐂 is continuous as a function of two variables and continuously differentiable in the variable 𝑡, and suppose that the sequences

lim_{𝑁→∞} ∫_{−𝑁}^{𝑁} |𝑢(𝑥, 𝑡)| 𝑑𝑥,    lim_{𝑁→∞} ∫_{−𝑁}^{𝑁} |(∂𝑢/∂𝑡)(𝑥, 𝑡)| 𝑑𝑥  (13.3.4)

converge uniformly (i.e., independently of 𝑡) on [𝑎, 𝑏] to the corresponding improper integrals. Then for all 𝑡 ∈ [𝑎, 𝑏] and all 𝑥 ∈ 𝐑,

(∂𝑢/∂𝑡)^ = ∂𝑢̂/∂𝑡.  (13.3.5)

Proof. Problem 13.3.1.

So now, if 𝐿 is a differential operator with constant coefficients and 𝑇(𝑦) = ∂𝑦/∂𝑡 or ∂²𝑦/∂𝑡², assuming (13.3.5) always holds, we may solve a PDE of the form 𝐿(𝑢) = 𝑇(𝑢) as follows.

(1) Let 𝑣 = 2𝜋𝑖𝛾. By (13.3.5), take the Fourier transform (in the variable 𝑥) of both sides of 𝐿(𝑢) = 𝑇(𝑢) to get 𝑝(𝑣)𝑢̂(𝛾, 𝑡) = ∂𝑢̂/∂𝑡 or ∂²𝑢̂/∂𝑡², where 𝑝(𝑣) is a polynomial in 𝑣 (see Problem 13.2.1).

(2) Integrate both sides in 𝑡 to solve for 𝑢̂(𝛾, 𝑡), treating 𝑝(𝑣) as a constant and taking initial conditions 𝑢̂(𝛾, 0) = 𝑓̂(𝛾), etc.

(3) Take the inverse transform in 𝑥 to solve for 𝑢(𝑥, 𝑡).

Specifically, for the heat equation, we have the following solution.

Theorem 13.3.4. Suppose 𝑓 ∈ 𝒮(𝐑), and in the notation of Example 12.2.8, let

𝑔(𝑥, 𝑡) = 𝐺_{2√(𝜋𝑡)}(𝑥) = (1/(2√(𝜋𝑡))) exp(−𝑥²/(4𝑡)).  (13.3.6)

Let 𝑢(𝑥, 𝑡) = (𝑔 ∗ 𝑓)(𝑥, 𝑡), where the convolution is in the variable 𝑥 with 𝑡 fixed. Then 𝑢(𝑥, 𝑡) is a solution to the heat equation (Question 13.3.1).

Note that our hypothesis of 𝑓 ∈ 𝒮(𝐑) is far stronger than necessary, both in terms of the smoothness and the decay rate of 𝑓. In fact, it can be shown, using similar methods but with better developed knowledge of integration, that the theorem holds if we only assume 𝑓 ∈ 𝐿¹(𝐑) (see Dym and McKean [DM85, Sect. 2.7.2]). Nevertheless, we stick with the case of 𝑓 ∈ 𝒮(𝐑) because it simplifies the technical details while conveying the main idea of what the Fourier transform contributes.

Proof. Problem 13.3.2 formally justifies guessing the above solution 𝑢(𝑥, 𝑡), as long as Theorem 13.3.3 and Table 12.3.1 hold. Problem 13.3.3 then verifies that 𝑢(𝑥, 𝑡) actually does solve the heat equation.

Remark 13.3.5. Recall that in Remark 12.2.10, we described the convolution 𝑓 ∗ 𝐺_{2√(𝜋𝑡)} as "smearing" the function 𝑓 around its domain 𝐑 with a distribution described by 𝐺_{2√(𝜋𝑡)}. Recall also that as 𝑡 → 0⁺, 𝐺_{2√(𝜋𝑡)} approaches a delta function-like spike at 0, and as 𝑡 → ∞, 𝐺_{2√(𝜋𝑡)} becomes a wider and wider "bell curve" distribution, as illustrated by Figure 12.2.1. It follows that the solution 𝑓 ∗ 𝐺_{2√(𝜋𝑡)} starts out as 𝑓 and then, as time 𝑡 moves forward, becomes evenly smeared out over 𝐑. As mentioned previously in Remark 11.1.7, for a related (and entertaining) discussion of "pulse shape" over time and the resulting difficulties in building transatlantic cables in the 19th century, see Körner [Kör89, Chs. 62, 65–66].

Applying the same formal manipulation used in the proof of Theorem 13.3.4 to the wave equation yields d'Alembert's solution (see Section 11.3) in yet another manner. Because we have discussed this solution already, we restrict our attention to its derivation as a formal solution.

Theorem 13.3.6. Consider the wave equation (Question 13.3.2) with initial values 𝑓, 𝑔 ∈ 𝐿²(𝐑). Working formally (e.g., assuming that Theorem 13.3.3 and Table 12.3.1 apply) and using the above method, we obtain the formal solution

𝑢(𝑥, 𝑡) = (1/2)(𝑓(𝑥 + 𝑡) + 𝑓(𝑥 − 𝑡)) + (1/2) ∫_{𝑥−𝑡}^{𝑥+𝑡} 𝑔(𝑦) 𝑑𝑦.  (13.3.7)

Proof. Problem 13.3.4.

Problems.

13.3.1. (*) (Proves Theorem 13.3.3) Suppose 𝑢 : 𝐑 × [𝑎, 𝑏] → 𝐂 is continuous as a function of two variables and continuously differentiable in the variable 𝑡, and suppose that the sequences

lim_{𝑁→∞} ∫_{−𝑁}^{𝑁} |𝑢(𝑥, 𝑡)| 𝑑𝑥,    lim_{𝑁→∞} ∫_{−𝑁}^{𝑁} |(∂𝑢/∂𝑡)(𝑥, 𝑡)| 𝑑𝑥  (13.3.8)

converge uniformly (i.e., independently of 𝑡) on [𝑎, 𝑏] to the corresponding improper integrals. Prove that for all 𝑡 ∈ [𝑎, 𝑏] and all 𝑥 ∈ 𝐑,

(∂𝑢/∂𝑡)^ = ∂𝑢̂/∂𝑡.  (13.3.9)

13.3.2. (*) (Proves Theorem 13.3.4) Consider the heat equation as described in Question 13.3.1, with initial value 𝑓 ∈ 𝒮(𝐑). Assume that Table 12.3.1 can be applied to all functions in this problem.

(a) Assuming that 𝑢(𝑥, 𝑡) satisfies ∂𝑢/∂𝑡 = ∂²𝑢/∂𝑥² and assuming (13.3.9) holds, show that 𝑢̂(𝛾, 𝑡) = 𝐶(𝛾)e^{−4𝜋²𝛾²𝑡} for some function 𝐶(𝛾).

(b) Assuming that 𝑢̂(𝛾, 0) = 𝑓̂(𝛾), prove that 𝑢(𝑥, 𝑡) = (𝑔 ∗ 𝑓)(𝑥, 𝑡), where

𝑔(𝑥, 𝑡) = 𝐺_{2√(𝜋𝑡)}(𝑥) = (1/(2√(𝜋𝑡))) exp(−𝑥²/(4𝑡))  (13.3.10)

and convolution is in the first variable 𝑥.

13.3.3. (*) (Proves Theorem 13.3.4) To check that the solution found in Problem 13.3.2 actually works, consider the heat equation as formulated in Question 13.3.1. Suppose 𝑓 ∈ 𝒮(𝐑), let

𝑔(𝑥, 𝑡) = 𝐺_{2√(𝜋𝑡)}(𝑥) = (1/(2√(𝜋𝑡))) exp(−𝑥²/(4𝑡)),  (13.3.11)

and let 𝑢(𝑥, 𝑡) = (𝑔 ∗ 𝑓)(𝑥, 𝑡) = ∫_{−∞}^{∞} 𝑔(𝑥 − 𝑦, 𝑡)𝑓(𝑦) 𝑑𝑦 (i.e., convolution in the first variable).

(a) Fix 𝑡 ≥ 0. Prove that 𝑢̂(𝛾, 𝑡) = 𝑓̂(𝛾)e^{−4𝜋²𝛾²𝑡}, where the Fourier transform is again taken in the first variable.

(b) Suppose ℎ ∈ 𝒮(𝐑), and let

𝐹(𝑥, 𝑡) = (𝑔 ∗ ℎ)(𝑥, 𝑡) = ∫_{−∞}^{∞} 𝑔(𝑥 − 𝑦, 𝑡)ℎ(𝑦) 𝑑𝑦.  (13.3.12)

For 0 < 𝑎 ≤ 𝑡 ≤ 𝑏, prove that

|𝐹(𝑥, 𝑡)| ≤ (1/(2√(𝜋𝑎))) ∫_{−∞}^{∞} exp(−(𝑥 − 𝑦)²/(4𝑏)) |ℎ(𝑦)| 𝑑𝑦,  (13.3.13)

and prove that the right-hand side of (13.3.13) is rapidly decaying as a function of 𝑥.

(c) Prove that for 0 < 𝑎 < 𝑏, the sequences

lim_{𝑁→∞} ∫_{−𝑁}^{𝑁} |𝑢(𝑥, 𝑡)| 𝑑𝑥,    lim_{𝑁→∞} ∫_{−𝑁}^{𝑁} |(∂𝑢/∂𝑡)(𝑥, 𝑡)| 𝑑𝑥  (13.3.14)

converge uniformly (i.e., independently of 𝑡) for 𝑡 ∈ [𝑎, 𝑏] to the corresponding improper integrals.

(d) Prove that 𝑢(𝑥, 𝑡) is a solution to the heat equation with initial value 𝑓(𝑥). In particular, prove that for fixed 𝑥 ∈ 𝐑, lim_{𝑡→0⁺} 𝑢(𝑥, 𝑡) = 𝑓(𝑥).

13.3.4. (*) (Proves Theorem 13.3.6) Consider the wave equation as described in Question 13.3.2. In this problem, we work formally, assuming the validity of (∂𝑢/∂𝑡)^ = ∂𝑢̂/∂𝑡, convolution formulas, and so on. Suppose

∂²𝑢/∂𝑥² = ∂²𝑢/∂𝑡².  (13.3.15)

(a) Assuming Theorem 13.3.3 applies, take the Fourier transform (in the variable 𝑥) of both sides of (13.3.15) to show that

𝑢̂(𝛾, 𝑡) = 𝐴(𝛾) cos(2𝜋𝛾𝑡) + 𝐵(𝛾) sin(2𝜋𝛾𝑡)  (13.3.16)

for some functions 𝐴(𝛾), 𝐵(𝛾).

(b) Assuming that 𝑢̂(𝛾, 0) = 𝑓̂(𝛾) and (∂𝑢̂/∂𝑡)(𝛾, 0) = 𝑔̂(𝛾), solve for 𝐴(𝛾) and 𝐵(𝛾).

(c) Assuming that Theorem 13.3.3 and Tables 12.3.1 and 13.1.1 hold, prove that

𝑢(𝑥, 𝑡) = (1/2)(𝑓(𝑥 + 𝑡) + 𝑓(𝑥 − 𝑡)) + (1/2) ∫_{𝑥−𝑡}^{𝑥+𝑡} 𝑔(𝑦) 𝑑𝑦.  (13.3.17)

13.4 An eigenbasis for the Fourier transform

Recall that in Section 11.6 we considered the Hermite functions

ℎ_𝑛(𝑥) = ((−1)ⁿ/𝑛!) e^{𝜋𝑥²} (𝑑/𝑑𝑥)ⁿ e^{−2𝜋𝑥²},  (13.4.1)

which have the form (Theorem 11.6.2)

ℎ_𝑛(𝑥) = 𝐻_𝑛(𝑥) e^{−𝜋𝑥²},  (13.4.2)

where 𝐻_𝑛(𝑥) is the Hermite polynomial of degree 𝑛. Recall also that if 𝐾 is the operator in 𝐿²(𝐑) with domain 𝔇(𝐾) = 𝒮(𝐑) defined by

𝐾(𝑓) = −𝑑²𝑓/𝑑𝑥² + 4𝜋²𝑥²𝑓,  (13.4.3)

then {ℎ_𝑛(𝑥)} is an orthogonal set of eigenfunctions of 𝐾 in 𝐿²(𝐑) (Theorem 11.6.6). Finally, we recall (Lemma 11.6.5) that the ℎ_𝑛(𝑥) satisfy the recurrence relation

ℎ′_𝑛 − (2𝜋𝑥)ℎ_𝑛 = −(𝑛 + 1)ℎ_{𝑛+1}.  (13.4.4)

In this section, we complete the proof of Theorem 11.6.7, which states that {ℎ_𝑛(𝑥)} is an eigenbasis for 𝐾, in the course of proving the following remarkable result.

Theorem 13.4.1. Let 𝑈 be the Fourier transform considered as an operator on 𝐿²(𝐑). Then the set {ℎ_𝑛(𝑥)} is an eigenbasis for 𝑈.

Our approach is taken from Dym and McKean [DM85, Sect. 2.5]. We begin by showing that the ℎ_𝑛(𝑥) are eigenfunctions of 𝑈.

Theorem 13.4.2. The 𝑛th Hermite function ℎ_𝑛(𝑥) is an eigenfunction of the Fourier transform 𝑈 with eigenvalue (−𝑖)ⁿ.

Proof. Let 𝑔_𝑛 = 𝑖ⁿ ℎ̂_𝑛. To prove the theorem, it suffices to show that 𝑔_𝑛 = ℎ_𝑛, and this is Problem 13.4.1.

Having confirmed the eigenfunction portion of Theorem 11.6.7, it remains to show that {ℎ_𝑛(𝑥)} satisfies one of the equivalent conditions of the Isomorphism Theorem for Fourier Series, Theorem 7.6.8. We first have some preliminary lemmas.

Lemma 13.4.3. Suppose 𝑓 ∈ 𝐿²(𝐑) satisfies ⟨𝑓, ℎ_𝑛⟩ = 0 for every Hermite function ℎ_𝑛. Then for any polynomial 𝑝(𝑥), ⟨𝑓, 𝑝(𝑥)e^{−𝜋𝑥²}⟩ = 0.

Proof. Problem 13.4.2.

Lemma 13.4.4. For any 𝑥, 𝛾 ∈ 𝐑 and any 𝑁 ∈ 𝐍, we have that

|∑_{𝑛=0}^{𝑁} (−2𝜋𝑖𝛾𝑥)ⁿ/𝑛!| ≤ e^{2𝜋|𝛾𝑥|}.  (13.4.5)

Proof. Problem 13.4.3.
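Theorem 13.4.2 can be illustrated numerically for small 𝑛. From (13.4.1), ℎ_0(𝑥) = e^{−𝜋𝑥²} and ℎ_1(𝑥) = 4𝜋𝑥 e^{−𝜋𝑥²}; the sketch below (grid parameters are our choices) applies a Riemann-sum Fourier transform and checks the eigenvalues (−𝑖)⁰ = 1 and (−𝑖)¹ = −𝑖:

```python
import numpy as np

x = np.linspace(-10, 10, 2001)
dx = x[1] - x[0]

def ft(samples):
    # Riemann-sum transform, evaluated on the same grid of frequencies
    return np.exp(-2j * np.pi * np.outer(x, x)) @ samples * dx

h0 = np.exp(-np.pi * x**2)               # Hermite function h_0
h1 = 4 * np.pi * x * np.exp(-np.pi * x**2)   # Hermite function h_1

e0 = np.max(np.abs(ft(h0) - h0))             # eigenvalue (-i)^0 = 1
e1 = np.max(np.abs(ft(h1) - (-1j) * h1))     # eigenvalue (-i)^1 = -i
print(e0, e1)   # both negligible
```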

Proof of Theorem 13.4.1. Suppose 𝑓 ∈ 𝐿²(𝐑) and ⟨𝑓, ℎ_𝑛⟩ = 0 for all 𝑛 ∈ 𝐍. Let 𝐹(𝑥) = 𝑓(𝑥)e^{−𝜋𝑥²}. Then

𝐹̂(𝛾) = ∫_{−∞}^{∞} 𝑓(𝑥)e^{−𝜋𝑥²} e^{−2𝜋𝑖𝛾𝑥} 𝑑𝑥
     = ∫_{−∞}^{∞} 𝑓(𝑥)e^{−𝜋𝑥²} (∑_{𝑛=0}^{∞} (−2𝜋𝑖𝛾𝑥)ⁿ/𝑛!) 𝑑𝑥.  (13.4.6)

Let

𝐺_𝑁(𝑥) = 𝑓(𝑥)e^{−𝜋𝑥²} (∑_{𝑛=0}^{𝑁} (−2𝜋𝑖𝛾𝑥)ⁿ/𝑛!),  (13.4.7)
𝐺(𝑥) = |𝑓(𝑥)| e^{−𝜋𝑥² + 2𝜋|𝛾𝑥|}.  (13.4.8)

By Lemma 13.4.4, we see that |𝐺_𝑁(𝑥)| ≤ 𝐺(𝑥) for all 𝑁 ∈ 𝐍. Furthermore, since

e^{−𝜋𝑥² + 2𝜋|𝛾𝑥|} ≤ e^{−𝜋𝑥² + 2𝜋𝛾𝑥} + e^{−𝜋𝑥² − 2𝜋𝛾𝑥},  (13.4.9)

a sum of two nonnegative functions in 𝐿²(𝐑), we see that e^{−𝜋𝑥² + 2𝜋|𝛾𝑥|} ∈ 𝐿²(𝐑) and therefore, that 𝐺 ∈ 𝐿¹(𝐑) (Theorem 7.5.11). It follows that we may apply dominated convergence (Lebesgue Axiom 4) to (13.4.6). Therefore,

𝐹̂(𝛾) = lim_{𝑁→∞} ∫_{−∞}^{∞} 𝑓(𝑥) (∑_{𝑛=0}^{𝑁} (−2𝜋𝑖𝛾𝑥)ⁿ/𝑛!) e^{−𝜋𝑥²} 𝑑𝑥 = 0,  (13.4.10)

by Lemma 13.4.3. Taking inverse transforms (Theorem 12.4.2), we see that 𝐹(𝑥) = 0 a.e., which means that 𝑓(𝑥) = 0 a.e. Condition (4) of the Isomorphism Theorem for Fourier Series, Theorem 7.6.8, is therefore satisfied, and the theorem follows.

Problems.

13.4.1. (*) (Proves Theorem 13.4.2) In this problem, we use the operator notation for the Fourier transform (Remark 12.3.2), in the sense that we use the variable 𝑥 for both a function and its Fourier transform. Let 𝑔_𝑛(𝑥) = 𝑖ⁿ ℎ̂_𝑛(𝑥), where ℎ_𝑛 is the 𝑛th Hermite function.

(a) By taking the Fourier transform of the identity

ℎ′_𝑛(𝑥) − (2𝜋𝑥)ℎ_𝑛(𝑥) = −(𝑛 + 1)ℎ_{𝑛+1}(𝑥)  (13.4.11)

from Lemma 11.6.5, prove that

𝑔′_𝑛(𝑥) − (2𝜋𝑥)𝑔_𝑛(𝑥) = −(𝑛 + 1)𝑔_{𝑛+1}(𝑥).  (13.4.12)

(b) Prove that for 𝑛 ≥ 0, we have 𝑔_𝑛 = ℎ_𝑛.

13.4.2. (*) (Proves Lemma 13.4.3) Suppose 𝑓 ∈ 𝐿²(𝐑) satisfies ⟨𝑓, ℎ_𝑛⟩ = 0 for every Hermite function ℎ_𝑛. Prove that for any polynomial 𝑝(𝑥), ⟨𝑓, 𝑝(𝑥)e^{−𝜋𝑥²}⟩ = 0.

13.4.3. (*) (Proves Lemma 13.4.4) Prove that for any 𝑥, 𝛾 ∈ 𝐑 and any 𝑁 ∈ 𝐍,

|∑_{𝑛=0}^{𝑁} (−2𝜋𝑖𝛾𝑥)ⁿ/𝑛!| ≤ e^{2𝜋|𝛾𝑥|}.  (13.4.13)


13.5 Continuous-valued quantum observables

Recall that in Section 11.7, we saw that the mathematical model of quantum mechanics found in Schrödinger's equation can be abstracted as follows.

(1) The state of a quantum mechanical system (e.g., a particle) at time t can be expressed as a function Ψ(x, t) such that for fixed t, Ψ(x, t) ∈ L²(𝐑) and ‖Ψ(x, t)‖ = 1.

(2) An observable quantity of our system is represented by a Hermitian operator M in L²(𝐑).

(3) In the special case of an observable M where we can find an orthonormal eigenbasis {ψₙ} for M with M(ψₙ) = λₙψₙ:

• The only possible values of the observable quantity are the eigenvalues λₙ of M.

• If Ψ = ∑ cₙψₙ, then upon measurement, the state of the system collapses into the single state ψₙ corresponding to the observed value λₙ with probability |cₙ|².

Two important observables in quantum mechanics are:

• The position operator Mₓ(f(x)) = x f(x).

• The momentum operator ((1/(2πi))(d/dx))(f(x)) = f′(x)/(2πi). (Note that the factor of 1/(2πi) here is nonstandard and appears because our definition of the Fourier transform uses e^{−2πiγx} instead of e^{−iγx}.)

However, as the reader may recall (Example 10.3.4 and Theorem 10.3.7), neither of these operators has any eigenfunctions or eigenvalues, so they cannot possibly have corresponding eigenbases. The questions arise, then: What are the possible values of position and momentum for a quantum system in 𝐑, and how are these observables modeled mathematically? Or, to be more specific, what replaces an eigenbasis in the framework mentioned above?

In this section, we focus less on results and more on developing a language for describing observables with continuous spectra, at least in one special case. In fact, while the Fourier transform is used at one key point, most of what we discuss here does not rely on it. Nevertheless, since the Fourier transform is the continuous analogue of Fourier series, this seems to be an appropriate place to consider the continuous analogue of the eigenbasis decomposition interpretation of measurement from Section 11.7. For simplicity and concreteness, we restrict our attention to one particular type of measurement of a continuous observable, taken from Braginsky, Khalili, and Thorne [BKT92, Sect. 2.6]; see the entirety of that reference for much more about quantum measurement.

Turning first to the position operator, we have the following replacement for point (3) above. Suppose a quantum system representing a particle is in the state Ψ(x, t) at a given (fixed) time t. Then:

• The possible values of position are all x ∈ 𝐑.

• The only kind of measurement we consider is a "YES/NO detector" on a closed interval [a, b] ⊆ 𝐑. That is, we are only allowed to ask the question, "Is the position of the particle between x = a and x = b?"

• Upon measurement at time t, a particle is found to have position in [a, b] with probability ∫ₐᵇ |Ψ(x, t)|² dx, and when the answer for a given particle is YES, the state of the system collapses to the state Ψ₁ = Ψ₀/‖Ψ₀‖, where

Ψ₀(x) = { Ψ(x, t)  if a ≤ x ≤ b,
          0         otherwise.          (13.5.1)
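To make the YES/NO detector concrete, here is a small numerical illustration in Python. The normalized state Ψ(x) = 2^{1/4}e^{−πx²} is a hypothetical example, not from the text; the sketch computes the probability ∫ₐᵇ |Ψ(x)|² dx of a YES answer on [a, b] by quadrature and compares it against the closed form in terms of the error function.

```python
import math

def prob_in(a, b, n=200000):
    # P([a,b]) = \int_a^b |Psi(x)|^2 dx for Psi(x) = 2^{1/4} e^{-pi x^2},
    # so |Psi(x)|^2 = sqrt(2) e^{-2 pi x^2}; composite midpoint-rule quadrature.
    dx = (b - a) / n
    total = 0.0
    for j in range(n):
        x = a + (j + 0.5) * dx
        total += math.sqrt(2) * math.exp(-2 * math.pi * x * x)
    return total * dx

total = prob_in(-10, 10)      # effectively the whole line: should be 1, since Psi is normalized
p_yes = prob_in(-0.5, 0.5)    # probability the detector answers YES on [-1/2, 1/2]
assert abs(total - 1) < 1e-6

# Closed form: P([a,b]) = (erf(sqrt(2 pi) b) - erf(sqrt(2 pi) a)) / 2
exact = (math.erf(math.sqrt(2 * math.pi) * 0.5) - math.erf(math.sqrt(2 * math.pi) * (-0.5))) / 2
assert abs(p_yes - exact) < 1e-6
```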

Note that we never consider the probability of the particle's position collapsing to a single x ∈ 𝐑; it only makes sense to discuss the probability of the particle being in some range [a, b].

For the momentum operator (1/(2πi))(d/dx), the possible values taken by the system in the state Ψ(x, t) are not as immediately visible as they are with the position operator. What makes them visible is the Fourier transform, because by Remark 12.3.4,

U ∘ ((1/(2πi))(d/dx)) = Mₓ ∘ U.  (13.5.2)

In physical terms, the Fourier transform turns the momentum operator in position space into the multiplication operator M_γ in the transform space, which is therefore called momentum space. We may therefore apply the same interpretation to Ψ̂ and momentum that we did for Ψ and position, namely:

• The possible values of momentum are all γ ∈ 𝐑.

• Again, the only measurement we consider is the case of a "YES/NO detector" on a closed interval [a, b] in momentum space. That is, we only measure the answer to the question, "Is the momentum of the particle between γ = a and γ = b?"

• Upon measurement at time t, a particle is found to have momentum in [a, b] with probability ∫ₐᵇ |Ψ̂(γ, t)|² dγ, and when the answer for a given particle is "YES", the state of the system collapses to the state Ψ̂₁ = Ψ̂₀/‖Ψ̂₀‖, where

Ψ̂₀(γ) = { Ψ̂(γ, t)  if a ≤ γ ≤ b,
           0          otherwise.          (13.5.3)

Note that by the Isomorphism Theorem for the Fourier Transform, Theorem 12.5.4, we have ∫_{−∞}^{∞} |Ψ̂(γ, t)|² dγ = 1; in other words, |Ψ̂(γ, t)|² is indeed a genuine probability distribution.

To continue our discussion of the interpretation of Ψ(x, t) and observables, we consider the expected value (in other words, the mean value) of an observable M for a given state Ψ. If M has an eigenbasis {ψₙ}, with the eigenvalue λₙ representing the observed value in the eigenstate ψₙ, and Ψ = ∑ cₙψₙ, then because our system collapses to the state ψₙ with probability |cₙ|², we see that over many measurements, the expected value will be

∑ λₙ|cₙ|² = ⟨∑ λₙcₙψₙ, ∑ cₙψₙ⟩ = ⟨M(∑ cₙψₙ), ∑ cₙψₙ⟩ = ⟨M(Ψ), Ψ⟩,  (13.5.4)


where the first equality holds by the Isomorphism Theorem for Fourier Series, Theorem 7.6.8, and the second holds by the Diagonalization Theorem, Theorem 10.4.3. Similarly, for the position operator Mₓ, we see that the expected value of position for a particle in state Ψ(x, t) is

∫_{−∞}^{∞} x|Ψ(x, t)|² dx = ∫_{−∞}^{∞} xΨ(x, t)Ψ̅(x, t) dx = ⟨xΨ(x, t), Ψ(x, t)⟩ = ⟨Mₓ(Ψ), Ψ⟩,  (13.5.5)

and for the momentum operator (1/(2πi))(d/dx), we see that the expected value of momentum is

∫_{−∞}^{∞} γ|Ψ̂(γ, t)|² dγ = ⟨M_γ(Ψ̂), Ψ̂⟩ = ⟨((1/(2πi))(d/dx))(Ψ), Ψ⟩,  (13.5.6)

where the last equality follows from the Isomorphism Theorem for the Fourier Transform, Theorem 12.5.4. All of this is meant to motivate the following definition.

Definition 13.5.1. Let T be an operator in L²(𝐑), not necessarily assumed to be Hermitian or to represent an observable. We define the expected value of T to be

⟨T⟩ = ⟨T(Ψ), Ψ⟩  (13.5.7)

for all Ψ ∈ 𝔇(T) with ‖Ψ‖ = 1. The reader should note the (well-established) abuse of notation whereby dependency on Ψ is implied on one side of (13.5.7) and explicit on the other. By a further abuse of notation, if Ψ is understood, we also use ⟨T⟩ to denote the operation of scalar multiplication by ⟨T(Ψ), Ψ⟩. So, for example, with Ψ fixed, we have

⟨⟨T⟩Ψ, Ψ⟩ = ⟨T⟩⟨Ψ, Ψ⟩ = ⟨T⟩ = ⟨T(Ψ), Ψ⟩.  (13.5.8)

With Definition 13.5.1 as our starting point, in the rest of this section, following Nielsen and Chuang [NC11, Sect. 2.2.5], we combine some probability theory and operator algebra to prove Heisenberg's famous uncertainty principle. Throughout, we fix a Hilbert space ℋ. We assume that each operator M in ℋ has the same domain 𝔇(M) = ℋ₀ and that ℋ₀ is invariant under each operator M (i.e., M(ℋ₀) ⊆ ℋ₀); as a consequence, we may form arbitrary compositions and linear combinations of operators. (Lest this last assumption seem restrictive, we note that for all of the operators we have mentioned in connection with quantum mechanics, ℋ₀ = 𝒮(𝐑) works as a common invariant domain in ℋ = L²(𝐑).)

Definition 13.5.2. Let T be an operator in ℋ. To say that T is skew-Hermitian means that for every f, g ∈ 𝔇(T), ⟨Tf, g⟩ = −⟨f, Tg⟩.

Theorem 13.5.3. Let T be an operator in ℋ. If T is Hermitian, then the expected value ⟨T⟩ (Definition 13.5.1) is real; in other words, for every Ψ ∈ 𝔇(T), ⟨T(Ψ), Ψ⟩ ∈ 𝐑. Similarly, if T is skew-Hermitian, then ⟨T⟩ is purely imaginary; in other words, for every Ψ ∈ 𝔇(T), ⟨T(Ψ), Ψ⟩ = bi for some b ∈ 𝐑.

Proof. Problem 13.5.1.

Definition 13.5.4. Let M be a Hermitian operator in ℋ, and note that as discussed in Definition 13.5.1, if we fix Ψ, since ⟨M⟩ is real-valued (Theorem 13.5.3), we can think of


⟨M⟩ as a Hermitian operator (a real multiple of the identity). We may therefore define the (squared) standard deviation, or variance, of M to be

σ(M)² = ⟨(M − ⟨M⟩)²⟩.  (13.5.9)

Note that, continuing the abuse of notation from Definition 13.5.1, (13.5.9) depends implicitly on Ψ, ‖Ψ‖ = 1, so that σ(M)² applied to Ψ is (applying the Hermitian property)

⟨(M − ⟨M⟩)²⟩ = ⟨(M − ⟨M⟩)Ψ, (M − ⟨M⟩)Ψ⟩ = ‖(M − ⟨M⟩)Ψ‖².  (13.5.10)

Note also that, as in probability and statistics, since ⟨(M − ⟨M⟩)²⟩ represents the expected value of the squared distance between MΨ and ⟨M⟩Ψ, a larger variance for an observable quantity indicates a more widely spread distribution, on average, for that quantity, which we can think of as a greater uncertainty in the value of the observable. To practice proper notational abuse, the reader may want to try proving the following standard result from probability.

Theorem 13.5.5. If M is a Hermitian operator in a Hilbert space ℋ, then

σ(M)² = ⟨M²⟩ − ⟨M⟩².  (13.5.11)

Proof. Problem 13.5.2.

In any case, continuing with operator algebra:

Definition 13.5.6. Let A and B be operators in ℋ with common invariant domain ℋ₀. We define the commutator of A and B to be

[A, B] = AB − BA,  (13.5.12)

and we define the anticommutator of A and B to be

{A, B} = AB + BA.  (13.5.13)

Lemma 13.5.7. Let A and B be Hermitian operators in ℋ with common invariant domain ℋ₀. Then the commutator [A, B] is skew-Hermitian, and the anticommutator {A, B} is Hermitian.

Proof. Problem 13.5.3.

We come to the uncertainty principle itself.

Theorem 13.5.8 (Uncertainty Principle). Let S and T be Hermitian operators in ℋ with common invariant domain ℋ₀. Then

σ(S)²σ(T)² ≥ |⟨[S, T]⟩|²/4.  (13.5.14)

Proof. Problem 13.5.4.

For example, if S = Mₓ is the position operator in L²(𝐑) and T = (1/(2πi))(d/dx) is the momentum operator, then because

(d/dx)(Mₓ(f)) = (d/dx)(x f(x)) = x f′(x) + f(x), that is, (d/dx) ∘ Mₓ = Mₓ ∘ (d/dx) + 1,  (13.5.15)

we have that

[S, T] = ST − TS = (1/(2πi))(Mₓ ∘ (d/dx) − (Mₓ ∘ (d/dx) + 1)) = −1/(2πi).  (13.5.16)

The Uncertainty Principle then says that σ(S)²σ(T)² ≥ 1/(16π²), no matter what Ψ is. Note that, contrary to popular belief, the point is not that we can be certain about only one of the quantities of position and momentum; indeed, one of the fundamental precepts of quantum mechanics is that we can be certain of neither! The point is actually that the more certain we are about the position of a particle (i.e., the smaller σ(S)² is), the less certain we can be about its momentum (i.e., the larger σ(T)² must be), and vice versa.

Remark 13.5.9. Note that we have stated and proved the Uncertainty Principle, Theorem 13.5.8, in the setting of operators in an arbitrary Hilbert space, with no reference to either integrals or series. For a discussion of the foundations of quantum mechanics at roughly this level of abstraction, see Section 14.4.
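As a numerical footnote to the Uncertainty Principle: for the hypothetical normalized Gaussian state Ψ(x) = 2^{1/4}e^{−πx²} (an illustrative choice, not an example from the text), we have Ψ̂ = Ψ under our transform conventions, so the momentum variance is given by the same integral as the position variance; the Gaussian in fact achieves equality in (13.5.14), with σ(S)²σ(T)² = 1/(16π²). A sketch in Python:

```python
import math

def second_moment(n=400000, L=10.0):
    # <x^2> = \int x^2 |Psi(x)|^2 dx for Psi(x) = 2^{1/4} e^{-pi x^2};
    # since <x> = 0 by symmetry, this is the variance sigma^2.
    dx = 2 * L / n
    total = 0.0
    for j in range(n):
        x = -L + (j + 0.5) * dx
        total += x * x * math.sqrt(2) * math.exp(-2 * math.pi * x * x)
    return total * dx

var_x = second_moment()   # position variance sigma(S)^2
var_p = var_x             # Psi-hat = Psi for this state, so the same integral gives sigma(T)^2
assert abs(var_x - 1 / (4 * math.pi)) < 1e-6
assert abs(var_x * var_p - 1 / (16 * math.pi ** 2)) < 1e-6
```

So for this state the uncertainty product sits exactly at the lower bound 1/(16π²); any other normalized state gives a strictly larger product.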

Problems. For all problems, let ℋ be a Hilbert space.

13.5.1. (Proves Theorem 13.5.3) Let T be an operator in ℋ.

(a) Prove that if T is Hermitian and Ψ ∈ 𝔇(T), then ⟨T(Ψ), Ψ⟩ ∈ 𝐑.

(b) Prove that if T is skew-Hermitian and Ψ ∈ 𝔇(T), then ⟨T(Ψ), Ψ⟩ = bi for some b ∈ 𝐑.

13.5.2. (Proves Theorem 13.5.5) Let M be a Hermitian operator in ℋ. Prove that σ(M)² = ⟨M²⟩ − ⟨M⟩²; in other words, prove that for Ψ ∈ 𝔇(M) with ‖Ψ‖ = 1, we have

⟨(M − ⟨M⟩)Ψ, (M − ⟨M⟩)Ψ⟩ = ⟨MΨ, MΨ⟩ − ⟨⟨M⟩Ψ, ⟨M⟩Ψ⟩.  (13.5.17)

13.5.3. (Proves Lemma 13.5.7) Let A and B be Hermitian operators in ℋ with common invariant domain ℋ₀.

(a) Prove that the commutator [A, B] is skew-Hermitian.

(b) Prove that the anticommutator {A, B} is Hermitian.

13.5.4. (Proves Theorem 13.5.8) Let A, B, S, and T be Hermitian operators in ℋ with common invariant domain ℋ₀.

(a) Prove that ⟨{A, B}⟩ and ⟨[A, B]⟩ are the real and imaginary parts, respectively, of 2⟨AB⟩, and consequently,

|⟨{A, B}⟩|² + |⟨[A, B]⟩|² = 4|⟨AB⟩|².  (13.5.18)

(b) Prove that |⟨AB⟩|² ≤ ⟨A²⟩⟨B²⟩.

(c) Taking A = S − ⟨S⟩ and B = T − ⟨T⟩, prove that

σ(S)²σ(T)² ≥ |⟨[S, T]⟩|²/4.  (13.5.19)


13.6 Poisson summation and theta functions

The Poisson summation formula is the following striking result.

Theorem 13.6.1 (Poisson Summation). Suppose f ∈ C¹(𝐑) and there exist constants C and p > 1 such that

|f(x)| ≤ C/|x|ᵖ,  |f′(x)| ≤ C/|x|ᵖ  (13.6.1)

for all x ∈ 𝐑. Then for all x ∈ 𝐑,

∑_{n∈𝐙} f(x + n) = ∑_{n∈𝐙} f̂(n)eₙ(x),  (13.6.2)

where f̂ is the Fourier transform of f. In particular, taking x = 0, we have

∑_{n∈𝐙} f(n) = ∑_{n∈𝐙} f̂(n).  (13.6.3)

Proof. Problem 13.6.1.

One classic application of the Poisson summation formula is the study of the following function.

Definition 13.6.2. For τ ∈ 𝐑, τ > 0, the Jacobi theta function is defined to be

Θ(τ) = ∑_{n∈𝐙} e^{−πn²τ}.  (13.6.4)

Corollary 13.6.3. The Jacobi theta function satisfies the identity

Θ(τ) = (1/√τ) Θ(1/τ)  (13.6.5)

for all τ > 0. Furthermore, for y ∈ 𝐑, z = iy, y > 0, if we let q = e^{πiz} and

θ(z) = ∑_{n∈𝐙} q^{n²} = ∑_{n∈𝐙} e^{πn²iz},  (13.6.6)

then we have

θ(−1/z) = √(z/i) θ(z),  (13.6.7)

where we define √(z/i) = √y. Note that we must be specific about how we define √(z/i) because the square root function cannot be defined consistently over all of 𝐂; see, for example, Waterhouse [Wat12].

Proof. Problem 13.6.2.

Remark 13.6.4. If we rewrite θ(z) from (13.6.6) as a function of q = e^{πiz}, we get

θ(q) = ∑_{n∈𝐙} q^{n²} = ∑_{k=0}^{∞} aₖqᵏ,  (13.6.8)

where

aₖ = (number of n ∈ 𝐙 such that n² = k) = { 2  if k is a nonzero square,
                                             1  if k = 0,
                                             0  otherwise.          (13.6.9)
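Both the functional equation of Corollary 13.6.3 and the combinatorial reading of the coefficients aₖ (and of the four-squares counts bₖ discussed next) are easy to test numerically. A sketch in Python (the truncation bound N = 50 and the cutoff k ≤ 20 are arbitrary choices):

```python
import math
from itertools import product

def theta(tau, N=50):
    # Truncation of Theta(tau) = sum_{n in Z} exp(-pi n^2 tau); the terms decay very fast.
    return sum(math.exp(-math.pi * n * n * tau) for n in range(-N, N + 1))

# Functional equation Theta(tau) = tau^{-1/2} Theta(1/tau)
for tau in (0.5, 1.0, 2.0, 3.7):
    assert abs(theta(tau) - theta(1 / tau) / math.sqrt(tau)) < 1e-12

# a_k = #{n in Z : n^2 = k} for 0 <= k <= 20, computed by brute force
a = [0] * 21
for n in range(-20, 21):
    if n * n <= 20:
        a[n * n] += 1
assert a[0] == 1 and a[1] == 2 and a[4] == 2 and a[5] == 0

# b_k = #{(n1,...,n4) in Z^4 : n1^2 + ... + n4^2 = k}; any representation of k <= 20
# has |n_i| <= 4, so the search below is exhaustive.  Lagrange: b_k >= 1 for all k.
b = [0] * 21
for ns in product(range(-4, 5), repeat=4):
    s = sum(n * n for n in ns)
    if s <= 20:
        b[s] += 1
assert all(b[k] >= 1 for k in range(21))
```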

We say that θ is a generating function for aₖ; in other words, θ is a series whose coefficients aₖ have an interesting combinatorial interpretation. Many interesting generating functions have both notable combinatorial and notable analytic properties; for example, (13.6.7), extended to all z ∈ 𝐂 with positive imaginary part, along with the property θ(z + 2) = θ(z), implies that θ(z) is what is called a modular form. (For the reader who is somewhat familiar with modular forms, the fact that θ has period 2 and not period 1 means that θ is modular with respect to a proper subgroup of the full modular group.)

Jacobi used modularity to prove a quantitative version of Lagrange's four squares theorem, which states that any nonnegative integer is the sum of four squares. Roughly speaking, Jacobi's proof starts with

θ(z)⁴ = (∑_{n₁∈𝐙} q^{n₁²})(∑_{n₂∈𝐙} q^{n₂²})(∑_{n₃∈𝐙} q^{n₃²})(∑_{n₄∈𝐙} q^{n₄²}) = ∑_{(nᵢ)∈𝐙⁴} q^{n₁²+n₂²+n₃²+n₄²} = ∑_{k=0}^{∞} bₖqᵏ  (13.6.10)

and derives a formula for bₖ coming from the fact that θ(z)⁴ is modular. In particular, this formula implies that bₖ ≥ 1 for all k ≥ 0 (the four squares theorem). See, for example, Diamond and Shurman [DS05].

To give one other application of Poisson summation, we tie up a loose end from Section 11.1. Recall that we found the solution

u(x, t) = ∑_{n∈𝐙} f̂(n)e^{−4π²n²t} eₙ(x)  (13.6.11)

to the heat equation on the circle, where f ∈ L²(S¹) is our given initial value. We then showed that the pointwise initial value condition

lim_{t→0⁺} u(x, t) = f(x)  (13.6.12)

holds for all f ∈ C¹(S¹), and we also claimed that (13.6.12) still holds if we relax the assumption f ∈ C¹(S¹) to f ∈ C⁰(S¹). We now use Poisson summation to prove this claim, beginning with the following slight variation on a familiar definition.

Definition 13.6.5. Combining Definitions 8.3.1 and 12.2.4, to say that a one-parameter family of continuous functions Kₜ : S¹ → 𝐑 (t ∈ 𝐑, t > 0) is a Dirac kernel means that:

(1) For all t > 0 and all x ∈ [−1/2, 1/2], Kₜ(x) ≥ 0.

(2) For all t > 0, ∫_{−1/2}^{1/2} Kₜ(x) dx = 1.

(3) For any fixed δ > 0, we have

lim_{t→0⁺} ∫_{δ≤|x|≤1/2} Kₜ(x) dx = 0.  (13.6.13)


Combining the proofs of Theorems 8.4.1 and 12.2.5 then gives the following result. (The details of the proof, which are essentially nothing new, are omitted to avoid excess repetition.)

Theorem 13.6.6. If {Kₜ} is a Dirac kernel and f ∈ C⁰(S¹), then

lim_{t→0⁺} (f ∗ Kₜ)(x) = f(x),  (13.6.14)

where convergence is uniform on S¹.

So now, let

Hₜ(x) = ∑_{n∈𝐙} e^{−4π²n²t} eₙ(x) = 1 + 2 ∑_{n=1}^{∞} e^{−4π²n²t} cos(2πnx)  (13.6.15)

be the heat kernel. We can then reformulate our solution (13.6.11) as follows.

Theorem 13.6.7. For f ∈ C⁰(S¹), let u be the solution (13.6.11) to the heat equation. Then for x ∈ S¹ and t > 0, we have that u(x, t) = (f ∗ Hₜ)(x).

Proof. Problem 13.6.3.

Figure 13.6.1. The heat kernel Hₜ(x) (t = 0.1, 0.01, 0.001)

Therefore, by Theorem 13.6.6, it remains to show that Hₜ is a Dirac kernel. Looking at examples (see Figure 13.6.1), this certainly seems plausible, but rigorously speaking, it is not obvious even that Hₜ(x) ≥ 0. The following result, however, clarifies matters considerably.

Theorem 13.6.8. For all x ∈ S¹, we have that

Hₜ(x) = ∑_{n∈𝐙} G_{√(4πt)}(x + n).  (13.6.16)

In other words, Hₜ(x) is the "periodized" version of the Gauss kernel

G_{√(4πt)}(x) = (1/√(4πt)) exp(−x²/(4t)).  (13.6.17)

(See Sections 12.2 and 12.3 for relevant facts about the Gauss kernel.)

Proof. Problem 13.6.4.

The fact that the heat kernel is a Dirac kernel in the sense of Definition 13.6.5 is then a consequence of the following more general result.

Theorem 13.6.9. Let Kₜ : 𝐑 → 𝐑 be a Dirac kernel on 𝐑 (Definition 12.2.4), and suppose that for fixed t > 0,

Hₜ(x) = ∑_{n∈𝐙} Kₜ(x + n)  (13.6.18)

converges uniformly on [−1/2, 1/2]. Then Hₜ(x) is a Dirac kernel on S¹, in the sense of Definition 13.6.5.

Proof. Uniform convergence proves that Hₜ is continuous (Theorem 4.3.12), and the condition Hₜ(x) ≥ 0 follows because Kₜ(x) ≥ 0. The other conditions of Definition 13.6.5 are proven in Problem 13.6.5.

In any case, we can now finally prove the result promised in Section 11.1.

Corollary 13.6.10. For f ∈ C⁰(S¹), let

u(x, t) = ∑_{n∈𝐙} f̂(n)e^{−4π²n²t} eₙ(x).  (13.6.19)

Then lim_{t→0⁺} u(x, t) = f(x).

Proof. Fix t > 0. Since G_{√(4πt)}(x) is strictly decreasing as |x| → ∞, we see that for n ≠ 0 and x ∈ [−1/2, 1/2],

|G_{√(4πt)}(x + n)| ≤ G_{√(4πt)}(|n| − 1) = (1/√(4πt)) exp(−π(|n| − 1)²/(4πt)) ≤ Ca^{|n|−1}  (13.6.20)

for some C > 0, 0 < a < 1. Therefore, since ∑ Ca^{|n|−1} is a convergent geometric series, by the Weierstrass M-test (Theorem 4.3.7), the heat kernel (13.6.16) satisfies the convergence hypothesis of Theorem 13.6.9. The corollary then follows by Theorems 13.6.6, 13.6.8, and 13.6.9.
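The identity of Theorem 13.6.8 can itself be checked numerically: the Fourier-series form (13.6.15) of Hₜ and the periodized Gauss kernel (13.6.16) agree to high precision. A sketch in Python (the truncation bounds are arbitrary choices; both series converge extremely fast for the values of t used):

```python
import math

def H_series(x, t, N=60):
    # H_t(x) = 1 + 2 sum_{n>=1} exp(-4 pi^2 n^2 t) cos(2 pi n x)   -- the form (13.6.15)
    return 1 + 2 * sum(math.exp(-4 * math.pi**2 * n * n * t) * math.cos(2 * math.pi * n * x)
                       for n in range(1, N + 1))

def H_periodized(x, t, N=60):
    # H_t(x) = sum_{n in Z} G(x + n) with G(x) = exp(-x^2/(4t)) / sqrt(4 pi t)  -- (13.6.16)
    return sum(math.exp(-(x + n) ** 2 / (4 * t)) / math.sqrt(4 * math.pi * t)
               for n in range(-N, N + 1))

for t in (0.1, 0.01, 0.001):
    for x in (0.0, 0.17, 0.5):
        assert abs(H_series(x, t) - H_periodized(x, t)) < 1e-9
```

Note that the series form makes positivity of Hₜ mysterious, while the periodized form makes it obvious, which is exactly the point of Theorem 13.6.8.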


Problems.

13.6.1. (*) (Proves Theorem 13.6.1) Suppose f ∈ C¹(𝐑) and there exist constants C and p > 1 such that |f(x)| ≤ C/|x|ᵖ and |f′(x)| ≤ C/|x|ᵖ for all x ∈ 𝐑. Let

g(x) = ∑_{n∈𝐙} f(x + n),  h(x) = ∑_{n∈𝐙} f′(x + n).  (13.6.21)

(a) Prove that both g and h converge absolutely and uniformly on [0, 1].

(b) Prove that g(x + 1) = g(x); in other words, g is a well-defined function on S¹.

(c) Prove that g′(x) = h(x) and, therefore, that g ∈ C¹(S¹).

(d) Prove that for n ∈ 𝐙, ĝ(n) = f̂(n), where ĝ(n) is the nth Fourier series coefficient of g and f̂(n) is the Fourier transform of f evaluated at n.

(e) Prove that for all x ∈ 𝐑,

∑_{n∈𝐙} f(x + n) = ∑_{n∈𝐙} f̂(n)eₙ(x).  (13.6.22)

13.6.2. (*) (Proves Corollary 13.6.3) Let Θ(τ) = ∑_{n∈𝐙} e^{−πn²τ} for τ > 0, and let θ(z) = ∑_{n∈𝐙} e^{πn²iz} for y ∈ 𝐑, z = iy, y > 0.

(a) Use Poisson summation to prove that

Θ(τ) = (1/√τ) Θ(1/τ)  (13.6.23)

for all τ > 0.

(b) Prove that for y ∈ 𝐑, z = iy, y > 0, we have

θ(−1/z) = √(z/i) θ(z),  (13.6.24)

where we define √(z/i) = √y.

13.6.3. (*) (Proves Theorem 13.6.7) Suppose f ∈ C⁰(S¹), Hₜ(x) = ∑_{n∈𝐙} e^{−4π²n²t} eₙ(x), x ∈ S¹, and t > 0. Let

u(x, t) = ∑_{n∈𝐙} f̂(n)e^{−4π²n²t} eₙ(x).  (13.6.25)

Prove that u(x, t) = (f ∗ Hₜ)(x), where the convolution is taken on S¹. In particular, prove that all function series involved in this problem converge absolutely and uniformly over all x ∈ S¹.

13.6.4. (*) (Proves Theorem 13.6.8) Prove that for x ∈ 𝐑,

∑_{n∈𝐙} e^{−4π²n²t} eₙ(x) = ∑_{n∈𝐙} G_{√(4πt)}(x + n).  (13.6.26)


13.6.5. (*) (Proves Theorem 13.6.9) Let Kₜ : 𝐑 → 𝐑 be a Dirac kernel on 𝐑 (Definition 12.2.4), and suppose that for fixed t > 0,

Hₜ(x) = ∑_{n∈𝐙} Kₜ(x + n)  (13.6.27)

converges uniformly on [−1/2, 1/2].

(a) Prove that for fixed N ∈ 𝐍,

∫_{−1/2}^{1/2} ∑_{n=−N}^{N} Kₜ(x + n) dx = ∫_{−N−1/2}^{N+1/2} Kₜ(x) dx.  (13.6.28)

(b) Prove that for fixed t > 0, ∫_{−1/2}^{1/2} Hₜ(x) dx = 1.

(c) Fix δ > 0. By rewriting

Hₜ(x) = Kₜ(x) + ∑_{n=1}^{∞} Kₜ(x + n) + ∑_{n=1}^{∞} Kₜ(x − n),  (13.6.29)

prove that

lim_{t→0⁺} ∫_{δ≤|x|≤1/2} Hₜ(x) dx = 0.  (13.6.30)

13.7 Miscellaneous applications of the Fourier transform

In this section, we collect some other miscellaneous applications of the Fourier transform: local-global decay-smoothness relations (Subsection 13.7.1), the sampling theorem (Subsection 13.7.2), and the continuous-time Wiener-Khinchin theorem (Subsection 13.7.3).

13.7.1 Decay and smoothness for Fourier transforms. Recall that in Theorems 6.4.2 and 8.5.1, we exhibited a close connection between the smoothness of a function f ∈ L²(S¹) and the decay rate of its Fourier coefficients. Here, we show that similar results hold for the Fourier transform. We first prove a direct analogue of Theorem 6.4.2.

Theorem 13.7.1. Suppose f : 𝐑 → 𝐂 is a function such that for some k ≥ 1, we have that f, f′, …, f⁽ᵏ⁾ are all in C⁰(𝐑) ∩ L²(𝐑) and f⁽ᵏ⁾ ∈ L¹(𝐑). Then there exists some C > 0 such that |f̂(γ)| ≤ C/|2πγ|ᵏ for all γ ∈ 𝐑.

Proof. Problem 13.7.1.

We also have the following converse, which has no direct analogue for Fourier series. (Note, however, that by Fourier inversion, this result also gives a transform analogue of Theorem 8.5.1.)

Theorem 13.7.2. Suppose f ∈ C⁰(𝐑) has the property that |f(x)| ≤ C/|x|ᵖ for some constants C > 0 and p > 1. Then f̂ ∈ Cᵏ(𝐑) for 0 ≤ k < p − 1.


Proof. Problem 13.7.2.

13.7.2 The sampling theorem. Suppose a signal f(x) has the property that for some b > 0, f̂(γ) = 0 for all |γ| > b. We may think of this condition as stating that the frequencies present in f are limited to some finite range, and so we say that f is band-limited. The sampling theorem states that if f is band-limited, then f may be reconstructed by taking "samples" (finding values) of f at discrete time intervals. Specifically:

Theorem 13.7.3. For f ∈ L²(𝐑), suppose that f̂(γ) = 0 for all γ ∈ 𝐑 such that |γ| > 1/2. Then

f(x) = ∑_{n∈𝐙} f(n) sinc(π(x − n)).  (13.7.1)

The sampling theorem says, roughly, that if f is band-limited with (absolute) frequency |γ| ≤ 1/2, then at least in principle, we can recover f by taking samples

…, f(−2), f(−1), f(0), f(1), f(2), …

at twice that rate (frequency 1). As a real-life example, one reason that audio compact discs (and later, MP3 sound files) were designed to work at a sample rate of 44.1 kHz (44,100 cycles per second) is that human hearing is only sensitive to frequencies ranging up to about 20 kHz.

Proof. Problem 13.7.3.

Remark 13.7.4. A number of authors have been credited with discovering the sampling theorem. The first explicit published result is due to Shannon [Sha49], but the basic idea appeared earlier in Nyquist [Nyq28]. The sampling theorem is therefore sometimes called the Nyquist-Shannon sampling theorem (or just the Nyquist sampling theorem), and the frequencies corresponding to 1/2 and 1 in Theorem 13.7.3, scaled appropriately, are known as the Nyquist frequency and the Nyquist rate, respectively. See Oppenheim, Willsky, and Nawab [OWN97, Sec. 7.1.1] for a discussion.
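The reconstruction formula (13.7.1) can be tried out numerically. The test function below is a hypothetical example, f(x) = sinc(πx/2)², chosen because its transform is a triangle function supported in [−1/2, 1/2] (so f is band-limited in the sense of Theorem 13.7.3) and because its samples decay quadratically, so the truncated sampling series converges at a usable rate. A sketch in Python:

```python
import math

def sinc(u):
    # the unnormalized sinc function sin(u)/u, with sinc(0) = 1
    return 1.0 if u == 0 else math.sin(u) / u

def f(x):
    # band-limited test function: f-hat is a triangle supported on [-1/2, 1/2]
    return sinc(math.pi * x / 2) ** 2

def reconstruct(x, N=2000):
    # truncation of the sampling series f(x) = sum_n f(n) sinc(pi (x - n));
    # the tail is O(1/N) here since f(n) = O(1/n^2)
    return sum(f(n) * sinc(math.pi * (x - n)) for n in range(-N, N + 1))

for x in (0.0, 0.5, 1.3, 2.25):
    assert abs(reconstruct(x) - f(x)) < 1e-2
```

At integer points the series is exact with a single term (all other sinc factors vanish); between the sample points, every sample contributes, which is the content of the theorem.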

13.7.3 The continuous-time Wiener-Khinchin theorem. Recall that in Subsection 8.5.4, we considered statistics coming from (discrete) time series x : 𝐙 → 𝐂 and their frequency responses (i.e., discrete-time Fourier transforms) x̂ : S¹ → 𝐂. Specifically, define the power spectrum of x to be Sₓ(γ) = |x̂(γ)|², which we think of as describing the (continuous) distribution of signal power among all frequencies, and define the autocorrelation function of x(t) with time lag τ to be rₓ(τ) = ⟨x(t), x(t − τ)⟩, which we think of as measuring the correlation between x and x time-shifted by τ. Then the discrete-time Wiener-Khinchin theorem, Theorem 8.5.12, says that

rₓ(τ) = ∫₀¹ Sₓ(γ)e_τ(γ) dγ.  (13.7.2)

In other words, rₓ(τ) is precisely the (−τ)th Fourier coefficient of the power spectrum Sₓ(γ). In the continuous setting, we proceed in exactly the same manner to obtain the analogous result. For the convenience of the reader, we will proceed independently of


the discrete discussion, though we will be a bit more terse to avoid excess repetition. Throughout, we use the variable t for the function variable and γ for the transform variable.

Definition 13.7.5. Let x ∈ L²(𝐑) represent a (continuous-time) signal. Because Theorem 12.5.4 can be interpreted as saying that the Fourier transform preserves the total power of x(t), we can think of the function Sₓ(γ) = |x̂(γ)|² as describing the (continuous-frequency) distribution of signal power among all frequencies. We therefore define Sₓ(γ) to be the power spectrum of x(t).

Definition 13.7.6. Statistically, we may interpret the L²(𝐑) inner product ⟨x(t), y(t)⟩ = ∫_{−∞}^{∞} x(t)y̅(t) dt as describing the extent to which x and y are "pointed in the same direction", or correlated. For τ ∈ 𝐑, we therefore define

rₓ(τ) = ⟨x(t), x(t − τ)⟩ = ∫_{−∞}^{∞} x(t)x̅(t − τ) dt  (13.7.3)

to be the autocorrelation function of x(t) with time lag τ, since rₓ(τ) describes how x(t) is correlated with x(t − τ) (i.e., x(t) shifted by time τ).

In the above terms, the following theorem describes the relationship between the autocorrelation function and the power spectrum of a time series x(t).

Theorem 13.7.7 (Continuous-time Wiener-Khinchin). For x(t) ∈ L²(𝐑), we have

rₓ(τ) = ∫_{−∞}^{∞} Sₓ(γ)e^{2πiγτ} dγ.  (13.7.4)

In other words, the autocorrelation function value rₓ(τ) is precisely the inverse Fourier transform of the power spectrum Sₓ(γ).

Proof. Problem 13.7.4.

Remark 13.7.8. Again, as in Subsection 8.5.4, Theorem 13.7.7 is really an "easy" case of the Wiener-Khinchin theorem, in that Wiener's theorem extends (13.7.4) to a situation where the required Fourier transforms may not be defined, and Khinchin's contribution is to extend Wiener's result to the case where x(t) is a random variable. As before, however, we hope that Theorem 13.7.7 gives a flavor of what Wiener-Khinchin says.
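Theorem 13.7.7 can be checked numerically for a concrete signal. In the Python sketch below, the signal x(t) = e^{−πt²}cos(6πt) and all quadrature grids are hypothetical choices, not from the text; the left side of (13.7.4) is computed directly in the time domain and the right side by quadrature over the power spectrum.

```python
import cmath, math

def x_sig(t):
    # hypothetical real-valued test signal: a Gaussian-windowed cosine at frequency 3
    return math.exp(-math.pi * t * t) * math.cos(6 * math.pi * t)

L, n = 6.0, 1200                 # time-domain quadrature grid on [-L, L]
dt = 2 * L / n
ts = [-L + (j + 0.5) * dt for j in range(n)]
xs = [x_sig(t) for t in ts]

def autocorr(tau):
    # r_x(tau) = \int x(t) conj(x(t - tau)) dt; x is real, so the conjugate is invisible
    return sum(x * x_sig(t - tau) for t, x in zip(ts, xs)) * dt

def x_hat(gamma):
    # quadrature approximation of the Fourier transform of x at gamma
    return sum(x * cmath.exp(-2j * math.pi * gamma * t) for t, x in zip(ts, xs)) * dt

def wk_rhs(tau, G=5.0, m=400):
    # inverse transform \int S_x(gamma) e^{2 pi i gamma tau} d gamma of the power spectrum
    dg = 2 * G / m
    gs = [-G + (k + 0.5) * dg for k in range(m)]
    S = [abs(x_hat(g)) ** 2 for g in gs]
    return sum(s * cmath.exp(2j * math.pi * g * tau) for g, s in zip(gs, S)).real * dg

for tau in (0.0, 0.25):
    assert abs(autocorr(tau) - wk_rhs(tau)) < 1e-4
```

At τ = 0 the statement reduces to ⟨x, x⟩ = ∫ Sₓ(γ) dγ, i.e., to the power-preservation interpretation of Theorem 12.5.4 mentioned in Definition 13.7.5.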

Problems.

13.7.1. (*) (Proves Theorem 13.7.1) Suppose f : 𝐑 → 𝐂 is a function such that for some k ≥ 1, we have that f, f′, …, f⁽ᵏ⁾ are all in C⁰(𝐑) ∩ L²(𝐑) and f⁽ᵏ⁾ ∈ L¹(𝐑).

(a) Prove that the Fourier transform of f⁽ᵏ⁾ is a bounded function on 𝐑.

(b) Prove that there exists some C > 0 such that |f̂(γ)| ≤ C/|2πγ|ᵏ for all γ ∈ 𝐑.

13.7.2. (*) (Proves Theorem 13.7.2) Suppose f ∈ C⁰(𝐑) has the property that |f(x)| ≤ C/|x|ᵖ for some constants C > 0 and p > 1.

(a) Prove that for 0 ≤ k < p − 1, we have that xᵏf(x) ∈ C⁰(𝐑) ∩ L¹(𝐑) ∩ L²(𝐑).

(b) Prove that for 0 ≤ k < p − 1, we have that

(f̂)⁽ᵏ⁾(γ) = ∫_{−∞}^{∞} (−2πix)ᵏ f(x)e^{−2πiγx} dx  (13.7.5)

and (f̂)⁽ᵏ⁾ is continuous on 𝐑.

13.7.3. (*) (Proves Theorem 13.7.3) Suppose f ∈ L²(𝐑) has the property that for all γ ∈ 𝐑 such that |γ| > 1/2, f̂(γ) = 0.

(a) Considering f̂ as a function in L²([−1/2, 1/2]) = L²(S¹), let cₙ be the nth Fourier series coefficient of f̂. Prove that cₙ = f(−n).

(b) Prove that

f̂(γ) = { ∑_{n∈𝐙} f(n)e^{−2πinγ}  if |γ| ≤ 1/2,
          0                        if |γ| > 1/2,          (13.7.6)

where convergence is in L²(𝐑). (Note the signs and variable names.)

(c) Use Fourier inversion to prove that f(x) = ∑_{n∈𝐙} f(n) sinc(π(x − n)) in L²(𝐑), where sinc is defined in Section 13.1.

13.7.4. (*) (Proves Theorem 13.7.7) Taking t as our function variable and γ as our transform variable, for x ∈ L²(𝐑), define Sₓ(γ) = |x̂(γ)|².

(a) Explain why the integral ∫_{−∞}^{∞} Sₓ(γ)e^{2πiγτ} dγ is well-defined, even though Sₓ may not be in L²(𝐑).

(b) Prove that

⟨x(t), x(t − τ)⟩ = ∫_{−∞}^{∞} Sₓ(γ)e^{2πiγτ} dγ.  (13.7.7)

14 What’s next? One of the most profound ideas in mathematics, the Langlands program, relates number theory to function theory (harmonic analysis) on very special moduli spaces. . . . This is an extremely exciting and active area of mathematics, which counts among its recent triumphs the proof of Fermat’s Last Theorem. . . . — David Ben-Zvi, “Moduli spaces”, The Princeton Companion to Mathematics The reason why quantum probability is potentially useful for modeling the ambiguity and contextuality of humor is that, whereas in classical probability theory, events are drawn from a common sample space, in quantum probability theory, events can be drawn from different sample spaces. States and variables are defined with respect to a particular context, represented using a basis vector in a Hilbert space. . . . — Liane Gabora, “Toward a Quantum Model of Humor”, Psychology Today, April 6, 2017 As mentioned in the Introduction, while this book aims to be a satisfying “last math book you’ll ever read”, our not-so-secret hope is that it won’t be the last math book you’ll ever read. This raises the question: What’s next? As the reader can see from the epigraphs above, the places one can go next range from the sublime to (the study of) the ridiculous, and in this chapter, we briefly discuss a few of those places. Specifically, besides the most natural next stop of learning more analysis (Section 14.1), we describe where the reader can go to learn more about signal processing and distributions (Section 14.2), wavelets (Section 14.3), quantum mechanics (Section 14.4), spectral methods and number theory (Section 14.5), and abstract harmonic analysis (Section 14.6). Note that while not all of the references we give in each section will be accessible to the reader based only on the background from this book, we believe that each section gives some kind of a starting place on each topic for a reader who has absorbed the material presented here. 
Throughout, we stay mainly in descriptive mode, avoiding proofs to concentrate on the big picture. 305


14.1 What’s next: More analysis As also discussed in the Introduction, probably the biggest gap in this book is our axiomatic approach to the Lebesgue integral, and the reader interested in continuing to study analysis should now go back and learn integration theory for real. Canonical sources include Royden and Fitzpatrick [RF10] and Rudin [Rud86]; the former has long been used in graduate analysis, and while the latter can be tough sledding for newcomers, it is an authoritative reference. More recent, and more accessible, sources include Nelson [Nel15] and Johnston [Joh15]; the former takes a fairly standard approach, and the latter takes an unusual approach (the Daniell-Riesz integral) that defines the Lebesgue integral without having to develop measure theory first. For viewpoints on Fourier analysis that are complementary to what we have done in this book, see Körner [Kör89] and Stein and Shakarchi [SS03]. We should also give special mention to Holland [Hol07], which is more oriented towards applications than proofs, but whose choice of topics nevertheless greatly influenced ours. In particular, Holland is a good place to see more of the Sturm-Liouville theory found in our Chapter 11, as is Al-Gwaiz [AG08]. For graduate-level Fourier analysis, we mention Dym and McKean [DM85] and Terras [Ter13]. The former is a classic, and the latter bends in the direction of number theory, but both are favorites of the author. The reader interested in Fourier analysis and number theory will eventually need a foundation in complex analysis, that is, the calculus of functions 𝑓 ∶ 𝐂 → 𝐂, as opposed to the functions 𝑓 ∶ 𝐑 → 𝐂 that are our main focus. Classic sources include Conway [Con78, Con96] and Ahlfors [Ahl79]; more recent and more accessible texts include Bak and Newman [BN10] and Needham [Nee99].

14.2 What’s next: Signal processing and distributions

Probably the most common everyday use of Fourier analysis is, in one form or another, the study of signal processing. To paraphrase the foreword to Oppenheim, Willsky, and Nawab [OWN97], the basic idea of signal processing is to study a system that takes as input a signal 𝑓(𝑡) and produces some corresponding output 𝑓̃(𝑡). In these terms, two of the main problems in signal processing can be stated (somewhat abstractly) as follows.

(1) Given a system and an input signal 𝑓(𝑡), describe/predict what the output 𝑓̃(𝑡) will be.

(2) Design a system that will produce an output 𝑓̃(𝑡) with desired characteristics from a given 𝑓(𝑡).

To give one concrete example, recall (Theorem 12.5.8) that for sufficiently “nice” functions 𝑓, 𝑔 ∶ 𝐑 → 𝐂, the convolution 𝑓 ∗ 𝑔 has the property

$$\widehat{f \ast g}(\gamma) = \hat{f}(\gamma)\,\hat{g}(\gamma). \tag{14.2.1}$$


So if, for example, we want to remove all frequencies with |𝛾| > Γ from the signal 𝑓, we can take the convolution of 𝑓 with a function 𝑔 such that

$$\hat{g}(\gamma) = \begin{cases} 1 & \text{if } |\gamma| \le \Gamma, \\ 0 & \text{otherwise.} \end{cases} \tag{14.2.2}$$

By Table 13.1.1, the inversion theorem, and the fact that sinc is an even function, we see that taking the convolution of 𝑓(𝑡) with

$$g(t) = 2\Gamma \operatorname{sinc}(2\pi\Gamma t) \tag{14.2.3}$$

works. Convolution with 𝑔(𝑡) is therefore called an ideal lowpass filter, in that only the low-magnitude frequencies of 𝑓(𝑡) are allowed to “pass through.” Convolution can also be used, for example, to simulate the reverb (echoing, or lack thereof) of a specific audio environment, real or imagined; see Opitz [Opi96].

One curious aspect of signal processing, which is the most obviously applicable of the topics discussed in this chapter, is that everyday signal processing often uses one of the more sophisticated concepts in analysis we will discuss here, namely, that of a distribution. To be specific, the reader who looks at a standard engineering table of Fourier transforms may notice that many rows have mysterious entries like

$$\hat{\delta}(\gamma) = 1, \tag{14.2.4}$$

or in other words, the Fourier transform of the Dirac delta function is the constant function 1. On the one hand, this is useful, as delta functions and constant signals certainly model the real-life phenomena of impulse signals and constant inputs/outputs. On the other hand, the alert reader will recall that we have said many times that 𝛿(𝑥) is not a function (see Section 8.3) and may also notice that the constant function 1 is contained in neither 𝐿²(𝐑) nor 𝐿¹(𝐑). What this means is that, for entirely practical reasons, we need to extend the Fourier transform to the following class of objects.

Definition 14.2.1. Recall that 𝐶𝑐∞(𝐑) is the space of smooth functions with compact support. A distribution is a linear function Λ ∶ 𝐶𝑐∞(𝐑) → 𝐂 that is continuous on 𝐶𝑐∞(𝐑) in a particular sense (see Rudin [Rud91, Ch. 6] for precise details), and a tempered distribution is a continuous linear function Λ ∶ 𝒮(𝐑) → 𝐂. (Note that, a fortiori, every tempered distribution defines a distribution, and in fact one can think of tempered distributions as distributions that are not too “wild” at infinity.)

For example, if 𝑓 ∶ 𝐑 → 𝐂 is locally (Lebesgue) integrable (see Definition 4.8.1), then the function Λ𝑓 ∶ 𝐶𝑐∞(𝐑) → 𝐂 given by

$$\Lambda_f(\phi) = \int_{-\infty}^{\infty} f(x)\phi(x)\,dx \tag{14.2.5}$$


is a distribution. It follows that distributions generalize a wide class of functions on 𝐑. For another example, define 𝛿 ∶ 𝐶𝑐∞(𝐑) → 𝐂 by

$$\delta(\phi) = \phi(0). \tag{14.2.6}$$

This is also a distribution, and so we finally have a rigorous definition of the Dirac delta “function.”

Distributions allow us to extend differential calculus to many functions and distributions (like 𝛿(𝑥)) that are not differentiable in the ordinary sense. For example, if Λ is a distribution, then imitating the identity

$$\int_{-\infty}^{\infty} f'(x)\phi(x)\,dx = -\int_{-\infty}^{\infty} f(x)\phi'(x)\,dx \tag{14.2.7}$$

coming from (what else?) integration by parts for 𝑓 ∈ 𝐶¹(𝐑) and 𝜙 ∈ 𝐶𝑐∞(𝐑), we define the derivative of Λ to be Λ′(𝜙) = −Λ(𝜙′). Similarly, if Λ is a tempered distribution, then imitating “pass the hat” (Theorem 12.3.7), we define the Fourier transform of Λ by $\hat{\Lambda}(\phi) = \Lambda(\hat{\phi})$. For more about distributions and their use in solving differential equations, see Rudin [Rud91] and Hörmander [Hör90].

For the discrete/algebraic minded, we would be remiss not to mention the discrete Fourier transform (DFT), which, for a fixed 𝑁 ∈ 𝐍, is a transform on functions 𝑓 ∶ {0, … , 𝑁 − 1} → 𝐂 defined by

$$\hat{f}(k) = \frac{1}{N} \sum_{n=0}^{N-1} f(n)\,e^{-2\pi i n k / N}. \tag{14.2.8}$$

We see that the DFT is both an analogue of the Fourier transform and, taking 𝑥 = 𝑛/𝑁, also a discrete approximation of Fourier series on 𝑆¹. Perhaps most notably, the DFT can be computed via the fast Fourier transform (FFT), with a dramatic (𝑁 log 𝑁 vs. 𝑁²) decrease in required computing time. That speedup, and the ubiquity of signal processing in modern life, is why the FFT was named one of the top 10 algorithms of the 20th century (Dongarra and Sullivan [DS00]). For more on the DFT and FFT, see Rockmore [Roc00].
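Although this book contains no code, a reader with access to Python and NumPy (our choice here, not anything the text requires) can experiment with (14.2.8) directly. The following sketch implements the naive O(N²) sum and checks it against NumPy's FFT; note that `numpy.fft.fft` omits the 1/N factor that our normalization includes.

```python
import numpy as np

def dft(f):
    """Naive O(N^2) discrete Fourier transform, with the 1/N
    normalization used in (14.2.8)."""
    N = len(f)
    n = np.arange(N)
    # W[k, n] = exp(-2*pi*i*n*k/N)
    W = np.exp(-2j * np.pi * np.outer(n, n) / N)
    return W @ np.asarray(f, dtype=complex) / N

# Compare against the FFT; numpy's forward transform has no 1/N,
# so we divide its output by N.
rng = np.random.default_rng(0)
f = rng.standard_normal(64)
assert np.allclose(dft(f), np.fft.fft(f) / 64)
```

Different sources put the 1/N (or 1/√N) factor in different places, so it is worth checking conventions whenever comparing against a table of transforms.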

14.3 What’s next: Wavelets

As we have seen, in terms of signal processing (i.e., taking time 𝑡 to be our function variable), Fourier analysis (either series or transform) takes a signal 𝑓(𝑡) and encapsulates the part of that signal occurring at a single frequency 𝑛 ∈ 𝐙 (or 𝛾 ∈ 𝐑) in the Fourier coefficient $\hat{f}(n)$ (or $\hat{f}(\gamma)$). However, the resulting transform $\hat{f}$ (in either sense) is not localized in time. Indeed, from the beginning, we have thought of Fourier coefficients as being averages over time (see Section 6.1), and one of the most interesting features of Fourier series is how they transform local features of 𝑓(𝑡), like differentiability, into global features of $\hat{f}(n)$, like decay as 𝑛 → ±∞ (see Section 6.4 and Subsection 8.5.1). In contrast, wavelets localize a signal both in frequency and in time. While a fuller discussion is beyond the scope of this book, we will illustrate the idea of time localization with the simplest and oldest (Haar [Haa10]) example of a wavelet transform: the Haar wavelet basis for 𝐿²([0, 1]).


Definition 14.3.1. The Haar wavelet family, as illustrated schematically in Figure 14.3.1, is defined to be the set of all 𝑤𝑛𝑘 ∈ 𝐿²([0, 1]) (𝑛 ≥ 0, 0 ≤ 𝑘 ≤ 2ⁿ − 1) defined by

$$w_{nk}(x) = \begin{cases} -1 & \text{if } \dfrac{k}{2^n} \le x < \dfrac{k}{2^n} + \dfrac{1}{2^{n+1}}, \\[4pt] 1 & \text{if } \dfrac{k}{2^n} + \dfrac{1}{2^{n+1}} \le x < \dfrac{k+1}{2^n}, \\[4pt] 0 & \text{otherwise}, \end{cases} \tag{14.3.1}$$

along with the constant function 1.

Figure 14.3.1. The functions 𝑤𝑛𝑘 from the Haar wavelet family

In terms of theory, our main point about the Haar family {1, 𝑤𝑛𝑘} is:

Theorem 14.3.2. The Haar family ℬ = {1, 𝑤𝑛𝑘} is an orthogonal basis for 𝐿²([0, 1]).

Proof. Problem 14.3.1 shows that ℬ is an orthogonal set. To prove that ℬ is an orthogonal basis, we first show that (Problem 14.3.2): For every 𝑔 ∈ 𝐶⁰([0, 1]) and every 𝜖 > 0, there exists a finite subset 𝑆 ⊆ [0, 1] and a (finite) linear combination ℎ(𝑥) of elements of ℬ = {1, 𝑤𝑛𝑘} such that for all 𝑥 ∈ [0, 1] with 𝑥 ∉ 𝑆, we have that |𝑔(𝑥) − ℎ(𝑥)| < 𝜖. Since the integral of a piecewise continuous function is not affected by its values at finitely many points, this means that for every 𝜖 > 0, there exists a (finite) linear combination ℎ(𝑥) of elements of ℬ = {1, 𝑤𝑛𝑘} such that the 𝐿² norm ‖𝑔 − ℎ‖ < 𝜖. The theorem then follows by the same reasoning as the proof of the Inversion Theorem for Fourier Series, Theorem 8.1.1, in Section 8.4.

In practical terms, the time localization properties of wavelets make them particularly useful for many applications. For example, consider a periodic signal 𝑓(𝑡) with a discontinuity at one particular location in 𝑆¹. While that discontinuity will affect every Fourier coefficient $\hat{f}(n)$, the time-localized nature of a wavelet series (e.g., the generalized Fourier series of 𝑓 with respect to the Haar basis) means that the same discontinuity will affect only a relatively sparse set of wavelet coefficients.
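The orthogonality asserted in Problem 14.3.1 can also be checked numerically; the sketch below (in Python with NumPy, again our choice rather than the book's) samples the family on a fine dyadic grid. Since every product of two Haar functions is a step function with dyadic breakpoints, averaging over that grid computes the 𝐿² inner products exactly.

```python
import numpy as np

def haar(n, k, x):
    """The Haar wavelet w_{n,k} of (14.3.1), evaluated pointwise."""
    lo = k / 2**n
    mid = lo + 1 / 2**(n + 1)
    hi = (k + 1) / 2**n
    return np.where((lo <= x) & (x < mid), -1.0,
                    np.where((mid <= x) & (x < hi), 1.0, 0.0))

# Midpoints of a dyadic grid fine enough that every product of two
# family members is constant on each grid cell.
x = (np.arange(1024) + 0.5) / 1024
fam = [np.ones_like(x)] + [haar(n, k, x) for n in range(4) for k in range(2**n)]
G = np.array([[np.mean(f * g) for g in fam] for f in fam])   # Gram matrix

assert np.allclose(G, np.diag(np.diag(G)))   # off-diagonal entries vanish
# <w_{nk}, w_{nk}> is the measure of the support, namely 2^(-n).
assert np.allclose(np.diag(G)[1:], [2.0**-n for n in range(4) for _ in range(2**n)])
```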


The multiresolution nature of many wavelet families also makes them useful for data compression. To give an idea of why this might be true, consider the Haar wavelet series of 𝑓 ∈ 𝐿²([0, 1]) converging to 𝑓 in 𝐿²: The 1 term is a function with the same average value as 𝑓 on [0, 1]; the 𝑤₀₀ term then corrects this to a function with the same average value as 𝑓 on both [0, 1/2] and [1/2, 1]; the 𝑤₁ₖ terms correct this to a function with the same average value as 𝑓 on each interval [𝑘/4, (𝑘+1)/4]; and so on. This idea of an approximation being resolved first coarsely and then at successively finer scales may seem familiar to any reader who has ever tried to view a slow-loading image over the Internet, and this is no coincidence: Multiresolution wavelets are actually incorporated in, for example, the JPEG2000 image compression standard. See Van Fleet [Fle] for a discussion of both the wavelets used in JPEG2000 and the discrete cosine transform used in the older JPEG standard. For authoritative references on wavelets, see, for example, Daubechies [Dau92] and Mallat [Mal08]; for approachable introductions, see Nievergelt [Nie00] and Walker [Wal08].
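The coarse-to-fine correction just described is, in discrete form, essentially the Haar transform of a vector of samples. Here is a minimal sketch (the function names are ours): each pass replaces a signal by pairwise averages plus the detail coefficients needed to undo the averaging, and keeping only the largest details is the starting point for compression.

```python
import numpy as np

def haar_decompose(f):
    """Full Haar analysis of a length-2^J sample vector: return the
    overall average plus detail coefficients, coarsest scale first."""
    f = np.asarray(f, dtype=float)
    details = []
    while len(f) > 1:
        avg = (f[0::2] + f[1::2]) / 2    # averages at the next coarser scale
        det = (f[1::2] - f[0::2]) / 2    # corrections needed to refine back
        details.append(det)
        f = avg
    return f[0], details[::-1]

def haar_reconstruct(avg, details):
    """Invert haar_decompose by reapplying the corrections fine-ward."""
    f = np.array([avg])
    for det in details:
        out = np.empty(2 * len(f))
        out[0::2] = f - det
        out[1::2] = f + det
        f = out
    return f

samples = np.array([4.0, 2.0, 5.0, 5.0, 1.0, 3.0, 0.0, 2.0])
avg, det = haar_decompose(samples)
assert np.allclose(haar_reconstruct(avg, det), samples)  # lossless round trip
```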

Problems.

14.3.1. (Proves Theorem 14.3.2) For 𝑛 ≥ 0, 0 ≤ 𝑘 ≤ 2ⁿ − 1, let

$$w_{nk}(x) = \begin{cases} -1 & \text{if } \dfrac{k}{2^n} \le x < \dfrac{k}{2^n} + \dfrac{1}{2^{n+1}}, \\[4pt] 1 & \text{if } \dfrac{k}{2^n} + \dfrac{1}{2^{n+1}} \le x < \dfrac{k+1}{2^n}, \\[4pt] 0 & \text{otherwise}, \end{cases} \tag{14.3.2}$$

and let ℬ = {1, 𝑤𝑛𝑘}. Prove that ℬ is an orthogonal set in 𝐿²([0, 1]).

14.3.2. (Proves Theorem 14.3.2) Let 𝑤𝑛𝑘 be defined as in (14.3.2), and for 𝑛 ≥ 0, 0 ≤ 𝑘 ≤ 2ⁿ − 1, let

$$\chi_{nk}(x) = \begin{cases} 1 & \text{if } \dfrac{k}{2^n} \le x < \dfrac{k+1}{2^n}, \\[4pt] 0 & \text{otherwise.} \end{cases} \tag{14.3.3}$$

Note that 𝜒₀₀ is precisely the constant function 1, except at 𝑥 = 1.

(a) Prove that each of 𝜒₁₀ and 𝜒₁₁ is a linear combination of 𝜒₀₀ and 𝑤₀₀.

(b) Use induction on 𝑛 to prove that each 𝜒𝑛𝑘 (𝑛 ≥ 0, 0 ≤ 𝑘 ≤ 2ⁿ − 1) is a linear combination of {𝜒₀₀} ∪ {𝑤𝑚𝑘 ∣ 𝑚 < 𝑛, 0 ≤ 𝑘 ≤ 2ᵐ − 1}.

(c) Fix 𝑔 ∈ 𝐶⁰([0, 1]) and 𝜖 > 0. Prove that there exists a finite subset 𝑆 ⊆ [0, 1] and a (finite) linear combination ℎ(𝑥) of elements of ℬ = {1, 𝑤𝑛𝑘} such that for all 𝑥 ∈ [0, 1] with 𝑥 ∉ 𝑆, we have that |𝑔(𝑥) − ℎ(𝑥)| < 𝜖.
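For readers who like to sanity-check before proving: the identities behind part (a) amount to 𝜒₁₀ = (𝜒₀₀ − 𝑤₀₀)/2 and 𝜒₁₁ = (𝜒₀₀ + 𝑤₀₀)/2, which can be verified numerically (this is evidence, of course, not a proof).

```python
import numpy as np

# Sample the functions away from the breakpoints; the values at the
# finitely many exceptional points do not matter here.
x = (np.arange(256) + 0.5) / 256
chi00 = np.ones_like(x)                       # the constant function 1
w00 = np.where(x < 0.5, -1.0, 1.0)            # Haar wavelet w_{0,0}
chi10 = np.where(x < 0.5, 1.0, 0.0)           # indicator of [0, 1/2)
chi11 = np.where(x >= 0.5, 1.0, 0.0)          # indicator of [1/2, 1)

assert np.allclose(chi10, (chi00 - w00) / 2)
assert np.allclose(chi11, (chi00 + w00) / 2)
```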

14.4 What’s next: Quantum mechanics

Another prominent application of the Fourier analysis and Hilbert space theory we have discussed is in quantum mechanics. Here, we will go into a bit more detail than in the other sections of this chapter, because the general framework we discuss here, adapted from Nielsen and Chuang [NC11], is not easy to find in one place, and we think it may be helpful to the reader as a guide in studying other sources.


Specifically, so far in this book we have discussed the quantum mechanics of a single particle moving in one dimension. In Section 11.7, we discussed solutions to Schrödinger’s equation and the discrete-valued observable energy, and in Section 13.5, we looked at the continuous-valued observables position and momentum. In this section, we describe a set of axioms for quantum mechanics that encompasses all of these aspects in a common framework.

We begin by defining calculus for functions taking values in a Hilbert space. The definitions should seem familiar, but they do need to be restated in this setting.

Definition 14.4.1. Let 𝑋 be a subinterval of 𝐑 and let ℋ be a Hilbert space. For 𝑐 ∈ 𝑋, to say that Φ ∶ (𝑋\{𝑐}) → ℋ has a limit 𝑓 ∈ ℋ as 𝑡 approaches 𝑐 means that for every 𝜖 > 0, there exists some 𝛿(𝜖) > 0 such that if 𝑡 ≠ 𝑐 and |𝑡 − 𝑐| < 𝛿(𝜖), then ‖Φ(𝑡) − 𝑓‖ < 𝜖. In that case, we write

$$\lim_{t \to c} \Phi(t) = f. \tag{14.4.1}$$

Definition 14.4.2. Let 𝑋 be a subinterval of 𝐑 and let ℋ be a Hilbert space. To say that Φ ∶ 𝑋 → ℋ is differentiable at 𝑐 ∈ 𝑋 means that the limit

$$f = \lim_{t \to c} \left( \frac{1}{t - c} \right) (\Phi(t) - \Phi(c)) \tag{14.4.2}$$

exists. In that case, we define $\frac{d\Phi}{dt}(c) = \Phi'(c) = f$.

We also need to define some new operator terminology, though to shorten the discussion, we will keep certain technical points vague. Let 𝑇 be an operator in a Hilbert space ℋ. To say that 𝑇 is selfadjoint means that 𝑇 is Hermitian and 𝔇(𝑇) is “sufficiently large” (see Reed and Simon [RS80, VIII.2] for a precise definition). If 𝑇 is a selfadjoint operator, then the spectrum 𝜎(𝑇) of 𝑇 is the set of all 𝜆 ∈ 𝐂 such that the operator 𝜆𝐼 − 𝑇 (𝐼 is the identity on ℋ) is not a bijection of 𝔇(𝑇) onto ℋ with a bounded inverse. For example, if 𝜆 is an eigenvalue of 𝑇, then 𝜆𝐼 − 𝑇 is not invertible, so 𝜆 ∈ 𝜎(𝑇).

We may now define quantum mechanics in terms of four axioms.

(1) Axiom 1: State space. For any isolated physical system (e.g., an electron, or the universe), there is a Hilbert space ℋ, called the state space of the system. The state of that system at a given time 𝑡 is represented by a unit vector Ψ(𝑡) ∈ ℋ. Note that if, for example, ℋ = 𝐿²(𝐑), then Ψ(𝑡) is a function on 𝐑, so we may prefer to think of Ψ as a function of two variables Ψ(𝑥, 𝑡). Note also, however, that it is sometimes useful to consider finite-dimensional (and therefore, more algebraic in flavor) Hilbert spaces, like 𝐂ⁿ with the dot product (Example 7.1.5).

(2) Axiom 2: Time evolution. For any isolated physical system with state space ℋ, the state Ψ(𝑡) of the system changes in one of the following (equivalent) ways:

• (2C) Continuous time evolution. There exists a selfadjoint linear operator 𝐻 on ℋ, called the Hamiltonian of the system, such that for any time 𝑡,

$$H(\Psi(t)) = i\hbar \frac{d\Psi}{dt}, \tag{14.4.3}$$

where the derivative is taken in the sense of Definition 14.4.2 and ℏ is Planck’s constant. (For simplicity, we have been pretending that ℏ = 1 in this book, but the reader should be aware of its presence in other sources.)

• (2D) Discrete time evolution. For any two times 𝑡₁ < 𝑡₂, there exists a unitary operator (see (12.4.3) in the statement of Theorem 12.4.3) 𝑈(𝑡₁, 𝑡₂) on ℋ such that

$$U(t_1, t_2)\Psi(t_1) = \Psi(t_2). \tag{14.4.4}$$

For an explanation (in one direction) as to why (2C) and (2D) describe the same idea, see Problem 14.4.1.

(3) Axiom 3: Observables. An observable quantity (position, momentum, spin, etc.) of our isolated system is represented by a selfadjoint operator 𝑀 on ℋ. The basic idea is that when we have a system that is currently in state Ψ and we measure the observable corresponding to 𝑀, the state of the system collapses into a randomly chosen (generalized) eigenstate (i.e., generalized eigenfunction) of 𝑀, with a probability distribution determined, roughly speaking, by the coordinates of Ψ relative to an orthogonal basis of (generalized) eigenstates. More specifically, here are the two cases of measurement we have discussed previously.

• Discrete spectrum. If {𝜓𝑛} is an orthonormal eigenbasis for 𝑀 with corresponding eigenvalues 𝜆𝑛, then as mentioned in Section 11.7, the only possible values of the observable are the 𝜆𝑛, and upon measurement in state Ψ ∈ ℋ, the state of the system collapses into the eigenstate 𝜓𝑛 corresponding to the observed value 𝜆𝑛 with probability |𝑐𝑛|², where 𝑐𝑛 is the coefficient of 𝜓𝑛 in the expansion of Ψ in this eigenbasis.

• Continuous spectrum. If ℋ = 𝐿²(𝐑) and 𝑀 is the multiplication operator 𝑀(𝑓(𝑥)) = 𝑥𝑓(𝑥), then as mentioned in Section 13.5, the possible values of the observable are all 𝑥 ∈ 𝐑. If we measure the YES/NO question “Is the value of the observable 𝑥 ∈ [𝑎, 𝑏]?”, the answer is YES with probability

$$\int_a^b |\Psi(x, t)|^2\,dx,$$

and when the answer is YES, the state of the system collapses to the state Ψ₁ = Ψ₀/‖Ψ₀‖, where

$$\Psi_0(x) = \begin{cases} \Psi(x, t) & \text{if } a \le x \le b, \\ 0 & \text{otherwise.} \end{cases} \tag{14.4.5}$$

In general, an observable, or actually any selfadjoint operator, can have a mix of the two behaviors (eigenvalues and multiplication operators); see Reed and Simon [RS80, Thm. VIII.4] for a precise statement. For more on the physical interpretation of measurement, see Braginsky, Khalili, and Thorne [BKT92]. (4) Axiom 4: Composite systems. The last axiom is the most algebraic, so we only discuss it briefly and for the sake of completeness. Suppose ℋ1 and ℋ2 are the state spaces of two isolated physical systems. Then the state space of the composite system is the tensor product ℋ1 ⊗ ℋ2 . Without going into the details of the definition of ℋ1 ⊗ ℋ2 , suffice it to say that ℋ1 ⊗ ℋ2 is also a Hilbert space, and if {𝜙1𝑚 } and {𝜙2𝑛 } are orthogonal bases for ℋ1 and ℋ2 , respectively, then there exists an orthogonal basis {𝜙1𝑚 ⊗ 𝜙2𝑛 } for ℋ1 ⊗ ℋ2 . For finite-dimensional ℋ1 and ℋ2 , this means that instead of adding the dimensions of ℋ1 and ℋ2 to get the dimension of ℋ1 ⊗ ℋ2 , we multiply them.
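In coordinates, the tensor product of finite-dimensional spaces can be computed with the Kronecker product, and the multiplication of dimensions in Axiom 4 can be seen directly; a small sketch (in NumPy, our choice of tool):

```python
import numpy as np

# Standard basis vectors of the single-qubit state space C^2.
e0, e1 = np.eye(2)
basis_1q = [e0, e1]

# The composite (two-qubit) state space is C^2 (x) C^2 = C^4, with
# basis given by Kronecker products of the one-qubit basis vectors.
basis_2q = [np.kron(a, b) for a in basis_1q for b in basis_1q]

assert len(basis_2q) == 4                          # 2 * 2, not 2 + 2
assert np.allclose(np.array(basis_2q), np.eye(4))  # an orthonormal basis of C^4
```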


It turns out that one of the more interesting recent applications of this abstract framework for quantum mechanics comes in the study of quantum computing. In one common approach (see Nielsen and Chuang [NC11]), the fundamental unit of quantum computing is the qubit, which is a quantum system with state space ℋ = 𝐂². The Hamiltonian of such a system is represented by a 2 × 2 Hermitian matrix 𝐻, and Axiom 2 becomes

$$H(\Psi(t)) = i\hbar \frac{d\Psi}{dt}, \tag{14.4.6}$$

where both sides are interpreted in terms of functions with values in 𝐂².

In quantum computation, we think of a qubit as the quantum analogue of a 0/1 bit from classical computation. Thanks to Axiom 4, the complexity (dimension) of an 𝑛-qubit quantum system increases exponentially as a function of 𝑛 (in fact, dim = 2ⁿ), as opposed to the linear growth of a classical 𝑛-bit system, which is why certain apparently exponential problems, like factoring integers, can, in theory, be solved efficiently on a quantum computer. Probably the most notable “killer app” of quantum computation is Shor’s celebrated factoring algorithm, which, if implemented in a scalable fashion, would render worthless most of the public-key encryption schemes used in almost every secure Internet transaction. We mention Shor’s algorithm because the heart of that algorithm is another exponential speedup in the time required to compute a certain very specialized DFT ((log 𝑁)³ vs. 𝑁 log 𝑁 for the FFT). Compare Section 14.2, and see Nielsen and Chuang [NC11, Ch. 5] for details.

For the reader interested in learning quantum mechanics from a physics point of view, see Griffiths [Gri04] and Shankar [Sha94]. For graduate-level mathematical introductions to quantum mechanics, see Hall [Hal13] and Teschl [Tes09], and for the operator theory used in quantum mechanics, see Reed and Simon [RS80].
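The two forms of Axiom 2 can be explored numerically for a single qubit: for a constant Hamiltonian 𝐻, the solution of (14.4.6) is Ψ(𝑡) = 𝑒^(−𝑖𝐻𝑡/ℏ)Ψ(0), and the matrix exponential of a Hermitian matrix is easy to compute from its spectral decomposition. A sketch (with ℏ = 1, as in this book; the function name is ours):

```python
import numpy as np

def evolve(H, psi0, t, hbar=1.0):
    """Solve i*hbar dPsi/dt = H Psi for a constant Hermitian H by
    diagonalizing: Psi(t) = V exp(-i E t / hbar) V* Psi(0)."""
    E, V = np.linalg.eigh(H)
    U = V @ np.diag(np.exp(-1j * E * t / hbar)) @ V.conj().T
    return U @ psi0

# A single qubit (state space C^2) driven by the Pauli-X Hamiltonian.
H = np.array([[0.0, 1.0], [1.0, 0.0]])
psi0 = np.array([1.0, 0.0], dtype=complex)
psi = evolve(H, psi0, t=0.3)

# Unitary evolution (Axiom 2) preserves the norm of the state.
assert np.isclose(np.linalg.norm(psi), 1.0)
```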

Problems.

14.4.1. In this problem, if 𝐴 is a complex matrix, then 𝐴* denotes the conjugate transpose of 𝐴 (i.e., take the transpose of 𝐴 and the complex conjugate of its entries). You may also take it as given that if 𝐴 and 𝐵 are complex matrices and the product 𝐴𝐵 is defined, then (𝐴𝐵)* = 𝐵*𝐴*; and if furthermore, the entries of 𝐴 and 𝐵 are differentiable functions of 𝑡, then

$$\frac{d}{dt}(AB) = \frac{dA}{dt}B + A\frac{dB}{dt}. \tag{14.4.7}$$

(Note that since matrices do not commute in general, we must be careful of the order of multiplication here.)

(a) Let ⟨𝑥, 𝑦⟩ denote the standard dot product in 𝐂ⁿ (Example 7.1.5). Prove that for 𝑥, 𝑦 ∈ 𝐂ⁿ, ⟨𝐴𝑥, 𝑦⟩ = ⟨𝑥, 𝐴*𝑦⟩.

(b) Recall that for 𝑛 × 𝑛 matrices 𝑈 and 𝐻, to say that 𝑈 is unitary means that ⟨𝑈𝑥, 𝑈𝑦⟩ = ⟨𝑥, 𝑦⟩ for all 𝑥, 𝑦 ∈ 𝐂ⁿ, and to say that 𝐻 is Hermitian means that ⟨𝐻𝑥, 𝑦⟩ = ⟨𝑥, 𝐻𝑦⟩ for all 𝑥, 𝑦 ∈ 𝐂ⁿ. Prove that 𝑈 is unitary if and only if 𝑈𝑈* = 𝐼 and 𝐻 is Hermitian if and only if 𝐻 = 𝐻*.

(c) Now suppose 𝑈(𝑡) is a family of unitary matrices whose entries are differentiable functions of 𝑡. Prove that $i\frac{dU}{dt}U^*$ is Hermitian.

(d) Suppose Ψ(𝑡) = 𝑈(𝑡)Ψ₀, where 𝑈(𝑡) is a family of unitary matrices whose entries are differentiable functions of 𝑡 and Ψ₀ ∈ 𝐂ⁿ is some (constant) initial state. Prove that $i\frac{d\Psi}{dt} = H(t)\Psi(t)$ for some family 𝐻(𝑡) of Hermitian matrices.
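A numerical check of parts (c)-(d) (evidence, not a proof): for 𝑈(𝑡) = 𝑒^(−𝑖𝑡𝐻) with 𝐻 Hermitian, one has 𝑖(𝑑𝑈/𝑑𝑡)𝑈* = 𝐻, which we can confirm with a finite-difference derivative.

```python
import numpy as np

def U(t, H):
    """exp(-i t H) for Hermitian H, via the spectral decomposition."""
    E, V = np.linalg.eigh(H)
    return V @ np.diag(np.exp(-1j * E * t)) @ V.conj().T

H = np.array([[1.0, 2.0 - 1.0j],
              [2.0 + 1.0j, -1.0]])        # a sample Hermitian matrix
t, h = 0.7, 1e-6
dU = (U(t + h, H) - U(t - h, H)) / (2 * h)   # centered finite difference
A = 1j * dU @ U(t, H).conj().T               # the matrix from part (c)

assert np.allclose(A, A.conj().T, atol=1e-5)  # Hermitian, as the problem claims
assert np.allclose(A, H, atol=1e-5)           # and in fact it is H itself
```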

14.5 What’s next: Spectra and number theory

Recall that in Chapter 11, we solved the heat and wave equations in one physical dimension, and the key point was to find the eigenvalues of the 1-dimensional Laplacian $\Delta = -\frac{\partial^2}{\partial x^2}$, subject to our desired boundary conditions, such as the Dirichlet conditions that 𝑢(𝑥, 𝑡) = 0 when 𝑥 is on the boundary of the physical region in question. In this section, we begin by asking, “What happens if we try to solve the heat and wave equations on a domain 𝑈 in 𝐑ⁿ?”

First, the PDEs for the heat and wave equations are not that different from their 1-dimensional versions. Specifically, if we define the 𝑛-dimensional Laplacian to be

$$\Delta = -\frac{\partial^2}{\partial x_1^2} - \cdots - \frac{\partial^2}{\partial x_n^2}, \tag{14.5.1}$$

then the heat and wave equations become

$$\Delta(u) = -\frac{\partial u}{\partial t}, \qquad \Delta(u) = -\frac{\partial^2 u}{\partial t^2}, \tag{14.5.2}$$

respectively. It follows that if we know the eigenvalues of Δ (again, subject to our desired boundary conditions for our domain 𝑈), then the eigenbasis method of Section 11.2 works just as well as in the 1-dimensional versions, and we can proceed as before.

However, as for computing those eigenvalues, because the geometry of a 2- or 3-dimensional domain 𝑈 can be far more complicated than that of an interval, the geometry of 𝑈 and its boundary becomes a dominant factor. For clarity, let us focus on one specific problem:

Dirichlet eigenvalue problem: For a given domain 𝑈 ⊆ 𝐑² whose boundary is a simple closed curve, find all nonzero 𝑢 ∶ 𝑈 → 𝐂 such that Δ(𝑢) = 𝜆𝑢 on the interior of 𝑈 and 𝑢(𝑥, 𝑦) = 0 for all (𝑥, 𝑦) on the boundary of 𝑈.

The reader who now pauses to draw a few random examples of closed curves can probably imagine how complicated the geometry of 𝑈 and its boundary can get. Indeed, aside from the cases of a few highly symmetric domains like rectangles and discs (see Holland [Hol07, Ch. 7]), computing exact values of eigenfunctions and eigenvalues is an intractable task. What we can do instead is to try to understand the qualitative behavior, especially the asymptotic behavior, of the eigenvalues of the Laplacian on a given domain 𝑈. To focus on one classic example, Kac [Kac66] famously asked: Can you hear the shape of a drum? That is, if the Laplacian (with Dirichlet boundary conditions) has exactly the same eigenvalues on two domains 𝑈₁ and 𝑈₂, is it always the case that 𝑈₁ is congruent to 𝑈₂?
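For domains where no exact answer is available, eigenvalues are typically approximated numerically. On the unit square the exact Dirichlet eigenvalues π²(𝑚² + 𝑛²) are known, so a finite-difference discretization (a standard numerical method, though not one developed in this book) can be tested against them:

```python
import numpy as np

# Finite-difference approximation of the Dirichlet eigenvalues of the
# Laplacian on the unit square, checked against the exact values
# pi^2 (m^2 + n^2).
N = 40                                   # interior grid points per side
h = 1.0 / (N + 1)
main = 2.0 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
L1 = main / h**2                         # 1-D second-difference matrix
I = np.eye(N)
L2 = np.kron(L1, I) + np.kron(I, L1)     # 2-D Laplacian on the grid
evals = np.sort(np.linalg.eigvalsh(L2))

exact = np.sort([np.pi**2 * (m**2 + n**2)
                 for m in range(1, 6) for n in range(1, 6)])

# The lowest few discrete eigenvalues approach the exact ones.
assert np.allclose(evals[:4], exact[:4], rtol=0.01)
```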


This is called “hearing” the shape of a drum because the sound of an idealized drum of shape 𝑈 is determined by the eigenvalues of the Laplacian on 𝑈 with Dirichlet boundary conditions. Now, while this may seem unlikely at first, Kac points out that one can “hear” the area and the perimeter of 𝑈, and one can even hear whether 𝑈 is perfectly circular, making the question a little less clear. Indeed, it was not until about 25 years later that isospectral (i.e., same eigenvalues) but noncongruent drums were constructed by Gordon, Webb, and Wolpert [GWW92a, GWW92b]; moreover, Zelditch [Zel00, Zel09] later showed that if you know that a drum satisfies some mild symmetry conditions (e.g., having at least one mirror symmetry), then remarkably, you can hear its shape.

Going in another direction, recall (Subsection 8.5.3) that for ℜ(𝑠) > 1, the Riemann zeta function is defined by

$$\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}. \tag{14.5.3}$$

Remarkably, this definition can be extended uniquely to a holomorphic (i.e., complex differentiable) function on all 𝑠 ≠ 1 in 𝐂. This extended function is known to have zeros when 𝑠 is a negative even integer (the so-called trivial zeros) and when ℜ(𝑠) = 1/2 (zeros on the critical line). The famous Riemann hypothesis conjectures that the only nontrivial zeros of 𝜁(𝑠) lie on the critical line. If proven, the Riemann hypothesis would have immense consequences in number theory, especially with respect to the distribution of prime numbers; see, for example, Iwaniec and Kowalski [IK04].

Back in the 1910s, Hilbert and Pólya suggested that the Riemann hypothesis might hold because the zeros of the zeta function were somehow connected with the eigenvalues of an unknown selfadjoint operator. Starting in the mid-1970s, this connection was made concrete by showing that the distribution of the imaginary parts of the zeros of the zeta function on the critical line has the same statistical properties as the eigenvalues of a random unitary matrix (the Montgomery-Odlyzko law); see Rudnick and Sarnak [RS96], and see Katz and Sarnak [KS99] for a survey. Notably, in 2017, Bender, Brody, and Müller [BBM17] proposed a specific candidate, backed up by heuristic arguments, for an operator proving the Riemann hypothesis. To quote their abstract:

A Hamiltonian operator Ĥ is constructed with the property that if the eigenfunctions obey a suitable boundary condition, then the associated eigenvalues correspond to the nontrivial zeros of the Riemann zeta function. . . . A heuristic analysis is presented for the construction of the metric operator to define an inner-product space, on which the Hamiltonian is Hermitian. If the analysis presented here can be made rigorous to show that Ĥ is manifestly selfadjoint, then this implies that the Riemann hypothesis holds true.
As of this writing, their heuristic has yet to be turned into a rigorous proof, but perhaps the reader will be inspired to look for one! For an introduction to “drum”-related spectral problems, see Büser [Büs10]; for an introduction to number-theoretic spectral problems, see Terras [Ter13].
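Kac’s remark that one can “hear” the area is a consequence of Weyl’s law: for a planar domain, the number of Dirichlet eigenvalues below 𝐿 grows like Area · 𝐿/(4π). On the unit square, where the eigenvalues are exactly π²(𝑚² + 𝑛²), this can be checked by direct counting:

```python
import numpy as np

def eigenvalue_count(L):
    """Count the Dirichlet eigenvalues pi^2 (m^2 + n^2) of the unit
    square that are at most L (with multiplicity)."""
    M = int(np.sqrt(L) / np.pi) + 1
    return sum(1 for m in range(1, M + 1) for n in range(1, M + 1)
               if np.pi**2 * (m**2 + n**2) <= L)

# Weyl's law predicts eigenvalue_count(L) ~ Area * L / (4*pi); here
# Area = 1, so the ratio below should tend to 1 as L grows.
L = 40000.0
ratio = eigenvalue_count(L) / (L / (4 * np.pi))
assert abs(ratio - 1.0) < 0.05
```

(The ratio approaches 1 from below; the deficit reflects the next term of Weyl's law, which "hears" the perimeter.)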


14.6 What’s next: Harmonic analysis on groups

In this final section, we put the theoretical structure of this book in an abstract framework that connects with some truly deep current mathematics. We begin with a brief summary of some relevant material from a first course in abstract algebra, which the reader should regard as a list of necessary background material and not as an attempt to explain that material.

Definition 14.6.1. A group is a set 𝐺 along with a binary operation ∗ that satisfies three axioms.

• Associativity: For any 𝑎, 𝑏, 𝑐 ∈ 𝐺, (𝑎 ∗ 𝑏) ∗ 𝑐 = 𝑎 ∗ (𝑏 ∗ 𝑐).

• Identity: There exists some 𝑒 ∈ 𝐺 such that 𝑒 ∗ 𝑎 = 𝑎 = 𝑎 ∗ 𝑒 for all 𝑎 ∈ 𝐺.

• Inverse: For any 𝑎 ∈ 𝐺, there exists some 𝑏 ∈ 𝐺 such that 𝑎 ∗ 𝑏 = 𝑒 = 𝑏 ∗ 𝑎.

If a group 𝐺 also satisfies 𝑎 ∗ 𝑏 = 𝑏 ∗ 𝑎 for all 𝑎, 𝑏 ∈ 𝐺, we say that 𝐺 is an abelian group; otherwise, we say that 𝐺 is nonabelian.

Example 14.6.2. The set 𝑆¹ = {𝑒^(2𝜋𝑖𝑥) ∣ 𝑥 ∈ [0, 1]} with the operation of multiplication and the set 𝐑 with the operation of + are both abelian groups. Let 𝑀𝑛(𝐑) be the set of all 𝑛 × 𝑛 matrices with real entries, let 𝐼𝑛 be the 𝑛 × 𝑛 identity matrix, and let 𝐴ᵀ denote the transpose of 𝐴. The sets

$$SO_3(\mathbf{R}) = \{A \in M_3(\mathbf{R}) \mid AA^T = I\}, \tag{14.6.1}$$

$$SL_2(\mathbf{R}) = \{A \in M_2(\mathbf{R}) \mid \det A = 1\}, \tag{14.6.2}$$

with the operation of matrix multiplication, are nonabelian groups.

The following example also uses ideas from axiom-based linear algebra (Appendix B).

Example 14.6.3. If 𝑉 is a (complex) vector space of dimension > 1 (possibly dim 𝑉 = ∞), then the set 𝐺𝐿(𝑉) of all invertible linear operators on 𝑉, with the operation of composition, is a group. If ℋ is a Hilbert space, then the set 𝑈(ℋ) of all unitary linear operators 𝑇 on ℋ (i.e., operators such that ⟨𝑇(𝑓), 𝑇(𝑔)⟩ = ⟨𝑓, 𝑔⟩ for all 𝑓, 𝑔 ∈ ℋ) is a group.

The first part of the next definition should be familiar to the reader who has taken abstract algebra, but the rest of the definition may be less so.

Definition 14.6.4. A homomorphism from a group 𝐺 to a group 𝐻 is a map 𝜙 ∶ 𝐺 → 𝐻 such that 𝜙(𝑔₁𝑔₂) = 𝜙(𝑔₁)𝜙(𝑔₂) for all 𝑔₁, 𝑔₂ ∈ 𝐺. A (linear) representation of 𝐺 is a homomorphism 𝜙 ∶ 𝐺 → 𝐺𝐿(𝑉) for some vector space 𝑉, and a unitary representation of 𝐺 is a homomorphism 𝜙 ∶ 𝐺 → 𝑈(ℋ) for some Hilbert space ℋ. The dimension of a representation 𝜙 ∶ 𝐺 → 𝐺𝐿(𝑉) is the dimension of its underlying vector space 𝑉.

We also have the following definition, which even a reader who has experience with both abstract algebra and topology may not have seen before. (Note that this is actually only a special case of the usual definition, chosen to avoid discussing point-set topology.)


Definition 14.6.5. If 𝐺 is a group with operation ∗ that is also a subset of a metric space 𝑋, to say that 𝐺 is a topological group means that the operations of ∗ and inversion define continuous maps ∗ ∶ 𝐺 × 𝐺 → 𝐺 and ⁻¹ ∶ 𝐺 → 𝐺. In that case, to say that 𝐺 is compact means that 𝐺 is a compact subset of 𝑋 (Definition 2.6.6), and to say that 𝐺 is locally compact means that for every 𝑔 ∈ 𝐺, there is an open neighborhood 𝑈 of 𝑔 (Definition 2.6.1) whose closure (the intersection of every closed set containing 𝑈) is compact.

Example 14.6.6. It can be shown that the groups 𝑆¹ and 𝑆𝑂₃(𝐑) are compact and the groups 𝐑 and 𝑆𝐿₂(𝐑) are locally compact but not compact.

The above background material provides the language we now use to describe abstract harmonic analysis, the generalization of Fourier analysis to many topological groups 𝐺. For example, when 𝐺 is abelian, we have the following key idea.

Definition 14.6.7. For an abelian group 𝐺, let 𝐺̂ be the set of all 1-dimensional unitary representations of 𝐺 (which can be shown to be precisely the homomorphisms 𝜙 ∶ 𝐺 → 𝑆¹). We call 𝐺̂ the dual group of 𝐺, as it can be shown that 𝐺̂ has the structure of an abelian group under the operation of multiplication of functions.

When 𝐺 is a compact abelian group, 𝐺̂ is countable, and functions on 𝐺 are the 𝐿² limit of a generalized Fourier series; in other words, we have much the same situation as Fourier series on 𝑆¹. In fact, if 𝐺 = 𝑆¹, then 𝐺̂ = 𝐙, and we get ordinary Fourier series on 𝑆¹. If 𝐺 is a locally compact, but not compact, abelian group, 𝐺̂ will be more like 𝐑, and instead of Fourier series, we have something more like the Fourier transform on 𝐑. In fact, if 𝐺 = 𝐑, then 𝐺̂ = 𝐑, and we get the ordinary Fourier transform on 𝐑. See Loomis [Loo11] for an account.

When 𝐺 is a compact nonabelian group, we again get series-type behavior, but instead of only 1-dimensional representations, we need to use representations of finite (but possibly arbitrarily high) dimension.
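For the finite abelian group 𝐺 = 𝐙/𝑁 (not treated in detail here, but perhaps the simplest possible example), the dual group consists of the 𝑁 characters 𝜒ₖ(𝑛) = 𝑒^(2𝜋𝑖𝑘𝑛/𝑁), and expanding a function in this basis is exactly the DFT of Section 14.2. A quick numerical check (in NumPy, our choice) of the homomorphism and orthogonality properties:

```python
import numpy as np

# The N characters of Z/N, i.e., the homomorphisms Z/N -> S^1.
N = 12
n = np.arange(N)
chi = [np.exp(2j * np.pi * k * n / N) for k in range(N)]

# Each chi_k is a homomorphism: chi_k(a + b) = chi_k(a) chi_k(b) ...
k, a, b = 5, 3, 10
assert np.isclose(chi[k][(a + b) % N], chi[k][a] * chi[k][b])

# ... and distinct characters are orthogonal in L^2(G).
assert np.isclose(np.vdot(chi[2], chi[7]), 0.0, atol=1e-9)
assert np.isclose(np.vdot(chi[4], chi[4]), N)
```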
To give a concrete example, take 𝐺 = 𝑆𝑂₃(𝐑). There (and indeed, in any locally compact group) one can define an analogue of the Lebesgue integral via what is known as Haar measure, and one can define spaces like 𝐿²(𝑆𝑂₃(𝐑)). Harmonic analysis on 𝑆𝑂₃(𝐑) can then be expressed as follows: For every 𝑛 ≥ 0, there exists a (2𝑛 + 1)-dimensional unitary representation 𝜙𝑛 of 𝐺 such that for any 𝑓 ∈ 𝐿²(𝑆𝑂₃(𝐑)), we have

$$f(g) = \sum_{n=0}^{\infty} (2n+1)\operatorname{tr}\bigl(\hat{f}(n)\phi_n(g)\bigr), \tag{14.6.3}$$

where $\hat{f}(n)$ is a (2𝑛 + 1) × (2𝑛 + 1) matrix-valued generalized Fourier coefficient, and tr is the matrix trace (sum of the diagonal entries). See Dym and McKean [DM85, Ch. 4] for the specifics of 𝑆𝑂₃(𝐑), and see Loomis [Loo11, Ch. VIII] for the general compact nonabelian group.

When 𝐺 is a locally compact, but not compact, nonabelian group, the situation is much more complicated and is beyond the scope of this book. For an account of what happens with 𝐺 = 𝑆𝐿₂(𝐑) and other matrix groups over 𝐑 or 𝐂, see Knapp [Kna01]. For a matrix group 𝐺(𝑘) defined by a set of polynomial equations with a field 𝑘 as a parameter, or in other words, an affine algebraic group, the problem of finding a suitable


generalization of the Fourier transform is one of the central problems of what is known as the Langlands program (see Grojnowski [Gro08]). The reader should be cautioned, however, that describing the Langlands program as the search for a general notion of nonabelian Fourier transform is a bit like describing a nuclear reactor as a mechanism for boiling water: technically true, but missing out on the full flavor of what’s going on. Indeed, the Langlands program is something like a grand unified theory of classical mathematics, combining algebra, analysis, and number theory into a massive whole; in fact, with the possible exception of wavelets, every topic discussed in this chapter (including quantum mechanics; see Frenkel [Fre07]) is related to the Langlands program. Introductions to the Langlands program can be found in Gelbart [Gel84] and Knapp [Kna97]. However, perhaps the best demonstration to date of the potential power of the Langlands program is the proof of Fermat’s Last Theorem, which follows from the proof of the longstanding Taniyama-Shimura-Weil conjecture, which is in turn just one piece of the Langlands program; see Cornell, Silverman, and Stevens [CSS97] and Darmon [Dar99].

Appendix A

Rearrangements of series

In this appendix, we examine the question: When does the order of summation affect the convergence or divergence of a series? For convenience, we assume that (after renumbering) the domain of every sequence is 𝐍.

Definition A.1. A rearrangement of a sequence (𝑎𝑛) is a sequence (𝑏𝑛) such that

    𝑏𝑛 = 𝑎_{𝜎(𝑛)}    (A.1)

for some bijection 𝜎 ∶ 𝐍 → 𝐍. Similarly, if (𝑏𝑛) is a rearrangement of (𝑎𝑛), we also say that ∑ 𝑏𝑛 is a rearrangement of ∑ 𝑎𝑛.

Our main result is that for nonnegative series (Theorem A.2), or more generally, absolutely convergent series (Corollary A.3), convergence is independent of the order of summation.

Theorem A.2. Let (𝑎𝑛) be a sequence such that 𝑎𝑛 ≥ 0 for 𝑛 ∈ 𝐍, and let (𝑏𝑛) be a rearrangement of (𝑎𝑛). If ∑ 𝑎𝑛 converges, then ∑ 𝑏𝑛 converges.

Proof. Suppose 𝑏𝑛 = 𝑎_{𝜎(𝑛)} for some bijection 𝜎 ∶ 𝐍 → 𝐍. By the Cauchy criterion, we know that for any 𝜖 > 0, there exists some 𝑁_𝑎(𝜖) such that if 𝑚 > 𝑘 > 𝑁_𝑎(𝜖), then

    |∑_{𝑛=𝑘}^{𝑚} 𝑎𝑛| < 𝜖.    (A.2)

So now, for 𝜖 > 0, let

    𝑆(𝜖) = {𝑛 ∈ 𝐍 ∣ 𝜎(𝑛) ≤ 𝑁_𝑎(𝜖)}.    (A.3)

Since 𝜎 is a bijection, 𝑆(𝜖) is a finite set, so we may define 𝑁(𝜖) = max 𝑆(𝜖). Now suppose 𝑚 > 𝑘 > 𝑁(𝜖). Let

    𝑇 = {𝜎(𝑛) ∣ 𝑘 ≤ 𝑛 ≤ 𝑚}.    (A.4)

Since 𝜎 is a bijection, it maps the indices 𝑘, 𝑘 + 1, …, 𝑚 injectively into 𝑇, which is contained (possibly properly) in the set {𝑛′ ∣ min 𝑇 ≤ 𝑛′ ≤ max 𝑇}. Therefore, since

the 𝑎𝑛 are all nonnegative, we see that

    ∑_{𝑛=𝑘}^{𝑚} 𝑏𝑛 = ∑_{𝑛=𝑘}^{𝑚} 𝑎_{𝜎(𝑛)} ≤ ∑_{𝑛′=min 𝑇}^{max 𝑇} 𝑎_{𝑛′}.    (A.5)

However, since 𝑛 > max 𝑆(𝜖) for 𝑘 ≤ 𝑛 ≤ 𝑚, by definition of 𝑆(𝜖), we see that

    𝑁_𝑎(𝜖) < min 𝑇 ≤ max 𝑇.    (A.6)

Therefore, by (A.2),

    ∑_{𝑛=𝑘}^{𝑚} 𝑏𝑛 ≤ ∑_{𝑛′=min 𝑇}^{max 𝑇} 𝑎_{𝑛′} < 𝜖.    (A.7)

The theorem follows by the Cauchy criterion.

Corollary A.3. Any rearrangement ∑ 𝑏𝑛 of an absolutely convergent series ∑ 𝑎𝑛 also converges absolutely.

Proof. If ∑ 𝑎𝑛 converges absolutely, then ∑ |𝑏𝑛| converges because it is a rearrangement of the convergent nonnegative series ∑ |𝑎𝑛|. Therefore, ∑ 𝑏𝑛 converges absolutely.

If ∑ 𝑎𝑛 converges conditionally, then rearrangements are completely unpredictable. To be precise, we have the following remarkable result, due to Riemann.

Theorem A.4 (Riemann Rearrangement Theorem). If ∑ 𝑎𝑛 is a real-valued series that converges conditionally, then for any 𝐿 ∈ 𝐑 ∪ {+∞, −∞}, there is a rearrangement of ∑ 𝑎𝑛 that converges to 𝐿.

Sketch of proof. For simplicity, assume 𝑎𝑛 is never 0. Let ∑ 𝑏𝑛 contain the positive terms of ∑ 𝑎𝑛, and let ∑ 𝑐𝑛 contain the negative terms. If both ∑ 𝑏𝑛 and ∑ 𝑐𝑛 were to converge, we would have

    ∑ |𝑎𝑛| = ∑ 𝑏𝑛 + |∑ 𝑐𝑛|,    (A.8)

and ∑ 𝑎𝑛 would converge absolutely. Furthermore, if ∑ 𝑏𝑛 = +∞ and ∑ 𝑐𝑛 is finite, then ∑ 𝑎𝑛 would diverge, and similarly for the case where ∑ 𝑏𝑛 is finite and ∑ 𝑐𝑛 = −∞. Therefore, it must be that ∑ 𝑏𝑛 = +∞ and ∑ 𝑐𝑛 = −∞.

So now, since ∑ 𝑎𝑛 converges, Corollary 4.1.10 implies that lim_{𝑛→∞} 𝑎𝑛 = 0, which in turn implies that any nonempty subset of {𝑎𝑛} must have an element of largest absolute value. We may therefore rearrange the 𝑏𝑛 and 𝑐𝑛 so they are both in decreasing order of size and assume, by symmetry, that 𝐿 ≥ 0. If 𝐿 < +∞, we arrange ∑ 𝑎𝑛 as follows:

(1) Begin with the minimum number of positive terms 𝑏𝑛 required to achieve a sum greater than 𝐿. (This is always possible, even after having already used any finite number of terms, because ∑ 𝑏𝑛 = +∞.)

(2) Then add the minimum number of negative terms 𝑐𝑛 required to bring the partial sum back down below 𝐿.

(3) Keep alternating: Add positive terms until we "overshoot" 𝐿, add negative terms until we "undershoot" 𝐿, and so on.

Without going into the epsilonic details, because the overshoot is always at most 𝑏𝑛 and the undershoot is always at most |𝑐𝑛|, both of which go to 0, this rearrangement has a sum that converges to 𝐿.

Similarly, if 𝐿 = +∞, we arrange ∑ 𝑎𝑛 as follows:

(1) Begin with the minimum number of positive terms 𝑏𝑛 required to achieve a sum greater than |𝑐1| + 1.

(2) Then add the negative term 𝑐1, giving a total greater than 1.

(3) Keep alternating: Add new positive terms to achieve a sum greater than |𝑐2| + 2, and then add 𝑐2; add new positive terms to get a sum greater than |𝑐3| + 3, and then add 𝑐3; and so on.

Again, omitting the details, this rearrangement sums to +∞.

Note that the following example shows that the issue of order of summation arises very naturally with Fourier series.

Example A.5. Consider 𝑓 ∶ 𝑆¹ → 𝐂 given by

    𝑓(𝑥) = 𝑥  for −1/2 ≤ 𝑥 < 1/2.    (A.9)

By Example 6.2.6 and Theorem 8.5.17, we see that

    𝑓(𝑥) = − ∑_{𝑛≠0} ((−1)ⁿ / (2𝜋𝑖𝑛)) 𝑒𝑛(𝑥)    (A.10)

for all 𝑥 ∈ 𝑆¹ except 𝑥 = ±1/2, as long as we sum the series synchronously (Definition 4.1.8). However, by Theorem A.4, we see that in general, we can rearrange (A.10) so that it sums to an "incorrect" value. To take a particularly extreme example, for 𝑥 = 0, the right-hand side of (A.10), summed synchronously, becomes

    (1/(2𝜋𝑖)) (1 − 1 − 1/2 + 1/2 + 1/3 − 1/3 − 1/4 + 1/4 + ⋯),    (A.11)

which indeed sums to 0 in that order but can be rearranged to sum to any purely imaginary number we like.
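The greedy procedure in steps (1)–(3) of the sketch of Theorem A.4 is easy to try numerically. The following short sketch (ours, not from the text; all names are illustrative) rearranges the conditionally convergent alternating harmonic series 1 − 1/2 + 1/3 − 1/4 + ⋯ so that its partial sums approach a chosen target 𝐿:

```python
# Greedy rearrangement of a conditionally convergent series, following
# steps (1)-(3) in the sketch of Theorem A.4: take positive terms until
# we overshoot the target, then negative terms until we undershoot it.

def rearranged_partial_sums(target, num_terms=10000):
    """Partial sums of a greedy rearrangement of 1 - 1/2 + 1/3 - ..."""
    pos = (1.0 / n for n in range(1, 10**9, 2))    # positive terms 1, 1/3, 1/5, ...
    neg = (-1.0 / n for n in range(2, 10**9, 2))   # negative terms -1/2, -1/4, ...
    s, sums = 0.0, []
    for _ in range(num_terms):
        # Overshooting? Add a negative term; otherwise keep adding positives.
        term = next(pos) if s <= target else next(neg)
        s += term
        sums.append(s)
    return sums

sums = rearranged_partial_sums(1.5)
print(sums[-1])  # close to 1.5 (within the size of the last term used)
```

Because the terms go to 0, the partial sums oscillate around the target with ever-smaller overshoots and undershoots, exactly as in the proof sketch; the original ordering, of course, sums to ln 2 ≈ 0.693 instead.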

Appendix B

Linear algebra

In this appendix, we briefly describe the basic theory of an abstract vector space over a field 𝐹, where 𝐹 = 𝐂 or 𝐑. This theory is not substantively necessary to the rest of this book, but the reader who has experience with linear algebra may appreciate the connections with that material. Indeed, we only ever use the case 𝐹 = 𝐂, even implicitly, but we include the case 𝐹 = 𝐑 because it takes no extra effort and may help the reader connect with prior experience.

We begin with the definition of an abstract vector space over the field of scalars 𝐹, where we will always assume that 𝐹 = 𝐂 or 𝐑.

Definition B.1 (Vector space). We define a vector space over 𝐹 to be

• a set 𝑉, whose elements are called vectors,
• a binary operation + ∶ 𝑉 × 𝑉 → 𝑉 (vector addition), written as 𝐯 + 𝐰 for 𝐯, 𝐰 ∈ 𝑉,
• an operation 𝐹 × 𝑉 → 𝑉 (scalar multiplication), written as 𝑎𝐯 for 𝑎 ∈ 𝐹, 𝐯 ∈ 𝑉,
• an element 𝟎 ∈ 𝑉 (zero element), and
• for each 𝐯 ∈ 𝑉, a vector −𝐯 ∈ 𝑉 (negative of a vector),

such that the following properties hold for all 𝐯, 𝐰, 𝐱 ∈ 𝑉 and 𝑟, 𝑠 ∈ 𝐹:

(A1) (𝐯 + 𝐰) + 𝐱 = 𝐯 + (𝐰 + 𝐱).
(A2) 𝐯 + 𝐰 = 𝐰 + 𝐯.
(A3) 𝐯 + 𝟎 = 𝐯.
(A4) 𝐯 + (−𝐯) = 𝟎.
(DL) 𝑟(𝐯 + 𝐰) = 𝑟𝐯 + 𝑟𝐰 and (𝑟 + 𝑠)𝐯 = 𝑟𝐯 + 𝑠𝐯.
(SMA) 𝑟(𝑠𝐯) = (𝑟𝑠)𝐯.
(SM1) 1𝐯 = 𝐯.

Example B.2 (Function spaces). For a nonempty set 𝑋, we define 𝐅(𝑋, 𝐹) to be the set

    𝐅(𝑋, 𝐹) = {𝑓 ∶ 𝑋 → 𝐹},    (B.1)

i.e., the set of all 𝐹-valued functions with domain 𝑋, and given 𝑓, 𝑔 ∈ 𝐅(𝑋, 𝐹) and 𝑎 ∈ 𝐹, we define 𝑓 + 𝑔, 𝑎𝑓, 𝟎, −𝑓 ∈ 𝐅(𝑋, 𝐹) by

    (𝑓 + 𝑔)(𝑥) = 𝑓(𝑥) + 𝑔(𝑥),
    (𝑎𝑓)(𝑥) = 𝑎𝑓(𝑥),
    𝟎(𝑥) = 0,    (B.2)
    (−𝑓)(𝑥) = −𝑓(𝑥).

One may then verify that all of the axioms of a vector space hold in 𝐅(𝑋, 𝐹) (Problem B.1).

When one defines an algebraic object called a foo, the next step is usually to define what a subfoo is, and vector spaces are no exception.

Definition B.3 (Subspace). Let 𝑉 be a vector space over 𝐹. To say that 𝑊 ⊆ 𝑉 is a subspace of 𝑉 means that 𝑊 is a vector space over 𝐹 with addition and scalar multiplication defined by restricting the operations of 𝑉.

As the reader may recall from linear algebra, while Definition B.3 may be the "morally correct" definition of subspace, in practice, the following theorem serves as an equivalent definition of subspace.

Theorem B.4 (Subspace Theorem). For a vector space 𝑉 over 𝐹 and 𝑊 ⊆ 𝑉, the following are equivalent:

(1) 𝑊 is a subspace of 𝑉.

(2) The following conditions all hold:
    • (Zero vector) 𝟎 ∈ 𝑊.
    • (Closed under addition) For all 𝐯, 𝐰 ∈ 𝑊, 𝐯 + 𝐰 ∈ 𝑊.
    • (Closed under scalar multiplication) For all 𝑎 ∈ 𝐹 and 𝐯 ∈ 𝑊, 𝑎𝐯 ∈ 𝑊.

Sketch of proof. Since the zero vector condition shows that 𝑊 is nonempty, the most interesting thing to check is that vector addition and scalar multiplication in 𝑉 are well-defined in 𝑊; however, this is precisely what the other two conditions say. The axioms of a vector space all follow because they hold for the larger set 𝑉.

The rest of linear algebra also works much as one might expect from previous experience, with one exception: We need to define a linear combination of an infinite set properly, as follows.

Definition B.5 (Linear combination). Let 𝑉 be a vector space over 𝐹, and let 𝑆 be a subset of 𝑉, where we do not assume that 𝑆 is finite. We define a linear combination of 𝑆 to be a restricted sum of the form ∑′_{𝐯∈𝑆} 𝑎_𝐯 𝐯 (𝑎_𝐯 ∈ 𝐹), where by restricted sum (the symbol ∑′) we mean that 𝑎_𝐯 = 0 except for finitely many 𝐯. In other words, a linear combination of 𝑆 is the sum of finitely many scalar multiples of vectors in 𝑆, thus ensuring that the sum is well-defined. Note that by definition, the only linear combination of the empty set is the zero vector 𝟎.
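The pointwise operations of Example B.2 are easy to make concrete. The following is a small numerical sketch (ours, not from the text; all names are illustrative) of 𝐅(𝑋, 𝐂) for a three-point set 𝑋, with a spot-check of the distributive law (DL):

```python
# Pointwise vector-space operations on F(X, C), as in Example B.2:
# vectors are functions X -> C, added and scaled value by value.

def add(f, g):
    return lambda x: f(x) + g(x)      # (f + g)(x) = f(x) + g(x)

def scale(a, f):
    return lambda x: a * f(x)         # (af)(x) = a f(x)

X = [0, 1, 2]                         # a small domain
f = lambda x: x + 1j                  # f(x) = x + i
g = lambda x: x * x                   # g(x) = x^2
r = 2 - 3j                            # a scalar in C

lhs = scale(r, add(f, g))             # r(f + g)
rhs = add(scale(r, f), scale(r, g))   # rf + rg
assert all(lhs(x) == rhs(x) for x in X)   # axiom (DL) holds pointwise
```

Each axiom of Definition B.1 reduces in the same way to the corresponding property of 𝐂 applied at every point of 𝑋, which is the content of Problem B.1.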

Definition B.6 (Linear independence). For a vector space 𝑉 over 𝐹 and 𝑆 ⊆ 𝑉, to say that 𝑆 is linearly independent means that if we have a linear combination ∑′_{𝐯∈𝑆} 𝑎_𝐯 𝐯 that is equal to 𝟎, then every coefficient 𝑎_𝐯 = 0.

Definition B.7 (Algebraic span). For a vector space 𝑉 over 𝐹 and 𝑆 ⊆ 𝑉, we define the algebraic span of 𝑆 to be the set of all linear combinations ∑′_{𝐯∈𝑆} 𝑎_𝐯 𝐯. (Note that again, each such linear combination really only involves finitely many 𝐯.) To say that 𝑆 algebraically spans 𝑉 means that the algebraic span of 𝑆 contains 𝑉 (and is therefore equal to 𝑉).

As the reader may recall:

Theorem B.8. For a vector space 𝑉 over 𝐹 and 𝑆 ⊆ 𝑉, the algebraic span of 𝑆 is a subspace of 𝑉.

Proof. Problem B.2.

Definition B.9 (Algebraic basis). For a vector space 𝑉 over 𝐹 and 𝑆 ⊆ 𝑉, to say that 𝑆 is an algebraic basis for 𝑉 means that 𝑆 algebraically spans 𝑉 and is linearly independent.

Remark B.10. In contrast, when we define orthogonal bases (Definition 7.3.14), we implicitly extend the definition of span to include "linear combinations" that are convergent infinite series. In fact, we can show that an infinite orthogonal basis for a Hilbert space ℋ is never an algebraic basis for ℋ (Problem B.3). As for linear independence, while it is not included explicitly in the definition of orthogonal basis, it follows from the properties of orthogonal sets of nonzero vectors; see Problem 7.3.2.

Finally, recall that, in some sense, the point of linear algebra is to study the following type of function.

Definition B.11 (Linear functions). Let 𝑉 and 𝑊 be vector spaces over 𝐹. To say that a function 𝑇 ∶ 𝑉 → 𝑊 is linear means that for all 𝐯, 𝐰 ∈ 𝑉 and 𝑐 ∈ 𝐹,

    𝑇(𝐯 + 𝐰) = 𝑇(𝐯) + 𝑇(𝐰),    𝑇(𝑐𝐯) = 𝑐𝑇(𝐯).    (B.3)
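As a quick concrete instance of Definition B.11, a matrix acting on 𝐂² by multiplication is linear; the following numerical spot-check of (B.3) is ours, not from the text:

```python
# The map T(v) = Av on V = W = C^2, for a fixed 2x2 complex matrix A,
# satisfies both conditions of (B.3); we check them coordinatewise.

A = [[1, 2j], [3, 4]]   # an arbitrary 2x2 complex matrix

def T(v):
    return [A[0][0] * v[0] + A[0][1] * v[1],
            A[1][0] * v[0] + A[1][1] * v[1]]

v = [1 + 1j, 2]
w = [3, -1j]
c = 2 - 5j

# Additivity: T(v + w) = T(v) + T(w)
vw = [v[0] + w[0], v[1] + w[1]]
assert T(vw) == [T(v)[0] + T(w)[0], T(v)[1] + T(w)[1]]

# Homogeneity: T(cv) = cT(v)
assert T([c * v[0], c * v[1]]) == [c * T(v)[0], c * T(v)[1]]
```

Both identities follow from the distributivity and commutativity of multiplication in 𝐂, which is exactly why matrix maps are the model examples of linear functions.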

Problems.

B.1. Prove that all of the axioms of a vector space hold in 𝐅(𝑋, 𝐹). (To avoid excess repetition, we suggest only doing a representative selection, such as (A1), (A4), (DL), and (SMA).)

B.2. (Proves Theorem B.8) For a vector space 𝑉 over 𝐹 and 𝑆 ⊆ 𝑉, prove that the algebraic span of 𝑆 is a subspace of 𝑉.

B.3. Let ℋ = ℓ²(𝐍) (Definition 5.3.2), and recall that ℋ is a Hilbert space (Theorem 7.6.2) with orthonormal basis ℬ = {𝑒𝑛 ∣ 𝑛 ∈ 𝐍} (Example 7.3.19).
(a) Prove that ℬ does not algebraically span ℋ.
(b) Prove that any infinite orthonormal basis for a Hilbert space ℋ does not algebraically span ℋ.

Appendix C

Bump functions

In this appendix, we prove Theorem 8.5.5, which is repeated below as Theorem C.4 for the convenience of the reader. Most of the work in proving Theorem C.4 is in the very first step.

Theorem C.1. The function 𝜙1 ∶ 𝐑 → 𝐑 defined by

    𝜙1(𝑥) = { 𝑒^{−1/𝑥}  for 𝑥 > 0,
            { 0          for 𝑥 ≤ 0    (C.1)

is in 𝐶^∞(𝐑).

Proof. Problem C.1.

Figure C.1. The "seed" 𝜙1(𝑥)

The function 𝜙1(𝑥), as shown in Figure C.1, can be thought of as the "seed" of all bump functions, as we can grow this seed into the bump functions we need using relatively straightforward calculus. For example, the following lemmas construct a "bump with compact support" 𝜙2 and a "smooth step function" 𝜙3, as shown in Figure C.2.

Figure C.2. The 𝐶^∞ functions 𝜙2 and 𝜙3

Lemma C.2. If 𝜙1 ∶ 𝐑 → 𝐑 is defined by (C.1), then the function 𝜙2 ∶ 𝐑 → 𝐑 defined by

    𝜙2(𝑥) = 𝜙1(𝑥)𝜙1(1 − 𝑥)    (C.2)

is a 𝐶^∞ function such that 𝜙2(𝑥) > 0 for 0 < 𝑥 < 1 and 𝜙2(𝑥) = 0 otherwise.

Proof. Problem C.2.

Lemma C.3. Let 𝜙2 ∶ 𝐑 → 𝐑 be defined by (C.2), and let 𝐴 = ∫_0^1 𝜙2(𝑥) 𝑑𝑥. Then the function 𝜙3 ∶ 𝐑 → 𝐑 defined by

    𝜙3(𝑥) = (1/𝐴) ∫_0^𝑥 𝜙2(𝑡) 𝑑𝑡    (C.3)

is an increasing 𝐶^∞ function such that 𝜙3(𝑥) = 0 for 𝑥 ≤ 0 and 𝜙3(𝑥) = 1 for 𝑥 ≥ 1.

Proof. Problem C.3.

Proving Theorem C.4 (illustrated in Figure C.3 for 𝑎 = 1, 𝑏 = 3, 𝛿 = 1) is now a matter of precalculus.

Figure C.3. The bump function 𝜙(𝑥) for 𝑎 = 1, 𝑏 = 3, 𝛿 = 1

Theorem C.4. For 𝑎 < 𝑏 and 𝛿 > 0, there exists some 𝜙 ∶ 𝐑 → 𝐑 such that:

(1) 𝜙 ∈ 𝐶^∞(𝐑).
(2) For 𝑎 ≤ 𝑥 ≤ 𝑏, 𝜙(𝑥) = 1.
(3) For 𝑎 − 𝛿 ≤ 𝑥 ≤ 𝑎 and 𝑏 ≤ 𝑥 ≤ 𝑏 + 𝛿, we have 0 ≤ 𝜙(𝑥) ≤ 1.
(4) For 𝑥 ≤ 𝑎 − 𝛿 and 𝑏 + 𝛿 ≤ 𝑥, 𝜙(𝑥) = 0.

Proof. Problem C.4.
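The whole chain 𝜙1 → 𝜙2 → 𝜙3 → 𝜙 can be traced numerically. The sketch below (ours, not from the text; the integral in (C.3) is approximated by a midpoint Riemann sum, and `bump` is one way to carry out the "precalculus operations" of Problem C.4) builds the bump of Figure C.3:

```python
import math

def phi1(x):
    """The 'seed' (C.1): e^{-1/x} for x > 0, and 0 for x <= 0."""
    return math.exp(-1.0 / x) if x > 0 else 0.0

def phi2(x):
    """The compactly supported bump (C.2), positive exactly on (0, 1)."""
    return phi1(x) * phi1(1 - x)

# Midpoint-rule approximations of A = \int_0^1 phi2 and of (C.3).
N = 2000
A = sum(phi2((k + 0.5) / N) for k in range(N)) / N

def phi3(x):
    """Smooth step (C.3): 0 for x <= 0, 1 for x >= 1, increasing between."""
    if x <= 0:
        return 0.0
    if x >= 1:
        return 1.0
    n = max(1, int(N * x))
    return sum(phi2((k + 0.5) * x / n) for k in range(n)) * (x / n) / A

def bump(x, a=1.0, b=3.0, delta=1.0):
    """A function as in Theorem C.4: 1 on [a, b], 0 outside [a-delta, b+delta]."""
    return phi3((x - (a - delta)) / delta) * phi3(((b + delta) - x) / delta)

print(bump(2.0), bump(0.0), bump(4.0))  # -> 1.0 0.0 0.0
```

Note that for 𝑥 in [𝑎, 𝑏] both factors in `bump` equal 1, and outside [𝑎 − 𝛿, 𝑏 + 𝛿] one factor vanishes, which is exactly conditions (2) and (4) of Theorem C.4.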

Problems.

C.1. (Proves Theorem C.1) Let 𝜙1 ∶ 𝐑 → 𝐑 be defined by (C.1).
(a) Prove that if 𝑝(𝑥) is a polynomial, then

    lim_{𝑥→0+} 𝑝(1/𝑥)𝑒^{−1/𝑥} = 0.    (C.4)

(b) Prove by induction on 𝑘 ≥ 0 that

    𝜙1^{(𝑘)}(𝑥) = { 𝑝𝑘(1/𝑥)𝑒^{−1/𝑥}  for 𝑥 > 0,
                  { 0                for 𝑥 ≤ 0,    (C.5)

where 𝑝𝑘(𝑢) is a polynomial defined recursively by

    𝑝0(𝑢) = 1,    𝑝_{𝑘+1}(𝑢) = 𝑢²(𝑝𝑘(𝑢) − 𝑝𝑘′(𝑢)).    (C.6)

C.2. (Proves Lemma C.2) Let 𝜙1 be defined by (C.5) and 𝜙2 ∶ 𝐑 → 𝐑 be defined by

    𝜙2(𝑥) = 𝜙1(𝑥)𝜙1(1 − 𝑥).    (C.7)

Prove that 𝜙2 is a 𝐶^∞ function such that 𝜙2(𝑥) > 0 for 0 < 𝑥 < 1 and 𝜙2(𝑥) = 0 otherwise.

C.3. (Proves Lemma C.3) Let 𝜙1 and 𝜙2 be defined by (C.5) and (C.7), respectively, let 𝐴 = ∫_0^1 𝜙2(𝑥) 𝑑𝑥, and let 𝜙3 ∶ 𝐑 → 𝐑 be defined by

    𝜙3(𝑥) = (1/𝐴) ∫_0^𝑥 𝜙2(𝑡) 𝑑𝑡.    (C.8)

Prove that 𝜙3 is an increasing 𝐶^∞ function such that 𝜙3(𝑥) = 0 for 𝑥 ≤ 0 and 𝜙3(𝑥) = 1 for 𝑥 ≥ 1.

C.4. (Proves Theorem C.4) Prove that for 𝑎 < 𝑏 and 𝛿 > 0, there exists some 𝜙 ∶ 𝐑 → 𝐑 such that:
(1) 𝜙 ∈ 𝐶^∞(𝐑).
(2) For 𝑎 ≤ 𝑥 ≤ 𝑏, 𝜙(𝑥) = 1.
(3) For 𝑎 − 𝛿 ≤ 𝑥 ≤ 𝑎 and 𝑏 ≤ 𝑥 ≤ 𝑏 + 𝛿, we have 0 ≤ 𝜙(𝑥) ≤ 1.
(4) For 𝑥 ≤ 𝑎 − 𝛿 and 𝑏 + 𝛿 ≤ 𝑥, 𝜙(𝑥) = 0.

Appendix D

Suggestions for problems

Suggestions for selected problems are listed here.

Problem 2.1.3(c): Consider the cases 𝑎 > 0, 𝑎 = 0, 𝑎 < 0.
Problem 2.1.5: Prove that the negations of each condition are equivalent.
Problem 2.2.2(b): Find a formula for 𝑧^{−1} in terms of 𝑧.
Problem 2.2.3: Prove that the properties in Theorem 2.1.3 cannot all possibly hold in 𝐂, no matter how we define ≤.
Problem 2.3.1: Show that |𝑧|²|𝑤|² − ℜ(𝑧𝑤)² is the square of a real number.
Problem 2.3.2: Apply Cauchy-Schwarz to the expression (|𝑧| + |𝑤|)² − |𝑧 + 𝑤|².
Problem 2.4.1: Choose an 𝜖 in the definition of limit that will ensure that 𝑎𝑛 is close to 𝐿 and, therefore, far enough away from 0.
Problem 2.4.3: Use 𝜖/2.
Problem 2.4.4: It follows from Definition 2.4.2 that 𝑛𝑘 ≥ 𝑘.
Problem 2.4.5: Contradiction.
Problem 2.4.8(a): 𝜖 = 1/𝑛.
Problem 2.4.8(b): Try contradiction.
Problem 2.4.9: Use the Arbitrarily Close Criterion to find some 𝑎𝑁 close to 𝑢.
Problem 2.4.11: 𝜖 = 1/𝑛.
Problem 2.4.12: Suggestion: Theorem 2.1.4.

Problem 2.5.1: Imitate the proof of Theorem 2.4.5, but replace the fact that all but finitely many terms in 𝑎𝑛 are close to 𝐿 with the fact that all but finitely many terms are close to some 𝑎𝑘.
Problem 2.5.3(b): Use Theorem 2.4.7. Note that by definition, a subsequence 𝑧_{𝑛𝑘} has real and imaginary parts 𝑎_{𝑛𝑘} and 𝑏_{𝑛𝑘} with the same 𝑛𝑘.
Problem 3.1.2: Sequential continuity.
Problem 3.1.3(b): Induction and Theorem 3.1.22.
Problem 3.1.4(a): Use Lemma 2.3.7.
Problem 3.1.5(a): Negate the definition of uniform continuity.
Problem 3.1.5(b): Use (a), Bolzano-Weierstrass, and sequential continuity to obtain a contradiction.
Problem 3.1.6: By the first part of the proof of Theorem 3.1.15, 𝑓 is bounded above; now let 𝑆 = {𝑓(𝑧) ∣ 𝑧 ∈ 𝑋}, and use the Arbitrarily Close Criterion (Theorem 2.4.13), Bolzano-Weierstrass, and sequential continuity to prove that there exists some 𝑑 ∈ 𝑋 such that 𝑓(𝑑) = sup 𝑆.
Problem 3.1.7: Use continuity of the absolute value and the Extreme Value Theorem.
Problem 3.1.8(a): Arbitrarily Close Criterion and Theorem 2.4.10.
Problem 3.1.8(b): Take 𝜖 = (𝑑 − 𝑓(𝑐))/2.
Problem 3.1.9(b): Bad sequence.
Problem 3.1.10: Select just one or two of the limit laws to prove, as the others are similar.
Problem 3.1.12: Each of 𝑓 and ℎ has an associated 𝛿(𝜖).
Problem 3.2.2(a): Compute lim_{𝑧→𝑎} 𝐸𝑓(𝑧).
Problem 3.2.3: Use local linearity to approximate 𝑓(𝑧)𝑔(𝑧) near 𝑎.
Problem 3.2.4: Take 𝑤 = 𝑓(𝑧) and 𝑏 = 𝑓(𝑎); use local linearity twice.
Problem 3.2.7(a): Mean Value Theorem, Theorem 3.2.10.
Problem 3.2.8(a): Mean Value Theorem.
Problem 3.2.8(b): Intermediate Value Theorem.
Problem 3.2.8(c): Mean Value Theorem.
Problem 3.2.8(d): Use the limit definition of 𝑔′ and part (c).
Problem 3.3.2(a): Use common refinements.
Problem 3.3.2(b): The middle inequality is the interesting part; by part (a), every upper sum is an upper bound for ℒ and every lower sum is a lower bound for 𝒰.

Problem 3.3.3(a): Use a right triangle.
Problem 3.3.3(b): Use the Arbitrarily Close Criterion (Theorem 2.4.13).
Problem 3.3.5: Prove that all upper Riemann sums 𝑈(𝑓; 𝑃) are equal and that all lower Riemann sums 𝐿(𝑓; 𝑃) are equal.
Problem 3.4.3: Use Lemma 3.3.10 and the fact that if 𝑃 is a partition of [𝑎, 𝑏] and 𝑄 is a partition of [𝑏, 𝑐], then 𝑃 ∪ 𝑄 is a partition of [𝑎, 𝑐].
Problem 3.4.4(a): Extreme Value Theorem.
Problem 3.4.4(c): Use Lemma 3.3.10.
Problem 3.4.5(a): Lemma 3.4.7.
Problem 3.4.5(b): Consider (𝑓(𝑥) + 𝑔(𝑥))².
Problem 3.4.5(e): Consider 𝑓₊(𝑥) = max(𝑓(𝑥), 0) and 𝑓₋(𝑥) = max(−𝑓(𝑥), 0), and express both sides of (3.4.27) in terms of 𝑓₊ and 𝑓₋.
Problem 3.4.6(b): Separate [𝑎, 𝑏] into [𝑐 − 𝛿, 𝑐 + 𝛿] and its complement.
Problem 3.5.2(b): What is 𝐻(𝑎)?
Problem 3.5.3(a): Chain rule.
Problem 3.5.3(b): Apply part (a) to some 𝐹(𝑥) such that 𝐹′(𝑥) = 𝑔(𝑥).
Problem 3.5.4: Combine the product rule and FTC II.
Problem 3.6.4: Convergent sequences are bounded.
Problem 3.6.5(a): L'Hôpital.
Problem 3.6.5(b): Induction and L'Hôpital.
Problem 3.6.7(b): Polar coordinates.
Problem 3.6.8(a): Fundamental Theorem of Calculus.
Problem 3.6.8(b): Why are 𝜕𝑓/𝜕𝑥 and 𝜕𝑓/𝜕𝑦 bounded on 𝑋?
Problem 3.6.9: Use Corollary 3.6.14 to bound |𝐼(𝑥1) − 𝐼(𝑥0)|.
Problem 4.1.2(d): Absolute convergence.
Problem 4.1.2(e): 𝑛th term test.
Problem 4.1.3(c): Theorem 3.6.12.
Problem 4.1.4(b): Compare ∑_{𝑛=𝑁}^{∞} 𝑎𝑛 with a geometric series.
Problem 4.1.4(d): 𝑛th term test.
Problem 4.1.5(b): Use the fact that 𝑇_{𝑘+1} ⊆ 𝑇𝑘.

Problem 4.2.1: You may compute the necessary integrals using "area under the curve" and a picture.
Problem 4.2.2: 𝑓0(𝑧) + (𝑓1(𝑧) − 𝑓0(𝑧)) + (𝑓2(𝑧) − 𝑓1(𝑧)) + ⋯.
Problem 4.3.3(a): Completeness of 𝐂.
Problem 4.3.3(b): Theorem 2.4.10.
Problem 4.3.3(c): Use part (b) of this problem and a well-chosen 𝜖0.
Problem 4.3.4: Take 𝜖 = 1.
Problem 4.3.5(a): Figure 4.3.1.
Problem 4.3.5(b): Given 𝜖 > 0, first choose an 𝑛, then a 𝛿.
Problem 4.3.6: Use Theorem 3.4.8 to bound |∫_𝑎^𝑏 𝑓𝑛(𝑥) 𝑑𝑥 − ∫_𝑎^𝑏 𝑓(𝑥) 𝑑𝑥|.
Problem 4.4.1(a): First prove that 𝑓(𝑅0) converges absolutely, and then use the 𝑀-test.
Problem 4.5.1(c): First consider partial sums.
Problem 4.5.4: Instead of using the power series for 𝐶(𝑥) and 𝑆(𝑥), use 𝐸(𝑖𝑥), and keep 𝐸(𝑖𝑥) in mind.
Problem 4.5.6(a): Arbitrarily Close Criterion and the inf definition of 𝜋.
Problem 4.5.6(c): Intermediate Value Theorem.
Problem 4.5.6(d): Why is 𝑆(𝜋/2) > 0?
Problem 4.5.6(e): Use 𝐸(𝜋𝑖/2).
Problem 4.6.4(c): Compute 𝑘′(𝑥).
Problem 4.6.5(b): Mean Value Theorem.
Problem 4.6.5(c): For 𝑦 > 1, use the Intermediate Value Theorem; then for 0 < 𝑦 < 1, use Problem 4.5.2.
Problem 4.6.6(a): Theorem 3.2.15.
Problem 4.7.1(b): Prove that there exists some 𝑎 such that for |𝑥| > 𝑎, |𝑥ⁿ𝑓(𝑥)| < 1, and then use the fact that 𝑓 is continuous on [−𝑎, 𝑎].
Problem 4.7.2: For each part, we may regard the differentiability condition of Definition 4.7.1 as following immediately from the sum and product rules, so it suffices to prove that the (equivalent) "rapid decay" conditions of Definition 4.7.1 hold.
Problem 4.8.1(a): Imitate the proof of Theorem 2.5.2.
Problem 4.8.1(b): Prove that the limit of the sequence ∫_𝑎^𝑛 𝑓(𝑥) 𝑑𝑥 exists; then for 𝑏 ∈ 𝐑, approximate ∫_𝑎^𝑏 𝑓(𝑥) 𝑑𝑥 by some term in that sequence.
Problem 4.8.2: Imitate the proof of Corollary 4.1.4.
Problem 4.8.3(a): Convert to polar coordinates and take advantage of the change of variables factor 𝑑𝑥 𝑑𝑦 = 𝑟 𝑑𝑟 𝑑𝜃.
Problem 4.8.4: Carefully take limits in Theorem 3.5.6.
Problem 4.8.5: Theorem 3.6.23 and Section 4.3.
Problem 4.8.6: Draw graphs and calculate.
Problem 4.8.7: Theorem 4.8.4.
Problem 4.8.8(a): Carefully combine the convergence of the integrals ∫_{−∞}^{∞} 𝑔𝑖(𝑦) 𝑑𝑦.
Problem 4.8.8(b): Use part (a) and imitate Problem 4.3.6.
Problem 5.1.1: By symmetry, we may take the integrals, or even the averages, on the positive halves [0, 1/2] and [0, 720] of the respective intervals.
Problem 5.1.2: See suggestion for Problem 5.1.1.
Problem 5.1.3: See suggestion for Problem 5.1.1.
Problem 5.2.2(a): This does not work if 𝑓, 𝑔, and ℎ are all constant.
Problem 5.2.2(b): Use the Sup Inequality Lemma (Lemma 2.1.6).
Problem 5.2.3: In other words, prove that for any 𝜖 > 0, there exists some 𝑁(𝜖) not depending on 𝑥 ∈ 𝑋 such that for all 𝑥 ∈ 𝑋, etc.
Problem 5.2.4: The case 𝑎 = 0 can be handled separately, so for 𝑎 ≠ 0, prove 𝑑(𝑎𝑓, 0) ≤ |𝑎|𝑑(𝑓, 0) and use symmetry.
Problem 5.3.3(a): Consider (|𝑎(𝑥)| − |𝑏(𝑥)|)².
Problem 5.3.3(b): Comparison.
Problem 6.1.1: Use the integral formulas in Section 4.6.
Problem 6.2.2: Theorem 4.3.14.
Problem 6.3.2(b): Decompose 𝑓̂(𝑛) = 𝑐𝑛 + 𝑑𝑛𝑖. Your proof should work equally well for the case 𝑛 = 0.
Problem 6.3.4: Combine (6.2.1) and (6.3.5).
Problem 6.3.5(a): You only need to check what happens at 0 and 1/2, or equivalently, 0 and −1/2.

Problem 6.3.5(b): Chain Rule.
Problem 6.3.6: Use Problem 6.3.4.
Problem 6.4.1: Integration by parts.
Problem 6.4.2(a): Use Corollary 3.1.16 to estimate (6.2.1).
Problem 6.4.2(b): Theorem 6.4.1.
Problem 6.4.4: Theorem 6.4.2 and Weierstrass 𝑀-test (Theorem 4.3.7).
Problem 7.1.2: Theorem 2.2.3.
Problem 7.1.3: Use Lemma 3.4.9.
Problem 7.1.6(c): Pythagorean Theorem.
Problem 7.1.7(b): Use Theorem 7.1.11.
Problem 7.1.8: Compare the square of both sides.
Problem 7.1.9(a): Use Cauchy-Schwarz in 𝐂^𝑁.
Problem 7.1.9(b): Theorem 2.4.14.
Problem 7.2.1(b): Calculate the integral as a function of 𝑘 = ⌊log₂(𝑛)⌋.
Problem 7.2.1(c): It is probably better to give a qualitative description of why this happens than to use formulas.
Problem 7.2.2: See the proof of Theorem 2.4.5.
Problem 7.2.3: See Section 2.4.
Problem 7.2.4: Cauchy-Schwarz.
Problem 7.2.5(b): Sequential definition of continuity.
Problem 7.2.6: See Problem 2.5.1.
Problem 7.3.1: Continuity of the inner product; compare Problem 5.3.2.
Problem 7.3.2: What is ⟨𝑐1𝑢1 + ⋯ + 𝑐𝑁𝑢𝑁, 𝑢𝑖⟩?
Problem 7.3.3: Theorem 7.1.9 and induction.
Problem 7.3.4(b): Pythagorean Theorem.
Problem 7.3.5: Calculate ‖proj_ℬ 𝑓‖²; Problem 7.3.4 may be helpful.
Problem 7.3.6: Best Approximation Theorem.
Problem 7.3.7: The main point is to prove that 𝑓 = ∑_{𝑛=1}^{𝑁} 𝑓̂(𝑛)𝑒𝑛, and since ℬ is finite, this question is purely algebraic in nature.

Problem 7.4.3: Recall that a countable union of countable collections is still a countable collection.
Problem 7.4.4(b): Recall that by the general theory of equivalence classes, if some 𝑈 ∈ 𝒞 is equivalent to some 𝑈′ ∈ 𝒞′, then 𝒞 = 𝒞′.
Problem 7.4.5(a): By symmetry, we may assume 𝑎′ ≤ 𝑎″, and then we have two cases.
Problem 7.4.5(b): Induction.
Problem 7.5.3: Measure zero.
Problem 7.5.4(c): Monotone convergence. Note that dominated convergence is not helpful here because it requires comparing 𝑓 to a known Lebesgue integrable function.
Problem 7.5.5(c): See Problem 7.5.4.
Problem 7.5.6: To prove that the improper integral ∫_1^∞ |𝑓(𝑥)|² 𝑑𝑥 is finite, compare 𝑓(𝑥) to a function of the form 𝐾|𝑥|^{−1}.
Problem 7.5.7: Imitate the proof of Theorem 7.5.13.
Problem 7.5.8: First truncate (Theorem 7.5.13), then approximate (Lebesgue Axiom 6), and then "make sure the ends are continuous".
Problem 7.5.9: Lebesgue Axiom 2.
Problem 7.5.10(a): Consider ⟨|𝑓|, 1⟩.
Problem 7.6.2(a): Bessel's inequality and monotone convergence for sequences/series, along with Problem 7.6.1.
Problem 7.6.2(b): Imitate the proof of the comparison test (Corollary 4.1.4).
Problem 7.6.3: Corollary 7.6.5.
Problem 7.6.4: Absolute convergence.
Problem 7.6.5: Absolute convergence.
Problem 7.6.6(b): Use the continuity of the inner product.
Problem 7.6.7: The left-hand side must converge to some 𝑔 ∈ ℋ (why?). What is ⟨𝑓 − 𝑔, 𝑢𝑘⟩?
Problem 8.2.1: Substitution and the periodicity of 𝑓. Be careful with the limits of integration.
Problem 8.2.2: Use the uniform continuity of 𝑓 on 𝑆¹ (why is that possible?) to show that |(𝑓 ∗ 𝑔)(𝑥) − (𝑓 ∗ 𝑔)(𝑎)| is small.
Problem 8.2.4: Substitution and translation invariance.

Problem 8.2.5: Substitution, translation invariance, Fubini's Theorem (Theorem 3.6.21).
Problem 8.2.6: Theorem 3.6.23.
Problem 8.2.7: Fubini, translation invariance.
Problem 8.3.2(a): Compute ⟨𝐷𝑛, 1⟩.
Problem 8.3.3(a): Use well-known properties of sine.
Problem 8.3.3(b): Use (8.3.16) and then compare with an integral on [−1/2, 1/2].
Problem 8.4.1: Use the fact that 𝑓 is uniformly continuous on 𝑆¹ (why?).
Problem 8.4.2: Note that the region of integration is [−1/2, −𝛿] ∪ [𝛿, 1/2]. Use the fact that 𝑓 is bounded on 𝑆¹ (why?).
Problem 8.4.3: Use 𝑓(𝑥) = ∫_{−1/2}^{1/2} 𝑓(𝑥)𝐾𝑛(𝑡) 𝑑𝑡 (why?) and compare (𝑓 ∗ 𝐾𝑛)(𝑥).
Problem 8.4.4: Use Theorem 7.6.8, and keep in mind the definition of "equals" in 𝐿²(𝑆¹).
Problem 8.4.5(a): Keep in mind that 𝑎𝑛 is two-sided.
Problem 8.4.5(b): Use the inner product in ℓ²(𝐙).
Problem 8.4.6: Theorem 6.4.1 and the Extra Derivative Lemma, Lemma 8.4.7.
Problem 8.5.1(a): Imitate Problems 6.2.2 and 6.4.4, and use Section 8.4.
Problem 8.5.1(b): Induction on ℓ and term-by-term differentiation.
Problem 8.5.2: Apply Theorem 8.5.1.
Problem 9.1.1: L'Hôpital's Rule (Theorem 3.6.5).
Problem 9.3.2(a): What are the columns of 𝐴𝑃 = 𝑃𝐷?
Problem 9.3.3: First prove that if 𝐷 = 𝑃⁻¹𝐴𝑃 is diagonal, then 𝐷 must be the zero matrix.
Problem 9.3.4: Write out 𝑣 as a linear combination of {𝑢1, …, 𝑢𝑛}, and use the fact that {𝑢1, …, 𝑢𝑛} is linearly independent.
Problem 9.3.5(b): Write down a formula for 𝑇⁻¹.
Problem 10.1.5: Find a sequence of vectors 𝑢𝑛 ∈ 𝔇(𝐷) such that ‖𝐷(𝑢𝑛)‖/‖𝑢𝑛‖ → ∞.
Problem 10.1.6(a): Take 𝜖 = 1, and for any 𝑓 ∈ 𝔇(𝑇), scale 𝑓 to ensure it is near 0.
Problem 10.2.3: Parseval.
Problem 10.2.4: Try a few linear combinations of the 𝑒𝑛.

Problem 10.2.6: See Example 10.2.11, and justify vanishing terms carefully.
Problem 10.2.7: See Example 10.2.11, and justify vanishing terms carefully.
Problem 10.2.8: Parseval.
Problem 10.3.1: Consider sines and cosines. Make sure your eigenfunctions satisfy the appropriate boundary conditions.
Problem 10.3.3: Suppose 𝑓 = ∑_{𝑛=1}^{∞} 𝑐𝑛𝑒𝑛, and prove that each of the 𝑐𝑛 = 0.
Problem 10.3.4: Let 𝑓 be an eigenvector with eigenvalue 𝜆, and consider ⟨𝑇(𝑓), 𝑓⟩.
Problem 10.3.5: See Problem 10.3.4.
Problem 10.3.6: Consider ⟨𝑇(𝑢𝑖), 𝑢𝑗⟩.
Problem 10.3.7: Use induction on 𝑛, and consider (𝑇 − 𝜆𝑛𝐼) applied to 𝑐1𝑢1 + ⋯ + 𝑐𝑛𝑢𝑛.
Problem 10.4.1: Compute the generalized Fourier coefficients of 𝑇(𝑓).
Problem 10.4.2(a): Compute the generalized Fourier coefficients of 𝑇(𝑓).
Problem 10.4.3(b): Theorem 10.4.4.
Problem 11.1.1: Imitate the proof of Theorem 11.1.3.
Problem 11.1.2(a): Theorem 6.4.1 and the Extra Derivative Lemma, Lemma 8.4.7.
Problem 11.1.3: Compute 𝜓𝑛(𝑡) from the definition and apply Theorem 3.6.23 and integration by parts.
Problem 11.2.1: Theorem 4.6.3.
Problem 11.2.2(a): Theorem 4.6.3.
Problem 11.4.1: See Section 10.4.
Problem 11.4.2: See Section 10.4.
Problem 11.4.3: (11.4.7).
Problem 11.4.4: (11.4.7) and integration by parts.
Problem 11.4.5: 𝑀-test on the even extension of 𝑔 and imitate the proof of Theorem 8.1.2 in Section 8.4.
Problem 11.8.1: As always, parts.
Problem 11.8.2: Use the fact from linear algebra that the determinant of a square matrix is 0 if and only if its rows are linearly dependent.
Problem 12.1.1: See Definition 7.5.1.
Problem 12.1.2: Definition 7.5.1 and Theorem 7.5.3.

Problem 12.1.3(b): Bound the integrand |𝑔(𝑥)| below by the "worst case" denominator.
Problem 12.1.3(c): Sum the result of (b).
Problem 12.1.4: Consider two cases: |𝑥| ≤ 2|𝑦| and |𝑥| ≥ 2|𝑦|.
Problem 12.1.5(a): Example 4.8.5.
Problem 12.1.5(b): Lemma 12.1.5.
Problem 12.2.1: For all parts: Finite substitution and take limits. Note the sign in (c).
Problem 12.2.2: Use Lemma 12.1.5 to bound |𝑥|ᵏ(𝑓 ∗ 𝑔)(𝑥).
Problem 12.2.3: Substitution and translation invariance.
Problem 12.2.4: Substitution, translation invariance, Lemma 12.1.6, and Fubini's Theorem (Theorem 4.8.11).
Problem 12.2.5: Theorem 4.8.8 and the fact that 𝑓 and 𝑓′ are bounded.
Problem 12.2.6: Use the results of Problems 12.2.2 and 12.2.5.
Problem 12.2.7: Corollary 4.7.3.
Problem 12.2.8: Use the fact that 𝑓 is bounded on 𝐑.
Problem 12.2.9: Use the fact that 𝑓(𝑥) = ∫_{−∞}^{∞} 𝑓(𝑥)𝐾𝑡(𝑦) 𝑑𝑦 (why?).
Problem 12.2.10(a): Theorem 4.8.6 and substitution.
Problem 12.2.10(b): Substitute 𝑦 = 𝑥/𝑡 and pay close attention to the limits of integration.
Problem 12.3.2: Lemma 12.2.1.
Problem 12.3.3(a): Parts.
Problem 12.3.3(b): Theorem 4.8.8.
Problem 12.3.4(c): Use Problem 4.7.2.
Problem 12.3.5: Lemma 12.1.6, Fubini's Theorem (Theorem 4.8.11), and Lemma 12.2.1.
Problem 12.3.6: Lemma 12.1.6 and Fubini's Theorem (Theorem 4.8.11). The choice of variables in (12.3.2) may also be helpful.
Problem 12.3.7(a): Use Theorems 3.6.23 and 12.3.3.
Problem 12.3.7(d): Use Theorem 12.3.3 instead of doing more integrals.
Problem 12.4.2: Pass the hat, inversion, and Lemma 12.4.5. You may also need the fact that the Fourier transform of the conjugate of ℎ at 𝛾 is the conjugate of ℎ̂(−𝛾) (why?).
Problem 12.5.1: Use the Isomorphism Theorem in 𝒮(𝐑) to prove that the sequence 𝑓𝑛̂ is Cauchy.
Problem 12.5.2: Put everything on one side.
Problem 12.5.4: Use the continuity of the inner product and the Isomorphism Theorem in 𝒮(𝐑).
Problem 12.5.5: Given lim_{𝑛→∞} 𝑓𝑛 = 𝑓, compute 𝑓̂ by finding a sequence of functions in 𝒮(𝐑) whose limit is 𝑓̂. Lemma 12.5.3 also helps.
Problem 13.1.2: Induction and parts; the base case 𝑛 = 1 is Problem 13.1.1.
Problem 13.1.3: Use previous results instead of integrating.
Problem 13.1.4: Either integrate or combine previous results.
Problem 13.5.1(a): ⟨𝑇(Ψ), Ψ⟩.
Problem 13.5.4(a): Theorem 13.5.3. Keep the implied Ψ ∈ ℋ0 with ‖Ψ‖ = 1 in mind.
Problem 13.5.4(b): Cauchy-Schwarz.
Problem 14.3.1: Consider the cases ⟨1, 𝑤𝑛𝑘⟩, ⟨𝑤𝑛𝑘, 𝑤𝑛ℓ⟩, and ⟨𝑤𝑛𝑘, 𝑤𝑚ℓ⟩ for 𝑛 < 𝑚, and keep Figure 14.3.1 in mind.
Problem 14.3.2(c): Use {𝜒𝑛𝑘} instead of ℬ and use the uniform continuity of 𝑔.
Problem 14.4.1(c): Differentiate both sides of 𝑈𝑈* = 𝐼.
Problem 14.4.1(d): Insert 𝐼 = 𝑈*𝑈 at an appropriate place.
Problem B.1: Remember that two functions with the same domain are equal precisely when they give the same output for any given input.
Problem B.3(a): Find a specific element of ℓ²(𝐍) not contained in the algebraic span of ℬ.
Problem B.3(b): Isomorphism Theorem for Fourier Series.
Problem C.1(a): Asymptotics (Section 3.6).
Problem C.1(b): Use the definition of the derivative at 𝑥 = 0.
Problem C.4: Do precalculus operations on the function 𝜙3 given by (C.8).

Bibliography

[AG08] M. A. Al-Gwaiz, Sturm-Liouville theory and its applications, Springer Undergraduate Mathematics Series, Springer-Verlag London, Ltd., London, 2008. MR2368365
[Ahl79] L. Ahlfors, Complex analysis, 3rd ed., McGraw-Hill, 1979.
[Apo69] T. M. Apostol, Calculus. Vol. II: Multi-variable calculus and linear algebra, with applications to differential equations and probability, 2nd ed., Blaisdell Publishing Co. Ginn and Co., Waltham, Mass.-Toronto, Ont.-London, 1969. MR0248290
[AW02] J. F. Alm and J. S. Walker, Time-frequency analysis of musical instruments, SIAM Rev. 44 (2002), no. 3, 457–476, DOI 10.1137/S00361445003822. MR1951367
[BBM17] C. M. Bender, D. C. Brody, and M. P. Müller, Hamiltonian for the zeros of the Riemann zeta function, Phys. Rev. Lett. 118 (2017), no. 13, 130201, 5, DOI 10.1103/PhysRevLett.118.130201. MR3685953
[BKT92] V. B. Braginsky, F. Y. Khalili, and K. S. Thorne, Quantum measurement, Cambridge University Press, 1992.
[BN10] J. Bak and D. J. Newman, Complex analysis, 3rd ed., Undergraduate Texts in Mathematics, Springer, New York, 2010. MR2675489
[Büs10] P. Buser, Geometry and spectra of compact Riemann surfaces, reprint of the 1992 edition, Modern Birkhäuser Classics, Birkhäuser Boston, Inc., Boston, MA, 2010. MR2742784
[Car66] L. Carleson, On convergence and growth of partial sums of Fourier series, Acta Math. 116 (1966), 135–157, DOI 10.1007/BF02392815. MR0199631
[Con78] J. B. Conway, Functions of one complex variable, 2nd ed., Graduate Texts in Mathematics, vol. 11, Springer-Verlag, New York-Berlin, 1978. MR503901
[Con96] J. B. Conway, Functions of one complex variable II, Graduate Texts in Mathematics, vol. 159, Springer-Verlag, 1996.
[CSS97] G. Cornell, J. H. Silverman, and G. Stevens (eds.), Modular forms and Fermat's last theorem, papers from the Instructional Conference on Number Theory and Arithmetic Geometry held at Boston University, Boston, MA, August 9–18, 1995, Springer-Verlag, New York, 1997. MR1638473
[Dar99] H. Darmon, A proof of the full Shimura-Taniyama-Weil conjecture is announced, Notices Amer. Math. Soc. 46 (1999), no. 11, 1397–1401. MR1723249
[Dau92] I. Daubechies, Ten lectures on wavelets, CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 61, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1992. MR1162107
[DD10] K. R. Davidson and A. P. Donsig, Real analysis and applications: Theory in practice, Undergraduate Texts in Mathematics, Springer, New York, 2010. MR2568574
[DLM] NIST digital library of mathematical functions, http://dlmf.nist.gov/, release 1.0.14 of 2016-12-21, F. W. J. Olver, A. B. Olde Daalhuis, D. W. Lozier, B. I. Schneider, R. F. Boisvert, C. W. Clark, B. R. Miller, and B. V. Saunders, eds.
[DM85] H. Dym and H. P. McKean, Fourier series and integrals, Academic Press, 1985.
[DS00] J. Dongarra and F. Sullivan, Guest editors' introduction: The top 10 algorithms, Computing in Science & Engineering 2 (2000), no. 1, 22–23.
[DS05] F. Diamond and J. Shurman, A first course in modular forms, Graduate Texts in Mathematics, vol. 228, Springer-Verlag, New York, 2005. MR2112196
[Edw01] H. M. Edwards, Riemann's zeta function, reprint of the 1974 original [Academic Press, New York; MR0466039 (57 #5922)], Dover Publications, Inc., Mineola, NY, 2001. MR1854455
[ER85] R. Eisberg and R. Resnick, Quantum physics of atoms, molecules, solids, nuclei, and particles, 2nd ed., Wiley & Sons, 1985.

[Eva10] L. C. Evans, Partial differential equations, 2nd ed., Graduate Studies in Mathematics, vol. 19, American Mathematical Society, Providence, RI, 2010. MR2597943
[Fey11] R. P. Feynman, The Feynman lectures on physics, New Millennium edition, Basic Books, 2011.
[Fle] P. J. Van Fleet, Image compression: How math led to the JPEG2000 standard, http://www.whydomath.org/node/wavlets/index.html, accessed: 2017-05-08.
[Fre07] E. Frenkel, Lectures on the Langlands program and conformal field theory, Frontiers in number theory, physics, and geometry. II, Springer, Berlin, 2007, pp. 387–533, DOI 10.1007/978-3-540-30308-4_11. MR2290768
[Fri08] A. Friedman, Partial differential equations of parabolic type, Dover Publications, 2008.
[Gal12] J. Gallian, Contemporary abstract algebra, 8th ed., Cengage, 2012.
[Gel84] S. Gelbart, An elementary introduction to the Langlands program, Bull. Amer. Math. Soc. (N.S.) 10 (1984), no. 2, 177–219, DOI 10.1090/S0273-0979-1984-15237-6. MR733692
[Gri04] D. J. Griffiths, Introduction to quantum mechanics, 2nd ed., Pearson Prentice Hall, 2004.
[Gro08] I. Grojnowski, Representation theory, in Timothy Gowers, June Barrow-Green, and Imre Leader, eds., The Princeton companion to mathematics, chapter IV.9, pp. 419–431, Princeton University Press, Princeton, NJ, 2008.
[GWW92a] C. Gordon, D. Webb, and S. Wolpert, Isospectral plane domains and surfaces via Riemannian orbifolds, Invent. Math. 110 (1992), no. 1, 1–22, DOI 10.1007/BF01231320. MR1181812
[GWW92b] C. Gordon, D. L. Webb, and S. Wolpert, One cannot hear the shape of a drum, Bull. Amer. Math. Soc. (N.S.) 27 (1992), no. 1, 134–138, DOI 10.1090/S0273-0979-1992-00289-6. MR1136137
[Haa10] A. Haar, Zur Theorie der orthogonalen Funktionensysteme (German), Math. Ann. 69 (1910), no. 3, 331–371, DOI 10.1007/BF01456326. MR1511592
[Hal13] B. C. Hall, Quantum theory for mathematicians, Graduate Texts in Mathematics, vol. 267, Springer, New York, 2013. MR3112817
[Har02] P. Hartman, Ordinary differential equations, corrected reprint of the second (1982) edition [Birkhäuser, Boston, MA; MR0658490 (83e:34002), with a foreword by Peter Bates], Classics in Applied Mathematics, vol. 38, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2002. MR1929104
[Hol07] S. S. Holland, Applied analysis by the Hilbert space method: An introduction with applications to the wave, heat, and Schrödinger equations, Dover Publications, 2007.
[Hör90] L. Hörmander, The analysis of linear partial differential operators. I, Distribution theory and Fourier analysis, 2nd ed., Springer Study Edition, Springer-Verlag, Berlin, 1990. MR1065136
[HS75] E. Hewitt and K. Stromberg, Real and abstract analysis: A modern treatment of the theory of functions of a real variable, 3rd printing, Graduate Texts in Mathematics, No. 25, Springer-Verlag, New York-Heidelberg, 1975. MR0367121
[IK04] H. Iwaniec and E. Kowalski, Analytic number theory, American Mathematical Society Colloquium Publications, vol. 53, American Mathematical Society, Providence, RI, 2004. MR2061214
[Joh15] W. Johnston, The Lebesgue integral for undergraduates, MAA Textbooks, Mathematical Association of America, Washington, DC, 2015. MR3363530
[Kac66] M. Kac, Can one hear the shape of a drum?, Amer. Math. Monthly 73 (1966), no. 4, 1–23, DOI 10.2307/2313748. MR201237
[Kat66] Y. Katznelson, Sur les ensembles de divergence des séries trigonométriques (French), Studia Math. 26 (1966), 301–304, DOI 10.4064/sm-26-3-305-306. MR199632
[Kna97] A. W. Knapp, Introduction to the Langlands program, Representation theory and automorphic forms (Edinburgh, 1996), Proc. Sympos. Pure Math., vol. 61, Amer. Math. Soc., Providence, RI, 1997, pp. 245–302, DOI 10.1090/pspum/061/1476501. MR1476501
[Kna01] A. W. Knapp, Representation theory of semisimple groups: An overview based on examples, Princeton Mathematical Series, vol. 36, Princeton University Press, 2001.
[Kol26] A. Kolmogorov, Une série de Fourier-Lebesgue divergente partout (French), C. R. Acad. Sci. Paris Sér. A-B 183 (1926), 1327–1328.
[Kör89] T. W. Körner, Fourier analysis, 2nd ed., Cambridge University Press, Cambridge, 1989. MR1035216
[KS99] N. M. Katz and P. Sarnak, Zeroes of zeta functions and symmetry, Bull. Amer. Math. Soc. (N.S.) 36 (1999), no. 1, 1–26, DOI 10.1090/S0273-0979-99-00766-1. MR1640151
[Loo11] L. H. Loomis, Introduction to abstract harmonic analysis, Dover Publications, 2011.
[Mal08] S. Mallat, A wavelet tour of signal processing: The sparse way, 3rd ed., Academic Press, 2008.
[Mes97] R. Messer, Linear algebra: Gateway to mathematics, Pearson, 1997.
[Mun97] J. R. Munkres, Analysis on manifolds, Westview Press, 1997.

[Mun00] J. R. Munkres, Topology, second edition of [MR0464128], Prentice Hall, Inc., Upper Saddle River, NJ, 2000. MR3728284
[NC11] M. A. Nielsen and I. L. Chuang, Quantum computation and quantum information, 10th anniversary ed., Cambridge University Press, 2011.
[Nee99] T. Needham, Visual complex analysis, Clarendon Press, 1999.
[Nel15] G. S. Nelson, A user-friendly introduction to Lebesgue measure and integration, Student Mathematical Library, vol. 78, American Mathematical Society, Providence, RI, 2015. MR3409206
[Nie00] Y. Nievergelt, Wavelets made easy, corrected ed., Birkhäuser, 2000.
[Nyq28] H. Nyquist, Certain topics in telegraph transmission theory, Trans. AIEE 47 (1928), no. 2, 617–644, reprinted in Proc. IEEE, vol. 90, no. 2.
[Opi96] M. Opitz, Method of simulating a room and/or sound impression, August 6, 1996, US Patent 5,544,249.
[OWN97] A. V. Oppenheim, A. S. Willsky, and S. H. Nawab, Signals and systems, 2nd ed., Prentice-Hall, 1997.
[RF10] H. L. Royden and P. Fitzpatrick, Real analysis, 4th ed., Pearson, 2010.
[Roc00] D. Rockmore, The FFT: An algorithm the whole family can use, Computing in Science & Engineering 2 (2000), no. 1, 60–64.
[Ros13] K. A. Ross, Elementary analysis: The theory of calculus, 2nd ed., in collaboration with Jorge M. López, Undergraduate Texts in Mathematics, Springer, New York, 2013. MR3076698
[RS80] M. Reed and B. Simon, Functional analysis, Academic Press, 1980.
[RS96] Z. Rudnick and P. Sarnak, Zeros of principal 𝐿-functions and random matrix theory: A celebration of John F. Nash, Jr., Duke Math. J. 81 (1996), no. 2, 269–322, DOI 10.1215/S0012-7094-96-08115-6. MR1395406
[Rud76] W. Rudin, Principles of mathematical analysis, 3rd ed., McGraw-Hill, 1976.
[Rud86] W. Rudin, Real and complex analysis, 3rd ed., McGraw-Hill, 1986.
[Rud91] W. Rudin, Functional analysis, 2nd ed., International Series in Pure and Applied Mathematics, McGraw-Hill, Inc., New York, 1991. MR1157815
[Sha49] C. E. Shannon, Communication in the presence of noise, Proc. I.R.E. 37 (1949), 10–21. MR28549
[Sha94] R. Shankar, Principles of quantum mechanics, 2nd ed., Plenum Press, New York, 1994. MR1343488
[SS03] E. M. Stein and R. Shakarchi, Fourier analysis: An introduction, Princeton Lectures in Analysis, vol. 1, Princeton University Press, Princeton, NJ, 2003. MR1970295
[Ter13] A. Terras, Harmonic analysis on symmetric spaces—Euclidean space, the sphere, and the Poincaré upper half-plane, 2nd ed., Springer, New York, 2013. MR3100414
[Tes09] G. Teschl, Mathematical methods in quantum mechanics: With applications to Schrödinger operators, Graduate Studies in Mathematics, vol. 99, American Mathematical Society, Providence, RI, 2009. MR2499016
[Wal08] J. S. Walker, A primer on wavelets and their scientific applications, 2nd ed., Studies in Advanced Mathematics, Chapman & Hall/CRC, Boca Raton, FL, 2008. MR2400818
[Wat12] W. C. Waterhouse, Square root as a homomorphism, Amer. Math. Monthly 119 (2012), no. 3, 235–239, DOI 10.4169/amer.math.monthly.119.03.235. MR2911438
[Zel00] S. Zelditch, Spectral determination of analytic bi-axisymmetric plane domains, Geom. Funct. Anal. 10 (2000), no. 3, 628–677, DOI 10.1007/PL00001633. MR1779616
[Zel09] S. Zelditch, Inverse spectral problem for analytic domains. II. ℤ2-symmetric domains, Ann. of Math. (2) 170 (2009), no. 1, 205–269, DOI 10.4007/annals.2009.170.205. MR2521115

Index of Selected Notation

≪, 65
∑𝑛∈𝐙 , 75
⟨𝑇⟩, 293
∼, 128, 152
[𝐴, 𝐵], 294
{𝐴, 𝐵}, 294
(Δ𝑥)𝑖 , 46
𝐷𝑁 (𝑥), 182
𝐸𝑓 (𝑧), 41
𝐸(𝑓; 𝑃), 50
𝑒𝑛 , 151
𝑒𝑛 (𝑥), 101
𝑒𝑧 , 96
𝑎(𝑥), 122
𝑓even , 134
𝑓odd , 134
𝑓𝑁 (𝑥), 127
𝑓(𝑎± ), 196
𝑓(𝑥) vs. 𝑓(𝑧), 32
𝑓 ∗ 𝑔, 179, 267
𝑓̂(𝛾), 264
𝑓̂(𝑛), 127, 152
𝐹𝑁 (𝑥), 182
ℬ, 151
𝐺𝑡 (𝑥), 268
𝐂, 13
𝐶 0 (𝑋), 118
𝐶𝑐0 (𝑋), 168
𝜒𝑡 (𝑥), 282
𝐶 ∞ (𝑋), 118
𝐶𝑐∞ (𝐑), 191
𝐶 𝑟 (𝑋), 118
𝐶𝑐𝑟 (𝐑), 214
𝐶𝑥𝑟 (𝑋), 118
ℋ, 172
ℋ0 , 214
𝐻𝑛 (𝑥), 254
ℎ𝑛 (𝑥), 254
𝐻𝑡 (𝑥), 298
⟨𝑓, 𝑔⟩, 139
‖𝑓‖, 140, 145
‖𝑓‖1 , 145
‖𝑓‖∞ , 145
‖𝑓‖𝑝 , 163
|𝑧|, 13
𝐷(𝑓), 214
𝔇(𝑇), 213
𝑑(𝑥, 𝑦), 15
Δ, 220
𝛿(𝑥), 181
Δ+ , 239
𝔇(Δ)Dir , 𝔇(Δ)Neu , 220
ℑ(𝑧), 14
𝐾(𝑓), 254
𝐾𝑛 (𝑥), 181
𝐾𝑡 (𝑥), 267, 297
ℓ(𝑈), 157
𝐿(𝑓), 250
𝐿1 (𝑋), 162
𝐿2 (𝑋), 166
𝐿2𝑥 (𝑆1 ), 230
ℓ2 (𝑋), 122
𝐿∞ (𝑋), 163
𝐿𝑝 (𝑋), 163
𝐿(𝑣; 𝑃), 47
ℳ(𝑋), 162
𝑀𝑝(𝑥) , 272
𝜇𝑎 , 272
𝜇(𝑓; 𝑃, 𝑖), 50
𝑀(𝑣; 𝑃, 𝑖), 46
𝑚(𝑣; 𝑃, 𝑖), 46
𝑀𝑥 , 214
𝐍, 9
𝒩𝑟 (𝑧), 19
𝒩𝑟 (𝑧), 19
𝒩𝑟 (𝑥), 25
𝒩𝑟 (𝑥), 25
𝒫, 47
𝑃𝑛 (𝑥), 250
projℬ 𝑓, 152
proj𝑔 (𝑓), 141
Ψ(𝑥, 𝑡), 209
𝑟𝑥 (𝜏), 193, 302, 303
𝒮(𝐑), 104
𝑆 1 , 119
𝑠𝑏 , 272
𝜎, 215
𝜎(𝑀)2 , 294
sinc(𝑥), 282
𝑠𝑁 (𝑥), 183
𝑆𝑥 (𝛾), 193, 303
𝒯, 27
𝜏𝑎 , 272
Θ(𝜏), 296
𝜃(𝑧), 296
𝑈(𝑓), 271
𝑢(𝑥), 282
𝑈(𝑣; 𝑃), 46
𝑉 𝑐 , 19
𝑊(𝑓1 , 𝑓2 ), 259
𝐐, 9
𝑥̂(𝛾), 192
𝐑, 10
ℜ(𝑧), 14
ℛ(𝑋), 118, 166
𝐙, 9
𝑧̄, 13
𝜁(𝑠), 192

Index

a.e., 157 absolute value, 13 abstract harmonic analysis, 317 acoustics, 4 affine algebraic group, 317 algebraic basis, 325 algebraic span, 325 almost all, 157 almost everywhere, 157 alternating series, 78 Always Better Theorem, 154 angle, 121 anticommutator, 294 Arbitrarily Close Criterion, 11, 12, 20 asymptotics, 63, 66 autocorrelation function, 193, 302, 303 average value, 116 band-limited, 302 basis, 155 Bessel’s inequality, 153 Best Approximation Theorem, 153 Black-Scholes equation, 205 Bolzano-Weierstrass Theorem, 24, 26 bound lower, 10 upper, 10 boundary conditions, 207, 235, 244, 245 Dirichlet, 207, 245, 314 Neumann, 207, 245 boundary value problem, 207, 244 bounded operator, 215 set, 17, 146 bounded function, 35

bump function, 191 𝐶𝑐0 (𝑋), 168 𝐶 0 (𝑋), 118 𝐶𝑐∞ (𝐑), 191 𝐶 ∞ (𝑋), 118 𝐶 𝑟 (𝑋), 118 𝐶𝑥𝑟 (𝑋), 118 Cauchy, 23 in norm metric, 147 uniformly, 85 Cauchy criterion for improper integrals, 106 for series, 75 Cesàro sum, 183 closed, 19, 26, 27 closed disc, 19, 25 closed 𝑟-neighborhood, 25 coefficient of a power series, 95 of a trigonometric polynomial, 125 collapse (of a quantum state), 258, 291, 292, 312 commutator, 294 compact, 26, 35, 317 point-set, 27 sequential, 27 Comparison Test, 75 for improper integrals, 107 complement, 19 completeness, 23 Cauchy, 23 in norm metric, 148 order, 10 complex analysis, 43 complex conjugate, 13

complex numbers, 13 complex plane, 13 conjugate, see also complex conjugate continuous function, 32 𝜖-𝛿, 32 in norm metric, 146 𝜖-𝛿, 146 on a set, 146 sequentially, 146 metric, 32 𝜖-𝛿, 32 on a set, 32 sequentially, 32 on a set, 32 piecewise, 37 point-set, 38 sequentially, 32 uniformly, 34 with compact support, 168 convergence absolute, 75 of a sequence, 17 of a series, 74 of a two-sided series, 75 pointwise, 80 synchronous, 76 uniform, 84 consequences of, 86–89 Convergence of Monotone Sequences, 20 convolution, 177 on 𝐑, 267 on 𝑆 1 , 179 correlated, 193, 303 critical line, 315 Daniell-Riesz integral, 306 decay, 104, 189 Dedekind cuts, 10 dense, 21 diagonal, 210 diagonalization, 210, 215, 226 Diagonalization Theorem, 226 differentiable function, 40, 311 continuously differentiable, 40 on a set, 40 piecewise, 37

totally, 67 differential equation, 3 differentiating under the integral sign, 69, 108 diffusion, 205 dimension, 316 Dirac delta function, 181, 307 Dirac kernel, 177 on 𝐑, 267 on 𝑆 1 , 181, 297 Dirichlet kernel, 182 discrete Fourier transform, 308 discrete-time Fourier transform, 192 distribution, 307 tempered, 307 divergence of a sequence, 17 of a series, 74 of a two-sided series, 76 synchronous, 76 domain (of an operator), 213 dominate, 164 dot product, 115, 121, 140 dual group, 317 eigenbasis, 211, 225 eigenbasis method, 229 eigenfunction, 222 eigenvalue, xii, 210, 213, 222 associated (with an eigenbasis), 225 eigenvector, 210, 213, 222 energy, 4 conservation of, 209 kinetic, 209 potential, 209 energy operator, 258, 311 even extension, 134, 246 even function, 133 expected value, 292, 293 extended nonnegative real number, 162 Extra Derivative Lemma, 178, 188 Extreme Value Theorem, 35 𝑓̂(𝛾), 271 𝑓̂(𝑛), 127 𝑓𝑁 (𝑥), 127 fast Fourier transform, 308

Fejér kernel, 182 field, 10 ordered, 10 formal solution, 236 Fourier coefficient, 127 generalized, 152 Fourier cosine series, 135 Fourier polynomial, 127 Fourier series, xii, 4, 128 generalized, 152 lacunary, 194 real, 133 Fourier sine series, 135 Fourier transform, xii, 254, 263, 264 on 𝐿2 , 276 on 𝒮(𝐑), 271 Fourier’s Law, 204 frequency response, 302 Fubini’s Theorem, 68 on 𝐑, 110 function space, xii, 2, 21, 115, 117 fundamental frequency, 4 Fundamental Theorems of Calculus, 58–60 Gauss kernel, 268 generating function, 297 geometric series, 77 two-sided, 77 Gram-Schmidt orthogonalization, 252 greatest lower bound, see also infimum group, 316 abelian, 316 additive abelian, 10 Haar measure, 317 Haar wavelet, 308, 309 Hamiltonian, 311 harmonic first, 4 𝑛th, 4 second, 4 heat equation, 203, 205 heat kernel, 298 Heaviside function, 282 Heaviside’s method, 284 Heisenberg Uncertainty Principle, 293, 294

Hermite functions, 254, 289 normalized, 255 Hermite polynomials, 254, 289 Hilbert space, 139, 171 Hilbert Space Absolute Convergence Theorem, 172 Hilbert Space Comparison Test, 173 holomorphic, 43 homomorphism, 316 ideal lowpass filter, 307 imaginary part (of a complex number), 14 incomplete, 23 index set, 151 Inf Inequality Lemma, 12 infimum, 10 initial value problem, 204 inner product, 121, 139 𝐿2 , 140 inner product space, 139, 140 integrable by separation, 108 integral, see also Lebesgue integral or Riemann integral integration by parts, 61 on 𝐑, 107 Intermediate Value Theorem, 35 interval chain, 158 interval of continuity, 37 inverse Fourier transform, 274 Inversion Theorem for Fourier Series, 178 for the Fourier Transform, 264, 277 in 𝒮(𝐑), 274 isomorphism of Hilbert spaces, 174 Isomorphism Theorem for Fourier Series, 174 for the Fourier Transform, 277 in 𝒮(𝐑), 274 isospectral, 315 Jacobi theta function, 296 L’Hôpital’s Rule, 63 Lagrange’s four squares theorem, 297 Langlands program, 318 Laplacian, 220 Laurent polynomial, 126

least upper bound, see also supremum Lebesgue Axioms, 169 Lebesgue integrable, 162 Lebesgue integral, 139, 148, 155, 156, 162, 163 axioms for, 162–167, 169 Legendre polynomials, 250 length (of an interval), 157 lim sup, 78 limit, 311 limit of a function, 36 𝜖-𝛿, 36 at infinity, 63 sequential, 36 limit point, 36 linear combination, 324 linear map, 213, 325 linear operator, see also operator linearly independent, 120, 325 Lipschitz function, 196 piecewise, 37, 196 local linear approximation, 41 local linearity, 41 locally compact, 317 locally integrable, 106 locally rectangular, 67 lowering operator, 255 𝑀-test, Weierstrass, 85, 86 matrix Hermitian, 313 unitary, 313 mean value, 292 Mean Value Theorem, 42 measurable function, 162 measure, 156, 158 measure zero, 156, 157 measurement (of a quantum state), 258, 291, 292, 312 metric, 14, 15, 117 𝐿∞ , 85, 120 𝐿2 , xii, 121 norm, 145 metric space, 15 modular form, 297 momentum operator, 291, 311 momentum space, 292

Montgomery-Odlyzko law, 315 multiresolution, 310 nonabelian, 316 norm, 13, 140, 144 𝐿1 , 145 𝐿2 , 140 𝐿∞ , 145 𝐿𝑝 , 163 inner product, 140 normed space, 144 𝑛th Term Test for Divergence, 77 Nyquist frequency, 302 Nyquist rate, 302 Nyquist sampling theorem, 302 Nyquist-Shannon sampling theorem, 302 observable, 257, 291, 312 odd extension, 134, 246 odd function, 133 open, 19, 26, 27 open cover, 27 countable, 157 open disc, 19, 25 open 𝑟-neighborhood, 25 operator, xii, 210, 211, 213 diagonal, 215 Hermitian, 213, 218 on 𝐂𝑛 , 210 positive, 213, 218–220 selfadjoint, 311 skew-Hermitian, 293 unitary, 274, 316 operator notation (Fourier transform), 271 order, total, 10 orderable, 10 orthogonal (vectors), 121, 141 orthogonal basis, 150, 154, 325 orthogonal set, 151 orthonormal, 121 orthonormal basis, 154 orthonormal set, 151 𝑝-series, 78 Parseval’s identity, 154, 174, 176 partial derivative, 67

partial differential equation (PDE), 203 hyperbolic, 241 parabolic, 232 partial fractions, 284 partial sums, 1, 74 synchronous, 76 partition, 46 standard, 46 Pass the Hat, 272 path, 42 segment, 42 path integral, 61 path-connected, 42 piecewise (property), 37 Planck’s constant, 311 plane wave, 207 Poisson summation, 281, 296 position operator, 291, 311 position space, 292 power series, 95 power spectrum, 193, 302, 303 projection, 141, 152 Pythagorean Theorem, 141, 151 quantization, 208, 258 quantum computing, 313 quantum harmonic oscillator, 210 qubit, 313 ℛ(𝑋), 118 radius of convergence, 95 raising operator, 255 rapidly decaying, 104, 137 Ratio Test, 77 real numbers, 10 real part (of a complex number), 14 rearrangement, 319 rectangle, 20, 67 refinement, 46 common, 46 relative error, 41 representation, 316 restricted sum, 324 Riemann hypothesis, 192, 315 Riemann integrable, 47 on 𝐑, 106 Riemann integral, 45, 47

improper, 106 convergence, 106 existence, 106 indefinite, 59 lower, 47 upper, 47 Riemann sum, 45 lower, 47 upper, 46 Riemann zeta function, 192, 315 Riemann-Lebesgue Lemma, 173 Riesz Representation Theorem, 216, 218 ring, commutative — with unity, 10 Rodrigues’s formula, 252 root test, 78 𝒮(𝐑), 104 𝑆 1 , 119 sampling theorem, 302 scalar, 323 Schrödinger’s equation, 210 Schwartz space, 104, 214, 264 separation of variables, 230, 235 sequence, 17 two-sided, 75 Sequential Criteria for Integrability, 47, 49, 50 series, 74 of functions, 80 two-sided, 75 shift operator, 215 signal processing, 306 simultaneous diagonalization, 226 six NO’s, 80 smooth function, 97 span, 120 spectrum, 311 continuous, 291, 312 discrete, 312 square root (of an operator), 221 Squeeze Lemma for functions, 37 for sequences, 21 standard deviation, 294 state function, 209, 257, 291 state space, 311

Sturm-Liouville equation, 229 Sturm-Liouville theory, 260 Sturmian operator, 259 regular, 259 singular, 259 subcover countable, 28 finite, 27 subinterval, 46 subsequence, 17 subspace, 117, 324 Sup Inequality Lemma, 12 support, 168 supremum, 10 synchronous sum, 76 tensor product, 312 term-by-term differentiation, 2, 91 timbre, 4 time series, 192, 302 tone, 4 topological group, 317 topology, 26, 27

point-set, 27 translation invariant, 179 trigonometric polynomial, 125 trivial zeros (of zeta function), 315 unitary representation, 316 variance, 294 vector, 117, 323 vector space, 323 wave equation, 203, 207 wavelets, 308 Weierstrass Approximation Theorem, 190 Wiener-Khinchin Theorem Continuous-time, 303 Discrete-time, 193 Wronskian, 259 Zero Derivative Theorem, 42 zero function, 117 zero vector, 118


Fourier Series, Fourier Transforms, and Function Spaces is designed as a textbook for a second course or capstone course in analysis for advanced undergraduate or beginning graduate students. By assuming the existence and properties of the Lebesgue integral, this book makes it possible for students who have previously taken only one course in real analysis to learn Fourier analysis in terms of Hilbert spaces, allowing for both a deeper and more elegant approach. This approach also allows junior and senior undergraduates to study topics like PDEs, quantum mechanics, and signal processing in a rigorous manner. Students interested in statistics (time series), machine learning (kernel methods), mathematical physics (quantum mechanics), or electrical engineering (signal processing) will find this book useful. With 400 problems, many of which guide readers in developing key theoretical concepts themselves, this text can also be adapted to self-study or an inquiry-based approach. Finally, of course, this text can also serve as motivation and preparation for students going on to further study in analysis.