Mathematical analysis and its inherent nature 9781470428075, 1470428075

English Pages 348 [373] Year 2016

Table of contents :
Rebuilding the calculus building --
The real number system revisited --
Sequences and series of real numbers --
Limit and continuity of real functions --
Derivative and differentiation --
The Riemann integral --
Abstraction and generalization --
Basic theory of metric spaces --
Sequences in general metric spaces --
Limit and continuity of functions in metric spaces --
Sequences and series of functions --
Appendix --
Real sequences and series --
Limit and continuity of functions --
The concepts of derivative and differentiability --
The Riemann integral.

The Sally Series
Pure and Applied Undergraduate Texts • 25

Mathematical Analysis and Its Inherent Nature

Hossein Hosseini Giv

American Mathematical Society
Providence, Rhode Island

EDITORIAL COMMITTEE
Gerald B. Folland (Chair)
Jamie Pommersheim
Steven J. Miller
Serge Tabachnikov

2010 Mathematics Subject Classification. Primary 26A06, 54E35, 54E45, 54E50.

For additional information and updates on this book, visit www.ams.org/bookpages/amstext-25

Library of Congress Cataloging-in-Publication Data Names: Hosseini Giv, Hossein, 1983– Title: Mathematical analysis and its inherent nature / Hossein Hosseini Giv. Description: Providence, Rhode Island : American Mathematical Society, 2016. | Series: Pure and applied undergraduate texts ; volume 25 | Includes bibliographical references and index. Identifiers: LCCN 2016018812 | ISBN 9781470428075 (alk. paper) Subjects: LCSH: Calculus—Textbooks. | Mathematical analysis—Textbooks. | AMS: Real functions – Functions of one variable – One-variable calculus. msc | General topology – Spaces with richer structures – Metric spaces, metrizability. msc | General topology – Spaces with richer structures – Compact (locally compact) metric spaces. msc | General topology – Spaces with richer structures – Complete metric spaces. msc Classification: LCC QA303.2 .H67 2016 | DDC 515—dc23 LC record available at https://lccn.loc.gov/2016018812

Copying and reprinting. Individual readers of this publication, and nonprofit libraries acting for them, are permitted to make fair use of the material, such as to copy select pages for use in teaching or research. Permission is granted to quote brief passages from this publication in reviews, provided the customary acknowledgment of the source is given. Republication, systematic copying, or multiple reproduction of any material in this publication is permitted only under license from the American Mathematical Society. Permissions to reuse portions of AMS publication content are handled by Copyright Clearance Center’s RightsLink service. For more information, please visit: http://www.ams.org/rightslink. Send requests for translation rights and licensed reprints to [email protected]. Excluded from these provisions is material for which the author holds copyright. In such cases, requests for permission to reuse or reprint material should be addressed directly to the author(s). Copyright ownership is indicated on the copyright page, or on the lower right-hand corner of the first page of each article within proceedings volumes.

© 2016 by the American Mathematical Society. All rights reserved.
The American Mathematical Society retains all rights except those granted to the United States Government.
Printed in the United States of America.

∞ The paper used in this book is acid-free and falls within the guidelines established to ensure permanence and durability.
Visit the AMS home page at http://www.ams.org/

10 9 8 7 6 5 4 3 2 1    21 20 19 18 17 16

To my wife Elham and my daughter Mobina with love and affection

Contents

To the Instructor
To the Student
Introduction and Outline of the Book
Acknowledgments

Part 1. Rebuilding the Calculus Building

Chapter 1. The Real Number System Revisited
1.1. The Algebraic Axioms
1.2. The Order Axioms
1.3. Absolute Value, Distance, and Neighborhoods
1.4. Natural Numbers and Mathematical Induction
1.5. The Axiom of Completeness and Its Uses
1.6. The Complex Number System
Notes on Essence and Generalizability
Exercises

Chapter 2. Sequences and Series of Real Numbers
2.1. Real Sequences, Their Convergence, and Boundedness
2.2. Subsequences, Limit Superior and Limit Inferior
2.3. Cauchy Sequences
2.4. Sequences in Closed and Bounded Intervals
2.5. Series: Revisiting Some Convergence Tests
2.6. Rearrangements of Series
2.7. Power Series
Notes on Essence and Generalizability
Exercises

Chapter 3. Limit and Continuity of Real Functions
3.1. Limit Points and Some Other Classes of Points in R
3.2. A More General Definition of Limit
3.3. Limit at Infinity
3.4. One-Sided Limits
3.5. Continuity and Two Kinds of Discontinuity
3.6. Continuity on [a, b]: Results and Applications
3.7. Uniform Continuity
Notes on Essence and Generalizability
Exercises

Chapter 4. Derivative and Differentiation
4.1. The Why and What of the Concept of Derivative
4.2. The Basic Properties of Derivative
4.3. Local Extrema and Derivative
4.4. The Mean Value Theorem: More Applications of Derivative
4.5. Taylor Series: A First Glance
4.6. Taylor’s Theorem and the Convergence of Taylor Series
Notes on Essence and Generalizability
Exercises

Chapter 5. The Riemann Integral
5.1. Motivation: The Area Problem
5.2. The Riemann Integral: Definition and Basic Results
5.3. Some Integrability Theorems
5.4. Antiderivatives and the Fundamental Theorem of Calculus
Notes on Essence and Generalizability
Exercises

Part 2. Abstraction and Generalization

Chapter 6. Basic Theory of Metric Spaces
6.1. A First Generalization: The Definition of Metric Space
6.2. Neighborhoods and Some Classes of Points
6.3. Open and Closed Sets
6.4. Metric Subspaces
6.5. Boundedness and Total Boundedness
Notes on Essence and Generalizability
Exercises

Chapter 7. Sequences in General Metric Spaces
7.1. Convergence and Divergence in Metric Spaces
7.2. Cauchy Sequences and Complete Metric Spaces
7.3. Compactness: Definition and Some Basic Results
7.4. Compactness: Some Equivalent Forms
7.5. Perfect Sets and Cantor’s Set
Notes on Essence and Generalizability
Exercises

Chapter 8. Limit and Continuity of Functions in Metric Spaces
8.1. The Definition of Limit in General Metric Spaces
8.2. Continuity and Uniform Continuity
8.3. Continuity and Compactness
8.4. Connectedness and Its Relation to Continuity
8.5. Banach’s Fixed Point Theorem
Notes on Essence and Generalizability
Exercises

Chapter 9. Sequences and Series of Functions
9.1. Sequences of Functions and Their Pointwise Convergence
9.2. Uniform Convergence
9.3. Weierstrass’s Approximation Theorem
9.4. Series of Functions and Their Convergence
Notes on Essence and Generalizability
Exercises

Appendix
Real Sequences and Series
Limit and Continuity of Functions
The Concepts of Derivative and Differentiability
The Riemann Integral

Bibliography

Index

To the Instructor

This book has been written with the belief that emphasis on the essence or inherent nature of a mathematical discipline helps students to understand it much better. With this in mind and based on the essence of undergraduate mathematical analysis, the text is divided into two parts. The first part describes those aspects of analysis which complete a corresponding area of calculus theoretically, while the second part concentrates on the way analysis generalizes some aspects of calculus to a more general framework. Presenting the contents in this way has an important advantage: The students first learn the most important aspects of analysis on the classical space R and fill in the gaps of their calculus-based knowledge, and then proceed to a step-by-step development of an abstract theory, namely, the theory of metric spaces. Moreover, with this approach, the students understand which aspects of classical analysis are generalizable to the abstract context and which ones are not. To help them in this direction, each chapter is concluded with an unnumbered section entitled “Notes on Essence and Generalizability”. Based on the different missions that the two parts of the book have, it is quite natural to see differences in the way one is expected to teach them and in the way the material is presented in these parts. The difference in the teaching method is an obvious one, namely, the difference between the teaching of classical and abstract mathematics. Regarding the presentation, there is an important difference which should be mentioned here. Since the material of Part 1 is related to notions that are already known to the students, such as real numbers, sequences and series, etc., each chapter in this part begins with questions that motivate the presentation. These questions aim to show that the students’ knowledge obtained from calculus has some gaps which should be filled in the relevant chapter. 
In the chapters of the second part, however, questions of this kind cannot be found. This is, evidently, because the students have no acquaintance with what is going to be developed. Instead, the chapters begin with arguments that motivate the generalizations we wish to present in the sequel.

There are some points at which the presentation differs considerably from that of most of the existing books, although there is no claim of novelty. We list them below.

• Compactness is first introduced in the form of sequential compactness, and after some preliminary treatment, the open-cover definition is presented as an equivalent formulation, together with some other relevant conditions, in Section 7.4. This is because we have always thought that sequential compactness is easier to motivate than the usual definition of compactness. It is, however, emphasized that we will frequently use the open-cover definition when it seems to be more appropriate.
• Closed and open subsets of metric spaces are first defined in terms of the inclusion or exclusion of the boundary points, and characterizing them in terms of limit and interior points, respectively, is done later. This has some benefits, at least in our view. First, open and closed sets are defined in terms of a single notion, namely, the concept of boundary point. Next, the boundary definition of such sets can be more intuitively related to the labels closed and open.
• Although perfect sets could be defined along with closed sets, we postponed their definition to Section 7.5, where the necessary progress has been made in the theory in order to treat them more efficiently. Cantor’s set is also presented in that section, as an important instance of a set which is both perfect and compact.
• Banach’s fixed point theorem is presented at the end of Chapter 8 to emphasize that even an abstract theory may be used in applications.
• The concept of connectedness is introduced in Chapter 8, where we try to generalize the intermediate value theorem to the metric space context. This is because we felt that the students understand the importance of connectedness much better at this point.
I am sure that your comments will help me in improving the content and style of the text. So, please kindly send your comments to [email protected] or [email protected].

Hossein Hosseini Giv
March 2016
Zahedan, Iran

To the Student

Mathematics is often thought to be one of the most difficult fields of study. Perhaps every mathematics major has at some point heard sentences like these: “So you study mathematics. How can you understand it? I think math is horrible!” What do you think about the reason for this difficulty? What makes mathematics into such a horrible discipline? One possible answer, which is perhaps the same as yours, points to the special features of mathematics textbooks. Every such book contains numerous definitions and theorems to be learned and understood, and many difficult problems to be solved. But is this the right answer? We think not! Everything should be presented in some way, and what we just discussed is the best way of presenting mathematics. So, what is the problem?

We believe that the most important reason for failure in mathematics is the lack of mathematical insight. When you see someone who cannot describe the essence of analytic geometry after passing a course on it, you have met someone who has poor mathematical insight. If a student fails to interpret mathematical theorems in his own words, you can be sure that his mathematical insight is still at a low level. When you study a mathematical text without any insight, you will see it as a messy collection of complicated symbols and formulas.

To strengthen your mathematical insight, you can use the following simple instructions.

(1) Think about the essence of every mathematical discipline you are expected to learn. If, for example, your instructor introduced a book entitled Calculus with Analytic Geometry for your course, ask him or her about the essence of calculus and about that of analytic geometry. Try to relate this new course to the ones you have passed before. Studying a course without knowing its essence is like walking aimlessly through an endless desert!

(2) If necessary, try to interpret mathematical theorems in your own words. To understand how, consider the following theorem, which you probably know:

If x and y are real numbers with x < y, then there exists some rational number q such that x < q < y.

In mathematics, theorems are usually stated in this way. Clearly, remembering theorems in their formal appearance may be confusing. For this reason, it is better to find informal interpretations, if possible. For example, we may keep the above theorem in mind as follows.

Between any two real numbers, a rational number can be found.

Another less obvious instance of such interpretations is the way we may interpret the following rule of elementary mathematical logic, known as the law of disjunctive syllogism. This is written symbolically as

(p ∨ q) ∧ ∼p ⇒ q.

Here, p and q are statements, ∧ and ∨ are the connectives of conjunction and disjunction, respectively, and ∼p is the negation of p. The usual reading of this law is as follows.

p or q, and the negation of p, imply q.

But this way of thinking tells us nothing special about the statement. A better way of interpreting the law is to say:

If at least one of the statements p and q is true, and it is known that p is not true, then q must be true.

Viewing the logical law in this way reveals that it is quite natural and can be accepted by any thoughtful person. Compare this with the above symbolic representation, which may be confusing at first glance. Of course, it should be noted that the above sentence is an informal interpretation. In fact, when we write p in mathematical logic, we do not mean that p is indeed true; it is understood that p has one of the truth values true or false. The same can be said for ∼p and q.

To help you in this direction, we presented the informal interpretation of many results in boxes as shown below.

What does Theorem . . . say? Theorem . . . says that . . . .

Such boxes are sometimes used to discuss the important points and remarks on a result or example.
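The law of disjunctive syllogism discussed above can also be checked mechanically. The following is a minimal sketch in Python, not part of the original text; the function names `implies` and `disjunctive_syllogism` are chosen here purely for illustration. It evaluates (p ∨ q) ∧ ∼p ⇒ q for all four combinations of truth values and confirms that the statement is true in every case.

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """Material implication: a => b is false only when a is true and b is false."""
    return (not a) or b


def disjunctive_syllogism(p: bool, q: bool) -> bool:
    """Truth value of ((p or q) and not p) => q for one assignment of p and q."""
    return implies((p or q) and (not p), q)


# Build the full truth table: one entry for each of the four assignments.
truth_table = {
    (p, q): disjunctive_syllogism(p, q)
    for p, q in product([True, False], repeat=2)
}

# The law is a tautology exactly when every row of the table is True.
is_tautology = all(truth_table.values())

for (p, q), value in truth_table.items():
    print(f"p={p!s:5} q={q!s:5} -> {value}")
print("tautology:", is_tautology)
```

This agrees with the remark that p, ∼p and q merely carry truth values: the check makes no assumption about which statements are actually true.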

Our aim in this book is to present the most important ideas of undergraduate mathematical analysis using an insightful approach. To this end, we tried, in view of the above discussion, to describe the essence of analysis and also to present informal interpretations of results when it appeared to be necessary.

We assume that you have already passed a course on one-variable calculus, and that you are particularly familiar with the well-known transcendental functions. Some familiarity with the basics of set theory and such logical rules as the contrapositive law is also assumed. The necessary background for answering the questions posed in the beginning of Chapters 2–5 is collected in a brief appendix at the end of the book. To start our work, we describe the essence of undergraduate analysis in an introductory section.

Introduction and Outline of the Book

What is Mathematical Analysis?

Usually, mathematical disciplines are defined in terms of the problems they concern, and the way experts think about them. We follow this tradition to answer the above question.

Mathematical analysis, in the sense contained in the undergraduate mathematics curriculum, is often referred to as generalized calculus. But, as you will see in this book, analysis is more than a mere generalization of calculus. First of all, it is true that analysis studies some ideas of calculus in more general frameworks. An instance is the abstract definition of metric space, which generalizes the usual notion of distance from R (the set of real numbers) to arbitrary sets. We will study this generalization in Chapter 6, but is analysis restricted to such extensions? The answer is an emphatic NO! There are analytic arguments which aim, instead, at completing a corresponding part of calculus theoretically. Of course, when you generalize a theory, you make it more complete; but you can complete a theory without any generalizations, that is, without going to more general frameworks. This can be clarified by comparing the way calculus and analysis treat the real numbers.

In calculus one uses the real numbers extensively. The numbers arise naturally in the definition of real functions, their limit, continuity, derivative, and integral. They are also the cornerstones of the theory of real sequences and series. Nevertheless, there are important aspects of the real number system which are usually neglected in calculus texts. The following are instances of such aspects.

(1.a) Why do we use the real numbers in calculus? Can we utilize the rational numbers or some more elementary number system instead of them?
(1.b) Given x, y and z in R, how do you prove that x + z = y + z implies x = y? How can you deduce xz < yz from x < y and z > 0?

(1.c) Let A ⊂ R be such that for some a ∈ R, x ≤ a holds for every x ∈ A. Is there a smallest a with this property?
(1.d) How can you prove that between any two real numbers a rational number can be found?

Analysis enables us to answer these questions. The properties of the real numbers, asked to be proved in (1.b), are usually stated without proof in calculus texts. Analysis provides a few basic rules for the addition of real numbers, known as the axioms of addition, from which many important facts, such as the cancellation law in (1.b), can be deduced. These include the rules of commutativity, associativity, the existence of a neutral element, and the existence of additive inverses. It also gives a similar collection of basic rules for multiplication of real numbers, referred to as the axioms of multiplication. The second part of (1.b) can be answered using some basic rules that govern the order relation < […] x > 0 and y ∈ R there exists a positive integer n with nx > y.

We will answer the four questions in Chapter 1. What is important at this stage is to realize that analysis aims at strengthening calculus, and this is done in two ways: completing calculus theoretically and extending it to more general frameworks. If you think that this twofold role of analysis is still ambiguous, please do not worry. This entire book is written to help you understand it!

Outline of the Book

Since our aim is to explain the essence of mathematical analysis, we present the material in two parts to emphasize the twofold role of analysis mentioned above.

The first part of the book, entitled “Rebuilding the Calculus Building”, describes those analytic issues which are developed for completing calculus theoretically. This part consists of five chapters which we now briefly discuss.

The first chapter begins with a simple argument that demonstrates the necessity of working with the real numbers in calculus and analysis.
We then try to complete our knowledge of the real numbers by presenting the basic real number properties, known as axioms, from which all the other rules and properties can be derived. As we mentioned before, these include the field or algebraic axioms, the order axioms, and the axiom of completeness. In each case, we show the way one may use the axioms to prove other rules or properties. An important result in this chapter, which is a consequence of the axiom of completeness, is the Archimedean property. We will also discuss the suprema and infima of sets of real numbers and introduce the extended real number system. Finally, complex numbers are introduced to remove some algebraic deficiencies of the real number system.

In the second chapter, we begin with some convergence theorems for real sequences. Then we introduce Cauchy sequences and show that the convergence of a sequence is equivalent to the truth of Cauchy’s criterion. We then discuss the limit superior and limit inferior of real sequences and apply them to the study of convergence and divergence problems. Next, we turn to infinite series of real numbers and use these limits to prove the strengthened version of some convergence tests, such as the root and the ratio tests. Then we study the absolute convergence and rearrangements of series in detail, as these are often discussed superficially in calculus. Finally, we study power series in the hope of using them later in Chapter 4 in connection with Taylor series.

In Chapter 3 we study the limit and continuity of real functions on a theoretically more advanced level than that of calculus books. In particular, we prove all important theorems of the theory, including the intermediate and extreme value theorems. The material presented in this chapter provides us with the necessary motivation for the definition of many important concepts of Part 2.

Chapter 4 is devoted to the derivative and differentiation of real functions. Besides the important results of the differential calculus, such as the mean value theorem, we will prove Taylor’s theorem and discuss Taylor series in this chapter. The problems and examples we present in this chapter are of a higher level than those of standard calculus texts.

Finally, in Chapter 5, we study the Riemann integral. The definition of the Riemann integral, presented in most calculus texts, is somewhat vague.
This chapter enhances our knowledge of the Riemann integral via a mathematically solid definition of the integral, due to Darboux, that is stated in terms of suprema and infima. Based on this precise definition, we present a complete theory for the integral, including the complete proof of the fundamental theorem of calculus.

In the second part, entitled “Abstraction and Generalization”, we present those aspects of analysis which extend a relevant part of calculus to some general framework. The chapters included in this part are as follows.

In Chapter 6, the basic theory of metric spaces is investigated. At the starting point, we use the usual distance function defined on the real line to motivate our definition of metric and metric spaces. We emphasize that the definition of metric spaces is a good instance of abstraction. We then discuss the various metric space concepts in detail, including open and closed sets, the interior and closure of sets, and so on.

Chapter 7 is devoted to the study of sequences in general metric spaces. We first recall that the notion of distance in the real line allows the definition of convergence for real sequences, and we then use this definition as a prototype for the definition of convergence in the general context of metric spaces. We also introduce Cauchy sequences in metric spaces and examine their role in the proof of convergence results. The concept of compactness, one of the most important notions of metric space theory, is also defined in this chapter. It is emphasized that the notion of series is not generalizable to the context of metric spaces.

In Chapter 8 the limit and continuity of functions defined between metric spaces are discussed. Just as we passed from convergence in the real line to the convergence of sequences in metric spaces, we can go from continuity of real functions to that of functions defined on metric spaces (the definition of limit in metric spaces is a little more difficult). As our main observation in this chapter, we examine the relation of continuity to compactness and connectedness. The relation between continuity of functions and their impact on convergent sequences is also examined. It is finally emphasized that the derivative may not be defined for functions defined on general metric spaces.

Finally, in Chapter 9, the sequences and series of real-valued functions are studied. This chapter begins with the general definition of a sequence of real-valued functions defined on a set X. Then pointwise and uniform convergence of such sequences and their associated series are introduced. Next, we consider the case where X is a metric space and discuss the effect of continuity of the terms of the sequence on that of their limit function. When X is the set of real numbers, the relation between Riemann integrability of the terms and that of the limit function is explained. We conclude this chapter with a discussion of the Stone–Weierstrass theorem and a study of series of functions.

Prerequisites

We assume that the reader has passed a course in one-variable calculus, and an elementary course on the foundations of mathematics. The latter should include familiarity with countably infinite and uncountable sets, the principle of mathematical induction, the well-ordering principle, and such logical rules as the contrapositive law; we will present these two principles in Chapter 1.
We also assume that the reader is already familiar with the sets N, Z, Q, and R of natural, integer, rational and real numbers, respectively.

Acknowledgments

There is no doubt that writing a book for the AMS is a great opportunity for any mathematician who wishes to write influential books. As a (relatively!) young mathematician, I owe this opportunity to Ina Mette, Senior Editor of the AMS Book Program, and the Editorial Committee of the AMS Pure and Applied Undergraduate Texts series. I hereby express my deep gratitude to them. I am particularly indebted to Ina Mette for her unfailing support. It was a wonderful experience for me to work with such a great editor. I am also thankful to the anonymous referees and to Professor Stephen Kennedy for their valuable comments on earlier versions of this book. Marcia Almeida, as Ina Mette’s assistant, helped me a lot by her promptness in answering my queries and by her timely management of the official affairs. Barbara Beeton of the AMS Author Support Team assisted the book project by her detailed answers to my technical queries. I really appreciate her generosity in sharing valuable experiences. My dear colleague and friend Dr. Seyed Alireza Ahmadi took and prepared the photo that appeared on the back cover of the book. I would like to thank him for doing this with all possible care. I am thankful to Jennifer Wright Sharp, production editor of the book, for her careful reading of the final manuscript and the helpful comments she made to enhance the writing of the text. Finally, I would like to thank my wife for her support and patience during the time I was preparing this book.

Part 1

Rebuilding the Calculus Building

Chapter 1

The Real Number System Revisited

In calculus one uses the real numbers and their basic properties extensively. For this reason, a discipline which tries to complete calculus theoretically, that is, mathematical analysis, should study the set R of real numbers at its starting point.

Before starting our careful examination of the real numbers, let us answer our question (1.a) posed in the Introduction. Why do we use the real numbers in calculus? Can we utilize the set Q of rational numbers or some more elementary number system instead of R? The following simple argument answers these questions.

Let P be a polynomial defined on R, and let a and b be real numbers with a < b. By the intermediate value theorem, which is applicable because of the continuity of P on [a, b], for every real number c between P(a) and P(b) there exists x ∈ (a, b) with P(x) = c. What happens if we replace R by Q? In fact, let P be a polynomial with rational coefficients, and let a, b ∈ Q satisfy a < b. Is it necessarily true that for every rational number c between P(a) and P(b) some x ∈ Q ∩ (a, b) can be found with P(x) = c? The polynomial P(x) = x² − 2 shows that the answer is negative: Although 0 lies between P(0) = −2 and P(2) = 2, no rational number x satisfying P(x) = 0 can be found. This is a consequence of our first assertion.

Proposition 1.1. For every rational number x, x² ≠ 2.

Proof. Assume to the contrary that a rational number x satisfies x² = 2. We may assume without loss of generality that x = m/n, where m and n ≠ 0 are relatively prime integers. Since m² = 2n² by the assumption x² = 2, m² is even. This shows that m is also even, for otherwise m² would be odd. If m = 2k for some integer k, then the equality m² = 2n² yields n² = 2k², showing that n² is even. Thus we conclude that n is also even, meaning that 2 is a common divisor of m and n. This contradicts our assumption that m and n are relatively prime. □

The existence of a positive real number whose square is equal to 2 follows from Theorem 1.59 below. As we know, the square of each of the real numbers √2 and −√2 is equal to 2. By Proposition 1.1, these numbers are irrational.
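The failure of the intermediate value property over Q can be observed numerically. The sketch below is an illustration added here, not part of the original text, and the helper names `rational_roots_up_to` and `bisect` are hypothetical. It uses Python's exact `fractions.Fraction` type to search for rational roots of P(x) = x² − 2 with small denominators, and then runs bisection on [0, 2] entirely within Q. A finite search is of course not a proof; Proposition 1.1 is what guarantees that the search must come back empty.

```python
from fractions import Fraction


def P(x: Fraction) -> Fraction:
    """The polynomial P(x) = x^2 - 2, evaluated exactly on a rational number."""
    return x * x - 2


def rational_roots_up_to(max_den: int) -> list:
    """All fractions m/n in [0, 2] with 1 <= n <= max_den and (m/n)^2 = 2."""
    roots = []
    for n in range(1, max_den + 1):
        for m in range(0, 2 * n + 1):
            # (m/n)^2 = 2 is equivalent to the integer equation m^2 = 2 n^2.
            if m * m == 2 * n * n:
                roots.append(Fraction(m, n))
    return roots


def bisect(steps: int):
    """Bisection on [0, 2] in exact rational arithmetic.

    Returns rational endpoints (a, b) with P(a) < 0 < P(b), or (mid, mid)
    if a midpoint were ever an exact root.
    """
    a, b = Fraction(0), Fraction(2)
    for _ in range(steps):
        mid = (a + b) / 2
        if P(mid) == 0:
            return mid, mid
        if P(mid) < 0:
            a = mid
        else:
            b = mid
    return a, b


roots = rational_roots_up_to(200)
lo, hi = bisect(30)

print("rational roots found:", roots)
print("bracket width:", hi - lo)
print("sign change persists:", P(lo) < 0 < P(hi))
```

However many bisection steps are taken, the bracket stays a pair of rationals with P negative at one end and positive at the other; the point where P vanishes is not available in Q.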

As a result of the above discussion, we find that if we work with the rational numbers instead of the real ones, polynomials may fail to have the intermediate value property. It is for this reason and some others, which we will see gradually, that we prefer to work with the real numbers in calculus and, accordingly, in analysis. Now that we see a simple manifestation of the necessity of working with the real numbers, it is time to start a careful examination of them. We begin with the natural question of what the real numbers are. We may answer this question in two different ways. In fact, we may either: (1) assume familiarity with some simpler number system, such as Q or Z, the set of integers, and then construct the real numbers using them; or (2) assume that the real numbers are known mathematical objects and develop their theory on the basis of some basic facts whose truth is assumed without proof. The first approach is chosen, for example, in Walter Rudin’s celebrated book Principles of mathematical analysis, where the real numbers are constructed from the rational ones via the so-called Dedekind cuts. But, as we mentioned in (1), this choice amounts to the assumption that our readers are already familiar with the rational numbers and their basic properties. For this reason, we prefer to choose the second approach. We will therefore assume that the reader already knows what the real numbers are, and we will just state, without proof, some basic assertions about them. An assertion whose truth is accepted without proof is usually known as an axiom. Thus, we have chosen an axiomatic approach to the introduction of the real numbers. Now, our only task is to choose the right collection of axioms for real numbers. But, how can we find such a collection of axioms? What are the most important features of our desired collection? 
Clearly, an appropriate collection of real number axioms should be composed of those which are
• consistent, meaning that none of the axioms can be negated by a collection of the others;
• independent, in the sense that none of the axioms may be deduced from a collection of the others; and
• exhaustive, meaning that any other real number property must be easily proved using them (and perhaps using some simpler properties whose truth is established previously) on the basis of the logical rules of reasoning.

These features can be summarized as follows. The collection of axioms should be the minimal set of consistent axioms from which all other rules or properties of real numbers can be deduced. Certainly, finding such a collection of axioms involves a trial-and-error process. Accordingly, we will not discuss the way analysts found the appropriate real number axioms. We will only state them and then observe their strength in proving other familiar properties. The first group of our axioms is introduced in the next section.

1.1. The Algebraic Axioms


As we all know, any two real numbers can be added to and multiplied by each other. For this reason, our real number axioms should include a collection of algebraic axioms that describe the basic properties of addition and multiplication. If we denote by x + y and xy the sum and the product of real numbers x and y, respectively, then the desired axioms are as follows.
(A-1) For arbitrary real numbers x and y, x + y and xy are also real numbers.
(A-2) The operations of addition and multiplication are commutative in the sense that x + y = y + x and xy = yx for all x, y ∈ R.
(A-3) Addition and multiplication are associative, meaning that for arbitrary x, y, and z in R, x + (y + z) = (x + y) + z and x(yz) = (xy)z.
(A-4) There are real numbers 0 and 1, the neutral elements of addition and multiplication, respectively, such that x + 0 = x and 1x = x for every x ∈ R.
(A-5) For any real number x, a real number −x, called the additive inverse of x, exists such that x + (−x) = 0. For any x ≠ 0, a real number 1/x, referred to as the multiplicative inverse of x, exists such that x(1/x) = 1.
(A-6) Multiplication is distributive with respect to addition, that is, x(y + z) = xy + xz for every x, y, and z in R.

What does the associativity axiom say? When we add three real numbers together, we first add two of them and then add the third one to the resulting number. The axiom (A-3) says that in this situation, our choice of the first two numbers is not important. The assertion that multiplication is associative can be interpreted similarly.

If x is an arbitrary real number and n ∈ N (the set of natural numbers or positive integers), we denote by x^n the product x · · · x and by nx the sum x + · · · + x, where x appears n times in each case. The number x^n is called the nth power of x. Moreover, we define x^(−n) = (1/x)^n for every x ≠ 0.

Each of the axioms (A-1)–(A-5) has two parts, one associated with addition and one related to multiplication. In what follows, we will refer to these parts as the additive and multiplicative parts of the axioms, respectively. Sometimes, we will also speak of the axioms of addition and the axioms of multiplication to indicate the additive and multiplicative parts of (A-1)–(A-5), respectively.

As a matter of fact, when the existence of some mathematical object is asserted, the most natural question is to determine whether it is unique or not. As you may know, the real number system has a unique neutral element for addition, and every x ∈ R has a unique additive inverse. In the following example we try to prove these facts on the basis of our axioms. To make our notation simpler, we write x − y instead of x + (−y) and so on.


Example 1.2. Use the axioms of addition to show that
(1) the neutral element of addition is unique, and
(2) the additive inverse of any real number x is unique.

Solution. (1) We should show that the truth of
\[ x + y = x \tag{1.1} \]
for every x ∈ R implies y = 0. It can be proved, however, that the truth of (1.1) for only one x ∈ R yields y = 0. Indeed, if (1.1) holds for some x ∈ R, then y = y + 0 = y + (x − x) = (y + x) − x = x − x = 0. Here we used (A-4), (A-5), (A-3), our assumption (1.1) together with (A-2), and (A-5), respectively, in the equations.
(2) If y is such that x + y = 0, then y = y + 0 = y + (x − x) = (y + x) − x = 0 − x = −x.

It is interesting that all familiar arithmetical properties of the real numbers can be deduced from the algebraic axioms. The following example illustrates this fact.

Example 1.3. Prove that for arbitrary real numbers x, y, and z,
(1) x + z = y + z implies x = y,
(2) xz = yz implies x = y if z ≠ 0,
(3) −(−x) = x,
(4) 0x = 0, and
(5) xy = 0 implies x = 0 or y = 0.

Solution. (1) In view of the axioms of addition, x = x + (z − z) = (x + z) − z = (y + z) − z = y + (z − z) = y.
(2) The proof is similar to that of (1), except that we use the axioms of multiplication to deduce
\[ x = x\left(z\,\frac{1}{z}\right) = (xz)\frac{1}{z} = (yz)\frac{1}{z} = y\left(z\,\frac{1}{z}\right) = y. \]
(3) Since −(−x) is the additive inverse of −x, we have −x + (−(−x)) = 0. Thus −x + (−(−x)) = −x + x, which by part (1) and (A-2) implies −(−x) = x.
(4) Using (A-6), we can write 0x = (0 + 0)x = 0x + 0x. This gives us (1.1) with x and y replaced by 0x, from which it follows that 0x = 0.
(5) If xy = 0 and y ≠ 0, then x = x(y(1/y)) = (xy)(1/y) = 0(1/y) = 0, by (4). Similarly, we can show that xy = 0 and x ≠ 0 imply y = 0. This gives us the desired result.


A note on Example 1.3. Items (1) and (2) in Example 1.3 are known as cancellation laws. Notice that by proving (1), we answered the first part of our question (1.b) posed in the Introduction. It is also important to note that once a new result is established, it can be used alongside the axioms in the proof of some other consequences. This is the reason we used (1) and (4) in the proof of (3) and (5), respectively. Also, as the proof shows, the assumption z ≠ 0 is essential in (2). In fact, if z = 0, x = 1, and y = 2, then xz = yz = 0 by (4), while x ≠ y.

We conclude this section with a simple exercise.

Exercise 1.4. Let x and y be real numbers.
(1) Assume further that x ≠ 0, and use the axioms of multiplication to show that
• xy = x implies y = 1,
• xy = 1 yields y = 1/x, and
• 1/(1/x) = x.
(2) Use the algebraic axioms to prove that
• x(−y) = −(xy) = (−x)y, and
• (−x)(−y) = xy.

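1.2. The Order Axioms

Up to now we answered question (1.a) and the first part of question (1.b) posed in the Introduction, but how can we answer the second part of (1.b)? More precisely, how can we deduce xz < yz from x < y and z > 0? To answer questions like this we need a couple of basic rules that govern the order relation < on R. We assume that the reader already knows the order < on R, and we state the following axioms about it.
(O-1) For arbitrary real numbers x and y, exactly one of x < y, x = y, and y < x holds.
(O-2) If x < y and y < z, then x < z.
(O-3) If x < y, then x + z < y + z for every z ∈ R.
(O-4) If x > 0 and y > 0, then xy > 0.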


A note on the order axioms. When we work on a particular set in mathematics, and particularly in analysis, the relation of any newly defined mathematical structure to its preceding ones should be determined. The axioms (O-3) and (O-4) illustrate this rule because they determine the relation of the order < to the operations of addition and multiplication.

Sometimes, we write y > x (y is greater than x) instead of x < y (x is less than y). Also, if we know that x is not greater than y, then in view of (O-1) we must have either x < y or x = y. We agree to write x ≤ y in this situation (x is less than or equal to y). Similarly, we write x ≥ y for the negation of x < y.

Example 1.5. Prove that x < y and y ≤ z imply x < z.

Solution. Since y ≤ z, we have either
• y = z, in which case we deduce x < z from our assumption x < y, or
• y < z, which together with the assumption x < y and (O-2) gives x < z.

If x > 0 (resp., x < 0), we say that x is positive (resp., negative). Since x ≥ 0 is the negation of x < 0, we read it as “x is nonnegative”. With these conventions, the axiom (O-4) says that if x and y are positive real numbers, then xy is also positive.

Exercise 1.6. If x and y are nonnegative real numbers, prove that xy and x + y are also nonnegative.

Example 1.7. Let x and y be real numbers.
(1) Show that x > y if and only if x − y is a positive real number.
(2) If x < y + ε for every ε > 0, prove that x ≤ y.

Solution. (1) If x − y > 0, add y to both sides of this inequality and use (O-3) to obtain x > y. If this last inequality holds, add −y to both sides and use (O-3) to find x − y > 0.
(2) If we assume that x is greater than y, then x − y > 0 by (1). So, letting ε = x − y we find the contradiction x < y + (x − y) = x.

Similarly, it can be shown that x > y if and only if y − x is negative. Thus, x − y is positive if and only if y − x is negative. Since y − x is the additive inverse of x − y, this leads us to the following more general result.

Exercise 1.8. Prove that a real number x is positive if and only if its additive inverse −x is negative.

Exercise 1.9. Observe that x ≥ y if and only if x − y is a nonnegative real number. Then, prove that when a and b are nonnegative real numbers, a ≤ b holds if and only if a^2 ≤ b^2. Is this result true when at least one of a and b is negative?


Now, we are ready to answer the second part of question (1.b) posed in the Introduction. See item (1) of the following example.

Example 1.10. Let x, y, and z be real numbers. Show that
(1) if x < y and z > 0, then xz < yz; and
(2) if x < y and z < 0, then xz > yz.

Solution. (1) Since y − x > 0 and z > 0, (O-4) tells us that z(y − x) > 0. Hence zy − zx > 0 by the distributivity axiom (A-6), and this is our desired result.
(2) We have y − x > 0 and −z > 0. Thus −z(y − x) > 0 by (O-4). This implies zx − zy > 0, which is the desired result.

We leave the verification of some other properties as an exercise.

Exercise 1.11. If x is a nonzero real number, prove that x^2 is positive. Deduce in particular that 1 > 0. Also, verify that x^2 = 0 if and only if x = 0. If x and y are both positive or both negative, then prove that x < y implies 1/y < 1/x.
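The two parts of Example 1.10 can be spot-checked on exact rational numbers. The following Python sketch is only an illustration and not a proof; the helper name `relation_after_scaling` and the sample values are assumptions introduced here.

```python
from fractions import Fraction

def relation_after_scaling(x, y, z):
    """Return '<', '=' or '>' describing how x*z compares with y*z."""
    left, right = x * z, y * z
    if left < right:
        return "<"
    if left > right:
        return ">"
    return "="

# Sample values (chosen only for illustration) with x < y.
x, y = Fraction(1, 3), Fraction(1, 2)

# A positive multiplier keeps the direction of the inequality,
# a negative multiplier reverses it.
for z in (Fraction(5, 7), Fraction(-5, 7)):
    print(f"z = {z}: xz {relation_after_scaling(x, y, z)} yz")
```

Exact fractions are used instead of floating-point numbers so that the comparisons are not affected by rounding.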

1.3. Absolute Value, Distance, and Neighborhoods

Sometimes, it is necessary to make a positive number from a negative one. To understand when this happens, recall the geometric interpretation of real numbers as points of an axis: The real numbers are in one-to-one correspondence with the points of an axis known as the real line, depicted in Figure 1.

Figure 1. The origin is assigned to 0, x is negative, and y is positive.

This geometric understanding allows us to think of the distance between the real numbers x and y as that of their corresponding points on the axis. In the special case x = −1 and y = 0, the distance is obviously equal to 1. If we wish to find the distance between x = −2 and y = 0, we obtain a distance of two units. Similarly, if n is an arbitrary natural number, then the distance between x = −n and y = 0 is n. Here, n can be obtained from −n, the original number we are interested in, by removing the minus sign.

The above process of “removing the minus sign” can be easily applied to noninteger numbers. For example, we can make the positive number 1.6 from the negative one −1.6. The operation that gives us the positive part of negative numbers is known as absolute value. Clearly, the positive part of a positive number must be the number itself, while that of 0 should be 0 itself, because 0 has no positive part at all.

Definition 1.12. Let x be a real number. The absolute value of x, denoted by |x|, is defined to be x itself when x ≥ 0, and is equal to −x when x < 0.

Thus, |−2| = 2, |1 + π| = 1 + π, and so on. It follows from the definition that x ≤ |x| and |x|^2 = x^2 for every x ∈ R.


Example 1.13. If a and b are arbitrary real numbers, prove that
\[ |ab| \le \frac{1}{2}(a^2 + b^2). \]

Solution. First note that by the axioms (A-6) and (A-2), (a − b)^2 = a^2 + b^2 − 2ab and (a + b)^2 = a^2 + b^2 + 2ab hold for all a, b ∈ R. Since (a − b)^2 and (a + b)^2 are both nonnegative, we find that
\[ \frac{1}{2}(a^2 + b^2) \ge ab \tag{1.2} \]
and
\[ \frac{1}{2}(a^2 + b^2) \ge -ab. \tag{1.3} \]
But |ab| is one of the quantities ab and −ab. Hence we obtain the desired inequality from (1.2) and (1.3).

With our definition of the absolute value, the distance between −n and 0 is |−n − 0| = |−n| = n, which we already knew. This observation leads us to the following definition.

Definition 1.14. The distance between real numbers x and y is defined to be |x − y|.

For example, the distance between −1 and 2 is |−1 − 2| = |−3| = 3. This can also be obtained from |2 − (−1)| = |3| = 3. The following proposition presents those basic properties of the absolute value which will be of frequent use.

Proposition 1.15. If x and y are arbitrary real numbers, then
(1) |x| ≥ 0, and |x| = 0 if and only if x = 0;
(2) |xy| = |x||y|, and in particular, |−x| = |x|; and
(3) |x + y| ≤ |x| + |y|.

Proof. We only prove (3), which is known as the triangle inequality. The proof of the other parts is left to the reader. Since both sides of (3) are nonnegative, Exercise 1.9 shows that it is enough to prove the inequality |x + y|^2 ≤ (|x| + |y|)^2. To see this, we write
\[ |x + y|^2 = (x + y)^2 = x^2 + 2xy + y^2 = |x|^2 + 2xy + |y|^2 \le |x|^2 + 2|x||y| + |y|^2 = (|x| + |y|)^2. \]
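The inequality of Example 1.13 and the triangle inequality of Proposition 1.15(3) can be tested on a finite grid of rational numbers. The Python sketch below is an illustration under that assumption and does not replace the proofs; the function names and the sample grid are hypothetical choices made here.

```python
from fractions import Fraction
from itertools import product

# A small grid of exact rational sample points (an arbitrary choice).
samples = [Fraction(n, d) for n in range(-6, 7) for d in (1, 2, 3)]

def triangle_holds(x, y):
    """Check |x + y| <= |x| + |y| for one pair."""
    return abs(x + y) <= abs(x) + abs(y)

def half_squares_bound_holds(a, b):
    """Check |ab| <= (a^2 + b^2) / 2 for one pair."""
    return abs(a * b) <= Fraction(1, 2) * (a * a + b * b)

all_triangle = all(triangle_holds(x, y) for x, y in product(samples, repeat=2))
all_bound = all(half_squares_bound_holds(a, b) for a, b in product(samples, repeat=2))
print(all_triangle, all_bound)
```

Both checks include pairs with mixed signs, which is where the triangle inequality can be strict.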




A note on Proposition 1.15. In general, it is worthwhile to think about the necessity of considering any statement you encounter in mathematical theorems and propositions. For example, why should we care about items (2) and (3) of Proposition 1.15? One possible answer is that the items determine the relation of the absolute value to the operations of multiplication and addition, respectively. Moreover, we can interpret (2) as follows: the absolute value of the product of two real numbers is the product of the absolute values of the numbers.

Exercise 1.16. If a is a positive real number, prove that
(1) |x| < a if and only if −a < x < a; and
(2) |x| > a if and only if x > a or x < −a.
Then, state and prove appropriate versions of these results for ≤ and ≥.

The Euclidean Distance Function d_e. It is appropriate for our purposes in the second part of the book to introduce the Euclidean distance function d_e as a function from the Cartesian product R × R into R via d_e(x, y) := |x − y|. This is called a distance function as it is a function which assigns to each pair x and y of elements of R their distance |x − y|. It is called Euclidean because it gives us the distance which is consistent with the Euclidean geometry we learned in high school, namely, the one in which the distance between two points is the length of the line segment that joins them together.

In most of the remaining chapters, d_e will play a decisive role. In the next two chapters, d_e will help us to develop a solid theory for real sequences and series, and for the limit and continuity of real functions, respectively. The distance function will also serve as a prototype for the notion of distance in generic sets, which will be defined in Chapter 6. For now, we state the most important properties of d_e in a proposition whose proof is left as an exercise.

Proposition 1.17. If d_e is defined as above on R × R, then the following hold.
(1) For all x, y ∈ R, d_e(x, y) ≥ 0.
(2) The distance d_e(x, y) is 0 if and only if x = y.
(3) For all x, y ∈ R, d_e(x, y) = d_e(y, x).
(4) For all x, y, and z in R, d_e(x, y) ≤ d_e(x, z) + d_e(z, y).

The fourth property is known as the triangle inequality, at least because it follows from the triangle inequality for the absolute value. We will present a better justification of this name in Chapter 6. The proof of Proposition 1.17 is based on the known properties of the absolute value. We will interpret the four properties in a broader context in the first few pages of Chapter 6. For now, try to interpret them in your own words. For example, item (2) says that the distance between x and y is 0 if and only if x and y are represented by the same point, or that they are equal as real numbers.
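The four properties in Proposition 1.17 lend themselves to a quick numerical sanity check. The sketch below, written in Python, evaluates them on a finite set of rational points; the name `d_e` mirrors the text, while the sample points and the result variables are assumptions made for this illustration. It does not replace the proof requested in the exercise.

```python
from fractions import Fraction
from itertools import product

def d_e(x, y):
    """Euclidean distance on the real line: |x - y|."""
    return abs(x - y)

# Exact rational sample points between -2 and 2 (an arbitrary choice).
points = [Fraction(n, 4) for n in range(-8, 9)]

nonnegative = all(d_e(x, y) >= 0 for x, y in product(points, repeat=2))
zero_iff_equal = all((d_e(x, y) == 0) == (x == y) for x, y in product(points, repeat=2))
symmetric = all(d_e(x, y) == d_e(y, x) for x, y in product(points, repeat=2))
triangle = all(
    d_e(x, y) <= d_e(x, z) + d_e(z, y) for x, y, z in product(points, repeat=3)
)
print(nonnegative, zero_iff_equal, symmetric, triangle)
```

Property (4) is checked over all triples, since the intermediate point z may lie inside or outside the segment joining x and y.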


Neighborhoods. Perhaps the most important consequence of the existence of d_e as a distance function is that it enables us to think of neighborhoods. To understand what is meant by a neighborhood, try to answer this simple question: Which people do you consider to be your neighbors? Certainly, your answer involves the notion of distance, as you probably said something like this: Our neighbors are those people whose home is near ours. Here, the word near is ambiguous, and everyone has a definition of which home is near and which one is far. Nevertheless, every family understands which homes, and more precisely which families, lie in its neighborhood. So, if we personify numbers and think of them as human beings, we may think of neighborhoods of real numbers.

Definition 1.18. A neighborhood of a real number x is the set of all real numbers whose Euclidean distance from x is less than a prescribed value ε > 0. If we want to emphasize ε, we speak of the ε-neighborhood of x. Such a neighborhood is denoted by N_ε(x). More precisely, N_ε(x) is the set {y ∈ R : d_e(y, x) < ε} or the open interval (x − ε, x + ε) = {y ∈ R : x − ε < y < x + ε} depicted in Figure 2 below.

Figure 2. The ε-neighborhood N_ε(x).
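Definition 1.18 gives two descriptions of the ε-neighborhood, one through the distance and one as an open interval. The following Python sketch compares the two membership tests on sample points; the function names `in_neighborhood` and `in_open_interval` and the chosen values are hypothetical and serve only as an illustration.

```python
from fractions import Fraction

def in_neighborhood(y, x, eps):
    """True when y lies in N_eps(x), that is, when |y - x| < eps."""
    if eps <= 0:
        raise ValueError("eps must be positive")
    return abs(y - x) < eps

def in_open_interval(y, x, eps):
    """True when x - eps < y < x + eps."""
    return x - eps < y < x + eps

# Centre, radius and test points chosen only for illustration.
x, eps = Fraction(1), Fraction(1, 5)
candidates = [Fraction(n, 10) for n in range(5, 16)]

# The two descriptions should select exactly the same points.
descriptions_agree = all(
    in_neighborhood(y, x, eps) == in_open_interval(y, x, eps) for y in candidates
)
print(descriptions_agree)
```

Note that the endpoints x − ε and x + ε are excluded by both tests, because the inequality in the definition is strict.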

1.4. Natural Numbers and Mathematical Induction

Although we assume that you are already familiar with real numbers and, in particular, with natural numbers, it will be instructive to see the way the existence of natural numbers follows from the algebraic axioms. In more formal treatments of the real number system, one assumes that mathematical objects, called real numbers, exist which satisfy axioms (A-1)–(A-6), (O-1)–(O-4), and axiom (C) of the next section, and then one tries to identify them on the basis of these properties. Finding a description of N is usually the first step in this process. Another important step is to identify real numbers as decimal expansions, which is sketched in Exercise 21 of the Exercise section at the end of this chapter.

For now let us recall our understanding of the set N as a subset of R, and then we will try to formalize this using the axioms. The elements of N are all obtained from 1 via addition: 2 = 1 + 1, 3 = 1 + 1 + 1, 4 = 1 + 1 + 1 + 1, . . . . So, we may think of N as a subset of R with the following properties.
(1) The neutral element of multiplication, namely 1, is an element of N.
(2) The successor n + 1 of every n ∈ N belongs to N.
Of course, the sets Z, Q, and R also satisfy similar properties, but N is the smallest set of this kind. To pursue our discussion more formally, let us give sets with the above properties a specific name.


Definition 1.19. A set A ⊆ R is said to be inductive if
(1) A contains 1; and
(2) for every element n of A, n + 1 is also an element of A.

With this definition, the sets N, Z, Q, and R are all inductive subsets of R. As a result of the above discussion, we are now able to define N within the axiomatic framework. Indeed, as we assume that a set R exists whose elements satisfy the algebraic axioms, the set R is inductive by these assumptions. Now, we define N as the intersection of all inductive subsets of R, that is, the smallest inductive subset of R in the sense of inclusion, and you know that Z and Q may be defined using N by
\[ \mathbb{Z} := \mathbb{N} \cup \{0\} \cup \{-n : n \in \mathbb{N}\} \quad\text{and}\quad \mathbb{Q} := \left\{ \frac{m}{n} : m \in \mathbb{Z},\ n \in \mathbb{N} \right\}. \]
With this definition of N, all familiar properties of the natural numbers can be proved.

Exercise 1.20. Use the above definition of N to show that
(1) the number 1 is the smallest natural number; and
(2) if n ∈ N is not equal to 1, then 1/n ∉ N.

Example 1.21. Show that if n is a natural number other than 1, then n − 1 is also a natural number.

Solution. We show that A = {1} ∪ {n ∈ N : n − 1 ∈ N} is an inductive set of real numbers. First, it is obvious that 1 is an element of A. If k ∈ A, then k is a natural number and hence k + 1 is also an element of N. Since (k + 1) − 1 = k ∈ N, we find that k + 1 ∈ A. Thus, A is an inductive set contained in N, and we must have A = N by our definition of N. This proves the desired result.

The Principle of Mathematical Induction. Besides the real number axioms presented so far and the axiom we will consider in the next section, it is sometimes necessary to utilize another statement, known as the principle of mathematical induction (PMI): If P(n) is a statement concerning natural numbers such that
(I.1) P(n) is true for n = 1, and
(I.2) the truth of P(n) implies that of P(n + 1) for every n,
then P(n) is true for all natural numbers. (We used I in (I.1) and (I.2) as the abbreviation for induction.)

What does the PMI say? Recall from set theory that every statement P(n) defines a set of natural numbers, namely {n ∈ N : P(n) is true}. The PMI therefore says that when this subset of N is inductive, it must be equal to N. This is now obvious for us, as we defined N to be the smallest inductive subset of R. We used such a technique in Example 1.21 above.


When we use the PMI in a problem, the verification of the truth of (I.1) is known as the base step. Also, when we make the assumption that P(n) is true in order to deduce P(n + 1), as in (I.2), the assumption is called the induction hypothesis. As a first application of the PMI, we solve some simple problems.

Example 1.22. Use induction to prove that the nth power of a positive real number is positive for every n ∈ N.

Solution. Consider a positive real number a. We wish to show that a^n > 0 for every n ∈ N. This is true for n = 1 by our assumption. If we assume that a^n > 0 for some n, then a^(n+1) = a^n a is greater than zero, because the product of two positive numbers is positive. The result now follows from the PMI. Here, we could consider P(n) as the statement “the nth power of a is positive” for a fixed a.

Exercise 1.23. If a is a positive real number, prove that a^(−n) is positive for every n ∈ N.

Example 1.24. Let x and y be real numbers, and let m, n ∈ N. Show that
(1) (xy)^n = x^n y^n,
(2) x^(n+m) = x^n x^m,
(3) (x^n)^m = x^(nm), and
(4) 0 < x < y implies x^n < y^n.

Solution. We use induction on n in the proof of all parts. In each item, the desired result is obvious for n = 1. In the proofs of (2) and (3), we first consider an arbitrary but fixed natural number m.
(1) If we assume that the given equality is true, then (xy)^(n+1) = (xy)^n (xy) = x^n y^n xy = x^(n+1) y^(n+1) by the fact that multiplication is commutative. The PMI now tells us that (1) is true for every n ∈ N.
(2) If we assume the given equality, then x^(n+1) x^m = x^n x x^m = x^(n+m) x = x^((n+m)+1) = x^((n+1)+m) by the facts that multiplication and addition are commutative and addition is associative. In view of the PMI this proves (2) for arbitrary n and the fixed m. Since m was arbitrary, it follows that (2) is true for all m, n ∈ N.
(3) Assume the equality and note that by (1) and (2), (x^(n+1))^m = (x^n x)^m = (x^n)^m x^m = x^(nm) x^m = x^(nm+m) = x^((n+1)m). Now, the PMI tells us that (3) is true for fixed m and every n ∈ N. Since m was arbitrary, (3) follows for all m, n ∈ N.
(4) If x^n < y^n for some n, then
\[ x^{n+1} = x\,x^n < x\,y^n \tag{1.4} \]
by the assumption that x > 0. On the other hand, y > 0 implies by Example 1.22 that y^n > 0. So,
\[ x\,y^n < y\,y^n = y^{n+1}. \tag{1.5} \]

Then x^(n+1) < y^(n+1) follows from (1.4) and (1.5), and the PMI gives us the desired result.

Exercise 1.25. Let x and y be nonzero real numbers and m, n ∈ N. Verify that
(1) (xy)^(−n) = x^(−n) y^(−n),
(2) x^(−n−m) = x^(−n) x^(−m),
(3) (x^(−n))^(−m) = x^(nm), and
(4) 0 < x < y implies x^(−n) > y^(−n).

It can be shown that for x ≠ 0 and m, n ∈ N, x^m x^(−n) = x^(m−n). Hence x^0 = 1 for x ≠ 0. In the context of power series (Section 2.7) we will use the convention 0^0 = 1. This convention is also used in Example 1.33 below.

Example 1.26. Verify that 1 + 3 + · · · + (2n − 1) = n^2 for every n ∈ N.

Solution. This is evident for n = 1. If we assume that the given equality is true, we see that 1 + 3 + · · · + (2n − 1) + (2(n + 1) − 1) = n^2 + (2n + 1) = (n + 1)^2, that is, the equality also holds for n + 1. The desired equality now follows from the PMI.

Example 1.27. Show that n < 2^n < 3^n for every n ∈ N.

Solution. We only prove the first inequality; the proof of the second inequality is left as an exercise to the reader. The inequality n < 2^n is obviously true for n = 1. If we assume that this inequality is true for n, then n + 1 ≤ n + n = 2n < 2 · 2^n = 2^(n+1). Hence the inequality also holds for n + 1. The PMI now tells us that the inequality is true for all natural numbers.

Example 1.28. Use the PMI to prove that the identity
\[ x^n - y^n = (x - y) \sum_{i=1}^{n} x^{n-i} y^{i-1} \tag{1.6} \]
is valid for all x, y ∈ R and every n ∈ N.

Solution. Let x and y be arbitrary real numbers. The equality (1.6) is obviously true for n = 1, as the summation on the right-hand side of (1.6) becomes x^0 y^0 = 1 in this case. To complete the proof by induction, we assume that (1.6) holds and then show that it remains true if we replace n by n + 1. To find the value of x^(n+1) − y^(n+1) using the induction hypothesis, we write x^(n+1) − y^(n+1) = x^(n+1) − xy^n + xy^n − y^(n+1).


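This allows us to write
\[
\begin{aligned}
x^{n+1} - y^{n+1} &= x(x^n - y^n) + (x - y)y^n \\
&= x\left[(x - y) \sum_{i=1}^{n} x^{n-i} y^{i-1}\right] + (x - y)y^n \\
&= (x - y)\left[\sum_{i=1}^{n} x^{n+1-i} y^{i-1} + y^n\right] \\
&= (x - y)\left[\sum_{i=1}^{n} x^{n+1-i} y^{i-1} + x^{n+1-(n+1)} y^{(n+1)-1}\right] \\
&= (x - y) \sum_{i=1}^{n+1} x^{n+1-i} y^{i-1}.
\end{aligned}
\]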
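As a sanity check of identity (1.6), both sides can be evaluated for small integers. The Python sketch below is an illustration only; the helper names `rhs_of_identity` and `identity_holds` and the tested ranges are assumptions made here. Python evaluates `0 ** 0` as 1 for integers, which matches the convention x^0 = 1 used in the base step.

```python
def rhs_of_identity(x, y, n):
    """Right-hand side of (1.6): (x - y) * sum of x^(n-i) * y^(i-1), i = 1..n."""
    return (x - y) * sum(x ** (n - i) * y ** (i - 1) for i in range(1, n + 1))

def identity_holds(x, y, n):
    """Compare x^n - y^n with the right-hand side of (1.6)."""
    return x ** n - y ** n == rhs_of_identity(x, y, n)

# Integer samples (an arbitrary choice) keep the arithmetic exact.
all_hold = all(
    identity_holds(x, y, n)
    for x in range(-5, 6)
    for y in range(-5, 6)
    for n in range(1, 9)
)
print(all_hold)
```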

The identity (1.6) is perhaps better known to you if we rewrite it without the sigma notation:
\[ x^n - y^n = (x - y)(x^{n-1} + x^{n-2} y + \cdots + x y^{n-2} + y^{n-1}). \tag{1.7} \]
Using this identity, we are now able to prove the converse of Example 1.24(4).

Example 1.29. Suppose that x and y are nonnegative real numbers. If x^n < y^n for some n ∈ N, prove that x < y.

Solution. Note that when x and y are nonnegative, then so is the summation on the right-hand side of (1.7). Since x^n < y^n is equivalent to x^n − y^n < 0, (1.7) implies x − y < 0, or equivalently x < y.

The Generalized Principle of Mathematical Induction. Sometimes, an assertion P(n) is not true for all natural numbers, but it holds for all numbers which are greater than or equal to some k ∈ N. In such situations, we use the generalized principle of mathematical induction (GPMI). This is obtained from the PMI by replacing (I.1) and (I.2) with the following conditions, respectively.
(GI.1) The assertion P(n) is true for n = k.
(GI.2) The truth of P(n) for an arbitrary n ≥ k implies that of P(n + 1).
The GPMI then asserts that a statement P(n) for which (GI.1) and (GI.2) are valid is true for every n ≥ k. (We used GI as the abbreviation of generalized induction in (GI.1) and (GI.2).)

Example 1.30. For which values of n is the inequality 2^n < n! true?

Solution. It can be easily verified that 2^n > n! for n = 1, 2, 3. Since 2^4 < 4! and 2^5 < 5!, it seems that the inequality is true for all n ≥ 4. To verify this, we use the GPMI with k = 4. That the inequality is true for n = 4 has already been verified. If n ≥ 4 is any natural number for which the inequality is true, then 2^(n+1) = 2 · 2^n < (n + 1) · 2^n < (n + 1) · n! = (n + 1)!. Thus, the GPMI tells us that 2^n < n! holds for all n ≥ 4.
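The experimental part of Example 1.30, namely finding where the inequality 2^n < n! starts to hold, can be reproduced with a short computation. The Python sketch below is an illustration; the function name `power_below_factorial` and the upper limit 30 are assumptions made here, and a finite check of this kind only suggests the pattern that the GPMI then proves.

```python
from math import factorial

def power_below_factorial(n):
    """Return True when 2^n < n!."""
    return 2 ** n < factorial(n)

# Natural numbers up to 30 for which the inequality fails.
failures = [n for n in range(1, 31) if not power_below_factorial(n)]
print(failures)
```

The list of failures indicates which value of k is a sensible starting point for the generalized induction.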


When using the GPMI for n ≥ k in a problem, the verification of the truth of P(k) is known as the base step.

The base step is important! The base step is crucial in induction. This can be seen by considering the equality
\[ 1 + 3 + \cdots + (2n - 1) = n^2 + 4, \tag{1.8} \]
which we know from Example 1.26 to be wrong for all natural numbers. If we do not consider the base step, we see that the truth of (1.8) implies 1 + 3 + · · · + (2n − 1) + (2(n + 1) − 1) = n^2 + 4 + (2n + 1) = (n + 1)^2 + 4. This shows that the truth of (I.2) or (GI.2) is not sufficient if we want to use induction.
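The role of the base step can be made concrete numerically. In the Python sketch below, the inductive step is checked for a claimed formula of the form n^2 + c, where the parameter `offset` (a name introduced here for illustration) plays the role of c; the step goes through for every offset, while only offset 0 matches the actual sums.

```python
def sum_of_first_odds(n):
    """Compute 1 + 3 + ... + (2n - 1) directly."""
    return sum(2 * i - 1 for i in range(1, n + 1))

def inductive_step_is_consistent(n, offset):
    """If the sum were n^2 + offset, would adding the next odd number
    give (n + 1)^2 + offset?"""
    return (n * n + offset) + (2 * (n + 1) - 1) == (n + 1) ** 2 + offset

# The inductive step succeeds for the false formula (1.8) as well ...
step_ok_for_false_formula = all(inductive_step_is_consistent(n, 4) for n in range(1, 50))

# ... but the base step n = 1 fails, and in fact (1.8) fails for every n tested.
base_step_fails = sum_of_first_odds(1) != 1 ** 2 + 4
never_true = all(sum_of_first_odds(n) != n * n + 4 for n in range(1, 50))
print(step_ok_for_false_formula, base_step_fails, never_true)
```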

Some Useful Inequalities and Identities. In this subsection we utilize induction to establish some important results which will enhance our knowledge of real numbers. The results will be used later in this book.

Example 1.31. Prove that for every n ∈ N and every real number x ≥ −1, Bernoulli’s inequality
\[ (1 + x)^n \ge 1 + nx \tag{1.9} \]
is true.

Solution. Fix an arbitrary x ≥ −1. We prove (1.9) by induction on n. For n = 1, both sides of (1.9) are equal to 1 + x. To complete the proof we show that the truth of (1.9) for n implies that for n + 1. Thus, assume that (1.9) holds and notice that then
\[
\begin{aligned}
(1 + x)^{n+1} &= (1 + x)(1 + x)^n \\
&\ge (1 + x)(1 + nx) && (1.10) \\
&= 1 + nx + x + nx^2 \\
&\ge 1 + (n + 1)x. && (1.11)
\end{aligned}
\]
Therefore, the given inequality is also true for n + 1. Note that (1.10) follows from the induction hypothesis and the fact that 1 + x ≥ 0, while (1.11) is a result of the inequality nx^2 ≥ 0.

An application of Bernoulli’s inequality can be found in Example 2.15(4). The following general version of Bernoulli’s inequality is also worth mentioning.

Exercise 1.32. Let n be any natural number, and let x_1, . . . , x_n be real numbers greater than −1 which are either all positive or all negative. Prove that (1 + x_1) · · · (1 + x_n) ≥ 1 + x_1 + · · · + x_n.

The following important formula will be used in connection with geometric series in the next chapter; see Example 2.84(4).


Example 1.33. Let a ≠ 1 be a real number. Prove that for every n ∈ N,
\[ \sum_{k=0}^{n} a^k = \frac{1 - a^{n+1}}{1 - a}. \tag{1.12} \]

Solution. This is true for n = 1, because
\[ \sum_{k=0}^{1} a^k = 1 + a = \frac{1 - a^2}{1 - a}. \]
If we assume (1.12), then we may write
\[ \sum_{k=0}^{n+1} a^k = \frac{1 - a^{n+1}}{1 - a} + a^{n+1} = \frac{1 - a^{n+2}}{1 - a}. \]
Thus the PMI shows that (1.12) holds for every n ∈ N.

A note on the use of induction in Example 1.33. Induction gives us a method for proving formulas which are known or those which we guess are true. For instance, our inductive proof of (1.12) relies on the assumption that the formula is given. The formula can be proved alternatively by writing
\[ \sum_{k=0}^{n+1} a^k = \sum_{k=0}^{n} a^k + a^{n+1} \quad\text{and}\quad \sum_{k=0}^{n+1} a^k = 1 + a \sum_{k=0}^{n} a^k, \]
because then
\[ 1 + a \sum_{k=0}^{n} a^k = \sum_{k=0}^{n} a^k + a^{n+1}, \]
and (1.12) follows by a simple calculation.

Our next important result is the Binomial Theorem, which helps us find the nth power of a sum a + b of real numbers on the basis of the various powers of a and b. It can be applied in many situations. Instances include the proof of Theorem 1.59, Example 1.63, and Example 2.15(5). To prove this theorem more easily, we need the following lemma. Recall that for nonnegative integers n and k with n ≥ k,
\[ \binom{n}{k} := \frac{n!}{(n - k)!\,k!}, \]
where we use the convention that 0! = 1. For example,
\[ \binom{5}{2} = \frac{5!}{3!\,2!} = 10. \]


A note on the value of 0! Combinatorially, n! is the number of ways we can permute n objects, if n ∈ N. Since there is only one way to permute 0 objects—that is, to do nothing—we agree to make the convention 0! = 1.

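Lemma 1.34. Let n and k be natural numbers, and let 1 ≤ k ≤ n. Then
\[ \binom{n}{k-1} + \binom{n}{k} = \binom{n+1}{k}. \]

Proof. This follows directly from the definition:
\[
\begin{aligned}
\binom{n}{k-1} + \binom{n}{k} &= \frac{n!}{(n - k + 1)!\,(k - 1)!} + \frac{n!}{(n - k)!\,k!} \\
&= \frac{n!}{(n - k)!\,(k - 1)!}\left(\frac{1}{n - k + 1} + \frac{1}{k}\right) \\
&= \frac{n!}{(n - k)!\,(k - 1)!} \cdot \frac{n + 1}{(n - k + 1)k} \\
&= \binom{n+1}{k}.
\end{aligned}
\]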
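Lemma 1.34 can be checked for small values directly from the factorial definition of the binomial coefficient. The Python sketch below is an illustration, not part of the proof; the helper names `binom` and `pascal_rule_holds` and the tested range are assumptions made here. Note that `math.factorial(0)` returns 1, in agreement with the convention 0! = 1.

```python
from math import factorial

def binom(n, k):
    """Binomial coefficient n! / ((n - k)! k!) for integers 0 <= k <= n."""
    return factorial(n) // (factorial(n - k) * factorial(k))

def pascal_rule_holds(n, k):
    """Check C(n, k-1) + C(n, k) == C(n+1, k) for 1 <= k <= n."""
    return binom(n, k - 1) + binom(n, k) == binom(n + 1, k)

rule_holds = all(pascal_rule_holds(n, k) for n in range(1, 30) for k in range(1, n + 1))
print(binom(5, 2), rule_holds)
```

Integer division `//` is exact here because the denominator always divides n!.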





Now, the Binomial Theorem can be stated as follows.

Theorem 1.35 (Binomial Theorem). If a and b are arbitrary real numbers and n is a natural number, then
\[ (a + b)^n = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k}. \tag{1.13} \]

Proof. We use induction on n. The truth of (1.13) can be easily verified for n = 1. So, we assume that (1.13) is true, and we prove that
\[ (a + b)^{n+1} = \sum_{k=0}^{n+1} \binom{n+1}{k} a^k b^{n+1-k} \tag{1.14} \]
must also be true. But by assuming (1.13), we find that
\[
\begin{aligned}
(a + b)^{n+1} &= (a + b)^n (a + b) \\
&= \left[\sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k}\right](a + b) \\
&= \sum_{k=0}^{n} \binom{n}{k} a^{k+1} b^{n-k} + \sum_{k=0}^{n} \binom{n}{k} a^k b^{n+1-k} \\
&= \sum_{k=1}^{n+1} \binom{n}{k-1} a^k b^{n+1-k} + \sum_{k=0}^{n} \binom{n}{k} a^k b^{n+1-k} \\
&= a^{n+1} + \sum_{k=1}^{n} \left[\binom{n}{k-1} + \binom{n}{k}\right] a^k b^{n+1-k} + b^{n+1} \\
&= a^{n+1} + \sum_{k=1}^{n} \binom{n+1}{k} a^k b^{n+1-k} + b^{n+1},
\end{aligned}
\]

where we used Lemma 1.34 to obtain the last equality. Now, the proof is complete by noticing that the last line is just the right-hand side of (1.14).

For example, by letting n = 3 in (1.13), we obtain the familiar identity (a + b)^3 = b^3 + 3ab^2 + 3a^2 b + a^3.

The Well-Ordering Principle. The PMI has a logically equivalent form which is of frequent use in areas of mathematics for which natural numbers play a role, such as number theory. This is the well-ordering principle (WOP) which we will use in Lemma 1.48 below.

The well-ordering principle. Every nonempty set of natural numbers contains a least element.

Here, the least element is determined relative to the order <. [...]

Theorem 1.39. Let x > 0 and y be real numbers. Then, there exists a natural number n such that nx > y.

What does Theorem 1.39 say? This is evident when y ≤ 0, as we may choose n = 1 in this case. Let us see what the theorem says in the nontrivial case y > 0. If y is a large positive real number and x is a very small number of this kind, then the theorem says that we can go beyond y by adding x to itself n times, for some sufficiently large n. Intuitively, this means that we can cover a very long distance, say a distance of 10000 miles, with a sufficiently large number of steps, each 0.5 meters long! Of course, this last statement is a mere mathematical certainty, and to accept it we need to ignore such obstacles as limitations in time and energy.

Figure 3. An illustration of the Archimedean property.

Proof of Theorem 1.39. If such a number n could not be found, y would be an upper bound for the set D = {nx : n ∈ N}. In view of the axiom of completeness, this gives us a real number a as the least upper bound of D. The assumption x > 0 implies a − x < a, and accordingly a − x is not an upper bound for D. Therefore, $n_0$ ∈ N can be found such that $n_0 x > a - x$. Adding x to both sides of this inequality, we find $(n_0 + 1)x > a$. Since $n_0 + 1$ is also an element of N, this contradicts our assumption that a is the supremum of D.

The Archimedean property has many important implications. The following example presents consequences which will be of frequent use in the next chapters.

Example 1.40. Prove the following statements.
(1) If x is any positive real number, then there exists a natural number n such that 1/n < x.
(2) If y is any real number, then there is a natural number n such that n > y.

1.5. The Axiom of Completeness and Its Uses


Solution. (1) Use Theorem 1.39 with y = 1 to find a natural number n satisfying nx > 1. The result then follows from this inequality if we multiply both sides by 1/n.
(2) Use the Archimedean property with x = 1.

As a first application of the above example, we refine Example 1.7(2).

Example 1.41. If x < y + 1/n for every natural number n, prove that x ≤ y.
Solution. If x > y, then x − y > 0, and we can find a natural number $n_0$ such that $1/n_0 < x - y$. This gives $x > y + 1/n_0$, contradicting our assumption.

Example 1.40(1) can also be used to solve problems concerning least upper bounds. The following examples show how it can be applied in this connection.

Example 1.42. Let A be a nonempty subset of R, and let a be a real number satisfying the following property: for every n ∈ N, a + 1/n is an upper bound for A and a − 1/n is not an upper bound for this set. Prove that a is the supremum of A.
Solution. Assume that some x ∈ A is greater than a. Then x − a > 0 and so we can find $n_1$ ∈ N such that $1/n_1 < x - a$ or, equivalently, $x > a + 1/n_1$. This contradicts the assumption that $a + 1/n_1$ is an upper bound for A. Thus we must have x ≤ a for every x ∈ A. This shows that a is an upper bound for A. Now consider a real number b less than a. There exists $n_2$ ∈ N such that $1/n_2 < a - b$. Thus $b < a - 1/n_2$, and since $a - 1/n_2$ is not an upper bound for A, b is not an upper bound for the set either. This completes the proof.

Example 1.43. Find the supremum of the following sets of real numbers.
(1) A = (0, 1).
(2) B = {1 − 1/n : n ∈ N}.
(3) $C = \bigcap_{n=1}^{\infty} [-1/n, 1 - 1/n]$.
Solution. (1) It is clear that 1 is an upper bound for A. We claim that sup A = 1. To see this, consider some x < 1. We prove that x cannot be an upper bound for A. In fact, the assumption 1 − x > 0 allows us to choose a natural number $n_0 \ge 2$ such that $1/n_0 < 1 - x$. Since $1 - 1/n_0$ is an element of A which is greater than x, this shows that x is not an upper bound for A.
(2) Clearly, 1 is an upper bound for B. If x < 1, we may find m ∈ N such that x < 1 − 1/m. Since 1 − 1/m is an element of B, this proves that sup B = 1.
(3) We claim that C = {0}. First, it is clear that 0 ∈ C. If x is any nonzero real number, we may consider two cases.
• x > 0. Then x ∉ [−1, 0] and therefore x ∉ C.
• x < 0. In this case −x > 0, and so we can find $n_1$ ∈ N such that $1/n_1 < -x$. Then $x < -1/n_1$ and so $x \notin [-1/n_1, 1 - 1/n_1]$. This shows that x ∉ C also in this case.
Therefore, 0 is the only element of C. The supremum of C is 0 itself.

The supremum of a set may or may not belong to it. Items (3) and (1) of the above example illustrate these facts, respectively.
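The two parts of Example 1.40 can be tried out numerically. The following Python sketch is only an illustration and not part of the text's development: the function names `archimedean_n` and `reciprocal_below` are hypothetical, and exact rational arithmetic (`fractions.Fraction`) stands in for the real numbers.

```python
from fractions import Fraction
from math import floor


def archimedean_n(x, y):
    """Return a natural number n with n*x > y, assuming x > 0 (Theorem 1.39)."""
    if x <= 0:
        raise ValueError("x must be positive")
    if y <= 0:
        return 1  # the trivial case: n = 1 already works
    # floor(y/x) <= y/x < floor(y/x) + 1, so this n satisfies n*x > y
    return floor(Fraction(y) / Fraction(x)) + 1


def reciprocal_below(x):
    """Return a natural number n with 1/n < x, assuming x > 0 (Example 1.40(1))."""
    return archimedean_n(x, 1)


# The walking illustration: 10000 miles is 16093440 meters, one step is 1/2 meter.
steps = archimedean_n(Fraction(1, 2), 16093440)

# Example 1.43(2): for x < 1, find m with x < 1 - 1/m.
x = Fraction(999, 1000)
m = reciprocal_below(1 - x)
```

The value returned by `archimedean_n` happens to be the least natural number with the required property, although the theorem only asserts that some such number exists.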


Definition 1.44. Let A be a subset of R which is bounded from above. When the supremum of A is an element of this set, we say that it is the maximum or the greatest element of A. If the supremum is a, we write a = max A in this situation.

Since every maximum is a supremum, the maximum of a set is unique whenever it exists.

Example 1.45. Prove that every nonempty finite subset of R contains a greatest element.
Solution. Let A be a nonempty finite subset of R with no greatest element. Choose $x_1$ ∈ A, which is possible by the assumption that A ≠ ∅. Since $x_1$ is not the greatest element of A, we find $x_2$ ∈ A such that $x_2 > x_1$. But $x_2$ is not the greatest element of A, so that we may find $x_3$ ∈ A with $x_3 > x_2$. Continuing in this way, we obtain an infinite subset $\{x_1, x_2, \ldots\}$ of A. This contradicts our assumption that A is finite.

Using the above example, we can prove the existence of the integral parts of real numbers.

Proposition 1.46. For every real number x there exists a unique integer l such that l ≤ x < l + 1.
Proof. If x ∈ Z, then l = x is the desired integer. Otherwise, use Example 1.40(2) to find m, n ∈ N such that n > x and m > −x. Then −m < x < n. The set D = {k ∈ Z : −m ≤ k ≤ n and k < x} is nonempty as it contains −m, and this set is obviously finite (D can have at most n + m elements). Denote the greatest element of D, whose existence follows from the above example, by l. Since l is the maximum of D, l + 1 ∉ D; as −m ≤ l + 1 ≤ n, this forces l + 1 ≥ x, and in fact l + 1 > x because x is not an integer. In summary, l is an integer such that l < x < l + 1. That l is the unique integer with this property follows if we verify that the value of l is independent of our initial choice of m and n. We leave the verification of this latter fact to the reader.

Figure 4 illustrates the idea of the proof of Proposition 1.46.

Figure 4. The integer l is the integral part of x.

Definition 1.47. Let x be a real number, and let l be the integer which is associated to x as in Proposition 1.46. We call l the integral part of x and denote it by [x]. For example, [1.2] = 1, [−1.3] = −2 and [0.6] = 0. With the notation of Definition 1.47 we find that for every real number x, [x] ≤ x < 1 + [x]. The Density of Q in R. An important corollary of the Archimedean property is the density of the rational numbers in R, meaning that between any two distinct real numbers at least one rational number can be found. To prove this interesting result more simply, we first establish the following (somewhat obvious) lemma.


Lemma 1.48. If a and b are real numbers with b − a > 1, then there exists an integer m with a < m < b. Proof. First assume that 0 ≤ a < b. Let m be the smallest natural number which is greater than a. (By Example 1.40(2) at least one such natural number exists. For the existence of the smallest natural number with this property, you may invoke the well-ordering principle.) Then m − 1 ≤ a and hence a < m ≤ a + 1 < b. Now suppose that a < 0, and find a natural number n such that n > −a. Then 0 < n + a < n + b, and (n + b) − (n + a) = b − a > 1. So, by the previous argument we find a natural number m with n+a < m < n+b. This shows that a < m−n < b and completes the proof.  Figure 5 illustrates the proof of Lemma 1.48.

Figure 5. The assumption b − a > 1 yields b > a + 1.

What does Lemma 1.48 say? Lemma 1.48 says that when a and b are real numbers with a distance of more than one unit, an integer can be found between them. Now, the density theorem can be stated as follows. This answers our question (1.d) posed in the Introduction. Theorem 1.49 (Density of the rational numbers in R). Let x and y be real numbers with x < y. Then there is a rational number q such that x < q < y. Proof. Choose a natural number n such that 1/n < y − x. Then ny − nx > 1. Hence by Lemma 1.48 we find an integer m such that nx < m < ny, or equivalently, x < m/n < y. Letting q = m/n, we obtain the desired result.  Why is Q said to be dense in R? We say that Q is dense in R because the rational numbers can be found everywhere in R by Theorem 1.49. Exercise 1.50. Verify that between arbitrary real numbers x and y with x < y an infinite number of rational numbers can be found. It can be shown that between any two distinct real numbers an irrational number can be found. We leave the verification of this as an exercise to the reader (Exercise 35 at the end of this chapter).
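The proof of Theorem 1.49 is constructive enough to be followed step by step. The sketch below mirrors it under stated assumptions: the name `rational_between` is hypothetical, inputs are converted to exact rationals with `fractions.Fraction` (so a float input is taken to mean the exact binary value it stores), and the integer m of Lemma 1.48 is taken to be the smallest integer greater than nx.

```python
from fractions import Fraction
from math import floor


def rational_between(x, y):
    """Return a rational q with x < q < y, following the proof of Theorem 1.49."""
    x, y = Fraction(x), Fraction(y)  # exact conversion, also for floats
    if not x < y:
        raise ValueError("need x < y")
    # Choose a natural number n with 1/n < y - x (Example 1.40(1)).
    n = floor(1 / (y - x)) + 1
    # Now n*y - n*x > 1, so Lemma 1.48 applies; the smallest integer
    # greater than n*x satisfies n*x < m <= n*x + 1 < n*y.
    m = floor(n * x) + 1
    return Fraction(m, n)


q = rational_between(Fraction(1, 3), Fraction(1, 2))
```

Applying the function repeatedly, first to x and y and then to x and the rational just found, produces as many distinct rationals between x and y as desired, which is the idea behind Exercise 1.50.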


Lower Bounds and Infima. Up to now, we have presented only one half of the boundedness theory for subsets of R. The neglected half of the theory concerns lower bounds and boundedness from below.

Definition 1.51. Let B be a nonempty set of real numbers. A real number b is said to be a lower bound of B if x ≥ b for every x ∈ B. If such a bound can be found for B, we say that B is bounded from below. If c is a lower bound of B and no number greater than c is a lower bound of B, we call c the greatest lower bound or the infimum of B, and we denote it by inf B.

Example 1.52. Find the infimum of the sets given in Example 1.43.
Solution. (1) Let A = (0, 1). It is clear that 0 is a lower bound for A. If x > 0, we may find a natural number n greater than 1 such that 1/n < x. Since 1/n ∈ A, x cannot be a lower bound for A. This proves that inf A = 0.
(2) Consider B = {1 − 1/n : n ∈ N}. It is clear that 0 is a lower bound for B. If x > 0, then x cannot be a lower bound for the set, because 0 ∈ B. Thus, inf B = 0.
(3) We see that $C = \bigcap_{n=1}^{\infty} [-1/n, 1 - 1/n] = \{0\}$. So inf C = 0.

Just as we saw for suprema, a set may or may not contain its infimum.

Definition 1.53. Let B be a subset of R which is bounded from below. When the infimum of B is an element of this set, we say that it is the minimum or the least element of B. If the infimum is b, we write b = min B in this situation.

Since every minimum is an infimum, the minimum of a set is unique whenever it exists. As we saw in the previous section, the WOP guarantees the existence of the least element for any nonempty set of natural numbers. It is clear that a generic set of real numbers may fail to have a least element. An example is the interval (−∞, 1), which is not bounded from below.

Exercise 1.54. Prove that every nonempty finite set of real numbers has a least element.

Example 1.55. Find the supremum and infimum of the following sets of real numbers.
(1) A = {x ∈ R : $x^2 - 3x + 2 < 0$}.
(2) $B = \left\{ \frac{1}{2^n} + \frac{1}{3^m} : m, n \in \mathbb{N} \right\}$.

Solution. (1) Since $x^2 - 3x + 2 = (x - 1)(x - 2)$, it can be easily verified that $x^2 - 3x + 2 < 0$ if and only if 1 < x < 2. Thus A = (1, 2), and for this reason inf A = 1 and sup A = 2.

(2) It is clear that 5/6 = (1/2) + (1/3) is an upper bound for B. Since this number is an element of B, we find that sup B = max B = 5/6. On the other hand, every element of B is greater than 0. So, 0 is a lower bound for the set. If b > 0, find $n_0$ ∈ N such that $1/n_0 < b/2$. Since $0 < n_0 < 2^{n_0} < 3^{n_0}$ by Example 1.27,

\[ \frac{1}{3^{n_0}} < \frac{1}{2^{n_0}} < \frac{1}{n_0} < \frac{b}{2}. \]


This shows $(1/2^{n_0}) + (1/3^{n_0}) < b$, which means that b is not a lower bound for B. Thus, inf B = 0.

Next, we show that the axiom of completeness is equivalent to its lower bound analogue.

Theorem 1.56. The axiom of completeness is equivalent to the following assertion: Every nonempty subset of R which is bounded from below has a greatest lower bound.
Proof. First assume the axiom (C), and let A be a nonempty subset of R which is bounded from below. If a is a lower bound for A, then −a is an upper bound for the nonempty set −A := {−x : x ∈ A}. By (C), −A has a supremum in R, say α. We claim that −α is the infimum of A. Since α is an upper bound for −A, its additive inverse is a lower bound for A. Now if β > −α, then −β < α, and since α = sup(−A), we can find an element of −A, say −x for some x ∈ A, such that −x > −β. But this last inequality means x < β, showing that β is not a lower bound for A. This gives −α = inf A and completes the proof of the first part. The proof of the second part, that the assertion of this theorem implies axiom (C), is similar. We therefore leave it to the reader.

Bounded Sets. When a set A is bounded from above and below, in which case it has both a supremum and an infimum, we say that A is bounded. The following characterization of bounded sets will be used later in Chapter 6.

Proposition 1.57. A subset A of R is bounded if and only if M > 0 can be found such that |x| < M for every x ∈ A.
Proof. If A is bounded, let α = sup A and β = inf A. If we choose M > max{α, −β}, which is of course possible because we may choose some M ∈ N with this property, then for every x ∈ A, −M < β ≤ x ≤ α < M, that is, |x| < M. Conversely, if M > 0 is such that |x| < M for every x ∈ A, then −M and M are lower and upper bounds for A, respectively, and A is bounded accordingly.

What does Proposition 1.57 say? Proposition 1.57 says that a subset of R is bounded if and only if it lies entirely in a (sufficiently large) neighborhood of 0.
A Simple Characterization of Intervals. Among the various subsets of R, intervals are of the greatest importance in calculus because they are used there as the domain of real-valued functions. It is therefore essential to know intervals more deeply. Intervals satisfy a certain connectedness property which other kinds of sets fail to have: if I ⊆ R is an interval of any kind, then x < z < y and x, y ∈ I imply z ∈ I. This means that given two points of an interval I, the interval contains all the points that lie between them.


Certainly, this property is not satisfied by many subsets of R, Q being just one example. This follows from the known fact that given distinct p, q ∈ Q, at least one irrational number r can be found between them. We may describe this fact geometrically by thinking of Q as a disconnected subset of R. It is interesting that the connectedness property mentioned above characterizes the intervals completely. We will prove this using what we learned about suprema and infima.

Theorem 1.58. A subset I of R is an interval if and only if, for every pair x and y of elements of I, the closed interval [x, y] is contained in I.
Proof. We consider four cases for the boundedness situation of I as follows.
(1) The set I is bounded. In this case a = inf I and b = sup I exist and it is clear that I ⊆ [a, b]. We show, on the basis of the connectedness property, that (a, b) ⊆ I. Once this is proved, I will be of one of the forms (a, b), (a, b], [a, b), and [a, b]. For example, I is equal to (a, b) when a, b ∉ I, I = (a, b] if a ∉ I and b ∈ I, etc. Consider some z ∈ (a, b). Since z < b and b = sup I, there exists y ∈ I such that z < y ≤ b. Similarly, z > a and a = inf I give us some x ∈ I such that a ≤ x < z. In summary, we found x, y ∈ I such that z ∈ (x, y). Now, the connectedness assumption implies z ∈ I, as desired.
(2) The set I is bounded from above and unbounded from below. In this case, let α denote the supremum of I and note that I ⊆ (−∞, α]. We show that (−∞, α) ⊆ I, from which we can deduce either I = (−∞, α] or I = (−∞, α). Consider some z < α. Then, the assumption α = sup I gives us some y ∈ I with z < y ≤ α. Since I is not bounded from below, we find x ∈ I such that x < z. Thus, we found x, y ∈ I such that z ∈ (x, y), and the connectedness assumption says that z is an element of I.
(3) The set I is bounded from below and unbounded from above. Then I is either (β, +∞) or [β, +∞), where β = inf I.
(4) The set I is unbounded from above and below. Then I is (−∞, +∞), that is, R.
We leave the verification of the last two cases as an exercise.



Theorem 1.58 will be used in the proof of Theorem 3.86, which asserts that the continuous image of an interval is an interval. As we will see in Chapter 3, Theorem 3.86 is in turn used in connection with the continuity of the inverse functions (Theorem 3.88).

The Existence of nth Roots. As another application of the axiom of completeness, we now prove the existence of the nth roots for positive real numbers. Recall that the following theorem is usually applied without proof in calculus, so that one may consider it to be a trivial fact. As our proof shows, this is a highly nontrivial fact indeed!

Theorem 1.59. For every positive real number a and every natural number n there exists a unique positive number b such that $b^n = a$.

Proof. The uniqueness is obvious because when $b_1$ and $b_2$ are such that $0 < b_1 < b_2$, we have $b_1^n < b_2^n$ by Example 1.24(4), so that $b_1^n$ and $b_2^n$ cannot both be equal to a. Next, we note that it is enough to prove the existence part for the case a ≥ 1. For if a < 1, then 1/a > 1, and when b > 0 is such that $b^n = 1/a$, we find that $a = (1/b)^n$.

So assume that a ≥ 1 and consider the set S = {x ∈ R : x ≥ 1, $x^n \le a$}. To complete the proof we show that (1) S is nonempty, (2) S is bounded from above, and (3) the supremum of S is the number we are looking for. To prove (1), note that 1 ∈ S by the assumption a ≥ 1. As for (2), notice that for every x ∈ S the assumption x ≥ 1 shows $x \le x^n \le a$, and hence a is an upper bound for S. To prove (3), let b denote the supremum of S. It is clear that b ≥ 1 > 0. We show that $b^n = a$. To see this, we observe that both $b^n < a$ and $b^n > a$ result in a contradiction.

• $b^n < a$. In this case, we find δ > 0 such that

(1.16)

\[ (b + \delta)^n < a. \]

Since this implies that b + δ ∈ S and b + δ > b = sup S, we get a contradiction that shows $b^n < a$ is not possible. To find the particular δ > 0 for which (1.16) holds, note that by the Binomial Theorem,

\[
\begin{aligned}
(b + \delta)^n &= b^n + \binom{n}{1} b^{n-1} \delta + \binom{n}{2} b^{n-2} \delta^2 + \cdots + \delta^n \\
&\le b^n + M\delta + M\delta^2 + \cdots + M\delta^n,
\end{aligned}
\]

where

\[ M := \max\left\{ \binom{n}{k} b^{n-k} : k = 1, \ldots, n \right\}. \]

If 0 < δ < 1, then $\delta^k \le \delta$ for every k ≥ 1 and we get $(b + \delta)^n \le b^n + nM\delta$. So, if $\delta < (a - b^n)/(nM)$, then $(b + \delta)^n < a$. In summary, if

\[ 0 < \delta < \min\left\{ 1, \frac{a - b^n}{nM} \right\}, \]

then b + δ ∈ S, contradicting the assumption that b = sup S.

• $b^n > a$. In this case we use the binomial theorem to show that for M as above and

\[ 0 < \delta < \min\left\{ 1, \frac{b^n - a}{nM} \right\}, \]

(1.17)

\[ (b - \delta)^n > a. \]

Since b − δ < b, b − δ is not an upper bound for S, so that x ∈ S can be found with b − δ < x. Now, $(b - \delta)^n < x^n \le a$, contradicting (1.17). This shows that $b^n > a$ is not possible either, and completes the proof.

In the statement of Theorem 1.59, a could be equal to 0. We assumed that a is positive because when a = 0, b = 0 is obviously the unique real number satisfying $b^n = a$ for every n.
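The set S from the proof can also be explored numerically. The sketch below is not the argument of the proof: it swaps in bisection, keeping one endpoint inside S and the other an upper bound for S, so that the supremum of S is trapped between them. The name `nth_root_bounds` and the tolerance parameter `tol` are assumptions made for this illustration, and the case a ≥ 1 is assumed, as in the proof.

```python
from fractions import Fraction


def nth_root_bounds(a, n, tol=Fraction(1, 10**6)):
    """Bracket sup S for S = {x >= 1 : x**n <= a}, assuming a >= 1 and n >= 1.

    Returns (lo, hi) with lo in S, hi an upper bound for S, and hi - lo <= tol.
    """
    a = Fraction(a)
    if a < 1 or n < 1:
        raise ValueError("need a >= 1 and a natural number n")
    lo, hi = Fraction(1), a  # 1 is in S, and a is an upper bound for S
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid**n <= a:
            lo = mid  # mid belongs to S
        else:
            hi = mid  # mid**n > a, so mid is an upper bound for S
    return lo, hi


lo, hi = nth_root_bounds(2, 2)
```

Every value computed here is rational, so for a = 2 and n = 2 the loop only ever narrows the bracket; the number b of Theorem 1.59 itself is obtained from the axiom of completeness, not from any finite computation.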


Definition 1.60. If a and b are related as in Theorem 1.59, we say that b is the nth root of a. Instead of $b^n = a$, we sometimes write $b = a^{1/n}$ or $b = \sqrt[n]{a}$. In the particular case n = 2, we omit n and write $b = \sqrt{a}$.

Example 1.61. Find the supremum of the set A = {x ∈ Q : x ≥ 0, $x^2 < 2$}.
Solution. By Example 1.29, $\sqrt{2}$ is an upper bound for A. If γ is a real number less than $\sqrt{2}$, let q be a positive rational number between γ and $\sqrt{2}$. Then $q^2 < 2$, and hence q ∈ A. Since q > γ, this proves that γ is not an upper bound for A, and therefore that sup A = $\sqrt{2}$.

The number system Q is not complete. The set A of the above example is indeed a subset of Q, but its supremum $\sqrt{2}$ is not a rational number by Proposition 1.1. This shows another weakness of Q in comparison with R: although statements similar to the axioms (A-1)–(A-6) and (O-1)–(O-4) are true for rational numbers, we found a set of rational numbers which has upper bounds in Q (2 is a rational upper bound for the set A of the above example), but fails to have a least upper bound in Q.

Rational Powers of Real Numbers. If a is positive and r = m/n with m ∈ Z and n ∈ N is a rational number, we define $a^r$ by

\[ a^r := (a^m)^{1/n}. \]

If p ∈ Z and q ∈ N are such that r is also equal to p/q, then

\[ \left( (a^p)^{1/q} \right)^{qn} = (a^p)^n = a^{pn}, \]

and

\[ \left( (a^m)^{1/n} \right)^{qn} = (a^m)^q = a^{mq}. \]

Since pn = mq, the above equalities show

\[ (a^p)^{1/q} = (a^m)^{1/n}, \]

so that $a^r$ is well defined.

If a < 0 and n ∈ N is even, $a^{1/n}$ has no meaning in the real number system. This is because $a = (a^{1/n})^n$ must be nonnegative (and indeed positive) when a is nonzero and n is even. If n is odd, however, we may define $a^{1/n}$ by

\[ a^{1/n} := -(-a)^{1/n}. \]

Exercise 40 at the end of this chapter shows the way we can define irrational powers of a real number.

Exercise 1.62. Let x be a nonzero real number. Explain why the formula $\sqrt{x^2} = |x|$, which we used to apply in calculus, is valid.


Next, we prove some inequalities which include rational powers and which will be useful in the next chapters. The following inequality will be used in Example 7.5.

Example 1.63. Let $x_1, \ldots, x_n$ be nonnegative real numbers, and let m ∈ N. Prove that

(1.18)

\[ \sqrt[m]{x_1 + \cdots + x_n} \le \sqrt[m]{x_1} + \cdots + \sqrt[m]{x_n}. \]

Solution. If m = 1, then the inequality becomes an equality. If m > 1, we use induction on n. The inequality is obviously true for n = 1, as it becomes an equality in this case. To complete the proof, we first show that the inequality also holds for n = 2. To see this, let $a = \sqrt[m]{x_1}$ and $b = \sqrt[m]{x_2}$. Then, the Binomial Theorem says that

\[ (a + b)^m = a^m + \sum_{k=1}^{m-1} \binom{m}{k} a^{m-k} b^k + b^m \ge a^m + b^m, \]

so that $a + b \ge \sqrt[m]{a^m + b^m}$, or equivalently

(1.19)

\[ \sqrt[m]{x_1 + x_2} \le \sqrt[m]{x_1} + \sqrt[m]{x_2}. \]

Now assume that (1.18) holds. If $x_{n+1}$ is another nonnegative number, then by (1.19),

\[
\begin{aligned}
\sqrt[m]{(x_1 + \cdots + x_n) + x_{n+1}} &\le \sqrt[m]{x_1 + \cdots + x_n} + \sqrt[m]{x_{n+1}} \\
&\le \sqrt[m]{x_1} + \cdots + \sqrt[m]{x_n} + \sqrt[m]{x_{n+1}}.
\end{aligned}
\]

This, in view of the PMI, completes the proof that (1.18) is true.

The following is one of the most important inequalities of mathematical analysis.

Theorem 1.64 (The Cauchy–Schwarz Inequality). If $a_1, \ldots, a_n$ and $b_1, \ldots, b_n$ are arbitrary real numbers, then

(1.20)

\[ \left| \sum_{i=1}^{n} a_i b_i \right| \le \left( \sum_{i=1}^{n} a_i^2 \right)^{1/2} \left( \sum_{i=1}^{n} b_i^2 \right)^{1/2}. \]

Proof. It is enough to consider the case in which both sums on the right-hand side of (1.20) are nonzero, as otherwise the inequality is obviously true. To prove (1.20), we first prove it in the case where

(1.21)

\[ \sum_{i=1}^{n} a_i^2 = \sum_{i=1}^{n} b_i^2 = 1. \]

In this case, (1.20) becomes

(1.22)

\[ \left| \sum_{i=1}^{n} a_i b_i \right| \le 1. \]

Since $|a_i b_i| \le (1/2)(a_i^2 + b_i^2)$ for every i ∈ {1, . . . , n} by Example 1.13, our assumption (1.21) yields

\[ \left| \sum_{i=1}^{n} a_i b_i \right| \le \frac{1}{2} \left( \sum_{i=1}^{n} a_i^2 + \sum_{i=1}^{n} b_i^2 \right) = 1, \]

and (1.22) follows. In the general case, in place of the $a_i$'s and $b_i$'s we consider the numbers

\[ a_i' := \frac{a_i}{\left( \sum_{i=1}^{n} a_i^2 \right)^{1/2}}, \qquad b_i' := \frac{b_i}{\left( \sum_{i=1}^{n} b_i^2 \right)^{1/2}}. \]

Then $\sum_{i=1}^{n} a_i'^2 = \sum_{i=1}^{n} b_i'^2 = 1$, and the above argument proves

\[ \left| \sum_{i=1}^{n} a_i' b_i' \right| \le 1. \]

It is now easy to see that (1.20) follows from this.
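The inequality (1.20) can be spot-checked on concrete numbers. Since both sides of (1.20) are nonnegative, it is equivalent to the squared form $(\sum a_i b_i)^2 \le (\sum a_i^2)(\sum b_i^2)$, which the sketch below tests exactly for integer or rational inputs. The function names are hypothetical, and a finite check of this kind illustrates the theorem without proving anything.

```python
from fractions import Fraction


def cauchy_schwarz_holds(a, b):
    """Check (sum a_i b_i)^2 <= (sum a_i^2)(sum b_i^2), the squared form of (1.20)."""
    if len(a) != len(b):
        raise ValueError("the two lists must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    return dot * dot <= sum(x * x for x in a) * sum(y * y for y in b)


def pointwise_bound_holds(a, b):
    """Check the key step of the proof: |a_i b_i| <= (a_i^2 + b_i^2) / 2 for each i."""
    return all(2 * abs(x * y) <= x * x + y * y for x, y in zip(a, b))


ok_general = cauchy_schwarz_holds([1, 2, 3], [4, 5, 6])
ok_equality = cauchy_schwarz_holds([1, 2], [2, 4])  # proportional lists give equality
ok_rational = cauchy_schwarz_holds([Fraction(1, 2), -3], [Fraction(7, 3), 5])
```

Working with squares avoids floating-point square roots, so the comparison is exact even in the equality case of proportional lists.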

A simple consequence of the Cauchy–Schwarz inequality is Minkowski's inequality, which will be of great importance in the proof of Proposition 1.71(4). It will also be used in Chapter 6 to show that the usual distance function of $R^n$ satisfies the triangle inequality (Example 6.6).

Corollary 1.65 (Minkowski's Inequality). If $a_1, \ldots, a_n$ and $b_1, \ldots, b_n$ are arbitrary real numbers, then

(1.23)

\[ \left( \sum_{i=1}^{n} (a_i + b_i)^2 \right)^{1/2} \le \left( \sum_{i=1}^{n} a_i^2 \right)^{1/2} + \left( \sum_{i=1}^{n} b_i^2 \right)^{1/2}. \]

Proof. First note that

\[
\begin{aligned}
\sum_{i=1}^{n} (a_i + b_i)^2 &= \sum_{i=1}^{n} (a_i^2 + 2 a_i b_i + b_i^2) \\
&\le \sum_{i=1}^{n} a_i^2 + 2 \left( \sum_{i=1}^{n} a_i^2 \right)^{1/2} \left( \sum_{i=1}^{n} b_i^2 \right)^{1/2} + \sum_{i=1}^{n} b_i^2 \\
&= \left( \left( \sum_{i=1}^{n} a_i^2 \right)^{1/2} + \left( \sum_{i=1}^{n} b_i^2 \right)^{1/2} \right)^2,
\end{aligned}
\]

where the inequality follows from the Cauchy–Schwarz inequality. Now (1.23) follows by taking square roots of both sides of the above inequality.

The Extended Real Number System. When a set A ⊆ R is not bounded from above, it fails to have a supremum in R. A similar assertion is true for sets which are not bounded from below. We now introduce the extended real number system R∗ in which any nonempty subset of R has both a supremum and an infimum. This can be obtained by adjoining two new elements −∞ and +∞ to R:

R∗ := R ∪ {−∞, +∞}.

When we want to talk about an element of R∗, we speak of an extended real number. So, an extended real number is either a real number, +∞ or −∞. The extended real number system will be used extensively in the next chapter, where we study subsequences and introduce the concepts of limit superior and limit inferior.

We equip R∗ with an order, again denoted by […]

[…] 0}.
(2) B = (1, 3) ∪ [4, +∞).
Solution. (1) It can be shown that A = (−∞, 1) ∪ (2, +∞); therefore, inf A = −∞ and sup A = +∞ in R∗.
(2) It is easy to see that inf B = 1 (in both R and R∗). Also, we have sup B = +∞ in R∗. The set B has no supremum in R.

Finally, we present some algebraic conventions about R∗. If x ∈ R, then
• x + (+∞) = +∞, x − (+∞) = −∞, x + (−∞) = −∞, and x − (−∞) = +∞;
• x(+∞) = +∞ and x(−∞) = −∞ if x > 0, and x(+∞) = −∞ and x(−∞) = +∞ if x < 0;
• $\frac{x}{+\infty} = \frac{x}{-\infty} = 0$.

As you may remember from calculus, terms like 0 · ∞ and ∞ − ∞ were among the indeterminate forms of the theory of limits. Here, we leave these terms undefined.
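Floating-point arithmetic offers a rough analogue of these conventions. The sketch below assumes Python's IEEE 754 floats, with `math.inf` playing the role of +∞; the function name `conventions_hold` is hypothetical. Floats only approximate R∗, so this is an analogy rather than a model of the extended real number system.

```python
import math


def conventions_hold(x):
    """Check the listed conventions for a nonzero finite float x."""
    inf = math.inf
    sums = (
        x + inf == inf
        and x - inf == -inf
        and x + (-inf) == -inf
        and x - (-inf) == inf
    )
    if x > 0:
        products = x * inf == inf and x * (-inf) == -inf
    else:
        products = x * inf == -inf and x * (-inf) == inf
    quotients = x / inf == 0 and x / (-inf) == 0
    return sums and products and quotients


positive_case = conventions_hold(2.5)
negative_case = conventions_hold(-7.0)

# The indeterminate forms are not given a value either: they produce NaN.
inf_minus_inf = math.inf - math.inf
zero_times_inf = 0 * math.inf
```

In floating-point arithmetic the forms ∞ − ∞ and 0 · ∞ evaluate to NaN ("not a number"), which plays a role comparable to leaving them undefined.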

1.6. The Complex Number System

Although our axioms (A-1)–(A-6), (O-1)–(O-4), and (C) allow us to develop a solid theory for real numbers and their allied concepts studied in the next chapters, the real number system still has an important algebraic deficiency. To understand this, recall that the square of any real number is nonnegative (Exercise 1.11). Thus $x^2 + 1 \ge 1 > 0$ for every x ∈ R, and this shows that the seemingly simple polynomial equation $x^2 + 1 = 0$ has no solution in R!


Since the last equation has no solution in our real numbers, we may think of an imaginary solution for it. If we denote this solution by i, it would satisfy the equality $i^2 = -1$. This imaginary number is our basic tool for the construction of complex numbers.

A complex number can be defined as an expression of the form a + bi in which a and b are real numbers and i is the imaginary unit discussed above. For example, 1 − 2i and $-\sqrt{2} + \pi i$ are complex numbers. If in the general form a + bi of complex numbers we let a = 0 and b = 1, then we see that the imaginary unit i itself is a complex number. Also, by letting b = 0, we find that every real number is a complex number. If we denote the set of all complex numbers by C, then these observations tell us that C is a set of numbers containing R in which the equation $x^2 + 1 = 0$ has a solution.

Algebraic Operations on Complex Numbers. Since complex numbers are defined in terms of real numbers, it is natural to think of their addition, multiplication, and order. In this connection, the definition of addition is quite straightforward: If z = a + bi and w = x + yi, we define the addition of z and w by

(1.24)

z + w := (a + x) + (b + y)i.

If we call a and b the real and imaginary parts of a complex number a + bi, respectively, then (1.24) says that to add two complex numbers together, we just need to add their respective real and imaginary parts. The multiplication of z and w is defined by

(1.25)

zw := (ax − by) + (ay + bx)i.

This definition is motivated by the informal calculation

\[ (a + bi)(x + yi) = ax + ayi + bxi + byi^2, \]

from which we obtain (1.25) by letting $i^2 = -1$ and grouping the real and imaginary parts.

Example 1.67. If z = 1 − 2i and $w = \sqrt{2} + 3i$, then

\[ z + w = (1 + \sqrt{2}) + i, \qquad zw = (\sqrt{2} + 6) + (3 - 2\sqrt{2})i. \]

As we mentioned earlier, every real number is a complex number. In fact, a real number is a complex number whose imaginary part is 0. With this understanding, we find that the formulas (1.24) and (1.25) reduce to the usual addition and multiplication of the real numbers if we set the imaginary parts of z and w to be 0. Also, it can be easily verified that the algebraic axioms (A-1)–(A-6) for real numbers can be used to prove similar assertions for complex numbers. For example, the commutativity axiom (A-2) shows that for complex numbers z and w as above,

z + w = (a + x) + (b + y)i = (x + a) + (y + b)i = w + z,

that is, the addition of complex numbers is commutative. Below, we list the relevant algebraic properties of the complex numbers, and we leave the verification of these to the reader (see Exercise 6 at the end of this chapter).


Proposition 1.68. The following statements are true.
(1) The sum and the product of two complex numbers are again complex numbers.
(2) Addition and multiplication of complex numbers are commutative, meaning that for all z, w ∈ C,
z + w = w + z, zw = wz.
(3) Addition and multiplication of complex numbers are associative, in the sense that for arbitrary z, w and α in C,
z + (w + α) = (z + w) + α, z(wα) = (zw)α.
(4) The complex number 0 (= 0 + 0i) is the unique element of C such that z + 0 = z for every z ∈ C.
(5) The complex number 1 (= 1 + 0i) is the unique element of C such that z1 = z for every z ∈ C.
(6) For every z = x + yi in C, the complex number −z = −x − yi is the unique additive inverse of z. This means that z + (−z) = 0 and −z is the unique complex number with this property.
(7) For every complex number z = x + yi other than 0, the complex number

\[ \frac{1}{z} = \left( \frac{x}{x^2 + y^2} \right) + \left( \frac{-y}{x^2 + y^2} \right) i \]

is the unique multiplicative inverse of z. This means that z(1/z) = 1, and that 1/z is the only element of C with this property.

Complex Numbers: A More Formal Treatment. As we mentioned earlier, i is just an imaginary number. But is it justifiable to think of an imaginary object in mathematics, a discipline in which precision is everything? Certainly we should find some way to give a precise meaning to i. Fortunately, this can easily be done by using another approach to the definition of complex numbers. To avoid the use of i in the definition of complex numbers, we may define a complex number equivalently (and formally) as an ordered pair of real numbers.

Definition 1.69. A complex number is an ordered pair (a, b) in which a and b are real numbers.

With this definition, the real numbers can be identified with the complex numbers whose second component is equal to 0. More precisely, we identify the pair (a, 0) with the real number a. The addition and multiplication of complex numbers in this sense can be defined as

(1.26)

(a, b) + (x, y) = (a + x, b + y), (a, b)(x, y) = (ax − by, ay + bx).

Note that these are motivated by (1.24) and (1.25). With this approach to the definition of the complex numbers and their addition and multiplication, we can prove the existence of a complex number whose square is equal to −1.

Proposition 1.70. A complex number i exists such that $i^2 = -1$.


Proof. We claim that i := (0, 1) is the desired complex number. To see this we note that by the above definition of multiplication of complex numbers, $i^2 = (0, 1)(0, 1) = (-1, 0) = -1$.
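The ordered-pair operations (1.26) translate directly into code. In the sketch below, complex numbers are modelled as Python tuples of two numbers; the function names `add` and `mul` are hypothetical, chosen only for this illustration.

```python
def add(z, w):
    """Addition of ordered pairs as in (1.26)."""
    (a, b), (x, y) = z, w
    return (a + x, b + y)


def mul(z, w):
    """Multiplication of ordered pairs as in (1.26)."""
    (a, b), (x, y) = z, w
    return (a * x - b * y, a * y + b * x)


i = (0, 1)
i_squared = mul(i, i)  # the pair identified with the real number -1

# Recovering the representation x + yi from the pair (x, y):
x, y = 3, 4
z = add((x, 0), mul((y, 0), i))
```

Python's built-in `complex` type implements the same two operations, so `complex(a, b) * complex(x, y)` can serve as an independent check of `mul` on small integer inputs.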



Using the complex number i whose existence is now established rigorously, we are able to retrieve the previously used representation of complex numbers. In fact, if z = (x, y) is any complex number, then

z = (x, y) = (x, 0) + (0, y) = (x, 0) + (y, 0)(0, 1) = x + yi.

What is the difference between C and R²? Since we defined a complex number formally as an ordered pair of real numbers, the set C of all complex numbers is nothing but R², the Cartesian product of R by itself. The difference between C and R², however, is in the algebraic operations we consider on them. In fact, the algebraic operations of C are the addition and multiplication defined in (1.26), while those of R² are the operations of addition and scalar multiplication defined by

(a, b) + (x, y) = (a + x, b + y), α(a, b) = (αa, αb),

where α is a real number.

Distance in the Complex Plane. Since we defined complex numbers formally as ordered pairs of real numbers, it is natural to represent them geometrically as the points of a plane in which the Cartesian (or orthogonal) coordinate system is established. Figure 6 below illustrates the way we may represent a complex number in the plane. To emphasize that the plane is used to represent the complex numbers, we call it the complex plane. With this understanding, we refer to the x- and y-axes as the real and imaginary axes, respectively. This geometric interpretation of complex numbers allows us to think of the distance between two complex numbers z and w as the distance between their representative points in the plane. In fact, let z = a + bi and w = x + yi be arbitrary complex numbers. The distance between the points that represent z and w in the plane is the length of the line segment that joins them together. As can

Figure 6. The complex number is represented by the point (a, b) in the plane.

1.6. The Complex Number System

37

Figure 7. The length of the hypotenuse is the distance between z and w.

be seen in Figure 7, this line segment is the hypotenuse of the right-angled triangle whose sides are of length |x − a| and |b − y|. The distance between z and w now can be obtained as  dC (z, w) = (x − a)2 + (y − b)2 by the Pythagorean theorem. It is interesting that the distance function dC extends the Euclidean distance function de . To see this, note that when z and w are real numbers, which is the case where b = y = 0, then  dC (z, w) = (x − a)2 = |x − a| = de (x, a). What do C and R2 have in common? As we mentioned above, C and R2 are different from an algebraic point of view. Nevertheless, we represent the elements of C and R2 geometrically in the same way. The notion of distance in R2 is therefore quite similar to that in C: given elements (a, b) and (x, y) of R2 , we define their distance by  (1.27) d2e ((a, b), (x, y)) = (x − a)2 + (y − b)2 . It is clear that the right-hand side of (1.27) is equal to dC (a + bi, x + yi). d2e

satisfies properties similar to those of dC presented in For this reason Proposition 1.71 below. We call d2e the Euclidean distance function on R2 . Since dC is an extension of de , we expect to see that it has properties similar to those of de presented in Proposition 1.17. The following proposition shows that this is actually true. Proposition 1.71. The following statements are true. (1) For all z, w ∈ C, dC (z, w) ≥ 0. (2) The distance dC (z, w) is 0 if and only if z = w. (3) For all z, w ∈ C, dC (z, w) = dC (w, z). (4) For all z, w, and α in C, dC (z, w) ≤ dC (z, α) + dC (α, w).
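Before reading the proof, the statements of Proposition 1.71 can be spot-checked numerically. The sketch below is only an illustration on a small, arbitrarily chosen sample of points, not a proof; the helper names and the rounding tolerance `tol` are assumptions introduced here.

```python
import itertools
import math

def d_C(z, w):
    """Distance between z = (a, b) and w = (x, y), read as a + bi and x + yi."""
    (a, b), (x, y) = z, w
    return math.hypot(x - a, y - b)

def d_e(s, t):
    """Euclidean distance on the real line."""
    return abs(s - t)

# An arbitrary sample of points (an assumption made for illustration).
points = [(0, 0), (3, 4), (-1, 2), (5, -7), (2, 2)]

def triangle_inequality_holds(pts, tol=1e-12):
    """Check item (4) for all triples; tol absorbs floating-point rounding."""
    return all(
        d_C(z, w) <= d_C(z, alpha) + d_C(alpha, w) + tol
        for z, w, alpha in itertools.product(pts, repeat=3)
    )

print(d_C((0, 0), (3, 4)))                 # 5.0
print(triangle_inequality_holds(points))   # True
print(d_C((2, 0), (-5, 0)) == d_e(2, -5))  # True: d_C extends d_e
```

A finite sample can only fail to find a counterexample; the general statement (4) still rests on Minkowski's inequality.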


Proof. We prove item (4), which is the only nontrivial part. To do so, let $z = a + bi$, $w = x + yi$, and $\alpha = c + di$. Then,
$$\begin{aligned}
d_{\mathbb{C}}(z, w) &= \left((x - a)^2 + (y - b)^2\right)^{1/2} \\
&= \left((x - c + c - a)^2 + (y - d + d - b)^2\right)^{1/2} \\
&\le \left((x - c)^2 + (y - d)^2\right)^{1/2} + \left((c - a)^2 + (d - b)^2\right)^{1/2} \\
&= d_{\mathbb{C}}(\alpha, w) + d_{\mathbb{C}}(z, \alpha),
\end{aligned}$$
where the inequality follows from Minkowski's inequality (Theorem 1.65). This proves (4). $\square$

Finding a common theme: The key to abstraction. Great mathematicians are able to notice which things two concrete theories have in common. This is a particularly important ability for those who want to become pure mathematicians. Comparing the concrete theories of real and complex numbers and noticing what they have in common, we will be able to develop various abstract theories by a process which is known as abstraction. We will talk more about abstract theories in "Notes on Essence and Generalizability" for this chapter. For now, we note that the existence of the distance functions $d_e$ and $d_{\mathbb{C}}$, which satisfy similar properties by Propositions 1.17 and 1.71, is a common theme in the concrete theories of real and complex numbers. We will see in Chapter 6 that this allows us to establish an abstract theory, known as the theory of metric spaces.

Since the Euclidean distance function $d_e$ is defined in terms of the absolute value, we are led to think of an extension of the absolute value to the complex number setting that allows us to express $d_{\mathbb{C}}$ in terms of it. This way of thinking results in the concept of modulus for complex numbers.

Definition 1.72. The modulus of a complex number $z = a + bi$ is defined to be
$$|z| = \sqrt{a^2 + b^2}. \tag{1.28}$$

It is clear that $|z| = d_{\mathbb{C}}(z, 0)$ and, more generally, $|z - w| = d_{\mathbb{C}}(z, w)$ for all $z, w \in \mathbb{C}$. Again, the concept of modulus extends the absolute value of real numbers. This can be seen by letting $b = 0$ in (1.28), which yields $z = a$ and $|z| = |a|$. The basic properties of the modulus can be found in Exercises 44 and 45 at the end of this chapter.

Complex Numbers and Order. We observed that the set $\mathbb{C}$ of all complex numbers may be equipped with algebraic operations that satisfy properties similar to the algebraic axioms (A-1)–(A-6) for real numbers. But is it possible to equip $\mathbb{C}$ with an order relation $<$ with properties similar to (O-1)–(O-4)?


Unfortunately, the answer is no. To understand why, suppose that $<$ is a relation on $\mathbb{C}$ for which the following are true.
(1) For complex numbers $z$ and $w$, one and only one of the relations $z < w$, $z = w$, and $w < z$ holds.
(2) The relations $z < w$ and $w < \alpha$ imply $z < \alpha$ for all $z$, $w$, and $\alpha$ in $\mathbb{C}$.
(3) If $z < w$, then $z + \alpha < w + \alpha$ for every $\alpha \in \mathbb{C}$.
(4) If $z > 0$ and $w > 0$, then $zw > 0$.

Then, it can be proved in a similar way to the case of real numbers that the following are true.
(5) For every $z \neq 0$ in $\mathbb{C}$, $z^2 > 0$.
(6) If $z \neq 0$ is any complex number, then the relations $z > 0$ and $-z > 0$ cannot hold simultaneously. More precisely, $z > 0$ if and only if $-z < 0$.

Finally, statements (5) and (6) above can be used to derive a contradiction. Indeed, since $i \neq 0$, (5) tells us that
$$-1 = i^2 > 0, \tag{1.29}$$
but then, since $-1 \neq 0$, (5) also yields
$$1 = (-1)^2 > 0. \tag{1.30}$$

It is now clear that (1.29) and (1.30) contradict (6). Thus, in summary, we find that no relation $<$ can be established on $\mathbb{C}$ for which the properties (1)–(4) above hold. Exercise 43 at the end of this chapter shows, however, that a relation $<$ can be defined on $\mathbb{C}$ for which (1) and (2) above are true.

Final Remarks on Complex Numbers. We will not pursue the study of complex numbers in the remainder of this book. More information on complex numbers can be found in Exercises 43–46 at the end of this chapter. We introduced complex numbers just to show that the real number system is still extendable to a larger number system with better algebraic properties. More precisely, a remarkable result known as the fundamental theorem of algebra, due to the great mathematician Gauss, asserts that every polynomial of degree $n$ with complex coefficients has $n$ (not necessarily distinct) roots in $\mathbb{C}$. For example, the polynomial equation $z^2 + 1 = 0$ has two solutions in $\mathbb{C}$, which are $i$ and $-i$. This shows that $\mathbb{C}$ is algebraically more complete than $\mathbb{R}$. On the other hand, we observed that if we want to equip $\mathbb{C}$ with an order, then the resulting relation fails to be consistent with the algebraic structure of $\mathbb{C}$.

Complex numbers and their allied concepts and methods are usually studied within a course entitled "Complex Analysis", "Complex Variables", or something similar. We refer the reader to [8], which is an excellent resource for such a course, for further study of the complex numbers.
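Returning to the question of order, the lexicographic relation of Exercise 43 gives a concrete view of how an order on $\mathbb{C}$ can satisfy (1) and (2) and still clash with multiplication. The sketch below is an illustration under two assumptions: complex numbers are stored as pairs, with the standard product $(a, b)(x, y) = (ax - by, ay + bx)$, and Python's built-in tuple comparison is used because it is lexicographic. The names `mul` and `less` are hypothetical.

```python
def mul(z, w):
    """(a, b)(x, y) = (ax - by, ay + bx)."""
    a, b = z
    x, y = w
    return (a * x - b * y, a * y + b * x)

def less(z, w):
    """Lexicographic order: z < w if a < x, or if a = x and b < y."""
    return z < w  # Python compares tuples lexicographically

zero = (0, 0)
i = (0, 1)

print(less(zero, i))          # True:  i > 0 in the lexicographic order
print(less(zero, mul(i, i)))  # False: i * i = (-1, 0) is not > 0
```

So in this order $i > 0$ while $i \cdot i = -1 < 0$, which is exactly a failure of property (4).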

Notes on Essence and Generalizability

In this chapter we studied real numbers using an axiomatic approach. We saw that all the known properties of real numbers can be deduced from a short list of real number axioms. These were the axioms (A-1)–(A-6), (O-1)–(O-4), and (C). The principle of mathematical induction was also used to prove several important real number inequalities and identities.

The material presented in this chapter is intended to complete our understanding of real numbers developed in calculus and more elementary mathematics. No generalizations appeared here. Besides providing the necessary tools for the proofs of what we already knew from calculus or more elementary mathematics, we included some new concepts and tools briefly discussed below.

(1) The concepts of supremum and infimum are not among what one learns in calculus. Most students know about maxima and minima before taking a course on elementary analysis. But these are not enough if we want to rigorously study what is presented in calculus. Analysis fills in this gap by introducing suprema and infima, and by giving criteria for their existence. As we will see in the next chapters, suprema and infima are indispensable tools in many situations. It is not possible to generalize these notions to the context of arbitrary sets, because their definition requires a partial ordering, something like the usual order $\le$ of $\mathbb{R}$. Of course, when a set is equipped with a partial order, defining suprema and infima is possible. See [30] for more on partial orderings and generalizations of suprema and infima.

(2) The notion of distance in the real line is widely used in calculus. Instances include the $\varepsilon$–$N$ definition of convergence for real sequences, and the $\varepsilon$–$\delta$ definition of limit for real functions. Nevertheless, in calculus the emphasis is more on the calculation of limits, and such precise definitions and rigorous proofs are not considered of much importance. In the next two chapters we will use the Euclidean distance function $d_e$ to present a solid theory for real sequences and functions, respectively.

Next, in the second part of the book, we use the important properties of $d_e$ collected in Proposition 1.17 as our cue for the development of an abstract theory, the theory of metric spaces. A metric space is nothing but a nonempty set $X$ together with a distance function $d : X \times X \to \mathbb{R}$ which satisfies properties similar to those of $d_e$ presented in Proposition 1.17. Abstract theories are those developed in just such a general context and which may include many particular cases as their examples; in their general form they may be developed out of concrete applications. The process of developing an abstract theory is known as abstraction. The cornerstones of abstract theories are abstract concepts, which are obtained from concrete concepts by generalization. For instance, the concept of distance in the real line is a concrete concept, as we can use it in our daily experience, for example in measuring the length of a rod of iron. When we define a distance function $d$ on some set $X$ with properties similar to those of $d_e$, we are indeed generalizing the notion of distance from $\mathbb{R}$ to arbitrary sets, and the distance function is, in its general form, an abstract concept.

(3) The Archimedean property and the density of the rational numbers in $\mathbb{R}$ are also new aspects of the real number theory for most students. These will be used throughout the book. Although the density of $\mathbb{Q}$ in $\mathbb{R}$ is proved using the

Exercises


order 0 there exists x ∈ A such that x > a − ε. Then, state and prove a similar assertion for lower bounds and infima. 25. Find the supremum and  infimum of the given sets. m (a) m+n : m, n ∈ N    n(n+1)  2 + n3 : n ∈ N (b) 2(−1)n+1 + (−1) 2   2 :n∈N (c) (n+1) 2n   n : n ∈ N (d) 1 − (−1) n


26. Assume that $A$ and $B$ are subsets of $\mathbb{R}$, $A \subseteq B$, and $B$ is bounded. Prove that $\sup A \le \sup B$ and $\inf A \ge \inf B$.
27. Let $A$ be a nonempty subset of $\mathbb{R}$, and define $-A = \{-x : x \in A\}$. Prove that $A$ is bounded from above if and only if $-A$ is bounded from below. Show that in this case $\sup A = -\inf(-A)$.
28. Let $A$ and $B$ be nonempty subsets of $\mathbb{R}$ which are bounded from above. Prove that $\sup(A \cup B) = \max\{\sup A, \sup B\}$. What is the appropriate formula for $\inf(A \cup B)$? If $A + B = \{a + b : a \in A, b \in B\}$, then show that $A + B$ is also bounded from above and $\sup(A + B) = \sup A + \sup B$.
29. Let $A$ and $B$ be nonempty subsets of $\mathbb{R}$ such that for every $a \in A$ and every $b \in B$, $a \le b$. Show that if $B$ is bounded from above, then so is $A$ and $\sup A \le \sup B$.
30. If $A$ and $B$ are nonempty sets of positive real numbers which are bounded from above, and if $AB = \{ab : a \in A, b \in B\}$, then $AB$ is also bounded from above and $\sup(AB) = \sup A \, \sup B$.
31. Suppose that $A$ is a nonempty set of positive real numbers which is bounded from above and $n \in \mathbb{N}$. Let $A^{(n)} = \{x^n : x \in A\}$. Prove that $A^{(n)}$ is also bounded from above and that $\sup A^{(n)} = (\sup A)^n$.
32. Let $x$ be a positive irrational number, and let $A = \{m + nx : m, n \in \mathbb{Z} \text{ and } m + nx > 0\}$. Prove that $\inf A = 0$.
33. Prove that any nonempty set of integers which is bounded from above contains a greatest element.
34. Let $a$ be a positive real number. Prove that for every real number $x$, there exists an integer $n$ such that $na \le x < (n + 1)a$.
35. Prove that between any two real numbers an irrational number can be found.
36. If $p$ is a prime number, prove that $\sqrt{p}$ is not a rational number.
37. Show that for every irrational number $x$, an irrational number $y$ can be found such that $xy$ is rational.
38. Show that for no rational number $q$, $2^q = 3$ can be true.
39. Show that for every natural number $n$, $\sqrt{n + 1} + \sqrt{n - 1}$ is an irrational number.


40. Suppose $b > 1$ is a fixed real number. For every $x \in \mathbb{R}$ consider the set $B_x = \{b^t : t \in \mathbb{Q}, t \le x\}$. Prove that when $r$ is a rational number, $b^r = \sup B_r$. This allows us to define $b^x$ for arbitrary $x \in \mathbb{R}$ by $b^x := \sup B_x$. Use this definition to show that for arbitrary real numbers $x$ and $y$, $b^{x+y} = b^x b^y$.
41. Let $x$ and $y$ be irrational numbers such that $x - y$ is also irrational. Define $A = \{x + r : r \in \mathbb{Q}\}$, $B = \{y + r : r \in \mathbb{Q}\}$. Show that $A$ and $B$ are disjoint sets.
42. Let $I$ and $J$ be intervals such that $I \cap J \neq \emptyset$. If $I \cap J$ has more than one element, show that it is also an interval.
43. Define a relation $<$ on $\mathbb{C}$ as follows. For $z = a + bi$ and $w = x + yi$, write $z < w$ if $a < x$ or if $a = x$ and $b < y$. Prove that the following are true.
(a) For arbitrary complex numbers $z$ and $w$ exactly one of the relations $z < w$, $z = w$, and $w < z$ holds.
(b) If $z$, $w$, and $\alpha$ are complex numbers satisfying $z < w$ and $w < \alpha$, then $z < \alpha$.
The relation $<$ is called, for obvious reasons, the lexicographic order of $\mathbb{C}$.
44. Prove that the following are true for arbitrary complex numbers $z$, $w$, and $\alpha$.
(a) The modulus $|z|$ is a nonnegative real number and $|z| = 0$ if and only if $z = 0$.
(b) $|zw| = |z| \, |w|$.
(c) $|z + w| \le |z| + |w|$.
45. The conjugate of a complex number $z = a + bi$ is, by definition, the complex number $\bar{z} = a - bi$.
(a) Observe that the points that represent $z$ and $\bar{z}$ are symmetric with respect to the real axis.
(b) If $z$ is a real number, observe that $\bar{z} = z$.
(c) Verify that for complex numbers $z$ and $w$, $\overline{z + w} = \bar{z} + \bar{w}$ and $\overline{zw} = \bar{z} \, \bar{w}$.
(d) Verify that for every $z \in \mathbb{C}$, $|\bar{z}| = |z|$ and $|z|^2 = z\bar{z}$.
46. Prove the complex version of the Cauchy–Schwarz inequality stated as follows. For arbitrary complex numbers $z_1, \ldots, z_n$ and $w_1, \ldots, w_n$,
$$\left| \sum_{i=1}^{n} z_i w_i \right| \le \left( \sum_{i=1}^{n} |z_i|^2 \right)^{1/2} \left( \sum_{i=1}^{n} |w_i|^2 \right)^{1/2}.$$

Chapter 2

Sequences and Series of Real Numbers

In the previous chapter we studied real numbers and their properties in detail. Among what we discussed was the Euclidean distance function $d_e$, introduced in Section 1.3. As we mentioned before, this notion of distance plays a crucial role in our analysis of real numbers. As a first manifestation of this fact, we show the way the distance can be used in the theory of real sequences and series.

Sequences and series of real numbers appear in almost every calculus course. In this chapter we develop a solid theory for sequences and series and explain the way analysis helps us to remove the shortcomings of the theory one learns in calculus. To this end, we have to present the theory from its very beginning. This will help you to rebuild the building of sequences and series in your mind. To motivate the presentation, let us pose some questions.

(2.a) Why should we care about real sequences?
(2.b) What are the possible reasons for the divergence of a sequence?
(2.c) Does every sequence have a monotone subsequence? Does every sequence have a convergent subsequence?
(2.d) Is there a limit-free formulation of convergence? More precisely, can we prove that a sequence is convergent without knowing, or even guessing, the value of its limit?
(2.e) Why do we study infinite series?
(2.f) What happens if we change the order in which the terms of a given series appear? Does the new series have the same convergence situation as the original one?

These are some of the questions we will answer in this chapter. Question (2.a) will be answered in Section 2.1, where we show the necessity of considering sequences on the basis of what we learned in Chapter 1. To answer question (2.b), we first find a necessary condition for convergence. This is boundedness: A sequence is bounded when its range (the set of all its terms) is a bounded subset of $\mathbb{R}$. With this definition, we then observe that any convergent sequence is bounded. A partial answer to (2.b) is obtained by the contrapositive law: Unboundedness is a possible reason for divergence. This is just a partial answer because, as we will see shortly, some bounded sequences are also divergent. This motivates us to pursue our study of divergent sequences in Section 2.2.

The second section begins with an important question: What makes a bounded sequence divergent? To answer this question we study subsequences and the way they can be used to determine the convergence or divergence of a sequence. Our studies motivate the notions of limit superior and limit inferior. These concepts and their applications in the theory of sequences and series provide a good instance of the way analysis strengthens calculus. The questions posed in (2.c) will also be answered in this section. In fact, we will see that the first question can be answered in the affirmative, while the second one has a negative answer. It will be proved, however, that every bounded sequence of real numbers has a convergent subsequence. This result is known as the Bolzano–Weierstrass theorem.

We will answer question (2.d) in Section 2.3. To do so, we introduce Cauchy sequences and show that a real sequence is convergent if and only if it is Cauchy. Here the so-called Cauchy's condition, which determines Cauchy sequences, depends only on the terms of the sequence under study and has nothing to do with the limit. Thus it is the limit-free formulation of convergence we asked for in (2.d). The third section is followed by the very short, yet important, Section 2.4. This section contains some simple results about sequences in closed and bounded intervals.
The basic theory of infinite series of real numbers will be developed in Section 2.5, where we answer question (2.e) using an argument that justifies the mathematical meaning of a series. This section also includes some convergence tests, the most important of which are the strengthened versions of the ratio and root tests one learns in calculus. These last tests use the notions of limit superior and limit inferior, introduced in the second section, to strengthen their calculus versions. We answer the questions posed in (2.f ) in Section 2.6, which is devoted to a study of the rearrangements of series. It will be shown that every rearrangement of an absolutely convergent series converges to the same value as the original series, and that this is not the case for conditionally convergent series. Finally in Section 2.7, we briefly discuss power series. The material presented in this section provides the necessary background for the study of Taylor series in Chapter 4.

2.1. Real Sequences, Their Convergence, and Boundedness

Intuitively, a sequence in some nonempty set $X$ is an ordered list of the elements of $X$, something like $x_1, x_2, x_3, \ldots$. For example, arranging even natural numbers such as $2, 4, 6, \ldots$ allows us to think of the sequence of even numbers in $\mathbb{N}$. Formally, a sequence in $X$ is a function from $\mathbb{N}$ into $X$. For instance, if we define a function $x$ on $\mathbb{N}$ by $x(n) = 2n$, then the range of $x$ is the subset of $\mathbb{N}$ whose only elements are even numbers. So, according to our intuitive understanding of the even numbers as a sequence in $\mathbb{N}$, we may think of $x$, or its range $\{x(n) : n \in \mathbb{N}\}$, as a sequence.

Now assume that $x$ is a function from $\mathbb{N}$ into $X$, that is, a sequence in $X$. We use in place of $x(m)$ the simpler notation $x_m$ and call this the $m$th term of the sequence $\{x_n\}$. In the first part of this book we are only concerned with real sequences, that is, ones whose range is a subset of $\mathbb{R}$. For this reason, by a sequence we always mean a real sequence.

To specify a sequence we may either
(1) present a formula for its $n$th term;
(2) list a few of its first terms so that the rule defining the $n$th term can be easily deduced; or
(3) define it by recursion.
Defining a sequence recursively means presenting its first term or first few terms, and then giving a formula that allows one to compute a term using its preceding one(s). For example,
$$x_n = n!, \qquad x_n : 1, 2!, 3!, 4!, \ldots \qquad \text{and} \qquad x_1 = 1, \; x_{n+1} = (n + 1)x_n \text{ for every } n \in \mathbb{N}$$
all specify the same sequence.

Example 2.1. In each case find a formula that determines the $n$th term of the given sequence.
(1) $x_n : -1, 1, -1, 1, \ldots$;
(2) $y_n : -2, 5/2, -10/3, 17/4, \ldots$.

Solution. (1) If we pay no attention to the signs, all terms are equal to $1$. So, to create the alternating signs, we may write $x_n = (-1)^n$.
(2) We can write $-2 = -(1 + (1/1))$, $5/2 = 2 + (1/2)$, $-10/3 = -(3 + (1/3))$, and $17/4 = 4 + (1/4)$. Hence, $y_n = (-1)^n (n + (1/n))$ is the right formula.

Example 2.2. Find the first five terms of the following recursively defined sequences.
(1) $a_1 = 1$, $a_{n+1} = 3a_n$; $n \in \mathbb{N}$.
(2) $b_1 = 3$, $b_2 = 5$, $b_{n+2} = b_n + b_{n+1}$; $n \in \mathbb{N}$.

Solution. (1) We know that $a_1 = 1$. So $a_2 = 3a_1 = 3$, $a_3 = 3a_2 = 9$, $a_4 = 3a_3 = 27$, and $a_5 = 3a_4 = 81$. It is easy to find a formula for the $n$th term of this sequence. This is $a_n = 3^{n-1}$.
(2) Based on the recursive formula, $b_3 = 8$, $b_4 = 13$, and $b_5 = 21$.

Motivation: Why Do We Need Sequences?
Before proceeding to the theory of sequences, it is appropriate to answer question (2.a). Why should we care about real sequences? What makes us eager to learn so much about them? Perhaps you learned in your calculus courses that sequences can be used in the natural sciences. An instance is the recursively defined Fibonacci sequence
$$x_1 = 1, \quad x_2 = 1, \quad x_n = x_{n-1} + x_{n-2}; \quad n \ge 3,$$
which is important in biology. But we are not going to use such applications to motivate our argument. This is because we are studying analysis, a discipline which requires a considerable amount of mathematical insight. So, it is better to find motivations within mathematics itself.

To begin with, consider a nonempty set $A \subset \mathbb{R}$ which is bounded from above, and let $x$ be the supremum of $A$, which exists as a real number by the axiom of completeness. Then for every $n \in \mathbb{N}$, $x - 1/n$ is not an upper bound for $A$, and this gives us an element of $A$, say $x_n$, which is greater than $x - 1/n$. Since $x_n \le x$, we find that $|x_n - x| < 1/n$ for every $n \in \mathbb{N}$. Now, let $\varepsilon > 0$ be given. By the Archimedean property we find $N \in \mathbb{N}$ such that $1/N < \varepsilon$. Then for every $n \ge N$,
$$|x_n - x| < 1/n \le 1/N < \varepsilon.$$
In summary, given a set $A \subset \mathbb{R}$ with supremum $x$, we can find a sequence $\{x_n\}$ in $A$ for which the following statement is true.

(SC) For every $\varepsilon > 0$ there exists $N \in \mathbb{N}$ such that $|x_n - x| < \varepsilon$ for all $n \ge N$.

(Here SC is used as the abbreviation of sequential convergence.)

What does (SC) say? As we saw in Chapter 1, $d_e(x, y) = |x - y|$ is the distance between the points that represent $x$ and $y$ on the real line. Thus, the inequality $|x_n - x| < \varepsilon$ says that the distance between $x_n$ and $x$ is less than $\varepsilon$. Since $\varepsilon > 0$ is arbitrary and $N$ is chosen in accordance with $\varepsilon$, this way of thinking allows us to interpret the statement (SC) as follows.

The terms $x_n$ will be as close to $x$ as we wish, provided that $n$ is sufficiently large.

The truth of (SC) shows that the terms $x_n$ of the sequence are gathering around $x$. In fact, when (SC) is true, in the terminology of Chapter 1 every $\varepsilon$-neighborhood of $x$ contains all terms of $\{x_n\}$ except perhaps a finite number of them.

Definition 2.3. If (SC) is true for a sequence $\{x_n\}$ and some $x \in \mathbb{R}$, we say that $\{x_n\}$ converges to $x$.

With this terminology, our above argument can be summed up into the following important result.

Theorem 2.4. If $x$ is the supremum of a set $A \subset \mathbb{R}$, then there exists a sequence $\{x_n\}$ in $A$ that converges to $x$.
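The argument leading to Theorem 2.4 is constructive once a concrete set is fixed, so it can be traced in code. The following is a minimal sketch under assumptions chosen only for illustration: the set is $A = [0, 1)$ with supremum $1$, the chosen elements are $x_n = 1 - 1/(n+1)$ (which satisfy $x_n > 1 - 1/n$), and `find_N` is a hypothetical helper that produces an $N$ with $1/N < \varepsilon$, as the Archimedean property guarantees. Exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction

SUP = 1  # supremum of the illustrative set A = [0, 1)

def x(n):
    """An element of A that is greater than SUP - 1/n."""
    return 1 - Fraction(1, n + 1)

def find_N(eps):
    """Return a natural number N with 1/N < eps (eps a positive Fraction)."""
    return int(1 / eps) + 1

eps = Fraction(1, 100)
N = find_N(eps)
print(N)  # 101

# (SC) on a finite window: |x_n - SUP| < eps for n = N, ..., N + 999.
print(all(abs(x(n) - SUP) < eps for n in range(N, N + 1000)))  # True
```

The finite window is of course only a check; the inequality $|x_n - x| < 1/n \le 1/N < \varepsilon$ from the text covers every $n \ge N$.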


What does Theorem 2.4 say? Theorem 2.4 shows why sequences are so important in analysis. As we remember from Chapter 1, suprema and infima constitute a crucial part of the theory of real numbers, and the above theorem ties suprema to sequences. In this way, analysis strengthens our knowledge of sequences by demonstrating their relevance to least upper bounds. Also, Theorem 2.4 tells us, in view of the above interpretation of (SC), that the supremum of a set in some sense adheres to the set and cannot be far from its elements.

Example 2.5. Recall from Example 1.43(1) that $\sup(0, 1) = 1$. Find a sequence in $(0, 1)$ that converges to $1$.

Solution. We claim that $\{1 - 1/(2n)\}$ is such a sequence. First, it is clear that $0 < 1 - 1/(2n) < 1$ for every $n$, so that this is a sequence in $(0, 1)$. Next, if $\varepsilon > 0$ is given, we may find $N \in \mathbb{N}$ such that $1/N < 2\varepsilon$. Then, for every $n \ge N$,
$$\left| \left(1 - \frac{1}{2n}\right) - 1 \right| = \frac{1}{2n} \le \frac{1}{2N} < \varepsilon.$$
This proves that $\{1 - 1/(2n)\}$ converges to $1$.

Of course, Theorem 2.4 tells nothing about the uniqueness of the sequence that converges to the supremum, as the sequence is not unique in general. For example, $\{1 - 1/(3n)\}$ is another sequence in $(0, 1)$ which converges to $1$.

Exercise 2.6. In each case find the supremum of the given set, then find a sequence in the set that converges to the supremum.
(1) $A = [-1, 0]$.
(2) $B = \{-1, 1\}$.

Exercise 2.7. Let $x$ be the infimum of a set $A \subset \mathbb{R}$ which is bounded from below. Prove that there exists a sequence $\{x_n\}$ in $A$ that converges to $x$.

Now, let us consider another situation in which sequences appear naturally. As we saw in Theorem 1.49, between any two real numbers a rational number can be found. So if $x \in \mathbb{R}$ is arbitrary, then for every $n \in \mathbb{N}$ a rational number $p_n$ can be found such that $x < p_n < x + 1/n$. Now if $\varepsilon > 0$ is given, by choosing $N \in \mathbb{N}$ with $1/N < \varepsilon$, we see that $|p_n - x| < 1/n \le 1/N < \varepsilon$ for every $n \ge N$, that is, $\{p_n\}$ converges to $x$. Therefore, we proved the following interesting result.

Proposition 2.8. If $x$ is any real number, there is a sequence of rational numbers that converges to $x$.

Of course when $x$ is rational, $\{x - 1/(2n)\}$ is a sequence of rational numbers that converges to $x$. This can be proved in the same way as we showed the convergence of $\{1 - 1/(2n)\}$ to $1$ in Example 2.5. The following example provides an illustration when $x$ is irrational.


Example 2.9. Prove that $\left\{ [2^n \sqrt{2}]/2^n \right\}$ converges to $\sqrt{2}$.

Solution. It is clear that this is a sequence of rational numbers. Since $[x] \le x < 1 + [x]$ for each $x \in \mathbb{R}$, we observe that for every $n \in \mathbb{N}$,
$$\left| \frac{[2^n \sqrt{2}]}{2^n} - \sqrt{2} \right| = \sqrt{2} - \frac{[2^n \sqrt{2}]}{2^n} < \frac{1}{2^n}.$$
Hence, if we choose $N > \log_2 (1/\varepsilon)$, then $\left| [2^n \sqrt{2}]/2^n - \sqrt{2} \right| \le 1/2^N < \varepsilon$ for every $n \ge N$. This proves the desired result.

Exercise 2.10. Determine those real numbers $x$ for which the sequence $\{[2^n x]/2^n\}$ converges to $x$.

Exercise 2.11. Show that for every real number $x$, there is a sequence of irrational numbers which converges to $x$.

More on Convergence: Basic Facts and Examples. We are now ready to start a careful study of sequences. When $\{x_n\}$ converges to $x$, we say that $x$ is the limit of $\{x_n\}$ and we write $\lim_{n \to \infty} x_n = x$. The following proposition describes the reason we used "the limit" instead of "a limit".

Proposition 2.12. If $\{x_n\}$ converges to $x$ and $y$, then $x = y$.

Proof. To prove $x = y$, it is enough to show that $|x - y| < \varepsilon$ for every $\varepsilon > 0$. This is because $0$ is the only nonnegative real number which is less than every positive number. So let $\varepsilon > 0$ be given. Find $N_1$ and $N_2$ in $\mathbb{N}$ such that $|x_n - x| < \varepsilon/2$ and $|x_n - y| < \varepsilon/2$ for every $n \ge N_1$ and $n \ge N_2$, respectively. Now, if $N = \max\{N_1, N_2\}$, then by the triangle inequality
$$|x - y| = |x - x_N + x_N - y| \le |x - x_N| + |x_N - y| < \varepsilon. \qquad \square$$
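Returning to Example 2.9, the dyadic approximations $[2^n \sqrt{2}]/2^n$ can be computed exactly, without floating-point square roots. The sketch below relies on the observation that $2^n \sqrt{2} = \sqrt{2 \cdot 4^n}$, so its integer part is the integer square root of $2 \cdot 4^n$; the function name `p` is hypothetical. The error bound $0 \le \sqrt{2} - p_n < 1/2^n$ is checked in the equivalent squared form $p_n^2 \le 2 < (p_n + 1/2^n)^2$, which involves only rational numbers.

```python
from fractions import Fraction
from math import isqrt

def p(n):
    """The rational number [2^n * sqrt(2)] / 2^n, computed exactly."""
    # [2^n * sqrt(2)] = [sqrt(2 * 4^n)] = isqrt(2 * 4^n)
    return Fraction(isqrt(2 * 4**n), 2**n)

print(p(3))  # 11/8

# 0 <= sqrt(2) - p(n) < 1/2^n, verified without evaluating sqrt(2).
ok = all(p(n) ** 2 <= 2 < (p(n) + Fraction(1, 2**n)) ** 2 for n in range(60))
print(ok)  # True
```

Working with squares keeps every comparison exact, which a check based on `math.sqrt` could not guarantee for large $n$.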



What does Proposition 2.12 say? Proposition 2.12 says that every sequence can converge to only one limit. This is intuitively evident because when the terms of a sequence are gathering around some point $x$, the same cannot be true for another point $y$.

The above uniqueness result can also be proved in an indirect way. Indeed, assuming $x \neq y$, we get $|x - y| > 0$. Now, for $\varepsilon = (1/2)|x - y| > 0$, we may find some term $x_n$ whose distance from both $x$ and $y$ is less than $\varepsilon$. Then, the triangle inequality yields
$$|x - y| \le |x - x_n| + |x_n - y| < 2\varepsilon = |x - y|,$$
a contradiction.

Our next result shows that the convergence or divergence of a sequence cannot be affected by changing a finite number of its terms.

Proposition 2.13. Assume that a sequence $\{x_n\}$ converges to $x$, and that $\{y_n\}$ is a sequence satisfying $y_n = x_n$ for all but finitely many $n \in \mathbb{N}$. Then $\{y_n\}$ also converges to $x$.


Proof. Let $k \in \mathbb{N}$ be such that $y_n = x_n$ holds for every $n \ge k$. If $\varepsilon > 0$ is given, find $N$ so large that $n \ge N$ implies $|x_n - x| < \varepsilon$. Then, $|y_n - x| < \varepsilon$ for every $n \ge \max\{k, N\}$. This completes the proof. $\square$

What does Proposition 2.13 say? Proposition 2.13 shows that the convergence of a sequence is stable under changing a finite number of its terms. Stability results appear here and there in mathematics and in analysis in particular. They determine the amount of change a system, situation, or condition may tolerate before losing its original form. In simpler words, Proposition 2.13 says that a convergent sequence remains convergent if we modify a finite number of its terms.

Exercise 2.14. Is the convergence of a sequence stable under changing an infinite number of its terms? Justify your answer.

Now that we are sure about the uniqueness of limits, it is time to consider some more examples.

Example 2.15. Verify the following equalities by the $\varepsilon$–$N$ definition of limit.
(1) $\lim_{n \to \infty} 1/n = 0$.
(2) $\lim_{n \to \infty} (\sin^3 (n + 1))/n^4 = 0$.
(3) $\lim_{n \to \infty} (n^2 - 1)/(n^2 + 1) = 1$.
(4) $\lim_{n \to \infty} a^n = 0$, if $a$ is any real number satisfying $|a| < 1$.
(5) $\lim_{n \to \infty} \sqrt[n]{n} = 1$.

Solution. Let $\varepsilon > 0$ be given.

(1) If we find $N$ such that $1/N < \varepsilon$, then for every $n \ge N$, $|1/n - 0| = 1/n \le 1/N < \varepsilon$.

(2) Let $N$ be such that $1/N < \sqrt[4]{\varepsilon}$. Then for every $n \ge N$,
$$\left| \frac{\sin^3 (n + 1)}{n^4} - 0 \right| = \frac{|\sin^3 (n + 1)|}{n^4} \le \frac{1}{n^4} \le \frac{1}{N^4} < \varepsilon.$$
Here, the first inequality follows from the fact that $|\sin x| \le 1$ for every $x \in \mathbb{R}$.

(3) For every $n$, $|(n^2 - 1)/(n^2 + 1) - 1| = 2/(n^2 + 1) < 2/n^2$. So, if we find $N > \sqrt{2/\varepsilon}$, then $|(n^2 - 1)/(n^2 + 1) - 1| < \varepsilon$ for every $n \ge N$.

(4) If $a = 0$, then the desired equality holds obviously. So, assume that $0 < |a| < 1$. Let $b = 1/|a| - 1$. Then $b$ is positive, and $|a| = 1/(1 + b)$. Now Bernoulli's inequality (1.9) of Example 1.31 yields for every $n \in \mathbb{N}$ that $(1 + b)^n \ge 1 + nb$. Hence, $|a|^n = 1/(1 + b)^n \le 1/(1 + nb) < 1/(nb)$. Thus, if we choose $N$ such that $1/N < b\varepsilon$, then for every $n \ge N$, $|a^n - 0| = |a|^n < 1/(nb) \le 1/(Nb) < \varepsilon$.


√ (5) It is enough to show that an = n n − 1 tends to zero as n tends to infinity. To do so, we should prove that for all sufficiently large n, |an | = an < . Now notice that by the Binomial Theorem for every n > 1, 1 1 n = (1 + an )n = 1 + nan + n(n − 1)a2n + · · · + ann ≥ 1 + nan + n(n − 1)a2n . 2 2  Thus, a simple calculation shows a2n ≤ 2/n. Therefore, an ≤ 2/n for every n, and if we choose a natural number N > 2/2 , then an <  for every n ≥ N . Exercise 2.16. If c is any positive real number, prove that limn→∞ c1/n = 1. Hint. Consider two cases, 0 < c < 1 and c > 1, and use Bernoulli’s inequality (Example 1.31) in each case. √ Exercise 2.17. If 0 < r < 1 is arbitrary, prove that the sequence { nr n } converges to 0. Divergence: A First Glance. When a sequence is not convergent, we say that it is divergent. Remark 2.18. To prove that a sequence {xn } is divergent, one way is to negate statement (SC) for every x ∈ R. In fact, the divergence of {xn } can be proved by showing that the following statement is true. Given any x ∈ R a positive number  can be found such that for every N ∈ N, |xn − x| ≥  holds for some n ≥ N . √ Example 2.19. Use the method of Remark 2.18 to prove that the sequence { n} is divergent. Solution. Let x ∈ R be arbitrary, and consider  = 1. If N ∈ N is given, choose n > max{N, (1 + x)2 }. If 1 + x ≤ 0, then it is clear that √ n > 1 + x, (2.1) because this holds for every natural number √ n. Otherwise, (2.1) follows from our choice of n. In either case, (2.1) gives us | n − x| > 1. Bounded and Unbounded Sequences. Now we turn to question (2.b). What are the possible reasons for the divergence of a sequence? To answer this question, we start by finding a necessary condition for the convergence of sequences. Recall that a necessary condition for the truth of some statement p is another statement q which is implied by p. To begin with, let {xn } be a convergent sequence with limit x. 
For $\varepsilon = 1$, there exists $N \in \mathbb{N}$ such that $|x_n - x| < 1$ for every $n \ge N$. Thus for all such $n$, $|x_n| - |x| \le |x_n - x| < 1$, that is, $|x_n| < 1 + |x|$. If we let $M = \max\{|x_1|, \dots, |x_{N-1}|, 1 + |x|\}$, then it follows that $|x_n| \le M$ for every $n \in \mathbb{N}$. Since we defined sequences as functions from $\mathbb{N}$ into $\mathbb{R}$, the set $\{x_n : n \in \mathbb{N}\} \subset \mathbb{R}$

2.1. Real Sequences, Their Convergence, and Boundedness


that corresponds to a sequence $\{x_n\}$ may be called its range. With this terminology, our above discussion shows that the convergence of a sequence implies the boundedness of its range (Proposition 1.57). To state this result more simply, let us say that a sequence is bounded when its range satisfies the same property. Therefore, we have proved the following.

Theorem 2.20. Every convergent sequence is bounded.

When a sequence is not bounded, we say that it is unbounded. Notice that by the contrapositive law of mathematical logic, the following statement is equivalent to Theorem 2.20. Every unbounded sequence is divergent. In other words, unboundedness is another reason for divergence.

Various kinds of unbounded sequences. It is clear that a sequence $\{x_n\}$ is unbounded if and only if it is either
(1) bounded from above and unbounded from below,
(2) unbounded from above and bounded from below, or
(3) unbounded from above and below.

Example 2.21. The following sequences are unbounded and, accordingly, divergent. In each case determine the reason for unboundedness.
(1) $\{\sqrt{n}\}$.
(2) $\{-n^2\}$.
(3) $\{(-1)^n n\}$.
(4) $\{a^n\}$, if $a$ is any real number satisfying $|a| > 1$.

Solution.
(1) If $M > 0$ is arbitrary, we may find $N \in \mathbb{N}$ such that $N > M^2$. Then $\sqrt{n} > M$ for every $n \ge N$. Note that 1 is a lower bound for this sequence and the sequence is unbounded from above.
(2) For a given $L < 0$, $-L > 0$ and we may choose $N \in \mathbb{N}$ with $N > \sqrt{-L}$. Then $-n^2 < L$ for every $n \ge N$, and this shows that the sequence is unbounded from below. It should be noted that $-1$ is an upper bound for this sequence.
(3) The range of this sequence is $\{2k : k \in \mathbb{N}\} \cup \{-(2k - 1) : k \in \mathbb{N}\}$. Since the first set is not bounded from above and the second one is not bounded from below, the sequence is unbounded from above and below.
(4) Since $|a| > 1$, we have two cases as follows.
• $a > 1$. In this case a positive number $c$ exists, namely $c = a - 1$, such that $a = 1 + c$. Then, by the Binomial Theorem we find that for every $n$,
$$a^n = (1 + c)^n = 1 + nc + \tfrac{1}{2} n(n-1)c^2 + \cdots + c^n > nc.$$
So if $M > 0$ is given, choosing $N \in \mathbb{N}$ so large that $N > M/c$, we find that $a^n > nc \ge Nc > M$ for every $n \ge N$. This proves that $\{a^n\}$ is unbounded from above and, hence, divergent in this case.


• $a < -1$. Here, the range is unbounded from above, as its subset $\{a^{2k} : k \in \mathbb{N}\}$ is also. The range is also unbounded from below. To see this, it is enough to show that its subset $\{a^{2k-1} : k \in \mathbb{N}\}$ has the same property. But this follows easily because $\{a^{2k-1} : k \in \mathbb{N}\} = (1/a)\{a^{2k} : k \in \mathbb{N}\}$, the set $\{a^{2k} : k \in \mathbb{N}\}$ is unbounded from above, and $1/a < 0$. We therefore proved that the sequence $\{a^n\}$ is unbounded from above and below when $a < -1$.

Of course, a bounded sequence may be divergent. This fact reveals that unboundedness is not the only reason for divergence. We will pursue our study of the reasons of divergence in Section 2.2.

Example 2.22. Show that the sequence $\{\cos n\pi\}$ is bounded and nevertheless divergent.

Solution. It is known that for every $n$, $\cos n\pi = (-1)^n$ and hence $|\cos n\pi| = 1$, which shows that this sequence is bounded. Assume to the contrary that $\{(-1)^n\}$ converges to some $L \in \mathbb{R}$. For $\varepsilon = 1$, we then find some $N \in \mathbb{N}$ such that for every $n \ge N$, $|(-1)^n - L| < 1$. Let $n_0 > N$ be even. Then $n_0 + 1$ is odd, and we obtain from the last inequality that
$$|(-1)^{n_0} - L| = |1 - L| < 1 \tag{2.2}$$
and
$$|(-1)^{n_0+1} - L| = |-1 - L| < 1. \tag{2.3}$$
But (2.2) yields $0 < L < 2$ and (2.3) gives $-2 < L < 0$. This contradiction shows that $\{\cos n\pi\}$ is divergent.

A note on Example 2.22. Comparing Example 2.22 with Example 2.21(1), we observe two different divergence types. The terms of the sequence $\{\sqrt{n}\}$ grow without bound, and this prevents the sequence from being convergent. We describe this by saying that the sequence diverges to $+\infty$. On the other hand, the sequence $\{(-1)^n\}$ cannot converge to any real number, in spite of the fact that all of its terms are in the bounded set $\{-1, 1\}$.

The next result can be used to find the limit of many sequences.

Proposition 2.23. Let $\{x_n\}$ be a bounded sequence. If $\{y_n\}$ is a sequence that converges to 0, then $\lim_{n\to\infty} x_n y_n = 0$.

Proof. Let $M > 0$ be such that $|x_n| \le M$ for every $n \in \mathbb{N}$. If $\varepsilon > 0$ is given, find a natural number $N$ such that $|y_n| < \varepsilon/M$ for every $n \ge N$. Then, for all such $n$,
$$|x_n y_n - 0| = |x_n||y_n| < M \cdot \frac{\varepsilon}{M} = \varepsilon.$$
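The proof of Proposition 2.23 is constructive: given $\varepsilon$, it names an index $N$ that works. The following Python sketch replays that choice for the illustrative pair $x_n = \sin n$ (bounded by $M = 1$) and $y_n = 1/n$. The function names and the finite range that is checked are assumptions made for this illustration only, and a finite check is of course not a proof.

```python
import math

def first_index_within(eps, M=1.0):
    """Return an N such that 1/n < eps/M for every n >= N (here y_n = 1/n)."""
    # n >= floor(M/eps) + 1 forces n > M/eps, i.e. 1/n < eps/M.
    return math.floor(M / eps) + 1

def product_term(n):
    """n-th term of the product sequence x_n * y_n with x_n = sin(n), y_n = 1/n."""
    return math.sin(n) * (1.0 / n)

eps = 1e-3
N = first_index_within(eps)

# Largest |x_n y_n| over a finite stretch of indices n >= N (illustration only).
worst = max(abs(product_term(n)) for n in range(N, N + 10_000))
print(N, worst < eps)
```

Any other bounded sequence could be substituted for `math.sin(n)` as long as `M` is replaced by a valid bound.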


To interpret Proposition 2.23, let us call the sequence $\{x_n y_n\}$ the product of $\{x_n\}$ and $\{y_n\}$.

What does Proposition 2.23 say? Proposition 2.23 says that the product of a sequence which converges to zero and a bounded sequence will also converge to zero.

Exercise 2.24. Let $\{x_n\}$ be a bounded sequence. If $\{y_n\}$ is a sequence that converges to some nonzero real number, is it necessarily true that $\{x_n y_n\}$ is convergent? Why?

Example 2.25. Find the value of the following limits.
(1) $\lim_{n\to\infty} (\sin n)/n$.
(2) $\lim_{n\to\infty} (-1)^n (\sqrt[n]{n} - 1)$.

Solution.
(1) The sequence $\{\sin n\}$ is bounded and $\lim_{n\to\infty} 1/n = 0$. The desired limit is therefore equal to 0 by Proposition 2.23.
(2) We observed in the solution of Example 2.15(5) that $\lim_{n\to\infty} (\sqrt[n]{n} - 1) = 0$. Now, it follows from the boundedness of $\{(-1)^n\}$ and Proposition 2.23 that the given limit is equal to 0.

Algebraic Operations and Convergence. Given arbitrary sequences $\{x_n\}$ and $\{y_n\}$, we can use the algebraic operations of addition, multiplication, and subtraction to define new sequences. These are $\{x_n + y_n\}$, $\{x_n y_n\}$, and $\{x_n - y_n\}$, respectively. We have already considered the product sequence $\{x_n y_n\}$ in Proposition 2.23. If $y_n \ne 0$ for every $n \in \mathbb{N}$, then we can also use division to obtain the sequence $\{x_n / y_n\}$. But, what is the relation between the convergence of $\{x_n\}$ and $\{y_n\}$ and that of the above-mentioned sequences? Clearly, we expect to see that the limit respects the algebraic operations, which is the content of our following result.

Theorem 2.26. If $\{x_n\}$ and $\{y_n\}$ converge to $x$ and $y$, respectively, then
(1) $\lim_{n\to\infty} (x_n + y_n) = x + y$ and $\lim_{n\to\infty} (x_n - y_n) = x - y$;
(2) $\lim_{n\to\infty} (x_n y_n) = xy$; and
(3) if, in addition, $y_n \ne 0$ for every $n$ and $y \ne 0$, then $\lim_{n\to\infty} x_n / y_n = x/y$.

Proof. Let $\varepsilon > 0$ be given.
(1) We only prove the first equality as the second one can be proved similarly. There exist $N_1$ and $N_2$ such that $|x_n - x| < \varepsilon/2$ and $|y_n - y| < \varepsilon/2$, for every $n \ge N_1$ and $n \ge N_2$, respectively. Let $N = \max\{N_1, N_2\}$. Then, by the triangle inequality, for every $n \ge N$,
$$|(x_n + y_n) - (x + y)| \le |x_n - x| + |y_n - y| < \varepsilon/2 + \varepsilon/2 = \varepsilon.$$


(2) Since $\{x_n\}$ is convergent, it is bounded by Theorem 2.20. Let $M > 0$ be such that for every $n$, $|x_n| < M$. Then, by the triangle inequality,
$$|x_n y_n - xy| = |x_n y_n - x_n y + x_n y - xy| \le |x_n||y_n - y| + |y||x_n - x| \le M|y_n - y| + |y||x_n - x|$$
for every $n$. Now letting $K = \max\{M, |y|\}$, we see that for every $n$, $|x_n y_n - xy| \le K(|y_n - y| + |x_n - x|)$. Choosing $N_1$ and $N_2$ so large that $n \ge N_1$ and $n \ge N_2$ imply $|x_n - x| < \varepsilon/(2K)$ and $|y_n - y| < \varepsilon/(2K)$, respectively, $|x_n y_n - xy| < \varepsilon$ follows for all $n \ge \max\{N_1, N_2\}$.
(3) It is sufficient to show that with the assumptions we have, $\lim_{n\to\infty} 1/y_n = 1/y$, because the desired result then follows from this and item (2). To this end we note that
$$\left|\frac{1}{y_n} - \frac{1}{y}\right| = \frac{|y_n - y|}{|y_n||y|}. \tag{2.4}$$
Since $\lim_{n\to\infty} y_n = y \ne 0$, we may choose $N_1 \in \mathbb{N}$ such that for every $n \ge N_1$, $|y_n - y| < |y|/2$. But $\big||y_n| - |y|\big| \le |y_n - y|$, and hence it follows that
$$|y_n| > \frac{|y|}{2} \tag{2.5}$$
for every $n \ge N_1$. Combining (2.4) and (2.5) we see that for every $n \ge N_1$,
$$\left|\frac{1}{y_n} - \frac{1}{y}\right| \le \frac{2}{|y|^2}\,|y_n - y|.$$
So if we find $N_2 \in \mathbb{N}$ such that $|y_n - y| < \varepsilon |y|^2/2$ for all $n \ge N_2$, then for every $n \ge \max\{N_1, N_2\}$, $|1/y_n - 1/y| < \varepsilon$.

What does Theorem 2.26 say? The first equality in Theorem 2.26(1) says that the limit of the sum of two convergent sequences is the sum of their limits. The remaining equalities can be interpreted similarly.

Exercise 2.27. Show by means of an example that a sequence of nonzero real numbers may converge to 0. This shows that the assumption $y \ne 0$ is necessary in Theorem 2.26(3).

Since a sequence, all of whose terms are equal to some fixed number $a$, must converge to $a$ itself, it follows that for an arbitrary sequence $\{x_n\}$ with limit $x$, $\lim_{n\to\infty} (a + x_n) = a + x$, $\lim_{n\to\infty} (a - x_n) = a - x$, and $\lim_{n\to\infty} (a x_n) = ax$. In particular, we find that $\lim_{n\to\infty} (-x_n) = -\lim_{n\to\infty} x_n = -x$.

Example 2.28. Find the value of the given limits.
(1) $\lim_{n\to\infty} (2n^2 - 1)/(3n^2 + 4n)$.
(2) $\lim_{n\to\infty} ((0.1)^n + 2\sqrt[n]{n})/n^2$.


Solution.
(1) We first note that
$$\frac{2n^2 - 1}{3n^2 + 4n} = \frac{n^2(2 - 1/n^2)}{n^2(3 + 4/n)} = \frac{2 - 1/n^2}{3 + 4/n}.$$
Since $\lim_{n\to\infty} 1/n = 0$ by Example 2.15(1),
$$\lim_{n\to\infty} \left(2 - \frac{1}{n^2}\right) = 2 - \left(\lim_{n\to\infty} \frac{1}{n}\right)^2 = 2$$
and $\lim_{n\to\infty} (3 + 4/n) = 3 + 4 \lim_{n\to\infty} 1/n = 3 \ne 0$. Thus, Theorem 2.26(3) tells us that the given limit is 2/3.
(2) For every $n$, $((0.1)^n + 2\sqrt[n]{n})/n^2 = ((0.1)^n + 2\sqrt[n]{n})(1/n^2)$. Therefore, letting $n$ tend to infinity in both sides of this equality and using (4) and (5) of Example 2.15 and Theorem 2.26(2), we find that
$$\lim_{n\to\infty} \frac{(0.1)^n + 2\sqrt[n]{n}}{n^2} = (0 + 2)(0) = 0.$$

Monotone Sequences. If $\{x_n\}$ is a bounded sequence, then its range $X = \{x_n : n \in \mathbb{N}\}$ is a bounded subset of $\mathbb{R}$, by our definition. Let $x = \sup X$. By Theorem 2.4, $x$ will be the limit of a sequence in $X$. Since $\{x_n\}$ itself is a sequence in $X$, we encounter a natural question. When does a bounded sequence converge to the supremum of its range?

To answer this question, denote the assertion "$\{x_n\}$ converges to the supremum of its range" by $q$. We are therefore seeking a statement $p$ concerning $\{x_n\}$ that implies $q$. By the contrapositive law, we should then be able to deduce the negation of $p$ from that of $q$. So, let us begin with the negation of $q$; that is, let us assume that $\{x_n\}$ is a bounded sequence that does not converge to the supremum $x$ of its range. By Remark 2.18 there exists $\varepsilon > 0$ such that for every $N \in \mathbb{N}$, $n \ge N$ can be found with $x - x_n = |x_n - x| \ge \varepsilon$. Since $x - \varepsilon < x$, we find $m \in \mathbb{N}$ such that
$$x_m > x - \varepsilon. \tag{2.6}$$
By our choice of $\varepsilon$, for $N = m + 1$ we find $n \ge N$ such that
$$x - x_n \ge \varepsilon. \tag{2.7}$$
It then follows from (2.6) and (2.7) that $x_n \le x - \varepsilon < x_m$. In summary, we found $m, n \in \mathbb{N}$ with $n > m$ and $x_n < x_m$. The negation of $p$ for $\{x_n\}$ then reads as follows. There exist $m, n \in \mathbb{N}$ such that $n > m$ and $x_n < x_m$. The statement $p$ itself therefore states the following. For every $m, n \in \mathbb{N}$ with $n > m$, $x_n \ge x_m$.


When the above statement is true for a sequence $\{x_n\}$, we say that $\{x_n\}$ is increasing. Thus, we observed that a bounded and increasing sequence necessarily converges to the supremum of its range. We will present a more nicely arranged proof of this fact below. For now we give a formal definition of increasing sequences.

Definition 2.29. A sequence $\{x_n\}$ is increasing (resp., decreasing) if for every $n \in \mathbb{N}$, $x_n \le x_{n+1}$ (resp., $x_n \ge x_{n+1}$). If $\le$ (resp., $\ge$) is replaced by $<$ (resp., $>$), then we say that $\{x_n\}$ is strictly increasing (resp., strictly decreasing). A monotone sequence is one which is either increasing or decreasing.

Example 2.30. In each case determine whether the given sequence is monotone or not.
(1) $\{1 - 1/n^3\}$.
(2) $\{\sqrt{n}\}$.
(3) $\{(-1)^n \cos^2 n\}$.

Solution.
(1) It is clear that for every $n$, $(n + 1)^3 > n^3$ and hence $-1/(n + 1)^3 > -1/n^3$. Adding 1 to both sides of this last inequality, we find that the sequence is strictly increasing.
(2) Since $\sqrt{n + 1} > \sqrt{n}$ for every $n$, this sequence is also strictly increasing.
(3) The terms of this sequence are alternately negative and positive. This shows that the sequence is neither increasing nor decreasing.

We now come to the formal statement of our observation above.

Theorem 2.31. An increasing sequence which is bounded from above converges to the supremum of its range.

Proof. Let $\{x_n\}$ be an increasing sequence that is bounded from above. The boundedness assumption ensures, in view of the axiom of completeness, that the range $X = \{x_n : n \in \mathbb{N}\}$ of the sequence has a supremum. Denote this by $x$. We prove that $\lim_{n\to\infty} x_n = x$. To see this, let $\varepsilon > 0$ be given. Since $x = \sup X$ and $x - \varepsilon < x$, $x - \varepsilon$ cannot be an upper bound for $X$. Thus, we can find $N \in \mathbb{N}$ such that $x_N > x - \varepsilon$. That $\{x_n\}$ is increasing shows that for every $n \ge N$,
$$x_n \ge x_N > x - \varepsilon. \tag{2.8}$$
On the other hand, the assumption $x = \sup X$ implies that for every $n \in \mathbb{N}$,
$$x_n \le x < x + \varepsilon. \tag{2.9}$$
Now, we conclude from (2.8) and (2.9) that $|x_n - x| < \varepsilon$ for every $n \ge N$. This completes the proof.
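The key step in the proof of Theorem 2.31 is the choice of an index $N$ with $x_N > x - \varepsilon$; monotonicity then takes care of every later term. As a hedged illustration, the Python sketch below carries out this choice for the increasing sequence $x_n = 1 - 1/n^3$ of Example 2.30(1), whose range has supremum 1. The supremum is supplied by hand (the constant `SUP`), the helper names are hypothetical, and exact rational arithmetic is used so that no rounding interferes.

```python
from fractions import Fraction

def x(n):
    """n-th term of the increasing sequence 1 - 1/n^3."""
    return Fraction(1) - Fraction(1, n**3)

SUP = Fraction(1)  # supremum of the range, assumed known for this illustration

def find_N(eps):
    """Smallest N with x_N > SUP - eps, found by linear search."""
    N = 1
    while x(N) <= SUP - eps:
        N += 1
    return N

eps = Fraction(1, 1000)
N = find_N(eps)

# By monotonicity every later term stays within eps of SUP; check a finite stretch.
tail_ok = all(abs(x(n) - SUP) < eps for n in range(N, N + 500))
print(N, tail_ok)  # prints: 11 True
```

Here $x_{10} = 1 - 1/1000$ sits exactly at $x - \varepsilon$, so the strict inequality first holds at $N = 11$.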


A note on Theorem 2.31. A bounded sequence may converge to the supremum of its range without being monotone. An example is the sequence
$$\left\{1 - \frac{1 + (-1)^n}{n}\right\},$$
which converges to 1, the supremum of its range, and is nevertheless not monotone. Thus, the property of being monotone is a sufficient condition for the convergence of a bounded sequence to the supremum of its range. This property is by no means necessary.

Exercise 2.32. Prove that a decreasing sequence that is bounded from below converges to the infimum of its range.

As an immediate consequence of Theorem 2.31 and Exercise 2.32, we have the following corollary.

Corollary 2.33. Every monotone and bounded sequence is convergent.

Example 2.34. Show that the sequence
$$a_n = \frac{1}{n+1} + \frac{1}{n+2} + \cdots + \frac{1}{2n}$$
converges.

Solution. A simple calculation shows that for every $n$,
$$a_{n+1} - a_n = \frac{1}{2n+2} + \frac{1}{2n+1} - \frac{1}{n+1} = \frac{1}{2n+1} - \frac{1}{2n+2} > 0.$$
Hence $\{a_n\}$ is strictly increasing. Also
$$a_n \le n \left(\frac{1}{n+1}\right) < 1,$$
showing that $\{a_n\}$ is also bounded from above (and that it is bounded indeed, because $a_n \ge 0$). The sequence is therefore convergent by Theorem 2.31.

Example 2.35. For each $n \in \mathbb{N}$, let $x_n = \sqrt{2 + \sqrt{2 + \cdots + \sqrt{2}}}$, where there are $n$ square roots. For instance, $x_1 = \sqrt{2}$, $x_2 = \sqrt{2 + \sqrt{2}}$, and so on. Prove that $\{x_n\}$ is convergent.

Solution. It is enough to show that the sequence is bounded from above and increasing. To this end we use induction on $n$ to show that
(1) 2 is an upper bound for $\{x_n\}$, and
(2) $x_n \le x_{n+1}$ for every $n \in \mathbb{N}$.
To prove (1), note that $x_1 = \sqrt{2} < 2$, and if $x_n \le 2$, then
$$x_{n+1} = \sqrt{2 + x_n} \le \sqrt{2 + 2} = 2.$$


As for (2), we first see that $x_1 = \sqrt{2} < \sqrt{2 + \sqrt{2}} = x_2$. Next, if $x_n \le x_{n+1}$, then
$$x_{n+1} = \sqrt{2 + x_n} \le \sqrt{2 + x_{n+1}} = x_{n+2},$$
which proves (2). Finding the limit of the sequence $\{x_n\}$ in a mathematically rigorous way will be done in the next section (Example 2.56).

Exercise 2.36. Rework Example 2.35 with 2 replaced by a fixed real number $a > 0$ in the definition of $\{x_n\}$.

Divergence to Infinity. One important class of unbounded sequences is composed of those sequences that diverge to infinity. To have an idea of what is meant by divergence to infinity, consider the sequence $\{n\}$. Given any $M > 0$, Example 1.40(2) gives us some natural number $N$ greater than $M$. Then for every $n \ge N$, $n$ is also greater than $M$. In other words, the terms $x_n$ of this sequence can go beyond any prescribed positive value when $n$ is sufficiently large. Clearly, this implies that the sequence is unbounded and, hence, divergent. We describe this by saying that the sequence diverges to $+\infty$ or that it tends to $+\infty$. This is our motivation for the following definition, in which we also define divergence to $-\infty$.

Definition 2.37. Let $\{x_n\}$ be a sequence. We say that $\{x_n\}$ diverges to $+\infty$ (resp., $-\infty$) if the following statement is true. Given any $M > 0$ (resp., $M < 0$), $N \in \mathbb{N}$ can be found such that $x_n > M$ (resp., $x_n < M$) for every $n \ge N$.

An obvious example of a sequence that diverges to $-\infty$ is $\{-n\}$.

Example 2.38. Prove that the sequence $\{(n^2 + 1)/(n - 1)\}$ diverges to $+\infty$.

Solution. Denote the $n$th term of the sequence by $a_n$. If $M > 0$ is given, we should find $N \in \mathbb{N}$ such that $a_n > M$ for every $n \ge N$. Since for $n \ge 2$ this inequality is equivalent to $n^2 - nM > -(M + 1)$ and $-(M + 1) < 0$, we find that $a_n > M$ holds for every $n \ge 2$ with $n^2 - nM > 0$, or equivalently with $n > M$. Thus, if we choose $N \ge 2$ to be greater than $M$, then $a_n > M$ holds for every $n \ge N$.

Exercise 2.39. Verify that the sequence $\{(n^2 + 1)/(2 - n)\}$ diverges to $-\infty$.

A note on unboundedness and divergence to infinity.
A sequence that diverges to +∞ or −∞ is necessarily unbounded. More precisely, the divergence of {xn } to +∞ implies that {xn } is unbounded from above, and the divergence of {xn } to −∞ tells us that {xn } is unbounded from below. Of course, an unbounded sequence is not necessarily divergent to infinity. An example is the sequence {(−1)n n} which is unbounded from above and below, and is nevertheless not divergent to infinity. The following theorem complements Theorem 2.31. Theorem 2.40. An increasing sequence that is unbounded from above diverges to +∞.


Proof. Let $\{x_n\}$ be an increasing sequence that is unbounded from above. Let $M > 0$ be arbitrary. Since $\{x_n\}$ is not bounded from above, $M$ is not an upper bound for its range. Thus, there exists some $N \in \mathbb{N}$ such that $x_N > M$. Since $\{x_n\}$ is increasing, $x_n \ge x_N > M$ for every $n \ge N$. This shows that $\lim_{n\to\infty} x_n = +\infty$.

A note on increasing sequences. As a result of Theorem 2.31 and Theorem 2.40 we find that an increasing sequence either converges to the supremum of its range or diverges to $+\infty$.

Exercise 2.41. If $\{x_n\}$ diverges to $+\infty$, does it necessarily follow that $\{x_n\}$ is increasing?

Exercise 2.42. Prove that a decreasing sequence that is unbounded from below diverges to $-\infty$.

Convergence and Order. Now we turn to the relations that convergence and order may have. We begin by answering a natural question. If $\{x_n\}$ converges to $x$, what can be said about the sign of $x$ and those of the terms $x_n$?

Lemma 2.43. Let $\{x_n\}$ be a sequence with limit $x$. If $x < 0$, then $x_n < 0$ for all sufficiently large $n$.

Proof. For $\varepsilon = -x/2 > 0$, find $N \in \mathbb{N}$ such that for every $n \ge N$, $|x_n - x| < \varepsilon$. Then, it follows from this inequality that for all such $n$, $x_n < x + \varepsilon = x/2 < 0$.

Note that the converse of Lemma 2.43 is not true. In fact, $\{-1/n\}$ is a sequence, all of whose terms are negative, that nevertheless converges to 0.

Exercise 2.44. If $\lim_{n\to\infty} x_n = x$ and $x > 0$, then prove that $x_n > 0$ for all sufficiently large $n$.

Lemma 2.43 and Exercise 2.44 explored the way the sign of the limit of a sequence affects those of its terms. The following theorem examines the converse of this.

Theorem 2.45. If $x_n \ge 0$ for all sufficiently large $n$ and $\lim_{n\to\infty} x_n = x$, then $x \ge 0$. In particular, if $x_n \ge y_n$ for all sufficiently large $n$, $\lim_{n\to\infty} x_n = x$, and $\lim_{n\to\infty} y_n = y$, then $x \ge y$.

Proof. If $x < 0$, we may use Lemma 2.43 to deduce that $x_n < 0$ for all sufficiently large $n$, contradicting our nonnegativity assumption.
To prove the second assertion, note that the assumption implies $x_n - y_n \ge 0$ for all sufficiently large $n$, and therefore by the first part and Theorem 2.26, $x - y = \lim_{n\to\infty} (x_n - y_n) \ge 0$.

What does Theorem 2.45 say? The first part of Theorem 2.45 says that a sequence all of whose terms, except perhaps a finite number of them, are nonnegative cannot converge to a negative limit. The second part shows that the limit respects the order relation $\le$.

Proposition 2.46. If $\{x_n\}$ is a sequence of nonnegative real numbers that converges to $x$, then $\{\sqrt{x_n}\}$ converges to $\sqrt{x}$.

Proof. Let $\varepsilon > 0$ be given. If $x = 0$, then it is enough to find $N$ so large that $|\sqrt{x_n} - 0| = \sqrt{x_n} < \varepsilon$ for every $n \ge N$. But $\sqrt{x_n} < \varepsilon$ is equivalent to $x_n < \varepsilon^2$. So, it is sufficient to find a natural number $N$ such that $x_n < \varepsilon^2$ for every $n \ge N$. Since $\{x_n\}$ converges to $x = 0$ and $x_n = |x_n - 0|$, this can be done easily.

If $x > 0$, then $\sqrt{x} > 0$. Now $\sqrt{x_n} + \sqrt{x} \ge \sqrt{x} > 0$ for every $n$, and we may write
$$|\sqrt{x_n} - \sqrt{x}| = \frac{|x_n - x|}{\sqrt{x_n} + \sqrt{x}} \le \frac{1}{\sqrt{x}}\,|x_n - x|.$$
If we choose $N$ so large that $|x_n - x| < \varepsilon\sqrt{x}$ for every $n \ge N$, then $|\sqrt{x_n} - \sqrt{x}| < \varepsilon$ for all such $n$.

To interpret Proposition 2.46 let us call $\{\sqrt{x_n}\}$ the square root of $\{x_n\}$.

What does Proposition 2.46 say? Proposition 2.46 says that the limit of the square root of a convergent sequence of nonnegative numbers is the square root of its limit.

Example 2.47. Find the limit of the given sequences.
(1) $\lim_{n\to\infty} \sqrt{(n + 1)/(n + 3)}$.
(2) $\lim_{n\to\infty} (\sqrt{n^2 + n} - n)$.

Solution.
(1) Since for every $n$,
$$\frac{n+1}{n+3} = \frac{1 + 1/n}{1 + 3/n},$$
it follows by letting $n \to \infty$ and using Theorem 2.26 that
$$\lim_{n\to\infty} \frac{n+1}{n+3} = 1.$$
It now follows from Proposition 2.46 that the given limit is also equal to 1.
(2) The limit cannot be computed in this way. So we write
$$\sqrt{n^2 + n} - n = \frac{(\sqrt{n^2 + n} - n)(\sqrt{n^2 + n} + n)}{\sqrt{n^2 + n} + n} = \frac{n}{\sqrt{n^2 + n} + n} = \frac{1}{\sqrt{1 + 1/n} + 1}.$$
Since by Proposition 2.46
$$\lim_{n\to\infty} \sqrt{1 + 1/n} = \sqrt{\lim_{n\to\infty} (1 + 1/n)} = \sqrt{1} = 1,$$

letting $n$ tend to infinity in both sides of the equality
$$\sqrt{n^2 + n} - n = \frac{1}{\sqrt{1 + 1/n} + 1},$$
we find by (1) and (3) of Theorem 2.26 that the desired value is 1/2.

Exercise 2.48. Show that the sequence $\{\sqrt{n^2 + 1} - n\}$ converges, and find the value of its limit.

The following is a limit-analogue of the real number property that $y \le x \le y$ implies $x = y$.

Theorem 2.49 (The Squeeze Theorem). If $y_n \le x_n \le z_n$ for all sufficiently large $n$ and $\lim_{n\to\infty} y_n = \lim_{n\to\infty} z_n = x$, then $\lim_{n\to\infty} x_n = x$.

Proof. Suppose $N_1$ is so large that $y_n \le x_n \le z_n$ for every $n \ge N_1$. Let $\varepsilon > 0$ be given. There exists $N_2 \in \mathbb{N}$ such that $|z_n - x| < \varepsilon$ for every $n \ge N_2$. Similarly, one finds $N_3 \in \mathbb{N}$ such that $|y_n - x| < \varepsilon$ for each $n \ge N_3$. Let $N = \max\{N_1, N_2, N_3\}$. Then for every $n \ge N$,
$$-\varepsilon < y_n - x \le x_n - x \le z_n - x < \varepsilon.$$
This gives $|x_n - x| < \varepsilon$ for all $n \ge N$, and accordingly completes the proof.



What does the Squeeze Theorem say? The Squeeze Theorem says that if a sequence is squeezed between two sequences that converge to a common limit, then this sequence will also converge to the same limit.

Example 2.50. Find the value of the given limits.
(1) $\lim_{n\to\infty} 1/2^n$.
(2) $\lim_{n\to\infty} (1 + \cos^2 n!)/(3n^2 + n!)$.
(3) $\lim_{n\to\infty} n!/n^n$.
(4) $\lim_{n\to\infty} \sqrt[n]{1 + 1/2 + \cdots + 1/n}$.
(5) $\lim_{n\to\infty} \left(1/n^2 + 1/(n + 1)^2 + \cdots + 1/(2n)^2\right)$.

Solution.
(1) As we observed in Example 1.27, $2^n > n$ for every $n \in \mathbb{N}$. Hence for each $n$, $0 < 1/2^n < 1/n$ and we deduce from $\lim_{n\to\infty} 1/n = 0$ and the Squeeze Theorem that $\lim_{n\to\infty} 1/2^n = 0$.
(2) Since for every $n \in \mathbb{N}$,
$$0 < \frac{1}{3n^2 + n!} < \frac{1}{n!} \le \frac{1}{n} \tag{2.10}$$
and $\lim_{n\to\infty} 1/n = 0$, the Squeeze Theorem gives $\lim_{n\to\infty} 1/(3n^2 + n!) = 0$. On the other hand, the sequence $\{1 + \cos^2 n!\}$ is bounded, as $1 \le 1 + \cos^2 n! \le 2$ for every $n$. Thus, the given limit is equal to 0 by Proposition 2.23. Note that from (2.10) and the Squeeze Theorem, we also deduce that $\lim_{n\to\infty} 1/n! = 0$.


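(3) For each $n$,
$$\frac{n!}{n^n} = \frac{n}{n} \cdots \frac{1}{n} \le \frac{1}{n}.$$
Thus we may deduce as in (1) that $\lim_{n\to\infty} n!/n^n = 0$.
(4) […]
(5) Denote the $n$th term of the sequence by $a_n$. Since each of its $n + 1$ summands is at least $1/(2n)^2$,
$$a_n > \frac{1}{(2n)^2} + \cdots + \frac{1}{(2n)^2} = \frac{n+1}{4n^2} > \frac{n}{4n^2} = \frac{1}{4n}.$$
On the other hand,
$$a_n < \frac{1}{n^2} + \cdots + \frac{1}{n^2} = \frac{n+1}{n^2} = \frac{1}{n} + \frac{1}{n^2}.$$
Consequently for every $n$,
$$\frac{1}{4n} < a_n < \frac{1}{n} + \frac{1}{n^2}.$$
It now follows from the Squeeze Theorem that $\lim_{n\to\infty} a_n = 0$.

Exercise 2.51. If $y_n \le x_n \le z_n$ for all sufficiently large $n$ and $\{y_n\}$ and $\{z_n\}$ are convergent with distinct limits, does it necessarily follow that $\{x_n\}$ is convergent?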
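The two-sided bound used for the sum $a_n = 1/n^2 + 1/(n+1)^2 + \cdots + 1/(2n)^2$ of Example 2.50(5) can be checked exactly for small indices. The Python sketch below is only an illustration: the function names and the tested range are choices made here, and rational arithmetic is used so that the strict inequalities are tested without rounding error.

```python
from fractions import Fraction

def a(n):
    """Exact value of 1/n^2 + 1/(n+1)^2 + ... + 1/(2n)^2."""
    return sum(Fraction(1, k * k) for k in range(n, 2 * n + 1))

def squeezed(n):
    """Check the two-sided bound 1/(4n) < a_n < 1/n + 1/n^2 for one index n."""
    lower = Fraction(1, 4 * n)
    upper = Fraction(1, n) + Fraction(1, n * n)
    return lower < a(n) < upper

# Both bounding sequences tend to 0, so the Squeeze Theorem forces a_n -> 0.
all_squeezed = all(squeezed(n) for n in range(1, 201))
print(all_squeezed, float(a(200)))
```

The printed float gives a feel for how small the terms already are at $n = 200$; the limit itself is still justified by the Squeeze Theorem and not by the computation.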

2.2. Subsequences, Limit Superior and Limit Inferior

In this section we continue our study of the reasons of divergence to provide a more complete answer to question (2.b). To motivate our main concepts, let us begin with considering the sequence $x_n = (-1)^n (1 + 1/n)$. In this sequence those terms which have an even index, namely the terms $x_{2k} = 1 + 1/(2k)$, form a sequence $\{1 + 1/(2k)\}$. Since the terms of this new sequence are chosen from those of the original one $\{x_n\}$, we say that $\{x_{2k}\}$ is a subsequence of $\{x_n\}$. Here, the terms of our subsequence are chosen from the $x_n$'s using a given strictly increasing sequence of natural numbers, namely $\{2k\}$. If we work with the sequence of odd natural numbers $\{2k - 1\}$, which is again a strictly increasing sequence of natural numbers, we get another subsequence of $\{x_n\}$. This is the sequence $\{-(1 + 1/(2k - 1))\}$. Based on what we learned in the previous section, it is easy to see that
$$\lim_{k\to\infty} x_{2k} = \lim_{k\to\infty} \left(1 + \frac{1}{2k}\right) = 1$$
and
$$\lim_{k\to\infty} x_{2k-1} = \lim_{k\to\infty} -\left(1 + \frac{1}{2k-1}\right) = -1.$$


From the above equalities one finds that the sequence {xn } fails to have a limit. More precisely, the terms with an even index are decreasing to 1 as n tends to infinity, while those with an odd index are increasing to −1 when n becomes larger and larger. This prevents the terms xn from gathering around a single real number, and therefore makes {xn } into a divergent sequence. Thus, we have made an important observation. We found a divergent sequence whose divergence is due to possessing convergent subsequences that tend to different values. As we will see shortly, this is a common reason for the divergence of real sequences. To continue our studies in a general context, we present the formal definition of a subsequence. Definition 2.52. Let {xn } be a sequence, and let {nk } be a strictly increasing sequence of natural numbers. The sequence {xnk } is called a subsequence of {xn }. Example 2.53. Prove that every sequence that is unbounded from above has a subsequence that diverges to +∞. Solution. Suppose {xn } is a sequence that is unbounded from above. We may find n1 ∈ N such that xn1 > 1, for otherwise 1 would be an upper bound for the sequence. With a similar reasoning, we can find n2 ∈ N greater than n1 such that xn2 > 2. Continuing in this way, we find a strictly increasing sequence {nk } of natural numbers such that xnk > k for every k ∈ N. The subsequence {xnk } of {xn } therefore diverges to +∞. If {xn } is any sequence and for k ∈ N we set nk = 2k, we get the subsequence {x2k } of even-indexed terms. Similarly, by letting nk = 2k − 1, we obtain the subsequence {x2k−1 } of odd-indexed terms. As we will see shortly, these are of great importance in determining the convergence or divergence of {xn }. Example 2.54. Find the subsequences of even- and odd-indexed terms for the sequence xn = sin(nπ/2), and determine their convergence or divergence. Solution. For every k ∈ N, x2k = sin kπ = 0. 
Therefore the subsequence of even-indexed terms is the sequence all of whose terms are equal to 0. It is accordingly convergent to 0. Also, it is easy to verify that $x_{2k-1} = (-1)^{k+1}$ for each $k$. Thus, the subsequence of odd-indexed terms is $\{(-1)^{k+1}\}$. This sequence is divergent, as it has two convergent subsequences that tend to different values, one converging to 1 and the other converging to $-1$.

Now, we are ready to prove one of the main results of this section, which we already discussed by means of our concrete examples.

Theorem 2.55. The following conditions are equivalent for a sequence $\{x_n\}$ of real numbers.
(1) The sequence $\{x_n\}$ converges to $x$.
(2) Every subsequence of $\{x_n\}$ converges to $x$.
(3) The subsequences of even- and odd-indexed terms of $\{x_n\}$ converge to $x$.

Proof. (1)⇒(2) Let $\{x_{n_k}\}$ be an arbitrary subsequence of $\{x_n\}$, and let $\varepsilon > 0$ be given. If $N$ is such that $|x_n - x| < \varepsilon$ for every $n \ge N$, then there exists $k_0 \in \mathbb{N}$ such


that $n_k \ge N$ holds for all $k \ge k_0$. Thus $|x_{n_k} - x| < \varepsilon$ if $k \ge k_0$. This proves that $\{x_{n_k}\}$ converges to $x$.
(2)⇒(3) This is trivial.
(3)⇒(1) If $\varepsilon > 0$ is given, then there are natural numbers $k_1$ and $k_2$ such that $k \ge k_1$ and $k \ge k_2$ imply $|x_{2k} - x| < \varepsilon$ and $|x_{2k-1} - x| < \varepsilon$, respectively. It now follows that $|x_n - x| < \varepsilon$ for every $n \ge \max\{2k_1, 2k_2 - 1\}$. This shows that $\{x_n\}$ converges to $x$.

Example 2.56. Find the limit of the sequence discussed in Example 2.35.

Solution. We saw that the sequence defined by $x_1 = \sqrt{2}$ and
$$x_{n+1} = \sqrt{2 + x_n}, \quad n \in \mathbb{N}, \tag{2.11}$$
converges. Denote the limit by $L$. Since $\{x_{n+1}\}$ is a subsequence of $\{x_n\}$, it also converges to $L$ by the above theorem. Letting $n$ tend to infinity in both sides of (2.11) and noticing Proposition 2.46 we get $L = \sqrt{2 + L}$, which implies $L^2 - L - 2 = 0$. From this we must have $L = -1$ or $L = 2$. But, $L = -1$ is not acceptable, because a sequence of nonnegative numbers cannot have a negative limit by Theorem 2.45. Therefore, the sequence converges to 2.

It follows from Theorem 2.55 that a sequence may diverge in spite of having many convergent subsequences. The following proposition shows that this is not the case for monotone sequences.

Proposition 2.57. If some subsequence of a monotone sequence $\{x_n\}$ converges, then the sequence $\{x_n\}$ also converges to the same limit.

Proof. We may assume for the sake of clarity that $\{x_n\}$ is increasing. The proof is similar when $\{x_n\}$ is decreasing. Suppose that a subsequence $\{x_{n_k}\}$ of $\{x_n\}$ converges to some $x \in \mathbb{R}$. We prove that $\{x_n\}$ also converges to $x$. To see this, let $\varepsilon > 0$ be given and choose $k_0 \in \mathbb{N}$ such that for every $k \ge k_0$,
$$x - \varepsilon < x_{n_k} < x + \varepsilon. \tag{2.12}$$
It is enough to show that $|x_n - x| < \varepsilon$ for every $n \ge n_{k_0}$. To see this, fix some $n \ge n_{k_0}$ and note that then $x_n \ge x_{n_{k_0}}$ by our monotonicity assumption. Next, that $\{n_k\}$ is strictly increasing allows us to find some $k_1 > k_0$ such that $n_{k_1} > n$. This gives $x_{n_{k_1}} \ge x_n$, again by monotonicity. Combining these inequalities with (2.12), we get
$$x - \varepsilon < x_{n_{k_0}} \le x_n \le x_{n_{k_1}} < x + \varepsilon,$$
which gives our desired inequality.

What does Proposition 2.57 say? Proposition 2.57 says that a monotone sequence converges to $x$ if and only if one of its subsequences does. Compare this with the equivalence (1) ⇔ (2) of Theorem 2.55.
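The recursion $x_1 = \sqrt{2}$, $x_{n+1} = \sqrt{2 + x_n}$ of Examples 2.35 and 2.56 is easy to iterate, and doing so gives a numerical view of the three facts established there: the terms increase, they never exceed 2, and they approach 2. The Python sketch below is an illustration in floating-point arithmetic; the function name, the number of iterations, and the tolerance in the final comment are assumptions of this sketch, not part of the text.

```python
import math

def nested_radicals(count):
    """Return [x_1, ..., x_count] with x_1 = sqrt(2) and x_{n+1} = sqrt(2 + x_n)."""
    xs = [math.sqrt(2.0)]
    while len(xs) < count:
        xs.append(math.sqrt(2.0 + xs[-1]))
    return xs

xs = nested_radicals(40)

is_increasing = all(a <= b for a, b in zip(xs, xs[1:]))  # x_n <= x_{n+1}
is_bounded_by_2 = all(x <= 2.0 for x in xs)              # 2 is an upper bound
gap = 2.0 - xs[-1]                                       # distance from the limit L = 2

print(is_increasing, is_bounded_by_2, gap)  # gap is far below 1e-12 after 40 steps
```

The computation suggests the value of the limit, while the argument of Example 2.56 is what actually proves it.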




Limit Superior and Limit Inferior. The concepts of limit superior and limit inferior play a decisive role in the theory of real sequences. Since these are related to the limits of subsequences, we begin our study of them with the following definition. Definition 2.58. Let {xn } be a sequence, and let x be a real number. We say that x is a subsequential limit of {xn } if x is the limit of a subsequence of {xn }. For example, 1 and −1 are the only subsequential limits of the sequence {(−1)n}. For an arbitrary sequence {xn } consider a set E({xn }) that consists of all subsequential limits of {xn }, with the convention that +∞ and −∞ may also belong to E({xn }). In fact, +∞ ∈ E({xn }) (resp. −∞ ∈ E({xn })) when some subsequence of {xn } diverges to +∞ (resp. −∞). The limit superior and limit inferior of {xn } will be defined as the supremum and infimum of E({xn }) in the extended real number system, respectively. But this definition requires the assumption that E({xn }) is nonempty. The following theorem, which is interesting in its own right, shows that this set is indeed nonempty. Theorem 2.59. Any sequence possesses a monotone subsequence. Note that Theorem 2.59 answers the first question posed in (2.c) in the affirmative. Before proving this theorem, we introduce a useful concept. Definition 2.60. A term xm of a sequence {xn } is said to be a peak of the sequence if for every n ≥ m, xn ≤ xm . Example 2.61. In the sequence {(−1)n }, each term (−1)m with even m is a peak. If m is odd, (−1)m is not a peak. Each term of the sequence {1/n} is a peak, because n ≥ m implies 1/n ≤ 1/m. If a sequence is strictly increasing, it will have no peaks. Proof of Theorem 2.59. Let {xn } be an arbitrary sequence. We consider two cases for the number of peaks {xn } may have. (1) The sequence has a finite number (possibly zero) of peaks. Denote these peaks by xm1 , xm2 , . . . , xms , where m1 < m2 < · · · < ms . Let n1 = ms + 1 (and if {xn } has no peaks, let n1 = 1). 
Then xn1 is not a peak, and hence we can find n2 > n1 such that xn2 > xn1 . Again, since n2 > n1 > ms , xn2 is not a peak, and we find n3 > n2 such that xn3 > xn2 . Continuing in this way, we find a subsequence {xnk } of {xn } which is strictly increasing. (2) The sequence has an infinite number of peaks. Since every infinite subset of a countably infinite set is countably infinite, we may find a strictly increasing sequence of natural numbers, say {nk }, such that {xnk : k ∈ N} is the set of all peaks in {xn }. Since each term of {xnk } is a peak of {xn } and n1 < n2 < · · · ,  xn1 ≥ xn2 ≥ · · · , {xnk } is thus a decreasing subsequence of {xn }.
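The notion of a peak in Definition 2.60 can be explored on a finite list of numbers. The Python sketch below is a finite analogue only: it treats a list `xs` as if it were the whole sequence, so the last entry always counts as a peak, which need not happen for an infinite sequence. The function name and the sample data are hypothetical. What the sketch does show is the mechanism behind case (2) of the proof: the terms at the peak positions, read from left to right, form a decreasing subsequence.

```python
def peak_indices(xs):
    """Indices m with xs[n] <= xs[m] for every n >= m (finite analogue of a peak)."""
    peaks = []
    running_max = float("-inf")
    # Scan from the right so that running_max is the maximum of xs[m+1:].
    for m in range(len(xs) - 1, -1, -1):
        if xs[m] >= running_max:
            peaks.append(m)
            running_max = xs[m]
    peaks.reverse()
    return peaks

xs = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
idx = peak_indices(xs)
sub = [xs[m] for m in idx]

print(idx)  # [5, 7, 8, 10]
print(sub)  # [9, 6, 5, 5], a decreasing (non-increasing) subsequence
```

Indices are 0-based here, unlike the 1-based indexing of sequences in the text.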


Why is $E(\{x_n\})$ always nonempty? Given any sequence $\{x_n\}$, consider a monotone subsequence $\{x_{n_k}\}$. If $\{x_{n_k}\}$ is bounded, then Corollary 2.33 says that it converges to some real number $x$, and $x \in E(\{x_n\})$. Otherwise, $\{x_{n_k}\}$ diverges to infinity by Theorem 2.40 and Exercise 2.42, and at least one of $+\infty \in E(\{x_n\})$ and $-\infty \in E(\{x_n\})$ is true. Thus, $E(\{x_n\})$ is nonempty in either case.

As an immediate consequence of Theorem 2.59 and Corollary 2.33, we get the following important result.

Theorem 2.62 (The Bolzano–Weierstrass Theorem). Every bounded sequence of real numbers has a convergent subsequence.

Proof. If $\{x_n\}$ is a bounded sequence, it has a bounded monotone subsequence $\{x_{n_k}\}$ by Theorem 2.59 and the fact that every subsequence of a bounded sequence is also bounded. The subsequence $\{x_{n_k}\}$ is now convergent by Corollary 2.33.

A note on the Bolzano–Weierstrass theorem. An unbounded sequence may have no convergent subsequences. An instance is the sequence $\{n\}$. Although this answers the second question of (2.c) in the negative, the Bolzano–Weierstrass theorem gives us a class of sequences for which the question has an affirmative answer.

Example 2.63. Verify that the sequence $\{1 + \cos(n\pi/2)\}$ is bounded. Then, find a convergent subsequence of this sequence.

Solution. By the known properties of the cosine function, $|1 + \cos(n\pi/2)| \le 2$ for every $n \in \mathbb{N}$. So, the sequence is bounded. Since $\cos(n\pi/2) = \cos(k\pi - \pi/2) = 0$ for $n = 2k - 1$, the subsequence of odd-indexed terms is the sequence all of whose terms are equal to 1. This subsequence accordingly converges to 1.

Example 2.64. Show by means of an example that an unbounded sequence may have a convergent subsequence.

Solution. In the sequence $\{((-1)^n - 1)(n^2/(n + 1))\}$, the subsequence of even-indexed terms converges to 0 because all of its terms are equal to 0. Verify that the subsequence of odd-indexed terms of this sequence diverges to $-\infty$. This shows that the given sequence is unbounded from below.
By Theorem 2.55, a sequence {xn } converges to x if and only if E({xn }) = {x}. Since the supremum and infimum of a subset of R∗ are equal if and only if the set is a singleton, we see that {xn } converges if and only if sup E({xn }) = inf E({xn }) ∈ R. This leads us to think of the supremum and infimum of the set E({xn }). Definition 2.65. Let {xn } be a sequence. We call the supremum of E({xn }) (in the extended real number system) the limit superior of {xn } and it is denoted by lim supn→∞ xn . Similarly, the infimum of E({xn }) is called the limit inferior of {xn } and it is denoted by lim inf n→∞ xn .
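The limit superior and limit inferior can also be approached numerically through suprema and infima of tails, using the standard fact (not the definition adopted in this text) that $\limsup_{n\to\infty} x_n = \inf_N \sup_{n \ge N} x_n$ and $\liminf_{n\to\infty} x_n = \sup_N \inf_{n \ge N} x_n$. The sketch below makes two assumptions: the sequence $x_n = (-1)^n (1 + 1/n)$ is a hypothetical example whose subsequential limits are $1$ and $-1$, and each infinite tail is replaced by a finite window, so the printed values are only approximations.

```python
def tail_sup_inf(x, start, horizon):
    """Return (max, min) of x(n) for start <= n < horizon, a finite stand-in for a tail."""
    values = [x(n) for n in range(start, horizon)]
    return max(values), min(values)


def x(n):
    # Hypothetical example sequence: (-1)^n * (1 + 1/n).
    return (-1) ** n * (1 + 1 / n)


for start in (1, 10, 100, 1000):
    tail_sup, tail_inf = tail_sup_inf(x, start, 10_000)
    print(start, tail_sup, tail_inf)
```

As `start` grows, the tail maximum decreases toward 1 and the tail minimum increases toward −1, which matches the supremum and infimum of the set of subsequential limits of this example.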


Since every nonempty subset of $\mathbb{R}^*$ has a supremum and an infimum, the limit superior and the limit inferior exist as extended real numbers, even if $\{x_n\}$ is divergent. The reason we used the strange notation $\limsup_{n\to\infty} x_n$ and $\liminf_{n\to\infty} x_n$ can be understood in Exercise 16 at the end of this chapter. To conclude our discussion, we again emphasize that a sequence $\{x_n\}$ is convergent if and only if $\limsup_{n\to\infty} x_n = \liminf_{n\to\infty} x_n \in \mathbb{R}$.

Example 2.66. Find the limit superior and the limit inferior of the following sequences, and then determine their convergence or divergence.
(1) The sequence of all rational numbers.
(2) $\{b_n\} = \{n^{(-1)^n}\}$.
(3) $\{c_n\} = \{\cos(n\pi/3)\}$.

Solution. (1) Since $\mathbb{Q}$ is countably infinite, it is possible to enumerate its elements as a sequence, say $\{a_n\}$. Since every real number is the limit of a sequence of rational numbers by Proposition 2.8, $\mathbb{R}$ is a subset of $E(\{a_n\})$. This shows that $\limsup_{n\to\infty} a_n = +\infty$ and $\liminf_{n\to\infty} a_n = -\infty$. The sequence is divergent accordingly.

(2) The subsequences of even- and odd-indexed terms are $\{2k\}$ and $\{1/(2k-1)\}$, respectively. The former subsequence diverges to $+\infty$, while the latter converges to 0. If $\{b_{n_k}\}$ is any other subsequence of $\{n^{(-1)^n}\}$, then one of the following cases will occur.
• All the terms $b_{n_k}$, except perhaps a finite number of them, are chosen from $\{2k\}$. In this case $\{b_{n_k}\}$ diverges to $+\infty$.
• All the terms $b_{n_k}$, except perhaps a finite number of them, are chosen from $\{1/(2k-1)\}$. Then $\{b_{n_k}\}$ converges to 0.
• The sequence $\{b_{n_k}\}$ contains an infinite number of terms from both $\{2k\}$ and $\{1/(2k-1)\}$. In this case, $\{b_{n_k}\}$ can neither diverge to $+\infty$ nor converge to 0 (nor to any other real number).
Thus, $E(\{b_n\}) = \{0, +\infty\}$ and this shows that
$$\liminf_{n\to\infty} b_n = 0, \qquad \limsup_{n\to\infty} b_n = +\infty.$$
The sequence is divergent.

(3) The first six terms of this sequence are $1/2$, $-1/2$, $-1$, $-1/2$, $1/2$, and $1$. By the $2\pi$-periodicity of the cosine function, the next six terms are the same as those above, and so forth. This shows that the only possible subsequential limits for $\{\cos(n\pi/3)\}$ are $-1$, $-1/2$, $1/2$, and $1$. Thus, $\limsup_{n\to\infty} c_n = 1$ and $\liminf_{n\to\infty} c_n = -1$. This sequence is also divergent.

Exercise 2.67. Rework the above example for the sequence $\{\sin(n\pi/3)\}$.

Example 2.68. Prove the following assertions.
(1) If the limit superior of a sequence is $+\infty$, then the sequence is unbounded from above.
(2) If the limit superior of a sequence is a real number, then the sequence is bounded from above.


Solution. (1) Let {xn } be a sequence that is bounded from above. If we denote by x the supremum of the range of {xn }, then xn ≤ x for every n ∈ N. Thus we find that no subsequence of {xn } can diverge to +∞ and no subsequential limit of {xn } can be greater than x. It follows that lim supn→∞ xn ≤ x < +∞. The proof of (1) is now complete by the contrapositive law. (2) If {yn } is a sequence that is unbounded from above, then Example 2.53 tells us that the limit superior of {yn } is +∞ (and hence cannot be a real number). The truth of (2) now follows from the contrapositive law. A note on Example 2.68. As is shown in the solution of Example 2.68(2), the limit superior of a sequence that is unbounded from above is +∞. Combining this with Example 2.68(1), we find that a sequence is unbounded from above if and only if its limit superior is +∞. Exercise 2.69. Prove the following statements. (1) A sequence is unbounded from below if and only if its limit inferior is −∞. (2) If the limit inferior of a sequence is a real number, then the sequence is bounded from below. Is the converse of (2) also true? Next we present some of the most important properties of the limit superior which will be used frequently in what follows. Theorem 2.70. Let {xn } be an arbitrary sequence of real numbers. (1) The limit superior of {xn } belongs to E({xn }). (2) If β > lim supn→∞ xn , then there exists some natural number N such that xn < β for every n ≥ N . Moreover, the limit superior is the unique extended real number with the above two properties. Proof. (1) We consider three cases for the limit superior of {xn }, as follows. • The limit superior is +∞. In this case, Example 2.68(1) shows that {xn } is unbounded from above. Hence Example 2.53 tells us that some subsequence of {xn } diverges to +∞. This means that +∞, as the limit superior of {xn }, belongs to E({xn }). • The limit superior is −∞. 
Since E({xn }) is nonempty, our definition of the limit superior implies E({xn }) = {−∞}, that is, the limit superior of {xn } belongs to E({xn }) also in this case. • The limit superior is a real number, say α. By Theorem 2.4, α is the limit of a sequence of elements of E({xn }). So, to prove that α ∈ E({xn }) it is enough to show that E({xn }) contains the limit of any convergent sequence of its elements. So assume that y is the limit of a sequence in E({xn }). We show that y ∈ E({xn }), that is, y is the limit of a subsequence of {xn }.


Choose $n_1 \in \mathbb{N}$ such that $x_{n_1} \ne y$. (If no such $n_1$ existed, then the set $E(\{x_n\})$ would be equal to $\{y\}$, in which case there is nothing to prove.) Then $\delta := |x_{n_1} - y|$ is a positive real number. Since $y$ is the limit of a sequence in $E(\{x_n\})$, we may find $z_1 \in E(\{x_n\})$ such that $|z_1 - y| < \delta/4$. But $z_1 \in E(\{x_n\})$ and hence we can find $n_2 > n_1$ in $\mathbb{N}$ such that $|x_{n_2} - z_1| < \delta/4$. Therefore, it follows from the triangle inequality that
$$|x_{n_2} - y| \le |x_{n_2} - z_1| + |z_1 - y| < \delta/2.$$
Continuing in this way we find a strictly increasing sequence $\{n_k\}$ in $\mathbb{N}$ such that for every $k > 1$,
$$|x_{n_k} - y| < \frac{\delta}{2^{k-1}}. \tag{2.13}$$

[…] $> \limsup_{n\to\infty} x_n$. This contradiction proves our desired result.

To prove the uniqueness, assume that $z_1$ and $z_2$ are distinct elements of $E(\{x_n\})$ with the property that for $i = 1, 2$, $\beta > z_i$ implies $x_n < \beta$ for all sufficiently large $n$. If $z_1 < z_2$, choose $x$ such that $z_1 < x < z_2$. Then we can find $N \in \mathbb{N}$ such that $x_n < x$ for every $n \ge N$. But then no element of $E(\{x_n\})$ can be greater than $x$. This contradicts our assumption that $z_2 > x$ is an element of $E(\{x_n\})$. When $z_2 < z_1$, a similar reasoning works.

Exercise 2.71. Let $\{x_n\}$ be a sequence. Prove the following statements.
(1) The limit inferior of $\{x_n\}$ belongs to $E(\{x_n\})$.
(2) If $\beta < \liminf_{n\to\infty} x_n$, then there exists some natural number $N$ such that $x_n > \beta$ for every $n \ge N$.
Prove further that the limit inferior is the unique extended real number with the above two properties.

As a useful consequence of Theorem 2.70 we prove the following.

Theorem 2.72. Let $\{x_n\}$ and $\{y_n\}$ be sequences such that $x_n \le y_n$ for all sufficiently large $n$. Then $\limsup_{n\to\infty} x_n \le \limsup_{n\to\infty} y_n$.


To prove this theorem more easily, we first prove a simple lemma which is related to the extended real numbers.

Lemma 2.73. Let $x$ and $y$ be extended real numbers with the property that $x \le u$ for every extended real number $u > y$. Then $x \le y$.

Proof. Assume to the contrary that $x > y$. Let $z$ be a real number satisfying $x > z > y$. Since $z > y$, it follows from our assumption that $x \le z$, and this contradicts the inequality $x > z$.

What does Lemma 2.73 say? Lemma 2.73 says that when $x$ cannot be greater than any extended real number greater than $y$, $x$ cannot be greater than $y$ itself.

Proof of Theorem 2.72. Suppose $N \in \mathbb{N}$ is such that $x_n \le y_n$ for all $n \ge N$. Also, let $x$ and $y$ denote the limit superior of $\{x_n\}$ and $\{y_n\}$, respectively. To prove $x \le y$, which is our desired result, we show in view of the above lemma that when $u$ is an extended real number greater than $y$, $x \le u$. So assume that $u > y$ is an extended real number. Then Theorem 2.70(2) gives us some $N_1 \in \mathbb{N}$ such that $y_n < u$ for all $n \ge N_1$. Now if $N$ is as above and $n \ge \max\{N, N_1\}$, then
$$x_n \le y_n < u. \tag{2.14}$$
Since $x$ is the limit superior of $\{x_n\}$, it follows from (2.14) that $x \le u$, as desired.



Exercise 2.74. Let {xn } and {yn } be sequences such that xn ≤ yn for all sufficiently large n. Prove that lim inf n→∞ xn ≤ lim inf n→∞ yn .

2.3. Cauchy Sequences

We now turn to question (2.d). Can we find a limit-free formulation of convergence? To answer this question, let us see what happens when a sequence $\{x_n\}$ converges to some real number $x$. As we now easily understand, this assumption forces the terms $x_n$ to gather around $x$ on the real line. This, in turn, shows that the terms should become closer and closer to each other as their index tends to infinity. This intuitive interpretation can be stated in a mathematically rigorous way using the $\varepsilon$–$N$ approach to the definition of convergence. To do so, assume that $\varepsilon > 0$ is given and choose $N \in \mathbb{N}$ such that $|x_n - x| < \varepsilon/2$ for every $n \ge N$. Then for natural numbers $m$ and $n$ satisfying $m > n \ge N$, the triangle inequality yields
$$|x_n - x_m| \le |x_n - x| + |x - x_m| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$
Thus we observed that when a sequence $\{x_n\}$ converges, we can make its terms $x_n$ and $x_m$ as close to each other as we wish, provided that $m$ and $n$ are sufficiently large. Clearly, this consequence does not use the value of the limit $x$, and is merely an outgrowth of the convergence. The importance of this observation motivates us to consider the following definition.


Definition 2.75. A sequence $\{x_n\}$ is said to be Cauchy if the following statement is true. For every $\varepsilon > 0$, some natural number $N$ can be found such that $|x_m - x_n| < \varepsilon$ for all $m$ and $n$ satisfying $m > n \ge N$.

With this definition, our above discussion implies that every convergent sequence is Cauchy. It then follows from the contrapositive law that a sequence that is not Cauchy is necessarily divergent. Quite naturally, one may ask about the convergence of Cauchy sequences. Before discussing this point, we present some examples of Cauchy sequences.

Example 2.76. In each case determine if the given sequence is Cauchy.
(1) $x_n = 1 + 1/2^2 + \cdots + 1/n^2$.
(2) $y_n = 1 + 1/2 + \cdots + 1/n$.
(3) $z_n = (-1)^n$.

Solution. Let $\varepsilon > 0$ be given.

(1) We claim that $\{x_n\}$ is a Cauchy sequence. If $m$ and $n$ are natural numbers with $m > n$, then
$$|x_m - x_n| = \frac{1}{(n+1)^2} + \frac{1}{(n+2)^2} + \cdots + \frac{1}{m^2}$$
$$< \left(\frac{1}{n} - \frac{1}{n+1}\right) + \left(\frac{1}{n+1} - \frac{1}{n+2}\right) + \cdots + \left(\frac{1}{m-1} - \frac{1}{m}\right)$$
$$= \frac{1}{n} - \frac{1}{m} < \frac{1}{n} + \frac{1}{m}.$$
Note that the first inequality follows from
$$\frac{1}{k^2} < \frac{1}{k-1} - \frac{1}{k},$$
which is valid for every natural number $k > 1$. Now, if we choose $N$ so large that $1/N < \varepsilon/2$, then $|x_m - x_n| < \varepsilon$ for every $m > n \ge N$.

(2) For every $n \in \mathbb{N}$,
$$|y_{2n} - y_n| = y_{2n} - y_n = \frac{1}{n+1} + \cdots + \frac{1}{n+n} \ge n\left(\frac{1}{2n}\right) = \frac{1}{2}.$$
Thus for $\varepsilon = 1/2$, $|y_m - y_n| < \varepsilon$ cannot be true for all sufficiently large $m$ and $n$. This shows that $\{y_n\}$ is not a Cauchy sequence.

(3) If $m$ and $n$ are distinct natural numbers, then $|z_m - z_n|$ is either 0 or 2. In fact, if $m$ and $n$ are both even or both odd, then $|z_m - z_n| = 0$. Otherwise $|z_m - z_n|$ will be equal to 2. This tells us that $|z_m - z_n| < 1$ cannot be true for all sufficiently large $m$ and $n$. Accordingly, the sequence $\{(-1)^n\}$ is not Cauchy.

As we mentioned earlier, a sequence that is not Cauchy must be divergent. Thus, the sequences in (2) and (3) of the above example are divergent. (Note that the divergence of $\{(-1)^n\}$ was known to us.) But it is still not possible to determine the convergence or divergence of the sequence in (1). To overcome this deficiency, we need to answer our question about the relation of Cauchy's condition and convergence. To this end, we first prove the following useful lemma.


Lemma 2.77. Every Cauchy sequence is bounded.

Proof. Let $\{x_n\}$ be a Cauchy sequence. For $\varepsilon = 1$, we find $N \in \mathbb{N}$ such that for every $n \ge N$, $|x_n - x_N| < 1$. This implies that for all such $n$, $|x_n| \le 1 + |x_N|$. Thus, if we let $M = \max\{|x_1|, \ldots, |x_{N-1}|, 1 + |x_N|\}$, then $|x_n| \le M$ for every $n \in \mathbb{N}$. This means that $\{x_n\}$ is bounded.

By the contrapositive law, the above lemma is equivalent to the fact that unbounded sequences are not Cauchy. The sequence $\{(-1)^n\}$ shows that a bounded sequence may fail to be Cauchy.

Theorem 2.78. Every Cauchy sequence is convergent.

Proof. Let $\{x_n\}$ be a Cauchy sequence. Since $\{x_n\}$ is bounded by Lemma 2.77, it has a convergent subsequence $\{x_{n_k}\}$ by the Bolzano–Weierstrass theorem. Denote the limit of this subsequence by $x$. We claim that $\{x_n\}$ also converges to $x$. To see this, let $\varepsilon > 0$ be arbitrary. Since $\{x_{n_k}\}$ converges to $x$, $k_0 \in \mathbb{N}$ can be found such that $|x_{n_k} - x| < \varepsilon/2$ for every $k \ge k_0$. Also, that $\{x_n\}$ is Cauchy implies that $|x_n - x_m| < \varepsilon/2$ for every $m$ and $n$ greater than or equal to some $N \in \mathbb{N}$. Now, choose $M \ge k_0$ such that $n_M > N$. Then for every $n \ge N$,
$$|x_n - x| \le |x_n - x_{n_M}| + |x_{n_M} - x| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$
This completes the proof that $\{x_n\}$ converges to $x$.

The following useful fact was contained in the proof of the above theorem. Notice the similarity between this fact and Proposition 2.57.

Proposition 2.79. If some subsequence of a Cauchy sequence converges to a real number $x$, then the sequence itself also converges to $x$.

The answer to question (2.d). As a result of Theorem 2.78, we see that a sequence is convergent if and only if it is Cauchy. This shows that Cauchy's condition is the limit-free formulation of convergence we were looking for and answers question (2.d) in the affirmative. You will understand the importance of this limit-free approach to convergence in Section 2.5, where Cauchy's criterion is used in the proof of some of our main theorems.
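The two estimates behind Example 2.76 can be checked numerically. The sketch below is only an illustration in floating-point arithmetic, not a proof; the function names `x` and `y` mirror the sequences of that example, and the choice $m = 2n$ is an assumption made for the demonstration.

```python
def x(n):
    """n-th term of Example 2.76(1): 1 + 1/2^2 + ... + 1/n^2."""
    return sum(1 / k**2 for k in range(1, n + 1))


def y(n):
    """n-th term of Example 2.76(2): 1 + 1/2 + ... + 1/n."""
    return sum(1 / k for k in range(1, n + 1))


for n in (10, 100, 1000):
    m = 2 * n
    # Columns: n, |x_m - x_n|, the bound 1/n - 1/m, and |y_m - y_n|.
    print(n, x(m) - x(n), 1 / n - 1 / m, y(m) - y(n))
```

The second column stays below the third and shrinks as $n$ grows, which is the Cauchy behavior of $\{x_n\}$, while the last column never drops below $1/2$, which is why $\{y_n\}$ fails Cauchy's condition.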

2.4. Sequences in Closed and Bounded Intervals

When you finish studying this section, you will find that it is one of the shortest sections in the whole book. Nevertheless, this does not mean that the present material is not important. This can be described by a Persian proverb: Don't see how tiny the pepper is; break it and see how hot it is! More precisely, this section is, in spite of its tiny body, one of the most important sections of the first part of the book!


Proposition 2.80. If $\{y_n\}$ is a sequence in $[a, b]$ which converges to some $y \in \mathbb{R}$, then $y \in [a, b]$.

Proof. Assume to the contrary that $y \notin [a, b]$. If $y < a$, let $\varepsilon = (a - y)/2$, and when $y > b$, let $\varepsilon = (y - b)/2$. Then in either case, the neighborhood of $y$ with radius $\varepsilon$ contains none of the $y_n$'s. This contradicts our assumption that $\{y_n\}$ converges to $y$.

What does Proposition 2.80 say? Proposition 2.80 says that an interval of the form $[a, b]$ contains the limit of any convergent sequence of its elements. When you proceed further in the book, you will see that this property is described by saying that $[a, b]$ is a closed subset of $\mathbb{R}$. Note that an arbitrary interval may not be closed in this sense. For example, the sequence $\{1/(2n)\}$ is a convergent sequence of elements of $(0, 1)$ whose limit lies outside this interval, and this shows that $(0, 1)$ is not a closed set in the sense we just discussed.

Theorem 2.81. Every sequence in $[a, b]$ has a subsequence that converges to some point of $[a, b]$.

Proof. If $\{y_n\}$ is a sequence in $[a, b]$, it is bounded. So by the Bolzano–Weierstrass theorem $\{y_n\}$ has a subsequence that converges to some $y \in \mathbb{R}$. Proposition 2.80 tells us that $y \in [a, b]$, and this completes the proof.

A note on Theorem 2.81. Theorem 2.81 establishes a property for intervals of the form $[a, b]$ which is of great importance in analysis. Because of this property we say that $[a, b]$ is a compact subset of $\mathbb{R}$. Again, the interval $(0, 1)$ shows that an arbitrary interval may not be a compact set: the sequence $\{1/(2n)\}$ of elements of $(0, 1)$ has no convergent subsequence in this interval.

If $A$ is a subset of $\mathbb{R}$ and $\{x_n\}$ is a sequence of elements of $A$, then we say that $\{x_n\}$ converges in $A$ if this sequence converges to some element $x$ of $A$. Otherwise, we say that $\{x_n\}$ diverges in $A$. For example, the sequence $\{(\sin n)/n\}$ converges to 0 in $\mathbb{R}$, but it is divergent in any set which contains all of its terms but does not contain 0, the set $\mathbb{R} \setminus \{0\}$ for example (note that $\sin n \ne 0$ for every $n \in \mathbb{N}$).

Theorem 2.82. In $[a, b]$, every Cauchy sequence converges.

Proof. Let $\{y_n\}$ be a Cauchy sequence of elements of $[a, b]$. By Theorem 2.81, $\{y_n\}$ has a subsequence that converges to some $y \in [a, b]$. Now, it follows from Proposition 2.79 that $\{y_n\}$ also converges to $y$.


A note on Theorem 2.82. Theorem 2.82 proves another crucial property for intervals of the form [a, b]. This is described by saying that such intervals are complete spaces. Since the Cauchy sequence {1/(2n)} diverges in (0, 1), we find that an arbitrary interval may fail to be complete in this sense. Please do not worry if you are not able to completely understand what is discussed in the above three boxes. You will understand what is going on shortly!

2.5. Series: Revisiting Some Convergence Tests

One particular class of sequences is of great importance in calculus, and hence in analysis. This consists of sequences of sums of real numbers, known as series. To describe what is meant by a series rigorously, let $\{a_n\}$ be an arbitrary sequence. We associate to $\{a_n\}$ a sequence $\{s_n\}$ whose $n$th term is the sum of the first $n$ terms of $\{a_n\}$. More precisely, $s_n = a_1 + \cdots + a_n$ for every $n$. The sequence $\{s_n\}$ may converge or diverge, depending on the original sequence $\{a_n\}$. When it converges to a real number $s$, we say that the series $\sum_{n=1}^{\infty} a_n$ converges to $s$ and we write $\sum_{n=1}^{\infty} a_n = s$. In this case we also say that $s$ is the value of $\sum_{n=1}^{\infty} a_n$. If $\{s_n\}$ is divergent, we say that the series diverges. The sequence $\{s_n\}$ is called the sequence of partial sums of $\sum_{n=1}^{\infty} a_n$, regardless of whether the series is convergent or divergent.

Exercise 2.83. Suppose $\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} b_n$ are convergent series. Prove that
(1) $\sum_{n=1}^{\infty} (a_n + b_n) = \sum_{n=1}^{\infty} a_n + \sum_{n=1}^{\infty} b_n$; and
(2) for every real number $\lambda$, $\sum_{n=1}^{\infty} \lambda a_n = \lambda \sum_{n=1}^{\infty} a_n$.

What we discussed above is the precise definition of series. But did the argument persuade you to study series? If not, here is an argument that will (hopefully) be persuasive. In what follows we aim to interpret the sequence $\{s_n\}$ of partial sums of the series $\sum_{n=1}^{\infty} 1/2^n$ geometrically. The first term $s_1 = 1/2$ can be considered to be the area of a rectangle $R_1$ of height 1 and width $1/2$. The second term $s_2 = 1/2 + 1/2^2$ can be thought of as the area of a shape $R_2$ which is obtained from $R_1$ by adjoining to it a square of side $1/2$. Having constructed the shapes $R_1, \ldots, R_{n-1}$ for $n > 2$ such that the area of $R_i$ is equal to $s_i$ for $i = 1, \ldots, n - 1$, we construct a shape $R_n$ by adjoining to $R_{n-1}$ a rectangle (possibly a square) with area $1/2^n$. As can be seen in Figure 1, the shapes $R_n$ become arbitrarily close to a square $R$ with sides of length 1. This means that the area $s_n$ of the shape $R_n$ can be arbitrarily close to 1, the area of $R$, when $n$ is sufficiently large. Analytically, this can be written as
$$\lim_{n\to\infty} \left(\frac{1}{2} + \frac{1}{2^2} + \cdots + \frac{1}{2^n}\right) = 1 \tag{2.15}$$
or with our above conventions as
$$\sum_{n=1}^{\infty} \frac{1}{2^n} = 1. \tag{2.16}$$

Figure 1. Illustrating the sequence of partial sums of the series $\sum_{n=1}^{\infty} 1/2^n$.

Although 1 is the limit of a sequence of finite sums by (2.15), the above geometric argument leads us to the (wrong) idea that 1 can be obtained by adding together the terms of the sequence $\{1/2^n\}$, and this is the idea behind (2.16), in which 1 is written as an infinite sum. Of course, it should be noted that (2.16) is just a convention for (2.15). The same can be said in the general case for the notation $\sum_{n=1}^{\infty} a_n$, which is just another way of representing the limit of the sequence $\{s_n\}$ of partial sums.

Before presenting our first examples, it is convenient to note that we may work with series whose index starts from 0. In such cases a sequence $\{a_n\}$ exists in which $n$ takes values in the set $\mathbb{N} \cup \{0\}$ and the series is written as $\sum_{n=0}^{\infty} a_n$. The sequence of partial sums then will be $s_0 = a_0$, $s_1 = a_0 + a_1$, $s_2 = a_0 + a_1 + a_2$, and so on. Item (4) of Example 2.84 is an instance of such series. Similarly, the index of a series can start from a number $m > 1$. For example, when we wish to form a series from the sequence $\{1/\ln n\}$, $n$ cannot start from 1, because $\ln 1 = 0$. Thus we think of the series $\sum_{n=2}^{\infty} 1/\ln n$. Although we will state most of our results and definitions for series of the form $\sum_{n=1}^{\infty} a_n$, they all can be applied to series of the forms $\sum_{n=0}^{\infty} a_n$ and $\sum_{n=m}^{\infty} a_n$ with $m > 1$.

Example 2.84. Determine the convergence or divergence of the following series.
(1) $\sum_{n=1}^{\infty} 1/n$.
(2) $\sum_{n=1}^{\infty} 1/(n(n + 1))$.
(3) $\sum_{n=1}^{\infty} 1/n^2$.
(4) $\sum_{n=0}^{\infty} a^n$, where $a$ is an arbitrary real number.

Solution. (1) By Example 2.76(2), the sequence of partial sums of the series $\sum_{n=1}^{\infty} 1/n$ is not Cauchy and is therefore divergent.

(2) Since $1/(n(n+1)) = 1/n - 1/(n+1)$ for each $n$, the $n$th term of the sequence of partial sums is
$$t_n = \left(1 - \frac{1}{2}\right) + \left(\frac{1}{2} - \frac{1}{3}\right) + \cdots + \left(\frac{1}{n} - \frac{1}{n+1}\right) = 1 - \frac{1}{n+1}.$$


Since $\{t_n\}$ converges to 1, the given series also converges to 1.

(3) We saw in Example 2.76(1) that the sequence of partial sums of this series is Cauchy and therefore convergent.

(4) Series of this form are known as geometric series. It can be easily verified that the geometric series diverges when $|a| = 1$. So we assume that $|a| \ne 1$. As we proved by induction in Example 1.33, the $n$th partial sum of this series is
$$\frac{1 - a^{n+1}}{1 - a}. \tag{2.17}$$
By Example 2.15(4) and Example 2.21(4), the sequence $\{a^n\}$ converges if and only if $|a| < 1$ (of course with the assumption $|a| \ne 1$), and in this case $\lim_{n\to\infty} a^n = 0$. So by letting $n$ tend to infinity in (2.17), we see that the series $\sum_{n=0}^{\infty} a^n$ converges if and only if $|a| < 1$, and that in this case
$$\sum_{n=0}^{\infty} a^n = \frac{1}{1 - a}.$$
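Formula (2.17) and the limiting value $1/(1-a)$ are easy to compare with directly computed partial sums. The following is a minimal floating-point sketch; the function names `geometric_partial_sum` and `closed_form` and the sample ratios are assumptions chosen for illustration.

```python
def geometric_partial_sum(a, n):
    """s_n = a^0 + a^1 + ... + a^n, added up term by term."""
    return sum(a**k for k in range(n + 1))


def closed_form(a, n):
    """The expression (1 - a^(n+1)) / (1 - a) from (2.17); requires a != 1."""
    return (1 - a ** (n + 1)) / (1 - a)


for a in (0.5, -0.9, 1.1):
    for n in (5, 50, 500):
        # Columns: ratio, n, direct partial sum, value given by (2.17).
        print(a, n, geometric_partial_sum(a, n), closed_form(a, n))
```

For the ratios $0.5$ and $-0.9$ the two columns settle near $1/(1-a)$, whereas for $1.1$ they grow without bound, in line with the condition $|a| < 1$.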

An alternative proof for the divergence of $\sum_{n=1}^{\infty} 1/n$ is sketched in Exercise 23 at the end of this chapter.

Example 2.85. Prove that $\sum_{n=1}^{\infty} 1/(2n - 1)^2$ converges. Then assume that
$$\sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6},$$
and find the value of $\sum_{n=1}^{\infty} 1/(2n - 1)^2$.

Solution. Let $\{s_n\}$ and $\{t_n\}$ be the sequences of partial sums associated to the series $\sum_{n=1}^{\infty} 1/n^2$ and $\sum_{n=1}^{\infty} 1/(2n - 1)^2$, respectively. It is then easy to see that for every $n$,
$$t_n = s_{2n} - \left(\frac{1}{2^2} + \cdots + \frac{1}{(2n)^2}\right) = s_{2n} - \frac{1}{4} s_n. \tag{2.18}$$
The convergence of $\{t_n\}$ now follows from that of $\{s_n\}$ and (2.18). Since $\{s_n\}$ converges to $\pi^2/6$, by letting $n$ tend to infinity in (2.18), we obtain
$$\sum_{n=1}^{\infty} \frac{1}{(2n - 1)^2} = \frac{\pi^2}{6} - \frac{1}{4}\left(\frac{\pi^2}{6}\right) = \frac{\pi^2}{8}.$$

Exercise 2.86. Let $\{b_n\}$ be a convergent sequence with limit $b$. Prove that the series $\sum_{n=1}^{\infty} (b_n - b_{n+1})$ converges to $b_1 - b$.

The Remainders of a Series. For a series $\sum_{n=1}^{\infty} a_n$ and a given natural number $k > 1$, we may obtain a series of the form $\sum_{n=1}^{\infty} a_{n+k}$ (or equivalently $\sum_{n=k+1}^{\infty} a_n$) by neglecting the first $k$ terms of the original series. We call $\sum_{n=1}^{\infty} a_{n+k}$ the remainder after $k$ terms of $\sum_{n=1}^{\infty} a_n$.

If we denote by $\{s_n\}$ and $\{t_n\}$ the sequences of partial sums of the series $\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} a_{n+k}$, respectively, then for every $n$, $t_n = s_{k+n} - s_k$.


Now suppose that $\sum_{n=1}^{\infty} a_n$ converges to a real number $s$. Then letting $n$ tend to infinity in the above equality, we find that
$$\sum_{n=1}^{\infty} a_{n+k} = s - s_k. \tag{2.19}$$
If we denote the remainder $\sum_{n=1}^{\infty} a_{n+k}$ by $R_k$, then it follows from (2.19) by letting $k$ tend to infinity that
$$\lim_{k\to\infty} R_k = 0.$$
In summary, when a series $\sum_{n=1}^{\infty} a_n$ converges, the sequence $\{R_k\}$ of its remainders converges to 0. In particular, we obtain the following lemma which will be used in the proof of Theorem 2.107.

Lemma 2.87. If $\sum_{n=1}^{\infty} a_n$ is a convergent series, then for every $\varepsilon > 0$ some $k \in \mathbb{N}$ can be found such that
$$\left|\sum_{n=k+1}^{\infty} a_n\right| < \varepsilon.$$

What does Lemma 2.87 say? Lemma 2.87 says that a series converges only if the sum of its terms can be arbitrarily small for sufficiently large n.
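For the series $\sum_{n=1}^{\infty} 1/2^n$, whose value is 1 by (2.16), the remainders can be computed exactly from the relation $R_k = s - s_k$. The sketch below assumes that value and uses exact rational arithmetic; the function names `partial_sum` and `remainder` are hypothetical.

```python
from fractions import Fraction


def partial_sum(k):
    """s_k = 1/2 + 1/2^2 + ... + 1/2^k as an exact fraction."""
    return sum(Fraction(1, 2**n) for n in range(1, k + 1))


def remainder(k):
    """R_k = s - s_k, with s = 1 taken from (2.16)."""
    return 1 - partial_sum(k)


for k in (1, 5, 10, 20):
    print(k, partial_sum(k), remainder(k))
```

Each remainder equals $1/2^k$, so for any given $\varepsilon > 0$ it suffices to take $k$ with $1/2^k < \varepsilon$.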

Some Basic Convergence Tests. One important task in the theory of real series is to determine the convergence or divergence of a given series. Since this is not always as easy as we saw in Example 2.84, we need some general convergence tests. We begin with a necessary condition for the convergence of real series.

Proposition 2.88. If $\sum_{n=1}^{\infty} a_n$ converges, then the sequence $\{a_n\}$ converges to 0.

Proof. Let $\{s_n\}$ be the sequence of partial sums. Then, $a_n = s_n - s_{n-1}$ for $n > 1$ and hence
$$\lim_{n\to\infty} a_n = \lim_{n\to\infty} s_n - \lim_{n\to\infty} s_{n-1} = 0.$$

What does Proposition 2.88 say? The convergence of a sequence $\{a_n\}$ to 0 implies that the terms $a_n$ can be arbitrarily small for all sufficiently large $n$. Thus, Proposition 2.88 says that for a sequence $\{a_n\}$ whose terms cannot be arbitrarily small, the series $\sum_{n=1}^{\infty} a_n$ is necessarily divergent.

Although the sequence $\{1/n\}$ converges to 0, the series $\sum_{n=1}^{\infty} 1/n$ is divergent by Example 2.84(1). This shows that the convergence of $\{a_n\}$ to 0 is just a necessary condition for the convergence of $\sum_{n=1}^{\infty} a_n$ which is by no means sufficient.
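The gap between "necessary" and "sufficient" can be seen numerically for the harmonic series. The sketch below relies on the estimate $y_{2n} - y_n \ge 1/2$ from Example 2.76(2), which gives $s_{2^j} \ge 1 + j/2$; the helper name `partial_sums` and the cut-off $2^{16}$ are assumptions made for the illustration.

```python
def partial_sums(term, count):
    """Return the list [s_1, ..., s_count] for the series with n-th term term(n)."""
    sums, total = [], 0.0
    for n in range(1, count + 1):
        total += term(n)
        sums.append(total)
    return sums


harmonic = partial_sums(lambda n: 1 / n, 2**16)
for j in (4, 8, 12, 16):
    n = 2**j
    # Columns: n, the term 1/n, the partial sum s_n, and the lower bound 1 + j/2.
    print(n, 1 / n, harmonic[n - 1], 1 + j / 2)
```

The terms shrink toward 0 while the partial sums keep exceeding the growing lower bound, so the terms tending to 0 does not by itself force convergence.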


Example 2.89. Determine the convergence or divergence of each series.
(1) $\sum_{n=1}^{\infty} (2n - 1)/(n + 5)$.
(2) $\sum_{n=1}^{\infty} (-1)^n (2 + 3/n)$.
(3) $\sum_{n=1}^{\infty} (\sin^4 n)/n^2$.

Solution. (1) Since the sequence $\{(2n - 1)/(n + 5)\}$ converges to 2, the series does not satisfy the necessary condition of Proposition 2.88 and, accordingly, it is divergent.

(2) The sequence $\{(-1)^n (2 + 3/n)\}$ is divergent, because its limit superior and limit inferior are 2 and $-2$, respectively. Thus, the given series is also divergent.

(3) Since $\{\sin^4 n\}$ is bounded and $\lim_{n\to\infty} 1/n^2 = 0$, $\lim_{n\to\infty} (\sin^4 n)/n^2 = 0$ by Proposition 2.23. So, this series satisfies the necessary condition for convergence. Based on what we have learned so far, it is not possible to determine the convergence or divergence of this series.

To determine the convergence or divergence of series like the one in Example 2.89(3), we present the following necessary and sufficient condition.

Theorem 2.90. A series with nonnegative terms is convergent if and only if its sequence of partial sums is bounded from above.

Proof. Suppose $\sum_{n=1}^{\infty} a_n$ is a series with nonnegative terms and that $\{s_n\}$ is its associated sequence of partial sums. Since $a_{n+1} \ge 0$, $s_{n+1} = s_n + a_{n+1} \ge s_n$ for every $n$. This shows that $\{s_n\}$ is an increasing sequence. Therefore, $\{s_n\}$ converges if and only if it is bounded from above, by Theorem 2.31 and Theorem 2.20.

Now, we can easily solve Example 2.89(3).

Example 2.91. The series $\sum_{n=1}^{\infty} (\sin^4 n)/n^2$ is convergent. To see this, denote the sequence of partial sums of this series and that of $\sum_{n=1}^{\infty} 1/n^2$ by $\{s_n\}$ and $\{t_n\}$, respectively. Since $\sin^4 n \le 1$, $(\sin^4 n)/n^2 \le 1/n^2$ and hence $s_n \le t_n$ for every $n$. Since $\sum_{n=1}^{\infty} 1/n^2$ is convergent by Example 2.84(3), $\{t_n\}$ is convergent and accordingly bounded. Thus, it follows from the last inequality that $\{s_n\}$ is also bounded. Now, Theorem 2.90 says that $\sum_{n=1}^{\infty} (\sin^4 n)/n^2$ is convergent.
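The inequality $s_n \le t_n$ used in Example 2.91 can be observed numerically. The sketch below is a floating-point illustration only; it takes the value $\pi^2/6$ for $\sum_{n=1}^{\infty} 1/n^2$ as an assumption, as Example 2.85 does, and the names `partial_sum`, `a`, `b` and `bound` are hypothetical.

```python
import math


def partial_sum(term, n):
    """Sum of term(1) + ... + term(n)."""
    return sum(term(k) for k in range(1, n + 1))


def a(k):
    # Terms of the series in Example 2.91.
    return math.sin(k) ** 4 / k**2


def b(k):
    # Terms of the comparison series.
    return 1 / k**2


bound = math.pi**2 / 6  # assumed value of the comparison series
for n in (10, 100, 1000, 10000):
    s_n, t_n = partial_sum(a, n), partial_sum(b, n)
    print(n, s_n, t_n, s_n <= t_n <= bound)
```

The partial sums $s_n$ increase but stay below $t_n$, and hence below a fixed bound, which is exactly the situation covered by Theorem 2.90.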
The method we just used to prove the convergence of the series $\sum_{n=1}^{\infty} (\sin^4 n)/n^2$ was to compare its terms with those of another series whose convergence or divergence is already known to us. The following useful theorem considers this comparison method in a general framework.

Theorem 2.92 (The Comparison Test). Let $\{a_n\}$ and $\{b_n\}$ be sequences of nonnegative real numbers.
(1) If $\sum_{n=1}^{\infty} a_n$ is convergent and $b_n \le a_n$ for all sufficiently large $n$, then $\sum_{n=1}^{\infty} b_n$ is also convergent.
(2) If $\sum_{n=1}^{\infty} a_n$ is divergent and $b_n \ge a_n$ for all sufficiently large $n$, then $\sum_{n=1}^{\infty} b_n$ is also divergent.

Proof. (1) Denote the sequences of partial sums of $\sum_{n=1}^{\infty} b_n$ and $\sum_{n=1}^{\infty} a_n$ by $\{s_n\}$ and $\{t_n\}$, respectively. Also, assume that $b_n \le a_n$ holds for every $n \ge k$. Let $\varepsilon > 0$ be given. Since $\{t_n\}$ is a Cauchy sequence, $N \in \mathbb{N}$ can be found such that $|t_m - t_n| < \varepsilon$ for all $m$ and $n$ satisfying $m > n \ge N$. Now, for all $m$ and $n$ satisfying $m > n \ge \max\{k, N\}$,
$$|s_m - s_n| = b_{n+1} + \cdots + b_m \le a_{n+1} + \cdots + a_m = |t_m - t_n| < \varepsilon.$$
This shows that $\{s_n\}$ is a Cauchy sequence, and hence that $\sum_{n=1}^{\infty} b_n$ is convergent.

(2) If $\sum_{n=1}^{\infty} b_n$ were convergent, we would find by (1) that $\sum_{n=1}^{\infty} a_n$ is also convergent, contradicting our assumption.

Notice the way the limit-free formulation of convergence (Cauchy's condition) is used in the proof of the above theorem.

What does the comparison test say? Theorem 2.92(1) says that a series whose terms cannot grow faster than those of a convergent series is necessarily convergent. The second assertion can be interpreted similarly.

Example 2.93. Discuss the convergence or divergence of the given series.
(1) $\sum_{n=1}^{\infty} 1/\sqrt{n}$.
(2) $\sum_{n=1}^{\infty} 1/n^3$.
(3) $\sum_{n=1}^{\infty} [0.2\,n]/n^3$.

Solution. (1) Since $1/\sqrt{n} \ge 1/n$ for every $n \in \mathbb{N}$ and $\sum_{n=1}^{\infty} 1/n$ is divergent, the given series is divergent by Theorem 2.92(2).

(2) Since $1/n^3 \le 1/n^2$ for each $n \in \mathbb{N}$ and $\sum_{n=1}^{\infty} 1/n^2$ is convergent, the given series is convergent by Theorem 2.92(1).

(3) Since $[0.2\,n] \le 0.2\,n < n$ for every $n \in \mathbb{N}$, $[0.2\,n]/n^3 < 1/n^2$ for all such $n$. The given series is therefore convergent by the comparison test.

We saw that $\sum_{n=1}^{\infty} 1/\sqrt{n}$ and $\sum_{n=1}^{\infty} 1/n$ are divergent, while $\sum_{n=1}^{\infty} 1/n^2$ and $\sum_{n=1}^{\infty} 1/n^3$ are convergent. This leads us to a natural question: For which values of $p$ is the series $\sum_{n=1}^{\infty} 1/n^p$ convergent? To answer this question, we first prove a useful convergence test which is due to Cauchy.

Theorem 2.94. Let $\{x_n\}$ be a decreasing sequence of nonnegative real numbers. Then, the series $\sum_{n=1}^{\infty} x_n$ converges if and only if the series $\sum_{k=0}^{\infty} 2^k x_{2^k}$ is convergent.

Proof. By Theorem 2.90, to prove that each of the series is convergent, it is enough to show that the associated sequence of partial sums is bounded from above. So, let $\{s_n\}$ and $\{t_k\}$ be the sequences of partial sums for $\sum_{n=1}^{\infty} x_n$ and $\sum_{k=0}^{\infty} 2^k x_{2^k}$, respectively.


First assume that $\sum_{k=0}^{\infty} 2^k x_{2^k}$ is convergent and let $M$ be an upper bound for $\{t_k\}$. For arbitrary $n \in \mathbb{N}$, choose some natural number $k > n$. Then $n < 2^k$ and
\begin{align*}
s_n &= x_1 + \cdots + x_n \\
    &\le x_1 + \cdots + x_{2^k} \\
    &\le x_1 + \cdots + x_{2^{k+1}-1} \\
    &= x_1 + (x_2 + x_3) + \cdots + (x_{2^k} + \cdots + x_{2^{k+1}-1}) \\
    &\le x_1 + 2x_2 + \cdots + 2^k x_{2^k} = t_k \le M.
\end{align*}
Thus, $\sum_{n=1}^{\infty} x_n$ converges. Note that the first inequality of the last row follows from the assumption that $\{x_n\}$ is a decreasing sequence.

To prove the converse, suppose that $\sum_{n=1}^{\infty} x_n$ is convergent and let $N$ be an upper bound for $\{s_n\}$. For a given $k \in \mathbb{N}$, find $n \in \mathbb{N}$ such that $n > 2^k$. Then
\begin{align*}
s_n &= x_1 + \cdots + x_n \\
    &\ge x_1 + \cdots + x_{2^k} \\
    &= x_1 + x_2 + (x_3 + x_4) + \cdots + (x_{2^{k-1}+1} + \cdots + x_{2^k}) \\
    &\ge \tfrac{1}{2} x_1 + x_2 + 2x_4 + \cdots + 2^{k-1} x_{2^k} = \tfrac{1}{2} t_k.
\end{align*}
Hence, $t_k \le 2s_n \le 2N$. This shows that the series $\sum_{k=0}^{\infty} 2^k x_{2^k}$ is convergent. $\square$
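The two estimates used in the proof of Theorem 2.94 can be checked numerically. The following is a minimal sketch, not part of the original text: the function names are hypothetical, and exact rational arithmetic is used so that the comparisons are not affected by rounding. It checks $s_n \le t_k$ for $n = 2^{k+1}-1$ and $t_k \le 2s_n$ for $n = 2^k$.

```python
from fractions import Fraction


def partial_sum(x, n):
    """Return s_n = x(1) + ... + x(n)."""
    return sum(x(i) for i in range(1, n + 1))


def condensed_partial_sum(x, k):
    """Return t_k = x(1) + 2*x(2) + 4*x(4) + ... + 2^k * x(2^k)."""
    return sum(2**j * x(2**j) for j in range(k + 1))


def condensation_bounds_hold(x, k):
    """Check the two estimates from the proof for one value of k.

    First half of the proof:  s_n <= t_k    with n = 2^(k+1) - 1.
    Second half of the proof: t_k <= 2*s_n  with n = 2^k.
    Both assume that x is decreasing and nonnegative.
    """
    t_k = condensed_partial_sum(x, k)
    return (partial_sum(x, 2**(k + 1) - 1) <= t_k
            and t_k <= 2 * partial_sum(x, 2**k))


if __name__ == "__main__":
    harmonic = lambda n: Fraction(1, n)       # x_n = 1/n
    squares = lambda n: Fraction(1, n * n)    # x_n = 1/n^2
    for k in range(1, 8):
        print(k,
              float(condensed_partial_sum(harmonic, k)),
              float(condensed_partial_sum(squares, k)),
              condensation_bounds_hold(harmonic, k),
              condensation_bounds_hold(squares, k))
```

For $x_n = 1/n$ the condensed partial sums grow without bound, while for $x_n = 1/n^2$ they stay bounded, in line with the statement of the theorem.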



Theorem 2.94 determines the convergence situation of a series $\sum_{n=1}^{\infty} x_n$ using the relatively small subsequence $\{x_{2^k}\}$ of $\{x_n\}$. As an application of this theorem, we find those values of $p$ for which the series $\sum_{n=1}^{\infty} 1/n^p$ converges.

Theorem 2.95. The series $\sum_{n=1}^{\infty} 1/n^p$ converges if and only if $p > 1$.

Proof. If $p \le 0$, then it can be easily verified that $\lim_{n\to\infty} 1/n^p \ne 0$. So, the series is divergent in this case. If $p > 0$, the sequence $\{1/n^p\}$ is decreasing and we may use Theorem 2.94 above. To this end, we consider
\[ \sum_{k=0}^{\infty} 2^k \frac{1}{(2^k)^p} = \sum_{k=0}^{\infty} \left(2^{1-p}\right)^k, \]
which is a geometric series. From Example 2.84(4) we know that this is convergent if and only if $2^{1-p} < 1$. The proof then finishes by noticing that the last inequality holds if and only if $1 - p < 0$. $\square$

Absolute Convergence. Thus far, we have presented some convergence tests for series whose terms are nonnegative. But what if the terms can be negative? Before stating more general convergence tests, we introduce a condition for series which is stronger than their convergence. This is absolute convergence, which when restricted to nonnegative series coincides with the usual notion of convergence.

Definition 2.96. A series $\sum_{n=1}^{\infty} a_n$ is said to be absolutely convergent if the series $\sum_{n=1}^{\infty} |a_n|$ converges.

It is clear that a series with nonnegative terms is absolutely convergent if and only if it is convergent.


Example 2.97. Which of the following series are absolutely convergent? Which ones are not?
(1) $\sum_{n=1}^{\infty} (-1)^n/n$.
(2) $\sum_{n=1}^{\infty} (\cos n)/n^5$.
(3) $\sum_{n=1}^{\infty} (1 - \cos(\pi/n))$.

Solution. (1) Since $\sum_{n=1}^{\infty} |(-1)^n/n| = \sum_{n=1}^{\infty} 1/n$ is divergent, the given series is not absolutely convergent.
(2) For every $n$, $|(\cos n)/n^5| \le 1/n^5$. So, $\sum_{n=1}^{\infty} |(\cos n)/n^5|$ is convergent by the comparison test. The given series is therefore absolutely convergent.
(3) For every $x \in \mathbb{R}$, $1 - \cos x = 2\sin^2(x/2)$ and $|\sin x| \le |x|$. Thus
\[ \left| 1 - \cos\frac{\pi}{n} \right| = 2\sin^2\frac{\pi}{2n} \le \frac{\pi^2}{2} \cdot \frac{1}{n^2}. \]
Since $\sum_{n=1}^{\infty} 1/n^2$ is convergent, the given series is absolutely convergent.

If a series has some negative terms, the relation between its absolute convergence and convergence is still unclear. For example, what can be said about the convergence or divergence of the series we worked with in the above example? The following proposition answers this question in part.

Proposition 2.98. Every absolutely convergent series is convergent.

Proof. Let $\sum_{n=1}^{\infty} a_n$ be absolutely convergent. Denote by $\{s_n\}$ and $\{t_n\}$ the sequences of partial sums of $\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} |a_n|$, respectively. For natural numbers $m$ and $n$ with $m > n$, the triangle inequality yields
\[ |s_m - s_n| = \left| \sum_{k=n+1}^{m} a_k \right| \le \sum_{k=n+1}^{m} |a_k| = |t_m - t_n|. \]
Since $\{t_n\}$ is Cauchy, the above inequality shows that $\{s_n\}$ is also a Cauchy sequence. This proves that $\sum_{n=1}^{\infty} a_n$ is convergent. $\square$

We now find that the series $\sum_{n=1}^{\infty} (\cos n)/n^5$ and $\sum_{n=1}^{\infty} (1 - \cos(\pi/n))$ are convergent, because we proved in the above example that these are absolutely convergent. But the convergence or divergence of $\sum_{n=1}^{\infty} (-1)^n/n$ is still unknown. To resolve this ambiguity, we prove another convergence test which is due to Dirichlet.

Theorem 2.99 (Dirichlet's Test). Let the partial sums of $\sum_{n=1}^{\infty} a_n$ form a bounded sequence. If $\{b_n\}$ is a decreasing sequence converging to 0, then the series $\sum_{n=1}^{\infty} a_n b_n$ is convergent.

It now follows easily that $\sum_{n=1}^{\infty} (-1)^n/n$ is convergent. To see this, we use Dirichlet's test and the following facts.
(1) The sequence $\{1/n\}$ is decreasing and convergent to zero.
(2) The sequence $\{t_n\}$ of partial sums of $\sum_{n=1}^{\infty} (-1)^n$ is bounded. In fact, $t_n = -1$ when $n$ is odd and $t_n = 0$ if $n$ is even, so that $|t_n| \le 1$ for every $n$.


Thus, a series that is not absolutely convergent may be convergent. We will refer to such series as conditionally convergent.

Proof of Theorem 2.99. Denote the sequence of partial sums of $\sum_{n=1}^{\infty} a_n$ by $\{s_n\}$, and let $M$ be an upper bound for $\{|s_n|\}$. To compute the $n$th partial sum of $\sum_{n=1}^{\infty} a_n b_n$, we set $s_0 = 0$ to obtain
\begin{align*}
\sum_{k=1}^{n} a_k b_k &= \sum_{k=1}^{n} (s_k - s_{k-1}) b_k \\
&= \sum_{k=1}^{n} s_k b_k - \sum_{k=1}^{n} s_k b_{k+1} + s_n b_{n+1} \\
&= \sum_{k=1}^{n} s_k (b_k - b_{k+1}) + s_n b_{n+1}.
\end{align*}
Since $\{s_n b_{n+1}\}$ converges to 0 by Proposition 2.23, to prove the convergence of $\sum_{n=1}^{\infty} a_n b_n$ it is enough to show that $\left\{\sum_{k=1}^{n} s_k (b_k - b_{k+1})\right\}$ converges, or equivalently that the series $\sum_{k=1}^{\infty} s_k (b_k - b_{k+1})$ is convergent. That $\{b_n\}$ is decreasing implies
\[ |s_k (b_k - b_{k+1})| = |s_k| (b_k - b_{k+1}) \le M (b_k - b_{k+1}). \tag{2.20} \]
As you verified in Exercise 2.86, the series $\sum_{k=1}^{\infty} (b_k - b_{k+1})$ is convergent. Hence, it follows from (2.20) and the comparison test that the series $\sum_{k=1}^{\infty} s_k (b_k - b_{k+1})$ is absolutely convergent. This completes the proof. $\square$

Example 2.100. Prove that for every real number $t$ and every $p > 0$, the series $\sum_{n=1}^{\infty} (\sin nt)/n^p$ converges.

Solution. If $t = 2k\pi$ for some $k \in \mathbb{Z}$, then it is clear that the series converges to 0. So, we will assume in the sequel that $t \ne 2k\pi$ for every $k \in \mathbb{Z}$. Since $\{1/n^p\}$ is decreasing and convergent to 0 by the assumption $p > 0$, it is sufficient in view of Dirichlet's test to show that the sequence $\{s_n\}$ of partial sums of $\sum_{n=1}^{\infty} \sin nt$ is bounded. Since for arbitrary real numbers $a$ and $b$,
\[ \sin a \sin b = \frac{1}{2} \left( \cos(a - b) - \cos(a + b) \right), \]
a simple calculation shows that
\[ s_n \sin\frac{t}{2} = \frac{1}{2} \left( \cos\frac{t}{2} - \cos\frac{(2n+1)t}{2} \right). \]
This in turn implies
\[ \left| s_n \sin\frac{t}{2} \right| \le 1, \]
or equivalently,
\[ |s_n| \le \frac{1}{|\sin(t/2)|}. \]
Note that the assumption $t \ne 2k\pi$ is used here.
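The bound obtained in Example 2.100 is easy to test numerically. The sketch below is an illustration only, with hypothetical function names and floating-point arithmetic; it compares the partial sums of $\sum \sin nt$ with the closed form $s_n = \left(\cos(t/2) - \cos((2n+1)t/2)\right) / (2\sin(t/2))$ and with the bound $1/|\sin(t/2)|$, assuming $t$ is not a multiple of $2\pi$.

```python
import math


def sine_partial_sums(t, n_max):
    """Return [s_1, ..., s_{n_max}] with s_n = sin(t) + sin(2t) + ... + sin(nt)."""
    sums, total = [], 0.0
    for n in range(1, n_max + 1):
        total += math.sin(n * t)
        sums.append(total)
    return sums


def closed_form(t, n):
    """Closed form of s_n from Example 2.100 (t must not be a multiple of 2*pi)."""
    return (math.cos(t / 2) - math.cos((2 * n + 1) * t / 2)) / (2 * math.sin(t / 2))


def dirichlet_bound(t):
    """The bound 1/|sin(t/2)| on |s_n| (t must not be a multiple of 2*pi)."""
    return 1.0 / abs(math.sin(t / 2))


if __name__ == "__main__":
    t = 1.0
    sums = sine_partial_sums(t, 500)
    print("largest |s_n| for n <= 500:", max(abs(s) for s in sums))
    print("bound 1/|sin(t/2)|:", dirichlet_bound(t))
```

The bound deteriorates as $t$ approaches a multiple of $2\pi$, which is where the assumption $t \ne 2k\pi$ enters.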

Example 2.101. Suppose $a_n = 1$ if $n = 3k - 2$, $a_n = 2$ when $n = 3k - 1$, and $a_n = -3$ if $n = 3k$ for some $k \in \mathbb{N}$. Prove that the series $\sum_{n=1}^{\infty} a_n/\sqrt{n}$ converges.


Solution. If $\{s_n\}$ is the sequence of partial sums of $\sum_{n=1}^{\infty} a_n$, then $0 \le s_n \le 3$ for every $n$. Since $\{1/\sqrt{n}\}$ is decreasing and converges to 0, the desired result follows from Dirichlet's test.

The convergence of $\sum_{n=1}^{\infty} (-1)^n/n$ also follows from the following special case of Dirichlet's test.

Corollary 2.102 (The alternating series test). If $\{b_n\}$ is a decreasing sequence of positive real numbers that converges to 0, then $\sum_{n=1}^{\infty} (-1)^n b_n$ is convergent.

Proof. Use Dirichlet's test with $a_n = (-1)^n$. $\square$

The test was first proved by Leibniz. For this reason, it is also known as Leibniz's test. Note that when $\{b_n\}$ is a sequence of positive numbers, the terms of the series $\sum_{n=1}^{\infty} (-1)^n b_n$ are alternately positive and negative. This is the reason we call such series alternating.

Example 2.103. For which values of $p$ is the alternating series $\sum_{n=1}^{\infty} (-1)^n/n^p$ absolutely convergent? For which values is the series conditionally convergent?

Solution. By Theorem 2.95, the series is absolutely convergent if and only if $p > 1$. If $0 < p \le 1$, it is conditionally convergent by Leibniz's test.

The Root and Ratio Tests Revisited. The root and ratio tests are the most important convergence tests which are usually taught in calculus. In analysis, the tests are strengthened to be of use in a wider class of examples. This provides us with a good instance of the way analysis completes our calculus-based knowledge. To understand how the tests are strengthened, let us recall the version of the root test which is usually taught in calculus.

For a series $\sum_{n=1}^{\infty} a_n$ let
\[ \alpha = \lim_{n\to\infty} \sqrt[n]{|a_n|}. \tag{2.21} \]

Then, the series converges absolutely when $\alpha < 1$ and it is divergent if $\alpha > 1$. The test is inconclusive when $\alpha = 1$.

This test is helpful only when the limit in (2.21) exists as an extended real number, in which case we have either $\alpha \in [0, +\infty)$ or $\alpha = +\infty$. But it is easy to find sequences for which the limit in (2.21) does not exist in this sense. An instance is the sequence $\{a_n\}$ defined by $a_n = 1/2^n$ if $n$ is even and $a_n = 1/3^n$ when $n$ is odd. If we try to determine the convergence situation of the series $\sum_{n=1}^{\infty} a_n$ using the above test, we see that the sequence $\{\sqrt[n]{|a_n|}\}$ cannot tend to a limit in the extended real number system. This is because
\[ \liminf_{n\to\infty} \sqrt[n]{|a_n|} = \frac{1}{3}, \qquad \limsup_{n\to\infty} \sqrt[n]{|a_n|} = \frac{1}{2}. \]
This motivates us to seek a strengthened version of the root test. Since every sequence has a limit superior, our hope is to use the limit superior of $\{\sqrt[n]{|a_n|}\}$ to determine the convergence or divergence of $\sum_{n=1}^{\infty} a_n$. It is one of our important observations in this section that this hope can be made into reality.
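The oscillation of the $n$th roots in this example can be seen directly. The following sketch is only an illustration (the function names are hypothetical and floating-point arithmetic is used): it evaluates $\sqrt[n]{|a_n|}$ for the sequence with $a_n = 1/2^n$ for even $n$ and $a_n = 1/3^n$ for odd $n$.

```python
def a(n):
    """a_n = 1/2^n for even n and a_n = 1/3^n for odd n."""
    return 0.5**n if n % 2 == 0 else (1.0 / 3.0)**n


def nth_root_of_abs(n):
    """Return the n-th root of |a_n|, the quantity used in the root test."""
    return abs(a(n))**(1.0 / n)


if __name__ == "__main__":
    for n in range(1, 13):
        print(n, nth_root_of_abs(n))
```

The values alternate between (approximately) $1/3$ and $1/2$, so the sequence of roots has no limit, while its limit inferior and limit superior are $1/3$ and $1/2$.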


Theorem 2.104 (The Root Test). Given a series $\sum_{n=1}^{\infty} a_n$, let
\[ \alpha = \limsup_{n\to\infty} \sqrt[n]{|a_n|}. \]
(1) If $\alpha < 1$, then $\sum_{n=1}^{\infty} a_n$ is absolutely convergent.
(2) If $\alpha > 1$, then $\sum_{n=1}^{\infty} a_n$ is divergent.

Why is Theorem 2.104 a strengthened version of the root test? We say that Theorem 2.104 is a strengthened version of the above version of the root test for this simple reason: the limit superior of $\{\sqrt[n]{|a_n|}\}$ always exists, even when $\{\sqrt[n]{|a_n|}\}$ does not converge in $\mathbb{R}^*$, and when $\{\sqrt[n]{|a_n|}\}$ tends to an element of $\mathbb{R}^*$, the limit superior of $\{\sqrt[n]{|a_n|}\}$ is also equal to this value.

It now follows from this strengthened root test that the above-mentioned series $\sum_{n=1}^{\infty} a_n$ is absolutely convergent. This is because
\[ \limsup_{n\to\infty} \sqrt[n]{|a_n|} = \frac{1}{2} < 1. \]
The root test says nothing when $\alpha = 1$. To understand why, consider $\sum_{n=1}^{\infty} 1/n$ and $\sum_{n=1}^{\infty} 1/n^2$. By Example 2.15(5), the quantity $\alpha$ of the root test is equal to 1 for both series, while the former is divergent and the latter is convergent.

Proof of Theorem 2.104. (1) Let $\beta = (\alpha + 1)/2$. Since $\beta > \alpha$, Theorem 2.70(2) gives us some $N \in \mathbb{N}$ such that for every $n \ge N$, $\sqrt[n]{|a_n|} < \beta$, or equivalently,
\[ |a_n| < \beta^n. \tag{2.22} \]
Since $0 < \beta < 1$, $\sum_{n=1}^{\infty} \beta^n$ is a convergent geometric series. So, it follows from (2.22) and the comparison test that $\sum_{n=1}^{\infty} |a_n|$ is also convergent.

(2) Since $\alpha > 1$, an infinite number of the terms of $\{\sqrt[n]{|a_n|}\}$ are greater than 1. Since the same is then true for $\{|a_n|\}$, the sequence $\{a_n\}$ cannot converge to 0. This shows that $\sum_{n=1}^{\infty} a_n$ is divergent. $\square$

Example 2.105. Determine the convergence or divergence of the given series.

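(1) $\sum_{n=1}^{\infty} \left( \dfrac{2n(\sqrt[n]{n} - 1)}{n+1} \right)^n$.
(2) $\sum_{n=1}^{\infty} (n + 1/n)^n$.

Solution. (1) Since
\[ \lim_{n\to\infty} \sqrt[n]{\left| \left( \frac{2n(\sqrt[n]{n} - 1)}{n+1} \right)^n \right|} = \lim_{n\to\infty} \frac{2n}{n+1} \cdot \lim_{n\to\infty} (\sqrt[n]{n} - 1) = (2)(0) = 0, \]
the series is convergent by the root test.
(2) Since
\[ \lim_{n\to\infty} \sqrt[n]{\left| \left( n + \frac{1}{n} \right)^n \right|} = \lim_{n\to\infty} \left( n + \frac{1}{n} \right) = +\infty, \]
the series is divergent by the root test.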


Next, we present a strengthened version of the ratio test. To state this more fruitfully, we first recall the calculus version.

For a series $\sum_{n=1}^{\infty} a_n$ with nonzero terms, let $\alpha = \lim_{n\to\infty} |a_{n+1}/a_n|$. Then the series converges absolutely when $\alpha < 1$, and it is divergent if $\alpha > 1$. The test is inconclusive when $\alpha = 1$.

The recursively defined sequence
\[ a_1 = 1, \qquad a_{n+1} = \begin{cases} \left( \dfrac{1}{2} + \dfrac{1}{n} \right) a_n & n \text{ even}, \\[2mm] \left( \dfrac{1}{3} + \dfrac{1}{n} \right) a_n & n \text{ odd}, \end{cases} \qquad n \in \mathbb{N}, \tag{2.23} \]
shows that the above version of the ratio test is not of sufficient efficiency. This is because
\[ \liminf_{n\to\infty} \left| \frac{a_{n+1}}{a_n} \right| = \frac{1}{3} \quad \text{and} \quad \limsup_{n\to\infty} \left| \frac{a_{n+1}}{a_n} \right| = \frac{1}{2}, \tag{2.24} \]
so that the limit used in the above test does not exist for this particular sequence. Analysis helps us by proving the following strengthened version of the ratio test.

Theorem 2.106 (The Ratio Test). For a series $\sum_{n=1}^{\infty} x_n$ with nonzero terms, let
\[ \gamma = \liminf_{n\to\infty} \left| \frac{x_{n+1}}{x_n} \right|, \qquad \delta = \limsup_{n\to\infty} \left| \frac{x_{n+1}}{x_n} \right|. \]
(1) If $\delta < 1$, then $\sum_{n=1}^{\infty} x_n$ is absolutely convergent.
(2) If $\gamma > 1$, then $\sum_{n=1}^{\infty} x_n$ is divergent.

Proof. (1) For $\beta := (\delta + 1)/2 > \delta$, Theorem 2.70(2) gives us some $N \in \mathbb{N}$ such that for every $n \ge N$, $|x_{n+1}/x_n| < \beta$, or equivalently, $|x_{n+1}| < \beta |x_n|$. Thus, if we let $k_n = n - N$ for $n > \max\{1, N\}$, then
\[ |x_n| < \beta |x_{n-1}| < \beta^2 |x_{n-2}| < \cdots < \beta^{k_n} |x_{n-k_n}| = c\beta^n, \tag{2.25} \]
where $c = \beta^{-N} |x_N|$. Since $0 < \beta < 1$, $\sum_{n=1}^{\infty} \beta^n$ is a convergent geometric series. The series $\sum_{n=1}^{\infty} |x_n|$ is therefore convergent by (2.25) and the comparison test.

(2) Since $\gamma > 1$, Exercise 2.71(2) gives us some $N \in \mathbb{N}$ such that for every $n \ge N$, $|x_{n+1}/x_n| > 1$ or equivalently $|x_{n+1}| > |x_n|$. This shows that $\{|x_n|\}$, and accordingly the sequence $\{x_n\}$, cannot converge to 0. Hence, Proposition 2.88 tells us that $\sum_{n=1}^{\infty} x_n$ is divergent. $\square$
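The geometric estimate used in part (1) of the proof can be tried out on the sequence (2.23). The sketch below is a hedged illustration with hypothetical names; it uses exact rational arithmetic. For this sequence $\delta = 1/2$, so $\beta = (\delta + 1)/2 = 3/4$, and the choice $N = 5$ is an assumption checked by hand: $1/2 + 1/n < 3/4$ and $1/3 + 1/n < 3/4$ both hold for $n \ge 5$.

```python
from fractions import Fraction


def sequence_2_23(n_max):
    """Return [a_1, ..., a_{n_max}] for the recursion (2.23), computed exactly.

    The list is 0-indexed, so a_n is stored at position n - 1.
    """
    a = [Fraction(1)]
    for n in range(1, n_max):
        factor = Fraction(1, 2) if n % 2 == 0 else Fraction(1, 3)
        a.append((factor + Fraction(1, n)) * a[-1])   # this is a_{n+1}
    return a


def ratios(a):
    """Return the list of consecutive ratios a_{n+1} / a_n."""
    return [a[i + 1] / a[i] for i in range(len(a) - 1)]


if __name__ == "__main__":
    a = sequence_2_23(40)
    beta, N = Fraction(3, 4), 5
    for n in (5, 10, 20, 40):
        # Compare a_n with the geometric bound beta^(n - N) * a_N.
        print(n, float(a[n - 1]), float(beta**(n - N) * a[N - 1]))
```

The ratios oscillate and have no limit, yet every term from $a_N$ onward lies below the geometric bound, which is all the comparison test needs.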

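An informal interpretation of the ratio test. The quantities $\delta$ and $\gamma$ measure the relative growth of the terms of the sequence $\{|x_n|\}$. This is why $\delta < 1$ tells us that $\sum_{n=1}^{\infty} x_n$ is absolutely convergent, while $\gamma > 1$ tells us that $\sum_{n=1}^{\infty} x_n$ is divergent. The ratio test gives us no information when
\[ \liminf_{n\to\infty} \left| \frac{x_{n+1}}{x_n} \right| \le 1 \le \limsup_{n\to\infty} \left| \frac{x_{n+1}}{x_n} \right|. \]
This can be seen by considering the series $\sum_{n=1}^{\infty} 1/n$ and $\sum_{n=1}^{\infty} 1/n^2$.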


As a result of Theorem 2.106 and the limit superior computed in (2.24), we find that for the sequence $\{a_n\}$ introduced in (2.23), the series $\sum_{n=1}^{\infty} a_n$ converges.

A note on the root and ratio tests. Although the ratio test is often easier to apply than the root test, the latter can be used in a wider class of examples than the former. As an illustration of this claim, consider the series $\sum_{n=1}^{\infty} 2^{(-1)^n n}$. If we let $a_n = 2^{(-1)^n n}$, then
\[ \liminf_{n\to\infty} \left| \frac{a_{n+1}}{a_n} \right| = 0, \qquad \limsup_{n\to\infty} \left| \frac{a_{n+1}}{a_n} \right| = +\infty, \]
so that the ratio test gives us no information about the convergence situation of this series. The root test, however, shows that the series diverges, because
\[ \limsup_{n\to\infty} \sqrt[n]{|a_n|} = 2 > 1. \]
This point is discussed more formally in Exercises 43 and 44 at the end of this chapter. Of course, the divergence of the series $\sum_{n=1}^{\infty} 2^{(-1)^n n}$ is so apparent that there is no need to use such advanced tools as the ratio and root tests. Can you explain why?
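The contrast between the two tests for $a_n = 2^{(-1)^n n}$ can be made concrete. In this sketch (an illustration with hypothetical names), the ratios are computed exactly and the $n$th roots in floating point.

```python
from fractions import Fraction


def a(n):
    """a_n = 2^((-1)^n * n): equal to 2^n for even n and 2^(-n) for odd n."""
    return Fraction(2) ** ((-1) ** n * n)


def ratio(n):
    """Return a_{n+1} / a_n exactly."""
    return a(n + 1) / a(n)


def nth_root(n):
    """Return the n-th root of |a_n| as a float."""
    return float(abs(a(n))) ** (1.0 / n)


if __name__ == "__main__":
    for n in range(1, 9):
        print(n, ratio(n), nth_root(n))
```

The ratios are $2^{-(2n+1)}$ for even $n$ and $2^{2n+1}$ for odd $n$, so they accumulate at both $0$ and $+\infty$, while the roots only take the values $2$ and $1/2$.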

2.6. Rearrangements of Series

The sequence $\{s_n\}$ of partial sums of a series $\sum_{n=1}^{\infty} x_n$ depends on the way we add the terms $x_n$ to each other. More precisely, each term $s_n$ is obtained from $\{x_n\}$ by adding together its first $n$ terms. If we change the order of addition, we get another sequence in place of $\{s_n\}$, and accordingly a new series. It is surprising that the new series may be divergent when the original series $\sum_{n=1}^{\infty} x_n$ converges, and vice versa!

To illustrate our discussion, let us consider the series $\sum_{n=1}^{\infty} (-1)^{n+1}/n$, which is convergent by the alternating series test. This series is obtained from the sequence
\[ 1, -\frac{1}{2}, \frac{1}{3}, -\frac{1}{4}, \frac{1}{5}, -\frac{1}{6}, \frac{1}{7}, -\frac{1}{8}, \frac{1}{9}, -\frac{1}{10}, \frac{1}{11}, -\frac{1}{12}, \ldots. \]
Now, let us "rearrange" this sequence as
\[ a_n : \; 1, -\frac{1}{2}, -\frac{1}{4}, \frac{1}{3}, -\frac{1}{6}, -\frac{1}{8}, \frac{1}{5}, -\frac{1}{10}, -\frac{1}{12}, \frac{1}{7}, \ldots, \]
in which each positive term is followed by two negative terms. We call $\{a_n\}$ a rearrangement of the original sequence $\{(-1)^{n+1}/n\}$ because the terms of the former sequence are the same as those of the latter; we just changed the way the terms are arranged. To compare the convergence situation of $\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} (-1)^{n+1}/n$, let $\{t_n\}$ and $\{s_n\}$ denote the sequences of partial sums of these series, respectively. It is then easy to see that
\[ s_{2n} = \left( 1 + \frac{1}{3} + \cdots + \frac{1}{2n-1} \right) - \left( \frac{1}{2} + \frac{1}{4} + \cdots + \frac{1}{2n} \right) \]
and
\[ t_{3n} = \left( 1 + \frac{1}{3} + \cdots + \frac{1}{2n-1} \right) - \left( \frac{1}{2} + \frac{1}{4} + \cdots + \frac{1}{4n} \right). \]


Then for every $n$,
\[ s_{2n} - t_{3n} = \frac{1}{2} \left( \frac{1}{n+1} + \cdots + \frac{1}{2n} \right). \]
But it can be easily verified using induction that for every $n$,
\[ \left( 1 + \frac{1}{3} + \cdots + \frac{1}{2n-1} \right) - \left( \frac{1}{2} + \frac{1}{4} + \cdots + \frac{1}{2n} \right) = \frac{1}{n+1} + \cdots + \frac{1}{2n}. \]
Hence we obtain
\[ s_{2n} - t_{3n} = \frac{1}{2} s_{2n}, \]
or equivalently,
\[ t_{3n} = \frac{1}{2} s_{2n}. \tag{2.26} \]
Now denote the limit of $\{s_n\}$, that is the sum of $\sum_{n=1}^{\infty} (-1)^{n+1}/n$, by $s$. Then it follows from (2.26) that
\[ \liminf_{n\to\infty} t_n \le \frac{1}{2} s < s. \]
Note that we wrote $s/2 < s$ because $s_{2n-1} > 1/2$ for every $n \in \mathbb{N}$, and hence $s \ge 1/2$; in particular, $s$ is positive.

The conclusion is that even if the rearrangement $\sum_{n=1}^{\infty} a_n$ of $\sum_{n=1}^{\infty} (-1)^{n+1}/n$ converges, it cannot converge to the same value as the original series. Thus, by rearranging the terms of a convergent series, we may obtain a series with a different value. This provides a partial answer to question (2.f). We obtain a more complete answer to (2.f) as a result of our next theorem. Note that the series $\sum_{n=1}^{\infty} (-1)^{n+1}/n$ is conditionally convergent.

Theorem 2.107. Suppose $\sum_{n=1}^{\infty} x_n$ is an absolutely convergent series which converges to $s$. Then every rearrangement of $\sum_{n=1}^{\infty} x_n$ also converges to $s$.

Before proving this theorem, we should first describe the precise meaning of rearrangement. Look at the sequence $x_n := (-1)^{n+1}/n$ and its rearrangement $\{a_n\}$ again. Each term of the former sequence appears only once in the latter, and $\{a_n\}$ has no terms other than those of $\{x_n\}$. This means that the terms of these sequences are in one-to-one correspondence. More precisely, a one-to-one function $f$ from $\mathbb{N}$ onto $\mathbb{N}$ exists such that for every $n \in \mathbb{N}$, $a_n = x_{f(n)}$. For example, since $a_3 = x_4 = -1/4$, $f(3)$ is equal to 4. The above argument is our motivation for the following definition.

Definition 2.108. Let $\sum_{n=1}^{\infty} x_n$ and $\sum_{n=1}^{\infty} a_n$ be series. We say that $\sum_{n=1}^{\infty} a_n$ is a rearrangement of $\sum_{n=1}^{\infty} x_n$ if a one-to-one function $f$ from $\mathbb{N}$ onto $\mathbb{N}$ exists such that for every $n \in \mathbb{N}$, $a_n = x_{f(n)}$.

With this definition, we are now ready to prove Theorem 2.107.

Proof of Theorem 2.107. Suppose $f$ is a one-to-one function from $\mathbb{N}$ onto $\mathbb{N}$. We show that the corresponding rearrangement $\sum_{n=1}^{\infty} x_{f(n)}$ also converges to $s$. To see this, denote the sequences of partial sums of $\sum_{n=1}^{\infty} x_n$ and $\sum_{n=1}^{\infty} x_{f(n)}$ by $\{s_n\}$ and $\{t_n\}$, respectively. To complete the proof, we should show that $\{t_n\}$ converges to $s$.


So, let $\varepsilon > 0$ be given. Since $\sum_{n=1}^{\infty} x_n$ converges absolutely, Lemma 2.87 gives us some $k \in \mathbb{N}$ such that $\sum_{n=k+1}^{\infty} |x_n| < \varepsilon$. Letting $N = \max\{f^{-1}(l) : l = 1, \ldots, k\}$, where $f^{-1}$ is the inverse function of $f$, we see that
\[ \{1, \ldots, k\} \subseteq \{f(1), \ldots, f(N)\}. \tag{2.27} \]
If $n \ge N$, then (2.27) shows that the terms $x_1, \ldots, x_k$ will not appear in the subtraction $t_n - s_n$. To understand why, suppose for example that $1 = f(j)$ for some $j \in \{1, \ldots, N\}$. Then $x_{f(j)}$ and $-x_1$ both appear in the summation that determines $t_n - s_n$, and they cancel. Now, for every $n \ge N$,
\[ |t_n - s_n| \le \sum_{i=k+1}^{\infty} |x_i| < \varepsilon. \]
This shows that $\{t_n - s_n\}$ converges to 0. Since $\{s_n\}$ converges to $s$, so does $\{t_n\}$, and the proof is complete. $\square$

What does Theorem 2.107 say? Theorem 2.107 says that rearranging an absolutely convergent series does not affect its value. The series $\sum_{n=1}^{\infty} (-1)^{n+1}/n$ shows that the same is not true for conditionally convergent series.
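The relation $t_{3n} = \frac{1}{2} s_{2n}$ between the alternating harmonic series and its rearrangement (one positive term followed by two negative terms) can be verified exactly for small $n$. The following is a minimal sketch with hypothetical function names, using rational arithmetic.

```python
from fractions import Fraction


def s(n):
    """n-th partial sum of the series 1 - 1/2 + 1/3 - 1/4 + ..."""
    return sum(Fraction((-1) ** (i + 1), i) for i in range(1, n + 1))


def t(n):
    """n-th partial sum of the rearrangement 1 - 1/2 - 1/4 + 1/3 - 1/6 - 1/8 + ..."""
    terms = []
    j = 1
    while len(terms) < n:
        # j-th block: one positive term followed by two negative terms.
        terms += [Fraction(1, 2 * j - 1), Fraction(-1, 4 * j - 2), Fraction(-1, 4 * j)]
        j += 1
    return sum(terms[:n])


if __name__ == "__main__":
    for n in (1, 2, 5, 50):
        print(n, float(s(2 * n)), float(t(3 * n)), t(3 * n) == s(2 * n) / 2)
```

Since the check is exact, it confirms the identity for each tested $n$; it does not replace the induction argument, which covers all $n$.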

2.7. Power Series

One important part of the theory of real series, which is contained in most calculus courses, is the theory of power series. Describing our intention of considering power series is not possible without knowing their formal definition. We therefore begin with the required definition.

Definition 2.109. Let $x_0$ be a real number. A power series centered at $x_0$ is an expression of the form $\sum_{n=0}^{\infty} a_n (x - x_0)^n$ in which $\{a_n\}$ is an arbitrary sequence of real numbers.

The basic goal in the theory of power series is the following.

Given a power series $\sum_{n=0}^{\infty} a_n (x - x_0)^n$ centered at $x_0$, determine those values of $x$ for which the series is convergent.

Of course, it is clear that $\sum_{n=0}^{\infty} a_n (x - x_0)^n$ converges to $a_0$ when $x = x_0$. The above problem can be easily resolved using the convergence criteria we have developed so far.

Example 2.110. Let $x_0$ be a real number. In each case determine those values of $x$ for which the given power series is convergent.
(1) $\sum_{n=0}^{\infty} n! (x - x_0)^n$.
(2) $\sum_{n=1}^{\infty} \dfrac{(x - x_0)^n}{n}$.
(3) $\sum_{n=0}^{\infty} \dfrac{(x - x_0)^n}{n^n}$.


Solution. (1) Suppose $x$ is a fixed real number other than $x_0$. Considering $\{n!(x - x_0)^n\}$ to be a sequence, we may use the ratio test for determining the convergence situation of $\sum_{n=0}^{\infty} n!(x - x_0)^n$. This gives us the quantity
\[ L_x = \lim_{n\to\infty} \left| \frac{(n+1)!\,(x - x_0)^{n+1}}{n!\,(x - x_0)^n} \right| = \lim_{n\to\infty} |x - x_0|(n + 1). \]
By the ratio test, the series $\sum_{n=0}^{\infty} n!(x - x_0)^n$ converges absolutely when $L_x < 1$, and diverges when $L_x > 1$. If $x$ is any real number other than $x_0$, then
\[ L_x = |x - x_0| \lim_{n\to\infty} (n + 1) = +\infty. \]
So the ratio test tells us that the series $\sum_{n=0}^{\infty} n!(x - x_0)^n$ converges if and only if $x = x_0$.

(2) If $x \ne x_0$ is any real number, then
\[ L_x = \lim_{n\to\infty} \left| \frac{(x - x_0)^{n+1}/(n+1)}{(x - x_0)^n/n} \right| = \lim_{n\to\infty} |x - x_0| \frac{n}{n+1} = |x - x_0|. \]
So, the series converges absolutely when $|x - x_0| < 1$ and diverges if $|x - x_0| > 1$, by the ratio test. When $|x - x_0| = 1$, in which case we have either $x = x_0 + 1$ or $x = x_0 - 1$, the ratio test gives us no information. Substituting $x_0 + 1$ for $x$ in the series, we get $\sum_{n=1}^{\infty} 1/n$, which is a divergent series. If we let $x = x_0 - 1$ in the series, we obtain $\sum_{n=1}^{\infty} (-1)^n/n$, which converges by the alternating series test. Hence, in summary, the only points at which the given power series converges are those of the interval $[x_0 - 1, x_0 + 1)$ centered at $x_0$.

(3) For every $x$,
\[ L_x = \lim_{n\to\infty} \sqrt[n]{\left| \frac{(x - x_0)^n}{n^n} \right|} = \lim_{n\to\infty} \frac{|x - x_0|}{n} = 0. \]
Therefore, the series converges for every $x \in \mathbb{R}$ by the root test.

Noticing the above example, we find at least three distinct types of power series centered at $x_0$.
(1) Power series such as $\sum_{n=0}^{\infty} n!(x - x_0)^n$, which are convergent only at $x_0$.
(2) Power series such as $\sum_{n=1}^{\infty} (x - x_0)^n/n$, which are convergent in an interval centered at $x_0$.
(3) Power series such as $\sum_{n=0}^{\infty} (x - x_0)^n/n^n$ that converge for every $x \in \mathbb{R}$.

That these are the only convergence types follows from the following theorem.

Theorem 2.111. Suppose $\sum_{n=0}^{\infty} a_n (x - x_0)^n$ is a power series centered at $x_0$, and $\alpha = \limsup_{n\to\infty} \sqrt[n]{|a_n|}$. If
(1) $\alpha = 0$, then the series converges absolutely for every $x \in \mathbb{R}$;
(2) $0 < \alpha < +\infty$, then the series converges absolutely for every $x$ in the interval $(x_0 - 1/\alpha, x_0 + 1/\alpha)$ and diverges for every $x$ satisfying $|x - x_0| > 1/\alpha$;
(3) $\alpha = +\infty$, then the series converges only at $x_0$.


Proof. For every $x \in \mathbb{R}$, let $L_x = \limsup_{n\to\infty} \sqrt[n]{|a_n (x - x_0)^n|}$. Then
\[ L_x = \alpha |x - x_0|. \]
(1) If $x$ is any real number, then $L_x = 0$ by the assumption that $\alpha$ is equal to 0. The root test therefore shows that the series converges absolutely for every $x \in \mathbb{R}$.
(2) By the root test, the series converges absolutely if $L_x < 1$ and diverges when $L_x > 1$. This proves the desired result.
(3) For every $x \ne x_0$ the assumption $\alpha = +\infty$ yields $L_x = +\infty$. The series is therefore divergent for $x \ne x_0$. $\square$

In Theorem 2.111(2), the series may or may not converge at the boundary points $x = x_0 - 1/\alpha$ and $x = x_0 + 1/\alpha$. These are the points at which $L_x$ is equal to 1. The convergence situation should be examined separately at these points, just as was done in Example 2.110(2).

In each case in Theorem 2.111 we can imagine that the power series
\[ \sum_{n=0}^{\infty} a_n (x - x_0)^n \]
converges on an interval with center $x_0$ and radius $1/\alpha$. We call the interval (which is $\mathbb{R}$ when $\alpha = 0$ and is equal to the degenerate form $\{x_0\}$ when $\alpha = +\infty$) the interval of convergence of the series, and we say that $1/\alpha$ is the radius of convergence of the series. For example,
(1) the interval of convergence for $\sum_{n=0}^{\infty} n!(x - x_0)^n$ is $\{x_0\}$, and the radius of convergence is 0;
(2) the interval of convergence for $\sum_{n=1}^{\infty} (x - x_0)^n/n$ is $[x_0 - 1, x_0 + 1)$, and the radius of convergence is 1;
(3) the interval of convergence for $\sum_{n=0}^{\infty} (x - x_0)^n/n^n$ is $\mathbb{R}$, and the radius of convergence is $+\infty$.

Why do we study power series? When a power series $\sum_{n=0}^{\infty} a_n (x - x_0)^n$ converges for every $x$ in some subset $A$ of $\mathbb{R}$, an interval centered at $x_0$ or $\mathbb{R}$ itself, we can define a function $f$ on $A$ by
\[ f(x) = \sum_{n=0}^{\infty} a_n (x - x_0)^n. \tag{2.28} \]

One important theme in analysis is to study such functions, that is, functions which are defined by a power series. One more important subject to think about is the converse of this: Given a function f defined on some interval centered at x0 , can we represent f as the sum of a power series, as in (2.28)? This question will be answered in Chapter 4 as a result of Taylor’s theorem.


We have already seen an example of a function which is defined in terms of a power series. In fact, since the geometric series $\sum_{n=0}^{\infty} x^n$ converges to $1/(1 - x)$ for every $x \in (-1, 1)$ (see Example 2.84(4)), we obtain the following power series expansion of the function $f(x) = 1/(1 - x)$ on the interval $(-1, 1)$:
\[ \frac{1}{1 - x} = \sum_{n=0}^{\infty} x^n; \qquad |x| < 1. \tag{2.29} \]
Expansions such as (2.29) will be studied seriously at the end of Chapter 4. As our concluding argument in this chapter, we use the important expansion (2.29) to obtain some other formulas of this kind.

Example 2.112. Expand each of the given functions in a power series centered at 0. Determine the subset of $\mathbb{R}$ on which the expansion is valid.
(1) $f(x) = 1/(1 + x)$.
(2) $g(x) = 1/(1 + x^2)$.

Solution. (1) Since $f(x) = 1/(1 - (-x))$, it follows from (2.29) that when $|-x| < 1$,
\[ f(x) = \sum_{n=0}^{\infty} (-x)^n. \]
Since $|-x| = |x|$, we obtain the series expansion
\[ \frac{1}{1 + x} = \sum_{n=0}^{\infty} (-1)^n x^n; \qquad |x| < 1. \]
The subset of $\mathbb{R}$ on which the expansion is valid is therefore $(-1, 1)$.

(2) If $|-x^2| < 1$, then (2.29) tells us that
\[ g(x) = \sum_{n=0}^{\infty} (-x^2)^n. \]
Since $|-x^2| = |x^2| < 1$ if and only if $|x| < 1$, the desired expansion is as follows.
\[ \frac{1}{1 + x^2} = \sum_{n=0}^{\infty} (-1)^n x^{2n}; \qquad |x| < 1. \]


What is a series expansion good for? When we expand a function in a power series, we consider the function, in an appropriate sense, to be the limit of a sequence of polynomials. For example, for every $n \in \mathbb{N} \cup \{0\}$, consider the polynomial $s_n(x) = 1 + \cdots + x^n$. Then it follows from (2.29) that for every $x \in (-1, 1)$,
\[ \frac{1}{1 - x} = \lim_{n\to\infty} s_n(x). \tag{2.30} \]
We describe (2.30) by saying that the function $1/(1 - x)$ is the pointwise limit of the sequence $\{s_n\}$ of polynomials on the interval $(-1, 1)$. Therefore, series expansions allow us to find polynomial approximations of functions. These points will become clear gradually within this book.
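The quality of the polynomial approximation in (2.30) can be measured exactly, because the finite geometric sum satisfies $\frac{1}{1-x} - s_n(x) = \frac{x^{n+1}}{1-x}$ for $x \ne 1$. The sketch below is an illustration with hypothetical names; it evaluates both sides with rational arithmetic.

```python
from fractions import Fraction


def geometric_polynomial(n, x):
    """Return s_n(x) = 1 + x + ... + x^n."""
    return sum(x**j for j in range(n + 1))


def approximation_error(n, x):
    """Return 1/(1 - x) - s_n(x); x must differ from 1."""
    return 1 / (1 - x) - geometric_polynomial(n, x)


if __name__ == "__main__":
    for x in (Fraction(1, 2), Fraction(-9, 10)):
        for n in (5, 20, 50):
            print(x, n, float(approximation_error(n, x)))
```

For a fixed $x \in (-1, 1)$ the error tends to 0 as $n$ grows, but more slowly the closer $|x|$ is to 1; this is why the limit in (2.30) is described as pointwise.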

Notes on Essence and Generalizability

In this chapter we saw the way analysis helps us to enhance our calculus-based knowledge of sequences and series. Perhaps many results and convergence tests presented in this chapter were known to you. We have included such results both for the sake of completeness and to prove them rigorously, something that is usually neglected in calculus courses.

On the other hand, some of the material presented in this chapter is new to newcomers to analysis. Such concepts and results were aimed at completing the theory of sequences and series one learns in calculus, among which are the following.
(1) The concepts of limit superior and limit inferior, which helped us to better understand the convergence situation of a sequence in terms of that of its subsequences. As we saw, these concepts also enabled us to present the strengthened versions of the root and ratio tests.
(2) The notion of Cauchy sequence, which endowed us with a limit-free formulation of convergence. As we observed at several points in the theory of series, this formulation of convergence is a powerful tool in the proofs that allows us to avoid the explicit use of limits.
(3) The material on the rearrangement of series, which is not usually treated seriously in calculus courses.

Some aspects of real sequence theory can be generalized to the metric space context and some others cannot. In Chapter 7 we will define the notion of convergence for sequences in arbitrary metric spaces. Cauchy sequences are also generalizable to this context, but we will see that in a generic metric space, a Cauchy sequence may fail to be convergent. This will lead us to the important concept of a complete metric space: a space in which Cauchy sequences converge. We have already seen instances of such spaces: $\mathbb{R}$ itself (Theorem 2.78) and intervals of the form $[a, b]$ (Theorem 2.82).

Another important property which will be generalized in the metric space theory is compactness. This is the property we first proved for intervals of the form $[a, b]$ in Theorem 2.81: a space $X$ is said to be compact if every sequence in $X$ has a subsequence that converges to some point of $X$. As you saw in the current chapter, and you will see in the next chapter, most of the properties that distinguish intervals of the form $[a, b]$ from the other subsets of $\mathbb{R}$ are due to this important property.

Series cannot, in general, be considered in metric spaces because their definition depends on the algebraic operation of addition, and this may have no meaning in a generic metric space. Nevertheless, in Chapter 9, we will consider series whose terms are real-valued functions and study them in some detail.

Exercises

1. In each case find the supremum of the given set, then find a sequence in the set that converges to the supremum.
(a) $B = (-\infty, 2)$.
(b) $C = \{1/2^n + 1/5^m : m, n \in \mathbb{N}\}$.

2. Verify the given equalities by the $\varepsilon$-$N$ definition of limit.
(a) $\lim_{n\to\infty} (2n - 1)/(n + 4) = 2$.
(b) $\lim_{n\to\infty} (n^2 + 3n - 1)/(2n^2 - n + 1) = 1/2$.

3. Which of the following sequences are bounded?
(a) $\{(n^2 + 1)/(3n + 1)\}$.
(b) $\{\ln(n + 1)\}$.
(c) $\{1/(\sin(n - 1))\}$.
(d) $\{e^{-n}\}$.

4. In each case describe why the given sequence converges to 0.
(a) $\{(-1)^n/\ln(n + 2)\}$.
(b) $\{(\cos 2n)/(n^3 - 1)\}$.

5. In each case determine if the given sequence is monotone.
(a) $\{n \ln(2n)\}$.
(b) $\{\sin n\}$.
(c) $\{(-1)^n/n\}$.

6. Find the value of
\[ \lim_{n\to\infty} \sqrt[n]{1 + \frac{1}{4} + \cdots + \frac{1}{n^2}}. \]

7. Suppose $\{a_n\}$ is a sequence such that
\[ 0 \le a_{n+m} \le a_n + a_m \]
holds for all $m, n \in \mathbb{N}$. Prove that the sequence $\{a_n/n\}$ is convergent.

8. Suppose $\{x_n\}$ is a bounded sequence such that
\[ x_{n+1} \ge x_n - \frac{1}{2^n} \]
for every $n \in \mathbb{N}$. Prove that $\{x_n\}$ is convergent.

9. Verify that the sequence $\{n/2^n\}$ is strictly decreasing, and find its limit.


10. In each case find a monotone subsequence of the given sequence.
(a) $\{(-1)^n/\ln n\}$.
(b) $\{1 - \sin(n\pi/4)\}$.

11. In each case verify that the given sequence is bounded, then find a convergent subsequence.
(a) $\{\sin(n\pi/4)\}$.
(b) $\{(-1)^n n/(n^2 + 1)\}$.

12. Find the value of the following limits.
(a) $\lim_{n\to\infty} \left( \sqrt{n + 1} - \sqrt{n} \right)$.
(b) $\lim_{n\to\infty} \left( \sqrt{4n^2 + n} - 2n \right)$.

13. Find the limit superior and limit inferior of the given sequences.
(a) $\{(-2)^n (1 + 1/n)\}$.
(b) $\{(-1)^n (1 + 1/n)\}$.
(c) $\{n(1 + (-1)^n)\}$.

14. If $\{x_n\}$ is an arbitrary sequence with range $X$, prove that
\[ \inf X \le \liminf_{n\to\infty} x_n \le \limsup_{n\to\infty} x_n \le \sup X. \]

15. Let $\{x_{n_k}\}$ be an arbitrary subsequence of a sequence $\{x_n\}$. Prove that
\[ \liminf_{n\to\infty} x_n \le \liminf_{k\to\infty} x_{n_k} \le \limsup_{k\to\infty} x_{n_k} \le \limsup_{n\to\infty} x_n. \]
Deduce from the above inequalities that when a sequence converges to $x$, all its subsequences do so.

16. Let $\{x_n\}$ be any sequence. Prove that
\[ \limsup_{n\to\infty} x_n = \lim_{N\to\infty} \sup\{x_n : n > N\} \]
and
\[ \liminf_{n\to\infty} x_n = \lim_{N\to\infty} \inf\{x_n : n > N\}. \]

17. If $\{x_n\}$ is a sequence that converges to a positive real number $x$ and $\{y_n\}$ is any sequence, then prove that
\[ \limsup_{n\to\infty} (x_n y_n) = x \limsup_{n\to\infty} y_n. \]
Here the conventions $x(+\infty) = +\infty$ and $x(-\infty) = -\infty$ can be used.

18. Let $\{x_n\}$ be a divergent sequence such that the sequence $\{|x_n|\}$ converges to some $x \in \mathbb{R}$. Prove that $x$ is nonzero and that
\[ \liminf_{n\to\infty} x_n + \limsup_{n\to\infty} x_n = 0. \]
Can you give an example of such a sequence?

19. Let $\{x_n\}$ be a sequence, and let $0 < \alpha < 1$ be such that $|x_{n+2} - x_{n+1}| \le \alpha |x_{n+1} - x_n|$ for every $n$. Prove that $\{x_n\}$ is a Cauchy sequence.


20. Prove that the recursively defined sequence
\[ a_1 = 2, \qquad a_{n+1} = 2 + \frac{1}{a_n}, \qquad n \in \mathbb{N}, \]
is Cauchy. Then verify that $\{a_n\}$ converges to $1 + \sqrt{2}$.

21. Verify that the recursively defined sequence
\[ x_1 = 1, \qquad x_{n+1} = \frac{4 + 3x_n}{3 + 2x_n}, \qquad n \in \mathbb{N}, \]
is Cauchy. Then prove that $\{x_n\}$ converges to $\sqrt{2}$.

22. In each of the given intervals find a divergent Cauchy sequence.
(a) $(1, +\infty)$.
(b) $[0, 1/2)$.

23. Let $\{s_n\}$ be the sequence of partial sums of the series $\sum_{n=1}^{\infty} 1/n$. Prove that for every $k \in \mathbb{N}$,
\[ s_{2^k} \ge 1 + \frac{k}{2}. \]
Then use this fact to deduce that the series $\sum_{n=1}^{\infty} 1/n$ is divergent.

24. Determine the convergence situation of each of the following series.
(a) $\sum_{n=1}^{\infty} (n!)^3/e^{n^3}$.
(b) $\sum_{n=2}^{\infty} (-1)^n/(\ln n)^{3n}$.
(c) $\sum_{n=1}^{\infty} (-1)^n/\sqrt[n]{n}$.
(d) $\sum_{n=1}^{\infty} (2 + \cos n)/n^4$.

25. If $\sum_{n=1}^{\infty} a_n$ is convergent, prove that $\sum_{n=1}^{\infty} a_n/n$ is also convergent.

26. If $\sum_{n=1}^{\infty} a_n$ is divergent, does it follow that $\sum_{n=1}^{\infty} a_n^2$ is also divergent? If $\sum_{n=1}^{\infty} a_n$ converges, is it true that $\sum_{n=1}^{\infty} a_n^2$ is also convergent?

27. If $\sum_{n=1}^{\infty} a_n$ is a convergent series with nonnegative terms, then prove that the series $\sum_{n=1}^{\infty} a_n^2$ is also convergent.

28. If $\sum_{n=1}^{\infty} a_n$ is absolutely convergent and $\{b_n\}$ is a bounded sequence, prove that the series $\sum_{n=1}^{\infty} a_n b_n$ converges.

29. Let $\sum_{n=1}^{\infty} x_n$ and $\sum_{n=1}^{\infty} y_n$ be convergent series of nonnegative numbers. Prove that $\sum_{n=1}^{\infty} \sqrt{x_n y_n}$ also converges.

30. Let $\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} b_n$ be series with positive terms such that the sequence $\{a_n/b_n\}$ converges to a positive real number $L$. Prove that $\sum_{n=1}^{\infty} a_n$ is convergent if and only if $\sum_{n=1}^{\infty} b_n$ is so.

31. Suppose $\{a_n\}$ is a sequence of nonnegative numbers. Let $s_1 = 1$, and for every $n \in \mathbb{N}$ define
\[ s_{n+1} = \frac{1}{2} \left( s_n + \sqrt{s_n^2 + a_n} \right). \]
Prove that for every $n \in \mathbb{N}$, $s_{n+1} \le s_n + a_n/4$. Then deduce that the convergence of $\sum_{n=1}^{\infty} a_n$ implies that of $\{s_n\}$.

32. Verify that the series $\sum_{n=1}^{\infty} n/3^n$ converges by using (a) the ratio test, and (b) the comparison test.


2. Sequences and Series of Real Numbers

33. Suppose $\sum_{n=1}^{\infty} a_n$ is a conditionally convergent series. For each $n$ let $b_n = a_n$ if $a_n < 0$, and let $b_n = 0$ if $a_n \ge 0$. Similarly, let $c_n = a_n$ if $a_n > 0$, and let $c_n = 0$ if $a_n \le 0$. Prove that the series $\sum_{n=1}^{\infty} b_n$ and $\sum_{n=1}^{\infty} c_n$ both diverge. What do you conclude from this exercise?

34. Show, by means of an example, that the conclusion of the alternating series test may not hold if we omit the assumption that the involved sequence is decreasing.

35. Suppose $\{a_n\}$ is a decreasing sequence of positive real numbers such that $\sum_{n=1}^{\infty} a_n$ diverges. Prove that
$$\lim_{n \to \infty} \frac{a_1 + a_3 + \cdots + a_{2n-1}}{a_2 + a_4 + \cdots + a_{2n}} = 1.$$

36. Let $\sum_{n=1}^{\infty} a_n$ be a divergent series with positive terms, and let $\{s_n\}$ be its associated sequence of partial sums. Prove that
(a) $\sum_{n=1}^{\infty} a_n/s_n$ is divergent,
(b) $\sum_{n=1}^{\infty} a_n/s_n^2$ is convergent.

37. Let $\sum_{n=1}^{\infty} x_n$ and $\sum_{n=1}^{\infty} y_n$ be series with positive terms such that
$$\frac{x_{n+1}}{x_n} \le \frac{y_{n+1}}{y_n}$$
for every $n$. Prove the following statements.
(a) If $\sum_{n=1}^{\infty} y_n$ converges, then $\sum_{n=1}^{\infty} x_n$ is also convergent.
(b) If $\sum_{n=1}^{\infty} x_n$ diverges, then $\sum_{n=1}^{\infty} y_n$ is also divergent.

38. Suppose $\{a_n\}$ is a sequence for which
$$\liminf_{n \to \infty} |a_n| = 0.$$
Prove that a subsequence $\{a_{n_k}\}$ of $\{a_n\}$ can be found such that $\sum_{k=1}^{\infty} a_{n_k}$ converges.

39. Suppose $a_n = 1$ if $n = 3k - 2$, $a_n = -2$ when $n = 3k - 1$, and $a_n = 1$ if $n = 3k$ for some $k \in \mathbb{N}$. Prove that the series $\sum_{n=1}^{\infty} a_n/n$ converges.

40. Prove that the series
$$\sum_{n=2}^{\infty} \frac{1}{n (\ln n)^p}$$
converges if $p > 1$ and diverges when $p \le 1$. Hint. Use Theorem 2.94.

41. Observe that the ratio test cannot be used to determine the convergence situation of the series $\sum_{n=1}^{\infty} 2^{(-1)^n - n}$. Then use both of the following tests to deduce that the series converges. (a) The root test. (b) The comparison test.

42. In each case observe that for the given series neither the root test nor the ratio test can be used to determine the convergence situation. Then determine the convergence or divergence of the series using an appropriate test.
(a) $\sum_{n=1}^{\infty} (-1)^n/\sqrt{n}$.
(b) $\sum_{n=1}^{\infty} n/(n^2 + 3)$.
(c) $\sum_{n=1}^{\infty} \bigl(2/((-1)^n - 3)\bigr)^n$.


43. Let $\{x_n\}$ be a sequence of nonzero real numbers. Prove that
$$\liminf_{n \to \infty} \left|\frac{x_{n+1}}{x_n}\right| \le \liminf_{n \to \infty} \sqrt[n]{|x_n|} \le \limsup_{n \to \infty} \sqrt[n]{|x_n|} \le \limsup_{n \to \infty} \left|\frac{x_{n+1}}{x_n}\right|.$$
Use the above inequalities to show that if $\lim_{n \to \infty} |x_{n+1}/x_n|$ exists and is equal to $L$, then $\lim_{n \to \infty} \sqrt[n]{|x_n|}$ also exists and is equal to $L$.

44. The above exercise suggests that the root test can be used in a wider framework than the ratio test. In other words, the above exercise shows that when the ratio test can be applied to determine the convergence situation of a series, the root test can be used as well. To see that the converse of this is not true, define a sequence $\{a_n\}$ as follows. Let $a_n = (1/2)^{(n+1)/2}$ when $n$ is odd, and let $a_n = (1/3)^{n/2}$ if $n$ is even. Find the limit superior and limit inferior of the sequences $\{a_{n+1}/a_n\}$ and $\{\sqrt[n]{a_n}\}$. Then verify that the ratio test cannot be used to determine the convergence situation of the series $\sum_{n=1}^{\infty} a_n$, but the root test shows that the series converges.

45. Determine the interval and the radius of convergence for each of the following power series.
(a) $\sum_{n=2}^{\infty} n^{\ln n} x^n$.
(b) $\sum_{n=2}^{\infty} (3x)^n / \ln n$.
(c) $\sum_{n=0}^{\infty} x^n / 3^{n^2}$.
(d) $\sum_{n=1}^{\infty} (-1)^n x^{2n}$.

46. Expand the function
$$f(x) = \frac{1}{2 - x}$$
in a power series (a) centered at 0, (b) centered at 1. In each case determine the interval on which the expansion is valid.
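The limits claimed in Exercises 20 and 21 can be checked numerically before attempting a proof. The following is only a sketch: the helper name `iterate` and the choice of 40 terms are assumptions made for illustration, and a floating-point experiment does not replace the Cauchy argument the exercises ask for.

```python
import math

def iterate(step, start, n_terms):
    """Return the first n_terms terms of x_1 = start, x_{k+1} = step(x_k)."""
    terms = [start]
    for _ in range(n_terms - 1):
        terms.append(step(terms[-1]))
    return terms

# Exercise 20: a_1 = 2, a_{n+1} = 2 + 1/a_n; the claimed limit is 1 + sqrt(2).
a = iterate(lambda t: 2 + 1 / t, 2.0, 40)

# Exercise 21: x_1 = 1, x_{n+1} = (4 + 3 x_n)/(3 + 2 x_n); the claimed limit is sqrt(2).
x = iterate(lambda t: (4 + 3 * t) / (3 + 2 * t), 1.0, 40)

# Distances from the claimed limits after 40 terms.
err_a = abs(a[-1] - (1 + math.sqrt(2)))
err_x = abs(x[-1] - math.sqrt(2))
print(err_a, err_x)
```

Since every term of the first sequence is at least 2, consecutive differences shrink by a factor of at most 1/4, which is the hypothesis of Exercise 19 with α = 1/4.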

Chapter 3

Limit and Continuity of Real Functions

Functions are, without doubt, the most important objects of study in calculus. The vast theory of functions of a real variable studied in calculus contains such crucial notions as limit, continuity, differentiation, and integration. In this chapter we focus on the first two notions and postpone the study of the remaining ones to the next chapters. As usual, we begin with some questions to motivate our presentation.

(3.a) Consider the function f defined by f(x) = x if 0 ≤ x ≤ 1 and f(x) = 3 when x = 2. Can we find a limit for f at 2? Is it meaningful to talk about the continuity or discontinuity of f at 2? Note that 2 is completely separated from the other elements of the domain of f, as can be seen in Figure 1 below. Functions of this kind are not usually considered in calculus when limit and continuity of functions are concerned.

Figure 1. The graph of the function f .

(3.b) Is it possible to find a function that fails to have a limit at every a ∈ R?


(3.c) Given a function f defined on some interval I, which properties guarantee that f has one-sided limits at every point of this interval? (3.d) Do continuous functions map Cauchy sequences onto Cauchy ones? The above questions are among what we will discuss in this chapter. Question (3.a) will be answered in Sections 3.1, 3.2, and 3.5. It motivates us to revise the definition of limit and continuity which is usually taught in calculus. More precisely, in Section 3.1 we recall the calculus-based definition of limit and observe that it can be strengthened using the new concept of limit point. In the remainder of Section 3.1 limit points are studied together with some other related classes of points, and the strengthened definition of limit will be presented in Section 3.2. Of course, we will see in this section that for the above-mentioned function, it is meaningless to speak of a limit at 2, even with the strengthened definition of limit. In Section 3.2, the notion of limit is studied in detail. As a result of our explorations in this section, we answer question (3.b) in the affirmative by introducing Dirichlet’s function. Section 3.3 studies the concept of limit at infinity. The material of this section is the same as what is usually taught in calculus, except here we emphasize the mathematical rigor more. In Section 3.4, we define one-sided limits in the spirit of the strengthened definition of limit introduced in Section 3.2. The most important result of this section guarantees the existence of one-sided limits for monotone functions, a result that provides an answer to question (3.c). Section 3.5 is devoted to a more general definition of continuity and a discussion of discontinuities. We will see in this section, as a result of our definition of continuity, that the function f of question (3.a) is continuous at 2! (Compare this with the aforementioned fact that f fails to have a limit at 2.) 
In Section 3.6, we study those functions that are continuous on a closed and bounded interval [a, b]. This section contains the important theorems one learns in calculus together with their rigorous proofs, namely, the intermediate value and the extreme value theorems. Finally in Section 3.7, we show that question (3.d) has a negative answer. Then we introduce a condition stronger than continuity, which is called uniform continuity. It will be shown, among other things, that uniformly continuous functions map Cauchy sequences onto Cauchy sequences.

3.1. Limit Points and Some Other Classes of Points in R

To begin with, reconsider question (3.a). The domain of the function f is F := [0, 1] ∪ {2}, and the question asks us about limit and continuity of f at the point 2. The first part of the question was about the possibility of determining a limit for f as x approaches 2. To answer this, let us recall the definition of limit which is usually taught in calculus. Recall that for arbitrary δ > 0, the deleted neighborhood of a ∈ R with radius δ is the set of all real numbers, except a itself, whose distance from a is less than δ. Since the distance between the real numbers x and a is given


by $d_e(x, a) = |x - a|$, this is just the $\delta$-neighborhood $N_\delta(a)$ from which $a$ is removed, that is, $(a - \delta, a) \cup (a, a + \delta)$. Now, if $L$ is a real number, $E$ is a subset of $\mathbb{R}$ which contains a deleted neighborhood of $a$, and $f$ is a function from $E$ into $\mathbb{R}$, the calculus-based definition of limit writes $\lim_{x \to a} f(x) = L$ if the following statement is true.

(LD) For every $\varepsilon > 0$, there exists $\delta > 0$ such that $x \in E$ and $0 < |x - a| < \delta$ imply $|f(x) - L| < \varepsilon$.

(Here, LD is used as an abbreviation for limit definition.)

What does (LD) say? The statement (LD) says that we can make the value $f(x)$ as close to $L$ as we wish, provided that $x$ is sufficiently close to $a$. The assumption that $f$ is defined in a deleted neighborhood of $a$ ensures that for every $\delta > 0$, and in particular for the $\delta$ we found for $\varepsilon$ in (LD), at least one $x \in E$ satisfying $0 < |x - a| < \delta$ can be found. In fact, if $\gamma > 0$ is such that the deleted neighborhood of $a$ with radius $\gamma$ lies entirely in $E$, then
$$x = a + \frac{1}{2}\min\{\gamma, \delta\}$$
is one such element of $E$. This is illustrated in Figure 2 below, where it is assumed that $\gamma = \min\{\gamma, \delta\}$. It will be instructive to draw an illustration for the case $\delta < \gamma$.

Figure 2. Since (a, a + γ) ⊂ E, x ∈ E.

In summary, when E contains a deleted neighborhood of a, (LP) every neighborhood of a contains some element of E other than a, and this property is important in the definition of limit we discussed above. (We used LP as the abbreviation of limit point.) Our aim is to strengthen the calculus-based definition of limit by replacing those a ∈ R of which E contains a deleted neighborhood by points a ∈ R that satisfy the property (LP). This replacement indeed strengthens the definition because of the following simple observation: if a = 0 or a = 1, then every neighborhood of a contains some element of F other than a, but nevertheless, no deleted neighborhood of a is entirely contained in F . This can be seen for a = 0 in Figure 3.

Figure 3. The points in (−δ, 0) lie outside F .

With this aim in mind, it is convenient to give a specific name to those points for which (LP) is true.


Definition 3.1. Let $E$ be a subset of $\mathbb{R}$, and let $a \in \mathbb{R}$. We say that $a$ is a limit point of $E$ if for every $\delta > 0$, the $\delta$-neighborhood of $a$ contains some element of $E$ other than $a$. The set of all limit points of $E$ will be denoted by $E'$, the derived set of $E$.

More precisely, $a$ is a limit point of $E$ if for every $\delta > 0$, $N_\delta(a) \cap (E \setminus \{a\}) \neq \emptyset$.

The strengthened definition of limit can be presented right now, using the notion of limit point, but we postpone it to the next section. This is because we devote the remainder of the current section to a study of limit points and some other related classes of points for subsets of $\mathbb{R}$.

As a first illustration, let us look for the limit points of $F$, the domain of the function $f$ we considered in question (3.a). Which points of $\mathbb{R}$ are limit points for this set? Certainly, these are the points at which $f$ may tend to a limit. To simplify our investigation, let us classify the elements of $\mathbb{R}$. An element $x$ of $\mathbb{R}$ lies in one and only one of the following sets.

(1) $A = (0, 1)$. If $x \in A$, then for every $\delta > 0$, $x + \frac{1}{2}\min\{1 - x, \delta\}$ is an element of $N_\delta(x) \cap (F \setminus \{x\})$. So, $x$ is a limit point for $F$. Of course, more can be said for the elements of $A$: for every $x \in A$, there exists $\varepsilon > 0$ such that $N_\varepsilon(x) \subseteq A \subset F$. This can easily be seen by letting
$$\varepsilon = \frac{1}{2}\min\{1 - x, x\}.$$
We may describe this by saying that every $x \in A$ is an interior point of $F$. This is because by the inclusion $N_\varepsilon(x) \subset F$ we find that not only is $x$ an element of $F$, but every $y \in \mathbb{R}$ which is sufficiently close to $x$ also lies in $F$.

(2) $B = \{0, 1\}$. If $y \in B$, that is, $y = 0$ or $y = 1$, we have already verified that $y$ is a limit point of $F$. Such a point $y$ also satisfies a further property: for every $\varepsilon > 0$, the neighborhood $N_\varepsilon(y)$ contains elements from both $F$ and $F^c$. To see this, note that for a given $\varepsilon$, if $y = 0$, then
$$\frac{1}{2}\min\{1, \varepsilon\} \in N_\varepsilon(y) \cap F \quad \text{and} \quad -\frac{\varepsilon}{2} \in N_\varepsilon(y) \cap F^c,$$
while for $y = 1$,
$$1 - \frac{1}{2}\min\{1, \varepsilon\} \in N_\varepsilon(y) \cap F \quad \text{and} \quad 1 + \frac{1}{4}\min\{\varepsilon, 2\} \in N_\varepsilon(y) \cap F^c.$$
We describe the additional property of such points $y$ by saying that they are boundary points of $F$: if you stand on $y = 0$, going one step to the right you enter the set $F$, and going one step to the left you go outside this set, just like when you stand on the border (or boundary) of a country. A similar statement can be made for $y = 1$, with right replaced by left and vice versa.


(3) $C = \{2\}$. If $z \in C$, that is, $z = 2$, we can easily find $\delta > 0$ such that $N_\delta(z) \cap F = \{2\}$, and this says that $z$ cannot be a limit point of $F$. For this, just set $\delta = 1/2$. Note that this tells us solely that we cannot speak of the limit of $f$ at 2, and it answers the first part of question (3.a) in the negative. The point $z = 2$ has two additional properties. First, it is a boundary point for $F$: for every $\varepsilon > 0$, $2 \in N_\varepsilon(z) \cap F$, and $2 + \varepsilon/2 \in N_\varepsilon(z) \cap F^c$. Second, it is an isolated point of $F$, in the sense we now describe. As we mentioned before, $z = 2$ is completely separated from the other elements of $F$, and this can be made precise by using neighborhoods: the neighborhood $N_{1/2}(z)$ contains no element of $F$ other than 2 itself, and it therefore isolates 2 from the remaining points of $F$.

(4) $F^c$. For $x \in F^c$, we may consider three cases, as follows.
• $x < 0$. Let $\lambda = |x|/2$.
• $1 < x < 2$. Let $\lambda = \frac{1}{2}\min\{x - 1, 2 - x\}$.
• $x > 2$. Let $\lambda = \frac{1}{2}(x - 2)$.
Then in each case,
$$N_\lambda(x) \cap F = \emptyset, \tag{3.1}$$
showing that $x$ is not a limit point of $F$. Note that for $x \in F^c$, (3.1) shows that $N_\lambda(x) \subseteq F^c$. This means, like what we discussed in case (1), that each $x \in F^c$ is an interior point of this set. Since such points lie in the interior of the complement of $F$, we say that they are exterior points for $F$.

As a result of the above argument, we see that the derived set of $F$ is $F' = [0, 1]$. The elements of this set are those points of $\mathbb{R}$ at which we may compute a limit for the associated function $f$. The argument also made us familiar with some other classes of points in $\mathbb{R}$, which we now formally define.

Definition 3.2. Let $E$ be a subset of $\mathbb{R}$, and let $a$ be an element of $\mathbb{R}$. We say that $a$ is
• an interior point of $E$, if there exists $\varepsilon > 0$ such that $N_\varepsilon(a) \subseteq E$;
• a boundary point of $E$, if for every $\varepsilon > 0$, $N_\varepsilon(a)$ intersects both $E$ and $E^c$;
• an isolated point of $E$, if there exists $\varepsilon > 0$ such that $N_\varepsilon(a) \cap E = \{a\}$;
• an exterior point for $E$, if $a$ is an interior point of $E^c$.

We will denote by $E^\circ$ and $\partial E$ the sets of all interior and boundary points of $E$, respectively. We refer to the sets $E^\circ$ and $\partial E$ as the interior and boundary of $E$, respectively. For example, for the set $F$ we discussed above, $F' = [0, 1]$, $F^\circ = (0, 1)$, and $\partial F = \{0, 1, 2\}$. The only isolated point of $F$ is 2, and the set of all exterior points of $F$ is $F^c$.
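The classification of the points of R relative to F = [0, 1] ∪ {2} can be mirrored in a few lines of code. This is a minimal sketch under a restrictive assumption that is not part of the text: the set is given as a finite list of pairwise disjoint, non-touching closed intervals, with a single point encoded as an interval of length zero. The function name `classify` and the string labels are hypothetical.

```python
def classify(x, pieces):
    """Classify x relative to E, the union of the closed intervals in `pieces`.

    Assumption: `pieces` is a list of pairs (lo, hi) with lo <= hi, pairwise
    disjoint and non-touching; lo == hi encodes a single point.
    """
    in_set = any(lo <= x <= hi for lo, hi in pieces)
    interior = any(lo < x < hi for lo, hi in pieces)
    # Under the assumption above, x is a limit point of E exactly when it
    # lies in a piece of positive length.
    limit = any(lo <= x <= hi and lo < hi for lo, hi in pieces)

    labels = set()
    if interior:
        labels.add("interior")
    if limit:
        labels.add("limit")
    if in_set and not limit:
        labels.add("isolated")
    # E is closed here, so its boundary is E minus its interior.
    if in_set and not interior:
        labels.add("boundary")
    # E is closed here, so every point outside E is an exterior point.
    if not in_set:
        labels.add("exterior")
    return labels

F = [(0, 1), (2, 2)]  # the set F = [0, 1] ∪ {2} of question (3.a)
for point in (0.5, 0, 1, 2, 1.5, -1):
    print(point, sorted(classify(point, F)))
```

The shortcuts in the comments rely on E being a finite union of closed intervals; for a general subset of R one has to return to the neighborhood conditions of Definitions 3.1 and 3.2.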


Why do we study such various classes of points? The above-mentioned classes of points deserve careful study for at least two reasons. First, they are, in some sense, related to limit points. For example, we will see shortly that every interior point of a set $E \subseteq \mathbb{R}$ is a limit point of $E$, that each element of a subset of $\mathbb{R}$ is either a limit point or an isolated point of it, etc. Second, they show the way one may use distance to classify the points of a space and even determine their geometric position by calling them interior or boundary points. This aspect will also serve as a motivation for the abstraction which will lead to the definition of metric spaces in Chapter 6.

Theorem 3.3. Let $E$ be a subset of $\mathbb{R}$.
(1) Every interior point of $E$ is a limit point of $E$.
(2) A limit point of $E$ which belongs to $E^c$ is necessarily a boundary point of $E$.
(3) A boundary point of $E$ which belongs to $E^c$ is a limit point of $E$.
(4) If $x$ is an element of $E$, then $x$ is either a limit point or an isolated point of $E$.
(5) If $x \in E$, then $x$ is either an interior point or a boundary point of $E$.
(6) $E^\circ \cap \partial E = \emptyset$.

Proof. (1) If $x$ is an interior point of $E$, then there exists $\varepsilon > 0$ such that $N_\varepsilon(x) \subseteq E$. If $\delta > 0$ is given, then $y = x + \frac{1}{2}\min\{\varepsilon, \delta\}$ is an element of $N_\delta(x)$, other than $x$, that lies in $N_\varepsilon(x)$, and hence in $E$. So, $x$ is a limit point of $E$.

(2) Let $y$ be a limit point of $E$ such that $y \in E^c$. To prove that $y$ is a boundary point of $E$, we must verify that every neighborhood of $y$ intersects both $E$ and $E^c$. But every neighborhood of $y$ contains an element of $E$ other than $y$, as $y \in E'$, and contains $y \in E^c$.

(3) If $z$ is a boundary point of $E$, every neighborhood of $z$ contains some element of $E$. Since $z \in E^c$, this element is not the same as $z$. So, $z$ is a limit point of $E$.

(4) If $x$ is not a limit point of $E$, then there exists $\delta > 0$ such that $N_\delta(x) \cap E = \{x\}$. This shows that $x$ is an isolated point of $E$.

(5) If $V$ is a neighborhood of $x$, then $V$ contains at least one element of $E$, namely, $x$ itself. If we assume that $x$ is not an interior point of $E$, then $V$ must contain at least one element of $E^c$. This shows that $x$ is a boundary point of $E$.

(6) This is an immediate consequence of the definition.

Below we discuss some points which complement the above theorem. The proofs of some of the claims can be found in the examples that follow. You can consider the proofs of the remaining ones as exercises.

• For every interval with endpoints $a$ and $b$, such as $[a, b]$ and $(a, b)$, $a$ and $b$ are limit points. However, they are not interior points of the set. Compare this with Theorem 3.3(1).


• In any interval of the form $[a, b]$, $a$ is a boundary point which is also a limit point. This point is, however, an element of the set itself, not an element of its complement. Compare this with Theorem 3.3(2).
• For each interval $E := [a, b]$ with $a < b$, $(b + a)/2$ is a limit point. It is neither a boundary point nor an element of $E^c$. It will be instructive to compare this with Theorem 3.3(3).
• An interior point of a set is, by the definition, an element of the set. This is also the case for isolated points. However, limit points and boundary points of a set may or may not belong to the set itself. For instance, $b$ is a limit point and also a boundary point of the interval $[a, b)$, while it is not an element of this set.

Example 3.4. In each case, find the sets $A'$, $A^\circ$, and $\partial A$. Does $A$ have any isolated points? (1) $A = \mathbb{Q}$. (2) $A = [a, b]$.

Solution. (1) If $x \in \mathbb{R}$ is arbitrary and $\varepsilon > 0$, then Theorem 1.49 (the density of rational numbers in $\mathbb{R}$) gives us some $p \in \mathbb{Q} \cap (x, x + \varepsilon)$, so that $N_\varepsilon(x) \cap (\mathbb{Q} \setminus \{x\})$ is not empty. Thus, $x$ is a limit point of $\mathbb{Q}$. Consequently, $\mathbb{Q}' = \mathbb{R}$. This shows that the derived set of a set may contain it strictly! If $q$ is an element of $\mathbb{Q}$, then every neighborhood of $q$ contains some irrational number. Thus $q$ is not an interior point of $\mathbb{Q}$. This proves that $\mathbb{Q}^\circ = \emptyset$. Since every neighborhood of a real number contains both rational and irrational numbers, $\partial \mathbb{Q} = \mathbb{R}$. The set $\mathbb{Q}$ has no isolated points because all its elements are limit points.

(2) We claim that $A' = A$. We prove this by considering the following steps.
• We show that $a$ and $b$ are limit points of $A$. To see that $a$ is a limit point of $A$, note that for given $\varepsilon > 0$, $x = a + \frac{1}{2}\min\{\varepsilon, b - a\}$ lies in $N_\varepsilon(a) \cap A$. This is because
$$a < a + \frac{1}{2}\min\{\varepsilon, b - a\} \le a + \frac{1}{2}(b - a) < a + (b - a) = b.$$
[...] $\varepsilon > 0$


is given, $a - \varepsilon/2 \in N_\varepsilon(a) \cap A^c$ and $a + \frac{1}{2}\min\{\varepsilon, b - a\} \in N_\varepsilon(a) \cap A$. Our above arguments show that the other elements of $\mathbb{R}$ cannot be boundary points of $A$. Hence $\partial A = \{a, b\}$. Since each $x \in (a, b)$ is an interior point of $A$ and $a$ and $b$ are boundary points of this set, $A^\circ = (a, b)$. The set $A$ has no isolated points as all its elements are limit points.

Remark 3.5. Some parts of the above example may be confusing for you. For example, in our proof that every $x \in (a, b)$ is an interior point of $[a, b]$, you may have had trouble understanding the reason we chose
$$\varepsilon = \frac{1}{2}\min\{x - a, b - x\}.$$
If this is the case for you, we have to add some intuition. Consider an arbitrary $x \in (a, b)$, as in Figure 4. To find $\varepsilon > 0$ satisfying $N_\varepsilon(x) \subset [a, b]$, we need to choose $\varepsilon$ with the following property. Going $\varepsilon$ centimeters (meters or whatever) from $x$ to the right (resp., left), we should not reach $b$ (resp., $a$).

Figure 4. Find ε > 0 such that Nε (x) ⊆ [a, b].

This tells us that $\varepsilon$ must be less than the distance $x$ has from each of $a$ and $b$. The numbers $x - a$ and $b - x$ represent these distances, respectively. The multiplier $1/2$ is used to ensure that $\varepsilon$ is less than both $d_e(x, a)$ and $d_e(x, b)$. It could be replaced by any real number $0 < r < 1$.

Some Results Concerning Limit Points. Possibly limit points are so named because they appear naturally when we study the limit of real functions. The following proposition presents another reason for the usage of this name.

Proposition 3.6. Let $E$ be a subset of $\mathbb{R}$, and let $a \in \mathbb{R}$. Then $a$ is a limit point of $E$ if and only if there exists a sequence $\{a_n\}$ in $E \setminus \{a\}$ such that $\lim_{n \to \infty} a_n = a$.

Proof. Let $a$ be a limit point of $E$. For every $n \in \mathbb{N}$, choose $a_n \in E$ such that $0 < |a_n - a| < 1/n$. Then it is clear that $\{a_n\}$ is a sequence in $E \setminus \{a\}$ that converges to $a$. To prove the converse, let $\{a_n\}$ be a sequence in $E \setminus \{a\}$ that converges to $a$. If $\delta > 0$ is arbitrary, then there exists $N \in \mathbb{N}$ such that for every $n \ge N$, $a_n$ lies in the $\delta$-neighborhood of $a$. In particular, $a_N$ satisfies $0 < |a_N - a| < \delta$. This shows that $a$ is a limit point of $E$.

What does Proposition 3.6 say? Proposition 3.6 says that $a$ is a limit point of $E$ if and only if it is the limit of a sequence of elements of $E$ which are not equal to $a$.

Although our definition of limit point just implies that $N_\delta(a) \cap (E \setminus \{a\})$


is not empty for every $\delta > 0$, our next result shows that when $a$ is a limit point of $E$, these sets are indeed infinite.

Proposition 3.7. If $a$ is a limit point of $E$, then every neighborhood of $a$ contains an infinite number of the elements of $E$.

Proof. Suppose $\varepsilon > 0$ is such that $N_\varepsilon(a) \cap (E \setminus \{a\})$ is a finite set, say $\{a_1, \ldots, a_n\}$. Let $r = \frac{1}{2}\min\{|a_j - a| : j = 1, \ldots, n\}$. Then, our choice of $r$ shows that $0 < r < \varepsilon$ and the $r$-neighborhood of $a$ contains no element of $E \setminus \{a\}$. This contradicts our assumption that $a$ is a limit point of $E$.

In particular, a finite subset of $\mathbb{R}$ has no limit points. The following exercise shows that this may also happen for infinite sets.

Exercise 3.8. Verify that the set $\mathbb{N}$ has no limit points.

Example 3.9. Find the derived set of $E = \{1/n : n \in \mathbb{N}\}$.

Solution. Since $\{1/n\}$ is a sequence in $E \setminus \{0\} = E$ that converges to zero, 0 is a limit point of $E$. We claim that 0 is the only limit point of $E$. To see this, let $x \neq 0$ be arbitrary. We have two cases for $x$, and we show in each case that $x$ is not a limit point of $E$.

$x > 0$. In this case find $N \in \mathbb{N}$ such that $1/N < x$. Then by letting $\varepsilon = \frac{1}{2}(x - 1/N)$ we find that the $\varepsilon$-neighborhood of $x$ cannot contain $1/n$ with $n \ge N$. This, in view of the above proposition, shows that $x$ is not a limit point of $E$.

$x < 0$. In this case, the $\varepsilon$-neighborhood of $x$ with $\varepsilon = |x|/2$ does not intersect $E$. This is because if $y \in N_\varepsilon(x)$, then $y < x + \varepsilon = x/2 < 0$, so that $y \notin E$. Thus, $x$ is not a limit point of $E$.

A note on the limit points and deleted neighborhoods. When $E$ contains a deleted neighborhood of $a$, $a$ is a limit point of $E$. We saw in the above example that the converse of this statement is not true: the point 0 is a limit point of $E = \{1/n : n \in \mathbb{N}\}$, but $E$ contains no deleted neighborhood of 0. Of course, we observed this fact earlier using the set $F = [0, 1] \cup \{2\}$.

Exercise 3.10. For the set $E$ of the above example, find the sets $E^\circ$ and $\partial E$.

We conclude this section with a result whose importance may not be understood until Chapter 7.

Theorem 3.11. For a set $A \subseteq \mathbb{R}$, the following conditions are equivalent.
(1) Every sequence in $A$ has a subsequence that converges to some element of $A$.
(2) $A$ is bounded and contains all its limit points.

Proof. (1) ⇒ (2). If $A$ is unbounded, then for each $n \in \mathbb{N}$, we find $x_n \in A$ such that $|x_n| > n$. Then $\{x_n\}$ is a sequence which has no convergent subsequences,


contradicting (1). So if (1) is true, then $A$ must be bounded. If $x$ is a limit point of $A$, then by Proposition 3.6 there exists a sequence $\{x_n\}$ in $A$ such that $x_n \neq x$ for every $n \in \mathbb{N}$ and $\lim_{n \to \infty} x_n = x$. By (1), $\{x_n\}$ has a subsequence that converges in $A$. Since $\{x_n\}$ converges to $x$, this subsequence must also converge to $x$, so that $x$ is an element of $A$.

(2) ⇒ (1). Let $\{x_n\}$ be a sequence in $A$. Since $A$ is bounded, $\{x_n\}$ has a subsequence $\{x_{n_k}\}$ that converges to some $x \in \mathbb{R}$, by the Bolzano–Weierstrass theorem. If for some $k \in \mathbb{N}$, $x_{n_k} = x$, then $x \in A$. Otherwise, $x$ is a limit point of $A$ by Proposition 3.6, and we find that $x \in A$ by (2).
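The choice of N and ε made in Example 3.9 for a point x > 0 can be tried out with exact rational arithmetic. The sketch below is only an illustration: the function names are hypothetical, N is taken to be the smallest natural number with 1/N < x (the example allows any such N), and the search over n is cut off at an arbitrary bound `n_max`.

```python
from fractions import Fraction
import math

def isolating_radius(x):
    """For x > 0 return (N, eps) with 1/N < x and eps = (x - 1/N)/2, as in Example 3.9."""
    N = math.floor(1 / x) + 1          # N > 1/x, hence 1/N < x
    eps = (x - Fraction(1, N)) / 2
    return N, eps

def elements_in_neighborhood(x, eps, n_max):
    """List the n <= n_max for which 1/n lies in the eps-neighborhood of x."""
    return [n for n in range(1, n_max + 1) if abs(Fraction(1, n) - x) < eps]

for x in (Fraction(3, 10), Fraction(1, 3)):
    N, eps = isolating_radius(x)
    hits = elements_in_neighborhood(x, eps, 1000)
    # Every index found is below N, so only finitely many 1/n are near x.
    print(x, N, eps, hits)
```

For x = 1/3 the neighborhood still contains x itself (the index n = 3), which is consistent with the example: only finitely many elements of E lie near x, so by Proposition 3.7 the point x is not a limit point of E.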

3.2. A More General Definition of Limit

We are now ready to present the strengthened definition of limit, namely, the one that is based on the notion of limit point.

Definition 3.12. Let $f : E \to \mathbb{R}$ be a function, and let $a$ be a limit point of the set $E \subseteq \mathbb{R}$. If $L$ is a real number, then we write $\lim_{x \to a} f(x) = L$ if the following statement is true. For every $\varepsilon > 0$, there exists $\delta > 0$ such that $x \in E$ and $0 < |x - a| < \delta$ imply $|f(x) - L| < \varepsilon$.

Note that the last statement is just the statement (LD) of the previous section. The only difference is in the property that $a$ satisfies: here, $a$ is assumed to be a limit point of $E$. As we observed above, this assumption is weaker than the assumption that $E$ contains a deleted neighborhood of $a$.

As a first result, we prove that when a function has a limit at some point, the limit is indeed unique. This will be essential in the forthcoming examples, where we find the limit of some important functions.

Theorem 3.13. Let $f : E \to \mathbb{R}$ be a function, and let $a$ be a limit point of $E$. If $\lim_{x \to a} f(x) = L_1$ and $\lim_{x \to a} f(x) = L_2$, then $L_1 = L_2$.

A note on Theorem 3.13. The above theorem may be confusing for some students, because it seems evident that when $L_1$ and $L_2$ are equal to a single quantity, $\lim_{x \to a} f(x)$ for example, then we must have $L_1 = L_2$. If you are among such students, note that by $\lim_{x \to a} f(x) = L_1$ and $\lim_{x \to a} f(x) = L_2$ we mean that, as $x$ tends to $a$, $f(x)$ can be made arbitrarily close to both $L_1$ and $L_2$.

Proof of Theorem 3.13. To prove the desired result, it is enough to show that for every $\varepsilon > 0$, $|L_1 - L_2| < \varepsilon$. So, let $\varepsilon > 0$ be given. For $i = 1, 2$, choose $\delta_i > 0$ such that $x \in E \setminus \{a\}$ and $|x - a| < \delta_i$ imply $|f(x) - L_i| < \varepsilon/2$. If we let $\delta = \min\{\delta_1, \delta_2\}$, then $x \in E \setminus \{a\}$ and $|x - a| < \delta$ imply
$$|L_1 - L_2| \le |L_1 - f(x)| + |f(x) - L_2| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$
Such an $x$ exists because $a$ is a limit point of $E$. Since $\varepsilon$ was arbitrary, this shows that $L_1 = L_2$.


Example 3.14. We saw in Example 3.9 that 0 is a limit point of the set
$$E = \left\{\frac{1}{n} : n \in \mathbb{N}\right\}.$$
Thus, when $f$ is a function defined on $E$, we may think of the limit of $f$ at 0. In particular, consider the function $f : E \to \mathbb{R}$ defined by $f(q) = q^2 + 1$ for every $q \in E$. It seems that as $q$ approaches 0 in $E$, the values of $f$ tend to 1. To see this, let $\varepsilon > 0$ be given. We should find $\delta > 0$ such that $q \in E$ and $0 < |q - 0| < \delta$ imply $|f(q) - 1| < \varepsilon$. But the last inequality holds if and only if $q^2 < \varepsilon$. This suggests that we may choose $\delta = \sqrt{\varepsilon}$. Thus $\lim_{q \to 0} f(q) = 1$.

Note that the function $f$ of the above example is not defined in a deleted neighborhood of 0, so that it has no limit at 0 if we use the calculus-based definition of limit.

Exercise 3.15. Let $E$ be as in the above example, and define $f : E \to \mathbb{R}$ by (1) $f(q) = q^3 - 1$, (2) $f(q) = 2(q + 1)$, for every $q \in E$. In each case, find the limit of $f$ as $q$ approaches 0.

Example 3.16. Consider the function
$$f(x) = \sin\left(\frac{1}{x}\right),$$
and let $a = 0$. Discuss the existence of the limit of $f$ at $a$ if (1) $E = \{1/(n\pi) : n \in \mathbb{N}\}$, (2) $E = \{2/((4n + 1)\pi) : n \in \mathbb{N}\}$, and (3) $E = \mathbb{R} \setminus \{0\}$.

Solution. The sequences $\{1/(n\pi)\}$, $\{2/((4n + 1)\pi)\}$, and $\{1/n\}$ converge to 0. These observations in view of Proposition 3.6 tell us that 0 is a limit point of $E$ when $E$ is as in (1), (2), and (3), respectively.

(1) The function $f$ is constant on $E$, that is, $f(x) = 0$ for every $x \in E$. Hence it is clear that
$$\lim_{x \to a} f(x) = 0$$
in this case.

(2) The function $f$ is constant on $E$ with $f(x) = 1$ for every $x \in E$. This shows that
$$\lim_{x \to a} f(x) = 1$$
in this case.

(3) It can be shown, similarly to Example 3.22 below, that $f$ cannot tend to a limit when it is defined on $E$.
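The first two cases of Example 3.16 are easy to observe numerically. The sketch below evaluates sin(1/x) on the first 50 points of each domain; the cutoff of 50 and the variable names are assumptions made for illustration, and the tiny deviations that appear come from floating-point rounding, not from the function.

```python
import math

def f(x):
    """The function of Example 3.16."""
    return math.sin(1 / x)

# Case (1): points 1/(n*pi), where sin(1/x) = sin(n*pi) = 0.
E1 = [1 / (n * math.pi) for n in range(1, 51)]
# Case (2): points 2/((4n+1)*pi), where sin(1/x) = sin(2*n*pi + pi/2) = 1.
E2 = [2 / ((4 * n + 1) * math.pi) for n in range(1, 51)]

max_dev_E1 = max(abs(f(x)) for x in E1)      # largest distance from 0 on E1
max_dev_E2 = max(abs(f(x) - 1) for x in E2)  # largest distance from 1 on E2
print(max_dev_E1, max_dev_E2)
```

Both sets accumulate at 0, yet the same formula has the constant value 0 on one and 1 on the other, which is why no single limit can exist on R\{0}.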


What does Example 3.16 say? Example 3.16 says that the limit of a function $f$ at $a$ depends strongly on the domain we consider for $f$, and changing the domain may change the existence or the value of the limit.

When a function $f$ is defined on $\mathbb{R}$, the existence of the limit of $f$ at every $a \in \mathbb{R}$ can be examined. This is because each such $a$ is clearly a limit point of $\mathbb{R}$.

Example 3.17. Show that for every $a \in \mathbb{R}$, (1) $\lim_{x \to a} c = c$, where $c$ is an element of $\mathbb{R}$; (2) $\lim_{x \to a} x = a$; (3) $\lim_{x \to a} \sin x = \sin a$.

Solution. (1) Let $f(x) = c$ for every $x \in \mathbb{R}$. The desired result means that the limit of the constant function $f$ at every point $a \in \mathbb{R}$ is $c$. But this is obvious because for every $x \in \mathbb{R}$, $|f(x) - c| = 0$, which is less than any given $\varepsilon > 0$.

(2) Let $f(x) = x$ for every $x \in \mathbb{R}$. Since it is always true that $|f(x) - a| = |x - a|$, once $\varepsilon > 0$ is given, we may choose $0 < \delta < \varepsilon$ and observe that $x \in \mathbb{R}$ and $0 < |x - a| < \delta$ imply $|f(x) - a| < \varepsilon$. This proves the desired result.

(3) It is clear that every $a \in \mathbb{R}$ is a limit point of $\mathbb{R}$, the domain of the sine function. We use the inequality
$$|\sin x| \le |x| \tag{3.2}$$
and the identity
$$\sin x - \sin a = 2 \sin\frac{x - a}{2} \cos\frac{x + a}{2}, \tag{3.3}$$
which are valid for every $x$ and $a$ in $\mathbb{R}$. It follows from (3.3) and (3.2) that
$$|\sin x - \sin a| \le 2\left|\sin\frac{x - a}{2}\right| \le |x - a|.$$
So given $\varepsilon > 0$, if we choose $0 < \delta < \varepsilon$, then $x \in \mathbb{R}$ and $0 < |x - a| < \delta$ imply $|\sin x - \sin a| < \varepsilon$.

Exercise 3.18. Show that for every $a \in \mathbb{R}$, $\lim_{x \to a} \cos x = \cos a$.

Example 3.19. For $n \ge 2$ in $\mathbb{N}$ define a function $f_n$ on $[0, +\infty)$ by $f_n(x) = \sqrt[n]{x}$. Show that $\lim_{x \to 1} f_n(x) = 1$.

Solution. The domain of the involved function is $[0, +\infty)$, of which 1 is a limit point. Since
$$x - 1 = \left(\sqrt[n]{x} - 1\right)\left(\sqrt[n]{x^{n-1}} + \sqrt[n]{x^{n-2}} + \cdots + \sqrt[n]{x} + 1\right)$$
and
$$\sqrt[n]{x^{n-1}} + \sqrt[n]{x^{n-2}} + \cdots + \sqrt[n]{x} + 1 \ge 1,$$
we see that
$$\left|\sqrt[n]{x} - 1\right| = \frac{|x - 1|}{\sqrt[n]{x^{n-1}} + \sqrt[n]{x^{n-2}} + \cdots + \sqrt[n]{x} + 1} \le |x - 1|.$$


If for a given $\varepsilon > 0$, we choose $0 < \delta < \varepsilon$, then $x \ge 0$ and $0 < |x - 1| < \delta$ imply $|\sqrt[n]{x} - 1| < \varepsilon$. This proves the desired result.

Limit and Boundedness. The assumption that $a$ is a limit point of the domain of a function $f$ by no means guarantees that $f$ has a limit at this point. To see this, we prove a simple result that relates limit to boundedness. A function $f$ is said to be bounded on a subset $S$ of its domain if $f(S) = \{f(x) : x \in S\}$ is a bounded subset of $\mathbb{R}$. Obviously, this is equivalent to the existence of a real number $M > 0$ such that $|f(x)| < M$ for every $x \in S$.

Proposition 3.20. Let $f : E \to \mathbb{R}$ be a function, and let $a$ be a limit point of $E$. If $\lim_{x \to a} f(x) = L$, then there exists $\delta > 0$ such that $f$ is bounded on the set $N_\delta(a) \cap (E \setminus \{a\})$.

Proof. Since $\lim_{x \to a} f(x) = L$, for $\varepsilon = 1$ we find $\delta > 0$ such that $x \in E$ and $0 < |x - a| < \delta$ imply $|f(x) - L| < 1$, or equivalently, $L - 1 < f(x) < L + 1$. This proves that $f$ is bounded on $N_\delta(a) \cap (E \setminus \{a\})$.



So, when a function is not bounded in any deleted neighborhood of $a$, it certainly fails to have a limit at this point. The following example illustrates this situation.

Example 3.21. The function $f(x) = 1/x$ is defined on $\mathbb{R} \setminus \{0\}$. We show that $f$ is not bounded in any deleted neighborhood of 0. Once this is proved, it follows from the above proposition that $f$ has no limit at 0. Let $\delta > 0$ be given, and consider the deleted neighborhood of 0 with radius $\delta$. To show that $f$ is not bounded in this deleted neighborhood, we should verify that for every $M > 0$, $x$ can be found in the neighborhood such that $|1/x| > M$. But this is always possible as we may choose
$$0 < x < \min\left\{\delta, \frac{1}{M}\right\};$$
see Figure 5.

Of course, a function may be bounded in every deleted neighborhood of $a$ but fail to have a limit at this point.

Example 3.22. The function $g(x) = \cos(1/x)$ is bounded in any deleted neighborhood of 0 because $|\cos(1/x)| \le 1$ for each $x \neq 0$. Nevertheless, we can show that $g$ fails to have a limit at 0. To see this, assume that $\lim_{x \to 0} \cos(1/x)$ exists and is equal to $L$. Then for $\varepsilon = 1$, $\delta > 0$ can be found such that $0 < |x| < \delta$ implies
$$\left|\cos\frac{1}{x} - L\right| < 1. \tag{3.4}$$
Now find an even natural number $n$ such that $1/n < \pi\delta$. Then letting $x = 1/(n\pi)$, we obtain from (3.4) that $|1 - L| < 1$

116

3. Limit and Continuity of Real Functions

Figure 5. The case 1/M < δ.

or equivalently (3.5)

0 < L < 2.

Similarly, by setting x = 1/((n + 1)π), we deduce from (3.4) that (3.6)

−2 < L < 0.

But no number L can satisfy (3.5) and (3.6) simultaneously. This shows that g cannot tend to a limit when x approaches zero. Geometrically, this can be seen in Figure 6. When we approach zero, the graph of y = cos(1/x) has so many oscillations that it prevents the function from tending to a unique limit. If we multiply the function g of the above example by the identity function h(x) = x, then the many oscillations will become very small near 0 and the resulting function hg may have a limit at 0; see Figure 7. Example 3.23. Find limx→0 x cos(1/x). Solution. The domain of the involved function is R\{0} of which 0 is a limit point. As can be seen in Figure 7, when x approaches 0, the values of this function

Figure 6. The graph of y = cos(1/x).


Figure 7. The graph of y = x cos(1/x).

tend to 0. To prove this fact analytically, let ε > 0 be given. Since for every nonzero x,

$\left|x\cos\frac{1}{x} - 0\right| = \left|x\cos\frac{1}{x}\right| \le |x| = |x - 0|,$

we see that by choosing some 0 < δ < ε, 0 < |x − 0| < δ implies |x cos(1/x) − 0| < ε. This proves that the given limit is equal to 0.

Limit, Algebraic Operations, and Order. As mentioned earlier, the functions considered in this chapter are all real functions. This means that the range of the functions is contained in R. Thus, the functions we study may be added to or multiplied by each other, and they may be compared using the order relation <. [...]

Let ε > 0 be given. First note that

$|(fg)(x) - L_1L_2| = |f(x)g(x) - L_1L_2| = |f(x)g(x) - f(x)L_2 + f(x)L_2 - L_1L_2| \le |f(x)|\,|g(x) - L_2| + |L_2|\,|f(x) - L_1|.$

Since f has a limit at a, Proposition 3.20 gives us some M > 0 and δ > 0 such that for every x ∈ Nδ(a) ∩ (E\{a}), |f(x)| < M. Therefore, if we let K = max{M, |L2|},


then for every x ∈ Nδ(a) ∩ (E\{a}),

$|(fg)(x) - L_1L_2| \le K\left(|g(x) - L_2| + |f(x) - L_1|\right).$

Now, find δ1 > 0 such that for every x ∈ Nδ1(a) ∩ (E\{a}),

$|f(x) - L_1| < \frac{\varepsilon}{2K},$

and δ2 > 0 such that for every x ∈ Nδ2(a) ∩ (E\{a}),

$|g(x) - L_2| < \frac{\varepsilon}{2K}.$

If we let γ = min{δ, δ1, δ2}, then for every x ∈ Nγ(a) ∩ (E\{a}), |(fg)(x) − L1L2| < ε. This completes the proof.

(4) It is enough, in view of (3), to prove that

(3.7) $\lim_{x\to a} \frac{1}{g(x)} = \frac{1}{L_2}.$

To see this, note that for every x,

(3.8) $\left|\frac{1}{g(x)} - \frac{1}{L_2}\right| = \frac{|g(x) - L_2|}{|g(x)|\,|L_2|}.$

Since $\lim_{x\to a} g(x) = L_2 \ne 0$, we can find δ1 > 0 such that for every x ∈ Nδ1(a) ∩ (E\{a}),

$|g(x) - L_2| < \frac{|L_2|}{2}.$

Since ||g(x)| − |L2|| ≤ |g(x) − L2|, we see that for all such x,

(3.9) $|g(x)| > \frac{|L_2|}{2}.$

Therefore, it follows from (3.8) and (3.9) that for every x ∈ Nδ1(a) ∩ (E\{a}),

$\left|\frac{1}{g(x)} - \frac{1}{L_2}\right| \le \frac{2\,|g(x) - L_2|}{|L_2|^2}.$

Now, find δ2 > 0 such that for every x ∈ Nδ2(a) ∩ (E\{a}),

$|g(x) - L_2| < \frac{\varepsilon |L_2|^2}{2}.$

Then, letting δ = min{δ1, δ2}, we see that for every x ∈ Nδ(a) ∩ (E\{a}),

$\left|\frac{1}{g(x)} - \frac{1}{L_2}\right| < \varepsilon.$
[...]

Lemma 3.27. Let f : E → R be a function, let a be a limit point of E, and let $\lim_{x\to a} f(x) = L$. If there exists δ > 0 such that f(x) ≥ 0 for every x ∈ Nδ(a) ∩ (E\{a}), then L ≥ 0.

Proof. Assume, to the contrary, that L < 0. Then for ε = −L, there exists λ > 0 such that x ∈ E and 0 < |x − a| < λ imply |f(x) − L| < −L or, equivalently, 2L = L − (−L) < f(x) < L + (−L) = 0. If x ∈ E\{a} is such that 0 < |x − a| < min{δ, λ}, then we deduce that f(x) ≥ 0 by the fact that 0 < |x − a| < δ, and that f(x) < 0 by the inequalities 0 < |x − a| < λ. This contradiction shows that L must be nonnegative.

What does Lemma 3.27 say? Lemma 3.27 says that a function which has nonnegative values near a cannot tend to a negative value at a. It is now easy to prove that limits respect the order.

Theorem 3.28. Let g and h be functions defined on E ⊆ R, let a be a limit point of E, and let $\lim_{x\to a} g(x) = L_1$ and $\lim_{x\to a} h(x) = L_2$. If there exists δ > 0 such that g(x) ≤ h(x) for every x ∈ Nδ(a) ∩ (E\{a}), then L1 ≤ L2.


Proof. Define a function f on E by f(x) = h(x) − g(x). Then f(x) ≥ 0 for every x ∈ Nδ(a) ∩ (E\{a}). Since

$\lim_{x\to a} f(x) = \lim_{x\to a} h(x) - \lim_{x\to a} g(x) = L_2 - L_1$

by Theorem 3.24, the above lemma yields L2 − L1 ≥ 0, which is the desired result.

What does Theorem 3.28 say? Theorem 3.28 says that when the values of g are not greater than those of h near a, the limit of g at a cannot be greater than that of h at a.

Limit of Functions and Limit of Sequences. Let a be a limit point of the domain E of a function f. We saw in Proposition 3.6 that there exists a sequence {an} in E\{a} that converges to a. If f has a limit L at a, it seems that the sequence {f(an)} also converges to L. This is not surprising, since the terms an tend to a, and when the independent variable approaches a, the values of f tend to L. The following theorem establishes this fact.

Theorem 3.29. Let f : E → R be a function, and let a be a limit point of E. Then, the following are equivalent.
(1) $\lim_{x\to a} f(x) = L$.
(2) For every sequence {an} in E\{a} that converges to a, $\lim_{n\to\infty} f(a_n) = L$.

Proof. (1) ⇒ (2). Let {an} be a sequence in E\{a} that converges to a. We show that $\lim_{n\to\infty} f(a_n) = L$. To see this, let ε > 0 be given. By (1), there exists δ > 0 such that x ∈ E and 0 < |x − a| < δ imply |f(x) − L| < ε. Since $\lim_{n\to\infty} a_n = a$, we can find N ∈ N such that for every n ≥ N, 0 < |an − a| < δ. Hence for every n ≥ N, |f(an) − L| < ε, proving the desired result.

(2) ⇒ (1). Assume that (1) is not true. Then there exists ε > 0 such that for every δ > 0, x ∈ E can be found with 0 < |x − a| < δ and |f(x) − L| ≥ ε. For every n ∈ N, let an be an element of E which is associated to δ = 1/n in this way. Then {an} is a sequence in E\{a} that converges to a, but {f(an)} does not converge to L, since |f(an) − L| ≥ ε for every n. This shows that (2) is not true. The proof is now finished by the contrapositive law.

Remark 3.30. Using the above theorem, we can present an easier proof of Theorem 3.24. To see how, let f, g, E, a, L1, and L2 be as in the theorem, and let {an} be a sequence in E\{a} that converges to a. Then,

$\lim_{n\to\infty} (fg)(a_n) = \left(\lim_{n\to\infty} f(a_n)\right)\left(\lim_{n\to\infty} g(a_n)\right) = L_1 L_2.$

This proves Theorem 3.24(3). The remaining parts of the theorem can also be proved by a similar method.

Example 3.31. Let f be the function $f(q) = 2 + q^4$ defined on Q. Since √2 is a limit point of Q by Example 3.4, we may think of the limit of f as q approaches √2. To compute this, let {qn} be a sequence of rational numbers that converges to √2 (note that at least one such sequence exists by the fact that √2 is a limit point of Q and Proposition 3.6). Then,

$\lim_{n\to\infty} f(q_n) = \lim_{n\to\infty} (2 + q_n^4) = 2 + (\sqrt{2})^4 = 6.$

Thus, Theorem 3.29 tells us that $\lim_{q\to\sqrt{2}} f(q) = 6$. Note that in the above example, the domain Q of f contains no deleted neighborhood of √2, and the limit has no meaning, accordingly, if we use the calculus-based definition of limit.

Example 3.32. Prove that for every a > 0, $\lim_{x\to a} \ln x = \ln a$.

Solution. We know that each a > 0 is a limit point of (0, +∞). Let {xn} be a sequence in (0, +∞)\{a} that converges to a. Then {xn/a} converges to 1. Since

$\frac{y - 1}{y} \le \ln y \le y - 1$

for every y > 0, we find that

(3.10) $\frac{x_n - a}{x_n} \le \ln\frac{x_n}{a} \le \frac{x_n}{a} - 1$

for every n ∈ N. It now follows from (3.10) by letting n tend to infinity and using the Squeeze Theorem (Theorem 2.49) that

$\lim_{n\to\infty} \ln\frac{x_n}{a} = 0.$

This implies

$\lim_{n\to\infty} (\ln x_n - \ln a) = 0,$

which gives the desired result in view of the above theorem. Theorem 3.29 can be applied to discuss the existence of limit for functions that are not usually considered in calculus. The following example presents two functions of this kind. The first one, which is known as Dirichlet’s function, is the characteristic function of the set of rational numbers in R. Example 3.33. In each case determine the points at which the given function tends to a limit.

(1) $f(x) = \begin{cases} 1 & x \in \mathbb{Q}, \\ 0 & x \notin \mathbb{Q}. \end{cases}$

(2) $g(x) = \begin{cases} x & x \in \mathbb{Q}, \\ 1 - x & x \notin \mathbb{Q}. \end{cases}$

Solution. (1) Let a ∈ R be arbitrary and assume that $\lim_{x\to a} f(x) = L$. Now, let {xn} and {yn} be sequences of rational and irrational numbers, respectively, in R\{a} such that $\lim_{n\to\infty} x_n = \lim_{n\to\infty} y_n = a$. Then we find, in view of the above theorem, that $\lim_{n\to\infty} f(x_n) = \lim_{n\to\infty} f(y_n) = L$. Since $\lim_{n\to\infty} f(x_n) = 1$ and $\lim_{n\to\infty} f(y_n) = 0$, this is a contradiction. So f cannot tend to a limit when x approaches a. This shows that f fails to have a limit at each a ∈ R.

(2) Let a, {xn} and {yn} be as in (1), and assume that the limit of g at a is L. Then it follows that $\lim_{n\to\infty} x_n = \lim_{n\to\infty} (1 - y_n) = L$. So, a = L = 1 − a. This proves that g can have a limit only at a = 1/2, and that the limit, if it exists, is L = 1/2. Since |g(x) − 1/2| = |x − 1/2| for every x ∈ R, the limit of g at 1/2 does exist and is equal to 1/2.
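The sequential argument used for g can be imitated numerically. The Python sketch below is only an illustration under stated assumptions: floating-point numbers cannot record whether a real number is rational, so rationality is passed to the hypothetical function `g` as a flag, and the sequences x_n = a + 1/n (rational) and y_n = a + √2/n (irrational) are chosen here for a rational a. The helper `gap_along_sequences` is likewise an assumption of this sketch.

```python
from fractions import Fraction
import math

def g(x, is_rational):
    # g(x) = x for rational x and g(x) = 1 - x for irrational x.
    # The flag is supplied by the caller (an assumption of this sketch).
    return x if is_rational else 1 - x

def gap_along_sequences(a, n):
    # For rational a, x_n = a + 1/n is rational and y_n = a + sqrt(2)/n is
    # irrational; both lie in R\{a} and converge to a.
    x_n = a + Fraction(1, n)
    y_n = float(a) + math.sqrt(2) / n
    return abs(float(g(x_n, True)) - g(y_n, False))

# If g had a limit at a, this gap would tend to 0 (Theorem 3.29).
# It tends to |2a - 1|, which vanishes only for a = 1/2.
gaps_at_half = [gap_along_sequences(Fraction(1, 2), 10**k) for k in range(1, 7)]
gaps_at_zero = [gap_along_sequences(Fraction(0), 10**k) for k in range(1, 7)]
```

The gap shrinks toward 0 at a = 1/2 and stays near 1 at a = 0, in agreement with the conclusion of part (2).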


The first item of the above example answers question (3.b): Dirichlet’s function fails to have a limit at every point of its domain, R. Theorem 3.29 can also be used to prove a function-analogue of the Squeeze Theorem for sequences of real numbers (Theorem 2.49).

Theorem 3.34 (Squeeze Theorem for Functions). Let f, g, and h be functions defined on E ⊂ R, let a be a limit point of E, and let

$\lim_{x\to a} g(x) = \lim_{x\to a} h(x) = L.$

If for every x ∈ E\{a} which is sufficiently close to a,

(3.11) g(x) ≤ f(x) ≤ h(x),

then

(3.12) $\lim_{x\to a} f(x) = L.$

Proof. By our assumption, there exists δ > 0 such that for every x in (E\{a}) ∩ Nδ(a), (3.11) holds. To prove (3.12), it is enough, in view of Theorem 3.29, to show that when {an} is a sequence in E\{a} that converges to a,

(3.13) $\lim_{n\to\infty} f(a_n) = L.$

But, if {an} is such a sequence, then we find N ∈ N such that for every n ≥ N, an ∈ Nδ(a). So, for every n ≥ N, g(an) ≤ f(an) ≤ h(an). Now, (3.13) follows from the above inequalities and the Squeeze Theorem for sequences of real numbers (Theorem 2.49).

Example 3.35. Verify the following equalities.
(1) $\lim_{x\to 0} (\sin x)/x = 1$.
(2) $\lim_{x\to 0} (1 - \cos x)/x = 0$.

Solution. (1) Since for every x in the deleted neighborhood of 0 with radius π/2 [...]

[...] ε > 0, |g(f(x)) − L| can be less than ε, if we choose x sufficiently close to a. Use (3) to find δ > 0 such that y ∈ F and 0 < |y − b| < δ imply |g(y) − L| < ε. Next use (2) to find γ > 0 such that x ∈ E and 0 < |x − a| < γ imply |f(x) − b| < δ. Then it follows that x ∈ E and 0 < |x − a| < γ imply |g(f(x)) − L| < ε, as desired.

Example 3.37. Prove that for every a ≥ 0, $\lim_{x\to a} \sqrt[n]{x} = \sqrt[n]{a}$.

Solution. If a = 0, the result follows from this simple observation: Given ε > 0, if we let $\delta = \varepsilon^n$, then x ≥ 0 and |x − 0| < δ imply $|\sqrt[n]{x} - 0| < \varepsilon$.

If a > 0, let f(x) = x/a and $g(x) = \sqrt[n]{x}$ for every x ≥ 0. Since

$\lim_{x\to a} f(x) = 1$ and $\lim_{x\to 1} g(x) = 1,$

by Example 3.19, the above theorem yields that

$\lim_{x\to a} \sqrt[n]{\frac{x}{a}} = \lim_{x\to a} (g \circ f)(x) = 1.$

But,

$\lim_{x\to a} \sqrt[n]{\frac{x}{a}} = \frac{1}{\sqrt[n]{a}} \lim_{x\to a} \sqrt[n]{x}.$

Therefore, the desired result follows from the above equalities.

Infinite Limits. The function $f(x) = 1/x^2$ is defined on R\{0}, and 0 is a limit point of this set. Thus, we may think of the limit of f at 0. But, when x approaches 0, what happens to the values of f? As Figure 8 shows, when x approaches 0, the values of f grow without bound. This suggests that the limit of f at 0 may not exist in the sense we defined in this section. To verify this more formally, consider some L > 0. (For every x in the domain of f, f(x) > 0. So the values of f cannot tend to a number L < 0. Also it is clear that L = 0 cannot be the limit of f as x approaches 0.) If $0 < x < 1/\sqrt{2L}$, then $1/x^2 > 2L$, and hence for every 0 < y < x,

$|f(y) - L| = \frac{1}{y^2} - L > 2L - L = L.$

Figure 8. The graph of $f(x) = 1/x^2$.

This proves that the values of f cannot be made arbitrarily close to L. Since L was arbitrary, this argument shows that f has no limit at 0. This is by no means a new phenomenon for us. We saw in Example 3.22 that the function g(x) = cos(1/x) defined on R\{0} also fails to have a limit at 0. But, there is an important difference between the behavior of f and g near 0: g is bounded in a (and indeed any) deleted neighborhood of 0, while f is unbounded in any deleted neighborhood of 0. More precisely, the value f(x) can be as large as we wish, larger than any prescribed value M > 0, provided that we choose x sufficiently close to 0. This can be seen in the graph of f, depicted in Figure 8. To see this analytically, let M > 0 be given. We should find δ > 0 such that for every y ≠ 0 in Nδ(0), $f(y) = 1/y^2 > M$. It is easy to see that $\delta = 1/\sqrt{M}$ has the desired property. Because of this property, we say that the function $f(x) = 1/x^2$ tends to +∞ as x approaches 0, and we write

$\lim_{x\to 0} \frac{1}{x^2} = +\infty.$

Of course, we should remember that f has no limit at 0, and by the last equality we just mean that the values of f exceed any prescribed positive value.

As a result of the above discussion, we can prove the following statement for the function $h(x) = -1/x^2$ defined on R\{0}. For every M < 0, there exists δ > 0 such that for every x ≠ 0 in Nδ(0), h(x) < M. This means that we can make the values of h as small as we wish, provided that x is sufficiently close to 0, which can be seen in Figure 9. We describe this by writing $\lim_{x\to 0} -1/x^2 = -\infty$. Based on the above arguments, we present the following definition.


Figure 9. The graph of h(x) = −1/x2 .

Definition 3.38. Let f : E → R be a function, and let a be a limit point of E. We write
(1) $\lim_{x\to a} f(x) = +\infty$ if for every M > 0, there exists δ > 0 such that for every x ≠ a in Nδ(a) ∩ E, f(x) > M;
(2) $\lim_{x\to a} f(x) = -\infty$ if for every M < 0, there exists δ > 0 such that for every x ≠ a in Nδ(a) ∩ E, f(x) < M.

Example 3.39. Prove that
(1) $\lim_{x\to 1} 1/\sqrt{x - 1} = +\infty$,
(2) $\lim_{x\to 0} 1/(1 - \sqrt{1 + x^2}) = -\infty$.

Solution. (1) Note that the domain of the involved function is E := (1, +∞), of which 1 is a limit point. Let M > 0 be given. We should find δ > 0 (the requirement |x − 1| > 0 is automatically satisfied here, as 1 ∉ E) such that x ∈ E and 0 < |x − 1| < δ imply

$\frac{1}{\sqrt{x - 1}} > M.$

A simple calculation yields that the last inequality holds if and only if

$x < 1 + \frac{1}{M^2}.$

Therefore, it is sufficient to choose $\delta = 1/M^2$.

(2) The domain of the function is R\{0}, of which 0 is clearly a limit point. Given M < 0, we should find δ > 0 such that x ∈ R\{0} and 0 < |x − 0| < δ imply

$\frac{1}{1 - \sqrt{1 + x^2}} < M.$

It can easily be seen that the last inequality holds if and only if

$|x| < \sqrt{-\frac{2}{M} + \frac{1}{M^2}}.$

So it is enough to let $\delta = \sqrt{-\frac{2}{M} + \frac{1}{M^2}}$.

Exercise 3.40. Prove that $\lim_{x\to 1} 1/(x^2 - 1)^2 = +\infty$.
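The two choices of δ in Example 3.39 can be tested on sample points. The Python sketch below is a numerical illustration only, not a proof: it evaluates each function at a few points of the form a ± tδ with 0 < t < 1, and floating-point rounding could mislead for values of t very close to 1 or for very large |M|. The function names and the sample fractions are assumptions made for this sketch.

```python
import math

def delta_for_part_1(M):
    # Example 3.39(1): for M > 0, delta = 1 / M**2.
    return 1.0 / M**2

def delta_for_part_2(M):
    # Example 3.39(2): for M < 0, delta = sqrt(-2/M + 1/M**2).
    return math.sqrt(-2.0 / M + 1.0 / M**2)

def check_part_1(M, fractions=(0.1, 0.5, 0.9)):
    # Sample x = 1 + t*delta in (1, 1 + delta) and test 1/sqrt(x - 1) > M.
    delta = delta_for_part_1(M)
    return all(1.0 / math.sqrt((1.0 + t * delta) - 1.0) > M for t in fractions)

def check_part_2(M, fractions=(0.1, 0.5, 0.9)):
    # Sample x = +/- t*delta with 0 < |x| < delta and test
    # 1 / (1 - sqrt(1 + x**2)) < M.
    delta = delta_for_part_2(M)
    xs = [s * t * delta for t in fractions for s in (-1.0, 1.0)]
    return all(1.0 / (1.0 - math.sqrt(1.0 + x * x)) < M for x in xs)
```

For instance, `check_part_1(10.0)` and `check_part_2(-10.0)` both evaluate the defining inequality of Definition 3.38 at six or fewer sample points and report whether it held at all of them.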


3.3. Limit at Infinity The notion of limit at infinity cannot be considered to be a special case of the concept of limit we introduced in the previous section. This is obvious if we notice the aim of these notions of limit. When we talk about the limit of a function at some limit point of its domain, we wish to understand the behavior of the function near that point. However, when we speak of the limit of a real function at +∞ or −∞, we want to know about the way it behaves when the independent variable takes arbitrarily large, or arbitrarily small, values.

Figure 10. The graph of f (x) = 1/x for x > 0.

To understand this last point, consider the function f(x) = 1/x for x ∈ (0, +∞). As 1/x is the reciprocal of x, by increasing the value of x, 1/x decreases. More precisely, we can make f(x) = 1/x as close to 0 as we wish, provided that x is sufficiently large. This can be seen in Figure 10, where the graph of f is seen to be very close to the x-axis for sufficiently large values of x. To verify what we discussed above analytically, consider an arbitrary ε > 0. We show that for every x greater than some appropriate M > 0, |f(x) − 0| = |f(x)| < ε. Since |f(x)| = 1/x, it is obvious that M = 1/ε works here. We describe this behavior of f by writing $\lim_{x\to+\infty} f(x) = 0$.

On the other hand, let us consider the same function f(x) = 1/x for x ∈ (−∞, 0). As shown in Figure 11, by making x sufficiently small, we can make the values of f as close to 0 as we wish. This can be shown analytically by finding some M < 0 for a given ε > 0 such that for every x < M,

$|f(x) - 0| = \left|\frac{1}{x}\right| = -\frac{1}{x} < \varepsilon.$

But, the last inequality holds if and only if x < −1/ε. So, M = −1/ε is the right choice here. We describe this behavior of f by writing $\lim_{x\to-\infty} 1/x = 0$.
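The thresholds M = 1/ε and M = −1/ε found above can be tried out numerically. The following Python sketch is only an illustration: it checks the inequality |1/x| < ε at a handful of points lying beyond the threshold, and the function names and the list of sample factors are assumptions introduced here.

```python
def threshold_at_plus_infinity(eps):
    # For f(x) = 1/x on (0, +inf): x > 1/eps implies |1/x| < eps.
    return 1.0 / eps

def threshold_at_minus_infinity(eps):
    # For f(x) = 1/x on (-inf, 0): x < -1/eps implies |1/x| < eps.
    return -1.0 / eps

def within_eps_beyond(threshold, eps, factors=(1.001, 2.0, 10.0, 1e6)):
    # Each sample x = factor * threshold lies beyond the threshold on its own
    # side: x > threshold if threshold > 0, and x < threshold if threshold < 0.
    return all(abs(1.0 / (factor * threshold)) < eps for factor in factors)
```

A call such as `within_eps_beyond(threshold_at_minus_infinity(0.01), 0.01)` tests points x < −100 and reports whether |1/x| < 0.01 held at each of them.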


Figure 11. The graph of f (x) = 1/x for x < 0.

In a more general context, we present the following definition.

Definition 3.41. Let f be a function which is defined on an interval (a, +∞) for some a ∈ R, and let L be a real number. We say that f(x) approaches L when x tends to +∞, and we write $\lim_{x\to+\infty} f(x) = L$ if
(LI) for every ε > 0 there exists M > 0 such that x > M implies |f(x) − L| < ε.
Similarly, if f is a function defined on an interval (−∞, a) for some a ∈ R, we write $\lim_{x\to-\infty} f(x) = L$ if for every ε > 0 there exists M < 0 such that x < M implies |f(x) − L| < ε. (Here we used LI as an abbreviation for limit at infinity.)

What does (LI) say? The statement (LI) can be interpreted as follows. We can make the value f(x) as close to L as we wish provided that x is sufficiently large. A similar interpretation can be stated for the statement that defines the limit of f at −∞.

Example 3.42. Prove that
(1) $\lim_{x\to+\infty} 1/\sqrt{x} = 0$,
(2) $\lim_{x\to-\infty} 1/\sqrt[3]{x} = 0$.

Solution. Let ε > 0 be given. (1) We find M > 0 such that for every x > M,

$\left|\frac{1}{\sqrt{x}} - 0\right| = \frac{1}{\sqrt{x}} < \varepsilon.$

Since the last inequality holds if and only if $x > 1/\varepsilon^2$, we see that $M = 1/\varepsilon^2$ is an appropriate choice.

(2) We find M < 0 such that for every x < M,

$\left|\frac{1}{\sqrt[3]{x}} - 0\right| = -\frac{1}{\sqrt[3]{x}} < \varepsilon.$

The last inequality holds if and only if $x < -1/\varepsilon^3$, so $M = -1/\varepsilon^3$ is what we were looking for.

The limit at infinity of a function, if it exists, is necessarily unique. This is the content of the following theorem.

Theorem 3.43. Let f be a function defined on an interval (a, +∞) for some a ∈ R. If

$\lim_{x\to+\infty} f(x) = L_1$ and $\lim_{x\to+\infty} f(x) = L_2,$

then L1 = L2.

Proof. Let ε > 0 be given. For i = 1, 2 find Mi > 0 such that for every x > Mi in (a, +∞), |f(x) − Li| < ε/2. Then, let M = max{M1, M2}, and note that for an arbitrary x > M in (a, +∞),

$|L_1 - L_2| \le |L_1 - f(x)| + |f(x) - L_2| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$

Since ε was arbitrary, this gives us the desired result.

Next, we establish a result that ties the limit at infinity of functions and the limit of sequences together.

Theorem 3.44. Let f be a function which is defined on an interval (a, +∞), for some a ∈ R. Then the following are equivalent.
(1) $\lim_{x\to+\infty} f(x) = L$.
(2) For every sequence {xn} in (a, +∞) that diverges to +∞, the sequence {f(xn)} converges to L.

Proof. (1) ⇒ (2). Let ε > 0 be given, and consider a sequence {xn} in (a, +∞) that diverges to +∞. We show that {f(xn)} converges to L. By the assumption we find M > 0 such that for every x > M, |f(x) − L| < ε. Since {xn} diverges to +∞, we may choose N ∈ N so large that for every n ≥ N, xn > M. Thus, it follows that for every n ≥ N, |f(xn) − L| < ε. This means that $\lim_{n\to+\infty} f(x_n) = L$.

(2) ⇒ (1). To prove this, we establish its contrapositive. Assume that $\lim_{x\to+\infty} f(x) = L$ does not hold. Then there exists ε > 0 such that for every n ∈ N, xn ∈ (a, +∞) can be found with xn > n and |f(xn) − L| ≥ ε. The sequence {xn} obtained in this way is a sequence in (a, +∞) that diverges to +∞, but {f(xn)} does not converge to L. This shows that (2) cannot be true.

In particular, if f is a function whose domain contains [1, +∞) and if $\lim_{x\to+\infty} f(x)$ exists and is equal to L, then $\lim_{n\to+\infty} f(n) = L$. Limits at infinity behave with respect to the algebraic operations in the same way the usual limits do.

Theorem 3.45. Let f and g be functions defined on an interval (a, +∞) for some a ∈ R. If $\lim_{x\to+\infty} f(x) = L_1$ and $\lim_{x\to+\infty} g(x) = L_2$, then


(1) $\lim_{x\to+\infty} (f + g)(x) = L_1 + L_2$,
(2) $\lim_{x\to+\infty} (f - g)(x) = L_1 - L_2$,
(3) $\lim_{x\to+\infty} (fg)(x) = L_1 L_2$.
If g(x) ≠ 0 for every x and L2 ≠ 0, then
(4) $\lim_{x\to+\infty} (f/g)(x) = L_1/L_2$.

Proof. The proof follows easily from Theorem 3.44. We therefore leave it as an exercise to the reader.

Example 3.46. Compute
(1) $\lim_{x\to+\infty} (\sqrt{x} - x)/(\sqrt{x} + x)$, and
(2) $\lim_{x\to+\infty} (x^2 - 3x + 5)/(2x^3 - 1)$.

Solution. (1) Since for x > 0,

$\frac{\sqrt{x} - x}{\sqrt{x} + x} = \frac{\frac{1}{\sqrt{x}} - 1}{\frac{1}{\sqrt{x}} + 1}$

and $\lim_{x\to+\infty} 1/\sqrt{x} = 0$ by Example 3.42, it follows from the above theorem that the given limit is −1.

(2) Since for nonzero x,

$\frac{x^2 - 3x + 5}{2x^3 - 1} = \frac{\frac{1}{x} - \frac{3}{x^2} + \frac{5}{x^3}}{2 - \frac{1}{x^3}}$

and $\lim_{x\to+\infty} 1/x = 0$, we deduce from the above theorem that the limit is 0.

Infinite Limits at Infinity. Quite naturally, we may have infinite limits at infinity. Such limits can easily be defined by combining infinite limits with limits considered at infinity.

Definition 3.47. Let f be a function that is defined on an interval of one of the following forms:
(1) (a, +∞). We write $\lim_{x\to+\infty} f(x) = +\infty$ if for every M > 0 there exists K > 0 such that for every x > K, f(x) > M.
(2) (a, +∞). We write $\lim_{x\to+\infty} f(x) = -\infty$ if for every M < 0 there exists K > 0 such that for every x > K, f(x) < M.
(3) (−∞, a). We write $\lim_{x\to-\infty} f(x) = +\infty$ if for every M > 0 there exists K < 0 such that for every x < K, f(x) > M.
(4) (−∞, a). We write $\lim_{x\to-\infty} f(x) = -\infty$ if for every M < 0 there exists K < 0 such that for every x < K, f(x) < M.

Example 3.48. Prove that
(1) $\lim_{x\to+\infty} (x^2 + 2x - 1) = +\infty$,
(2) $\lim_{x\to-\infty} (x^2 - 1) = +\infty$.


Solution. Let M > 0 be given. (1) We should find K > 0 such that x > K implies $x^2 + 2x - 1 > M$. But, x > K implies that $x^2 + 2x - 1 > K^2 + 2K - 1$. It is enough to choose K large enough that

(3.16) $K^2 + 2K - 1 > M.$

Since $K^2 + 2K - 1 = (K + 1)^2 - 2$, we find that any K greater than $\sqrt{M + 2} - 1$ will satisfy (3.16).

(2) We find K < 0 such that x < K implies $x^2 - 1 > M$. If x < K < 0, then −x > −K > 0, so that $x^2 > K^2$, and therefore $x^2 - 1 > K^2 - 1$. To obtain the desired result, it is enough to choose K < 0 such that $K^2 - 1 > M$ or, equivalently, $|K| > \sqrt{M + 1}$.

Example 3.49. Show that $\lim_{x\to+\infty} (1 - x^3) = -\infty$.

Solution. Let M < 0 be given. We should find K > 0 such that for every x > K, $1 - x^3 < M$. The last inequality holds if and only if $x > \sqrt[3]{1 - M}$. Hence $K = \sqrt[3]{1 - M}$ has the desired property.

[...]

Let f be a function with domain E that is defined in a right (resp., left) deleted neighborhood (a, a + γ) (resp., (a − γ, a)) of a real number a, for some γ > 0. We say that L is the right (resp., left) limit of f at a, and we write $\lim_{x\to a^+} f(x) = L$ (resp., $\lim_{x\to a^-} f(x) = L$) provided that the following statement is true.

(RL) (resp., (LL)) For every ε > 0, there exists δ > 0 such that x ∈ E and a < x < a + δ (resp., a − δ < x < a) imply |f(x) − L| < ε.

(We used RL and LL as abbreviations for right and left limits, respectively.)

Note that here the assumption that f is defined in a right (resp., left) deleted neighborhood (a, a + γ) (resp., (a − γ, a)) is made to ensure that for every δ > 0, the interval (a, a + δ) (resp., (a − δ, a)) that appears in (RL) (resp., (LL)) contains some element of E. For this reason, and as in the case of limits, we may weaken this assumption by considering those a that are limit points of the set E ∩ (a, +∞) (resp., E ∩ (−∞, a)). We are therefore in a position to present the following definition.

Definition 3.50. Let a and L be real numbers, and let f be a function with domain E.
(1) If a is a limit point of E ∩ (a, +∞), then we say that L is the right limit of f at a, and we write

$\lim_{x\to a^+} f(x) = L$ or f(a+) = L

if the statement (RL) above is true.

3.4. One-Sided Limits


(2) If a is a limit point of E ∩ (−∞, a), then we say that L is the left limit of f at a, and we write

$\lim_{x\to a^-} f(x) = L$ or f(a−) = L

if the statement (LL) above is true.

This definition is more general than the calculus-based definition of one-sided limits, as the following example shows.

Example 3.51. Consider $f(q) = 1 + q^2$ to be a function on E = {1/n : n ∈ Z\{0}}. Find f(0+) and f(0−).

Solution. First of all, we should verify that 0 is a limit point of both E ∩ (0, +∞) and E ∩ (−∞, 0). To see this, for a given γ > 0 find some n0 ∈ N satisfying 1/n0 < γ. Then 1/n0 is an element of (E ∩ (0, +∞)) ∩ Nγ(0) other than 0, and −1/n0 is an element of (E ∩ (−∞, 0)) ∩ Nγ(0) other than 0. Nevertheless, note that E contains no interval of the form (0, γ) or (−γ, 0), so that the calculus-based definition of one-sided limits is of no use here. We claim that f(0+) = f(0−) = 1. To prove these equalities, let ε > 0 be given. To prove f(0+) = 1, we should find δ > 0 such that q ∈ E and 0 < q < δ imply |f(q) − 1| < ε. [...]

Let ε > 0 be given. For every x ∈ Q such that 0 < x < 1/2, h(x) = [x] = 0 and therefore |h(x) − 0| = 0 < ε. This proves h(0+) = 0. Similarly, for every x ∈ Q that lies in (−1/2, 0), h(x) = [x] = −1 and hence |h(x) − (−1)| = 0 < ε. This shows that h(0−) = −1. As for the limit at 0, we claim that h has no limit as x approaches 0. To see this, let L be any real number. If $\lim_{x\to 0} h(x) = L$, then for ε = 1/2, δ > 0 can be found such that x ∈ Q and 0 < |x − 0| < δ imply

(3.17) |h(x) − L| < ε.

But the deleted neighborhood of 0 with radius δ contains at least one positive rational number x0 and one negative rational number x1 such that |x0|, |x1| < 1. Since h(x0) = 0 and h(x1) = −1, we deduce from (3.17) by letting x = x0 and x = x1, respectively, that −1/2 < L < 1/2 and −3/2 < L < −1/2, which is impossible. [...]

[...] δ2 > 0 such that x ∈ E and a − δ2 < x < a imply |f(x) − L| < ε. If we let δ = min{δ1, δ2}, it follows that x ∈ E and 0 < |x − a| < δ imply |f(x) − L| < ε. This means that the limit of f at a exists and is equal to L.


One-Sided Limits of Monotone Functions. We begin with a formal definition of monotone functions. Definition 3.55. Let f be a function defined on some interval I of real numbers. We say that f is increasing (resp., decreasing) on I if for all x, y ∈ I, x ≤ y implies f (x) ≤ f (y) (resp., f (x) ≥ f (y)). A monotone function is one that is increasing or decreasing. We say that f is strictly increasing (resp., strictly decreasing) on I if for all x, y ∈ I, x < y implies f (x) < f (y) (resp., f (x) > f (y)).

Figure 12. The graph of an increasing function on (a, b).

If you draw the graphs of as many monotone functions as you can imagine on an interval (a, b), you will see that such functions have one-sided limits at every point of the interval. (If you don’t believe this, you can try it!) An example is the function whose graph is depicted in Figure 12. In what follows, we want to prove our claim above. But before going into a formal discussion, look at Figure 12 again. Here, the only point at which the given function has no limit is c. A careful examination of the graph, taking into account the fact that the function is increasing on (a, b), reveals that f(c+) = inf{f(x) : c < x < b} and f(c−) = sup{f(x) : a < x < c}. Of course, in the equation that gives f(c+), we may write min in place of inf. This observation is a special case of the following theorem.

Theorem 3.56. Let f be a monotone function defined on (a, b), and consider some x ∈ (a, b). Then f(x+) and f(x−) exist. Moreover,
(1) If f is increasing, then f(x+) = inf{f(y) : x < y < b} and f(x−) = sup{f(y) : a < y < x}.


(2) If f is decreasing, then f(x+) = sup{f(y) : x < y < b} and f(x−) = inf{f(y) : a < y < x}.

Proof. We only prove (1). The proof of (2) is similar and is therefore left as an exercise (see Exercise 2 at the end of this chapter). Assume that f is increasing. Let A1 = {f(y) : a < y < x}. If y is an element of the interval (a, x), then f(y) ≤ f(x) by the assumption that f is increasing. This shows that f(x) is an upper bound for A1. Thus A1 has a supremum in R, say α1. To prove the desired equality for f(x−), we must show that f(x−) = α1. Let ε > 0 be given. Since α1 − ε < α1 and α1 = sup A1, there exists a < x1 < x such that f(x1) > α1 − ε. Then for every x1 < y < x, α1 − ε < f(x1) ≤ f(y) ≤ α1 < α1 + ε, by our assumption that f is increasing and the fact that f(y) ∈ A1. In short, we observed that for every y in a left deleted neighborhood of x, namely, (x1, x), |f(y) − α1| < ε. This gives us the desired equality f(x−) = α1. To complete the proof, apply a similar method to A2 = {f(y) : x < y < b}, show that α2 := inf A2 exists in R, and that f(x+) = α2.

A note on Theorem 3.56. Theorem 3.56 says that monotonicity of a function on (a, b) implies the existence of its one-sided limits at all points of this interval. It therefore provides an answer to question (3.c), as it could be stated for monotone functions which are defined on an arbitrary interval I. Of course, in that case, we would have to assume that x is an interior point of I, which is automatically satisfied by every x ∈ (a, b). After all, the assumption that x is an interior point of I implies that I, as the domain of the function, contains some interval of the form (a, b) containing x. If we want to write the theorem in this general form, then we should modify the equations that appeared in (1) and (2) of Theorem 3.56. For instance, we should write the following in place of the equation that determines f(x+) in (1): f(x+) = inf{f(y) : y ∈ I and y > x}.
We will use this general form of Theorem 3.56 in Theorem 3.88 below, where we show that the inverse of a strictly monotone continuous function is continuous.
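The formulas of Theorem 3.56(1) can be illustrated on a small example. In the Python sketch below, the increasing function `f` on (0, 2) with a jump at c = 1 is a hypothetical example chosen for this illustration, and the supremum and infimum are only approximated over the finitely many sample points c ∓ (length)/2^k; this is a numerical picture of the statement, not a computation of the exact one-sided limits.

```python
def f(x):
    # An increasing function on (0, 2) with a jump at c = 1
    # (a hypothetical example, not taken from the text).
    if x < 1:
        return x
    if x == 1:
        return 1.5
    return x + 1

def sampled_left_limit(func, a, c, steps=50):
    # Approximates sup{func(y) : a < y < c} over y_k = c - (c - a) / 2**k.
    return max(func(c - (c - a) / 2**k) for k in range(1, steps + 1))

def sampled_right_limit(func, c, b, steps=50):
    # Approximates inf{func(y) : c < y < b} over y_k = c + (b - c) / 2**k.
    return min(func(c + (b - c) / 2**k) for k in range(1, steps + 1))

left = sampled_left_limit(f, 0.0, 1.0)    # approximates f(1-) = 1
right = sampled_right_limit(f, 1.0, 2.0)  # approximates f(1+) = 2
```

For this function the value f(1) = 1.5 lies strictly between the two one-sided limits, so f has one-sided limits at 1 but no limit there.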

Infinite One-Sided Limits. Just as a function may have infinite limits at some points, it may have infinite one-sided limits. On the basis of what we discussed so far, it is easy to define such limits precisely.


Definition 3.57. Let a be a real number, and let f be a function with domain E. If a is a limit point of E ∩ (a, +∞) and for every M > 0 (resp., M < 0) there exists δ > 0 such that x ∈ E and a < x < a + δ imply f(x) > M (resp., f(x) < M), then we write $\lim_{x\to a^+} f(x) = +\infty$ (resp., $\lim_{x\to a^+} f(x) = -\infty$). The cases $\lim_{x\to a^-} f(x) = +\infty$ and $\lim_{x\to a^-} f(x) = -\infty$ can be defined similarly.

Example 3.58. Prove that
(1) $\lim_{x\to 0^+} 1/x = +\infty$,
(2) $\lim_{x\to 0^-} 1/x = -\infty$.

Solution. (1) The point 0 is a limit point of the set (R\{0}) ∩ (0, +∞) = (0, +∞). Let M > 0 be given. We should find δ > 0 such that x ∈ (0, +∞) and 0 < x < δ imply f(x) > M. Since the last inequality is 1/x > M, we find that δ = 1/M is an appropriate choice.

(2) The point 0 is a limit point of the set (R\{0}) ∩ (−∞, 0) = (−∞, 0). Let M < 0 be given. It is enough to find δ > 0 such that x ∈ (−∞, 0) and −δ < x < 0 imply f(x) < M. Again, since f(x) = 1/x, we see that δ = −1/M works here.

The behavior of f(x) = 1/x on the right- and left-hand sides of 0, discussed above, can be seen in Figure 13. If we approach 0 from the right, the values of f increase without bound, and when we approach 0 from the left, the values decrease without bound.

Figure 13. The graph of f (x) = 1/x.


3.5. Continuity and Two Kinds of Discontinuity

In calculus the notion of limit is intimately related to the important concept of continuity. Recall that when a function f is defined in a neighborhood of some a ∈ R, we say that f is continuous at a if the limit of f at a exists and is equal to the value of f at the point:

(3.18) $\lim_{x\to a} f(x) = f(a).$

If we want to express (3.18) mathematically, we can write (CD) For every ε > 0, there exists δ > 0 such that x ∈ E and |x − a| < δ imply |f (x) − f (a)| < ε. (Here CD is the abbreviation of continuity definition.) What does (CD) say? The statement (CD) says that we can make f (x) as close to f (a) as we wish, provided that x is sufficiently close to a. We summarize our above discussion in the main definition of this section. Definition 3.59. Let f be a function with domain E, and let a be an element of E. We say that f is continuous at a if the statement (CD) is true. Of course, it should be noted that when we wrote (CD) to describe (3.18), it was assumed that f is defined in a neighborhood of a. But, in the above definition, we only assumed that the domain of f contains a itself. By Theorem 3.3(4), we have two cases for a ∈ E. • The point a is an isolated point of E. In this case, δ > 0 can be found such that Nδ (a) ∩ E = {a}. So, for this particular δ, x ∈ E and |x − a| < δ imply |f (x) − f (a)| = |f (a) − f (a)| = 0 < ε, for every ε > 0. Thus, f is continuous at a. Note that nevertheless, f fails to have a limit at a in this case. • The point a is a limit point of E. In this case, (CD) says that the limit of f at a is equal to f (a), that is, (3.18) holds. In short, we proved the following simple, but important, result. Proposition 3.60. Let a be a real number contained in the domain E of a function f. (1) If a is an isolated point of E, then f is continuous at a. (2) If a is a limit point of E, then f is continuous at a if and only if limx→a f (x) = f (a).
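The first case above, continuity at an isolated point, can be made concrete for the set E = {1/k : k ∈ N}. The Python sketch below is an illustration under stated assumptions: E is truncated to finitely many elements (`largest_k` is a hypothetical cutoff), exact rational arithmetic is used through `fractions.Fraction`, and the radius δ = 1/(n(n + 1)), the distance from 1/n to its nearest neighbor 1/(n + 1), is one possible choice introduced here.

```python
from fractions import Fraction

def isolating_radius(n):
    # For a = 1/n in E = {1/k : k in N}, the nearest other point of E is
    # 1/(n + 1), at distance 1/n - 1/(n + 1) = 1/(n(n + 1)).
    return Fraction(1, n * (n + 1))

def neighborhood_points(n, largest_k=10_000):
    # Points of the truncated set {1/k : 1 <= k <= largest_k} that lie in
    # N_delta(a) for a = 1/n and delta = isolating_radius(n).
    a, delta = Fraction(1, n), isolating_radius(n)
    candidates = (Fraction(1, k) for k in range(1, largest_k + 1))
    return [x for x in candidates if abs(x - a) < delta]
```

For each tested n the list returned by `neighborhood_points(n)` contains only 1/n, so for this δ the condition in (CD) reduces to |f(a) − f(a)| = 0 < ε, whatever the function f is.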


An answer to the second part of question (3.a). Using the above proposition, we are now able to answer the second part of question (3.a). On the basis of our knowledge of continuity developed in calculus, the function f in question is not continuous at 2, because f has no limit at this point. But, if we choose to work in the framework that analysis provides, f is continuous at 2, because 2 is an isolated point of the domain of f. Note that even in this framework, f fails to have a limit at 2.

Example 3.61. Since the only limit point of the set E = {1/n : n ∈ N} is 0, every element of this set is an isolated point. So, every function f : E → R is continuous at every point of E, by our definition of continuity.

Functions that are continuous at every point of a set deserve more attention.

Definition 3.62. Let f be a function and S be a set contained in the domain of f. We say that f is continuous on S if f is continuous at every a ∈ S.

With this definition, every function from E = {1/n : n ∈ N} into R is continuous on E.

Exercise 3.63. Verify that every function from Z into R is continuous on Z.

Example 3.64. The function f of question (3.a) is continuous on its domain [0, 1] ∪ {2}. This follows from the fact that for every a ∈ [0, 1],

\[ \lim_{x \to a} f(x) = \lim_{x \to a} x = a, \]

and f is continuous at the isolated point 2.

Example 3.65. By Example 3.17 and Exercise 3.18, respectively, the sine and cosine functions are continuous on R. Also, by Example 3.37, each function $f_n(x) = \sqrt[n]{x}$ is continuous on [0, +∞). By Example 3.25, every polynomial is continuous on R, and by Example 3.32 the natural logarithmic function is continuous on (0, +∞).

Proposition 3.66. Let I be any interval, not necessarily closed or bounded. If f is a function from I into R, then f is continuous on I if and only if for every z ∈ I, $\lim_{x \to z} f(x) = f(z)$.

Proof. To prove this, we only need to show that every element of an interval I is its limit point. We verified this for I = [a, b] in Example 3.4(2). The verification of the remaining cases is left to the reader. □

What does Proposition 3.66 say? Proposition 3.66 says that when we work with functions that are defined on intervals, the strengthened definition of continuity tells us nothing more than the calculus-based definition. Remark 3.67 below shows that this is not the case for those functions whose domain is not an interval.


Remark 3.67. Calculus-based knowledge tells us that when a function is continuous, its graph has no gaps or holes. The strengthened definitions of limit and continuity presented in this chapter allow us to consider continuous functions whose graphs have some gaps. An instance is the function

\[ f(x) = \begin{cases} x, & 0 \le x \le 1, \\ x - 1, & 2 \le x \le 3, \end{cases} \]

whose graph is depicted in Figure 14. The function is not continuous on its domain D := [0, 1] ∪ [2, 3] if we use the calculus-based notion of limit, because then f has no limit at 1 and 2. But, if we use the strengthened definition of limit presented in this chapter, then 1 and 2 are limit points of D,

\[ \lim_{x \to 1} f(x) = f(1) = 1 \quad \text{and} \quad \lim_{x \to 2} f(x) = f(2) = 1. \]

Thus, in this strong sense, f is continuous on D. Note that nevertheless the graph of f has a gap, as can be seen in Figure 14.

Figure 14. The function f is continuous on its domain D.

Continuity, Algebraic Operations, and Order. The relation of the algebraic operations and the order relation < to limits was determined in Section 3.2. It is now time to explore their relation to continuity. Proposition 3.68. Let f : E → R be a function which is continuous at some a ∈ E, and let f (a) > 0. If a is a limit point of E, then there exists δ > 0 such that for every x ∈ Nδ (a) ∩ E, f (x) > 0. Proof. Let ε = f (a)/2. Since limx→a f (x) = f (a), there exists δ > 0 such that for every x ∈ Nδ (a) ∩ E, |f (x) − f (a)| < ε, giving, in particular, that f (x) > ε > 0.
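The last step of the proof of Proposition 3.68 can be written out in full. With ε = f(a)/2 as chosen in the proof, and for x ∈ Nδ(a) ∩ E, the worked inequality reads:

```latex
\[
  |f(x) - f(a)| < \varepsilon
  \;\Longrightarrow\;
  f(x) > f(a) - \varepsilon
       = f(a) - \frac{f(a)}{2}
       = \frac{f(a)}{2}
       = \varepsilon > 0 .
\]
```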




Figure 15. The function f has positive values on the interval (a − δ, a + δ).

A similar result can be formulated when f(a) < 0. Thus, when a function is continuous at some limit point a of its domain E and f(a) ≠ 0, then there exists δ > 0 such that x ∈ E and |x − a| < δ imply f(x) ≠ 0. Figure 15 illustrates the above proposition in the case E is an interval.

Next we turn to the relation of continuity and algebraic operations.

Theorem 3.69. Let f and g be functions defined on E, and let a be an element of this set. If f and g are continuous at a, then
(1) the functions f + g, f − g, and fg are also continuous at a; and
(2) the function f/g is continuous at a provided that g(a) ≠ 0.

Proof. If a is an isolated point of E, then both (1) and (2) are obvious. So assume that a is a limit point of E.
(1) This follows from Theorem 3.24(1)–(3).
(2) Since g(a) ≠ 0, the above argument shows that g(x) ≠ 0 for all x ∈ E which are sufficiently close to a. The desired result now follows from Theorem 3.24(4). □

Continuity and the Convergence of Sequences. Continuous functions are in some sense very nice, because they often behave in the way one expects. For example, a nice function f is expected to map a sequence {a_n} converging to some a onto the sequence {f(a_n)} which converges to f(a). The following theorem shows that not only is this true for functions which are continuous at a, but it is also a necessary and sufficient condition for the continuity of f at a.

Theorem 3.70. Let f be a function whose domain E contains a. Then, the following conditions are equivalent.
(1) The function f is continuous at a.
(2) For every sequence {a_n} of elements of E that converges to a, the sequence {f(a_n)} converges to f(a).

Proof. If a is an isolated point of E, then both (1) and (2) are true, and, accordingly, these are equivalent. The truth of (2) is due to the fact that in this case, the


only sequences in E that converge to a are those whose terms are equal to a for all sufficiently large indices. If a is a limit point of E, the proof is similar to that of Theorem 3.29. □

Example 3.71. Compute
(1) $\lim_{n \to \infty} \sin \frac{\pi n^2 - 1}{2n^2 + n}$ and
(2) $\lim_{n \to \infty} \sqrt[3]{5 + \frac{1}{2^n}}$.

Solution. (1) Since $\lim_{n \to \infty} (\pi n^2 - 1)/(2n^2 + n) = \pi/2$ and the sine function is continuous at π/2, the given limit is equal to sin(π/2) = 1.
(2) The limit of $\{5 + 1/2^n\}$ is 5, as $\{1/2^n\}$ converges to zero. Since the function $f(x) = \sqrt[3]{x}$ is continuous at 5, the given limit is $\sqrt[3]{5}$.

Example 3.72. Let f and g be functions which are continuous on [a, b]. If A = {x ∈ [a, b] : f(x) = g(x)}, then prove that A′ ⊆ A.

Solution. Let y be a limit point of A. Since A ⊂ [a, b], y is also a limit point of [a, b], and since ([a, b])′ = [a, b], y is actually an element of [a, b]. So to prove y ∈ A, we should show that f(y) = g(y). By Proposition 3.6, there exists a sequence {y_n} in A\{y} such that

\[ \lim_{n \to +\infty} y_n = y. \]

Since f and g are continuous,

\[ f(y) = \lim_{n \to +\infty} f(y_n) = \lim_{n \to +\infty} g(y_n) = g(y), \]

which gives the desired result. Continuity of Composite Functions. One important property of continuity is that the composition of continuous functions is continuous. Theorem 3.73. Let f : E → R and g : F → R be functions such that f (E) ⊆ F . If a ∈ E, f is continuous at a, and g is continuous at f (a), then the composite function g ◦ f is continuous at a. Proof. The condition f (E) ⊆ F shows that the composition g ◦ f can be defined. Let ε > 0 be given. That g is continuous at f (a) gives us some δ1 > 0 such that y ∈ F and |y − f (a)| < δ1 imply |g(y) − g(f (a))| < ε. Since f is continuous at a, there exists δ > 0 such that x ∈ E and |x − a| < δ imply |f (x) − f (a)| < δ1 . Therefore, x ∈ E and |x − a| < δ imply |g(f (x)) − g(f (a))| < ε, and this means that g ◦ f is continuous at a.  The above proof shows that when we try to prove theorems concerning continuous functions, there may be no need to consider two cases for isolated and limit points, and a proof that works for both cases may be available.
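Theorem 3.70 is what justifies passing the limit inside a continuous function, as in Example 3.71. The Python sketch below tabulates a few terms of the two sequences of that example so that the convergence can be observed; the names a_n and b_n are introduced here only for illustration, and the printed values are floating-point approximations rather than a proof.

```python
import math


def a_n(n):
    # Term of sequence (1): sin((pi*n^2 - 1) / (2*n^2 + n)).
    return math.sin((math.pi * n**2 - 1) / (2 * n**2 + n))


def b_n(n):
    # Term of sequence (2): cube root of 5 + 1/2^n.
    return (5 + 2.0 ** (-n)) ** (1 / 3)


# Limits predicted by Theorem 3.70 and the continuity of sin and the cube root.
limit_a = math.sin(math.pi / 2)
limit_b = 5 ** (1 / 3)

for n in (10, 100, 1000):
    print(n, abs(a_n(n) - limit_a), abs(b_n(n) - limit_b))
```

The printed distances to the predicted limits shrink as n grows, which is the behavior condition (2) of Theorem 3.70 describes.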


Two Kinds of Discontinuity. When a function f is not continuous at some point a of its domain, we say that f is discontinuous at a. It is an important task to find the reasons a function may be discontinuous at some point, and this is what we are going to discuss in this subsection. First of all, a function cannot be discontinuous at an isolated point of its domain. This means that if a function f is discontinuous at some point a, then a is necessarily a limit point of the domain of f. So, the discontinuity of f at a can be interpreted as follows.

(DR) It is not true that the limit of f at a exists and is equal to f(a).

(We used DR as the abbreviation of discontinuity reason.) For this reason, it is enough to think about the reasons (DR) may happen. These are as follows.
(1) The limit of f at a exists but is not equal to f(a).
(2) The limit of f at a does not exist.

If (1) is the case, we may modify f(a) by setting it equal to the limit of f at a, and this certainly removes the discontinuity of f at a. So, when (1) is the case, we say that f has a removable discontinuity at a. For example, the discontinuity of the function

\[ f(x) = \begin{cases} x \cos\frac{1}{x}, & x \ne 0, \\ 1, & x = 0, \end{cases} \]

at 0 is removable, as we may redefine the value of f at 0 by setting it equal to

\[ \lim_{x \to 0} f(x) = \lim_{x \to 0} x \cos\frac{1}{x} = 0. \]

Indeed, modifying the value of f at 0 in this way, we obtain the new function

\[ f_1(x) = \begin{cases} x \cos\frac{1}{x}, & x \ne 0, \\ 0, & x = 0, \end{cases} \]

which is continuous at 0.

When (2) happens, it is still possible that f(a+) and f(a−) exist. For this reason, we split (2) into two cases.
(i) The one-sided limits f(a+) and f(a−) exist, in which case we say that f has a discontinuity of the first kind, or a simple discontinuity, at a.
(ii) At least one of the one-sided limits f(a+) and f(a−) does not exist. In this case we say that f has a discontinuity of the second kind at a.

Example 3.74. In each case, find the points at which the given function is discontinuous. Then, determine the kind of discontinuity at every point where the function is discontinuous.

(1) \[ f(x) = \begin{cases} x^2, & x < 0, \\ 2x - 1, & x \ge 0. \end{cases} \]

(2) \[ g(x) = \begin{cases} 1, & x \in \mathbb{Q}, \\ 0, & x \in \mathbb{Q}^c. \end{cases} \]

Solution. (1) It is easy to see that f(0−) = 0, f(0+) = −1 and that the limit of f at 0 does not exist. Therefore, f has a discontinuity of the first kind at 0 which is not removable. The function f is continuous at every nonzero real number a.
(2) Since the limit of g at every a ∈ R does not exist, g is discontinuous at each such a. To determine the kind of discontinuity, note that all intervals of the form (a − δ, a) or (a, a + δ) contain elements from both Q and Q^c. An argument similar to the one used in Example 3.52 shows that g(a+) and g(a−) do not exist. Thus, g has a discontinuity of the second kind at every a ∈ R.

As an application of Theorem 3.56, we are able to describe the discontinuities of monotone functions.

Theorem 3.75. Monotone functions do not have discontinuity of the second kind. Moreover, the set of discontinuities of a monotone function is countable.

Proof. The first assertion is a direct consequence of Theorem 3.56. For the second assertion, for the sake of clarity, assume that f is increasing on (a, b). The case that f is decreasing can be handled similarly. By Theorem 3.56, f(x−) and f(x+) exist at every x ∈ (a, b). Thus, f is discontinuous at some x if and only if f(x−) < f(x+). Let D be the set of all points in (a, b) at which f is discontinuous. If D is nonempty, for each x ∈ D choose a rational number r_x with f(x−) < r_x < f(x+). If x, y ∈ D are such that x < y, then f(x+) ≤ f(y−) by Theorem 3.56. So, r_x < f(x+) ≤ f(y−) < r_y. This shows that x ↦ r_x is a one-to-one correspondence between D and a subset of Q. Since Q is countable, this shows that D is also countable. □
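The one-sided limits found in Example 3.74(1) can be observed numerically. The Python sketch below evaluates f at points approaching 0 from each side; the helper one_sided_values and the choice of sample points 10^(−k) are assumptions made for this illustration. Sampling of this kind only suggests the values of f(0−) and f(0+); it would, for instance, be useless for the function g of part (2), since every floating-point number is rational.

```python
def f(x):
    # The function of Example 3.74(1).
    return x**2 if x < 0 else 2 * x - 1


def one_sided_values(func, a, side, steps=8):
    """Evaluate func at a + side * 10**(-k) for k = 1, ..., steps.

    Use side = -1 to approach a from the left, side = +1 from the right.
    """
    return [func(a + side * 10.0 ** (-k)) for k in range(1, steps + 1)]


left = one_sided_values(f, 0.0, -1)    # values suggesting f(0-) = 0
right = one_sided_values(f, 0.0, +1)   # values suggesting f(0+) = -1

print(left[-1], right[-1])
```

The two lists settle near different numbers, which is the numerical signature of a discontinuity of the first kind that is not removable.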

3.6. Continuity on [a, b]: Results and Applications

As we saw in Section 2.4, sequences in a closed and bounded interval [a, b] satisfy some nice properties which are not shared by those in other kinds of intervals. In this section we aim to show that continuous functions defined on a closed and bounded interval [a, b] also have some special properties. The first property of this kind concerns the boundedness of continuous functions.

Proposition 3.76. If f : [a, b] → R is continuous, then f is bounded on [a, b].

Proof. If we assume that f is not bounded on [a, b], we find a sequence {x_n} in [a, b] such that for every n ∈ N, |f(x_n)| > n. By Theorem 2.81, {x_n} has a subsequence {x_{n_k}} which converges to some x ∈ [a, b]. Since for every k, |f(x_{n_k})| > n_k, {f(x_{n_k})} is an unbounded sequence. But f is continuous and therefore {f(x_{n_k})} converges to f(x). This shows that {f(x_{n_k})} is a bounded sequence, a contradiction. □

Of course, if we replace [a, b] by a nonclosed or unbounded interval I, a continuous function f : I → R may be unbounded. This is the content of the following example.


Example 3.77. The function f(x) = 1/x is continuous on the intervals (0, 1) and (0, +∞), but it is unbounded on these sets. To see this, note that for every M > 0, each x in the interval with 0 < x < 1/M satisfies f(x) > M.

As a result of the above proposition, we find that the range f([a, b]) of a continuous function f : [a, b] → R has both supremum and infimum in R. The following important theorem shows that these are indeed the maximum and minimum of f([a, b]).

Theorem 3.78 (The Extreme Value Theorem). If f : [a, b] → R is continuous, then there exist c, d ∈ [a, b] such that for every x ∈ [a, b], f(c) ≤ f(x) ≤ f(d).

Proof. Let M be the supremum of f([a, b]). We wish to find some d ∈ [a, b] such that f(d) = M. Assume that this is not the case. Then, for every x ∈ [a, b], f(x) < M. This allows us to define a continuous function h on [a, b] by

\[ h(x) = \frac{1}{M - f(x)}. \]

By Proposition 3.76, h is bounded on [a, b], that is, there exists K > 0 such that for every x ∈ [a, b],

\[ 0 < \frac{1}{M - f(x)} = h(x) \le K. \]

Hence M − f(x) ≥ 1/K, that is, f(x) ≤ M − 1/K, for every x ∈ [a, b]. This shows that M − 1/K, which is obviously less than M, is an upper bound for f([a, b]), and it contradicts our assumption M = sup f([a, b]). Thus, we should have M = f(d) for some d ∈ [a, b]. Similarly, we can show that for m = inf f([a, b]), c ∈ [a, b] can be found such that f(c) = m. □

What does the extreme value theorem say? The extreme value theorem says that a function that is continuous on an interval of the form [a, b] attains the maximum and minimum of its values on this interval.

If we replace [a, b] with a nonclosed or unbounded interval I, the conclusion of the extreme value theorem would not be true. For instance, Example 3.77 shows that the function f(x) = 1/x fails to attain the maximum of its values on the intervals (0, 1) and (0, +∞).

Example 3.79. The function g(x) = x is continuous on A := (0, 1), inf g(A) = 0, and sup g(A) = 1, but g(A) has neither a maximum nor a minimum.

The Intermediate Value Theorem and Its Uses.
Next, we state and prove the intermediate value theorem, one of the most important results of this chapter. To motivate this theorem, look at Figure 16, where the graph of a continuous function is drawn on some interval [a, b]. As is shown in the figure, the continuity of f on [a, b] means, geometrically, that the graph of f on this interval involves no gaps or holes (see Proposition 3.66 and Remark 3.67). So for every y0 between f (a) and f (b), the line y = y0 must meet the graph of f at some point. The x-component of this point, say x0 , is an element of (a, b) for which the equality y0 = f (x0 ) holds. For obvious reasons, this property is called the intermediate value property.


Figure 16. An illustration for the intermediate value theorem.

Definition 3.80. Let f be a function defined on an interval [a, b]. If for every y0 between f (a) and f (b), x0 ∈ (a, b) can be found such that y0 = f (x0 ), then we say that f has the intermediate value property on [a, b]. With this definition, we are now ready to state the intermediate value theorem. Theorem 3.81 (The Intermediate Value Theorem). If f : [a, b] → R is a continuous function, then f has the intermediate value property on [a, b]. Proof. Let y0 be a real number between f (a) and f (b). We assume, without loss of generality, that f (a) < y0 < f (b). To complete the proof, we should find x0 ∈ (a, b) such that y0 = f (x0 ). Let S = {z ∈ [a, b] : f (z) − y0 < 0}. Then S is nonempty, as a ∈ S, and S is bounded from above by b, which is not an element of S. Thus, the axiom of completeness tells us that S has a supremum in R, say x0 . It is clear that x0 ∈ [a, b]. We claim that this x0 is what we are looking for. To see this, it is enough to show that if g(z) := f (z) − y0 , then g(x0 ) = 0. If this is not the case, we have two cases as follows. • g(x0 ) > 0. Since g is continuous, Proposition 3.68 gives us some δ > 0 such that for every z ∈ (x0 − δ, x0 + δ) ∩ [a, b], g(z) > 0. But, our definition of x0 gives us some element of S in (x0 − δ, x0 ]. This is a contradiction. • g(x0 ) < 0. An argument, similar to the previous one, shows that we will have a contradiction in this case. We leave the details to the reader as an exercise. Therefore, we must have g(x0 ) = 0, which means y0 = f (x0 ).



Exercise 3.82. Show, by means of an example, that the continuity of f on [a, b] is not a necessary condition for having the intermediate value property on this interval.

The intermediate value theorem is a useful tool in many situations. We will illustrate its importance in the following examples.

Example 3.83. Show that the equation x^9 + x^2 + 4 = 0 has at least one solution in R.


Solution. Let P(x) = x^9 + x^2 + 4 and observe that P(0) = 4 > 0 and P(−2) = −504 < 0. Since P is continuous on [−2, 0] and P(−2) < 0 < P(0), the intermediate value theorem gives us some x ∈ (−2, 0) such that P(x) = 0; see Figure 17.

Figure 17. The graph of P(x) = x^9 + x^2 + 4.
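The intermediate value theorem only asserts that a solution exists, but the sign change it relies on also suggests a way to approximate one. The Python sketch below uses the bisection method, a standard numerical technique that is not part of the text's argument, to locate a zero of P in (−2, 0). The function name bisect_root and the tolerance are choices made for this illustration.

```python
def bisect_root(func, lo, hi, tol=1e-12, max_iter=200):
    """Approximate a zero of a continuous func on [lo, hi] by bisection.

    Requires func(lo) and func(hi) to have opposite signs (or one to be 0),
    which is exactly the situation covered by the intermediate value theorem.
    """
    f_lo, f_hi = func(lo), func(hi)
    if f_lo == 0:
        return lo
    if f_hi == 0:
        return hi
    if f_lo * f_hi > 0:
        raise ValueError("func(lo) and func(hi) must have opposite signs")
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        f_mid = func(mid)
        if f_mid == 0 or (hi - lo) / 2 < tol:
            return mid
        if f_lo * f_mid < 0:
            hi = mid                 # sign change lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid    # sign change lies in [mid, hi]
    return (lo + hi) / 2


def P(x):
    # The polynomial of Example 3.83.
    return x**9 + x**2 + 4


root = bisect_root(P, -2.0, 0.0)
print(root, P(root))
```

At every step the zero guaranteed by the theorem stays inside the current interval, whose length is halved, so the midpoints converge to a solution of P(x) = 0.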

Example 3.84. Let f be a function which is continuous on [a, b] and whose range is contained in Q. Prove that f is necessarily a constant function.

Solution. Assume, to the contrary, that f is not constant on [a, b]. Then, there exist x and y such that a ≤ x < y ≤ b and f(x) ≠ f(y). Assume without loss of generality that f(x) < f(y) and consider some irrational number r with f(x) < r < f(y). Since f is continuous on [x, y], the intermediate value theorem gives us some z ∈ [x, y], which is also an element of [a, b], satisfying f(z) = r. This contradicts the assumption that the range of f is contained in Q.

Example 3.85. If f : [a, b] → [a, b] is a continuous function, prove that there exists x0 ∈ [a, b] such that f(x0) = x0.

Solution. This is evident geometrically, as it says that the graph of the continuous function f : [a, b] → [a, b] must meet the line y = x at some point. This is illustrated in Figure 18. To prove this analytically, define a new function g by g(x) = f(x) − x. Then g is continuous on [a, b], g(a) = f(a) − a ≥ 0 and g(b) = f(b) − b ≤ 0. If one of g(a) and g(b) is equal to zero, we obtain the desired result. Otherwise we find that g(a) > 0 and g(b) < 0. So the intermediate value theorem gives us some x0 ∈ (a, b) such that g(x0) = 0, that is, f(x0) = x0.

Example 3.85 is a simple fixed point result. The point x0 of the above example is called, for obvious reasons, a fixed point of f. Therefore, the above statement says that any continuous function from [a, b] into [a, b] has a fixed point. Results that state the existence of fixed points for a function are known as fixed point results. We will prove a very general fixed point result in Chapter 8 (Theorem 8.36).

In view of Theorem 1.58, the intermediate value theorem says that the image f([a, b]) of a continuous function f : [a, b] → R is an interval. As our final result


Figure 18. The graph of f must meet the line y = x at some point.

in this section we show that the intermediate value theorem can be used to prove a similar statement with [a, b] replaced by an arbitrary interval I.

Theorem 3.86. If I is any interval and f is continuous on I, then f(I) is also an interval.

Proof. In view of Theorem 1.58, it is enough to prove the following statement. If a, b ∈ f(I) are such that a < b, and a < z < b, then z is also an element of f(I). But, the assumption a, b ∈ f(I) gives us α and β in I such that a = f(α) and b = f(β). Now, since f(α) < z < f(β) and f is continuous on the closed interval with endpoints α and β, the intermediate value theorem says that for some x between α and β, z is equal to f(x). Hence, z ∈ f(I), as desired. □

A note on Theorem 3.86 and the intermediate value theorem. Theorem 3.86 is an equivalent form of the intermediate value theorem. This means that by assuming each of the theorems, we can prove the other one. The proof of Theorem 3.86 shows the way we may deduce this result from the intermediate value theorem. As for the converse, assume Theorem 3.86, and let f be continuous on I := [a, b], where we assume without loss of generality that f(a) ≤ f(b). Then, Theorem 3.86 tells us that f(I) is an interval, and this shows that

\[ [f(a), f(b)] \subseteq f(I). \tag{3.19} \]

So if we consider some α between f(a) and f(b), then it follows from (3.19) that α ∈ f(I). This proves the intermediate value theorem.

Example 3.87. Find the image of the interval (0, 1) under the function

\[ f(x) = \frac{1}{x^2 + 8x}. \]


Solution. Since 0 and −8 are not elements of (0, 1), f is continuous on this interval. By the above theorem f((0, 1)) is an interval. Since 0 < x < 1 implies

\[ \frac{1}{9} < f(x) < +\infty, \]

it follows that the image of (0, 1) under f is contained in (1/9, +∞). We leave it to the reader to show that the image is actually (1/9, +∞).

Continuity of the Inverse Function. When a function f is strictly increasing on an interval I, it is one-to-one and therefore has an inverse function. If f is also assumed to be continuous on I, what can be said about the continuity of the inverse function on f(I)? The following theorem answers this question.

Theorem 3.88. Let f be a function which is strictly monotone and continuous on an interval I. Then the inverse function g := f^{−1} is continuous on J := f(I).

Proof. It follows from Theorem 3.86 that J is also an interval. We assume, for the sake of definiteness, that f is strictly increasing. The case that f is strictly decreasing can be handled similarly. To complete the proof, we show that g is continuous on J. Assume, to the contrary, that g is not continuous at some y0 ∈ J. Then, the fact that g is strictly increasing tells us that this is a simple discontinuity and that g(y0−) < g(y0+). Choose some x with

\[ g(y_0-) < x < g(y_0+) \tag{3.20} \]

such that x ≠ g(y0) (see Figure 19). If x = g(y) for some y < y0, then sup{g(z) : z ∈ J and z < y0} < g(y) by the left-hand inequality in (3.20), which is a contradiction. A similar reasoning, using the right-hand inequality in (3.20), shows that x = g(y) for y > y0 is also impossible. This shows that for every y ∈ J, x ≠ g(y), and hence that x ∉ I. This contradicts the assumption that I is an interval. Hence g must be continuous on J, as desired. □

Figure 19. The point x lies between g(y0−) and g(y0+), and x ≠ g(y0).
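Theorem 3.88 can also be observed computationally. For a strictly increasing continuous f, the intermediate value theorem guarantees that f(x) = y has a solution, and monotonicity makes it unique, so the value g(y) can be approximated by bisection. The Python sketch below does this for the hypothetical choice f(x) = x^3 on [0, 2]; the name inverse_on_interval and the tolerance are assumptions of this illustration, and bisection is a numerical device rather than the method of the proof.

```python
def inverse_on_interval(f, lo, hi, y, tol=1e-12):
    """Approximate g(y) = f^{-1}(y) for f strictly increasing and continuous
    on [lo, hi], assuming f(lo) <= y <= f(hi)."""
    if not (f(lo) <= y <= f(hi)):
        raise ValueError("y must lie between f(lo) and f(hi)")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < y:
            lo = mid     # the solution lies to the right of mid
        else:
            hi = mid     # the solution lies at or to the left of mid
    return (lo + hi) / 2


def h3(x):
    # Hypothetical example: x -> x^3, strictly increasing on [0, 2].
    return x**3


g_at_5 = inverse_on_interval(h3, 0.0, 2.0, 5.0)
g_nearby = inverse_on_interval(h3, 0.0, 2.0, 5.0 + 1e-6)
print(g_at_5, g_nearby - g_at_5)
```

A small change in y produces a small change in the computed value of g, which is what the continuity of the inverse function predicts.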


Example 3.89. The function h_n(x) = x^n is strictly increasing on [0, +∞). This can be seen by noticing that if 0 < x < y, then

\[ y^n - x^n = (y - x)(y^{n-1} + y^{n-2}x + \cdots + yx^{n-2} + x^{n-1}) > 0. \]

We know that each h_n is continuous on the specified domain and maps it onto [0, +∞). So, the above theorem says that for each n, the function $f_n(x) = \sqrt[n]{x}$ is continuous on [0, +∞) as the inverse of h_n. Note that the continuity of the f_n's was established in Example 3.65.

Example 3.90. We saw in Example 3.65 that the function f(x) = ln x is continuous on (0, +∞). Based on our calculus-based knowledge, we know that f is strictly increasing on this interval and maps it onto R. Therefore, the above theorem says that the inverse of this function, namely, the exponential function g(x) = e^x, is continuous on R.

As you may have noticed, to ensure that a function is invertible, we just need to assume that it is one-to-one. Since strict monotonicity is stronger than the property of being one-to-one, the assumption that f is strictly monotone seems to be stronger than what is actually needed in Theorem 3.88. In other words, one may think that it was enough to assume in Theorem 3.88 that f is one-to-one and continuous. The following theorem shows that these assumptions imply strict monotonicity. This means that Theorem 3.88 is stated appropriately.

Theorem 3.91. If f is continuous and one-to-one on some interval I, then f is strictly monotone on this interval.

Proof. The function f is not constant on I, because it is assumed to be one-to-one. Assume, to reach a contradiction, that f is neither strictly increasing nor strictly decreasing on I. Then we find a, b, and c in I such that a < b < c and either

\[ f(a) \le f(b), \quad f(c) \le f(b) \tag{3.21} \]

or

\[ f(a) \ge f(b), \quad f(c) \ge f(b). \tag{3.22} \]

In what follows we assume that (3.21) is the case. When (3.22) is true, a similar argument gives us the desired result. First note that by the assumption that f is one-to-one, both inequalities in (3.21) are strict, that is,

\[ f(a) < f(b), \quad f(c) < f(b). \tag{3.23} \]

Now, choose a real number α satisfying max{f(a), f(c)} < α < f(b). Then the intermediate value theorem gives us some points x1 of (a, b) and x2 of (b, c) such that

\[ f(x_1) = f(x_2) = \alpha. \tag{3.24} \]

Since x1 < x2, equation (3.24) contradicts the assumption that f is one-to-one. □



As an exercise draw figures that illustrate a, b, and c for which (3.21) and (3.22) are true.


3.7. Uniform Continuity Based on what we have learned thus far, it is easy to answer question (3.d) in the negative. Example 3.92. The function f (x) = 1/x is continuous on (0, +∞); nevertheless, it maps the Cauchy sequence {1/n} onto the unbounded sequence {n}. Note that {1/n} is not convergent in the domain of this function. In this section, we aim to find a class of functions which map Cauchy sequences onto Cauchy ones. To motivate our definition of such functions, let us begin with another discussion of continuous functions defined on a closed and bounded interval [a, b]. Let f be such a function. As a result of the continuity, for every ε > 0 and every x ∈ [a, b], we find δ > 0 such that y ∈ [a, b] and |y − x| < δ imply |f (y) − f (x)| < ε. Note that the number δ not only depends on ε, but also depends on x; if we fix ε and change x, δ would change accordingly. In what follows, we aim to show that the dependence of δ on x can be neglected. More precisely, we show that (UC) for every ε > 0, δ > 0 can be found such that for every x, y ∈ [a, b] with |y − x| < δ, |f (y) − f (x)| < ε. (Here we used UC as the abbreviation of uniform continuity.) To prove that (UC) is true, let us assume that the statement is not true. Then there exists ε > 0 such that for every δ > 0, x, y ∈ [a, b] can be found such that |y − x| < δ, but |f (y) − f (x)| ≥ ε. In particular, for every n ∈ N, we find xn , yn ∈ [a, b] such that (3.25)

|y_n − x_n| < […]

[…] for every ε > 0, δ > 0 can be found such that for every x, y ∈ S with |y − x| < δ, |f(y) − f(x)| < ε.


With this terminology, our discussion above can be summarized in the following theorem.

Theorem 3.94. Every function which is continuous on [a, b] is uniformly continuous on this interval.

Compactness: A crucial property. Recall from the discussion just after Theorem 2.81 that the property established there was described by saying that intervals of the form [a, b] are compact. As we observed above, the compactness of [a, b] was crucial in the proof of Theorem 3.94. In Chapter 8, where we generalize the concepts of continuity and uniform continuity to the context of metric spaces, a generalization of Theorem 3.94 is also presented. See Theorem 8.29.

The following result shows that uniformly continuous functions constitute the class of functions we mentioned at the beginning of this section.

Proposition 3.95. Let f be a function which is uniformly continuous on S. If {a_n} is a Cauchy sequence in S, then {f(a_n)} is Cauchy in f(S).

Proof. For a given ε > 0, let δ be associated with ε as in Definition 3.93. Using the fact that {a_n} is Cauchy, find N ∈ N such that for all m, n ≥ N, |a_n − a_m| < δ. Then our choice of δ implies that for all such m and n, |f(a_n) − f(a_m)| < ε. This means that {f(a_n)} is Cauchy. □

As a result of Example 3.92 and the above proposition, we find that the function f(x) = 1/x is not uniformly continuous on (0, +∞). So a continuous function may fail to be uniformly continuous.

Example 3.96. Prove directly from the definition that the function f(x) = 1/x is not uniformly continuous on (0, 1).

Solution. Consider ε = 1 and for a given δ > 0, choose n ∈ N so large that δ/n < 1. Also, let x = δ/n and y = δ/2n. Then

\[ |y - x| = \frac{\delta}{2n} < \delta \quad \text{and} \quad |f(y) - f(x)| = \left| \frac{1}{y} - \frac{1}{x} \right| = \frac{n}{\delta} > \varepsilon. \]

This proves that the negation of (UC) holds for f on (0, 1), meaning that f is not uniformly continuous on this interval.

Example 3.97. Prove that g(x) = 1/(1 + x^2) is uniformly continuous on R.

Solution. Let ε > 0 be given. Since $\lim_{x \to -\infty} g(x) = 0$, we find a ∈ R such that

\[ |g(x) - 0| = |g(x)| < \frac{\varepsilon}{2}, \quad \text{for } x < a, \tag{3.27} \]

and since $\lim_{x \to +\infty} g(x) = 0$, b ∈ R can be found such that

\[ |g(x) - 0| = |g(x)| < \frac{\varepsilon}{2}, \quad \text{for } x > b. \tag{3.28} \]

The function g is continuous, and therefore uniformly continuous, on [a − 1, b + 1]. This gives us some 0 < δ < 1 such that for x, y ∈ [a − 1, b + 1] with |y − x| < δ, |g(y) − g(x)| < ε. Now, consider x, y ∈ R which satisfy |y − x| < δ. We can consider three cases for x and y, as follows.

• x < a − 1 or y < a − 1. In the former case, y < x + δ < a − 1 + δ < a, and in the latter case, x < y + δ < a − 1 + δ < a. So in either case

\[ |g(y) - g(x)| \le |g(y)| + |g(x)| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon \tag{3.29} \]

by (3.27).
• x > b + 1 or y > b + 1. If the former is the case, then y > x − δ > b + 1 − δ > b, and when the latter occurs, x > y − δ > b + 1 − δ > b. So (3.28) tells us that (3.29) holds again.
• x, y ∈ [a − 1, b + 1]. In this case, our choice of δ ensures that |g(y) − g(x)| < ε.

In short, we observed that when x, y ∈ R are such that |y − x| < δ, |g(y) − g(x)| < ε. This means that g is uniformly continuous on R, as desired.

The proof that g is uniformly continuous on R can be easily extended to a broader context.

Exercise 3.98. Let f be a function which is continuous on R, $\lim_{x \to -\infty} f(x) = L_1$, and $\lim_{x \to +\infty} f(x) = L_2$, where L1 and L2 are finite. Prove that f is uniformly continuous on R.

Further examples and results concerning uniformly continuous functions can be found in Exercises 50–61 at the end of this chapter.
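The failure of uniform continuity in Example 3.96 can be made concrete. The Python sketch below produces, for a given δ > 0, the pair x = δ/n, y = δ/2n used in that example and lets one check that the two points are closer than δ while their images under f(x) = 1/x are more than ε = 1 apart. The function name witness_pair and the particular rule for choosing n are assumptions made for this illustration.

```python
import math


def witness_pair(delta):
    """Return (x, y) in (0, 1) with |y - x| < delta but |1/y - 1/x| > 1.

    Follows the choice x = delta/n, y = delta/(2n) of Example 3.96, where
    n is taken here as floor(delta) + 1 so that delta/n < 1.
    """
    n = math.floor(delta) + 1
    x = delta / n
    y = delta / (2 * n)
    return x, y


for delta in (3.7, 1.0, 0.5, 1e-3, 1e-6):
    x, y = witness_pair(delta)
    print(delta, abs(y - x) < delta, abs(1 / y - 1 / x) > 1)
```

No matter how small δ is taken, a violating pair exists, so no single δ can serve every point of (0, 1) for ε = 1.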

Notes on Essence and Generalizability

In this chapter we studied limit and continuity of real-valued functions with an approach which was slightly more general than that of calculus. This approach strengthened our calculus-based knowledge of limit and continuity, enabled us to calculate the limit of a wider class of functions, and helped us to discuss the continuity of functions at points at which we had always thought continuity had no meaning. The discussion of uniform continuity, which is equivalent to continuity for functions defined on closed and bounded intervals and is a stronger property for functions defined on other kinds of sets, also complemented our knowledge of continuous functions.

The approach also introduced us to an important consequence of the existence of a notion of distance in R, namely the classification of the points of R according to their position relative to a given set A ⊆ R, which led to the concepts of limit, boundary, interior, and isolated points. We will see in the beginning of Chapter 6 that such concepts have a meaning in other sets in which distance is meaningful, in R^2 for example, and we will use this fact as motivation for our definition of metric spaces.


As we saw, the Euclidean distance function $d_e$, which appeared in terms of the absolute value by writing $d_e(x, y) = |x - y|$, was our main tool for the definition of limit and continuity. It is one of the main objectives of the development of metric space theory to generalize these concepts to the resulting abstract setting. This generalization will be carried out in Chapters 6 and 8, where we will see that some of the current material is generalizable, and some is not. The generalizable concepts include limit, boundary, interior, and isolated points, and limit, continuity, and uniform continuity. Also, some important theorems of Section 3.6 can be generalized to functions defined on compact subsets of metric spaces. Our attempt to generalize the intermediate value theorem will lead to the important concept of connectedness in Chapter 8. The concepts of right and left limit, infinite limit, and limit at infinity cannot be generalized to the metric space setting, as these are dependent on the order relation $<$ of $\mathbb{R}$.

Exercises

1. Complete the proof of Theorem 3.24.
2. Complete the proof of Theorem 3.56.
3. Complete the proof of Theorem 3.45.
4. Complete the proof of Proposition 3.66.
5. Let $A$ be a subset of $\mathbb{R}$. Show that every point of $A^\circ$ is an interior point of this set.
6. Let $x \in \mathbb{R}$ and $\varepsilon > 0$ be arbitrary. Prove that every point of the neighborhood $N_\varepsilon(x)$ is an interior point of this set.
7. Find a subset $A$ of $\mathbb{R}$ such that $(A')'$ is a singleton. Hint. Consider $A = \{\frac{1}{n} + \frac{1}{m} : n, m \in \mathbb{N}\}$ and prove that
$$A' = \left\{\frac{1}{n} : n \in \mathbb{N}\right\} \cup \{0\}.$$

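8. If $A$ and $B$ are subsets of $\mathbb{R}$ such that $A \subseteq B$, prove that $A' \subseteq B'$ and $A^\circ \subseteq B^\circ$. Is it also true that $\partial A \subseteq \partial B$?
9. If $A$ and $B$ are arbitrary subsets of $\mathbb{R}$, then verify that
(a) $(A \cap B)^\circ = A^\circ \cap B^\circ$;
(b) $A^\circ \cup B^\circ \subseteq (A \cup B)^\circ$, and the inclusion may be strict;
(c) $(A \cap B)' \subseteq A' \cap B'$, and the inclusion may be strict;
(d) $(A \cup B)' = A' \cup B'$.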

10. Construct a bounded subset of R which has exactly three limit points. 11. Verify that countable subsets of R have no interior points. 12. Show that for a set A ⊆ R, the following conditions are equivalent. (a) A contains all its limit points. (b) A contains all its boundary points.


13. Suppose $S$ is a subset of $\mathbb{R}$ and $f$ is a function defined on $S$. If $a$ is a limit point of $S$ and the set $\{x \in S : |f(x)| \ge \varepsilon\}$ is finite for every $\varepsilon > 0$, prove that $\lim_{x\to a} f(x) = 0$.
14. Give an example of a function $f$ with the following properties.
(a) The domain $E$ of $f$ contains exactly one isolated point $a$ (and, accordingly, $E$ is not an interval).
(b) The function $f$ is continuous on $E$.
(c) The range of $f$ is a closed and bounded interval.
15. Define a function $f$ on $[0, 1]$ such that $f$ is discontinuous at every point of $[0, 1]$, but $|f|$ is continuous on this interval.
16. Let $[x]$ denote the integral part of $x$. Find the points at which the function $J(x) = x - [x]$ is continuous.
17. If $n$ is an odd natural number, verify that the $n$th root function $f_n(x) = \sqrt[n]{x}$ is continuous on $\mathbb{R}$.
18. Let $g$ be a function which is continuous on $\mathbb{R}$.
(a) Let $S = \{x \in \mathbb{R} : g(x) = 0\}$, and prove that $S' \subseteq S$.
(b) If for every $r \in \mathbb{Q}$, $g(r) = 0$, show that $g(x) = 0$ for every $x \in \mathbb{R}$.
19. In each case determine the points at which the given function is continuous.
(a) $f(x) = \begin{cases} x^2 & x \in \mathbb{Q}, \\ 2x - 1 & x \in \mathbb{Q}^c. \end{cases}$
(b) $g(x) = \begin{cases} \sin x & x \in \mathbb{Q}, \\ \cos x & x \in \mathbb{Q}^c. \end{cases}$
20. Let $A = (0, +\infty)$. Define a function $h$ on $A$ as follows. For every $x \in \mathbb{Q}^c \cap A$, $h(x) = 0$, and if $x = m/n \in \mathbb{Q} \cap A$ is such that $m$ and $n$ are relatively prime natural numbers, then $h(x) = 1/n$. Prove that $h$ is continuous at every irrational $x \in A$ and that it is discontinuous at the rational points of $A$.
21. Let $f$ be a function defined on $\mathbb{R}$ with the property that for some $M > 0$, $|f(x)| \le M|x|^2$ holds for every $x \in \mathbb{R}$. Prove that
$$\lim_{x\to 0} f(x) = \lim_{x\to 0} \frac{f(x)}{x} = 0.$$
22. Let $a$ be a limit point of the domain of some function $g$ satisfying
$$\lim_{x\to a} (g(x))^2 = L.$$
Prove that if $L = 0$, then $\lim_{x\to a} g(x)$ is also equal to zero. Show, by means of an example, that when $L$ is nonzero, $g$ may fail to have a limit at $a$.
23. Let $f$ be a function defined on $(0, +\infty)$ such that $\lim_{x\to+\infty} x f(x) = L$, where $L$ is a real number. Show that $\lim_{x\to+\infty} f(x) = 0$.
24. Assume that $\lim_{x\to a} f(x) = L > 0$ and $\lim_{x\to a} g(x) = +\infty$. Show that $\lim_{x\to a} f(x)g(x) = +\infty$. If $L = 0$, show that this conclusion is not necessarily true.


25. Let $f$ and $g$ be functions which have a limit as $x$ tends to $+\infty$. If for all $x$ in some interval $(a, +\infty)$, $f(x) \le g(x)$, prove that
$$\lim_{x\to+\infty} f(x) \le \lim_{x\to+\infty} g(x).$$
26. Let $f$ and $g$ be functions defined on an interval $(a, +\infty)$ for some $a \in \mathbb{R}$. Assume also that for every $x > a$, $g(x) > 0$, and that $\lim_{x\to+\infty} (f(x)/g(x))$ exists and is equal to $L \ne 0$. Prove that
(a) if $L > 0$, then $\lim_{x\to+\infty} f(x) = +\infty$ if and only if $\lim_{x\to+\infty} g(x) = +\infty$;
(b) if $L < 0$, then $\lim_{x\to+\infty} f(x) = -\infty$ if and only if $\lim_{x\to+\infty} g(x) = +\infty$.
27. Let $f$ be a function which is defined on an interval $(a, +\infty)$, for some $a \in \mathbb{R}$. Show that $\lim_{x\to+\infty} f(x) = +\infty$ if and only if for every sequence $\{x_n\}$ in $(a, +\infty)$ that diverges to $+\infty$, the sequence $\{f(x_n)\}$ also diverges to $+\infty$.
28. Suppose that $f$ is a function defined on $(0, +\infty)$. Prove that $\lim_{x\to+\infty} f(x) = L$ if and only if $\lim_{x\to 0+} f(1/x) = L$.
29. Let $f$ be a function defined on $\mathbb{R}$ such that for every $x, y \in \mathbb{R}$, $f(x + y) = f(x) + f(y)$. Such a function is called additive.
(a) Prove that if $\lim_{x\to 0} f(x)$ exists, then the limit must be zero.
(b) Show that $f$ has a limit at every $c \in \mathbb{R}$.
(c) Verify that if $f$ is continuous at some point $x_0$, then $f$ is continuous at every $x \in \mathbb{R}$.
(d) If $f$ is continuous on $\mathbb{R}$, show that for every $x \in \mathbb{R}$, $f(x) = f(1)x$.
30. Suppose $g$ is a function defined on $\mathbb{R}$ such that for every $x, y \in \mathbb{R}$, $g(x + y) = g(x)g(y)$. Prove that
(a) if $g$ is continuous at 0, then it is continuous at every $x \in \mathbb{R}$;
(b) if $g(x_0) = 0$ for some $x_0 \in \mathbb{R}$, then $g(x) = 0$ for every $x \in \mathbb{R}$.
31. Let $f$ and $g$ be functions which are continuous on $\mathbb{R}$. If $T = \{x \in \mathbb{R} : f(x) \ge g(x)\}$, show that $T' \subseteq T$.
32. Let $f$ and $g$ be functions defined on $\mathbb{R}$ which are continuous at some $a \in \mathbb{R}$. Show that the function $h$ defined by $h(x) = \max\{f(x), g(x)\}$ is also continuous at $a$. What happens if we replace the maximum with minimum in the definition of $h$? Is the resulting function also continuous at $a$? Hint. Observe that for every $x \in \mathbb{R}$,
$$h(x) = \frac{1}{2}\bigl((f(x) + g(x)) + |f(x) - g(x)|\bigr).$$
33. Let $f$ be a function which is continuous on $[a, b]$. Define a function $g$ on this interval by $g(x) = \sup\{f(y) : a \le y \le x\}$. Prove that $g$ is continuous on $[a, b]$.


34. Let $f$ be a continuous function defined on $[a, b]$. If for every $x \in [a, b]$ there exists $y$ in the interval such that
$$|f(y)| \le \frac{1}{2}|f(x)|,$$
show that $c \in [a, b]$ can be found with $f(c) = 0$.
35. If $f$ is a function which is continuous on $[0, 1]$ and $f(0) = f(1)$, show that $c \in [0, 1/2]$ can be found such that $f(c) = f(c + \frac{1}{2})$. Hint. Consider the function $g(x) = f(x) - f(x + \frac{1}{2})$.
36. Let $f$ be continuous on $[a, b]$, $f(a) < 0$ and $f(b) > 0$. Let $K = \{x \in [a, b] : f(x) < 0\}$ and $k = \sup K$. Show that $f(k) = 0$.
37. Let $f$ be continuous on $\mathbb{R}$ and
$$\lim_{x\to-\infty} f(x) = \lim_{x\to+\infty} f(x) = 0.$$
Show that $f$ is bounded on $\mathbb{R}$ and attains its maximum or minimum on $\mathbb{R}$. Show, by means of an example, that
(a) $f$ may fail to attain both its maximum and minimum,
(b) if we omit the continuity assumption, then $f$ may fail to be bounded.
38. Let $c$ be a number between $a$ and $b$, and
$$f(x) = \begin{cases} g(x) & a \le x \le c, \\ h(x) & c < x \le b. \end{cases}$$
Assume also that $g$ is continuous on $[a, c]$ and $h$ is continuous on $[c, b]$. Prove that $f$ is continuous on $[a, b]$ if and only if $g(c) = h(c)$.
39. Let $I$ be an interval, not necessarily closed or bounded. If $f$ is a nonconstant function which is continuous on $I$, prove that $f(I)$ is uncountable.
40. Consider the function
$$f(x) = \begin{cases} x\left[\frac{1}{x}\right] & x \ne 0, \\ 1 & x = 0, \end{cases}$$
where $[\frac{1}{x}]$ denotes the integral part of $\frac{1}{x}$. Find the points at which $f$ is discontinuous. At each such point determine the type of discontinuity.
41. In each case find a function $f$ with the given property.
(a) The function $f$ is continuous on $\mathbb{R} \setminus \mathbb{Z}$.
(b) The function $f$ is continuous on $\mathbb{R} \setminus \{1/n : n \in \mathbb{N}\}$.
(c) The function $f$ is continuous only at 1 and 2.
42. In each case show that the given function is continuous on $\mathbb{R}$.
(a) $f(x) = (x - [x])(x + [-x])$.
(b) $g(x) = [x] \sin \pi x$.
43. Show that the function $h(x) = [x] \cos \pi x$ has a simple discontinuity at every integer.
44. Show that the function $h(x) = (\sin x)/\sqrt{x}$ attains the maximum and minimum of its values on $(0, \pi]$.


45. Give an example of a function $g : (0, 1) \to \mathbb{R}$ such that $g$ is not continuous on $(0, 1)$, but $g$ attains the maximum and minimum of its values on $(0, 1)$.
46. Let $f : [a, b] \to \mathbb{R}$ be a continuous function, and let $x_1, \ldots, x_n$ be elements of $[a, b]$. Show that $z \in [a, b]$ can be found such that
$$f(z) = \frac{f(x_1) + \cdots + f(x_n)}{n}.$$
47. Find the image of the interval $[0, +\infty)$ under the function $h(x) = 1/(x^2 + 1)$.
48. Show that the function
$$f(x) = \begin{cases} 1 + x^2 & x > 0, \\ 0 & x = 0, \\ -(1 + x^2) & x < 0, \end{cases}$$
is not continuous on $\mathbb{R}$ but that it has a continuous inverse.
49. Verify that the function
$$g(x) = \begin{cases} x & 0 \le x \le 1, \\ x - 1 & 2 < x \le 3, \end{cases}$$
is continuous on $D = [0, 1] \cup (2, 3]$. Find the inverse function $g^{-1}$, and show that it is not continuous on $g(D)$. Does this contradict Theorem 3.88? Why?
50. Show that the absolute value function is uniformly continuous on $\mathbb{R}$.
51. In each case show that the given function is uniformly continuous on the specified interval.
(a) $f(x) = x/(x - 1)$ on $[2, +\infty)$.
(b) $g(x) = \sqrt{x + 1} - \sqrt{x}$ on $[0, +\infty)$.
52. Show that the function $f(x) = x^3$ is not uniformly continuous on $\mathbb{R}$.
53. Find a uniformly continuous function from $\mathbb{R}$ onto $(0, 1]$.
54. If $f(x) = x$ and $g(x) = \sin x$, prove that $f$ and $g$ are uniformly continuous on $\mathbb{R}$. Show that, nevertheless, the function $fg$ is not uniformly continuous on $\mathbb{R}$.
55. Let $f$ and $g$ be functions which are uniformly continuous on a set $S \subseteq \mathbb{R}$.
(a) Prove that $f + g$ is also uniformly continuous on $S$.
(b) Show that if $f$ and $g$ are both bounded on $S$, then $fg$ is also uniformly continuous on $S$.
56. Let $f$ be continuous on $(a, b)$. If $f(a+)$ and $f(b-)$ exist and are finite, show that $f$ is uniformly continuous on $(a, b)$.
57. Let $f$ be a periodic function defined on $\mathbb{R}$. Prove that if $\lim_{x\to+\infty} f(x)$ exists, then $f$ is a constant function. Deduce from this that $\lim_{x\to+\infty} \sin x$ and $\lim_{x\to+\infty} \cos x$ do not exist.
58. If $f$ is uniformly continuous on $(a, b)$, prove that $f$ is bounded on this interval. Note that if $f$ were only assumed to be continuous on $(a, b)$, it could be unbounded by Example 3.77.
59. Let $f$ be a function which is continuous and monotone on $(a, b)$. Prove that $f$ is uniformly continuous on $(a, b)$ if and only if $f$ is bounded on this interval.


60. Let $f$ be a function which is continuous on $[0, +\infty)$. If for some $a > 0$, $f$ is uniformly continuous on $[a, +\infty)$, prove that $f$ is uniformly continuous on $[0, +\infty)$.
61. A function $f$ is said to be Lipschitz on a set $S \subseteq \mathbb{R}$, if there exists $K > 0$ such that for all $x, y \in S$, $|f(x) - f(y)| \le K|x - y|$.
(a) Prove that if $f$ is Lipschitz on $S$, then $f$ is uniformly continuous on this set.
(b) Show that $f(x) = \sqrt{x}$ is uniformly continuous on $[0, 1]$, but it is not Lipschitz on this interval.

Chapter 4

Derivative and Differentiation

One-variable calculus has two major parts, differential calculus and integral calculus, which we will study in the remainder of this part of the book. Our focus in this chapter will be on the former part in which the notion of derivative plays a decisive role. To motivate our presentation, we begin with some questions.

(4.a) Why do we need the concept of derivative?
(4.b) If a function $f$ is differentiable on some interval $I$ with derivative $f'$, does it follow that $f'$ is continuous on $I$?
(4.c) Is any function the derivative of another one? More precisely, if $f$ is a given function, can we find a function $g$ such that $f \equiv g'$?
(4.d) Suppose $f$ is a function defined by
$$f(x) = \sum_{n=0}^{\infty} a_n (x - a)^n, \tag{4.1}$$
where the power series converges for every $x$ in some set $A$ of real numbers, $A$ being an interval centered at $a$ or $A = \mathbb{R}$. Is $f$ differentiable on $A$?
(4.e) If $f$ is defined in terms of a power series as in (4.1), how can we find the coefficients $a_n$?
(4.f) When is it possible to expand a given function $f$ as a power series?

These are some of the questions that we will answer in this chapter. Question (4.a) will be answered in Section 4.1, where we begin our work with a discussion of tangent lines and the way their study leads us to the concept of derivative. This section also contains some basic examples of differentiable functions and a discussion of one-sided derivatives, differentiability on intervals and higher-order derivatives. In Section 4.2 we present some basic properties of the operation of differentiation. These include the way the derivative behaves in algebraic operations and the problems of differentiability for composite and inverse functions. As a result of


some of the techniques developed in this section, we present a differentiable function whose derivative is not continuous everywhere on its domain. This shows that the answer to question (4.b) is negative.

Section 4.3 is devoted to an important application of the concept of derivative: a necessary condition for the occurrence of local extrema of functions. We will prove, as a result of this necessary condition, the important fact that derivatives satisfy the intermediate value property. This result allows us to answer question (4.c) in the negative: not every function is the derivative of some other.

Section 4.4 begins with one of the most important results of this chapter, namely the mean value theorem. As we will see, this theorem follows from the necessary condition for local extrema established in Section 4.3. After presenting various applications of the mean value theorem, including criteria that allow us to determine monotone functions, we state a more general result, namely Cauchy's mean value theorem. This result then will be used to prove the $\frac{0}{0}$ case of l'Hôpital's rule.

Questions (4.d) and (4.e) will be answered in Section 4.5. As we will see therein, the questions are closely related to each other. To answer question (4.d), we prove that functions that are defined in terms of power series are differentiable of any order. More precisely, we show that a power series can be differentiated term-by-term. As a result, we are then able to answer question (4.e). This leads us to the important concept of Taylor series.

Our final section is devoted to Taylor's theorem that allows us to approximate differentiable functions by polynomials. As a result of this theorem, we characterize those functions that can be the sum of their Taylor series. This result gives us an answer to question (4.f).

4.1. The Why and What of the Concept of Derivative

We recall that in the theory of derivative, functions are defined on intervals of real numbers.

We begin with question (4.a). Why do we need the concept of derivative? Mathematically, the concept of derivative appears when we try to find linear approximations of plane curves. Linear approximations allow us to better understand curves in terms of lines, which are the best known plane curves. Here, by linear approximation we mean the approximation of a plane curve by finding its associated tangent lines at various points. To have an idea of the way this works, we first need to know about tangent lines.

Let $C$ be a plane curve, and let $P$ be a point on $C$. Roughly speaking, the tangent line of $C$ at $P$ is a line $L$ that passes through $P$ and is very close to $C$ in the area of this point. See Figure 1 for an illustration of this rough description of tangent lines. Here, the closeness of $L$ to $C$ is essential, and the assumption that $L$ passes through $P$ is not sufficient to ensure that $L$ is tangent to $C$ at $P$. This is illustrated in Figure 2.


Figure 1. The line L is the tangent line of the curve C at P .

Figure 2. In each case, the line L cannot be considered as the tangent line of the curve C at P .

In each case in Figure 2, P lies on both C and L, but it is not reasonable to think of L as the tangent line for C at P . This is because L is not sufficiently close to C in the area of P . Note that in the right-hand figure, the curve does not have a tangent line at P . This is because, due to the corner C has at P , no straight line passing through P can be sufficiently close to C in the area of P . Once the tangent lines of a curve C are found at its various points, we can use them to obtain a geometric insight of the curve. This is particularly important when C is the graph of a real function y = f (x). For example, we may use the tangent lines of C to determine the points at which the function f attains a local extremum. To understand how, assume that for every x in a deleted neighborhood of a we have drawn the tangent line Lx at (x, f (x)). It can be seen that if the slope of Lx is positive for every x < a and is negative for every x > a, then f has a local maximum at a (Theorem 4.42 below). This is illustrated in Figure 3.

Figure 3. The slope of the tangent line Lx is positive, while that of Ly is negative.


This motivates us to find the tangent lines of the graph of a function f at its various points. To this end we just need to find the slope of the tangent lines. This is because, once the slope of the tangent line at (a, f (a)) is calculated as m, then the equation of the tangent line is y − f (a) = m(x − a). So it is enough to solve the following problem. (TLP) If a function f is defined in a neighborhood of a, determine the slope of the tangent line of the graph of f at the point (a, f (a)), whenever it exists. (Here, TLP is used as the abbreviation of the tangent line problem.) To solve (TLP), let us consider Figure 4 in which the graph of a continuous function f is drawn together with the tangent line La at (a, f (a)).

Figure 4. The tangent line La whose equation is to be determined.

To find the slope of $L_a$, we may use a limit process as follows. Given a point $x \ne a$ in the domain of $f$, we consider the line that passes through the points $(x, f(x))$ and $(a, f(a))$, and we call it $L_x$. As can be seen in Figure 5, when $x$ tends to $a$, the slope of $L_x$ approaches that of $L_a$.

Figure 5. The slope of Lx is closer to that of La than the slope of Ly .
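The limit process just described can be tried numerically. The following is only a sketch under assumptions that are not part of the text: the function $f(x) = x^2$, the point $a = 1$, and the helper name `secant_slope` are chosen purely for illustration.

```python
def secant_slope(f, a, x):
    """Slope of the line L_x through (x, f(x)) and (a, f(a)); requires x != a."""
    return (f(x) - f(a)) / (x - a)


def f(x):
    # Illustrative choice (an assumption, not taken from the text): f(x) = x^2.
    return x * x


a = 1.0
# Let x approach a from both sides and record the slope of each line L_x.
for h in (0.5, 0.1, 0.01, -0.01, -0.1):
    x = a + h
    print(f"x = {x:.2f}   slope of L_x = {secant_slope(f, a, x):.4f}")
```

For this particular $f$ the slope of $L_x$ equals $x + a$, so the printed slopes cluster around 2 as $x$ gets close to 1, which is the behaviour that Figure 5 depicts.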

Analytically, this means that the slope of the tangent line at $(a, f(a))$ is
$$\lim_{x\to a} \frac{f(x) - f(a)}{x - a}, \tag{4.2}$$


because the involved quotient is the slope of the line $L_x$. Of course, this limit may not exist for certain choices of $f$ and $a$. For example, if $f(x) = |x|$ and $a = 0$, then
$$\lim_{x\to a+} \frac{f(x) - f(a)}{x - a} = \lim_{x\to 0+} \frac{|x|}{x} = 1$$
and
$$\lim_{x\to a-} \frac{f(x) - f(a)}{x - a} = \lim_{x\to 0-} \frac{|x|}{x} = -1,$$
so that the limit in (4.2) does not exist in this case. Note that the graph of the absolute value function resembles the right-hand curve of Figure 2 in the area of the origin. The point is, however, that when the limit in (4.2) exists, it represents the slope of the tangent line at $(a, f(a))$, or more simply at $a$. In this case, we give the limit a specific name.

Definition 4.1. Let $f$ be defined in a neighborhood of $a$. We define the derivative of $f$ at $a$, denoted by $f'(a)$, to be the limit
$$\lim_{x\to a} \frac{f(x) - f(a)}{x - a},$$
whenever it exists as a real number. In this case we say that $f$ is differentiable at $a$.

Geometrically, our definition implies that $f$ has a tangent line at $a$ if and only if it is differentiable at this point, and when this is the case, the derivative $f'(a)$ gives us the slope of the tangent line to the graph of $f$ at $(a, f(a))$.

Example 4.2. Our above discussion shows that the function $f(x) = |x|$ is not differentiable at 0. Geometrically, this means that the graph of $f$ fails to have a tangent line at the origin. Recall that the graph of $f$ has a corner at $(0, 0)$ (see Figure 6).

Example 4.3. If $f$ is a constant function, then $f$ is differentiable at every point $a$ and $f'(a) = 0$.

Example 4.4. Let $L$ be any straight line, and let $P$ be a point on $L$. Show that the tangent line of $L$ at $P$ is $L$ itself.

Solution. Suppose $L$ is represented by the equation $f(x) = ax + b$ and that $P = (z, az + b)$. Then, the slope of the tangent line at $P$ is
$$f'(z) = \lim_{x\to z} \frac{(ax + b) - (az + b)}{x - z} = a.$$
Thus, the tangent line at $P$ is represented by $y - (az + b) = a(x - z)$ or, equivalently, $y = ax + b$, which is the equation of $L$ itself.

The above example helps us to better understand Example 4.3. Given a constant function, say $f(x) = c$ for every $x \in \mathbb{R}$, $f$ is represented geometrically by the horizontal line $y = c$ whose slope is 0. This justifies the result of Example 4.3 geometrically.


What does Example 4.4 say? Example 4.4 says that the linear approximations of a line are the same as the line itself. This is not surprising, because our intention in using linear approximations was to approximate complicated curves in terms of lines which are the simplest plane curves. So when a curve is in the simplest form, it is not possible to approximate it using simpler objects.

Example 4.5. Suppose $f$ is differentiable at $a$ and $\{x_n\}$ is a sequence in the domain of $f$ such that $x_n \ne a$ for every $n$. If $\{x_n\}$ converges to $a$, prove that
$$\lim_{n\to\infty} \frac{f(x_n) - f(a)}{x_n - a} = f'(a).$$

Solution. Define a function $F$ on the domain of $f$ by
$$F(x) = \begin{cases} \dfrac{f(x) - f(a)}{x - a} & x \ne a, \\[2mm] f'(a) & x = a. \end{cases}$$
Since $\lim_{x\to a} F(x) = f'(a) = F(a)$, $F$ is continuous at $a$. Now the assumption $\lim_{n\to\infty} x_n = a$ and Theorem 3.70 imply that
$$\lim_{n\to\infty} \frac{f(x_n) - f(a)}{x_n - a} = \lim_{n\to\infty} F(x_n) = F(a) = f'(a).$$
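As a rough numerical illustration of Example 4.5, we may evaluate the quotient along a concrete sequence. Everything specific below is an assumption made for the sketch: the function is the sine function, the point is $a = 0.5$, the sequence is $x_n = a + (-1)^n/n$, and the comparison value $\cos a$ is the derivative established in Example 4.8.

```python
import math


def quotients_along_sequence(f, a, xs):
    """Difference quotients (f(x_n) - f(a)) / (x_n - a) for a sequence with x_n != a."""
    return [(f(x) - f(a)) / (x - a) for x in xs]


a = 0.5
# Hypothetical sequence converging to a from alternating sides, never equal to a.
xs = [a + (-1) ** n / n for n in (10, 100, 1000, 10000)]
qs = quotients_along_sequence(math.sin, a, xs)

for x, q in zip(xs, qs):
    print(f"x_n = {x:.5f}   quotient = {q:.6f}   distance to cos(a) = {abs(q - math.cos(a)):.2e}")
```

The printed distances shrink as $x_n$ approaches $a$, in agreement with the statement of the example. This is a floating-point experiment and not a substitute for the proof.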

Example 4.6. If $\alpha$ is a fixed nonzero integer, prove that the function $f(x) = x^\alpha$ is differentiable at every $a$ where it is defined and
$$f'(a) = \alpha a^{\alpha - 1}. \tag{4.3}$$

Solution. We consider two cases for $\alpha$.
(1) $\alpha \in \mathbb{N}$. If we denote $\alpha$ by $n$ in this case, then for an arbitrary real number $a$ and every $x \ne a$,
$$\frac{x^n - a^n}{x - a} = x^{n-1} + a x^{n-2} + \cdots + a^{n-2} x + a^{n-1}.$$
Hence by letting $x$ tend to $a$ in the above equality, we obtain (4.3) with $\alpha = n$.
(2) The number $\alpha$ is equal to $-n$ for some natural number $n$. In this case for an arbitrary nonzero real number $a$ and every nonzero $x \ne a$,
$$\frac{x^{-n} - a^{-n}}{x - a} = \frac{1/x^n - 1/a^n}{x - a} = \frac{-(x^{n-1} + x^{n-2} a + \cdots + x a^{n-2} + a^{n-1})}{x^n a^n}.$$
Thus, it follows by letting $x$ tend to $a$ that
$$\lim_{x\to a} \frac{x^{-n} - a^{-n}}{x - a} = \frac{-n a^{n-1}}{a^{2n}} = -n a^{-n-1}.$$
This proves (4.3) with $\alpha = -n$.

Example 4.7. Prove that the natural logarithmic function $f(x) = \ln x$ is differentiable at every $a > 0$.


Solution. We use the inequalities
$$\frac{y - 1}{y} \le \ln y \le y - 1, \tag{4.4}$$
which are valid for every $y > 0$. For $a, x > 0$, let $y = x/a$ in (4.4) to obtain
$$\frac{x - a}{x} \le \ln\frac{x}{a} \le \frac{x - a}{a}.$$
Since $\ln(x/a) = \ln x - \ln a$, this shows that
$$\frac{1}{x} \le \frac{\ln x - \ln a}{x - a} \le \frac{1}{a} \tag{4.5}$$
when $x > a > 0$, and
$$\frac{1}{a} \le \frac{\ln x - \ln a}{x - a} \le \frac{1}{x} \tag{4.6}$$
if $a > x > 0$. Now by letting $x$ tend to $a$ in (4.5) and (4.6), we deduce that $f'(a) = 1/a$.

Before proceeding to some more examples of the calculation of derivatives, it will be helpful to recall an equivalent formulation of the limit (4.2). This can be obtained from (4.2) by changing the variable to $h = x - a$. Since letting $x$ tend to $a$ amounts to letting $h$ tend to 0, we find that
$$\lim_{x\to a} \frac{f(x) - f(a)}{x - a} = \lim_{h\to 0} \frac{f(a + h) - f(a)}{h}.$$

So to compute the derivative of a function, we may use the right-hand limit above.

Example 4.8. Show that the sine function is differentiable at every $x \in \mathbb{R}$ and find the value of its derivative.

Solution. Since $\sin(x + h) = \sin x \cos h + \cos x \sin h$ for all $x, h \in \mathbb{R}$,
$$\frac{\sin(x + h) - \sin x}{h} = \frac{\sin x(\cos h - 1) + \cos x \sin h}{h}$$
when $h \ne 0$. Hence
$$\lim_{h\to 0} \frac{\sin(x + h) - \sin x}{h} = \sin x \left(\lim_{h\to 0} \frac{\cos h - 1}{h}\right) + \cos x \left(\lim_{h\to 0} \frac{\sin h}{h}\right).$$
Since $\lim_{h\to 0} (\cos h - 1)/h = 0$ and $\lim_{h\to 0} (\sin h)/h = 1$ by Example 3.35, this shows that the derivative of the sine function at $x$ is $\cos x$.

Exercise 4.9. Use the identity $\cos(x + h) = \cos x \cos h - \sin x \sin h$ to prove that the cosine function is differentiable at every $x \in \mathbb{R}$ with derivative $-\sin x$.

The following proposition reveals the reason we drew the graph of a continuous function when we motivated our definition of derivative geometrically (see Figure 4).

Proposition 4.10. If $f$ is differentiable at $a$, then $f$ is continuous at $a$.


Proof. For every $x$ in a deleted neighborhood of $a$,
$$f(x) - f(a) = \left(\frac{f(x) - f(a)}{x - a}\right)(x - a).$$
So, letting $x$ tend to $a$, we obtain
$$\lim_{x\to a} (f(x) - f(a)) = f'(a) \lim_{x\to a} (x - a).$$
Since $\lim_{x\to a} (x - a) = 0$, the continuity of $f$ at $a$ follows from the above equality.

What does Proposition 4.10 say? Proposition 4.10 says that the continuity of a function at some point is a necessary condition for its differentiability at this point. In other words, a function that is not continuous at some point $a$ cannot be differentiable at this point. This is evident if we notice the geometric meaning of the derivative and that of discontinuity.

The converse of Proposition 4.10 is not true. To see this, note that the function $f(x) = |x|$ is continuous and yet not differentiable at 0.

One-Sided Derivatives. As we saw for the function $f(x) = |x|$, the limit used to define the derivative of $f$ at 0 does not exist. Recall, nevertheless, that the associated one-sided limits exist. This motivates us to present the following definition.

Definition 4.11. Suppose $f$ is defined on an interval $[a, b)$. The right derivative of $f$ at $a$, denoted by $f'_+(a)$, is defined to be
$$\lim_{x\to a+} \frac{f(x) - f(a)}{x - a}$$
whenever it exists as a real number. The left derivative of $f$ at $a$, $f'_-(a)$, is defined similarly using the left limit when $f$ is defined on an interval $(c, a]$.

It is clear that with this definition, $f$ is differentiable at $a$ if and only if $f'_+(a)$ and $f'_-(a)$ exist as real numbers and are equal.

Example 4.12. If $f(x) = |x|$, then $f'_+(0) = 1$ and $f'_-(0) = -1$, as we observed before.

From a geometric point of view, the one-sided derivatives of $f$ at $a$ can be used to define the right and left tangent lines at this point. These lines are represented algebraically by the equations
$$y - f(a) = f'_+(a)(x - a) \tag{4.7}$$
and
$$y - f(a) = f'_-(a)(x - a), \tag{4.8}$$
respectively. When $f$ is differentiable at $a$, these lines coincide with the tangent line at $a$. When the graph of $f$ has a corner at $a$, the lines (4.7) and (4.8) are not the same. This is the case for the function $f(x) = |x|$ and the point $a = 0$. In this case, the right and left tangent lines are $y = x$ and $y = -x$, respectively, which coincide with the relevant branches of the graph of $f$. This can be seen in Figure 6.


Figure 6. The graph of the absolute value function.
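The one-sided behaviour of the absolute value function at 0 can also be observed numerically. The sketch below is only an illustration; the helper name `one_sided_quotient` and the chosen step sizes are assumptions, not part of the text.

```python
def one_sided_quotient(f, a, h):
    """Difference quotient (f(a + h) - f(a)) / h.

    A positive h probes the right-hand side of a, a negative h the left-hand side.
    """
    return (f(a + h) - f(a)) / h


steps = (0.1, 0.01, 0.001)
right = [one_sided_quotient(abs, 0.0, h) for h in steps]
left = [one_sided_quotient(abs, 0.0, -h) for h in steps]

print("right-hand quotients:", right)
print("left-hand quotients: ", left)
```

Every right-hand quotient is the slope of the branch $y = x$ and every left-hand quotient is the slope of the branch $y = -x$, so the two families cannot approach a common value. This matches the corner at the origin in Figure 6.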

Example 4.13. Find those values of $a$ and $b$ for which the function
$$f(x) = \begin{cases} x^2 + 1 & x < 0, \\ ax + b & x \ge 0, \end{cases}$$
is differentiable at 0.

Solution. If $f$ is to be differentiable at 0, it should be continuous at this point by Proposition 4.10. Since $f(0+) = b$ and $f(0-) = 1$, we must have $b = 1$. To be sure that $f$ is differentiable at 0, the limits
$$f'_+(0) = \lim_{x\to 0+} \frac{(ax + 1) - (1)}{x - 0} = a$$
and
$$f'_-(0) = \lim_{x\to 0-} \frac{(x^2 + 1) - (1)}{x - 0} = 0$$
must be equal, forcing $a = 0$. In summary, the function $f$ is differentiable at 0 if and only if $a = 0$ and $b = 1$.
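The conclusion of Example 4.13 can be probed with a small experiment. This is a sketch under stated assumptions: the helper names `make_f` and `one_sided_quotients`, the step $h = 10^{-6}$, and the three trial pairs $(a, b)$ are all chosen for illustration only.

```python
def make_f(a, b):
    """The piecewise function of Example 4.13 for given parameters a and b."""
    def f(x):
        return x * x + 1 if x < 0 else a * x + b
    return f


def one_sided_quotients(f, h):
    """Left and right difference quotients of f at 0 with step h > 0."""
    right = (f(h) - f(0.0)) / h
    left = (f(-h) - f(0.0)) / (-h)
    return left, right


for a, b in [(0.0, 1.0), (2.0, 1.0), (0.0, 3.0)]:
    left, right = one_sided_quotients(make_f(a, b), 1e-6)
    print(f"a = {a}, b = {b}:  left = {left:.6g}, right = {right:.6g}")
```

With $a = 0$, $b = 1$ both quotients are close to 0. With $a = 2$, $b = 1$ they settle near 0 and 2, so the one-sided derivatives differ. With $b = 3$ the function is not continuous at 0 and the left quotient becomes very large, reflecting Proposition 4.10.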

Differentiability on Intervals. Just like what we did for continuity, it is convenient to consider differentiability of functions on intervals.

Definition 4.14. Suppose $I$ is an interval. We say that a function $f$ is differentiable on $I$ if the following statements are true.
(1) The function $f$ is differentiable at every interior point of $I$.
(2) If $a$ is a boundary point of $I$ which is also an element of $I$, then $f$ has a one-sided derivative at $a$.

As a matter of convention, when $f$ is differentiable on $I$ and we speak of $f'(x)$ for $x \in I$, the derivative is understood to be one sided at the boundary points. In particular, $f$ is differentiable on $[a, b]$ if (and only if) $f$ is differentiable at every $x \in (a, b)$ and $f'_+(a)$ and $f'_-(b)$ exist.


Example 4.15. Given any $a, b \in \mathbb{R}$ and an interval $I$ which does not contain 0, the function $f$ of Example 4.13 is differentiable on $I$. When 0 is an interior point of $I$, $f$ is differentiable on $I$ if and only if $a = 0$ and $b = 1$.

When $f$ is differentiable on some interval $I$, we can define a function $f'$ on $I$ as follows. The value of $f'$ at each $x \in I$ is simply $f'(x)$. We call $f'$ the derivative of $f$ on $I$.

Example 4.16. It follows from Example 4.8 that the sine function is differentiable on $\mathbb{R}$ with its derivative being the cosine function. Also, Example 4.7 shows that the natural logarithmic function $f(x) = \ln x$ is differentiable on $(0, +\infty)$ with its derivative being the function $f'(x) = 1/x$.

Higher Order Derivatives. When a function $f$ is differentiable on some interval $I$, it is quite natural to ask whether the derivative $f'$ is also differentiable on $I$ or not. This motivates us to think of higher order derivatives of differentiable functions. Higher order derivatives can be defined both at points and on intervals. For example, the pointwise definition of the second order derivative $f''(a)$ requires the assumption that $f$ is differentiable on a neighborhood of $a$. Also, when $f$ is differentiable on some interval $I$ and $f'$ is also differentiable on this interval, we obtain the second order derivative $f''$ of $f$ as a function on $I$. If $f''$ is also differentiable on $I$, we obtain the third order derivative $f^{(3)}$ of $f$ as a function on $I$, and so forth. In general we denote the $n$th order derivative of a function $f$ by $f^{(n)}$, whenever it exists. We also make the convention that $f^{(0)}$ is the function $f$ itself.

Example 4.17. The sine function is differentiable of any order on $\mathbb{R}$. In fact, if $f(x) = \sin x$, then $f^{(2n-1)}(x) = (-1)^{n+1} \cos x$ and $f^{(2n)}(x) = (-1)^n \sin x$ for every $n \in \mathbb{N}$ and every $x \in \mathbb{R}$.

Exercise 4.18. Verify that the cosine function is differentiable of any order on $\mathbb{R}$ and find formulas that determine the higher order derivatives of this function.

Example 4.19. We know that the function $f(x) = \ln x$ is differentiable on $(0, +\infty)$ with $f'(x) = 1/x$. By Example 4.6, $f'$ is differentiable on $(0, +\infty)$ with $f''(x) = -1/x^2$. Continuing in this way and using Example 4.6, we find that $f^{(n)}(x) = (-1)^{n+1} (n - 1)!\, x^{-n}$ for every $n \in \mathbb{N}$ and every $x > 0$.
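The phrase "continuing in this way" in Example 4.19 can be made concrete by tracking only the coefficient of $x^{-n}$. The sketch below is an illustration under assumptions: the helper name `ln_derivative_coefficients` is hypothetical, and each step uses the power rule of Example 4.6 together with the constant-multiple rule (stated as Theorem 4.20(2) in the next section).

```python
from math import factorial


def ln_derivative_coefficients(n_max):
    """Return [c_1, ..., c_{n_max}], where the n-th derivative of ln x is c_n * x**(-n)."""
    coeffs = [1]  # f'(x) = 1/x = 1 * x**(-1), by Example 4.7
    for k in range(1, n_max):
        # Power rule with exponent -k: (c * x**(-k))' = -k * c * x**(-k - 1).
        coeffs.append(-k * coeffs[-1])
    return coeffs


coeffs = ln_derivative_coefficients(6)
closed_form = [(-1) ** (n + 1) * factorial(n - 1) for n in range(1, 7)]

print("step-by-step coefficients:", coeffs)
print("closed-form coefficients: ", closed_form)
```

Both lists agree for the first six derivatives, which is consistent with the formula $f^{(n)}(x) = (-1)^{n+1}(n-1)!\,x^{-n}$. A complete argument would proceed by induction on $n$ along the same lines.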

4.2. The Basic Properties of Derivative

Having seen some examples of derivatives and differentiable functions, it is now time to present those basic properties that govern the operation of differentiation. As usual, the first result in this connection concerns the way the derivative treats the algebraic operations.


Theorem 4.20. Let $f$ and $g$ be differentiable at $a$.
(1) The functions $f + g$ and $f - g$ are differentiable at $a$ and
$$(f+g)'(a) = f'(a) + g'(a), \qquad (f-g)'(a) = f'(a) - g'(a).$$
(2) For every real constant $c$, the function $cf$ is differentiable at $a$ and $(cf)'(a) = c f'(a)$.
(3) The function $fg$ is differentiable at $a$ and
$$(fg)'(a) = f'(a) g(a) + f(a) g'(a).$$
(4) If $g$ is nonzero on its domain, then the function $f/g$ is differentiable at $a$ and
$$\left(\frac{f}{g}\right)'(a) = \frac{f'(a) g(a) - f(a) g'(a)}{(g(a))^2}.$$

Proof. (1) We just prove the statement that concerns $f + g$. This follows easily from the equation
$$\frac{(f+g)(x) - (f+g)(a)}{x-a} = \frac{f(x) - f(a)}{x-a} + \frac{g(x) - g(a)}{x-a},$$
which is valid for every $x \neq a$, by letting $x$ tend to $a$.

(2) Since for every $x \neq a$,
$$\frac{(cf)(x) - (cf)(a)}{x-a} = c \left( \frac{f(x) - f(a)}{x-a} \right),$$
the result follows by letting $x$ tend to $a$.

(3) Given $x \neq a$, we observe that
$$\frac{(fg)(x) - (fg)(a)}{x-a} = \frac{f(x)g(x) - f(a)g(x) + f(a)g(x) - f(a)g(a)}{x-a} = \frac{f(x) - f(a)}{x-a}\, g(x) + f(a)\, \frac{g(x) - g(a)}{x-a}.$$
What we claimed now follows by letting $x$ tend to $a$ and by noticing that $g$ is continuous at $a$.

(4) For every $x \neq a$,
$$\frac{(f/g)(x) - (f/g)(a)}{x-a} = \frac{f(x)g(a) - f(a)g(x)}{(x-a)g(x)g(a)} = \frac{f(x)g(a) - f(a)g(a) + f(a)g(a) - f(a)g(x)}{(x-a)g(x)g(a)} = \left( \frac{f(x) - f(a)}{x-a} \right) \frac{1}{g(x)} - \frac{f(a)}{g(x)g(a)} \left( \frac{g(x) - g(a)}{x-a} \right).$$
The desired result now follows by letting $x$ tend to $a$, using the fact that $g$ is continuous at $a$ and a simple calculation. □
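The product and quotient rules of Theorem 4.20 can be compared against difference quotients. The sketch below is only an illustration under stated assumptions: the functions $f = \sin$ and $g(x) = x^2 + 1$ are choices made here (they do not come from the text), and the helper names are hypothetical.

```python
import math


def central_difference(func, a: float, h: float = 1e-6) -> float:
    """Symmetric difference quotient approximating func'(a)."""
    return (func(a + h) - func(a - h)) / (2.0 * h)


# Illustrative choices (assumptions, not from the text):
# f = sin and g(x) = x^2 + 1, which is nonzero everywhere.
def f(x: float) -> float:
    return math.sin(x)


def df(x: float) -> float:
    return math.cos(x)


def g(x: float) -> float:
    return x * x + 1.0


def dg(x: float) -> float:
    return 2.0 * x


def product_rule(a: float) -> float:
    """Right-hand side of Theorem 4.20(3)."""
    return df(a) * g(a) + f(a) * dg(a)


def quotient_rule(a: float) -> float:
    """Right-hand side of Theorem 4.20(4)."""
    return (df(a) * g(a) - f(a) * dg(a)) / g(a) ** 2


def rules_agree(a: float, tol: float = 1e-6) -> bool:
    """Compare both rules with difference quotients of fg and f/g at a."""
    numeric_product = central_difference(lambda x: f(x) * g(x), a)
    numeric_quotient = central_difference(lambda x: f(x) / g(x), a)
    return (abs(numeric_product - product_rule(a)) < tol
            and abs(numeric_quotient - quotient_rule(a)) < tol)
```

Calling `rules_agree(0.7)` compares both rules at the point $a = 0.7$; agreement within the tolerance is evidence, not a proof.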


A note on Theorem 4.20. Because of the identities $(f+g)' \equiv f' + g'$ and $(cf)' \equiv cf'$, we say that the derivative is a linear operation. This means that the derivative respects the operations of addition and scalar multiplication: the derivative of the sum of two differentiable functions is the sum of their derivatives. Similarly, the derivative of a scalar multiple of a differentiable function is the same scalar multiple of the derivative of that function. Items (3) and (4) of Theorem 4.20 show that the derivative does not behave under multiplication and division as nicely as one may expect. For example, the derivative of a product is not the product of the derivatives.

Example 4.21. Determine the points at which the function $f(x) = \tan x$ is differentiable and find the value of the derivative wherever it exists.

Solution. The function $f$ is defined everywhere in $\mathbb{R}$ except at the points $k\pi + \pi/2$, $k \in \mathbb{Z}$, because these are the points at which the cosine function is equal to 0. If $x$ is not such an element of $\mathbb{R}$, then by Theorem 4.20(4), Example 4.8, and Exercise 4.9,
$$f'(x) = \frac{\cos x \cos x - (-\sin x) \sin x}{\cos^2 x} = 1 + \tan^2 x.$$

Exercise 4.22. Show that the function $g(x) = \cot x$ is differentiable everywhere except at the points $k\pi$, $k \in \mathbb{Z}$. If $x$ is a point at which $g$ is defined, prove that $g'(x) = -(1 + \cot^2 x)$.

Next, we determine the way the derivative behaves in the composition of functions.

Theorem 4.23 (The chain rule). Let $f$ and $g$ be functions defined on intervals $I$ and $J$, respectively, such that $f(I) \subseteq J$. If $a \in I$, $f$ is differentiable at $a$ and $g$ is differentiable at $f(a)$, then the composite function $g \circ f$ is differentiable at $a$ and
$$(g \circ f)'(a) = g'(f(a))\, f'(a). \tag{4.9}$$

Proof. Define a function $G$ on $J$ by
$$G(y) = \begin{cases} \dfrac{g(y) - g(f(a))}{y - f(a)} & y \neq f(a), \\[2ex] g'(f(a)) & y = f(a). \end{cases}$$
Then $G$ is continuous at $f(a)$ because $\lim_{y \to f(a)} G(y) = g'(f(a)) = G(f(a))$. Since $f$ is continuous at $a$, Theorem 3.73 tells us that the composite function $G \circ f$ is continuous at $a$, that is,
$$\lim_{x \to a} G \circ f(x) = G \circ f(a) = g'(f(a)). \tag{4.10}$$
On the other hand, our definition of $G$ implies that for every $x \neq a$
$$\frac{g \circ f(x) - g \circ f(a)}{x - a} = G \circ f(x)\, \frac{f(x) - f(a)}{x - a}, \tag{4.11}$$


no matter whether $f(x)$ is equal to $f(a)$ or not. Now we get (4.9) by letting $x$ tend to $a$ in (4.11), noticing (4.10), and using the assumption that $f$ is differentiable at $a$. □

Using the results established in this section, we are now able to answer question (4.b). In fact, the following example shows that the question has a negative answer.

Example 4.24. Define a function $f$ on $\mathbb{R}$ by $f(x) = x^2 \sin(1/x)$ if $x \neq 0$ and $f(0) = 0$. Since
$$f'(0) = \lim_{x \to 0} \frac{x^2 \sin(1/x)}{x} = \lim_{x \to 0} x \sin\frac{1}{x} = 0$$
by reasoning similar to that of Example 3.23, $f$ is differentiable on $\mathbb{R}$ with
$$f'(x) = \begin{cases} 2x \sin\frac{1}{x} - \cos\frac{1}{x} & x \neq 0, \\ 0 & x = 0. \end{cases}$$
But $f'$ is not continuous on $\mathbb{R}$, because it is not continuous at 0. This is because, by Example 3.22, the function $\cos(1/x)$ does not tend to a limit when $x$ approaches 0.

As our final result in this section, we find a formula for the derivative of inverse functions.

Theorem 4.25. Suppose $f$ is one-to-one and continuous on a neighborhood $I$ of $a$. If $f$ is differentiable at $a$ with $f'(a) \neq 0$, then the inverse function $f^{-1}$ is differentiable at $f(a)$ and
$$(f^{-1})'(f(a)) = \frac{1}{f'(a)}. \tag{4.12}$$

Proof. Since $f'(a)$ exists and is nonzero,
$$\lim_{b \to a} \frac{b - a}{f(b) - f(a)} = \frac{1}{f'(a)}.$$
So given $\varepsilon > 0$ we can find $\delta > 0$ such that $0 < |b - a| < \delta$ and $b \in I$ imply
$$\left| \frac{b - a}{f(b) - f(a)} - \frac{1}{f'(a)} \right| < \varepsilon. \tag{4.13}$$
On the other hand, $f^{-1}$ is continuous on $f(I)$ by Theorem 3.88. Thus, $\gamma > 0$ exists such that $|v - f(a)| < \gamma$ and $v \in f(I)$ imply $|f^{-1}(v) - f^{-1}(f(a))| < \delta$. Therefore, if $v = f(b)$ and $0 < |v - f(a)| < \gamma$, then (4.13) tells us that
$$\left| \frac{f^{-1}(v) - f^{-1}(f(a))}{v - f(a)} - \frac{1}{f'(a)} \right| < \varepsilon,$$
proving (4.12). Note that when $b \neq a$, $f(b) \neq f(a)$, so that (4.13) is meaningful. This is because $f$ is assumed to be one-to-one on $I$. □


What does Theorem 4.25 say? Theorem 4.25 says that the derivative of the inverse of a function $f$ at $f(a)$ is the (multiplicative) inverse of the derivative of $f$ at $a$, provided that the latter derivative is nonzero. Notice that the words “derivative” and “inverse” are interchanged nicely in this interpretation.

Example 4.26. We know that the exponential function is the inverse function of $g(x) = \ln x$, that is, $g^{-1}(y) = e^y$ for every $y \in \mathbb{R}$. In other words, if $y = \ln x$, then $x = e^y$. Hence Theorem 4.25 and Example 4.7 show that
$$(g^{-1})'(y) = \frac{1}{g'(x)} = \frac{1}{1/x} = x = e^y,$$
for every $y \in \mathbb{R}$. This means that the exponential function is differentiable on $\mathbb{R}$ with its derivative being the function itself.

Example 4.27. The function $f(x) = \sin x$ is strictly increasing on $[-\pi/2, \pi/2]$. (If you cannot recall this from calculus, see Example 4.40 below.) Thus, the function is one-to-one and hence invertible on $[-\pi/2, \pi/2]$. The inverse function $f^{-1}(y) = \arcsin y$ is therefore a function from $[-1, 1]$ onto $[-\pi/2, \pi/2]$. We may use the above theorem to find the derivative of $f^{-1}$. If $y = \sin x$ for some $x \in (-\pi/2, \pi/2)$, then $x = \arcsin y$, and hence by Theorem 4.25
$$(f^{-1})'(y) = \frac{1}{f'(x)} = \frac{1}{\cos x}. \tag{4.14}$$
But, since $\sin^2 x + \cos^2 x = 1$ and the cosine function is positive on $(-\pi/2, \pi/2)$,
$$\cos x = \sqrt{1 - \sin^2 x} = \sqrt{1 - y^2}.$$
This, in view of (4.14), shows that
$$(f^{-1})'(y) = \frac{1}{\sqrt{1 - y^2}}$$
for every $y \in (-1, 1)$.
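Formula (4.12) translates directly into a one-line computation. The sketch below is an illustration only: `inverse_derivative` is a hypothetical helper, and it assumes the caller supplies both $f'$ and $f^{-1}$ as Python callables. It reproduces the two computations of Examples 4.26 and 4.27.

```python
import math


def inverse_derivative(df, f_inverse, y: float) -> float:
    """Evaluate (f^{-1})'(y) = 1 / f'(f^{-1}(y)), i.e. (4.12) with y = f(a).

    df        -- the derivative f' (assumed nonzero at f_inverse(y))
    f_inverse -- the inverse function f^{-1}
    """
    return 1.0 / df(f_inverse(y))


def arcsin_derivative(y: float) -> float:
    """Example 4.27: f = sin, f' = cos, f^{-1} = arcsin, for y in (-1, 1)."""
    return inverse_derivative(math.cos, math.asin, y)


def exp_derivative(y: float) -> float:
    """Example 4.26: g = ln, g'(x) = 1/x, g^{-1} = exp."""
    return inverse_derivative(lambda x: 1.0 / x, math.exp, y)
```

For instance, `arcsin_derivative(0.3)` can be compared with `1 / math.sqrt(1 - 0.3**2)`, and `exp_derivative(1.5)` with `math.exp(1.5)`.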

4.3. Local Extrema and Derivative

As we mentioned in the introductory part of this chapter, the concept of derivative can be efficiently applied in finding the local extrema of a function. In this section we present a first manifestation of this fact. To begin with, we need to say what is meant by a local extremum.

Definition 4.28. Let $f$ be a function which is defined in a neighborhood of $a$. We say that $f$ has a local maximum (resp., minimum) at $a$ if for every $x$ in a neighborhood of $a$, $f(x) \le f(a)$ (resp., $f(x) \ge f(a)$). When $f$ has a local maximum or a local minimum at $a$, we say that $f$ has a local extremum at $a$.


When does $f$ fail to have a local extremum at $a$? To answer this question, we should negate both statements that $f$ has a local maximum at $a$ and that $f$ has a local minimum at $a$. Using the fact that negating the existential quantifier gives us the universal quantifier, the answer to our question will be as follows: The function $f$ fails to have a local extremum at $a$ if for every $\varepsilon > 0$, $x_1$ and $x_2$ in the $\varepsilon$-neighborhood of $a$ can be found such that $f(x_1) > f(a)$ and $f(x_2) < f(a)$.

As an important application of derivative, we obtain a necessary condition for local extrema. As we will see shortly, this result is used in the proof of some crucial results in which derivative plays a role.

Theorem 4.29. If $f$ has a local extremum at $a$, then either $f$ is not differentiable at $a$ or it is differentiable at $a$ and $f'(a) = 0$.

Proof. It is enough to show that when $f$ is differentiable at $a$, $f'(a)$ is necessarily equal to 0. So, assume that $f$ is differentiable at $a$ and that $f$ has a local maximum at $a$. The proof is similar when $f$ has a local minimum at $a$. By our assumption there exists $\delta > 0$ such that $f(x) \le f(a)$ for every $x$ with $|x - a| < \delta$. So, if $a < x < a + \delta$, then
$$\frac{f(x) - f(a)}{x - a} \le 0.$$
This implies that
$$\lim_{x \to a^+} \frac{f(x) - f(a)}{x - a} \le 0. \tag{4.15}$$
Similarly,
$$\frac{f(x) - f(a)}{x - a} \ge 0$$
for $a - \delta < x < a$, and this shows that
$$\lim_{x \to a^-} \frac{f(x) - f(a)}{x - a} \ge 0. \tag{4.16}$$
Since $f$ is assumed to be differentiable at $a$, it follows from (4.15) and (4.16) that $f'(a) = 0$. □

What does Theorem 4.29 say? Since $f'(a)$ represents the slope of the tangent line at $(a, f(a))$, Theorem 4.29 says that when $f$ has a local extremum at $a$, then for the graph of $f$ at $(a, f(a))$ exactly one of the following statements is true:
(1) The graph of $f$ does not have a tangent line at $(a, f(a))$.
(2) The graph of $f$ has a horizontal tangent line at $(a, f(a))$.
These are illustrated in Figure 7.


Example 4.30. The converse of Theorem 4.29 is not true. In fact, for the function $f(x) = x^3$, we have $f'(0) = 0$, but $f$ does not have a local extremum at 0. To see this, note that given any $\varepsilon > 0$, the $\varepsilon$-neighborhood of 0 contains $\varepsilon/2$ and $-\varepsilon/2$. But $f(\varepsilon/2) = \varepsilon^3/8 > 0 = f(0)$, so that $f(0)$ cannot be the maximum of $f$ on $(-\varepsilon, \varepsilon)$. Similarly, $f(-\varepsilon/2) = -\varepsilon^3/8 < 0 = f(0)$, showing that $f(0)$ cannot be the minimum of $f$ on $(-\varepsilon, \varepsilon)$.

Figure 7. The function $f$ is not differentiable at $a$, while $g$ is differentiable at $b$ and $g'(b) = 0$.

The Intermediate Value Property for Derivatives. Although we observed as a result of Example 4.24 that the derivative of a differentiable function may fail to be continuous everywhere on its domain, the following theorem shows that derivatives satisfy a certain weaker property.

Theorem 4.31 (The Intermediate Value Property for Derivatives). If $f$ is differentiable on some interval $I$, then $f'$ satisfies the intermediate value property on $I$.

Proof. We show that for arbitrary $a, b \in I$, if $y_0$ lies between $f'(a)$ and $f'(b)$, then $x_0 \in (a, b)$ can be found such that $f'(x_0) = y_0$. For this, we just consider the case $f'(a) < y_0 < f'(b)$. The proof is similar when $f'(b) < y_0 < f'(a)$. Define a function $g$ by $g(x) = f(x) - y_0 x$. Then $g$ is continuous on $[a, b]$ and hence attains the minimum of its values at some $x_0 \in [a, b]$. It can be shown that $x_0 \in (a, b)$. In fact, if we assume that $x_0 = a$, then the assumption that $g(x) \ge g(a)$ for every $x \in (a, b]$ implies that
$$\frac{f(x) - f(a)}{x - a} \ge y_0,$$
from which, by letting $x$ tend to $a$, we obtain the contradiction $f'(a) \ge y_0$. Similarly, by assuming $x_0 = b$, we get the contradiction $f'(b) \le y_0$. Hence, Theorem 4.29 shows that $g'(x_0) = 0$. The equality $f'(x_0) = y_0$ we are interested in follows from $g'(x_0) = 0$. This completes the proof. □


What does Theorem 4.31 say? Theorem 4.31 says that when a function does not satisfy the intermediate value property on some interval I, the function cannot be the derivative of any function on that interval. Thus, Theorem 4.31 allows us to answer question (4.c) in the negative. This is the content of the following example. Example 4.32. The function f , defined by f (x) = −1 for x < 0 and f (x) = 1 when x ≥ 0, does not satisfy the intermediate value property on the interval [−1, 1]. This is because f (−1) = −1 < 0 < 1 = f (1), but there is no x ∈ [−1, 1] such that f (x) = 0. Thus, f cannot be the derivative of any function on [−1, 1]. Further instances of functions which are not the derivative of any function can be found in Exercise 12 at the end of this chapter.

4.4. The Mean Value Theorem: More Applications of Derivative

Perhaps the most important application of Theorem 4.29 is that we can use it to prove the mean value theorem, a central result in differential calculus which can be utilized to deduce the fundamental theorem of calculus in the next chapter (Theorem 5.39).

Theorem 4.33 (The Mean Value Theorem). If $f$ is continuous on $[a, b]$ and differentiable on $(a, b)$, then there exists $c \in (a, b)$ such that
$$f'(c) = \frac{f(b) - f(a)}{b - a}. \tag{4.17}$$

Proof. Define a function $h$ on $[a, b]$ by
$$h(x) = (f(b) - f(a))x - (b - a)f(x).$$
Our assumptions imply that $h$ is continuous on $[a, b]$ and differentiable on $(a, b)$. Also, it can be easily seen that $h(a) = h(b) = af(b) - bf(a)$. By our definition of $h$, to prove that (4.17) holds for some $c \in (a, b)$, it is enough to show that $h'(c) = 0$. If $h$ is a constant function, then $h'(c) = 0$ holds for every $c \in (a, b)$ by Example 4.3. Otherwise, there exists $d \in (a, b)$ such that $h(d) \neq h(a)$. We now consider two cases as follows.
(1) $h(d) > h(a)$. In this case $h(a)$ (and hence $h(b)$) cannot be the maximum of $h$ on $[a, b]$. Hence $h$ attains the maximum of its values on $[a, b]$ at some $c \in (a, b)$, and by Theorem 4.29, $h'(c) = 0$.
(2) $h(d) < h(a)$. Then $h(a)$ (and hence $h(b)$) cannot be the minimum of the values of $h$ on $[a, b]$. Thus $h$ attains its minimum at some $c \in (a, b)$, and Theorem 4.29 says that $h'(c) = 0$. □


What does the mean value theorem say? The mean value theorem reveals a completely natural fact. To see this, recall that $f'(c)$ denotes the slope of the tangent line at $(c, f(c))$, and
$$\frac{f(b) - f(a)}{b - a}$$
is that of the line that passes through the points $(a, f(a))$ and $(b, f(b))$. Thus, the mean value theorem says that when $f$ is continuous on $[a, b]$ and differentiable on $(a, b)$, $c \in (a, b)$ can be found such that the tangent line at $(c, f(c))$ is parallel to the line that passes through the points $(a, f(a))$ and $(b, f(b))$. This is illustrated in Figure 8.

Figure 8. A point c for which the conclusion of the mean value theorem is true.
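The point $c$ of the mean value theorem can be located numerically in simple cases. The following is a minimal sketch under assumptions that go beyond the theorem: it supposes that $f'$ is continuous and that $f'(x) - \frac{f(b)-f(a)}{b-a}$ has opposite signs at the two endpoints, so that plain bisection applies. The name `mean_value_point` and the test function $f(x) = x^3$ on $[0, 2]$ are hypothetical choices made for illustration.

```python
def mean_value_point(f, df, a: float, b: float, tol: float = 1e-12) -> float:
    """Find c in (a, b) with df(c) = (f(b) - f(a)) / (b - a) by bisection.

    Assumes (beyond the theorem itself) that df is continuous and that
    df(x) - slope changes sign between a and b.
    """
    slope = (f(b) - f(a)) / (b - a)

    def gap(c: float) -> float:
        return df(c) - slope

    lo, hi = a, b
    if gap(lo) * gap(hi) > 0:
        raise ValueError("df - slope does not change sign at the endpoints")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if gap(lo) * gap(mid) <= 0:
            hi = mid  # a sign change lies in [lo, mid]
        else:
            lo = mid  # otherwise it lies in [mid, hi]
    return (lo + hi) / 2.0


if __name__ == "__main__":
    # For f(x) = x^3 on [0, 2] the secant slope is 4, so 3c^2 = 4.
    c = mean_value_point(lambda x: x ** 3, lambda x: 3.0 * x * x, 0.0, 2.0)
    print(c)
```

Bisection is used here only because it is short and robust; the theorem itself guarantees existence of $c$ without any of these extra assumptions.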

The remainder of this section is devoted to examples and results that reveal the importance of the mean value theorem. Notice the variety of contexts in which the theorem can be applied.

Example 4.34. If $f$ is differentiable on some interval $I$ and $f'$ is bounded on this interval, prove that $f$ is uniformly continuous on $I$.

Solution. Suppose that $M > 0$ is such that for every $x \in I$, $|f'(x)| \le M$. Given $x, y \in I$ with $x < y$, we may use the mean value theorem to find $c \in (x, y)$ such that
$$f(y) - f(x) = f'(c)(y - x).$$
Taking absolute values of both sides of this equality and using the bound $M$, we obtain
$$|f(y) - f(x)| \le M|y - x|.$$
This shows that $f$ is a Lipschitz and hence uniformly continuous function on $I$. (See Exercise 61 at the end of Chapter 3.)

Example 4.35. Show that the equation $\sin x + x \cos x = 0$ has at least one solution between 0 and $\pi$.

Solution. Define a function $f$ on $[0, \pi]$ by $f(x) = x \sin x$. Then $f$ is continuous on $[0, \pi]$ and differentiable on $(0, \pi)$ with $f'(x) = \sin x + x \cos x$. Since $f(0) =$


$f(\pi) = 0$, the mean value theorem gives us some $c \in (0, \pi)$ such that $f'(c) = \sin c + c \cos c = 0$.

Example 4.36. Prove that for every $x \in \mathbb{R}$, $e^x \ge 1 + x$. Verify that equality holds if and only if $x = 0$.

Solution. It is clear that equality holds for $x = 0$. If $x > 0$, we may apply the mean value theorem to the function $f(x) = e^x$ on the interval $[0, x]$. This gives us some $c \in (0, x)$ such that
$$e^x - e^0 = e^c(x - 0). \tag{4.18}$$
Since $c > 0$ and $f$ is strictly increasing, $e^c > e^0 = 1$. So (4.18) implies $e^x > 1 + x$. A similar reasoning shows that the last inequality is also true for every $x < 0$. We leave the details as an exercise.

As another application of the mean value theorem, we can prove the following converse of Example 4.3.

Proposition 4.37. If $f$ is differentiable on an interval $I$ and $f'(x) = 0$ for every $x \in I$, then $f$ is constant on $I$.

Proof. Assume to the contrary that $f$ is not constant on $I$. Then there exist $x_1, x_2 \in I$, say with $x_1 < x_2$, such that $f(x_1) \neq f(x_2)$. The assumption that $f$ is differentiable on $I$ allows us to apply the mean value theorem on the interval $[x_1, x_2]$ to find $c \in (x_1, x_2) \subset I$ such that
$$f'(c) = \frac{f(x_2) - f(x_1)}{x_2 - x_1} \neq 0.$$
This contradicts our assumption that $f'$ is identically zero on $I$. □

Example 4.38. The conclusion of Proposition 4.37 does not remain true if we replace $I$ with an arbitrary subset of $\mathbb{R}$. To see this consider
$$f(x) = \begin{cases} 1 & 1 < x < 2, \\ 2 & 3 < x < 4. \end{cases}$$
The function $f$ is differentiable on $I := (1, 2) \cup (3, 4)$ with $f'(x) = 0$ for every $x \in I$, but it is not constant on $I$.

Monotonicity in Terms of Derivative. Since monotonicity of functions is defined in terms of order, it is not easy to believe that it can be determined using the concept of derivative. The following theorem shows, nevertheless, that derivative can be fruitfully applied to determine monotone functions.

Theorem 4.39. Let $f$ be differentiable on some interval $I$.
(1) If $f'(x) \ge 0$ (resp., $f'(x) > 0$) for every interior point $x$ of $I$, then $f$ is increasing (resp., strictly increasing) on $I$.
(2) If $f'(x) \le 0$ (resp., $f'(x) < 0$) for every interior point $x$ of $I$, then $f$ is decreasing (resp., strictly decreasing) on $I$.


Proof. We only prove (1) because then (2) can be proved similarly. First assume that $f'(x) \ge 0$ for every interior point $x$ of $I$. If $x_1, x_2 \in I$ are such that $x_1 < x_2$, then $f$ is continuous on $[x_1, x_2]$ and differentiable on $(x_1, x_2)$. So, by the mean value theorem, $c \in (x_1, x_2)$ can be found such that
$$f(x_2) - f(x_1) = f'(c)(x_2 - x_1). \tag{4.19}$$
Since $c$ is indeed an interior point of $I$, our nonnegativity assumption shows $f'(c) \ge 0$. Thus, it follows from (4.19) that $f(x_2) \ge f(x_1)$. This proves that $f$ is increasing on $I$. If $x_1$, $x_2$ and $c$ are as above and we assume that $f'(x) > 0$ for every interior point $x$ of $I$, then (4.19) yields that $f(x_2) > f(x_1)$, that is, $f$ is strictly increasing on $I$. □

Example 4.40. Since for every $x \in (-\pi/2, \pi/2)$, the derivative of the sine function at $x$ is $\cos x > 0$, this function is strictly increasing on $[-\pi/2, \pi/2]$.

Next we prove an inequality which will be used in the proof of Weierstrass’s approximation theorem (Theorem 9.17).

Example 4.41. Prove that for every $n \in \mathbb{N}$ and every $x \in [0, 1]$,
$$(1 - x^2)^n \ge 1 - nx^2.$$

Solution. The inequality becomes an equality if $n = 1$. So, assume that $n$ is a fixed natural number greater than 1. Consider $f(x) = (1 - x^2)^n - 1 + nx^2$ as a function on $[0, 1]$. For every $x \in (0, 1)$,
$$f'(x) = 2nx\left(1 - (1 - x^2)^{n-1}\right) > 0.$$
Hence, Theorem 4.39 shows that $f$ is strictly increasing on $[0, 1]$. This implies that for every $x \in (0, 1]$, $f(x) > f(0) = 0$, from which we find
$$(1 - x^2)^n > 1 - nx^2 \quad \text{if } 0 < x \le 1.$$
Since the above inequality becomes an equality if we let $x = 0$, we have proved our desired result.

As is shown in Example 4.30, Theorem 4.29 presents only a necessary condition for local extrema. The following theorem presents a sufficient condition for the occurrence of local maxima. We leave it to the reader to state and prove a similar assertion for local minima.

Theorem 4.42. Let $f$ be defined at $a$ and differentiable in a deleted neighborhood of $a$. If $f'(x) \ge 0$ for every $x < a$ and $f'(x) \le 0$ for every $x > a$ in this neighborhood, then $f$ has a local maximum at $a$.

Proof. Let $\delta$ be the radius of the aforementioned deleted neighborhood. We claim that for every $x \in (a - \delta, a + \delta)$, $f(x) \le f(a)$. To see this, consider an arbitrary $x_0 \neq a$ in $(a - \delta, a + \delta)$. We have two cases.


(1) $x_0 < a$. Since we assumed $f'(x) \ge 0$ for every $x < a$, it follows from Theorem 4.39 that $f$ is increasing on $(a - \delta, a]$. Thus $x_0 < a$ implies $f(x_0) \le f(a)$.
(2) $x_0 > a$. Since we assumed $f'(x) \le 0$ for every $x > a$, Theorem 4.39 tells us that $f$ is decreasing on $[a, a + \delta)$. So the assumption $x_0 > a$ yields $f(x_0) \le f(a)$. □

Exercise 4.43. Find the local extrema of the function $f(x) = x^2(1 - x)^2$.

Cauchy’s Mean Value Theorem. Now that we have learned the mean value theorem, its geometric interpretation and some applications, it is time to present a more general result which is usually known as Cauchy’s mean value theorem.

Theorem 4.44 (Cauchy’s Mean Value Theorem). Let $f$ and $g$ be continuous on $[a, b]$ and differentiable on $(a, b)$. Then $c \in (a, b)$ can be found such that
$$(f(b) - f(a))\, g'(c) = (g(b) - g(a))\, f'(c).$$

Note that the mean value theorem is the special case of Theorem 4.44 with $g(x) = x$. This is the reason we just present a hint for the proof.

Proof of Theorem 4.44. Define a function $h$ by
$$h(x) = (g(b) - g(a))f(x) - (f(b) - f(a))g(x),$$
and proceed as in the proof of the mean value theorem. The details are left as an exercise. □

As an important consequence of Theorem 4.44, we prove l’Hôpital’s rule for the calculation of indeterminate limits of the form $\frac{0}{0}$.

Theorem 4.45 (l’Hôpital’s Rule for the Indeterminate Form $\frac{0}{0}$). Let $f$ and $g$ be continuous on a neighborhood $I$ of $a$, and let $f$ and $g$ be differentiable on $I \setminus \{a\}$ with $g'$ being nonzero on $I \setminus \{a\}$. If $g$ is nonzero on $I \setminus \{a\}$,
$$\lim_{x \to a} f(x) = \lim_{x \to a} g(x) = 0$$
and
$$\lim_{x \to a} \frac{f'(x)}{g'(x)} = L, \tag{4.20}$$
then
$$\lim_{x \to a} \frac{f(x)}{g(x)} = L. \tag{4.21}$$

Proof. We consider three cases for $L$, as follows.

(1) $L$ is a real number. To prove the result in this case, let $\varepsilon > 0$ be given. By (4.20), $\delta > 0$ exists such that $(a - \delta, a + \delta) \subseteq I$ and for every $x \neq a$ in $(a - \delta, a + \delta)$,
$$\left| \frac{f'(x)}{g'(x)} - L \right| < \varepsilon. \tag{4.22}$$
Notice that the quotient in (4.22) is meaningful by the assumption that $g'$ is nonzero on $I \setminus \{a\}$. For every $z \in (a, a + \delta)$, we may apply Cauchy’s mean value theorem to $f$ and $g$ on $[a, z]$ to find some $c \in (a, z)$ such that
$$(f(z) - f(a))\, g'(c) = (g(z) - g(a))\, f'(c).$$


By our assumptions, $f(a) = g(a) = 0$ and so
$$\frac{f(z)}{g(z)} = \frac{f'(c)}{g'(c)}.$$
Since $c \in (a, z) \subset (a - \delta, a + \delta)$, it follows from (4.22) that
$$\left| \frac{f(z)}{g(z)} - L \right| < \varepsilon.$$
This shows that
$$\lim_{x \to a^+} \frac{f(x)}{g(x)} = L.$$
Since a similar argument shows that
$$\lim_{x \to a^-} \frac{f(x)}{g(x)} = L,$$
(4.21) follows in this case.

(2) $L = +\infty$. To prove (4.21) in this case, let $M > 0$ be given. By (4.20), $\delta > 0$ can be found such that $(a - \delta, a + \delta) \subseteq I$ and for every $x$ with $0 < |x - a| < \delta$,
$$\frac{f'(x)}{g'(x)} > M. \tag{4.23}$$
For every $y$ with $a < y < a + \delta$, applying Cauchy’s mean value theorem to $f$ and $g$ on the interval $[a, y]$ we find $c \in (a, y)$ such that
$$\frac{f(y)}{g(y)} = \frac{f'(c)}{g'(c)}.$$
Since $c \in (a, y) \subset (a - \delta, a + \delta)$, (4.23) yields that
$$\frac{f(y)}{g(y)} > M.$$
This means that
$$\lim_{x \to a^+} \frac{f(x)}{g(x)} = +\infty.$$
Since we can prove that
$$\lim_{x \to a^-} \frac{f(x)}{g(x)} = +\infty$$
by a similar argument, (4.21) follows.

(3) $L = -\infty$. In this case, the proof is similar to the case $L = +\infty$. □



Example 4.46. Find the value of
$$\lim_{x \to \frac{\pi}{2}} \frac{\sin(\cos x)}{x - \frac{\pi}{2}}.$$

Solution. The given limit is clearly of the indeterminate form $\frac{0}{0}$. Using l’Hôpital’s rule and the chain rule, we obtain
$$\lim_{x \to \frac{\pi}{2}} \frac{\sin(\cos x)}{x - \frac{\pi}{2}} = \lim_{x \to \frac{\pi}{2}} \frac{-\sin x \cos(\cos x)}{1} = -1.$$
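The value obtained in Example 4.46 can be cross-checked by evaluating the original quotient at points approaching $\pi/2$. This is a small numerical sketch rather than a proof; the function names `quotient`, `derivative_quotient`, and `sample_values` are hypothetical, and floating-point rounding limits how close to $\pi/2$ the samples can usefully be taken.

```python
import math


def quotient(x: float) -> float:
    """The original quotient sin(cos x) / (x - pi/2), defined for x != pi/2."""
    return math.sin(math.cos(x)) / (x - math.pi / 2.0)


def derivative_quotient(x: float) -> float:
    """The quotient of derivatives from l'Hopital's rule: -sin x cos(cos x) / 1."""
    return -math.sin(x) * math.cos(math.cos(x))


def sample_values(steps: int = 5):
    """Pairs (h, quotient(pi/2 + h)) for h = 10^-1, ..., 10^-steps."""
    return [(10.0 ** (-k), quotient(math.pi / 2.0 + 10.0 ** (-k)))
            for k in range(1, steps + 1)]


if __name__ == "__main__":
    for h, value in sample_values():
        print(h, value)
    print("quotient of derivatives at pi/2:", derivative_quotient(math.pi / 2.0))
```

The sampled values should drift toward the value of `derivative_quotient` at $\pi/2$, in line with the conclusion of the rule.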


4.5. Taylor Series: A First Glance

Our aim in this section is to answer questions (4.d) and (4.e) which, as we will see shortly, are closely related to each other. So, let us begin with question (4.d), which asks about the differentiability of functions that are defined in terms of power series. To answer this question, let us note that we may consider a power series
$$\sum_{n=0}^{\infty} a_n (x - a)^n, \tag{4.24}$$
informally, as a polynomial with infinitely many terms:
$$a_0 + a_1(x - a) + a_2(x - a)^2 + \cdots + a_n(x - a)^n + \cdots. \tag{4.25}$$
We wish to be able to differentiate this just like we compute the derivative of a polynomial. More precisely, if $f(x)$ denotes the sum of (4.24) or (4.25) for every $x$ in some interval centered at $a$, then we hope to see that
$$f'(x) = a_1 + 2a_2(x - a) + \cdots + na_n(x - a)^{n-1} + \cdots$$
or
$$f'(x) = \sum_{n=1}^{\infty} n a_n (x - a)^{n-1}.$$
But this seems to be just a dream, because a power series is not actually a polynomial with infinitely many terms. The good news is that this dream can indeed be made into reality. This follows from the following theorem, in which we let $a = 0$ for the sake of simplicity.

Theorem 4.47. Suppose that for every $x \in (-R, R)$,
$$f(x) = \sum_{n=0}^{\infty} a_n x^n,$$
where $0 < R \le \infty$ is the radius of convergence of the power series. Then, $f$ is differentiable on $(-R, R)$ and
$$f'(x) = \sum_{n=1}^{\infty} n a_n x^{n-1} \tag{4.26}$$
for every $x$ in this interval.

To prove this theorem, we first establish a useful lemma.

Lemma 4.48. The power series $\sum_{n=0}^{\infty} a_n x^n$ and $\sum_{n=1}^{\infty} n a_n x^{n-1}$ have the same radius of convergence.

Proof. Suppose $R > 0$ is the radius of convergence of $\sum_{n=0}^{\infty} a_n x^n$. To prove the desired result, we show that the following are true.
(1) If $|x| < R$, then $\sum_{n=1}^{\infty} n a_n x^{n-1}$ converges absolutely.
(2) If $|x| > R$, then $\sum_{n=1}^{\infty} n a_n x^{n-1}$ diverges.


To prove (1), consider some $x \neq 0$ with $|x| < R$, and let
$$x_0 = \frac{R + |x|}{2}.$$
Then $|x| < x_0 < R$ and hence
$$0 < r := \frac{|x|}{x_0} < 1.$$
Our choice of $x_0$ and $R$ shows that the series $\sum_{n=0}^{\infty} a_n x_0^n$ converges. Hence
$$\lim_{n \to \infty} a_n x_0^n = 0,$$
and we can therefore find $N \in \mathbb{N}$ such that $|a_n x_0^n| < 1$ for every $n \ge N$. This implies that for all such $n$,
$$|n a_n x^n| = |n a_n (r x_0)^n| < n r^n. \tag{4.27}$$
Since $0 < r < 1$ and
$$\lim_{n \to \infty} \left| \frac{(n+1) r^{n+1}}{n r^n} \right| = \lim_{n \to \infty} \frac{r(n+1)}{n} = r,$$
the ratio test shows that the series $\sum_{n=1}^{\infty} n r^n$ converges absolutely. Finally, it follows from (4.27) and the comparison test (Theorem 2.92) that the series
$$\sum_{n=1}^{\infty} n a_n x^n, \quad \text{and hence} \quad \sum_{n=1}^{\infty} n a_n x^{n-1},$$
converge absolutely. This completes the proof of (1).

To prove (2), note that for every $n \in \mathbb{N}$,
$$|a_n x^n| \le |n a_n x^n|. \tag{4.28}$$
If we assume that $\sum_{n=1}^{\infty} |n a_n x^{n-1}|$ converges for some $x$ with $|x| > R$, then the series $\sum_{n=1}^{\infty} a_n x^n$ converges absolutely by (4.28) and the comparison test. Since $|x| > R$, this contradicts the assumption that $R$ is the radius of convergence for the series $\sum_{n=0}^{\infty} a_n x^n$ and completes the proof of (2). □

What does Lemma 4.48 say? Lemma 4.48 says that the term-by-term differentiation of a power series does not change its radius of convergence.

We are now ready to prove Theorem 4.47.


Proof of Theorem 4.47. The above lemma shows that the power series on the right-hand side of (4.26) has $R$ as its radius of convergence. Thus, we may define a function $g$ on $(-R, R)$ by
$$g(x) = \sum_{n=1}^{\infty} n a_n x^{n-1}.$$
To complete the proof, we should show that $f'(x) = g(x)$ for every $x \in (-R, R)$. To see this, let $x$ be such that $|x| < R$ and choose some $r$ between $|x|$ and $R$. Then for every $t \neq x$ satisfying $|t| < r$,
$$f(t) - f(x) = \sum_{n=1}^{\infty} a_n (t^n - x^n)$$
and hence
$$\frac{f(t) - f(x)}{t - x} - g(x) = \sum_{n=2}^{\infty} a_n \left( \frac{t^n - x^n}{t - x} - n x^{n-1} \right). \tag{4.29}$$
We leave it to the reader to verify that
$$\left| \frac{t^n - x^n}{t - x} - n x^{n-1} \right| \le |t - x|\, \frac{n(n-1)}{2}\, r^{n-2} \tag{4.30}$$
for all $n \ge 2$. Then using (4.29) and (4.30), we can write that
$$\left| \frac{f(t) - f(x)}{t - x} - g(x) \right| \le |t - x| \sum_{n=2}^{\infty} |a_n|\, n(n-1)\, r^{n-2}. \tag{4.31}$$
Since $\sum_{n=2}^{\infty} n(n-1) a_n x^{n-2}$ has $R$ as its radius of convergence by the above lemma, we find that the series on the right-hand side of (4.31) converges to a nonnegative real number. That $f'(x) = g(x)$ for the given $x$ now follows from (4.31) by letting $t$ tend to $x$. □

Example 4.49. Determine the points at which the power series
$$\sum_{n=0}^{\infty} \frac{x^n}{n!}$$
converges. Then differentiate the series using Theorem 4.47.

Solution. If $x$ is any nonzero real number, then
$$L_x = \lim_{n \to \infty} \left| \frac{x^{n+1}/(n+1)!}{x^n/n!} \right| = \lim_{n \to \infty} \frac{|x|}{n+1} = 0.$$
Thus, the ratio test shows that the given series converges for every $x \in \mathbb{R}$. If we denote the sum of the given series by a function $f$, then Theorem 4.47 shows that for every $x \in \mathbb{R}$,
$$f'(x) = \sum_{n=1}^{\infty} n\, \frac{x^{n-1}}{n!} = \sum_{n=1}^{\infty} \frac{x^{n-1}}{(n-1)!} = \sum_{n=0}^{\infty} \frac{x^n}{n!} = f(x).$$
Thus, the function $f$ coincides with its derivative. We will see in the next section that $f(x) = e^x$ for every $x \in \mathbb{R}$.
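The index shift in Example 4.49 can be observed on partial sums. The sketch below is illustrative only: `partial_sum` and `differentiated_partial_sum` are hypothetical helper names, and the comparison with `math.exp` relies on the identity $f(x) = e^x$ that the text announces for the next section.

```python
import math


def partial_sum(x: float, terms: int) -> float:
    """Sum of x^n / n! for n = 0, ..., terms - 1."""
    return sum(x ** n / math.factorial(n) for n in range(terms))


def differentiated_partial_sum(x: float, terms: int) -> float:
    """Term-by-term derivative of the same partial sum:
    sum of n x^(n-1) / n! for n = 1, ..., terms - 1."""
    return sum(n * x ** (n - 1) / math.factorial(n) for n in range(1, terms))


if __name__ == "__main__":
    x = 1.5
    # Differentiating a partial sum with N + 1 terms reproduces the one with N terms.
    print(differentiated_partial_sum(x, 31), partial_sum(x, 30), math.exp(x))
```

Differentiating the partial sum with $N + 1$ terms gives back the partial sum with $N$ terms, which is the finite version of the computation $f' = f$ above.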


As a result of Theorem 4.47 we are now able to answer question (4.e). In fact, assuming that a function $f$ has a power series expansion
$$f(x) = \sum_{n=0}^{\infty} a_n (x - a)^n \tag{4.32}$$
on some interval centered at $a$, we use consecutive differentiation of $f$ to find the coefficients $a_n$ as follows. Since
$$f'(x) = a_1 + 2a_2(x - a) + 3a_3(x - a)^2 + 4a_4(x - a)^3 + \cdots, \tag{4.33}$$
we have
$$a_1 = f'(a). \tag{4.34}$$
Now, differentiating (4.33) we obtain
$$f''(x) = 2a_2 + 6a_3(x - a) + 12a_4(x - a)^2 + \cdots, \tag{4.35}$$
and hence
$$a_2 = \frac{f''(a)}{2}. \tag{4.36}$$
Next, differentiating (4.35), we find that
$$f^{(3)}(x) = 6a_3 + 24a_4(x - a) + \cdots,$$
and therefore
$$a_3 = \frac{f^{(3)}(a)}{6}. \tag{4.37}$$
Noticing (4.34), (4.36), and (4.37), we see that by continuing the differentiation process,
$$a_n = \frac{f^{(n)}(a)}{n!} \tag{4.38}$$
for every $n \in \mathbb{N} \cup \{0\}$. In other words, if we assume that a function $f$ is expanded as a power series centered at $a$ as in (4.32), then the coefficients $a_n$ are necessarily as in (4.38). So, in this case, we can rewrite (4.32) as
$$f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!} (x - a)^n. \tag{4.39}$$
We call the series on the right-hand side of (4.39) the Taylor series of $f$ centered at $a$. If $a = 0$, we call the Taylor series the Maclaurin series of $f$.

Important notice on Taylor series. Our above argument shows that when a function $f$ is expanded in a power series centered at $a$, then $f$ is differentiable of any order at $a$ and the power series is necessarily the Taylor series of $f$ centered at $a$. The following example shows that the converse of this assertion is not true. More precisely, a function $f$ may be differentiable of any order at $a$, but the Taylor series of $f$ centered at $a$ may nevertheless fail to converge to $f$ in any interval centered at $a$.


Example 4.50. Consider the function
$$f(x) = \begin{cases} e^{-1/x^2} & x \neq 0, \\ 0 & x = 0. \end{cases}$$
We leave it to the reader to verify using induction on $n$ that $f^{(n)}(0)$ exists and is equal to 0 for every $n \in \mathbb{N}$ (see Exercise 4 at the end of this chapter). The Maclaurin series of $f$ therefore converges to the constant function $g \equiv 0$ on $\mathbb{R}$, which is obviously not equal to $f$.

It is therefore an important task to determine those functions which are the sum of their Taylor series. This is what we do in the next section, as a result of Taylor’s theorem.
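The behavior of the function in Example 4.50 near 0 can be inspected numerically. The sketch below is an illustration, not the inductive proof requested in the text: the names `flat` and `scaled` are hypothetical, and the code only displays how quickly $f(x)/x^n$ shrinks as $x \to 0$, which is the kind of decay that makes every derivative at 0 vanish.

```python
import math


def flat(x: float) -> float:
    """The function of Example 4.50: exp(-1/x^2) for x != 0 and 0 at x = 0."""
    square = x * x
    if square == 0.0:
        # Covers x = 0 and inputs so small that x*x underflows to zero.
        return 0.0
    return math.exp(-1.0 / square)


def scaled(x: float, n: int) -> float:
    """The ratio f(x) / x^n for x != 0."""
    return flat(x) / x ** n


if __name__ == "__main__":
    for x in (0.5, 0.2, 0.1):
        print(x, flat(x), scaled(x, 10))
```

Every partial sum of the Maclaurin series is identically 0, while `flat(0.5)` is clearly positive, so the series cannot represent the function on any interval around 0.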

4.6. Taylor’s Theorem and the Convergence of Taylor Series

Taylor’s theorem, which is the main result of this section, appears naturally as an extended version of the mean value theorem. To motivate it, we think again about what follows from the mean value theorem. Suppose $I$ is an interval that contains $a$ as an interior point. If a function $f$ is differentiable on $I$ and $x \neq a$ is an element of $I$, then the mean value theorem gives us some $t$ between $a$ and $x$ such that
$$\frac{f(x) - f(a)}{x - a} = f'(t)$$
or, equivalently,
$$f(x) = f(a) + f'(t)(x - a).$$
Since $f'(t)$ represents the slope of the tangent line to the graph of $f$ at $(t, f(t))$, we may replace it by $f'(a)$, the slope of the tangent line to the graph of $f$ at $(a, f(a))$, to obtain an approximation of $f(x)$:
$$f(x) \approx f(a) + f'(a)(x - a). \tag{4.40}$$
Note that (4.40) has interesting interpretations as follows.
(1) The right-hand side of (4.40) is the tangent line approximation of $f(x)$. To see this, note that the tangent line of the graph of $f$ at $(a, f(a))$ has the equation $T(x) = f(a) + f'(a)(x - a)$. The approximation (4.40) just says that the value of $f$ at $x$ is almost equal to that of $T$ at $x$. This follows from our geometric understanding of the tangent line at $(a, f(a))$: it is the line that passes through $(a, f(a))$ and is very close to the graph of $f$ near this point.
(2) We may interpret (4.40) another way by noticing that the right-hand side of (4.40) is a polynomial of degree 1. Thus, (4.40) gives us an approximation of $f$ by a polynomial of degree 1.
The second interpretation of (4.40) leads to a natural question: Is it possible to approximate $f$ by polynomials of degree greater than 1? The following theorem, which is due to Brook Taylor, answers this question.


Theorem 4.51 (Taylor’s Theorem). Suppose f is defined on (c, b), a ∈ (c, b) and n is a fixed natural number. If f is n-times differentiable on (c, b) and x is an arbitrary element of (c, b) other than a, then a point t between a and x can be found such that (4.41)

f (x) =

n−1  k=0

f (k) (a) f (n) (t) (x − a)k + (x − a)n . k! n!

A note on Taylor's theorem. Before proceeding to the proof of Taylor's theorem, it is convenient to take a closer look at it.

• The equation (4.41) represents $f$ as
\[
\sum_{k=0}^{n-1} \frac{f^{(k)}(a)}{k!}(x - a)^k,
\]
which is a polynomial of degree $n - 1$, plus a remainder
\[
\frac{f^{(n)}(t)}{n!}(x - a)^n.
\]

• The case $n = 1$ of the theorem is just the mean value theorem. This is because when $n = 1$, the summation in (4.41) reduces to $f(a)$ and the remainder becomes $f'(t)(x - a)$.

Note that we may have $c = -\infty$ or $b = +\infty$ in Taylor's theorem.

Proof of Theorem 4.51. Let $P_n$ be the polynomial defined by
\[
P_n(y) = \sum_{k=0}^{n-1} \frac{f^{(k)}(a)}{k!}(y - a)^k,
\]
and let $\alpha$ be such that
\[
f(x) = P_n(x) + \alpha(x - a)^n. \tag{4.42}
\]
Define a function $g$ on $(c, b)$ by
\[
g(y) = f(y) - P_n(y) - \alpha(y - a)^n. \tag{4.43}
\]
Since $P_n$ is a polynomial of degree $n - 1$, differentiating both sides of (4.43) $n$ times, we find that for every $y \in (c, b)$,
\[
g^{(n)}(y) = f^{(n)}(y) - n!\,\alpha. \tag{4.44}
\]

Note that to prove (4.41) we need to show that $\alpha = f^{(n)}(t)/n!$ for some $t \in (a, x)$, if we assume $x > a$ of course. Thus, (4.44) tells us that to complete the proof, we just need to show that for some $t \in (a, x)$, $g^{(n)}(t) = 0$.

To see this, notice that $f^{(k)}(a) = P_n^{(k)}(a)$ for every $k \in \{0, \dots, n - 1\}$ and hence $g^{(k)}(a) = 0$ by (4.43). Since $g(x) = 0$ by our choice of $\alpha$, the mean value theorem gives us some $t_1 \in (a, x)$ such that $g'(t_1) = 0$. Now applying the mean value theorem to $g'$ on $[a, t_1]$, we find $t_2 \in (a, t_1)$ such that $g''(t_2) = 0$. Having found $t_1, \dots, t_{n-1}$

4.6. Taylor’s Theorem and the Convergence of Taylor Series


in this way, we find by the mean value theorem some $t_n \in (a, t_{n-1}) \subset (a, x)$ such that $g^{(n)}(t_n) = 0$. This completes the proof.

Let us call the polynomial
\[
P_n(x) = \sum_{k=0}^{n-1} \frac{f^{(k)}(a)}{k!}(x - a)^k
\]
the $n$th Taylor polynomial associated to $f$ at $a$, and denote by $R_n(x)$ the remainder
\[
f(x) - P_n(x) = \frac{f^{(n)}(t)}{n!}(x - a)^n.
\]
Formula (4.41), which can be written as
\[
f(x) = P_n(x) + R_n(x), \tag{4.45}
\]
shows that we may approximate $f$ by a polynomial $P_n$ of degree $n - 1$. The following example gives us some insight into this kind of approximation.

Example 4.52. Find the first three Taylor polynomials $P_1$, $P_2$, and $P_3$ associated to the function $f(x) = e^x$ at $a = 0$.

Solution. Since $f^{(n)}(0) = e^0 = 1$ for every $n \in \mathbb{N}$, the $n$th Taylor polynomial associated to $f$ at 0 becomes
\[
P_n(x) = \sum_{k=0}^{n-1} \frac{x^k}{k!} = 1 + x + \frac{x^2}{2!} + \cdots + \frac{x^{n-1}}{(n-1)!}. \tag{4.46}
\]
Consequently,
\[
P_1(x) = 1, \qquad P_2(x) = 1 + x, \qquad P_3(x) = 1 + x + \frac{x^2}{2}.
\]
It is clear that the polynomial $P_1$ gives a very bad approximation of $f$. The graphs of the next two Taylor polynomials are drawn in Figure 9 together with the graph of $f$. As can be seen in Figure 9, $P_3$ provides a better approximation of $f$ near the point $(0, 1)$ than that obtained from $P_2$. If you draw the graph of $P_4$, you see that it approximates $f$ better than $P_3$ near $(0, 1)$. This suggests that

Figure 9. The Taylor polynomials P2 and P3 associated with the exponential function.


the approximation provided by the polynomials $P_n$ becomes better and better as $n$ tends to infinity. If we can prove that this is actually the case, then it follows that
\[
\lim_{n\to\infty} R_n(x) = 0 \tag{4.47}
\]
for every $x$, and hence by (4.45) and (4.46),
\[
e^x = \lim_{n\to\infty} P_n(x) = \sum_{k=0}^{\infty} \frac{x^k}{k!}. \tag{4.48}
\]
But is (4.47) actually true? Fortunately, the answer is yes. To see this note that for each $n \in \mathbb{N}$, $R_n(x)$ can be written in the form
\[
R_n(x) = \frac{f^{(n)}(t)}{n!}\,x^n
\]
for some $t$ between 0 and $x$. Since $f^{(n)}(t) = e^t$, we obtain
\[
R_n(x) = \frac{e^t}{n!}\,x^n,
\]
and (4.47) follows easily by letting $n$ tend to infinity in this equation. Although $t$ depends on $n$, it always lies between 0 and $x$, so $e^t \le e^{|x|}$; the claim therefore follows because the sequence $\{x^n/n!\}$ converges to 0 for every $x \in \mathbb{R}$. Note that this follows from Example 4.49, where we observed that the series $\sum_{n=0}^{\infty} \frac{x^n}{n!}$ converges for every $x \in \mathbb{R}$.

As we saw in the above example, the fact that the remainder sequence $\{R_n(x)\}$ converges to 0 for every $x$ allows us to expand the exponential function in terms of its Taylor series as in (4.48). The following theorem shows that this is true in general. Note that the theorem provides an answer to question (4.f).
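The convergence in (4.47) can also be observed numerically. The short Python sketch below is our own illustration, with hypothetical helper names. It compares $|e^x - P_n(x)|$ with the bound $e^{|x|}|x|^n/n!$, which dominates $|R_n(x)|$ because the point $t$ lies between 0 and $x$. Note that $P_n$ has $n$ terms (powers $0$ through $n-1$), following the book's indexing.

```python
import math

def taylor_poly_exp(x, n):
    """P_n(x) = sum_{k=0}^{n-1} x^k / k!, the nth Taylor polynomial of exp at 0."""
    return sum(x**k / math.factorial(k) for k in range(n))

def remainder_bound(x, n):
    """e^{|x|} |x|^n / n!, an upper bound for |R_n(x)| since e^t <= e^{|x|}."""
    return math.exp(abs(x)) * abs(x)**n / math.factorial(n)

x = 2.0
for n in (1, 2, 3, 5, 10, 20):
    error = abs(math.exp(x) - taylor_poly_exp(x, n))
    print(f"n={n:2d}  |R_n(x)|={error:.3e}  bound={remainder_bound(x, n):.3e}")
```

Both columns shrink rapidly once $n$ exceeds $|x|$, which is the behavior of $x^n/n!$ that the argument relies on.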

Theorem 4.53. Suppose $f$ is differentiable of any order on an interval $I$ centered at $a$. Then, the Taylor series
\[
\sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!}(x - a)^n \tag{4.49}
\]
converges to $f$ on $I$ if and only if
\[
\lim_{n\to\infty} R_n(x) = 0 \tag{4.50}
\]
for every $x \in I$.

Proof. For every $n \in \mathbb{N}$ and every $x \in I$,
\[
f(x) = P_n(x) + R_n(x), \tag{4.51}
\]
where $P_n$ is the $n$th Taylor polynomial associated to $f$. Since
\[
P_n(x) = \sum_{k=0}^{n-1} \frac{f^{(k)}(a)}{k!}(x - a)^k,
\]
we find that $f(x)$ is the sum of (4.49) if and only if
\[
f(x) = \lim_{n\to\infty} P_n(x)
\]
for every $x \in I$. But (4.51) shows that this holds if and only if (4.50) is true. This completes the proof.


Example 4.54. Prove that for every $x \in \mathbb{R}$,
\[
\sin x = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n+1}}{(2n+1)!}. \tag{4.52}
\]

Solution. If we let $f(x) = \sin x$, then Example 4.17 shows that
\[
f^{(2n-1)}(0) = (-1)^{n+1}, \qquad f^{(2n)}(0) = 0
\]
for every $n \in \mathbb{N}$. Therefore, the Maclaurin series of the sine function takes the form
\[
\sum_{n=0}^{\infty} \frac{f^{(n)}(0)}{n!}\,x^n = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots,
\]
which is the same as the series on the right-hand side of (4.52). Thus, the question asks us to prove that the Maclaurin series of the sine function converges to it on $\mathbb{R}$. To prove this, we use Taylor's theorem. For every nonzero $x \in \mathbb{R}$ and every $n \in \mathbb{N}$,
\[
\sin x = \sum_{k=0}^{n-1} \frac{f^{(k)}(0)}{k!}\,x^k + R_n(x),
\]
where
\[
R_n(x) = \frac{f^{(n)}(t)}{n!}\,x^n
\]
for some $t$ between 0 and $x$. Every derivative of the sine function is $\pm\sin$ or $\pm\cos$, so $\{f^{(n)}(t)\}$ is a bounded sequence, and that $\{R_n(x)\}$ converges to 0 follows from
\[
\lim_{n\to\infty} \frac{x^n}{n!} = 0,
\]
which, as we mentioned before, is valid for every $x \in \mathbb{R}$. Hence, the above theorem shows that (4.52) is true for all $x \in \mathbb{R}$.
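The following Python sketch, which is our own addition, checks (4.52) numerically. The sum of the first $N$ terms of the series equals the Taylor polynomial $P_{2N+1}$ (the coefficient of $x^{2N}$ vanishes), so by the remainder formula with $|f^{(2N+1)}(t)| \le 1$ the error is at most $|x|^{2N+1}/(2N+1)!$. The function name `sine_partial_sum` is hypothetical.

```python
import math

def sine_partial_sum(x, terms):
    """Sum of the first `terms` terms of the series in (4.52)."""
    return sum((-1)**n * x**(2 * n + 1) / math.factorial(2 * n + 1)
               for n in range(terms))

x = 3.0
for terms in (1, 2, 4, 8, 12):
    error = abs(math.sin(x) - sine_partial_sum(x, terms))
    bound = abs(x)**(2 * terms + 1) / math.factorial(2 * terms + 1)
    print(f"terms={terms:2d}  error={error:.3e}  bound={bound:.3e}")
```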

Notes on Essence and Generalizability

In this chapter we introduced the derivative as the main concept of differential calculus. We discussed the geometric meaning of the derivative and observed that it can be used efficiently in many applications. For instance, we saw that the use of the derivative in determining local extrema leads us to the important mean value theorem, and that the latter is a crucial result with many applications. The derivative is not among the concepts we consider in the second part of the book. It is the dependence of this concept on the algebraic operations that prevents us from generalizing it to the context of generic metric spaces. There are, however, some abstract mathematical theories in which the concept of differentiability is considered. One such theory is the theory of differentiable manifolds, which relies on the concept of a manifold, a topological space which locally resembles a subset of some Euclidean space $\mathbb{R}^n$. A good resource for the theory of manifolds is [7]. In summary, our brief presentation of differential calculus was aimed at completing our calculus-based knowledge of it.


Exercises

1. Complete the proof of Cauchy's mean value theorem (Theorem 4.44).

2. Complete the solution of Example 4.36.

3. Complete the proof of Theorem 4.47 by proving (4.30).

4. Complete the solution of Example 4.50 by showing that the given function $f$ is differentiable of any order at 0 and that $f^{(n)}(0) = 0$ for every $n \in \mathbb{N}$.

5. Use the definition of derivative to find the derivative of each function wherever it exists.
(a) $f(x) = \sqrt[n]{x}$, where $n$ is a natural number greater than 1.
(b) $g(x) = x \cos x$.

6. In each case determine the points at which the given function is not differentiable.
(a) $f(x) = \sin |x|$.
(b) $g(x) = |\sin x|$.
(c) $h(x) = |x - 1| + |x|$.

7. Let $f$ be a differentiable function. Given any $x$, compute the following limit:
\[
\lim_{h\to 0} \frac{f(x + h^2) - f(x)}{h \sin 2h}.
\]

8. Suppose $f$ is defined on $\mathbb{R}$ and that
\[
|f(x) - f(y)| \le (x - y)^2
\]
for all $x, y \in \mathbb{R}$. Prove that $f$ is a constant function.

9. Suppose that $f$ and $g$ are defined in some interval $(a, b)$ and that they are differentiable at some $c \in (a, b)$. Show that the function
\[
h(x) = \begin{cases} f(x) & a < x \le c, \\ g(x) & c < x < b, \end{cases}
\]
is differentiable at $c$ if and only if $f(c) = g(c)$ and $f'(c) = g'(c)$.

10. Suppose $f$ is differentiable on some interval $I$ with nonzero derivative $f'$. Prove that either $f'(x) > 0$ for every $x \in I$ or $f'(x) < 0$ for all $x \in I$.

11. Find the local extrema of each of the following functions on the set $E$.
(a) $f(x) = 1 - (x - 1)^{2/3}$ on $E = [0, 2]$.
(b) $g(x) = x + 1/x$ on $E = \mathbb{R} \setminus \{0\}$.

12. In each case describe the reason the given function cannot be a derivative.
(a) $f(x) = \begin{cases} 2x^2 + 3 & x \in \mathbb{Q}, \\ -1 - x^2 & x \in \mathbb{Q}^c. \end{cases}$
(b) $g(x) = \begin{cases} \sin x & 0 \le x \le \pi/2, \\ -2 & x > \pi/2. \end{cases}$

13. Let $f$ be continuous on $[a, b]$ and differentiable on $(a, b)$. If $f'$ is increasing on $(a, b)$, prove that
\[
f(\lambda a + (1 - \lambda) b) \le \lambda f(a) + (1 - \lambda) f(b)
\]
for every $0 \le \lambda \le 1$.

14. Let $f$ and $g$ be differentiable on $\mathbb{R}$, and let $f(0) = g(0)$ and $f'(x) \le g'(x)$ for every $x \ge 0$. Prove that $f(x) \le g(x)$ for every $x \ge 0$.

15. Let $f$ and $g$ be differentiable on $\mathbb{R}$. If $f'(x) = g(x)$ and $g'(x) = -f(x)$ for every $x \in \mathbb{R}$, prove that the function $f^2 + g^2$ is constant on $\mathbb{R}$.

16. Suppose $f$ is continuous on $[a, b]$ and differentiable on $(a, b)$. If $\lim_{x\to a} f'(x)$ exists and is equal to $\alpha$, prove that $f$ is differentiable at $a$ and $f'(a) = \alpha$.

17. Find a uniformly continuous function $f$ on $[0, 1]$ which is differentiable on $(0, 1)$ but such that $f'$ is not bounded on $(0, 1)$. Compare this with Example 4.34.

18. Verify that the equation $1 + x = e^{1-x}$ has at least one solution in the interval $(0, 1)$.

19. Suppose $f$ is a continuous function from $[a, b]$ into itself which is differentiable on $(a, b)$. If $f'(x) \ne 1$ for every $x \in (a, b)$, prove that there exists a unique $x_0 \in [a, b]$ such that $f(x_0) = x_0$. Compare this with Example 3.85.

20. Suppose $f$ is differentiable on $(0, +\infty)$. If $\lim_{x\to+\infty} f(x)$ and $\lim_{x\to+\infty} f'(x)$ both exist, prove that $\lim_{x\to+\infty} f'(x) = 0$.

21. Suppose $f$ is differentiable on $(0, +\infty)$ and $\lim_{x\to+\infty} f'(x) = 0$. If $g$ is defined on $(0, +\infty)$ by $g(x) = f(x + 2) - f(x)$, prove that $\lim_{x\to+\infty} g(x) = 0$.

22. Suppose $f$ and $g$ are differentiable functions such that $f'(x)g(x) \ne f(x)g'(x)$ for every $x \in \mathbb{R}$. Prove that between any two solutions of the equation $f(x) = 0$ (if any), a solution of the equation $g(x) = 0$ can be found.

23. Find the value of the given limits.
(a) $\lim_{x\to 0} (x - \sin x)/x^3$.
(b) $\lim_{x\to 1} (\ln x)/(x - 1)$.
(c) $\lim_{x\to 0} (x - \sin x)/(x - \tan x)$.

24. For every $x \in \mathbb{R}$,
\[
\cos x = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n}}{(2n)!}.
\]
Prove this by (a) using Taylor's theorem and Theorem 4.53, and (b) differentiating both sides of (4.52).

25. Find the Maclaurin series of the hyperbolic functions
\[
\sinh x = \frac{e^x - e^{-x}}{2}, \qquad \cosh x = \frac{e^x + e^{-x}}{2}.
\]

26. Find the sum of the power series
\[
\sum_{n=0}^{\infty} \frac{6^n}{(2n+1)!}.
\]

Chapter 5

The Riemann Integral

Among what students learn in calculus, the various kinds of integrals are the most important ones for applications in science and engineering. Integrals can be used to calculate areas and volumes, to find the center of mass of solid objects, etc., and they appear in various forms: definite, improper, multiple, and so on. Our focus in this chapter is on the integral of real-valued functions on closed and bounded intervals, the so-called Riemann integral. To motivate the presentation, let us begin with some questions.

(5.a) What are the historical roots of the Riemann integral?
(5.b) Is there a bounded function, defined on $\mathbb{R}$, whose restriction to each interval $[a, b]$ is nonintegrable?
(5.c) Is the composition of two integrable functions necessarily integrable?
(5.d) Are continuous functions the only integrable ones? Which functions are integrable on a given interval $[a, b]$?
(5.e) Why is the fundamental theorem of calculus said to be fundamental?

The above questions will be answered in this chapter. We begin Section 5.1 with the answer to question (5.a). This includes a brief discussion of the ancient method of exhaustion, which serves as a motivation for our definition of the Riemann integral in Section 5.2. The definition of the Riemann integral we present is due to Darboux and is more appropriate, for theoretical purposes, than the calculus-based ones. As a direct consequence of the definition, we show that the restriction of Dirichlet's function (introduced in Example 3.33) to each interval $[a, b]$ is nonintegrable. This answers question (5.b) in the affirmative. In the remainder of Section 5.2 we prove several basic results concerning the integral and give a useful criterion for integrability. In particular, we show that a continuous function of an integrable function is integrable and that the composition of two integrable functions is not integrable in general. The latter assertion answers question (5.c) in the negative.


The questions posed in (5.d) will be answered in Section 5.3. We will see that continuous functions and monotone functions are integrable. Clearly, the latter class of integrable functions provides us with examples of discontinuous integrable functions. This answers the first question of (5.d) in the negative. The second question is more delicate, and finding its full answer is beyond the scope of this book. As another integrability result in Section 5.3, we show that any function which is discontinuous at a finite number of points is integrable. In Section 5.4, we prove the fundamental theorem of calculus and discuss the reason it is known as a fundamental result. In fact, we observe that the theorem ties the concepts of derivative and integral together via the important concept of antiderivative. The chapter is concluded by a rigorous treatment of the most important techniques of integration one learns in calculus: the change of variables formula and integration by parts.

5.1. Motivation: The Area Problem

As mentioned in the previous chapter, one-variable calculus has two important parts: differential calculus and integral calculus. Historically, integral calculus began more than two thousand years ago with the problem of finding the area of a bounded region in the plane, to which we will refer as the area problem. The first serious attempt in this direction was the method of exhaustion, used by the Greeks, which we now briefly discuss. Let $A$ be the area of a bounded region $S$ in the plane. To find $A$, we inscribe in $S$ a polygon whose enclosed area can be computed easily, and we use this area as an approximation for $A$. Then we inscribe in $S$ another polygon which gives us a better approximation, and continue this process by inscribing polygons with more and more sides. This process is illustrated in Figure 1 for the case where $S$ is the region enclosed by a circle.

Figure 1. The first four steps of the exhaustion process for the region enclosed by a circle.

To find the area A of the circle, we first inscribe an equilateral triangle in the circle and use its area as an approximation for that of S. Clearly, this approximation is not good enough. So we continue by inscribing a square in the circle, and think of its area as an approximation for A. Although this is a better approximation than the one provided by the triangle, it still seems to be possible to obtain better approximations. Thus we continue by inscribing a regular pentagon, and next a regular hexagon, in the circle, and so on. Clearly, by continuing in this way, we will be able to find approximations of A of desired accuracy.
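To make the exhaustion process concrete, here is a small Python sketch that is our own addition. It uses the standard trigonometric formula $\tfrac{1}{2} n r^2 \sin(2\pi/n)$ for the area of a regular $n$-gon inscribed in a circle of radius $r$; this formula is not derived in the text and is assumed here. The function name `inscribed_polygon_area` is hypothetical.

```python
import math

def inscribed_polygon_area(sides, radius=1.0):
    """Area of a regular polygon with `sides` sides inscribed in a circle of the given radius."""
    return 0.5 * sides * radius**2 * math.sin(2 * math.pi / sides)

# Triangle, square, pentagon, hexagon (the steps of Figure 1), then finer polygons.
for sides in (3, 4, 5, 6, 12, 100, 1000):
    area = inscribed_polygon_area(sides)
    print(f"sides={sides:4d}  area={area:.6f}  shortfall={math.pi - area:.2e}")
```

For the unit circle the areas increase with the number of sides and stay below $\pi$, which matches the picture of inscribed polygons exhausting the disk from the inside.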


Figure 2. A region S with complicated boundary curve.

Of course, there is one problem with this approach: if $S$ is enclosed by a more complicated boundary curve, then finding appropriate simpler objects for the exhaustion process can be quite difficult. See Figure 2. Thanks to René Descartes, who first suggested the use of the orthogonal coordinate system in the plane, we may treat this problem more efficiently. In fact, if we consider the region $S$ of Figure 2 in a plane in which the Cartesian coordinate system is established, then the boundary curve of $S$ can be considered as the graphs of two nonnegative bounded functions $f_1$ and $f_2$, defined on the same interval $[a, b]$ and shown in Figure 3. Here, the area $A$ we are interested in can be found as follows. Find the area $A_i$ of the region which is bounded from above by the graph of $f_i$, from the left and right by the lines $x = a$ and $x = b$, and from below by the $x$-axis, for $i = 1, 2$. Then $A$ is equal to $A_1 - A_2$. Thus, the problem of finding the area of a bounded region $S$ in the plane reduces to the following simpler one.

(RAP) If $f$ is bounded and nonnegative on some interval $[a, b]$, find the area $A$ of the region $S$ which is bounded from above by the graph of $f$, from the left and right by the lines $x = a$ and $x = b$, and from below by the $x$-axis.

(Here RAP is the abbreviation for reduced area problem.) From now on and for the sake of simplicity, we refer to the region $S$ described in (RAP) as the region under the graph of $f$. To solve (RAP), consider Figure 4 in which the graph of a typical bounded and nonnegative function $f$ is drawn on some interval $[a, b]$. To find the area $A$ of $S$, we use a process similar to the exhaustion method, i.e., one which is based on approximations. The idea is to approximate the area of $S$ by the sum of the areas of a number of rectangles. This needs a subset $P = \{x_0 = a, x_1, \dots, x_n = b\}$ of $[a, b]$ in which $x_{i-1} < x_i$ holds for each

Figure 3. Dividing the boundary curve into the graph of two functions, f1 and f2 .


Figure 4. The region S whose area is to be determined.

$i \in \{1, \dots, n\}$. Once such a set, known as a partition of $[a, b]$, is given, we can find approximations of $A$ as follows.

(1) We may construct rectangles of width $\Delta x_i := x_i - x_{i-1}$ and height
\[
m_i := \inf\{f(x) : x \in [x_{i-1}, x_i]\} \tag{5.1}
\]
for $i = 1, \dots, n$ to find a lower approximation of $A$, namely,
\[
L(f, P) := \sum_{i=1}^{n} m_i \Delta x_i.
\]
We will also call $L(f, P)$ the lower Riemann sum of $f$ associated to $P$. Notice that each term in the summation that defines $L(f, P)$ is the area of a rectangle. This is illustrated for the function $f$ of Figure 4 and a partition $P = \{a, x_1, x_2, x_3, b\}$ of $[a, b]$ in Figure 5. Here the four rectangles lie below the graph of $f$ because their height is taken to be the infimum of the values of $f$ on the relevant intervals. This means that $L(f, P)$, as the sum of the areas of the rectangles, cannot exceed $A$, that is, $L(f, P) \le A$. Clearly, this inequality is also true for any partition $P$ of $[a, b]$. Note that the boundedness of $f$ on $[a, b]$ implies that the infima $m_i$ defined in (5.1) exist in $\mathbb{R}$.

Figure 5. The lower approximation of A arises from the partition P = {a, x1 , x2 , x3 , b}.


(2) It is also possible to construct rectangles of width $\Delta x_i$ and height
\[
M_i := \sup\{f(x) : x \in [x_{i-1}, x_i]\} \tag{5.2}
\]
for $i = 1, \dots, n$. This gives us an upper approximation of $A$, that is,
\[
U(f, P) := \sum_{i=1}^{n} M_i \Delta x_i.
\]
We also call $U(f, P)$ the upper Riemann sum of $f$ associated to $P$. Here $U(f, P)$ is the sum of the areas of $n$ rectangles that do not lie below the graph of $f$. This can be seen in Figure 6, where the rectangles that arise from a partition $P = \{a, x_1, x_2, x_3, b\}$ are drawn for the function $f$ of Figure 4. As can be seen in Figure 6, $U(f, P) \ge A$ for this particular partition. Our definition of the $M_i$'s shows that this inequality remains true when $P$ is an arbitrary partition of $[a, b]$. The suprema that define the $M_i$'s in (5.2) exist in $\mathbb{R}$ because $f$ is assumed to be bounded.

Figure 6. The upper approximation of A arises from the partition P = {a, x1 , x2 , x3 , b}.

In summary, given any partition P of [a, b], we can find approximations L(f, P ) and U (f, P ) of A such that L(f, P ) ≤ A ≤ U (f, P ). It is intuitively evident, at least for the function f of Figure 4, that the supremum of the lower approximations and the infimum of the upper approximations, whenever they exist and are equal, give us the area A we were looking for. Here the supremum and infimum we talked about are taken with respect to P([a, b]), the set of all partitions of [a, b]. So, to solve the area problem in the form of (RAP), we just need to solve the following problem. (IP) Given a bounded and nonnegative function f , determine whether the supremum of {L(f, P ) : P ∈ P([a, b])} and the infimum of {U (f, P ) : P ∈ P([a, b])} exist as real numbers and are equal. (We used IP as the abbreviation of integrability problem.)
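The squeeze $L(f, P) \le A \le U(f, P)$ can be seen in a small computation. The Python sketch below is our own illustration. To keep the infima and suprema computable it assumes that $f$ is increasing, so that $m_i = f(x_{i-1})$ and $M_i = f(x_i)$; it is not a general-purpose implementation of (5.1) and (5.2). The helper names are hypothetical, and exact rational arithmetic is used so that no rounding interferes.

```python
from fractions import Fraction

def uniform_partition(a, b, n):
    """The partition {a, a + (b-a)/n, ..., b} of [a, b] as a list of Fractions."""
    a, b = Fraction(a), Fraction(b)
    return [a + (b - a) * i / n for i in range(n + 1)]

def lower_upper_sums_increasing(f, partition):
    """L(f, P) and U(f, P) for an increasing f (inf at left endpoint, sup at right)."""
    pairs = list(zip(partition, partition[1:]))
    lower = sum(f(x0) * (x1 - x0) for x0, x1 in pairs)
    upper = sum(f(x1) * (x1 - x0) for x0, x1 in pairs)
    return lower, upper

def square(x):
    return x * x

for n in (4, 10, 100):
    low, up = lower_upper_sums_increasing(square, uniform_partition(0, 1, n))
    print(f"n={n:3d}  L={float(low):.5f}  U={float(up):.5f}")
```

For $f(x) = x^2$ on $[0, 1]$ and $n = 4$ the sums are $7/32$ and $15/32$, and both approach the area $1/3$ as $n$ grows.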

5.2. The Riemann Integral: Definition and Basic Results

When $f$ is bounded and admits negative values on $[a, b]$, the upper and lower Riemann sums are still meaningful and can be defined similarly. This motivates


us to solve (IP) for bounded functions that are not necessarily nonnegative. As we will see shortly, this is a rewarding effort. To begin with, we present the main concepts used in (IP) within a formal definition.

Definition 5.1. Let $f$ be bounded (and not necessarily nonnegative) on $[a, b]$, and let $P = \{x_0 = a, x_1, \dots, x_n = b\}$ be a partition of $[a, b]$. If for each $i \in \{1, \dots, n\}$, we let $m_i = \inf\{f(x) : x \in [x_{i-1}, x_i]\}$ and $M_i = \sup\{f(x) : x \in [x_{i-1}, x_i]\}$, then we define the upper Riemann sum and the lower Riemann sum of $f$ associated to $P$ by $U(f, P) = \sum_{i=1}^{n} M_i \Delta x_i$ and $L(f, P) = \sum_{i=1}^{n} m_i \Delta x_i$, respectively. We also define the upper Riemann integral of $f$ on $[a, b]$ by
\[
\overline{\int_a^b} f(x)\,dx = \inf\{U(f, P) : P \in \mathcal{P}([a, b])\}
\]
and the lower Riemann integral of $f$ on $[a, b]$ by
\[
\underline{\int_a^b} f(x)\,dx = \sup\{L(f, P) : P \in \mathcal{P}([a, b])\}.
\]

Note that Definition 5.1 does not claim that the upper and lower integrals exist as real numbers and that they are equal. We will see that the integrals exist as real numbers and may be unequal in general. These observations will lead us to the notion of Riemann integrability, a concept which is essential to our solution of the area problem. The following proposition establishes the existence of the upper and lower integrals.

Proposition 5.2. If $f$ is bounded on $[a, b]$, then the upper and lower Riemann integrals of $f$ on $[a, b]$ exist as real numbers.

Proof. Let $m$ and $M$ denote the infimum and supremum of the values of $f$ on $[a, b]$, respectively, which exist as real numbers by the assumption that $f$ is bounded. Let $P = \{x_0 = a, x_1, \dots, x_n = b\}$ be an arbitrary partition of $[a, b]$, and let $m_i$ and $M_i$ denote the infimum and supremum of the values of $f$ on $[x_{i-1}, x_i]$, respectively. Then for every $i \in \{1, \dots, n\}$, $m_i \ge m$ and $M_i \le M$ by the known properties of infima and suprema. It follows that
\[
L(f, P) = \sum_{i=1}^{n} m_i \Delta x_i \ge \sum_{i=1}^{n} m \Delta x_i = m(b - a)
\]
and
\[
U(f, P) = \sum_{i=1}^{n} M_i \Delta x_i \le \sum_{i=1}^{n} M \Delta x_i = M(b - a).
\]
Thus, in summary, for every partition $P$ of $[a, b]$,
\[
m(b - a) \le L(f, P) \le U(f, P) \le M(b - a).
\]
This shows that the sets $\{L(f, P) : P \in \mathcal{P}([a, b])\}$ and $\{U(f, P) : P \in \mathcal{P}([a, b])\}$ are bounded subsets of $\mathbb{R}$. The result now follows from the axiom of completeness and its lower bound equivalent.

Next, we turn to the problem of whether the lower and upper Riemann integrals are equal in general. As the following example shows, the answer is negative.


Example 5.3. Let $f$ be the restriction of Dirichlet's function to $[a, b]$: For every $x \in [a, b]$, $f(x) = 1$ if $x \in \mathbb{Q}$ and $f(x) = 0$ if $x \in \mathbb{Q}^c$. Then for every subinterval $I$ of $[a, b]$, the maximum and minimum of $f$ on $I$ are, respectively, 1 and 0. This shows that for any partition $P$ of $[a, b]$, $L(f, P) = 0$ and $U(f, P) = b - a$. Thus,
\[
\underline{\int_a^b} f(x)\,dx = 0 \quad\text{while}\quad \overline{\int_a^b} f(x)\,dx = b - a > 0.
\]

It is therefore convenient to pay more attention to those functions whose lower and upper integrals are equal.

Definition 5.4. Let $f$ be bounded on $[a, b]$. We say that $f$ is Riemann integrable (or just integrable) on $[a, b]$, and we write $f \in \mathcal{R}([a, b])$ if
\[
\underline{\int_a^b} f(x)\,dx = \overline{\int_a^b} f(x)\,dx.
\]
In this case we denote the common value of these integrals by $\int_a^b f(x)\,dx$ and call it the Riemann integral of $f$ on $[a, b]$.

It can be shown that this definition of integrability is the same as those presented in calculus, although we will not enter into the details of this claim. With this definition, we find that the restriction of Dirichlet's function to every interval $[a, b]$ is nonintegrable. This answers question (5.b) in the affirmative.

What if $f$ is unbounded? When $f$ is not bounded on $[a, b]$, the suprema and/or infima we used in the definition of upper and lower sums may not exist. As a concrete example, consider
\[
f(x) = \begin{cases} \dfrac{1}{x} & x \ne 0, \\ 0 & x = 0, \end{cases}
\]
as a function on $[0, 1]$. Given an arbitrary partition $P$ of $[0, 1]$, let $\alpha$ be the least positive element of $P$. Then the values of $f$ on $[0, \alpha]$ form a set which is unbounded from above, and hence the upper sum $U(f, P)$ cannot exist as a real number. Throughout this chapter when we speak of an integrable function $f$, it is assumed that $f$ is bounded.

Example 5.5. If $f$ is constant on $[a, b]$, then $f \in \mathcal{R}([a, b])$. In fact, if $f(x) = c$ for every $x \in [a, b]$, then $\int_a^b f(x)\,dx = c(b - a)$. To see this, let $P = \{x_0 = a, x_1, \dots, x_n = b\}$ be any partition of $[a, b]$. Then for each $i$, the supremum and infimum of the values of $f$ on $[x_{i-1}, x_i]$ are equal to $c$. Hence
\[
U(f, P) = L(f, P) = \sum_{i=1}^{n} c\,\Delta x_i = c(b - a).
\]
This implies that the upper and lower integrals on $[a, b]$ are both equal to $c(b - a)$ and establishes what we claimed.


Figure 7. When c < 0, the area still can be found using an integral.

Geometric interpretation of Example 5.5. When $f(x) = c$ for every $x \in [a, b]$ and $c > 0$, the region that lies under the graph of $f$ is the rectangle $S$ in the left-hand side of Figure 7. It is clear that the area of $S$ is $c(b - a)$, which we obtained as the Riemann integral of $f$ on $[a, b]$. When $c < 0$, a similar interpretation exists. In this case $\int_a^b |f(x)|\,dx$, which is equal to $-c(b - a)$, gives us the area of the shaded region in the right-hand side of Figure 7.

Example 5.6. Define a function $f$ on $[0, 1]$ by $f(x) = \frac{1}{n}$ when $x = \frac{m}{n}$ is a rational number with $m$ and $n$ relatively prime, and $f(x) = 0$ if $x \in \mathbb{Q}^c$. Prove that $f$ is integrable on $[0, 1]$ and that $\int_0^1 f(x)\,dx = 0$.

Solution. It is clear that for every partition $P$ of $[0, 1]$, $L(f, P) = 0$. Hence $\underline{\int_0^1} f(x)\,dx = 0$. To prove the desired result, it is enough to show that for every $\varepsilon > 0$, a partition $Q$ of $[0, 1]$ can be found such that $U(f, Q) < \varepsilon$. Given $\varepsilon > 0$, choose $N$ by the Archimedean property such that $1/N < \varepsilon/2$. Then the set
\[
T = \{r/s : 0 \le r \le s < N,\ (r, s) = 1\}
\]
is finite, say with $m$ distinct elements. (The endpoints $0 = 0/1$ and $1 = 1/1$ are included in $T$ because $f$ takes the value 1 there.) For $n \ge 2$ in $\mathbb{N}$, define a partition $P_n = \{0, 1/n, \dots, (n - 1)/n, 1\}$ of $[0, 1]$ and consider two subsets of $\{1, \dots, n\}$:
\[
A = \left\{ i : \left[\tfrac{i-1}{n}, \tfrac{i}{n}\right] \cap T = \emptyset \right\}, \qquad
B = \left\{ i : \left[\tfrac{i-1}{n}, \tfrac{i}{n}\right] \cap T \ne \emptyset \right\}.
\]
If $i \in A$ and $x = \frac{r}{s} \in \left[\frac{i-1}{n}, \frac{i}{n}\right]$ with $(r, s) = 1$, then by our definition of $A$, $s \ge N$. This implies that $f(x) = \frac{1}{s} \le \frac{1}{N}$. Since $x$ was arbitrary, we find that $M_i \le \frac{1}{N}$. Thus
\[
\sum_{i \in A} M_i \left( \frac{i}{n} - \frac{i-1}{n} \right) \le \frac{1}{N} \sum_{i=1}^{n} \left( \frac{i}{n} - \frac{i-1}{n} \right) = \frac{1}{N} < \frac{\varepsilon}{2}. \tag{5.3}
\]
On the other hand, each element of $T$ lies in at most two of the subintervals, so $B$ has at most $2m$ elements. Since $M_i \le 1$ for every $i \in \{1, \dots, n\}$,
\[
\sum_{i \in B} M_i \left( \frac{i}{n} - \frac{i-1}{n} \right) \le 2m \cdot \frac{1}{n}. \tag{5.4}
\]
Therefore, if we choose $n > (4m)/\varepsilon$, then $Q = P_n$ is our desired partition. This is because by (5.3) and (5.4),
\[
U(f, Q) = \sum_{i \in A} M_i \left( \frac{i}{n} - \frac{i-1}{n} \right) + \sum_{i \in B} M_i \left( \frac{i}{n} - \frac{i-1}{n} \right) < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.
\]
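The upper sums in Example 5.6 can be computed exactly. The Python sketch below is our own illustration. It relies on one observation: on a subinterval, the supremum of $f$ is $1/s$, where $s$ is the smallest denominator of a rational number lying in that subinterval (irrational points contribute 0, and a fraction found with the smallest possible denominator is automatically in lowest terms). The function names are hypothetical.

```python
from fractions import Fraction

def sup_on_subinterval(i, n):
    """M_i on [(i-1)/n, i/n]: 1/s for the smallest denominator s of a rational in it."""
    s = 1
    # ceil(s(i-1)/n) > floor(s*i/n) means no integer r satisfies (i-1)/n <= r/s <= i/n.
    while -(-s * (i - 1) // n) > s * i // n:
        s += 1
    return Fraction(1, s)

def upper_sum(n):
    """U(f, P_n) for the uniform partition P_n = {0, 1/n, ..., 1} of [0, 1]."""
    return sum(sup_on_subinterval(i, n) for i in range(1, n + 1)) / n

for n in (4, 20, 100, 2000):
    print(f"n={n:4d}  U(f, P_n)={float(upper_sum(n)):.4f}")
```

The upper sums shrink toward 0 as the partition gets finer, while every lower sum is 0, in agreement with the value of the integral found in the example.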

The integral of the function $f$ of the above example on $[0, 1]$ gives us the area of the region that lies under its graph. That the area is zero follows from the fact that this is not a region in the usual sense: it is composed of a countable number of line segments, and such a collection of line segments can never enclose a continuous region, that is, one of positive area.

Return to the area problem. It is now time to pause and review what we have done so far. We began with the area problem, we reduced it to the simpler problem (RAP), and from there we arrived at the integrability problem (IP) and, accordingly, at the definition of integrable functions. Based on our discussions, we found that when a nonnegative function $f$ is integrable on $[a, b]$, the area of the region under the graph of $f$ is the same as $\int_a^b f(x)\,dx$. So, to solve the area problem completely, we need to know

(1) which functions are integrable on a given interval $[a, b]$; and
(2) when a function $f$ is integrable on $[a, b]$, how the Riemann integral of $f$ on $[a, b]$ can be computed.

Items (1) and (2) above will be our main concern in the remainder of this chapter.

When $f$ admits negative values, the upper and lower integrals of $f$ may be negative. In this case, even when the upper and lower integrals are equal, their common value may not be interpreted as the area of the region that is bounded by the graph of $f$ and the $x$-axis. This is because, unlike integrals, areas can never be represented by negative numbers. Of course, as we observed in the geometric interpretation of Example 5.5, the Riemann integral can be used to find the area enclosed by the graph of functions that admit negative values. We will get back to this point in the next section.

More on Partitions and the Riemann Sums. Before we can proceed to the most important parts of our theory, it is necessary to enhance our knowledge of partitions and the Riemann sums. We begin with the notion of norm for partitions.

Definition 5.7. If $P = \{x_0 = a, x_1, \dots, x_n = b\}$ is a partition of $[a, b]$, we define the norm of $P$, denoted by $\|P\|$, as the maximum of the values $\Delta x_i$ for $i = 1, \dots, n$.

When we use $P$ to construct rectangles for our approximation process, $\|P\|$ represents the maximum of the widths of the resulting rectangles.

Lemma 5.8. Given any interval $[a, b]$ and an arbitrary $\delta > 0$, a partition $P$ of $[a, b]$ can be found such that $\|P\| < \delta$.

Proof. Let $m > 1$ be a natural number for which $\delta/m < b - a$. By Example 1.45, the set $\{l \in \mathbb{N} : l\,\frac{\delta}{m} < b - a\}$ has a greatest element which we denote by $k$. Define a partition $P$ by $x_0 = a$, $x_i = a + i\,\frac{\delta}{m}$ for $i = 1, \dots, k$ and $x_{k+1} = b$. Then, for each $i \in \{1, \dots, k\}$, $\Delta x_i = \frac{\delta}{m} < \delta$ and
\[
x_{k+1} - x_k = b - \left( a + k\,\frac{\delta}{m} \right) \le \left( a + (k + 1)\,\frac{\delta}{m} \right) - \left( a + k\,\frac{\delta}{m} \right) = \frac{\delta}{m} < \delta.
\]
This proves that $\|P\| < \delta$.
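The construction in the proof of Lemma 5.8 is explicit enough to run. The following Python sketch is our own addition. It follows the proof step by step, choosing the smallest admissible $m$ (the proof allows any $m > 1$ with $\delta/m < b - a$), and uses exact rational arithmetic. The names `partition_with_small_norm` and `norm` are hypothetical.

```python
import math
from fractions import Fraction

def partition_with_small_norm(a, b, delta):
    """Build the partition from the proof of Lemma 5.8, with norm less than delta."""
    a, b, delta = Fraction(a), Fraction(b), Fraction(delta)
    m = 2
    while delta / m >= b - a:            # a natural number m > 1 with delta/m < b - a
        m += 1
    step = delta / m
    k = math.ceil((b - a) / step) - 1    # greatest l with l * step < b - a
    return [a + i * step for i in range(k + 1)] + [b]

def norm(partition):
    """||P||: the largest gap between consecutive points of the partition."""
    return max(x1 - x0 for x0, x1 in zip(partition, partition[1:]))

P = partition_with_small_norm(0, 1, Fraction(3, 10))
print([str(x) for x in P], "norm =", norm(P))
```

With $a = 0$, $b = 1$ and $\delta = 3/10$ the construction takes steps of length $3/20$ and closes with a shorter final subinterval, so the norm stays below $\delta$.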



What does Lemma 5.8 say? Lemma 5.8 says that given $\delta > 0$, every interval $[a, b]$ can be divided into a number of subintervals, each of length less than $\delta$. In our current context it says that, given any $\delta > 0$, we may approximate the area of the region that lies under the graph of a function $f$ by that of a number of rectangles, each of which has width less than $\delta$.

Next we turn to refinements of partitions, which allow us to refine our approximations of areas.

Definition 5.9. Let $P$ and $Q$ be partitions of $[a, b]$.
• We say that $Q$ is a refinement of $P$ if $P \subset Q$.
• If $S$ is a refinement of both $P$ and $Q$, then we say that $S$ is a common refinement of them.

In the context of the above definition, $Q$ is said to be a refinement of $P$ because it allows us to construct rectangles which are finer than the ones obtained from $P$. This can be seen by comparing the rectangles of Figure 5 with those of Figure 8.

Lemma 5.10. Let $f$ be bounded on $[a, b]$.
(1) If $P$ and $Q$ are partitions of $[a, b]$ and $Q$ is a refinement of $P$, then
\[
L(f, P) \le L(f, Q), \qquad U(f, Q) \le U(f, P).
\]
(2) $\displaystyle \underline{\int_a^b} f(x)\,dx \le \overline{\int_a^b} f(x)\,dx.$

Proof. (1) We only prove the inequality that concerns the upper Riemann sums; the verification of the other inequality is left to the reader as an exercise. It is enough to prove that the desired inequality holds when $Q$ has only one element more than $P$, as the general case then follows by induction on the number of elements in $Q \setminus P$. Assume that $P = \{x_0 = a, x_1, \dots, x_n = b\}$ is a partition of $[a, b]$ and that $Q$ is a partition of $[a, b]$ which is obtained from $P$ by adjoining to it a new point $y$. Let $1 \le j \le n$ be such that $y \in (x_{j-1}, x_j)$. Let $M_{j,1} = \sup\{f(x) : x \in [x_{j-1}, y]\}$ and $M_{j,2} = \sup\{f(x) : x \in [y, x_j]\}$. Then $M_{j,i} \le M_j = \sup\{f(x) : x \in [x_{j-1}, x_j]\}$ for $i = 1, 2$, and hence
\[
U(f, Q) - U(f, P) = M_{j,1}(y - x_{j-1}) + M_{j,2}(x_j - y) - M_j(x_j - x_{j-1}) \le 0,
\]
from which the desired result follows.


(2) Let $P$ and $Q$ be arbitrary partitions of $[a, b]$, and let $S$ be a common refinement of them. Then by part (1), $L(f, P) \le L(f, S) \le U(f, S) \le U(f, Q)$. Thus, for arbitrary partitions $P$ and $Q$ of $[a, b]$,
\[
L(f, P) \le U(f, Q). \tag{5.5}
\]
Fixing $Q$ in (5.5) and taking the supremum over all $P \in \mathcal{P}([a, b])$, we find that
\[
\underline{\int_a^b} f(x)\,dx \le U(f, Q). \tag{5.6}
\]
Now the desired inequality follows by taking the infimum over all $Q \in \mathcal{P}([a, b])$ in (5.6).

What does Lemma 5.10 say?

• The inequality $L(f, P) \le L(f, Q)$ claimed in Lemma 5.10(1) says that by refining a partition $P$, we can obtain lower Riemann sums which are at least as close as $L(f, P)$ to the lower Riemann integral. When $f$ is nonnegative, in which case the lower Riemann integral of $f$ on $[a, b]$ may be used as the area $A$ of the region under the graph of $f$, this shows that by refining $P$ we may obtain better lower approximations for $A$. This is because by refining $P$ we obtain finer rectangles and this results in a reduction of the error our approximation may have. This can be seen by comparing Figure 5 with Figure 8. It is evident that in Figure 8 we have a better approximation of the area. Similarly, you can discuss what follows from the inequality $U(f, P) \ge U(f, Q)$ for the upper Riemann sums and integrals.

• Inequality (5.5), which was essential in the proof of Lemma 5.10(2), reveals an important fact in the case where $f$ is nonnegative. It says that a lower approximation of the area can never exceed an upper approximation, a fact which is geometrically evident.
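Both parts of Lemma 5.10 can be checked on a concrete example. The Python sketch below is our own illustration. It again assumes an increasing $f$ so that the infimum and supremum on each subinterval sit at the left and right endpoints, and it works with non-uniform partitions represented as sorted lists of `Fraction` values. All names are hypothetical.

```python
from fractions import Fraction

def lower_sum(f, partition):
    """L(f, P) for an increasing f: the infimum on [x0, x1] is f(x0)."""
    return sum(f(x0) * (x1 - x0) for x0, x1 in zip(partition, partition[1:]))

def upper_sum(f, partition):
    """U(f, P) for an increasing f: the supremum on [x0, x1] is f(x1)."""
    return sum(f(x1) * (x1 - x0) for x0, x1 in zip(partition, partition[1:]))

def cube(x):
    return x**3

P = [Fraction(0), Fraction(1, 2), Fraction(1)]
R = [Fraction(0), Fraction(1, 3), Fraction(1)]
# Q refines P: it contains every point of P plus two new points.
Q = sorted(set(P) | {Fraction(1, 4), Fraction(2, 3)})
# S is a common refinement of P and R.
S = sorted(set(P) | set(R))

print("L(P), L(Q), U(Q), U(P):",
      lower_sum(cube, P), lower_sum(cube, Q), upper_sum(cube, Q), upper_sum(cube, P))
print("L(P) <= U(R):", lower_sum(cube, P) <= upper_sum(cube, R))
```

The first line of output should be nondecreasing from left to right, as in part (1), and the second line illustrates inequality (5.5) for two partitions neither of which refines the other.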

Figure 8. The lower approximation of A arises from the partition Q = {a, x1 , x2 , x3 , x4 , b}.
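The effect of refinement described in Lemma 5.10 can be checked on a small case. The following is only a sketch under stated assumptions: the function $f(x) = x^2$ on $[0, 1]$, the partitions `P` and `Q`, and the helper name `lower_upper_sums` are illustrative choices that do not come from the text. Because the chosen function is increasing, its infimum and supremum on each subinterval sit at the left and right endpoints, so the sums can be computed exactly with rational arithmetic.

```python
from fractions import Fraction

def lower_upper_sums(f, partition):
    """Return (L, U) for an increasing f; inf and sup sit at the endpoints."""
    pairs = list(zip(partition, partition[1:]))
    lower = sum(f(left) * (right - left) for left, right in pairs)
    upper = sum(f(right) * (right - left) for left, right in pairs)
    return lower, upper

def square(x):
    # Illustrative integrand (an assumption of this sketch), increasing on [0, 1].
    return x * x

# P is a partition of [0, 1]; Q refines P by adjoining 1/4 and 3/4.
P = [Fraction(0), Fraction(1, 2), Fraction(1)]
Q = sorted(set(P) | {Fraction(1, 4), Fraction(3, 4)})

L_P, U_P = lower_upper_sums(square, P)
L_Q, U_Q = lower_upper_sums(square, Q)
print(L_P, L_Q, U_Q, U_P)  # prints: 1/8 7/32 15/32 5/8
```

The printed chain $1/8 \le 7/32 \le 15/32 \le 5/8$ is an instance of $L(f, P) \le L(f, Q) \le U(f, Q) \le U(f, P)$.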


An Important Integrability Criterion. Using what we have obtained so far, we can prove the following important necessary and sufficient condition for integrability.

Theorem 5.11. Let $f$ be a bounded function defined on $[a, b]$. Then $f$ is integrable on $[a, b]$ if and only if the following statement is true.

(IC) For every $\varepsilon > 0$ there exists a partition $P$ of $[a, b]$ such that
\[ U(f, P) - L(f, P) < \varepsilon. \tag{5.7} \]

(Here, IC is the abbreviation for integrability criterion.)

Proof of Theorem 5.11. First assume that $f \in \mathcal{R}([a, b])$, and let $\varepsilon > 0$ be given. To find a partition $P$ for which (5.7) holds, note that, since the integral of $f$ on $[a, b]$ is the supremum of lower Riemann sums, a partition $P_1$ of $[a, b]$ can be found such that
\[ \int_a^b f(x)\,dx - \frac{\varepsilon}{2} < L(f, P_1). \tag{5.8} \]
Similarly, since the integral of $f$ on $[a, b]$ is the infimum of upper Riemann sums, a partition $P_2$ of $[a, b]$ can be found such that
\[ U(f, P_2) < \int_a^b f(x)\,dx + \frac{\varepsilon}{2}. \tag{5.9} \]
Now, if $P$ is a common refinement of $P_1$ and $P_2$, then by Lemma 5.10(1), (5.8), and (5.9),
\[ \int_a^b f(x)\,dx - \frac{\varepsilon}{2} < L(f, P_1) \le L(f, P) \le U(f, P) \le U(f, P_2) < \int_a^b f(x)\,dx + \frac{\varepsilon}{2}. \]
This shows that the partition $P$ satisfies (5.7):
\[ U(f, P) - L(f, P) < \left( \int_a^b f(x)\,dx + \frac{\varepsilon}{2} \right) - \left( \int_a^b f(x)\,dx - \frac{\varepsilon}{2} \right) = \varepsilon. \]
Now suppose that (IC) is true. Given $\varepsilon > 0$, find a partition $P$ of $[a, b]$ for which (5.7) is true. Then
\[ 0 \le \overline{\int_a^b} f(x)\,dx - \underline{\int_a^b} f(x)\,dx \le U(f, P) - L(f, P) < \varepsilon. \]
Since $\varepsilon > 0$ was arbitrary, this implies that the lower and upper integrals of $f$ on $[a, b]$ are equal. $\square$

What does Theorem 5.11 say? When $f$ is nonnegative, in which case the integral may be considered to be an area, Theorem 5.11 has an interesting interpretation. It says, in this case, that the infimum of the upper approximations of the area is equal to the supremum of its lower approximations if and only if the following statement is true. It is always possible to find a partition $P$ of $[a, b]$ whose associated upper and lower approximations of the area are as close to each other as we wish.


Example 5.12. Prove that $f(x) = \sin x$ is integrable on $[0, \pi/2]$ using each of the following assumptions.
(1) The sine function is continuous on $[0, \pi/2]$.
(2) The sine function is increasing on $[0, \pi/2]$.

Solution. We show in each case that for every $\varepsilon > 0$, a partition $P$ of $[0, \pi/2]$ can be found such that $U(f, P) - L(f, P) < \varepsilon$. To see this, we construct a sequence $\{P_n\}$ of partitions of $[0, \pi/2]$ as follows. For each $n$ let
\[ P_n = \left\{ 0, \frac{\pi}{2n}, \dots, \frac{(n-1)\pi}{2n}, \frac{\pi}{2} \right\}. \]
Consecutive points in $P_n$ are at distance $\pi/(2n)$. This allows us to make the norm of $P_n$ as small as we wish by choosing $n$ sufficiently large.

(1) The continuity of $f$ on $[0, \pi/2]$ shows, in view of Theorem 3.94, that $f$ is uniformly continuous on this interval. Thus, for the given $\varepsilon > 0$, we find $\delta > 0$ such that $x, y \in [0, \pi/2]$ and $|y - x| < \delta$ imply $|f(y) - f(x)| < (2/\pi)\varepsilon$. Now use the Archimedean property of real numbers to find $n_0 \in \mathbb{N}$ such that $\|P_{n_0}\| = \frac{\pi}{2n_0} < \delta$. Then,
\[ U(f, P_{n_0}) - L(f, P_{n_0}) < n_0 \cdot \frac{\pi}{2n_0} \cdot \frac{2}{\pi}\,\varepsilon = \varepsilon. \]

(2) Using the assumption that $f$ is increasing on $[0, \pi/2]$, we find that for every $n$,
\[ U(f, P_n) - L(f, P_n) = \frac{\pi}{2n} \left( \sin\frac{\pi}{2} - \sin 0 \right) = \frac{\pi}{2n}. \tag{5.10} \]
Now, when $\varepsilon > 0$ is given and $n_1 \in \mathbb{N}$ is chosen by the Archimedean property to satisfy $1/n_1 < (2\varepsilon)/\pi$, then (5.10) tells us that $U(f, P_{n_1}) - L(f, P_{n_1}) < \varepsilon$.

The Riemann integral $\int_0^{\pi/2} \sin x\,dx$ whose existence is established in the above example gives us the area of the shaded region of Figure 9. We will calculate this integral easily as a result of the fundamental theorem of calculus in Section 5.4.

Figure 9. The area of the shaded region is $\int_0^{\pi/2} \sin x\,dx$.
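The monotonicity argument of Example 5.12(2) lends itself to a quick numerical check. This is a minimal sketch, not part of the original solution: the helper names `gap_for_sine` and `smallest_n` are hypothetical, and floating-point arithmetic is assumed to be accurate enough for the comparison. It evaluates $U(f, P_n) - L(f, P_n)$ for the equally spaced partitions $P_n$ and picks the smallest $n_1$ with $1/n_1 < 2\varepsilon/\pi$.

```python
import math

def gap_for_sine(n):
    """U(f, P_n) - L(f, P_n) for f = sin on [0, pi/2] with n equal subintervals."""
    h = math.pi / (2 * n)
    points = [i * h for i in range(n + 1)]
    # sin is increasing on [0, pi/2]: inf at the left endpoint, sup at the right.
    lower = sum(math.sin(points[i]) * h for i in range(n))
    upper = sum(math.sin(points[i + 1]) * h for i in range(n))
    return upper - lower

def smallest_n(eps):
    """Smallest positive integer n1 with 1/n1 < 2*eps/pi, i.e. n1 > pi/(2*eps)."""
    return math.floor(math.pi / (2 * eps)) + 1

eps = 0.01
n1 = smallest_n(eps)
print(n1, gap_for_sine(n1), math.pi / (2 * n1))
```

For any $n$ the computed gap should agree with $\pi/(2n)$ up to rounding, which is the content of (5.10).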


Exercise 5.13. Use each of the following assumptions to show that the cosine function is integrable on $[0, \pi/2]$.
(1) The cosine function is continuous on $[0, \pi/2]$.
(2) The cosine function is decreasing on $[0, \pi/2]$.
Then, interpret $\int_0^{\pi/2} \cos x\,dx$ as an area.

Some Important Properties of the Integral. Having developed the basics of our integrability theory, it is now time to learn more about the integral and its properties. The properties of the integral we are going to prove concern the way the integral interacts with the algebraic operations and the order relation of $\mathbb{R}$. Our first result in this direction can be described by saying that the integral respects the linear operations of addition and scalar multiplication, or more briefly that it is linear.

Theorem 5.14 (Linearity of the Riemann Integral).
(1) If $k$ is any real number and $f$ is integrable on $[a, b]$, then the function $kf$ is also integrable on $[a, b]$ and
\[ \int_a^b k f(x)\,dx = k \int_a^b f(x)\,dx. \]
(2) If $f$ and $g$ are integrable on $[a, b]$, then the function $f + g$ is integrable on $[a, b]$ and
\[ \int_a^b (f(x) + g(x))\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx. \tag{5.11} \]

Proof. (1) Let $h := kf$. If $P = \{x_0, x_1, \dots, x_n\}$ is any partition of $[a, b]$, let $m_i$ and $M_i$ be the infimum and supremum of the values of $f$ on $[x_{i-1}, x_i]$, and consider $s_i$ and $S_i$ as those of $h$ on this interval, respectively. If $k = 0$, then $h$ is identically zero on $[a, b]$ and the desired result is obviously true. When $k \ne 0$, we consider two cases.

(a) $k > 0$. In this case $s_i = k m_i$ and $S_i = k M_i$ for each $i$, showing that $L(h, P) = k L(f, P)$ and $U(h, P) = k U(f, P)$. Thus we find that
\[ \underline{\int_a^b} h(x)\,dx = \overline{\int_a^b} h(x)\,dx = k \int_a^b f(x)\,dx, \]
which is our desired result.

(b) $k < 0$. Then $s_i = k M_i$ and $S_i = k m_i$ for each $i$, from which it follows that $L(h, P) = k U(f, P)$, $U(h, P) = k L(f, P)$. This shows that
\begin{align*}
\underline{\int_a^b} h(x)\,dx &= \sup\{L(h, P) : P \in \mathcal{P}([a, b])\} \\
&= \sup\{k U(f, P) : P \in \mathcal{P}([a, b])\} \\
&= k \inf\{U(f, P) : P \in \mathcal{P}([a, b])\} \\
&= k \int_a^b f(x)\,dx.
\end{align*}
Since a similar reasoning shows that
\[ \overline{\int_a^b} h(x)\,dx = k \int_a^b f(x)\,dx, \]
the desired result also follows in this case.

(2) Let $h := f + g$. If $I$ is a subinterval of $[a, b]$, then for every $x \in I$,
\[ \inf\{f(y) : y \in I\} + \inf\{g(y) : y \in I\} \le h(x) \le \sup\{f(y) : y \in I\} + \sup\{g(y) : y \in I\}. \]
This shows that for every partition $P$ of $[a, b]$,
\[ L(f, P) + L(g, P) \le L(h, P) \le U(h, P) \le U(f, P) + U(g, P). \]
Given $\varepsilon > 0$, use Theorem 5.11 to find partitions $Q$ and $S$ of $[a, b]$ such that $U(f, Q) - L(f, Q) < \frac{\varepsilon}{2}$ and $U(g, S) - L(g, S) < \frac{\varepsilon}{2}$. Now, if $T$ is a common refinement of $Q$ and $S$, then
\[ U(h, T) - L(h, T) \le (U(f, T) + U(g, T)) - (L(f, T) + L(g, T)) < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon. \]
This proves that $h$ is integrable on $[a, b]$. To prove (5.11), let $T$ be as above and note that then
\[ U(f, T) < L(f, T) + \frac{\varepsilon}{2} \le \int_a^b f(x)\,dx + \frac{\varepsilon}{2}, \]
that is, $U(f, T) < \int_a^b f(x)\,dx + \frac{\varepsilon}{2}$. Similarly, $U(g, T) < \int_a^b g(x)\,dx + \frac{\varepsilon}{2}$. Thus
\[ \int_a^b h(x)\,dx \le U(h, T) \le U(f, T) + U(g, T) < \int_a^b f(x)\,dx + \int_a^b g(x)\,dx + \varepsilon. \]
0, a partition $P$ of $[a, b]$ can be found such that $U(f, P) < \varepsilon$. Once this is proved, we find that $\overline{\int_a^b} f(x)\,dx = 0$. It is also clear that $\underline{\int_a^b} f(x)\,dx = 0$. The desired result then follows from the last two equalities. Consider an arbitrary $\varepsilon > 0$.

• If $y = a$ or $y = b$, let $\delta = \frac{1}{2} \min\{\frac{\varepsilon}{2}, b - a\}$. If $y = a$, let $P = \{a, a + \delta, b\}$, and when $y = b$, let $P = \{a, b - \delta, b\}$. Then in each of the cases $U(f, P) = \delta < \frac{\varepsilon}{2} < \varepsilon$.

• If $a < y < b$, let $\delta = \frac{1}{2} \min\{\frac{\varepsilon}{2}, b - y, y - a\}$ and consider $P = \{a, y - \delta, y, y + \delta, b\}$. Then
\[ U(f, P) = 2\delta \le \frac{\varepsilon}{2} < \varepsilon. \]

A Consequence of Example 5.17. Example 5.17 shows that for a bounded function $f$, the equality $\int_a^b f(x)\,dx = 0$ does not imply that $f$ is identically zero on $[a, b]$.


Example 5.18. If $f$ is integrable on $[a, b]$ and we modify the value of $f$ at some point $y \in [a, b]$, then the resulting function $g$ is also integrable on $[a, b]$ and
\[ \int_a^b g(x)\,dx = \int_a^b f(x)\,dx. \]

Solution. By our assumptions, the function $h := g - f$ satisfies $h(y) \ne 0$ and $h(x) = 0$ for every $x \ne y$ in $[a, b]$. Thus Example 5.17 shows that $h$ is integrable and $\int_a^b h(x)\,dx = 0$. Now, $g = h + f$ is integrable and
\[ \int_a^b g(x)\,dx = \int_a^b h(x)\,dx + \int_a^b f(x)\,dx = \int_a^b f(x)\,dx, \]
by Theorem 5.14(2).

A Consequence of Example 5.18. Using Example 5.18 and an inductive argument, we can prove the following statement. If $f$ is integrable on $[a, b]$ and $g$ is obtained from $f$ by modifying the values of $f$ at a finite number of points, then $g$ is also integrable and
\[ \int_a^b g(x)\,dx = \int_a^b f(x)\,dx. \]
In other words, modifying the values of a function at a finite number of points does not affect its integrability or the value of its integral.

Integrability on Subintervals. One important property of the Riemann integral is that the integrability on an interval $[a, b]$ implies integrability on each of its subintervals. To prove this important result, we first establish a simpler assertion.

Theorem 5.19. If $f$ is integrable on $[a, b]$ and $a < c < b$, then $f \in \mathcal{R}([a, c])$, $f \in \mathcal{R}([c, b])$, and
\[ \int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx. \tag{5.15} \]

Proof. We only prove that $f \in \mathcal{R}([a, c])$. The proof that $f \in \mathcal{R}([c, b])$ is similar and is left to the reader. Since $f \in \mathcal{R}([a, b])$, given $\varepsilon > 0$ a partition $P = \{x_0 = a, x_1, \dots, x_n = b\}$ of $[a, b]$ can be found such that
\[ U(f, P) - L(f, P) < \varepsilon. \tag{5.16} \]
That $a < c < b$ gives us a unique $j \in \{1, \dots, n\}$ such that $c \in (x_{j-1}, x_j]$. Now $P_c = \{x_0 = a, \dots, x_{j-1}, c\}$ is a partition of $[a, c]$. If $c < x_j$, let $Q = \{x_0 = a, \dots, x_{j-1}, c, x_j, \dots, x_n = b\}$, and when $c = x_j$, let $Q = \{x_0 = a, \dots, x_{j-1}, c, y, x_{j+1}, \dots, x_n = b\}$, where $y$ is an arbitrary element of $[a, b]$ between $c$ and $x_{j+1}$. Then, in either case, $Q$ is a refinement of $P$ and hence by (5.16), $U(f, Q) - L(f, Q) < \varepsilon$. Since $P_c \subset Q$,


it follows from this inequality that $U(f, P_c) - L(f, P_c) < \varepsilon$. Now Theorem 5.11 tells us that $f \in \mathcal{R}([a, c])$. To prove (5.15), note that for an arbitrary partition $S$ of $[a, b]$, we can construct partitions $S_1$ and $S_2$ of $[a, c]$ and $[c, b]$ whose underlying sets are $(S \cup \{c\}) \cap [a, c]$ and $(S \cup \{c\}) \cap [c, b]$, respectively. Then
\[ U(f, S) \ge U(f, S_1) + U(f, S_2) \ge \int_a^c f(x)\,dx + \int_c^b f(x)\,dx \tag{5.17} \]
and
\[ L(f, S) \le L(f, S_1) + L(f, S_2) \le \int_a^c f(x)\,dx + \int_c^b f(x)\,dx. \tag{5.18} \]
Taking the infimum in (5.17) and the supremum in (5.18) over all $S \in \mathcal{P}([a, b])$, we obtain
\[ \int_a^c f(x)\,dx + \int_c^b f(x)\,dx \le \int_a^b f(x)\,dx \le \int_a^c f(x)\,dx + \int_c^b f(x)\,dx, \]
which gives (5.15). $\square$

Geometric interpretation of Theorem 5.19. When $f \in \mathcal{R}([a, b])$ is nonnegative and $c \in (a, b)$, equation (5.15) tells us the following geometrically evident statement. The area of the region under the graph of $f$ from $a$ to $b$ is the sum of those from $a$ to $c$ and from $c$ to $b$. This is illustrated in Figure 10.

Figure 10. The region is divided into two parts.
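Inequalities (5.17) and (5.18) can be traced on a concrete partition. The sketch below is an illustration under assumptions that are not in the text: the integrand $f(x) = x^2$ on $[0, 1]$, the point $c = 1/2$, the partition `S`, and the helper names `sums` and `split_at` are all chosen for the example, and the value $1/3$ of $\int_0^1 x^2\,dx$ is taken as known (it follows from the fundamental theorem of calculus in Section 5.4).

```python
from fractions import Fraction as Fr

def sums(partition):
    """(L, U) for f(x) = x**2, which is increasing on [0, 1]."""
    pairs = list(zip(partition, partition[1:]))
    lower = sum(left * left * (right - left) for left, right in pairs)
    upper = sum(right * right * (right - left) for left, right in pairs)
    return lower, upper

def split_at(partition, c):
    """Partitions of [a, c] and [c, b] whose points come from S united with {c}."""
    points = sorted(set(partition) | {c})
    return [p for p in points if p <= c], [p for p in points if p >= c]

S = [Fr(0), Fr(1, 3), Fr(1)]          # a partition of [0, 1] that misses c
S1, S2 = split_at(S, Fr(1, 2))        # S1 partitions [0, 1/2], S2 partitions [1/2, 1]

L, U = sums(S)
L1, U1 = sums(S1)
L2, U2 = sums(S2)
print(L, L1 + L2, U1 + U2, U)         # prints: 2/27 31/216 125/216 19/27
```

The output shows $L(f, S) \le L(f, S_1) + L(f, S_2) \le 1/3 \le U(f, S_1) + U(f, S_2) \le U(f, S)$ for this choice of data.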

Example 5.20. If $f$ is integrable on $[0, 1]$ and for each $n \in \mathbb{N}$ we let $\alpha_n = \int_0^{1/n} f(x)\,dx$, prove that $\{\alpha_n\}$ converges to 0.


Solution. Since $f$ is bounded on $[0, 1]$, we find real numbers $m$ and $M$ such that for every $x \in [0, 1]$, $m \le f(x) \le M$. Then, by Theorem 5.15 and Example 5.5,
\[ \frac{m}{n} = m\left(\frac{1}{n} - 0\right) = \int_0^{1/n} m\,dx \le \alpha_n \le \int_0^{1/n} M\,dx = M\left(\frac{1}{n} - 0\right) = \frac{M}{n}. \]
The desired result now follows by letting $n$ tend to infinity and applying the Squeeze Theorem (Theorem 2.49).

Using Theorem 5.19 we can prove the main result of this subsection.

Theorem 5.21. If $f$ is integrable on $[a, b]$ and $[c, d]$ is a subinterval of $[a, b]$, then $f$ is also integrable on $[c, d]$. If we assume in addition that $f$ is nonnegative on $[a, b]$, then
\[ \int_a^b f(x)\,dx \ge \int_c^d f(x)\,dx. \tag{5.19} \]

Proof. If $c = a$ or $d = b$, then the integrability of $f$ on $[c, d]$ follows directly from Theorem 5.19. Otherwise, $a < c < d < b$. Thus Theorem 5.19 tells us that $f$ is integrable on $[a, c]$ and $[c, b]$. Now, applying Theorem 5.19 to $f$ on $[c, b]$, we deduce that $f$ is integrable on $[c, d]$ and $[d, b]$. To prove (5.19), we note that when $c = a$ and $d < b$, Theorem 5.19 shows that
\[ \int_a^b f(x)\,dx = \int_c^d f(x)\,dx + \int_d^b f(x)\,dx. \]
Since $\int_d^b f(x)\,dx \ge 0$ by Theorem 5.15, this gives (5.19) in this case. A similar reasoning establishes (5.19) when $c > a$ and $d = b$. If $a < c < d < b$, then using Theorem 5.19 twice, we obtain
\[ \int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^d f(x)\,dx + \int_d^b f(x)\,dx. \]
Since the integrals $\int_a^c f(x)\,dx$ and $\int_d^b f(x)\,dx$ are nonnegative by Theorem 5.15, (5.19) also follows in this case. $\square$

What does (5.19) say? Since f is assumed to be nonnegative in (5.19), the inequality says that the area of the region that lies under the graph of f between the lines x = a and x = b is never less than the area of the region that lies under the graph of f between the lines x = c and x = d with a ≤ c ≤ d ≤ b. This is illustrated in Figure 11. Exercise 5.22. Show by means of an example that when f admits negative values on [a, b], (5.19) may not be true. The converse of Theorem 5.19 is also true. More precisely, the integrability of a function f on [a, b] follows from that of f on the subintervals [a, c] and [c, b] for some c ∈ (a, b).


Figure 11. The area of the dark region is less than that of the whole region.

Theorem 5.23. Suppose that $f$ is bounded on $[a, b]$ and that for some $c \in (a, b)$, $f$ is integrable on $[a, c]$ and $[c, b]$. Then $f$ is integrable on $[a, b]$ and (5.15) is true.

Proof. To prove $f \in \mathcal{R}([a, b])$, we show that given $\varepsilon > 0$, a partition $P$ of $[a, b]$ can be found such that $U(f, P) - L(f, P) < \varepsilon$. Assume that $\varepsilon > 0$ is given. Since $f \in \mathcal{R}([a, c])$, we find a partition $P_1$ of $[a, c]$ such that
\[ U(f, P_1) - L(f, P_1) < \frac{\varepsilon}{2}. \tag{5.20} \]
Similarly, by the assumption $f \in \mathcal{R}([c, b])$, we obtain a partition $P_2$ of $[c, b]$ such that
\[ U(f, P_2) - L(f, P_2) < \frac{\varepsilon}{2}. \tag{5.21} \]
Now, $P = P_1 \cup P_2$ will be our desired partition of $[a, b]$. This follows from the identities $U(f, P) = U(f, P_1) + U(f, P_2)$, $L(f, P) = L(f, P_1) + L(f, P_2)$, and the inequalities (5.20) and (5.21). Next, having proved that $f \in \mathcal{R}([a, b])$, (5.15) follows from Theorem 5.19. $\square$

Composition and Integrability. It is now time to answer question (5.c) of whether the composition of two integrable functions is integrable. That the answer is negative follows by considering
\[ g(x) = \begin{cases} 1, & x \ne 0, \\ 0, & x = 0, \end{cases} \qquad \text{and} \qquad f(x) = \begin{cases} \frac{1}{n}, & x = \frac{m}{n},\ (m, n) = 1, \\ 0, & x \in \mathbb{Q}^c, \end{cases} \]
as functions on $[0, 1]$. Recall that $g$ is integrable by the consequence of Example 5.18 and $f$ is integrable by Example 5.6. But $g \circ f$ is the restriction of Dirichlet's function to $[0, 1]$ which, as we saw in Example 5.3, is not integrable on $[0, 1]$. The following theorem shows, however, that when $g$ is continuous and $f$ is integrable, the composite function $g \circ f$ is integrable.


Theorem 5.24. Let $f$ be integrable on $[a, b]$, and let $m \le f(x) \le M$ for every $x \in [a, b]$. If $g$ is continuous on $[m, M]$, then the composite function $g \circ f$ is integrable on $[a, b]$.

Proof. Let $h := g \circ f$, and assume that $\varepsilon > 0$ is given. To establish our claim, we find a partition $P$ of $[a, b]$ such that $U(h, P) - L(h, P) < \varepsilon$. First, we use the uniform continuity of $g$ on $[m, M]$ to find $0 < \delta < \varepsilon$ such that $x, y \in [m, M]$ and $|x - y| < \delta$ imply $|g(x) - g(y)| < \varepsilon$. By the assumption that $f$ is integrable on $[a, b]$, we find a partition $P = \{x_0, x_1, \dots, x_n\}$ of $[a, b]$ such that
\[ U(f, P) - L(f, P) < \delta^2. \tag{5.22} \]
We claim that for this partition, the difference $U(h, P) - L(h, P)$ is less than a constant multiple of $\varepsilon$. It is clear that once this is proved, the integrability of $h$ on $[a, b]$ follows due to the fact that $\varepsilon$ is arbitrary. To prove the claim, we divide the set $\{1, \dots, n\}$ into two disjoint sets $A$ and $B$. To describe this division, let us denote by $M_i$ and $m_i$ the supremum and infimum of $f$ on $[x_{i-1}, x_i]$ and consider $M_i'$ and $m_i'$ as those of $h$ on this interval. Now for each $i \in \{1, \dots, n\}$, we have two cases as follows.

(1) $M_i - m_i < \delta$, in which case we say that $i \in A$. Then for all $s, t \in [x_{i-1}, x_i]$, $|f(s) - f(t)| \le M_i - m_i < \delta$ and hence, by our choice of $\delta$, $|h(s) - h(t)| = |g(f(s)) - g(f(t))| < \varepsilon$. Using the known properties of the supremum and infimum, it then follows that $M_i' - m_i' \le \varepsilon$.

(2) $M_i - m_i \ge \delta$, in which case we say that $i \in B$. Then (5.22) shows that
\[ \delta \sum_{i \in B} \Delta x_i = \sum_{i \in B} \delta\,\Delta x_i \le \sum_{i \in B} (M_i - m_i)\,\Delta x_i \le U(f, P) - L(f, P) < \delta^2, \]
that is,
\[ \sum_{i \in B} \Delta x_i < \delta. \]
Now letting $K = \sup\{|g(x)| : x \in [m, M]\}$, we see that $M_i' - m_i' \le 2K$, so that
\begin{align*}
U(h, P) - L(h, P) &= \sum_{i \in A} (M_i' - m_i')\,\Delta x_i + \sum_{i \in B} (M_i' - m_i')\,\Delta x_i \\
&\le \varepsilon(b - a) + 2K\delta < \varepsilon((b - a) + 2K).
\end{align*}

What does Theorem 5.24 say? Theorem 5.24 says that a continuous function of an integrable function is integrable.

Example 5.25. The function
\[ f(x) = \begin{cases} \frac{2}{n}, & x = \frac{m}{n} \text{ with } (m, n) = 1, \\ 0, & x \in \mathbb{Q}^c, \end{cases} \]
is integrable on $[0, 1]$ by Example 5.6 and Theorem 5.14(1). Also $f$ maps $[0, 1]$ into $[0, 2]$. Since the sine function is continuous on $[0, 2]$, the above theorem shows that the function
\[ h(x) = \begin{cases} \sin\frac{2}{n}, & x = \frac{m}{n} \text{ with } (m, n) = 1, \\ 0, & x \in \mathbb{Q}^c, \end{cases} \]
is integrable on $[0, 1]$.

5.3. Some Integrability Theorems

We now turn to our first main task in this chapter: determining those properties of functions which are sufficient for integrability. We observed in Example 5.12(1) that the continuity of the sine function on $[0, \pi/2]$ can be used to prove that it is integrable. We also saw in Theorem 5.24 that a continuous function of an integrable function is integrable. These observations lead us to guess that continuous functions are integrable. That this is indeed true follows from the following theorem.

Theorem 5.26. If $f$ is continuous on $[a, b]$, then $f \in \mathcal{R}([a, b])$.

Proof. Let $f$ be continuous on $[a, b]$. To prove that $f$ is integrable on $[a, b]$, we show that once $\varepsilon > 0$ is given, a partition $P$ of $[a, b]$ can be found such that $U(f, P) - L(f, P) < \varepsilon$. Since $f$ is continuous on $[a, b]$, it is uniformly continuous on this interval. We can find $\delta > 0$, corresponding to the given $\varepsilon$, such that $x, y \in [a, b]$ and $|x - y| < \delta$ imply $|f(x) - f(y)| < \frac{\varepsilon}{b - a}$. Now, our desired partition $P$ can be one whose norm is less than $\delta$. More precisely, if $P = \{x_0 = a, x_1, \dots, x_{n-1}, x_n = b\}$ is any partition of $[a, b]$, then the continuity of $f$ on $[a, b]$, in view of the extreme value theorem, gives us $u_i, v_i \in [x_{i-1}, x_i]$ such that $m_i = f(u_i)$ and $M_i = f(v_i)$, for each $i$. So if $\|P\| < \delta$, then $|u_i - v_i| \le x_i - x_{i-1} < \delta$ and hence by our choice of $\delta$,
\[ U(f, P) - L(f, P) = \sum_{i=1}^{n} (f(v_i) - f(u_i))\,\Delta x_i < \frac{\varepsilon}{b - a} \sum_{i=1}^{n} \Delta x_i = \varepsilon. \]
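0. Since $f$ is continuous at $x_0$, $\delta > 0$ can be found such that $x \in [a, b]$ and $|x - x_0| < \delta$ imply $|f(x) - M| < \varepsilon$. Since $f$ has nonnegative values, this shows that when $a \le x_0 < b$,
\[ \int_a^b f^n(x)\,dx \ge \int_{x_0}^{x_0 + \delta} (M - \varepsilon)^n\,dx = (M - \varepsilon)^n \delta, \]
and when $a < x_0 \le b$,
\[ \int_a^b f^n(x)\,dx \ge \int_{x_0 - \delta}^{x_0} (M - \varepsilon)^n\,dx = (M - \varepsilon)^n \delta. \]
Note that when $a \le x_0 < b$, we choose $\delta$ to be so small that $x_0 + \delta < b$, and when $a < x_0 \le b$, we choose $\delta$ in such a way that $x_0 - \delta > a$. Therefore, in each case
\[ \left( \int_a^b f^n(x)\,dx \right)^{1/n} \ge (M - \varepsilon)\,\delta^{1/n}. \]
Hence
\[ \liminf_{n \to \infty} \left( \int_a^b f^n(x)\,dx \right)^{1/n} \ge M - \varepsilon, \]
and since $\varepsilon > 0$ was arbitrary,
\[ \liminf_{n \to \infty} \left( \int_a^b f^n(x)\,dx \right)^{1/n} \ge M. \tag{5.25} \]
The desired result now follows from (5.24) and (5.25).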
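A small numerical experiment may make the limit in this argument more tangible. The sketch rests on several assumptions that are not stated in the surviving text: that $M$ denotes the maximum of the continuous nonnegative function $f$ on $[a, b]$, that the claim being proved is $\left(\int_a^b f^n(x)\,dx\right)^{1/n} \to M$, and that the illustrative choice $f(x) = x$ on $[0, 1]$ (so $M = 1$) may be evaluated with the closed form $\int_0^1 x^n\,dx = 1/(n+1)$. The function name `nth_root_of_integral` is hypothetical.

```python
def nth_root_of_integral(n):
    """(integral of x**n over [0, 1]) ** (1/n), using the closed form 1/(n + 1)."""
    return (1.0 / (n + 1)) ** (1.0 / n)

# The values should creep up toward M = 1 as n grows.
for n in (1, 10, 100, 1000, 10000):
    print(n, nth_root_of_integral(n))
```

The convergence is slow, which matches the factor $\delta^{1/n}$ in the lower bound: it tends to 1, but only at a logarithmic rate in the exponent.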


Example 5.28. Let $f$ be continuous and nonnegative on $[a, b]$. If $f(t_0) > 0$ for some $t_0 \in (a, b)$, prove that $\int_a^b f(x)\,dx > 0$.

Solution. We use the assumptions to find a subinterval $[x, y]$ of $(a, b)$ on which $f$ has positive values. Once such a subinterval is found, in view of the nonnegativity of $f$, we see that for the partition $Q := \{a, x, y, b\}$ of $[a, b]$,
\[ \int_a^b f(x)\,dx \ge L(f, Q) \ge \left( \min\{f(t) : t \in [x, y]\} \right)(y - x) > 0. \]
To find such points $x$ and $y$ in $(a, b)$, we first note that by Proposition 3.68 there exists $\delta > 0$ such that $z \in [a, b]$ and $|z - t_0| < \delta$ imply $f(z) > 0$. So, if we let $\delta_1 = (1/2) \min\{\delta, t_0 - a\}$, then $x := t_0 - \delta_1$ and $y := t_0$ are the points we were looking for. This is because $a = t_0 - (t_0 - a) < x < y < b$ and for every $z \in [x, y]$, $|z - t_0| = y - z \le y - x = \delta_1 < \delta$, so that $f(z) > 0$.

A note on Example 5.28. In view of Example 5.17, if we define $f(x) = 0$ for $x \ne 1/2$ and $f(1/2) = 1$, then $f$ is integrable on $[0, 1]$ and $\int_0^1 f(x)\,dx = 0$. This shows that the conclusion of Example 5.28 is not true for discontinuous functions.

The following corollary of Theorem 5.24 can be used effectively in what follows.

Corollary 5.29. The following statements are true.
(1) If $f$ and $g$ are integrable on $[a, b]$, then the function $fg$ is also.
(2) If $f$ is integrable on $[a, b]$, then $|f|$ is also and
\[ \left| \int_a^b f(x)\,dx \right| \le \int_a^b |f(x)|\,dx. \tag{5.26} \]

Proof. (1) By Theorem 5.14(2), the functions $f + g$ and $f - g$ are integrable on $[a, b]$. Theorem 5.24 shows that the functions $(f + g)^2$ and $(f - g)^2$ are integrable on $[a, b]$ as the composition of the continuous function $h(x) = x^2$ with $f + g$ and $f - g$, respectively. The integrability of $fg$ on $[a, b]$ now follows from the identity
\[ fg \equiv \frac{1}{4} \left( (f + g)^2 - (f - g)^2 \right). \]
(2) That $|f| \in \mathcal{R}([a, b])$ follows from the identity $|f| = g \circ f$ in which $g(x) = |x|$, the continuity of $g$, and Theorem 5.24. As for (5.26), choose $c$ from the set $\{-1, 1\}$ such that $c \int_a^b f(x)\,dx \ge 0$. Then $c f(x) \le |f(x)|$ for every $x \in [a, b]$ and hence by Theorem 5.14(1),
\[ \left| \int_a^b f(x)\,dx \right| = c \int_a^b f(x)\,dx = \int_a^b c f(x)\,dx \le \int_a^b |f(x)|\,dx. \]

Example 5.30. Let $f$ be continuous on $[0, 1]$, and let $f(0) = 0$ and $|f(x)| \le 1$ for every $x \in [0, 1]$. Prove that $\left| \int_0^1 f(x)\,dx \right| < 1$.


Solution. Since $f$ is continuous from the right at 0, we find $0 < \delta < 1$ such that for every $x \in (0, \delta)$, $|f(x) - f(0)| < \frac{1}{2}$. Now for the partition $P = \{0, \delta, 1\}$ of $[0, 1]$,
\[ \int_0^1 |f(x)|\,dx \le U(|f|, P) \le \frac{\delta}{2} + (1 - \delta) = 1 - \frac{\delta}{2} < 1. \]
The desired result now follows from Corollary 5.29(2), because $\left| \int_0^1 f(x)\,dx \right| \le \int_0^1 |f(x)|\,dx$.

Exercise 5.31. Show by means of an example that (1) $fg$ may be integrable when $f$ is integrable and $g$ is not, and (2) the integrability of $|f|$ on $[a, b]$ does not imply that of $f$ in general.

As a result of Corollary 5.29(2) we can use the Riemann integral to find areas in a broader context than the one discussed so far. In fact when $f$ is integrable on $[a, b]$, $\int_a^b |f(x)|\,dx$ represents the area of the region which is surrounded by the graph of $f$, the lines $x = a$ and $x = b$, and the $x$-axis. To understand why, consider Figure 12 in which the graph of an integrable function is drawn. As can be seen in the figure, the areas of the shaded regions coincide. This point is important when $f$ admits negative values on $[a, b]$, as otherwise $\int_a^b |f(x)|\,dx = \int_a^b f(x)\,dx$.

Figure 12. The areas of the shaded regions are equal.

Weakening the Continuity Property. Now that we are sure about the integrability of continuous functions, it is time to answer the first question posed in (5.d): Are continuous functions the only integrable ones? To answer this question we begin with a useful result.

Theorem 5.32. Let $f$ be bounded on $[a, b]$.
(1) If $f \in \mathcal{R}([c, b])$ for every $c \in (a, b)$, then $f \in \mathcal{R}([a, b])$ and
\[ \int_a^b f(x)\,dx = \lim_{c \to a^+} \int_c^b f(x)\,dx. \tag{5.27} \]


(2) If $f \in \mathcal{R}([a, c])$ for every $c \in (a, b)$, then $f \in \mathcal{R}([a, b])$ and
\[ \int_a^b f(x)\,dx = \lim_{c \to b^-} \int_a^c f(x)\,dx. \]

Proof. We only prove (1); the proof of (2) is quite similar and is left to the reader. Let $\varepsilon > 0$ be given. To prove the integrability of $f$ on $[a, b]$, we find a partition $P$ of this interval such that $U(f, P) - L(f, P) < \varepsilon$. Let $M$ and $m$ denote the supremum and infimum of the values of $f$ on $[a, b]$, respectively, and choose $\delta > 0$ so small that $(M - m)\delta < \frac{\varepsilon}{2}$ and $a + \delta < b$. Since, by our assumption, $f$ is integrable on $[a + \delta, b]$, we find a partition $Q$ of $[a + \delta, b]$ such that $U(f, Q) - L(f, Q) < \frac{\varepsilon}{2}$. If $Q = \{a + \delta, x_1, \dots, x_{n-1}, x_n = b\}$, consider the partition $P = \{a, a + \delta, x_1, \dots, x_{n-1}, x_n = b\}$ of $[a, b]$. Then $U(f, P) - L(f, P) \le (M - m)\delta + U(f, Q) - L(f, Q)$
0, then U (f, P ) − L(f, P ) < δ(f (b) − f (a)). Now, if ε > 0 is given, choosing δ > 0 such that δ(f (b) − f (a)) < ε and finding a partition P of [a, b] with ||P || < δ, we see that U (f, P ) − L(f, P ) < ε. 

This proves that f is integrable.
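The bound used in this proof, $U(f, P) - L(f, P) \le \|P\|\,(f(b) - f(a))$ for an increasing $f$, can be checked exactly on a non-uniform partition. The following is a sketch under assumptions of its own: the function $f(x) = x^3$ on $[0, 2]$, the partition `P`, and the helper name `gap_and_bound` are illustrative and do not come from the text.

```python
from fractions import Fraction as Fr

def gap_and_bound(f, partition):
    """Return (U - L, norm * (f(b) - f(a))) for an increasing function f."""
    pairs = list(zip(partition, partition[1:]))
    # For increasing f, M_i - m_i = f(x_i) - f(x_{i-1}) on each subinterval.
    gap = sum((f(right) - f(left)) * (right - left) for left, right in pairs)
    norm = max(right - left for left, right in pairs)
    return gap, norm * (f(partition[-1]) - f(partition[0]))

def cube(x):
    # Illustrative increasing function (an assumption of this sketch).
    return x ** 3

P = [Fr(0), Fr(1, 2), Fr(1), Fr(7, 4), Fr(2)]   # norm of P is 3/4
gap, bound = gap_and_bound(cube, P)
print(gap, bound)  # prints: 567/128 6
```

Choosing partitions of smaller norm shrinks the bound proportionally, which is how the proof forces the gap below any given $\varepsilon$.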

Example 5.36. We know that the integral part function $f(x) = [x]$ is increasing on $\mathbb{R}$. So, Theorem 5.35 shows that it is integrable on any interval $[a, b]$. Since $[x] = 0$ if $0 \le x < 1$ and $[1] = 1$, Example 5.17 says that $\int_0^1 [x]\,dx = 0$.

Example 5.37. The function ⎧ 1 ⎨ n f (x) = ⎩ 0

1 n+1

1 in N. This example shows that a function may be discontinuous at an infinite number of points and yet be integrable. The point is, however, that the points of discontinuity of f form a countable set, which is set-theoretically negligible compared to [0, 1], the specified domain of f , which is an uncountable set. Using an appropriate notion of size for subsets of R, known as Lebesgue measure, it can be shown that Riemann integrable functions on an interval [a, b] are precisely those bounded functions whose points of discontinuity form a negligible set or, more precisely, a set of measure zero. This result, which is known as Lebesgue’s characterization of Riemann integrability, is usually presented in a graduate course whose title is something like “Measure Theory and Integration” or “Real Analysis”.


5.4. Antiderivatives and the Fundamental Theorem of Calculus

Up to now we have developed a solid theory for the Riemann integral. We proved many useful properties for integrable functions and identified some classes of them. But how can we compute the Riemann integral of an integrable function? Clearly, the definition of the integral we considered is not of sufficient practical value for the calculation of many integrals. In this section we provide a powerful tool which will help us in the calculation of a wide class of integrals. This is the fundamental theorem of calculus which, besides being important for applications, is of great importance from a theoretical point of view.

To motivate the theorem, let us consider some $f \in \mathcal{R}([a, b])$ and a partition $P = \{x_0 = a, x_1, \dots, x_n = b\}$ of $[a, b]$. If $f$ is such that for some function $F$, $F'(x) = f(x)$ holds for every $x \in [a, b]$, then for each $i$, $F(x_i) - F(x_{i-1}) = F'(t_i)\,\Delta x_i = f(t_i)\,\Delta x_i$ for some $t_i \in (x_{i-1}, x_i)$ by the mean value theorem (Theorem 4.33). Thus, if $m_i$ and $M_i$ denote the infimum and supremum of the values of $f$ on $[x_{i-1}, x_i]$, then
\[ m_i\,\Delta x_i \le F(x_i) - F(x_{i-1}) \le M_i\,\Delta x_i. \tag{5.28} \]
Adding the inequalities in (5.28) together for $i = 1, \dots, n$, we obtain
\[ L(f, P) \le F(b) - F(a) \le U(f, P). \tag{5.29} \]
Now our definition of the upper and lower Riemann integrals combined with (5.29) yields that
\[ \underline{\int_a^b} f(x)\,dx \le F(b) - F(a) \le \overline{\int_a^b} f(x)\,dx. \]
Since we assumed that $f$ is integrable, the above inequalities give us the identity
\[ \int_a^b f(x)\,dx = F(b) - F(a), \]
which provides us with a practical way to calculate integrals. To state this more formally, we first introduce the notion of antiderivative.

Definition 5.38. Let $f$ and $F$ be functions defined on the same interval $I$. We say that $F$ is an antiderivative of $f$ on $I$ if for every $x$ in this interval, $F'(x) = f(x)$.

With this terminology, we can summarize our observations in the following theorem.

Theorem 5.39 (The Fundamental Theorem of Calculus). If $f$ is integrable on $[a, b]$ and $F$ is an antiderivative of $f$ on this interval, then
\[ \int_a^b f(x)\,dx = F(b) - F(a). \]
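Inequality (5.29), which drives the theorem, can be observed numerically. The sketch below is an illustration under assumptions not made in the text: it takes $f = \cos$ and $F = \sin$ on $[0, \pi/2]$ with equally spaced partitions, uses the hypothetical helper name `sums_for_cosine`, and relies on floating-point arithmetic.

```python
import math

def sums_for_cosine(n):
    """(L, U) for cos on [0, pi/2] with n equal subintervals."""
    h = math.pi / (2 * n)
    # cos is decreasing on [0, pi/2]: sup at the left endpoint, inf at the right.
    upper = sum(math.cos(i * h) * h for i in range(n))
    lower = sum(math.cos((i + 1) * h) * h for i in range(n))
    return lower, upper

# F(b) - F(a) with the antiderivative F = sin of f = cos.
exact = math.sin(math.pi / 2) - math.sin(0.0)

for n in (4, 64, 1024):
    lower, upper = sums_for_cosine(n)
    print(n, lower, exact, upper)
```

In each row the middle value $F(b) - F(a)$ should lie between the lower and upper sums, and the two sums should close in on it as $n$ grows.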


Why is Theorem 5.39 said to be fundamental? Theorem 5.39 enables us to compute many integrals. For example, since the sine function is an antiderivative of the cosine function on any interval,
\[ \int_0^{\pi/2} \cos x\,dx = \sin\frac{\pi}{2} - \sin 0 = 1. \]
But this is not the only reason we call it a “fundamental” theorem. The theorem is said to be fundamental because it joins differential and integral calculus together, the two important parts of calculus which have completely different roots. In fact, as we described before, differential calculus has its roots in the geometric problem of finding tangent lines, while integral calculus dates back to ancient times and is related to the problem of finding the area of plane regions. The fundamental theorem then joins these seemingly separate disciplines together by asserting the following statement. If you can find a function $F$ whose derivative is identical to the integrable function $f$ on $[a, b]$, then the integral of $f$ on $[a, b]$ is equal to $F(b) - F(a)$.

Example 5.40. We have already seen that the sine function is integrable on $[0, \frac{\pi}{2}]$ and have discussed the geometric meaning of $\int_0^{\pi/2} \sin x\,dx$ as an area. Using the fundamental theorem of calculus, we can compute this integral and, accordingly, the area of the shaded region in Figure 9. This is because $F(x) = -\cos x$ is an antiderivative of the sine function on $[0, \frac{\pi}{2}]$ (and also on any other interval). Hence
\[ \int_0^{\pi/2} \sin x\,dx = \cos 0 - \cos\frac{\pi}{2} = 1. \]

Example 5.41. Consider
\[ f(x) = \begin{cases} \sin x, & x \in \mathbb{Q}^c, \\ 0, & x \in \mathbb{Q}, \end{cases} \]
as a function on $[0, \frac{\pi}{2}]$. Does $f$ have an antiderivative on this interval?

Solution. Having an antiderivative on $[0, \frac{\pi}{2}]$ means that $f$ is identical to the derivative of some function $F$ on this interval. So when $f$ has an antiderivative on $[0, \frac{\pi}{2}]$, it should satisfy the intermediate value property on $[0, \frac{\pi}{2}]$ by Theorem 4.31. But $\sin 1$ lies between $f(0) = 0$ and $f(\frac{\pi}{2}) = 1$, and there is no $x$ in $[0, \frac{\pi}{2}]$ such that $f(x) = \sin 1$. This shows that $f$ fails to have an antiderivative on $[0, \frac{\pi}{2}]$.

Example 5.42. Find $\int_0^{\pi/4} \tan^2 x\,dx$.

Solution. We know that $F(x) = \tan x$ is an antiderivative for $f(x) = 1 + \tan^2 x$ on any interval on which the tangent function is defined. Writing $\tan^2 x = (1 + \tan^2 x) - 1$, we see that $G(x) = \tan x - x$ is an antiderivative of $g(x) = \tan^2 x$ on $[0, \frac{\pi}{4}]$. Thus, by the fundamental theorem of calculus,
\[ \int_0^{\pi/4} \tan^2 x\,dx = \left( \tan\frac{\pi}{4} - \frac{\pi}{4} \right) - (\tan 0 - 0) = 1 - \frac{\pi}{4}. \]
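The value $1 - \pi/4$ can be cross-checked against Riemann sums. This is only a sketch: it assumes equally spaced partitions of $[0, \pi/4]$, uses the fact that $\tan^2$ is increasing there (so lower and upper sums come from left and right endpoints), and introduces the hypothetical helper name `sums_for_tan_squared`.

```python
import math

def sums_for_tan_squared(n):
    """(L, U) for tan(x)**2 on [0, pi/4] with n equal subintervals."""
    h = math.pi / (4 * n)
    # tan**2 is increasing on [0, pi/4]: inf at the left endpoint, sup at the right.
    lower = sum(math.tan(i * h) ** 2 * h for i in range(n))
    upper = sum(math.tan((i + 1) * h) ** 2 * h for i in range(n))
    return lower, upper

exact = 1 - math.pi / 4          # G(pi/4) - G(0) with G(x) = tan x - x
lower, upper = sums_for_tan_squared(1000)
print(lower, exact, upper)
```

With $n = 1000$ the two sums differ by about $\pi/4000$, and the value obtained from the antiderivative $G$ should sit between them.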


Antiderivatives are not unique whenever they exist. That we always spoke of an antiderivative rather than the antiderivative has a simple reason: If $F$ is an antiderivative of $f$ on some interval $I$, then so is the function $F + c$ for an arbitrary real number $c$. This is because the derivative of a constant function is identically zero. It is interesting that the converse of this is also true. See Exercise 34 at the end of this chapter.

Example 5.43. Define a function $h$ by $h(x) = x$ if $x \in \mathbb{Q}$ and by $h(x) = 0$ otherwise. Is $h$ integrable on $[0, 1]$?

Solution. Let $P = \{x_0 = 0, x_1, \dots, x_{n-1}, x_n = 1\}$ be any partition of $[0, 1]$. Since $h(x) \ge 0$ for every $x \in [0, 1]$ and each subinterval $[x_{i-1}, x_i]$ contains irrational numbers, $L(h, P) = 0$. This, in view of the fact that $P$ was arbitrary, implies that $\underline{\int_0^1} h(x)\,dx = 0$. To compute the upper sum $U(h, P)$, we first notice that for each subinterval $[c, d]$ of $[0, 1]$,
\[ d = \sup\{h(x) : x \in [c, d]\}. \tag{5.30} \]
In fact, that $d$ is an upper bound for the set follows directly from our definition of $h$. But if $l < d$ is given, we may use Theorem 1.49 to find a rational number $q$ satisfying $\max\{c, l\} < q < d$, and then $h(q) = q > l$. This shows that (5.30) is indeed true. As a result of this observation we see that $U(h, P) = U(f, P)$, where $f$ is the identity function $f(x) = x$ on $[0, 1]$. Thus we conclude that
\[ \overline{\int_0^1} h(x)\,dx = \overline{\int_0^1} f(x)\,dx = \int_0^1 x\,dx = \left[ \frac{x^2}{2} \right]_0^1 = \frac{1}{2}. \]
Since the upper and lower integrals of $h$ on $[0, 1]$ are not the same, we find that $h$ is not integrable on this interval.

On the existence of antiderivatives and Riemann integrability. The requirements that $f$ is integrable and that it has an antiderivative, which were used as our hypotheses in the fundamental theorem, are indeed independent of one another. In fact, a function $f$ may be integrable without having an antiderivative (see Example 5.44) and may have an antiderivative without being integrable (see Example 5.46).

Example 5.44. The signal function
\[ \operatorname{sgn}(x) = \begin{cases} -1, & x < 0, \\ 0, & x = 0, \\ 1, & x > 0, \end{cases} \]
is integrable on $[-1, 1]$ by Theorem 5.34, as it is continuous at every point $x \ne 0$ of $[-1, 1]$. Nevertheless, the function fails to have an antiderivative on $[-1, 1]$ because


it does not satisfy the intermediate value property on [−1, 1]. To compute the integral of this function on [−1, 1], we first use Theorem 5.19 to obtain  0  1  1 sgn(x) dx = sgn(x) dx + sgn(x) dx, −1

−1

0

and next we use the consequence of Example 5.18 to deduce  0  1  1 sgn(x) dx = − dx + dx, = −1 + 1 = 0. −1

−1

0

Exercise 5.45. Although we observed that the signal function has no antiderivative on [−1, 1], verify that by defining h(x) = |x| on [−1, 1] it follows that h′(x) = sgn(x) for every nonzero x and $\int_{-1}^{1} \operatorname{sgn}(x)\,dx = h(1) - h(-1)$.

Example 5.46. Define a function F on [0, 1] by $F(x) = x^2 \sin\frac{\pi}{x^2}$ when 0 < x ≤ 1 and F(0) = 0. Then F is differentiable on [0, 1] with derivative

\[ f(x) = \begin{cases} 2x \sin\dfrac{\pi}{x^2} - \dfrac{2\pi}{x} \cos\dfrac{\pi}{x^2}, & 0 < x \le 1, \\ 0, & x = 0. \end{cases} \]

Thus F is an antiderivative of f on [0, 1]. The function f, however, is not integrable on [0, 1] because it is not bounded on this interval.

On the Existence of Antiderivatives. Up to now we observed that integrability does not imply the existence of antiderivatives. As a result of the following theorem, we find that continuity is a sufficient condition for the possession of antiderivatives.

Theorem 5.47. Let f be integrable on [a, b] and define a function F on this interval by

\[ F(x) = \int_a^x f(t)\,dt. \]

Then F is continuous on [a, b]. Also, at every x at which f is continuous, F is differentiable and F′(x) = f(x).

Proof. Let M be an upper bound for the values of |f| on [a, b] and consider x, y ∈ [a, b] with x ≤ y. Then

\[ |F(x) - F(y)| = \left| \int_a^x f(t)\,dt - \int_a^y f(t)\,dt \right| = \left| \int_x^y f(t)\,dt \right| \le \int_x^y |f(t)|\,dt \le M(y - x) = M|y - x|. \]

This shows that F, being a Lipschitz function, is uniformly continuous (see Exercise 61 at the end of Chapter 3). Now, suppose that f is continuous at some x. To prove that F is differentiable at x and F′(x) = f(x), we show that the quotient $\frac{F(y) - F(x)}{y - x}$ tends to f(x) as y approaches x. To see this, note that for every y with a ≤ x < y ≤ b,

\[ \begin{aligned} \left| \frac{F(y) - F(x)}{y - x} - f(x) \right| &= \left| \frac{(F(y) - F(x)) - f(x)(y - x)}{y - x} \right| \\ &= \frac{1}{|y - x|} \left| \int_x^y f(t)\,dt - \int_x^y f(x)\,dt \right| \\ &\le \frac{1}{|y - x|} \int_x^y |f(t) - f(x)|\,dt. \end{aligned} \tag{5.31} \]

When y is such that a ≤ y < x ≤ b, similar reasoning shows that

\[ \left| \frac{F(y) - F(x)}{y - x} - f(x) \right| \le \frac{1}{|y - x|} \int_y^x |f(t) - f(x)|\,dt. \tag{5.32} \]

Now let ε > 0 be given. Since f is continuous at x, there exists δ > 0 such that for every t ∈ [a, b] with |t − x| < δ, |f(t) − f(x)| < ε/2. So, if y ≠ x is an element of [a, b] whose distance from x is less than δ, then inequalities (5.31) and (5.32) show that

\[ \left| \frac{F(y) - F(x)}{y - x} - f(x) \right| \le \frac{\varepsilon}{2} < \varepsilon. \]

This means that F′(x) exists and is equal to f(x), as desired.
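The two conclusions of Theorem 5.47 can be illustrated numerically. The sketch below is not part of the text's argument: it assumes a hypothetical step function f on [0, 2] (bounded by M = 1 and discontinuous only at t = 1), approximates F by a midpoint rule, and compares it with the closed form F(x) = max(0, x − 1), which is an elementary computation for this particular f. All function names are illustrative.

```python
def f(t):
    """Hypothetical integrand: a step function on [0, 2] with a jump at t = 1."""
    return 0.0 if t < 1 else 1.0


def F_numeric(x, n=20_000):
    """Midpoint-rule approximation of the integral of f over [0, x]."""
    h = x / n
    return h * sum(f((i + 0.5) * h) for i in range(n))


def F_exact(x):
    """Closed form of the integral of f over [0, x] for 0 <= x <= 2."""
    return max(0.0, x - 1.0)


def one_sided_quotients(G, c, h=1e-6):
    """Left and right difference quotients of G at c with step h."""
    left = (G(c) - G(c - h)) / h
    right = (G(c + h) - G(c)) / h
    return left, right


def lipschitz_holds(G, points, M, tol=1e-12):
    """Check |G(x) - G(y)| <= M |x - y| on a finite sample of points."""
    return all(abs(G(x) - G(y)) <= M * abs(x - y) + tol
               for x in points for y in points)


if __name__ == "__main__":
    sample = [0.0, 0.4, 0.9, 1.0, 1.3, 2.0]
    print("Lipschitz estimate holds on sample:", lipschitz_holds(F_exact, sample, M=1.0))
    print("midpoint error at x = 1.5:", abs(F_numeric(1.5) - F_exact(1.5)))
    print("quotients at 0.5 (f continuous):", one_sided_quotients(F_exact, 0.5))
    print("quotients at 1.0 (f jumps):", one_sided_quotients(F_exact, 1.0))
```

In this example the one-sided quotients agree with f at points where f is continuous and disagree with each other at the jump, which is consistent with the theorem asserting differentiability only at points of continuity of f.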



Notes on Theorem 5.47. It follows from Theorem 5.47 that (1) every continuous function has an antiderivative, and (2) using any integrable function, we can define a uniformly continuous function. Regarding (1), Examples 5.41 and 5.44 show that a discontinuous function may fail to have an antiderivative. The interesting fact about (2) is that even when an integrable function is not continuous, it can be used to define a uniformly continuous function.

Example 5.48. Using Theorem 5.47, we can present another solution for Example 5.20. In fact, if F is defined by $F(x) = \int_0^x f(t)\,dt$, then for each n ∈ N, $\alpha_n = F\left(\frac{1}{n}\right)$. By Theorem 5.47, F is continuous on [0, 1]. Now, since $\{\frac{1}{n}\}$ converges to 0, $\{\alpha_n\}$ converges to $F(0) = \int_0^0 f(t)\,dt = 0$.

We make the convention that when a < b and f ∈ R([a, b]),

\[ \int_b^a f(t)\,dt = -\int_a^b f(t)\,dt. \]

Example 5.49. Let f be continuous on R. For each n ∈ N, define a function $f_n$ by

\[ f_n(x) = n \int_x^{x + \frac{1}{n}} f(t)\,dt. \tag{5.33} \]

Prove that the functions $f_n$ are differentiable on R, even when f is not.

Solution. Define F on R by $F(x) = \int_0^x f(t)\,dt$. Then F is differentiable and for every x, F′(x) = f(x). Now, our definition of the $f_n$'s implies

\[ f_n(x) = n \left( F\left(x + \frac{1}{n}\right) - F(x) \right), \]

and therefore

\[ f_n'(x) = n \left( f\left(x + \frac{1}{n}\right) - f(x) \right) \]

for every x ∈ R. The function f(x) = |x| is not differentiable on R, but the functions $f_n$ defined by (5.33) are all differentiable, as we observed above.
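As a quick numerical sanity check of Example 5.49 for f(x) = |x|, the following sketch compares the derivative formula from the solution with a central difference quotient. It assumes the closed form F(x) = x|x|/2 for the antiderivative of |x| vanishing at 0 (an elementary fact not stated in the text); the helper names are illustrative.

```python
def F(x):
    """Antiderivative of f(t) = |t| with F(0) = 0, namely x|x|/2."""
    return x * abs(x) / 2


def f_n(x, n):
    """Formula (5.33) for f(t) = |t|, written as n * (F(x + 1/n) - F(x))."""
    return n * (F(x + 1 / n) - F(x))


def f_n_prime(x, n):
    """Derivative predicted by the solution: n * (f(x + 1/n) - f(x))."""
    return n * (abs(x + 1 / n) - abs(x))


def central_difference(g, x, h=1e-6):
    """Symmetric difference quotient approximating g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)


if __name__ == "__main__":
    for n in (1, 4, 10):
        for x in (-1.0, -0.25, 0.0, 0.3):
            approx = central_difference(lambda t: f_n(t, n), x)
            print(n, x, approx, f_n_prime(x, n))
```

The sample points deliberately include x = 0, where f itself fails to be differentiable; the two printed columns should agree up to the discretization error of the difference quotient.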

Example 5.50. Define a function f on [0, 3] by

\[ f(x) = \begin{cases} x, & 0 \le x < 1, \\ 1, & 1 \le x < 2, \\ x, & 2 \le x \le 3. \end{cases} \]

Find the function F which is associated to f by the formula $F(x) = \int_0^x f(t)\,dt$. Verify that F is continuous on [0, 3] and determine the points at which F is differentiable.

Solution. It is clear that $F(0) = \int_0^0 f(t)\,dt = 0$ and $F(1) = \int_0^1 t\,dt = \frac{1}{2}$. Since Example 5.18 shows that $\int_1^2 f(t)\,dt = \int_1^2 dt$,

\[ F(2) = \int_0^2 f(t)\,dt = \int_0^1 t\,dt + \int_1^2 dt = \frac{1}{2} + 1 = \frac{3}{2}. \]

If 0 < x < 1, then

\[ F(x) = \int_0^x t\,dt = \frac{x^2}{2}, \]

when 1 < x < 2,

\[ F(x) = \int_0^x f(t)\,dt = \int_0^1 t\,dt + \int_1^x dt = \frac{1}{2} + (x - 1) = x - \frac{1}{2}, \]

and for 2 < x ≤ 3,

\[ F(x) = \int_0^x f(t)\,dt = \int_0^1 t\,dt + \int_1^2 dt + \int_2^x t\,dt = \frac{x^2}{2} - \frac{1}{2}. \]

So, in summary, F can be written as

\[ F(x) = \begin{cases} \dfrac{x^2}{2}, & 0 \le x \le 1, \\ x - \dfrac{1}{2}, & 1 < x \le 2, \\ \dfrac{x^2}{2} - \dfrac{1}{2}, & 2 < x \le 3. \end{cases} \]

Since $\lim_{x \to 1} F(x) = F(1) = \frac{1}{2}$ and $\lim_{x \to 2} F(x) = F(2) = \frac{3}{2}$, F is continuous on [0, 3]. Also, using the limit definition of the derivative at 1 and 2, it follows that $F'_+(1) = F'_-(1) = 1$, $F'_+(2) = 2$ and $F'_-(2) = 1$. Thus, the only point at which F is not differentiable is 2. Note that 2 is the only point at which f is not continuous.
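The piecewise formula for F and the one-sided derivatives found in Example 5.50 can be checked numerically. The sketch below is only an illustration: the midpoint rule, the step sizes, and the helper names are assumptions of this example, not part of the text.

```python
def f(t):
    """The integrand of Example 5.50 on [0, 3]."""
    if t < 1:
        return t
    if t < 2:
        return 1.0
    return t


def F(x):
    """The piecewise formula obtained in the solution."""
    if not 0 <= x <= 3:
        raise ValueError("x must lie in [0, 3]")
    if x <= 1:
        return x * x / 2
    if x <= 2:
        return x - 0.5
    return x * x / 2 - 0.5


def F_numeric(x, n=3_000):
    """Midpoint-rule approximation of the integral of f over [0, x]."""
    h = x / n
    return h * sum(f((i + 0.5) * h) for i in range(n))


def one_sided_derivatives(G, c, h=1e-6):
    """Approximate left and right derivatives of G at c."""
    return (G(c) - G(c - h)) / h, (G(c + h) - G(c)) / h


if __name__ == "__main__":
    for x in (0.5, 1.0, 1.7, 2.0, 2.5, 3.0):
        print(x, F(x), F_numeric(x))
    print("at 1:", one_sided_derivatives(F, 1.0))
    print("at 2:", one_sided_derivatives(F, 2.0))
```

The one-sided quotients at 1 should be close to each other, while those at 2 should be close to 1 and 2 respectively, matching the conclusion that 2 is the only point of non-differentiability.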

Some Integration Techniques. Although the fundamental theorem of calculus enables us to calculate the Riemann integral of many important functions, it can be inadequate for the computation of many others. Instances include $\int_1^2 \frac{\sin\sqrt{t}}{\sqrt{t}}\,dt$ and $\int_0^{\pi/2} x \cos x\,dx$. To compute such integrals, we need to be familiar with formulas that enable us to replace the integrals by simpler ones. Such formulas, which constitute a large and important part of the calculus-based theory of integration, are usually known as integration techniques. In what follows, we present a mathematically rigorous treatment of two of the most important techniques of this kind, namely, the change of variables formula and integration by parts.

The change of variables formula enables us to calculate integrals such as $\int_1^2 \frac{\sin\sqrt{t}}{\sqrt{t}}\,dt$. If we denote this integral by I, then by considering f(t) = sin t and $g(t) = \sqrt{t}$, we see that $I = 2\int_1^2 f(g(t))\,g'(t)\,dt$. So to compute I, it is sufficient to be able to find integrals of the form $\int_a^b f(g(t))\,g'(t)\,dt$.

Theorem 5.51 (The Change of Variables Formula). Let g be differentiable on [a, b] with a continuous derivative, and let f be continuous on g([a, b]). Then

\[ \int_a^b f(g(t))\,g'(t)\,dt = \int_{g(a)}^{g(b)} f(x)\,dx. \tag{5.34} \]

Proof. Define a function F by $F(u) = \int_{g(a)}^{u} f(x)\,dx$ for every u ∈ g([a, b]). By the chain rule and the fact that F′ ≡ f,

\[ (F(g(t)))' = F'(g(t))\,g'(t) = f(g(t))\,g'(t). \]

Thus by the fundamental theorem of calculus,

\[ \int_a^b f(g(t))\,g'(t)\,dt = F(g(b)) - F(g(a)), \]

and the result follows from the equality F(g(a)) = 0.
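Formula (5.34) can be tested numerically on a concrete pair of functions. The sketch below assumes the illustrative choice g(t) = t² and f(x) = cos x on [0, 1] (not taken from the text) and approximates both sides with a midpoint rule; for this choice the right-hand side equals sin 1 by the fundamental theorem of calculus.

```python
import math


def midpoint(func, a, b, steps=2_000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))


def g(t):
    """Illustrative substitution x = g(t) = t^2."""
    return t * t


def g_prime(t):
    return 2 * t


f = math.cos  # illustrative integrand, continuous on g([0, 1]) = [0, 1]

a, b = 0.0, 1.0
left_side = midpoint(lambda t: f(g(t)) * g_prime(t), a, b)
right_side = midpoint(f, g(a), g(b))

if __name__ == "__main__":
    print("left side of (5.34): ", left_side)
    print("right side of (5.34):", right_side)
    print("sin(1):              ", math.sin(1.0))
```

The three printed values should agree up to the quadrature error of the midpoint rule.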



Why is (5.34) known as the change of variables formula? Informally speaking, the right-hand side of (5.34) can be obtained from the left by letting x = g(t). This can be described by saying that when x = g(t), f(g(t)) = f(x), g′(t) dt = dx and the integral bounds a and b for t are replaced by g(a) and g(b) for x. Thus, (5.34) is known as the change of variables formula because it relies on changing the variable t into x = g(t).

Example 5.52. Let f be a continuous function defined on R such that for every y ∈ R, $\int_0^1 f(ty)\,dt = 0$. Show that for every x ∈ R, f(x) = 0.

Solution. For every x ≠ 0, the change of variable u = tx yields that

\[ 0 = \int_0^1 f(tx)\,dt = \int_0^x \frac{f(u)}{x}\,du = \frac{1}{x} \int_0^x f(u)\,du. \]

Thus, for every x ∈ R,

\[ \int_0^x f(u)\,du = 0. \]

This gives

\[ f(x) = \frac{d}{dx} \int_0^x f(u)\,du = 0, \]

which is our desired result.

Exercise 5.53. Find the value of $\int_1^2 \frac{\sin\sqrt{t}}{\sqrt{t}}\,dt$.

We conclude this chapter with one of the most important integration techniques one learns in calculus.

Theorem 5.54 (Integration by Parts). Let f and g be differentiable on [a, b] with f′, g′ ∈ R([a, b]). Then

\[ \int_a^b f(x)\,g'(x)\,dx = \bigl(f(b)g(b) - f(a)g(a)\bigr) - \int_a^b f'(x)\,g(x)\,dx. \tag{5.35} \]

Proof. By the product rule for derivatives, for every x,

\[ (f(x)g(x))' = f(x)\,g'(x) + f'(x)\,g(x). \tag{5.36} \]

Since the functions that appear in both sides of (5.36) are integrable on [a, b] by our assumptions on f and g, it follows from Theorem 5.14(2) that

\[ \int_a^b (f(x)g(x))'\,dx = \int_a^b f(x)\,g'(x)\,dx + \int_a^b f'(x)\,g(x)\,dx. \tag{5.37} \]

Since fg is an antiderivative of (fg)′ on [a, b], by the fundamental theorem of calculus,

\[ \int_a^b (f(x)g(x))'\,dx = f(b)g(b) - f(a)g(a). \tag{5.38} \]

Now (5.35) follows from (5.37) and (5.38).
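As with the change of variables formula, (5.35) can be checked numerically on an example. The sketch below assumes the illustrative choice f(x) = x² and g(x) = eˣ on [0, 1] (not taken from the text); for this pair both sides equal e − 2, since $\int_0^1 x e^x\,dx = 1$.

```python
import math


def midpoint(func, a, b, steps=2_000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))


# Illustrative choice of f and g with their derivatives.
def f(x):
    return x * x


def f_prime(x):
    return 2 * x


g = math.exp
g_prime = math.exp

a, b = 0.0, 1.0
left_side = midpoint(lambda x: f(x) * g_prime(x), a, b)
boundary_term = f(b) * g(b) - f(a) * g(a)
right_side = boundary_term - midpoint(lambda x: f_prime(x) * g(x), a, b)

if __name__ == "__main__":
    print("left side of (5.35): ", left_side)
    print("right side of (5.35):", right_side)
    print("e - 2:               ", math.e - 2)
```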



Example 5.55. Let f be a differentiable function with continuous derivative on [a, b]. Prove that

\[ \lim_{n \to \infty} \int_a^b f(x) \sin nx\,dx = 0. \]

Solution. Let $I_n = \int_a^b f(x) \sin nx\,dx$. Using integration by parts, we see that

\[ I_n = \frac{f(a) \cos na - f(b) \cos nb}{n} + \frac{1}{n} \int_a^b f'(x) \cos nx\,dx. \]

Since both |f| and |f′| are continuous on [a, b], they attain their maxima on this interval, which we denote by α and β, respectively. Thus,

\[ |I_n| \le \frac{2\alpha}{n} + \frac{\beta}{n} \int_a^b dx, \]

from which we obtain the desired result by letting n tend to infinity.

Exercise 5.56. Find the value of $\int_0^{\pi/2} x \cos x\,dx$.
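The estimate obtained in Example 5.55 can be observed numerically. The following sketch assumes the illustrative choice f(x) = x² on [0, 1], for which α = max|f| = 1 and β = max|f′| = 2, so the bound reads |Iₙ| ≤ 4/n. The midpoint rule and its step count are assumptions of this example; the step count is taken large so that the oscillation of sin nx is resolved.

```python
import math


def midpoint(func, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))


def I(n, a=0.0, b=1.0):
    """Approximation of the integral of x^2 sin(nx) over [a, b]."""
    return midpoint(lambda x: x * x * math.sin(n * x), a, b)


def bound(n, alpha=1.0, beta=2.0, a=0.0, b=1.0):
    """The bound 2*alpha/n + beta*(b - a)/n from the solution."""
    return 2 * alpha / n + beta * (b - a) / n


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, I(n), bound(n))
```

Each computed value should stay below the corresponding bound, and both columns shrink as n grows.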

Notes on Essence and Generalizability

In this chapter we introduced the Riemann integral using the approach of Darboux. As we saw, the approach uses the concepts of supremum and infimum, and it is therefore not presentable in a calculus course. The material of this chapter aimed at completing our calculus-based knowledge of the Riemann integral and is not among what we will generalize in the chapters of the second part.

The Riemann integral can be generalized to a notion of an integral that uses the abstract concept of measure. Such a generalization is usually studied in a graduate course entitled “Measure Theory and Integration” or similar. Describing the content of such a course is not possible at this stage. We will just note that the above-mentioned concept of measure is a generalization of the concepts of length and area in the real line and in the plane, respectively. Excellent resources for the theory of measures are [12, 29].

The only place in which the concept of Riemann integrability appears in the second part of the book is Chapter 9, where we examine its relation to the uniform convergence of sequences of functions.

Exercises

1. Complete the proof of Lemma 5.10 by showing that if Q is a refinement of P, then for every bounded function f, L(f, P) ≤ L(f, Q).

2. Complete the proof of Theorem 5.19 by showing that if f ∈ R([a, b]), then f ∈ R([c, b]) for every a < c < b.

3. Complete the proof of Theorem 5.32.

4. Suppose $P_1$ and $P_2$ are partitions of an interval [a, b]. If $\|P_1\| \le \|P_2\|$, does it necessarily follow that $P_1$ is a refinement of $P_2$?

5. Let f be bounded on [a, b]. If a partition P of [a, b] exists such that U(f, P) = L(f, P), prove that f is constant on [a, b].

6. If f is integrable on [a, b] and f(x) = 0 for every x ∈ Q ∩ [a, b], prove that $\int_a^b f(x)\,dx = 0$.

7. Suppose that f is integrable on [a, b]. If a positive constant M can be found such that f(x) ≥ M for every x ∈ [a, b], prove that the function 1/f is also integrable on [a, b].


8. Define a function f on [0, 1] by

\[ f(x) = \begin{cases} 1, & x = \frac{1}{n} \text{ for some } n \in \mathbb{N}, \\ 0, & \text{otherwise}. \end{cases} \]

Prove, directly from the definition, that f is integrable on [0, 1] and find its integral.

9. Consider the function

\[ g(x) = \begin{cases} \sin x, & x \in \mathbb{Q}^c, \\ 0, & x \in \mathbb{Q}. \end{cases} \]

Is g integrable on $[0, \frac{\pi}{2}]$?

10. Find the upper and lower integrals of the following function on [−1, 1]:

\[ h(x) = \begin{cases} x, & x \in \mathbb{Q}, \\ -x, & x \in \mathbb{Q}^c. \end{cases} \]

11. Let f be bounded on [a, b]. If there exists a sequence $\{P_n\}$ of partitions of [a, b] such that $\lim_{n \to \infty} (U(f, P_n) - L(f, P_n)) = 0$, prove that f ∈ R([a, b]) and

\[ \int_a^b f(x)\,dx = \lim_{n \to \infty} U(f, P_n) = \lim_{n \to \infty} L(f, P_n). \]

Is the converse of this also true?

12. Use the previous exercise to show that the functions f(x) = x and g(x) = x² are integrable on [0, 1] and verify that $\int_0^1 f(x)\,dx = \frac{1}{2}$ and $\int_0^1 g(x)\,dx = \frac{1}{3}$.

13. Let f be bounded on [a, b]. Prove that the following conditions are equivalent.
(a) The function f is integrable on [a, b] and $\int_a^b f(x)\,dx = \alpha$.
(b) If $\{P_n\}$ is a sequence of partitions of [a, b] such that $\lim_{n \to \infty} \|P_n\| = 0$, then $\lim_{n \to \infty} U(f, P_n) = \lim_{n \to \infty} L(f, P_n) = \alpha$.

14. Show that a bounded function f is integrable on [a, b] if and only if the following assertion is true. For every ε > 0, there exists δ > 0 such that for each partition P of [a, b], ||P|| < δ implies U(f, P) − L(f, P) < ε.

15. Suppose f is integrable on [a, b] and, for every n ∈ N, let

\[ S_n = \sum_{i=1}^{n} f(a + ih)\,h, \]

where $h = \frac{b - a}{n}$. Prove that the sequence $\{S_n\}$ converges to $\int_a^b f(x)\,dx$.

16. Let f and g be continuous on [0, 1], and let $\int_0^1 f(x)\,dx = \int_0^1 g(x)\,dx$. Prove that c ∈ [0, 1] can be found such that f(c) = g(c). Does it follow that f(x) = g(x) for every x ∈ [0, 1]?

17. Let f, g, and h be bounded on [a, b] and for every x ∈ [a, b], f(x) ≤ g(x) ≤ h(x). If f and h are integrable on [a, b] and $\int_a^b f(x)\,dx = \int_a^b h(x)\,dx = \alpha$, prove that g is also integrable on [a, b] and $\int_a^b g(x)\,dx = \alpha$.

18. Find nonintegrable functions f and g whose product fg is integrable.


19. Let f be continuous on [a, b] with the property that for every continuous function g on [a, b], $\int_a^b f(x)g(x)\,dx = 0$. Prove that f is identically zero on [a, b].

20. If g is integrable on [0, 1], show that $\lim_{n \to \infty} \int_0^1 x^n g(x)\,dx = 0$.

21. Let f be bounded on [a, b]. Use the assumption that the functions f and |f| are integrable and the inequalities −|f(x)| ≤ f(x) ≤ |f(x)| are valid for every x to present an alternate proof of Corollary 5.29(2). This shows that by assuming the integrability of |f|, there is no need to use Theorem 5.24 in the proof of Corollary 5.29(2).

22. Find an interval [a, b] and a function f ∈ R([a, b]) such that
(a) $\left| \int_a^b f(x)\,dx \right| = \int_a^b |f(x)|\,dx$.
(b) $\left| \int_a^b f(x)\,dx \right| < \int_a^b |f(x)|\,dx$.

23. Let f be a function defined on some interval [−a, a] with a > 0. We say that f is odd (resp., even) if for every x ∈ [−a, a], f(−x) = −f(x) (resp., f(−x) = f(x)).
(a) If f is odd, prove that $\int_{-a}^{a} f(x)\,dx = 0$.
(b) If f is even, prove that $\int_{-a}^{a} f(x)\,dx = 2\int_0^a f(x)\,dx$.
Can you geometrically interpret these results?

24. Let f be a continuous function from [0, 1] into itself. Show that the equation

\[ 2x - \int_0^x f(t)\,dt = 1 \]

has a unique solution in [0, 1].

25. Let f be continuous on [a, b]. Define a function g on R by

\[ g(x) = \int_0^x (x - t) f(t)\,dt. \]

Prove that for every x ∈ R, g″(x) = f(x).

26. Suppose f has a continuous second derivative on R. Show that for every x ∈ R, $f(x) = f(0) + f'(0)x + \int_0^x (x - t) f''(t)\,dt$.

27. If f denotes the signal function defined on [−1, 1], find the function F which is associated to f by the formula $F(x) = \int_{-1}^{x} f(t)\,dt$ for every x ∈ [−1, 1]. Then, determine the points at which F is not differentiable.

28. Suppose f is a continuous function and g is differentiable on [a, b]. If $x_0$ ∈ [a, b] is arbitrary and we define F on [a, b] by

\[ F(x) = \int_{x_0}^{g(x)} f(t)\,dt, \]

then prove that F is differentiable on [a, b] and that F′(x) = f(g(x))g′(x) for every x in [a, b].

29. Use the above exercise to calculate F′ in each case and on the given interval.
(a) $F(x) = \int_0^{\sin x} \cos t\,dt$ on [0, π/2].
(b) $F(x) = \int_1^{e^x} \ln t\,dt$ on [0, 2].


30. Assume that f is a nonnegative continuous function and that there exists a real number A ≥ 0 such that for every x ∈ [a, b], $f(x) \le A \int_a^x f(t)\,dt$. Prove that f is identically zero on [a, b].

31. Use integration by parts to show that for every f ∈ R([0, 1]),

\[ \int_0^1 \left( \int_x^1 f(t)\,dt \right) dx = \int_0^1 t f(t)\,dt. \]

32. Prove that for all natural numbers m and n,

\[ \int_0^1 x^m (1 - x)^n\,dx = \int_0^1 x^n (1 - x)^m\,dx. \]

33. Let f be a differentiable one-to-one function, and let g be an antiderivative of f. If $f^{-1}$ is the inverse of f, prove that $h(x) = x f^{-1}(x) - g(f^{-1}(x))$ is an antiderivative of $f^{-1}$.

34. If F is an antiderivative of f on some interval I, prove that for every antiderivative G of f on I a constant c can be found such that G(x) = F(x) + c for every x ∈ I.

In the following exercises we investigate an extension of the integral to unbounded intervals and show the way it can be applied in the theory of real series.

35. Suppose f is integrable on [a, x] for every x > a. Define the improper integral of f on [a, +∞), denoted by $\int_a^\infty f(t)\,dt$, to be

\[ \lim_{x \to +\infty} \int_a^x f(t)\,dt. \]

Say that the improper integral converges to the above limit whenever it exists; otherwise, say that the integral diverges.
(a) Prove that $\int_1^\infty \frac{1}{1 + x^2}\,dx$ converges to π/4.
(b) Observe that $\int_1^\infty \frac{1}{x^p}\,dx$ converges if and only if p > 1.

36. Suppose f is nonnegative and decreasing on [1, +∞). Prove that the improper integral $\int_1^\infty f(x)\,dx$ converges if and only if the series $\sum_{n=1}^\infty f(n)$ converges. This result is known as the integral test for the convergence of series.

37. Use the integral test to determine the convergence or divergence of the following series.
(a) $\sum_{n=1}^\infty \frac{1}{n^2 + 1}$.
(b) $\sum_{n=1}^\infty \frac{1}{n^p}$ for p > 0.
(c) $\sum_{n=2}^\infty \frac{1}{n \ln n}$.

Part 2

Abstraction and Generalization

Chapter 6

Basic Theory of Metric Spaces

One of the most important aspects of beauty in modern mathematics is its ability to unify: Pure mathematicians often try to unify those arguments which have a common theme by gathering them in a single theory. Usually, such theories are developed in a general framework and involve many concrete situations as their special cases. Moreover, such theories may have no, or few, applications in other sciences, at least when they are still new. For these reasons, we usually know theories of this kind as abstract theories. The process of developing an abstract theory is, naturally, known as abstraction. The starting point for abstraction is to find a common theme within some seemingly different arguments.

Our aim in this part of the book is to describe what we discussed in the above paragraph. To do so, we develop a theory for metric spaces. As we will see shortly, a metric space is a set together with an appropriate distance function or metric, enabling us to think of the distance between two given points of the set. So, what we are going to develop is an abstract theory entitled the theory of metric spaces. Before proceeding to the development, let us pose a natural question: Why should we care about the notion of distance in a generic set? To answer this question, we try to find a common theme, which is the existence of a well-defined notion of distance in the sets R and R².

Distance in the Real Line and in the Plane. In the first part of the book, we studied real numbers and functions extensively. As we saw in that part, most of our analysis was based on the notion of distance in the real line, which is in turn determined by the Euclidean distance function $d_e$. The concepts of convergence, limit and continuity, which were essential in the first part, were all defined using this distance function. We also utilized the distance function to define neighborhoods, and then used them in Section 3.1 to classify the points of R according to their position relative to a given set A ⊆ R.


As we mentioned in Chapter 1, the notion of distance also has a meaning in the sets C (of complex numbers) and R². More precisely, we defined the distance functions $d_{\mathbb{C}}$ and $d_e^2$ on the sets C and R², respectively, and observed that they satisfy properties similar to those of $d_e$ (see Propositions 1.17 and 1.71). As emphasized in Chapter 1, the existence of such distance functions is a common theme in concrete theories of real and complex numbers and the plane R². To describe what is meant by a common theme, let us see what follows from the existence of $d_e^2$ in the plane. We recall that the distance function $d_e^2$ in the plane is given by

\[ d_e^2(P, Q) = \sqrt{(x - a)^2 + (y - b)^2}, \tag{6.1} \]

where P = (x, y) and Q = (a, b) are arbitrary elements of R². As we mentioned in Chapter 1, this is the length of the line segment that joins P to Q; see Figure 1.

Figure 1. The distance function $d_e^2$ can be obtained from $d_e$.

Here, we used the subscript e again to emphasize that $d_e^2$ is the most natural distance function on R² whose definition is consistent with, and appropriate for, the Euclidean geometry of the plane. The superscript 2 is used to help us distinguish between the Euclidean metric on R² and that of R. As you can see in Figure 1, the sides of the right angle have lengths |a − x| and |b − y|, the distance between a and x and that between b and y on the real line, respectively. So, the distance function (6.1) is obtained from these distances by the Pythagorean theorem. Recall that we used similar reasoning to obtain the distance function $d_{\mathbb{C}}$ on C in Chapter 1.

As is sometimes mentioned in multivariable calculus, the distance function (6.1) can be used to define the notion of limit, and therefore continuity, for real-valued functions which are defined on R². In fact, we write

\[ \lim_{(x,y) \to (a,b)} f(x, y) = L \]

if the following statement is true. For every ε > 0, there exists δ > 0 such that $0 < d_e^2((x, y), (a, b)) < \delta$ implies $d_e(f(x, y), L) < \varepsilon$.


Here, the statement can be interpreted as follows. We can make the value f(x, y) as close to L as we wish, provided that the point (x, y) is sufficiently close to (a, b). It should be noted, however, that the closeness of f(x, y) to L is determined by the distance function $d_e$ in the line, while that of (x, y) to (a, b) is characterized by the distance function $d_e^2$ in the plane. The function f is said to be continuous at (a, b) if $\lim_{(x,y) \to (a,b)} f(x, y) = f(a, b)$. The distance function $d_e^2$ can also be used to define the limit of sequences of ordered pairs, but we postpone the discussion of this to the next chapter.

We can use $d_e^2$, as we did for $d_e$, to classify the points of R² according to their position relative to a set A ⊆ R². To see this, consider A = {(x, y) ∈ R² : x² + y² ≤ 1}. As you know, A consists of those points which lie on, or are located inside of, the circle with its center at the origin and radius 1. Consider two points in R², P and Q, as depicted in Figure 2.

Figure 2. The points P and Q lie in the interior and boundary of A, respectively.

Here, we intuitively think of P and Q as interior and boundary points of A, respectively. As in the case of our argument in the line, these names can be justified using ε-neighborhoods, which can be defined in terms of $d_e^2$. So let us define the ε-neighborhood of some point (x, y) of R² as the set of all points (z, w) ∈ R² satisfying $d_e^2((z, w), (x, y)) < \varepsilon$. This is the set of all points which lie inside the circle with center (x, y) and radius ε, depicted in Figure 3, which we will also refer to as an open disk. Now, the fact that P lies in the interior of A is an outgrowth of the following observation. A sufficiently small ε > 0 can be found such that the ε-neighborhood of P lies entirely in A. As can be seen in Figure 4, this means that not only is P a point of A, but also every point that is sufficiently close to P is an element of A. We may think of the point Q as a boundary point because every ε-neighborhood of Q contains elements from both A and $A^c$. See Figure 5.
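For the closed unit disk A, the classification described above can be made concrete. The sketch below is an illustration under stated assumptions: it relies on the elementary fact that a point at distance r < 1 from the origin has its (1 − r)-neighborhood inside A, it uses a small floating-point tolerance to decide when a point lies on the circle, and the function name and return format are hypothetical.

```python
import math


def classify_unit_disk(p, tol=1e-12):
    """Classify p relative to A = {(x, y) : x^2 + y^2 <= 1}.

    Returns a pair (kind, eps). For interior points, eps is a radius whose
    neighborhood lies in A; for exterior points, eps is a radius whose
    neighborhood misses A; for boundary points, eps is None because every
    neighborhood meets both A and its complement.
    """
    r = math.hypot(p[0], p[1])  # Euclidean distance from p to the origin
    if r < 1 - tol:
        return ("interior", 1 - r)
    if r > 1 + tol:
        return ("exterior", r - 1)
    return ("boundary", None)


if __name__ == "__main__":
    for point in [(0.5, 0.0), (1.0, 0.0), (0.0, -1.0), (3.0, 4.0)]:
        print(point, classify_unit_disk(point))
```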


Figure 3. The ε-neighborhood of (x, y) in R².

Figure 4. A neighborhood of P lies entirely in A.

As a result of our above discussion, we find a common theme: in each case we had a set X, first X = R and then X = R², and a distance function d : X × X → R, $d = d_e$ for X = R and $d = d_e^2$ for X = R², which enabled us (i) to define the convergence of sequences in X, (ii) to give a precise meaning to the limit and continuity of functions which are defined on X, and (iii) to classify the points of X according to their position relative to given subsets. This common theme is our cue for the development of metric space theory: given an arbitrary set X, can we equip X with an appropriate notion of distance that enables us to achieve the goals mentioned in (i)–(iii) above? We will answer this question in the next section.


Figure 5. Every neighborhood of Q intersects both A and $A^c$.

6.1. A First Generalization: The Definition of Metric Space

If we want to equip a given set X with a distance function d : X × X → R and make it into a metric space, what would be the defining properties of d? Which properties are required for d to ensure that the goals (i)–(iii) of the previous section are achieved in our space? To answer questions like these, let us think about the properties an arbitrary notion of distance should have. Imagine that we are in a set X, in which to any given pair of points x and y a distance d(x, y) is assigned. First of all, the distance should be a nonnegative real number, that is, we should have the following rule.

(1) For all x, y ∈ X, d(x, y) ≥ 0.

This is a completely natural requirement: have you ever heard of a distance of −1 meter? Next, every point x is at distance zero from itself, and x must be the unique element of X whose distance from x is 0. This leads us to the following rule.

(2) For x, y ∈ X, d(x, y) = 0 if and only if x = y.

Also, the distance should be symmetric, meaning the following.

(3) For all x, y ∈ X, d(x, y) = d(y, x).

This is also natural: the distance between your nose and your mouth is exactly the same as that between your mouth and your nose! Finally, the distance between some points x and y should not be more than the distance between x and a third point z, plus that between z and y. More precisely, we have the following rule.

(4) For all x, y and z in X, d(x, y) ≤ d(x, z) + d(z, y).

This property, which is the most important one, is known as the triangle inequality. This is what we consider in our daily movements: If you want to go from a point x to another one, y for example, and there is a direct path between them, you will never go from x to a third point z to go from there to y. See Figure 6.


Figure 6. The preferred path is the bold line segment.

If you can remember from Propositions 1.17 and 1.71, the properties (1)–(4) above are true for the Euclidean distance function $d_e$ in R and for the distance functions $d_{\mathbb{C}}$ and $d_e^2$ in C and R², respectively. And to be honest, this helped us to find that the above properties are the most important features of a distance function. This is a true instance of generalization: We first observe that a mathematical structure works well on some sets, and we then try to establish structures with similar properties on arbitrary sets. To understand why the inequality in (4) and its special case for $d_e$ in R are known as the triangle inequality, consider Figure 7.

Figure 7. A triangle with vertices P , Q and R.

Here, we have three points P, Q, and R in the plane that are considered to be the vertices of a triangle. As you may know from plane geometry, the length of each side of the triangle is less than the sum of the lengths of the other two sides. In particular, the length of PQ is less than the length of PR plus that of RQ. But, note that these lengths are indeed $d_e^2(P, Q)$, $d_e^2(P, R)$, and $d_e^2(R, Q)$, respectively. So the above geometric fact can be written in the form

\[ d_e^2(P, Q) < d_e^2(P, R) + d_e^2(R, Q). \]

The above inequality can be made into an equality only when R lies between P and Q on the line segment that joins them, as in Figure 8. Note that the above line segment is indeed a degenerate form of the triangle PQR of Figure 7, which is obtained from that triangle by pushing R towards the side PQ.


Figure 8. The point R lies between P and Q.

So, the triangle inequality of (4) is taken from its special case which seems to be true in the plane for $d_e^2$. Note that we are still not able to claim that the inequality is indeed true for $d_e^2$. What we discussed was an intuition-based argument which showed that the inequality may be true for $d_e^2$ in the plane. We will see shortly that this inequality is actually true.

Up to now we proposed four properties for distance functions. It is now natural to ask a few questions. First, are these properties exhaustive? Are they enough for achieving goals (i)–(iii) of the previous section in X? Second, are the properties consistent? Is it possible to use some of the properties to contradict some others? Third, are the properties independent? Can we deduce some of them by assuming some others? As you may remember, questions like these were asked when we were trying to formulate the real number axioms. Here, we encounter them again in our abstraction.

As in the case of the real number axioms, the first question is the most delicate one. Usually, determining the exhaustiveness of a collection of axioms or properties requires some trial and error. One should first consider what appears to be needed, and then go into development on the basis of these needs. Once something new is needed, it is not too difficult to add it. Sometimes, finding the right collection of axioms or defining properties takes many years! As we are not going to “reinvent the wheel” here, we trust what mathematicians tested, and we answer the first question in the affirmative. The second question also has a positive answer: The properties are consistent. As for the third question, the following proposition shows that properties (1)–(4) are not independent.

Proposition 6.1. Let X be a nonempty set. For a function d : X × X → R, property (1) above follows from the remaining ones.

Proof. If x and y are arbitrary elements of X, then

\[ 0 = d(x, x) \le d(x, y) + d(y, x) = 2d(x, y), \]


from which it follows that d(x, y) ≥ 0. In the above relations, we used (2), (4), and (3), respectively.

Now we can summarize our discussion in the following, which is the most important definition of this part. This is our first generalization: a generalization of the notion of distance from the classical spaces R and R² to arbitrary sets.

Definition 6.2. Let X be a nonempty set. A function d : X × X → R is called a distance function, or a metric, on X if it satisfies properties (2)–(4) above. When X is equipped with a metric, we say that (X, d), or more briefly X itself, is a metric space.

Note that by the above proposition, the range of a metric is automatically contained in [0, +∞).

Why is metric space an abstract concept? The concept of metric space is an abstract one in the sense we discussed earlier in this chapter. In fact, this is an abstract concept because
• it is stated in a very general framework and includes many particular cases as its examples, as we will see shortly, and
• in its general form, it may be far from being concrete and of use in applications.

Example 6.3. Let (X, d) be a metric space, and let x, y, z, and w be arbitrary elements of X. Prove that |d(x, y) − d(z, w)| ≤ d(x, z) + d(y, w).

Solution. In view of the triangle inequality,

\[ d(x, y) \le d(x, z) + d(z, y) \le d(x, z) + d(z, w) + d(w, y), \]

so that

\[ d(x, y) - d(z, w) \le d(x, z) + d(y, w), \tag{6.2} \]

by the fact that d is symmetric. Similar reasoning, with d(z, w) as the starting point, shows that

\[ d(z, w) - d(x, y) \le d(x, z) + d(y, w). \tag{6.3} \]

The desired result now follows from (6.2) and (6.3).

Some Examples of Metric Spaces. Now that we have introduced our abstract concept, it is time to see the power of abstraction. The following examples show the wide range of sets on which a metric can be defined.

Example 6.4. By Proposition 1.17, $d_e$ is a metric on R, which we will refer to as the Euclidean metric. The metric space (R, $d_e$) will be called the Euclidean space R.

Example 6.5. By Proposition 1.71, $d_{\mathbb{C}}$ is a metric on C.
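Properties (2)–(4) of Definition 6.2 and the inequality of Example 6.3 can be spot-checked on finite samples of points. The sketch below is only a sanity test, not a proof: passing on a sample does not establish that a function is a metric, whereas a single failure does show that it is not. The function names, the sample points, and the tolerance are assumptions of this illustration.

```python
from itertools import product


def is_metric_on_sample(d, points, tol=1e-12):
    """Check properties (2), (3), (4) for d on every pair and triple of points."""
    for x, y in product(points, repeat=2):
        if (abs(d(x, y)) <= tol) != (x == y):   # (2): d(x, y) = 0 iff x = y
            return False
        if abs(d(x, y) - d(y, x)) > tol:        # (3): symmetry
            return False
    for x, y, z in product(points, repeat=3):
        if d(x, y) > d(x, z) + d(z, y) + tol:   # (4): triangle inequality
            return False
    return True


def satisfies_example_6_3(d, points, tol=1e-12):
    """Check |d(x, y) - d(z, w)| <= d(x, z) + d(y, w) on all quadruples."""
    return all(abs(d(x, y) - d(z, w)) <= d(x, z) + d(y, w) + tol
               for x, y, z, w in product(points, repeat=4))


def d_e(x, y):
    """Euclidean metric on R."""
    return abs(x - y)


def d_C(z, w):
    """Distance on C given by the modulus of the difference."""
    return abs(z - w)


def squared_gap(x, y):
    """Not a metric: the triangle inequality fails for 0, 1, 2."""
    return (x - y) ** 2


if __name__ == "__main__":
    reals = [-2.0, 0.0, 0.5, 3.0]
    complexes = [0j, 1 + 1j, -2j, 3 + 0j]
    print(is_metric_on_sample(d_e, reals))
    print(is_metric_on_sample(d_C, complexes))
    print(is_metric_on_sample(squared_gap, [0.0, 1.0, 2.0]))
    print(satisfies_example_6_3(d_e, reals))
```

Property (1) is not tested separately because, by Proposition 6.1, it follows from the other three.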


Example 6.6. Let $\mathbb{R}^n$ denote the set of all n-tuples of real numbers. On this set a distance function can be defined by
\[
d_e^n((x_1, \dots, x_n), (y_1, \dots, y_n)) = \sqrt{(x_1 - y_1)^2 + \cdots + (x_n - y_n)^2}. \tag{6.4}
\]
This naturally generalizes the distance function $d_e^2$ considered in the previous section on $\mathbb{R}^2$. To prove that $d_e^n$ is indeed a metric, we only need to prove the triangle inequality, as the remaining properties can be easily verified. If $P = (x_1, \dots, x_n)$, $Q = (y_1, \dots, y_n)$ and $R = (z_1, \dots, z_n)$ are elements of $\mathbb{R}^n$, then
\begin{align*}
d_e^n(P, Q) &= \Bigl( \sum_{i=1}^n (x_i - y_i)^2 \Bigr)^{1/2} \\
&= \Bigl( \sum_{i=1}^n (x_i - z_i + z_i - y_i)^2 \Bigr)^{1/2} \\
&\le \Bigl( \sum_{i=1}^n (x_i - z_i)^2 \Bigr)^{1/2} + \Bigl( \sum_{i=1}^n (z_i - y_i)^2 \Bigr)^{1/2} \\
&= d_e^n(P, R) + d_e^n(R, Q).
\end{align*}
Here, the inequality follows from Minkowski's inequality (Corollary 1.65)
\[
\Bigl( \sum_{i=1}^n (a_i + b_i)^2 \Bigr)^{1/2} \le \Bigl( \sum_{i=1}^n a_i^2 \Bigr)^{1/2} + \Bigl( \sum_{i=1}^n b_i^2 \Bigr)^{1/2}
\]

by letting $a_i = x_i - z_i$ and $b_i = z_i - y_i$.

Example 6.7. On $\mathbb{R}^n \times \mathbb{R}^n$, define $d_s^n$ and $d_m^n$ by
\[
d_s^n((x_1, \dots, x_n), (y_1, \dots, y_n)) = \sum_{i=1}^n |x_i - y_i|
\]
and
\[
d_m^n((x_1, \dots, x_n), (y_1, \dots, y_n)) = \max\{|x_i - y_i| : i = 1, \dots, n\}.
\]
Prove that $d_s^n$ and $d_m^n$ are metrics on $\mathbb{R}^n$.

Solution. The only nontrivial property is the triangle inequality. To see this for $d_s^n$, let $(z_1, \dots, z_n)$ be an element of $\mathbb{R}^n$ and note that by the triangle inequality for $d_e$,
\begin{align*}
d_s^n((x_1, \dots, x_n), (y_1, \dots, y_n)) &= \sum_{i=1}^n |x_i - y_i| \\
&\le \sum_{i=1}^n \bigl( |x_i - z_i| + |z_i - y_i| \bigr) \\
&= d_s^n((x_1, \dots, x_n), (z_1, \dots, z_n)) + d_s^n((z_1, \dots, z_n), (y_1, \dots, y_n)).
\end{align*}


6. Basic Theory of Metric Spaces

The triangle inequality for dnm also follows from that for de . More precisely, since for each i ∈ {1, . . . , n}, |xi − yi | ≤ |xi − zi | + |zi − yi | ≤ dnm ((x1 , . . . , xn ), (z1 , . . . , zn )) + dnm ((z1 , . . . , zn ), (y1 , . . . , yn )), the triangle inequality follows for dnm . Example 6.8. Let C([a, b]) denote the set of all continuous functions from [a, b] into R. On this set, we define a distance function as du (f, g) = max{|f (x) − g(x)| : x ∈ [a, b]}. Note that the maximum is indeed attained by the continuity of the function |f − g| on [a, b] and the extreme value theorem. See Figure 9.

Figure 9. The distance du (f, g).

Again, the nontrivial part is the verification of the triangle inequality. To prove this, we note that for f , g, and h in C([a, b]) and x ∈ [a, b], |f (x) − g(x)| ≤ |f (x) − h(x)| + |h(x) − g(x)| ≤ du (f, h) + du (h, g), so that by taking maximum over x ∈ [a, b], we get du (f, g) ≤ du (f, h) + du (h, g). Our use of the subscript u will be justified later. Exercise 6.9. Let X be any set, and let B(X) denote the set of all bounded functions from X into R. Prove that d(f, g) = sup{|f (x) − g(x)| : x ∈ X} makes B(X) into a metric space. Note that when X = [a, b], C([a, b]) is a subset of B(X), and the restriction of d to C([a, b]) in this case is the metric du .
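The metrics on R^n from Examples 6.6 and 6.7 are easy to experiment with numerically. The following Python sketch evaluates them on tuples and spot-checks the triangle inequality on random points. The function names `d_e`, `d_s`, `d_m`, `triangle_holds`, and `random_point` are illustrative choices, not notation from the text, and a random check is of course no substitute for the proofs given above.

```python
import math
import random

def d_e(p, q):
    """Euclidean metric on R^n, as in (6.4)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def d_s(p, q):
    """Sum metric: |x_1 - y_1| + ... + |x_n - y_n|."""
    return sum(abs(a - b) for a, b in zip(p, q))

def d_m(p, q):
    """Max metric: the largest of the |x_i - y_i|."""
    return max(abs(a - b) for a, b in zip(p, q))

def triangle_holds(d, p, q, r, tol=1e-12):
    """Check d(p, q) <= d(p, r) + d(r, q), allowing for rounding error."""
    return d(p, q) <= d(p, r) + d(r, q) + tol

rng = random.Random(0)  # fixed seed so the experiment is repeatable

def random_point(n):
    """A random point of R^n with coordinates in [-10, 10] (assumed range)."""
    return tuple(rng.uniform(-10, 10) for _ in range(n))

all_ok = all(
    triangle_holds(d, random_point(3), random_point(3), random_point(3))
    for d in (d_e, d_s, d_m)
    for _ in range(1000)
)
print(all_ok)
```

For the points (0, 0) and (3, 4) in R^2 the three functions return 5, 7, and 4, respectively, which illustrates that different metrics on the same set can assign different distances to the same pair of points.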


If X = N, then B(X) is nothing but the set of all bounded sequences of real numbers, and for x = {xn } and y = {yn } in B(X) we will have d(x, y) = sup{|xn − yn | : n ∈ N}. Example 6.10. Let X be a nonempty set and define

\[
\rho(x, y) = \begin{cases} 1, & x \neq y, \\ 0, & x = y. \end{cases}
\]
Then $\rho$ is a metric on X, known as the discrete metric. In the remainder of this book, we will use $\rho$ exclusively for the discrete metric. To prove the triangle inequality for $\rho$, let x, y, and z be arbitrary elements of X, and consider the two cases as follows:

• $x = y$. Then $\rho(x, y) = 0$, and the triangle inequality
\[
\rho(x, y) \le \rho(x, z) + \rho(z, y) \tag{6.5}
\]
holds obviously.

• $x \neq y$. In this case, z cannot be equal to both x and y, so the right-hand side of (6.5) is at least 1. Thus, (6.5) also holds in this case.

Discrete metric space: The source of counterintuitive ideas. The above example manifests an important fact: Every nonempty set can be made into a metric space. Of course, the discrete metric is not a nice distance function. To understand why, note that the discrete metric defined on $\mathbb{R}$ says that the distance between 1 and 2, and that of 1 and 1000, are both equal to 1, which is intuitively absurd. Nevertheless, $\rho$ satisfies the properties we considered to be basic for a metric, and we must consider it in our theory. We will see shortly that discrete metric spaces have some other counterintuitive properties.

Exercise 6.11. For $m, n \in \mathbb{N}$, define
\[
d(n, m) = \left| \frac{1}{n} - \frac{1}{m} \right|.
\]
Show that d is a metric on $\mathbb{N}$.
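On a finite set of points the defining properties of a metric can be tested exhaustively. The sketch below does this for the discrete metric and, on the integers 1 to 30 only, for the function of Exercise 6.11. The names `rho`, `d_recip`, and `satisfies_axioms_on` are hypothetical helpers, and the check assumes the usual three axioms (zero distance exactly for equal points, symmetry, triangle inequality). A finite check is evidence, not a proof, so the exercise still has to be solved by hand.

```python
from fractions import Fraction
from itertools import product

def rho(x, y):
    """Discrete metric: 0 if the points coincide, 1 otherwise."""
    return 0 if x == y else 1

def d_recip(n, m):
    """|1/n - 1/m| for positive integers n, m, in exact rational arithmetic."""
    return abs(Fraction(1, n) - Fraction(1, m))

def satisfies_axioms_on(points, d):
    """Exhaustively check the metric axioms for d on a finite list of points."""
    for x, y in product(points, repeat=2):
        if (d(x, y) == 0) != (x == y):   # d(x, y) = 0 exactly when x = y
            return False
        if d(x, y) != d(y, x):           # symmetry
            return False
    return all(
        d(x, y) <= d(x, z) + d(z, y)     # triangle inequality
        for x, y, z in product(points, repeat=3)
    )

print(satisfies_axioms_on(["a", "b", "c", "d"], rho))
print(satisfies_axioms_on(list(range(1, 31)), d_recip))
print(rho(1, 2), rho(1, 1000))  # the two distances discussed in the text
```

The same helper rejects functions that are not metrics: for instance, the squared difference (x − y)^2 on {0, 1, 2} fails the triangle inequality, since 4 > 1 + 1.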

6.2. Neighborhoods and Some Classes of Points

After having seen several examples of metric spaces, it is time to generalize those concepts and results that can be considered on $\mathbb{R}$ as a result of the existence of $d_e$ to the context of metric spaces. Of course, this is our task in the remainder of the book. But, in the current section, we try to generalize the important concepts of neighborhood, limit, interior, boundary, and isolated point from $\mathbb{R}$ to the abstract setting of metric spaces. It may be helpful to glance at the discussion of these classes of points in Section 3.1. With this in mind, the first concept that should be generalized to the context of metric spaces is the notion of neighborhood. In view of our understanding of neighborhoods in $\mathbb{R}$ and $\mathbb{R}^2$, the following is a natural generalization.


Definition 6.12. Let $(X, d)$ be a metric space. For $\varepsilon > 0$, the $\varepsilon$-neighborhood of $x \in X$ is the set $N_\varepsilon(x) = \{y \in X : d(y, x) < \varepsilon\}$.

Example 6.13. In the Euclidean space $\mathbb{R}$, the $\varepsilon$-neighborhood of each $x \in \mathbb{R}$ is the interval $(x - \varepsilon, x + \varepsilon)$, as we knew.

Example 6.14. In the Euclidean space $(\mathbb{R}^2, d_e^2)$, the $\varepsilon$-neighborhood of each point $(x, y) \in \mathbb{R}^2$ is the open disk centered at $(x, y)$ and with radius $\varepsilon$, as we mentioned before.

Of course, in any metric space, a neighborhood of x contains x itself, and therefore it is nonempty.

Example 6.15. Let $(X, \rho)$ be a discrete metric space. Given $x \in X$, find $N_\varepsilon(x)$ for $\varepsilon = 1$ and $\varepsilon = 2$.

Solution. Since the only possible values for $\rho(y, x)$ are 0 and 1, we see that $N_1(x) = \{y \in X : \rho(y, x) < 1\} = \{x\}$ and $N_2(x) = \{y \in X : \rho(y, x) < 2\} = X$. In general, $N_r(x) = \{x\}$ if $0 < r \le 1$, and $N_r(x) = X$ when $r > 1$.

What does Example 6.15 say? Example 6.15 says that in discrete metric spaces with at least two elements, every neighborhood is either a singleton or the whole space.

Example 6.16. In the metric spaces $(\mathbb{R}^2, d_s^2)$ and $(\mathbb{R}^2, d_m^2)$ find the neighborhood $N_1(\vec{0})$, where $\vec{0} = (0, 0)$.

Solution. In $(\mathbb{R}^2, d_s^2)$, $d_s^2((x, y), \vec{0}) < 1$ if and only if $|x| + |y| < 1$, so that $N_1(\vec{0})$ in this space is the rhombus of Figure 10. In $(\mathbb{R}^2, d_m^2)$, $d_m^2((x, y), \vec{0}) < 1$ if and only if $|x| < 1$ and $|y| < 1$. This implies that $N_1(\vec{0})$ in this space is the open rectangle depicted in Figure 11.

Example 6.17. In the Euclidean space $\mathbb{R}^3$, every neighborhood is an open ball, that is, the region that lies inside a sphere.

Exercise 6.18. Equip $\mathbb{N}$ with the metric $d(m, n) = |1/n - 1/m|$. Find $N_r(2)$ in this metric space for $r = 1/3, 1$.

Next, we show that in any metric space, neighborhoods can be used to separate distinct points from each other. This is not a generalization of a similar result of the previous chapters, although it could be.

Theorem 6.19.
If (X, d) is a metric space and x, y are distinct elements of X, then there exists ε > 0 such that Nε (x) ∩ Nε (y) = ∅.


Figure 10. The neighborhood of $\vec{0}$ of radius 1 in $(\mathbb{R}^2, d_s^2)$.

Figure 11. The neighborhood of $\vec{0}$ of radius 1 in $(\mathbb{R}^2, d_m^2)$.

Proof. Since x and y are distinct points, $d(x, y) > 0$. Let $\varepsilon = \frac{1}{4} d(x, y)$. If a point z could be found in $N_\varepsilon(x) \cap N_\varepsilon(y)$, then by the triangle inequality we would find that
\[
d(x, y) \le d(x, z) + d(z, y) < \frac{1}{4} d(x, y) + \frac{1}{4} d(x, y) = \frac{1}{2} d(x, y),
\]
a contradiction that proves the desired result.

A first instance of separation properties. The property we proved for metric spaces in Theorem 6.19 is an example of separation properties. This is because by the theorem, distinct points of a metric space can be separated by disjoint neighborhoods. Because of this theorem, we say that metric spaces are Hausdorff, or that they satisfy the Hausdorff separation property. A stronger separation property for metric spaces can be found in Exercise 24 at the end of this chapter.
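Neighborhoods can be computed directly from their definition whenever the underlying set is finite. The sketch below reproduces the two neighborhoods found in Example 6.15 on a three-point discrete space, and then illustrates the radius d(x, y)/4 used in the proof of Theorem 6.19 on a finite sample of the interval [0, 3] with the usual distance. The helper names (`neighborhood`, `separating_radius`, `d_abs`) and the sample itself are assumptions made for illustration only.

```python
def neighborhood(points, d, x, eps):
    """N_eps(x) = {y in X : d(y, x) < eps}, for a finite collection X = points."""
    return {y for y in points if d(y, x) < eps}

def rho(x, y):
    """Discrete metric."""
    return 0 if x == y else 1

def d_abs(x, y):
    """Usual distance on the real line."""
    return abs(x - y)

def separating_radius(d, x, y):
    """The radius eps = d(x, y) / 4 chosen in the proof of Theorem 6.19."""
    return d(x, y) / 4

X = {"a", "b", "c"}
print(neighborhood(X, rho, "a", 1))  # radius 1: only the center remains
print(neighborhood(X, rho, "a", 2))  # radius 2: the whole space

sample = [k / 10 for k in range(31)]  # finite sample of [0, 3], an assumption
eps = separating_radius(d_abs, 1.0, 2.0)
around_1 = neighborhood(sample, d_abs, 1.0, eps)
around_2 = neighborhood(sample, d_abs, 2.0, eps)
print(around_1.isdisjoint(around_2))
```

Working with a finite sample only shows the disjointness for the sampled points; the theorem itself covers every point of the space.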


Figure 12. The neighborhoods Nε (x) and Nε (y) do not intersect each other.

The idea of the above proof is taken from the special case of the plane with the Euclidean distance, as can be seen in Figure 12.

Some Special Classes of Points in a Metric Space.

As we emphasized in the introductory part of this chapter, one of the reasons we introduced the abstract notion of metric was to obtain what follows from the existence of distance in the classical spaces $\mathbb{R}$ and $\mathbb{R}^2$ in arbitrary sets. As we saw in Chapter 3, the usual distance function on $\mathbb{R}$ enabled us to define various kinds of points for arbitrary subsets of $\mathbb{R}$: limit points, isolated points, boundary points, and interior points. We also discussed, in the first pages of this chapter, that using the Euclidean distance function of $\mathbb{R}^2$, we may define the same notions in $\mathbb{R}^2$. For this reason, the following definition is a natural generalization of the notions we defined in Chapter 3 and in the context of the metric space $(\mathbb{R}, d_e)$.

Definition 6.20. Suppose that $(X, d)$ is a metric space, $A \subseteq X$, and $x \in X$. We say that x is
• a limit point of A if every neighborhood of x contains some element of A other than x;
• a boundary point of A if every neighborhood of x contains elements of both $A$ and $A^c$;
• an interior point of A if there exists $\delta > 0$ such that the $\delta$-neighborhood of x lies entirely in A;
• an isolated point of A if there exists $\delta > 0$ such that $N_\delta(x) \cap A = \{x\}$.

As in Chapter 3, we denote by $A'$, $\partial A$, and $A^\circ$ the set of all limit, boundary, and interior points of A, respectively. We call these sets the derived set, the boundary, and the interior of A, respectively. It can easily be verified that in any metric space, the interior, boundary, and derived set of the empty set are empty. It is clear that $\partial A = \partial A^c$. Also, it follows from the definition that isolated points and interior points of a set are necessarily its elements. We saw in Chapter 3 that this is not the case for limit and boundary points.
It is now easy to generalize some of the basic facts we obtained in Section 3.1 to the context of metric spaces.

Theorem 6.21. Let E be a subset of some metric space X.
(1) A limit point of E which belongs to $E^c$ is necessarily a boundary point of E.
(2) A boundary point of E which belongs to $E^c$ is a limit point of E.
(3) If x is an element of E, then x is either a limit point or an isolated point of E.
(4) If $x \in E$, then x is either an interior point or a boundary point of E.
(5) $E^\circ \cap \partial E = \emptyset$.
(6) If a is a limit point of E, then every neighborhood of a contains an infinite number of the elements of E.

Proof. Items (1)–(5) are straightforward generalizations of items (2)–(6) of Theorem 3.3, respectively. Item (6) generalizes Proposition 3.7. The details are left as exercise.

As a rule in this part of the book, when a result is a straightforward generalization of a result from classical theory, its proof is left to the reader. This helps you to cooperate in the abstraction.

Not everything is generalizable to the abstract setting. Comparing Theorems 3.3 and 6.21 we see that Theorem 6.21 does not contain a generalized version of Theorem 3.3(1). This is because item (1) of Theorem 3.3 is not actually generalizable to the metric space setting. In fact, in an arbitrary metric space, an interior point may not be a limit point (see Example 6.22 below). This shows us an important fact. When developing an abstract theory, all the results we have in the concrete special cases may not be generalizable to the abstract context. We will see more instances of this fact as we proceed in our theory of metric spaces.

That an interior point is not necessarily a limit point can be seen in the following example.

Example 6.22. Let $(X, \rho)$ be a discrete metric space, and let A be a nonempty subset of X. Prove that $A^\circ = A$ and $A' = \partial A = \emptyset$. Also, show that every point of A is an isolated point.

Solution. For every $x \in A$, $N_{1/2}(x) = \{x\}$. This proves that $x \in A^\circ$, $x \notin A'$, $x \notin \partial A$ and that x is an isolated point of A. So, to complete the solution, we only need to show that when y is an element of $A^c$, y is neither a limit point nor a boundary point of A. But, this follows easily from the fact that $N_{1/2}(y) = \{y\}$.

So, in a discrete metric space, every point of a nonempty set is an interior point, while no point of the set can be its limit point. This shows that an interior point is not necessarily a limit point.
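The four notions of Definition 6.20 can be tested mechanically in a finite metric space, because there the neighborhood N_ε(x) only changes when ε passes one of the finitely many values d(y, x). The following sketch relies on that observation and is therefore restricted to finite spaces; the function names `radii_to_test` and `classify` are hypothetical. Applied to a four-point discrete space, it agrees with the conclusions of Example 6.22.

```python
def neighborhood(points, d, x, eps):
    """N_eps(x) = {y in X : d(y, x) < eps}, with X a finite set."""
    return {y for y in points if d(y, x) < eps}

def radii_to_test(points, d, x):
    """Radii producing every distinct neighborhood of x in a finite space.

    It suffices to try each distance from x to another point, plus one
    radius beyond the largest of them.
    """
    values = sorted({d(y, x) for y in points if y != x})
    return values + [values[-1] + 1] if values else [1]

def classify(points, d, A, x):
    """Apply Definition 6.20 to the point x and the subset A."""
    A = set(A)
    complement = set(points) - A
    nbhds = [neighborhood(points, d, x, r) for r in radii_to_test(points, d, x)]
    return {
        "limit": all((N & A) - {x} for N in nbhds),
        "boundary": all((N & A) and (N & complement) for N in nbhds),
        "interior": any(N <= A for N in nbhds),
        "isolated": any((N & A) == {x} for N in nbhds),
    }

def rho(x, y):
    """Discrete metric."""
    return 0 if x == y else 1

X = {0, 1, 2, 3}
A = {0, 1}
for x in sorted(X):
    print(x, classify(X, rho, A, x))
```

In this run the points of A are reported as interior and isolated but not as limit or boundary points, while the points outside A satisfy none of the four conditions.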


Changing the distance function changes the interior. As we learned in Chapter 3, in the Euclidean space $\mathbb{R}$, $A^\circ = (0, 1)$ for $A = [0, 1]$. If we change the metric on $\mathbb{R}$ and consider the discrete metric $\rho$, then the above example shows that $A^\circ = [0, 1]$. This shows that by modifying the metric we consider on a set, the interior of sets may change. The following exercise shows that the same is also true for the boundary and the set of all limit points.

Exercise 6.23. Find $\partial A$ and $A'$ for the set $A = [0, 1] \cup \{2, 3\}$ in the Euclidean space $\mathbb{R}$ and in the discrete space $(\mathbb{R}, \rho)$.

Example 6.24. Let $A = \{(x, y) \in \mathbb{R}^2 : x = y\}$. In the Euclidean space $\mathbb{R}^2$ find $A^\circ$, $\partial A$, $A'$, and the set of all isolated points of A.

Solution. We claim that $A^\circ = \emptyset$, $\partial A = A' = A$, and that A has no isolated points. That A has no interior points is geometrically evident: if A had an interior point $P = (x, x)$, then we could find $\varepsilon > 0$ such that $N_\varepsilon(P) \subseteq A$. But geometrically, $N_\varepsilon(P)$ is an open disk and A is a straight line, thus this inclusion is contradictory. See Figure 13. To prove $A^\circ = \emptyset$ analytically, consider arbitrary $P = (x, x) \in A$ and $\varepsilon > 0$. Then, the point $(x + \varepsilon/2, x)$ lies in $N_\varepsilon(P) \cap A^c$. This shows that $N_\varepsilon(P) \not\subseteq A$, and hence that P is not an interior point of A. Thus, $A^\circ = \emptyset$.

Figure 13. The neighborhood Nε (P ) cannot lie entirely in A.

This argument also shows that $A \subseteq \partial A$. To prove that $\partial A = A$, we show that if $P_0 = (x_0, y_0) \in A^c$, then $P_0 \notin \partial A$. To see this, let $\delta$ denote the distance between $P_0$ and the line $y = x$ in the plane. The neighborhood $N_{\delta/2}(P_0)$ then contains no element of A, showing that $P_0$ is not a boundary point of A. The idea of this proof is taken from Figure 14. Again, this is a geometric justification. To verify it analytically, note that
\[
\delta = \frac{|y_0 - x_0|}{\sqrt{2}}
\]
by what you learned in calculus. If we assume that $(z, z)$ is an element of $N_{\delta/2}(P_0)$ for some $z \in \mathbb{R}$, then we would have $(z - x_0)^2 + (z - y_0)^2 <$ […] $\varepsilon > 0$, $(y + \varepsilon/2, y + \varepsilon/2)$ is an element of A, different from Q, which lies in $N_\varepsilon(Q)$. Finally, that A has no isolated points follows from the equality $A' = A$.

Figure 14. The neighborhood $N_{\delta/2}(P_0)$ does not intersect A.

Exercise 6.25. In the Euclidean space $\mathbb{R}^2$, find the sets of all interior, boundary, limit, and isolated points of the set $H = \{(x, y) \in \mathbb{R}^2 : x^2 - y^2 = 1\}$.

As a result of what we learned in Chapter 3, we know that every element of an open interval $I \subseteq \mathbb{R}$ is an interior point. Clearly, this implies that every element of a neighborhood in the Euclidean space $\mathbb{R}$ is an interior point. The following proposition shows that a similar assertion is true in arbitrary metric spaces.

Proposition 6.26. In any metric space $(X, d)$, every point of the sets $N_\varepsilon(x)$ and $\{y \in X : d(y, x) > \varepsilon\}$ is an interior point. Here $x \in X$ and $\varepsilon > 0$ are arbitrary.


Proof. Consider $y \in N_\varepsilon(x)$. Then $r := \varepsilon - d(x, y)$ is a positive number. We claim that $N_r(y) \subseteq N_\varepsilon(x)$. In fact, if $z \in N_r(y)$, then
\[
d(z, x) \le d(z, y) + d(y, x) < r + d(y, x) = \varepsilon.
\]
The second part of the proof is left to the reader (see Exercise 2 at the end of this chapter).

Example 6.27. Let $P = (x, y) \in \mathbb{R}^2$, and let $\varepsilon > 0$ be arbitrary. In the Euclidean space $\mathbb{R}^2$ find $\partial N_\varepsilon(P)$.

Solution. We claim that
\[
\partial N_\varepsilon(P) = \{S \in \mathbb{R}^2 : d_e^2(S, P) = \varepsilon\}, \tag{6.6}
\]
which is the circle with center P and radius $\varepsilon$. To see this, let $r > 0$ be given, and let $S \in \mathbb{R}^2$ be such that $d_e^2(S, P) = \varepsilon$. We show that $N_r(S)$ contains points from both $N_\varepsilon(P)$ and its complement. For the proof of $N_r(S) \cap N_\varepsilon(P) \neq \emptyset$, it is enough to find a point Q between S and P which lies in the intersection. Thus, we should find $0 < \lambda < 1$ such that
\[
Q = \lambda S + (1 - \lambda) P \tag{6.7}
\]
satisfies $d_e^2(Q, S) = (1 - \lambda)\varepsilon < r$ and $d_e^2(Q, P) = \lambda \varepsilon < \varepsilon$. Therefore, it is sufficient to choose $\lambda$ such that
\[
\max\Bigl\{0, 1 - \frac{r}{\varepsilon}\Bigr\} < \lambda < 1. \tag{6.8}
\]
Then the point Q defined by (6.7) lies in $N_r(S) \cap N_\varepsilon(P)$. It is instructive to draw a figure illustrating P, Q, and S.

To show that $N_r(S) \cap (N_\varepsilon(P))^c \neq \emptyset$, we find a point R such that S is the middle point of the line segment RQ. A simple calculation shows that $R = (2 - \lambda) S + (\lambda - 1) P$, where $\lambda$ is as in (6.8). Now, it is easy to see that $d_e^2(R, S) = (1 - \lambda)\varepsilon < r$ and $d_e^2(R, P) = (2 - \lambda)\varepsilon > \varepsilon$, so that $R \in N_r(S) \cap (N_\varepsilon(P))^c$. Again, draw a figure that illustrates the points R, S, and Q.

As a result of the above discussion, we see that the set on the right-hand side of (6.6) is contained in $\partial N_\varepsilon(P)$. The truth of (6.6) now follows from Proposition 6.26 and Theorem 6.21.
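The construction of the points Q and R in Example 6.27 can be traced with concrete numbers. In the sketch below the center P, the radius ε, the radius r, and the boundary point S are arbitrary sample values chosen for illustration, λ is taken as the midpoint of the interval allowed by (6.8), and `d2e` and `combo` are hypothetical helper names.

```python
import math

def d2e(a, b):
    """Euclidean distance in the plane."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def combo(s, S, t, P):
    """The point s*S + t*P, computed coordinatewise."""
    return (s * S[0] + t * P[0], s * S[1] + t * P[1])

P, eps, r = (1.0, 2.0), 2.0, 0.5        # sample data, not from the text
S = (P[0] + eps, P[1])                  # a point with d2e(S, P) = eps

lower = max(0.0, 1 - r / eps)           # left end of the interval in (6.8)
lam = (lower + 1) / 2                   # any value strictly between lower and 1 works

Q = combo(lam, S, 1 - lam, P)           # formula (6.7)
R = combo(2 - lam, S, lam - 1, P)       # S is the midpoint of the segment RQ

print(d2e(Q, S) < r, d2e(Q, P) < eps)   # Q lies in N_r(S) and in N_eps(P)
print(d2e(R, S) < r, d2e(R, P) > eps)   # R lies in N_r(S) but outside N_eps(P)
```

Both lines of output should read `True True`, in agreement with the distances (1 − λ)ε, λε, and (2 − λ)ε computed in the solution.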


A note on the boundary of neighborhoods. The above example leads us to guess that in any metric space (X, d), the boundary of a neighborhood Nε (x) is the set {y ∈ X : d(y, x) = ε}. But, Example 6.22 shows that this is not true. Indeed, if X is a discrete metric space, then the boundary of each neighborhood, like that of any other set, is empty.

Relation to Set-Theoretic Operations.

Since metric spaces are sets, it is natural to look for connections between the set-theoretic operations and relations and metric space notions like interior, boundary, etc. For instance, if $A \subseteq B$, is it true that $A' \subseteq B'$? If A and B are sets in a metric space, what is the relation between $(A \cup B)^\circ$ and $A^\circ \cup B^\circ$? Are these sets necessarily equal? We addressed questions of this kind in Exercises 8 and 9 at the end of Chapter 3. Below, we present results in this connection. The proofs which are omitted are left to the reader as exercise.

Proposition 6.28. Let A and B be subsets of a metric space such that $A \subseteq B$. Then $A' \subseteq B'$ and $A^\circ \subseteq B^\circ$.

It is not true, in general, that $\partial A \subseteq \partial B$. Nor is it true that an isolated point of A is an isolated point of B. For instance, in the Euclidean space $\mathbb{R}$ let $A = (1/3, 1/2) \cup \{1/5\}$ and $B = [0, 1]$. Then A is a subset of B and 1/5 is an isolated point of A, while B has no isolated points.

Theorem 6.29. If A and B are subsets of a metric space, then
(1) $(A \cap B)^\circ = A^\circ \cap B^\circ$;
(2) $A^\circ \cup B^\circ \subseteq (A \cup B)^\circ$, and the inclusion may be strict;
(3) $(A \cap B)' \subseteq A' \cap B'$, and the inclusion may be strict;
(4) $(A \cup B)' = A' \cup B'$.

In general, neither of the equalities $\partial A \cup \partial B = \partial(A \cup B)$ and $\partial A \cap \partial B = \partial(A \cap B)$ need hold. To see this, consider $A = (1/3, 1/2)$ and $B = [0, 1]$ in the Euclidean space $\mathbb{R}$. Then $\partial A = \{1/3, 1/2\}$ and $\partial B = \{0, 1\}$, while $\partial(A \cap B) = \partial A = \{1/3, 1/2\}$ and $\partial(A \cup B) = \partial B = \{0, 1\}$. It is left to the reader to find any inclusions that may hold in this context.

Equivalent Metrics.

Up to now everything we defined was a mere generalization of a corresponding concept encountered previously in the Euclidean space $\mathbb{R}$ to the context of metric spaces. In this subsection we define a notion of equivalence for metrics defined on the same underlying set X. This is by no means a generalization of a corresponding concept encountered previously in the special case $X = \mathbb{R}$.
To motivate our definition of this equivalence relation, let us begin with a discussion of interior points in metric spaces.


As we observed as a result of Example 6.22, the interior of a set $A \subseteq X$ may change if we modify the metric considered on X. However, there may exist metrics on a set X with respect to which the interiors of a subset A are the same. The following example illustrates such a situation.

Example 6.30. Let A be a subset of $\mathbb{R}^2$. Show that $A^\circ$ is the same in the spaces $(\mathbb{R}^2, d_e^2)$ and $(\mathbb{R}^2, d_m^2)$.

Solution. Let $(x, y)$ be an interior point of A in $(\mathbb{R}^2, d_e^2)$. Then, there exists $r > 0$ such that $d_e^2((z, w), (x, y)) < r$ implies $(z, w) \in A$. But,
\begin{align*}
d_e^2((z, w), (x, y)) &= \sqrt{(x - z)^2 + (y - w)^2} \\
&\le \sqrt{2 \bigl( \max\{|x - z|, |y - w|\} \bigr)^2} \\
&= \sqrt{2}\, d_m^2((z, w), (x, y)).
\end{align*}
So if $d_m^2((z, w), (x, y)) < r/\sqrt{2}$, then $d_e^2((z, w), (x, y)) < r$ and hence $(z, w) \in A$. This shows that $(x, y)$ is an interior point of A in $(\mathbb{R}^2, d_m^2)$.

Now assume that $(x, y)$ is an interior point of A in $(\mathbb{R}^2, d_m^2)$. Choose $s > 0$ such that $d_m^2((z, w), (x, y)) < s$ implies $(z, w) \in A$. Since
\begin{align*}
d_m^2((z, w), (x, y)) &= \max\{|x - z|, |y - w|\} \\
&\le \sqrt{(x - z)^2 + (y - w)^2} \\
&= d_e^2((z, w), (x, y)),
\end{align*}
if $(z, w)$ is such that $d_e^2((z, w), (x, y)) < s$, then $d_m^2((z, w), (x, y)) < s$, and we must have $(z, w) \in A$. This implies that $(x, y)$ is an interior point of A in $(\mathbb{R}^2, d_e^2)$.

A careful examination of the above proof shows us the reason the interior of A is the same with respect to $d_e^2$ and $d_m^2$. This is indeed due to the inequalities
\[
d_m^2((z, w), (x, y)) \le d_e^2((z, w), (x, y)) \le \sqrt{2}\, d_m^2((z, w), (x, y)). \tag{6.9}
\]
We used the second inequality in the first part and the first inequality in the second part of the solution. The inequalities in (6.9) enable us to pass from a neighborhood of $(x, y)$ in $(\mathbb{R}^2, d_e^2)$ to one in $(\mathbb{R}^2, d_m^2)$, and vice versa. More precisely, every neighborhood in $(\mathbb{R}^2, d_e^2)$ contains a neighborhood in $(\mathbb{R}^2, d_m^2)$, and conversely. This is depicted in Figure 15. We now generalize the relation of $d_e^2$ and $d_m^2$ to the context of metric spaces in hopes of using it later.

Definition 6.31. Let X be a nonempty set, and let $d$ and $d_1$ be metrics on X. We say that $d$ and $d_1$ are equivalent if there exist positive real numbers $\alpha$ and $\beta$ such that for every $x, y \in X$, $\alpha\, d(x, y) \le d_1(x, y) \le \beta\, d(x, y)$.

With this definition, we have already verified that $d_e^2$ and $d_m^2$ are equivalent metrics on $\mathbb{R}^2$.

Exercise 6.32. Prove that on $\mathbb{R}^n$, $n > 1$, the metrics $d_m^n$ and $d_s^n$ are equivalent to the Euclidean metric $d_e^n$.
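The two-sided estimate (6.9) between the max metric and the Euclidean metric on the plane can be spot-checked numerically. The sketch below tests it on randomly generated pairs of points and then evaluates two pairs for which the left and right inequalities become equalities. The function names, the random sample, and the rounding tolerance are assumptions made for illustration; the check does not replace the argument in Example 6.30.

```python
import math
import random

def d2e(p, q):
    """Euclidean metric on the plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def d2m(p, q):
    """Max metric on the plane."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

def satisfies_6_9(p, q, tol=1e-12):
    """Check d_m <= d_e <= sqrt(2) * d_m, up to a small rounding tolerance."""
    m, e = d2m(p, q), d2e(p, q)
    return m <= e + tol and e <= math.sqrt(2) * m + tol

rng = random.Random(1)  # fixed seed for repeatability
points = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(200)]
print(all(satisfies_6_9(p, q) for p in points for q in points))

origin = (0.0, 0.0)
print(d2m(origin, (1.0, 0.0)), d2e(origin, (1.0, 0.0)))  # equality on the left of (6.9)
print(d2m(origin, (1.0, 1.0)), d2e(origin, (1.0, 1.0)))  # equality on the right of (6.9)
```

The two extreme pairs suggest that the constants 1 and √2 in (6.9) cannot be improved, which matches the picture of a disk inscribed in and circumscribed about a square.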


Figure 15. Every open disk contains an open rectangle and vice versa.

The proof of the following proposition is similar to the solution of Example 6.30. We therefore leave it to the reader.

Proposition 6.33. Let X be a nonempty set, and let $d$ and $d_1$ be equivalent metrics on X. Then for every $A \subseteq X$, $A^\circ$ is the same in the metric spaces $(X, d)$ and $(X, d_1)$.

Example 6.34. Let $(X, d)$ be a metric space and define $d'(x, y) = \min\{1, d(x, y)\}$. Prove that $d'$ is a metric on X. Is this equivalent to $d$?

Solution. The verification of the fact that $d'$ is a metric on X is left to the reader. See Exercise 10 at the end of this chapter. Although for every $x, y \in X$, $d'(x, y) \le d(x, y)$, the metric $d'$ is not equivalent to $d$ in general. For example, if we let $X = \mathbb{R}$ and $d = d_e$, then for no positive real number $\beta$ can the inequality
\[
d(x, y) \le \beta\, d'(x, y) \tag{6.10}
\]
be true for all $x, y \in X$. To see this, just consider $x = \beta$ and $y = 3\beta$. Nevertheless, if the range of $d$ is a bounded subset of $\mathbb{R}$, then $d'$ will be equivalent to $d$. To see this, suppose that $M > 1$ is such that for every $x, y \in X$, $d(x, y) < M$. Then for all such x and y, (6.10) holds with $\beta = M$.
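The counterexample in Example 6.34 can be checked with a few lines of code. The sketch below compares the usual distance on the real line with the truncated distance min{1, d(x, y)} at the pair x = β, y = 3β suggested in the solution, for several sample values of β. The names `d_e`, `d_trunc`, and `breaks_bound` are hypothetical, and the list of β values is an arbitrary choice.

```python
def d_e(x, y):
    """Usual distance on the real line."""
    return abs(x - y)

def d_trunc(x, y):
    """The truncated metric min{1, d_e(x, y)} of Example 6.34."""
    return min(1, d_e(x, y))

def breaks_bound(beta):
    """True if x = beta, y = 3*beta violates d_e(x, y) <= beta * d_trunc(x, y)."""
    x, y = beta, 3 * beta
    return d_e(x, y) > beta * d_trunc(x, y)

for beta in (0.1, 1, 10, 1000):
    print(beta, breaks_bound(beta))
```

The reason each value of β fails is visible in the code: the left-hand side equals 2β, while the right-hand side is at most β because the truncated distance never exceeds 1.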

6.3. Open and Closed Sets

Based on the various kinds of points we considered so far in metric spaces, we can classify the subsets of such spaces. In this connection, a central role is played by boundary points. To understand why, think of the meaning of boundary and the role it plays in determining the realm of countries. From a geographical point of view, the boundary (or border) of a country separates it from other countries and prevents people from going easily outside, or coming easily inside, the country. So, when we consider a country together with its boundary, we are dealing with a closed region. If we think of a country which has no boundary, we can say that this is an open country, where entrance and abundance may take place freely.

Open and closed regions can also be illustrated in the plane. See Figures 16 and 17 where instances of open and closed regions (or sets of points) in the plane are drawn. In Figure 17, the boundary of the set is drawn with a solid line to emphasize that the set includes its boundary points. In Figure 16 the boundary is depicted by a dashed line to indicate the fact that it is not actually part of the region.

Figure 16. An open region (or set of points) in the plane.

Figure 17. A closed region (or set of points) in the plane.

It is now easy to generalize our understanding of closed regions or sets from the plane to the metric space setting.

Definition 6.35. Let A be a subset of a metric space X. We say that
• A is closed if it contains all its boundary points,
• A is open if it contains none of its boundary points.

Open and closed sets were not defined in Chapter 3 and in the context of the Euclidean space $\mathbb{R}$, although they could be defined there. This is because they were unnecessary for our destinations in that chapter. Nevertheless, we touched them superficially in some examples and exercises. If you look at the exercises of Chapter 3 after finishing the current section, you will be able to find those exercises which are related to open and closed sets.


Example 6.36. Let $(X, \rho)$ be a discrete metric space. By Example 6.22, every subset of X is both open and closed.

Example 6.37. We saw in Example 6.24 that for the set $A = \{(x, y) \in \mathbb{R}^2 : x = y\}$ in the Euclidean space $\mathbb{R}^2$, $\partial A = A$. This shows that A is closed, and that it is not open.

Of course, unlike doors which are either open or closed, a set can be neither open nor closed. This is the content of the following example.

Example 6.38. In the Euclidean space $\mathbb{R}$, each interval of the form $[a, b)$ is neither open nor closed. This is because a is a boundary point of $[a, b)$, which is also an element of this set, while b is a boundary point of $[a, b)$ which lies outside of the set.

However, the conditions of being open and closed are the dual of each other in some sense.

Theorem 6.39. In an arbitrary metric space X, a set $A \subseteq X$ is open if and only if $A^c$ is closed.

Proof. Let A be open. If x is a boundary point of $A^c$, then x is also a boundary point of A. Since A is open, x is not an element of A or, equivalently, $x \in A^c$. This shows that $A^c$ is closed. Conversely, suppose that $A^c$ is closed. If x is a boundary point of A, then x is a boundary point of $A^c$. Since $A^c$ is closed, $x \in A^c$. Hence, x cannot be an element of A. This proves that A is open.

Example 6.40. In the Euclidean space $\mathbb{R}$, the interval $[0, 1]$ is closed. The set $(-\infty, 0) \cup (1, +\infty)$ is therefore open as the complement of $[0, 1]$.

Example 6.41. Since $\{(x, y) \in \mathbb{R}^2 : x = y\}$ is closed in the Euclidean space $\mathbb{R}^2$, in the same space $\{(x, y) \in \mathbb{R}^2 : x \neq y\}$ is an open set.

The following characterization of open and closed sets can be helpful in many situations.

Theorem 6.42. A subset A of a metric space X is
(1) open if and only if all its elements are interior points,
(2) closed if and only if it contains all its limit points.

Proof. By Theorem 6.21(4), the elements of a set are either interior points or boundary points. This proves (1). To prove (2), first assume that A contains all its limit points. If x is a boundary point of A such that $x \notin A$, then x is a limit point of A by Theorem 6.21(2), and we should have $x \in A$ by our assumption. This contradiction shows that A contains all its boundary points, and hence that A is a closed set. Now suppose that A is closed and consider some $x \in A'$. If $x \in A$, then there is nothing to prove. Otherwise, x is a boundary point of A by Theorem 6.21(1), and it must be an element of A by the fact that A is a closed set. Thus A contains all its limit points, as desired.


In other words, A is open if and only if $A = A^\circ$, and A is closed if and only if $A' \subseteq A$. As a result of the above theorem and Theorem 6.21(6), we find that finite sets are closed in any metric space.

Exercise 6.43. Show that in any metric space X, the sets $\emptyset$ and X are both closed and open.

Example 6.44. Let f and g be real-valued functions which are continuous on $[a, b]$. If $A = \{x \in [a, b] : f(x) = g(x)\}$, then we saw in Example 3.72 that $A' \subseteq A$. Thus, Theorem 6.42(2) tells us that A is actually a closed set.

What do Examples 3.72 and 6.44 say? Examples 3.72 and 6.44 say that the subset of $[a, b]$ on which two continuous real-valued functions agree is closed (as a subset of the Euclidean space $\mathbb{R}$).

Since by Proposition 6.33 the interior of sets remains unchanged if we replace a metric by some equivalent one, Theorem 6.42 gives us the following important result.

Corollary 6.45. If $d$ and $d_1$ are equivalent metrics on a set X, then the metric spaces $(X, d)$ and $(X, d_1)$ have the same collection of open sets.

It follows from Proposition 6.26 and Theorem 6.42(1) that every neighborhood in any metric space is an open set. Also, it follows from Proposition 6.26, Theorem 6.42(1), and Theorem 6.39 that in any metric space each set of the form $\{y \in X : d(y, x) \le \varepsilon\}$, with $x \in X$ and $\varepsilon > 0$ arbitrary, is closed. For this reason, we say that this is the closed neighborhood of x with radius $\varepsilon$. This set is denoted by $N_\varepsilon[x]$.

Some Set-Theoretic Observations.

Next, it is convenient to study the way the set-theoretic operations of union and intersection behave with respect to open and closed sets.

Theorem 6.46. In any metric space,
(1) an arbitrary union of open sets is open,
(2) a finite intersection of open sets is open,
(3) an arbitrary intersection of closed sets is closed, and
(4) a finite union of closed sets is closed.

Proof. (1) Let $\{A_\alpha\}_{\alpha \in I}$ be a family of open sets. If $x \in A := \bigcup_{\alpha \in I} A_\alpha$ is arbitrary, then there exists $\alpha_0 \in I$ such that $x \in A_{\alpha_0}$. Since $A_{\alpha_0}$ is open, we find some $r > 0$ such that $N_r(x) \subseteq A_{\alpha_0}$. But $A_{\alpha_0}$ is a subset of A, so the last inclusion shows that x is an interior point of A. This proves that A is open.

(2) Suppose that $\{B_i\}_{i=1}^n$ is a finite family of open sets and $x \in B := \bigcap_{i=1}^n B_i$. Then $x \in B_i$ for every $i \in \{1, \dots, n\}$ and since $B_i$ is open, we find $r_i > 0$ such that $N_{r_i}(x) \subseteq B_i$. Now, letting $r = \min\{r_1, \dots, r_n\}$, we see that $N_r(x) \subseteq B$, showing that x is an interior point of B. This tells us that B is open.


The truth of (3) and (4) follows from those of (1) and (2) and De Morgan's laws of set theory. We leave the details to the reader.

In items (2) and (4) of the above theorem, finite families may not be replaced by infinite ones. Below we give an example that manifests this fact for (2), and we leave it to the reader to find relevant examples for (4).

Example 6.47. In the Euclidean space $\mathbb{R}$, consider the sets $A_n = (-1/n, 1/n)$ for every $n \in \mathbb{N}$. Then, each $A_n$ is open, while $\bigcap_{n=1}^{\infty} A_n = \{0\}$, which is not open.

The following result follows from (1) of Theorem 6.42, (1) of the above theorem, and the fact that in any metric space neighborhoods are open.

Corollary 6.48. A subset of a metric space is open if and only if it is a union of neighborhoods.

What does Corollary 6.48 say? Corollary 6.48 says that we know the open subsets of a metric space whenever we know what the open neighborhoods are. For this reason, the corollary is usually described by saying that in any metric space, the collection of all neighborhoods forms a basis for the space. See Exercise 43 at the end of this chapter.

Closures and Cluster Points.

As defined in the beginning of this section, a subset A of a metric space is closed if it contains all its boundary points. This gives us an obvious way for obtaining closed sets from arbitrary subsets of a metric space: To obtain a closed set that contains A, we just need to adjoin the boundary points of A to this set, and form the set $A \cup \partial A$. We will denote this set by $\overline{A}$ and call it the closure of A. It will be instructive to look at Figures 16 and 17 once more. The set of points drawn in Figure 17 is the closure of that in Figure 16. This is because the former set is obtained by closing the boundary of the latter.

Finding the closure of sets is actually an art! Finding the closure of sets is a nice instance of an important ability of mathematicians, the ability of creating desirable things from given objects: By finding the closure of a set we make a closed set from it.

It is clear that A equals its closure, $A = \overline{A}$, if and only if A is closed.

Example 6.49. In the Euclidean space $\mathbb{R}$, $[a, b]$ is the closure of the intervals $(a, b)$, $(a, b]$, $[a, b)$, and $[a, b]$.

Example 6.50. In the Euclidean space $\mathbb{R}^2$, the closure of a neighborhood $N_\varepsilon(P)$, for arbitrary $P \in \mathbb{R}^2$ and $\varepsilon > 0$, is the closed neighborhood $N_\varepsilon[P]$. This follows from Example 6.27.

It is not true, however, that in any metric space the closure of a neighborhood is the closed neighborhood with the same center and radius. This is the content of the following example.


6. Basic Theory of Metric Spaces

Example 6.51. In a discrete metric space, the closure of a neighborhood $N_\varepsilon(x)$ is the neighborhood itself, which may be a proper subset of the closed neighborhood $N_\varepsilon[x]$.

As in the case of closedness, it is useful to find a characterization of closures in terms of limit points.

Theorem 6.52. Let A be an arbitrary subset of a metric space. Then $\overline{A} = A \cup A'$.

Proof. We should show that $A \cup \partial A = A \cup A'$. This can be easily seen by items (1) and (2) of Theorem 6.21 and our definition of the closure.

As a result of the above theorem, Example 3.9 and Example 3.4(1), we can compute the closure of some important subsets of the Euclidean space R.

Example 6.53. In the Euclidean space R,
• the closure of E = {1/n : n ∈ N} is E ∪ {0}, and
• $\overline{Q} = R$.

If you re-examine Example 3.4(1), you will see that the equality $Q' = R$, which implies that $\overline{Q} = R$, is a consequence of the density of the rational numbers in R: every neighborhood of every x ∈ R contains some element of Q. If E is as in the above example, then every neighborhood of every element of E ∪ {0} also contains some element of E. These observations lead us to the following theorem.

Theorem 6.54. Let A be a subset of some metric space X, and let x be an element of X. Then, the following are equivalent.
(1) The point x lies in the closure of A.
(2) Every neighborhood of x contains some element of A.

Proof. (1) ⇒ (2). If $x \in \overline{A}$, then at least one of the following cases holds for x.
• x ∈ A. Then every neighborhood of x contains at least one element of A, namely, x itself.
• $x \in A'$. Then every neighborhood of x contains at least one element of A other than x.
Thus, (2) follows in each case.

(2) ⇒ (1). If x is an element of A, then it is clear that $x \in \overline{A}$. If (2) is true and x ∉ A, then every neighborhood of x contains an element of A other than x. This shows that $x \in A'$, and hence that $x \in \overline{A}$.

When x is related to A as in Theorem 6.54(2), we say that x is a cluster point of A.
More precisely, x is a cluster point of A if for every ε > 0, $N_\varepsilon(x) \cap A \neq \emptyset$. With this terminology, the above theorem says that $\overline{A}$ is the set of all cluster points of A.

When the closure of A is the whole space X, that is, $\overline{A} = X$, Theorem 6.54 says that every neighborhood of every point of X contains some point of A. This shows

6.3. Open and Closed Sets


that the elements of A can be seen everywhere in X, as was the case for A = Q and X = R, and suggests that we may call A a dense subset of X.

Definition 6.55. Let X be a metric space, and let A ⊆ X. We say that A is dense in X if the closure of A is all of X, that is, $\overline{A} = X$.

Clearly, a proper dense subset of a metric space is not closed, as it cannot be equal to its closure.

Example 6.56. Let X be a metric space. Prove that a set A ⊂ X is dense in X if and only if A ∩ G ≠ ∅ for every nonempty open set G ⊂ X.

Solution. First assume that A ∩ G ≠ ∅ for every nonempty open set G ⊂ X. Since neighborhoods are open sets, the assumption yields that every x ∈ X is a cluster point of A. Thus A is dense in X. Conversely, if A is dense in X and G is a nonempty open subset of X, choose some x from G. Then x is an element of X and by the density assumption, x is a cluster point of A. The assumption that G is open gives us some ε > 0 such that $N_\varepsilon(x) \subseteq G$. Since $N_\varepsilon(x)$ must meet A in some point, the same will be true for G.

What does Example 6.56 say? Example 6.56 says that a subset A of a metric space X is dense in X if and only if A intersects every nonempty open set G ⊂ X.

Finally, we present a theorem that demonstrates the way the set-theoretic operations and relations behave in closures.

Theorem 6.57. Let A and B be subsets of a metric space.
(1) If A ⊆ B, then $\overline{A} \subseteq \overline{B}$.
(2) $\overline{A \cap B} \subseteq \overline{A} \cap \overline{B}$, and the inclusion may be strict.
(3) $\overline{A \cup B} = \overline{A} \cup \overline{B}$.
The proof is left as an exercise.

Open Subsets of the Euclidean Space R. Among the various metric spaces one considers, the Euclidean space R occupies an important place. This is, first of all, due to the fact that the space is the prototype of the notion of metric space. On the other hand, the space has some features which are not shared by generic metric spaces, and these allow one to prove results in the Euclidean space which are not generalizable to the abstract context.
In this subsection, we describe these results which follow from the existence of Q, as a countable dense subset of the Euclidean space. Theorem 6.58. Every open subset of the Euclidean space R is the union of a countable family of open intervals. Proof. Let G be an open subset of the Euclidean space R. Then, G is the union of a family of open intervals (Corollary 6.48), say {Iα }α∈A . For every x in some Iα , choose an open interval Jx,α with rational endpoints such that x ∈ Jx,α ⊂ Iα .


Since Q is countable, there are only countably many intervals $J_{x,\alpha}$. Denote these intervals by $J_i$, where i takes values in a countable set B. Then

\[ G = \bigcup_{\alpha \in A} I_\alpha = \bigcup_{i \in B} J_i, \]

proving the desired result.
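The key step in the proof of Theorem 6.58 is the choice of an interval with rational endpoints that contains x and sits inside a given open interval. A minimal Python sketch of one way to make that choice is given below; the function name `rational_subinterval` and the particular choice of denominator are illustrative assumptions, not part of the text.

```python
from fractions import Fraction
import math

def rational_subinterval(a, b, x):
    """Return rationals (p, q) with a < p < x < q < b, assuming a < x < b.

    Hypothetical helper: it picks a denominator n with 1/n smaller than the
    distance from x to either endpoint, then takes the neighbouring
    multiples of 1/n on each side of x.
    """
    if not (a < x < b):
        raise ValueError("x must lie in the open interval (a, b)")
    gap = min(x - a, b - x)
    n = math.floor(1 / gap) + 1             # guarantees 1/n < gap
    p = Fraction(math.ceil(x * n) - 1, n)   # a multiple of 1/n strictly below x
    q = Fraction(math.floor(x * n) + 1, n)  # a multiple of 1/n strictly above x
    return p, q

p, q = rational_subinterval(0, 2, math.sqrt(2))
```

Because p and q are at distance at most 1/n from x, and 1/n is smaller than the distance from x to a and to b, the interval (p, q) stays inside (a, b).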



Of course, more can be proved in the context of the above theorem: every open subset of the Euclidean space R is the union of a countable family of pairwise disjoint open intervals. This is Exercise 39 at the end of this chapter.

Example 6.59. If F is a family of pairwise disjoint open sets in the Euclidean space R, prove that F is countable.

Solution. Suppose that $\{x_n : n \in N\}$ is an enumeration of Q. Since $\overline{Q} = R$, we find that for every nonempty V ∈ F, V ∩ Q ≠ ∅. If V ∈ F is nonempty, let $n_V$ be the smallest element of N such that $x_{n_V} \in V$. Now, the disjointness assumption shows that $V \mapsto n_V$ is an injective mapping from the nonempty members of F into N. This shows that F is countable.

The key property of the Euclidean space R which helped us in the solution of the above example is that it has a countable dense subset, namely Q. The property of having a countable dense subset is worth considering in the metric space setting, as we will see in the exercises at the end of this chapter. We therefore give a specific name to spaces which satisfy this property.

Definition 6.60. A metric space (X, d) is said to be separable if it has a countable dense subset.

With this definition we can generalize the result of the above example to separable metric spaces (see Exercise 41 at the end of this chapter). The following example shows that the conclusion of the above example is not true for spaces that are not separable.

Example 6.61. If X is an uncountable set equipped with the discrete metric ρ, then F = {{x} : x ∈ X} is an uncountable family of pairwise disjoint open sets. Note that the only dense subset of X is X itself, which is uncountable by our assumption.

To learn more about separable spaces, see Exercises 40–45 at the end of this chapter.
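The injective mapping used in the solution of Example 6.59 can be tried out on a small family of disjoint open intervals. The sketch below is only an illustration under stated assumptions: the enumeration of Q produced by `rationals` is one arbitrary choice among many, and the family `family` is a made-up finite sample.

```python
from fractions import Fraction
from itertools import count

def rationals():
    """Enumerate Q without repetition: 0 first, then +-p/q ordered by p + q."""
    yield Fraction(0)
    seen = {Fraction(0)}
    for s in count(2):                 # s = p + q
        for p in range(1, s):
            r = Fraction(p, s - p)
            if r not in seen:          # skip unreduced repeats such as 2/2
                seen.add(r)
                yield r
                yield -r

def first_index(interval):
    """Smallest index n such that the n-th rational lies in the open interval."""
    a, b = interval
    for n, x in enumerate(rationals()):
        if a < x < b:
            return n

# A hypothetical family of pairwise disjoint open intervals.
family = [(0, 1), (1, 2), (Fraction(5, 2), 3), (-1, Fraction(-1, 2))]
indices = [first_index(V) for V in family]
```

The search in `first_index` terminates because every nonempty open interval contains a rational number, and disjoint intervals cannot receive the same index because the corresponding rational would then belong to both.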

6.4. Metric Subspaces

When (X, d) is a metric space and Y ⊂ X, the restriction of d to Y × Y defines a metric on Y, which we denote by $d|_Y$. The metric space $(Y, d|_Y)$ is said to be a metric subspace of (X, d).

Example 6.62. Let A = (0, 1). The space $(A, d_e|_A)$ is a subspace of $(R, d_e)$, the Euclidean space R. In this subspace, the distance of any pair x, y ∈ A is the same as the distance they have as elements of R. For instance, $d_e|_A(\tfrac{1}{3}, \tfrac{1}{2}) = d_e(\tfrac{1}{3}, \tfrac{1}{2}) = \tfrac{1}{6}$.


How can we make a metric subspace? Making a metric subspace of a metric space (X, d) means to replace X by a smaller set Y, in the sense of inclusion, and to keep the metric d unchanged.

Example 6.63. Every subspace of a discrete metric space is a discrete space.

If X and Y are as above, then it is quite natural to consider sets Z satisfying Z ⊂ Y ⊂ X. This raises some natural questions: If Z is open (resp., closed) as a set in the space (X, d), is it also open (resp., closed) in the space $(Y, d|_Y)$? What about the converse?

Recall that when Z is an open set in (X, d), for every z ∈ Z one finds ε > 0 such that x ∈ X and d(x, z) < ε imply x ∈ Z, while when Z is open in $(Y, d|_Y)$, for every z ∈ Z, ε > 0 can be found such that x ∈ Y and d(x, z) < ε imply x ∈ Z. The conditions x ∈ X in the former and x ∈ Y in the latter case make a real distinction between open sets in (X, d) and those in $(Y, d|_Y)$, as the following example shows.

Example 6.64. The set [0, 1) is not open in the Euclidean space R. Is it open as a set in the subspace [0, 2]?

Solution. That Z := [0, 1) is not open in $(R, d_e)$ is due to the fact that Z contains one of its boundary points, namely 0. Nevertheless, we claim that Z is open in the subspace Y := [0, 2]. To see this, consider two cases for z ∈ Z.
• 0 < z < 1. In this case define $\varepsilon = \tfrac{1}{2} \min\{z, 1 - z\}$.
• z = 0. Let $\varepsilon = \tfrac{1}{2}$.

Then observe that in either case, x ∈ Y and $d_e(x, z) < \varepsilon$ imply x ∈ Z. This shows that Z is open as a set in the subspace Y. Note that in the latter case, x ∈ Y excludes negative values for x, so that every x ∈ Y satisfying $d_e(x, z) < \varepsilon$ is an element of [0, 1/2) ⊂ Z. But, when we consider some x ∈ R satisfying $d_e(x, z) < \varepsilon$ (ε > 0 arbitrary), then x may be a negative number which lies outside of Z.

Thus, we observed that open subsets of a metric subspace $(Y, d|_Y)$ may not be open in the larger space (X, d). Nevertheless, we may determine the open subsets of $(Y, d|_Y)$ in terms of those of (X, d). To understand how, consider the previous example again. The set [0, 1), which we proved is open as a subset of [0, 2], is the intersection of an open subset of R, (−1, 1) for example, with [0, 2]: [0, 1) = (−1, 1) ∩ [0, 2]. The following theorem shows that this relation holds in general.

Theorem 6.65. Let (X, d) be a metric space, and let Z ⊂ Y ⊂ X. Then, Z is open in the subspace $(Y, d|_Y)$ if and only if there exists an open set G ⊆ X such that Z = G ∩ Y.

Proof. Suppose that Z is open in Y. Then for every z ∈ Z, there exists $\varepsilon_z > 0$ such that y ∈ Y and $d(y, z) < \varepsilon_z$ imply y ∈ Z. Define

\[ G := \bigcup_{z \in Z} \{x \in X : d(x, z) < \varepsilon_z\}. \tag{6.11} \]


Then G is an open subset of X (because each set in the union on the right-hand side of (6.11) is the $\varepsilon_z$-neighborhood of z in X, and a union of open sets is open) and Z = G ∩ Y. Now assume that

\[ Z = G \cap Y \tag{6.12} \]

for some open subset G of X. Then, every z ∈ Z is an element of G, and therefore there exists ε > 0 such that x ∈ X and d(x, z) < ε imply x ∈ G. Now, in view of (6.12), x ∈ Y and d(x, z) < ε imply x ∈ Z. This proves that Z is open in Y.
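The difference between being open in Y = [0, 2] and being open in R, as in Example 6.64, can be probed numerically. The following Python sketch checks the radii chosen in that example on a finite grid of rational points; the grid, the sample centers, and the helper names are assumptions made for illustration, and a finite check of this kind is evidence rather than a proof.

```python
from fractions import Fraction

def in_Z(x):
    """Membership in Z = [0, 1)."""
    return 0 <= x < 1

def in_Y(x):
    """Membership in Y = [0, 2]."""
    return 0 <= x <= 2

def radius(z):
    """The radius used in Example 6.64 for a point z of Z."""
    return Fraction(1, 2) if z == 0 else min(z, 1 - z) / 2

# A finite grid of rationals in [-1, 3] standing in for the real line.
grid = [Fraction(k, 100) for k in range(-100, 301)]

def ball_stays_in_Z(z, ambient):
    """Do all grid points of the ambient space within radius(z) of z lie in Z?"""
    eps = radius(z)
    return all(in_Z(x) for x in grid if ambient(x) and abs(x - z) < eps)

centers = [Fraction(k, 10) for k in range(0, 10)]   # sample points of Z
open_in_Y = all(ball_stays_in_Z(z, in_Y) for z in centers)
open_in_R_at_0 = ball_stays_in_Z(Fraction(0), lambda x: True)
```

Here `open_in_Y` succeeds because points of Y near 0 are nonnegative, while `open_in_R_at_0` fails because the grid contains negative points arbitrarily close to 0 at the grid's resolution.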

Figure 18. The gray region denotes C.

Example 6.66. Let $C = \{(x, y) \in R^2 : x^2 + y^2 < 1,\ 0 \le y \le 1/2\}$. Then C is not an open subset of the Euclidean space R². This is because every point on the line segment that joins $(-\tfrac{\sqrt{3}}{2}, \tfrac{1}{2})$ to $(\tfrac{\sqrt{3}}{2}, \tfrac{1}{2})$ (and every point on the line segment that joins (−1, 0) to (1, 0)), endpoints excluded, is a boundary point of C which lies in the set. (Can you verify this analytically? See Figure 18.) However, C is open as a subset of the subspace Y = [−2, 2] × [0, 1/2] because it is the intersection of the open disk

\[ N_1(\vec{0}) = \{(x, y) \in R^2 : x^2 + y^2 < 1\}, \]

which is open in the Euclidean space R², with Y.

A result similar to Theorem 6.65 can be established for closed sets.

Theorem 6.67. Let (X, d) be a metric space, and let Z ⊂ Y ⊂ X. Then Z is closed in the subspace $(Y, d|_Y)$ if and only if there exists a closed set F ⊆ X such that Z = F ∩ Y.

The proof of the above theorem is left as an exercise.

Example 6.68. In the subspace Y = (1, 3] of the Euclidean space R, A = (1, 2] is closed, because A = [−2, 2] ∩ Y and [−2, 2] is a closed subset of the Euclidean space R. Note that A is not closed as a set in the Euclidean space R.

6.5. Boundedness and Total Boundedness


One of the useful properties of the subsets of R, which was defined with the aid of the order relation <, is boundedness. […] a set A ⊂ R is bounded if and only if there exists M > 0 such that |x| < M for every x ∈ A. Although the last inequality still needs order, we may interpret this version of boundedness in terms of neighborhoods: a set A ⊂ R is bounded if and only if it can be entirely contained in a neighborhood $N_M(0)$ for a sufficiently large M. But this interpretation of boundedness is also not extendable to the metric space context, because it contains 0, whose existence depends on the algebraic operation of addition. To avoid the use of 0 in the definition of boundedness, we only need to consider the following characterization of bounded subsets of R, which we leave as an exercise.

Exercise 6.69. Prove that a set A ⊂ R is bounded if and only if there exist M > 0 and x ∈ R such that $A \subseteq N_M(x)$.

With this observation, it is now easy to generalize boundedness from R to the general metric space setting.

Definition 6.70. Let A be a subset of a metric space (X, d). We say that A is bounded if there exist M > 0 and x ∈ X such that $A \subseteq N_M(x)$.

Example 6.71. In a discrete metric space X, every set is bounded. This is because $N_2(x) = X$ for arbitrary x ∈ X, so that every set A ⊆ X satisfies $A \subseteq N_2(x)$.

Example 6.72. Prove that in the Euclidean space R², the set

\[ A = \left\{ \left(n, \tfrac{1}{n}\right) : n \in N \right\} \]

is not bounded.

Solution. We should show that for arbitrary $P = (x_1, x_2) \in R^2$ and given ε > 0, A is not contained in $N_\varepsilon(P)$. To see this, choose $n_0 \in N$ such that $n_0 > x_1 + \varepsilon$. Then $(n_0 - x_1)^2 > \varepsilon^2$ and so

\[ d_e\!\left(P, \left(n_0, \tfrac{1}{n_0}\right)\right) > \sqrt{\varepsilon^2 + \left(\tfrac{1}{n_0} - x_2\right)^2} \ge \sqrt{\varepsilon^2} = \varepsilon, \]

which proves what we desired. See Figure 19.
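The choice of $n_0$ in the solution of Example 6.72 is easy to test numerically. The sketch below is a hedged illustration: the function name `witness_outside` and the sample centers and radii are assumptions, and the rule used to pick $n_0$ is just one natural number exceeding $x_1 + \varepsilon$.

```python
import math

def witness_outside(P, eps):
    """Return a point Q = (n0, 1/n0) of A and its distance to P.

    n0 is chosen as a natural number with n0 > x1 + eps, as in the solution
    of Example 6.72; the particular formula for n0 is an illustrative choice.
    """
    x1, _ = P
    n0 = max(1, math.floor(x1 + eps) + 1)
    Q = (n0, 1 / n0)
    return Q, math.dist(P, Q)

for P, eps in [((0.0, 0.0), 1.0), ((10.5, -3.0), 100.0), ((-5.0, 2.0), 0.1)]:
    Q, distance = witness_outside(P, eps)
    print(P, eps, Q, distance > eps)
```

Only the first coordinate is used to pick $n_0$, which mirrors the observation that the unboundedness of A comes from the first component of its elements.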


Figure 19. The point $Q = (n_0, 1/n_0) \in A$ lies outside of $N_\varepsilon(P)$.

In the above example the unboundedness of A is due to the first component of its elements, namely, n. This was the reason we did not use the second component in our solution. Exercise 6.73. Prove that in any metric space, finite sets are bounded. Exercise 6.74. Let {xn } and {yn } be sequences of real numbers. Prove that B = {(xn , yn ) : n ∈ N} is a bounded subset of the Euclidean space R2 if and only if {xn } and {yn } are bounded sequences of real numbers. Bounded subsets of metric spaces can also be characterized using the useful notion of diameter, defined below. Definition 6.75. Let A be a nonempty subset of a metric space (X, d). The diameter of A is the extended real number diam(A) = sup{d(x, y) : x, y ∈ A}. Example 6.76. In the Euclidean space R, let A = {1/n : n ∈ N}, B = (0, 1), C = [0, 1), D = [0, 1], E = [0, +∞), and F = (−∞, 0). Then diam(A) = diam(B) = diam(C) = diam(D) = 1, while diam(E) = diam(F ) = +∞. Example 6.77. In any metric space, the diameter of a set is zero if and only if the set is singleton. Example 6.78. In a discrete metric space, the diameter of every set with more than one element is 1. In Example 6.76, the diameter of bounded subsets of R was a real number, while that of unbounded sets was +∞. The following theorem shows that this is true in general.


Theorem 6.79. Let A be a subset of a metric space (X, d). Then A is bounded if and only if diam(A) < +∞.

Proof. If A is bounded, then there exist M > 0 and x ∈ X such that $A \subseteq N_M(x)$. So, if y, z ∈ A, then d(y, z) ≤ d(y, x) + d(x, z) < 2M. This proves that diam(A) ≤ 2M < +∞. Conversely, suppose that diam(A) = α < +∞. If α = 0, then A is a singleton, which is clearly bounded. Otherwise, choose some x from A and note that for every z ∈ A, d(z, x) ≤ diam(A) = α. Hence A is a subset of $N_{2\alpha}(x)$, meaning that A is bounded.
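For a finite set the supremum in the definition of the diameter is a maximum, so both the diameter and the bound diam(A) ≤ 2M from the proof above can be computed directly. The following Python sketch does this for a few made-up sets; the function names and the sample points are assumptions for illustration only.

```python
from itertools import combinations
import math

def diameter(points, d):
    """Diameter of a finite nonempty set: the largest pairwise distance."""
    pts = list(points)
    if not pts:
        raise ValueError("the diameter is defined here for nonempty sets only")
    return max((d(x, y) for x, y in combinations(pts, 2)), default=0)

def discrete(x, y):
    """The discrete metric: distinct points are at distance 1."""
    return 0 if x == y else 1

plane_points = [(0, 0), (3, 4), (-1, 2)]
diam_plane = diameter(plane_points, math.dist)

# If A lies in N_M(x), the proof of Theorem 6.79 gives diam(A) <= 2M.
center = plane_points[0]
M = max(math.dist(center, p) for p in plane_points) + 1
bound_holds = diam_plane <= 2 * M
```

With the discrete metric the same function returns 1 for any set with more than one element and 0 for a singleton, in line with Examples 6.77 and 6.78.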



On the boundedness of a set and that of its closure. The way we get from a set A to its closure suggests that A and $\overline{A}$ should have the same boundedness situation. To see this, look at Figures 16 and 17 again. As we mentioned before, the set depicted in the latter figure is the closure of that of the former. Here, the only things the closure has, in addition to the points of the original set, are a number of boundary points. But the boundary points adhere to the set, so that adjoining them seems not to change the boundedness situation.

Lemma 6.80. Let A be a subset of a metric space. Then $\operatorname{diam}(A) = \operatorname{diam}(\overline{A})$.

Proof. Since $A \subseteq \overline{A}$, it follows from the properties of the supremum that $\operatorname{diam}(A) \le \operatorname{diam}(\overline{A})$. To obtain the desired result, we show that for any given ε > 0,

\[ \operatorname{diam}(\overline{A}) \le \operatorname{diam}(A) + \varepsilon. \tag{6.14} \]

Consider arbitrary elements x and y of $\overline{A}$. By Theorem 6.54 there exist $x_0, y_0 \in A$ such that $d(x_0, x)$ and $d(y_0, y)$ are less than ε/2. Using the triangle inequality two times, we obtain

\[ d(x, y) \le d(x, x_0) + d(x_0, y) \le d(x, x_0) + d(x_0, y_0) + d(y_0, y) < \operatorname{diam}(A) + \varepsilon. \]

Inequality (6.14) now follows from this.
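The passage from the pointwise estimate to the equality of the two diameters can be spelled out as follows. This is only a sketch of the omitted routine step, written under the assumption that diam(A) is finite (when it is +∞ the equality is immediate from the first inequality of the proof).

```latex
% For every pair x, y in the closure we have d(x,y) < diam(A) + epsilon,
% so the supremum over all such pairs satisfies the non-strict inequality:
\[
\operatorname{diam}(\overline{A})
  = \sup\{\, d(x,y) : x, y \in \overline{A} \,\}
  \le \operatorname{diam}(A) + \varepsilon .
\]
% Since epsilon > 0 was arbitrary, letting epsilon tend to 0 gives
\[
\operatorname{diam}(\overline{A}) \le \operatorname{diam}(A),
\]
% and together with diam(A) <= diam(closure of A) this yields
\[
\operatorname{diam}(\overline{A}) = \operatorname{diam}(A).
\]
```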



Corollary 6.81. Let A be a subset of a metric space. Then A is bounded if and only if $\overline{A}$ is bounded.

Proof. This follows from the above lemma and Theorem 6.79.

Exercise 6.82. Is it true that “A is bounded if and only if A° is bounded”?




Total Boundedness. Let a and b be real numbers satisfying a < b, and let ε > 0 be given. Then, we can cover the interval (a, b) with a finite number of neighborhoods, each of radius ε. To see this, just choose n ∈ N so large that nε > b − a, the length of the interval (a, b), and consider the ε-neighborhoods of $a + k\tfrac{\varepsilon}{2}$ for k = 1, . . . , 2n. Then, it is obvious that (a, b) is a subset of the union of these neighborhoods:

\[ (a, b) \subset \bigcup_{k=1}^{2n} \left( a + \left(\tfrac{k}{2} - 1\right)\varepsilon,\; a + \left(\tfrac{k}{2} + 1\right)\varepsilon \right). \]
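The covering just displayed can also be checked on sample points. The Python sketch below builds the centers $a + k\varepsilon/2$ and tests the inclusion on a finite set of rational points of (a, b); the helper names and the sample values a = 0, b = 1, ε = 3/10 are assumptions chosen for illustration.

```python
import math
from fractions import Fraction

def centers(a, b, eps):
    """Centers a + k*eps/2 for k = 1, ..., 2n, where n is chosen with n*eps > b - a."""
    n = math.floor((b - a) / eps) + 1
    return [a + k * eps / 2 for k in range(1, 2 * n + 1)]

def covered(x, cs, eps):
    """Does x lie in the eps-neighborhood of at least one center?"""
    return any(abs(x - c) < eps for c in cs)

a, b, eps = Fraction(0), Fraction(1), Fraction(3, 10)
cs = centers(a, b, eps)
sample = [Fraction(j, 1000) for j in range(1, 1000)]   # points of (0, 1)
all_covered = all(covered(x, sample_center_list := cs, eps) for x in sample)
```

Exact rational arithmetic is used so that the strict inequalities are not blurred by rounding.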

It will be instructive to prove this inclusion set-theoretically, and to depict (a, b) together with the neighborhoods to find a geometric insight. Since every bounded subset of the Euclidean space R can be contained in some interval (a, b), the same conclusion is true for bounded subsets of this space. Given any ε > 0, every bounded subset of the Euclidean space R can be covered by a finite number of neighborhoods, each of radius ε. Now, it is appropriate to see if this last statement is generalizable to the metric space setting. More precisely, we are interested in the following question. Is it true that every bounded subset of a metric space can be covered by a finite number of neighborhoods of radius ε, for every ε > 0? The following example shows that the answer is negative. Example 6.83. Let X be an infinite set equipped with the discrete metric ρ. If A ⊆ X is infinite, then A is bounded (see Example 6.71 or Example 6.78), but it cannot be covered by a finite number of neighborhoods, each of radius ε = 1/2. Hence, the property we considered above is not equivalent to boundedness. We therefore need to give it a new name. Definition 6.84. Let A be a subset of a metric space X. We say that A is totally bounded if for every ε > 0, A can be covered by a finite number of neighborhoods, each of radius ε. More precisely, A is totally bounded if for every ε > 0, there exist x1 , . . . , xn ∈ X such that A ⊆ Nε (x1 ) ∪ · · · ∪ Nε (xn ). With this definition, our above argument shows that bounded subsets of the Euclidean space R are totally bounded. Nevertheless, Example 6.83 shows that bounded subsets of metric spaces may not be totally bounded in general. Of course, total boundedness is stronger than boundedness, as the following theorem shows. Theorem 6.85. In any metric space, totally bounded sets are bounded. Proof. Let A be a totally bounded subset of a metric space (X, d). Find x1 , . . . , xn ∈ X such that A ⊆ N1 (x1 ) ∪ · · · ∪ N1 (xn ).

Notes on Essence and Generalizability


Let r = max{d(xj , x1 ) : j = 2, . . . , n}. If y ∈ A, find i ∈ {1, . . . , n} such that y ∈ N1 (xi ). Then, d(y, x1 ) ≤
0, the elements of the set {y ∈ X : d(y, x) > ε} are all interior points.
3. Prove Proposition 6.28.
4. Prove Theorem 6.29.
5. Prove Proposition 6.33.
6. Complete the proof of Theorem 6.46.
7. Prove Theorem 6.57.
8. Prove Theorem 6.67.
9. Let X be a nonempty set, and let d : X × X → R be a function with the following properties.
(a) d(x, y) = 0 if and only if x = y.
(b) For every x, y and z in X, d(x, y) ≤ d(x, z) + d(y, z).
Show that d is a metric on X.
10. Let (X, d) be a metric space and define d′(x, y) = min{1, d(x, y)}. Prove that d′ is a metric on X. We encountered this distance function in Example 6.34. Show that d and d′ induce the same collection of open sets. More precisely, prove that a subset A of X is open in (X, d) if and only if it is open in (X, d′). Note that nevertheless, d′ may not be equivalent to d.
11. Let (X, d) be a metric space, and let n > 1 be a natural number. Define $d_{1/n}(x, y) = \sqrt[n]{d(x, y)}$. Verify that $d_{1/n}$ is a metric on X. Is this equivalent to d?
12. Let S denote the set of all sequences of real numbers, and define d : S × S → R by

\[ d(\{x_n\}, \{y_n\}) = \sum_{n=1}^{\infty} \frac{|x_n - y_n|}{2^n (1 + |x_n - y_n|)}. \]

Verify that d is a metric on S. Then, observe that the diameter of S with respect to d is 1. Define d′ on S similar to d, with $2^n$ replaced with $n^2$. Prove that d′ is also a metric on S. Can you determine the diameter of S with respect to d′?
13. Let (X, d) be a metric space and n ∈ N. Show that

\[ d_n(x, y) = \frac{d(x, y)}{1 + n\, d(x, y)} \]

defines a metric on X. Is $d_n$ equivalent to d?

Exercises


14. Let (X, d) be a metric space. Prove that for every ε > 0, a metric d′, equivalent to d, can be defined on X such that d′(x, y) < ε for every x, y ∈ X.
15. Suppose that d and d′ are metrics on a set X. Show that d*(x, y) = max{d(x, y), d′(x, y)} is also a metric on X. What happens if we replace max with min in the definition of d*?
16. Let $d_e$ and ρ denote the restriction of the Euclidean metric of R to N and the discrete metric on N, respectively. Prove that the spaces $(N, d_e)$ and (N, ρ) have the same collection of open sets and that, nevertheless, $d_e$ and ρ are not equivalent metrics on N.
17. Verify that on $R^n$, the Euclidean metric $d_e^n$ and the discrete metric ρ are not equivalent.

Let (X, d) be a metric space, let A and B be nonempty subsets of X, and let x ∈ X. Define the distance of x and A by d(x, A) = inf{d(x, y) : y ∈ A}, and the distance of A and B by

\[ d(A, B) = \inf\{d(a, b) : a \in A,\ b \in B\}. \tag{6.15} \]

In Exercises 18–23 below, we discuss these notions.
18. In the Euclidean space R, let A = {1/n : n ∈ N} and x = 0. Find d(x, A).
19. In the Euclidean space R, let B = {2, 3, 4, . . .} and C = {n + 1/n : n ∈ B}. Find d(B, C).

20. If, in the general context, d(x, A) = 0, is it necessarily true that x ∈ A? If d(A, B) = 0, does it necessarily follow that A ∩ B ≠ ∅?
21. Let P*(X) denote the set of all nonempty subsets of X. Is the function d : P*(X) × P*(X) → R defined by (6.15) a metric on P*(X)?
22. Prove that for every A ⊂ X, $\overline{A} = \{y \in X : d(y, A) = 0\}$.
23. Show that for all nonempty subsets A and B of X, inf{d(x, A) : x ∈ B} = inf{d(x, B) : x ∈ A}.
24. Let (X, d) be a metric space and let A and B be disjoint closed subsets of X. Show that there exist disjoint open sets U and V such that A ⊂ U and B ⊂ V. Because of this separation property, which is stronger than the one presented in Theorem 6.19, we say that metric spaces are normal.
Hint. If A and B are nonempty, for every a ∈ A and b ∈ B, define $r_a = d(a, B)$ and $s_b = d(b, A)$. Then $r_a, s_b > 0$, and we may consider

\[ R_a = N_{r_a/3}(a), \qquad S_b = N_{s_b/3}(b). \]

Verify that $U = \bigcup_{a \in A} R_a$ and $V = \bigcup_{b \in B} S_b$ are what we were looking for.

25. Let x be a limit point of a set A in a metric space (X, d). If B is any finite subset of A, prove that x is also a limit point of A\B.


26. In the Euclidean space R², find the sets of all interior, boundary, limit, and isolated points of the following sets.
(a) $E = \{(x, y) \in R^2 : \tfrac{x^2}{4} + \tfrac{y^2}{9} < 1\}$.
(b) F = {(x, y) ∈ R² : y > x}.
(c) G = {(x, y) ∈ R² : xy > 1}.
27. Suppose that (X, d) is a metric space which has a finite open set. Prove that every subset of X is open.
28. Prove that in a metric space X, every subset of X is open if and only if no subset of X has a limit point in X.
29. Suppose that A and B are subsets of a metric space. If A is closed and A° = B° = ∅, show that (A ∪ B)° = ∅.
30. Show, by means of an example, that the union of an infinite family of closed sets is not necessarily closed.
31. Prove that $\{n + \tfrac{1}{2^n} : n \in N\}$ is a closed subset of the Euclidean space R.
32. Find the closure of the set

\[ \left\{ \frac{1}{m} + \frac{1}{n^2} : m, n \in N \right\} \]



in the Euclidean space R. 33. In the Euclidean space R, let A be a proper subset of R which contains Q. Can A be a closed set? Why? 34. Consider A = {(x, y) ∈ R2 : x2 + y 2 > 0} as a subset of the Euclidean space R2 . Is A closed? Is it open? Why? 35. In the Euclidean space R2 , show that

  1 1 2n + m, n : m, n ∈ N 2n 2 2 + 2m is not a closed set.
36. Let A be a subset of a metric space.
(a) Show that $\overline{A}$ is the smallest closed set that contains A.
(b) Show that A° is the largest open set which is contained in A.
(c) Prove that the sets $A'$ and ∂A are closed.
(d) Verify that $\partial A = \overline{A} \cap \overline{(A^c)} = \overline{A} \setminus A^\circ$.
(e) Show that $\overline{(A^c)} = (A^\circ)^c$ and $(\overline{A})^c = (A^c)^\circ$.
37. Let A be an arbitrary subset of a metric space X. If U is an open subset of X, prove that $U \cap \overline{A} \subseteq \overline{U \cap A}$.
38. Let A and B be dense subsets of a metric space X. If A is open, show that A ∩ B is also dense in X. Show by means of an example that when A and B are not open, A ∩ B may not be dense.
39. Prove that every open subset of the Euclidean space R is the union of a countable family of pairwise disjoint open intervals.


In Exercises 40–45 below, we explore separable metric spaces.
40. Show that $Q^n$ is dense in the Euclidean space $R^n$. Note that $Q^n$ is countable and, hence, $R^n$ is a separable space.
41. If (X, d) is a separable metric space and F is a family of pairwise disjoint open sets in X, prove that F is countable. Hint. The proof is similar to the solution of Example 6.59.
42. If (X, d) is a separable metric space, prove that the set of all isolated points of X is countable.

To understand the next exercise, you need to know what is meant by a basis. A family $\{V_i\}_{i \in I}$ of open sets in a metric space (X, d) is said to be a basis for the space if every open set G ⊆ X can be written as the union of a subfamily of $\{V_i\}_{i \in I}$. For example, in any metric space, the family of all neighborhoods is a basis, as we observed in Corollary 6.48; in any discrete metric space the family of all singletons is a basis, etc.

43. Prove that a metric space (X, d) is separable if and only if it has a countable basis. Hint. If $A = \{x_n : n \in N\}$ is a countable dense set, prove that $\{N_{1/m}(x_n) : m, n \in N\}$ is a basis for X. Conversely, if $\{V_n : n \in N\}$ is a countable basis for X, for each n choose $x_n$ from $V_n$ and show that $\{x_n : n \in N\}$ is dense in X.
44. Prove that every totally bounded metric space is separable. Hint. For each n ∈ N, a finite number of neighborhoods of radius 1/n covers X. Let $C_n$ denote the set of all such neighborhoods. Then show that $C = \bigcup_{n=1}^{\infty} C_n$ is a basis for X.
45. Let X be a metric space in which every infinite subset has a limit point. Show that X is separable.
46. Consider Y = [0, 1] ∪ (2, 4) as a subspace of the Euclidean space R. Show that A = [0, 1] is both open and closed in this space.
47. Consider Y = [0, 1] as a subspace of the Euclidean space R, and let

\[ A = \left\{ \frac{m}{2^n} : m, n \in N,\ 1 \le m \le 2^n - 1 \right\}. \]

Prove that A is dense in Y.
48. Consider Y = [0, +∞) as a subspace of the Euclidean space R, and let

\[ A = \left\{ \frac{m}{2^n} : m, n \in N \right\}. \]

Prove that A is dense in Y.
49. Give an example of a metric space in which every set is totally bounded.
50. Is every bounded subset of the Euclidean space $R^n$ totally bounded? Why?
51. If A is a totally bounded subset of a metric space, show that $\overline{A}$ is also totally bounded.
52. Show that a metric space X is totally bounded if and only if every infinite subset of X contains distinct points which are arbitrarily close to each other.
53. Suppose that X is a totally bounded metric space and A ⊂ X. Show that A is totally bounded in every subspace Z of X which contains A. More generally, prove that A ∩ Y is totally bounded in every subspace Y of X.


54. Let X be a metric space. Prove that the union of every finite collection of totally bounded subsets of X is totally bounded.

A subset A of a metric space X is said to be nowhere dense if $(\overline{A})^\circ = \emptyset$.

55. Verify that in the Euclidean space R, the set {1/n : n ∈ N} is nowhere dense.
56. Prove that in a discrete metric space, the only nowhere dense set is ∅.
57. Show that in any metric space, the closure of a nowhere dense set is nowhere dense.
58. Show that in any metric space, every subset of a nowhere dense set is also nowhere dense.
59. Prove that a finite union of nowhere dense subsets of a metric space is nowhere dense.

Let X be a nonempty set, and let d : X × X → R be a function which satisfies all the properties of a metric, with the exception that d(x, y) = 0 does not imply x = y. We say that d is a pseudometric on X.

60. Suppose that C is the set of all convergent sequences of real numbers. Define d : C × C → R by

\[ d(\{x_n\}, \{y_n\}) = \left| \lim_{n \to \infty} (x_n - y_n) \right|. \]

Prove that d is a pseudometric on C.
61. If X is a nonempty set and d is a pseudometric on X, define a relation E on X by xEy if and only if d(x, y) = 0. Observe that E is an equivalence relation on X. For every x ∈ X, let $x/E$ denote the equivalence class of x modulo E, and let $X/E$ be the set of all such equivalence classes. Define a function $\tilde{d} : X/E \times X/E \to R$ by

\[ \tilde{d}\left(x/E,\ y/E\right) = d(x, y). \]

Show that $\tilde{d}$ is a metric on $X/E$.

Chapter 7

Sequences in General Metric Spaces

As we mentioned in Chapter 2, the notion of sequence has meaning in any nonempty set X. But if we want to talk about the convergence or divergence of sequences in X, we need to have a distance function. Our aim in this chapter is to generalize the theory of real sequences to a theory for sequences in general metric spaces. We will see that some of the aspects of real sequence theory are generalizable, while some others are not. An instance of the latter aspects is the theory of real series, because the series are defined using addition, and this is not meaningful in arbitrary metric spaces. This is why the word series is not contained in the title of this chapter.

7.1. Convergence and Divergence in Metric Spaces

Based on our experience with real sequences, we can immediately generalize the concept of convergence to the context of general metric spaces. This is a straightforward generalization, as we only need to replace the Euclidean distance function $d_e$ by a general one d.

Definition 7.1. Let $\{x_n\}$ be a sequence in a metric space (X, d), and let x ∈ X. We say that $\{x_n\}$ converges to x, and we write $\lim_{n \to \infty} x_n = x$, if the following statement is true.

(MC) For every ε > 0, we can find N ∈ N such that $d(x_n, x) < \varepsilon$ for every n ≥ N.

(We used MC as an abbreviation for metric space convergence.) When a sequence fails to converge to any point of the space, we say that it is divergent.

As we will see shortly, and as (MC) suggests, the convergence or divergence of $\{x_n\}$ depends strongly on the distance function. For this reason,


when (MC) is true, we sometimes say that $\{x_n\}$ converges to x with respect to d or that it converges to x in (X, d).

Exercise 7.2. Suppose $\{x_n\}$ is a sequence in a metric space (X, d). Verify that $\{x_n\}$ converges to some x ∈ X if and only if

\[ \lim_{n \to \infty} d(x_n, x) = 0. \]

What does (MC) say? The statement (MC) has an interpretation similar to that of the convergence of real sequences. This can be stated as follows. The terms xn will be as close to x as we wish, provided that the index n is sufficiently large. But this time, closeness is expressed in terms of the generic distance function d. The following examples show the diversity of contexts in which convergence can be considered. Example 7.3. Let X be any nonempty set equipped with the discrete metric. If {xn } is a sequence in X and x ∈ X, prove that {xn } converges to x if and only if there exists N ∈ N such that xn = x for every n ≥ N . Solution. It is clear that when xn = x for all sufficiently large n, then {xn } converges to x. This is indeed true in any metric space. To prove the converse, note that when {xn } converges to x, for ε = 1/2 we can find N ∈ N such that for every n ≥ N , xn ∈ Nε (x) = {x}. This completes the proof. What does Example 7.3 say? Example 7.3 says that a sequence in a discrete metric space is convergent if and only if it is constant for all sufficiently large indices. As a result of this example we see that the sequence {1/n}, which converges to 0 in the Euclidean space R, is divergent when we equip R with the discrete metric. This shows that the convergence or divergence of a sequence depends on the metric we consider on the underlying set. Our next example shows that the convergence situation may change if we fix the metric and change the underlying set. Example 7.4. The sequence {1/n} converges to 0 in the Euclidean space R. If we consider this as a sequence in the subspace Y = (0, 1], then the sequence is divergent in Y , simply because 0 is not an element of Y . In our next two examples, we examine the convergence of sequences in the Euclidean space Rn and in spaces C([a, b]).
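Before moving on, the dependence of convergence on the metric noted after Example 7.3 can be illustrated with the sequence {1/n}. The Python sketch below checks the condition in (MC) on a finite range of indices only, so it is an illustration rather than a proof; the helper `tail_within` and the chosen values of ε, N and the upper index are assumptions.

```python
from fractions import Fraction

def euclid(x, y):
    """The Euclidean metric on R."""
    return abs(x - y)

def discrete(x, y):
    """The discrete metric: distinct points are at distance 1."""
    return 0 if x == y else 1

def tail_within(seq, limit, d, eps, N, upto):
    """Check d(x_n, limit) < eps for N <= n <= upto (a finite check only)."""
    return all(d(seq(n), limit) < eps for n in range(N, upto + 1))

def seq(n):
    return Fraction(1, n)

# Euclidean metric: for eps = 1/100 the index N = 101 works on the tested range.
ok_euclid = tail_within(seq, 0, euclid, Fraction(1, 100), 101, 5000)
# Discrete metric: for eps = 1/2 no tested tail stays within eps of 0.
ok_discrete = any(
    tail_within(seq, 0, discrete, Fraction(1, 2), N, 5000) for N in range(1, 200)
)
```

In the discrete case every term 1/n differs from 0, so its distance to 0 is 1 for every n, which is why no choice of N can work.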


Example 7.5. Let $\{\vec{x}_m\}$ be a sequence in $\mathbb{R}^n$, where $\vec{x}_m = (x_{m1}, \dots, x_{mn})$ for each $m$. If $\vec{x} = (x_1, \dots, x_n)$, then prove that $\lim_{m\to\infty} \vec{x}_m = \vec{x}$ with respect to the Euclidean metric of $\mathbb{R}^n$ if and only if $\lim_{m\to\infty} x_{mi} = x_i$ for each $i \in \{1, \dots, n\}$, with respect to the Euclidean metric of $\mathbb{R}$.

Solution. Let $\varepsilon > 0$ be given. If $\lim_{m\to\infty} \vec{x}_m = \vec{x}$, then there exists $N \in \mathbb{N}$ such that for every $m \ge N$,
\[ d_e^n(\vec{x}_m, \vec{x}) = \sqrt{(x_{m1} - x_1)^2 + \cdots + (x_{mn} - x_n)^2} < \varepsilon. \]
So, for every $i \in \{1, \dots, n\}$ and every $m \ge N$,
\[ d_e(x_{mi}, x_i) = |x_{mi} - x_i| = \sqrt{(x_{mi} - x_i)^2} \le d_e^n(\vec{x}_m, \vec{x}) < \varepsilon. \]
This proves that for each $i \in \{1, \dots, n\}$, $\lim_{m\to\infty} x_{mi} = x_i$.

To prove the converse, assume that for every $i \in \{1, \dots, n\}$, $\lim_{m\to\infty} x_{mi} = x_i$. Then for each $i$ there exists $N_i \in \mathbb{N}$ such that $m \ge N_i$ implies
\[ d_e(x_{mi}, x_i) = |x_{mi} - x_i| < \frac{\varepsilon}{n}. \]
Then Example 1.63 shows that for every $m \ge N := \max\{N_1, \dots, N_n\}$,
\[ d_e^n(\vec{x}_m, \vec{x}) \le \sum_{i=1}^{n} |x_{mi} - x_i| < n \cdot \frac{\varepsilon}{n} = \varepsilon, \]
which proves that $\lim_{m\to\infty} \vec{x}_m = \vec{x}$.
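Example 7.7. Let $\{f_n\}$ be a sequence in $C([a, b])$ and let $f \in C([a, b])$. Prove that $\{f_n\}$ converges to $f$ with respect to the metric $d_u$ if and only if the following statement is true.

(UCO) For every $\varepsilon > 0$, we can find $N \in \mathbb{N}$ such that for every $n \ge N$ and every $x \in [a, b]$, $|f_n(x) - f(x)| < \varepsilon$.

(We used UCO as an abbreviation for uniform convergence.)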


In particular, if $\{f_n\}$ converges to $f$ with respect to $d_u$, then $\lim_{n\to\infty} f_n(x) = f(x)$ for every $x \in [a, b]$.

Solution. Note that $\{f_n\}$ converges to $f$ with respect to the metric $d_u$ if and only if for every $\varepsilon > 0$, there exists $N \in \mathbb{N}$ such that for every $n \ge N$,
\[ d_u(f_n, f) = \sup\{|f_n(x) - f(x)| : x \in [a, b]\} < \varepsilon. \]
That the above statement is equivalent to (UCO) can be easily verified using the properties of the supremum. It is also clear that (UCO) implies the convergence of $\{f_n(x)\}$ to $f(x)$ in the space $(\mathbb{R}, d_e)$, for every $x \in [a, b]$.

Remark 7.8. If we want to formally describe that $\{f_n(x)\}$ converges to $f(x)$ for every $x \in [a, b]$, then we may write the following. For every $\varepsilon > 0$ and every $x \in [a, b]$ there exists $N_x \in \mathbb{N}$ such that for every $n \ge N_x$, $|f_n(x) - f(x)| < \varepsilon$. The subscript $x$ in $N_x$ shows that this natural number depends not only on $\varepsilon$ but also on $x$. The statement (UCO) therefore presents a stronger notion of convergence for the sequence $\{f_n\}$, to which we will refer, in Chapter 9, as uniform convergence. The subscript $u$ in $d_u$ refers to uniform, and the metric $d_u$ will frequently be called the uniform metric. Note that in (UCO), there is some $N \in \mathbb{N}$ that works for all $x \in [a, b]$ in the formal definition of $\lim_{n\to\infty} f_n(x) = f(x)$. When this last equality holds for all $x \in [a, b]$, we say that $\{f_n\}$ converges to $f$ pointwise on $[a, b]$.

Exercise 7.9. For $n \in \mathbb{N}$ and $x \in [0, 1]$ define $f_n(x) = x^n$. Prove that $\{f_n\}$ converges to the function
\[ f(x) = \begin{cases} 0 & 0 \le x < 1, \\ 1 & x = 1 \end{cases} \]
pointwise on $[0, 1]$. Does $\{f_n\}$ converge to $f$ with respect to the uniform metric of $C([0, 1])$?

From now on, we will try to generalize some of the basic facts that we proved for real sequences to the context of metric spaces. The following is the first instance of such generalizations.

Proposition 7.10. In any metric space a convergent sequence has a unique limit.

The proof is similar to that of Proposition 2.12, where we proved that the limit of real sequences is unique. The details are therefore left to the reader as an exercise.
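Returning to Exercise 7.9, the difference between pointwise behavior and the size of $\sup |f_n - f|$ can be explored numerically before attempting a proof. The sketch below is only a rough experiment under our own assumptions: the helper names and the grid size are arbitrary choices, and a maximum over a finite grid only bounds the true supremum from below.

```python
def f_n(n, x):
    """The function f_n(x) = x**n on [0, 1]."""
    return x ** n

def f_limit(x):
    """Pointwise limit: 0 on [0, 1) and 1 at x = 1."""
    return 1.0 if x == 1.0 else 0.0

def sup_gap_on_grid(n, m=10**5):
    """Largest value of |f_n(x) - f(x)| over the grid x = i/m, i = 0, ..., m.

    This is only a lower estimate of the supremum over [0, 1].
    """
    return max(abs(f_n(n, i / m) - f_limit(i / m)) for i in range(m + 1))

pointwise = [f_n(n, 0.5) for n in (1, 10, 100)]    # values at the fixed point x = 1/2
gaps = [sup_gap_on_grid(n) for n in (1, 10, 100)]  # grid estimates of sup |f_n - f|

print(pointwise)
print(gaps)
```

Comparing the two printed lists gives a hint about the answer to the last question of the exercise; turning the hint into a proof is the actual task.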


Monotonicity cannot be generalized to the metric space context. Of course, many results proved for real sequences have no meaning for general metric spaces. An instance is the fact that every monotone and bounded sequence of real numbers converges. This is because monotone sequences were defined using the order $<$ on $\mathbb{R}$, but as we know, such an order may not exist in an arbitrary metric space. For the same reason, another important result which cannot be generalized is Theorem 2.59, which asserts that every sequence of real numbers has a monotone subsequence. So, the notion of monotonicity and its allied theorems cannot be considered in the metric space setting.

On the other hand, some arguments which do not seem to be generalizable can be extended using some tricks. For example, boundedness of real sequences was also defined in terms of the order $<$, yet it makes sense in any metric space: a sequence is called bounded when its range is contained in some neighborhood $N_\varepsilon(x)$.

Example 7.14. Prove that a sequence $\{f_n\}$ in $(C([a, b]), d_u)$ is bounded if and only if the following statement is true.

(UBO) There exists $M > 0$ such that for every $n$ and every $x \in [a, b]$, $|f_n(x)| \le M$.

(We use UBO as an abbreviation for uniform boundedness.)

Solution. If $\{f_n\}$ is bounded, there exist $f \in C([a, b])$ and $\varepsilon > 0$ such that for every $n$, $d_u(f_n, f) < \varepsilon$. If $M_1$ is an upper bound for the values of $|f|$ on $[a, b]$, whose existence is a consequence of the continuity of this function, then for every $x \in [a, b]$ and every $n$,
\[ |f_n(x)| \le |f_n(x) - f(x)| + |f(x)| < \varepsilon + M_1. \]
If we let $M = \varepsilon + M_1$, this completes the first part of the proof. Conversely, if we assume that (UBO) is true, then for every $n$ and every $x \in [a, b]$,
\[ |f_n(x) - f_1(x)| \le |f_n(x)| + |f_1(x)| \le 2M, \]
so that the range of $\{f_n\}$ is contained in $N_K(f_1)$ for any $K > 2M$. Thus, $\{f_n\}$ is bounded.


Note that (UBO) shows that M is an upper bound for the values of |fn | on [a, b], for each n. For this reason, when (UBO) is the case, we say that {fn } is a uniformly bounded sequence of real-valued functions. Example 7.15. Which of the given sequences is bounded in the metric space (C([0, 1]), du )? (1) fn (x) = sin nx. (2) gn (x) = n(1 + x). Solution. (1) For every n ∈ N and every x ∈ [0, 1], |fn (x)| = | sin nx| ≤ 1. So, Example 7.14 shows that {fn } is bounded. (2) If M > 0 is given, choose n0 so large that gn0 (1) = (1 + 1)n0 > M. It follows that the statement for every n ∈ N and every x ∈ [0, 1], |gn (x)| ≤ M is not true. Since M was arbitrary, we find in view of Example 7.14 that {gn } is not a bounded sequence in (C([0, 1]), du ). Subsequences. The notion of subsequence has meaning for sequences in arbitrary sets. If {xn } is a sequence in some set X, and {nk } is a strictly increasing sequence of natural numbers, then {xnk } is called a subsequence of {xn }. Although the theory of subsequences for general metric spaces has some differences with that in the Euclidean space R, the following generalization of Theorem 2.55 can be proved similarly. Theorem 7.16. Let {xn } be a sequence in a metric space (X, d), and let x ∈ X. Then, the following conditions are equivalent. (1) The sequence {xn } converges to x. (2) Every subsequence of {xn } converges to x. (3) The subsequences of even- and odd-indexed terms of {xn } converge to x. The first difference between subsequence theory in the Euclidean space R and that in general metric spaces is that the concepts of limit superior and limit inferior have no meaning in the latter context. This is because the concepts are defined using suprema and infima, and these are defined using the order relation < on R, which is not present in arbitrary metric spaces. Nevertheless, the concept of subsequential limit can be defined in our current setting. Definition 7.17. Let {xn } be a sequence in a metric space (X, d). 
An element x of X is called a subsequential limit of {xn } if there exists a subsequence {xnk } of {xn } such that limk→∞ xnk = x. The set of all subsequential limits of {xn } will be denoted by {xn }.


If $\{x_n\}$ is a sequence in the Euclidean space $\mathbb{R}$, the set of subsequential limits defined in the above definition may differ from the set $E(\{x_n\})$ considered in Section 2.2. For example, for the sequence
\[ x_n = \begin{cases} \dfrac{1}{n} & n \text{ is even}, \\ n & n \text{ is odd}, \end{cases} \]
the set $E(\{x_n\})$ is $\{0, +\infty\}$, while the set of subsequential limits in the sense of Definition 7.17 is $\{0\}$. The symbols $-\infty$ and $+\infty$ were adjoined to $\mathbb{R}$ in Chapter 1 to improve the theory of suprema and infima. Hence we cannot expect to see them in our discussion of subsequences in general metric spaces. It follows from Theorem 7.16 that if a sequence $\{x_n\}$ converges to $x$, then its set of subsequential limits is $\{x\}$. The converse fails in general: the sequence displayed above has $\{0\}$ as its set of subsequential limits, yet it diverges.

Example 7.18. If we consider the sequence $\{n\}$ in the Euclidean space $\mathbb{R}$, then it is clear that its set of subsequential limits is empty. Thus, the set of all subsequential limits of a real sequence, in the sense of Definition 7.17, may be empty. Compare this with what we observed in Section 2.2: the set $E(\{x_n\})$ is nonempty for every sequence $\{x_n\}$ of real numbers.

The second difference between the theory of subsequences in $\mathbb{R}$ and in the general context is related to the existence of convergent subsequences. As you may remember, a central result in Chapter 2 was the Bolzano–Weierstrass theorem, which asserts that any bounded sequence of real numbers has a convergent subsequence. Although the same assertion can be phrased in the metric space setting, the following example shows that this theorem cannot be generalized to the abstract theory of metric spaces, even if we work in $\mathbb{R}$ with a different metric.

Example 7.19. If we consider the discrete metric on $\mathbb{R}$, $\{n\}$ is a bounded sequence in this metric space which has no convergent subsequences by Example 7.3.

Example 7.20. For each $n \in \mathbb{N}$, define a function $f_n$ on $[0, 1]$ by $f_n(x) = \sin(x/n)$ if $n$ is even and by $f_n(x) = \cos(x/n)$ when $n$ is odd. Find the set of subsequential limits of $\{f_n\}$ in the space $(C([0, 1]), d_u)$.

Solution. The subsequences $\{f_{2k}\}$ and $\{f_{2k-1}\}$ converge to the constant functions $f \equiv 0$ and $g \equiv 1$, respectively.
To see why, note that for every $k \in \mathbb{N}$ and every $x \in [0, 1]$,
\[ |f_{2k}(x) - 0| = \left|\sin\frac{x}{2k}\right| \le \left|\frac{x}{2k}\right| \le \frac{1}{2k}. \]
So, if $\varepsilon > 0$ is given and we choose $N \in \mathbb{N}$ so large that $1/N < 2\varepsilon$, then for every $k \ge N$ and every $x \in [0, 1]$,
\[ |f_{2k}(x) - 0| \le \frac{1}{2k} \le \frac{1}{2N} < \varepsilon. \]
This proves that for all $k \ge N$, $d_u(f_{2k}, f) < \varepsilon$, and therefore that $\{f_{2k}\}$ converges to $f$ with respect to the uniform metric. Similar reasoning shows that $\{f_{2k-1}\}$ converges to $g$ with respect to $d_u$. The set of subsequential limits of $\{f_n\}$ is therefore $\{f, g\}$.
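The estimates in the solution of Example 7.20 can be checked on a grid. The following sketch is a numerical sanity check rather than a proof; the helper names `sup_on_grid` and `f` are our own, and the grid maximum only approximates the uniform distance from below.

```python
import math

def sup_on_grid(h, m=1000):
    """Maximum of |h(x)| over the grid x = i/m in [0, 1] (a lower estimate of the sup)."""
    return max(abs(h(i / m)) for i in range(m + 1))

def f(n):
    """f_n from Example 7.20: sin(x/n) for even n, cos(x/n) for odd n."""
    if n % 2 == 0:
        return lambda x: math.sin(x / n)
    return lambda x: math.cos(x / n)

for k in (1, 5, 25, 125):
    even_gap = sup_on_grid(f(2 * k))                             # estimated distance from f_{2k} to the zero function
    odd_gap = sup_on_grid(lambda x, k=k: f(2 * k - 1)(x) - 1.0)  # estimated distance from f_{2k-1} to the constant 1
    print(k, even_gap, odd_gap)
```

Both printed columns shrink as $k$ grows, in line with the bound $1/(2k)$ obtained in the solution for the even-indexed terms.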


Limit Points, Closure, and Closedness in Terms of Sequences. One application of sequences in metric space theory is that we may use them to determine the closedness of sets. The following is the main result in this connection.

Proposition 7.21. If $A$ is a subset of a metric space $X$ and $x \in X$, then
(1) $x \in A'$ if and only if $x$ is the limit of a sequence in $A \setminus \{x\}$, and
(2) $x \in \overline{A}$ if and only if $x$ is the limit of a sequence of elements of $A$.

Proof. We only prove (2) because (1) is a straightforward generalization of Proposition 3.6. To prove (2), note that when $x \in \overline{A}$, we have two cases:
• $x \in A$, in which case the sequence defined by $x_n = x$ for every $n \in \mathbb{N}$ is a sequence in $A$ that converges to $x$;
• $x \in A'$, in which case (1) gives us a sequence $\{x_n\}$ in $A$ such that $x_n \ne x$ for every $n \in \mathbb{N}$ and $\lim_{n\to\infty} x_n = x$.
As for the converse, let $x$ be such that $\lim_{n\to\infty} x_n = x$ for a sequence $\{x_n\}$ in $A$. If $x = x_n$ for some $n \in \mathbb{N}$, then $x \in A$. Otherwise, (1) shows that $x$ is a limit point of $A$. Thus, in either case, $x$ belongs to $\overline{A}$.

Item (2) of the above proposition says that $x$ is a cluster point of $A$ if and only if it is the limit of a sequence in $A$. An important corollary of Proposition 7.21 is the following interesting result.

Corollary 7.22. If $A$ is a subset of $\mathbb{R}$ (equipped with $d_e$) which is bounded from above and we let $x = \sup A$, then $x \in \overline{A}$.

Proof. This follows easily from Proposition 7.21(2) because $x$ is the limit of a sequence of the elements of $A$ by Theorem 2.4.

What does Corollary 7.22 say? Corollary 7.22 says that the supremum of a set $A$ of real numbers is a cluster point of the set. Based on our understanding of closure and cluster points, this result confirms the previously mentioned fact that the supremum of a set $A$ of real numbers adheres to the set.

As another consequence of Proposition 7.21, we can prove the following sequential characterization of closed subsets of metric spaces.

Corollary 7.23. For a subset $A$ of a metric space $X$, the following conditions are equivalent.
(1) The set $A$ is closed.
(2) If $\{x_n\}$ is a sequence in $A$ that converges to some $x$, then $x \in A$.

Proof. This follows from the above proposition and the fact that $A$ is closed if and only if $A$ equals $\overline{A}$.


What does Corollary 7.23 say? Corollary 7.23 says that a subset $A$ of a metric space is closed if and only if it contains the limit of each of the convergent sequences that lie entirely in $A$. Recall that in Proposition 2.80 a similar assertion was proved for intervals of the form $[a, b]$ as subsets of the Euclidean space $\mathbb{R}$. Thus, as we mentioned there, such intervals are closed subsets of $(\mathbb{R}, d_e)$.

Exercise 7.24. Prove that $x \in A'$ if and only if $x$ is the limit of a sequence with distinct terms in $A$.

Example 7.25. In the Euclidean space $\mathbb{R}$, $(0, 1)$ is not closed, as we knew. One way to see this is to note that $\{\frac{1}{2n}\}$ is a sequence in $(0, 1)$ that converges to $0$ in $\mathbb{R}$, but $0 \notin (0, 1)$.

Example 7.26. In the Euclidean space $\mathbb{R}^2$, determine whether
\[ A = \left\{ \left( \frac{1}{n}, \frac{n}{2n - 3} \right) : n \in \mathbb{N} \right\} \]
is closed or not.

Solution. The set $A$ is not closed because
\[ \lim_{n\to\infty} \left( \frac{1}{n}, \frac{n}{2n - 3} \right) = \left( 0, \frac{1}{2} \right), \]
and $(0, 1/2) \in \overline{A} \setminus A$. It is clear that $\overline{A} = A \cup \{(0, 1/2)\}$.

The following proposition generalizes the above example to the context of metric spaces.

Proposition 7.27. If $\{x_n\}$ is a sequence in a metric space $(X, d)$ and we let $A = \{x_n : n \in \mathbb{N}\}$, then $\overline{A} = A \cup S$, where $S$ stands for the set of subsequential limits of $\{x_n\}$ in the sense of Definition 7.17. If, in particular, $\{x_n\}$ converges to some $x$, then $\overline{A} = A \cup \{x\}$.

Proof. If $y \in S$, then $y$ is the limit of a subsequence $\{x_{n_k}\}$ of $\{x_n\}$. Since $\{x_{n_k}\}$ is a sequence in $A$, Proposition 7.21 tells us that $y \in \overline{A}$. Thus, $A \cup S \subseteq \overline{A}$. On the other hand, if $z \in \overline{A}$ is arbitrary, then Proposition 7.21(2) gives us a sequence $\{z_k\}$ in $A$ that converges to $z$. Consider two cases for the range $Z$ of $\{z_k\}$.
• $Z$ is finite. Then there exists $N \in \mathbb{N}$ such that for every $k \ge N$, $z_k = z$. Thus $z \in A$ in this case.
• $Z$ is infinite. In this case we may choose a subsequence of $\{z_k\}$ with distinct terms and, passing to a further subsequence if necessary, regard it as a subsequence of $\{x_n\}$ that converges to $z$. So, $z \in S$.
Thus, in either case, $z \in A \cup S$. This shows that $\overline{A} \subseteq A \cup S$.




What does Proposition 7.27 say? Proposition 7.27 says that when we consider a sequence $\{x_n\}$ as a set $A$, the cluster points of $A$ are the elements of $A$ and the subsequential limits of $\{x_n\}$.

Example 7.28. In the metric space $(C([0, 1]), d_u)$ find the closure of the set
\[ A = \{g_k : k \in \mathbb{N}\} \cup \{h_k : k \in \mathbb{N}\}, \]
where $g_k(x) = \sin\frac{x}{2k}$ and $h_k(x) = \cos\frac{x}{2k - 1}$ for every $k \in \mathbb{N}$ and every $x \in [0, 1]$.

Solution. We saw in Example 7.20 that $A$ is the range of the sequence $\{f_n\}$ defined by $f_n(x) = \sin(x/n)$ if $n$ is even, and $f_n(x) = \cos(x/n)$ when $n$ is odd. We also observed in that example that the set of subsequential limits of $\{f_n\}$ is $\{f, g\}$, where $f \equiv 0$ and $g \equiv 1$ on $[0, 1]$. So, by Proposition 7.27, $\overline{A} = A \cup \{f, g\}$.
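For a concrete feel for Proposition 7.27, one can return to the sequence of Example 7.26 and watch its terms approach the extra closure point $(0, 1/2)$. The sketch below is only an illustration; the function names are our own, and the finite range of indices checked is an arbitrary choice.

```python
import math

def x(n):
    """n-th term of the sequence from Example 7.26."""
    return (1 / n, n / (2 * n - 3))

limit = (0.0, 0.5)

def d(p, q):
    """Euclidean metric on R^2."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Distances from selected terms to the point (0, 1/2).
distances = [d(x(n), limit) for n in (10, 100, 1000, 10000)]
print(distances)

# The first coordinate 1/n is never 0, so (0, 1/2) is not a term of the sequence.
never_attained = all(x(n) != limit for n in range(1, 10001))
print(never_attained)
```

The limit is approached but never attained, which is exactly why it belongs to the closure of the range without belonging to the range itself.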

7.2. Cauchy Sequences and Complete Metric Spaces

Just as easily as we defined the convergence of sequences, the notion of Cauchy sequence can be generalized to the context of metric spaces.

Definition 7.29. Let $\{x_n\}$ be a sequence in a metric space $(X, d)$. We say that $\{x_n\}$ is a Cauchy sequence if the following statement is true. For every $\varepsilon > 0$, we can find $N \in \mathbb{N}$ such that $d(x_m, x_n) < \varepsilon$ for all $m, n \ge N$.

As we saw in Chapter 2, Cauchy sequences in $\mathbb{R}$ and $[a, b]$ are convergent. This leads us to guess that the same conclusion is true in any metric space. But it can be easily seen that $\{\frac{1}{n}\}$ is a Cauchy sequence in $(0, 1]$ which is divergent there. Thus $(0, 1]$, considered as a subspace of the Euclidean space $\mathbb{R}$, is a metric space in which Cauchy sequences are not necessarily convergent. This shows that Theorem 2.78, stating that every Cauchy sequence of real numbers converges in $\mathbb{R}$, cannot be generalized to the metric space setting. This is quite surprising, as our definition of the Cauchy sequence was (in some sense) a copy of that of Cauchy sequences in $\mathbb{R}$. Here, the divergence of $\{\frac{1}{n}\}$ is a flaw of the space $(0, 1]$, not of the sequence itself. If we replace $(0, 1]$ by an appropriate larger subspace of the Euclidean space $\mathbb{R}$ that contains $0$, then $\{\frac{1}{n}\}$ converges in this space. We describe this flaw by saying that the space $(0, 1]$ is not as complete as we wish. This is our motivation for the following definition.

Definition 7.30. A metric space $(X, d)$ is called complete if every Cauchy sequence in $X$ converges.

With this definition, the sets $\mathbb{R}$ and $[a, b]$ with their Euclidean metric are complete.

Example 7.31. Prove that any discrete metric space is complete.


Solution. If $\{x_n\}$ is a Cauchy sequence in the discrete metric space $(X, \rho)$, let $N$ be a natural number such that for every $n \ge N$, $\rho(x_n, x_N) < \frac{1}{2}$. Then, our definition of the discrete metric $\rho$ implies that for every $n \ge N$, $x_n = x_N$. Hence, $\{x_n\}$ converges to $x_N$.

Completeness is not hereditary. Our above argument shows that a subspace of a complete metric space may not be complete, as $(0, 1]$ is an incomplete subspace of the complete space $\mathbb{R}$. Thus, completeness is not a hereditary metric space property. The following proposition characterizes those subspaces which inherit completeness.

Proposition 7.32. For a subset $Y$ of a complete metric space $(X, d)$, the following conditions are equivalent.
(1) The set $Y$ is a closed subset of $X$.
(2) The space $Y$, considered as a metric subspace of $(X, d)$, is complete.

Proof. (1) ⇒ (2). If $\{y_n\}$ is a Cauchy sequence in $Y$, then it is also a Cauchy sequence in $X$. Since $X$ is complete, $\{y_n\}$ converges to some $x \in X$. That $Y$ is closed shows that $x \in Y$, so that $\{y_n\}$ converges in $Y$. This proves that the subspace $Y$ is complete.
(2) ⇒ (1). Assume that $Y$ is a complete subspace of $X$. To prove that $Y$ is a closed subset of $X$, it is enough to show that when $\{y_n\}$ is a sequence in $Y$ that converges to some $y \in X$, $y$ is necessarily an element of $Y$. But if $\{y_n\}$ and $y$ are so, $\{y_n\}$ is a Cauchy sequence in $Y$, and hence it converges in $Y$. This implies that $y$ is an element of $Y$.

Example 7.33. None of the sets $(1, +\infty)$, $(-1, 0]$ and $\{\frac{1}{n} : n \in \mathbb{N}\}$, considered as a subspace of the Euclidean space $\mathbb{R}$, is complete. On the other hand, the subspaces $[1, +\infty)$, $\mathbb{Z}$ and $\{\frac{1}{n} : n \in \mathbb{N}\} \cup \{0\}$ of the Euclidean space $\mathbb{R}$ are complete.

Exercise 7.34. Consider $\mathbb{Q}$ as a subspace of the Euclidean space $\mathbb{R}$. Is this a complete space? Why?

Example 7.35. Prove that $(C([a, b]), d_u)$ is a complete metric space.

Solution. Let $\{f_n\}$ be a Cauchy sequence in $C([a, b])$. Since for every $m, n \in \mathbb{N}$ and every $x \in [a, b]$,
\[ |f_n(x) - f_m(x)| \le d_u(f_n, f_m), \tag{7.1} \]
the sequence $\{f_k(x)\}$ is a Cauchy sequence of real numbers. Since the Euclidean space $\mathbb{R}$ is complete, these sequences are actually convergent. Define a function $f$ on $[a, b]$ by
\[ f(x) = \lim_{k\to\infty} f_k(x). \]


To complete the proof that $d_u$ makes $C([a, b])$ into a complete space, we need to verify that (1) $f$ is continuous on $[a, b]$ so that $f \in C([a, b])$, and (2) $\{f_n\}$ converges to $f$ with respect to $d_u$. Let $\varepsilon > 0$ be given. By (7.1) and the fact that $\{f_n\}$ is Cauchy, there exists $N \in \mathbb{N}$ such that for all $m, n \ge N$ and every $x \in [a, b]$, $|f_n(x) - f_m(x)| < \varepsilon$. Fix $n \ge N$ and let $m$ tend to infinity in the last inequality. The strict inequality becomes a non-strict one in the limit, which is harmless because $\varepsilon > 0$ is arbitrary, and we obtain that

(∗) there exists $N \in \mathbb{N}$ such that for all $n \ge N$ and every $x \in [a, b]$, $|f_n(x) - f(x)| \le \varepsilon$.

If we can show that (1) is true, then this statement in view of Example 7.7 proves (2). To prove (1), consider an arbitrary $x_0 \in [a, b]$, and let $N$ be as in (∗). Since $f_N$ is continuous, we may choose $\delta > 0$ such that $|f_N(x) - f_N(x_0)| < \varepsilon$ whenever $|x - x_0| < \delta$ and $x \in [a, b]$. Then, for every $x \in [a, b]$ with $|x - x_0| < \delta$,
\[ |f(x) - f(x_0)| \le |f(x) - f_N(x)| + |f_N(x) - f_N(x_0)| + |f_N(x_0) - f(x_0)| < 3\varepsilon, \]
which proves (1).

Exercise 14 at the end of this chapter shows the way we may construct a complete metric space from a given metric space.

A Sequential Characterization of Total Boundedness. Cauchy sequences can be used to find a useful characterization of totally bounded sets. As we will see shortly, this result will be used in the proof of Theorem 7.49 below.

Theorem 7.36. For a subset $Y$ of a metric space $(X, d)$, the following are equivalent.
(1) The set $Y$ is totally bounded.
(2) Every sequence in $Y$ has a Cauchy subsequence.

Proof. (1) ⇒ (2). Suppose that $Y$ is totally bounded. If $Y$ is a finite set, (2) is clearly satisfied. So assume that $Y$ is infinite and consider a sequence $\{y_n\}$ in $Y$. Since $Y$ is totally bounded, it can be written as a finite union of sets with diameter less than $1$. Since the number of such sets is finite, at least one of them, say $Y_1$, contains $y_n$ for infinitely many indices $n$. Choose $n_1 \in \mathbb{N}$ such that $y_{n_1} \in Y_1$. But $Y_1$ is also totally bounded, and we can write it as a finite union of sets, each of diameter less than $1/2$. We can find a set $Y_2$ among such sets that contains $y_n$ for infinitely many indices $n$. Choose $n_2 > n_1$ such that $y_{n_2} \in Y_2$. Continuing in this way, we construct a sequence $\{Y_k\}$ of sets such that $Y_1 \supseteq Y_2 \supseteq Y_3 \supseteq \cdots$ and $\operatorname{diam}(Y_k) < 1/k$ for every $k$, and find a subsequence $\{y_{n_k}\}$ of $\{y_n\}$ such that $y_{n_k} \in Y_k$ for every $k$. Now, if $\varepsilon > 0$ is given, we find $k_0 \in \mathbb{N}$ such that $1/k_0 < \varepsilon$. Then for all $j, l \ge k_0$, $y_{n_j}, y_{n_l} \in Y_{k_0}$ and hence
\[ d(y_{n_j}, y_{n_l}) \le \operatorname{diam}(Y_{k_0}) < \frac{1}{k_0} < \varepsilon. \]
This proves that the subsequence $\{y_{n_k}\}$ is Cauchy.


(2) ⇒ (1). We assume that $Y$ is not totally bounded and find a sequence $\{y_n\}$ in $Y$ with no Cauchy subsequences. By contraposition, this gives us the desired result. Since $Y$ is not totally bounded, there exists $\varepsilon > 0$ such that $Y$ cannot be covered by a finite number of $\varepsilon$-neighborhoods. Starting from some $y_1 \in Y$, we can find $y_2 \in Y$ such that $d(y_2, y_1) \ge \varepsilon$; otherwise, $Y$ will be contained in $N_\varepsilon(y_1)$, contradicting our assumption. Since $Y$ is not a subset of $N_\varepsilon(y_1) \cup N_\varepsilon(y_2)$, we may find $y_3 \in Y$ such that $y_3 \ne y_i$ and $d(y_3, y_i) \ge \varepsilon$ for $i = 1, 2$. Continuing in this way, we find a sequence $\{y_n\}$ of distinct elements of $Y$ such that $d(y_n, y_m) \ge \varepsilon$ for all $m, n \in \mathbb{N}$ with $n \ne m$. The sequence therefore fails to have any Cauchy subsequence.

Comparing Theorem 7.36 with the Bolzano–Weierstrass theorem. It will be instructive to compare Theorem 7.36 with the Bolzano–Weierstrass theorem. As we observed in this chapter, the Bolzano–Weierstrass theorem cannot be generalized to the metric space setting. This means that a bounded sequence in a metric space may fail to have any convergent subsequence. The implication (1) ⇒ (2) of Theorem 7.36 says that by replacing the boundedness assumption with a stronger requirement (total boundedness) we can deduce a property weaker than the property of having convergent subsequences (the property of having Cauchy subsequences)!
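The greedy selection used in the proof of (2) ⇒ (1) can be tried out on a finite point set, where it necessarily stops. The sketch below is an illustration under our own assumptions: the function name `greedy_separated`, the sample grid and the value of $\varepsilon$ are all hypothetical choices, not taken from the text.

```python
import math

def greedy_separated(points, eps, dist):
    """Greedily pick points whose pairwise distances are all >= eps.

    A point is kept only if it lies outside the eps-neighborhood of every
    point kept so far, mirroring the choice of y_1, y_2, ... in the proof.
    """
    chosen = []
    for p in points:
        if all(dist(p, q) >= eps for q in chosen):
            chosen.append(p)
    return chosen

# Hypothetical sample: a 21 x 21 grid in the unit square with the Euclidean metric.
grid = [(i / 20, j / 20) for i in range(21) for j in range(21)]
centers = greedy_separated(grid, 0.3, math.dist)

# Every grid point lies within 0.3 of some chosen center.
covered = all(any(math.dist(p, c) < 0.3 for c in centers) for p in grid)
print(len(centers), covered)
```

When the selection stops, the chosen points are centers of finitely many $\varepsilon$-neighborhoods covering the set. In a set that is not totally bounded the selection never stops for a suitable $\varepsilon$, which is what produces the sequence without Cauchy subsequences.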

7.3. Compactness: Definition and Some Basic Results

As we saw in Chapters 2 and 3, among the various kinds of subsets of $\mathbb{R}$, intervals of the form $[a, b]$ occupy a distinguished place. This was because of some important results, which we now list. In all of the results, $[a, b]$ is equipped with the Euclidean metric.
• In $[a, b]$ every Cauchy sequence converges (Theorem 2.82).
• If $f : [a, b] \to \mathbb{R}$ is continuous, then $f$ attains the maximum and minimum of its values on $[a, b]$ (Theorem 3.78).
• Every continuous function defined on $[a, b]$ is uniformly continuous on this interval (Theorem 3.94).
We observed that each of the above statements may fail when we replace $[a, b]$ by another kind of interval such as $(a, b)$ or $(a, b]$. If you re-examine the proofs of the above statements, you will find a common point which plays a decisive role in them. This is the important fact that every sequence in $[a, b]$ has a subsequence that converges to some element of this set, which was established in Theorem 2.81. This observation motivates us to look for sets, or spaces, which satisfy a similar property. More precisely, we want to study those sets or spaces in which every sequence has a convergent subsequence, and try to generalize the above statements to their context. With this in mind, we first give a name to such sets.

Definition 7.37. A subset $K$ of a metric space $X$ is said to be compact if every sequence in $K$ has a subsequence that converges to some point of $K$.

Example 7.38. The Euclidean space $\mathbb{R}$ is not compact. This is because the sequence $\{n\}$ has no convergent subsequences in $\mathbb{R}$. Theorem 2.81 shows that in the Euclidean space $\mathbb{R}$ each interval $[a, b]$ is compact.

Example 7.39. In any metric space, every finite set is compact, which is easy to verify.

Exercise 7.40. Show that in a discrete metric space, a set is compact if and only if it is finite.

Example 7.41. In the Euclidean space $\mathbb{R}$, which of the following sets is compact?
(1) $(0, 1)$.
(2) $\{\frac{1}{n} : n \in \mathbb{N}\}$.

Solution. (1) The sequence $\{1 - \frac{1}{2n}\}$, whose terms are all in $(0, 1)$, converges to $1$ in the Euclidean space $\mathbb{R}$. So the same is true for all its subsequences. This shows that the sequence has no convergent subsequences in $(0, 1)$. Therefore, $(0, 1)$ is not compact.
(2) The sequence $\{\frac{1}{n}\}$, whose terms are all in the given set, converges to $0$ in the Euclidean space $\mathbb{R}$, and so do all its subsequences. Since $0$ is not an element of this set, the sequence has no subsequence that converges in the set, which is therefore not compact.

The following theorem describes the reason why the sets of the example above are not compact.

Theorem 7.42. Compact subsets of metric spaces are closed and totally bounded.

Proof. Let $K$ be a compact subset of a metric space $(X, d)$. To prove that $K$ is closed, it is sufficient, in view of Corollary 7.23, to show that when $\{x_n\}$ is a sequence in $K$ that converges to some $x \in X$, $x$ must be an element of $K$. But our definition of compactness implies that $\{x_n\}$ has a subsequence which converges to some element of $K$. Since $\{x_n\}$ converges to $x$, this subsequence should also converge to $x$, showing that $x \in K$.

Next, we show that $K$ is totally bounded. If we assume, to the contrary, that $K$ is not totally bounded, then there exists $\varepsilon > 0$ such that $K$ cannot be covered by a finite number of neighborhoods of radius $\varepsilon$. So, if $x_1 \in K$ is arbitrary, $K$ is not a subset of $N_\varepsilon(x_1)$. This gives us $x_2 \in K \setminus N_\varepsilon(x_1)$. Again, $K$ is not a subset of $N_\varepsilon(x_1) \cup N_\varepsilon(x_2)$. So, we find $x_3 \in K$ in the complement of $N_\varepsilon(x_1) \cup N_\varepsilon(x_2)$. Continuing in this way, we obtain a sequence $\{x_n\}$ in $K$ such that for distinct $m$ and $n$, $d(x_m, x_n) \ge \varepsilon$. No subsequence of $\{x_n\}$ is Cauchy, so $\{x_n\}$ has no convergent subsequences, contradicting the compactness of $K$. This shows that $K$ must be totally bounded and completes the proof.


Example 7.43. In the Euclidean space $\mathbb{R}$, $[0, +\infty)$ is unbounded, so it is not compact. The interval $[0, 1)$ is not closed, so it is not compact either. The set $\{\frac{1}{n} : n \in \mathbb{N}\} \cup \{0\}$ is both closed and totally bounded, but based on our present knowledge we cannot determine whether it is compact or not. In fact, the above theorem only says that this set satisfies a necessary condition for being compact. The following exercise shows that the set $\{\frac{1}{n} : n \in \mathbb{N}\} \cup \{0\}$ is indeed a compact subset of the Euclidean space $\mathbb{R}$.

Exercise 7.44. Let $(X, d)$ be a metric space, and let $\{x_n\}$ be a sequence in $X$ that converges to $x$. Prove that the set $E = \{x_n : n \in \mathbb{N}\} \cup \{x\}$ is compact.

An equivalent formulation of compactness in the Euclidean space $\mathbb{R}$. The converse of Theorem 7.42 is true in the Euclidean space $\mathbb{R}$. This follows from Theorem 3.11, in which, with the terminology we now have, we proved that a set $A$ of real numbers is compact if and only if it is closed and bounded. Unfortunately, Theorem 3.11 cannot be generalized to the context of metric spaces, as the following example shows.

Example 7.45. In the metric space $C([0, 1])$,
\[ K = \{f \in C([0, 1]) : \forall x \in [0, 1],\ |f(x)| \le 1\} \]
is closed and bounded, but not compact.

Solution. Let $\{f_n\}$ be a sequence in $K$ that converges to some $f \in C([0, 1])$ with respect to $d_u$. By Example 7.7, the continuity of the absolute value function and our choice of the $f_n$'s,
\[ |f(x)| = \left| \lim_{n\to\infty} f_n(x) \right| \le 1 \]
for every $x \in [0, 1]$. This shows that $f \in K$, so that $K$ is closed. Also, it is clear that $K \subset N_2(h)$, where $h \equiv 0$ on $[0, 1]$, and $K$ is therefore bounded. To prove that $K$ is not compact, we consider the sequence $\{g_n\}$ defined by $g_n(x) = x^n$. Evidently, $\{g_n\}$ is a sequence in $K$. If $\{g_n\}$ had a convergent subsequence $\{g_{n_k}\}$, then by Example 7.7 we could find some $g \in C([0, 1])$ such that for every $x \in [0, 1]$,
\[ \lim_{k\to\infty} g_{n_k}(x) = g(x). \tag{7.2} \]
But it is easy to see that for every $x \in [0, 1)$, $\lim_{n\to\infty} g_n(x) = 0$, and $\lim_{n\to\infty} g_n(1) = 1$. Since the function
\[ J(x) = \begin{cases} 0 & 0 \le x < 1, \\ 1 & x = 1, \end{cases} \]
is not continuous on $[0, 1]$, a function $g \in C([0, 1])$ satisfying (7.2) cannot be found. This shows that $\{g_n\}$ has no convergent subsequences and that K is not compact.
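A complementary way to see why $\{g_n\}$ has no convergent subsequence, not the argument used in the solution above, is to bound the uniform distance between $g_n$ and $g_m$ from below. At the point $x = 2^{-1/n}$ we have $x^n = 1/2$ and, for $m \ge 2n$, $x^m \le 1/4$, so $d_u(g_n, g_m) \ge 1/4$. The sketch below merely evaluates this witness point numerically; the function name and the sample index pairs are our own choices.

```python
def gap_at_witness(n, m):
    """Value of x**n - x**m at the point x = 2**(-1/n), where x**n equals 1/2."""
    x = 2.0 ** (-1.0 / n)
    return x ** n - x ** m

# For m >= 2n the subtracted term is at most 1/4, so each gap is at least 1/4
# (up to floating-point rounding).
pairs = [(n, m) for n in (1, 2, 5, 50) for m in (2 * n, 3 * n, 10 * n)]
gaps = [gap_at_witness(n, m) for n, m in pairs]
print(min(gaps))
```

Since every subsequence contains pairs of indices with $m \ge 2n$ arbitrarily far out, no subsequence of $\{g_n\}$ can be Cauchy with respect to $d_u$, which is consistent with the conclusion of Example 7.45.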


Now that we know compact sets are closed, it is natural to think about the truth of the converse. In the Euclidean space R, [1, +∞) is a closed set which, being unbounded, is not compact. The following theorem presents a sufficient condition for the compactness of closed sets. Theorem 7.46. Closed subsets of compact metric spaces are compact. Proof. Let F be a closed subset of a compact metric space X, and consider a sequence {xn } in F . Since X is compact, {xn } has a subsequence {xnk } that converges to some x ∈ X. But {xnk } is a sequence in the closed set F . So Corollary 7.23 tells us that x must be an element of F . This shows that F is also compact. 

7.4. Compactness: Some Equivalent Forms

We now come to the deep part of our theory. The results of this section present other useful characterizations of compactness. These are not generalizations of what we have seen before; they follow from a careful examination of the general theory.

Theorem 7.47. A metric space $X$ is compact if and only if every infinite subset of $X$ has a limit point in $X$.

Proof. First assume that $X$ is compact, and let $A$ be an infinite subset of $X$. Since every infinite set contains a countably infinite set, we can choose a sequence $\{x_n\}$ in $A$ with distinct terms. Since $X$ is compact, $\{x_n\}$ has a subsequence $\{x_{n_k}\}$, again with distinct terms, that converges to some $x \in X$. Now every neighborhood of $x$ contains the terms $x_{n_k}$ for all sufficiently large $k$. Since $\{x_{n_k}\}$ has distinct terms, every neighborhood of $x$ contains an infinite number of the $x_{n_k}$'s, and hence an infinite number of the elements of $A$. This shows that $x$ is a limit point of $A$.

Now suppose that every infinite subset of $X$ has a limit point in this set, and consider an arbitrary sequence $\{x_n\}$ in $X$. To complete the proof, we show that $\{x_n\}$ has a subsequence that converges in $X$. This leads us to consider two cases for $A = \{x_n : n \in \mathbb{N}\}$.
• $A$ is finite. Then there exists $N \in \mathbb{N}$ such that for an infinite number of indices $j$, $x_j = x_N$. This gives us a subsequence of $\{x_n\}$ all of whose terms are equal to $x_N$. Clearly, this subsequence converges to $x_N$.
• $A$ is infinite. In this case our assumption shows that $A$ has a limit point $x$ in $X$. Choose $n_1 \in \mathbb{N}$ such that $d(x_{n_1}, x) < 1$. After choosing $n_1, \dots, n_{k-1}$ in such a way that $d(x_{n_i}, x) < 1/i$ for every $i \in \{1, \dots, k - 1\}$, choose $n_k$ as the smallest element of $\mathbb{N}$ such that $n_k > n_{k-1}$ and $d(x_{n_k}, x) < \frac{1}{k}$; such an index exists because every neighborhood of the limit point $x$ contains infinitely many elements of $A$. Then $\{x_{n_k}\}$ is a subsequence of $\{x_n\}$ that converges to $x$.
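The index selection in the second case of the proof of Theorem 7.47 is effectively an algorithm, and it can be traced on a concrete sequence. The sketch below uses a hypothetical example of our own, $x_n = (-1)^n + 1/n$ in the Euclidean space $\mathbb{R}$, whose range has $1$ as a limit point; the function names are likewise our own. The search loop terminates only because the chosen point really is a limit point of the range.

```python
from fractions import Fraction

def x(n):
    """Hypothetical sequence x_n = (-1)**n + 1/n; its range has 1 as a limit point."""
    return (-1) ** n + Fraction(1, n)

def pick_indices(limit, how_many):
    """Choose n_1 < n_2 < ... with |x_{n_k} - limit| < 1/k,
    always taking the smallest admissible index, as in the proof."""
    indices = []
    n = 0
    for k in range(1, how_many + 1):
        n += 1  # the next index must exceed the previous one
        while abs(x(n) - limit) >= Fraction(1, k):
            n += 1
        indices.append(n)
    return indices

indices = pick_indices(limit=1, how_many=5)
print(indices)  # [2, 4, 6, 8, 10]
```

Only even indices are selected, since the odd-indexed terms stay close to $-1$, the other limit point of the range.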


Example 7.48. Suppose that $f$ is differentiable on a closed and bounded interval $I$ and that for no $x \in I$, $f(x) = f'(x) = 0$ is true. Prove that $A = \{x \in I : f(x) = 0\}$ is a finite set.

Solution. If we assume that $A$ is an infinite set, the fact that $I$ is compact gives us, in view of the above theorem, a limit point $a \in I$ for $A$. Since $A$ is closed by Example 3.72, $a$ is an element of $A$, that is, $f(a) = 0$. On the other hand, that $a$ is a limit point of $A$ gives us a sequence $\{x_n\}$ in $A$ such that $x_n \ne a$ for every $n$ and $\lim_{n\to\infty} x_n = a$. Since $f$ is differentiable at $a$, Example 4.5 tells us that
\[ f'(a) = \lim_{n\to\infty} \frac{f(x_n) - f(a)}{x_n - a}, \]

and since $x_n, a \in A$, we deduce that $f'(a) = 0$. In summary, we found an element $a$ of $I$ with $f(a) = f'(a) = 0$. This contradicts our assumption that $f(x) = f'(x) = 0$ does not happen for any element of $I$.

We saw in Theorem 7.42 that compact spaces are totally bounded. The following theorem characterizes compact spaces in terms of total boundedness.

Theorem 7.49. A metric space $(X, d)$ is compact if and only if it is totally bounded and complete.

Proof. First assume that $X$ is compact. We only need to show that $X$ is complete because we know from Theorem 7.42 that $X$ is totally bounded. If $\{x_n\}$ is a sequence in $X$, the compactness assumption ensures that $\{x_n\}$ has a subsequence that converges to some $x \in X$. If we assume in addition that $\{x_n\}$ is Cauchy, then it follows that $\{x_n\}$ also converges to $x$, proving that $X$ is complete. (Here we used a generalization of Proposition 2.79 to the metric space context. As an exercise, state and prove this result.)

Now suppose that $X$ is totally bounded and complete, and consider an arbitrary sequence $\{a_n\}$ in $X$. The total boundedness, in view of Theorem 7.36, tells us that $\{a_n\}$ has a Cauchy subsequence $\{a_{n_k}\}$ in $X$. Since $X$ is complete, $\{a_{n_k}\}$ converges in $X$. This proves that $X$ is a compact space.

Using the above theorems, we can prove a more important characterization of compact spaces. To state this, we should first present a definition.

Definition 7.50. Let $X$ be a set, and let $A \subseteq X$. A cover of $A$ is a family $\{A_\alpha\}_{\alpha \in I}$ of subsets of $X$ such that $A \subseteq \bigcup_{\alpha \in I} A_\alpha$. In this case any subfamily of $\{A_\alpha\}_{\alpha \in I}$ which is also a cover of $A$ will be called a subcover of this family. If $X$ is a metric space, each $A_\alpha$ is open, and $\{A_\alpha\}_{\alpha \in I}$ is a cover of $A$, then we say that $\{A_\alpha\}_{\alpha \in I}$ is an open cover of $A$.

Examples of open covers will be found in the sequel. For now, let us present the most important characterization of compactness, which is usually known as the Heine–Borel theorem.


Theorem 7.51. A metric space $(X, d)$ is compact if and only if every open cover of $X$ has a finite subcover.

Proof. If $X$ is compact, then it is totally bounded and complete by Theorem 7.49. Assume, to the contrary, that $\{G_\alpha\}_{\alpha \in I}$ is an open cover of $X$ which has no finite subcover. Since $X$ is totally bounded, it is bounded, and therefore there exist $r > 0$ and $x_0 \in X$ such that $X = N_r(x_0)$. For every $n \in \mathbb{N}$, let $\varepsilon_n = r/2^n$. Since $X$ is totally bounded, it can be covered by finitely many balls of radius $\varepsilon_1$. By our assumption, at least one of these balls, $N_{\varepsilon_1}(x_1)$ for example, cannot be covered by a finite number of $G_\alpha$'s. Since $N_{\varepsilon_1}(x_1)$ itself is totally bounded, we find by a similar argument some $x_2 \in N_{\varepsilon_1}(x_1)$ such that $N_{\varepsilon_2}(x_2)$ cannot be covered by a finite number of $G_\alpha$'s. Continuing in this way, we find a sequence $\{x_n\}$ such that $x_{n+1} \in N_{\varepsilon_n}(x_n)$ and $N_{\varepsilon_n}(x_n)$ cannot be covered by a finite number of $G_\alpha$'s. We claim that $\{x_n\}$ is convergent. Since for each $n \in \mathbb{N}$, $d(x_n, x_{n+1}) < \varepsilon_n$, for every $k \in \mathbb{N}$,

\[
\begin{aligned}
d(x_n, x_{n+k}) &\le d(x_n, x_{n+1}) + d(x_{n+1}, x_{n+2}) + \cdots + d(x_{n+k-1}, x_{n+k}) \\
&< \varepsilon_n + \varepsilon_{n+1} + \cdots + \varepsilon_{n+k-1} \\
&= r\left(\frac{1}{2^n} + \frac{1}{2^{n+1}} + \cdots + \frac{1}{2^{n+k-1}}\right) \\
&= \frac{r}{2^{n-1}}\left(1 - \frac{1}{2^k}\right) \\
&< \frac{r}{2^{n-1}}.
\end{aligned}
\]

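This shows that $\{x_n\}$ is Cauchy. Since $X$ is complete, $\{x_n\}$ converges to some $x \in X$. Since $\{G_\alpha\}_{\alpha \in I}$ is a cover of $X$, there exists $\alpha_0 \in I$ such that $x \in G_{\alpha_0}$. But $G_{\alpha_0}$ is open and hence $\delta > 0$ can be found such that $N_\delta(x) \subseteq G_{\alpha_0}$. Choose $n$ so large that $d(x_n, x)$ and $\varepsilon_n$ are both less than $\delta/2$. Then for every $y \in X$ with $d(y, x_n) < \varepsilon_n$,

\[ d(y, x) \le d(y, x_n) + d(x_n, x) < \varepsilon_n + \frac{\delta}{2} < \delta. \]

Hence $N_{\varepsilon_n}(x_n) \subseteq N_\delta(x) \subseteq G_{\alpha_0}$, so $N_{\varepsilon_n}(x_n)$ is covered by a single $G_\alpha$, contrary to the way it was chosen. Therefore every open cover of $X$ has a finite subcover.

Conversely, suppose that every open cover of $X$ has a finite subcover, and assume to the contrary that $A$ is an infinite subset of $X$ which has no limit point in $X$. Then for every $x \in X$ there exists $r_x > 0$ such that $N_{r_x}(x)$ contains at most one element of $A$, namely, $x$ when $x \in A$. Since $A$ is infinite, no finite subfamily of $\{N_{r_x}(x) : x \in X\}$ can cover $A$. But $A \subseteq X$, so this is also true for $X$, and we have found an open cover of $X$ with no finite subcover, a contradiction. Thus, if every open cover of $X$ has a finite subcover, then every infinite subset of $X$ has a limit point in $X$, and $X$ is compact by Theorem 7.47. □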
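The geometric-sum step in the proof above can be checked with exact arithmetic. The following is a minimal sketch, assuming the radii are $\varepsilon_n = r/2^n$ as in the proof; the helper names `eps`, `tail_sum`, and `closed_form` and the sample radius are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def eps(r, n):
    """Radius eps_n = r / 2**n, kept as an exact rational number."""
    return Fraction(r) / 2**n


def tail_sum(r, n, k):
    """eps_n + eps_{n+1} + ... + eps_{n+k-1}, the bound on d(x_n, x_{n+k})."""
    return sum(eps(r, j) for j in range(n, n + k))


def closed_form(r, n, k):
    """(r / 2**(n-1)) * (1 - 1/2**k), the closed form used in the proof."""
    return Fraction(r) / 2**(n - 1) * (1 - Fraction(1, 2**k))


r = 5  # hypothetical radius with X = N_r(x_0)
for n in range(1, 8):
    for k in range(1, 8):
        # The finite geometric sum equals the closed form ...
        assert tail_sum(r, n, k) == closed_form(r, n, k)
        # ... and stays strictly below r / 2**(n-1), uniformly in k.
        assert tail_sum(r, n, k) < Fraction(r) / 2**(n - 1)
```

The bound does not depend on $k$, which is what makes $\{x_n\}$ a Cauchy sequence.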


A note on the equivalent formulation of compactness. The equivalent condition we obtained for compactness in the above theorem is used in many texts as the definition of compactness. We will refer to this as the open cover definition of compactness. It describes the reason compact sets are named so. Indeed, when a set is compact and is covered by a number of open sets, only a finite number of them is sufficient for covering the set. This shows that the set cannot be too large in some sense. In a subsequent course, entitled General Topology, you will encounter spaces which are more general than metric spaces. These are called topological spaces. In the context of topological spaces, compactness is defined using open covers, and the equivalence given in Theorem 7.51 is not true. For this reason, in that context the definition of compactness we gave is used to define a different condition known as sequential compactness. Since sequential compactness and compactness are equivalent in metric spaces (Theorem 7.51), and motivating the former is much easier than the latter (Theorem 2.81), we preferred to define compactness as in Definition 7.37. Nevertheless, we will feel free to use the open cover interpretation whenever it seems to be necessary.

Example 7.52. Prove by the open cover definition of compactness that
(1) when $\{x_n\}$ is a sequence in the metric space $(X, d)$ that converges to some $x \in X$, then $E = \{x_n : n \in \mathbb{N}\} \cup \{x\}$ is compact;
(2) compact subsets of metric spaces are closed; and
(3) in the Euclidean space $\mathbb{R}$, $(0, 1)$ is not compact.

Solution. (1) We asked you to prove this in Exercise 7.44. To prove this using the open cover definition of compactness, let $\{G_\alpha\}_{\alpha \in I}$ be an open cover of $E$. Then there exists $\alpha_0 \in I$ such that $x \in G_{\alpha_0}$. Since $G_{\alpha_0}$ is open, we find $\delta > 0$ such that $N_\delta(x) \subseteq G_{\alpha_0}$. But $\{x_n\}$ converges to $x$ and so there exists $N \in \mathbb{N}$ such that for every $n \ge N$, $x_n \in N_\delta(x)$. Thus, all elements of $E$, except perhaps $x_1, \ldots, x_{N-1}$, lie in $G_{\alpha_0}$. If for $i \in \{1, \ldots, N-1\}$, $x_i \in G_{\alpha_i}$, then $\{G_{\alpha_i}\}_{i=0}^{N-1}$ is a finite subcover of $\{G_\alpha\}_{\alpha \in I}$ for $E$. This shows that $E$ is compact.

(2) Let $K$ be a compact subset of a metric space $(X, d)$. To prove that $K$ is closed, it is enough to show that $K^c$ is open. So, considering an arbitrary $x \in K^c$, we will show that $x$ is an interior point of $K^c$. Since $x \in K^c$, for every $y \in K$, $y \neq x$. Hence, Theorem 6.19 gives us some $\varepsilon_y > 0$ for every $y \in K$ such that $N_{\varepsilon_y}(x) \cap N_{\varepsilon_y}(y) = \emptyset$. The family $\{N_{\varepsilon_y}(y) : y \in K\}$ is now an open cover of $K$, which by the compactness of $K$ has a finite subcover $\{N_{\varepsilon_{y_i}}(y_i)\}_{i=1}^{n}$. If we let $V = \bigcap_{i=1}^{n} N_{\varepsilon_{y_i}}(x)$, then $V$ is a neighborhood of $x$ which is entirely contained in $K^c$. This completes the proof by showing that $x$ is an interior point of $K^c$.

(3) For each $n$, let $A_n = \left(0, 1 - \frac{1}{2n}\right)$. It can be shown, using the Archimedean property of real numbers, that $(0, 1) = \bigcup_{n=1}^{\infty} A_n$. So, $\{A_n\}$ is an open cover for $(0, 1)$. If we assume that $\{A_{n_1}, \ldots, A_{n_k}\}$ is a finite subcover of $\{A_n\}$ and


$n_j = \max\{n_1, \ldots, n_k\}$, then we see that

\[ (0, 1) = \bigcup_{i=1}^{k} A_{n_i} = \left(0, 1 - \frac{1}{2n_j}\right), \]

a contradiction. Thus, $\{A_n\}$ has no finite subcover, and $(0, 1)$ is not compact.

Exercise 7.53. Prove, by the open cover definition of compactness, that $\mathbb{R}$ equipped with the Euclidean metric is not compact.

As an application of the open cover definition of compactness, we can prove the following interesting property of compact sets. As we will see in the next section, this property is crucial in our construction of Cantor's set.

Theorem 7.54. Let $\{K_n\}$ be a sequence of nonempty compact subsets of a metric space $X$ such that $K_{n+1} \subseteq K_n$ for every $n \in \mathbb{N}$. Then, the set

\[ K := \bigcap_{n=1}^{\infty} K_n \]

is nonempty.

Proof. Assume to the contrary that $K$ is empty. If for each $n$ we let $G_n = K_n^c$, then this assumption shows that

\[ X = \bigcup_{n=1}^{\infty} G_n. \]

Since $K_1 \subseteq X$ is a compact set and the sets $G_n$ are open, we can find $j_1, \ldots, j_m \in \mathbb{N}$ such that

\[ K_1 \subseteq \bigcup_{i=1}^{m} G_{j_i}. \]

But, by our definition of the sets $G_n$, this shows that

\[ K_1 \cap \left(\bigcap_{i=1}^{m} K_{j_i}\right) = \emptyset. \]

This is clearly a contradiction, because by letting $j = \max\{1, j_1, \ldots, j_m\}$ we see that

\[ K_1 \cap \left(\bigcap_{i=1}^{m} K_{j_i}\right) = K_j \neq \emptyset. \] □

Exercises 23 and 24 at the end of this chapter complement Theorem 7.54.
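Returning to Example 7.52(3), the failure of every finite subfamily can also be seen computationally. The sketch below assumes that the cover is $A_n = (0, 1 - \frac{1}{2n})$; the function names `right_endpoint`, `uncovered_point`, and `covering_index` are hypothetical. For any finite list of indices it produces a point of $(0, 1)$ that the chosen sets miss, while the full cover still reaches that point.

```python
from fractions import Fraction


def right_endpoint(n):
    """Right endpoint of A_n = (0, 1 - 1/(2n)) (assumed form of the cover)."""
    return 1 - Fraction(1, 2 * n)


def uncovered_point(indices):
    """A point of (0, 1) missed by the finite subfamily {A_n : n in indices}.

    The union of the subfamily is A_m with m = max(indices), so the midpoint
    between its right endpoint and 1 lies in (0, 1) but in none of the sets.
    """
    b = right_endpoint(max(indices))
    return (b + 1) / 2


def covering_index(x):
    """Smallest n with x in A_n; the search ends by the Archimedean property."""
    n = 1
    while not (0 < x < right_endpoint(n)):
        n += 1
    return n


chosen = [3, 10, 7]  # an arbitrary finite choice of indices
p = uncovered_point(chosen)
assert 0 < p < 1
assert all(not (0 < p < right_endpoint(n)) for n in chosen)
# The whole family still covers p: some later A_n contains it.
assert 0 < p < right_endpoint(covering_index(p))
```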

7.5. Perfect Sets and Cantor's Set

We know that intervals of the form $[a, b]$ are closed in the Euclidean space $\mathbb{R}$. This means that each such interval contains all its limit points. But, as we saw in Example 3.4(2), every point of such intervals is also a limit point, showing that intervals of the form $[a, b]$ are equal to their derived set. We have already seen other examples of sets with the same property: the set $A = \{(x, y) \in \mathbb{R}^2 : x = y\}$ in the Euclidean space $\mathbb{R}^2$ is equal to its derived set $A'$.


On the other hand, there are numerous examples of sets which are not equal to their derived sets: intervals of the form $(a, b)$ in the Euclidean space $\mathbb{R}$, nonempty finite subsets of arbitrary metric spaces, and so on. This motivates us to think further about sets that equal their derived set. This is also a natural problem to consider, as we previously considered sets that are equal to their interior, namely, open sets. The first step in our study is to give a specific name to the sets we are going to consider.

Definition 7.55. A subset $A$ of a metric space is said to be perfect whenever $A = A'$.

With this definition, perfect sets are closed. Nonempty finite subsets of metric spaces are closed but not perfect.

Example 7.56. We know that in the Euclidean space $\mathbb{R}$, the set $E = \{1/n : n \in \mathbb{N}\}$ is not closed. It is therefore not a perfect set. The derived set of $E$, namely $E' = \{0\}$, is also not perfect because $(E')' = \emptyset$.

Exercise 7.57. In a discrete metric space, which sets are perfect?

Cantor's Set. As our final argument in this chapter, we present a particularly important subset $C$ of the Euclidean space $\mathbb{R}$ which is known as Cantor's set. To construct $C$, we begin with the set $C_0 = [0, 1]$ and remove the open interval $(1/3, 2/3)$ from it to obtain $C_1 = [0, 1/3] \cup [2/3, 1]$. Then, we remove the open middle third of the intervals $[0, 1/3]$ and $[2/3, 1]$ from $C_1$ to get

\[ C_2 = [0, 1/9] \cup [2/9, 3/9] \cup [6/9, 7/9] \cup [8/9, 1]. \]

Having found $C_1, \ldots, C_n$ in this way, we construct a set $C_{n+1}$ by removing the open middle third of each of the intervals whose union is $C_n$. Then each of the sets $C_n$ is compact, being a finite union of compact sets. Moreover, it is clear that $C_1 \supset C_2 \supset C_3 \supset \cdots$ and that $C_n$ is the union of $2^n$ intervals, each of length $1/3^n$, for every $n \in \mathbb{N}$. By Theorem 7.54, the set

\[ C = \bigcap_{n=1}^{\infty} C_n \]

is nonempty. This set is compact, as an intersection of compact sets. We claim that $C$ is perfect. To see this, we show that every element of $C$ is a limit point of $C$. So, let $x$ be an arbitrary element of $C$. It is enough to show that for any positive real number $r$,

\[ N_r(x) \cap (C \setminus \{x\}) \neq \emptyset. \tag{7.3} \]

To see this, let $n$ be so large that $1/3^n < r$. Our definition of $C$ and the assumption $x \in C$ imply that $x \in C_n$. Let $I$ be the unique interval among the $2^n$ intervals that constitute $C_n$ which contains $x$. Our choice of $n$ shows that

\[ I \subset N_r(x). \tag{7.4} \]


Let y be the endpoint of I that is not equal to x. Since our construction of C implies y ∈ C, (7.3) follows from (7.4). One important property of Cantor’s set is that it is an uncountable set that contains no intervals. See Exercises 25 and 26 at the end of this chapter.
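The first stages of the construction can be generated with exact rational arithmetic. This is a minimal sketch rather than part of the construction itself; the function names `next_stage`, `cantor_stage`, and `endpoints` are hypothetical. It checks that $C_n$ consists of $2^n$ intervals of length $1/3^n$ and that endpoints of one stage remain endpoints at the next stage, which is the reason the endpoint $y$ used above belongs to $C$.

```python
from fractions import Fraction


def next_stage(intervals):
    """Remove the open middle third of each closed interval [a, b]."""
    result = []
    for a, b in intervals:
        third = (b - a) / 3
        result.append((a, a + third))
        result.append((b - third, b))
    return result


def cantor_stage(n):
    """Closed intervals whose union is C_n, starting from C_0 = [0, 1]."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        intervals = next_stage(intervals)
    return intervals


def endpoints(n):
    """Set of all endpoints of the intervals that make up C_n."""
    return {p for interval in cantor_stage(n) for p in interval}


for n in range(1, 7):
    stage = cantor_stage(n)
    assert len(stage) == 2**n
    assert all(b - a == Fraction(1, 3**n) for a, b in stage)
    # Endpoints are never removed at the next stage.
    assert endpoints(n) <= endpoints(n + 1)
```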

Notes on Essence and Generalizability

In this chapter we developed a theory for sequences in the abstract setting of metric spaces. We observed that some of the important concepts of classical theory are not generalizable to the metric space context. One particularly important instance of such a concept is series. The definition of a series needs the algebraic operation of addition, and this has no meaning in a generic metric space. The concept of series can be considered, however, in those metric spaces which are also vector spaces. One important class of such metric spaces is composed of Banach spaces, which are studied in a field of mathematics known as functional analysis. Diestel's great book [10] shows the way sequences and series are applied in the theory of Banach spaces.

Exercises

1. Prove Proposition 7.10.
2. Prove Theorem 7.12.
3. Prove Theorem 7.16.
4. Prove Proposition 7.21(1).
5. Which of the given sequences is bounded in the metric space $(C([0, 1]), d_u)$?
(a) $f_n(x) = (\cos x^n)/n$.
(b) $g_n(x) = (1 - 3\ln(1 + x^{2n}))/(2 + n^2)$.
(c) $h_n(x) = \ln(1 + x^n)$.
6. Find the set of subsequential limits of the sequence $\{f_n\}$ defined by $f_n(x) = 1 - \frac{x}{n}$ if $n$ is even and

\[ f_n(x) = \frac{nx - 2}{(n + 1)x + 3} \]

when $n$ is odd, as a sequence in $(C([0, 1]), d_u)$.
7. Let $\{x_n\}$ be a sequence in a metric space $X$. For each $n \in \mathbb{N}$, let $E_n = \{x_m : m \ge n\}$. Prove that $\{x_n\}$ is Cauchy if and only if

\[ \lim_{n\to\infty} \operatorname{diam}(E_n) = 0. \]

8. Find a metric space $(X, d)$ with this property: Every sequence $\{x_n\}$ in $X$ whose range is infinite diverges.
9. Prove that for a metric space $(X, d)$ the following are equivalent.
(a) The space $X$ is complete.


(b) If $\{A_n\}$ is a sequence of nonempty closed subsets of $X$ such that $A_{n+1} \subseteq A_n$ for every $n$ and $\lim_{n\to\infty} \operatorname{diam}(A_n) = 0$, then $\bigcap_{n=1}^{\infty} A_n$ is a singleton.
This result is known as Cantor's intersection theorem.
10. Let $\{x_n\}$ and $\{y_n\}$ be sequences in a metric space $(X, d)$ such that for every $n \in \mathbb{N}$, $d(x_n, y_n) < \frac{1}{n}$. Prove the following statements.
(a) If $\{x_n\}$ is Cauchy, then so is $\{y_n\}$.
(b) If $\{x_n\}$ converges to $x$, then so does $\{y_n\}$.
11. In the Euclidean space $\mathbb{R}$, find the closure of the set $\left\{\frac{m}{m+1} : m \in \mathbb{N}\right\}$.
12. Let $\{x_n\}$ be a sequence in a metric space $(X, d)$, and let $0 < \alpha < 1$ be such that for every $n \in \mathbb{N}$, $d(x_{n+2}, x_{n+1}) \le \alpha\, d(x_{n+1}, x_n)$. Prove that $\{x_n\}$ is Cauchy.
13. Let $\{x_n\}$ and $\{y_n\}$ be Cauchy sequences in a metric space $(X, d)$. Prove that $\{d(x_n, y_n)\}$ is convergent as a sequence in the Euclidean space $\mathbb{R}$.
14. Let $(X, d)$ be a metric space.
(a) Say that Cauchy sequences $\{x_n\}$ and $\{y_n\}$ are equivalent if

\[ \lim_{n\to\infty} d(x_n, y_n) = 0. \]

Prove that this is an equivalence relation.
(b) Denote the set of all equivalence classes obtained from this relation by $X^*$. If $A, B \in X^*$, define

\[ D(A, B) = \lim_{n\to\infty} d(a_n, b_n), \]

where $\{a_n\}$ and $\{b_n\}$ are arbitrary elements of $A$ and $B$, respectively. Show that $D$ is a distance function on $X^*$.
(c) Prove that $(X^*, D)$ is a complete metric space.
(d) Verify that $X$ can be considered as a dense subspace of $X^*$ by observing that a function $\varphi : X \to X^*$ can be found with the following properties.
• For all $x, y \in X$, $d(x, y) = D(\varphi(x), \varphi(y))$.
• The set $\varphi(X)$ is dense in $X^*$.
• If $X$ is complete, then $\varphi(X) = X^*$.
The metric space $X^*$ is called the completion of $X$.
15. Find the completion of the metric space $(\mathbb{Q}, d_e|_{\mathbb{Q}})$.
16. Suppose $d$ and $d_1$ are equivalent metrics on a set $X$. If one of the spaces $(X, d)$ and $(X, d_1)$ is complete, what can be said about the completeness of the other space?
17. Is the space $(\mathbb{R}^n, d_e^n)$ complete?
18. Prove that in the Euclidean space $\mathbb{R}$, each interval $[a, b]$ is compact. Do this by completing the following steps.
(a) Let $\{G_\alpha\}_{\alpha \in I}$ be an arbitrary open cover of $[a, b]$. Let $F$ denote the set of all those elements $x$ of $[a, b]$ such that $[a, x]$ can be covered by a finite number of the sets $G_\alpha$. Show that $F$ is nonempty and bounded from above, so that it has a supremum $c$.
(b) Prove that $c$ is actually an element of $F$.
(c) Complete the proof by showing that $c$ is equal to $b$.


19. Let $A$ and $B$ be compact and closed subsets of the Euclidean space $\mathbb{R}^n$, respectively. Prove that $A + B = \{\vec{a} + \vec{b} : \vec{a} \in A, \vec{b} \in B\}$ is a closed set. Show, by means of an example, that when $A$ and $B$ are both assumed to be closed, then $A + B$ may not be a closed set. Why don't we state our assertions in a general metric space?
20. In each case find an open cover of the given subset of the Euclidean space $\mathbb{R}$ that fails to have any finite subcover.
(a) $A = \{1/n : n \in \mathbb{N}\}$.
(b) $B = \{2n : n \in \mathbb{N}\}$.
(c) $C = [-1, 1)$.
21. Suppose $(X, d)$ is a metric space and that $K \subseteq Y \subset X$. Prove that $K$ is compact as a set in the subspace $(Y, d|_Y)$ if and only if it is compact in the space $(X, d)$.
22. Let $(X, d)$ be a metric space in which every closed neighborhood is compact. Prove that in this metric space, the following statements are true.
(a) Every closed and bounded subset of $X$ is compact.
(b) The space $X$ is complete.
23. Show by means of an example that the compactness assumption cannot be removed from Theorem 7.54.
24. Prove the following extension of Theorem 7.54. Let $\{K_\alpha\}_{\alpha \in I}$ be a collection of compact subsets of a metric space $X$ such that the intersection of each finite subcollection of $\{K_\alpha\}_{\alpha \in I}$ is nonempty. Prove that the set

\[ \bigcap_{\alpha \in I} K_\alpha \]

is nonempty.
25. Prove that Cantor's set is uncountable.
26. Show that Cantor's set contains no interval. Hint. First observe that $C$ has no intersection with intervals of the form

\[ \left(\frac{3k + 1}{3^m}, \frac{3k + 2}{3^m}\right) \]

with $k, m \in \mathbb{N}$.
27. Prove that every nonempty perfect subset of the space $(\mathbb{R}^n, d_e^n)$ is uncountable.

Chapter 8

Limit and Continuity of Functions in Metric Spaces

In the previous chapter we generalized some aspects of real sequence theory to the abstract context of metric spaces. As expected, it is now time to generalize the concepts of limit and continuity from real function theory to the metric space setting. Fortunately, the strengthened definitions of limit and continuity presented in Chapter 3 make our present task much easier. As in the case of sequence theory, some aspects of classical theory are generalizable and some others are not. It is one of our main tasks in this chapter to distinguish between these two classes of issues. We begin our first section with the generalization of the concept of limit.

8.1. The Definition of Limit in General Metric Spaces

Our first goal in this chapter is to introduce an appropriate concept of limit for functions whose domain and range are (contained in) metric spaces. To pursue our studies in the widest framework possible, we allow the domain and range of our functions to be subsets of arbitrary metric spaces. Fortunately, we did most of the necessary work when proposing the strengthened definition of limit for real functions, namely Definition 3.12. Having that definition in mind, the following is a straightforward generalization.

Definition 8.1. Let $(X, d_X)$ and $(Y, d_Y)$ be metric spaces, let $E \subseteq X$, and let $a$ be a limit point of $E$. Suppose also that $f$ is a function from $E$ into $Y$. We say that the limit of $f$ at $a$ is equal to $y$, or that $f(x)$ tends to $y$ when $x$ approaches $a$, and we write $\lim_{x\to a} f(x) = y$ if $y$ is an element of $Y$ for which the following statement is true.

For every $\varepsilon > 0$, we can find $\delta > 0$ such that $0 < d_X(x, a) < \delta$ and $x \in E$ imply $d_Y(f(x), y) < \varepsilon$.


As an exercise, try to interpret the statement above using the concept of distance. Again, the assumption that $a$ is a limit point of $E$ ensures the existence of an element of $E$ in the $\delta$-neighborhood $N_\delta(a)$, for every $\delta > 0$. Note that by letting $X = Y = \mathbb{R}$ and $d_X = d_Y = d_e$ in the above definition, we obtain the strengthened concept of limit presented in Definition 3.12.

Is further generalization possible? Based on our current knowledge, Definition 8.1 is the most general extension of Definition 3.12 we can imagine. In fact, letting $Y = \mathbb{R}$ and $d_Y = d_e$ in Definition 8.1, we also obtain a generalization of Definition 3.12 to the metric space setting. But it is always worthwhile to develop our abstract theories in the widest framework possible.

Similarly to what we proved for the limit of real functions, we can prove the following theorem.

Theorem 8.2. Let $(X, d_X)$ and $(Y, d_Y)$ be metric spaces, let $E \subseteq X$, and let $a$ be a limit point of $E$. If $f$ is a function from $E$ into $Y$, then the following statements are true.
(1) The limit of $f$ at $a$ is unique whenever it exists.
(2) If $f$ has a limit at $a$, then $\delta > 0$ can be found such that $f$ is bounded on the set $N_\delta(a) \cap (E \setminus \{a\})$.
(3) The following statement is equivalent to $\lim_{x\to a} f(x) = y$. For every sequence $\{a_n\}$ in $E \setminus \{a\}$ that converges to $a$, the sequence $\{f(a_n)\}$ converges to $y$.

The proofs of (1), (2), and (3) of Theorem 8.2 proceed similarly to those of Theorem 3.13, Proposition 3.20, and Theorem 3.29. We therefore leave them as exercises for the reader.

Example 8.3. Suppose $(Y, \rho)$ is a discrete metric space, $f$ is a one-to-one function from the Euclidean space $\mathbb{R}$ into $Y$, and $a$ is an arbitrary element of $\mathbb{R}$. Prove that $f$ does not have a limit at $a$.

Solution. We know that $a$ is a limit point of the domain of $f$, namely $\mathbb{R}$. Assume to the contrary that

\[ \lim_{x\to a} f(x) = y \tag{8.1} \]

for some $y \in Y$. We then find $\delta > 0$ such that $x \in \mathbb{R}$ and $0 < d_e(x, a) = |x - a| < \delta$ imply $\rho(f(x), y) < 1/2$. Since $\rho$ is the discrete metric, the latter inequality holds only when $y = f(x)$. Thus, by assuming (8.1), we find that for some $\delta > 0$, the deleted neighborhood of $a$ with radius $\delta$ is a subset of $f^{-1}(\{y\})$. Since $f$ is one-to-one, the latter set is either empty or a singleton. This shows that the inclusion we just mentioned is a contradiction.
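The obstruction in Example 8.3 can be observed on a concrete sequence, using the sequential criterion of Theorem 8.2(3). The following is a minimal sketch in which $f$ is taken to be the identity function, a one-to-one function chosen purely for illustration; the names `d_e`, `rho`, and `count_close` and the sample points are hypothetical.

```python
def d_e(x, y):
    """Euclidean metric on R."""
    return abs(x - y)


def rho(x, y):
    """Discrete metric: 0 for equal points, 1 otherwise."""
    return 0 if x == y else 1


a = 0.0
# x_n = a + 1/n converges to a in the Euclidean metric, with x_n != a.
xs = [a + 1 / n for n in range(1, 201)]
assert d_e(xs[-1], a) < 0.01


def count_close(y):
    """Number of terms x_n with rho(f(x_n), y) < 1/2, where f is the identity."""
    return sum(1 for x in xs if rho(x, y) < 0.5)


# Whatever candidate limit y we try, at most one term of the image sequence
# is within 1/2 of y, so {f(x_n)} cannot converge to y in the discrete metric.
for y in [a, xs[0], xs[50], 3.14]:
    assert count_close(y) <= 1
```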


Another unusual consequence of discreteness. We have seen in many situations in the previous chapters that the discrete metric allows us to obtain counterintuitive results. Example 8.3 gives us another instance of such unusual results. To see this, let $Y = \mathbb{R}$ and consider $f$ to be the identity function in Example 8.3. Then $f$, viewed as a one-to-one function from the Euclidean space $\mathbb{R}$ onto $(\mathbb{R}, \rho)$, has no limit at any point of $\mathbb{R}$! Compare this with the obvious fact that $f$, as a function from the Euclidean space $\mathbb{R}$ onto itself, has a limit at every point of $\mathbb{R}$.

Example 8.4. If

\[
f(x, y) =
\begin{cases}
\dfrac{xy}{|x| + |y|} & (x, y) \neq (0, 0), \\[2mm]
0 & (x, y) = (0, 0),
\end{cases}
\]

prove that

\[ \lim_{(x, y)\to(0, 0)} f(x, y) = 0, \]

where we equipped the sets $\mathbb{R}^2$ and $\mathbb{R}$ with their Euclidean metrics $d_e^2$ and $d_e$, respectively.

Solution. If $(x, y) \neq (0, 0)$ is given, then

\[
d_e(f(x, y), 0) = \left|\frac{xy}{|x| + |y|} - 0\right| = \frac{|x|\,|y|}{|x| + |y|} \le \sqrt{x^2 + y^2}.
\]

The inequality is true because at least one of $x$ and $y$, say $x$ for example, is nonzero, and this allows us to write

\[
\frac{|x|\,|y|}{|x| + |y|} \le \frac{|x|\,|y|}{|x|} = |y| = \sqrt{y^2} \le \sqrt{x^2 + y^2}.
\]

Hence, when $\varepsilon > 0$ is given, by choosing $0 < \delta < \varepsilon$, we find that $d_e^2((x, y), (0, 0)) < \delta$ implies $d_e(f(x, y), 0) < \varepsilon$. This proves the desired result.

Next we note that not everything we studied in Chapter 3 can be generalized to the metric space context. An instance is the notion of infinite limit. To see this, recall the formal definition of $\lim_{x\to a} f(x) = +\infty$ (Definition 3.38).

For every $M > 0$, we can find $\delta > 0$ such that $x \in E$ and $0 < d_e(x, a) = |x - a| < \delta$ imply $f(x) > M$.

Here, $E \subseteq \mathbb{R}$ is the domain of $f$ and $a$ is assumed to be a limit point of $E$. As you may have noticed, everything can be generalized to the metric space setting except the inequality $f(x) > M$. This inequality means that we should be able to compare $f(x)$, an element of the range of $f$, with $M$, which is a real number, using


some order $<$, and a generic metric space carries no such order. One might try to keep the order by restricting attention to real-valued functions: if $f$ is a real-valued function defined on a subset $E$ of a metric space $(X, d)$ and $a$ is a limit point of $E$, we could write $\lim_{x\to a} f(x) = +\infty$ when the following statement is true.

For every $M > 0$, we can find $\delta > 0$ such that $x \in E$ and $0 < d(x, a) < \delta$ imply $f(x) > M$.

Is this an appropriate generalization of infinite limit to the context of metric spaces? The answer is no! The reason is that by presenting the above definition, we neglected the metric space structure of $\mathbb{R}$ and just used its order. For this reason, this cannot be considered to be an appropriate definition in the theory of metric spaces.

Exercise 8.5. In each case determine if the given result or concept is generalizable to the setting of metric spaces.
(1) The Squeeze Theorem for functions (Theorem 3.34).
(2) The concept of limit at infinity (Definition 3.41).
(3) The notion of infinite limit at infinity (Definition 3.47).
(4) The concept of one-sided limit (Definition 3.50).
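As a numerical sanity check of the estimate used in Example 8.4, the sketch below samples points of the plane and confirms that $|f(x, y)| \le \sqrt{x^2 + y^2}$ on the sample. It illustrates the inequality and does not prove it; the sampling range, the seed, and the sample size are arbitrary assumptions.

```python
import math
import random


def f(x, y):
    """The function of Example 8.4, with f(0, 0) = 0."""
    if (x, y) == (0.0, 0.0):
        return 0.0
    return x * y / (abs(x) + abs(y))


random.seed(0)  # fixed seed so that the run is reproducible
for _ in range(10_000):
    x = random.uniform(-1, 1)
    y = random.uniform(-1, 1)
    # d_e(f(x, y), 0) is bounded by the Euclidean distance from (x, y) to (0, 0).
    assert abs(f(x, y)) <= math.hypot(x, y)
```

Since the bound holds with the distance itself on the right-hand side, any $\delta$ with $0 < \delta < \varepsilon$ works, as in the solution.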

8.2. Continuity and Uniform Continuity

Now that we have adjoined the concept of limit to our metric space theory, it is time to think of continuous functions from one metric space into another. We begin with the following straightforward generalization of Definition 3.59.

Definition 8.6. Let $(X, d_X)$ and $(Y, d_Y)$ be metric spaces, let $E \subseteq X$, and let $a$ be a point of $E$. Assume also that $f$ is a function from $E$ into $Y$. We say that $f$ is continuous at $a$ if the following statement is true.

For every $\varepsilon > 0$ there exists $\delta > 0$ such that $x \in E$ and $d_X(x, a) < \delta$ imply $d_Y(f(x), f(a)) < \varepsilon$.

Example 8.7. Suppose that $(X, \rho)$ is a discrete metric space and $(Y, d)$ is an arbitrary metric space. If $f$ is any function from $X$ into $Y$, then prove that $f$ is continuous at every point of $X$.

Solution. Let $a$ be an arbitrary point of $X$. If $\varepsilon > 0$ is given, let $\delta = 1/2$. Then $x \in X$ and $\rho(x, a) < \delta$ imply $x = a$, because $\rho$ is the discrete metric, and hence $d(f(x), f(a)) = 0 < \varepsilon$. This proves that $f$ is continuous at $a$.

Exercise 8.8. If $X$ and $Y$ are metric spaces, verify that any constant function from $X$ into $Y$ is continuous at every point of $X$.


Example 8.9. Let $I$ denote the identity function from the Euclidean space $\mathbb{R}$ into the discrete space $(\mathbb{R}, \rho)$. Prove that $I$ is not continuous at $0$.

Solution. If we assume to the contrary that $I$ is continuous at $0$, then we find $\delta > 0$ such that $d_e(x, 0) < \delta$ implies $\rho(I(x), 0) < 1/2$. Since $I(x) = x$ and $\rho$ is the discrete metric, the last inequality holds only when $x = 0$. So we found $\delta > 0$ for which the interval $(-\delta, \delta)$ is a subset of $\{0\}$. This contradiction proves the desired result.

Exercise 8.10. At which points is the function $I$ of the previous example continuous?

As in classical theory, $a$ can be either an isolated point or a limit point of $E$ in the above definition of continuity. Thus, we can prove the following proposition similarly to Proposition 3.60.

Proposition 8.11. Let $(X, d_X)$ and $(Y, d_Y)$ be metric spaces, let $E \subseteq X$, and let $a$ be a point of $E$. For a function $f$ from $E$ into $Y$, the following statements are true.
(1) If $a$ is an isolated point of $E$, then $f$ is continuous at $a$.
(2) If $a$ is a limit point of $E$, then $f$ is continuous at $a$ if and only if

\[ \lim_{x\to a} f(x) = f(a). \]

Next, we state the following straightforward generalization of Theorem 3.70.

Theorem 8.12. Let $X$ and $Y$ be metric spaces, let $E \subseteq X$, and let $a$ be an element of $E$. Then, the following are equivalent for a function $f$ from $E$ into $Y$.
(1) The function $f$ is continuous at $a$.
(2) For every sequence $\{a_n\}$ of elements of $E$ that converges to $a$, the sequence $\{f(a_n)\}$ converges to $f(a)$.

Exercise 8.13. Prove the following generalization of Example 3.72. Let $X$ and $Y$ be metric spaces. If $f$ and $g$ are continuous functions from $X$ into $Y$, prove that the set $A = \{x \in X : f(x) = g(x)\}$ is a closed subset of $X$.

Exercise 8.14. Verify that Theorem 3.73 (continuity of composite functions) can be generalized to the metric space setting. More precisely, observe that the composition of two continuous functions is continuous in the current context. Do this by proving a generalized version of Theorem 3.73. We will use this result in Example 8.28.

Continuity on the Whole Space. Functions that are continuous everywhere in their domains deserve particular attention. We begin the study of such functions with a useful definition.

Definition 8.15. Let $X$ and $Y$ be metric spaces, and let $f$ be a function from $X$ into $Y$. We say that $f$ is continuous on $X$ if $f$ is continuous at every point of $X$.


Discreteness of the domain implies continuity. With this definition of global continuity, we obtain the following equivalent formulation of Example 8.7.

Any function whose domain is a discrete metric space is necessarily continuous on that space.

Example 8.16. Suppose $(X, d)$ is a metric space and that $x_0$ is an arbitrary element of $X$. Prove that the function $d_{x_0}$ defined on $X$ by $d_{x_0}(x) = d(x_0, x)$ is a continuous function from $X$ into the Euclidean space $\mathbb{R}$.

Solution. If $a$ and $x$ are arbitrary elements of $X$, then

\[
\begin{aligned}
d_e(d_{x_0}(x), d_{x_0}(a)) &= |d_{x_0}(x) - d_{x_0}(a)| \\
&= |d(x_0, x) - d(x_0, a)| \\
&\le d(x_0, x_0) + d(x, a) \\
&= d(x, a),
\end{aligned}
\]

where the inequality follows from Example 6.3. If $\varepsilon > 0$ is given and we choose $0 < \delta < \varepsilon$, then it follows from the above inequality that $d(x, a) < \delta$ implies $d_e(d_{x_0}(x), d_{x_0}(a)) < \varepsilon$. This shows that $d_{x_0}$ is continuous at $a$ and, since $a$ was arbitrary, that it is continuous on $X$.

The following useful characterization of continuous functions will be used frequently in the sequel. It is not a generalization of a corresponding result in the classical theory of continuity, simply because we did not present its special case in Chapter 3!

Theorem 8.17. If $(X, d_X)$ and $(Y, d_Y)$ are metric spaces, for a function $f$ from $X$ into $Y$ the following are equivalent.
(1) The function $f$ is continuous on $X$.
(2) For every open set $V \subseteq Y$, the set $f^{-1}(V)$ is open in $X$.

Proof. (1) ⇒ (2). Let $V$ be an open subset of $Y$. To prove that $f^{-1}(V)$ is open as a subset of $X$, we consider an arbitrary element $x$ of $f^{-1}(V)$ and show that it is actually an interior point of this set. If $x$ is as above, then $f(x) \in V$ and hence by the assumption that $V$ is open, $\varepsilon > 0$ can be found such that

\[ N_\varepsilon(f(x)) \subseteq V, \tag{8.2} \]

where $N_\varepsilon(f(x))$ denotes the $\varepsilon$-neighborhood of $f(x)$ in $Y$. Since $f$ is continuous at $x$, there exists $\delta > 0$ such that for every $z \in X$ with $d_X(z, x) < \delta$, $d_Y(f(z), f(x)) < \varepsilon$. It now follows from this and (8.2) that the $\delta$-neighborhood of $x$ in $X$ is a subset of $f^{-1}(V)$, showing that $x$ is an interior point of $f^{-1}(V)$.

(2) ⇒ (1). Let $x$ be an arbitrary element of $X$. To prove that $f$ is continuous at $x$, let $\varepsilon > 0$ be given. Denote the $\varepsilon$-neighborhood of $f(x)$ in $Y$ by $V$. Since $V$ is an open subset of $Y$, our assumption shows that $f^{-1}(V)$ is an open subset of


$X$. That $x \in f^{-1}(V)$ now gives us some $\delta > 0$ such that the $\delta$-neighborhood of $x$ in $X$ is contained in $f^{-1}(V)$. This means that when $z \in X$ satisfies $d_X(z, x) < \delta$, $z \in f^{-1}(V)$. This completes the proof that $f$ is continuous at $x$, because $z \in f^{-1}(V)$ is equivalent to $f(z) \in V$, and this is in turn equivalent to $d_Y(f(z), f(x)) < \varepsilon$ by our definition of $V$. □

What does Theorem 8.17 say? Theorem 8.17 says that a function $f$ from a metric space into some other is continuous if and only if the inverse image of any open subset of the range space is open in the domain.

Corollary 8.18. If $(X, d_X)$ and $(Y, d_Y)$ are metric spaces, for a function $f$ from $X$ into $Y$ the following are equivalent.
(1) The function $f$ is continuous on $X$.
(2) For every closed set $F \subseteq Y$, the set $f^{-1}(F)$ is closed in $X$.

Proof. (1) ⇒ (2). If we assume that $f$ is continuous on $X$ and that $F$ is a closed subset of $Y$, then $f^{-1}(F^c)$ is an open subset of $X$ by Theorem 8.17. Since $f^{-1}(F^c) = (f^{-1}(F))^c$, we find that $f^{-1}(F)$ is a closed subset of $X$.

(2) ⇒ (1). If $V$ is an open subset of $Y$, then $V^c$ is closed and $f^{-1}(V^c)$ is a closed subset of $X$ by (2). Now, the equality $f^{-1}(V^c) = (f^{-1}(V))^c$ shows us that $f^{-1}(V)$ is an open subset of $X$. The continuity of $f$ then follows from Theorem 8.17. □

Example 8.19. Use Theorem 8.17 to show that the function $f$ defined by

\[
f(x) =
\begin{cases}
1/x & x \neq 0, \\
0 & x = 0,
\end{cases}
\]

is not continuous as a function from $(\mathbb{R}, d_e)$ into itself.

Solution. It is easy to see that

\[ f^{-1}((-1, 1)) = (-\infty, -1) \cup \{0\} \cup (1, +\infty). \]

Note that $f^{-1}((-1, 1))$ is not an open subset of the Euclidean space $\mathbb{R}$, because $0$ is not an interior point of this set. Since $(-1, 1)$ itself is open, Theorem 8.17 tells us that $f$ is not continuous.

Uniform Continuity. Uniform continuity is an important property of real functions which is stronger than continuity. We studied uniformly continuous functions in Section 3.7. The concept of uniform continuity can easily be generalized to metric spaces. Compare the following with Definition 3.93.


Definition 8.20. Let $(X, d_X)$ and $(Y, d_Y)$ be metric spaces, let $E$ be a subset of $X$, and let $f$ be a function from $E$ into $Y$. We say that $f$ is uniformly continuous on $E$ if the following statement is true.

For every $\varepsilon > 0$, we can find $\delta > 0$ such that for all $x, y \in E$ satisfying $d_X(x, y) < \delta$, $d_Y(f(x), f(y)) < \varepsilon$.

It is clear that a uniformly continuous function is continuous. We discussed the difference between continuity and uniform continuity in the classical case in Section 3.7. The same can be said about the difference between the corresponding abstract concepts.

Example 8.21. Given an interval $[a, b]$ and some $x_0 \in [a, b]$, define a function $T_{x_0}$ from $C([a, b])$ into $\mathbb{R}$ by $T_{x_0}(f) = f(x_0)$. Prove that $T_{x_0}$ is uniformly continuous on $C([a, b])$ if we equip $C([a, b])$ and $\mathbb{R}$ with the metrics $d_u$ and $d_e$, respectively.

Solution. First note that for all $f, g \in C([a, b])$,

\[ d_e(T_{x_0}(f), T_{x_0}(g)) = |f(x_0) - g(x_0)| \le d_u(f, g). \]

Hence, when $\varepsilon > 0$ is given, by choosing $0 < \delta < \varepsilon$, we find that $f, g \in C([a, b])$ and $d_u(f, g) < \delta$ imply $d_e(T_{x_0}(f), T_{x_0}(g)) < \varepsilon$. This shows that $T_{x_0}$ is uniformly continuous on $C([a, b])$.

Exercise 8.22. If we replace $d_e$ with the discrete metric $\rho$ in the above example, is it still possible to deduce that $T_{x_0}$ is uniformly continuous on $C([a, b])$?

Example 8.23. A simple argument, similar to the one in Example 8.21, shows that the function $d_{x_0}$ introduced in Example 8.16 is uniformly continuous from $(X, d)$ into the Euclidean space $\mathbb{R}$.

An important class of uniformly continuous functions consists of functions to which we refer as contractions.

Definition 8.24. Let $(X, d_X)$ and $(Y, d_Y)$ be metric spaces, and let $f$ be a function from $X$ into $Y$. We say that $f$ is a contraction if $0 < \alpha < 1$ can be found such that $d_Y(f(x), f(y)) \le \alpha\, d_X(x, y)$ for all $x, y \in X$.

It can be easily verified that any contraction is uniformly continuous.

Exercise 8.25. Give examples of uniformly continuous functions that fail to be contractions.

Example 8.26. Prove that the function $f(x) = ax + b$ is a contraction from $(\mathbb{R}, d_e)$ into itself when $|a| < 1$.


Solution. For arbitrary $x, y \in \mathbb{R}$,

\[ d_e(f(x), f(y)) = |(ax + b) - (ay + b)| = |a|\,|x - y| = |a|\, d_e(x, y). \]

Hence, when $|a| < 1$, $f$ is a contraction: if $a \neq 0$ we may take $\alpha = |a|$ in Definition 8.24, and if $a = 0$ any $0 < \alpha < 1$ works.

One important result in Section 3.7 was the fact that the continuity of a function on an interval of the form $[a, b]$ implies its uniform continuity (Theorem 3.94). We will generalize this result in the next section (Theorem 8.29).
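The identity behind Example 8.26 is easy to test numerically. The sketch below uses hypothetical coefficients $a = 0.5$ and $b = 1$ and a random sample of pairs; the name `make_affine` and the floating-point tolerance are assumptions made for this illustration only.

```python
import random


def make_affine(a, b):
    """Return the function f(x) = a*x + b."""
    return lambda x: a * x + b


a, b = 0.5, 1.0  # hypothetical coefficients with |a| < 1
f = make_affine(a, b)
alpha = abs(a)   # contraction constant suggested by the computation above
tol = 1e-9       # allowance for floating-point rounding

random.seed(1)
for _ in range(1000):
    x = random.uniform(-100, 100)
    y = random.uniform(-100, 100)
    # d_e(f(x), f(y)) <= alpha * d_e(x, y), up to rounding error.
    assert abs(f(x) - f(y)) <= alpha * abs(x - y) + tol
```

Because $\alpha < 1$, the choice $\delta = \varepsilon$ already witnesses uniform continuity for such a function.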

8.3. Continuity and Compactness

The extreme value theorem (Theorem 3.78) is one of the most important results in the classical theory of continuity. So it is natural to think about its generalizability. Is it possible to find a generalization of the extreme value theorem to the context of metric spaces? To answer this question, let us recall what the theorem says.

If a real-valued function f is continuous on [a, b], then f attains the supremum and infimum of its values in [a, b].

It is clear that intervals of the form [a, b], and more generally intervals, may be undefinable in arbitrary metric spaces. This is because intervals are defined using order, for example [a, b] = {x ∈ R : a ≤ x ≤ b}. To find a generalization of the extreme value theorem, we should think of those properties of [a, b] which are of a metric space flavor. When f is as in the statement of the extreme value theorem, x0 and y0 in [a, b] can be found such that for every x ∈ [a, b],

\[ f(x_0) \le f(x) \le f(y_0). \tag{8.3} \]

It follows from the extreme value theorem that f([a, b]) is also a closed and bounded interval. To see this, note that by the continuity of f and Theorem 3.86, f([a, b]) is an interval. By (8.3), this interval is contained in [f(x0), f(y0)]. But, if we consider some element z of (f(x0), f(y0)), then z is a real number between two elements of the interval f([a, b]), and it is therefore an element of f([a, b]) by Theorem 1.58. So we conclude that

\[ f([a, b]) = [f(x_0), f(y_0)], \tag{8.4} \]

where x0 and y0 are determined by (8.3). Note that by (8.4), the continuous image of [a, b], as a closed and bounded subset of R, is an interval of this kind. Since in the Euclidean space R the property of being closed and bounded is equivalent to compactness, we obtain the following equivalent formulation of the extreme value theorem.

In the Euclidean space R, the continuous image of any compact interval is a compact interval.

This allows us to guess that in general, the continuous image of a compact subset of a metric space will be compact. That this guess is indeed true is the content of the following theorem.

Theorem 8.27. Let X and Y be metric spaces. If X is compact and f is a continuous function from X into Y, then f(X) is also compact.


8. Limit and Continuity of Functions in Metric Spaces

Proof. Suppose {Vα}α∈I is an open cover of f(X) in Y. Then

\[ X = f^{-1}(f(X)) \subseteq f^{-1}\Big(\bigcup_{\alpha \in I} V_\alpha\Big) = \bigcup_{\alpha \in I} f^{-1}(V_\alpha), \]

by the known properties of the inverse image of functions. Hence {f−1(Vα)}α∈I is a cover for X. Since each Vα is open and f is continuous, Theorem 8.17 says that this is indeed an open cover of X. That X is compact now gives us a finite subset {α1, . . . , αn} of I such that

\[ X \subseteq \bigcup_{i=1}^{n} f^{-1}(V_{\alpha_i}). \]

Now,

\[ f(X) \subseteq f\Big(\bigcup_{i=1}^{n} f^{-1}(V_{\alpha_i})\Big) = \bigcup_{i=1}^{n} f\big(f^{-1}(V_{\alpha_i})\big) \subseteq \bigcup_{i=1}^{n} V_{\alpha_i}. \]

Thus, {Vα1, . . . , Vαn} is a finite subcover of {Vα}α∈I for f(X), and this shows that f(X) is compact.

The extreme value theorem can be deduced from Theorem 8.27. To see this, let X be [a, b] considered as a subspace of (R, de), and let Y be the latter metric space. Then, it follows from Theorem 2.81 (the compactness of [a, b]) and Theorem 8.27 that f([a, b]) is a compact, and hence closed and bounded, subset of the Euclidean space R. The boundedness of f([a, b]) now gives us the real numbers m = inf f([a, b]) and M = sup f([a, b]). Finally, the closedness of f([a, b]), in view of Corollary 7.22 (and its obvious analogue for infima), shows that m and M belong to f([a, b]).

Example 8.28. Show by means of an example that the continuous image of a noncompact set may be compact.

Solution. As a result of Example 7.45 we know that the set K = {f ∈ C([0, 1]) : ∀x ∈ [0, 1], |f(x)| ≤ 1} is not compact in the metric space (C([0, 1]), du). Define a function φ from C([0, 1]) into R (equipped with the Euclidean distance function) by φ(f) = |f(1)|. Then φ is the composition of the absolute value function with the function T1 introduced in Example 8.21, and it is therefore continuous on C([0, 1]). We claim that

\[ \varphi(K) = [0, 1]. \tag{8.5} \]

Once this is proved, we find a noncompact set K whose image under the continuous function φ is a compact set. To prove (8.5), we first note that for every f ∈ K, 0 ≤ φ(f) = |f(1)| ≤ 1, and that this shows

\[ \varphi(K) \subseteq [0, 1]. \tag{8.6} \]

On the other hand, given α ∈ [0, 1], the function fα defined on [0, 1] by fα(x) = αx is an element of K and φ(fα) = |fα(1)| = |α| = α, showing that

\[ [0, 1] \subseteq \varphi(K). \tag{8.7} \]

Now, (8.5) follows from (8.6) and (8.7).

Next, we state the generalization of Theorem 3.94 as promised in the previous section. The idea behind this generalization is that intervals of the form [a, b] are compact subsets of the Euclidean space R.

Theorem 8.29. Suppose f is a continuous function from a compact metric space (X, dX) into a metric space (Y, dY). Then, f is uniformly continuous on X.

Proof. Let ε > 0 be given. Since f is continuous at every x ∈ X, for each such x we can find δx > 0 such that y ∈ X and dX(y, x) < δx imply dY(f(y), f(x)) < ε/2. To prove the uniform continuity of f, we need to replace the numbers δx with a single number δ > 0 that can be used in the definition of continuity for every x. If we are sure that the infimum of the set Δ := {δx : x ∈ X} is positive, then δ = inf Δ would be an appropriate choice. But since X may be an infinite set in general, the infimum may be 0. Here is the point at which compactness comes to our assistance. To understand how, associate to each x ∈ X the set Nx := {y ∈ X : dX(y, x) < δx/2}. Since Nx contains x for every x ∈ X, {Nx}x∈X is an open cover of X. Now, the compactness of X gives us some elements x1, . . . , xn of this set such that

\[ X = N_{x_1} \cup \cdots \cup N_{x_n}. \tag{8.8} \]

We claim that δ := (1/2) min{δx1 , . . . , δxn } is the positive number we were looking for. To see this, suppose x and y are elements of X such that dX (x, y) < δ. By (8.8), m ∈ {1, . . . , n} exists such that x ∈ Nxm , or equivalently dX (x, xm )
y}, and these sets are separated. In general, if we remove a straight line from the plane, we obtain a disconnected set.

8.4. Connectedness and Its Relation to Continuity


Return to the Main Problem. We began this section with a desire to generalize the intermediate value theorem to the metric space context. Based on our above argument, it is now easy to believe the following theorem as the desired generalization.

Theorem 8.35. Suppose X and Y are metric spaces, E ⊆ X, and f is a continuous function from X into Y. If E is a connected subset of X, then f(E) is a connected subset of Y.

Proof. We prove the theorem by the contrapositive law. So, assume that f(E) is a disconnected subset of Y. This gives us subsets A and B of Y such that

\[ \overline{A} \cap B = A \cap \overline{B} = \emptyset \quad \text{and} \quad f(E) = A \cup B. \tag{8.13} \]

We use these sets to prove that E is also disconnected. Since by (8.13)

\[ E \subseteq f^{-1}(f(E)) = f^{-1}(A \cup B) = f^{-1}(A) \cup f^{-1}(B), \tag{8.14} \]

we see by letting C = E ∩ f−1(A) and D = E ∩ f−1(B) that E = C ∪ D. So to complete the proof it is enough to show that the sets C and D are separated. We prove this in two steps.

(1) The sets C and D are nonempty. If we assume that C = ∅, then it follows from our definition of C and (8.14) that E ⊆ f−1(B) and hence f(E) ⊆ f(f−1(B)) ⊆ B. Since A ∩ B = ∅, it follows that f(E) ∩ A = ∅. This contradicts our choice of A and B and shows that C is indeed nonempty. That D is nonempty follows by a similar argument.

(2) The intersections $\overline{C} \cap D$ and $C \cap \overline{D}$ are empty. We only prove that $\overline{C} \cap D = \emptyset$. The proof that $C \cap \overline{D} = \emptyset$ can be done similarly. Since C ⊆ f−1(A) and $A \subseteq \overline{A}$,

\[ C \subseteq f^{-1}(\overline{A}). \tag{8.15} \]

But $\overline{A}$ is closed and f is continuous. Hence Corollary 8.18 tells us that by (8.15),

\[ \overline{C} \subseteq f^{-1}(\overline{A}). \tag{8.16} \]

Since D ⊆ f−1(B), it follows from (8.16) that

\[ \overline{C} \cap D \subseteq f^{-1}(\overline{A} \cap B) = f^{-1}(\emptyset) = \emptyset. \]

This shows that $\overline{C} \cap D = \emptyset$.

What does Theorem 8.35 say? Theorem 8.35 says that continuous functions map connected sets onto connected ones or that the continuous image of a connected set is connected.




8.5. Banach’s Fixed Point Theorem

We saw several examples of complete metric spaces in the previous chapter. But what is the property of completeness good for? Perhaps the best answer to this question is given in Stefan Banach’s celebrated fixed point theorem. To state this theorem, recall that if X is any set and f : X → X is a function, a fixed point of f is an element x of X such that f(x) = x.

Theorem 8.36 (Banach’s Fixed Point Theorem). Let (X, d) be a complete metric space, and let f be a contraction from X into itself. Then f has a unique fixed point in X which can be found using the following iterative method. Starting from an arbitrary element x0 of X, we define a sequence {xn} in X by

\[ x_n = f(x_{n-1}); \quad n \in \mathbb{N}. \tag{8.17} \]

Then the sequence {xn} converges to the unique fixed point of f.

Proof. Suppose that 0 < α < 1 is such that

\[ d(f(x), f(y)) \le \alpha\, d(x, y) \tag{8.18} \]

for all x, y ∈ X. We first prove that f can have at most one fixed point. In fact, if y and z are fixed points for f, then (8.18) implies that d(y, z) = d(f(y), f(z)) ≤ α d(y, z). Since 0 < α < 1, the above inequality holds only when d(y, z) = 0, or equivalently when y = z.

We now turn to the existence part of the proof. Let x0 be an arbitrary element of X and define a sequence {xn} as in (8.17). We claim that {xn} converges to a fixed point of f. Once this is established, the proof is completed by the above argument. Since X is complete, to prove the convergence of {xn} it is enough to show that this sequence is Cauchy. First note that for every n ∈ N,

\[ d(x_{n+1}, x_n) = d(f(x_n), f(x_{n-1})) \le \alpha\, d(x_n, x_{n-1}) \le \cdots \le \alpha^n d(x_1, x_0). \]

Thus, for all natural numbers n and m with m > n,

\[ \begin{aligned} d(x_m, x_n) &\le d(x_m, x_{m-1}) + \cdots + d(x_{n+1}, x_n) \\ &= \sum_{j=0}^{m-n-1} d(x_{n+j+1}, x_{n+j}) \\ &\le \sum_{j=0}^{m-n-1} \alpha^{n+j}\, d(x_1, x_0) \\ &\le \frac{\alpha^n}{1-\alpha}\, d(x_1, x_0). \end{aligned} \]

Since 0 < α < 1, the sequence {α^n} converges to 0. Hence, the above inequalities prove that {xn} is Cauchy.


Now, assume that x is the limit of the sequence {xn}. Then, by the fact that f is continuous and by our definition of the sequence {xn},

\[ f(x) = f\big(\lim_{n\to\infty} x_n\big) = \lim_{n\to\infty} f(x_n) = \lim_{n\to\infty} x_{n+1} = x. \]

This shows that x is a fixed point for f and completes the proof.
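The iterative method of Theorem 8.36 translates directly into a short program. The sketch below is an illustration under stated assumptions rather than part of the text: the function name banach_iterate, its parameters, the tolerance, and the sample contraction f(x) = 0.5x + 1 on (R, de) are all hypothetical choices. The stopping rule uses the estimate from the proof: letting m tend to infinity in d(xm, xn) ≤ α^n d(x1, x0)/(1 − α) bounds the distance from xn to the fixed point by the same quantity.

```python
def banach_iterate(f, x0, dist, alpha, tol=1e-12, max_steps=10_000):
    """Iterate x_n = f(x_{n-1}) as in (8.17).

    Stops at the first n for which the a priori bound
    alpha**n / (1 - alpha) * d(x_1, x_0) is below tol (or at max_steps).
    Returns the pair (x_n, n).
    """
    x = f(x0)              # x_1
    d10 = dist(x, x0)      # d(x_1, x_0)
    n = 1
    while alpha**n / (1 - alpha) * d10 >= tol and n < max_steps:
        x = f(x)           # advance from x_n to x_{n+1}
        n += 1
    return x, n

# Hypothetical instance: f(x) = a*x + b with a = 0.5 and b = 1,
# a contraction with alpha = |a| by Example 8.26.
a, b = 0.5, 1.0
x_star, steps = banach_iterate(lambda x: a * x + b, x0=10.0,
                               dist=lambda u, v: abs(u - v), alpha=abs(a))
print(x_star, steps)       # x_star should be close to b / (1 - a) = 2
```

The bound is computed before the iteration converges, so the number of steps needed for a given accuracy is known in advance from α and d(x1, x0) alone.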



Example 8.37. Show by means of an example that the completeness assumption is essential in Banach’s fixed point theorem.

Solution. Consider X = (0, 1) as a subspace of the Euclidean space R. The function f defined by f(x) = x/2 is a contraction from X into itself. If f(x0) = x0 for some x0 ∈ X, then we would have x0 = 0, which is a contradiction because 0 ∉ X. Thus f has no fixed points in X. Note that the space X is not complete.

Example 8.38. By Example 8.26, the function f(x) = ax + b is a contraction from (R, de) into itself when |a| < 1. In this case, Theorem 8.36 says that f has a unique fixed point x0 in R. To find x0, we may solve the equation f(x0) = ax0 + b = x0 to obtain

\[ x_0 = \frac{b}{1-a}. \]

Another way for finding x0 is to use the iterative method of Theorem 8.36. Starting from an arbitrary real number β, we construct a sequence {yn} in R as follows:

\[ y_0 = \beta, \qquad y_n = f(y_{n-1}); \quad n \in \mathbb{N}. \]

Then, it can be easily seen that for every n ∈ N,

\[ y_n = a^n \beta + b \sum_{i=0}^{n-1} a^i = a^n \beta + b\,\frac{1-a^n}{1-a}. \tag{8.19} \]

As we mentioned above, the function f is a contraction when |a| < 1. Hence, Theorem 8.36 says in this case that the sequence {yn} converges to x0. This can be easily seen by letting n tend to infinity in (8.19).

Abstract Theories May Be Applied. Banach’s fixed point theorem is a good instance of an abstract theorem with numerous applications in mathematics and also in other sciences such as physics. As an illustration for this fact, we show the way the theorem can be applied in connection with the existence and uniqueness of solutions for certain integral equations.

Example 8.39. The (mathematical) aim of considering the integral equation

\[ g(x) = \int_0^{1/2} (x + y)\, g(y)\, dy + \cos x; \quad x \in [0, 1/2] \tag{8.20} \]

is to find a solution g = g(x) for it in the space C([0, 1/2]). Using Banach’s fixed point theorem, we can prove that this integral equation has a unique solution in C([0, 1/2]). To see this, let us define a function f by

\[ (f(h))(x) := \int_0^{1/2} (x + y)\, h(y)\, dy, \]

for every h ∈ C([0, 1/2]) and every x ∈ [0, 1/2]. We claim that f(h) is actually an element of C([0, 1/2]). This can be easily seen, because for all x, z ∈ [0, 1/2],

\[ \begin{aligned} |(f(h))(x) - (f(h))(z)| &\le \int_0^{1/2} |(x + y) - (z + y)|\, |h(y)|\, dy \\ &\le \big((1/2) \sup\{|h(w)| : w \in [0, 1/2]\}\big)\, |x - z|. \end{aligned} \]

This shows that the function J defined by

\[ (J(h))(x) := \int_0^{1/2} (x + y)\, h(y)\, dy + \cos x \]

for every h ∈ C([0, 1/2]) and every x ∈ [0, 1/2] is in fact a function from C([0, 1/2]) into itself. Considering this function, we can transform the problem of finding a solution for (8.20) to the problem of finding a fixed point for J. More precisely, a function g ∈ C([0, 1/2]) is a solution of (8.20) if and only if it satisfies

\[ J(g) = g. \tag{8.21} \]

To prove that such a function g exists and is unique using Banach’s fixed point theorem, it is enough to show that J is a contraction with respect to the metric du. To see this, note that for every x ∈ [0, 1/2] and all h, p ∈ C([0, 1/2]),

\[ \begin{aligned} |(J(h))(x) - (J(p))(x)| &\le \int_0^{1/2} |x + y|\, |h(y) - p(y)|\, dy \\ &\le (1/2)\, d_u(h, p), \end{aligned} \]

and that this shows du(J(h), J(p)) ≤ (1/2) du(h, p). The function J is therefore a contraction from C([0, 1/2]) into itself, and Banach’s fixed point theorem gives a unique element g of C([0, 1/2]) for which (8.21) is true. By our above argument, g is the unique solution of (8.20).

Exercise 8.40. Apply the iterative method described in Banach’s fixed point theorem with the initial guess g0(x) = x to find the first three terms g1, g2, and g3 of the sequence that converges to the function g of the above example.
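The fixed point iteration for J can also be carried out numerically. The sketch below is only an approximation under assumptions that are not in the text: functions are replaced by their values on a uniform grid of N = 200 subintervals of [0, 1/2], the integral is replaced by the trapezoid rule, the sup distance is taken over grid points only, and the iteration starts from the zero function (so it does not work out Exercise 8.40).

```python
import math

N = 200                                     # number of subintervals (assumption)
xs = [0.5 * i / N for i in range(N + 1)]    # grid points in [0, 1/2]
w = [0.5 / N] * (N + 1)                     # trapezoid weights
w[0] = w[-1] = 0.25 / N

def J(h):
    """Discretized J: h is the list of values of a function on the grid xs."""
    A = sum(wi * hi for wi, hi in zip(w, h))                  # approximates the integral of h
    B = sum(wi * yi * hi for wi, yi, hi in zip(w, xs, h))     # approximates the integral of y*h(y)
    # (x + y) h(y) integrates to x*A + B, because x does not depend on y.
    return [x * A + B + math.cos(x) for x in xs]

def d_u(h, p):
    """Sup distance over the grid, a stand-in for the metric d_u."""
    return max(abs(hi - pi) for hi, pi in zip(h, p))

g = [0.0] * (N + 1)          # starting function g_0 = 0 (assumption)
gaps = []                    # gaps[k] approximates d_u(g_{k+1}, g_k)
for _ in range(40):
    g_next = J(g)
    gaps.append(d_u(g_next, g))
    g = g_next

ratios = [gaps[k + 1] / gaps[k] for k in range(10)]
residual = d_u(J(g), g)      # how far g is from satisfying J(g) = g on the grid
print(max(ratios), residual)
```

The successive ratios should stay below the contraction constant 1/2 obtained in Example 8.39, and the residual measures how closely the final grid function satisfies (8.21).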

Notes on Essence and Generalizability In this chapter we generalized the concepts of limit and continuity of functions to the context of metric spaces. In particular, we generalized the intermediate value theorem and used this as an opportunity to define the concept of connectedness. The most general context in which the concepts of limit and continuity can be studied is the theory of topological spaces. It turns out that any metric space is a topological space, but not conversely. If you have the chance to study topological spaces within a course entitled “General Topology”, you will see that some of the results we proved in the metric space theory are not generalizable to the context of topological spaces. A good reference for general topology is [21].


Exercises

1. Prove Theorem 8.2.

2. Prove Proposition 8.11.

3. Prove Theorem 8.12.

4. Suppose X and Y are metric spaces and f is a function from X into Y. Prove that each of the following statements is equivalent to the continuity of f on X.
   (a) For every A ⊆ X, $f(\overline{A}) \subseteq \overline{f(A)}$.
   (b) For every B ⊆ Y, $\overline{f^{-1}(B)} \subseteq f^{-1}(\overline{B})$.

5. Suppose X and Y are metric spaces, f is a continuous function from X into Y and A is a dense subset of X. Show that f(A) is dense in f(X).

6. Let X be a compact metric space and consider a continuous function f from X into X. Prove that a nonempty set A ⊆ X exists such that f(A) = A.

7. Let (X, d) be a compact metric space. If f is a function from X into X such that for all x, y ∈ X,

\[ d(f(x), f(y)) = d(x, y), \tag{8.22} \]

   prove that f is onto. If we omit the compactness assumption, is it still possible to deduce from (8.22) that f is onto?

8. Let f be a continuous one-to-one function from a compact metric space X onto a metric space Y. Prove that the inverse function f−1 is a continuous function from Y onto X.

9. The aim of this exercise is to show that the compactness assumption in the above exercise is essential. Consider X = [0, 2π) and Y = {(x, y) ∈ R² : $d_e^2$((x, y), (0, 0)) = 1} as subspaces of (R, de) and (R², $d_e^2$), respectively. Recall that Y is, geometrically, the set of all points that lie on the circle with center at the origin and radius 1. Moreover, recall that we may represent the points of Y parametrically as those points (x, y) in R² such that x = cos t, y = sin t for some 0 ≤ t < 2π. Define a function f : X → Y by f(t) = (cos t, sin t). Prove that f is a continuous one-to-one function from X onto Y, and that nevertheless the inverse function f−1 is not continuous on Y. Note that the space X is not compact.

10. Consider A = {(x/2, cos x) : x ∈ [0, π]} as a set in the metric space (R², $d_e^2$). Is A a compact set? Is A connected? Why?

11. Suppose A and B are disjoint open subsets of a metric space. Prove that A and B are separated.

12. Prove that R² \ {(0, 0)} is a connected subset of the space (R², $d_e^2$). Compare this with Example 8.33.

13. In each case determine if the given set is connected as a set in the space (R², $d_e^2$).
   (a) {(x, y) : y > x²}.
   (b) {(x, y) : |x| + |y| = 1}.

14. Is the union of two connected sets always connected? What about the intersection?

15. Let X be a compact metric space, and let {Fn} be a sequence of nonempty closed subsets of X such that Fn+1 ⊆ Fn for every n. Prove that $\bigcap_{n=1}^{\infty} F_n$ is nonempty and connected.

16. Is Cantor’s set a connected subset of the Euclidean space R?

17. Prove that in any metric space, the closure of a connected set is connected.

18. Show by means of an example that the interior of a connected set may fail to be connected.

19. Prove that a connected metric space with at least two points is uncountable.

20. In each case determine if Banach’s fixed point theorem can be applied to show that the integral equation has a unique solution.
   (a) $g(x) = \int_0^{1/3} \sin(x + y)\, g(y)\, dy - x^5$; x ∈ [0, 1/3].
   (b) $g(x) = \int_0^{1/2} (x^2 + y^3)\, g(y)\, dy + \sin x$; x ∈ [0, 1/2].

Chapter 9

Sequences and Series of Functions

In the previous chapters we observed that the terms of a sequence may be functions. Our aim in this final chapter is to study such sequences and to define two important kinds of convergence for them. Since series appear naturally wherever sequences of real numbers exist, series of functions are also studied within this chapter. To pursue our studies in the widest possible framework, we begin with functions that map the elements of arbitrary sets to those of metric spaces. In our study of the series, however, we restrict our attention to those functions whose range is a subset of the Euclidean space R.

9.1. Sequences of Functions and Their Pointwise Convergence

We encountered sequences of functions in two places in the previous chapters. These are the theory of power series and the metric space (C([a, b]), du). In its broadest sense, a sequence of functions is nothing but a sequence {fn} whose terms are functions from a set X into some set Y. We said that this definition is stated in the broadest sense because the domain and range of the functions are assumed to be sets, and we did not make any assumptions about the existence of extra structures on these sets. As we will see shortly, to be able to talk about the convergence of sequences of functions, we need to equip the range space Y with a metric.

We briefly discussed the way sequences of functions appear in the theory of power series at the end of Chapter 2. In a more general context, for a power series $\sum_{n=0}^{\infty} a_n (x - x_0)^n$ centered at some x0 ∈ R, we may define a sequence {sn} of polynomials by

\[ s_n(x) = a_0 + a_1 (x - x_0) + \cdots + a_n (x - x_0)^n. \]

Now assume that the series converges for every x in a subset A of R, A being a bounded interval centered at x0 or A = R, and define a function f on A by

\[ f(x) = \sum_{n=0}^{\infty} a_n (x - x_0)^n. \tag{9.1} \]

Then (9.1) shows that for every x ∈ A,

\[ \lim_{n\to\infty} s_n(x) = f(x). \tag{9.2} \]

Since (9.2) says that the sequence {sn(x)} converges to f(x) for every point x of A, we may describe (9.2) by saying that {sn} converges to f pointwise on A.

The appearance of sequences of functions in (C([a, b]), du) is quite natural, because this is actually a space of functions. In Example 7.7 we found an interpretation of the convergence fn → f in the metric space (C([a, b]), du). More precisely, we observed that {fn} converges to f with respect to du if and only if the following statement (from Chapter 7) is true.

(UCO) For every ε > 0 there exists N ∈ N such that for every n ≥ N and every x ∈ [a, b], |fn(x) − f(x)| < ε.

It is clear that [a, b] can be replaced by an arbitrary set X in (UCO), and the assumption that f is continuous can be also removed. Moreover, we may replace (R, de), which is the range of the involved functions in (UCO), with a metric space (Y, d). These observations allow us to rewrite an appropriate version of (UCO) for sequences of functions from a set X into some metric space (Y, d). As we mentioned in the discussion just after Example 7.7, the resulting type of convergence for sequences of functions is called uniform convergence.

Our aim in this final chapter is to precisely define and study the notions of pointwise and uniform convergence discussed above. With this goal in mind, we begin with the formal definition of pointwise convergence and postpone the study of uniform convergence to the next section.

Definition 9.1. Let X be a nonempty set, and let Y be a metric space. Suppose also that {fn} is a sequence of functions from X into Y. We say that {fn} converges pointwise on X to a function f : X → Y if for every x ∈ X,

\[ \lim_{n\to\infty} f_n(x) = f(x). \tag{9.3} \]

In this situation, we also say that f is the pointwise limit of {fn} on X, and we write fn → f pointwise on X.

Although X can be any set in the above definition, the assumption that Y is a metric space is necessary for (9.3) to be meaningful.

Example 9.2. For each n ∈ N, let fn be a function from [0, 1] into R defined by fn(x) = x^n. Then, {fn(x)} converges to 0 if 0 ≤ x < 1 and {fn(1)} converges to 1. Thus, the pointwise limit of {fn} on [0, 1] is

\[ f(x) = \begin{cases} 0, & 0 \le x < 1, \\ 1, & x = 1. \end{cases} \]

Example 9.3. Suppose {fn} is a sequence of functions defined on the subspace [0, +∞) of the Euclidean space R by

\[ f_n(x) = \begin{cases} nx, & 0 \le x \le \frac{1}{n}, \\ \frac{1}{nx}, & x > \frac{1}{n}. \end{cases} \]

Prove that {fn} converges pointwise to a function f on [0, +∞).

Solution. Since fn(0) = 0 for every n, the sequence {fn(0)} converges to 0. If x > 0, then N ∈ N can be found such that 1/N < x. Hence, x > 1/n and therefore fn(x) = 1/(nx) for every n ≥ N. This shows that

\[ \lim_{n\to\infty} f_n(x) = \lim_{n\to\infty} \frac{1}{nx} = 0 \]

when x > 0. The pointwise limit of the sequence {fn} on [0, +∞) is therefore the constant function f ≡ 0.

Example 9.4. Suppose {fn} is the sequence of functions from R into the discrete space (R, ρ) defined by fn(x) = x/n. If x ∈ R is nonzero, then {fn(x)} is a sequence with pairwise distinct terms, and it is therefore divergent in (R, ρ) by Example 7.3. Of course, the sequence {fn(0)} converges to 0. The conclusion is that {fn} does not converge pointwise on R. It is easy to see that {fn} converges pointwise to the constant function f ≡ 0 on R if we consider the fn’s as functions from R into (R, de).

Example 9.5. Define a sequence {φn} of functions from N into (C([0, 1]), du) by φn(m) = gm,n, where

\[ g_{m,n}(x) = \frac{(n+1)x}{mn} \]

for every x ∈ [0, 1]. Since for every m ∈ N and every x ∈ [0, 1],

\[ \lim_{n\to\infty} \frac{(n+1)x}{mn} = \frac{x}{m}, \]

it seems that {φn} converges pointwise on N to a function φ which is defined as follows. For every m ∈ N, φ(m) = gm is a function on [0, 1] such that for every x,

\[ g_m(x) = \frac{x}{m}. \]

To prove this, we should use the distance function du. In fact, given n, m ∈ N,

\[ \begin{aligned} d_u(\varphi_n(m), \varphi(m)) &= \sup\{|g_{m,n}(x) - g_m(x)| : x \in [0, 1]\} \\ &= \sup\Big\{\Big|\frac{(n+1)x}{mn} - \frac{x}{m}\Big| : x \in [0, 1]\Big\} \\ &= \frac{1}{mn}. \end{aligned} \]

This shows that for every m ∈ N,

\[ \lim_{n\to\infty} d_u(\varphi_n(m), \varphi(m)) = 0. \]

That {φn} converges pointwise to φ on N now follows from Exercise 7.2.


When we work with a sequence of functions in the general sense, i.e., a sequence of functions from a set X into a metric space (Y, d), it is not meaningful to talk about the continuity and Riemann integrability of the involved functions. For this reason, we may restrict our attention to the following cases.

(1) The set X is also equipped with a metric. In this case, for a sequence {fn} of functions from X into Y, we may think of the continuity of the fn’s. We are therefore faced with a natural question in this case.

(PCC) If each fn is continuous on X and fn → f pointwise on X, does it follow that f is also continuous on X?

(Here, PCC is the abbreviation of pointwise convergence and continuity.)

(2) The sets X and Y are both subsets of R. In this case, given any sequence {fn} of functions from X into Y, we may think of the Riemann integrability of the fn’s. This observation leads us to the following question.

(PCI) If [a, b] is an interval contained in X, each fn is Riemann integrable on [a, b] and fn → f pointwise on [a, b], is f necessarily Riemann integrable on [a, b]? If yes, is the following equality also true?

\[ \lim_{n\to\infty} \int_a^b f_n(x)\, dx = \int_a^b f(x)\, dx. \tag{9.4} \]

(Here, PCI is the abbreviation of pointwise convergence and integrability.)

What does (9.4) say? Note that (9.4) can be rewritten as

\[ \lim_{n\to\infty} \int_a^b f_n(x)\, dx = \int_a^b \lim_{n\to\infty} f_n(x)\, dx. \]

So when (9.4) is assumed (or proved) to be true for a sequence {fn} of integrable functions, it shows that the limit of that sequence can be interchanged with the Riemann integral. The relation of the convergence of sequences and differentiability is studied in Exercises 15–17 at the end of this chapter.

A First Examination of (PCC) and (PCI). We have already seen in Example 9.2 that the answer of (PCC) is negative: although each function fn(x) = x^n is continuous on [0, 1] and {fn} converges pointwise to the function

\[ f(x) = \begin{cases} 0, & 0 \le x < 1, \\ 1, & x = 1, \end{cases} \]

on [0, 1], the function f is not continuous on [0, 1].

Example 9.6. Let {rn} be an enumeration of the rational numbers of the interval [0, 1]. For each n define a function gn on [0, 1] by

\[ g_n(x) = \begin{cases} 1, & x \in \{r_1, \ldots, r_n\}, \\ 0, & \text{otherwise}. \end{cases} \]

Then, each gn is discontinuous at only a finite number of points of [0, 1], and in view of Theorem 5.34 this shows that gn ∈ R([0, 1]) for every n ∈ N. Nevertheless, the sequence {gn} converges pointwise to

\[ g(x) = \begin{cases} 1, & x \in \mathbb{Q} \cap [0, 1], \\ 0, & x \in \mathbb{Q}^c \cap [0, 1], \end{cases} \]

on [0, 1]. The function g is the restriction of Dirichlet’s function to [0, 1], which is not integrable on this interval by Example 5.3. This shows that (PCI) has a negative answer: The pointwise limit of a sequence of Riemann integrable functions is not necessarily Riemann integrable.

To prove the pointwise convergence of {gn} to g on [0, 1], first note that when x ∈ Q^c ∩ [0, 1], gn(x) = 0 for every n ∈ N, and hence {gn(x)} converges to 0 in this case. On the other hand, if x ∈ Q ∩ [0, 1], then x = rj for a unique j ∈ N. Thus gn(x) = 1 for every n ≥ j and this shows that lim_{n→∞} gn(x) = 1.

A note on pointwise convergence. The above examples show that pointwise convergence is not as nice as we wish, because it fails to transfer the important properties of continuity and Riemann integrability to the limit function.
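In Definition 9.1 the index beyond which fn(x) is within ε of f(x) is allowed to depend on the point x. For the sequence fn(x) = x^n of Example 9.2 this dependence can be made visible with a few lines of code. The sketch below is an illustration only; the function name first_index_below, the tolerance 10^-3, and the sample points are assumptions, not taken from the text.

```python
def first_index_below(x, eps):
    """Smallest n in N with x**n < eps, assuming 0 < x < 1 and eps > 0."""
    n = 1
    while x**n >= eps:
        n += 1
    return n

eps = 1e-3
points = (0.5, 0.9, 0.99, 0.999)     # sample points approaching 1 (assumption)
indices = [first_index_below(x, eps) for x in points]
for x, n in zip(points, indices):
    print(x, n)
```

The printed indices grow as x approaches 1, which suggests that no single index serves all points of [0, 1) at once for this sequence.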

9.2. Uniform Convergence

Motivated by the above-mentioned deficiencies of the pointwise convergence, we turn to the following stronger notion of convergence.

Definition 9.7. Let X be a nonempty set, and let (Y, d) be a metric space. Suppose also that {fn} is a sequence of functions from X into Y. We say that {fn} converges uniformly on X to a function f : X → Y if the following statement is true.

(MUCO) For every ε > 0 there exists a natural number N such that the inequality d(fn(x), f(x)) < ε holds for every n ≥ N and every x ∈ X.

(Here, we used MUCO as an abbreviation for metric space uniform convergence.) In this situation, we also say that f is the uniform limit of {fn} on X, and we write fn → f uniformly on X.

Example 9.8. Let {φn} be the sequence of functions introduced in Example 9.5. We observed in that example that {φn} converges pointwise to a function φ on N. Is the convergence uniform?

Solution. We observed in Example 9.5 that for arbitrary natural numbers m and n,

\[ d_u(\varphi_n(m), \varphi(m)) = \frac{1}{mn}. \]

It now follows from this that for every m ∈ N,

\[ d_u(\varphi_n(m), \varphi(m)) \le \frac{1}{n}. \tag{9.5} \]

Let ε > 0 be given and find N ∈ N such that 1/N < ε. Then (9.5) shows that for every n ≥ N and every m ∈ N, du(φn(m), φ(m)) < ε. This proves that {φn} converges uniformly to φ on N.

Some Useful Characterizations of Uniform Convergence. We begin with a useful criterion for uniform convergence whose straightforward proof is left as an exercise.

Theorem 9.9. Let {fn} be a sequence of functions from a set X into a metric space (Y, d), and let f : X → Y be a function. For each n ∈ N, let Mn = sup{d(fn(x), f(x)) : x ∈ X}. Then, fn → f uniformly on X if and only if the sequence {Mn} converges to 0.

Example 9.10. We observed in Example 9.3 that the sequence {fn} defined by

\[ f_n(x) = \begin{cases} nx, & 0 \le x \le \frac{1}{n}, \\ \frac{1}{nx}, & x > \frac{1}{n}, \end{cases} \]

converges pointwise to the constant function f ≡ 0 on [0, +∞). Is the convergence fn → f uniform on [0, +∞)?

Solution. Given n ∈ N, we note that for every 0 ≤ x ≤ 1/n, 0 ≤ |fn(x)| = nx ≤ 1, and for every x > 1/n,

\[ 0 < |f_n(x)| = \frac{1}{nx} < 1. \]

Hence, Mn = sup{|fn(x)| : x ∈ [0, +∞)} = 1. The sequence {Mn} therefore converges to 1, and Theorem 9.9 shows that the convergence of {fn} to f is not uniform.

Our next result is Cauchy’s criterion for uniform convergence. This will be used in the proof of Theorem 9.21.

Theorem 9.11. Let {fn} be a sequence of functions from a set X into a complete metric space (Y, d). Then, {fn} converges uniformly on X if and only if the following statement is true.

For every ε > 0, we can find N ∈ N such that for all m, n ≥ N and every x ∈ X, d(fn(x), fm(x)) < ε.

Proof. If we assume that {fn} converges uniformly to some function f on X, then for the given ε > 0 we can find N ∈ N such that

\[ d(f_n(x), f(x)) < \frac{\varepsilon}{2} \]

holds for all n ≥ N and every x ∈ X. So, for all m, n ≥ N and every x ∈ X,

\[ d(f_n(x), f_m(x)) \le d(f_n(x), f(x)) + d(f(x), f_m(x)) < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon. \]

As for the converse, note that when Cauchy’s condition is satisfied, the sequence {fn(x)} is Cauchy in Y for every x ∈ X. Since (Y, d) is complete, this gives us a function f : X → Y as the pointwise limit of {fn} on X. We leave it to the reader to verify that f is actually the uniform limit of {fn} on X.

Note that the assumption that the range space is complete is only used in the second part of the proof.

What does Theorem 9.11 say? Cauchy’s criterion presents a limit-free description of uniform convergence. For this reason, when a sequence {fn} satisfies Cauchy’s condition, we say that it is uniformly Cauchy. Theorem 9.11 therefore says that a sequence {fn} of functions from a set X into a complete metric space Y is uniformly convergent if and only if it is uniformly Cauchy.
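The numbers Mn of Theorem 9.9 can be probed on finite samples. The sketch below is an illustration under assumptions: the helper names f_n and sup_estimates, the sample grids, and the contrasting sequence x/n restricted to [0, 1] are not taken from the text. A maximum over finitely many points is only a lower bound for a supremum, so the output supports, but does not prove, the values of Mn. Exact rationals are used so that fn(1/n) evaluates to exactly 1.

```python
from fractions import Fraction

def f_n(n, x):
    """The function f_n of Examples 9.3 and 9.10, for x >= 0."""
    return n * x if x <= Fraction(1, n) else 1 / (n * x)

def sup_estimates(n):
    """Sampled stand-ins for M_n = sup |f_n - 0| from Theorem 9.9.

    Returns a pair: the estimate for f_n on a sample of [0, 10], and the
    estimate for the contrasting sequence x/n on a sample of [0, 1].
    """
    sample = [Fraction(k, 4 * n) for k in range(40 * n + 1)]   # contains x = 1/n
    m_f = max(abs(f_n(n, x)) for x in sample)
    unit = [Fraction(k, 100) for k in range(101)]              # contains x = 1
    m_h = max(abs(x / n) for x in unit)
    return m_f, m_h

for n in (1, 10, 100):
    print(n, *sup_estimates(n))
```

For the sequence of Example 9.10 the sampled value stays at 1, in line with Mn = 1, while for x/n on [0, 1] it equals 1/n and tends to 0, so that sequence converges uniformly to 0 on [0, 1] by Theorem 9.9.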

Uniform Convergence and Continuity. It is now time to show that the uniform convergence is better than pointwise convergence in connection with continuity.

Theorem 9.12. Let (X, dX) and (Y, dY) be metric spaces, and let {fn} be a sequence of continuous functions from X into Y. If {fn} converges uniformly to a function f on X, then f is also continuous on X.

Proof. In view of Theorem 8.17, it is enough to show that f−1(V) is an open subset of X for every open set V ⊆ Y. Let V be an open subset of Y, and let x be an arbitrary point in f−1(V). We show that x is an interior point of f−1(V). Since x ∈ f−1(V), f(x) ∈ V and by the assumption that V is open, we find r > 0 such that y ∈ Y and dY(y, f(x)) < r imply y ∈ V. On the other hand, the assumption that f is the uniform limit of {fn} on X gives us some N ∈ N such that for every z ∈ X,

\[ d_Y(f_N(z), f(z)) < \frac{r}{3}. \]

Since fN is continuous on X, it is continuous at x and we find, accordingly, some δ > 0 such that w ∈ X and dX(w, x) < δ imply

\[ d_Y(f_N(w), f_N(x)) < \frac{r}{3}. \]

Now, we claim that the δ-neighborhood of x in X is a subset of f−1(V). As we discussed above, the truth of this claim establishes our desired result. So, assume that w ∈ X is such that dX(w, x) < δ. Then, by our choice of N and δ,

\[ \begin{aligned} d_Y(f(w), f(x)) &\le d_Y(f(w), f_N(w)) + d_Y(f_N(w), f_N(x)) + d_Y(f_N(x), f(x)) \\ &< \frac{r}{3} + \frac{r}{3} + \frac{r}{3} = r. \end{aligned} \]

Since the r-neighborhood of f(x) in Y is contained in V, we find that f(w) ∈ V, or equivalently w ∈ f−1(V). This completes the proof.
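Read in contrapositive form, Theorem 9.12 says that a sequence of continuous functions with a discontinuous pointwise limit cannot converge uniformly. For fn(x) = x^n on [0, 1] (Example 9.2) this can be seen concretely: at the point xn = (1/2)^(1/n), a choice made here for illustration and not taken from the text, fn takes the value 1/2 while the limit function vanishes, so Mn ≥ 1/2 for every n in the notation of Theorem 9.9. The following sketch, with the hypothetical helper witness, checks this numerically.

```python
def witness(n):
    """Return a point x in [0, 1) with x**n equal to 1/2 up to rounding."""
    return 0.5 ** (1.0 / n)

for n in (1, 10, 100, 1000):
    x = witness(n)
    # The pointwise limit f of Example 9.2 is 0 on [0, 1), so
    # |f_n(x) - f(x)| = x**n at this point.
    print(n, x, x**n)
```

The witness points move toward 1 as n grows, which is exactly where the limit function has its jump.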


What does Theorem 9.12 say? Theorem 9.12 says that the uniform limit of a sequence of continuous functions is continuous.

Example 9.13. Show by means of an example that when a sequence $\{f_n\}$ converges pointwise to $f$ and the $f_n$'s and $f$ are all continuous, we cannot deduce that the convergence $f_n \to f$ is uniform.

Solution. The sequence $\{f_n\}$ defined by $f_n(x) = x/n$ converges to the constant function $f \equiv 0$ on $\mathbb{R}$. It is also clear that the $f_n$'s and $f$ are all continuous on $\mathbb{R}$. Nevertheless, the convergence $f_n \to f$ is not uniform. This is because for every $n$,
\[ M_n = \sup\left\{ \left| \frac{x}{n} - 0 \right| : x \in \mathbb{R} \right\} \ge \left| \frac{n^2}{n} \right| = n, \]
and hence the sequence $\{M_n\}$ cannot tend to 0.

Uniform Convergence and Riemann Integrability. The following theorem shows that uniform convergence treats the Riemann integrability in the way we expect.

Theorem 9.14. Suppose $\{f_n\}$ is a sequence of real-valued functions which are Riemann integrable on $[a, b]$. If $f_n \to f$ uniformly on $[a, b]$, then $f$ is also Riemann integrable on $[a, b]$ and
\[ \int_a^b f(x)\,dx = \lim_{n\to\infty} \int_a^b f_n(x)\,dx. \tag{9.6} \]

Proof. For each $n \in \mathbb{N}$, let $M_n = \sup\{|f_n(x) - f(x)| : x \in [a, b]\}$. It follows that for every $x \in [a, b]$, $f_n(x) - M_n \le f(x) \le f_n(x) + M_n$. By Exercise 5.16 we then obtain
\[ \int_a^b (f_n(x) - M_n)\,dx \le \underline{\int_a^b} f(x)\,dx \le \overline{\int_a^b} f(x)\,dx \le \int_a^b (f_n(x) + M_n)\,dx \tag{9.7} \]
and consequently that
\[ \overline{\int_a^b} f(x)\,dx - \underline{\int_a^b} f(x)\,dx \le \int_a^b (f_n(x) + M_n)\,dx - \int_a^b (f_n(x) - M_n)\,dx = 2M_n \int_a^b dx = 2M_n(b - a). \]
Since $\{M_n\}$ converges to 0 by the assumption that $f_n \to f$ uniformly on $[a, b]$, it follows from the above inequality that
\[ \underline{\int_a^b} f(x)\,dx = \overline{\int_a^b} f(x)\,dx, \]
that is, $f$ is integrable on $[a, b]$. We now deduce from (9.7) that
\[ \left| \int_a^b f(x)\,dx - \int_a^b f_n(x)\,dx \right| \le M_n(b - a). \]
Finally, (9.6) follows from this inequality and the fact that $\{M_n\}$ converges to 0.

What does Theorem 9.14 say? Theorem 9.14 says that the uniform limit of a sequence of Riemann integrable functions is Riemann integrable. Moreover, as we mentioned above, equation (9.6) shows that in this case, the limit can be interchanged with the integral.

Example 9.15. Suppose $\{f_n\}$ is a sequence of functions from $[0, +\infty)$ into $\mathbb{R}$ such that the $f_n$'s are all integrable on each interval of the form $[0, x]$ with $x > 0$. Assume also that $f_n \to f$ uniformly on $[0, +\infty)$. If we let
\[ g_n(x) = \frac{1}{x} \int_0^x f_n(t)\,dt, \qquad g(x) = \frac{1}{x} \int_0^x f(t)\,dt, \]
for $x > 0$, prove that $g_n \to g$ uniformly on $(0, +\infty)$.

Solution. Since $f_n \to f$ uniformly on $[0, +\infty)$, Theorem 9.14 shows that for every $x \in (0, \infty)$, $f$ is integrable on $[0, x]$ and
\[ \lim_{n\to\infty} \int_0^x f_n(t)\,dt = \int_0^x f(t)\,dt. \]
This implies that $g_n \to g$ pointwise on $(0, +\infty)$. To prove that the convergence is indeed uniform, let $M_n = \sup\{|f_n(t) - f(t)| : t \in [0, +\infty)\}$ and $T_n = \sup\{|g_n(x) - g(x)| : x \in (0, +\infty)\}$. Then,
\[ T_n = \sup\left\{ \frac{1}{x} \left| \int_0^x (f_n(t) - f(t))\,dt \right| : x \in (0, +\infty) \right\} \le \sup\left\{ \frac{1}{x} \int_0^x |f_n(t) - f(t)|\,dt : x \in (0, +\infty) \right\} \le M_n. \]
Since each $T_n$ is nonnegative and $\{M_n\}$ converges to 0 by the assumption that $f_n \to f$ uniformly on $[0, +\infty)$, the above inequalities show that $\{T_n\}$ also converges to 0. By Theorem 9.9 this means that $g_n \to g$ uniformly on $(0, +\infty)$.

Example 9.16. Define a sequence of real-valued functions on $[0, 1]$ by
\[ f_n(x) = \begin{cases} n - n^2 x & 0 < x < 1/n, \\ 0 & \text{otherwise.} \end{cases} \]


Show that $\{f_n\}$ converges pointwise to the constant function $f \equiv 0$ on $[0, 1]$. Is the convergence uniform?

Solution. Since $f_n(0) = 0$ for every $n$, the sequence $\{f_n(0)\}$ converges to 0. If $0 < x \le 1$, then we find $N \in \mathbb{N}$ such that $1/N < x$, and this shows that $f_n(x) = 0$ for every $n \ge N$. Thus $\{f_n(x)\}$ converges to 0 whenever $x > 0$. The conclusion is that $\{f_n\}$ converges pointwise to the constant function $f \equiv 0$ on $[0, 1]$. Since each $f_n$ is continuous on $(0, 1]$, Theorem 5.34 shows that $f_n \in \mathcal{R}([0, 1])$ for every $n$. Moreover, it can be easily seen that
\[ \int_0^1 f_n(x)\,dx = \int_0^{1/n} (n - n^2 x)\,dx = \frac{1}{2}. \]
Thus,
\[ \lim_{n\to\infty} \int_0^1 f_n(x)\,dx = \frac{1}{2}, \]
which is not equal to
\[ \int_0^1 f(x)\,dx = 0. \]

This tells us, in view of Theorem 9.14, that the convergence fn → f is not uniform.
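The computation in Example 9.16 can be checked numerically. The following sketch uses a plain midpoint rule; the helper name `midpoint_integral`, the step count, and the sample point 0.3 are choices made here for illustration and are not part of the text.

```python
def f_n(n, x):
    """The function of Example 9.16: n - n^2 x on (0, 1/n), and 0 elsewhere."""
    return n - n * n * x if 0.0 < x < 1.0 / n else 0.0

def midpoint_integral(func, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))

# Each integral stays at 1/2 although the pointwise limit is the zero function.
integrals = {n: midpoint_integral(lambda x, n=n: f_n(n, x), 0.0, 1.0)
             for n in (1, 5, 20)}

# At a fixed point such as x = 0.3, the values are eventually 0.
values_at_fixed_point = [f_n(n, 0.3) for n in (1, 5, 20)]

print(integrals)
print(values_at_fixed_point)
```

The integrals do not approach the integral of the limit function, so by Theorem 9.14 the convergence cannot be uniform.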

9.3. Weierstrass’s Approximation Theorem

As we mentioned in the beginning of this chapter, when a function $f$ is expanded as a power series on some interval, the function $f$ can be considered as the pointwise limit of a sequence of polynomials on that interval. The following remarkable result, which is due to Weierstrass, shows that much more can be said in this direction.

Theorem 9.17. If $f$ is a real-valued function which is continuous on $[a, b]$, then $f$ is the uniform limit of a sequence of polynomials on $[a, b]$.

To simplify the proof of the above theorem, we first prove a useful lemma.

Lemma 9.18. The following statements are equivalent.
(1) Every continuous function on $[0, 1]$ is the uniform limit of a sequence of polynomials on this interval.
(2) Every continuous function on an interval of the form $[a, b]$ is the uniform limit of a sequence of polynomials on this interval.

Proof. It is clear that (2) implies (1). So we only need to prove that (1) also implies (2). To see this, assume that (1) is true and let $f$ be continuous on an interval of the form $[a, b]$. Since we assumed (1), it is appropriate to use $f$ to define a function on $[0, 1]$. This is $g(x) = f(a + (b - a)x)$. The function $h$ defined by
\[ h(x) = a + (b - a)x \tag{9.8} \]


is a continuous (and one-to-one) function from $[0, 1]$ onto $[a, b]$. Thus, $g$ is continuous on $[0, 1]$. By (1) we then find a sequence $\{P_n\}$ of polynomials that converges uniformly to $g$ on $[0, 1]$. For each $n \in \mathbb{N}$ and every $x$, define
\[ Q_n(x) = P_n\left( \frac{x - a}{b - a} \right). \]
Here we used the inverse of the function $h$ introduced in (9.8), namely
\[ h^{-1}(x) = \frac{x - a}{b - a}, \]
as a continuous function from $[a, b]$ onto $[0, 1]$. Now, each $Q_n$ is a polynomial and we claim that $Q_n \to f$ uniformly on $[a, b]$. To see this, note that for every $n$,
\[ T_n := \sup\{|Q_n(x) - f(x)| : x \in [a, b]\} = \sup\left\{ \left| P_n\left( \frac{x - a}{b - a} \right) - g\left( \frac{x - a}{b - a} \right) \right| : x \in [a, b] \right\} \le M_n, \]
where $M_n := \sup\{|P_n(y) - g(y)| : y \in [0, 1]\}$. Since $P_n \to g$ uniformly on $[0, 1]$, Theorem 9.9 shows that $\{M_n\}$ tends to 0. Thus, the above inequality implies that $\{T_n\}$ also converges to 0. Now, another application of Theorem 9.9 proves that $Q_n \to f$ uniformly on $[a, b]$. This completes the proof.

Proof of Theorem 9.17. In view of the above lemma, it is enough to restrict our attention to the case $[a, b] = [0, 1]$. We further assume that $f$ is such that $f(0) = f(1) = 0$. In fact, once the theorem is proved for functions with this property, the general case also follows by the simple reasoning we now discuss. Given a continuous function $f$ on $[0, 1]$, define a function $g$ on this interval by
\[ g(x) = f(x) - f(0) - x(f(1) - f(0)). \tag{9.9} \]

Then, it is clear that $g$ is continuous on $[0, 1]$ and that $g(0) = g(1) = 0$. So by our assumption we find a sequence $\{P_n\}$ of polynomials which converges uniformly to $g$ on $[0, 1]$. Since $f - g$ is a polynomial by (9.9) and $f = (f - g) + g$, it then follows that the sequence $\{(f - g) + P_n\}$ of polynomials converges uniformly to $f$ on $[0, 1]$.

To complete the proof, we assume that $f$ is continuous on $[0, 1]$ with $f(0) = f(1) = 0$, and we try to find a sequence $\{P_n\}$ of polynomials which converges uniformly to $f$ on $[0, 1]$. The assumption $f(0) = f(1) = 0$ allows us to extend $f$ to a function which is uniformly continuous on all of $\mathbb{R}$. This can be done by letting $f(x) = 0$ whenever $x \notin [0, 1]$. For each $n \in \mathbb{N}$, let $Q_n$ be a polynomial defined by $Q_n(x) = c_n(1 - x^2)^n$, where $c_n$ is chosen in such a way that
\[ \int_{-1}^{1} Q_n(x)\,dx = 1. \]


For example, since
\[ \int_{-1}^{1} (1 - x^2)\,dx = \frac{4}{3}, \]
we let $c_1 = 3/4$. In what follows, we will need to know how large the $c_n$'s are. To know this, note that
\[ \int_{-1}^{1} (1 - x^2)^n\,dx = 2\int_0^1 (1 - x^2)^n\,dx \ge 2\int_0^{1/\sqrt{n}} (1 - x^2)^n\,dx \ge 2\int_0^{1/\sqrt{n}} (1 - nx^2)\,dx, \]
where the last inequality follows from Example 4.41 (and also from Bernoulli’s inequality). Thus
\[ \int_{-1}^{1} (1 - x^2)^n\,dx \ge \frac{4}{3\sqrt{n}} > \frac{1}{\sqrt{n}}, \]
and this shows that $c_n < \sqrt{n}$ for every $n$. Now for each $n$ define a function $P_n$ by
\[ P_n(x) = \int_{-1}^{1} f(x + t)Q_n(t)\,dt. \]
Changing the variable to $u = x + t$ shows that
\[ P_n(x) = \int_{-1+x}^{1+x} f(u)Q_n(u - x)\,du. \]
Since $f \equiv 0$ outside $[0, 1]$, this implies that
\[ P_n(x) = \int_0^1 f(u)Q_n(u - x)\,du. \]

But the latter integral is a polynomial by our choice of the $Q_n$'s. We claim that $\{P_n\}$ converges uniformly to $f$ on $[0, 1]$. To see this, note that for every $n \in \mathbb{N}$ and every $x \in [0, 1]$,
\[ |P_n(x) - f(x)| = \left| \int_{-1}^{1} f(x + t)Q_n(t)\,dt - \int_{-1}^{1} f(x)Q_n(t)\,dt \right| \le \int_{-1}^{1} |f(x + t) - f(x)|\,Q_n(t)\,dt. \]
Therefore, to prove that $P_n \to f$ uniformly on $[0, 1]$, it is enough to show that for every $\varepsilon > 0$, $N \in \mathbb{N}$ can be found such that $n \ge N$ and $x \in [0, 1]$ imply
\[ \int_{-1}^{1} |f(x + t) - f(x)|\,Q_n(t)\,dt < \varepsilon. \tag{9.10} \]
With this in mind, we first note that by the uniform continuity of $f$ on $\mathbb{R}$, $0 < \delta < 1$ exists such that $x, y \in \mathbb{R}$ and $|x - y| < \delta$ imply
\[ |f(x) - f(y)| < \frac{\varepsilon}{2}. \]


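If $M$ is a bound for the values of $f$ on $\mathbb{R}$, our choice of $\delta$ shows that
\[ \begin{aligned} \int_{-1}^{1} |f(x + t) - f(x)|\,Q_n(t)\,dt &\le \int_{-1}^{-\delta} 2M Q_n(t)\,dt + \int_{-\delta}^{\delta} \frac{\varepsilon}{2} Q_n(t)\,dt + \int_{\delta}^{1} 2M Q_n(t)\,dt \\ &= 4M \int_{\delta}^{1} Q_n(t)\,dt + \frac{\varepsilon}{2} \int_{-\delta}^{\delta} Q_n(t)\,dt \\ &\le 4M \int_{\delta}^{1} Q_n(t)\,dt + \frac{\varepsilon}{2}. \end{aligned} \]
Thus, to prove (9.10) and, hence, to complete the proof, it is enough to show that for all sufficiently large $n$,
\[ \int_{\delta}^{1} Q_n(t)\,dt < \frac{\varepsilon}{8M}. \tag{9.11} \]
To do so, we note that for every $n \in \mathbb{N}$ and every $x \in [\delta, 1]$,
\[ |Q_n(x) - 0| = c_n(1 - x^2)^n < \sqrt{n}(1 - \delta^2)^n. \]
Now, that the sequence $\{\sqrt{n}(1 - \delta^2)^n\}$ converges to 0 (Exercise 2.17) gives us some $N \in \mathbb{N}$ such that for every $n \ge N$,
\[ \sqrt{n}(1 - \delta^2)^n < \frac{\varepsilon}{2}. \]
It therefore follows that for all $n \ge N$ and every $x \in [\delta, 1]$, $|Q_n(x) - 0| < \varepsilon/2$. This means that $\{Q_n\}$ converges uniformly to the constant function $h \equiv 0$ on $[\delta, 1]$. Finally, we may use this fact together with Theorem 9.14 to deduce that
\[ \lim_{n\to\infty} \int_{\delta}^{1} Q_n(t)\,dt = 0. \tag{9.12} \]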

The proof is now complete by noticing that by (9.12), the inequality (9.11) holds for all sufficiently large $n$.

Example 9.19. Suppose $f : [a, b] \to \mathbb{R}$ is continuous and that for every $n \in \mathbb{N}$,
\[ \int_a^b x^n f(x)\,dx = 0. \]
Prove that $f(x) = 0$ for every $x \in [a, b]$.

Solution. By our assumption and the linearity of the integral,
\[ \int_a^b P(x)f(x)\,dx = 0 \]
for every polynomial $P$. Since $f$ is continuous on $[a, b]$, it is the uniform limit of a sequence $\{P_n\}$ of polynomials on $[a, b]$ by Weierstrass’s theorem. It is then easy to see that $P_n f \to f^2$ uniformly on $[a, b]$. Then, by Theorem 9.14,
\[ \int_a^b f^2(x)\,dx = \lim_{n\to\infty} \int_a^b P_n(x)f(x)\,dx = 0. \]
The desired result now follows from Example 5.28.
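The construction in the proof of Theorem 9.17 can also be explored numerically. The sketch below is an illustration under stated assumptions: the sample function f(x) = sin(πx) on [0, 1] (extended by 0), the midpoint rule, the grid of 21 points, and the names `midpoint`, `make_P`, and `sup_error` are all choices made here. The code evaluates P_n by numerical integration rather than expanding it as a polynomial, and the grid maximum only approximates the supremum.

```python
import math

def f(x):
    """Assumed sample function with f(0) = f(1) = 0, extended by 0 outside [0, 1]."""
    return math.sin(math.pi * x) if 0.0 <= x <= 1.0 else 0.0

def midpoint(func, a, b, steps=2000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))

def make_P(n):
    """Return x -> P_n(x), the integral over [-1, 1] of f(x + t) Q_n(t) dt."""
    # c_n normalizes Q_n(t) = c_n (1 - t^2)^n to have integral 1 over [-1, 1].
    c_n = 1.0 / midpoint(lambda t: (1.0 - t * t) ** n, -1.0, 1.0)

    def Q(t):
        return c_n * (1.0 - t * t) ** n

    return lambda x: midpoint(lambda t: f(x + t) * Q(t), -1.0, 1.0)

def sup_error(n, grid_size=21):
    """Largest |P_n(x) - f(x)| over an equally spaced grid in [0, 1]."""
    P = make_P(n)
    xs = [i / (grid_size - 1) for i in range(grid_size)]
    return max(abs(P(x) - f(x)) for x in xs)

errors = {n: sup_error(n) for n in (5, 20, 80)}
for n, err in errors.items():
    print(f"n={n}: approximate sup error {err:.4f}")
```

The errors decrease slowly as n grows, which is consistent with the proof: it guarantees uniform convergence but gives no fast rate.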


9.4. Series of Functions and Their Convergence

Finally, we turn to the series of functions and their convergence. As we mentioned earlier in this chapter, to be able to think of series of functions, we need to assume that our functions are real valued. The set $\mathbb{R}$ is equipped with the Euclidean metric $d_e$ in what follows.

Definition 9.20. Let $\{f_n\}$ be a sequence of functions from a set $X$ into $\mathbb{R}$ and $f : X \to \mathbb{R}$ be a function. For every $n \in \mathbb{N}$, let $s_n$ be the function which is defined on $X$ by $s_n(x) = f_1(x) + \cdots + f_n(x)$. We say that the series $\sum_{n=1}^{\infty} f_n$ converges to $f$ pointwise (resp., uniformly) on $X$ if the sequence $\{s_n\}$ converges to $f$ pointwise (resp., uniformly) on $X$.

As our main result in this section, we prove a useful criterion for the uniform convergence of series which is usually known as Weierstrass’s test.

Theorem 9.21. Let $\{f_n\}$ be a sequence of functions from a set $X$ into $\mathbb{R}$ such that for every $n \in \mathbb{N}$, $|f_n(x)| \le b_n$ holds for all $x \in X$. If $\sum_{n=1}^{\infty} b_n$ converges, then the series $\sum_{n=1}^{\infty} f_n$ converges uniformly on $X$.

Proof. Let $\{s_n\}$ be the sequence of partial sums of the series $\sum_{n=1}^{\infty} f_n$. For all $m, n \in \mathbb{N}$ such that $m > n$ and for every $x \in X$,
\[ |s_m(x) - s_n(x)| = \left| \sum_{k=n+1}^{m} f_k(x) \right| \le \sum_{k=n+1}^{m} |f_k(x)| \le \sum_{k=n+1}^{m} b_k. \tag{9.13} \]
Now suppose $\varepsilon > 0$ is given. Since the series $\sum_{n=1}^{\infty} b_n$ converges by our assumption, its associated sequence of partial sums is Cauchy. Hence we find $N \in \mathbb{N}$ such that for all $m, n \in \mathbb{N}$ with $m > n \ge N$,
\[ \sum_{k=n+1}^{m} b_k < \varepsilon. \tag{9.14} \]
It now follows from (9.13) and (9.14) that for all such $m$ and $n$ and every $x \in X$, $|s_m(x) - s_n(x)| < \varepsilon$. Now, Cauchy’s criterion (Theorem 9.11) tells us that $\sum_{n=1}^{\infty} f_n$ converges uniformly on $X$.

What does Theorem 9.21 say? Theorem 9.21 says that when each function $f_n$ is bounded with bound $b_n$ and the series $\sum_{n=1}^{\infty} b_n$ converges, the series $\sum_{n=1}^{\infty} f_n$ converges uniformly.

Example 9.22. In each case show that the given series converges uniformly on $\mathbb{R}$.
(1) $\sum_{n=1}^{\infty} (\cos nx^3)/(n^4 + 1)$.
(2) $\sum_{n=1}^{\infty} 1/((x^2 + n)(x^2 + n + 1))$.


Solution. (1) For every $n \in \mathbb{N}$,
\[ \left| \frac{\cos nx^3}{n^4 + 1} \right| \le \frac{1}{n^4 + 1} \]
holds for every $x \in \mathbb{R}$. Since the series $\sum_{n=1}^{\infty} 1/(n^4 + 1)$ converges, the above theorem tells us that the given series converges uniformly on $\mathbb{R}$.

(2) For every $n \in \mathbb{N}$,
\[ \frac{1}{(x^2 + n)(x^2 + n + 1)} \le \frac{1}{(x^2 + n)^2} \le \frac{1}{n^2} \]
is true for all $x$ in $\mathbb{R}$. Since the series $\sum_{n=1}^{\infty} 1/n^2$ converges, the given series converges uniformly on $\mathbb{R}$ by the above theorem.

Example 9.23. Prove that the series
\[ \sum_{n=1}^{\infty} \frac{(-1)^n}{\sqrt{n}} \sin\frac{x}{n} \]
converges uniformly on every bounded subset of $\mathbb{R}$.

Solution. Suppose $X$ is a bounded subset of $\mathbb{R}$, and let $M > 0$ be such that $|x| \le M$ for every $x \in X$. Since $|\sin y| \le |y|$ for every $y \in \mathbb{R}$,
\[ \left| \frac{(-1)^n}{\sqrt{n}} \sin\frac{x}{n} \right| \le \frac{1}{\sqrt{n}} \left| \frac{x}{n} \right| \le \frac{1}{n\sqrt{n}} M \]
whenever $x \in X$. Since the series $\sum_{n=1}^{\infty} 1/n^{3/2}$ converges, the given series converges uniformly on $X$ by Theorem 9.21.

As an interesting application of what we obtained in this chapter, one can prove the existence of a continuous nowhere differentiable function. The construction of such a function is sketched in Exercise 26 at the end of this chapter.
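The estimate (9.13) behind Weierstrass's test can be checked on the first series of Example 9.22. The sketch below is illustrative only: the grid on [-3, 3], the index pair n = 10, m = 200, and the names `term`, `partial_sum`, and `tail_bound` are assumptions made here. The small tolerance in the final comparison only absorbs floating-point rounding.

```python
import math

def term(n, x):
    """n-th term cos(n x^3) / (n^4 + 1) of the first series in Example 9.22."""
    return math.cos(n * x ** 3) / (n ** 4 + 1)

def partial_sum(n, x):
    """s_n(x) = f_1(x) + ... + f_n(x)."""
    return sum(term(k, x) for k in range(1, n + 1))

def tail_bound(n, m):
    """Sum of b_k = 1/(k^4 + 1) for n < k <= m, which bounds |s_m(x) - s_n(x)|."""
    return sum(1.0 / (k ** 4 + 1) for k in range(n + 1, m + 1))

xs = [-3.0 + 0.05 * i for i in range(121)]
n, m = 10, 200
worst_gap = max(abs(partial_sum(m, x) - partial_sum(n, x)) for x in xs)
bound = tail_bound(n, m)

print(f"worst grid gap = {worst_gap:.3e}, x-independent bound = {bound:.3e}")
```

The bound does not depend on x, which is the whole point of the test: one numerical series controls every partial-sum difference at once.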

Notes on Essence and Generalizability

In this final chapter we studied sequences and series of functions. The material of this chapter appeared to be abstract because it is developed within the context of metric spaces. One important result in this chapter was Weierstrass’s theorem that allows us to approximate continuous functions by polynomials. This result was generalized by Marshall H. Stone to the context of metric spaces. We did not present this version of the theorem as we believe that it is not appropriate for a first course on mathematical analysis. See Chapter 7 of the valuable book [28] for a comprehensive discussion of Stone’s results.


Exercises

1. Prove Theorem 9.9.

2. Complete the proof of Theorem 9.11.

3. For each $n$ consider a function $g_n$ defined on $\mathbb{R}$ by $g_n(x) = e^{x/n}$. Prove that $\{g_n\}$ converges pointwise to a function on $\mathbb{R}$. Is the convergence uniform?

4. Suppose $g$ is a real-valued function which is continuous on $[0, 1]$. Define a sequence $\{f_n\}$ of functions on $[0, 1]$ by
\[ f_n(x) = \frac{g(x)(\sin x)^n}{1 + nx}. \]
Prove that $\{f_n\}$ is uniformly convergent on $[0, 1]$.

5. Prove that the sequence $\{g_n\}$ defined by
\[ g_n(x) = \frac{1}{n} \sin nx \]
is uniformly convergent on $\mathbb{R}$.

6. Show that the sequence of functions defined by
\[ f_n(x) = \begin{cases} 0 & x < \frac{1}{n+1}, \\ \sin^2 \frac{\pi}{x} & \frac{1}{n+1} \le x \le \frac{1}{n}, \\ 0 & x > \frac{1}{n}, \end{cases} \]
converges pointwise to a function $f$ on $\mathbb{R}$. Is the convergence uniform?

7. In each case verify that the sequence $\{h_n\}$ converges pointwise to $h$ on $[0, 1]$. Then determine if the convergence is uniform.
(a)
\[ h_n(x) = \begin{cases} 0 & 0 \le x < \frac{1}{n}, \\ \sin \frac{\pi}{x} & \frac{1}{n} \le x \le 1; \end{cases} \qquad h(x) = \begin{cases} \sin \frac{\pi}{x} & x > 0, \\ 0 & x = 0. \end{cases} \]
(b)
\[ h_n(x) = \begin{cases} 0 & x = 0, \\ n & 0 < x < \frac{1}{n}, \\ 0 & \frac{1}{n} \le x \le 1, \end{cases} \qquad h \equiv 0. \]

8. For each $n \in \mathbb{N}$ and every $x \in [0, 1]$, let
\[ f_n(x) = \lim_{m\to\infty} \frac{x(1 - (nx)^m)}{1 + (nx)^m}. \]
Prove that $\{f_n\}$ converges uniformly on $[0, 1]$.

9. Consider the sequence $\{g_n\}$ of functions defined on $[0, 1)$ by
\[ g_n(x) = \frac{x^n}{1 + x^{2n}}. \]
Does $\{g_n\}$ converge uniformly on $[0, 1)$?


10. Define a sequence $\{\psi_n\}$ of functions from $C([0, 1])$ into $\mathbb{R}$ by
\[ \psi_n(f) = f\left( \frac{1}{n} \right). \]
Prove that $\{\psi_n\}$ converges pointwise on $C([0, 1])$. Is the convergence uniform?

11. Suppose $f$ is defined on $\mathbb{R}$, and $f_n(x) = f(nx)$ for every $x \in \mathbb{R}$ and every $n \in \mathbb{N}$. Prove that if $f$ is continuous at 0 and $\{f_n\}$ is uniformly convergent on $\mathbb{R}$, then $f$ is constant.

Let $\{f_n\}$ be a sequence of functions from a set $X$ into $\mathbb{R}$. Call the sequence pointwise bounded if for every $x \in X$, we can find a number $M_x > 0$ such that $|f_n(x)| \le M_x$ for all $n \in \mathbb{N}$. Also, say that the sequence is uniformly bounded when $M > 0$ can be found such that $|f_n(x)| \le M$ for every $x \in X$ and every $n \in \mathbb{N}$.

12. Verify that a pointwise convergent sequence is necessarily pointwise bounded.

13. Observe that a uniformly convergent sequence is uniformly bounded.

14. If $\{f_n\}$ is a sequence of functions that converges uniformly on subsets $A$ and $B$ of $X$ (the common domain of the functions $f_n$), prove that $\{f_n\}$ converges uniformly on $A \cup B$.

15. Show by means of an example that even when a sequence of real-valued differentiable functions converges to a function $f$ uniformly on some interval $I$, the limit function $f$ may fail to be differentiable on $I$.

16. Let $\{f_n\}$ be a sequence of real-valued functions which are differentiable on $(a, b)$. Assume also that $\{f_n(x_0)\}$ converges for some $x_0 \in (a, b)$, and that the sequence $\{f_n'\}$ converges uniformly on $(a, b)$. Prove that then $\{f_n\}$ converges uniformly to a function $f$ on $(a, b)$ such that
\[ f'(x) = \lim_{n\to\infty} f_n'(x) \]
for every $x \in (a, b)$.

17. What happens if we omit the assumption that $\{f_n(x_0)\}$ converges for some $x_0 \in (a, b)$ from the previous exercise?

18. Verify that the series
\[ \sum_{n=1}^{\infty} \frac{(-1)^n}{\sqrt{n}} \cos\frac{x}{n} \]
converges uniformly on every bounded subset of $\mathbb{R}$.

19. Prove that the series
\[ \sum_{n=1}^{\infty} \frac{[nx]}{n^3} \]
converges uniformly on $[0, 1]$. Let $f(x)$ denote the sum of the above series for arbitrary $x \in [0, 1]$. Show that the function $f$ is continuous at 0 and all irrational numbers, and that it is discontinuous at every nonzero rational number in $[0, 1]$.


20. If $\sum_{n=1}^{\infty} |f_n|$ converges uniformly on $X$, prove that the same is true for $\sum_{n=1}^{\infty} f_n$. Is the converse of this also true?

21. Suppose $\{f_n\}$ and $\{g_n\}$ are sequences of functions from a set $X$ into $\mathbb{R}$.
(a) If $\{f_n\}$ and $\{g_n\}$ are both uniformly convergent on $X$, prove that the same is true for the sequence $\{f_n + g_n\}$.
(b) If $c$ is any constant and $\{f_n\}$ converges uniformly on $X$, show that the same is true for the sequence $\{cf_n\}$.
(c) Show by means of an example that when $\{f_n\}$ and $\{g_n\}$ are both uniformly convergent on $X$, the sequence $\{f_n g_n\}$ may fail to be uniformly convergent on $X$. Can you find a condition on $\{f_n\}$ and $\{g_n\}$ that ensures the uniform convergence of $\{f_n g_n\}$ on $X$?

22. Suppose $\{f_n\}$ is a sequence of functions from a metric space $(X, d_X)$ into a metric space $(Y, d_Y)$. If each $f_n$ is continuous on $X$ and $\{f_n\}$ converges uniformly to a function $f$ on $X$, prove that for every sequence $\{x_n\}$ in $X$ that converges to some $x \in X$,
\[ \lim_{n\to\infty} f_n(x_n) = f(x). \]

23. Let $\{f_n\}$ be a uniformly convergent sequence of functions from a set $X$ into a metric space $Y$. Suppose $g$ is a function from a set $Z \subseteq Y$ that contains the range of the $f_n$'s into some metric space $W$. If $g$ is uniformly continuous on $Z$, prove that the sequence $\{g \circ f_n\}$ converges uniformly on $X$.

24. Prove that the series
\[ \sum_{n=1}^{\infty} (-1)^{n+1} \frac{x + \sqrt{n}}{n} \]
converges uniformly on $[-1, 1]$. For which values of $x \in [-1, 1]$ is the series absolutely convergent?

25. In each case determine if the given series is uniformly convergent on the associated set.
(a) $\sum_{n=1}^{\infty} \sin^{2n} x$ on $[0, \pi/3]$.
(b) $\sum_{n=1}^{\infty} x^n$ on $(0, 1)$.
(c) $\sum_{n=1}^{\infty} (\sin nx)/(n + 1)!$ on $\mathbb{R}$.
(d) $\sum_{n=1}^{\infty} e^{-nx}/2^n$ on $(0, +\infty)$.

26. Let $g$ be the absolute value function on $[-1, 1]$, and extend $g$ to a function on $\mathbb{R}$ by letting $g(x + 2) = g(x)$.
(a) Observe that $g$ is uniformly continuous on $\mathbb{R}$.
(b) Define a function $f$ by
\[ f(x) = \sum_{n=0}^{\infty} \left( \frac{3}{4} \right)^n g(4^n x). \]
Verify that the convergence of the above series to $f$ is uniform, and deduce from this that $f$ is continuous on $\mathbb{R}$.
(c) Prove that $f$ is not differentiable at any point of $\mathbb{R}$.

Appendix

This appendix aims to present those definitions and results that you will need if you want to answer the questions posed in the beginning of Chapters 2–5. An exception is the material of the last section which contains a couple of identities and inequalities that involve the sine, cosine, and natural logarithmic functions. We recall that the definitions presented here are the calculus-based ones, and as you will see in the main text, we will revise some of them. We also make the convention that all of the involved functions are real valued.

Real Sequences and Series

Informally, a sequence in some nonempty set $X$ is an ordered list of the elements of $X$, something like $x_1, x_2, x_3, \ldots$. Formally, we can define a sequence in $X$ as a function from $\mathbb{N}$ into $X$. This occurs by associating to the above list a function $x : \mathbb{N} \to X$ defined by $x(n) = x_n$ for every $n \in \mathbb{N}$. In practice, however, we prefer to denote a sequence $x$ by listing the elements of its range or by writing its general term as $\{x_n\}$. Our emphasis in this book will be on real sequences, that is, sequences whose terms are real numbers. For example, $1, 4, 9, \ldots$ and $\{n^2\}$ both represent the real sequence whose $n$th term is $n^2$.

Convergence and Divergence of Real Sequences. If $\{x_n\}$ is a real sequence and $x \in \mathbb{R}$, we say that $\{x_n\}$ converges to $x$, or that $x$ is the limit of $\{x_n\}$, if the following statement is true. For every $\varepsilon > 0$, $N \in \mathbb{N}$ can be found such that $|x_n - x| < \varepsilon$ for all $n \ge N$. In this situation, we write
\[ \lim_{n\to\infty} x_n = x. \]


For example, it can be shown that
\[ \lim_{n\to\infty} \frac{1}{n} = 0, \qquad \lim_{n\to\infty} \frac{n - 2}{3n + 1} = \frac{1}{3}. \]
A sequence which fails to converge to any real number $x$ is said to be divergent. For instance, sequences $\{n\}$, $\{\ln n\}$, and $\{(-1)^n\}$ are all divergent.

Monotone Sequences. If $\{x_n\}$ is a sequence of real numbers, we say that $\{x_n\}$ is
• increasing (resp., strictly increasing) if $x_n \le x_{n+1}$ (resp., $x_n < x_{n+1}$) for every $n \in \mathbb{N}$,
• decreasing (resp., strictly decreasing) if $x_n \ge x_{n+1}$ (resp., $x_n > x_{n+1}$) for every $n \in \mathbb{N}$.
A monotone sequence is one which is either increasing or decreasing. For example, sequences $\{n\}$, $\{\ln n\}$, and $\{e^n\}$ are strictly increasing, while $\{1/n^2\}$ is strictly decreasing. The sequence $\{(-1)^n\}$ is not monotone.

Subsequences. If $\{x_n\}$ is a real sequence and $\{n_k\}$ is a strictly increasing sequence of natural numbers, then we call $\{x_{n_k}\}$ a subsequence of $\{x_n\}$. For instance, by letting $n_k = 2k - 1$ for every $k \in \mathbb{N}$, we get the subsequence of odd-indexed terms of $\{x_n\}$. As a particular example note that the subsequence of odd-indexed terms of $\{(1 + (-1)^{n+1})n\}$ is $2, 6, 10, \ldots$.

Infinite Series. Given a sequence $\{x_n\}$ of real numbers, we can define a new sequence $\{s_n\}$ by $s_n = x_1 + \cdots + x_n$ for every $n \in \mathbb{N}$. If $\{s_n\}$ converges to a real number $x$, then we say that the series
\[ \sum_{n=1}^{\infty} x_n \]
converges to $x$, and we write
\[ \sum_{n=1}^{\infty} x_n = x. \]
Otherwise we say that the series is divergent. The sequence $\{s_n\}$ is said to be the sequence of partial sums of the series $\sum_{n=1}^{\infty} x_n$.

For example, starting from the sequence $\{1/((n + 1)(n + 2))\}$ and writing
\[ \frac{1}{(n + 1)(n + 2)} = \frac{1}{n + 1} - \frac{1}{n + 2}, \]
we see that for every $n \in \mathbb{N}$,
\[ s_n = \frac{1}{2} - \frac{1}{n + 2}. \]


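Hence, that $\{s_n\}$ converges to $1/2$ allows us to write
\[ \sum_{n=1}^{\infty} \frac{1}{(n + 1)(n + 2)} = \frac{1}{2}. \]
Since
\[ \ln\left( \frac{n + 1}{n} \right) = \ln(n + 1) - \ln(n) \]
for every $n \in \mathbb{N}$, similar reasoning shows that the series
\[ \sum_{n=1}^{\infty} \ln\left( \frac{n + 1}{n} \right) \]
diverges.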
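Both telescoping computations above are easy to confirm numerically. In the sketch below, the function names `s` and `log_partial_sum` are chosen here for illustration; the closed forms they are compared with, 1/2 - 1/(n + 2) and ln(n + 1), come from the telescoping identities in the text.

```python
import math

def s(n):
    """Partial sum of 1/((k + 1)(k + 2)) for k = 1, ..., n."""
    return sum(1.0 / ((k + 1) * (k + 2)) for k in range(1, n + 1))

def log_partial_sum(n):
    """Partial sum of ln((k + 1)/k) for k = 1, ..., n; telescopes to ln(n + 1)."""
    return sum(math.log((k + 1) / k) for k in range(1, n + 1))

for n in (1, 10, 1000):
    print(n, s(n), 0.5 - 1.0 / (n + 2), log_partial_sum(n), math.log(n + 1))
```

The first column of sums approaches 1/2, while the second grows like ln(n + 1) without bound, matching the convergence and divergence claimed above.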

Limit and Continuity of Functions

Suppose $f$ is a function whose domain $E$ contains a deleted neighborhood of $a$, that is, a set of the form $(a - \gamma, a) \cup (a, a + \gamma)$ for some $\gamma > 0$. We say that some $L \in \mathbb{R}$ is the limit of $f$ at $a$ if the following statement is true. For every $\varepsilon > 0$, we can find $\delta > 0$ such that $x \in E$ and $0 < |x - a| < \delta$ imply $|f(x) - L| < \varepsilon$. In this situation we write $\lim_{x\to a} f(x) = L$. For example, $\lim_{x\to a} \sin x = \sin a$ for every $a \in \mathbb{R}$, and $\lim_{x\to 0} g(x) = 1$ if
\[ g(x) = \begin{cases} x^3 + 1 & x \ne 0, \\ 2 & x = 0. \end{cases} \]

Continuity. If $f$ is defined in a neighborhood of $a$, that is, an interval of the form $(a - \gamma, a + \gamma)$ for some $\gamma > 0$, we say that $f$ is continuous at $a$ if
\[ \lim_{x\to a} f(x) = f(a). \]
For instance, the sine function is continuous at every $a \in \mathbb{R}$ and the function $g$ above is continuous at every $x \ne 0$.

One-Sided Limits. Suppose $f$ is a function whose domain $E$ contains an interval $(a, a + \gamma)$ (resp., $(a - \gamma, a)$) for some $\gamma > 0$. A real number $L$ is said to be the right (resp., left) limit of $f$ at $a$ if the following statement is true. For every $\varepsilon > 0$, we can find $\delta > 0$ such that $x \in E$ and $a < x < a + \delta$ (resp., $a - \delta < x < a$) imply $|f(x) - L| < \varepsilon$. In this situation we write $\lim_{x\to a^+} f(x) = L$ (resp., $\lim_{x\to a^-} f(x) = L$). For instance, if we let
\[ f(x) = \begin{cases} 1 & x > 0, \\ -1 & x < 0, \end{cases} \]
then $\lim_{x\to 0^+} f(x) = 1$ and $\lim_{x\to 0^-} f(x) = -1$. It is clear that the limit of a function $f$ at $a$ exists and is equal to $L$ if and only if the right and left limits of $f$ at this point exist and are equal to $L$.


The Concepts of Derivative and Differentiability

If $f$ is a function defined in a neighborhood of $a$, then we say that $f$ is differentiable at $a$ if
\[ \lim_{x\to a} \frac{f(x) - f(a)}{x - a} \]
exists. In this case we denote the limit by $f'(a)$ and call it the derivative of $f$ at $a$. If $f$ is differentiable at every $a$ in an interval $I$ contained in the domain of $f$, then we say that $f$ is differentiable on $I$. In this case we can define a function $f'$ on $I$ that associates to each $x \in I$ the derivative of $f$ at $x$, namely, $f'(x)$. For example, the function $f(x) = \sin x$ is differentiable on $\mathbb{R}$ with $f'(x) = \cos x$, and the function $g(x) = \ln x$ is differentiable on $(0, +\infty)$ and $g'(x) = 1/x$ on this interval.
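The limit defining the derivative can be watched numerically. The sketch below evaluates the difference quotient of the sine function at the point a = 1 for shrinking increments h and compares it with cos 1. The point a, the increments, and the name `difference_quotient` are choices made here for illustration; for very small h floating-point cancellation would eventually spoil the agreement, so the increments stop at 10^-6.

```python
import math

def difference_quotient(f, a, h):
    """(f(a + h) - f(a)) / h, the quotient whose limit as h -> 0 defines f'(a)."""
    return (f(a + h) - f(a)) / h

a = 1.0
quotients = [difference_quotient(math.sin, a, 10.0 ** (-k)) for k in range(1, 7)]
errors = [abs(q - math.cos(a)) for q in quotients]

for k, (q, e) in enumerate(zip(quotients, errors), start=1):
    print(f"h = 1e-{k}: quotient = {q:.8f}, distance from cos(1) = {e:.2e}")
```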

The Riemann Integral

Suppose $f$ is a bounded function defined on an interval $[a, b]$. The Riemann integral of $f$ on $[a, b]$, denoted by $\int_a^b f(x)\,dx$, is defined as follows. For each $n \in \mathbb{N}$, we consider the equidistant points
\[ x_i = a + \frac{(b - a)}{n}\, i, \]
where $i$ takes values in the set $\{0, 1, \ldots, n\}$, and we choose some element $t_i$ from the interval $[x_{i-1}, x_i]$ for each $i \in \{1, \ldots, n\}$. Then, the quantity
\[ \sum_{i=1}^{n} f(t_i)\Delta x \]
represents the sum of the areas of $n$ rectangles of width $\Delta x = (b - a)/n$, where the $i$th rectangle has height $f(t_i)$. Figure 1 illustrates this when $n = 3$.

Figure 1. The graph of f together with the rectangles.
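The sum of rectangle areas described above can be computed directly. The following sketch does so for the cosine function on [0, π/2] with three different rules for choosing the sample points; the function name `riemann_sum`, the `pick` parameter, and the choice n = 1000 are assumptions made for illustration.

```python
import math

def riemann_sum(f, a, b, n, pick):
    """Sum of f(t_i) * dx over n equal subintervals; pick(left, right) chooses t_i."""
    dx = (b - a) / n
    total = 0.0
    for i in range(1, n + 1):
        left, right = a + (i - 1) * dx, a + i * dx
        total += f(pick(left, right)) * dx
    return total

# Three ways of choosing the sample point t_i in [x_{i-1}, x_i].
rules = {
    "left": lambda l, r: l,
    "right": lambda l, r: r,
    "mid": lambda l, r: 0.5 * (l + r),
}

sums = {name: riemann_sum(math.cos, 0.0, math.pi / 2, 1000, pick)
        for name, pick in rules.items()}
print(sums)
```

All three sums land close to 1, and since cosine is decreasing on this interval the left-endpoint sum overshoots while the right-endpoint sum undershoots.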


If
\[ \lim_{n\to\infty} \sum_{i=1}^{n} f(t_i)\Delta x \]
exists and is independent of the choice of the sample points $t_1, \ldots, t_n$, then we denote the limit by $\int_a^b f(x)\,dx$ and call it the Riemann integral of $f$ on $[a, b]$. In this situation, we also say that $f$ is Riemann integrable on $[a, b]$. It is clear from the geometric interpretation of the integral that when $f$ has nonnegative values on $[a, b]$, $\int_a^b f(x)\,dx$ gives us the area of the region that lies under the graph of $f$ from $a$ to $b$. It can be shown that every continuous function is integrable on $[a, b]$. Since the Riemann integral of a function cannot be easily computed by the above definition, we usually use the fundamental theorem of calculus as a computational tool. This asserts that if $F$ is such that $F'(x) = f(x)$ holds for every $x \in [a, b]$, then
\[ \int_a^b f(x)\,dx = F(b) - F(a). \]
For example, that the derivative of the sine function is the cosine function implies that
\[ \int_0^{\pi/2} \cos x\,dx = \sin\frac{\pi}{2} - \sin 0 = 1. \]

Some Useful Inequalities and Identities. As we mentioned in the introductory part of the book, we will not enter into details of the way one defines transcendental functions. Nevertheless, we list a couple of useful inequalities and identities that are related to such functions and will be utilized throughout the book. First of all, we recall that the inequality $|\sin x| \le |x|$ holds for every $x \in \mathbb{R}$, and that the inequalities cos x