The number systems: Foundations of algebra and analysis [1 ed.]


SOLOMON FEFERMAN


Digitized by the Internet Archive in 2019 with funding from Kahle/Austin Foundation

https://archive.org/details/numbersystemsfouOOOOfefe


This book is in the ADDISON-WESLEY SERIES IN MATHEMATICS

Lynn Loomis, Consulting Editor

THE NUMBER SYSTEMS

Foundations of Algebra and Analysis

by

SOLOMON FEFERMAN
Department of Mathematics
Stanford University

ADDISON-WESLEY PUBLISHING COMPANY, INC.
READING, MASS. · PALO ALTO · LONDON

Copyright © 1964
ADDISON-WESLEY PUBLISHING COMPANY, INC.
Printed in the United States of America

ALL RIGHTS RESERVED. THIS BOOK, OR PARTS THEREOF, MAY NOT BE REPRODUCED IN ANY FORM WITHOUT WRITTEN PERMISSION OF THE PUBLISHERS.

Library of Congress Catalog Card No. 63-12470


Dedicated to my mother and father.


PREFACE

The subject matter of this book is the successive construction and development of the basic number systems of mathematics, namely the positive integers, integers, rational numbers, real numbers, and complex numbers. It is a subject that many mathematicians feel should be learned by every serious student in this field. Preferably, he should do this as soon as possible after his first course in mathematical analysis (calculus), either before or during his introduction to more rigorous treatments of analysis and algebra.

Despite the significance of this subject in a mathematical education, there does not seem to be any special provision for its study in most American universities. Sometimes a hasty review of the material is given in intermediate courses on algebra or analysis. Another approach often taken in these courses is to begin with the real number system as axiomatically given, rather than to develop its properties from more basic notions and results.

We believe this situation has come about for several reasons. First of all, the (now) classical presentations of this material have a curious isolation from the rest of mathematics. The ideas and methods employed seem to have a “once only” character and lack the sense of interrelatedness of most other important mathematical concepts. Second, the rate at which knowledge is increasing makes it imperative that the student of mathematics hasten his mastery of the main parts of his field. Finally, and in tune with the “abstractness” of modern mathematics, there is a growing tendency to present all its parts axiomatically. As a result of these circumstances, there is often a gap in the student’s education between his “concrete” computational work in the calculus and his more advanced work.

It is true that modern abstract analysis and algebra have developed as the proper means to encompass, and then to advance beyond, the particular notions and results concerning the classical number systems uncovered before this century. However, a firm grasp of the significant particular cases provides the best basis for an appreciation of the newer developments.

It thus seems to us that the subject of this book provides the most appropriate material for this transition period in the student’s mathematical education. We have tried to give here a presentation which is on the one hand up to date, complete, and rigorous, and on the other hand constantly motivated with reference to both the student’s background and the needs of modern mathematics.


We believe the approach taken here makes the text adaptable to a variety of teaching situations. It can be used for a one-quarter or a one-semester course specifically set aside for this material and demanding no prerequisites at this level. It can be the text for the first part of a conventional intermediate course in algebra or analysis, with certain sections omitted or merely sketched, according to the specifications of the course and the taste of the instructor. It might also be used as a basic reference work for such a course or as the text for a reading course, which the student would master by independent study. It is with this last possibility especially in mind that we have chosen to make this book self-contained and to pursue clarity and completeness, rather than conciseness.

S. F.
Stanford, California
October 1963

CONTENTS

Chapter 1. The Logical Background

1.1 Introduction
    The mathematical method
1.2 Logic
    Mathematical statements and their structure • Existence • Logical connectives

Chapter 2. The Set-Theoretical Background

2.1 Sets
    Sets as abstractions from conditions • Extensions of the concept of set • Identity and inclusion • Some peculiar sets
2.2 An algebra of sets
    Intersection, union, and complement • Basic laws of the algebra of sets • Extended intersections and unions
2.3 Relations and functions
    Relations as abstractions from conditions • Ordered pairs and cartesian products • Domain, range, and converse • Ternary (etc.) relations • Operations on relations; composition • Special kinds of relations • Equivalence relations and partitions • Functions • Congruence relations • Converse and composition of functions
2.4 Mathematical systems of relations and functions
    Isomorphism • Set-theoretical equivalence • Subsystems

Chapter 3. The Positive Integers

3.1 Basic properties
    Peano systems and inductive proofs • Functions on Peano systems • Isomorphism of Peano systems
3.2 The arithmetic of positive integers
    Recursive definitions • Addition of positive integers • Multiplication of positive integers • Exponentiation and other operations
3.3 Order
    Simply ordered systems • Well-ordered systems • Ordering and the arithmetical operations
3.4 Sequences, sums and products
    Finite and infinite sequences • Extended sums and products • Generalized associative and commutative laws • Some special sums and products

Chapter 4. The Integers and Integral Domains

4.1 Toward extending the positive integers
    Practical motivations • Algebraic motivations • Commutative rings with unity
4.2 Integral domains
    Ordered integral domains • Absolute value
4.3 Construction and characterization of the integers
    The existence theorem • Uniqueness of the characterization
4.4 The integers as an indexing system
    More general associative and commutative laws • Geometric series; binomial expansion
4.5 Mathematical properties of the integers
    The division algorithm • The divisibility relation and the primes • Greatest common divisors • Factorization of integers into primes • Positional notations for integers
4.6 Congruence relations in the integers
    Homomorphism • Properties preserved under homomorphism • Congruence modulo an integer • Applications to a Diophantine problem

Chapter 5. Polynomials

5.1 Polynomial functions and polynomial forms
    Existence and uniqueness of simple transcendental extensions • Divisibility and roots of polynomials • Formal derivatives
5.2 Polynomials in several variables
    k-fold transcendental extensions • Symmetric polynomials • The fundamental theorem on symmetric polynomials

Chapter 6. The Rational Numbers and Fields

6.1 Toward extending integral domains
    Algebraic motivations • Geometric motivations • Fields • Ordered fields; dense orderings • Some finite fields
6.2 Fields of quotients
    The existence theorem • Isomorphism of fields of quotients • The rational numbers; fields of rational forms
6.3 Solutions of algebraic equations in fields
    Systems of linear equations • Linear equations in integral domains • Polynomial equations in the rationals
6.4 Polynomials over a field
    Basic properties of divisibility • Prime polynomials • The division algorithm for polynomials • Greatest common divisors • Unique factorization theorem for polynomials

Chapter 7. The Real Numbers

7.1 Toward extending the rationals
    Algebraic motivations • Geometric motivations • Upper and lower sections; continuously ordered systems • Existence of continuously ordered systems • Greatest lower bounds and least upper bounds
7.2 Continuously ordered fields
    The Archimedean property • Isomorphism of continuously ordered fields • Fundamental sequences • The Bolzano-Weierstrass Theorem • Construction of a continuously ordered field
7.3 Infinite series and representations of real numbers
    Positional notations for real numbers • Power series • The exponential function
7.4 Polynomials and continuous functions on the real numbers
    Weierstrass’ Nullstellensatz • Real polynomials and their roots • Computations of roots • Location of all roots: Sturm’s theorem • Rational and real powers of real numbers
7.5 Algebraic and transcendental numbers
    Cantor’s method • Denumerable and nondenumerable sets • The existence of transcendental real numbers • Liouville’s method

Chapter 8. The Complex Numbers

8.1 Basic properties
    Characterization of the complex numbers • Complex conjugates • Square roots of complex numbers • A geometric interpretation • Absolute value • Basic properties of the trigonometric functions • The trigonometric representation; De Moivre’s theorem
8.2 Polynomials and continuous functions in the complex numbers
    Limits and the Bolzano-Weierstrass theorem extended • Continuity extended • Polynomial functions; growth and minimum of the modulus • The fundamental theorem of complex algebra • On computing roots of complex polynomials • Decomposition of real polynomials
8.3 Roots of complex polynomials
    Roots of polynomials over a subfield • Algebraically closed subfields • Multiple roots; discriminants • Roots of cubic equations • Roots of fourth degree equations • On equations of higher degree

Chapter 9. Algebraic Number Fields and Field Extensions

9.1 Generation of subfields
    The general extension process • Simple extensions • Simple transcendental extensions • Simple algebraic extensions • Adjoining roots to arbitrary fields
9.2 Algebraic extensions
    Linearly generated extensions; bases and dimension • Finite field extensions • Iterated finite extensions
9.3 Applications to geometric construction problems
    Basic geometric notions • The realization in the cartesian plane • Ruler and compass constructions • The algebraic equivalent of constructibility • Some classical construction problems • Regular polygons; Gauss’ solution
9.4 Conclusion

Appendix I. Some Axioms for Set Theory

Appendix II. The Analytical Basis of the Trigonometric Functions

Bibliography

Index

CHAPTER 1

THE LOGICAL BACKGROUND

1.1 Introduction. The basic number systems of mathematics are the following:

(1) the collection P of positive integers 1, 2, 3, . . . ;
(2) the collection I of integers . . . , -3, -2, -1, 0, 1, 2, 3, . . . ;
(3) the collection Ra of rational numbers, consisting of all fractions a/b, where a, b are integers and $b \neq 0$ (such as 2/3, -8/7, 2/-4);
(4) the collection Re of real numbers, consisting of the rational numbers and of the irrational numbers (such as $\sqrt{2}$, $\pi$, $\sqrt{3}/\pi$, $e^{\sqrt{2}}$);
(5) the collection C of complex numbers, consisting of the real numbers and the imaginary numbers and their combinations (such as $\sqrt{-1}$).

Historically, the understanding and use of these number systems have evolved over a period of several thousand years, more or less in the order presented. There are a number of persuasive reasons why this development took place and why the development came to a certain completion with the complex numbers. In this book, we shall give a systematic exposition of the same growth of ideas. We hope at the same time to convince the reader that there is nothing capricious in this evolution. The only accidental aspect of the subject is the use of certain words such as “rational,” “irrational,” “real,” “imaginary,” and “complex” (indicating the initial resistance which the introduction of new numbers met at each stage). The force of long usage prevents us from replacing these by more appropriate words.

The student has learned to understand something of the nature of the different kinds of numbers and to perform various arithmetical computations with them in grade school. However, at a certain point his ideas about these may become clouded. How do we know that $\sqrt{2}$ is not rational? What do we mean when we say that $\pi$ is approximately 3.1416, and even better that it is approximately 3.14159? We are accustomed to thinking of real numbers as measuring certain lengths, and of the product of real numbers, such as $\sqrt{2} \cdot \pi$, as measuring the area of a rectangle, in this case with sides of length $\sqrt{2}$ and $\pi$; on what basis do we assign another real number to that area, i.e., give a length equivalent to that area? The student is now at a point in his education where he can expect to get and is able to assimilate clear (though not necessarily simple) answers to such questions.


One can go quite far on the basis of an uncritical use of the various number systems. Much of the differential and integral calculus that we know today, as well as the physical theories which are expressed in these mathematical terms, was developed in just such a way. For example, in the calculus we are asked to consider computations of infinite length, such as

    $1 - \frac{1}{3} + \frac{1}{9} - \frac{1}{27} + \cdots$

In this particular case, we are easily convinced, on the basis of the formula

    $(1 + r)(1 - r + r^2 - r^3 + \cdots) = 1$

(which is verified by “multiplying out”), that the result of the computation should be $1/(1 + \frac{1}{3}) = \frac{3}{4}$. However, if we are uncritical we should avoid asking what the result of $1 - 1 + 1 - 1 + \cdots$ is. For if we apply our formula, we obtain as answer $1/(1 + 1) = \frac{1}{2}$, while it is equally evident that the answer should be

    $(1 - 1) + (1 - 1) + \cdots = 0 + 0 + \cdots = 0$

and also that it should be

    $1 + (-1 + 1) + (-1 + 1) + \cdots = 1 + 0 + 0 + \cdots = 1$.

It is not that the uncritical approach necessarily gives wrong answers, but rather that there are certain questions for which it provides no coherent answer at all.

In the study of Fourier series (which have many applications in engineering and physics) in the latter half of the nineteenth century there arose certain questions which could not be adequately answered on the basis of an uncritical approach to the number systems and which, at the same time, could not be avoided. In response, a number of workers in mathematics and logic embarked on a critical program to clarify the concepts which were involved. The result of their work gradually resolved itself into a systematic theory which could be used to settle the troublesome questions to the satisfaction of most mathematicians. An understanding of this theory is now an essential prerequisite to the study of modern mathematics.

We confine ourselves in this book to that part of this theory which has most to do with the number systems themselves, and to those matters in mathematics which are most directly related to the number systems. Those which are closest at hand are first, in the field of algebra, the determination of the potentialities and limitations on solving algebraic


equations in various settings, and second, in the field of analysis, the development of the limit concept and of its basic properties. (For various reasons the algebraic questions will receive somewhat heavier emphasis in this book.) Although our book is subtitled “Foundations of algebra and analysis,” it should not be thought that the subject can be meaningfully separated into two stages, the first entirely occupied with the study of the number systems and the second with the applications of this study to the critical treatment of mathematics. For it is the demands of the already informally understood concepts and results of algebra and analysis which shape the particular development that is taken. To ignore this would be to deliberately place our critical understanding at a disadvantage. Thus our attempt throughout is to gain the advantages of both intuition and rigor by intertwining motivation, precise development, and appropriate applications.

The mathematical method. The objects with which mathematics deals, such as numbers and geometrical figures, are abstract in nature and are usually, in any given study, infinite in number. Although our ideas about these objects are closely related to our perceptions of various groupings of material objects, it is very rarely that we can settle a mathematical question by direct appeal to reality. Thus a certain amount of experimentation with pencil, paper, ruler, and compass may lead us to guess that the medians of any triangle all meet in a single point, but no amount of experimentation could verify that this statement is true of the infinitely many conceivable triangles. The specifically mathematical method used to settle such questions is as follows. Certain statements regarding the objects we have in mind are regarded as evident, as being part and parcel of our conceptions of the objects themselves; these statements are generally called axioms or postulates.
Once the axioms and basic concepts are granted, all else in mathematics is obtained by logical argument in which new concepts, if they appear, are defined only in terms of earlier ones.

Now it is conceivable that someone is unwilling to grant a given group of axioms. He may do this on the ground that he cannot conceive of any objects to which the axioms correctly apply or on the ground that the axioms do not correctly apply to the notions to which he thinks they are intended to apply. There is no logical method by which such a person can be persuaded to believe otherwise. To such a person, mathematics (at least, as developed from that particular group of axioms) is a contentless game; it may, nevertheless, be a game which he enjoys playing. However, the true value and power of the mathematical method are that it leads those who do grant the axioms, as expressing simple and intuitively clear truths about certain objects, incontrovertibly to complicated and often surprising truths about the same objects.


The student is no doubt familiar with this “axioms-definitions-theorems” description of mathematical activity from his course in plane geometry; he may have even been brought to the conclusion that such an approach to mathematics is sterile and barren. Indeed, he is much more apt to be convinced of a statement in geometry or calculus by a few diagrams of “typical” cases, or by a manipulation with infinite series which appears as if it ought to be right, than by a careful logical argument. But this approach is strictly limited and can provide only a thin appreciation and understanding of mathematics. Thus, in our presentation here, exact definitions of concepts and careful arguments in proofs will be in the forefront. This does not mean that intuition must be abandoned. On the contrary, and in contrast to mechanical experimentation, the finding of a correct proof often demands great ingenuity combined with intuitive understanding. The student will have the opportunity to develop such understanding both in following the proofs given here and in carrying out proofs of his own.

For purposes of illustration of certain basic “pre-number” concepts we will assume some familiarity with the number systems in this chapter and the next. However, after that point we will proceed to carry out the development proposed above with only the simplest prior assumptions concerning numbers as a basis.

1.2 Logic. It is possible to give a completely exact description of the notion of logical deduction; this has been accomplished in the last half century in the field of mathematics devoted to symbolic or formal logic. We do not assume that the reader is familiar with symbolic logic, nor shall we attempt to describe this subject to him here, since logical thinking in mathematics can be learned only by observation and experience. (In fact, the ability to reason correctly and to understand correct reasoning is itself a prerequisite to the study of formal logic.)
Nevertheless, there are logical aspects of our study of the number systems which are worth approaching informally before we embark on our subject matter proper.*

Mathematical statements and their structure. In mathematics we are concerned solely with affirmative or declarative statements (also called propositions) which must either be true or false. Thus such statements as “Goldbach’s conjecture is probably true” or questions such as “Can every map be colored with only four colors?”, though they play an important role in the doing of mathematics, are not part of mathematics proper. When we use the word “statement” in the following, we have in mind only affirmative statements.

* For the reader who is interested in finding out more about symbolic logic we recommend the textbooks listed in the Bibliography.


The transition from arithmetic to more advanced subjects in mathematics corresponds to the transition from particular statements, such as 12 + 7 = 19, 12 · 7 = 86 (the first of which is true, the second false), to statements involving references to arbitrary objects of a certain kind. The most economical means for formulating statements of the latter sort is by the use of variables. These are certain letters, such as a, b, x, y, z, m, n, etc., which in a given statement can refer to some or all of these objects. Other symbols, such as 12, e, $\pi$, $\sqrt{2}$, etc., which are intended to refer to certain fixed objects, are usually called constants. The following expression,

(1:2-1)    x + y = 0,

which involves variables, is not regarded as a statement, since it is neither true nor false; it is called a condition (on x and y). However, it can be used to form a statement in several ways. One way is to substitute constants for the variables, e.g., 3 for x and -2 for y, thus yielding the particular statement

(1:2-2)    3 + (-2) = 0,

which is, of course, false. Another way is provided by the use of the words “all” (every, any) and “some” (there is, there exists). Some examples of statements which can be formed from the condition (1:2-1) using these words are:

(1:2-3)    for all integers x and y, x + y = 0;

(1:2-4)    for some integers x and y, x + y = 0;

(1:2-5)    for any integer x there is an integer y such that x + y = 0;

(1:2-6)    for any positive integer x there exists a positive integer y such that x + y = 0.

Clearly, statement (1:2-3) is false and (1:2-4) is true (in particular, 0 + 0 = 0). Statement (1:2-5) is true since for any integer x the integer -x is an instance of an integer y which satisfies the condition; on the other hand, (1:2-6) is false. Consider now the condition

(1:2-7)    x + y = 5

and the statements

(1:2-8)    for any integer x there is an integer y such that x + y = 5,

and

(1:2-9)    for any positive integer x there is a positive integer y such that x + y = 5.


Again (1:2-8) is true since, given an integer x, the integer 5 - x is an integer y for which the condition (1:2-7) is true. Concerning (1:2-9), we see that there are some positive integers x satisfying the condition,

(1:2-10)    there is a positive integer y such that x + y = 5,

namely the integers 1, 2, 3, and 4; however, (1:2-9) is still false, since the condition (1:2-10) is not true of all positive integers x, in particular not true of the number 5.

It is seen that variables serve roughly the same purpose in mathematical statements as do the pronouns “it,” “this,” “that” in ordinary language. If we did not use some such device as variables, even the simplest mathematical statements, such as

(1:2-11)    for all integers x and y, $x^2 - y^2 = (x - y)(x + y)$,

would demand unnecessarily complicated expression. For example:

(1:2-12)    for any two integers (not necessarily distinct), the result of squaring the first and subtracting the square of the second is the same as forming the product of two terms, the first of which is the result of subtracting the second given integer from the first, while the second term of the product is the result of adding the two integers together.
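Statements such as (1:2-3) through (1:2-10) can also be explored mechanically. The following Python sketch is an illustration of ours, not part of the formal development; the names WINDOW, statement_3, has_positive_partner, and so on are hypothetical. It tests the conditions over a finite window of integers. A counterexample found in the window settles that a “for all” statement is false, and a witness settles that a “for some” statement is true, but a finite search can never by itself establish a “for all” statement about infinitely many integers.

```python
# Finite window of integers used for the search (an arbitrary choice of ours).
WINDOW = range(-20, 21)

# (1:2-3) "for all integers x and y, x + y = 0": one counterexample refutes it.
statement_3 = all(x + y == 0 for x in WINDOW for y in WINDOW)

# (1:2-4) "for some integers x and y, x + y = 0": one witness confirms it.
statement_4 = any(x + y == 0 for x in WINDOW for y in WINDOW)

# (1:2-5) "for any integer x there is an integer y with x + y = 0".
# True inside this symmetric window; the general proof is the witness y = -x.
statement_5 = all(any(x + y == 0 for y in WINDOW) for x in WINDOW)


def has_positive_partner(x):
    """Condition (1:2-10) for a positive integer x.

    If x >= 1 and y >= 1 with x + y == 5, then y <= 4, so searching
    y in 1..4 is exhaustive rather than merely a sample.
    """
    return any(x + y == 5 for y in range(1, 5))


# Positive integers x (up to 20) that satisfy condition (1:2-10).
solutions_10 = [x for x in range(1, 21) if has_positive_partner(x)]

# (1:2-9) is refuted as soon as one positive x fails the condition (x = 5 does).
statement_9 = all(has_positive_partner(x) for x in range(1, 21))

print(statement_3, statement_4, statement_5)
print(solutions_10, statement_9)
```

The list solutions_10 contains exactly the integers 1, 2, 3, and 4 named in the text, and statement_9 comes out false because x = 5 has no positive partner.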

It is apparent that mathematics without the use of variables could hardly be advanced beyond arithmetic and would be much more difficult to master. However, the use of letters as variables is not an essential feature, since one could use other kinds of simple symbols almost as well:

(1:2-13)    for all integers ___ and . . . , (___)^2 - (. . .)^2 = (___ - . . .)(___ + . . .).

One note of warning should be sounded about the use in many mathematical texts of the words “variable” and “constant.” For example, one may read such a phrase as:

(1:2-14)    consider the polynomial $ax^2 + bx + c$, where a, b, c are constants.

In actuality, all the letters a, b, c, and x in (1:2-14) are variables in our sense. What would be intended, in a discussion launched by (1:2-14), would be to obtain some information regarding the behavior of $ax^2 + bx + c$ for any given a, b, c when x varies; in other words, there is a difference in interest in the respective roles of a, b, c, and x. It would perhaps


be better to distinguish these roles by referring to a, b, c as parameters, as is often done.

A second point to be noted regarding normal mathematical writing, and which again seems in conflict with our discussion, is connected with the practice of referring to conditions such as

(1:2-15)    x + y = y + x

as being true statements. What is intended is that the condition is asserted to hold true for all values of the variables. To be properly stated, one must first determine from the context what types of objects are under consideration. For example, in this particular case it may be real numbers. Then, stated in full, we would have

(1:2-16)    for all real numbers x and y, x + y = y + x.

Similarly the following “statement,”

(1:2-17)    let $x < -2$; then $4 < x^2$,

would (in a discussion of real numbers) be properly translated into the statement

(1:2-18)    for any real number x, if $x < -2$, then $4 < x^2$.

When there is no ambiguity, we may also follow such practices as are indicated in (1:2-15) and (1:2-17).

Existence. A very important idea connected with the type of statements we have been considering concerns the meaning in mathematics of existence. The student is used, from his first courses in mathematics, to dealing with problems for which there is a solution. Furthermore, he expects to explicitly obtain that solution or to find a formula or rule for obtaining the solution in any particular case. A typical example is the formula for the solution of the quadratic equation $ax^2 + bx + c = 0$. Consider now the following situation. In the study of questions of division among integers, for example to find greatest common divisors or least common multiples, the prime numbers are very useful. These are the positive integers, other than 1, which have no integer divisors other than themselves and 1. The first few prime numbers are 2, 3, 5, 7, 11, 13, 17, 19, 23, . . . Now it can be proved that

(1:2-19)    there exist infinitely many prime numbers.


This is a consequence of the following statement:

(1:2-20)    for every prime number x there exists a prime number y with x < y.
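Statement (1:2-20) guarantees that a search upward from any prime x must eventually reach another prime. A minimal sketch of such a search in Python follows, assuming trial division as the primality test; the function names is_prime and next_prime are ours, chosen only for illustration. Testing divisors d only while d · d does not exceed n is a standard shortcut, since any proper divisor larger than that is paired with a smaller one.

```python
def is_prime(n):
    """Return True if the positive integer n is prime, by trial division."""
    if n < 2:
        return False
    d = 2
    # A composite n has a divisor d with d * d <= n, so we may stop there.
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def next_prime(x):
    """Return the smallest prime y with x < y.

    Termination relies on statement (1:2-20) (more generally, on the
    existence of infinitely many primes), not on anything in the code.
    """
    y = x + 1
    while not is_prime(y):
        y += 1
    return y


print(next_prime(89))
```

For the prime 89 the search passes over 90 through 96 and stops at 97.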

However, no simple formula is known which will associate with every prime number x the next larger prime number. On the other hand, there is a simple (though tedious) method for computing this number. For example, given that 89 is prime, we need only compute all possible divisors of each of the numbers in the list 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, . . . Eventually, according to (1:2-20), we must come upon a prime number; in this case, it can be verified that the next prime after 89 is 97. Thus, even though we may be able to prove the existence of a solution to a certain problem, we must in general be satisfied with some systematic method for computing the solution, in contrast to finding a formula which will “exhibit” the solution.

As a second example, consider the following statement, which can be proved:

(1:2-21)    there exists a real number x such that $x^5 - 7x^2 + 2 = 0$.

The intuitive reason for the truth of (1:2-21) is simple. The polynomial $x^5 - 7x^2 + 2$ has the value 2 at x = 0 and -4 at x = 1; since its value varies continuously between 2 and -4 as x varies between 0 and 1, there must exist an x between 0 and 1 for which the value is 0. (This is not a precise proof; we shall be able to give precise proofs of statements of this sort in Chapter 7.) Again there is no known formula which will exhibit a solution of the given equation (and, in a precise sense which is given in courses of advanced algebra, there is probably no hope of finding one; this will be discussed in more detail in Chapters 8, 9). Moreover, no finite computation of predetermined length will end with exhibiting a specific real number x as a solution of the equation in (1:2-21). However, the following infinite sequence of computations will bring one closer and closer to a solution, if it does not end in a finite number of steps with an exact solution. First compute the value of $x^5 - 7x^2 + 2$ at 0.0, 0.1, 0.2, . . . , 0.9, 1.0. It is possible that one of these rational numbers is an exact root, and our computation is ended. Otherwise, in one of these intervals the value at the left endpoint must be positive and at the right endpoint negative. Suppose, for example, that $(0.5)^5 - 7(0.5)^2 + 2 > 0$ and $(0.6)^5 - 7(0.6)^2 + 2 < 0$. Thus a root of the polynomial lies between 0.5 and 0.6. Then we compute the value of the polynomial at 0.50, 0.51, 0.52, . . . , 0.59, 0.60, testing in each case to see whether this value is zero, positive, or negative. By continuing in this way, we can find the decimal value of a root to any desired number of places. The necessity of performing a


potentially infinite sequence of computations is less satisfactory than the case where a finite sequence will suffice; however, when there is no alternative one must be content with this method of computing a solution, in contrast to finding a formula which will “exhibit” the solution.

For a final example concerning the meaning of “existence” we turn to a question in analysis. In many problems it is important to find a relative or absolute maximum value for a function. Simple examples show that a function f(x) can be defined for all values of x such that $a \leq x \leq b$ and yet have no absolute maximum in that interval. However, the following theorem can be proved (we give it in Chapter 7):

(1:2-22)    if a, b are real numbers with $a < b$ and f(x) is a continuous function for all values of x such that $a \leq x \leq b$, then there exists a number c with $a \leq c \leq b$ for which f(c) is an absolute maximum.

If one follows a proof of this theorem, no way is seen of extracting from it a systematic procedure for calculating, say, the decimal expansion of c. In fact, it is known that there are continuous functions such that one can systematically calculate (to any degree of accuracy) the decimal expansion of f(x), given a procedure for calculating the decimal expansion of x, but such that there is no systematic procedure for calculating any value of c at which f attains its absolute maximum value.*

The reader may wonder what value, or even meaning, such a statement as (1:2-22) has if we cannot be sure that in any given application we shall be able to compute the solution. In this he would be supported by a small, but hardy, group of logicians and mathematicians (known as constructivists or intuitionists). The usual position of most mathematicians on this question might be summarized by the following points: (a) there is no foundation for the optimistic belief that humans can solve all problems which they set themselves, but it is still meaningful to pose such problems; (b) a statement of existence gives us a minimum guarantee of information, from which we can try to proceed for more information in more specific cases [for example, in the case (1:2-22), one proceeds to investigate added conditions on the function, such as differentiability, which would permit computability of the location of the maximum]; (c) in this way we may be led to proofs of the correctness of some computational procedures via a long chain of “pure” existential results, yet there may be no way to eliminate such noncomputational statements from our arguments.

* The precise statement and proof of this is based on the advanced theory of recursive functions developed in recent research papers.

10

THE LOGICAL BACKGROUND

[CHAP.

1

To summarize, we see that there are four situations which can accompany a proof of existence. First we may extract from the proof a simple formula for the solution, such as with the solution of the quadratic equation. Second, the proof may lead us to a finite systematic computation procedure (often called an algorithm), such as in case (1:2-20). Third, the proof may lead us to an infinite systematic computation procedure, as in case (1:2-21); this is especially true in connection with various problems whose solutions are real or complex numbers. Finally, the proof may lead us to no computation procedure at all, and must rest as a pure statement of existence; this is the case in (1:2-22). The student will find these different kinds of situations intertwined throughout this book and his further work in mathematics; he should always keep a sharp eye out for the distinctions, if they are not explicitly mentioned. If he becomes worried about the small amount of attention that is paid to computation in his further courses of mathematics, he should remember that this is much more often out of necessity than out of perversity.

One more point to be made about statements of existence, which is away from this main issue, is that “there exists” is to be interpreted as “there is at least one.” Thus in (1:2-21) one can see that there must be another root of x⁵ − 7x² + 2 in the interval from 1 to 2 (compute the value of the polynomial at 2). Similarly, in (1:2-22) there may be many values of c in the given interval at which the function attains its maximum. In order to say that there is just one object satisfying a given condition, we usually use the words “there is exactly one” or “there is a unique.” For example, the following is true:

(1:2-23)

there is a unique real number x such that x⁵ − 7x² + 2 = 0 and 0 < x < 1.
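The third of the four situations, an infinite systematic computation procedure, can be illustrated on the root asserted in (1:2-23). What follows is only a sketch in Python, assuming the familiar bisection method; the names `p` and `bisect_root` and the tolerance `tol` are our own illustrative choices and do not come from the text.

```python
def p(x):
    """The polynomial x^5 - 7x^2 + 2 of (1:2-23)."""
    return x**5 - 7 * x**2 + 2


def bisect_root(f, lo, hi, tol=1e-12):
    """Narrow [lo, hi] around a root of f, assuming f(lo) and f(hi) differ in sign."""
    if f(lo) * f(hi) > 0:
        raise ValueError("f must change sign on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # Keep the half of the interval on which the sign change persists.
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


# p(0) = 2 and p(1) = -4, so a root lies between 0 and 1.
root = bisect_root(p, 0.0, 1.0)
```

Each pass through the loop halves the interval, so any prescribed accuracy is reached after finitely many steps, although no finite number of steps yields the root exactly.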

Logical connectives. We return now to a further analysis of the logical structure of mathematical statements. Certain other words, besides “all” and “some,” serve a purely logical function, i.e., can be used to build more complicated conditions or statements from simpler ones in such a way that the meaning and truth of the more complicated conditions are completely determined by the simpler ones. Examples of such words are “not,” “and,” “or.” However, the meaning of “or” in mathematics is not completely determined by normal usage; the mathematical convention is that it be used in the nonexclusive sense, i.e., the resulting statement is true if either of its parts is true, and also if both of them are true. Thus the statement (1:2-24)

for all integers x, either x ≤ 2 or 2 ≤ x,

is counted as true, but would not be so counted under the exclusive interpretation of “or.” It may be that we do not give enough credit to the nonexclusive sense of “or” in ordinary language. Thus if we grant the statement “We must have much bigger rocket engines or we will not be able to put a man on Mars,” we should keep in mind the possibility that we may have much bigger rocket engines and still be unable to put a man on Mars.

Other statement connectives which are quite commonly used in logic and mathematics, much more than in ordinary discourse, are provided by the words “if . . . then . . .” and “if and only if.” A condition which is formed using the first of these, such as

(1:2-25)

if x < −1 then 9 < x²

is called an implication. The first part of it, x < −1, is called the hypothesis or antecedent and the second part, 9 < x², is called the conclusion or consequent. The use of implication in mathematics differs in certain respects from ordinary usage. Very often we intend to convey, in everyday language, that there is some sort of cause and effect relationship between the hypothesis and conclusion of an implication. Examples of such are “If you eat that green apple (then) you will get sick,” and “If you do that again (then) I’ll spank you.” It seems very difficult to try to give this sort of relationship a precise meaning, especially in connection with mathematical statements. The simplest precise way to provide a uniform treatment of implication is to demand that the truth or falsity of an implication depend only on the truth or falsity of its components, and should not necessarily depend on the sense of the components. We see only one way to show that an implication is false, namely by showing that the hypothesis is true while the conclusion is false. In all the other (three) cases, under this understanding of implication, the implication should be counted as being true. Consider, for example, the following instances of (1:2-25):

(1:2-25')

(a) if −4 < −1 then 9 < (−4)²;

(b) if 4 < −1 then 9 < 4²;

(c) if 3 < −1 then 9 < 3²;

(d) if −3 < −1 then 9 < (−3)².
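The truth-functional reading of implication just described can be checked mechanically on the four instances (a) to (d). The following is a minimal sketch in Python; the helper names `implies` and `condition` are our own and are introduced only for illustration.

```python
def implies(hypothesis, conclusion):
    """Truth-functional implication: false only when the hypothesis is true and the conclusion false."""
    return (not hypothesis) or conclusion


def condition(x):
    """The condition (1:2-25): if x < -1 then 9 < x^2."""
    return implies(x < -1, 9 < x**2)


# The integers substituted for x in the instances (a)-(d) of (1:2-25').
instances = {"a": -4, "b": 4, "c": 3, "d": -3}
results = {label: condition(x) for label, x in instances.items()}
```

Under this reading only the instance with a true hypothesis and a false conclusion evaluates to false.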

Of these four, there is only one in which the conclusion is false and the hypothesis true, namely (d); hence in all other cases the implication is true. This may go slightly against the grain, especially in cases (b) and (c), but only if we try to think of the conclusion as “necessarily following” from the hypothesis. The hypothesis in both these cases is false, and one often says, in such cases, that the whole implication is vacuously true. In case the reader still has doubts he should compare these with such statements as “If you high jump seven feet then I’ll eat my hat.” [Which of (1:2-25')(b), (c) is this like?] Only (1:2-25')(d), of the four cases given, shows that (1:2-25) is not true for all integers x.

The reader may feel that it is an academic matter to discuss implications which have a false hypothesis. This is not the case, since a number of arguments in mathematics, most notably proofs by contradiction (or reductio ad absurdum), involve just such situations. In these proofs we show that a certain statement 𝒜 is not true as follows: we imagine that 𝒜 is true and we infer from this another statement, ℬ, which is known to be false. In other words, we prove

(1:2-26)

if 𝒜 then ℬ.

Assuming that the inference is correct, i.e., that (1:2-26) is true, it follows from the falsity of ℬ that 𝒜 cannot be true. Of course, if we knew in advance that 𝒜 is not true, we would not be interested in the implication (1:2-26), since it is vacuously true. We shall give no examples of such proofs by contradiction now; several of these, which we shall point out explicitly, will be found later in the text.

There is one sense in which the notion of implication is used in mathematics which is more closely related to the everyday sense of “necessary consequence.” That is when we say that one statement 𝒜 implies another statement ℬ, by which we mean that ℬ can be logically inferred from 𝒜 on the basis of the initial axioms. From the point of view of formal logic, this will happen only in the case when the statement “if 𝒜 then ℬ” can be inferred from the axioms, so that again there is no necessary connection between 𝒜 and ℬ. However, from the informal point of view, we usually concern ourselves only with implications involving statements whose contents are somehow related.

We now turn to the use of the words “if and only if.” A condition such as

(1:2-27)

|x − 2| . . . 9.

For all positive integers x, if x > 2 then x² > 9.

For all real numbers x and y, if x < y and y ≠ 0 then (x/y) < 1.

For all integers x, x² < 16 if and only if −4 < x and x < 4.

For all integers x, x³ < 27 if and only if x < 3.

For all integers x, (x³ − 1)/13 < 2 if and only if x < 2.

2. Which of the four possible combinations of truth and falsity in the hypothesis and conclusion can be realized by substituting particular integers for x in the following condition? (Give examples of each.) If x² > 9 then x < 0 or x > 2. Is this condition true for all integers x?

CHAPTER 2

THE SET-THEORETICAL BACKGROUND

2.1 Sets. Sets as abstractions from conditions. Consider the following two conditions:

(2:1-1)

x is a real number and \x — 2| 5 then x & S.

For most sets it is humanly impossible to explicitly list all elements of the set; these are the infinite sets. Those for which it is (at least in principle) possible to completely list the elements, given enough time and space, are called finite. This explanation of the word “finite” is, of course, very vague. Most people would regard the notions of finiteness and infinitude as being intuitively clear and acceptable as basic undefined concepts of our development. However, we shall see in Section 2.4 that it is possible to define these notions in terms of simpler undefined concepts in such a way as to accord with our intuitive understanding of them. For the finite sets we have another natural notation besides the basic notation of (2:1-6), namely, that obtained by writing down descriptions of each of the elements of the set, and enclosing the result in braces. Thus

(2:1-12)

{−5, π + 2, √2}

denotes the set whose only elements are −5, π + 2, √2. As another example, we have

(2:1-13)

{x: x is an integer and |x − 2| < 3} = {0, 1, 2, 3, 4}.
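The equation (2:1-13) can be checked by a direct search. The following is a minimal sketch in Python; since a program cannot run through all integers, it assumes a finite window from −100 to 100, which is an arbitrary choice of ours that is wide enough to contain every solution of |x − 2| < 3.

```python
# Candidates are drawn from a finite window of integers; the window -100..100
# is an arbitrary choice that is wide enough to contain every solution.
S = {x for x in range(-100, 101) if abs(x - 2) < 3}

# For a set given by listing, neither the order of the listing nor repeated
# entries affect which set is denoted.
same_set = {0, 1, 2, 3, 4} == {4, 3, 3, 2, 1, 0, 0}
```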

With this notation for a finite set it is immaterial in what order the elements are listed or even whether they are listed more than once. Thus {π + 2, −5, √2}, {−5, √2, π + 2}, {√2, −5, −5, π . . . 2 • 0.

It is not difficult to see (by factoring 2x² − 3xy + y² and considering the different possibilities for the factors) that for any x, y, (2:3-1) is equivalent to the following condition ℬ(x, y):

(2:3-2)

x, y are positive integers and y ≤ x or 2x ≤ y.
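The equivalence obtained by factoring can be spot-checked numerically. The sketch below, in Python, rests on two assumptions that should be kept in mind: that (2:3-1) is the inequality 2x² − 3xy + y² ≥ 0 for positive integers x, y (its statement is not legible here, only the quadratic is named), and that the inequalities in (2:3-2) are to be read as non-strict, which solutions such as x = 1, y = 1 require. The names `condition_a`, `condition_b`, and the bound `N` are our own.

```python
def condition_a(x, y):
    # Assumed form of (2:3-1): 2x^2 - 3xy + y^2 >= 0, i.e. (2x - y)(x - y) >= 0.
    return 2 * x**2 - 3 * x * y + y**2 >= 0


def condition_b(x, y):
    # Condition (2:3-2), read with non-strict inequalities: y <= x or 2x <= y.
    return y <= x or 2 * x <= y


# Compare the two conditions on every pair of positive integers up to N.
N = 60
mismatches = [
    (x, y)
    for x in range(1, N + 1)
    for y in range(1, N + 1)
    if condition_a(x, y) != condition_b(x, y)
]
```

An empty list of mismatches is of course no proof of equivalence for all positive integers; it only confirms the factoring argument on the pairs examined.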

This second condition makes it easier to see which numbers x, y are “solutions” of (2:3-1), in the sense that they make 𝒜(x, y) true. For example, for x = 1 we have solutions y = 1 and y = 2, 3, . . . ; for x = 2 we have solutions y = 1, 2 and y = 4, 5, . . . ; for x = 5 we have solutions y = 1, . . . , 4, 5 and y = 10, 11, 12, . . . ; etc. We cannot possibly list all solutions x, y, since there are infinitely many of these; but we can imagine a kind of infinite list which one could look into, to see whether or not a given pair a, b is a solution. Let us, for the moment, use the notion of a list in this extended sense. Schematically, such a list could be indicated as in the table:

x:   1   1   1   . . .   2   2   2   . . .   5   5   5    . . .
y:   1   2   3   . . .   1   2   4   . . .   4   5   10   . . .

Now since (2:3-1) and (2:3-2) are equivalent for all values of x, y, the list of values associated with condition (2:3-1) is exactly the same as that associated with condition (2:3-2). In other words, such a list serves the

2.3]

RELATIONS AND FUNCTIONS

37

same purpose with respect to conditions involving two free variables as does a set for conditions with one free variable. Unfortunately, the notion of a list carries with it some connotations (such as “can be written down on paper” and “given in a certain order”) which should be avoided. Thus it is necessary to carry our abstraction one step further. Suppose we were presented with the condition (2:3-2) in the following form: (2:3-3)

( _ ), (. . .) are positive integers and (. . .) ≤ ( _ ) or 2( _ ) ≤ (. . .).

Now what does it mean that a given pair of integers satisfies this condition? If we speak of the pair 1, 2 it evidently doesn’t matter whether we place 1 for ( _ ) and 2 for (. . .), or if we do just the opposite. On the other hand, if we speak of the pair 2, 3 we get different results according as we place 2 for ( _ ) and 3 for (. . .) or conversely, for in the first case the condition is not satisfied, while in the second it is. Hence the order in which a given pair of integers a, b is presented and the manner in which these are to be associated with the free variables (“empty places”) of a condition must be specified. This leads to the concept of an ordered pair of objects a, b; we shall denote such by

(2:3-4)

(a, b).

Ordered pairs and cartesian products. The ordered pair (a, b) stands in contrast with the unordered pair {a, b} which we have already discussed. For though we have {a, b} = {b, a}, it is essential to the concept of ordered pair that we have (a, b) ≠ (b, a), unless a = b. More generally, we have

(2:3-5)

(a, b) = (c, d) if and only if a = c and b = d.

We trust that it is no more difficult for the student to grant the existence of objects (a, b) with this property than it is to grant the existence of sets; in other words, we take the idea of ordered pair here as being a primitive undefined notion. However, it is possible by a slightly sophisticated trick to define it in terms of more basic notions (compare the first exercise at the end of this section). Having ordered pairs, the next step is easy. Instead of talking about (possibly infinite) lists, we simply talk about sets of ordered pairs. For example, associated with the condition (2:3-2) is (2:3-6)

the set of all ordered pairs (x, y) such that x, y are positive integers and y ≤ x or 2x ≤ y.

Among members of this set we find the pairs (1, 1), (1, 2), (1, 3), . . . , (2, 1), (2, 2), (2, 4), . . . , (5, 4), (5, 5), (5, 10), . . . ; among nonmembers we find (2, 3), (3, 4), (3, 5), . . . , also (−1, 2), (π, 1), etc. More generally, given any condition 𝒜(x, y) it seems that we can associate with it

(2:3-7)

the set of all ordered pairs (x, y) such that a(x, y).

In Section 2.1 we made certain reservations about the unrestricted formation of sets; presumably, similar considerations should apply here. It is not clear whether it makes sense to form such “large” sets of ordered pairs as {(X, Y): X, Y are sets and X ⊆ Y} or {(X, Y): X, Y are sets and X ∈ Y}. To avoid the possibility of paradoxes and yet provide sufficient freedom in forming such sets as desired in (2:3-7), the following statement is provided in axiomatic set theory:

(2:3-8)

for any sets A, B there exists a set C such that for all z, z ∈ C if and only if for some x, y we have z = (x, y) and x ∈ A and y ∈ B.

In other words, C has as members those, and only those, ordered pairs (x, y) for which x ∈ A and y ∈ B. This set C is denoted by

(2:3-9)

A × B

and is called the cartesian product of A and B (after the philosopher-mathematician Descartes). It is seen that

(2:3-10)

if z ∈ A × B then there are unique x, y such that x ∈ A, y ∈ B, and z = (x, y).

We call x the first term of z and y the second term of z. We can now apply the principle (2:2-39) of the preceding section to see that

(2:3-11)

for any sets A, B and condition 𝒜(x, y) there exists a set W such that for all z, z ∈ W if and only if z ∈ A × B and, for the unique x, y such that z = (x, y), we have 𝒜(x, y).

This is the set W = {z: for some x ∈ A, y ∈ B, 𝒜(x, y) and z = (x, y)} or, as we shall write more economically,

(2:3-12)

W = {(x, y): x ∈ A, y ∈ B and 𝒜(x, y)}.

In particular, the set of (2:3-6) is denoted by

(2:3-13)

{(x, y): x ∈ P, y ∈ P and y ≤ x or 2x ≤ y}.

If A, B are finite sets then A × B is also finite, for we can list completely all possible combinations (x, y) of elements x of A with elements y of B. (In case A = ∅ or B = ∅ then A × B = ∅.) For example, if A = {−2, 0, 5}, B = {3, 5} then

A × B = {(−2, 3), (−2, 5), (0, 3), (0, 5), (5, 3), (5, 5)};

in this case A × B has six (distinct) elements. In general, if A is a finite set with n distinct elements a₁, . . . , aₙ and B is a finite set with m distinct elements b₁, . . . , bₘ then A × B has the n · m distinct elements (a₁, b₁), . . . , (a₁, bₘ), (a₂, b₁), . . . , (a₂, bₘ), . . . , (aₙ, b₁), . . . , (aₙ, bₘ). A simple geometrical interpretation is given in Fig. 2.16. The points indicated by the dots correspond to the elements of the sets A, B, while those indicated by the crosses correspond to the elements of A × B. For example, the second point directly above a₃ corresponds to the element (a₃, b₂). The same sort of geometrical interpretation can be visualized for infinite sets, for example for P × P. Every subset of P × P then corresponds to a certain subset of the set of intersections of the vertical and horizontal lines. This is indicated in Fig. 2.17 for the set {(x, y): x ∈ P, y ∈ P and 2x < y + 1 or y < x}.

Figure 2.16

Figure 2.17

Domain, range, and converse. In general, if A, B are sets and W ⊆ A × B then W is said to be a relation between elements of A and elements of B.
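For finite sets the cartesian product, and a relation regarded as a subset of it, can be written out directly. The following is a minimal sketch in Python using the sets A = {−2, 0, 5} and B = {3, 5} of the example above; the particular relation `W` (the pairs with x < y) is a hypothetical choice of ours, made only for illustration.

```python
from itertools import product

A = {-2, 0, 5}
B = {3, 5}

# A x B: every ordered pair (x, y) with x in A and y in B.
AxB = set(product(A, B))

# A relation between elements of A and elements of B is any subset of A x B;
# here, as a hypothetical example, the pairs with x < y.
W = {(x, y) for (x, y) in AxB if x < y}

# If either factor is empty, so is the product.
empty_product = set(product(set(), B))
```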

If (a, b) ∈ W we say that the relation holds between a and b; in some cases this is also written aWb or Wab. For example, if W = {(x, y): x ∈ P, y ∈ P, and x < y}, then we usually refer to W as the less-than relation; we have here the choice of writing (a, b) ∈ W or a < b, and it would not be out of the way to use the symbol < to denote the relation itself and write (a, b) ∈ <. . . . and x lies on y} with domain Pt and range L.

More often the domain and range are not explicitly given in some definition of a relation but must be deduced from it. For example, the relation W = {(x, y): x ∈ I, y ∈ I, and x² + 4y² ≤ 16} is seen to have the domain {−4, −3, −2, −1, 0, 1, 2, 3, 4} and range {−2, −1, 0, 1, 2}. Of course, a relation does not in general hold between all elements of its domain and of its range. There are many pairs in the preceding example which do not belong to W but which do belong to 𝒟(W) × ℛ(W). A way to geometrically visualize the domain and range of a relation in A × B is given in Fig. 2.18. The single-hatched area corresponds to A × B, the crosshatched area to W, and the heavy lines to the domain and range of W. The rectangular area bounded by the two pairs of dashed lines corresponds to 𝒟(W) × ℛ(W).

Figure 2.18
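The domain and range of the relation defined by x² + 4y² and the bound 16 can be deduced by brute force. The following Python sketch reads the inequality as non-strict (x² + 4y² ≤ 16), which the stated domain and range require; the helper names `domain` and `range_` and the search window are our own illustrative choices.

```python
def domain(W):
    """All first terms of pairs in W."""
    return {x for (x, _) in W}


def range_(W):
    """All second terms of pairs in W (named range_ to avoid shadowing the built-in)."""
    return {y for (_, y) in W}


# Integers outside -10..10 cannot satisfy x^2 + 4y^2 <= 16, so this window suffices.
window = range(-10, 11)
W = {(x, y) for x in window for y in window if x**2 + 4 * y**2 <= 16}

# Pairs that belong to domain x range but nevertheless not to W.
missing = {(x, y) for x in domain(W) for y in range_(W)} - W
```

The set `missing` is not empty; the pair (4, 2), for instance, lies in it, which illustrates that a relation need not hold between all elements of its domain and of its range.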

Since relations are just special kinds of sets, it follows that the condition (2:1-23) for the identity of two sets can be applied equally well to relations. However, every element of a relation is an ordered pair, so we can replace the condition in this case by the following more special one:

(2:3-19)

if U, W are relations then U = W if and only if, for all x, y, (x, y) ∈ U if and only if (x, y) ∈ W.

In particular, if we consider relations defined by certain conditions, we have

(2:3-20)

{(x, y): x ∈ A, y ∈ B, and 𝒜(x, y)} = {(x, y): x ∈ A, y ∈ B, and ℬ(x, y)} if and only if for all x ∈ A and y ∈ B, 𝒜(x, y) is equivalent to ℬ(x, y).

In this respect relations play a role for conditions with two free variables which is completely analogous to the role played by sets for conditions with one free variable; they serve to identify equivalent conditions. We must, however, be cautious about one point in the analogy. Whereas there is at most one set associated with each condition 𝒜(x), there are in general two relations associated with conditions 𝒜(x, y). To see this, return to the form (2:3-3) in which we expressed a certain condition using symbols _ and . . . instead of variables x and y. In this form there is no reason to prefer one symbol to vary over the domain of the relation and the other to vary over the range. Associated with the given condition are two relations W and W̆, one consisting of all pairs ( _ , . . .) satisfying the condition, while the other consists of all pairs (. . . , _ ) satisfying the condition. We can say that the relations W, W̆ are connected in the following way:

(2:3-21)

for all x, y, (x, y) ∈ W̆ if and only if (y, x) ∈ W.

In such a case W̆ is said to be the converse of W; hence also W is the converse of W̆. Consider, for example, the relation W = {(1, 1), (2, 1), (3, 2)} with 𝒟(W) = {1, 2, 3}, ℛ(W) = {1, 2}. The converse of W is W̆ = {(1, 1), (1, 2), (2, 3)} with 𝒟(W̆) = {1, 2}, ℛ(W̆) = {1, 2, 3}. Geometrically, the two relations are compared in the following figure.

Figure 2.19
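Passing from a finite relation to its converse amounts to swapping the two terms of every pair. A minimal sketch in Python, applied to the relation {(1, 1), (2, 1), (3, 2)} of the example; the function name `converse` and the variable `W_conv` are our own.

```python
def converse(W):
    """Swap the terms of every ordered pair in W."""
    return {(y, x) for (x, y) in W}


W = {(1, 1), (2, 1), (3, 2)}
W_conv = converse(W)
```

Applying `converse` twice returns the original relation, in agreement with the remark that each of the two relations is the converse of the other.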

Although a relation W and its converse W̆ are in general distinct, it is possible to deduce from any property of W a corresponding property of W̆ by means of the equivalence (2:3-21). Hence in a discussion of the set of solutions of a condition it makes little difference which of the two associated relations one considers. The important thing is to make clear in advance which is being studied.

Ternary (etc.) relations. The step to the treatment of conditions with more than two free variables is now clear. For conditions with three free variables we should use ordered triples

(2:3-22)

(a, b, c);

these should have the basic property (2:3-23)

(a, b, c) = (e,f, g) if and only if a = e, b = f, and c = g.

It turns out in this case that we do not need to take this as a new primitive notion so long as all we demand of this notion is that it fulfill (2:3-23). If we define (2:3-24)

(a, b, c) = ((a, b), c),

then we can deduce (2:3-23) from the basic property (2:3-5) of ordered pairs. This also leads us to define (2:3-25)

A X B X C = (A X B) X C,

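so that A × B × C is the set of all triples (x, y, z), in the sense of (2:3-24), such that x ∈ A, y ∈ B, and z ∈ C. For example, for A = {−2, 0, 5}, B = {3, 5} and C = {0, 3},

A × B × C = {(−2, 3, 0), (−2, 3, 3), (−2, 5, 0), (−2, 5, 3), (0, 3, 0), (0, 3, 3), (0, 5, 0), (0, 5, 3), (5, 3, 0), (5, 3, 3), (5, 5, 0), (5, 5, 3)}.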

This is not the same as the set A × (B × C). For example, (−2, 3, 0) = ((−2, 3), 0) by definition, which is distinct from (−2, (3, 0)). However, there is a clear one-to-one correspondence between the elements of the two sets. It is now easily seen how one would define the notion of ordered quadruple (a, b, c, d) and the product A × B × C × D, and so on, for larger numbers of factors, and in this way see how to treat conditions with arbitrarily many free variables. We thus have for any specified positive integer n a notion of ordered n-tuple, which agrees with that of ordered pair for n = 2 and of ordered triple for n = 3.

We have defined a relation as being a subset of A × B for some A, B or, equivalently (2:3-17), as being any set of ordered pairs. (Then ∅ is a relation, since ∅ ⊆ A × B for any A, B.) Under our definition (2:3-24), every ordered triple (a, b, c) is at the same time an ordered pair, although, of course, the converse is not true. Thus every set W of ordered triples is a set of ordered pairs ((a, b), c) and hence is a relation. It is, however, a relation of a more special kind, which we call a ternary relation. (More generally, using the notion of ordered n-tuple, we could single out for any specified positive integer n, the n-ary relations.) We could, if we wished, refer to an arbitrary relation as being a binary relation, but this only serves to re-emphasize the fact that it is a set of ordered pairs.

A nonmathematical example of a ternary relation is provided by the set W = {(x, y, z): x, y, z are people and z is a son of x and y}. It is seen that there are many a, b for which there is no c with (a, b, c) ∈ W; for example a, b may not be married or may be married but have no son. On the other hand, every human male c is the son of some a, b, so that ℛ(W) = the set of human males. W also has the property that if (a, b, c) ∈ W then (b, a, c) ∈ W; it does not have the property that if (a, b, c) ∈ W and (a, b, c') ∈ W then c = c'.
A mathematical example of a ternary relation is provided by the set W' = {(x, y, z): x and y are odd prime numbers and z = x + y}. Let O be the set of odd prime numbers 3, 5, 7, 11, . . . , and let E₆ be the set of even numbers z ≥ 6. Then 𝒟(W') = O × O and ℛ(W') ⊆ E₆; it is a famous open question (Goldbach’s problem) whether ℛ(W') = E₆. W' has the property that if (a, b, c) ∈ W' then (b, a, c) ∈ W'; it also has the property that if (a, b, c) ∈ W' and (a, b, c') ∈ W' then c = c'.

Operations on relations; composition. In mathematics we are often concerned with various binary relations in a set S (between elements of S), i.e., with subsets of S × S. There is an algebra attached to such relations analogous to the algebra of subsets of an arbitrary set. It makes sense to ask of any two relations W, W' in S whether W ⊆ W'. For example,

{(x, y): x, y ∈ I and x < y} ⊆ {(x, y): x, y ∈ I and x ≤ y},

but the ⊆ does not hold in the reverse direction. Under ⊆, ∅ is the smallest relation in S and S × S is the largest. If W, W' are relations in S then W ∩ W', W ∪ W', and W̄ (the complement of W relative to S × S, that is, (S × S) − W) are again relations in S. For example,

{(x, y): x, y ∈ I and x < y} ∪ {(x, y): x, y ∈ I and x = y} = {(x, y): x, y ∈ I and x ≤ y}

and (in I × I) the complement of {(x, y): x, y ∈ I and x < y} is {(x, y): x, y ∈ I and y ≤ x}.

Now the operations ∩, ∪, ¯ correspond to the use of the words “and,” “or,” and “not” as applied to defining conditions of sets (2:2-2). The words for every and for some have also been used in connection with operations on sets, namely ⋂ and ⋃ (2:2-12), but in this case the variables were sets, not elements. An operation on relations which uses the words “for some” attached to elements is that of forming the domain, (2:3-15). However, 𝒟(W) is not, in general, a relation, even if W is. It has been found that the following is an appropriate and useful operation on relations, to lead again to relations.

(2:3-26)

W; W' = {(x, y): for some z, (x, z) ∈ W and (z, y) ∈ W'}.
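For finite relations the operation (2:3-26) can be computed by searching for the intermediate element z. The following is a minimal sketch in Python; the function name `compose` is our own, and the integers are replaced by a finite stand-in `S`, so the check below concerns only that finite range.

```python
def compose(W, V):
    """W; V = {(x, y): for some z, (x, z) in W and (z, y) in V}."""
    return {(x, y) for (x, z1) in W for (z2, y) in V if z1 == z2}


# The less-than relation, restricted to a finite stand-in S for the integers.
S = range(-5, 6)
less = {(x, y) for x in S for y in S if x < y}

# Composing less-than with itself: some z must lie strictly between x and y.
less_less = compose(less, less)
expected = {(x, y) for x in S for y in S if x + 1 < y}
```

On this range the composite consists of exactly those pairs with x + 1 < y, since an integer z with x < z < y exists precisely in that case.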

W; W' is called the composition of W and W'. (Some writers use the symbol W ∘ W' for this.) For example, if W = {(x, y): x is a son of y} and W' = {(x, y): x is a child of y} then W; W' = {(x, y): x is a grandson of y}. If W = {(x, y): x, y ∈ I and x < y} then W; W = {(x, y): x, y ∈ I and x + 1 < y}. A similar operation on relations using the words “for all” can also be defined, but we would find no use for it here.

The following are some interesting mathematical relations in the set I of integers: the identity relation {(x, y): x, y ∈ I and x = y}; the less-than relation {(x, y): x, y ∈ I and x < y}; the less-than-or-equal-to relation {(x, y): x, y ∈ I and x ≤ y}; the divisibility relation {(x, y): x, y ∈ I and x is a divisor of y}. We also have a relation between subsets of I, the inclusion relation {(X, Y): X ⊆ I, Y ⊆ I and X ⊆ Y}.

Special kinds of relations. These have various characteristic properties, which we now describe for an arbitrary relation W in a set S.

(2:3-27)

W is said to be reflexive (in S) if for all a ∈ S, (a, a) ∈ W;

W is said to be irreflexive (in S) if for all a ∈ S, (a, a) ∉ W;

W is said to be symmetric if whenever (a, b) ∈ W then (b, a) ∈ W;

W is said to be antisymmetric if whenever (a, b) ∈ W and (b, a) ∈ W then a = b;

W is said to be transitive if whenever (a, b) ∈ W and (b, c) ∈ W then (a, c) ∈ W.
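Each of the properties in (2:3-27) can be tested directly when the relation is finite. The sketch below, in Python, is one possible transcription of the definitions; the function names are our own, and the integers are again replaced by a small finite stand-in `S`.

```python
def is_reflexive(W, S):
    return all((a, a) in W for a in S)


def is_irreflexive(W, S):
    return all((a, a) not in W for a in S)


def is_symmetric(W):
    return all((b, a) in W for (a, b) in W)


def is_antisymmetric(W):
    # Whenever both (a, b) and (b, a) are present, a and b must coincide.
    return all(a == b for (a, b) in W if (b, a) in W)


def is_transitive(W):
    # Whenever (a, b) and (b, d) are present, (a, d) must be present.
    return all((a, d) in W for (a, b) in W for (c, d) in W if b == c)


S = range(-3, 4)  # finite stand-in for the integers
identity = {(x, x) for x in S}
less_equal = {(x, y) for x in S for y in S if x <= y}
less = {(x, y) for x in S for y in S if x < y}
```

On this stand-in the identity relation passes the tests for reflexivity, symmetry, antisymmetry, and transitivity, while the less-than relation is irreflexive and transitive but not symmetric.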

If no set S is specified, we assume S = 𝒟(W) ∪ ℛ(W). For the relations in the integers described above we have that the identity relation is reflexive, symmetric, and transitive (it is also antisymmetric); the less-than-or-equal-to relation is reflexive, antisymmetric, and transitive (but not symmetric). The student should also classify the other relations with respect to these properties.

Equivalence relations and partitions. We now define:

(2:3-28)

W is said to be an equivalence relation (in S) if W is reflexive (in S), symmetric, and transitive.

Equivalence relations are very much like the identity relation. Consider the following two relations: W = {(x, y): x, y ∈ I and x − y is a multiple of 3}, W' = {(x, y): x, y ∈ Re and x − y ∈ Ra}. The first of these is an equivalence relation in the integers, the second in the real numbers. If we write a ≡ b instead of (a, b) ∈ W, we have

. . . ≡ −6 ≡ −3 ≡ 0 ≡ 3 ≡ 6 ≡ 9 ≡ . . .

. . . ≡ −5 ≡ −2 ≡ 1 ≡ 4 ≡ 7 ≡ 10 ≡ . . .

. . . ≡ −4 ≡ −1 ≡ 2 ≡ 5 ≡ 8 ≡ 11 ≡ . . . ,

where, by transitivity, the relation ≡ holds between any two elements in the same row, while, on the other hand, it never holds between two elements from different rows. Thus if we put X₀ = {. . . , −6, −3, 0, 3, 6, 9, . . .}, X₁ = {. . . , −5, −2, 1, 4, 7, 10, . . .}, X₂ = {. . . , −4, −1, 2, 5, 8, 11, . . .}, we see that X₀, X₁, X₂ are pairwise disjoint sets such that every integer is in one of the sets, hence X₀ ∪ X₁ ∪ X₂ = I, and such that for any a, b, a ≡ b if and only if a, b belong to the same set Xᵢ. Moreover, if a is any element of I, the set Wₐ = {x: x ∈ I and x ≡ a} is one of the sets X₀, X₁, X₂; hence for any a, b either Wₐ = W_b or Wₐ ∩ W_b = ∅. Clearly, a ≡ b if and only if Wₐ = W_b. Much the same situation holds with the relation W', except in this case we cannot conveniently list all the associated sets. However, we can describe the process in general terms as follows:

(2:3-29)

Suppose that W is an equivalence relation (in S). Let Wₐ = {x: x ∈ S and (x, a) ∈ W} for each a ∈ S. Let M be the collection of all sets Wₐ for a ∈ S. Then if X, Y ∈ M either X = Y or X ∩ Y = ∅; further, ⋃X[X ∈ M] = S. Finally, (a, b) ∈ W holds if and only if there is an X ∈ M such that a, b ∈ X, and also if and only if Wₐ = W_b.
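The passage from an equivalence relation to its collection M of sets can be carried out explicitly on a finite example. The following Python sketch uses the relation “x − y is a multiple of 3” on a finite stand-in for the integers; the function name `equivalence_sets` and the range chosen for `S` are our own.

```python
def equivalence_sets(W, S):
    """Collect the sets W_a = {x in S : (x, a) in W} for a in S, without repeats."""
    return {frozenset(x for x in S if (x, a) in W) for a in S}


S = range(-6, 12)  # finite stand-in for the integers
W = {(x, y) for x in S for y in S if (x - y) % 3 == 0}

# M is the collection of all sets W_a; equal sets are counted once.
M = equivalence_sets(W, S)
```

Here `M` has three members, the portions of X₀, X₁, X₂ that fall inside the chosen range; they are pairwise disjoint and their union is all of `S`.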


The sets in M are called the equivalence sets associated with W. One often writes [a] instead of Wₐ when working with some fixed relation W. (2:3-29) shows that the equivalence relation W in S corresponds directly to the identity relation in M.

(2:3-30)

A collection M of subsets of S is called a partition of S if, first, for any X ∈ M we have X ≠ ∅ and, second, for any X, Y ∈ M either X = Y or X ∩ Y = ∅ and, finally, ⋃X[X ∈ M] = S.

Then we have:

(2:3-31)

Suppose that M is a partition of S. Define a relation W in S by: (a, b) ∈ W if and only if there is an X ∈ M such that a, b ∈ X. Then W is an equivalence relation in S whose associated equivalence sets are just the members of M.

We leave it to the student to verify this. (2:3-29) and (2:3-31) show that we have a direct correspondence between equivalence relations and partitions.

The identity relation has other interesting mathematical properties. For example, if a, b, c are integers and a = b then a + c = b + c and a · c = b · c. To what extent are these properties shared by other equivalence relations? For example, if ≡ is the relation defined above, so that a ≡ b if and only if there is a u such that u ∈ I, with a − b = 3 · u, we see that a ≡ b implies a + c ≡ b + c [compute (a + c) − (b + c)] and a · c ≡ b · c [compute (a · c) − (b · c)]. For the relation W' between reals, which we write now ≡', so that a ≡' b if and only if a − b ∈ Ra, we have again a ≡' b implies a + c ≡' b + c, but we cannot in general infer that a · c ≡' b · c (for 1 ≡' 0 but 1 · √2 ≢' 0 · √2). Equivalence relations which do have such additional algebraic properties will prove to be very useful in our development.

Functions. We turn now to the study of another very important class of relations, the functions. The definition of these given in set theory is intended to make certain intuitive notions precise. One of these intuitive concepts has a physical source, namely that one physical quantity is strictly determined once other related quantities are fixed. For example, the distance y which a dropped body falls during a given period of time x depends (in the simplest physical analysis) only on that period of time, and does not depend, say, on the mass of the body. It was natural for physicists to hope that such relationships could be characterized exactly by mathematical “laws.” This related to the mathematical notion of a calculation procedure which associates with every value of some quantity x another strictly determined quantity y. Thus for the above physical situation, experiment suggests that y = 16x², when x represents seconds and y feet. We can easily conceive, though, of such situations of regularity in nature for which we can find no “law” which will accurately reflect the relationship between the quantities. This general notion of a determinate or functional relationship which is not necessarily tied to any particular way of expressing it can be explained precisely in the language of relations as follows:

(2:3-32)

a relation F is said to be a function if for each x G 3D(F) there is a unique y with (x, y) e F; this unique y is denoted by F(x).

Since whenever F is a relation and x G 3D(F) there is at least one y with Cx, y) G F, the property of being a function can be re-expressed: (2:3-33)

a relation F is a function if and only if for any x, y\, and y2, (x, yi) G F and (x, y2) G F implies tji = y2.
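For a finite relation, condition (2:3-33) can be checked mechanically. The following is only a sketch under the assumption that a relation is represented as a Python set of pairs; the names `is_function`, `F`, and `U` are illustrative and do not come from the text.

```python
def is_function(relation):
    """Check (2:3-33): no x is paired with two different y's."""
    seen = {}
    for x, y in relation:
        if x in seen and seen[x] != y:
            return False  # (x, y1) and (x, y2) both present with y1 != y2
        seen[x] = y
    return True

# A finite piece of the relation y = x^3 - 1 on the integers -3..3.
F = {(x, x**3 - 1) for x in range(-3, 4)}

# The integer points of x^2 + y^2 = 1: contains both (0, 1) and (0, -1).
U = {(x, y) for x in range(-2, 3) for y in range(-2, 3) if x * x + y * y == 1}

print(is_function(F), is_function(U))
```

The second relation fails the test because the single value x = 0 is paired with two distinct values of y.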

The notion of a function as provided by a law is contained in the above notion by means of the construction of relations from conditions 𝒜(x, y). Thus, for example, the function associated with the relationship y = x³ − 1 between real numbers is defined by

(2:3-34)(a) F = {(x, y): x, y ∈ Re and y = x³ − 1},

or, equivalently, by

(2:3-34)(b) F = {(x, x³ − 1): x ∈ Re}

(since whenever x ∈ Re, also x³ − 1 ∈ Re). In greater accordance with usual practice, we could also write

(2:3-34)(c) F is the function with domain Re such that for each x ∈ Re, F(x) = x³ − 1.

It should be noted that this particular function F is distinct from the function G = {(x, y): x, y ∈ I and y = x³ − 1}. As relations, we have G ⊆ F, since whenever (x, y) ∈ G, also (x, y) ∈ F; but G ≠ F, since 𝒟(G) = I and 𝒟(F) = Re. Thus the concept (2:3-32) of function is sharper than the vague concept of law. This is as it should be, for F and G have many different properties. For example, for each y ∈ Re there is an x ∈ 𝒟(F) with F(x) = y; in other words ℛ(F) = Re. But it is not true that for each y ∈ I there is an x ∈ 𝒟(G) with G(x) = y (5 is not x³ − 1 for any x ∈ I).


Even if one is primarily interested in functions defined by particular conditions, the above general notion of function is of great usefulness. For example, we may obtain some result about all continuous functions; then whenever we recognize that a particular condition defines a function of this type we know immediately that the result applies. In this respect, arbitrary functions stand to particular laws as variables stand to constants.

The graphical interpretation of relations is very naturally applied to functions. Usually, we have to deal with a function whose domain and range are contained in some preassigned set, for example the set Re of real numbers. Then the function is a subset of Re × Re. We picture certain reference sets in Re × Re, the “x-axis” consisting of all pairs (x, 0) for x ∈ Re and the “y-axis” consisting of all pairs (0, y) for y ∈ Re; these intersect in (0, 0). (Thus a point known to be on the x-axis can be uniquely labeled by the number x.) The graph of a function in Re × Re might then look as follows.

The condition for a relation F to be a function is simply interpreted by the statement that no point in the graph of F lies directly above another point of the graph, for otherwise we would have (x, y₁) ∈ F, (x, y₂) ∈ F with y₁ ≠ y₂. On the other hand, to any given y there may correspond no value or many values of x such that (x, y) ∈ F.

A notion which appears frequently in mathematics courses is that of an implicit function. For example, the condition x³ − y = 1 implicitly determines y as a function of x; explicitly, y = x³ − 1. It is also said that x² + y² = 1 implicitly determines y as a function of x; formally, y = √(1 − x²). Here, however, the situation is more subtle. We must first determine which kinds of numbers are to be considered. If these are to be real numbers, we cannot allow x > 1 or x < −1, i.e., we must have −1 ≤ x ≤ 1. Second, if we settle, for the sake of definiteness, that √a signifies the (unique) positive square root of a when a > 0, we see that there are at least two functions determined by the condition: y = √(1 − x²) for −1 ≤ x ≤ 1, or y = −√(1 − x²). Actually, there are many such functions; for example, another is y = (−1)ⁿ √(1 − x²) when −1 + (n/2) ≤ x < −1 + (n + 1)/2, n = 0, 1, 2, 3. The graphs of these functions are shown in Figure 2.21.

[Figure 2.21]

The union of the first two graphs is just the set of all (x, y) such that x² + y² = 1; however, that set is clearly not a function. A precise definition of what it means for F to be an implicit function associated with a given condition 𝒜(x, y), within a preassigned domain S, might run as follows: the domain of F is {x: x ∈ S and for some y ∈ S, 𝒜(x, y)} and for each x ∈ 𝒟(F), F(x) ∈ S and 𝒜(x, F(x)) holds. Given a condition 𝒜(x, y) and set S, for each x ∈ S let W_x = {(x, y): y ∈ S and 𝒜(x, y)}; then let D = {x: W_x ≠ ∅}. For any x₁, x₂, if W_{x₁} ≠ W_{x₂} then x₁ ≠ x₂, hence W_{x₁} ∩ W_{x₂} = ∅. Let M be the collection of all sets W_x for x ∈ D. By the axiom of choice there is a set F such that F ∩ W_x contains exactly one element for each x ∈ D. Hence for each x ∈ D there is a unique y ∈ S with (x, y) ∈ F; moreover, (x, y) ∈ W_x, so that 𝒜(x, y). Thus there always exists at least one implicit function associated with 𝒜(x, y) and S. The problem of implicit functions in calculus goes deeper: in which cases can we prove the existence of at least one implicit function satisfying additional conditions of continuity, differentiability, etc.? It is with respect to such functions that, say, rules of “implicit differentiation” are supposed to have significance.

Closely related to the implicit functions are the so-called multivalued functions. Authors who use this term often refer to the notion of function presented here as being that of a single-valued function. For example, it is known that with every complex number z ≠ 0 are associated exactly two complex numbers w with w² = z. The question of distinguishing between these two square roots of z is not as simple as in the case of real numbers, since we cannot speak of positive or negative complex numbers. The equation F(z) = √z does not define a function in our sense of the word. There are two approaches to this problem in the theory of complex numbers. One is to speak of the branches of the “function” √z, i.e., of certain single-valued functions which together provide both square roots for every number z. The second is to expand the notion of complex number by the use of Riemann surfaces. For the square-root function, in place of a single complex number z ≠ 0 there will now be two numbers z₁, z₂ on the associated surface, and one single-valued function F such that F(z₁) is one square root of z and F(z₂) is the other. In this book we shall always use the word “function” in its single-valued sense, i.e., according to (2:3-32) or (2:3-33), and we shall treat situations which lead to multivalued functions in terms of these.

A ternary relation, i.e., a set of ordered triples ((x, y), z), can also be a function. The condition for F to be such is given by (2:3-33): for any x, y, z₁, z₂, if ((x, y), z₁) ∈ F and ((x, y), z₂) ∈ F then z₁ = z₂. According to our definitions, the unique z associated with (x, y), if there is any, is denoted by F((x, y)); for simplicity, we shall instead denote it by F(x, y). If a function F is a ternary relation we shall call it a binary function (function of two arguments or variables). Although it is not necessary to qualify it in this way, we can refer to an arbitrary function (which is not otherwise specified to be binary) as being a unary function (function of one variable). Functions of more than two variables can be treated similarly, so that for any specified positive integer n, we can talk of functions of n variables, or, simply, of n-ary functions. The clearest way to express that a function is, say, binary is to describe its domain, for example 𝒟(F) = S × S.

In algebra it is customary to use the word operation instead of function, but these have exactly the same meaning. Thus when we speak of the operation of multiplication on real numbers we mean the function F, with domain Re × Re, such that for all a, b ∈ Re, F(a, b) = a · b. An operation is called unary, binary, etc., under the same conditions as a function. Thus multiplication is a binary operation.

(2:3-35) Suppose that G is an operation with domain S and that A ⊆ S; A is said to be closed under G if whenever x ∈ A also G(x) ∈ A. Suppose that F is an operation with domain S × S and that A ⊆ S; A is said to be closed under F if whenever x, y ∈ A also F(x, y) ∈ A.

Consider, for example, the operation of inversion of nonzero real numbers: 𝒟(G) = Re − {0}, G(x) = 1/x. The set Ra − {0} of nonzero rationals is closed under inversion, but the set I − {0} of nonzero integers is not. The operation of subtraction on real numbers is the function F with 𝒟(F) = Re × Re, F(x, y) = x − y. The set I of integers is closed under subtraction, but the set P of positive integers is not.
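The closure examples just given can be probed on finite samples. This is a sketch only: a finite sample can refute closure by exhibiting a counterexample, but it cannot prove closure for an infinite set. The helper names (`is_closed_unary`, `is_closed_binary`, `is_integer`, `invert`) are assumptions made for illustration.

```python
from fractions import Fraction

def is_closed_unary(sample, op, member):
    """True if op(x) satisfies `member` for every x in the finite sample."""
    return all(member(op(x)) for x in sample)

def is_closed_binary(sample, op, member):
    """True if op(x, y) satisfies `member` for all x, y in the finite sample."""
    return all(member(op(x, y)) for x in sample for y in sample)

def is_integer(q):
    return Fraction(q).denominator == 1

def invert(q):
    return 1 / Fraction(q)  # exact rational arithmetic

nonzero_integers = [n for n in range(-5, 6) if n != 0]
integers = list(range(-5, 6))
positive_integers = list(range(1, 6))

# Inversion leaves the nonzero integers (1/2 is not an integer).
print(is_closed_unary(nonzero_integers, invert, is_integer))
# Subtraction keeps integers among the integers ...
print(is_closed_binary(integers, lambda x, y: x - y, is_integer))
# ... but 1 - 2 is not a positive integer.
print(is_closed_binary(positive_integers, lambda x, y: x - y,
                       lambda n: is_integer(n) and n > 0))
```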

Congruence relations. The statements one finds in geometry connecting equality with operations, such as “if equals are added to equals, the results are equal,” are seen to hold trivially when translated into our language of relations and functions. In fact if F is any binary operation and a₁ = a₂ and b₁ = b₂ then F(a₁, b₁) = F(a₂, b₂) whenever (a₁, b₁) ∈ 𝒟(F); for ((a₁, b₁), F(a₁, b₁)) ∈ F, hence by (a₁, b₁) = (a₂, b₂), also ((a₂, b₂), F(a₁, b₁)) ∈ F, therefore F(a₁, b₁) is a z for which ((a₂, b₂), z) ∈ F; but there is only one such z, which we have called F(a₂, b₂). Similar statements hold for functions with other numbers of arguments. On the other hand, if ≡ is an equivalence relation in a set S closed under an operation F, we have seen that it need not be true that if a₁ ≡ a₂ and b₁ ≡ b₂ then F(a₁, b₁) ≡ F(a₂, b₂). The cases in which this is true are of special interest:

(2:3-36) Suppose that W is an equivalence relation in a set S; we put a ≡ b if (a, b) ∈ W.
(i) If G is a unary operation with 𝒟(G) = S, ℛ(G) ⊆ S, then W is said to be a congruence relation with respect to G if whenever a₁ ≡ a₂ we have G(a₁) ≡ G(a₂).
(ii) If F is a binary operation with 𝒟(F) = S × S, ℛ(F) ⊆ S, then W is said to be a congruence relation with respect to F if whenever a₁ ≡ a₂ and b₁ ≡ b₂ we have F(a₁, b₁) ≡ F(a₂, b₂).

We apply the words “congruence relation” to a W only if we already know that W is an equivalence relation. It is clear how the notion of congruence relation can be applied also to functions of more than two arguments; however, we shall have no occasion to use such. Sometimes it is also convenient to define the notion: W is a congruence relation with respect to the relation U. If U is, for example, binary, and ≡ is taken as above, this holds if whenever a₁ ≡ a₂ and b₁ ≡ b₂ and (a₁, b₁) ∈ U then (a₂, b₂) ∈ U.

The property of W being a congruence relation with respect to, say, a binary function F can be rephrased by saying that the equivalence set which contains F(a, b) is uniquely determined by the equivalence sets of a, b, respectively; more briefly, that F is well defined with respect to W. In other words, we are led to a function on equivalence sets:

(2:3-37) Suppose that W is a congruence relation in a set S with respect to a binary operation F on S. Let M be the collection of equivalence sets [a] = W_a, for a ∈ S. Then there is a binary operation F̄ with domain M × M such that for each a, b ∈ S, F̄([a], [b]) = [F(a, b)].

We need only see that the relation consisting of all triples (([a], [b]), [F(a, b)]) is a function; if ([a′], [b′]) = ([a], [b]) then a ≡ a′, b ≡ b′ and hence F(a, b) ≡ F(a′, b′), i.e., [F(a, b)] = [F(a′, b′)]. Consider, for example, the equivalence relation in the integers, W = {(x, y): x, y ∈ I and x − y is a multiple of 3}. We have three equivalence sets, [0], [1], [2]. It is easily seen that the operation F(a, b) = a + b is well defined with respect to this equivalence relation. For this we have by (2:3-37) an associated operation F̄(X, Y) on equivalence sets, which we denote by X ⊕ Y. Then it can be seen that [0] ⊕ [1] = [1], [1] ⊕ [2] = [3] = [0], [2] ⊕ [2] = [4] = [1], etc. More compactly:

     ⊕  | [0]  [1]  [2]
    ----+---------------
    [0] | [0]  [1]  [2]
    [1] | [1]  [2]  [0]
    [2] | [2]  [0]  [1]
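The table can be regenerated, and the well-definedness of addition spot-checked, with a few lines of code. This is a sketch under the assumption that each equivalence set [a] is represented by its remainder modulo 3; the names `cls`, `induced_add`, and `is_congruence` are hypothetical, and the congruence check only covers a finite sample of integers.

```python
def cls(a, m=3):
    """Canonical representative (0, 1, ..., m-1) of the equivalence set [a]."""
    return a % m

def induced_add(X, Y, m=3):
    """[X] (+) [Y] = [X + Y], computed on representatives."""
    return cls(X + Y, m)

def is_congruence(op, m, sample):
    """Check on a finite sample: a1 = a2 and b1 = b2 (mod m) imply
    op(a1, b1) = op(a2, b2) (mod m)."""
    return all(cls(op(a1, b1), m) == cls(op(a2, b2), m)
               for a1 in sample for a2 in sample if cls(a1, m) == cls(a2, m)
               for b1 in sample for b2 in sample if cls(b1, m) == cls(b2, m))

table = [[induced_add(X, Y) for Y in range(3)] for X in range(3)]
for X, row in enumerate(table):
    print(f"[{X}] |", "  ".join(f"[{v}]" for v in row))

print(is_congruence(lambda a, b: a + b, 3, range(-6, 7)))
```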

The same sort of construction of functions on equivalence sets as given in (2:3-37) can evidently be carried out for functions of one or any number of arguments, whenever the equivalence relation is a congruence relation with respect to such functions. Similarly, if W is a congruence relation with respect to a binary relation U, we can unambiguously define an associated relation Ū between equivalence sets by: ([a], [b]) ∈ Ū if and only if (a, b) ∈ U.

Converse and composition of functions. Since functions are special kinds of relations, we can also apply to them the notions of converse and composition. Then the converse F̆ of a (unary) function F is the set of all pairs (y, x) such that (x, y) ∈ F, i.e., is the set of all pairs (F(x), x) for x ∈ 𝒟(F). It is seen that the converse of a function need not be a function. For example, for the function F(x) = x² with domain I, both (2, 4) and (−2, 4) are in F, hence (4, 2), (4, −2) are in its converse. On the other hand, the converse Ğ of the function G(x) = 2x + 1 with domain I is a function; for suppose (y, x₁), (y, x₂) ∈ Ğ, then y = 2x₁ + 1 and y = 2x₂ + 1, hence 2x₁ + 1 = 2x₂ + 1, and x₁ = x₂. The domain of this function Ğ is the same as the range of the function G, namely the set of all odd integers. Formally, we could define Ğ(x) = (x − 1)/2 on this domain. It is seen that

(2:3-38) the converse of a function F is again a function if and only if we have for any x₁, x₂ ∈ 𝒟(F) that if F(x₁) = F(x₂) then x₁ = x₂ [equivalently, if x₁ ≠ x₂ then F(x₁) ≠ F(x₂)].
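The two examples above (squaring and x ↦ 2x + 1 on the integers) can be replayed on finite pieces of their domains. This is a minimal sketch, assuming relations are stored as sets of pairs; `converse`, `is_function`, and `is_one_to_one` are illustrative names.

```python
def converse(relation):
    """All pairs (y, x) such that (x, y) is in the relation."""
    return {(y, x) for (x, y) in relation}

def is_function(relation):
    """No first coordinate is paired with two different second coordinates."""
    seen = {}
    for x, y in relation:
        if seen.setdefault(x, y) != y:
            return False
    return True

def is_one_to_one(function):
    """Criterion (2:3-38): F(x1) = F(x2) implies x1 = x2."""
    values = [y for (_, y) in function]
    return len(values) == len(set(values))

square = {(x, x * x) for x in range(-3, 4)}      # finite piece of F(x) = x^2
odd = {(x, 2 * x + 1) for x in range(-3, 4)}     # finite piece of G(x) = 2x + 1

print(is_function(converse(square)), is_one_to_one(square))
print(is_function(converse(odd)), is_one_to_one(odd))
```

On these samples the converse is a function exactly when the one-to-one test succeeds, as (2:3-38) asserts in general.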

If a function F has as its converse F̆ a function, then F̆ is called the inverse of F, and is denoted in this case by F⁻¹; also F is said to be a one-to-one or bi-unique function or correspondence [from 𝒟(F) onto ℛ(F)].

(2:3-39) If F is a one-to-one function then 𝒟(F⁻¹) = ℛ(F) and ℛ(F⁻¹) = 𝒟(F); for each a ∈ 𝒟(F), F⁻¹(F(a)) = a and for each b ∈ 𝒟(F⁻¹), F(F⁻¹(b)) = b.

For F⁻¹(b) is the unique x such that (b, x) ∈ F̆, i.e., such that F(x) = b; hence F⁻¹(F(a)) is the unique x such that F(x) = F(a), i.e., is a itself. The notion of one-to-one function applies directly to functions of more than one argument, for then we merely view the domain as a set of ordered pairs, triples, etc. For example, the function G(x, y) = x + y with domain P × P is not one-to-one, while the function H(x, y) = 2ˣ · 3ʸ is. (Why?)

The composition F; G of two functions F, G is the set of all ordered pairs (x, y) such that for some z, (x, z) ∈ F and (z, y) ∈ G. Given x, y, if there is any such z then x ∈ 𝒟(F) and z must be F(x); then (z, y) ∈ G implies z ∈ 𝒟(G), i.e., F(x) ∈ 𝒟(G), and y = G(z), i.e., y = G(F(x)). Hence it is seen that

(2:3-40) if F, G are functions then so also is H = F; G. The domain of H consists of all x such that F(x) ∈ 𝒟(G), and for each x ∈ 𝒟(H), H(x) = G(F(x)).

For example, if F(x) = x2 with domain Re and G(x) = 2x + 1 with domain I, then (F; G)(x) = 2x2 + 1 with domain I.
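The domain rule in (2:3-40) can be seen at work on finite functions. The sketch below assumes functions are stored as Python dictionaries and uses small finite domains chosen only for illustration; `compose` is a hypothetical helper that follows the text's order (first F, then G).

```python
def compose(F, G):
    """F; G as in (2:3-40): defined at x exactly when F(x) lies in the domain of G."""
    return {x: G[F[x]] for x in F if F[x] in G}

F = {x: x * x for x in range(-3, 4)}        # x -> x^2 on {-3, ..., 3}
G = {z: 2 * z + 1 for z in range(0, 5)}     # z -> 2z + 1 on {0, ..., 4}

H = compose(F, G)
print(sorted(H))   # the x for which F(x) is in the domain of G
print(H[2])        # G(F(2)) = 2 * 4 + 1
```

Here x = ±3 drops out of the domain of F; G because F(±3) = 9 is not in the (deliberately small) domain of G.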

Exercise Group 2.3

1. (a) Show that for any (not necessarily distinct) elements a, b, c, d, {a, b} = {c, d} if and only if a = c and b = d or a = d and b = c.
(b) Show that for any elements a, b, c, d, {{a}, {a, b}} = {{c}, {c, d}} if and only if a = c and b = d. Thus we can define the ordered pair in set-theoretical terms by (a, b) = {{a}, {a, b}}.
(c) Show that the existence of the cartesian product A × B (2:3-8) can be proved from the existence of A ∪ B on the basis of (2:1-39), (2:1-40), if we define ordered pair as in (b).

2. Find the domain and range of each of the following relations. Give a geometric interpretation in each case.
(a) {(x, y): x, y ∈ Re and x² + 4y² < 1}
(b) {(x, y): x, y ∈ Re and y² = x}
(c) {(x, y): x, y ∈ Re and y = x²}
(d) {(x, y): x, y ∈ Re and y² = x²}
(e) {(x, y, z): x, y, z ∈ Re and x − 2y + z = 3}
(f) {(x, y, z): x, y, z ∈ Re and x² + y² + z² < 1 and x + y = 1}

3. Classify each of the following relations according to whether they do or do not have the properties of being reflexive, irreflexive, symmetric, antisymmetric, transitive, an equivalence relation. Give your reasons.
(a) {(x, y): x, y ∈ I and x < y + 1}
(b) {(x, y): x, y ∈ I and x² = y²}
(c) {(x, y): x, y ∈ I and |x| < |y|}
(d) {(x, y): x, y ∈ I and x, y are both even and x < y or x, y are both odd and x < y or x is even and y is odd}
(e) {(X, Y): X ⊆ I, Y ⊆ I and X ∩ Y = ∅}

4. Show that the following are true for any relations U, V, W; here W̆ represents the converse of a relation W.
(a) U; (V; W) = (U; V); W
(b) the converse of Ŭ is U
(c) (U; V)˘ = V̆; Ŭ
(d) U; (V ∪ W) = (U; V) ∪ (U; W)
Show that in general U; (V ∩ W) ≠ (U; V) ∩ (U; W).

5. Show that if W is a relation with 𝒟(W) = S then W is an equivalence relation in S if and only if W̆ = W and W; W ⊆ W.

6. Find the composite functions F; G and G; F in each of the following cases, and find the domain in each case.
(a) F(x) = x² − 9 (x ∈ Re), G(x) = √x (x ∈ Re, x > 0)
(b) F(x) = 1/x (x ∈ Re, x ≠ 0), G = F
(c) F(x) = x² (x ∈ Re), G(x) =



lf

-

x

— 1

[1 if x < —1 or 1 < x

7. Which of the following functions is one-to-one? In each such case describe the inverse function and find its domain. (a) Fix) = x2 + 1, x G Re (b) Fix) = x3, x G Re (c) F(z) = \/{x2 +1), x G Re, x > 0 (d) Fix, y) = 2X ■ y, x, y G Re 8. For x, y G I, let (z, y) £ IF if x — y is a multiple of 4; write x s= v if

Or, 2/) G IF. (a) Show that = is an equivalence relation with four equivalence sets [0], [1], [2], [3]. (b) Define operations ©, » on equivalence sets so that [a] © [6] = [a + b], [a] o [6] = [a • b]. Make a table for each of these operations. Which of the following are true and which false, for the class M of equivalence sets? Prove your statements. (i) For each X G M, X © [0] = X; (ii) For each A" G M, X » [I] = A"; (iii) For each A" £ M there is a Y £ M with X © Y = [0]; (iv) For each X £ M, if X ^ [0] there is a Y £ M with A" ° Y = [1] • (v) If X, Y, Z £ M, X ^ [0] and X ° Y = X ° Z then Y = Z • M If X, Y,Ze M then X «. (F © Z) = (X „ Y) © (X « Z).’


2.4 Mathematical systems of relations and functions. In this book we will be studying the properties of various sets of numbers S with respect to various operations F₁, F₂, … on S and relations W₁, W₂, … in S. Further, we often single out some particular elements a₁, a₂, … of S when these have some special properties which distinguish them from other elements. Different such systems of functions, relations, and distinguished elements may have closely related properties. For example, the set of real numbers Re, under addition +, with distinguished element 0, is very similar to the set of positive real numbers Re′, under multiplication ·, with distinguished element 1. Thus x + y = y + x for all x, y ∈ Re and x · y = y · x for all x, y ∈ Re′; x + 0 = x for all x ∈ Re, and x · 1 = x for all x ∈ Re′; for each x ∈ Re there is a y ∈ Re with x + y = 0 (namely, −x) and for each x ∈ Re′ there is a y ∈ Re′ with x · y = 1 (namely, 1/x); and so on. It seems reasonable, therefore, to speak of the properties of the systems (ordered triples) (Re, +, 0) and (Re′, ·, 1). We are thus led to the following general notion:

(2:4-1) a mathematical or algebraic system is an ordered (k + l + m + 1)-tuple

(S, F₁, F₂, …, F_k, W₁, W₂, …, W_l, a₁, a₂, …, a_m)

in which F₁, F₂, …, F_k are operations on S under which S is closed, W₁, W₂, …, W_l are relations in S, and a₁, a₂, …, a_m are certain specified elements of S.

Isomorphism. From the algebraic point of view, the particular way in which the elements of S and the operations, relations, etc., on S are defined is not as important as the properties of S under these operations and relations. Thus we would say we are dealing with essentially the same system if we have another system (S′, F′₁, F′₂, …, F′_k, W′₁, W′₂, …, W′_l, a′₁, a′₂, …, a′_m) with exactly the same properties. To express this more precisely, we must first limit the kinds of systems to be compared.

(2:4-2)

Two mathematical systems

(S, F₁, F₂, …, F_k, W₁, W₂, …, W_l, a₁, a₂, …, a_m)

and

(S′, F′₁, F′₂, …, F′_{k′}, W′₁, W′₂, …, W′_{l′}, a′₁, a′₂, …, a′_{m′})

are said to be of the same type if the following conditions hold:
(i) k = k′, l = l′, and m = m′;
(ii) corresponding functions F_i and F′_i have the same number of arguments, so that, e.g., if F_i is unary on S, 𝒟(F_i) = S, then F′_i is unary on S′, 𝒟(F′_i) = S′, and if F_i is binary on S, 𝒟(F_i) = S × S, then F′_i is binary on S′, 𝒟(F′_i) = S′ × S′;
(iii) corresponding relations W_i and W′_i apply to the same number of arguments, so that, e.g., if W_i is binary in S, W_i ⊆ S × S, then W′_i is binary in S′, W′_i ⊆ S′ × S′.

As examples, (Re, (H) = P, (Pi(H) c; P. Then there is a unique function F satisfying the following conditions:

Theorem.

(i) 𝒟(F) = P × P and ℛ(F) ⊆ P;
(ii) for any x ∈ P, F(x, 1) = H(x);
(iii) for any x, y ∈ P, F(x, Sc(y)) = G(x, F(x, y)).

Proof. For any x ∈ P put

(1) c_x = H(x),

(2) G_x(z) = G(x, z),

i.e., G_x is a function with

(3) 𝒟(G_x) = P, ℛ(G_x) ⊆ P.

By 3.4 it follows that for each x ∈ P there is a unique function, which we denote by F_x, satisfying the following conditions:

(4)

(i) (ii) (iii)

3XFx) = P, 0, by the conditions: Definition.

n

n

and

JJ Xk = 1 k=m

and if m < n.

if n < m;


The condition (i) can be viewed as defining the sum and product of the empty sequence. The choice of values 0 and 1 in (i) is arbitrary, so far as producing a well-defined notion is concerned, but is not arbitrary if we wish these special sums and products to share the general properties of usual sums and products. We have, for example,

Σ_{k=1}^{1} x_k = (Σ_{k=1}^{0} x_k) + x₁ = 0 + x₁ = x₁,

and similarly

Π_{k=1}^{1} x_k = 1 · x₁ = x₁,

so that the conditions of 4.26 accord with 3.42(i) and 3.43(i) in this case. The condition (ii) of the above definition is justified by recursive definition on P_m for each m ∈ I, m > 0. Further, inductive proof on P_m easily serves to establish the following.

4.27 Theorem. Let (x₀, …, x_k, …) be an infinite sequence of elements of D, and let n, m, q ∈ I, m > 0, q > 0. Then we have

(i) Σ_{k=1}^{n} x_k and Π_{k=1}^{n} x_k, as defined in 4.26, accord with the values given by 3.42 and 3.43;

(ii) Σ_{k=m}^{n} x_k = Σ_{k=q}^{n−m+q} x_{k+m−q};

(iii) z · Σ_{k=m}^{n} x_k = Σ_{k=m}^{n} (z · x_k) for any z ∈ D.
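The empty-sequence conventions and the identities 4.27(ii) and (iii) can be spot-checked numerically. The following is a sketch only, taking D to be the integers and using an arbitrary illustrative sequence x_k = k² + 1; the helpers `seq_sum` and `seq_prod` are hypothetical names, and a numerical check on sample values is of course no substitute for the inductive proof.

```python
from math import prod

def seq_sum(x, m, n):
    """Sum of x(k) for k = m, ..., n; the empty sum (n < m) is 0."""
    return sum(x(k) for k in range(m, n + 1))

def seq_prod(x, m, n):
    """Product of x(k) for k = m, ..., n; the empty product (n < m) is 1."""
    return prod(x(k) for k in range(m, n + 1))

def x(k):
    return k * k + 1  # illustrative sequence, not from the text

m, n, q, z = 3, 7, 1, 5

# Index shift 4.27(ii): k runs from q to n - m + q, the index is k + m - q.
shifted = sum(x(k + m - q) for k in range(q, n - m + q + 1))
# Distributive law 4.27(iii).
distributed = seq_sum(lambda k: z * x(k), m, n)

print(seq_sum(x, 1, 0), seq_prod(x, 1, 0))
print(seq_sum(x, m, n) == shifted, z * seq_sum(x, m, n) == distributed)
```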

4.27(iii) follows immediately from (ii) for q = 1 and the general distributive law 3.46. The condition 4.27(ii) allows us to choose the initial value of k at our convenience.

More general associative and commutative laws. We shall now formulate general associative and commutative laws. In the first of these we consider any nonnegative integers n₁, n₂, …, n_t and set n = n₁ + n₂ + ⋯ + n_t. Given x₁, …, x_n, we can group these as:

x₁, …, x_{n₁};  x_{n₁+1}, …, x_{n₁+n₂};  x_{n₁+n₂+1}, …, x_{n₁+n₂+n₃};  …

Here, if n_i = 0, the corresponding subsequence is regarded as being empty.

4.28 Theorem. Let (x₁, …, x_k, …) be an infinite sequence of elements of D. Let (n₁, …, n_t) be a sequence of nonnegative integers. For each i = 1, …, t, let m_i = Σ_{j=1}^{i−1} n_j, and let n = m_t + n_t. Then

Σ_{k=1}^{n} x_k = Σ_{i=1}^{t} ( Σ_{k=1}^{n_i} x_{m_i+k} ).

The same holds if we replace Σ by Π (but the Σ and + used to define m_i and n remain unchanged). The reader should convince himself that this is the precise formulation of the desired result. The proof is left to him.

For a formulation of a general commutative law we need the notion of a change in the order of factors. This is provided by the following.

4.29 Definition. A function F is said to be a permutation of a set S if it is one-to-one and 𝒟(F) = ℛ(F) = S.

Thus the sequence (x₃, x₁, x₂, x₄) is obtained by a permutation F from the sequence (x₁, x₂, x₃, x₄), where F(1) = 3, F(2) = 1, F(3) = 2, F(4) = 4.

4.30 Theorem. Let (x₁, …, x_k, …) be an infinite sequence of elements of D. Then for any n ∈ P and any permutation F of {1, …, n} we have

Σ_{k=1}^{n} x_k = Σ_{k=1}^{n} x_{F(k)}.

The same holds if we replace Σ by Π.

Proof. The proof is by induction on n ∈ P. For n = 1 it is trivial. Suppose that it is true for n. Let G be a permutation of {1, …, n + 1}. Then G(m) = n + 1 for a unique m, 1 ≤ m ≤ n + 1. Then

= 4”A.

Further, we have the following recursive relationship:

(4:4-10) c_k^{(n+1)} = c_k^{(n)} + c_{k−1}^{(n)}, for 1 ≤ k ≤ n.

For consider a set {a₁, …, a_n, a_{n+1}} with n + 1 distinct elements. Consider a subset X of this set, where X has exactly k elements. Either a_{n+1} ∈ X or not. The number of subsets X for which a_{n+1} ∉ X is c_k^{(n)}. The number of subsets X for which a_{n+1} ∈ X is the same as the number of (k − 1)-element subsets Y = X − {a_{n+1}} of {a₁, …, a_n}, hence is c_{k−1}^{(n)}. (4:4-10) corresponds to the familiar Pascal triangle

(4:4-11)
           1   1
         1   2   1
       1   3   3   1
     1   4   6   4   1

where each number in each row, other than the first and last 1, is obtained as the sum of the two closest numbers in the row directly above it. Thus (4:4-9) and (4:4-10), as reflected in the Pascal scheme (4:4-11), provide a simple recursive calculation procedure by which we can obtain any c_k^{(n)}. There is another approach to the calculation of the c_k^{(n)} which involves counting permutations. The number of permutations of a set consisting of n distinct elements is the same as the number of permutations of {1, …, n}. Every function on this set is alternatively described, by 3.40, as a sequence (b₁, …, b_n). The condition that such a sequence then be a permutation can simply be written as {b₁, …, b_n} = {1, …, n}. The number of such sequences is determined as follows: b₁ can be chosen in n different ways; once b₁ is fixed, b₂ can be chosen in n − 1 different ways, …; once b₁, b₂, …, b_{n−2} have been chosen, b_{n−1} can be chosen in two ways, and b_n is then completely determined. Thus the number of distinct permutations 0, or equivalently (by Exercise 3 of Exercise Group 4.3), for a ∈ P. Given b > 0, we prove by induction on a ∈ P that (1)

there exist q, r ∈ I with a = bq + r and 0 ≤ r < b.

This is true for a = 1. For if b = 1 we have l = 6- l+ 0;ifl 0, and hence q' — q > 1. It follows that the right-hand side of (4) is > b. On the other hand, the left-hand side of (4) is < r < 6, giving a contradiction. Similarly, if r' > r we obtain a contradiction. Thus r = r' and b(q' ~ q) = 0. But since wTe have an integral domain and 6^0, we must have q' — q = 0, i.e., q = q'. A more general theorem can be obtained when we change the restriction

b > 0 to the restriction b ≠ 0. This is left as an exercise.

The result 4.37 is often referred to as the division algorithm for the integers. Indeed, the following is one mechanical procedure which is suggested by it for computing q, r given a, b (b > 0). Compute the numbers a − bx for x ∈ I in the following order: a − b · 0, a − b · 1, a − b · (−1), a − b · 2, a − b · (−2), …, and with each computation compare a − bx to 0 and b. When we arrive at a q such that 0 ≤ a − bq < b we have found the desired numbers. Moreover, Theorem 4.37 assures us that we will eventually find such a q in a finite number of steps. (This algorithm can be simplified once we know which of the cases b ≤ a, −b < a < b, a ≤ −b is the one that a satisfies.)

It is an immediate corollary of 4.37, taking b = 2, that for every integer a there exists q such that a = 2q or there exists q such that a = 2q + 1; in other words, we have the familiar fact that every integer is even or odd, and by uniqueness cannot be both.
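The mechanical search just described translates almost word for word into code. This is a sketch of that deliberately naive procedure, not an efficient division routine; the names `candidates` and `divide` are illustrative, and b > 0 is assumed as in the text.

```python
from itertools import count

def candidates():
    """Yield 0, 1, -1, 2, -2, ... (the order of trial values x in the text)."""
    yield 0
    for k in count(1):
        yield k
        yield -k

def divide(a, b):
    """Return (q, r) with a == b*q + r and 0 <= r < b, assuming b > 0."""
    if b <= 0:
        raise ValueError("this sketch assumes b > 0")
    for q in candidates():
        r = a - b * q
        if 0 <= r < b:      # compare a - bq with 0 and b
            return q, r

print(divide(17, 5))
print(divide(-7, 2))
```

For b > 0 the result agrees with Python's built-in `divmod(a, b)`, and taking b = 2 gives the even/odd dichotomy: the remainder is 0 or 1, never both.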

The divisibility relation and the primes. The cases in which the remainder after division is 0 are of special interest.

4.38 Definition. Let a, b ∈ I. We say that a is divisible by b, or that a is a multiple of b, or that b is a factor of a, if there exists q ∈ I such that a = bq. If this holds we write b | a. If a is not divisible by b we write b ∤ a.


We have the following consequences.

4.39 Theorem. Let a, b, c, a₁, …, a_n ∈ I. We have:
(i) a | 0;
(ii) 0 | a if and only if a = 0;
(iii) 1 | a;
(iv) a | 1 if and only if a = ±1;
(v) a | a;
(vi) if c | b and b | a, then c | a;
(vii) b | a and a | b if and only if a = ±b;
(viii) if c | a, then c | ab;
(ix) if c | a and c | b, then c | (a + b);
(x) if c | a_i for 1 ≤ i ≤ n, then c | Σ_{k=1}^{n} a_k.

Proof. Cases (i)-(iii) are trivial. In (iv) and (vii) we use a = ±b as an abbreviation for “a = b or a = −b.” (iv) is obvious in the “if” direction. Suppose that a | 1; 1 = aq for q ∈ I. Then 1 = |1| = |a| · |q|. We cannot have a = 0 or q = 0. Hence |a| > 0 and |q| > 0 and |a| · |q| ≥ |a| ≥ 1. Thus |a| = 1. If a > 1, then |a| = a > 1; if a < −1, then |a| = −a > 1. Hence a = ±1. (v) and (vi) are trivial. For (vii) the “if” part is again obvious. Suppose that b | a, a | b, that is, a = bq₁ and b = aq₂, so that a = (aq₂)q₁ = a(q₂q₁). If a = 0, then b = 0 by (ii). If a ≠ 0, we obtain by cancellation 1 = q₂q₁, hence q₁ | 1 and q₁ = ±1 by (iv). Then a = ±b. (viii) and (ix) are easily seen to hold, the last by distributivity. (x) follows from (viii) and the generalized associative and commutative laws 4.28 and 4.30.

4.40 Definition. Let p ∈ I. Then p is said to be a prime if p ≠ 0, p ≠ ±1 and for all a, if a | p, then a = ±1 or a = ±p.

It can be mechanically checked that the first few positive primes are

2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, …

Often it is only the positive primes which are referred to as primes. However, from the algebraic point of view the present definition is more natural (this will be brought out in our discussion, later in the book, of divisibility questions for polynomials). We shall prove below that there are infinitely many positive primes (and hence infinitely many negative ones). The following useful property of nonprimes is easily verified.

4.41 Lemma. If a ∈ I, a > 1, and a is not prime, then there exist a₁, a₂ ∈ I with 1 < a₁ < a, 1 < a₂ < a, and a = a₁ · a₂.
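Definitions 4.38 and 4.40 and Lemma 4.41 lend themselves to a direct, if inefficient, mechanical check. The sketch below is an illustration by brute force; the function names `divides`, `is_prime`, and `split` are assumptions, and only positive trial divisors are examined because d | p holds exactly when −d | p.

```python
def divides(b, a):
    """b | a in the sense of 4.38: a == b*q for some integer q."""
    if b == 0:
        return a == 0
    return a % b == 0

def is_prime(p):
    """4.40: p is not 0 or +-1, and every divisor of p is +-1 or +-p."""
    if p in (0, 1, -1):
        return False
    return all(d in (1, abs(p))
               for d in range(1, abs(p) + 1) if divides(d, p))

def split(a):
    """For a > 1 not prime, return (a1, a2) with 1 < a1, a2 < a and a == a1*a2
    (Lemma 4.41); return None when no such factorization exists."""
    for a1 in range(2, a):
        if divides(a1, a):
            return a1, a // a1
    return None

print([p for p in range(1, 40) if is_prime(p)])
print(is_prime(-7), split(91), split(13))
```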

Given a number a > 1 which is not prime, one or both of these numbers a₁, a₂ could be prime. However if, say, a₁ is not prime it can be factored further as a₁ = a′₁ · a′₂ where both a′₁, a′₂ are greater than 1. We proceed similarly with a₂, if a₂ is not prime. By continuing this process we expect that we will reach, in a finite number of steps, a representation a = p₁ · p₂ · … · p_n, where p₁, p₂, …, p_n are all positive primes. It is conceivable that, by performing these computations in a different order, we would reach a representation a = p′₁ · p′₂ · … · p′_m where n ≠ m or where n = m, but the representations are essentially different in that the sequence p′₁, p′₂, …, p′_m is not a permutation of the sequence p₁, p₂, …, p_n. This will be shown not to be possible if we can demonstrate that every prime p has the following property: if p | bc, then p | b or p | c. For from p₁ | (p′₁ · p′₂ · … · p′_m) will then follow p₁ | p′₁ or p₁ | (p′₂ · … · p′_m), hence p₁ = p′₁ or p₁ | (p′₂ · … · p′_m); by repeating this procedure, we would eventually conclude that p₁ = p′₁ or p₁ = p′₂ or ⋯ or p₁ = p′_m. Then cancelling p₁ from both sides of the equation p₁ · p₂ · … · p_n = p′₁ · p′₂ · … · p′_m and repeating the argument for p₂, p₃, …, we could eventually realize that the representations must be the same, except possibly for the order of the factors.

It is the object of the next group of theorems to make these ideas precise, ending with a proof of the existence and uniqueness of such representations. The following development, while not the most direct to gain this end, is more informative and more readily generalizable to questions of divisibility in other systems.

Greatest common divisors.

4.42 Definition. Let a, b, d ∈ I. We call d a greatest common divisor (gcd) of a and b if d has the following properties:
(i) d | a and d | b;
(ii) if x ∈ I and x | a and x | b, then x | d.

It follows that if d₁, d₂ are both gcd’s of a and b, then d₁ | d₂ and d₂ | d₁, hence d₁ = ±d₂ by 4.39(vii). Further, if d₁ is a gcd of a, b, then so is −d₁. Evidently “greatest” here does not refer to magnitude in the usual sense.

These considerations do not yet yield the existence of a gcd for any pair a, b. This is easily established in certain special cases. If a = 0, then any number divides a; since b | b this shows that b is a common divisor of a and b. Moreover, b is obviously a gcd in this case. A gcd also evidently exists in the case a = b. The nontrivial cases are given by the restriction a ≠ 0, b ≠ 0, and a ≠ b. For simplicity, let us consider first the case a > b > 0. The following argument provides a proof of existence of the gcd of a, b in this case and at the same time gives what is known as the Euclidean

algorithm for finding such a gcd. It is based on the observation that if we write

(4:5-3) $a = bq + r, \quad 0 \le r < b,$

then for any $d \in I$

(4:5-4) $d \mid a$ and $d \mid b$ if and only if $d \mid b$ and $d \mid r$.

Hence a gcd of $a, b$ must at the same time be a gcd of $b, r$. Then the problem of finding the gcd of $a, b$ is reduced to the presumably simpler problem of finding the gcd of $b$ with the smaller number $r$. If $r = 0$, we have $b$ as a gcd. Otherwise we can repeat the process with $b, r$. If we do not reach a zero remainder in, say, $n$ steps we have the following situation:

(4:5-5)
$a = bq + r_1, \quad 0 < r_1 < b;$
$b = r_1 q_1 + r_2, \quad 0 < r_2 < r_1;$
$r_1 = r_2 q_2 + r_3, \quad 0 < r_3 < r_2;$
$\ldots$
$r_{n-2} = r_{n-1} q_{n-1} + r_n, \quad 0 < r_n < r_{n-1}.$

At the next stage we write

(4:5-6) $r_{n-1} = r_n q_n + r_{n+1}, \quad 0 \le r_{n+1} < r_n.$

Then

(4:5-7) if $r_{n+1} = 0$ we have: $r_n$ = (a gcd of $r_{n-1}, r_n$) = $\cdots$ = (a gcd of $r_1, r_2$) = (a gcd of $b, r_1$) = (a gcd of $a, b$).
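The chain of divisions (4:5-5), (4:5-6) can be tried out numerically. The following is only a minimal sketch in Python, assuming $a > b > 0$ as in the text; the function names `euclid_remainders` and `positive_gcd` are illustrative and not part of the original development.

```python
def euclid_remainders(a, b):
    """Return the remainders r_1, r_2, ..., r_n, r_{n+1} = 0 obtained by
    repeated division, starting from a > b > 0."""
    if not (a > b > 0):
        raise ValueError("this sketch assumes a > b > 0")
    remainders = []
    while b != 0:
        # One division step: a = b*q + r with 0 <= r < b.
        a, b = b, a % b
        remainders.append(b)
    return remainders


def positive_gcd(a, b):
    """The last nonzero remainder r_n, or b itself when b divides a."""
    remainders = euclid_remainders(a, b)
    # The list always ends with 0; the entry before it (if any) is r_n.
    return remainders[-2] if len(remainders) > 1 else b


print(euclid_remainders(1071, 462))
print(positive_gcd(1071, 462))
```

Each pass through the loop replaces the pair $(a, b)$ by $(b, r)$, which by (4:5-4) has the same common divisors.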

Note that the sequence of numbers $r_1, r_2, \ldots$ is uniquely determined by $a, b$. Is it possible that for no $n$ do we have $r_{n+1} = 0$? If so, then the set $A = \{r_1, r_2, \ldots, r_n, \ldots\}$ would be a set of positive integers without a least element, contrary to the well-ordering of $P$. Hence we must eventually reach an $n$ for which (4:5-5) and (4:5-6) hold and $r_{n+1} = 0$, thus providing us with $r_n$ as the desired gcd by (4:5-7). This is Euclid's algorithm.

Suppose that we have (4:5-5) and (4:5-6) with $r_{n+1} = 0$. We can write out the way in which $r_1, r_2, \ldots, r_n$ are determined by $a, b$ as follows:

$r_1 = a - bq,$
$r_2 = b - r_1 q_1 = b - (a - bq) q_1 = -a q_1 + b(1 + q q_1),$

etc. Thus we have $r_1 = a x_1 + b y_1$, where $x_1 = 1$, $y_1 = -q$; $r_2 = a x_2 + b y_2$, where $x_2 = -q_1$, $y_2 = 1 + q q_1$; etc. Suppose that we have expressed

$r_{k-2} = a x_{k-2} + b y_{k-2}$ and $r_{k-1} = a x_{k-1} + b y_{k-1},$

where $k \ge 3$; then

$r_k = r_{k-2} - r_{k-1} q_{k-1} = a(x_{k-2} - x_{k-1} q_{k-1}) + b(y_{k-2} - y_{k-1} q_{k-1}),$

so we can also write $r_k = a x_k + b y_k$, with suitable $x_k, y_k$. It follows that $r_n$, which is the positive gcd of $a, b$, can be written in the form $ax + by$, for certain $x, y \in I$. Consider any number $ax' + by'$ where $x', y'$ are integers. If $c \in I$ and $c \mid a$ and $c \mid b$, then $c \mid (ax' + by')$ by 4.39(viii), (ix). In particular, $r_n \mid (ax' + by')$. If $ax' + by'$ is positive, this implies $r_n \le ax' + by'$. Hence the positive gcd of $a, b$ is the least positive number of the form $ax' + by'$.

This characterization suggests another proof of the existence of gcd's which is slightly more sophisticated than the foregoing but is in certain respects more informative. We will make use of the following two distinctive properties of the set $L$ of all linear combinations $ax + by$, where $x, y \in I$:

(4:5-8)

(a) if $u \in L$ and $z \in I$, then $uz \in L$;

(b) if $u, v \in L$, then $u + v \in L$.
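The characterization of the positive gcd as the least positive number of the form $ax' + by'$ can be checked by brute force on small cases. This is only an illustrative sketch: the finite search window `bound` is an assumption of the example (the set $L$ itself is infinite), and the result agrees with the gcd only when the window is wide enough to contain suitable coefficients.

```python
def least_positive_combination(a, b, bound=20):
    """Smallest positive value of a*x + b*y with |x|, |y| <= bound."""
    values = [a * x + b * y
              for x in range(-bound, bound + 1)
              for y in range(-bound, bound + 1)]
    return min(v for v in values if v > 0)


def all_multiples_of(d, a, b, bound=20):
    """Check that every a*x + b*y in the window is a multiple of d."""
    return all((a * x + b * y) % d == 0
               for x in range(-bound, bound + 1)
               for y in range(-bound, bound + 1))


d = least_positive_combination(12, 18)
print(d, all_multiples_of(d, 12, 18))
```

For $a = 12$, $b = 18$ the least positive combination found is 6, attained for instance at $x = -1$, $y = 1$, and every combination in the window is a multiple of it.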

It is seen that if $L'$ is any other set which satisfies (4:5-8)(a), (b) and we have $a, b \in L'$, then $L \subseteq L'$. Clearly $a = a \cdot 1 + b \cdot 0$ and $b = a \cdot 0 + b \cdot 1$ are in $L$. Hence $L$ is the smallest set $L'$ satisfying all these conditions. Now we can prove a theorem about any set $S$ satisfying the conditions of (4:5-8) in place of $L$, which will give us the desired result about $L$ as a special case.

4.43 Theorem. Suppose that $S \subseteq I$, $S \neq \emptyset$, and that $S$ satisfies:

(i) if $u \in S$ and $z \in I$, then $uz \in S$;

(ii) if $u, v \in S$, then $u + v \in S$.

Then either $S = \{0\}$, or there exists a $d > 0$ such that $S = \{dz : z \in I\}$; in the latter case $d$ is uniquely determined.

Proof. Suppose that $S \neq \{0\}$. Since $S \neq \emptyset$ we can pick $u \in S$, $u \neq 0$. If $u < 0$, then $-u = u(-1) \in S$; hence in any case $S$ contains a positive

integer. Let $A = S \cap P$; thus $A \neq \emptyset$. By the well-ordering of $P$, $A$ contains a least element, call it $d$. Then $d$ has the following properties:

(1) $d \in S$;

(2) $d > 0$;

(3) if $u \in S$ and $u > 0$, then $u \ge d$.

It follows immediately from (1) and the hypothesis (i) that

(4) $\{dz : z \in I\} \subseteq S$.

Suppose, to prove the reverse, that $u \in S$. By the division algorithm (4.31) we can write

(5) $u = dq + r$, where $q, r \in I$ and $0 \le r < d$.

Then $r = u - dq = u + d \cdot (-q)$. Since $u, d \in S$ it follows from (i), (ii) that also $r \in S$. If $r > 0$, we would have $r \ge d$ by (3). Thus we must have $r = 0$ and $u = dq$. Hence

(6) $S \subseteq \{dz : z \in I\}$.

Steps (4) and (6) together give the desired equality. Thus $d \in S$ and is a divisor of every element of $S$. Suppose that $d'$ were any other number with this property for which $d' > 0$. Then $d \mid d'$ and $d' \mid d$; hence $d = \pm d'$ by 4.39(vii). Clearly we cannot have $d = -d'$, so $d = d'$.

4.44 Theorem. Let $a, b \in I$, $a \neq 0$ or $b \neq 0$. Then $a$ and $b$ have a unique positive greatest common divisor $d$. For suitable $s, t \in I$ we have $d = as + bt$.

Proof. Let $S = \{ax + by : x, y \in I\}$. Then $S$ satisfies 4.43(i), (ii). Furthermore, $a, b \in S$, so that $S \neq \emptyset$, $S \neq \{0\}$. Choose $d > 0$ with $S = \{dz : z \in I\}$. Since $d \in S$, we must have $d = as + bt$ for some $s, t$. Since $a, b \in S$ we have $d$ a divisor of both $a, b$. If $c$ is any other divisor of both $a, b$, then $c \mid (as + bt)$ by 4.39(viii), (ix), that is, $c \mid d$. Hence $d$ is a gcd of $a, b$. As we have seen, any other gcd $d'$ satisfies $d' = \pm d$. Hence, if $d' > 0$, then $d' = d$.

4.45 Definition. We define $(a, b)$ for all $a, b \in I$ with $a \neq 0$ or $b \neq 0$ to be the unique positive gcd of $a, b$. Here $a$ and $b$ are called relatively prime if $(a, b) = 1$.

There is a possibility of confusing this notation for the gcd with that for the ordered pair. Both these notations are in standard use. The context will always determine which meaning is intended.
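The integers $s, t$ with $(a, b) = as + bt$ promised by 4.44 can be computed by carrying the coefficients $x_k, y_k$ along with the remainders, as in the discussion of Euclid's algorithm. What follows is a sketch under that reading, not the author's own procedure; the name `bezout` is illustrative, and Python's floor division `//` stands in for the division algorithm (so intermediate remainders may be negative when $a$ or $b$ is negative, which does not affect the invariant).

```python
def bezout(a, b):
    """Return (d, s, t) with d = (a, b) > 0 and d == a*s + b*t."""
    if a == 0 and b == 0:
        raise ValueError("(a, b) is only defined when a != 0 or b != 0")
    # Invariants: r0 == a*x0 + b*y0 and r1 == a*x1 + b*y1.
    r0, x0, y0 = a, 1, 0
    r1, x1, y1 = b, 0, 1
    while r1 != 0:
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1
        x0, x1 = x1, x0 - q * x1
        y0, y1 = y1, y0 - q * y1
    if r0 < 0:
        # Normalise the sign so that the positive gcd is returned.
        r0, x0, y0 = -r0, -x0, -y0
    return r0, x0, y0


d, s, t = bezout(1071, 462)
print(d, s, t, 1071 * s + 462 * t)
```

Two numbers are relatively prime exactly when the first component returned is 1.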

If $(a, b) = 1$, then $a, b$ have no divisors in common other than $\pm 1$. This leads to the following [where, as in our further work, the use of the symbol $(a, b)$ will implicitly involve the assumption $a \neq 0$ or $b \neq 0$].

4.46 Theorem. Suppose that $a, b, c \in I$. If $(a, c) = 1$ and $c \mid ab$, then $c \mid b$.

Proof. Since $(a, c) = 1$ we have by 4.44

(1) $1 = as + ct$

for some $s, t \in I$. Hence

(2) $b = (ab)s + c(bt)$.

Since $c \mid ab$ and $c \mid c(bt)$, we have $c \mid b$.

4.47 Theorem. Suppose that $p$ is a prime and $a, b, a_1, \ldots, a_n \in I$. Then:

(i) if $p \nmid a$, then $(a, p) = 1$;

(ii) if $p \mid ab$, then $p \mid a$ or $p \mid b$;

(iii) if $p \mid (a_1 \cdot \ldots \cdot a_n)$, then $p \mid a_i$ for some $i$.

We leave the proof of this to the reader. Although it is now easily obtained, 4.47(iii) establishes the important property of prime numbers which led us to the consideration of gcd's. We also leave the following for the reader to prove.

4.48 Theorem. Suppose that $a, b, c \in I$. If $(a, b) = 1$, $a \mid c$ and $b \mid c$, then $(ab) \mid c$.

Factorization of integers into primes.

The ground is now almost prepared for us to prove a representation theorem for integers as products of primes. First, consider the question of proving the existence of at least one representation for each $a > 1$. Consider any such $a$. If $a$ is a prime, we are through. Otherwise, $a = a_1 \cdot a_2$ for some $a_1, a_2$ where $1 < a_1 < a$, $1 < a_2 < a$. If we assume that the result holds for all $b < a$, we can then conclude that it holds for $a$. This suggests a new type of inductive proof, slightly different from the kind that we have used so far, in that we do not just consider whether the inductive hypothesis holds for the element which immediately precedes $a$ but whether it holds for all elements which precede $a$. The validity of this type of argument is ensured by the following general theorem.

4.49 Theorem. Suppose that $(S$, . . . 2. Let $A$ be the set of all $a \in P_2$ for which there exists a sequence $(p_1, \ldots, p_n)$ satisfying (i) which is unique up to order in the sense of (ii). To prove our theorem it suffices, by 4.49, to show that

(1) if $\{x : 2 \le x < a\} \subseteq A$, then $a \in A$.

Assume the hypothesis of (1). We first show that (i) is also true for $a$. If $a$ is prime, this is immediate. Otherwise, we have, by 4.42,

(2) $a = a_1 \cdot a_2$ where $1 < a_1 < a$, $1 < a_2 < a$.

By hypothesis, there exist two sequences, $(p_1, \ldots, p_{n_1})$ and $(p_1', \ldots, p_{n_2}')$, of positive primes such that

(3) $a_1 = \prod_{k=1}^{n_1} p_k, \quad a_2 = \prod_{k=1}^{n_2} p_k'.$

Let $n = n_1 + n_2$ and define $p_{n_1+k}$ for $1 \le k \le n_2$ as $p_k'$. Then by 4.27(ii) and 4.28,

(4) $a = \left(\prod_{k=1}^{n_1} p_k\right) \cdot \left(\prod_{k=n_1+1}^{n} p_k\right) = \prod_{k=1}^{n} p_k.$

Now to prove (ii), suppose further that

(5) $a = \prod_{k=1}^{m} q_k,$

where $(q_1, \ldots, q_m)$ are primes. We write

(6) $a = \left(\prod_{k=1}^{n-1} p_k\right) \cdot p_n.$

Hence $p_n \mid \prod_{k=1}^{m} q_k$. Then by 4.47 there is an $i$, $1 \le i \le m$, such that $p_n \mid q_i$. But since $1 < p_n$, this can only happen when $p_n = q_i$. Let $G$ be any permutation of $\{1, \ldots, m\}$ such that $G(m) = i$. Then by the generalized commutative law 4.30

(7) $a = \prod_{k=1}^{m} q_k = \prod_{k=1}^{m} q_{G(k)} = \left(\prod_{k=1}^{m-1} q_{G(k)}\right) \cdot p_n.$

From (6) and (7) we obtain

(8) $\prod_{k=1}^{n-1} p_k = \prod_{k=1}^{m-1} q_{G(k)}$

by cancellation. Now it may happen that the quantity in (8), which we shall now call $a_1$, is 1. This can only happen if $a = p_1 = q_1$, and (ii) is trivially established in this case. Otherwise, $2 \le a_1$, and we also see from $a = a_1 \cdot p_n$ and $2 \le p_n$ that $a_1 < a$. Hence we can apply our inductive hypothesis to $a_1$. By (ii) for $a_1$, $n - 1 = m - 1$ and for a certain permutation $H$ of $\{1, \ldots, n - 1\}$,

(9) $q_{G(j)} = p_{H(j)}$ for all $j \le n - 1$.

Define

(10) $F(k) = H(j)$ if $k \neq i$ and $k = G(j)$; $\quad F(k) = n$ if $k = i$.

By these conditions $F$ is well determined; for each $k \le n$ ($= m$), $k \neq i$, has $k = G(j)$ for a unique $j < n$ by the fact that $G$ is a permutation of $\{1, \ldots, n\}$ and $G(n) = i$. Further, $F$ is a permutation of $\{1, \ldots, n\}$. For suppose that $F(k_1) = F(k_2)$ where $k_1, k_2$ are . . . 1, list all positive primes 2, 3, 5, . . . which are . . . 1, we repeat this process, to obtain $a_2 = p_2 \cdot a_3$. Then the sequence $a_1, a_2, \ldots$ of numbers thus obtained must eventually reach 1, since whenever $a_k > 1$, we have $a_k = p_k \cdot a_{k+1}$ and hence $a_k > a_{k+1}$. When $a_n > 1$ and $a_{n+1} = 1$ we have $a = p_1 \cdot \ldots \cdot p_n$.

4.52 Theorem. There exist infinitely many positive primes.

Proof. We shall show by induction on $n \in P$ that there exist at least $n$ distinct positive primes. This is obvious for $n = 1$. Suppose that it is true for $n$. Let $q_1, \ldots, q_n$ be $n$ distinct positive primes. Let $a = (q_1 \cdot \ldots \cdot q_n) + 1$. By 4.50, $a$ has a representation as a product of positive primes. Hence there is at least one positive prime $p$ such that $p \mid a$. Then $p$ is distinct from each of $q_1, \ldots, q_n$. For otherwise, if $p = q_k$ for some $k$, then $p \mid (q_1 \cdot \ldots \cdot q_n)$, and hence $p \mid (a - (q_1 \cdot \ldots \cdot q_n))$; but then $p \mid 1$, so that $p = 1$. By the definition of prime, this is impossible. Thus if we take $q_{n+1} = p$, we have at least $n + 1$ distinct positive primes $q_1, \ldots, q_{n+1}$.
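Both the step-by-step factorization $a_k = p_k \cdot a_{k+1}$ and the induction step just given can be tried on small numbers. The sketch below uses trial division to find a prime divisor; that choice, and the names `smallest_prime_factor`, `prime_factors`, and `another_prime`, are assumptions of the example rather than anything prescribed by the text.

```python
def smallest_prime_factor(a):
    """Smallest divisor p > 1 of a (necessarily a prime), for a > 1."""
    if a <= 1:
        raise ValueError("a must be greater than 1")
    p = 2
    while p * p <= a:
        if a % p == 0:
            return p
        p += 1
    return a  # no divisor up to sqrt(a), so a itself is prime


def prime_factors(a):
    """Nondecreasing list of primes whose product is a, for a > 1."""
    factors = []
    while a > 1:
        p = smallest_prime_factor(a)
        factors.append(p)
        a //= p  # a_k = p_k * a_{k+1}, so the sequence strictly decreases
    return factors


def another_prime(primes):
    """Given distinct primes q_1, ..., q_n, return a prime not among them:
    a prime divisor of (q_1 * ... * q_n) + 1."""
    product = 1
    for q in primes:
        product *= q
    return smallest_prime_factor(product + 1)


print(prime_factors(360))
print(another_prime([2, 3, 5]))
print(another_prime([2, 3, 5, 7, 11, 13]))
```

Note that $(q_1 \cdot \ldots \cdot q_n) + 1$ need not itself be prime: for the first six primes it equals $30031 = 59 \cdot 509$, and the argument only uses one of its prime divisors.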

Since our proposition is true for every $n \in P$, there cannot be only finitely many primes.

Since the division algorithm provides us with an algorithm for finding all divisors $b$ of a number $a$, we have an algorithm for deciding whether any given number is a prime. Thus, by successively listing all numbers 2, 3, 4, 5, 6, . . . up to any given point, we can effectively list all prime numbers up to that point. It is not simple, however, to determine in advance how far one must go in order to find a given number of primes. The answer to this is known, but only as an approximation statement, as the prime number theorem, one of the most celebrated mathematical results of the last century.

Positional notations for integers.

We now return to one of the questions which motivated the introduction of integers (at least of 0), the possibility of the positional decimal representation of positive integers. For example, we write 2037 as an abbreviation for the number $2 \cdot 10^3 + 0 \cdot 10^2 + 3 \cdot 10 + 7$. For the sake of uniformity we can also write here $3 \cdot 10 = 3 \cdot 10^1$ and $7 = 7 \cdot 1 = 7 \cdot 10^0$. We generalize the desired result of decimal representation to representation in terms of an arbitrary base $b > 1$. Note that coefficients in representation to base 10 are the numbers 0, 1, 2, . . . , 9. Similarly, coefficients in representations to base $b$ will be $0, 1, \ldots, b - 1$.

4.53 Theorem. Suppose that $a \in P$, $b \in I$, $b > 1$. Then there exists a unique $n \in I$, $n \ge 0$, and a unique sequence $(c_0, \ldots, c_n)$ of integers such that:

(i) $a = \sum_{k=0}^{n} c_k b^{n-k}$;

(ii) for each $k$ with $0 \le k \le n$ we have $0 \le c_k < b$;

(iii) $c_0 \neq 0$.

Proof. We shall prove the existence and unicity of such a representation simultaneously by course-of-values induction on $P$ (4.49). Thus suppose that $a \in P$ and that every $a_1 \in P$ with $a_1 < a$ has a unique representation of the desired form. We consider two possibilities, $a < b$ or $a \ge b$. If $a < b$, then $a = a \cdot 1 = a \cdot b^0 = \sum_{k=0}^{0} c_k b^{0-k}$, where $c_0 = a$ [so that (ii), (iii) clearly hold]. Suppose also that $a = \sum_{k=0}^{n} c_k' b^{n-k}$. If $n > 0$, then $a = c_0' b^n + \sum_{k=1}^{n} c_k' b^{n-k}$ and, since each $c_k' b^{n-k}$ is nonnegative, $a \ge c_0' b^n$; but $c_0' \ge 1$, so $a \ge b^n = b \cdot b^{n-1} \ge b$, contrary to $a < b$. Hence $n = 0$ and $a = c_0' b^0 = c_0'$; thus $c_0' = c_0$ and the representation is unique. Consider now the case $a \ge b$. Write

(1)

$a = qb + r$ where 0 . . . 0 and we can write

$a = \sum_{k=0}^{n'} c_k' b^{n'-k} = \left(\sum_{k=0}^{n'-1} c_k' b^{(n'-1)-k}\right) \cdot b + c_{n'}'.$

Thus if we define

(8) $m' = n' - 1, \quad d_k' = c_k'$ for $0 \le k \le m',$

and

(9) $q' = \sum_{k=0}^{m'} d_k' b^{m'-k},$

we have

(10) $a = q'b + c_{n'}'.$

But $0 \le c_{n'}' < b$, so that by the unicity condition of the division algorithm,

(11) $q' = q$ and $c_{n'}' = r.$

Now we apply the inductive hypothesis of the uniqueness of representation (4) of $q$. It follows that $m' = m$ and $d_k' = d_k$ for $0 \le k \le m$. But then by (6) and (8) $n' = m' + 1 = m + 1 = n$, $c_{n'}' = c_n' = r = c_n$, and $c_k' = d_k' = d_k = c_k$ for $0 \le k \le n - 1$. Hence also the representation of $a$ is unique, and the induction step is completed.

By means of the result 4.33 for the geometric series, we can also add the following bit of information to the preceding.

4.54 Theorem. For $a \in P$ represented as in 4.53 we have $b^n \le a < b^{n+1}$. Furthermore, $n$ is uniquely determined by these inequalities.

Proof. Since $c_0 \neq 0$ we clearly have $b^n \le a$. On the other hand, each $c_k \le b - 1$, so

$a = \sum_{k=0}^{n} c_k b^{n-k} \le \sum_{k=0}^{n} (b - 1) b^{n-k} = (b - 1) \sum_{k=0}^{n} b^{n-k} = (b - 1) \sum_{k=0}^{n} b^k = b^{n+1} - 1.$

For any other $m$ with $b^m \le a < b^{m+1}$, if say $m < n$, we would have $m + 1 \le n$, hence $b^{m+1} \le b^n \le a$, which is a contradiction. Similarly, we cannot have $n < m$. Hence $n = m$.

We have given the representation theorem in the form which corresponds to our usual way of denoting numbers, with descending powers of $b$. Using the commutative law, we could equally well represent $a$ as $\sum_{k=0}^{n} c_k b^k$, where $0 \le c_k < b$ and $c_n \neq 0$. From the representation of 4.53 we can deduce the familiar rules for adding and multiplying integers as learned in elementary arithmetic.
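The existence part of 4.53 is effectively an algorithm: divide by $b$, keep the remainder as the last coefficient, and continue with the quotient. The following is a minimal sketch of that procedure together with a check of the bounds in 4.54; the function names `digits` and `value` are illustrative only.

```python
def digits(a, b):
    """Coefficients (c_0, ..., c_n) of a in base b, most significant first."""
    if a < 1 or b < 2:
        raise ValueError("need a >= 1 and b >= 2")
    coefficients = []
    while a > 0:
        # a = q*b + r with 0 <= r < b; r is the last coefficient of a.
        a, r = divmod(a, b)
        coefficients.append(r)
    coefficients.reverse()
    return coefficients


def value(coefficients, b):
    """Evaluate the sum of c_k * b**(n - k) for k = 0, ..., n."""
    n = len(coefficients) - 1
    return sum(c * b ** (n - k) for k, c in enumerate(coefficients))


c = digits(2037, 10)
n = len(c) - 1
print(c, value(c, 10), 10 ** n <= 2037 < 10 ** (n + 1))
```

The same functions work for any base $b > 1$; for instance `digits(2037, 2)` has 11 coefficients, in agreement with $2^{10} \le 2037 < 2^{11}$.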

Given $a_1, a_2$ to be added, say

$a_1 = \sum_{k=0}^{n_1} c_{1,k} b^{n_1-k}$ and $a_2 = \sum_{k=0}^{n_2} c_{2,k} b^{n_2-k},$

where $n_2 \ge n_1$, put $n = n_2$ and write

$a_1 = \sum_{k=0}^{n} d_{1,k} b^{n-k}$ and $a_2 = \sum_{k=0}^{n} d_{2,k} b^{n-k},$

where

$d_{1,k} = 0$ if $0 \le k < n - n_1,$
$d_{1,k+(n-n_1)} = c_{1,k}$ if $0 \le k \le n_1,$
$d_{2,k} = c_{2,k}$ if $0 \le k \le n.$

. . .


(i) $[x]_m +_m [y]_m = [x + y]_m$;

(ii) $[x]_m \cdot_m [y]_m = [x \cdot y]_m$.

We also write $x + y \pmod{m}$ for $[x + y]_m$ and $x \cdot y \pmod{m}$ for $[x \cdot y]_m$.

For $m = 4$ we have the four congruence classes, $[0]_4, [1]_4, [2]_4, [3]_4$, which we also (as is common practice, but with slight danger of ambiguity) denote by 0, 1, 2, 3. We then have the following tables for $+$ and $\cdot$ (mod 4).

    +  |  0  1  2  3
    ---+------------
    0  |  0  1  2  3
    1  |  1  2  3  0
    2  |  2  3  0  1
    3  |  3  0  1  2

    ·  |  0  1  2  3
    ---+------------
    0  |  0  0  0  0
    1  |  0  1  2  3
    2  |  0  2  0  2
    3  |  0  3  2  1
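Tables of this kind can be generated mechanically for any modulus. The sketch below represents each class $[x]_m$ by its representative in $0, \ldots, m - 1$ (the same abuse of notation as in the tables above); the helper `zero_divisor_pairs`, which lists pairs of nonzero classes whose product is the zero class, is an addition made for illustration.

```python
def addition_table(m):
    """Row x, column y holds the representative of [x]_m + [y]_m."""
    return [[(x + y) % m for y in range(m)] for x in range(m)]


def multiplication_table(m):
    """Row x, column y holds the representative of [x]_m . [y]_m."""
    return [[(x * y) % m for y in range(m)] for x in range(m)]


def zero_divisor_pairs(m):
    """Pairs (x, y) of nonzero representatives with x*y = 0 (mod m)."""
    return [(x, y) for x in range(1, m) for y in range(1, m)
            if (x * y) % m == 0]


for row in addition_table(4):
    print(row)
for row in multiplication_table(4):
    print(row)
print(zero_divisor_pairs(4), zero_divisor_pairs(5))
```

For $m = 4$ the only such pair is $(2, 2)$, while for $m = 5$ there is none.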

Precisely written, we have, for example,

$[2]_4 +_4 [3]_4 = [1]_4, \quad [2]_4 \cdot_4 [3]_4 = [2]_4.$

Note that in this last we have $[2]_4 \cdot_4 [3]_4 = [2]_4 \cdot_4 [1]_4$, but $[3]_4 \neq [1]_4$; i.e., the cancellation law for multiplication does not hold in this system, hence it is not an integral domain.

4.60 Theorem. Let $m \in I$, $m > 1$. Then $(I_m, +_m, \cdot_m, [0]_m, [1]_m)$ is a commutative ring with unity. It is an integral domain if and only if $m$ is a prime.

Proof. The first part is immediate from 4.55 and the fact that $[0]_m \neq [1]_m$ (otherwise $1 \equiv 0 \pmod{m}$, that is, $m \mid 1$). Suppose that $m$ is prime and that $[x]_m \cdot_m [y]_m = [0]_m$, i.e., by 4.59, $[x \cdot y]_m = [0]_m$, so $x \cdot y \equiv 0 \pmod{m}$. Then $m \mid (x \cdot y)$. By 4.47(ii), $m \mid x$ or $m \mid y$, hence $x \equiv 0 \pmod{m}$ or $y \equiv 0 \pmod{m}$, so $[x]_m = [0]_m$ or $[y]_m = [0]_m$. Conversely, suppose that the system is an integral domain. If $m$ is not prime, we can write $m = a \cdot b$ where 0 . . . 0 and $a_n \neq 0$, and $m = 0$ or $m > 0$ and $b_m \neq 0$, then $n = m$ and $a_i = b_i$ for $0 \le i \le n$.

In particular we should have

(5:1-2) if $\sum_{i=0}^{n} a_i \xi^i = 0$ then $a_i = 0$ for $0 \le i \le n$.

For if $a_i \neq 0$ for some $i$, consider the largest such $i \le n$; call it $k$. Thus $a_i = 0$ for $k < i \le n$. If $k = 0$, presuming (as is natural) $\xi^0 = 1$, we have $a_0 \cdot 1 = 0$ also. If $k > 0$, we apply (5:1-1) with $k$ instead of $n$ and $m = 0$, $b_0 = 0$, giving a contradiction. Conversely, it can be seen that (5:1-2) implies (5:1-1). For example, if $n > m$, we define $b_i = 0$ for $m < i \le n$ (if necessary). Then $\sum_{i=0}^{n} a_i \xi^i = \sum_{i=0}^{n} b_i \xi^i$, hence $\sum_{i=0}^{n} (a_i - b_i) \xi^i = 0$. Application of (5:1-2) then gives the desired result. Of course, all these manipulations with polynomial forms implicitly

involve assumptions about operations $+$, $\cdot$, $-$, which can be defined on them, extending the operations on $D$ (forms of degree zero). In other words, what we should expect is that we are dealing with an integral domain. More explicitly, consider the following:

5.2 Definition. Suppose that $(E, +, \cdot, 0, 1)$ is an integral domain and $\xi \in E$. Then $E$ is said to be a simple extension of $D$ by $\xi$, in symbols $E = D[\xi]$, if the following conditions are satisfied:

(i) $D$ forms a subdomain of $E$;

(ii) for each $\eta \in E$ there are elements $a_0, \ldots, a_n \in D$ with $\eta = \sum_{i=0}^{n} a_i \xi^i$.

$E$ is said to be a simple transcendental extension by $\xi$ if, in addition,

(iii) whenever $a_0, \ldots, a_n \in D$ and $\sum_{i=0}^{n} a_i \xi^i = 0$, then $a_i = 0$ for $0 \le i \le n$.

(The reason for using the term "transcendental" will be explained in Chapter 7.)

Existence and uniqueness of simple transcendental extensions.

Our goal, for the proper use of polynomial forms, will be realized if we prove that any domain $D$ has at least one simple transcendental extension, and that such is unique up to isomorphism. We first consider, however, some properties that apply to any simple extension. To make certain manipulations formally easier, we use the following:

5.3

Definition. Suppose that $E = D[\xi]$ is a simple extension of $D$. Suppose that $(a_0, \ldots, a_i, \ldots)$ is an infinite sequence of elements of $D$ for which there is an $n \ge 0$ with $a_i = 0$ for all $i > n$. We shall call such a sequence essentially finite. With any such sequence we associate the element $\sum_{i=0}^{\infty} a_i \xi^i$, defined to be $\sum_{i=0}^{n} a_i \xi^i$.

It is easily seen that this definition of $\sum_{i=0}^{\infty} a_i \xi^i$ does not depend on the choice of $n$ with $a_i = 0$ for all $i > n$.

5.4 Lemma. Suppose that $E = D[\xi]$ is a simple extension of $D$. Suppose that $(a_0, \ldots, a_i, \ldots)$ and $(b_0, \ldots, b_i, \ldots)$ are two essentially finite sequences of elements of $D$. Then there are essentially finite sequences $(c_0, \ldots, c_i, \ldots)$, $(d_0, \ldots, d_i, \ldots)$ of elements of $D$ for which:

(i) $\sum_{i=0}^{\infty} a_i \xi^i + \sum_{i=0}^{\infty} b_i \xi^i = \sum_{i=0}^{\infty} c_i \xi^i$, given by $c_i = a_i + b_i$ for all $i$, and

(ii) $\left(\sum_{i=0}^{\infty} a_i \xi^i\right) \cdot \left(\sum_{i=0}^{\infty} b_i \xi^i\right) = \sum_{i=0}^{\infty} d_i \xi^i$, given by $d_i = \sum_{j=0}^{i} a_j b_{i-j}$.

Proof. (i) For some $k, l$ we have $a_i = 0$ for $i > k$, $b_i = 0$ for $i > l$. Let $n$ be the maximum of $k$ and $l$. Then

$\sum_{i=0}^{\infty} a_i \xi^i = \sum_{i=0}^{n} a_i \xi^i$ and $\sum_{i=0}^{\infty} b_i \xi^i = \sum_{i=0}^{n} b_i \xi^i,$

and

$\sum_{i=0}^{n} a_i \xi^i + \sum_{i=0}^{n} b_i \xi^i = \sum_{i=0}^{n} (a_i \xi^i + b_i \xi^i)$, by 3.45(i),
$= \sum_{i=0}^{n} (a_i + b_i) \xi^i$
$= \sum_{i=0}^{n} c_i \xi^i = \sum_{i=0}^{\infty} c_i \xi^i,$

if we define $c_i$ as above. The proof of (ii) is left as an exercise to the student.

Sometimes it is also convenient to think of the formula for $d_i$ as being given by $d_i = \sum_{j+k=i} a_j b_k$. If the reader tries to write down corresponding laws for the sum and product of $\sum_{i=0}^{n} a_i \xi^i$, $\sum_{i=0}^{m} b_i \xi^i$ he will see the economy of the above formulation. However, for purposes of specific computations it is, of course, preferable to use these more usual forms. For example, we have

(5:1-3) $\left(\sum_{i=0}^{n} a_i \xi^i\right) \cdot \left(\sum_{i=0}^{m} b_i \xi^i\right) = \sum_{i=0}^{n+m} d_i \xi^i,$

where, in particular,

(5:1-4) $d_{n+m} = a_n b_m.$
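The rules of 5.4 for the coefficient sequences are easy to experiment with. In the sketch below a finite Python list `[a_0, ..., a_n]` stands for the essentially finite sequence obtained by appending zeros, and the integers play the role of $D$; both choices, and the names `add` and `multiply`, are assumptions of the example.

```python
def add(a, b):
    """Coefficientwise sum: c_i = a_i + b_i, after padding with zeros."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    return [a[i] + b[i] for i in range(n)]


def multiply(a, b):
    """Product coefficients: d_i is the sum of a_j * b_k over j + k = i.
    Both lists are assumed to be nonempty."""
    d = [0] * (len(a) + len(b) - 1)
    for j, aj in enumerate(a):
        for k, bk in enumerate(b):
            d[j + k] += aj * bk
    return d


a = [1, 2]        # 1 + 2*xi
b = [3, 0, 4]     # 3 + 4*xi^2
d = multiply(a, b)
print(add(a, b), d, d[-1] == a[-1] * b[-1])
```

The last entry of the product list is $d_{n+m} = a_n b_m$, as in (5:1-4).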

In fact, this follows from 5.4(ii). We extend the given finite sequences $(a_0, \ldots, a_n)$, $(b_0, \ldots, b_m)$ to essentially finite sequences by making $a_i = 0$ for $i > n$, $b_i = 0$ for $i > m$. Then we see that if $i > n + m$ then $d_i = \sum_{j=0}^{i} a_j b_{i-j} = 0$, for if $0 \le j \le i$, either $n < j$ or $m < i - j$, so that in any case $a_j b_{i-j} = 0$. On the other hand,

$d_{n+m} = \sum_{j=0}^{n+m} a_j b_{(n+m)-j} = a_n b_m,$

since if $0 \le j < n$ then $m < (n + m) - j$, so that $b_{(n+m)-j} = 0$, while if $n < j \le n + m$, we have $a_j = 0$.

5.5 Lemma. Suppose that $E = D[\xi]$ is a simple transcendental extension of $D$. Suppose that $(a_0, \ldots, a_i, \ldots)$, $(b_0, \ldots, b_i, \ldots)$ are two essentially finite sequences of elements of $D$. Then

$\sum_{i=0}^{\infty} a_i \xi^i = \sum_{i=0}^{\infty} b_i \xi^i$ if and only if $a_i = b_i$ for all $i$.

Proof. If the two sums are equal, then $\sum_{i=0}^{\infty} a_i \xi^i + (-1) \sum_{i=0}^{\infty} b_i \xi^i = 0$. Applying 5.4(ii) to this special case shows that $(-1) \sum_{i=0}^{\infty} b_i \xi^i = \sum_{i=0}^{\infty} (-b_i) \xi^i$. Then by 5.4(i) $\sum_{i=0}^{\infty} (a_i - b_i) \xi^i = 0$; furthermore, there is an $n$ with $a_i - b_i = 0$ and hence $a_i = b_i$ for all $i > n$. It follows that $\sum_{i=0}^{n} (a_i - b_i) \xi^i = \sum_{i=0}^{\infty} (a_i - b_i) \xi^i = 0$. Then by our definition 5.2 of transcendental extension, also $a_i - b_i = 0$ for all $i \le n$. Hence $a_i = b_i$ for all $i$.

5.6 Theorem. Suppose that $E = D[\xi]$, $E' = D[\xi']$ are two simple transcendental extensions of $D$. Then $E \cong E'$. We can choose the isomorphic mapping $F$ so that $F(a) = a$ for each $a \in D$ and $F(\xi) = \xi'$.

Proof. The domains $(E, +, \cdot, 0, 1)$ and $(E', \oplus, \circ, 0, 1)$ both contain $D$ and have operations agreeing with those on $D$ when applied to elements of $D$. However, they need not agree in any respect otherwise. We shall write $\bigoplus$ for sums in $E'$. For each $\eta \in E$ we can find an essentially finite sequence $(a_0, \ldots, a_i, \ldots)$ of elements of $D$ with

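(1) $\eta = \sum_{i=0}^{\infty} a_i \xi^i.$

By 5.5, the sequence of $a_i$ is uniquely determined by $\eta$. Define $F$ by

(2) $F(\eta) = \bigoplus_{i=0}^{\infty} a_i (\xi')^i.$

Then it is clear that

(3) $\mathcal{D}(F) = E$, $\mathcal{R}(F) = E'$, and $F$ is one-to-one,

the last again by 5.5. Suppose that $\eta, \zeta \in E$, $\eta$ given as in (1), and $\zeta = \sum_{i=0}^{\infty} b_i \xi^i$. Then

(4) $F(\eta + \zeta) = F(\eta) \oplus F(\zeta), \quad F(\eta \cdot \zeta) = F(\eta) \circ F(\zeta),$

because the same rules 5.4(i), (ii) for calculating $+$, $\cdot$ on infinite sums hold in $E$ as hold for $\oplus$, $\circ$ in $E'$. For example, in the first case we have

$F(\eta + \zeta) = F\left(\sum_{i=0}^{\infty} c_i \xi^i\right),$ where $c_i = a_i + b_i$;

thus

$F(\eta + \zeta) = \bigoplus_{i=0}^{\infty} c_i (\xi')^i,$ where $c_i = a_i \oplus b_i$,

since $a_i, b_i \in D$, and finally

$F(\eta + \zeta) = \left(\bigoplus_{i=0}^{\infty} a_i (\xi')^i\right) \oplus \left(\bigoplus_{i=0}^{\infty} b_i (\xi')^i\right).$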

If $a \in D$ we have $a = \sum_{i=0}^{\infty} a_i \xi^i$ where $a_0 = a$, $a_i = 0$ for $i > 0$, hence

(5) $F(a) = a,$

in particular

(6) $F(0) = 0, \quad F(1) = 1.$

Finally, we have $\xi = \sum_{i=0}^{\infty} a_i \xi^i$ where $a_1 = 1$, $a_i = 0$ for $i \neq 1$, so

(7) $F(\xi) = \xi'.$

Thus our theorem is proved. It can be seen that the conditions that $F$ be an isomorphism satisfying (5) and (7) in fact uniquely determine $F(\eta)$ for all $\eta \in E$ to be given as in (2).

Having the unicity of simple transcendental extensions up to $\cong$, the only thing we need prove now, in order to make these play the same role as symbolic polynomial expressions, is an existence theorem. The answer to the question as to what should serve as the elements of such an extension is suggested immediately by 5.5: ordinary infinite sequences $(a_0, \ldots, a_i, \ldots)$, $(b_0, \ldots, b_i, \ldots)$ are objects such that if $(a_0, \ldots, a_i, \ldots) = (b_0, \ldots, b_i, \ldots)$ then $a_i = b_i$ for all $i$. A definition of sum and product of two such sequences is simply obtained by imitating 5.4(i), (ii). Finally, we can identify each $a \in D$ with the sequence $(a, 0, 0, \ldots, 0, \ldots)$ and $\xi$ with the sequence $(0, 1, 0, \ldots, 0, \ldots)$.

5.7 Theorem. For each integral domain $D$ there exists a simple transcendental extension $E = D[\xi]$.

Proof. We first construct directly a domain $(E, +, \cdot, 0, 1)$ which is a simple transcendental extension by a certain element $\xi$ of a domain $D'$ isomorphic to $D$. We define

(1)

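$E$ = the set of all essentially finite sequences $(a_0, \ldots, a_i, \ldots)$ of elements of $D$.

For $(a_0, \ldots, a_i, \ldots)$, $(b_0, \ldots, b_i, \ldots) \in E$ we put

(2) $(a_0, \ldots, a_i, \ldots) + (b_0, \ldots, b_i, \ldots) = (c_0, \ldots, c_i, \ldots)$ where $c_i = a_i + b_i$ for all $i$,

(3) $(a_0, \ldots, a_i, \ldots) \cdot (b_0, \ldots, b_i, \ldots) = (d_0, \ldots, d_i, \ldots)$ where $d_i = \sum_{j=0}^{i} a_j b_{i-j}$ for each $i$.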

That $E$ is closed under $+$ is obvious, and that it is closed under $\cdot$ is easily verified by the same argument as given for (5:1-3) following 5.4. We define a function $F$ which will map $D$ isomorphically into $E$ by (4)

F(a) = (a, 0, 0, D =