Algebra and Number Theory: A Selection of Highlights 9783110516142, 9783110515848

This two-volume set collects and presents some fundamentals of mathematics in an entertaining and performing manner. The


English Pages 342 Year 2017

Algebra and Number Theory: A Selection of Highlights
9783110516142, 9783110515848

Author / Uploaded
Benjamin Fine
Anthony Gaglione
Anja Moldenhauer
Gerhard Rosenberger
Dennis Spellman

Table of contents :
Preface
Contents
1. The natural, integral and rational numbers
2. Division and factorization in the integers
3. Modular arithmetic
4. Exceptional numbers
5. Pythagorean triples and sums of squares
6. Polynomials and unique factorization
7. Field extensions and splitting fields
8. Permutations and symmetric polynomials
9. Real numbers
10. The complex numbers, the Fundamental Theorem of Algebra and polynomial equations
11. Quadratic number fields and Pell’s equation
12. Transcendental numbers and the numbers e and π
13. Compass and straightedge constructions and the classical problems
14. Euclidean vector spaces
Bibliography
Index


Benjamin Fine, Anthony Gaglione, Anja Moldenhauer, Gerhard Rosenberger, Dennis Spellman Algebra and Number Theory De Gruyter Textbook


Mathematics Subject Classification 2010: 00-01, 00A06, 11-01, 12-01

Authors

Prof. Dr. Benjamin Fine Fairfield University Department of Mathematics 1073 North Benson Road Fairfield, CT 06430 USA

Prof. Dr. Gerhard Rosenberger University of Hamburg Department of Mathematics Bundesstr. 55 20146 Hamburg Germany

Prof. Dr. Anthony Gaglione United States Naval Academy Department of Mathematics 212 Blake Road Annapolis, MD 21401 USA

Prof. Dr. Dennis Spellman Temple University Department of Mathematics 1801 N Broad Street Philadelphia, PA 19122 USA

Dr. Anja Moldenhauer University of Hamburg Department of Mathematics Bundesstr. 55 20146 Hamburg Germany

ISBN 978-3-11-051584-8 e-ISBN (PDF) 978-3-11-051614-2 e-ISBN (EPUB) 978-3-11-051626-5 Library of Congress Cataloging-in-Publication Data A CIP catalog record for this book has been applied for at the Library of Congress. Bibliographic information published by the Deutsche Nationalbibliothek The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.dnb.de. © 2017 Walter de Gruyter GmbH, Berlin/Boston Typesetting: VTeX UAB, Lithuania Printing and binding: CPI books GmbH, Leck Cover image: agsandrew / iStock / Getty Images Plus ♾ Printed on acid-free paper Printed in Germany www.degruyter.com

Preface

To many students, as well as to many teachers, mathematics seems like a mundane discipline, filled with rules and algorithms and devoid of beauty and art. However, to someone who truly digs deeply into mathematics this is quite far from the truth. The world of mathematics is populated with true gems: results that both astound and point to a unity in both the world and a seemingly chaotic subject. These gems and their surprising results are often used to point to the existence of a force governing the universe; that is, they point to a higher power. Euler's magic formula, $e^{i\pi} + 1 = 0$, which we go over and prove in this book, is often cited as a proof of the existence of God. To someone seeing this statement for the first time it might seem outlandish; however, if one delves into how this result is generated naturally from such a disparate collection of numbers, it does not seem so strange to attribute to it a certain mystical significance. Unfortunately, most students of mathematics only see bits and pieces of this amazing discipline.

In this book, which we call Algebra and Number Theory, we introduce and examine many of these exciting results. We planned this book to be used in courses for teachers and for the general mathematically interested reader, so it is somewhere between a textbook and a collection of results. We examine these mathematical gems and also their proofs, developing whatever mathematical results and techniques we need along the way. In Germany and the United States we see the book as a Masters Level Book for prospective teachers. With the increasing demand for education in the STEM subjects, there is the realization that to get better teaching in mathematics, the prospective teachers must both be more knowledgeable in mathematics and excited about the subject. The courses in teacher preparation do not touch many of these results that make the discipline so exciting. This book is intended to address this issue.
The first volume is on Algebra and Number Theory. We touch on numbers and number systems, polynomials and polynomial equations, geometry and geometric constructions. These parts are somewhat independent so a professor can pick and choose the areas to concentrate on. Much more material is included than can be covered in a single course. We prove all relevant results that are not so technical or complicated as to scare the students. We find that mathematics is also tied to its history so we include many historical comments. We try to introduce all that is necessary; however, we do presuppose certain subjects from school and undergraduate mathematics. These include basic knowledge in algebra, geometry and calculus as well as some knowledge of matrices and linear equations. Beyond these the book is self-contained.

This first volume of two is called Algebra and Number Theory. There are fourteen chapters and we think we have introduced a very wide collection of results of the type that we have alluded to above. In Chapters 1–5 we look at highlights on the integers. We examine unique factorization and modular arithmetic and related ideas. We show how these become critical components of modern cryptography, especially public key cryptographic methods such as RSA. Three of the authors (Fine, Moldenhauer and Rosenberger) work partly as cryptographers so cryptography is mentioned and explained in several places. In Chapters 4 and 5 we look at exceptional classes of integers such as the Fibonacci numbers as well as the Fermat numbers, Mersenne numbers, perfect numbers and Pythagorean triples. We explain the golden section as well as expressing integers as sums of squares.

In Chapters 6–8 we look at results involving polynomials and polynomial equations. We explain field extensions at an understandable level and then prove the insolvability of the quintic and beyond. The insolvability of the quintic in general is one of the important results of modern mathematics. In Chapters 9–12 we look at highlights from the real and complex numbers leading eventually to an explanation and proof of the Fundamental Theorem of Algebra. Along the way we consider the amazing properties of the numbers e and π and prove in detail that these two numbers are transcendental. Chapter 13 is concerned with the classical problem of geometric constructions and uses the material we developed on field extensions to prove the impossibility of certain constructions. Finally, in Chapter 14 we look at Euclidean vector spaces. We give several geometric applications and look, for instance, at a secret sharing protocol using the closest vector theorem.

We would like to thank the people who were involved in the preparation of the manuscript. Their dedicated participation in translating and proofreading is gratefully acknowledged. In particular, we have to mention Anja Rosenberger, Annika Schürenberg and the many students who have taken the respective courses in Dortmund, Fairfield and Hamburg. Those mathematical, stylistic, and orthographic errors that undoubtedly remain shall be charged to the authors. Last but not least, we thank de Gruyter for publishing our book.
Benjamin Fine Anthony Gaglione Anja Moldenhauer Gerhard Rosenberger Dennis Spellman

Contents

Preface

1 The natural, integral and rational numbers
1.1 Number theory and axiomatic systems
1.2 The natural numbers and induction
1.3 The integers ℤ
1.4 The rational numbers ℚ
1.5 The absolute value in ℕ, ℤ and ℚ

2 Division and factorization in the integers
2.1 The Fundamental Theorem of Arithmetic
2.2 The division algorithm and the greatest common divisor
2.3 The Euclidean algorithm
2.4 Least common multiples
2.5 General gcd’s and lcm’s

3 Modular arithmetic
3.1 The ring of integers modulo n
3.2 Units and the Euler φ-function
3.3 RSA cryptosystem
3.4 The Chinese Remainder Theorem
3.5 Quadratic residues

4 Exceptional numbers
4.1 The Fibonacci numbers
4.1.1 The golden rectangle
4.1.2 Squares in semicircles
4.1.3 Side length of a regular 10-gon
4.1.4 Construction of the golden section α with compass and straightedge from a given a ∈ ℝ, a > 0
4.2 Perfect numbers and Mersenne numbers
4.3 Fermat numbers

5 Pythagorean triples and sums of squares
5.1 The Pythagorean Theorem
5.2 Classification of the Pythagorean triples
5.3 Sum of squares

6 Polynomials and unique factorization
6.1 Polynomials over a ring
6.2 Divisibility in rings
6.3 The ring of polynomials over a field K
6.3.1 The division algorithm for polynomials
6.3.2 Zeros of polynomials
6.4 Horner-Scheme
6.5 The Euclidean algorithm and greatest common divisor of polynomials over fields
6.5.1 The Euclidean algorithm for K[x]
6.5.2 Unique factorization of polynomials in K[x]
6.5.3 General unique factorization domains
6.6 Polynomial interpolation and the Shamir secret sharing scheme
6.6.1 Secret sharing
6.6.2 Polynomial interpolation over a field K
6.6.3 The Shamir secret sharing scheme

7 Field extensions and splitting fields
7.1 Fields, subfield and characteristic
7.2 Field extensions
7.3 Finite and algebraic field extensions
7.3.1 Finite fields
7.4 Splitting fields

8 Permutations and symmetric polynomials
8.1 Permutations
8.2 Cycle decomposition of a permutation
8.2.1 Conjugate elements in Sn
8.2.2 Marshall Hall’s Theorem
8.3 Symmetric polynomials

9 Real numbers
9.1 The real number system
9.2 Decimal representation of real numbers
9.3 Periodic decimal numbers and the rational number
9.4 The uncountability of ℝ
9.5 Continued fraction representation of real numbers
9.6 Theorem of Dirichlet and Cauchy’s Inequality
9.7 p-adic numbers
9.7.1 Normed fields and Cauchy completions
9.7.2 The p-adic fields
9.7.3 The p-adic norm
9.7.4 The construction of ℚp
9.7.5 Ostrowski’s theorem

10 The complex numbers, the Fundamental Theorem of Algebra and polynomial equations
10.1 The field ℂ of complex numbers
10.2 The complex plane
10.2.1 Geometric interpretation of complex operations
10.2.2 Polar form and Euler’s identity
10.2.3 Other constructions of ℂ
10.2.4 The Gaussian integers
10.3 The Fundamental Theorem of Algebra
10.3.1 First proof of the Fundamental Theorem of Algebra
10.3.2 Second proof of the Fundamental Theorem of Algebra
10.4 Solving polynomial equations in terms of radicals
10.5 Skew field extensions of ℂ and Frobenius’s Theorem

11 Quadratic number fields and Pell’s equation
11.1 Algebraic extensions of ℚ
11.2 Algebraic and transcendental numbers
11.3 Discriminant and norm
11.4 Algebraic integers
11.4.1 The ring of algebraic integers
11.5 Integral bases
11.6 Quadratic fields and quadratic integers

12 Transcendental numbers and the numbers e and π
12.1 The numbers e and π
12.1.1 Calculation e of π
12.2 The irrationality of e and π
12.3 e and π throughout mathematics
12.3.1 The normal distribution
12.3.2 The Gamma Function and Stirling’s approximation
12.3.3 The Wallis Product Formula
12.4 Existence of a transcendental number
12.5 The transcendence of e and π
12.6 An amazing property of π and a connection to prime numbers

13 Compass and straightedge constructions and the classical problems
13.1 Historical remarks
13.2 Geometric constructions
13.3 Four classical construction problems
13.3.1 Squaring the circle (problem of Anaxagoras 500–428 BC)
13.3.2 The doubling of the cube or the problem from Deli
13.3.3 The trisection of an angle
13.3.4 Construction of a regular n-gon

14 Euclidean vector spaces
14.1 Length and angle
14.2 Orthogonality and Applications in ℝ² and ℝ³
14.3 Orthonormalization and closest vector
14.4 Polynomial approximation
14.5 Secret sharing scheme using the closest vector theorem

Bibliography
Index

1 The natural, integral and rational numbers

1.1 Number theory and axiomatic systems

Number theory begins as the study of the whole numbers or counting numbers. Formally the counting numbers 1, 2, … are called the natural numbers and denoted by ℕ. If we add to this the number zero, denoted by 0, and the negative whole numbers we get a more comprehensive system called the integers, which we denote by ℤ. The focus of this book is on important and sometimes surprising results in number theory and then further results in algebra. Many results in number theory, as we shall see, seem like magic. In order to rigorously prove these results we place the whole theory in an axiomatic setting which we now explain.

In mathematics, when developing a concept or a theory, it is often not possible to prove all the terms, properties or claims that are used, especially the existence of some mathematical fundamentals. One can then solve this problem by an axiomatic approach. The basis of a theory is then a system of axioms:
– Certain objects and certain properties of these objects are taken as given and accepted.
– A selection of statements (the axioms) are considered by definition as true and evident.

A theorem in the theory is then a true statement whose truth can be proved from the axioms with the help of true implications. A system of axioms is consistent if one cannot prove a statement of the form “A and not A”. Verifying consistency is in individual cases often a complicated or even an unsolvable problem. We are satisfied if we can exhibit a model for the system of axioms, that is, a system of concrete objects which satisfy all the given axioms. A system of axioms is called categorical if essentially there exists only one model. By this we mean that for any two models we always get from one model to the other by renaming the objects. If this is true then we have an axiomatic characterization of the model. In the next section we introduce the natural numbers axiomatically.

1.2 The natural numbers and induction

The natural numbers $\mathbb{N}$ are presented by the system of axioms developed by G. Peano (1858–1932). The set $\mathbb{N}$ of the natural numbers is described by the following axioms:
(ℕ 1) $1 \in \mathbb{N}$.
(ℕ 2) Each $a \in \mathbb{N}$ has exactly one successor $a^+ \in \mathbb{N}$.
(ℕ 3) Always $a^+ \neq 1$, and for each $b \neq 1$ there exists an $a \in \mathbb{N}$ with $b = a^+$.
(ℕ 4) $a \neq b \Rightarrow a^+ \neq b^+$.
(ℕ 5) If $T \subset \mathbb{N}$, $1 \in T$, and if together with $a \in T$ also $a^+ \in T$, then $T = \mathbb{N}$. (Axiom of mathematical induction or just induction.)

Remarks 1.1. (1) (ℕ 2) and (ℕ 4) mean that the map $\sigma\colon \mathbb{N} \to \mathbb{N}$, $a \mapsto a^+$, is injective.
(2) From the Peano axioms we get by definition an addition, a multiplication and an ordering for $\mathbb{N}$:
(i) $a + 1 := a^+$, $a + b^+ := (a + b)^+$,
(ii) $a \cdot 1 := a$, $a \cdot b^+ := ab + a$,
(iii) $a < b :\Leftrightarrow \exists\, x \in \mathbb{N}$ with $a + x = b$ (“a smaller than b”), $a \le b :\Leftrightarrow a = b$ or $a < b$ (“a equal or smaller than b”).

We need to recall some definitions. A semigroup is a set $H \neq \emptyset$ together with a binary operation $\cdot\colon H \times H \to H$ that satisfies the associative property for all $a, b, c \in H$: $(a \cdot b) \cdot c = a \cdot (b \cdot c)$. The semigroup is commutative if $a \cdot b = b \cdot a$ for all $a, b \in H$. In the commutative case we often write the operation as addition $+$ instead of multiplication $\cdot$. A monoid $S$ is a semigroup with a unity element $e$, that is, an element $e$ with $a \cdot e = a = e \cdot a$ for all $a \in S$; $e$ is uniquely determined. Moreover, a monoid $S$ is called a group if for each $a \in S$ there exists an inverse element $a^{-1} \in S$ with $a a^{-1} = a^{-1} a = e$. The monoid or group is called commutative or abelian if in addition $a \cdot b = b \cdot a$ for all $a, b \in S$. We often write 1 instead of $e$. We also often drop $\cdot$ and use just juxtaposition for this operation. If we use the addition $+$ we often write 0 instead of $e$ and call 0 the zero element of $S$.

Theorem 1.2. (1) The addition for $\mathbb{N}$ is associative, that is, $a + (b + c) = (a + b) + c$,

and commutative, that is, $a + b = b + a$. This means that $\mathbb{N}$ is a commutative semigroup with respect to the addition.
(2) The multiplication for $\mathbb{N}$ is associative, that is, $a(bc) = (ab)c$, and commutative, that is, $ab = ba$. $\mathbb{N}$ also has the unity element 1 for the multiplication. Therefore, $\mathbb{N}$ is a commutative monoid with respect to the multiplication.
(3) The multiplication is distributive with respect to the addition, that is, $(a + b)c = ac + bc$.
(4) For $a, b \in \mathbb{N}$ exactly one of the following is true: $a < b$, $a = b$ or $b < a$.
(5) If $a \le b$ and $c \le d$ then $a + c \le b + d$ and $ac \le bd$.

Proof. The statements follow directly from the definition and the Peano axioms. We leave the proofs as an exercise. As an example we prove (3) using (1) and (2): Let $a, b \in \mathbb{N}$ be arbitrary and $T \subset \mathbb{N}$ the set of the $c \in \mathbb{N}$ with $(a + b)c = ac + bc$. We have $1 \in T$ because $(a + b) \cdot 1 = a + b = a \cdot 1 + b \cdot 1$. Now, let $c \in T$. Then
$$(a + b)c^+ = (a + b)c + (a + b) = ac + bc + a + b = ac + a + bc + b = ac^+ + bc^+ .$$
Hence $c^+ \in T$ and so $T = \mathbb{N}$.

As usual we write $a^n$ for $\underbrace{a \cdot a \cdots a}_{n \text{ times}}$ and $na$ for $\underbrace{a + a + \cdots + a}_{n \text{ times}}$, when $a, n \in \mathbb{N}$.
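The recursive definitions of addition, multiplication and ordering from Remarks 1.1 can be sketched in code. The following Python sketch is only an illustration and not part of the axiomatic development: it assumes that natural numbers are modelled by Python integers n ≥ 1, and the names `succ`, `pred`, `add`, `mul` and `less` are hypothetical helpers introduced here.

```python
# Assumption: a natural number is modelled by a Python int >= 1,
# and succ plays the role of the successor map a -> a+.
def succ(a: int) -> int:
    return a + 1


def pred(b: int) -> int:
    # By (N 3), every b != 1 is the successor of some a.
    if b == 1:
        raise ValueError("1 is not a successor in N")
    return b - 1


def add(a: int, b: int) -> int:
    # a + 1 := a+   and   a + b+ := (a + b)+
    if b == 1:
        return succ(a)
    return succ(add(a, pred(b)))


def mul(a: int, b: int) -> int:
    # a * 1 := a   and   a * b+ := a*b + a
    if b == 1:
        return a
    return add(mul(a, pred(b)), a)


def less(a: int, b: int) -> bool:
    # a < b  :<=>  there is an x in N with a + x = b
    return any(add(a, x) == b for x in range(1, b))
```

For small inputs one can, for instance, spot-check the distributive law of Theorem 1.2 (3) by comparing `mul(add(a, b), c)` with `add(mul(a, c), mul(b, c))`. Such a check illustrates the statement but does not replace the induction proof.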

Remarks 1.3. (1) By the development of the addition in $\mathbb{N}$ we suggest the usual representation of natural numbers as numerals: $2 = 1^+ = 1 + 1$, $3 = 2^+ = 2 + 1$, $4 = 3^+ = 3 + 1$ and so on.

(2) From the Peano axioms we also get that for each natural number $n$ there exists exactly one natural number $m$ with $m \le n < m + 1$. The set $\mathbb{N}$ is therefore unbounded from above.

(3) Theorem 1.2 also allows us to subtract smaller natural numbers from larger ones. If $a, b \in \mathbb{N}$ with $a < b$, then there is an $x \in \mathbb{N}$ with $a + x = b$. We define the subtraction by $x := b - a$ and say “x is equal to b minus a”.

Example 1.4. $3 = 11 - 8 = 17 - 14$, $31 = 50 - 19$.

(4) The proof technique of mathematical induction is based on the Peano axiom (ℕ 5). It is a form of direct proof, and it is done in two steps. The first step, known as the base case, is to prove the given statement $A(n)$, which is defined for all $n \in \mathbb{N}$, for the first natural number 1. The second step, known as the induction step, is to prove that if the statement $A(n)$ is true for a natural number $n$, then it is also true for the next natural number $n + 1$. In other words, if $A(1)$ is true and if we can show for any $n$ that the assumption that $A(n)$ is true implies that $A(n + 1)$ is true, then $A(n)$ is true for all $n \in \mathbb{N}$. We call mathematical induction the first induction principle or the principle of mathematical induction (PMI). It is clear that we may start the mathematical induction with any natural number $n_0 > 1$ instead of 1; we just need a base. This can be done with the approach $B(n) := A(n_0 - 1 + n)$.

Examples 1.5. (1) Claim.
$$\sum_{k=1}^{n} k = \frac{n(n+1)}{2} \quad \text{for all } n \in \mathbb{N}.$$
Proof. Let $A(n)$, $n \in \mathbb{N}$, be the asserted statement.
(a) $A(1)$ is true because
$$\sum_{k=1}^{1} k = 1 = \frac{1(1+1)}{2}.$$
(b) Assume that $A(n)$ is true for $n \in \mathbb{N}$. We have to show that $A(n + 1)$ is true:
$$\sum_{k=1}^{n+1} k = \sum_{k=1}^{n} k + (n + 1) = \frac{n(n+1)}{2} + (n + 1) = \frac{(n+1)(n+2)}{2},$$
and this is $A(n + 1)$.
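The two steps of the induction proof above can be mirrored numerically. The following Python sketch is a sanity check under the assumption that testing finitely many n is only an illustration, never a proof; the function names are hypothetical.

```python
def sum_first(n: int) -> int:
    # Left-hand side: 1 + 2 + ... + n, computed term by term.
    return sum(range(1, n + 1))


def closed_form(n: int) -> int:
    # Right-hand side: n(n + 1)/2 (the product n(n + 1) is always even).
    return n * (n + 1) // 2


def induction_step_holds(n: int) -> bool:
    # Mirrors step (b): adding (n + 1) to the closed form for n
    # gives the closed form for n + 1.
    return closed_form(n) + (n + 1) == closed_form(n + 1)
```

Calling `sum_first(n) == closed_form(n)` for a range of values of n checks the claim on those values; `induction_step_holds(n)` checks the algebra used in the induction step.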

We remark that the n-th triangular number $T_n$, $n \in \mathbb{N}$, is defined as $\frac{n(n+1)}{2}$. Geometrically $T_n$ is the number of dots composing a regular triangle with $n$ dots on a side, see Figure 1.1.

Figure 1.1: Geometrical representation of the triangular numbers $T_1$, $T_2$, $T_3$ and $T_4$.

From this geometrical representation (see Figure 1.1) we see, by completing the triangle to a square, that $T_{n-1} + T_n = n^2$. This gives directly by induction
$$T_1 + T_2 + T_3 + \cdots + T_{n-1} + T_n = \frac{n(n+1)(n+2)}{6} \quad \text{for } n \ge 2.$$
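Both identities for the triangular numbers can be spot-checked with a few lines of Python. This is a hedged illustration only; the helper names `triangular`, `square_identity_holds` and `sum_identity_holds` are assumptions made for this sketch.

```python
def triangular(n: int) -> int:
    # T_n = n(n + 1)/2
    return n * (n + 1) // 2


def square_identity_holds(n: int) -> bool:
    # T_{n-1} + T_n = n^2, checked for n >= 2
    return triangular(n - 1) + triangular(n) == n * n


def sum_identity_holds(n: int) -> bool:
    # T_1 + T_2 + ... + T_n = n(n + 1)(n + 2)/6
    total = sum(triangular(k) for k in range(1, n + 1))
    return total == n * (n + 1) * (n + 2) // 6
```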

Concerning the triangular numbers there is a nice story from the time when Gauss got his first lessons in arithmetic. He could solve an exercise, which was considered very time-consuming, in a very short time. The children had to calculate the sum of all numbers from 1 to 100. Gauss realized the pattern:
$$1 + 100 = 101, \quad 2 + 99 = 101, \quad 3 + 98 = 101, \quad \ldots, \quad 50 + 51 = 101.$$
So he had 50 pairs of numbers, and each pair results in the sum 101. It is clear that this adds up to $50 \cdot 101 = 5\,050$.

In an analogous manner we may show that
$$\sum_{k=1}^{n} k^2 = \frac{n(n+1)(2n+1)}{6}$$
and
$$\sum_{k=1}^{n} k^3 = \frac{n^2 (n+1)^2}{4} = T_n^2$$
for $n \in \mathbb{N}$.
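Gauss's pairing argument and the closed forms for the sums of squares and cubes above can be checked numerically. The sketch below is an illustration under the usual caveat that finitely many test values do not constitute a proof; all names are hypothetical.

```python
def triangular(n: int) -> int:
    # T_n = n(n + 1)/2
    return n * (n + 1) // 2


def sum_of_powers(n: int, k: int) -> int:
    # 1^k + 2^k + ... + n^k
    return sum(i ** k for i in range(1, n + 1))


# Gauss's pairing: 1 + 100, 2 + 99, ..., 50 + 51.
gauss_pairs = [(k, 101 - k) for k in range(1, 51)]
gauss_total = sum(a + b for a, b in gauss_pairs)


def squares_formula_holds(n: int) -> bool:
    return sum_of_powers(n, 2) == n * (n + 1) * (2 * n + 1) // 6


def cubes_formula_holds(n: int) -> bool:
    # The sum of cubes equals n^2 (n + 1)^2 / 4, which is T_n squared.
    s = sum_of_powers(n, 3)
    return s == n ** 2 * (n + 1) ** 2 // 4 and s == triangular(n) ** 2
```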

There is a beautiful general formula by Al-Haitam (965–1038):
$$(n + 1) \sum_{i=1}^{n} i^k = \sum_{i=1}^{n} i^{k+1} + \sum_{p=1}^{n} \left( \sum_{i=1}^{p} i^k \right).$$
We present a geometrical proof, see Figure 1.2, which is still remarkable.

Figure 1.2: Geometrical proof of the formula by Al-Haitam.
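As a complement to the geometrical proof, the formula by Al-Haitam can be tested for small n and k. The following Python sketch is an assumption-laden illustration (the names `power_sum` and `al_haitam_holds` are hypothetical), not a proof.

```python
def power_sum(n: int, k: int) -> int:
    # 1^k + 2^k + ... + n^k
    return sum(i ** k for i in range(1, n + 1))


def al_haitam_holds(n: int, k: int) -> bool:
    # Left-hand side: (n + 1) * sum_{i=1}^{n} i^k
    lhs = (n + 1) * power_sum(n, k)
    # Right-hand side: sum_{i=1}^{n} i^(k+1) + sum_{p=1}^{n} sum_{i=1}^{p} i^k
    rhs = power_sum(n, k + 1) + sum(power_sum(p, k) for p in range(1, n + 1))
    return lhs == rhs
```

For example, with n = 2 and k = 1 the left-hand side is 3 · (1 + 2) = 9 and the right-hand side is (1 + 4) + (1 + 3) = 9.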

Recall that $1^{k+1} = 1^k$, $2 \cdot 2^k = 2^{k+1}$, $3 \cdot 3^k = 3^{k+1}$, ….

(2) Claim. $2^n > n^2$ for all $n \ge 5$.

Proof. Let $A(n)$, $n \ge 5$, be the asserted statement.
(a) $A(5)$ is true because $2^5 = 32 > 25 = 5^2$.
(b) Assume that $A(n)$ is true for $n \ge 5$. Then
$$2^{n+1} = 2 \cdot 2^n > 2 \cdot n^2 = n^2 + n^2 > n^2 + 2n + 1 = (n + 1)^2,$$
because $n^2 > 2n + 1$ for $n \ge 3$ (which is left as an exercise). Hence $A(n)$ is true for all $n \in \mathbb{N}$ with $n \ge 5$.

(3) A polygon in the plane $\mathbb{R}^2$ with $n + 2$ sides, $n \in \mathbb{N}$, is called convex if the connecting line segment between any two points of the polygon is totally within the polygon.

Examples 1.6. We give examples for convex polygons in Figure 1.3.


Figure 1.3: Convex polygons.

Claim. The angle sum $W_n$ of the interior angles in a convex polygon with $n + 2$ sides, $n \in \mathbb{N}$, is $n \cdot 180^\circ$.

Proof. (a) If $n = 1$ we have a triangle, for which it is known that $W_1 = 180^\circ$.
(b) Assume that the asserted statement holds for $n \in \mathbb{N}$. We consider a convex polygon with $(n + 1) + 2$ sides. We divide the polygon into a triangle and a polygon with $n + 2$ sides as in Figure 1.4.

Figure 1.4: Polygon divided into a triangle and a polygon with n + 2 sides.

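Then $W_{n+1} = W_n + 180^\circ = n \cdot 180^\circ + 180^\circ = (n + 1) \cdot 180^\circ$.

(4) The geometric sum formula. Let $q \in \mathbb{N}$, $q \neq 1$. Then
$$\sum_{i=1}^{n} q^i = \frac{q^{n+1} - q}{q - 1} \quad \text{for } n \in \mathbb{N}.$$
Proof. (a) If $n = 1$ then
$$\sum_{i=1}^{1} q^i = q = q \cdot \frac{q - 1}{q - 1} = \frac{q^2 - q}{q - 1}.$$
(b) Assume that the asserted statement holds for $n \in \mathbb{N}$. Then
$$\sum_{i=1}^{n+1} q^i = \sum_{i=1}^{n} q^i + q^{n+1} = \frac{q^{n+1} - q}{q - 1} + q^{n+1} = \frac{q^{n+2} - q}{q - 1}.$$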
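The geometric sum formula just proved can be illustrated with exact integer arithmetic. In the following Python sketch the function names are hypothetical; it relies on the fact that q − 1 divides q^(n+1) − q, so the integer division is exact for natural numbers q ≥ 2.

```python
def geometric_sum(q: int, n: int) -> int:
    # q^1 + q^2 + ... + q^n, computed term by term.
    return sum(q ** i for i in range(1, n + 1))


def geometric_closed_form(q: int, n: int) -> int:
    # (q^(n+1) - q) / (q - 1); requires q != 1.
    if q == 1:
        raise ValueError("the formula requires q != 1")
    return (q ** (n + 1) - q) // (q - 1)
```

For q = 2 and n = 3, both functions give 2 + 4 + 8 = 14.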

We remark that
$$1 + \sum_{i=1}^{n} q^i = \frac{q^{n+1} - 1}{q - 1}.$$

(5) Claim. Let $M$ be an arbitrary set with $n$ (distinct) elements, $n \in \mathbb{N}$. Let $\mathcal{P}(M)$ be the power set of $M$. Then $|\mathcal{P}(M)| = 2^n$ for the number of elements in $\mathcal{P}(M)$.

Proof. (a) If $n = 1$ then $M = \{a\}$ for some $a$, and hence $\mathcal{P}(M) = \{\emptyset, \{a\}\}$, that is, $|\mathcal{P}(M)| = 2 = 2^1$.
(b) Assume that the asserted statement holds for each set with $n$ elements. Let, without loss of generality,
$$M = \{a_1, a_2, \ldots, a_{n+1}\} = \{a_1, a_2, \ldots, a_n\} \cup \{a_{n+1}\} = M' \cup \{a_{n+1}\}$$
be a set with $n + 1$ elements. By the induction hypothesis $|\mathcal{P}(M')| = 2^n$. Now, let $A \subset M$ be a subset of $M$. Then we have exactly one of the following cases:
Case 1. $a_{n+1} \notin A$. Then $A \subset M'$, and by the induction hypothesis there are exactly $2^n$ such subsets.
Case 2. $a_{n+1} \in A$. Then $A \setminus \{a_{n+1}\} \subset M'$. Again, by the induction hypothesis, there are exactly $2^n$ such subsets.
Altogether $|\mathcal{P}(M)| = 2^n + 2^n = 2 \cdot 2^n = 2^{n+1}$.

The principle of induction is equivalent to another property called the least well-ordering property. This is the following. Let $S$ be a nonempty subset of the natural numbers $\mathbb{N}$. Then $S$ has a least element. We will abbreviate the least well-ordering property by LWO. In the next theorem we show that the principle of mathematical induction (PMI) is equivalent to the LWO. By equivalent, we mean here that if we assume that the PMI is true then we can prove the LWO, and if we assume the LWO is true then we can prove the PMI.

Theorem 1.7. The principle of mathematical induction is equivalent to the least well-ordering property.

Proof. To prove this we must assume first the principle of mathematical induction and show that the well-ordering property holds and then vice versa. Suppose that the PMI holds and let $S$ be a nonempty subset of $\mathbb{N}$. We must show that $S$ has a least element. We let $T$ be the set
$$T = \{x \in \mathbb{N} \mid x \le s \text{ for all } s \in S\}.$$


Now 1 ∈ T since S is a subset of ℕ. If whenever x ∈ T it were to follow that (x + 1) ∈ T, then by the inductive property T = ℕ, but then S would be empty, contradicting that S is nonempty. Therefore there exists an a with a ∈ T and (a + 1) ∉ T. We claim that a is the least element of S. Now a ≤ s for all s ∈ S because a ∈ T. If a ∉ S then every s ∈ S would also satisfy (a + 1) ≤ s. This would imply that (a + 1) ∈ T, which is a contradiction. Therefore a ∈ S and a ≤ s for all s ∈ S, and hence a is the least element of S.

Conversely, suppose the well-ordering property holds and suppose that S is a subset of ℕ with the properties that 1 ∈ S and that whenever n ∈ S it follows that (n + 1) ∈ S. We must show that S = ℕ. If S ≠ ℕ, then the set difference ℕ ⧵ S, that is, the set of all elements in ℕ but not in S, would be a nonempty subset of ℕ. Thus by the LWO, it has a least element, say n. Since 1 ∈ S we have n ≠ 1, so n − 1 ∈ ℕ, and by the minimality of n the number (n − 1) is not in ℕ ⧵ S, that is, (n − 1) ∈ S. But then by the assumed property of S we get that (n − 1) + 1 = n ∈ S, which gives a contradiction. Therefore ℕ ⧵ S is empty and S = ℕ.

Based on Theorem 1.7 we have a second form of mathematical induction that we call the second induction principle. This is also known as course of values induction or strong induction.

Theorem 1.8. Let A(n) be a statement which is defined for all n ∈ ℕ. If A(1) is true and if we can show for any n ∈ ℕ, n > 1, that the assumption that A(k) is true for all k ∈ ℕ with k < n implies that A(n) is true, then A(n) is true for all n ∈ ℕ.

Proof. Let T = {n ∈ ℕ ∣ A(n) is not true} ⊂ ℕ. Assume that T ≠ ∅. Then by the LWO, T contains a smallest element n. Since A(1) is true we have n > 1, and A(n) is not true although A(k) is true for all k ∈ ℕ with k < n. But this contradicts our hypothesis. Therefore A(n) is true for all n ∈ ℕ.

Corollary 1.9. The two principles for mathematical induction are equivalent.

Proof. If the second principle holds then certainly also the first one. If the first principle holds then so does the LWO by Theorem 1.7, and hence also the second principle via Theorem 1.8.

Remarks 1.10. (1) Also for the second induction principle we may take as the base any $n_0 \in \mathbb{N}$ instead of 1.
(2) Please note that mathematical induction is not an empirical method to get general statements from calculations or observations for some numbers.

Example 1.11 (For the second principle). We define here the Fibonacci numbers $f_n$, $n \in \mathbb{N}$, recursively by
$$f_1 := f_2 := 1 \quad \text{and} \quad f_{n+1} := f_n + f_{n-1} \quad \text{for } n \ge 2.$$

Claim. For n ≥ 2 the Fibonacci number fn is the number of all 0–1-sequences of length n − 2 which do not contain neighboring 1’s.

Proof. Let $M_n$ be the set of all 0–1-sequences of length $n - 2$, $n \ge 2$, which do not contain neighboring 1’s. We have $|M_2| = f_2 = 1$, the empty sequence, and $|M_3| = f_3 = 2$ because we just have the sequences 0 and 1. Now, let $n \ge 4$, and we assume that the statement is true for all $k$ with $2 \le k < n$. Let $M_n = M_n^{(0)} \cup M_n^{(1)}$ be the disjoint union of $M_n^{(0)}$, the set of the sequences in $M_n$ ending with a 0, and $M_n^{(1)}$, the set of the sequences in $M_n$ ending with a 1. Because of our condition, in fact, each sequence in $M_n^{(1)}$ has to end with 01 (recall that we do not have neighboring 1’s). Therefore
$$|M_n| = |M_n^{(0)}| + |M_n^{(1)}| = |M_{n-1}| + |M_{n-2}| = f_{n-1} + f_{n-2} = f_n .$$
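The claim of Example 1.11 can be illustrated by brute force: enumerate all 0–1-sequences of a given length, discard those containing neighboring 1’s, and compare the count with the Fibonacci numbers. The Python sketch below is only a check for small n; the names `fib` and `count_sequences` are hypothetical.

```python
from itertools import product


def fib(n: int) -> int:
    # f_1 = f_2 = 1, f_{n+1} = f_n + f_{n-1}
    a, b = 1, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return a


def count_sequences(length: int) -> int:
    # Number of 0-1-sequences of the given length without neighboring 1's.
    # For length 0 there is exactly one sequence, the empty one.
    return sum(
        1
        for seq in product("01", repeat=length)
        if "11" not in "".join(seq)
    )
```

For n = 4 the sequences of length 2 without neighboring 1’s are 00, 01 and 10, in agreement with f_4 = 3.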

Remark 1.12. We may establish the natural numbers constructively from axiomatic set theory. One starts with the axiomatically existent empty set ∅ and gets for each set X a successor set X ∪ {X}, using the axiom that there exists an infinite set, that is, there is a set which contains ∅ and, with each of its elements X, also the set X ∪ {X}. One then defines
0 := ∅,
1 := ∅ ∪ {∅} = {0},
2 := 1 ∪ {1} = {0} ∪ {1} = {0, 1},
3 := 2 ∪ {2} = {0, 1} ∪ {2} = {0, 1, 2},
⋮
n := {0, 1, 2, … , n − 1}.
In this approach the element 0 is a natural number and each n is the set of its predecessors 0, 1, 2, … , n − 1. Recall that in our approach using the Peano axioms the natural numbers start with 1. In the next section we use Remark 1.12 to introduce briefly the integers and the rational numbers, in which the zero symbol plays a fundamental role.
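The set-theoretic construction of Remark 1.12 can be modelled with immutable sets. The following Python sketch assumes that `frozenset` stands in for a set of sets; the names `successor` and `von_neumann` are hypothetical and the construction is only practical for small n.

```python
def successor(x: frozenset) -> frozenset:
    # The successor set X ∪ {X}.
    return x | frozenset({x})


def von_neumann(n: int) -> frozenset:
    # 0 := ∅, and n + 1 := n ∪ {n}; the result has exactly n elements.
    x = frozenset()
    for _ in range(n):
        x = successor(x)
    return x
```

In this model `von_neumann(3)` equals the set containing `von_neumann(0)`, `von_neumann(1)` and `von_neumann(2)`, matching 3 = {0, 1, 2}.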

1.3 The integers ℤ

Since the integers play a fundamental role in our book, we now introduce them and deduce their existence from the natural numbers. We begin by looking at the difference x = a − b of two natural numbers a, b ∈ ℕ with a > b, introduced in the last section. Algebraically this means that we may solve the equation

b + x = a for a > b in ℕ.

The motivation is to solve the equation b + x = a for arbitrary a, b ∈ ℕ. This leads to the number 0 and the negative numbers.


Historically it took a long time before the negative numbers were accepted. In ancient times and in parts of the Middle Ages, mathematicians like Al Khwarizmi (ca. 780–850), Cardano (1501–1576) and Vieta (1540–1603) considered the negative numbers as forbidden, or worked with them solely symbolically, believing that these negative numbers do not exist. This can be seen from the many quadratic equations over ℕ which in their view were forbidden or non-existent.

We consider the set ℕ × ℕ = {(a, b) ∣ a, b ∈ ℕ}, the set of pairs of natural numbers. If a > b and c > d, then we may have the equation a − b = c − d in ℕ, or equivalently, a + d = c + b. This is the inspiring background for introducing the integers. We define an addition and a multiplication on ℕ × ℕ as follows:

(a, b) + (c, d) = (a + c, b + d),
(a, b) ⋅ (c, d) = (ac + bd, ad + bc).

Motivated by the equation a + d = c + b, which holds whenever a > b, c > d and a − b = c − d, we introduce a relation on ℕ × ℕ:

(a, b) ∼ (c, d) ⇔ a + d = c + b.

This is an equivalence relation. Certainly, (a, b) ∼ (a, b), and (c, d) ∼ (a, b) if (a, b) ∼ (c, d), that is, the relation is reflexive and symmetric. It is also transitive, that is, if (a, b) ∼ (c, d) and (c, d) ∼ (e, f) then (a, b) ∼ (e, f): adding a + d = c + b and c + f = e + d gives (a + f) + (c + d) = (e + b) + (c + d), and cancelling c + d we get a + f = e + b.

Let ℤ := ℕ × ℕ/∼ be the set of equivalence classes. The addition and the multiplication of the pairs in ℕ × ℕ induce a well defined addition and multiplication on ℤ, that is, if (a, b) ∼ (a′, b′) and (c, d) ∼ (c′, d′), then (a, b) + (c, d) ∼ (a′, b′) + (c′, d′) and (a, b) ⋅ (c, d) ∼ (a′, b′) ⋅ (c′, d′). We leave the proof of this as an exercise. Together with these operations ℤ becomes a commutative ring with unity.

Remark 1.13.
We recall that a ring is a set R ≠ ∅ equipped with two binary operations + : R × R → R and ⋅ : R × R → R satisfying the following three sets of properties for all a, b, c ∈ R:
– R is a commutative group under addition, that is,
  (1) (a + b) + c = a + (b + c).
  (2) a + b = b + a.
  (3) There is a zero element 0 ∈ R such that a + 0 = a for all a ∈ R.
  (4) For each a ∈ R there exists −a ∈ R such that a + (−a) = 0. We call −a the negative element of a.
– R is a semigroup under multiplication, that is, (a ⋅ b) ⋅ c = a ⋅ (b ⋅ c).
– The multiplication is distributive with respect to the addition, that is, a ⋅ (b + c) = (a ⋅ b) + (a ⋅ c) and (b + c) ⋅ a = (b ⋅ a) + (c ⋅ a).

A commutative ring with unity 1 is a ring R in which the semigroup under multiplication is also commutative and has the unity element 1.

In the case of ℤ the zero element is the class represented by (1, 1), and the unity element is the class represented by (2, 1). We briefly write (a, b) for the equivalence class represented by (a, b). The equivalence class (a, b) has for a > b a unique representative of the form (n + 1, 1), where n = a − b, and for b > a a unique representative of the form (1, m + 1), where m = b − a. This gives a possibility to embed ℕ into ℤ. We define the map

φ : ℕ → ℤ, n ↦ (n + 1, 1).

The map φ is injective and satisfies

φ(a + b) = φ(a) + φ(b) and φ(ab) = φ(a)φ(b)

for a, b ∈ ℕ. This means that φ is an embedding. Therefore we may identify ℕ with φ(ℕ) ⊂ ℤ, and after the identification we write n = (n + 1, 1) for n ∈ ℕ. We write −n for the equivalence class (1, n + 1). This is reasonable because (n + 1, 1) + (1, n + 1) = (n + 2, n + 2) ∼ (1, 1), the zero element of ℤ. Therefore we now write 0 for the class (1, 1). In this sense, ℤ is the disjoint union ℤ = ℕ− ∪ {0} ∪ ℕ, where ℕ− = {−n ∣ n ∈ ℕ}. With this, the addition and the multiplication in ℤ are the customary ones.
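The construction of ℤ from pairs of natural numbers can be tried out directly. The following Python sketch represents a pair (a, b) as a tuple of positive integers; the helper names equivalent, add, mul, normalize and phi are our own choices for illustration.

```python
def equivalent(p, q):
    """(a, b) ~ (c, d) if and only if a + d = c + b."""
    (a, b), (c, d) = p, q
    return a + d == c + b


def add(p, q):
    """(a, b) + (c, d) = (a + c, b + d)."""
    (a, b), (c, d) = p, q
    return (a + c, b + d)


def mul(p, q):
    """(a, b) * (c, d) = (ac + bd, ad + bc)."""
    (a, b), (c, d) = p, q
    return (a * c + b * d, a * d + b * c)


def normalize(p):
    """Return the representative (n + 1, 1), (1, 1) or (1, m + 1) of the class of p."""
    a, b = p
    if a > b:
        return (a - b + 1, 1)
    if a < b:
        return (1, b - a + 1)
    return (1, 1)


def phi(n):
    """The embedding of the natural numbers, n -> (n + 1, 1)."""
    return (n + 1, 1)


# (1, 3) represents -2 and (1, 4) represents -3; their product represents 6.
assert normalize(mul((1, 3), (1, 4))) == phi(6)
# n + (-n) lies in the class of the zero element (1, 1).
assert equivalent(add(phi(5), (1, 6)), (1, 1))
# phi respects addition and multiplication up to equivalence.
assert equivalent(add(phi(2), phi(3)), phi(5))
assert equivalent(mul(phi(2), phi(3)), phi(6))
```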


We now transfer the ordering in ℕ to an ordering in ℤ. We define a < b in ℤ if there exists an n ∈ ℕ with a + n = b, and a ≤ b if a = b or a < b. This ordering is compatible with the addition and the multiplication in ℤ, and we get directly from the definition and the respective statements in ℕ the following.

Theorem 1.14. Let a, b, c, d ∈ ℤ.
(1) If a ≤ b, c ≤ d, then a + c ≤ b + d.
(2) If a ≤ b, 0 ≤ c, then ac ≤ bc.
(3) If a ≤ b, c < 0, then bc ≤ ac.

Remarks 1.15.
(1) Instead of a < b we also write b > a (“b is bigger than a”), and instead of a ≤ b also b ≥ a (“b is equal to or bigger than a”).
(2) We now want to consider the first algebraic property of ℤ. We know that ℤ is a commutative ring with unity 1. We call a commutative ring R with unity 1 ≠ 0 an integral domain, or just a domain, if there are no nontrivial zero divisors in R, that is, if a, b ∈ R ⧵ {0} then ab ≠ 0. We have the following.

Theorem 1.16. ℤ is an integral domain.

Proof. Let a, b ∈ ℤ, a ≠ 0 ≠ b. Then a = ±m and b = ±n with m, n ∈ ℕ, hence ab = ±mn with mn ∈ ℕ, and therefore ab ≠ 0.

1.4 The rational numbers ℚ

We also may construct the rational numbers from the integers with the help of an equivalence relation. We are guided here by the known representation as fractions a/b with a, b ∈ ℤ, b ≠ 0. We use in a naive manner the following as a motivation:

a/b = c/d if and only if ad = cb.

With the help of this idea we introduce the rational numbers from the integers in a compact, but exact manner. We consider the Cartesian product

A = ℤ × ℤ∗ with ℤ∗ = ℤ ⧵ {0}.

We define on A a relation by (a, b) ∼ (c, d) ⇔ ad = cb for (a, b), (c, d) ∈ A. This relation is an equivalence relation. Certainly, (a, b) ∼ (a, b), and (c, d) ∼ (a, b) if (a, b) ∼ (c, d), that is, the relation is reflexive and symmetric. It is also transitive, that is, if (a, b) ∼ (c, d) and (c, d) ∼ (e, f) then (a, b) ∼ (e, f): from ad = cb and cf = ed we get adf = cbf = edb, that is, d(af − eb) = 0. Since d ≠ 0 and ℤ is an integral domain, we get af = eb.

Let ℚ := A/∼ be the set of equivalence classes for this relation. We write a/b to present the equivalence class {(a′, b′) ∈ A ∣ (a′, b′) ∼ (a, b)} represented by (a, b). The map

φ : ℤ → ℚ, n ↦ n/1

is injective. This we see as follows. If φ(n) = φ(m) then n/1 = m/1, and this means (n, 1) ∼ (m, 1), that is, n ⋅ 1 = m ⋅ 1 and hence n = m.

We now may introduce an addition and a multiplication on ℚ as follows:

a/b + c/d = (ad + cb)/(bd) and a/b ⋅ c/d = (ac)/(bd).

This addition and multiplication are well defined, that is, if (a, b) ∼ (a′, b′) and (c, d) ∼ (c′, d′), then

a/b + c/d = a′/b′ + c′/d′ and a/b ⋅ c/d = a′/b′ ⋅ c′/d′.

This is easy for the multiplication, and we leave this as an exercise. We present the proof for the more complicated addition:

a/b + c/d = a′/b′ + c′/d′
⇔ (ad + cb)/(bd) = (a′d′ + c′b′)/(b′d′)
⇔ (ad + cb)b′d′ = (a′d′ + c′b′)bd
⇔ adb′d′ + cbb′d′ = a′d′bd + c′b′bd.

This last equation holds in ℤ because ab′ = a′b and cd′ = c′d. Therefore the addition and the multiplication are well defined. With this ℚ becomes a field.

Remark 1.17. We recall that a field is a set K ≠ ∅ equipped with two binary operations + : K × K → K and ⋅ : K × K → K satisfying the following three sets of properties:
– K is a commutative group under addition.
– K∗ = K ⧵ {0} is a commutative group under multiplication.
– The multiplication is distributive with respect to the addition, that is, a ⋅ (b + c) = (a ⋅ b) + (a ⋅ c) for all a, b, c ∈ K.

The multiplicative inverse of a/b, a ≠ 0 ≠ b, is b/a. The map

φ : ℤ → ℚ, n ↦ n/1

is not only injective, it also respects the addition and the multiplication, that is,


φ(a + b) = φ(a) + φ(b) and φ(ab) = φ(a)φ(b) for a, b ∈ ℤ. Hence, φ is an embedding, and we may identify ℤ with φ(ℤ), and consider from now on ℤ as a subset of ℚ.

We also may transfer the ordering in ℤ to an ordering in ℚ. We define a/b > 0 if a/b is represented by a pair (a′, b′) with a′ > 0 and b′ > 0. Analogously, we define a/b < 0 if a/b is represented by a pair (a′, b′) with a′ < 0 and b′ > 0. Certainly a/b = 0 if and only if a = 0. In general we then define

a/b > c/d if a/b − c/d > 0,

and

a/b ≥ c/d if a/b = c/d or a/b > c/d.

This extends the ordering for ℤ to an ordering for ℚ. Theorem 1.14 can be directly transferred from ℤ to ℚ, and we get the following (we now write x instead of a/b).

Theorem 1.18. Let x, y, u, v ∈ ℚ.
(1) If x ≤ y, u ≤ v, then x + u ≤ y + v.
(2) If x ≤ y, 0 ≤ u, then xu ≤ yu.
(3) If x ≤ y, u < 0, then yu ≤ xu.

Remarks 1.19.
(1) Again, we also write x > y and x ≥ y if y < x and y ≤ x, respectively.
(2) Certainly ℚ is the disjoint union ℚ = ℚ− ∪ {0} ∪ ℚ+, where ℚ− = {x ∈ ℚ ∣ x < 0} and ℚ+ = {x ∈ ℚ ∣ x > 0}.
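The construction of ℚ as classes of pairs in ℤ × ℤ∗ can be illustrated in the same spirit. The sketch below tests on a small sample that the addition and the multiplication do not depend on the chosen representatives, and compares the result with Python's built-in fractions.Fraction. The function names are hypothetical, and a finite test is an illustration, not a proof.

```python
from fractions import Fraction
from itertools import product


def equivalent(p, q):
    """(a, b) ~ (c, d) if and only if ad = cb."""
    (a, b), (c, d) = p, q
    return a * d == c * b


def add(p, q):
    """a/b + c/d = (ad + cb)/(bd)."""
    (a, b), (c, d) = p, q
    return (a * d + c * b, b * d)


def mul(p, q):
    """a/b * c/d = (ac)/(bd)."""
    (a, b), (c, d) = p, q
    return (a * c, b * d)


# A small sample of pairs (a, b) with b != 0.
pairs = [(a, b) for a in range(-3, 4) for b in (-2, -1, 1, 2, 3)]

for p, q in product(pairs, repeat=2):
    for k, l in ((2, -3), (-1, 5)):
        # (ka, kb) ~ (a, b) and (lc, ld) ~ (c, d) are other representatives.
        p2 = (k * p[0], k * p[1])
        q2 = (l * q[0], l * q[1])
        assert equivalent(add(p, q), add(p2, q2))
        assert equivalent(mul(p, q), mul(p2, q2))
    # Agreement with the standard library's rational numbers.
    assert Fraction(*add(p, q)) == Fraction(*p) + Fraction(*q)
    assert Fraction(*mul(p, q)) == Fraction(*p) * Fraction(*q)
```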

1.5 The absolute value in ℕ, ℤ and ℚ

We need to consider only ℚ because ℕ ⊂ ℤ ⊂ ℚ. Let x ∈ ℚ. The absolute value |x| of x is defined by

|x| = x if x ≥ 0, and |x| = −x if x < 0.

Lemma 1.20. We always have x ≤ |x|.

Proof. If x ≥ 0, then x = |x|. If x < 0, then −x > 0, that is, |x| = −x > 0 and therefore x < 0 < |x|.

Theorem 1.21. Let x, y ∈ ℚ. Then the following hold:
(1) |x| ≥ 0 and |x| = 0 ⇔ x = 0.
(2) |x ⋅ y| = |x| ⋅ |y| and |−x| = |x|.
(3) |x + y| ≤ |x| + |y| (triangle inequality).

Proof. We leave (1) and (2) as an easy exercise. We now prove (3). If x + y ≥ 0 then |x + y| = x + y ≤ |x| + y ≤ |x| + |y|. If x + y < 0 then |x + y| = −(x + y) = −x − y ≤ |−x| + |−y| = |x| + |y|.

Corollary 1.22. Let x, y ∈ ℚ. Then the following hold:
(1) |x/y| = |x|/|y| for y ≠ 0.
(2) ||x| − |y|| ≤ |x − y|.

Proof. (1) We have

|x| = |(x/y) ⋅ y| = |x/y| ⋅ |y| ⇒ |x|/|y| = |x/y|.

(2) We have |x| = |x − y + y| ≤ |x − y| + |y|, that is, |x| − |y| ≤ |x − y|. Analogously |y| − |x| ≤ |y − x| = |x − y|. Together we get ||x| − |y|| ≤ |x − y|.

Exercises

1. Prove parts (1), (2), (4) and (5) of Theorem 1.2, which are:
   (1) The addition for ℕ is associative, that is, a + (b + c) = (a + b) + c, and commutative, that is, a + b = b + a. This means that ℕ is a semigroup with respect to the addition.
   (2) The multiplication for ℕ is associative, that is, a(bc) = (ab)c, and commutative, that is, ab = ba. ℕ also has a unity element for the multiplication. Therefore, ℕ is a monoid with respect to the multiplication.
   (4) For a, b ∈ ℕ exactly one of the following is true: a < b, a = b or b < a.
   (5) If a ≤ b and c ≤ d then a + c ≤ b + d and ac ≤ bd.
2. Prove n^2 > 2n + 1 for all n ∈ ℕ with n ≥ 3, using the first induction principle.
3. Use induction to prove for any natural number n:
   1^3 + 2^3 + ⋯ + n^3 = n^2 (n + 1)^2 / 4.
4. For n ∈ ℕ we define recursively the integers a_n by a_1 := 1, a_2 := 2 and
   a_{n+2} := a_{n+1} + a_n if a_{n+1} ⋅ a_n is even,
   a_{n+2} := a_{n+1} − a_n if a_{n+1} ⋅ a_n is odd.
   Search for formulas which describe a_n directly from n, without calculating a_m for m < n, and prove these formulas using the first induction principle. (Hint: Testing suggests that a_{3n−2} = 4n − 3, a_{3n−1} = 2 and a_{3n} = 4n − 1.)
5. Prove that the addition and the multiplication of the pairs in ℕ × ℕ induce a well defined addition and multiplication on ℤ, that is, if (a, b) ∼ (a′, b′) and (c, d) ∼ (c′, d′), then (a, b) + (c, d) ∼ (a′, b′) + (c′, d′) and (a, b) ⋅ (c, d) ∼ (a′, b′) ⋅ (c′, d′).
6. Prove that the multiplication on ℚ is well defined, that is, if (a, b) ∼ (a′, b′) and (c, d) ∼ (c′, d′), then
   a/b ⋅ c/d = a′/b′ ⋅ c′/d′.
7. Prove parts (1) and (2) of Theorem 1.21, which are:
   (1) |x| ≥ 0 and |x| = 0 ⇔ x = 0.
   (2) |x ⋅ y| = |x| ⋅ |y| and |−x| = |x|.
8. Recall that if a and b are integers then a divides b, denoted a ∣ b, means that there exists an integer c such that b = ac (see Chapter 2).
   (a) Show that if m, n are positive integers such that m ∣ n and n ∣ m, then m = n.
   (b) Show that if m is an integer and 2 ∣ m^2, then 2 ∣ m. We call an integer m even if 2 divides m and odd if not.

2 Division and factorization in the integers

2.1 The Fundamental Theorem of Arithmetic

The most important result concerning the integers is the Fundamental Theorem of Arithmetic. This states that any nonzero integer can be expressed uniquely as a product of prime numbers, where uniqueness is up to ordering and algebraic sign. To state this important result we must define divisibility. In what follows, a, b, c, … are always integers.

Definition 2.1. The integer a is called a divisor or factor of b, written as a ∣ b, if there exists a k ∈ ℤ with b = k ⋅ a. We call b a multiple of a. If a is not a divisor of b, then we write a ∤ b.

The following properties hold:
(1) a ∣ b and b ∣ c ⇒ a ∣ c (transitivity).
(2) c ∣ a and c ∣ b ⇒ c ∣ (k_1 a + k_2 b) for all k_1, k_2 ∈ ℤ.
(3) ±a ∣ a and ±1 ∣ a for all a ∈ ℤ.
(4) a ∣ 0 for all a ∈ ℤ.
(5) 0 ∣ a ⇔ a = 0.
(6) a ∣ b and b ∣ a ⇒ a = ±b.

These properties follow directly from the definition, and we leave them as an exercise. An arithmetic expression of the form k_1 a + k_2 b, as in (2), is called a linear combination of a and b. More generally, if a_1, a_2, …, a_n ∈ ℤ, then an arithmetic expression k_1 a_1 + k_2 a_2 + ⋯ + k_n a_n is called a linear combination of a_1, a_2, …, a_n. Note that each a with |a| ≥ 2 has at least two positive divisors, namely 1 and |a|.

Definition 2.2. The integer p ≥ 2 is called a prime number or a prime, if p has exactly two positive divisors, namely 1 and p. If p is a prime number and p ∣ a, then p is called a prime divisor or prime factor of a.

Theorem 2.3. Each a ∈ ℕ with a ≥ 2 has a prime factor.

DOI 10.1515/9783110516142-002

Proof. We use the second induction principle, that is, course of values induction. The lowest level is a = 2. The number 2 is a prime, so the statement is true at the lowest level. Suppose that every integer k with 2 ≤ k < n has a prime factor. We must show that n then also has a prime factor. If n is prime then we are done. Suppose that n is not a prime number. Then n = m_1 m_2 with 1 < m_1 < n, 1 < m_2 < n. By the inductive hypothesis both m_1 and m_2 have prime factors. Therefore n also has a prime factor, completing the proof.

Now that we have a prime divisor, the next result shows that for any integer there must be a smallest prime divisor.

Theorem 2.4. Each a with |a| ≥ 2 has a smallest prime divisor.

Proof. Let T = {x ∈ ℕ ∣ x ≥ 2 and x ∣ a}. Since |a| ∣ a and |a| ≥ 2 we get T ≠ ∅. T contains a smallest element p ≥ 2 (see Theorem 1.7); p must be a prime number. Otherwise, if p = cd with |c| ≥ 2, |d| ≥ 2, then |c| ∣ a, which contradicts the minimality of p because 2 ≤ |c| < p. Hence, p is the smallest prime divisor of a.

Definition 2.5. The integer a is called a composite number if there are b and c with a = bc and 2 ≤ |b|, |c| < |a|.

Theorem 2.6. Let n be a composite number. Then n has a prime divisor p with p ≤ √|n|.

Proof. Let n = ab with 2 ≤ |a|, |b| < |n|. Then |a| ≤ √|n| or |b| ≤ √|n|, because otherwise |n| = |a||b| > √|n|√|n| = |n|. Let |a| ≤ √|n| and let p be the smallest prime divisor of |a|. Then p ∣ n and p ≤ |a| ≤ √|n|.

Remarks 2.7.
(1) There are historical reasons to assume that prime numbers are positive. The negative numbers were not settled or known in ancient cultures. Nowadays we often call p and −p prime elements of ℤ if p is a prime number.
(2) If n is a natural number then one may ask in an obvious manner if n is a prime number or not. A test to determine if a given natural number is prime or not is called a primality test.
In this chapter we will not consider primality tests, but at least we want to mention the deterministic but inefficient Sieve of Eratosthenes. This is a straightforward sieving method to obtain all the primes less than or equal to a fixed bound x. It is ascribed to Eratosthenes (276–194 BC), who was the chief librarian of the great ancient library in Alexandria. The Sieve of Eratosthenes is a direct method to determine primes. It works as follows. Given x > 0, list all the positive integers less than or equal to x. Starting with 2, which is a prime number, cross out all proper multiples of 2 on the list. The next number on the list not crossed out, which is 3, is a prime number. Next cross out all the proper multiples of 3 not already eliminated. The next number left uneliminated, which is 5, is a prime number. Continue in this manner. By Theorem 2.6 the elimination needs to be carried out only for the primes less than or equal to √x. If, for instance, we take x = 100 then we get as described that

2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97

are exactly the prime numbers between 2 and 100.

We now present and prove the Fundamental Theorem of Arithmetic.

Theorem 2.8 (Fundamental Theorem of Arithmetic). Each integer a ≠ 0 can be written in a unique way as a product of prime numbers:

a = ϵ ∏_{i=1}^{n} p_i, ϵ = ±1, n ≥ 0,

where p_1, p_2, …, p_n are prime numbers. The uniqueness is up to the ordering of the factors and the algebraic sign.

Proof. If a > 0 then ϵ = +1, and if a < 0 then ϵ = −1. We therefore may assume that a > 0. If a = 1, then n = 0, the empty product. Hence, let now a ≥ 2.

Existence: We assume that there exists an a ≥ 2 which does not have such a decomposition. Then there exists a smallest such number, which we call b. The number b has a prime divisor p and hence b = pc with 1 ≤ c < b. We have c ≠ 1 because otherwise b = p is a prime number, and we have a decomposition of b. Therefore 2 ≤ c < b, and c has a decomposition c = p_1 p_2 ⋯ p_k. Then b = p ⋅ p_1 p_2 ⋯ p_k, which contradicts our assumption. Therefore each a ≥ 2 has such a decomposition.

Uniqueness: We assume that there exists an a ≥ 2 which has such a decomposition but for which this is not unique (up to the ordering of the factors). Then there exists a smallest such number, which we again call b. This b has a smallest prime divisor, say q. We write b = cd with c ≥ 2, d ≥ 1. If q ∣ c, then c = fq for some f ∈ ℕ and b = qfd, and q occurs in the decomposition of b. Now, let q ∤ c. Let p be a prime divisor of c. Then p ∣ c and therefore p ∣ b. Because of the minimality of q and because q ∤ c we necessarily have p > q. It follows that c > q because p ∣ c. We form the number r = b − qd = (c − q)d ∈ ℕ. We have r < b, so by the minimality of b the decomposition of r is unique. Now q ∣ b and q ∣ qd, therefore q ∣ (c − q)d. We have q ∤ (c − q) since otherwise q ∣ (c − q + q), hence q ∣ c. Because of the uniqueness of the decomposition of r = (c − q)d we must have q ∣ d. Hence, also in the case q ∤ c the prime factor q occurs in each decomposition of b. Therefore, altogether, q occurs in each decomposition of b, that means, if p_1, p_2, …, p_k are prime numbers with b = p_1 p_2 ⋯ p_k then q = p_i for some i. Since b/q < b has a unique decomposition, we get the uniqueness for b = q ⋅ (b/q), which contradicts our assumption. The statement now follows from the second induction principle.

Remark 2.9. In general we combine equal prime numbers and write

a = p_1^{α_1} p_2^{α_2} ⋯ p_k^{α_k} for a ≥ 2,

with p_1, p_2, …, p_k pairwise different prime numbers, k ≥ 1, α_1, α_2, …, α_k ∈ ℕ. Often we sort the p_i by size, that is, p_1 < p_2 < ⋯ < p_k. The corresponding expression

a = p_1^{α_1} p_2^{α_2} ⋯ p_k^{α_k} for a ≥ 2

is called the standard prime decomposition of a. If it is convenient we also write

a = ∏_p p^{α(p)},

where the product is over all prime numbers, all α(p) ≥ 0, and almost all α(p) = 0, that is, α(p) ≠ 0 for at most finitely many p.

Corollary 2.10. There exist infinitely many prime numbers.

Proof. Assume that there are only finitely many prime numbers p_1, p_2, …, p_k, k ≥ 1. We form n = p_1 p_2 ⋯ p_k + 1; n has a prime divisor p because n ≥ 2, since there exists at least the prime number 2. Since there are only finitely many prime numbers p_1, p_2, …, p_k, the prime number p must occur in the list. Hence, without loss of generality, let p = p_1. Then n = p_1 a for some a ∈ ℕ because p_1 ∣ n. It follows that

1 = n − p_1 p_2 ⋯ p_k = p_1 (a − p_2 ⋯ p_k),

which means p_1 ∣ 1, and this gives a contradiction because p_1 ≥ 2. Therefore there exist infinitely many prime numbers.
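The Sieve of Eratosthenes from Remarks 2.7 and the standard prime decomposition from Remark 2.9 can both be sketched in a few lines of Python. The function names sieve and standard_decomposition are our own. Starting the crossing out at p ⋅ p is a common shortcut that we assume here: smaller proper multiples of p have a smaller prime factor and are already crossed out. The decomposition uses plain trial division, which by Theorem 2.6 may stop once p ⋅ p exceeds the remaining number.

```python
from math import isqrt


def sieve(x):
    """Return all primes less than or equal to x."""
    if x < 2:
        return []
    crossed = [False] * (x + 1)
    # By Theorem 2.6 it suffices to sieve with the primes p <= sqrt(x).
    for p in range(2, isqrt(x) + 1):
        if not crossed[p]:
            for m in range(p * p, x + 1, p):
                crossed[m] = True
    return [n for n in range(2, x + 1) if not crossed[n]]


def standard_decomposition(a):
    """Return [(p_1, alpha_1), ..., (p_k, alpha_k)] with p_1 < ... < p_k for a >= 2."""
    factors = []
    p = 2
    while p * p <= a:
        if a % p == 0:
            alpha = 0
            while a % p == 0:
                a //= p
                alpha += 1
            factors.append((p, alpha))
        p += 1
    if a > 1:
        # What remains has no divisor up to its square root, so it is prime.
        factors.append((a, 1))
    return factors


assert len(sieve(100)) == 25
assert standard_decomposition(360) == [(2, 3), (3, 2), (5, 1)]
```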


Corollary 2.11 (Euclid’s Lemma – first version). Let p be a prime number with p ∣ ab, a, b ∈ ℕ. Then p ∣ a or p ∣ b.

Proof. If we had p ∤ a and p ∤ b, then ab would have a prime decomposition as a product of prime factors of a and b which does not contain p. On the other hand we have p ∣ ab, that is, ab = pc for some c ∈ ℤ. If we multiply the decomposition of c by p we get a decomposition of ab = pc which contains p. This gives a contradiction to Theorem 2.8. Therefore, p ∣ a or p ∣ b.

Remark 2.12. If Euclid’s Lemma, which is named after Euclid of Alexandria (ca. 300 BC), is already available, then we get the uniqueness part in Theorem 2.8 for free: If a ≥ 2 is not a prime number, then a = bc with 1 < b, c < a. Let p be a prime divisor of a. By Euclid’s Lemma then p ∣ b or p ∣ c, and hence p occurs in each prime decomposition of a, which leads via a = p ⋅ (a/p) to the uniqueness. What we want is a proof of Euclid’s Lemma without the Fundamental Theorem of Arithmetic. This can be done with the division algorithm that we introduce in the next section.

2.2 The division algorithm and the greatest common divisor

The division algorithm describes the division of an integer by a smaller integer to obtain a quotient and a remainder.

Theorem 2.13 (Division algorithm). Let a, b ∈ ℤ with b ≠ 0. Then a can be presented in the form

a = qb + r with 0 ≤ r < |b|.

The number q is called the quotient and the number r the remainder. This presentation is unique, that is, q and r are uniquely determined by a and b.

Proof. Suppose first that 0 < b. The existence of a representation a = qb + r with 0 ≤ r < b means that there is a biggest multiple qb of b with qb ≤ a. For a ≥ 0 this is clear. If a < 0 then −a > 0, and there exists a q ∈ ℤ with (q − 1)b < −a ≤ qb, and hence (−q)b ≤ a < −(q − 1)b, and −qb is the biggest multiple of b which is less than or equal to a. For the uniqueness we take two such representations

a = q_1 b + r_1, a = q_2 b + r_2, 0 ≤ r_1 < b and 0 ≤ r_2 < b.

Suppose, without loss of generality, that r_1 ≥ r_2. Then 0 ≤ r_1 − r_2 < b. Subtraction leads to b(q_1 − q_2) + r_1 − r_2 = 0, that is, 0 ≤ r_1 − r_2 = b(q_2 − q_1) < b. From this necessarily q_1 = q_2 and then r_1 = r_2.

Now, let b < 0. Then −b > 0, and there exist unique numbers q_1 and r_1 such that a = q_1 (−b) + r_1 with 0 ≤ r_1 < −b. Therefore a = (−q_1)b + r_1 and 0 ≤ r_1 < |b|. Now, take q = −q_1 and r = r_1.

Examples 2.14.
(1) 87 = 8 ⋅ 10 + 7,
(2) −52 = (−5) ⋅ 12 + 8.

Now we want to introduce the greatest common divisor of two integers. Later we will introduce the greatest common divisor of integers a_1, a_2, …, a_n in general. We first give a definition which is suitable to present in school and then show the equivalence of this definition to the more suitable definition for algebra. We introduce some notation. If a ∈ ℤ then we write

T_a = {x ∈ ℕ ∣ x ∣ a}

for the set of the positive divisors of a.

Examples 2.15.
(1) T_12 = {1, 2, 3, 4, 6, 12},
(2) T_−15 = {1, 3, 5, 15},
(3) T_0 = ℕ.

We remark that 1 ∈ T_a for all a ∈ ℤ. Let a, b ∈ ℤ. Let T_a ∩ T_b be the intersection of T_a and T_b, that is, the subset of ℕ which contains exactly those elements which belong to T_a and T_b:

T_a ∩ T_b = {x ∈ ℕ ∣ x ∈ T_a and x ∈ T_b}.

We always have T_a ∩ T_b ≠ ∅ because 1 ∈ T_a and 1 ∈ T_b. Notice also that T_a is bounded above by |a| if a ≠ 0, and T_b by |b| if b ≠ 0. Therefore T_a ∩ T_b is bounded above whenever a ≠ 0 or b ≠ 0. Since this is then a bounded nonempty set of natural numbers, it follows that there must be a maximal element.

Definition 2.16. Let a, b ∈ ℤ with a ≠ 0 or b ≠ 0. The greatest common divisor gcd(a, b) of a and b is

gcd(a, b) := max(T_a ∩ T_b),

that is, the biggest natural number in T_a ∩ T_b. For a = b = 0 the greatest common divisor is not defined.
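The division algorithm and Definition 2.16 translate almost literally into code. In the following Python sketch the names divide, positive_divisors and gcd_by_definition are hypothetical. One point needs care: Python's built-in divmod returns a remainder with the sign of b, so for b < 0 the pair has to be adjusted to obtain 0 ≤ r < |b| as in Theorem 2.13.

```python
def divide(a, b):
    """Return (q, r) with a == q * b + r and 0 <= r < abs(b)."""
    if b == 0:
        raise ValueError("b must be nonzero")
    q, r = divmod(a, b)  # here r has the sign of b (or is 0)
    if r < 0:            # can only happen for b < 0
        q, r = q + 1, r - b
    return q, r


def positive_divisors(a):
    """Return the set T_a of positive divisors of a nonzero integer a."""
    return {x for x in range(1, abs(a) + 1) if a % x == 0}


def gcd_by_definition(a, b):
    """Return max(T_a ∩ T_b) for integers a, b, not both 0."""
    if a == 0 and b == 0:
        raise ValueError("gcd(0, 0) is not defined")
    if a == 0:
        return abs(b)  # T_0 = ℕ, hence T_0 ∩ T_b = T_b
    if b == 0:
        return abs(a)
    return max(positive_divisors(a) & positive_divisors(b))


assert divide(87, 10) == (8, 7)
assert divide(-52, 12) == (-5, 8)
assert gcd_by_definition(12, -15) == 3
```

Computing gcd(a, b) this way needs about |a| + |b| trial divisions and only serves to illustrate the definition.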


Examples 2.17.
(1) T_12 ∩ T_−15 = {1, 3}, and hence gcd(12, −15) = 3.
(2) T_0 ∩ T_12 = T_12, and hence gcd(0, 12) = 12.

Theorem 2.18. Let a, b ∈ ℤ with a ≠ 0 or b ≠ 0. Then d ∈ ℕ is the gcd(a, b) if and only if the following two conditions are satisfied:
(i) d ∣ a and d ∣ b.
(ii) If δ ∣ a and δ ∣ b for δ ∈ ℕ, then δ ∣ d.

Proof. If the two conditions (i) and (ii) hold, then d is a common divisor, and each common divisor δ ∈ ℕ satisfies δ ∣ d and hence δ ≤ d. Therefore d = gcd(a, b). Now, let d = gcd(a, b). Then condition (i) is clear by definition. Assume that condition (ii) does not hold, that is, there is a δ ∈ ℕ with δ ∣ a and δ ∣ b but δ ∤ d. Then there exists a prime number p and an α ∈ ℕ with p^α ∣ δ, hence p^α ∣ a and p^α ∣ b, but p^α ∤ d. Let p^β with 0 ≤ β < α be the highest power of p which divides d. From d ∣ a and p^α ∣ a we get p^{α−β} d ∣ a by considering the prime decomposition of a, and in the same way p^{α−β} d ∣ b. This contradicts the maximality of d. Hence δ ∣ d, and condition (ii) holds.

From the definition we obtain automatically the following properties:

gcd(a, 1) = 1,
gcd(a, b) = gcd(a, −b) = gcd(−a, b) = gcd(−a, −b) = gcd(b, a)

for integers a, b, not both 0. Hence, to calculate the gcd(a, b) we may assume that a and b are both non-negative.

Definition 2.19. Let a, b ∈ ℤ, a ≠ 0 or b ≠ 0. The integers a and b are called relatively prime if gcd(a, b) = 1.

We show first how to calculate the gcd(a, b) in terms of the prime decompositions.

Theorem 2.20. Let a, b ∈ ℕ_0 = ℕ ∪ {0}, a ≠ 0 or b ≠ 0. If a = 0 then gcd(a, b) = b, and if b = 0 then gcd(a, b) = a. Now suppose that a, b > 0 and consider the prime decompositions of a and b:

a = p_1^{α_1} p_2^{α_2} ⋯ p_n^{α_n} and b = p_1^{β_1} p_2^{β_2} ⋯ p_n^{β_n},

where the p_i are pairwise distinct prime numbers and α_i, β_i ≥ 0 for i = 1, 2, …, n. Then

gcd(a, b) = p_1^{δ_1} p_2^{δ_2} ⋯ p_n^{δ_n} with δ_i = min(α_i, β_i) for i = 1, 2, …, n.

Proof. Given the prime decompositions of a and b it follows that each common divisor of a and b is of the form

p_1^{γ_1} p_2^{γ_2} ⋯ p_n^{γ_n} with 0 ≤ γ_i ≤ min(α_i, β_i) for i = 1, 2, …, n.

The result then follows.

Example 2.21. Let us find the gcd(12, 30). Here

12 = 2^2 ⋅ 3^1 ⋅ 5^0 and 30 = 2^1 ⋅ 3^1 ⋅ 5^1.

It follows that gcd(12, 30) = 2^1 ⋅ 3^1 ⋅ 5^0 = 6.

Theorem 2.20 provides an easy way to calculate the gcd(a, b) if we know the prime factorizations of a and b. In general, however, it is difficult to find those prime factorizations. A more efficient method to find the greatest common divisor uses the Euclidean algorithm that we give in the next section.
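Theorem 2.20 can be turned into a short program once the exponents are available. The sketch below is only an illustration: it obtains the exponents by trial division, which is exactly the expensive step mentioned above. The names prime_exponents and gcd_from_factorizations are our own. We rely on the fact that the intersection of two collections.Counter objects keeps the minimum of corresponding counts.

```python
from collections import Counter
from math import gcd, prod


def prime_exponents(a):
    """Return a Counter that maps each prime divisor p of a >= 1 to its exponent."""
    exps = Counter()
    p = 2
    while p * p <= a:
        while a % p == 0:
            exps[p] += 1
            a //= p
        p += 1
    if a > 1:
        exps[a] += 1
    return exps


def gcd_from_factorizations(a, b):
    """Return gcd(a, b) for a, b >= 1 via delta_i = min(alpha_i, beta_i)."""
    common = prime_exponents(a) & prime_exponents(b)  # minimum of the exponents
    return prod(p ** e for p, e in common.items())


assert gcd_from_factorizations(12, 30) == 6
# Comparison with the library function for a range of small values.
assert all(
    gcd_from_factorizations(a, b) == gcd(a, b)
    for a in range(1, 60)
    for b in range(1, 60)
)
```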

2.3 The Euclidean algorithm

The Euclidean algorithm is the process given in Theorem 2.24. It will give a general method to calculate the gcd(a, b). First we need the following result. Let a, b ∈ ℕ. Then by Theorem 2.13 there exist q and r ∈ ℤ with a = qb + r and 0 ≤ r < b.

Lemma 2.22. gcd(a, b) = gcd(b, r).

Proof. Since r = a − qb, each common divisor of a and b is also a divisor of r. On the other hand, each common divisor of b and r is also a divisor of a. Hence, T_a ∩ T_b = T_b ∩ T_r, and gcd(a, b) = gcd(b, r).

Corollary 2.23. From Lemma 2.22 we see that gcd(a, b) = b if b ∣ a.

Theorem 2.24 (Euclidean algorithm). Let a, b ∈ ℕ, a > b and b ∤ a. Using the division algorithm we form the following scheme by repeated divisions:

a = q_1 b + r_1 with 0 ≤ r_1 < b,
b = q_2 r_1 + r_2 with 0 ≤ r_2 < r_1,
r_1 = q_3 r_2 + r_3 with 0 ≤ r_3 < r_2,
⋮
r_{n−2} = q_n r_{n−1} + r_n with 0 ≤ r_n < r_{n−1},
r_{n−1} = q_{n+1} r_n (+ r_{n+1}) with r_{n+1} = 0.

This algorithm terminates, that is, there is a last nonzero remainder r_n. This last nonzero remainder is the gcd(a, b).
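The scheme of Theorem 2.24 can be written as a short loop. The following Python sketch records each division a = q ⋅ b + r as a tuple (a, q, b, r); the function name euclid and this tuple format are our own choices.

```python
def euclid(a, b):
    """Return (gcd(a, b), steps) for natural numbers a and b.

    Each entry (a, q, b, r) of steps stands for the division a = q * b + r.
    """
    steps = []
    while b != 0:
        q, r = divmod(a, b)
        steps.append((a, q, b, r))
        a, b = b, r  # Lemma 2.22: gcd(a, b) = gcd(b, r)
    return a, steps


d, steps = euclid(525, 231)
assert d == 21
for a, q, b, r in steps:
    print(f"{a} = {q} * {b} + {r}")
```

If a < b, the first division has quotient 0 and simply interchanges the two numbers.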


Proof. Since the remainders are non-negative and become smaller in each step, it follows that we must get a remainder 0 after finitely many steps. From Lemma 2.22 we get

gcd(a, b) = gcd(b, r_1) = gcd(r_1, r_2) = ⋯ = gcd(r_{n−2}, r_{n−1}) = gcd(r_{n−1}, r_n) = r_n,

because r_n ∣ r_{n−1}.

Example 2.25. gcd(525, 231) = 21 because

525 = 2 ⋅ 231 + 63,
231 = 3 ⋅ 63 + 42,
63 = 1 ⋅ 42 + 21,
42 = 2 ⋅ 21 + 0.

Remark 2.26. For the scheme we certainly also may start by dividing the smaller number by the bigger one. In this case the first step just interchanges a and b. Further, it follows from the procedure that the last nonzero remainder r_n can be expressed as a linear combination of a and b by successively eliminating the r_i’s in the intermediate equations. To express r_n as a linear combination of a and b notice first that

r_n = r_{n−2} − q_n r_{n−1}.

Substituting r_{n−1} = r_{n−3} − q_{n−1} r_{n−2} from the immediately preceding division, we get

r_n = r_{n−2} − q_n (r_{n−3} − q_{n−1} r_{n−2}) = (1 + q_n q_{n−1}) r_{n−2} − q_n r_{n−3}.

Doing this successively, we ultimately express r_n as a linear combination of a and b. This leads directly to the following.

Theorem 2.27 (Lemma of E. Bézout (1730–1783)). Let a, b, d be nonzero integers. The equation ax + by = d is solvable with x, y ∈ ℤ if and only if gcd(a, b) ∣ d.

Proof. Let ax + by = d for x, y ∈ ℤ. Since gcd(a, b) ∣ a and gcd(a, b) ∣ b, property (2) following Definition 2.1 gives gcd(a, b) ∣ (ax + by), that is, gcd(a, b) ∣ d. Now, assume that gcd(a, b) ∣ d. Then d = gcd(a, b) ⋅ t for some t ∈ ℤ, t ≠ 0, because d ≠ 0.

Then by the Euclidean algorithm there exist x_1, y_1 ∈ ℤ with ax_1 + by_1 = gcd(a, b), and we get ax_1 t + by_1 t = gcd(a, b) ⋅ t = d.

Bézout’s Lemma leads to the following alternative characterization of the gcd(a, b).

Corollary 2.28. The greatest common divisor of two nonzero integers a, b is the smallest positive linear combination of a and b.

Corollary 2.29. Two nonzero integers a, b are relatively prime if and only if 1 can be written as a linear combination of a and b.

Example 2.30. We saw that gcd(525, 231) = 21. We show how to express 21 as a linear combination of 525 and 231:

21 = 63 − 1 ⋅ 42
   = 63 − (231 − 3 ⋅ 63)
   = −231 + 4 ⋅ 63
   = −231 + 4 ⋅ (525 − 2 ⋅ 231)
   = 4 ⋅ 525 − 9 ⋅ 231.

Remark 2.31. Let a, b ∈ ℤ, a ≠ 0 or b ≠ 0. It is possible to calculate algorithmically the gcd(a, b) and the corresponding linear combination simultaneously. This is often called the extended Euclidean algorithm. Consider the following steps:
(1) If a = 0 and b ≠ 0 then output (0, b/|b|, |b|); if a ≠ 0 and b = 0 then output (a/|a|, 0, |a|). In both cases stop.
(2) If a ≠ 0 ≠ b form the triples (c_0, d_0, e_0) = (a/|a|, 0, |a|) and (c_1, d_1, e_1) = (0, b/|b|, |b|).
(3) Check if e_1 ≤ e_0. If this is not the case, then interchange the triples (c_0, d_0, e_0) and (c_1, d_1, e_1) and rename them accordingly.
(4) Write e_0 = q e_1 + r with q ∈ ℤ and 0 ≤ r < e_1 (division algorithm). Form the triple (c_2, d_2, e_2) = (c_0 − q c_1, d_0 − q d_1, r).
(5) Replace (c_0, d_0, e_0) by (c_1, d_1, e_1) and (c_1, d_1, e_1) by (c_2, d_2, e_2).
(6) Repeat (4) and (5) until e_1 = 0. Then output (c_0, d_0, e_0).
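Steps (1) to (6) can be followed line by line in code. This is a minimal Python sketch of the procedure; the function name extended_euclid and the helper sign are our own, and the triples (c_0, d_0, e_0), (c_1, d_1, e_1) are stored as tuples t0 and t1.

```python
def extended_euclid(a, b):
    """Return (c, d, e) with e == gcd(a, b) and c * a + d * b == e."""
    if a == 0 and b == 0:
        raise ValueError("gcd(0, 0) is not defined")

    def sign(n):
        # n / |n| for n != 0
        return 1 if n > 0 else -1

    # Step (1): one of the numbers is 0.
    if a == 0:
        return (0, sign(b), abs(b))
    if b == 0:
        return (sign(a), 0, abs(a))
    # Step (2): the two starting triples.
    t0 = (sign(a), 0, abs(a))
    t1 = (0, sign(b), abs(b))
    # Step (3): make sure that e_1 <= e_0.
    if t1[2] > t0[2]:
        t0, t1 = t1, t0
    # Steps (4) to (6): divide and shift until e_1 = 0.
    while t1[2] != 0:
        q, r = divmod(t0[2], t1[2])
        t2 = (t0[0] - q * t1[0], t0[1] - q * t1[1], r)
        t0, t1 = t1, t2
    return t0


c, d, e = extended_euclid(525, 231)
assert (c, d, e) == (4, -9, 21)  # 21 = 4 * 525 - 9 * 231, as in Example 2.30
```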


This is an algorithm which calculates a triple (c, d, e) with e = gcd(a, b) and e = c ⋅ a + d ⋅ b. This we can see as follows. Let a ≠ 0 ≠ b (step (1) with a = 0 or b = 0 is clear). Certainly the procedure stops after finitely many steps because r < e_1. The procedure is also correct. If we consider only the third components e_i, then we carry out the usual Euclidean algorithm. Hence e = gcd(a, b). In each step we have

c_0 a + d_0 b = e_0 and c_1 a + d_1 b = e_1.

(1) At the beginning we have

(a/|a|) ⋅ a + 0 ⋅ b = |a| and 0 ⋅ a + (b/|b|) ⋅ b = |b|.

(2) In step (4) we have

c_2 ⋅ a + d_2 ⋅ b = (c_0 − q c_1)a + (d_0 − q d_1)b
                 = (c_0 a + d_0 b) − q(c_1 a + d_1 b)
                 = e_0 − q e_1 = r = e_2.

Hence, the final result satisfies the equation

ca + db = e = gcd(a, b).

We are now able to prove a second version of Euclid’s Lemma, which extends Corollary 2.11 and whose proof is independent of that of Corollary 2.11.

Theorem 2.32 (Euclid’s Lemma – second version). Let a, b, c ∈ ℤ, a ≠ 0, b ≠ 0, c ≠ 0. Let a ∣ bc and gcd(a, b) = 1. Then a ∣ c.

Proof. Since gcd(a, b) = 1 there exist x, y ∈ ℤ with ax + by = 1. Then acx + bcy = c. Now, a ∣ acx and, because a ∣ bc, also a ∣ bcy. Hence a ∣ c.

Almost the same proof provides the following corollary.

Corollary 2.33. Let a, b, c ∈ ℤ, a ≠ 0, b ≠ 0, c ≠ 0. Let a ∣ c, b ∣ c and gcd(a, b) = 1. Then ab ∣ c.

Remark 2.34. We now consider the equation ax + by = d, d = gcd(a, b), a, b ∈ ℤ, a ≠ 0 ≠ b, and ask for all the possible solutions x, y ∈ ℤ. Without any loss of generality we may assume that d = 1, because if d > 1 then we may divide by d and get the equation (a/d)x + (b/d)y = 1.

Let x1, y1 and x2, y2 be integer solutions of the equation ax + by = 1. Then ax1 + by1 = 1 and ax2 + by2 = 1. Subtraction leads to a(x1 − x2) = −b(y1 − y2). Since gcd(a, b) = 1 we get b ∣ (x1 − x2) by Theorem 2.32. This gives x2 = x1 + bt for some t ∈ ℤ. Then we get ax1 + by1 = a(x1 + bt) + by2, hence by1 = abt + by2. Since b ≠ 0 we get y2 = y1 − at. Hence, all integral solutions of the equation ax + by = 1 are given by x2 = x1 + bt and y2 = y1 − at with one parameter t ∈ ℤ, where x1, y1 is a special solution of ax + by = 1. Such a special solution x1, y1 we may get with the Euclidean algorithm.
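The triple-based procedure of this section and the solution formula of Remark 2.34 can be sketched in Python as follows. This is only an illustration under our own naming: the functions `extended_gcd` and `solutions` are hypothetical helpers introduced for this example and do not appear in the text.

```python
def extended_gcd(a, b):
    """Return a triple (c, d, e) with e = gcd(a, b) and c*a + d*b == e."""
    def sign(n):
        return (n > 0) - (n < 0)

    # Start triples: sign(a)*a + 0*b = |a| and 0*a + sign(b)*b = |b|.
    t0 = (sign(a), 0, abs(a))
    t1 = (0, sign(b), abs(b))
    # Step (3): make sure the third component of t1 is not the larger one.
    if t1[2] > t0[2]:
        t0, t1 = t1, t0
    # Steps (4)-(6): divide with remainder until the third component is 0.
    while t1[2] != 0:
        q = t0[2] // t1[2]
        t2 = (t0[0] - q * t1[0], t0[1] - q * t1[1], t0[2] - q * t1[2])
        t0, t1 = t1, t2
    return t0


def solutions(a, b, parameters):
    """Solutions (x, y) of a*x + b*y = 1 for the given parameters t,
    assuming a and b are nonzero with gcd(a, b) = 1 (Remark 2.34)."""
    x1, y1, e = extended_gcd(a, b)
    if e != 1:
        raise ValueError("a and b must be relatively prime")
    return [(x1 + b * t, y1 - a * t) for t in parameters]


c, d, e = extended_gcd(270, 2412)
print(c, d, e)  # coefficients and gcd, with c*270 + d*2412 == e
print(solutions(15, 4, range(-1, 2)))
```

Each triple keeps the invariant c ⋅ a + d ⋅ b = e throughout the loop, which is exactly the argument used in the correctness proof above.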

2.4 Least common multiples

We now describe the least common multiple lcm(a, b) of two integers a, b. Since 0 ∣ a if and only if a = 0, the least common multiple of a and b is only reasonable if a ≠ 0 ≠ b. The only definition which makes some sense could be lcm(a, b) = 0 if and only if a = 0 or b = 0. From now on let a ≠ 0 ≠ b.

Definition 2.35. Let a, b ∈ ℤ, a ≠ 0 ≠ b. The least common multiple lcm(a, b) of a and b is the natural number m = lcm(a, b) with the properties:
(i) a ∣ m and b ∣ m.
(ii) If a ∣ m′ and b ∣ m′ for some integer m′ then m ∣ m′.

In Theorem 2.20 we characterized gcd(a, b) in terms of the prime decompositions. We now do the same for lcm(a, b).

Theorem 2.36. Let a, b ∈ ℕ. We consider the prime decomposition of a and b:

$$a = p_1^{\alpha_1} p_2^{\alpha_2} \cdots p_n^{\alpha_n} \quad\text{and}\quad b = p_1^{\beta_1} p_2^{\beta_2} \cdots p_n^{\beta_n},$$

where the pi are pairwise distinct prime numbers and αi, βi ≥ 0 for i = 1, 2, …, n. Then

$$\gcd(a, b) = p_1^{\gamma_1} p_2^{\gamma_2} \cdots p_n^{\gamma_n} \quad\text{with } \gamma_i = \min(\alpha_i, \beta_i) \text{ for } i = 1, 2, \dots, n,$$

$$\mathrm{lcm}(a, b) = p_1^{\delta_1} p_2^{\delta_2} \cdots p_n^{\delta_n} \quad\text{with } \delta_i = \max(\alpha_i, \beta_i) \text{ for } i = 1, 2, \dots, n.$$


Proof. As for the greatest common divisor this follows directly from the definition.

The Euclidean algorithm gives the technique for calculating the greatest common divisor. The following result uses this to calculate the least common multiple.

Theorem 2.37. Let a, b ∈ ℕ. Then ab = gcd(a, b) ⋅ lcm(a, b).

Proof. We give the proof with help of the prime factorization. Let

$$a = p_1^{\alpha_1} p_2^{\alpha_2} \cdots p_n^{\alpha_n} \quad\text{and}\quad b = p_1^{\beta_1} p_2^{\beta_2} \cdots p_n^{\beta_n},$$

where n ∈ ℕ0, the p1, p2, …, pn are pairwise distinct prime numbers and all αi, βi ≥ 0. Let γi = min(αi, βi) and δi = max(αi, βi) for all i = 1, 2, …, n. Then certainly

$$\mathrm{lcm}(a, b) = p_1^{\delta_1} p_2^{\delta_2} \cdots p_n^{\delta_n}.$$

We already know that

$$\gcd(a, b) = p_1^{\gamma_1} p_2^{\gamma_2} \cdots p_n^{\gamma_n}.$$

Now, min(αi, βi) + max(αi, βi) = γi + δi = αi + βi for i = 1, 2, …, n, and we get

$$\gcd(a, b) \cdot \mathrm{lcm}(a, b) = p_1^{\gamma_1 + \delta_1} p_2^{\gamma_2 + \delta_2} \cdots p_n^{\gamma_n + \delta_n} = p_1^{\alpha_1 + \beta_1} p_2^{\alpha_2 + \beta_2} \cdots p_n^{\alpha_n + \beta_n} = ab.$$
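Theorem 2.37 turns any gcd computation into an lcm computation. A minimal Python sketch, assuming nonzero integer inputs; the helper name `lcm` is our own choice for this illustration:

```python
from math import gcd


def lcm(a, b):
    """Least common multiple of two nonzero integers via ab = gcd(a, b) * lcm(a, b)."""
    if a == 0 or b == 0:
        raise ValueError("lcm is only considered here for nonzero integers")
    # abs() makes the result a natural number also for negative inputs.
    return abs(a * b) // gcd(a, b)


print(lcm(270, 2412))
```

No prime factorization is needed here, since `math.gcd` only requires repeated division with remainder.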

Remark 2.38. It is in general difficult to get the prime factorization of a natural number a. However, if a, b ∈ ℕ then the Euclidean algorithm produces gcd(a, b), maybe after many steps. Then

$$\mathrm{lcm}(a, b) = \frac{ab}{\gcd(a, b)},$$

and we may calculate lcm(a, b).

Example 2.39. Let a = 270 and b = 2 412. The Euclidean algorithm shows that gcd(270, 2 412) = 18. Therefore

$$\mathrm{lcm}(270, 2\,412) = \frac{270 \cdot 2\,412}{18} = 36\,180.$$

Before we continue with the concept of the greatest common divisor and the least common multiple in general we want to make some comments about the Fundamental Theorem of Arithmetic. There are many proofs of this result. Most of the proofs are

distinguished by whether they use Euclid’s Lemma or not. The proofs for the existence of a prime factorization are more or less the same. Essentially they use the existence of a smallest prime factor. If we look at the two proofs given for the uniqueness part, people often argue that the first is a bit tricky and the second is a bit complicated for school mathematics because the Euclidean algorithm is not a school subject. Hence, we present here an alternative proof for the uniqueness part of the prime decomposition that only uses the first induction principle and which could be presented in a school setting. The proof was given to us by E. Wittmann in a private communication.

For the proof we just need two harmless observations which essentially follow directly from the definition:
(1) Let a, b ∈ ℕ. Then lcm(a, b) ∣ ab. More generally, if a ∣ n and b ∣ n for n ∈ ℕ then lcm(a, b) ∣ n.
(2) For two distinct prime numbers p, q we have lcm(p, q) = pq.

We now use these two observations to present a proof for the uniqueness of the prime factorization by induction from k to k + 1. The uniqueness certainly holds for prime numbers: these have a prime factorization with just one prime factor, and a prime number cannot simultaneously be a composite number. So the statement holds for k = 1.

Let k ≥ 1 be a natural number with the property that the prime factorization of all natural numbers which have at least one prime factorization with k prime factors is unique (up to the ordering). We show that then also the prime factorization of all natural numbers which have at least one prime factorization with k + 1 prime factors is unique. Then the uniqueness part of the Fundamental Theorem of Arithmetic is proved by the first induction principle.

Let z ∈ ℕ be a number with a prime factorization p1 p2 ⋯ pk pk+1 with k + 1 prime factors, and let q1 q2 ⋯ qm−1 qm be a second one of z with m prime factors.
By our induction hypothesis we may assume that m ≥ k + 1, because m ≤ k would contradict the uniqueness for k prime factors. We show that both decompositions are equal (up to the ordering). We have

p1 p2 ⋯ pk pk+1 = z = q1 q2 ⋯ qm−1 qm. (2.1)

(1) If p1 = q1 then we may divide both sides by p1 and get the number z/p1 with the prime factorizations p2 ⋯ pk pk+1 in k prime factors and q2 ⋯ qm−1 qm in m − 1 prime factors. Here the induction hypothesis applies; in particular m = k + 1, and both decompositions in equation (2.1) are equal up to the ordering.


(2) Let p1 ≠ q1. Then p1 q1 = lcm(p1, q1). The number z is a common multiple of p1 and q1. Hence, p1 q1 t = z = p1 p2 ⋯ pk pk+1 for some t ∈ ℕ. If we divide both sides by p1 we get

$$p_2 \cdots p_k p_{k+1} = \frac{z}{p_1} = q_1 t,$$

with k prime factors for the left prime factorization. By the induction hypothesis we have q1 = pi for some i > 1. By interchanging the indices 1 and i, case (2) is reduced to case (1), and we are done.

2.5 General gcd’s and lcm’s

We extend the concept of the greatest common divisor and least common multiple. To do this we need some more algebraic properties of the integers. We start with a general definition and in this way recall some properties.

Definition 2.40. Let R be a commutative ring with unity element 1.
(1) R is called an integral domain if the following holds: 1 ≠ 0, and ab = 0 ⇒ a = 0 or b = 0 for a, b ∈ R. We say then that R has no zero divisors.
(2) Let ∅ ≠ S ⊂ R. The set S is called a subring of R if a + b ∈ S, a − b ∈ S and ab ∈ S for all a, b ∈ S.
(3) Let ∅ ≠ I ⊂ R. The set I is called an ideal of R if (a) I is a subring of R and (b) ra ∈ I for all a ∈ I, r ∈ R. If I is an ideal of R, then we often write I ◁ R.

Definition 2.41. Let R be an integral domain. R is called a principal ideal domain if each ideal I is generated by some element a ∈ R, that is, I = (a) ∶= {ra ∣ r ∈ R}, for some a ∈ R.

We know that ℤ is an integral domain. Using the division algorithm we get the following.

Theorem 2.42. The set ℤ is a principal ideal domain, that is, if I ⊂ ℤ is an ideal in ℤ, then there exists an n ∈ ℕ0 with I = (n) = {zn ∣ z ∈ ℤ}. In particular, if U is a subgroup of the additive group ℤ, then U = {zn ∣ z ∈ ℤ} for some n ∈ ℕ0.

Proof. Let I be an ideal in ℤ. If I = {0}, then I = (0) = {z ⋅ 0 ∣ z ∈ ℤ}, hence I is a principal ideal. Now, let I ≠ {0}. Then there exists an n ∈ ℤ, n ≠ 0, with n ∈ I. We may assume that n ∈ ℕ because if n ∈ I then also −n ∈ I. We also may assume that n is the smallest natural number which belongs to I, using the well ordering of ℕ, that is, each nonempty subset of ℕ has a smallest element (see Chapter 1, Theorem 1.7). Now, let m ∈ I be arbitrary. Again, without loss of generality, let m ∈ ℕ. By the division algorithm there exist q, r ∈ ℤ with m = qn + r and 0 ≤ r < n. Assume that r ≠ 0. Since n ∈ I, qn ∈ I, m ∈ I we get that (m − qn) ∈ I, and hence r ∈ I. This contradicts the minimality of n. Hence, r = 0 and then m = qn, that is, m ∈ (n) = {zn ∣ z ∈ ℤ}. Since m ∈ I was arbitrary we get I = (n). The additional statement about subgroups of ℤ is now clear because if n ∈ U then also zn ∈ U for all z ∈ ℤ.

Remark 2.43. Recall that a nonempty subset U of a group G is a subgroup of G if U itself is a group with the restriction of the binary operation for G to U. In case of addition + (multiplication ⋅) this just means equivalently that (a − b) ∈ U ($ab^{-1}$ ∈ U) if a, b ∈ U.

We now extend the concepts for the greatest common divisor and the least common multiple to more numbers. Let a1, a2, …, an ∈ ℤ, n ≥ 2. If, for instance, a1 = 0 then we would define gcd(a1, a2, …, an) ∶= gcd(a2, …, an). If one ai is 0 then the least common multiple is anyway not meaningful. Hence, we assume from now on that ai ≠ 0 for i = 1, 2, …, n.

Definition 2.44. Let a1, a2, …, an ∈ ℤ ⧵ {0}, n ≥ 2. A natural number d is called the greatest common divisor gcd(a1, a2, …, an) of a1, a2, …, an if the following hold:
(i) d ∣ ai for all i = 1, 2, …, n.
(ii) If δ ∣ ai for all i = 1, 2, …, n, δ ∈ ℤ, then δ ∣ d.


Remark 2.45. The gcd(a1, a2, …, an) exists by Theorem 2.42, and for n ≥ 3 one may calculate it in steps via gcd(a1, a2, …, an) = gcd(a1, gcd(a2, …, an)). We also get that we may write d = gcd(a1, a2, …, an) as a linear combination of a1, a2, …, an, that is, there exist integers x1, x2, …, xn with d = x1 a1 + x2 a2 + ⋯ + xn an.

Example 2.46. We have gcd(144, 160, 175) = 1. We can calculate first gcd(144, 160) = 16 and then gcd(16, 175) = 1. First we get 16 = 160 − 144, and then 1 = 11 ⋅ 16 − 175, which leads to 1 = −11 ⋅ 144 + 11 ⋅ 160 − 1 ⋅ 175.

If one knows the prime factorizations of a1, a2, …, an we certainly may calculate gcd(a1, a2, …, an) analogously as for two numbers.

Definition 2.47. Let a1, a2, …, an ∈ ℤ ⧵ {0}, n ≥ 2. The least common multiple lcm(a1, a2, …, an) of a1, a2, …, an is the natural number m = lcm(a1, a2, …, an) with the properties:
(i) ai ∣ m for all i = 1, 2, …, n.
(ii) If ai ∣ m′ for m′ ∈ ℤ and i = 1, 2, …, n, then m ∣ m′.

For n = 2 and a1, a2 ∈ ℕ we have the formula a1 a2 = gcd(a1, a2) ⋅ lcm(a1, a2) of Theorem 2.37. For n ≥ 3 the analogous product formula a1 a2 ⋯ an = gcd(a1, a2, …, an) ⋅ lcm(a1, a2, …, an) does not hold in general: for instance, gcd(2, 2, 2) = lcm(2, 2, 2) = 2, but 2 ⋅ 2 ⋅ 2 = 8. Instead we may calculate lcm(a1, a2, …, an) in steps via

lcm(a1, a2, …, an) = lcm(a1, lcm(a2, …, an)).
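The stepwise calculation of the gcd and lcm of several numbers can be sketched in Python as follows. The helper names `gcd_many` and `lcm_many` are our own; the sketch assumes nonzero integer arguments and folds the two-number operations over the list. Note that the product formula ab = gcd(a, b) ⋅ lcm(a, b) is used only for two numbers at a time.

```python
from functools import reduce
from math import gcd


def lcm2(a, b):
    """lcm of two nonzero integers, using ab = gcd(a, b) * lcm(a, b)."""
    return abs(a * b) // gcd(a, b)


def gcd_many(*numbers):
    """gcd(a1, ..., an), computed two numbers at a time."""
    return reduce(gcd, numbers)


def lcm_many(*numbers):
    """lcm(a1, ..., an), computed two numbers at a time."""
    return reduce(lcm2, numbers)


print(gcd_many(144, 160, 175))
print(lcm_many(144, 160, 175))
# For three or more numbers the product is in general not gcd * lcm:
print(144 * 160 * 175 == gcd_many(144, 160, 175) * lcm_many(144, 160, 175))
```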


Exercises

1. Prove the following properties for a, b, c ∈ ℤ:
(1) a ∣ b and b ∣ c ⇒ a ∣ c (transitivity).
(2) c ∣ a and c ∣ b ⇒ c ∣ (k1 a + k2 b) for all k1, k2 ∈ ℤ.
(3) ±a ∣ a and ±1 ∣ a for all a ∈ ℤ.
(4) a ∣ 0 for all a ∈ ℤ.
(5) 0 ∣ a ⇔ a = 0.
(6) a ∣ b and b ∣ a ⇒ a = ±b.
2. Use the Sieve of Eratosthenes to find all prime numbers between 1 and 200.
3. Let a, b ∈ ℕ ⧵ {1} with gcd(a, b) = 1. Show that, if $c^2 = ab$ for some c ∈ ℕ, then a and b are square numbers, that is, there exist a1, b1 ∈ ℕ with $a = a_1^2$ and $b = b_1^2$.
4. Give a definition of the greatest common divisor gcd(a1, a2, …, an), n ≥ 3 and ai ≠ 0, with the help of sets of positive divisors analogously to Definition 2.16. With this definition calculate the greatest common divisor for
(a) a1 = 21, a2 = 33, a3 = 84;
(b) a1 = 350, a2 = 165, a3 = 105, a4 = 55;
(c) a1 = 14, a2 = 42, a3 = 154, a4 = 98;
(d) a1 = 43, a2 = 79, a3 = 113, a4 = 137, a5 = 28, a6 = 53, a7 = 19.
5. Calculate for each of the following pairs of integers the greatest common divisor with the Euclidean algorithm and also the least common multiple. Write the greatest common divisor as a linear combination of these numbers:
(a) 12 345 and 123;
(b) 4 004 and 1 547;
(c) 2 431 and 12 673.
6. Check for each of the following equations whether it is solvable with x, y ∈ ℤ:
(a) 12 345x + 123y = 735;
(b) 12 345x + 123y = 385;
(c) 1 496x + 120y = 108;
(d) 1 890x + 23y = 88;
(e) 525x + 3 465y = 24 255.
In the case of solvability calculate x, y ∈ ℤ.
7. We know that ℤ is a principal ideal domain:
(a) Find 5 elements which are elements of the ideal I = (6) in ℤ.
(b) Find an ideal I ≠ ℤ in ℤ (with the help of one generating element) for which 18 ∈ I and 51 ∈ I.
(c) Find an ideal I ≠ ℤ in ℤ (with the help of one generating element) for which 105, 231, −147 ∈ I.
(d) In general, find an ideal I (with the help of one generating element) for which ai ∈ I, with a1, a2, …, an ∈ ℤ and n ≥ 1.
8. For a ∈ ℤ let (a) denote the principal ideal generated by a. That is, (a) = {ax ∣ x ∈ ℤ}. Show that if n, m > 0 then (n) ∩ (m) = (d) where d = lcm(n, m).
9. Let n, m > 0 be integers.
(a) Show that {nx + my ∣ x, y ∈ ℤ} is an ideal in ℤ. This is called the ideal generated by n, m and we denote this by (n, m).
(b) Show that (n, m) = (k) where k = gcd(n, m).

3 Modular arithmetic

3.1 The ring of integers modulo n

In this chapter we consider modular arithmetic. This type of arithmetic is fundamental in number theory and is crucial for cryptography. To do modular arithmetic we must introduce a new class of rings, the rings of integers modulo n, one for each natural number n ≠ 1. These are constructed from the following equivalence relation on the integers ℤ, called congruence modulo n. Let n ∈ ℕ. If x, y ∈ ℤ such that n ∣ (x − y), then we call x congruent to y modulo n and write

x ≡ y mod n.

If n ∤ (x − y), then x and y are called incongruent modulo n and we write

x ≢ y mod n.

Hence, if x ≡ y mod n, then nk = x − y for some k ∈ ℤ, and therefore we also say that y is a remainder or residue of x modulo n. If x ∈ ℤ is given, then

$$\overline{x} = \{\, y \in \mathbb{Z} \mid x \equiv y \bmod n \,\}$$

is called the residue class of x modulo n.

Theorem 3.1. The congruence relation ≡ is an equivalence relation on ℤ. Therefore the residue classes partition ℤ. The residue classes $\overline{0}, \overline{1}, \dots, \overline{n-1}$ form a complete set of residue classes.

Proof. Certainly, the congruence relation is reflexive and symmetric because x ≡ x mod n, and y ≡ x mod n if x ≡ y mod n. From x − z = (x − y) + (y − z) we get n ∣ (x − z) if n ∣ (x − y) and n ∣ (y − z), so ≡ is transitive. The rest follows from the division algorithm x = qn + r with 0 ≤ r < n, and from the fact that r1 ≢ r2 mod n if 0 ≤ r1 < r2 < n.

If $\{\overline{x_1}, \overline{x_2}, \dots, \overline{x_n}\}$ is a complete set of residue classes, then we say that {x1, x2, …, xn} is a complete set of representatives modulo n. In particular, {0, 1, …, n − 1} is a complete set of representatives modulo n.

Theorem 3.2. Let a ∈ ℤ with gcd(a, n) = 1. If {x1, x2, …, xn} is a complete set of representatives modulo n, then so is {ax1, ax2, …, axn}.

Proof. Let axi ≡ axj mod n. Then nk = a(xi − xj) for some k ∈ ℤ. Since gcd(a, n) = 1 we have n ∣ (xi − xj) by Euclid’s Lemma (see Theorem 2.32). Then xi ≡ xj mod n, and therefore xi = xj. Hence, the axi are pairwise incongruent modulo n and form a complete set of representatives modulo n.

DOI 10.1515/9783110516142-003

Lemma 3.3. If x ≡ y mod n, then gcd(x, n) = gcd(y, n).

Proof. Let x − y = kn for some k ∈ ℤ. Then each common divisor of x and n is also one of y and n, and vice versa.

Theorem 3.4. Let n ∈ ℕ and let $\{\overline{x_1}, \overline{x_2}, \dots, \overline{x_n}\}$ be a complete set of residue classes modulo n. Then this set together with the addition defined by $\overline{x_i} + \overline{x_j} := \overline{x_i + x_j}$ and the multiplication defined by $\overline{x_i} \cdot \overline{x_j} := \overline{x_i \cdot x_j}$ forms a commutative ring with zero element $\overline{0}$ and unity element $\overline{1}$. This ring is called the residue class ring modulo n or the ring of integers modulo n and is denoted by ℤn or ℤ/nℤ.

Proof. We only have to show that the addition and multiplication are well defined. The ring properties then carry over automatically from ℤ to ℤn. Let x ≡ x′ mod n and y ≡ y′ mod n. Then $\overline{x} = \overline{x'}$ and $\overline{y} = \overline{y'}$, and there are k, l ∈ ℤ with x′ = x + kn and y′ = y + ln. With this we get

$$\overline{x' + y'} = \overline{x + kn + y + ln} = \overline{(x + y) + (k + l)n} = \overline{x + y}$$

and

$$\overline{x' \cdot y'} = \overline{(x + kn) \cdot (y + ln)} = \overline{xy + n(xl + yk + kln)} = \overline{x \cdot y}.$$

Therefore addition and multiplication are independent of the representatives of the residue classes, and hence well defined.

Recall that a commutative ring R with unity element 1 (≠ 0) is called an integral domain if there are no zero divisors, that is, elements a, b with a ≠ 0 and b ≠ 0 but ab = 0. Further, a commutative ring R with unity element 1 (≠ 0) is called a field if in addition each nonzero element has a multiplicative inverse. This is equivalent to the set R∗ = R ⧵ {0} being a group with respect to the multiplication in R. If R is a field we usually denote R as K.

Lemma 3.5. A field K is an integral domain.

Proof. Let K be a field. It is by definition a commutative ring with unity element 1 (≠ 0), so we must show that there are no zero divisors. Let ab = 0 with a, b ∈ K. If a ≠ 0 then a has a multiplicative inverse $a^{-1}$. Then $a^{-1}(ab) = 0 \Rightarrow (a^{-1}a)b = 1 \cdot b = b = 0$. Therefore there are no zero divisors.


In general an integral domain need not be a field. The integers ℤ are an example. However, the modular ring ℤn is a field, and likewise an integral domain, precisely when n is a prime number. This is embodied in the next two results.

Theorem 3.6. Let n ∈ ℕ, n ≥ 2. The ring ℤn is an integral domain if and only if n is a prime number.

Proof. The ring ℤn is a commutative ring with a unity element. Therefore it is an integral domain if and only if it has no zero divisors. Suppose first that n is not prime. Then n = n1 n2 with 1 < n1 < n, 1 < n2 < n. Then in ℤn the corresponding elements $\overline{n_1}, \overline{n_2}$ are both nonzero since neither n1 nor n2 can be a multiple of n. However, in ℤn we have $\overline{n_1} \cdot \overline{n_2} = \overline{n} = \overline{0}$, and therefore both $\overline{n_1}$ and $\overline{n_2}$ are zero divisors. It follows that if n is not a prime number then ℤn cannot be an integral domain. Now suppose that n = p is a prime number. We show that ℤp is an integral domain. Suppose that $\overline{a} \cdot \overline{b} = \overline{0}$ in ℤp. Considered as integers, the product of a and b must then be a multiple of p, and hence p ∣ ab. From Euclid’s Lemma then either p ∣ a or p ∣ b. If p ∣ a then a is a multiple of p and therefore $\overline{a} = \overline{0}$ in ℤp. Identically, if p ∣ b then $\overline{b} = \overline{0}$ in ℤp. Hence if p is a prime number then ℤp has no zero divisors and therefore is an integral domain.

Theorem 3.7. Let n ∈ ℕ, n ≥ 2. The set ℤn is a field if and only if n is a prime number.

Proof. Suppose that ℤn is a field. Then from Lemma 3.5 and Theorem 3.6 we get that n must be a prime. Conversely let n be a prime number. To show that ℤn is a field we must show that each residue class $\overline{a} \neq \overline{0}$ has a multiplicative inverse, that is, an $\overline{x} \neq \overline{0}$ with $\overline{a} \cdot \overline{x} = \overline{x} \cdot \overline{a} = \overline{1}$. Let $\overline{a} \neq \overline{0}$. Then n ∤ a and therefore gcd(a, n) = 1 because n is a prime number. Therefore there exist x, y ∈ ℤ with xa + yn = 1 by the Euclidean algorithm. Hence, $\overline{x} \cdot \overline{a} = \overline{1}$, and $\overline{x}$ is the multiplicative inverse of $\overline{a}$.

Putting together Theorems 3.6 and 3.7 we obtain:

Corollary 3.8. Let n ∈ ℕ, n ≥ 2. Then ℤn is an integral domain if and only if ℤn is a field.

We remark that in any case a finite integral domain R always is a field. This can be seen as follows. If R has only the two elements 0 and 1 then R is always a field. In fact, up to the writing, R = ℤ2. Now assume that R contains more than two elements. Let {0, 1, a1, …, an} be the elements of R. Fix one element ak. Then ak x ≠ ak y for x ≠ y in R because R is an integral domain. Hence multiplication by ak maps the finite set R ⧵ {0} injectively, and therefore bijectively, onto itself, so there must be an i with ak ai = 1, that is, ak has the multiplicative inverse ai. Hence, {1, a1, …, an} forms a multiplicative group. Therefore R is a field.

Another corollary of these results is known as Wilson’s Theorem, which is named after J. Wilson (1741–1793).

Corollary 3.9 (Wilson’s Theorem). Let n ∈ ℕ, n ≥ 2. Then n is a prime number if and only if (n − 1)! ≡ −1 mod n.

Proof. Let n = p be a prime number. By Theorem 3.7 we have that ℤp is a field. In particular, for each x ∈ ℤ∗p there exists exactly one y ∈ ℤ∗p with x ⋅ y = 1. We have x = y if and only if $x^2 - 1 = (x + 1)(x - 1) = 0$. Since ℤp is a field, if x = y we must have x = 1 or x = −1 = p − 1. All other factors of (p − 1)! therefore cancel in pairs of mutually inverse elements, and it follows in ℤp that

(p − 1)! = (p − 1) ⋅ (p − 2) ⋯ 2 ⋅ 1 = (p − 1) ⋅ 1 = −1.

Conversely assume that (n − 1)! ≡ −1 mod n holds. Suppose that n is not a prime number. Then n has a prime divisor p, that is, n = p ⋅ m with p < n and p a prime number. Then p ∣ (n − 1)!, and therefore (n − 1)! ≡ 0 mod p. But we have (n − 1)! ≡ −1 mod n, which implies 0 ≡ −1 mod p. This provides a contradiction. Therefore n is a prime number.

The next result, which is called Fermat’s Little Theorem, named after P. de Fermat (1607–1665), becomes very important in more complicated primality testing.

Theorem 3.10 (Fermat’s Little Theorem). Let p be a prime number and a ∈ ℤ with gcd(a, p) = 1, that is, p ∤ a. Then

$$a^{p-1} \equiv 1 \bmod p$$

and hence, $\overline{a}^{\,p-1} = \overline{1}$ in ℤp.

Proof. The set {0, 1, 2, …, p − 1} is a complete set of representatives modulo p. By Theorem 3.2 also {0, a, 2a, …, (p − 1)a} is a complete set of representatives modulo p. It follows in ℤp that

$$1 \cdot 2 \cdots (p-1) = a \cdot 2a \cdots (p-1)a = a^{p-1} \cdot 1 \cdot 2 \cdots (p-1).$$

By Corollary 3.9 we have 1 ⋅ 2 ⋯ (p − 1) = −1 in ℤp, and hence $-1 = -a^{p-1}$, that is, $a^{p-1} = 1$ in ℤp, and therefore $a^{p-1} \equiv 1 \bmod p$.

Remark 3.11. If p ∣ a then $\overline{a} = \overline{0}$ in ℤp, and therefore $a^p \equiv 0 \equiv a \bmod p$. Hence, we have in general: If p is a prime number and a ∈ ℤ, then $a^p \equiv a \bmod p$.
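Both Wilson’s Theorem and Fermat’s Little Theorem are easy to check numerically for small numbers. The following Python sketch uses function names of our own choosing (`is_prime_wilson`, `fermat_holds`); it is meant only as an illustration, since computing (n − 1)! is far too slow to serve as a practical primality test.

```python
from math import factorial


def is_prime_wilson(n):
    """Wilson's criterion: n >= 2 is prime iff (n - 1)! ≡ -1 mod n."""
    return n >= 2 and factorial(n - 1) % n == n - 1


def fermat_holds(a, p):
    """Check the general form a^p ≡ a mod p of Remark 3.11."""
    # Three-argument pow computes the power modulo p efficiently.
    return pow(a, p, p) == a % p


print([n for n in range(2, 30) if is_prime_wilson(n)])
print(all(fermat_holds(a, 13) for a in range(-20, 21)))
```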


3.2 Units and the Euler φ-function

We now introduce a function which, given a natural number n, counts the natural numbers a ≤ n which are relatively prime to n.

Definition 3.12. Let n ∈ ℕ. The function φ ∶ ℕ → ℕ, φ(n) = #{a ∈ ℕ ∣ 1 ≤ a ≤ n and gcd(a, n) = 1}, is called the Euler φ-function, which is named after L. Euler (1707–1783).

Examples 3.13.
(1) φ(1) = 1, φ(2) = 1.
(2) φ(6) = 2, since from the numbers 1, 2, 3, 4, 5, 6 only 1 and 5 are relatively prime to 6.
(3) φ(8) = 4, since for the numbers a with 1 ≤ a ≤ 8 only 1, 3, 5 and 7 are relatively prime to 8.

Since for a prime number p its only positive divisors are 1, p it follows directly that

φ(p) = p − 1 for a prime number p.

For a power $p^m$ of a prime number p the only positive divisors are $p^k$, 0 ≤ k ≤ m, and therefore for a prime number p and m ∈ ℕ we have

$$\varphi(p^m) = p^m - p^{m-1} = p^m \left(1 - \frac{1}{p}\right).$$

To see this last statement notice that among the numbers 1, 2, 3, …, p, p + 1, …, 2p, …, $p^m$ exactly the $p^{m-1}$ numbers p, 2p, …, $p^{m-1} \cdot p = p^m$ are not relatively prime to $p^m$.

Finally notice that if n = pq with p and q two different prime numbers then φ(pq) = φ(p)φ(q) = (p − 1)(q − 1). This we see as follows. Among the numbers 1, 2, …, pq exactly p, 2p, …, qp and q, 2q, …, pq are not relatively prime to pq. Here we counted pq twice. Hence, φ(pq) = pq − p − q + 1 = (p − 1)(q − 1) = φ(p)φ(q).

Remark 3.14. We later show as an application of the Chinese Remainder Theorem (Theorem 3.27) that φ is a multiplicative number theoretical function, that is, we have φ(nm) = φ(n)φ(m) if gcd(n, m) = 1.

Definition 3.15. Let n ∈ ℕ. A residue class $\overline{a}$ ∈ ℤn is called a unit of ℤn if there exists a $\overline{b}$ ∈ ℤn with $\overline{a} \cdot \overline{b} = \overline{1}$.

Remark 3.16. The residue class $\overline{a}$ ∈ ℤn is a unit of ℤn if and only if gcd(a, n) = 1.

Proof. (1) $\overline{a} \cdot \overline{b} = \overline{1}$ ⇒ ab + nx = 1 for some x ∈ ℤ ⇒ gcd(a, n) = 1 (see Chapter 2, Theorem 2.27). (2) gcd(a, n) = 1 ⇒ there exist x, y ∈ ℤ with ax + ny = 1 ⇒ $\overline{a} \cdot \overline{x} = \overline{1}$, and hence $\overline{a}$ is a unit of ℤn.

By definition we therefore have exactly φ(n) units of ℤn. This gives the following.

Theorem 3.17. Let n ∈ ℕ, n ≥ 2. The units of ℤn form a multiplicative group ℤ∗n of order φ(n). Hence, ℤ∗n = {$\overline{a}$ ∈ ℤn ∣ gcd(a, n) = 1}.

Proof. We already know that φ(n) is the number of units of ℤn. Let a, b be units of ℤn, that is, there exist x and y in ℤn with a ⋅ x = 1, b ⋅ y = 1. Then a ⋅ b is a unit of ℤn because a ⋅ b ⋅ x ⋅ y = 1. Moreover, 1 is a unit, and the inverse x of a unit a is again a unit. This means that the units of ℤn form a multiplicative group ℤ∗n.

Definition 3.18. A residue class $\overline{a}$ ∈ ℤ∗n is called a prime residue class modulo n.

Theorem 3.19 (Euler’s Theorem). Let n ∈ ℕ, a ∈ ℤ with gcd(a, n) = 1. Then $a^{\varphi(n)} \equiv 1 \bmod n$ and hence, $\overline{a}^{\,\varphi(n)} = \overline{1}$ in ℤ∗n.

Proof. The proof is analogous to that of Theorem 3.10. Let ℤ∗n = {x1, x2, …, xφ(n)}. As in Theorem 3.2 we also have ℤ∗n = {a ⋅ x1, a ⋅ x2, …, a ⋅ xφ(n)}. It follows that

$$x_1 \cdot x_2 \cdots x_{\varphi(n)} = (a \cdot x_1) \cdots (a \cdot x_{\varphi(n)}) = a^{\varphi(n)} \, x_1 \cdots x_{\varphi(n)}.$$

We multiply this equation with the multiplicative inverse of x1 ⋅ x2 ⋯ xφ(n) and get $a^{\varphi(n)} = 1$.
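The Euler φ-function and Euler’s Theorem can be checked directly from the definitions. The sketch below is a brute-force illustration in Python; the names `phi` and `units` are our own, and counting all a with gcd(a, n) = 1 is only sensible for small n.

```python
from math import gcd


def phi(n):
    """Euler phi-function: number of a with 1 <= a <= n and gcd(a, n) = 1."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)


def units(n):
    """Representatives in {1, ..., n - 1} of the units of Z_n, for n >= 2."""
    return [a for a in range(1, n) if gcd(a, n) == 1]


print([phi(n) for n in (1, 2, 6, 8)])
print(units(15))
# Euler's Theorem: a^phi(n) ≡ 1 mod n for every unit a.
print(all(pow(a, phi(15), 15) == 1 for a in units(15)))
```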

Corollary 3.20 (Fermat’s Little Theorem). If p is a prime number, a ∈ ℤ with p ∤ a, then $a^{p-1} \equiv 1 \bmod p$.

Proof. We have φ(p) = p − 1.

Corollary 3.21. Let n ∈ ℕ, n ≥ 2, a ∈ ℤ∗n and d the order of a in ℤ∗n, that is, the smallest natural number d with $a^d = 1$. Then d ∣ φ(n). If especially n = p is a prime number, then d ∣ (p − 1).

Proof. Since ℤ∗n is finite, there exists such a minimal d. By Theorem 3.19 we have $a^{\varphi(n)} = 1$ in ℤ∗n. Because of the minimality of d we get d ≤ φ(n). The division algorithm gives integers q, r with φ(n) = q ⋅ d + r and 0 ≤ r < d. From $a^{\varphi(n)} = a^d = 1$ we get $a^{\varphi(n) - q \cdot d} = a^r = 1$. By the minimality of d we therefore have r = 0, that is, φ(n) = qd and so d ∣ φ(n).

Recall that a finite (multiplicatively written) group G is called cyclic if there exists a g ∈ G with G = {1, g, …, $g^{n-1}$}, where n = |G| is the order of G. We write G = ⟨g⟩.

Theorem 3.22. Let p be a prime number. Then ℤ∗p is a cyclic group of order p − 1.

Proof. We first remark that |ℤ∗p| = p − 1 and that ℤp is a field. Let a, b ∈ ℤ∗p with the orders r and s, respectively. Then ℤ∗p contains an element of order lcm(r, s); if gcd(r, s) = 1, the product ab is such an element, because then rs is the smallest natural number d with $(ab)^d = 1$. Let ℤ∗p = {a1, a2, …, ap−1}, and let di be the order of ai in ℤ∗p for i = 1, 2, …, p − 1. Let k = lcm(d1, d2, …, dp−1). Then k ∣ (p − 1) by Corollary 3.21 and $a^k = 1$ for all a ∈ ℤ∗p. Moreover, from the construction we see that there exists a g ∈ ℤ∗p which has order k. We have to show that k = p − 1. Assume that k < p − 1. Then p > 2, and we must have that k is even because p − 1 has order 2 in ℤ∗p. The equation $x^k - 1 = 0$ has in ℤ∗p more than k solutions, namely p − 1 > k. By induction on the divisors of k the equation $y^2 - 1 = (y - 1)(y + 1) = 0$ then finally must have more than two solutions, which is impossible in the field ℤp because a field has no nontrivial zero divisors (see also the remark below for a slightly different argument). Hence, k = p − 1, and ℤ∗p is cyclic of order p − 1.

In Chapter 6 on polynomials with coefficients from a field K we will see that a polynomial f(x) with coefficients of K and degree n ≥ 1 has in K at most n pairwise different zeros, that is, there are in K at most n pairwise different a ∈ K with f(a) = 0. This gives directly k = p − 1 in the proof of Theorem 3.22. Together with the other arguments in the proof of Theorem 3.22 we obtain the following.

Corollary 3.23. Let K be a field and G a finite subgroup of the multiplicative group K∗ = K ⧵ {0}. Then G is cyclic.

If p is a prime number, since ℤ∗p is cyclic, there is a g ∈ ℤ with ℤ∗p = {1, g, $g^2$, …, $g^{p-2}$} = ⟨g⟩. Such a g is called a primitive root modulo p.

Example 3.24. Let p = 7. The residue classes 1, 2, 3, 4, 5, 6 have in ℤ∗7 the orders 1, 3, 6, 3, 6, 2, respectively. Hence, 3 and 5 are primitive roots modulo 7.
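The orders in Example 3.24 and the resulting primitive roots can be found by direct search. A small Python sketch follows; the names `order` and `primitive_roots` are hypothetical helpers of ours, and the search simply multiplies until the power becomes 1, which is only practical for small p.

```python
def order(a, n):
    """Order of a in the unit group of Z_n, assuming gcd(a, n) = 1 and n >= 2."""
    x, d = a % n, 1
    while x != 1:
        x = x * a % n
        d += 1
    return d


def primitive_roots(p):
    """All primitive roots g modulo a prime p with 1 <= g <= p - 1."""
    return [g for g in range(1, p) if order(g, p) == p - 1]


print([order(a, 7) for a in range(1, 7)])
print(primitive_roots(7))
```

Every order found this way divides p − 1, in line with Corollary 3.21.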

There is no known efficient general procedure to find, for a prime number p, a primitive root modulo p. But it is clear that there are φ(p − 1) primitive roots modulo p. This follows from the following.

Lemma 3.25. Let G = ⟨g⟩ be a finite cyclic group of order n. Then $g^k$ with 0 < k < n is a generator of G if and only if gcd(k, n) = 1.

Proof. Suppose gcd(k, n) = 1. Then there exist x, y ∈ ℤ with kx + ny = 1. Since $g^n = 1$ it follows that $g = g^1 = g^{kx + ny} = g^{kx} = (g^k)^x$. Therefore g is a power of $g^k$ and hence, $g^k$ is a generator of G. Conversely, suppose that $g^k$ is also a generator for G. Then there exists a power x of $g^k$ such that $g = (g^k)^x = g^{kx}$. Hence kx ≡ 1 mod n, and so k is a unit modulo n, which implies that gcd(n, k) = 1.

For an arbitrary n ∈ ℕ, n ≥ 2 the group ℤ∗n of the units in ℤn is in general not cyclic. The group ℤ∗n is cyclic if and only if n = 2, 4, $p^k$, $2p^k$, where p ≥ 3 is a prime number and k ∈ ℕ. A proof is given, for instance, in [10].

Examples 3.26.
(1) ℤ∗9 = ⟨2⟩ because φ(9) = 6 and 2 has order 6 in ℤ∗9.
(2) For ℤ∗8 we have |ℤ∗8| = 4, and each element a ∈ ℤ∗8, a ≠ 1, has order 2. Therefore ℤ∗8 is not cyclic. In fact, ℤ∗8 is isomorphic to the Klein four group which has exactly four elements 1, a, b and ab with $a^2 = b^2 = (ab)^2 = 1$.
(3) In ℤ∗15 each element a ≠ 1 has order 2 or 4, and |ℤ∗15| = 8. Therefore ℤ∗15 is not cyclic.
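Whether the unit group of ℤn is cyclic can be tested for small n by looking for an element whose order equals the group order. The Python sketch below is a brute-force illustration with helper names of our own (`unit_orders`, `is_cyclic`); it does not implement the classification n = 2, 4, p^k, 2p^k itself but can be compared against it.

```python
from math import gcd


def unit_orders(n):
    """Map each unit a of Z_n (n >= 2) to its multiplicative order."""
    orders = {}
    for a in range(1, n):
        if gcd(a, n) != 1:
            continue
        x, d = a, 1
        while x != 1:
            x = x * a % n
            d += 1
        orders[a] = d
    return orders


def is_cyclic(n):
    """The unit group is cyclic iff some unit has order equal to the group order."""
    orders = unit_orders(n)
    return max(orders.values()) == len(orders)


print(unit_orders(9))
print(unit_orders(8))
print([n for n in range(2, 20) if is_cyclic(n)])
```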

3.3 RSA cryptosystem

We mention that modular arithmetic plays a fundamental role in cryptography and encryption systems. For further general information on cryptography and cryptosystems we refer to [10], which contains a basic introduction to cryptography, or to [1] for a more complete treatment.

A public key cryptosystem is an encryption method in which it is easy to encrypt but difficult to decrypt without a secret key. Public key methods allow for encrypted communication over an open channel (see [10] or [1]). In this section we briefly discuss the RSA method, which is the best-known public key encryption method. It is named after the last names of its inventors Ronald Linn Rivest (born 1947), Adi Shamir (born 1952) and Leonard Adleman (born 1945).

In cryptology it is common to call the two parties who want to communicate privately with each other Alice and Bob. In RSA, Alice chooses two distinct primes p and q and computes the product n = pq; n must be chosen large enough. Next, Alice computes two numbers e, s ≥ 3 such that es ≡ 1 mod (p − 1)(q − 1). Typically, Alice chooses e, say e = 17 in case that 17 ∤ φ(n), and then she computes s using the extended Euclidean algorithm. A small number e, like e = 17, is not a problem as long as we take care that no message x is encrypted with $x^e \le n$. However, the number s should be large; otherwise, the private key (n, s) is insecure due to an attack of Michael J. Wiener [22]. If s has less than one-quarter as many bits as the modulus n, then, in typical cases, this attack may discover the secret s.

Assume that a plaintext message is given by an integer x with x ∈ {0, 1, 2, …, n − 1}. The public key is the pair (n, e), and encryption is done by

$$x \mapsto x^e \bmod n.$$

Alice decrypts by

$$y \mapsto y^s \bmod n.$$

Let $y = x^e \bmod n$. If es = 1 + (p − 1)k, then

$$y^s \equiv x^{es} \equiv x \cdot (x^{p-1})^k \equiv \begin{cases} 0 & \text{if } p \mid x \\ x \cdot 1^k & \text{otherwise} \end{cases} \equiv x \bmod p$$

by Fermat’s Little Theorem. Analogously, one can show that $y^s \equiv x \bmod q$. In other words, both p and q divide $y^s - x$. Since p and q are coprime, n = pq divides $y^s - x$ and, hence, we have $y^s \equiv x \bmod n$. If x ∈ {0, 1, 2, …, n − 1}, then we obtain $x = y^s \bmod n$. Thus, every encrypted message is decrypted correctly.
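The key generation, encryption and decryption described above can be tried out with small numbers. The following Python sketch is a toy illustration only: the primes 61 and 53 and the function names are our own assumptions, the modulus is far too small to be secure, and real implementations add padding and many other safeguards. The call `pow(e, -1, m)` for the modular inverse requires Python 3.8 or later.

```python
def make_keys(p, q, e):
    """Return the public key (n, e) and the private key (n, s)
    with e*s ≡ 1 mod (p - 1)(q - 1), for distinct primes p and q."""
    n = p * q
    m = (p - 1) * (q - 1)
    # Modular inverse of e; raises ValueError if gcd(e, m) != 1.
    s = pow(e, -1, m)
    return (n, e), (n, s)


def encrypt(x, public_key):
    n, e = public_key
    return pow(x, e, n)


def decrypt(y, private_key):
    n, s = private_key
    return pow(y, s, n)


public_key, private_key = make_keys(61, 53, 17)
message = 65
cipher = encrypt(message, public_key)
print(private_key, cipher, decrypt(cipher, private_key))
```

Decryption recovers every x in {0, 1, …, n − 1}, including those divisible by p or q, as the argument with Fermat’s Little Theorem shows.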

3.4 The Chinese Remainder Theorem

The original version of the Chinese Remainder Theorem is from the book “Sun Zi’s handbook of the arithmetic” by the Chinese mathematician Sun Zi (ca. 250). He gave as an exercise the problem to find a number which has after division by 3, 5 and 7 the remainders 2, 3 and 2, respectively. The Chinese Remainder Theorem provides a straightforward and beautiful solution in general for such simultaneous congruences. The Chinese Remainder Theorem further has many applications in number theory that we will discuss after giving the main result.

Theorem 3.27 (Chinese Remainder Theorem). Suppose that n1, n2, …, nr, r ∈ ℕ, r ≥ 2, are r natural numbers that are relatively prime in pairs, that is, gcd(ni, nj) = 1 if i ≠ j. If a1, a2, …, ar are any integers then the system

x ≡ ai mod ni, i = 1, 2, …, r

of linear congruences has a common solution which is unique modulo n1 n2 ⋯ nr.

Proof. Let n = n1 n2 ⋯ nr and

$$N_k = \frac{n}{n_k} = n_1 n_2 \cdots n_{k-1} n_{k+1} \cdots n_r \quad\text{for } k = 1, 2, \dots, r.$$

Since gcd(ni, nj) = 1 for i ≠ j, for each k = 1, 2, …, r there are no common prime divisors of nk and Nk, that is, gcd(nk, Nk) = 1 for all k = 1, 2, …, r. Hence, for each k there exists an integer xk with

Nk xk ≡ 1 mod nk.

We show that the number x = a1 N1 x1 + a2 N2 x2 + ⋯ + ar Nr xr is a solution of the given system. We have first

x ≡ a1 N1 x1 + a2 N2 x2 + ⋯ + ar Nr xr ≡ ak Nk xk mod nk

and second

x ≡ ak mod nk for k = 1, 2, …, r

because

Nk xk ≡ 1 mod nk and Nk xk ≡ 0 mod ni for i ≠ k.

Hence, x is a solution of the system. We now show the uniqueness of x modulo n. Let x′ be a second solution which satisfies all congruences: x ≡ ak ≡ x′

mod nk

for k = 1, 2, … , r.

Then $n_k \mid (x - x')$ for all $k = 1, 2, \ldots, r$. Since $\gcd(n_i, n_j) = 1$ for $i \ne j$ it follows from Euclid’s Lemma (see Chapter 2) that $n_1 n_2 \cdots n_r \mid (x - x')$. Then $x \equiv x' \mod n_1 n_2 \cdots n_r$, which gives the uniqueness modulo $n_1 n_2 \cdots n_r$.

Before we present some examples we give some number theoretical consequences of the Chinese Remainder Theorem.

Corollary 3.28. Suppose that $n_1, n_2, \ldots, n_r$, $r \in \mathbb{N}$, $r \ge 2$, are $r$ natural numbers that are relatively prime in pairs. Then the map

$$f \colon \mathbb{Z}_{n_1 n_2 \cdots n_r} \to \mathbb{Z}_{n_1} \times \mathbb{Z}_{n_2} \times \cdots \times \mathbb{Z}_{n_r}, \quad a \mapsto (a_1, a_2, \ldots, a_r)$$

is bijective, where $a = a_i$ in $\mathbb{Z}_{n_i}$, hence, $a \equiv a_i \mod n_i$ for $i = 1, 2, \ldots, r$. If we restrict ourselves to units, then the restriction of $f$ is a bijection between $\mathbb{Z}^*_{n_1 n_2 \cdots n_r}$ and $\mathbb{Z}^*_{n_1} \times \mathbb{Z}^*_{n_2} \times \cdots \times \mathbb{Z}^*_{n_r}$.

Proof. The first part follows directly from the Chinese Remainder Theorem (Theorem 3.27). For the second part we only need that $\gcd(a, n_1 n_2 \cdots n_r) = 1$ if and only if $\gcd(a, n_i) = 1$ for all $i = 1, 2, \ldots, r$ and $a \in \mathbb{Z}$. This is straightforward. If there are no common divisors between $a$ and the product $n_1 n_2 \cdots n_r$ then there are no common divisors between $a$ and any of the factors, and vice versa.

The Chinese Remainder Theorem provides a method to determine a formula for evaluating the Euler $\varphi$-function. We need the following lemma.

Lemma 3.29. Let $m, n \in \mathbb{N}$ with $\gcd(m, n) = 1$. Then $\varphi(mn) = \varphi(m)\varphi(n)$ for Euler’s $\varphi$-function.

Proof. By Corollary 3.28 we have that $|\mathbb{Z}^*_{mn}| = |\mathbb{Z}^*_m| \cdot |\mathbb{Z}^*_n|$, which gives by definition $\varphi(mn) = \varphi(m) \cdot \varphi(n)$.

We now get the general formula.

Theorem 3.30. Let $n = p_1^{\alpha_1} p_2^{\alpha_2} \cdots p_k^{\alpha_k}$ be the prime decomposition of $n$ with $k \ge 1$, $\alpha_i \in \mathbb{N}$ for $i = 1, 2, \ldots, k$ and $p_1, p_2, \ldots, p_k$ pairwise different prime numbers. Then

$$\varphi(n) = (p_1^{\alpha_1} - p_1^{\alpha_1 - 1})(p_2^{\alpha_2} - p_2^{\alpha_2 - 1}) \cdots (p_k^{\alpha_k} - p_k^{\alpha_k - 1})$$

$$= p_1^{\alpha_1}\Bigl(1 - \frac{1}{p_1}\Bigr) p_2^{\alpha_2}\Bigl(1 - \frac{1}{p_2}\Bigr) \cdots p_k^{\alpha_k}\Bigl(1 - \frac{1}{p_k}\Bigr)$$

$$= n \prod_{i=1}^{k} \Bigl(1 - \frac{1}{p_i}\Bigr).$$

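Example 3.31. Let $n = 126 = 2 \cdot 3^2 \cdot 7$. Then $\varphi(126) = \varphi(2) \cdot \varphi(3^2) \cdot \varphi(7) = 1 \cdot (3^2 - 3) \cdot 6 = 36$. Hence $|\mathbb{Z}^*_{126}| = 36$.

Corollary 3.32. Let $n \in \mathbb{N}$. Then

$$\sum_{d \in \mathbb{N},\ d \mid n} \varphi(d) = n.$$

Proof. If $n \in \mathbb{N}$ then $n = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}$ for primes $p_1, p_2, \ldots, p_k$. The multiplicativity of the $\varphi$-function (Lemma 3.29) gives the reduction to the special case $n = p^\alpha$, $p$ a prime number and $\alpha \in \mathbb{N}$. Then

$$\sum_{d \in \mathbb{N},\ d \mid p^\alpha} \varphi(d) = \varphi(1) + \varphi(p) + \varphi(p^2) + \cdots + \varphi(p^\alpha)$$

$$= 1 + (p - 1) + (p^2 - p) + \cdots + (p^\alpha - p^{\alpha - 1}) = p^\alpha.$$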
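The product formula of Theorem 3.30 and the divisor sum of Corollary 3.32 are easy to check by machine. The sketch below uses trial division, which is an assumption made for brevity and only reasonable for small $n$; the function names are illustrative.

```python
def prime_factorization(n):
    """Return {prime: exponent} for n >= 1 using trial division."""
    factors = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:                          # remaining factor is a prime
        factors[n] = factors.get(n, 0) + 1
    return factors

def phi(n):
    """Euler's phi via the product of (p^a - p^(a-1)) over the prime powers of n."""
    result = 1
    for p, a in prime_factorization(n).items():
        result *= p**a - p**(a - 1)
    return result

def divisor_phi_sum(n):
    """Sum of phi(d) over all natural divisors d of n."""
    return sum(phi(d) for d in range(1, n + 1) if n % d == 0)
```

For instance, `phi(126)` reproduces the value 36 from Example 3.31, and `divisor_phi_sum(n)` returns `n` itself.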

We now return to the Chinese Remainder Theorem and present some examples.

Example 3.33. The problem of Sun Zi corresponds to the system

$$x \equiv 2 \mod 3, \quad x \equiv 3 \mod 5, \quad x \equiv 2 \mod 7.$$

Then $n = 3 \cdot 5 \cdot 7 = 105$ and

$$N_1 = \frac{n}{3} = 35, \quad N_2 = \frac{n}{5} = 21, \quad N_3 = \frac{n}{7} = 15.$$

With $x_1 = 2$ we have $35 x_1 \equiv 1 \mod 3$, with $x_2 = 1$ we have $21 x_2 \equiv 1 \mod 5$ and with $x_3 = 1$ we have $15 x_3 \equiv 1 \mod 7$. The solution modulo 105 is

$$x = 2 \cdot 35 \cdot 2 + 3 \cdot 21 \cdot 1 + 2 \cdot 15 \cdot 1 = 233.$$

Now

$$233 \equiv 23 \mod 105.$$

Therefore 23 is the smallest possible positive solution.

Example 3.34. A gang of 17 pirates captured a sack of gold coins. When they try to divide the loot equally, 3 coins remain. In the dispute over who should get the three coins, one pirate is killed. When they try again to divide the loot equally, 10 coins remain. Again a quarrel flares up and another pirate is killed. Now, finally, the surviving pirates can divide the loot equally. What is the smallest possible number of coins the pirates captured? Expressed as a system of linear congruences the exercise is as follows: Calculate the smallest natural number $x$ with the property that

$$x \equiv 3 \mod 17, \quad x \equiv 10 \mod 16, \quad x \equiv 0 \mod 15.$$

We have $n = 17 \cdot 16 \cdot 15 = 4\,080$ and

$$N_1 = \frac{4\,080}{17} = 240, \quad N_2 = \frac{4\,080}{16} = 255, \quad N_3 = \frac{4\,080}{15} = 272.$$

With $x_1 = 9$, $x_2 = 15$ and $x_3 = 8$ we have $240 x_1 \equiv 1 \mod 17$, $255 x_2 \equiv 1 \mod 16$ and also $272 x_3 \equiv 1 \mod 15$. From this we get

$$x = 3 \cdot 240 \cdot 9 + 10 \cdot 255 \cdot 15 + 0 \cdot 272 \cdot 8 = 44\,730.$$

Modulo $n = 4\,080$ we get the desired number 3 930.

We may interpret the Chinese Remainder Theorem also in the following manner.

Theorem 3.35. Let $n = n_1 n_2 \cdots n_r$, $r \in \mathbb{N}$, $r \ge 2$, with pairwise relatively prime $n_1, n_2, \ldots, n_r$, that is, $\gcd(n_i, n_j) = 1$ for $i \ne j$.


Then the congruence $a \equiv b \mod n$ is equivalent to the system

$$a \equiv b \mod n_1, \quad a \equiv b \mod n_2, \quad \ldots, \quad a \equiv b \mod n_r.$$

Proof. It is clear that the system follows from $a \equiv b \mod n$, because from $a - b = tn$ for some $t \in \mathbb{Z}$ we get

$$a - b = (t \cdot n_1 \cdot n_2 \cdots n_{k-1} \cdot n_{k+1} \cdots n_r) \cdot n_k = t' n_k.$$

If, on the other hand, the system holds, then $n_k \mid (a - b)$ for all $k$. From Euclid’s Lemma (see Theorem 2.32) we then get $n \mid (a - b)$ because $\gcd(n_i, n_j) = 1$ for $i \ne j$.

Example 3.36. We want to solve the linear congruence $17x \equiv 9 \mod 276$. Because $276 = 3 \cdot 4 \cdot 23$ this is equivalent to the solution of the system

$$17x \equiv 9 \mod 3, \quad 17x \equiv 9 \mod 4, \quad 17x \equiv 9 \mod 23. \tag{3.1}$$

We bring this system into the form of the Chinese Remainder Theorem. Since

$$1 = 17 \cdot 2 - 3 \cdot 11 = 17 \cdot 1 - 4 \cdot 4 = 17 \cdot (-4) + 23 \cdot 3$$

we get the equivalent system

$$x \equiv 18 \equiv 0 \mod 3, \quad x \equiv 9 \equiv 1 \mod 4, \quad x \equiv -36 \equiv 10 \mod 23.$$

Since $n = 276 = 3 \cdot 4 \cdot 23$ we get

$$N_1 = \frac{276}{3} = 92, \quad N_2 = \frac{276}{4} = 69 \quad \text{and} \quad N_3 = \frac{276}{23} = 12.$$

With $x_1 = 2$, $x_2 = 1$ and $x_3 = 2$ we get $92 x_1 \equiv 1 \mod 3$, $69 x_2 \equiv 1 \mod 4$ and also the congruence $12 x_3 \equiv 1 \mod 23$, respectively. Then

$$x = 0 \cdot 92 \cdot 2 + 1 \cdot 69 \cdot 1 + 10 \cdot 12 \cdot 2 = 309 \quad \text{and} \quad x \equiv 33 \mod 276.$$
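The construction from the proof of Theorem 3.27, which Examples 3.33, 3.34 and 3.36 carry out by hand, can be sketched as a short function. The name `crt` and its argument layout are illustrative assumptions; the inverses $x_k$ are obtained with Python's built-in `pow` (Python 3.8+), which may return a different representative than the one chosen in the examples without changing the result modulo $n$.

```python
from math import gcd, prod

def crt(residues, moduli):
    """Solve x = a_k mod n_k for pairwise coprime moduli.

    Returns (x, n) with 0 <= x < n = n_1 * ... * n_r.
    """
    for i in range(len(moduli)):
        for j in range(i + 1, len(moduli)):
            if gcd(moduli[i], moduli[j]) != 1:
                raise ValueError("moduli must be pairwise coprime")
    n = prod(moduli)
    x = 0
    for a_k, n_k in zip(residues, moduli):
        N_k = n // n_k                 # product of all other moduli
        x_k = pow(N_k, -1, n_k)        # N_k * x_k = 1 mod n_k
        x += a_k * N_k * x_k
    return x % n, n

print(crt([2, 3, 2], [3, 5, 7]))       # Sun Zi's problem: (23, 105)
```

The same call with the data of Example 3.34 or with the reduced system of Example 3.36 yields 3 930 modulo 4 080 and 33 modulo 276, respectively.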

Often we have a system of linear congruences to which we cannot apply the Chinese Remainder Theorem directly because some of the moduli $n_1, n_2, \ldots, n_r$ are not relatively prime. Then we may try to rewrite the system in such a way that the solutions are the same and the Chinese Remainder Theorem applies.

Example 3.37 (Brahmagupta (ca. 700)). If one takes eggs out of a basket 2, 3, 4, 5, 6 or 7 at a time, then 1, 2, 3, 4, 5 or 0 eggs, respectively, are left in the basket. Determine the smallest number of eggs which could be in the basket. Expressed in congruences, the task is as follows. Determine the smallest positive integer $x$ with the property:

$$x \equiv 1 \mod 2, \quad x \equiv 2 \mod 3, \quad x \equiv 3 \mod 4, \quad x \equiv 4 \mod 5, \quad x \equiv 5 \mod 6, \quad x \equiv 0 \mod 7.$$

We need not consider the first two congruences because automatically

$$x \equiv 3 \mod 4 \Rightarrow x \equiv 1 \mod 2 \quad \text{and} \quad x \equiv 5 \mod 6 \Rightarrow x \equiv 2 \mod 3.$$

Since $\gcd(4, 6) = 2$ we have first to consider the two congruences

$$x \equiv 3 \mod 4 \quad \text{and} \quad x \equiv 5 \mod 6$$

separately. We have $\operatorname{lcm}(4, 6) = 12$. Hence, we multiply the first congruence by 3 and the second by 2 and get

$$3x \equiv 9 \mod 12 \quad \text{and} \quad 2x \equiv 10 \mod 12.$$

Addition leads to

$$5x \equiv 19 \equiv 7 \mod 12.$$

Since $25 \equiv 1 \mod 12$ we get $x \equiv 5 \cdot 7 \equiv 11 \mod 12$. We get the same congruence directly by subtraction:

$$x \equiv -1 \equiv 11 \mod 12.$$

From $x \equiv 11 \mod 12$ we automatically get $x \equiv 3 \mod 4$ and $x \equiv 5 \mod 6$. So, we are left with the three congruences

$$x \equiv 11 \mod 12, \quad x \equiv 4 \mod 5, \quad x \equiv 0 \mod 7.$$

Now we may apply the Chinese Remainder Theorem directly. With $N_1 = 35$, $N_2 = 84$, $N_3 = 60$ we get $x_1 = 11$, $x_2 = 4$, $x_3 = 2$, respectively, and finally $x = 5\,579$. Modulo $420 = 12 \cdot 5 \cdot 7$ the desired number is 119.

We may apply the Chinese Remainder Theorem also in steps for two natural numbers $a, b$ each time, starting, for instance, with $a = n_1 \cdot n_2 \cdots n_{r-1}$ and $b = n_r$. If $\gcd(n_i, n_j) = 1$ for $i \ne j$ then also $\gcd(a, b) = 1$. After an idea of G. Müller (see [18]) we may prove the Chinese Remainder Theorem for two natural numbers $a, b$ also with the principle of reciprocal measuring.

Theorem 3.38. Let $a, b \in \mathbb{N}$ with $\gcd(a, b) = 1$. If we measure the multiples of $a$ from $1 \cdot a$ to $ba$ by $b$, that is,

$$ja = q_j b + r_j, \quad 0 \le r_j \le b - 1, \quad j = 1, 2, \ldots, b,$$

then each remainder $r_j$ occurs exactly once.

Proof. Let $r_i = r_j$ for $i < j$. Then $ia - q_i b = r_i = r_j = ja - q_j b$, and hence, $(q_j - q_i) b = (j - i) a$. We have generated a common multiple $(j - i)a$ of $a$ and $b$ which is smaller than $\operatorname{lcm}(a, b) = ab$ because $1 \le j - i < b$. This gives a contradiction. Hence, $r_i \ne r_j$ for $i \ne j$.

We order the numbers $1, 2, \ldots, ab$ on the one side in an $a$-field with $a$ columns and on the other side in a $b$-field with $b$ columns.

Example 3.39. $a = 5$ and $b = 7$.

The $a$-field (5 columns):

     1   2   3   4   5
     6   7   8   9  10
    11  12  13  14  15
    16  17  18  19  20
    21  22  23  24  25
    26  27  28  29  30
    31  32  33  34  35

The $b$-field (7 columns):

     1   2   3   4   5   6   7
     8   9  10  11  12  13  14
    15  16  17  18  19  20  21
    22  23  24  25  26  27  28
    29  30  31  32  33  34  35

In the columns of the $a$-field there are those numbers which have the same remainder after division by $a$. In the columns of the $b$-field there are those numbers which have the same remainder after division by $b$. If we take two different elements from one column of the $a$-field and form the difference, then the common remainder disappears after subtraction, and we may apply Theorem 3.38 together with its proof. We get that the two elements we started with are located in two different columns of the $b$-field, and hence, give different remainders modulo $b$. We take in our example with $a = 5$, $b = 7$ the numbers 8 and 13. Modulo 5 both have the remainder 3, but 8 has the remainder 1 and 13 the remainder 6 modulo 7. Following this example we get an alternative version of the Chinese Remainder Theorem.

Theorem 3.40. Let $a, b \in \mathbb{N}$ with $\gcd(a, b) = 1$. If the $b$ numbers from 1 to $ab = \operatorname{lcm}(a, b)$ which have the same remainder on division by $a$ are measured by $b$, then each remainder $r_i$, $i = 1, 2, \ldots, b$, $0 \le r_i \le b - 1$, occurs exactly once.

We can conclude that there is a bijective relation between the $ab$ numbers $1, 2, \ldots, ab$ and the $ab$ pairs of remainders modulo $a$ and remainders modulo $b$. In particular, there is a bijection between $\mathbb{Z}_{ab}$ and $\mathbb{Z}_a \times \mathbb{Z}_b$.
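The stepwise use of the Chinese Remainder Theorem for two moduli at a time, including the reduction for moduli that are not relatively prime as in Example 3.37, can be sketched as follows. This is one possible approach under stated assumptions, not the method of the text: instead of multiplying and adding the congruences, two congruences are merged by solving $a_1 + n_1 t \equiv a_2 \mod n_2$ for $t$. The names `merge` and `solve_system` are illustrative.

```python
from math import gcd

def merge(a1, n1, a2, n2):
    """Combine x = a1 mod n1 and x = a2 mod n2 into x = a mod lcm(n1, n2).

    Returns (a, lcm) or None if the two congruences contradict each other.
    """
    g = gcd(n1, n2)
    if (a2 - a1) % g != 0:
        return None                    # no common solution
    lcm = n1 // g * n2
    m = n2 // g
    # Solve (n1/g) * t = (a2 - a1)/g mod (n2/g); n1/g and n2/g are coprime.
    inv = pow(n1 // g, -1, m) if m > 1 else 0
    t = ((a2 - a1) // g * inv) % m
    return (a1 + n1 * t) % lcm, lcm

def solve_system(congruences):
    """Fold a list of (a_k, n_k) pairs into a single congruence."""
    a, n = 0, 1
    for a_k, n_k in congruences:
        merged = merge(a, n, a_k % n_k, n_k)
        if merged is None:
            return None
        a, n = merged
    return a, n

# Brahmagupta's egg problem from Example 3.37
print(solve_system([(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (0, 7)]))  # (119, 420)
```

Merging $x \equiv 3 \mod 4$ with $x \equiv 5 \mod 6$ alone gives $x \equiv 11 \mod 12$, in agreement with the intermediate step of the example.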

3.5 Quadratic residues

Quadratic equations also arise in modular arithmetic. A square in a modular ring is called a quadratic residue.

Definition 3.41. Let $a \in \mathbb{Z}$, $n \in \mathbb{N}$ with $\gcd(a, n) = 1$. If the equation $x^2 = a$ has a solution in $\mathbb{Z}_n$, then $a$ is called a quadratic residue modulo $n$. If $x^2 = a$ has no solution in $\mathbb{Z}_n$ then $a$ is a quadratic nonresidue modulo $n$.

In this section we are interested for the most part in quadratic equations over $\mathbb{Z}_p$ for a prime number $p$. Hence, in the following let always $n = p$ be a prime number. If $a \in \mathbb{Z}$ with $p \nmid a$, then $a$ is either a quadratic residue or a quadratic nonresidue modulo $p$. If $a \in \mathbb{Z}$ with $p \mid a$, then $a$ is by definition neither a quadratic residue nor a nonresidue.

Theorem 3.42. Let $p \ge 3$ be a prime number and $a \in \mathbb{Z}$ a quadratic residue modulo $p$, that is, $\gcd(a, p) = 1$ and there exists an $x_0 \in \mathbb{Z}$ with $x_0^2 = a$ in $\mathbb{Z}_p$. Then $x_0$ and $-x_0$ are the two solutions of the equation $x^2 - a = 0$ in $\mathbb{Z}_p$, and $x_0$, $-x_0$ are different.

Proof. We have $x^2 - a = (x - x_0)(x + x_0)$. Since $\mathbb{Z}_p$ is a field, $x^2 - a = 0$ in $\mathbb{Z}_p$ if and only if $x = x_0$ or $x = -x_0$, and $x_0$, $-x_0$ are different because $p \ge 3$.

Notation. Let $p \ge 3$ be a prime number and $a \in \mathbb{Z}$ a quadratic residue modulo $p$, that is, $p \nmid a$ and $a = x_0^2$ for some $x_0 \in \mathbb{Z}_p$. We also write for short $x_0 = \sqrt{a}$, without causing misunderstandings. We also use the notation $b = \frac{1}{a} = a^{-1}$ if $a \cdot b = 1$ in $\mathbb{Z}_p$.

Theorem 3.43. Let $a, b, c, p \in \mathbb{Z}$, $p \ge 3$ a prime number and $\gcd(a, p) = 1$.
(1) If $b^2 - 4ac$ is a quadratic residue modulo $p$, then the quadratic equation $ax^2 + bx + c = 0$ over $\mathbb{Z}_p$ has exactly the two solutions

$$x_1 = \frac{1}{2a}\bigl(-b + \sqrt{b^2 - 4ac}\bigr) \quad \text{and} \quad x_2 = \frac{1}{2a}\bigl(-b - \sqrt{b^2 - 4ac}\bigr)$$

in $\mathbb{Z}_p$.
(2) If $b^2 - 4ac$ is a quadratic nonresidue modulo $p$, then the quadratic equation $ax^2 + bx + c = 0$ has no solution in $\mathbb{Z}_p$.
(3) If $p \mid (b^2 - 4ac)$, then the quadratic equation $ax^2 + bx + c = 0$ has in $\mathbb{Z}_p$ exactly the one solution

$$x_0 = -\frac{b}{2a}.$$

Proof. Since $p \ge 3$ and $\gcd(a, p) = 1$ there exist $\frac{1}{2}$ and $\frac{1}{a}$ in $\mathbb{Z}_p$. The result follows from the procedure of completing the square in $\mathbb{Z}_p$, which is analogous to that in $\mathbb{Q}$.

Now, two questions arise:
(1) How many quadratic residues modulo $p$ exist?
(2) How can we decide if $a \in \mathbb{Z}$ with $p \nmid a$ is a quadratic residue modulo $p$?

The answers are given by the following theorem. We remark that the case $p = 2$ is not of interest in this connection because $\mathbb{Z}_2 = \{0, 1\}$ and $1 = -1$.

Theorem 3.44. Let $p \ge 3$ be a prime number. Then the following hold:
(a) In the set $\{1, 2, \ldots, p - 1\}$ there are exactly $\frac{p-1}{2}$ quadratic residues modulo $p$ and exactly $\frac{p-1}{2}$ quadratic nonresidues modulo $p$.
(b) An integer $a \in \mathbb{Z}$, $p \nmid a$, is a quadratic residue (quadratic nonresidue) modulo $p$ if and only if $a^{\frac{p-1}{2}} \equiv 1 \mod p$ ($a^{\frac{p-1}{2}} \equiv -1 \mod p$).
(c) If $a, b \in \mathbb{Z}$, $p \nmid a$, $p \nmid b$, then $ab$ is a quadratic residue modulo $p$ if and only if either both $a$ and $b$ are quadratic residues modulo $p$ or both $a$ and $b$ are quadratic nonresidues modulo $p$.

Proof. (a) follows immediately from $x^2 = (-x)^2$, because then $1^2, 2^2, \ldots, \bigl(\frac{p-1}{2}\bigr)^2$ are exactly the squares in $\mathbb{Z}^*_p$, and the remaining $\frac{p-1}{2}$ elements in $\mathbb{Z}^*_p$ are not squares in $\mathbb{Z}^*_p$.

(b) By Theorem 3.10 (Fermat’s Theorem) we have $\bigl(a^{\frac{p-1}{2}}\bigr)^2 \equiv 1$ in $\mathbb{Z}^*_p$, hence $a^{\frac{p-1}{2}} \in \{1, -1\}$ for $a \ne 0$. Let $a_1, a_2, \ldots, a_s$ be the pairwise different residue classes in $\{1, 2, \ldots, p - 1\}$ with $a_i^{\frac{p-1}{2}} = 1$ (such $a_i$ exist because at least $1 = 1^{\frac{p-1}{2}}$). There exists a $b \in \mathbb{Z}^*_p$ with $b^{\frac{p-1}{2}} = -1$. This we can see as follows. $\mathbb{Z}^*_p = \langle g \rangle$ is cyclic of order $p - 1$, and then necessarily $g^{\frac{p-1}{2}} = -1$. Now the $b a_i$, $i = 1, 2, \ldots, s$, are pairwise different and satisfy $(b a_i)^{\frac{p-1}{2}} = -1$. Hence, necessarily $s = \frac{p-1}{2}$, and the $a_1, a_2, \ldots, a_s$ are exactly the solutions of the equation $x^{\frac{p-1}{2}} - 1 = 0$, and the elements $b a_1, b a_2, \ldots, b a_s$ are exactly the solutions of the equation $x^{\frac{p-1}{2}} + 1 = 0$. Now, let $c \in \{1, 2, \ldots, p - 1\}$ be a quadratic residue modulo $p$, that is, $c = d^2$ in $\mathbb{Z}^*_p$. Then

$$c^{\frac{p-1}{2}} = (d^2)^{\frac{p-1}{2}} = d^{p-1} = 1.$$

Hence, $c = a_i$ for one $i$. This, together with (a), already shows (b), but we want to give a direct argument for the other direction.

If $a \in \mathbb{Z}^*_p$ with $a^{\frac{p-1}{2}} = 1$, then $a = h^2$ for some $h \in \mathbb{Z}_p$. We use again that $\mathbb{Z}^*_p = \langle g \rangle$ is cyclic. Then $a = g^t$ for some $t$ with $1 \le t \le p - 1$. From $a^{\frac{p-1}{2}} = 1$ we get $g^{\frac{t(p-1)}{2}} = 1$. Since $g$ has order $p - 1$, we get $\frac{t(p-1)}{2} \equiv 0 \mod (p - 1)$, and hence $t = 2k$ for some $k \in \mathbb{Z}$. It follows that $a = g^{2k} = (g^k)^2$. This proves (b).

(c) follows directly from (b). If $(ab)^{\frac{p-1}{2}} = 1$ then either

$$a^{\frac{p-1}{2}} = b^{\frac{p-1}{2}} = 1 \quad \text{or} \quad a^{\frac{p-1}{2}} = b^{\frac{p-1}{2}} = -1 \quad \text{in } \mathbb{Z}^*_p.$$
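Parts (a) and (b) of Theorem 3.44 can be compared directly on small primes. The following sketch assumes that $p$ is an odd prime supplied by the caller (this is not checked), and the function names are illustrative.

```python
def is_quadratic_residue(a, p):
    """Euler's criterion from Theorem 3.44 (b): a^((p-1)/2) = 1 mod p.

    Assumes p is an odd prime and p does not divide a.
    """
    if a % p == 0:
        raise ValueError("a must not be divisible by p")
    return pow(a, (p - 1) // 2, p) == 1

def squares_mod(p):
    """All nonzero squares in Z_p, found by squaring every unit."""
    return sorted({x * x % p for x in range(1, p)})

p = 7
by_criterion = [a for a in range(1, p) if is_quadratic_residue(a, p)]
print(by_criterion, squares_mod(p))    # both lists are [1, 2, 4]
```

Both lists have $\frac{p-1}{2}$ entries, as part (a) predicts.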

Corollary 3.45. Let $p \ge 3$ be a prime number. Then $-1$ is a quadratic residue modulo $p$ if and only if $p \equiv 1 \mod 4$.

Proof. If $p = 4k + 3$ then $(-1)^{\frac{p-1}{2}} = (-1)^{2k+1} = -1$ and if $p = 4k + 1$ then $(-1)^{\frac{p-1}{2}} = (-1)^{2k} = 1$.

Notation. If $a$ is a quadratic residue modulo $n$ and $x \in \mathbb{N}$ with $x^2 \equiv a \mod n$, then $x$ is often called a square root of $a$ modulo $n$.

Remark 3.46. Let $p \ge 3$ be a prime number and $a \in \mathbb{Z}$ with $p \nmid a$ a quadratic residue modulo $p$. We are left with the question of how to calculate a square root $x$ of $a$ modulo $p$. This is easy for $p \equiv 3 \mod 4$. If $p \equiv 3 \mod 4$ and $a = b^2$ in $\mathbb{Z}^*_p$ then

$$a^{\frac{p+1}{4}} = b^{\frac{p+1}{2}} = b^{\frac{p-1}{2}} \cdot b = \pm b.$$

Hence $a_p := a^{\frac{p+1}{4}}$ is a square root of $a$ modulo $p$ if $p \equiv 3 \mod 4$. The calculation of a square root for $p \equiv 1 \mod 4$ is much more complicated. For small $p$ with $p \equiv 1 \mod 4$ we may certainly find a square root by trial. In general there exists a reasonable probabilistic algorithm for the calculation of a square root modulo $p$ (see, for instance, [1]).
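For $p \equiv 3 \mod 4$ the square root formula of Remark 3.46 combines with Theorem 3.43 into a small solver for quadratic equations over $\mathbb{Z}_p$. This is a sketch under the assumption that $p$ is a prime with $p \equiv 3 \mod 4$ and $p \nmid a$; the function names are illustrative.

```python
def sqrt_mod(a, p):
    """Square root of a quadratic residue a modulo a prime p with p = 3 mod 4."""
    if p % 4 != 3:
        raise ValueError("this formula needs p = 3 mod 4")
    r = pow(a, (p + 1) // 4, p)        # a^((p+1)/4) mod p
    if r * r % p != a % p:
        raise ValueError("a is not a quadratic residue modulo p")
    return r

def solve_quadratic(a, b, c, p):
    """Sorted list of all solutions of a*x^2 + b*x + c = 0 in Z_p.

    Assumes p is a prime with p = 3 mod 4 and p does not divide a.
    """
    disc = (b * b - 4 * a * c) % p
    inv_2a = pow(2 * a, -1, p)         # 1/(2a) in Z_p (Python 3.8+)
    if disc == 0:
        return [(-b * inv_2a) % p]     # case (3): one solution
    if pow(disc, (p - 1) // 2, p) != 1:
        return []                      # case (2): nonresidue, no solution
    r = sqrt_mod(disc, p)              # case (1): two solutions
    return sorted({((-b + r) * inv_2a) % p, ((-b - r) * inv_2a) % p})
```

For example, `sqrt_mod(2, 7)` returns 4, and indeed $4^2 = 16 \equiv 2 \mod 7$.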

Exercises

1. Prove the following statements:
   (a) Let $a, b, c, d, m \in \mathbb{Z}$ and $m \ge 2$. Then from $a \equiv b \mod m$ and $c \equiv d \mod m$ it follows that $a + c \equiv b + d \mod m$ and $a \cdot c \equiv b \cdot d \mod m$.
   (b) Let $a, b \in \mathbb{Z}$ and $p$ be a prime number. Then $(a + b)^p \equiv a^p + b^p \mod p$.
   (c) Let $a, b, c, d, m \in \mathbb{Z}$ and $\gcd(c, m) = 1$. Then $a \cdot c \equiv b \cdot c \mod m \Rightarrow a \equiv b \mod m$.
2. (a) Determine the last 2 decimal digits of $9^{9^7}$. (Hint: Check what this number is modulo 100.)
   (b) Determine the last 3 decimal digits of $7^{123\,456}$.
   (c) Prove by induction that for all $n \in \mathbb{N}$ the equation $\varphi(n^2) = n\varphi(n)$ holds.
   (Hint to (a) and (b): This can be done without a calculator; use the Theorem of Euler.)
3. (a) Determine the multiplicative inverse element of 83 in $\mathbb{Z}_{105}$.
   (b) Determine the multiplicative inverse element of 181 in $\mathbb{Z}_{236}$.
   (c) Solve the linear congruence $4x + 6 = 2$ in $\mathbb{Z}_7$.
4. (a) Use the Euclidean algorithm to calculate $\gcd(105, 22)$ and give the multiplicative inverse element of 22 in $\mathbb{Z}_{105}$.
   (b) Use the Euclidean algorithm to calculate $\gcd(231, 195)$ and write it as a linear combination of 231 and 195.
5. (a) Verify the equation $\sum_{d \in \mathbb{N},\ d \mid n} \varphi(d) = n$ for $n = 42$ and $n = 91$.
   (b) Prove that for $n > 2$ the Euler $\varphi$-function $\varphi(n)$ is even.
6. Decide for each of the following three statements if it is true or false. If a statement is true prove it. If a statement is false give a counterexample.
   (a) Let $m, n \in \mathbb{N}$: If $\gcd(m, n) = 1$ it follows that $\gcd(\varphi(m), \varphi(n)) = 1$.
   (b) Let $n \in \mathbb{N} \setminus \{1, 2\}$: If $n$ is a composite number (that is, not a prime number), then $\gcd(n, \varphi(n)) > 1$.
   (c) Let $m, n \in \mathbb{N}$: If $n$ and $m$ have the same prime divisors then $n\varphi(m) = m\varphi(n)$.
7. Chinese Remainder Theorem:
   (a) Which is the smallest number $x \in \mathbb{N}$ with $x \equiv 1 \mod 3$, $x \equiv 2 \mod 5$, $x \equiv 3 \mod 11$ and $x \equiv 4 \mod 13$?
   (b) Which is the smallest number $x \in \mathbb{N}$ with $x \equiv 1 \mod 3$, $x \equiv 3 \mod 7$ and $x \equiv 5 \mod 11$?
   (c) Show with examples that the system of equations in the Chinese Remainder Theorem, $x \equiv a_i \mod n_i$ for $i = 1, 2, \ldots, k$, could have no solution or more than one solution in $\mathbb{Z}_n$ if the assumption $\gcd(n_i, n_j) = 1$, with $1 \le i < j \le k$, does not hold.
8. Chinese Remainder Theorem:
   (a) Show that from $x \equiv 8 \mod 10$ the congruence $x \equiv 3 \mod 5$ follows.
   (b) Which is the smallest number $x \in \mathbb{N}$ with $x \equiv 2 \mod 3$, $x \equiv 3 \mod 5$, $x \equiv 2 \mod 7$ and $x \equiv 8 \mod 10$? Prove your solution for the system of 4 congruences.
9. Chinese Remainder Theorem:
   (a) Which is the smallest number $x \in \mathbb{N}$ with $x \equiv 2 \mod 3$, $x \equiv 5 \mod 9$ and $x \equiv 3 \mod 5$?
   (b) Which is the smallest number $x \in \mathbb{N}$ with $x \equiv 2 \mod 4$, $x \equiv 6 \mod 8$, $x \equiv 1 \mod 3$ and $x \equiv 3 \mod 7$?
10. (a) Determine all quadratic residues in $\mathbb{Z}_7$. Explain your solution.
    (b) Check whether the following quadratic equations are solvable. Explain your decision. If a quadratic equation is solvable calculate the solution.
        (i) $5x^2 + 2x + 4 = 0$ in $\mathbb{Z}_{11}$.
        (ii) $7x^2 + 10x + 11 = 0$ in $\mathbb{Z}_{13}$.
        (iii) $5x^2 + 4x + 3 = 0$ in $\mathbb{Z}_7$.
11. (a) Calculate the multiplicative inverse element for 63 in $\mathbb{Z}_{103}$ with the help of the extended Euclidean algorithm.
    (b) Use the solution of part (a) to solve the linear equation $63x + 101 = 0$ in $\mathbb{Z}_{103}$.
    (c) Calculate $\gcd(189, 105)$ with the help of the Euclidean algorithm.
    (d) Use the value of $\gcd(189, 105)$ from part (c) to calculate $\operatorname{lcm}(189, 105)$.

4 Exceptional numbers

4.1 The Fibonacci numbers

The development of European mathematics during the Renaissance is closely tied to the name Fibonacci or Leonardo of Pisa (ca. 1170–1240). Fibonacci, the son of a diplomat in Algeria, learned the algebra of Al-Khwarizmi (ca. 780–850) and Abu Kamil (850–930), and especially about the theory of quadratic equations. Returning to Pisa, Fibonacci wrote in 1202 the book “Liber Abaci”. In this book, among other things, he introduced the Indian–Arabic numerals and numeration system in Europe. However, he became especially famous through an exercise in his book known as the rabbit problem. This problem introduced what are now known as the Fibonacci numbers. We describe this exercise. Let us assume the following:
(1) On January first there is a pair of rabbits (an adult couple).
(2) On February first they beget another pair of rabbits (baby couple), and so on at each first day of a month.
(3) Each new couple grows up for a month and gets a new couple the first day of the third month of its life, and then so on at each first day of a month.

Find the number of couples of rabbits on January first of the following year after the birth of the new couples on that day. We write “A” for adult couple and “B” for baby couple. We solve the problem in Table 4.1.

Table 4.1: Solution for rabbit exercise by Fibonacci.

    Date           Number of A’s   Number of B’s
    January 1        1               0
    February 1       1               1
    March 1          2               1
    April 1          3               2
    May 1            5               3
    June 1           8               5
    July 1          13               8
    August 1        21              13
    September 1     34              21
    October 1       55              34
    November 1      89              55
    December 1     144              89
    January 1      233             144

DOI 10.1515/9783110516142-004

It follows that each new number comes from the addition of the two previous numbers. The respective total number is then the sum of the A’s and the B’s. From this problem we define the sequence of Fibonacci numbers, our first collection of exceptional numbers.

Definition 4.1. The Fibonacci numbers are recursively defined by the sequence $(f_n)_{n \in \mathbb{N}}$ with

$$f_1 = f_2 = 1 \quad \text{and} \quad f_n = f_{n-1} + f_{n-2} \quad \text{for } n \ge 3.$$

We also define $f_0 = 0$.

In the first chapter we saw as an application of the second induction principle the following interpretation: Let $n \ge 2$. Then $f_n$ is the number of all 0–1-sequences of length $n - 2$ which do not have neighboring 1’s.

The Fibonacci numbers are closely related to the golden section. This is the number $\alpha$ that is defined by

$$\alpha = \frac{1}{2}(1 + \sqrt{5}) = 2\cos\Bigl(\frac{\pi}{5}\Bigr) \approx 1.618.$$

At the conclusion of this section we present a geometric interpretation of $\alpha$. Before we consider geometric aspects of the Fibonacci numbers and the golden section, we describe some number theoretical connections.

Theorem 4.2 (The Formula of J. P. M. Binet (1786–1856)). Let

$$\alpha = \frac{1 + \sqrt{5}}{2} \quad \text{and} \quad \beta = -\alpha^{-1} = \frac{1 - \sqrt{5}}{2}.$$

Then

$$f_n = \frac{\alpha^n - \beta^n}{\alpha - \beta} \quad \text{for } n \ge 1.$$

Proof. The numbers $\alpha$ and $\beta$ are the two solutions of the equation $x^2 - x - 1 = 0$, hence $\alpha^2 = \alpha + 1$ and $\beta^2 = \beta + 1$. It follows inductively that

$$\alpha^{n+2} = \alpha^{n+1} + \alpha^n, \quad \beta^{n+2} = \beta^{n+1} + \beta^n \quad \text{for } n \ge 1.$$

Further $\alpha - \beta = \sqrt{5} \ne 0$. We show the Binet formula by induction. If $n = 1$ then $1 = \frac{\alpha - \beta}{\alpha - \beta} = f_1$, and if $n = 2$ then $\frac{\alpha^2 - \beta^2}{\alpha - \beta} = \alpha + \beta = 1 = f_2$. With the second induction principle we get

$$f_{n+2} = f_{n+1} + f_n = \frac{\alpha^{n+1} - \beta^{n+1}}{\alpha - \beta} + \frac{\alpha^n - \beta^n}{\alpha - \beta} = \frac{\alpha^{n+1} + \alpha^n - (\beta^{n+1} + \beta^n)}{\alpha - \beta} = \frac{\alpha^{n+2} - \beta^{n+2}}{\alpha - \beta} \quad \text{for } n \ge 1.$$
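The recursion of Definition 4.1 and the closed form of Theorem 4.2 can be compared numerically. The sketch below evaluates Binet's formula in floating point and rounds to the nearest integer, which is an assumption that only holds for moderate $n$, since rounding errors eventually dominate; the function names are illustrative.

```python
from math import sqrt

def fib(n):
    """f_n from the recursion, with f_0 = 0 and f_1 = f_2 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def fib_binet(n):
    """f_n from Binet's formula, evaluated in floating point and rounded."""
    alpha = (1 + sqrt(5)) / 2
    beta = (1 - sqrt(5)) / 2
    return round((alpha**n - beta**n) / (alpha - beta))

# The two agree for small n; fib(13) is the 233 from the last row of Table 4.1.
assert all(fib(n) == fib_binet(n) for n in range(1, 50))
```

For exact values at large $n$ the integer recursion is the safer choice.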

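Corollary 4.3.

$$\alpha = \lim_{n \to \infty} \frac{f_{n+1}}{f_n} = 1 + \cfrac{1}{1 + \cfrac{1}{1 + \cdots}}.$$

Proof. We have

$$\frac{f_{n+1}}{f_n} = \frac{\alpha^{n+1} - \beta^{n+1}}{\alpha^n - \beta^n} = \frac{1 - \bigl(\frac{\beta}{\alpha}\bigr)^{n+1}}{\alpha^{-1}\bigl(1 - \bigl(\frac{\beta}{\alpha}\bigr)^n\bigr)}.$$

Now, since $\bigl|\frac{\beta}{\alpha}\bigr| < 1$ we get

$$\lim_{n \to \infty} \frac{f_{n+1}}{f_n} = \frac{1}{\alpha^{-1}} = \alpha.$$

The remaining statement follows inductively from

$$\frac{f_{n+1}}{f_n} = \frac{f_n + f_{n-1}}{f_n} = 1 + \frac{1}{\frac{f_n}{f_{n-1}}}.$$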
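The link between the continued fraction and the quotients of neighboring Fibonacci numbers can be made concrete with exact rational arithmetic. In this sketch the truncation depth and the function names are illustrative assumptions.

```python
from fractions import Fraction
from math import sqrt

def fib(n):
    """f_n with f_0 = 0 and f_1 = f_2 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def convergent(depth):
    """Value of 1 + 1/(1 + 1/(1 + ...)) cut off after `depth` nested fractions."""
    value = Fraction(1)
    for _ in range(depth):
        value = 1 + 1 / value
    return value

alpha = (1 + sqrt(5)) / 2
# Each truncation is a quotient of neighboring Fibonacci numbers.
assert all(convergent(k) == Fraction(fib(k + 2), fib(k + 1)) for k in range(30))
assert abs(float(convergent(40)) - alpha) < 1e-12
```

Each additional level of nesting moves the quotient from $f_{n}/f_{n-1}$ to $f_{n+1}/f_n$, which is exactly the recursion used in the last step of the proof.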

Remark 4.4. Corollary 4.3 is the reason why the Fibonacci numbers often occur in nature. Many plants arrange their leaves or seeds in spirals whose numbers are given by the Fibonacci numbers. This is the case when the angle between neighboring leaves or seeds with respect to the axis of the plant is the golden angle $\frac{2\pi}{\alpha} \approx 3.88$. The reason is that the rational numbers which approximate the golden section are fractions of adjacent Fibonacci numbers. Therefore the spirals are formed by plant elements whose site numbers differ by the Fibonacci numbers in the denominator and, hence, point in about the same direction. By this spiral ordering of the leaves and seeds the plants obtain the best possible light yield. More details of this connection can be found in the book [6].

We now describe some purely number theoretical properties of the Fibonacci numbers.

Theorem 4.5.
(a) $f_1 + f_2 + \cdots + f_n = f_{n+2} - 1$ for $n \ge 1$.
(b) $f_1^2 + f_2^2 + \cdots + f_n^2 = f_n f_{n+1}$ for $n \ge 1$.
(c) $f_{n+m} = f_{n-1} f_m + f_n f_{m+1}$ for $n, m \ge 1$.
(d) $\begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}^n = \begin{pmatrix} f_{n+1} & f_n \\ f_n & f_{n-1} \end{pmatrix}$, and hence especially, $f_{n+1} f_{n-1} - f_n^2 = (-1)^n$, for $n \ge 1$.

We leave these simple inductive proofs as an exercise for the reader.

Theorem 4.6.
(a) Let $r, s \in \mathbb{N}$ with $r \mid s$. Then $f_r \mid f_s$.
(b) Let $n, m \in \mathbb{N}$, $n \ge 3$, with $f_n \mid f_m$. Then $n \mid m$.
(c) Let $n, m \in \mathbb{N}$. Then $\gcd(f_n, f_m) = f_{\gcd(n,m)}$. Hence, if especially $\gcd(n, m) = 1$ then $\gcd(f_n, f_m) = 1$.

Proof. (a) Let $s = rt$ for some $t \in \mathbb{N}$. Then

$$S := \frac{f_s}{f_r} = \frac{\alpha^{rt} - \beta^{rt}}{\alpha^r - \beta^r} = \alpha^{(t-1)r} + \alpha^{(t-2)r}\beta^r + \cdots + \alpha^r \beta^{(t-2)r} + \beta^{(t-1)r}.$$

We show that in the above expression for $S$ we have $S \in \mathbb{Z}$. This can be seen from $\alpha\beta = -1$ and $\alpha + \beta = 1$ as follows. We replace from the left $\beta$ by $-\alpha^{-1}$ and from the right $\alpha$ by $-\beta^{-1}$. Since the expression for $S$ is symmetric in $\alpha$ and $\beta$, $S$ is a sum of terms of the form $\pm(\alpha^k + \beta^k)$ with $k \in \mathbb{N}_0$. Each term $\alpha^k + \beta^k$, $k \in \mathbb{N}_0$, is a natural number, which we see by induction: If $k = 0$ then $\alpha^0 + \beta^0 = 2 \in \mathbb{N}$, if $k = 1$ then $\alpha + \beta = 1 \in \mathbb{N}$, if $k = 2$ then $\alpha^2 + \beta^2 = \alpha + 1 + \beta + 1 = 3 \in \mathbb{N}$. Then, by the second induction principle,

$$\alpha^k + \beta^k = \alpha^{k-1} + \beta^{k-1} + \alpha^{k-2} + \beta^{k-2} \in \mathbb{N} \quad \text{for } k \ge 2.$$

Now

$$f_{rt} = \frac{\alpha^{rt} - \beta^{rt}}{\alpha - \beta} = \frac{\alpha^r - \beta^r}{\alpha - \beta} \cdot \frac{\alpha^{rt} - \beta^{rt}}{\alpha^r - \beta^r} = f_r \cdot S.$$

We have $S \ne 0$ because $f_{rt} \ne 0$, and since $f_{rt} > 0$ and $f_r > 0$, we have $S \in \mathbb{N}$. In any case $f_r \mid f_{rt}$, that is, $f_r \mid f_s$ for $r \mid s$.

We now show part (c). Let without loss of generality $m > n$. Using the Euclidean algorithm gives the following scheme:

$$m = n q_0 + r_1 \quad \text{with } 0 \le r_1 < n,$$
$$n = r_1 q_1 + r_2 \quad \text{with } 0 \le r_2 < r_1,$$
$$\vdots$$
$$r_{k-2} = r_{k-1} q_{k-1} + r_k \quad \text{with } 0 \le r_k < r_{k-1},$$
$$r_{k-1} = r_k q_k, \quad \text{and } r_k = \gcd(m, n).$$

By Theorem 4.5 (c) we get

$$\gcd(f_m, f_n) = \gcd(f_{nq_0 + r_1}, f_n) = \gcd(f_{nq_0 - 1} f_{r_1} + f_{nq_0} f_{r_1 + 1}, f_n) = \gcd(f_{nq_0 - 1} f_{r_1}, f_n) = \gcd(f_{r_1}, f_n)$$

because $f_n \mid f_{nq_0}$ by part (a) and $\gcd(f_{nq_0}, f_{nq_0 - 1}) = 1$ for two neighboring Fibonacci numbers. If we continue that way we get

$$\gcd(f_{r_1}, f_n) = \cdots = \gcd(f_{r_k}, f_{r_{k-1}}) = f_{r_k}$$

because $r_k \mid r_{k-1}$ and, hence, $f_{r_k} \mid f_{r_{k-1}}$ by part (a). Therefore $\gcd(f_m, f_n) = f_{\gcd(m,n)}$. If especially $\gcd(m, n) = 1$, then $\gcd(f_m, f_n) = 1$. This proves part (c).

We now prove (b). Let $n \ge 3$ and $f_n \mid f_m$. Then $f_n = \gcd(f_n, f_m) = f_{\gcd(m,n)}$ by part (c). Then necessarily $\gcd(m, n) = n$, that is, $n \mid m$, because $n \ge 3$ and $f_r < f_s$ for $2 \le r < s$ in general.

Corollary 4.7.

$$f_{2k} = f_k (f_{k+1} + f_{k-1}) = f_{k+1}^2 - f_{k-1}^2 \quad \text{for } k \ge 1.$$

Proof. From Theorem 4.5 (c) we have

$$f_{2k} = f_k (f_{k-1} + f_{k+1}) = (f_{k+1} - f_{k-1})(f_{k+1} + f_{k-1}) = f_{k+1}^2 - f_{k-1}^2.$$
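The identities of Theorem 4.5, the gcd property of Theorem 4.6 (c) and Corollary 4.7 can be spot-checked on a table of Fibonacci numbers. This is a numerical sanity check under the assumption of a fixed table size, not a proof, and the helper names are illustrative.

```python
from math import gcd

def fibs(limit):
    """List [f_0, f_1, ..., f_limit]."""
    f = [0, 1]
    while len(f) <= limit:
        f.append(f[-1] + f[-2])
    return f

F = fibs(80)

def check_theorem_4_5(n, m):
    """Check parts (a) to (d) of Theorem 4.5 for given n, m >= 1."""
    sum_ok = sum(F[1:n + 1]) == F[n + 2] - 1
    squares_ok = sum(x * x for x in F[1:n + 1]) == F[n] * F[n + 1]
    addition_ok = F[n + m] == F[n - 1] * F[m] + F[n] * F[m + 1]
    cassini_ok = F[n + 1] * F[n - 1] - F[n] ** 2 == (-1) ** n
    return sum_ok and squares_ok and addition_ok and cassini_ok

def check_gcd(n, m):
    """Check gcd(f_n, f_m) = f_gcd(n, m)."""
    return gcd(F[n], F[m]) == F[gcd(n, m)]

assert all(check_theorem_4_5(n, m) and check_gcd(n, m)
           for n in range(1, 31) for m in range(1, 31))
```

For instance, $\gcd(f_{12}, f_{18}) = \gcd(144, 2584) = 8 = f_6$.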

Theorem 4.8. Let $p$ be a prime number. Then

$$p \mid f_p \quad \text{for } p = 5$$

and

$$p \mid f_{p-1} \quad \text{or} \quad p \mid f_{p+1} \quad \text{for } p \ne 5.$$

Proof. If $p = 2$ then $f_3 = 2$ and $2 \mid f_3$. If $p = 3$ then $f_4 = 3$ and $3 \mid f_4$. If $p = 5$ then $f_5 = 5$ and $5 \mid f_5$. Now, let $p \ge 7$. By Theorem 4.2 we have

$$f_n = \frac{1}{\sqrt{5}}\Bigl(\frac{1 + \sqrt{5}}{2}\Bigr)^n - \frac{1}{\sqrt{5}}\Bigl(\frac{1 - \sqrt{5}}{2}\Bigr)^n \quad \text{for } n \ge 1.$$

By the binomial formula (see, for instance, Chapter 12 for a more general discussion of the binomial coefficients and the binomial formula)

$$(1 \pm \sqrt{5})^n = 1 \pm \binom{n}{1}\sqrt{5} + \binom{n}{2} 5 \pm \binom{n}{3}(\sqrt{5})^3 + \cdots + (\pm 1)^n \binom{n}{n}(\sqrt{5})^n.$$

If $n$ is odd then

$$2^{n-1} f_n = \frac{1}{2\sqrt{5}}\bigl((1 + \sqrt{5})^n - (1 - \sqrt{5})^n\bigr) = n + \binom{n}{3} 5 + \binom{n}{5} 5^2 + \cdots + \binom{n}{n} 5^{\frac{n-1}{2}}.$$

Notice that if $p$ is a prime and $1 \le i < p$ then $\binom{p}{i} = \frac{p!}{i!(p-i)!}$. It follows directly that $p \mid \binom{p}{i}$.

Now, let $n = p$ be a prime number with $p \ge 7$. Since $p \mid \binom{p}{i}$ for $1 \le i < p$ we get $2^{p-1} f_p \equiv 5^{\frac{p-1}{2}} \mod p$. Since $2^{p-1} \equiv 1 \mod p$ by Fermat’s theorem, this gives $f_p \equiv 5^{\frac{p-1}{2}} \mod p$, and hence, $f_p^2 \equiv 5^{p-1} \equiv 1 \mod p$, again by Fermat’s theorem. By Theorem 4.5 (d) we have

$$f_p^2 - f_{p-1} f_{p+1} = (-1)^{p-1} = 1,$$

and hence

$$0 \equiv f_p^2 - 1 \equiv f_{p-1} f_{p+1} \mod p.$$

This gives $p \mid f_{p-1}$ or $p \mid f_{p+1}$.

Corollary 4.9. If $p \ge 7$ is a prime number, then each prime divisor of $f_p$ is greater than $p$.

Proof. Assume $q \mid f_p$, $q$ a prime number with $q \le p$. Because $p \ge 7$ we have $q < p$ by Theorem 4.8. Now, $\gcd(f_p, f_q) = f_{\gcd(p,q)} = 1$, $\gcd(f_p, f_{q-1}) = f_{\gcd(p,q-1)} = 1$ and $\gcd(f_p, f_{q+1}) = f_{\gcd(p,q+1)} = 1$ because $q < p$. Since $q \mid f_p$ and $q \mid f_q$, $q \mid f_{q-1}$ or $q \mid f_{q+1}$ we must have $q \mid 1$, which gives a contradiction. Hence, $q > p$.

Corollary 4.10. There are infinitely many prime numbers.

Proof. We give two proofs of this using the Fibonacci numbers.

Proof 1. Let $M = \{p_1, p_2, \ldots, p_n\}$ be a finite set of prime numbers with $p_1 < p_2 < \cdots < p_n$ and $p_n \ge 7$. Let $p$ be a prime divisor of $f_{p_n}$. Then $p > p_n$, hence $p \notin M$.

Proof 2. Assume that $p_1, p_2, \ldots, p_n$ with $p_1 = 2$ are all prime numbers. We have $f_{p_i} \ge 2$ for $i = 2, 3, \ldots, n$. Since $\gcd(f_{p_i}, f_{p_j}) = f_{\gcd(p_i, p_j)} = 1$ for $i \ne j$, at most one $f_{p_i}$ may have two prime divisors because otherwise we would have already $n + 1$ prime numbers. But $f_{19} = 37 \cdot 113$ and $f_{31} = 557 \cdot 2\,417$, which gives a contradiction.

We want to remark that the Fibonacci numbers are very much related to the Lucas numbers $\ell_n$. These are named after E. Lucas (1842–1891) and defined by

$$\ell_0 = 2, \quad \ell_1 = 1 \quad \text{and} \quad \ell_n = \ell_{n-1} + \ell_{n-2} \quad \text{for } n \ge 2.$$

We easily see that

$$\ell_n = f_{n-1} + f_{n+1} = \frac{\alpha^n + \beta^n}{\alpha + \beta} = \alpha^n + \beta^n \quad \text{for } n \ge 1.$$

The first Lucas numbers are

$$2, 1, 3, 4, 7, 11, 18, 29, \ldots.$$
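Theorem 4.8 and the relation between Lucas and Fibonacci numbers can be spot-checked numerically. The sketch below reduces the Fibonacci recursion modulo $m$ so that the numbers stay small; the bound 500 on the primes tested and all function names are illustrative assumptions.

```python
def fib_pair_mod(n, m):
    """Return (f_n mod m, f_{n+1} mod m) by iterating the recursion."""
    a, b = 0, 1 % m
    for _ in range(n):
        a, b = b, (a + b) % m
    return a, b

def is_prime(n):
    """Trial division, adequate for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def satisfies_theorem_4_8(p):
    """For p = 5: p | f_p. For every other prime p: p | f_{p-1} or p | f_{p+1}."""
    f_pm1, f_p = fib_pair_mod(p - 1, p)
    f_pp1 = (f_pm1 + f_p) % p
    if p == 5:
        return f_p == 0
    return f_pm1 == 0 or f_pp1 == 0

def lucas(n):
    """Lucas number l_n with l_0 = 2 and l_1 = 1."""
    a, b = 2, 1
    for _ in range(n):
        a, b = b, a + b
    return a

assert all(satisfies_theorem_4_8(p) for p in range(2, 500) if is_prime(p))
```

For example, $p = 7$ divides $f_8 = 21$, while $p = 11$ divides $f_{10} = 55$.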

We now describe the golden section geometrically. Recall that the number

$$\alpha = \frac{1 + \sqrt{5}}{2}$$

is called the golden section or golden ratio. To define $\alpha$ geometrically, consider a line segment $AB$, and let the point $P$ be located so that it divides the line segment in extreme to mean ratio. By this we mean that

$$\frac{|AP|}{|PB|} = \frac{|AB|}{|AP|}.$$

If we let $PB$ have length 1 then the length of $AP$ is the golden section $\alpha$, see Figure 4.1.

Figure 4.1: Golden section.

To see that the value of $\alpha$ is $\frac{1 + \sqrt{5}}{2}$ we have the ratio

$$\frac{\alpha}{1} = \frac{\alpha + 1}{\alpha}.$$

This then gives the quadratic equation $\alpha^2 - \alpha - 1 = 0$. The two solutions are $\frac{1 \pm \sqrt{5}}{2}$ and since the golden ratio is positive we get that $\alpha = \frac{1 + \sqrt{5}}{2}$ as desired.

4.1.1 The golden rectangle

A golden rectangle is one in which the ratio of the sides is $\alpha : 1$ where $\alpha$ is the golden section. In ancient times it was felt that a golden rectangle is very pleasing to the eye, and it was often used for the front side of a temple.

Figure 4.2: The golden rectangle.

In Figure 4.2 we have

$$\frac{|BC|}{|AB|} = \alpha \quad \text{with } |AB| = 1.$$

In general, a rectangle $ABCD$ in the above terminology is called a golden rectangle if

$$\frac{|BC|}{|AB|} = \alpha, \quad \text{that is,} \quad \frac{\text{length of the bigger side}}{\text{length of the smaller side}} = \alpha.$$

In Figure 4.2 we now divide the sides $BC$ and $AD$ by $E$ and $F$, respectively, such that $|BE| = |AF| = 1$. This gives a square with side length 1. Then the rectangle $FECD$ again is a golden rectangle (in the general definition). This we see as follows. We have $|EC| = \alpha - 1 = |FD|$, and we get

$$\frac{|DC|}{|EC|} = \frac{1}{\alpha - 1} = \frac{1}{\frac{1 + \sqrt{5}}{2} - 1} = \frac{1 + \sqrt{5}}{2} = \alpha.$$

We may continue with this procedure and always obtain golden rectangles. Starting with $A$, if we connect the vertices of the occurring squares by circular arcs with radius the side length of the respective squares, we get the golden spiral, see Figure 4.3.

Figure 4.3: The golden spiral.

4.1.2 Squares in semicircles

We first give a figure for squares in semicircles, see Figure 4.4.

Figure 4.4: Squares in semicircles.

$M$ is the center of the semicircle. Let $A$ be on the semicircle so that $|AB| = r$, then $r = R + \frac{x}{2}$. We have

$$\tan(\varphi) = \frac{x}{\frac{x}{2}} = 2 = \frac{\sin(\varphi)}{\cos(\varphi)} = \frac{\sin(\varphi)}{\sqrt{1 - \sin^2(\varphi)}}.$$

This implies

$$\sin^2(\varphi) = \frac{4}{5} = \frac{x^2}{R^2},$$

hence,

$$x = \frac{2}{\sqrt{5}} R.$$

We get now

$$|AB| = r = R\Bigl(1 + \frac{1}{\sqrt{5}}\Bigr) \quad \text{and} \quad r - x = R\Bigl(1 - \frac{1}{\sqrt{5}}\Bigr).$$

Since

$$\frac{1 + \frac{1}{\sqrt{5}}}{\frac{2}{\sqrt{5}}} = \frac{\frac{2}{\sqrt{5}}}{1 - \frac{1}{\sqrt{5}}}$$

we get $\frac{r}{x} = \frac{x}{r - x}$, which means that the point $C$ divides the line segment $AB$ in the golden section.

4.1.3 Side length of a regular 10-gon

Given is a regular 10-gon whose vertices are on a circle with radius R. Let $|S_{10}|$ be the length of one side $S_{10}$ of the regular 10-gon, see Figure 4.5.

Figure 4.5: One side $S_{10}$ of the regular 10-gon.

We have $|S_{10}| = 2R \sin(\frac{\pi}{10})$. We must calculate $\sin(\frac{\pi}{10})$. From the addition theorems for trigonometric functions we have

\[ \sin\left(\frac{2\pi}{10}\right) = 2 \sin\left(\frac{\pi}{10}\right) \cos\left(\frac{\pi}{10}\right) \quad\text{and}\quad \cos\left(\frac{2\pi}{10}\right) = 1 - 2 \sin^2\left(\frac{\pi}{10}\right) \]

because $\sin(x + y) = \sin(x)\cos(y) + \sin(y)\cos(x)$, $\cos(x + y) = \cos(x)\cos(y) - \sin(x)\sin(y)$ and $\cos^2(x) + \sin^2(x) = 1$. We note that since $\sin(\varphi) = \cos(\frac{\pi}{2} - \varphi)$ we have $\sin(\frac{4\pi}{10}) = \cos(\frac{\pi}{10})$. It follows that

\[ \cos\left(\frac{\pi}{10}\right) = \sin\left(\frac{4\pi}{10}\right) = 2 \sin\left(\frac{2\pi}{10}\right) \cos\left(\frac{2\pi}{10}\right) = 4 \sin\left(\frac{\pi}{10}\right) \cos\left(\frac{\pi}{10}\right) \left(1 - 2 \sin^2\left(\frac{\pi}{10}\right)\right). \]

Since $\cos(\frac{\pi}{10}) \ne 0$ we get

\[ 1 = 4 \sin\left(\frac{\pi}{10}\right) \left(1 - 2 \sin^2\left(\frac{\pi}{10}\right)\right). \]

Hence, $\sin(\frac{\pi}{10})$ is a solution of the equation $4x(1 - 2x^2) - 1 = 0$. Another solution of this equation is $x_1 = \frac{1}{2}$. Now, $\sin(\frac{\pi}{10}) \ne \frac{1}{2}$. If we divide by $x - \frac{1}{2}$ we get that $\sin(\frac{\pi}{10})$ is a solution of the equation $-8x^2 - 4x + 2 = 0$. This equation has the two solutions

\[ x_2 = -\frac{1}{4} + \frac{\sqrt{5}}{4} \quad\text{and}\quad x_3 = -\frac{1}{4} - \frac{\sqrt{5}}{4}. \]

Since $\sin(\frac{\pi}{10}) > 0$ we have

\[ \sin\left(\frac{\pi}{10}\right) = \frac{\sqrt{5} - 1}{4} = \frac{1}{2}(\alpha - 1) = \frac{1}{2\alpha}, \]

where $\alpha = \frac{1 + \sqrt{5}}{2}$ is the golden section. It follows

\[ |S_{10}| = 2R \sin\left(\frac{\pi}{10}\right) = \frac{R}{\alpha}. \]

Hence, if we divide the radius R of the circumcircle by the golden section, then the side $S_{10}$ is the bigger part. Therefore we may construct a regular 10-gon (and therefore also a regular 5-gon) with compass and straightedge, if we can do it for α. We show how this can be done in the next section.
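The closed form for the side length can be checked numerically. The following is only a minimal floating-point sketch; the names `ALPHA` and `decagon_side` are illustrative and not taken from the text.

```python
import math

# Golden section alpha = (1 + sqrt(5)) / 2, as defined in the text.
ALPHA = (1 + math.sqrt(5)) / 2


def decagon_side(radius: float) -> float:
    """Side length 2 R sin(pi/10) of a regular 10-gon inscribed in a circle of radius R."""
    return 2 * radius * math.sin(math.pi / 10)


if __name__ == "__main__":
    R = 1.0
    # Both columns should agree up to rounding errors.
    print(math.sin(math.pi / 10), (math.sqrt(5) - 1) / 4)
    print(decagon_side(R), R / ALPHA)
```

The comparison uses floating-point arithmetic, so it illustrates the identity but does not prove it.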

4.1.4 Construction of the golden section α with compass and straightedge from a given a ∈ ℝ, a > 0

We show in this section how, given $a \in \mathbb{R}$ with $a > 0$, to construct the golden section α with compass and straightedge. We have $|AP| = ax$, see Figure 4.6. By the Theorem of Pythagoras

\[ a^2 + \left(\frac{a}{2}\right)^2 = \left(ax + \frac{a}{2}\right)^2 \iff a^2 = a^2 x^2 + a^2 x \iff 1 = x^2 + x. \]


Figure 4.6: Construction of the Golden Section α.

Therefore $x = \frac{1}{2}(-1 + \sqrt{5}) = \alpha - 1$ because $x > 0$. Hence, the point P divides the line segment AB by the golden section:

\[ \frac{|AB|}{|AP|} = \frac{1}{x} = \frac{1}{\alpha - 1} = \alpha. \]

This gives the desired construction. For more on constructions with compass and straightedges see Chapter 13.

4.2 Perfect numbers and Mersenne numbers

A number of the form $2^n - 1$, $n \ge 2$, is called a Mersenne number. These numbers are named after the French monk M. Mersenne (1588–1648), who studied them. However they already occur in Euclid’s Elements, Books 7–9. Mersenne was one of many mathematicians who studied numbers of the form $2^n - 1$, $n \ge 2$. We denote the number $2^n - 1$ by $M_n$, $n \in \mathbb{N}$, $n \ge 2$. If the Mersenne number $M_n$ is a prime number we call it a Mersenne prime. Mersenne showed that $2^p - 1$ is a prime number for p = 2, 3, 5, 7, 13, 17, 19, 31 and 127 and that if $M_n$ is a prime then n is also prime.

Theorem 4.11. If $M_n$ is a Mersenne prime then n itself is a prime number.

Proof. Suppose that n is not a prime number, say n = ab with 2 ≤ a, b < n. Then

\[ 2^n - 1 = (2^a)^b - 1 = (2^a - 1)\left(1 + 2^a + (2^a)^2 + \cdots + (2^a)^{b-1}\right) \]

by the geometric sum formula, and both factors on the right side are bigger than 1. Hence $M_n$ is not a prime number.

Recall that a primality test is a procedure to determine if a given natural number n is prime or not. For the Mersenne primes there exists an efficient primality test. This is the Lucas–Lehmer test, which is named after E. Lucas and D. H. Lehmer (1905–1991).

Theorem 4.12. Let p ≥ 3 be a prime number. Define recursively the sequence $(S_n)_{n \in \mathbb{N}}$ by $S_1 = 4$ and

\[ S_n = S_{n-1}^2 - 2 \quad\text{for } n \ge 2. \]

Then $M_p = 2^p - 1$ is a prime number if and only if $M_p \mid S_{p-1}$.
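The test of Theorem 4.12 can be sketched in a few lines. The function name `lucas_lehmer` is an illustrative choice, and reducing every $S_n$ modulo $M_p$ is a standard shortcut that the theorem itself does not spell out: it leaves the divisibility $M_p \mid S_{p-1}$ unchanged while keeping the numbers small.

```python
def lucas_lehmer(p: int) -> bool:
    """Return True if M_p = 2**p - 1 is prime.

    Assumes p is a prime number with p >= 3, as required by Theorem 4.12.
    """
    m = (1 << p) - 1          # the Mersenne number M_p
    s = 4                     # S_1 = 4
    for _ in range(p - 2):    # compute S_2, ..., S_{p-1}, each reduced modulo M_p
        s = (s * s - 2) % m
    return s == 0             # M_p divides S_{p-1} iff the residue is 0


if __name__ == "__main__":
    for p in (3, 5, 7, 11, 13, 17, 19, 23, 29, 31):
        print(p, lucas_lehmer(p))
```

For the primes listed, the sketch reports a Mersenne prime exactly for p = 3, 5, 7, 13, 17, 19 and 31.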

A proof of Theorem 4.12 is given, for instance, in [10]. From Theorem 4.12 we easily see that for p = 2, 3, 5, 7 the Mersenne number $M_p$ = 3, 7, 31, 127, respectively, is a prime number. But already $M_{11} = 2^{11} - 1 = 23 \cdot 89$ is not a prime number. Mersenne prime numbers are quite rare. As of the beginning of 2016 only 49 Mersenne prime numbers are known. The GIMPS project (Great Internet Mersenne Prime Search) was founded on the basis of an implementation of the Lucas–Lehmer test. Within this project C. Cooper found, on January 7, 2016, the biggest Mersenne prime number known so far, $M_{74\,207\,281} = 2^{74\,207\,281} - 1$; it has 22 338 618 decimal places. The known Mersenne prime numbers are listed in Table 4.2 and Table 4.3.

Table 4.2: List of the known Mersenne prime numbers I.

| Number | Exponent p | Decimal places | Year | Discoverer |
|---|---|---|---|---|
| 1 | 2 | 1 | – | – |
| 2 | 3 | 1 | – | – |
| 3 | 5 | 2 | – | – |
| 4 | 7 | 3 | – | – |
| 5 | 13 | 4 | 1456 | unknown |
| 6 | 17 | 6 | 1588 | Cataldi |
| 7 | 19 | 6 | 1588 | Cataldi |
| 8 | 31 | 10 | 1772 | Euler |
| 9 | 61 | 19 | 1883 | Pervushin |
| 10 | 89 | 27 | 1911 | Powers |
| 11 | 107 | 33 | 1914 | Powers |
| 12 | 127 | 39 | 1876 | Lucas |
| 13 | 521 | 157 | 1952 | Robinson |
| 14 | 607 | 183 | 1952 | Robinson |
| 15 | 1 279 | 386 | 1952 | Robinson |
| 16 | 2 203 | 664 | 1952 | Robinson |
| 17 | 2 281 | 687 | 1952 | Robinson |
| 18 | 3 217 | 969 | 1957 | Riesel |
| 19 | 4 253 | 1 281 | 1961 | Hurwitz |
| 20 | 4 423 | 1 332 | 1961 | Hurwitz |
| 21 | 9 689 | 2 917 | 1963 | Gillies |
| 22 | 9 941 | 2 993 | 1963 | Gillies |
| 23 | 11 213 | 3 376 | 1963 | Gillies |
| 24 | 19 937 | 6 002 | 1971 | Tuckerman |
| 25 | 21 701 | 6 533 | 1978 | Noll, Nickel |

Table 4.3: List of the known Mersenne prime numbers II.

| Number | Exponent p | Decimal places | Year | Discoverer |
|---|---|---|---|---|
| 26 | 23 209 | 6 987 | 1979 | Noll |
| 27 | 44 497 | 13 395 | 1979 | Nelson, Slowinski |
| 28 | 86 243 | 25 962 | 1982 | Slowinski |
| 29 | 110 503 | 33 265 | 1988 | Colquitt, Welsh |
| 30 | 132 049 | 39 751 | 1983 | Slowinski |
| 31 | 216 091 | 65 050 | 1985 | Slowinski |
| 32 | 756 839 | 227 832 | 1992 | Slowinski, Gage |
| 33 | 859 433 | 258 716 | 1994 | Slowinski, Gage |
| 34 | 1 257 787 | 378 632 | 1996 | Slowinski, Gage |
| 35 | 1 398 269 | 420 921 | 1996 | Armengaud, Woltman i.a. (GIMPS) |
| 36 | 2 976 221 | 895 932 | 1997 | Spence, Woltman i.a. (GIMPS) |
| 37 | 3 021 377 | 909 526 | 1998 | Clarkson, Woltman, Kurowski i.a. (GIMPS, PrimeNet) |
| 38 | 6 972 593 | 2 098 960 | 1999 | Hajratwala, Woltman, Kurowski i.a. (GIMPS, PrimeNet) |
| 39 | 13 466 917 | 4 053 946 | 2001 | Cameron, Woltman, Kurowski i.a. (GIMPS, PrimeNet) |
| 40 | 20 996 011 | 6 320 430 | 2003 | Shafer, Woltman, Kurowski i.a. (GIMPS, PrimeNet) |
| 41 | 24 036 583 | 7 235 733 | 2004 | Findley, Woltman, Kurowski i.a. (GIMPS, PrimeNet) |
| 42 | 25 964 951 | 7 816 230 | 2005 | Nowak, Woltman, Kurowski i.a. (GIMPS, PrimeNet) |
| 43 | 30 402 457 | 9 152 052 | 2005 | Cooper, Boone, i.a. (GIMPS, PrimeNet) |
| 44 | 32 582 657 | 9 808 358 | 2006 | Cooper, Boone, i.a. (GIMPS, PrimeNet) |
| 45* | 37 156 667 | 11 185 272 | 2008 | Elvenich, Woltman, Kurowski, i.a. (GIMPS, PrimeNet) |
| 46* | 43 112 609 | 12 978 189 | 2008 | Smith, Woltman, Kurowski, i.a. (GIMPS, PrimeNet) |
| 47* | 42 643 801 | 12 837 064 | 2009 | Odd Magnar Strinmo, Melhus (GIMPS, PrimeNet) |
| 48* | 57 885 161 | 17 425 170 | 2013 | Cooper, i.a. (GIMPS, PrimeNet) |
| 49* | 74 207 281 | 22 338 618 | 2016 | Cooper, i.a. (GIMPS, PrimeNet) |

* Note: For the numbers marked with * it is not yet proved that there are no smaller Mersenne prime numbers that are still undiscovered.

The question remains: Are there infinitely many Mersenne prime numbers $M_p = 2^p - 1$, p a prime number?

The Mersenne prime numbers are very closely related to another class of exceptional numbers, the perfect numbers. A natural number n is a perfect number if it is equal to the sum of its proper divisors, that is,

\[ n = \sum_{d \mid n,\ d \ge 1,\ d \ne n} d. \]

For example the number 6 is perfect since its proper divisors are 1, 2, 3, which add up to 6. We denote by σ(n) the sum of all positive divisors of $n \in \mathbb{N}$, that is,

\[ \sigma(n) = \sum_{d \mid n,\ d \ge 1} d, \]

and by σ*(n) the sum of all positive, proper divisors of $n \in \mathbb{N}$, that is,

\[ \sigma^*(n) = \sum_{d \mid n,\ 1 \le d < n} d = \sigma(n) - n. \]

[…] $a$. Their sum is $2^s a = \sigma(b)$. This is only possible if $b = (2^s - 1)a$ has no other positive divisors, and this means that a = 1 and $2^s - 1$ is a prime number. By Theorem 4.11 we get that s has to be a prime number, and then $b = 2^s - 1$ is a Mersenne prime.

Recall that the n-th triangular number $T_n$, $n \in \mathbb{N}$, is the sum of the numbers 1, 2, 3, … , n. We have the explicit formula

\[ T_n = \sum_{k=1}^{n} k = \frac{n(n + 1)}{2}. \]

Geometrically Tn is the number of dots composing a triangle with n dots on a side, see Figure 4.7.

Figure 4.7: Geometrical representation of the triangular numbers T1 , T2 , T3 and T4 .

As a corollary of the characterization of the perfect numbers we get the following.

Corollary 4.15. Let $n = 2^{p-1}(2^p - 1)$ be an even perfect number. Then n is the $(2^p - 1)$-th triangular number.

Proof. Let $n = 2^{p-1}(2^p - 1)$ be a perfect number. Let $m = 2^p - 1$. Then

\[ T_m = \frac{m(m + 1)}{2} = \frac{(2^p - 1)\,2^p}{2} = 2^{p-1}(2^p - 1) = n. \]
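For small exponents, Corollary 4.15 and the perfectness of $2^{p-1}(2^p - 1)$ can be checked directly. The sketch below uses naive trial division; the helper names `sigma` and `triangular` are illustrative, not from the text.

```python
def sigma(n: int) -> int:
    """Sum of all positive divisors of n, found by trial division up to sqrt(n)."""
    total = 0
    d = 1
    while d * d <= n:
        if n % d == 0:
            total += d
            if d != n // d:       # avoid counting a square root twice
                total += n // d
        d += 1
    return total


def triangular(m: int) -> int:
    """The m-th triangular number T_m = m(m + 1)/2."""
    return m * (m + 1) // 2


if __name__ == "__main__":
    for p in (2, 3, 5, 7):        # exponents for which 2**p - 1 is prime
        mersenne = 2**p - 1
        n = 2**(p - 1) * mersenne
        # n is perfect iff sigma(n) == 2n; it should also equal T_(2^p - 1).
        print(p, n, sigma(n) == 2 * n, triangular(mersenne) == n)
```

The four exponents give the perfect numbers 6, 28, 496 and 8128.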

Remark 4.16. All even perfect numbers are described by Theorem 4.14. To determine which Mersenne numbers $M_p = 2^p - 1$, p a prime number, are Mersenne prime numbers we must determine all even perfect numbers, and vice versa.

It remains to consider the question of the existence of odd perfect numbers. This is an unsolved problem. Numerical experiments show that, if there exists an odd perfect number n ≥ 3, then the following must hold:
(i) $n > 10^{1500}$ and
(ii) n has at least 10 distinct prime divisors.

The number theoretical function

\[ \sigma \colon \mathbb{N} \to \mathbb{N}, \quad n \mapsto \sigma(n) \]

is also the basis for other classes of numbers. In ancient times the natural numbers were considered with respect to their divisibility properties. A natural number n is called deficient if σ(n) < 2n and abundant if σ(n) > 2n. The number 21 is deficient because σ(21) = 32 < 2 ⋅ 21 = 42. On the other hand the numbers 12 and 60 are abundant because

\[ \sigma(12) = 28 > 2 \cdot 12 = 24 \quad\text{and}\quad \sigma(60) = 168 > 2 \cdot 60 = 120. \]

Here we see the reason for the great reputation of the numbers 12 and 60 in ancient times. The perfect numbers lie between the deficient and the abundant numbers.

Definition 4.17. Two different natural numbers m and n are called amicable numbers if σ(m) = m + n = σ(n), or equivalently, σ*(m) = n and σ*(n) = m.

If the numbers m and n are amicable then the smaller one is abundant and the bigger one is deficient. Amicable numbers were known to the Pythagoreans, who credited them with many mystical properties. Asked what a friend is, Pythagoras is said to have answered: “A person who is another me, like 220 and 284.” The first ten pairs of amicable numbers are

(220, 284), (1 184, 1 210), (2 620, 2 924), (5 020, 5 564), (6 232, 6 368), (10 744, 10 856), (12 285, 14 595), (17 296, 18 416), (63 020, 76 084) and (66 928, 66 992).
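Definition 4.17 translates directly into a brute-force search. This is a minimal sketch with illustrative names (`proper_divisor_sum`, `amicable_pairs`); it is far too slow for large bounds and is not the method by which the known pairs were found.

```python
def proper_divisor_sum(n: int) -> int:
    """sigma*(n): the sum of the positive divisors of n that are smaller than n."""
    if n < 2:
        return 0
    total = 1                     # 1 is a proper divisor of every n >= 2
    d = 2
    while d * d <= n:
        if n % d == 0:
            total += d
            if d != n // d:
                total += n // d
        d += 1
    return total


def amicable_pairs(limit: int) -> list[tuple[int, int]]:
    """All amicable pairs (m, n) with m < n and m < limit."""
    pairs = []
    for m in range(2, limit):
        n = proper_divisor_sum(m)
        if n > m and proper_divisor_sum(n) == m:
            pairs.append((m, n))
    return pairs


if __name__ == "__main__":
    print(amicable_pairs(3000))
```

With the bound 3000 the search returns the first three pairs of the list above.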

The first rule to generate pairs of amicable numbers, the Thabit rule, was given by Thabit ibn Qurra (826–901). His rule was extended in 1747 by Euler to the following.

Theorem 4.18. Let $n, k \in \mathbb{N}$, n > k, $f = 2^k + 1$, $x = f \cdot 2^n - 1$, $y = f \cdot 2^{n-k} - 1$ and $z = f^2 \cdot 2^{2n-k} - 1$. Let x, y, z be prime numbers. Then $a = 2^n xy$ and $b = 2^n z$ are amicable.

The proof is, like the proof for the rule for even perfect numbers, purely computational using the fact that x, y, z are prime numbers. For k = 1 we get the Thabit rule. Euler’s rule was extended by W. Borho [2]. He found 10 455 new pairs of amicable numbers. By now, more than $10^9$ pairs of amicable numbers are known. It is conjectured that there are infinitely many pairs of amicable numbers. The amicable numbers are often featured in novels like “The Professor’s Beloved Equation” by Yoko Ogawa.

The next type of numbers related to the function $\sigma \colon \mathbb{N} \to \mathbb{N}$, $n \mapsto \sigma(n)$, are the sociable numbers.

Definition 4.19. Sociable numbers are numbers which form a cyclic sequence that begins and ends with the same number and has length greater than 2, where each number is the sum of the proper divisors of the preceding number. The period of the sequence is the number k of the numbers in that cycle.

We remark that a cycle of period 2 would be exactly a pair of amicable numbers.

Examples 4.20.
(1) k = 4: 1 264 460, 1 547 860, 1 727 636, 1 305 184.
(2) k = 5: 12 496, 14 288, 15 472, 14 536, 14 264.

As of March 2013 there are 1 593 cycles of sociable numbers known: 1 581 of period 4, 1 of period 5, 5 of period 6, 4 of period 8, 1 of period 9 and 1 of period 28. There are many open questions for sociable numbers; in particular, do there exist sociable numbers of period 3?

We note that we may use the Mersenne numbers to present a new proof that there exist infinitely many prime numbers. This follows from the following theorem in much the same way that we used the Fibonacci numbers to prove the infinity of the set of primes.

Theorem 4.21.
For the Mersenne numbers $M_m$ and $M_n$, the following holds:

\[ \gcd(M_m, M_n) = \gcd(2^m - 1, 2^n - 1) = 2^{\gcd(m,n)} - 1. \]

Proof. This is certainly correct for m = n. Let therefore m > n ≥ 2. Using the Euclidean algorithm we get the following scheme:

\begin{align*}
m &= n q_0 + r_1, & 0 \le r_1 < n, \\
n &= r_1 q_1 + r_2, & 0 \le r_2 < r_1, \\
&\ \vdots \\
r_{k-2} &= r_{k-1} q_{k-1} + r_k, & 0 \le r_k < r_{k-1}, \\
r_{k-1} &= r_k q_k, & r_k = \gcd(m, n).
\end{align*}

Using this we get

\begin{align*}
2^m - 1 &= 2^{n q_0 + r_1} - 1 = 2^{r_1}(2^{n q_0} - 1) + (2^{r_1} - 1), \\
2^n - 1 &= 2^{r_2}(2^{r_1 q_1} - 1) + (2^{r_2} - 1), \\
&\ \vdots \\
2^{r_{k-1}} - 1 &= 2^{r_k q_k} - 1 = (2^{r_k} - 1)(2^{r_k(q_k - 1)} + \cdots + 1).
\end{align*}

From this we get $(2^{r_k} - 1) \mid (2^{r_{k-1}} - 1)$ and also $(2^{r_k} - 1) \mid (2^{r_{k-2}} - 1)$ since also

\[ 2^{r_{k-1} q_{k-1}} - 1 = (2^{r_{k-1}} - 1)(2^{r_{k-1}(q_{k-1} - 1)} + \cdots + 1). \]

Finally we get $(2^{r_k} - 1) \mid (2^n - 1)$ and $(2^{r_k} - 1) \mid (2^m - 1)$.

Let $d = \gcd(2^n - 1, 2^m - 1)$. From above $d \mid (2^{r_i} - 1)$ for i = 1, 2, … , k, especially $d \mid (2^{r_k} - 1)$, $r_k = \gcd(n, m)$. Since $2^{r_k} - 1$ is itself a common divisor of $2^n - 1$ and $2^m - 1$, it divides d, and therefore $d = 2^{r_k} - 1$.

Corollary 4.22. There are infinitely many prime numbers.

Proof. Let $P = \{p_1, p_2, \ldots, p_n\}$, $2 = p_1 < p_2 < \cdots < p_n$, be a finite set of prime numbers. Then

\[ \gcd(M_{p_i}, M_{p_j}) = 2^{\gcd(p_i, p_j)} - 1 = 1 \quad\text{for } i \ne j. \]

For i = 1, 2, … , n the number $M_{p_i} = 2^{p_i} - 1$ is odd, and no two distinct ones of them have a common prime divisor. Since there are only n − 1 odd prime numbers in P, there must exist another prime number which is not in P.
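The identity of Theorem 4.21 is easy to test for small indices. The following sketch relies on `math.gcd` from the Python standard library; the helper `mersenne` is an illustrative name.

```python
from math import gcd


def mersenne(n: int) -> int:
    """The Mersenne number M_n = 2**n - 1."""
    return 2**n - 1


if __name__ == "__main__":
    # Compare both sides of gcd(M_m, M_n) = M_gcd(m, n) for all 2 <= m, n < 40.
    mismatches = [
        (m, n)
        for m in range(2, 40)
        for n in range(2, 40)
        if gcd(mersenne(m), mersenne(n)) != mersenne(gcd(m, n))
    ]
    print("mismatches:", mismatches)
```

A finite check of this kind only illustrates the statement; the proof is the Euclidean scheme given above.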

4.3 Fermat numbers

Related to the Mersenne numbers, and of similar interest, are the Fermat numbers. This is the sequence $(F_n)_{n \in \mathbb{N}}$ of positive integers defined by

\[ F_n = 2^{2^n} + 1, \quad n = 1, 2, 3, \ldots. \]

If in particular $F_m$ is a prime number it is called a Fermat prime number.


Fermat believed that all the numbers in this sequence were prime numbers. In fact $F_1, F_2, F_3, F_4$ are all prime numbers but $F_5$ is composite and divisible by 641. It is still an open question as to whether or not there are infinitely many Fermat prime numbers. It has been conjectured that there are only finitely many. On the other hand if a number of the form $2^k + 1$ is a prime number for some integer k then it must be a Fermat prime number.

Theorem 4.23. If a number $2^n + 1$ is a prime number then it is a Fermat prime number, that is, $n = 2^s$, $s \in \mathbb{N}_0$.

Proof. Let $n = 2^s u$, $s \in \mathbb{N}_0$, u odd. Assume that u ≥ 3. Then

\[ 2^n + 1 = (2^{2^s} + 1)\left(2^{2^s(u-1)} - 2^{2^s(u-2)} + \cdots + 1\right). \]

Hence, $2^{2^s} + 1$ is a proper divisor of $2^n + 1$ for u ≥ 3, which gives a contradiction if $2^n + 1$ is a prime number. Hence, u = 1 and $n = 2^s$ if $2^n + 1$ is a prime number.

Theorem 4.23 can easily be expressed in a more general form.

Theorem 4.24. Let $a \in \mathbb{N}$, a ≥ 2, and let $a^n + 1$ be a prime number. Then a is even and $n = 2^m$ for some $m \in \mathbb{N}_0$.

Proof. If a is odd then $a^n + 1$ is even and hence not a prime number if a ≥ 2. Therefore a is even. Assume that n = kl with k odd and k ≥ 3. Then

\[ a^{kl} + 1 = (a^l + 1)\left(a^{l(k-1)} - a^{l(k-2)} + \cdots + 1\right), \]

and $a^l + 1$ is a proper divisor of $a^{kl} + 1$ because k ≥ 3, which contradicts that $a^n + 1$ is a prime number. Therefore k = 1 and $n = 2^m$ for some $m \in \mathbb{N}_0$.

Remarks 4.25. So far there are only 5 Fermat primes that are known. These are the original 5 given by Fermat:

\[ 2^{2^s} + 1 = 3,\ 5,\ 17,\ 257,\ 65\,537 \quad\text{with } s = 0, 1, 2, 3, 4, \text{ respectively.} \]

Euler showed in 1732 that if s = 5, then $641 \mid (2^{2^5} + 1)$, that is, $641 \mid F_5$. This can be seen as follows. Define $a = 2^7$ and b = 5. Then 1 + ab = 641, and we have

\[ 1 + ab - b^4 = 1 + (a - b^3)b = 1 + (128 - 125)b = 1 + 3b = 1 + 15 = 2^4. \]

From this we get

\begin{align*}
F_5 = 2^{2^5} + 1 = 2^{32} + 1 &= 2^4 \cdot 2^{28} + 1 = 2^4 a^4 + 1 = (1 + ab - b^4)a^4 + 1 \\
&= (1 + ab)a^4 + (1 - a^4 b^4) \\
&= (1 + ab)a^4 + (1 + a^2 b^2)(1 - a^2 b^2) \\
&= (1 + ab)a^4 + (1 + ab)(1 - ab)(1 + a^2 b^2) \\
&= (1 + ab)\left(a^4 + (1 - ab)(1 + a^2 b^2)\right).
\end{align*}

Since 1 + ab = 641 we get $641 \mid F_5$.

The question arises if there are more than 5 Fermat prime numbers. A new conjecture says that there are no more Fermat prime numbers. The probability that the Fermat number $F_n$ is a prime number is about $\frac{c}{2^n}$ for a constant c > 0. This conjecture is based on statistical experiments.

The Fermat prime numbers play a fundamental role in the construction of regular m-gons with compass and straightedge. Gauss (1777–1855) showed the following. Let $m \in \mathbb{N}$, m ≥ 2. Starting with 0 and 1, the regular m-gon can be constructed with compass and straightedge if and only if $m = 2^k p_1 p_2 \cdots p_r$, where $k, r \in \mathbb{N}_0$ and $p_1, p_2, \ldots, p_r$ are pairwise distinct Fermat prime numbers. Hence, if m is odd, the biggest known such number is $m = 3 \cdot 5 \cdot 17 \cdot 257 \cdot 65\,537 = 2^{32} - 1$, the Mersenne number $M_{32}$. We discuss further constructions with compass and straightedge and prove the theorem of Gauss in Chapter 13.

We also may use the Fermat numbers to show that there exist infinitely many prime numbers.

Theorem 4.26. Let $m, n \in \mathbb{N}$. Then $\gcd(F_n, F_m) = 1$ if m ≠ n.

Proof. Let n > m and $d \mid F_m$ and $d \mid F_n$. Then

\[ \frac{F_n - 2}{F_m} = \frac{2^{2^n} - 1}{2^{2^m} + 1} = (2^{2^m})^{2^{n-m} - 1} - (2^{2^m})^{2^{n-m} - 2} + \cdots - 1. \]

Hence $F_m \mid (F_n - 2)$, so $d \mid (F_n - 2)$ and therefore $d \mid 2$. Since $F_n$ and $F_m$ are both odd, we must have d = 1.

Corollary 4.27. There exist infinitely many prime numbers.

Proof. The infinitely many Fermat numbers $F_n$ are pairwise relatively prime, and each $F_n$ has a prime divisor $p_n$. These $p_n$ must be pairwise different.
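Euler's factor 641 and the pairwise coprimality of the Fermat numbers $2^{2^n} + 1$ can be verified with exact integer arithmetic. In this sketch the function name `fermat` and the range of indices tested are illustrative choices.

```python
from math import gcd


def fermat(n: int) -> int:
    """The Fermat number 2**(2**n) + 1 with index n."""
    return 2 ** (2 ** n) + 1


if __name__ == "__main__":
    a, b = 2**7, 5
    # The factorization used in the text: F_5 = (1 + ab)(a^4 + (1 - ab)(1 + a^2 b^2)).
    cofactor = a**4 + (1 - a * b) * (1 + a**2 * b**2)
    print(1 + a * b, fermat(5) == (1 + a * b) * cofactor)
    # Pairwise coprimality for the indices 0, ..., 8.
    print(all(gcd(fermat(m), fermat(n)) == 1
              for n in range(9) for m in range(n)))
```

Python integers have arbitrary precision, so no overflow occurs even though the number with index 8 has 78 decimal digits.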

Exercises

1. Use induction to prove Theorem 4.5, that is
(a) $f_1 + f_2 + \cdots + f_n = f_{n+2} - 1$ for n ≥ 1.
(b) $f_1^2 + f_2^2 + \cdots + f_n^2 = f_n f_{n+1}$ for n ≥ 1.
(c) $f_{n+m} = f_{n-1} f_m + f_n f_{m+1}$ for n, m ≥ 1.
(d) $\begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}^n = \begin{pmatrix} f_{n+1} & f_n \\ f_n & f_{n-1} \end{pmatrix}$, and hence especially, $f_{n+1} f_{n-1} - f_n^2 = (-1)^n$, for n ≥ 1.
2. Decide for each of the following two statements if it is true or false. If a statement is true, prove it. If a statement is false, give a counterexample.
(a) Each even perfect number is a triangular number.
(b) Each even triangular number is a perfect number.
3. (a) Give a detailed and direct proof of the statement: If $M_n = 2^n - 1$ is a Mersenne prime number, then the number $(2^n - 1)2^{n-1}$ is a perfect number.
(b) Verify the statement in (a) for the Mersenne prime numbers $M_2$, $M_3$ and $M_5$, that is, prove $\sigma((2^n - 1)2^{n-1}) = (2^n - 1)2^n$ for n = 2, 3, 5.
4. In Section 4.1.3 we saw that we are able to construct a regular 5-gon with the golden section α. Is it possible to construct a regular 3-gon and a regular 4-gon? If it is possible, please give a construction with compass and straightedge.
5. Prove that the following pairs of numbers are amicable numbers:
(a) 17 296, 18 416;
(b) 1 184, 1 210.
6. Prove Theorem 4.18.
7. Prove that the numbers in the following sequences of length k are sociable numbers:
(a) k = 4: 1 264 460, 1 547 860, 1 727 636, 1 305 184.
(b) k = 5: 12 496, 14 288, 15 472, 14 536, 14 264.
8. Verify with the help of congruences that the Fermat number $F_5$ is divisible by 641. (Hint: Use the equations $641 = 5 \cdot 2^7 + 1$ and $641 = 2^4 + 5^4$.)

5 Pythagorean triples and sums of squares

DOI 10.1515/9783110516142-005

5.1 The Pythagorean Theorem

The Pythagorean Theorem is perhaps the most famous result from school mathematics. This ancient result, known in China 1000 years before Pythagoras, says that in a right triangle with sides of length a, b, c, where c is the hypotenuse, that is, the side opposite the right angle, we have $c^2 = a^2 + b^2$. If the three lengths are natural numbers then the triple (a, b; c) is called a Pythagorean triple.

Definition 5.1. A triple (a, b; c) of natural numbers is called a Pythagorean triple if $a^2 + b^2 = c^2$.

Observe that c is fixed for the right side of the equation. We consider the triples (a, b; c) and (b, a; c) as equal. A Pythagorean triple is called primitive if gcd(a, b, c) = gcd(a, b) = gcd(a, c) = gcd(b, c) = 1.

From $a^2 + b^2 = c^2$ it follows that $(at)^2 + (bt)^2 = (ct)^2$. Therefore if (a, b; c) is a Pythagorean triple and t is a natural number then (at, bt; ct) is also a Pythagorean triple. Conversely, from $(at)^2 + (bt)^2 = (ct)^2$ it follows that $a^2 + b^2 = c^2$. Hence if (at, bt; ct) is a Pythagorean triple and t is a natural number then (a, b; c) is also a Pythagorean triple. It follows that to classify all Pythagorean triples we may restrict ourselves to the primitive Pythagorean triples.

The Pythagorean triple (3, 4; 5) should be known to all students. This triple was already known in all ancient advanced civilizations and is recorded in several documents. Here we give some examples:

(1) There is a recorded dialog between emperor Tschan Kong and the savant Schank Kaon (1100 BC).

(2) The triple (3, 4; 5) is mentioned in several places in the Bible. In the second book of Moses, Chapters 37 and 38, it says that this triple was used for building the tent for the ark of the Covenant and for building the altar of burnt offering. Further, in the First Book of Kings it is mentioned that it was used in manufacturing the consoles of begging bowls.

(3) There are several Pythagorean triples mentioned on Babylonian clay tablets from the time of the Hammurabi dynasty (1829–1530 BC). The cuneiform tablet Plimpton 322 contains 15 distinct Pythagorean triples, among others the triples

(3, 4; 5), (5, 12; 13), (7, 24; 25), (56, 90; 106), (119, 120; 169) and (12 709, 13 500; 18 541).

This indicates that already about 1500 BC there was a known procedure to calculate Pythagorean triples.

(4) The Indian Baudhayana Sulbasutra from 600 BC contains five Pythagorean triples. The Sulbasutras are manuals for the art of measuring, for instance for the installation of altars.

(5) Plato’s (ca. 428–348 BC) theory of forms or ideas argues that non-physical forms or ideas represent the accurate reality. The ideas are the preimages of the imperfect existence and therefore of godlike quality. Hence, Plato’s theory always had a mystic element. He considered the 3 as the male, the 4 as the woman and the 5 as the child. Plato found Pythagorean triples of the form $(n^2 - 1, 2n; n^2 + 1)$. For n = 2 we have (3, 4; 5) and for n = 4 we have (15, 8; 17).

(6) Pythagoras (570–510 BC) founded a philosophical and mathematical school. For him the numbers are the basis of all reality. From his school descended the construction of infinitely many Pythagorean triples of the form $(2n + 1, 2n^2 + 2n; 2n^2 + 2n + 1)$. From the examples it is clear that naming the Pythagorean triples after Pythagoras does not reflect the historical reality. The name was probably given because the Pythagoreans formulated the problem to determine all triples (a, b; c) with $a^2 + b^2 = c^2$.

(7) In ancient architecture the triple (3, 4; 5) was used to construct right angles. One took a rope of a length of 12 units. After 3 and 7 units one made a knot. Then one connected both ends by another knot. If 3 persons pulled at the knots one got a right angle, see Figure 5.1. The ancient Egyptian rope stretchers knew the triple (3, 4; 5) and used it for staking out right angles in the construction of the pyramids, for instance the Cheops Pyramid around 2585 BC. The Roman author Vitruvius (14 BC) called the construction of a right angle with the help of the triple (3, 4; 5) the greatest attainment of mathematics.

Figure 5.1: Construction of a right angle with a rope of a length of 12 units.


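(8) The Indian mathematician Brahmagupta (628) gave a rule for the construction of Pythagorean triples, which is easy to verify: Let $a \in \mathbb{N}$ and $d \in \mathbb{N}$ such that $d \mid a^2$ and $\frac{a^2}{d} - d$ is an even natural number. Define

\[ b = \frac{1}{2}\left(\frac{a^2}{d} - d\right) \quad\text{and}\quad c = d + b. \]

Then (a, b; c) is a Pythagorean triple, because $c^2 - b^2 = (c - b)(c + b) = d \cdot \frac{a^2}{d} = a^2$.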
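Brahmagupta's rule can be tried out with a few lines of code. The sketch below takes c = b + d, which is the choice for which $c^2 - b^2 = d \cdot \frac{a^2}{d} = a^2$ holds; the function name `brahmagupta_triple` is illustrative.

```python
def brahmagupta_triple(a: int, d: int) -> tuple[int, int, int]:
    """Return (a, b, c) with a^2 + b^2 = c^2, built from a and a divisor d of a^2.

    Requires that d divides a^2 and that a^2/d - d is an even natural number.
    """
    if d <= 0 or (a * a) % d != 0:
        raise ValueError("d must be a positive divisor of a^2")
    difference = a * a // d - d
    if difference <= 0 or difference % 2 != 0:
        raise ValueError("a^2/d - d must be an even natural number")
    b = difference // 2
    c = b + d                      # so that c - b = d and c + b = a^2/d
    return a, b, c


if __name__ == "__main__":
    for a, d in ((3, 1), (5, 1), (8, 2), (12, 2)):
        x, y, z = brahmagupta_triple(a, d)
        print((x, y, z), x * x + y * y == z * z)
```

For example, a = 3 and d = 1 give the triple (3, 4; 5), and a = 8 and d = 2 give (8, 15; 17).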

5.2 Classification of the Pythagorean triples

The first general solution of the problem to determine all Pythagorean triples is in Euclid’s tenth book of the Elements. The following theorem is from the book Arithmetica by Diophantus (ca. 250 AD).

Theorem 5.2. Let n, m be two relatively prime natural numbers with positive and odd difference n − m. Then $(n^2 - m^2, 2nm; n^2 + m^2)$ is a primitive Pythagorean triple, and further each primitive Pythagorean triple can be obtained in this manner.

Proof. The first part follows when we do the indicated operations in the equation below:

\[ (n^2 - m^2)^2 + (2nm)^2 = n^4 - 2n^2 m^2 + 4n^2 m^2 + m^4 = (n^2 + m^2)^2. \]

We now show that we get all primitive Pythagorean triples in this manner. Let

\[ a^2 + b^2 = c^2, \quad a, b, c \in \mathbb{N}. \tag{5.1} \]

We divide through by $c^2$ to obtain

\[ \left(\frac{a}{c}\right)^2 + \left(\frac{b}{c}\right)^2 = 1. \]

Hence, from each Pythagorean triple (a, b; c) we get a solution of the equation

\[ x^2 + y^2 = 1 \tag{5.2} \]

in rational numbers; we call these rational solutions $x = \frac{a}{c}$, $y = \frac{b}{c}$. Conversely, each rational solution (x, y) of (5.2) gives rise to a Pythagorean triple if we present x and y with the common denominator: $x = \frac{a}{c}$, $y = \frac{b}{c}$. Hence, the classification of the primitive Pythagorean triples reduces to the determination of all positive rational solutions of (5.2).

Equation (5.2) describes the circle E with center (0, 0) and radius 1. The point S = (0, −1) lies on E. Let g be the line through S with slope λ ≠ 0:

\[ g \colon y = \lambda x - 1. \tag{5.3} \]

Let $P_\lambda = (x_\lambda, y_\lambda) \ne S$ be the second point of intersection of g with the circle E, see Figure 5.2.

Figure 5.2: Intersection of g with the circle E.

The coordinates of $P_\lambda$ satisfy equations (5.2) and (5.3). If we plug (5.3) in (5.2) we get

\[ x^2 + y^2 = 1 = x^2 + (\lambda x - 1)^2 \]

and further

\[ (\lambda^2 + 1)x^2 - 2\lambda x = 0. \tag{5.4} \]

Now, since λ ≠ 0 we have $x_\lambda \ne 0$. This gives

\[ x_\lambda = \frac{2\lambda}{\lambda^2 + 1} \quad\text{and}\quad y_\lambda = \lambda x_\lambda - 1 = \frac{\lambda^2 - 1}{\lambda^2 + 1}. \]

Hence, $(x_\lambda, y_\lambda)$ is a rational solution of (5.2) if and only if

\[ \lambda = \frac{y_\lambda + 1}{x_\lambda} \in \mathbb{Q}. \tag{5.5} \]


We want to have all positive rational solutions of (5.2). For these λ > 0 and $\lambda^2 - 1 > 0$, which gives λ > 1. Let $\lambda \in \mathbb{Q}$, $\lambda = \frac{n}{m} > 1$ with $n, m \in \mathbb{N}$ and gcd(n, m) = 1. Then we obtain

\[ x_\lambda = \frac{2nm}{n^2 + m^2} \quad\text{and}\quad y_\lambda = \frac{n^2 - m^2}{n^2 + m^2}. \]

Since λ > 1 we therefore get a Pythagorean triple (a, b; c) by $(2nm, n^2 - m^2; n^2 + m^2)$. This triple is primitive if and only if n − m is odd. This can be seen as follows. If n − m is odd then n + m = n − m + 2m is also odd and hence $2 \nmid (n^2 - m^2)$ since $n^2 - m^2 = (n - m)(n + m)$. If we had a common prime divisor p of n and $n^2 - m^2$ then we would have $p \mid (n - m)$ or $p \mid (n + m)$, and hence $p \mid m$, which contradicts gcd(n, m) = 1. Analogously we may argue if we had a common prime divisor of m and $n^2 - m^2$. If n − m is even, then the triple is not primitive since $2 \mid 2nm$ and $2 \mid (n^2 - m^2)$.

Altogether, we get each primitive Pythagorean triple in the form $(2nm, n^2 - m^2; n^2 + m^2)$ with $n, m \in \mathbb{N}$, n > m, gcd(n, m) = 1 and n − m odd.

Remarks 5.3. (1) Let (a, b; c) be a primitive Pythagorean triple with $a = n^2 - m^2$, b = 2nm and $c = n^2 + m^2$ such that gcd(n, m) = 1, n > m and n − m odd. We cannot have c − a = 1 because otherwise $2m^2 = 1$. Let c − b = 1, that is, $n^2 + m^2 - 2nm = 1$ and hence $(n - m)^2 = 1$. Since n > m we get n − m = 1, that is, n = m + 1. If we plug this into the formulas for a, b, c we get $a = n^2 - m^2 = n + m = 2m + 1$, $b = 2nm = 2m^2 + 2m$ and $c = 2m^2 + 2m + 1$. Here we obtain precisely the Pythagorean triples known to Pythagoras.

The book of Diophantus motivated Pierre Fermat (1601–1665) to make the following statement: The equation $a^n + b^n = c^n$, $n \in \mathbb{N}$, n ≥ 3, has no solution in natural numbers. He claimed that he had a proof but that the margin of the book was too small to write down the complete proof. It is now believed, however, that Fermat had no complete proof of this big statement. Fermat had a proof of this for exponent 4 using a method that he called infinite descent. It is believed that Fermat’s supposed proof of the big statement was also based on this technique. This result is now called Fermat’s Big Theorem. A proof of Fermat’s big statement was finally given in 1993 by Andrew Wiles, in part together with Taylor.
Hence, we have the following.

Theorem 5.4 (Fermat’s Big Theorem). The equation

\[ a^n + b^n = c^n, \quad n \in \mathbb{N},\ n \ge 3, \]

has no solution in natural numbers.

Historically there were some remarkable single steps, that is, proofs for particular natural numbers n. We mention the following:
n = 4: Fermat.
n = 3 or 6: Euler (1770).
n = 5: Legendre (1825).
n = 7: Lamé (1839).
An extended discussion on Fermat’s theorem can be found in [21]. The work done in attempting to solve the equation $a^n + b^n = c^n$ led to several important mathematical theories like algebraic number theory and arithmetic geometry.

We now return to the case n = 4, which was proven by Fermat using his method of infinite descent.

Theorem 5.5. The Diophantine equation $x^4 + y^4 = z^2$ has no solution in natural numbers. In particular the equation $x^4 + y^4 = z^4$ has no solution in nonzero integers.

Proof. We assume that there exists a solution $(x_0, y_0; z_0)$ of the equation $x^4 + y^4 = z^2$ in natural numbers. We construct from this a solution $(x_1, y_1; z_1)$ in natural numbers with $z_1 < z_0$. We may assume that $\gcd(x_0, y_0, z_0) = \gcd(x_0, z_0) = \gcd(y_0, z_0) = \gcd(x_0, y_0) = 1$. Then $(x_0^2, y_0^2; z_0)$ is a primitive Pythagorean triple as described in Theorem 5.2. From that, $x_0$ or $y_0$ is even, and the other is odd. Hence, $z_0$ is odd. We may assume that $y_0$ is even. By Theorem 5.2 there are $a, b \in \mathbb{N}$ with gcd(a, b) = 1, a > b, and

\[ x_0^2 = a^2 - b^2, \quad y_0^2 = 2ab, \quad z_0 = a^2 + b^2. \]

Here a cannot be even, because then b would be odd (since $x_0$ is odd) and hence $x_0^2 \equiv -1 \bmod 4$, which contradicts the general statement $x_0^2 \equiv 1 \bmod 4$ for an odd $x_0$. Hence, a is odd, and because $x_0$ is odd and $x_0^2 = a^2 - b^2$ we get that b is necessarily even. Moreover we have $x_0^2 + b^2 = a^2$. Again, since gcd(a, b) = 1, $(x_0, b; a)$ is a primitive Pythagorean triple with b even. Then from Theorem 5.2 there are $c, d \in \mathbb{N}$ with gcd(c, d) = 1, c > d and c − d odd such that

\[ x_0 = c^2 - d^2, \quad b = 2cd, \quad a = c^2 + d^2. \]

Since $x_0$ is odd, also c + d is odd.

Claim. From gcd(a, b) = 1 we get $\gcd(c, d) = \gcd(c, c^2 + d^2) = \gcd(d, c^2 + d^2) = 1$.

Proof of the claim. Certainly $\gcd(c, c^2 + d^2) = \gcd(d, c^2 + d^2) = 1$. Let $k \in \mathbb{N}$ with $k \mid c$ and $k \mid d$. Then $k \mid c$ and $k \mid (c^2 + d^2)$, and hence, k = 1. Therefore also gcd(c, d) = 1.

It follows $y_0^2 = 2ab = 4cd(c^2 + d^2)$, and hence $(\frac{1}{2} y_0)^2 = cd(c^2 + d^2)$. By the second version of Euclid’s Lemma (see Theorem 2.32) there exist $x_1, y_1, z_1$ with

\[ x_1^2 = c, \quad y_1^2 = d, \quad z_1^2 = c^2 + d^2, \]

which leads to $z_1^2 = c^2 + d^2 = x_1^4 + y_1^4$, and the triple $(x_1, y_1; z_1)$ is a solution of $x^4 + y^4 = z^2$ in natural numbers. Further $z_1 \le z_1^2 = c^2 + d^2 = a < a^2 + b^2 = z_0$, that is, $z_1 < z_0$.

Therefore, if we assume that there exists a solution $(x_0, y_0; z_0) \in \mathbb{N}^3$ of $x^4 + y^4 = z^2$ then there exists an infinite sequence $(x_k, y_k; z_k)$, k = 0, 1, 2, …, of solutions in natural numbers such that $z_0 > z_1 > z_2 > \cdots > 0$. By the well ordering of ℕ the sequence of the $z_k$ must have a smallest element, which gives a contradiction because for each $z_k$ we have $z_{k+1} < z_k$. Therefore, the equation $x^4 + y^4 = z^2$ has no solution $(x, y; z) \in \mathbb{N}^3$.
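Both results of this section lend themselves to small experiments. The sketch below enumerates the primitive triples $(n^2 - m^2, 2nm; n^2 + m^2)$ of Theorem 5.2 and runs a brute-force search for solutions of $x^4 + y^4 = z^2$; the function names and the search bounds are illustrative, and the empty search result only illustrates Theorem 5.5, it does not replace the descent argument.

```python
from math import gcd, isqrt


def primitive_triples(max_n: int):
    """Yield (n^2 - m^2, 2nm, n^2 + m^2) for coprime n > m >= 1 with n - m odd, n <= max_n."""
    for n in range(2, max_n + 1):
        for m in range(1, n):
            if (n - m) % 2 == 1 and gcd(n, m) == 1:
                yield n * n - m * m, 2 * n * m, n * n + m * m


def fourth_power_solutions(bound: int) -> list[tuple[int, int, int]]:
    """All (x, y, z) with x^4 + y^4 = z^2 and 1 <= x <= y <= bound, by exhaustive search."""
    found = []
    for x in range(1, bound + 1):
        for y in range(x, bound + 1):
            s = x**4 + y**4
            z = isqrt(s)           # exact integer square root
            if z * z == s:
                found.append((x, y, z))
    return found


if __name__ == "__main__":
    print(list(primitive_triples(4)))
    print(fourth_power_solutions(200))
```

Using `isqrt` rather than a floating-point square root keeps the test for perfect squares exact.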

5.3 Sum of squares

In the previous section we classified the Pythagorean triples (a, b; c) with $a^2 + b^2 = c^2$. Now we discuss the general question: What happens if we replace $c^2$ by an $n \in \mathbb{N}$? Hence, we now consider the equation $a^2 + b^2 = n$, $n \in \mathbb{N}$, and ask for solutions $a, b \in \mathbb{N}$, or more generally $a, b \in \mathbb{Z}$.

Let $a^2 + b^2 = n$, $n \in \mathbb{N}$ and $a, b \in \mathbb{Z}$. If d = gcd(a, b) > 1 then $a = d a_1$, $b = d b_1$ for $a_1, b_1 \in \mathbb{Z}$ with $\gcd(a_1, b_1) = 1$, and $d^2(a_1^2 + b_1^2) = n$. Then $d^2 \mid n$, that is, $n = d^2 n_1$ for some $n_1 \in \mathbb{N}$, and we get $a_1^2 + b_1^2 = n_1$. Hence, we always may assume that gcd(a, b) = 1. The main result about sums of squares is Fermat’s Two-Square Theorem. Recall that if $a^2 \equiv b \bmod n$ then b is called a quadratic residue modulo n.

90 | 5 Pythagorean triples and sums of squares Theorem 5.6. Let n ∈ ℕ, n ≥ 2. Then there exist a, b ∈ ℤ with gcd(a, b) = 1 and a2 +b2 = n if and only if −1 is a quadratic residue modulo n. We will prove Theorem 5.6 using properties of the classical modular group PSL(2, ℤ) = SL(2, ℤ)/{±E} where E = ( 01 01 ) and SL(2, ℤ) = {(

a c

b ) ∣ a, b, c, d ∈ ℤ, ad − bc = 1} , d

that is, we map a matrix A ∈ SL(2, ℤ) to the pair {A, −A}. In other words, the element A ∈ PSL(2, ℤ) is a pair {( ac db ), −( ac db )} with ( ac db ) ∈ SL(2, ℤ). There is no misunderstanding if we just write A = ±( ac db ), ( ac db ) ∈ SL(2, ℤ). If A = ±( ac db ) ∈ PSL(2, ℤ) then we call |a + d| the trace of A, written tr(A). Lemma 5.7. A ∈ PSL(2, ℤ) has order 2 if and only if tr(A) = 0. 2

a +bc (a+d)b Proof. Let A = ±( ac db ). Then A2 = ±( (a+d)c ). d2 +bc

Suppose that A has order 2. Then b ≠ 0 or c ≠ 0 since otherwise A = ±( 01 01 ) which has order 1. It follows that a + d = 0, that is, tr(A) = 0. Conversely suppose that tr(A) = 0, that is, a + d = 0. Then necessarily A2 = ±( 01 01 ) by computation. Recall that in a group G elements g1 , g2 ∈ G are conjugate if there is an x ∈ G with g2 = xg1 x−1 . Theorem 5.8. Let A ∈ PSL(2, ℤ) with tr(A) = 0. Then A is conjugate in PSL(2, ℤ) to 0 1 ), that is, there exists X ∈ PSL(2, ℤ) with X −1 AX = T. T = ±( −1 0 β Proof. Let A = ±( αγ −α ). Let S be the set of all conjugates of A in PSL(2, ℤ), that is, −1 S = {X AX ∣ X ∈ PSL(2, ℤ)}. Since conjugation preserves the trace, S consists of eleb ) ∈ S with |a| minimal. Such a Y exists ments of PSL(2, ℤ) with trace 0. Let Y = ±( ac −a because ℕ ∪ {0} is well ordered as ℕ is well ordered (see Chapter 1). We show that |a| = 0, that is, a = 0. Assume that a ≠ 0. Then −bc = a2 + 1 and hence, |b||c| = a2 + 1. It follows that b ≠ 0 ≠ c and |b| < |a| or |c| < |a|. This can be seen as follows. Assume that |b| ≥ |a| and |c| ≥ |a|. Let first |b| = |a|, and without loss of generality, a = b > 0. Since gcd(a, b) = 1 we then have a = b = 1 and c = −2. Let T = ±( 11 01 ). Then we get 0 1 ) contradicting the minimality of |a| ≠ 0. Therefore we have |b| > |a|. TYT −1 = ±( −1 0 Analogously we get |c| > |a|. But now a2 + 1 = |b||c| > a2 + 1 which gives a contradiction. Hence |b| < |a| or |c| < |a|. We may assume that |c| < |a| because with T = ±( 01 −1 0 ) −a −c α β α −β we have T −1 YT = ±( −b ∈ S. Hence, let |c| < |a|. Since with ±( also ±( ) ) ) a −γ −α is δ −α contained in PSL(2, ℤ) and because

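$$\pm\begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} \alpha & \beta \\ \gamma & -\alpha \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} = \pm\begin{pmatrix} -\alpha & \beta \\ \gamma & \alpha \end{pmatrix}$$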


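we may, without loss of generality, assume that 0 < c < a, which implies 0 < a − c < a. Let $U = \pm\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$. Then

$$U^{-1}YU = \pm\begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} a & b \\ c & -a \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} = \pm\begin{pmatrix} a - c & 2a + b - c \\ c & c - a \end{pmatrix}$$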

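which contradicts the minimality of |a| because 0 < a − c < a. Therefore a = 0 and then −bc = 1. It follows b = ±1 and c = ∓1, and $Y = \pm\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} = T$.

We now consider the conjugates of T inside PSL(2, ℤ). Let $A = \pm\begin{pmatrix} a & b \\ c & d \end{pmatrix}$. Then $A^{-1} = \pm\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$ and

$$ATA^{-1} = \pm\begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix} = \pm\begin{pmatrix} -(bd + ac) & a^2 + b^2 \\ -(c^2 + d^2) & bd + ac \end{pmatrix}. \tag{5.6}$$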
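The matrix identity (5.6) can be checked numerically. The following is a minimal sketch, assuming matrices are stored as nested tuples ((a, b), (c, d)); the helper names `mat_mul`, `conjugate_T` and `formula_5_6` are illustrative and not taken from the text.

```python
def mat_mul(X, Y):
    """Multiply two 2x2 integer matrices given as ((a, b), (c, d))."""
    (a, b), (c, d) = X
    (e, f), (g, h) = Y
    return ((a * e + b * g, a * f + b * h),
            (c * e + d * g, c * f + d * h))


def conjugate_T(A):
    """Return A T A^{-1} for a representative A in SL(2, Z), T = ((0, 1), (-1, 0))."""
    (a, b), (c, d) = A
    assert a * d - b * c == 1, "A must have determinant 1"
    T = ((0, 1), (-1, 0))
    A_inv = ((d, -b), (-c, a))
    return mat_mul(mat_mul(A, T), A_inv)


def formula_5_6(A):
    """Right-hand side of (5.6), written out entry by entry."""
    (a, b), (c, d) = A
    return ((-(b * d + a * c), a * a + b * b),
            (-(c * c + d * d), b * d + a * c))


A = ((2, 1), (5, 3))          # determinant 2*3 - 1*5 = 1
print(conjugate_T(A))         # ((-13, 5), (-34, 13))
print(formula_5_6(A))         # ((-13, 5), (-34, 13))
```

The upper right entry 5 = 2² + 1² is the sum of squares of the first row of A, which is the entry used in the proof of the two-square theorem.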

We remark that from $B = ATA^{-1}$ it follows $T = A^{-1}BA$. We are now in the position to prove Fermat's Two-Square Theorem.

Proof of Theorem 5.6. Suppose that $n = a^2 + b^2$ with gcd(a, b) = 1. Then also gcd(b, n) = 1, that is, there exists a c ∈ ℤ with bc ≡ 1 mod n. From $a^2 + b^2 \equiv 0 \bmod n$ it follows, after multiplication with $c^2$, that $(ac)^2 \equiv -1 \bmod n$. Therefore −1 is a quadratic residue modulo n.

Conversely suppose that −1 is a quadratic residue modulo n. Then there exists an x ∈ ℤ with $x^2 \equiv -1 \bmod n$, or $x^2 = -1 + mn$ for some m ∈ ℤ. This implies $-x^2 + mn = 1$. Therefore $B = \pm\begin{pmatrix} x & n \\ -m & -x \end{pmatrix}$ ∈ PSL(2, ℤ). Since tr(B) = 0, B is conjugate in PSL(2, ℤ) to T by Theorem 5.8. Hence B is of the above form (5.6), and because n > 0 and $a^2 + b^2 > 0$ we get $n = a^2 + b^2$. Further gcd(a, b) = 1 because ad − bc = 1 for $A = \pm\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ ∈ PSL(2, ℤ).

There are several alternative ways to approach the two-square theorem. In one approach the result is considered in terms of primes and the prime decomposition. Recall that in Chapter 3 we showed that if p ≥ 3 is a prime number then −1 is a quadratic residue modulo p if and only if p ≡ 1 mod 4. This leads to the following:

Lemma 5.9. Let p be a prime number. Then $p = a^2 + b^2$, a, b ∈ ℤ, if and only if p = 2 or p ≡ 1 mod 4.

Proof. We only need to know in addition to the above remark that $2 = 1^2 + 1^2$.

We now want to see which natural numbers n are sums of two squares.

Lemma 5.10. Let n ∈ ℕ and $n = a^2 + b^2$ for a, b ∈ ℤ. Let p be a prime number with p ∣ n and p ≡ 3 mod 4. Then p ∣ a and p ∣ b.

Proof. If p ∣ a then also p ∣ b because p ∣ n. Hence, assume that p ∤ a and p ∤ b. Then there exists a c ∈ ℤ with bc ≡ 1 mod p, and it follows $(ac)^2 \equiv -1 \bmod p$, which contradicts the remark preceding Lemma 5.9 because p ≡ 3 mod 4. Hence p ∣ a and p ∣ b.
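The equivalence established in the proof of Theorem 5.6 (n is a sum of two coprime squares exactly when −1 is a quadratic residue modulo n) can be tested by brute force for small n. This is only an illustrative sketch; the function names and the search bound 500 are choices made here, not part of the text.

```python
from math import gcd, isqrt


def has_coprime_two_square_rep(n):
    """True if n = a^2 + b^2 for some integers a, b with gcd(a, b) = 1."""
    for a in range(isqrt(n) + 1):
        b2 = n - a * a
        b = isqrt(b2)
        if b * b == b2 and gcd(a, b) == 1:
            return True
    return False


def minus_one_is_square_mod(n):
    """True if x^2 ≡ -1 (mod n) has a solution x."""
    return any((x * x + 1) % n == 0 for x in range(n))


# Collect every n in the range for which the two conditions disagree.
mismatches = [n for n in range(2, 500)
              if has_coprime_two_square_rep(n) != minus_one_is_square_mod(n)]
print(mismatches)  # []
```

An empty list means that no counterexample occurs below the chosen bound; it illustrates the theorem but of course does not replace the proof.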

The general equation $(a^2 + b^2)(c^2 + d^2) = (ac - bd)^2 + (bc + ad)^2$ for a, b, c, d ∈ ℤ now gives the following result.

Theorem 5.11. Let n ∈ ℕ, n ≥ 2.
(1) There exist a, b ∈ ℤ with gcd(a, b) = 1 and $n = a^2 + b^2$ if and only if
$$n = 2^{\epsilon} p_1^{\beta_1} p_2^{\beta_2} \cdots p_k^{\beta_k}$$
with ϵ = 0 or 1 and $p_i \equiv 1 \bmod 4$ for i = 1, 2, …, k.
(2) There exist a, b ∈ ℤ with $n = a^2 + b^2$ if and only if
$$n = 2^{\alpha} p_1^{\beta_1} p_2^{\beta_2} \cdots p_k^{\beta_k} q_1^{\gamma_1} q_2^{\gamma_2} \cdots q_r^{\gamma_r}$$
with α ≥ 0, $p_i \equiv 1 \bmod 4$ for i = 1, 2, …, k and $q_j \equiv 3 \bmod 4$, $\gamma_j$ even for j = 1, 2, …, r.

Proof. (1) Here we only need to know that, in addition, −1 is not a quadratic residue modulo $2^{\alpha}$ with α ≥ 2. But this is always the case because −1 is not a quadratic residue modulo 4.
(2) This follows from (1), Lemma 5.10, the general equation $(a^2 + b^2)(c^2 + d^2) = (ac - bd)^2 + (bc + ad)^2$ and the trivial equation $a^2 = a^2 + 0^2$ for a ∈ ℤ, which we need by Lemma 5.10 for the case that a prime factor is congruent to 3 modulo 4.

Remark 5.12. Let n ∈ ℕ. One may ask if n is always a sum of three squares over ℤ. This is not the case in general. Numbers of the form n = 8m + 7, $m \in \mathbb{N}_0$, for instance are not a sum of three squares because there do not exist a, b, c ∈ ℤ with $a^2 + b^2 + c^2 \equiv 7 \bmod 8$. However we do have the Theorem of J. L. Lagrange (1736–1813) that says any natural number can be written as a sum of four squares.

Theorem 5.13. Each natural number n can be written as a sum $n = a^2 + b^2 + c^2 + d^2$ with a, b, c, d ∈ ℤ.

A proof can be found in [10].
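Part (2) of Theorem 5.11 turns the question "is n a sum of two squares?" into a condition on the prime factorization. The sketch below compares that condition with a direct search; the trial-division factorization and the function names are assumptions made for this illustration.

```python
from math import isqrt


def is_sum_of_two_squares(n):
    """Brute force: True if n = a^2 + b^2 for some integers a, b."""
    for a in range(isqrt(n) + 1):
        b = isqrt(n - a * a)
        if a * a + b * b == n:
            return True
    return False


def satisfies_criterion(n):
    """Criterion of Theorem 5.11 (2): every prime q ≡ 3 mod 4 dividing n
    occurs with an even exponent."""
    p = 2
    while p * p <= n:
        exponent = 0
        while n % p == 0:
            n //= p
            exponent += 1
        if p % 4 == 3 and exponent % 2 == 1:
            return False
        p += 1
    # What remains is 1 or a single prime factor with exponent 1.
    return n % 4 != 3


print(all(is_sum_of_two_squares(n) == satisfies_criterion(n)
          for n in range(2, 2000)))  # True
```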


Exercises

1. Find a rule/formula to get infinitely many Pythagorean triples $a^2 + b^2 = c^2$ with
(a) c = b + 1;
(b) |a − b| = 1.
Prove your rule/formula.
(Hint for (b):

    i    n_i    m_i    a = n_i^2 − m_i^2    b = 2 m_i n_i    c = m_i^2 + n_i^2
    1      2      1            3                   4                  5
    2      5      2           21                  20                 29
    3     12      5          119                 120                169
    4     29     12          697                 696                985
    ⋮      ⋮      ⋮            ⋮                   ⋮                  ⋮

Hence, show with induction over k, that $n_k^2 - m_k^2 - 2m_k n_k = (-1)^k$ with $n_k = 2n_{k-1} + n_{k-2}$, $m_k = 2m_{k-1} + m_{k-2}$ and $n_j = m_{j+1}$, $m_j = n_{j-1}$.)

2. Let a ∈ ℕ and d ∈ ℕ such that $d \mid a^2$ and $\frac{a^2}{d} - d$ is an even natural number. Let
$$b = \frac{1}{2}\left(\frac{a^2}{d} - d\right) \quad\text{and}\quad c = b + d.$$
(a) Prove that (a, b; c) is a Pythagorean triple.
(b) Give an example for (a, b; c).

3. Let (x, y; z) be a primitive Pythagorean triple. Show $\gcd\left(\frac{z+y}{2}, \frac{z-y}{2}\right) = 1$.

4. Prove the following statement: Let p be a prime number. Then there exist x, y ∈ ℤ with $x^2 + y^2 \equiv -1 \bmod p$.

5. Show that the number of representations of m > 1 as a sum $m = a^2 + b^2$ with gcd(a, b) = 1 is equal to the number of solutions of $x^2 \equiv -1 \bmod m$.

6. (a) Let p, q, p > q be two odd prime numbers and n = pq. Show that
$$n = \left(\frac{p+q}{2}\right)^2 - \left(\frac{p-q}{2}\right)^2.$$
(b) The following Fermat's factorization method is based on (a): Let n be an odd number, which is a product of two different prime numbers. We look for x, y ∈ ℤ with $n = x^2 - y^2 = (x + y)(x - y)$ or, equivalently, $x^2 - n = y^2$. We search for the smallest k ∈ ℕ with $k^2 \geq n$. Consider the sequence
$$k^2 - n, \quad (k + 1)^2 - n, \quad (k + 2)^2 - n, \quad (k + 3)^2 - n, \quad \ldots$$
until we find a value m that is a square number, say $m = (k + i)^2 - n$. Then $n = (k + i)^2 - m = (k + i + \sqrt{m})(k + i - \sqrt{m})$. The sequence terminates, because at least for
$$\left(\frac{n+1}{2}\right)^2 - n = \left(\frac{n-1}{2}\right)^2$$
we found a description of n via the trivial factorization n ⋅ 1. If we arrive at this situation without getting a difference of squares before, then n has only the factorization n ⋅ 1 and hence is a prime number. Use Fermat's factorization method to find the factorization of n = 119 143.

7. Show the following: Let n be a composite positive integer. Let x, y ∈ ℤ with $x^2 \equiv y^2 \bmod n$ and x not congruent to ±y modulo n. Then gcd(x + y, n) and gcd(x − y, n) are proper divisors of n. Especially: Let n = pq with p, q two distinct prime numbers. Let x, y ∈ ℤ with $x^2 \equiv y^2 \bmod n$ and x not congruent to ±y modulo n. Then |gcd(x + y, n)| and |gcd(x − y, n)| are the prime factors of n. Apply this to
(a) n = 7 529, x = 227 and y = 210;
(b) n = 253, x = 17 and y = 6.

8. Show that, given integers $x_0$ and n with $x_0^2 \equiv -1 \bmod n$, there exist integers y and b with gcd(y, b) = 1 and $0 < b \leq \sqrt{n}$ such that
$$\left| -\frac{x_0}{n} - \frac{y}{b} \right| < \frac{1}{b\sqrt{n}}.$$
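The search described in Exercise 6 (b) can be sketched in a few lines. This is an illustration under the assumption that n is an odd integer greater than 1; the function name `fermat_factor` is chosen here and does not come from the text.

```python
from math import isqrt


def fermat_factor(n):
    """Fermat's method for odd n > 1: return (x - y, x + y) with n = x^2 - y^2."""
    if n % 2 == 0 or n < 3:
        raise ValueError("n must be an odd integer greater than 1")
    k = isqrt(n)
    if k * k < n:
        k += 1                      # smallest k with k^2 >= n
    while True:
        m = k * k - n               # next term of the sequence k^2 - n, (k+1)^2 - n, ...
        y = isqrt(m)
        if y * y == m:              # m is a square, so n = (k - y)(k + y)
            return (k - y, k + y)
        k += 1


print(fermat_factor(253))  # (11, 23)
```

For a prime number the loop only stops at the trivial factorization, for example `fermat_factor(7)` returns `(1, 7)`.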

6 Polynomials and unique factorization

6.1 Polynomials over a ring

A large portion of secondary school mathematics is concerned with the solutions of polynomial equations over the rationals, that is, with finding solutions of equations of the form $a_n x^n + a_{n-1} x^{n-1} + \cdots + a_0 = 0$, where $a_n, a_{n-1}, \ldots, a_0$ are rational numbers. In this chapter we introduce polynomials over a general integral domain R and consider them as algebraic objects. Given an integral domain R we build the ring of polynomials R[x] over R, and we show that, if R is a field, then polynomials share many properties with the integers, in particular unique factorization into primes.

Before we start we repeat some algebraic definitions that were introduced in Chapter 1. A ring is a set R ≠ ∅ equipped with two binary operations + ∶ R × R → R and ⋅ ∶ R × R → R satisfying the following three sets of properties for all a, b, c ∈ R:
– R is a commutative group under addition, that is,
  (1) (a + b) + c = a + (b + c).
  (2) a + b = b + a.
  (3) There is a zero element 0 ∈ R such that a + 0 = a for all a ∈ R.
  (4) For each a ∈ R there exists −a ∈ R such that a + (−a) = 0. We call −a the negative element of a.
– R is a semigroup under multiplication, that is, (a ⋅ b) ⋅ c = a ⋅ (b ⋅ c).
– The multiplication is distributive with respect to the addition, that is, a ⋅ (b + c) = (a ⋅ b) + (a ⋅ c) and (b + c) ⋅ a = (b ⋅ a) + (c ⋅ a).

A commutative ring with unity 1 is a ring R in which the semigroup under multiplication is also commutative and has the unity element 1. The commutative ring with unity R is an integral domain if it does not have zero divisors, that is, there is no a ∈ R, a ≠ 0, such that ab = 0 for some b ∈ R, b ≠ 0. Hence, if a, b ∈ R, R an integral domain, with a ≠ 0 ≠ b then ab ≠ 0. Throughout this chapter we will only consider integral domains. Finally a commutative ring with unity K is a field if each nonzero element has a multiplicative inverse. The integers ℤ and the modular integers $\mathbb{Z}_p$, with p a prime number, are integral domains, and it is easy to prove that any field, for example the rational numbers ℚ, is an integral domain.

DOI 10.1515/9783110516142-006

For the remainder of this chapter we let R be an integral domain. We now build the ring of polynomials R[x] over R. We write ℕ0 = ℕ ∪ {0}. We provide the set

R̃ = {f ∶ ℕ0 → R ∣ f(i) ≠ 0 for only finitely many i}

with an addition and a multiplication as follows:

$$(f + g)(m) := f(m) + g(m)$$

and

$$(f \cdot g)(m) := \sum_{i+j=m} f(i)g(j) = \sum_{i=0}^{m} f(i) \cdot g(m - i).$$

R̃ itself may be considered as the set of all sequences $(a_0, a_1, a_2, \ldots)$ of elements of R with only finitely many $a_i \neq 0$. Since all but finitely many of the $a_i$ are 0, there exists an m ∈ ℕ0 such that $a_i = 0$ for all i > m; in other words, $a_n$ can be nonzero only for n ≤ m. R̃ together with the given addition and multiplication is a commutative ring with zero element

(0, 0, 0, …) (additive identity),

where 0 is the additive identity of R, and identity

(1, 0, 0, …) (multiplicative identity),

where 1 is the multiplicative identity of R. The map

φ ∶ R → R̃, a ↦ (a, 0, 0, …)

is an embedding, that is, it is injective and respects the addition and the multiplication:

φ(a + b) = (a + b, 0, 0, …) = (a, 0, 0, …) + (b, 0, 0, …) = φ(a) + φ(b),
φ(a ⋅ b) = (a ⋅ b, 0, 0, …) = (a, 0, 0, …) ⋅ (b, 0, 0, …) = φ(a) ⋅ φ(b).
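The construction of R̃ translates directly into code. The following is a minimal sketch for R = ℤ, assuming that a finitely supported sequence is stored as a tuple without trailing zeros; the names `trim`, `add`, `mul` and `phi` are hypothetical helpers introduced for this illustration.

```python
def trim(seq):
    """Drop trailing zeros so that every element of R~ has exactly one tuple."""
    seq = list(seq)
    while seq and seq[-1] == 0:
        seq.pop()
    return tuple(seq)


def add(f, g):
    """(f + g)(m) = f(m) + g(m)."""
    n = max(len(f), len(g))
    f = tuple(f) + (0,) * (n - len(f))
    g = tuple(g) + (0,) * (n - len(g))
    return trim(a + b for a, b in zip(f, g))


def mul(f, g):
    """(f * g)(m) = sum of f(i) * g(j) over all pairs with i + j = m."""
    if not f or not g:
        return ()
    result = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            result[i + j] += a * b
    return trim(result)


def phi(a):
    """The embedding a -> (a, 0, 0, ...)."""
    return trim((a,))


print(mul((1, 1), (-1, 1)))                 # (-1, 0, 1)
print(mul(phi(2), phi(3)) == phi(2 * 3))    # True
```

The first output corresponds to the product of the sequences (1, 1, 0, …) and (−1, 1, 0, …), and the second line checks one instance of φ(a ⋅ b) = φ(a) ⋅ φ(b).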


Hence, we now identify R with φ(R) ⊂ R̃. Now we introduce the usual notation for the elements of R̃. We let x = (0, 1, 0, 0, …), and define

$$x^0 = (1, 0, 0, \ldots) = 1 \quad\text{and}\quad x^{i+1} = x \cdot x^i \quad\text{for } i \in \mathbb{N}_0.$$

From the definition of the multiplication we see that $x^i = (0, \ldots, 0, 1, 0, \ldots, 0)$, where the only nonzero entry is a 1 on the (i + 1)-th spot, and from the identification of R with φ(R) we get $(a_i, 0, \ldots, 0, \ldots) \cdot x^i = a_i x^i$. Therefore we may write

$$(a_0, a_1, a_2, \ldots) \in \tilde{R} \quad\text{as}\quad \sum_{i \geq 0} a_i x^i$$

(recall that $a_i \neq 0$ only for finitely many i). From the definition via sequences we now automatically have that the above representation is unique, that is, if

$$\sum_{i \geq 0} a_i x^i = \sum_{i \geq 0} b_i x^i$$

then $a_i = b_i$ for all i = 0, 1, 2, … . Since $a_i \neq 0$ only for finitely many i, we get

$$\sum_{i \geq 0} a_i x^i = \sum_{i=0}^{n} a_i x^i \quad\text{for some } n \in \mathbb{N}_0.$$

Especially $a_i = 0$ for i > n. We now call $\sum_{i=0}^{n} a_i x^i$ a polynomial over R and write

$$f(x) := a_0 + a_1 x + \cdots + a_n x^n = \sum_{i=0}^{n} a_i x^i,$$

and we call x an indeterminate over R. We denote the set of polynomials f(x) over R as R[x]. We summarize what we have done.

Theorem 6.1. Let R be an integral domain. Then R[x] together with the addition and multiplication previously defined forms a ring with identity. We call R[x] the ring of polynomials over R with indeterminate x. Further R[x] is an integral domain.

Proof. The fact that R[x] satisfies the properties that make it a commutative ring with unity is a straightforward calculation. We leave it for the exercises. Here we show that it is an integral domain. Let f(x), g(x) ∈ R[x] with f(x) ≠ 0 ≠ g(x). Then

$$f(x) = a_0 + a_1 x + \cdots + a_n x^n, \quad n \in \mathbb{N}_0, \quad a_n \neq 0,$$

and

$$g(x) = b_0 + b_1 x + \cdots + b_m x^m, \quad m \in \mathbb{N}_0, \quad b_m \neq 0.$$

Then $f(x)g(x) = a_0 b_0 + (a_0 b_1 + a_1 b_0)x + \cdots + a_n b_m x^{n+m}$, and $a_n b_m \neq 0$ since R is an integral domain.

Definition 6.2. (1) Let $f(x) = \sum_{i=0}^{n} a_i x^i \neq 0$, $a_n \neq 0$, n ∈ ℕ0, then we call n the degree of f(x), written as n = deg(f(x)). (2) If f(x) = 0, the zero polynomial, then we say that f(x) has degree −∞.

We recall the usual calculations: −∞ < n for all n ∈ ℤ, −∞ + n = −∞ for all n ∈ ℤ, and −∞ + (−∞) = −∞. The next lemma gives the common properties of degree.

Lemma 6.3. Let f(x), g(x) ∈ R[x]. Then
(1) deg(f(x) + g(x)) ≤ max(deg(f(x)), deg(g(x))),
(2) deg(f(x) ⋅ g(x)) = deg(f(x)) + deg(g(x)).

Proof. Lemma 6.3 follows directly from the definition of the addition and multiplication.

Definition 6.4. Let $f(x) = a_0 + a_1 x + \cdots + a_n x^n$ with $a_n \neq 0$, that is, n = deg(f(x)). We call $a_n$ the leading coefficient of f(x), and f(x) monic if the leading coefficient is 1.

6.2 Divisibility in rings

We now want to show that the ring R[x] of polynomials has a similar behavior with respect to divisibility as the ring ℤ of the integers, especially if R is a field. We first talk about general divisibility in rings.

Recall that a subring I of R is called an ideal of R if, in addition, ra ∈ I for any a ∈ I and r ∈ R.

Example 6.5. Let a ∈ R. The set (a) = {ra ∣ r ∈ R} is an ideal, the principal ideal generated by a ∈ R.

We now define divisibility in any integral domain R.

Definition 6.6. (a) Let a, b ∈ R. We call a a divisor of b and b a multiple of a (or divisible by a) if there exists a q ∈ R with b = qa. We denote this by a ∣ b. If a is not a divisor of b, then we write also a ∤ b.
(b) An element ϵ ∈ R is called a unit if there is an η ∈ R with ϵη = 1. This is equivalent to ϵ ∣ 1. We write also $\eta = \epsilon^{-1} = \frac{1}{\epsilon}$ (the last is well defined because R is commutative).
(c) Two elements a, b ∈ R are called associates if there exists a unit ϵ ∈ R with b = ϵa. We write then a ∼ b.
(d) a ∈ R is called a proper divisor of b, if a ∣ b but a ≁ b.
(e) An element p ∈ R is called a prime element of R if the following holds: (i) p ≠ 0 and p is not a unit. (ii) p = ab ⇒ either a or b is a unit. An element a ∈ R is called composite, if (i) is satisfied but not (ii).

Remark 6.7. We note that our definition for prime elements is not the usual one in the literature but in the case of R = K[x], K a field, our definition is equivalent to the usual one.

Examples 6.8. The units of ℤ are ±1, and the prime elements of ℤ are ±p, where p ∈ ℕ is a prime number.

Theorem 6.9. The following holds in any integral domain R:
(1) "∼" is an equivalence relation.
(2) The set of the units forms a group with respect to the multiplication in R, the group of units of R.
(3) a ∣ b ⟺ (b) ⊂ (a).
(4) $a \sim b \overset{(i)}{\iff} (a \mid b \text{ and } b \mid a) \overset{(ii)}{\iff} (a) = (b)$.

Proof. (1): "∼" is certainly reflexive. "∼" is symmetric because if a ∼ b, that is, a = ϵb for some unit ϵ ∈ R, then $b = \epsilon^{-1} a$, that is, b ∼ a. "∼" is transitive because if a ∼ b and b ∼ c, that is, $a = \epsilon_1 b$ and $b = \epsilon_2 c$ for units $\epsilon_1, \epsilon_2 \in R$, then a ∼ c because $a = \epsilon_1 \epsilon_2 c$, and $\epsilon_1 \epsilon_2$ is a unit of R since from $\epsilon_1 \eta_1 = \epsilon_2 \eta_2 = 1$ we get $\epsilon_1 \epsilon_2 \eta_1 \eta_2 = 1$. (2) is obvious because the units are just the elements of R which have a multiplicative inverse in R. For (3)

a ∣ b ⟺ b = ra for some r ∈ R ⟺ b ∈ (a) ⟺ (b) ⊂ (a).

Finally for (4), because of (3) it is enough to show that (i) holds but this is the case. From a ∼ b we have a ∣ b. Since "∼" is symmetric we also have b ∼ a, hence also b ∣ a.

Lemma 6.10. The following divisibility properties hold, analogously to ℤ, for a, b, c ∈ R for any integral domain R:
(1) a ∣ b and b ∣ c ⇒ a ∣ c.
(2) c ∣ a and c ∣ b ⇒ c ∣ ($r_1 a + r_2 b$) for all $r_1, r_2 \in R$.
(3) ϵ ∣ a and ϵa ∣ a for all units ϵ ∈ R.
(4) 0 ∣ a ⟺ a = 0.
(5) a ∣ 0 for all a ∈ R.

6.3 The ring of polynomials over a field K

We now consider K[x] where K is a field and prove that there is unique factorization into primes analogous to the fundamental theorem of arithmetic in the integers ℤ. The proof is almost identical to the proof in ℤ once we define a division algorithm, gcd's and a Euclidean algorithm for K[x]. First we characterize the primes and units in K[x].

Theorem 6.11. Let K be a field and K[x] be the ring of polynomials in x over K. Then the following hold:
(1) The units of K[x] are exactly the elements a ∈ K with a ≠ 0.
(2) The prime elements of K[x] are exactly those polynomials f(x) ∈ K[x] with deg(f(x)) ≥ 1 and the property: If f(x) = g(x)h(x) with g(x), h(x) ∈ K[x] then either deg(g(x)) = 0 or deg(h(x)) = 0, that is, either g(x) ∈ K ⧵ {0} or h(x) ∈ K ⧵ {0}.

Proof. Let f(x) ∈ K[x] be a unit. Then there exists a g(x) ∈ K[x] with f(x)g(x) = 1. From deg(f(x)g(x)) = deg(f(x)) + deg(g(x)) and deg(1) = 0 we get deg(f(x)) = 0, that is, f(x) ∈ K ⧵ {0}. On the other side, a ∈ K ⧵ {0} is certainly a unit in K[x] because there is a b ∈ K ⧵ {0} with ab = 1.

Let f(x) be a prime element in K[x]. By definition then the following holds: f(x) ≠ 0, deg(f(x)) ≥ 1 by (1), and from f(x) = g(x)h(x) with g(x) or h(x) a unit we get g(x) ∈ K ⧵ {0} or h(x) ∈ K ⧵ {0}. On the other side, if f(x) ∈ K[x] such that deg(f(x)) ≥ 1 and the given property holds, then certainly f(x) is a prime element of K[x].

Remark 6.12. We call the prime elements of K[x], K a field, irreducible polynomials over K. If f(x) ∈ K[x] with deg(f(x)) ≥ 1, and if there exist g(x), h(x) in K[x] with deg(g(x)) ≥ 1, deg(h(x)) ≥ 1 and f(x) = g(x)h(x), then we call f(x) a reducible polynomial over K.


Examples 6.13. (1) For each a ∈ K the linear polynomial x − a is irreducible.
(2) The polynomial $x^2 + 1$ is irreducible over ℚ. This follows from the following. Let $x^2 + 1 = (x + a)(x + b)$, a, b ∈ ℚ. Then $x^2 + 1 = x^2 + (a + b)x + ab$, that is, a = −b and $ab = -a^2 = 1$. This gives a contradiction because there is no a ∈ ℚ with $a^2 = -1$.
(3) The polynomial $x^2 - 1$ is reducible over ℚ because $x^2 - 1 = (x - 1)(x + 1)$.

6.3.1 The division algorithm for polynomials

For f(x), g(x) ∈ K[x], K a field, we may write g(x) ∣ f(x) if there is a q(x) ∈ K[x] with f(x) = q(x)g(x). Since K[x] is an integral domain we have the following divisibility properties.

Lemma 6.14. In the ring of polynomials K[x] over a field K:
(1) g(x) ∣ f(x) and f(x) ∣ g(x) ⇒ g(x) = af(x) for some a ∈ K.
(2) g(x) ∣ $f_1(x)$ and g(x) ∣ $f_2(x)$ ⇒ g(x) ∣ ($k_1(x)f_1(x) + k_2(x)f_2(x)$) for all $k_1(x), k_2(x) \in K[x]$.
(3) g(x) ∣ f(x) and f(x) ∣ h(x) ⇒ g(x) ∣ h(x).
(4) af(x) ∣ f(x) for all a ∈ K ⧵ {0}.
(5) If f(x) ∈ K[x] is irreducible over K and a ∈ K ⧵ {0}, then also af(x) is irreducible over K.

The main purpose of this chapter is to present the fundamental theorem for polynomials, that is, each nontrivial polynomial f(x) ∈ K[x], K a field, can be written, up to the order and field elements, uniquely as a product of irreducible polynomials. This is an analogous result to the fundamental theorem of arithmetic. As in the first proof for this theorem we could directly use the second inductive principle. However our emphasis here is to present the Euclidean algorithm for polynomials and, as an application, Euclid's lemma. This then automatically leads to the proof of the fundamental theorem for polynomials. Crucial to this development is the division algorithm for K[x] which is given in the next theorem. We state it for integral domains but the result specializes to fields.

Theorem 6.15 (Division algorithm). Let R be, as above, an integral domain. Let f(x), g(x) ∈ R[x] with g(x) ≠ 0 and g(x) monic. Then there exist q(x), r(x) ∈ R[x] with f(x) = q(x)g(x) + r(x), where deg(r(x)) < deg(g(x)).

This representation is unique, that is, q(x) and r(x) are uniquely determined by f(x) and g(x).

Proof. We use the second inductive principle, that is, course of values induction. If f(x) = 0 just choose q(x) = r(x) = 0. Now, let f(x) ≠ 0. If $f(x) \in R^* = R \setminus \{0\}$ and deg(g(x)) ≥ 1, then just choose q(x) = 0 and r(x) = f(x). If $f(x) \in R^*$ and $g(x) \in R^*$, that is, g(x) = 1, because g(x) is monic, then just choose q(x) = f(x) and r(x) = 0. Hence, Theorem 6.15 is proved for deg(f(x)) ≤ 0, also certainly the uniqueness statement.

Now, let n > 0 and Theorem 6.15 be proved for all f(x) ∈ R[x] with deg(f(x)) < n. Now, given

$$f(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0 \quad\text{with } a_n \neq 0,$$

and $g(x) = x^m + b_{m-1} x^{m-1} + \cdots + b_1 x + b_0$. If m > n then just choose q(x) = 0 and r(x) = f(x). Now, finally, let 1 ≤ m ≤ n. We define $h(x) = f(x) - a_n x^{n-m} g(x)$. We have deg(h(x)) < n, hence, by induction assumption there are $q_1(x)$ and r(x) with $h(x) = q_1(x)g(x) + r(x)$ and deg(r(x)) < deg(g(x)). Then

$$f(x) = h(x) + a_n x^{n-m} g(x) = (a_n x^{n-m} + q_1(x))g(x) + r(x) = q(x)g(x) + r(x)$$

with $q(x) = a_n x^{n-m} + q_1(x)$, which proves the existence.

We now show the uniqueness. Let

$$f(x) = q_1(x)g(x) + r_1(x) = q_2(x)g(x) + r_2(x),$$

with $\deg(r_1(x)) < \deg(g(x))$, $\deg(r_2(x)) < \deg(g(x))$. Assume $r_1(x) \neq r_2(x)$. Let $\deg(r_1(x)) \geq \deg(r_2(x))$. We get $(q_2(x) - q_1(x))g(x) = r_1(x) - r_2(x)$. This contradicts Lemma 6.3 because $\deg(r_1(x) - r_2(x)) < \deg(g(x))$, and $q_2(x) - q_1(x) \neq 0$ if $r_1(x) \neq r_2(x)$. Therefore $r_1(x) = r_2(x)$ and further $q_1(x) = q_2(x)$ because R[x] is an integral domain.
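The induction step of the existence proof, replacing f(x) by $h(x) = f(x) - a_n x^{n-m} g(x)$, can be unrolled into a loop. The sketch below assumes integer coefficients stored in a list $[a_0, a_1, \ldots, a_n]$ (lowest degree first); the function name `divmod_monic` and this representation are choices made for illustration only.

```python
def divmod_monic(f, g):
    """Divide f by a monic g; coefficients are listed from a_0 upward.

    Returns (q, r) with f = q*g + r, where r has fewer than len(g)
    coefficients, that is, deg r < deg g. Leading zeros in r are kept.
    """
    if not g or g[-1] != 1:
        raise ValueError("g must be monic")
    r = list(f)
    m = len(g) - 1                       # deg g
    q = [0] * max(len(f) - m, 0)
    # Cancel the current leading term, mirroring h = f - a_n x^(n-m) g.
    for shift in range(len(f) - 1 - m, -1, -1):
        lead = r[shift + m]
        q[shift] = lead
        for j, b in enumerate(g):
            r[shift + j] -= lead * b
    return q, r[:m]


# 2x^3 + x^2 - 5x + 3 divided by x^2 + x + 1
print(divmod_monic([3, -5, 1, 2], [1, 1, 1]))  # ([-1, 2], [4, -6])
```

The output encodes q(x) = 2x − 1 and r(x) = −6x + 4. Because g(x) is monic, no division in R is ever needed, which is why the statement holds over an integral domain and not only over a field.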


Example 6.16. Given $f(x) = 2x^3 + x^2 - 5x + 3$ and $g(x) = x^2 + x + 1$. We use the following scheme:

    (2x^3 +  x^2 − 5x + 3) : (x^2 + x + 1) = 2x − 1 + (−6x + 4)/(x^2 + x + 1)
    −2x^3 − 2x^2 − 2x
           −x^2 − 7x + 3
            x^2 +  x + 1
                 −6x + 4

Hence: q(x) = 2x − 1, r(x) = −6x + 4 and $2x^3 + x^2 - 5x + 3 = (2x - 1)(x^2 + x + 1) + (-6x + 4)$.

Since any field K is an integral domain this theorem applies directly to K[x] where K is a field.

Corollary 6.17. Let K be a field and f(x), g(x) ∈ K[x] with g(x) ≠ 0. Then there exist q(x), r(x) ∈ K[x] with f(x) = q(x)g(x) + r(x) where deg(r(x)) < deg(g(x)). This representation is unique, that is, q(x) and r(x) are uniquely determined by f(x) and g(x).

Proof. Let $g(x) = a_m x^m + \cdots + a_1 x + a_0$, m ≥ 0 and $a_m \neq 0$. Then $\frac{1}{a_m} g(x)$ is monic, and hence there are unique $q_1(x), r(x) \in K[x]$ with $f(x) = q_1(x)\left(\frac{1}{a_m} g(x)\right) + r(x)$ and deg(r(x)) < deg(g(x)). Now define $q(x) = \frac{1}{a_m} q_1(x)$.

Examples 6.18. (1) Let $f(x) = 3x^4 - 6x^2 + 8x - 6$ and $g(x) = 2x^2 + 4$. Then $3x^4 - 6x^2 + 8x - 6 = \left(\frac{3}{2}x^2 - 6\right)(2x^2 + 4) + 8x + 18$ using the above scheme.
(2) Let $f(x) = 2x^5 + 2x^4 + 6x^3 + 10x^2 + 4x$ and $g(x) = x^2 + x$. Then $2x^5 + 2x^4 + 6x^3 + 10x^2 + 4x = (2x^3 + 6x + 4)(x^2 + x)$ using the above scheme.

6.3.2 Zeros of polynomials

As we mentioned earlier one of the central problems of elementary algebra is to determine the solutions of polynomial equations over the rationals, that is, finding solutions of equations of the form f(x) = 0 where f(x) ∈ ℚ[x]. Before we continue to the fundamental theorem of polynomials over a general field K we consider zeros of polynomials over a general integral domain. Suppose that R is an integral domain and f(x) ∈ R[x]. If we substitute a ∈ R in f(x) for the indeterminate x we obtain a mapping

f ∶ R → R, a ↦ f(a),

that is, we evaluate f(x) at a ∈ R. We call such a map from R to R a polynomial map. This is the manner in which polynomials were examined in elementary algebra and calculus.

For a ∈ R we get

f(x) = g(x) + h(x) ⇒ f(a) = g(a) + h(a),
f(x) = g(x) ⋅ h(x) ⇒ f(a) = g(a) ⋅ h(a),

for f(x), g(x), h(x) ∈ R[x].

Definition 6.19. Let f(x) ∈ R[x] and a ∈ R. If f(a) = 0 then we call the element a a zero of f(x).

We first show that if a is a zero of f(x) then (x − a) ∣ f(x). From this we obtain that a polynomial of degree n over an integral domain can have at most n different zeros.

Theorem 6.20. Let a ∈ R be a zero of f(x) ∈ R[x]. Then f(x) = q(x)(x − a) for some q(x) ∈ R[x].

Proof. By Theorem 6.15 there are q(x), r(x) ∈ R[x] with f(x) = q(x)(x − a) + r(x) where deg(r(x)) < deg(x − a) = 1, that is, r(x) = r ∈ R. Evaluation of f(x) at a leads to 0 = f(a) = q(a)(a − a) + r, and hence r = 0.

Corollary 6.21. Let $a_1, a_2, \ldots, a_m \in R$ be zeros of f(x) ∈ R[x] with $a_i \neq a_j$ for i ≠ j. Then $f(x) = (x - a_1)(x - a_2) \cdots (x - a_m)q(x)$ for some q(x) ∈ R[x].

Proof. The statement holds for m = 1 by Theorem 6.20. Assume that the corollary holds for m ≥ 1. Let $a_1, a_2, \ldots, a_{m+1}$ be zeros of f(x) with $a_i \neq a_j$ for i ≠ j. Then by the induction assumption

$$f(x) = (x - a_1)(x - a_2) \cdots (x - a_m)q_1(x) \tag{6.1}$$

for some $q_1(x) \in R[x]$. Evaluation at $a_{m+1}$ gives $0 = f(a_{m+1}) = (a_{m+1} - a_1)(a_{m+1} - a_2) \cdots (a_{m+1} - a_m)q_1(a_{m+1})$. Since $a_{m+1} - a_i \neq 0$ for i ≤ m and R is an integral domain we get $q_1(a_{m+1}) = 0$, hence $q_1(x) = (x - a_{m+1})q(x)$, q(x) ∈ R[x], by Theorem 6.20. If we put this in equation (6.1), we get Corollary 6.21.

Corollary 6.22. Let f(x) ∈ R[x] with deg(f(x)) = n ≥ 1. Then f(x) has at most n pairwise different zeros in R.


Proof. Suppose f(x) ∈ R[x] with deg(f(x)) = n ≥ 1 and suppose f(x) has n pairwise different zeros $c_1, c_2, \ldots, c_n$ in R. Then directly from Theorem 6.20 and Corollary 6.21 it follows that

$$f(x) = a(x - c_1)(x - c_2) \cdots (x - c_n) \quad\text{with } a \in R.$$

Let c be any other zero of f(x), then $f(c) = 0 = a(c - c_1)(c - c_2) \cdots (c - c_n)$. Since R is an integral domain and a ≠ 0, one of these terms must be zero and hence $c - c_i = 0$ for some i and hence $c = c_i$.

We now show a very important result concerning zeros of rational polynomials. This is known as the Rational Root Theorem. For this result we need the reduced representation for rational numbers. Let $\frac{r}{s} \in \mathbb{Q}$, r, s ∈ ℤ, s ≠ 0. If gcd(r, s) = d > 1 then $r = dr_1$ and $s = ds_1$ with $\gcd(r_1, s_1) = 1$ and $\frac{r}{s} = \frac{r_1}{s_1}$.

Theorem 6.23 (The Rational Root Theorem). Let $f(x) = a_0 + a_1 x + \cdots + a_n x^n \in \mathbb{Z}[x]$ with $a_n \neq 0$, deg(f(x)) = n ≥ 1. Let $\frac{r}{s} \in \mathbb{Q}$ be a zero of f(x) and gcd(r, s) = 1, where r, s ∈ ℤ. Then $r \mid a_0$ and $s \mid a_n$. If especially $a_n = 1$, that is, f(x) is monic, then s = ±1, that is, $\frac{r}{s} = \pm r \in \mathbb{Z}$ and $r \mid a_0$.

Proof. We evaluate f(x) at $\frac{r}{s}$. Multiplication with $s^n$ gives

$$0 = a_0 s^n + a_1 r s^{n-1} + \cdots + a_{n-1} s r^{n-1} + a_n r^n,$$

so that

$$-a_n r^n = s(a_0 s^{n-1} + \cdots + a_{n-1} r^{n-1}) \quad\text{and}\quad -a_0 s^n = r(a_1 s^{n-1} + \cdots + a_n r^{n-1}).$$

Since gcd(r, s) = 1 we get $r \mid a_0$ and $s \mid a_n$ by Euclid's Lemma.

Examples 6.24. (1) $f(x) = x^2 + x - 6$. The possible rational roots of f(x) are ±1, ±2, ±3 and ±6. Evaluation at these elements gives the rational roots 2 and −3.
(2) $f(x) = x^3 - 4x^2 - 7x + 10$. The possible rational roots of f(x) are ±1, ±2, ±5 and ±10. Evaluation at these numbers gives rational roots 1, −2 and 5.

Corollary 6.25. Let p be a prime number and $f(x) = x^n - p \in \mathbb{Z}[x]$ with n ≥ 2. Then there does not exist an a ∈ ℚ with f(a) = 0.

Proof. Assume a ∈ ℚ is a zero of f(x). Then a ∈ ℤ and a = ±1 or a = ±p by Theorem 6.23. Since p ≥ 2, a = ±1 is not possible. If a = ±p then we get $(\pm p)^n - p = p(\pm(\pm p)^{n-1} - 1) = 0$, which also is not possible because p ≥ 2. Hence, there is no a ∈ ℚ with f(a) = 0.
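The Rational Root Theorem yields a finite search: only fractions r/s with $r \mid a_0$ and $s \mid a_n$ have to be tested. A minimal sketch of that search follows, assuming integer coefficients given as a list $[a_0, \ldots, a_n]$ with $a_n \neq 0$; the helper names are hypothetical.

```python
from fractions import Fraction


def divisors(n):
    """Positive divisors of a nonzero integer n."""
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]


def rational_roots(coeffs):
    """Rational zeros of a_0 + a_1 x + ... + a_n x^n with integer coefficients.

    coeffs = [a_0, ..., a_n] with a_n != 0 is assumed.
    """
    if coeffs[0] == 0:
        # x divides f(x): 0 is a zero; continue with f(x) / x.
        return {Fraction(0)} | rational_roots(coeffs[1:])
    candidates = {Fraction(sign * r, s)
                  for r in divisors(coeffs[0])
                  for s in divisors(coeffs[-1])
                  for sign in (1, -1)}
    # Keep the candidates at which the polynomial evaluates to zero.
    return {c for c in candidates
            if sum(a * c ** i for i, a in enumerate(coeffs)) == 0}


print(sorted(rational_roots([-6, 1, 1])))        # [Fraction(-3, 1), Fraction(2, 1)]
print(sorted(rational_roots([10, -7, -4, 1])))   # [Fraction(-2, 1), Fraction(1, 1), Fraction(5, 1)]
print(rational_roots([-2, 0, 1]))                # set()
```

The first two calls reproduce Examples 6.24, and the last one illustrates Corollary 6.25 for $x^2 - 2$. Exact arithmetic with `Fraction` avoids rounding errors when testing a candidate.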

This implies $f(x) = x^n - p$, n ∈ ℕ, n ≥ 2, p a prime number, is irreducible over ℚ. The division algorithm easily shows that there do not exist g(x), h(x) ∈ ℚ[x] with 1 ≤ deg(g(x)), deg(h(x)) and $x^n - p = g(x)h(x)$. This statement will be extended by the following, known as the Eisenstein criterion after G. Eisenstein (1823–1852).

Theorem 6.26 (Eisenstein criterion). Let $f(x) = a_0 + a_1 x + \cdots + a_n x^n \in \mathbb{Z}[x]$ with $a_n \neq 0$, n ≥ 1. Let p be a prime number with
(i) $p \mid a_i$ for i = 0, 1, …, n − 1.
(ii) $p \nmid a_n$.
(iii) $p^2 \nmid a_0$.
Then f(x) is irreducible over ℚ.

We need some properties before we can prove Theorem 6.26.

Definition 6.27. A polynomial $f(x) = \sum_{i=0}^{n} a_i x^i \in \mathbb{Z}[x]$, $a_n \neq 0$, n ≥ 1, is called primitive, if the greatest common divisor of those coefficients which are nonzero is equal to 1.

Lemma 6.28. (a) If f(x) ∈ ℚ[x], f(x) ≠ 0, then there exists an a ∈ ℚ such that af(x) ∈ ℤ[x] is primitive.
(b) If f(x), g(x) ∈ ℤ[x] with g(x) primitive and f(x) = ag(x) for some a ∈ ℚ, then a ∈ ℤ.
(c) If f(x) ∈ ℤ[x], f(x) ≠ 0, then there exist a b ∈ ℤ and a primitive g(x) ∈ ℤ[x] such that f(x) = bg(x).

Proof. For (a) let $f(x) = \sum_{i=0}^{n} a_i x^i$ with $a_i = \frac{r_i}{s_i}$, $r_i, s_i \in \mathbb{Z}$, $s_i \neq 0$ for i = 0, 1, 2, …, n. Let $s = s_0 s_1 \cdots s_n$. Then sf(x) ∈ ℤ[x], sf(x) ≠ 0. Let d be the greatest common divisor of those coefficients of sf(x) which are nonzero. Let $a = \frac{s}{d}$. Then af(x) ∈ ℤ[x] is primitive.

For (b) let a ≠ 0 (if a = 0 then there is nothing to show). Let $a = \frac{r}{s}$ with r, s ∈ ℤ, gcd(r, s) = 1. Assume that a ∉ ℤ. Then there exists a prime number p with p ∣ s. Since g(x) is primitive, p does not divide all coefficients of g(x). From $f(x) = \frac{r}{s} g(x)$ we get sf(x) = rg(x) with p ∣ s and p ∤ r. Hence, p must divide all coefficients of g(x), which contradicts that g(x) is primitive. Hence, a ∈ ℤ.

The third item (c) follows directly from (a) and (b).

Lemma 6.29 (Gauss's Lemma). Let f(x), g(x) ∈ ℤ[x] be primitive. Then also f(x) ⋅ g(x) is primitive.

Proof. Assume that f(x) ⋅ g(x) is not primitive. Then there exists a prime number p which divides each coefficient of f(x) ⋅ g(x). Then p ∣ f(x) or p ∣ g(x). Assume not. Let


$f(x) = \sum_{i=0}^{n} a_i x^i$, $g(x) = \sum_{j=0}^{m} b_j x^j$. Let $a_r$ be the first coefficient of f(x) which is not divisible by p and, respectively, $b_s$ that of g(x). The coefficient of $x^{r+s}$ of f(x) ⋅ g(x) is

$$a_r b_s + a_{r+1} b_{s-1} + \cdots + a_{r+s} b_0 + a_{r-1} b_{s+1} + \cdots + a_0 b_{r+s}.$$

This sum is divisible by p. Since $p \mid a_i$ for 0 ≤ i < r and $p \mid b_j$ for 0 ≤ j < s we get $p \mid a_r b_s$, which is not the case. Hence p ∣ f(x) or p ∣ g(x). This contradicts the fact that f(x) and g(x) are primitive. This proves Lemma 6.29.

We now prove Theorem 6.26, the Eisenstein criterion.

Proof. Let the polynomial f(x) be as given. Assume there are g(x), h(x) ∈ ℚ[x] with deg(g(x)), deg(h(x)) ≥ 1 and f(x) = g(x)h(x). By Lemmas 6.28 and 6.29 we may assume that g(x), h(x) ∈ ℤ[x]. Let

$$g(x) = \sum_{i=0}^{k} b_i x^i, \quad b_k \neq 0, \quad\text{and}\quad h(x) = \sum_{j=0}^{l} c_j x^j, \quad c_l \neq 0.$$

We have $a_0 = b_0 c_0$. Since $p \mid a_0$ and $p^2 \nmid a_0$ we have that $p \nmid b_0$ or $p \nmid c_0$. Without loss of generality we may assume that $p \mid b_0$ and $p \nmid c_0$. Since $a_n = b_k c_l$ and $p \nmid a_n$ we have $p \nmid b_k$. We consider

$$a_j = b_j c_0 + b_{j-1} c_1 + \cdots + b_0 c_j \quad\text{for } j = 0, 1, \ldots, n.$$

Since $p \mid b_0$ and $p \nmid b_k$ there exists a smallest j such that $p \nmid b_j$; for this j, p divides each summand in $a_j = b_j c_0 + b_{j-1} c_1 + \cdots + b_0 c_j$ besides $b_j c_0$. Then $p \nmid a_j$ because $p \nmid b_j c_0$. This implies j = k = n because $p \mid a_i$ for i = 0, 1, …, n − 1. Hence deg(g(x)) = deg(f(x)) and deg(h(x)) = 0, that is, h(x) ∈ ℚ. Therefore f(x) is irreducible over ℚ.

Examples 6.30. (1) $f(x) = x^n - p$, n ≥ 2, p a prime number, is irreducible over ℚ.
(2) $f(x) = x^{p-1} + x^{p-2} + \cdots + x + 1$, p a prime number, is irreducible over ℚ.

Proof. A polynomial f(x) is irreducible over ℚ if and only if f(x + a), a ∈ ℤ, is irreducible over ℚ. Recall that $(x - 1)f(x) = x^p - 1$. Then, by the binomial formula (see for instance Chapter 12 for a more general discussion of the binomial coefficients and the binomial formula), we get

$$xf(x + 1) = (x + 1)^p - 1 = x^p + \binom{p}{1}x^{p-1} + \cdots + \binom{p}{p-1}x,$$

that is,

$$f(x + 1) = x^{p-1} + \binom{p}{1}x^{p-2} + \cdots + \binom{p}{p-1}.$$

Now $\binom{p}{i} = \frac{p!}{i!(p-i)!}$ and it is an integer. It follows that $p \mid \binom{p}{i}$ for 1 ≤ i ≤ p − 1 and $p^2 \nmid \binom{p}{p-1}$ because $\binom{p}{p-1} = p$. Hence, f(x) is irreducible over ℚ.
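Conditions (i) to (iii) of Theorem 6.26 are simple divisibility tests, so they can be checked mechanically. The following sketch assumes integer coefficients in a list $[a_0, \ldots, a_n]$ and finds candidate primes by trial division of $a_0$; the function name `eisenstein_primes` is an illustrative choice.

```python
from math import comb


def eisenstein_primes(coeffs):
    """Primes p for which [a_0, ..., a_n] satisfies (i)-(iii) of Theorem 6.26."""
    a0, an = coeffs[0], coeffs[-1]
    if a0 == 0:
        return []          # then p^2 | a_0 for every p, so (iii) fails
    found = []
    p, rest = 2, abs(a0)
    while rest > 1:        # run through the prime divisors of a_0
        if rest % p == 0:
            while rest % p == 0:
                rest //= p
            if (all(a % p == 0 for a in coeffs[:-1])
                    and an % p != 0 and a0 % (p * p) != 0):
                found.append(p)
        p += 1
    return found


print(eisenstein_primes([-5, 0, 0, 1]))     # x^3 - 5: [5]

p = 7
# Coefficients of f(x + 1) from Examples 6.30 (2), lowest degree first.
shifted = [comb(p, i) for i in range(p - 1, 0, -1)] + [1]
print(eisenstein_primes(shifted))           # [7]
```

An empty result does not mean that the polynomial is reducible; for instance `eisenstein_primes([1, 0, 1])` is empty although $x^2 + 1$ is irreducible over ℚ. The criterion is sufficient, not necessary.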

The following presents a useful criterion for polynomials f(x) ∈ K[x], K a field, with 2 ≤ deg(f(x)) ≤ 3 to be irreducible.

Theorem 6.31. Let K be a field, f(x) ∈ K[x] with 2 ≤ deg(f(x)) ≤ 3. Then f(x) is irreducible over K if and only if f(x) has no zero in K.

Proof. Suppose that f(x) is irreducible over K and assume that f(x) has a zero c ∈ K. Then f(x) = h(x)(x − c) with 1 ≤ deg(h(x)) ≤ 2, which contradicts the irreducibility of f(x). Conversely suppose that f(x) has no zero in K. Assume that f(x) = g(x)h(x) over K with 1 ≤ deg(g(x)), deg(h(x)). We must have deg(g(x)) = 1 or deg(h(x)) = 1 because 2 ≤ deg(f(x)) ≤ 3. Let deg(g(x)) = 1. Then g(x) = ax + b with a, b ∈ K, a ≠ 0. Then $c = -ba^{-1}$ is a zero of g(x) in K, and hence also of f(x), which gives a contradiction. Therefore f(x) is irreducible over K.

Example 6.32. $f(x) = 8x^3 - 6x - 1$ is irreducible over ℚ.

Proof. Assume that $\frac{a}{b}$, a, b ∈ ℤ, b ≠ 0, gcd(a, b) = 1, is a zero of f(x). Then $8a^3 - 6ab^2 - b^3 = 0$. Since gcd(a, b) = 1 we get a = ±1 and b = ±2 because 2 ∣ b, $b^2 \mid 8$. But all possible combinations for a and b give $8a^3 - 6ab^2 - b^3 \neq 0$. Hence, f(x) is irreducible over ℚ.

6.4 Horner-Scheme

The Horner-Scheme is a numerical algorithm to evaluate polynomials at a given value.

Definition 6.33. A numerical algorithm is the splitting of a (numerical) problem into elementary operations, which are for instance +, −, ⋅, /, √, e^x, sin, … . An algorithm to solve a specific problem is called
– finite, if the problem is solvable by the algorithm in finitely many steps;
– iterative, if the algorithm yields an infinite sequence of values which approximate the solution of the problem arbitrarily closely. Algorithms of this kind need a break criterion in order to stop after finitely many steps.

The Horner-Scheme is an example of a finite algorithm. Its aim is to evaluate a polynomial at a given value x. Let P(x) be a polynomial, say

P(x) = a_0 + a_1 x + a_2 x^2 + ⋯ + a_m x^m = ∑_{i=0}^{m} a_i x^i.

Then the naive evaluation of P(x) at x needs
– the building of the powers up to x^m (m − 1 multiplications),
– the multiplication of x^k with the corresponding coefficient a_k (m multiplications) and
– the final summation (m additions).
Altogether there are 2m − 1 multiplications and m additions. An algorithm of Horner solves this problem faster. It is called Horner's method, Horner-Scheme or Horner's rule, after W. G. Horner (1786–1837): we consider a polynomial

P(x) = a_0 + a_1 x + a_2 x^2 + ⋯ + a_m x^m

written as

P(x) = a_0 + x(a_1 + x(a_2 + x(a_3 + ⋯ + x(a_{m−1} + x a_m) ⋯))),

and calculate the brackets beginning with the innermost bracket and ending with the outermost bracket.

Example 6.34. It is

P(x) = 4 + 3x − 2x^2 + x^5 = 4 + x(3 − 2x + x^4) = 4 + x(3 + x(−2 + x^3)),

and

Q(x) = −1 + 2x − x^2 + 3x^3 − 2x^4 = −1 + x(2 − x + 3x^2 − 2x^3) = −1 + x(2 + x(−1 + 3x − 2x^2)) = −1 + x(2 + x(−1 + x(3 − 2x))).

The evaluation with the Horner-Scheme needs m additions and m multiplications, where deg(P(x)) = m. Since modern computers are able to execute one multiplication and one addition in a single calculating step, the Horner-Scheme needs roughly half the effort of the naive algorithm. Using the Horner-Scheme by hand to evaluate P(x), with

P(x) = a_0 + x(a_1 + x(a_2 + x(a_3 + ⋯ + x(a_{m−1} + x a_m) ⋯))),

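we execute the following scheme:

        a_m    a_{m−1}    a_{m−2}      …    a_3      a_2      a_1      a_0
 ⋅x     0      b_m x      b_{m−1} x    …    b_4 x    b_3 x    b_2 x    b_1 x
        b_m    b_{m−1}    b_{m−2}      …    b_3      b_2      b_1      b_0 = P(x)

Each entry of the bottom row is the sum of the two entries above it. Each entry of the middle row is x times the bottom-row entry of the preceding column (this is the step indicated by the arrows ↗).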

Here

b_m := a_m,    (6.2)

b_{k−1} := a_{k−1} + b_k ⋅ x,    k = m, …, 1.    (6.3)

After m steps it is b_0 = P(x). Hence, in

P(x) = a_0 + x(a_1 + x(a_2 + x(a_3 + ⋯ + x(a_{m−2} + x(a_{m−1} + x a_m)) ⋯)))

the innermost bracket a_{m−1} + x a_m equals b_{m−1}, the next bracket a_{m−2} + x b_{m−1} equals b_{m−2}, and so on through b_3, b_2 and b_1, until the whole expression a_0 + x b_1 equals b_0.
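The recursion (6.2)–(6.3) translates directly into a short program. The following sketch (the function name and the choice of the evaluation point 2 are made here for illustration) returns the whole bottom row b_0, …, b_m of the scheme and applies it to the polynomial Q(x) from Example 6.34.

```python
def horner(coeffs, x):
    """Horner-Scheme for P(x) = a_0 + a_1 x + ... + a_m x^m.

    coeffs = [a_0, ..., a_m]; returns [b_0, ..., b_m] with b_0 = P(x)."""
    m = len(coeffs) - 1
    b = [0] * (m + 1)
    b[m] = coeffs[m]                          # (6.2): b_m := a_m
    for k in range(m, 0, -1):                 # k = m, ..., 1
        b[k - 1] = coeffs[k - 1] + b[k] * x   # (6.3): b_{k-1} := a_{k-1} + b_k * x
    return b

Q = [-1, 2, -1, 3, -2]      # Q(x) = -1 + 2x - x^2 + 3x^3 - 2x^4
b = horner(Q, 2)
print(b[0])                 # Q(2) = -9
```

The loop performs exactly m multiplications and m additions, in agreement with the operation count given above.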

Example 6.35. Calculate P(2) for P(x) = 2 + x − x^2 − 3x^3 + 2x^4 − x^6 with the Horner-Scheme:

        −1     0     2    −3    −1      1      2
 ⋅2      0    −2    −4    −4   −14    −30    −58
        −1    −2    −2    −7   −15    −29    −56 = P(2)

Now we take a closer look at b_k, k = 1, 2, …, m: It is known that

P(x) = P(x_1) + (x − x_1)Q(x),    c_0 := P(x_1).

Determine

Q(x) = c_1 + c_2 x + ⋯ + c_{m−1} x^{m−2} + c_m x^{m−1},


then

P(x) = c_0 + c_1(x − x_1) + c_2(x − x_1)x + ⋯ + c_{m−1}(x − x_1)x^{m−2} + c_m(x − x_1)x^{m−1}
     = c_m x^m + (c_{m−1} − c_m x_1)x^{m−1} + ⋯ + (c_1 − c_2 x_1)x + (c_0 − c_1 x_1),

and this has to be equal to a_m x^m + a_{m−1} x^{m−1} + ⋯ + a_1 x + a_0. Comparing coefficients, we get the identities

c_m = a_m,

c_{k−1} − c_k x_1 = a_{k−1} ⇔ c_{k−1} = a_{k−1} + c_k x_1,    k = 1, …, m.

Therefore the c_k, k = 0, …, m, coincide with the b_k computed for x = x_1, see (6.2)–(6.3). Hence

P(x) = b_0 + (x − x_1)(b_1 + b_2 x + ⋯ + b_m x^{m−1}).    (6.4)
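Equation (6.4) says that the bottom row of the Horner-Scheme delivers the quotient and the remainder of the division of P(x) by x − x_1. The sketch below (helper names are illustrative) recomputes the row for the polynomial of Example 6.35 with x_1 = 2 and multiplies out the right-hand side of (6.4) to confirm that it gives back P(x).

```python
def horner(coeffs, x1):
    """Bottom row [b_0, ..., b_m] of the Horner-Scheme for coeffs = [a_0, ..., a_m]."""
    b = list(coeffs)
    for k in range(len(coeffs) - 1, 0, -1):
        b[k - 1] = coeffs[k - 1] + b[k] * x1
    return b

def times_linear(q, x1):
    """Coefficients of (x - x1) * Q(x) for Q given lowest degree first."""
    result = [0] * (len(q) + 1)
    for i, c in enumerate(q):
        result[i + 1] += c        # contribution of x * c x^i
        result[i] -= x1 * c       # contribution of -x1 * c x^i
    return result

P = [2, 1, -1, -3, 2, 0, -1]      # 2 + x - x^2 - 3x^3 + 2x^4 - x^6
b = horner(P, 2)
remainder, quotient = b[0], b[1:]  # b_0 and b_1 + b_2 x + ... + b_m x^(m-1)

rebuilt = times_linear(quotient, 2)
rebuilt[0] += remainder            # right-hand side of (6.4)
print(remainder, quotient, rebuilt == P)
```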

Example 6.36. In Example 6.35 the Horner-Scheme for evaluating P(2), with P(x) = 2 + x − x^2 − 3x^3 + 2x^4 − x^6, is:

        −1      0      2     −3     −1      1      2
 ⋅2      0     −2     −4     −4    −14    −30    −58
        −1     −2     −2     −7    −15    −29    −56
        =b_6   =b_5   =b_4   =b_3   =b_2   =b_1   =b_0

Thus we can write P(x) as

P(x) = −56 + (x − 2) ⋅ (−29 − 15x − 7x^2 − 2x^3 − 2x^4 − x^5).

We now look at the known description (in the example it is x_1 = 2):

P(x) = P(x_1) + (x − x_1)Q_{m−1}(x),    d_0 := P(x_1),

and apply this description recursively:

P(x) = d_0 + (x − x_1)Q_{m−1}(x)
     = d_0 + (x − x_1)(Q_{m−1}(x_1) + (x − x_1)Q_{m−2}(x)),    d_1 := Q_{m−1}(x_1).

Since the coefficients of Q_{m−1}(x) are exactly the b_k of the Horner-Scheme (because of equation (6.4)), we can apply Horner's method once more to calculate d_1 = Q_{m−1}(x_1):

        −1     0     2    −3    −1      1      2
 ⋅2      0    −2    −4    −4   −14    −30    −58
        −1    −2    −2    −7   −15    −29    −56 = d_0
 ⋅2      0    −2    −8   −20   −54   −138
        −1    −4   −10   −27   −69   −167 = d_1

This method can be continued until we get

P(x) = d_0 + (x − x_1)(d_1 + (x − x_1)(d_2 + ⋯ + (d_{m−1} + (x − x_1)d_m) ⋯))
     = d_0 + (x − x_1)d_1 + (x − x_1)^2 d_2 + ⋯ + (x − x_1)^m d_m
     = ∑_{k=0}^{m} d_k (x − x_1)^k.

The coefficients d_k are given by the continuation of Horner's method:

        −1     0     2    −3    −1      1      2
 ⋅2      0    −2    −4    −4   −14    −30    −58
        −1    −2    −2    −7   −15    −29    −56 = d_0
 ⋅2      0    −2    −8   −20   −54   −138
        −1    −4   −10   −27   −69   −167 = d_1
 ⋅2      0    −2   −12   −44  −142
        −1    −6   −22   −71  −211 = d_2
 ⋅2      0    −2   −16   −76
        −1    −8   −38  −147 = d_3
 ⋅2      0    −2   −20
        −1   −10   −58 = d_4
 ⋅2      0    −2
        −1   −12 = d_5
 ⋅2      0
        −1 = d_6

Therefore, with x_1 = 2, it is

P(x) = −56 − 167(x − 2) − 211(x − 2)^2 − 147(x − 2)^3 − 58(x − 2)^4 − 12(x − 2)^5 − (x − 2)^6.
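The repeated application of Horner's method can be sketched in a few lines. The function name below is an illustrative choice. The working list plays the role of one row of the scheme, written with the highest coefficient first, and after each pass its last entry is the next coefficient d_k.

```python
def expand_around(coeffs, x1):
    """Return [d_0, ..., d_m] with P(x) = sum of d_k (x - x1)^k.

    coeffs = [a_0, ..., a_m], lowest degree first."""
    row = list(reversed(coeffs))      # top row of the scheme: a_m, ..., a_0
    d = []
    while row:
        for i in range(1, len(row)):  # one pass of the Horner-Scheme
            row[i] += row[i - 1] * x1
        d.append(row.pop())           # last entry is d_k; the rest is the next top row
    return d

P = [2, 1, -1, -3, 2, 0, -1]          # polynomial of Example 6.35
print(expand_around(P, 2))            # [-56, -167, -211, -147, -58, -12, -1]
```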

6.5 The Euclidean algorithm and greatest common divisor of polynomials over fields

We now introduce the greatest common divisor of polynomials over fields. The following is a preparation.

Theorem 6.37. Let K be a field and I ⊲ K[x] an ideal in K[x]. Then there exists a polynomial f(x) ∈ K[x] with I = (f(x)) = {g(x) ∈ K[x] ∣ g(x) = h(x)f(x) for some h(x) ∈ K[x]}.

Proof. The proof is analogous to the respective proof for the integers. If I = {0} then there is nothing to show. If a ∈ I for some a ∈ K^∗ then I = K[x] = (a) because h(x) = (a^{−1}h(x))a for each h(x) ∈ K[x]; in particular K[x] = (1). Now let I ≠ {0} and I ≠ K[x] = (1). Let f(x) ∈ I with deg(f(x)) ≥ 1 minimal in {deg(h(x)) ∣ h(x) ∈ I, h(x) ≠ 0} ⊂ ℕ. (Recall that I contains polynomials of degree greater than or equal to 1 because I ≠ (0) and I ≠ K[x].) Let g(x) ∈ I be arbitrary. By Corollary 6.17 there exist q(x), r(x) ∈ K[x] with g(x) = q(x)f(x) + r(x) and deg(r(x)) < deg(f(x)). We have r(x) ∈ I because g(x) ∈ I and q(x)f(x) ∈ I. Because deg(f(x)) is minimal we have r(x) = 0 and therefore g(x) ∈ (f(x)).

An integral domain R with the property that each ideal is of the form I = {ra ∣ r ∈ R} for some a ∈ R, that is, each ideal is a principal ideal, is called a principal ideal domain. Both the integers ℤ and the ring of polynomials K[x] with K a field are principal ideal domains.

Definition 6.38. Let K be a field and f_1(x), f_2(x), …, f_n(x) ∈ K[x] ⧵ {0}, n ≥ 2. A polynomial d(x) ∈ K[x] is called the greatest common divisor of f_1(x), f_2(x), …, f_n(x), if the following hold:
(1) d(x) is monic.
(2) d(x) ∣ f_i(x) for i = 1, 2, …, n.
(3) If δ(x) ∈ K[x], δ(x) ∣ f_i(x) for i = 1, 2, …, n, then δ(x) ∣ d(x).

Remark 6.39. The conditions (2) and (3) already occurred for the integers. The condition (1) is new and necessary for the uniqueness of d(x), because if d(x) ∣ f_i(x) for i = 1, 2, …, n, then also ad(x) ∣ f_i(x) for i = 1, 2, …, n where a ∈ K, a ≠ 0. As for the integers we write gcd(f_1(x), f_2(x), …, f_n(x)) for the greatest common divisor of f_1(x), f_2(x), …, f_n(x).
If the greatest common divisor exists, then we may calculate it in steps via

gcd(f_1(x), f_2(x), …, f_n(x)) = gcd(gcd(f_1(x), f_2(x), …, f_{n−1}(x)), f_n(x)).

For two polynomials f(x), g(x) ∈ K[x], f(x) ≠ 0 ≠ g(x), we get gcd(f(x), g(x)) by the Euclidean algorithm, which automatically shows the existence and the uniqueness (after normalization). But this already follows from Theorem 6.37: Let I be the minimal ideal with respect to the subset relation ⊂ which contains the polynomials f_1(x), f_2(x), …, f_n(x), that is,

I = {r_1(x)f_1(x) + r_2(x)f_2(x) + ⋯ + r_n(x)f_n(x) ∣ r_i(x) ∈ K[x] for all i = 1, 2, …, n}.

This is considered in exercise 9. Then I = (d(x)) for some monic d(x) ∈ K[x]. By Theorem 6.9 we get d(x) ∣ f_i(x) for i = 1, 2, …, n, and if δ(x) ∣ f_i(x) for i = 1, 2, …, n, then δ(x) ∣ d(x) because (d(x)) ⊂ (δ(x)) by minimality of I = (d(x)).

6.5.1 The Euclidean algorithm for K[x]

If K is a field, the Euclidean algorithm to determine the greatest common divisor of two polynomials follows the same basic outline as the Euclidean algorithm for the integers. Let K be a field and f(x), g(x) ∈ K[x], f(x) ≠ 0 ≠ g(x). Let f_1(x) := f(x), f_2(x) := g(x) and form the following scheme:

f_1(x) = q_1(x)f_2(x) + f_3(x),    deg(f_3(x)) < deg(f_2(x)),
f_2(x) = q_2(x)f_3(x) + f_4(x),    deg(f_4(x)) < deg(f_3(x)),
⋮
f_{n−1}(x) = q_{n−1}(x)f_n(x) + f_{n+1}(x),    deg(f_{n+1}(x)) < deg(f_n(x)),
f_n(x) = q_n(x)f_{n+1}(x).

This scheme stops after finitely many steps because deg(f_i(x)) ∈ ℕ_0 for all i and r f_i(x) ∣ f_i(x) for all i and all r ∈ K, r ≠ 0. As in the case of the integers we get

Theorem 6.40. Following the procedure outlined in the above scheme we get the following:
(1) Let f_{n+1}(x) = ad(x) with a ∈ K, a ≠ 0, and d(x) monic. Then d(x) = gcd(f(x), g(x)).
(2) There exist λ(x), μ(x) ∈ K[x] with f_{n+1}(x) = ad(x) = λ(x)f(x) + μ(x)g(x).

Example 6.41. Let K = ℚ, f(x) = x^5 + x^4 + x^3 + x^2 + x + 1 and g(x) = x^4 + x^3 + 2x^2 + x + 1. Then

f(x) = xg(x) − x^3 + 1,
g(x) = (−x − 1)(−x^3 + 1) + 2x^2 + 2x + 2,
−x^3 + 1 = (−(1/2)x + 1/2)(2x^2 + 2x + 2).

Hence, gcd(f(x), g(x)) = x^2 + x + 1. Further

2d(x) = 2x^2 + 2x + 2 = g(x) + (x + 1)(1 − x^3)
      = g(x) + (x + 1)(f(x) − xg(x))
      = (x + 1)f(x) + (1 − x^2 − x)g(x),

that is, λ(x) = x + 1 and μ(x) = 1 − x^2 − x.
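The division scheme of Section 6.5.1 can be sketched for K = ℚ with exact rational arithmetic. The code below is an illustration under assumptions made here: polynomials are stored as coefficient lists with the lowest degree first, and the helper names `trim`, `divmod_poly` and `gcd_poly` are chosen for this sketch. It reproduces the greatest common divisor of Example 6.41.

```python
from fractions import Fraction

def trim(p):
    """Drop zero coefficients of highest degree; the zero polynomial becomes []."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def divmod_poly(f, g):
    """Division with remainder in Q[x]: f = q*g + r with deg r < deg g (g nonzero)."""
    g = trim(g)
    r = [Fraction(c) for c in trim(f)]
    q = [Fraction(0)] * max(len(r) - len(g) + 1, 0)
    while len(r) >= len(g):
        shift = len(r) - len(g)
        factor = r[-1] / Fraction(g[-1])
        q[shift] = factor
        for i, c in enumerate(g):         # subtract factor * x^shift * g(x)
            r[i + shift] -= factor * c
        r = trim(r)                       # the leading term has been cancelled
    return q, r

def gcd_poly(f, g):
    """Monic greatest common divisor of two nonzero polynomials over Q."""
    f, g = trim(f), trim(g)
    while g:                              # f_{i-1} = q_i f_i + f_{i+1}
        _, r = divmod_poly(f, g)
        f, g = g, r
    lead = Fraction(f[-1])
    return [Fraction(c) / lead for c in f]   # normalize: divide by a

f = [1, 1, 1, 1, 1, 1]    # x^5 + x^4 + x^3 + x^2 + x + 1
g = [1, 1, 2, 1, 1]       # x^4 + x^3 + 2x^2 + x + 1
print(gcd_poly(f, g))     # coefficients of x^2 + x + 1
```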

Remark 6.42. From the construction of the greatest common divisor in steps we also get the following. Let f_i(x) ∈ K[x], i = 1, 2, …, n, n ≥ 2, f_i(x) ≠ 0 for i = 1, 2, …, n. Let d(x) = gcd(f_1(x), f_2(x), …, f_n(x)). Then there exist λ_i(x) ∈ K[x] with d(x) = λ_1(x)f_1(x) + λ_2(x)f_2(x) + ⋯ + λ_n(x)f_n(x).

We now come to the main result of this chapter, the fundamental theorem for polynomials from K[x], K a field. We need some preliminary statements.

Theorem 6.43. Let f(x) ∈ K[x] with deg(f(x)) ≥ 1. Then there exists an irreducible polynomial p(x) ∈ K[x] with p(x) ∣ f(x).

Proof. Let p(x) ∈ K[x] with smallest possible deg(p(x)) ≥ 1, such that p(x) ∣ f(x). Such a p(x) exists, and necessarily p(x) is irreducible over K.

Theorem 6.44 (Euclid's Lemma for polynomials). Let f(x), g(x), h(x) ∈ K[x] with f(x) ∣ g(x)h(x) and gcd(f(x), g(x)) = 1. Then f(x) ∣ h(x).

Proof. Since gcd(f(x), g(x)) = 1 there exist λ(x), μ(x) ∈ K[x] with λ(x)f(x) + μ(x)g(x) = 1. This gives λ(x)f(x)h(x) + μ(x)g(x)h(x) = h(x). Since f(x) ∣ f(x) and f(x) ∣ g(x)h(x), f(x) divides both summands on the left, and we get f(x) ∣ h(x).

6.5.2 Unique factorization of polynomials in K[x]

We now present the fundamental theorem for polynomials. It states that in K[x], with K a field, each polynomial has a representation as a product of primes which is unique up to elements of K and the order of the factors.

Theorem 6.45 (Fundamental Theorem for Polynomials). Each polynomial f(x) ∈ K[x], f(x) ≠ 0, can be written uniquely up to elements of the field K and the order of the factors as a product of irreducible polynomials from K[x]:

f(x) = c p_1(x)p_2(x) ⋯ p_n(x),    c ∈ K^∗, n ≥ 0, all p_i(x) irreducible over K.

Proof. We first prove the existence of a prime decomposition. We use induction on the degree of the polynomial. If f(x) ∈ K, then n = 0 and f(x) = c ∈ K. Degree one polynomials are irreducible, so the assertion is true for degree one. Now, let deg(f(x)) ≥ 1, and assume that every polynomial of degree less than deg(f(x)) has a prime decomposition. By Theorem 6.43 there exists an irreducible polynomial p(x) ∈ K[x] with p(x) ∣ f(x), that is, f(x) = p(x)g(x) for some g(x) ∈ K[x] and 0 ≤ deg(g(x)) < deg(f(x)). By the inductive assumption g(x) has a decomposition as a product of irreducible factors. Then also f(x) = p(x)g(x) has one.

Now that we have a prime decomposition for a given polynomial in K[x] we prove the uniqueness of this decomposition. Let

f(x) = c p_1(x)p_2(x) ⋯ p_n(x) = d q_1(x)q_2(x) ⋯ q_m(x),

with c, d ∈ K^∗, 0 ≤ n, m, and the p_i(x), q_j(x) irreducible over K. If n = 0 or m = 0 then there is nothing to show. Now, let n, m ≥ 1. By Theorem 6.44 we necessarily have p_1(x) ∣ q_j(x) for some j ∈ {1, 2, …, m}. Without loss of generality, let p_1(x) ∣ q_1(x). Since both p_1(x), q_1(x) are irreducible over K, it follows also that q_1(x) ∣ p_1(x). By Theorem 6.9 we have that p_1(x) and q_1(x) are associated, that is, there exists an a_1 ∈ K^∗ with p_1(x) = a_1 q_1(x). Since K[x] has no zero divisors it follows that a_1 c p_2(x) ⋯ p_n(x) = d q_2(x) ⋯ q_m(x). By the second induction principle we now get n = m and p_i(x) = a_i q_i(x) with a_i ∈ K^∗ for i = 1, 2, …, n, possibly after a suitable renumbering. This shows the uniqueness of the decomposition up to elements of K and the order of the factors.

6.5.3 General unique factorization domains

From the fundamental theorem for polynomials we see that K[x] has unique factorization into primes if K is a field. This raises the question as to whether there are other integral domains, beyond ℤ and K[x], with this property. The answer is yes. There are many integral domains with unique factorization and these are called unique factorization domains.

Definition 6.46. A unique factorization domain, abbreviated UFD, is an integral domain D such that each nonzero element of D is either a unit or has a decomposition into primes that is unique up to ordering and unit factors.

In this language the fundamental theorem of arithmetic and the fundamental theorem of polynomials can be phrased in the following manner.


Theorem 6.47.
(1) (Fundamental Theorem of Arithmetic) The integers ℤ are a unique factorization domain.
(2) (Fundamental Theorem for Polynomials) If K is a field then K[x] is a unique factorization domain.

There are many examples of UFDs. The next theorem mentions without proof several examples of such domains. Proofs can be found in [3].

Theorem 6.48. The following are all unique factorization domains:
(1) Any principal ideal domain.
(2) The polynomial ring R[x] where R is a UFD. In particular ℤ[x].

6.6 Polynomial interpolation and the Shamir secret sharing scheme

6.6.1 Secret sharing

Secret sharing among a group of n participants is a very important cryptographic protocol. The general idea is the following. We have n people who are to have access to a secret S in such a way that the secret can be recovered if any t of the access group with t ≤ n get together. Given such a secret S, an (n, t)-secret sharing threshold scheme is a cryptographic procedure in which a secret is split into pieces (shares) and distributed among a collection of n participants {p_1, p_2, …, p_n} so that any group of t, t ≤ n, or more participants can recover the secret. Meanwhile, any group of t − 1 or fewer participants cannot recover the secret. By sharing a secret in this way the availability and reliability issues can be solved. The mathematician and cryptographer A. Shamir solved the secret sharing problem in a very simple but beautiful manner using what is called polynomial interpolation. We discuss polynomial interpolation in general before returning to Shamir's secret sharing scheme.

6.6.2 Polynomial interpolation over a field K

Let K be any field and (x_1, y_1), (x_2, y_2), …, (x_t, y_t) be t points in K^2 with pairwise distinct x_i. A polynomial P(x) over K interpolates these points if P(x_i) = y_i for i = 1, 2, …, t. The polynomial P(x) is called an interpolating polynomial for the given points. The crucial theoretical result is that for any t points (x_i, y_i) with distinct x_i there always exists a unique interpolating polynomial of degree less than or equal to t − 1.
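For K = ℚ the interpolating polynomial can be computed explicitly. The sketch below assumes the standard Lagrange form, P(x) = ∑_i y_i ∏_{j≠i} (x − x_j)/(x_i − x_j), which is a different route from the Vandermonde matrix but yields the same unique polynomial. The function name and the sample points are chosen here for illustration.

```python
from fractions import Fraction

def interpolate(points):
    """Coefficients [c_0, ..., c_{t-1}] (lowest degree first) of the polynomial of
    degree at most t - 1 through the t given points with pairwise distinct x values."""
    t = len(points)
    result = [Fraction(0)] * t
    for i, (xi, yi) in enumerate(points):
        basis = [Fraction(1)]             # numerator: product of (x - x_j), j != i
        denom = Fraction(1)               # denominator: product of (x_i - x_j), j != i
        for j, (xj, _) in enumerate(points):
            if j == i:
                continue
            basis = [Fraction(0)] + basis             # multiply by x ...
            for k in range(len(basis) - 1):
                basis[k] -= xj * basis[k + 1]         # ... and subtract x_j * old basis
            denom *= (xi - xj)
        for k, c in enumerate(basis):
            result[k] += yi * c / denom
    return result

print(interpolate([(1, 2), (2, 3), (3, 6)]))   # coefficients of x^2 - 2x + 3
```

Three points determine a polynomial of degree at most 2, in accordance with the statement above for t = 3.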

Theorem 6.49 (Polynomial Interpolation Theorem). Let K be any field and x_1, x_2, …, x_t be t pairwise distinct elements of K and y_1, y_2, …, y_t any elements of K. Then there exists a unique polynomial of degree less than or equal to t − 1 that interpolates the t points (x_i, y_i), i = 1, 2, …, t.

Notice first that this implies that any polynomial of degree t − 1 is uniquely determined by t points on it with distinct x values. There are several different proofs of this result. We prove it using what is called Vandermonde interpolation. This gives existence and uniqueness of the interpolating polynomial directly. After giving this proof we will discuss a second method called Lagrange interpolation. We start with the Vandermonde interpolation, which is named after A. T. Vandermonde (1735–1796). Here we assume that the reader is familiar with the definitions and basic facts for square matrices and their determinants.

Definition 6.50. Let K be any field and x_1, x_2, …, x_t be t elements of K. Then the following matrix, which we will denote by V,

\[
V = \begin{pmatrix}
1 & x_1 & \cdots & x_1^{t-1} \\
1 & x_2 & \cdots & x_2^{t-1} \\
\vdots & \vdots & & \vdots \\
1 & x_t & \cdots & x_t^{t-1}
\end{pmatrix}
\]

is called the Vandermonde matrix. The determinant of this matrix is called the Vandermonde determinant. Notice that the Vandermonde determinant is zero if and only if at least two of the x_i coincide. The following lemma is crucial.

Lemma 6.51. If x_1, x_2, …, x_t are elements of a field K then the value of the Vandermonde determinant is given by

det(V) = ∏_{1≤j<k≤t} (x_k − x_j).

[…] s > 1, s ∈ ℚ.

In general the Riemann ζ-function is defined for s ∈ ℂ, the complex numbers, and Re(s) > 1, see Chapter 10 for complex numbers.

Let G = ℤ^r, r ∈ ℕ, that is,

ℤ^r = ℤ × ℤ × ⋯ × ℤ (r times) = {(z_1, z_2, …, z_r) ∣ z_i ∈ ℤ, i = 1, 2, …, r}

together with the componentwise addition "+":

(z_1, z_2, …, z_r) + (z'_1, z'_2, …, z'_r) = (z_1 + z'_1, z_2 + z'_2, …, z_r + z'_r),

where (z_1, z_2, …, z_r), (z'_1, z'_2, …, z'_r) ∈ ℤ^r. Then

ζ_G(s) = ζ_G^N(s) = ζ(s)ζ(s − 1) ⋯ ζ(s − r + 1),

and ζ_G(s) converges for s > r (if s ∈ ℂ, then Re(s) > r), compare [15, p. 402] for the necessary background or [13]. In case r = 1, that is, for the group G = ℤ, we have a_n = 1 and hence ζ_G(s) = ζ(s).

Let G be the Heisenberg group:

\[
G := \left\{ \begin{pmatrix} 1 & a & b \\ 0 & 1 & c \\ 0 & 0 & 1 \end{pmatrix} \;\middle|\; a, b, c \in \mathbb{Z} \right\}.
\]

Then

ζ_G(s) = ζ(s)ζ(s − 1)ζ(2s − 1)ζ(2s − 3) / ζ(3s − 3)

and ζ_G^N(s) = ζ(s)ζ(s − 1)ζ(3s − 2). ζ_G(s) converges for s > 2 (if s ∈ ℂ, then Re(s) > 2), compare [13]. For more examples see for instance [5] or [17] or [16].


8.3 Symmetric polynomials In this section we consider the polynomial ring K[x1 , x2 , … , xn ] in n (independent) indeterminates x1 , x2 , … , xn over a field K and also introduce the concept of a symmetric polynomial. We then prove the fundamental theorem of symmetric polynomials. We introduce K[x1 , x2 , … , xn ] in steps. Let R be an integral domain and R[x] the polynomial ring in x over R. Then we may explain R[x, y] via R[x, y] = (R[x])[y] in the indeterminate y over R[x]. Certainly, R[x, y] = R[y, x]. In this sense we explain R[x1 , x2 , … , xn ] in n ≥ 2 (independent) indeterminates via R[x1 , x2 , … , xn ] = (R[x1 , x2 , … , xn−1 ])[xn ]. This is well defined. Now let K be a field and K[x1 , x2 , … , xn ], n ≥ 2, the polynomial ring in (independent) indeterminates x1 , x2 , … , xn over K. We remark that there is no nontrivial algebraic relation between elements of K[x1 , x2 , … , xn ]. To introduce symmetric polynomials we apply these ideas of permutations to certain polynomials in independent indeterminates over a field. We will look at these in detail later in this book. Definition 8.17. Let y1 , y2 , … , yn be (independent) indeterminates over a field K. A polynomial f (y1 , y2 , … , yn ) ∈ K[y1 , y2 , … , yn ] is a symmetric polynomial in y1 , y2 , … , yn if f (y1 , y2 , … , yn ) is unchanged by any permutation σ of {y1 , y2 , … , yn }, that is, f (y1 , y2 , … , yn ) = f (σ(y1 ), σ(y2 ), … , σ(yn )). If K ⊂ K ′ are fields and α1 , α2 , … , αn are in K ′ , then we call a polynomial expression f (α1 , α2 , … , αn ) with coefficients in K symmetric in α1 , α2 , … , αn if f (α1 , α2 , … , αn ) is unchanged by any permutation σ of {α1 , α2 , … , αn }. Here we just evaluate f (y1 , y2 , … , yn ) at (α1 , α2 , … , αn ). There is no misunderstanding if we just talk about polynomials f (α1 , α2 , … , αn ) in α1 , α2 , … , αn . Example 8.18. Let K be a field and k0 , k1 ∈ K. Let h(y1 , y2 ) = k0 (y1 + y2 ) + k1 (y1 y2 ). 
There are two permutations on {y_1, y_2}, namely

σ_1 : {y_1, y_2} → {y_1, y_2},    y_1 ↦ y_1,    y_2 ↦ y_2,

and

σ_2 : {y_1, y_2} → {y_1, y_2},    y_1 ↦ y_2,    y_2 ↦ y_1.

Applying either one of these two to {y_1, y_2} leaves h(y_1, y_2) invariant. Therefore, h(y_1, y_2) is a symmetric polynomial.

As we will see, any symmetric polynomial can be expressed in terms of what are called elementary symmetric polynomials.

Definition 8.19. Let x, y_1, y_2, …, y_n be indeterminates over a field K (or elements of an extension field K′ of K). Form the polynomial

p(x, y_1, y_2, …, y_n) = (x − y_1)(x − y_2) ⋯ (x − y_n).

The i-th elementary symmetric polynomial s_i in y_1, y_2, …, y_n for i = 1, 2, …, n, is (−1)^i a_i, where a_i is the coefficient of x^{n−i} in p(x, y_1, y_2, …, y_n).

Example 8.20. Consider y_1, y_2, y_3. Then

p(x, y_1, y_2, y_3) = (x − y_1)(x − y_2)(x − y_3)
                    = x^3 − (y_1 + y_2 + y_3)x^2 + (y_1 y_2 + y_1 y_3 + y_2 y_3)x − y_1 y_2 y_3.

Therefore, the three elementary symmetric polynomials in y_1, y_2, y_3 over any field are
(1) s_1 = y_1 + y_2 + y_3.
(2) s_2 = y_1 y_2 + y_1 y_3 + y_2 y_3.
(3) s_3 = y_1 y_2 y_3.

In general, the pattern of the last example holds for y_1, y_2, …, y_n. That is,

s_1 = y_1 + y_2 + ⋯ + y_n
s_2 = y_1 y_2 + y_1 y_3 + ⋯ + y_{n−1} y_n
s_3 = y_1 y_2 y_3 + y_1 y_2 y_4 + ⋯ + y_{n−2} y_{n−1} y_n
⋮
s_n = y_1 ⋯ y_n.

The importance of the elementary symmetric polynomials is that any symmetric polynomial can be built up from the elementary symmetric polynomials. We make this precise in the next theorem called the fundamental theorem of symmetric polynomials.
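The agreement between Definition 8.19 and the sums of products listed above can be checked on concrete values. In the sketch below (function names are illustrative) the y_i are replaced by numbers: one function expands (x − y_1) ⋯ (x − y_n) and reads off (−1)^i a_i, the other forms the sums over all i-element subsets.

```python
from itertools import combinations
from math import prod

def elementary_from_product(values):
    """[s_1, ..., s_n] read off from the coefficients of (x - y_1)...(x - y_n)."""
    coeffs = [1]                          # highest degree first; coeffs[i] belongs to x^(deg - i)
    for y in values:
        # multiply the current polynomial by (x - y)
        coeffs = [a - y * b for a, b in zip(coeffs + [0], [0] + coeffs)]
    n = len(values)
    return [(-1) ** i * coeffs[i] for i in range(1, n + 1)]   # s_i = (-1)^i a_i

def elementary_from_subsets(values):
    """[s_1, ..., s_n] as sums of all products of i distinct values."""
    n = len(values)
    return [sum(prod(c) for c in combinations(values, i)) for i in range(1, n + 1)]

ys = [2, 3, 5]
print(elementary_from_product(ys), elementary_from_subsets(ys))
```

For the values 2, 3, 5 both functions give s_1 = 10, s_2 = 31 and s_3 = 30.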


Theorem 8.21 (Fundamental theorem of symmetric polynomials). If P is a symmetric polynomial in the indeterminates y_1, y_2, …, y_n over a field K, that is, P ∈ K[y_1, y_2, …, y_n] and P is symmetric, then there exists a unique g ∈ K[y_1, y_2, …, y_n] such that P(y_1, y_2, …, y_n) = g(s_1, s_2, …, s_n). That is, any symmetric polynomial in y_1, y_2, …, y_n is a polynomial expression in the elementary symmetric polynomials in y_1, y_2, …, y_n.

In order to prove this result we need the concept of a piece. Let R be an integral domain with x_1, x_2, …, x_n (independent) indeterminates over R and let R[x_1, x_2, …, x_n] be the polynomial ring in these indeterminates. Any polynomial f(x_1, x_2, …, x_n) ∈ R[x_1, x_2, …, x_n] is composed of a sum of pieces of the form a x_1^{i_1} x_2^{i_2} ⋯ x_n^{i_n} with a ∈ R. We first put an order on these pieces of a polynomial.

The piece a x_1^{i_1} x_2^{i_2} ⋯ x_n^{i_n} with a ≠ 0 is called higher than the piece b x_1^{j_1} x_2^{j_2} ⋯ x_n^{j_n} with b ≠ 0 if the first one of the differences i_1 − j_1, i_2 − j_2, …, i_n − j_n that differs from zero is positive. The highest piece of a polynomial f(x_1, x_2, …, x_n) is denoted by HG(f).

Lemma 8.22. For f(x_1, x_2, …, x_n), g(x_1, x_2, …, x_n) ∈ R[x_1, x_2, …, x_n] we have HG(fg) = HG(f) HG(g).

Proof. We use induction on n, the number of indeterminates. The statement is clearly true for n = 1. Now assume that the statement holds for all polynomials in k indeterminates with k < n and n ≥ 2. Order the polynomials via exponents on the first indeterminate x_1, such that

f(x_1, x_2, …, x_n) = x_1^r ϕ_r(x_2, x_3, …, x_n) + x_1^{r−1} ϕ_{r−1}(x_2, x_3, …, x_n) + ⋯ + ϕ_0(x_2, x_3, …, x_n),

g(x_1, x_2, …, x_n) = x_1^s ψ_s(x_2, x_3, …, x_n) + x_1^{s−1} ψ_{s−1}(x_2, x_3, …, x_n) + ⋯ + ψ_0(x_2, x_3, …, x_n).

Then HG(fg) = x_1^{r+s} HG(ϕ_r ψ_s). By the inductive hypothesis HG(ϕ_r ψ_s) = HG(ϕ_r) HG(ψ_s). Hence

HG(fg) = x_1^{r+s} HG(ϕ_r) HG(ψ_s) = (x_1^r HG(ϕ_r))(x_1^s HG(ψ_s)) = HG(f) HG(g).
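Lemma 8.22 can be illustrated with a small computation over R = ℤ. The representation used below is an assumption of this sketch: a polynomial is a dictionary mapping exponent tuples (i_1, …, i_n) to nonzero coefficients. The order on pieces defined above is then the lexicographic order on exponent tuples, which is how Python compares tuples.

```python
def highest_piece(poly):
    """Return (exponents, coefficient) of the highest piece HG(poly)."""
    exps = max(poly)                 # lexicographic maximum of the exponent tuples
    return exps, poly[exps]

def multiply(f, g):
    """Product of two polynomials given as {exponent tuple: coefficient}."""
    result = {}
    for e1, c1 in f.items():
        for e2, c2 in g.items():
            e = tuple(a + b for a, b in zip(e1, e2))
            result[e] = result.get(e, 0) + c1 * c2
    return {e: c for e, c in result.items() if c != 0}

f = {(2, 1): 1, (1, 3): 3}           # x1^2 x2 + 3 x1 x2^3
g = {(1, 1): 2, (0, 5): -1}          # 2 x1 x2 - x2^5

(ef, cf), (eg, cg) = highest_piece(f), highest_piece(g)
product_of_pieces = (tuple(a + b for a, b in zip(ef, eg)), cf * cg)
print(highest_piece(multiply(f, g)), product_of_pieces)   # both describe 2 x1^3 x2^2
```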

In general, the k-th elementary symmetric polynomial is given by

s_k = ∑_{1 ≤ i_1 < i_2 < ⋯ < i_k ≤ n} y_{i_1} y_{i_2} ⋯ y_{i_k}.

[…] ϵ > 0, there exists n_1 ∈ ℕ with |x_p − x_q| < ϵ/2 for all p, q ≥ n_1 and n_2 ∈ ℕ with |y_p − y_q| < ϵ/2 for all p, q ≥ n_2. Let n_0 = max{n_1, n_2}. Then |(x_p ± y_p) − (x_q ± y_q)| < ϵ for all p, q ≥ n_0 by the triangle inequality in ℚ. By Theorem 9.2 there exist M_1, M_2 ∈ ℚ with |x_p| < M_1 for p ≥ n_1 and |y_p| < M_2 for p ≥ n_2. Further, for each ϵ ∈ ℚ, ϵ > 0, there exist m_1 ≥ n_1 and m_2 ≥ n_2 with |x_p − x_q| < ϵ/(2M) for all p, q ≥ m_1 and |y_p − y_q| […] ϵ > 0, an m_0 ≥ m_1 with |x_p| < ϵ/M for all p ≥ m_0. It follows |x_p z_p| < ϵ for all p ≥ m_0. Hence, (x_n z_n)_{n∈ℕ} is a zero sequence.

Definition 9.7. In CF(ℚ) we now introduce a relation "∼" by

(x_n)_{n∈ℕ} ∼ (y_n)_{n∈ℕ} :⇔ (x_n − y_n)_{n∈ℕ} is a zero sequence.

This relation is an equivalence relation. This is clear for reflexivity and symmetry. The transitivity follows from the equation x_n − z_n = (x_n − y_n) + (y_n − z_n) and the triangle inequality in ℚ. Let ℝ = CF(ℚ)/∼ be the set of the equivalence classes. We denote the equivalence class defined by (x_n)_{n∈ℕ} as \overline{(x_n)_{n∈ℕ}}. We have (y_n)_{n∈ℕ} ∈ \overline{(x_n)_{n∈ℕ}} if and only if y_n = x_n + z_n for all n and (z_n)_{n∈ℕ} ∈ NF(ℚ) = I.

Theorem 9.8. We equip ℝ with an addition

\overline{(x_n)_{n∈ℕ}} + \overline{(y_n)_{n∈ℕ}} = \overline{(x_n + y_n)_{n∈ℕ}}

and a multiplication

\overline{(x_n)_{n∈ℕ}} ⋅ \overline{(y_n)_{n∈ℕ}} = \overline{(x_n y_n)_{n∈ℕ}}.

The addition and the multiplication are well defined, and ℝ together with these operations forms a field with unity element \overline{(1, 1, …)}. The map φ : ℚ → ℝ, a ↦ \overline{(a, a, …)} defines an embedding from ℚ into ℝ, that is, φ is injective and

φ(a + b) = φ(a) + φ(b),    φ(a ⋅ b) = φ(a) ⋅ φ(b)    for all a, b ∈ ℚ.

Proof. We leave it as an exercise that the addition and the multiplication are well defined and that ℝ, together with these operations, forms a commutative ring with zero element \overline{(0, 0, …)} and unity element \overline{(1, 1, …)}. Now let \overline{(x_n)_{n∈ℕ}} ∈ ℝ with \overline{(x_n)_{n∈ℕ}} ≠ \overline{(0, 0, …)}.

Claim. There exist an n_0 ∈ ℕ and a k ∈ ℚ, k > 0, such that |x_n| ≥ k for all n ≥ n_0.

Proof of the claim. Assume that the claim does not hold. Then, for all n_0 and all k, there exists an n ≥ n_0 with |x_n| < k. For given k we may choose n_0 sufficiently large such that |x_n − x_m| < k for all n, m ≥ n_0. Then, with such an n, |x_m| = |x_n − x_n + x_m| ≤ |x_n − x_m| + |x_n| < 2k for all m ≥ n_0, that is, (x_n)_{n∈ℕ} ∈ NF(ℚ), which gives a contradiction. Hence, the claim holds.

Therefore there exist an n_0 ∈ ℕ and a k ∈ ℚ, k > 0, with |x_n| ≥ k for all n ≥ n_0. We now consider the sequence (x'_n)_{n∈ℕ} with x'_n = k for n = 1, 2, …, n_0 and x'_n = x_n for n > n_0. Then (x'_n)_{n∈ℕ} ∈ CF(ℚ) with |x'_n| ≥ k for all n, in particular x'_n ≠ 0 for all n. Further, \overline{(x_n)_{n∈ℕ}} = \overline{(x'_n)_{n∈ℕ}}.

Claim. (1/x'_n)_{n∈ℕ} ∈ CF(ℚ).

Proof of the claim. For each ϵ ∈ ℚ, ϵ > 0, there exists an n_0 ∈ ℕ with |x'_p − x'_q| < ϵk^2 for all p, q ≥ n_0. Assume that |1/x'_p − 1/x'_q| ≥ ϵ for one p ≥ n_0 and one q ≥ n_0. From |x'_p| ≥ k and |x'_q| ≥ k we then get

|x'_p − x'_q| = |x'_p x'_q (1/x'_q − 1/x'_p)| ≥ ϵk^2,

which is not the case. Hence, |1/x'_p − 1/x'_q| < ϵ for all p, q ≥ n_0, that is, (1/x'_n)_{n∈ℕ} ∈ CF(ℚ).

From this we get

\overline{(1/x'_n)_{n∈ℕ}} ⋅ \overline{(x_n)_{n∈ℕ}} = \overline{(1, 1, …)}.

Therefore, ℝ is a field. For the map φ : ℚ → ℝ, a ↦ \overline{(a, a, …)}, we certainly have φ(a + b) = φ(a) + φ(b) and φ(a ⋅ b) = φ(a) ⋅ φ(b) from the definition of the addition and multiplication.

9.1 The real number system


Let a, b ∈ ℚ, a ≠ b. Then \overline{(a, a, …)} ≠ \overline{(b, b, …)} since (a − b, a − b, …) is not a zero sequence. Therefore φ is injective.

We now identify a ∈ ℚ with φ(a) and consider ℚ as a subfield of ℝ. In addition to the field structure the rational numbers have an order. We use the order on ℚ to introduce an order for ℝ. First, we give a general description.

Definition 9.9. An ordered field K is a field with a total ordering of its elements that is compatible with the field operations. We may formulate this in a slightly different, more concrete manner. An ordered field K is a field together with a total order ≤ on K such that the following hold:
(1) If a ∈ K then exactly one of the following relations is satisfied: a = 0, a > 0, −a > 0.
(2) If a > 0 and b > 0 then a + b > 0 and ab > 0.
(3) 1 > 0 for the unity element 1.

The set P = {a ∣ a > 0} is often called the positive cone of K. We have the disjoint union K = P ∪ {0} ∪ (−P) where −P = {a ∈ K ∣ −a > 0}. If −a > 0 then we call a negative and write a < 0. We write:

a > b ⇔ a − b > 0,
a ≥ b ⇔ a = b or a − b > 0,
a < b ⇔ b − a > 0,
a ≤ b ⇔ a = b or b − a > 0.

In the following let K be an ordered field.

Lemma 9.10. Let a, b, c ∈ K. Then
(1) Either a < b, a = b or a > b (trichotomy law).
(2) If a ≥ b and b ≥ c then a ≥ c.
(3) If a > b then a + c > b + c, and in case c > 0 then also ac > bc.
(4) If a > b and c < 0 then bc > ac.
(5) If a > b > 0 then 0 < a^{−1} < b^{−1}.

Proof. (1), (2) and (3) follow easily from the definition. We show (4) and (5).
(4) Let a > b and c < 0. Then a − b > 0 and −c > 0, and therefore 0 < −c(a − b) = −ca + cb = cb − ca ⇔ bc > ac.
(5) Now let 1 be the unity element. From 1 > 0 it follows that, if a > 0, then a^{−1} > 0 because aa^{−1} = 1 > 0. Now a^{−1} < b^{−1} because ab > 0 and 0 < a − b = ab(b^{−1} − a^{−1}).

Corollary 9.11. Let a ∈ K. Then a^2 ≥ 0 and a^2 = 0 if and only if a = 0.

Proof. If a > 0 then a ⋅ a = a^2 > 0. If a < 0 then −a > 0 and (−a)(−a) = (−a)^2 = a^2 > 0. If a^2 = 0 then a = 0 because K has no zero divisors. If a = 0 then certainly a^2 = 0.

Definition 9.12. The absolute value |a| of a ∈ K is defined by

\[
|a| = \begin{cases} 0 & \text{if } a = 0, \\ a & \text{if } a > 0, \\ -a & \text{if } a < 0. \end{cases}
\]

Remark 9.13. From Corollary 9.11 we see that |a| = |−a| and a^2 = (−a)^2 = |a|^2 ≥ 0.

Lemma 9.14. Let a, b ∈ K. Then a ≤ |a| and |ab| = |a| ⋅ |b|. Further |a + b| ≤ |a| + |b|, the triangle inequality.

Proof. Let a ∈ K. If a ≥ 0 then a = |a|. If a < 0 then −a = |a| > 0, that is, a < 0 < |a|. The equation |ab| = |a||b| certainly holds. We now prove the triangle inequality. If a + b ≥ 0 then |a + b| = a + b ≤ |a| + |b|. If a + b < 0 then |a + b| = −(a + b) = −a − b ≤ |−a| + |−b| = |a| + |b|.

Corollary 9.15. Let a, b ∈ K. Then ||a| − |b|| ≤ |a − b|.

Proof. |a| = |a − b + b| ≤ |a − b| + |b| and |b| = |b − a + a| ≤ |b − a| + |a| = |a − b| + |a|. Therefore |a| − |b| ≤ |a − b| and |b| − |a| ≤ |a − b|. This gives ||a| − |b|| ≤ |a − b|.

Remark 9.16. An ordered field K always has characteristic char(K) = 0 because 0 < 1 < 1 + 1 < ⋯. Therefore ℚ is the prime field of K, and we may assume that ℚ ⊂ K. The ordered field ℚ itself also satisfies the Archimedean property which we define now.

Definition 9.17. An ordered field K is called Archimedean ordered, if for two positive elements a, b there always exists an n ∈ ℕ with na > b. Since we consider ℚ as a subfield of K, na is defined as (n ⋅ 1)a = na = a + a + ⋯ + a (n times), for n ∈ ℕ.

Lemma 9.18. Let a ∈ K, K an Archimedean ordered field. Then
(i) There exist n, m ∈ ℕ with −m < a < n.
(ii) If a > 0, then there exists an n ∈ ℕ with 1/n < a.


Proof. (i) Suppose first that a ≥ 0. Since 1 > 0 then n, m > 0 by induction for natural numbers n, m. Hence −m < a for all m ∈ ℕ. If a = 0 then certainly a < n for all n ∈ ℕ. Let a > 0. Consider a and 1. Then there is an n ∈ ℕ with a < n ⋅ 1 = n. Now, let a < 0. Then −a > 0, and there are n, m ∈ ℕ with −m < −a < n. Then −n < a < m by the above rules. (ii) Let a > 0. Then again consider a and 1, and therefore there exists an n ∈ ℕ with na > 1. It follows a > n1 because n1 > 0. Note that the rational numbers ℚ is an Archimedean ordered field. If, for instance, r, s, n, m ∈ ℕ and sr < mn then mn < sr ⋅ s(n + 1).

Further, because (n ⋅ 1)/(m ⋅ 1) ∈ K for n, m ∈ ℤ, m ≠ 0, we may assume that ℚ is contained in each Archimedean ordered field as an Archimedean ordered subfield. We now want to extend the ordering of ℚ to ℝ such that ℝ becomes an Archimedean ordered field.
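The Archimedean property of ℚ noted above can be made concrete with exact rational arithmetic. The following is a minimal sketch; the function name `archimedean_witness` is an illustrative assumption and not part of the text.

```python
from fractions import Fraction
from math import floor


def archimedean_witness(a: Fraction, b: Fraction) -> int:
    """Return a natural number n with n*a > b, for positive rationals a and b."""
    if a <= 0 or b <= 0:
        raise ValueError("a and b must be positive")
    # floor(b/a) * a <= b, so one more copy of a exceeds b.
    return floor(b / a) + 1
```

For a = 1/1000 and b = 7/3 the sketch returns the smallest possible n, although Definition 9.17 only asks for some n.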

Definition 9.19. A rational Cauchy sequence (xn)n∈ℕ is called positive if there exist a k ∈ ℚ, k > 0, and an n0 ∈ ℕ such that xn > k for all n ≥ n0.

Remark 9.20. The sum and the product of two positive rational Cauchy sequences are certainly also positive.

Lemma 9.21. Let (xn)n∈ℕ ∈ CF(ℚ) be positive and (yn)n∈ℕ ∈ NF(ℚ). Then (xn + yn)n∈ℕ is positive.

Proof. Choose k ∈ ℚ, k > 0, and an n0 ∈ ℕ sufficiently large such that simultaneously xn > k for all n ≥ n0 and |yn| < (1/2)k for all n ≥ n0. Then xn + yn ≥ xn − |yn| > (1/2)k for all n ≥ n0.

Corollary 9.22. Let (xn)n∈ℕ ∈ CF(ℚ) be positive and let (yn)n∈ℕ belong to the equivalence class of (xn)n∈ℕ. Then (yn)n∈ℕ is also positive.

Proof. We have yn = xn + zn with (zn)n∈ℕ ∈ NF(ℚ). Now the result follows from Lemma 9.21.

Convention. We now write α, β, γ, … for the elements of ℝ. We keep in mind that each α ∈ ℝ is the equivalence class of some (xn)n∈ℕ ∈ CF(ℚ).

Definition 9.23. α ∈ ℝ is called positive if α is the equivalence class of a positive (xn)n∈ℕ ∈ CF(ℚ). α ∈ ℝ is called negative if −α is positive.

Remark 9.24. This is well defined by Corollary 9.22, that is, independent of the chosen representative of the respective equivalence class.

Theorem 9.25. Let (xn)n∈ℕ ∈ CF(ℚ), and let (xn)n∈ℕ be neither positive nor negative. Then (xn)n∈ℕ ∈ NF(ℚ).

Proof. For each ϵ ∈ ℚ, ϵ > 0, and each n0 ∈ ℕ there exist an n ≥ n0 and an m ≥ n0 such that xn ≤ ϵ and −xm ≤ ϵ. Choose n0 so that in addition |xp − xq| < ϵ for all p, q ≥ n0. Then

\[ x_p = (x_p - x_q) + x_n < \epsilon + \epsilon = 2\epsilon \quad \text{for } q = n \]

and

\[ -x_p = (x_q - x_p) - x_m < \epsilon + \epsilon = 2\epsilon \quad \text{for } q = m, \]

where p ≥ n0 is arbitrary. It follows |xp| < 2ϵ for all p ≥ n0. This gives that (xn)n∈ℕ ∈ NF(ℚ).

Conclusion. Let (xn)n∈ℕ ∈ CF(ℚ). Then either (xn)n∈ℕ is positive or (−xn)n∈ℕ is positive or (xn)n∈ℕ is a zero sequence. Hence, α ∈ ℝ is either positive (> 0) or negative (< 0) or null (= 0). This shows that we have extended the ordering of ℚ to an ordering of ℝ. Hence, we have

Theorem 9.26. ℝ is an ordered field.

Lemma 9.27. Let α ∈ ℝ be represented by (xn)n∈ℕ and β ∈ ℝ be represented by (yn)n∈ℕ. If there exists an n0 ∈ ℕ such that xp ≥ yp for all p ≥ n0, then α ≥ β.

Proof. Assume that α < β, that is, β − α > 0. Then for (yn − xn)n∈ℕ ∈ CF(ℚ) there exist an ϵ ∈ ℚ, ϵ > 0, and an m0 ∈ ℕ such that yp − xp > ϵ > 0 for all p ≥ m0. If we choose p = n0 + m0 here, then we get a contradiction to xp ≥ yp.

Note, however: if xp > yp for all p ≥ n0, then in general we do not get α > β, only α ≥ β.

Since each rational Cauchy sequence is bounded, there exists for α ∈ ℝ an s ∈ ℚ with α < s. Since ℚ is Archimedean ordered, there exists an n ∈ ℕ with s < n. Therefore, for each α ∈ ℝ there exists an n ∈ ℕ with α < n. If α, β > 0 then αβ⁻¹ > 0, and there exists an n ∈ ℕ with αβ⁻¹ < n, that is, α < βn. Therefore we have the following

Theorem 9.28. ℝ is an Archimedean ordered field.

Remark 9.29. In any ordered field we may introduce the concepts of absolute value, Cauchy sequence and zero sequence, in particular in the Archimedean ordered field ℝ. The set CF(ℝ) of the Cauchy sequences in ℝ forms analogously a ring with unit element. For the Cauchy sequences in ℝ we write analogously (αn)n∈ℕ with αn ∈ ℝ for all n ∈ ℕ. Let NF(ℝ) ⊂ CF(ℝ) be the set of the zero sequences in ℝ. The relation (αn)n∈ℕ ∼ (βn)n∈ℕ ∶⇔ (αn − βn)n∈ℕ is a zero sequence, is again an equivalence relation, and the set ℝ̄ = CF(ℝ)/∼ of the equivalence classes forms an Archimedean ordered field ℝ̄ with the respective addition and multiplication, which contains ℝ (after identifying α with the class of the constant sequence (α, α, …)). The Cauchy sequences (xn)n∈ℕ in ℚ can certainly be considered as Cauchy sequences in ℝ because ℚ ⊂ ℝ.

A sequence (αn)n∈ℕ in ℝ is called convergent in ℝ if there exists an element α ∈ ℝ such that for each ϵ ∈ ℝ, ϵ > 0, there is an n0 ∈ ℕ with |αn − α| < ϵ for all n ≥ n0. Then (αn − α)n∈ℕ is a zero sequence in ℝ. As usual, α is called the limit of (αn)n∈ℕ, written as α = limn→∞ αn. The limit α is uniquely determined. Indeed, let α = limn→∞ αn and β = limn→∞ αn, and let ϵ > 0. Then there exists an N such that |αn − α| ≤ ϵ/2 for n > N, and there exists an N1 such that |αn − β| ≤ ϵ/2 for n ≥ N1. For n > max{N, N1} we then have |α − β| ≤ |α − αn| + |αn − β| ≤ ϵ. Since ϵ was arbitrary we must have α = β, and the limit is unique.

Lemma 9.30. Let α ∈ ℝ be defined by the rational Cauchy sequence (xn)n∈ℕ, that is, α is the equivalence class of (xn)n∈ℕ. Then α = limn→∞ xn.

Proof. Let ϵ ∈ ℝ, ϵ > 0. Since ℝ is Archimedean ordered, there exists an n ∈ ℕ with 1/n < ϵ. Hence there exists an ϵ′ ∈ ℚ with 0 < ϵ′ < ϵ. For ϵ′ there exists an n0 ∈ ℕ such that |xp − xq| < ϵ′ for all p, q ≥ n0, that is, xp − xq < ϵ′ and xq − xp < ϵ′ for all p, q ≥ n0. By Lemma 9.27 we get xp − α ≤ ϵ′ and α − xp ≤ ϵ′, and hence |xp − α| ≤ ϵ′ < ϵ for all p ≥ n0. Therefore (xn − α)n∈ℕ is a zero sequence and limn→∞ xn = α.

We call an ordered field Cauchy complete if every Cauchy sequence is also a convergent sequence. We now show that the field ℝ is Cauchy complete, that is, each Cauchy sequence (αn)n∈ℕ in ℝ already has a limit in ℝ, which means that ℝ̄ = ℝ. Let (αn)n∈ℕ be a Cauchy sequence in ℝ. Each αn is defined by a rational Cauchy sequence with limit αn, that is, for each αp and each ϵ ∈ ℝ, ϵ > 0, there exists an approximating ap ∈ ℚ with |ap − αp| < (1/3)ϵ. Now, let ϵ ∈ ℝ, ϵ > 0. We may choose n0 sufficiently large such that |αp − αq| < (1/3)ϵ,

|ap − αp| < (1/3)ϵ and |aq − αq| < (1/3)ϵ for all p, q ≥ n0. Then

\[ |a_p - a_q| \le |a_p - \alpha_p| + |\alpha_p - \alpha_q| + |a_q - \alpha_q| < \tfrac{1}{3}\epsilon + \tfrac{1}{3}\epsilon + \tfrac{1}{3}\epsilon = \epsilon. \]

Therefore, the an form a Cauchy sequence in ℚ, which defines an element α ∈ ℝ. The Cauchy sequence (αn)n∈ℕ differs from this Cauchy sequence (an)n∈ℕ in ℚ only by a zero sequence (an − αn)n∈ℕ. Hence, the sequences (an)n∈ℕ and (αn)n∈ℕ have the same limit.

Altogether we have, starting from the rational numbers ℚ, constructed a field ℝ with the following properties:
(1) ℚ is a subfield of ℝ.
(2) ℝ is Archimedean ordered.
(3) ℚ is dense in ℝ, that is, for α ∈ ℝ and ϵ ∈ ℝ, ϵ > 0, there always exists an x ∈ ℚ with |x − α| < ϵ.

(4) Each Cauchy sequence (αn)n∈ℕ in ℝ has a limit in ℝ, that is, (αn)n∈ℕ converges in ℝ. We say that ℝ is Cauchy complete.

We call ℝ the field of the real numbers. In this construction a real number is an equivalence class of a Cauchy sequence in ℚ. The field ℝ is called a Cauchy completion of ℚ.

Now we further analyze the field ℝ. Let K be an Archimedean ordered field. Let M ⊂ K, M ≠ ∅. If there exists an element s ∈ K such that a ≤ s for all a ∈ M, then s is called an upper bound of M, and M is called bounded above. An upper bound s is called a least upper bound (lub) or supremum of M, written sup(M), if the following holds: If s′ ∈ K is an upper bound of M, then s ≤ s′. If a least upper bound of M exists, then it is uniquely determined.

Theorem 9.31 (The least upper bound property in ℝ). Each non-empty subset M ⊂ ℝ which is bounded above has a least upper bound in ℝ.

We often say the real numbers ℝ satisfy the least upper bound property or lub property.

Proof. Let s be an upper bound of M, m1 ∈ ℤ with m1 > s, μ ∈ M arbitrary and m2 ∈ ℤ with m2 > −μ. Then −m2 < μ < m1. For each natural number p we form the finitely many fractions k ⋅ 2^{−p}, k ∈ ℤ, such that −m2 ≤ k ⋅ 2^{−p} ≤ m1. We search for the smallest among those fractions which are upper bounds of M. Such a fraction exists, because m1 itself is an upper bound of M. We denote this smallest upper bound by ap. Then ap − 2^{−p} is no longer an upper bound. Therefore, for each q > p, we have ap − 2^{−p} < aq ≤ ap. It follows that |ap − aq| < 2^{−p} and, hence, |ap − aq| < 2^{−n} for all p, q > n. By the Archimedean property, for a given ϵ > 0 we always find a k ∈ ℕ with k > 1/ϵ and further an n ∈ ℕ with 2^n > k > 1/ϵ. Then 2^{−n} < ϵ. Hence the sequence (an)n∈ℕ is a Cauchy sequence in ℝ. Therefore there exists an α ∈ ℝ with ap − 2^{−p} ≤ α ≤ ap for all p.

α is an upper bound of M, that is, all elements μ ∈ M are less than or equal to α. Indeed, if we had μ > α for some μ ∈ M, then we would find a p with 2^p > (μ − α)^{−1}. Then we would have 2^{−p} < μ − α. Adding this to ap − 2^{−p} ≤ α, we would get ap < μ, which is not the case because ap is an upper bound of M. Hence μ ≤ α for all μ ∈ M.

Also, α is the least upper bound of M. Assume ν is a smaller upper bound of M. Then we can find a p ∈ ℕ with 2^{−p} < α − ν. Since ap − 2^{−p} is not an upper bound of M, there exists a μ ∈ M with ap − 2^{−p} < μ. From this we get ap − 2^{−p} < ν. Adding this to 2^{−p} < α − ν we get ap < α, which is not the case. Hence, α is the least upper bound of M.

Definition 9.32. Let K and K′ be Archimedean ordered fields. K and K′ are called order-isomorphic if there exists a bijection φ ∶ K → K′ with


(1) φ(a + b) = φ(a) + φ(b).
(2) φ(ab) = φ(a)φ(b).
(3) a < b in K ⇒ φ(a) < φ(b) in K′.

In the usual manner we see that also the inverse map φ⁻¹ ∶ K′ → K satisfies the properties (1), (2) and (3).

Theorem 9.33 (Characterization of the Real Numbers). Let K be an Archimedean ordered field. Then the following hold:
(1) K is order-isomorphic to a subfield K′ of ℝ.
(2) If in K the theorem about the least upper bound holds, that is, if each bounded above, non-empty subset M of K has a least upper bound in K, then K is order-isomorphic to ℝ.

Proof. We remark that we may consider ℚ as a subfield of K. Each element a ∈ K is the least upper bound of a non-empty set M of rational numbers. We may choose for M the set of all rational numbers r with r < a. The same set M has in ℝ a least upper bound a′. We consider the map φ ∶ K → ℝ, a ↦ a′. This map certainly is injective. Let K′ = φ(K). If a, b ∈ K then φ(a + b) = a′ + b′ = φ(a) + φ(b). If a, b ∈ K and a > 0, b > 0, then φ(ab) = a′b′ = φ(a)φ(b). Since (−a)b = −ab and (−a)(−b) = ab in K we have in general φ(ab) = φ(a)φ(b) for a, b ∈ K. Hence K is isomorphic to K′. If a ∈ K is positive in K, that is, a > 0, then φ(a) ∈ K′ is positive in K′. Hence K and K′ are order-isomorphic. This proves part (1).

Now, assume that in K the theorem about the least upper bound (Theorem 9.31) holds. Then in particular each bounded above, non-empty set of rational numbers has a least upper bound in K, and the same set also has a least upper bound a′ in K′. From this we get that each real number is in K′, because ℚ is dense in ℝ, which means in particular that each real number is the least upper bound of a set of rational numbers. Hence, φ(K) = K′ = ℝ, which proves (2).

It follows from the theorem that the real numbers are characterized algebraically as the unique Archimedean ordered field that satisfies the lub property.

Corollary 9.34. In ℝ the following are equivalent:
(1) Each Cauchy sequence in ℝ has a limit in ℝ, that is, ℝ is Cauchy complete.
(2) Each bounded above, non-empty set in ℝ has a least upper bound.

A closed interval [a, b] in ℝ is a subset of the form [a, b] = {x ∈ ℝ; a ≤ x ≤ b}. We say that a sequence of closed intervals In = [an, bn] is nested if In+1 ⊂ In for all n. The real numbers also satisfy the following property called the nested intervals property. This property becomes important in proving many further analytic properties of ℝ.

Theorem 9.35. Let In = [an, bn] be a nested sequence of closed intervals in ℝ. If the length of In goes to zero as n → ∞ then there exists a unique point x0 common to all the intervals.

The proof of this depends on the following theorem, which shows that within the real numbers the lub property, Cauchy completeness and the nested intervals property are all equivalent. We leave the proof to the exercises.

Theorem 9.36. Within the real numbers ℝ the following are equivalent:
(1) The lub property.
(2) The nested intervals property.
(3) The Cauchy completeness property.
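As an illustration of the nested intervals property, the following sketch builds nested closed intervals with rational endpoints whose lengths halve at each step and whose common point is √2. The choice of √2, the bisection scheme and the function name are assumptions made for this example only.

```python
from fractions import Fraction


def nested_intervals_sqrt2(steps: int):
    """Return nested closed intervals [a_n, b_n] with rational endpoints around sqrt(2)."""
    a, b = Fraction(1), Fraction(2)      # 1^2 <= 2 < 2^2
    intervals = [(a, b)]
    for _ in range(steps):
        mid = (a + b) / 2
        if mid * mid <= 2:               # keep the half that still contains sqrt(2)
            a = mid
        else:
            b = mid
        intervals.append((a, b))
    return intervals
```

After n steps the interval has length 2^{−n}, so the lengths go to zero as required in Theorem 9.35.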

9.2 Decimal representation of real numbers

Most students think of the real numbers as the collection of all decimal numbers. In this section we consider the connection between real numbers and decimal numbers. To start, let g ∈ ℕ, g ≥ 2, and m ∈ ℕ. Then m has a unique representation in terms of powers of g.

Theorem 9.37. Let g ∈ ℕ, g ≥ 2, and m ∈ ℕ. Then m has a unique representation

\[ m = g_0 + g_1 \cdot g + g_2 \cdot g^2 + \cdots + g_k \cdot g^k \tag{9.1} \]

with 0 ≤ gi < g, gk ≠ 0 and k ∈ ℕ0.

Proof. If 1 ≤ m < g then this unique representation is certainly given by m = g0 with k = 0, because each number of the form in (9.1) with k ≥ 1 is greater than or equal to g. Hence, in particular the case m = 1 is done. Now, let m ≥ 2. We assume that we have such a unique representation for all natural numbers less than or equal to m − 1. We write m = q ⋅ g + r with q ∈ ℕ0 and r with 0 ≤ r < g using the division algorithm. Here q and r are uniquely determined (see Chapter 2). The desired representation (9.1) can be written as m = g0 + (g1 + g2 ⋅ g + ⋯ + gk g^{k−1}) ⋅ g.

Now let g0 = r and q = g1 + g2 ⋅ g + ⋯ + gk g^{k−1}, which is possible because q < m; then we have a representation of m. Here g0 = r is unique, and g1, g2, …, gk are unique by the induction hypothesis.

Convention. In what follows we assume that g = 10. We introduce the decimal representation of real numbers and show that each real number has an essentially unique representation as a decimal number and that each decimal number represents a real number. We could do that for any natural number g ≥ 2 instead of g = 10. This leads to the g-ary numbers; for g = 2 we get the binary representation of real numbers. The proofs for general g are totally analogous, and we leave them as an exercise. We here concentrate on g = 10, that is, the decimal representations of real numbers.

Definition 9.38. A sequence (dn)n∈ℕ with

\[ d_n = \pm \sum_{m=-k}^{n} a_m 10^{-m}, \quad k \in \mathbb{N}_0, \]

all am ∈ ℕ0 with 0 ≤ am < 10 for m = −k, −k + 1, …, −1, 0, 1, 2, …, n, is called a decimal fraction. We write for this sequence

\[ \pm \sum_{m=-k}^{\infty} a_m 10^{-m} \]

or

±a−k a−k+1 ⋯ a−1 a0 , a1 a2 ⋯ .

Example 9.39. Consider the number 215,3214 where a5 = a6 = ⋯ = 0. Here k = 2 and a−2 = 2, a−1 = 1, a0 = 5, a1 = 3, a2 = 2, a3 = 1, a4 = 4.

Theorem 9.40. Each decimal fraction

\[ \pm \sum_{m=-k}^{\infty} a_m 10^{-m} \]

is a Cauchy sequence in ℝ. Hence, there exists a real number r which is the limit of the decimal fraction.

Proof. Since a constant multiple of a Cauchy sequence is still a Cauchy sequence, it is enough to assume that k = 0 and to show that the sequence (dn)n∈ℕ with dn = ∑_{m=0}^{n} am 10^{−m} is a Cauchy sequence. Let p, q ∈ ℤ with p ≥ q ≥ 0. Then

\[ |d_p - d_q| = \sum_{j=q+1}^{p} a_j 10^{-j} \le 9 \cdot \sum_{j=q+1}^{p} 10^{-j} = 9 \cdot 10^{-(q+1)} \sum_{j=0}^{p-(q+1)} 10^{-j} = \frac{9}{10^{q+1}} \cdot \frac{1 - 10^{-(p-q)}}{1 - 10^{-1}} \le \frac{9}{10^{q} \cdot 9} = 10^{-q}. \]

Because limq→∞ 10^{−q} = 0 there is for each ϵ ∈ ℝ, ϵ > 0, an n0 ∈ ℕ with 10^{−n0} < ϵ. Then |dp − dq| < ϵ for all p, q > n0, p ≥ q, because |dp − dq| ≤ 10^{−q} < 10^{−n0} < ϵ.

Theorem 9.41. For all r ∈ ℝ there exists a decimal fraction whose limit is equal to r.

Proof. We show that for all r ∈ ℝ, r ≥ 0, there exists a sequence (an)n≥−k with 0 ≤ an < 10 for all n ∈ ℤ, n ≥ −k, and r = ∑_{n=−k}^{∞} an 10^{−n}. Then it is clear that also for r < 0 such a decimal fraction exists with r = −∑_{n=−k}^{∞} an 10^{−n}. So, let r ≥ 0. If r = 0, then there is nothing to show, we just take an = 0 for all n ≥ −k. Let r > 0. There is an m ∈ ℕ0 with 10^m > r/10, that is, 10^{m+1} > r, by the Archimedean property for ℝ. Hence, M = {m ∣ m ∈ ℕ0, 10^{m+1} > r} ≠ ∅, and M ⊂ ℕ0. Therefore, M contains a smallest element k. We define recursively a sequence (an)n≥−k of integers such that for each real number

\[ x_n = \sum_{j=-k}^{n} a_j 10^{-j}, \quad n \in \mathbb{Z},\ n \ge -k, \]

the inequality xn ≤ r < xn + 10^{−n} holds. For n = −k we consider the 11 real numbers 0 = 0 ⋅ 10^k < 1 ⋅ 10^k < 2 ⋅ 10^k < ⋯ < 10 ⋅ 10^k = 10^{k+1}, see Figure 9.1.

Figure 9.1: Approach for the construction.

Since 0 < r < 10^{k+1} there exists a unique a−k ∈ ℕ0 with 0 ≤ a−k ≤ 9 and a−k ⋅ 10^k ≤ r < (a−k + 1) ⋅ 10^k. For this a−k the condition (9.2) holds because x−k = a−k ⋅ 10^k ≤ r < a−k ⋅ 10^k + 10^k = x−k + 10^k. Now, let n ≥ −k and let aj for j ∈ {−k, −k + 1, …, n} be defined so that (9.2) holds, that is,

\[ x_n \le r < x_n + 10^{-n}. \tag{9.2} \]

We consider the 11 real numbers xn < xn + 10^{−n−1} < xn + 2 ⋅ 10^{−n−1} < ⋯ < xn + 9 ⋅ 10^{−n−1} < xn + 10 ⋅ 10^{−n−1} = xn + 10^{−n}.

Then there exists a unique an+1 ∈ ℕ0 with 0 ≤ an+1 ≤ 9 and xn + an+1 ⋅ 10^{−n−1} ≤ r < xn + (an+1 + 1) ⋅ 10^{−n−1}. Now, by definition of the xn we have xn+1 = xn + an+1 10^{−n−1}, hence,

\[ x_{n+1} \le r < x_{n+1} + 10^{-n-1}. \tag{9.3} \]

Hereby we have constructed the sequence (an)n≥−k such that (9.3) holds. Because of (9.3) we have |r − xn| < 10^{−n} for all n ≥ −k. With limn→∞ 10^{−n} = 0 we get

\[ r = \sum_{n=-k}^{\infty} a_n \cdot 10^{-n}. \]
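The recursive construction in the proof of Theorem 9.41 can be followed step by step for a positive rational r, where all arithmetic is exact. This is a minimal sketch under that assumption; the function name `decimal_digits` and the cut-off parameter `n_max` are illustrative and do not appear in the text.

```python
from fractions import Fraction


def decimal_digits(r: Fraction, n_max: int):
    """Return (k, [a_{-k}, ..., a_{n_max}]) for a rational r > 0, as in Theorem 9.41."""
    if r <= 0:
        raise ValueError("r must be positive")
    # k is the smallest element of M = {m in N_0 : 10^(m+1) > r}.
    k = 0
    while Fraction(10) ** (k + 1) <= r:
        k += 1
    digits = []
    x = Fraction(0)                       # partial sum x_n
    for n in range(-k, n_max + 1):
        step = Fraction(10) ** (-n)       # 10^(-n)
        a_n = int((r - x) / step)         # unique digit with x + a_n*step <= r < x + (a_n + 1)*step
        assert 0 <= a_n <= 9
        x += a_n * step
        assert x <= r < x + step          # invariant x_n <= r < x_n + 10^(-n)
        digits.append(a_n)
    return k, digits
```

For r = 215,3214 the sketch yields k = 2 and the digits 2, 1, 5, 3, 2, 1, 4, in agreement with Example 9.39. For r = 1 it produces 1,000…, never 0,999…, because the construction always keeps r strictly below xn + 10^{−n}.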

Remark 9.42. Let r ∈ ℝ. Then the decimal fraction for r is in general not unique.

Example 9.43. 1 = 1,000… = 0,999…, because

\[ \sum_{n=1}^{\infty} \frac{9}{10^n} = \frac{9}{10} \sum_{n=0}^{\infty} \frac{1}{10^n} = \frac{9}{10} \cdot \frac{10}{9} = 1. \]

Theorem 9.44. Let r ∈ ℝ, r > 0. The decimal fraction r = ∑_{n=−k}^{∞} an ⋅ 10^{−n}, k ∈ ℕ0, an ∈ ℕ0, 0 ≤ an < 10 for all n ≥ −k, of r is unique if we forbid either “an = 9 for almost all an” (that means, for all up to at most finitely many an) or “an = 0 for almost all an”.

Proof. Let

\[ r = \sum_{n=-k}^{\infty} a_n \cdot 10^{-n} = \sum_{n=-\ell}^{\infty} b_n \cdot 10^{-n}. \]

We write

\[ z_1 = \sum_{n=-k}^{0} a_n \cdot 10^{-n} \in \mathbb{N}_0 \quad \text{and} \quad z_2 = \sum_{n=-\ell}^{0} b_n \cdot 10^{-n} \in \mathbb{N}_0. \]

Let z1 ≠ z2, and without loss of generality, let z1 > z2. Then

\[ 0 = z_2 - z_1 + \sum_{n=1}^{\infty} \frac{b_n - a_n}{10^n} \le -1 + \frac{9}{10} \cdot \frac{1}{1 - 10^{-1}} = 0. \]

Then z2 − z1 = −1 and bn − an = 9 for all n ≥ 1, that means, z1 = z2 + 1 and an = 0, bn = 9 for all n ≥ 1, which was excluded.

Now, let z1 = z2 and bm = am for 1 ≤ m ≤ p − 1, bp ≠ ap, p ∈ ℕ. Without loss of generality, let ap > bp. We consider

\[ r' = r \cdot 10^p = z_1' + \sum_{n=1}^{\infty} a_n' \cdot 10^{-n} = z_2' + \sum_{n=1}^{\infty} b_n' \cdot 10^{-n} \]

with

z1′ = z1 ⋅ 10^p + a1 ⋅ 10^{p−1} + a2 ⋅ 10^{p−2} + ⋯ + ap,
z2′ = z2 ⋅ 10^p + b1 ⋅ 10^{p−1} + b2 ⋅ 10^{p−2} + ⋯ + bp,
a′n = an+p and b′n = bn+p.

Because z1 = z2, am = bm for 1 ≤ m ≤ p − 1 and ap > bp, we get z1′ > z2′. Applying the first part of the proof to r′, we get z1′ = z2′ + 1, a′n = 0, b′n = 9 for all n ≥ 1, or equivalently ap = bp + 1, ap+i = 0, bp+i = 9 for all i ≥ 1, which was excluded. Together with the first part, this gives the uniqueness result.

9.3 Periodic decimal numbers and the rational numbers

In the previous section we showed that each real number has a decimal expansion. Since the rationals are a subfield of the reals, it follows that each rational number also has a decimal expansion. In this section we show that the rationals are characterized precisely as those decimal expansions which either terminate or repeat, that is, the rational numbers are precisely the finite and repeating decimals.

Definition 9.45. A decimal representation of a real number is said to be repeating if it becomes periodic (repeating its values at regular intervals) and the infinitely repeated digit sequence, the repetend, is not zero. A decimal representation is called terminating if all except finitely many digits are zero. Then, the infinitely repeated digit sequence of zeros can be omitted. The convention is generally to indicate a repeating decimal by drawing a horizontal line above the repetend.

Examples 9.46.
(1) \( \frac{1}{3} = 0{,}\overline{3} \);
(2) \( \frac{16}{99} = 0{,}\overline{16} \);
(3) \( \frac{1}{7} = 0{,}\overline{142857} \).


Theorem 9.47. The set ℚ of the rational numbers is equal to the set of the repeating and terminating decimals.

Proof. Let m/n ∈ ℚ, m ∈ ℤ, n ∈ ℕ, gcd(n, m) = 1. Without loss of generality, we may assume that m ∈ ℕ with m < n (it is clear that we may assume that m ≥ 0, the case m = 0 is trivial, and by the division algorithm m = qn + r with 0 ≤ r < n and m/n = q + r/n with q ∈ ℤ). We form m,000… and divide this by n. After each division by n we get a remainder between 0 and n − 1. If we get the remainder 0, then the decimal representation terminates. If the remainder is not zero, then we get at the latest after n − 1 steps a remainder which already occurred. From this point on the pattern repeats itself. This also shows that the maximal length of the repetend is n − 1.

Conversely x ∈ ℚ if its decimal representation is terminating. Now, let x have a repeating decimal representation. We may assume that x > 0. If x > 1, then x = p + y with p ∈ ℕ and 0 < y < 1, and here y has a repeating decimal representation. Hence, we may assume that 0 < x < 1. Let the repetend start at the (k + 1)-th decimal digit. For example, if \( x = 0{,}15\overline{672} \), then k = 2. Then 10^k x = x1 + y1 with x1 ∈ ℤ, where y1 has a repeating decimal representation for which the repetend starts at the first decimal digit. Then 10^k x ∈ ℚ (and therefore also x ∈ ℚ) if and only if y1 ∈ ℚ. Therefore we now may assume that 0 < x < 1 and the repetend starts at the first decimal digit. Let n be the length of the repetend. Then we get 10^n x = m + x with m ∈ ℤ, that is, 10^n x − x = m and therefore x = m/(10^n − 1) ∈ ℚ.

Example 9.48. \( x = 0{,}\overline{672} \). Then 10³x − x = 672 and hence

\[ x = \frac{672}{10^3 - 1} = \frac{672}{999} = \frac{224}{333}. \]
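Both directions of the proof of Theorem 9.47 can be traced computationally. The sketch below performs the long division while recording remainders, and converts a prefix and a repetend back into a fraction via x = m/(10^n − 1). The function names `expand` and `to_fraction` are assumptions introduced for this illustration.

```python
from fractions import Fraction


def expand(m: int, n: int):
    """Long division of m/n with 0 <= m < n: return (prefix digits, repetend digits)."""
    seen = {}                    # remainder -> position at which it first occurred
    digits = []
    rem = m
    while rem != 0 and rem not in seen:
        seen[rem] = len(digits)
        rem *= 10
        digits.append(rem // n)
        rem %= n
    if rem == 0:
        return digits, []        # terminating decimal
    start = seen[rem]            # the pattern repeats from this position on
    return digits[:start], digits[start:]


def to_fraction(prefix, repetend):
    """Rebuild the rational 0,prefix followed by the repeated repetend."""
    k, n = len(prefix), len(repetend)
    head = int("".join(map(str, prefix)) or "0")
    if n == 0:
        return Fraction(head, 10 ** k)
    rep = int("".join(map(str, repetend)))
    # 10^k * x = head + rep / (10^n - 1)
    return (head + Fraction(rep, 10 ** n - 1)) / 10 ** k
```

For instance, `expand(1, 7)` gives an empty prefix and the repetend 142857, and `to_fraction([], [6, 7, 2])` returns 224/333 as in Example 9.48.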

A real number which is not rational is called irrational. It follows that the decimal expansion of an irrational number neither terminates nor repeats. If x ∈ ℝ is irrational, we may choose the sequence of rational numbers (rn), where rn is the rational number given by the first n digits of x. Then clearly x is the limit of the rational sequence (rn), and we recover the following result showing that ℚ is dense in ℝ.

Lemma 9.49. Each irrational number is the limit of a sequence of rationals and hence each irrational number is arbitrarily close to a rational number.

9.4 The uncountability of ℝ

Using the expression of the real numbers ℝ as decimal numbers we prove that there are uncountably many real numbers. We remind the reader of some definitions.

Definition 9.50. An infinite set M is called countable if there exists a bijection φ ∶ M → ℕ, otherwise it is called uncountable.

Examples 9.51. (1) ℕ and ℤ are countable. (2) ℚ is countable. This we see from the following scheme:

0
1     2/1   3/1   4/1   5/1   6/1   ⋯
1/2   2/2   3/2   4/2   5/2   6/2   ⋯
1/3   2/3   3/3   4/3   5/3   6/3   ⋯
1/4   2/4   3/4   4/4   5/4   6/4   ⋯
1/5   2/5   3/5   4/5   5/5   6/5   ⋯
1/6   2/6   3/6   4/6   5/6   6/6   ⋯
⋮     ⋮     ⋮     ⋮     ⋮     ⋮
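One way to turn the scheme above into an explicit list of the positive rationals is to walk it along its anti-diagonals and skip fractions that have already appeared. The traversal order used here is an assumption, since the scheme itself does not fix one, and the generator name is illustrative.

```python
from fractions import Fraction
from itertools import count, islice


def positive_rationals():
    """Yield every positive rational exactly once, walking the anti-diagonals of the scheme."""
    seen = set()
    for s in count(2):                  # s = numerator + denominator
        for num in range(1, s):
            q = Fraction(num, s - num)
            if q not in seen:           # skip repeats such as 2/2 = 1/1
                seen.add(q)
                yield q


first_six = list(islice(positive_rationals(), 6))
```

Each positive rational m/n is reached on the anti-diagonal with m + n = s, so it receives a finite position in the list. Interleaving 0 and the negatives then gives a listing of all of ℚ.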

Theorem 9.52. ℝ is uncountable.

Proof. We will show that the unit interval [0, 1] is uncountable. It then follows that all of ℝ is uncountable. Suppose that [0, 1] is countable. Then we may give the elements of [0, 1] in a list as follows:

r1 = x1 , a11 a12 a13 a14 …
r2 = x2 , a21 a22 a23 a24 …
r3 = x3 , a31 a32 a33 a34 …
⋮

We now show that we may construct a real number which is not in the list. The number 0, b1 b2 b3 b4 … with

b1 ≠ a11 and b1 ≠ 9,
b2 ≠ a22 and b2 ≠ 9,
b3 ≠ a33 and b3 ≠ 9,
b4 ≠ a44 and b4 ≠ 9,
⋮

is not in the list because it differs from each number in the list in at least one digit. Recall that we assumed bi ≠ 9 for all i to get the uniqueness of the decimal fraction (see Theorem 9.44). This contradicts the assumption that [0, 1] is countable and r1, r2, r3, … is a listing of [0, 1]. Hence, [0, 1] is uncountable. Therefore ℝ is uncountable.

We let |S| denote the cardinality of a set S. If S is countably infinite we say |S| = ℵ0, while if S has the same size as ℝ we say |S| = c, called the size of the continuum. From the theory of cardinality we have ℵ0 < c. The continuum hypothesis, one of the most important questions in mathematics, is the statement that no set has a size strictly between ℵ0 and c. For more on cardinality and the continuum hypothesis see [3].
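The diagonal construction in the proof of Theorem 9.52 can be sketched for the first n entries of a purported listing. The particular rule for choosing bi (5, or 4 when the diagonal digit is 5) is an assumption; any digit other than aii and 9 would serve equally well.

```python
def diagonal_digits(rows):
    """Given digit lists rows[i] = [a_i1, a_i2, ...], return b_1..b_n with b_i != a_ii and b_i != 9."""
    result = []
    for i, row in enumerate(rows):
        a_ii = row[i]                             # the diagonal digit of the i-th number
        result.append(5 if a_ii != 5 else 4)      # differs from a_ii and is never 9
    return result
```

For the rows 1,4,1 and 5,5,5 and 9,9,9 the sketch returns the digits 5, 4, 5, which differ from the respective diagonal digits 1, 5, 9.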

9.5 Continued fraction representation of real numbers

Besides having decimal expansion representations, each real number x has a representation as a continued fraction. In this section we look at this representation. A continued fraction representation of a real number is an expression obtained through an iterative process of representing a real number r as the sum of its integer part and its fractional part, then writing, if the fractional part is nonzero, the reciprocal of the fractional part of r as the sum of its integer part and its fractional part, and so on. In a finite continued fraction the iteration is terminated after finitely many steps.

More concretely, let r be a real number. First let r ∉ ℤ, and let a0 = ⌊r⌋ ∶= max{x ∈ ℤ ∣ x ≤ r} be the integer part of r and f1 ∶= {r} = r − a0 be the fractional part of r. Then r = a0 + f1. Since r ∉ ℤ we have f1 ≠ 0, and we consider 1/f1. Analogously 1/f1 = a1 + f2. If we continue we get a continued fraction representation of r as

\[ [a_0; a_1, a_2, a_3, \ldots] = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cdots}}}, \]

where [a1; a2, a3, …] is the continued fraction of 1/f1. The ai are called the coefficients of the continued fraction. If r ∈ ℤ then we just have r = a0 ∈ ℤ.

To calculate a continued fraction representation of a real number r computationally, write down the integer part a0 of r. Subtract this integer part from r. If the difference is 0, stop; otherwise find the reciprocal 1/f1 of the difference f1 = r − a0 and repeat. The procedure will stop if and only if r is rational. This follows from the fact that the corresponding Euclidean algorithm terminates. Recall, if r = m/n, m, n ∈ ℕ, and r = a0 + f1 with 0 ≤ f1 < 1, then m = a0 n + f1 n and 0 ≤ f1 n < n. A finite continued fraction automatically is a rational number.

Example 9.53. r = 3,245 = 3 + 49/200. Then a0 = 3, f1 = 49/200, 1/f1 = 200/49 = 4 + 4/49. Therefore a1 = 4, f2 = 1/f1 − 4 = 4/49 = 1/(12 + 1/4), and finally a2 = 12, f3 = 1/4, 1/f3 = 4. Hence

\[ 3{,}245 = 3 + \cfrac{1}{4 + \cfrac{1}{12 + \cfrac{1}{4}}}. \]

Every finite continued fraction represents a rational number, and every rational number can be expressed in precisely two different ways as a finite continued fraction, with the conditions that the first coefficient a0 is an integer and the other coefficients ai , i ≥ 1, are positive integers. These two representations agree except in their final terms. In the longer representation the final term in the continued fraction is 1; in the shorter representation the final 1 is dropped, but the final term here is increased by 1: [a0 ; a1 , a2 , … , an−1 , an , 1] = [a0 ; a1 , a2 , … , an−1 , an + 1]. Example 9.54. −4, 2 = [−5; 1, 4] = [−5; 1, 3, 1]. It follows that every infinite continued fraction is irrational, that is, not rational, and it can be proved that every irrational number can be expressed in precisely one way as an infinite continued fraction.
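The iterative procedure described above is short enough to sketch for rational input, where it is guaranteed to stop. The function names `continued_fraction` and `evaluate` are assumptions made for this illustration; exact arithmetic with `Fraction` avoids rounding problems.

```python
from fractions import Fraction
from math import floor


def continued_fraction(r: Fraction):
    """Return the coefficients [a0; a1, a2, ...] of a rational r (the shorter of the two forms)."""
    coeffs = []
    while True:
        a = floor(r)             # integer part
        coeffs.append(a)
        f = r - a                # fractional part, 0 <= f < 1
        if f == 0:
            return coeffs
        r = 1 / f                # reciprocal of the fractional part


def evaluate(coeffs):
    """Evaluate a finite continued fraction from the innermost term outwards."""
    value = Fraction(coeffs[-1])
    for a in reversed(coeffs[:-1]):
        value = a + 1 / value
    return value
```

With these definitions, `continued_fraction(Fraction(3245, 1000))` gives [3, 4, 12, 4] as in Example 9.53, and both `evaluate([-5, 1, 4])` and `evaluate([-5, 1, 3, 1])` return −21/5 = −4,2 as in Example 9.54.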

9.6 Theorem of Dirichlet and Cauchy’s Inequality

We already know from the decimal expansion that irrational numbers can be approximated by rational numbers. The following theorem of P. G. L. Dirichlet (1805–1859) shows this very efficiently.

Theorem 9.55 (Theorem of Dirichlet). For each irrational number α there exist infinitely many numbers p, q ∈ ℤ with

\[ \left| \alpha - \frac{p}{q} \right| < \frac{1}{q^2}. \]

Proof. Let {x} be the fractional part and ⌊x⌋ be the integer part of a real number x ∈ ℝ. For N ≥ 1 consider the numbers {iα} ∈ [0, 1] for i = 1, 2, …, N + 1, where [0, 1] = {x ∈ ℝ ∣ 0 ≤ x ≤ 1}. If we split the interval [0, 1] equally into N intervals of length 1/N, then we find two numbers {iα} and {jα}, i ≠ j, 1 ≤ i, j ≤ N + 1, which lie in the same interval; we label them so that {iα} ≥ {jα}. Therefore,

\[ 0 \le \alpha(i - j) - (\lfloor \alpha i \rfloor - \lfloor \alpha j \rfloor) = \{i\alpha\} - \{j\alpha\} \le \frac{1}{N}. \]

With q = i − j and p = ⌊αi⌋ − ⌊αj⌋ we have 1 ≤ |q| ≤ N and |qα − p| ≤ 1/N; the inequality is even strict because qα − p is irrational. Hence we get

\[ \left| \alpha - \frac{p}{q} \right| < \frac{1}{N|q|} \le \frac{1}{q^2}. \]

If we choose N successively greater, then we get infinitely many rational approximations which satisfy our condition.

Example 9.56. A rational approximation of π, the ratio of a circle’s circumference to its diameter, is 355/113:

\[ \left| \pi - \frac{355}{113} \right| < \frac{1}{113^2}. \]

We have 355/113 ≈ 3,1415929; thus, we get the first six decimal places of π. For a more intensive discussion about π see Chapter 12.

We now show another important inequality, the Cauchy Inequality between the arithmetic and the geometric mean.

Theorem 9.57 (Cauchy’s Inequality). Let n ∈ ℕ and a1, a2, …, an be positive real numbers. Then

\[ \sqrt[n]{a_1 a_2 \cdots a_n} \le \frac{a_1 + a_2 + \cdots + a_n}{n}. \]

Proof. We denote the inequality by I(n), n ∈ ℕ. Certainly I(1) holds. I(2) is correct because from 0 ≤ (a1 − a2)² we get 4a1a2 ≤ (a1 + a2)². We now show: (a) I(n) implies I(n − 1), (b) I(n) implies I(2n), for n ≥ 2.

Proof of (a). We define

\[ A = \sum_{k=1}^{n-1} \frac{a_k}{n-1}. \]

Then

\[ A \cdot \prod_{k=1}^{n-1} a_k = A a_1 a_2 \cdots a_{n-1} \overset{I(n)}{\le} \left( \frac{\sum_{k=1}^{n-1} a_k + A}{n} \right)^{n} = \left( \frac{(n-1)A + A}{n} \right)^{n} = A^n \]

and hence

\[ \prod_{k=1}^{n-1} a_k \le A^{n-1} = \left( \sum_{k=1}^{n-1} \frac{a_k}{n-1} \right)^{n-1}. \]

This proves (a).

Proof of (b).

\[ \prod_{k=1}^{2n} a_k = \prod_{k=1}^{n} a_k \cdot \prod_{k=1}^{n} a_{n+k} \overset{I(n)}{\le} \left( \sum_{k=1}^{n} \frac{a_k}{n} \right)^{n} \left( \sum_{k=1}^{n} \frac{a_{n+k}}{n} \right)^{n} \overset{I(2)}{\le} \left( \frac{\sum_{k=1}^{2n} \frac{a_k}{n}}{2} \right)^{2n} = \left( \frac{\sum_{k=1}^{2n} a_k}{2n} \right)^{2n}, \]

that is,

\[ \sqrt[2n]{\prod_{k=1}^{2n} a_k} \le \frac{\sum_{k=1}^{2n} a_k}{2n}. \]

Now, (a) and (b) together with I(1) and I(2) prove the theorem by induction. Suppose that n ≥ 2 and I(n) holds. Then I(2n) holds and, hence, also I(2n − 1), I(2n − 2), …, I(n + 1).
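As a quick sanity check of Cauchy's Inequality on concrete numbers, one can compare both sides in the root-free form a1 ⋯ an ≤ ((a1 + ⋯ + an)/n)^n, which is equivalent for positive numbers and can be evaluated exactly for rational input. This is only a numerical illustration, not a proof; the function name is an assumption.

```python
from fractions import Fraction
from math import prod


def am_gm_holds(values) -> bool:
    """Check a_1*...*a_n <= ((a_1+...+a_n)/n)^n with exact rational arithmetic."""
    xs = [Fraction(v) for v in values]
    n = len(xs)
    return prod(xs) <= (sum(xs) / n) ** n
```

For the values 1, 2, 3, 4 the product is 24 and the n-th power of the arithmetic mean is (5/2)⁴ = 625/16, so the check succeeds; for equal values both sides coincide.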

9.7 p-adic numbers

So far we described the construction and uniqueness of the real numbers as extensions of the rational numbers. The main point was the Cauchy completion starting with the rational numbers ℚ. We have seen that there are algebraic extensions of ℚ that live within ℝ. In this section, we look at a separate type of extension of the rational field ℚ. Here we also use Cauchy completion but use a measure of distance other than the absolute value. For each prime number p we will get a new field, called the field of the p-adic numbers, denoted by ℚp. These fields will be constructed in a manner analogous to the way the real number system is constructed from ℚ. We leave out some details which are analogous to what was done for the reals ℝ and leave the proofs as exercises for the reader. The p-adic numbers were first developed by Kurt Hensel in 1897, and for each prime number p they can be considered as a completion of the rational numbers.

9.7.1 Normed fields and Cauchy completions

The real numbers ℝ are a completion of the rationals ℚ and are characterized as the unique (up to isomorphism) complete Archimedean ordered field. The question arises as to whether there are other completions of the rationals. The answer is yes, but they must be, by necessity, non-Archimedean, and further are of a very special type. Notice that the construction of ℝ from ℚ used the absolute value prominently, and Cauchy sequences and denseness were in terms of this distance. For the additional completions of ℚ we must define different distance functions on ℚ. We do this in general.

Definition 9.58. A norm on a field K is a function | | ∶ K → ℝ satisfying
(1) |x| ≥ 0 for all x ∈ K,
(2) |x| = 0 if and only if x = 0,
(3) |xy| = |x||y| for all x, y ∈ K,
(4) |x + y| ≤ |x| + |y| for all x, y ∈ K (triangle inequality).
A normed field is a field K with a norm.

For example ℚ and ℝ are normed fields with the usual absolute value. Any normed field K is a metric space under d(x, y) = |x − y|. Since a normed field K is a metric space, the concepts of convergence, Cauchy sequence, completeness and denseness of subsets are all defined on K analogously. As before we say that a normed field is complete if every Cauchy sequence within the field converges to an element in the field, that is, within the field K the concepts of Cauchy sequence and convergent sequence coincide. The basic result is that any normed field K can be embedded as a dense subset of a complete normed field K̄. The complete normed field obtained in this manner is called the Cauchy completion of K.

Theorem 9.59. Given a normed field K there exists a complete normed field K̄ for which K is a dense subfield. The field K̄ is called the Cauchy completion of K.

The proof of Theorem 9.59 is identical to the proof that ℝ can be constructed from ℚ. That proof used only the absolute value properties, which are the general norm properties. To construct K̄ from K we follow exactly the same steps as for ℝ. We let K̄ be the set of equivalence classes of Cauchy sequences from K under the equivalence relation that two Cauchy sequences are equivalent if their difference goes to zero. We then get that K̄ is a complete normed field and that K is a dense subset of K̄.

9.7.2 The p-adic fields

Considering ℝ as the completion of ℚ depended upon the absolute value as the norm on ℚ. The question arose as to whether ℚ could be completed in any other way. The answer is yes, but it requires a completely different norm on the rationals. As we saw in Theorem 9.33, the reals are characterized as a complete Archimedean ordered field. Hence if we are to complete ℚ relative to a different norm, this norm must be non-Archimedean. We say that a normed field K is Archimedean if for a, b ∈ K with |a| ≠ 0 there exists an n ∈ ℕ with |na| > |b| (see Definition 9.63). Before describing this new norm (actually infinitely many new norms) on ℚ we discuss some properties of norms in general. Since the completion of a normed field depends on Cauchy sequences, we consider two norms to be equivalent if they give rise to exactly the same Cauchy sequences.

Definition 9.60. Two norms on a field F are equivalent if their induced metrics are equivalent. That is, $|\ |_1$ is equivalent to $|\ |_2$ if a sequence is Cauchy with respect to one metric if and only if it is Cauchy with respect to the other.

The next result gives a condition for equivalence of norms.

Theorem 9.61. Two norms $|\ |_1$ and $|\ |_2$ on a field K are equivalent if and only if there exists an α > 0 such that $|x|_2 = |x|_1^{\alpha}$ for all x ∈ K.

Proof. Suppose that $|x|_2 = |x|_1^{\alpha}$ for all x ∈ K and suppose that $(x_n)_{n\in\mathbb{N}}$ is a Cauchy sequence relative to the first norm. Given ϵ > 0, let N be found for $\epsilon^{1/\alpha}$. Then for m, n > N we have $|x_n - x_m|_1 < \epsilon^{1/\alpha}$, so that $|x_n - x_m|_2 = |x_n - x_m|_1^{\alpha} < \epsilon$. Therefore $(x_n)_{n\in\mathbb{N}}$ is a Cauchy sequence relative to the second norm. The same argument with 1/α in place of α gives the reverse implication, and the two norms are equivalent.

Conversely, suppose the two norms are equivalent. Choose an a ∈ K, a ≠ 0, with $|a|_1 < 1$. This is possible since we have a nontrivial norm. Then let
$$\alpha = \frac{\log(|a|_2)}{\log(|a|_1)}.$$
It follows that $|a|_2 = (|a|_1)^{\alpha}$. We show that this is true for all x ∈ K. We show this for $|x|_1 < 1$, x ≠ 0. The other cases follow by the same argument.


Consider the set
$$S = \left\{\, r = \tfrac{m}{n},\ m, n \in \mathbb{N} \;\middle|\; (|x|_1)^{r} < |a|_1 \right\}.$$
Then for any $r = \frac{m}{n} \in S$ we have $(|x|_1)^{m} < (|a|_1)^{n}$, so that $\left|\frac{x^m}{a^n}\right|_1 < 1$. But then $\left|\frac{x^m}{a^n}\right|_2 < 1$ and so $(|x|_2)^{m} < (|a|_2)^{n}$, and therefore $(|x|_2)^{r} < |a|_2$. The same argument with $|\ |_2$ replacing $|\ |_1$ shows that for the same S we have
$$S = \left\{\, r = \tfrac{m}{n},\ m, n \in \mathbb{N} \;\middle|\; (|x|_2)^{r} < |a|_2 \right\}.$$
By taking logarithms we then must have
$$r > \frac{\log |a|_1}{\log |x|_1} \quad\text{and}\quad r > \frac{\log |a|_2}{\log |x|_2}.$$
Since the logarithms involved are all negative we then must have
$$\frac{\log |a|_1}{\log |x|_1} = \frac{\log |a|_2}{\log |x|_2},$$
because otherwise there would be a rational number between these two values. However, this equality implies
$$\alpha = \frac{\log |a|_2}{\log |a|_1} = \frac{\log |x|_2}{\log |x|_1}$$

and we have the result.

On the rational numbers the absolute value is a norm. The next lemma describes norms equivalent to the absolute value on ℚ.

Lemma 9.62. On the rational numbers ℚ with absolute value | |, the function $|x|_{\alpha} = |x|^{\alpha}$, α > 0, is a norm on ℚ if and only if α ≤ 1. In this case it is equivalent to the absolute value | |.

Proof. Let $|x|_{\alpha} = |x|^{\alpha}$ with α ≤ 1. We show that this is a norm on ℚ. The first three properties of a norm are direct, so we must only show the triangle inequality. Consider $|x + y|_{\alpha} = |x + y|^{\alpha}$. Assume that |y| ≤ |x| and x ≠ 0. Then
$$|x + y|_{\alpha} = |x + y|^{\alpha} \le (|x| + |y|)^{\alpha} = |x|^{\alpha}\left(1 + \frac{|y|}{|x|}\right)^{\alpha} \le |x|^{\alpha}\left(1 + \frac{|y|}{|x|}\right) \le |x|^{\alpha}\left(1 + \frac{|y|^{\alpha}}{|x|^{\alpha}}\right) = |x|^{\alpha} + |y|^{\alpha} = |x|_{\alpha} + |y|_{\alpha}.$$
Here the second inequality holds because the base is at least 1 and α ≤ 1, and the third because $t \le t^{\alpha}$ for 0 ≤ t ≤ 1. The equivalence with | | follows from Theorem 9.61.

Conversely, if α > 1 then the triangle inequality is not satisfied. For example, $|1 + 1|_{\alpha} = 2^{\alpha} > 2 = 1^{\alpha} + 1^{\alpha}$.

The Archimedean property and its negation are crucial for our additional completions of ℚ, so we make the definitions formal.

Definition 9.63. A norm | | on a field K is Archimedean if given x, y ∈ K with x ≠ 0 there exists an integer n with |nx| > |y|. If a norm is not Archimedean it is called non-Archimedean.

Non-Archimedean norms satisfy a very special version of the triangle inequality.

Lemma 9.64. A norm | | on K is non-Archimedean if and only if it satisfies |x + y| ≤ max(|x|, |y|).

The inequality above is called the strong triangle inequality. The induced metric is called an ultra-metric and satisfies d(x, z) ≤ max(d(x, y), d(y, z)). We leave the proof of the lemma to the exercises. However, recall that an ordered field must have characteristic 0 and hence contains a copy of the rational integers ℤ. Non-Archimedean norms on a field K are also characterized by the norms of the integers.

Theorem 9.65.
(1) The norm | | is non-Archimedean if and only if |n| ≤ 1 for all integers n.
(2) The norm | | is Archimedean if and only if sup({|n| ∣ n ∈ ℤ}) = ∞.

Proof. Suppose that | | is non-Archimedean. For any norm we have |1| = 1. Now we do induction on the natural numbers, which we may assume to be in K. Assume |k| ≤ 1 and consider |k + 1|. Then |k + 1| ≤ max{|k|, 1} ≤ 1, so the assertion is true for all natural numbers by induction. We have the equality |−x| = |x|, so the assertion is true for all integers.

Conversely, suppose that |n| ≤ 1 for all integers n. We show that |x + y| ≤ max{|x|, |y|} for all x, y ∈ K. By the binomial formula (see, for instance, Chapter 12 for a more general discussion of the binomial coefficients and the binomial formula) we have
$$|x + y|^{n} = |(x + y)^{n}| = \left|\sum_{k=0}^{n} \binom{n}{k} x^{k} y^{n-k}\right| \le \sum_{k=0}^{n} \left|\binom{n}{k}\right| |x|^{k} |y|^{n-k}.$$
But $\binom{n}{k}$ is an integer, so
$$|x + y|^{n} \le \sum_{k=0}^{n} |x|^{k} |y|^{n-k} \le (n + 1) \max\{|x|, |y|\}^{n}.$$
Hence $|x + y| \le (n + 1)^{1/n} \max\{|x|, |y|\}$ for all n.


Taking the limit as n → ∞ gives us the non-Archimedean inequality, since $(n + 1)^{1/n} \to 1$. This completes part (1). For part (2) it is clear that if | | is Archimedean there must be integers with arbitrarily large norms.

9.7.3 The p-adic norm

For each prime number p, we now introduce a non-Archimedean norm on the rational numbers. Completion of ℚ with respect to this norm will give us the field of p-adic numbers. Since it is non-Archimedean, this p-adic norm is not equivalent to the absolute value, and hence as a normed field none of the p-adic fields is isomorphic to ℝ. Further, we will show that for different prime numbers $p_1$ and $p_2$ the corresponding $p_1$-adic norm is not equivalent to the $p_2$-adic norm.

Let $x = \frac{m}{n}$ be a nonzero rational number where gcd(m, n) = 1. The fundamental theorem of arithmetic implies that x also has a unique prime decomposition
$$x = \pm\, p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k},$$
where the exponents $e_i$ are allowed to be negative. Now let p be a fixed prime number and x ∈ ℚ, x ≠ 0. Then it follows from the prime decomposition that
$$x = p^{k}\left(\frac{a}{b}\right)$$
with integers a, b such that gcd(a, b) = 1, p ∤ ab and k ∈ ℤ. We now define the p-adic norm of the rational number x by
$$|x|_p = \begin{cases} p^{-k} & \text{if } x \neq 0, \\ 0 & \text{if } x = 0. \end{cases}$$
The map ord ∶ ℚ ∖ {0} → ℤ given by ord(x) = k is called the p-adic valuation.

Lemma 9.66. For any prime number p, the p-adic norm is a non-Archimedean norm on ℚ. Further, $|\ |_p$ can take on only a discrete set of values.

Proof. The basic norm properties are straightforward computations and we leave them to the exercises. From the definition, the p-adic norm of any nonzero rational is $p^{-m}$ for some integer m. Therefore the p-adic norm can take on only a discrete set of values. Finally, for any integer n it is clear that the p-adic norm is 1 or less. Therefore, by Theorem 9.65, this norm must be non-Archimedean.

Since for any prime number p the p-adic norm is a norm, it defines a p-adic distance function on ℚ given by $d(x, y) = |x - y|_p$.
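To make the definition concrete, here is a minimal Python sketch of the p-adic valuation and the p-adic norm on ℚ, using exact rational arithmetic. The function names `ord_p`, `norm_p` and `strong_triangle_holds` are our own illustrative choices and do not appear in the text; the last function only spot-checks the strong triangle inequality on a small grid of rationals and proves nothing.

```python
from fractions import Fraction


def ord_p(x, p):
    """p-adic valuation: the exponent k with x = p**k * (a/b) and p not dividing a*b."""
    x = Fraction(x)
    if x == 0:
        raise ValueError("the valuation of 0 is not defined")
    k = 0
    num, den = x.numerator, x.denominator
    while num % p == 0:      # factors of p in the numerator raise k
        num //= p
        k += 1
    while den % p == 0:      # factors of p in the denominator lower k
        den //= p
        k -= 1
    return k


def norm_p(x, p):
    """p-adic norm: p**(-k) for x != 0 and 0 for x = 0."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    return Fraction(p) ** (-ord_p(x, p))


def strong_triangle_holds(p, bound=12):
    """Spot-check |x + y|_p <= max(|x|_p, |y|_p) for small rationals."""
    values = [Fraction(m, n) for m in range(-bound, bound + 1)
              for n in range(1, bound + 1)]
    return all(norm_p(x + y, p) <= max(norm_p(x, p), norm_p(y, p))
               for x in values for y in values)


# 18/5 = 3**2 * (2/5), so its 3-adic norm is 1/9,
# while its 5-adic norm is 5 because 5 sits in the denominator.
example = Fraction(18, 5)
three_adic = norm_p(example, 3)
five_adic = norm_p(example, 5)
```

Note that integers always have p-adic norm at most 1, which is exactly the criterion of Theorem 9.65 (1).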

Further, since the norm is non-Archimedean it follows that the p-adic distance function is an ultra-metric and satisfies d(x, z) ≤ max(d(x, y), d(y, z)).

The p-adic norm of any natural number n is less than or equal to one. On the other hand, if n > 1 we have for the ordinary absolute value |n| > 1. It follows that $|n|_p \neq |n|^{\alpha}$ for any real number α > 0, and hence, by Theorem 9.61, for no prime number is the p-adic norm equivalent to the standard absolute value.

Lemma 9.67. For each prime number p the corresponding p-adic norm on ℚ is not equivalent to the standard absolute value on ℚ.

Next we show that for distinct prime numbers $p_1$ and $p_2$ the corresponding norms are inequivalent.

Lemma 9.68. If $p_1$, $p_2$ are distinct prime numbers then the corresponding p-adic norms are inequivalent.

Proof. Suppose that $p_1 \neq p_2$. Let $x_n = \left(\frac{p_1}{p_2}\right)^{n}$. In the $p_1$-adic norm we have $|x_n|_{p_1} = p_1^{-n}$, so the sequence goes to zero; hence $(x_n)_{n\in\mathbb{N}}$ converges and is therefore a Cauchy sequence. However, in the $p_2$-adic norm we have $|x_n|_{p_2} = p_2^{n}$, so the sequence $(x_n)_{n\in\mathbb{N}}$ goes to infinity; hence it diverges and is therefore not a Cauchy sequence. It follows that the two p-adic norms are not equivalent.
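The sequence used in this proof can be inspected numerically. The following sketch, under the assumption $p_1 = 2$ and $p_2 = 3$ (our choice for illustration), tabulates the two norms of $x_n = (p_1/p_2)^n$; the helper `norm_p` is a hypothetical name, written out again so that the block is self-contained.

```python
from fractions import Fraction


def norm_p(x, p):
    """p-adic norm of a rational x: p**(-k) where x = p**k * (a/b), p not dividing a*b."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    k, num, den = 0, x.numerator, x.denominator
    while num % p == 0:
        num //= p
        k += 1
    while den % p == 0:
        den //= p
        k -= 1
    return Fraction(p) ** (-k)


p1, p2 = 2, 3  # illustrative choice of two distinct primes

# Each row holds n, |x_n|_{p1} and |x_n|_{p2} for x_n = (p1/p2)**n.
rows = []
for n in range(1, 7):
    x_n = Fraction(p1, p2) ** n
    rows.append((n, norm_p(x_n, p1), norm_p(x_n, p2)))

for n, small, large in rows:
    # The p1-adic norms shrink like p1**(-n); the p2-adic norms grow like p2**n.
    print(n, small, large)
```

The first column of norms tends to zero while the second is unbounded, which is the behaviour the proof relies on.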

9.7.4 The construction of ℚp

For each prime number p the rational numbers equipped with the p-adic norm provide a non-Archimedean normed field. Using the Cauchy completion procedure we can construct a complete normed field that has the rationals as a dense subset with respect to the induced p-adic distance. For a given prime number p this is the field of p-adic numbers, which we will denote by ℚp. Each of these fields is non-Archimedean and hence non-isomorphic, as a normed field, to the real numbers ℝ. Further, from Lemma 9.68, for differing prime numbers p1, p2 the corresponding norms are inequivalent and therefore the corresponding fields are distinct as normed fields. We therefore have the following theorem.

Theorem 9.69. For each prime number p the field ℚp of p-adic numbers is a complete non-Archimedean normed field which contains the rational numbers ℚ as a dense subset. Further, each of these fields is distinct from the real numbers ℝ, and for different prime numbers p1, p2 the fields are distinct.


In the next section we will prove a type of converse of this result (Ostrowski’s theorem) and show that ℝ and ℚp are (up to isomorphisms) the only complete normed extensions of ℚ that have ℚ as dense subfield.

9.7.5 Ostrowski’s theorem

We have seen that the field of real numbers is up to isomorphism the only Archimedean completion of ℚ. That is, if K is any other complete Archimedean ordered field that contains ℚ as a dense subset then K is isomorphic to ℝ. Ostrowski’s theorem, which we present in this section, says that besides the reals, the only completions of ℚ are the fields of p-adic numbers.

Theorem 9.70 (A. Ostrowski (1893–1986)). Every nontrivial norm | | on ℚ is equivalent to either the absolute value | | or a p-adic norm $|\ |_p$ for some prime number p. Therefore, (up to isomorphisms) the only complete normed fields containing ℚ as a dense subfield are the reals ℝ and the p-adic fields ℚp.

Proof. Let | | be a nontrivial norm on ℚ. Assume first that it is Archimedean. Then there exists a positive integer n with |n| > 1. Let $n_0$ be the least such integer and write $|n_0| = n_0^{\alpha}$ with α > 0. We show that $|n| = n^{\alpha}$ for all positive integers n. Write n in its $n_0$-expansion, so that
$$n = a_0 + a_1 n_0 + \cdots + a_s n_0^{s}, \qquad 0 \le a_i < n_0.$$
By our choice of $n_0$ we have $|a_i| \le 1$ for all i. Therefore $|n| \le C n^{\alpha}$ with a constant C independent of n. Applying this to $n^{N}$ gives $|n|^{N} \le C n^{N\alpha}$, that is, $|n| \le C^{1/N} n^{\alpha}$. Letting N → ∞ we get that $|n| \le n^{\alpha}$. Using the expansion again one gets $|n| \ge n^{\alpha}$, and therefore $|n| = n^{\alpha}$. This then implies that if q ∈ ℚ with q > 0 then $|q| = q^{\alpha}$, and hence the norm is equivalent to the absolute value. Therefore, if the norm is Archimedean it is equivalent to the absolute value.

Now suppose the norm is non-Archimedean. Then |n| ≤ 1 for all integers n, and since the norm is nontrivial there is a positive integer n with |n| < 1. Let $n_0$ be the least positive integer for which $|n_0| < 1$. We claim first that $n_0$ is a prime number. If not, $n_0 = n_1 n_2$ with $1 < n_1 < n_0$ and $1 < n_2 < n_0$. From the minimality of $n_0$ we get $|n_1| = |n_2| = 1$ and hence $|n_0| = 1$, a contradiction. Therefore $n_0 = p$ is a prime number, and we claim the norm is equivalent to the p-adic norm. If p does not divide n, then n = rp + s with 0 < s < p, and |s| = 1 by the minimality of p. But then |rp| = |r||p| < 1, so |n − s| < |s|, and the strong triangle inequality gives |n| = |s| = 1. Thus if p does not divide n we have |n| = 1. Given n ∈ ℕ we write $n = p^{k} m$ with gcd(m, p) = 1. Then $|n| = |p^{k}||m| = |p|^{k}$. Since |p| < 1 we have $|p| = p^{-\alpha} = \left(\frac{1}{p}\right)^{\alpha}$ for some α > 0, so that $|n| = (|n|_p)^{\alpha}$, and hence this norm is equivalent to the p-adic norm.

Using the local–global principle in the form of Hensel’s Lemma (see [10, Chapter 7]) we may extend Ostrowski’s theorem and show that ℝ is not isomorphic (as an abstract field) to ℚp, and that if p1, p2 are two different prime numbers then ℚp1 and ℚp2 are non-isomorphic (as abstract fields).

Theorem 9.71. The p-adic field ℚp is not isomorphic (as a field) to ℝ for any prime number p. Further, if p1 and p2 are distinct prime numbers, then the corresponding p-adic fields are non-isomorphic (as fields).

Exercises

1. Prove that a convergent sequence must be a Cauchy sequence.
2. Show that the addition and the multiplication, as defined in Theorem 9.8, are well defined and that ℝ, together with these operations, forms a commutative ring with zero element (0, 0, …) and unity element (1, 1, …).
3. Let K be an ordered field. Show that for all a, b ∈ K and n ∈ ℕ the following hold:
(a) $0 \le a < b \Rightarrow 0 \le a^{n} < b^{n}$;
(b) $a^{n} < b^{n}$ and $b \ge 0 \Rightarrow a < b$;
(c) $a^{2n} = b^{2n} \Leftrightarrow a = b$ or $a = -b$;
(d) $a^{2n-1} = b^{2n-1} \Leftrightarrow a = b$.
(Hint: Use the formula
$$a^{n} - b^{n} = (a - b) \sum_{i=0}^{n-1} a^{i} b^{n-1-i}$$
for all n ∈ ℕ.)

4. Prove by induction that for all n ∈ ℕ the following holds: If x ∈ ℝ with x ≥ −1 then $(1 + x)^{n} \ge 1 + nx$.
5. Prove the following: Within the real numbers ℝ the following are equivalent:
(1) The lub property.
(2) The nested intervals property.
(3) The Cauchy completeness property.
6. Prove the theorems and lemmas in Section 9.2 for arbitrary g instead of g = 10.
7. g-nary numbers
(a) Let g = 10. Write the number 394 as a sum of powers of 10.
(b) Let g = 5. Write the number 394 as a sum of powers of 5.
(c) Let g = 2. Write the number 25 as a sum of powers of 2.
8. Use the decimal representation of a natural number a ∈ ℕ as
$$a = \sum_{i=0}^{r} (z_i \cdot 10^{i}) \quad\text{with } z_i \in \{0, 1, \ldots, 9\}$$
to prove
(a) $2 \mid a \Leftrightarrow 2 \mid z_0$.
(b) $5 \mid a \Leftrightarrow 5 \mid z_0$.
(c) $4 \mid a \Leftrightarrow 4 \mid (z_1 \cdot 10 + z_0)$.
(d) $3 \mid a \Leftrightarrow 3 \mid \sum_{i=0}^{r} z_i$.
(e) $9 \mid a \Leftrightarrow 9 \mid \sum_{i=0}^{r} z_i$.
(f) $11 \mid a \Leftrightarrow 11 \mid \sum_{i=0}^{r} (-1)^{i} z_i$ or $\sum_{i=0}^{r} (-1)^{i} z_i = 0$.


9. Calculate the finite continued fraction for the rational numbers: (a) 4, 624; (b) 2, 112; (c) 5, 9521.
10. Write the following fractions as repeating or terminating decimals: (a) 119 ; (b) $\frac{22}{7}$; (c) $\frac{7}{12}$.
11. Prove Lemma 9.64, that is, a norm | | on K is non-Archimedean if and only if it satisfies |x + y| ≤ max(|x|, |y|).
12. Prove for Lemma 9.66 that the p-adic norm of a rational number x defined by
$$|x|_p = \begin{cases} p^{-k} & \text{if } x \neq 0, \\ 0 & \text{if } x = 0 \end{cases}$$
is a norm.
13. If A and B are countable sets show that A ∪ B and A × B are also countable. Use this to show that the set 𝒯 of transcendental elements in ℝ is not countable.

10 The complex numbers, the Fundamental Theorem of Algebra and polynomial equations

10.1 The field ℂ of complex numbers

In this chapter we examine the complex numbers ℂ, a field that algebraically extends ℝ by introducing the imaginary unit i. At the time of Euler (1707–1783) the imaginary numbers were considered by most mathematicians not to be proper numbers but rather only synthetical symbols which were useful, for instance, for general versions of the formulas by Vieta (1540–1603) and Cardano (1501–1576). Euler and Gauss (1777–1855) established the complex numbers on an equal footing with the real numbers by representing them in the two-dimensional real plane, which in terms of complex numbers is called the complex plane. In the nineteenth century there was a rapid development of complex analysis, which extended the calculus of functions to functions with complex arguments. The development of complex analysis is related to the names Euler (1707–1783), Riemann (1826–1866), Weierstrass (1815–1897), Klein (1849–1925), Cauchy (1789–1857), Dirichlet (1805–1859), Laurent (1813–1854), Nevanlinna (1895–1980), Ahlfors (1907–1996) and others.

There are many ways to construct the complex numbers starting with the real numbers. The basic idea is to find a field extension of ℝ in which the polynomial x² + 1 has a zero. For our basic number systems we have ℕ ⊂ ℤ ⊂ ℚ ⊂ ℝ. Each of the basic number systems completed the previous one relative to a certain property. On the natural numbers ℕ there was no zero element and there were no additive inverses. By adjoining zero and inverses we created the integers ℤ. On ℤ there were no multiplicative inverses except for ±1. By adjoining to ℤ the multiplicative inverse of each nonzero integer we obtained the rationals ℚ. On ℚ there was not always a limit for a Cauchy sequence. By adjoining all the limits of Cauchy sequences of rationals we obtained the reals ℝ. In the real numbers ℝ, √−1 does not exist.
We now create a field in which −1 does have a square root. We start with a naive, intuitive introduction of the complex numbers. This could be presented at school, because there we cannot expect familiarity with, for instance, the splitting field of a polynomial over the real numbers. After that we continue with a formal construction. We formally define i = √−1; that is, i is a new element such that $i^2 = -1$. Historically, i was called the imaginary unit, but as we will see in the next section, i has a very real geometric significance. A complex number is then an expression of the form z = x + iy with x, y ∈ ℝ. If x = 0, y ≠ 0, so that z has the form iy, then z is called a (purely) imaginary number. The set of complex numbers, denoted by ℂ, is then ℂ = {x + iy ∣ x, y ∈ ℝ}.

DOI 10.1515/9783110516142-010

If we identify a real number x with the complex number x + 0i, we see that ℝ ⊂ ℂ as sets. For the complex number z = x + iy, we call x the real part and y the imaginary part of z. Note that both the real and imaginary parts of z are real numbers. We say two complex numbers are equal provided that both their real parts are equal and their imaginary parts are equal. On ℂ we define arithmetic by algebraic manipulation using the fact that $i^2 = -1$. To be more specific, we assume that i commutes with any real number and that the associative laws and distributive laws hold for any expressions containing i. That is, if z = x + iy and w = a + ib then:
(i) z = w if and only if x = a and y = b.
(ii) z ± w = (x ± a) + i(y ± b).
(iii) z ⋅ w = (xa − yb) + i(xb + ya).
We note that (iii) follows from the assumed properties as follows:
$$(x + iy)(a + ib) = xa + i(ya) + i(xb) + i^{2}(yb) = (xa - yb) + i(xb + ya).$$

Example 10.1. Let z = 3 + 4i and w = 7 − 2i. Then:
(a) z + w = 10 + 2i;
(b) z − w = −4 + 6i;
(c) zw = 29 + 22i.

It is easy to verify using the assumed laws that under these definitions ℂ forms a commutative ring with an identity and that ℝ is a subring of ℂ: The zero is 0 + 0i, identified with 0 ∈ ℝ, and the multiplicative identity is 1 + 0i, identified with 1 ∈ ℝ. In order for ℂ to be a field we must have multiplicative inverses. We now show how to construct these.

Definition 10.2. If z ∈ ℂ with z = x + iy, then the complex conjugate of z, denoted by $\bar{z}$, is $\bar{z} = x - iy$, and the absolute value, or modulus, of z, denoted by |z|, is $|z| = \sqrt{x^2 + y^2}$.

Example 10.3. Let z = 3 + 4i. Then $\bar{z} = 3 - 4i$ and $|z| = \sqrt{9 + 16} = 5$.

If z ∈ ℂ and z ≠ 0, then |z| ≠ 0 and
$$z \cdot \frac{\bar{z}}{|z|^2} = \frac{|z|^2}{|z|^2} = 1,$$
because $z\bar{z} = |z|^2$, and we may define
$$\frac{1}{z} = \frac{\bar{z}}{|z|^2} \quad\text{for } z \neq 0.$$

Example 10.4. Let z = 3 + 4i and w = 7 − 2i. Then
(a) $\frac{1}{z} = \frac{3 - 4i}{3^2 + 4^2} = \frac{3}{25} - \frac{4}{25} i$.
(b) $\frac{z}{w} = z \cdot \frac{1}{w} = (3 + 4i)\,\frac{7 + 2i}{53} = \frac{13 + 34i}{53} = \frac{13}{53} + \frac{34}{53} i$.


With this definition of $\frac{1}{z}$ for z ≠ 0 we get that ℂ becomes a field with ℝ as a subfield. We remark that ℂ is not an ordered field, because in any ordered field 1 > 0, hence −1 < 0, and the square of any nonzero element must be positive (see Corollary 9.11), whereas $i^2 = -1$.

So far, we have described the complex numbers and their first properties in a naive, but very motivating manner. Mathematically, the existence of the complex numbers can be seen as follows. The basic idea is to find a field extension of ℝ in which the polynomial x² + 1 has a zero. First we let ℂ = ℝ² = {(a, b) ∣ a, b ∈ ℝ}. Then ℂ is an ℝ-vector space of dimension 2 over ℝ. By x ↦ (x, 0) we may embed ℝ into ℂ in a canonical manner. The vector space addition automatically gives an addition in ℂ:
$$(a, b) + (c, d) = (a + c, b + d).$$
Together with the multiplication
$$(a, b) \cdot (c, d) = (ac - bd, ad + bc)$$
the vector space becomes a field. The verification of this is just calculation. Here, the most interesting fact is that
$$(a, b)^{-1} = \left(\frac{a}{a^2 + b^2}, \frac{-b}{a^2 + b^2}\right)$$
is the multiplicative inverse of (a, b) ≠ (0, 0). If we identify x ∈ ℝ with (x, 0), then in particular 1 is identified with (1, 0), and if we define i ∶= (0, 1), then we get the representation
$$(a, b) = (a, 0) + (0, b) = (a, 0) + (b, 0)(0, 1) = a + bi =: z \in \mathbb{C}.$$
We get $i^2 = (0, 1)(0, 1) = (-1, 0) = -1$, which confirms the original idea of having a field extension of ℝ in which the polynomial x² + 1 has a zero. The real numbers a and b are called the real part and the imaginary part of z, written as a = Re(z) and b = Im(z).

Theorem 10.5. The complex numbers ℂ form a field and ℝ can be considered as a subfield of ℂ.
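The construction of ℂ as pairs of real numbers translates almost literally into code. The following is a minimal sketch, assuming pairs are represented as Python tuples with exact rational entries; the function names `add`, `mul` and `inv` are our own and are not part of the text. It reproduces Examples 10.1 and 10.4 and the identity i² = −1.

```python
from fractions import Fraction


def add(z, w):
    """(a, b) + (c, d) = (a + c, b + d)."""
    (a, b), (c, d) = z, w
    return (a + c, b + d)


def mul(z, w):
    """(a, b) * (c, d) = (ac - bd, ad + bc)."""
    (a, b), (c, d) = z, w
    return (a * c - b * d, a * d + b * c)


def inv(z):
    """(a, b)^(-1) = (a / (a^2 + b^2), -b / (a^2 + b^2)) for (a, b) != (0, 0)."""
    a, b = z
    n = a * a + b * b
    if n == 0:
        raise ZeroDivisionError("(0, 0) has no multiplicative inverse")
    return (Fraction(a) / n, Fraction(-b) / n)


i = (0, 1)     # the pair playing the role of the imaginary unit
z = (3, 4)     # 3 + 4i
w = (7, -2)    # 7 - 2i

i_squared = mul(i, i)        # (-1, 0), that is, -1
product = mul(z, w)          # Example 10.1 (c): 29 + 22i
quotient = mul(z, inv(w))    # Example 10.4 (b): 13/53 + (34/53) i
```

Using `Fraction` keeps the inverse exact, so the field identities can be compared with `==` rather than up to rounding error.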

The complex numbers appear in a completely natural manner if we consider any two-dimensional ℝ-vector space. The vector addition will be the field addition, and the scalar multiplication of real numbers with vectors will be extended to the field multiplication. Hence, let V be an arbitrary two-dimensional ℝ-vector space with basis {a1, a2}. The first basis vector a1 will be defined as the unit element of the field multiplication, that is, 1 ∶= a1. We may make this choice because each nonzero vector in V is part of some basis of V. Then ℝ will be embedded into V by x ↦ x ⋅ a1 = x ⋅ 1 ∈ V. This is nothing else than the scalar multiplication (x, a1) ↦ x ⋅ a1 in V. We identify x ∈ ℝ with x ⋅ 1 and introduce a multiplication in V. Whatever a2 ⋅ a2 looks like, it has to be in V, that is, a2 ⋅ a2 is a linear combination of a1 and a2:
$$a_2 \cdot a_2 = a_2^2 = A a_1 + B a_2 = A + B a_2 \quad\text{with } A, B \in \mathbb{R}.$$
All the other products are then determined by using the rules for calculations in fields:
$$(A_1 a_1 + B_1 a_2)(A_2 a_1 + B_2 a_2) = (A_1 + B_1 a_2)(A_2 + B_2 a_2) = A_1 A_2 + (A_1 B_2 + A_2 B_1) a_2 + B_1 B_2 a_2^2.$$
We define $a_2' := a_2 - \frac{B}{2} a_1 = a_2 - \frac{B}{2}$ with the B from above. Then $\{a_1, a_2'\}$ is also a basis of V, and we get
$$(a_2')^2 = \left(a_2 - \frac{B}{2}\right)^2 = a_2 a_2 - B a_2 + \frac{B^2}{4} = A + B a_2 - B a_2 + \frac{B^2}{4} = A + \frac{B^2}{4} =: D \in \mathbb{R}.$$
Therefore we have $(a_2')^2 = D \in \mathbb{R}$. We cannot have D = 0 because a field has no zero divisors. We have therefore D > 0 or D < 0. Assume that D > 0. We consider the polynomial $f(x) = x^2 - D$ over V (as a field), which contains ℝ. The polynomial f(x) has the two real zeros $\sqrt{D}$ and $-\sqrt{D}$, but also the zeros $a_2'$ and $-a_2'$ from V ∖ ℝ. This is impossible because a polynomial of degree two has at most two zeros in a field. Hence, for V to be a field we must have D < 0. We define
$$i := \frac{1}{\sqrt{-D}}\, a_2'.$$
Then {a1, i} = {1, i} is also a basis for V over ℝ, and we have $i^2 = \frac{D}{-D} = -1$. From this argument we get the following uniqueness property of ℂ.

Theorem 10.6. Up to notation the real field ℝ can be extended in only one way to a field ℂ such that ℂ also is a ℝ-vector space of dimension 2.


10.2 The complex plane We may picture the complex numbers in the two-dimensional real plane, that is, the Euclidean plane (see Figure 10.1): To the complex number z = a + ib we associate the point (a, b).

Figure 10.1: Complex numbers in the Euclidean plane.

As we saw in the last section, $r = \overline{0z}$ is called the absolute value of z (here $\overline{0z}$ is the length of the line segment from 0 to z). By the theorem of Pythagoras we have $r = \sqrt{a^2 + b^2}$. We write r = |z|. The number $\bar{z} = a - bi$ is called the complex conjugate of z = a + bi. We get $\bar{z}$ from z by a reflection through the x-axis. We get $z \cdot \bar{z} = (a + bi)(a - bi) = a^2 + b^2$, hence $|z| = \sqrt{z\bar{z}}$. Further, if z ≠ 0, we have the useful identity
$$z^{-1} = \frac{\bar{z}}{|z|^2}.$$

Theorem 10.7. For z, z1, z2 ∈ ℂ we have the following results:
(1) |z| ≥ 0 and |z| = 0 ⇔ z = 0.
(2) $|z| = |\bar{z}|$.
(3) $|z_1 z_2| = |z_1||z_2|$.
(4) $\overline{z_1 + z_2} = \bar{z}_1 + \bar{z}_2$.
(5) $z = \bar{z}$ if and only if z ∈ ℝ.

Proof. The results follow by straightforward calculations.

Corollary 10.8. The map ℂ → ℂ, $z \mapsto \bar{z}$, defines an automorphism of ℂ, that is, this map is bijective and respects the addition and multiplication in ℂ. (The identity $\overline{z_1 z_2} = \bar{z}_1 \bar{z}_2$ is checked by the same kind of calculation.)

Corollary 10.9.
(a) If φ ∶ ℝ → ℝ is an automorphism, then φ is the identity.
(b) The map ℂ → ℂ, $z \mapsto \bar{z}$, is the only non-trivial automorphism of ℂ that maps ℝ into itself.

Proof. (a) Let φ be any automorphism of ℝ. Then φ(q) = q for all q ∈ ℚ. This we can see as follows: φ(1) = 1 and
$$\varphi(n) = \varphi(\underbrace{1 + 1 + \cdots + 1}_{n\text{-times}}) = \underbrace{\varphi(1) + \varphi(1) + \cdots + \varphi(1)}_{n\text{-times}} = \underbrace{1 + 1 + \cdots + 1}_{n\text{-times}} = n$$
for n ∈ ℕ. Further φ(0) = 0 and φ(−n) = −φ(n) = −n for each n ∈ ℕ. Therefore φ(q) = q for all q ∈ ℤ. Now, let q ∈ ℚ. We may write q as $q = \frac{p}{n}$ for p ∈ ℤ, n ∈ ℕ. Then $\varphi(q) = \frac{\varphi(p)}{\varphi(n)} = \frac{p}{n} = q$, that is, φ(q) = q for all q ∈ ℚ. Now let r ∈ ℝ. From Chapter 9 we may write r as $r = (q_n)_{n\in\mathbb{N}}$ for a rational Cauchy sequence $(q_n)_{n\in\mathbb{N}}$. Since φ maps squares to squares, it maps positive numbers to positive numbers and hence preserves the order on ℝ. Then we also get $\varphi(r) = (\varphi(q_n))_{n\in\mathbb{N}} = r$.

(b) Let φ ∶ ℂ → ℂ be an automorphism of ℂ with φ(ℝ) ⊂ ℝ. The argument in (a) uses only that φ respects addition and multiplication, so φ|ℝ is the identity of ℝ. Therefore,
$$\varphi(a + bi) = a + b\varphi(i), \qquad (\varphi(i))^2 = \varphi(i^2) = \varphi(-1) = -1,$$
that is, φ(i) = i or φ(i) = −i.

Theorem 10.10. For z1, z2 ∈ ℂ we have the triangle inequality |z1 + z2| ≤ |z1| + |z2|.

Proof. If z1 = 0, there is nothing to show. Now, let z1 ≠ 0. First of all, in general |Re(z)| ≤ |z| for z ∈ ℂ. Proof of this fact: Let z = a + bi. Then $|\mathrm{Re}(z)|^2 = a^2 \le a^2 + b^2 = |z|^2$. By rule (3) of Theorem 10.7 we may assume that z1 = 1, because
$$|z_1 + z_2| = |z_1| \cdot \left|1 + \frac{z_2}{z_1}\right|.$$
So, let z1 = 1. We have to show that $|1 + z|^2 \le (1 + |z|)^2$ for z ∈ ℂ. We have
$$(1 + |z|)^2 - (1 + z)(1 + \bar{z}) = 1 + 2|z| + |z|^2 - \left(1 + (z + \bar{z}) + z\bar{z}\right) = 2(|z| - \mathrm{Re}(z)) \ge 0$$
because Re(z) ≤ |z|. Therefore, $|1 + z|^2 \le (1 + |z|)^2$.


Remark 10.11. As for the real numbers we get also ||z1 | − |z2 || ≤ |z1 − z2 |

for z1 , z2 ∈ ℂ.

Definition 10.12. A sequence $(z_n)_{n\in\mathbb{N}}$ in ℂ, that is, $z_n \in \mathbb{C}$ for all n ∈ ℕ, is called a Cauchy sequence in ℂ if for each ϵ ∈ ℝ, ϵ > 0, there exists an $n_0 \in \mathbb{N}$ with $|z_n - z_m| < \epsilon$ for all n, m ≥ $n_0$.

Theorem 10.13. Each Cauchy sequence in ℂ has a limit in ℂ, that is, ℂ is, like ℝ, Cauchy complete.

Proof. Let $(z_n)_{n\in\mathbb{N}}$ be a Cauchy sequence in ℂ. We write $z_n = x_n + i y_n$. Since |Re(z)| ≤ |z| for z ∈ ℂ and |i| = 1 we get
$$|x_n - x_m| \le |z_n - z_m| \quad\text{and}\quad |y_n - y_m| \le |z_n - z_m|.$$
Therefore the sequences $(x_n)_{n\in\mathbb{N}}$ and $(y_n)_{n\in\mathbb{N}}$ are Cauchy sequences in ℝ, and hence they are convergent in ℝ. Let
$$\lim_{n\to\infty} x_n = x \quad\text{and}\quad \lim_{n\to\infty} y_n = y.$$
From
$$|(x_n + y_n i) - (x + yi)| \le |x_n - x| + |y_n - y|$$
we get that $(z_n)_{n\in\mathbb{N}}$ converges with $\lim_{n\to\infty} z_n = z = x + iy$.

Remark 10.14.
(1) A convergent sequence in ℂ is certainly a Cauchy sequence in ℂ.
(2) In ℂ we now certainly have the usual limit properties. The proofs are completely analogous to the real situation. This gives the following.

Corollary 10.15. Let $(z_n)_{n\in\mathbb{N}}$ and $(w_n)_{n\in\mathbb{N}}$ be convergent sequences in ℂ with $\lim_{n\to\infty} z_n = z$ and $\lim_{n\to\infty} w_n = w$. Then the following hold:
(1) The sequence $(z_n + w_n)_{n\in\mathbb{N}}$ converges in ℂ, and we have $\lim_{n\to\infty} (z_n + w_n) = z + w$.
(2) The sequence $(z_n w_n)_{n\in\mathbb{N}}$ converges in ℂ, and we have $\lim_{n\to\infty} (z_n w_n) = zw$.
(3) If $z_n \neq 0$ for all n ∈ ℕ and z ≠ 0, then the sequence $\left(\frac{1}{z_n}\right)_{n\in\mathbb{N}}$ converges, and $\lim_{n\to\infty} \frac{1}{z_n} = \frac{1}{z}$.

Corollary 10.16. Let (zn )n∈ℕ be a convergent sequence in ℂ with limn→∞ zn = z. Then the sequence (|zn |)n∈ℕ converges with limn→∞ |zn | = |z|. Proof. This follows directly from ||z1 | − |z2 || ≤ |z1 − z2 | for z1 , z2 ∈ ℂ. Remark 10.17. Together with the limit properties and the maximum principle for polynomial functions on closed intervals in ℝ we have the important consequence.

Corollary 10.18. Let B(r) = {z ∈ ℂ ∣ |z| ≤ r} for some fixed r ∈ ℝ, r > 0. Let $P(z) = a_0 + a_1 z + a_2 z^2 + \cdots + a_k z^k$, k ≥ 1, $a_k \neq 0$, be a polynomial function in ℂ. Then there exist $z_0, z_1 \in B(r)$ with $|P(z_0)| \le |P(z)| \le |P(z_1)|$ for all z ∈ B(r).

Proof. The set M = {|P(z)| ∣ z ∈ B(r)} is certainly bounded in ℝ because
$$|P(z)| \le |a_0| + |a_1||z| + |a_2||z|^2 + \cdots + |a_k||z|^k \le |a_0| + |a_1| r + \cdots + |a_k| r^k.$$
Hence, α ∶= sup(M) exists. Let $(|P(z_n)|)_{n\in\mathbb{N}}$ be a sequence of elements of M with $\lim_{n\to\infty} |P(z_n)| = \alpha$. The sequence $(z_n)_{n\in\mathbb{N}}$ is bounded, so its real and imaginary parts are bounded real sequences; passing to a subsequence if necessary, we may assume that $(z_n)_{n\in\mathbb{N}}$ converges. Since ℂ is Cauchy complete and the limit properties of Corollaries 10.15 and 10.16 hold, we get
$$\alpha = \lim_{n\to\infty} |P(z_n)| = \left|P\left(\lim_{n\to\infty} z_n\right)\right|.$$
Since B(r) is a closed disc in ℂ we must have that $\lim_{n\to\infty} z_n \in B(r)$ and hence α ∈ M. Analogously β ∶= inf(M) exists and β ∈ M, because certainly 0 ≤ |P(z)| for all z ∈ B(r).

10.2.1 Geometric interpretation of complex operations

We have seen that the complex numbers have a very nice geometric interpretation as points in the two-dimensional real space ℝ² = ℝ × ℝ. To each complex number z = a + ib we associate the point (a, b) in the xy-plane ℝ². Conversely, to each point (a, b) ∈ ℝ² we associate the complex number z = a + ib. When thought of in this way, as consisting of complex numbers, ℝ² is called the complex plane. Alternatively, we can think of the complex number z = a + ib as the two-dimensional vector v⃗ = (a, b), that is, the vector with representative starting at (0, 0) and ending at (a, b). In this interpretation the absolute value |z| is just the length of the vector (a, b), which is the distance from the point (a, b) to the origin. The conjugate $\bar{z} = a - ib$ is just the point (a, −b), which is the point (a, b) reflected through the x-axis.

We can describe the arithmetic operations in terms of this geometrical interpretation. As introduced above, addition and subtraction are done componentwise. Hence addition and subtraction of complex numbers correspond to the same vector operations, as pictured in Figure 10.2. Multiplication by real numbers is scalar multiplication of two-dimensional vectors. Geometrically this is a stretching or a shrinking. That is, if z ∈ ℂ, r ∈ ℝ then:
(1) if r ≥ 0, rz = w, where w is the vector in the same direction as z with magnitude |r||z|;
(2) if r < 0, rz = w, where w is the vector in the opposite direction to z with magnitude |r||z|.
If |r| ≥ 1, then it is a stretching; while if |r| < 1, then it is a shrinking. We will, however, refer to both of these as a stretching.


Figure 10.2: Vector operations.

If z = a + ib, then iz = −b + ia. That is, multiplication by i takes the point (a, b) to the point (−b, a). These vectors are orthogonal. Since |i| = 1, multiplication by i preserves length and therefore corresponds to a counterclockwise rotation by 90°. Therefore, i is not really “imaginary” in any sense; it corresponds to a rotation, see Figure 10.3.

Figure 10.3: Multiplication with complex numbers.

Putting all this together we can give a complete geometric interpretation to complex multiplication. Suppose z, w ∈ ℂ with z = a + ib. Then consider zw = (a + ib)w = aw + ibw. Geometrically then, we first stretch the vector w by a, then stretch the vector w by b and rotate the second stretched vector by 90° counterclockwise. Finally we add the resulting vectors (see Figure 10.2).
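The stretch, rotate and add description can be checked against Python's built-in complex arithmetic. The sketch below is only an illustration; the helper names `rotate90` and `multiply_geometric` are hypothetical and not taken from the text.

```python
def rotate90(w):
    """Counterclockwise rotation by 90 degrees: (a, b) -> (-b, a), i.e. multiplication by i."""
    return complex(-w.imag, w.real)


def multiply_geometric(z, w):
    """Compute z*w for z = a + ib as: stretch w by a, stretch w by b and rotate, then add."""
    a, b = z.real, z.imag
    stretched_by_a = a * w
    stretched_by_b = b * w
    return stretched_by_a + rotate90(stretched_by_b)


z = 3 + 4j
w = 7 - 2j

geometric = multiply_geometric(z, w)   # built from stretching, rotating and adding
algebraic = z * w                      # Python's own complex product

# w and i*w are orthogonal: their dot product as plane vectors vanishes.
rotated = rotate90(w)
dot = w.real * rotated.real + w.imag * rotated.imag
```

For the values of Example 10.1 both routes give 29 + 22i; with non-integer inputs the two results may differ by floating-point rounding.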

10.2.2 Polar form and Euler’s identity

If P ∈ ℝ² with P ≠ (0, 0) has rectangular coordinates (a, b), then P also has polar coordinates (r, α), where r is the distance from the origin O to P and α is the angle the vector $\vec{OP}$ makes with the positive x-axis (see Figure 10.4).

Figure 10.4: Polar form.

Here we restrict $\alpha$ to the range $[0, 2\pi)$ so that each point $P \in \mathbb{R}^2$ with $P \neq (0, 0)$ has only one set of polar coordinates. The rectangular coordinates $(a, b)$ of a point $P \neq (0, 0)$ are related to its polar coordinates $(r, \alpha)$ by the formulas:
(i) $a = r\cos(\alpha)$.
(ii) $b = r\sin(\alpha)$.
(iii) $r = \sqrt{a^2 + b^2}$.
(iv) $\alpha = \arctan\left(\frac{b}{a}\right)$, chosen in the appropriate quadrant.

If $z = a + ib \neq 0$ corresponds to the point $(a, b)$ with polar coordinates $(r, \alpha)$, then from the relations above, $z$ can be written as
$$z = r(\cos(\alpha) + i\sin(\alpha)).$$
This is called the polar form for $z$. The angle $\alpha$ is the argument of $z$, denoted by $\operatorname{Arg} z$, and in this context $|z|$ is called the modulus of $z$. It is the absolute value of $z$.

We note that if we were to allow $\alpha$ to be an arbitrary real number (not just between $0$ and $2\pi$), two complex numbers in polar form, $z_n = r_n(\cos(\alpha_n) + i\sin(\alpha_n))$ for $n = 1, 2$, would be equal, $z_1 = z_2$, if and only if $r_1 = r_2$ and $\alpha_1 = \alpha_2 + 2k\pi$, where $k = 0, \pm 1, \pm 2, \dots$.

Example 10.19. Suppose $z = 1 - i$; then $|z| = \sqrt{2}$ and $\operatorname{Arg}(z) = \arctan(-1) = \frac{7\pi}{4}$ since $(1, -1)$ is in the fourth quadrant. Therefore
$$z = \sqrt{2}\left(\cos\left(\frac{7\pi}{4}\right) + i\sin\left(\frac{7\pi}{4}\right)\right).$$
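The conversion between rectangular and polar form can be tried numerically. The following is a minimal sketch using Python's standard `cmath` module; the helper names `to_polar` and `from_polar` are illustrative assumptions, not notation from the text. Since `cmath.polar` reports the angle in $(-\pi, \pi]$, the sketch shifts it into the range $[0, 2\pi)$ used above.

```python
import cmath
import math

def to_polar(z: complex) -> tuple[float, float]:
    """Return (r, alpha) with r = |z| and alpha = Arg z in [0, 2*pi)."""
    if z == 0:
        raise ValueError("0 has no polar form")
    r, alpha = cmath.polar(z)          # cmath gives alpha in (-pi, pi]
    return r, alpha % (2 * math.pi)    # shift into [0, 2*pi)

def from_polar(r: float, alpha: float) -> complex:
    """Rebuild z = r(cos(alpha) + i sin(alpha))."""
    return complex(r * math.cos(alpha), r * math.sin(alpha))

# Example 10.19: z = 1 - i has modulus sqrt(2) and argument 7*pi/4.
r, alpha = to_polar(1 - 1j)
z_again = from_polar(r, alpha)
```

Up to floating-point rounding, `r` agrees with $\sqrt{2}$, `alpha` with $7\pi/4$, and `z_again` with $1 - i$.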

There is a very nice exponential way to express the polar form that is due to Euler. Before we describe this, we must look more closely at the powers of $i$. Now, $i^2 = -1$ so $i^3 = i^2 \cdot i = -i$. Then $i^4 = i^3 i = -i^2 = 1$ and therefore $i^5 = i$. From this it follows that the powers of $i$ repeat cyclically as $\{1, i, -1, -i\}$ and $i^m = i^n$ if and only if $n \equiv m \bmod 4$. For example, $i^{51} = i^3 = -i$. Further, the multiplicative inverse of any power of $i$ is another power of $i$, and so these powers form a group under multiplication.

10.2 The complex plane

| 199

Lemma 10.20. The powers of $i$ form a cyclic group of order 4 under multiplication.

If $t$ is a real variable, recall that the functions $e^t$, $\sin(t)$, $\cos(t)$ have the following power series expansions:
$$e^t = 1 + t + \frac{t^2}{2!} + \cdots + \frac{t^n}{n!} + \cdots = \sum_{n=0}^{\infty} \frac{t^n}{n!},$$
$$\sin(t) = t - \frac{t^3}{3!} + \frac{t^5}{5!} - \cdots + (-1)^n \frac{t^{2n+1}}{(2n+1)!} + \cdots = \sum_{n=0}^{\infty} (-1)^n \frac{t^{2n+1}}{(2n+1)!},$$
$$\cos(t) = 1 - \frac{t^2}{2!} + \frac{t^4}{4!} - \cdots + (-1)^n \frac{t^{2n}}{(2n)!} + \cdots = \sum_{n=0}^{\infty} (-1)^n \frac{t^{2n}}{(2n)!}.$$

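Now consider $t = i\alpha$ with $\alpha$ real, and substitute into the power series expansion for $e^t$ to find $e^{i\alpha}$ (although $t$ is not a real variable we do this formally):
$$e^{i\alpha} = 1 + i\alpha + \frac{(i\alpha)^2}{2!} + \frac{(i\alpha)^3}{3!} + \cdots = 1 + i\alpha - \frac{\alpha^2}{2!} - \frac{i\alpha^3}{3!} + \cdots$$
using the rules for the powers of $i$. Then combining terms with and without $i$ we get
$$e^{i\alpha} = \left(1 - \frac{\alpha^2}{2!} + \frac{\alpha^4}{4!} - \cdots\right) + i\left(\alpha - \frac{\alpha^3}{3!} + \frac{\alpha^5}{5!} - \cdots\right) = \cos(\alpha) + i\sin(\alpha).$$
This is known as Euler’s Identity.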
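The formal substitution can be checked numerically by summing the exponential series at $t = i\alpha$. The sketch below is only an illustration: the function name `exp_series` and the cut-off of 40 terms are assumptions chosen for this example, not part of the text.

```python
import math

def exp_series(t: complex, terms: int = 40) -> complex:
    """Partial sum of the series sum_{n >= 0} t**n / n! with the given number of terms."""
    total = 0j
    term = 1 + 0j                 # t**0 / 0!
    for n in range(terms):
        total += term
        term *= t / (n + 1)       # turns t**n / n! into t**(n+1) / (n+1)!
    return total

alpha = 1.2
lhs = exp_series(1j * alpha)                          # series value at t = i*alpha
rhs = complex(math.cos(alpha), math.sin(alpha))       # cos(alpha) + i sin(alpha)
at_pi = exp_series(1j * math.pi)                      # should be close to -1
```

The two sides agree up to rounding error, and the value at $\alpha = \pi$ is close to $-1$.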

Lemma 10.21 (Euler’s Identity). $e^{i\alpha} = \cos(\alpha) + i\sin(\alpha)$ for $\alpha \in \mathbb{R}$.

Now, if $z \neq 0$, $r = |z|$ and $\alpha = \operatorname{Arg}(z)$, we then have from the polar form that
$$z = r(\cos(\alpha) + i\sin(\alpha)) = re^{i\alpha}.$$
This last identity makes multiplication of complex numbers very simple. Suppose $z = r_1 e^{i\alpha_1}$ with $z \neq 0$ and $w = r_2 e^{i\alpha_2}$ with $w \neq 0$. Then
$$zw = r_1 r_2 e^{i(\alpha_1 + \alpha_2)}.$$
Breaking this into components, we then have $|zw| = |z||w|$ and $\operatorname{Arg}(zw) = \operatorname{Arg}(z) + \operatorname{Arg}(w)$ modulo $2\pi$.

Lemma 10.22. If $z, w \in \mathbb{C}$ with $z \neq 0$, $w \neq 0$, then $|zw| = |z||w|$ and $\operatorname{Arg}(zw) = \operatorname{Arg}(z) + \operatorname{Arg}(w)$ modulo $2\pi$.

Notice that $\operatorname{Arg}(i) = \pi/2$, and multiplication $iz$ rotates $z$ by 90°. That is, $\operatorname{Arg}(iz) = \pi/2 + \operatorname{Arg}(z) = \operatorname{Arg}(i) + \operatorname{Arg}(z)$ modulo $2\pi$, which follows directly from the lemma.

Euler’s Identity leads directly to what is called Euler’s Formula. Suppose $\alpha = \pi$. Then
$$e^{i\pi} = \cos(\pi) + i\sin(\pi) = -1 + 0 \cdot i = -1.$$
Put succinctly, $e^{i\pi} + 1 = 0$.

Lemma 10.23 (Euler’s Formula). $e^{i\pi} + 1 = 0$.

This has been called a “magic” formula because the five most important numbers in mathematics, $0$, $1$, $e$, $i$ and $\pi$, are tied together in a very simple equation. If one thinks about how diversely these five numbers appear ($0$ as the additive identity, $1$ as the multiplicative identity, $e$ as the natural exponential base, $\pi$ as the ratio of the circumference to the diameter of any circle, and $i$ as the imaginary unit), this result is truly amazing. In Chapter 12 we will look even more closely into the fundamental numbers $e$ and $\pi$.

Now, we may define the complex exponential function $e^z$, $z = x + iy$, by
$$e^z := e^{x+iy} = e^x e^{iy},$$
and we get in a straightforward manner the functional equation
$$e^{z_1 + z_2} = e^{z_1} e^{z_2} \quad \text{for } z_1, z_2 \in \mathbb{C}.$$

Hence, we extended the real exponential function analytically into $\mathbb{C}$. From the Taylor expansions for $e^x$, $\sin(x)$ and $\cos(x)$ we automatically get
$$e^z = \sum_{n=0}^{\infty} \frac{z^n}{n!}, \quad z \in \mathbb{C}.$$
This series converges absolutely for all $z \in \mathbb{C}$, that is, $\sum_{n=0}^{\infty} \frac{|z|^n}{n!}$ converges. In particular, $\lim_{n \to \infty} \frac{|z|^n}{n!} = 0$ for each $z \in \mathbb{C}$ (see Chapter 12 for more details).

Analogously we introduce the complex trigonometric functions $\sin(z)$ and $\cos(z)$ by
$$\sin(z) = \sum_{n=0}^{\infty} (-1)^n \frac{z^{2n+1}}{(2n+1)!} \quad \text{and} \quad \cos(z) = \sum_{n=0}^{\infty} (-1)^n \frac{z^{2n}}{(2n)!}.$$

These two series also converge absolutely for all complex numbers $z$. By comparing the series using the powers of $i$, as we did for a real variable to obtain Euler’s identity, we obtain the complex Euler identity
$$e^{iz} = \cos(z) + i\sin(z).$$
This directly provides the complex trigonometric formulas
$$\sin(z + w) = \sin(z)\cos(w) + \cos(z)\sin(w),$$
$$\cos(z + w) = \cos(z)\cos(w) - \sin(z)\sin(w).$$
Further, from the definition we see that
$$\cos(-z) = \cos(z) \quad \text{and} \quad \sin(-z) = -\sin(z).$$
Finally this leads to the relations
$$\cos(z) = \frac{e^{iz} + e^{-iz}}{2} \quad \text{and} \quad \sin(z) = \frac{e^{iz} - e^{-iz}}{2i}.$$
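These relations are easy to test at sample points. The following sketch compares the exponential expressions with the library functions `cmath.cos` and `cmath.sin`; the helper names `cos_via_exp` and `sin_via_exp` and the sample values of `z` and `w` are arbitrary choices made for illustration.

```python
import cmath

def cos_via_exp(z: complex) -> complex:
    """cos(z) computed as (e^{iz} + e^{-iz}) / 2."""
    return (cmath.exp(1j * z) + cmath.exp(-1j * z)) / 2

def sin_via_exp(z: complex) -> complex:
    """sin(z) computed as (e^{iz} - e^{-iz}) / (2i)."""
    return (cmath.exp(1j * z) - cmath.exp(-1j * z)) / (2j)

z, w = 0.7 + 0.3j, -1.1 + 0.5j

# Both sides of the addition formula for the complex sine.
addition_lhs = cmath.sin(z + w)
addition_rhs = cmath.sin(z) * cmath.cos(w) + cmath.cos(z) * cmath.sin(w)
```

A numerical agreement at a few points is of course no proof, but it is a quick sanity check on signs and factors such as the $2i$ in the denominator.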


10.2.3 Other constructions of ℂ

We have constructed the complex numbers via two-dimensional $\mathbb{R}$-vector spaces. There are several other possibilities to introduce the complex numbers. Two possibilities are especially remarkable. The first uses field extensions as in Chapter 6. The polynomial $f(x) = x^2 + 1$ is irreducible over $\mathbb{R}$. Therefore, see Chapter 7, $\mathbb{R}[x]/(x^2 + 1)$ is a field $\mathbb{R}(j)$ with $j^2 + 1 = 0$, and $\mathbb{C}$ is isomorphic to $\mathbb{R}(j)$ under the map $1 \mapsto 1$ and $i \mapsto j$.

For an additional construction consider the set of all real $2 \times 2$-matrices of the type
$$\begin{pmatrix} a & -b \\ b & a \end{pmatrix}.$$
Using addition and multiplication of matrices, this set becomes a field $K$ which is isomorphic to $\mathbb{C}$ under the map
$$1 \mapsto \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \quad \text{and} \quad i \mapsto \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}.$$
Recall that
$$\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}^2 = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}.$$
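The matrix model can be checked with a few lines of code. This is a minimal sketch using plain nested tuples rather than a matrix library; the helper names `to_matrix` and `mat_mul` are assumptions made for this illustration.

```python
def to_matrix(z: complex):
    """Send a + ib to the real 2x2 matrix ((a, -b), (b, a))."""
    a, b = z.real, z.imag
    return ((a, -b), (b, a))

def mat_mul(A, B):
    """Ordinary product of two 2x2 matrices given as nested tuples."""
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

J = to_matrix(1j)                 # the image of i
J_squared = mat_mul(J, J)         # should be the image of -1

z, w = 2 + 3j, -1 + 4j
product_of_images = mat_mul(to_matrix(z), to_matrix(w))
image_of_product = to_matrix(z * w)
```

Here `product_of_images` and `image_of_product` coincide, which is the multiplicativity of the map in one concrete instance.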

10.2.4 The Gaussian integers

We close this section by looking briefly at the complex integers or Gaussian integers. We saw that we have both division and the Euclidean algorithm in the integral domains $\mathbb{Z}$ and $K[x]$, $K$ a field. We now introduce another ring in which this also is possible. Consider the set
$$\mathbb{Z}[i] := \{x + iy \in \mathbb{C} \mid x, y \in \mathbb{Z}\}.$$
It is easy to show that $\mathbb{Z}[i]$ is an integral domain, as a subring of $\mathbb{C}$. This subring of $\mathbb{C}$ is called the complex integers or Gaussian integers. The next result shows that the division algorithm holds in the Gaussian integers.

Theorem 10.24. Let $a, b \in \mathbb{Z}[i]$ with $b \neq 0$. Then there exist $q, r \in \mathbb{Z}[i]$ with $a = bq + r$ and $0 \leq |r| < |b|$.

Proof. Let $a, b \in \mathbb{Z}[i]$ with $b \neq 0$. If $a = 0$ choose $q = r = 0$, and we are done. Now, let $a \neq 0$. Define $\frac{a}{b} = t + is$ with $t, s \in \mathbb{Q}$. Choose $m, n \in \mathbb{Z}$ such that $|m - t| \leq \frac{1}{2}$ and $|n - s| \leq \frac{1}{2}$. This is possible because the elements of $\mathbb{Z}[i]$ form a lattice of side length 1 in the plane. Let $q = m + in$. Then
$$\left|\frac{a}{b} - q\right|^2 = |t + is - (m + in)|^2 = (t - m)^2 + (s - n)^2 \leq \frac{1}{4} + \frac{1}{4} = \frac{1}{2}.$$
Now define $r = a - bq$. Then $a = bq + r$, and either $r = 0$ or
$$|r| = |a - qb| = |b| \left|\frac{a}{b} - q\right| < |b|,$$
because $\left|\frac{a}{b} - q\right| \leq \frac{1}{\sqrt{2}} < 1$.

[…] 0, with $|f(z)| > |a_0|$ for all $z \in \mathbb{C}$ with $|z| > r_0$. Let $B(r_0) = \{z \in \mathbb{C} \mid |z| \leq r_0\}$. By Corollary 10.18 there exists a $z_0 \in B(r_0)$ with $|f(z_0)| \leq |f(z)|$ for all $z \in B(r_0)$. Since $f(0) = a_0$ and $|f(z)| > |a_0|$ for $|z| > r_0$ we get $|f(z_0)| \leq |f(z)|$ for all $z \in \mathbb{C}$. Recall that $|f(z_0)| \leq |f(0)| = |a_0|$. Therefore $\{|f(z)| \mid z \in \mathbb{C}\} \subset [a, \infty)$ with $0 \leq a \in \mathbb{R}$ and $a = |f(z_0)|$. We remark that, by the intermediate value theorem, we have in fact $\{|f(z)| \mid z \in \mathbb{C}\} = [a, \infty)$, but this we do not need. We just need that there is a $z_0 \in \mathbb{C}$ with $a = |f(z_0)| \leq |f(z)|$ for all $z \in \mathbb{C}$.

Our aim is to show that $a = 0$. We write $f(z_0) = ae^{i\varphi}$ for our $z_0 \in \mathbb{C}$ with $|f(z_0)| = a$. We now consider the polynomial function
$$g : \mathbb{C} \to \mathbb{C}, \quad z \mapsto g(z) := e^{-i\varphi} f(z + z_0).$$
The polynomial function $g(z)$ takes its minimal absolute value at 0, because $g(0) = e^{-i\varphi} f(z_0) = e^{-i\varphi} e^{i\varphi} a = a$, that is, $g(0) = |g(0)| = a$. Hence, without loss of generality, we may assume that $z_0 = 0$ and $f(0) = a$. Now we may write $f(x)$ as
$$f(x) = a + bx^m + x^{m+1} \cdot h(x)$$
with $0 \leq a \in \mathbb{R}$, $0 \neq b \in \mathbb{C}$, $m \geq 1$, and some polynomial $h(x) \in \mathbb{C}[x]$ with degree $n - m - 1$ if $m < n$ and $h(x) = 0$ if $m = n$.

Assumption. The constant coefficient $a$ is positive, that is, $a > 0$.

We have to show that this assumption leads to a contradiction, which then gives that $a = 0$. For this we need another tool which is of independent interest. In $\mathbb{C}$ we may extract roots for each $z \in \mathbb{C}$, $z \neq 0$. Let $z = r(\cos(\varphi) + i\sin(\varphi)) \neq 0$ and $m \in \mathbb{N}$, $m \geq 2$. Then $z_i^m = z$, $i = 1, 2, \dots, m$, for the following pairwise different numbers

$$z_1 = \sqrt[m]{r}\left(\cos\left(\frac{\varphi}{m}\right) + i\sin\left(\frac{\varphi}{m}\right)\right),$$
$$z_2 = \sqrt[m]{r}\left(\cos\left(\frac{\varphi}{m} + \frac{2\pi}{m}\right) + i\sin\left(\frac{\varphi}{m} + \frac{2\pi}{m}\right)\right),$$
$$\vdots$$
$$z_{k+1} = \sqrt[m]{r}\left(\cos\left(\frac{\varphi}{m} + \frac{2k\pi}{m}\right) + i\sin\left(\frac{\varphi}{m} + \frac{2k\pi}{m}\right)\right),$$
for $k = 2, 3, \dots, m - 1$. The $z_1, z_2, \dots, z_m$ are thus the $m$ pairwise different zeros of the polynomial $x^m - z$. They are called the $m$-th roots of $z$. In the special case $z = 1$ they are the $m$-th roots of unity.

We now continue with the proof of the Fundamental Theorem of Algebra. We may choose in $\mathbb{C}$ one fixed $m$-th root $w = \sqrt[m]{-\frac{a}{b}}$, that is, a number $w$ with $w^m = -\frac{a}{b}$. We have the following situation, see Figure 10.5.

Figure 10.5: Situation for the proof.

We have $f(0) = a > 0$. The idea is to move along the line segment $\overline{0w}$ from $w$ towards $0$ and to produce a suitable value $sw$, $s \in [0, 1]$, with $|f(sw)| < f(0) = a$, which contradicts the minimality of $a$. To do this we choose a real number $s \in (0, 1)$ such that
$$s\,|w^{m+1} \cdot h(sw)| < a.$$
Such an $s$ always exists because $\{|h(sw)| \mid s \in (0, 1)\}$ is a bounded subset of $\mathbb{R}$ (see Corollary 10.16 and Corollary 10.18). Now, by the choice of $w$ we have
$$f(sw) = a + b(sw)^m + (sw)^{m+1} h(sw) = a + bs^m\left(-\frac{a}{b}\right) + (sw)^{m+1} h(sw) = a(1 - s^m) + (sw)^{m+1} h(sw),$$
and finally by the triangle inequality
$$|f(sw)| \leq a(1 - s^m) + s^{m+1} |w^{m+1} h(sw)| < a(1 - s^m) + s^m a = a = f(0),$$
which gives the desired contradiction. Hence, $a = 0$ and $f(0) = 0$, as claimed. This proves the Fundamental Theorem of Algebra in Theorem 10.27.
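The two ingredients of this argument, the $m$-th roots and the step from $0$ towards $w$, can be illustrated numerically. The sketch below is not part of the proof: the polynomial $f(x) = 2 + (1 + i)x^2 + x^3$ (so $a = 2$, $b = 1 + i$, $m = 2$, $h = 1$) is a hypothetical example chosen for this illustration, and the helper name `mth_roots` is an assumption. Note that `cmath.polar` returns an argument in $(-\pi, \pi]$ rather than $[0, 2\pi)$; any argument of $z$ produces the same set of roots.

```python
import cmath
import math

def mth_roots(z: complex, m: int) -> list[complex]:
    """All m-th roots of z != 0: r^(1/m) * e^{i(phi + 2k*pi)/m} for k = 0, ..., m-1."""
    if z == 0 or m < 1:
        raise ValueError("need z != 0 and m >= 1")
    r, phi = cmath.polar(z)
    radius = r ** (1.0 / m)
    return [cmath.rect(radius, (phi + 2 * math.pi * k) / m) for k in range(m)]

# Hypothetical example: f(x) = a + b*x^m + x^(m+1)*h(x) with f(0) = a > 0.
a, b, m = 2.0, 1 + 1j, 2

def h(x: complex) -> complex:
    return 1 + 0j

def f(x: complex) -> complex:
    return a + b * x**m + x**(m + 1) * h(x)

w = mth_roots(-a / b, m)[0]       # one fixed root with w^m = -a/b

s = 0.5                            # shrink s until |f(s*w)| < f(0) = a
while abs(f(s * w)) >= a:
    s /= 2

fourth_roots_of_unity = mth_roots(1, 4)
```

For this example the loop stops immediately at $s = 0.5$, where $|f(sw)|$ is already smaller than $f(0) = 2$, so $0$ cannot be a minimum of $|f|$ with positive value.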

10.3 The Fundamental Theorem of Algebra

| 207

10.3.2 Second proof of the Fundamental Theorem of Algebra

For the second proof, which is an algebraic proof, we need some additional tools.

Lemma 10.34. Let $f(x) \in \mathbb{R}[x]$ with $\deg(f(x)) = n \geq 1$ and $n$ odd. Then $f(x)$ has a real zero.

Proof. We may assume that $a_n > 0$, that is, $f(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_0$, $n = 2k + 1 \geq 1$, $a_n > 0$. From the lemma of growth (Lemma 10.32) we get
$$\lim_{x \to \infty} f(x) = \infty \quad \text{and} \quad \lim_{x \to -\infty} f(x) = -\infty$$
because $n$ is odd and $a_n > 0$. Therefore, there exist $x_1, x_2 \in \mathbb{R}$ with $f(x_1) > 0$ and $f(x_2) < 0$. From the intermediate value theorem for real functions we get an $x_3 \in \mathbb{R}$ with $f(x_3) = 0$.

Lemma 10.35. Let $f(x) \in \mathbb{C}[x]$ with $\deg(f(x)) = 2$. Then $f(x)$ has a zero in $\mathbb{C}$.

Proof. Let $f(x) = ax^2 + bx + c \in \mathbb{C}[x]$, $a \neq 0$. If $b^2 - 4ac = 0$, then $f(x)$ has a zero $z = -\frac{b}{2a}$. Now, let $b^2 - 4ac \neq 0$. Then we may extract square roots of $b^2 - 4ac$, and hence,
$$z_1 = \frac{-b + \sqrt{b^2 - 4ac}}{2a} \quad \text{and} \quad z_2 = \frac{-b - \sqrt{b^2 - 4ac}}{2a}$$
are the two zeros of $f(x)$.

Lemma 10.36. If each non-constant real polynomial has a zero in $\mathbb{C}$, then also each non-constant complex polynomial has a zero in $\mathbb{C}$.

Proof. Let $f(x) \in \mathbb{C}[x]$, and let each non-constant real polynomial have a zero in $\mathbb{C}$. Let $\deg(f(x)) \geq 1$. We define $H(x) = f(x)\bar{f}(x)$, where $\bar{f}(x) = \bar{a}_0 + \bar{a}_1 x + \cdots + \bar{a}_n x^n$ if $f(x) = a_0 + a_1 x + \cdots + a_n x^n$. Now, $\bar{H}(x) = \overline{f(x)\bar{f}(x)} = \bar{f}(x) f(x) = H(x)$, and therefore $H(x) \in \mathbb{R}[x]$. By our assumption there exists a $z_0 \in \mathbb{C}$ with $H(z_0) = 0$. Then $f(z_0)\bar{f}(z_0) = 0$, and therefore $f(z_0) = 0$ or $\bar{f}(z_0) = 0$. If $f(z_0) = 0$, then $z_0$ is a zero of $f(x)$. If $\bar{f}(z_0) = 0$, then also $\overline{\bar{f}(z_0)} = 0$, and, hence, $0 = \overline{\bar{f}(z_0)} = f(\bar{z}_0)$, and $\bar{z}_0$ is a zero of $f(x)$.
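The key step of Lemma 10.36, that $H(x) = f(x)\bar{f}(x)$ has real coefficients, can be observed on a concrete polynomial. The sketch below stores a polynomial as its list of coefficients $[a_0, a_1, \dots, a_n]$; the helper name `poly_mul` and the sample polynomial $f(x) = (1 + 2i) - 3ix + x^2$ are assumptions made for this illustration.

```python
def poly_mul(p: list[complex], q: list[complex]) -> list[complex]:
    """Coefficient list of the product of two polynomials."""
    out = [0j] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

# f(x) = (1 + 2i) - 3i*x + x^2, coefficients listed from a_0 up to a_n.
f = [1 + 2j, -3j, 1 + 0j]
f_bar = [c.conjugate() for c in f]     # conjugate every coefficient

H = poly_mul(f, f_bar)                 # H(x) = f(x) * f_bar(x)
imaginary_parts = [c.imag for c in H]
real_parts = [c.real for c in H]
```

For this example all imaginary parts vanish and the real parts are $5, -12, 11, 0, 1$, that is, $H(x) = 5 - 12x + 11x^2 + x^4 \in \mathbb{R}[x]$.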

Lemma 10.37. Let $f(x) = a_0 + a_1 x + \cdots + a_n x^n \in \mathbb{R}[x]$ with $n \geq 1$ and $a_n \neq 0$. Then $f(x)$ has a zero in $\mathbb{C}$.

Proof. We prove the statement by induction. Let $n = 2^m q$ with $q$ odd. If $m = 0$ then the statement follows from Lemma 10.34. Let $m \geq 1$ and assume the statement holds for all polynomials of a degree $d = 2^k q'$ with $k < m$ and $q'$ odd. Let $K$ be the splitting field of $f(x)$ over $\mathbb{R}$, and let the zeros of $f(x)$ in $K$ be $\alpha_1, \alpha_2, \dots, \alpha_n$. We have to show that at least one $\alpha_i$ is in $\mathbb{C}$. Let $h \in \mathbb{Z}$. We consider the polynomial
$$H(x) = \prod_{i<j} (x - (\alpha_i + \alpha_j + h\alpha_i\alpha_j)) \in K[x].$$

[…]

If $d > 0$ then $K$ is called a real quadratic field, while if $d < 0$ it is an imaginary quadratic field. In both cases $\{1, \sqrt{d}\}$ is a basis for $K$ over $\mathbb{Q}$. We remark that automatically $d > 1$ if $d > 0$ because $|K : \mathbb{Q}| = 2$. Hence, let always $d > 1$ if $d > 0$.

The integers in $\mathbb{Q}(\sqrt{d})$ are called quadratic integers and we characterize them. Suppose $\alpha \in \mathcal{O}_K$ is a quadratic integer. Since $\alpha \in K$ we have $\alpha = q_1 + q_2\sqrt{d}$. Since $\operatorname{irr}(\alpha, \mathbb{Q})$ is a monic rational integral polynomial of degree 2 we have
$$\operatorname{irr}(\alpha, \mathbb{Q}) = (x - \alpha)(x - \bar{\alpha}) = x^2 - (\alpha + \bar{\alpha})x + \alpha\bar{\alpha} \in \mathbb{Z}[x],$$
where $\bar{\alpha} = q_1 - q_2\sqrt{d}$. It follows that $\alpha \in \mathcal{O}_K$ if and only if its trace and norm are both rational integers:
$$\operatorname{tr}_K(\alpha) = \alpha + \bar{\alpha} = 2q_1 \in \mathbb{Z}, \qquad N_K(\alpha) = \alpha\bar{\alpha} = q_1^2 - dq_2^2 \in \mathbb{Z}.$$
Now
$$(2q_2)^2 d = (2q_1)^2 - 4(q_1^2 - q_2^2 d) \in \mathbb{Z} \implies 2q_2 \in \mathbb{Z},$$
since $d$ is squarefree. Therefore $q_1 = \frac{m}{2}$, $q_2 = \frac{n}{2}$ for rational integers $m, n$ and
$$\alpha = \frac{m + n\sqrt{d}}{2} \quad \text{with } m, n \in \mathbb{Z}.$$
Further
$$m^2 - n^2 d \equiv 0 \bmod 4.$$

If $d \equiv 2 \bmod 4$ or $d \equiv 3 \bmod 4$ this congruence is solved only if $m, n$ are even or, equivalently, $q_1, q_2 \in \mathbb{Z}$. If $d \equiv 1 \bmod 4$ then $m^2 - dn^2 \equiv 0 \bmod 4$ is equivalent to $m \equiv n \bmod 2$. It follows that the integers in $\mathcal{O}_K$ can be described by:
(1) $m + n\sqrt{d}$ with $m, n \in \mathbb{Z}$.
(2) If $d \equiv 1 \bmod 4$ but not otherwise, also $\frac{m + n\sqrt{d}}{2}$ with $m, n$ odd rational integers.

From this characterization it follows that if $d$ is not congruent to 1 mod 4, every integer in $\mathcal{O}_K$ can be written as $m + n\sqrt{d}$ with $m, n \in \mathbb{Z}$. In other words $\{1, \sqrt{d}\}$ is an integral basis. If $d \equiv 1 \bmod 4$ let $\omega = \frac{1 + \sqrt{d}}{2}$. Then from the characterization every integer in $\mathcal{O}_K$ is uniquely of the form $m + n\omega$, $m, n \in \mathbb{Z}$, and so $\{1, \omega\}$ is an integral basis (see exercises). We summarize all this discussion in the next theorem.

Theorem 11.35. Let $K$ be a quadratic field. Then:
(1) $K = \mathbb{Q}(\sqrt{d})$ for some squarefree rational integer $d$.
(2) The integers in $K$ can be characterized as
  (a) $m + n\sqrt{d}$ with $m, n \in \mathbb{Z}$;
  (b) if $d \equiv 1 \bmod 4$ but not otherwise, also $\frac{m + n\sqrt{d}}{2}$ with $m, n$ odd rational integers.
(3) An integral basis for $\mathcal{O}_K$ is given by
  (a) $\{1, \sqrt{d}\}$ if $d \equiv 2 \bmod 4$ or $d \equiv 3 \bmod 4$;
  (b) $\{1, \omega\}$ where $\omega = \frac{1 + \sqrt{d}}{2}$ if $d \equiv 1 \bmod 4$.
(4) The discriminant of $K = \mathbb{Q}(\sqrt{d})$ is
  (a) $4d$ if $d \equiv 2, 3 \bmod 4$,
  (b) $d$ if $d \equiv 1 \bmod 4$.

Proof. Everything was explained prior to the theorem except part (4). If $d \equiv 2, 3 \bmod 4$ then $\{1, \sqrt{d}\}$ is an integral basis. Then
$$\Delta(1, \sqrt{d}) = \begin{vmatrix} 1 & \sqrt{d} \\ 1 & -\sqrt{d} \end{vmatrix}^2 = 4d.$$
If $d \equiv 1 \bmod 4$ then $\{1, \omega\}$ is an integral basis and
$$\Delta(1, \omega) = \begin{vmatrix} 1 & \frac{1 + \sqrt{d}}{2} \\ 1 & \frac{1 - \sqrt{d}}{2} \end{vmatrix}^2 = d.$$
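Part (2) of Theorem 11.35 can be cross-checked by brute force with the trace and norm criterion used in its derivation. The sketch below uses exact rational arithmetic from the standard `fractions` module; the function names and the search bound of 4 are assumptions made for this illustration.

```python
from fractions import Fraction

def is_quadratic_integer(q1: Fraction, q2: Fraction, d: int) -> bool:
    """Test alpha = q1 + q2*sqrt(d): trace 2*q1 and norm q1^2 - d*q2^2 must be integers."""
    trace = 2 * q1
    norm = q1 * q1 - d * q2 * q2
    return trace.denominator == 1 and norm.denominator == 1

def half_integer_solutions(d: int, bound: int = 4) -> set[tuple[int, int]]:
    """All (m, n) with |m|, |n| <= bound such that (m + n*sqrt(d))/2 is a quadratic integer."""
    return {
        (m, n)
        for m in range(-bound, bound + 1)
        for n in range(-bound, bound + 1)
        if is_quadratic_integer(Fraction(m, 2), Fraction(n, 2), d)
    }

same_parity = {(m, n) for m in range(-4, 5) for n in range(-4, 5) if (m - n) % 2 == 0}
both_even = {(m, n) for m in range(-4, 5, 2) for n in range(-4, 5, 2)}

solutions_d5 = half_integer_solutions(5)    # d = 5 is congruent to 1 mod 4
solutions_d2 = half_integer_solutions(2)    # d = 2 is congruent to 2 mod 4
solutions_d3 = half_integer_solutions(3)    # d = 3 is congruent to 3 mod 4
```

For $d = 5$ exactly the pairs with $m \equiv n \bmod 2$ occur, so that for instance $\frac{1 + \sqrt{5}}{2}$ is an integer, while for $d = 2$ and $d = 3$ only pairs with both $m$ and $n$ even survive.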

Theorem 11.36. Suppose that $K = \mathbb{Q}(\sqrt{d})$ with $d < 0$ and $d$ squarefree is an imaginary quadratic number field. If $d \neq -1, -3$ then the only units in $\mathcal{O}_K$ are $\pm 1$. If $d = -1$ the units are $\pm 1, \pm i$, while if $d = -3$ the units are $\pm 1, \pm\omega, \pm\bar{\omega}$ where $\omega = \frac{1 + i\sqrt{3}}{2}$.

Proof. As we have seen, $\alpha \in \mathcal{O}_K$ is a unit if and only if $|N(\alpha)| = 1$. Let $\alpha$ be a unit in $\mathcal{O}_K$. Then $\alpha = x + y\sqrt{d}$ or $\alpha = \frac{x + y\sqrt{d}}{2}$, and then $N(\alpha) = x^2 + |d|y^2$ or $N(\alpha) = \frac{x^2 - dy^2}{4}$. Since $d < 0$, $x^2 - dy^2 \geq 0$. If $d < -1$ and $d$ is not congruent to 1 mod 4, the only solutions to $x^2 - dy^2 = 1$ are $x = \pm 1$, $y = 0$. Our analysis of the Gaussian integers showed that if $d = -1$ then $\pm i$ are also units. If $d < -3$ then the only solutions to $x^2 - dy^2 = 4$ are $x = \pm 2$, $y = 0$, again giving the result. Finally, if $d = -3$ we see by computation that $\pm\omega$ and $\pm\bar{\omega}$ are also units (see exercises and note that $\omega^3 = -1$ for this $\omega$, so $\omega^6 = 1$).

Remark 11.37. Concretely, we have the following structure:
(1) If $d = -1$ then $U(\mathcal{O}_K) = \{\pm 1, \pm i\}$. This is cyclic of order 4.
(2) If $d = -3$ then $U(\mathcal{O}_K) = \{\pm 1, \pm\omega, \pm\bar{\omega}\}$. This is cyclic of order 6.
(3) If $d \neq -1, -3$ and $d < 0$ squarefree then $U(\mathcal{O}_K) = \{-1, 1\}$, which is cyclic of order 2.
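The unit counts in Remark 11.37 can be reproduced by enumerating the finitely many solutions of the norm equations from the proof. This is a brute-force sketch; the function name and the encoding of a unit $\frac{x + y\sqrt{d}}{\text{den}}$ as a triple `(x, y, den)` are assumptions made for this illustration.

```python
def imaginary_quadratic_units(d: int) -> list[tuple[int, int, int]]:
    """Units (x + y*sqrt(d))/den of O_K for K = Q(sqrt(d)), d < 0 squarefree."""
    if d >= 0:
        raise ValueError("d must be negative")
    # d = 1 mod 4: elements (x + y*sqrt(d))/2 with x, y of equal parity, norm (x^2 - d*y^2)/4.
    # Otherwise:   elements x + y*sqrt(d), norm x^2 - d*y^2.
    den, target = (2, 4) if d % 4 == 1 else (1, 1)
    units = []
    for x in range(-2, 3):            # x^2 <= 4 and y^2 <= 4 suffice since d <= -1
        for y in range(-2, 3):
            parity_ok = den == 1 or (x - y) % 2 == 0
            if parity_ok and x * x - d * y * y == target:
                units.append((x, y, den))
    return units

counts = {d: len(imaginary_quadratic_units(d)) for d in (-1, -2, -3, -5, -7, -11, -163)}
```

The resulting counts are 4 for $d = -1$, 6 for $d = -3$, and 2 for the other values tested, in agreement with the theorem.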

11.6 Quadratic fields and quadratic integers | 243

We have already considered the units in imaginary quadratic number fields (Theorem 11.36) with $d < 0$. Now, let $d > 0$ and squarefree. First we need some technical results.

Lemma 11.38. If $\zeta$ is an irrational real number then there are infinitely many rational numbers $\frac{x}{y}$ with $(x, y) = 1$ and $\left|\frac{x}{y} - \zeta\right| < \frac{1}{y^2}$.

For a proof see Theorem 9.55 (Theorem of Dirichlet).

Lemma 11.39. There is a constant $M = M(d)$ such that $|x^2 - dy^2| < M$ has infinitely many integral solutions.

Proof. Write $x^2 - dy^2 = (x + \sqrt{d}y)(x - \sqrt{d}y)$. From Lemma 11.38 there exist infinitely many pairs of relatively prime integers $(x, y)$, $y > 0$, satisfying $|x - \sqrt{d}y| < \frac{1}{y}$. It follows that
$$|x + \sqrt{d}y| \leq |x - \sqrt{d}y| + 2\sqrt{d}y < \frac{1}{y} + 2\sqrt{d}y.$$
Then
$$|x^2 - dy^2| < \frac{1}{y}\left|\frac{1}{y} + 2\sqrt{d}y\right| \leq 2\sqrt{d} + 1.$$

Theorem 11.40 (Pell’s equation). Pell’s equation $x^2 - dy^2 = 1$ has infinitely many integral solutions. Further there is a particular solution $(x_1, y_1)$ such that every solution has the form $\pm(x_n, y_n)$ where $x_n + y_n\sqrt{d} = (x_1 + y_1\sqrt{d})^n$ for $n \in \mathbb{Z}$.

Proof. From Lemma 11.39 there is an integer $m \neq 0$ such that $x^2 - dy^2 = m$ for infinitely many integral pairs $(x, y)$ with $x > 0$, $y > 0$. We may assume that the $x$ components are distinct. Further, since there are only finitely many residue classes modulo $|m|$, one can find pairs $(x_1, y_1)$, $(x_2, y_2)$ such that $x_1 \neq x_2$ and $x_1 \equiv x_2 \bmod |m|$ and $y_1 \equiv y_2 \bmod |m|$. Let $\alpha = x_1 - y_1\sqrt{d}$, $\beta = x_2 - y_2\sqrt{d}$. If $\gamma = x - y\sqrt{d}$ let $\bar{\gamma} = x + y\sqrt{d}$ be the conjugate of $\gamma$ and $N(\gamma) = x^2 - dy^2$ the norm of $\gamma$. Then $\alpha\bar{\beta} = A + B\sqrt{d}$ with $m \mid A$ and $m \mid B$. Thus $\alpha\bar{\beta} = m(u + v\sqrt{d})$ for some integers $u, v$. Taking norms on both sides yields
$$m^2 = m^2(u^2 - v^2 d) \implies u^2 - v^2 d = 1.$$
It remains to show that $v \neq 0$. If $v = 0$ then $u = \pm 1$ and then $\alpha\bar{\beta} = \pm m$. Multiplying by $\beta$ gives $\alpha m = \pm m\beta$ or $\alpha = \pm\beta$. But this implies $x_1 = x_2$, a contradiction. Therefore there is a solution to Pell’s equation with $xy \neq 0$.

We now prove the second assertion. We say that a solution $(x, y)$ is greater than a solution $(u, v)$ if $x + y\sqrt{d} > u + v\sqrt{d}$. Now consider the smallest solution $\alpha = x + y\sqrt{d}$ with $x > 0$, $y > 0$. Such a solution clearly exists and is unique. It is called a fundamental solution. Consider any solution $\beta = u + v\sqrt{d}$ with $u > 0$, $v > 0$. We show that there is a positive integer $n$ such that $\beta = \alpha^n$. Suppose not. Then choose $n > 0$ such that $\alpha^n < \beta < \alpha^{n+1}$. Then $1 < (\bar{\alpha})^n\beta < \alpha$ since $\bar{\alpha} = \alpha^{-1}$. However, if $(\bar{\alpha})^n\beta = A + B\sqrt{d}$ then $(A, B)$ is a solution to Pell’s equation and $1 < A + B\sqrt{d} < \alpha$. Now $A + B\sqrt{d} > 0$ so $A - B\sqrt{d} = (A + B\sqrt{d})^{-1} > 0$. Hence $A > 0$. Also $A - B\sqrt{d} = (A + B\sqrt{d})^{-1} < 1$ and hence $B\sqrt{d} > A - 1 \geq 0$. Thus $B > 0$. This contradicts the minimality of $\alpha$. If $\beta = a + b\sqrt{d}$ is a solution with $a > 0$, $b < 0$ then $\beta^{-1} = a - b\sqrt{d} = \alpha^n$ by the above argument, so $\beta = \alpha^{-n}$. The cases $a < 0$, $b > 0$ and $a < 0$, $b < 0$ lead to $-\alpha^n$ for $n \in \mathbb{Z}$. This proves the theorem.

Corollary 11.41. In any real quadratic field there are infinitely many units.

Proof. If $d > 1$ then Pell’s equation $x^2 - dy^2 = 1$ has infinitely many integral solutions, see Theorem 11.40. Since $\alpha = x + y\sqrt{d}$ is an integer in $\mathcal{O}_K$ with $N(\alpha) = 1$, it follows that $\mathcal{O}_K$ has infinitely many units.

We can now prove the unit theorem for real quadratic number fields.

Theorem 11.42. Let $K = \mathbb{Q}(\sqrt{d})$ with $d > 0$ and squarefree be a real quadratic field. Then there exists a unit $\epsilon_0 \in \mathcal{O}_K$ such that every unit in $\mathcal{O}_K$ is of the form $\pm\epsilon_0^n$ for $n \in \mathbb{Z}$. Such an $\epsilon_0$ is called a fundamental unit of $K$.

Proof. From Theorem 11.40 there exist positive integers $x, y$ such that $x^2 - dy^2 = 1$. Thus $\epsilon = x + y\sqrt{d}$ is a unit in $\mathcal{O}_K$ with $\epsilon > 1$. Let $M$ be a fixed real number greater than $\epsilon$. There are at most finitely many $\alpha \in \mathcal{O}_K$, $\alpha = p + q\sqrt{d}$, $p, q \in \mathbb{Q}$, with $|\alpha| < M$ and also $|\bar{\alpha}| < M$. This is clear since there are only finitely many integers $k$ with $|k| < M$. Let $\beta$ be a unit with $1 < \beta < M$. Such a $\beta$ exists since $M > \epsilon$. Then $N(\beta) = \beta\bar{\beta} = \pm 1$. If $\bar{\beta} = -\frac{1}{\beta}$ then $-M < -\frac{1}{\beta} < M$, and if $\bar{\beta} = \frac{1}{\beta}$ then also $-M < \frac{1}{\beta} < M$. Thus there are only finitely many units $\beta$ with $1 < \beta < M$ and of course there is at least one, namely $\epsilon$. Let $\epsilon_0$ be the smallest unit greater than 1. If $\beta$ is any positive unit then there is a unique integer $s$ with $\epsilon_0^s \leq \beta < \epsilon_0^{s+1}$. Then $1 \leq \beta\epsilon_0^{-s} < \epsilon_0$. Since $\beta\epsilon_0^{-s}$ is also a unit we must have $\beta\epsilon_0^{-s} = 1$. If $\beta < 0$ then $-\beta$ is positive and $-\beta = \epsilon_0^s$ for some $s \in \mathbb{Z}$, completing the proof.

Now, let $d > 1$. For the classification of the units in real quadratic number fields we need some technical preparations.
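Theorem 11.40 can be explored computationally for small $d$. The sketch below finds the fundamental solution of $x^2 - dy^2 = 1$ by a naive search over $y$ and then generates further solutions as powers. It is only practical when the fundamental solution is small, and the function names are assumptions made for this illustration.

```python
import math

def fundamental_solution(d: int) -> tuple[int, int]:
    """Smallest (x, y) with x, y > 0 and x^2 - d*y^2 = 1, found by brute force over y."""
    if d <= 0 or math.isqrt(d) ** 2 == d:
        raise ValueError("d must be a positive non-square integer")
    y = 1
    while True:
        x_squared = 1 + d * y * y
        x = math.isqrt(x_squared)
        if x * x == x_squared:        # 1 + d*y^2 is a perfect square
            return x, y
        y += 1

def solution_power(x1: int, y1: int, d: int, n: int) -> tuple[int, int]:
    """(x_n, y_n) with x_n + y_n*sqrt(d) = (x1 + y1*sqrt(d))^n for n >= 1."""
    x, y = x1, y1
    for _ in range(n - 1):
        # (x + y*sqrt(d)) * (x1 + y1*sqrt(d)) expanded into rational and sqrt(d) parts
        x, y = x * x1 + d * y * y1, x * y1 + y * x1
    return x, y

x1, y1 = fundamental_solution(2)
third = solution_power(x1, y1, 2, 3)
```

For $d = 2$ the search returns $(3, 2)$, and the third power gives $(99, 70)$ with $99^2 - 2 \cdot 70^2 = 1$. Because $x = \sqrt{1 + dy^2}$ grows with $y$, the first hit in the search is indeed the smallest solution.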


If $d = 2$ the fundamental unit is $\epsilon_0 = 1 + \sqrt{2}$ and for $d = 5$ a fundamental unit is $\frac{1}{2}(1 + \sqrt{5})$. However, even for small discriminants, computation of the fundamental unit can be quite difficult. For example, the fundamental unit for $d = 34$ is $35 + 6\sqrt{34}$.

Now what can be said about primes and prime factorization for quadratic integers? We saw in Section 11.5 that there is always a prime factorization. However, our example in $\mathbb{Q}(\sqrt{-5})$ shows that this is not always unique. Since there is a norm in every $\mathcal{O}_K$, the first question to ask is when this is a Euclidean norm or, equivalently, which $\mathcal{O}_K$ are Euclidean domains. We have already seen that the Gaussian integers are Euclidean. We state several results concerning these questions (see [20]).

Theorem 11.43. Suppose $K = \mathbb{Q}(\sqrt{d})$ with $d < 0$ and squarefree is an imaginary quadratic number field. Then $\mathcal{O}_K$ is Euclidean (with respect to the norm) if and only if $d = -1, -2, -3, -7, -11$.

We let $\mathcal{O}_d$ stand for $\mathcal{O}_K$ when $K = \mathbb{Q}(\sqrt{d})$. The rings $\mathcal{O}_{-1}, \mathcal{O}_{-2}, \mathcal{O}_{-3}, \mathcal{O}_{-7}, \mathcal{O}_{-11}$ are called the Euclidean imaginary quadratic number rings. They and matrix groups with entries from them have been investigated extensively (see [7] and [9]).

Remark 11.44. It is of interest to add the following extension to Theorem 11.43. Suppose $K = \mathbb{Q}(\sqrt{d})$ with $d < 0$ and squarefree is an imaginary quadratic number field. Then $\mathcal{O}_K$ is a Euclidean domain if and only if the norm $N_K$ is a Euclidean map. This can be seen as follows. Let $d < 0$ and squarefree. Suppose that $d$ is different from the numbers $-1, -2, -3, -7$ and $-11$. We define $E = \{0\} \cup \mathcal{O}_K^*$ where $\mathcal{O}_K^* = \{1, -1\}$ is the set of the units of $\mathcal{O}_K$. Assume that $\mathcal{O}_K$ is a Euclidean domain with Euclidean map $\varphi$. Then there exists an element $\alpha \in \mathcal{O}_K \setminus E$ such that $\varphi(\alpha)$ is minimal for the set $\mathcal{O}_K \setminus E$. Then for every $\beta \in \mathcal{O}_K$ there is a $\gamma \in \mathcal{O}_K$ with $(\beta - \gamma\alpha) \in E$ because of the minimality of $\varphi(\alpha)$. Since $E$ has only three elements, the principal ideal $(\alpha)$ has, as an abelian group, an index less than or equal to 3 in $\mathcal{O}_K$. Hence, we must have $N_K(\alpha) \leq 3$ (see for instance [10]), which contradicts the choice of $d$ and $\alpha$.

In the real case we have the following.

Theorem 11.45. The real quadratic fields $K = \mathbb{Q}(\sqrt{d})$ for which $\mathcal{O}_K$ is Euclidean are those with $d = 2, 3, 5, 6, 7, 11, 13, 17, 19, 21, 29, 33, 37, 41, 57, 73$.

Recall that being a principal ideal domain always implies unique factorization, see for instance [10, Section 6.2.3]. It was conjectured by Gauss and finally proven in several results by K. Heegner (1893–1965), A. Baker (born 1939), and H. Stark (born 1939) that there are only finitely many imaginary quadratic number fields whose integer rings are principal ideal domains.

Theorem 11.46. Suppose $K = \mathbb{Q}(\sqrt{d})$ with $d < 0$ is an imaginary quadratic number field. Then $\mathcal{O}_K$ is a principal ideal domain if and only if $d = -1, -2, -3, -7, -11, -19, -43, -67, -163$.

It has been conjectured that there are infinitely many real quadratic fields whose integral rings are principal ideal domains. In the case where $\mathcal{O}_K$ does have unique factorization we can analyze the primes. We state the following and leave the proof to the exercises.

Theorem 11.47. Suppose $K$ is a quadratic field and suppose $\mathcal{O}_K$ is a unique factorization domain. Then
(1) To each prime number $\pi \in \mathcal{O}_K$ there corresponds one and only one rational prime number $p$ such that $\pi \mid p$.
(2) Any rational prime $p$ is either a prime in $\mathcal{O}_K$ or a product $\pi_1\pi_2$ of two primes (not necessarily distinct) from $\mathcal{O}_K$. In this case if $\pi_1 \neq \pi_2$, we say $p$ is decomposed. If $\pi_1 = \pi_2$, so that $p = \pi^2$, we say the rational prime is ramified.
(3) All primes in $\mathcal{O}_K$ are either rational primes or the two factors of rational primes (and their associates).

Exercises

1. Let $K = \mathbb{Q}(\theta)$ be an algebraic number field of degree $n$. For $\alpha \in K$ define the mapping $T_\alpha : K \to K$ by $T_\alpha(x) = \alpha x$. Show that this is a linear transformation of the $n$-dimensional $\mathbb{Q}$-vector space $K$.
2. Let $K = \mathbb{Q}(\sqrt{d})$ with $d$ squarefree. Let $\omega = \sqrt{d}$ if $d \equiv 2 \bmod 4$ or $d \equiv 3 \bmod 4$ and let $\omega = \frac{1 + \sqrt{d}}{2}$ if $d \equiv 1 \bmod 4$. Show that every integer in $\mathcal{O}_K$ is uniquely of the form $m + n\omega$, $m, n \in \mathbb{Z}$, and so $\{1, \omega\}$ is an integral basis.
3. Let $d = 3$, $K = \mathbb{Q}(\sqrt{-d})$ and $\omega = \frac{-1 + i\sqrt{3}}{2}$. Show that $\pm\omega, \pm\bar{\omega}$ are units in $\mathcal{O}_K$. (Note that $\omega^3 = 1$.)
4. Prove: If $\alpha, \beta \in \mathbb{Z}[i]$ then:
  (a) $N(\alpha)$ is an integer for all $\alpha \in \mathbb{Z}[i]$.
  (b) $N(\alpha) \geq 0$ for all $\alpha \in \mathbb{Z}[i]$.
  (c) $N(\alpha) = 0$ if and only if $\alpha = 0$.
  (d) $N(\alpha) \geq 1$ for all $\alpha \neq 0$.
  (e) $N(\alpha\beta) = N(\alpha)N(\beta)$, that is, the norm is multiplicative.
  (f) The element $u \in \mathbb{Z}[i]$ is a unit if and only if $N(u) = 1$.
  (g) If $\pi \in \mathbb{Z}[i]$ and $N(\pi) = p$, where $p$ is an ordinary prime in $\mathbb{Z}$, then $\pi$ is a prime in $\mathbb{Z}[i]$.
5. Consider the Gaussian integers $\mathbb{Z}[i]$. Show:
  (a) The only units in $\mathbb{Z}[i]$ are $\pm 1, \pm i$.
  (b) Suppose $\pi$ is a Gaussian prime. Then $\pi$ is either:
    (i) a positive rational prime $p \equiv 3 \bmod 4$ or an associate of such a rational prime.
    (ii) $1 + i$ or an associate of $1 + i$.
    (iii) $a + bi$ or $a - bi$ where $a > 0$, $b > 0$, $a$ is even and $N(\pi) = a^2 + b^2 = p$ with $p$ a rational prime congruent to 1 mod 4, or an associate of $a + bi$ or $a - bi$.
6. Use the multiplicativity of the norm to show that in $\mathbb{Z}[\sqrt{-5}]$ the numbers
$$3, \quad 7, \quad 1 + 2i\sqrt{5}, \quad 1 - 2i\sqrt{5}$$
are all primes and not associates of each other. Recall that $N(a + bi\sqrt{5}) = a^2 + 5b^2$. Since $21 = 3 \cdot 7 = (1 + 2i\sqrt{5})(1 - 2i\sqrt{5})$ this shows that prime factorization is not unique in $\mathbb{Z}[\sqrt{-5}]$.
7. Classify the quadratic number fields $K$ with discriminant $-100 \leq d_K \leq 100$.
8. Prove Theorem 11.47, that is, suppose $K$ is a quadratic field and suppose $\mathcal{O}_K$ is a unique factorization domain. Then
  (1) To each prime number $\pi \in \mathcal{O}_K$ there corresponds one and only one rational prime number $p$ such that $\pi \mid p$.
  (2) Any rational prime $p$ is either a prime in $\mathcal{O}_K$ or a product $\pi_1\pi_2$ of two primes (not necessarily distinct) from $\mathcal{O}_K$. In this case if $\pi_1 \neq \pi_2$, we say $p$ is decomposed. If $\pi_1 = \pi_2$, so that $p = \pi^2$, we say the rational prime is ramified.
  (3) All primes in $\mathcal{O}_K$ are either rational primes or the two factors of rational primes (and their associates).

12 Transcendental numbers and the numbers e and π

DOI 10.1515/9783110516142-012

12.1 The numbers e and π

The five numbers $0, 1, e, i, \pi$ are the five most important numbers in common mathematics and the basis of most analysis. They are tied together, as we saw in Chapter 10, by Euler’s magic formula
$$e^{i\pi} + 1 = 0.$$
This is a truly amazing result given how disparately these numbers arise: $0$ as the additive identity in $\mathbb{Z}$, $1$ as the unity in $\mathbb{Z}$, $i$ as the imaginary unit, $\pi$ as the ratio of the circumference of a circle to its diameter, and $e$ as the basis for the natural logarithms. There would seem to be no relation between them. In this chapter we examine the two amazing numbers $e$ and $\pi$ and show how they appear, sometimes unexpectedly, throughout mathematics. After this we prove that they are both transcendental numbers. It has been said that the proof by C. L. F. von Lindemann (1852–1939) that $\pi$ is transcendental is one of the crowning achievements of nineteenth century mathematics.

We first look at the definitions, some basic properties and the calculations of $e$ and $\pi$. The constants $e$ and $\pi$ arise in diverse ways. The definition of $\pi$ as the ratio of the circumference of a circle to its diameter was known and used, more or less, in all ancient advanced civilizations. In the twelfth book of Euclid’s Elements we find the following theorem.

Theorem 12.1. Let $C_1$ and $C_2$ be two circles with radii $r_1$ and $r_2$ and areas $a(C_1)$ and $a(C_2)$, respectively. Then
$$\frac{a(C_1)}{a(C_2)} = \frac{r_1^2}{r_2^2}.$$

Proof. We use the principle of exhaustion. Let $C$ be a circle with area $a(C)$, and let $\epsilon \in \mathbb{R}$, $\epsilon > 0$. Then there exists an inner regular polygon $P$ with $a(C) - a(P) < \epsilon$, where $a(P)$ is the area of $P$. For $C_1$ and $C_2$ there are three possibilities:
$$\frac{a(C_1)}{a(C_2)} = \frac{r_1^2}{r_2^2}, \qquad \frac{a(C_1)}{a(C_2)} < \frac{r_1^2}{r_2^2} \qquad \text{or} \qquad \frac{a(C_1)}{a(C_2)} > \frac{r_1^2}{r_2^2}.$$
We assume that
$$\frac{a(C_1)}{a(C_2)} < \frac{r_1^2}{r_2^2}.$$
Then
$$a(C_2) > \frac{a(C_1) \cdot r_2^2}{r_1^2} =: S.$$

Let $\epsilon := a(C_2) - S > 0$. Then there exists an inner regular polygon $P_2$ with $a(C_2) - a(P_2) < \epsilon = a(C_2) - S$, and hence $a(P_2) > S$. We consider the respective polygon $P_1$ in $C_1$. We get a picture like Figure 12.1.

Figure 12.1: Polygon P1 and P2.

By the intercept theorem we obtain
$$\frac{a(P_1)}{a(P_2)} = \frac{r_1^2}{r_2^2},$$
and hence,
$$\frac{a(P_1)}{a(P_2)} = \frac{r_1^2}{r_2^2} = \frac{a(C_1)}{S}.$$
It follows
$$\frac{S}{a(P_2)} = \frac{a(C_1)}{a(P_1)} > 1,$$
that is, $S > a(P_2)$, which gives a contradiction. By symmetry also
$$\frac{a(C_1)}{a(C_2)} > \frac{r_1^2}{r_2^2}$$
is not possible because then
$$\frac{a(C_2)}{a(C_1)} < \frac{r_2^2}{r_1^2}.$$
Hence we must have
$$\frac{a(C_1)}{a(C_2)} = \frac{r_1^2}{r_2^2}.$$

It was also well-known that $\pi$ is the area of a unit circle, and after trigonometric functions were discovered it was known that $\pi$ is the smallest positive real number such that $\cos(\frac{\pi}{2})$ is zero. The exact calculation of $\pi$ was a major preoccupation of Greek, Arab and Indian mathematicians. We will say more of this in the next section.

The number $e$ was discovered by J. Bernoulli (1655–1705) but named $e$ in honor of Euler, who did the first major studies on it. Bernoulli discovered it while determining a formula for compound interest. His definition was then the following limit (see exercise 12.2):
$$e = \lim_{n \to \infty} \left(1 + \frac{1}{n}\right)^n.$$
It became quickly clear that this number was related to logarithmic functions. Napier (1550–1617) introduced basic logarithms defined for each positive base $a > 0$ by $y = \log_a(x) \leftrightarrow a^y = x$. As differential calculus developed after I. Newton (1642–1727) and G. W. Leibniz (1646–1716), it followed from Bernoulli’s definition of $e$ that starting with any positive base $a > 0$ we have that if $y = \log_a(x)$ then $y' = \frac{1}{x}\log_a(e)$. Hence for base $e$ we have $y' = \frac{1}{x}$ and therefore $e$ is the natural base and $y = \log_e(x) = \ln(x)$ is the natural logarithmic function. Further $y = e^x$ is the natural exponential function with $y' = e^x$ also. The differential equation $y' = y$ has $y = ce^x$ as its unique solution. This is the basis for exponential growth: when a substance changes over time at a constant proportional rate, the situation for many populations and substances, the unique solution is $A(t) = A_0 e^{kt}$, where $A(t)$ is the amount at time $t$, $A_0$ is the initial amount and $k$ is the constant proportional rate.

12.1.1 Calculation of e and π

The value of $e$ can be computed from its limit definition
$$e = \lim_{n \to \infty} \left(1 + \frac{1}{n}\right)^n.$$

However, it is easier to compute from a power series representation. The Taylor series for $y = e^x$ is well-known as
$$e^x = 1 + x + \frac{x^2}{2} + \cdots + \frac{x^n}{n!} + \cdots.$$
It follows that
$$e = 1 + 1 + \frac{1}{2} + \frac{1}{6} + \cdots + \frac{1}{n!} + \cdots.$$
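The partial sums of this series can be computed exactly with rational arithmetic. The following is a minimal sketch; the function name `e_partial_sum` is an assumption, and the comparison value is the floating-point constant `math.e`, so the measured error is itself only accurate to double precision.

```python
from fractions import Fraction
from math import e, factorial

def e_partial_sum(n: int) -> Fraction:
    """Exact value of 1 + 1 + 1/2! + ... + 1/n! as a fraction."""
    total = Fraction(0)
    term = Fraction(1)            # 1/0!
    for k in range(n + 1):
        total += term
        term /= (k + 1)           # turns 1/k! into 1/(k+1)!
    return total

approx = e_partial_sum(10)
error = e - float(approx)         # positive, since all omitted terms are positive
bound = 3 / factorial(10)
```

With terms up to $n = 10$ the error stays well below $\frac{3}{10!}$; in fact it is already smaller than $10^{-7}$.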

This is a very rapidly converging series and using $n = 10$ we get a value of $e$ that has an error of less than $\frac{3}{10!} \approx 8.3 \cdot 10^{-7}$.

The calculation of $\pi$ is more difficult. Historically, attempts at achieving values for $\pi$ were made in all ancient cultures. Hippocrates (ca. 430 BC) used the quotient in Theorem 12.1 for his little moon theorem, which was then used to find a value for $\pi$. We give an example, see Figure 12.2.

Figure 12.2: Example for the little moon theorem.

Let $C_1$ be the area of the half circle with diameter $AC$ and $C_2$ be the area of the half circle with diameter $AB$.

Theorem 12.2. The area $L$ of the little moon is equal to the area $T$ of the triangle.

Proof. We have the equations
$$L + S = C_1, \qquad 2S + 2T = C_2.$$
Also
$$(AB)^2 = 2(AC)^2$$
by the Theorem of Pythagoras. By Theorem 12.1 we have
$$\frac{C_1}{C_2} = \frac{\left(\frac{1}{2}(AC)\right)^2}{\left(\frac{1}{2}(AB)\right)^2} = \frac{(AC)^2}{(AB)^2} = \frac{1}{2}.$$
Therefore
$$\frac{C_1}{C_2} = \frac{L + S}{2(S + T)} = \frac{1}{2} \quad \text{and} \quad 2(S + T) = 2(S + L),$$
that is, $T = L$.

that is, T = L. We will give a proof of the general version of the little moon theorem (Theorem 12.2), which uses the Theorem of Thales (see Theorem 14.18), that is, if A, B and C are points on a circle where the line AB is a diameter of the circle, then ACB is a right triangle. Theorem 12.3. The addition of the areas P and Q of the little moons is equal to the area R of the triangle, that is, R = P + Q (see Figure 12.3).

Figure 12.3: General little moon theorem.

Proof. Let $P'$ be the area of the brown half circle with diameter $a$, that is,
$$P' = \frac{1}{2}\left(\frac{a}{2}\right)^2\pi,$$
and $Q'$ be the area of the green half circle with diameter $b$, that is,
$$Q' = \frac{1}{2}\left(\frac{b}{2}\right)^2\pi.$$
In addition, $R'$ denotes the area of the blue half circle with diameter $c$, that is,
$$R' = \frac{1}{2}\left(\frac{c}{2}\right)^2\pi$$
and $R' = R + X + Y$. It follows
$$\begin{aligned}
P + Q &= P' + Q' - (X + Y) \\
&= P' + Q' - (R' - R) \\
&= R + P' + Q' - R' \\
&= R + \frac{1}{2}\left(\frac{a}{2}\right)^2\pi + \frac{1}{2}\left(\frac{b}{2}\right)^2\pi - \frac{1}{2}\left(\frac{c}{2}\right)^2\pi \\
&= R + \frac{a^2}{8}\pi + \frac{b^2}{8}\pi - \frac{c^2}{8}\pi \\
&= R + \frac{1}{8}\pi(a^2 + b^2 - c^2).
\end{aligned}$$
We know from the Theorem of Thales (see Theorem 14.18) that the triangle in Figure 12.3 with the sides $a$, $b$, and $c$ is a right triangle with hypotenuse $c$, and therefore $a^2 + b^2 = c^2$. Thus we get
$$P + Q = R + \frac{1}{8}\pi(a^2 + b^2 - c^2) = R + \frac{1}{8}\pi(c^2 - c^2) = R.$$

The Babylonians worked for practical reasons with $\pi_B = 3$. This uses the following idea, see Figure 12.4.

Figure 12.4: Idea from the Babylonians.

Let $C$ be a circle with radius $r$ and $F = a(C)$ its area. Let $F_0$ and $F_1$ be the areas of the outer and inner square, respectively. We have $F_0 = 2r \cdot 2r = 4r^2$ and $F_1 = x^2$, and then $F_1 = x^2 = 2r^2$ by the Theorem of Pythagoras. Now, $2F \approx F_0 + F_1$, and hence
$$F \approx \frac12(4r^2 + 2r^2) = 3r^2.$$

The value $\pi_B = 3$ matches the value in the Bible (see the first book of Kings, Chapter 7, verse 23, and the second book of Chronicles, Chapter 4, verse 2). The Babylonians used $\pi_B = 3$ for practical reasons, but they knew it more accurately. In 1936 the following value for $\pi$ was found on a Babylonian clay tablet from ca. 2000 BC:
$$\frac{3}{\pi} \approx \frac{57}{60} + \frac{36}{60^2},$$
that is, $\pi \approx 3.125$. The old Egyptians used for land surveying the approximate value
$$\pi \approx 4 \cdot \frac{64}{81} = 3.16049\ldots.$$
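The arithmetic behind these historical values is easy to reproduce with exact fractions. The sketch below is illustrative only; the dictionary keys are our own labels, not historical names.

```python
from fractions import Fraction
from math import pi

approximations = {
    "Babylonian (practical)": Fraction(3),
    # 3/pi ~ 57/60 + 36/60^2, solved for pi
    "Babylonian clay tablet": 3 / (Fraction(57, 60) + Fraction(36, 60**2)),
    "Egyptian": 4 * Fraction(64, 81),
}

for name, value in approximations.items():
    print(f"{name}: {value} = {float(value):.5f}, error {abs(float(value) - pi):.5f}")
```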

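Archimedes, using inscribed and circumscribed polygons, determined that $\frac{223}{71} < \pi < \frac{22}{7}$. […]

12.2 The irrationality of e and π

[…] $e = \frac{a}{b}$ with natural numbers $a$, $b$ and $b > 1$, and we will deduce a contradiction. From the power series evaluation we then have
$$\frac{a}{b} = \sum_{n=0}^{\infty} \frac{1}{n!}.$$
Now let
$$x = b!\left(e - \sum_{n=0}^{b} \frac{1}{n!}\right) = b!\left(\frac{a}{b} - \sum_{n=0}^{b} \frac{1}{n!}\right) = a(b-1)! - \sum_{n=0}^{b} \frac{b!}{n!}.$$
Since in the sum $b \ge n$ for each summand, $\frac{b!}{n!}$ is a natural number, and therefore $x$ is an integer. Now
$$x = b!\left(e - \sum_{n=0}^{b} \frac{1}{n!}\right) = \sum_{n=b+1}^{\infty} \frac{b!}{n!} > 0,$$
so $x$ is a natural number. We have
$$\frac{b!}{n!} = \frac{1}{(b+1)(b+2)\cdots n} \le \frac{1}{(b+1)^{n-b}} \quad \text{for } n > b,$$
with strict inequality for $n > b + 1$. Hence
$$x = \sum_{n=b+1}^{\infty} \frac{b!}{n!} < \sum_{n=b+1}^{\infty} \frac{1}{(b+1)^{n-b}} = \sum_{k=1}^{\infty} \frac{1}{(b+1)^{k}}.$$
However, by the sum of a geometric series
$$\sum_{k=1}^{\infty} \frac{1}{(b+1)^{k}} = \frac{1}{b+1} \cdot \frac{1}{1 - \frac{1}{b+1}} = \frac{1}{b} < 1.$$
This is a contradiction since $x$ is a natural number, and therefore $e$ must be irrational.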

The proof that $\pi$ is irrational is more complicated and we need some preparatory material.

Preparation: Let $n \in \mathbb{N}_0 = \mathbb{N} \cup \{0\}$ and $k \in \{0, 1, \ldots, n\}$. The binomial coefficient $\binom{n}{k}$ is defined by
$$\binom{n}{k} = \frac{n(n-1)\cdots(n-k+1)}{k!} = \frac{n!}{k!(n-k)!},$$
where $m! := \prod_{l=1}^{m} l$ for $m \in \mathbb{N}$ and $0! = 1$. From this definition we see that
$$\binom{n}{0} = \binom{n}{n} = 1 \quad\text{and}\quad \binom{n}{k} = \binom{n}{n-k}.$$
If $k \in \mathbb{Z}$ with $k < 0$ or $k > n$ we define $\binom{n}{k} = 0$ for $n \in \mathbb{N}_0$. We have the following functional equation:
$$\binom{n+1}{k} = \binom{n}{k} + \binom{n}{k-1} \quad\text{for } n \in \mathbb{N}_0.$$

Proof. This is obvious for $k \le 0$ and $k \ge n + 1$. Now let $1 \le k \le n$. Then
$$\begin{aligned}
\binom{n}{k} + \binom{n}{k-1} &= \frac{n(n-1)\cdots(n-k+1)}{k!} + \frac{n(n-1)\cdots(n-k+2)}{(k-1)!} \\
&= \frac{n(n-1)\cdots(n-k+2)}{(k-1)!} \cdot \left(\frac{n-k+1}{k} + 1\right) \\
&= \frac{n(n-1)\cdots(n-k+2) \cdot (n+1)}{k!} \\
&= \frac{(n+1)n(n-1)\cdots(n+1-k+1)}{k!} = \binom{n+1}{k}.
\end{aligned}$$

Lemma 12.5. If $n \in \mathbb{N}$ then $\left(1 + \frac1n\right)^n < 3$.
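Both the functional equation and the bound stated in Lemma 12.5 can be spot-checked with exact integer and rational arithmetic. This is a sanity check for finitely many values, not a proof; the function names are our own.

```python
from fractions import Fraction
from math import factorial


def binom(n: int, k: int) -> int:
    """Binomial coefficient with the convention binom(n, k) = 0 for k < 0 or k > n."""
    if k < 0 or k > n:
        return 0
    return factorial(n) // (factorial(k) * factorial(n - k))


def pascal_holds(n: int) -> bool:
    """Check binom(n+1, k) == binom(n, k) + binom(n, k-1) for k = -1, ..., n+2."""
    return all(
        binom(n + 1, k) == binom(n, k) + binom(n, k - 1) for k in range(-1, n + 3)
    )


def power_below_three(n: int) -> bool:
    """Check (1 + 1/n)^n < 3 exactly, using rational arithmetic."""
    return (1 + Fraction(1, n)) ** n < 3


print(all(pascal_holds(n) for n in range(30)))
print(all(power_below_three(n) for n in range(1, 200)))
```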

Proof. This is certainly correct for $n = 1$. Let $n \ge 2$ and $2 \le k \le n$. We need a preparation:
$$\binom{n}{k} \cdot \frac{1}{n^k} = \frac{n!}{k!(n-k)!} \cdot \frac{1}{n^k} = \frac{n(n-1)\cdots(n-k+1)}{k!} \cdot \frac{1}{n^k} = \frac{1}{k!} \cdot \frac{n}{n} \cdot \frac{n-1}{n} \cdots \frac{n-k+1}{n}.$$
[…]

By Lemma 12.10 we have
$$\int_0^c p_k(x) f(x)\,dx \in \mathbb{Z}.$$
Together with
$$0 < \int_0^c p_k(x) f(x)\,dx$$
we get
$$1 \le \int_0^c p_k(x) f(x)\,dx \le cL\,\frac{M^k}{k!}.$$
But as in the proof of Theorem 12.8 we have $\lim_{k\to\infty} cL\,\frac{M^k}{k!} = 0$, which gives a contradiction. Therefore $c \notin \mathbb{Q}$.

Corollary 12.11. $\pi$ and $e$ are irrational. If $q \in \mathbb{Q}$, $q \ne 0$, then $e^q$ is irrational.

Proof. (a) We apply Theorem 12.9 for $c = \pi$ and $f(x) = \sin(x)$. We have $\sin(x) > 0$ for $0 < x < \pi$ and $\sin(0) = \sin(\pi) = 0$. As indefinite integrals we may take $\pm\sin(x)$ and $\pm\cos(x)$, which satisfy the assumptions because $\cos(0) = 1$, $\cos(\pi) = -1$. Now $\pi \notin \mathbb{Q}$ by Theorem 12.9.

(b) Let $q > 0$ and $e^q = \frac{m}{n}$, $m, n \in \mathbb{N}$, $\gcd(m, n) = 1$. We apply Theorem 12.9 for $c = q$ and $f(x) = n e^x$. We have $f(0) = n$, $f(q) = m \in \mathbb{Z}$. As indefinite integrals we may take $f(x)$ itself. From Theorem 12.9 we get that $q$ is irrational, which means $e^q \in \mathbb{Q} \Rightarrow q \notin \mathbb{Q}$. Therefore $q \in \mathbb{Q} \Rightarrow e^q \notin \mathbb{Q}$. If $q < 0$ then we apply
$$e^{-q} = \frac{1}{e^q}$$
and get the statement.

Recall that $e$ is the Euler number, which is the base of the exponential function $e^z$, $z \in \mathbb{C}$. For a definition of $e$ see the exercises.

12.3 e and π throughout mathematics

Besides being the basic constants involved in exponential growth and logarithmic functions and in trigonometry and the study of circular functions, $e$ and $\pi$ arise, often unexpectedly, throughout mathematics. In this section we discuss several of these occurrences including the normal distribution and Stirling’s approximation. At the end of this chapter we describe another such result surprisingly tying $\pi$ to the set of prime numbers.

12.3.1 The normal distribution

The normal density function with parameters $\mu$ and $\sigma$ is the function defined for all real numbers $x$ by
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac12\left(\frac{x-\mu}{\sigma}\right)^2}.$$
The specialization of this function with $\mu = 0$ and $\sigma = 1$ is called the standard normal density. This has the form
$$f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac12 x^2}.$$

In the context of our physical world this is perhaps the single most important function, since it describes the probability density of a great many naturally occurring continuous measurements. Further, through the central limit theorem many other distributions are approximated by it. The graph of this function is the well-known normal curve or bell curve or Gaussian curve. The curve was originally discovered by de Moivre in 1738, but it was Gauss who intensely studied it and determined the appropriate constants.

Prominently included in this function are our constants $e$ and $\pi$. It was known by observation that the frequency curve for most continuous measurements was bell-shaped and approximately fit the curve $y = e^{-x^2}$. (The factor $\frac12$ is to have the inflection points match what was observed.) But why the presence of $\pi$? From probability theory a density function must have a total area of 1. In the case of the normal curve the factor $\sqrt{\pi}$ comes from the following amazing result due to Laplace (1749–1827), which says that the total area beneath $y = e^{-x^2}$ from $x = -\infty$ to $x = \infty$ is $\sqrt{\pi}$, a result that no one would guess at. Besides this result, Laplace also proved the earliest versions of the central limit theorem.

Theorem 12.12. $\int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}$.

Proof. The proof is a tricky exercise in multiple integration using a change of variables to polar coordinates. Let $A = \int_{-\infty}^{\infty} e^{-x^2}\,dx$. By symmetry we have $A = 2B$ where $B = \int_0^{\infty} e^{-x^2}\,dx$. Then
$$B^2 = \int_0^{\infty} e^{-x^2}\,dx \int_0^{\infty} e^{-y^2}\,dy = \int_0^{\infty}\!\!\int_0^{\infty} e^{-(x^2+y^2)}\,dx\,dy,$$
converting this to a double integral. Now the region $0 < x < \infty$, $0 < y < \infty$ is the whole first quadrant and hence
$$B^2 = \iint_{\text{first quadrant}} e^{-(x^2+y^2)}\,dx\,dy.$$
We now convert to polar coordinates. We have $x^2 + y^2 = r^2$, $dx\,dy = r\,dr\,d\theta$, and the first quadrant bounds in polar coordinates are $0 < r < \infty$, $0 < \theta < \frac{\pi}{2}$. Hence
$$B^2 = \int_0^{\frac{\pi}{2}}\!\!\int_0^{\infty} e^{-r^2}\, r\,dr\,d\theta.$$
Evaluating the inner integral we have
$$\int_0^{\infty} e^{-r^2}\, r\,dr = -\frac12 e^{-r^2}\Big|_0^{\infty} = \frac12$$
and therefore
$$B^2 = \int_0^{\frac{\pi}{2}} \frac12\,d\theta = \frac{\pi}{4}.$$
Hence $B = \frac{\sqrt{\pi}}{2}$ and $A = \sqrt{\pi}$.
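Laplace's value can be confirmed numerically. The sketch below uses a plain trapezoidal rule on the interval $[-10, 10]$; the cutoff and the number of steps are our own choices, justified by the fact that $e^{-x^2}$ is negligible outside this interval.

```python
from math import exp, pi, sqrt


def trapezoid(f, a: float, b: float, steps: int) -> float:
    """Composite trapezoidal rule for f on [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h


area = trapezoid(lambda x: exp(-x * x), -10.0, 10.0, 20_000)
print(area, sqrt(pi))
```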

12.3.2 The Gamma Function and Stirling’s approximation

The second thing we look at is Stirling’s approximation of factorials. This was also originally proposed by de Moivre, but Stirling (1692–1770) found the appropriate constant $\sqrt{2\pi}$. Factorials $n!$ arise in many different areas of mathematics. We saw them in the last section on binomial coefficients, and they are the denominators in Taylor expansions. Factorials grow very large very rapidly. $8!$ is already 40 320, and standard hand calculators can handle a maximum of $69!$, which is about $1.7 \cdot 10^{98}$ ($70!$ already exceeds $10^{100}$). Stirling discovered the following approximation, which is quite exact ratio-wise and extremely useful in computations. Again it surprisingly involves $e$ and $\pi$. We say that two functions $f(x)$ and $g(x)$ are asymptotically equal, which we denote by $f(x) \approxeq g(x)$, if $\lim_{x\to\infty} \frac{f(x)}{g(x)} = 1$.

Theorem 12.13 (Stirling’s approximation).
$$n! \approxeq \sqrt{2\pi n}\left(\frac{n}{e}\right)^n \quad\text{as } n \to \infty.$$
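To see what "exact ratio-wise" means in practice, one can tabulate the quotient of $n!$ and Stirling's expression. The sketch below works with logarithms (via `math.lgamma`, which returns $\ln\Gamma$) so that large $n$ do not overflow; the function name `stirling_ratio` is our own.

```python
from math import exp, lgamma, log, pi


def stirling_ratio(n: int) -> float:
    """Return n! / (sqrt(2*pi*n) * (n/e)^n), computed through logarithms."""
    log_factorial = lgamma(n + 1)                       # ln(n!)
    log_stirling = 0.5 * log(2 * pi * n) + n * (log(n) - 1)
    return exp(log_factorial - log_stirling)


for n in (1, 10, 100, 1000, 10_000):
    print(n, stirling_ratio(n))
```

The printed ratios decrease towards 1, while the absolute difference between $n!$ and the approximation still grows.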

In order to prove this we must introduce the Gamma Function, which also arises in many different areas of mathematics including differential equations and probability theory.

Definition 12.14. For $x > 0$ the Gamma Function is given by
$$\Gamma(x) = \int_0^{\infty} t^{x-1} e^{-t}\,dt.$$

The following lemma gives a recursion relation for $\Gamma(x)$ which shows that the Gamma Function is a generalization of factorials.

Lemma 12.15. For real $x > 0$ we have $\Gamma(x + 1) = x\Gamma(x)$.

This lemma is just an application of integration by parts and we leave it to the exercises. However, by straightforward calculation we find $\Gamma(1) = 1$ and hence $\Gamma(2) = 1$, $\Gamma(3) = 2 = 2!$ and in general $\Gamma(n) = (n-1)!$.

Lemma 12.16. For any natural number $n \ge 1$ we have $\Gamma(n) = (n-1)!$.

Using Theorem 12.12 we can also get the following, which becomes important in some statistical applications (see exercise 7).

Lemma 12.17. $\Gamma\left(\frac12\right) = \sqrt{\pi}$.

We use these facts to prove Stirling’s approximation.

Proof of Stirling’s approximation. We have from above that $n! = \Gamma(n + 1)$. Therefore
$$n! = \Gamma(n+1) = \int_0^{\infty} t^n e^{-t}\,dt = \int_0^{\infty} e^{n\ln t - t}\,dt.$$
We introduce the change of variable $x = \frac{t-n}{\sqrt{n}}$ so that $t = n + \sqrt{n}\,x$ and $dt = \sqrt{n}\,dx$. Notice that $t = 0$ corresponds to $x = -\sqrt{n}$. We then have
$$n! = \int_{-\sqrt{n}}^{\infty} e^{n\ln(n+\sqrt{n}x) - n - \sqrt{n}x}\,\sqrt{n}\,dx.$$
Now
$$\ln(n + \sqrt{n}\,x) = \ln\left(n\left(1 + \frac{x}{\sqrt{n}}\right)\right) = \ln n + \ln\left(1 + \frac{x}{\sqrt{n}}\right).$$
Recall that the power series expansion for $\ln(1 + x)$ for $|x| < 1$ is
$$\ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots.$$
Therefore
$$\ln\left(1 + \frac{x}{\sqrt{n}}\right) = \frac{x}{\sqrt{n}} - \frac{x^2}{2n} + \cdots.$$
For $n \to \infty$ only the first two terms are relevant and
$$\ln\left(1 + \frac{x}{\sqrt{n}}\right) \approxeq \frac{x}{\sqrt{n}} - \frac{x^2}{2n}.$$
Putting this back into the integral, the terms $\sqrt{n}\,x$ cancel in the exponent and we have
$$n! \approxeq \int_{-\sqrt{n}}^{\infty} e^{n\ln n - n - \frac{x^2}{2}}\,\sqrt{n}\,dx = \sqrt{n}\, e^{n\ln n - n} \int_{-\sqrt{n}}^{\infty} e^{-\frac12 x^2}\,dx.$$
From Laplace’s result we have
$$\lim_{n\to\infty} \int_{-\sqrt{n}}^{\infty} e^{-\frac12 x^2}\,dx = \sqrt{2\pi}.$$

Combining this with the above for $n \to \infty$ we get
$$n! \approxeq e^{n\ln n - n}\,\sqrt{n}\,\sqrt{2\pi} = \sqrt{2\pi n}\left(\frac{n}{e}\right)^n,$$
which is Stirling’s approximation.

12.3.3 The Wallis Product Formula

As we have mentioned, the computation of $\pi$ was of tremendous interest to mathematicians during ancient times and during the period of the development of calculus. Several different infinite series and infinite products were used in this computation. In this section we give a remarkable product formula for $\pi$ given by J. Wallis (1616–1703) in 1655. After we prove the Wallis result we will use it to give an alternative proof of Stirling’s approximation.

Theorem 12.18 (The Wallis Product Formula).
$$\frac{\pi}{2} = \lim_{n\to\infty} \prod_{k=1}^{n} \frac{(2k)^2}{(2k-1)(2k+1)}.$$

Proof. For $n \in \mathbb{N} \cup \{0\}$ we consider the integrals
$$c_n = \int_0^{\pi} \sin^n(x)\,dx.$$
By direct computation we have $c_0 = \pi$ and $c_1 = 2$. For $n \ge 2$, using integration by parts, we obtain
$$\begin{aligned}
c_n &= -\cos(x)\sin^{n-1}(x)\Big|_0^{\pi} + (n-1)\int_0^{\pi} \sin^{n-2}(x)\cos^2(x)\,dx \\
&= (n-1)\int_0^{\pi} \sin^{n-2}(x)\,(1 - \sin^2(x))\,dx.
\end{aligned}$$
This gives the recursion relation $c_n = (n-1)c_{n-2} - (n-1)c_n$ or
$$c_n = \frac{n-1}{n}\, c_{n-2} \quad\text{for } n \ge 2.$$
Since $c_0 = \pi$ and $c_1 = 2$ this leads to
$$c_{2n} = \frac{2n-1}{2n} \cdot \frac{2n-3}{2n-2} \cdots \frac34 \cdot \frac12 \cdot \pi$$
and
$$c_{2n+1} = \frac{2n}{2n+1} \cdot \frac{2n-2}{2n-1} \cdots \frac45 \cdot \frac23 \cdot 2.$$
If $0 \le x \le \pi$ then $\sin^{2n}(x) \ge \sin^{2n+1}(x) \ge \sin^{2n+2}(x)$, since for $x \in [0, \pi]$ we have $0 \le \sin(x) \le 1$. Integration then gives
$$c_{2n} \ge c_{2n+1} \ge c_{2n+2} = \frac{2n+1}{2n+2}\, c_{2n}.$$
It follows that
$$1 \ge \frac{c_{2n+1}}{c_{2n}} \ge \frac{2n+1}{2n+2}$$
and therefore
$$\lim_{n\to\infty} \frac{c_{2n+1}}{c_{2n}} = 1.$$
Now
$$\frac{c_{2n+1}}{c_{2n}} = \frac{2n}{2n+1} \cdot \frac{2n}{2n-1} \cdot \frac{2n-2}{2n-1} \cdot \frac{2n-2}{2n-3} \cdots \frac45 \cdot \frac43 \cdot \frac23 \cdot \frac21 \cdot \frac{2}{\pi}.$$
Combining this with the limit we get the Wallis formula
$$\frac{\pi}{2} = \lim_{n\to\infty} \prod_{k=1}^{n} \frac{(2k)^2}{(2k-1)(2k+1)}.$$

Remark 12.19. (1) This proof is based on elementary calculus. There is a simpler proof if one uses some more advanced results. Euler, studying how to express a complex function in terms of its zeros, proved the following product formula involving $\sin(x)$:
$$\frac{\sin(x)}{x} = \prod_{n=1}^{\infty} \left(1 - \frac{x^2}{n^2\pi^2}\right).$$
Letting $x = \frac{\pi}{2}$ we get the Wallis product.

(2) Surprisingly the same formula arises in quantum mechanical calculations of the energy levels of the hydrogen atom.

We now use Wallis’s result to give an alternative proof of Stirling’s approximation. We first need the following preparatory ideas. In Wallis’s formula, if we expand the $n$-th partial product with $\prod_{k=1}^{n} (2k)^2$, we obtain the following corollary.

Corollary 12.20.
$$\frac{\pi}{2} = \lim_{n\to\infty} \frac{2^{4n}\,(n!)^4}{(2n)!\,(2n+1)!}.$$

Now let $e_n = \left(1 + \frac1n\right)^n$ and $e_n^* = \left(1 + \frac1n\right)^{n+1}$. From exercise 2 we have that $e_n \le e \le e_n^*$ for all $n \in \mathbb{N}$. Then we get:

Theorem 12.21. $e\left(\frac{n}{e}\right)^n \le n! \le en\left(\frac{n}{e}\right)^n$ for all $n \in \mathbb{N}$.

Proof. This is clear for $n = 1$. For $n \ge 2$ we multiply the inequalities $e_k \le e \le e_k^*$ for $k = 1, 2, \ldots, n-1$ by each other. We have
$$e_{n-1}\, e_{n-2} \cdots e_2\, e_1 = \left(\frac{n}{n-1}\right)^{n-1} \cdots \left(\frac32\right)^2 \left(\frac21\right)^1 = \frac{n^{n-1}}{(n-1)!}$$
and analogously
$$e_{n-1}^*\, e_{n-2}^* \cdots e_2^*\, e_1^* = \frac{n^n}{(n-1)!}.$$
Using $e_k \le e \le e_k^*$ provides the inequalities
$$\frac{n^{n-1}}{(n-1)!} \le e^{n-1} \le \frac{n^n}{(n-1)!},$$
proving the theorem.

Remark 12.22. Note that the geometric mean of both sides in the theorem is $e\sqrt{n}\left(\frac{n}{e}\right)^n$, which is of interest in itself.

We now give a proof of Stirling’s approximation which we state in a slightly different way and which puts a bound on the Stirling ratio.

Theorem 12.23 (Stirling’s approximation).
$$1 \le \frac{n!}{\sqrt{2\pi n}\left(\frac{n}{e}\right)^n} \le e^{\frac{1}{12n}}.$$
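The Wallis product, the closed form of Corollary 12.20 and the factorial bounds of Theorems 12.21 and 12.23 can all be checked for small $n$. The following is a sketch for illustration only; the function names are ours, and the floating-point comparisons are only meaningful for moderate $n$.

```python
from fractions import Fraction
from math import e, exp, factorial, pi, sqrt


def wallis_partial(n: int) -> Fraction:
    """n-th partial product of the Wallis formula, as an exact fraction."""
    product = Fraction(1)
    for k in range(1, n + 1):
        product *= Fraction((2 * k) ** 2, (2 * k - 1) * (2 * k + 1))
    return product


def corollary_quotient(n: int) -> Fraction:
    """2^(4n) (n!)^4 / ((2n)! (2n+1)!), the same partial product in closed form."""
    return Fraction(
        2 ** (4 * n) * factorial(n) ** 4, factorial(2 * n) * factorial(2 * n + 1)
    )


def factorial_bounds(n: int) -> tuple:
    """Lower and upper bound for n! from Theorem 12.21 (floating point)."""
    base = (n / e) ** n
    return e * base, e * n * base


def stirling_quotient(n: int) -> float:
    """n! / (sqrt(2 pi n) (n/e)^n) for moderate n (floating point)."""
    return factorial(n) / (sqrt(2 * pi * n) * (n / e) ** n)


print(float(wallis_partial(1000)), pi / 2)
for n in (2, 5, 20, 100):
    print(n, factorial_bounds(n), stirling_quotient(n), exp(1 / (12 * n)))
```

The Wallis product converges slowly (the error after $n$ factors is of order $1/n$), whereas the Stirling quotient is already squeezed tightly between 1 and $e^{1/(12n)}$ for small $n$.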

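Proof. We consider the sequences
$$a_n = \frac{n!}{n^{n+\frac12}\, e^{-(n-1)}}$$
and
$$b_n = \ln(a_n) = \ln(n!) + (n-1) - \left(n + \frac12\right)\ln(n).$$
Then
$$b_n - b_{n+1} = \ln(n!) - \ln((n+1)!) + n - 1 - n - \left(n + \frac12\right)\ln(n) + \left(n + \frac32\right)\ln(n+1)$$
and hence
$$b_n - b_{n+1} = \left(n + \frac12\right)\ln\left(\frac{n+1}{n}\right) - 1.$$
We first show that this sequence converges.

Let $x = \frac{1}{2\mu+1}$ for $\mu \in \mathbb{N}$. Then $\frac{1+x}{1-x} = \frac{2\mu+2}{2\mu} = \frac{\mu+1}{\mu}$. Recall, as in the other proof of Stirling’s approximation, that for $|x| < 1$ the Taylor series for $\ln\left(\frac{1+x}{1-x}\right)$ is given by
$$\ln\left(\frac{1+x}{1-x}\right) = 2\sum_{k=0}^{\infty} \frac{x^{2k+1}}{2k+1}.$$
If we now let $x = \frac{1}{2\mu+1}$ and multiply the series by $\frac{2\mu+1}{2}$, we get
$$\left(\mu + \frac12\right)\ln\left(\frac{\mu+1}{\mu}\right) = 1 + \sum_{k=1}^{\infty} \frac{1}{(2k+1)(2\mu+1)^{2k}}.$$
Now
$$\sum_{k=1}^{\infty} \frac{1}{(2k+1)(2\mu+1)^{2k}} < \frac13 \sum_{k=1}^{\infty} \frac{1}{(2\mu+1)^{2k}} = \frac{1}{12}\left(\frac{1}{\mu} - \frac{1}{\mu+1}\right).$$
This leads to
$$0 < b_\mu - b_{\mu+1} = \left(\mu + \frac12\right)\ln\left(\frac{\mu+1}{\mu}\right) - 1 < \frac{1}{12}\left(\frac{1}{\mu} - \frac{1}{\mu+1}\right).$$
If we take this for $\mu = n, \ldots, n+k-1$ and add up, then the sum telescopes and we get
$$0 < b_n - b_{n+k} < \frac{1}{12}\left(\frac{1}{n} - \frac{1}{n+k}\right).$$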
10nk! 10(k+1)!

contradicting the equality from the Mean Value Theorem. Therefore c is transcendental.

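Remark 12.25. The number
$$c = \sum_{j=1}^{\infty} \frac{1}{10^{j!}}$$
in Theorem 12.24 is often denoted by $\lambda$ and called the Liouville constant. It is
$$\lambda := \sum_{k=1}^{\infty} \frac{1}{10^{k!}} = 0.11000\,10000\,00000\,00000\,00010\ldots.$$

We now give an extension of Theorem 12.24. Recall that an irrational algebraic number $\alpha$ is said to have degree $n$ if the minimal polynomial of $\alpha$ over $\mathbb{Q}$ has degree $n$.

Theorem 12.26. For each irrational algebraic number $\alpha \in \mathbb{R}$ of degree $n$ there exists a constant $c > 0$ with
$$\left|\alpha - \frac{p}{q}\right| \ge \frac{c}{q^n}$$
for all $p \in \mathbb{Z}$, $q \in \mathbb{N}$.

Proof. Let
$$f(x) = \sum_{i=0}^{n} a_i x^i$$
be the minimal polynomial of $\alpha$, that is, the polynomial with the smallest degree with $\alpha$ as a zero, scaled so that all $a_i$ are integers. It is $f(x) = g(x)(x - \alpha)$ with $g(\alpha) \ne 0$, because $f$ is minimal. Due to the continuity of $g$ there exists a $\rho > 0$ with $g(x) \ne 0$ for all $x \in [\alpha - \rho, \alpha + \rho]$. Let $c = \min\{\rho, M^{-1}\}$ with $M = \max\{|g(x)| \mid x \in [\alpha - \rho, \alpha + \rho]\}$. For $p \in \mathbb{Z}$, $q \in \mathbb{N}$ we get two cases. If $\left|\alpha - \frac{p}{q}\right| \ge \rho$ we get
$$\left|\alpha - \frac{p}{q}\right| \ge \rho \ge c \ge \frac{c}{q^n}.$$
In the other case we get
$$\left|\alpha - \frac{p}{q}\right| < \rho.$$
The latter implies $g\left(\frac{p}{q}\right) \ne 0$. Moreover $f\left(\frac{p}{q}\right) \ne 0$, because $f$ is minimal of degree $n \ge 2$ and hence has no rational zero, so $\sum_{i=0}^{n} a_i p^i q^{n-i}$ is a nonzero integer. This provides
$$\left|\alpha - \frac{p}{q}\right| = \left|\frac{f\left(\frac{p}{q}\right)}{g\left(\frac{p}{q}\right)}\right| = \frac{\left|\sum_{i=0}^{n} a_i p^i q^{n-i}\right|}{\left|q^n g\left(\frac{p}{q}\right)\right|} \ge \frac{1}{q^n M} \ge \frac{c}{q^n}.$$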


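Definition 12.27. A number $\alpha \in \mathbb{R}$ is called a Liouville number if for all $n \in \mathbb{N}$ there exist integers $p, q \in \mathbb{Z}$ with $q > 1$ and
$$0 < \left|\alpha - \frac{p}{q}\right| < \frac{1}{q^n}.$$

Theorem 12.28. Each Liouville number is transcendental.

Proof. We first show that Liouville numbers are irrational. Assume $\alpha = \frac{c}{d}$ is rational with $c \in \mathbb{Z}$, $d \in \mathbb{N}$, and choose $n$ with $2^{n-1} > d$. Thus, for all $p \in \mathbb{Z}$ and $q \in \mathbb{N}$ with $q > 1$ and $\frac{p}{q} \ne \alpha$ we get
$$\left|\alpha - \frac{p}{q}\right| = \left|\frac{c}{d} - \frac{p}{q}\right| \ge \frac{1}{qd} > \frac{1}{2^{n-1} q} \ge \frac{1}{q^n}.$$
Therefore $\alpha$ is not a Liouville number.

Now, suppose $\alpha$ is an irrational algebraic Liouville number of degree $n$. Choose the constant $c > 0$ from the previous Theorem 12.26 and $r$ with $2^r > \frac{1}{c}$. Since $\alpha$ is a Liouville number we get integers $p, q \in \mathbb{Z}$, $q > 1$, with
$$\left|\alpha - \frac{p}{q}\right| < \frac{1}{q^{n+r}} \le \frac{1}{2^r q^n} < \frac{c}{q^n},$$
but this is a contradiction to Theorem 12.26.

Example 12.29. The Liouville constant
$$\lambda := \sum_{k=1}^{\infty} \frac{1}{10^{k!}} = 0.11000\,10000\,00000\,00000\,00010\ldots$$
is a Liouville number and hence transcendental, because
$$0 < \left|\lambda - \sum_{k=1}^{n} \frac{1}{10^{k!}}\right| < \frac{1}{(10^{n!})^n}.$$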
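The estimate in Example 12.29 can be checked with exact rational arithmetic. In the sketch below the tail $\sum_{k>n} 10^{-k!}$ is bounded by $2 \cdot 10^{-(n+1)!}$ (a geometric-series estimate that we supply ourselves), and this bound is compared with $1/q^n$ for $q = 10^{n!}$, the denominator of the $n$-th partial sum. The function names are our own.

```python
from fractions import Fraction
from math import factorial


def liouville_partial(n: int) -> Fraction:
    """Exact partial sum sum_{k=1}^{n} 10^(-k!)."""
    return sum(Fraction(1, 10 ** factorial(k)) for k in range(1, n + 1))


def tail_upper_bound(n: int) -> Fraction:
    """Upper bound 2 * 10^(-(n+1)!) for the tail sum_{k>n} 10^(-k!)."""
    return Fraction(2, 10 ** factorial(n + 1))


for n in range(1, 5):
    q = 10 ** factorial(n)          # denominator of the n-th partial sum
    print(n, tail_upper_bound(n) < Fraction(1, q**n))
```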

We now show in Section 12.5 that both $e$ and $\pi$ are transcendental. The transcendence of $e$ was proved first by Hermite in 1873, while Lindemann proved the transcendence of $\pi$ in 1882.

12.5 The transcendence of e and π

We need some preparations about line segment integrals in $\mathbb{C}$. Let $I = [a, b] \subset \mathbb{R}$, $a \le b$. We consider functions $f : I \to \mathbb{C}$. Such a function can be decomposed as $f(t) = \alpha(t) + i\beta(t)$ with real functions $\alpha : I \to \mathbb{R}$ and $\beta : I \to \mathbb{R}$, that is, $\alpha(t) = \operatorname{Re}(f(t))$ and $\beta(t) = \operatorname{Im}(f(t))$.

Definition 12.30. Let $f(t) = \alpha(t) + i\beta(t)$ as above.
(a) $f$ is called continuous in $I$ if both $\alpha$ and $\beta$ are continuous in $I$.
(b) $f$ is called differentiable in $I$ if both $\alpha$ and $\beta$ are differentiable in $I$.
Notation: $f'(t) = \alpha'(t) + i\beta'(t)$ if $f$ is differentiable in $I$ with derivative $f'$.

Remark 12.31. Let $f, g : I \to \mathbb{C}$.
(a) If $f$ and $g$ are continuous in $I$, then so are $c_1 f + c_2 g$ and $c f \cdot g$ for $c, c_1, c_2 \in \mathbb{C}$.
(b) If $f$ and $g$ are differentiable in $I$, then so is $c_1 f + c_2 g$ for $c_1, c_2 \in \mathbb{C}$, and $(c_1 f + c_2 g)' = c_1 f' + c_2 g'$.
This follows easily from the definition, especially if one first considers $cf$, $c \in \mathbb{C}$, and then $f + g$.

Lemma 12.32 (Rule for products). Let $f, g : I \to \mathbb{C}$ be differentiable in $I$. Then $f \cdot g : I \to \mathbb{C}$ is also differentiable with
$$(f \cdot g)'(t) = f'(t) \cdot g(t) + f(t) \cdot g'(t).$$

Proof. Let $f(t) = \alpha(t) + i\beta(t)$ and $g(t) = \gamma(t) + i\delta(t)$ as above. Then
$$\begin{aligned}
((\alpha + i\beta)(\gamma + i\delta))'(t) &= ((\alpha\gamma - \beta\delta) + i(\alpha\delta + \beta\gamma))'(t) \\
&= (\alpha'(t)\gamma(t) + \alpha(t)\gamma'(t) - \beta'(t)\delta(t) - \beta(t)\delta'(t)) \\
&\quad + i(\alpha'(t)\delta(t) + \alpha(t)\delta'(t) + \beta'(t)\gamma(t) + \beta(t)\gamma'(t)) \\
&= \alpha'(t)(\gamma(t) + i\delta(t)) + i\beta'(t)(\gamma(t) + i\delta(t)) \\
&\quad + \alpha(t)(\gamma'(t) + i\delta'(t)) + i\beta(t)(\gamma'(t) + i\delta'(t)) \\
&= f'(t)g(t) + f(t)g'(t).
\end{aligned}$$

Examples 12.33. (1) Let $f : I \to \mathbb{C}$, $f(t) = (zt)^n$ for $z \in \mathbb{C}$ and $n \in \mathbb{N}$. Then $f'(t) = nz(zt)^{n-1}$. This is clear for $n = 1$. Let $n \ge 2$, and define $g(t) = (zt)^{n-1}$ and $h(t) = zt$. Then the statement follows from the rule for products by induction.

(2) Let $f : I \to \mathbb{C}$, $f(t) = e^{zt}$, $z \in \mathbb{C}$. Then $f'(t) = ze^{zt}$. This we see as follows. Let $z = x + iy$, $x, y \in \mathbb{R}$. Then
$$e^{zt} = e^{xt + iyt} = e^{xt}e^{iyt} = e^{xt}(\cos(yt) + i\sin(yt)).$$
Then
$$\begin{aligned}
(e^{zt})' &= xe^{xt}\cos(yt) - ye^{xt}\sin(yt) + i(xe^{xt}\sin(yt) + ye^{xt}\cos(yt)) \\
&= (x + iy)e^{xt}(\cos(yt) + i\sin(yt)) \\
&= ze^{xt}(\cos(yt) + i\sin(yt)) = ze^{xt}e^{iyt} = ze^{zt}.
\end{aligned}$$

Definition 12.34. Let $f : I \to \mathbb{C}$ be continuous. Then we define the integral of $f$ on $I$ by
$$\int_a^b f(t)\,dt = \int_a^b \alpha(t)\,dt + i\int_a^b \beta(t)\,dt,$$
where $\alpha(t) = \operatorname{Re}(f(t))$ and $\beta(t) = \operatorname{Im}(f(t))$ as above.

Remark 12.35. Let $f, f_1, f_2 : I \to \mathbb{C}$ be continuous and $c_1, c_2 \in \mathbb{C}$. Then
$$\int_a^b (c_1 f_1 + c_2 f_2)(t)\,dt = c_1\int_a^b f_1(t)\,dt + c_2\int_a^b f_2(t)\,dt$$
and
$$\overline{\int_a^b f(t)\,dt} = \int_a^b \overline{f(t)}\,dt.$$

Lemma 12.36. Let $f : I \to \mathbb{C}$ be continuous and $f(t) = \alpha(t) + i\beta(t)$ as above. Let
$$F(s) = \int_a^s f(t)\,dt.$$
Then $F$ is differentiable with $F' = f$.

Proof. We have $F(s) = \gamma(s) + i\delta(s)$ with
$$\gamma(s) = \int_a^s \alpha(t)\,dt \quad\text{and}\quad \delta(s) = \int_a^s \beta(t)\,dt.$$
By the fundamental theorem of calculus for real functions, $\gamma$ and $\delta$ are differentiable as real functions with $\gamma'(s) = \alpha(s)$ and $\delta'(s) = \beta(s)$. It follows that $F(s)$ is differentiable with $F'(s) = f(s)$.

Definition 12.37. Let $f : I \to \mathbb{C}$. A differentiable function $F : I \to \mathbb{C}$ is called an indefinite integral of $f$ if $F' = f$.

Lemma 12.38. Let $f : I \to \mathbb{C}$ and $F : I \to \mathbb{C}$ an indefinite integral of $f$. A function $G : I \to \mathbb{C}$ is an indefinite integral of $f$ if and only if there exists a constant $c \in \mathbb{C}$ with $F(t) - G(t) = c$ for all $t \in I$.

Proof. (1) If $F(t) - G(t) = c$ then $G$ is differentiable because $F$ and the constant function $\tilde{C} : I \to \mathbb{C}$, $\tilde{C}(t) = c$ for all $t \in I$, are differentiable. It follows $G' = F'$ since $\tilde{C}'(t) = 0$ for all $t \in I$.

(2) Let $G$ be an indefinite integral of $f$. Then $G' = f = F'$ and hence $h' = 0$ for $h = F - G$. Let $h(t) = \alpha(t) + i\beta(t)$ with $\alpha(t) = \operatorname{Re}(h(t))$, $\beta(t) = \operatorname{Im}(h(t))$. Since $h'(t) = 0$ for all $t \in I$ we get $\alpha'(t) = \beta'(t) = 0$ for all $t \in I$. Hence, $\alpha(t) = c_1 \in \mathbb{R}$ and $\beta(t) = c_2 \in \mathbb{R}$ for all $t \in I$, and therefore $h(t) = c_1 + ic_2 = c$ for all $t \in I$.

Lemma 12.39. Let $f : I \to \mathbb{C}$ be continuous and $F : I \to \mathbb{C}$ differentiable with $F' = f$ on $I$. Then
$$\int_a^b f(t)\,dt = F(b) - F(a).$$

Proof. For $s \in I$ we define
$$F_0(s) := \int_a^s f(t)\,dt.$$
Since $f$ is continuous, $F_0 : I \to \mathbb{C}$ is an indefinite integral of $f$ by Lemma 12.36. Let $F : I \to \mathbb{C}$ be an arbitrary indefinite integral of $f$. By Lemma 12.38 there exists a constant $c \in \mathbb{C}$ with $F(s) = F_0(s) + c$ for all $s \in I$. It follows
$$F(b) - F(a) = F_0(b) - F_0(a) = \int_a^b f(t)\,dt - \int_a^a f(t)\,dt = \int_a^b f(t)\,dt$$
because $\int_a^a f(t)\,dt = 0$.

Definition 12.40. Let $f : I \to \mathbb{C}$, $f(t) = \alpha(t) + i\beta(t)$ with $\alpha(t) = \operatorname{Re}(f(t))$ and $\beta(t) = \operatorname{Im}(f(t))$. We call $f$ continuously differentiable on $I$ if (i) $f$ is differentiable on $I$ and (ii) $f' : I \to \mathbb{C}$ is continuous.
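Definition 12.34 and Lemma 12.39 translate directly into a numerical recipe: integrate the real and imaginary parts separately and compare with $F(b) - F(a)$. The sketch below does this for $f(t) = e^{zt}$ with $F(t) = e^{zt}/z$, which is an indefinite integral by Example 12.33 (2). The sample value of $z$, the Simpson rule and the step count are our own choices.

```python
import cmath


def simpson(g, a: float, b: float, steps: int = 2000) -> float:
    """Composite Simpson rule for a real-valued g on [a, b] (steps must be even)."""
    h = (b - a) / steps
    total = g(a) + g(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def complex_integral(f, a: float, b: float) -> complex:
    """Integrate f: [a, b] -> C via its real and imaginary parts (Definition 12.34)."""
    real_part = simpson(lambda t: f(t).real, a, b)
    imag_part = simpson(lambda t: f(t).imag, a, b)
    return complex(real_part, imag_part)


z = 1.5 - 2.0j                                      # arbitrary sample value
a, b = 0.0, 1.0
numeric = complex_integral(lambda t: cmath.exp(z * t), a, b)
exact = (cmath.exp(z * b) - cmath.exp(z * a)) / z   # F(b) - F(a) with F(t) = e^{zt}/z
print(numeric, exact)
```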

Lemma 12.41 (Partial Integration). Let $f, g : I \to \mathbb{C}$ be continuously differentiable. Then
$$\int_a^b f(t)g'(t)\,dt = f(t)g(t)\Big|_a^b - \int_a^b f'(t)g(t)\,dt,$$
where $f(t)g(t)\big|_a^b = f(b)g(b) - f(a)g(a)$.

Proof. Let $F := f \cdot g : I \to \mathbb{C}$. By Lemma 12.32, the rule for products, we get
$$F'(t) = f'(t)g(t) + f(t)g'(t).$$
$F'(t)$ is continuous by assumption. By Lemma 12.39 we get
$$\int_a^b f'(t)g(t)\,dt + \int_a^b f(t)g'(t)\,dt = \int_a^b F'(t)\,dt = F(t)\Big|_a^b = f(t)g(t)\Big|_a^b = f(b)g(b) - f(a)g(a).$$

Example 12.42. Let $f : I \to \mathbb{C}$, $f(t) = (z_1 t)^n$, $n \in \mathbb{N}$, for some fixed $z_1 \in \mathbb{C}$ and $g : I \to \mathbb{C}$, $g(t) = e^{z_2 t}$ for some fixed $z_2 \in \mathbb{C}$. Then
$$z_2\int_a^b (z_1 t)^n e^{z_2 t}\,dt = (z_1 t)^n e^{z_2 t}\Big|_a^b - nz_1\int_a^b (z_1 t)^{n-1} e^{z_2 t}\,dt.$$

Lemma 12.43. Let $f : I \to \mathbb{C}$ be continuous and $g : I \to \mathbb{R}$ be continuous with $|f(t)| \le g(t)$ for all $t \in I$. Then
$$\left|\int_a^b f(t)\,dt\right| \le \int_a^b |f(t)|\,dt \le \int_a^b g(t)\,dt.$$

Proof. If $s \in \mathbb{R}$ then
$$e^{is}\int_a^b f(t)\,dt = \int_a^b e^{is}f(t)\,dt$$
and hence
$$\operatorname{Re}\left(e^{is}\int_a^b f(t)\,dt\right) = \int_a^b \operatorname{Re}\left(e^{is}f(t)\right)dt \le \int_a^b |e^{is}f(t)|\,dt = \int_a^b |f(t)|\,dt$$
from the corresponding result for real integrals.

Now, if $\int_a^b f(t)\,dt = 0$, there is nothing to prove. Let $\int_a^b f(t)\,dt \ne 0$. We write $\int_a^b f(t)\,dt = \left|\int_a^b f(t)\,dt\right| e^{i\varphi}$ with $\varphi = \arg\int_a^b f(t)\,dt$. Now, let $s = -\varphi$. Then $e^{is}\int_a^b f(t)\,dt$ is real and positive, hence
$$e^{is}\int_a^b f(t)\,dt = \left|\int_a^b f(t)\,dt\right| \le \int_a^b |f(t)|\,dt$$
by the remark above. Now
$$\int_a^b |f(t)|\,dt \le \int_a^b g(t)\,dt$$
by the monotonicity of real integrals.

Remark 12.44. Let $f(x) \in \mathbb{C}[x]$ be a polynomial in $x$ of degree $m \ge 1$. Let $z_1 \in \mathbb{C}$, $z_1 \ne 0$, be a fixed complex number. We want to consider the function $f(z)e^{z_1 - z}$ on the line segment in $\mathbb{C}$ from $0$ to $z_1$. This line segment is parametrized by the function
$$\varphi : [0, 1] \to \mathbb{C}, \quad t \mapsto tz_1.$$
The line segment integral along the line segment from $0$ to $z_1$ is then defined as
$$I(z_1) = \int_0^{z_1} f(z)e^{z_1 - z}\,dz := \int_0^1 f(\varphi(t))e^{z_1 - \varphi(t)}\varphi'(t)\,dt = z_1\int_0^1 f(tz_1)e^{z_1 - tz_1}\,dt$$
because $\varphi'(t) = z_1$. By partial integration (Lemma 12.41) we get
$$I(z_1) = e^{z_1}\left(-e^{-z_1 t}f(z_1 t)\right)\Big|_0^1 + z_1 e^{z_1}\int_0^1 e^{-z_1 t}f'(z_1 t)\,dt$$
because $(e^{-z_1 t})' = -z_1 e^{-z_1 t}$. Hence,
$$I(z_1) = e^{z_1}\left(f(0) - e^{-z_1}f(z_1)\right) + z_1\int_0^1 f'(z_1 t)e^{z_1 - tz_1}\,dt.$$
If we continue that way we get
$$I(z_1) = e^{z_1}\sum_{j=0}^{m} f^{(j)}(0) - \sum_{j=0}^{m} f^{(j)}(z_1).$$

In what follows we also need an upper estimate for $I(z_1)$. First
$$|e^{z_1 - z}| \le e^{|z_1 - z|} \le e^{|z_1|}$$
for $z = tz_1$, $0 \le t \le 1$. For $f(x) = a_0 + a_1 x + \cdots + a_m x^m$ define
$$|f|(x) := |a_0| + |a_1|x + \cdots + |a_m|x^m \in \mathbb{R}[x].$$
Then, by the triangle inequality, $|f(z)| \le |f|(|z|) \le |f|(|z_1|)$ because $|z| \le |z_1|$. Hence, by Lemma 12.43,
$$|I(z_1)| \le \int_0^1 |z_1|\,|f(tz_1)|\,|e^{z_1 - tz_1}|\,dt \le \int_0^1 |z_1|\,|f|(|z_1|)\,e^{|z_1|}\,dt = |z_1|\,|f|(|z_1|)\,e^{|z_1|}.$$
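The closed form for $I(z_1)$ and the upper estimate can be tested numerically for a concrete polynomial. In the sketch below the polynomial, the point $z_1$ and the Simpson rule with 2000 steps are arbitrary choices of ours; polynomials are stored as coefficient lists $[a_0, a_1, \ldots, a_m]$.

```python
import cmath
import math


def poly_eval(coeffs, z):
    """Evaluate a_0 + a_1 z + ... + a_m z^m by Horner's rule."""
    result = 0
    for a in reversed(coeffs):
        result = result * z + a
    return result


def poly_derivative(coeffs):
    """Coefficient list of the derivative polynomial."""
    return [k * a for k, a in enumerate(coeffs)][1:]


def derivative_sum(coeffs, z):
    """Sum of f^(j)(z) over j = 0, ..., m."""
    total = 0
    while coeffs:
        total += poly_eval(coeffs, z)
        coeffs = poly_derivative(coeffs)
    return total


def segment_integral(coeffs, z1, steps: int = 2000):
    """I(z1) = z1 * int_0^1 f(t z1) e^{z1 - t z1} dt via the composite Simpson rule."""

    def g(t):
        return poly_eval(coeffs, t * z1) * cmath.exp(z1 - t * z1)

    h = 1.0 / steps
    total = g(0.0) + g(1.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(i * h)
    return z1 * total * h / 3


coeffs = [2, -1, 0, 3]          # f(x) = 2 - x + 3x^3, an arbitrary sample polynomial
z1 = 0.8 + 0.6j                 # arbitrary sample point with |z1| = 1
lhs = segment_integral(coeffs, z1)
rhs = cmath.exp(z1) * derivative_sum(coeffs, 0) - derivative_sum(coeffs, z1)
bound = abs(z1) * poly_eval([abs(a) for a in coeffs], abs(z1)) * math.exp(abs(z1))
print(lhs, rhs, abs(lhs), bound)
```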

Now we are in the position to prove the transcendence of $e$ and $\pi$.

Theorem 12.45. $e$ is a transcendental number, that is, transcendental over $\mathbb{Q}$.

Proof. We use our analytical preparations and repeat the relevant statements. Let $f(x) \in \mathbb{R}[x]$ with the degree of $f(x)$ equal to $m \ge 1$. Let $z_1 \in \mathbb{C}$, $z_1 \ne 0$, and $\gamma : [0, 1] \to \mathbb{C}$, $\gamma(t) = tz_1$. Let
$$I(z_1) = \int_0^{z_1} f(z)e^{z_1 - z}\,dz.$$
Recall that
$$\int_0^{z_1} f(z)e^{z_1 - z}\,dz = -f(z_1) + e^{z_1}f(0) + \int_0^{z_1} e^{z_1 - z}f'(z)\,dz.$$
It follows then by repeated partial integration that
$$I(z_1) = e^{z_1}\sum_{j=0}^{m} f^{(j)}(0) - \sum_{j=0}^{m} f^{(j)}(z_1). \tag{1}$$
Let $|f|(x)$ be the polynomial that we get if we replace the coefficients of $f(x)$ by their absolute values. Since $|e^{z_1 - z}| \le e^{|z_1 - z|} \le e^{|z_1|}$, we get
$$|I(z_1)| \le |z_1|\,e^{|z_1|}\,|f|(|z_1|). \tag{2}$$
Now assume that $e$ is an algebraic number, that is,
$$q_0 + q_1 e + \cdots + q_n e^n = 0 \tag{3}$$
for some $n \ge 1$ and integers $q_0 \ne 0, q_1, \ldots, q_n$, where the greatest common divisor of $q_0, q_1, \ldots, q_n$ is equal to 1. We consider now the polynomial
$$f(x) = x^{p-1}(x-1)^p \cdots (x-n)^p$$
with $p$ a sufficiently large prime number, and we consider $I(z_1)$ with respect to this polynomial. Let
$$J = q_0 I(0) + q_1 I(1) + \cdots + q_n I(n).$$
From (1) and (3) we get that
$$J = -\sum_{j=0}^{m}\sum_{k=0}^{n} q_k f^{(j)}(k),$$
where $m = (n+1)p - 1$, since $(q_0 + q_1 e + \cdots + q_n e^n)\left(\sum_{j=0}^{m} f^{(j)}(0)\right) = 0$.

Now, $f^{(j)}(k) = 0$ if $j < p$, $k > 0$, and if $j < p - 1$, $k = 0$. Hence for all $j$, $k$ other than $j = p - 1$, $k = 0$, the number $f^{(j)}(k)$ is an integer divisible by $p!$. Further, $f^{(p-1)}(0) = (p-1)!\,(-1)^{np}(n!)^p$, and hence, if $p > n$, then $f^{(p-1)}(0)$ is an integer divisible by $(p-1)!$ but not by $p!$. It follows that $J$ is a nonzero integer that is divisible by $(p-1)!$ if $p > |q_0|$ and $p > n$. So let $p > n$, $p > |q_0|$, so that $|J| \ge (p-1)!$.

Now, $|f|(k) \le (2n)^m$. Together with (2) we then get that
$$|J| \le |q_1|\,e\,|f|(1) + \cdots + |q_n|\,n e^n\,|f|(n) \le c^p$$
for a number $c$ independent of $p$. It follows that
$$(p-1)! \le |J| \le c^p,$$
that is,
$$1 \le \frac{|J|}{(p-1)!} \le c\,\frac{c^{p-1}}{(p-1)!}.$$
This gives a contradiction, since $\frac{c^{p-1}}{(p-1)!} \to 0$ as $p \to \infty$. Therefore, $e$ is transcendental.
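The divisibility properties of the values $f^{(j)}(k)$ that drive this proof can be verified for small parameters with integer polynomial arithmetic. The sketch below uses $n = 2$ and $p = 5$ (sample values of ours with $p$ prime and $p > n$); polynomials are coefficient lists with the lowest degree first, and the helper names are our own.

```python
from math import factorial


def poly_mul(a, b):
    """Multiply two integer polynomials given as coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def poly_derivative(a):
    """Coefficient list of the derivative polynomial."""
    return [k * c for k, c in enumerate(a)][1:]


def poly_eval(a, x):
    """Evaluate the polynomial at x by Horner's rule."""
    result = 0
    for c in reversed(a):
        result = result * x + c
    return result


def hermite_polynomial(n: int, p: int):
    """Coefficients of f(x) = x^(p-1) (x-1)^p ... (x-n)^p."""
    f = [0] * (p - 1) + [1]
    for k in range(1, n + 1):
        for _ in range(p):
            f = poly_mul(f, [-k, 1])
    return f


n, p = 2, 5                      # small sample values with p prime and p > n
f = hermite_polynomial(n, p)
m = (n + 1) * p - 1              # degree of f
values = {}                      # values[(j, k)] = f^(j)(k)
g = f
for j in range(m + 1):
    for k in range(n + 1):
        values[(j, k)] = poly_eval(g, k)
    g = poly_derivative(g)

exceptional = values[(p - 1, 0)]
print(exceptional, exceptional % factorial(p))
```

Only the single value $f^{(p-1)}(0)$ fails to be divisible by $p!$, which is what makes $J$ a nonzero multiple of $(p-1)!$.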

We now move on to the transcendence of $\pi$. As for showing irrationality, this is more complicated than for $e$. Recall first from the proof of Theorem 11.23 that if $\alpha \in \mathbb{C}$ is an algebraic number and
$$f(x) = a_n x^n + \cdots + a_0, \quad n \ge 1, \quad a_n \ne 0,$$
with all $a_i \in \mathbb{Z}$ and $f(\alpha) = 0$, then $a_n\alpha$ is an algebraic integer.

Theorem 12.46. $\pi$ is a transcendental number, that is, transcendental over $\mathbb{Q}$.

Proof. Assume that $\pi$ is an algebraic number. Then $\theta = i\pi$ is also algebraic. Let $\theta_1 = \theta, \theta_2, \ldots, \theta_d$ be the conjugates of $\theta$. Suppose
$$p(x) = q_0 + q_1 x + \cdots + q_d x^d \in \mathbb{Z}[x], \quad q_d > 0, \quad\text{and}\quad \gcd(q_0, \ldots, q_d) = 1$$
is the minimal polynomial of $\theta$ over $\mathbb{Q}$ with integer coefficients. Then $\theta_1 = \theta, \theta_2, \ldots, \theta_d$ are the zeros of this polynomial. Let $t = q_d$. Then from the discussion above $t\theta_i$ is an algebraic integer for all $i$. From $e^{i\pi} + 1 = 0$ and from $\theta_1 = i\pi$ we get that
$$(1 + e^{\theta_1})(1 + e^{\theta_2}) \cdots (1 + e^{\theta_d}) = 0.$$
The product on the left side can be written as a sum of $2^d$ terms $e^{\phi}$, where $\phi = \epsilon_1\theta_1 + \cdots + \epsilon_d\theta_d$, $\epsilon_j = 0$ or $1$. Let $n$ be the number of terms $\epsilon_1\theta_1 + \cdots + \epsilon_d\theta_d$ that are nonzero. Call these $\alpha_1, \ldots, \alpha_n$. We then have an equation
$$q + e^{\alpha_1} + \cdots + e^{\alpha_n} = 0 \tag{12.2}$$
with $q = 2^d - n > 0$. Recall that all $t\alpha_i$ are algebraic integers. We consider the polynomial
$$f(x) = t^{np}x^{p-1}(x - \alpha_1)^p \cdots (x - \alpha_n)^p$$
with $p$ a sufficiently large prime number. We have $f(x) \in \mathbb{R}[x]$, since the $\alpha_i$ are algebraic numbers and the elementary symmetric polynomials in $\alpha_1, \ldots, \alpha_n$ are rational numbers. Let $I(z_1)$ be defined as in the proof of Theorem 12.45, and now let
$$J = I(\alpha_1) + \cdots + I(\alpha_n).$$
From (1) in the proof of Theorem 12.45 and (12.2) we get
$$J = -q\sum_{j=0}^{m} f^{(j)}(0) - \sum_{j=0}^{m}\sum_{k=1}^{n} f^{(j)}(\alpha_k),$$
with $m = (n+1)p - 1$.

Now, $\sum_{k=1}^{n} f^{(j)}(\alpha_k)$ is a symmetric polynomial in $t\alpha_1, \ldots, t\alpha_n$ with integer coefficients, since the $t\alpha_i$ are algebraic integers. It follows from the main theorem on symmetric polynomials that $\sum_{j=0}^{m}\sum_{k=1}^{n} f^{(j)}(\alpha_k)$ is an integer. Further, $f^{(j)}(\alpha_k) = 0$ for $j < p$. Hence $\sum_{j=0}^{m}\sum_{k=1}^{n} f^{(j)}(\alpha_k)$ is an integer divisible by $p!$.

Now, $f^{(j)}(0)$ is an integer divisible by $p!$ if $j \ne p - 1$, and
$$f^{(p-1)}(0) = (p-1)!\,(-t)^{np}(\alpha_1 \cdots \alpha_n)^p$$
is an integer divisible by $(p-1)!$ but not divisible by $p!$ if $p$ is sufficiently large. In particular, this is true if $p > |t^n(\alpha_1 \cdots \alpha_n)|$ and also $p > q$.

From (2) in the proof of Theorem 12.45 we get that
$$|J| \le |\alpha_1|\,e^{|\alpha_1|}\,|f|(|\alpha_1|) + \cdots + |\alpha_n|\,e^{|\alpha_n|}\,|f|(|\alpha_n|) \le c^p$$
for some number $c$ independent of $p$. As in the proof of Theorem 12.45, this gives us
$$(p-1)! \le |J| \le c^p,$$
that is,
$$1 \le \frac{|J|}{(p-1)!} \le c\,\frac{c^{p-1}}{(p-1)!}.$$
This as before gives a contradiction, since $\frac{c^{p-1}}{(p-1)!} \to 0$ as $p \to \infty$. Therefore, $\pi$ is transcendental.

12.6 An amazing property of π and a connection to prime numbers

Definition 12.47. For a real variable $s > 1$ we define the Riemann zeta function by
$$\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}.$$

From the classical $p$-series test this will converge if $s > 1$ and hence will define a function. From the fundamental theorem of arithmetic each $n$ can be expressed as a product of primes, and hence the zeta function can be written as the following product
$$\zeta(s) = \prod_{p\ \text{prime}} \left(1 + \frac{1}{p^s} + \frac{1}{p^{2s}} + \cdots\right),$$
where the product runs over all prime numbers. However, the geometric series converges, so that
$$1 + \frac{1}{p^s} + \frac{1}{p^{2s}} + \cdots = \frac{1}{1 - p^{-s}}.$$
Therefore
$$\zeta(s) = \prod_{p\ \text{prime}} \left(\frac{1}{1 - p^{-s}}\right).$$
This product is called the Euler product after Euler, who first used it in his investigations. Euler further determined the exact value of $\zeta(2)$ and showed that it was $\frac{\pi^2}{6}$. Originally this was done by a clever use of certain trigonometric identities (see [19]). Subsequently Euler developed a method to determine the values of $\zeta(s)$ at all positive even integers.
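The agreement between the series, the Euler product and the value $\frac{\pi^2}{6}$ can be observed numerically by truncating both. The sketch below is illustrative only; the cutoffs are our own choices, and both truncations converge slowly.

```python
from math import pi


def primes_up_to(limit: int):
    """Sieve of Eratosthenes returning all primes <= limit (limit >= 1)."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(limit**0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = [False] * len(sieve[i * i :: i])
    return [i for i, is_prime in enumerate(sieve) if is_prime]


def zeta_partial_sum(s: float, terms: int) -> float:
    """Sum of 1/n^s for n = 1, ..., terms."""
    return sum(1.0 / n**s for n in range(1, terms + 1))


def euler_partial_product(s: float, limit: int) -> float:
    """Product of 1/(1 - p^(-s)) over all primes p <= limit."""
    product = 1.0
    for p in primes_up_to(limit):
        product *= 1.0 / (1.0 - p ** (-s))
    return product


print(zeta_partial_sum(2, 10_000), euler_partial_product(2, 10_000), pi**2 / 6)
```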

We first give a proof of the basic result that $\zeta(2) = \frac{\pi^2}{6}$ using a different approach. Some basic ideas from the theory of Fourier series are needed. Recall that a real or complex function $f(x)$ is periodic of period $L$ if $f(x + L) = f(x)$ for all $x$. In the early 1800s J. Fourier (1768–1830) attempted to prove that any periodic function can be expressed as a trigonometric series, that is, a sum of sine functions and cosine functions. If $f(x)$ is periodic of period $2L$ then its Fourier series is
$$f = a_0 + \sum_{n=1}^{\infty} \left(a_n\cos\left(\frac{n\pi x}{L}\right) + b_n\sin\left(\frac{n\pi x}{L}\right)\right).$$
Using certain orthogonality relations between sines and cosines, Fourier showed that if $f(x)$ is equal to its Fourier series then the coefficients $a_0$, $a_n$, $b_n$ must be given by
$$\begin{aligned}
a_0 &= \frac{1}{2L}\int_{-L}^{L} f(x)\,dx, \\
a_n &= \frac{1}{L}\int_{-L}^{L} f(x)\cos\left(\frac{n\pi x}{L}\right)dx, \quad n = 1, 2, \ldots, \\
b_n &= \frac{1}{L}\int_{-L}^{L} f(x)\sin\left(\frac{n\pi x}{L}\right)dx, \quad n = 1, 2, \ldots.
\end{aligned}$$
The $a_n$, $b_n$ are called the Fourier coefficients. Fourier assumed that $f(x)$ is equal to its Fourier series, but the situation was not definitively settled until the theory of Lebesgue integration was developed. What was then obtained is called the Fourier Convergence Theorem.

Theorem 12.48 (Fourier Convergence Theorem (see [12])). Let $f(x)$ be periodic of period $2L$. If both $f(x)$ and $f'(x)$ are continuous on $(-L, L)$, that is, if $f$ is $C^1$ periodic, then the Fourier series converges uniformly to $f(x)$.

The proof simply follows the standard arguments for the uniform convergence of sequences of real functions on closed subintervals of $(-L, L)$. Therefore, a $C^1$ periodic function is everywhere represented by its Fourier series, realizing Fourier’s original idea. We now give Euler’s result using Fourier series.

Theorem 12.49. $\zeta(2) = \frac{\pi^2}{6}$.

Proof. Let $f(x) = x^2$, $-\pi < x < \pi$, and then continued periodically with period $2\pi$. This is continuous everywhere and differentiable everywhere except at odd integer multiples of $\pi$. Therefore by the Fourier convergence theorem it is everywhere represented by its Fourier series.

We apply the formulas. First, f(x) is an even function, so there are only cosine terms and hence $b_n = 0$ for all n. Then

$$a_0 = \frac{1}{2\pi}\int_{-\pi}^{\pi} x^2\,dx = \frac{\pi^2}{3}$$

and

$$a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} x^2\cos(nx)\,dx = (-1)^n\,\frac{4}{n^2},$$

using integration by parts and the fact that $\cos(n\pi) = (-1)^n$. Therefore the Fourier series for f(x) is given by

$$x^2 = \frac{\pi^2}{3} + 4\sum_{n=1}^{\infty}\frac{(-1)^n}{n^2}\cos(nx), \quad -\pi < x < \pi.$$

Now let x = π and place this value into the Fourier expansion (the periodic continuation is continuous at π with value $\pi^2$). Then

$$\pi^2 = \frac{\pi^2}{3} + 4\sum_{n=1}^{\infty}\frac{(-1)^n}{n^2}\cos(n\pi).$$

But $\cos(n\pi) = (-1)^n$, so

$$\pi^2 = \frac{\pi^2}{3} + 4\sum_{n=1}^{\infty}\frac{(-1)^n}{n^2}(-1)^n
\;\Rightarrow\; \pi^2 = \frac{\pi^2}{3} + 4\sum_{n=1}^{\infty}\frac{1}{n^2} = \frac{\pi^2}{3} + 4\zeta(2)
\;\Rightarrow\; \zeta(2) = \frac{\pi^2}{6}.$$
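The identity can be checked numerically. The following is only a sketch for illustration; the function names `zeta2_partial` and `fourier_x_squared` are hypothetical helpers, not part of the text, and floating-point partial sums only approximate the limits.

```python
import math

def zeta2_partial(n_terms):
    """Partial sum 1/1^2 + 1/2^2 + ... + 1/n_terms^2 of zeta(2)."""
    return sum(1.0 / (n * n) for n in range(1, n_terms + 1))

def fourier_x_squared(x, n_terms):
    """Partial Fourier sum pi^2/3 + 4 * sum_{n<=n_terms} (-1)^n cos(n x) / n^2."""
    s = sum((-1) ** n * math.cos(n * x) / (n * n) for n in range(1, n_terms + 1))
    return math.pi ** 2 / 3 + 4 * s

if __name__ == "__main__":
    # The partial sums approach pi^2 / 6 slowly (the tail is roughly 1/n_terms).
    print(zeta2_partial(100000), math.pi ** 2 / 6)
    # At x = pi the Fourier partial sum approaches pi^2.
    print(fourier_x_squared(math.pi, 100000), math.pi ** 2)
```

The slow convergence visible here is expected: the terms decay only like $1/n^2$.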

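Corollary 12.50. There are infinitely many prime numbers.

Proof. Assume that there are only finitely many prime numbers $p_1, p_2, \dots, p_n$. Then, by the product formula for ζ(s) with s = 2,

$$\zeta(2) = \prod_{i=1}^{n}\left(\frac{1}{1 - p_i^{-2}}\right) = \frac{\pi^2}{6}.$$

Now $\prod_{i=1}^{n}\left(\frac{1}{1 - p_i^{-2}}\right)$ is a rational number, but $\frac{\pi^2}{6}$ is not, because π is transcendental. This gives a contradiction and hence there are infinitely many primes.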

Exercises

1. (a) Prove the binomial formula
$$(a + b)^n = \sum_{k=0}^{n}\binom{n}{k}a^k b^{n-k}$$
for a, b ∈ ℝ.
(b) Let M be a set with n elements, n ∈ ℕ0. Show that the number of subsets of M with k elements, 0 ≤ k ≤ n, is $\binom{n}{k}$.

2. Consider the sequences
$$e_n = \left(1 + \frac{1}{n}\right)^n \quad\text{and}\quad e_n^* = \left(1 + \frac{1}{n}\right)^{n+1}, \quad n \in \mathbb{N}.$$
Show that:
(a) The sequence $(e_n)_{n\in\mathbb{N}}$ is monotonically increasing.
(b) The sequence $(e_n^*)_{n\in\mathbb{N}}$ is monotonically decreasing.
(c) The sequences $(e_n)_{n\in\mathbb{N}}$ and $(e_n^*)_{n\in\mathbb{N}}$ converge with
$$\lim_{n\to\infty} e_n = \lim_{n\to\infty} e_n^* =: e.$$
The number e is called the Euler number.

3. Consider the sequences $(E_n)_{n\in\mathbb{N}_0}$ and $(E_n^*)_{n\in\mathbb{N}}$ with
$$E_n = \sum_{k=0}^{n}\frac{1}{k!}, \quad n \in \mathbb{N}_0, \qquad E_n^* = E_n + \frac{1}{n\cdot n!}, \quad n \in \mathbb{N}.$$
Show that
$$\lim_{n\to\infty} E_n = \lim_{n\to\infty} E_n^* = e,$$
the Euler number.

4. Show that
$$n! \le \left(\frac{n}{2}\right)^n$$
for n ∈ ℕ, n ≥ 6.

5. Show that
$$\int_0^1 t^{x-1}(1 - t)^{-x}\,dt = \frac{\pi}{\sin(\pi x)}$$
for x ∈ ℝ, 0 < x < 1.

6. The Gamma function is defined by
$$\Gamma(x) = \int_0^{\infty} t^{x-1}e^{-t}\,dt$$
for x ∈ ℝ, x > 0.
(a) Show that Γ(x + 1) = xΓ(x), and especially Γ(1) = 1 and Γ(n + 1) = n! for n ∈ ℕ0.
(b) Show that
$$\zeta(x) = \sum_{k=1}^{\infty}\frac{1}{k^x} = \frac{1}{\Gamma(x)}\int_0^{\infty}\frac{t^{x-1}}{e^t - 1}\,dt$$
for x ∈ ℝ, x > 1. (Hint: Show first
$$\int_0^{\infty} t^{x-1}e^{-kt}\,dt = \frac{\Gamma(x)}{k^x}.)$$

7. Show that
$$\Gamma\left(\frac{1}{2}\right) = \sqrt{\pi}.$$

8. (a) Prove Euler's identity for the cotangent function
$$\pi\cot(\pi x) = \frac{1}{x} + \sum_{n=1}^{\infty}\frac{2x}{x^2 - n^2}, \quad x \in \mathbb{C}\setminus\mathbb{Z}.$$
Recall that, as usual, cot(πx) is defined as $\cot(\pi x) = \frac{\cos(\pi x)}{\sin(\pi x)}$ (see Chapter 10 for the definition of sin(z) and cos(z) for z ∈ ℂ).
(b) Use (a) to show that
$$\pi\cot(\pi x) = \frac{1}{x} - 2\sum_{k=1}^{\infty}\zeta(2k)\,x^{2k-1}, \quad 0 < |x| < 1,$$
by expanding the quotient inside the sum sign in (a) as a geometric series and interchanging the order of summation.
(c) The Bernoulli numbers $B_k$, k ∈ ℕ, are defined by the power series identity
$$\frac{t}{e^t - 1} = 1 + \sum_{k=1}^{\infty} B_k\frac{t^k}{k!}, \quad t \ne 0.$$
The $B_k$ are rational numbers with $B_1 = -\frac{1}{2}$ and $B_k = 0$ for all odd integers k ≥ 3. Use the definition to show that
$$\frac{e^{t/2} + e^{-t/2}}{e^{t/2} - e^{-t/2}} = \frac{2}{t} + 2\sum_{k=1}^{\infty}\frac{B_{2k}}{(2k)!}\,t^{2k-1}, \quad t \ne 0,$$
and
$$\pi\cot(\pi x) = \pi i\,\frac{e^{2\pi i x/2} + e^{-2\pi i x/2}}{e^{2\pi i x/2} - e^{-2\pi i x/2}} = \frac{1}{x} + \sum_{k=1}^{\infty}\frac{(2\pi i)^{2k}B_{2k}}{(2k)!}\,x^{2k-1}, \quad 0 < |x| < 1.$$
(d) Conclude from (a)–(c) Euler's formula (1735) that
$$\zeta(2k) = (-1)^{k-1}\,\frac{(2\pi)^{2k}}{2(2k)!}\,B_{2k}.$$

9. Tell whether the following α ∈ ℂ are algebraic or transcendental over the given field:
(a) α = √π over ℚ(π).
(b) α = π² over ℚ.
(c) α = e² over ℚ(e³).

13 Compass and Straightedge Constructions and the Classical Problems

13.1 Historical remarks

Greek mathematicians in the classical period posed the problem of constructing certain geometric figures in the Euclidean plane using only a straightedge and a compass. These are known as geometric construction problems. Recall from elementary geometry that using a straightedge and compass it is possible to draw a line parallel to a given line segment through a given point, to extend a given line segment, and to erect a perpendicular to a given line at a given point on that line. There were other geometric construction problems for which the Greeks could not find straightedge and compass solutions, but for which they were also never able to prove that such constructions are impossible. In particular there were four famous construction problems that the Greeks could not solve.

The first is the squaring of the circle. This problem is, given a circle, to construct using straightedge and compass a square having area equal to that of the given circle. The second is the doubling of the cube. This problem is, given a cube of given side length, to construct using a straightedge and compass a side of a cube having double the volume of the original cube. The third problem is the trisection of an angle. This problem is to trisect a given angle using only a straightedge and compass. The final problem is the construction of a regular n-gon. This problem asks which regular n-gons can be constructed using only straightedge and compass.

By translating each of these problems into the language of field extensions we can show that each of the first three problems is insolvable in general, and we can give the complete solution to the construction of the regular n-gons.

13.2 Geometric constructions

A geometric construction is a construction in the plane ℝ² for which we may only use a compass and a straightedge, with the following fundamental operations:
(1) We may connect two given or constructed points with the straightedge and may draw the line through these two points.
(2) We may draw with the compass a circle around a center, which is a given or an already constructed point, with a radius r which is the distance between two given or constructed points. We say that r is a given or already constructed length.
(3) New constructed points arise from intersections of such lines and circles.

Remark 13.1. A line or a line segment is given by two points; a circle is given by its center and its radius, that is, by two or three points; a length is given by two points; and an angle is given by three points.

DOI 10.1515/9783110516142-013

Therefore, in each geometric construction problem the initial data are points, and the objects which should be constructed are again points. Before we translate the geometric constructions into the language of field extensions we describe by pictures two fundamental constructions which are always involved. Recall that a line always contains given or already constructed points.
(1) Given a line g and a point P, we may construct the perpendicular line to g through P.
(a) P ∉ g, see Figure 13.1.
(b) P ∈ g, see Figure 13.2.
(2) Given a line g and a point P ∉ g, we may construct the line through P parallel to g, see Figure 13.3.

Figure 13.1: Construction of perpendicular line to g with P ∉ g.

Figure 13.2: Construction of perpendicular line to g with P ∈ g.


Figure 13.3: Construction of the line through P parallel to g with P ∉ g.

Figure 13.4: Perpendicular line to the x-axis through α.

For the remainder of this chapter we adopt the convention that the points (a, b) ∈ ℝ² are considered as complex numbers a + ib and the plane ℝ² as the complex plane (Euclidean plane) ℂ. We also assume that the initial data in a construction problem contain at least two points, and we always assume that 0 and 1 are contained in the initial set.

Remark 13.2. Since we assume that 0 and 1 are in the initial set, we may construct the complex conjugate $\bar\alpha$ if α ∈ ℂ is given, using the above construction of the perpendicular line to the x-axis through α, see Figure 13.4.

We now define constructible numbers.

Definition 13.3. Let M ⊂ ℂ with 0, 1 ∈ M. We say that β is constructable from M if we get β from M in finitely many construction steps as described above.

The following theorem characterizes the constructible numbers as those complex numbers that can be reached from the given data by a finite chain of quadratic field extensions.

Theorem 13.4. Let $0, 1, \alpha_1, \dots, \alpha_m$ be given points, and let β be another point. Then the following are equivalent:
(i) Starting with $0, 1, \alpha_1, \dots, \alpha_m$, the point β is constructable from $\{0, 1, \alpha_1, \dots, \alpha_m\}$.

(ii) There is a chain $K_0 \subset K_1 \subset \dots \subset K_s$ of fields with the following properties:
(a) $K_0 = \mathbb{Q}(\alpha_1, \alpha_2, \dots, \alpha_m, \bar\alpha_1, \bar\alpha_2, \dots, \bar\alpha_m)$,
(b) $|K_i : K_{i-1}| = 2$ for i = 1, 2, …, s,
(c) $\beta \in K_s$.

Proof. Suppose (i). To show (ii) it is enough to prove the following. If we get a new point δ from given (or already constructed) points $\gamma_1, \gamma_2, \dots, \gamma_r$ by
(1) intersection of two lines, or
(2) intersection of a line and a circle, or
(3) intersection of two circles,
then either $\delta \in \mathbb{Q}(\gamma_1, \dots, \gamma_r, \bar\gamma_1, \dots, \bar\gamma_r)$ or δ is in a field extension of degree 2 over $\mathbb{Q}(\gamma_1, \dots, \gamma_r, \bar\gamma_1, \dots, \bar\gamma_r)$.

Let $L := \mathbb{Q}(\gamma_1, \dots, \gamma_r, \bar\gamma_1, \dots, \bar\gamma_r)$. If δ ∈ L then also $\bar\delta \in L$. If |L(δ) : L| = 2 then $m_\delta(x) = x^2 + ax + b \in L[x]$, where $m_\delta(x)$ is the minimal polynomial of δ over L (this is the normed irreducible polynomial over L which has δ as a zero). Now, $\bar\delta$ is a zero of $x^2 + \bar a x + \bar b \in L[x]$, that is, $|L(\delta, \bar\delta) : L(\delta)| \le 2$, and we continue with $L(\delta, \bar\delta)$.

Case 1. Let $\gamma_1, \gamma_2, \gamma_3, \gamma_4$ be given points with $\gamma_1 \ne \gamma_2$, $\gamma_3 \ne \gamma_4$, and let δ be the intersection of the line $\ell_1$ through $\gamma_1, \gamma_2$ and the line $\ell_2$ through $\gamma_3, \gamma_4$. We may describe $\ell_1$ and $\ell_2$ as

$$\ell_1 : z = \gamma_1 + t(\gamma_2 - \gamma_1), \quad t \in \mathbb{R}, \qquad\text{and}\qquad \ell_2 : z = \gamma_3 + u(\gamma_4 - \gamma_3), \quad u \in \mathbb{R}.$$

The transition to the conjugates gives

$$\bar z = \bar\gamma_1 + t(\bar\gamma_2 - \bar\gamma_1) \qquad\text{and}\qquad \bar z = \bar\gamma_3 + u(\bar\gamma_4 - \bar\gamma_3),$$

respectively. The elimination of t and u now gives

$$\frac{z - \gamma_1}{\gamma_2 - \gamma_1} = \frac{\bar z - \bar\gamma_1}{\bar\gamma_2 - \bar\gamma_1} \qquad\text{and}\qquad \frac{z - \gamma_3}{\gamma_4 - \gamma_3} = \frac{\bar z - \bar\gamma_3}{\bar\gamma_4 - \bar\gamma_3},$$

respectively. This is a linear system of equations in z and $\bar z$ with coefficients from the field $K = \mathbb{Q}(\gamma_1, \gamma_2, \gamma_3, \gamma_4, \bar\gamma_1, \bar\gamma_2, \bar\gamma_3, \bar\gamma_4)$. Because $\delta \in \ell_1 \cap \ell_2$, the pair z = δ, $\bar z = \bar\delta$ is a solution of this system, and therefore $\delta, \bar\delta \in K$.

Case 2. Let ℓ be a line through α, γ (α ≠ γ), and let δ be the intersection of ℓ with a circle with center η and radius r. Then we have

$$\delta = \alpha + t(\gamma - \alpha), \quad t \in \mathbb{R}, \qquad\text{and}\qquad |\delta - \eta| = r.$$

Hence

$$(\alpha + t(\gamma - \alpha) - \eta)\cdot(\bar\alpha + t(\bar\gamma - \bar\alpha) - \bar\eta) = r^2,$$

and t is the zero of a quadratic equation. This means that δ is in a field extension of $\mathbb{Q}(\alpha, \gamma, \eta, \bar\alpha, \bar\gamma, \bar\eta)$ of degree less than or equal to 2.

Case 3. δ is the intersection of two circles with centers $\eta_1, \eta_2$ ($\eta_1 \ne \eta_2$) and radii $r_1, r_2$. Then

$$(\delta - \eta_i)(\bar\delta - \bar\eta_i) = r_i^2 \quad\text{for } i = 1, 2,$$

that is,

$$\delta\bar\delta - (\delta\bar\eta_i + \bar\delta\eta_i) + \eta_i\bar\eta_i = r_i^2 \quad\text{for } i = 1, 2. \tag{13.1}$$

If we subtract these two equations from each other we get

$$\delta(\bar\eta_2 - \bar\eta_1) + \bar\delta(\eta_2 - \eta_1) + \eta_1\bar\eta_1 - \eta_2\bar\eta_2 = r_1^2 - r_2^2.$$

Because $\eta_1 \ne \eta_2$ we may eliminate $\bar\delta$. From (13.1) we get a quadratic equation for δ, that is, δ is contained in a field extension of $\mathbb{Q}(\eta_1, \eta_2, \bar\eta_1, \bar\eta_2, r_1, r_2)$ of degree less than or equal to 2.

(ii) ⇒ (i): Let $0, 1, \alpha_1, \dots, \alpha_m$ be given.
(1) All elements from $K_0 = \mathbb{Q}(\alpha_1, \alpha_2, \dots, \alpha_m, \bar\alpha_1, \bar\alpha_2, \dots, \bar\alpha_m)$ are constructable.
(a) If we have $\alpha_i$ then we get $\bar\alpha_i$ as already shown.
(b) If we have α and β then we also have α ± β, see Figure 13.5 (analogously for α − β). Recall that we may construct the line through a point which is parallel to a given line.
(c) If we have line segments of length a and b, a ≠ 0 ≠ b, then we have a line segment of length a ⋅ b, see Figure 13.6. From the intercept theorem we get
$$\frac{1}{a} = \frac{b}{x},$$
that is, x = a ⋅ b.

Figure 13.5: Given α and β then we have α + β.


Figure 13.6: Line segment of length a ⋅ b.

Figure 13.7: Line segment of length $\frac{1}{a}$.

(d) If we have a line segment of length a ≠ 0, then we have a line segment of length $\frac{1}{a}$, see Figure 13.7. From the intercept theorem we get
$$\frac{1}{a} = \frac{x}{1},$$
that is, $x = \frac{1}{a}$.
(e) We may add constructable angles, see Figure 13.8.

Figure 13.8: Add constructable angles.

Analogously we may subtract angles. As a consequence so far we have: Let $\alpha = re^{i\varphi} \ne 0$. Then α is constructable ⇔ r and $e^{i\varphi}$ are constructable.
(f) From (c), (d) and (e) we get: If α and β are constructable then so are αβ and, if α ≠ 0, $\alpha^{-1}$.

Conclusion. All elements of $K_0$ are constructable.

(2) Suppose that each element of $K_{i-1}$ is already constructable, and let $\alpha \in K_i$. If $\alpha \in K_{i-1}$, then there is nothing to prove. Now let $\alpha \notin K_{i-1}$. Since $|K_i : K_{i-1}| = 2$ we get that α is a zero of a polynomial $m_\alpha(x) = x^2 + \epsilon x + \eta$ with $\epsilon, \eta \in K_{i-1}$. Hence,
$$\alpha^2 + \epsilon\alpha + \left(\frac{\epsilon}{2}\right)^2 + \eta - \left(\frac{\epsilon}{2}\right)^2 = 0,$$
that is,
$$\left(\alpha + \frac{\epsilon}{2}\right)^2 = \left(\frac{\epsilon}{2}\right)^2 - \eta.$$
Hence it is enough to show: If $\beta \in K_{i-1}$, then $\sqrt{\beta}$ is constructable.
(a) We may cut an angle in half, see Figure 13.9.

Figure 13.9: Cut an angle in half.

(b) We may extract square roots from real numbers a > 0, see Figure 13.10.

Figure 13.10: Extract square roots from real numbers a > 0.

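By the altitude theorem we get $h^2 = 1\cdot a = a$, hence $h = \sqrt{a}$. Together, (a) and (b) give the square root $\sqrt{\beta} = \sqrt{r}\,e^{i\varphi/2}$ of any constructable $\beta = re^{i\varphi}$. This altogether proves Theorem 13.4.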
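The last step of the proof, completing the square and then taking a square root by halving the angle and extracting a real square root, can be mirrored numerically. This is a minimal sketch under the assumption that floating-point complex numbers stand in for points of the plane; the names `sqrt_by_construction` and `quadratic_zero` are hypothetical.

```python
import cmath
import math

def sqrt_by_construction(beta):
    """Square root of a complex number as in steps (a) and (b):
    halve the angle phi and take the real square root of the modulus r."""
    r, phi = cmath.polar(beta)
    return cmath.rect(math.sqrt(r), phi / 2)

def quadratic_zero(eps, eta):
    """One zero of x^2 + eps*x + eta, using (x + eps/2)^2 = (eps/2)^2 - eta."""
    return -eps / 2 + sqrt_by_construction((eps / 2) ** 2 - eta)

if __name__ == "__main__":
    z = quadratic_zero(1, -4)          # a zero of x^2 + x - 4
    print(z, (math.sqrt(17) - 1) / 2)  # both are about 1.5616
```

The second zero is obtained in the same way with the opposite sign of the square root.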

We leave the proofs of the intercept theorems and the altitude theorem for the exercises (see also Chapter 14). From the proof of the previous theorem and the fact that each field extension in the chain is quadratic it follows that the degree of the minimal polynomial of any constructible number β must be a power of 2.

Corollary 13.5. Let β ∈ ℂ be constructable from ℚ (or equivalently from 0 and 1), and let $m_\beta(x) \in \mathbb{Q}[x]$ be the minimal polynomial of β over ℚ ($m_\beta(x)$ is the normed irreducible polynomial over ℚ with $m_\beta(\beta) = 0$). Then $\deg(m_\beta(x))$ is a power of 2.

Proof. By Theorem 13.4 there is a field K with $|K : \mathbb{Q}| = 2^t$ for some t ∈ ℕ ∪ {0} and β ∈ K. By the degree formula (Theorem 7.7 in Chapter 7) we get that $|\mathbb{Q}(\beta) : \mathbb{Q}|$ is a power of 2. Because $|\mathbb{Q}(\beta) : \mathbb{Q}| = \deg(m_\beta(x))$ (see Theorem 7.18 of Chapter 7), $\deg(m_\beta(x))$ is a power of 2.

13.3 Four classical construction problems

We now consider the four classical construction problems mentioned in Section 13.1: the squaring of the circle, the doubling of the cube, the trisection of an angle and the construction of a regular n-gon. We prove that the first three are impossible and then give the complete solution to the fourth. We always assume that our initial set is {0, 1}.

13.3.1 Squaring the circle (problem of Anaxagoras 500–428 BC)

Theorem 13.6. It is impossible in general to square the circle. This means that it is impossible in general, given a circle, to construct using only straightedge and compass a square having area equal to that of the given circle.

Proof. Suppose the given circle has radius 1. It is then constructable and has an area of π. A corresponding square would have a side of length √π. To be constructable, a number α must have $|\mathbb{Q}(\alpha) : \mathbb{Q}| = 2^m < \infty$ for some m ∈ ℕ ∪ {0}, and hence α must be algebraic. However, π is transcendental (see Chapter 12), so √π is also transcendental and therefore not constructable.

13.3.2 The doubling of the cube or the Delian problem

Theorem 13.7. It is impossible in general to double the cube. This means that it is impossible in general, given a cube of given side length, to construct using only a straightedge and compass a side of a cube having double the volume of the original cube.

Proof. Let the given side length be 1, so that the original volume is also 1. To double this we would have to construct a side of length $\sqrt[3]{2}$. However $|\mathbb{Q}(\sqrt[3]{2}) : \mathbb{Q}| = 3$, since the minimal polynomial of $\sqrt[3]{2}$ over ℚ is $x^3 - 2$. This is not a power of 2, so $\sqrt[3]{2}$ is not constructable.

13.3.3 The trisection of an angle

Theorem 13.8. It is impossible in general to trisect an angle. This means that it is impossible in general to trisect a given angle using only a straightedge and compass.

Proof. An angle α is constructable if and only if $e^{i\alpha}$ is constructable. Since $e^{i\alpha} = \cos(\alpha) + i\sin(\alpha)$ and $\sin^2(\alpha) + \cos^2(\alpha) = 1$, we get that α is constructable if and only if cos(α) is constructable. Since $\cos(\frac{\pi}{3}) = \frac{1}{2}$ we get that $\frac{\pi}{3}$ is constructable. We show that it cannot be trisected by straightedge and compass. The following trigonometric identity holds:

$$\cos(3\alpha) = 4\cos^3(\alpha) - 3\cos(\alpha).$$

Hence we have to solve the equation $4x^3 - 3x - \cos(3\alpha) = 0$. With $\alpha = \frac{\pi}{9}$ we get $3\alpha = \frac{\pi}{3} = 60^\circ$, that is, $\cos(3\alpha) = \frac{1}{2}$. From the above identity we have

$$4\cos^3(\alpha) - 3\cos(\alpha) - \frac{1}{2} = 0 \qquad\text{or}\qquad 8\cos^3(\alpha) - 6\cos(\alpha) - 1 = 0,$$

and we have to solve the equation $8x^3 - 6x - 1 = 0$. From the criterion for polynomials of degree 2 or 3 we get that $f(x) = 8x^3 - 6x - 1$ is irreducible over ℚ (see Chapter 6). It follows that $|\mathbb{Q}(\cos(\frac{\pi}{9})) : \mathbb{Q}| = 3$, and hence $\cos(\frac{\pi}{9})$ is not constructable. Therefore, $\frac{\pi}{3}$ is constructable, but it cannot be trisected.
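The irreducibility claims for $x^3 - 2$ and $8x^3 - 6x - 1$ rest on the fact that a cubic with rational coefficients is irreducible over ℚ as soon as it has no rational zero. The following sketch checks this with the rational root test; the helper names `divisors` and `rational_roots` are hypothetical, and the coefficient lists are assumed to be integers with non-zero leading and constant terms.

```python
from fractions import Fraction

def divisors(n):
    """Positive divisors of a non-zero integer n."""
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]

def rational_roots(coeffs):
    """Rational zeros of an integer polynomial (highest degree first),
    found by testing every candidate p/q with p | constant term and q | leading term."""
    lead, const = coeffs[0], coeffs[-1]
    candidates = {Fraction(sign * p, q)
                  for p in divisors(const)
                  for q in divisors(lead)
                  for sign in (1, -1)}

    def value(x):
        result = Fraction(0)
        for c in coeffs:          # Horner scheme with exact arithmetic
            result = result * x + c
        return result

    return sorted(x for x in candidates if value(x) == 0)

if __name__ == "__main__":
    print(rational_roots([1, 0, 0, -2]))   # x^3 - 2
    print(rational_roots([8, 0, -6, -1]))  # 8x^3 - 6x - 1
```

Both calls return an empty list, so neither cubic has a rational zero.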

13.3.4 Construction of a regular n-gon

The final construction we consider is the construction of regular n-gons. The algebraic study of the constructibility of regular n-gons was initiated by C. F. Gauss (1777–1855) in the early part of the 19th century. Starting with 0 and 1 we want to construct a regular n-gon with straightedge and compass only. Notice first that a regular n-gon is constructable for n ≥ 3 if and only if $e^{2\pi i/n}$ is constructable. We remark that a 2-gon of length 1 certainly is constructable. We first consider the case that n = p is an odd prime number.

Let $\alpha = e^{2\pi i/p}$ and suppose that α is constructable. By Corollary 13.5, $|\mathbb{Q}(\alpha) : \mathbb{Q}|$ is a power of 2. Further, α is a zero of the polynomial

$$f(x) = \frac{x^p - 1}{x - 1} = x^{p-1} + x^{p-2} + \dots + x + 1,$$

which is irreducible over ℚ (see Chapter 6); in particular $f(x) = m_\alpha(x)$ over ℚ. By Corollary 13.5 we must have that deg(f(x)) = p − 1 is a power of 2. Hence $p = 2^n + 1$, and p is a Fermat prime number, that is, $p = 2^{2^s} + 1$ for some s ∈ ℕ ∪ {0} (see Chapter 4). The only known Fermat prime numbers so far are 3, 5, 17, 257 and 65 537 for s = 0, 1, 2, 3 and 4, respectively. The necessary condition, that p is a Fermat prime number, is also sufficient.

Theorem 13.9. Let p be an odd prime number. Then the regular p-gon is, starting with 0 and 1, constructable with straightedge and compass if and only if p is a Fermat prime number.

Proof. We already saw that p must be a Fermat prime number if the regular p-gon is constructable.

Now we assume that $p = 2^{2^s} + 1$ is a Fermat prime number. We have to show that the regular p-gon is constructable (starting with 0 and 1), that is, that $\alpha = e^{2\pi i/p}$ is constructable with straightedge and compass. As already mentioned, α is a zero of the polynomial $f(x) = x^{p-1} + x^{p-2} + \dots + x + 1$, which is irreducible over ℚ. The zeros of f(x) in ℂ are exactly the elements $\alpha, \alpha^2, \dots, \alpha^{p-1}$. Therefore L = ℚ(α) is the splitting field of f(x) over ℚ. Let G = Aut(L, ℚ) be the set of automorphisms of L (which fix ℚ). In fact G is the set of automorphisms $\sigma_m : \alpha \mapsto \alpha^m$ with 1 ≤ m ≤ p − 1. Via $\sigma_m \mapsto \bar m = m + p\mathbb{Z}$ we get that G is isomorphic to $\mathbb{Z}_p^*$, the multiplicative group of the prime field $\mathbb{Z}_p$. Hence G = ⟨σ⟩ is cyclic of order $p - 1 = 2^t$ with $t := 2^s$, generated by some automorphism σ ∈ G.

We define $G_i = \langle\sigma^{2^{t-i}}\rangle$ for i = 0, 1, …, t, so that $G_i$ is the subgroup of order $2^i$. So we have a chain

$$\{1\} = G_0 \subset G_1 \subset \dots \subset G_t = G$$

of cyclic groups (with $|G_i : G_{i-1}| = 2$ for i = 1, 2, …, t).

Now define $K_i = \{a \in L \mid \varphi(a) = a \text{ for all } \varphi \in G_i\}$ for i = 0, 1, …, t; in particular $K_0 = L$. We remark that $\mathbb{Q} = K_t$. So we get a chain

$$L = K_0 \supsetneq K_1 \supsetneq \dots \supsetneq K_t = \mathbb{Q}.$$

Since $|L : \mathbb{Q}| = \deg(f(x)) = 2^t$ and the chain has t proper steps, we must have $|K_{i-1} : K_i| = 2$ for i = 1, 2, …, t. Reading the chain from ℚ upwards, Theorem 13.4 shows that each element of L is constructable with straightedge and compass, hence especially $\alpha = e^{2\pi i/p}$ and therefore the regular p-gon.

Remark 13.10. Theorem 13.9 is of technical interest, and a general construction seems to be complicated. But for p = 3, 5 and 17 the construction is quite direct and easy. If p = 3 then the construction is known from school, see Figure 13.11.

Figure 13.11: Construction from school.

If p = 5 then the construction can be done with the help of the golden section (see Chapter 4). We remark that $\cos(\frac{2\pi}{5}) = \frac{1}{4}(\sqrt{5} - 1)$.

Now, let p = 17. As already mentioned, the regular 17-gon is constructable if and only if $\cos(\frac{2\pi}{17})$ is constructable. So we have to construct the number $\cos(\frac{2\pi}{17}) = \frac{1}{2}(\alpha + \alpha^{-1})$ where $\alpha = e^{2\pi i/17}$.

First, construct the positive zero $\alpha_1$ of the polynomial $x^2 + x - 4$; we get

$$\alpha_1 = \frac{1}{2}(\sqrt{17} - 1) = \alpha + \alpha^{-1} + \alpha^2 + \alpha^{-2} + \alpha^4 + \alpha^{-4} + \alpha^8 + \alpha^{-8}.$$

Then, construct the positive zero $\alpha_2$ of the polynomial $x^2 - \alpha_1 x - 1$. We get

$$\alpha_2 = \frac{1}{4}\left(\sqrt{17} - 1 + \sqrt{34 - 2\sqrt{17}}\right) = \alpha + \alpha^{-1} + \alpha^4 + \alpha^{-4}.$$

From $\alpha_1$ and $\alpha_2$ construct $\beta = \frac{1}{2}(\alpha_2^2 - \alpha_1 + \alpha_2 - 4)$. Then $\alpha_3 = 2\cos(\frac{2\pi}{17})$ is the bigger of the two positive zeros of the polynomial $x^2 - \alpha_2 x + \beta$. Finally we get

$$\cos\left(\frac{2\pi}{17}\right) = \frac{1}{16}\left(-1 + \sqrt{17} + \sqrt{34 - 2\sqrt{17}}\right) + \frac{1}{16}\left(2\sqrt{17 + 3\sqrt{17} - \sqrt{34 - 2\sqrt{17}} - 2\sqrt{34 + 2\sqrt{17}}}\right).$$
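The nested-radical expression and the intermediate quantities $\alpha_1$, $\alpha_2$ can be compared with direct evaluations of the cosines. This is only a floating-point sanity check, not a proof; the function names `closed_form_cos_2pi_17` and `period` are hypothetical.

```python
import math

def closed_form_cos_2pi_17():
    """Evaluate the nested square-root expression for cos(2*pi/17)."""
    s17 = math.sqrt(17)
    inner_minus = math.sqrt(34 - 2 * s17)
    inner_plus = math.sqrt(34 + 2 * s17)
    first = (-1 + s17 + inner_minus) / 16
    second = 2 * math.sqrt(17 + 3 * s17 - inner_minus - 2 * inner_plus) / 16
    return first + second

def period(exponents):
    """Sum of alpha^k + alpha^(-k) = 2*cos(2*pi*k/17) over the given exponents k."""
    return sum(2 * math.cos(2 * math.pi * k / 17) for k in exponents)

if __name__ == "__main__":
    print(closed_form_cos_2pi_17(), math.cos(2 * math.pi / 17))
    print(period([1, 2, 4, 8]), (math.sqrt(17) - 1) / 2)  # alpha_1
```

Each stage uses only field operations and square roots, which is exactly what Theorem 13.4 permits.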

Corollary 13.11. Let n ∈ ℕ, n ≥ 2. Starting with 0 and 1, the regular n-gon is constructable with straightedge and compass if and only if
(i) $n = 2^m$, m ≥ 1, or
(ii) $n = 2^m p_1 p_2 \cdots p_r$, m ≥ 0, and the $p_1, p_2, \dots, p_r$ are pairwise different Fermat prime numbers.

Proof. Certainly, a regular $2^m$-gon, m ∈ ℕ, m ≥ 1, is constructable. We first make some reduction steps. Let $n = n_1 n_2$ with $2 \le n_1, n_2$ and $\gcd(n_1, n_2) = 1$. If the regular $n_1$-gon and the regular $n_2$-gon are constructable, then the regular $n_1 n_2$-gon is also constructable. This follows directly from the trigonometric identities

$$\cos\left(\frac{2\pi}{n_1} + \frac{2\pi}{n_2}\right) = \cos\left(\frac{2(n_1 + n_2)\pi}{n_1 n_2}\right) = \cos\left(\frac{2\pi}{n_1}\right)\cos\left(\frac{2\pi}{n_2}\right) - \sin\left(\frac{2\pi}{n_1}\right)\sin\left(\frac{2\pi}{n_2}\right)$$

and

$$\sin\left(\frac{2\pi}{n_1} + \frac{2\pi}{n_2}\right) = \sin\left(\frac{2(n_1 + n_2)\pi}{n_1 n_2}\right) = \sin\left(\frac{2\pi}{n_1}\right)\cos\left(\frac{2\pi}{n_2}\right) + \cos\left(\frac{2\pi}{n_1}\right)\sin\left(\frac{2\pi}{n_2}\right).$$

Conversely, if $n = n_1 n_2$, $2 \le n_1, n_2$, and if the regular $n_1 n_2$-gon is constructable, then certainly the regular $n_1$-gon and the regular $n_2$-gon are constructable. We just have to connect each $n_2$-th and $n_1$-th vertex, respectively. This gives the reduction to prime powers. If n = p is a prime number then the result follows from Theorem 13.9. If $n = p^\alpha$, α ≥ 2, then $n = p^2 \cdot p^{\alpha-2}$, and therefore, by the above reduction step, we have reduced the problem to the case $n = p^2$, p an odd prime number.

So, let $n = p^2$, p an odd prime number. Assume we may construct a regular $p^2$-gon, and hence $\alpha = e^{2\pi i/p^2}$. Then α is a zero of the polynomial

$$f(x) = \frac{x^{p^2} - 1}{x - 1}\cdot\frac{x - 1}{x^p - 1} = x^{p(p-1)} + x^{p(p-2)} + \dots + x^p + 1.$$

Here $f(x) = g(x^p)$ with $g(y) = y^{p-1} + y^{p-2} + \dots + y + 1$, and f(x) is irreducible over ℚ. Therefore $|\mathbb{Q}(\alpha) : \mathbb{Q}| = p(p - 1)$. But p(p − 1) is not a power of 2, which contradicts the necessary condition in Corollary 13.5. Hence the regular $p^2$-gon, p an odd prime number, is not constructable. This proves Corollary 13.11.
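The criterion of Corollary 13.11 is easy to test for a given n. The following is a minimal sketch, assuming plain trial division is acceptable for small n; the names `is_power_of_two` and `ngon_constructible` are hypothetical. It uses the fact from the text that an odd prime p is a Fermat prime exactly when p − 1 is a power of 2.

```python
def is_power_of_two(k):
    """True if k = 2^j for some integer j >= 0."""
    return k >= 1 and (k & (k - 1)) == 0

def ngon_constructible(n):
    """Criterion of Corollary 13.11: n = 2^m times a product of
    pairwise different Fermat primes (possibly an empty product)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    while n % 2 == 0:           # remove the factor 2^m
        n //= 2
    p = 3
    while n > 1:
        if p * p > n:           # no divisor up to sqrt(n): n itself is prime
            p = n
        if n % p == 0:
            n //= p
            # an odd prime may occur only once, and p - 1 must be a power of 2
            if n % p == 0 or not is_power_of_two(p - 1):
                return False
        else:
            p += 2
    return True

if __name__ == "__main__":
    print([n for n in range(3, 21) if ngon_constructible(n)])
```

For 3 ≤ n ≤ 20 this lists 3, 4, 5, 6, 8, 10, 12, 15, 16, 17 and 20.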

Exercises

1. Give the details for the fundamental constructions in Remark 13.1 and Remark 13.2.
2. Let ϕ be a given angle. In which of the following cases is the angle ψ constructible from the angle ϕ by compass and straightedge?
(a) $\phi = \frac{\pi}{13}$, $\psi = \frac{\pi}{26}$;
(b) $\phi = \frac{\pi}{33}$, $\psi = \frac{\pi}{11}$;
(c) $\phi = \frac{\pi}{7}$, $\psi = \frac{\pi}{12}$.
3. Let $\mu = e^{2\pi i/n}$ be a primitive n-th root of unity. Show that $\mu + \frac{1}{\mu} = 2\cos(\frac{2\pi}{n})$.
4. Using the described hints, construct in detail the regular 17-gon with compass and straightedge.
5. Show that the diagonal of a golden rectangle of height 1 and width the golden ratio (see Section 4.1.1) is constructable with compass and straightedge.
6. Prove the intercept theorems (theorems of intersecting lines) and the altitude theorem.
7. The splitting field $\mathcal{K}_n$ of the polynomial $x^n - 1 \in \mathbb{Q}[x]$ with n ≥ 2 is called the n-th cyclotomic field. Show that
(a) $\mathcal{K}_n$, n ≥ 2, is the splitting field of the cyclotomic polynomial
$$\phi_n(x) = \prod_{\substack{1 \le k \le n \\ \gcd(k,n) = 1}}\left(x - e^{2\pi i\frac{k}{n}}\right) \in \mathbb{Q}[x].$$
(b)
$$\phi_n(x) = \frac{x^n - 1}{\prod_{d \mid n,\ 0 < d < n}\phi_d(x)}.$$
We remark (see [14]) that $\phi_n(x) \in \mathbb{Z}[x]$ and $\phi_n(x)$ is irreducible over ℚ.

[…] > 0, of $t_0$ because f is continuous, see Figure 14.1.

Figure 14.1: Continuous f .

Hence ⟨f , f ⟩ > 0 if f ≠ 0.

14.1 Length and angle


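V is an infinite dimensional ℝ-vector space. For instance, the functions $f_i : [a, b] \to \mathbb{R}$, $t \mapsto t^i$, i = 0, 1, 2, …, n, … are linearly independent.

Definition 14.4. Let V be a Euclidean vector space. The length of $\vec v \in V$ is defined as
$$\|\vec v\| := \sqrt{\langle\vec v, \vec v\rangle}.$$

Remarks 14.5. (1) From the definition we see that $\|\vec v\|^2 = \langle\vec v, \vec v\rangle$. If $V = \mathbb{R}^n$ is equipped with the canonical scalar product, then $\|\vec x\|^2 = \vec x\cdot\vec x = \vec x^{\,2}$ by our agreement.
(2) Direct calculation, just using the definition, leads to the equation
$$\langle\vec v, \vec w\rangle = \frac{1}{2}\left(\|\vec v\|^2 + \|\vec w\|^2 - \|\vec v - \vec w\|^2\right)$$
for $\vec v, \vec w \in V$.

Theorem 14.6 (Cauchy–Schwarz inequality). Let V be a Euclidean vector space. Then
$$\langle\vec v, \vec w\rangle^2 \le \langle\vec v, \vec v\rangle\cdot\langle\vec w, \vec w\rangle,$$
or in other words,
$$|\langle\vec v, \vec w\rangle| \le \|\vec v\|\,\|\vec w\|$$
for all $\vec v, \vec w \in V$. Equality holds if and only if $\vec v$ and $\vec w$ are linearly dependent.

Proof. (a) If $\vec v = \vec 0$ then $\langle\vec v, \vec w\rangle = 0 = \langle\vec v, \vec v\rangle$ for all $\vec w \in V$, and hence Theorem 14.6 is correct with equality. Analogously we may argue for $\vec w = \vec 0$.
(b) Let $\vec v \ne \vec 0 \ne \vec w$. For arbitrary λ ∈ ℝ we have
$$0 \le \langle\vec v - \lambda\vec w, \vec v - \lambda\vec w\rangle = \langle\vec v, \vec v\rangle - 2\lambda\langle\vec v, \vec w\rangle + \lambda^2\langle\vec w, \vec w\rangle.$$
If we choose $\lambda = \frac{\langle\vec v, \vec w\rangle}{\langle\vec w, \vec w\rangle}$, we get
$$0 \le \langle\vec v, \vec v\rangle - 2\frac{\langle\vec v, \vec w\rangle^2}{\langle\vec w, \vec w\rangle} + \frac{\langle\vec v, \vec w\rangle^2}{\langle\vec w, \vec w\rangle^2}\cdot\langle\vec w, \vec w\rangle = \langle\vec v, \vec v\rangle - \frac{\langle\vec v, \vec w\rangle^2}{\langle\vec w, \vec w\rangle},$$
and hence
$$0 \le \langle\vec v, \vec v\rangle\cdot\langle\vec w, \vec w\rangle - \langle\vec v, \vec w\rangle^2$$
because $\langle\vec w, \vec w\rangle > 0$.
(c) Let $\vec v, \vec w$ be linearly independent. Then $\vec w \ne \vec 0$, $\vec v \ne \vec 0$ and
$$0 < \langle\vec v - \lambda\vec w, \vec v - \lambda\vec w\rangle$$
for each λ ∈ ℝ. Hence, in (b) we always have a strict inequality "<" […]

[…] > 0 ⇒ $\|\vec v\| > 0$.
(2) $\|\alpha\vec v\| = \sqrt{\langle\alpha\vec v, \alpha\vec v\rangle} = \sqrt{\alpha^2\langle\vec v, \vec v\rangle} = |\alpha|\sqrt{\langle\vec v, \vec v\rangle}$.
(3) $\|\vec v + \vec w\| \le \|\vec v\| + \|\vec w\| \Leftrightarrow \|\vec v + \vec w\|^2 \le (\|\vec v\| + \|\vec w\|)^2 \Leftrightarrow \langle\vec v, \vec w\rangle \le \|\vec v\|\,\|\vec w\|$ because $\|\vec v\|^2 = \langle\vec v, \vec v\rangle$ and $\|\vec w\|^2 = \langle\vec w, \vec w\rangle$. Hence, we reduce (3) to the inequality $\langle\vec v, \vec w\rangle \le \|\vec v\|\,\|\vec w\|$, which holds by the Cauchy–Schwarz inequality (see Theorem 14.6).

Corollary 14.9. Let V be a Euclidean vector space. Then $\|\vec v + \vec w\| = \|\vec v\| + \|\vec w\|$ if and only if $\vec w = \vec 0$ or $\vec v = \lambda\vec w$ for some λ ∈ ℝ, λ ≥ 0, that is, $\vec v$ and $\vec w$ have the same direction.

Proof. From Theorem 14.8 and its proof we know that
$$\|\vec v + \vec w\| = \|\vec v\| + \|\vec w\| \Leftrightarrow \langle\vec v, \vec w\rangle = \|\vec v\|\,\|\vec w\|.$$
Hence, a necessary condition for the equality in Corollary 14.9 is that $\vec v$ and $\vec w$ are linearly dependent, that is, $\vec w = \vec 0$ or $\vec v = \lambda\vec w$, λ ∈ ℝ.
If $\vec w \ne \vec 0$ we must have further $\langle\lambda\vec w, \vec w\rangle = \lambda\langle\vec w, \vec w\rangle = |\lambda|\langle\vec w, \vec w\rangle$, that is, λ = |λ| ≥ 0. If conversely $\vec v = \lambda\vec w$ for some λ ≥ 0, then $\langle\lambda\vec w, \vec w\rangle = \lambda\langle\vec w, \vec w\rangle = |\lambda|\langle\vec w, \vec w\rangle$ and $\langle\vec v, \vec w\rangle = |\langle\vec v, \vec w\rangle| = \|\vec v\|\,\|\vec w\|$. If $\vec w = \vec 0$ there is nothing to show.

Remark 14.10. If $\vec v = \lambda\vec w$ with λ < 0, then
$$\|\vec v + \vec w\| = \|\lambda\vec w + \vec w\| = \|(1 + \lambda)\vec w\| = \|(1 - |\lambda|)\vec w\| = |1 - |\lambda||\,\|\vec w\| \ne \|\vec v\| + \|\vec w\| = \|\lambda\vec w\| + \|\vec w\| = (1 + |\lambda|)\|\vec w\|.$$

We now use length to define a distance function on a Euclidean vector space.

Definition 14.11. Let V be a Euclidean vector space. The distance between two vectors $\vec v, \vec w \in V$ is
$$\|\vec v - \vec w\| = \sqrt{\langle\vec v - \vec w, \vec v - \vec w\rangle},$$
see Figure 14.2.

Theorem 14.12. The following hold:
(1) $\|\vec v - \vec w\| \ge 0$ and $\|\vec v - \vec w\| = 0 \Leftrightarrow \vec v = \vec w$.
(2) $\|\vec v - \vec w\| = \|\vec w - \vec v\|$.
(3) $\|\vec v - \vec w\| \le \|\vec v - \vec u\| + \|\vec u - \vec w\|$.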

Figure 14.2: Distance between two vectors $\vec v, \vec w \in V$.

Proof. This follows directly from Theorem 14.8. In (3) we write $\vec v - \vec w = \vec v - \vec u + \vec u - \vec w$ and use the triangle inequality.

To complete the extension of geometric properties to a Euclidean vector space we now define angles. To do this we use the Cauchy–Schwarz inequality.

Remark 14.13. Let V be a Euclidean vector space and $\vec v \ne \vec 0 \ne \vec w$. By the Cauchy–Schwarz inequality we have
$$-1 \le \frac{\langle\vec v, \vec w\rangle}{\|\vec v\|\,\|\vec w\|} \le 1.$$
Hence there exists exactly one φ ∈ [0, π] with $\cos(\varphi) = \frac{\langle\vec v, \vec w\rangle}{\|\vec v\|\,\|\vec w\|}$. This allows us to define an angle in V.

Definition 14.14. Let V be a Euclidean vector space. The angle φ between two vectors $\vec v, \vec w \in V\setminus\{\vec 0\}$ is defined by
$$\varphi = \arccos\left(\frac{\langle\vec v, \vec w\rangle}{\|\vec v\|\,\|\vec w\|}\right), \quad \varphi \in [0, \pi].$$
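For the canonical scalar product on ℝⁿ the definitions of length, distance and angle translate directly into code. This is a small sketch, assuming vectors are given as tuples of floats; the helper names `inner`, `length`, `distance` and `angle` are hypothetical.

```python
import math

def inner(v, w):
    """Canonical scalar product on R^n."""
    return sum(a * b for a, b in zip(v, w))

def length(v):
    """Length ||v|| = sqrt(<v, v>) as in Definition 14.4."""
    return math.sqrt(inner(v, v))

def distance(v, w):
    """Distance ||v - w|| as in Definition 14.11."""
    return length([a - b for a, b in zip(v, w)])

def angle(v, w):
    """Angle in [0, pi] between non-zero vectors as in Definition 14.14."""
    c = inner(v, w) / (length(v) * length(w))
    c = max(-1.0, min(1.0, c))  # guard against rounding slightly outside [-1, 1]
    return math.acos(c)

if __name__ == "__main__":
    print(angle((1.0, 0.0), (0.0, 1.0)))     # a right angle
    print(distance((1.0, 2.0), (4.0, 6.0)))  # a 3-4-5 distance
```

The clamp in `angle` is a floating-point precaution only; mathematically the Cauchy–Schwarz inequality already guarantees that the quotient lies in [−1, 1].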

Remark 14.15. Definition 14.14 agrees with the usual notion for a right-angled triangle in a plane, see Figure 14.3.

Figure 14.3: Usual conception for a right-angled triangle in a plane.

There we have
$$\cos(\alpha) = \frac{a}{c} = \frac{\|\vec a\|}{\|\vec c\|}$$
with $\|\vec c\| = c$ and $\|\vec a\| = a$. Indeed, let $\vec 0 = (0, 0)$, $\vec a = (a, 0)$ and $\vec c = (a, h)$ be written in coordinates. Then $\langle\vec a, \vec c\rangle = a^2$ and $\|\vec c\|\,\|\vec a\| = c\cdot a$, therefore
$$\cos(\alpha) = \frac{\langle\vec a, \vec c\rangle}{\|\vec a\|\,\|\vec c\|} = \frac{a^2}{a\cdot c} = \frac{a}{c}.$$

In ordinary Euclidean geometry orthogonality or perpendicularity plays an important role. We now extend this idea to Euclidean vector spaces. The following definition is motivated by
$$\langle\vec v, \vec w\rangle = 0 \Leftrightarrow \cos(\varphi) = 0 \Leftrightarrow \varphi = \frac{\pi}{2}$$
(φ ∈ [0, π]).

Definition 14.16.
(a) Two vectors $\vec v, \vec w \in V$ are called orthogonal, denoted by $\vec v \perp \vec w$, if $\langle\vec v, \vec w\rangle = 0$.
(b) Let S, T be non-empty subsets of V. Then S ⟂ T means $\langle\vec v, \vec w\rangle = 0$ for all $\vec v \in S$ and $\vec w \in T$.
(c) $S^\perp = \{\vec v \mid \vec v \in V, \{\vec v\} \perp S\}$, S ≠ ∅, S ⊂ V, is called the orthogonal complement of S in V.
(d) Two vectors $\vec v, \vec w \in V$ are called parallel if $\pm\langle\vec v, \vec w\rangle = \|\vec v\|\,\|\vec w\|$, that is, if φ = 0 or π.

In a first subsection we now want to make some geometric applications in the plane ℝ², and we show at the same time how meaningful the general definitions for length and angles are. Let here V = ℝ²; we write $\langle\vec v, \vec w\rangle$ for the canonical scalar product.

14.2 Orthogonality and Applications in ℝ2 and ℝ3

Figure 14.4: Orthogonality of vectors.

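First we show that two vectors $\vec v$ and $\vec w$ are orthogonal in the geometric sense if and only if $\langle\vec v, \vec w\rangle = 0$ (see Figure 14.4).

Let $\vec v$ and $\vec w$ be the vectors from the origin O to the two points in Figure 14.4. Then
$$\vec v \perp \vec w \text{ geometrically} \Leftrightarrow \|\vec w + \vec v\| = \|\vec w - \vec v\| \Leftrightarrow \langle\vec w + \vec v, \vec w + \vec v\rangle = \langle\vec w - \vec v, \vec w - \vec v\rangle \Leftrightarrow 2\langle\vec w, \vec v\rangle = -2\langle\vec w, \vec v\rangle \Leftrightarrow \langle\vec w, \vec v\rangle = 0.$$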

Gougu-Theorem, Theorem of Pythagoras

In this section we show that the Pythagorean theorem, called the Gougu-Theorem in China where it was first discovered, is true in any Euclidean vector space.

Figure 14.5: Gougu-Theorem, Theorem of Pythagoras.

Let $a = \|\vec v\| \ne 0$, $b = \|\vec w\| \ne 0$, $c = \|\vec v + \vec w\|$ (see Figure 14.5). We have
$$\vec w \perp \vec v \Leftrightarrow \langle\vec v, \vec w\rangle = 0 \Leftrightarrow \langle\vec w + \vec v, \vec w + \vec v\rangle = \langle\vec w, \vec w\rangle + \langle\vec v, \vec v\rangle \Leftrightarrow c^2 = a^2 + b^2.$$

Cosine rule

Figure 14.6: Cosine rule.

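In elementary geometry the law of cosines states that $c^2 = a^2 + b^2 - 2ab\cos(\alpha)$, where the notation is as in Figure 14.6. The Pythagorean theorem is a special case of this for orthogonal vectors. The law of cosines is also true in any Euclidean vector space.

Theorem 14.17 (Law of Cosines). Let V be a Euclidean vector space and $\vec v, \vec w \in V$. Let α be the angle between $\vec v$ and $\vec w$. Then
$$\|\vec v - \vec w\|^2 = \|\vec v\|^2 + \|\vec w\|^2 - 2\|\vec v\|\,\|\vec w\|\cos(\alpha).$$

Proof. From the definition of angle we have $\cos(\alpha) = \frac{\langle\vec v, \vec w\rangle}{\|\vec v\|\,\|\vec w\|}$ and hence $\langle\vec v, \vec w\rangle = \|\vec v\|\,\|\vec w\|\cos(\alpha)$. Now
$$\|\vec v - \vec w\|^2 = \langle\vec v - \vec w, \vec v - \vec w\rangle = \langle\vec v, \vec v\rangle - 2\langle\vec v, \vec w\rangle + \langle\vec w, \vec w\rangle = \|\vec v\|^2 + \|\vec w\|^2 - 2\|\vec v\|\,\|\vec w\|\cos(\alpha)$$
from the statement above. This gives the law of cosines.

We have $\langle\vec v, \vec w\rangle = \|\vec v\|\,\|\vec w\|\cos(\alpha)$. We determine c ∈ ℝ so that $(\vec w - c\vec v) \perp \vec v$. From $(\vec w - c\vec v) \perp \vec v$ we get
$$\langle\vec w - c\vec v, \vec v\rangle = 0 \Rightarrow \langle\vec w, \vec v\rangle - \langle c\vec v, \vec v\rangle = 0 \Rightarrow c = \frac{\langle\vec w, \vec v\rangle}{\langle\vec v, \vec v\rangle}.$$
Hence for the angle in our definition
$$\cos(\alpha) = \frac{c\|\vec v\|}{\|\vec w\|} = \frac{\langle\vec w, \vec v\rangle}{\|\vec v\|\,\|\vec w\|},$$
that is, $\langle\vec w, \vec v\rangle = \|\vec v\|\,\|\vec w\|\cos(\alpha)$ as declared. The vector $c\vec v$ is called the projection of $\vec w$ along $\vec v$. A simple consequence is the sine rule for triangles in ℝ², see Figure 14.7.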

Figure 14.7: Sine rule for triangles in ℝ2 .

We may explain sin(α) for α ∈ (0, π) via the cosine rule and the theorem of Pythagoras as
$$\sin(\alpha) = \frac{h_c}{b}.$$
Analogously
$$\sin(\beta) = \frac{h_c}{a}.$$
It follows that $h_c = b\sin(\alpha) = a\sin(\beta)$. This we may also do for the heights $h_a$ and $h_b$, and altogether we obtain the sine rule:
$$\frac{a}{\sin(\alpha)} = \frac{b}{\sin(\beta)} = \frac{c}{\sin(\gamma)}$$
for α, β, γ ∈ (0, π).
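The law of cosines and the sine rule can be checked on a concrete triangle in ℝ². The sketch below is illustrative only; the function `triangle_data` and the choice of vertices are assumptions, with a, b, c the sides opposite the vertices A, B, C and α, β, γ the angles at A, B, C.

```python
import math

def inner(v, w):
    return sum(x * y for x, y in zip(v, w))

def length(v):
    return math.sqrt(inner(v, v))

def sub(p, q):
    """Vector from q to p."""
    return tuple(x - y for x, y in zip(p, q))

def angle(v, w):
    return math.acos(inner(v, w) / (length(v) * length(w)))

def triangle_data(A, B, C):
    """Side lengths (a, b, c) and angles (alpha, beta, gamma) of triangle ABC."""
    a, b, c = length(sub(B, C)), length(sub(A, C)), length(sub(A, B))
    alpha = angle(sub(B, A), sub(C, A))
    beta = angle(sub(A, B), sub(C, B))
    gamma = angle(sub(A, C), sub(B, C))
    return (a, b, c), (alpha, beta, gamma)

if __name__ == "__main__":
    (a, b, c), (alpha, beta, gamma) = triangle_data((0.0, 0.0), (4.0, 0.0), (1.0, 3.0))
    # Law of cosines for the side opposite A, with v = B - A and w = C - A.
    print(a ** 2, b ** 2 + c ** 2 - 2 * b * c * math.cos(alpha))
    # Sine rule: the three quotients agree up to rounding.
    print(a / math.sin(alpha), b / math.sin(beta), c / math.sin(gamma))
```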

Theorem of Thales in ℝ²

We now give a theorem of Thales of Miletus (ca. 624–547 BC) about inscribed angles in a circle. We consider Figure 14.8.

Figure 14.8: Theorem of Thales.

Let $\|\vec v\| = r = \|\vec w\| \ne 0$, r the radius of the half circle around 0 through P. We get
$$\langle\vec v + \vec w, \vec w - \vec v\rangle = \langle\vec w, \vec w\rangle - \langle\vec v, \vec v\rangle = 0,$$
that is,
$$(\vec v + \vec w) \perp (\vec v - \vec w),$$
that is,
$$\overrightarrow{PQ} \perp \overrightarrow{(-P)Q} \quad\text{or}\quad \alpha = \frac{\pi}{2}.$$

Theorem 14.18 (Theorem of Thales). The triangle △(−P)QP is right-angled.

Donkey bridge in ℝ²

Theorem 14.19 (The donkey bridge). The base angles of an isosceles triangle are equal.

Consider Figure 14.9.

Figure 14.9: Isosceles triangle.

Figure 14.10: Isosceles triangle.

Proof. Consider Figure 14.10. We have $\|\vec x\| = \|\vec y\|$, $\|\vec u\| = \|\vec v\|$ and
$$\langle\vec u, \vec x\rangle = \langle(a, 0), (a, h)\rangle = a^2, \qquad \langle\vec v, \vec y\rangle = \langle(-a, 0), (-a, h)\rangle = a^2$$
$$\Rightarrow \cos(\alpha) = \cos(\beta) \Rightarrow \alpha = \beta,$$
since $0 \le \alpha \le \frac{\pi}{2}$.

The altitude theorem and the cathetus theorem for right triangles in ℝ2

Figure 14.11: Cathetus theorem for right triangles.

Let $b = \|\vec{v}\| \neq 0$, $a = \|\vec{w}\| \neq 0$. We get (see Figure 14.11)
\begin{align*}
cp &= \|\vec{w} - \vec{v}\|\,\|\vec{u} - \vec{v}\| \\
&= \sqrt{\langle \vec{w} - \vec{v}, \vec{w} - \vec{v} \rangle \cdot \langle \vec{u} - \vec{v}, \vec{u} - \vec{v} \rangle} \\
&= \sqrt{\langle \vec{w} - \vec{v}, \vec{u} - \vec{v} \rangle^2} \\
&= \sqrt{\bigl(\underbrace{\langle \vec{w} - \vec{v}, \vec{u} \rangle}_{=0} - \underbrace{\langle \vec{w}, \vec{v} \rangle}_{=0} + \langle \vec{v}, \vec{v} \rangle\bigr)^2} \\
&= \|\vec{v}\|^2 = b^2.
\end{align*}
Hence, we get the cathetus theorem.

Theorem 14.20 (Cathetus theorem). $cp = b^2$ and analogously $cq = a^2$.

From the cathetus theorem we may also get the theorem of Pythagoras again: $a^2 + b^2 = cq + cp = c(p + q) = c^2$. Further we obtain
\[ h^2 + p^2 + h^2 + q^2 = a^2 + b^2 = c^2 = (p + q)^2 = p^2 + 2pq + q^2, \]
and hence $2h^2 = 2pq$. This gives the altitude theorem.

Theorem 14.21 (Altitude theorem). $h^2 = pq$.
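The cathetus and altitude theorems can be checked numerically. The following is a minimal sketch, assuming (as the derivation above suggests) that the right angle sits at the origin, that the legs are the vectors v and w, and that u is the foot of the altitude on the hypotenuse; the function name `right_triangle_parts` and the sample triangle are illustrative choices, not taken from the text.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(dot(x, x))

def right_triangle_parts(v, w):
    """Return (a, b, c, p, q, h) for the right triangle with legs v, w.

    Assumes the right angle sits at the origin, i.e. <v, w> = 0.
    """
    if not math.isclose(dot(v, w), 0.0, abs_tol=1e-12):
        raise ValueError("v and w must be orthogonal")
    d = [wi - vi for vi, wi in zip(v, w)]      # direction of the hypotenuse, w - v
    s = -dot(v, d) / dot(d, d)                 # chosen so that u = v + s*d satisfies <u, d> = 0
    u = [vi + s * di for vi, di in zip(v, d)]  # foot of the altitude from the origin
    a, b, c = norm(w), norm(v), norm(d)
    p = norm([ui - vi for ui, vi in zip(u, v)])
    q = norm([ui - wi for ui, wi in zip(u, w)])
    h = norm(u)
    return a, b, c, p, q, h

a, b, c, p, q, h = right_triangle_parts((4.0, 0.0), (0.0, 3.0))
print(c * p, b ** 2)   # cathetus theorem: both approximately 16
print(c * q, a ** 2)   # both approximately 9
print(h ** 2, p * q)   # altitude theorem: both approximately 5.76
```

The values are compared up to floating-point rounding, since the lengths involve square roots.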

Orthocenter of a triangle in ℝ2

Figure 14.12: Orthocenter of a Triangle.

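We have the orthocenter rule (see Figure 14.12):
\[ \overrightarrow{AE} \perp \overrightarrow{BC} \ \text{ and } \ \overrightarrow{BD} \perp \overrightarrow{AC} \;\Rightarrow\; \overrightarrow{CF} \perp \overrightarrow{AB}. \]

Proof.
(1) $\vec{v} \perp (\vec{w} - \vec{u}) \Rightarrow \langle \vec{v}, \vec{w} - \vec{u} \rangle = 0 \Rightarrow \langle \vec{v}, \vec{w} \rangle - \langle \vec{v}, \vec{u} \rangle = 0$.
(2) $\vec{w} \perp (\vec{v} - \vec{u}) \Rightarrow \langle \vec{w}, \vec{v} - \vec{u} \rangle = 0 \Rightarrow \langle \vec{v}, \vec{w} \rangle - \langle \vec{w}, \vec{u} \rangle = 0$.
Subtracting (2) from (1) leads to
\[ \langle \vec{v}, \vec{u} \rangle - \langle \vec{w}, \vec{u} \rangle = 0 \;\Rightarrow\; \langle \vec{v} - \vec{w}, \vec{u} \rangle = 0 \;\Rightarrow\; \vec{u} \perp (\vec{v} - \vec{w}), \]
that is, $\overrightarrow{CF} \perp \overrightarrow{AB}$.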

Calculation of the line through a point P = (x0 , y0 ) orthogonal to a vector n⃗ in ℝ2

Figure 14.13: Line through a point P = (x0 , y0 ) orthogonal to a vector n.⃗

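Let $\vec{n} = (a, b)$, let $\vec{w} = (x_0, y_0)$ be the position vector of $P$, and let $\vec{v} = (x, y)$ be an arbitrary point of the line. Then
\[ \langle \vec{v} - \vec{w}, \vec{n} \rangle = \langle \vec{v}, \vec{n} \rangle - \langle \vec{w}, \vec{n} \rangle = 0 \;\Rightarrow\; \langle \vec{v}, \vec{n} \rangle = \langle \vec{w}, \vec{n} \rangle \;\Rightarrow\; xa + yb = x_0 a + y_0 b \]
by the definition of the canonical scalar product (see Figure 14.13). If we set $-c := x_0 a + y_0 b$ then we get the equation
\[ ax + by + c = 0 \]
for the line through $P = (x_0, y_0)$ orthogonal to the vector $\vec{n}$.

There are also applications for the space $\mathbb{R}^3$. Here we just want to describe a cosine rule for the angles between three vectors.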
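Returning to the line equation ax + by + c = 0 for the line through P orthogonal to n: a minimal sketch of the computation, with the hypothetical helper names `line_through` and `on_line` and an arbitrarily chosen point and normal vector.

```python
def line_through(point, normal):
    """Coefficients (a, b, c) of a*x + b*y + c = 0 for the line through
    `point` orthogonal to `normal`."""
    x0, y0 = point
    a, b = normal
    if a == 0 and b == 0:
        raise ValueError("the normal vector must be nonzero")
    c = -(a * x0 + b * y0)          # -c := x0*a + y0*b, as in the text
    return a, b, c

def on_line(coeffs, point):
    a, b, c = coeffs
    x, y = point
    return a * x + b * y + c == 0

coeffs = line_through((1, 2), (3, 4))
print(coeffs)                    # (3, 4, -11)
print(on_line(coeffs, (1, 2)))   # True
print(on_line(coeffs, (-3, 5)))  # True: (-3, 5) = P + (-4, 3), and (-4, 3) is orthogonal to n
print(on_line(coeffs, (0, 0)))   # False
```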


Cosine rule for ℝ3

Let $\vec{u}, \vec{v}, \vec{w} \in \mathbb{R}^3$ be three linearly independent vectors which pairwise generate planes and which form pairwise angles $a, b, c$. These planes intersect in lines and meet pairwise in the angles $\alpha, \beta, \gamma$, see Figure 14.14.

Figure 14.14: Cosine rule for the angles between three vectors.

Then we have the following: Let $\vec{u}, \vec{v}, \vec{w}, a, b, c, \alpha, \beta, \gamma$ be given as in the picture. Then
\[ \cos(a) = \cos(b) \cdot \cos(c) + \sin(b) \cdot \sin(c) \cdot \cos(\alpha). \]

Proof. We first determine the angle $\alpha$, that is, the angle between the two planes which are generated by $\vec{v}, \vec{w}$ and by $\vec{v}, \vec{u}$, respectively. This angle is equal to the angle between two vectors, one in each of these planes, which are orthogonal to the intersection line of the planes. Such vectors are, for instance, the perpendiculars from $\vec{u}$ and $\vec{w}$ onto the vector $\vec{v}$. Without loss of generality we may assume that $\|\vec{u}\| = \|\vec{v}\| = \|\vec{w}\| = 1$. Analogously to the proof of the cosine rule in $\mathbb{R}^2$ we then get for these perpendicular vectors $\vec{u} - \cos(c)\vec{v}$ and $\vec{w} - \cos(b)\vec{v}$, respectively. Hence we get for $\alpha$ the formula
\[ \cos(\alpha) = \frac{\langle \vec{u} - \cos(c)\vec{v}, \vec{w} - \cos(b)\vec{v} \rangle}{\|\vec{u} - \cos(c)\vec{v}\|\,\|\vec{w} - \cos(b)\vec{v}\|} \]
and hence
\begin{align*}
\|\vec{u} - \cos(c)\vec{v}\|\,\|\vec{w} - \cos(b)\vec{v}\| \cos(\alpha)
&= \langle \vec{u} - \cos(c)\vec{v}, \vec{w} - \cos(b)\vec{v} \rangle \\
&= \langle \vec{u}, \vec{w} \rangle - \cos(b)\langle \vec{u}, \vec{v} \rangle - \cos(c)\langle \vec{v}, \vec{w} \rangle + \cos(c)\cos(b) \\
&= \cos(a) - \cos(b)\cos(c) - \cos(c)\cos(b) + \cos(c)\cos(b).
\end{align*}
Therefore we get
\begin{align*}
(\cos(a) - \cos(b)\cos(c))^2
&= \cos^2(\alpha)\,\langle \vec{u} - \cos(c)\vec{v}, \vec{u} - \cos(c)\vec{v} \rangle \langle \vec{w} - \cos(b)\vec{v}, \vec{w} - \cos(b)\vec{v} \rangle \\
&= \cos^2(\alpha)\,(1 - 2\cos(c)\langle \vec{u}, \vec{v} \rangle + \cos^2(c))(1 - 2\cos(b)\langle \vec{w}, \vec{v} \rangle + \cos^2(b)) \\
&= (1 - 2\cos^2(c) + \cos^2(c))(1 - 2\cos^2(b) + \cos^2(b))\cos^2(\alpha) \\
&= (1 - \cos^2(c))(1 - \cos^2(b)) \cdot \cos^2(\alpha) \\
&= \sin^2(c) \cdot \sin^2(b) \cdot \cos^2(\alpha).
\end{align*}
From this we get $\cos(a) = \cos(b) \cdot \cos(c) + \sin(c) \cdot \sin(b)\cos(\alpha)$; the sign is fixed by the unsquared identity above, since the norms and $\sin(b)$, $\sin(c)$ are positive.

We want to look at orthogonality in the $\mathbb{R}$-vector space $V$ of the continuous functions $f \colon [a, b] \to \mathbb{R}$, $a, b \in \mathbb{R}$, $a < b$, with the scalar product
\[ \langle f, g \rangle = \int_a^b f(t)g(t)\,dt \quad \text{for } f, g \in V. \]
We have here $f \perp g$ if and only if
\[ \int_a^b f(t)g(t)\,dt = 0. \]

Examples 14.22.
(a) If $[a, b] = [-1, 1]$ then the functions $f_i(t) = t^{2i}$, $g_j(t) = t^{2j+1}$ with $i, j = 0, 1, 2, \ldots$ and $i + j$ odd are orthogonal.
(b) If $[a, b] = [0, 2\pi]$ then the functions $\sin(kt)$ and $\cos(kt)$, $k = 1, 2, 3, \ldots$, are orthogonal because
\[ \int_0^{2\pi} \sin(kt) \cdot \cos(kt)\,dt = \frac{1}{2k}\sin^2(kt)\Big|_0^{2\pi} = 0 \quad \text{for } k = 1, 2, 3, \ldots. \]

14.3 Orthonormalization and closest vector

In this section we show that any finite dimensional Euclidean vector space has a basis consisting of mutually orthogonal vectors. We then give a procedure, called the Gram–Schmidt procedure after J. P. Gram (1850–1916) and E. Schmidt (1876–1959), to find such an orthogonal basis starting with any basis. Finally we use this orthogonal basis to solve the closest vector problem: given a subspace $W$ of a Euclidean vector space $V$ and a vector $\vec{v} \in V$, find the vector in $W$ closest to $\vec{v}$. In the subsequent two sections we will present nice applications of the closest vector theorem.

In the following let $V$ be a Euclidean vector space with a scalar product $\langle\ ,\ \rangle$.

Definition 14.23. A set of vectors $\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_n \in V$ is called an orthogonal set if $\langle \vec{v}_i, \vec{v}_j \rangle = 0$ for $i \neq j$, that is, if any two different vectors among $\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_n$ are orthogonal. This set of vectors is called an orthonormal set if it forms an orthogonal set and if
\[ \|\vec{v}_i\| = 1 \quad \text{for all } i = 1, 2, \ldots, n. \]

If $\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_n\}$ is an orthonormal set in $V$ and, in addition, a basis of $V$, then $\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_n\}$ is called an orthonormal basis of $V$. If $V$ is infinite dimensional, then a subset $B \subset V$ is called an orthonormal basis of $V$ if $B$ is a basis of $V$ and if each finite subset of $B$ is an orthonormal set in $V$.

Lemma 14.24. An orthogonal set $\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_n\}$, $\vec{v}_i \neq \vec{0}$ for $i = 1, 2, \ldots, n$, is linearly independent.

Proof. Let $c_1\vec{v}_1 + c_2\vec{v}_2 + \cdots + c_n\vec{v}_n = \vec{0}$, all $c_i \in \mathbb{R}$. We must show that $c_1 = c_2 = \cdots = c_n = 0$. We have
\[ \langle \vec{v}_i, c_1\vec{v}_1 + c_2\vec{v}_2 + \cdots + c_n\vec{v}_n \rangle = \langle \vec{v}_i, \vec{0} \rangle = 0, \]
hence
\[ c_1\langle \vec{v}_i, \vec{v}_1 \rangle + c_2\langle \vec{v}_i, \vec{v}_2 \rangle + \cdots + c_n\langle \vec{v}_i, \vec{v}_n \rangle = 0 \quad \text{for all } i = 1, 2, \ldots, n. \]
Since $\langle \vec{v}_i, \vec{v}_j \rangle = 0$ if $i \neq j$, we get $c_i\langle \vec{v}_i, \vec{v}_i \rangle = 0$ for $i = 1, 2, \ldots, n$. Now, $\vec{v}_i \neq \vec{0}$, that is, $\langle \vec{v}_i, \vec{v}_i \rangle \neq 0$, and therefore $c_i = 0$ for $i = 1, 2, \ldots, n$. Hence, $\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_n\}$ is linearly independent.

Theorem 14.25. Let $\{\vec{e}_1, \vec{e}_2, \ldots, \vec{e}_n\}$ be an orthonormal basis of $V$. If $\vec{v} \in V$, then
\[ \vec{v} = \sum_{i=1}^{n} \langle \vec{v}, \vec{e}_i \rangle \vec{e}_i. \]

Proof. Since in particular $\{\vec{e}_1, \vec{e}_2, \ldots, \vec{e}_n\}$ is a basis of $V$, there are $c_1, c_2, \ldots, c_n \in \mathbb{R}$ with $\vec{v} = c_1\vec{e}_1 + c_2\vec{e}_2 + \cdots + c_n\vec{e}_n$. It follows that
\[ \langle \vec{v}, \vec{e}_i \rangle = c_1\langle \vec{e}_1, \vec{e}_i \rangle + c_2\langle \vec{e}_2, \vec{e}_i \rangle + \cdots + c_n\langle \vec{e}_n, \vec{e}_i \rangle \quad \text{for } i = 1, 2, \ldots, n. \]
Since $\langle \vec{e}_i, \vec{e}_j \rangle = 0$ if $i \neq j$ and $\|\vec{e}_i\| = \sqrt{\langle \vec{e}_i, \vec{e}_i \rangle} = 1$ we get that $\langle \vec{v}, \vec{e}_i \rangle = c_i$ for $i = 1, 2, \ldots, n$ as desired.
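Theorem 14.25 can be illustrated with a small exact computation. The sketch below assumes the canonical scalar product on ℝ² and uses an orthonormal basis chosen only for illustration, e1 = (3/5, 4/5) and e2 = (−4/5, 3/5); the helper names are hypothetical.

```python
from fractions import Fraction

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def fourier_coefficients(v, basis):
    """Coefficients <v, e_i> of v relative to an orthonormal basis."""
    return [dot(v, e) for e in basis]

def combine(coeffs, basis):
    """Linear combination sum_i coeffs[i] * basis[i]."""
    n = len(basis[0])
    return [sum(c * e[k] for c, e in zip(coeffs, basis)) for k in range(n)]

# Illustrative orthonormal basis of R^2 (rational entries keep the arithmetic exact).
E1 = (Fraction(3, 5), Fraction(4, 5))
E2 = (Fraction(-4, 5), Fraction(3, 5))
BASIS = [E1, E2]

v = (2, 1)
coeffs = fourier_coefficients(v, BASIS)
print(coeffs)                            # [Fraction(2, 1), Fraction(-1, 1)]
print(combine(coeffs, BASIS) == [2, 1])  # True: v is recovered from its coefficients
```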

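If $\vec{v} \in V$ and $\{\vec{e}_1, \vec{e}_2, \ldots, \vec{e}_n\}$ is an orthonormal basis then the $c_i = \langle \vec{v}, \vec{e}_i \rangle$ are called the Fourier coefficients of $\vec{v}$ relative to $\{\vec{e}_1, \vec{e}_2, \ldots, \vec{e}_n\}$. We now give a procedure to find an orthogonal basis starting with any basis. This is called the Gram–Schmidt Orthogonalization Procedure.

Theorem 14.26 (The Gram–Schmidt Orthogonalization Procedure). Let $V$ be a Euclidean vector space with scalar product $\langle\ ,\ \rangle$. Let $\{\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_n, \ldots\}$ be a linearly independent set in $V$. Then the set $\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_n, \ldots\}$ is an orthogonal set, in which the $\vec{v}_i$ are defined by
\[ \vec{v}_1 = \vec{x}_1 \quad \text{and} \quad \vec{v}_{k+1} = \vec{x}_{k+1} - \sum_{i=1}^{k} \frac{\langle \vec{x}_{k+1}, \vec{v}_i \rangle}{\langle \vec{v}_i, \vec{v}_i \rangle} \vec{v}_i \quad \text{for } k \ge 1. \]
The orthogonal set $\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_n, \ldots\}$ is called the Gram–Schmidt orthogonalization (GSO) of $\{\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_n, \ldots\}$.

Proof. Note first that each $\vec{v}_i \neq \vec{0}$, since otherwise $\vec{x}_i$ would be a linear combination of $\vec{x}_1, \ldots, \vec{x}_{i-1}$; hence the quotients above are defined. Define $\vec{v}_1 := \vec{x}_1$. Then
\[ \vec{v}_2 = \vec{x}_2 - \frac{\langle \vec{x}_2, \vec{v}_1 \rangle}{\langle \vec{v}_1, \vec{v}_1 \rangle} \vec{v}_1. \]
It follows that
\[ \langle \vec{v}_2, \vec{v}_1 \rangle = \langle \vec{x}_2, \vec{v}_1 \rangle - \langle \vec{x}_2, \vec{v}_1 \rangle \frac{\langle \vec{v}_1, \vec{v}_1 \rangle}{\langle \vec{v}_1, \vec{v}_1 \rangle} = \langle \vec{x}_2, \vec{x}_1 \rangle - \langle \vec{x}_2, \vec{x}_1 \rangle = 0. \]
This is the start for an induction. Now, let $j > 2$ and assume $\langle \vec{v}_i, \vec{v}_k \rangle = 0$ for all $i < j$ and for all $k < i$. We have
\[ \vec{v}_j = \vec{x}_j - \sum_{i=1}^{j-1} \frac{\langle \vec{x}_j, \vec{v}_i \rangle}{\langle \vec{v}_i, \vec{v}_i \rangle} \vec{v}_i. \]
If $k < j$, then
\[ \langle \vec{v}_j, \vec{v}_k \rangle = \langle \vec{x}_j, \vec{v}_k \rangle - \sum_{i=1}^{j-1} \frac{\langle \vec{x}_j, \vec{v}_i \rangle}{\langle \vec{v}_i, \vec{v}_i \rangle} \langle \vec{v}_i, \vec{v}_k \rangle. \]
By induction we have
\[ \langle \vec{v}_i, \vec{v}_k \rangle = 0 \quad \text{for } i, k < j,\ i \neq k. \]
Hence, in the sum $\langle \vec{v}_i, \vec{v}_k \rangle$ can be nonzero only if $i = k$. But then
\[ \langle \vec{v}_j, \vec{v}_k \rangle = \langle \vec{x}_j, \vec{v}_k \rangle - \frac{\langle \vec{x}_j, \vec{v}_k \rangle}{\langle \vec{v}_k, \vec{v}_k \rangle} \langle \vec{v}_k, \vec{v}_k \rangle = 0. \]

Now, let $V$ be finite dimensional. If $\{\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_n\}$ is a basis of $V$, then the GSO leads to an orthogonal set $\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_n\}$ which is also a basis of $V$. If we divide the $\vec{v}_i$ by $\|\vec{v}_i\|$, then we get an orthonormal basis of $V$. Therefore we have the following.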

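Theorem 14.27. If $V$ is a finite dimensional Euclidean vector space, then $V$ has an orthonormal basis.

Example 14.28. Let $U \subset \mathbb{R}^4$ be the subspace of $\mathbb{R}^4$ with a basis
\[ \vec{u}_1 = (1, 0, 0, 2), \quad \vec{u}_2 = (2, 1, 1, 0) \quad \text{and} \quad \vec{u}_3 = (0, 2, 3, 0). \]
We construct an orthogonal basis of $U$. Let $\vec{v}_1 = \vec{u}_1 = (1, 0, 0, 2)$. Then
\[ \vec{v}_2 = \vec{u}_2 - \frac{\langle \vec{u}_2, \vec{v}_1 \rangle}{\langle \vec{v}_1, \vec{v}_1 \rangle} \vec{v}_1 = (2, 1, 1, 0) - \frac{2}{5}(1, 0, 0, 2) = \Bigl(\frac{8}{5}, 1, 1, -\frac{4}{5}\Bigr). \]
Now
\[ \vec{v}_3 = \vec{u}_3 - \frac{\langle \vec{u}_3, \vec{v}_1 \rangle}{\langle \vec{v}_1, \vec{v}_1 \rangle} \vec{v}_1 - \frac{\langle \vec{u}_3, \vec{v}_2 \rangle}{\langle \vec{v}_2, \vec{v}_2 \rangle} \vec{v}_2, \]
and we get
\[ \vec{v}_3 = \Bigl(-\frac{40}{26}, \frac{27}{26}, \frac{53}{26}, \frac{20}{26}\Bigr). \]
Then
\[ \frac{\vec{v}_1}{\|\vec{v}_1\|}, \quad \frac{\vec{v}_2}{\|\vec{v}_2\|}, \quad \frac{\vec{v}_3}{\|\vec{v}_3\|} \]
form an orthonormal basis of $U$.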
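The computation in Example 14.28 can be reproduced with a short program. This is a minimal sketch of the Gram–Schmidt recursion of Theorem 14.26 for the canonical scalar product on ℝⁿ, using exact rational arithmetic; the function names are illustrative, and the input vectors are assumed to be linearly independent (otherwise a division by zero occurs).

```python
from fractions import Fraction

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def gram_schmidt(vectors):
    """Orthogonalize linearly independent vectors (canonical scalar product)."""
    ortho = []
    for x in vectors:
        x = [Fraction(c) for c in x]
        v = list(x)
        for b in ortho:
            # subtract the projection of x along the already constructed vector b
            coeff = dot(x, b) / dot(b, b)
            v = [vi - coeff * bi for vi, bi in zip(v, b)]
        ortho.append(v)
    return ortho

u1, u2, u3 = (1, 0, 0, 2), (2, 1, 1, 0), (0, 2, 3, 0)
v1, v2, v3 = gram_schmidt([u1, u2, u3])
print(v2)   # [Fraction(8, 5), Fraction(1, 1), Fraction(1, 1), Fraction(-4, 5)]
print(v3)   # [Fraction(-20, 13), Fraction(27, 26), Fraction(53, 26), Fraction(10, 13)]
print(dot(v1, v2), dot(v1, v3), dot(v2, v3))   # 0 0 0
```

The fractions are printed in lowest terms, so −40/26 appears as −20/13 and 20/26 as 10/13.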

If $V$ is a Euclidean vector space and $W \subset V$ is a subspace, then the closest vector problem is to determine, for a given $\vec{v} \in V$, the vector $\vec{w} \in W$ closest to $\vec{v}$. In geometric terms, $\vec{w}$ is the orthogonal projection of $\vec{v}$ onto $W$. This can be solved in terms of an orthonormal basis of $W$. The solution is called the closest vector theorem.

Theorem 14.29 (Closest vector theorem). Let $W$ be a subspace of a Euclidean vector space $V$ with scalar product $\langle\ ,\ \rangle$. Let $\vec{v}$ be a vector of $V$. If $\{\vec{e}_1, \vec{e}_2, \ldots, \vec{e}_n\}$ is an orthonormal basis of $W$ then the unique vector $\vec{w} \in W$ closest to $\vec{v}$ is given by
\[ \vec{w} = \sum_{i=1}^{n} \langle \vec{v}, \vec{e}_i \rangle \vec{e}_i. \]

Proof. Each vector from $W$ can be written uniquely as a linear combination $c_1\vec{e}_1 + c_2\vec{e}_2 + \cdots + c_n\vec{e}_n$ with certain $c_1, c_2, \ldots, c_n \in \mathbb{R}$. We have to find $c_1, c_2, \ldots, c_n$ such that
\[ \langle \vec{v} - (c_1\vec{e}_1 + c_2\vec{e}_2 + \cdots + c_n\vec{e}_n), \vec{v} - (c_1\vec{e}_1 + c_2\vec{e}_2 + \cdots + c_n\vec{e}_n) \rangle \]
becomes minimal. Since $\{\vec{e}_1, \vec{e}_2, \ldots, \vec{e}_n\}$ is an orthonormal basis of $W$, we get
\begin{align*}
\langle \vec{v} - (c_1\vec{e}_1 + \cdots + c_n\vec{e}_n), \vec{v} - (c_1\vec{e}_1 + \cdots + c_n\vec{e}_n) \rangle
&= \|\vec{v}\|^2 - 2\sum_{i=1}^{n} c_i\langle \vec{v}, \vec{e}_i \rangle + \sum_{i=1}^{n} c_i^2 \\
&= \|\vec{v}\|^2 + \sum_{i=1}^{n} (\langle \vec{v}, \vec{e}_i \rangle - c_i)^2 - \sum_{i=1}^{n} (\langle \vec{v}, \vec{e}_i \rangle)^2.
\end{align*}

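Since the second term is non-negative, the whole right side is minimal if and only if the second term is zero, or equivalently, if $\langle \vec{v}, \vec{e}_i \rangle = c_i$ for all $i = 1, 2, \ldots, n$.

Remark 14.30. If $\vec{v} \in W$, then this is just the representation of $\vec{v}$ as a linear combination in the orthonormal basis $\{\vec{e}_1, \vec{e}_2, \ldots, \vec{e}_n\}$.

Example 14.31. Let $W \subset \mathbb{R}^4$ be the subspace generated by
\[ \vec{u}_1 = (1, 0, 0, 2), \quad \vec{u}_2 = (2, 1, 1, 0) \quad \text{and} \quad \vec{u}_3 = (0, 2, 3, 0). \]
In Example 14.28 we saw that
\[ \vec{v}_1 = (1, 0, 0, 2), \quad \vec{v}_2 = \Bigl(\frac{8}{5}, 1, 1, -\frac{4}{5}\Bigr) \quad \text{and} \quad \vec{v}_3 = \Bigl(-\frac{40}{26}, \frac{27}{26}, \frac{53}{26}, \frac{20}{26}\Bigr) \]
is an orthogonal basis of $W$. Since $\|\vec{v}_1\|^2 = 5$, $\|\vec{v}_2\|^2 = \frac{26}{5}$ and $\|\vec{v}_3\|^2 = \frac{5538}{26^2}$, normalization to length 1 gives that
\[ \vec{e}_1 = \Bigl(\frac{1}{\sqrt{5}}, 0, 0, \frac{2}{\sqrt{5}}\Bigr), \quad \vec{e}_2 = \Bigl(\frac{8}{\sqrt{130}}, \frac{5}{\sqrt{130}}, \frac{5}{\sqrt{130}}, -\frac{4}{\sqrt{130}}\Bigr) \quad \text{and} \quad \vec{e}_3 = \Bigl(-\frac{40}{\sqrt{5538}}, \frac{27}{\sqrt{5538}}, \frac{53}{\sqrt{5538}}, \frac{20}{\sqrt{5538}}\Bigr) \]
is an orthonormal basis of $W$. The vector $\vec{w} \in W$ closest to $\vec{v} = (0, 1, 0, 1)$ is given by
\[ \vec{w} = \langle \vec{v}, \vec{e}_1 \rangle \vec{e}_1 + \langle \vec{v}, \vec{e}_2 \rangle \vec{e}_2 + \langle \vec{v}, \vec{e}_3 \rangle \vec{e}_3 = \frac{1}{213}(26, 57, 104, 200). \]
As a check, $\vec{v} - \vec{w} = \frac{13}{213}(-2, 12, -8, 1)$ is orthogonal to $\vec{u}_1$, $\vec{u}_2$ and $\vec{u}_3$.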
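The closest vector of Example 14.31 can be computed exactly without square roots. The sketch below relies on the observation that, for an orthogonal basis v_1, …, v_n of W and e_i = v_i/‖v_i‖, the term ⟨v, e_i⟩e_i equals (⟨v, v_i⟩/⟨v_i, v_i⟩)v_i; the function names are illustrative, and the canonical scalar product on ℝ⁴ is assumed.

```python
from fractions import Fraction

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def gram_schmidt(vectors):
    """Orthogonalize linearly independent vectors (canonical scalar product)."""
    ortho = []
    for x in vectors:
        x = [Fraction(c) for c in x]
        v = list(x)
        for b in ortho:
            coeff = dot(x, b) / dot(b, b)
            v = [vi - coeff * bi for vi, bi in zip(v, b)]
        ortho.append(v)
    return ortho

def closest_vector(v, spanning):
    """Vector in span(spanning) closest to v, i.e. the orthogonal projection."""
    v = [Fraction(c) for c in v]
    w = [Fraction(0)] * len(v)
    for b in gram_schmidt(spanning):
        # <v, e> e with e = b/||b|| equals (<v, b>/<b, b>) b, so no square roots are needed
        coeff = dot(v, b) / dot(b, b)
        w = [wi + coeff * bi for wi, bi in zip(w, b)]
    return w

U = [(1, 0, 0, 2), (2, 1, 1, 0), (0, 2, 3, 0)]
v = (0, 1, 0, 1)
w = closest_vector(v, U)
print(w)   # [Fraction(26, 213), Fraction(19, 71), Fraction(104, 213), Fraction(200, 213)]

# The residual v - w must be orthogonal to every generator of W.
residual = [vi - wi for vi, wi in zip(v, w)]
print([dot(residual, u) for u in U])   # [Fraction(0, 1), Fraction(0, 1), Fraction(0, 1)]
```

The orthogonality of the residual to all generators is what certifies that w is the orthogonal projection.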

14.4 Polynomial approximation

We now give a very nice application of the closest vector theorem to a problem in analysis. Given an arbitrary continuous function $f$ on $[a, b]$, the Weierstrass approximation theorem says that $f(x)$ can be uniformly approximated by polynomials. An important problem in analysis is to determine, given a continuous function $f(x)$ on $[a, b]$, the best polynomial approximation for it. One solution to this problem is called least squares approximation.

Let $f(x)$ be an arbitrary continuous function on $[a, b]$. Then the least squares approximation of degree $n$ for $f(x)$ is the polynomial $P_n(x)$ of degree $n$ or less that minimizes
\[ \int_a^b |f(x) - P_n(x)|^2\,dx. \]

The least squares approximation can be found easily using the closest vector theorem. Let $V$ be the Euclidean vector space of the continuous functions $f \colon [a, b] \to \mathbb{R}$, $a, b \in \mathbb{R}$, $a < b$, with the scalar product
\[ \langle f, g \rangle = \int_a^b f(t)g(t)\,dt. \]
Let $n \in \mathbb{N}$ and $f \in V$ be given. Let $W = \mathbb{R}_n[x]$ be the subspace of $V$ given by the polynomials of degree less than or equal to $n$. This has a basis $1, x, x^2, \ldots, x^n$. For given $f(x) \in V$ the least squares approximation of degree $n$ is the element of $W$ closest to $f(x)$ with respect to the scalar product $\langle\ ,\ \rangle$ in $V$. We can find this polynomial with the following algorithm.

Step 1: Starting from the basis $1, x, x^2, \ldots, x^n$ and using the GSO, find an orthogonal basis $\{p_0(x), p_1(x), \ldots, p_n(x)\}$ of $\mathbb{R}_n[x]$.

Step 2: By normalization of the orthogonal basis we get an orthonormal basis $\{\varphi_0(x), \varphi_1(x), \ldots, \varphi_n(x)\}$ of $\mathbb{R}_n[x]$. For this we have in particular
\[ \int_a^b \varphi_i(t)\varphi_j(t)\,dt = \delta_{ij}, \quad \text{where} \quad \delta_{ij} = \begin{cases} 1, & \text{for } i = j, \\ 0, & \text{for } i \neq j \end{cases} \]
is the Kronecker symbol.

Step 3: Find $c_0, c_1, \ldots, c_n \in \mathbb{R}$ with $P_n(x) = c_0\varphi_0(x) + c_1\varphi_1(x) + \cdots + c_n\varphi_n(x)$ and
\[ c_i = \int_a^b f(t)\varphi_i(t)\,dt \quad \text{for } i = 0, 1, \ldots, n. \]

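Example 14.32. Let $n = 2$ and $f(x) = e^x$, $x \in [0, 1]$.

Step 1: For $\{1, x, x^2\}$ we find an orthogonal basis. Let $\vec{u}_1 = 1$, $\vec{u}_2 = x$ and $\vec{u}_3 = x^2$. With the Gram–Schmidt orthogonalization procedure we get
\[ \vec{v}_1 = 1 \quad \text{and} \quad \vec{v}_2 = \vec{u}_2 - \frac{\langle \vec{u}_2, \vec{v}_1 \rangle}{\langle \vec{v}_1, \vec{v}_1 \rangle} \vec{v}_1. \]
We have
\[ \langle \vec{u}_2, \vec{v}_1 \rangle = \int_0^1 t\,dt = \frac{1}{2} \quad \text{and} \quad \langle \vec{v}_1, \vec{v}_1 \rangle = \int_0^1 dt = 1. \]
Hence, $\vec{v}_2 = x - \frac{1}{2}$. Using this we get
\[ \vec{v}_3 = \vec{u}_3 - \frac{\langle \vec{u}_3, \vec{v}_2 \rangle}{\langle \vec{v}_2, \vec{v}_2 \rangle} \vec{v}_2 - \frac{\langle \vec{u}_3, \vec{v}_1 \rangle}{\langle \vec{v}_1, \vec{v}_1 \rangle} \vec{v}_1. \]
Now,
\begin{align*}
\langle \vec{u}_3, \vec{v}_2 \rangle &= \int_0^1 \Bigl(t^3 - \frac{t^2}{2}\Bigr)dt = \frac{1}{12}, \\
\langle \vec{u}_3, \vec{v}_1 \rangle &= \int_0^1 t^2\,dt = \frac{1}{3}, \quad \text{and} \\
\langle \vec{v}_2, \vec{v}_2 \rangle &= \int_0^1 \Bigl(t - \frac{1}{2}\Bigr)^2 dt = \frac{1}{12}.
\end{align*}
We get therefore $\vec{v}_3 = x^2 - x + \frac{1}{6}$.

Step 2: We normalize the Gram–Schmidt orthogonal basis. With $\|\vec{v}_1\| = 1$ we have $\varphi_0(x) = 1$; with $\|\vec{v}_2\| = \frac{1}{\sqrt{12}}$ we have $\varphi_1(x) = \sqrt{12}\,(x - \frac{1}{2})$; with $\|\vec{v}_3\| = \frac{1}{\sqrt{180}}$ we have $\varphi_2(x) = \sqrt{180}\,(x^2 - x + \frac{1}{6})$.

Step 3: We calculate the $\langle e^x, \varphi_i(x) \rangle$ for $i = 0, 1, 2$. We have
\begin{align*}
\langle e^x, \varphi_0(x) \rangle &= \int_0^1 e^t\,dt = 1.718\ldots, \\
\langle e^x, \varphi_1(x) \rangle &= \int_0^1 e^t \Bigl(\sqrt{12}\,\Bigl(t - \frac{1}{2}\Bigr)\Bigr)dt = 0.4880\ldots, \\
\langle e^x, \varphi_2(x) \rangle &= \int_0^1 e^t \Bigl(\sqrt{180}\,\Bigl(t^2 - t + \frac{1}{6}\Bigr)\Bigr)dt = 0.0625\ldots.
\end{align*}
For the best approximation we get
\[ P_2(x) = \langle e^x, \varphi_0(x) \rangle\varphi_0(x) + \langle e^x, \varphi_1(x) \rangle\varphi_1(x) + \langle e^x, \varphi_2(x) \rangle\varphi_2(x), \]
hence,
\[ P_2(x) \approx 1.718 + 0.4880\,\varphi_1(x) + 0.0625\,\varphi_2(x). \]

Notation. The Gram–Schmidt orthogonal set which belongs to $\{1, x, x^2, \ldots\}$ on $[0, 1]$ is called the set of the Legendre polynomials on $[0, 1]$.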
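The three steps of the algorithm for f(x) = e^x on [0, 1] can be sketched in code. The sketch makes two assumptions that are not part of the text: polynomials are stored as coefficient lists (lowest degree first), and the integrals of t^k e^t over [0, 1] are obtained from the integration-by-parts recursion m_0 = e − 1, m_k = e − k·m_{k−1}, which is numerically adequate only for small k. All function names are illustrative.

```python
import math
from fractions import Fraction

def poly_dot(p, q):
    """<p, q> = integral over [0, 1] of p(t) q(t) dt for coefficient lists (exact)."""
    return sum(pi * qj * Fraction(1, i + j + 1)
               for i, pi in enumerate(p) for j, qj in enumerate(q))

def orthogonal_polys(n):
    """Gram-Schmidt orthogonalization of 1, x, ..., x^n on [0, 1]."""
    basis = []
    for k in range(n + 1):
        x = [Fraction(0)] * k + [Fraction(1)]          # the monomial x^k
        v = list(x)
        for b in basis:
            coeff = poly_dot(x, b) / poly_dot(b, b)
            padded = b + [Fraction(0)] * (len(v) - len(b))
            v = [vi - coeff * bi for vi, bi in zip(v, padded)]
        basis.append(v)
    return basis

def exp_moments(n):
    """m_k = integral over [0, 1] of t^k e^t dt, via m_k = e - k * m_(k-1)."""
    m = [math.e - 1.0]
    for k in range(1, n + 1):
        m.append(math.e - k * m[-1])
    return m

def fourier_coefficients_exp(n):
    """The coefficients <e^x, phi_i> relative to the normalized basis."""
    m = exp_moments(n)
    return [sum(float(c) * m[k] for k, c in enumerate(v)) / math.sqrt(poly_dot(v, v))
            for v in orthogonal_polys(n)]

def least_squares_exp(n):
    """Monomial coefficients of the least squares approximation P_n of e^x."""
    m = exp_moments(n)
    result = [0.0] * (n + 1)
    for v in orthogonal_polys(n):
        fv = sum(float(c) * m[k] for k, c in enumerate(v))   # <e^x, v>
        scale = fv / float(poly_dot(v, v))                    # <e^x, v> / <v, v>
        for k, c in enumerate(v):
            result[k] += scale * float(c)
    return result

print(orthogonal_polys(2))          # coefficient lists of 1, x - 1/2, x^2 - x + 1/6
print(fourier_coefficients_exp(2))  # approximately [1.718, 0.4880, 0.0625]
print(least_squares_exp(2))         # approximately [1.0130, 0.8511, 0.8392]
```

Expanded in powers of x, the result reads P_2(x) ≈ 1.0130 + 0.8511x + 0.8392x², which is the same polynomial as the combination of φ_0, φ_1, φ_2 given in the example.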

14.5 Secret sharing scheme using the closest vector theorem

As a second application we give an alternative secret sharing scheme using the closest vector theorem. Recall that a secret sharing scheme among a group of n participants is a very important cryptographic protocol. The general idea is the following. We have n people who have access to a secret S in such a way that the secret can be recovered if any t of the access group with t ≤ n get together. Given such a secret S, an (n, t)-secret sharing (threshold) scheme is a cryptographic primitive in which a secret is split into pieces (shares) and distributed among a collection of n participants {p1 , p2 , … , pn } so that any group of t, t ≤ n, or more participants can recover the secret. Meanwhile, any group of t − 1 or fewer participants cannot recover the secret. By sharing a secret in this way the availability and reliability issues can be solved.

A geometric alternative to Shamir’s secret sharing scheme (see Section 6.6.3) uses the Closest Vector Theorem, see Theorem 14.29. We now explain this scheme. We start with a Euclidean vector space V of dimension m and an access control group of size n. We assume that the dimension m of V is much greater than n, that is, m ≫ n. Within V there is a hidden subspace W of dimension t < n. The secret to be shared is a vector v ∈ W in this hidden subspace. The dealer distributes to each of the n members of the access control group, i = 1, 2, … , n, two vectors, vi and w, where vi ∈ W , and w is a vector in the big space V . The common vector w has the property that w ∉ W and v is the vector in W closest to w. In general the vector w can be given publicly. The set {v1 , v2 , … , vn } has the property that any subset of size t is linearly independent. Hence any subset of size t determines a basis for W .

Suppose t valid users get together. They can determine a basis for W and hence, using the Gram–Schmidt orthogonalization procedure (see Theorem 14.26), an orthonormal basis. Since w is given, they can determine v by the Closest Vector Theorem and recover the secret.

Given a subset of size less than t, the given vectors generate a subspace of W of dimension less than t, and hence in V there are infinitely many extensions to subspaces of dimension t. This implies that determining W with fewer than t elements of a basis has negligible probability. This is a general method like the Shamir protocol. In [4], Chum, Fine, Moldenhauer, Rosenberger and Zhang compared several different secret sharing protocols, including the classic Shamir secret sharing scheme and the secret sharing scheme using the Closest Vector Theorem explained above.
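The mechanics of the scheme just described can be sketched with a toy instance. Everything concrete here is an assumption made for illustration: the space is ℚ⁵ with the canonical scalar product (m = 5), the threshold is t = 2 with n = 3 participants, and the basis vectors `B1`, `B2`, the offset `Z` and the share coefficients are hypothetical values picked by hand. The sketch shows only how shares and the public vector w fit together; it says nothing about the security of a real deployment.

```python
from fractions import Fraction
from itertools import combinations

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def gram_schmidt(vectors):
    ortho = []
    for x in vectors:
        x = [Fraction(c) for c in x]
        v = list(x)
        for b in ortho:
            coeff = dot(x, b) / dot(b, b)
            v = [vi - coeff * bi for vi, bi in zip(v, b)]
        ortho.append(v)
    return ortho

def closest_vector(v, spanning):
    """Orthogonal projection of v onto span(spanning) (Closest Vector Theorem)."""
    v = [Fraction(c) for c in v]
    w = [Fraction(0)] * len(v)
    for b in gram_schmidt(spanning):
        coeff = dot(v, b) / dot(b, b)
        w = [wi + coeff * bi for wi, bi in zip(w, b)]
    return w

# Dealer side (hypothetical toy data): hidden subspace W = span(B1, B2) in Q^5.
B1 = (1, 0, 2, 0, 1)
B2 = (0, 1, 1, 3, 0)

def combo(a, b):
    return [a * x + b * y for x, y in zip(B1, B2)]

SECRET = combo(4, -5)                        # the secret v, a vector in W
SHARES = [combo(1, k) for k in (1, 2, 3)]    # any two of these are linearly independent
Z = (-2, -1, 1, 0, 0)                        # orthogonal to B1 and B2, hence to all of W
PUBLIC_W = [s + z for s, z in zip(SECRET, Z)]   # public vector w = v + z, not in W

def recover(shares):
    """Participants' side: project the public vector onto the span of their shares."""
    return closest_vector(PUBLIC_W, shares)

for pair in combinations(SHARES, 2):
    print(recover(list(pair)) == SECRET)     # True for every pair of shares
print(recover([SHARES[0]]) == SECRET)        # False: one share spans only a line in W
```

Because w − v is orthogonal to W, projecting w onto W returns v for any t shares that span W, which is exactly the recovery step of the scheme.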

Exercises

1. Let $\vec{v} = (1, 2, 3)$ and $\vec{w} = (6, 5, 4)$ be two vectors in $\mathbb{R}^3$. Calculate the distance between $\vec{v}$ and $\vec{w}$.
2. Let $\vec{v}_1 = (3, 0, 40)$ and $\vec{v}_2 = (1, 1, 1)$ be two vectors in $\mathbb{R}^3$. Calculate an orthonormal basis for the subspace in $\mathbb{R}^3$ which is generated by $\vec{v}_1$ and $\vec{v}_2$.
3. Consider the three vectors $\vec{x} = (0, 1, 1)$, $\vec{y} = (1, 1, 0)$, $\vec{z} = (1, 0, 1)$ in $\mathbb{R}^3$.
   (a) Calculate the distance between the two vectors $\vec{x}$ and $\vec{y}$.
   (b) Determine an orthonormal basis for the subspace $U \subset \mathbb{R}^3$ generated by $\vec{x}$ and $\vec{z}$.
   (c) Calculate the closest vector in $U$ to $\vec{y}$.
4. Let $V$ be a Euclidean vector space and $\vec{v}, \vec{w} \in V$. Show
   \[ \langle \vec{v}, \vec{w} \rangle = \frac{1}{2}\bigl(\|\vec{v}\|^2 + \|\vec{w}\|^2 - \|\vec{v} - \vec{w}\|^2\bigr). \]
5. Given are two straight lines $g_1$ and $g_2$, with
   \[ g_1 \colon \vec{x} = (1, 1) + \mathbb{R}(1, 2) \quad \text{and} \quad g_2 \colon \vec{y} = (3, 1) + \mathbb{R}(2, 1). \]
   Calculate the angle under which the lines intersect.
6. Show that the four sides of a (non-degenerate) parallelogram are equal if its two diagonals intersect perpendicularly. (Hint: Without loss of generality consider the situation shown in Figure 14.15. Here $\vec{a} = \vec{v} + \vec{w}$, $\vec{b} = -\vec{v} + \vec{w}$.)

Figure 14.15: (Non-degenerate) parallelogram with equal sides.

7. (a) Show that the set of vectors in $\mathbb{R}^4$ of the form $\{(x, 2x, y, x + y)\}$ forms a subspace and determine an orthogonal basis for it.
   (b) Find the dimension of the subspace of $\mathbb{R}^4$ spanned by $\vec{v}_1 = (3, 6, 3, 0)$, $\vec{v}_2 = (4, 2, 1, 1)$, $\vec{v}_3 = (2, 0, 2, -2)$ and give a basis for it. Then give a general form for a vector in this subspace. Find the vector in this subspace closest to $(1, 1, 0, 0)$.
8. Find the vector in the subspace $U$ spanned by $\vec{u}_1 = (1, 1, -1, 1)$, $\vec{u}_2 = (3, 2, -1, 0)$ in $\mathbb{R}^4$ closest to $\vec{v} = (0, 7, 4, 7)$.
9. (a) Determine an orthonormal basis for the subspace $V$ of $\mathbb{R}^3$ spanned by
   \[ \vec{u}_1 = (1, 2, 3), \quad \vec{u}_2 = (2, 3, 1), \quad \vec{u}_3 = (1, 3, 2). \]
   (b) Use the results from (a) to find the vector in $V$ closest to $(1, 5, 1)$.

Bibliography

[1] G. Baumslag, B. Fine, M. Kreuzer, and G. Rosenberger. A Course in Mathematical Cryptography. De Gruyter, 2015.
[2] W. Borho, J. C. Jantzen, H.-P. Kraft, J. Rohlfs, and D. Zagier. Lebendige Zahlen. Birkhäuser, 1981.
[3] C. Carstensen, B. Fine, and G. Rosenberger. Abstract Algebra. De Gruyter, 2011.
[4] C. S. Chum, B. Fine, A. I. S. Moldenhauer, G. Rosenberger, and X. Zhang. On secret sharing protocols. Contemporary Mathematics, 677:51–78, 2016.
[5] M. Dörfer and G. Rosenberger. Zeta functions of finitely generated nilpotent groups. Groups Korea’94, Eds.: Kim/Johnson, Walter de Gruyter, pages 35–46, 1995.
[6] R. A. Dunlop. The Golden Ratio and Fibonacci Numbers. World Scientific, 1999.
[7] B. Fine. The Algebraic Theory of Bianchi Groups. Marcel Dekker, 1989.
[8] B. Fine and G. Rosenberger. The Fundamental Theorem of Algebra. Springer, 1997.
[9] B. Fine and G. Rosenberger. Algebraic Generalizations of Discrete Groups. Marcel Dekker, 2001.
[10] B. Fine and G. Rosenberger. Number Theory: An Introduction via the Density of Primes. Birkhäuser, 2nd edition, 2016.
[11] D. Gorenstein, R. Lyons, and R. Solomon. The Classification of the Finite Simple Groups. Mathematical Surveys and Monographs, Volumes 40.1–40.6, American Mathematical Society, 1994–2005.
[12] M. D. Greenberg. Advanced Engineering Mathematics. Prentice Hall, Englewood Cliffs, 1988.
[13] F. Grunewald, D. Segal, and G. Smith. Subgroups of finite index in nilpotent groups. Inventiones Mathematicae, 9:185–223, 1988.
[14] B. Hornfeck. Algebra. De Gruyter, 3rd edition, 1976.
[15] H. L. Keng. Introduction to Number Theory. Springer-Verlag, 1982.
[16] M. Kreuzer and G. Rosenberger. Growth in Hecke groups. Contemporary Mathematics, 629:261–281, 2014.
[17] A. Mann. How Groups Grow. London Mathematical Society, Lecture Note Series 395, 2011.
[18] G. Müller. Elementare Zahlentheorie. Arithmetik als Prozess, Klett-Verlag, pages 255–290, 2004.
[19] I. Niven, H. S. Zuckerman, and H. L. Montgomery. The Theory of Numbers. Wiley, 5th edition, 1991.
[20] P. Ribenboim. The Book of Primes Number Records. Springer, 1989.
[21] S. Singh. Fermat’s Last Theorem. HarperPress, 2012.
[22] M. J. Wiener. Cryptoanalysis of Short RSA Secret Exponents. IEEE Transaction on Information Theory, 36(3):553–558, 1990.

DOI 10.1515/9783110516142-015

Index Abelian group 2 Absolute value 15, 193 Abundant number 76 Algebraic 131 Algebraic closure 227 Algebraic extension 131 Algebraic integer 235 Algebraic number 131, 227, 228 Algebraic number field 227 Algebraically closed 203 Alternating groups An 149 Altitude theorem 314 Amicable number 76 Archimedean 182 Argument 198 Associates 99 Axioms 1 Binomial coefficient 257 Binomial formula 284 Canonical scalar product 304 Cardano’s formulas 212 Cathetus theorem 314 Cauchy complete 165 Cauchy completion 157, 166, 179 Cauchy sequence 157, 195 Cauchy–Schwarz-Inequality 305 Cauchy’s Inequality 177 Characteristic 125 Chinese Remainder Theorem 47 Closest vector theorem 324 Commutative ring with unity 12, 95 Complete normed field 179 Complex conjugate 193 Complex integers 201 Complex numbers 189 Complex plane 196 Composite 99 Composite number 20 Congruence class 39 Congruence modulo n 39 Conjugate 90 Conjugate elements in groups 147 Conjugates 228 Constructible numbers 291 Construction of a regular n-gon 298

Continued fraction 175 Continuum hypothesis 175 Convergent sequence 157 Coprime 25 Coset 149 Cosine rule 310 Cosine rule for ℝ3 316 Countable 173 Course of values induction 9 Cycle multiplication 146 Cyclic group 45 Decimal fraction 169 Decimal numbers 168, 169 Deficient number 76 Degree 98 Discriminant 231, 240 Division algorithm 23 Division ring 220 Divisor 19, 99 Donkey Bridge in ℝ2 312 Doubling the cube 296 Elementary symmetric polynomial 152 Equivalent norms 180 Euclidean algorithm 26 Euclidean domain 202 Euclidean quadratic number field 245 Euclidean vector space (or scalar product space) 303 Euler φ-function 43 Euler’s Formula 200 Euler’s identity 199 Extension by radicals 210 Extension field 126 Extreme to mean ratio 67 Factor 19 Fermat numbers 78 Fermat prime number 78 Fermat’s Big Theorem 87 Fermat’s Little Theorem 42 Fermat’s two-square theorem 89 Ferrari’s formula 209 Fibonacci Numbers 61 Field 14, 95 Field extension 126


Field of p-adic numbers 184 Finite extension 128, 131 Finitely generated 131 First induction principle 4 Fourier coefficients 283, 319 Fourier Convergence Theorem 283 Fourier series 283 Fundamental Theorem for Polynomials 115 Fundamental Theorem of Arithmetic 19

Least squares approximation 322 Least upper bound property 166 Least well-ordering property (LWO) 8 Left transversale 149 Legendre polynomials 323 Length 305 Linear combination 19 Liouville-Number 273 Lub property 166

g-nary numbers 169 Galois theory 209 Gaussian integers 201 Gauss’s Lemma 106 Geometric construction 289 Geometric construction problems 289 Golden angle 63 Golden ratio 67 Golden rectangle 67 Golden section 67 Gram–Schmidt Orthogonalization Procedure (GSO) 319 Greatest common divisor (gcd) 24, 113 Group 2 Group of units 99

Mathematical induction 4 Mersenne number 71 Mersenne prime 71 Minimal polynomial 132 Modular group 90 Modulus 198 Monic polynomial 98 Monoid 2 Multiple 99 Multiple zero 135

Hamiltonian skew field 220 Homomorphism 149 Horner-Scheme 109 Ideal 33, 99 Imaginary quadratic field 241 Indeterminate 97 Index 149 Induction 2 Infinite descent 87 Inner product space 303 Integers 1 Integral basis 238 Integral domain 13, 33, 40, 95 Interpolating polynomial 117 Irrational number 173 Irreducible polynomial 100 Isomorphism 127, 149 Lagrange interpolation 118 Law of Cosines 311 Leading coefficient 98 Least common multiple (lcm) 30

n-th root 138 Natural numbers 1 Nested intervals property 168 Non-Archimedean 182 Norm in an algebraic number field 232 Norm on a field 179 Normal subgroup 149 Normed field 179 Order isomorphic 166 Ordered field 161 Orthocenter of a triangle in ℝ2 314 Orthogonal 309 Orthogonal complement 309 Orthogonal set 318 Orthonormal basis 318 Orthonormal set 318 p-adic norm 183 p-adic numbers 184 p-adic valuation 183 Peano Axioms 1 Pell’s equation 244 Perfect numbers 72 Permutation group 142 Polar form 198 Polynomial 97 Polynomial Approximation 321


Polynomial interpolation 117 Primality test 20 Prime 19 Prime divisor 19 Prime element 99 Prime factor 19 Prime field 125 Prime number 19 Primitive element 131, 228 Primitive element theorem 137 Primitive integral polynomial 235 Primitive polynomial 106 Primitive root 45 Principal ideal 99 Principal ideal domain (PID) 33, 113 Principle of mathematical induction 4 Projection 311 Proper divisor 99 Pythagorean triple 83 ℚp 184 Quadratic fields 240 Quadratic integers 241 Quadratic nonresidue modulo n 54 Quadratic residue 89 Quadratic residue modulo n 54 Quaternions 220 Quotient 23

Second principle of induction 9 Secret sharing scheme 117, 323 Secret sharing threshold scheme 117 Semigroup 2 Shamir’s secret sharing scheme 121 Share distribution 121 Sieve of Eratosthenes 20 Simple extension 131 Simple extension by radicals 210 Simple group 149 Simple zero 135 Skew field 220 Sociable number 77 Solvable by radicals 210 Splitting field 125, 135 Squaring the circle 296 Stabilizer 142 Standard prime decomposition 22 Stirling’s approximation 264 Strong triangle inequality 182 Subfield 125 Subring 33 Sum of squares 89 Supremum 166 Symmetric function 151 Symmetric group 141, 142 Symmetric polynomial 151, 152

Ramified 246, 247 Rational Root Theorem 105 Real number system 157 Real numbers 166 Real quadratic field 241 Reducible polynomial 100 Relatively prime 25 Remainder 23 Repeating decimal 172 Residue 39 Residue class 39 Residue class ring 40 Ring 11, 95 Ring of integers modulo n 40 Ring of polynomials 98 Roots of unity 206 RSA Cryptosystem 46

Terminating decimal 172 Theorem 1 Theorem of Abel 216 Theorem of Cayley 143 Theorem of Dirichlet 176 Theorem of Frobenius 224 Theorem of Hall 149 Theorem of Kronecker 130 Theorem of Ostrowski 185 Theorem of Pythagoras 310 Theorem of Thales in ℝ2 312 Trace 233 Transcendental extension 131 Transcendental number 227, 228 Transposition 146 Transversal 149 Triangular number 5 Trisection of an angle 297

Sn 142 Scalar product 303

Ultra-metric 182 Uncountable 173


Unique factorization domain (UFD) 116, 202 Unit 43, 99 Unit group 99

Wallis Product Formula 266 Wilson’s theorem 41

Vandermonde determinant 118, 232 Vandermonde interpolation 120 Vandermonde matrix 118

ℤn 40 Zero of a polynomial 104 Zeta function 150