Math 706, Theory of Numbers, Kansas State University, Spring 2019 [version 21 Nov 2018 ed.]

  • Downloaded from https://www.math.ksu.edu/~cochrane/m706/m706s19/m706s19notes.pdf



Math 706, Theory of Numbers
Kansas State University
Spring 2019
Todd Cochrane

Department of Mathematics
Kansas State University

Contents

Notation

Chapter 1. Axioms for the set of Integers Z
1.1. Ring Properties of Z
1.2. Order Properties of Z
1.3. Discreteness Axioms
1.4. Additional Properties of Z

Chapter 2. Divisibility and Unique Factorization
2.1. Divisibility and Greatest Common Divisors
2.2. Division Algorithms
2.3. Euclidean Algorithm
2.4. Euclidean Domains
2.5. Linear Combinations and GCDLC Theorem
2.6. Solving the equation ax + by = d, with d = (a, b)
2.7. The linear equation ax + by = c
2.8. Primes and Euclid's Lemma
2.9. Unique Factorization in Z
2.10. Properties of GCDs and LCMs
2.11. Units, Primes and Irreducibles
2.12. UFDs, PIDs and Euclidean Domains
2.13. Gaussian Integers
2.14. The Set of Primes

Chapter 3. Modular Arithmetic
3.1. Basic properties of congruences
3.2. The ring of integers (mod m), Z_m
3.3. Congruences in general rings
3.4. Multiplicative inverses and Cancelation Laws
3.5. The Group of units (mod m) and the Euler phi-function
3.6. A few results from Group Theory
3.7. Fermat's Little Theorem, Euler's Theorem and Wilson's Theorem
3.8. Chinese Remainder Theorem
3.9. Group of units modulo a prime, G(p)
3.10. Group of units G(p^e)
3.11. Group of units G(m) for arbitrary m

Chapter 4. Polynomial Congruences
4.1. Linear Congruences
4.2. Power Congruences, x^n ≡ a (mod m)
4.3. A general quadratic congruence
4.4. General Polynomial Congruences: Lifting Solutions
4.5. Counting Solutions of Polynomial Congruences

Chapter 5. Quadratic Residues and Quadratic Reciprocity
5.1. Introduction
5.2. Properties of the Legendre Symbol
5.3. Proof of the Law of Quadratic Reciprocity
5.4. The Jacobi Symbol
5.5. Local solvability implies global solvability
5.6. Sums of two Squares

Chapter 6. Primality Testing, Mersenne Primes and Fermat Primes
6.1. Basic Primality Test
6.2. Pseudoprimes and Carmichael Numbers
6.3. Mersenne Primes and Fermat Primes

Chapter 7. Arithmetic Functions
7.1. Properties of Greatest Integer Function and Binomial Coefficients
7.2. The Divisor function and Sigma function
7.3. Multiplicative Function
7.4. Perfect Numbers
7.5. The Möbius Function
7.6. Estimating Arithmetic Sums
7.7. Möbius Inversion Formula
7.8. Estimates for τ(n), σ(n) and φ(n)

Chapter 8. Recurrence Sequences
8.1. The Fibonacci Sequence
8.2. Second order linear recurrences
8.3. A Matrix view of the Fibonacci Sequence
8.4. Congruence and Divisibility Properties of the Fibonacci Sequence
8.5. Periodicity of the Fibonacci sequence (mod m)
8.6. Further Properties of the Fibonacci Sequence

Chapter 9. Diophantine Equations
9.1. Preliminaries
9.2. Systems of Linear Equations
9.3. Pythagorean Triples
9.4. Rational Points on Conics
9.5. The Equations x^4 + y^4 = z^2 and x^2 + 4y^4 = z^4
9.6. Cubic Curves

Chapter 10. Elliptic Curves
10.1. Definition of an Elliptic Curve
10.2. Addition of Points on an Elliptic Curve
10.3. The Projective Plane
10.4. Elliptic curves in the projective plane
10.5. The Elliptic Curve as an abelian Group
10.6. The Pollard (p − 1)-method of Factorization
10.7. Elliptic Curve Method of Factorization

Chapter 11. Prime Number Theory
11.1. Euler-Maclaurin Summation Formula and Estimating Factorials
11.2. Chebyshev Estimate for π(x)
11.3. Bertrand's Postulate
11.4. The von Mangoldt function and the ψ function
11.5. The sum of reciprocal primes

Chapter 12. Binary Quadratic Forms
12.1. Matrix representation of quadratic form
12.2. Equivalent Forms and Reduced Forms
12.3. Representation by Positive Definite Binary Quadratic Forms
12.4. Class Number
12.5. Congruence test for Representation
12.6. Tree diagram of Values Represented by a Binary Quadratic Form

Chapter 13. Geometry of Numbers
13.1. Lattices and Bases
13.2. Discrete Subgroups of R^n
13.3. Minkowski's Fundamental Theorem
13.4. Canonical Basis Theorem and Sublattices
13.5. Lagrange's 4-squares Theorem
13.6. Sums of Three Squares
13.7. The Legendre Equation
13.8. The Catalan Equation

Chapter 14. Best Rational Approximations and Continued Fractions
14.1. Approximating real numbers by rationals
14.2. Continued Fractions
14.3. Convergents to Continued Fractions
14.4. Infinite Continued Fraction Expansions
14.5. Best Rational Approximations to Irrationals
14.6. Hurwitz's Theorem
14.7. The set of all best rational approximations
14.8. Quadratic Irrationals and Periodic Continued Fractions
14.9. Pell Equations
14.10. Liouville's Theorem

Chapter 15. Dirichlet Series
15.1. Definition and Convergence of a Dirichlet series
15.2. Important examples of Dirichlet Series
15.3. Another Proof of the Möbius Inversion Formula
15.4. Product Formula for Dirichlet Series
15.5. Analytic properties of Dirichlet series
15.6. The Riemann Zeta Function and the Riemann Hypothesis
15.7. More on the zeta function

Appendix A. Preliminaries

Appendix B. Proof of Additional Properties of Z

Appendix C. Discreteness Axioms for Z
C.1. Equivalence of the Discreteness Axioms
C.2. Proof of Additional Discreteness Properties
C.3. Proof by Induction

Appendix D. Review of Groups, Rings and Fields
D.1. Definition of a Ring
D.2. Basic properties of Rings
D.3. Units and Zero Divisors
D.4. Integral Domains and Fields
D.5. Polynomial Rings
D.6. Ring homomorphisms and Ideals
D.7. Group Theory
D.8. Lagrange's Theorem
D.9. Normal Subgroups and Group Homomorphisms

Bibliography

Notation

N = {1, 2, 3, 4, 5, ...} = Natural numbers
Z = {0, ±1, ±2, ±3, ...} = Integers
E = {0, ±2, ±4, ±6, ...} = Even integers
O = {±1, ±3, ±5, ...} = Odd integers
Q = {a/b : a, b ∈ Z, b ≠ 0} = Rational numbers
R = Real numbers
C = Complex numbers
Z[i] = {a + bi : a, b ∈ Z} = Gaussian Integers
Z_m = Ring of integers mod m
[a]_m = {a + mx : x ∈ Z} = Residue class of a mod m
U_m = Multiplicative group of units mod m
a^(−1) (mod m) = "multiplicative inverse of a (mod m)"
φ(m) = Euler phi-function
(a, b) = gcd(a, b) = greatest common divisor of a and b
[a, b] = lcm[a, b] = least common multiple of a and b
a|b = "a divides b"
M_{2,2}(R) = Ring of 2 × 2 matrices over a given ring R
R[x] = Ring of polynomials over R
|S| = order or cardinality of a set S
S_n = n-th symmetric group
log(x) = natural logarithm of x

∩ intersection
∪ union
∅ empty set
⊆ subset
∈ element of
∃ there exists
∃! there exists a unique
∀ for all
⇒ implies
⇔ equivalent to
iff if and only if
≡ congruent to

CHAPTER 1

Axioms for the set of Integers Z

1.1. Ring Properties of Z

We shall assume the following properties as axioms for the set of integers.

1.1.1. Addition Properties. There is a binary operation + on Z, called addition, satisfying
a) Addition is well defined, that is, given any two integers a, b, a + b is a uniquely defined integer.
b) Substitution Law for addition: If a = b and c = d then a + c = b + d.
c) The set of integers is closed under addition. For any a, b ∈ Z, a + b ∈ Z.
d) Addition is commutative. For any a, b ∈ Z, a + b = b + a.
e) Addition is associative. For any a, b, c ∈ Z, (a + b) + c = a + (b + c).
f) There is a zero element 0 ∈ Z (also called the additive identity), satisfying 0 + a = a = a + 0 for any a ∈ Z.
g) For any a ∈ Z, there exists an additive inverse −a ∈ Z satisfying a + (−a) = 0 = (−a) + a.

Note 1.1.1. Properties a), b), and c) above are implicit in the definition of a binary operation on Z.

Definition 1.1.1. Subtraction is defined by a − b = a + (−b) for a, b ∈ Z.

1.1.2. Multiplication Properties. There is an operation · (or ×) on Z called multiplication, satisfying
a) Multiplication is well defined, that is, given any two integers a, b, a · b is a uniquely defined integer.
b) Substitution Law for multiplication: If a = b and c = d then ac = bd.
c) Z is closed under multiplication. For any a, b ∈ Z, a · b ∈ Z.
d) Multiplication is commutative. For any a, b ∈ Z, ab = ba.
e) Multiplication is associative. For any a, b, c ∈ Z, (ab)c = a(bc).
f) There is an identity element 1 ∈ Z satisfying 1 · a = a = a · 1 for any a ∈ Z.

1.1.3. Distributive law. This is the one property that combines both addition and multiplication. For any a, b, c ∈ Z, a(b + c) = ab + ac. One can deduce (from the given axioms) the additional distributive laws (a + b)c = ac + bc, a(b − c) = ab − ac and (a − b)c = ac − bc.

1.2. Order Properties of Z

1.2.1. Trichotomy Principle. The set of integers can be partitioned into a union of three disjoint sets, Z = −N ∪ {0} ∪ N, where N is called the set of positive integers or natural numbers, and −N := {−x : x ∈ N} is the set of negative integers.


The inequalities > (greater than) and < (less than) are defined as follows: a > b if a − b ∈ N; a < b if a − b ∈ −N. Thus the Trichotomy Principle is equivalent to the Law of Trichotomy, which states that for any two integers a, b exactly one of the following holds: a < b, a = b or a > b (that is, a − b ∈ −N, a − b = 0 or a − b ∈ N).

1.2.2. Positivity Axioms.
a) The sum of two positive integers is a positive integer.
b) The product of two positive integers is a positive integer.

An important consequence of the second positivity axiom is that 1 is a positive integer. Indeed, if 1 were negative, then by trichotomy −1 would be positive, and so (−1)(−1) would be positive by the positivity axiom. But, by one of the properties of negatives (see below), (−1)(−1) = 1, implying that 1 is positive, a contradiction.

1.3. Discreteness Axioms.

The following four axioms are equivalent, that is, in defining Z we may start with any one of these axioms, and then deduce the others from that one. See Appendix C for a proof and further discussion.
a) Well Ordering Property of N. Any nonempty subset of N has a smallest element.
b) Axiom of Induction. Let S be a subset of N such that (i) 1 ∈ S and (ii) n ∈ S ⇒ n + 1 ∈ S. Then S = N.
c) Maximum Element Principle. Any nonempty subset of Z that is bounded above has a largest element. (Recall, a set S is bounded above if there exists an integer M such that for all x ∈ S, x ≤ M.)
d) Minimum Element Principle. Any nonempty subset of Z that is bounded below has a minimum element. (Recall, a set S is bounded below if there exists an integer M such that for all x ∈ S, x ≥ M.)

Other important consequences of the discreteness axioms are the following.
1) Minimality of 1. 1 is the smallest positive integer, that is, there is no integer between 0 and 1. This simple fact turns out to be a powerful tool in many proofs in number theory.
2) Natural Numbers are sums of 1's. Every positive integer is a (finite) sum of 1's. That is, N = {1, 2, 3, ...}, where as usual, 2 := 1 + 1, 3 := 1 + 1 + 1, and so on.
3) Strong Form of Induction. Let S be a subset of N such that (i) 1 ∈ S and (ii) if {1, 2, ..., n} ⊆ S then n + 1 ∈ S. Then S = N.

See Appendix C.3 for examples of induction proofs.


1.4. Additional Properties of Z.

The properties below can all be deduced from the axioms above. You may freely use them in your homework for this class. See Appendix B for proofs.
1] Subtraction-Equality principle: x = y if and only if x − y = 0.
2] Cancelation law for addition: If a + x = a + y then x = y.
3] Additive inverses are unique, that is, if a, b, c are integers such that a + b = 0 and a + c = 0 then b = c.
4] Zero multiplication property: a · 0 = 0 for any a ∈ Z.
5] Properties of negatives: −(−a) = a, (−a)b = −(ab) = a(−b), (−a)(−b) = ab, (−1)a = −a.
6] Basic consequence of Trichotomy: If a > 0 then −a < 0 and if a < 0 then −a > 0.
7] Products of Positives and Negatives: If a > 0 and b < 0 then ab < 0. If a < 0 and b < 0, then ab > 0.
8] Zero divisor property, or integral domain property: If ab = 0 then a = 0 or b = 0.
9] Cancelation law for multiplication: If ax = ay and a ≠ 0 then x = y.
10] General Associative-Commutative Law:
a) Addition: When adding a collection of n integers a_1 + a_2 + ··· + a_n, the numbers may be grouped in any way and added in any order. In particular, the sum a_1 + a_2 + ··· + a_n is well defined, that is, no parentheses are necessary to specify the order of operations.
b) Multiplication: When multiplying a collection of n integers a_1 a_2 ··· a_n, the numbers may be grouped in any way and multiplied in any order. In particular, the product a_1 a_2 ··· a_n is well defined, that is, no parentheses are necessary to specify the order of operations.
11] General Distributive Laws such as (a + b)(c + d) = ac + ad + bc + bd, or
$\left(\sum_{i=1}^{n} a_i\right)\left(\sum_{j=1}^{m} b_j\right) = \sum_{i=1}^{n}\sum_{j=1}^{m} a_i b_j$.
12] Binomial Expansion Formula: For any integers a, b and positive integer n we have
$(a + b)^n = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k} = a^n + \binom{n}{1} a^{n-1} b + \binom{n}{2} a^{n-2} b^2 + \cdots + b^n$.
In particular,
(a + b)^2 = a^2 + 2ab + b^2,
(a + b)^3 = a^3 + 3a^2 b + 3ab^2 + b^3.
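Properties 11] and 12] are easy to spot-check numerically. The following is only a sketch of such a check, not a proof; the function names `binomial_expansion` and `double_sum` are illustrative choices, not notation from these notes.

```python
from math import comb


def binomial_expansion(a: int, b: int, n: int) -> int:
    """Right-hand side of Property 12: sum over k of C(n, k) * a^k * b^(n-k)."""
    return sum(comb(n, k) * a**k * b**(n - k) for k in range(n + 1))


def double_sum(xs, ys) -> int:
    """Right-hand side of Property 11: sum of x*y over all pairs (x, y)."""
    return sum(x * y for x in xs for y in ys)


if __name__ == "__main__":
    # Compare both sides of each identity for a few sample values.
    for a, b, n in [(2, 3, 4), (2, -3, 5), (-7, 1, 3)]:
        assert binomial_expansion(a, b, n) == (a + b) ** n
    xs, ys = [1, 2, 3], [4, 5]
    assert sum(xs) * sum(ys) == double_sum(xs, ys)
```

A check like this only confirms the identities for the sampled values; the proofs are the ones referred to in Appendix B and Appendix C.3.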

CHAPTER 2

Divisibility and Unique Factorization

2.1. Divisibility and Greatest Common Divisors

Definition 2.1.1. Let a, b ∈ Z, a ≠ 0. We say that a divides b and write a|b if b = ax for some integer x. In this case, we also say that a is a divisor of b, b is divisible by a and that b is a multiple of a.

Note 2.1.1.
i) The divisors of an integer can be positive or negative. Thus the divisors of 6 are 1, −1, 2, −2, 3, −3, 6, −6.
ii) Every nonzero integer is a divisor of 0. Why?
iii) Since 0 · 0 = 0, it is reasonable to say that 0 divides 0, but we rule this language out for technical reasons. For instance, we want to be able to say that if a divides b then b/a is an integer. It is for this reason that we put a ≠ 0 in the definition of divisibility.

Theorem 2.1.1. Sum and difference properties for divisibility.
i) For any a, b, d ∈ Z, if d|a and d|b then d|(a + b) and d|(a − b).
ii) For any a, b, d, x, y ∈ Z, if d|a and d|b then d|(ax + by).

Proof. i) is just a special case of ii), putting x = y = 1 and x = 1, y = −1, respectively. Thus it suffices to prove ii).
ii) Let x, y ∈ Z and suppose that d|a and d|b. Then a = du, b = dv for some u, v ∈ Z. Thus ax + by = (du)x + (dv)y = d(ux) + d(vy) = d(ux + vy), and so d|(ax + by) since ux + vy ∈ Z. □

Theorem 2.1.2. Transitive property of divisibility. If d|a and a|b then d|b.

Proof. Suppose that d|a and a|b. Then a = dx, b = ay, for some x, y ∈ Z. Thus, b = ay = (dx)y = d(xy), and so d|b since xy ∈ Z. □

Definition 2.1.2.
i) Let a, b ∈ Z, not both zero. The greatest common divisor of a and b, written (a, b) or gcd(a, b), is the largest positive integer dividing both a and b.
ii) If (a, b) = 1 we say that a and b are relatively prime.

Note 2.1.2.
i) (0, 0) is not defined. Why?
ii) If d|b and b ≠ 0 then |d| ≤ |b|.
Proof. We may assume that b and d are positive. Say b = dq for some q ∈ Z. Note, q ≥ 1 since b > 0. Thus b − d = dq − d = d(q − 1) ≥ 0 and so b ≥ d. □
iii) (0, m) = |m| for any nonzero integer m. Why?
iv) If a, b are not both 0 then (a, b) is defined.
Proof. Let S be the set of common divisors of a and b. Plainly 1 ∈ S (and so S is nonempty) and, by note ii), S is bounded above by |b| (assuming, as we may by symmetry, that b ≠ 0). Thus, by the maximum element principle, S has a maximum element. □
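Definitions 2.1.1 and 2.1.2 can be turned directly into a brute-force computation. The sketch below follows the definitions literally (it is deliberately inefficient); the function names `divisors` and `gcd_by_definition` are illustrative, not from the notes.

```python
def divisors(n: int) -> list:
    """All divisors, positive and negative, of a nonzero integer n (Definition 2.1.1)."""
    if n == 0:
        # Every nonzero integer divides 0, so there is no finite list to return.
        raise ValueError("0 has infinitely many divisors")
    positive = [d for d in range(1, abs(n) + 1) if n % d == 0]
    return sorted(positive + [-d for d in positive])


def gcd_by_definition(a: int, b: int) -> int:
    """Largest positive integer dividing both a and b (Definition 2.1.2)."""
    if a == 0 and b == 0:
        raise ValueError("(0, 0) is not defined")
    # By Note 2.1.2 ii), a common divisor cannot exceed the larger absolute value.
    bound = max(abs(a), abs(b))
    return max(d for d in range(1, bound + 1) if a % d == 0 and b % d == 0)


if __name__ == "__main__":
    print(divisors(6))               # the eight divisors listed in Note 2.1.1 i)
    print(gcd_by_definition(0, -5))  # illustrates Note 2.1.2 iii)
```

The search over all candidates up to the bound mirrors the existence proof in Note 2.1.2 iv): the set of common divisors is nonempty and bounded above, so it has a maximum.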


Lemma 2.1.1. The GCD Invariance Property. For any integers a, b, q with a and b not both zero, we have (a − qb, b) = (a, b).

Proof. Let S be the set of common divisors of a and b, and T the set of common divisors of a − qb and b. We claim that S = T, and so S and T have the same maximal element, that is, gcd(a, b) = gcd(a − qb, b). To show S = T it suffices to show S ⊆ T and T ⊆ S. Suppose that d ∈ S, that is, d|a and d|b. Then by the sum and difference property of divisibility, d|(a − qb). Therefore, d ∈ T. Similarly, if d ∈ T, that is, d|b and d|(a − qb), then d|[(a − qb) + qb], that is, d|a. Therefore, d ∈ S. □

Note 2.1.3.
i) The concept of gcd extends easily to any number of integers: For a_1, ..., a_k ∈ Z, not all zero, we define (a_1, ..., a_k) to be the greatest common divisor of a_1, ..., a_k.
ii) We have a generalization of the gcd invariance property: For any integer q and any index i ≠ 1, (a_1, ..., a_k) = (a_1 − q a_i, a_2, ..., a_k).
iii) The concept of divisibility can be defined identically as above for any commutative ring. For noncommutative rings, one can define left and right divisibility, but we shall not pursue this here.

2.2. Division Algorithms

Theorem 2.2.1. The Division Algorithm. For any integers a, b with b ≠ 0 there exist unique integers q, r such that a = qb + r and 0 ≤ r < |b|. (Here, q is called the quotient and r the remainder in dividing a by b.)

Proof. We'll prove the case where b > 0 and leave b < 0 as an exercise for the reader.
Existence: If a = 0 we take q = r = 0. Suppose that a ≠ 0. Let S = {x ∈ Z : xb ≤ a}. The set S is nonempty, since −|a| ∈ S, and since b > 0, S is bounded above by |a|. Thus it has a maximal element, say q, and so qb ≤ a < (q + 1)b. Put r = a − qb. Then 0 ≤ r < b and a = qb + r.
Uniqueness: Suppose that q_1 b + r_1 = q_2 b + r_2 with 0 ≤ r_1, r_2 < b. Then |q_1 − q_2| b = |r_2 − r_1| < b, and so |q_1 − q_2| < 1. Thus, since there is no integer between 0 and 1, q_1 − q_2 = 0. Therefore q_1 = q_2 and consequently r_1 = r_2. □

Theorem 2.2.2. Minimal Remainder Division Algorithm. Let a, b be integers with b > 0. Then there exist integers q, r such that a = qb + r with |r| ≤ b/2.

Proof. Start with the ordinary division algorithm to produce q′, r′ ∈ Z with a = q′b + r′, 0 ≤ r′ < b. If r′ ≤ b/2 we are done, that is, we can take q = q′, r = r′. Assume next that r′ > b/2. Then a = (q′ + 1)b + r′ − b and |r′ − b| < b/2. Thus we take q := q′ + 1 and r := r′ − b. □

Example 2.2.1. Consider 29 ÷ 5. Using the ordinary division algorithm we would write 29 = 5 · 5 + 4. Using the minimal remainder division algorithm we would write 29 = 6 · 5 − 1.

2.3. Euclidean Algorithm

The Euclidean Algorithm provides a simple procedure for calculating the greatest common divisor of any two integers. There are two versions of it, the traditional algorithm in which a positive remainder is used at each step, and the Fast Euclidean Algorithm in which the least remainder is chosen at each step.

The Traditional Euclidean Algorithm. Let a ≥ b > 0 be positive integers. Then, by the Division Algorithm and GCD Invariance Property, Lemma 2.1.1, we have

a = b q_1 + r_1,                          0 ≤ r_1 < b,                (a, b) = (r_1, b)
b = r_1 q_2 + r_2,                        0 ≤ r_2 < r_1,              (a, b) = (r_1, r_2)
r_1 = r_2 q_3 + r_3,                      0 ≤ r_3 < r_2,              (a, b) = (r_3, r_2)
...
r_{k−3} = r_{k−2} q_{k−1} + r_{k−1},      0 ≤ r_{k−1} < r_{k−2},      (a, b) = (r_{k−1}, r_{k−2})
r_{k−2} = r_{k−1} q_k,                                                (a, b) = r_{k−1}.
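The two versions of the algorithm can be sketched in Python as follows. This is a minimal illustration under the stated assumption a ≥ b > 0; the function names `least_remainder_divmod`, `euclid_traditional` and `euclid_fast` are hypothetical names chosen here, and each function also returns the number of divisions performed so that the two versions can be compared.

```python
def least_remainder_divmod(a: int, b: int) -> tuple:
    """Return (q, r) with a = q*b + r and |r| <= b/2, for b > 0 (Theorem 2.2.2)."""
    q, r = divmod(a, b)      # ordinary division: 0 <= r < b
    if 2 * r > b:            # remainder exceeds b/2, so use r - b instead
        q, r = q + 1, r - b
    return q, r


def euclid_traditional(a: int, b: int) -> tuple:
    """Return (gcd, number of divisions) using nonnegative remainders."""
    steps = 0
    while b != 0:
        a, b = b, a % b      # (a, b) = (b, r) by the GCD Invariance Property
        steps += 1
    return a, steps


def euclid_fast(a: int, b: int) -> tuple:
    """Return (gcd, number of divisions) choosing the least remainder each step."""
    steps = 0
    while b != 0:
        _, r = least_remainder_divmod(a, abs(b))
        a, b = abs(b), r     # the remainder r may be negative
        steps += 1
    return abs(a), steps


if __name__ == "__main__":
    print(least_remainder_divmod(29, 5))   # the division of Example 2.2.1
    print(euclid_traditional(126, 49))
    print(euclid_fast(126, 49))
```

Taking absolute values in `euclid_fast` is harmless because a number and its negative have the same divisors, so the gcd is unchanged.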

Since r_1 > r_2 > ··· > r_{k−1} is a strictly decreasing sequence of positive integers, we are guaranteed that this process will stop in a finite number of steps.

Note 2.3.1.
i) In the Fast Euclidean Algorithm, one chooses the least remainder at each step (allowing for positive or negative remainders), and thus we would have |r_j| ≤ |r_{j−1}|/2 at each step. Thus |r_1| ≤ b/2, |r_2| ≤ b/4, and (by induction) |r_j| ≤ b/2^j for any j. The algorithm stops if |r_j| < 1. It follows that the number of steps k is at most log_2 b + 1. (The extra +1 is needed for trivial cases such as (3, 2).) Thus the algorithm is extremely efficient for calculating gcds of large numbers.
ii) In the Traditional Euclidean Algorithm, it can be shown that the running time is slowest for calculating gcds of consecutive Fibonacci numbers. Indeed, the calculation of (F_n, F_{n+1}) requires n − 1 steps. (Ex. (F_6, F_7) = (8, 13) = (8, 5) = (3, 5) = (3, 2) = (1, 2) = (1, 0).) It follows that the number of steps for calculating (a, b) for any integers a, b with a ≥ b > 0 is at most log_ϕ b + 1, where ϕ = (1 + √5)/2 = 1.618..., the golden ratio. (The +1 cannot be dropped: the example (8, 13) takes 5 steps, while log_ϕ 8 ≈ 4.32.)

2.4. Euclidean Domains

The Euclidean algorithm can be applied to many integral domains other than the integers. Such integral domains are called Euclidean domains.

Definition 2.4.1. Let D be an integral domain and D* the set of nonzero elements in D. Then D is called a Euclidean domain if there exists a mapping δ : D* → N ∪ {0} such that
i) δ(a) ≤ δ(ab) for all nonzero a, b ∈ D.
ii) For any a, b ∈ D, b ≠ 0, there exist elements q, r ∈ D with a = qb + r, and r = 0 or δ(r) < δ(b).

We note that there are many slight variations in the definition of a Euclidean domain in the literature, but we will not concern ourselves with this subtlety here.

Note 2.4.1.
i) In any Euclidean domain the concept of divisibility and greatest common divisor can be defined as for the case of Z, the only difference being in the meaning of the word "greatest". Here, "greatest" means with respect to the mapping δ.
ii) By property i) it follows that if a is nonzero and d|a then δ(d) ≤ δ(a). Thus if a|b and b|a, that is, a and b differ by a unit multiple, then δ(a) = δ(b).


iii) The greatest common divisor of two elements of a Euclidean domain, not both zero, is defined, but is not unique. Indeed, by note ii) and the fact that any common divisor of two elements a, b is a divisor of the greatest common divisor (see next section), it follows that (a, b) is unique up to unit multiples. Thus in Z we could say ±3 is the greatest common divisor of 6, 9 in this context, but we will maintain the convention that for Z, greatest common divisors are always positive.
iv) The Euclidean algorithm can be applied as in the previous section to find the greatest common divisor of any two nonzero elements of a Euclidean domain. In this case instead of r_1 > r_2 > ··· > r_{k−1} we would have δ(r_1) > δ(r_2) > ··· > δ(r_{k−1}).

Theorem 2.4.1. The set of integers Z is a Euclidean domain with respect to the mapping δ(n) = |n|.

Proof. We trivially have |a| ≤ |ab| for all nonzero integers a, b, and the Division Algorithm, Theorem 2.2.1, gives us property ii) in the definition. □

Another standard example of a Euclidean domain is any polynomial ring over a field. We will be particularly interested in this result for the case of the finite fields Z_p, with p a prime.

Theorem 2.4.2. Division Algorithm for Polynomials. Let F be a field and f(x), g(x) ∈ F[x] with g(x) ≠ 0. Then there exist polynomials q(x), r(x) over F such that f(x) = q(x)g(x) + r(x), with r(x) = 0 or deg(r(x)) < deg(g(x)).

The polynomial q(x) is called the quotient and r(x) the remainder. A proof is provided in the appendix.

Corollary 2.4.1. Any polynomial ring over a field is a Euclidean domain with respect to the mapping δ(f(x)) = deg(f(x)).

Proof. It is easy to prove that for any nonzero polynomials f(x), g(x) over any integral domain, deg(f(x)g(x)) = deg(f(x)) + deg(g(x)), and thus property i) holds. The division algorithm of the preceding theorem yields property ii). □

Note 2.4.2. The units in a polynomial ring over a field are just the nonzero constant polynomials. Thus the greatest common divisor of two polynomials is unique up to constant multiples. In this case one can adopt the convention of taking the gcd to be a monic polynomial, that is, a polynomial with leading coefficient 1.

Another important example of a Euclidean domain that we shall occasionally make reference to is the ring of Gaussian integers in C, Z[i] = {a + bi : a, b ∈ Z}. We leave the following as an exercise for the reader.

Homework 2.4.1. Prove that the ring of Gaussian integers is a Euclidean domain with respect to the mapping δ(a + bi) = a^2 + b^2 = |a + bi|^2.
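To make Theorem 2.4.2 concrete for the finite fields Z_p, here is a minimal sketch of polynomial long division with coefficients reduced mod p. The representation (a list of coefficients, lowest degree first, with the empty list standing for the zero polynomial) and the names `trim` and `poly_divmod` are assumptions made for this illustration; the code also assumes that p is prime, so that the leading coefficient of g(x) is invertible.

```python
def trim(f, p):
    """Reduce coefficients mod p and drop trailing zeros; [] is the zero polynomial."""
    f = [c % p for c in f]
    while f and f[-1] == 0:
        f.pop()
    return f


def poly_divmod(f, g, p):
    """Return (q, r) with f = q*g + r in Z_p[x] and r = 0 or deg r < deg g."""
    f, g = trim(f, p), trim(g, p)
    if not g:
        raise ZeroDivisionError("g(x) must be nonzero")
    inv_lead = pow(g[-1], -1, p)            # inverse of the leading coefficient mod p
    q = [0] * max(len(f) - len(g) + 1, 0)
    r = f[:]
    while len(r) >= len(g):
        shift = len(r) - len(g)
        c = (r[-1] * inv_lead) % p          # cancels the leading term of r
        q[shift] = c
        for i, gc in enumerate(g):
            r[i + shift] = (r[i + shift] - c * gc) % p
        r = trim(r, p)                      # the degree of r drops at every pass
    return trim(q, p), r


if __name__ == "__main__":
    # Divide x^3 + 2x + 1 by x + 3 over Z_5.
    print(poly_divmod([1, 2, 0, 1], [3, 1], 5))
```

The loop terminates because the degree of the running remainder strictly decreases, which is the same reason the Euclidean algorithm terminates with δ = deg. The three-argument `pow` with exponent −1 requires Python 3.8 or later.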


2.5. Linear Combinations and GCDLC Theorem

Definition 2.5.1. A linear combination of two integers a, b (with integer coefficients) is an integer of the form ax + by with x, y ∈ Z. Such a combination is also called an integral linear combination.

Theorem 2.5.1. Greatest Common Divisor Linear Combination (GCDLC) Theorem. For any integers a, b, not both zero, the greatest common divisor d of a, b can be expressed as a linear combination of a, b with integer coefficients. In particular, d is the smallest positive linear combination of a, b.

This theorem is also referred to as Bézout's Lemma. A constructive proof of the theorem can be given by following the Euclidean Algorithm together with the method of back substitution as we illustrate in Example 2.6.1. We will present here a non-constructive proof based on the following lemma.

Lemma 2.5.1. Every additive subgroup of Z is of the form (d) := {dx : x ∈ Z}, for some nonnegative integer d.

Proof. Let H be an additive subgroup of Z. If H = (0), then we simply take d = 0. Otherwise H contains some nonzero element h. By taking the additive inverse of h if necessary, we see that H ∩ N is nonempty. Thus, by the Well Ordering Axiom, H ∩ N has a minimum element d. We claim that H = (d). Certainly, (d) ⊆ H since H is closed under addition and subtraction. Next, let's show that H ⊆ (d). Let h ∈ H. Then h = qd + r for some integers q, r with 0 ≤ r < d. Now r = h − qd ∈ H, and thus by the minimality of d we must have r = 0. Consequently h = qd ∈ (d). □

Proof of Theorem 2.5.1. Let H = (a) + (b) = {ax + by : x, y ∈ Z}, the set of all linear combinations of a and b. Plainly H is an additive subgroup of Z, since it is closed under subtraction. Thus, by the preceding lemma, H = (d) for some nonnegative integer d, and since a, b are not both zero, we must have d > 0, and d is the smallest positive linear combination of a and b. Say d = ax_0 + by_0 for some x_0, y_0 ∈ Z. We claim that d = (a, b). Since a, b ∈ H = (d) we have d|a and d|b. Next, suppose that e is any common divisor of a, b. Then e is also a divisor of the linear combination ax_0 + by_0, that is, e|d, and therefore e ≤ d. □
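The statement that d is the smallest positive linear combination can be checked by brute force for small inputs. The sketch below searches only a finite box of coefficients, so it is an experiment rather than a proof; the function name `smallest_positive_combination` and the parameter `bound` are hypothetical, and the search assumes `bound` is at least max(|a|, |b|) so that the box contains a pair of coefficients realizing the gcd.

```python
from math import gcd


def smallest_positive_combination(a: int, b: int, bound: int = 50) -> int:
    """Smallest positive value of a*x + b*y over all |x|, |y| <= bound."""
    values = [a * x + b * y
              for x in range(-bound, bound + 1)
              for y in range(-bound, bound + 1)]
    # Raises ValueError when a = b = 0, the case excluded by Theorem 2.5.1.
    return min(v for v in values if v > 0)


if __name__ == "__main__":
    for a, b in [(126, 49), (83, 17), (0, -5)]:
        assert smallest_positive_combination(a, b, bound=130) == gcd(a, b)
```

Here `math.gcd` is used only as an independent reference value for the comparison.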

2.6. Solving the equation ax + by = d, with d = (a, b)

The Euclidean algorithm provides us with an algorithm for solving the equation ax + by = d, where d = (a, b). We shall present two variations of the algorithm, the first being the method of Back Substitution and the second the Array Method.

Back Substitution: We start by finding (a, b) using either the traditional or fast Euclidean algorithm, and then work our way backwards through the equations to find x and y.


2. DIVISIBILITY AND UNIQUE FACTORIZATION

Example 2.6.1. Find d = gcd(126, 49) and express it as a linear combination of 49 and 126.

(1)   126 = 2 · 49 + 28,    d = gcd(28, 49)
(2)   49 = 28 + 21,         d = gcd(28, 21)
(3)   28 = 21 + 7,          d = gcd(7, 21)
(4)   21 = 3 · 7,           d = gcd(7, 0) = 7, STOP

Next we use back substitution. Start with equation (3): 7 = 28 − 21. By (2) we have 21 = 49 − 28. Substituting this into the previous equation yields 7 = 28 − (49 − 28) = 2 · 28 − 49. By (1) we have 28 = 126 − 2 · 49. Substituting this into the previous equation yields 7 = 2 · (126 − 2 · 49) − 49 = 2 · 126 − 5 · 49.

Array Method.

Example 2.6.2. We shall redo the previous example using the array method. To begin, set up an array with the first three columns initialized as shown below. For a given choice of x and y the linear combination 126x + 49y is given in the first row. Now, perform the Euclidean Algorithm on the numbers in the top row, but do the corresponding column operations on the entire array. Let C_1 be the column with top entry 126, C_2 the column with top entry 49, etc. The first step in the Euclidean algorithm is to subtract 2 times 49 from 126, so we let the next column C_3 be given by C_3 = C_1 − 2C_2. Then C_4 = C_2 − C_3, C_5 = C_3 − C_4.

126x + 49y | 126   49   28   21    7
x          |   1    0    1   −1    2
y          |   0    1   −2    3   −5

Thus, 7 = 2 · 126 − 5 · 49.

Example 2.6.3. Use the array method to find gcd(83, 17) and express it as a linear combination of 83 and 17.

83x + 17y |  83   17   15    2     1
x         |   1    0    1   −1     8
y         |   0    1   −4    5   −39

Thus (83, 17) = 1 and 1 = 8 · 83 − 39 · 17.

Note 2.6.1. From a programming point of view, the array method is more efficient than the method of back substitution. In particular, there is no need to store the values q_i, r_i in memory as would be required for the method of back substitution.

Homework 2.6.1. Use the array method to find x, y such that 423x + 198y = (423, 198).

Note 2.6.2. The GCDLC theorem generalizes to more than two integers: For any integers a_1, ..., a_k not all zero, there exist integers x_1, x_2, ..., x_k such that a_1 x_1 + · · · + a_k x_k = (a_1, ..., a_k), and the GCD is the smallest positive such linear combination of a_1, ..., a_k.

Homework 2.6.2. i) Use the array method to find gcd(90, 126, 210), and express it as a linear combination of 90, 126 and 210. ii) Use the array method to find gcd(30, 42, 105), and express it as a linear combination of 30, 42 and 105.
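The array method of Examples 2.6.2 and 2.6.3 translates almost directly into code. The sketch below is one possible rendering, assuming nonnegative inputs that are not both zero; the name array_method is chosen here for illustration. As Note 2.6.1 suggests, only the two most recent columns are kept in memory.

```python
def array_method(a, b):
    """Return (d, x, y) with a*x + b*y == d == gcd(a, b).

    Assumes a, b >= 0 and not both zero. Each column is a triple
    (value, x, y) satisfying value == a*x + b*y.
    """
    prev = (a, 1, 0)  # column C1
    cur = (b, 0, 1)   # column C2
    while cur[0] != 0:
        q = prev[0] // cur[0]
        # Next column = prev - q * cur, applied to the entire column.
        prev, cur = cur, tuple(p - q * c for p, c in zip(prev, cur))
    return prev


print(array_method(126, 49))
print(array_method(83, 17))
```

The loop stops when the top entry reaches 0, so the column just before it carries the gcd together with its coefficients.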


2.7. The linear equation ax + by = c

Consider the linear equation

(2.1)    ax + by = c,

where a, b, c ∈ Z, and the companion homogeneous equation

(2.2)    ax + by = 0.

Lemma 2.7.1. An integer pair (x, y) is a solution of (2.2) if and only if
    (x, y) = λ(−b/d, a/d)
for some λ ∈ Z, where d = gcd(a, b).

Proof. It is trivial to check that any point of the form λ(−b/d, a/d) is a solution of (2.2). Conversely, if (x, y) is a solution of (2.2), then ax = −by, so (a/d)x = −(b/d)y. Since (a/d, b/d) = 1, Euclid’s Lemma gives (a/d) | y. Say y = λ(a/d) for some λ ∈ Z. Then we also have x = −(b/a)y = −λ(b/d). □

Lemma 2.7.2. Let a, b, c ∈ Z, d = gcd(a, b). Then (2.1) has an integer solution if and only if d|c.

Proof. If (2.1) has an integer solution then c = ax + by for some x, y ∈ Z. Since d|a and d|b it follows that d|c. Conversely, suppose that c = dk for some k ∈ Z. By Theorem 2.5.1, we have d = ax_0 + by_0 for some x_0, y_0 ∈ Z. Thus c = dk = a(x_0 k) + b(y_0 k) and so (2.1) is solvable. □

Suppose now that (x_0, y_0) is any particular solution of (2.1). By linearity, it follows that every solution of (2.1) is of the form (x, y) = (x_0, y_0) + (x_1, y_1), where (x_1, y_1) is a solution of (2.2). Thus we obtain the following theorem.

Theorem 2.7.1. Let d = gcd(a, b). Then (2.1) is solvable if and only if d|c, in which case the general solution of (2.1) is given by
    (x, y) = (x_0, y_0) + λ(−b/d, a/d),
where λ ∈ Z and (x_0, y_0) is any particular solution of (2.1).

Geometric interpretation: Solving (2.1) is equivalent to finding all integer points on the line ax + by = c. The theorem tells us that if there exists an integer point on the line, then all integer points are obtained by starting at a fixed integer point (x_0, y_0) on the line and adding integer multiples of the direction vector (−b/d, a/d).

Homework 2.7.1. Assume that a, b and c are all positive. Suppose that we wish to find integer points on the line in the first quadrant (x > 0, y > 0). Show that if ba ≤ cd then there exists at least one solution in the quadrant, and that the total number of solutions in the first quadrant is [cd/ab] or [cd/ab] + 1.

Homework 2.7.2. Baseball schedule. Say we have two leagues with 7 teams each. Each team plays each team in the other league y games and each team in their own league x games. If there are 162 games in the season, find the best choice for x and y, that is, the “optimal” integer solution of the equation 6x + 7y = 162 with x, y both positive.
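Theorem 2.7.1 can be turned into a short routine that returns one particular solution together with the direction vector (−b/d, a/d). This is a minimal sketch under the assumption that a, b are nonnegative and not both zero; the names extended_gcd and solve_linear are illustrative only, and the sample equation 126x + 49y = 14 is chosen here, not taken from the notes.

```python
def extended_gcd(a, b):
    """Return (d, x, y) with a*x + b*y == d == gcd(a, b); assumes a, b >= 0, not both 0."""
    r0, x0, y0 = a, 1, 0
    r1, x1, y1 = b, 0, 1
    while r1 != 0:
        q = r0 // r1
        r0, x0, y0, r1, x1, y1 = r1, x1, y1, r0 - q * r1, x0 - q * x1, y0 - q * y1
    return r0, x0, y0


def solve_linear(a, b, c):
    """Solve a*x + b*y == c over the integers.

    Returns None if d = gcd(a, b) does not divide c. Otherwise returns
    ((x0, y0), (dx, dy)), meaning every solution is (x0 + t*dx, y0 + t*dy), t in Z.
    """
    d, x, y = extended_gcd(a, b)
    if c % d != 0:
        return None
    k = c // d
    return (x * k, y * k), (-(b // d), a // d)


print(solve_linear(126, 49, 14))
print(solve_linear(126, 49, 10))  # 7 does not divide 10, so no solution
```

Scaling the Bézout coefficients by c/d follows the proof of Lemma 2.7.2, and the second component of the result is the solution of the homogeneous equation from Lemma 2.7.1 with λ = 1.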


2.8. Primes and Euclid’s Lemma

Definition 2.8.1. i) A positive integer p > 1 is called a prime if its only positive factors are 1 and itself. ii) A positive integer n > 1 is called a composite if it is not a prime, that is, n = ab for some positive integers a, b with a > 1 and b > 1.

Note 2.8.1. 1 is not a prime or a composite. There are a couple of reasons why 1 is not called a prime. The most important reason is that if 1 were a prime then we would not have unique factorization. For example, 6 = 2 · 3 = 1 · 2 · 3 = 1 · 1 · 2 · 3, etc. would all be different factorizations of 6. Another reason is that 1 has just a single positive factor, whereas every prime has two distinct positive factors.

Lemma 2.8.1. Euclid’s Lemma. If a|bc and (a, b) = 1 then a|c.

Proof. By the GCDLC theorem, since (a, b) = 1, ax + by = 1 for some x, y ∈ Z. Thus c = c(ax + by) = cax + cby = c(ax) + y(bc). Since a|ax and a|bc it follows that a | c(ax) + y(bc), that is, a|c. □

In general it is a false statement to say that if a|bc then a|b or a|c, but for the case of prime divisors a, the statement is true.

Lemma 2.8.2. a) Let p be a prime such that p|ab. Then p|a or p|b. b) Let p be a prime such that p | a_1 a_2 · · · a_n where the a_i are integers. Then p | a_i for some i.

Proof. a) Suppose that p|ab. If p|a we are done. Otherwise p ∤ a. But in this case gcd(p, a) = 1 because the only positive divisors of p are 1 and p, and only 1 is a common divisor of both p and a (since p ∤ a). Thus, by Euclid’s lemma we must have p|b.

b) We prove part b) by induction on n, the base case n = 1 being trivial. Suppose the statement is true for a given n, and now consider the case n + 1. Suppose that p | a_1 · · · a_n a_{n+1}. Then p | (a_1 · · · a_n) a_{n+1}. Viewing the latter quantity as a product of two integers, we see by the case n = 2 proven above, that either p | a_1 · · · a_n or p | a_{n+1}. In the former case we have p | a_i for some i ≤ n by the induction hypothesis. Thus, in both cases p | a_i for some i. □

2.9. Unique Factorization in Z

Theorem 2.9.1. Fundamental Theorem of Arithmetic, FTA. Any positive integer n > 1 can be expressed as a product of primes, and this expression is unique up to the order of the primes.

Note 2.9.1. i) 12 = 2 · 2 · 3 = 2 · 3 · 2 = 3 · 2 · 2 are all considered the same factorization. ii) We say that a prime p has a trivial factorization as a product of primes.

Proof. Existence. The proof is by the strong form of induction. Let P(n) be the statement that n has a factorization as a product of primes. P(2) is trivially true since 2 is a prime. Suppose now that P(k) is true for all values of k smaller than a given n and consider P(n). If n is prime we are done. Otherwise n = ab for some integers a, b with 1 < a < n, 1 < b < n. By the induction assumption, a and b can be expressed as products of primes, say a = p_1 · · · p_k, b = q_1 · · · q_ℓ. Then n = ab = p_1 · · · p_k q_1 · · · q_ℓ, a product of primes. QED


Uniqueness. Suppose that n is a positive integer with two representations as a product of primes, say,

(2.3)    n = p_1 · · · p_k = q_1 · · · q_r

for some primes p_i, q_j, 1 ≤ i ≤ k, 1 ≤ j ≤ r. We may assume WLOG (without loss of generality) that k ≤ r. Then p_1 | q_1 · · · q_r, so by the preceding lemma, p_1 | q_{i_1} for some i_1 ∈ {1, 2, ..., r}. Since p_1 and q_{i_1} are primes, we must have p_1 = q_{i_1}. Canceling p_1 in (2.3) yields

(2.4)    p_2 p_3 · · · p_k = q_1 · · · q̂_{i_1} · · · q_r,

where q̂_{i_1} indicates that this factor has been removed. We can then repeat the argument with p_2 in place of p_1, and conclude that p_2 = q_{i_2} for some i_2 ≠ i_1. After repeating this process k times we have that

(2.5)    p_1 = q_{i_1}, p_2 = q_{i_2}, ..., p_k = q_{i_k}

for some distinct integers i_1, i_2, ..., i_k ∈ {1, 2, ..., r}. Moreover, after canceling each of the p_i from (2.3) we are left with 1 on the left-hand side. If r > k then (2.3) would say that 1 is a product of primes, a contradiction. Therefore r = k, and so by (2.5), the primes p_i are just a permutation of the primes q_i. □

Definition 2.9.1. Suppose that p is a prime. We write p^e ∥ a if p^e | a and p^{e+1} ∤ a. The exponent e is called the multiplicity of p dividing a. (This value is well defined by unique factorization.)

The Fundamental Theorem of Arithmetic can be restated as follows:

Theorem 2.9.2. Every positive integer n > 1 can be uniquely expressed as a product of powers of distinct primes, n = p_1^{e_1} p_2^{e_2} · · · p_k^{e_k}. (Here, e_i is the multiplicity of p_i dividing n.)

2.10. Properties of GCDs and LCMs

Lemma 2.10.1. Let a, b, c ∈ Z with a, b not both zero. i) Every common divisor of a, b divides (a, b). ii) If a|c, b|c and (a, b) = 1, then ab|c.

Proof. i) If e|a and e|b then for any x, y ∈ Z, e | (ax + by). In particular, by the GCDLC theorem, e | (a, b). ii) We’ll leave as homework. □

Definition 2.10.1. The least common multiple (LCM) of two nonzero integers a, b, denoted [a, b], is the smallest positive integer divisible by both a and b.

Note 2.10.1. [a, b] exists and is unique for any nonzero a, b.

Theorem 2.10.1. Let a, b be nonzero integers. Then i) [a, b](a, b) = |ab|. ii) Every common multiple of a, b is a multiple of [a, b].

Proof. We prove both parts simultaneously. Set R = ab/(a, b). Note R = a · (b/(a, b)) = (a/(a, b)) · b ∈ Z, and is a common multiple of a and b. Suppose now that m is any common multiple of a and b, say m = as, m = bt for some s, t ∈ Z. Now, by the GCDLC theorem, ax + by = (a, b) for some x, y ∈ Z. Thus, m(a, b) = max + mby = ab(tx + sy) and so m = R(tx + sy), a multiple of R. In particular |m| ≥ |R| whenever m ≠ 0. Thus we see that |R| is the least common multiple of a and b, and that every other common multiple is a multiple of |R|. □


A second proof of the theorem can be given using prime power factorizations. First we note the following.

Lemma 2.10.2. Suppose that p is a prime and that d, a ∈ Z with d|a. If p^e ∥ d and p^f ∥ a, then e ≤ f.

Proof. Since p^e | d and d|a we have p^e | a. Therefore e ≤ f, by definition of f. □

One readily deduces from this lemma the following theorem.

Theorem 2.10.2. Suppose that a, b are positive integers with factorizations a = p_1^{e_1} · · · p_k^{e_k}, b = p_1^{f_1} · · · p_k^{f_k} (allowing zero exponents if necessary). Then

    (a, b) = p_1^{min(e_1, f_1)} · · · p_k^{min(e_k, f_k)},

and

    [a, b] = p_1^{max(e_1, f_1)} · · · p_k^{max(e_k, f_k)}.
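To see Theorem 2.10.2 in action, one can factor two integers by trial division and assemble the gcd and lcm from the minimum and maximum exponents. The sketch below is only an illustration for small positive integers; the helper names factorize and gcd_lcm_from_factorizations are assumptions of this example, and trial division is used purely for simplicity.

```python
from math import gcd


def factorize(n):
    """Prime factorization of n >= 1 by trial division, as {prime: exponent}."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        # Whatever remains after removing all small factors is itself prime.
        factors[n] = factors.get(n, 0) + 1
    return factors


def gcd_lcm_from_factorizations(a, b):
    """Return ((a, b), [a, b]) for positive a, b using min/max of exponents."""
    fa, fb = factorize(a), factorize(b)
    g = l = 1
    for p in set(fa) | set(fb):
        e, f = fa.get(p, 0), fb.get(p, 0)  # zero exponents allowed
        g *= p ** min(e, f)
        l *= p ** max(e, f)
    return g, l


g, l = gcd_lcm_from_factorizations(126, 49)
print(g, l, g * l == 126 * 49, g == gcd(126, 49))
```

Since min(e, f) + max(e, f) = e + f for each prime, the product of the two returned values is ab, which is the multiplication formula of Theorem 2.10.1.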

Note 2.10.2. As a corollary of this theorem we obtain another proof of the multiplication formula (a, b)[a, b] = |ab|, seen in Theorem 2.10.1.

Proof. It suffices to prove the property for positive a, b. It follows immediately from the identities in the preceding theorem and the fact that max(e, f) + min(e, f) = e + f. □

2.11. Units, Primes and Irreducibles

Definition 2.11.1. Let D be an integral domain and a ≠ 0 ∈ D. Then
i) a is a unit if a has a multiplicative inverse in D.
ii) a is composite if it can be expressed as a product a = bc with b and c nonunits.
iii) a is irreducible if a is not a unit and not composite.
iv) a is a prime if a is not a unit and whenever a|bc with b, c ∈ D then a|b or a|c.

Note: Every nonzero element in D is either a unit, composite or irreducible.

Example 2.11.1. i) The units in Z are ±1. The units in Z[i] are {1, −1, i, −i}. The units in R[x], where R is an integral domain, are just the units in R. ii) The irreducibles in Z are {±2, ±3, ±5, ...}.

Lemma 2.11.1. In any integral domain, the set of units is a multiplicative group.

Proof. Elementary. We just observe that (ab)^{−1} = b^{−1} a^{−1}. □

Lemma 2.11.2. In any integral domain the primes are irreducible.

Proof. Suppose p is a prime in an integral domain D and that p = ab for some a, b ∈ D. Then p|ab and so p|a or p|b by the definition of prime. Without loss of generality say p|a, that is, pk = a for some k ∈ D. Then, by substitution, p = (pk)b, and canceling p (which is nonzero) gives kb = 1, implying that b is a unit. Therefore p is irreducible. □


Note 2.11.1. i) The converse of the lemma is false, as the following example shows. Consider the ring Z[√−6]. Note 2 is irreducible, 2 | (2 + √−6)(2 − √−6) and 2 ∤ 2 ± √−6. The details are left for homework.
ii) In any principal ideal domain, prime and irreducible mean the same thing. In particular, such is the case for Z.
iii) It is generally a convention in Number Theory for the word prime to mean positive prime, although this differs slightly from the algebraic definition of prime given in this section.
iv) The set of irreducibles in Z is {±2, ±3, ±5, ...}.

Definition 2.11.2. Two elements a, b in an integral domain are called associates, written a ∼ b, if a = ub for some unit u.

Lemma 2.11.3. i) If a|u and u is a unit, then a is a unit. ii) If p, q are irreducibles and p|q then p ∼ q.

2.12. UFDs, PIDs and Euclidean Domains

Definition 2.12.1. An integral domain D is called a Unique Factorization Domain (UFD) if every nonzero element a of D has an essentially unique factorization into a product of irreducibles, a = u p_1 p_2 · · · p_k, where u is a unit, and p_1, ..., p_k are irreducible elements. By essentially unique we mean that if a has a second such factorization, say a = u′ p′_1 · · · p′_ℓ, then ℓ = k and there exists a permutation σ of {1, ..., k} such that p_i = u_i p′_{σ(i)} for some units u_i.

In this language we can restate the Fundamental Theorem of Arithmetic as follows.

Theorem 2.12.1. The Fundamental Theorem of Arithmetic. Z is a Unique Factorization Domain.

The astute reader will have noted that the ingredients we needed to prove the Fundamental Theorem of Arithmetic belong to any Euclidean Domain. For the existence part of the factorization one can induct on the value of δ(a). For the uniqueness part, one can prove in an identical manner the analogue of Euclid’s Lemma, since in any Euclidean Domain the greatest common divisor of two elements can be expressed as a linear combination of them. Thus we have

Theorem 2.12.2. Any Euclidean Domain is a Unique Factorization Domain.

Homework 2.12.1. Let S = {1, 2, 4, 6, 8, 10, ...}, a monoid under multiplication. (i) Describe the irreducible elements of S. (Note, although S is not an integral domain, we can define the concept of irreducible and prime in the same manner.) (ii) Show that every element of S can be factored into a product of irreducibles. (iii) Find an irreducible element in S that is not a prime. (iv) Show that factorization is not unique.

2.12.1. Principal Ideal Domains. A more general class of Unique Factorization Domains is the class of Principal Ideal Domains.

Definition 2.12.2. A Principal Ideal Domain (PID) is an integral domain D in which every ideal is principal, that is, of the form (a) = {xa : x ∈ D}.


Homework 2.12.2. For any integers a, b not both zero, prove that (a) + (b) = ((a, b)), and (a) ∩ (b) = ([a, b]).

Theorem 2.12.3. If D is a Euclidean domain then D is a PID.

Proof. Let D be a Euclidean domain with respect to the mapping δ. Let I be a nonzero ideal in D and let a be a nonzero element of I such that δ(a) is minimal. Then for any b ∈ I we have b = qa + r with either r = 0 or δ(r) < δ(a). Now r = b − qa ∈ I, and so by minimality of δ(a), we must have r = 0. Therefore a|b, that is, b ∈ (a). Thus I = (a). □

Example 2.12.1. Z, Z[i] and F[x] for any field F are all PIDs.

Theorem 2.12.4. If D is a Principal Ideal Domain, then D is a Unique Factorization Domain.

Proof. We only give a rough sketch here. See for example Jacobson’s Basic Algebra I for details. One starts by generalizing our proof above for Z to show that in a PID, any irreducible is a prime.

Existence: First note that any ascending chain of ideals is stationary, since the union of the ideals in the chain is again a principal ideal. Next, suppose that a is a nonzero element of D having no factorization into a product of primes. Then one can construct an infinite sequence of elements {a_n} in D with (a_1) ⊊ (a_2) ⊊ (a_3) ⊊ ..., a contradiction.

Uniqueness: Since primes and irreducibles are the same thing in a PID, we again have the lemma that says if p is irreducible and p | a_1 · · · a_k, then p | a_i for some i. Thus we can repeat the proof we gave for Z. □

Of course, Theorem 2.12.3 and Theorem 2.12.4 together yield another proof that any Euclidean Domain is a UFD, Theorem 2.12.2. One further example of UFDs is the following.

Theorem 2.12.5. If D is a UFD then so is the polynomial ring D[x].

Proof. See Jacobson, Basic Algebra I. □

Example 2.12.2. Z[x] is a UFD although not a PID. The ideal ⟨5, x⟩ is not principal. By induction one then gets that Z[x_1, x_2, ..., x_n] is a UFD.

2.13. Gaussian Integers

In your homework, you established that the ring of Gaussian Integers Z[i] is a Euclidean domain with respect to the mapping δ(a + bi) = a^2 + b^2. Thus we have by Theorem 2.12.2 the following.

Theorem 2.13.1. The Gaussian integers are a Unique Factorization Domain.

Homework 2.13.1. Let p be a prime in Z. Show that p is reducible in Z[i] if and only if p is a sum of two squares, that is, p = a^2 + b^2 for some integers a, b. Make use of the fact that the mapping δ is multiplicative, that is, for any w, z ∈ Z[i] we have δ(wz) = δ(w)δ(z), or equivalently, |wz|^2 = |w|^2 |z|^2. (We will see later that such is the case if and only if p = 2 or p ≡ 1 (mod 4).)
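The parenthetical remark at the end of Homework 2.13.1 (a prime is a sum of two squares exactly when p = 2 or p ≡ 1 (mod 4)) is easy to test numerically for small primes. The following is a brute-force sketch for experimentation only and proves nothing; the function names is_prime and two_squares are assumptions of this example.

```python
from math import isqrt


def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    return n >= 2 and all(n % d != 0 for d in range(2, isqrt(n) + 1))


def two_squares(p):
    """Return (a, b) with a*a + b*b == p and 0 <= a <= b, or None if none exists."""
    for a in range(isqrt(p) + 1):
        b = isqrt(p - a * a)
        if a * a + b * b == p:
            return (a, b)
    return None


for p in range(2, 60):
    if is_prime(p):
        print(p, p % 4, two_squares(p))
```

When two_squares(p) returns (a, b), the identity p = (a + bi)(a − bi) exhibits p as a product of two Gaussian integers, each with δ-value p.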


Homework 2.13.2. Show that Z[√−6] is not a UFD, and give an example of an element having two different factorizations. Make use of the homework problem above showing that 2 is irreducible but not a prime.

We note that it is an open problem to determine the set of positive m such that Z[√m] is a UFD. For negative m, the answer is known.

2.14. The Set of Primes

Theorem 2.14.1. There exist infinitely many primes in N.

Proof. There are many proofs of this result, dating back to Euclid, who presented the following proof. Suppose that p_1, ..., p_k are the only primes. Consider the integer p_1 p_2 · · · p_k + 1. It is not divisible by any of the p_i and thus cannot be expressed as a product of primes, a contradiction.

Here is a proof due to Euler: Suppose again that p_1, ..., p_k are the only primes. Then \prod_{i=1}^{k} (1 - 1/p_i)^{-1} is a finite value, but since every positive integer has a unique factorization into a product of primes we have

\[ \prod_{i=1}^{k} \left(1 - \frac{1}{p_i}\right)^{-1} = \prod_{i=1}^{k} \left(1 + \frac{1}{p_i} + \frac{1}{p_i^2} + \cdots\right) = \sum_{n=1}^{\infty} \frac{1}{n}, \]

but the latter sum diverges, a contradiction. □

The next theorem is a stronger statement about the set of primes.

Theorem 2.14.2. For any n ∈ N,

\[ \sum_{\substack{p \le n \\ p \text{ prime}}} \frac{1}{p} > \log\log(n+1) - 1. \]

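Proof. First note that

\[ -\log(1 - x) = x + \frac{x^2}{2} + \frac{x^3}{3} + \cdots = x + x^2\left(\frac{1}{2} + \frac{x}{3} + \frac{x^2}{4} + \cdots\right) < x + x^2 \tag{2.6} \]

for 0 < x ≤ 1/2. Also

\[ \prod_{\substack{p \le n \\ p \text{ prime}}} \left(1 - \frac{1}{p}\right)^{-1} = \prod_{\substack{p \le n \\ p \text{ prime}}} \left(1 + \frac{1}{p} + \frac{1}{p^2} + \cdots\right) \ge \sum_{k=1}^{n} \frac{1}{k} \ge \sum_{k=1}^{n} \int_{k}^{k+1} \frac{1}{x}\,dx = \log(n+1). \tag{2.7} \]

Taking log of both sides one gets

\[ -\sum_{\substack{p \le n \\ p \text{ prime}}} \log\left(1 - \frac{1}{p}\right) \ge \log\log(n+1), \]

and so by (2.6), applied with x = 1/p,

\[ \sum_{\substack{p \le n \\ p \text{ prime}}} \frac{1}{p} > \log\log(n+1) - \sum_{p \le n} \frac{1}{p^2}. \]

The latter sum can be easily estimated to be less than 1.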
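The inequality of Theorem 2.14.2 can be compared with actual values of the sum of reciprocals of primes. The sketch below uses a basic sieve of Eratosthenes; the function names primes_up_to and reciprocal_prime_sum are chosen for this illustration, and floating-point arithmetic is assumed to be accurate enough at this scale.

```python
from math import isqrt, log


def primes_up_to(n):
    """Sieve of Eratosthenes: list of all primes p <= n."""
    if n < 2:
        return []
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, isqrt(n) + 1):
        if is_prime[p]:
            for multiple in range(p * p, n + 1, p):
                is_prime[multiple] = False
    return [p for p in range(2, n + 1) if is_prime[p]]


def reciprocal_prime_sum(n):
    """Sum of 1/p over primes p <= n."""
    return sum(1 / p for p in primes_up_to(n))


for n in (1, 2, 10, 100, 10_000):
    lower_bound = log(log(n + 1)) - 1
    print(n, reciprocal_prime_sum(n), lower_bound)
```

Both columns grow without bound, but extremely slowly, which is consistent with the log log growth in the theorem.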




In fact, a theorem of Mertens states that

\[ \lim_{n \to \infty} \left( \sum_{\substack{p \le n \\ p \text{ prime}}} \frac{1}{p} - \log\log(n) \right) = M, \]

where M = .261497... is the Meissel-Mertens constant.

2.14.1. Gaps between primes: 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61...

Theorem 2.14.3. There exist arbitrarily large gaps between consecutive primes. Equivalently, there exist arbitrarily large sequences of consecutive composite numbers.

Proof. For any natural number n ≥ 2, the integers n! + 2, n! + 3, ..., n! + n are all composite, since n! + k is divisible by k for 2 ≤ k ≤ n. □

2.14.2. Twin Primes. Twin primes are primes that are two units apart, such as (3, 5), (5, 7), (11, 13), (17, 19), (29, 31).

Twin Prime Conjecture: There exist infinitely many twin primes.

In 2013, Zhang [16] proved that there are infinitely many consecutive primes p_n, p_{n+1} with p_{n+1} − p_n less than a fixed constant that we will call the “gap size”. The gap size has been whittled away since Zhang’s proof first came out, as indicated in the following table.

Date                 Author              Gap size
May 14, 2013         Zhang               70000000
June 2, 2013         Morrison            5000000
June 5, 2013         Sutherland          400000
June 19, 2013        Sutherland          50000
October 11, 2013     Engelsma            4400
October 23, 2013     Maynard             700
February 17, 2014    Clark and Jarvis    252
April 14, 2014       Tao and Nielsen     246

To prove the Twin Prime Conjecture one must reduce the gap size to 2.

A More General Gap Conjecture: For any even number n there exist infinitely many consecutive primes p_k, p_{k+1} with p_{k+1} − p_k = n.

How big do the gaps grow between consecutive primes? From 1 to 10^4, the maximum gap is 36. From 10^4 to 10^6 the maximum gap is 114. From 10^6 to 10^14 the maximum gap is 804, while from 10^14 to 10^18 the maximum gap is 1442. Thus we see that the gap size grows very slowly. In fact heuristic arguments and numerical evidence suggest the following conjecture to be true.

Cramér’s Conjecture: Let p_n denote the n-th prime. Then

\[ \limsup_{n \to \infty} \frac{p_{n+1} - p_n}{\log^2(n)} = 1. \]
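Two of the statements in this subsection lend themselves to direct computation: the run of composites n! + 2, ..., n! + n from Theorem 2.14.3, and the maximum gap between consecutive primes below 10^4. The following is a small sketch with illustrative function names (primes_up_to, max_prime_gap, composite_run); it relies on a simple sieve and is intended only for modest bounds.

```python
from math import factorial, isqrt


def primes_up_to(n):
    """Sieve of Eratosthenes: list of all primes p <= n."""
    if n < 2:
        return []
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, isqrt(n) + 1):
        if is_prime[p]:
            for multiple in range(p * p, n + 1, p):
                is_prime[multiple] = False
    return [p for p in range(2, n + 1) if is_prime[p]]


def max_prime_gap(n):
    """Largest difference between consecutive primes that are both <= n (n >= 3)."""
    ps = primes_up_to(n)
    return max(q - p for p, q in zip(ps, ps[1:]))


def composite_run(n):
    """Pairs (n! + k, k) for 2 <= k <= n; k is a proper divisor of n! + k."""
    f = factorial(n)
    return [(f + k, k) for k in range(2, n + 1)]


print(max_prime_gap(10**4))
for m, k in composite_run(6):
    print(m, "is divisible by", k)
```

The run produced by composite_run(n) has length n − 1, so it gives a gap of at least n between the primes on either side of it.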


2.14.3. Number of primes up to x. How many primes are there up to a given value x? Let π(x) denote this number. Gauss observed a striking similarity between the value of π(x) and the value of the logarithmic integral li(x) := \int_2^x dt / \log t.

x        π(x)         li(x)
10^3     168          178
10^4     1229         1246
10^5     9592         9630
10^6     78498        78628
10^7     664579       664918
10^8     5761455      5762209
10^9     50847534     50849235
10^10    455052511    455055614
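The first few values of π(x) in the table can be reproduced with a sieve and set against the simpler approximation x / log x. The sketch below is illustrative only; the function name prime_count is an assumption, and the sieve is practical for x up to roughly 10^7 on ordinary hardware.

```python
from math import isqrt, log


def prime_count(x):
    """Number of primes p <= x, computed with a bytearray sieve."""
    if x < 2:
        return 0
    sieve = bytearray([1]) * (x + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, isqrt(x) + 1):
        if sieve[p]:
            # Clear all multiples of p starting at p*p.
            sieve[p * p :: p] = bytearray(len(range(p * p, x + 1, p)))
    return sum(sieve)


for k in range(3, 7):
    x = 10**k
    print(x, prime_count(x), round(x / log(x)))
```

Comparing the last two printed columns shows that x / log x undercounts noticeably at this range, which is one reason the logarithmic integral is the preferred approximation.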

Integrating by parts, we see that

\[ \mathrm{li}(x) = \frac{x}{\log x} - \frac{2}{\log 2} + \int_2^x \frac{dt}{\log^2 t}. \]

Since the latter integral is of smaller order of magnitude than x / log x, we see that li(x) ∼ x / log x as x → ∞, that is, li(x) is asymptotic to x / log x. Recall, we say two functions are asymptotic as x → ∞ if the ratio approaches 1 as x → ∞. Thus, Gauss conjectured that π(x) ∼ x / log x. This was finally proven in 1896 by de la Vallée Poussin and Hadamard, and is now called the Prime Number Theorem.

Theorem 2.14.4. The Prime Number Theorem.

\[ \pi(x) \sim \frac{x}{\log x}. \]

The proof of this theorem is beyond the scope of this class. The easiest approach to proving the theorem requires complex analysis.

One might ask why it was reasonable for Gauss to consider the logarithmic integral in the estimation of π(x). Consider the following probabilistic argument. Pick a positive integer n at random from √x to x. Let’s estimate the probability P that n is a prime. Let p_1, ..., p_k be the primes up to √x. For any prime p_i ∈ {p_1, ..., p_k}, we let P_i denote the probability that n is not divisible by p_i. Since one out of every p_i numbers is divisible by p_i we have P_i ≈ 1 − 1/p_i. Now, n is a prime if and only if n is not divisible by any of the p_i. Thus, assuming the events “divisible by p_i” are independent (which is not exactly the case), we have

\[ P \approx \prod_{i=1}^{k} \left(1 - \frac{1}{p_i}\right) = \prod_{p < \sqrt{x}} \left(1 - \frac{1}{p}\right). \]

Now

\[ P^{-1} = \prod_{p < \sqrt{x}} \left(1 - \frac{1}{p}\right)^{-1} \ge \log(\sqrt{x}) = \frac{1}{2} \log x, \]

by (2.7). It can also be shown in an elementary manner (see ) that P^{−1} ≤ 2 log x. Thus P ≈ 1 / log x, that is, the probability that an integer chosen between √x and x is a prime is on the order of magnitude 1 / log x. To be precise, the probability density function for the distribution of primes is of order 1 / log x, and therefore the number of primes up to x is of order li(x).


2.14.4. Primes in Arithmetic Progressions. Euclid’s Theorem on the infinitude of the set of primes generalizes to arithmetic progressions.

Theorem 2.14.5. Dirichlet’s Theorem on Primes in Arithmetic Progressions. Let a, b be relatively prime positive integers. Then the arithmetic progression {a + kb : k ∈ Z} contains infinitely many primes.

The proof goes beyond the scope of this course, although we shall see special cases of it.

Homework 2.14.1. Show that the arithmetic progression 3 + 4Z contains infinitely many primes.

2.14.5. Goldbach Conjecture. Any even number larger than two is the sum of two primes.

In 2013 Helfgott [6], [7] proved that every odd number greater than 5 can be expressed as a sum of three primes!

CHAPTER 3

Modular Arithmetic

3.1. Basic properties of congruences

Definition 3.1.1. Let m be any positive integer. For a, b ∈ Z we say that a and b are congruent modulo m and write a ≡ b (mod m) if m | (a − b).

Terminology: Let a ∈ Z. The smallest nonnegative integer that a is congruent to (mod m) is called the least residue of a (mod m). It is easy to see that the least residue of a (mod m) is just the remainder in dividing a by m.

Lemma 3.1.1. Basic properties of congruences.
(i) If a ≡ b (mod m) and c ≡ d (mod m) then a ± c ≡ b ± d (mod m).
(ii) If a ≡ b (mod m) and c ≡ d (mod m) then ac ≡ bd (mod m).
(iii) If a ≡ b (mod m), then for any positive integer n, a^n ≡ b^n (mod m).
(iv) If a ≡ b (mod m) and d|m then a ≡ b (mod d).

Proof. (i) a ≡ b (mod m) ⇒ m | (a − b). c ≡ d (mod m) ⇒ m | (c − d). Thus, by a basic divisibility property, m | [(a − b) + (c − d)], and so, m | [(a + c) − (b + d)], that is, a + c ≡ b + d (mod m). The same proof holds for a − c ≡ b − d (mod m).
(ii) We’ll do this one in a different style. a ≡ b (mod m) ⇒ a = b + mk for some k ∈ Z. c ≡ d (mod m) ⇒ c = d + ml for some l ∈ Z. Thus ac = (b + mk)(d + ml) = bd + mkd + bml + mkml = bd + m(kd + bl + kml), and so ac ≡ bd (mod m).
(iii) The proof is by induction on n. For n = 1 the statement is trivially true. Suppose the statement is true for n, and now consider n + 1. We have a ≡ b (mod m) by assumption, and a^n ≡ b^n (mod m), by the induction hypothesis. Thus by property (ii), a · a^n ≡ b · b^n (mod m), that is, a^{n+1} ≡ b^{n+1} (mod m).
(iv) We’ll leave as an exercise. □

It is useful to think of these basic properties as being substitution laws for congruences, for they tell us that in doing modular arithmetic we may replace an integer with any integer congruent to it.

Theorem 3.1.1. Let f(x) = c_n x^n + · · · + c_1 x + c_0 be a polynomial with integer coefficients. Then for any m > 0 and integers a, b with a ≡ b (mod m), we have f(a) ≡ f(b) (mod m).

Proof. First note that by Lemma 3.1.1 (ii) and (iii), c_k a^k ≡ c_k b^k (mod m) for k = 0, 1, ..., n. Then by property (i), f(a) ≡ f(b) (mod m). □

Example 3.1.1. i) Find 47^50 (mod 5). First note that 47 ≡ 2 (mod 5), then compute 2^1, 2^2, 2^3, 2^4 ≡ 2, 4, 3, 1 (mod 5) to see that 2^4 ≡ 1 (mod 5). Thus 47^50 ≡ 2^50 ≡ (2^4)^12 · 2^2 ≡ 2^2 ≡ 4 (mod 5).
ii) Find 2^100 (mod 7). This time we note that 2^3 ≡ 8 ≡ 1 (mod 7) and so 2^100 ≡ (2^3)^33 · 2 ≡ 2 (mod 7).
iii) Find 2^100 (mod 17). This time we observe that 2^4 ≡ −1 (mod 17) and so 2^100 ≡ (2^4)^25 ≡ (−1)^25 ≡ −1 ≡ 16 (mod 17).
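The substitution laws of Lemma 3.1.1 are what make fast modular exponentiation work: every intermediate product may be replaced by its least residue. The sketch below implements repeated squaring by hand (the name power_mod is an assumption of this example) and compares it with Python's built-in three-argument pow on the computations of Example 3.1.1.

```python
def power_mod(a, e, m):
    """Least residue of a**e (mod m) for e >= 0, m >= 1, by repeated squaring."""
    result = 1 % m
    base = a % m  # replace a by its least residue
    while e > 0:
        if e & 1:
            result = result * base % m
        base = base * base % m  # reduce after every multiplication
        e >>= 1
    return result


for a, e, m in [(47, 50, 5), (2, 100, 7), (2, 100, 17)]:
    print(a, e, m, power_mod(a, e, m), pow(a, e, m))
```

No number larger than m^2 ever appears in the loop, even though 2^100 itself has 31 decimal digits.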

Example 3.1.2. What is the remainder in dividing 2^1000 by 7? Since 2^3 ≡ 1 (mod 7), we have 2^999 = (2^3)^333 ≡ 1 (mod 7). Thus 2^1000 ≡ 2 (mod 7), and so the remainder is 2.

Homework 3.1.1. Divisibility criterion. Let n be a number with base ten representation n = a_k 10^k + · · · + a_1 10 + a_0. Show that (i) 9|n if and only if 9 | (a_k + · · · + a_1 + a_0). (ii) 11|n if and only if 11 | (a_k − a_{k−1} + a_{k−2} − · · · + (−1)^k a_0).

3.2. The ring of integers (mod m), Z_m

It is easy to see that congruence (mod m) is an equivalence relation on Z.

Theorem 3.2.1. Congruence (mod m) is an equivalence relation, that is, it satisfies the following three properties.
(i) Reflexive: For any a ∈ Z, a ≡ a (mod m).
(ii) Symmetric: If a ≡ b (mod m) then b ≡ a (mod m).
(iii) Transitive: If a ≡ b (mod m) and b ≡ c (mod m), then a ≡ c (mod m).

Proof. We’ll be brief. The reader can fill in details. (i) m|0. (ii) If m | (a − b) then m | (b − a). (iii) If m | (a − b) and m | (b − c) then by a basic divisibility property m | (a − b) + (b − c), that is, m | (a − c). □

Thus congruence (mod m) partitions Z into equivalence classes of the form [a]_m = a + (m) = {a + km : k ∈ Z}, called residue classes or congruence classes (mod m). Each residue class (mod m) is a double arithmetic progression of integers, extending to infinity in both the positive and negative directions.

Definition 3.2.1. The ring of integers (mod m) is the set Z_m of congruence classes (mod m), Z_m := {[0]_m, [1]_m, [2]_m, ..., [m − 1]_m}, together with addition and multiplication laws defined by

(3.1)    [a]_m + [b]_m = [a + b]_m,    [a]_m [b]_m = [ab]_m.

Note 3.2.1. i) By Lemma 3.1.1 (i) and (ii), addition and multiplication are well defined on Z_m.
ii) Z_m is a ring with respect to the given addition and multiplication laws with zero element [0]_m, and unity element [1]_m. In fact Z_m is a commutative ring with unity. We leave it as an exercise for the reader to verify the properties of a ring. They follow easily from the corresponding axioms for Z.
iii) a ≡ b (mod m) ⇔ [a]_m = [b]_m in Z_m. Thus in doing modular arithmetic there are two points of view one may take. One can either think in terms of congruences involving integers, or in terms of working in the ring Z_m.


Simplified Notation for Z_m: If it is understood that we are working in a particular ring Z_m, we may replace the cumbersome notation [a]_m with the simpler notation ā, or with just the representative a. In the first case we would write Z_m = {0̄, 1̄, ..., \overline{m − 1}}, and in the second case, Z_m = {0, 1, 2, ..., m − 1}.

Example 3.2.1. Make an addition table and multiplication table for Z_4 using the simplified notation Z_4 = {0, 1, 2, 3}.

+ | 0 1 2 3        · | 0 1 2 3
0 | 0 1 2 3        0 | 0 0 0 0
1 | 1 2 3 0        1 | 0 1 2 3
2 | 2 3 0 1        2 | 0 2 0 2
3 | 3 0 1 2        3 | 0 3 2 1

Another point of view we may take is that Z_4 is the set of integers {0, 1, 2, 3}, together with a new addition and multiplication law given explicitly by the tables above. If we had started with this definition of Z_4, we would then be left with the cumbersome task of proving that all of the properties of a ring are satisfied.

3.3. Congruences in general rings

The notion of congruence (mod m) generalizes in a natural way to congruence modulo an ideal in any ring.

Definition 3.3.1. Let R be a ring and I be an ideal in R. (i) Two elements a, b ∈ R are said to be congruent modulo I if a − b ∈ I. (ii) The quotient ring of R with respect to I, denoted R/I, is the set of cosets of the form ā = a + I, a ∈ R, together with addition and multiplication laws given by
Addition: ā + b̄ = \overline{a + b}, for any a, b ∈ R.
Multiplication: ā b̄ = \overline{ab}, for any a, b ∈ R.

Note 3.3.1. i) The addition and multiplication laws for R/I are well defined; this follows from the defining properties of ideals. ii) For any a, b ∈ R, we have ā = b̄ if and only if a − b ∈ I.

In Z, as we noted above, all ideals are principal, that is, of the form (m) for some integer m. In this case Z_m = Z/(m), and congruence modulo m is the same thing as congruence modulo the ideal (m).

3.4. Multiplicative inverses and Cancelation Laws

Definition 3.4.1. i) Let a, m ∈ Z, m > 0. An integer b is called a multiplicative inverse of a (mod m), denoted a^{−1} (mod m), if ab ≡ 1 (mod m).
ii) An element ā ∈ Z_m is called a unit if it has a multiplicative inverse, that is, there exists a b̄ ∈ Z_m with ā b̄ = 1̄.

Note 3.4.1. i) The two concepts of multiplicative inverse are equivalent, that is, a has a multiplicative inverse (mod m) if and only if ā is a unit in Z_m. ii) Multiplicative inverses (in both senses) are unique; in the first sense, unique modulo m. iii) Not all elements have multiplicative inverses.

Lemma 3.4.1. Let a, m ∈ Z with m > 0. Then a has a multiplicative inverse (mod m) if and only if (a, m) = 1. Equivalently, ā is a unit in Z_m if and only if (a, m) = 1.


Proof. Suppose that (a, m) = 1. Then by the GCDLC Theorem there exist integers x, y such that ax + my = 1. Thus, ax ≡ 1 (mod m) and so x is a multiplicative inverse of a (mod m). Conversely, suppose that a has a multiplicative inverse x (mod m). Then ax = 1 + km for some k ∈ Z, that is, ax − km = 1. Any common divisor of a and m divides ax − km = 1, and therefore (a, m) = 1.

Lemma 3.4.2. Cancelation Laws. Let a, x, y be integers with ax ≡ ay (mod m). i) If a and m are relatively prime, then x ≡ y (mod m). ii) More generally, if d = (a, m), then x ≡ y (mod m/d).

Proof. (i) If (a, m) = 1 then a has a multiplicative inverse $a^{-1}$ (mod m), and so multiplying both sides of the given congruence by $a^{-1}$ gives the result. (ii) Suppose that ax ≡ ay (mod m), that is, ax = ay + km for some k ∈ Z. Then $\frac{a}{d}x = \frac{a}{d}y + k\frac{m}{d}$, that is, $\frac{a}{d}x \equiv \frac{a}{d}y \pmod{\frac{m}{d}}$. Since $\left(\frac{a}{d}, \frac{m}{d}\right) = 1$ we can apply (i) to get the desired conclusion.

3.5. The Group of units (mod m) and the Euler phi-function

Definition 3.5.1. For any positive integer m, we let G(m) denote the set of units in $Z_m$.

It is easy to see that G(m) is a multiplicative group. In fact, we have the following, the proof of which is an exercise for the reader.

Lemma 3.5.1. In any commutative ring with unity, the set of units is an abelian group under multiplication.

Our goal is to describe the group structure of G(m).

Definition 3.5.2. For any positive integer m, we define φ(m) to be the number of positive integers less than or equal to m that are relatively prime to m. We call φ the Euler phi function or Euler’s totient function.

Lemma 3.5.2. For any positive integer m, the group of units G(m) is an abelian group of order φ(m).

Proof. We have Z/(m) = {0, . . . , m − 1}. Now, by Lemma 3.4.1 exactly φ(m) of these elements are units.

Lemma 3.5.3. If p is a prime, then every nonzero element of $Z_p$ is a unit. Thus $Z_p$ is a finite field with p elements.

Proof. If p is a prime, then every nonzero element is a unit since (a, p) = 1 for 1 ≤ a ≤ p − 1.
Definition 3.5.3. (i) A set of integers {a1 , . . . , am } is called a complete residue system (mod m) if the values are distinct (mod m), that is, Zm = {a1 , . . . , am }. (ii) A set of integers {a1 , . . . , aφ(m) } is called a reduced residue system (mod m) if G(m) = {a1 , . . . , aφ(m) }.
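The proof of Lemma 3.4.1 is constructive: an inverse of a (mod m) can be read off from a solution of ax + my = 1. The following is a minimal Python sketch of that idea, together with a brute-force listing of a reduced residue system. The function names `ext_gcd`, `mod_inverse` and `units` are illustrative choices, not notation from these notes.

```python
from math import gcd

def ext_gcd(a, b):
    """Return (g, x, y) with a*x + b*y == g == gcd(a, b), for a, b >= 0."""
    x0, y0, x1, y1 = 1, 0, 0, 1
    while b != 0:
        q = a // b
        a, b = b, a - q * b
        x0, x1 = x1, x0 - q * x1
        y0, y1 = y1, y0 - q * y1
    return a, x0, y0

def mod_inverse(a, m):
    """Return the inverse of a (mod m) in [0, m), or None if (a, m) != 1."""
    g, x, _ = ext_gcd(a % m, m)
    if g != 1:
        return None
    return x % m

def units(m):
    """A reduced residue system (mod m): representatives of G(m)."""
    return [a for a in range(1, m + 1) if gcd(a, m) == 1]
```

For instance, `mod_inverse(3, 7)` gives 5, and `len(units(m))` is φ(m) by Definition 3.5.2.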


3.6. A few results from Group Theory Definition 3.6.1. (i) Let (G, ·) be a group and let g ∈ G. Then the order of g denoted ord(g) = ordG (g), is the smallest positive integer n such that g n = 1, (if such an n exists). (ii) In the group G(m) we write ordm (a) for the order of an element (mod m). Thus if (a, m) = 1, then ordm (a) is the minimal positive integer n such that an ≡ 1 (mod m). (iii) G is called a cyclic group if there exists an element g ∈ G such that G =< g >= {g n : n ∈ Z}. Notation: |H| denotes the cardinality of the set H. It is also called the order of the set H. Suppose G is a finite group and that H is a subgroup of G. Then we can write G = Hx1 ∪ Hx2 · · · ∪ Hxk , for some cosets Hxi of H in G. Moreover, the cosets are disjoint from one another. [G : H], called the index of H in G denotes the number of cosets of H in G. Thus |G| = |Hx1 | + |Hx2 | + · · · + |Hxk | = |H|[G : H], and we have established the following theorem. Theorem 3.6.1. Lagrange. If G is a finite group and H is a subgroup of G then |H|||G|. Indeed, we have |G| = |H|[G : H]. Theorem 3.6.2. i) If G is a finite group and g ∈ G then g has finite order and ordG (g)||G|. ii) If g is an element of finite order in an arbitrary group G and m ∈ Z, then g m = 1 if and only if ordG (g)|m. iii) If G is a finite group and g ∈ G then g |G| = 1. Proof. i) Let H :=< g >= {e, g, g 2 , g 3 , . . . }. Since H ⊆ G, H is a finite set and therefore g j = g k for some j < k. Thus g k−j = e and so g has finite order. Let k = ordG (g). Then it is easy to see that H = {e, g, g 2 , . . . , g k−1 } and thus |H| = k. By Lagrange’s Theorem it follows that k divides |G|. ii) Let k = ord(g) and suppose that m is such that g m = 1. By the division algorithm there exist integers q, r such that m = qk + r and 0 ≤ r < k. Since g m = g k = e it follows that g r = 1. Since r < k, then by the minimality of k, r = 0, and therefore k|m. The other direction is trivial. iii) Part iii) follows immediately from i) and ii).  
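As a small numerical illustration of Definition 3.6.1 (ii) and Theorem 3.6.2 (i), the sketch below computes $\mathrm{ord}_m(a)$ by repeated multiplication; one can then check that it always divides φ(m) = |G(m)|. This is a brute-force sketch meant for small moduli only, and the names `order_mod` and `phi` are assumptions made for the example.

```python
from math import gcd

def order_mod(a, m):
    """Smallest n >= 1 with a^n ≡ 1 (mod m); requires (a, m) = 1."""
    if gcd(a, m) != 1:
        raise ValueError("a must be a unit (mod m)")
    n, power = 1, a % m
    while power != 1 % m:      # 1 % m handles the trivial modulus m = 1
        power = power * a % m
        n += 1
    return n

def phi(m):
    """Euler phi function, straight from the definition."""
    return sum(1 for a in range(1, m + 1) if gcd(a, m) == 1)
```

For example, `order_mod(2, 7)` is 3 and `order_mod(3, 7)` is 6, both divisors of φ(7) = 6.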
3.7. Fermat’s Little Theorem, Euler’s Theorem and Wilson’s Theorem

Theorem 3.7.1. A few classical results in Number Theory.
i) Fermat’s Little Theorem: For any prime p and integer a with (a, p) = 1, $a^{p-1} \equiv 1$ (mod p).
ii) Euler’s Theorem: For any modulus m and integer a with (a, m) = 1, $a^{\varphi(m)} \equiv 1$ (mod m).
iii) Wilson’s Theorem: For any prime p, (p − 1)! ≡ −1 (mod p).

Proof. Note that i) is just the special case m = p of ii), since φ(p) = p − 1. ii) G(m) is a group of order φ(m) and so by Theorem 3.6.2 (iii), for any integer a with (a, m) = 1, $\bar{a}^{\varphi(m)} = \bar{1}$ in G(m), that is, $a^{\varphi(m)} \equiv 1$ (mod m).


iii) We first note that the only elements of G(p) which are inverses of themselves are 1 and −1. (Why? $x \equiv x^{-1}$ (mod p) ⇔ $p \mid (x^2 - 1)$ ⇔ $p \mid (x - 1)$ or $p \mid (x + 1)$ ⇔ x ≡ ±1 (mod p).) Thus pairing each element of G(p) with its multiplicative inverse, we see that the product of the elements of G(p) is −1.

3.7.1. Useful identities with prime moduli.

Lemma 3.7.1. Useful identities for prime moduli. Let p be a prime.
i) If x, y are variable symbols then $(x + y)^p \equiv x^p + y^p$ (mod p), that is, all of the corresponding coefficients are congruent (mod p).
ii) For any integers $a_1, \dots, a_n$ we have $(a_1 + a_2 + \cdots + a_n)^p \equiv a_1^p + a_2^p + \cdots + a_n^p \pmod{p}$.
iii) If a ≡ b (mod p), then for any positive integer k, $a^{p^k} \equiv b^{p^k} \pmod{p^{k+1}}$.

The proof of the lemma hinges on the following fact about binomial coefficients.

Lemma 3.7.2. If p is a prime and k is a positive integer with 1 ≤ k < p, then $p \mid \binom{p}{k}$.

Proof. We have $\binom{p}{k} = \frac{p}{k}\binom{p-1}{k-1}$, that is, $p\binom{p-1}{k-1} = k\binom{p}{k}$. Thus $p \mid k\binom{p}{k}$, but plainly $p \nmid k$, and so $p \mid \binom{p}{k}$.

Proof of Lemma 3.7.1. i) This is an immediate consequence of the binomial expansion formula and the preceding lemma. ii) Part ii) follows by induction from part i). iii) The proof is by induction on k, the case k = 0 being trivial. Suppose the statement has been established for a given k. Then $a^{p^k} = b^{p^k} + t p^{k+1}$ for some integer t. Raising both sides to the p-th power yields

$$a^{p^{k+1}} = b^{p^{k+1}} + \binom{p}{1} b^{p^k(p-1)}\, t p^{k+1} + \binom{p}{2} b^{p^k(p-2)}\, t^2 p^{2k+2} + \cdots = b^{p^{k+1}} + b^{p^k(p-1)}\, t p^{k+2} + \cdots .$$

Plainly $p^{k+2}$ divides every term on the righthand side except for the first term, and so we get $a^{p^{k+1}} \equiv b^{p^{k+1}} \pmod{p^{k+2}}$.

Note 3.7.1. Part ii) of the lemma yields another proof of Fermat’s Little Theorem. Indeed setting $a_i = 1$, 1 ≤ i ≤ n, we see that $(1 + 1 + \cdots + 1)^p \equiv 1^p + 1^p + \cdots + 1^p$ (mod p), that is, $n^p \equiv n$ (mod p).

3.8. Chinese Remainder Theorem

We start by recalling a couple of notions from ring theory.

Definition 3.8.1. Let R, S be rings. (i) A mapping η : R → S is a ring homomorphism if η(ab) = η(a)η(b), η(a + b) = η(a) + η(b), for all a, b ∈ R. (ii) If in addition η is 1-to-1 and onto it is called an isomorphism, and the rings R, S are called isomorphic, written R ≅ S. (iii) The kernel of η is given by ker(η) = {x ∈ R : η(x) = 0}.

Note 3.8.1. ker(η) is an ideal in R, and thus we can form the quotient ring R/ker(η).


Theorem 3.8.1. First Isomorphism Theorem. If η : R → S is a ring homomorphism, then R/ker(η) ≅ η(R).

We will state two versions of the Chinese Remainder Theorem, the first an algebraic version, and the second, the classical version.

Theorem 3.8.2. Chinese Remainder Theorem. Algebraic Version. Let $m_1, \dots, m_k$ be pairwise relatively prime integers (that is, $(m_i, m_j) = 1$ for all i ≠ j), and let $m = m_1 m_2 \cdots m_k$. Then we have the ring isomorphism $Z_m \cong Z_{m_1} \times \cdots \times Z_{m_k}$.

Proof. Let $\eta : Z \to Z_{m_1} \times \cdots \times Z_{m_k}$ be defined by $\eta(n) = ([n]_{m_1}, [n]_{m_2}, \dots, [n]_{m_k})$. Then by Lemma 2.10.1, we plainly have ker(η) = (m), and so by the First Isomorphism Theorem, Z/(m) ≅ η(Z). In particular $|\eta(Z)| = m = |Z_{m_1} \times \cdots \times Z_{m_k}|$, and thus η is an onto mapping.

The onto property of η means the following: Given any integers $a_i$, 1 ≤ i ≤ k, there exists an integer n such that $\eta(n) = ([a_1]_{m_1}, \dots, [a_k]_{m_k})$, that is, $[n]_{m_i} = [a_i]_{m_i}$, 1 ≤ i ≤ k. In other words, $n \equiv a_i$ (mod $m_i$), 1 ≤ i ≤ k. This is the content of the classical Chinese Remainder Theorem.

Theorem 3.8.3. Chinese Remainder Theorem. Classical Version. Suppose that $m_1, \dots, m_k$ are pairwise relatively prime integers, and that $a_1, \dots, a_k$ are arbitrary integers. Then there exists an integer n such that

(3.2) $n \equiv a_i \pmod{m_i}$, 1 ≤ i ≤ k,

that is, $a_i$ is the remainder on dividing n by $m_i$. Moreover, the set of all solutions of (3.2) is a single residue class (mod $m_1 \cdots m_k$).

Example 3.8.1. Find all integers x such that x ≡ 1 (mod 6), x ≡ 2 (mod 37).

Say x = 2 + 37t. Then 2 + 37t ≡ 1 (mod 6) ⇔ 2 + t ≡ 1 (mod 6) ⇔ t ≡ −1 (mod 6) . Thus x ≡ 187 (mod 222). From a computational point of view, when solving the system of congruences (3.2), it is better to start with the largest modulus (as we did in the previous example) and then substitute this into the next largest modulus. We illustrate this again in the next example. Example 3.8.2. Solve. x ≡ 5 (mod 11), x ≡ 2 (mod 35), x ≡ 1 (mod 3). The largest modulus is 35, so we write x = 2 + 35t and substitute into the second largest modulus to get 2 + 35t ≡ 5 (mod 11), that is, 2t ≡ 3 (mod 11), t ≡ 7 (mod 11). Thus x ≡ 2 + 35 · 7 ≡ 247 (mod 385). Finally, writing x = 247 + 385s we get 247 + 385s ≡ 1 (mod 3), s ≡ 0 (mod 3), and x ≡ 247 (mod 1155). Theorem 3.8.4. Structure Theorem for G(m) (Part I). Let m be a positive integer with prime factorization m = pe11 · · · pekk . Then G(m) ' G(pe11 ) × G(pe22 ) × · · · × G(pekk ), as multiplicative groups. We will see a second part to this structure theorem in Section ??, Theorem ??.
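The substitution method used in Examples 3.8.1 and 3.8.2 (start with the largest modulus, then substitute into the next largest) can be sketched in Python as follows. This is an illustrative sketch, not part of the notes: the function name `crt` is an assumption, the moduli are assumed to be pairwise relatively prime, and the built-in `pow(M, -1, m)` used for the modular inverse requires Python 3.8 or later.

```python
def crt(residues, moduli):
    """Solve x ≡ r_i (mod m_i) for pairwise coprime moduli.

    Returns (x, M) with 0 <= x < M and M the product of the moduli,
    so the full solution set is the residue class x (mod M).
    """
    # Process the congruences from the largest modulus down.
    pairs = sorted(zip(moduli, residues), reverse=True)
    M, x = pairs[0][0], pairs[0][1] % pairs[0][0]
    for m, r in pairs[1:]:
        # Write the current solution as x + M*t and solve for t (mod m):
        # x + M*t ≡ r (mod m)  =>  t ≡ (r - x) * M^(-1) (mod m).
        t = (r - x) * pow(M, -1, m) % m
        x += M * t
        M *= m
    return x, M
```

With the data of the two examples, `crt([1, 2], [6, 37])` returns `(187, 222)` and `crt([5, 2, 1], [11, 35, 3])` returns `(247, 1155)`.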


Proof. By the Chinese Remainder Theorem we have $Z_m \cong Z_{p_1^{e_1}} \times \cdots \times Z_{p_k^{e_k}}$. The result now follows from two elementary results from algebra. i) If $R_1 \cong R_2$ as rings and $U_1$, $U_2$ are the groups of units in $R_1$ and $R_2$ then $U_1 \cong U_2$ as groups. ii) If $R_1, \dots, R_k$ are rings with groups of units $U_1, \dots, U_k$, then the group of units in the cartesian product $R_1 \times \cdots \times R_k$ is just $U_1 \times \cdots \times U_k$. We leave the proofs of these facts to the reader.

Corollary 3.8.1. Properties of the Euler Phi Function.
i) φ is multiplicative, that is, if a, b are positive integers with (a, b) = 1, then φ(ab) = φ(a)φ(b).
ii) If $m = \prod_{i=1}^{k} p_i^{e_i}$ then $\varphi(m) = \prod_{i=1}^{k} \left(p_i^{e_i} - p_i^{e_i - 1}\right) = m \prod_{i=1}^{k} \left(1 - \frac{1}{p_i}\right)$.

Proof. i) By the Chinese Remainder Theorem we have $Z_{ab} \cong Z_a \times Z_b$, and thus G(ab) ≅ G(a) × G(b), as in the proof of the previous theorem. In particular, |G(ab)| = |G(a)||G(b)|, that is, φ(ab) = φ(a)φ(b). ii) For any prime power $p^e$ it is plain that $\varphi(p^e) = p^e - p^{e-1}$, since the only values in {1, . . . , $p^e$} not relatively prime to $p^e$ are the $p^{e-1}$ multiples of p. Since φ is multiplicative we have

$$\varphi(m) = \prod_{i=1}^{k} \varphi(p_i^{e_i}) = \prod_{i=1}^{k} \left(p_i^{e_i} - p_i^{e_i - 1}\right).$$

We simply factor out the quantity $p_i^{e_i}$ from each factor to obtain the final identity.
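The product formula of Corollary 3.8.1 ii) translates directly into a way of computing φ(m) from the prime factorization of m. The sketch below uses plain trial division, which is only reasonable for small m; the names `factorize` and `phi` are illustrative assumptions.

```python
def factorize(m):
    """Return {p: e} with m equal to the product of p**e (trial division)."""
    factors = {}
    p = 2
    while p * p <= m:
        while m % p == 0:
            factors[p] = factors.get(p, 0) + 1
            m //= p
        p += 1
    if m > 1:
        factors[m] = factors.get(m, 0) + 1
    return factors

def phi(m):
    """Euler phi function via phi(m) = prod (p^e - p^(e-1))."""
    result = 1
    for p, e in factorize(m).items():
        result *= p**e - p**(e - 1)
    return result
```

For example, 7000 = 2³ · 5³ · 7 gives `phi(7000)` = 4 · 100 · 6 = 2400.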



3.9. Group of units modulo a prime, G(p) In this section we will prove that the group of units G(p) is a cyclic group for any prime p. We use the notation Fp = Zp , in order to emphasize that Zp is a finite field. We saw earlier that the polynomial ring Fp [x] is a Unique Factorization Domain, indeed, it is a Euclidean domain. The following lemma is valid for factoring polynomials over any field. Lemma 3.9.1. The Factor Theorem. Let f (x) ∈ Fp [x], a ∈ Fp . Then f (a) = 0 if and only if (x − a)|f (x). Proof. If (x − a)|f (x) then trivially f (a) = 0. Conversely, suppose that f (a) = 0. By the division algorithm there exist polynomials q(x), r(x) ∈ Fp [x] such that f (x) = q(x)(x − a) + r(x), with r(x) = 0 or deg r(x) < deg(x − a). In either case we see that r(x) is a constant polynomial, say r(x) = r0 , and we have f (x) = q(x)(x − a) + r0 . Inserting x = a yields r0 = 0 and consequently f (x) = q(x)(x − a), that is (x − a)|f (x).  Theorem 3.9.1. For any prime p the polynomial xp − x factors over the finite field Fp in the manner, xp − x = x(x − 1)(x − 2)(x − 3) · · · (x − (p − 1)). Proof. By Fermat’s Little Theorem, each of the values 0, 1, . . . , p − 1 is a zero of xp −x. Thus, by the factor theorem, x, (x−1), . . . , (x−(p−1)) are each factors of xp − x. Moreover, they are distinct irreducible factors and so x(x1 ) · · · (x − (p − 1)) is a divisor of xp − x. But this product is a monic polynomial of degree p, and so it must in fact equal xp − x. 


Note 3.9.1. Matching the x coefficients of the two sides of the identity in Theorem 3.9.1 yields another proof of Wilson’s Theorem. Lemma 3.9.2. If p is a prime and d a positive integer with d|(p − 1), then the polynomial xd − 1 has d distinct zeros in Fp . Proof. Say de = p − 1 for some integer e, and write x(p−1) − 1 = xde − 1 = (xd )e − 1 = (xd − 1)g(x) for some polynomial g(x) over Fp . Thus we see that (xd − 1)|(xp−1 − 1), and so by Theorem 3.9.1 (xd − 1) is a product of d distinct linear factors over Fp . Therefore xd − 1 has d distinct zeros in Fp .  Lets recall another elementary result about groups. Lemma 3.9.3. i) If a is an element of finite order n in a group G then ordG (aj ) = n/(n, j). ii) If a, b are elements of an abelian group G of orders m, n respectively with (m, n) = 1 then ordG (ab) = mn. Proof. i) Let e denote the identity in G and a ∈ G with ord(a) = n. Then (aj )k = e ⇔ ajk = 1 ⇔ n|jk ⇔

$\frac{n}{(n,j)} \mid k$,

and so the minimal such k is n/(n, j). ii) Since ⟨a⟩ ∩ ⟨b⟩ is a subgroup of ⟨a⟩, its order divides m, and since it is also a subgroup of ⟨b⟩, its order divides n. Since (m, n) = 1 it follows that the order is 1, that is, ⟨a⟩ ∩ ⟨b⟩ = {e}. Now, since G is abelian, $(ab)^k = e$ implies that $a^k = b^{-k}$, and so $a^k, b^k \in \langle a \rangle \cap \langle b \rangle$. Thus $a^k = b^k = e$ and so m|k and n|k. Since (m, n) = 1 it follows that mn|k. Conversely, it is easy to see that $(ab)^{mn} = a^{mn} b^{mn} = e$. Thus mn is the minimal exponent k satisfying $(ab)^k = e$.

Corollary 3.9.1. If $C_n$ is a cyclic group of order n, then $C_n$ has φ(n) generators.

Proof. Say $C_n = \langle a \rangle$ with ord(a) = n. Then for 0 ≤ j < n we have $\mathrm{ord}(a^j) = n/(n, j)$ and so $\mathrm{ord}(a^j) = n$ iff (n, j) = 1. Thus the number of choices for j is φ(n).

Theorem 3.9.2. For any prime p, the group of units G(p) is a cyclic group.

Proof. We may assume p is odd. Say $p - 1 = p_1^{e_1} \cdots p_k^{e_k}$ is the prime factorization of p − 1. We will obtain an element of order p − 1 in G(p). Fix i, with 1 ≤ i ≤ k. Let $S_i = \{x \in G(p) : x^{p_i^{e_i}} = 1\}$. By Lemma 3.9.2, $|S_i| = p_i^{e_i}$. Now every element in $S_i$ is of order $p_i^{j}$ for some $j \le e_i$, and there are just $p_i^{e_i - 1}$ elements of order less than $p_i^{e_i}$, since any such element is a zero of the polynomial $x^{p_i^{e_i - 1}} - 1$. Thus $S_i$ contains an element $a_i$ of order $p_i^{e_i}$. By Lemma 3.9.3 it follows that $\mathrm{ord}(a_1 \cdots a_k) = p - 1$.

Note 3.9.2. The same proof shows in fact that any finite multiplicative subgroup of a field is a cyclic group.
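Theorem 3.9.2 guarantees that a generator of G(p) exists, and a small search finds one in practice. The sketch below is not the construction used in the proof; it uses the standard test that a unit a generates G(p) exactly when $a^{(p-1)/q} \not\equiv 1$ (mod p) for every prime q dividing p − 1, which follows from Theorem 3.6.2 ii). The function names are illustrative assumptions, and p is assumed to be prime.

```python
def prime_factors(n):
    """Return the set of distinct prime factors of n (trial division)."""
    factors, q = set(), 2
    while q * q <= n:
        while n % q == 0:
            factors.add(q)
            n //= q
        q += 1
    if n > 1:
        factors.add(n)
    return factors

def is_generator_mod_p(a, p):
    """True if a generates G(p); p is assumed to be prime."""
    if a % p == 0:
        return False
    # ord_p(a) divides p - 1; it equals p - 1 unless it divides (p - 1)/q
    # for some prime q | (p - 1).
    return all(pow(a, (p - 1) // q, p) != 1 for q in prime_factors(p - 1))

def smallest_generator_mod_p(p):
    """Smallest positive generator of G(p), found by direct search."""
    return next(a for a in range(1, p) if is_generator_mod_p(a, p))
```

For p = 13 the generators found this way are 2, 6, 7, 11, in agreement with the count φ(12) = 4 from Corollary 3.9.1.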


Definition 3.9.1. If m is a positive integer such that G(m) is cyclic, and a is a generator of G(m), then a is called a primitive root (mod m). Note 3.9.3. If G(m) is cyclic, then there exist φ(φ(m)) distinct primitive roots (mod m), by Corollary 3.9.1. What is the smallest primitive root for a given prime? This is a famous open problem in number theory. It is known that there is always a primitive root  p1/4+ ; numerical evidence suggests that one always exists of size  log3 (p). Artin’s Conjecture: Given a positive integer a (not a perfect square), a is a primitive root for infinitely many primes p. Although the conjecture is still not proved, Heath-Brown (1985) established that if q, r, s are any three multiplicatively independent integers (q e rf sg = 1 ⇒ e = f = g = 0) and none of q, r, s, −3qr, −3qs, −3rs, qrs is a square, then Artin’s conjecture holds for one of the numbers q, r, s. In particular, if q, r, s are distinct primes then one of them will be a primitive root for infinitely many primes. Artin’s conjecture is known to be true under the assumption of the Grand Riemann Hypothesis. There is a stronger conjecture asserting that if a 6= −1 is an integer with a = bc2 with b square-free and a 6≡ −1 (mod 4), then the fraction of primes p for which a  Q 1 is a primitive root (mod p), is p prime 1 − p(p−1) = .3739..., the average value of φ(p − 1)/(p − 1) (the latter ratio being the fraction of primitive roots (mod p).) Homework 3.9.1. For any prime p 6= 2, 5 show that ordp (10) is the length of the repeating cycle of the decimal expansion of 1/p. 3.10. Group of units G(pe ) Theorem 3.10.1. i) For any odd prime p and positive integer e, G(pe ) and G(2pe ) are cyclic groups. ii) G(2e ) is cyclic if and only if e = 1, 2. If e ≥ 3 then G(2e ) '< −1 > × < 5 >. Proof. i) We already know G(p) is cyclic. Let a be a primitive root (mod p). Say (3.3)

$a^{p-1} = 1 + kp$,

for some k ∈ Z. If p|k then a cannot be a primitive root (mod $p^2$), for in this case $\mathrm{ord}_{p^2}(a) \le p - 1$. However, replacing a by a + p yields a new primitive root (mod p) with associated k-value not divisible by p, as we demonstrate:

$$(a + p)^{p-1} = a^{p-1} + \binom{p-1}{1} p\, a^{p-2} + \binom{p-1}{2} p^2 a^{p-3} + \cdots = 1 + p\left[k + (p - 1)a^{p-2} + p(\text{stuff})\right] = 1 + pk',$$

with (p, k′) = 1.

Claim: If $p \nmid k$ in (3.3) then a is a primitive root (mod $p^e$) for all e ∈ N. We shall prove by induction that for any j ≥ 1

(3.4) $a^{(p-1)p^{j-1}} = 1 + p^j k_j$


for some $k_j \in Z$ with $(k_j, p) = 1$. By assumption (3.4) holds when j = 1. Suppose the statement is true for a given j. Then

$$\left(a^{(p-1)p^{j-1}}\right)^p = (1 + p^j k_j)^p = 1 + p^{j+1} k_j + \binom{p}{2}(p^j k_j)^2 + \cdots = 1 + p^{j+1}\left(k_j + p(\text{stuff})\right) = 1 + p^{j+1} k_{j+1},$$

with $(k_{j+1}, p) = 1$. This establishes (3.4).

Let e be any positive integer, and put $t = \mathrm{ord}_{p^e}(a)$. Then $a^t \equiv 1$ (mod p) and so (p − 1)|t. On the other hand $t \mid |G(p^e)| = \varphi(p^e) = (p - 1)p^{e-1}$, and so we can write $t = (p - 1)p^r$ for some nonnegative integer r ≤ e − 1. We wish to show r = e − 1. Now by (3.4)

$$1 + p^{r+1} k = a^{p^r(p-1)} \equiv 1 \pmod{p^e},$$

for some integer k with (k, p) = 1. Thus, $p^e \mid p^{r+1}$, that is, e ≤ r + 1. Therefore r = e − 1. Finally we observe that for any odd p, $G(2p^e) \cong G(2) \times G(p^e) \cong G(p^e)$, and thus $G(2p^e)$ is also cyclic.

ii) Suppose that e ≥ 3. We show first that $\mathrm{ord}_{2^e}(5) = 2^{e-2}$. This follows from the claim that for n ≥ 2

$$5^{2^{n-2}} = 1 + k_n 2^n,$$

for some odd integer $k_n$, which can be established by induction in the same manner as (3.4). The claim implies that for a given e ≥ 3, $5^{2^{e-2}} \equiv 1 \pmod{2^e}$ but $5^{2^{e-3}} \not\equiv 1 \pmod{2^e}$, and thus $\mathrm{ord}_{2^e}(5) = 2^{e-2}$. Note $\varphi(2^e) = 2^{e-1}$. We claim that $G(2^e) = \{\pm 1, \pm 5, \dots, \pm 5^{2^{e-2}-1}\}$. It suffices to show that these elements are distinct, but this is immediate since (mod 4) it is clear that no positive element in this set is congruent to a negative element. In particular, $G(2^e)$ is not cyclic since every element has order $\le 2^{e-2}$.

Example 3.10.1. Find a primitive root (mod 625). Start with 5. Clearly 2 is a primitive element (mod 5). Now $2^4 = 16 = 3 \cdot 5 + 1$. Since $5 \nmid 3$ we see that 2 is a primitive element (mod $5^e$) for any e. In particular G(625) = ⟨2⟩, a cyclic group of order φ(625) = 500.

Example 3.10.2. G(8) = {±1, ±5} ≅ $K_4$, the Klein-4 group.

Homework 3.10.1. Find a primitive root (mod $3^{500}$) and (mod 98).

3.11. Group of units G(m) for arbitrary m

Lemma 3.11.1. A direct product $G_1 \times \cdots \times G_k$ of finite groups is cyclic if and only if each group $G_i$ is cyclic and $(|G_i|, |G_j|) = 1$ for all i ≠ j.

Proof. Let $G = G_1 \times \cdots \times G_k$ and $x = (x_1, \dots, x_k) \in G$. Then $\mathrm{ord}_G(x) = [\mathrm{ord}_{G_1}(x_1), \dots, \mathrm{ord}_{G_k}(x_k)]$, the least common multiple of the orders. (Why? $x^n = 1$ iff $x_i^n = 1$ for all i iff $\mathrm{ord}_{G_i}(x_i) \mid n$ for all i.) Now

$$\mathrm{ord}_G(x) = [\mathrm{ord}_{G_1}(x_1), \dots, \mathrm{ord}_{G_k}(x_k)] \le^{*} \prod_{i=1}^{k} \mathrm{ord}_{G_i}(x_i) \le^{**} \prod_{i=1}^{k} |G_i| = |G|,$$

with strict inequality in (*) unless $(\mathrm{ord}_{G_i}(x_i), \mathrm{ord}_{G_j}(x_j)) = 1$ for i ≠ j, and strict inequality in (**) unless $\mathrm{ord}_{G_i}(x_i) = |G_i|$ for all i.
If G is cyclic then there is an x ∈ G such that equality holds in both (*) and (**), whence each Gi is cyclic and the orders are relatively prime. Conversely, suppose that each Gi is cyclic with generator xi and that the orders are pairwise relatively prime. Then we have


equality in both (*) and (**) and so $x = (x_1, \dots, x_k)$ is a generator for G.

Theorem 3.11.1. Structure Theorem for G(m). Let $m = 2^e p_1^{e_1} \cdots p_k^{e_k}$. Then $G(m) \cong G(2^e) \times G(p_1^{e_1}) \times \cdots \times G(p_k^{e_k})$, and G(m) is cyclic if and only if m = 1, 2, 4, $p^e$ or $2p^e$ for some odd prime p and positive integer e.

Proof. We established the isomorphism in Theorem 3.8.4. By Lemma 3.11.1, G(m) is cyclic if and only if $G(2^e), G(p_1^{e_1}), \dots, G(p_k^{e_k})$ are all cyclic and their orders are pairwise relatively prime. Thus we must have e = 0, 1 or 2. For odd $p_i$ we have already seen that $G(p_i^{e_i})$ is cyclic. Now |G(4)| = 2, |G(2)| = 1, and for any odd $p_i$, $|G(p_i^{e_i})| = p_i^{e_i - 1}(p_i - 1)$. The latter value is always even. Thus, in order for the orders to be relatively prime we must have either no odd prime or exactly one odd prime (k = 1) together with e = 0 or 1.

Example 3.11.1. $G(7000) \cong G(2^3) \times G(5^3) \times G(7) \cong C_2 \times C_2 \times C_{100} \times C_6 \cong C_2 \times C_2 \times C_4 \times C_{25} \times C_2 \times C_3 \cong C_2 \times C_2 \times C_2 \times C_3 \times C_4 \times C_{25}$.

Theorem 3.11.2. Let G be a cyclic group of order n. Then for any positive divisor d of n there exist φ(d) elements in G of order d. In particular, the group G has φ(n) generators.

Proof. Say G = ⟨a⟩. By Lemma 3.9.3, $\mathrm{ord}(a^j) = d$ iff n/(n, j) = d iff (n, j) = n/d. Thus $j = \frac{n}{d}\ell$ with 1 ≤ ℓ ≤ d, (ℓ, d) = 1. Hence there are φ(d) choices for j.

Corollary 3.11.1. If G(m) is cyclic then G(m) has φ(φ(m)) generators (primitive roots).

Theorem 3.11.3. For any positive integer n, $\sum_{d \mid n} \varphi(d) = n$.

Proof. Let G be a cyclic group of order n. For d|n put $S_d := \{x \in G : \mathrm{ord}(x) = d\}$. Then G is a disjoint union of the $S_d$ and so $|G| = \sum_{d \mid n} |S_d| = \sum_{d \mid n} \varphi(d)$.
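The cyclicity criterion of Theorem 3.11.1 is easy to test numerically for small m. The sketch below compares a brute-force search for an element of order φ(m) with the list m = 1, 2, 4, $p^e$, $2p^e$. It is an illustration only (all function names are assumptions), and the brute-force part is practical only for small moduli.

```python
from math import gcd

def units(m):
    """Representatives of G(m)."""
    return [a for a in range(1, m + 1) if gcd(a, m) == 1]

def order_mod(a, m):
    """Order of the unit a in G(m), by repeated multiplication."""
    n, power = 1, a % m
    while power != 1 % m:
        power = power * a % m
        n += 1
    return n

def is_cyclic_bruteforce(m):
    """True if some unit has order |G(m)|."""
    G = units(m)
    return any(order_mod(a, m) == len(G) for a in G)

def is_cyclic_by_theorem(m):
    """True if m is 1, 2, 4, p^e or 2p^e with p an odd prime."""
    if m in (1, 2, 4):
        return True
    if m % 2 == 0:
        m //= 2              # strip one factor of 2
    if m % 2 == 0:
        return False         # m was divisible by 4 (and is not 4)
    # m is now odd and > 1; its smallest divisor > 1 is a prime p.
    p = next(q for q in range(3, m + 1, 2) if m % q == 0)
    while m % p == 0:
        m //= p
    return m == 1            # True exactly when the odd part is a prime power
```

The two tests agree for every m that has been tried with this sketch, for example for all m below 200.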

CHAPTER 4

Polynomial Congruences Let f (x) be a polynomial with integer coefficients, and m be a positive integer. We wish to solve the congruence (4.1)

f(x) ≡ 0 (mod m).

We first look at linear congruences, then consider the power congruence $x^n \equiv a$ (mod m), and finally deal with a general polynomial.

4.1. Linear Congruences

Theorem 4.1.1. For any integers a, b, m, m > 0, the linear congruence ax ≡ b (mod m) has a solution if and only if (a, m)|b, in which case there are (a, m) distinct solutions (mod m).

Proof. Set d = (a, m). The congruence ax ≡ b (mod m) is solvable if and only if there exist x, y ∈ Z, with ax − my = b. We saw earlier that this linear equation is solvable if and only if d|b, in which case the general solution is of the form $x = x_0 + t\frac{m}{d}$, $y = y_0 + t\frac{a}{d}$, for some $x_0, y_0, t \in Z$. Thus $x \equiv x_0 + t\frac{m}{d}$ (mod m), 0 ≤ t ≤ d − 1.

Homework 4.1.1. Verify that 65x ≡ 85 (mod 105) is solvable using the theorem above, and then solve it two ways. i) Use the array method to solve the associated linear equation. ii) Use the Chinese Remainder Theorem.

4.2. Power Congruences, $x^n \equiv a$ (mod m)

We start with the quadratic congruence $x^2 \equiv a$ (mod p), with p a prime. The congruence is solvable if and only if a is a square (mod p). Plainly, if a is a square, say $a \equiv \alpha^2$ (mod p), then the complete solution set is x ≡ ±α (mod p). Euler’s Criterion gives us a test for determining when a given a is a square (mod p).

Theorem 4.2.1. Euler’s Criterion. If p is an odd prime with $p \nmid a$, then a is a square (mod p) if and only if $a^{\frac{p-1}{2}} \equiv 1$ (mod p).

Proof. This theorem is just a special case of Theorem 4.2.2 below, but we’ll give a proof here that doesn’t require knowing that G(p) is a cyclic group. We first note that for any a not divisible by p, $a^{\frac{p-1}{2}} \equiv \pm 1$ (mod p), since the square of this value is 1 (mod p) by Fermat’s Little Theorem, and the only solutions of the congruence $x^2 \equiv 1$ (mod p) are x ≡ ±1 (mod p). If a is a square (mod p), then there exists an x ∈ Z with $x^2 \equiv a$ (mod p), and by Fermat’s Little Theorem we get $a^{\frac{p-1}{2}} \equiv x^{p-1} \equiv 1$ (mod p). Since there are $\frac{p-1}{2}$ squares (mod p) and the congruence $x^{(p-1)/2} \equiv 1$ (mod p) has at most $\frac{p-1}{2}$ solutions, the solutions to this congruence are precisely the squares.

Corollary 4.2.1. If p is an odd prime, then −1 is a square (mod p) if and only if p ≡ 1 (mod 4). Proof. Immediate.
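Euler's Criterion is straightforward to apply by machine, since the power $a^{(p-1)/2}$ (mod p) can be computed quickly by repeated squaring. A minimal sketch, assuming p is an odd prime not dividing a; the function name `is_square_mod_p` is an illustrative choice.

```python
def is_square_mod_p(a, p):
    """Euler's Criterion: for an odd prime p with p not dividing a,
    a is a square (mod p) iff a^((p-1)/2) ≡ 1 (mod p)."""
    return pow(a % p, (p - 1) // 2, p) == 1
```

Applied to a = −1 this reproduces Corollary 4.2.1: `is_square_mod_p(-1, p)` is true for p = 5, 13, 17 and false for p = 3, 7, 11.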



Euler’s Criterion is just a special case of the following more general criterion for an element of a cyclic group to be a k-th power.

Theorem 4.2.2. Let G be a cyclic group of order n, k ∈ N and a ∈ G. Then a is a k-th power if and only if $a^{n/(k,n)} = 1$ (where 1 is the identity element). If the latter holds, then the equation $x^k = a$ has (k, n) solutions in G.

The theorem is an easy consequence of the following lemma.

Lemma 4.2.1. Let G be a cyclic group of order n (with identity 1) and H a subgroup of G of order d. Let x ∈ G. Then x ∈ H if and only if $x^d = 1$, that is, if and only if $\mathrm{ord}_G(x) \mid d$.

Proof. Say G = ⟨g⟩. Then $H = \langle g^{n/d} \rangle$. Let x ∈ G, say $x = g^j$. Then $x^d = 1$ iff $(g^j)^d = 1$ iff ord(g) | jd iff n | jd iff $\frac{n}{d} \mid j$ iff $g^j \in H$.

Proof of Theorem 4.2.2. Let H be the subgroup of k-th powers in G. Then |H| = n/(k, n), and so the first part of the theorem follows from the preceding lemma. Next, consider the homomorphism $x \mapsto x^k$ on G. The image is H and thus the mapping is a (k, n)-to-one mapping.

Corollary 4.2.2. General Euler Criterion. Suppose that m is a positive integer such that G(m) is cyclic, and that n is any positive integer. Let a be any integer relatively prime to m. Then the congruence

(4.2) $x^n \equiv a \pmod{m}$

has a solution if and only if

(4.3) $a^{\varphi(m)/(\varphi(m), n)} \equiv 1 \pmod{m}$.

If a solution exists then there are exactly (φ(m), n) solutions modulo m.

Proof. We simply apply the previous theorem to the group G(m).



Example 4.2.1. If (n, φ(m)) = 1, then every a ∈ G(m) is an n-th power. Indeed the mapping $x \mapsto x^n$ on G(m) is a one-to-one mapping in this case.

Definition 4.2.1. Let a, m ∈ Z with (a, m) = 1. Then a is called a quadratic residue (mod m) if there exists an x ∈ Z such that $x^2 \equiv a$ (mod m). If no such x exists then a is called a quadratic non-residue (mod m).

Note 4.2.1. 1. If p is an odd prime and a is a quadratic residue (mod p) then the congruence $x^2 \equiv a$ (mod p) has exactly 2 solutions. 2. If m has k distinct odd prime factors and a is a quadratic residue (mod m), then the congruence $x^2 \equiv a$ (mod m) has $2^k$ distinct solutions (mod m), by the Chinese Remainder Theorem. 3. If $m = p^e$ is an odd prime power, then a is a quadratic residue (mod $p^e$) if and only if $a^{(p-1)/2} \equiv 1$ (mod p), that is, a is a quadratic residue (mod p).

Proof. We know $G(p^e)$ is a cyclic group of order $p^{e-1}(p - 1)$ and thus by the general Euler Criterion, a is a square (mod $p^e$) if and only if $a^{p^{e-1}(p-1)/2} \equiv 1 \pmod{p^e}$. But the latter condition is equivalent to $a^{(p-1)/2} \equiv 1$ (mod p) by Lemma 3.7.1.

Euler’s Criterion yields an easy way of testing whether a given a is a quadratic residue (mod p), but there remains the problem of determining the square roots of a when it has one. In the following special cases the task is straightforward.

Homework 4.2.1. i) Let p be a prime of the form p = 4k + 3 and a be a quadratic residue (mod p). Then the congruence $x^2 \equiv a$ (mod p) has solutions $x \equiv \pm a^{k+1}$ (mod p). ii) Suppose that p is a prime of the form p = 8k + 5 and a is a quadratic residue (mod p). Then $x \equiv \pm a^{k+1}$ or $x \equiv \pm\frac{1}{2}(4a)^{k+1}$ (mod p) satisfies $x^2 \equiv a$ (mod p), according as $a^{2k+1} \equiv 1$ or −1 (mod p).

There remains the hard case where p ≡ 1 (mod 8). In this case one can use an iterative algorithm, called Shanks’ algorithm, for calculating square roots. In the next chapter, we will see how Quadratic Reciprocity can be used to give a more efficient way of determining when a is a quadratic residue than the Euler Criterion.

4.3. A general quadratic congruence

Let p be an odd prime and consider the quadratic congruence

$ax^2 + bx + c \equiv 0 \pmod{p}$,

with a nonzero (mod p). Multiplying by 4a and completing the square, the congruence is equivalent to

$4a^2x^2 + 4abx \equiv -4ac \pmod{p}$, that is, $(2ax + b)^2 \equiv b^2 - 4ac \pmod{p}$.

Thus, the quadratic congruence is solvable iff $b^2 - 4ac$ is zero or a quadratic residue (mod p). Letting $\alpha^2 \equiv b^2 - 4ac$ (mod p), with α ∈ Z, we see that the solutions are just $x \equiv (-b \pm \alpha)(2a)^{-1}$ (mod p).
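The quadratic formula (mod p) can be combined with the easy square-root case p = 4k + 3 from Homework 4.2.1 i). The sketch below covers only that case; it is illustrative, the function names are assumptions, p is assumed to be a prime with p ≡ 3 (mod 4) and a nonzero (mod p), and `pow(2 * a, -1, p)` requires Python 3.8 or later.

```python
def sqrt_mod_p3(a, p):
    """A square root of a (mod p) for a prime p ≡ 3 (mod 4), or None.

    Uses x = a^(k+1) with p = 4k + 3, i.e. exponent (p + 1)/4, and then
    checks the candidate, so non-residues are reported as None.
    """
    a %= p
    if a == 0:
        return 0
    x = pow(a, (p + 1) // 4, p)
    return x if x * x % p == a else None

def solve_quadratic(a, b, c, p):
    """Sorted solutions of a x^2 + b x + c ≡ 0 (mod p), p ≡ 3 (mod 4)."""
    disc = (b * b - 4 * a * c) % p
    alpha = sqrt_mod_p3(disc, p)
    if alpha is None:
        return []
    inv = pow(2 * a, -1, p)          # (2a)^(-1) (mod p)
    return sorted({(-b + alpha) * inv % p, (-b - alpha) * inv % p})
```

For example, `solve_quadratic(1, 1, 1, 7)` returns `[2, 4]`, since the discriminant −3 ≡ 4 (mod 7) is a square.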

4.4. General Polynomial Congruences: Lifting Solutions

Let f(x) be a polynomial over Z and p a prime. Consider the congruence

$f(x) \equiv 0 \pmod{p^n}$.

First note that any solution must already be a solution (mod p), so we start by solving the congruence (mod p), and then proceed to (mod $p^2$), (mod $p^3$), and so on. Let $x_1$ be an integer solution of the congruence

$f(x) \equiv 0 \pmod{p}$.

We shall attempt to lift the solution $x_1$ to a solution (mod $p^2$), that is, find a point $x_2$ such that

(4.4) $x_2 \equiv x_1 \pmod{p}$ and $f(x_2) \equiv 0 \pmod{p^2}$.

Say $x_2 = x_1 + tp$ for some t ∈ Z. We wish to find t so that $x_2$ is a solution (mod $p^2$). Now,

(4.5) $f(x_1 + tp) = f(x_1) + f'(x_1)tp + \frac{f''(x_1)}{2}(tp)^2 + \cdots$

Note: Since the polynomial on the left clearly has integer coefficients, each of the values $\frac{f^{(k)}(x_1)}{k!}$ is an integer. Thus we obtain

$f(x_1 + tp) \equiv f(x_1) + f'(x_1)tp \pmod{p^2}$,

and so we need to solve the congruence

(4.6) $f'(x_1)t \equiv -\frac{f(x_1)}{p} \pmod{p}$,

called the Lifting Congruence.

The three options going from (mod p) to (mod $p^2$).
(i) If $p \nmid f'(x_1)$, then there is a unique solution t of (4.6) and hence a unique solution $x_2$ of (4.4) (mod $p^2$).
(ii) If $p \mid f'(x_1)$ and $p^2 \nmid f(x_1)$ then there is no solution of (4.6) and hence no solution of (4.4).
(iii) If $p \mid f'(x_1)$ and $p^2 \mid f(x_1)$, then any value of t is a solution of (4.6), and hence there are p distinct solutions of (4.4) (mod $p^2$).

Suppose now that we have constructed by induction a sequence of integers $x_1, x_2, \dots, x_n$ such that

$x_{i+1} \equiv x_i \pmod{p^i}$ and $f(x_i) \equiv 0 \pmod{p^i}$,

for i = 1, 2, . . . , n. To continue we wish to find an $x_{n+1} = x_n + p^n t$ such that $f(x_n + p^n t) \equiv 0 \pmod{p^{n+1}}$. After expanding, this amounts to solving

$f(x_n) + f'(x_n)p^n t \equiv 0 \pmod{p^{n+1}}$,

or equivalently (noting that $f'(x_1) \equiv f'(x_n)$ (mod p))

(4.7) $f'(x_1)t \equiv -\frac{f(x_n)}{p^n} \pmod{p}$,

and so again we have three options.

The three options going from (mod $p^n$) to (mod $p^{n+1}$).
(i) If $p \nmid f'(x_1)$, then there is a unique solution t of (4.7) and hence a unique solution $x_{n+1}$ satisfying

(4.8) $x_{n+1} \equiv x_n \pmod{p^n}$, $f(x_{n+1}) \equiv 0 \pmod{p^{n+1}}$.

(ii) If $p \mid f'(x_1)$ and $p^{n+1} \nmid f(x_n)$ then there is no solution of (4.7) and hence no solution of (4.8).
(iii) If $p \mid f'(x_1)$ and $p^{n+1} \mid f(x_n)$, then any value of t is a solution of (4.7), and hence there are p distinct solutions of (4.8) (mod $p^{n+1}$).

Definition 4.4.1. A solution $x_1$ of the congruence f(x) ≡ 0 (mod p) is called nonsingular if $f'(x_1) \not\equiv 0$ (mod p) and singular if $f'(x_1) \equiv 0$ (mod p).


Theorem 4.4.1. If x1 is a nonsingular solution of the congruence f (x) ≡ 0 (mod p) then for any positive integer n there is a unique solution xn (mod pn ) of the congruence f (x) ≡ 0 (mod pn ) such that xn ≡ x1 (mod p). Proof. At each step of the lifting process there is a unique solution and so the theorem follows easily by induction on n.  Example 4.4.1. Solve the congruence x2 ≡ −1 (mod 125). Start with x2 ≡ −1 (mod 5) which has solutions ±2. First lets lift 2. Set x = 2 + 5t. f (x) = x2 + 1, f (2) = 5, f 0 (2) = 4, and so Lifting Congruence is 4t ≡ −1 (mod 5), which gives t ≡ 1 (mod 5), x ≡ 7 (mod 25). Next lift 7. Set x = 7+25t. f (7) = 50. The Lifting Congruence is 4t ≡ −50/25 (mod 5), so t ≡ 2 (mod 5) and x ≡ 57 (mod 125). Clearly, the second solution (obtained by lifting −2) is x ≡ −57 (mod 125). Example 4.4.2. Solve x3 + x2 + 23 ≡ 0 (mod 53 ). Start with the same congruence (mod 5). By trial and error we see that x ≡ 1 or 2 (mod 5). (i) Take x1 = 1. Put x = 1 + 5t. Note that f 0 (1) = 5 ≡ 0 (mod p), that is 1 is a singular solution, while f (1)/5 = 5 ≡ 0 (mod 5). Thus we have have option (iii), that is, the lifting congruence is 0t ≡ 0 (mod 5), so t is arbitrary and we get x2 = 1 + 5t = 1, 6, 11, 16, 21. Now f (1 + 5t)/25 = 4t2 + t + 1, and we see f (1 + 5t)/25 ≡ 0 (mod 5) iff t = 3. Thus for x2 = 16 we have option (iii) and get five liftings to solution (mod 125), namely x ≡ 16, 41, 66, 91, 116 (mod 125). If one continues this to (mod 54 ) one discovers that all of the solutions (mod 53 ) lift. Thus there are 25 solutions (mod 625) all living above x1 = 1.

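[Figure: lifting tree above the singular solution $x_1 = 1$. (mod 5): 1. (mod $5^2$): 1, 6, 11, 16, 21. (mod $5^3$), all above 16: 16, 41, 66, 91, 116. (mod $5^4$): $16+125t$, $41+125t$, $66+125t$, $91+125t$, $116+125t$.]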

(ii) Since x1 = 2 is a nonsingular solution, there is a unique lifting each time. We obtain x2 ≡ 17 (mod 25) and x3 ≡ 42 (mod 125), and (if we continue one more level) x4 ≡ 417 (mod 625). This information can be displayed in a tree graph with vertices 1 and 2 at the top and branches below for the (mod 25), (mod 125), (mod 625) liftings.
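The lifting process of Examples 4.4.1 and 4.4.2 can be checked numerically. The following is a minimal sketch, not the method of the notes verbatim: instead of solving the lifting congruence it simply tries all $p$ candidate values of $t$ at each level, which covers the nonsingular and singular cases uniformly. The function names `poly_eval` and `lift_roots` are illustrative choices.

```python
def poly_eval(coeffs, x, modulus):
    """Evaluate a polynomial at x modulo `modulus` (coefficients from highest degree down)."""
    value = 0
    for c in coeffs:
        value = (value * x + c) % modulus
    return value


def lift_roots(coeffs, p, n):
    """Return all roots of the polynomial mod p**n, lifting roots mod p one level at a time."""
    modulus = p
    roots = [x for x in range(p) if poly_eval(coeffs, x, p) == 0]
    for _ in range(n - 1):
        next_modulus = modulus * p
        # Each root x mod p^k has p candidate lifts x + t*p^k with t = 0..p-1;
        # keep exactly those that are roots mod p^(k+1).
        roots = [
            x + t * modulus
            for x in roots
            for t in range(p)
            if poly_eval(coeffs, x + t * modulus, next_modulus) == 0
        ]
        modulus = next_modulus
    return sorted(roots)


print(lift_roots([1, 0, 1], 5, 3))      # roots of x^2 + 1 mod 125
print(lift_roots([1, 1, 0, 23], 5, 3))  # roots of x^3 + x^2 + 23 mod 125
```

Running the second call with `n = 4` lists the solutions (mod 625) discussed above, including the single solution above $x_1 = 2$.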


Homework 4.4.1. i) Solve the congruence $f(x) = x^3 + 7x^2 + x \equiv x(x-1)^2 \equiv 0 \pmod{3^2}$. ii) Solve the congruence $x^3 + x + 1 \equiv 0 \pmod{31^2}$. Hint: Note that 3 is a solution (mod 31). Use the factor theorem and the quadratic formula to obtain the others. iii) Solve the congruence $x^{495} - 2x^{24} + 8 \equiv 0 \pmod 7$. Hint: Use Fermat's Little Theorem to make life easier.

4.5. Counting Solutions of Polynomial Congruences

Theorem 4.5.1. Let $f(x)$ be a polynomial with integer coefficients and $m$ a positive integer with factorization $m = p_1^{e_1} \cdots p_k^{e_k}$. Then
i) $x$ is a solution of the congruence
$$f(x) \equiv 0 \pmod m \tag{4.9}$$
if and only if $x$ satisfies the system of congruences
$$f(x) \equiv 0 \pmod{p_i^{e_i}}, \quad 1 \le i \le k. \tag{4.10}$$
ii) Letting $N(m)$ denote the number of solutions of (4.9) (mod $m$) and $N(p_i^{e_i})$ denote the number of solutions (mod $p_i^{e_i}$) of the $i$-th congruence in (4.10), we have $N(m) = \prod_{i=1}^{k} N(p_i^{e_i})$.

Proof. i) $m \mid f(x) \Leftrightarrow p_i^{e_i} \mid f(x)$, $1 \le i \le k$. ii) We claim that the CRT gives us a one-to-one correspondence between the $k$-tuples $(x_1, \dots, x_k) \in \mathbb{Z}_{p_1^{e_1}} \times \cdots \times \mathbb{Z}_{p_k^{e_k}}$, with $x_i$ a solution of the $i$-th congruence in (4.10) for $1 \le i \le k$, and the solutions $x$ of (4.9). Indeed, suppose that $x_i$ is a solution of the $i$-th congruence in (4.10) for $1 \le i \le k$, and let $x$ (mod $m$) be the unique value with $x \equiv x_i \pmod{p_i^{e_i}}$, $1 \le i \le k$. Such an $x$ satisfies $f(x) \equiv f(x_i) \equiv 0 \pmod{p_i^{e_i}}$ for all $i$, and so $f(x) \equiv 0 \pmod m$. Conversely, by i), every solution $x$ of (4.9) arises in this way from the tuple of its residues. □
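Part ii) of Theorem 4.5.1 is easy to test by brute force for small moduli. The sketch below is only a numerical check, assuming nothing beyond the statement of the theorem; the helper names `prime_power_factors` and `count_solutions` are made up for this illustration.

```python
from math import prod


def prime_power_factors(m):
    """Return the prime-power factors p^e of m, e.g. 24 -> [8, 3]."""
    factors = []
    p = 2
    while p * p <= m:
        if m % p == 0:
            q = 1
            while m % p == 0:
                m //= p
                q *= p
            factors.append(q)
        p += 1
    if m > 1:
        factors.append(m)
    return factors


def count_solutions(f, m):
    """Number of x in {0, ..., m-1} with f(x) = 0 (mod m), by exhaustive search."""
    return sum(1 for x in range(m) if f(x) % m == 0)


def f(x):
    return x * x - 1  # sample polynomial x^2 - 1


m = 24
local_counts = [count_solutions(f, q) for q in prime_power_factors(m)]
print(count_solutions(f, m), local_counts, prod(local_counts))
```

The first and last printed numbers agree, as the theorem predicts.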

CHAPTER 5

Quadratic Residues and Quadratic Reciprocity

5.1. Introduction

Consider the two congruences $x^2 \equiv 3 \pmod{1009}$ and $x^2 \equiv 1009 \pmod 3$. Which one is easier to solve? Since $1009 \equiv 1 \pmod 3$, the second congruence simplifies to $x^2 \equiv 1 \pmod 3$, which has solutions $x \equiv \pm 1 \pmod 3$. The first congruence does not simplify, and cannot be solved easily by trial and error. Is there any relationship between these two congruences? In the first one we are working over the field $\mathbb{Z}_{1009}$ (noting that 1009 is a prime), while in the second one we are working in the field $\mathbb{Z}_3$. To address this relationship, we introduce the Legendre symbol. Recall that an integer $a$, not divisible by a prime $p$, is called a quadratic residue (mod $p$) if $a$ is a square (mod $p$), that is, the congruence $x^2 \equiv a \pmod p$ is solvable.

Definition 5.1.1. Let $p$ be an odd prime and $a \in \mathbb{Z}$ with $p \nmid a$. The Legendre symbol $\left(\frac{a}{p}\right)$ is defined to be 1 if $a$ is a quadratic residue (mod $p$), and $-1$ if $a$ is a quadratic nonresidue (mod $p$).

Thus to address the solvability of the congruence $x^2 \equiv 3 \pmod{1009}$, we must calculate $\left(\frac{3}{1009}\right)$. We've already shown that $\left(\frac{1009}{3}\right) = 1$, but does this reveal any information about $\left(\frac{3}{1009}\right)$? Euler and Legendre, in the late 1700s, observed a beautiful relationship between these two quantities, called the law of quadratic reciprocity. It says that if $p$ and $q$ are distinct odd primes then $\left(\frac{p}{q}\right) = \left(\frac{q}{p}\right)$ unless $p \equiv q \equiv 3 \pmod 4$, in which case $\left(\frac{p}{q}\right) = -\left(\frac{q}{p}\right)$. Thus, for our example above, since $1009 \equiv 1 \pmod 4$ we conclude that $\left(\frac{3}{1009}\right) = \left(\frac{1009}{3}\right) = 1$, that is, 3 is a quadratic residue (mod 1009). Although conjectured by Euler and Legendre, it was Gauss who first proved the law.

5.2. Properties of the Legendre Symbol

Before proving the law of quadratic reciprocity, let's state some basic properties of the Legendre symbol.

Theorem 5.2.1. Let $p$ be an odd prime and $a, b \in \mathbb{Z}$ with $p \nmid ab$. Then
i) $\left(\frac{a}{p}\right) \equiv a^{(p-1)/2} \pmod p$.
ii) $\left(\frac{ab}{p}\right) = \left(\frac{a}{p}\right)\left(\frac{b}{p}\right)$.
iii) If $a \equiv b \pmod p$ then $\left(\frac{a}{p}\right) = \left(\frac{b}{p}\right)$.
iv) $\left(\frac{a^2}{p}\right) = 1$.

Proof. (i) Note that for any $a$ with $(a, p) = 1$, $a^{\frac{p-1}{2}} \equiv \pm 1 \pmod p$, since by Fermat's Little Theorem $a^{\frac{p-1}{2}}$ is a solution of the congruence $x^2 \equiv 1 \pmod p$, which has solutions $\pm 1 \pmod p$. By Euler's criterion, Theorem ??, $a$ is a quadratic residue (mod $p$) if and only if $a^{\frac{p-1}{2}} \equiv 1 \pmod p$. Thus $\left(\frac{a}{p}\right) = 1$ if and only if $a^{\frac{p-1}{2}} \equiv 1 \pmod p$. Otherwise we must have $\left(\frac{a}{p}\right) = -1$ and $a^{\frac{p-1}{2}} \equiv -1 \pmod p$. Thus in both cases $\left(\frac{a}{p}\right) \equiv a^{\frac{p-1}{2}} \pmod p$.

(ii) is immediate from part (i). (iii) and (iv) follow immediately from the definition of the Legendre symbol. □

Corollary 5.2.1. For any odd prime $p$,
$$\left(\frac{-1}{p}\right) = (-1)^{\frac{p-1}{2}} = \begin{cases} 1, & \text{if } p \equiv 1 \pmod 4; \\ -1, & \text{if } p \equiv 3 \pmod 4. \end{cases}$$

Proof. Immediate from part (i) of the preceding theorem. □

Example 5.2.1. Let's investigate the relationship between $\left(\frac{p}{3}\right)$ and $\left(\frac{3}{p}\right)$ for various primes $p$. As noted above, calculating $\left(\frac{p}{3}\right)$ is easy. To calculate $\left(\frac{3}{p}\right)$ we use Euler's criterion (part (i) of the preceding theorem) and a calculator.

p        5    7   11   13   17   19   23   29   31   37   41   43
(p/3)   −1    1   −1    1   −1    1   −1   −1    1    1   −1    1
(3/p)   −1   −1    1    1   −1   −1    1   −1   −1    1   −1   −1

Note that if $p \equiv 1 \pmod 4$ then the two values are equal, while if $p \equiv 3 \pmod 4$ they have opposite signs.

Example 5.2.2. Let's do the same thing for $\left(\frac{p}{5}\right)$ and $\left(\frac{5}{p}\right)$ for various primes $p$.

p        3    7   11   13   17   19   23   29   31   37   41   43
(p/5)   −1   −1    1   −1   −1    1   −1    1    1   −1    1   −1
(5/p)   −1   −1    1   −1   −1    1   −1    1    1   −1    1   −1

We see that the values are identical in this case! By studying further examples of this type one discovers that whenever we start with a prime $q$ of the form $q \equiv 3 \pmod 4$ we get the behavior of the first example (where $q = 3$), and whenever it is of the form $q \equiv 1 \pmod 4$, we get identical values as in the second example (where $q = 5$). This leads us to formulate the Law of Quadratic Reciprocity.

Theorem 5.2.2. Law of Quadratic Reciprocity. For any distinct odd primes $p, q$, $\left(\frac{p}{q}\right) = \left(\frac{q}{p}\right)$ unless $p \equiv q \equiv 3 \pmod 4$, in which case $\left(\frac{p}{q}\right) = -\left(\frac{q}{p}\right)$. Equivalently,
$$\left(\frac{p}{q}\right)\left(\frac{q}{p}\right) = (-1)^{\frac{p-1}{2}\frac{q-1}{2}}.$$

There are many proofs of quadratic reciprocity. Gauss published six proofs. In the next section we give a proof making use of an elementary lemma of Gauss.
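The tables in Examples 5.2.1 and 5.2.2 can be regenerated with Euler's criterion and modular exponentiation in place of a calculator. This is a small sketch; the function name `legendre` is an illustrative choice, and it assumes that $p$ is an odd prime not dividing $a$.

```python
def legendre(a, p):
    """Legendre symbol (a/p) via Euler's criterion; assumes p is an odd prime and p does not divide a."""
    value = pow(a, (p - 1) // 2, p)  # this is 1 or p - 1 when p does not divide a
    return 1 if value == 1 else -1


primes = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43]
for q in (3, 5):
    for p in primes:
        if p == q:
            continue
        # Predicted relation: equal values unless p and q are both 3 mod 4.
        sign = -1 if (p % 4 == 3 and q % 4 == 3) else 1
        print(q, p, legendre(p, q), legendre(q, p), legendre(p, q) == sign * legendre(q, p))
```

Every line printed ends in `True`, in agreement with the statement of Theorem 5.2.2 for these primes.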


5.3. Proof of the Law of Quadratic Reciprocity

Let $[x]$ denote the greatest integer function. It is elementary to prove the following lemma.

Lemma 5.3.1. i) For any integer $n$ and real $x$, $[x + n] = [x] + n$.
ii) For any real number $x$,
$$[-x] = \begin{cases} -[x], & \text{if } x \in \mathbb{Z}, \\ -[x] - 1, & \text{if } x \in \mathbb{R} - \mathbb{Z}. \end{cases}$$
iii) If $(\alpha, \beta)$ is an interval of reals with non-integer endpoints then the number of integers in this interval is $[\beta] - [\alpha]$.

Lemma 5.3.2. Gauss' Lemma. Let $p$ be an odd prime and $a$ an integer with $p \nmid a$. Consider the set of values
$$a,\ 2a,\ 3a,\ \dots,\ \tfrac{p-1}{2}a \pmod p,$$
each reduced to a value between $-\frac{p}{2}$ and $\frac{p}{2}$. Let $\nu$ be the number of negative values in the resulting set. Then $\left(\frac{a}{p}\right) = (-1)^{\nu}$.

Note 5.3.1. Gauss' Lemma tells us that the value of the Legendre symbol $\left(\frac{a}{p}\right)$ depends only on the parity of $\nu$; that is, we don't need to know the value of $\nu$, but only its value (mod 2).

Proof. Say $l_1 a, \dots, l_\nu a$ are congruent to the negative values, and $l_{\nu+1} a, \dots, l_{(p-1)/2}\, a$ are congruent to positive values. We claim that as subsets of $\mathbb{Z}_p$,
$$\{-l_1 a, \dots, -l_\nu a,\ l_{\nu+1} a, \dots, l_{(p-1)/2}\, a\} = \{1, 2, \dots, \tfrac{p-1}{2}\}.$$
But this is easy to see, since the values on the left-hand side are all distinct (mod $p$) (a congruence $l_i a \equiv \pm l_j a$ would force $l_i \equiv \pm l_j \pmod p$) and belong to the right-hand side. Thus,
$$(-l_1 a) \cdots (-l_\nu a)\, l_{\nu+1} a \cdots l_{(p-1)/2}\, a \equiv \left(\tfrac{p-1}{2}\right)! \pmod p,$$
that is,
$$(-1)^{\nu}\, l_1 l_2 \cdots l_{(p-1)/2}\; a^{\frac{p-1}{2}} \equiv \left(\tfrac{p-1}{2}\right)! \pmod p,$$
and the result follows by cancelation and Euler's criterion. □
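Gauss' Lemma can be tested directly by counting. In the sketch below, `gauss_nu` is a hypothetical helper that counts the multiples $ka$, $1 \le k \le \frac{p-1}{2}$, whose least positive residue exceeds $p/2$ (these are exactly the ones that become negative when reduced to the interval $(-\frac{p}{2}, \frac{p}{2})$); the result is compared with Euler's criterion.

```python
def gauss_nu(a, p):
    """Number of k in 1..(p-1)/2 with (k*a mod p) > p/2, i.e. the nu of Gauss' Lemma."""
    half = (p - 1) // 2
    return sum(1 for k in range(1, half + 1) if (k * a) % p > half)


def legendre_by_gauss(a, p):
    """Legendre symbol (a/p) computed as (-1)^nu; assumes p is an odd prime not dividing a."""
    return -1 if gauss_nu(a, p) % 2 else 1


def legendre_by_euler(a, p):
    """Legendre symbol (a/p) from Euler's criterion, for comparison."""
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1


for p in (7, 11, 13):
    print(p, [(a, gauss_nu(a, p), legendre_by_gauss(a, p)) for a in range(1, p)])
```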





Let $a$ be a positive integer and $p$ an odd prime with $p \nmid a$. How can we determine the value $\nu$ occurring in Gauss' Lemma? We must count the number of values $a, 2a, 3a, \dots, \frac{p-1}{2}a$ that belong to one of the intervals
$$\left(\tfrac{p}{2}, p\right),\ \left(\tfrac{3}{2}p, 2p\right),\ \left(\tfrac{5}{2}p, 3p\right),\ \dots,\ \left((b - \tfrac12)p,\ bp\right),$$
where $b = [a/2]$. The value $b$ is chosen such that $\frac{p-1}{2}a < (b + \frac12)p$ and $\frac{p+1}{2}a > bp$. Thus $\nu$ is the number of multiples $ka$, $1 \le k \le \frac{p-1}{2}$, in one of the intervals listed, and so our task is to count the number of values of $k$ such that $ka \in ((j - \frac12)p,\ jp)$ for some $j$ running from 1 to $b$. This is equivalent to counting the number of integers $k \in ((j - \frac12)p/a,\ jp/a)$ for some $j$ between 1 and $b$. Thus we have by Lemma 5.3.1(iii),


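Lemma 5.3.3. Let $a$ be a positive integer and $p$ be an odd prime with $p \nmid a$. The value $\nu$ occurring in Gauss' Lemma is given by
$$\nu = \sum_{j=1}^{b} \left( \left[ j\,\frac{p}{a} \right] - \left[ (j - \tfrac12)\frac{p}{a} \right] \right),$$
where $b = [a/2]$. (In particular $b$ depends only on $a$, not on $p$.)

Theorem 5.3.1. For any odd prime $p$,
$$\left(\frac{2}{p}\right) = (-1)^{\frac{p^2-1}{8}} = \begin{cases} 1, & \text{if } p \equiv \pm 1 \pmod 8, \\ -1, & \text{if } p \equiv \pm 3 \pmod 8. \end{cases}$$

Proof. In this case $a = 2$, $b = a/2 = 1$ and by the preceding lemma, $\nu = [p/2] - [p/4]$. Writing $p = 8t + r$ for some $t \ge 0$, $r \in \{1, 3, 5, 7\}$, we obtain $\nu = 2t + [r/2] - [r/4] \equiv [r/2] - [r/4] \pmod 2$, and the theorem follows. □

Homework 5.3.1. Prove a related result for $a = 3$:
$$\left(\frac{3}{p}\right) = \begin{cases} 1, & \text{if } p \equiv \pm 1 \pmod{12}, \\ -1, & \text{if } p \equiv \pm 5 \pmod{12}. \end{cases}$$

The cases $a = 2$ and $a = 3$ suggest that the value of the Legendre symbol $\left(\frac{a}{p}\right)$ only depends on the value of $p$ (mod $4a$), and moreover if $p \equiv -q \pmod{4a}$ then $(a/p) = (a/q)$. This is proven in the next lemma.

Lemma 5.3.4. Let $a$ be a positive integer and $p, q$ be odd primes with $p \nmid a$, $q \nmid a$. If $p \equiv \pm q \pmod{4a}$ then $\left(\frac{a}{p}\right) = \left(\frac{a}{q}\right)$.

Proof. Suppose that $p \equiv r \pmod{4a}$, say $p = 4at + r$ for some $t, r \in \mathbb{Z}$. Then by Lemma 5.3.3,
$$\begin{aligned}
\nu &= \sum_{j=1}^{b} \left( \left[ j\,\frac{4at + r}{a} \right] - \left[ (j - \tfrac12)\frac{4at + r}{a} \right] \right) \\
&= \sum_{j=1}^{b} \left( \left[ 4tj + \frac{jr}{a} \right] - \left[ (j - \tfrac12)\left(4t + \frac{r}{a}\right) \right] \right) \\
&= 2tb + \sum_{j=1}^{b} \left( \left[ \frac{jr}{a} \right] - \left[ (j - \tfrac12)\frac{r}{a} \right] \right) \\
&\equiv \sum_{j=1}^{b} \left( \left[ \frac{jr}{a} \right] - \left[ (j - \tfrac12)\frac{r}{a} \right] \right) \pmod 2.
\end{aligned}$$
In particular, we see that the parity of $\nu$ only depends on $r$ and so we can write $\nu \equiv \nu(r) \pmod 2$. Thus if $p \equiv q \pmod{4a}$ then $(a/p) = (a/q)$. Similarly if $p \equiv -r \pmod{4a}$ then we have the above equality with $-r$ in place of $r$. Now since $(r, a) = 1$, $r$ is odd and $j < a$, the values $jr/a$ and $(j - \frac12)r/a$ are never integers, and so $[-jr/a] = -[jr/a] - 1$ and $\left[-(j - \frac12)\frac{r}{a}\right] = -\left[(j - \frac12)\frac{r}{a}\right] - 1$. Thus we see that $\nu(r) \equiv \nu(-r) \pmod 2$. □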
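The floor-function formula of Lemma 5.3.3 and the periodicity statement of Lemma 5.3.4 can both be checked numerically. The sketch below compares the formula with a direct count; the names `nu_direct` and `nu_formula` are illustrative, and the identity $(j - \frac12)\frac{p}{a} = \frac{(2j-1)p}{2a}$ is used so that only integer arithmetic is needed.

```python
def nu_direct(a, p):
    """nu of Gauss' Lemma by direct count: multiples k*a, 1 <= k <= (p-1)/2, with residue > p/2."""
    half = (p - 1) // 2
    return sum(1 for k in range(1, half + 1) if (k * a) % p > half)


def nu_formula(a, p):
    """nu from Lemma 5.3.3: sum over j = 1..[a/2] of [j*p/a] - [(j - 1/2)*p/a]."""
    b = a // 2
    return sum((j * p) // a - ((2 * j - 1) * p) // (2 * a) for j in range(1, b + 1))


odd_primes = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59]
for a in (2, 3, 5, 6):
    print(a, [(p, nu_formula(a, p) % 2) for p in odd_primes if a % p != 0])
```

For $a = 2$ the printed parities depend only on $p$ (mod 8), and for $a = 3$ only on $p$ (mod 12), as Theorem 5.3.1 and Homework 5.3.1 predict.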

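Theorem 5.3.2. Law of Quadratic Reciprocity. For any distinct odd primes $p, q$, $\left(\frac{p}{q}\right) = \left(\frac{q}{p}\right)$ unless $p \equiv q \equiv 3 \pmod 4$, in which case $\left(\frac{p}{q}\right) = -\left(\frac{q}{p}\right)$. Equivalently,
$$\left(\frac{p}{q}\right)\left(\frac{q}{p}\right) = (-1)^{\frac{p-1}{2}\frac{q-1}{2}}.$$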

Proof. Case i: Suppose that $p \equiv q \pmod 4$, with $p > q$, say $p = q + 4a$. Then $(p/q) = (4a/q) = (a/q)$ and $(q/p) = (-4a/p) = (-1)^{(p-1)/2}(a/p)$. Since $p \equiv q \pmod{4a}$, we have by the preceding lemma that $(a/p) = (a/q)$. Thus $(p/q) = (q/p)$ if $p \equiv q \equiv 1 \pmod 4$, while $(p/q) = -(q/p)$ if $p \equiv q \equiv 3 \pmod 4$.

Case ii: Suppose that $p \equiv -q \pmod 4$, say $p = -q + 4a$. Then $(p/q) = (4a/q) = (a/q)$ and $(q/p) = (4a/p) = (a/p)$. Since $p \equiv -q \pmod{4a}$, by the previous lemma we have $(a/p) = (a/q)$. Therefore $(p/q) = (q/p)$. □

Example 5.3.1. Calculate $(7/1009)$ and $(86/1117)$. Note 1009, 1117 are primes with $1009 \equiv 1 \pmod 8$ and $1117 \equiv 5 \pmod 8$.
$$\left(\frac{7}{1009}\right) = \left(\frac{1009}{7}\right) = \left(\frac{1}{7}\right) = 1.$$
$$\begin{aligned}
\left(\frac{86}{1117}\right) &= \left(\frac{2}{1117}\right)\left(\frac{43}{1117}\right) = (-1)\left(\frac{1117}{43}\right) = (-1)\left(\frac{42}{43}\right) \\
&= (-1)\left(\frac{2}{43}\right)\left(\frac{21}{43}\right) = (-1)(-1)\left(\frac{43}{21}\right) = (-1)(-1)\left(\frac{1}{21}\right) = 1.
\end{aligned}$$

Homework 5.3.2. i) Determine $(3/p)$ again, this time using reciprocity. Note again how its value only depends on the value of $p$ (mod 12). ii) Determine $(5/p)$ using reciprocity, and note how its value only depends on the value of $p$ (mod 20).

5.4. The Jacobi Symbol

In order to evaluate the Legendre symbol in the most efficient manner, we need to generalize the Legendre symbol and the Law of Quadratic Reciprocity to non-prime entries. This is done using the Jacobi symbol.

Definition 5.4.1. Let $P, Q \in \mathbb{Z}$ with $(P, Q) = 1$ and $Q$ a positive odd integer with prime factorization $Q = \prod_{i=1}^{k} q_i$. (The $q_i$ need not be distinct.) Then the Jacobi symbol for $P$ over $Q$ is given by
$$\left(\frac{P}{Q}\right) := \prod_{i=1}^{k} \left(\frac{P}{q_i}\right).$$

Note 5.4.1. i) If $Q$ is a prime, the Jacobi symbol is the same as the Legendre symbol. ii) If $(P/Q) = -1$ then $x^2 \equiv P \pmod Q$ has no solution. iii) If $(P/Q) = 1$ no conclusion can be made; e.g. $(2/9) = 1$ but 2 is not a square (mod 9).

Theorem 5.4.1. Let $Q, Q'$ be positive odd integers and $P, P'$ be integers with $(PP', QQ') = 1$. Then
i) $\left(\frac{P}{Q}\right)\left(\frac{P}{Q'}\right) = \left(\frac{P}{QQ'}\right)$.

ii) $\left(\frac{P}{Q}\right)\left(\frac{P'}{Q}\right) = \left(\frac{PP'}{Q}\right)$.
iii) $(P/Q^2) = (P^2/Q) = 1$.
iv) If $P' \equiv P \pmod Q$, then $\left(\frac{P'}{Q}\right) = \left(\frac{P}{Q}\right)$.

Proof. Immediate from Theorem 5.2.1.



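Theorem 5.4.2. If $Q$ is odd and $Q > 0$, then
i) $\left(\frac{-1}{Q}\right) = (-1)^{\frac{Q-1}{2}}$.
ii) $\left(\frac{2}{Q}\right) = (-1)^{\frac{Q^2-1}{8}} = \begin{cases} 1, & \text{if } Q \equiv \pm 1 \pmod 8, \\ -1, & \text{if } Q \equiv \pm 3 \pmod 8. \end{cases}$

Proof. i) Let $Q = \prod_{i=1}^{k} q_i$. Then
$$\left(\frac{-1}{Q}\right) = \prod_{i=1}^{k} \left(\frac{-1}{q_i}\right) = \prod_{i=1}^{k} (-1)^{\frac{q_i - 1}{2}} = (-1)^{\sum_{i=1}^{k} \frac{q_i - 1}{2}}.$$
We will be done if we can show that
$$\sum_{i=1}^{k} \frac{q_i - 1}{2} \equiv \frac{Q - 1}{2} = \frac{\prod_{i=1}^{k} q_i - 1}{2} \pmod 2, \tag{5.1}$$
that is,
$$\sum_{i=1}^{k} (q_i - 1) \equiv \prod_{i=1}^{k} q_i - 1 \pmod 4.$$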

We may cancel any $q_i \equiv 1 \pmod 4$ since they do not contribute to either side. Thus, assuming all $q_i \equiv 3 \pmod 4$, we get $\sum_{i=1}^{k} (q_i - 1) \equiv 2k \pmod 4$ and $\prod_{i=1}^{k} q_i - 1 \equiv (-1)^k - 1 \pmod 4$. It is a simple matter to verify that $2k \equiv (-1)^k - 1 \pmod 4$ for any integer $k$.

ii) Let $Q = \prod_{i=1}^{k} q_i$. Then
$$\left(\frac{2}{Q}\right) = \prod_{i=1}^{k} \left(\frac{2}{q_i}\right) = (-1)^{\sum_{i=1}^{k} \frac{q_i^2 - 1}{8}},$$
and so we must show that
$$\sum_{i=1}^{k} \frac{q_i^2 - 1}{8} \equiv \frac{Q^2 - 1}{8} \pmod 2,$$
that is,
$$\sum_{i=1}^{k} (q_i^2 - 1) \equiv \prod_{i=1}^{k} q_i^2 - 1 \pmod{16}.$$
Now the square of any odd number is $\equiv 1 \pmod 8$, and so $\equiv 1$ or $9 \pmod{16}$. We may eliminate those primes with $q_i^2 \equiv 1 \pmod{16}$ from both sides. Thus, assuming all $q_i^2 \equiv 9 \pmod{16}$, we get
$$\sum_{i=1}^{k} (q_i^2 - 1) \equiv 8k \pmod{16},$$
while
$$\prod_{i=1}^{k} q_i^2 - 1 \equiv 9^k - 1 \pmod{16}.$$

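If $k$ is even, $8k \equiv 9^k - 1 \equiv 0 \pmod{16}$, while for $k$ odd, $8k \equiv 9^k - 1 \equiv 8 \pmod{16}$. □

Theorem 5.4.3. General Law of Quadratic Reciprocity. Let $P, Q$ be odd positive integers with $(P, Q) = 1$. Then
$$\left(\frac{P}{Q}\right)\left(\frac{Q}{P}\right) = (-1)^{\frac{P-1}{2}\frac{Q-1}{2}}.$$

Proof. Let $P = \prod_{i=1}^{r} p_i$, $Q = \prod_{j=1}^{s} q_j$. Then
$$\left(\frac{P}{Q}\right) = \prod_{i=1}^{r}\prod_{j=1}^{s} \left(\frac{p_i}{q_j}\right) = \prod_{i=1}^{r}\prod_{j=1}^{s} (-1)^{\frac{p_i - 1}{2}\frac{q_j - 1}{2}} \left(\frac{q_j}{p_i}\right) = (-1)^{\sum_{i=1}^{r}\sum_{j=1}^{s} \frac{p_i - 1}{2}\frac{q_j - 1}{2}} \left(\frac{Q}{P}\right).$$
Thus, we must show that
$$\sum_{i=1}^{r}\sum_{j=1}^{s} \frac{p_i - 1}{2}\,\frac{q_j - 1}{2} \equiv \frac{P - 1}{2}\,\frac{Q - 1}{2} \pmod 2,$$
that is,
$$\frac{P - 1}{2}\,\frac{Q - 1}{2} \equiv \left(\sum_{i=1}^{r} \frac{p_i - 1}{2}\right)\left(\sum_{j=1}^{s} \frac{q_j - 1}{2}\right) \pmod 2.$$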

But this follows from our observation (5.1) above, applied to both $P$ and $Q$. □

5.4.1. Euclidean-type algorithm for evaluating a Jacobi symbol. Let $A, B$ be any two positive integers with $B$ odd. We shall construct an algorithm that will calculate $\gcd(A, B)$ and determine, at the same time, the value of the Jacobi symbol $\left(\frac{A}{B}\right)$, in the case where $\gcd(A, B) = 1$. First reduce $A$ (mod $B$), say $A \equiv A_1 \pmod B$, with $0 \le A_1 < B$. If $A_1 \ge 1$, write $A_1 = 2^k A_2$ with $A_2$ odd, $1 \le A_2 < B$. Then
$$\left(\frac{A}{B}\right) = \left(\frac{A_1}{B}\right) = \left(\frac{2^k A_2}{B}\right) = \left(\frac{2^k}{B}\right)\left(\frac{A_2}{B}\right) = \left(\frac{2}{B}\right)^{k}\left(\frac{A_2}{B}\right).$$
Now $\left(\frac{2}{B}\right)$ can be evaluated using Theorem 5.4.2 and so we can dispense with this term. By quadratic reciprocity (and the assumption that $B$ and $A_2$ are relatively prime) we obtain
$$\left(\frac{A_2}{B}\right) = (-1)^{\frac{A_2 - 1}{2}\frac{B - 1}{2}} \left(\frac{B}{A_2}\right).$$
Note also that since $B$ is odd, $\gcd(A, B) = \gcd(A_1, B) = \gcd(A_2, B) = \gcd(B, A_2)$, that is, the greatest common divisor remains invariant under the reduction above. We then reduce $B$ (mod $A_2$), and continue repeating the process above, following essentially the Euclidean algorithm. The algorithm stops when we obtain a 0 or 1 in the top position of the Jacobi symbol. If we obtain a 0, then $A$ and $B$ are not relatively prime and $\gcd(A, B)$ is just the entry below the 0. (In this case the


Jacobi symbol used in the algorithm was not a true Jacobi symbol, but merely a tracking symbol for calculating the greatest common divisor.) If we obtain a 1, then $A$ and $B$ are relatively prime and we have successfully calculated the Jacobi symbol.

Example 5.4.1. Evaluate $\left(\frac{187}{97}\right)$. We have, noting that $97 \equiv 1 \pmod 8$,
$$\left(\frac{187}{97}\right) = \left(\frac{90}{97}\right) = \left(\frac{2}{97}\right)\left(\frac{45}{97}\right) = \left(\frac{97}{45}\right) = \left(\frac{7}{45}\right) = \left(\frac{45}{7}\right) = \left(\frac{3}{7}\right) = -\left(\frac{7}{3}\right) = -\left(\frac{1}{3}\right) = -1.$$
Again, we note that it is not necessary to know ahead of time that $A$ and $B$ are relatively prime. That fact will be revealed in the process: if they are not relatively prime, you will eventually end up with a zero on top.

5.5. Local solvability implies global solvability

We start with the following theorem.

Theorem 5.5.1. Let $a \in \mathbb{Z}$. The congruence $x^2 \equiv a \pmod p$ is solvable for all primes $p$ if and only if $a$ is a perfect square.

Proof. If $a$ is a perfect square, then certainly $a$ is a quadratic residue mod $p$ for any prime $p$. Suppose now that $a$ is not a perfect square. It suffices to show that there exists a positive odd integer $P$ such that $\left(\frac{a}{P}\right) = -1$, for this would imply that $\left(\frac{a}{p}\right) = -1$ for some prime $p \mid P$, and therefore $x^2 \equiv a \pmod p$ is not solvable. We consider three cases.

Case i: $a = \pm 2^k b$ for some positive odd integers $k, b$. In this case we select $P$ so that $P \equiv 5 \pmod 8$ and $P \equiv 1 \pmod b$. Such a $P$ exists by the Chinese Remainder Theorem. Then
$$\left(\frac{a}{P}\right) = \left(\frac{\pm 1}{P}\right)\left(\frac{2}{P}\right)^{k}\left(\frac{b}{P}\right) = 1 \cdot (-1) \cdot \left(\frac{P}{b}\right) = -\left(\frac{1}{b}\right) = -1.$$

Case ii: $a = \pm 2^k q^l b$ for some nonnegative even integer $k$, odd prime $q$ with odd exponent $l$, and positive odd integer $b$ not divisible by $q$. Choose $P$ so that
$$P \equiv 1 \pmod{4b} \quad \text{and} \quad P \equiv \lambda \pmod q,$$
where $\lambda$ is a quadratic nonresidue mod $q$. Then
$$\left(\frac{a}{P}\right) = \left(\frac{\pm 1}{P}\right)\left(\frac{q}{P}\right)\left(\frac{b}{P}\right) = 1 \cdot \left(\frac{P}{q}\right)\left(\frac{P}{b}\right) = \left(\frac{\lambda}{q}\right)\left(\frac{1}{b}\right) = -1.$$

Case iii: $a = -b^2$ for some integer $b$. Choose $P \equiv 3 \pmod 4$ with $(P, b) = 1$. Then $\left(\frac{a}{P}\right) = \left(\frac{-1}{P}\right) = -1$. □

Note 5.5.1. A closer examination of the proof reveals the following refinement: If the congruence $x^2 \equiv a \pmod p$ is solvable for all odd primes $p \le 4|a|$, then $a$ is a perfect square.
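The Jacobi symbols used throughout this proof, and in Example 5.4.1, can be evaluated mechanically by the Euclidean-type algorithm of Section 5.4.1. The following is a sketch of that procedure under the stated assumptions ($a \ge 0$, $b$ odd and positive); the function name `jacobi_and_gcd` and the convention of returning 0 when the entries are not relatively prime are choices made for this illustration.

```python
def jacobi_and_gcd(a, b):
    """Return (symbol, gcd(a, b)) for a >= 0 and odd b > 0.

    symbol is the Jacobi symbol (a/b) when gcd(a, b) = 1, and 0 otherwise.
    """
    sign = 1
    a %= b
    while a != 0:
        while a % 2 == 0:
            # Remove a factor of 2 from the top: (2/b) = -1 iff b = +-3 (mod 8).
            a //= 2
            if b % 8 in (3, 5):
                sign = -sign
        # Reciprocity for odd a, b: flip the sign iff a = b = 3 (mod 4).
        if a % 4 == 3 and b % 4 == 3:
            sign = -sign
        # Flip the symbol and reduce the new top entry modulo the new bottom entry.
        a, b = b % a, a
    # The loop ends with 0 on top; the entry below it is gcd of the original pair.
    return (sign if b == 1 else 0), b


print(jacobi_and_gcd(187, 97))
print(jacobi_and_gcd(86, 1117))
print(jacobi_and_gcd(15, 9))
```

The first two calls reproduce Example 5.4.1 and Example 5.3.1; the third shows the tracking behaviour when the two entries share a common factor.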


The above theorem can be restated as follows: The quadratic congruence $x^2 - ay^2 \equiv 0 \pmod p$ has a nonzero (that is, $p \nmid x$ or $p \nmid y$) solution for all primes $p$ if and only if the equation $x^2 - ay^2 = 0$ has a nonzero integer solution. Solvability mod $p$ is called a local condition, while solvability over $\mathbb{Z}$ is called global. Thus nontrivial local solvability implies nontrivial global solvability for the case of a homogeneous quadratic equation in two variables. This can be generalized into a principle called the Hasse-Minkowski principle, stated in the following theorem, which we will not prove here; see [3, Theorem 1, pg 61].

Theorem 5.5.2. Hasse, Minkowski. Let $Q(x_1, \dots, x_n)$ be a quadratic form with integer coefficients. Then the equation
$$Q(x_1, \dots, x_n) = 0 \tag{5.2}$$
has a nonzero integer solution if and only if for any prime $p$ and positive integer $e$, the congruence
$$Q(x_1, \dots, x_n) \equiv 0 \pmod{p^e} \tag{5.3}$$
has an integer solution with $p \nmid x_i$ for some $i$, and the equation (5.2) has a nonzero solution in real numbers $x_i$.

5.6. Sums of two Squares

Our goal in this section is to characterize all integers that can be expressed as a sum of two squares of integers. Suppose that $n$ is a sum of two squares, $n = a^2 + b^2$. Since any square is congruent to 0 or 1 (mod 4), $n \equiv a^2 + b^2 \equiv 0, 1$ or $2 \pmod 4$. Thus we have a necessary condition for $n$ to be a sum of two squares. Unfortunately, this is not a sufficient condition. For example $6 \equiv 2 \pmod 4$, $12 \equiv 0 \pmod 4$ and $21 \equiv 1 \pmod 4$, but 6, 12 and 21 are not sums of two squares. The problem is that 6, 12 and 21 have the prime factor 3, which cannot be expressed as a sum of squares. If we restrict our attention to primes then the necessary condition is sufficient.

Theorem 5.6.1. Let $p$ be an odd prime. Then $p$ is a sum of two squares if and only if $p \equiv 1 \pmod 4$. Moreover, this representation is unique up to order and $\pm$ signs.

To prove the theorem we make use of the Gaussian integers $\mathbb{Z}[i]$, which as we saw earlier is a Euclidean domain with respect to the mapping $\delta(a + bi) = a^2 + b^2$, and therefore a unique factorization domain. Note that $\delta$ is a multiplicative function, that is, for any $w, z \in \mathbb{Z}[i]$ we have $\delta(wz) = \delta(w)\delta(z)$, or equivalently, $|wz|^2 = |w|^2 |z|^2$.

Proof. Existence: If $p$ is a sum of two squares, then as we saw above $p \equiv 0, 1$ or $2 \pmod 4$, but $p$ is odd, so $p \equiv 1 \pmod 4$. Consider now the converse. Suppose that $p \equiv 1 \pmod 4$. Then $-1$ is a quadratic residue (mod $p$), that is, there exists an integer $u$ with $u^2 \equiv -1 \pmod p$. Thus $p \mid (u^2 + 1)$, that is, $p \mid (u + i)(u - i)$ in the Gaussian integers $\mathbb{Z}[i]$. If $p$ is irreducible in the UFD $\mathbb{Z}[i]$, then $p \mid (u + i)$ or $p \mid (u - i)$, a contradiction, since $(u \pm i)/p$ is not a Gaussian integer. Therefore, $p$ is reducible, and so by Problem 2.13.1, $p$ is a sum of squares.

Uniqueness: If $p = a^2 + b^2$, then $p = (a + bi)(a - bi)$ in $\mathbb{Z}[i]$, where $\delta(a + bi) = \delta(a - bi) = p$. Thus $a + bi$ and $a - bi$ are irreducibles in $\mathbb{Z}[i]$ and so by unique factorization, they are uniquely determined up to unit multiples. Note the


unit multiples of $a + bi$ are $a + bi$, $-a - bi$, $-b + ai$, $b - ai$, corresponding to the representations $p = a^2 + b^2 = (-a)^2 + (-b)^2 = (-b)^2 + a^2 = b^2 + (-a)^2$, while the unit multiples of $a - bi$ are $a - bi$, $-a + bi$, $b + ai$, $-b - ai$, corresponding to $p = a^2 + (-b)^2 = (-a)^2 + b^2 = b^2 + a^2 = (-b)^2 + (-a)^2$. □

To generalize this theorem to an arbitrary positive integer, let us recall that by unique factorization any positive integer $n$ can be uniquely expressed in the manner $n = n_1 n_2^2$ with $n_1$ square-free. Indeed, if $n = \prod_{i=1}^{k} p_i^{e_i}$, then we rearrange the primes so that the first $l$ primes occur to an odd multiplicity while the remaining occur to an even multiplicity. Then, with $e_i = 2f_i + 1$, $1 \le i \le l$, and $e_i = 2f_i$, $l + 1 \le i \le k$,
$$n = \prod_{i=1}^{l} p_i^{2f_i + 1} \prod_{i=l+1}^{k} p_i^{2f_i} = \prod_{i=1}^{l} p_i \prod_{i=1}^{k} p_i^{2f_i} = n_1 n_2^2,$$
where $n_1 = \prod_{i=1}^{l} p_i$ and $n_2 = \prod_{i=1}^{k} p_i^{f_i}$.

Theorem 5.6.2. Let $n$ be a positive integer with $n = n_1 n_2^2$, where $n_1$ is square-free. Then $n$ is a sum of two squares if and only if $n_1$ has no prime divisor $p \equiv 3 \pmod 4$.

Proof. Necessity: Let $p$ be a prime divisor of $n$ with $p \equiv 3 \pmod 4$. We shall prove by induction on $e$ that if $p^{2e+1} \,\|\, n$, then $n$ is not a sum of two squares. Suppose $e = 0$ and that $p \,\|\, n$. If $n = a^2 + b^2$ for some $a, b \in \mathbb{Z}$, then $a^2 + b^2 \equiv 0 \pmod p$, and so $p \mid a$ and $p \mid b$, for otherwise we would have $(ab^{-1})^2 \equiv -1 \pmod p$, contradicting $\left(\frac{-1}{p}\right) = -1$. But this implies that $p^2 \mid (a^2 + b^2)$, that is, $p^2 \mid n$, a contradiction.

Suppose the assertion is true for $e - 1$ and consider the case $e$. Say $p^{2e+1} \,\|\, n$. By the same argument, if $n = a^2 + b^2$ for some integers $a, b$ then $p \mid a$ and $p \mid b$, and so $n/p^2 = (a/p)^2 + (b/p)^2$, but this is impossible by the induction assumption, since $p^{2(e-1)+1} \,\|\, n/p^2$. We conclude that any prime divisor $p$ of $n$ with $p \equiv 3 \pmod 4$ cannot be a factor of $n_1$.

Sufficiency: Say $n_1 = \prod_{i=1}^{l} p_i$ where $p_i = 2$ or $p_i \equiv 1 \pmod 4$, $1 \le i \le l$. Then for $1 \le i \le l$, $p_i$ is a sum of two squares by the previous theorem (and $2 = 1^2 + 1^2$), and so $p_i = \delta(w_i)$ for some Gaussian integer $w_i$. Putting $w = \prod_{i=1}^{l} w_i$ we see that $\delta(w) = n_1$ and $\delta(n_2 w) = n_2^2 \delta(w) = n$. Therefore $n$ is a sum of two squares. □

The above proof actually reveals a way of counting the number of representations of an integer as a sum of two squares.

Definition 5.6.1. i) Let $R(n)$ denote the number of ways of representing $n$ as a sum of two squares. Thus if we let $S(n) := \{(a, b) \in \mathbb{Z}^2 : a^2 + b^2 = n\}$, then $R(n) = |S(n)|$. ii) Let $r(n)$ denote the number of primitive representations of $n$ as a sum of two squares, that is, if $S'(n) := \{(a, b) \in \mathbb{Z}^2 : a^2 + b^2 = n,\ \gcd(a, b) = 1\}$, then $r(n) = |S'(n)|$.

Example 5.6.1. If $p \equiv 1 \pmod 4$ then $p$ has an essentially unique representation as a sum of squares and so $R(p) = r(p) = 8$. Similarly, $R(2) = r(2) = 4$.
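The quantities $R(n)$ and $r(n)$ of Definition 5.6.1 are easy to compute by exhaustive search for small $n$, which gives a quick way to experiment with the statements of this section. The sketch below is a brute-force illustration only; the function names are hypothetical.

```python
from math import gcd, isqrt


def representations(n):
    """Return the set S(n) of all integer pairs (a, b) with a^2 + b^2 = n, as a list."""
    reps = []
    for a in range(-isqrt(n), isqrt(n) + 1):
        rest = n - a * a
        b = isqrt(rest)
        if b * b == rest:
            reps.append((a, b))
            if b != 0:
                reps.append((a, -b))
    return reps


def R(n):
    """Number of representations of n as a sum of two squares."""
    return len(representations(n))


def r(n):
    """Number of primitive representations (those with gcd(a, b) = 1)."""
    return sum(1 for a, b in representations(n) if gcd(a, b) == 1)


for n in (2, 5, 6, 12, 13, 21):
    print(n, R(n), r(n))
```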


For $n = 25$ we have the representations $25 = 3^2 + 4^2$ and $25 = 5^2 + 0^2$. Thus $R(25) = 12$, $r(25) = 8$.

Theorem 5.6.3. Let $n$ be a positive integer with $n = 2^e n_1 n_2$, where $n_1$ consists of prime factors $p \equiv 1 \pmod 4$ and $n_2$ of prime factors $p \equiv 3 \pmod 4$. Then $R(n) = 0$ if $n_2$ is not a perfect square and $R(n) = 4\tau(n_1)$ if $n_2$ is a perfect square.

Proof. Note that there is a one-to-one correspondence between the elements of $S(n)$ and the factorizations of $n$ in $\mathbb{Z}[i]$ of the form $n = w\bar{w}$. If $(a, b) \in S(n)$ then $(a + bi)(a - bi) = n$ and so we obtain $n = w\bar{w}$ with $w = a + bi$. Conversely, if $n = w\bar{w}$ with $w = a + bi$, then $n = a^2 + b^2$ and so $(a, b) \in S(n)$. Thus $R(n) = \#\{w \in \mathbb{Z}[i] : w\bar{w} = n\}$.

Recall, the units in $\mathbb{Z}[i]$ are just $\{\pm 1, \pm i\}$, and we have the following factorizations of integer primes in $\mathbb{Z}[i]$:
$2 = i\pi_2^2$, where $\pi_2 = 1 - i$;
$p = \pi_p \bar{\pi}_p$, if $p \equiv 1 \pmod 4$;
$q$ is prime in $\mathbb{Z}[i]$, if $q \equiv 3 \pmod 4$.

Moreover, the Gaussian primes listed above, $\pi_2$, $\pi_p$, $\bar{\pi}_p$ and $q$, are (up to associates) all of the Gaussian primes. We write $z \sim w$ if $z$ and $w$ are associates, that is, $z = uw$ for some unit $u$. Let $n$ be a positive integer with prime factorization $n = 2^e n_1 n_2$, where $n_1$ consists of prime divisors $p \equiv 1 \pmod 4$ and $n_2$ of prime divisors $q \equiv 3 \pmod 4$. We have already seen that $n$ is a sum of squares if and only if $n_2$ is a perfect square; thus, assuming such is the case, we can write
$$n_1 = \prod_{i=1}^{k} p_i^{e_i}, \qquad n_2 = \prod_{j=1}^{r} q_j^{2f_j},$$
for some distinct primes $p_i \equiv 1 \pmod 4$ and $q_j \equiv 3 \pmod 4$. Thus, writing $p_i = \pi_i \bar{\pi}_i$, the prime factorization of $n$ in $\mathbb{Z}[i]$ is given by
$$n = i^e \pi_2^{2e} \prod_{i=1}^{k} \pi_i^{e_i} \bar{\pi}_i^{e_i} \prod_{j=1}^{r} q_j^{2f_j}.$$
Let $w$ be a Gaussian integer with $w\bar{w} = n$. In particular, $w$ is a divisor of $n$ and so it has a prime factorization
$$w = u\,\pi_2^{E} \prod_{i=1}^{k} \pi_i^{E_i} \bar{\pi}_i^{G_i} \prod_{j=1}^{r} q_j^{F_j}, \tag{5.4}$$
for some unit $u$ and nonnegative integers $E$, $E_i$, $G_i$, $F_j$, $1 \le i \le k$, $1 \le j \le r$. Thus by unique factorization we must have $E = e$, $F_j = f_j$, $1 \le j \le r$, and $E_i + G_i = e_i$, $1 \le i \le k$. There are $(e_i + 1)$ choices for $(E_i, G_i)$ and thus altogether $4\prod_{i=1}^{k}(e_i + 1) = 4\tau(n_1)$ choices for $w$, the factor 4 coming from the 4 units. □

The primitive representations of $n$ as a sum of two squares will correspond to those $w$ in (5.4) with $E = 0$ or $1$, $F_j = 0$, $1 \le j \le r$, and $E_i G_i = 0$, $1 \le i \le k$ (note if both $E_i$ and $G_i$ are positive, then $p_i \mid w$). In particular such a representation is only possible if $e = 0$ or $1$, that is, $n$ is odd or $2 \,\|\, n$, and $n_2 = 1$. For those admissible $n$ there are only two choices for each $(E_i, G_i)$, one choice for each $F_j$ and one choice for $E$,


$E = 0$ if $n$ is odd, $E = 1$ if $2 \,\|\, n$. Altogether we obtain $4 \cdot 2^k$ choices for $w$. Thus we have the following.

Theorem 5.6.4. A positive integer $n$ has a primitive representation as a sum of two squares if and only if $n = n_1$ or $2n_1$ where $n_1$ has only prime divisors of the form $p \equiv 1 \pmod 4$, in which case $r(n) = 2^{k+2}$ where $k$ is the number of distinct prime divisors of $n_1$.

Note that the argument above also gives us a way of constructing all of the representations of $n$ as a sum of two squares, provided we know the factorization of $n$ in $\mathbb{Z}[i]$.

Example 5.6.2. Find all of the primitive representations of $3250 = 2 \cdot 5^3 \cdot 13$ as a sum of squares. According to the theorem there are $2^4 = 16$ such representations, which really means two representations up to order and $\pm$ signs. Start by noting $5 = (1 + 2i)(1 - 2i)$, $13 = (3 + 2i)(3 - 2i)$, $2 = (1 + i)(1 - i)$. The choices for $w$ (up to order and sign) are $(1 - i)(1 + 2i)^3(3 + 2i) = -57 + i$ and $(1 - i)(1 - 2i)^3(3 + 2i) = -53 + 21i$. Thus $3250 = 57^2 + 1^2 = 53^2 + 21^2$. Next find the imprimitive representations. Since $R(3250) = 4(3 + 1)(1 + 1) = 32$ there are 16 such. Here, the choices for $w$ are $(1 - i)(1 + 2i)^2(1 - 2i)(3 + 2i) = 35 + 45i$ and $(1 - i)(1 + 2i)(1 - 2i)^2(3 + 2i) = 15 - 55i$. Thus we get $3250 = 35^2 + 45^2$ and $3250 = 15^2 + 55^2$.

5.6.1. Algorithm for representing a prime as a sum of two squares. Let $p$ be a prime with $p \equiv 1 \pmod 4$. We shall outline an algorithm for representing $p$ as a sum of two squares that, on the assumption of the Generalized Riemann Hypothesis (GRH), runs in polynomial time. First we need an integer $u$ with $u^2 \equiv -1 \pmod p$. It is easy to find such a $u$ provided that we know a single quadratic nonresidue (mod $p$). Indeed, if $r$ is a quadratic nonresidue (mod $p$), then $r^{\frac{p-1}{2}} \equiv -1 \pmod p$ and so we can take $u \equiv r^{\frac{p-1}{4}} \pmod p$. In particular, if $p \equiv 5 \pmod 8$, then $r = 2$ works.
Bach [2] proved (on the assumption of GRH) that for any $p$ there exists a quadratic nonresidue $r$ with $r < 2\log^2 p$, and so one can simply search for such a value by brute force. This was sharpened to $r \le \frac{3}{2}\log^2 p$ by Wedeniwski [14]. Having found $u$ we then let $L$ be the lattice of points $L = \{\lambda_1(1, u) + \lambda_2(0, p) : \lambda_1, \lambda_2 \in \mathbb{Z}\}$. Every point $(x, y) \in L$ satisfies the congruence $x^2 + y^2 \equiv 0 \pmod p$, and the minimal nonzero point (with respect to the Euclidean norm) satisfies $x^2 + y^2 = p$. One can then use Gauss' algorithm for finding the minimal point in a two dimensional lattice to obtain the minimal point in at most $\log_{1+\sqrt{2}} p$ iterations; see Vallée [13]. Mitchell [9] generalized Gauss' algorithm to arbitrary norms.

Homework 5.6.1. Prove that for any prime $p \equiv 1 \pmod 4$, there exists a positive quadratic nonresidue less than or equal to $\sqrt{p - 1}$. Hint: Let $u$ be the least positive quadratic nonresidue and (by the division algorithm) say $p = qu + r$ with $0 < r < u$, $0 < q < p$. Since $r < u$ we have $\left(\frac{r}{p}\right) = 1$. We also have $qu \equiv -r \pmod p$ and so $\left(\frac{q}{p}\right)\left(\frac{u}{p}\right) = \left(\frac{-1}{p}\right)\left(\frac{r}{p}\right) = 1$. Therefore $\left(\frac{q}{p}\right) = -1$ and by minimality of $u$, $q \ge u$. Thus $p \ge u^2 + 1$ and so $u \le \sqrt{p - 1}$.

5.6.2. Sums of three squares and sums of four squares. We discuss sums of three squares and sums of four squares in Sections 13.5 and 13.6.
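The algorithm of Section 5.6.1 can be sketched in a few lines. The code below is an illustration under stated assumptions, not a tuned implementation: it finds a nonresidue by simple search starting at 2, and it uses the classical Lagrange-Gauss reduction for two-dimensional lattices in its textbook form. The function names are hypothetical, and the input $p$ is assumed to be a prime with $p \equiv 1 \pmod 4$.

```python
def sqrt_of_minus_one(p):
    """Return u with u^2 = -1 (mod p), for a prime p = 1 (mod 4), from the least nonresidue r."""
    r = 2
    while pow(r, (p - 1) // 2, p) != p - 1:  # Euler's criterion: nonresidue iff value is -1
        r += 1
    return pow(r, (p - 1) // 4, p)


def shortest_vector(v, w):
    """Lagrange-Gauss reduction: a shortest nonzero vector of the lattice spanned by v and w."""
    def norm(z):
        return z[0] * z[0] + z[1] * z[1]

    if norm(v) > norm(w):
        v, w = w, v
    while True:
        dot = v[0] * w[0] + v[1] * w[1]
        m = (2 * dot + norm(v)) // (2 * norm(v))  # nearest integer to dot / norm(v)
        w = (w[0] - m * v[0], w[1] - m * v[1])
        if norm(w) >= norm(v):
            return v
        v, w = w, v


def two_squares(p):
    """Return (x, y) with x^2 + y^2 = p for a prime p = 1 (mod 4)."""
    u = sqrt_of_minus_one(p)
    x, y = shortest_vector((1, u), (0, p))
    return abs(x), abs(y)


for p in (13, 1009, 1117):
    x, y = two_squares(p)
    print(p, x, y, x * x + y * y == p)
```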

CHAPTER 6

Primality Testing, Mersenne Primes and Fermat Primes

6.1. Basic Primality Test

We start with an elementary primality test, the proof of which we leave as an exercise for the reader.

Theorem 6.1.1. If $n > 1$ is an integer having no prime factor $p \le \sqrt{n}$, then $n$ is a prime.

6.2. Pseudoprimes and Carmichael Numbers

From Fermat's Little Theorem we immediately obtain the following simple test.

Theorem 6.2.1. Composite Number Test. Let $m$ be a positive integer and $a \in \mathbb{Z}$ with $m \nmid a$. If $a^{m-1} \not\equiv 1 \pmod m$, then $m$ is composite.

Proof. Suppose that $m$ is prime. Then since $m \nmid a$, by Fermat's Little Theorem, $a^{m-1} \equiv 1 \pmod m$, a contradiction. □

In general, if $m$ is not a prime and $(a, m) = 1$ then we do not expect to have
$$a^{m-1} \equiv 1 \pmod m; \tag{6.1}$$
however, the congruence may hold. Whenever (6.1) holds we call $m$ a probable prime to the base $a$. If $m$ is composite and (6.1) holds then $m$ is called a pseudoprime to the base $a$. A number $m$ is called a Carmichael number if it is a pseudoprime to every base $a$ relatively prime to $m$, that is, (6.1) holds for all $a$ with $(a, m) = 1$. It is reasonable to ask whether such numbers exist. Not only do they exist, there are infinitely many Carmichael numbers. Indeed, if we let $C(x)$ denote the number of Carmichael numbers up to $x$ then $C(x) > x^{.29}$, for $x$ sufficiently large; see Alford, Granville and Pomerance [1] (1992).

Homework 6.2.1. i) Establish Korselt's Criterion: $m$ is a Carmichael number if and only if $m$ is composite, square-free and for any prime $p \mid m$ we have $(p - 1) \mid (m - 1)$. ii) Verify that $561 = 3 \cdot 11 \cdot 17$ is a Carmichael number using Korselt's Criterion.

Theorem 6.2.2. Lucas' Primality Test. Let $m > 1$. Suppose that there exists an integer $a > 1$ such that
(i) $a^{m-1} \equiv 1 \pmod m$,
(ii) $a^{d} \not\equiv 1 \pmod m$ for every proper divisor $d$ of $m - 1$.
Then $m$ is a prime.

Proof. The two conditions of the theorem imply that $\mathrm{ord}_m(a) = m - 1$, and so $\varphi(m) \ge m - 1$. But this implies that $m$ is prime. □
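The definitions of this section translate directly into short checks. The sketch below tests the definition of a Carmichael number by brute force over all bases (it does not use Korselt's Criterion, which is left as homework) and applies Lucas' test by running over all proper divisors of $m - 1$. The function names are illustrative, and the code is only meant for small $m$.

```python
from math import gcd, isqrt


def is_probable_prime(m, a):
    """True if a^(m-1) = 1 (mod m), i.e. congruence (6.1) holds."""
    return pow(a, m - 1, m) == 1


def is_carmichael(m):
    """True if m is composite and (6.1) holds for every base a coprime to m (brute force)."""
    if m < 4 or all(m % d for d in range(2, isqrt(m) + 1)):
        return False  # m is too small or prime
    return all(is_probable_prime(m, a) for a in range(2, m) if gcd(a, m) == 1)


def lucas_certifies(m, a):
    """True if the base a satisfies conditions (i) and (ii) of Lucas' Primality Test for m."""
    if pow(a, m - 1, m) != 1:
        return False
    return all(pow(a, d, m) != 1 for d in range(1, m - 1) if (m - 1) % d == 0)


print([m for m in range(2, 1200) if is_carmichael(m)])
print(lucas_certifies(17, 3), lucas_certifies(17, 2), lucas_certifies(561, 2))
```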


Note: This test is useful if the factorization of m − 1 is simple, for example, m = pk + 1, for some prime p. For the next test we need the following fact: If p is a prime, then (6.2)

x2 ≡ 1

⇔ x ≡ ±1

(mod p)

(mod p).

Pseudoprime Test: Let m be a large integer that we wish to test for primality. Write m − 1 = 2k d with d odd. Let a > 1. Compute the sequence of values, (6.3)

k−1

ad , a2d , . . . , a2

d

k

, a2

d

(mod m).

k

Case i. Suppose that $a^{2^k d} \not\equiv 1 \pmod m$. Then $m$ is not a prime.

Case ii. Suppose that $a^{2^k d} \equiv 1 \pmod m$. Let $j$ be minimal such that $a^{2^j d} \equiv 1 \pmod m$ and suppose that $j \ge 1$. If $a^{2^{j-1} d} \not\equiv -1 \pmod m$, then $m$ is not a prime by (6.2).

Case iii. Suppose that the first two cases fail, that is, either $a^d \equiv 1 \pmod m$ or there is a value $j \ge 1$ such that $a^{2^j d} \equiv 1 \pmod m$ and $a^{2^{j-1} d} \equiv -1 \pmod m$. Then $m$ is called a strong probable prime to the base $a$. If $m$ is composite, then $m$ is called a strong pseudoprime to the base $a$. We let sp($a$) denote the set of all strong probable primes to the base $a$.

Example 6.2.1. $m = 1387$; $m - 1 = 2 \cdot 693$. Then $2^{1386} \equiv 1 \pmod{1387}$, but $2^{693} \equiv 512 \pmod{1387}$. Thus $m$ is not a prime.

Example 6.2.2. $m = 2047$; $m - 1 = 2 \cdot 1023$. Since $2047 = 2^{11} - 1$ we have $2^{11} \equiv 1 \pmod m$, and $1023 = 11 \cdot 93$, so $2^{1023} \equiv 1 \pmod m$. Thus $m$ is a strong probable prime to the base 2. However $m$ is composite, $m = 23 \cdot 89$.

It can be shown that if $m < 2047$ and $m \in$ sp(2) then $m$ is a prime. In other words $m = 2047$ is the smallest composite number in the class sp(2). If $m$ is a strong probable prime to the base 2, then we test bases 3, 5, 7, 11 in succession. If $m < 1373653$ and $m$ is in the classes sp(2) and sp(3) then $m$ is a prime. If $m < 25 \cdot 10^9$, $m \ne 3215031751$ and $m$ is in sp(2), sp(3), sp(5) and sp(7), then $m$ is a prime. Thus any number $m$ of this order can be tested for primality with just 4 applications of the above test. If we test in addition base 11, we can go up to $2.15 \cdot 10^{12}$, and if we throw in bases 13 and 17, we can definitively test any value up to $10^{14}$.

Homework 6.2.2. i) Show that for any positive odd integer $m$, the set of elements $a \in G(m)$ satisfying the congruence $a^{(m-1)/2} \equiv \left(\frac{a}{m}\right) \pmod m$, where $\left(\frac{a}{m}\right)$ is the Jacobi symbol, is a subgroup $H(m)$ of $G(m)$. ii) Show that for prime $m$, $H(m) = G(m)$. iii) It can be shown that for odd composite $m$, $H(m)$ is always a proper subgroup of $G(m)$ and therefore $|H(m)| \le \frac12 |G(m)|$. Use this observation to construct a Monte-Carlo primality test, that is, a test that will be able to conclude, with 99.9999...% probability, that the value $m$ is a prime. (This test is called the Solovay-Strassen test.)
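The pseudoprime test of Section 6.2 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function name `is_strong_probable_prime` is ours, not from the notes, and the loop uses the equivalent reformulation of Case iii that either $a^d \equiv 1$ or some $a^{2^i d} \equiv -1 \pmod m$ with $0 \le i \le k-1$.

```python
def is_strong_probable_prime(m: int, a: int) -> bool:
    """Return True if the odd integer m > 2 is a strong probable prime to base a.

    Hypothetical helper name; it follows Cases i-iii of the pseudoprime test.
    """
    if m < 3 or m % 2 == 0:
        raise ValueError("m must be an odd integer greater than 2")
    # Write m - 1 = 2^k * d with d odd.
    k, d = 0, m - 1
    while d % 2 == 0:
        k += 1
        d //= 2
    x = pow(a, d, m)  # a^d mod m
    if x == 1 or x == m - 1:
        return True
    # Square repeatedly: x runs through a^(2^i d) for i = 1, ..., k - 1.
    for _ in range(k - 1):
        x = x * x % m
        if x == m - 1:
            return True
    return False
```

With this sketch, `is_strong_probable_prime(1387, 2)` is false and `is_strong_probable_prime(2047, 2)` is true, matching Examples 6.2.1 and 6.2.2, while base 3 exposes 2047 as composite.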


6.3. Mersenne Primes and Fermat Primes

Definition 6.3.1. Any prime of the form $2^n + 1$ is called a Fermat prime. Any prime of the form $2^n - 1$ is called a Mersenne prime.

Lemma 6.3.1. i) If $2^n + 1$ is prime then $n$ is a power of 2. ii) If $2^n - 1$ is prime then $n$ is prime.

Proof. i) Immediate from the factoring formula $X^k + 1 = (X + 1)(X^{k-1} - X^{k-2} + \cdots - X + 1)$, for $k$ odd. Say $n = dk$ for some odd $k > 1$, and put $X = 2^d$ to obtain $2^d + 1 \mid 2^n + 1$. ii) Immediate from the factoring formula $X^k - 1 = (X - 1)(X^{k-1} + \cdots + 1)$. Say $n = dk$ for some $d, k \in \mathbb{N}$, and put $X = 2^d$ to obtain $2^d - 1 \mid 2^n - 1$. □

In view of i) we let $F_k := 2^{2^k} + 1$, called the $k$-th Fermat number. Similarly, for any prime $p$, we let $M_p := 2^p - 1$ denote the $p$-th Mersenne number. The first few Fermat primes are $3 = 2 + 1$, $5 = 2^2 + 1$, $17 = 2^4 + 1$, $257 = 2^8 + 1$ and $65537 = 2^{16} + 1$. Fermat conjectured that all numbers of the form $2^{2^k} + 1$ would be prime, but Euler found that this was false for $k = 5$. Indeed, it is now known that for $k = 5, 6, \ldots, 21$ the numbers $F_k$ are all composite. The new question is whether there exist any more Fermat primes.

Homework 6.3.1. i) Show that any prime factor $p$ of $F_k$ satisfies $p \equiv 1 \pmod{2^{k+1}}$. ii) Next, use the fact that $\left(\frac{2}{p}\right) = 1$ for any $p \equiv 1 \pmod 8$, to show in fact that any prime divisor $p$ of $F_k$ satisfies $p \equiv 1 \pmod{2^{k+2}}$. iii) Use ii) and a calculator to factor $F_5 = 4294967297$. iv) Here's another way to factor $F_5$. Verify the identity
$$2^{32} + 1 = (2^9 + 2^7 + 1)(2^{23} - 2^{21} + 2^{19} - 2^{17} + 2^{14} - 2^9 - 2^7 + 1).$$

Theorem 6.3.1. Pepin's Primality Test. The following conditions are equivalent for any positive integers $a, k \ge 2$. i) $F_k$ is a prime and $a$ is a quadratic nonresidue (mod $F_k$). ii) $a^{\frac{F_k - 1}{2}} \equiv -1 \pmod{F_k}$.

Proof. i) implies ii) is just Euler's criterion. Suppose now that $a$ is a positive integer satisfying ii). Then it is easy to see that $\mathrm{ord}_{F_k}(a) = F_k - 1$ and so i) follows. We'll leave the details as a homework problem. □

Homework 6.3.2. i) Prove the converse part of Pepin's Primality Test, ii) implies i). ii) Show that 3 is a quadratic nonresidue (mod $F_k$) for any $k \ge 1$. iii) Conclude that for $k \ge 1$, $F_k$ is a prime if and only if
$$3^{2^{2^k - 1}} \equiv -1 \pmod{F_k}.$$
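As a quick numerical illustration of Pepin's test with base 3 (Homework 6.3.2 iii), here is a minimal Python sketch. The helper names `fermat_number` and `pepin_is_prime` are hypothetical, and the test is only meaningful for $k \ge 1$.

```python
def fermat_number(k: int) -> int:
    """The k-th Fermat number F_k = 2^(2^k) + 1."""
    return 2 ** (2 ** k) + 1


def pepin_is_prime(k: int) -> bool:
    """Pepin's test with base 3: F_k is prime iff 3^((F_k - 1)/2) = -1 (mod F_k).

    Assumes k >= 1, since 3 is a quadratic nonresidue mod F_k only in that range.
    """
    if k < 1:
        raise ValueError("Pepin's test with base 3 requires k >= 1")
    F = fermat_number(k)
    return pow(3, (F - 1) // 2, F) == F - 1


# Indices 1 <= k <= 8 for which F_k passes the test.
prime_indices = [k for k in range(1, 9) if pepin_is_prime(k)]
```

Modular exponentiation via the built-in three-argument `pow` keeps every intermediate value below $F_k$, which is what makes the test practical even though $F_k$ itself grows doubly exponentially.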

Gauss proved the following connection between Fermat primes and the construction of regular $n$-gons, which we will not prove here.

Theorem 6.3.2. (i) A regular $n$-gon with $n$ an odd prime can be constructed with straight-edge and compass if and only if $n$ is a Fermat prime. (ii) More generally, a regular $n$-gon can be constructed if and only if $n$ is of the form $n = 2^k p_1 p_2 \cdots p_l$ for some $k \ge 0$ and distinct Fermat primes $p_1, \ldots, p_l$.


The primes $p = 2, 3, 5, 7$ yield Mersenne primes $M_2 = 3$, $M_3 = 7$, $M_5 = 31$, $M_7 = 127$. However, $p = 11$ gives $M_{11} = 2047 = 23 \cdot 89$, a composite. Thus we do not always get a Mersenne prime $M_p$ when $p$ is prime. The reason $M_{11}$ fails to be prime is that $11 \equiv 3 \pmod 4$ and 11 is a Sophie Germain prime. Primes $q$ such that $2q + 1$ is also a prime are called Sophie Germain primes. They were considered by Sophie Germain in connection with Fermat's last theorem.

Theorem 6.3.3. Lagrange's Divisibility Test. (Lagrange 1775). If $q$ is a prime and $q \equiv 3 \pmod 4$ then $2q + 1$ divides $M_q$ if and only if $2q + 1$ is a prime; in this case, if $q > 3$ then $M_q$ is composite.

Proof. Suppose $q$ is a prime with $q \equiv 3 \pmod 4$, and that $(2q + 1) \mid M_q$. Suppose $2q + 1$ is composite. Let $r$ be its minimal prime factor, so $r \mid M_q$ and $r^2 \le 2q + 1$. Then $2^q \equiv 1 \pmod r$, so $\mathrm{ord}_r(2) \mid q$, and $q$ prime implies $\mathrm{ord}_r(2) = q$. Thus $q \mid (r - 1)$, so $r \ge q + 1$ and $r^2 \ge (q+1)^2 > 2q + 1$, contradicting $r^2 \le 2q + 1$. Thus $2q + 1$ is prime.

Converse: Suppose now that $q$ and $p = 2q + 1$ are primes with $q \equiv 3 \pmod 4$. Since $p \equiv 7 \pmod 8$ we have
$$2^q \equiv 2^{(p-1)/2} \equiv \left(\frac{2}{p}\right) \equiv 1 \pmod p.$$
Thus $p \mid (2^q - 1)$. Finally, if $q > 3$ then $M_q > 2q + 1$, so $2q + 1$ is a proper divisor and $M_q$ is composite. □

(Note also, $n \mid M_q$ implies that $n \equiv \pm 1 \pmod 8$ and $n \equiv 1 \pmod q$.)



Consequently, if $q = 11, 23, 83, 131, 179, 191, 239, 251$ then $M_q$ has the factor $2q + 1$, namely $23, 47$, etc. It is an open question whether there are infinitely many Mersenne primes. All of the largest known primes are Mersenne primes. In 1876 Lucas discovered the record breaking prime number $2^k - 1$ with $k = 127$. He discovered a clever algorithm to test for primality that worked specifically for numbers of this type. It was popular for many years to test the speed of a new computer by seeing if it could find a record breaking prime number using the best algorithms of the day. In 1985 a Cray X-MP computer running for 3 hours verified that $M_k$ was prime for $k = 216091$, a 65000 digit prime number. In 2008 the 45-th known Mersenne prime was discovered at UCLA, $2^{43112609} - 1$, a number with 12978189 digits, earning the finders a \$100000 prize. There are currently 48 known Mersenne primes, the largest, discovered in 2013, being $2^{57885161} - 1$, a 17425170 digit number. Check GIMPS (Great Internet Mersenne Prime Search) on the internet if you wish to participate in the search for record breaking Mersenne primes, and win prize money.
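Returning to Lagrange's divisibility test (Theorem 6.3.3), the list of exponents $q$ above can be reproduced numerically. The following Python sketch is illustrative only; `is_prime` is a naive trial-division helper and `divides_mersenne` is a hypothetical name for the check $2^q \equiv 1 \pmod{2q+1}$.

```python
def is_prime(n: int) -> bool:
    """Trial division; adequate for the small values used here."""
    if n < 2:
        return False
    p = 2
    while p * p <= n:
        if n % p == 0:
            return False
        p += 1
    return True


def divides_mersenne(q: int) -> bool:
    """True when 2q + 1 divides M_q = 2^q - 1, tested as 2^q = 1 (mod 2q + 1)."""
    return pow(2, q, 2 * q + 1) == 1


# Primes q = 3 (mod 4) below 260, and those among them with 2q + 1 prime.
candidates = [q for q in range(3, 260) if is_prime(q) and q % 4 == 3]
germain = [q for q in candidates if is_prime(2 * q + 1)]
```

For every prime $q \equiv 3 \pmod 4$ in this range, `divides_mersenne(q)` agrees with `is_prime(2 * q + 1)`, as the theorem predicts; the case $q = 3$ is the exception where $M_3 = 7 = 2q + 1$ is itself prime.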

CHAPTER 7

Arithmetic Functions

7.1. Properties of Greatest Integer Function and Binomial Coefficients

Lemma 7.1.1. Properties of the greatest integer function.
i) $[x + m] = [x] + m$ for all real $x$ and integers $m$.
ii) $[x] + [y] \le [x + y] \le [x] + [y] + 1$ for all real $x, y$.
iii) $[-x] = -[x]$ if $x \in \mathbb{Z}$, and $[-x] = -[x] - 1$ if $x \notin \mathbb{Z}$.
iv) For any real number $x$, $[2x] - 2[x] \le 1$.
v) For $n, a \in \mathbb{N}$, $\left[\frac{n}{a}\right]$ is the number of positive integers less than or equal to $n$ that are divisible by $a$.

Proof. We leave parts i) to iv) as exercises for the reader. It is useful to write $x = [x] + \epsilon$, $y = [y] + \delta$ with $0 \le \epsilon < 1$, $0 \le \delta < 1$ in writing the proofs. To prove v), let $q$ denote the number of positive multiples of $a$ less than or equal to $n$. Then $qa \le n < (q + 1)a$ and so $q \le \frac{n}{a} < q + 1$, that is, $q = \left[\frac{n}{a}\right]$. □

Theorem 7.1.1. Let $n$ be a positive integer and $p$ a prime, with $p^e \,\|\, n!$. Then $e = \sum_{i=1}^{\infty} \left[\frac{n}{p^i}\right]$. (Note, this is really just a finite sum.)

Proof. Let $M(k)$ denote the number of integers from 1 to $n$ that are divisible by $p^k$. By the preceding lemma, $M(k) = [n/p^k]$. Then $e = \sum_{k=1}^{\infty} M(k) = \sum_{k=1}^{\infty} \left[\frac{n}{p^k}\right]$. □

Theorem 7.1.2. i) For any positive integers $m \le n$, $\binom{n}{m} \in \mathbb{Z}$. ii) If $a_1, a_2, \ldots, a_k$ are positive integers with $a_1 + a_2 + \cdots + a_k = n$ then $a_1! a_2! \cdots a_k! \mid n!$.

Proof. First we note that i) is just a special case of ii), and that we have already seen iv) for the case of prime $m$. ii) It suffices to show that for each prime $p$ the multiplicity of $p$ dividing $a_1! \cdots a_k!$, say $e$, is $\le$ the multiplicity of $p$ dividing $n!$, say $f$. Now
$$e = \sum_{i=1}^{k} \sum_{j=1}^{\infty} \left[\frac{a_i}{p^j}\right] = \sum_{j=1}^{\infty} \sum_{i=1}^{k} \left[\frac{a_i}{p^j}\right] \le \sum_{j=1}^{\infty} \left[\frac{1}{p^j} \sum_{i=1}^{k} a_i\right] = \sum_{j=1}^{\infty} \left[\frac{n}{p^j}\right] = f. \qquad \square$$

Theorem 7.1.3. For any positive integer $n > 1$ and prime $p$, if $p^e \,\|\, \binom{2n}{n}$, then $p^e < 2n$.


Proof. By Theorem 7.1.1, if $p^e \,\|\, \binom{2n}{n}$, then letting $K = [\log_p(2n)]$ we have
$$e = \sum_{k=1}^{K} \left[\frac{2n}{p^k}\right] - 2\sum_{k=1}^{K} \left[\frac{n}{p^k}\right] = \sum_{k=1}^{K} \left( \left[\frac{2n}{p^k}\right] - 2\left[\frac{n}{p^k}\right] \right) \le \sum_{k=1}^{K} 1 = K \le \log_p(2n).$$
The final inequality is strict unless $p = 2$ and $n$ is a power of 2. But in the latter case, one obtains $e = 1$. □

Theorem 7.1.4. If $0 < a \le m$ are positive integers with $(a, m) = 1$, then $m \mid \binom{m}{a}$.

Note, this theorem generalizes the fact that if $p$ is a prime and $0 < a < p$, then $p \mid \binom{p}{a}$. We leave the proof for homework, but here is a hint: $\binom{m}{a} = \frac{m}{a}\binom{m-1}{a-1}$.
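The results of Section 7.1 lend themselves to a quick numerical check. The sketch below is illustrative only: the helper names `legendre_exponent` and `multiplicity` are our own, and the checks cover small ranges rather than proving anything.

```python
from math import comb, factorial, gcd


def legendre_exponent(n: int, p: int) -> int:
    """Exponent e with p^e || n!, computed as sum over i >= 1 of [n / p^i]."""
    e, q = 0, p
    while q <= n:
        e += n // q
        q *= p
    return e


def multiplicity(p: int, m: int) -> int:
    """Largest e with p^e dividing the positive integer m."""
    e = 0
    while m % p == 0:
        m //= p
        e += 1
    return e


def central_binomial_bound_holds(n: int) -> bool:
    """Check Theorem 7.1.3: every prime power p^e || C(2n, n) satisfies p^e < 2n."""
    c = comb(2 * n, n)
    primes = [p for p in range(2, 2 * n + 1) if all(p % r for r in range(2, p))]
    return all(p ** multiplicity(p, c) < 2 * n for p in primes)
```

For example, `legendre_exponent(100, 5)` counts the trailing zeros of $100!$, and `comb(m, a) % m == 0` can be used to spot-check Theorem 7.1.4 for coprime pairs $(a, m)$.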

7.2. The Divisor function and Sigma function

Definition 7.2.1. A function $f : \mathbb{N} \to \mathbb{C}$ is called an arithmetic function.

Definition 7.2.2. For any positive integer $n$ we let $\tau(n)$ denote the number of positive divisors of $n$, and $\sigma(n)$ the sum of the positive divisors of $n$. $\tau$ is called the divisor function, and $\sigma$ the sigma function. More generally, for any integer $k$, we let $\sigma_k(n)$ denote the sum of the $k$-th powers of the positive divisors of $n$.

Note 7.2.1. In sigma notation we can write
$$\tau(n) := \sum_{d \mid n} 1, \qquad \sigma(n) := \sum_{d \mid n} d, \qquad \sigma_k(n) := \sum_{d \mid n} d^k.$$

Note 7.2.2. If $n = p^e$ then $\tau(n) = e + 1$ and $\sigma(n) = \frac{p^{e+1} - 1}{p - 1}$. More generally, if $n = p_1^{e_1} \cdots p_k^{e_k}$ then any positive divisor of $n$ has a factorization of the form $p_1^{f_1} \cdots p_k^{f_k}$ for some nonnegative integers $f_i \le e_i$, $1 \le i \le k$. There are $e_i + 1$ choices for each $f_i$ and thus we see that
$$\tau(n) = (e_1 + 1) \cdots (e_k + 1) = \prod_{i=1}^{k} \tau(p_i^{e_i}).$$
Also, it is plain that
$$\sigma(n) = \sum_{f_1=0}^{e_1} \cdots \sum_{f_k=0}^{e_k} p_1^{f_1} \cdots p_k^{f_k} = \left(\sum_{f_1=0}^{e_1} p_1^{f_1}\right) \cdots \left(\sum_{f_k=0}^{e_k} p_k^{f_k}\right) = \prod_{i=1}^{k} \frac{p_i^{e_i+1} - 1}{p_i - 1} = \prod_{i=1}^{k} \sigma(p_i^{e_i}).$$

These formulas imply that $\tau$ and $\sigma$ are multiplicative functions.

7.3. Multiplicative Function

Definition 7.3.1. i) An arithmetic function $f$ is called multiplicative if for all $a, b \in \mathbb{N}$ with $(a, b) = 1$, we have $f(ab) = f(a)f(b)$. ii) $f$ is called totally multiplicative if for all $a, b \in \mathbb{N}$, $f(ab) = f(a)f(b)$.

Note 7.3.1. i) If $f$ is multiplicative then it is determined by its values on prime powers. That is, if $n = p_1^{e_1} \cdots p_k^{e_k}$ then
$$f(n) = f(p_1^{e_1}) \cdots f(p_k^{e_k}). \tag{7.1}$$


ii) In general we do not have $f(p^e) = (f(p))^e$ for a multiplicative function, and thus $f$ need not be totally multiplicative. iii) If $f$ is an arithmetic function satisfying (7.1) for all positive integers $n$, then $f$ is multiplicative. Thus $\tau$ and $\sigma$ are multiplicative functions. iv) We already saw that $\phi$ is a multiplicative function.

The key lemma to working with and constructing multiplicative functions is the following result, revealing a one-to-one correspondence between the divisors of $ab$ and the pairs of divisors of $a$ and $b$ respectively, when $a$ and $b$ are relatively prime.

Lemma 7.3.1. The Divisor Correspondence Lemma. Let $a, b$ be positive integers with $\gcd(a, b) = 1$. Let $D_a$, $D_b$ and $D_{ab}$ denote the sets of positive divisors of $a$, $b$ and $ab$ respectively. Then there is a one-to-one correspondence between $D_a \times D_b$ and $D_{ab}$ given by $\eta : D_a \times D_b \to D_{ab}$, where $\eta(d_1, d_2) = d_1 d_2$. The inverse of $\eta$ is the mapping $\beta : D_{ab} \to D_a \times D_b$, where $\beta(d) = (\gcd(a, d), \gcd(b, d))$.

Proof. Let $a, b$ have prime factorizations $a = \prod_{i=1}^{r} p_i^{e_i}$, $b = \prod_{j=1}^{s} q_j^{f_j}$. Since $\gcd(a, b) = 1$ the primes $p_i, q_j$ are all distinct. Let $d_1, d_2$ be positive divisors of $a, b$ respectively. Then $d_1 = \prod_{i=1}^{r} p_i^{E_i}$, $d_2 = \prod_{j=1}^{s} q_j^{F_j}$ for some $E_i \le e_i$, $1 \le i \le r$, and $F_j \le f_j$, $1 \le j \le s$, and $\eta(d_1, d_2) = \prod_{i=1}^{r} p_i^{E_i} \prod_{j=1}^{s} q_j^{F_j}$. By the Fundamental Theorem of Arithmetic it is plain that $\eta$ is a one-to-one and onto mapping. Moreover if we start with a divisor $d$ of $ab$ then $d$ has prime factorization $d = \prod_{i=1}^{r} p_i^{E_i} \prod_{j=1}^{s} q_j^{F_j}$ for some $E_i, F_j$, where $\prod_{i=1}^{r} p_i^{E_i} = \gcd(a, d)$ and $\prod_{j=1}^{s} q_j^{F_j} = \gcd(b, d)$. Thus $\beta(d) = (\gcd(a, d), \gcd(b, d))$. □

Theorem 7.3.1. If $f$ is multiplicative and $F$ is defined by $F(n) = \sum_{d \mid n} f(d)$ then $F$ is also multiplicative.

Proof. Let $a, b \in \mathbb{N}$ with $(a, b) = 1$. By the Divisor Correspondence Lemma, writing any divisor $d$ of $ab$ in the manner $d = d_1 d_2$ with $d_1 \mid a$, $d_2 \mid b$ and noting that $(d_1, d_2) = 1$ since $a, b$ are relatively prime, we have
$$F(ab) = \sum_{d \mid ab} f(d) = \sum_{d_1 \mid a} \sum_{d_2 \mid b} f(d_1 d_2) = \sum_{d_1 \mid a} \sum_{d_2 \mid b} f(d_1) f(d_2) = \sum_{d_1 \mid a} f(d_1) \sum_{d_2 \mid b} f(d_2) = F(a)F(b). \qquad \square$$

Example 7.3.1. i) $\tau$ is multiplicative since $\tau(n) = \sum_{d \mid n} 1$ and the function $f(n) = 1$ is multiplicative. Thus, $\tau(p_1^{e_1} \cdots p_k^{e_k}) = (e_1 + 1) \cdots (e_k + 1)$. ii) $\sigma_k$ is multiplicative since $\sigma_k(n) = \sum_{d \mid n} d^k$ and the function $f(n) = n^k$ is multiplicative. In particular,
$$\sigma(p_1^{e_1} \cdots p_k^{e_k}) = \prod_{i=1}^{k} \frac{p_i^{e_i+1} - 1}{p_i - 1}.$$

Example 7.3.2. Another proof of Theorem 3.11.3. Let $F(n) = \sum_{d \mid n} \phi(d)$. Since $\phi$ is multiplicative, so is $F$. Now for any prime power $p^e$ we have
$$F(p^e) = \sum_{d \mid p^e} \phi(d) = \sum_{i=0}^{e} \phi(p^i) = 1 + \sum_{i=1}^{e} (p^i - p^{i-1}) = p^e.$$


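Therefore $F(n) = n$ for any $n$.

A generalization of the preceding theorem that can be proven in the same manner is the following.

Theorem 7.3.2. If $f$ and $g$ are multiplicative functions then so is
$$F(n) = \sum_{d \mid n} f(d)\, g\!\left(\frac{n}{d}\right).$$

Proof. Suppose that $(a, b) = 1$. Then
$$F(ab) = \sum_{d \mid ab} f(d)\, g\!\left(\frac{ab}{d}\right) = \sum_{d_1 \mid a} \sum_{d_2 \mid b} f(d_1 d_2)\, g\!\left(\frac{a}{d_1} \cdot \frac{b}{d_2}\right) = \sum_{d_1 \mid a} f(d_1)\, g\!\left(\frac{a}{d_1}\right) \sum_{d_2 \mid b} f(d_2)\, g\!\left(\frac{b}{d_2}\right) = F(a)F(b). \qquad \square$$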
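Theorems 7.3.1 and 7.3.2 can be explored experimentally. The following Python sketch is a brute-force illustration, not an efficient implementation; `convolve` and `is_multiplicative` are hypothetical helper names, and `is_multiplicative` only tests coprime pairs below a small limit.

```python
from math import gcd


def divisors(n: int) -> list:
    """All positive divisors of n, by brute force."""
    return [d for d in range(1, n + 1) if n % d == 0]


def tau(n: int) -> int:
    """Number of positive divisors of n."""
    return len(divisors(n))


def sigma(n: int, k: int = 1) -> int:
    """Sum of the k-th powers of the positive divisors of n."""
    return sum(d ** k for d in divisors(n))


def convolve(f, g):
    """The function F(n) = sum over d | n of f(d) g(n/d) from Theorem 7.3.2."""
    return lambda n: sum(f(d) * g(n // d) for d in divisors(n))


def is_multiplicative(f, limit: int = 20) -> bool:
    """Test f(ab) = f(a) f(b) on all coprime pairs a, b < limit (evidence only)."""
    return all(
        f(a * b) == f(a) * f(b)
        for a in range(1, limit)
        for b in range(1, limit)
        if gcd(a, b) == 1
    )
```

For instance, `is_multiplicative(convolve(tau, sigma))` returns `True`, while a non-multiplicative function such as `lambda n: n + 1` fails the check.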

7.4. Perfect Numbers

Definition 7.4.1. i) A positive integer $n$ is called perfect if $n$ equals the sum of its proper divisors, that is, $\sigma(n) = 2n$. ii) $n$ is called deficient if $\sigma(n) < 2n$ and abundant if $\sigma(n) > 2n$.

The first few perfect numbers are
$6 = 2 \cdot 3 = 2(2^2 - 1)$,
$28 = 4 \cdot 7 = 2^2(2^3 - 1)$,
$496 = 16 \cdot 31 = 2^4(2^5 - 1)$,
$8128 = 64 \cdot 127 = 2^6(2^7 - 1)$.
The pattern suggests that we test a number of the type $n = 2^{p-1}(2^p - 1)$, where $2^p - 1$ is a Mersenne prime. We have
$$\sigma(n) = \sigma(2^{p-1})\,\sigma(2^p - 1) = (2^p - 1)\,2^p = 2n.$$
Thus, any such number is perfect!

Theorem 7.4.1. An even number $n$ is perfect if and only if $n$ is of the form $n = 2^{p-1}(2^p - 1)$, for some prime $p$ such that $2^p - 1$ is a prime (a Mersenne prime).

Thus we see that there are infinitely many even perfect numbers if and only if there are infinitely many Mersenne primes. Euclid (300 B.C.) established in his work "Elements" the easy direction, the fact we observed above. It wasn't until 2000 years later that Euler (mid 1700's) established the converse. There remains the open problem of whether there are any odd perfect numbers. It is known that if $n$ is an odd perfect number, then $n > 10^{1500}$ and that $n$ must have at least 9 distinct prime factors, the largest of which is greater than $10^8$.

Proof. If $n$ is of the form in the Theorem then as we saw above, $n$ is perfect. Suppose now that $n$ is an even perfect number. Write $n = 2^e m$ with $m$ odd, and $e \ge 1$. Then $\sigma(n) = 2n$ implies that
$$(2^{e+1} - 1)\,\sigma(m) = 2^{e+1} m. \tag{7.2}$$


Thus $2^{e+1} - 1 \mid m$. Suppose that $m \ne 2^{e+1} - 1$. Then $1$, $\frac{m}{2^{e+1} - 1}$ and $m$ are distinct positive divisors of $m$ and so
$$\sigma(m) \ge 1 + \frac{m}{2^{e+1} - 1} + m,$$
and
$$(2^{e+1} - 1)\,\sigma(m) \ge 2^{e+1} - 1 + 2^{e+1} m,$$
whence by (7.2) we obtain the contradiction $2^{e+1} m \ge 2^{e+1} - 1 + 2^{e+1} m$. Hence $m = 2^{e+1} - 1$, and $\sigma(m) = 2^{e+1} = m + 1$. This is only possible if $m$ is prime. Thus $m$ is a Mersenne prime and the exponent $e + 1$ must also be a prime. □

7.5. The Möbius Function

Definition 7.5.1. The Möbius function $\mu$ is defined as follows.
$$\mu(n) = \begin{cases} 1 & \text{if } n = 1, \\ (-1)^k & \text{if } n = p_1 p_2 \cdots p_k, \text{ a product of } k \text{ distinct primes,} \\ 0 & \text{if } n \text{ has any square factor } p^2. \end{cases}$$

Note 7.5.1. It is an easy exercise to check that $\mu$ is a multiplicative function.

Theorem 7.5.1. i) For positive integers $n$,
$$\sum_{d \mid n} \mu(d) = \begin{cases} 1 & \text{if } n = 1, \\ 0 & \text{if } n > 1. \end{cases}$$
ii) For any positive integer $n$,
$$\sum_{d \mid n} \frac{\mu(d)}{d} = \frac{\phi(n)}{n}.$$

Proof. i) Let $F(n) = \sum_{d \mid n} \mu(d)$. Since $\mu$ is multiplicative, so is $F$ by Theorem 7.3.1. Now at any prime power $p^e$ with $e \ge 1$, $F(p^e) = 1 + (-1) + 0 + 0 + \cdots + 0 = 0$ and so $F(p_1^{e_1} \cdots p_k^{e_k}) = 0$ unless all of the exponents are zero. Trivially $F(1) = 1$.
ii) Again, let $F(n) = \sum_{d \mid n} \mu(d)/d$. Since $\mu(d)/d$ is multiplicative, so is $F$ by Theorem 7.3.1, and $\phi(n)/n$ is multiplicative as well, so it suffices to compare the two at prime powers. At any prime power $p^e$ with $e \ge 1$ we have
$$F(p^e) = 1 + \frac{-1}{p} = \frac{\phi(p^e)}{p^e}. \qquad \square$$
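Both parts of Theorem 7.5.1 are easy to check by machine for small $n$. The sketch below is a straightforward Python illustration; `mobius` uses trial division and `phi` counts residues directly, so neither is meant to be efficient.

```python
from fractions import Fraction
from math import gcd


def mobius(n: int) -> int:
    """The Mobius function, computed by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # p^2 divides the original n
            result = -result
        p += 1
    if n > 1:  # one remaining prime factor
        result = -result
    return result


def phi(n: int) -> int:
    """Euler's phi-function by direct count."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)


def divisors(n: int) -> list:
    return [d for d in range(1, n + 1) if n % d == 0]


def mobius_sum(n: int) -> int:
    """Sum of mu(d) over d | n (Theorem 7.5.1 i)."""
    return sum(mobius(d) for d in divisors(n))


def mobius_weighted_sum(n: int) -> Fraction:
    """Sum of mu(d)/d over d | n, as an exact fraction (Theorem 7.5.1 ii)."""
    return sum(Fraction(mobius(d), d) for d in divisors(n))
```

Using `Fraction` keeps the comparison with $\phi(n)/n$ exact, avoiding any floating-point tolerance.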

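The function $F$ in part i) acts as a characteristic function for the singleton point 1, that is, $F(1) = 1$ but $F(n) = 0$ for $n \ne 1$.

7.6. Estimating Arithmetic Sums

Example 7.6.1. Find the sum $S$ of all positive integers less than or equal to $n$ that are relatively prime to $n$. By Theorem 7.5.1,
$$S = \sum_{\substack{a=1 \\ (a,n)=1}}^{n} a = \sum_{a=1}^{n} a \sum_{d \mid (a,n)} \mu(d) = \sum_{d \mid n} \mu(d) \sum_{\substack{a=1 \\ d \mid a}}^{n} a = \sum_{d \mid n} \mu(d) \sum_{\alpha=1}^{n/d} d\alpha = \sum_{d \mid n} \mu(d)\, d \sum_{\alpha=1}^{n/d} \alpha$$
$$= \sum_{d \mid n} \mu(d)\, \frac{n}{2}\left(\frac{n}{d} + 1\right) = \frac{n^2}{2} \sum_{d \mid n} \frac{\mu(d)}{d} + \frac{n}{2} \sum_{d \mid n} \mu(d) = \frac{n}{2}\,\phi(n),$$
where the last equality holds for $n > 1$, since then $\sum_{d \mid n} \mu(d) = 0$.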
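The closed form of Example 7.6.1, $S = \frac{n}{2}\phi(n)$ for $n > 1$, can be confirmed by direct summation. This is a minimal brute-force sketch in Python; the names `totatives` and `totative_sum` are our own.

```python
from math import gcd


def totatives(n: int) -> list:
    """Positive integers a <= n with gcd(a, n) = 1."""
    return [a for a in range(1, n + 1) if gcd(a, n) == 1]


def totative_sum(n: int) -> int:
    """The sum S of Example 7.6.1, computed directly."""
    return sum(totatives(n))


def phi(n: int) -> int:
    """Euler's phi-function as the number of totatives."""
    return len(totatives(n))
```

For $n = 10$ the totatives are 1, 3, 7, 9, with sum 20, in agreement with $\frac{10}{2}\phi(10)$.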

Homework 7.6.1. Let $S_2$ denote the sum of the squares of the positive integers $\le n$ that are relatively prime to $n$. Show that for $n > 1$,
$$S_2 = \frac{n^2 \phi(n)}{3} + \frac{n}{6} \prod_{p \mid n} (1 - p),$$
where the product is over the distinct prime divisors of $n$.

Lemma 7.6.1. Let $F(n) = \sum_{d^2 \mid n} \mu(d)$. Then $F(n) = 1$ or $0$ depending on whether $n$ is square-free or not. (Note the sum is over all positive integers $d$ such that $d^2 \mid n$.)

Proof. Let $n = n_1 n_2^2$ with $n_1$ square-free. We claim that $d^2 \mid n$ if and only if $d \mid n_2$. One direction is trivial, namely if $d \mid n_2$ then $d^2 \mid n$. For the converse, suppose that $p$ is a prime with $p^e \,\|\, d$. If $d^2 \mid n$ then $p^{2e} \mid n$ and thus $p^{2e-1} \mid n_2^2$ (since $n_1$ has at most one factor of $p$). Since the multiplicity of $p$ dividing $n_2^2$ is even we must therefore have $p^{2e} \mid n_2^2$, that is, $p^e \mid n_2$. Since this is true for every prime power divisor of $d$ we conclude that $d \mid n_2$. Then
$$F(n) = \sum_{d^2 \mid n} \mu(d) = \sum_{d \mid n_2} \mu(d) = \begin{cases} 1, & \text{if } n_2 = 1; \\ 0, & \text{if } n_2 > 1, \end{cases}$$
by the preceding theorem. Now $n_2 = 1$ is precisely the condition of $n$ being square-free. □

Let $N(x)$ denote the number of square-free positive integers $\le x$.

Theorem 7.6.1. For any positive real number $x$ we have
$$N(x) = \frac{6}{\pi^2}\, x + E(x) \tag{7.3}$$
where the error term $E(x)$ is bounded by $|E(x)| \le 2\sqrt{x} + 1$.

Note, the factor $6/\pi^2$ is what one expects from a probabilistic viewpoint. Indeed, in order to be square-free, an integer cannot be divisible by $2^2$ or $3^2$ or $5^2$, or $p^2$ in general, where $p$ is a prime. Now the likelihood of not being divisible by $p^2$ is $\left(1 - \frac{1}{p^2}\right)$. Thus, assuming the events are independent (which they are not), the likelihood of not being divisible by the square of any prime is
$$\prod_{p \text{ prime}} \left(1 - \frac{1}{p^2}\right) = \sum_{n=1}^{\infty} \frac{\mu(n)}{n^2}.$$
In order to evaluate this infinite product, we consider its reciprocal,
$$\prod_{p \text{ prime}} \left(1 - \frac{1}{p^2}\right)^{-1} = \prod_{p \text{ prime}} \left(1 + \frac{1}{p^2} + \frac{1}{p^4} + \cdots\right) = \sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6},$$


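the latter fact coming from the Fourier expansion
$$x^2 = \frac{\pi^2}{3} + 4 \sum_{k=1}^{\infty} (-1)^k \frac{\cos(kx)}{k^2},$$
evaluated at $x = \pi$.

Proof. To establish the estimate we make use of the result of the preceding lemma,
$$\sum_{d^2 \mid n} \mu(d) = \begin{cases} 1, & \text{if } n \text{ is square-free;} \\ 0, & \text{otherwise.} \end{cases}$$
Thus, writing $[x/d^2] = \frac{x}{d^2} - \theta(d)$, with $0 \le \theta(d) < 1$, we have
$$N(x) = \sum_{n \le x} \sum_{d^2 \mid n} \mu(d) = \sum_{d \le \sqrt{x}} \mu(d) \sum_{\substack{n \le x \\ d^2 \mid n}} 1 = \sum_{d \le \sqrt{x}} \mu(d) \left(\frac{x}{d^2} - \theta(d)\right) = x \sum_{d=1}^{\infty} \frac{\mu(d)}{d^2} + E(x),$$
where
$$E(x) := -\sum_{d \le \sqrt{x}} \mu(d)\theta(d) - x \sum_{d > \sqrt{x}} \frac{\mu(d)}{d^2}.$$
Plainly,
$$|E(x)| \le \sqrt{x} + x \sum_{d > \sqrt{x}} \frac{1}{d^2} < \sqrt{x} + x \left(\frac{1}{x} + \int_{\sqrt{x}}^{\infty} \frac{dt}{t^2}\right) = \sqrt{x} + 1 + \sqrt{x}. \qquad \square$$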
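The counting identity $N(x) = \sum_{d \le \sqrt{x}} \mu(d)\,[x/d^2]$ used in the proof above also gives a practical way to compute $N(x)$. The Python sketch below is illustrative: `count_squarefree` implements that identity for integer $x$, `is_squarefree` is a brute-force cross-check, and both names are our own.

```python
from math import isqrt, pi, sqrt


def mobius(n: int) -> int:
    """The Mobius function, by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result


def count_squarefree(x: int) -> int:
    """N(x) via the identity N(x) = sum_{d <= sqrt(x)} mu(d) [x / d^2]."""
    return sum(mobius(d) * (x // (d * d)) for d in range(1, isqrt(x) + 1))


def is_squarefree(n: int) -> bool:
    """Brute-force check that no square d^2 > 1 divides n."""
    return all(n % (d * d) != 0 for d in range(2, isqrt(n) + 1))


def error_term(x: int) -> float:
    """E(x) = N(x) - (6 / pi^2) x from (7.3)."""
    return count_squarefree(x) - 6 * x / pi ** 2
```

On small inputs the error term stays well inside the bound $2\sqrt{x} + 1$ of Theorem 7.6.1; for example there are 61 square-free integers up to 100, against the main term $600/\pi^2 \approx 60.8$.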



7.7. Möbius Inversion Formula

Theorem 7.7.1. The Möbius inversion formula. If $f$ is any arithmetic function and $F(n) = \sum_{d \mid n} f(d)$, then
$$f(n) = \sum_{d \mid n} F(d)\, \mu\!\left(\frac{n}{d}\right) = \sum_{d \mid n} F\!\left(\frac{n}{d}\right) \mu(d).$$

Corollary 7.7.1. With $F$ defined as in the preceding theorem, we have $F$ is multiplicative if and only if $f$ is multiplicative.

Proof. We've already established in Theorem 7.3.1 that if $f$ is multiplicative then so is $F$. Suppose now that $F$ is multiplicative. Then by the Möbius inversion formula, $f(n) = \sum_{d \mid n} \mu(d) F\!\left(\frac{n}{d}\right)$. Since $\mu$ and $F$ are multiplicative, it follows from Theorem 7.3.2 that $f$ is multiplicative. □


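Proof of the Möbius Inversion Theorem. For any positive integer $n$ we have,
$$\sum_{d \mid n} \mu(d)\, F\!\left(\frac{n}{d}\right) = \sum_{d \mid n} \mu(d) \sum_{\delta \mid \frac{n}{d}} f(\delta) = \sum_{\substack{d,\, \delta \\ d\delta \mid n}} \mu(d) f(\delta) = \sum_{\delta \mid n} f(\delta) \left(\sum_{d \mid \frac{n}{\delta}} \mu(d)\right) = f(n),$$
the last equality following from Theorem 7.5.1. □

Example 7.7.1. i) $\sigma(n) = \sum_{d \mid n} d$, and so $n = \sum_{d \mid n} \sigma(d)\mu(n/d)$.
ii) $\tau(n) = \sum_{d \mid n} 1$, and so $1 = \sum_{d \mid n} \tau(d)\mu(n/d)$.
iii) $n = \sum_{d \mid n} \phi(d)$, and so $\phi(n) = \sum_{d \mid n} \mu(d) \frac{n}{d} = n \sum_{d \mid n} \frac{\mu(d)}{d}$. Note, this yields another proof that $\phi$ is a multiplicative function.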
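The inversion formula and the three identities of Example 7.7.1 can be verified mechanically. In the Python sketch below, `summatory` and `mobius_invert` are hypothetical helper names that build $F$ from $f$ and recover $f$ from $F$, respectively; everything is brute force.

```python
def divisors(n: int) -> list:
    return [d for d in range(1, n + 1) if n % d == 0]


def mobius(n: int) -> int:
    """The Mobius function, by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result


def summatory(f):
    """Return F with F(n) = sum of f(d) over d | n."""
    return lambda n: sum(f(d) for d in divisors(n))


def mobius_invert(F):
    """Return f with f(n) = sum of F(d) mu(n/d) over d | n."""
    return lambda n: sum(F(d) * mobius(n // d) for d in divisors(n))


# The arithmetic functions of Example 7.7.1, as summatory functions.
sigma = summatory(lambda d: d)
tau = summatory(lambda d: 1)
phi = mobius_invert(lambda n: n)  # inverts n = sum of phi(d) over d | n
```

Note that the inversion works for any arithmetic function, multiplicative or not; the test function $f(n) = n^2 + 1$ used in the checks is an arbitrary non-multiplicative choice.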

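7.8. Estimates for $\tau(n)$, $\sigma(n)$ and $\phi(n)$

For any arithmetic function $f$ defined on $[1, x]$ with $x \in \mathbb{N}$, its average value on the interval is $\frac{1}{x} \sum_{n \le x} f(n)$.

Theorem 7.8.1. We have the following average values for $\tau$, $\sigma$ and $\phi$.
i) $\frac{1}{x} \sum_{n \le x} \tau(n) = \log x + (2\gamma - 1) + O(1/\sqrt{x})$, where $\gamma = 0.57721\ldots$ is Euler's constant.
ii) $\frac{1}{x} \sum_{n \le x} \sigma(n) = \frac{\pi^2}{12}\, x + O(\log x)$.
iii) $\frac{1}{x} \sum_{n \le x} \phi(n) = \frac{3}{\pi^2}\, x + O(\log x)$.

Proof. i) We have
$$\sum_{n \le x} \tau(n) = \sum_{n \le x} \sum_{\substack{d, e \ge 1 \\ de = n}} 1 = \sum_{\substack{d, e \ge 1 \\ de \le x}} 1 = \sum_{d \le \sqrt{x}}\ \sum_{e \le x/d} 1 + \sum_{e \le \sqrt{x}}\ \sum_{d\, \ldots} 1\ \ldots$$
… $\prod_{p \text{ prime}} \left(1 - \frac{1}{p^2}\right) = \frac{6}{\pi^2}$. Thus we have

Theorem 7.8.3. For any natural number $n > 1$ we have
$$\frac{6}{\pi^2}\, n^2 < \sigma(n)\phi(n) < n^2.$$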
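The statements of Theorem 7.8.1 i) and Theorem 7.8.3 can be sanity-checked numerically. The sketch below is illustrative only: it uses the identity $\sum_{n \le x} \tau(n) = \sum_{d \le x} [x/d]$ (the count of pairs $de \le x$), a hard-coded value of Euler's constant (an assumption of this sketch, written as `EULER_GAMMA`), and brute-force $\sigma$ and $\phi$.

```python
from math import gcd, log, pi

# Euler's constant to double precision; hard-coded for this sketch.
EULER_GAMMA = 0.5772156649015329


def sigma(n: int) -> int:
    """Sum of the positive divisors of n."""
    return sum(d for d in range(1, n + 1) if n % d == 0)


def phi(n: int) -> int:
    """Euler's phi-function by direct count."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)


def average_tau(x: int) -> float:
    """(1/x) * sum_{n <= x} tau(n), using sum_{n <= x} tau(n) = sum_{d <= x} [x/d]."""
    return sum(x // d for d in range(1, x + 1)) / x


def tau_main_term(x: int) -> float:
    """The main term log x + (2 gamma - 1) of Theorem 7.8.1 i)."""
    return log(x) + 2 * EULER_GAMMA - 1


def sigma_phi_bounds_hold(n: int) -> bool:
    """Check (6 / pi^2) n^2 < sigma(n) phi(n) < n^2 for a given n > 1."""
    return 6 / pi ** 2 * n * n < sigma(n) * phi(n) < n * n
```

At $x = 10^4$ the average of $\tau$ and the main term already agree to within a few thousandths, consistent with an error of order $1/\sqrt{x}$.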

7.8.1. Estimates for $\tau(n)$. We've seen that on average $\tau(n)$ is very small, but just how big can it get? Consider the value $n = \prod_{p \le x} p$, the product of all primes less than or equal to $x$. We have $\log n = \theta(x)$. Also, $\theta(x) = cx$ for some $c$ with $\log 2 < c < 2\log 2$, and so $\log\log n = \log c + \log x$. By Theorem ?? we have
$$\pi(x) = \frac{\theta(x)}{\log x} + e(x) = \frac{\log n}{\log\log n - \log c} + e(x),$$
for some $e(x)$ with $e(x) = O(x/\log^2 x) = O(\log n/(\log\log n)^2)$. Thus we have
$$\tau(n) = 2^{\pi(x)} = 2^{\frac{\log n}{\log x} + e(x)} = 2^{\frac{\log n}{\log\log n - \log c} + O(\log n/(\log\log n)^2)} = n^{\frac{\log 2}{\log\log n} + O(1/(\log\log n)^2)}.$$
As it turns out this is in fact the largest possible value of $\tau(n)$.

Theorem 7.8.4. For any positive real number $\epsilon$ and $n$ sufficiently large, we have
$$\tau(n) \le n^{\frac{(1+\epsilon)\log 2}{\log\log n}}.$$


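Proof. Let $n = \prod_{i=1}^{k} p_i^{e_i}$. For any positive real number $\delta$ we have
$$\frac{\tau(n)}{n^\delta} = \prod_{i=1}^{k} \frac{e_i + 1}{p_i^{\delta e_i}},$$
and so we focus our attention on estimating the function $f_p(x) := \frac{x+1}{p^{\delta x}}$ for $x \ge 1$. If $p \ge 2^{1/\delta}$ then $f_p(x) \le \frac{x+1}{2^x} \le 1$ uniformly for $x \ge 1$. Suppose now that $p < 2^{1/\delta}$. Note that $f_p(x)$ attains a local maximum value at $x = \frac{1}{\delta \log p} - 1$ where we have $f_p(x) = \frac{p^\delta}{e\,\delta \log p}$. Now in order for the local maximum to be on our interval of interest, $x \ge 1$, we must have $p \le e^{1/2\delta}$. Suppose that this is the case. Then
$$f_p(x) \le \frac{p^\delta}{e\,\delta \log p} \le \frac{1}{\sqrt{e}\,\delta \log p} \le \frac{1}{\delta}.$$
Otherwise, $p > e^{1/2\delta}$, and $f_p(x)$ attains a maximum value at $x = 1$, whence we get
$$f_p(x) \le \frac{2}{p^\delta} < \frac{2}{\sqrt{e}} < \frac{1}{\delta},$$
for $\delta < \frac{\sqrt{e}}{2}$, which we may assume. Thus for any $\delta < \frac{\sqrt{e}}{2}$ we have uniformly $f_p(x) \le \frac{1}{\delta}$. Therefore,
$$\frac{\tau(n)}{n^\delta} \le \prod_{\substack{p \mid n \\ p \le 2^{1/\delta}}} \frac{1}{\delta} \le \left(\frac{1}{\delta}\right)^{\omega(n)}.$$
In the worst case $n$ is just the product of all primes less than $2^{1/\delta}$, and so we get
$$\tau(n) \le n^\delta\, (1/\delta)^{\pi(2^{1/\delta})}.$$
Inserting $\delta = \frac{\log 2}{(1-\epsilon)\log\log n}$ and assuming that $n$ is sufficiently large we get $n^\delta$ to be the size required for the theorem, while
$$2^{1/\delta} = e^{(1-\epsilon)\log\log n} = (\log n)^{1-\epsilon},$$
and so
$$(1/\delta)^{\pi(2^{1/\delta})} < (1/\delta)^{2^{1/\delta}} \le \left(\frac{\log\log n}{\log 2}\right)^{(\log n)^{1-\epsilon}} \le n^{\frac{C}{(\log\log n)^2}},$$
for some constant $C$. Thus we obtain the desired upper bound for $\tau$. □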





CHAPTER 8

Recurrence Sequences

8.1. The Fibonacci Sequence

Fibonacci lived 1170-1250. Although the Fibonacci sequence appears earlier in Indian mathematics and elsewhere, he was the first to put together some of its important properties. The Fibonacci sequence $\{F_n\}_{n=0}^{\infty} = 0, 1, 1, 2, 3, 5, 8, \ldots$ is governed by the recurrence relation $F_n = F_{n-1} + F_{n-2}$, for $n \ge 2$, that is, the $n$-th term is the sum of the previous two terms. The Fibonacci sequence is not the only sequence governed by this relation. Indeed we could start with any two numbers and generate a sequence using this relation, for example $2, 5, 7, 12, 19, 31, 50, \ldots$. Let's see if we can characterize all sequences $\{u_n\}$ satisfying the same relation
$$u_n = u_{n-1} + u_{n-2}, \tag{8.1}$$
for $n = 2, 3, \ldots$. First, let's test whether a geometric sequence of the form $u_n = \lambda^n$, with $\lambda$ a fixed nonzero complex number, can satisfy this relation. In order to satisfy (8.1) we must have $\lambda^n = \lambda^{n-1} + \lambda^{n-2}$, that is, $\lambda^2 = \lambda + 1$, so $\lambda = \varphi$ or $\varphi'$ where
$$\varphi = \frac{1 + \sqrt{5}}{2}, \qquad \varphi' = \frac{1 - \sqrt{5}}{2}.$$
The reader may recognize that $\varphi$ is the golden ratio. It is easy to verify that the sequences $\{\varphi^n\}$ and $\{\varphi'^n\}$ each satisfy (8.1). By linearity, any sequence of the type $w_n = c_1 \varphi^n + c_2 \varphi'^n$, with $c_1, c_2$ constants, will also satisfy (8.1). Conversely, we claim that any sequence $\{u_n\}$ satisfying (8.1) is of this form. Indeed, the sequence $\{u_n\}$ is completely determined by the values $u_0$ and $u_1$, and so we must show that for any complex numbers $u_0, u_1$, there exist complex numbers $c_1$ and $c_2$ such that $w_0 = u_0$ and $w_1 = u_1$, that is,
$$c_1 + c_2 = u_0, \qquad c_1 \varphi + c_2 \varphi' = u_1.$$
This system will always be solvable since $\varphi \ne \varphi'$. For the case of the Fibonacci sequence, $u_0 = 0$, $u_1 = 1$, and we obtain $c_1 = 1/\sqrt{5}$, $c_2 = -1/\sqrt{5}$. Thus we obtain an explicit formula for the $n$-th Fibonacci number,
$$F_n = \frac{1}{\sqrt{5}} (\varphi^n - \varphi'^n) = \frac{1}{\sqrt{5}} \left(\frac{1 + \sqrt{5}}{2}\right)^n - \frac{1}{\sqrt{5}} \left(\frac{1 - \sqrt{5}}{2}\right)^n. \tag{8.2}$$

Homework 8.1.1. Prove that for any $n \ge 1$, $F_n$ is the nearest integer to $\frac{1}{\sqrt{5}} \left(\frac{1 + \sqrt{5}}{2}\right)^n$. (That is, the Fibonacci sequence is essentially a geometric sequence.)
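Formula (8.2) and the nearest-integer statement of Homework 8.1.1 can be illustrated numerically. The Python sketch below compares an exact iterative computation with floating-point evaluations; `binet` and `nearest_integer_estimate` are our own names, and the comparison is only trustworthy for moderate $n$, where double precision suffices.

```python
from math import sqrt

SQRT5 = sqrt(5)
PHI = (1 + SQRT5) / 2        # the golden ratio
PHI_CONJ = (1 - SQRT5) / 2   # its conjugate, written phi' in the text


def fib(n: int) -> int:
    """Exact n-th Fibonacci number for n >= 0, by iteration."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def binet(n: int) -> float:
    """Floating-point evaluation of formula (8.2)."""
    return (PHI ** n - PHI_CONJ ** n) / SQRT5


def nearest_integer_estimate(n: int) -> int:
    """Nearest integer to phi^n / sqrt(5), as in Homework 8.1.1."""
    return round(PHI ** n / SQRT5)
```

The second term of (8.2) has absolute value below $1/2$ for every $n \ge 0$ and shrinks geometrically, which is why dropping it and rounding recovers $F_n$.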


8.2. Second order linear recurrences

The example above easily generalizes to any second order recurrence sequence, that is, a sequence $\{u_n\}$ satisfying
$$u_n = a u_{n-1} + b u_{n-2}, \tag{8.3}$$
for $n \ge 2$, where $a, b$ are fixed complex numbers with $b \ne 0$.

Theorem 8.2.1. Suppose that $\{u_n\}_{n=0}^{\infty}$ is a recurrence sequence satisfying (8.3), where $a, b \in \mathbb{C}$, $b \ne 0$, and that $\lambda_1, \lambda_2$ are the roots of the associated quadratic equation $x^2 - ax - b = 0$.
(i) If $\lambda_1 \ne \lambda_2$, that is, the quadratic has distinct roots, then $u_n = c_1 \lambda_1^n + c_2 \lambda_2^n$, $n = 0, 1, 2, \ldots$, where $c_1$ and $c_2$ are constants satisfying $u_0 = c_1 + c_2$ and $u_1 = c_1 \lambda_1 + c_2 \lambda_2$.
(ii) If $\lambda_1 = \lambda_2 = \lambda$, say, then $u_n = c_1 \lambda^n + c_2 n \lambda^n$, $n = 0, 1, 2, \ldots$, where $c_1, c_2$ satisfy $u_0 = c_1$, $u_1 = (c_1 + c_2)\lambda$.

Proof. The proof of (i) is identical to what we did above. For part (ii) we note that if $\lambda$ is a double root, then the sequence $u_n = n\lambda^n$ also satisfies (8.3), that is, $n\lambda^n = (n-1)a\lambda^{n-1} + (n-2)b\lambda^{n-2}$, or equivalently $n\lambda^2 = na\lambda + nb - a\lambda - 2b$. Having a double root means that $x^2 - ax - b$ has a zero discriminant, that is, $a^2 = -4b$, and that $\lambda = a/2$. Thus $-a\lambda - 2b = -\frac{a^2}{2} - 2b = 0$ and so the preceding equation just becomes $\lambda^2 = a\lambda + b$. Now let $u_n = c_1 \lambda^n + c_2 n \lambda^n$. Then $u_0 = c_1$ and $u_1 = c_1 \lambda + c_2 \lambda$. □

The reader who has studied differential equations will note the analogy with solving a second order linear differential equation.

Example 8.2.1. Let $\{u_n\}$ be a sequence with $u_0 = 0$, $u_1 = 1$, $u_n = u_{n-1} - u_{n-2}$, so that $\{u_n\} = 0, 1, 1, 0, -1, -1, 0, 1, 1, 0, \ldots$, a periodic sequence with period 6. The associated quadratic equation is $x^2 - x + 1 = 0$ with roots $\omega, \overline{\omega}$. Here $\omega = \frac{1 + \sqrt{3}\, i}{2} = e^{\pi i/3}$, a primitive 6-th root of unity. The general solution to the linear recurrence is of the form $c_1 \omega^n + c_2 \overline{\omega}^n$, and with the initial conditions $u_0 = 0$, $u_1 = 1$ we get $c_1 = \frac{-i}{\sqrt{3}}$, $c_2 = \frac{i}{\sqrt{3}}$, and so
$$u_n = \frac{-i}{\sqrt{3}}\, \omega^n + \frac{i}{\sqrt{3}}\, \overline{\omega}^n.$$

Homework 8.2.1. Find all periodic second order recurrence sequences.

8.3. A Matrix view of the Fibonacci Sequence

The Fibonacci sequence enjoys a number of interesting properties. First, let's extend the definition of Fibonacci numbers to negative indices using the relation $F_{n-1} = F_{n+1} - F_n$.
Thus $F_0, F_{-1}, F_{-2}, \cdots = 0, 1, -1, 2, -3, 5, -8, 13, -21, \ldots$. It is easy to see by induction that with
$$A = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix},$$
we have for any positive integer $n$,
$$A^n = \begin{pmatrix} F_{n-1} & F_n \\ F_n & F_{n+1} \end{pmatrix}. \tag{8.4}$$
Since $A$ is invertible over $\mathbb{Z}$ with $A^{-1} = \begin{pmatrix} -1 & 1 \\ 1 & 0 \end{pmatrix}$, the identity in (8.4) holds in fact for all integers $n$. Since $\det(A) = -1$, we have $\det(A^n) = (\det(A))^n = (-1)^n$ and so we obtain the identity
$$F_{n-1} F_{n+1} - F_n^2 = (-1)^n, \tag{8.5}$$
for any integer $n$. It follows from (8.5) that any two consecutive Fibonacci numbers are relatively prime.

Homework 8.3.1. a) Deduce from (8.5) that for any positive integer $k \ge 2$,
$$\frac{(-1)^k}{F_k F_{k+1}} = \frac{F_{k-1}}{F_k} - \frac{F_k}{F_{k+1}}.$$
b) Deduce from part a) that for any positive integer $n$,
$$\frac{F_n}{F_{n+1}} = \sum_{k=1}^{n} \frac{(-1)^{k+1}}{F_k F_{k+1}}.$$

Homework 8.3.2. a) Use part b) of the preceding problem to prove that $\lim_{n \to \infty} \frac{F_n}{F_{n+1}}$ exists.
b) Prove that the limit of the sequence in a) is $\frac{\sqrt{5} - 1}{2} = 1/\varphi$, the reciprocal of the golden ratio, using only the fact that $F_{n+1} = F_n + F_{n-1}$.
c) Confirm part b) by computing the same limit using the formula for $F_n$, (8.2).

8.3.1. A matrix proof of (8.2). The characteristic polynomial for $A$ is given by $\det(A - \lambda I) = \lambda^2 - \lambda - 1$, with roots $\varphi, \varphi'$, and thus we obtain the diagonalization $P^{-1} A P = D$ where the columns of $P$ are the eigenvectors associated with the eigenvalues $\varphi, \varphi'$,
$$P = \begin{pmatrix} 1 & 1 \\ \varphi & \varphi' \end{pmatrix}, \qquad P^{-1} = \frac{1}{\sqrt{5}} \begin{pmatrix} -\varphi' & 1 \\ \varphi & -1 \end{pmatrix}, \qquad D = \begin{pmatrix} \varphi & 0 \\ 0 & \varphi' \end{pmatrix}.$$
Then $A = P D P^{-1}$, and $A^n = P D^n P^{-1}$ for any integer $n$. This recovers the formula for $F_n$ given in (8.2).

8.4. Congruence and Divisibility Properties of the Fibonacci Sequence

From (8.4) we have
$$A^n \equiv \begin{pmatrix} F_{n-1} & 0 \\ 0 & F_{n+1} \end{pmatrix} \pmod{F_n}. \tag{8.6}$$
For any positive integer $k$, since $A^{nk} = (A^n)^k$ we have
$$\begin{pmatrix} F_{nk-1} & F_{nk} \\ F_{nk} & F_{nk+1} \end{pmatrix} \equiv \begin{pmatrix} F_{n-1} & 0 \\ 0 & F_{n+1} \end{pmatrix}^{k} \equiv \begin{pmatrix} F_{n-1}^k & 0 \\ 0 & F_{n+1}^k \end{pmatrix} \pmod{F_n}.$$
Thus we obtain for any positive integers $k, n$,
$$F_{nk-1} \equiv F_{n-1}^k \pmod{F_n}, \tag{8.7}$$
$$F_{nk} \equiv 0 \pmod{F_n}, \tag{8.8}$$
$$F_{nk+1} \equiv F_{n+1}^k \pmod{F_n}. \tag{8.9}$$
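The matrix identities (8.4), (8.5) and the congruences (8.7) to (8.9) can be checked with a few lines of exact integer arithmetic. The Python sketch below represents $2 \times 2$ matrices as nested tuples; `mat_mul`, `mat_pow` and `fib` are our own helper names, and `mat_pow` uses repeated squaring for nonnegative exponents only.

```python
A = ((0, 1), (1, 1))          # the generating matrix of the Fibonacci sequence
IDENTITY = ((1, 0), (0, 1))


def mat_mul(X, Y):
    """Product of two 2x2 integer matrices given as nested tuples."""
    return tuple(
        tuple(sum(X[i][t] * Y[t][j] for t in range(2)) for j in range(2))
        for i in range(2)
    )


def mat_pow(X, n: int):
    """X^n for n >= 0, by repeated squaring."""
    result = IDENTITY
    while n > 0:
        if n & 1:
            result = mat_mul(result, X)
        X = mat_mul(X, X)
        n >>= 1
    return result


def fib(n: int) -> int:
    """F_n for n >= 0, read off as the upper-right entry of A^n, as in (8.4)."""
    return mat_pow(A, n)[0][1]
```

Repeated squaring needs only $O(\log n)$ matrix multiplications, so this is also a reasonable way to compute a single large Fibonacci number exactly.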


Congruence (8.8) can be restated as follows.

Theorem 8.4.1. For any positive integers $d, n$ with $d \mid n$ we have $F_d \mid F_n$.

Next, from the identity $A^m A^n = A^{m+n}$ we obtain the identity
$$F_{n+m} = F_{m-1} F_n + F_m F_{n+1}, \tag{8.10}$$
for any integers $m, n$, from which we see that
$$(F_{n+m}, F_m) = (F_{m-1} F_n + F_m F_{n+1}, F_m) = (F_{m-1} F_n, F_m) = (F_n, F_m),$$
the second equality following from the gcd-invariance property and the last from the fact that consecutive Fibonacci numbers are relatively prime. Therefore (by induction) for any integer $q$,
$$(F_{n-qm}, F_m) = (F_n, F_m). \tag{8.11}$$
Applying the Euclidean algorithm to $(n, m)$, and using (8.11), we deduce the following theorem.

Theorem 8.4.2. For any positive integers $m, n$ we have $(F_m, F_n) = F_{(m,n)}$.

Homework 8.4.1. Show that for any prime $p$, $F_p$ is relatively prime to all Fibonacci numbers before it. Thus each $F_p$ gives rise to at least one prime factor that is not a factor of any previous Fibonacci number. In fact this result can be generalized; see the next note.

Note 8.4.1. Carmichael's theorem: With the exception of 1, 8 and 144, every Fibonacci number has a prime factor that is not a factor of any smaller Fibonacci number. In particular, this yields a new proof of the infinitude of the set of primes. We will not prove Carmichael's theorem here.

8.5. Periodicity of the Fibonacci sequence (mod m)

A sequence $\{a_n\}$ is said to be periodic (mod $m$) if for some $k > 0$,
$$a_{n+k} \equiv a_n \pmod m, \tag{8.12}$$
for all natural numbers $n$. It is said to have period $k$ (mod $m$) if $k$ is the minimal value for which (8.12) holds for all $n$. The Fibonacci sequence (mod 2) and (mod 3) is given by
$$0, 1, 1, 0, 1, 1, 0, 1, 1, \ldots \pmod 2,$$
$$0, 1, 1, 2, 0, 2, 2, 1, 0, 1, 1, 2, 0, 2, 2, 1, \ldots \pmod 3,$$
respectively and thus has period 3 (mod 2) and period 8 (mod 3). It is easy to see that the Fibonacci sequence is periodic (mod $m$) for any $m$. Indeed, there are only $m^2$ possible values for the ordered pairs $(F_n, F_{n+1})$ (mod $m$) and so these values must repeat. Once we have $(F_n, F_{n+1}) \equiv (F_{n+k}, F_{n+k+1}) \pmod m$ for some $k$, it follows by induction that $F_{n+k+i} \equiv F_{n+i} \pmod m$, for any integer $i$. Thus the sequence is periodic (mod $m$) with period at most $m^2$.

Definition 8.5.1. The period of the Fibonacci sequence (mod $m$) is called a Pisano period, denoted $\pi(m)$, named after Leonardo Pisano (Fibonacci!).

Another way to obtain the periodicity of the Fibonacci sequence is to use the generating matrix for the sequence. First let's define the order of a matrix.
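The pair-repetition argument above translates directly into a procedure for computing the Pisano period, and Theorem 8.4.2 is just as easy to test. The following Python sketch is illustrative; `pisano` is our own name, and it simply steps the pair $(F_n, F_{n+1})$ modulo $m$ until the starting pair $(0, 1)$ recurs.

```python
from math import gcd


def fib(n: int) -> int:
    """Exact n-th Fibonacci number for n >= 0."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def pisano(m: int) -> int:
    """Period of the Fibonacci sequence mod m, found by tracking (F_n, F_{n+1})."""
    if m == 1:
        return 1
    a, b, k = 0, 1, 0
    while True:
        a, b = b, (a + b) % m
        k += 1
        if (a, b) == (0, 1):  # the initial pair has returned
            return k


def gcd_identity_holds(m: int, n: int) -> bool:
    """Check (F_m, F_n) = F_{(m, n)} from Theorem 8.4.2."""
    return gcd(fib(m), fib(n)) == fib(gcd(m, n))
```

Because the map $(F_n, F_{n+1}) \mapsto (F_{n+1}, F_{n+2})$ is invertible mod $m$, the sequence of pairs is purely periodic, so waiting for the pair $(0, 1)$ to return is enough.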


Definition 8.5.2. If $A$ is an invertible $n$ by $n$ matrix over the ring $\mathbb{Z}_m$ then the order of $A$ (mod $m$), denoted $\mathrm{ord}_m(A)$, is the minimal $k$ such that $A^k \equiv I_n \pmod m$, where $I_n$ is the identity matrix.

Note that the order of an invertible matrix $A$ exists since $A$ is an element of the finite multiplicative group $GL_n(\mathbb{Z}_m)$ of invertible $n$ by $n$ matrices over $\mathbb{Z}_m$. It is plain that if $m = p_1^{e_1} \cdots p_k^{e_k}$ then the order of $A$ (mod $m$) is the least common multiple of the orders of $A$ (mod $p_i^{e_i}$), $1 \le i \le k$. Furthermore, we have

Lemma 8.5.1. If $p$ is a prime and $A$ is an invertible matrix (mod $p$), then for any positive integer $e$, $\mathrm{ord}_{p^e}(A) \mid p^{e-1}\, \mathrm{ord}_p(A)$.

Proof. This follows from the property that if $A \equiv B \pmod p$, then $A^{p^i} \equiv B^{p^i} \pmod{p^{i+1}}$, for $i \ge 0$. (The same proof used to prove the analogous result for integers works for matrices, provided $A$ and $B$ commute, as is the case when $B$ is the identity.) Thus, if $A^k \equiv I \pmod p$ then $A^{p^{e-1} k} \equiv I \pmod{p^e}$. □

Now let $A = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}$, the generating matrix for the Fibonacci sequence. Then $A^k \equiv I_2 \pmod m$ is equivalent to saying
$$\begin{pmatrix} F_{k-1} & F_k \\ F_k & F_{k+1} \end{pmatrix} \equiv \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \pmod m,$$
that is, $F_k \equiv 0 \pmod m$ and $F_{k-1} \equiv F_{k+1} \equiv 1 \pmod m$. Moreover, since for any $n$, $A^{n+k} \equiv A^n A^k \equiv A^n \pmod m$, we have that $F_{n+k} \equiv F_n \pmod m$. Thus, $\mathrm{ord}_m(A) = \pi(m)$, the Pisano period, and we can identify the Pisano period by looking for the first occurrence of the sequence 1, 0, 1 in the Fibonacci sequence (mod $m$). The preceding lemma applied to the Fibonacci generating matrix yields

Lemma 8.5.2. If $p$ is a prime, then for any positive integer $e$, we have $\pi(p^e) \mid p^{e-1} \pi(p)$.

By our observation above on the order of a matrix (mod $m$), we see that $\pi(m)$ is the least common multiple of the orders of $A$ modulo the prime power divisors of $m$. These periods in turn can be estimated using the preceding lemma, leaving us with the task of determining the period modulo primes.

Theorem 8.5.1. For any prime $p$ the Pisano period $\pi(p)$ of the Fibonacci sequence satisfies
i) $\pi(2) = 3$, $\pi(5) = 20$.
ii) If $p \equiv \pm 1 \pmod 5$, then $\pi(p) \mid (p - 1)$.
iii) If $p \equiv \pm 3 \pmod 5$, then $\pi(p) \mid (2p + 2)$.

Proof. i) These values can be calculated numerically.
ii) We work in the finite field $\mathbb{F}_p$. The first observation we make is that the proof of Theorem 8.2.1 holds just as well for sequences over $\mathbb{F}_p$ as for $\mathbb{C}$. If $p \equiv \pm 1 \pmod 5$ then by quadratic reciprocity, $\left(\frac{5}{p}\right) = \left(\frac{p}{5}\right) = \left(\frac{\pm 1}{5}\right) = 1$, and so there exists a nonzero $\lambda \in \mathbb{F}_p$ with $\lambda^2 = 5$, and the roots of the associated quadratic $x^2 - x - 1$ are $\lambda_1 = 2^{-1}(1 + \lambda)$, $\lambda_2 = 2^{-1}(1 - \lambda)$. Thus the analogue of (8.2) we obtain is $F_n = \lambda^{-1}(\lambda_1^n - \lambda_2^n)$. Since the orders of $\lambda_1$ and $\lambda_2$ are divisors of $p - 1$ in $\mathbb{F}_p^*$ (Fermat's Little Theorem), it follows that $\pi(p) \mid (p - 1)$.


iii) If $p \equiv \pm 3 \pmod 5$ then by the same argument 5 is not a quadratic residue (mod $p$) and so the quadratic $x^2 - x - 1$ remains irreducible over $\mathbb{F}_p$. The roots $\lambda_1, \lambda_2 \in \mathbb{F}_{p^2}$ satisfy $\lambda_2 = \lambda_1^p$ (since the mapping $x \to x^p$ is an automorphism of $\mathbb{F}_{p^2}$ fixing $\mathbb{F}_p$) and $\lambda_1\lambda_2 = -1$ (the constant term of the quadratic). Thus $\lambda_1^{p+1} = -1$, $\lambda_1^{2p+2} = 1$, and the same for $\lambda_2$. Thus the orders of both $\lambda_1$ and $\lambda_2$ are divisors of $2p+2$. □

Putting together the preceding theorem and lemma, one can obtain the following refinement of our earlier observation that $\pi(m) \le m^2$.

Corollary 8.5.1. For any $m$, we have $\pi(m) \le 6m$.

This problem was posed by Peter Freyd, in the American Mathematical Monthly (E3410, March 1992). We leave the proof as an exercise for the reader.

Example 8.5.1. Let's find $\pi(11)$. Since $11 \equiv 1 \pmod 5$, the theorem gives $\pi(11) \mid 10$. We have $4^2 \equiv 5 \pmod{11}$, so 4 serves as the square root of 5, and $2^{-1} \equiv 6 \pmod{11}$. Thus $\varphi \equiv 2^{-1}(1+\sqrt{5}) \equiv 6(1+4) = 30 \equiv 8 \pmod{11}$ and $\varphi' \equiv 6(1-4) \equiv 4 \pmod{11}$. Therefore
\[ F_n \equiv 4^{-1}(\varphi^n - \varphi'^n) \equiv 3(8^n - 4^n) \pmod{11}. \]
Hence, $\pi(11) = [\mathrm{ord}_{11}(4), \mathrm{ord}_{11}(8)] = [5, 10] = 10$.

8.6. Further Properties of the Fibonacci Sequence

Homework 8.6.1. Prove that for any positive integer $n$ we have
\[ F_1 + F_2 + \cdots + F_n = F_{n+2} - 1, \]
\[ F_1 + F_3 + \cdots + F_{2n-1} = F_{2n}, \]
\[ F_2 + F_4 + \cdots + F_{2n} = F_{2n+1} - 1. \]
Give a proof by induction, and another proof using the matrix $A$.

Homework 8.6.2. What is the sum of the Fibonacci numbers appearing in one cycle (mod $m$)?

Homework 8.6.3. Prove that every prime is a divisor of infinitely many Fibonacci numbers. In particular show that if $p \equiv \pm 1 \pmod 5$ then $p \mid F_{p-1}$ and if $p \equiv \pm 3 \pmod 5$ then $p \mid F_{2p+2}$.

Homework 8.6.4. Zeckendorf's theorem: Show that every positive integer can be uniquely expressed as a sum of one or more distinct Fibonacci numbers $F_2, F_3, \ldots$, no two of which are consecutive. Note: We omit $F_1$ from the list because $F_1 = F_2$ and the uniqueness part of the statement would fail on account of the identity $F_{2n} = F_1 + F_3 + \cdots + F_{2n-1}$.

Homework 8.6.5. Partitioning a Rectangle. i) Draw a rectangle with sides of length $F_n$, $F_{n+1}$, and show how it can be partitioned into squares with edge lengths $F_1, F_2, \ldots, F_n$. Deduce the identity
\[ F_1^2 + F_2^2 + \cdots + F_n^2 = F_n F_{n+1}. \]
ii) Give an induction proof of this identity.


Homework 8.6.6. Fibonacci Number Test. Prove that a natural number $n$ is a Fibonacci number if and only if either $5n^2 + 4$ or $5n^2 - 4$ is a perfect square.

Homework 8.6.7. Prove that for any positive $n$,
\[ F_n^2 + F_{n+1}^2 = F_{2n+1}. \]
Try a proof using $A$.

Homework 8.6.8. Deduce from the previous problem that every odd numbered Fibonacci number starting with $F_5$ is the hypotenuse of a right triangle, by showing that $(2F_nF_{n-1},\ F_n^2 - F_{n-1}^2,\ F_{2n-1})$ is a Pythagorean triple for $n \ge 3$.

Remark 8.6.1. The first few Fibonacci primes are 2, 3, 5, 13, 89, 233, 1597, 28657, 514229, .... It is an open question whether there are infinitely many.
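The statements of Section 8.5 are easy to test numerically. The following is a minimal sketch, not part of the original notes; the names `pisano_period` and `satisfies_theorem_8_5_1` are our own. It finds $\pi(m)$ by waiting for the pair $(F_k, F_{k+1})$ to return to $(0, 1)$ modulo $m$, using the bound $\pi(m) \le 6m$ of Corollary 8.5.1 to limit the search, and then checks the divisibility claims of Theorem 8.5.1 for a given prime.

```python
def pisano_period(m):
    """Return the least k >= 1 with F_k = 0 and F_{k+1} = 1 (mod m), for m >= 2."""
    a, b = 0, 1  # (F_0, F_1) reduced mod m
    # Corollary 8.5.1 gives pi(m) <= 6m, so this range always suffices.
    for k in range(1, 6 * m + 1):
        a, b = b, (a + b) % m  # now (a, b) = (F_k, F_{k+1}) mod m
        if (a, b) == (0, 1):
            return k
    raise ValueError("no period found; is m >= 2?")


def satisfies_theorem_8_5_1(p):
    """Check parts ii) and iii) of Theorem 8.5.1 for a prime p other than 2 and 5."""
    period = pisano_period(p)
    if p % 5 in (1, 4):  # p = +-1 (mod 5)
        return (p - 1) % period == 0
    return (2 * p + 2) % period == 0  # p = +-3 (mod 5)


if __name__ == "__main__":
    for p in (3, 7, 11, 13, 17, 19, 29, 31):
        print(p, pisano_period(p), satisfies_theorem_8_5_1(p))
```

For instance, `pisano_period(11)` agrees with the value found by hand in Example 8.5.1.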

CHAPTER 9

Diophantine Equations

9.1. Preliminaries

Let $p(x, y, z)$ be a polynomial with integer coefficients. Our interest is in finding integer solutions to the equation
\[ p(x, y, z) = 0. \tag{9.1} \]
In particular we ask the following two questions.

1) Does (9.1) have a solution? If so, how many? Is the number finite or infinite? The most famous diophantine equation is the Fermat equation $x^n + y^n = z^n$. It was asserted by Fermat (1637) that for $n > 2$ there is no integer solution to this equation with $xyz \ne 0$. Although Fermat was never able to provide a proof of this assertion, it became known as Fermat's Last Theorem (FLT). Attempts to prove FLT have given rise to the development of much new mathematics. In the centuries after Fermat made his assertion, FLT was proven for many special cases of $n$. It was shown by Faltings (1983) that for a fixed $n > 2$ such an equation can have at most finitely many primitive integer solutions (solutions with $\gcd(x, y, z) = 1$), other than the trivial ones $xyz = 0$. This followed from his proof of Mordell's conjecture, which asserts that an algebraic curve of genus greater than one can have at most finitely many rational points. Finally, Andrew Wiles announced a proof of Fermat's Last Theorem in its entirety in 1993; a gap in the argument was repaired in 1994 and the complete proof was published in 1995.

2) A second question we might ask is to find all the solutions of (9.1), or at least provide an explicit description of the solution set. For instance consider the Pythagorean equation $x^2 + y^2 = z^2$. We will be able to give a precise description of all integer solutions to this equation.

9.2. Systems of Linear Equations

In Section 2.7 we found all integer solutions to the linear equation $ax + by = c$. We turn now to a system of $k$ linear equations in $n$ unknowns,
\[ \begin{aligned} a_{11}x_1 + \cdots + a_{1n}x_n &= b_1 \\ &\ \,\vdots \\ a_{k1}x_1 + \cdots + a_{kn}x_n &= b_k \end{aligned} \tag{9.2} \]
which can be written as the matrix equation $AX = B$ where
\[ A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{k1} & \cdots & a_{kn} \end{pmatrix}, \quad X = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} b_1 \\ \vdots \\ b_k \end{pmatrix}. \]


Our goal is to make an appropriate change of variables so that the system can be diagonalized. Suppose that $P, Q$ are square matrices, invertible over $\mathbb{Z}$, such that $PAQ = D$ where $D = [d_{ij}]$ is a diagonal-type matrix of the form
\[ D = \begin{pmatrix} d_1 & 0 & \cdots & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots & & \vdots \\ 0 & 0 & \cdots & d_k & \cdots & 0 \end{pmatrix}, \]
that is, $d_{ii} = d_i$ for $i = 1, 2, \ldots, k$, and $d_{ij} = 0$ for $i \ne j$. Then putting $X = QY$ with $Y$ a column matrix of variables, we have
\[ AX = B \Leftrightarrow PAX = PB \Leftrightarrow PAQY = PB \Leftrightarrow DY = PB, \]
and thus (9.2) is equivalent to the diagonal system of equations
\[ d_i y_i = b_i', \quad 1 \le i \le k, \tag{9.3} \]
where $B' = PB$. Now (9.3) is solvable if and only if $d_i \mid b_i'$ for $1 \le i \le k$, in which case the general solution is given by $y_i = b_i'/d_i$, $1 \le i \le k$, $y_{k+j} = \lambda_j$, $1 \le j \le n-k$, where the $\lambda_j$ are arbitrary integers.

In order to understand how to diagonalize a matrix let's recall the following notions from matrix theory.

Definition 9.2.1. Let $A, B$ be two $k \times n$ matrices over $\mathbb{Z}$. We say that $A$ and $B$ are equivalent, and write $A \sim B$, if there exist invertible matrices $P, Q$ over $\mathbb{Z}$ with $PAQ = B$.

Note 9.2.1. i) Plainly, equivalence of matrices as defined above is an equivalence relation.
ii) If $A$ is equivalent to a diagonal-type matrix $D$, then $D$ is called a diagonalization of $A$.

Definition 9.2.2. i) There are three types of elementary row operations: multiplying a row by $-1$ ($-1R_i$), adding an integer multiple of one row to another ($\lambda R_i + R_j$), and swapping two rows ($R_i \leftrightarrow R_j$). The same applies to column operations.
ii) An elementary matrix is a matrix obtained from an identity matrix by a single elementary row operation.

Note 9.2.2. i) If $E$ is an elementary matrix obtained by a particular row operation and $EA$ is defined, then $EA$ is the matrix obtained by performing the same row operation on $A$. For example,
\[ 3R_2 + R_1: \quad \begin{pmatrix} 1 & 3 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} a + 3c & b + 3d \\ c & d \end{pmatrix}. \]
ii) Any elementary matrix can also be obtained from the identity matrix by an elementary column operation.
iii) If $E$ is an elementary matrix obtained by a particular column operation and $AE$ is defined, then $AE$ is the matrix obtained by performing the same column operation on $A$. For example,
\[ 3C_1 + C_2: \quad \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} 1 & 3 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} a & b + 3a \\ c & d + 3c \end{pmatrix}. \]


iv) Performing a sequence of elementary row operations with corresponding matrices $E_1, E_2, \ldots, E_j$ and elementary column operations with corresponding matrices $F_1, F_2, \ldots, F_k$ on a matrix $A$ produces the equivalent matrix $E_j E_{j-1} \cdots E_1 A F_1 F_2 \cdots F_k$.

Theorem 9.2.1. Diagonalization Theorem. Let $A$ be a $k \times n$ matrix with integer entries. Then there exists a $k \times k$ matrix $P$ and an $n \times n$ matrix $Q$ with integer entries such that $P$ and $Q$ are invertible over $\mathbb{Z}$ and $PAQ$ is a diagonal-type matrix.

Proof. It suffices to show that $A$ can be diagonalized by a finite sequence of elementary row and column operations. We may assume that $A$ is a nonzero matrix, that (by row and column swaps if necessary) $a_{11}$ is the nonzero entry with minimum absolute value, and, multiplying by $-1$ if necessary, that $a_{11} > 0$. If $a_{11}$ does not divide some entry in the first row of $A$, say $a_{1j}$, then $a_{1j} = qa_{11} + r$ with $0 < r < a_{11}$. Thus by subtracting $qC_1$ from $C_j$, we obtain the entry $r$ in the $(1, j)$ position, and can now move $r$ to the $(1, 1)$ position by a column swap. Repeating this process finitely many times yields a matrix in which every entry in the first row is a multiple of the entry in the $(1, 1)$ position. Subtracting appropriate multiples of the first column from the others annihilates all entries in the first row except the first. The same procedure can be applied to the first column of $A$ using elementary row operations, thus yielding a matrix of the form $\begin{pmatrix} a_{11} & 0 \\ 0 & A' \end{pmatrix}$. The proof now proceeds by induction on the number of rows of the matrix $A$. □

Note 9.2.3. i) The diagonal matrix $D$ obtained in the Diagonalization Theorem is not unique. For example
\[ \begin{pmatrix} 4 & 0 \\ 0 & 6 \end{pmatrix} \sim \begin{pmatrix} 6 & 0 \\ 0 & 4 \end{pmatrix} \sim \begin{pmatrix} 2 & 0 \\ 0 & 12 \end{pmatrix}. \]
ii) The diagonal matrix is unique if we impose a further constraint: Any matrix of rank $r$ over $\mathbb{Z}$ is equivalent to a unique diagonal-type matrix $\mathrm{diag}(d_1, d_2, \ldots, d_r, 0, \ldots, 0)$, with the $d_i$ positive integers satisfying $d_1 \mid d_2 \mid \cdots \mid d_r$. The values $d_i$ are called the invariant factors of $A$.
We will not concern ourselves with the uniqueness statement here.
iii) The Diagonalization Theorem, together with the uniqueness statement, generalizes to any principal ideal domain.

Example 9.2.1. Show that $\begin{pmatrix} 6 & 4 & 2 \\ 8 & 10 & 4 \end{pmatrix} \sim \begin{pmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \end{pmatrix}$ over $\mathbb{Z}$. Our task is to show that the first matrix can be transformed into the diagonal-type matrix by a sequence of elementary row and column operations. We follow the strategy given in the proof above, starting by moving the smallest entry to the upper left corner: $C_1 \leftrightarrow C_3$, then $-2C_1 + C_2$ and $-3C_1 + C_3$, then $-2R_1 + R_2$, then $2C_2 + C_3$, yields
\[ \begin{pmatrix} 6 & 4 & 2 \\ 8 & 10 & 4 \end{pmatrix} \sim \begin{pmatrix} 2 & 4 & 6 \\ 4 & 10 & 8 \end{pmatrix} \sim \begin{pmatrix} 2 & 0 & 0 \\ 4 & 2 & -4 \end{pmatrix} \sim \begin{pmatrix} 2 & 0 & 0 \\ 0 & 2 & -4 \end{pmatrix} \sim \begin{pmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \end{pmatrix}. \]

Example 9.2.2. Diagonalize the matrix $A = [6, 21]$, and determine the matrices $P, Q$ such that $PAQ = D$. To diagonalize $A$, we simply perform the Euclidean algorithm. First, we subtract $3 \cdot 6$ from 21 to get $[6, 3]$, then subtract $2 \cdot 3$ from 6 to get $[0, 3]$, and finally, swap the two columns. Thus
\[ [6, 21] \begin{pmatrix} 1 & -3 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ -2 & 1 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = [3, 0], \]
that is, $AQ = D$, where $Q$ is the product of the three elementary matrices and $D = [3, 0]$. Rather than multiply the three matrices to compute $Q$, it is more efficient to augment the original matrix with an identity matrix from the start, and just keep track of the cumulative effect of the column operations as we go along:
\[ \begin{pmatrix} 6 & 21 \\ 1 & 0 \\ 0 & 1 \end{pmatrix} \to \begin{pmatrix} 6 & 3 \\ 1 & -3 \\ 0 & 1 \end{pmatrix} \to \begin{pmatrix} 0 & 3 \\ 7 & -3 \\ -2 & 1 \end{pmatrix} \to \begin{pmatrix} 3 & 0 \\ -3 & 7 \\ 1 & -2 \end{pmatrix}. \]
Thus $Q = \begin{pmatrix} -3 & 7 \\ 1 & -2 \end{pmatrix}$. There were no row operations, so $P$ is trivial, $P = [1]$.

Example 9.2.3. Determine for what integers $a, b$ the following system is solvable and find the general solution when it is solvable.
\[ 4x_1 + 6x_2 - 2x_3 = a, \qquad 2x_1 - 8x_2 + 10x_3 = b. \]
We form the following augmented matrix and then perform elementary row and column operations in order to diagonalize the coefficient matrix. (There is no need to find the matrix $P$, but if we wanted to do so, we would also augment the matrix with a $2 \times 2$ identity matrix to keep track of the row operations.)
\[ \left(\begin{array}{ccc|c} 4 & 6 & -2 & a \\ 2 & -8 & 10 & b \\ \hline 1 & 0 & 0 & \\ 0 & 1 & 0 & \\ 0 & 0 & 1 & \end{array}\right) \]
Swapping rows 1 and 2 ($R_1 \leftrightarrow R_2$) and then adding $-2$ times row 1 to row 2 ($-2R_1 + R_2$) we obtain
\[ \left(\begin{array}{ccc|c} 2 & -8 & 10 & b \\ 4 & 6 & -2 & a \\ \hline 1 & 0 & 0 & \\ 0 & 1 & 0 & \\ 0 & 0 & 1 & \end{array}\right), \qquad \left(\begin{array}{ccc|c} 2 & -8 & 10 & b \\ 0 & 22 & -22 & a - 2b \\ \hline 1 & 0 & 0 & \\ 0 & 1 & 0 & \\ 0 & 0 & 1 & \end{array}\right). \]
Next, add 4 times column one to column 2 ($4C_1 + C_2$), and $-5$ times column one to column 3 ($-5C_1 + C_3$), and then add column 2 to column 3 ($C_2 + C_3$):
\[ \left(\begin{array}{ccc|c} 2 & 0 & 0 & b \\ 0 & 22 & -22 & a - 2b \\ \hline 1 & 4 & -5 & \\ 0 & 1 & 0 & \\ 0 & 0 & 1 & \end{array}\right), \qquad \left(\begin{array}{ccc|c} 2 & 0 & 0 & b \\ 0 & 22 & 0 & a - 2b \\ \hline 1 & 4 & -1 & \\ 0 & 1 & 1 & \\ 0 & 0 & 1 & \end{array}\right). \]


The change of variables matrix $Q$ is the bottom three rows, and the diagonal system we end up with is
\[ 2y_1 = b, \qquad 22y_2 = a - 2b. \]
Thus the system is solvable if and only if $2 \mid b$ and $22 \mid (a - 2b)$, in which case the solution to the diagonalized system is $y_1 = b/2$, $y_2 = (a - 2b)/22$, $y_3 = t$, where $t$ is any integer. The general integer solution to the original system is then given by $X = QY$, that is,
\[ \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 1 & 4 & -1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} b/2 \\ (a - 2b)/22 \\ t \end{pmatrix} \]
or
\[ x_1 = \frac{b}{2} + \frac{2}{11}(a - 2b) - t, \qquad x_2 = \frac{a - 2b}{22} + t, \qquad x_3 = t, \]
with $t \in \mathbb{Z}$.

Homework 9.2.1. a) Solve the system $4x + 6y - 2z = 0$, $2x - 8y + 10z = 1$, by forming the augmented matrix, and diagonalizing the coefficient matrix. Show that it is equivalent to the system $2y_1 = 1$, $22y_2 = -2$, and therefore there is no solution.
b) Solve the same system with constants 10 and $-6$ on the righthand side. Make use of the matrix $Q$ determined in part a). Show that the general solution can be expressed in the manner $(x, y, z) = (1, 1, 0) + \lambda(-1, 1, 1)$, with $\lambda \in \mathbb{Z}$.

Homework 9.2.2. 1) Solve the system
\[ \begin{aligned} x_1 + x_2 + 4x_3 + 2x_4 &= 5 \\ -3x_1 - x_2 + 0 - 6x_4 &= 3 \\ -x_1 - x_2 + 2x_3 - 2x_4 &= 1 \end{aligned} \]
2) Solve the system $5x - 2y - 4z = 1$.
3) For what values $a, b \in \mathbb{Z}$ can we solve the system
\[ \begin{aligned} 2x + 3y - z &= a \\ x - 4y + 5z &= b \end{aligned} \]
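The bookkeeping of Example 9.2.2 can be mechanized. The sketch below is an illustration under our own naming (`diagonalize_row` is not from the notes): it reduces a single integer row $[a_1, \ldots, a_n]$ to $[g, 0, \ldots, 0]$ using only elementary column operations, following the strategy in the proof of Theorem 9.2.1, and accumulates the same operations on an identity matrix to produce $Q$. It handles only a one-row matrix, so no row operations (and no $P$) are needed.

```python
def diagonalize_row(row):
    """Reduce an integer row vector to [g, 0, ..., 0] by elementary column
    operations. Returns (D, Q) with row * Q == D and Q invertible over Z."""
    n = len(row)
    d = list(row)
    # Q starts as the identity; every column operation on d is mirrored on Q.
    Q = [[1 if i == j else 0 for j in range(n)] for i in range(n)]

    def add_multiple(src, dst, lam):  # lam*C_src + C_dst
        d[dst] += lam * d[src]
        for r in range(n):
            Q[r][dst] += lam * Q[r][src]

    def swap(i, j):  # C_i <-> C_j
        d[i], d[j] = d[j], d[i]
        for r in range(n):
            Q[r][i], Q[r][j] = Q[r][j], Q[r][i]

    def negate(i):  # -1 * C_i
        d[i] = -d[i]
        for r in range(n):
            Q[r][i] = -Q[r][i]

    while any(d[j] != 0 for j in range(1, n)):
        # Move a nonzero entry of least absolute value to position 0.
        pivot = min((j for j in range(n) if d[j] != 0), key=lambda j: abs(d[j]))
        if pivot != 0:
            swap(0, pivot)
        if d[0] < 0:
            negate(0)
        # Replace every other entry by its remainder on division by d[0].
        for j in range(1, n):
            add_multiple(0, j, -(d[j] // d[0]))
    if d[0] < 0:
        negate(0)
    return d, Q


if __name__ == "__main__":
    D, Q = diagonalize_row([6, 21])
    print(D, Q)
```

Applied to $[6, 21]$, the operations performed are the same three column operations as in Example 9.2.2. For a matrix with several rows one would interleave row operations as in the proof of the Diagonalization Theorem.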

9.3. Pythagorean Triples

Our goal in this section is to find all integer solutions of the equation
\[ u^2 + v^2 = w^2. \tag{9.4} \]
Such triples are called Pythagorean triples: (3, 4, 5), (5, 12, 13), (8, 15, 17), etc. Letting $x = u/w$, $y = v/w$ we consider the equivalent problem of finding all rational solutions of the equation $x^2 + y^2 = 1$, that is, of finding all rational points on the circle $C: x^2 + y^2 = 1$.

[Figure: the unit circle $C$ with a line through the point $(-1, 0)$.]

Let $L_m$ be the line of slope $m$ passing through $(-1, 0)$, with equation $y = m(x + 1)$. Then $L_m$ intersects the circle $C$ at the point
\[ x = \frac{1 - m^2}{1 + m^2}, \qquad y = \frac{2m}{1 + m^2}. \tag{9.5} \]
Since any rational $m$ leads to a rational point $(x, y)$ in (9.5) and conversely any rational point $(x, y)$ on the circle leads to a rational slope $m$, we see that (9.5) parameterizes all points on the circle with rational coordinates, except for the point $(-1, 0)$.

Suppose now that $u, v, w$ is a positive primitive solution of (9.4), that is, $u, v, w > 0$ and $(u, v, w) = 1$. In particular, $(u, v) = (u, w) = (v, w) = 1$. Set $m = a/b$ with $(a, b) = 1$. Then we obtain
\[ \frac{u}{w} = \frac{b^2 - a^2}{b^2 + a^2}, \qquad \frac{v}{w} = \frac{2ab}{b^2 + a^2}, \tag{9.6} \]
and so
\[ u(b^2 + a^2) = w(b^2 - a^2), \qquad v(b^2 + a^2) = 2abw. \tag{9.7} \]

Now it is clear that for any primitive solution of (9.4) one of $u, v$ is odd and the other is even. Without loss of generality, say $u$ is odd. Then we must have $a, b$ of opposite parity, for otherwise, if both are odd, then by (9.7) $2 \mid u$, a contradiction. Then $(b^2 + a^2, b^2 - a^2) = (b^2 + a^2, -2a^2) = (b^2 + a^2, a^2) = (b^2, a^2) = 1$, and $(b^2 + a^2, 2ab) = (b^2 + a^2, ab) = (b^2 + a^2, a)(b^2 + a^2, b) = 1$, and so we must have $u = \pm(b^2 - a^2)$, $w = \pm(b^2 + a^2)$ and $v = \pm 2ab$.

Theorem 9.3.1. The set of positive primitive solutions of the equation $u^2 + v^2 = w^2$ with $v$ even and $u$ odd is given by $u = b^2 - a^2$, $v = 2ab$ and $w = a^2 + b^2$ where $0 < a < b$ are integers with $(a, b) = 1$ and $a, b$ have opposite parity.
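As a quick illustration of Theorem 9.3.1, the parametrization can be run directly. This is a sketch of our own (the function name `primitive_triples` and the cutoff parameter `bound` are assumptions, not notation from the notes): it lists every positive primitive triple with $u$ odd, $v$ even and hypotenuse $w \le$ `bound` by looping over the admissible pairs $(a, b)$.

```python
from math import gcd


def primitive_triples(bound):
    """List primitive Pythagorean triples (u, v, w) with u odd, v even, w <= bound,
    using u = b^2 - a^2, v = 2ab, w = a^2 + b^2 from Theorem 9.3.1."""
    triples = []
    b = 2
    while b * b + 1 <= bound:  # smallest possible w for this b is b^2 + 1
        for a in range(1, b):
            # a, b must be coprime and of opposite parity
            if (a + b) % 2 == 1 and gcd(a, b) == 1:
                u, v, w = b * b - a * a, 2 * a * b, a * a + b * b
                if w <= bound:
                    triples.append((u, v, w))
        b += 1
    return sorted(triples, key=lambda t: t[2])


if __name__ == "__main__":
    for u, v, w in primitive_triples(50):
        assert u * u + v * v == w * w
        print(u, v, w)
```

Note that the triple listed in the text as (8, 15, 17) appears here as (15, 8, 17), since the theorem normalizes $u$ to be the odd leg.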

9.4. Rational Points on Conics

The method above can be used to find all rational points on any conic. Let $Q(x, y)$ be a quadratic polynomial with rational coefficients,
\[ Q(x, y) = Ax^2 + Bxy + Cy^2 + Dx + Ey + F. \]
We wish to find all rational points on the conic
\[ Q(x, y) = 0, \tag{9.8} \]
starting from a given rational point $(x_0, y_0)$ satisfying (9.8). Let $L_m$ be a line of slope $m$ passing through the given point, with equation $y = mx + b$. To find the second point of intersection of $L_m$ with the conic, we must solve the quadratic equation $Q(x, mx + b) = 0$. Since one of the zeros of the quadratic equation is the rational number $x_0$, the second zero is also rational, and can be explicitly written down.

When does a rational point $(x_0, y_0)$ on a conic exist? By completing the square and clearing denominators and common factors, we may assume the conic has the form
\[ ax^2 + by^2 = c, \]
with $a, b, c$ pairwise relatively prime integers, $c \ge 0$. Solving the above equation over $\mathbb{Q}$ is equivalent to solving
\[ au^2 + bv^2 = cw^2 \]
in integers $u, v, w$. A necessary condition for solvability is that it have a real solution. Since $c \ge 0$, we must have $a \ge 0$ or $b \ge 0$. Another necessary condition for solvability over $\mathbb{Z}$ is that each of the following congruences is solvable:
\[ au^2 + bv^2 \equiv 0 \pmod c, \qquad bv^2 \equiv cw^2 \pmod a, \qquad au^2 \equiv cw^2 \pmod b, \]
that is, $-ab$ is a square (mod $c$), $bc$ is a square (mod $a$) and $ac$ is a square (mod $b$).

Theorem 9.4.1 (Legendre). If the necessary conditions listed above are all satisfied then there exists an integer solution to the equation $au^2 + bv^2 = cw^2$.

We prove Legendre's Theorem in Section 13.7.

Homework 9.4.1. Find all rational points on the ellipse $x^2 + 5y^2 = 1$.

9.5. The Equations $x^4 + y^4 = z^2$ and $x^2 + 4y^4 = z^4$

Theorem 9.5.1. The equation $x^4 + y^4 = z^2$ has no integer solution with $xy \ne 0$. The same is true for the equation $x^2 + 4y^4 = z^4$.

We immediately deduce the following corollary.

Corollary 9.5.1. The Fermat equation $x^4 + y^4 = z^4$ has no integer solution with $xy \ne 0$.

Proof of Theorem 9.5.1. The proof is by Fermat's method of infinite descent. We claim that given a solution of the equation
\[ x^4 + y^4 = z^2, \tag{9.9} \]

we can obtain a corresponding solution of the equation
\[ u^2 + 4v^4 = w^4, \tag{9.10} \]
with $w < z$, and conversely, given a solution of (9.10), we can obtain a corresponding solution of (9.9) with $z = w$. This leads to an endless chain of solutions of (9.9) with smaller and smaller $z$ coordinates, a contradiction.

Suppose that $(x, y, z)$ is a given solution of (9.9). Replacing $(x, y, z)$ with $(x/d, y/d, z/d^2)$, where $d = \gcd(x, y, z)$, we may assume that the solution is primitive. Thus $(x^2, y^2, z)$ is a primitive Pythagorean triple, and so, assuming $x$ is odd and $y$ is even, $x^2 = b^2 - a^2$, $y^2 = 2ab$, $z = a^2 + b^2$ for some $a, b \in \mathbb{Z}$ with $0 < a < b$, $\gcd(a, b) = 1$, $b$ odd and $a$ even. Since $2ab$ is a perfect square, we must have $a = 2v^2$, $b = w^2$ for some $v, w \in \mathbb{Z}$ with $\gcd(v, w) = 1$. Thus $x^2 = w^4 - 4v^4$, that is, setting $u = x$, $u^2 + 4v^4 = w^4$. We note that $w \le w^4 = b^2 < z$.

Next, suppose that $(u, v, w)$ is a primitive solution of (9.10). Then $(u, 2v^2, w^2)$ is a primitive Pythagorean triple, and so there exist integers $a, b$ with $\gcd(a, b) = 1$, $0 < a < b$, $b$ odd, $a$ even, with $u = b^2 - a^2$, $2v^2 = 2ab$, $w^2 = b^2 + a^2$. In particular, we must have $a = x^2$, $b = y^2$ for some $x, y \in \mathbb{N}$. Thus, setting $z = w$, we have $z^2 = x^4 + y^4$, and we have obtained a solution of (9.9) with smaller value of $z$ than we started with. □

9.6. Cubic Curves

Example 9.6.1. Find all rational points on the cubic $y^2 = x^3 - 3x + 2 = (x - 1)^2(x + 2)$.

[Figure: graph of the cubic $y^2 = x^3 - 3x + 2$, with marked points A and B.]

Sketch graph. There are two branches $y = \pm(x - 1)\sqrt{x + 2}$; in particular we need $x \ge -2$. There is a "double point" at $(1, 0)$. Our first attempt might be to draw a line of slope $m$ through $(-2, 0)$ to find new rational points on the curve by intersecting the line and curve, but this fails in general. The other points of intersection may not be rational. Next we try lines through the double point $(1, 0)$, $y = m(x - 1)$. This yields the parametrization
\[ x = m^2 - 2, \qquad y = m(m^2 - 3), \]
with $m \in \mathbb{Q}$. This also gives us all integer points on the curve, for in order for $x$ to be an integer, we must have that $m$ is an integer.

We note that the above example worked because the curve had a double point. We need to define this concept. Let $f(x, y)$ be any polynomial with real coefficients, and $C_f(\mathbb{R})$ be the points on the curve $f(x, y) = 0$ with real coordinates, $C_f(\mathbb{Q})$ the points with rational coordinates, etc.

Definition 9.6.1. 1) A point $P$ on $C_f$ is called a singular point if both partial derivatives vanish at $P$.
2) A point that is not singular is called a nonsingular point or a simple point.
3) A singular point is called a double point if one of the second partial derivatives at $P$ is nonzero. It is called a triple point if all second partial derivatives are zero but some third order partial is nonzero, etc.


4) $C_f$ is called a nonsingular curve if every point is simple.

In the above example, taking $f = x^3 - 3x + 2 - y^2$, we have $f_x = 3x^2 - 3 = 3(x^2 - 1)$, $f_y = -2y$, and so there is a singularity at $(1, 0)$. It is a double point since $f_{yy} = -2 \ne 0$. Note that at this point there are two tangent lines to the curve, one for each branch.

Lemma 9.6.1. At any simple point $P = (x_1, y_1)$ there is a unique well defined tangent line to the curve given by
\[ (x - x_1, y - y_1) \cdot (f_x(P), f_y(P)) = 0. \]

9.6.1. The Method of Chords. We need the following elementary lemma, the proof being left to the reader.

Lemma 9.6.2. If $f(x)$ is a cubic polynomial with coefficients in a given field $F$, and two of its zeros (counted with multiplicity) are in $F$, then all three zeros are in $F$.

Let $f(x, y)$ be a cubic polynomial over a given field $F$ defining the cubic curve $C_f(F)$. Suppose that $P = (x_1, y_1)$, $Q = (x_2, y_2)$ are two given points on $C_f$ with coordinates in $F$, and $x_1 \ne x_2$. Let $L$ be the line joining $P$ and $Q$, say $y = mx + b$ where $m, b \in F$. Intersect $L$ with $C_f$. We have $p(x) = f(x, mx + b) = 0$, where $p(x)$ has coefficients in $F$. Suppose that $p(x)$ is a cubic polynomial; in general such will be the case. Now $p(x_1) = 0$ and $p(x_2) = 0$ and so by the preceding lemma, $p(x)$ has a third zero $x_3$ in $F$. Since $y_3 = mx_3 + b$ we see that $y_3 \in F$ also. Thus in this manner we obtain a new point $R = (x_3, y_3)$, with coordinates in $F$. (It may be the case that $R = P$ or $R = Q$.)

If $x_1 = x_2$ then the line joining $P$ and $Q$ is just $x = x_1$, and so the intersection is given by $p(y) = f(x_1, y) = 0$, which in general will be a cubic equation with two roots in $F$, and so the third root is in $F$.

The process above allows us to construct new points on the curve with coordinates in $F$ from two given points with coordinates in $F$, but does not give a parametrization of all points with coordinates in $F$, such as we had with conics.

9.6.2. Method of Tangent Lines. Suppose now that $P = (x_0, y_0)$ is a simple point on the cubic curve $f = 0$ with coordinates in a given field $F$, say without loss of generality, $f_y(P) \ne 0$. Let $L$ be the tangent line to the curve at $P$, say $L$ is given by $y = mx + b$ with $m = -f_x(P)/f_y(P)$, an element of $F$. Let $h(x) = f(x, mx + b)$. Then $h(x)$ has coefficients in $F$ and in general it will be of degree 3. Note that $x_0$ is a double root of $h(x)$ since $h'(x_0) = f_x(P) + f_y(P)m = 0$. Thus the third root of $h(x)$ also is in $F$.

Example 9.6.2. Our goal in this example is to find new rational points on the curve $x^3 + y^3 = 9$, starting from the given point $(2, 1)$, by the method of tangent lines. Equivalently, we are obtaining integer solutions of the equation $x^3 + y^3 = 9z^3$, an equation very similar to the Fermat equation $x^3 + y^3 = z^3$, for which it is known that there are no nontrivial integer solutions.

[Figure: the curve $x^3 + y^3 = 9$ with marked points A, B, C, D.]

Let $A = (2, 1)$. Then, evaluated at $A$ we have $f_x = 12$, $f_y = 3$, and so the slope of the tangent line is $m = -4$. The tangent line at $(2, 1)$ is therefore given by $y = -4x + 9$. The third point of intersection is obtained by solving $x^3 + (-4x + 9)^3 = 9$, that is, $7x^3 - 48x^2 + 108x - 80 = 0$. Since 2 is a double zero, $(x - 2)^2$ is a factor of the polynomial, and dividing out $(x - 2)^2$ we see that $(7x - 20)$ is the remaining factor. Thus we obtain the point of intersection $C = (20/7, -17/7) \approx (2.86, -2.43)$. Next, we draw the tangent line to the curve at $C$ and intersect it with the curve to find a new point $D$. If we write $P_1 = C$, $P_2 = D$, and in general $P_n = (x_n/z_n, y_n/z_n)$, the $n$-th new point obtained this way, then one finds that
\[ x_{n+1} = x_n(x_n^3 + 2y_n^3), \qquad y_{n+1} = -y_n(2x_n^3 + y_n^3), \qquad z_{n+1} = z_n(x_n^3 - y_n^3). \]
In particular, $D = P_2 = (-36520/90391, 188479/90391) \approx (-0.404, 2.085)$. By induction it is easy to see that $x_n$, $y_n$ and $z_n$ are pairwise relatively prime, that $y_n$ and $z_n$ are odd and that $2^{n+1} \,\|\, x_n$. Thus the points are always distinct and we obtain an infinite family of rational points on the curve.

Homework 9.6.1. What happens if you apply the method of chords to construct a new point in the previous example from the given points $(2, 1)$, $(1, 2)$?
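The recurrence of Example 9.6.2 can be iterated exactly with integer arithmetic. The following sketch is our own illustration (the name `tangent_step` is an assumption); it applies the recurrence to the projective triple $(x_n, y_n, z_n)$, starting from $(2, 1, 1)$, and confirms at each step that $x_n^3 + y_n^3 = 9z_n^3$ and that the power of 2 dividing $x_n$ is exactly $2^{n+1}$.

```python
def tangent_step(x, y, z):
    """One step of the tangent-line recurrence on x^3 + y^3 = 9 z^3."""
    return (x * (x**3 + 2 * y**3),
            -y * (2 * x**3 + y**3),
            z * (x**3 - y**3))


def two_adic_valuation(n):
    """Largest e with 2^e dividing the nonzero integer n."""
    n = abs(n)
    e = 0
    while n % 2 == 0:
        n //= 2
        e += 1
    return e


if __name__ == "__main__":
    point = (2, 1, 1)  # P_0 = A = (2, 1)
    for n in range(4):
        x, y, z = point
        assert x**3 + y**3 == 9 * z**3
        assert two_adic_valuation(x) == n + 1
        print(n, point)
        point = tangent_step(*point)
```

The first two steps reproduce the points $C$ and $D$ computed above; the coordinates grow very quickly, roughly quadrupling in length at each step.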

CHAPTER 10

Elliptic Curves

10.1. Definition of an Elliptic Curve

Let $C_f(\mathbb{R})$ be the curve defined by the cubic equation
\[ y^2 = p(x) := x^3 + ax^2 + bx + c \tag{10.1} \]
with $a, b, c$ real numbers. Here, $f = f(x, y) := y^2 - p(x)$. If the polynomial $p(x) := x^3 + ax^2 + bx + c$ has distinct complex zeros, then the curve $C_f$ is called an elliptic curve. The condition that the zeros be distinct is equivalent to saying that the curve is nonsingular. Indeed at any singularity we must have
\[ y^2 = x^3 + ax^2 + bx + c, \qquad 2y = 0, \qquad \text{and} \qquad p'(x) = 0, \]
and therefore $x$ must be a double zero of $p(x)$. Writing $p(x) = (x - r_1)(x - r_2)(x - r_3)$, where $r_1, r_2, r_3$ are the complex zeros of $p(x)$, the discriminant of $p(x)$ is defined by
\[ D = (r_1 - r_2)^2 (r_1 - r_3)^2 (r_2 - r_3)^2. \]
In terms of the coefficients, we have the formula
\[ D = a^2b^2 - 4a^3c - 4b^3 + 18abc - 27c^2. \]
To obtain the formula, one starts by establishing that for the cubic polynomial $x^3 - Ax - B$, the discriminant is $D = 4A^3 - 27B^2$. Observing that $D$ is invariant under translation, we substitute $x - a/3$ for $x$ in $p(x)$ to obtain
\[ (x - a/3)^3 + a(x - a/3)^2 + b(x - a/3) + c = x^3 + \left(b - \frac{a^2}{3}\right)x + \left(\frac{2a^3}{27} - \frac{ba}{3} + c\right) \]
and then apply the formula $D = 4A^3 - 27B^2$.

Note 10.1.1. The following conditions are equivalent.
(1) $p(x)$ has distinct roots.
(2) The curve $C$ defined by (10.1) is nonsingular over $\mathbb{R}$.
(3) The curve $C$ defined by (10.1) is nonsingular over $\mathbb{C}$.
(4) The discriminant of $p(x)$ is nonzero.

There are two types of nonsingular curves we can obtain in the real plane. If $D > 0$, then $p(x)$ has three distinct real roots and so the curve $C(\mathbb{R})$ consists of two components. If $D < 0$, then $p(x)$ has one real zero, and so the curve consists of a single component.

10.2. Addition of Points on an Elliptic Curve

The condition of non-singularity is important for it will imply that there is a unique, well defined tangent line at every point of the curve $C$. Moreover, as we saw in the previous section, the tangent line to the curve intersects the curve at a unique third point, provided that the line is not vertical. If the curve is defined over $\mathbb{Q}$ and we start with points with rational coordinates then the third point also has rational coordinates.

Definition 10.2.1. If the tangent line to the curve $C_f$ given above has intersection multiplicity 3 at the point of tangency, then the point is called an inflection point of the curve.

It is clear that any (non-vertical) chord joining two points on the curve will intersect the curve at a third point, and that if we start with rational coordinates then the third point has rational coordinates.

Example 10.2.1. Consider the curve $y^2 = x^3 + 1$. Here $D = -27B^2 = -27 < 0$, so the graph consists of a single component. Now $(-1, 0)$, $(0, \pm 1)$, and $(2, \pm 3)$ are all points on the graph with integer coordinates.

[Figure: the curve $y^2 = x^3 + 1$ with marked points A, B, C, D, E.]

Note that the tangent line at $(2, 3)$ intersects the curve at $(0, -1)$ and the tangent line at $(2, -3)$ intersects at $(0, 1)$. The line joining $(-1, 0)$ and $(0, -1)$ meets the curve at $(2, -3)$, etc. Thus we have a closed system of points in the sense that chords and tangents yield no new points. This suggests that we define a group structure on these points. But we need an identity element before we can proceed.

We introduce a new point at infinity $P_\infty$, and say that every vertical line passes through this point. Thus, every line that intersects the curve meets the curve exactly 3 times, where tangency is counted twice. For any two points $P$, $Q$ on the curve let $PQ$ denote the third point of intersection. $PP$ denotes the intersection of the tangent line with the curve. If $P$ is an inflection point then $PP = P$. $PP_\infty$ denotes the intersection of a vertical line through $P$. Thus if $P = (x, y)$, $PP_\infty = (x, -y)$. Also $P_\infty P_\infty = P_\infty$.

We define addition of points by saying $P + Q := (PQ)P_\infty$. It is easy to see that $P_\infty$ is an identity element with respect to this law, since
\[ P_\infty + P = (P_\infty P)P_\infty = (x, -y)P_\infty = (x, y) = P. \]


Thus, if we let $G := \{P_\infty, A = (-1, 0), B = (0, 1), C = (2, 3), D = (2, -3), E = (0, -1)\}$, we see that $A + B = D$, $2A = A + A = P_\infty$, $2B = B + B = E$, $3B = E + B = P_\infty$, $2C = B$, $3C = B + C = A$, $4C = A + C = E$, $5C = E + C = D$, $6C = D + C = P_\infty$. Thus $G$ is a cyclic group of order 6 generated by $C$.

10.3. The Projective Plane

We now wish to make precise the concept of a point at infinity. To do this we need to use the projective plane $P^2(\mathbb{R})$. Start by defining an equivalence relation on $\mathbb{R}^3 - \{0\}$ by saying that $(x, y, z) \sim (x', y', z')$ if there exists a nonzero $\lambda \in \mathbb{R}$ such that $(x, y, z) = \lambda(x', y', z')$, that is, the two points lie on a straight line through the origin.

Definition 10.3.1. The real projective plane is $P^2(\mathbb{R}) := (\mathbb{R}^3 - \{0\})/\sim$. We shall denote elements by $(a : b : c)$. Thus $(a : b : c)$ is the equivalence class consisting of all nonzero multiples of the vector $(a, b, c)$.

Note that $(0 : 0 : 0)$ is not a point of projective space. The ordinary plane $\mathbb{R}^2$ is called the affine plane. We divide the real projective plane into two sets, those points with third coordinate not zero, called finite points, and those with third coordinate 0, called points at infinity. Every finite point has a unique representative of the form $(a : b : 1)$, and thus the set of finite points can be identified with the plane $z = 1$ in affine space $\mathbb{R}^3$.

Another way to view $P^2(\mathbb{R})$ is to identify it with the set of points on a sphere of radius 1 centered at the origin where antipodal points are regarded as equal. Every finite point has a unique representative on the upper hemisphere $z > 0$, while every point at infinity has a representative on the equator, $z = 0$, unique up to passing to the antipodal point.

A polynomial in 3 variables $x, y, z$ is called homogeneous of degree $d$ if every monomial has total degree $d$. Second degree homogeneous polynomials are called quadratic forms; third degree, cubic forms.

If $F(x, y, z)$ is homogeneous of degree $d > 0$, then we can view the set of solutions of the equation $F(x, y, z) = 0$ as a set of points in $\mathbb{R}^3$ or as a set of points in $P^2(\mathbb{R})$. The latter interpretation is well defined because if $F(x, y, z) = 0$ then $F(\lambda(x, y, z)) = \lambda^d F(x, y, z) = 0$, and so it doesn't matter which representative we choose.

Example 10.3.1. Let $F(x, y, z) = x^2 - y^2 - z^2$. In projective space we can view the solution set as a hyperbola on the $z = 1$ plane together with the two points $(1 : 1 : 0)$ and $(1 : -1 : 0)$. The latter two points are points at infinity. If we view the projective plane as the upper hemisphere we see that the hyperbola is just a single component connected by the two points at infinity.

Suppose now that $f(x, y)$ is a polynomial in two variables of degree $d$ with coefficients in $\mathbb{R}$. We can homogenize $f$ by forming the polynomial
\[ F(x, y, z) = f_H(x, y, z) = z^d f(x/z, y/z). \]
Note that if $f(x, y) = \sum_{i,j} a_{ij} x^i y^j$ then $F(x, y, z) = \sum_{i,j} a_{ij} x^i y^j z^{d-i-j}$, a homogeneous polynomial of degree $d$, since $i + j \le d$ for all $i, j$, and $i + j + (d - i - j) = d$.


Example 10.3.2. Let f (x, y) = x2 − y 2 − 1. In R2 the curve Cf is a hyperbola with asymptotes x = ±y. Let F = fH = x2 − y 2 − z 2 . Then in the projective plane we obtain the two additional points at infinity (1 : ±1 : 0). Note, these points correspond to the asymptotes of the hyperbola, and can be thought of as points obtained by letting x and y go to infinity. 10.4. Elliptic curves in the projective plane Let f (x, y) = y 2 −(x3 +ax2 +bx+c). Then F = fH = zy 2 −x3 −azx2 −bz 2 x−cz 3 . The points at infinity (z = 0) satisfy x3 = 0, that is x = 0. Thus there is a single point at infinity, P∞ = (0 : 1 : 0). Thus an elliptic curve in the plane has a single point at infinity in the projective plane. Note also that the elliptic curve is nonsingular viewed as a curve in the projective plane, meaning that there is no point satisfying Fx = Fy = Fz = 0. Homework 10.4.1. Verify that an elliptic curve in the plane remains nonsingular in the projective plane. A line with equation Ax + By = C in the affine plane, has equation Ax + By = Cz in the projective plane and thus can be interpreted as a great circle on the unit sphere. Vertical lines (B = 0) correspond to great circles passing through (0 : 1 : 0), the point at infinity on the elliptic curve. What is the tangent line to the curve at P∞ ? The normal vector is (0 : 0 : 1), thus z = 0, is the tangent plane, and its only point of intersection with the curve is at P∞ . Thus P∞ is an inflection point. 10.5. The Elliptic Curve as an abelian Group Let Cf = Cf (R) be an elliptic curve as defined in (10.1) with point at infinity P∞ in the projective plane. For any two points A, B on C let AB denote the third point of intersection (in the projective plane) of the chord joining A and B with the understanding that AA denotes the third point of intersection of the tangent line to the curve at A. Definition 10.5.1. We define addition of points on Cf by A + B = (AB)P∞ . 
Thus, if $A, B$ are finite points with different $x$-coordinates, then $A + B$ is obtained by drawing the line through $A$ and $B$, intersecting the curve in a third point $C$, and then reflecting $C$ through the $x$-axis. If $A$ is a point with a non-vertical tangent line, then $A + A$ is obtained in the same manner, using the tangent line at $A$ instead.

Lemma 10.5.1. Let $C_f$ be an elliptic curve with point at infinity $P_\infty$.
i) $P_\infty$ is the zero element with respect to the addition law on $C_f$.
ii) Addition of points on $C_f$ is commutative.
iii) Every point on $C_f$ has an additive inverse. If $P = (x, y)$ is a finite point then $-P = (x, -y)$.

Proof. i) First, $P_\infty + P_\infty = (P_\infty P_\infty)P_\infty = P_\infty P_\infty = P_\infty$. Suppose next that $A$ is a finite point. Then $A + P_\infty = (AP_\infty)P_\infty = A'P_\infty = A$, where $A' = (x, -y)$ if $A = (x, y)$. The opposite direction $P_\infty + A = A$ follows from ii).
ii) Commutativity follows trivially from the observation that $AB = BA$.
iii) Let $P = (x, y)$ be a finite point on $C_f$ and $P' = (x, -y)$. Then $P + P' = (PP')P_\infty = P_\infty P_\infty = P_\infty$. By ii) we also have $P' + P = P_\infty$. Therefore $P'$ is the additive inverse of $P$.


Next let's find a formula for the sum of two points on the elliptic curve. Let $P = (x_1, y_1)$ and $Q = (x_2, y_2)$ be any two points on the elliptic curve. Suppose that $x_1 \ne x_2$. Let $y = mx + k$ be the line connecting $P$ and $Q$, so that $m = (y_2 - y_1)/(x_2 - x_1)$ and $k = y_1 - mx_1 = (y_1x_2 - y_2x_1)/(x_2 - x_1)$. Then the $x$-coordinates of the points where the line meets the curve satisfy $(mx + k)^2 = x^3 + ax^2 + bx + c$. Now, $x_1, x_2$ are both roots of this cubic and so the third root, say $x_3$, must satisfy $x_1 + x_2 + x_3 = m^2 - a$, that is,
$$x_3 = m^2 - a - x_1 - x_2, \qquad y_3 = m(x_3 - x_1) + y_1. \tag{10.2}$$
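As an illustration, the chord construction in (10.2) can be sketched in exact rational arithmetic. The following is only a sketch under stated assumptions: the function name `ec_add`, the use of `None` to stand for $P_\infty$, and the sample curve $y^2 = x^3 + 1$ with the point $(2, 3)$ are choices made for this example. For doubling it uses the tangent slope $p'(x_1)/(2y_1)$, which comes from differentiating $y^2 = p(x)$ implicitly, and the returned point is the reflection $(x_3, -y_3)$ required by Definition 10.5.1.

```python
from fractions import Fraction

def ec_add(P, Q, a, b, c):
    """Add two points on y^2 = x^3 + a x^2 + b x + c over the rationals.

    None is used here (an assumption of this sketch) for the point at infinity.
    """
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2:
        if y1 != y2 or y1 == 0:
            # Vertical chord or vertical tangent: the sum is the point at infinity.
            return None
        # Tangent slope p'(x1) / (2 y1).
        m = Fraction(3 * x1 * x1 + 2 * a * x1 + b) / Fraction(2 * y1)
    else:
        # Chord slope.
        m = Fraction(y2 - y1) / Fraction(x2 - x1)
    x3 = m * m - a - x1 - x2          # third root of the cubic, as in (10.2)
    y3 = m * (x3 - x1) + y1           # third point of intersection
    return (x3, -y3)                  # reflect through the x-axis

# Multiples of P = (2, 3) on y^2 = x^3 + 1.
P = (Fraction(2), Fraction(3))
R, multiples = P, [P]
while R is not None:
    R = ec_add(R, P, 0, 0, 1)
    multiples.append(R)
print(multiples)
```

In this run the list of multiples ends with `None` at the sixth entry, so the sample point has order 6 in the group.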

The formula in (10.2) gives us an easy way to calculate $x_3, y_3$. We first find $m$, then $x_3$, and then $y_3$. By definition, we then have $P + Q = (x_3, -y_3)$.

Suppose now that $P = Q$. We can obtain $2P = P + P$ by continuity. Letting $Q \to P$ in the above formula, we see that $m$ approaches the slope of the tangent line to the curve at $P$. Writing the curve as $y^2 = p(x)$ with $p(x) = x^3 + ax^2 + bx + c$, we have $2yy' = p'(x)$ and so $m \to p'(x_1)/(2y_1)$. Thus we have
$$x_3 = \frac{(3x_1^2 + 2ax_1 + b)^2}{4y_1^2} - a - 2x_1, \qquad y_3 = \frac{3x_1^2 + 2ax_1 + b}{2y_1}(x_3 - x_1) + y_1. \tag{10.3}$$

Lemma 10.5.2. The addition law on an elliptic curve is associative.

Proof. There are several approaches. The first one is just a brute force approach using the formulas obtained in (10.2) and (10.3). We shall just sketch the ideas of a geometric proof, leaving the details to the reader. The first observation that we make is that a general homogeneous cubic polynomial in three variables is of the form
$$F(x, y, z) = a_1x^3 + a_2x^2y + a_3x^2z + a_4xyz + a_5xy^2 + a_6xz^2 + a_7y^3 + a_8y^2z + a_9yz^2 + a_{10}z^3,$$
and so it can be identified with a vector $(a_1, a_2, \ldots, a_{10})$ in $\mathbb{R}^{10}$. Suppose that we are given 8 distinct points in the projective plane, and wish to describe all cubic curves $F(x, y, z) = 0$ passing through them. This gives us a system of 8 linear equations in the unknowns $a_1, \ldots, a_{10}$. Thus, assuming these points are in generic position, we would expect the null space to be of dimension 2 (as a subspace of $\mathbb{R}^{10}$). In fact, this is the case if no four of the points are collinear and no seven of the points lie on a conic. If four zeros of $F$ are collinear, say zeros of the linear form $L(x, y, z)$, then in fact we must have $L(x, y, z) \mid F(x, y, z)$. If seven zeros of $F$ lie on a conic $Q(x, y, z) = 0$, then in fact we must have $Q(x, y, z) \mid F(x, y, z)$. We shall assume these facts from geometry here without proof.
It follows under this assumption on the eight points that any three cubic forms $F_1, F_2, F_3$ vanishing on the given set of 8 points must be linearly dependent, that is, there exist real numbers $\lambda, \beta, \gamma$, not all zero, such that $\lambda F_1 + \beta F_2 + \gamma F_3 = 0$. In particular, if no two $F_i$ are multiples of one another (so that $\lambda\beta\gamma \ne 0$), then any common zero of any two of the three forms must also be a zero of the third.

Let $f_1$ be the elliptic curve $y^2 = p(x)$, $f_2 = L(A, B)\,L(BC, B + C)\,L(A + B, C)$, and $f_3 = L(B, C)\,L(AB, A + B)\,L(B + C, A)$, where for any two points $P, Q$, $L(P, Q)$ denotes the linear polynomial $ax + by - c$ such that $ax + by = c$ is the equation of the line passing through $P, Q$. Let $F_1, F_2, F_3$ denote the homogenizations of $f_1, f_2, f_3$. Then $F_1$, $F_2$ and $F_3$ have eight common zeros $A$, $AB$, $A + B$, $B$, $BC$, $B + C$, $C$, $P_\infty$. We make the further assumption that the eight points are distinct. Since $F_1$ is


irreducible, it has no linear or quadratic factor and therefore no four of the points are collinear and no seven lie on a conic. Now $(A + B)C$ is a common zero of $F_1$ and $F_2$ and so $(A + B)C$ must also be a zero of $F_3$. But the only common zeros of $F_1$ and $F_3$ are the eight above and $(B + C)A$. Therefore $(A + B)C = (B + C)A$, and so $(A + B) + C = A + (B + C)$.

[Figure: the curve $y^2 = x^3 - 8x + 8$ with the points $A$, $B$, $C$, $AB$, $BC$, $A + B$ and $B + C$ marked, illustrating $(A + B) + C = A + (B + C)$.]

Note 10.5.1. Although in the above discussion we restricted our attention to elliptic curves over the real numbers, we can just as well define an elliptic curve over any field F , and form the abelian group Cf (F ) in the same manner as above. Theorem 10.5.1. Mordell’s Theorem. Suppose that Cf is an elliptic curve defined by the cubic equation y 2 = x3 + ax2 + bx + c, with rational coefficients


a, b, c. Then the group of points $C_f(\mathbb{Q})$ with rational coordinates, under the addition law above, is a finitely generated abelian group.

Recall that any finitely generated abelian group $G$ is isomorphic to a direct sum $\mathrm{tor}(G) + \mathrm{free}(G)$, where $\mathrm{tor}(G)$ is the torsion subgroup of $G$ consisting of all elements of finite order and $\mathrm{free}(G)$ is the free part of $G$, $\mathrm{free}(G) \cong \mathbb{Z}^r$, where $r$ is the rank of $G$. Thus an elliptic curve has positive rank if and only if it has infinitely many rational points, and has rank 0 if and only if it has finitely many rational points. The following theorem is useful for finding $\mathrm{tor}(G)$.

Theorem 10.5.2. Lutz-Nagell Theorem. Suppose that $C_f(\mathbb{Q})$ is an elliptic curve given by the equation $y^2 = x^3 + ax^2 + bx + c$, where $a, b, c$ are integers. If $(x_0, y_0)$ is a rational point of finite order on the curve, then $x_0, y_0$ are both integers. Moreover, either $y_0 = 0$ or $y_0^2$ divides the discriminant of $x^3 + ax^2 + bx + c$.

Example 10.5.1. Consider the curve $y^2 = x^3 + 1$ we visited earlier. Here $D = -27$. Suppose that $(x, y)$ is a point of finite order on the curve, with rational coordinates. If $y = 0$, then $x = -1$. Otherwise, we must have $y^2 \mid 27$ and so $y = \pm 1, \pm 3$, and this gives us the 6 points we had before. Thus $\mathrm{tor}(G)$ is a cyclic group of order 6. It is still possible that the curve might have a rational point of infinite order, but in fact it has been shown that this is not the case.

Determining the rank of an elliptic curve over the rational numbers is an open problem. One of the seven Millennium Prize Problems listed by the Clay Mathematics Institute is the Birch and Swinnerton-Dyer conjecture, which states that the rank of the elliptic curve is the order of the zero of an associated L-function at $s = 1$.

10.6. The Pollard (p − 1)-method of Factorization

Before describing Lenstra's Elliptic curve method of factorization, we start by describing the Pollard $(p - 1)$-method of factorization. Let $m$ be a large number that we wish to factor.
For $k \in \mathbb{N}$, let $L_k = [2, 3, 4, 5, 6, \ldots, k]$, the least common multiple of the integers from 2 to $k$. Suppose that $m$ has an odd prime factor $p$ such that $(p - 1) \mid L_k$, that is, all prime power factors of $p - 1$ are less than or equal to $k$. Then, by Fermat's Little Theorem, $2^{L_k} \equiv 1 \pmod p$, that is, $p \mid (2^{L_k} - 1)$. Let $d = (m, 2^{L_k} - 1)$. Then $p \mid d$, so $d > 1$. If, in addition, $d < m$, then $d$ is a proper divisor of $m$ bigger than 1, and so we have made progress factoring $m$.

The value $d := (m, 2^{L_k} - 1)$ can be calculated very rapidly. First, one uses modular exponentiation to calculate $r := 2^{L_k} \pmod m$, with $0 < r < m$. Note that $d = (m, r - 1)$. Then one calculates $d$ using the Euclidean algorithm. To implement the algorithm we let $k = 100, 200, 300, \ldots$ until we get a value of $d$ with $1 < d < m$. (Of course, we may not succeed in getting such a $d$.)

When is the Pollard $p - 1$ algorithm successful? The algorithm above will succeed in factoring $m$ provided that $m$ has an odd prime divisor $p$ such that all of the prime power divisors of $p - 1$ are $\le k$, and $m \nmid 2^{L_k} - 1$. A certain percentage of large numbers have only small prime power divisors, and so we do have some success with this method, but it does not work for many numbers. For instance, if $m = pq$, a product of two primes $p, q$ such that $p - 1$ and $q - 1$ both have a large prime divisor, then we would have no success.
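The procedure just described is short enough to sketch directly. The following is a minimal illustration, not a tuned implementation; the function names and the tiny modulus $299 = 13 \cdot 23$ are assumptions chosen so that the arithmetic can be followed by hand (here $13 - 1 = 12$ divides $L_4 = 12$, while $23 - 1 = 22$ does not).

```python
from math import gcd

def lcm_up_to(k):
    """Return L_k = lcm(2, 3, ..., k)."""
    L = 1
    for j in range(2, k + 1):
        L = L * j // gcd(L, j)
    return L

def pollard_p_minus_1(m, k):
    """Return d = gcd(m, 2^{L_k} - 1); this is useful only when 1 < d < m."""
    r = pow(2, lcm_up_to(k), m)      # modular exponentiation: r = 2^{L_k} mod m
    return gcd(r - 1, m)             # Euclidean algorithm

def factor_attempt(m, bounds=(100, 200, 300)):
    """Try increasing values of k; return a proper divisor of m, or None."""
    for k in bounds:
        d = pollard_p_minus_1(m, k)
        if 1 < d < m:
            return d
    return None

print(pollard_p_minus_1(299, 4))     # 2^12 = 4096 = 13 * 299 + 209, and gcd(208, 299) = 13
print(pollard_p_minus_1(299, 100))   # here 299 divides 2^{L_100} - 1, so d = m and the method fails
```

The second call shows the failure case $m \mid 2^{L_k} - 1$: once $k$ is large enough that both $p - 1$ and $q - 1$ divide $L_k$, the gcd is $m$ itself, which is why small values of $k$ are tried first.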


Homework 10.6.1. Show that the probability that a number chosen at random from 1 to $x$ will have all of its prime power divisors $\le 100$ is roughly $\frac{5}{\log x}$. Thus, if $x$ is a 20 digit number the likelihood of success is about 10%, while if $x$ is a 100 digit number, the likelihood of success is about 2%.

In the Pollard $p - 1$ method we are working in the multiplicative group $G(p)$, a group of order $p - 1$. The method is successful provided that the order of the group has only small prime power divisors.

Goal: Find a similar algorithm that is successful in factoring $m$ provided that $m$ has a prime divisor $p$ such that, for some small value of $k$, $p \pm k$ has only small prime power divisors. In order to do this, we need a group of order $p \pm k$ to work in. The group we use is an elliptic curve group.

10.7. Elliptic Curve Method of Factorization

The idea of Lenstra's Elliptic Curve Method is to work over the additive group of points on an elliptic curve defined over the finite field $\mathbb{F}_p$. Let $m$ be a large number we wish to factor. We may assume that $m$ has no prime divisor $p < 10^5$, and so we will only search for prime factors $p > 10^5$. For $A = 1, 2, 3, \ldots, 10000$, let $C_A(\mathbb{F}_p)$ be the elliptic curve over $\mathbb{F}_p$ defined by
$$y^2 = x^3 - Ax + A. \tag{10.4}$$

The discriminant of the curve is $D = 4A^3 - 27A^2 = A^2(4A - 27)$. $D \ne 0$ in $\mathbb{F}_p$ provided that $p \nmid A$ and $p \nmid 4A - 27$. Since $0 < A < 10^5 < p$ and $0 < |4A - 27| < 10^5 < p$ we see that $D \ne 0$ and so we have an elliptic curve. Let $G_p(A)$ denote the additive group of points on the curve $C_A(\mathbb{F}_p)$ with coordinates in $\mathbb{F}_p$, and let $N = |G_p(A)|$. It is known by a theorem of Hasse that
$$|N - (p + 1)| \le 2\sqrt{p},$$
that is, $N = p + k$ for some $k$ with $|k| \le 2\sqrt{p} + 1$. For each value of $A$ we obtain a new value for $k$. This time, we are successful in obtaining a nontrivial factor of our given number $m$ provided that any one of the group orders $p + k$ has only small prime power divisors for some prime $p \mid m$. Thus our likelihood of success is greatly increased.

Let's recall the addition law for points on $C_A(\mathbb{F}_p)$. If $P = (x_1, y_1)$ and $Q = (x_2, y_2)$, with $x_1 \ne x_2$, and $P + Q = (x_3, y_3)$, then
$$x_3 = m^2 - x_1 - x_2, \qquad y_3 = m(x_1 - x_3) - y_1, \tag{10.5}$$
where $m = (y_2 - y_1)(x_2 - x_1)^{-1} \pmod p$ denotes the slope of the chord (not the number being factored), and for $2P$ we have
$$x_3 = \frac{(3x_1^2 - A)^2}{4y_1^2} - 2x_1 \qquad\text{and}\qquad y_3 = \frac{3x_1^2 - A}{2y_1}(x_1 - x_3) - y_1. \tag{10.6}$$

We start with the point $P_0 = (1, 1)$, which is clearly a point on the curve. Then we compute multiples of $P_0$ of the form $[2, 3, 4, 5, \ldots, k]P_0$ until we hit the zero element of the group, $P_\infty$. (Note this is analogous to what we did in the Pollard $(p - 1)$-algorithm, computing powers of 2 until we hit the identity element of the group $\mathbb{F}_p^*$.) But wait! We don't know $p$, so how can we perform the group operation on the elliptic curve over $\mathbb{F}_p$? The trick is to do all of the computations (mod $m$) instead of (mod $p$), and use the fact that congruence (mod $m$) implies congruence (mod $p$).


For purposes of computation it is convenient to use the following version of the addition formulas (10.5) and (10.6) in order to obtain multiples of $P_0$. For convenience, we state the formulas as though we were working over the rational numbers. Say $P = (u_1/w_1^2,\ v_1/w_1^3)$, $Q = (u_2/w_2^2,\ v_2/w_2^3)$ are given points on the elliptic curve, and $P + Q = (u_3/w_3^2,\ v_3/w_3^3)$. (See Theorem 5.25 in [?] for proof.) Then we obtain
$$u_3 = (v_2w_1^3 - v_1w_2^3)^2 - (u_1w_2^2 + u_2w_1^2)(u_1w_2^2 - u_2w_1^2)^2,$$
$$v_3 = -v_1w_2^3(u_2w_1^2 - u_1w_2^2)^3 - (v_2w_1^3 - v_1w_2^3)u_3 + w_2^2(u_2w_1^2 - u_1w_2^2)^2u_1(v_2w_1^3 - v_1w_2^3),$$
$$w_3 = w_1w_2(u_2w_1^2 - u_1w_2^2),$$
all polynomial expressions in the previous variables. Similar formulas are available for $2P = (u_3/w_3^2,\ v_3/w_3^3)$:
$$u_3 = (3u_1^2 - Aw_1^4)^2 - 8u_1v_1^2, \qquad v_3 = -8v_1^4 - (3u_1^2 - Aw_1^4)(u_3 - 4u_1v_1^2), \qquad w_3 = 2v_1w_1.$$

As long as $(w_3, m) = 1$, the same formulas hold for modular arithmetic (mod $m$), where the reciprocal $\frac{1}{w_3}$ is taken to be the multiplicative inverse of $w_3$ (mod $m$). For example, for our starting point $P_0 = (1, 1)$ we can take $u_1 = v_1 = w_1 = 1$. Then for $2P_0$ we have $w_3 = 2$, and so in the formula for $2P_0$, the reciprocal $\frac{1}{w_3}$ is the multiplicative inverse of 2 (mod $m$).

If in the computation we find that $(w_3, m) > 1$ then we are in even better shape. Indeed, this is the desired goal, for in this case we have obtained a divisor of $m$ greater than 1. If $p$ is a prime divisor of $(w_3, m)$, then $w_3 \equiv 0 \pmod p$, and so the corresponding point on the elliptic curve over $\mathbb{F}_p$ is in fact the point at infinity, the zero element of the elliptic curve group.

In summary, we start from $P = (1, 1)$ and compute $[2, 3, 4, 5, \ldots, k]P$ for a suitable $k$, using the formulas above, but working (mod $m$). Let $w_3$ be the resulting third coordinate from the above calculation. Compute $(w_3, m)$. If this is a value strictly between 1 and $m$ we're done. Otherwise, we change the value of $A$ and start over.
If $m$ has a prime factor $p$ such that the prime power divisors of $|G_p(A)|$ are all at most $k$, then we will succeed. The running time for this algorithm is expected to be on the order of
$$e^{\sqrt{(\log m)(\log\log m)}}.$$
This may be compared with the brute force method, which takes $\sqrt{m}$ trials.
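The overall loop can be sketched as follows. This is a simplified variant under stated assumptions, not the procedure exactly as described above: it uses the affine formulas (10.5) and (10.6) with a modular inverse in place of the $(u, v, w)$ coordinates, so a factor shows up as a denominator that fails to be invertible (mod $m$), and it multiplies $P_0$ successively by $2, 3, \ldots, k$ (that is, by $k!$, a multiple of $[2, 3, \ldots, k]$). The names `FactorFound`, `inv_mod`, `ec_add`, `ec_mul`, `ecm` and the default bounds are illustrative choices.

```python
from math import gcd

class FactorFound(Exception):
    """Raised when a denominator shares a factor d > 1 with m."""
    def __init__(self, d):
        self.d = d

def inv_mod(x, m):
    d = gcd(x % m, m)
    if d != 1:
        raise FactorFound(d)
    return pow(x % m, -1, m)              # modular inverse (Python 3.8+)

def ec_add(P, Q, A, m):
    """Add points on y^2 = x^3 - A x + A modulo m; None stands for P_infinity."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2:
        if (y1 + y2) % m == 0:
            return None
        if y1 != y2:
            # y1^2 = y2^2 (mod m) with y1 != +-y2: this already splits m.
            raise FactorFound(gcd(y1 - y2, m))
        s = (3 * x1 * x1 - A) * inv_mod(2 * y1, m) % m    # tangent slope, (10.6)
    else:
        s = (y2 - y1) * inv_mod(x2 - x1, m) % m           # chord slope, (10.5)
    x3 = (s * s - x1 - x2) % m
    y3 = (s * (x1 - x3) - y1) % m
    return (x3, y3)

def ec_mul(n, P, A, m):
    """Compute nP by double-and-add."""
    R = None
    while n:
        if n & 1:
            R = ec_add(R, P, A, m)
        P = ec_add(P, P, A, m)
        n >>= 1
    return R

def ecm(m, k=50, max_A=100):
    """Try the curves y^2 = x^3 - A x + A for A = 1..max_A; return a proper divisor or None."""
    for A in range(1, max_A + 1):
        d = gcd(A * A * (4 * A - 27), m)  # the discriminant must be a unit mod m
        if 1 < d < m:
            return d
        if d == m:
            continue
        P = (1, 1)
        try:
            for j in range(2, k + 1):
                P = ec_mul(j, P, A, m)
                if P is None:
                    break
        except FactorFound as e:
            if 1 < e.d < m:
                return e.d
    return None

print(ecm(455839))   # 455839 = 599 * 761; prints a proper divisor if some curve succeeds
```

Any value returned is checked to lie strictly between 1 and $m$, so a returned result is always a genuine divisor; whether a given run succeeds depends on the group orders of the curves tried.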

CHAPTER 11

Prime Number Theory

11.1. Euler-Maclaurin Summation Formula and Estimating Factorials

Let $s(t)$ be the sawtooth function $s(t) := t - \lfloor t \rfloor - \frac12$.

Theorem 11.1.1. Euler-Maclaurin Summation Formula. Let $f(x)$ be a real valued continuously differentiable function on $[m, n]$, where $m, n \in \mathbb{Z}$. Then
$$\sum_{k=m}^{n} f(k) = \int_m^n f(t)\,dt + \frac12\bigl(f(m) + f(n)\bigr) + \int_m^n f'(t)s(t)\,dt.$$

The theorem we have stated here is just the first order Euler-Maclaurin Summation Formula. By introducing Bernoulli polynomials and using higher order derivatives, more precise formulas can be obtained.

Proof. Integrating by parts, for any integer $k$ with $m \le k \le n - 1$ we have
$$\int_k^{k+1} f'(t)s(t)\,dt = f(t)s(t)\Big|_k^{k+1} - \int_k^{k+1} f(t)\,dt = \frac12\bigl(f(k+1) + f(k)\bigr) - \int_k^{k+1} f(t)\,dt.$$
Summing over $k$ from $m$ to $n - 1$ yields the theorem.

Theorem 11.1.2. For any positive integer $n \ge 6$ we have
$$\sum_{k=1}^{n} \log k = n\log n - n + \tfrac12\log n + \delta_n,$$
for some $\delta_n$ with $.91690 < \delta_n < .93281$. Consequently, for any natural number $n \ge 6$,
$$n! = c_n\sqrt{n}\left(\frac{n}{e}\right)^n, \tag{11.1}$$
for some real number $c_n$ with $2.501 < c_n < 2.542$.

In fact Stirling's formula states that $n! \sim \sqrt{2\pi n}\left(\frac{n}{e}\right)^n = 2.506\ldots\sqrt{n}\left(\frac{n}{e}\right)^n$, but we shall not prove this here.
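The bounds on $c_n$ in (11.1) are easy to examine numerically. The sketch below is only an illustration: the function name `stirling_constant` is an assumption, and $\log n!$ is obtained from the standard library function `math.lgamma` (using $n! = \Gamma(n + 1)$) so that large factorials never have to be formed.

```python
import math

def stirling_constant(n):
    """Return c_n = n! / (sqrt(n) * (n/e)^n), computed through logarithms."""
    log_c = math.lgamma(n + 1) - (n + 0.5) * math.log(n) + n
    return math.exp(log_c)

for n in (6, 10, 100, 10**6):
    print(n, stirling_constant(n))

print(math.sqrt(2 * math.pi))   # the limiting value in Stirling's formula
```

The printed values decrease toward $\sqrt{2\pi}$ and stay between 2.501 and 2.542 over the range shown.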

Proof. Let $m$ be any positive integer with $0 < m < n$. Applying the Euler-Maclaurin summation formula to $\log t$ yields
$$\sum_{k=m}^{n} \log k = (t\log t - t)\Big|_m^n + \frac12(\log m + \log n) + \int_m^n \frac{s(t)}{t}\,dt$$
$$= n\log n - n - m\log m + m + \frac12(\log m + \log n) + I_{m,n}, \tag{11.2}$$
where
$$I_{m,n} := \int_m^n \frac{s(t)}{t}\,dt.$$


We can write the latter integral as an alternating series with decreasing terms,
$$I_{m,n} = \int_m^n \frac{s(t)}{t}\,dt = -a_m + b_m - a_{m+1} + b_{m+1} - \cdots - a_{n-1} + b_{n-1},$$
where $-a_i = \int_i^{i+\frac12} \frac{s(t)}{t}\,dt$ and $b_i = \int_{i+\frac12}^{i+1} \frac{s(t)}{t}\,dt$. Since the sum is alternating, we have $-a_m < I_{m,n} < 0$, where
$$-a_m = \int_m^{m+\frac12} \frac{t - m - \frac12}{t}\,dt = \frac12 - \left(m + \frac12\right)\log\frac{m + \frac12}{m}.$$
Next, define $L(n) = \sum_{k=1}^{n} \log k$. Then for any $m < n$, by (11.2) we have
$$L(n) = L(m - 1) + \sum_{k=m}^{n} \log k = n\log n - n + \tfrac12\log n + L(m - 1) - m\log m + m + \tfrac12\log m + I_{m,n}.$$
Inserting $m = 20$, using a calculator to compute $L(19)$ and $-a_{20}$, and using $-a_{20} < I_{20,n} < 0$, yields the theorem for $n \ge 20$. The result may be checked on a computer for $n < 20$.

Corollary 11.1.1. We have for any $n \ge 6$,
$$\binom{2n}{n} = k_n\frac{4^n}{\sqrt{n}},$$
for some $k_n$ with $.547 < k_n < .575$.

Proof. Since $\binom{2n}{n} = \frac{(2n)!}{n!\,n!}$, we obtain $k_n = \sqrt{2}\,\frac{c_{2n}}{c_n^2}$, with $c_n$ as defined in the preceding theorem. Using the bounds on $c_n$ given in the theorem yields the corollary.

11.2. Chebyshev Estimate for π(x)

Recall that $\pi(x)$ denotes the number of primes less than or equal to $x$. In this section we will prove that $\pi(x)$ is on the order of magnitude $\frac{x}{\log x}$, that is, there exist positive constants $m, M$ such that
$$m\frac{x}{\log x} \le \pi(x) \le M\frac{x}{\log x}$$
for all $x \ge 2$. Chebyshev was the first to establish such estimates. We start with the following estimate for the least common multiple of all integers from 1 to $n$.

Theorem 11.2.1. For any positive odd integer $n \ge 5$ we have
$$2^n \le [1, 2, 3, 4, \ldots, n] \le n^{\pi(n)}.$$

One immediately deduces a lower bound Chebyshev estimate: For any odd integer $n \ge 3$ we have
$$\pi(n) \ge (\log 2)\frac{n}{\log n}. \tag{11.3}$$

Proof. For any prime $p \le n$, let $e_p$ denote the multiplicity of $p$ dividing $[1, 2, \ldots, n]$. Then trivially $p^{e_p} \le n$ and we have the upper bound
$$[1, 2, \ldots, n] = \prod_{\substack{p \le n \\ p \text{ prime}}} p^{e_p} \le \prod_{\substack{p \le n \\ p \text{ prime}}} n = n^{\pi(n)}.$$


The proof of the lower bound is much more subtle. First we observe that for any nonzero polynomial $f(x) := \sum_{k=0}^{n-1} a_kx^k$ over $\mathbb{Z}$ we have
$$I := \int_0^1 f(x)\,dx = \sum_{k=0}^{n-1} a_k\int_0^1 x^k\,dx = \sum_{k=0}^{n-1} \frac{a_k}{k+1} = \frac{a_0}{1} + \frac{a_1}{2} + \frac{a_2}{3} + \cdots + \frac{a_{n-1}}{n} = \frac{A}{[1, 2, 3, \ldots, n]},$$
for some integer $A$. If $I \ne 0$ then $|A| \ge 1$ and we conclude that $[1, 2, \ldots, n] \ge |I|^{-1}$. The trick then is to make a good choice for $f(x)$. We shall let $f(x) = f_n(x) := (x - x^2)^{\frac{n-1}{2}}$. Let $I_n$ denote the corresponding integral $I$. Note that $0 \le x - x^2 \le \frac14$ on the interval $[0, 1]$, and that by direct calculation $I_7 = \frac{1}{140} < 2^{-7}$. Then for odd $n \ge 7$ we have
$$I_n = \int_0^1 (x - x^2)^{\frac{n-1}{2}}\,dx \le \left(\tfrac14\right)^{\frac{n-1}{2} - 3}\int_0^1 (x - x^2)^3\,dx < 2^{-(n-1)+6}\,2^{-7} = 2^{-n},$$
and thus $[1, 2, \ldots, n] \ge \frac{1}{I_n} \ge 2^n$. For $n = 5$ one can check that $[1, 2, 3, 4, 5] = 60 > 2^5$.
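The two inequalities of Theorem 11.2.1 and the estimate (11.3) can be spot-checked by direct computation. The helper names `lcm_up_to` and `prime_count` below are assumptions made for this sketch, and the range of $n$ is arbitrary.

```python
from math import gcd, log

def lcm_up_to(n):
    """Return [1, 2, ..., n], the least common multiple of 1..n."""
    L = 1
    for j in range(2, n + 1):
        L = L * j // gcd(L, j)
    return L

def prime_count(n):
    """Return pi(n) using a sieve of Eratosthenes."""
    is_prime = [True] * (n + 1)
    is_prime[0:2] = [False, False]
    for p in range(2, int(n ** 0.5) + 1):
        if is_prime[p]:
            for q in range(p * p, n + 1, p):
                is_prime[q] = False
    return sum(is_prime)

for n in range(5, 40, 2):
    L, pi_n = lcm_up_to(n), prime_count(n)
    ok = 2 ** n <= L <= n ** pi_n and pi_n >= log(2) * n / log(n)
    print(n, L, pi_n, ok)
```

Exact integer arithmetic is used for the lcm bounds, so only the comparison in (11.3) involves floating point.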

To obtain an upper bound on $\pi(n)$ we first obtain an estimate for the $\theta$ function defined by
$$\theta(x) := \sum_{\substack{p \le x \\ p \text{ prime}}} \log p.$$

Lemma 11.2.1. For any positive integer $n \ge 2$ we have
$$\theta(n) \le n\log 4 - \log 8. \tag{11.4}$$

An equivalent formulation is to define
$$P(n) := \prod_{\substack{p \le n \\ p \text{ prime}}} p.$$
The lemma is equivalent to showing that
$$P(n) \le \tfrac18 4^n \tag{11.5}$$
for $n \ge 2$.

Proof. First note that by Corollary 11.1.1, for $n \ge 5$ we have $\binom{2n}{n} \le 4^{n-1}$. Next, since any prime $p$ with $n < p \le 2n$ is plainly a divisor of $\binom{2n}{n}$, we have $\frac{P(2n)}{P(n)} \le \binom{2n}{n} \le 4^{n-1}$, that is,
$$P(2n) \le 4^{n-1}P(n) \tag{11.6}$$
for $n \ge 5$. We shall establish (11.5) by induction. For $n \le 9$ the inequality can be checked directly. Suppose now that $n \ge 10$ and that the statement is true for all smaller values. If $n$ is even then $\frac n2 \ge 5$ and we have by (11.6) and the induction assumption that
$$P(n) = P\left(2 \cdot \tfrac n2\right) \le 4^{\frac n2 - 1}P\left(\tfrac n2\right) \le 4^{\frac n2 - 1}\,\tfrac18\,4^{\frac n2} = \tfrac18 4^{n-1}.$$
If $n$ is odd then $P(n) = P(n + 1) \le \tfrac18 4^{(n+1)-1} = \tfrac18 4^n$ by the previous case.




Thus, for any positive integer n, X √ √ π(n) ≤ π( n) + 1< n+

log p √ log n x 2n then p2 does n not divide any integer ≤ 2n. Thus the multiplicity of p dividing (2n)! is just the number of multiples of p up to 2n, [2n/p], while the multiplicity of p dividing n! is [n/p], and so ep = [2n/p] − 2[n/p] = 0 or 1. We turn now to the proof of the theorem. Suppose that there is no prime  2 strictly between n and 2n. Then, every prime divisor of 2n is ≤ n 3 n. Thus   Y Y Y Y 2n = p ep pep ≤ 2n p n √ √ √ √ 2 2 n

1 4n √ , 2 n



n(2n)

2n/2

,


which is a contradiction for $n > 53$. The theorem is easily seen to be true for $n \le 53$.

11.4. The von Mangoldt function and the ψ function

In order to obtain better constants in the Chebyshev estimates and precise estimates of special sums over primes we need to introduce the $\psi$ function. For any positive integer $n$, and positive real $x$, we define the von Mangoldt function $\Lambda(n)$ by
$$\Lambda(n) := \begin{cases} \log p, & \text{if } n = p^k \text{ for some prime } p; \\ 0, & \text{otherwise,} \end{cases}$$
and the $\psi$ function by
$$\psi(x) := \sum_{n \le x} \Lambda(n).$$
Thus $\psi(x)$ is a weighted counting of all the prime powers up to $x$. It is easy to see that for any positive integer $n$, we have
$$[1, 2, 3, \ldots, n] = e^{\psi(n)}, \tag{11.7}$$
and thus for any odd integer $n \ge 5$, by Theorem 11.2.1, we have
$$(\log 2)\,n < \psi(n) < \pi(n)\log n. \tag{11.8}$$
We obtain from our estimate of $\pi(n)$ the estimate $(\log 2)\,n < \psi(n) < 3n$, for $n \ge 2$. We shall improve this slightly in Lemma ?? below. Now for any positive real number $x$,
$$\psi(x) - \theta(x) = \sum_{p^2 \le x} \log p + \sum_{p^3 \le x} \log p + \cdots = \theta(x^{1/2}) + \theta(x^{1/3}) + \cdots + \theta(x^{1/k}) \le \psi(x^{1/2}) + \psi(x^{1/3}) + \psi(x^{1/4}) + \cdots + \psi(x^{1/k}),$$
where $k = [\log_2 x]$. Using the upper bound $\psi(x^{1/j}) < 3x^{1/j}$ we get for $k \ge 4$ and $2^{k-1} < x \le 2^k$,
$$\psi(x) - \theta(x) \le 3\sqrt{x}\left(1 + \frac{1}{x^{1/6}} + \frac{1}{x^{1/4}} + \frac{k - 4}{x^{3/10}}\right) \le 3\sqrt{x}\left(1 + 2^{\frac{1-k}{6}} + 2^{\frac{1-k}{4}} + (k - 4)2^{\frac{3(1-k)}{10}}\right) \le 3\sqrt{x}\,(2.7151\ldots) < 8.146\sqrt{x}.$$
(A graphing calculator was used on the second to last inequality, where the function was determined to have a maximum at $k = 7$.)

Lemma 11.4.1. For any $x \ge 2$ we have
$$\theta(x) \le \psi(x) < \theta(x) + 9\sqrt{x}.$$

From our estimates in Theorem 11.2.1 and Lemma 11.2.1 we deduce,

Theorem 11.4.1. We have for $x \ge 2$,
i) $(\log 2)x < \psi(x) < 2(\log 2)x + 9\sqrt{x}$.
ii) $(\log 2)x - 9\sqrt{x} < \theta(x) < (\log 4)x$.

Lemma 11.4.2. For any positive integer $n$ we have
$$\sum_{d \mid n} \Lambda(d) = \log n.$$


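Proof. Let $n$ have prime factorization $n = \prod_{i=1}^{k} p_i^{e_i}$. Let $d$ be a divisor of $n$. The only contribution to $\sum_{d \mid n} \Lambda(d)$ comes from divisors that are powers of primes. Now for fixed $i$, $\Lambda(p_i) + \Lambda(p_i^2) + \cdots + \Lambda(p_i^{e_i}) = e_i\log p_i$. Thus
$$\sum_{d \mid n} \Lambda(d) = \sum_{i=1}^{k} e_i\log p_i = \log\left(\prod_{i=1}^{k} p_i^{e_i}\right) = \log(n).$$

Theorem 11.4.2. For $x > 2$ we have
a) $\displaystyle\sum_{n \le x} \frac{\Lambda(n)}{n} = \log x + O(1)$.
b) $\displaystyle\sum_{\substack{p \le x \\ p \text{ prime}}} \frac{\log p}{p} = \log x + O(1)$.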
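Before turning to the proof, it may help to see these objects computed. The sketch below checks Lemma 11.4.2 and the identity (11.7) in floating point and prints the bounded difference appearing in Theorem 11.4.2 a) for a few values of $x$; the function names and sample ranges are assumptions made for the illustration.

```python
import math

def von_mangoldt(n):
    """Return Lambda(n): log p if n is a power of a prime p, and 0 otherwise."""
    if n < 2:
        return 0.0
    p = 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            return math.log(p) if n == 1 else 0.0
        p += 1
    return math.log(n)            # no divisor up to sqrt(n), so n is prime

def psi(x):
    """Return psi(x), the sum of Lambda(n) over n <= x."""
    return sum(von_mangoldt(n) for n in range(1, int(x) + 1))

def lcm_up_to(n):
    L = 1
    for j in range(2, n + 1):
        L = L * j // math.gcd(L, j)
    return L

# Lemma 11.4.2: the sum of Lambda(d) over the divisors d of n equals log n.
n = 360
divisor_sum = sum(von_mangoldt(d) for d in range(1, n + 1) if n % d == 0)
print(divisor_sum, math.log(n))

# Identity (11.7): [1, 2, ..., n] = e^{psi(n)}.
print(psi(30), math.log(lcm_up_to(30)))

# Theorem 11.4.2 a): the difference below should stay bounded as x grows.
for x in (10, 100, 1000):
    s = sum(von_mangoldt(k) / k for k in range(1, x + 1))
    print(x, s - math.log(x))
```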

P Proof. We define L(x) := n≤x log n. By the preceding lemma we have XX X X L(x) = Λ(d) = Λ(d) 1 n≤x d|n

=

X

Λ(d)[x/d] =

d≤x

=x

n≤x d|n

d≤x

X

Λ(d)



x d

+

d≤x

n x o d

X Λ(d) + cψ(x), d

d≤x

for some c with 0 ≤ c < 1. Thus, by Lemma ?? X Λ(d) L(x) ψ(x) log x δx = −c = log x − 1 + 21 + − c0 , d x x x x d≤x

where 0 ≤ c0 < 3. In particular, we obtain the estimate in a). Next, X Λ(n) X log p X log p X log p = + + + ··· . n p p2 p3 2 3 n≤x

p≤x

p ≤x

p ≤x

Now   X log p X log p X 1 1 + + ··· < log p + 3 + ··· p2 p3 p2 p √ 2 3

p ≤x

p ≤x

p≤ x

∞ X log n X log n 0 for all reals x, y, positive semidefinite if Q(x, y) ≥ 0 for all x, y, negative definite if Q(x, y) < 0 for all reals x, y and negative semidefinite if Q(x, y) ≤ 0 for all x, y. Note 12.1.2. Q(x, y) = y 2 q(x/y), where q(t) := at2 + bt + c. Thus Q(x, y) is positive definite if and only if q(t) is strictly positive, and so on. We easily obtain the following lemma. 109


12. BINARY QUADRATIC FORMS

Lemma 12.1.1. Let $Q(x, y) = ax^2 + bxy + cy^2$, $d = d_Q = b^2 - 4ac$.
i) If $d > 0$ then $Q(x, y)$ is indefinite.
ii) If $d < 0$ and $a > 0$ then $Q(x, y)$ is positive definite.
iii) If $d < 0$ and $a < 0$ then $Q(x, y)$ is negative definite.
iv) If $d = 0$ then the form is positive or negative semi-definite according as $a > 0$ or $a < 0$.

Note 12.1.3. $Q(x, y)$ properly represents 0 if and only if $d_Q$ is a perfect square.

12.2. Equivalent Forms and Reduced Forms

Definition 12.2.1. We say that two quadratic forms $Q_1, Q_2$ are equivalent and write $Q_1 \sim Q_2$ if there exists a linear transformation $T : \mathbb{Z}^2 \to \mathbb{Z}^2$ with associated matrix $A_T = \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix}$ of determinant 1, such that $Q_1(x, y) = Q_2(T(x, y))$, that is, $A_{Q_1} = A_T^t A_{Q_2} A_T$. We also say the matrices $A_{Q_1}$ and $A_{Q_2}$ are equivalent in this case and write $A_{Q_1} \sim A_{Q_2}$.

Note 12.2.1. i) If $Q_1 \sim Q_2$ then $d_{Q_1} = d_{Q_2}$, since $\det(A_{Q_1}) = \det(A_{Q_2})$.
ii) If $Q_1 \sim Q_2$ then an integer $n$ is represented by $Q_1$ if and only if it is represented by $Q_2$, and properly represented by $Q_1$ if and only if it is properly represented by $Q_2$. The first statement is immediate. Let's check the second. Suppose that we have a proper representation $Q_1(x, y) = n$. Set $(u, v) = T(x, y)$. Then $(x, y) = T^{-1}(u, v)$. Since the matrix for $T^{-1}$ has integer entries it follows that $\gcd(u, v) = \gcd(x, y) = 1$.
iii) We could work with a more general equivalence relation allowing the determinant of $A_T$ to be $\pm 1$, but the theory that would follow would lack some of the properties we get with the extra restriction of a positive determinant.
iv) If $Q(x, y) = nx^2 + bxy + cy^2$ then $Q$ trivially represents $n$ in a proper manner, $Q(1, 0) = n$. Conversely we obtain:

Theorem 12.2.1. Suppose that $Q(x, y)$ represents an integer $n$ in a proper manner. Then $Q(x, y)$ is equivalent to a form whose $x^2$ coefficient is $n$.

Proof. Let $Q(x, y) = ax^2 + bxy + cy^2$ and say $Q(\alpha, \gamma) = n$, that is,
$$\begin{pmatrix} \alpha & \gamma \end{pmatrix}\begin{pmatrix} a & b/2 \\ b/2 & c \end{pmatrix}\begin{pmatrix} \alpha \\ \gamma \end{pmatrix} = n,$$
with $\gcd(\alpha, \gamma) = 1$. Let $\beta, \delta$ be integers with $\alpha\delta - \gamma\beta = 1$, that is, $\det B = 1$, where $B = \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix}$. The entry in the first row and first column of $B^t A_Q B$ is
$$\begin{pmatrix} \alpha & \gamma \end{pmatrix}\begin{pmatrix} a & b/2 \\ b/2 & c \end{pmatrix}\begin{pmatrix} \alpha \\ \gamma \end{pmatrix} = n.$$

Note 12.2.2. If $d_Q$ is a square then $Q$ represents 0 and thus by the preceding theorem $Q \sim bxy + cy^2$ for some $b, c \in \mathbb{Z}$.


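Definition 12.2.2. Let $Q$ be a binary form whose discriminant $d$ is not a perfect square. We call $Q$ reduced if
$$-|a| < b \le |a| < |c| \qquad\text{or}\qquad 0 \le b \le |a| = |c|.$$

Two useful tricks for obtaining reduced forms are the following:
i) Given any integer $k$ we have
$$A_Q = \begin{pmatrix} a & b/2 \\ b/2 & c \end{pmatrix} \sim \begin{pmatrix} a & \frac b2 - ka \\ \frac b2 - ka & Q(-k, 1) \end{pmatrix}.$$
ii) $A_Q \sim \begin{pmatrix} c & -b/2 \\ -b/2 & a \end{pmatrix}$.

The first equivalence follows from
$$\begin{pmatrix} 1 & 0 \\ -k & 1 \end{pmatrix}\begin{pmatrix} a & b/2 \\ b/2 & c \end{pmatrix}\begin{pmatrix} 1 & -k \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} a & \frac b2 - ka \\ \frac b2 - ka & Q(-k, 1) \end{pmatrix},$$
the second using $P = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$.

Example 12.2.1. Find the reduced form equivalent to $Q(x, y) = 7x^2 + 25xy + 23y^2$. We have
$$\begin{pmatrix} 7 & 25/2 \\ 25/2 & 23 \end{pmatrix} \sim \begin{pmatrix} 7 & -3/2 \\ -3/2 & Q(-2, 1) \end{pmatrix} = \begin{pmatrix} 7 & -3/2 \\ -3/2 & 1 \end{pmatrix} \sim \begin{pmatrix} 1 & 3/2 \\ 3/2 & 7 \end{pmatrix} \sim \begin{pmatrix} 1 & 1/2 \\ 1/2 & Q(-1, 1) \end{pmatrix} = \begin{pmatrix} 1 & 1/2 \\ 1/2 & 5 \end{pmatrix},$$
where in each step $Q$ denotes the form of the matrix being transformed.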

Thus $Q \sim x^2 + xy + 5y^2$ and the smallest two values represented by $Q$ are 1 and 5.

Homework 12.2.1. Find a reduced form equivalent to $18x^2 - 22xy + 7y^2$. Find all reduced forms of discriminant $-23$. 3.5 #7, 3.6 #12.

Theorem 12.2.2. Let $Q$ be a binary quadratic form with discriminant $d$, not a perfect square. Then $Q$ is equivalent to a reduced form.

Note 12.2.3. i) For $d$ a positive perfect square $Q$ is equivalent to a form with $c = 0$, $0 \le a < |b|$.
ii) If $d = 0$ and $Q$ is positive semi-definite, then $Q$ is equivalent to $Ax^2$ with $A = \gcd(a, b, c)$.

Proof. Let $a$ be the smallest (in absolute value) integer represented by $Q$. Since $d$ is not a square, $a \ne 0$. By Theorem 12.2.1, $Q \sim ax^2 + b'xy + c'y^2$ for some $b', c' \in \mathbb{Z}$. Thus
$$A_Q \sim \begin{pmatrix} a & b'/2 \\ b'/2 & c' \end{pmatrix}.$$
By the division algorithm, there exists an integer $k$ with $-\frac{|a|}{2} < \frac{b'}{2} - ka \le \frac{|a|}{2}$. Thus
$$A_Q \sim \begin{pmatrix} 1 & 0 \\ -k & 1 \end{pmatrix}\begin{pmatrix} a & b'/2 \\ b'/2 & c' \end{pmatrix}\begin{pmatrix} 1 & -k \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} a & \frac{b'}{2} - ka \\ \frac{b'}{2} - ka & Q(-k, 1) \end{pmatrix} = \begin{pmatrix} a & \frac{b''}{2} \\ \frac{b''}{2} & c'' \end{pmatrix}$$
for some integers $b'', c''$ with $-\frac{|a|}{2} < \frac{b''}{2} \le \frac{|a|}{2}$, that is, $-|a| < b'' \le |a|$. Thus $Q \sim ax^2 + b''xy + c''y^2$, with
$$-|a| < b'' \le |a| \le |c''|,$$


the latter inequality following from the minimality of a. We are done, unless b00 < 0 and |a| = |c00 |. In this case we simply replace x with y and y with −x to obtain Q ∼ c00 x2 − b00 xy + ay 2 , which is reduced.  12.3. Representation by Positive Definite Binary Quadratic Forms Lemma 12.3.1. Suppose that Q(x, y) = ax2 + bxy + cy 2 is a positive definite reduced form, so that −a < b ≤ a < c or 0 ≤ b ≤ a ≤ c. Then i) a is the smallest value properly represented by Q, and if a 6= c, then c is the second smallest value properly represented by Q. ii) Q(x, y) = a if and only if (x, y) = (±1, 0), or c = a and (x, y) = (0, ±1). iii) Q(x, y) = c if and only if (x, y) = (0, ±1), or c = a and (x, y) = (0, ±1), or a = b and (x, y) = (1, −1) or (−1, 1). Proof. The only proper representations in which x or y equals 0 are Q(±1, 0) = a and Q(0, ±1) = c. Thus we may assume xy 6= 0. If |x| < |y| then Q(x, y) = x2 (a + b(y/x) + c(y/x)2 ) > x2 (a − |b| + c) ≥ c. If |x| ≥ |y|, then Q(x, y) = y 2 (a(x/y)2 + b(x/y) + c) ≥ y 2 (a − |b| + c) ≥ cy 2 ≥ c, with equality if and only if a = |b|, y = ±1 and |x| = |y|. Thus, to have equality we must have b > 0, a = b and x = −y, that is, (x, y) = (1, −1) or (−1, 1).  Theorem 12.3.1. Let Q(x, y) be a positive definite binary quadratic form. Then Q is equivalent to a unique reduced form ax2 + bxy + cy 2 , with a being the smallest integer represented by Q. Note 12.3.1. Indefinite forms do not have unique representations. Proof. Suppose that we have two equivalent reduced positive definite forms ax2 + bxy + cy 2 ∼ Ax2 + Bxy + Cy 2 , −a < b ≤ a ≤ c, −A < B ≤ A ≤ C. Suppose first that a 6= c. Then we must have a = A, c = C and since b2 − 4ac = B 2 − 4AC, it follows that b = ±B. We must consider the possibility that ax2 + bxy + cy 2 ∼ ax2 − bxy + cy 2 , where without loss of generality b > 0. In this case, there is an integer matrix P with det P = 1 and     a b/2 a −b/2 t P P = . 
b/2 c −b/2 c     1 1 Thus Q applied to the vector P equals a and so by the preceding lemma P = 0 0         ±1 0 0 . Similarly P = or ± 1 −1 . It follows that P is one of the 0 1 ±1 following matrices,       ±1 0 ±1 1 ±1 −1 P = , , . 0 ±1 0 −1 0 1   1 −1 Since det(P ) = 1, we get P = ±I or ± . For the latter matrices we would 0 1 deduce that a = b contradicting the fact that ax2 − bxy + cy 2 is reduced. Therefore P = ±I, from which we deduce b = 0, and thus the two forms are identical.


Next, suppose that $a = c$. Then $b \ge 0$ by definition of being reduced. Again $a = A$, $c = C$ and thus $b^2 = B^2$, from which we conclude $b = B$.

Theorem 12.3.2. Let $Q$ be a reduced binary quadratic form whose discriminant $d$ is not a perfect square.
i) If $Q$ is indefinite, then $0 < |a| \le \sqrt{d}/2$.
ii) If $Q$ is positive definite, then $0 < a \le \sqrt{-d/3}$.

Corollary 12.3.1. Given any $d$ not equal to a square, there are only finitely many reduced binary forms of discriminant $d$.

Proof. Suppose that $Q(x, y) = ax^2 + bxy + cy^2$ is a reduced form. In particular, $|b| \le |a| \le |c|$. Also $a \ne 0$ since $d$ is not a square. If $Q$ is indefinite, then $a, c$ have opposite signs, else $d = b^2 - 4ac \le a^2 - 4a^2 < 0$. Thus for indefinite $Q$, $d = b^2 - 4ac = b^2 + 4|ac| \ge 4a^2$, which implies $|a| \le \sqrt{d}/2$. If $Q$ is positive definite, then $a, c > 0$ and we have $d = b^2 - 4ac \le a^2 - 4a^2 = -3a^2$, implying $a^2 \le -d/3$. Thus, there are only finitely many choices for $a, b$, and $c$ is determined uniquely from $a, b, d$.

12.4. Class Number

Definition 12.4.1. If $d$ is not a perfect square then the number of equivalence classes of positive definite binary quadratic forms of discriminant $d$ is called the class number of $d$, denoted $H(d)$.

Note 12.4.1. If $d > 0$, then $H(d) \le d$. If $d < 0$, then $H(d) \le 2|d|/3$.

Example 12.4.1. Let's find all positive definite reduced forms with $|d| \le 11$. We have $|a| \le \sqrt{-d/3} < 2$, so $a = 1$. For a reduced form we need $-a < b \le a < c$ or $0 \le b \le a = c$, that is, $-1 < b \le 1 < c$ or $0 \le b \le 1 = c$. Thus we get the following:

a    b    c    Q(x, y)              d
1    0    1    x^2 + y^2            −4
1    0    2    x^2 + 2y^2           −8
1    0    3    x^2 + 3y^2           −12
1    1    1    x^2 + xy + y^2       −3
1    1    2    x^2 + xy + 2y^2      −7
1    1    3    x^2 + xy + 3y^2      −11

Thus we see that $H(d) = 1$ for $-11 \le d < 0$.

Homework 12.4.1. Find H(43).

12.5. Congruence test for Representation

In a previous section we showed that an integer $n$ has a primitive (proper) representation as a sum of two squares if and only if the congruence $x^2 \equiv -1 \pmod{|n|}$ is solvable, that is, $4 \nmid n$ and if $p$ is a prime with $p \equiv 3 \pmod 4$, then $p \nmid n$.

Theorem 12.5.1. Let $n, d$ be given integers with $n \ne 0$. There exists a binary quadratic form of discriminant $d$ that represents $n$ properly if and only if the congruence $x^2 \equiv d \pmod{4|n|}$ has a solution.


Proof. Suppose that $x^2 \equiv d \pmod{4|n|}$ has a solution, say $b^2 = d + 4|n|k$ for some $b, k \in \mathbb{Z}$, that is, $b^2 - 4|n|k = d$. If $n > 0$ then we take $Q = nx^2 + bxy + ky^2$. If $n < 0$ we take $Q = nx^2 + bxy - ky^2$. In either case $Q$ has discriminant $d$ and $Q(1, 0) = n$.

Conversely, suppose $Q$ is a binary form of discriminant $d$ properly representing $n$, say $Q(x, y) = n$ with $\gcd(x, y) = 1$. Now $4an = 4aQ(x, y) = (2ax + by)^2 - dy^2$, and so $(2ax + by)^2 \equiv dy^2 \pmod{4n}$. Similarly, $4cn = 4cQ(x, y) = (2cy + bx)^2 - dx^2$ and so $(2cy + bx)^2 \equiv dx^2 \pmod{4n}$. Now for any prime power divisor $p^e \,\|\, 4n$ either $p \nmid x$ or $p \nmid y$. In the first case, we have $(2cyx^{-1} + b)^2 \equiv d \pmod{p^e}$, while in the second case $(2axy^{-1} + b)^2 \equiv d \pmod{p^e}$. Thus, by the Chinese Remainder Theorem, $d$ is a square (mod $4n$).

Homework 12.5.1. Characterize all primes represented by $Q(x, y) = x^2 + 2y^2$.

Corollary 12.5.1. Let $Q$ be a binary quadratic form of discriminant $d$, not a perfect square, such that $H(d) = 1$. Then $Q$ properly represents a nonzero integer $n$ if and only if the congruence $x^2 \equiv d \pmod{4|n|}$ is solvable.

In particular if $d = -4$ we recover the representation theorem for sums of two squares.

12.5.1. Ideal Class Number. In this section we make some further remarks connecting the two different notions of class number, without supplying proofs. If $F = \mathbb{Q}(\sqrt{m})$ is a quadratic number field, then the ring of integers is
$$R_F = \begin{cases} \mathbb{Z}[\sqrt{m}], & \text{if } m \equiv 2, 3 \pmod 4; \\ \left\{\frac{a + b\sqrt{m}}{2} : a, b \in \mathbb{Z},\ a \equiv b \pmod 2\right\}, & \text{if } m \equiv 1 \pmod 4; \end{cases}$$
with discriminant
$$d_F = \begin{cases} 4m, & \text{if } m \equiv 2, 3 \pmod 4; \\ m, & \text{if } m \equiv 1 \pmod 4. \end{cases}$$

Two ideals $\mathfrak{a}, \mathfrak{b}$ in $R_F$ are called equivalent if $\mathfrak{a} = \mathfrak{b}(\alpha)$ for some principal ideal $(\alpha)$ in $R_F$. We define the ideal class number $h(d_F)$ of $F$ to be the number of equivalence classes of ideals with respect to this relation.

Theorem 12.5.2. $h(d_F) = 1$ if and only if $R_F$ is a principal ideal domain.

Since $R_F$ is a Dedekind domain it also follows that $R_F$ is a PID if and only if $R_F$ is a UFD.

Theorem 12.5.3. If $d_F < 0$ then $h(d_F) = H(d_F)$.

Example 12.5.1. If $F = \mathbb{Q}(i)$, then $R_F = \mathbb{Z}[i]$, the Gaussian integers, and $d_F = -4$. We saw above that $H(-4) = 1$ and thus $h(-4) = 1$, that is, the Gaussian integers are a PID.

Example 12.5.2. Let $F = \mathbb{Q}(\sqrt{-6})$, $R_F = \mathbb{Z}[\sqrt{-6}]$, $d_F = -24$. Find $H(-24)$. For a reduced form, $3a^2 \le |d|$, so $a \le \sqrt{|d|/3} = \sqrt{8}$, which implies that $a = 1$ or $2$. If $a = 1$ and $b = 0$, then $c = 6$ and $Q \sim x^2 + 6y^2$. If $a = 1$ and $b = 1$, then $1 - 4c = -24$ has no integer solution $c$. If $a = 2$ then $b = 0$, $c = 3$ and $Q \sim 2x^2 + 3y^2$. Thus $H(-24) = 2$.

Theorem 12.5.4. The only values of $d < 0$ with $H(d) = 1$ are $d = -3, -4, -7, -8, -11, -19, -43, -67, -163$.

Note 12.5.1. When $d > 0$, it is an open question to determine all values of $d$ with $H(d) = 1$. In particular, it is unknown whether there are infinitely many $d$ with $H(d) = 1$.
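The hand count in Example 12.5.2 can be checked mechanically. The following is a minimal sketch, not the notes' own method: it assumes that $H(d)$ for $d < 0$ counts reduced positive definite forms $(a, b, c)$, with the usual reduction convention $-a < b \le a \le c$ and $b \ge 0$ when $a = c$, and that imprimitive forms are counted as well. The function name `reduced_forms` is hypothetical.

```python
def reduced_forms(d):
    """List reduced forms (a, b, c) with b*b - 4*a*c == d, for d < 0.

    Assumed convention: -a < b <= a <= c, and b >= 0 whenever a == c.
    Imprimitive forms are included in the list.
    """
    if d >= 0 or d % 4 not in (0, 1):
        raise ValueError("d must be negative and congruent to 0 or 1 mod 4")
    forms = []
    a = 1
    while 3 * a * a <= -d:                 # reduction forces 3a^2 <= |d|
        for b in range(-a + 1, a + 1):     # -a < b <= a
            four_ac = b * b - d            # equals 4ac, always positive here
            if four_ac % (4 * a) != 0:
                continue
            c = four_ac // (4 * a)
            if c < a:
                continue
            if a == c and b < 0:           # boundary case: keep only b >= 0
                continue
            forms.append((a, b, c))
        a += 1
    return forms


print(reduced_forms(-24))        # [(1, 0, 6), (2, 0, 3)]
print(len(reduced_forms(-163)))  # 1
```

Under these assumptions the count for $d = -24$ is 2, matching the example, and the count for $d = -163$ is 1, in line with Theorem 12.5.4.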


12.6. Tree diagram of Values Represented by a Binary Quadratic Form

In this section we discuss the visual method proposed by John Conway [?] for describing all values represented by a binary quadratic form. It is based on the observation that
$$Q(u + v) + Q(u - v) = 2(Q(u) + Q(v)).$$
Thus $Q(u - v)$, $Q(u) + Q(v)$, $Q(u + v)$ form an arithmetic progression.

Example 12.6.1. Let $Q(x, y) = 3x^2 - 8y^2$. Start with $u = (1, 1)$, $v = (1, 0)$, so that $u - v = (0, 1)$, $u + v = (2, 1)$. Observing that $Q(0, 1) = -8$, $Q(1, 0) = 3$, $Q(1, 1) = -5$, the arithmetic progression is $-8, -2, 4$, thus we know $Q(2, 1) = 4$. Next we get $Q(3, 1)$ from the progression $-5,\ 3 + 4,\ 19$, $Q(3, 2)$ from the progression $3,\ -5 + 4,\ -5$, and $Q(1, 2)$ from the progression $3,\ -5 - 8,\ -29$. Continuing in this manner we obtain the value of $Q$ at every primitive ordered pair. Multiplying by squares then gives us all of its values. The positive values obtained are
$$3, 4, 19, 40, 43, 67, 75, 100, 115, 120, \dots$$
and the negative values
$$-5, -8, -24, -29, -53, -60, -69, -92, -101, \dots$$
Thus all values from 1 to 100 represented are 3, 12, 27, 48, 75; 4, 16, 36, 64, 100; 19, 76; 40; 43; 67; 75; 100. We see that 75 has two representations, one primitive, $Q(7, 3)$, and one imprimitive, $Q(5, 0)$; likewise $100 = Q(6, 1) = Q(10, 5)$.

Note 12.6.1. In the previous example there is a repeating pattern along the river separating the positive and negative values of the form. In particular, any number represented by an indefinite form with nonsquare discriminant has infinitely many such representations. This follows from the infinite family of solutions of the Pell equation $x^2 - dy^2 = \pm 4$.

Homework 12.6.1. Find all positive values $0 \le n \le 50$ properly represented by $3x^2 - 2y^2$. Draw the tree until you clearly see the repeating pattern along the river. Answer: 1, 3, 10, 19, 25, 30, 43, 46, ...
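The arithmetic-progression rule is easy to mechanize. The sketch below is only an illustration, not Conway's presentation: it starts from the pair $(1,0)$, $(0,1)$ and repeatedly fills in $Q(u+v) = 2(Q(u)+Q(v)) - Q(u-v)$, so after the three starting values it never evaluates the form directly. It covers only vectors with nonnegative coordinates, and the helper names `make_form`, `next_value` and `tree_values` are hypothetical.

```python
def make_form(a, b, c):
    """Return Q(v) = a*x^2 + b*x*y + c*y^2 as a function of v = (x, y)."""
    def Q(v):
        x, y = v
        return a * x * x + b * x * y + c * y * y
    return Q


def next_value(q_diff, q_u, q_v):
    """Q(u+v), from the progression Q(u-v), Q(u)+Q(v), Q(u+v)."""
    return 2 * (q_u + q_v) - q_diff


def tree_values(Q, depth):
    """Values at primitive vectors reached from (1,0), (0,1) in `depth` steps."""
    values = {}

    def grow(u, v, q_u, q_v, q_diff, d):
        w = (u[0] + v[0], u[1] + v[1])
        q_w = next_value(q_diff, q_u, q_v)
        values[w] = q_w
        if d > 1:
            grow(u, w, q_u, q_w, q_v, d - 1)   # here w - u = v
            grow(w, v, q_w, q_v, q_u, d - 1)   # here w - v = u

    u, v = (1, 0), (0, 1)
    grow(u, v, Q(u), Q(v), Q((1, -1)), depth)
    return values


Q = make_form(3, 0, -8)
vals = tree_values(Q, 4)
print(vals[(2, 1)], vals[(3, 1)], vals[(3, 2)], vals[(1, 2)])  # 4 19 -5 -29
```

The four printed values agree with the progressions worked out in Example 12.6.1.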

CHAPTER 13

Geometry of Numbers

13.1. Lattices and Bases

Definition 13.1.1. Let $v_1, \dots, v_m$ be a set of $m \le n$ linearly independent (over $\mathbb{R}$) vectors in $\mathbb{R}^n$.
i) The set of vectors
$$L = \left\{ \sum_{i=1}^{m} \lambda_i v_i : \lambda_i \in \mathbb{Z} \right\} = \mathbb{Z}v_1 + \cdots + \mathbb{Z}v_m$$
is called an $m$-dimensional lattice in $\mathbb{R}^n$, generated by $v_1, \dots, v_m$.
ii) If $m = n$ then $L$ is called a full lattice.
iii) The set $\{v_1, \dots, v_m\}$ is called a basis for $L$.

Note 13.1.1.
i) If $A = [v_1, v_2, \dots, v_m]$ is the $n \times m$ matrix whose columns are the vectors $v_i$, then $L = A\mathbb{Z}^m$, the image of $\mathbb{Z}^m$ under matrix multiplication by $A$.
ii) An $m$-dimensional lattice is a free $\mathbb{Z}$-module of rank $m$.
iii) A free $\mathbb{Z}$-module of rank $m$ need not be an $m$-dimensional lattice. (Homework.) What is needed is that the $\mathbb{Z}$-module also should contain $m$ linearly independent points over $\mathbb{R}$.
iv) If $\{v_1, \dots, v_m\}$, $\{w_1, \dots, w_m\}$ are two bases for $L$, then there exists an invertible $m \times m$ matrix $U$ over $\mathbb{Z}$ with $w_i = \sum_{j=1}^{m} u_{ij} v_j$, $1 \le i \le m$.

Definition 13.1.2. Suppose that $L$ is a full lattice with basis $v_1, \dots, v_n$.
i) The set of points
$$\mathcal{P} = \mathcal{P}(v_1, \dots, v_n) = \left\{ \sum_{i=1}^{n} \lambda_i v_i : 0 \le \lambda_i \le 1 \right\}$$
is called the fundamental parallelepiped for $L$ associated with the basis $\{v_1, \dots, v_n\}$.
ii) The volume of $L$, denoted $\Delta(L)$, is the volume of any fundamental parallelepiped associated with $L$.

Note 13.1.2.
i) If $L = A\mathbb{Z}^n$, then $\Delta(L) = |\det(A)|$.
ii) The volume of $L$ does not depend on the choice of basis for $L$.

13.2. Discrete Subgroups of $\mathbb{R}^n$

Definition 13.2.1.
i) A set of points $S$ in $\mathbb{R}^n$ is called discrete if for any compact subset $K$ of $\mathbb{R}^n$, $S \cap K$ has finite cardinality.
ii) If $S$ is a subgroup of $\mathbb{R}^n$ (under addition), and discrete as a set, then $S$ is called a discrete subgroup of $\mathbb{R}^n$.

Theorem 13.2.1. A set of points in $\mathbb{R}^n$ is a lattice if and only if it is a discrete subgroup of $\mathbb{R}^n$.


Proof. First we show that any lattice $L$ is discrete, and therefore a discrete subgroup of $\mathbb{R}^n$. We may assume $L$ is full, say $L = A\mathbb{Z}^n$. Let $B$ be a solid ball of radius $r$ centered about the origin. We must show $B \cap L$ has finite cardinality. Let $A\lambda \in L$, where $\lambda \in \mathbb{Z}^n$. Then $A\lambda \in B$ iff $\lambda \in A^{-1}(B)$. But $A^{-1}(B)$ is a compact subset of $\mathbb{R}^n$, and so it is contained in a box of the type $|x_i| \le k$ for some positive integer $k$. Thus $A^{-1}(B)$ contains at most $(2k + 1)^n$ points $z$ with integer coordinates.

Suppose next that $M$ is a discrete subgroup of $\mathbb{R}^n$. Let $\{v_1, \dots, v_r\}$ be a subset of $M$ containing a maximal number of linearly independent vectors over $\mathbb{R}$. Then the $v_i$ are linearly independent and $M \subseteq \mathbb{R}v_1 + \cdots + \mathbb{R}v_r$. Let $M_0$ be the lattice generated by $v_1, \dots, v_r$. We claim that $M_0$ is of finite index in $M$, that is, $M/M_0$ is a finite group. Let $\mathcal{P} = \{\sum_{i=1}^{r} \lambda_i v_i : 0 \le \lambda_i \le 1\}$. By discreteness of $M$ we have $M \cap \mathcal{P} = \{m_1, \dots, m_\ell\}$ for some $\ell \in \mathbb{N}$, $m_i \in M$, $1 \le i \le \ell$. For any $x \in M$ we can write
$$x = \sum_{i=1}^{r} \alpha_i v_i = \sum_{i=1}^{r} [\alpha_i] v_i + \sum_{i=1}^{r} \lambda_i v_i = u + w,$$
say, with $\lambda_i = \alpha_i - [\alpha_i]$, for some $\alpha_i \in \mathbb{R}$. Plainly $u \in M_0$, $w \in M \cap \mathcal{P}$, and so $w = m_i$ for some $i \le \ell$. Therefore $x \equiv m_i \pmod{M_0}$, and so we see that $|M/M_0| \le \ell$. Say $|M/M_0| = k$. Then for any $x \in M$, $kx \in M_0$, and so $M$ is a submodule of the free $\mathbb{Z}$-module $\frac{1}{k}M_0$ of rank $r$. Therefore, $M$ is a free $\mathbb{Z}$-module of rank $r' \le r$, say with basis $\{e_1, \dots, e_{r'}\}$. Since $M$ contains a set of $r$ linearly independent points over $\mathbb{R}$, we must have $r' = r$ and thus $\{e_1, \dots, e_r\}$ is a basis for $M$ as a free $\mathbb{Z}$-module, consisting of linearly independent vectors over $\mathbb{R}$.

Corollary 13.2.1. A subset of $\mathbb{R}^n$ is a full lattice if and only if it is a discrete subgroup of $\mathbb{R}^n$ containing a set of $n$ linearly independent points (over $\mathbb{R}$).

Homework 13.2.1. Let $M$ be a free $\mathbb{Z}$-module in $\mathbb{R}^n$ of rank $r > n$. Prove that $M$ has an accumulation point.

13.3. Minkowski’s Fundamental Theorem

Definition 13.3.1.
i) A set of points $S$ in $\mathbb{R}^n$ is said to be symmetric about the origin if $x \in S$ implies $-x \in S$.
ii) A set of points $S$ in $\mathbb{R}^n$ is called convex if whenever $u, v \in S$ and $\lambda \in \mathbb{R}$ with $0 \le \lambda \le 1$, then $\lambda u + (1 - \lambda)v \in S$.

Lemma 13.3.1. Let $T : \mathbb{R}^n \to \mathbb{R}^n$ be a linear transformation, and $S \subseteq \mathbb{R}^n$. Then
i) If $S$ is symmetric about the origin, then so is $T(S)$.
ii) If $S$ is convex, then so is $T(S)$.

Lemma 13.3.2. Blichfeldt’s Lemma. Let $R$ be a subset of $\mathbb{R}^n$ with $n$-dimensional volume $\mathrm{vol}(R) > 1$ (possibly $\infty$). Then there exist distinct points $x \ne y \in R$ with $x - y \in \mathbb{Z}^n$.

Proof. We may assume $R$ is bounded as follows. Let $B_r$ be the ball of radius $r$ centered at the origin. Then $R = \bigcup_{r=1}^{\infty} (R \cap B_r)$. Since $\mathrm{vol}(R) > 1$ we must have $\mathrm{vol}(R \cap B_r) > 1$ for some $r$. Thus we may replace $R$ with $R \cap B_r$, a bounded set. Let $B$ denote the unit box $B = \{x \in \mathbb{R}^n : 0 \le x_i < 1,\ 1 \le i \le n\}$.


For any $u \in \mathbb{Z}^n$ we define
$$R_u = \{x \in R : u_i \le x_i < u_i + 1,\ 1 \le i \le n\}, \qquad S_u = R_u - u = \{x - u : x \in R_u\}.$$
Note, $S_u \subseteq B$. We claim that two of the regions $S_u$ must overlap. Indeed, suppose not. Then
$$\sum_{u \in \mathbb{Z}^n} \mathrm{vol}(S_u) = \mathrm{vol}\Big(\bigcup_{u \in \mathbb{Z}^n} S_u\Big) \le \mathrm{vol}(B) = 1.$$
But $\mathrm{vol}(S_u) = \mathrm{vol}(R_u)$ and so
$$\sum_{u \in \mathbb{Z}^n} \mathrm{vol}(S_u) = \sum_{u \in \mathbb{Z}^n} \mathrm{vol}(R_u) = \mathrm{vol}(R).$$
Since $\mathrm{vol}(R) > 1$ we have a contradiction. Let $u \ne v \in \mathbb{Z}^n$ with $S_u \cap S_v \ne \emptyset$, say $z \in S_u \cap S_v$. Then $z + u \in R_u \subseteq R$ and $z + v \in R_v \subseteq R$. Set $x = z + u$, $y = z + v$. Then $x \ne y$ and $x - y = u - v \in \mathbb{Z}^n$.

Theorem 13.3.1. Minkowski. Let $L$ be a full lattice in $\mathbb{R}^n$ of volume $\Delta$ and $R$ be a convex subset of $\mathbb{R}^n$ symmetric about the origin.
i) If $\mathrm{vol}(R) > 2^n \Delta$, then $R$ contains a nonzero point in $L$.
ii) If $\mathrm{vol}(R) = 2^n \Delta$ and $R$ is closed, then $R$ contains a nonzero point in $L$.

Note 13.3.1. The value $2^n \Delta$ is sharp. Consider, for instance, the box $|x_i| < 1$, $1 \le i \le n$, of volume $2^n$. The only integer point in this box is the origin.

Proof. i) Case 1: Suppose that $L = \mathbb{Z}^n$, so that $\Delta = 1$, and that $R$ is convex, symmetric about the origin, with volume $> 2^n$. Then $\mathrm{vol}(\frac12 R) > 1$ and so by Blichfeldt’s Lemma there exist $x \ne y \in \frac12 R$ with $x - y \in \mathbb{Z}^n$. By symmetry $-y \in \frac12 R$, and then by convexity, $\frac12 x + \frac12(-y) \in \frac12 R$. But this implies $x - y$ is a nonzero integer point in $R$.

Case 2: Suppose $L = A\mathbb{Z}^n$ where $A$ is an invertible $n \times n$ matrix over $\mathbb{R}$. Suppose $\mathrm{vol}(R) > 2^n \Delta = 2^n |\det(A)|$. Our goal is to obtain a nonzero point in $R \cap L$, that is, obtain a nonzero $x \in \mathbb{Z}^n$ with $Ax \in R$, that is, $x \in A^{-1}(R)$. Since $A^{-1}(R)$ is convex, symmetric about the origin, and $\mathrm{vol}(A^{-1}(R)) = |\det(A)|^{-1} \mathrm{vol}(R) > 2^n$, we have the desired conclusion by Case 1.

ii) Suppose that $\mathrm{vol}(R) = 2^n \Delta$. For $k \in \mathbb{N}$, let $R_k = (1 + \frac1k)R$. Then $\mathrm{vol}(R_k) > 2^n \Delta$ and so by part (i), there is a nonzero $x_k \in L \cap R_k$. Since $L$ is discrete and $2R$ is compact (Homework), there are only finitely many choices for $x_k$. Thus some nonzero $x \in L$ must be contained in infinitely many $R_k$, that is, $x \in \bigcap_{k=1}^{\infty} R_k$. Since $R$ is closed, this intersection is just $R$, that is, $x \in R$.

Homework 13.3.1. Prove that if $R$ is a convex subset of $\mathbb{R}^n$ having a finite, nonzero volume, then $R$ is bounded.

13.4. Canonical Basis Theorem and Sublattices

Theorem 13.4.1. If $L$ is a full lattice of volume $\Delta$ contained in $\mathbb{Z}^n$, then $[\mathbb{Z}^n : L] = \Delta$.


Proof. By the canonical basis theorem, there exists a basis $\{v_1, \dots, v_n\}$ of $\mathbb{Z}^n$ and positive integers $d_i$, such that $\{d_1 v_1, \dots, d_n v_n\}$ is a basis for $L$. We have $\mathbb{Z}^n / L \simeq \mathbb{Z}/(d_1) \times \cdots \times \mathbb{Z}/(d_n)$ and so $[\mathbb{Z}^n : L] = d_1 \cdots d_n$. Also, it is plain, looking at a fundamental parallelepiped, or the determinant of the matrix $[d_1 v_1, \dots, d_n v_n]$, that $\Delta = \mathrm{vol}(L) = d_1 \cdots d_n \,\mathrm{vol}(\mathbb{Z}^n) = d_1 \cdots d_n$. Thus $\Delta = [\mathbb{Z}^n : L]$.

Homework 13.4.1. Show that the linear congruence $a_1 x_1 + \cdots + a_n x_n \equiv 0 \pmod m$ has a nonzero solution with $|x_i| \le (m/d)^{1/n}$, $1 \le i \le n$, where $d = \gcd(a_1, \dots, a_n, m)$. View the set of solutions as a lattice in $\mathbb{Z}^n$ and apply Minkowski’s theorem.

Homework 13.4.2. Show that if $p$ is a prime with $p \equiv 1 \pmod 4$, then $p$ is a sum of two squares. Proceed as follows. There is an integer $\lambda$ with $\lambda^2 \equiv -1 \pmod p$. Let $L$ be the lattice of points in $\mathbb{Z}^2$ with $x \equiv \lambda y \pmod p$. Then every point in $L$ satisfies the congruence $x^2 + y^2 \equiv 0 \pmod p$. Now apply Minkowski’s theorem to a disk centered at the origin to obtain a small nonzero point in $L$.
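The lattice in Homework 13.4.2 can be explored numerically. The sketch below is an illustration only and does not replace the Minkowski argument: it finds $\lambda$ by brute force and then scans the lattice points $(x, y)$ with $x \equiv \lambda y \pmod p$, $1 \le y \le \sqrt{p}$, taking $x$ as the residue of smallest absolute value. The function name `two_squares` is hypothetical, and the input is assumed to be a prime $p \equiv 1 \pmod 4$.

```python
from math import isqrt


def two_squares(p):
    """Return (a, b) with a*a + b*b == p, for a prime p = 1 (mod 4).

    Searches the lattice {(x, y) : x = lam*y (mod p)} for a short vector.
    """
    if p % 4 != 1:
        raise ValueError("expected a prime p with p % 4 == 1")
    # Brute-force a square root of -1 modulo p.
    lam = next(t for t in range(2, p) if (t * t + 1) % p == 0)
    for y in range(1, isqrt(p) + 1):
        x = (lam * y) % p
        if x > p // 2:          # use the residue of smallest absolute value
            x -= p
        if x * x + y * y == p:
            return abs(x), y
    raise ValueError("no representation found; is p prime?")


print(two_squares(13))  # (3, 2)
print(two_squares(29))  # (5, 2)
```

The search terminates early because any lattice point with $x^2 + y^2 = p$ has $|x|, |y| \le \sqrt{p}$; the existence of such a point is exactly what the homework asks you to prove.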

13.5. Lagrange’s 4-squares Theorem

In this section we use Minkowski’s Theorem to deduce Lagrange’s 4-squares Theorem.

Theorem 13.5.1. Lagrange. Every positive integer is the sum of 4 squares.

Lemma 13.5.1. Let $p_1, p_2, p_3, p_4$ be the quadratic forms in variables $x_i, y_i$, $1 \le i \le 4$, defined by the matrix equation
$$P := \begin{pmatrix} p_1 \\ p_2 \\ p_3 \\ p_4 \end{pmatrix} = \begin{pmatrix} x_1 & x_2 & x_3 & x_4 \\ -x_2 & x_1 & -x_4 & x_3 \\ -x_3 & x_4 & x_1 & -x_2 \\ -x_4 & -x_3 & x_2 & x_1 \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{pmatrix}.$$
Then
$$(x_1^2 + x_2^2 + x_3^2 + x_4^2)(y_1^2 + y_2^2 + y_3^2 + y_4^2) = p_1^2 + p_2^2 + p_3^2 + p_4^2.$$
In particular, the product of two integers that are sums of four squares is again a sum of four squares.

Proof. Let $M$ be the matrix appearing in the lemma and $Y$ the column vector of the $y_i$. Observing that the rows of $M$ are orthogonal, each of squared length $x_1^2 + x_2^2 + x_3^2 + x_4^2$, we see that
$$p_1^2 + p_2^2 + p_3^2 + p_4^2 = P^t P = (MY)^t MY = Y^t M^t M Y = Y^t (x_1^2 + x_2^2 + x_3^2 + x_4^2) I_4 Y = (x_1^2 + x_2^2 + x_3^2 + x_4^2)(y_1^2 + y_2^2 + y_3^2 + y_4^2).$$

Lemma 13.5.2. Let $\gamma_n$ denote the surface area of a sphere of radius 1 in $\mathbb{R}^n$ and $\beta_n$ denote its volume. Then, for $n \ge 2$ we have $\beta_n = \gamma_n / n$ and $\gamma_{n+2} = 2\pi \beta_n$.


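Proof. The first identity $\gamma_n = n\beta_n$ is immediate. To obtain the second identity, observe that
$$(\sqrt{\pi})^n = \left( \int_{-\infty}^{\infty} e^{-x^2}\,dx \right)^{n} = \int_{\mathbb{R}^n} e^{-r^2}\,dV = \int_0^{\infty} e^{-r^2} \gamma_n r^{n-1}\,dr = \frac{\gamma_n}{2} \int_0^{\infty} e^{-x} x^{\frac n2 - 1}\,dx = \frac{\gamma_n}{2}\,\Gamma(n/2).$$
Since $\Gamma(x + 1) = x\Gamma(x)$ for $x > 0$, we have $\Gamma(\frac n2 + 1) = \frac n2 \Gamma(\frac n2)$. Thus
$$\beta_n = \frac{\gamma_n}{n} = \frac{2\pi^{n/2}}{n\Gamma(n/2)} = \frac{\pi^{n/2}}{\Gamma(\frac n2 + 1)} = \frac{1}{2\pi}\,\gamma_{n+2}.$$

Lemma 13.5.3. For any prime $p$ the congruence $x^2 + y^2 + 1 \equiv 0 \pmod p$ has a solution.

Proof. For $p = 2$ take $x = 1$, $y = 0$. For odd $p$, let $S = \{x^2 : x \in \mathbb{F}_p\}$, $T = -1 - S$. Then $|T| = |S| = \frac{p+1}{2}$ and so $S \cap T \ne \emptyset$.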

Proof of Lagrange’s Theorem. By the first lemma it suffices to prove that any prime $p$ is a sum of four squares. Our strategy is to find a lattice of solutions of the congruence
$$x_1^2 + x_2^2 + x_3^2 + x_4^2 \equiv 0 \pmod p, \tag{13.1}$$
and then apply Minkowski’s theorem to pick out a small nonzero point in this lattice. Let $a, b$ be integers satisfying $a^2 + b^2 + 1 \equiv 0 \pmod p$. Then $u_1 := (a, b, 1, 0)$, $u_2 := (b, -a, 0, 1)$ are solutions of (13.1). Also, $u_3 := (p, 0, 0, 0)$ and $u_4 := (0, p, 0, 0)$ are trivially solutions. Let $L$ be the lattice of points in $\mathbb{R}^4$ with basis $\{u_1, u_2, u_3, u_4\}$. It is easy to show that every point in $L$ satisfies the congruence (13.1), and that $\Delta(L) = p^2$. Let $S$ be the open ball of radius $\sqrt{2p}$ centered at the origin in $\mathbb{R}^4$. By the above lemma,
$$\mathrm{Vol}(S) = (\sqrt{2p})^4 \beta_4 = 2\pi^2 p^2 > 2^4 p^2 = 2^4 \Delta(L),$$
and thus by Minkowski’s Theorem, $S$ contains a nonzero point $x \in L$. This point satisfies $x_1^2 + \cdots + x_4^2 = kp$ for some positive integer $k$, and at the same time $x_1^2 + \cdots + x_4^2 < 2p$. Therefore, $x_1^2 + \cdots + x_4^2 = p$.

Homework 13.5.1. Show that any number of the form $2 \cdot 4^k$ cannot be written as a sum of four nonzero squares.

Note 13.5.1. We state the following without proof.
i) Any $n > 169$ can be expressed as a sum of five nonzero squares.
ii) The number of representations $N(n)$ of a positive integer $n$ as a sum of four squares is 8 times the sum of the divisors of $n$ that are not multiples of 4.

Homework 13.5.2.
i) Show that $N(4) = 24$, $N(9) = 104$ using the formula in the preceding note, and then find the explicit representations.
ii) Show that if $p$ is an odd prime then $N(p) = 8(p + 1)$.
iii) Show that in order to have an essentially unique representation as a sum of four squares we must have $N(n) = 6, 8, 12, 24$ or $48$.


iv) Find all $n$ having an essentially unique representation as a sum of four squares.

13.6. Sums of Three Squares

Theorem 13.6.1. Legendre (1797). A positive integer is a sum of three squares if and only if it is not of the form $4^n(8m + 7)$ for some nonnegative integers $n, m$.

Proof. We shall only prove the easy direction here, the converse requiring much more machinery. Suppose that there exists a number of the form $4^n(8m + 7)$ that is a sum of three squares, and let $k = 4^n(8m + 7)$ be the minimal such number. Note that every square is congruent to 0, 1 or 4 (mod 8) and thus a sum of three squares is always 0, 1, 2, 3, 4, 5 or 6 (mod 8). Thus $n > 0$ and $4 \mid k$. But, in order to obtain 0 as a sum of three squares (mod 4), each of the squares must be even, and therefore we have integers $x, y, z$ satisfying $(2x)^2 + (2y)^2 + (2z)^2 = 4^n(8m + 7)$. Dividing by 4 gives a smaller number of the same form as a sum of three squares, a contradiction.

Homework 13.6.1. Deduce Lagrange’s four squares theorem from Legendre’s three squares theorem.

13.7. The Legendre Equation

The Legendre equation is
$$ax^2 + by^2 - cz^2 = 0, \tag{13.2}$$

where $a$, $b$ and $c$ are positive integers. The equation is said to be in normal form if $a, b, c$ are square-free and pairwise relatively prime. It is not hard to show that any Legendre equation can be reduced to one in normal form. Clearly, if (13.2) has a solution, then it has a primitive solution, that is, one with $\gcd(x, y, z) = 1$. If the equation is in normal form and has a primitive integer solution $x, y, z$, then $\gcd(a, yz) = \gcd(b, xz) = \gcd(c, xy) = 1$, and we have
$$ax^2 + by^2 - cz^2 \equiv 0 \pmod{abc}, \tag{13.3}$$
or equivalently,
$$by^2 \equiv cz^2 \pmod a, \qquad ax^2 \equiv cz^2 \pmod b, \qquad ax^2 \equiv -by^2 \pmod c. \tag{13.4}$$

Thus, $bc$ is a square (mod $a$), $ac$ is a square (mod $b$) and $-ab$ is a square (mod $c$). This proves one direction of Legendre’s Theorem.

Theorem 13.7.1. Legendre. If (13.2) is in normal form, then it has a nonzero integer solution if and only if $bc$, $ac$ and $-ab$ are quadratic residues modulo $a$, $b$ and $c$ respectively.

Proof. We are left with establishing sufficiency. Suppose that $bc$, $ac$ and $-ab$ are quadratic residues modulo $a$, $b$ and $c$ respectively, say $\alpha^2 \equiv bc \pmod a$, $\beta^2 \equiv ac \pmod b$, $\gamma^2 \equiv -ab \pmod c$. Let $L$ be the lattice of points in $\mathbb{Z}^3$ satisfying the linear system
$$by - \alpha z \equiv 0 \pmod a, \qquad ax - \beta z \equiv 0 \pmod b, \qquad ax - \gamma y \equiv 0 \pmod c. \tag{13.5}$$

Since $L$ is the kernel of the mapping $\psi : \mathbb{Z}^3 \to \mathbb{Z}_a \times \mathbb{Z}_b \times \mathbb{Z}_c$ defined by $\psi(x, y, z) = (by - \alpha z,\ ax - \beta z,\ ax - \gamma y)$, $\Delta(L) = abc$. Also, every point in $L$ is a solution of (13.3). Let $B$ be the box of points defined by
$$|x| \le \sqrt{bc}, \qquad |y| \le \sqrt{ac}, \qquad |z| \le \sqrt{ab}. \tag{13.6}$$
By Minkowski’s Theorem $B$ contains a nonzero point $(x, y, z)$ in $L$. This point satisfies
$$-abc \le ax^2 + by^2 - cz^2 \le 2abc,$$
with strict inequality in the first position unless $a = b = 1$, and strict inequality in the second position unless $a = b = c = 1$. If $a = b = 1$ then the Legendre equation is $x^2 + y^2 = cz^2$, and the hypothesis on $-ab$ in the theorem says that $-1$ is a square (mod $c$). It follows that $c$ is a sum of two squares and so we can solve the Legendre equation with $z = 1$. Thus, since $ax^2 + by^2 - cz^2 \equiv 0 \pmod{abc}$, we either have $ax^2 + by^2 - cz^2 = 0$, whence we are done, or $ax^2 + by^2 - cz^2 = abc$. In the latter case, setting $x_1 = -by + xz$, $y_1 = ax + yz$, $z_1 = z^2 + ab$, we see that $(x_1, y_1, z_1)$ is a solution of the Legendre equation, since $ax_1^2 + by_1^2 = (ax^2 + by^2)(z^2 + ab) = c(z^2 + ab)^2$.

Note 13.7.1. i) With more work, one can show that the lattice $L$ constructed in the above proof always contains a solution of the Legendre equation in the box $B$ defined by (13.6); see Cochrane [4].
ii) Holzer [8], using a deep theorem of Hecke, was the first to establish that if (13.2) is in normal form and has a nonzero integer solution then there is a nonzero solution with
$$|x| \le \sqrt{bc}, \qquad |y| \le \sqrt{ac}, \qquad |z| \le \sqrt{ab}.$$
Mordell [10] gave an elementary proof of Holzer’s theorem but there was a slight omission in his proof. Williams [15] filled in the gap in Mordell’s proof.

13.8. The Catalan Equation

In some Diophantine equations, the variables appear in both the base and exponent position. The Catalan equation is given by $x^a - y^b = 1$, the goal being to find solutions in positive integers $a, b, x, y$ with $a > 1$ and $b > 1$. One such solution is given by $3^2 - 2^3 = 1$. Catalan conjectured (1844) that this is in fact the only solution to this equation. The conjecture was proven by Mihailescu in 2002.
Homework 13.8.1. Prove that the only solution to the Catalan equation with $a = 2$ or $b = 2$ is the solution $3^2 - 2^3 = 1$.

CHAPTER 14

Best Rational Approximations and Continued Fractions

14.1. Approximating real numbers by rationals

Definition 14.1.1. i) A rational number $\frac ab$ is called a best rational approximation to a given real number $x$ if
$$\left| x - \frac ab \right| < \left| x - \frac cd \right|, \tag{14.1}$$
for all $\frac cd \in \mathbb{Q}$ with $1 \le d < b$.
ii) The rational $\frac ab$ is called a best rational approximation in the strong sense if
$$|xb - a| < |xd - c|, \tag{14.2}$$
for all $\frac cd$ with $1 \le d < b$.

Note 14.1.1. Note that $|xb - a| < |xd - c|$ implies that
$$\left| x - \frac ab \right| < \frac db \left| x - \frac cd \right| < \left| x - \frac cd \right|$$
for $0 < d < b$, and thus any rational number satisfying (14.2) also satisfies (14.1). There are however best rational approximations that do not satisfy the stronger inequality (14.2), as we shall see.

The main goal of this chapter is to prove that the best rational approximations to a real number $x$, in the strong sense, are precisely the convergents to the continued fraction expansion of $x$. Let’s start with an elementary result on rational approximation.

Theorem 14.1.1. Let $x \in \mathbb{R}$ and $N$ be a positive integer. Then there exists a rational number $\frac ab$ with $1 \le b \le N$ such that
$$\left| x - \frac ab \right| < \frac{1}{bN}.$$

Proof. Let $T$ be the torus $T = \mathbb{R}/\mathbb{Z}$, viewed as an additive group. View $T$ as the interval $[0, 1)$. Let $S = \{bx + \mathbb{Z} : 0 \le b \le N\}$, viewed as a set of $N + 1$ points in $[0, 1)$. We divide $[0, 1)$ into $N$ subintervals $[0, \frac1N), [\frac1N, \frac2N), \dots$ of length $\frac1N$. By the pigeon-hole principle, one of these intervals contains at least two points of $S$, say $b_1 x$, $b_2 x$ with $b_1 < b_2$. Then $(b_2 - b_1)x = a + \delta$ for some integer $a$ and real number $\delta$ with $|\delta| < \frac1N$. Put $b = b_2 - b_1$. Then $1 \le b \le N$ and $|bx - a| = |\delta| < \frac1N$.

Note 14.1.2. Using Farey fractions it’s not hard to show that one can strengthen the inequality in the theorem to $|x - \frac ab| \le \frac{1}{b(N+1)}$.
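Theorem 14.1.1 guarantees that a suitable denominator exists, so a direct search over $b = 1, \dots, N$ must succeed. The following minimal sketch does exactly that with exact rational arithmetic; it is a brute-force illustration rather than the pigeon-hole argument itself, the name `dirichlet_approx` is hypothetical, and $x$ is assumed to be given as a rational (or a decimal string) so that no rounding error enters.

```python
from fractions import Fraction


def dirichlet_approx(x, N):
    """Return (a, b) with 1 <= b <= N and |b*x - a| < 1/N.

    Returns the pair with the smallest such denominator b.
    """
    x = Fraction(x)
    for b in range(1, N + 1):
        a = round(b * x)                      # nearest integer to b*x
        if abs(b * x - a) < Fraction(1, N):
            return a, b
    raise AssertionError("unreachable: excluded by Theorem 14.1.1")


print(dirichlet_approx("3.14159", 10))  # (22, 7)
```

For $x = 3.14159$ and $N = 10$ the first denominator that works is $b = 7$, giving the familiar approximation $22/7$.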


Theorem 14.1.2. i) For any irrational $x \in \mathbb{R}$, there exist infinitely many distinct rational numbers $a/b$ such that
$$\left| x - \frac ab \right| < \frac{1}{b^2}. \tag{14.3}$$
ii) For any rational $x$, (14.3) has at most finitely many solutions.

Proof. Let $x \in \mathbb{R}$. By the preceding theorem, for any positive integer $n$ there exists $\frac{a_n}{b_n} \in \mathbb{Q}$ with $1 \le b_n \le n$ and
$$\left| x - \frac{a_n}{b_n} \right| < \frac{1}{b_n n} \le \frac{1}{b_n^2}.$$
Now, there may be some duplication. Suppose that there is an infinite subsequence of equal values, say $\frac{a_{n_i}}{b_{n_i}} = \frac ab$, $i \in \mathbb{N}$. Then
$$\left| x - \frac ab \right| = \left| x - \frac{a_{n_i}}{b_{n_i}} \right| < \frac{1}{b_{n_i} n_i} \le \frac{1}{n_i},$$
and so letting $n_i \to \infty$ we see that $x = \frac ab$, a rational number. Thus if $x$ is irrational, then there are infinitely many distinct $\frac ab$ satisfying (14.3). If $x$ is rational, $x = \frac rs$ say, then (14.3) implies that $|rb - sa| < \frac sb$. If $\frac rs \ne \frac ab$ then $|rb - sa| \ge 1$ and so $b < s$. Thus there are finitely many choices for $b$, and once $b$ is selected there are at most two $a$ with $|\frac rs - \frac ab| < \frac1b$.

14.2. Continued Fractions

14.2.1. Simple Finite Continued Fractions. We let
$$[a_0; a_1, a_2, \dots, a_n] = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cdots + \cfrac{1}{a_n}}}}$$

Although our focus will be on continued fractions in $\mathbb{Q}$, the expression above is well defined for entries in any integral domain, the continued fraction itself belonging to the field of fractions. We appeal to the more general setting in some of our proofs.

Example 14.2.1. Express $\frac{43}{30}$ as a continued fraction. We have
$$\frac{43}{30} = 1 + \frac{13}{30} = 1 + \cfrac{1}{\frac{30}{13}} = 1 + \cfrac{1}{2 + \frac{4}{13}} = 1 + \cfrac{1}{2 + \cfrac{1}{\frac{13}{4}}} = 1 + \cfrac{1}{2 + \cfrac{1}{3 + \frac14}},$$
and so $\frac{43}{30} = [1; 2, 3, 4]$.

Example 14.2.2. Expand $\frac{333}{106}$. Following the Euclidean Algorithm we have $333 = 3 \cdot 106 + 15$, $106 = 7 \cdot 15 + 1$. Thus
$$\frac{333}{106} = 3 + \frac{15}{106} = 3 + \cfrac{1}{7 + \frac{1}{15}} = [3; 7, 15].$$


We could also write
$$\frac{333}{106} = 3 + \frac{15}{106} = 3 + \cfrac{1}{7 + \cfrac{1}{14 + \frac11}} = [3; 7, 14, 1].$$
As we shall see, the continued fraction expansion of a rational number is unique aside from the trivial expansion of the final term as in the example above, $15 = 14 + \frac11$.
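The expansions in Examples 14.2.1 and 14.2.2 are just the quotients produced by the Euclidean Algorithm, which the following minimal sketch makes explicit. The function name `continued_fraction` is hypothetical; it returns the expansion whose last partial quotient is greater than 1 (for a non-integer input), leaving the alternative ending $[\dots, a_j - 1, 1]$ to the reader.

```python
def continued_fraction(p, q):
    """Partial quotients [a0, a1, ..., aj] of p/q, via the Euclidean Algorithm."""
    if q <= 0:
        raise ValueError("denominator must be positive")
    terms = []
    while q:
        a, r = divmod(p, q)   # p = a*q + r with 0 <= r < q
        terms.append(a)
        p, q = q, r
    return terms


print(continued_fraction(43, 30))    # [1, 2, 3, 4]
print(continued_fraction(333, 106))  # [3, 7, 15]
```

Because `divmod` uses floor division, the same code also handles negative numerators, producing $a_0 = [x]$ and positive later quotients.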

Definition 14.2.1. A continued fraction expansion (14.2.1) is said to be simple if $a_0 \in \mathbb{Z}$ and $a_i \in \mathbb{N}$, $i \ge 1$.

Theorem 14.2.1. If $[a_0; a_1, \dots, a_j] = [b_0; b_1, \dots, b_k]$ are simple continued fraction expansions with $a_j > 1$ and $b_k > 1$, then $j = k$ and $a_i = b_i$, $0 \le i \le k$.

Proof. WLOG $j \le k$. Let $x := [a_0; a_1, \dots, a_j] = [b_0; b_1, \dots, b_k]$. Then $[x] = a_0 = b_0$. Canceling $a_0$ and inverting, we get $[a_1; a_2, \dots, a_j] = [b_1; b_2, \dots, b_k]$. Again we see that $a_1 = b_1$. Repeating this process $j$ times yields $a_i = b_i$, $0 \le i < j$. If $k = j$ we are left with $a_j = b_j$ and we are done. If $k > j$ then we have
$$a_j = b_j + \cfrac{1}{b_{j+1} + \cfrac{1}{b_{j+2} + \cdots + \cfrac{1}{b_k}}}.$$
Since $b_k > 1$, $[b_{j+1}; b_{j+2}, \dots, b_k] > 1$ and so $0 < a_j - b_j < 1$, a contradiction.



Corollary 14.2.1. Any rational number can be expressed as a simple continued fraction in exactly two ways, $[a_0; a_1, \dots, a_j]$ and $[a_0; a_1, \dots, a_j - 1, 1]$ for some integers $a_i$, $0 \le i \le j$, with $a_j > 1$.

Proof. The existence of such an expansion follows from the Euclidean algorithm. Suppose $[b_0; b_1, \dots, b_k]$ is another expansion. If $b_k > 1$, then the preceding theorem applies. If $b_k = 1$, then we rewrite the expansion as $[b_0; b_1, \dots, b_{k-1} + 1]$, where now the final entry is greater than 1. Again the theorem applies.

14.3. Convergents to Continued Fractions

Definition 14.3.1. The $j$-th convergent of the continued fraction $[a_0; a_1, \dots, a_n]$ is the fraction $[a_0; a_1, \dots, a_j]$. The $a_i$ here are allowed to be elements of any integral domain.

Theorem 14.3.1. Let $a_0, a_1, \dots$ be distinct variable symbols. We define two sequences $\{h_n\}$, $\{k_n\}$ of polynomials in the $a_i$, $h_n = h_n(a_0, \dots, a_n)$, $k_n = k_n(a_1, \dots, a_n)$, recursively by
$$h_{-2} = 0,\quad h_{-1} = 1,\quad h_n = a_n h_{n-1} + h_{n-2}, \tag{14.4}$$
$$k_{-2} = 1,\quad k_{-1} = 0,\quad k_n = a_n k_{n-1} + k_{n-2}. \tag{14.5}$$
i) For $n \ge 0$, $[a_0; a_1, \dots, a_n] = \dfrac{h_n}{k_n}$.
ii) For $n \ge 0$,
$$h_n k_{n-1} - h_{n-1} k_n = (-1)^{n-1}. \tag{14.6}$$


Proof. We prove both parts simultaneously by induction on $n$. For $n = 0$, we have $h_0 = a_0 \cdot 1 + 0 = a_0$, $k_0 = a_0 \cdot 0 + 1 = 1$ and so $\frac{h_0}{k_0} = a_0 = [a_0]$, and $h_0 k_{-1} - h_{-1} k_0 = a_0 \cdot 0 - 1 = (-1)^{-1}$. Suppose the result holds for $n - 1$, where $n \ge 1$. Then for $n$ we have, applying the induction assumption with $a_{n-1}$ replaced by $a_{n-1} + \frac{1}{a_n}$,
$$[a_0; a_1, \dots, a_n] = \left[a_0; a_1, \dots, a_{n-1} + \tfrac{1}{a_n}\right] = \frac{h_{n-1}(a_0, a_1, \dots, a_{n-1} + \frac{1}{a_n})}{k_{n-1}(a_1, \dots, a_{n-1} + \frac{1}{a_n})}$$
$$= \frac{(a_{n-1} + \frac{1}{a_n}) h_{n-2} + h_{n-3}}{(a_{n-1} + \frac{1}{a_n}) k_{n-2} + k_{n-3}} = \frac{a_{n-1} a_n h_{n-2} + h_{n-2} + a_n h_{n-3}}{a_{n-1} a_n k_{n-2} + k_{n-2} + a_n k_{n-3}}$$
$$= \frac{a_n (a_{n-1} h_{n-2} + h_{n-3}) + h_{n-2}}{a_n (a_{n-1} k_{n-2} + k_{n-3}) + k_{n-2}} = \frac{a_n h_{n-1} + h_{n-2}}{a_n k_{n-1} + k_{n-2}} = \frac{h_n}{k_n}.$$
Also,
$$h_n k_{n-1} - h_{n-1} k_n = (a_n h_{n-1} + h_{n-2}) k_{n-1} - h_{n-1} (a_n k_{n-1} + k_{n-2}) = h_{n-2} k_{n-1} - h_{n-1} k_{n-2} = -(-1)^{n-2} = (-1)^{n-1}.$$

From the identity in (14.6) we immediately obtain part (i) of the following theorem. Part (ii) follows from part (i) on observing that the sum is telescoping.

Theorem 14.3.2. Let $h_n, k_n$ be as defined in the preceding theorem. Then
i) For $n \ge 1$ we have
$$\frac{h_n}{k_n} - \frac{h_{n-1}}{k_{n-1}} = \frac{(-1)^{n-1}}{k_n k_{n-1}}.$$
ii) For $n \ge 1$ we have
$$\frac{h_n}{k_n} = a_0 + \sum_{i=1}^{n} \frac{(-1)^{i-1}}{k_i k_{i-1}}.$$

The preceding two theorems give identities involving polynomial expressions over $\mathbb{Z}$. In particular the same identities are valid if we evaluate the variables $a_i$ at integers. In this case, we see in the next theorem that the sequence $\{\frac{h_n}{k_n}\}$ converges, and thus the terms are called “convergents”.

Theorem 14.3.3. Let $x := [a_0; a_1, \dots, a_n]$ be a finite simple continued fraction. Let $h_n, k_n$ be the integers obtained by evaluating the polynomial expressions defined in (14.4) and (14.5) at the $a_i$.
i) Then $x = \frac{h_n}{k_n}$ and moreover the fraction $\frac{h_n}{k_n}$ is reduced, that is, $(h_n, k_n) = 1$.
ii) $\frac{h_0}{k_0} < \frac{h_2}{k_2} < \cdots < x < \cdots < \frac{h_3}{k_3} < \frac{h_1}{k_1}$.

Proof. The fact that $x = \frac{h_n}{k_n}$ is immediate from Theorem 14.3.1 (i), and $(h_n, k_n) = 1$ follows from (14.6). For part (ii) we observe that by the positivity of the $a_n$, $\{k_n\}$ is a strictly increasing sequence of positive integers, and so the partial sums of the alternating sum in Theorem 14.3.2 (ii) oscillate about $x$ in the manner stated.

The identities in (14.4) and (14.5),
$$h_{-2} = 0,\quad h_{-1} = 1,\quad h_n = a_n h_{n-1} + h_{n-2}, \tag{14.7}$$
$$k_{-2} = 1,\quad k_{-1} = 0,\quad k_n = a_n k_{n-1} + k_{n-2}, \tag{14.8}$$


provide an efficient way of calculating the convergents, as we illustrate in the next example.

Example 14.3.1. Let $x = \frac{355}{113} = [3; 7, 15, 1]$.
$$\begin{array}{c|rrrrrr} a_n & & & 3 & 7 & 15 & 1 \\ h_n & 0 & 1 & 3 & 22 & 333 & 355 \\ k_n & 1 & 0 & 1 & 7 & 106 & 113 \\ \hline n & -2 & -1 & 0 & 1 & 2 & 3 \end{array}$$
The bottom row is only included for illustration purposes. In the array, the rules (14.7), (14.8) translate to “multiply diagonally and add the entry to the left”. Thus, for example, $333 = 15 \cdot 22 + 3$, $106 = 15 \cdot 7 + 1$.

14.4. Infinite Continued Fraction Expansions

An infinite continued fraction is an expression of the form $[a_0; a_1, a_2, a_3, \dots]$. Again, we call it a simple continued fraction if the $a_n \in \mathbb{Z}$ and $a_n \ge 1$ for $n \ge 1$.

Definition 14.4.1. The value of an infinite continued fraction $[a_0; a_1, a_2, \dots]$ is defined to be $\lim_{n\to\infty} [a_0; a_1, \dots, a_n]$, should this limit exist. In this case we simply write $[a_0; a_1, a_2, \dots] = \lim_{n\to\infty} [a_0; a_1, a_2, \dots, a_n]$.

Theorem 14.4.1. The value of any infinite simple continued fraction is defined, and is an irrational number.

Proof. Let $[a_0; a_1, a_2, \dots]$ be a simple continued fraction. Then
$$\lim_{n\to\infty} [a_0; a_1, \dots, a_n] = \lim_{n\to\infty} \frac{h_n}{k_n} = \lim_{n\to\infty} \left( a_0 + \sum_{i=1}^{n} \frac{(-1)^{i-1}}{k_i k_{i-1}} \right) = a_0 + \sum_{i=1}^{\infty} \frac{(-1)^{i-1}}{k_i k_{i-1}}.$$
The latter series converges by the alternating series convergence test. Let $x$ be the value of the continued fraction. Then, by Theorem 14.3.3 and Theorem 14.3.2,
$$\left| x - \frac{h_n}{k_n} \right| < \left| \frac{h_{n+1}}{k_{n+1}} - \frac{h_n}{k_n} \right| = \frac{1}{k_n k_{n+1}} \le \frac{1}{k_n^2}.$$
But we saw in Theorem 14.1.2 that such an inequality has at most finitely many solutions for rational $x$. Therefore, $x$ is irrational.

Example 14.4.1. Evaluate $[1; 1, 1, \dots]$. Let $x$ be the value of this infinite continued fraction. It exists by the preceding theorem. Then plainly $1 + \frac1x = x$, that is, $x^2 - x - 1 = 0$, whence $x = \varphi = \frac{1 + \sqrt5}{2}$, the golden ratio. (Note $x > 0$ and so the positive root must be chosen.)

Theorem 14.4.2. Any irrational number has a unique simple continued fraction expansion.

Before proving the theorem let’s observe that two continued fractions having the same partial quotients $a_i$ up to a given point must in fact be very close.

Lemma 14.4.1. If $x = [a_0; a_1, \dots, a_{n-1}, A]$ and $y = [a_0; a_1, \dots, a_{n-1}, B]$ for some real numbers $a_i, A, B$, then
$$x - y = \frac{(-1)^n (A - B)}{(A k_{n-1} + k_{n-2})(B k_{n-1} + k_{n-2})},$$
where the $k_i$ are as defined in (14.8).


Proof. Let $\{\frac{H_n}{K_n}\}$ be the convergents for $x$ and $\{\frac{h_n}{k_n}\}$ the convergents for $y$. Then
$$x - y = \frac{H_n}{K_n} - \frac{h_n}{k_n} = \frac{A h_{n-1} + h_{n-2}}{A k_{n-1} + k_{n-2}} - \frac{B h_{n-1} + h_{n-2}}{B k_{n-1} + k_{n-2}} = \frac{B(h_{n-2} k_{n-1} - k_{n-2} h_{n-1}) + A(h_{n-1} k_{n-2} - k_{n-1} h_{n-2})}{(A k_{n-1} + k_{n-2})(B k_{n-1} + k_{n-2})},$$
and the result follows from (14.6).



Proof of Theorem 14.4.2. Existence. We define recursively a sequence of integers $\{a_n\}$ with $a_n > 0$ for $n \ge 1$, and a sequence of irrational numbers $\{x_n\}$, all greater than one, such that for any $n \ge 1$,
$$x = [a_0; a_1, a_2, \dots, a_{n-1}, x_n]. \tag{14.9}$$
First, set $a_0 = [x]$. Write $x = a_0 + \frac{1}{x_1}$, where $x_1 \notin \mathbb{Q}$ and $x_1 > 1$. Thus $x = [a_0; x_1]$. Set $a_1 = [x_1]$, and write $x_1 = a_1 + \frac{1}{x_2}$. Thus $x = [a_0; a_1, x_2]$. Suppose now that $a_0, \dots, a_{n-1}, x_1, \dots, x_n$ have been defined. We simply define $a_n = [x_n]$ and write $x_n = a_n + \frac{1}{x_{n+1}}$ for some irrational $x_{n+1}$ greater than one. Let $y$ be the value of the infinite continued fraction $[a_0; a_1, a_2, \dots]$ so constructed and $\{\frac{h_n}{k_n}\}$ its sequence of convergents. Then by the preceding lemma
$$\left| x - \frac{h_n}{k_n} \right| = \frac{|a_n - x_n|}{(x_n k_{n-1} + k_{n-2})(a_n k_{n-1} + k_{n-2})} < \frac{1}{k_{n-1} k_{n-2}},$$
and so $x = \lim_{n\to\infty} \frac{h_n}{k_n} = y$.

Uniqueness: Suppose that $x = [a_0; a_1, a_2, \dots] = [b_0; b_1, b_2, \dots]$. We shall prove by induction that $a_n = b_n$ for all $n$. Certainly $a_0 = b_0 = [x]$. Suppose now that $a_i = b_i$ for $i < n$. Put $A = [a_n; a_{n+1}, \dots]$, $B = [b_n; b_{n+1}, \dots]$, so that $x = [a_0; a_1, a_2, \dots, a_{n-1}, A] = [a_0; a_1, a_2, \dots, a_{n-1}, B]$. Then by the preceding lemma, $A - B$ is a multiple of $x - x = 0$, that is, $A = B$, and so $a_n = [A] = [B] = b_n$.

14.5. Best Rational Approximations to Irrationals

Lemma 14.5.1. Let $x$ be an irrational number with sequence of convergents $\{\frac{h_n}{k_n}\}$. Suppose that $\frac ab$ is a rational number with $|bx - a| < |k_n x - h_n|$, for some $n \ge 1$. Then $b \ge k_{n+1}$.

Proof. We may assume that $(a, b) = 1$ and that $b > 0$. Suppose to the contrary that $|bx - a| < |k_n x - h_n|$ and $b < k_{n+1}$. Since $\begin{pmatrix} h_n & h_{n+1} \\ k_n & k_{n+1} \end{pmatrix}$ is invertible over $\mathbb{Z}$ (having determinant $\pm 1$), there exist integers $\alpha, \beta$ satisfying
$$\alpha h_n + \beta h_{n+1} = a, \qquad \alpha k_n + \beta k_{n+1} = b.$$
Thus
$$bx - a = \alpha(k_n x - h_n) + \beta(k_{n+1} x - h_{n+1}). \tag{14.10}$$


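We claim in fact that $\alpha$ and $\beta$ are nonzero and have opposite signs. Indeed, if $\alpha = 0$ then $\frac ab = \frac{h_{n+1}}{k_{n+1}}$, contradicting $b < k_{n+1}$, while if $\beta = 0$ then $\frac ab = \frac{h_n}{k_n}$, that is, $a = h_n$, $b = k_n$, contradicting $|bx - a| < |k_n x - h_n|$. If $\alpha$ and $\beta$ are both positive, then $b > k_{n+1}$, while if they are both negative then $b$ is negative, a contradiction. Since $k_n x - h_n = k_n(x - \frac{h_n}{k_n})$ and $k_{n+1} x - h_{n+1} = k_{n+1}(x - \frac{h_{n+1}}{k_{n+1}})$, and consecutive convergents lie on opposite sides of $x$, it follows that these two quantities also have opposite signs. Thus
$$|bx - a| = |\alpha||k_n x - h_n| + |\beta||k_{n+1} x - h_{n+1}| > |k_n x - h_n|,$$
a contradiction.

Theorem 14.5.1. A rational number $\frac ab$ with $b > 1$, $(a, b) = 1$, is a best rational approximation in the strong sense to an irrational number $x$ if and only if $\frac ab$ is one of the convergents in the continued fraction expansion of $x$.

Note 14.5.1. When $b = 1$, it may be the case that $[x] + 1$ is a better approximation to $x$ than $[x]$, and so we need $b > 1$ in the statement.

Proof. Suppose that $\frac ab$ is a best rational approximation in the strong sense, with $b > 1$. Say $k_n < b \le k_{n+1}$ for some $n \ge 0$. (Recall, $k_0 = 1$.) By definition, $|bx - a| < |k_n x - h_n|$, and thus by the preceding lemma, $b \ge k_{n+1}$. Therefore we have $b = k_{n+1}$. Since $|x - \frac{h_n}{k_n}| < |\frac{h_{n+1}}{k_{n+1}} - \frac{h_n}{k_n}| = \frac{1}{k_n k_{n+1}}$, we have
$$|k_{n+1} x - a| = |bx - a| < |k_n x - h_n| < \frac{1}{k_{n+1}}.$$
Since also $|k_{n+1} x - h_{n+1}| < \frac{1}{k_{n+2}}$, we get $|a - h_{n+1}| < \frac{1}{k_{n+1}} + \frac{1}{k_{n+2}} < 1$ (recall $k_{n+2} > k_{n+1} = b \ge 2$), so $a = h_{n+1}$ and $\frac ab$ is a convergent.

Conversely, let $\frac{h_n}{k_n}$ be a convergent with $k_n > 1$, and suppose that $\frac cd$ satisfies $|dx - c| < |k_n x - h_n|$. Then by the preceding lemma $d \ge k_{n+1} > k_n$. Equality $|dx - c| = |k_n x - h_n|$ with $1 \le d < k_n$ is impossible because $x$ is irrational. Therefore, $\frac{h_n}{k_n}$ is a best approximation in the strong sense.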
Example 14.5.1. Let's find the continued fraction expansion of $\pi$ and its convergents, starting from a decimal approximation $\pi = 3.1415926535897\ldots$. We have
$$\pi = 3 + .14159\ldots = 3 + \cfrac{1}{\frac{1}{.14159\ldots}} = 3 + \cfrac{1}{7 + .06251\ldots} = 3 + \cfrac{1}{7 + \cfrac{1}{15.9\ldots}} = \cdots,$$
yielding $\pi = [3; 7, 15, 1, 292, 1, 1, 1, \dots]$. The correct determination of the partial quotients 3, 7, 15, 292, etc. depends on the accuracy of our initial approximation for $\pi$. For example, $3.14 = [3; 7, 7]$ and $3.1415 = [3; 7, 14, 1, 8, 2]$. Since
$$[3; 7, 15, 1, 292, 1, 1, 2] = 3.1415926535583\ldots < \pi \quad \text{and} \quad \pi < 3.1415926536189\ldots = [3; 7, 15, 1, 292, 1, 1, 1],$$


we know the first eight partial quotients given above are correct. Thus, we obtain the following sequence of convergents to $\pi$:
$$\frac{3}{1}, \quad \frac{22}{7} = 3.14285\ldots, \quad \frac{333}{106}, \quad \frac{355}{113} = 3.1415929\ldots, \quad \frac{103993}{33102}.$$
We note the great accuracy (six decimal places) of the approximation $\frac{355}{113}$. This is because the next partial quotient is 292, and thus
$$\left| \pi - \frac{355}{113} \right| < \frac{1}{292} \cdot \frac{1}{113^2}.$$
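The computations in Example 14.5.1 can be reproduced with exact rational arithmetic. The following is a minimal sketch, not part of the original notes: the function names `continued_fraction` and `convergents` are illustrative choices, and the recurrences used are the standard ones $h_n = a_n h_{n-1} + h_{n-2}$, $k_n = a_n k_{n-1} + k_{n-2}$ with $h_{-1} = 1$, $k_{-1} = 0$.

```python
from fractions import Fraction


def continued_fraction(x: Fraction) -> list[int]:
    """Partial quotients [a0; a1, ..., an] of a rational x (Euclidean algorithm)."""
    quotients = []
    num, den = x.numerator, x.denominator
    while den != 0:
        a, r = divmod(num, den)  # floor division, so a = [num/den]
        quotients.append(a)
        num, den = den, r
    return quotients


def convergents(quotients: list[int]) -> list[Fraction]:
    """Convergents h_n/k_n of [a0; a1, ...] from the two-term recurrences."""
    h_prev, h = 0, 1  # h_{-2}, h_{-1}
    k_prev, k = 1, 0  # k_{-2}, k_{-1}
    result = []
    for a in quotients:
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
        result.append(Fraction(h, k))
    return result


# Decimal truncations of pi give only the leading partial quotients correctly.
print(continued_fraction(Fraction("3.14")))
print(continued_fraction(Fraction("3.1415")))
print(convergents([3, 7, 15, 1, 292]))
```

Feeding in longer decimal truncations of $\pi$ shows how many leading partial quotients have stabilized, which is the point made in the example.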

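14.6. Hurwitz's Theorem

Recall that for an irrational $x$ with convergents $\frac{h_n}{k_n}$, we have
$$\left| x - \frac{h_n}{k_n} \right| \le \frac{1}{k_n k_{n+1}} = \frac{1}{k_n (a_{n+1} k_n + k_{n-1})} \le \frac{1}{a_{n+1} k_n^2}.$$
Thus, a large partial quotient $a_{n+1}$ implies that the convergent $\frac{h_n}{k_n}$ is an excellent approximation to $x$, as we saw for the case of $\pi$ in the preceding example. In particular, $\left|x - \frac{h_n}{k_n}\right| \le \frac{1}{k_n^2}$. Hurwitz's Theorem gives a slightly stronger statement about the convergents to $x$.

Theorem 14.6.1 (Hurwitz). Given any irrational $x$, there exist infinitely many rationals $\frac{h}{k}$ with

(14.11) $\quad \left| x - \frac{h}{k} \right| < \frac{1}{\sqrt{5}\, k^2}.$

Indeed, among any three consecutive convergents to $x$ at least one satisfies (14.11). Moreover the constant $\frac{1}{\sqrt 5}$ in (14.11) is best possible.

Proof. We proceed by contradiction. Suppose that there is an $n$ with
$$\left| \frac{h_n}{k_n} - x \right| \ge \frac{1}{\sqrt5\, k_n^2}, \qquad \left| \frac{h_{n+1}}{k_{n+1}} - x \right| \ge \frac{1}{\sqrt5\, k_{n+1}^2}, \qquad \left| \frac{h_{n+2}}{k_{n+2}} - x \right| \ge \frac{1}{\sqrt5\, k_{n+2}^2}.$$
Then, since consecutive convergents lie on opposite sides of $x$,
$$\left| \frac{h_{n+1}}{k_{n+1}} - \frac{h_n}{k_n} \right| \ge \frac{1}{\sqrt5} \left( \frac{1}{k_n^2} + \frac{1}{k_{n+1}^2} \right), \qquad \left| \frac{h_{n+2}}{k_{n+2}} - \frac{h_{n+1}}{k_{n+1}} \right| \ge \frac{1}{\sqrt5} \left( \frac{1}{k_{n+1}^2} + \frac{1}{k_{n+2}^2} \right).$$
Using the identity $|k_{n+1} h_n - k_n h_{n+1}| = 1$ we obtain
$$\sqrt5 > \frac{k_{n+1}}{k_n} + \frac{k_n}{k_{n+1}}, \qquad \sqrt5 > \frac{k_{n+2}}{k_{n+1}} + \frac{k_{n+1}}{k_{n+2}},$$
the strict inequality following from the irrationality of $\sqrt5$. Now for a real number $t \ge 1$, $t + \frac1t < \sqrt5$ implies that $t < \varphi := \frac{1 + \sqrt5}{2}$. Thus $k_{n+1} < \varphi k_n$ and $k_{n+2} < \varphi k_{n+1}$. Now $k_{n+2} \ge k_{n+1} + k_n$, and so $\frac{k_{n+2}}{k_{n+1}} \ge 1 + \frac{k_n}{k_{n+1}} \ge 1 + \frac{1}{\varphi} = \varphi$, contradicting $k_{n+2} < \varphi k_{n+1}$.

For $x = \varphi$ the constant $\sqrt5$ in the theorem cannot be improved. Indeed, for any $n$ we have $\varphi = [1; 1, \dots, 1, \varphi] = \frac{\varphi h_n + h_{n-1}}{\varphi k_n + k_{n-1}}$, from which we get
$$\left| \varphi - \frac{h_n}{k_n} \right| = \frac{1}{(\varphi k_n + k_{n-1}) k_n} = \frac{1}{k_n^2} \cdot \frac{1}{\varphi + \frac{k_{n-1}}{k_n}}.$$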

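Thus
$$\lim_{n \to \infty} k_n^2 \left| \varphi - \frac{h_n}{k_n} \right| = \lim_{n \to \infty} \frac{1}{\varphi + \frac{k_{n-1}}{k_n}} = \frac{1}{\varphi + 1/\varphi} = \frac{1}{\sqrt5}.$$
Thus, with $x = \varphi$, for any $\varepsilon > 0$ there are at most finitely many $n$ with
$$\left| x - \frac{h_n}{k_n} \right| < \frac{1}{(\sqrt5 + \varepsilon)\, k_n^2}.$$
□

A result of related interest is the following.

Theorem 14.6.2. Let $x$ be an irrational number. If $\frac{a}{b}$ is a rational number with $b \ge 1$ and
$$\left| x - \frac{a}{b} \right| < \frac{1}{2b^2},$$
then $\frac{a}{b}$ is one of the convergents of the simple continued fraction expansion of $x$.

Proof. We may assume $(a, b) = 1$. Let $\{h_n/k_n\}$ be the sequence of convergents to $x$. Let $n$ be such that $k_n \le b < k_{n+1}$. Then by Lemma 14.5.1, $|bx - a| \ge |x k_n - h_n|$, and consequently $b\left|x - \frac{a}{b}\right| \ge k_n \left|x - \frac{h_n}{k_n}\right|$, whence
$$\left| x - \frac{h_n}{k_n} \right| < \frac{b}{k_n} \cdot \frac{1}{2b^2} = \frac{1}{2 b k_n}.$$
Thus
$$\left| \frac{h_n}{k_n} - \frac{a}{b} \right| \le \left| \frac{h_n}{k_n} - x \right| + \left| x - \frac{a}{b} \right| < \frac{1}{2 b k_n} + \frac{1}{2 b^2}.$$
If $\frac{a}{b} \ne \frac{h_n}{k_n}$, then $\left|\frac{h_n}{k_n} - \frac{a}{b}\right| \ge \frac{1}{k_n b}$, and thus $\frac{1}{k_n b} < \frac{1}{2 b k_n} + \frac{1}{2 b^2}$, that is $b < k_n$, contradicting $k_n \le b$. Therefore $\frac{a}{b} = \frac{h_n}{k_n}$. □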
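As a numerical sanity check of Theorems 14.6.1 and 14.6.2, the sketch below looks at $x = \sqrt3 = [1; 1, 2, 1, 2, \dots]$. It is an illustration only: the choice of $\sqrt3$, the bound $b \le 200$, and the helper names are assumptions made here, and floating-point arithmetic is used, which is adequate at this scale but is not a proof.

```python
from fractions import Fraction
from math import sqrt


def convergents(quotients):
    """Convergents h_n/k_n built from the standard two-term recurrences."""
    h_prev, h = 0, 1
    k_prev, k = 1, 0
    out = []
    for a in quotients:
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
        out.append(Fraction(h, k))
    return out


x = sqrt(3)
conv = convergents([1] + [1, 2] * 6)  # first 13 convergents of sqrt(3)


def quality(frac):
    """b^2 * |x - a/b|; small values mean good approximations."""
    return frac.denominator ** 2 * abs(x - frac.numerator / frac.denominator)


# Hurwitz: among any three consecutive convergents, one has quality < 1/sqrt(5).
hurwitz_ok = all(
    any(quality(c) < 1 / sqrt(5) for c in conv[i:i + 3])
    for i in range(len(conv) - 2)
)

# Theorem 14.6.2: every a/b with |x - a/b| < 1/(2 b^2) should be a convergent.
close = {
    Fraction(round(b * x), b)
    for b in range(1, 201)
    if abs(x - round(b * x) / b) < 1 / (2 * b * b)
}
print(hurwitz_ok, sorted(close))
```

For $\sqrt3$ the quantities $k_n^2 |x - h_n/k_n|$ alternate between roughly $0.58$ and $0.29$, so every second convergent passes both tests.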

14.7. The set of all best rational approximations

We have seen that any convergent to a real number $x$ is a best rational approximation to $x$. The converse is false, that is, there exist best rational approximations that are not convergents. However, it can be shown that every best rational approximation is either a convergent or an "interpolate" between two consecutive convergents $\frac{h_n}{k_n}$, $\frac{h_{n+1}}{k_{n+1}}$ of the form
$$\frac{h_n j + h_{n-1}}{k_n j + k_{n-1}}, \qquad j \in \{1, 2, \dots, a_{n+1} - 1\},$$
where $a_{n+1}$ is the $(n+1)$-th partial quotient, with the convention $h_{-1} = 1$, $k_{-1} = 0$. We will leave this as an exercise for the reader to prove. For $j = a_{n+1}$ the $j$-th interpolate is just $\frac{h_{n+1}}{k_{n+1}}$. We note that not every interpolate is a best rational approximation, as we see in the next example.

Example 14.7.1. The first few convergents to $\pi$ are $\frac{3}{1}$, $\frac{22}{7}$ and $\frac{333}{106}$, and the first few partial quotients are 3, 7, 15. The interpolates between $\frac{3}{1}$ and $\frac{22}{7}$ are given by
$$\frac{3j + 1}{j + 0}, \qquad j \in \{1, 2, \dots, 6\},$$


that is, $\frac{4}{1}$, $\frac{7}{2} = 3.5$, $\frac{10}{3} = 3.33\ldots$, $\frac{13}{4} = 3.25$, $\frac{16}{5} = 3.2$, $\frac{19}{6} = 3.166\ldots$. Among these fractions, only the latter three are better approximations to $\pi$ than $\frac{3}{1}$. The interpolates between $\frac{22}{7}$ and $\frac{333}{106}$ are given by
$$\frac{22j + 3}{7j + 1}, \qquad j \in \{1, 2, \dots, 14\},$$
and one sees that these are better approximations to $\pi$ than $\frac{22}{7}$ for $j \ge 8$. Thus we obtain the additional best approximations to $\pi$,
$$\frac{179}{57}, \ \frac{201}{64}, \ \frac{223}{71}, \ \frac{245}{78}, \ \frac{267}{85}, \ \frac{289}{92}, \ \frac{311}{99}.$$

14.8. Quadratic Irrationals and Periodic Continued Fractions

Definition 14.8.1. A quadratic irrational is an irrational number of degree 2 over $\mathbb{Q}$, that is, satisfying a quadratic equation with rational coefficients. A real number is a quadratic irrational if and only if it can be expressed in the form
$$x = \frac{a + b\sqrt{m}}{c},$$
with $a, b, c$ integers, $b, c \ne 0$, and $m$ a square-free positive integer strictly greater than one.

Definition 14.8.2. i) A periodic continued fraction is an infinite continued fraction of the form

(14.12) $\quad [a_0; a_1, \dots, a_r, \overline{b_0, b_1, \dots, b_{n-1}}],$

where the overbar indicates that this sequence of integers keeps repeating.
ii) It is said to have period $n$ if $n$ is the minimal length of the repeating cycle.
iii) It is said to be purely periodic if it is of the form $[\overline{b_0; b_1, \dots, b_{n-1}}]$.

Theorem 14.8.1. For any real number $x$, $x$ has a simple periodic continued fraction if and only if $x$ is a quadratic irrational.

Proof. Suppose first that $x$ has a periodic expansion as given in (14.12). Then, setting $y = [\overline{b_0; b_1, \dots, b_{n-1}}]$,

(14.13) $\quad x = [a_0; a_1, \dots, a_r, \overline{b_0, b_1, \dots, b_{n-1}}] = [a_0; a_1, \dots, a_r, y] = \dfrac{y h_r + h_{r-1}}{y k_r + k_{r-1}},$

where $\frac{h_r}{k_r}$, $\frac{h_{r-1}}{k_{r-1}}$ are convergents for $x$. Now
$$y = [\overline{b_0; b_1, \dots, b_{n-1}}] = [b_0; b_1, \dots, b_{n-1}, y] = \frac{y h'_{n-1} + h'_{n-2}}{y k'_{n-1} + k'_{n-2}},$$
where the $\frac{h'_j}{k'_j}$ are the convergents for $[b_0; b_1, \dots, b_{n-1}]$. Thus $y$ satisfies the quadratic equation $y^2 k'_{n-1} + y(k'_{n-2} - h'_{n-1}) - h'_{n-2} = 0$. Since $k'_{n-1} \ne 0$, $y$ is a quadratic irrational, and thus by (14.13), so is $x$.

Next, suppose that $x$ is a quadratic irrational, satisfying $f(x) := Ax^2 + Bx + C = 0$ for some integers $A, B, C$, $A \ne 0$, and having continued fraction expansion $x = [a_0; a_1, a_2, \dots]$. Let $\{h_n/k_n\}$ be the sequence of convergents to $x$. Then for any $n$ we can write
$$x = [a_0; a_1, a_2, \dots, a_n, \alpha_n],$$


with $\alpha_n = [a_{n+1}; a_{n+2}, \dots]$. Our goal is to prove that there exist positive integers $m < n$ with $\alpha_n = \alpha_m$, for this will imply that $x$ has the form
$$x = [a_0; a_1, a_2, \dots, a_m, \overline{a_{m+1}, \dots, a_n}],$$
making it periodic. By Theorem 14.3.1 we have
$$x = \frac{\alpha_n h_n + h_{n-1}}{\alpha_n k_n + k_{n-1}},$$
and thus $f(x) = 0$ implies that
$$A(\alpha_n h_n + h_{n-1})^2 + B(\alpha_n h_n + h_{n-1})(\alpha_n k_n + k_{n-1}) + C(\alpha_n k_n + k_{n-1})^2 = 0,$$
that is

(14.14) $\quad A_n \alpha_n^2 + B_n \alpha_n + C_n = 0$

for some $A_n, B_n, C_n \in \mathbb{Z}$, with

(14.15) $\quad A_n = Q(h_n, k_n) = A h_n^2 + B h_n k_n + C k_n^2, \qquad C_n = Q(h_{n-1}, k_{n-1}).$

Here $Q(X, Y)$ is the quadratic form $Q(X, Y) := AX^2 + BXY + CY^2$, so that $f(X) = Q(X, 1)$. For any $n$ we define
$$Q_n(X, Y) = A_n X^2 + B_n XY + C_n Y^2,$$
with $A_n, B_n, C_n$ as given in the previous paragraph. Since
$$\det \begin{pmatrix} h_n & h_{n-1} \\ k_n & k_{n-1} \end{pmatrix} = \pm 1,$$
we have $Q(X, Y) \sim Q_n(X, Y)$ over $\mathbb{Z}$ (that is, $Q_n(X, Y)$ can be obtained from $Q(X, Y)$ by an invertible change of variables), and thus the determinants of the matrices associated with these quadratic forms must be equal, that is, for any $n$,
$$B_n^2 - 4 A_n C_n = B^2 - 4AC.$$
Set $f_n(X) := Q_n(X, 1)$, so that by (14.14), $f_n(\alpha_n) = 0$. We claim there are at most finitely many possibilities for the quadratic polynomial $f_n(X)$, and therefore at most finitely many possible zeros $\alpha_n$ as $n$ runs through $\mathbb{N}$, completing the proof of the theorem. Note, by (14.15) and Taylor's formula,
$$A_n = k_n^2 f(h_n/k_n) = k_n^2 \left( f(x) + f'(x)\left(\frac{h_n}{k_n} - x\right) + \frac{f''(x)}{2}\left(\frac{h_n}{k_n} - x\right)^2 \right),$$
and so, since $f(x) = 0$ and $|x - h_n/k_n| < \frac{1}{k_n^2}$,
$$|A_n| \le k_n^2 \left( |f'(x)| \left|x - \frac{h_n}{k_n}\right| + \frac12 |f''(x)| \left|x - \frac{h_n}{k_n}\right|^2 \right) \le |f'(x)| + \frac{|f''(x)|}{2k_n^2} \le |f'(x)| + |f''(x)|,$$
and similarly $|C_n| \le |f'(x)| + |f''(x)|$. Noting that $B_n^2 = B^2 - 4AC + 4A_n C_n$, we see that there are finitely many choices for $A_n$, $C_n$ and $B_n$. □

Next we characterize when a quadratic irrational has a purely periodic continued fraction.

Theorem 14.8.2. The quadratic irrational $x$ has a purely periodic continued fraction expansion if and only if $x > 1$ and the conjugate $\bar{x}$ of $x$ satisfies $\bar{x} \in (-1, 0)$.


Proof. Suppose that $x = [\overline{a_0; a_1, \dots, a_n}] = [a_0; a_1, \dots, a_n, x]$. Then
$$x = \frac{h_n x + h_{n-1}}{k_n x + k_{n-1}},$$
and so $k_n x^2 + (k_{n-1} - h_n) x - h_{n-1} = 0$. Now, certainly $a_0 \ne 0$ (since $a_0 = a_{n+1} \ge 1$) and so $a_0 \ge 1$, $x > 1$. We claim that the quadratic polynomial $f(X) := k_n X^2 + (k_{n-1} - h_n) X - h_{n-1}$ has a zero between $-1$ and $0$. Indeed, $f(0) = -h_{n-1} < 0$ and $f(-1) = k_n + h_n - k_{n-1} - h_{n-1} > 0$. This zero must in fact be $\bar{x}$.

Converse: Suppose that $x > 1$ and that $-1 < \bar{x} < 0$. For any $n$ we have
$$x = [a_0; a_1, \dots, a_n, \alpha_n] = \frac{\alpha_n h_n + h_{n-1}}{\alpha_n k_n + k_{n-1}}, \qquad \bar{x} = [a_0; a_1, \dots, a_n, \bar\alpha_n] = \frac{\bar\alpha_n h_n + h_{n-1}}{\bar\alpha_n k_n + k_{n-1}},$$
but note that the latter is not a simple continued fraction expansion since $\bar\alpha_n \not\ge 1$. Our motivation is that if $x$ has a purely periodic expansion then so does $\alpha_n$. We claim that for any $n$, $[-1/\bar\alpha_n] = a_n$ and $-1 < \bar\alpha_n < 0$. The proof is by induction. For $n = 0$, $\bar{x} = a_0 + 1/\bar\alpha_0$ and so $[-1/\bar\alpha_0] = [a_0 - \bar{x}] = a_0$, and $\bar\alpha_0 = 1/(\bar{x} - a_0) \in (-1, 0)$, since $\bar{x} - a_0 < -1$. Suppose true for $n - 1$ and consider $n$. We have $\bar\alpha_{n-1} = a_n + 1/\bar\alpha_n$, and so $[-1/\bar\alpha_n] = a_n + [-\bar\alpha_{n-1}] = a_n$, and
$$\bar\alpha_n = \frac{1}{\bar\alpha_{n-1} - a_n} \in (-1, 0),$$
since $\bar\alpha_{n-1} - a_n < -1$. Now by Theorem 14.8.1 there exist $m < n$ with $\alpha_m = \alpha_n$. Then $\bar\alpha_m = \bar\alpha_n$, implying $a_m = a_n$, implying $\alpha_{m-1} = \alpha_{n-1}$, implying $a_{m-1} = a_{n-1}$, etc., finally getting $\alpha_0 = \alpha_{n-m}$ and $a_0 = a_{n-m}$, and hence $x = a_0 + 1/\alpha_0 = \alpha_{n-m-1}$. Thus $x$ is purely periodic. □

Consider now the continued fraction expansion of $\sqrt{m}$ where $m$ is a positive integer that is not a perfect square. Set $x = \sqrt{m} + [\sqrt{m}]$. Then $x > 1$ and $\bar{x} = [\sqrt{m}] - \sqrt{m} \in (-1, 0)$, and therefore $x$ is purely periodic, say $x = [\overline{a_0; a_1, \dots, a_{r-1}}]$ with $r$ minimal, and $a_0 = 2[\sqrt{m}]$. Thus

(14.16) $\quad \sqrt{m} = x - [\sqrt{m}] = [[\sqrt{m}]; \overline{a_1, \dots, a_{r-1}, 2[\sqrt{m}]}],$

and we have

Theorem 14.8.3. If $m$ is a positive integer that is not a perfect square, then the continued fraction expansion of $\sqrt{m}$ has the form (14.16).

Another elementary result we will need is the following.

Theorem 14.8.4. If $x > 1$ has a continued fraction expansion $x = [a_0; a_1, \dots]$, then $x^{-1}$ has expansion $x^{-1} = [0; a_0, a_1, \dots]$. In particular, the $n$-th convergent to $x^{-1}$ is the reciprocal of the $(n-1)$-st convergent to $x$.
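The shape of the expansion in (14.16) suggests a purely integer algorithm for the period of $\sqrt{m}$. The sketch below is an illustration rather than part of the notes: it tracks the complete quotients in the form $(p + \sqrt{m})/q$ with integers $p, q$ (a standard device, assumed here), and the function name `sqrt_continued_fraction` is a hypothetical choice. The loop stops at the first partial quotient equal to $2[\sqrt{m}]$, which by (14.16) closes the period.

```python
from math import isqrt


def sqrt_continued_fraction(m: int) -> tuple[int, list[int]]:
    """Return (a0, period) with sqrt(m) = [a0; period repeated forever]."""
    a0 = isqrt(m)
    if a0 * a0 == m:
        raise ValueError("m must not be a perfect square")
    period = []
    p, q, a = 0, 1, a0  # current complete quotient is (p + sqrt(m)) / q
    while a != 2 * a0:  # the period ends with the partial quotient 2*[sqrt(m)]
        p = a * q - p
        q = (m - p * p) // q  # exact division for this recurrence
        a = (a0 + p) // q     # floor of (p + sqrt(m)) / q
        period.append(a)
    return a0, period


print(sqrt_continued_fraction(7))
print(sqrt_continued_fraction(73))
```

Only integer arithmetic is used, so the output does not depend on the accuracy of a decimal approximation to $\sqrt{m}$, in contrast with the computation for $\pi$ in Example 14.5.1.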


14.9. Pell Equations

The Pell equation is a Diophantine equation of the type

(14.17) $\quad x^2 - m y^2 = N,$

with $m, N \in \mathbb{Z}$. Of particular interest is the case where $N = \pm 1$, where we are determining the units $x + y\sqrt{m} \in \mathbb{Z}[\sqrt{m}]$, that is the values with $\delta(x + y\sqrt{m}) = \pm 1$, where $\delta(x + y\sqrt{m}) = x^2 - m y^2$.

Example 14.9.1. Consider the equation $x^2 - y^2 = N$. It is elementary to prove that this is solvable if and only if $N \equiv 0, 1, 3 \pmod 4$. We leave this as an exercise for the reader.

Next, let's explore the connection with best rational approximations. Suppose that $m$ is not a perfect square and that $(x, y)$ is a solution of (14.17) in positive integers. Then we have
$$(x - \sqrt{m}\, y)(x + \sqrt{m}\, y) = N, \qquad y\left(\frac{x}{y} - \sqrt{m}\right) = \frac{N}{x + \sqrt{m}\, y}, \qquad \frac{x}{y} - \sqrt{m} = \frac{N}{y(x + \sqrt{m}\, y)},$$
and we see that $\frac{x}{y}$ is a good approximation to $\sqrt{m}$. Just how good is it? Let's say $|N| < \sqrt{m}$.

Case i: Suppose that $N > 0$. Then $\frac{x}{y} > \sqrt{m}$, $x > \sqrt{m}\, y$ and we get
$$\left| \frac{x}{y} - \sqrt{m} \right| < \frac{\sqrt{m}}{y(\sqrt{m}\, y + \sqrt{m}\, y)} = \frac{1}{2y^2},$$
implying that $\frac{x}{y}$ is one of the convergents to $\sqrt{m}$ by Theorem 14.6.2.

Case ii: Suppose that $N < 0$, so that $\frac{x}{y} < \sqrt{m}$, $x < \sqrt{m}\, y$ and
$$\left| \frac{y}{x} - \frac{1}{\sqrt{m}} \right| = \frac{|N|}{(x + \sqrt{m}\, y)\sqrt{m}\, x} < \frac{1}{(x + x)x} = \frac{1}{2x^2},$$
implying that $\frac{y}{x}$ is a convergent to $\frac{1}{\sqrt{m}}$. Thus, by the preceding theorem (Theorem 14.8.4), $\frac{x}{y}$ is again a convergent to $\sqrt{m}$. This establishes the following

Theorem 14.9.1. Let $m$ be a positive integer, not a perfect square, $N$ be any integer with $|N| < \sqrt{m}$, and suppose that $x, y$ are positive integers satisfying the Pell equation (14.17). Then $\frac{x}{y}$ is one of the convergents for the continued fraction expansion of $\sqrt{m}$.

The next lemma shows that one can systematically determine all values of $N$ with $|N| < \sqrt{m}$ for which there is a solution to the Pell equation (14.17).

Lemma 14.9.1. Let $\{h_n/k_n\}$ be the sequence of convergents to $\sqrt{m}$. Then
i) The sequence $\{h_n^2 - m k_n^2\}$ is periodic.
ii) For any $n$, $|h_n^2 - m k_n^2| < 2\sqrt{m}$.


Proof. Say
$$\sqrt{m} = [[\sqrt{m}]; \overline{a_1, \dots, a_{r-1}, 2[\sqrt{m}]}],$$
with $a_0 = [\sqrt{m}]$, $a_r = 2[\sqrt{m}]$. For any positive integer $n$, write

(14.18) $\quad \sqrt{m} = [a_0; a_1, \dots, a_n, x_n],$

so that
$$\sqrt{m} = \frac{x_n h_n + h_{n-1}}{x_n k_n + k_{n-1}}.$$
Then
$$x_n = \frac{h_{n-1} - k_{n-1}\sqrt{m}}{k_n \sqrt{m} - h_n} = \frac{(h_{n-1} - k_{n-1}\sqrt{m})(k_n \sqrt{m} + h_n)}{m k_n^2 - h_n^2} = \frac{(h_n h_{n-1} - m k_n k_{n-1}) + \sqrt{m}\,(h_{n-1} k_n - h_n k_{n-1})}{m k_n^2 - h_n^2} = \frac{s_n + \sqrt{m}\,(-1)^{n-1}}{t_n},$$
where $t_n = h_n^2 - m k_n^2$, $s_n = m k_n k_{n-1} - h_n h_{n-1}$. Since $\{x_n\}$ is periodic, so is $\{(-1)^{n-1} t_n\}$ and $\{t_n\}$. Also,
$$|h_n^2 - m k_n^2| = k_n^2 \left| \frac{h_n}{k_n} - \sqrt{m} \right| \left| \frac{h_n}{k_n} + \sqrt{m} \right| \le k_n^2 \cdot \frac{1}{k_n k_{n+1}} \cdot 2 \max\left\{ \frac{h_n}{k_n}, \sqrt{m} \right\} < 2\sqrt{m},$$
completing the proof. □

In particular, from the definition of $x_n$ in (14.18) we have $x_{r-1} = \sqrt{m} + [\sqrt{m}]$, and obtain
$$\sqrt{m} + [\sqrt{m}] = \frac{s_{r-1} + \sqrt{m}\,(-1)^r}{t_{r-1}},$$
and so $(-1)^r / t_{r-1} = 1$ and
$$h_{r-1}^2 - m k_{r-1}^2 = (-1)^r.$$
Moreover, we obtain the same relation if $r$ is replaced by any multiple of $r$, by the periodicity of the continued fraction expansion (that is, $x_{jr-1} = \sqrt{m} + [\sqrt{m}]$). Thus, we obtain an infinite family of solutions of the Pell equations $x^2 - m y^2 = \pm 1$, suggesting the following result.

Theorem 14.9.2. All positive solutions of the Pell equation $x^2 - m y^2 = \pm 1$ are found among the convergents $\{h_n/k_n\}$ to $\sqrt{m}$. Let $r$ be the (minimal) period of the expansion. If $r$ is even, then $x^2 - m y^2 = -1$ has no solution, and the positive solutions of $x^2 - m y^2 = 1$ are of the form $\frac{h_{jr-1}}{k_{jr-1}}$, $j = 1, 2, 3, \dots$. If $r$ is odd, then the positive solutions of $x^2 - m y^2 = -1$ are of the form $\frac{h_{jr-1}}{k_{jr-1}}$ with $j$ odd, and those of $x^2 - m y^2 = 1$ are of this form with $j$ even.


Proof. By Theorem 14.9.1, we know any solution of this Pell equation must be a convergent to $\sqrt{m}$. Let $t_n = h_n^2 - m k_n^2$ as above. If $t_n = 1$ then $\frac{h_n}{k_n} > \sqrt{m}$, and so $n$ is odd and $x_n = s_n + \sqrt{m}$, $\bar{x}_n = s_n - \sqrt{m}$. Since $x_n$ is purely periodic, $-1 < \bar{x}_n < 0$, and so $\sqrt{m} - 1 < s_n < \sqrt{m}$, implying that $s_n = [\sqrt{m}]$ and $x_n = [\sqrt{m}] + \sqrt{m} = x_{r-1}$. Thus by periodicity $n = r - 1, 2r - 1, \dots$. If $r$ is odd, then since $n$ is odd we have $n = 2r - 1, 4r - 1, \dots$, while if $r$ is even, $n = r - 1, 2r - 1, \dots$.

If $t_n = -1$, then $\frac{h_n}{k_n} < \sqrt{m}$, $n$ is even, $x_n = -s_n + \sqrt{m}$, $\bar{x}_n = -s_n - \sqrt{m}$, $-s_n = [\sqrt{m}]$ and $x_n = [\sqrt{m}] + \sqrt{m} = x_{r-1}$ for $n = r - 1, 2r - 1, \dots$. □

Note that the set of values
$$S_m := \{x + y\sqrt{m} : (x, y) \text{ satisfies the Pell equation (14.19)}\},$$

(14.19) $\quad x^2 - m y^2 = 1,$

can be broken into 4 pieces on the real number line
$$S_m = S_1 \cup S_2 \cup S_3 \cup S_4 \cup \{-1, 1\},$$
where $S_1 = S_m \cap (-\infty, -1)$, $S_2 = S_m \cap (-1, 0)$, $S_3 = S_m \cap (0, 1)$, $S_4 = S_m \cap (1, \infty)$, and that if $(x, y)$ is a positive solution (that is, both $x$ and $y$ positive), then $\alpha := x + y\sqrt{m} \in S_4$, $\bar\alpha = x - y\sqrt{m} \in S_3$, $-\alpha \in S_1$ and $-\bar\alpha \in S_2$. Now if $(x, y)$ is a positive solution of (14.19) then $\frac{x}{y}$ is a convergent of $\sqrt{m}$, and thus by monotonicity of the convergents, there must be a minimal positive solution in $S_4$, say $(x_1, y_1)$. Put $\alpha = x_1 + y_1\sqrt{m}$. We claim that every positive solution of (14.19) is of the form $(x_n, y_n)$ with
$$x_n + y_n \sqrt{m} = \alpha^n.$$
Indeed, suppose $\beta$ is such a solution. Then $\alpha^n \le \beta < \alpha^{n+1}$ for some positive integer $n$. Multiplying by $\bar\alpha^{\,n} = \alpha^{-n}$ we get $1 \le \beta \bar\alpha^{\,n} < \alpha$, and so by minimality of $\alpha$, $\beta\bar\alpha^{\,n} = 1$, that is, $\beta = \alpha^n$. Similarly, if $\alpha_0$ is the minimal solution of the Pell equation $x^2 - m y^2 = -1$, then the general solution is obtained from $\alpha_0 \alpha^n$, $n \in \mathbb{Z}$.

Example 14.9.2. Solve the Pell equation $x^2 - 7y^2 = \pm 1$. We have $\sqrt{7} = [2; \overline{1, 1, 1, 4}]$.
$$\begin{array}{c|rrrrrrr} n & -2 & -1 & 0 & 1 & 2 & 3 & 4 \\ \hline a_n & & & 2 & 1 & 1 & 1 & 4 \\ h_n & 0 & 1 & 2 & 3 & 5 & 8 & 37 \\ k_n & 1 & 0 & 1 & 1 & 2 & 3 & 14 \end{array}$$
and we see that $h_n^2 - 7k_n^2 = -3, 2, -3, 1, -3, 2, \dots$, and so there is no solution to $x^2 - 7y^2 = -1$, and the minimal solution to $x^2 - 7y^2 = 1$ is $(x, y) = (8, 3)$. The general solution is given by $x + y\sqrt{7} = (8 + 3\sqrt{7})^n$ together with the conjugate and negative variants.

Example 14.9.3. Solve the Pell equation $x^2 - 73y^2 = \pm 1$. We have $\sqrt{73} = [8; \overline{1, 1, 5, 5, 1, 1, 16}]$.
$$\begin{array}{c|rrrrrrrrr} n & -2 & -1 & 0 & 1 & 2 & 3 & 4 & 5 & 6 \\ \hline a_n & & & 8 & 1 & 1 & 5 & 5 & 1 & 1 \\ h_n & 0 & 1 & 8 & 9 & 17 & 94 & 487 & 581 & 1068 \\ k_n & 1 & 0 & 1 & 1 & 2 & 11 & 57 & 68 & 125 \end{array}$$
and so letting $\alpha = 1068 + 125\sqrt{73}$ we see that $\delta(\alpha) = -1$, $\delta(\alpha^2) = 1$, where
$$\alpha^2 = 2281249 + 267000\sqrt{73}.$$
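The tables in Examples 14.9.2 and 14.9.3 can be generated mechanically by walking through the convergents of $\sqrt{m}$ and testing $h_n^2 - m k_n^2$, as Theorem 14.9.2 suggests. The following is a minimal sketch under that reading; the function name `pell_minimal_solutions` and the integer recurrence for the complete quotients $(p + \sqrt{m})/q$ are choices made here, not taken from the notes.

```python
from math import isqrt


def pell_minimal_solutions(m: int) -> dict[int, tuple[int, int]]:
    """Minimal positive (x, y) with x^2 - m*y^2 = N for N = 1 and, if solvable, N = -1."""
    a0 = isqrt(m)
    if a0 * a0 == m:
        raise ValueError("m must not be a perfect square")
    found = {}
    h_prev, h = 1, a0  # h_{-1}, h_0
    k_prev, k = 0, 1   # k_{-1}, k_0
    p, q, a = 0, 1, a0  # complete quotient (p + sqrt(m)) / q, partial quotient a
    while 1 not in found:
        t = h * h - m * k * k
        if t in (1, -1) and t not in found:
            found[t] = (h, k)
        # advance to the next partial quotient and the next convergent
        p = a * q - p
        q = (m - p * p) // q
        a = (a0 + p) // q
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
    return found


print(pell_minimal_solutions(7))
print(pell_minimal_solutions(73))
```

The loop stops at the first convergent with $h_n^2 - m k_n^2 = 1$; when the period is odd, a solution of the $-1$ equation is necessarily met first, so both entries are recorded in that case.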


In summary, for $m = 73$ the minimal solution of $x^2 - 73y^2 = -1$ is $(1068, 125)$, while the minimal solution of $x^2 - 73y^2 = 1$ is $(2281249, 267000)$.

14.10. Liouville's Theorem

Theorem 14.10.1 (Liouville, 1844). If $\alpha$ is algebraic of degree $n > 1$ over $\mathbb{Q}$, then there exists a constant $c = c(\alpha) > 0$ such that for any rational number $p/q$ we have

(14.20) $\quad \left| \alpha - \frac{p}{q} \right| > \frac{c(\alpha)}{q^n}.$

Proof. Let $f(x)$ be the minimal polynomial for $\alpha$, $f(x) = a_n x^n + \cdots + a_0$ with $a_i \in \mathbb{Z}$. Since $f$ is irreducible, $f(p/q) \ne 0$, and thus since $q^n f(p/q) \in \mathbb{Z}$ we have $|f(p/q)| \ge 1/q^n$. Now by the mean value theorem, $f(p/q) = f'(c)(p/q - \alpha)$ for some $c$ between $\alpha$ and $p/q$. We may assume that $|p/q - \alpha| \le 1$ by taking $c(\alpha) \le 1$ (otherwise $|p/q - \alpha| > 1 \ge 1/q^n$). Let $M$ be the maximum value of $|f'(x)|$ on the interval $|\alpha - x| \le 1$. Then $|f(p/q)| \le M |p/q - \alpha|$, and we get $|p/q - \alpha| \ge 1/(M q^n)$. □

For quadratic irrationals we can be more precise.

Theorem 14.10.2. If $\alpha$ is a quadratic irrational, we can take the constant in Liouville's Theorem to be $c(\alpha) = \frac{1}{M + 2}$, where $M$ is the maximum partial quotient in the continued fraction expansion of $\alpha$.

Proof. Let $\{h_n/k_n\}$ be the sequence of convergents to $\alpha = [a_0; a_1, \dots]$, and put $x_n = [a_{n+1}; a_{n+2}, \dots]$, $M = \max\{|a_n|\}$. Let $p/q$ be an approximation to $\alpha$ with $k_n \le q < k_{n+1}$. By Lemma 14.5.1, $|q\alpha - p| \ge |k_n \alpha - h_n|$, and so
$$\left| \alpha - \frac{p}{q} \right| \ge \frac{k_n}{q} \left| \alpha - \frac{h_n}{k_n} \right| = \frac{k_n}{q} \left| \frac{x_n h_n + h_{n-1}}{x_n k_n + k_{n-1}} - \frac{h_n}{k_n} \right| = \frac{1}{q (x_n k_n + k_{n-1})} > \frac{1}{q((a_{n+1} + 1) k_n + k_{n-1})} \ge \frac{1}{q (a_{n+1} + 2) k_n} \ge \frac{1}{(M + 2)\, q k_n} \ge \frac{1}{(M + 2) q^2}.$$
□

Corollary 14.10.1. For every choice of signs the real number
$$x := 1 \pm \frac{1}{2} \pm \frac{1}{2^{2!}} \pm \frac{1}{2^{3!}} \pm \cdots$$
is transcendental. Moreover the values are distinct for distinct choices of signs, and so this produces an uncountable collection of transcendental numbers.

Such values $x$ are examples of what are called Liouville numbers, numbers for which the inequality in Liouville's Theorem fails for all positive integers $n$.

Proof. Let $x$ be such a number with a particular choice of signs, and let $S_k$ denote the $(k+1)$-st partial sum, so that $S_k = p_k/q_k$ for some integers $p_k$ and $q_k = 2^{k!}$. Then for any positive integer $k$,
$$\left| x - \frac{p_k}{q_k} \right| \le \frac{1}{2^{(k+1)!}} + \frac{1}{2^{(k+2)!}} + \cdots < \left( 1 + \frac12 + \frac14 + \cdots \right) \frac{1}{2^{(k+1)!}} = \frac{2}{q_k^{k+1}},$$


and so by Liouville's Theorem, $x$ cannot be algebraic of any degree $n > 1$: given $n$ and a constant $c > 0$, the bound $2/q_k^{k+1}$ is smaller than $c/q_k^n$ for all sufficiently large $k$. Finally, to see that the values are distinct, suppose that $x, y$ have distinct choices of signs, but $x = y$. By cancelation and rearranging the terms so that only positive signs occur we obtain two distinct binary expansions for the same positive real number, a contradiction; or note that $\frac{1}{2^{n!}} > \frac{1}{2^{(n+1)!}} + \frac{1}{2^{(n+2)!}} + \cdots$. □

Liouville's Theorem was sharpened in subsequent work by Thue, Siegel and Roth, with Thue obtaining (1909) for any algebraic number $\alpha$ of degree $n$ and positive $\varepsilon$,
$$\left| \alpha - \frac{p}{q} \right| > \frac{c(\alpha, \varepsilon)}{q^{\frac{n}{2} + 1 + \varepsilon}},$$
Siegel (1921),
$$\left| \alpha - \frac{p}{q} \right| > \frac{c(\alpha, \varepsilon)}{q^{2\sqrt{n} + \varepsilon}},$$
and Roth (1955),
$$\left| \alpha - \frac{p}{q} \right| > \frac{c(\alpha, \varepsilon)}{q^{2 + \varepsilon}}.$$
The latter inequality is generally referred to as the Thue-Siegel-Roth Theorem. It is best possible in the sense that the $\varepsilon$ cannot be removed from the exponent. We conclude this section with an application of the Thue-Siegel-Roth Theorem to solving a Diophantine equation.

Example 14.10.1. Let's show that the equation
$$x^3 - 5y^3 = N$$
has at most finitely many integer solutions $x, y$. We may assume $y > 0$. Note that any such solution provides an approximation to $\sqrt[3]{5}$,
$$\left| \frac{x}{y} - \sqrt[3]{5} \right| \le \frac{|N|}{M y^3},$$
with $M \approx 3(\sqrt[3]{5})^2$. On the other hand, by the Thue-Siegel-Roth Theorem (with $\varepsilon = \frac12$), there is a constant $c$ such that
$$\left| \frac{x}{y} - \sqrt[3]{5} \right| \ge \frac{c}{y^{5/2}}.$$
Thus we obtain $y \le (|N|/(Mc))^2$, and so there are at most finitely many choices for $y$. For each choice of $y$ there is at most one choice for $x$.

CHAPTER 15

Dirichlet Series

15.1. Definition and Convergence of a Dirichlet series

Definition 15.1.1. A Dirichlet series is an infinite sum of the form $F(s) = \sum_{n=1}^{\infty} \frac{a_n}{n^s}$, where the $a_n \in \mathbb{C}$ and $s = \sigma + it$ is a complex variable.

Note that for any complex number $s = \sigma + it$, $|n^s| = n^\sigma$.

Example 15.1.1. The Riemann zeta function is defined to be the series
$$\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s},$$
wherever this series converges. The series converges absolutely for $\sigma > 1$, and fails to converge absolutely if $\sigma \le 1$. In fact, it fails to converge at all for any $s$ with $\sigma \le 1$. The details are left as an exercise.

Theorem 15.1.1. Let $\sum_{n=1}^{\infty} \frac{a_n}{n^s}$ be a Dirichlet series which converges absolutely for some $s$ but not for all $s \in \mathbb{C}$. Then there is a real number $\sigma_a$, called the abscissa of absolute convergence, such that the series converges absolutely on the half-plane $\sigma > \sigma_a$ but does not converge absolutely if $\sigma < \sigma_a$.

Proof. Suppose that $\sum_{n=1}^{\infty} \left| \frac{a_n}{n^{s_0}} \right|$ converges for some $s_0 = \sigma_0 + it_0$. Then by the comparison test, the series converges absolutely for all $s$ with $\sigma \ge \sigma_0$. Thus we can define
$$\sigma_a = \inf \left\{ \sigma \in \mathbb{R} : \sum_{n=1}^{\infty} \frac{a_n}{n^\sigma} \text{ converges absolutely} \right\}.$$
□

We note that by Theorem 15.5.2 below there is also an abscissa of convergence $\sigma_c$ such that the series converges on the half-plane $\sigma > \sigma_c$ but diverges on the half-plane $\sigma < \sigma_c$. Moreover, one can prove that $0 \le \sigma_a - \sigma_c \le 1$.

Homework 15.1.1. Prove that $\sigma_a \le 1 + \sigma_c$. Hint: If the series converges at $s_0 = \sigma_0 + it_0$, show that there is a constant $C$ such that $\left| \frac{a_n}{n^s} \right| \le C \frac{1}{n^{\sigma - \sigma_0}}$, for any $s = \sigma + it$ with $\sigma > \sigma_0$.

Theorem 15.1.2 (Uniqueness Theorem). Let $F(s) = \sum_{n=1}^{\infty} \frac{f(n)}{n^s}$ and $G(s) = \sum_{n=1}^{\infty} \frac{g(n)}{n^s}$ be Dirichlet series converging absolutely for $\sigma > \sigma_a$. If $F(s) = G(s)$ for $\sigma > \sigma_a$ then $f(n) = g(n)$ for all positive integers $n$. (In fact, all we need is that $F(s_n) = G(s_n)$ for some sequence $\{s_n\}$ with $s_n = \sigma_n + it_n$ and $\sigma_n \to \infty$.)


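Proof. Let $H(s) = \sum_{n=1}^{\infty} \frac{h(n)}{n^s} := F(s) - G(s)$, so that $H(s) = 0$ on the half-plane $\sigma > \sigma_a$. Suppose that $h(n) \ne 0$ for some $n$. Let $N$ be the smallest such $n$. Fix $\alpha > \sigma_a$, and let $s$ be such that $\sigma \ge \alpha$. Then
$$0 = H(s) = \sum_{n=1}^{\infty} \frac{h(n)}{n^s} = \frac{h(N)}{N^s} + \sum_{n=N+1}^{\infty} \frac{h(n)}{n^s},$$
and so
$$|h(N)| = N^\sigma \left| \sum_{n=N+1}^{\infty} \frac{h(n)}{n^s} \right| \le N^\sigma \sum_{n=N+1}^{\infty} \frac{|h(n)|}{n^\alpha} \frac{1}{n^{\sigma - \alpha}} \le \frac{N^\sigma}{(N+1)^{\sigma - \alpha}} \sum_{n=N+1}^{\infty} \frac{|h(n)|}{n^\alpha} = C_\alpha \left( \frac{N}{N+1} \right)^\sigma,$$
for some constant $C_\alpha$ depending only on $\alpha$. Letting $\sigma \to \infty$ we conclude that $h(N) = 0$, a contradiction. Therefore all $h(n) = 0$. □

Theorem 15.1.3. If $\sum_{n=1}^{\infty} \frac{f(n)}{n^s}$ and $\sum_{n=1}^{\infty} \frac{g(n)}{n^s}$ are Dirichlet series converging absolutely on the half-plane $\sigma > \sigma_a$, then for any $s$ in this half-plane
$$\left( \sum_{n=1}^{\infty} \frac{f(n)}{n^s} \right) \left( \sum_{n=1}^{\infty} \frac{g(n)}{n^s} \right) = \sum_{n=1}^{\infty} \frac{h(n)}{n^s},$$
where $h(n) = \sum_{d \mid n} f(d) g(n/d)$.

Corollary 15.1.1. Let $f$ be an arithmetic function and $F(n) = \sum_{d \mid n} f(d)$. Then
$$\zeta(s) \sum_{n=1}^{\infty} \frac{f(n)}{n^s} = \sum_{n=1}^{\infty} \frac{F(n)}{n^s}.$$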

15.2. Important examples of Dirichlet Series

Corollary 15.2.1. $\frac{1}{\zeta(s)} = \sum_{n=1}^{\infty} \frac{\mu(n)}{n^s}$, for $\sigma > 1$.

Proof. By the preceding corollary,
$$\sum_{n=1}^{\infty} \frac{1}{n^s} \sum_{n=1}^{\infty} \frac{\mu(n)}{n^s} = 1,$$
since $\sum_{d \mid n} \mu(d) = 0$ unless $n = 1$, in which case it equals 1. □


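Theorem 15.2.1. Some important Dirichlet series.

(a) $\displaystyle \sum_{n=1}^{\infty} \frac{1}{n^s} = \zeta(s) \quad (\sigma > 1),$

(b) $\displaystyle \sum_{n=1}^{\infty} \frac{\mu(n)}{n^s} = \frac{1}{\zeta(s)} \quad (\sigma > 1),$

(c) $\displaystyle \sum_{n=1}^{\infty} \frac{\phi(n)}{n^s} = \frac{\zeta(s-1)}{\zeta(s)} \quad (\sigma > 2),$

(d) $\displaystyle \sum_{n=1}^{\infty} \frac{\tau(n)}{n^s} = \zeta^2(s) \quad (\sigma > 1),$

(e) $\displaystyle \sum_{n=1}^{\infty} \frac{\sigma(n)}{n^s} = \zeta(s)\zeta(s-1) \quad (\sigma > 2).$

Proof. We've already seen (a) and (b). (c) follows from the preceding theorems, using $\phi(n) = \sum_{d \mid n} \mu(d) \frac{n}{d}$. For (d), (e), use $\tau(n) = \sum_{d \mid n} 1$, $\sigma(n) = \sum_{d \mid n} d$. The details are left for homework. □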
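By Theorem 15.1.3, each identity in Theorem 15.2.1 amounts to a statement about Dirichlet convolutions of coefficient sequences, which can be checked directly for small $n$. The sketch below does this up to a cutoff; the helper names (`dirichlet_convolution`, `mobius`, `phi`, `tau`, `sigma`) and the cutoff `LIMIT = 200` are illustrative assumptions, and the arithmetic functions are computed by brute force rather than efficiently.

```python
from math import gcd


def dirichlet_convolution(f, g, limit):
    """h[n] = sum over d | n of f(d) * g(n // d), for 1 <= n <= limit (h[0] unused)."""
    h = [0] * (limit + 1)
    for d in range(1, limit + 1):
        for multiple in range(d, limit + 1, d):
            h[multiple] += f(d) * g(multiple // d)
    return h


def mobius(n):
    """Moebius function by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # square factor
            result = -result
        p += 1
    return -result if n > 1 else result


def phi(n):
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)


def tau(n):
    return sum(1 for d in range(1, n + 1) if n % d == 0)


def sigma(n):
    return sum(d for d in range(1, n + 1) if n % d == 0)


LIMIT = 200
one = lambda n: 1
ident = lambda n: n

mu_star_one = dirichlet_convolution(mobius, one, LIMIT)    # coefficients of (1/zeta) * zeta
mu_star_id = dirichlet_convolution(mobius, ident, LIMIT)   # coefficients of zeta(s-1)/zeta(s)
one_star_one = dirichlet_convolution(one, one, LIMIT)      # coefficients of zeta(s)^2
one_star_id = dirichlet_convolution(one, ident, LIMIT)     # coefficients of zeta(s)zeta(s-1)
print(mu_star_one[1:11], mu_star_id[1:11])
```

Here `ident` is the coefficient sequence of $\zeta(s-1) = \sum n / n^s$, so parts (c) and (e) correspond to $\mu * \mathrm{id} = \phi$ and $1 * \mathrm{id} = \sigma$.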

15.3. Another Proof of the Möbius Inversion Formula

Next, let's give another proof of the Möbius inversion formula using Dirichlet series.

Proof. Let $f$ be an arithmetic function and $F(n) = \sum_{d \mid n} f(d)$. Then
$$\zeta(s) \sum_{n=1}^{\infty} \frac{f(n)}{n^s} = \sum_{n=1}^{\infty} \frac{F(n)}{n^s},$$
on the half-plane of convergence. Thus, on this half-plane
$$\sum_{n=1}^{\infty} \frac{f(n)}{n^s} = \frac{1}{\zeta(s)} \sum_{n=1}^{\infty} \frac{F(n)}{n^s} = \sum_{n=1}^{\infty} \frac{\mu(n)}{n^s} \sum_{n=1}^{\infty} \frac{F(n)}{n^s} = \sum_{n=1}^{\infty} \frac{c_n}{n^s},$$
where $c_n = \sum_{d \mid n} \mu(d) F(n/d)$. Thus, by the Uniqueness Theorem,
$$f(n) = \sum_{d \mid n} F(d)\, \mu\!\left(\frac{n}{d}\right).$$
If the Dirichlet series does not converge absolutely at any point then one can proceed by truncating the series and considering $\sum_{n=1}^{N} \frac{f(n)}{n^s}$. Repeating the above argument,
$$\zeta(s) \sum_{n=1}^{N} \frac{f(n)}{n^s} = \sum_{n=1}^{\infty} \frac{F_N(n)}{n^s},$$
where $F_N(n) = \sum_{d \mid n,\ d \le N} f(d)$. In particular $F_N(n) = F(n)$ for $n \le N$. □
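The inversion formula can also be tested numerically on an arbitrary arithmetic function. In the sketch below the test function $f(n) = n^2 - 3n + 7$ and the range $1 \le n \le 100$ are arbitrary choices made for illustration, and all helper names are hypothetical.

```python
def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]


def mobius(n):
    """Moebius function by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result


def f(n):
    # Arbitrary arithmetic function used only as test data.
    return n * n - 3 * n + 7


def F(n):
    # Summatory function F(n) = sum over d | n of f(d).
    return sum(f(d) for d in divisors(n))


def invert(n):
    # Moebius inversion: sum over d | n of F(d) * mu(n / d).
    return sum(F(d) * mobius(n // d) for d in divisors(n))


recovered = [invert(n) for n in range(1, 101)]
print(recovered[:10])
```

No multiplicativity of $f$ is needed, which matches the generality of the statement proved above.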


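15.4. Product Formula for Dirichlet Series

Theorem 15.4.1. (i) Suppose that $f(n)$ is a multiplicative function such that $\sum_{n=1}^{\infty} \frac{f(n)}{n^s}$ converges absolutely on $\sigma > \sigma_a$. Then for $s$ with $\sigma > \sigma_a$,
$$\sum_{n=1}^{\infty} \frac{f(n)}{n^s} = \prod_p \left( 1 + \frac{f(p)}{p^s} + \frac{f(p^2)}{p^{2s}} + \cdots \right).$$
Here, $\prod_p$ denotes an infinite product over the set of primes.
(ii) If in addition $f$ is totally multiplicative then for $s$ with $\sigma > \sigma_a$,
$$\sum_{n=1}^{\infty} \frac{f(n)}{n^s} = \prod_p \left( 1 - \frac{f(p)}{p^s} \right)^{-1}.$$
This infinite product over primes is sometimes called an Euler product. In particular, we have a product formula for the Riemann zeta function: For $\sigma > 1$ we have

(15.1) $\quad \zeta(s) = \prod_p \left( 1 - \frac{1}{p^s} \right)^{-1}.$

Proof. Let $s$ be such that $\sigma > \sigma_a$, and write $F(s) = \sum_{n=1}^{\infty} \frac{f(n)}{n^s}$. For positive $x \in \mathbb{R}$ let
$$G(x) = \prod_{p \le x} \left( 1 + \frac{f(p)}{p^s} + \frac{f(p^2)}{p^{2s}} + \cdots \right) = \sum_{n \in S_x} \frac{f(n)}{n^s},$$
where $S_x$ is the set of positive integers all of whose prime factors are $\le x$. Then
$$|G(x) - F(s)| = \left| \sum_{n \notin S_x} \frac{f(n)}{n^s} \right| \le \sum_{n \notin S_x} \frac{|f(n)|}{n^\sigma} \le \sum_{n \ge x} \frac{|f(n)|}{n^\sigma},$$
the tail of a convergent series. Thus $G(x) \to F(s)$ as $x \to \infty$. □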
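The proof also gives a quantitative statement that is easy to observe numerically: for $f \equiv 1$ and real $s = 2$, the truncated Euler product $G(x)$ undershoots $\zeta(2)$ by at most $\sum_{n > x} n^{-2} < 1/x$. The sketch below checks this for $x = 10^4$, taking the classical value $\zeta(2) = \pi^2/6$ as known; the function names and the sieve are illustrative choices, not part of the notes.

```python
from math import isqrt, pi


def primes_up_to(x: int) -> list[int]:
    """Sieve of Eratosthenes."""
    sieve = bytearray([1]) * (x + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, isqrt(x) + 1):
        if sieve[p]:
            # clear p*p, p*p + p, ... (slice lengths must match exactly)
            sieve[p * p :: p] = bytearray(len(range(p * p, x + 1, p)))
    return [n for n in range(2, x + 1) if sieve[n]]


def euler_product_zeta(s: float, x: int) -> float:
    """Truncated Euler product over primes p <= x of (1 - p^(-s))^(-1)."""
    product = 1.0
    for p in primes_up_to(x):
        product *= 1.0 / (1.0 - p ** (-s))
    return product


x = 10_000
approx = euler_product_zeta(2.0, x)
exact = pi ** 2 / 6
print(approx, exact, exact - approx)
```

The difference `exact - approx` is positive because every omitted factor exceeds 1, and it is bounded by the tail estimate from the proof.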

15.5. Analytic properties of Dirichlet series

This section is for students familiar with the basic notions from complex analysis. Recall, a complex valued function $f(s)$ of a complex variable $s$ is called analytic on an open region in the complex plane if its derivative exists at every point in the region.

Theorem 15.5.1. Let $F(s) := \sum_{n=1}^{\infty} \frac{a_n}{n^s}$ be a Dirichlet series with abscissa of absolute convergence $\sigma_a \in \mathbb{R}$.
a) The series for $F(s)$ converges uniformly on any half-plane of the type $\sigma \ge \sigma_a + \varepsilon$, with $\varepsilon > 0$. In particular, $F(s)$ is a continuous function on the half-plane $\sigma > \sigma_a$.
b) $F(s)$ is analytic on the half-plane $\sigma > \sigma_a$, with $F'(s) = -\sum_{n=1}^{\infty} \frac{a_n \log n}{n^s}$.

Proof. a) Fix $\varepsilon > 0$, and let $\sigma_0 = \sigma_a + \varepsilon$. Put $M_n = |a_n|/n^{\sigma_0}$. Then for any $s = \sigma + it$ in the half-plane $\sigma \ge \sigma_0$, we have $|a_n/n^s| \le M_n$. Since the series $F(s)$ converges absolutely at $\sigma_0$, we have $\sum_{n=1}^{\infty} M_n < \infty$. Clearly, the terms of the series are continuous functions on the half-plane $\sigma \ge \sigma_0$. Thus, by the Weierstrass $M$-test, the series converges uniformly to a continuous function.
b) First note that $\frac{d}{ds} \frac{a_n}{n^s} = -a_n \frac{\log n}{n^s}$. Suppose that $s$ is in the half-plane $\sigma \ge \sigma_0$. Let $\sigma_1$ be any number with $\sigma_a < \sigma_1 < \sigma_0$. Then for $n$ sufficiently large we have


$|a_n \log n / n^s| \le |a_n|/n^{\sigma_1}$. Thus the series for the derivative converges uniformly on the half-plane $\sigma \ge \sigma_0$. Thus $F'(s) = -\sum_{n=1}^{\infty} \frac{a_n \log n}{n^s}$ on the half-plane of absolute convergence. □

With further work one can show that the Dirichlet series in fact represents an analytic function on the half-plane $\sigma > \sigma_c$, where $\sigma_c$ is the abscissa of convergence.

Theorem 15.5.2. If $F(s) := \sum_{n=1}^{\infty} \frac{a_n}{n^s}$ is a Dirichlet series converging at a given point $s_0 = \sigma_0 + it_0$, then it also converges at all $s$ on the half-plane $\sigma > \sigma_0$. We define the abscissa of convergence to be
$$\sigma_c := \inf\{\sigma : F(s) \text{ converges at some point of the type } s = \sigma + it\},$$
putting $\sigma_c = -\infty$ if it converges everywhere. Moreover, $F(s)$ represents an analytic function on the half-plane $\sigma > \sigma_c$.

Proof. Let $s = \sigma + it$ with $\sigma > \sigma_0$. Then
$$\sum_{n=M}^{N} \frac{a_n}{n^s} = \sum_{n=M}^{N} \frac{a_n}{n^{s_0}} \frac{1}{n^{s - s_0}} = \sum_{n=M}^{N} (c_n - c_{n-1}) \frac{1}{n^{s - s_0}},$$
where
$$c_n := \sum_{k=1}^{n} \frac{a_k}{k^{s_0}}.$$
We are given that $\sum_{n=1}^{\infty} \frac{a_n}{n^{s_0}}$ converges. Thus $\lim_{n \to \infty} c_n$ exists, so there exists a constant $C$ such that $|c_n| \le C$ for all $n$. By partial summation we then get
$$\sum_{n=M}^{N} \frac{a_n}{n^s} = \sum_{n=M}^{N-1} c_n \left( \frac{1}{n^{s - s_0}} - \frac{1}{(n+1)^{s - s_0}} \right) + \frac{c_N}{N^{s - s_0}} - \frac{c_{M-1}}{M^{s - s_0}} = \sum_{n=M}^{N-1} c_n (s - s_0) \int_n^{n+1} \frac{du}{u^{s - s_0 + 1}} + \frac{c_N}{N^{s - s_0}} - \frac{c_{M-1}}{M^{s - s_0}}.$$
Thus
$$\left| \sum_{n=M}^{N} \frac{a_n}{n^s} \right| \le \sum_{n=M}^{N-1} |c_n| |s - s_0| \int_n^{n+1} \frac{du}{u^{\sigma - \sigma_0 + 1}} + \frac{|c_N|}{N^{\sigma - \sigma_0}} + \frac{|c_{M-1}|}{M^{\sigma - \sigma_0}} \le C |s - s_0| \int_M^N \frac{du}{u^{\sigma - \sigma_0 + 1}} + \frac{2C}{M^{\sigma - \sigma_0}} \le \frac{C |s - s_0|}{\sigma - \sigma_0} \frac{1}{M^{\sigma - \sigma_0}} + \frac{2C}{M^{\sigma - \sigma_0}}.$$
The latter bound is uniform in $N$ and tends to $0$ as $M \to \infty$. Therefore, by the Cauchy criterion, the series $\sum_{n=1}^{\infty} \frac{a_n}{n^s}$ converges, and
$$\left| \sum_{n=M}^{\infty} \frac{a_n}{n^s} \right| \le \frac{C |s - s_0|}{\sigma - \sigma_0} \frac{1}{M^{\sigma - \sigma_0}} + \frac{2C}{M^{\sigma - \sigma_0}}.$$
Moreover, the convergence is uniform on compact subsets of the half-plane $\sigma > \sigma_c$, and so the series converges to an analytic function on this half-plane. □


Homework 15.5.1. Let $\zeta_2(s) = \sum_{n=1}^{\infty} \frac{(-1)^n}{n^s}$. Prove the following.
i) $\sigma_c = 0$, $\sigma_a = 1$.
ii) $\zeta_2(s)$ is analytic on the half-plane $\sigma > 0$.
iii) $\zeta(s) = \dfrac{-\zeta_2(s)}{1 - \frac{1}{2^{s-1}}}$. This yields an analytic continuation of $\zeta(s)$ to the half-plane $\sigma > 0$. It also shows that on this half-plane $\zeta(s)$ has only one pole, and that it occurs at $s = 1$.

Theorem 15.5.3. Let $\sigma_c$ be the abscissa of convergence of $F(s) = \sum_{n=1}^{\infty} \frac{a_n}{n^s}$. If $\sum_{n \le x} a_n = O(x^k)$ for some $k \in \mathbb{R}$, as $x \to \infty$, then $\sigma_c \le k$.

Proof. Let $A_n := \sum_{j=1}^{n} a_j$. Then, by the hypothesis $\sum_{n \le x} a_n = O(x^k)$, there exists a constant $C > 0$ such that $|A_n| \le C n^k$ for all $n \ge 1$. Then
$$\sum_{n=M}^{N} \frac{a_n}{n^s} = \sum_{n=M}^{N} (A_n - A_{n-1}) \frac{1}{n^s} = \sum_{n=M}^{N-1} A_n \left( \frac{1}{n^s} - \frac{1}{(n+1)^s} \right) - \frac{A_{M-1}}{M^s} + \frac{A_N}{N^s} = \sum_{n=M}^{N-1} A_n\, s \int_n^{n+1} \frac{du}{u^{s+1}} - \frac{A_{M-1}}{M^s} + \frac{A_N}{N^s}.$$

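Thus, for $\sigma > k$ we have
$$\left| \sum_{n=M}^{N} \frac{a_n}{n^s} \right| \le C |s| \sum_{n=M}^{N} n^k \int_n^{n+1} \frac{du}{u^{\sigma + 1}} + \frac{C M^k}{M^\sigma} + \frac{C N^k}{N^\sigma} \le C |s| \sum_{n=M}^{\infty} n^k \cdot \frac{1}{n^{\sigma + 1}} + 2 C M^{k - \sigma}.$$
If $\sigma > k$, the right-hand side tends to zero as $M \to \infty$. Thus, the series converges. □

Theorem 15.5.4. Let $\sigma_a$ be the abscissa of absolute convergence for the series $F(s) := \sum_{n=1}^{\infty} \frac{a_n}{n^s}$. If

(15.2) $\quad \sum_{n \le x} |a_n| \ll x^c,$

as $x \to \infty$, then $\sigma_a \le c$. Conversely, if $c > \max\{0, \sigma_a\}$ then (15.2) holds as $x \to \infty$.

Proof. Suppose that (15.2) holds for a given $c$. Then there exists a constant $C_1$ such that for any positive integer $N$
$$\sum_{n=N}^{2N} \left| \frac{a_n}{n^s} \right| \le \frac{1}{N^\sigma} \sum_{n=N}^{2N} |a_n| \le \frac{C_1 (2N)^c}{N^\sigma} = \frac{C_2}{N^{\sigma - c}},$$
where $C_2 = 2^c C_1$. Thus,
$$\sum_{n=1}^{\infty} \left| \frac{a_n}{n^s} \right| = \sum_{k=0}^{\infty} \left( \sum_{n=2^k}^{2^{k+1} - 1} \frac{|a_n|}{|n^s|} \right) \le \sum_{k=0}^{\infty} \frac{C_2}{2^{k(\sigma - c)}} = C_2 \sum_{k=0}^{\infty} \frac{1}{2^{k(\sigma - c)}}.$$
Hence, if $\sigma > c$, the series converges absolutely, that is, $\sigma_a \le c$. Conversely, suppose that $c > \max\{0, \sigma_a\}$. Then for $x > 1$,
$$\sum_{n \le x} |a_n| \le \sum_{n \le x} |a_n| \frac{x^c}{n^c} = x^c \sum_{n \le x} \frac{|a_n|}{n^c} \le x^c \sum_{n=1}^{\infty} \frac{|a_n|}{n^c} \ll x^c.$$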


We note that the first inequality requires c > 0, while the last one uses the absolute convergence of the series at s = c. □

15.6. The Riemann Zeta Function and the Riemann Hypothesis

The series $\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}$ defines an analytic function on the half-plane σ > 1. This function can be extended to a meromorphic function on all of C having just one pole, a simple pole at s = 1. To do this, we first extend it to a meromorphic function on the half-plane σ > 0. We already did this in a homework problem above by making use of the alternating series $\zeta_2(s)$. Another way to do this is to utilize the Euler-Maclaurin summation formula:
\[
\sum_{n=1}^{N} f(n) = \int_1^N f(t)\,dt + \frac{1}{2}\bigl(f(1) + f(N)\bigr) + \int_1^N f'(t)\,s(t)\,dt, \tag{15.3}
\]
where $s(t) := t - [t] - \frac{1}{2}$. Letting $f(t) = t^{-s}$, we derive
\[
\sum_{n=1}^{N} \frac{1}{n^s} = \left[ \frac{t^{-s+1}}{1-s} \right]_1^N + \frac{1}{2}\left( 1 + \frac{1}{N^s} \right) - s \int_1^N \frac{s(t)}{t^{s+1}}\,dt.
\]
Thus for σ > 1, letting N → ∞,
\[
\zeta(s) = \frac{1}{s-1} + \frac{1}{2} - s \int_1^{\infty} \frac{s(t)}{t^{s+1}}\,dt,
\]
and we see that ζ(s) has a simple pole at s = 1. Now, the integral converges absolutely for σ > 0 and thus this formula for ζ(s) yields an analytic extension of ζ(s) to the half-plane σ > 0.

The zeta function can be extended to the entire complex plane by making use of the functional equation (which we will not prove here),
\[
\zeta(1-s) = 2(2\pi)^{-s}\,\Gamma(s) \cos\!\left(\frac{\pi s}{2}\right) \zeta(s),
\]
where Γ(s) is the gamma function, a meromorphic function on C, defined for σ > 0 by $\Gamma(s) = \int_0^{\infty} x^{s-1} e^{-x}\,dx$. Γ is analytic on C except for simple poles at s = −n, n = 0, 1, 2, 3, . . . , with residues $(-1)^n/n!$. We also have
\[
\frac{1}{\Gamma(s)} = s e^{\gamma s} \prod_{n=1}^{\infty} \left( 1 + \frac{s}{n} \right) e^{-s/n},
\]
for all s ∈ C. Thus Γ(s) is never 0. Since $\zeta(s) = \prod_p \left(1 - \frac{1}{p^s}\right)^{-1}$ for σ > 1, it follows that ζ(s) ≠ 0 for σ > 1; this requires proof.

Homework 15.6.1. i) Deduce from the product formulation of the zeta function that ζ(s) is nonzero for σ > 1. ii) Does the series $\sum_{n=1}^{\infty} \frac{1}{n^{1+it}}$ converge or diverge for a fixed t ≠ 0?

From the functional equation one can see that ζ(s) has zeros at s = −2, −4, −6, . . . , called the trivial zeros, and that these are the only zeros of ζ(s) outside of the critical strip 0 ≤ σ ≤ 1. This leaves the question of the zeros of the zeta function in the critical strip.

Riemann Hypothesis: All zeros of ζ(s) in the critical strip 0 ≤ σ ≤ 1 are on the vertical line σ = 1/2.
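As a numerical sanity check of the extension to σ > 0, here is a minimal Python sketch. It assumes real s > 0 with s ≠ 1 and simply drops the tail integral $s\int_N^\infty s(t)\,t^{-s-1}\,dt$ that remains when the Euler-Maclaurin identity for the partial sum is subtracted from the formula for ζ(s). The function name `zeta_approx` and the cutoff `N` are illustrative choices, not part of the notes.

```python
import math

def zeta_approx(s, N=10_000):
    """Approximate zeta(s) for real s > 0, s != 1.

    Subtracting the Euler-Maclaurin identity for sum_{n<=N} n^(-s) from
    the formula for zeta(s) leaves
        zeta(s) = sum_{n<=N} n^(-s) + N^(1-s)/(s-1) - N^(-s)/2 - s*(tail integral),
    and this sketch drops the tail integral, which is small for large N.
    """
    partial_sum = sum(n ** (-s) for n in range(1, N + 1))
    return partial_sum + N ** (1 - s) / (s - 1) - 0.5 * N ** (-s)

# s = 2 lies in the region of convergence; compare with pi^2 / 6.
print(zeta_approx(2.0), math.pi ** 2 / 6)

# s = 1/2 lies inside the critical strip, where the series itself diverges.
print(zeta_approx(0.5))
```

The same expression gives sensible values on both sides of s = 1, which is the point of the continuation: the divergence of the partial sums for 0 < s < 1 is cancelled by the term $N^{1-s}/(s-1)$.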


It has been shown (X. Gourdon (2004) and Patrick Demichel) that the first $10^{13}$ zeros of the zeta function in the critical strip are on the line σ = 1/2 (and have $t < 10^{24}$). Moreover, they are simple.

The Riemann Hypothesis is equivalent to the statement that 1/ζ(s) can be extended to an analytic function on the half-plane σ > 1/2. But $1/\zeta(s) = \sum_{n=1}^{\infty} \frac{\mu(n)}{n^s}$. It follows that the Riemann Hypothesis is equivalent to

Conjecture: For any ε > 0 there is a constant c(ε) such that
\[
\left| \sum_{n \le x} \mu(n) \right| \le c(\varepsilon)\, x^{\frac{1}{2} + \varepsilon}. \tag{15.4}
\]

We can deduce one direction of this equivalence from Theorem 15.5.3. Suppose that the inequality in (15.4) holds. Then from the theorem we deduce that the abscissa of convergence of the series for 1/ζ satisfies $\sigma_c \le \frac{1}{2} + \varepsilon$. In particular, ζ(s) can have no zero on the half-plane $\sigma > \frac{1}{2} + \varepsilon$. Since this holds for arbitrary ε, we conclude that ζ(s) has no zero on the half-plane σ > 1/2. From the functional equation we deduce that it has no zero on the strip 0 ≤ σ < 1/2 as well. Thus the only zeros it can have in the critical strip are on the line σ = 1/2.

Mertens conjectured that $\left| \sum_{n=1}^{x} \mu(n) \right| < \sqrt{x}$, but this has been shown to be false (1985) for some $x < e^{3.3 \cdot 10^{64}}$.

15.7. More on the zeta function

Recall the definition of the Von Mangoldt function
\[
\Lambda(n) = \begin{cases} \log p, & \text{if } n = p^k \text{ for some prime } p \text{ and } k \in \mathbb{N}; \\ 0, & \text{otherwise.} \end{cases} \tag{15.5}
\]

Theorem 15.7.1. The following expansions are valid for σ > 1:
a) $-\zeta'(s) = \sum_{n=1}^{\infty} \frac{\log n}{n^s}$.
b) $\log(\zeta(s)) = \sum_{n=2}^{\infty} \frac{\Lambda(n)}{\log n} \frac{1}{n^s}$.
c) $-\frac{\zeta'(s)}{\zeta(s)} = \sum_{n=1}^{\infty} \frac{\Lambda(n)}{n^s}$.

Proof. a) Trivial. b) From $\zeta(s) = \prod_p \left(1 - \frac{1}{p^s}\right)^{-1}$ we have
\[
\log(\zeta(s)) = -\sum_p \log\left( 1 - \frac{1}{p^s} \right) = \sum_p \sum_{k=1}^{\infty} \frac{1}{k p^{sk}} = \sum_p \sum_{k=1}^{\infty} \frac{1}{k (p^k)^s}.
\]
On the other hand
\[
\frac{\Lambda(n)}{\log n} \frac{1}{n^s} = \begin{cases} \frac{1}{k} \frac{1}{p^{ks}}, & \text{if } n = p^k; \\ 0, & \text{otherwise.} \end{cases}
\]

Thus, by rearranging the terms we obtain (b). Part (c) is immediate on taking the derivative of the series in (b). □

Note 15.7.1. We also have
\[
-\frac{\zeta'(s)}{\zeta(s)} = \sum_{n=1}^{\infty} \frac{\log n}{n^s} \cdot \sum_{n=1}^{\infty} \frac{\mu(n)}{n^s}.
\]
Thus
\[
\Lambda(n) = \sum_{d \mid n} \mu(d) \log(n/d),
\]
and from $-\zeta'(s) = \zeta(s) \sum_{n=1}^{\infty} \frac{\Lambda(n)}{n^s}$ we get
\[
\log n = \sum_{d \mid n} \Lambda(d).
\]

We define $\psi(x) = \sum_{n \le x} \Lambda(n)$. The prime number theorem is equivalent to the statement ψ(x) ∼ x. Suppose that $\psi(x) = x + O(x^\delta)$, that is, $\sum_{n \le x} (\Lambda(n) - 1) \ll x^\delta$. We claim that δ ≥ 1/2. Define
\[
F(s) = \sum_{n=1}^{\infty} \frac{\Lambda(n) - 1}{n^s} = -\frac{\zeta'(s)}{\zeta(s)} - \zeta(s),
\]
for σ > 1. By Theorem 15.5.3 the abscissa of convergence for F(s) satisfies $\sigma_c \le \delta$. Thus F(s) is analytic on the half-plane σ > δ. Now ζ(s) is analytic on σ > 0 except for a simple pole at s = 1. Thus $-\frac{\zeta'(s)}{\zeta(s)} = F(s) + \zeta(s)$ is analytic on the half-plane σ > δ except for a pole at s = 1. In particular, ζ(s) ≠ 0 on the half-plane σ > δ. Since we know that ζ has zeros on the line σ = 1/2, we can't have δ < 1/2. In fact, the Riemann Hypothesis implies that
\[
\psi(x) = x + O\!\left(x^{\frac{1}{2}} \log^2(x)\right).
\]
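The functions Λ and ψ are easy to tabulate, which makes the identity $\log n = \sum_{d \mid n} \Lambda(d)$ and the statement ψ(x) ∼ x concrete. The following Python sketch is only an illustration under simple assumptions: the helper names `von_mangoldt_table` and `psi` are made up here, and a basic sieve is used rather than anything optimized.

```python
import math

def von_mangoldt_table(limit):
    """Return a list lam with lam[n] = Lambda(n) for 0 <= n <= limit."""
    lam = [0.0] * (limit + 1)
    is_prime = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if is_prime[p]:
            # Cross out the proper multiples of the prime p.
            for m in range(2 * p, limit + 1, p):
                is_prime[m] = False
            # Lambda(p^k) = log p for every prime power p^k <= limit.
            q = p
            while q <= limit:
                lam[q] = math.log(p)
                q *= p
    return lam

def psi(x, lam):
    """Chebyshev function psi(x) = sum of Lambda(n) over n <= x."""
    return sum(lam[1:int(x) + 1])

lam = von_mangoldt_table(10_000)

# Check log n = sum over divisors d of n of Lambda(d) on a few values.
for n in (12, 97, 360, 1024):
    divisor_sum = sum(lam[d] for d in range(1, n + 1) if n % d == 0)
    assert abs(divisor_sum - math.log(n)) < 1e-9

# Compare psi(x) with x; the ratio should be close to 1.
for x in (100, 1000, 10_000):
    print(x, psi(x, lam), psi(x, lam) / x)
```

A finite table can of course only illustrate ψ(x) ∼ x; it says nothing about the size of the error term.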

APPENDIX A

Preliminaries

Definition A.0.1. A statement is a sentence that can be assigned a truth value. (In general there is a subject, verb and object in the statement.)

Example A.0.1. Suppose that x is a given real number. The following are statements, that is, we can definitively assert whether A, B or C is true or false:

A : “x² = 4.”    B : “x = 2.”    C : “x = ±2.”

The latter statement is read, x equals plus or minus 2. For example, if x = −2 then statement A is true, statement B is false and statement C is true. Note that these statements are complete sentences. In statement A, the subject is “x²”, the verb is “=” and the object is “4”.

If A and B are statements, A ⇒ B means A implies B, that is, if A is true then B is true. A ⇔ B means A is equivalent to B, that is, A is true if and only if B is true.

Example A.0.2. Which of the following are true statements?
1. If x² = 4 then x = 2.
2. If x² = 4 then x = ±2.
3. If x = 2 then x² = 4.
4. x² = 4 ⇔ x = ±2.
If you answered false, true, true, true to the four statements above, then you are probably thinking correctly, but note the truth value actually depends on an implicit assumption about what type of object x is, such as x is an integer or x is a real number. If our implicit assumption is that x is a natural number, then the first statement is true. If x ∈ $Z_4$, a ring we will see later in the semester, then statement 4 is false.

Note A.0.2. The symbols ⇒ and ⇔ are used between statements. The symbol = is used between objects (numbers, functions, sets, etc.). Be careful in making this distinction whenever you write a proof.

Definition A.0.2. Let A, B be given sets. A function f : A → B (pronounced, a function f from A to B), is a rule that assigns to each element x ∈ A a unique element f (x) ∈ B. The set A is called the domain of f and the set B, the co-domain of f . The range of f , denoted f (A), is the set of all output values, f (A) := {f (x) : x ∈ A}. The range is a subset of the codomain.

Definition A.0.3. The cartesian product of two sets A, B, denoted A × B, is the set of all ordered pairs (x, y) with x ∈ A, y ∈ B. That is, A × B = {(x, y) : x ∈ A, y ∈ B}.

Example A.0.3. Z × Z is the set of all ordered pairs of integers, Z × Z = {(x, y) : x, y ∈ Z}.

Definition A.0.4. 1) A binary operation ⊕ on Z is a function ⊕ : Z × Z → Z, that assigns to each ordered pair (a, b) of integers a unique integer denoted a ⊕ b. 2) It is called commutative if a ⊕ b = b ⊕ a for all a, b ∈ Z. 3) It is called associative if a ⊕ (b ⊕ c) = (a ⊕ b) ⊕ c for all a, b, c ∈ Z. 4) An element e ∈ Z is called an identity element with respect to ⊕ if a ⊕ e = a and e ⊕ a = a for all integers a.

Example A.0.4. Ordinary addition and multiplication are binary operations on Z; so is subtraction. Division fails. Why? Because for a, b ∈ Z, a ÷ b in general is not an integer. All we need is one counterexample to show a given formula is not a binary operation. So we could just say 1 ÷ 2 ∉ Z, so division is not a binary operation. Addition and multiplication are both commutative and associative, and both have identities: 0 is the additive identity, and 1 is the multiplicative identity.

Example A.0.5. Let $a \oplus b := \sqrt{ab}$, for a, b ∈ Z. (Note, the colon after a ⊕ b is used in mathematics to indicate that this is a definition.) Is this a binary operation on Z? No, for example, $1 \oplus 2 = \sqrt{2}$, which is not an integer. To be a binary operation on Z, the output has to be an integer for all possible integer inputs. If this fails for one example, then the operation fails to be a binary operation.

Example A.0.6. Which of the following are binary operations on Z?
a ⊕ b := 3,
a ⊕ b := gcd(a² + 1, b² + 1) (where gcd is the greatest common divisor),
a ⊕ b := b²/a,
a ⊕ b := ±a,
a ⊕ b := $a^b$.
Answer: Just the first two.

Example A.0.7. Let's define an operation by a ⊕ b := 3b for any a, b ∈ Z. (When you read a definition like this, you should keep in mind that the choice of the letters a, b is irrelevant. We could just as well have written x ⊕ y = 3y. The way you should think about the operation is to use words: a ⊕ b is 3 times the second number.)
i) Is this a binary operation? Plainly, for any b ∈ Z, 3b is in Z and it is uniquely defined. Thus ⊕ is a binary operation.
ii) Is this operation commutative? Here we need to test whether a ⊕ b = b ⊕ a for all a, b ∈ Z. By definition a ⊕ b = 3b, while b ⊕ a = 3a. Thus to be commutative we would need 3b = 3a, that is, b = a for any two integers a, b, which is blatantly false. An alternate way to show the operation is not commutative is with a single counterexample: 3 ⊕ 2 = 6, while 2 ⊕ 3 = 9.
iii) Is the operation associative? (1 ⊕ 2) ⊕ 3 = 6 ⊕ 3 = 9, while 1 ⊕ (2 ⊕ 3) = 1 ⊕ 9 = 27. Thus we have a counterexample, so the operation is not associative.
iv) Is there an identity element? Suppose that e is an identity element. Then e ⊕ a = a and a ⊕ e = a for all a ∈ Z. Thus, 3a = a and 3e = a for all a ∈ Z. Both of these statements are absurd. The first implies that 3 = 1, a contradiction, while the second implies that e = a/3 for all a, a contradiction. (All we would need is for one of these two statements to be false.)
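The counterexample hunt in Example A.0.7 can be mechanized. The sketch below is a hypothetical helper, not part of the notes: it searches a small window of integers (an arbitrary choice) for counterexamples to commutativity and associativity, and lists identity candidates within that window. Finding a counterexample settles the question; finding none in a finite window proves nothing and only suggests what to try to prove.

```python
from itertools import product

SAMPLE = range(-5, 6)  # a small, arbitrary window of integers to search

def op(a, b):
    """The operation of Example A.0.7: a (+) b := 3b."""
    return 3 * b

def commutativity_counterexample(f, sample=SAMPLE):
    """Return a pair (a, b) with f(a, b) != f(b, a), or None if none is found."""
    for a, b in product(sample, repeat=2):
        if f(a, b) != f(b, a):
            return (a, b)
    return None

def associativity_counterexample(f, sample=SAMPLE):
    """Return (a, b, c) with f(f(a, b), c) != f(a, f(b, c)), or None."""
    for a, b, c in product(sample, repeat=3):
        if f(f(a, b), c) != f(a, f(b, c)):
            return (a, b, c)
    return None

def identity_candidates(f, sample=SAMPLE):
    """Elements e of the sample acting as a two-sided identity on the sample."""
    return [e for e in sample
            if all(f(e, a) == a and f(a, e) == a for a in sample)]

print(commutativity_counterexample(op))   # some pair where the two orders differ
print(associativity_counterexample(op))   # some triple where the groupings differ
print(identity_candidates(op))            # empty: no identity shows up in the sample
```

For comparison, running `identity_candidates` on ordinary addition returns `[0]`, matching the additive identity.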


Definition A.0.5. A subset S of Z is said to be closed under a given binary operation ⊕ (or with respect to ⊕) if for any two a, b ∈ S we have a ⊕ b ∈ S.

Example A.0.8. Let S = {−1, 0, 1}. Is S closed under ordinary addition? We must test all possible sums: −1 + 0 = −1, −1 + 1 = 0, 0 + 1 = 1. So far, it looks like the values we get are always back in the set S. However, if we try 1 + 1 we get 2, a value not in S. Therefore S is not closed under addition. Is S closed under multiplication? This time the answer is yes. The product of any two numbers in S is back in S.

Example A.0.9. Let's define an operation by a ⊕ b := 2a + b, for a, b ∈ Z.
i) Is this a binary operation on Z? Yes, given any two integers a, b the output 2a + b is a uniquely defined integer.
ii) Is this operation commutative? Note that a ⊕ b = 2a + b, but b ⊕ a = 2b + a. Thus a ⊕ b ≠ b ⊕ a in general, for example 1 ⊕ 2 = 4 but 2 ⊕ 1 = 5.
iii) Is the operation associative? a ⊕ (b ⊕ c) = a ⊕ (2b + c) = 2a + (2b + c) = 2a + 2b + c, whereas (a ⊕ b) ⊕ c = (2a + b) ⊕ c = 2(2a + b) + c = 4a + 2b + c. Since 2a + 2b + c ≠ 4a + 2b + c for a ≠ 0 we see that associativity fails.
iv) Is there an identity element? Suppose that e is an identity. Then e ⊕ a = a and a ⊕ e = a for all a ∈ Z. Thus 2e + a = a and 2a + e = a, that is, e = 0 and e = −a for all a ∈ Z. The latter condition clearly fails (e cannot equal −a for all integers a). Therefore, there is no identity.
v) Is the set of odd integers O closed under ⊕? Let's check. Let a, b be odd integers. Then a ⊕ b = 2a + b = even + odd = odd. Thus O is closed.
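Closure of a finite set, as in Example A.0.8, can be checked exhaustively. The sketch below uses a hypothetical helper `is_closed`; for the infinite set of odd integers in Example A.0.9 it can only test parity on a finite window (an arbitrary range chosen here), which illustrates the claim but does not replace the even + odd argument.

```python
from itertools import product
import operator

def is_closed(S, f):
    """True if f(a, b) lies in S for every ordered pair (a, b) of elements of S."""
    S = set(S)
    return all(f(a, b) in S for a, b in product(S, repeat=2))

S = {-1, 0, 1}
print(is_closed(S, operator.add))  # False, because 1 + 1 = 2 is not in S
print(is_closed(S, operator.mul))  # True

def oplus(a, b):
    """The operation of Example A.0.9: a (+) b := 2a + b."""
    return 2 * a + b

# Spot-check on a finite window that odd (+) odd is odd.
odds = [n for n in range(-9, 10) if n % 2 != 0]
print(all(oplus(a, b) % 2 != 0 for a in odds for b in odds))  # True
```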

APPENDIX B

Proof of Additional Properties of Z

In this section we will deduce the Additional Properties of Z listed in Chapter 0 from the axioms. We will provide examples of two styles of proofs. The first is “two-column” style, where the right column provides the justification for each step. The second is “text style”, where the proof is written in paragraph form with complete sentences following all the rules of grammar. In formal mathematical writing one always uses “text style”.

B.0.1. Subtraction-Equality principle. For any integers x, y, x − y = 0 if and only if x = y.

Proof.
x − y = 0,                          assumption
⇔ (x − y) + y = 0 + y,              addition is well defined
⇔ (x + (−y)) + y = 0 + y,           definition of subtraction
⇔ x + (−y + y) = 0 + y,             associative law
⇔ x + 0 = 0 + y,                    additive inverse property
⇔ x = y,                            0 is additive identity

Note that because the statement was an if and only if statement we needed left-right arrows at each step. □

B.0.2. Cancelation Law for Addition. Let a, x, y be integers such that a + x = a + y. Then x = y.

Proof.
a + x = a + y,                      assumption
⇒ −a + (a + x) = −a + (a + y),      addition is well defined
⇒ (−a + a) + x = (−a + a) + y,      associative law
⇒ 0 + x = 0 + y,                    additive inverse property
⇒ x = y,                            0 is additive identity □

Note B.0.3. i) The following is also a version of the cancelation law: If x + a = y + a then x = y. ii) Look at the axioms required to prove the cancelation law. Any algebraic system satisfying those same axioms will also satisfy the cancelation law. “Rings” and “Additive Groups” are both examples of such systems that we will visit this semester.

B.0.3. Every integer has a unique additive inverse.

Proof. (We’ll do this one in text form.) By one of the axioms of Z, we know that every integer has an additive inverse, so our task here is to show that it is unique. Let a be a given integer. Suppose that b, c are additive inverses of a. Then a + b = 0 and a + c = 0. By the transitive law for equality, a + b = a + c. Thus by the cancelation law for addition (which we just proved), b = c. □

B.0.4. Zero Multiplication Property. For any integer n, n · 0 = 0.

Proof. Since 0 is linked with additive properties of Z and this theorem is a multiplicative property, we will need to make use of the one axiom linking addition and multiplication, the distributive law. Now start by writing 0 = 0 + 0. Then use substitution to say n · 0 = n · (0 + 0). We’ll leave the last couple steps for the reader to complete. □

B.0.5. Properties of Negatives. For any integers a, b we have i) −(−a) = a. ii) (−1)a = −a. iii) (−a)b = −(ab) = a(−b). iv) (−a)(−b) = ab.

Proof. i) Since a + (−a) = 0 = (−a) + a by the definition of additive inverse, we see that a is the additive inverse of −a, that is, a = −(−a).

ii) For this part our goal is to show that (−1)a is the additive inverse of a, that is, (−1)a + a = 0. Now,
(−1)a + a = (−1)a + 1(a),           1 is the multiplicative identity
= (−1 + 1)a,                        distributive law
= 0a,                               property of additive inverses
= 0,                                by zero mult property

iii) We have
(−a)b = ((−1)a)b,                   by part (ii)
= (−1)(ab),                         by associativity
= −(ab),                            by part (ii)
The second equality can be proven in the same manner.

iv) We have
(−a)(−b) = −(a(−b)),                by part (iii)
= −(−(ab)),                         by part (iii)
= ab,                               by part (i). □

B.0.6. Basic consequence of Trichotomy. Let a ∈ Z. If a > 0 then −a < 0, and if a < 0 then −a > 0. Proof. Suppose that a > 0 that is, a ∈ N. Then −a ∈ −N and so by definition −a < 0. Next, suppose that a < 0, that is, a ∈ −N. Then a = −c for some c ∈ N. Thus, by a property of negatives, −a = −(−c) = c ∈ N, and so −a > 0. 


B.0.7. Products of Positives and Negatives. i) If a > 0 and b < 0 then ab < 0. ii) If a < 0 and b < 0, then ab > 0.

Proof. i) By the commutative law for multiplication it suffices to treat the case a < 0 and b > 0. Then a = −c for some c > 0, by the definition of a < 0, so ab = (−c)b = −(cb) by a property of negatives. By the Positivity Axiom cb > 0, and thus by the preceding property, −(cb) < 0, that is, ab < 0. ii) Suppose that a < 0 and b < 0. Then a = −c, b = −d for some positive integers c, d. Thus ab = (−c)(−d) = cd by a property of negatives. By the Positivity Axiom, cd > 0, and thus ab > 0. □

B.0.8. Zero divisor property or Integral domain property of Z. If a, b are integers with ab = 0, then a = 0 or b = 0.

Proof. We’ll do a proof by contradiction. Suppose that ab = 0 but a ≠ 0 and b ≠ 0. Then by trichotomy either a is positive or a is negative, and the same for b. If a, b are both positive then by the Positivity Axiom ab is positive, a contradiction. If one of a, b is positive and the other is negative then ab is negative by the preceding property, a contradiction. Finally if both a and b are negative, then ab is positive by the preceding property, a contradiction. Thus, in all cases we are led to a contradiction. Therefore a = 0 or b = 0. □

B.0.9. Cancelation Law for Multiplication. If a, x, y are integers with ax = ay and a ≠ 0, then x = y.

Proof. Since we have only introduced integers at this point, we wish to prove this law without using fractions. Thus we cannot simply divide both sides by a or multiply both sides by 1/a. Instead, we will make use of the subtraction equality principle and the integral domain property of Z. Since ax = ay we have ax − ay = 0 by the subtraction equality principle. Next use the distributive law, the integral domain property of Z, and the subtraction equality principle again. The details are left for your homework. □

Note B.0.4. Be careful in your use of the symbols = and ⇒ when writing a proof. Note, the equal symbol is used between objects (equal numbers, equal sets, equal functions, etc.), whereas the symbols ⇒ and ⇔ are used between statements (remember a statement is a sentence that can be assigned a truth value, true or false).

B.0.10. General Associative-Commutative Law. a) Addition: When adding a collection of n integers $a_1 + a_2 + \cdots + a_n$, the numbers may be grouped in any way and added in any order. In particular, the sum $a_1 + a_2 + \cdots + a_n$ is well defined, that is, no parentheses are necessary to specify the order of operations. b) Multiplication: When multiplying a collection of n integers $a_1 a_2 \cdots a_n$, the numbers may be grouped in any way and multiplied in any order. In particular, the product $a_1 a_2 \cdots a_n$ is well defined, that is, no parentheses are necessary to specify the order of operations.

Note B.0.5. A formal proof of these laws requires a sophisticated use of induction and will not be presented here. Instead, let's just gain some appreciation of what the laws say, since we make extensive use of them in practice.


What does a + b + c + d mean? Remember, addition is a binary operation, that is, you can only add two integers at a time. There are many possible definitions, ((a + b) + c) + d, (a + (b + c)) + d, (a + b) + (c + d), a + ((b + c) + d), a + (b + (c + d)) and so on. The general associative law tells us that all of these expressions are equal, and thus there is no need to include the parentheses at all. For instance, we can see that the first two expressions in the list are plainly equal by one application of the associative law, (a + b) + c = a + (b + c). If we throw in the word “commutative”, then the general associative-commutative law tells us that we can also rearrange the order. Thus for example (d + b) + (a + c) would also equal a + b + c + d. A similar discussion holds for multiplication. We can really appreciate this law when working with rational numbers. For example, try calculating the following in your head: $88 \cdot \frac{9}{17} \cdot \frac{1}{11} \cdot 10 \cdot \frac{1}{8}$. What is the easiest way to do it?

B.0.11. The General Distributive Laws. We will just look at the following one. For any integers a, b, c, d, (a + b)(c + d) = ac + ad + bc + bd.

Proof. We have
(a + b)(c + d) = (a + b)c + (a + b)d,    distributive law
= (ac + bc) + (ad + bd),                 distributive law
= ac + ad + bc + bd,                     general associative-commutative law □

B.0.12. Binomial Square Formula. For any integers a, b we have (a + b)² = a² + 2ab + b².

We have
(a + b)² = (a + b)(a + b),          definition of square
= a² + ba + ab + b²,                FOIL law
= a² + ab + ab + b²,                commutative law for mult
= a² + (ab + ab) + b²,              general associative law
= a² + 2ab + b²,                    definition of 2 times a number.

We shall prove the general binomial expansion formula using induction in Section C.3.3. Note that this expansion formula depends on the commutative property of multiplication.

APPENDIX C

Discreteness Axioms for Z In this appendix we discuss the discreteness axioms for Z. These are the axioms that distinguish the integers from sets such as Q or R, which also satisfy all of the algebraic axioms (associative law, commutative law, distributive law, etc. ) These axioms imply that the integers are discrete objects. In particular, there is no integer between 0 and 1. More generally, for any distinct integers a, b we can say |a−b| ≥ 1. This is an important fact used in many proofs, such as showing that e is an irrational number. For dense (indiscrete) sets such as Q or R, between any two elements of the set there are infinitely many other elements of the set. There is no gap between one rational or real number and the next one, that is to say, the very concept of “next” (relative to the given ordering >) doesn’t exist. a) Well Ordering Property of N. Any nonempty subset of N has a smallest element. b) Axiom of Induction. Let S be a subset of N such that (i) 1 ∈ S and (ii) n ∈ S ⇒ n + 1 ∈ S. Then S = N. c) Maximum Element Principle. Any nonempty subset of integers bounded above has a maximum element. d) Minimum Element Principle. Any nonempty subset of integers bounded below has a minimum element. Other important consequences of the discreteness axioms are the following. 1) Minimality of 1. 1 is the smallest positive integer. 2) Natural Numbers are sums of 1’s. Every positive integer is a (finite) sum of 1’s. That is, N = {1, 2, 3, . . . }, where as usual, 2 := 1 + 1, 3 := 1 + 1 + 1, and so on. 3) Strong Form of Induction. Let S be a subset of N such that (i) 1 ∈ S and (ii) If {1, 2, . . . , n} ⊆ S then n + 1 ∈ S. Then S = N. The reader should check that all of these properties plainly fail if we replace the integers with the set of rational numbers or real numbers. C.1. Equivalence of the Discreteness Axioms In this section we outline a proof of the equivalence of the Discreteness Properties (a), (b), (c) and (d). The details are left as an exercise. 
First, let's observe that 1 is a positive integer (otherwise the axiom of induction would be vacuous).

Lemma C.1.1. 1 is a positive integer, that is, 1 ∈ N.

Proof. We’ll leave the details as an exercise for the reader. Try a proof by contradiction and use the positivity axiom and the property that (−1)(−1) = 1. □

Proof of c ⇔ d. Assume (c). Let S be a nonempty set of integers bounded below by m. Apply (c) to −S. Similarly for the other direction. □

Proof of a ⇔ d. Assume (a). Let S be a nonempty subset of integers bounded below by m. Apply (a) to S − m + 1. The converse is trivial. □

Proof of a ⇒ b. Assume (a). Let S be a subset of N such that (i) 1 ∈ S and (ii) n ∈ S ⇒ n + 1 ∈ S. Let T be the complement of S in N. If T is nonempty then what does (a) imply? Obtain a contradiction. □

Proof of b ⇒ a. Assume (b). Let S be a nonempty subset of N. Suppose that S has no smallest element. Let T be the set of lower bounds for S in N. Note 1 ∈ T and that if x ∈ T then x + 1 ∈ T. Now use (b) to get a contradiction. □

C.2. Proof of Additional Discreteness Properties

Proof of b) ⇒ 2) (natural numbers are sums of 1's). Assume b). Let S be the set of all finite sums of 1’s. Apply (b). □

Proof of b) ⇒ 1) (minimality of 1). Assume b). Let S be the set of natural numbers ≥ 1. Apply (b). □

Proof of b) ⇒ 3) (strong form of induction). Assume b). Let S be a set of positive integers satisfying (i) 1 ∈ S and (ii) If {1, 2, . . . , n} ⊆ S then n + 1 ∈ S. Our goal is to show that S = N. Let T be the set of natural numbers n such that {1, 2, . . . , n} ⊆ S. Then 1 ∈ T and, by assumption (ii), if n ∈ T then n + 1 ∈ S. But then {1, 2, 3, . . . , n, n + 1} ⊆ S, that is, n + 1 ∈ T. Therefore n ∈ T implies n + 1 ∈ T, and so by axiom (b), T = N. Since T ⊆ S, it follows that S = N. □

C.3. Proof by Induction

Induction is a valuable method for proving that a given statement is true for all natural numbers.

Principle of Induction. Let P(n) be a statement involving a natural number n. Suppose that
(i) P(1) is true. (Base Case.)
(ii) If P(n) is true for a given n ∈ N then P(n + 1) is true. (Inductive Step.)
Then P(n) is true for all n ∈ N.

The assumption “P(n) is true for a given n ∈ N” is called the induction assumption.

Note C.3.1. How would you respond to someone who objects to the Principle of Induction by saying “in the induction assumption you are assuming what you wish to prove”, thus invalidating your proof? (Note the subtle distinction. In the induction assumption, although n is arbitrary, we are only assuming P(n) is true for one value of n, not for all integers n.)


C.3.1. Strong Form of Induction. A variation of induction that we will sometimes use is called the Strong Form of Induction, given below. It has the advantage that one is allowed to assume more in the induction assumption. It is used for example to prove the Fundamental Theorem of Arithmetic.

Strong Form of Induction. Let P(n) be a statement involving a natural number n. Suppose that
(i) P(1) is true. (Base Case.)
(ii) If P(k) is true for all k < n, for a given n ∈ N, then P(n) is true.
Then P(n) is true for all n ∈ N.

C.3.2. Examples of Induction Proofs.

Example C.3.1. Prove that for any positive integer n,
\[
1^3 + 2^3 + \cdots + n^3 = \frac{n^2 (n+1)^2}{4}. \tag{C.1}
\]

Proof. Proof by induction. For n = 1 we have $1^3 = \frac{1^2 \cdot 2^2}{4}$, a true statement. Suppose that statement (C.1) is true for a given n. Then for n + 1 we have
\[
1^3 + 2^3 + \cdots + n^3 + (n+1)^3 = (1^3 + 2^3 + \cdots + n^3) + (n+1)^3 = \frac{n^2 (n+1)^2}{4} + (n+1)^3,
\]
by the induction assumption (C.1).

(Let's interrupt the proof with a little motivation. In your formal write-up you do not need to include these comments. Our goal is to establish the truth of (C.1) for n + 1, that is, we are hoping to get $(n+1)^2 (n+2)^2 / 4$. Since this expression is in factored form, we proceed by factoring, rather than expanding.)
\[
= \frac{(n+1)^2}{4} \left[ n^2 + 4(n+1) \right] = \frac{(n+1)^2}{4} \left[ n^2 + 4n + 4 \right] = \frac{(n+1)^2}{4} [n+2]^2 = \frac{(n+1)^2 ((n+1)+1)^2}{4}.
\]

Thus (C.1) holds for n + 1. At this point, there are two ways to conclude the induction proof. You can either say “Thus, by the Principle of Induction, the statement is true for all n ∈ N”, or you can simply write “QED”, which stands for the Latin expression “quod erat demonstrandum” meaning literally “what was to be demonstrated”, but is more liberally taken to mean “thus we have established what we wished to prove”. In this example you should also try restating everything in sigma notation. The statement in this notation would read $\sum_{k=1}^{n} k^3 = \frac{n^2 (n+1)^2}{4}$ for any n ∈ N. □

Example C.3.2. $n^3 - n$ is a multiple of 3 for any positive integer n.

Proof. Proof by induction. For n = 1 we note that $1^3 - 1 = 0 = 0 \cdot 3$, a multiple of 3. Suppose that the statement is true for a given n, that is, $n^3 - n = 3k$ for some k ∈ Z. Then for n + 1 we have
\[
(n+1)^3 - (n+1) = n^3 + 3n^2 + 3n + 1 - n - 1 = (n^3 - n) + 3n^2 + 3n
\]
\[
= 3k + 3n^2 + 3n, \quad \text{by induction assumption,}
\]
\[
= 3(k + n^2 + n) = 3 \cdot \text{integer},
\]
since the integers are closed under addition and multiplication. QED.
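Before attempting an induction proof it is often worth spot-checking the claim for small n, since a failed check saves the effort of trying to prove a false formula. The short Python sketch below does this for Examples C.3.1 and C.3.2; the range 1 to 200 is an arbitrary choice, and passing such a check is evidence only, not a proof.

```python
def sum_of_cubes(n):
    """Compute 1^3 + 2^3 + ... + n^3 directly from the definition."""
    return sum(k ** 3 for k in range(1, n + 1))

for n in range(1, 201):
    # (C.1), with the denominator cleared so that only integers are compared.
    assert 4 * sum_of_cubes(n) == n ** 2 * (n + 1) ** 2
    # Example C.3.2: n^3 - n is a multiple of 3.
    assert (n ** 3 - n) % 3 == 0

print("both statements hold for n = 1, ..., 200")
```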



Example C.3.3. $6^n - 1$ is a multiple of 5 for any positive integer n.

Proof. Proof by induction. For n = 1, $6^n - 1 = 6 - 1 = 5$, a multiple of 5. Suppose that the statement is true for a given n, that is, $6^n - 1 = 5k$ for some integer k. Then for n + 1 we have $6^{n+1} - 1 = 6^n \cdot 6 - 1 = (5k + 1)6 - 1$, by the induction hypothesis. Then, using the distributive law we see that $6^{n+1} - 1 = 30k + 6 - 1 = 30k + 5 = 5(6k + 1)$, a multiple of 5, since 6k + 1 is an integer. Thus the statement is true for n + 1. QED. □

Exercise C.3.1. The word induction is connected to the concept of “inductive reasoning”, a type of reasoning where one looks at data and tries to find a pattern or rule governing the data. Try the following example. Look at the sum of the first n odd numbers for n = 1, 2, 3, 4, 5: 1 = 1, 1 + 3 = 4, 1 + 3 + 5 = 9, 1 + 3 + 5 + 7 = 16, 1 + 3 + 5 + 7 + 9 = 25. What is the pattern? Write down a conjecture for what you think 1 + 3 + 5 + · · · + (2n − 1) equals in general, and then prove it by induction.

Example C.3.4. The Fibonacci sequence $\{F_n\}$ = 1, 1, 2, 3, 5, 8, 13, . . . , is governed by the rule $F_{n+1} = F_n + F_{n-1}$ for n ≥ 2, and the initial values $F_1 = F_2 = 1$. Prove that
\[
F_1 + F_3 + \cdots + F_{2k-1} = F_{2k}, \tag{C.2}
\]
for any k ∈ N.

Proof. Proof by induction on k. For k = 1 we have $F_1 = 1 = F_2$, so the statement is true. Suppose that the statement (C.2) is true for a given k. Then for k + 1 we have
\[
F_1 + F_3 + \cdots + F_{2k-1} + F_{2k+1} = (F_1 + F_3 + \cdots + F_{2k-1}) + F_{2k+1}
\]
\[
= F_{2k} + F_{2k+1}, \quad \text{by the induction hypothesis,}
\]
\[
= F_{2k+2} = F_{2(k+1)},
\]
by the defining property of the Fibonacci sequence. QED.
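The identity (C.2) and the divisibility claim of Example C.3.3 can also be spot-checked numerically. In this sketch the helper `fibonacci` and the padded list `F` are illustrative names, and the ranges tested are arbitrary; the checks complement the induction proofs rather than replace them.

```python
def fibonacci(count):
    """Return [F_1, ..., F_count] with F_1 = F_2 = 1."""
    fibs = [1, 1]
    while len(fibs) < count:
        fibs.append(fibs[-1] + fibs[-2])
    return fibs[:count]

# Pad with a dummy entry at index 0 so that F[n] is F_n.
F = [0] + fibonacci(60)

# (C.2): F_1 + F_3 + ... + F_{2k-1} = F_{2k}.
for k in range(1, 31):
    odd_index_sum = sum(F[2 * j - 1] for j in range(1, k + 1))
    assert odd_index_sum == F[2 * k]

# Example C.3.3: 6^n - 1 is a multiple of 5.
for n in range(1, 50):
    assert (6 ** n - 1) % 5 == 0

print("checks passed")
```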



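C.3.3. Property 11. Binomial Expansion Formula. For any positive integer n and integers a, b we have
\[
(a+b)^n = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k} = a^n + \binom{n}{1} a^{n-1} b + \binom{n}{2} a^{n-2} b^2 + \cdots + \binom{n}{n-1} a b^{n-1} + b^n. \tag{C.3}
\]

Proof. The proof is by induction on n. For n = 1 the statement is trivial, $(a+b)^1 = a + b$. Suppose the statement is true for a given n. Then for n + 1 we have
\[
(a+b)^{n+1} = (a+b)(a+b)^n = (a+b) \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k}
\]
\[
= \sum_{k=0}^{n} \binom{n}{k} a^{k+1} b^{n-k} + \sum_{k=0}^{n} \binom{n}{k} a^k b^{n+1-k}
\]
\[
= \binom{n}{n} a^{n+1} + \sum_{k=0}^{n-1} \binom{n}{k} a^{k+1} b^{n-k} + \binom{n}{0} b^{n+1} + \sum_{k=1}^{n} \binom{n}{k} a^k b^{n+1-k}
\]
\[
= a^{n+1} + b^{n+1} + \sum_{l=1}^{n} \binom{n}{l-1} a^l b^{n+1-l} + \sum_{l=1}^{n} \binom{n}{l} a^l b^{n+1-l}
\]
\[
= a^{n+1} + b^{n+1} + \sum_{l=1}^{n} \left[ \binom{n}{l-1} + \binom{n}{l} \right] a^l b^{n+1-l}
\]
\[
= a^{n+1} + b^{n+1} + \sum_{l=1}^{n} \binom{n+1}{l} a^l b^{n+1-l} = \sum_{l=0}^{n+1} \binom{n+1}{l} a^l b^{n+1-l},
\]
where the last line uses Pascal's identity $\binom{n}{l-1} + \binom{n}{l} = \binom{n+1}{l}$.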
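The induction step rests on Pascal's identity, and both it and the expansion (C.3) are easy to check by machine for small values. The sketch below uses `math.comb` from the Python standard library (available from Python 3.8); the function name `binomial_expansion` and the ranges tested are illustrative assumptions.

```python
from math import comb

def binomial_expansion(a, b, n):
    """Right-hand side of (C.3): sum over k of C(n, k) * a^k * b^(n-k)."""
    return sum(comb(n, k) * a ** k * b ** (n - k) for k in range(n + 1))

# Compare both sides of (C.3) on a small grid of integers.
for n in range(1, 12):
    for a in range(-4, 5):
        for b in range(-4, 5):
            assert binomial_expansion(a, b, n) == (a + b) ** n

# Pascal's identity, the key step in the induction.
for n in range(1, 30):
    for l in range(1, n + 1):
        assert comb(n, l - 1) + comb(n, l) == comb(n + 1, l)

print("checks passed")
```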



APPENDIX D

Review of Groups, Rings and Fields

D.1. Definition of a Ring

Definition D.1.1. A ring is a set R with two binary operations, addition +, and multiplication ·, satisfying the following properties.
(1) R is closed under addition and multiplication. (This is actually implicit in the definition of a binary operation.)
(2) Associative law holds for both addition and multiplication: (a + b) + c = a + (b + c), (ab)c = a(bc), for all a, b, c ∈ R.
(3) Commutative law holds for addition: a + b = b + a for all a, b ∈ R.
(4) Distributive laws hold: a(b + c) = ab + ac, (a + b)c = ac + bc, for all a, b, c ∈ R.
(5) R has a zero element 0 satisfying a + 0 = a = 0 + a for all a ∈ R.
(6) Every element a ∈ R has an additive inverse −a ∈ R satisfying a + (−a) = 0 = (−a) + a.
If R is a ring with commutative multiplication then R is called a commutative ring. If R is a ring with unity element 1 then R is called a ring with unity. (We require 1 ≠ 0, so that R ≠ {0}.) 1 is also called the multiplicative identity.

Example D.1.1. Z, R, Q, $Z_m$ are all rings. Here, $Z_m$ = {0, . . . , m − 1} is the ring of integers (mod m).

Exercise D.1.1. a) Is the set of even integers E := {2n : n ∈ Z} a ring? Answer: Yes, it is a commutative ring with no unity element. b) Is the set of odd integers O := {2n + 1 : n ∈ Z} a ring? Answer: No, it has no zero element and is not closed under addition.

Definition D.1.2. Let R be a given ring. A subset S of R is called a subring of R if S is a ring under the same two binary operations.

Note D.1.1. To show that a subset S is a subring it suffices to verify that S is closed under + and ·, has a zero element, and the additive inverse of each element of S is in S. All other properties are inherited.

Example D.1.2. E is a subring of Z. Z is a subring of Q. Q is a subring of R.

Exercise D.1.2. Show that the subrings of Z are of the form nZ := {nx : x ∈ Z}, with n a fixed integer. Hint: Let S be a nonzero subring of Z. First show, by induction, that for any x ∈ S and q ∈ N, qx ∈ S and −qx ∈ S. Now let n be the smallest positive element of S. Show that S = nZ.


Exercise D.1.3. For any $d \mid m$ we define $d\mathbb{Z}_m = \{0, d, 2d, \ldots, (\frac{m}{d} - 1)d\}$. a) Show that every subring of $\mathbb{Z}_m$ is of the form $d\mathbb{Z}_m$ with $d \mid m$. (The proof is similar to that of the previous problem.) b) Find all subrings of $\mathbb{Z}_{12}$.

D.2. Basic properties of Rings

In the following we repeat the list of further properties of $\mathbb{Z}$ given in Chapter 0. Some of these properties hold true for an arbitrary ring $R$, and some require $R$ to satisfy further properties. Here, $a, b, x, y$ represent arbitrary elements of a ring $R$. We start with a list of those properties that are valid for any ring $R$. The proofs are identical to the proofs given for $\mathbb{Z}$.

D.2.1. Properties Valid in any Ring R.

1] Subtraction-Equality principle: $x = y$ if and only if $x - y = 0$.

2] Cancelation law for addition: If $a + x = a + y$ then $x = y$.

3] Additive inverses are unique, that is, if $a, b, c \in R$ are such that $a + b = 0$ and $a + c = 0$ then $b = c$.

4] Zero multiplication property: $a \cdot 0 = 0$ for any $a \in R$.

5] Properties of negatives: $(-a)b = -(ab) = a(-b)$, $(-a)(-b) = ab$, $(-1)a = -a$.

10a] General Associative-Commutative Law for Addition: When adding a collection of $n$ elements of $R$, $a_1 + a_2 + \cdots + a_n$, the elements may be grouped in any way and added in any order. In particular, the sum $a_1 + a_2 + \cdots + a_n$ is well defined, that is, no parentheses are necessary to specify the order of operations.

11] General Distributive Laws such as the FOIL law: For any $a, b, c, d \in R$, $(a + b)(c + d) = ac + ad + bc + bd$.

D.2.2. Properties Valid in any Commutative Ring.

10b] General Associative-Commutative Law for Multiplication: When multiplying a collection of $n$ elements of $R$, $a_1 a_2 \cdots a_n$, the values may be grouped in any way and multiplied in any order. In particular, the product $a_1 a_2 \cdots a_n$ is well defined, that is, no parentheses are necessary to specify the order of operations.

12] Binomial Expansion: For any $a, b \in R$ and positive integer $n$ we have

$(a + b)^n = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k} = a^n + \binom{n}{1} a^{n-1} b + \binom{n}{2} a^{n-2} b^2 + \cdots + b^n.$

In particular,

$(a + b)^2 = a^2 + 2ab + b^2,$

$(a + b)^3 = a^3 + 3a^2 b + 3ab^2 + b^3.$

D.2.3. Properties Valid in any Integral Domain.

8] Zero divisor property, or integral domain property: If $ab = 0$ then $a = 0$ or $b = 0$.

9] Cancelation law for multiplication: If $ax = ay$ and $a \neq 0$ then $x = y$.

D.3. Units and Zero Divisors

Definition D.3.1. A nonzero element $a \in R$ (where $R$ is a ring) is called a zero divisor if $ab = 0$ or $ba = 0$ for some nonzero $b \in R$.
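Definition D.3.1 can be tested by brute force in a small ring such as $\mathbb{Z}_m$. The following is a minimal Python sketch; the function name `zero_divisors` is illustrative and not part of the notes. It lists the zero divisors straight from the definition and compares the result with the gcd criterion that the notes state later as Theorem D.3.1.

```python
from math import gcd

def zero_divisors(m):
    """Nonzero a in Z_m with a*b = 0 (mod m) for some nonzero b in Z_m."""
    return [a for a in range(1, m)
            if any((a * b) % m == 0 for b in range(1, m))]

for m in (6, 9, 12):
    zd = zero_divisors(m)
    # Compare with the criterion gcd(a, m) > 1 (Theorem D.3.1 ii).
    assert zd == [a for a in range(1, m) if gcd(a, m) > 1]
    print(m, zd)
```

Since $\mathbb{Z}_m$ is commutative, checking $ab = 0$ alone covers both conditions $ab = 0$ and $ba = 0$ of the definition.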

Example D.3.1. 3 is a zero divisor in $\mathbb{Z}_6$ since $3 \cdot 2 = 0$ in $\mathbb{Z}_6$.

Definition D.3.2. Let $R$ be a ring with unity. An element $a \in R$ is called a unit if $a$ has a multiplicative inverse in $R$, that is, $ab = 1 = ba$ for some $b \in R$.

Problem D.3.1. Find all units in $\mathbb{Z}$, $\mathbb{Q}$, $\mathbb{Z}_6$.

Exercise D.3.1. Show that if $a$ is a unit in a ring $R$, then $a$ is not a zero divisor.

Exercise D.3.2. Find all zero divisors and units in $\mathbb{Z}_9$. Answer: The zero divisors are 3, 6. The units are 1, 2, 4, 5, 7, 8.

Theorem D.3.1. Let $a \in \mathbb{Z}_m$, $a \neq 0$. i) $a$ is a unit if and only if $\gcd(a, m) = 1$. ii) $a$ is a zero divisor if and only if $\gcd(a, m) > 1$.

We'll leave the proof as an exercise. In particular, every nonzero element of $\mathbb{Z}_m$ is either a unit or a zero divisor. This is not the case in general for a ring.

Exercise D.3.3. Give an example of a nonzero element of a ring that is neither a unit nor a zero divisor. Answer: In $\mathbb{Z}$ any element other than 0, 1 or $-1$ will work.

D.4. Integral Domains and Fields

Definition D.4.1. An integral domain is a commutative ring with unity having no zero divisors, that is, if $ab = 0$ then either $a = 0$ or $b = 0$.

Example D.4.1. $\mathbb{Z}$, $\mathbb{Q}$, $\mathbb{R}$ are integral domains.

Theorem D.4.1. $\mathbb{Z}_m$ is an integral domain if and only if $m$ is a prime.

Exercise D.4.1. Notice the difference between solving an equation in an integral domain and solving an equation in a ring that is not an integral domain. a) Let $R$ be an integral domain. Find all $x \in R$ such that $x^2 - 3x + 2 = 0$. Justify! b) Solve the equation $(x - 1)^2 = 0$ in $\mathbb{Z}_8$.

Definition D.4.2. A ring $R$ is called a field if (i) $R$ has a unity element, (ii) $R$ is a commutative ring, and (iii) every nonzero element of $R$ is a unit.

Example D.4.2. Standard examples of fields: $\mathbb{Q}$, $\mathbb{R}$, $\mathbb{C}$, and $\mathbb{Z}_p$ where $p$ is a prime. Also $F(x)$, the set of all rational functions $p(x)/q(x)$ with coefficients in a given field $F$.

D.5. Polynomial Rings

Definition D.5.1. Let $R$ be a given ring. The polynomial ring in the variable $x$ over $R$ is the set of all polynomials in $x$ with coefficients in $R$,

$R[x] = \{a_n x^n + \cdots + a_0 : a_i \in R,\ 0 \le i \le n,\ n \ge 0\},$

together with the standard addition and multiplication laws for polynomials.

Note D.5.1. i) If $R$ is a ring with unity then so is $R[x]$. ii) If $R$ is commutative then so is $R[x]$.

Note D.5.2. i) If $p(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_0$ with $a_n \neq 0$, then the degree of $p(x)$ is $n$, the leading term of $p(x)$ is $a_n x^n$, and the leading coefficient of $p(x)$ is $a_n$. ii) If $p(x) = a_n x^n + \cdots + a_0$, $q(x) = b_m x^m + \cdots + b_0$, with $a_n \neq 0$, $b_m \neq 0$, then $p(x)q(x) = a_n b_m x^{m+n} + \cdots + a_0 b_0$. Note that if $a_n$, $b_m$ are not zero divisors then $a_n b_m \neq 0$ and so the degree of $p(x)q(x)$ is $m + n$.

We immediately deduce from the last note the following theorem.

Theorem D.5.1. If $R$ is an integral domain, then so is $R[x]$.

Definition D.5.2. Let $F$ be a field, and $F[x]$ be the set of polynomials with coefficients in $F$. a) If $f(x) \in F[x]$ we call $f(x)$ a polynomial "over" $F$. b) The zero polynomial is the polynomial $f(x) = 0$ (with all coefficients equal to zero). c) Say $f(x) = a_n x^n + \cdots + a_0$ with $a_n \neq 0$. Then $a_n$ is the leading coefficient of $f(x)$, $a_n x^n$ is the leading term, and $n$ is the degree of $f(x)$. d) $f(x)$ is called monic if $a_n = 1$.

Definition D.5.3. Let $F$ be a field. a) A polynomial $f(x)$ over $F$ is called reducible over $F$ if $f(x) = g(x)h(x)$ for some nonconstant polynomials $g(x), h(x)$ over $F$. In particular $1 \le \deg(g), \deg(h) < \deg(f)$. b) A polynomial $f(x)$ over $F$ is called irreducible over $F$ if $\deg(f) \ge 1$ and $f(x)$ is not reducible.

Note D.5.3. There are four types of polynomials in $F[x]$: 1) Zero, 2) Nonzero constant polynomials (these are the units), 3) Reducibles, and 4) Irreducibles. Note the analogy with $\mathbb{Z}$.

Definition D.5.4. Let $f(x), g(x) \in F[x]$. We say that $f(x)$ divides $g(x)$ in $F[x]$, written $f(x) \mid g(x)$, if $f(x)h(x) = g(x)$ for some $h(x) \in F[x]$. In this case $f(x)$ is called a factor or divisor of $g(x)$, etc. (same language as in $\mathbb{Z}$).

Theorem D.5.2. Division Algorithm for Polynomials. Let $F$ be a field and $f(x), g(x) \in F[x]$ with $g(x) \neq 0$. Then there exist polynomials $q(x), r(x)$ over $F$ such that

$f(x) = q(x)g(x) + r(x),$ with $r(x) = 0$ or $\deg(r(x)) < \deg(g(x))$.

The polynomial $q(x)$ is called the quotient and $r(x)$ the remainder.

Proof. Let $g(x) = b_m x^m + \cdots + b_0$ be a fixed polynomial over $F$ with $b_m \neq 0$. We will prove that the theorem is true for all $f(x)$ over $F$ by the strong form of induction on the degree of $f(x)$. Suppose first that $f(x) = a_0$, a constant polynomial. If $g(x) = b_0$ is another constant polynomial, then we let $q(x) = a_0 b_0^{-1}$, $r(x) = 0$. If $g(x)$ has positive degree, then we let $q(x) = 0$, $r(x) = a_0$. Thus the base case has been established.

Suppose now that the theorem is true for all polynomials of degree less than $n$, and let $f(x)$ be a polynomial of degree $n$. Let $f(x) = a_n x^n + \cdots + a_0$. Our goal is to compute $f(x) \div g(x)$ by the method of long division.

Case i: Suppose that $n < m$. Then $f(x) = 0 \cdot g(x) + f(x)$, and so we can simply take $q(x) = 0$ and $r(x) = f(x)$ to satisfy the conclusion of the theorem.

Case ii: Suppose that $n \ge m$. Then we proceed following the method of long division. The first step is to multiply $g(x)$ by an appropriate monomial so that the leading term matches the leading term of $f(x)$. Thus we calculate $a_n b_m^{-1} x^{n-m} g(x)$ and observe that its leading term is $a_n x^n$ (note that $b_m^{-1}$ exists since $F$ is a field). Subtracting this from $f(x)$ gives the polynomial $h(x) := f(x) - a_n b_m^{-1} x^{n-m} g(x)$, which is either zero or of degree strictly less than $n$. Thus by the induction hypothesis $h(x) = q_1(x)g(x) + r_1(x)$ for some $q_1(x), r_1(x)$ over $F$ with $r_1(x) = 0$ or $\deg r_1(x) < m$. Thus

$f(x) = a_n b_m^{-1} x^{n-m} g(x) + h(x) = a_n b_m^{-1} x^{n-m} g(x) + q_1(x)g(x) + r_1(x) = (a_n b_m^{-1} x^{n-m} + q_1(x))g(x) + r_1(x),$

and so we can take $q(x) = a_n b_m^{-1} x^{n-m} + q_1(x)$, $r(x) = r_1(x)$ to satisfy the conditions of the theorem. □
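The induction in the proof is long division written recursively, and it can be unrolled into a loop. The following is a minimal Python sketch over $F = \mathbb{Q}$, using exact `Fraction` arithmetic. The representation (coefficient lists, lowest degree first) and the names `trim` and `poly_divmod` are choices made for this illustration, not notation from the notes.

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zero coefficients; the zero polynomial becomes []."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def poly_divmod(f, g):
    """Return (q, r) with f = q*g + r and r = [] (zero) or deg r < deg g.

    Polynomials are coefficient lists, lowest degree first,
    so [2, -3, 1] stands for x^2 - 3x + 2.
    """
    f = trim([Fraction(c) for c in f])
    g = trim([Fraction(c) for c in g])
    if not g:
        raise ZeroDivisionError("g must be a nonzero polynomial")
    m = len(g) - 1                          # deg g
    q = [Fraction(0)] * max(len(f) - m, 0)
    r = f
    while r and len(r) - 1 >= m:            # Case ii: deg r >= deg g
        n = len(r) - 1
        c = r[-1] / g[-1]                   # a_n * b_m^{-1}
        q[n - m] = c
        for i, b in enumerate(g):           # subtract c * x^(n-m) * g(x)
            r[i + n - m] -= c * b
        r = trim(r)                         # leading term has cancelled
    return trim(q), r

# (2x^3 + 3x + 1) = 2x * (x^2 + 1) + (x + 1)
print(poly_divmod([1, 3, 0, 2], [1, 0, 1]))
```

Each pass through the loop plays the role of one application of the induction hypothesis: the degree of the running remainder drops strictly, so the loop terminates.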

Definition D.5.5. Let $f(x) \in F[x]$. An element $a \in F$ is called a zero or root of $f$ if $f(a) = 0$.

Theorem D.5.3. Factor Theorem for Polynomials. Let $F$ be a field, $f(x) \in F[x]$, $a \in F$. Then $a$ is a zero of $f(x)$ if and only if $(x - a)$ is a factor of $f(x)$.

Proof. If $(x - a)$ is a factor then $a$ is trivially a zero. Conversely, suppose that $a$ is a zero of $f(x)$. By the division algorithm, $f(x) = q(x)(x - a) + r(x)$ for some $q(x), r(x)$ with either $r(x) = 0$ or $\deg(r(x)) < 1$. In either case $r(x)$ is a constant, say $r(x) = c$ with $c \in F$, and we have $f(x) = q(x)(x - a) + c$. Thus $0 = f(a) = q(a) \cdot 0 + c = c$ and so $c = 0$ and $f(x) = q(x)(x - a)$. □

Note the analogy between $\mathbb{Z}$ and $F[x]$: 1) The four types of objects: primes, composites, units, zero. 2) The concept of factors. 3) The concept of greatest common divisor. 4) Division algorithm. 5) Euclidean Algorithm. 6) GCDLC theorem. 7) Euclid's Lemma. 8) If $p$ is a prime and $p \mid ab$ then $p \mid a$ or $p \mid b$. 9) Unique Factorization Theorem.

Let's state item 8) for polynomials as a lemma without proof, and deduce the Unique Factorization Theorem.

Lemma D.5.1. Let $p(x)$ be an irreducible polynomial over $F$ such that $p(x) \mid f(x)g(x)$. Then $p(x) \mid f(x)$ or $p(x) \mid g(x)$.

Theorem D.5.4. Unique Factorization Theorem for $F[x]$. Let $F$ be a field and $f(x)$ be a polynomial over $F$ of degree $\ge 1$. Then $f(x)$ can be expressed as a product of irreducible polynomials over $F$, and this factorization is unique up to the order of the factors and unit multiples.

Proof. This is discussed in Chapter 2: since we have a division algorithm, $F[x]$ is a Euclidean domain, whence it is a Unique Factorization Domain. A direct proof runs as follows.

Existence: We will use the strong form of induction on the degree of $f(x)$. If $f(x)$ is of degree 1, then it is irreducible, and so we are done. Suppose now that any polynomial of degree at least 1 and less than $n$ can be expressed as a product of irreducibles. Let $f(x)$ be of degree $n$. If $f(x)$ is irreducible we are done. Otherwise $f(x) = g(x)h(x)$ for some nonconstant $g(x), h(x)$ over $F$ of smaller degrees than $f(x)$. Thus, by the induction assumption, $g(x) = p_1(x) \cdots p_k(x)$ and $h(x) = q_1(x) \cdots q_l(x)$ for some irreducibles $p_i(x), q_j(x)$. Thus $f(x) = p_1(x) \cdots p_k(x) q_1(x) \cdots q_l(x)$, a product of irreducibles. QED

Uniqueness: Suppose that $f(x)$ has two factorizations $f(x) = p_1(x) \cdots p_k(x) = q_1(x) \cdots q_l(x)$, for some irreducibles $p_i(x), q_j(x)$, with $k \le l$. Then $p_1(x) \mid q_1(x) \cdots q_l(x)$ and so, by Lemma D.5.1, $p_1(x) \mid q_{j_1}(x)$ for some $j_1$, $1 \le j_1 \le l$. Since $p_1(x)$ and $q_{j_1}(x)$ are both irreducible, we must have $p_1(x) = c_1 q_{j_1}(x)$ for some nonzero constant $c_1$. By cancelation, we then get

$c_1 p_2(x) \cdots p_k(x) = q_1(x) \cdots \hat{q}_{j_1}(x) \cdots q_l(x),$

where $\hat{q}_{j_1}(x)$ indicates that this term is missing. (Note that the cancelation law holds since $F[x]$ is an integral domain.) The process can be repeated with $p_2(x), p_3(x), \ldots$ in turn. If $k < l$ we are left with an equation having a constant on the left-hand side and a polynomial of positive degree on the right, a contradiction. Therefore $k = l$, and the $k$ factors on the left, $p_1(x), \ldots, p_k(x)$, are a permutation of the $k$ factors on the right, $q_1(x), \ldots, q_k(x)$, up to constant multiples. □

Example D.5.1. What do we mean by unique up to order and unit multiples? Suppose that we wish to factor $x^2 - 3x + 2$ over $\mathbb{R}$. There are many ways of doing this:

$x^2 - 3x + 2 = (x - 1)(x - 2) = (x - 2)(x - 1) = (1 - x)(2 - x) = \cdots = (7x - 7)(\tfrac{1}{7}x - \tfrac{2}{7}).$

All of these are regarded as the same factorization.

Definition D.5.6. Let $F$ be a field and $f(x) \in F[x]$. A zero $a$ of $f(x)$ is said to have multiplicity $m$ if $(x - a)^m \mid f(x)$ but $(x - a)^{m+1} \nmid f(x)$.

Example D.5.2. Suppose that $f(x) = (x + 1)^3 (x - 2)^4 (x^2 + 1)$. Over $\mathbb{R}$, $f(x)$ has a zero at $-1$ of multiplicity 3 and a zero at 2 of multiplicity 4. Over $\mathbb{C}$ it has additional zeros at $\pm i$, each of multiplicity 1.

Theorem D.5.5. Number of zeros of a polynomial. Let $F$ be any field and $f(x) \in F[x]$ of degree $n$. Then the total number of zeros of $f(x)$ in $F$, counted with multiplicity, is at most $n$.

Proof. The proof is by the strong form of induction on the degree $n$ of $f(x)$. A nonzero constant polynomial has no zeros, and for $n = 1$ it is trivial that any polynomial of degree 1 has a single zero in $F$ of multiplicity 1, and so the result is true. Suppose now that the theorem is true for all polynomials of degree less than $n$, and let $f(x)$ be a polynomial of degree $n$. If $f(x)$ has no zero in $F$ we are done. Otherwise, suppose that $a$ is a zero in $F$ of multiplicity $m$. Then $f(x) = (x - a)^m g(x)$ for some $g(x) \in F[x]$ of degree $n - m$. By the induction assumption, $g(x)$ has at most $n - m$ zeros (counted with multiplicity). Now every zero of $f(x)$ must either be a zero of $g(x)$ or equal to $a$: if $b \neq a$ and $f(b) = 0$, then $(b - a)^m g(b) = 0$ forces $g(b) = 0$, since $F$ has no zero divisors. Thus $f(x)$ has at most $m + (n - m) = n$ zeros. □

D.6. Ring homomorphisms and Ideals

Definition D.6.1. A subset $I$ of a ring $R$ is called an ideal if (i) $I$ is an additive subgroup of $R$ and (ii) $I$ is closed under multiplication by elements of $R$: for any $a \in I$, $r \in R$ we have $ar \in I$ and $ra \in I$.
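Definition D.6.1 can be checked mechanically in a finite ring. The sketch below does this for subsets of $\mathbb{Z}_m$, with elements represented as the integers $0, \ldots, m-1$. The helper names `is_ideal` and `principal_ideal` are hypothetical, introduced only for this illustration.

```python
def is_ideal(I, m):
    """Check Definition D.6.1 for a subset I of Z_m = {0, ..., m-1}."""
    I = set(I)
    # (i) additive subgroup: contains 0, closed under + and negation
    additive_subgroup = (
        0 in I
        and all((a + b) % m in I for a in I for b in I)
        and all((-a) % m in I for a in I)
    )
    # (ii) closed under multiplication by ring elements
    # (Z_m is commutative, so ar = ra and one check suffices)
    absorbs = all((a * r) % m in I for a in I for r in range(m))
    return additive_subgroup and absorbs

def principal_ideal(d, m):
    """The set {d*x mod m : x in Z_m}."""
    return {(d * x) % m for x in range(m)}

print(sorted(principal_ideal(3, 12)), is_ideal(principal_ideal(3, 12), 12))
print(is_ideal({0, 1}, 12))   # not closed under addition
```

For a subset that fails, such as `{0, 1}`, the check reports `False` because $1 + 1 = 2$ is not in the set.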

Note D.6.1. (i) If $I$ is just closed under right multiplication it is called a right ideal. Similarly for left ideal. (ii) If $R$ is a ring with unity, then a nonempty subset $I$ is an ideal iff $I$ is closed under addition and closed under multiplication by elements of $R$.

Example D.6.1. Suppose that $R$ is a commutative ring. For any $a \in R$, the set $(a) := \{ax : x \in R\}$ is an ideal, called the principal ideal generated by $a$.

Definition D.6.2. An integral domain is called a principal ideal domain (PID) if every ideal is a principal ideal.

Theorem D.6.1. $\mathbb{Z}$ is a PID.

Proof. Let $I$ be an ideal in $\mathbb{Z}$. If $I = \{0\}$ we are done. Otherwise $I$ contains a positive element (why?). Let $m$ be the minimal positive element. Then, by the division algorithm, one can show $I = (m)$. The details are left to the reader. □

Theorem D.6.2. For any field $F$, $F[x]$ is a PID.

Proof. This follows from the fact that $F[x]$ is a Euclidean Domain. □



Definition D.6.3. Let $I$ be an ideal in a ring $R$. The quotient ring $R/I$ is the set of additive cosets of $I$, $R/I := \{a + I : a \in R\}$, with addition and multiplication defined by

$(a + I) + (b + I) := (a + b) + I, \qquad (a + I)(b + I) := ab + I,$

for $a, b \in R$.

Note D.6.2. (i) $R/I$ as an additive group is the same as our earlier definition. (ii) Addition and multiplication are well defined. Verify. (iii) $R/I$ is in fact a ring. Verify. (iv) If $R$ is a ring with unity 1, then so is $R/I$, with unity $1 + I$.

Example D.6.2. $\mathbb{Z}/(m) = \mathbb{Z}_m$ for any positive integer $m$.

Definition D.6.4. i) A mapping $\phi : R \to S$ between two rings $R, S$ is called a ring homomorphism if $\phi(ab) = \phi(a)\phi(b)$ and $\phi(a + b) = \phi(a) + \phi(b)$ for all $a, b \in R$. ii) If in addition $\phi$ is 1-to-1 then it is called an isomorphism of $R$ into $S$. iii) If $\phi$ is both 1-to-1 and onto then it is an isomorphism between $R$ and $S$. iv) We say two rings $R, S$ are isomorphic if there exists an isomorphism between them.

Exercise D.6.1. (i) If $\phi$ is a ring homomorphism then $\phi(0) = 0$ and $\phi(-a) = -\phi(a)$. (ii) Suppose $R, S$ are rings with unity 1. If $S$ is an integral domain or $\phi$ is onto, then $\phi(1) = 1$. (This need not hold in general.) (iii) Suppose that $R, S$ are rings with unity, $\phi : R \to S$ is a homomorphism, and $\phi(1) = 1$. Show that if $a$ is a unit in $R$, then $\phi(a)$ is a unit in $S$. (iv) If $a$ is a zero divisor in $R$, does it follow that $\phi(a)$ is a zero divisor in $S$?

Definition D.6.5. If $\phi : R \to S$ is a ring homomorphism then the kernel of $\phi$, denoted $\ker(\phi)$, is given by $\ker(\phi) := \{x \in R : \phi(x) = 0\}$.
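Definitions D.6.4 and D.6.5 can be explored by brute force for maps between the finite rings $\mathbb{Z}_m$ and $\mathbb{Z}_k$. In this sketch the names `is_ring_hom`, `kernel`, `reduce_mod_4`, `times_4` and `times_2` are hypothetical choices made for the illustration. The three maps show a homomorphism with $\phi(1) = 1$, a homomorphism with $\phi(1) \neq 1$ (compare Exercise D.6.1 (ii)), and an additive map that is not a ring homomorphism.

```python
def is_ring_hom(phi, m, k):
    """Check phi(a+b) = phi(a)+phi(b) and phi(ab) = phi(a)phi(b) for phi: Z_m -> Z_k."""
    return all(
        phi((a + b) % m) == (phi(a) + phi(b)) % k
        and phi((a * b) % m) == (phi(a) * phi(b)) % k
        for a in range(m) for b in range(m)
    )

def kernel(phi, m):
    """ker(phi) = {x in Z_m : phi(x) = 0}."""
    return [x for x in range(m) if phi(x) == 0]

def reduce_mod_4(x):      # Z_12 -> Z_4, well defined because 4 divides 12
    return x % 4

def times_4(x):           # Z_12 -> Z_12, sends 1 to 4 (note 4*4 = 4 in Z_12)
    return (4 * x) % 12

def times_2(x):           # Z_12 -> Z_12, additive but not multiplicative
    return (2 * x) % 12

print(is_ring_hom(reduce_mod_4, 12, 4), kernel(reduce_mod_4, 12))
print(is_ring_hom(times_4, 12, 12), kernel(times_4, 12))
print(is_ring_hom(times_2, 12, 12))
```

In both homomorphism cases the kernel that is printed is a set of the form $d\mathbb{Z}_{12}$.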

Theorem D.6.3. If $\phi : R \to S$ is a ring homomorphism, then $\ker(\phi)$ is an ideal in $R$.

Proof. Let $a, b \in \ker(\phi)$. Then $\phi(a + b) = \phi(a) + \phi(b) = 0 + 0 = 0$ and so $a + b \in \ker(\phi)$. Also $\phi(-a) = -\phi(a) = 0$, so $-a \in \ker(\phi)$, and $\phi(0) = 0$, so $0 \in \ker(\phi)$. Thus $\ker(\phi)$ is an additive subgroup of $R$. Next, for any $r \in R$, $\phi(ra) = \phi(r)\phi(a) = \phi(r) \cdot 0 = 0$, so $ra \in \ker(\phi)$. Likewise, $ar \in \ker(\phi)$. □

Theorem D.6.4. Let $\phi : R \to S$ be a ring homomorphism. Then $\phi$ is one to one (that is, an isomorphism of $R$ into $S$) if and only if $\ker(\phi) = \{0\}$.

Proof. Suppose that $\ker(\phi) = \{0\}$, and that $a, b \in R$ with $\phi(a) = \phi(b)$. Then $\phi(a - b) = \phi(a) - \phi(b) = 0$, and so $a - b \in \ker(\phi)$. Thus $a - b = 0$, that is, $a = b$. Conversely, suppose that $\phi$ is one to one, and that $a \in \ker(\phi)$. Then $\phi(a) = 0 = \phi(0)$ and so $a = 0$. Therefore $\ker(\phi) = \{0\}$. □

Note D.6.3. If $\phi : F \to S$ is a nonzero ring homomorphism and $F$ is a field, then $\phi$ is an isomorphism of $F$ into $S$. Why?

Theorem D.6.5. First Isomorphism Theorem. Let $\phi : R \to S$ be a ring homomorphism. Then $R/\ker(\phi) \simeq \phi(R)$.

Proof. Let $I = \ker(\phi)$. We define a mapping $\bar{\phi} : R/I \to \phi(R)$ by setting $\bar{\phi}(a + I) = \phi(a)$. It is well defined since if $a + I = b + I$ then $a - b \in I$ and so $\phi(a - b) = 0$, whence $\phi(a) - \phi(b) = 0$, that is, $\phi(a) = \phi(b)$. It is trivial to check that $\bar{\phi}$ is a ring homomorphism, and it maps onto $\phi(R)$ by construction. Finally, to show $\bar{\phi}$ is one to one, by the preceding theorem it suffices to show that its kernel is trivial. We have $a + I \in \ker(\bar{\phi})$ if and only if $\bar{\phi}(a + I) = 0$, that is, $\phi(a) = 0$, that is, $a \in I$, or equivalently $a + I = 0 + I$, the zero element in $R/I$. □

D.7. Group Theory

Definition D.7.1. A group is a set $G$ with binary operation $*$ such that

i) $G$ is closed under $*$, that is, for any $x, y \in G$, $x * y \in G$.

ii) $*$ is associative: For any $x, y, z \in G$, $(x * y) * z = x * (y * z)$.

iii) $G$ has an identity element $e$ satisfying $x * e = e * x = x$ for all $x \in G$.

iv) Inverses exist: For any element $x \in G$ there is an element $y \in G$ such that $x * y = y * x = e$. We write $y = x^{-1}$ in this case.

If in addition v) $*$ is commutative, then $G$ is called an abelian group.

Notation: 1. $(G, *)$ denotes a group $G$ with binary operation $*$. 2. If $+$ is used, generally 0 is used to denote the identity element, and $-a$ the inverse of $a$. 3. If $\cdot$ is used, 1 is commonly used to denote the identity element, and $a^{-1}$ the inverse.

Example D.7.1. The following are standard examples of groups. 1. $(\mathbb{Z}_m, +)$, $(\mathbb{Z}, +)$, and $(R, +)$ where $R$ is any ring. 2. $(U_m, \cdot)$, where $U_m$ is the set of units in $\mathbb{Z}_m$. 3. $(F^*, \cdot)$ where $F$ is any field and $F^* = F - \{0\}$.
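For a finite set with an explicit operation, the four axioms of Definition D.7.1 can be verified exhaustively. The sketch below does this for some of the examples just listed. The names `units` and `is_group` are assumptions of this illustration, and elements of $\mathbb{Z}_m$ are represented by the integers $0, \ldots, m-1$.

```python
from math import gcd

def units(m):
    """U_m: residues a in Z_m with gcd(a, m) = 1."""
    return [a for a in range(1, m) if gcd(a, m) == 1]

def is_group(G, op, e):
    """Brute-force check of axioms i) to iv) for a finite set G with operation op."""
    G = list(G)
    S = set(G)
    closed = all(op(x, y) in S for x in G for y in G)
    associative = all(op(op(x, y), z) == op(x, op(y, z))
                      for x in G for y in G for z in G)
    identity = e in S and all(op(x, e) == x == op(e, x) for x in G)
    inverses = all(any(op(x, y) == e == op(y, x) for y in G) for x in G)
    return closed and associative and identity and inverses

print(is_group(range(6), lambda x, y: (x + y) % 6, 0))      # (Z_6, +)
print(is_group(units(8), lambda x, y: (x * y) % 8, 1))      # (U_8, .)
print(is_group(range(1, 6), lambda x, y: (x * y) % 6, 1))   # Z_6 - {0} under .
```

The last call fails because $\mathbb{Z}_6$ is not a field: $2 \cdot 3 = 0$ leaves the set, so closure already breaks.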

Exercise D.7.1. Verify the following properties of a group. a) The identity element of a group is unique. b) Every element of a group has a unique inverse. c) For any $x, y \in G$, $(xy)^{-1} = y^{-1} x^{-1}$. Explain why $x^{-1} y^{-1}$ fails to be the inverse in general. d) Cancelation law for a group $G$: If $a, x, y \in G$ with $ax = ay$, then $x = y$.

Definition D.7.2. A subset $H$ of a group $(G, *)$ is called a subgroup of $G$, denoted $H < G$, if $H$ is a group with respect to the binary operation $*$.

Note D.7.1. To show a subset of a given group is a subgroup it suffices to check properties (i), (iii) and (iv). Associativity is inherited.

Exercise D.7.2. If $H$ is a finite nonempty subset of a group $(G, *)$, closed under $*$, then $H$ is a subgroup of $G$.

Exercise D.7.3. Find all subgroups of $(\mathbb{Z}_6, +)$.

Definition D.7.3. Let $(G, *)$ be a group and $a \in G$. a) For any $n \in \mathbb{N}$, $a^n = a * a * \cdots * a$ ($n$ times) and $a^{-n} = (a^n)^{-1} = a^{-1} * \cdots * a^{-1}$. For $n = 0$ we define $a^0 = e$. b) $\langle a \rangle$ denotes the subgroup of $G$ generated by $a$: $\langle a \rangle = \{a^n : n \in \mathbb{Z}\}$.

Note D.7.2. i) The set $\langle a \rangle$ is in fact a subgroup of $G$. ii) If the symbol $+$ is used for the binary operation, then $\langle a \rangle = \{na : n \in \mathbb{Z}\}$.

Exercise D.7.4. a) In $(\mathbb{Z}_6, +)$, find $\langle 1 \rangle$, $\langle 2 \rangle$. b) In $(U_5, \cdot)$, find $\langle 1 \rangle$, $\langle 2 \rangle$, and $\langle 3 \rangle$.

Definition D.7.4. a) The order of a group $G$ is the number of elements in $G$, denoted $|G|$; it is also called the cardinality of $G$. b) The order of an element $a$ of a group $G$, denoted $\mathrm{ord}(a)$, is the smallest positive integer $n$ such that $a^n = e$ (if such an $n$ exists). If no such $n$ exists, $a$ is said to have infinite order.

Note D.7.3. In additive notation the definition reads: If $(G, +)$ is a group and $a \in G$ then the order of $a$ is the smallest positive integer $n$ such that $na = 0$.

Example D.7.2. In $(U_5, \cdot)$, find $\mathrm{ord}(2)$. In $\mathbb{Z}$ find $\mathrm{ord}(2)$.

Theorem D.7.1. If $G$ is a group and $a \in G$, then $\mathrm{ord}(a) = |\langle a \rangle|$.

Definition D.7.5. $G$ is called a cyclic group if $G = \langle a \rangle$ for some $a \in G$. Such an $a$ is called a generator of $G$.

Example D.7.3. a) Cyclic groups of order 4: $(U_5, \cdot)$, $(U_{10}, \cdot)$, $(\mathbb{Z}_4, +)$, $\langle i \rangle = \{1, i, -1, -i\}$ in $\mathbb{C}$. (Note that $(U_8, \cdot)$ also has order 4 but is not cyclic, since every element of $U_8$ satisfies $a^2 = 1$.) b) Cyclic groups of order 6: $(U_7, \cdot)$, $(U_9, \cdot)$, $(\mathbb{Z}_6, +)$, $\langle \omega \rangle$ in $\mathbb{C}$, where $\omega = e^{2\pi i/6}$.

Lemma D.7.1. If $G = \langle a \rangle$ is a cyclic group of order $n$, and $H$ is a subgroup of $G$, then $H = \langle a^m \rangle$ for some integer $m$ with $m \mid n$.

Proof. If $H = \langle e \rangle$ we are done (take $m = n$), so assume $H$ contains an element other than $e$. Let $S := \{k \in \mathbb{Z} : a^k \in H\}$. Let $m$ be the smallest positive element of $S$. In particular $a^m \in H$. We claim that $S = \mathbb{Z}m$ (all multiples of $m$), and consequently $H = \langle a^m \rangle$. Indeed, if $k \in S$ then $k = qm + r$ for some $q, r$ with $0 \le r < m$. Then $a^r = a^k (a^m)^{-q} \in H$. But this implies $r \in S$. Since $r < m$ we must have $r = 0$ and so $k = qm$. Since $n \in S$ (because $a^n = e \in H$) we must have $m \mid n$. □

Theorem D.7.2. Subgroups of Cyclic Groups. Let $C_n = \langle a \rangle$ be a cyclic group of order $n$ (under multiplication). Then $C_n$ is an abelian group and (i) for any positive divisor $d$ of $n$, there is a unique subgroup of order $d$, given by $C_d = \langle a^{n/d} \rangle$. (For an additive group we would have $C_d = \langle \frac{n}{d} a \rangle$.) (ii) Every subgroup of $C_n$ is of the type given in part (i).

Definition D.7.6. If $G, H$ are groups, the cartesian product $G \times H$ can be made into a group by defining $(a, b) \cdot (a', b') = (aa', bb')$ for $a, a' \in G$, $b, b' \in H$.

Example D.7.4. Example of a noncyclic group. Klein 4-group: $K_4 = \mathbb{Z}_2 \times \mathbb{Z}_2$, under addition.

Exercise D.7.5. Show that $C_m \times C_n$ is cyclic if and only if $\gcd(m, n) = 1$.

D.8. Lagrange's Theorem

Definition D.8.1. Let $(G, \cdot)$ be a group and $H$ be a subgroup of $G$. A right coset of $H$ is a set of the form $Ha := \{ha : h \in H\}$, with $a$ a fixed element of $G$. (Similar definition for left coset.) In additive notation, if $(G, +)$ is an additive group, then a right coset is denoted $H + a := \{h + a : h \in H\}$. We will just work with right cosets and so will drop the word "right" and just call them cosets.

Exercise D.8.1. 1) Show that for any two cosets $Ha, Hb$, we either have $Ha = Hb$ or $Ha \cap Hb = \emptyset$. 2) If $H < G$ then $G$ can be expressed as a disjoint union of cosets of $H$.

Definition D.8.2. A subgroup $H$ of a group $G$ is said to be of finite index in $G$ if $G$ is a finite union of distinct cosets of $H$, $G = Ha_1 \cup Ha_2 \cup \cdots \cup Ha_k$, for some $a_i \in G$, $1 \le i \le k$. In this case, we say $k$ is the index of $H$ in $G$ and write $k = [G : H]$.

Example D.8.1. Let $H = \langle m \rangle = \mathbb{Z}m$ in $(\mathbb{Z}, +)$. Then the cosets of $H$ are just residue classes (mod $m$), that is, $H + a = [a]_m$, and we have $\mathbb{Z} = [0]_m \cup [1]_m \cup \cdots \cup [m - 1]_m$. Thus $[\mathbb{Z} : H] = m$.

Theorem D.8.1. Lagrange's Theorem. If $G$ is a finite group and $H$ is a subgroup of $G$ then $|H|$ is a divisor of $|G|$. Indeed, we have $|G| = |H| [G : H]$.

Proof. Since $G$ is finite, $H$ has finite index, say $k = [G : H]$. Then $G$ is a disjoint union of $k$ cosets of $H$. Since $|Ha| = |H|$ for any coset $Ha$ (the map $h \mapsto ha$ is a bijection from $H$ onto $Ha$, by the cancelation law), we conclude that $|G| = k|H|$. □
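The coset decomposition used in the proof can be seen concretely in the group $U_m$. The sketch below is a minimal illustration, with helper names (`units`, `order`, `generated`, `cosets`) that are assumptions of this example rather than notation from the notes. It computes $\langle a \rangle$, confirms $\mathrm{ord}(a) = |\langle a \rangle|$ (Theorem D.7.1), and lists the distinct cosets of $H = \langle 2 \rangle$ in $U_7$.

```python
from math import gcd

def units(m):
    """U_m: residues a in Z_m with gcd(a, m) = 1 (m >= 2 assumed)."""
    return [a for a in range(1, m) if gcd(a, m) == 1]

def order(a, m):
    """Smallest n >= 1 with a^n = 1 in U_m; a must be a unit mod m."""
    n, x = 1, a % m
    while x != 1:
        x = (x * a) % m
        n += 1
    return n

def generated(a, m):
    """The cyclic subgroup <a> of U_m, listed as 1, a, a^2, ..."""
    H, x = [1], a % m
    while x != 1:
        H.append(x)
        x = (x * a) % m
    return H

def cosets(H, G, m):
    """Distinct right cosets Ha = {h*a mod m : h in H} of H in G."""
    seen = []
    for a in G:
        Ha = frozenset((h * a) % m for h in H)
        if Ha not in seen:
            seen.append(Ha)
    return seen

m = 7
G = units(m)
H = generated(2, m)
parts = cosets(H, G, m)
assert order(2, m) == len(H)                      # Theorem D.7.1
assert len(G) == len(H) * len(parts)              # |G| = |H| [G : H]
assert all(len(G) % order(a, m) == 0 for a in G)  # ord(a) divides |G|
print(H, [sorted(c) for c in parts])
```

The final assertion is Lagrange's Theorem applied to each cyclic subgroup $\langle a \rangle$.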

Theorem D.8.2. Order of Element Theorem. If $G$ is a finite group of order $n$ and $a \in G$ then $\mathrm{ord}(a) \mid n$.

Proof. We simply apply Lagrange's Theorem to the subgroup $H = \langle a \rangle$. □

Exercise D.8.2. 1. Deduce Fermat's Little Theorem by applying the theorem to $U_p$, the group of units (mod $p$). 2. Deduce Euler's Theorem by applying the theorem to $U_m$.

Theorem D.8.3. If $G$ is a group of order $p$, where $p$ is a prime, then $G$ is a cyclic group.

Proof. Let $a \in G$ with $a \neq e$. By the Order of Element Theorem, $\mathrm{ord}(a) \mid p$, and since $\mathrm{ord}(a) \neq 1$ we get $\mathrm{ord}(a) = p$. Therefore $G = \langle a \rangle$. □

D.9. Normal Subgroups and Group Homomorphisms

Definition D.9.1. A subgroup $H$ of a group $G$ is called a normal subgroup if $gHg^{-1} = H$ for all $g \in G$, that is, $ghg^{-1} \in H$ for any $h \in H$, $g \in G$.

Note D.9.1. If $G$ is abelian then any subgroup of $G$ is normal.

Definition D.9.2. Suppose that $H$ is a normal subgroup of $G$. Let $G/H$ denote the set of cosets of $H$ in $G$. We turn this set into a group, called the quotient group or factor group of $G$ by $H$, by defining multiplication as follows: For $Ha, Hb \in G/H$, we define $(Ha)(Hb) = Hab$.

Problem D.9.1. a) Show that multiplication is well defined on $G/H$ and that $H$ is the identity element. b) Note that normality is essential for multiplication to be well defined. c) Note that if $G = (\mathbb{Z}, +)$ and $H = \mathbb{Z}m$, then $G/H = (\mathbb{Z}_m, +)$.

Theorem D.9.1. If $H < G$ and $[G : H] = 2$, then $H$ is a normal subgroup of $G$.

Proof. Since $[G : H] = 2$, for any $x \notin H$ the only cosets are $H$ and $Hx$. If $x, y \notin H$ then $Hyx \neq Hx$ and so $Hyx = H$, that is, $yx \in H$. Thus, for $x \notin H$, $h \in H$, we have $hx^{-1} \notin H$, and so $x(hx^{-1}) \in H$. (For $x \in H$ it is clear that $xhx^{-1} \in H$.) □

Definition D.9.3. Let $H, G$ be groups and $\phi : G \to H$. a) $\phi$ is called a homomorphism if $\phi(ab) = \phi(a)\phi(b)$ for all $a, b \in G$. b) If in addition $\phi$ is 1-to-1 then $\phi$ is called an isomorphism of $G$ into $H$. c) If $\phi$ is 1-to-1 and onto, then $\phi$ is called an isomorphism of $G$ onto $H$, or an isomorphism between $G$ and $H$. d) If there exists an isomorphism between $G$ and $H$, we say that $G$ and $H$ are isomorphic groups, written $G \simeq H$. e) An automorphism of $G$ is an isomorphism of $G$ onto $G$.

Exercise D.9.1. 1. Let $\phi(x) = \log(x)$. Show that $\phi$ is an isomorphism of $(\mathbb{R}^+, \cdot)$ onto $(\mathbb{R}, +)$. 2. If $\phi$ is a homomorphism then $\phi(e) = f$, where $e, f$ are the identities for $G, H$ respectively. 3. If $\phi$ is a homomorphism then $\phi(a^{-1}) = \phi(a)^{-1}$. 4. If $\phi : G \to H$ is a homomorphism then $\phi(G)$ is a subgroup of $H$.

Exercise D.9.2. Suppose that $G \simeq H$ as groups and let $\phi : G \to H$ be an isomorphism. a) Show that if $a \in G$ and $\mathrm{ord}(a) = k$ then $\mathrm{ord}(\phi(a)) = k$. b) Prove that if $G$ has a cyclic subgroup of order $k$ then so does $H$. c) Prove that $G$ and $H$ have the same number of elements of order $k$ for any $k \in \mathbb{N}$.

Definition D.9.4. Let $\phi : G \to H$ be a homomorphism between groups $G, H$. The kernel of $\phi$, denoted $\ker(\phi)$, is given by $\ker(\phi) = \{x \in G : \phi(x) = e\}$, where $e$ is the identity in $H$.

Theorem D.9.2. If $\phi : G \to H$ is a homomorphism then $\ker(\phi)$ is a normal subgroup of $G$.

Exercise D.9.3. Show that a homomorphism $\phi : G \to H$ is one to one (an isomorphism of $G$ into $H$) iff $\ker(\phi)$ is trivial, that is, consists only of the identity of $G$.

Theorem D.9.3. Any two cyclic groups of the same order are isomorphic.

Proof. Let $G = \langle a \rangle$, $H = \langle b \rangle$ be cyclic groups of order $n$. Define $\phi : G \to H$ by $\phi(a^k) = b^k$, for $k \in \mathbb{Z}$. Show that $\phi$ is well defined, a homomorphism, onto, and that it has a trivial kernel. □

Example D.9.1. We have seen a number of different examples of cyclic groups of order 4: $(\mathbb{Z}_4, +) = \langle 1 \rangle$, $(\langle i \rangle, \cdot)$ in $\mathbb{C}$, $(U_5, \cdot)$, $(U_{10}, \cdot)$, $\langle (1, 2, 3, 4) \rangle$ in $S_4$, etc. These are all isomorphic.

Theorem D.9.4. First Isomorphism Theorem. Let $\phi : G \to H$ be a group homomorphism. Then $G/\ker(\phi) \simeq \phi(G)$.

Proof. Let $N = \ker(\phi)$, a normal subgroup of $G$. Define $\hat{\phi} : G/N \to \phi(G)$ by $\hat{\phi}(Na) = \phi(a)$. Verify that $\hat{\phi}$ is well defined and a homomorphism; it is onto $\phi(G)$ by construction. To show that $\hat{\phi}$ is 1-to-1 it suffices to show the kernel is trivial:

$Na \in \ker(\hat{\phi}) \Leftrightarrow \phi(a) = e \Leftrightarrow a \in \ker(\phi) \Leftrightarrow Na = N,$

the identity element of $G/N$. □
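The construction of $\hat{\phi}$ in the proof can be traced on a small example. The sketch below uses the homomorphism $x \mapsto 3x$ on $(\mathbb{Z}_{12}, +)$, chosen here purely for illustration. It builds the kernel $N$, the quotient $G/N$ as a set of cosets, and the induced map, checking along the way that the map is well defined, additive, and a bijection onto the image.

```python
n = 12

def phi(x):
    """x -> 3x, a homomorphism (Z_12, +) -> (Z_12, +)."""
    return (3 * x) % n

G = range(n)
N = frozenset(x for x in G if phi(x) == 0)        # ker(phi)
image = {phi(x) for x in G}                       # phi(G)

def coset_of(a):
    """The coset N + a (additive notation)."""
    return frozenset((h + a) % n for h in N)

quotient = {coset_of(a) for a in G}               # G/N

# phi_hat(N + a) = phi(a): well defined iff phi is constant on each coset.
phi_hat = {}
for coset in quotient:
    values = {phi(a) for a in coset}
    assert len(values) == 1
    phi_hat[coset] = values.pop()

# phi_hat respects the group operation on cosets.
for A in quotient:
    for B in quotient:
        a, b = min(A), min(B)
        assert phi_hat[coset_of((a + b) % n)] == (phi_hat[A] + phi_hat[B]) % n

# phi_hat is a bijection from G/N onto phi(G).
assert set(phi_hat.values()) == image and len(quotient) == len(image)
print(sorted(N), sorted(image), len(quotient))
```

Here the kernel has 3 elements and the image has 4, consistent with $|G| = |N| \cdot |\phi(G)|$, which follows from the theorem together with Lagrange's Theorem.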


