An Introduction to the Circle Method


English Pages 279 Year 2023


Table of contents :
Cover
Half-title
Title Page
Copyright
Contents
Acknowledgments
Preface
Index of notations
Chapter 1. Introduction and overview
1.1. Introduction
1.2. Preparatory chapters
1.3. Early developments in the study of Waring’s problem
1.4. The method of exponential sums
1.5. Origins of the circle method and applications to additive problems
Chapter 2. Fundamental theorem of arithmetic
2.1. Mathematical induction
2.2. Divisibility
2.3. Greatest common divisor
2.4. Prime numbers and unique factorization
Chapter 3. Arithmetic functions
3.1. Multiplicative functions
3.2. Möbius function and Möbius inversion
3.3. Greatest integer function
3.4. The big-O and little-o notations
3.5. Averages of arithmetical functions
3.6. Technique of partial summation
3.7. The Cauchy–Schwarz and Hölder inequalities
Chapter 4. Introduction to congruence arithmetic
4.1. Definition and basic properties of congruences
4.2. Congruence powers and Euler’s theorem
4.3. Linear congruence equations
4.4. Linear congruences and the Chinese remainder theorem
4.5. Polynomial congruences
4.6. Order and primitive roots
Chapter 5. Distribution of prime numbers
5.1. Dirichlet series
5.2. Euler products and Dirichlet series
5.3. Analytic properties of Dirichlet series
5.4. Distribution functions for prime numbers
5.5. Primes in arithmetic progressions
5.6. Dirichlet characters and Dirichlet 𝐿-functions
5.7. Ramanujan sums and Ramanujan series
Chapter 6. An introduction to Waring’s problem
6.1. Fermat’s two square theorem
6.2. Lagrange’s four square theorem
6.3. A conjectured value for 𝑔(𝑘)
6.4. The easier Waring’s problem
Chapter 7. Waring’s problem
7.1. Schnirelmann density
7.2. Schnirelmann density and Waring’s problem
7.3. Proof of Linnik’s theorem
Chapter 8. Exponential sums
8.1. Exponential sums for polynomials of degree 1
8.2. Exponential sums and Diophantine approximation
8.3. Exponential sums over primes
Chapter 9. The circle method and Waring’s problem
9.1. An outline of the circle method
9.2. The contribution from the major arcs
9.3. The singular integral
9.4. Singular series
9.5. Minor arcs in Waring’s problem
Chapter 10. The circle method and the Goldbach conjectures
10.1. Major and minor arcs
10.2. Contribution from the major arcs
10.3. Contribution from the minor arcs
10.4. Comments about Vinogradov’s theorem
10.5. The circle method and the binary Goldbach conjecture
Chapter 11. Epilogue
11.1. The philosophy of the circle method
11.2. An axiomatic framework
11.3. The singular series
11.4. The minor arcs
11.5. The future of the circle method
Bibliography
Index
Selected Published Titles in this Series
Back cover


STUDENT MATHEMATICAL LIBRARY Volume 104

An Introduction to the Circle Method M. Ram Murty Kaneenika Sinha


EDITORIAL COMMITTEE: John McCleary, Rosa C. Orellana (Chair), Paul Pollack, Kavita Ramanan

The unpublished poem “The Sine” on unnumbered page v is used with permission by V. Kumar Murty.

2020 Mathematics Subject Classification. Primary 11P55, 11P32, 11E25.

For additional information and updates on this book, visit www.ams.org/bookpages/stml-104

Library of Congress Cataloging-in-Publication Data Cataloging-in-Publication Data has been applied for by the AMS. See http://www.loc.gov/publish/cip/. DOI: https://doi.org/10.1090/stml/104

Copying and reprinting. Individual readers of this publication, and nonprofit libraries acting for them, are permitted to make fair use of the material, such as to copy select pages for use in teaching or research. Permission is granted to quote brief passages from this publication in reviews, provided the customary acknowledgment of the source is given. Republication, systematic copying, or multiple reproduction of any material in this publication is permitted only under license from the American Mathematical Society. Requests for permission to reuse portions of AMS publication content are handled by the Copyright Clearance Center. For more information, please visit www.ams.org/publications/pubpermissions. Send requests for translation rights and licensed reprints to [email protected].

© 2023 by the American Mathematical Society. All rights reserved. The American Mathematical Society retains all rights except those granted to the United States Government. Printed in the United States of America.

The paper used in this book is acid-free and falls within the guidelines established to ensure permanence and durability. Visit the AMS home page at https://www.ams.org/

THE SINE

Rolling along I go, a sinusoidal wave;
maxima and minima, a cadence in octave.
Increasing in degrees, another half-pi;
struggling to rise, I am stopped by the sky!
Why should I wind, down and up forever?
What lies above? Is there a greater power?
I once saw a secant, flying to infinity;
I quickly asked it how to get liberty.
Before disappearing, it shouted at me;
“The exponential function, just try it and see.”
“What? The exponential? That’s asking quite a bit!
For it is bounded, and everyone knows it!”
Periodically, however, I returned to the same spot;
The words of the secant echoed in my thought.

Vijaya Kumar Murty, The Visionary Eye and other poems


Acknowledgments

We would like to thank Vijaya Kumar Murty, Padma Ramanathan, Debmalya Basak, Rik Sarkar, Sridip Pal, M. Subramani, A. Sivakumar, H. G. Gadiyar and the reviewers for feedback on an earlier draft of this book. We are grateful to Arijit Chakraborty, Sneha Chaubey, Neha Prabhu and Sudhir Pujahari for their encouraging comments. This text grew out of winter schools and semester-long courses at IISER Kolkata, IISER Pune and Queen’s University. We thank the students who participated in these courses. We thank Ina Mette and the American Mathematical Society for their interest in this book, and are grateful to Marcia Almeida, Christine Thivierge, and Abigail Lawson for help and correspondence related to the preparation of the manuscript. We thank John F. Brady for the beautiful cover design of the book. The second named author (K. S.) would like to acknowledge support from the MATRICS grant of the Science and Engineering Research Board, Department of Science and Technology, Government of India.


Preface

To see form where form is yet unmade, and shape in the ephemeral air. ... In whose presence things dare not be incoherent, this inspired mind is the visionary eye.¹

¹ These lines are from the poem ‘The Visionary Eye’ by V. Kumar Murty.

The idea for this book goes back at least forty years, to when the senior author was a post-doctoral fellow at the Tata Institute of Fundamental Research in Bombay, India. He gave a course of lectures on the circle method from which the first draft of a monograph was produced. For various reasons, this first draft lay dormant for many years until 1987. That year, being the birth centenary of the legendary Indian mathematician, Srinivasa Ramanujan, many research institutes in India had organized international conferences to commemorate the occasion.

The senior author was at such a conference in the city of Chennai (then known as Madras), but as fate would have it, on Ramanujan’s actual birthday, December 22, some civil unrest prevented the conference from proceeding and attendees had to stay in their allotted hotel and guest house rooms. It was then that he gave a series of improvised lectures on the circle method of Ramanujan to a group of Indian mathematicians who were not number theorists, but who were together at the same guest house. It was also at that time that the dormant manuscript was resurrected and started showing new signs of life. But the momentum to write a monograph was lacking, partly because in the meantime Robert C. Vaughan had published his Cambridge tract titled “The Hardy–Littlewood method”, as the circle method is referred to by some.

As explained by Atle Selberg in his 1987 lecture at the Ramanujan centenary, the circle method was actually introduced by Ramanujan in one of his letters to Godfrey H. Hardy, and was developed and applied by Hardy and Ramanujan to a number of problems in additive number theory, most prominently to study the partition function. One might therefore have expected this method to be called “The Ramanujan–Hardy method”, but mathematical nomenclature seems to follow its own byzantine logic.

The present monograph is the outcome of an endeavour to write about the circle method in a way that is accessible to undergraduate students who are familiar with real and complex analysis, but are learning number theory for the first time. This sets it apart from the monograph of Vaughan, which is aimed at graduate students. This text grew out of two mini-courses given by the authors in 2010–2011 at the Indian Institute of Science Education and Research (IISER), Kolkata to undergraduate students. We literally started from scratch.
Our goal was to show the students the elegance of the circle method and at the same time give a complete solution of the famous Waring problem as an illustration of the method. Subsequently, the authors gave semester-long courses on the circle method to undergraduate students at Queen’s University and IISER Pune. This motivated us to write the text in a self-contained manner so that it is accessible to an undergraduate student who wants to learn the circle method, but has no previous knowledge of number theory.

This book is suitable for a one-semester undergraduate course. Students familiar with elementary number theory can read Chapter 1 for an overview of the contents of the book, and jump directly to Chapter 5 where we derive some classical theorems of analytic number theory used later in the book. Students who have not seen any number theory before can read Chapters 2–4 for a quick introduction to topics in elementary number theory that will be used later in the textbook. Chapters 6–10 formed the bulk of our short courses. Chapter 7 contains a solution of Waring’s problem using some ideas of Lev G. Schnirelmann and Yuri V. Linnik. Chapters 8–10 describe the application of the circle method to Waring’s problem and the ternary Goldbach conjecture.

Finally, in Chapter 11, we provide the reader with a lightning view of the origins of the circle method and describe the underlying philosophy of the method in a general way. We also indicate future directions and avenues of further study, along with several references for the student to explore this topic in greater depth. This chapter is aimed at the advanced student who wants to have a panoramic understanding of the method after having studied the more technical aspects treated in Chapters 6 and beyond. This chapter can also be read by the non-expert to gain a cursory understanding of the method without too many technicalities.

M. Ram Murty, Kingston, Canada
Kaneenika Sinha, Pune, India
September, 2022

Index of notations

• $\mathbb{N}$ denotes the set of natural numbers.
• $\mathbb{Z}$ denotes the set of integers.
• $\mathbb{Z}_{\geq 0}$ denotes the set of nonnegative integers.
• $\mathbb{P}$ denotes the set of prime numbers.
• $p$ or $p_i$ denotes a prime number.
• For $k \in \mathbb{N}$, $A_k$ denotes the set $\{n^k : n \in \mathbb{N}\}$.
• $\mathbb{Q}$ denotes the set of rational numbers.
• $\mathbb{R}$ denotes the set of real numbers.
• $\mathbb{C}$ denotes the set of complex numbers. $\mathbb{C}^*$ denotes the set of nonzero complex numbers.
• For a set $A$, $|A|$ or $\#A$ denotes the number of elements in the set $A$.
• For a set $A \subset \mathbb{C}$ and $n \in \mathbb{N}$, $n \geq 1$, $A^n = \{(a_1, a_2, \ldots, a_n) : a_i \in A\}$.
• For $\alpha \in \mathbb{R}$, $e(\alpha) := e^{2\pi i \alpha}$.
• For $x \in \mathbb{R}$, $\exp(x) := e^x$.
• We write a complex number as $s = \sigma + it$, where $\sigma = \Re(s)$ and $t = \Im(s)$.
• Given $s \in \mathbb{C}$, $|s|$ denotes $\sqrt{\sigma^2 + t^2}$.
• Let $b, a \in \mathbb{Z}$ with $b \neq 0$. $b \mid a$ denotes that $b$ divides $a$. If $b$ does not divide $a$, then we write $b \nmid a$.
• Given $a, b \in \mathbb{Z}$, $(a, b)$ denotes the greatest common divisor of $a$ and $b$. If $(a, b) = 1$, then we say that $a$ and $b$ are coprime.
• Let $a \in \mathbb{N}$ be such that $p^a \mid n$, but $p^{a+1} \nmid n$. We then write $p^a \,\|\, n$.
• For $n \in \mathbb{N}$, $\nu(n)$ denotes the number of distinct prime divisors of $n$.
• For $n \in \mathbb{N}$, $\mu(n)$ denotes the Möbius function of $n$.
• Given $x \in \mathbb{R}$, $[x]$ or $\lfloor x \rfloor$ denotes the greatest integer less than or equal to $x$. $\{x\}$ denotes $x - [x]$.
• Given $x \in \mathbb{R}$, $\|x\|$ denotes the distance from $x$ to the nearest integer.
• For $n \in \mathbb{N}$, $\phi(n)$ denotes the number of integers $a \in \{1, 2, \ldots, n - 1\}$ such that $(a, n) = 1$.
• For $n \in \mathbb{N}$, $\Lambda(n)$ denotes the function
\[ \Lambda(n) = \begin{cases} \log p & \text{if } n = p^k,\ k \geq 1, \\ 0 & \text{otherwise,} \end{cases} \]
and $T(n)$ denotes the function
\[ T(n) = \begin{cases} \log p & \text{if } n = p, \\ 0 & \text{otherwise.} \end{cases} \]
• The partial sums of the above functions are denoted as
\[ \psi(x) := \sum_{n \leq x} \Lambda(n) \quad \text{and} \quad \theta(x) := \sum_{n \leq x} T(n). \]
• For a complex valued function $f(x)$ and a real valued function $g(x)$ such that $g(x) > 0$ for all $x \geq x_0$, we write $f(x) = O(g(x))$ or $f(x) \ll g(x)$ if there exists a constant $C > 0$ such that $|f(x)| \leq C g(x)$ for all $x \geq x_0$. $C$ is called the implied constant. We write $f(x) = O_a(g(x))$ or $f(x) \ll_a g(x)$ if the above implied constant $C$ depends on a parameter $a$. We may also denote the above as $g(x) \gg f(x)$ or as $g(x) \gg_a f(x)$ as the case may be.
• For functions $f(x)$ and $g(x)$ as above, we write $f(x) = o(g(x))$ if
\[ \lim_{x \to \infty} \frac{f(x)}{g(x)} = 0. \]
• For functions $f(x)$ and $g(x)$ as above, we write $f(x) \sim g(x)$ and say that $f(x)$ is asymptotic to $g(x)$ if
\[ \lim_{x \to \infty} \frac{f(x)}{g(x)} = 1. \]
• Let $a$, $b$ and $m$ be integers with $m > 0$. We say that $a$ is congruent to $b$ modulo $m$ and write $a \equiv b \pmod{m}$ if $m \mid (a - b)$. If $m \nmid (a - b)$, we write $a \not\equiv b \pmod{m}$.
• For $n \in \mathbb{N}$, $\mathbb{Z}/n\mathbb{Z}$ denotes the set of all residue classes mod $n$. $(\mathbb{Z}/n\mathbb{Z})^*$ denotes the set of all coprime residue classes mod $n$.
• Let $a$ and $m$ be coprime integers such that $m > 0$. The smallest positive integer $k$ for which $a^k \equiv 1 \pmod{m}$ is called the order of $a$ modulo $m$ and is denoted by $\operatorname{ord}_m(a)$.
• For a positive integer $q$, $\chi \pmod{q}$ denotes a Dirichlet character modulo $q$. $G(q)$ denotes the set of Dirichlet characters mod $q$.
• For a positive real number $x$, $\pi(x)$ denotes the number of primes $p \leq x$.
• Let $a$ and $q$ be coprime positive integers. $\pi(x, q, a)$ denotes the number of primes $p \leq x$ such that $p \equiv a \pmod{q}$.
• For $A \subset \mathbb{N}$, $\sigma(A)$ denotes the natural density of $A$, whereas $\delta(A)$ denotes the Schnirelmann density of $A$.
• For integers $s, k \geq 1$ and $m \geq 0$, $r_{s,k}(m)$ denotes
\[ \#\left\{(x_1, x_2, \ldots, x_s) : x_i \in \mathbb{N} \cup \{0\},\ x_1^k + x_2^k + \cdots + x_s^k = m\right\}. \]
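The definitions of $\Lambda(n)$, $\psi(x)$ and $\theta(x)$ above can be made concrete with a short computation. The following is only a minimal sketch: the function names (`is_prime`, `von_mangoldt`, `psi`, `theta`) are illustrative choices, not notation from the text, and the naive trial division used here is adequate only for small arguments.

```python
from math import log


def is_prime(n: int) -> bool:
    """Trial-division primality test (adequate only for small n)."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def von_mangoldt(n: int) -> float:
    """Lambda(n) = log p if n = p^k for a prime p and k >= 1, else 0."""
    if n < 2:
        return 0.0
    p = 2
    while p * p <= n:
        if n % p == 0:
            # p is the smallest prime factor of n; strip it out completely.
            while n % p == 0:
                n //= p
            return log(p) if n == 1 else 0.0
        p += 1
    return log(n)  # no factor up to sqrt(n), so n itself is prime


def psi(x: float) -> float:
    """psi(x): sum of Lambda(n) over n <= x."""
    return sum(von_mangoldt(n) for n in range(1, int(x) + 1))


def theta(x: float) -> float:
    """theta(x): sum of log p over primes p <= x."""
    return sum(log(p) for p in range(2, int(x) + 1) if is_prime(p))
```

For instance, the prime powers up to 10 are 2, 3, 4, 5, 7, 8 and 9, so $\psi(10) = \log 2520$, while $\theta(10) = \log(2 \cdot 3 \cdot 5 \cdot 7) = \log 210$.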

Chapter 1

Introduction and overview

1.1. Introduction

The history¹ of number theory is replete with examples of problems which can be explained in simple language to an eager primary school student. However, finding solutions to many such problems has taken centuries of concerted effort by a wide community of mathematicians (amateur as well as professional). Some of these problems are still unsolved. Often, a study of these problems has resulted in deep insights and path-breaking ideas which have revolutionized mathematics. Some of these questions, including those covered in this book, originate from a common source going back several centuries.

¹ This chapter is a modified version of an article titled “Additive problems in number theory” by the second named author in Blackboard (Mathematics Teachers’ Association, India), Issue 3, 2021.

Diophantus of Alexandria was a scholar in ancient Greece. Although little is known about his life, extant references indicate that he was born in the early part of the third century CE. He wrote a series of thirteen books titled Arithmetica. The book series is considered the earliest text in Western European history in which mathematical ideas and questions were explained using symbols. The books contain several algebraic equations to which Diophantus sought solutions in integers. These textbooks became obscure in Western Europe during the period of the Dark Ages, but some of the books were preserved by Byzantine scholars and were rediscovered in Rome in the fifteenth century. The Arithmetica became available to European scholars when Claude Bachet published a Latin translation of the six surviving books.

Bachet’s translation soon gained the attention of mathematics lovers, among whom was a young French lawyer by the name of Pierre de Fermat. Fermat pursued mathematics as a hobby in his free time, and made significant contributions to the subject. Some of his pertinent observations and insights were written in the margins of his copy of Bachet’s translation and were rediscovered by his son a few years after Fermat passed away. An English translation of one such observation ([59, Page 3]) is as follows:

“Every number is a triangular number or the sum of two or three triangular numbers; every number is a square or the sum of two, three or four squares; every number is a pentagonal number or the sum of two, three, four or five pentagonal numbers; and so on . . . . The precise statement of this very beautiful and general theorem depends on the number of angles. The theorem is based on the most diverse and abstruse mysteries of numbers, but I am not able to include the proof here.”

The general theorem that Fermat is referring to is about what are called polygonal numbers. The note implies that Fermat had a proof, and it does seem plausible that he did. This theorem was proved in its entirety by Augustin-Louis Cauchy in 1813.

Our focus, however, will be on the part of Fermat’s note concerning squares, namely the conjecture that every natural number can be written as a sum of at most four squares. It is possible that Diophantus was familiar with this conjecture. However, the first recorded statement is due to Bachet in 1621 (in his translation of Arithmetica), which is how Fermat became familiar with it. Bachet also verified it for every number less than 326.
In 1748, Leonhard Euler wrote a letter to Christian Goldbach, which contains a fundamental step in the proof of the four square conjecture. This step is an explicit identity which shows that a product of two numbers, each of which is a sum of at most four squares, is also a sum of at most four squares. Thus, it is sufficient to prove that every prime number can be written as a sum of at most four squares. This was done by Joseph Louis Lagrange in 1770 and the four square theorem is now named after him.

Theorem 1.1 (Lagrange’s four square theorem). Every natural number is a sum of the squares of at most four natural numbers.
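The reduction to primes described above rests on Euler's product identity, and both the identity and Theorem 1.1 are easy to check numerically for small inputs. The sketch below is illustrative only: the function names are our own, the identity is written in the form coming from quaternion multiplication (one of several equivalent sign conventions, not necessarily the one in Euler's letter), and the search is plain brute force rather than the method of proof.

```python
from math import isqrt
from typing import Tuple

Quad = Tuple[int, int, int, int]


def four_squares(n: int) -> Quad:
    """Brute-force search for a >= b >= c >= d >= 0 with a^2+b^2+c^2+d^2 = n."""
    for a in range(isqrt(n), -1, -1):
        rest_a = n - a * a
        for b in range(min(a, isqrt(rest_a)), -1, -1):
            rest_b = rest_a - b * b
            for c in range(min(b, isqrt(rest_b)), -1, -1):
                rest_c = rest_b - c * c
                d = isqrt(rest_c)
                if d <= c and d * d == rest_c:
                    return (a, b, c, d)
    raise ValueError("no representation found (excluded by Lagrange's theorem)")


def euler_product(x: Quad, y: Quad) -> Quad:
    """Combine representations of m and n into one of m*n.

    Uses the four-square identity in its quaternion-norm form; absolute
    values are taken because only the squares of the entries matter.
    """
    a1, a2, a3, a4 = x
    b1, b2, b3, b4 = y
    return (
        abs(a1 * b1 - a2 * b2 - a3 * b3 - a4 * b4),
        abs(a1 * b2 + a2 * b1 + a3 * b4 - a4 * b3),
        abs(a1 * b3 - a2 * b4 + a3 * b1 + a4 * b2),
        abs(a1 * b4 + a2 * b3 - a3 * b2 + a4 * b1),
    )


def norm(x: Quad) -> int:
    """Sum of the squares of the four entries."""
    return sum(t * t for t in x)
```

For example, `euler_product(four_squares(7), four_squares(15))` returns four integers whose squares sum to $105 = 7 \cdot 15$, which is exactly the multiplicativity that reduces the theorem to the case of primes.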


Around the same time, the mathematician Edward Waring, in his book Meditationes Algebraicae, conjectured a generalization of the four square theorem. He stated that every nonnegative integer is the sum of four squares, nine cubes, nineteen fourth powers and so on. The phrase “and so on” has the following precise expression.

Conjecture 1.2 (Waring’s problem, 1770). For each positive integer $k \geq 2$, there exists a positive integer $g = g(k) \geq 2$ such that for any positive integer $n$, there exist $g$ nonnegative integers $x_1, x_2, \ldots, x_g$ such that
\[ n = x_1^k + x_2^k + \cdots + x_g^k. \]

We note here that $g(k)$ is chosen to be the minimal positive integer with the above property. That is, there exists a natural number $n$ which cannot be written as a sum of $g(k) - 1$ $k$th powers. Lagrange’s theorem is the assertion that $g(2) = 4$. Moreover, as per Waring’s conjecture, $g(3) = 9$ and $g(4) = 19$. As this book proceeds, we will be able to state and interpret Waring’s problem in various elegant ways. Waring’s problem leads us to ask two questions: firstly, does $g(k)$ exist for every $k$? Secondly, can we find a precise formula for $g(k)$ for all $k$?

In a parallel correspondence, Euler and Goldbach also discussed fundamental questions about expressing all natural numbers $> 1$ as sums of finitely many primes. In this direction, Goldbach wrote a letter to Euler in 1742, in which he made the following conjectures.

Conjecture 1.3 (Goldbach’s binary (strong) conjecture). Every even number $n \geq 4$ can be written as a sum of two primes.

Conjecture 1.4 (Goldbach’s ternary (weak) conjecture). Every odd number $n > 5$ can be written as a sum of three primes.

One sees immediately that the strong conjecture implies the weak one. This follows from the observation that an odd $n > 5$ can be written as $3 + k$, where $k$ is even and $> 2$, so that applying the strong conjecture to $k$ expresses $n$ as a sum of three primes. The ternary Goldbach conjecture is essentially a theorem due to the pioneering work of Ivan M. Vinogradov, who showed that the assertion is true for $n$ sufficiently large.
In one of the most interesting developments in the last decade, the full Goldbach ternary conjecture has been proved by Harald Helfgott, a mathematician at the University of Göttingen. However, the binary conjecture is still open.
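The statements of Conjectures 1.2–1.4 can be explored numerically for small $n$. The following is a minimal sketch under our own naming (`min_kth_powers`, `goldbach_pair`, `goldbach_triple` are illustrative, not taken from the text). It uses dynamic programming for the least number of $k$th powers and a sieve for primes; a finite check of this kind is of course evidence only, not a proof.

```python
from math import isqrt
from typing import List, Optional, Tuple


def min_kth_powers(limit: int, k: int) -> List[int]:
    """best[n] = least number of positive k-th powers summing to n (0 <= n <= limit)."""
    powers = []
    m = 1
    while m ** k <= limit:
        powers.append(m ** k)
        m += 1
    best = [0] * (limit + 1)
    for n in range(1, limit + 1):
        # 1 is always a k-th power, so the minimum is over a nonempty set.
        best[n] = 1 + min(best[n - p] for p in powers if p <= n)
    return best


def primes_up_to(n: int) -> List[int]:
    """Sieve of Eratosthenes; assumes n >= 2."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if sieve[i]]


def goldbach_pair(n: int, primes: List[int]) -> Optional[Tuple[int, int]]:
    """Smallest prime p with n - p also prime, returned as (p, n - p)."""
    prime_set = set(primes)
    for p in primes:
        if p > n - p:
            break
        if n - p in prime_set:
            return (p, n - p)
    return None


def goldbach_triple(n: int, primes: List[int]) -> Optional[Tuple[int, int, int]]:
    """Odd n > 5 written as 3 + (sum of two primes), as in the reduction above."""
    pair = goldbach_pair(n - 3, primes)
    return None if pair is None else (3,) + pair
```

With `limit = 1000`, the tables returned by `min_kth_powers` show that 7 needs four squares, 23 needs nine cubes and 79 needs nineteen fourth powers, in line with the values $g(2) = 4$, $g(3) = 9$ and $g(4) = 19$ quoted above.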


The Goldbach conjectures present an interesting contrast between the additive and multiplicative properties of primes. By the Fundamental Theorem of Arithmetic, any natural number $n > 1$ can be written uniquely as a product of powers of primes,
\[ n = p_1^{a_1} p_2^{a_2} \cdots p_k^{a_k}, \quad a_i \geq 1. \]
We now consider a number $c$. Can we write any $n > 1$ as a product of at most $c$ primes? The answer to this question is no. To see this, we note that there are infinitely many primes. Let us denote the $n$th prime number by $p_n$. Now, for any $c$, we have a number $n = p_1 p_2 \cdots p_{c+1}$, which, by the Fundamental Theorem of Arithmetic, cannot be written as a product of at most $c$ primes.

We now give an additive twist to the above question. Can we write any $n > 1$ as a sum of at most $c$ primes? Goldbach’s conjectures predict that not only do we have an affirmative answer to this question, but also that the value of $c$ is as small as 3.

The problems of Goldbach and Waring are often combined into a single question called the Goldbach–Waring problem, which asks: when can we write a natural number $n$ as a sum
\[ n = m_1^k + m_2^k + \cdots + m_g^k, \]
where the $m_i$ belong to a prescribed set $S$? If $k = 1$ and $S$ is the set of primes, we have the Goldbach problem. If $k \geq 2$ and $S$ is the set of natural numbers, we have the Waring problem. This perspective opens the door to a vista of new questions to which the methods of this book can be applied and the limitations of these methods explored.

This book is meant to be a survey of various additive problems in number theory related to Waring’s problem and the conjectures of Goldbach. It is aimed at undergraduate students eager to acquire a knowledge of these problems. Our endeavour is to introduce students to fundamental concepts in number theory and to familiarise them with important techniques in additive number theory, such as the circle method, in a self-contained manner. As such, this book will be accessible to a student who has had introductory courses in real and complex analysis, but who is learning number theory for the first time. This book is organized into the following chapters.


1.2. Preparatory chapters

1.2.1. Elementary number theory. Since our aim is to be as self-contained as possible, we start with a review of basic notions in number theory in Chapters 2–4. In Chapter 2, we introduce the student to fundamental notions at the heart of the study of numbers, such as divisibility, the Euclidean algorithm for finding the greatest common divisor of two integers, prime numbers and the Fundamental Theorem of Arithmetic. In Chapter 3, we introduce the notion of arithmetic functions, that is, complex-valued functions defined on the set of natural numbers. We introduce tools from analysis to study the behavior of various arithmetic functions which are relevant in additive number theory. In Chapter 4, we review basic notions and properties of congruence arithmetic. After learning the basic language of congruence arithmetic to express important divisibility properties of integers, the reader will learn important techniques to find solutions of what are called congruence equations. The contents of this chapter form a vital and foundational component of any study in number theory: in the context of this textbook, we will also provide details of specific theorems in congruence arithmetic which are necessary to address the questions of Waring and Goldbach.

1.2.2. Analytic number theory. In Chapter 5, we approach the study of prime numbers from an “analytic” viewpoint. We start by interpreting properties of prime numbers covered in the previous chapters in the language of infinite series. For example, the divergence of the series
\[ \sum_{p \text{ prime}} \frac{1}{p} \]
implies that there are infinitely many prime numbers. In this chapter, we cover mathematical tools which have been carefully developed over many centuries to understand prime numbers. One such tool is the well-known Riemann zeta function. The zeta function is defined as
\[ \zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s} \]
for a complex number $s \in \mathbb{C}$ with real part $\Re(s) > 1$. Much of the early work on the zeta function viewed this function for real numbers $s > 1$.


The study of the zeta function as a real-valued function has immediate connections with classical properties of prime numbers. For example, the fundamental theorem of arithmetic is equivalent to the assertion that for any real number $s > 1$,
\[ \zeta(s) = \prod_p \left( \sum_{k=0}^{\infty} \frac{1}{p^{ks}} \right) = \prod_p \left( 1 - \frac{1}{p^s} \right)^{-1}. \]
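The equality of the series and the product over primes can be observed numerically. The following is a rough sketch at $s = 2$, where the common value is known to be $\pi^2/6$. The function names and the truncation points are arbitrary illustrative choices, and both sides are only approximated by finite truncations.

```python
from math import isqrt, pi
from typing import List


def primes_up_to(n: int) -> List[int]:
    """Sieve of Eratosthenes; assumes n >= 2."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if sieve[i]]


def zeta_series(s: float, terms: int) -> float:
    """Partial sum of the Dirichlet series for zeta(s), real s > 1."""
    return sum(1.0 / n ** s for n in range(1, terms + 1))


def zeta_euler_product(s: float, prime_bound: int) -> float:
    """Euler product for zeta(s) truncated to primes p <= prime_bound."""
    product = 1.0
    for p in primes_up_to(prime_bound):
        product *= 1.0 / (1.0 - p ** (-s))
    return product


if __name__ == "__main__":
    print(zeta_series(2.0, 100_000), zeta_euler_product(2.0, 10_000), pi ** 2 / 6)
```

Both truncations approach $\pi^2/6$ from below, since every omitted term of the series is positive and every omitted factor of the product exceeds 1.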

In a breakthrough paper written in 1859 [63], Bernhard Riemann studied the zeta function $\zeta(s)$ as a function of a complex variable $s$ and outlined a detailed program linking the complex-analytic properties of $\zeta(s)$ with the distribution properties of prime numbers. He made two important observations. Firstly, the zeta function can be analytically continued to the entire complex plane except the point $s = 1$. Secondly, the zero-free regions of this function (that is, the regions where $\zeta(s) \neq 0$) in the half-plane $\Re(s) > 0$ have a direct bearing on estimates for the prime-counting function $\pi(x)$, defined as the number of primes up to $x$, for large values of $x$. In this context, Riemann made a conjecture, the well-known Riemann hypothesis, which predicts that any nonreal zero of $\zeta(s)$ has real part equal to $1/2$. This conjecture still remains unproved and has motivated a good deal of mathematics over the last 160 years.

The theme outlined by Riemann can be generalized to series of the form
\[ \sum_{n=1}^{\infty} \frac{f(n)}{n^s} \]
for various interesting arithmetic functions $f(n)$. In particular, we are interested in $L$-functions of the form
\[ \sum_{n=1}^{\infty} \frac{\chi(n)}{n^s}, \]
where $\chi$ is a special periodic complex-valued function called a Dirichlet character. $L$-functions were defined by Peter G. L. Dirichlet in order to study the distribution properties of primes in arithmetic progressions. As with the classical zeta function of Riemann, the study of zero-free regions of these $L$-functions has direct applications to the function $\pi(x, q, a)$ which counts the number of primes up to $x$ lying in the arithmetic progression $\{kq + a : k \in \mathbb{Z}\}$. Here, $\mathbb{Z}$ is the set of integers.


This analytic perspective is described in Chapter 5. More precisely, we discuss the following topics:

• The Riemann zeta function and, more generally, the Dirichlet series associated to suitable arithmetic functions. We derive their complex-analytic properties and show how these properties lead to estimates for the partial sums $\sum_{n \leq x} f(n)$.

• A sharp version of the prime number theorem, which states that
\[ \pi(x) = \operatorname{li} x + O\left( \frac{x}{(\log x)^C} \right) \]
for any $C > 1$, where $\operatorname{li} x$ is the logarithmic integral defined by
\[ \operatorname{li} x := \int_2^x \frac{dt}{\log t}. \]
We review a classical proof for the error term in the above asymptotic using information about the zero-free regions of the Riemann zeta function.

• Fundamental properties of Dirichlet characters and $L$-functions associated to them.

• A classical theorem of Carl L. Siegel and Arnold Walfisz which states that if $a$ and $q$ are coprime integers, then
\[ \pi(x, q, a) = \frac{1}{\phi(q)} \operatorname{li} x + O\left( \frac{x}{(\log x)^C} \right) \]
for any $C > 1$. Here, $\phi(q)$ denotes the number of integers $1 \leq n \leq q - 1$ which are coprime to $q$. The main idea in the proof of the Siegel–Walfisz theorem is an extension of the argument to prove the (sharp) prime number theorem stated above and requires a discussion of zero-free regions of Dirichlet $L$-functions.

The above topics have been covered in several expositions (see [16], [39], [43], [53], [54]). Our aim is to provide to the reader a selective review of essential topics required in the study of additive problems covered in this textbook.
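The two asymptotic formulas listed above can be compared with actual prime counts. The sketch below is a numerical illustration only: the helper names are our own, the logarithmic integral is approximated by a midpoint rule with an arbitrarily chosen number of steps, and agreement at one value of $x$ says nothing rigorous about the error terms.

```python
from math import gcd, isqrt, log
from typing import List


def primes_up_to(n: int) -> List[int]:
    """Sieve of Eratosthenes; assumes n >= 2."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if sieve[i]]


def li_from_2(x: float, steps: int = 200_000) -> float:
    """Midpoint-rule approximation to the integral of dt / log t over [2, x]."""
    h = (x - 2.0) / steps
    return h * sum(1.0 / log(2.0 + (j + 0.5) * h) for j in range(steps))


def euler_phi(q: int) -> int:
    """Number of residues a in {1, ..., q} coprime to q."""
    return sum(1 for a in range(1, q + 1) if gcd(a, q) == 1)


def primes_in_progression(primes: List[int], q: int, a: int) -> int:
    """pi(x, q, a) for the list of all primes <= x."""
    return sum(1 for p in primes if p % q == a % q)


if __name__ == "__main__":
    x = 10 ** 5
    primes = primes_up_to(x)
    li = li_from_2(x)
    print("pi(x) =", len(primes), " li x ~", round(li, 1))
    for a in (1, 3):
        print(f"pi(x, 4, {a}) =", primes_in_progression(primes, 4, a),
              " li x / phi(4) ~", round(li / euler_phi(4), 1))
```

At $x = 10^5$ there are 9592 primes, and the approximation to $\operatorname{li} x$ differs from this count by fewer than 50; the two progressions modulo 4 each receive close to half of the primes, as the Siegel–Walfisz formula predicts.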


1.3. Early developments in the study of Waring’s problem

With the preliminaries in place, we move to the first additive problem of interest to us, namely Waring’s problem.

1.3.1. Introduction to Waring’s problem. In Chapter 6, we put together theorems learned in the earlier chapters on elementary number theory to prove Lagrange’s four square theorem (Theorem 1.1). We then introduce Waring’s problem (Conjecture 1.2) and the interesting questions that it leads to. After a brief review of early developments in the study of Waring’s problem, we discuss a conjecture of Johann Albrecht Euler regarding the value of $g(k)$ and interesting developments around it.

1.3.2. Additive problems and Schnirelmann density. In the 1930s, the Russian mathematician Lev Schnirelmann introduced a new perspective to study additive problems in number theory. Let us take a subset $B$ of the set $\mathbb{N}$ of natural numbers. For $m \in \mathbb{N}$, Schnirelmann defined the “sumset”

\[ mB = \left\{ \sum_{i=1}^{m} b_i : b_i \in B \cup \{0\} \right\}. \]

He also defined a new notion of density of subsets of $\mathbb{N}$, called the Schnirelmann density, and showed that if $B$ has positive Schnirelmann density, then $mB = \mathbb{N}$ for some $m \in \mathbb{N}$. In other words, every natural number can be written as a sum of at most $m$ elements from the set $B$. His ideas lead to neat expressions for additive problems such as Waring’s problem and the Goldbach conjectures. In the context of Waring’s problem, we are interested in the sets

\[ A_k := \{ n^k : n \in \mathbb{N} \}, \tag{1.1} \]

where $k \ge 2$ is a fixed positive integer. Waring’s problem asks if, for each $k$, one can find a natural number $g = g(k)$ such that $gA_k = \mathbb{N}$. Schnirelmann’s observation is not directly applicable to $A_k$ since it has Schnirelmann density 0. Instead, one first shows that there exists $l \in \mathbb{N}$ such that $\delta(lA_k) > 0$, where $\delta$ denotes the Schnirelmann density. Applying Schnirelmann’s observation to the set $lA_k$, there exists $m \in \mathbb{N}$ such that $mlA_k = \mathbb{N}$. The existence of $l \in \mathbb{N}$ such that $\delta(lA_k) > 0$ was shown by Linnik [47] in 1943. This solves Waring’s problem (Conjecture 1.2) as we can now take $g = ml$. Linnik’s


method was further simplified by Loo Keng Hua [40, Chapters 18 and 19]. In Chapter 7, we learn about the Schnirelmann density of subsets of ℕ. We study the properties of the Schnirelmann density and its connection with sumsets. Finally, we outline Linnik’s solution of Waring’s problem combined with a theorem of Hua on exponential sums.

1.4. The method of exponential sums

The additive properties of a subset of natural numbers are intricately connected to certain associated exponential sums. This connection lies at the heart of most of the work done on the Goldbach conjectures as well as Waring’s problem. For a real number $\alpha$, let $e(\alpha) := e^{2\pi i \alpha} = \cos 2\pi\alpha + i \sin 2\pi\alpha$. The method of exponential sums originates in the following integral identity: for an integer $m$,

\[ \int_0^1 e(m\alpha)\, d\alpha = \begin{cases} 1 & \text{if } m = 0, \\ 0 & \text{if } m \neq 0. \end{cases} \tag{1.2} \]

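To use this identity for our purposes, let us take a set $\mathcal{A} \subset \mathbb{N} \cup \{0\}$. For a natural number $n$, we define the exponential sum

\[ f(\alpha) := \sum_{r \in \mathcal{A},\; r \le n} e(r\alpha). \]

Then,

\[ f^g(\alpha) = \sum_{r_1, r_2, \dots, r_g \in \mathcal{A},\; r_i \le n} e((r_1 + r_2 + \cdots + r_g)\alpha). \]

Using the integral identity (1.2), we deduce that

\[ \int_0^1 f^g(\alpha)\, e(-n\alpha)\, d\alpha = \#\{ (r_1, r_2, \dots, r_g) \in \mathcal{A}^g : r_1 + r_2 + \cdots + r_g = n \}. \]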
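The counting identity obtained from (1.2) can be checked numerically. The following Python sketch is only an illustration: the function names and the choice of the set (the squares 0, 1, 4, 9) are assumptions made here, not taken from the text. It uses the fact that the integrand is a finite sum of terms $e(m\alpha)$ with integer frequencies $m$, so averaging over sufficiently many equally spaced points reproduces the integral up to rounding error.

```python
import cmath
from itertools import product


def e(alpha):
    """e(alpha) = exp(2*pi*i*alpha), as in the text."""
    return cmath.exp(2j * cmath.pi * alpha)


def count_by_integral(A, g, n):
    """Evaluate the integral of f(alpha)^g * e(-n*alpha) over [0, 1].

    The integrand only contains frequencies m with -n <= m <= (g - 1)*n,
    so the average over N = g*n + 1 equally spaced points picks out the
    constant term, which is the value of the integral.
    """
    elems = [r for r in A if r <= n]
    N = g * n + 1
    total = 0
    for j in range(N):
        alpha = j / N
        f = sum(e(r * alpha) for r in elems)
        total += f ** g * e(-n * alpha)
    return round((total / N).real)


def count_brute_force(A, g, n):
    """Count ordered g-tuples of elements of A (each <= n) with sum n."""
    elems = [r for r in A if r <= n]
    return sum(1 for t in product(elems, repeat=g) if sum(t) == n)


squares_with_zero = [k * k for k in range(4)]  # 0, 1, 4, 9
print(count_by_integral(squares_with_zero, 4, 10))
print(count_brute_force(squares_with_zero, 4, 10))
```

Both calls count ordered 4-tuples of squares (values, not signed roots) with sum 10, so the two printed numbers should agree.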

A question about expressing $n$ as a sum of $m$ elements of $\mathcal{A}$ now reduces to the following question.

Question 1.5. Does there exist a positive integer $m$ such that

\[ \int_0^1 f^m(\alpha)\, e(-n\alpha)\, d\alpha > 0 \]

for all $n$, or at least for all sufficiently large $n$?


Clearly, the above integral depends on the exponential sum 𝑓(𝛼) and in Chapter 8, we learn techniques to evaluate such sums corresponding to sets 𝒜 connected with Waring’s problem as well as the Goldbach conjectures. We also learn how the values of these sums at an irrational number 𝛼 are influenced by the Diophantine approximation properties of 𝛼. This connection proves extremely useful in the evaluation of the above integral.

1.5. Origins of the circle method and applications to additive problems

As in the previous section, let us consider a subset $\mathcal{A} \subset \mathbb{N} \cup \{0\}$. Additive questions described in this article can be modified into the following general form.

Question 1.6. Does there exist $g \in \mathbb{N}$ such that every $n \in \mathbb{N}$ can be written as a sum of $g$ elements in $\mathcal{A}$? More precisely, let $r_{\mathcal{A},g}(n)$ be defined as

\[ r_{\mathcal{A},g}(n) := \#\left\{ (x_1, x_2, \dots, x_g) : x_i \in \mathcal{A},\ \sum_{i=1}^{g} x_i = n \right\}. \]

That is,

\[ r_{\mathcal{A},g}(n) = \int_0^1 (f(\alpha))^g\, e(-n\alpha)\, d\alpha, \qquad f(\alpha) = \sum_{m \in \mathcal{A},\; m \le n} e(m\alpha). \]

Does there exist a natural number $g$ such that $r_{\mathcal{A},g}(n) > 0$ for each $n \in \mathbb{N}$ or, at least, for sufficiently large values of $n$? A related question to ask is if we can find an exact formula for $r_{\mathcal{A},g}(n)$ for given $g, n \in \mathbb{N}$. Alternatively, for each $g$, can we determine the asymptotic growth of $r_{\mathcal{A},g}(n)$ as $n \to \infty$?

In 1918, Hardy and Ramanujan studied the above question through a complex-analytic approach which is referred to as the circle method. This method was originally developed in an epoch-making 1918 paper of Hardy and Ramanujan [31], in which they derived an asymptotic formula for the partition function $p(n)$, which denotes the number of representations of $n$ as a sum of natural numbers less than or equal to $n$. The roots of their work go back further to one of the letters that Ramanujan had written to Hardy from India, which indicates that Ramanujan had a


rudimentary form of the circle method in mind. The reader can find the historical details in Chapter 5 of [55].

For the partition function studied by Hardy and Ramanujan, formulated in the language of Question 1.6, one considers $\mathcal{A} = \mathbb{N}$. For Waring’s problem, we take $\mathcal{A} = A_k \cup \{0\}$. Here, $A_k$ is as defined in Equation (1.1). Let $\mathbb{P}$ denote the set of prime numbers. For the conjectures of Goldbach, we take $\mathcal{A} = \mathbb{P}$. For the binary conjecture, we take $g = 2$ and $n$ is an even integer $\ge 4$, whereas for the ternary conjecture, we take $g = 3$ and $n$ is an odd integer $\ge 7$. Further questions of Goldbach type could be asked for larger values of $g$.

After Ramanujan’s early demise, Hardy and John E. Littlewood developed the method to derive asymptotic formulas in Waring’s problem. They also treated questions of Goldbach type by the circle method. Later, Vinogradov [77] introduced new methods (most notably the method of exponential sums) that allowed for an unconditional treatment of the Goldbach conjectures.

The most fundamental observation in the application of the circle method to Waring’s problem as well as the Goldbach conjectures is that the function $f(\alpha) = \sum_{m \in \mathcal{A},\; m \le n} e(m\alpha)$ takes unusually large values at rational numbers $\alpha = a/q$ with suitably bounded denominators. So, we partition the interval $[0, 1]$ into two parts: the major arcs $\mathfrak{M}$, which are unions of very tiny intervals around the rational numbers at which the function peaks, and the minor arcs $\mathfrak{m} = [0, 1] \setminus \mathfrak{M}$, which are the portions left behind in the unit interval after taking away the major arcs.

1.5.1. The circle method and Waring’s problem. Between 1920 and 1928, Hardy and Littlewood applied the circle method to the evaluation of

\[ \int_0^1 (f(\alpha))^g\, e(-n\alpha)\, d\alpha, \qquad f(\alpha) = \sum_{m \in A_k \cup \{0\},\; m \le n} e(m\alpha). \]

They wrote a series of papers culminating in [30], in which they showed that for $g > 2^k$, one can obtain asymptotics for the above integral as $n \to \infty$. For this, one has to isolate the major and minor arcs and evaluate the integral $\int (f(\alpha))^g\, e(-n\alpha)\, d\alpha$ over each of them. One then obtains lower


bounds for $g = g(k)$ such that the main term will dominate the error term, leading to a positive value for $r_{A_k \cup \{0\},\, g}(n)$ for all $n \in \mathbb{N}$. We describe the above work of Hardy and Littlewood on the application of the circle method to Waring’s problem in Chapter 9.

1.5.2. The circle method and the Goldbach conjectures. In 1923, Hardy and Littlewood used the circle method to prove the ternary Goldbach conjecture for “sufficiently large” odd values of $n \ge C$ (under the assumption of the generalized Riemann hypothesis (GRH)). That is, under the condition that the GRH holds, any sufficiently large odd number $n$ can be written as a sum of three primes. This was the first major development in the study of Goldbach conjectures since 1742. In 1937, the Russian mathematician Vinogradov introduced some remarkably new and beautiful ideas which circumvented the assumption of the generalized Riemann hypothesis to prove the result of Hardy and Littlewood. He proved the ternary Goldbach conjecture unconditionally for “sufficiently large” odd values $n \ge C$. Only recently did Helfgott show that we can assert this for $n > 5$.

Before we proceed further, we make some remarks about the use of the phrase “sufficiently large”. The theorems of Hardy–Littlewood and Vinogradov were not able to specify an explicit value of $C$ such that the ternary Goldbach conjecture would hold for all odd $n \ge C$. What they showed was that such a $C$ exists. If one could provide an explicit value of $C$, then one could verify the conjecture for odd $n < C$ through computations and derive a complete proof of the ternary Goldbach conjecture (or disprove it if counterexamples exist). Therefore, three main challenges needed to be overcome before one could complete the treatment of Goldbach’s conjectures.
(1) Make Vinogradov’s theorem effective by establishing an explicit number $C$ such that the ternary Goldbach conjecture holds for $n \ge C$.
(2) Verify the conjecture for $n < C$ case-by-case.
(3) If $C$ is too large for our current computational resources, then refine $C$ down to a value for which computational verification of $n < C$ is feasible.

These challenges were overcome through multiple developments which are encapsulated below.


• In 1956, K. G. Borodzkin [7] showed that the ternary Goldbach conjecture holds for all $n \ge C = 10^{4008659}$.
• In 1989, Jing Run Chen and Tian Ze Wang [12] reduced $C$ to $10^{43000}$ and in 1996, to $10^{7194}$ [13].
• In 1997, Jean-Marc Deshouillers, Gove W. Effinger, Herman te Riele and Dmitrii Zinoviev [20] proved the ternary Goldbach conjecture for all odd numbers $n > 5$, but conditionally on GRH.
• $C = 2 \cdot 10^{1346}$ was obtained by Ming-Chit Liu and Wang [48] in 2002. Until 2013, this remained the lowest known unconditional value for $C$.
• On the most powerful computers, computer verification of the conjecture can be done up to the order of $10^{30}$. In fact, in 2013, Helfgott and David J. Platt [37] verified the conjecture for odd $n \le 8.875 \cdot 10^{30}$.
• In 2013, Helfgott ([34], [35]) proved that the ternary Goldbach conjecture holds for odd $n \ge 10^{27}$. Since the conjecture had already been verified for $n \le 10^{27}$ [37], Helfgott’s result was the proverbial last nail in the coffin that led to a complete proof of the ternary Goldbach conjecture.

All the above results use the circle method. In Chapter 10, we learn the application of the circle method to prove Vinogradov’s assertion that the ternary Goldbach conjecture holds for sufficiently large odd values of $n$. We also make remarks about the limitations of the circle method in addressing the binary Goldbach conjecture.

Finally, in Chapter 11, we provide the reader a lightning tour of the circle method, and describe the underlying philosophy of this method. We mention references where the interested reader can further explore the circle method in greater depth. We also briefly indicate contemporary applications and generalizations of the circle method.

Chapter 2

Fundamental theorem of arithmetic

In 300 B.C.E., Euclid of Alexandria wrote a series of 13 volumes under the title “Elements”. These volumes contain a systematic presentation of several mathematical concepts through precise definitions, theorems and their deductive proofs. They form the structural foundation of logic and mathematics as we study it today; in fact, much of what we learn in high school mathematics today goes back to the contents of these volumes. In Book 7 of “Elements,” Euclid describes fundamental notions in elementary number theory, namely, divisibility, prime numbers and composite numbers. He also presents an algorithm for finding the greatest common divisor of two natural numbers. This algorithm can further be used to obtain solutions 𝑥, 𝑦 in integers to the linear Diophantine equation 𝑎𝑥 + 𝑏𝑦 = (𝑎, 𝑏), where (𝑎, 𝑏) refers to the greatest common divisor of integers 𝑎 and 𝑏. This algorithm was independently discovered by the Indian astronomer and mathematician Aryabhata in the sixth century C.E. Aryabhata named this method the “Kuttaka” method, that is, the “pulverizer” method; the choice of this name will become clear to us after we learn this method. In this chapter, we quickly review these fundamental notions and study two important theorems at the heart of number theory. We review the fundamental theorem of arithmetic which states that every natural


number other than 1 is either a prime or can be uniquely written as a product of primes. We also learn that there are infinitely many primes.

2.1. Mathematical induction Henceforth, we will denote the set of natural numbers as ℕ and the set of integers as ℤ. We start this section with the following fundamental property of integers, namely the well-ordering principle. Principle 2.1 (Well-ordering principle). Every nonempty set of nonnegative integers contains a least element. That is, every such set 𝑆 contains an element 𝑠 such that 𝑠 ≤ 𝑎 for every 𝑎 ∈ 𝑆. The well-known principle of mathematical induction follows from the well-ordering principle. Theorem 2.2 (Principle of mathematical induction). Let 𝑆 be a set of positive integers with the properties (a) 1 ∈ 𝑆. (b) If 𝑘 ∈ 𝑆, then 𝑘 + 1 ∈ 𝑆. Then 𝑆 = ℕ. Proof. Let 𝑇 be the set of natural numbers which do not lie in 𝑆. Let us assume that 𝑇 is nonempty. By Principle 2.1, 𝑇 contains a least element, say 𝑡. Clearly, 𝑡 > 1, since, by (a), 1 ∈ 𝑆. Thus, 0 < 𝑡 − 1 < 𝑡. Since 𝑡 is the least element in 𝑇, 𝑡 − 1 is not in 𝑇 and therefore is an element of 𝑆. But, by part (b), if 𝑡 − 1 is an element of 𝑆, then so is (𝑡 − 1) + 1 = 𝑡, which is a contradiction to the fact that 𝑡 is an element of 𝑇. Thus, our assumption is false. This proves that 𝑇 is empty and therefore 𝑆 = ℕ. □ In Theorem 2.2, (a) is usually called the basis hypothesis and (b) is called the induction hypothesis. Remark 2.3. In some cases, in the basis hypothesis, 1 can be replaced by some other positive integer 𝑎. In this case, we conclude that 𝑆 is the set of all integers ≥ 𝑎. For various purposes, we use another form of the induction principle given below.


Theorem 2.4 (Second principle of induction). Let $S$ be a set of positive integers with the properties
(a) $1 \in S$.
(b) If $1, 2, 3, \dots, k \in S$, then $k + 1 \in S$.
Then $S = \mathbb{N}$.

Proof. The proof is similar to that of Theorem 2.2. Let $T$ be the set of natural numbers which do not lie in $S$ and let us assume that $T$ is nonempty. Applying the well-ordering principle, let $t$ be the least element in $T$. By (a), $t > 1$, that is, $t - 1 > 0$. Since $t$ is the least element in $T$, none of the elements $1, 2, \dots, t - 1$ lies in $T$. Thus, $1, 2, \dots, t - 1$ are elements of $S$ and therefore, by the induction hypothesis, so is $t$, which contradicts the fact that $t \in T$. Thus, our assumption is false and $T$ is empty. This proves that $S = \mathbb{N}$. □

Induction is a very powerful tool in mathematics, but one needs to understand when and in what form to use it.

2.1.1. Exercises.

Exercise 2.1.1.1. Prove that for every natural number $n$,

\[ 1 \cdot 1! + 2 \cdot 2! + \cdots + n \cdot n! = (n + 1)! - 1. \]

Exercise 2.1.1.2. Prove that for any natural number $n$,

\[ 1^3 + 2^3 + \cdots + n^3 = \left[ \frac{n(n + 1)}{2} \right]^2. \]

Exercise 2.1.1.3. Prove that the cube of any integer can be written as the difference of two squares.

Exercise 2.1.1.4. Show using induction that $a^n - 1 = (a - 1)(1 + a + a^2 + a^3 + \cdots + a^{n-1})$. [Hint: $a^{n+1} - 1 = (a + 1)(a^n - 1) - a(a^{n-1} - 1)$.]

2.2. Divisibility Let 𝑎, 𝑏 ∈ ℤ with 𝑏 ≠ 0. We say that 𝑏 divides 𝑎 if there exists an integer 𝑐 such that 𝑏𝑐 = 𝑎. Notationally, we write this as 𝑏 ∣ 𝑎.


We shall frequently appeal to the division algorithm, which is stated as follows:

Theorem 2.5 (Division algorithm). For any $a, b \in \mathbb{Z}$ with $b > 0$, there exist unique $q, r \in \mathbb{Z}$ such that $a = bq + r$ and $0 \le r < b$.

We prove this theorem in the following set of exercises:

2.2.1. Exercises.

Exercise 2.2.1.1. For $a, b \in \mathbb{Z}$ with $b \neq 0$, consider the set $S = \{a - xb : x \in \mathbb{Z} \text{ and } a - xb \ge 0\}$. Prove that $S$ is a nonempty set.

Exercise 2.2.1.2. Use Exercise 2.2.1.1 to show that, for $b > 0$, there exists an integer $q$ such that $0 \le a - qb < b$.

Exercise 2.2.1.3. Prove Theorem 2.5. [Hint: Exercises 2.2.1.1 and 2.2.1.2 take care of existence of $q$ and $r$. It now needs to be shown that $q$ and $r$ under the restrictions of Theorem 2.5 are unique.]
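For readers who wish to experiment, the statement of Theorem 2.5 can be illustrated in a few lines of Python. This is a minimal sketch and not part of the text: the name `division_algorithm` is a hypothetical helper, and it relies on the fact that Python's built-in `divmod` returns a remainder in $[0, b)$ whenever $b > 0$, including for negative $a$.

```python
def division_algorithm(a, b):
    """Return (q, r) with a == b*q + r and 0 <= r < b, assuming b > 0."""
    if b <= 0:
        raise ValueError("b must be a positive integer")
    # Floor division rounds toward minus infinity, so the remainder
    # is nonnegative even when a is negative.
    q, r = divmod(a, b)
    return q, r


for a, b in [(17, 5), (-17, 5), (4, 9)]:
    q, r = division_algorithm(a, b)
    print(f"{a} = {b}*({q}) + {r}")
```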

2.3. Greatest common divisor

Definition 2.6. For nonzero integers $a$ and $b$, the greatest common divisor of $a$ and $b$ is the largest integer that divides both of them. The greatest common divisor of $a$ and $b$ is denoted as $\gcd(a, b)$ or simply $(a, b)$. If $(a, b) = 1$, we say that $a$ and $b$ are relatively prime or coprime.

It is not too difficult to prove Lemma 2.7:

Lemma 2.7. $d = (a, b)$ if and only if the following two conditions hold.
(a) $d \mid a$ and $d \mid b$.
(b) If $e \mid a$ and $e \mid b$, then $e \mid d$.

How does one find the gcd of two integers? The immediate idea that comes to mind is to factorize $a$ and $b$ and read off the common factors. However, this method is not feasible if the numbers involved are big. A more efficient algorithm, known as the Euclidean algorithm, involves a successive application of the division algorithm until one hits the zero remainder. The general method is described as follows:

\begin{align*}
a &= q_1 b + r_1, & 0 \le r_1 < b, \\
b &= q_2 r_1 + r_2, & 0 \le r_2 < r_1, \\
r_1 &= q_3 r_2 + r_3, & 0 \le r_3 < r_2, \\
r_2 &= q_4 r_3 + r_4, & 0 \le r_4 < r_3, \\
&\ \ \vdots \\
r_{n-3} &= q_{n-1} r_{n-2} + r_{n-1}, & 0 \le r_{n-1} < r_{n-2}, \\
r_{n-2} &= q_n r_{n-1} + r_n, & 0 \le r_n < r_{n-1}, \\
r_{n-1} &= q_{n+1} r_n + 0.
\end{align*}

The last nonzero remainder $r_n$ is the greatest common divisor of $a$ and $b$. By Lemma 2.7, in order to prove this, we need to show two things:
(a) $r_n \mid a$ and $r_n \mid b$.
(b) If $d \mid a$ and $d \mid b$, then $d \mid r_n$.

In order to prove (a), we observe that the last equation $r_{n-1} = q_{n+1} r_n + 0$ shows that $r_n \mid r_{n-1}$. Then, the equation immediately above the last equation shows that $r_n \mid r_{n-2}$, since it divides both $r_n$ and $r_{n-1}$. Working our way up, step by step, we deduce that $r_n \mid b$ and $r_n \mid a$.

To prove (b), let $d \mid a$ and $d \mid b$. From the first equation $a = q_1 b + r_1$, we see that $d \mid r_1$. From the second equation, since $d \mid b$ and $d \mid r_1$, we see that $d \mid r_2$. Working our way down till the second last equation, we get that $d \mid r_n$.

Thus, we have proved both conditions (a) and (b). This implies, by Lemma 2.7, that $r_n = (a, b)$. Thus, we have Algorithm 2.8 to compute the greatest common divisor of two (nonzero) integers $a$ and $b$.

Algorithm 2.8 (Euclidean algorithm). To compute the greatest common divisor of $a$ and $b$, taking $r_{-1} = a$ and $r_0 = b$, we compute successive quotients and remainders

\[ r_{i-1} = q_{i+1} r_i + r_{i+1} \]

for $i = 0, 1, 2, \dots$ until $r_{n+1} = 0$. The last nonzero remainder $r_n$ is the greatest common divisor of $a$ and $b$.

Remark 2.9. The Euclidean algorithm terminates after a finite number of steps because the successive remainders are decreasing, $r_0 > r_1 > r_2 > r_3 > \cdots$. The remainders $r_i$ are all nonnegative. So, we have a strictly decreasing sequence of nonnegative integers, which eventually must reach 0.
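The Euclidean algorithm translates directly into a short loop. The Python sketch below is offered as an illustration under the assumption that both inputs are positive; the function name `euclid_steps` and the tuple layout of the recorded steps are choices made here, not notation from the text. It records each division step so that the chain of equations can be inspected.

```python
def euclid_steps(a, b):
    """Run the Euclidean algorithm on positive integers a and b.

    Returns (gcd, steps), where each step is a tuple
    (dividend, quotient, divisor, remainder) with
    dividend == quotient * divisor + remainder.
    """
    steps = []
    while b != 0:
        q, r = divmod(a, b)
        steps.append((a, q, b, r))
        a, b = b, r
    # When the remainder reaches 0, a holds the last nonzero remainder.
    return a, steps


g, steps = euclid_steps(1479, 272)
for dividend, q, divisor, r in steps:
    print(f"{dividend} = {q} * {divisor} + {r}")
print("gcd:", g)
```

By Remark 2.9 the loop terminates, since the remainders form a strictly decreasing sequence of nonnegative integers.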


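We now prove that there exist integers $x$ and $y$ such that $ax + by = (a, b)$. Yet another method of proving this is provided in Exercise 2.3.1.2. Observe that

\begin{align*}
\begin{pmatrix} a \\ b \end{pmatrix}
&= \begin{pmatrix} q_1 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} b \\ r_1 \end{pmatrix}
= \begin{pmatrix} q_1 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} q_2 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} r_1 \\ r_2 \end{pmatrix} \\
&= \begin{pmatrix} q_1 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} q_2 & 1 \\ 1 & 0 \end{pmatrix} \cdots \begin{pmatrix} q_n & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} r_{n-1} \\ r_n \end{pmatrix} \\
&= A \begin{pmatrix} r_{n-1} \\ r_n \end{pmatrix},
\end{align*}

where $A$ is a matrix with integer entries and determinant equal to $\pm 1$. Thus, $A^{-1}$ is a matrix with integer entries, say

\[ A^{-1} = \begin{pmatrix} c & d \\ x & y \end{pmatrix}, \]

such that

\[ \begin{pmatrix} c & d \\ x & y \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} r_{n-1} \\ r_n \end{pmatrix}. \]

Thus, there exist integers $x$ and $y$ such that $ax + by = r_n = (a, b)$.

In practice, in order to find integers $x$ and $y$ for which $ax + by = (a, b)$, we again work our way through the steps of the Euclidean algorithm as follows:

\begin{align*}
a = q_1 b + r_1 &\ \Rightarrow\ r_1 = a - q_1 b, \\
b = q_2 r_1 + r_2 &\ \Rightarrow\ r_2 = b - q_2 r_1,
\end{align*}

which implies that

\[ r_2 = b - q_2 (a - q_1 b) = -q_2 a + (1 + q_1 q_2) b. \]

Similarly,

\begin{align*}
r_3 = r_1 - q_3 r_2 &= (a - q_1 b) - q_3 \left( -q_2 a + (1 + q_1 q_2) b \right) \\
&= (1 + q_2 q_3) a - (q_1 + q_3 + q_1 q_2 q_3) b
\end{align*}

and so on. Finally, as we keep moving down, we get $r_n = ax + by$ for some integers $x$ and $y$. We also observe that if $x$ and $y$ are solutions of the above linear equation, then, for any integer $k$,

\[ a(x + kb) + b(y - ka) = r_n. \]

Thus, the equation $ax + by = (a, b)$ has infinitely many integer solutions in $x$ and $y$.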
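The back-substitution described above can be carried out alongside the Euclidean algorithm itself by tracking, at every step, coefficients that express the current remainder as an integer combination of $a$ and $b$. The following Python sketch shows one common way of doing this, assuming positive inputs; the function name `extended_gcd` and its variable names are illustrative choices and do not come from the text.

```python
def extended_gcd(a, b):
    """Return (g, x, y) with a*x + b*y == g == gcd(a, b), for positive a, b.

    Invariant maintained by the loop:
        old_r == a*old_x + b*old_y   and   r == a*x + b*y.
    """
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y


g, x, y = extended_gcd(240, 46)
print(g, x, y, 240 * x + 46 * y)

# Shifting a solution as in the text gives further solutions.
for k in range(-2, 3):
    assert 240 * (x + k * 46) + 46 * (y - k * 240) == g
```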


2.3.1. Exercises.

Exercise 2.3.1.1. Use the Euclidean algorithm to obtain $\gcd(143, 227)$, $\gcd(272, 1479)$, and $\gcd(54321, 9876)$.

Exercise 2.3.1.2. Let $a$ and $b$ be two nonzero integers. Consider the set $S = \{ax + by : ax + by > 0;\ x, y \in \mathbb{Z}\}$.
(a) Show that $S$ must contain a smallest element $d$.
(b) By the definition of $S$ and (a), there exist integers $x$ and $y$ for which $d = ax + by$. Prove that $d \mid a$ and $d \mid b$.
(c) Finally, prove that $d = \gcd(a, b)$.
This proves that the smallest positive value of $ax + by$ is the gcd of $a$ and $b$.

Exercise 2.3.1.3. Find integers $x$ and $y$ such that $95x + 432y = 1$. Show that there are infinitely many such integers.

Exercise 2.3.1.4. Show that if $(a, b) = 1$ and $x_0, y_0$ is an integer solution of $ax + by = 1$, then all integer solutions are of the form $x = x_0 + bk$, $y = y_0 - ak$, $k \in \mathbb{Z}$.

Exercise 2.3.1.5. Let $d = \gcd(a, b)$. Show that the linear equation $ax + by = c$ has integer solutions in $x$ and $y$ if and only if $d \mid c$. Show, also, that if $(x_0, y_0)$ is a particular integer solution of this equation, then all other integer solutions are given by

\[ x = x_0 + \frac{b}{d} k, \qquad y = y_0 - \frac{a}{d} k, \qquad k \in \mathbb{Z}. \]

Exercise 2.3.1.6. Find integers $x, y, z$ such that $35x + 55y + 77z = 1$. Show that there are infinitely many such integers.

Exercise 2.3.1.7. Let $a$ be a natural number $> 1$. If $d \mid n$, then show that $a^d - 1$ divides $a^n - 1$.

Exercise 2.3.1.8. Let $a$ be a natural number greater than 1. Show that $(a^m - 1, a^n - 1) = a^{(m,n)} - 1$.


2.4. Prime numbers and unique factorization A natural number is said to be a prime number if it does not have any positive divisors other than 1 and itself. Henceforth, the letter 𝑝 will denote a prime number. The number 1 is not considered a prime. We start with the following theorem. Theorem 2.10. Every integer 𝑛 > 1 is either a prime number or a product of (not necessarily distinct) prime numbers. Proof. We use the second principle of induction as stated in Theorem 2.4. The theorem is clearly true for 𝑛 = 2. Let us suppose that the statement is true for every integer < 𝑛. If 𝑛 is prime, we are done. If not, then it has a divisor not equal to 1 and itself. Thus, 𝑛 = 𝑐𝑑 for some 1 < 𝑐 < 𝑛 and 1 < 𝑑 < 𝑛. By the induction hypothesis, both 𝑐 and 𝑑 are either primes or products of prime numbers and therefore, so is 𝑛 = 𝑐𝑑. Thus, the theorem is proved by the second principle of induction. □ Theorem 2.11 (Euclid’s Lemma). Let 𝑎 and 𝑏 be nonzero integers. If 𝑝 is a prime and 𝑝|𝑎𝑏, then 𝑝|𝑎 or 𝑝|𝑏. Proof. Suppose 𝑝 is a prime such that 𝑝|𝑎𝑏. If 𝑝 does not divide 𝑎, then 𝑝 and 𝑎 are relatively prime. By Exercise 2.3.1.2, there exist integers 𝑥 and 𝑦 such that 𝑎𝑥 + 𝑝𝑦 = 1. Thus, 𝑎𝑏𝑥 + 𝑝𝑏𝑦 = 𝑏. Since 𝑝|𝑎𝑏, we have, 𝑝|(𝑎𝑏𝑥 + 𝑝𝑏𝑦) and therefore, 𝑝|𝑏. □ We are now ready to prove the fundamental theorem of arithmetic. It is a curious historical fact that this theorem is first stated in Carl F. Gauss’s 1801 textbook Disquisitiones Arithmeticae and not in the works of Euclid. Theorem 2.12 (Fundamental theorem of arithmetic). Every integer 𝑛 ≥ 2 is either a prime or can be factored uniquely into a product of primes. Remark 2.13. Rearrangement of prime factors of a number is not a new factorization. Thus, while Theorem 2.10 states that any 𝑛 can be factorized into a product of primes in some way, the fundamental theorem of arithmetic makes a stronger assertion that 𝑛 can be factorized into such a product in only one way, up to a rearrangement of factors. Proof. 
If 𝑛 is a prime, there is nothing to prove. If not, we have proved the factorization of 𝑛 into a product of primes in Theorem 2.10. We are


now left with proving uniqueness. Let us apply an inductive argument again. The theorem is clearly true for $n = 2$. Let us suppose that it is true for all integers $m$ with $2 \le m < n$. Suppose we have

\[ n = p_1 p_2 \cdots p_k = q_1 q_2 \cdots q_s, \]

where the $p_i$’s and $q_j$’s are primes. Thus, $p_1 \mid q_1 (q_2 q_3 \cdots q_s)$. By Theorem 2.11, either $p_1 \mid q_1$, in which case $p_1 = q_1$ since $q_1$ is prime, or $p_1 \mid q_2 q_3 \cdots q_s$. By a repeated application of Theorem 2.11, we see that $p_1 = q_j$ for some $1 \le j \le s$, say $j = 1$. Thus, by cancellation, we get

\[ p_2 p_3 \cdots p_k = q_2 q_3 \cdots q_s. \]

Since $p_2 p_3 \cdots p_k < n$, by the induction hypothesis, we get that $k = s$ and for each $2 \le i \le k$, $p_i$ equals $q_j$ for some $2 \le j \le s$. This proves the theorem for $n$. Therefore, by the second principle of induction, we have proved the theorem for all positive integers greater than or equal to 2. □

2.4.1. Exercises.

Exercise 2.4.1.1. If $p$ divides the product $a_1 a_2 \cdots a_n$ of non-zero integers $a_i$, prove that $p$ divides at least one of the $a_i$’s.

Exercise 2.4.1.2. Prove that if $n$ is a composite number, then $n$ has a prime factor not exceeding $\sqrt{n}$.

Exercise 2.4.1.3. Show that there are infinitely many primes. [Hint: Suppose there are only finitely many primes $p_1, p_2, \dots, p_n$. What can you say about the prime factors of $N = p_1 p_2 \cdots p_n + 1$?]

Exercise 2.4.1.4. Let $p_n$ denote the $n$th prime. Prove that $p_{n+1} \le p_1 p_2 \cdots p_n - 1$ for all $n \ge 2$. Deduce that for all $n \ge 1$,

\[ p_n \le 2^{2^n}. \]

Finally, show that if $\pi(x)$ denotes the number of primes less than or equal to $x$, then $\pi(x) \ge \log \log x$ for all $x \ge 2$.


Exercise 2.4.1.5.
(a) Recall that for any real number $a$ such that $|a| < 1$, $(1 - a)^{-1} = 1 + a + a^2 + \cdots$. Prove that

\[ \sum_{n \le x} \frac{1}{n} \le \prod_{p \le x} \left( 1 - \frac{1}{p} \right)^{-1}. \]

(b) Recall also that if $|a| < 1$, then

\[ -\log(1 - a) = a + \frac{a^2}{2} + \frac{a^3}{3} + \cdots. \]

Using this, prove that for any prime $p$,

\[ -\log\left( 1 - \frac{1}{p} \right) - \frac{1}{p} \le \frac{1}{p(p - 1)}. \]

(c) Deduce that

\[ -\sum_{p \le x} \log\left( 1 - \frac{1}{p} \right) \le \sum_{p \le x} \frac{1}{p} + 1. \]

(d) Using the fact that the series $\sum_{n \ge 1} \frac{1}{n}$ diverges, prove that the series

\[ \sum_{p} \frac{1}{p} \]

diverges. [Hint: Take logarithms of the inequality in (a).]
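To make the content of Section 2.4 concrete, the factorization guaranteed by Theorems 2.10 and 2.12 can be computed by trial division. The Python sketch below is a naive illustration, not an efficient method; the function name `factorize` and the output format (a list of prime and exponent pairs) are assumptions made here. It uses the observation of Exercise 2.4.1.2 that a composite number has a prime factor not exceeding its square root, so the search can stop once $p^2$ exceeds what is left of $n$.

```python
def factorize(n):
    """Return the prime factorization of n >= 2 as [(prime, exponent), ...]."""
    if n < 2:
        raise ValueError("n must be at least 2")
    factors = []
    p = 2
    while p * p <= n:
        if n % p == 0:
            exponent = 0
            while n % p == 0:
                n //= p
                exponent += 1
            factors.append((p, exponent))
        p += 1
    # Whatever remains has no prime factor up to its square root,
    # so it is either 1 or a prime.
    if n > 1:
        factors.append((n, 1))
    return factors


print(factorize(360))
print(factorize(97))
```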

Chapter 3

Arithmetic functions

3.1. Multiplicative functions

An arithmetic function $f : \mathbb{N} \to \mathbb{C}$ is a complex valued function defined on the set of natural numbers. An arithmetic function is said to be multiplicative if $f(mn) = f(m) f(n)$ whenever $\gcd(m, n) = 1$. A function is called completely multiplicative if $f(mn) = f(m) f(n)$ for all $m, n \in \mathbb{N}$. Henceforth, $n$ will denote a natural number unless otherwise specified. Also, a divisor of $n$ will refer to a positive divisor of $n$.

Examples 3.1.
(1) $f(n) = 1$ for all $n \in \mathbb{N}$ is a completely multiplicative function.
(2) For a fixed $a \in \mathbb{N}$, the function $f(n) = n^a$ is completely multiplicative.
(3) Let $\nu(n)$ denote the number of distinct prime divisors of $n$. $\nu(n)$ is not a multiplicative function. However, the function $f(n) = (-1)^{\nu(n)}$ is a multiplicative function, but not completely multiplicative.

We now state and prove an important theorem which helps us to construct new multiplicative functions out of existing ones. For a prime $p$, we write $p^a \| n$ if $p^a \mid n$ but $p^{a+1} \nmid n$, and we say that $p^a$ exactly divides $n$.


Theorem 3.2. If $f$ is a multiplicative function, then the function $g$ defined by

\[ g(n) := \sum_{d \mid n} f(d) \]

is also multiplicative. Here, the sum runs over all positive divisors of $n$. If

\[ n = \prod_{p^a \| n} p^a \]

is the unique factorization of $n$ into powers of distinct primes, then

\[ g(n) = \prod_{p^a \| n} \left( 1 + f(p) + f(p^2) + \cdots + f(p^a) \right). \]

Proof. To begin with, we observe that if $(m, n) = 1$, then any divisor $d$ of $mn$ can be written as $d_1 d_2$, such that $d_1 \mid m$, $d_2 \mid n$ and $(d_1, d_2) = 1$. Thus, if $(m, n) = 1$, then

\[ g(mn) = \sum_{d \mid mn} f(d) = \sum_{\substack{d = d_1 d_2 \\ d_1 \mid m,\ d_2 \mid n}} f(d_1 d_2). \]

Since $f$ is multiplicative, the above is equal to

\[ \sum_{\substack{d_1 \mid m \\ d_2 \mid n}} f(d_1) f(d_2) = g(m) g(n). \]

Thus, $g$ is multiplicative. It follows that

\[ g(n) = \prod_{p^a \| n} g(p^a) = \prod_{p^a \| n} \left( 1 + f(p) + f(p^2) + \cdots + f(p^a) \right). \tag{3.1} \]

□

Example 3.3. For any $s \in \mathbb{Z}_{\ge 0}$, let

\[ \sigma_s(n) := \sum_{d \mid n} d^s. \]

In particular, for $s = 0$, $\sigma_0(n)$ is simply the number of positive divisors of $n$. $\sigma_0(n)$ is also denoted as $d(n)$ or sometimes $\tau(n)$ in the existing literature. Since the function $f(n) = n^s$ for any $s \in \mathbb{Z}_{\ge 0}$ is multiplicative, by applying Theorem 3.2, we deduce that $\sigma_s$ is a multiplicative function. Henceforth, we will denote $\sigma_0(n)$ as $d(n)$ and $\sigma_1(n)$ as $\sigma(n)$.
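The divisor sums of Theorem 3.2 and Example 3.3 are easy to experiment with. The Python sketch below computes them by brute force and checks multiplicativity on a small range; it is a numerical illustration rather than a proof, and the helper names `divisor_sum`, `sigma` and `is_multiplicative_up_to` are hypothetical names chosen here.

```python
from math import gcd


def divisor_sum(f, n):
    """Return g(n) = sum of f(d) over the positive divisors d of n."""
    return sum(f(d) for d in range(1, n + 1) if n % d == 0)


def sigma(s, n):
    """sigma_s(n): the sum of the s-th powers of the positive divisors of n."""
    return divisor_sum(lambda d: d ** s, n)


def is_multiplicative_up_to(g, bound):
    """Check g(m*n) == g(m)*g(n) for all coprime m, n below bound."""
    return all(
        g(m * n) == g(m) * g(n)
        for m in range(1, bound)
        for n in range(1, bound)
        if gcd(m, n) == 1
    )


print(sigma(0, 12), sigma(1, 12))  # d(12) and sigma(12)
print(is_multiplicative_up_to(lambda n: sigma(2, n), 30))
```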


3.1.1. Exercises.

Exercise 3.1.1.1. For any $n \ge 1$, let

\[ n = \prod_{p^a \| n} p^a \]

be the unique factorization of $n$ into powers of distinct primes. Prove that

\[ \sigma_0(n) = \prod_{p^a \| n} (a + 1) \]

and, for any $s \in \mathbb{N}$,

\[ \sigma_s(n) = \prod_{p^a \| n} \frac{p^{s(a+1)} - 1}{p^s - 1}. \]

Exercise 3.1.1.2. Show that $d(n)$ is odd if and only if $n$ is a perfect square.

Exercise 3.1.1.3. Show that the product of the divisors of $n$ is equal to $n^{d(n)/2}$.

Exercise 3.1.1.4. Given two arithmetic functions $f$ and $g$, their Dirichlet product or Dirichlet convolution $f * g$ is the function defined by

\[ (f * g)(n) = \sum_{d \mid n} f(d)\, g\left( \frac{n}{d} \right). \]

Show that if $f$ and $g$ are multiplicative functions, so is their Dirichlet convolution $f * g$.

3.2. Möbius function and Möbius inversion

A number $n$ is said to be squarefree if $p^2 \nmid n$ for any prime $p$. In other words, a squarefree number has no square factors $> 1$. The Möbius function $\mu$ is defined as follows: $\mu(1) = 1$ and for $n > 1$, we set

\[ \mu(n) = \begin{cases} (-1)^{\nu(n)} & \text{if } n \text{ is squarefree}, \\ 0 & \text{otherwise}. \end{cases} \]

Here, the exponent $\nu(n)$ denotes the number of distinct prime divisors of a natural number $n$.


It can be shown that $\mu$ is a multiplicative function (see Exercise 3.2.1.1). Moreover, by Theorem 3.2, we deduce that $g(n) := \sum_{d \mid n} \mu(d)$ is also a multiplicative function. One can show (see Exercise 3.2.1.2) that

\[ \sum_{d \mid n} \mu(d) = \begin{cases} 1 & \text{if } n = 1, \\ 0 & \text{otherwise}. \end{cases} \]

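These observations lead to Theorem 3.4.

Theorem 3.4 (Möbius inversion formula). Let $f$ and $g$ be arithmetic functions. Then

\[ g(n) = \sum_{d \mid n} f(d) \]

if and only if

\[ f(n) = \sum_{d \mid n} \mu(d)\, g\left( \frac{n}{d} \right). \]

Proof. We observe that

\begin{align*}
\sum_{d \mid n} \mu(d)\, g\left( \frac{n}{d} \right) &= \sum_{d \mid n} \mu(d) \sum_{e \mid \frac{n}{d}} f(e) \\
&= \sum_{d :\, ad = n} \mu(d) \sum_{e :\, es = a} f(e) \\
&= \sum_{des = n} \mu(d) f(e) \\
&= \sum_{e \mid n} f(e) \sum_{d \mid \frac{n}{e}} \mu(d).
\end{align*}

By Exercise 3.2.1.2, the inner sum equals 1 if $e = n$ and 0 otherwise. Thus,

\[ \sum_{e \mid n} f(e) \sum_{d \mid \frac{n}{e}} \mu(d) = f(n). \]

The converse is proved in a similar way and is left as an exercise. □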
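The Möbius inversion formula can be tested numerically on any arithmetic function. In the Python sketch below, the test function $f(n) = n^2 + 1$ is an arbitrary choice made for illustration, and the helper names `mobius`, `divisors`, `g` and `recovered_f` are hypothetical. The code builds $g$ from $f$ by summing over divisors and then recovers $f$ from $g$ using $\mu$.

```python
def mobius(n):
    """Mobius function: 0 if a square > 1 divides n, else (-1)^(number of primes)."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # p^2 divided the original n
            result = -result
        p += 1
    if n > 1:
        result = -result  # one remaining prime factor
    return result


def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]


def f(n):
    return n * n + 1  # arbitrary test function


def g(n):
    return sum(f(d) for d in divisors(n))


def recovered_f(n):
    return sum(mobius(d) * g(n // d) for d in divisors(n))


print([mobius(n) for n in range(1, 11)])
print(all(recovered_f(n) == f(n) for n in range(1, 61)))
```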



A very important arithmetic function is the $\phi$-function of Euler, defined to be the number of positive integers less than or equal to $n$ which are coprime to $n$. That is,

\[ \phi(n) := \sum_{\substack{k \le n \\ (k, n) = 1}} 1. \]


As we will see in some exercises below, the Möbius inversion formula gives us an elegant proof of the fact that $\phi(n)$ is multiplicative and also establishes a formula for $\phi(n)$.

3.2.1. Exercises.

Exercise 3.2.1.1. Show that $\mu$ is a multiplicative function.

Exercise 3.2.1.2. Prove that

\[ \sum_{d \mid n} \mu(d) = \begin{cases} 1 & \text{if } n = 1, \\ 0 & \text{otherwise}. \end{cases} \]

Exercise 3.2.1.3. Let $\Lambda(n) = \log p$ if $n$ is a power of the prime $p$ and 0 otherwise. Show that

\[ \log n = \sum_{d \mid n} \Lambda(d). \]

Deduce that

\[ \Lambda(n) = \sum_{d \mid n} \mu\left( \frac{n}{d} \right) \log d, \]

and therefore,

\[ \Lambda(n) = -\sum_{d \mid n} \mu(d) \log d, \]

where $\mu$ denotes the Möbius function.

Exercise 3.2.1.4. Show that $\sum_{d \mid n} \phi(d) = n$. From this, deduce that

\[ \phi(n) = n \sum_{d \mid n} \frac{\mu(d)}{d}. \]

Exercise 3.2.1.5. Show that $\phi$ is a multiplicative function. Deduce that

\[ \phi(n) = n \prod_{p \mid n} \left( 1 - \frac{1}{p} \right). \]

Exercise 3.2.1.6. For any $n \ge 1$ other than $n = 2$ and $n = 6$, prove that $\sqrt{n} \le \phi(n)$. If $n$ is composite, then prove that $\phi(n) \le n - \sqrt{n}$.

Exercise 3.2.1.7. Show that if $\phi(n)$ is odd, then $n = 1$ or $n = 2$.


Exercise 3.2.1.8. Let $f(n)$ be the sum of the natural numbers $\le n$ which are coprime to $n$.
(1) Show that

\[ 1 + 2 + \cdots + n = \sum_{d \mid n} d\, f(n/d). \]

Deduce that for $n > 1$, $f(n) = n\phi(n)/2$.
(2) Show that $f(n)$ is divisible by $n$ for $n > 2$.
(3) Show that the numerator of the sum of the reciprocals of the natural numbers $\le n$ which are coprime to $n$ is divisible by $n$.

Exercise 3.2.1.9. Prove that

\[ \sum_{d \mid n} \frac{\mu^2(d)}{\phi(d)} = \frac{n}{\phi(n)}. \]

Exercise 3.2.1.10. Prove that

\[ \sum_{m \mid n} d(m)\, \phi\left( \frac{n}{m} \right) = \sigma(n) \]

and

\[ \sum_{m \mid n} \sigma(m)\, \phi\left( \frac{n}{m} \right) = n\, d(n). \]

Exercise 3.2.1.11. If $n$ is a squarefree integer, prove that

\[ \sum_{d \mid n} \sigma(d^{k-1})\, \phi(d) = n^k. \]

Exercise 3.2.1.12. Let

\[ J_r(n) = n^r \prod_{p \mid n} \left( 1 - \frac{1}{p^r} \right). \]

For $r = 1$, $J_r(n) = \phi(n)$. Prove that

\[ n^r = \sum_{d \mid n} J_r(d). \]
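Several of the identities for Euler's function in these exercises can be checked by direct computation before attempting a proof. The Python sketch below computes $\phi(n)$ straight from its definition and compares it with the product formula of Exercise 3.2.1.5; it also tests the identity of Exercise 3.2.1.4 on a small range. The function names are hypothetical, and a numerical check of this kind is of course no substitute for the proofs.

```python
from math import gcd


def phi(n):
    """Euler's function from the definition: count k <= n with gcd(k, n) = 1."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)


def prime_divisors(n):
    """Return the distinct prime divisors of n in increasing order."""
    primes = []
    p = 2
    while p * p <= n:
        if n % p == 0:
            primes.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        primes.append(n)
    return primes


def phi_by_product(n):
    """phi(n) = n * prod over p | n of (1 - 1/p), in exact integer arithmetic."""
    result = n
    for p in prime_divisors(n):
        result = result // p * (p - 1)
    return result


print(phi(36), phi_by_product(36))
print(all(sum(phi(d) for d in range(1, n + 1) if n % d == 0) == n
          for n in range(1, 101)))
```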


3.3. Greatest integer function

Let $x$ be a real number. The greatest integer of $x$, denoted by $[x]$, is the largest integer $\le x$. In other words, it is the unique integer satisfying $x - 1 < [x] \le x$. For example,
\[
\left[\tfrac{5}{2}\right] = 2, \qquad [\sqrt{2}] = 1.
\]
We note that $[x] = x$ if and only if $x \in \mathbb{Z}$. The difference $x - [x]$ is called the fractional part of $x$ and is denoted by $\{x\}$. Clearly, $0 \le \{x\} < 1$.

3.3.1. Exercises.

Exercise 3.3.1.1.
(1) $[x + n] = [x] + n$ for any $n \in \mathbb{Z}$.
(2) For any $x, y \in \mathbb{R}$, $[x] + [y] \le [x + y]$.
(3) For any $n \in \mathbb{N}$,
\[
\left[\frac{[x]}{n}\right] = \left[\frac{x}{n}\right].
\]
(4) For any $n, m, k \in \mathbb{N}$,
\[
\left[\frac{nm}{k}\right] \ge n \left[\frac{m}{k}\right].
\]

Exercise 3.3.1.2. Let $n$ be a positive integer and $p$ be a prime.
(1) Prove that for any $k \ge 1$, exactly $[n/p^k]$ multiples of $p^k$ occur in $n!$.
(2) Conclude that the highest power of $p$ that divides $n!$ is
\[
\sum_{k=1}^{\infty} \left[\frac{n}{p^k}\right].
\]
(3) Prove that
\[
n! = \prod_{p \le n} p^{\sum_{k=1}^{\infty} \left[\frac{n}{p^k}\right]}.
\]

Exercise 3.3.1.3. Find the highest power of 10 dividing 2000!.


Exercise 3.3.1.4. Let $n, r \in \mathbb{N}$ and $1 \le r < n$. Let $\binom{n}{r}$ denote the binomial coefficient defined as
\[
\binom{n}{r} = \frac{n!}{r!\,(n-r)!}.
\]
(a) Find the highest power of $p$ dividing $r!\,(n-r)!$.
(b) For each prime factor $p$ of $r!\,(n-r)!$, prove that
\[
\sum_{k \ge 1} \left[\frac{n}{p^k}\right] \ge \sum_{k \ge 1} \left[\frac{r}{p^k}\right] + \sum_{k \ge 1} \left[\frac{n-r}{p^k}\right].
\]

Exercise 3.3.1.5. Prove that for a positive integer $r$, the product of any $r$ consecutive integers is divisible by $r!$.

Exercise 3.3.1.6. Let $f$ and $g$ be arithmetic functions such that
\[
g(n) = \sum_{d \mid n} f(d).
\]
Then, for any $x \ge 1$, show that
\[
\sum_{n \le x} g(n) = \sum_{n \le x} f(n) \left[\frac{x}{n}\right].
\]

Exercise 3.3.1.7. Prove that for any $k \ge 0$,
\[
\sum_{n \le x} \sigma_k(n) = \sum_{n \le x} n^k \left[\frac{x}{n}\right].
\]
In particular, we get
\[
\sum_{n \le x} d(n) = \sum_{n \le x} \left[\frac{x}{n}\right].
\]

Exercise 3.3.1.8. Prove that
\[
\sum_{n \le x} \mu(n) \left[\frac{x}{n}\right] = 1.
\]
Deduce that
\[
\left| \sum_{n \le x} \frac{\mu(n)}{n} \right| \le 1 \quad \text{for all } x \ge 1.
\]
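The formula of Exercise 3.3.1.2 lends itself to a quick numerical experiment. The following is a minimal sketch, not part of the text: the function names `legendre_exponent` and `exponent_by_division` are our own, and the second function is only a brute-force cross-check that factors $n!$ directly.

```python
from math import factorial


def legendre_exponent(n: int, p: int) -> int:
    """Exponent of the prime p in n!, computed as the sum of [n / p^k] over k >= 1."""
    exponent = 0
    power = p
    while power <= n:          # terms with p^k > n contribute [n / p^k] = 0
        exponent += n // power
        power *= p
    return exponent


def exponent_by_division(m: int, p: int) -> int:
    """Exponent of p in m, found by repeated division (brute-force check)."""
    count = 0
    while m % p == 0:
        m //= p
        count += 1
    return count


# Compare the two computations for a small factorial.
n = 30
for p in (2, 3, 5, 7):
    print(p, legendre_exponent(n, p), exponent_by_division(factorial(n), p))
```

The sum is finite in practice because $[n/p^k] = 0$ as soon as $p^k > n$, which is why the loop stops there.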


3.4. The big-O and little-o notations

Definition 3.5. For a complex valued function $f(x)$ and a real valued function $g(x)$ such that $g(x) > 0$ for all $x \ge x_0$, we write $f(x) = O(g(x))$ or $f(x) \ll g(x)$ if there exists a constant $C > 0$ such that $|f(x)| \le C g(x)$ for all $x \ge x_0$. $C$ is called the implied constant. We write $f(x) = O_a(g(x))$ or $f(x) \ll_a g(x)$ if the above implied constant $C$ depends on a parameter $a$. We may also denote the above as $g(x) \gg f(x)$ or as $g(x) \gg_a f(x)$ as the case may be.

Definition 3.6. For functions $f(x)$ and $g(x)$ as above, we write $f(x) = o(g(x))$ if
\[
\lim_{x \to \infty} \frac{f(x)}{g(x)} = 0.
\]

Definition 3.7. We write $f(x) \sim g(x)$ and say that $f(x)$ is asymptotic to $g(x)$ if
\[
\lim_{x \to \infty} \frac{f(x)}{g(x)} = 1.
\]

3.4.1. Exercises.

Exercise 3.4.1.1. Let $\nu(n)$ denote the number of distinct prime divisors of $n$. Prove that $\nu(n) = O(\log n)$. More precisely,
\[
\nu(n) \le \frac{\log n}{\log 2}.
\]

Exercise 3.4.1.2. Let $f(x)$ be a monotonically decreasing, continuous function defined on $[a-1, b+1]$. Prove that
\[
\int_{a-1}^{b} f(t)\,dt > \sum_{j=a}^{b} f(j) > \int_{a}^{b+1} f(t)\,dt.
\]
Deduce that
\[
\sum_{j=a}^{b} f(j) = \int_{a}^{b} f(t)\,dt + O(|f(a-1)| + |f(b)|).
\]

Exercise 3.4.1.3. Show that
\[
\sum_{j=1}^{n} \frac{1}{j} = \log n + O(1).
\]


Exercise 3.4.1.4. (a) Let $p^{\alpha} \| n$, that is, let $p^{\alpha}$ be the highest power of a prime $p$ dividing $n$. Show that for any $\epsilon > 0$,
\[
\frac{d(n)}{n^{\epsilon}} = \prod_{p^{\alpha} \| n} \frac{\alpha + 1}{p^{\alpha \epsilon}}.
\]
(b) Show that 𝑑(𝑛) 𝛼+1 ≤ ∏ . 𝑛𝜖 𝑝𝛼𝜖 𝑝𝛼 ‖𝑛 𝑝 0, where 𝐶(𝜖) = ∏ (1 + 𝑝 0.

Exercise 3.4.1.6. Show that $\sigma(n) = O(n \log n)$, where $\sigma(n)$ or $\sigma_1(n)$, as defined in Example 3.3, refers to the sum of the (positive) divisors of $n$.

Exercise 3.4.1.7. Show that $\sigma_k(n) = O(n^k \log n)$, where $\sigma_k(n)$ is as defined in Example 3.3.

Exercise 3.4.1.8. Show that for any $c > 0$,
\[
\log x \le \frac{x^c}{ce}
\]
for any $x \ge 1$. Thus, $\log x = O(x^c)$ for any $c > 0$.

Exercise 3.4.1.9. Show that for real numbers $A$ and $B$ and a positive integer $M$,
\[
|A + B|^M \le 2^M \max(|A|^M, |B|^M).
\]
Thus, $(A + B)^M \ll_M |A|^M + |B|^M$.


3.5. Averages of arithmetical functions

Many arithmetic functions of interest to us exhibit rapid fluctuations. For example, $d(n)$ does not seem to follow any smooth pattern as $n$ increases. It takes the value 2 infinitely often and right next to a prime may lie a composite number with a large number of prime divisors and therefore a large number of divisors. For arithmetic functions of this nature, we study the arithmetic mean
\[
\frac{1}{n} \sum_{k=1}^{n} f(k).
\]

Definition 3.8. Let $f$ be an arithmetic, real valued function and let $g$ be a monotonically increasing function on $\mathbb{R}$. We say that the average order of $f(n)$ is $g(n)$ if
\[
\lim_{x \to \infty} \frac{\sum_{n \le x} f(n)}{x g(x)} = 1,
\]
that is, if
\[
\sum_{n \le x} f(n) \sim x g(x).
\]

For example, let $d(n)$ be the number of divisors of $n$. The average order of $d(n)$ is $\log n$. To see this, we observe, by Exercise 3.3.1.7, that
\begin{align*}
\sum_{n \le x} d(n) &= \sum_{n \le x} \sum_{d \mid n} 1 \\
&= \sum_{n \le x} \sum_{d \,:\, de = n} 1 \\
&= \sum_{d \le x} \sum_{e \le x/d} 1 \\
&= \sum_{d \le x} \left[\frac{x}{d}\right].
\end{align*}


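Now, $[x] = x - \{x\} = x + O(1)$. Thus,
\begin{align*}
\sum_{n \le x} d(n) &= \sum_{d \le x} \left[\frac{x}{d}\right] \\
&= \sum_{d \le x} \frac{x}{d} + O\Big(\sum_{d \le x} 1\Big) \\
&= x \sum_{d \le x} \frac{1}{d} + O(x) \\
&= x(\log x + O(1)) + O(x) \\
&= x \log x + O(x).
\end{align*}
From this, we deduce that
\[
\lim_{x \to \infty} \left( \frac{\sum_{n \le x} d(n)}{x \log x} \right) = \lim_{x \to \infty} \left( 1 + O\left( \frac{1}{\log x} \right) \right) = 1.
\]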
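The estimate $\sum_{n \le x} d(n) = x \log x + O(x)$ can be observed numerically. The sketch below is illustrative only: the function names are our own, and the second function exists purely to cross-check the identity $\sum_{n \le x} d(n) = \sum_{d \le x} [x/d]$ by counting divisors directly.

```python
from math import log


def divisor_summatory(x: int) -> int:
    """Sum of d(n) for n <= x, computed as the sum of [x/d] over d <= x."""
    return sum(x // d for d in range(1, x + 1))


def divisor_summatory_direct(x: int) -> int:
    """The same sum, obtained by counting the divisors of each n <= x."""
    counts = [0] * (x + 1)
    for d in range(1, x + 1):
        for multiple in range(d, x + 1, d):
            counts[multiple] += 1   # d divides this multiple
    return sum(counts)


# The ratio in the last column drifts towards 1, but only at the slow
# rate 1 + O(1 / log x) predicted by the estimate.
for x in (10, 100, 1000, 10000):
    total = divisor_summatory(x)
    print(x, total, total / (x * log(x)))
```

The slow convergence of the ratio is expected: the error term $O(x)$ is only a factor $\log x$ smaller than the main term.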

Thus, the average order of $d(n)$ is $\log n$.

3.5.1. Exercises.

Exercise 3.5.1.1. Let $f(x)$ be a monotonically decreasing and continuous function on $[1, \infty)$. Prove that, for any $x \ge 1$,
\[
\sum_{n > x} f(n) \le \int_{[x]}^{\infty} f(t)\,dt.
\]
Deduce that
\[
\sum_{n \le x} \frac{1}{n^2} = \zeta(2) + O\left(\frac{1}{x}\right),
\]
where
\[
\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}
\]
for $s > 1$.

Exercise 3.5.1.2. Show that the average order of $\sigma(n)$ is
\[
\frac{\zeta(2)}{2}\, n.
\]


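For this exercise, one would be tempted to follow a procedure similar to the one for $d(n)$. Following this, we get
\begin{align*}
\sum_{n \le x} \sigma(n) &= \sum_{d \le x} d \left[\frac{x}{d}\right] \\
&= \sum_{d \le x} d \left(\frac{x}{d} + O(1)\right) \\
&= x \sum_{d \le x} 1 + \sum_{d \le x} O(d) \\
&= x[x] + O(x^2) \\
&= x(x + O(1)) + O(x^2) \\
&= O(x^2).
\end{align*}
This calculation does not give us the average order of $\sigma(n)$. So, we reexamine $\sum_{n \le x} \sigma(n)$ as follows.

Solution.
\begin{align*}
\sum_{n \le x} \sum_{d \,:\, de = n} e &= \sum_{d \le x} \sum_{e \le x/d} e \\
&= \sum_{d \le x} \left\{ 1 + 2 + \cdots + \left[\frac{x}{d}\right] \right\} \\
&= \sum_{d \le x} \frac{1}{2} \left[\frac{x}{d}\right] \left( \left[\frac{x}{d}\right] + 1 \right) \\
&= \frac{1}{2} \sum_{d \le x} \left( \frac{x}{d} + O(1) \right) \left( \frac{x}{d} + O(1) \right) \\
&= \frac{1}{2} \sum_{d \le x} \left[ \frac{x^2}{d^2} + O\left( \frac{x}{d} \right) \right] \\
&= \frac{x^2}{2} \sum_{d \le x} \frac{1}{d^2} + O\Big( x \sum_{d \le x} \frac{1}{d} \Big) \\
&= \frac{x^2}{2} \Big( \zeta(2) - \sum_{d > x} \frac{1}{d^2} \Big) + O(x \log x) \\
&= \frac{x^2}{2} \left( \zeta(2) + O\left( \frac{1}{x} \right) \right) + O(x \log x) \\
&= \frac{x^2}{2}\, \zeta(2) + O(x \log x).
\end{align*}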


This proves that the average order of $\sigma(n)$ is
\[
\frac{\zeta(2)}{2}\, n.
\]
(Note: At first glance, it may appear strange that $\sigma(n)$ has average order $(\zeta(2)/2)n$, since $\sigma(n) \ge n$ for all $n$, whereas $\zeta(2)/2 < 1$. However, the average order of a function $f(n)$ measures its average value for all $n \le x$, and therefore, for $\sigma(n)$, we can expect it to be less than $x$.) □

Exercise 3.5.1.3. Show that
\[
\sum_{n \le x} \frac{d(n)}{n} \ll \log^2 x.
\]

Exercise 3.5.1.4. Show that
\[
\sum_{n \le x} d(n)^2 \ll x \log^3 x.
\]

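Exercise 3.5.1.5. (a) Let $f$ and $g$ be two arithmetic functions such that
\[
g(n) = \sum_{d \mid n} f(d)
\]
and the series $\sum_{n=1}^{\infty} \frac{f(n)}{n^s}$ and $\sum_{n=1}^{\infty} \frac{g(n)}{n^s}$ are absolutely convergent for $s > 1$. Show that, for any $s > 1$,
\[
\zeta(s) \sum_{n=1}^{\infty} \frac{f(n)}{n^s} = \sum_{n=1}^{\infty} \frac{g(n)}{n^s}.
\]
(b) Using (a), prove that if $s > 1$,
\[
\zeta(s) \sum_{n=1}^{\infty} \frac{\mu(n)}{n^s} = 1.
\]
(c) Show that
\[
\sum_{n \le x} \phi(n) = \frac{x^2}{2\zeta(2)} + O(x \log x).
\]
[Hint: $\phi(n) = \sum_{d \mid n} \mu(d) \frac{n}{d}$.]
(d) Conclude that the average order of $\phi(n)$ is
\[
\frac{n}{2\zeta(2)}.
\]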


Exercise 3.5.1.6. Let $n$ and $k$ be positive integers and $k > 1$. Then, $n$ is said to be $k$-free if it is not divisible by the $k$th power of any prime number. Let $s_k(n) = 1$ if $n$ is $k$-free and $0$ otherwise. Show that
\[
\sum_{n \le x} s_k(n) = c_k x + O(x^{1/k}),
\]
where
\[
c_k = \sum_{n=1}^{\infty} \frac{\mu(n)}{n^k}.
\]

3.6. Technique of partial summation

In this section, we learn techniques which help us to estimate sums of the form $\sum_{n \le x} f(n)$ by $\int_1^x f(t)\,dt$ for suitable functions $f$. We start with the following lemma.

Lemma 3.9. Let $a$ and $b$ be natural numbers with $a < b$ and let $f(t)$ be a monotone function on $[a, b]$. Then,
\[
\min(f(a), f(b)) \le \sum_{k=a}^{b} f(k) - \int_a^b f(t)\,dt \le \max(f(a), f(b)).
\]

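Proof. If $f$ is monotonically increasing on $[a, b]$, we have
\[
f(k) \le \int_k^{k+1} f(t)\,dt \quad \text{for } k = a, a+1, \dots, b-1
\]
and
\[
f(k) \ge \int_{k-1}^{k} f(t)\,dt \quad \text{for } k = a+1, \dots, b.
\]
Thus,
\begin{align*}
\sum_{k=a}^{b} f(k) &= \sum_{k=a}^{b-1} f(k) + f(b) \\
&\le \sum_{k=a}^{b-1} \left( \int_k^{k+1} f(t)\,dt \right) + f(b) = \int_a^b f(t)\,dt + f(b). \tag{3.2}
\end{align*}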


Similarly,
\begin{align*}
\sum_{k=a}^{b} f(k) &= f(a) + \sum_{k=a+1}^{b} f(k) \\
&\ge f(a) + \sum_{k=a+1}^{b} \left( \int_{k-1}^{k} f(t)\,dt \right) = f(a) + \int_a^b f(t)\,dt. \tag{3.3}
\end{align*}
By equations (3.2) and (3.3), if $f$ is monotonically increasing on $[a, b]$, then
\[
f(a) \le \sum_{k=a}^{b} f(k) - \int_a^b f(t)\,dt \le f(b).
\]

A similar inequality for monotonically decreasing functions completes the proof of the lemma. □

We now describe a partial summation formula first proved by Niels Henrik Abel. We do so in three stages.

Proposition 3.10. Let $u : \mathbb{N} \to \mathbb{C}$ and $f : \mathbb{N} \to \mathbb{C}$ be arithmetic functions and let $a$ and $b$ be natural numbers with $a < b$. We define
\[
U(t) := \sum_{n \le t} u(n).
\]
Then,
\[
\sum_{n=a+1}^{b} u(n) f(n) = U(b) f(b) - U(a) f(a+1) - \sum_{n=a+1}^{b-1} U(n) \big( f(n+1) - f(n) \big).
\]

Proof. We have
\begin{align*}
\sum_{n=a+1}^{b} u(n) f(n) &= \sum_{n=a+1}^{b} \big( U(n) - U(n-1) \big) f(n) \\
&= \sum_{n=a+1}^{b} U(n) f(n) - \sum_{n=a}^{b-1} U(n) f(n+1) \\
&= U(b) f(b) - U(a) f(a+1) - \sum_{n=a+1}^{b-1} U(n) \big( f(n+1) - f(n) \big).
\end{align*}
□

The above proposition can be generalized to intervals where the end points may not be integers.
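Since Proposition 3.10 is a purely algebraic identity, it can be sanity-checked on concrete data before passing to real end points. The sketch below is an illustration under our own choices: the helper name `partial_summation_sides` and the sample functions `u` and `f` are hypothetical, and exact rational arithmetic is used so that both sides can be compared for equality.

```python
from fractions import Fraction


def partial_summation_sides(u, f, a, b):
    """Return (left side, right side) of the identity in Proposition 3.10."""
    def U(t):
        # U(t) is the sum of u(n) over natural numbers n <= t.
        return sum(u(n) for n in range(1, t + 1))

    lhs = sum(u(n) * f(n) for n in range(a + 1, b + 1))
    rhs = (U(b) * f(b) - U(a) * f(a + 1)
           - sum(U(n) * (f(n + 1) - f(n)) for n in range(a + 1, b)))
    return lhs, rhs


def u(n):
    return (-1) ** n * n       # sample arithmetic function (our choice)


def f(n):
    return Fraction(1, n)      # sample weight, kept exact


print(partial_summation_sides(u, f, 3, 12))
```

Any other pair of arithmetic functions could be substituted for `u` and `f`; the two returned values should always agree.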


Proposition 3.11. Let $x, y \in \mathbb{R}$ with $0 \le y < x$. Let $u : \mathbb{N} \to \mathbb{C}$ and let $f$ be a function which has a continuous derivative on $[y, x]$. Then,
\[
\sum_{y < n \le x} u(n) f(n) = U(x) f(x) - U(y) f(y) - \int_y^x U(t) f'(t)\,dt.
\]

𝑦 0, put 𝑥 = −𝐵/𝐴.]

3.7. The Cauchy–Schwarz and Hölder inequalities


Exercise 3.7.1.2. Let $p$ and $q$ be as in Theorem 3.14. Let $f$ and $g$ be complex-valued functions on $[a, b]$ such that $0 < F, G < \infty$, where
\[
F = \int_a^b |f(x)|^p\,dx, \qquad G = \int_a^b |g(x)|^q\,dx.
\]
(a) Use Proposition 3.13 to show that
\[
\int_a^b \left( \frac{|f(x)|^p}{F} \right)^{1/p} \left( \frac{|g(x)|^q}{G} \right)^{1/q} dx \le 1.
\]
(b) Show that
\[
\left| \int_a^b f(x) g(x)\,dx \right| \le \int_a^b |f(x) g(x)|\,dx.
\]
(c) From (a) and (b), deduce that
\[
\left| \int_a^b f(x) g(x)\,dx \right| \le \left( \int_a^b |f(x)|^p\,dx \right)^{1/p} \left( \int_a^b |g(x)|^q\,dx \right)^{1/q}.
\]

Chapter 4

Introduction to congruence arithmetic

With a view to keeping this book as self-contained as possible, in this chapter, we review fundamental concepts in congruence arithmetic and important prerequisites in the study of additive number theory.

4.1. Definition and basic properties of congruences

In his book Disquisitiones Arithmeticae, Gauss introduced the notion of congruences, which provides a natural and convenient approach to address problems related to divisibility of integers.

Definition 4.1. Let $a$, $b$ and $m$ be integers with $m > 0$. We say that $a$ is congruent to $b$ modulo $m$ and write $a \equiv b \pmod{m}$ if $m$ divides $a - b$. If $m \nmid (a - b)$, we write $a \not\equiv b \pmod{m}$. Thus, $a \equiv 0 \pmod{m}$ if and only if $m \mid a$.

Let $n$ be a fixed positive integer. By the division algorithm, we know that for any $a \in \mathbb{Z}$, there exists a unique integer $r$ such that $0 \le r < n$ and $a \equiv r \pmod{n}$. We call $r$ the residue (or reduced residue) of $a$ modulo $n$. The set
\[
\{x \in \mathbb{Z} : x \equiv r \pmod{n}\}
\]
is called the residue class or congruence class $r \pmod{n}$. A set of $n$ integers, one belonging to each residue class $r \pmod{n}$, $0 \le r < n$, is called a complete residue system modulo $n$.

Proposition 4.2 describes important properties of congruences.

Proposition 4.2. Let $a$, $b$, $c$, $d$ and $n$ be integers with $n > 0$. Then
(i) $a \equiv a \pmod{n}$.
(ii) If $a \equiv b \pmod{n}$, then $b \equiv a \pmod{n}$.
(iii) If $a \equiv b \pmod{n}$ and $b \equiv c \pmod{n}$, then $a \equiv c \pmod{n}$.
(iv) If $a \equiv b \pmod{n}$ and $c \equiv d \pmod{n}$, then $a \pm c \equiv b \pm d \pmod{n}$ and $ac \equiv bd \pmod{n}$.
(v) If $a \equiv b \pmod{n}$, then $a^k \equiv b^k \pmod{n}$ for any positive integer $k$. More generally, let $f(x)$ be a polynomial with integer coefficients. If $a \equiv b \pmod{n}$, then $f(a) \equiv f(b) \pmod{n}$.
(vi) If $c > 0$, then $a \equiv b \pmod{n}$ if and only if $ac \equiv bc \pmod{nc}$.

Proof. The above properties can be proved easily and are left to the reader as an exercise. □

Proposition 4.3. If $ac \equiv bc \pmod{n}$ and if $d = (c, n)$, then
\[
a \equiv b \pmod{\tfrac{n}{d}}.
\]

Proof. $ac \equiv bc \pmod{n} \Rightarrow n \mid c(a - b)$. By Proposition 4.2 (vi), the above implies
\[
\frac{n}{d} \;\Big|\; \frac{c}{d}(a - b).
\]
Since $\left( \frac{n}{d}, \frac{c}{d} \right) = 1$, we get $\frac{n}{d} \mid (a - b)$. Thus, $a \equiv b \pmod{\frac{n}{d}}$. □

Proposition 4.4. $a \equiv b \pmod{n}$ if and only if $a$ and $b$ have the same residue modulo $n$.

Proof. Let $r_1$ and $r_2$ be the residues of $a$ and $b$ modulo $n$ respectively. Clearly, $a \equiv b \pmod{n}$ if and only if $r_1 - r_2 \equiv 0 \pmod{n}$. Since $0 \le |r_1 - r_2| < n$, we observe that $r_1 - r_2 \equiv 0 \pmod{n}$ if and only if $r_1 - r_2 = 0$. This proves the proposition. □


4.1.1. Exercises.

Exercise 4.1.1.1. What is the residue of $1^5 + 2^5 + \cdots + 100^5 \pmod{4}$?

Exercise 4.1.1.2. Prove that a positive integer is divisible by 9 if and only if the sum of its digits is divisible by 9.

Exercise 4.1.1.3. If $a \equiv b \pmod{n}$, then show that $(a, n) = (b, n)$.

Exercise 4.1.1.4. If $p$ is a prime such that $n < p < 2n$, show that
\[
\binom{2n}{n} \equiv 0 \pmod{p}.
\]

Exercise 4.1.1.5. Let $m_0 > 1$. For any integer $x_0$, show that there exists $x_1 \in \mathbb{Z}$ such that $|x_1| \le m_0/2$ and $x_0 \equiv x_1 \pmod{m_0}$.

4.2. Congruence powers and Euler’s theorem

Let $a$ be an integer such that $(a, n) > 1$. Can we have $a^k \equiv 1 \pmod{n}$ for some positive power $k$? If this were possible, then we would have an integer $y$ such that $a^k - ny = 1$. But, since $(a, n) > 1$, this would imply that $a^k - ny$ would have a divisor greater than 1, which is a contradiction. Hence, for $a^k$ to be congruent to 1 mod $n$, it is necessary that $(a, n) = 1$.

We now state a fundamental theorem which is named after L. Euler in the literature. Although the first published proof of this theorem is attributed to Euler, a similar proof had been presented by Gottfried W. Leibniz in an unpublished manuscript in 1683.

Theorem 4.5. Let $(a, n) = 1$. Then we have $a^{\phi(n)} \equiv 1 \pmod{n}$.

Proof. Let $b_1, b_2, \dots, b_{\phi(n)}$ be the $\phi(n)$ numbers lying between 1 and $n - 1$ which are coprime to $n$. Since $(a, n) = 1$, we have $(b_i a, n) = 1$ for all $1 \le i \le \phi(n)$. If $b_i a \equiv c \pmod{n}$, then by Exercise 4.1.1.3, $(c, n) = 1$. Thus, each number in the list $b_1 a, b_2 a, \dots, b_{\phi(n)} a$ is congruent to some number in the list $b_1, b_2, \dots, b_{\phi(n)}$ modulo $n$. Suppose $b_i a \equiv b_j a \pmod{n}$ for some $1 \le i \le j \le \phi(n)$.


Since $(a, n) = 1$, by Proposition 4.3, $b_j \equiv b_i \pmod{n}$. But $0 \le |b_j - b_i| < n$. So, $n \mid (b_j - b_i)$ if and only if $b_i = b_j$. Thus, the residues of $b_1 a, b_2 a, \dots, b_{\phi(n)} a \pmod{n}$ are distinct, and so they are a rearrangement of $b_1, b_2, \dots, b_{\phi(n)}$. Multiplying the two lists together, by Proposition 4.4,
\[
b_1 b_2 \cdots b_{\phi(n)} \equiv a^{\phi(n)}\, b_1 b_2 \cdots b_{\phi(n)} \pmod{n}.
\]
Since $(b_1 b_2 \cdots b_{\phi(n)}, n) = 1$, we deduce
\[
a^{\phi(n)} \equiv 1 \pmod{n}.
\]
□

Theorem 4.5 has an important corollary.

Corollary 4.6 (Fermat’s little theorem). If $p$ is a prime and $(a, p) = 1$, then $a^{p-1} \equiv 1 \pmod{p}$.

Proof. Since $\phi(p) = p - 1$, this follows from Theorem 4.5. □
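The two ingredients of the proof of Theorem 4.5 can both be observed on small moduli: multiplication by $a$ rearranges the reduced residues, and consequently $a^{\phi(n)} \equiv 1 \pmod{n}$. The following is a minimal sketch with function names of our own choosing; it assumes $n \ge 2$ and computes $\phi(n)$ by direct counting rather than by any formula.

```python
from math import gcd


def reduced_residues(n):
    """Integers b with 1 <= b <= n - 1 and gcd(b, n) = 1 (assumes n >= 2)."""
    return [b for b in range(1, n) if gcd(b, n) == 1]


def multiplication_permutes(a, n):
    """Check that b -> a*b mod n rearranges the reduced residues mod n."""
    residues = reduced_residues(n)
    return sorted(a * b % n for b in residues) == residues


def euler_holds(a, n):
    """Check a^phi(n) = 1 (mod n), with phi(n) obtained by counting."""
    phi = len(reduced_residues(n))
    return pow(a, phi, n) == 1


n = 20
for a in reduced_residues(n):
    print(a, multiplication_permutes(a, n), euler_holds(a, n))
```

The three-argument form of `pow` performs modular exponentiation, so the check stays fast even when $\phi(n)$ is large.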



The above result is referred to as Fermat’s little theorem as it had been observed by Fermat in a letter to his friend Frénicle de Bessy in 1640. In his characteristic style, Fermat provided some context to his observation, but did not give a proof out of “fear of going on for too long” (see [51]).

4.2.1. Exercises.

Exercise 4.2.1.1. Prove that $a^{560} \equiv 1 \pmod{561}$ for every $a$ coprime to 561.

Exercise 4.2.1.2. Prove that $2222^{5555} + 5555^{2222}$ is divisible by 7.

Exercise 4.2.1.3. In the following exercise, we will prove that $\phi(n)$ is a multiplicative function using the theory of congruences. That is, we prove that if $(m, n) = 1$, then $\phi(mn) = \phi(m)\phi(n)$.
(a) Given integers $a$, $m$, $n$ show that $(a, mn) = 1$ if and only if $(a, m) = (a, n) = 1$. Thus, for positive integers $m$ and $n$,
\[
\{1 \le a \le mn : (a, mn) = 1\} = \{1 \le a \le mn : (a, m) = (a, n) = 1\}.
\]


(b) Let $m, n > 1$. Let us arrange all integers between 1 and $mn$ in $m$ columns and $n$ rows as follows:
\[
\begin{array}{cccccc}
1 & 2 & \cdots & r & \cdots & m \\
m+1 & m+2 & \cdots & m+r & \cdots & 2m \\
\vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\
(n-1)m+1 & (n-1)m+2 & \cdots & (n-1)m+r & \cdots & nm
\end{array}
\]
where $1 \le r \le m$. We denote the $r$th column as $C_r$. Show that all the numbers in $C_r$ are coprime to $m$ if and only if $r$ is coprime to $m$. This tells us that only $\phi(m)$ columns contain numbers coprime to $m$ and every number in such a column is coprime to $m$.
(c) Let $(m, n) = 1$. Show that no two numbers in $C_r$ are congruent (mod $n$). Thus, the elements in $C_r$ form a complete residue system (mod $n$).
(d) Show that each $C_r$ contains $\phi(n)$ elements coprime to $n$.
(e) Finally, deduce that if $(m, n) = 1$, then $\phi(mn) = \phi(m)\phi(n)$.

Exercise 4.2.1.4. For a positive integer $n$, show that
\[
\phi(n) = n \prod_{p \mid n} \left( 1 - \frac{1}{p} \right),
\]
where the product runs over all the distinct primes dividing $n$.

4.3. Linear congruence equations

In high school algebra, we typically learn to solve equations of the form $f(x) = 0$, where $f(x)$ is a polynomial with integer coefficients. We can also consider congruence equations of the form $f(x) \equiv 0 \pmod{m}$ and ask how many solutions such an equation has.

We first need to understand what we mean by a solution of a congruence equation. If an integer $x$ satisfies the equation $f(x) \equiv 0 \pmod{m}$, so does any integer $y \equiv x \pmod{m}$, by Proposition 4.2(v). In modular arithmetic, we do not count solutions which belong to the same residue class as distinct. Henceforth, the number of solutions of $f(x) \equiv 0 \pmod{m}$ will refer to the number of incongruent solutions of this equation, or in other words, the number of solutions of this equation among the residues $\{0, 1, 2, \dots, m - 1\}$.


In this section, we consider polynomials $f(x)$ of degree 1, in other words, equations of the form $ax \equiv b \pmod{m}$. For example, how many solutions does $2x \equiv 3 \pmod{4}$ have? Observe that none among $x = 1, 2, 3, 4$ satisfy this congruence. Therefore, this equation does not have a solution. For large values of $m$, it is not feasible to check for solutions to $ax \equiv b \pmod{m}$ by checking all residues mod $m$. We now state some theorems which explicitly determine how many solutions a linear congruence equation has and how to find those solutions.

Theorem 4.7. Let $(a, m) = 1$. The linear congruence equation $ax \equiv 1 \pmod{m}$ has exactly one solution.

Proof. Consider the set $\{a, 2a, 3a, \dots, ma\}$. If $ia \equiv ja \pmod{m}$ for some $1 \le i \le j \le m$, then $m \mid (i - j)a$. Since $(a, m) = 1$, this implies that $m \mid (i - j)$, which is possible only when $i = j$, since $0 \le j - i \le m - 1$. Thus, any two elements in the above set are incongruent to each other (mod $m$). That is, the above set forms a complete residue system mod $m$. Thus, there exists a unique integer $x$ lying between 1 and $m$ such that $ax \equiv 1 \pmod{m}$. □

How do we determine the solutions of $ax \equiv b \pmod{m}$? Observe that this is equivalent to finding integers $x$ and $y$ such that $ax + my = b$. To do so, we first find integers $X$ and $Y$ such that $aX + mY = 1$. This can be done using the Euclidean algorithm. Then, $x = bX$, $y = bY$ are solutions of $ax + my = b$. We reduce $x$ modulo $m$ to solve $ax \equiv b \pmod{m}$.

Theorem 4.8. Let $(a, m) = d$. The linear congruence equation $ax \equiv b \pmod{m}$ has a solution if and only if $d \mid b$.

Proof. Suppose $ax \equiv b \pmod{m}$ has a solution. This implies that there exist integers $x$, $y$ such that $ax + my = b$. Since $d \mid ax$ and $d \mid my$, we deduce that $d \mid b$. Conversely, suppose $d = (a, m)$ divides $b$. Since $(a/d, m/d) = 1$, the congruence
\[
\frac{a}{d}\, x \equiv \frac{b}{d} \pmod{\tfrac{m}{d}}
\]
has a solution by Theorem 4.7. By Proposition 4.2 (vi), this solution immediately provides a solution of $ax \equiv b \pmod{m}$. □


We now describe how solutions of the congruence equation
\[
\frac{a}{d}\, x \equiv \frac{b}{d} \pmod{\tfrac{m}{d}}
\]
can be extended into solutions of $ax \equiv b \pmod{m}$.

Theorem 4.9. Let $(a, m) = d$ and $d \mid b$. The linear congruence
\[
ax \equiv b \pmod{m} \tag{4.1}
\]
has exactly $d$ solutions modulo $m$. Let $t$ be the unique solution of the linear congruence
\[
\frac{a}{d}\, x \equiv \frac{b}{d} \pmod{\tfrac{m}{d}}. \tag{4.2}
\]
Then the solutions of $ax \equiv b \pmod{m}$ are given by
\[
t,\; t + \frac{m}{d},\; t + 2\frac{m}{d},\; \dots,\; t + (d-1)\frac{m}{d}.
\]

Proof. By Proposition 4.2 (vi), any solution of equation (4.2) is also a solution of equation (4.1). Moreover, if an integer $t$ satisfies the congruence equation (4.2), so does any integer of the form $t + j\frac{m}{d}$, $j \in \mathbb{Z}$. Therefore,
\[
t,\; t + \frac{m}{d},\; t + 2\frac{m}{d},\; \dots,\; t + (d-1)\frac{m}{d}
\]
are solutions of equation (4.2) and therefore of equation (4.1). Suppose
\[
t + j\frac{m}{d} \equiv t + k\frac{m}{d} \pmod{m} \quad \text{for some } 0 \le j \le k \le d - 1.
\]
This implies
\[
j\frac{m}{d} \equiv k\frac{m}{d} \pmod{m}
\]
and therefore $j \equiv k \pmod{d}$. But $0 \le k - j \le d - 1$. So, $j = k$. In other words, the solutions
\[
t,\; t + \frac{m}{d},\; t + 2\frac{m}{d},\; \dots,\; t + (d-1)\frac{m}{d}
\]
are incongruent mod $m$.


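Finally, we show that any solution of equation (4.1) is contained in the above list. Let $t_0$ be a solution of equation (4.1). Then $a t_0 \equiv a t \pmod{m}$, and
\begin{align*}
m \mid a(t - t_0) &\Rightarrow \frac{m}{d}\, d \;\Big|\; a(t - t_0) \\
&\Rightarrow \frac{m}{d} \;\Big|\; \frac{a}{d}(t - t_0) \\
&\Rightarrow \frac{m}{d} \;\Big|\; (t - t_0) \\
&\Rightarrow t \equiv t_0 \pmod{\tfrac{m}{d}}.
\end{align*}
Thus
\[
t_0 = t + l\frac{m}{d} \quad \text{for some } l \in \mathbb{Z}.
\]
But, $l \equiv r \pmod{d}$ for some $0 \le r \le d - 1$. Therefore,
\[
l\frac{m}{d} \equiv r\frac{m}{d} \pmod{m}.
\]
Thus,
\[
t_0 = t + l\frac{m}{d} \equiv t + r\frac{m}{d} \pmod{m} \quad \text{for some } 0 \le r \le d - 1,
\]
that is, any solution of the congruence equation (4.1) is contained in the list
\[
t,\; t + \frac{m}{d},\; t + 2\frac{m}{d},\; \dots,\; t + (d-1)\frac{m}{d}.
\]
□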
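Theorems 4.8 and 4.9 together describe a complete procedure for solving $ax \equiv b \pmod{m}$, which might be sketched as follows. This is only an illustration: the function names are our own, and the extended Euclidean algorithm is written in its usual iterative form, which is one of several equivalent ways to find $X$ with $aX + mY = (a, m)$.

```python
def extended_gcd(a, b):
    """Return (g, x, y) with g = gcd(a, b) and a*x + b*y == g."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y


def solve_linear_congruence(a, b, m):
    """All solutions of a*x = b (mod m) among the residues 0, ..., m - 1."""
    d, X, _ = extended_gcd(a % m, m)
    if b % d != 0:
        return []                      # Theorem 4.8: no solution unless d | b
    step = m // d
    t = (X * (b // d)) % step          # unique solution of (4.2) modulo m/d
    return [t + j * step for j in range(d)]   # Theorem 4.9: d solutions


print(solve_linear_congruence(2, 3, 4))    # the text's example with no solution
print(solve_linear_congruence(6, 4, 10))   # here d = 2, so two solutions
```

The list returned in the second call has spacing $m/d = 5$, matching the form $t, t + m/d, \dots$ given in Theorem 4.9.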

4.3.1. Exercises.

Exercise 4.3.1.1. Find integers $x$ and $y$ such that
\[
95x + 432y = 1.
\]
Show that there are infinitely many such integers. More generally, show that if $(a, b) = 1$, and $(x_0, y_0)$ is an integer solution of $ax + by = 1$, then all its integer solutions are of the form $x = x_0 + bt$, $y = y_0 - at$ for $t$ an integer.

Exercise 4.3.1.2. Solve the linear congruence equation $95x \equiv 7 \pmod{432}$.

4.4. Linear congruences and the Chinese remainder theorem

A Chinese mathematical work from the early fourth century posed the following problem [65, see page 75]:


We have a number of things, but we do not know exactly how many. If we count them by threes, we have two left over. If we count them by fives, we have three left over. If we count them by sevens, we have two left over. How many things are there?

Sun Tzu Suan Ching (Master Sun’s Mathematical Manual), around 300 CE, Volume 3, Problem 26

In the language of congruences, Master Sun is asking us to find a positive integer $x$ satisfying the following system of equations:
\[
x \equiv 2 \pmod{3}, \qquad x \equiv 3 \pmod{5}, \qquad x \equiv 2 \pmod{7}.
\]
The first equation tells us that $x = 3t + 2$ for some integer $t$. Substituting this in the second and third equations, we have
\[
3t \equiv 1 \pmod{5} \quad \text{and} \quad 3t \equiv 0 \pmod{7}.
\]
Since $(3, 7) = 1$, we get $t \equiv 0 \pmod{7}$, that is, $t = 7y$ for some $y \in \mathbb{Z}$. Since $3t \equiv 1 \pmod{5}$, we have $21y \equiv 1 \pmod{5}$, which implies $y \equiv 1 \pmod{5}$. We deduce that $y = 5k + 1$, $t = 7y = 35k + 7$ and finally, $x = 3t + 2 = 105k + 23$. So any positive integer $x$ of the form $105k + 23$ satisfies all the congruence equations above. That is, the above system of congruences has exactly one solution between 1 and 105; in other words, it has a unique solution $x \equiv 23 \pmod{105}$.

On the other hand, there is no $x$ simultaneously satisfying $x \equiv 1 \pmod{2}$ and $x \equiv 0 \pmod{4}$. Notice that while 3, 5 and 7 are mutually coprime, 2 and 4 are not coprime. Theorem 4.10 describes the conditions under which systems of linear congruences can be solved.

Theorem 4.10 (Chinese remainder theorem). Let $n_1, n_2, \dots, n_k$ be mutually coprime positive integers, that is, $(n_i, n_j) = 1$ for all $i \ne j$. Then, the system of congruences
\begin{align*}
x &\equiv c_1 \pmod{n_1} \\
x &\equiv c_2 \pmod{n_2} \\
&\;\;\vdots \\
x &\equiv c_k \pmod{n_k}
\end{align*}
has a unique solution (mod $n_1 n_2 \cdots n_k$).
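Master Sun's problem can also be solved mechanically. The sketch below is one possible implementation, using the construction $x = \sum_i c_i N_i N_i'$ with $N_i = N/n_i$ and $N_i N_i' \equiv 1 \pmod{n_i}$ that appears in the proof of Theorem 4.10. The function name `crt` is our own, and the code assumes Python 3.8 or later, where `pow(N_i, -1, n_i)` returns a modular inverse and `math.prod` is available.

```python
from math import prod


def crt(residues, moduli):
    """Solve x = c_i (mod n_i) for mutually coprime moduli n_i.

    Returns the unique solution in the range 0 <= x < n_1 * n_2 * ... * n_k.
    """
    N = prod(moduli)
    x = 0
    for c_i, n_i in zip(residues, moduli):
        N_i = N // n_i
        N_i_inverse = pow(N_i, -1, n_i)   # inverse of N_i modulo n_i
        x += c_i * N_i * N_i_inverse
    return x % N


# Master Sun's problem: remainders 2, 3, 2 on division by 3, 5, 7.
print(crt([2, 3, 2], [3, 5, 7]))   # 23, as found in the text
```

If the moduli are not mutually coprime, `pow` raises `ValueError` because the required inverse does not exist, which is consistent with the hypothesis of the theorem.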


Proof. Let $N = n_1 n_2 \cdots n_k$ and $N_i = N/n_i$ for each $1 \le i \le k$. Then $(n_i, N_i) = 1$. By Theorem 4.7, for each $N_i$, there exists a unique $N_i'$ modulo $n_i$ such that $N_i N_i' \equiv 1 \pmod{n_i}$. Consider
\[
x = \sum_{i=1}^{k} c_i N_i N_i'.
\]
For each $i$, $c_i N_i N_i' \equiv c_i \pmod{n_i}$ and $N_j \equiv 0 \pmod{n_i}$ whenever $i \ne j$. Thus, for each $1 \le i \le k$,
\[
\sum_{j=1}^{k} c_j N_j N_j' \equiv c_i \pmod{n_i}.
\]
Thus, $x$ satisfies every congruence in the system. We now show that $x$ is unique (mod $N$). Suppose $x$ and $y$ are two integers satisfying all the equations above. Then $x \equiv y \pmod{n_i}$ for each $1 \le i \le k$. Since the $n_i$’s are mutually coprime, we get $x \equiv y \pmod{N}$. □

4.4.1. Exercises.

Exercise 4.4.1.1. Show that
\[
x = c_1 N_1^{\phi(n_1)} + c_2 N_2^{\phi(n_2)} + \cdots + c_k N_k^{\phi(n_k)}
\]
is a simultaneous solution to the system of congruences described in Theorem 4.10.

In the next few exercises, we practice the use of the Chinese remainder theorem (CRT) and also look at systems where the $n_i$’s are not mutually coprime.

Exercise 4.4.1.2. Find a simultaneous solution mod 900 for the system
\[
x \equiv 3 \pmod{4}, \qquad x \equiv 2 \pmod{9}, \qquad x \equiv 1 \pmod{25}.
\]

Exercise 4.4.1.3. Find an integer $x$ such that
\[
2x \equiv 1 \pmod{11}, \qquad 6y \equiv 1 \pmod{7}, \qquad 2z \equiv 1 \pmod{5}.
\]

Exercise 4.4.1.4. Find integers $x$, $y$, $z$ such that
\[
35x + 55y + 77z = 1.
\]
Show that there are infinitely many such integers.

Exercise 4.4.1.5. Find the last two digits in the expansion of $4^{1000}$.


Exercise 4.4.1.6. Let $m$ and $n$ be two positive integers such that $(m, n) = 1$. Let us consider two sets $A$ and $B$ where
\[
A = \{a : 1 \le a \le mn \text{ such that } \gcd(a, mn) = 1\}
\]
and
\[
B = \{(b, c) : 1 \le b \le m,\; 1 \le c \le n,\; \gcd(b, m) = 1,\; \gcd(c, n) = 1\}.
\]
Define a function $f : A \to B$ as follows:
\[
f(a \bmod mn) = (a \bmod m,\; a \bmod n).
\]
(a) Show that $f$ is injective (one-to-one).
(b) Show that $f$ is surjective (onto). [Hint: Use CRT]
(c) From the above, conclude that $\phi(mn) = \phi(m)\phi(n)$.

Exercise 4.4.1.7. Find an integer $x$ that satisfies
\[
x \equiv 6 \pmod{15}, \qquad x \equiv 9 \pmod{14}.
\]
Is this solution unique (mod 210)?

Exercise 4.4.1.8. Show that the system of congruences
\[
x \equiv 5 \pmod{6}, \qquad x \equiv 7 \pmod{10}
\]
has more than one solution modulo 60.

4.5. Polynomial congruences

In this section, we will study congruence equations of the form $f(x) \equiv 0 \pmod{n}$, where $f(x)$ is a polynomial with integer coefficients. To motivate this discussion, let us start with simple polynomials of degree 2, which play an important role in number theory.

Let $p$ be a prime. Consider $f(x) = x^2 - 1$. How many solutions modulo $p$ does the equation $x^2 \equiv 1 \pmod{p}$ have? Observe that
\begin{align*}
x^2 \equiv 1 \pmod{p} &\Rightarrow p \mid (x^2 - 1) \\
&\Rightarrow p \mid (x - 1)(x + 1) \\
&\Rightarrow \text{either } p \mid (x - 1) \text{ or } p \mid (x + 1) \\
&\Rightarrow x \equiv \pm 1 \pmod{p}.
\end{align*}


$1 \equiv -1 \pmod{p}$ if and only if $p = 2$. Thus, $x^2 \equiv 1 \pmod{p}$ has a unique solution if $p = 2$ and has exactly two solutions if $p$ is an odd prime. This fact has an interesting consequence.

Theorem 4.11 (Wilson’s theorem). If $p$ is a prime, then
\[
(p - 1)! \equiv -1 \pmod{p}.
\]

Proof. Consider the numbers $1, 2, \dots, p - 1$. Among these numbers, 1 and $p - 1$ satisfy $x^2 \equiv 1 \pmod{p}$. By Theorem 4.7, for each $2 \le a \le p - 2$, there exists a unique $2 \le a' \le p - 2$ such that $aa' \equiv 1 \pmod{p}$. Moreover $a \ne a'$, since $a^2 \equiv 1 \pmod{p}$ if and only if $a = 1, p - 1$. Thus, the numbers $2, 3, \dots, p - 2$ split into pairs whose product is congruent to 1, and
\[
2 \cdot 3 \cdots (p - 2) \equiv 1 \pmod{p}.
\]
Moreover $p - 1 \equiv -1 \pmod{p}$. Thus,
\[
1 \cdot 2 \cdot 3 \cdots (p - 1) \equiv -1 \pmod{p}.
\]
□
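Wilson's theorem is easy to observe on small cases. The following is a minimal sketch with helper names of our own; `is_prime` is plain trial division, used only to label each $n$, and the factorial is reduced modulo $n$ at every step so that the numbers stay small.

```python
def factorial_mod(n):
    """Return (n - 1)! reduced modulo n."""
    result = 1 % n
    for k in range(2, n):
        result = result * k % n
    return result


def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    return all(n % k != 0 for k in range(2, int(n ** 0.5) + 1))


# For a prime n the residue printed is n - 1, that is, -1 modulo n.
for n in range(2, 20):
    print(n, is_prime(n), factorial_mod(n))
```

The same loop also shows the residues obtained for composite $n$, which is the situation the text turns to next.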



Corollary 4.12. Let $n \ge 4$. Then, $(n - 1)! \equiv -1 \pmod{n}$ if and only if $n$ is a prime.

Proof. Sufficiency is proved by Wilson’s theorem. If $n$ is not a prime, then there exist $1 \le a, b \le n - 1$ such that $ab = n$. If $a \ne b$, then clearly $n \mid (n - 1)!$, since both $a$ and $b$ are factors of $(n - 1)!$. If $a = b$, that is, if $n = a^2$, then for $n > 4$, both $a$ and $2a$ lie between 1 and $n - 1$. Thus, $n = a^2$ divides $(n - 1)!$, and, therefore, $(n - 1)! \equiv 0 \pmod{n}$. For $n = 4$, one immediately checks that $3! \not\equiv -1 \pmod{4}$. Thus, $(n - 1)! \equiv -1 \pmod{n}$ if and only if $n$ is a prime. □

As a consequence of Wilson’s theorem, we also make the following observation:

Corollary 4.13. For a prime $p > 2$,
\[
(-1)^{\frac{p-1}{2}} \left[ \left( \frac{p-1}{2} \right)! \right]^2 \equiv -1 \pmod{p}.
\]

Proof. For any integer $i$ lying between 1 and $(p - 1)/2$, we have
\[
p - i \equiv -i \pmod{p}.
\]


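Thus,
\begin{align*}
(p - 1)! &= 1 \cdot 2 \cdots (p - 1) \\
&= 1 \cdot 2 \cdots \frac{p-1}{2} \left( p - \frac{p-1}{2} \right) \cdots (p - 2)(p - 1) \\
&\equiv (-1)^{\frac{p-1}{2}} \left[ \left( \frac{p-1}{2} \right)! \right]^2 \pmod{p}.
\end{align*}
Thus, by Wilson’s theorem, for a prime $p > 2$,
\[
(-1)^{\frac{p-1}{2}} \left[ \left( \frac{p-1}{2} \right)! \right]^2 \equiv -1 \pmod{p}.
\]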

□

We now prove a theorem regarding the solutions of $x^2 \equiv -1 \pmod{p}$.

Theorem 4.14. For an odd prime $p$, $x^2 \equiv -1 \pmod{p}$ has a solution if and only if $p \equiv 1 \pmod{4}$.

Proof. Suppose $x^2 \equiv -1 \pmod{p}$ has a solution $x = a$. Thus,
\[
a^{p-1} \equiv (a^2)^{\frac{p-1}{2}} \equiv (-1)^{\frac{p-1}{2}} \pmod{p}. \tag{4.3}
\]
Clearly $(a, p) = 1$, otherwise $a^2 \equiv 0 \not\equiv -1 \pmod{p}$. Thus, by Corollary 4.6, we have
\[
a^{p-1} \equiv 1 \pmod{p}. \tag{4.4}
\]
Combining equations (4.3) and (4.4), we get
\[
(-1)^{\frac{p-1}{2}} \equiv 1 \pmod{p}.
\]
Therefore, $(p - 1)/2$ is even. That is, $p \equiv 1 \pmod{4}$. Conversely, suppose $p \equiv 1 \pmod{4}$. Then $(p - 1)/2$ is even. Thus, by Corollary 4.13,
\[
\left[ \left( \frac{p-1}{2} \right)! \right]^2 \equiv -1 \pmod{p},
\]
and therefore, $x^2 \equiv -1 \pmod{p}$ has a solution. □
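The second half of the proof of Theorem 4.14 is constructive: when $p \equiv 1 \pmod{4}$, the residue of $\left(\frac{p-1}{2}\right)!$ is an explicit square root of $-1$. A short sketch of this, with a function name of our own choosing, follows; it assumes the input is an odd prime and makes no attempt to verify that.

```python
def half_factorial(p):
    """Return ((p - 1) / 2)! reduced modulo the odd prime p."""
    result = 1
    for k in range(2, (p - 1) // 2 + 1):
        result = result * k % p
    return result


# For primes p = 1 (mod 4), x = ((p - 1)/2)! satisfies x^2 = -1 (mod p),
# so the last column printed is 0.
for p in (5, 13, 17, 29, 37, 41):
    x = half_factorial(p)
    print(p, x, (x * x + 1) % p)
```

This is a convenient illustration rather than an efficient method, since computing the factorial takes about $p/2$ multiplications.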



Recall, from high school algebra, that to solve the quadratic equation
\[
ax^2 + bx + c = 0, \quad a \ne 0,
\]


we simplified it into
\[
a \left( \left( x + \frac{b}{2a} \right)^2 - \left( \frac{b^2 - 4ac}{4a^2} \right) \right) = 0.
\]
Thus,
\[
x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}.
\]
To what extent does this calculation hold mod $p$? For an odd prime $p$, if $(a, p) = 1$, then we can solve
\[
2ax \equiv 1 \pmod{p}.
\]
If we can solve the equation $y^2 \equiv b^2 - 4ac \pmod{p}$, then $x(-b \pm y)$ will be solutions of
\[
ax^2 + bx + c \equiv 0 \pmod{p}.
\]
Thus, solving a quadratic congruence reduces to the problem of solving a linear congruence and a congruence of the form
\[
y^2 \equiv A \pmod{p},
\]
for a given integer $A$. We therefore ask the following questions:
• Under what conditions does $y^2 \equiv A \pmod{p}$ have a solution?
• How many solutions can it possibly have?
We postpone the discussion of the first question to the next section. The second question is answered by the following theorem of Lagrange:

Theorem 4.15 (Lagrange’s theorem). Given a prime $p$, let
\[
f(x) = a_0 + a_1 x + \cdots + a_n x^n, \quad a_i \in \mathbb{Z}
\]
be a polynomial of degree $n$ such that $a_n \not\equiv 0 \pmod{p}$. Then the polynomial congruence
\[
f(x) \equiv 0 \pmod{p}
\]
has at most $n$ solutions.


Proof. We prove this theorem by induction. Since $a_1 \not\equiv 0 \pmod{p}$, by Theorem 4.7, the equation
\[
a_1 x + a_0 \equiv 0 \pmod{p}
\]
has a unique solution. Thus, the theorem is true for $n = 1$. Suppose that the theorem is true for polynomials of degree $n - 1$. Assume, also, that the equation
\[
a_0 + a_1 x + \cdots + a_n x^n \equiv 0 \pmod{p}, \quad a_n \not\equiv 0 \pmod{p}
\]
has $n + 1$ incongruent solutions mod $p$, say $x_0, x_1, \dots, x_n$. We have
\[
f(x) - f(x_0) = \sum_{k=1}^{n} a_k (x^k - x_0^k) = (x - x_0) g(x),
\]
where the degree of $g(x)$ is $n - 1$ and the leading coefficient of $g(x)$ is $a_n$, which is $\not\equiv 0 \pmod{p}$. We observe that for every $1 \le k \le n$, $f(x_k) \equiv f(x_0) \pmod{p}$. Thus,
\[
f(x_k) - f(x_0) = (x_k - x_0) g(x_k) \equiv 0 \pmod{p}.
\]
Since $x_k$ and $x_0$ are incongruent (mod $p$), we get
\[
g(x_k) \equiv 0 \pmod{p}
\]
for every $1 \le k \le n$. Thus, $g(x) \equiv 0 \pmod{p}$ has $n$ incongruent solutions (mod $p$), which contradicts our induction hypothesis that it can have at most $n - 1$ solutions. Therefore,
\[
a_0 + a_1 x + \cdots + a_n x^n \equiv 0 \pmod{p}, \quad a_n \not\equiv 0 \pmod{p}
\]
has at most $n$ solutions. By induction, we have proved the result for all $n \ge 1$. □

Remark 4.16. Theorem 4.15 may not hold for a composite modulus $n$. For example, $x^2 - 1 \equiv 0 \pmod{8}$ has 4 solutions.

From Lagrange’s theorem, we deduce Corollary 4.17:

Corollary 4.17. If $p$ is a prime number and $d \mid p - 1$, then the congruence equation
\[
x^d - 1 \equiv 0 \pmod{p}
\]
has exactly $d$ solutions.


Proof. Let $p - 1 = dk$ for some integer $k$. Then
\[
x^{p-1} - 1 = (x^d - 1)(x^{d(k-1)} + x^{d(k-2)} + \cdots + x^d + 1).
\]
That is,
\[
x^{p-1} - 1 = (x^d - 1) f(x),
\]
where $f(x)$ is a polynomial with integer coefficients of degree $p - 1 - d$. By Corollary 4.6, we know that
\[
x^{p-1} - 1 \equiv 0 \pmod{p}
\]
has exactly $p - 1$ solutions (namely $x = 1, 2, \dots, p - 1$). By Lagrange’s theorem, we know that $x^d - 1 \equiv 0 \pmod{p}$ has at most $d$ solutions and $f(x) \equiv 0 \pmod{p}$ has at most $p - 1 - d$ solutions. Therefore, $x^d - 1 \equiv 0 \pmod{p}$ must have exactly $d$ solutions, since if it has fewer than $d$ solutions, the number of solutions of the equations
\[
x^{p-1} - 1 \equiv 0 \pmod{p} \quad \text{and} \quad (x^d - 1) f(x) \equiv 0 \pmod{p}
\]
will not match up. □

4.5.1. Exercises.

Exercise 4.5.1.1. Find all integer solutions of $y^2 = 23x^3 + 22$ if any. [Hint: Consider the above equation modulo 23.]

Exercise 4.5.1.2. Find the solutions, if any, of the following quadratic congruences.
(a) $x^2 + 4x + 12 \equiv 0 \pmod{13}$
(b) $3x^2 + 9x + 7 \equiv 0 \pmod{11}$

Exercise 4.5.1.3. Show that for any prime $p$, the equation $x^3 \equiv 1 \pmod{p}$ will have either one solution or three solutions. Explain under what conditions it will have three solutions.

Exercise 4.5.1.4. If $p$ is an odd prime, prove that the congruence equation
\[
x^{p-2} + \cdots + x^2 + x + 1 \equiv 0 \pmod{p}
\]
has exactly $p - 2$ solutions and they are $2, 3, \dots, p - 1$.


Exercise 4.5.1.5. Let $n = p_1 p_2 \cdots p_k$, where the $p_i$’s are distinct primes.
(a) Show that if all the $p_i$’s are odd, then the number of solutions to $x^2 \equiv 1 \pmod{n}$ is $2^k$.
(b) Show that if $p_i = 2$ for some $i$, then the above congruence has $2^{k-1}$ solutions.

Exercise 4.5.1.6. If $p$ is an odd prime, show that
\[
\binom{p-1}{k+1} \equiv (-1)^{k+1} \pmod{p},
\]
for $k$ satisfying $0 \le k \le p - 2$.

4.6. Order and primitive roots

Let $a$ and $m$ be coprime integers such that $m > 0$. By Euler’s theorem (Theorem 4.5), we know that $a^{\phi(m)} \equiv 1 \pmod{m}$. However, there might be a smaller power $k \le \phi(m)$ for which $a^k \equiv 1 \pmod{m}$.

Definition 4.18. Let $a$ and $m$ be coprime integers such that $m > 0$. The smallest positive integer $k$ for which $a^k \equiv 1 \pmod{m}$ is called the order of $a$ modulo $m$ and is denoted by $\mathrm{ord}_m(a)$. If $\mathrm{ord}_m(a) = \phi(m)$, then $a$ is called a primitive root of $m$ or a primitive root (mod $m$).

Note that by Theorem 4.7, if $(a, m) = 1$, then $a$ has a multiplicative inverse mod $m$. Thus, the set of coprime residue classes mod $m$ forms a multiplicative group and is denoted as $(\mathbb{Z}/m\mathbb{Z})^*$. If $m$ has a primitive root, then $(\mathbb{Z}/m\mathbb{Z})^*$ is a cyclic group.
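Definition 4.18 translates directly into a computation: multiply by $a$ repeatedly until the residue 1 appears. The sketch below is illustrative only. The function names are our own, $\phi(m)$ is obtained by direct counting, and the search is brute force, so it is suitable only for small moduli.

```python
from math import gcd


def multiplicative_order(a, m):
    """Smallest positive k with a^k = 1 (mod m); requires gcd(a, m) = 1."""
    if gcd(a, m) != 1:
        raise ValueError("a must be coprime to m")
    k, power = 1, a % m
    while power != 1 % m:
        power = power * a % m
        k += 1
    return k


def euler_phi(m):
    """phi(m), counted directly from the definition."""
    return sum(1 for b in range(1, m + 1) if gcd(b, m) == 1)


def primitive_roots(m):
    """All primitive roots of m among the residues 1, ..., m."""
    phi = euler_phi(m)
    return [a for a in range(1, m + 1)
            if gcd(a, m) == 1 and multiplicative_order(a, m) == phi]


print(multiplicative_order(2, 7))   # 3, since 2^3 = 8 = 1 (mod 7)
print(primitive_roots(7))           # [3, 5]
print(primitive_roots(8))           # [], every unit mod 8 has order 1 or 2
```

The loop in `multiplicative_order` always terminates for coprime inputs because, by Euler's theorem, the residue 1 is reached after at most $\phi(m)$ steps.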


Theorem 4.19. Let $a$ and $m$ be coprime integers such that $m > 0$. Let $k = \operatorname{ord}_m(a)$. Then
(a) $a^l \equiv 1 \pmod m$ if and only if $l \equiv 0 \pmod k$.
(b) $a^i \equiv a^j \pmod m$ if and only if $i \equiv j \pmod k$.
(c) The numbers $1, a, a^2, \ldots, a^{k-1}$ are incongruent mod $m$.

Proof. (a) If $l \equiv 0 \pmod k$, then clearly $a^l \equiv 1 \pmod m$. Conversely, suppose $a^l \equiv 1 \pmod m$. By the division algorithm, there exist unique integers $q$ and $r$ such that $l = qk + r$ and $0 \le r < k$. Observe that
$$a^r \equiv 1 \cdot a^r \equiv a^{kq} a^r \equiv a^{qk+r} \equiv a^l \equiv 1 \pmod m.$$
Since $k = \operatorname{ord}_m(a)$ is the smallest positive exponent with this property and $r < k$, we get $r = 0$. Thus, $l = qk$ and therefore $l \equiv 0 \pmod k$.
(b) This follows easily from (a).
(c) Suppose $a^i \equiv a^j \pmod m$ for some $0 \le i \le j \le k - 1$. By (b), $i \equiv j \pmod k$. Since $0 \le j - i \le k - 1$, this is not possible unless $j - i = 0$, that is, $j = i$. Thus, the numbers $1, a, a^2, \ldots, a^{k-1}$ are incongruent mod $m$. □

For any modulus $m$, let $a_1, a_2, \ldots, a_{\phi(m)}$ denote all the integers between 1 and $m - 1$ which are coprime to $m$. Any collection of $\phi(m)$ integers which are incongruent mod $m$, and each of which is congruent to one of the $a_i$’s, is called a reduced residue system mod $m$. Theorem 4.19(c) tells us that if $a$ is a primitive root of $m$, then $\{a, a^2, \ldots, a^{\phi(m)}\}$ forms a reduced residue system mod $m$.

Theorem 4.20. For a prime $p$ and a divisor $d$ of $p - 1$, let $A(d) = \{1 \le a \le p - 1 : \operatorname{ord}_p(a) = d\}$. Then
$$d = \sum_{d_i \mid d} |A(d_i)|.$$
Therefore, $|A(d)| = \phi(d)$.


Proof. If $x^d \equiv 1 \pmod p$, then $\operatorname{ord}_p(x)$ divides $d$. So, if we look at all the divisors of $d$ and, for each divisor $d_i$ of $d$, consider $A(d_i) = \{1 \le x \le p - 1 : \operatorname{ord}_p(x) = d_i\}$, then
$$\bigcup_{d_i \mid d} A(d_i) = \{1 \le x \le p - 1 : x^d \equiv 1 \pmod p\}.$$
By Theorem 4.17, we know that if $d \mid p - 1$, then the congruence equation $x^d - 1 \equiv 0 \pmod p$ has exactly $d$ solutions. Since the sets $A(d_i)$ are pairwise disjoint, we deduce that
$$d = \sum_{d_i \mid d} |A(d_i)|.$$
By Möbius inversion and Exercise 3.2.1.4, we get
$$|A(d)| = \sum_{d_i \mid d} \mu(d_i)\,\frac{d}{d_i} = \phi(d).$$ □
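The count $|A(d)| = \phi(d)$ in Theorem 4.20 can be checked by brute force for a small prime. The sketch below is only an illustration under that assumption (small $p$); the helper names `order_mod`, `phi` and `order_counts` are hypothetical and not from the text.

```python
from math import gcd


def order_mod(a: int, p: int) -> int:
    """Order of a modulo the prime p, for 1 <= a <= p - 1."""
    k, power = 1, a % p
    while power != 1:
        power = (power * a) % p
        k += 1
    return k


def phi(d: int) -> int:
    """Euler's totient by direct counting."""
    return sum(1 for a in range(1, d + 1) if gcd(a, d) == 1)


def order_counts(p: int) -> dict[int, int]:
    """Map each order d to |A(d)|, the number of residues of order d mod p."""
    counts: dict[int, int] = {}
    for a in range(1, p):
        d = order_mod(a, p)
        counts[d] = counts.get(d, 0) + 1
    return counts


if __name__ == "__main__":
    p = 13
    for d, size in sorted(order_counts(p).items()):
        print(d, size, phi(d))  # the last two columns should agree
```

The number of residues of order $p - 1$ is then the number of primitive roots of $p$.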

From this, we conclude one of the most important results in number theory:

Theorem 4.21 (Primitive root theorem). Every prime $p$ has a primitive root.

Proof. Applying Theorem 4.20 for $d = p - 1$ and noting that $\phi(p - 1) \ge 1$ for all primes $p$, we see that every prime $p$ has a primitive root. In fact, it has exactly $\phi(p - 1)$ primitive roots. □

Do prime powers have primitive roots? Clearly, both $2$ and $2^2$ have primitive roots. In Exercise 4.6.1.13 (a), we show that $2^\alpha$ does not have a primitive root for $\alpha \ge 3$. In Theorem 4.22, we show that every power of an odd prime has a primitive root.

Theorem 4.22. Let $p$ be an odd prime. Then the group of coprime residue classes mod $p^\alpha$ is cyclic of order $\phi(p^\alpha) = p^{\alpha-1}(p - 1)$. If $p = 2$, then every coprime residue class mod $2^\alpha$ can be written as $\pm 5^k$.


Proof. By Exercise 4.2.1.4, for any prime $p$ and $\alpha \ge 1$, it follows that $\phi(p^\alpha) = p^{\alpha-1}(p - 1)$. Let $p$ be an odd prime. For the case $\alpha = 1$, the cyclicity of the group of coprime residue classes mod $p^\alpha$ is the content of Theorem 4.21. For an odd prime $p$, the existence of primitive roots of $p^\alpha$ for $\alpha \ge 2$ follows from Exercises 4.6.1.11 and 4.6.1.12.

We now study the case $p = 2$. The claim is immediate for $\alpha = 1, 2$. Let $\alpha \ge 3$. By Exercise 4.6.1.13 (b), the elements of the set $\{5^k,\ 0 \le k \le 2^{\alpha-2} - 1\}$ are all incongruent mod $2^\alpha$. Similarly, the elements of the set $\{-5^k,\ 0 \le k \le 2^{\alpha-2} - 1\}$ are also incongruent mod $2^\alpha$. We further observe that $5^s \not\equiv -5^t \pmod{2^\alpha}$ for any $0 \le s, t \le 2^{\alpha-2} - 1$. This is because $5^s \equiv 1 \pmod 4$ and $-5^t \equiv 3 \pmod 4$. Thus, the set $\{\pm 5^k,\ 0 \le k \le 2^{\alpha-2} - 1\}$ contains $2^{\alpha-1}$ distinct residue classes mod $2^\alpha$. Since $\phi(2^\alpha) = 2^{\alpha-1}$, it follows that every coprime residue class mod $2^\alpha$ can be written as $\pm 5^k$. □

The primitive root theorem helps us to answer the following question raised in Section 4.5: when does the equation $x^2 \equiv a \pmod p$ have a solution?

Theorem 4.23 (Euler’s criterion). Let $p$ be an odd prime and $(a, p) = 1$. Then, $x^2 \equiv a \pmod p$ has a solution if and only if
$$a^{\frac{p-1}{2}} \equiv 1 \pmod p.$$

Proof. Suppose $x^2 \equiv a \pmod p$ has a solution, say $x = x_1$. Since $(a, p) = 1$, we have $(x_1, p) = 1$. Thus,
$$a^{\frac{p-1}{2}} \equiv (x_1^2)^{\frac{p-1}{2}} \equiv x_1^{p-1} \equiv 1 \pmod p,$$
by Fermat’s little theorem. Conversely, suppose
$$a^{\frac{p-1}{2}} \equiv 1 \pmod p.$$
By the primitive root theorem, $p$ has a primitive root, say, $r$. By Exercise 4.6.1.2, $a \equiv r^k \pmod p$ for some $1 \le k \le p - 1$. This implies
$$(r^k)^{\frac{p-1}{2}} \equiv a^{\frac{p-1}{2}} \equiv 1 \pmod p.$$
But $\operatorname{ord}_p(r) = p - 1$. Therefore, by Theorem 4.19 (a), we have
$$(p - 1) \mid \frac{k(p - 1)}{2}.$$
Thus, $k$ is an even integer, say $k = 2l$, and
$$a \equiv r^k \equiv r^{2l} \equiv (r^l)^2 \pmod p.$$
That is, $x = r^l$ is a solution of $x^2 \equiv a \pmod p$. □
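Euler’s criterion gives a quick computational test for whether $a$ is a square modulo an odd prime. The sketch below compares it against a direct enumeration of squares; the function names are illustrative assumptions, and Python’s built-in three-argument `pow` performs the modular exponentiation.

```python
def is_square_mod_p(a: int, p: int) -> bool:
    """Euler's criterion for an odd prime p and gcd(a, p) = 1."""
    return pow(a, (p - 1) // 2, p) == 1


def squares_mod_p(p: int) -> list[int]:
    """Nonzero residues a such that x^2 = a (mod p) is solvable, by enumeration."""
    return sorted({(x * x) % p for x in range(1, p)})


if __name__ == "__main__":
    p = 19
    by_criterion = [a for a in range(1, p) if is_square_mod_p(a, p)]
    print(by_criterion)
    print(squares_mod_p(p))  # the two lists should coincide
```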



In particular, Theorem 4.14 is a special case of Euler’s criterion. We now record an application of Theorem 4.22 to estimate the number of solutions of polynomial congruences of the form $x^k \equiv c \pmod q$ for $q > 1$ and $(c, q) = 1$. We start with Lemma 4.24.

Lemma 4.24. Let $p$ be an odd prime and $(c, p) = 1$. The number of solutions of the polynomial congruence $x^k \equiv c \pmod{p^\alpha}$ is at most $(k, \phi(p^\alpha))$. If $p = 2$, the number of solutions is at most $2(k, 2^{\alpha-2})$.

Proof. Let $g$ be a generator of the coprime residue classes (mod $p^\alpha$). Write $x = g^t$ and $c = g^s$. By Theorem 4.19 (b), the congruence $x^k \equiv c \pmod{p^\alpha}$ reduces to the linear congruence $kt \equiv s \pmod{\phi(p^\alpha)}$, which, by Theorem 4.8, has no solutions unless $(k, \phi(p^\alpha))$ divides $s$. If $(k, \phi(p^\alpha))$ divides $s$, then, by Theorem 4.9, the above linear congruence has $(k, \phi(p^\alpha))$ solutions $t$ mod $\phi(p^\alpha)$, each of which gives one solution $x = g^t$ mod $p^\alpha$. The situation for $p = 2$ is similar and we leave this as an exercise to the reader (see Exercise 4.6.1.14). □
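The bound in Lemma 4.24 can be sanity-checked by exhaustive search for small prime powers. This is only a brute-force sketch under the lemma’s hypotheses; `count_solutions` is a hypothetical helper name.

```python
from math import gcd


def count_solutions(k: int, c: int, q: int) -> int:
    """Number of x in [0, q) with x^k congruent to c mod q."""
    return sum(1 for x in range(q) if pow(x, k, q) == c % q)


if __name__ == "__main__":
    # Odd prime power: p = 3, alpha = 3, so q = 27 and phi(q) = 18.
    q, phi_q = 27, 18
    for k in range(1, 7):
        worst = max(count_solutions(k, c, q) for c in range(1, q) if gcd(c, q) == 1)
        print(k, worst, gcd(k, phi_q))  # worst case never exceeds gcd(k, phi(q))

    # Power of two: alpha = 4, so q = 16 and the bound is 2 * gcd(k, 2^(alpha - 2)).
    for k in range(1, 7):
        worst = max(count_solutions(k, c, 16) for c in range(1, 16, 2))
        print(k, worst, 2 * gcd(k, 4))
```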


This allows us to derive an upper bound for the number of solutions of the congruence $x^k \equiv c \pmod q$ for any integer $q > 1$.

Lemma 4.25. For any $q > 1$ and $(c, q) = 1$, the number of solutions of $x^k \equiv c \pmod q$ is at most $(2k)^{\nu(q)}$, where $\nu(q)$ is the number of distinct prime factors of $q$.

Proof. By the Chinese remainder theorem (Theorem 4.10), the congruence $x^k \equiv c \pmod q$ is equivalent to a system of congruences $x^k \equiv c \pmod{p^\alpha}$ as we range over the distinct prime power divisors $p^\alpha$ of $q$. By Lemma 4.24, the number of solutions for each prime power is $\le k$ if $p$ is odd and $\le 2k$ if $p = 2$. The result is now immediate. □

4.6.1. Exercises.

Exercise 4.6.1.1. Find $\operatorname{ord}_7(2)$ and $\operatorname{ord}_7(3)$.

Exercise 4.6.1.2. Let $(a, m) = 1$. Prove that $a$ is a primitive root of $m$ if and only if $\{a, a^2, \ldots, a^{\phi(m)}\}$ forms a reduced residue system (mod $m$).

Exercise 4.6.1.3. Show that 2 is a primitive root of 19. Find all residues $a \pmod{19}$ for which $x^2 \equiv a \pmod{19}$ has a solution.

Exercise 4.6.1.4. Let $p$ be an odd prime and $(a, p) = 1$. If $a = b^2$ is a perfect square, can $a$ be a primitive root of $p$?

Exercise 4.6.1.5. Find all primitive roots of 17.

Exercise 4.6.1.6. Prove that there exist infinitely many primes of the form $4n + 1$. [Hint: Suppose there were only finitely many primes of this kind and let $N$ denote their product. What can you say about a prime factor of $4N^2 + 1$?]

Exercise 4.6.1.7. Check if the congruence $x^2 \equiv -1 \pmod{95}$ has a solution.


Exercise 4.6.1.8. Check if the congruence $x^2 \equiv 2 \pmod{59}$ has a solution.

Exercise 4.6.1.9. Check if the congruence $x^2 \equiv 2 \pmod{61}$ has a solution. Is 2 a primitive root of 61?

Exercise 4.6.1.10. Let $k$ and $a$ be positive integers with $k$ odd. Show that the set $\{1, 3^k, 5^k, \ldots, (2^a - 1)^k\}$ forms a reduced residue system modulo $2^a$.

Exercise 4.6.1.11. Let $p$ be an odd prime and $r$ be a primitive root of $p$.
(a) Show that $\operatorname{ord}_{p^2}(r)$ is either $p - 1$ or $p(p - 1)$.
(b) Suppose $\operatorname{ord}_{p^2}(r) = p - 1$. Show that $(r + p)^{p-1} \not\equiv 1 \pmod{p^2}$.
(c) Deduce from above that either $r$ or $r + p$ is a primitive root of $p^2$.

Exercise 4.6.1.12. Let $r$ be a primitive root of $p$ which is also a primitive root of $p^2$. (The existence of $r$ follows from Exercise 4.6.1.11.)
(a) Show that for all integers $\alpha \ge 2$,
$$r^{p^{\alpha-2}(p-1)} \not\equiv 1 \pmod{p^\alpha}.$$
(b) Show that $p - 1$ divides $\operatorname{ord}_{p^\alpha}(r)$ and $\operatorname{ord}_{p^\alpha}(r)$ divides $p^{\alpha-1}(p - 1)$.
(c) Deduce that $r$ is a primitive root of $p^\alpha$ for all $\alpha \ge 2$.

Exercise 4.6.1.13. (a) Let $x$ be an odd integer and $\alpha \ge 3$. Show that
$$x^{2^{\alpha-2}} \equiv 1 \pmod{2^\alpha}.$$
Thus, $2^\alpha$ does not have a primitive root for $\alpha \ge 3$.
(b) Show that
$$5^{2^{\alpha-3}} \equiv 1 + 2^{\alpha-1} \not\equiv 1 \pmod{2^\alpha}.$$
Thus, $\operatorname{ord}_{2^\alpha}(5) = 2^{\alpha-2}$.


Exercise 4.6.1.14. Let $\alpha \ge 2$. Show that the number of solutions of the congruence equation $x^k \equiv c \pmod{2^\alpha}$ is at most $2(k, 2^{\alpha-2})$. [Hint: Use the second part of Lemma 4.24]

Chapter 5

Distribution of prime numbers

In Book IX of “Elements”, Euclid proved that there are infinitely many prime numbers. In fact, if there were only finitely many prime numbers, say $p_1, p_2, \ldots, p_n$, we consider the number $N = p_1 p_2 \cdots p_n + 1$ and observe that none of the primes $p_i$ divides $N$. Thus, $N$ must have a prime divisor different from each of the $p_i$’s. This leads to a contradiction and we deduce that there are infinitely many primes.

The infinitude of primes can also be demonstrated by an “analytic” approach. We show that the series
$$\sum_{p \text{ prime}} \frac{1}{p}$$
diverges. To see this, we observe that the infinite series (which we will formally introduce and study in this chapter as the Riemann zeta function) $\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}$ converges absolutely for any real $s > 1$. By Exercise 5.2.1.1, we have
$$\sum_{n=1}^{\infty} \frac{1}{n^s} = \prod_p \left( \sum_{k=0}^{\infty} \frac{1}{p^{ks}} \right) = \prod_p \left( 1 - \frac{1}{p^s} \right)^{-1}.$$
Using the Taylor series expansion
$$-\log(1 - x) = \sum_{k=1}^{\infty} \frac{x^k}{k}, \quad |x| < 1,$$
we derive
$$\log \zeta(s) = -\sum_p \log\left(1 - \frac{1}{p^s}\right) = \sum_p \left( \sum_{k=1}^{\infty} \frac{1}{k p^{ks}} \right).$$
Since $\lim_{s \to 1^+} \sum_n \frac{1}{n^s} = \infty$, we have $\lim_{s \to 1^+} \log \zeta(s) = \infty$. Thus,
$$\lim_{s \to 1^+} \left( \sum_p \frac{1}{p^s} + \sum_p \sum_{k \ge 2} \frac{1}{k p^{ks}} \right) = \infty.$$
But, for $s > 1$,
$$\sum_p \sum_{k \ge 2} \frac{1}{k p^{ks}} \le \sum_p \sum_{k \ge 2} \frac{1}{k p^{k}} \le \sum_p \frac{1}{p(p - 1)} \ll \sum_{n=1}^{\infty} \frac{1}{n^2} < \infty.$$
Thus,
$$\sum_{p \text{ prime}} \frac{1}{p} = \lim_{s \to 1^+} \left( \sum_p \frac{1}{p^s} \right) = \infty,$$
thereby showing that there are infinitely many primes. The fundamental theorem of arithmetic is a vital component in this “analytic” proof as well, but is used in the form of a series representation being equal to an infinite product:
$$\sum_{n=1}^{\infty} \frac{1}{n^s} = \prod_p \left( 1 - \frac{1}{p^s} \right)^{-1}, \quad s > 1.$$

This point of view makes information about the behavior of prime numbers transparent; in fact, by viewing the zeta function $\zeta(s)$ as a complex-valued function, one derives finer information about the distribution of prime numbers among the natural numbers as well as special subsets of natural numbers, such as arithmetic progressions. More generally, one may derive information about the distribution of values $f(n)$ of an arithmetic function $f : \mathbb{N} \to \mathbb{C}$ by studying the series
$$\sum_{n=1}^{\infty} \frac{f(n)}{n^s}$$
for suitable complex numbers $s$. In this chapter, we elaborate on this theme. We also recall vital theorems about the distribution of primes in the set of natural numbers as well as primes in arithmetic progressions.

5.1. Dirichlet series

Let $f : \mathbb{N} \to \mathbb{C}$ be an arithmetic function. In Chapter 3, we learned some elementary tools to estimate sums of the form $\sum_{n \le x} f(n)$ using specific information about the function $f$. For example, if $f$ is real-valued and monotone, then the approximation of the sum $\sum_{n \le x} f(n)$ to $\int_1^x f(t)\,dt$ can be made explicit (see Lemma 3.9). Abel’s partial summation formula (Theorem 3.12) allows us to do the same for (complex-valued) functions which have continuous derivatives on $[1, x]$. The goal of this chapter is to learn to derive estimates for $\sum_{n \le x} f(n)$ by invoking complex-analytic properties of the Dirichlet series
$$F(s) = \sum_{n=1}^{\infty} \frac{f(n)}{n^s}$$
associated to $f$, defined for appropriate complex numbers $s$. The viewpoint of studying $F(s)$ as a function of a complex variable was introduced by Riemann. Riemann’s viewpoint helps us to derive information about $\sum_{n \le x} f(n)$ which would not have been accessible by merely viewing $F(s)$ as a function of a real variable. Henceforth, we use the notation
$$s = \sigma + it, \quad \sigma, t \in \mathbb{R}, \quad \sigma = \operatorname{Re}(s), \quad t = \operatorname{Im}(s).$$
We have $n^s = n^\sigma n^{it} = n^\sigma e^{it \log n}$ and therefore, $|n^s| = n^\sigma$. If $\sigma \ge \sigma_0$, then
$$\left| \frac{f(n)}{n^s} \right| = \frac{|f(n)|}{n^\sigma} \le \frac{|f(n)|}{n^{\sigma_0}}.$$
Thus, if
$$\sum_{n=1}^{\infty} \frac{|f(n)|}{n^{\sigma_0}} < \infty,$$
then the Dirichlet series $F(s)$ is absolutely convergent for all $s \in \mathbb{C}$ with $\operatorname{Re}(s) = \sigma \ge \sigma_0$. For example, the Riemann zeta function (which we study in detail later) defined by
$$\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}$$
is absolutely convergent for $\sigma > 1$.


With respect to a general Dirichlet series $F(s)$, suppose $f(n) = O(n^\delta)$ for some $\delta > 0$. Then,
$$\left| \frac{f(n)}{n^s} \right| \ll \frac{1}{n^{\sigma - \delta}}.$$
Therefore, the Dirichlet series $\sum_{n=1}^{\infty} \frac{f(n)}{n^s}$ converges absolutely for $\sigma > 1 + \delta$. Going back to some arithmetic functions we studied in Chapter 3, we can prove the following:

(1) The series $\sum_{n=1}^{\infty} \frac{\mu(n)}{n^s}$ converges absolutely for $\sigma > 1$.

(2) As in Section 3.2, let $\nu(n)$ denote the number of prime divisors of $n$. The series $\sum_{n=1}^{\infty} \frac{\nu(n)}{n^s}$ converges absolutely for $\sigma > 1$.

(3) Recall that $d(n)$ denotes the number of positive divisors of $n$. Then, the series $\sum_{n=1}^{\infty} \frac{d(n)}{n^s}$ converges absolutely for $\sigma > 1$.

(4) More generally, for a nonnegative integer $k$, recall the function $\sigma_k(n)$ defined in Example 3.3. The series $\sum_{n=1}^{\infty} \frac{\sigma_k(n)}{n^s}$ converges absolutely for $\sigma > k + 1$.

We now verify items (1)–(4) above in the following exercises.

5.1.1. Exercises.

Exercise 5.1.1.1. Prove that the series $\sum_{n=1}^{\infty} \frac{\nu(n)}{n^s}$ converges absolutely for $\sigma = \operatorname{Re}(s) > 1$. [Hint: Use Exercises 3.4.1.1 and 3.4.1.8.]

Exercise 5.1.1.2. Prove that the series $\sum_{n=1}^{\infty} \frac{d(n)}{n^s}$ converges absolutely for $\sigma = \operatorname{Re}(s) > 1$. [Hint: Use Exercise 3.4.1.4.]

Exercise 5.1.1.3. Let $k$ be a nonnegative integer. Prove that the series
$$\sum_{n=1}^{\infty} \frac{\sigma_k(n)}{n^s}$$
converges absolutely for $\sigma > k + 1$. [Hint: Use Exercise 3.4.1.7.]

5.2. Euler products and Dirichlet series

As mentioned in the introduction, one of the most recurrent themes in analytic number theory is a reinterpretation of the Fundamental Theorem of Arithmetic to simplify Dirichlet series in their domain of absolute convergence.

Let $f : \mathbb{N} \to \mathbb{C}$ be a multiplicative function such that the sum $\sum_{n=1}^{\infty} f(n)$ converges absolutely. Since each natural number can be written uniquely as a product of prime powers, one can represent this sum as the infinite product (see Exercise 5.2.1.1)
$$\prod_p \left( 1 + f(p) + f(p^2) + \cdots \right).$$
In particular, if $f$ is a multiplicative function such that
$$\sum_{n=1}^{\infty} \left| \frac{f(n)}{n^s} \right| < \infty,$$
then
$$\sum_{n=1}^{\infty} \frac{f(n)}{n^s} = \prod_p \left( 1 + \frac{f(p)}{p^s} + \frac{f(p^2)}{p^{2s}} + \cdots \right).$$
The above is known as the Euler product representation of a Dirichlet series $\sum_{n=1}^{\infty} \frac{f(n)}{n^s}$. Moreover, if $f$ is completely multiplicative, then we have
$$\sum_{n=1}^{\infty} \frac{f(n)}{n^s} = \prod_p \left( 1 + \frac{f(p)}{p^s} + \frac{f(p)^2}{p^{2s}} + \cdots + \frac{f(p)^k}{p^{ks}} + \cdots \right) = \prod_p \left( 1 - \frac{f(p)}{p^s} \right)^{-1}.$$
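As a numerical illustration of the Euler product in the simplest case $f(n) = 1$, one can compare a truncated sum $\sum_{n \le N} n^{-s}$ with a truncated product over primes $p \le P$ at $s = 2$, where the common limit is the classical value $\zeta(2) = \pi^2/6$. This is a rough sketch only: the truncation points and helper names are arbitrary choices, and agreement to a few decimal places is all that should be expected.

```python
from math import pi


def primes_up_to(n: int) -> list[int]:
    """Sieve of Eratosthenes: all primes p <= n."""
    if n < 2:
        return []
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            # Cross out i*i, i*i + i, ...; smaller multiples were handled earlier.
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(2, n + 1) if sieve[i]]


def truncated_sum(s: float, n_max: int) -> float:
    """Partial sum of the Dirichlet series for zeta(s)."""
    return sum(n ** (-s) for n in range(1, n_max + 1))


def truncated_product(s: float, p_max: int) -> float:
    """Partial Euler product over primes p <= p_max."""
    product = 1.0
    for p in primes_up_to(p_max):
        product *= 1.0 / (1.0 - p ** (-s))
    return product


if __name__ == "__main__":
    print(truncated_sum(2.0, 10_000))
    print(truncated_product(2.0, 10_000))
    print(pi ** 2 / 6)
```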

In the exercises that follow, we rigorously prove the properties of Euler products stated above. We also derive the Euler product representations of Dirichlet series associated to commonly used arithmetic functions, including those from the previous section.

5.2.1. Exercises.

Exercise 5.2.1.1. Let $f$ be a multiplicative function such that the sum $\sum_{n=1}^{\infty} f(n)$ converges absolutely. Show that
$$\sum_{n=1}^{\infty} f(n) = \prod_p \left( 1 + f(p) + f(p^2) + \cdots \right).$$

Exercise 5.2.1.2. Show that
$$\sum_{n=1}^{\infty} \frac{1}{n^s} = \prod_p \left( 1 - \frac{1}{p^s} \right)^{-1} \quad \text{for } \Re(s) > 1.$$
Deduce that $\zeta(s) \ne 0$ for $\Re(s) > 1$.


Exercise 5.2.1.3. Let $f : \mathbb{N} \to \mathbb{C}$ be a multiplicative function such that $f(n) = O(n^\delta)$ for some $\delta > 0$. Prove the following.

(a) The Dirichlet series $\sum_{n=1}^{\infty} \frac{f(n)}{n^s}$ converges for $\operatorname{Re}(s) > 1 + \delta$.

(b) If the above Dirichlet series converges absolutely for $\operatorname{Re}(s) > \sigma$, then
$$\sum_{n=1}^{\infty} \frac{f(n)}{n^s} = \prod_p \left( 1 + \frac{f(p)}{p^s} + \frac{f(p^2)}{p^{2s}} + \cdots \right), \quad \operatorname{Re}(s) > \sigma.$$

(c) If the above product converges absolutely for $\operatorname{Re}(s) > \sigma$, then so does the series.

5.3. Analytic properties of Dirichlet series

As before, let $f : \mathbb{N} \to \mathbb{C}$ be an arithmetic function. We continue the theme of deriving estimates for $\sum_{n \le x} f(n)$ by invoking complex-analytic properties of the Dirichlet series $F(s)$, defined for complex numbers $s$ for which this series converges absolutely. An important technique that we have learned so far is the partial summation formula. By applying this formula, we obtain
$$\sum_{n \le x} \frac{f(n)}{n^s} = \frac{A(x)}{x^s} + s \int_1^x \frac{A(t)}{t^{s+1}}\,dt,$$
where $A(x) = \sum_{n \le x} f(n)$. Suppose $A(x) = O(x^\delta)$ for some $\delta \ge 0$. Then, for $\Re(s) > \delta$, we have
$$\lim_{x \to \infty} \frac{|A(x)|}{|x^s|} = 0.$$
Thus,
$$\sum_{n=1}^{\infty} \frac{f(n)}{n^s} = s \int_1^{\infty} \frac{A(x)}{x^{s+1}}\,dx \tag{5.1}$$
if $A(x) = O(x^\delta)$ and $\Re(s) > \delta$. For example, since $\sum_{n \le t} 1 = [t] = O(t)$, we have
$$\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s} = s \int_1^{\infty} \frac{[t]}{t^{s+1}}\,dt, \quad \Re(s) > 1.$$
Writing $[t] = t - \{t\}$, we get
$$\zeta(s) = \frac{s}{s - 1} - s \int_1^{\infty} \frac{\{t\}}{t^{s+1}}\,dt. \tag{5.2}$$
Note that the right-hand side is an analytic function for $\Re(s) > 0$, $s \ne 1$. Thus, we have an analytic continuation of $(s - 1)\zeta(s)$ from the half-plane $\Re(s) > 1$ to $\Re(s) > 0$. By equation (5.2), we also see that
$$\lim_{s \to 1} (s - 1)\zeta(s) = 1.$$
Thus, $\zeta(s)$ has an analytic continuation to the half-plane $\Re(s) > 0$ except at $s = 1$, where it has a simple pole with residue 1.

The above observations lay the foundations for the study of the distribution of prime numbers. Using the Euler product
$$\sum_{n=1}^{\infty} \frac{1}{n^s} = \prod_p \left( 1 - \frac{1}{p^s} \right)^{-1} \quad \text{for } \Re(s) > 1,$$
one can show that $\zeta(s) \ne 0$ for $\Re(s) > 1$ (see Exercise 5.2.1.2). Using the Taylor series expansion
$$-\log(1 - x) = \sum_{k=1}^{\infty} \frac{x^k}{k}, \quad |x| < 1,$$
we derive
$$\log \zeta(s) = -\sum_p \log\left(1 - \frac{1}{p^s}\right) = \sum_p \left( \sum_{k=1}^{\infty} \frac{1}{k p^{ks}} \right).$$
Taking logarithmic derivatives,
$$\frac{\zeta'(s)}{\zeta(s)} = -\sum_p \sum_{k=1}^{\infty} \frac{k \log p}{k p^{ks}}.$$
Thus,
$$-\frac{\zeta'(s)}{\zeta(s)} = \sum_p \sum_{k=1}^{\infty} \frac{\log p}{p^{ks}}, \quad \Re(s) > 1. \tag{5.3}$$

We now learn how to derive sharp estimates for $\sum_{n \le x} f(n)$ using complex-analytic information about $F(s) = \sum_{n=1}^{\infty} f(n)/n^s$. In the next couple of sections, we will apply this technique to derive information about the distribution of prime numbers. We start with Lemma 5.1.


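Lemma 5.1. For a positive real number $x$, define
$$\delta(x) := \begin{cases} 1 & \text{if } x > 1 \\ 1/2 & \text{if } x = 1 \\ 0 & \text{if } 0 < x < 1. \end{cases}$$
For any $c, T > 0$, we have
$$\left| \frac{1}{2\pi i} \int_{c - iT}^{c + iT} \frac{x^s}{s}\,ds - \delta(x) \right| \le \begin{cases} \dfrac{x^c}{T |\log x|} & \text{if } x \ne 1 \\[2mm] \dfrac{c}{T} & \text{if } x = 1. \end{cases} \tag{5.4}$$
Thus,
$$\frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} \frac{x^s}{s}\,ds := \lim_{T \to \infty} \frac{1}{2\pi i} \int_{c - iT}^{c + iT} \frac{x^s}{s}\,ds = \delta(x). \tag{5.5}$$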
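The truncated integral in Lemma 5.1 can be approximated numerically to see the step-function behaviour of $\delta(x)$. The sketch below parametrizes $s = c + it$, so that $ds = i\,dt$, and applies a composite trapezoidal rule; the step count and the sample values of $x$, $c$, $T$ are arbitrary illustrative choices, and the quadrature error is assumed to be negligible relative to the bound (5.4).

```python
import cmath
from math import log, pi


def delta(x: float) -> float:
    """The step function delta(x) of Lemma 5.1."""
    if x > 1:
        return 1.0
    return 0.5 if x == 1 else 0.0


def truncated_perron(x: float, c: float, T: float, steps: int = 20_000) -> complex:
    """Approximate (1 / (2*pi*i)) * integral of x^s / s over [c - iT, c + iT]."""
    h = 2 * T / steps
    log_x = log(x)
    total = 0j
    for j in range(steps + 1):
        s = complex(c, -T + j * h)
        weight = 0.5 if j in (0, steps) else 1.0  # trapezoid end-point weights
        total += weight * cmath.exp(s * log_x) / s
    # With s = c + it we have ds = i dt, so (1/(2*pi*i)) * i = 1/(2*pi).
    return total * h / (2 * pi)


if __name__ == "__main__":
    c, T = 1.0, 50.0
    for x in (2.0, 1.0, 0.5):
        error = abs(truncated_perron(x, c, T) - delta(x))
        bound = c / T if x == 1 else x ** c / (T * abs(log(x)))
        print(x, error, bound)
```

Increasing $T$ shrinks the printed error roughly like $1/T$, in line with (5.4).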

Proof. Note that the function $x^s/s$ is analytic everywhere except at $s = 0$, where it has a simple pole with residue 1. We consider different cases to prove the lemma.

Case 1. Let $x > 1$. For some $u > 0$, let us consider the rectangular contour $\mathcal{C}_u$ in counterclockwise direction with vertices $c - iT$, $c + iT$, $-u + iT$, $-u - iT$. By Cauchy’s residue theorem, we have
$$\frac{1}{2\pi i} \int_{\mathcal{C}_u} \frac{x^s}{s}\,ds = \operatorname{Res}_{s=0} \frac{x^s}{s} = 1.$$
Thus,
$$\frac{1}{2\pi i} \left[ \int_{c - iT}^{c + iT} + \int_{c + iT}^{-u + iT} + \int_{-u + iT}^{-u - iT} + \int_{-u - iT}^{c - iT} \right] \frac{x^s}{s}\,ds = 1.$$
We denote
$$I_1 = \int_{c + iT}^{-u + iT} \frac{x^s}{s}\,ds, \quad I_2 = \int_{-u + iT}^{-u - iT} \frac{x^s}{s}\,ds \quad \text{and} \quad I_3 = \int_{-u - iT}^{c - iT} \frac{x^s}{s}\,ds.$$
Thus,
$$\left| \frac{1}{2\pi i} \int_{c - iT}^{c + iT} \frac{x^s}{s}\,ds - \delta(x) \right| \le |I_1 + I_2 + I_3|. \tag{5.6}$$
We now observe that
$$|I_1| = \left| \int_{c + iT}^{-u + iT} \frac{x^s}{s}\,ds \right| \le \left| \int_c^{-u} \frac{x^\sigma}{\sqrt{\sigma^2 + T^2}}\,d\sigma \right| \le \frac{1}{T} \left| \int_c^{-u} x^\sigma\,d\sigma \right| \le \frac{1}{T} \left| \frac{x^{-u}}{\log x} - \frac{x^c}{\log x} \right|. \tag{5.7}$$
Similarly,
$$|I_3| \le \frac{1}{T} \left| \frac{x^{-u}}{\log x} - \frac{x^c}{\log x} \right|. \tag{5.8}$$
We now look at the vertical integral $I_2$.
$$|I_2| = \left| \int_{-u + iT}^{-u - iT} \frac{x^s}{s}\,ds \right| \le x^{-u} \int_{-T}^{T} \frac{1}{\sqrt{u^2 + t^2}}\,dt \le \frac{x^{-u}}{u}\,2T. \tag{5.9}$$
Since $x > 1$, $x^{-u} \to 0$ as $u \to \infty$. Thus, combining equations (5.7)–(5.9) and (5.6) and taking $u \to \infty$, we have
$$\left| \frac{1}{2\pi i} \int_{c - iT}^{c + iT} \frac{x^s}{s}\,ds - \delta(x) \right| \le 2\,\frac{x^c}{T \log x}. \tag{5.10}$$

Case 2. Let $0 < x < 1$. For this case, we consider the rectangular contour $\mathcal{C}'_u$ in counterclockwise direction with vertices $u - iT$, $u + iT$, $c + iT$, $c - iT$ where $u > 0$. Following a similar method as in Case 1, we get
$$\left| \frac{1}{2\pi i} \int_{c - iT}^{c + iT} \frac{x^s}{s}\,ds - \delta(x) \right| \le 2\,\frac{x^c}{T |\log x|}. \tag{5.11}$$

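Case 3. Let $x = 1$.
$$\begin{aligned}
\frac{1}{2\pi i} \int_{c - iT}^{c + iT} \frac{1}{s}\,ds &= \frac{1}{2\pi i} \int_{-T}^{T} \frac{i}{c + it}\,dt \\
&= \frac{1}{2\pi i} \int_{-T}^{T} \frac{i(c - it)}{c^2 + t^2}\,dt \\
&= \frac{1}{2\pi} \int_{-T}^{T} \frac{c}{c^2 + t^2}\,dt + \frac{1}{2\pi i} \int_{-T}^{T} \frac{t}{c^2 + t^2}\,dt \\
&= \frac{1}{\pi} \int_0^{T} \frac{c}{c^2 + t^2}\,dt \\
&= \frac{1}{\pi} \arctan\left(\frac{T}{c}\right) \\
&= \frac{1}{2} - \frac{1}{\pi} \arctan\left(\frac{c}{T}\right).
\end{aligned} \tag{5.12}$$
Here the second integral in the third line vanishes because its integrand is odd. Thus,
$$\left| \frac{1}{2\pi i} \int_{c - iT}^{c + iT} \frac{1}{s}\,ds - \frac{1}{2} \right| \le \left| \arctan\left(\frac{c}{T}\right) \right| \ll \frac{c}{T}.$$
Equations (5.10)–(5.12) are sufficient to prove (5.4). Taking $T \to \infty$, we get equation (5.5). This concludes the proof of Lemma 5.1. □

Lemma 5.1 can be used to study $\sum_{n \le x} f(n)$ as follows:

Theorem 5.2 (Perron’s formula). Let
$$F(s) = \sum_{n=1}^{\infty} \frac{f(n)}{n^s}$$
be absolutely convergent in the half-plane $\Re(s) > \sigma_0$ and let $c > \sigma_0$. For a positive real number $x \notin \mathbb{N}$ and for a positive real number $T$,
$$\sum_{n \le x} f(n) = \frac{1}{2\pi i} \int_{c - iT}^{c + iT} F(s)\,\frac{x^s}{s}\,ds + O\left( \frac{x^c}{T} \sum_{n=1}^{\infty} \frac{|f(n)|}{n^c\,|\log(x/n)|} \right).$$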


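Proof. Since $F(s)$ is absolutely convergent for $\Re(s) = c$, we have
$$\begin{aligned}
\frac{1}{2\pi i} \int_{c - iT}^{c + iT} \left( \sum_{n=1}^{\infty} \frac{f(n)}{n^s} \right) \frac{x^s}{s}\,ds &= \sum_{n=1}^{\infty} f(n) \left( \frac{1}{2\pi i} \int_{c - iT}^{c + iT} \frac{(x/n)^s}{s}\,ds \right) \\
&= \sum_{n=1}^{\infty} f(n) \left( \delta\left(\frac{x}{n}\right) + O\left( \frac{(x/n)^c}{T |\log(x/n)|} \right) \right) \\
&= \sum_{n \le x} f(n) + O\left( \frac{x^c}{T} \sum_{n=1}^{\infty} \frac{|f(n)|}{n^c\,|\log(x/n)|} \right),
\end{aligned}$$
where the last two equalities follow from Lemma 5.1. Note here that, since $x$ is not an integer, $n \ne x$ for any $n$. □

The use of Perron’s formula to derive estimates for $\sum_{n \le x} f(n)$ is called the method of contour integration.

5.3.1. Exercises.

Exercise 5.3.1.1. For $n \in \mathbb{N}$, we define the von Mangoldt function $\Lambda$ by
$$\Lambda(n) := \begin{cases} \log p & \text{if } n = p^k,\ k \ge 1 \\ 0 & \text{otherwise} \end{cases}$$
and
$$\psi(x) = \sum_{n \le x} \Lambda(n).$$
Show that for $\Re(s) > 1$,
$$-\frac{\zeta'(s)}{\zeta(s)} = s \int_1^{\infty} \frac{\psi(x)}{x^{s+1}}\,dx.$$

Exercise 5.3.1.2. Prove Lemma 5.1 for the case when $0 < x < 1$.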

5.4. Distribution functions for prime numbers

The goal of this section is to introduce some arithmetic functions which help us to study the distribution of prime numbers. We obtain estimates for these functions with the help of the analytic properties of $\zeta(s)$.

We start with the basic prime counting function
$$\pi(x) = \sum_{\substack{p \le x \\ p \text{ prime}}} 1 = \#\{p \le x,\ p \text{ prime}\}. \tag{5.13}$$
One of the most important theorems in number theory is the prime number theorem, which gives an asymptotic estimate for $\pi(x)$ as $x \to \infty$.

Theorem 5.3 (Prime number theorem).
$$\pi(x) \sim \frac{x}{\log x} \quad \text{as } x \to \infty.$$

The prime counting function can be studied with the help of the following “weighted” sums defined by Hans Carl Friedrich von Mangoldt. As before, we define
$$\Lambda(n) = \begin{cases} \log p & \text{if } n = p^k,\ k \ge 1 \\ 0 & \text{otherwise} \end{cases}$$
and
$$T(n) = \begin{cases} \log p & \text{if } n = p \\ 0 & \text{otherwise.} \end{cases}$$
We then consider partial sums of the above functions:
$$\psi(x) := \sum_{n \le x} \Lambda(n) \tag{5.14}$$
and
$$\theta(x) := \sum_{n \le x} T(n). \tag{5.15}$$
By suitable applications of the partial summation formula, one can prove the following equivalent statements of the prime number theorem.

Theorem 5.4. The following statements are equivalent.
(a) $\pi(x) \sim x/\log x$ as $x \to \infty$.
(b) $\theta(x) \sim x$ as $x \to \infty$.
(c) $\psi(x) \sim x$ as $x \to \infty$.

Proof. The proof of this theorem is left as an exercise (see Exercise 5.4.3.1). □
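The three functions $\pi(x)$, $\theta(x)$ and $\psi(x)$ are easy to tabulate for moderate $x$, which gives a feel for the ratios appearing in Theorem 5.4. The following is a small sketch using a sieve; the function names are illustrative assumptions, and the printed ratios are numerical evidence only, not a proof of anything.

```python
from math import log


def primes_up_to(n: int) -> list[int]:
    """Sieve of Eratosthenes: all primes p <= n."""
    if n < 2:
        return []
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(2, n + 1) if sieve[i]]


def prime_pi(x: int) -> int:
    """pi(x): the number of primes p <= x."""
    return len(primes_up_to(x))


def theta(x: int) -> float:
    """theta(x): the sum of log p over primes p <= x."""
    return sum(log(p) for p in primes_up_to(x))


def psi(x: int) -> float:
    """psi(x): the sum of Lambda(n) for n <= x, i.e. log p for each prime power p^k <= x."""
    total = 0.0
    for p in primes_up_to(x):
        prime_power = p
        while prime_power <= x:
            total += log(p)
            prime_power *= p
    return total


if __name__ == "__main__":
    for x in (10 ** 3, 10 ** 4, 10 ** 5):
        print(x, prime_pi(x) * log(x) / x, theta(x) / x, psi(x) / x)
```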


In this section, we prove the prime number theorem by using the method of contour integration. In fact, we prove the following stronger theorem.

Theorem 5.5. There exists a constant $D > 0$ such that
$$\psi(x) = x + O\left( x \exp\left(-D (\log x)^{1/10}\right) \right).$$

This leads to Corollary 5.6.

Corollary 5.6. For any $C > 0$,
$$\psi(x) = x + O\left( \frac{x}{(\log x)^C} \right).$$
Here, the implied constant depends only on $C$.

Proof. The proof is left as an exercise to the reader. □

Remark 5.7. Note that Theorem 5.5 immediately implies the prime number theorem. It can be viewed as an effective version of the prime number theorem, that is, a version with sharp estimates for $|\psi(x) - x|$.

The proof of Theorem 5.5 has the following key ideas.

• We first use Perron’s formula to reduce the problem of estimating $\sum_{n \le x} \Lambda(n)$ to the integral
$$\frac{1}{2\pi i} \int_{c - iT}^{c + iT} \left( \frac{-\zeta'}{\zeta}(s) \right) \frac{x^s}{s}\,ds$$
for $c > 1$ along with a residual error term. This is made precise in Proposition 5.8.

• The aforesaid integral is evaluated by shifting the line of integration from $c > 1$ to the left of the line $\Re(s) = 1$. This requires us to specify a region to the left of $\Re(s) = 1$ such that $-\zeta'/\zeta(s)$ is analytic in this region, except at the point $s = 1$. That is, we need a region around $\Re(s) = 1$ which does not contain any zeros of $\zeta(s)$. One of the fundamental observations of Riemann was that the sharpness of the error term in the estimate $|\psi(x) - x|$ depends on how wide a zero-free region we can determine for $\zeta(s)$ in the strip $0 < \Re(s) < 1$. This theme is described in several textbooks (see [16], [43], [53], [54]). We also refer the reader to [39, Chapter 5] for a discussion of how different analytic methods have been brought to bear upon the estimation of this error term. For our immediate purposes, we will outline the “classical” zero-free region that will be sufficient for proving Theorem 5.5. This is described in Section 5.4.1 leading up to a proof of Theorem 5.15.

• In Section 5.4.2, we estimate the residual difference
$$\left| \psi(x) - \frac{1}{2\pi i} \int_{c - iT}^{c + iT} \left( \frac{-\zeta'}{\zeta}(s) \right) \frac{x^s}{s}\,ds \right|.$$
This leads to Theorem 5.16.

• We then combine all the above estimates together to obtain Theorem 5.5. As mentioned in Remark 5.7, this immediately implies Theorem 5.3. In fact, Theorem 5.5 can be viewed as a stronger version of the prime number theorem with explicit error terms.

We now explain the above key steps in detail. Firstly, applying Perron’s formula to $f(n) = \Lambda(n)$, we get Proposition 5.8.

Proposition 5.8. For positive real numbers $x$, $c$ and $T$ with $c > 1$ and $x \notin \mathbb{N}$,
$$\sum_{n \le x} \Lambda(n) = \frac{1}{2\pi i} \int_{c - iT}^{c + iT} \left( \frac{-\zeta'}{\zeta}(s) \right) \frac{x^s}{s}\,ds + O\left( \frac{x^c}{T} \sum_{n=1}^{\infty} \frac{|\Lambda(n)|}{n^c\,|\log(x/n)|} \right).$$

Proof. The proof follows immediately from equation (5.3) and Theorem 5.2. □

In the next few sections, we deliberate on the integral
$$\frac{1}{2\pi i} \int_{c - iT}^{c + iT} \left( \frac{-\zeta'}{\zeta}(s) \right) \frac{x^s}{s}\,ds.$$

5.4.1. Bounds and zero-free regions for the Riemann zeta function. We start with some observations on $\zeta(s)$ when $\Re(s) > 0$.

Proposition 5.9. Let $s = \sigma + it \in \mathbb{C}$ with $\sigma > 0$ and $t \ne 0$. Then, for any $N \in \mathbb{N}$,
$$|\zeta(s)| \le \sum_{n=1}^{N} \frac{1}{n^\sigma} + \frac{N^{1-\sigma}}{|t|} + \frac{|s|}{\sigma N^\sigma}.$$


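Proof. For $N \in \mathbb{N}$, define the arithmetic function $f(n) = 0$ when $n \le N$ and $f(n) = 1$ when $n > N$. By equation (5.1), for $\Re(s) > 1$, we have
$$\zeta(s) - \sum_{n=1}^{N} \frac{1}{n^s} = \sum_{n=1}^{\infty} \frac{f(n)}{n^s} = s \int_1^{\infty} \frac{A(x)}{x^{s+1}}\,dx,$$
where
$$A(x) = \begin{cases} 0 & \text{if } x < N \\ [x] - N & \text{if } x \ge N. \end{cases}$$
Thus,
$$\begin{aligned}
\zeta(s) - \sum_{n=1}^{N} \frac{1}{n^s} &= s \int_N^{\infty} \frac{[x] - N}{x^{s+1}}\,dx \\
&= s \int_N^{\infty} \frac{x - \{x\}}{x^{s+1}}\,dx - sN \int_N^{\infty} \frac{1}{x^{s+1}}\,dx \\
&= -\frac{sN^{1-s}}{1 - s} - N^{1-s} - s \int_N^{\infty} \frac{\{x\}}{x^{s+1}}\,dx \\
&= -\frac{N^{1-s}}{1 - s} - s \int_N^{\infty} \frac{\{x\}}{x^{s+1}}\,dx.
\end{aligned} \tag{5.16}$$
We observe that for any $\epsilon > 0$, the integral
$$\int_N^{\infty} \frac{\{x\}}{x^{s+1}}\,dx$$
converges uniformly for $\Re(s) \ge \epsilon$. Therefore, this integral defines an analytic function in $s$ for $\Re(s) > 0$. That is, by equation (5.16),
$$\zeta(s) - \sum_{n=1}^{N} \frac{1}{n^s}$$
can be analytically continued to $\Re(s) > 0$ except for a simple pole at $s = 1$ with residue 1. Thus, the equation
$$\zeta(s) = \sum_{n=1}^{N} \frac{1}{n^s} - \frac{N^{1-s}}{1 - s} - s \int_N^{\infty} \frac{\{x\}}{x^{s+1}}\,dx \tag{5.17}$$
holds for $\Re(s) > 0$, $s \ne 1$. Now, let $t = \Im(s) \ne 0$. Then,
$$\left| \frac{-N^{1-s}}{1 - s} \right| \le \frac{N^{1-\sigma}}{|t|}$$
and
$$\left| -s \int_N^{\infty} \frac{\{x\}}{x^{s+1}}\,dx \right| \le |s| \left| \int_N^{\infty} \frac{1}{x^{\sigma+1}}\,dx \right| = \frac{|s|}{\sigma N^\sigma}.$$ □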

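This proves Proposition 5.9. We deduce Corollary 5.10 from the proof of Proposition 5.9.

Corollary 5.10. Let $N \in \mathbb{N}$. Then, for $\sigma > 0$ and $t \ne 0$,
$$|\zeta(s)| \le \frac{N^{1-\sigma}}{1 - \sigma} + \frac{N^{1-\sigma}}{|t|} + \left( 1 + \frac{|t|}{\sigma} \right) N^{-\sigma}.$$

Proof. Since $1/x^\sigma$ is a monotonically decreasing function on $[2, N]$, we have
$$1 + \sum_{n=2}^{N} \frac{1}{n^\sigma} \le 1 + \int_1^{N} \frac{1}{x^\sigma}\,dx \le 1 + \frac{N^{1-\sigma} - 1}{1 - \sigma} \le \frac{N^{1-\sigma}}{1 - \sigma}.$$
We also observe that
$$\frac{|s|}{\sigma N^\sigma} \le \frac{\sigma + |t|}{\sigma N^\sigma} \le \left( 1 + \frac{|t|}{\sigma} \right) \frac{1}{N^\sigma}.$$
By Proposition 5.9 and the above inequalities, it follows that
$$\begin{aligned}
|\zeta(s)| &\le \sum_{n=1}^{N} \frac{1}{n^\sigma} + \frac{N^{1-\sigma}}{|t|} + \frac{|s|}{\sigma N^\sigma} \\
&\le \frac{N^{1-\sigma}}{1 - \sigma} + \frac{N^{1-\sigma}}{|t|} + \left( 1 + \frac{|t|}{\sigma} \right) \frac{1}{N^\sigma}.
\end{aligned}$$ □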

We are now ready to prove the following upper bounds for $|\zeta(s)|$ and $|\zeta'(s)|$ for $\Re(s) = \sigma$ sufficiently close to 1.

Proposition 5.11. Let $s = \sigma + it$ with $|t| \ge 2$ and
$$\sigma \ge 1 - \frac{1}{4 \log |t|}.$$
There exist absolute, positive constants $A$ and $B$ such that
$$|\zeta(s)| \le A \log |t| \quad \text{and} \quad |\zeta'(s)| \le B \log^2 |t|.$$

Proof. We sketch a proof and relegate the details to be worked out in Exercises 5.4.3.3–5.4.3.9. Let $1/2 \le \sigma_0 < 1$ and $|t| \ge 2$. The inequality
$$|\zeta(s)| \le 4\,\frac{|t|^{1-\sigma_0}}{1 - \sigma_0}, \quad \sigma \ge \sigma_0 \tag{5.18}$$
follows by taking $N = [|t|]$ in Corollary 5.10 (see Exercise 5.4.3.3). In particular, if $|t| \ge 2$, then
$$1 - \frac{1}{4 \log |t|} \ge \frac{1}{2}.$$
Applying the inequality (5.18) with $\sigma_0 = 1 - \frac{1}{4 \log |t|}$, we get
$$|\zeta(s)| \le 16 e^{1/4} \log |t|.$$
We now take derivatives of terms in equation (5.17). We obtain, for $\Re(s) > 0$, $s \ne 1$,
$$\zeta'(s) = -\sum_{n=1}^{N} \frac{\log n}{n^s} + s \int_N^{\infty} \frac{x - [x]}{x^{s+1}} \log x\,dx - \int_N^{\infty} \frac{x - [x]}{x^{s+1}}\,dx - \frac{N^{1-s} \log N}{s - 1} - \frac{N^{1-s}}{(s - 1)^2}. \tag{5.19}$$
Deriving inequalities in a manner very similar to the proofs of Proposition 5.9 and Corollary 5.10, we obtain, for $N \in \mathbb{N}$ and $\sigma \ge \sigma_0 > 0$,
$$|\zeta'(s)| \le 3 \log N \left( \frac{N^{1-\sigma_0}}{1 - \sigma_0} + \left( 1 + \frac{|t|}{\sigma_0} \right) N^{-\sigma_0} + \frac{N^{1-\sigma_0}}{|t|} \right).$$
We now take $N = [|t|]$. Thus, $\log N \le \log |t|$. For $1/2 \le \sigma_0 < 1$, we immediately derive, akin to Exercise 5.4.3.4,
$$|\zeta'(s)| \le 3 \log |t| \left( 4\,\frac{|t|^{1-\sigma_0}}{1 - \sigma_0} \right) \quad \text{for all } \sigma \ge \sigma_0.$$
Finally, for $\sigma_0 = 1 - \frac{1}{4 \log |t|}$, we get
$$|\zeta'(s)| \le 48 e^{1/4} (\log |t|)^2.$$ □

Proposition 5.12. $\zeta(s) \ne 0$ for $\Re(s) \ge 1$.


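Proof. By the Euler product representation for $\zeta(s)$, we see that $\zeta(s) \ne 0$ when $\Re(s) > 1$ (see Exercise 5.2.1.2). We also have, for $\Re(s) = \sigma > 1$,
$$\begin{aligned}
\log \zeta(s) &= \sum_p \sum_{k=1}^{\infty} \frac{1}{k p^{ks}} \\
&= \sum_p \sum_{k=1}^{\infty} \frac{1}{k p^{k\sigma} p^{ikt}} \\
&= \sum_{n=2}^{\infty} \frac{\Lambda(n)}{n^\sigma \log n} \left( \cos(t \log n) - i \sin(t \log n) \right).
\end{aligned}$$
Observe that for $\sigma > 1$,
$$3 \log \zeta(\sigma) + 4 \log \zeta(\sigma + it) + \log \zeta(\sigma + 2it) = \sum_{n=2}^{\infty} \frac{\Lambda(n)}{n^\sigma \log n} \left( 3 + 4 e^{-it \log n} + e^{-2it \log n} \right).$$
Note that for any $\theta \in \mathbb{R}$,
$$3 + 4 \cos \theta + \cos 2\theta = 2(1 + \cos \theta)^2 \ge 0.$$
Thus,
$$\Re\left( 3 \log \zeta(\sigma) + 4 \log \zeta(\sigma + it) + \log \zeta(\sigma + 2it) \right) = \sum_{n=2}^{\infty} \frac{\Lambda(n)}{n^\sigma \log n} \left( 3 + 4 \cos(t \log n) + \cos(2t \log n) \right) \ge 0$$
and therefore, for any $\sigma > 1$ and $t \in \mathbb{R}$,
$$|\zeta(\sigma)^3\,\zeta(\sigma + it)^4\,\zeta(\sigma + 2it)| \ge 1. \tag{5.20}$$
Suppose $\zeta(s)$ has a zero of order $m$ at $s = 1 + it$, $t \ne 0$. Then,
$$\lim_{\sigma \to 1^+} \frac{\zeta(\sigma + it)}{(\sigma + it - 1 - it)^m} = c \ne 0.$$
We also recall that
$$\lim_{\sigma \to 1^+} (\sigma - 1)\zeta(\sigma) = 1.$$
By equation (5.20),
$$\left| (\sigma - 1)^3 \zeta(\sigma)^3\,\frac{\zeta(\sigma + it)^4}{(\sigma - 1)^{4m}}\,\zeta(\sigma + 2it) \right| \ge (\sigma - 1)^{3 - 4m}. \tag{5.21}$$
We now have
$$\lim_{\sigma \to 1^+} \left| (\sigma - 1)^3 \zeta(\sigma)^3\,\frac{\zeta(\sigma + it)^4}{(\sigma - 1)^{4m}}\,\zeta(\sigma + 2it) \right| = |c^4 \zeta(1 + 2it)|.$$
On the other hand,
$$\lim_{\sigma \to 1^+} (\sigma - 1)^{3 - 4m} = \infty \quad \text{if } m \ge 1.$$
This contradicts equation (5.21) for $m \ge 1$. Therefore, $m = 0$ and $\zeta(s)$ cannot have a zero at $s = 1 + it$, $t \ne 0$. □

Theorem 5.13. There exist $A_1, A_2 > 0$ such that $\zeta(s) \ne 0$ and
$$\left| \frac{1}{\zeta(s)} \right| \le A_2 (\log |t|)^7$$
when $|t| \ge 2$, $\sigma \ge 1 - \dfrac{A_1}{(\log |t|)^9}$.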

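Proof. Note that for $\sigma \ge 2$,
$$|\zeta(s)| \ge 1 - \left| \sum_{n=2}^{\infty} \frac{1}{n^s} \right| \ge 1 - \sum_{n=2}^{\infty} \frac{1}{n^2} \ge 2 - \zeta(2).$$
Thus, $|1/\zeta(s)|$ is uniformly bounded above by a positive constant when $\sigma \ge 2$. We now consider the domain
$$\mathcal{A} := \left\{ 1 + \frac{A_1}{(\log |t|)^9} \le \sigma < 2,\ |t| \ge 2 \right\}, \tag{5.22}$$
where $A_1$ is a positive constant to be appropriately chosen later. Note that
$$\lim_{\sigma \to 1^+} (\sigma - 1)\zeta(\sigma) = 1.$$
Thus, there exists a positive constant $c$ such that
$$|\zeta(\sigma)| \le \frac{c}{\sigma - 1} \quad \text{for } 1 < \sigma < 2. \tag{5.23}$$
By Proposition 5.11,
$$|\zeta(\sigma + 2it)| \le 2A \log |t| \quad \text{for } s \in \mathcal{A}. \tag{5.24}$$
In the above, $A$ is as in Proposition 5.11. Combining equations (5.20), (5.23) and (5.24), we have, for $s \in \mathcal{A}$,
$$|\zeta(\sigma + it)| \ge \left( \frac{A_1}{c} \right)^{3/4} \left( \frac{1}{2A} \right)^{1/4} \left( \frac{1}{(\log |t|)^7} \right). \tag{5.25}$$
We denote
$$A_4 = \left( \frac{A_1}{c} \right)^{3/4} \left( \frac{1}{2A} \right)^{1/4}.$$
We also denote
$$\sigma_1 = 1 - \frac{A_1}{(\log |t|)^9}, \quad \sigma_2 = 1 + \frac{A_1}{(\log |t|)^9}.$$
We now choose $A_1$ to be a positive constant such that
$$A_1 < \min\left( \frac{(\log 2)^8}{4},\ \frac{1}{(2B)^4 c^3 (2A)} \right).$$
Since $A_1 < (\log 2)^8/4$, we have
$$|t| \ge 2 \ge e^{(4A_1)^{1/8}}$$
and therefore,
$$1 - \frac{A_1}{(\log |t|)^9} \ge 1 - \frac{1}{4 \log |t|}.$$
We observe that
$$\zeta(\sigma_2 + it) - \zeta(\sigma + it) = \int_\sigma^{\sigma_2} \zeta'(x + it)\,dx.$$
Thus, by Proposition 5.11, for $\sigma_1 \le \sigma \le \sigma_2$,
$$|\zeta(\sigma_2 + it) - \zeta(\sigma + it)| \le (\sigma_2 - \sigma_1) \max_{x \in [\sigma_1, \sigma_2]} |\zeta'(x + it)| \le \frac{2A_1 B (\log |t|)^2}{(\log |t|)^9}.$$
Therefore, for $\sigma_1 \le \sigma \le \sigma_2$,
$$\begin{aligned}
|\zeta(\sigma + it)| &\ge |\zeta(\sigma_2 + it)| - |\zeta(\sigma_2 + it) - \zeta(\sigma + it)| \\
&\ge |\zeta(\sigma_2 + it)| - \frac{2A_1 B}{(\log |t|)^7} \\
&\ge \frac{A_4 - 2A_1 B}{(\log |t|)^7}.
\end{aligned} \tag{5.26}$$
Since $A_1 < \dfrac{1}{(2B)^4 c^3 (2A)}$, we have
$$A_4 - 2A_1 B = \left( \frac{A_1}{c} \right)^{3/4} \left( \frac{1}{2A} \right)^{1/4} - 2A_1 B > 0.$$
Combining the inequalities (5.25) and (5.26), we can show that there exists an absolute constant $A_2 > 0$ such that
$$\left| \frac{1}{\zeta(s)} \right| \le A_2 (\log |t|)^7$$
when
$$\sigma \ge 1 - \frac{A_1}{(\log |t|)^9}.$$
In fact,
$$A_2 = \frac{1}{A_4 - 2A_1 B}.$$
This proves Theorem 5.13. □

Theorem 5.14. There exist positive constants $A'_1$, $A_3$ and $D$ such that

(a)
$$\left| \frac{\zeta'(s)}{\zeta(s)} \right| \le A_3 (\log |t|)^9$$
when $|t| \ge 2$ and $\sigma \ge 1 - \dfrac{A'_1}{(\log |t|)^9}$.

(b)
$$\left| (s - 1)\,\frac{\zeta'(s)}{\zeta(s)} \right| \le D$$
when $|t| \le 2$ and $1 - A'_1 \le \sigma \le 2$.

Proof. Let $A_1$, $A_2$ and $B$ be as in Proposition 5.11 and Theorem 5.13. Using these results, we have
$$\left| \frac{\zeta'(s)}{\zeta(s)} \right| \le A_2 B (\log |t|)^9$$
when
$$|t| \ge 2 \quad \text{and} \quad \sigma \ge 1 - \frac{A_1}{(\log |t|)^9}.$$
We denote $A_3 = A_2 B$. Note that
$$(s - 1)\,\frac{\zeta'(s)}{\zeta(s)}$$
is analytic in the compact set $\{1 \le \sigma \le 2,\ |t| \le 2\}$.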


Due to compactness of the above set, we can move the region of analyticity of the above function slightly to the left of $\sigma = 1$. Thus, there exists a small positive constant $A_0 < 1/2$ such that $(s - 1)\zeta'(s)/\zeta(s)$ is analytic in the set $\{1 - A_0 \le \sigma \le 2, \ |t| \le 2\}$ and therefore uniformly bounded by a positive constant $D$ in this set. We now choose a positive constant $A_1'$ small enough such that
$$1 - \frac{A_1'}{(\log |t|)^9} \ge \max\left\{ 1 - A_0, \ 1 - \frac{A_1}{(\log |t|)^9} \right\}.$$
Theorem 5.14 immediately follows from Theorem 5.13 and the above considerations. □

Theorem 5.15. Let $T$ be a positive real number such that $T \ge 2$. Let
$$b = 1 - \frac{A_1'}{(\log T)^9}, \qquad c = 1 + \frac{A_1'}{(\log T)^9}$$
with $A_1'$ as chosen in Theorem 5.14. Then,
$$\frac{1}{2\pi i} \int_{c - iT}^{c + iT} \frac{-\zeta'(s)}{\zeta(s)} \frac{x^s}{s}\,ds = x + O\left( \frac{x^c}{T} \right) + O\left( x^b (\log T)^{10} \right).$$

Proof. We consider a rectangular contour $\mathcal{R}$ in the counter-clockwise direction joining $c - iT$, $c + iT$, $b + iT$ and $b - iT$. The function $-\zeta'(s)/\zeta(s)$ is analytic at all points in the rectangular region enclosed by $\mathcal{R}$ except for a simple pole at $s = 1$ with residue 1. By an application of Cauchy's residue theorem,
$$\frac{1}{2\pi i} \int_{\mathcal{R}} \frac{-\zeta'(s)}{\zeta(s)} \frac{x^s}{s}\,ds = \operatorname{Res}_{s=1} \frac{-\zeta'(s)}{\zeta(s)} \frac{x^s}{s} = x.$$
Thus,
$$\frac{1}{2\pi i} \int_{c - iT}^{c + iT} \frac{-\zeta'(s)}{\zeta(s)} \frac{x^s}{s}\,ds = x + \sum_{i=1}^{3} I_i, \tag{5.27}$$
where
$$I_1 = \int_{c - iT}^{b - iT} \frac{-\zeta'(s)}{\zeta(s)} \frac{x^s}{s}\,ds, \qquad I_2 = \int_{b - iT}^{b + iT} \frac{-\zeta'(s)}{\zeta(s)} \frac{x^s}{s}\,ds$$
and
$$I_3 = \int_{b + iT}^{c + iT} \frac{-\zeta'(s)}{\zeta(s)} \frac{x^s}{s}\,ds.$$

To estimate $I_1$, we first observe that for $s \in [c - iT, b - iT]$,
$$\left| \frac{x^s}{s} \right| \le \frac{x^c}{T}.$$
Since $T \ge 2$ and
$$\sigma \ge 1 - \frac{A_1'}{(\log |t|)^9},$$
we apply Theorem 5.14 (a) to obtain
$$|I_1| \le \frac{x^c}{T} (c - b) A_3 (\log T)^9 \ll \frac{x^c}{T}.$$
Similarly, we obtain
$$|I_3| \ll \frac{x^c}{T}.$$

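We estimate $I_2$ as follows:
$$\int_{b - iT}^{b + iT} \frac{-\zeta'(s)}{\zeta(s)} \frac{x^s}{s}\,ds = \left( \int_{b - iT}^{b - 2i} + \int_{b - 2i}^{b + 2i} + \int_{b + 2i}^{b + iT} \right) \frac{-\zeta'(s)}{\zeta(s)} \frac{x^s}{s}\,ds.$$
Applying Theorem 5.14 (a) again, we obtain
$$\begin{aligned}
\int_{b + 2i}^{b + iT} \frac{-\zeta'(s)}{\zeta(s)} \frac{x^s}{s}\,ds &= x^b \int_{2}^{T} \frac{-\zeta'(b + iy)}{\zeta(b + iy)} \frac{x^{iy}}{b + iy}\, i\,dy \\
&\ll x^b \int_{2}^{T} \frac{(\log y)^9}{y}\,dy \ll x^b (\log T)^{10}.
\end{aligned}$$
The integral
$$\int_{b - iT}^{b - 2i} \frac{-\zeta'(s)}{\zeta(s)} \frac{x^s}{s}\,ds$$
can be estimated similarly. By Theorem 5.14 (b), we obtain
$$\left| \int_{b - 2i}^{b + 2i} \frac{-\zeta'(s)}{\zeta(s)} \frac{x^s}{s}\,ds \right| \ll \frac{x^b}{b} \ll x^b.$$
Combining these, we get
$$I_2 = \int_{b - iT}^{b + iT} \frac{-\zeta'(s)}{\zeta(s)} \frac{x^s}{s}\,ds \ll x^b (\log T)^{10}$$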


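and therefore, by putting all the above information in equation (5.27),
$$\begin{aligned}
\frac{1}{2\pi i} \int_{c - iT}^{c + iT} \frac{-\zeta'(s)}{\zeta(s)} \frac{x^s}{s}\,ds &= x + \sum_{i=1}^{3} I_i \\
&= x + O\left( \frac{x^c}{T} \right) + O\left( x^b (\log T)^{10} \right).
\end{aligned}$$
All the implied constants in the above inequalities are absolute.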



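5.4.2. Error terms in the prime number theorem. In this section, we obtain an estimate for
$$\psi(x) - \frac{1}{2\pi i} \int_{c - iT}^{c + iT} \frac{-\zeta'(s)}{\zeta(s)} \frac{x^s}{s}\,ds$$
and use it to prove the prime number theorem.

Theorem 5.16. Let $x \in \mathbb{N} + 1/2$. With $c$ as in Theorem 5.15, we have
$$\frac{x^c}{T} \sum_{n=1}^{\infty} \frac{|\Lambda(n)|}{n^c\, |\log(x/n)|} = O\left( \frac{x^c (\log T)^9}{T} + \frac{x (\log x)^2}{T} \right).$$

Proof. We note that if $n < x/2$ or $n > 3x/2$, then $|\log(x/n)| > \log(3/2)$ and therefore,
$$\begin{aligned}
\frac{x^c}{T} \sum_{n < x/2} \frac{|\Lambda(n)|}{n^c\, |\log(x/n)|} + \frac{x^c}{T} \sum_{n > 3x/2} \frac{|\Lambda(n)|}{n^c\, |\log(x/n)|}
&\ll \frac{x^c}{T} \sum_{n=1}^{\infty} \frac{\Lambda(n)}{n^c} \\
&= \frac{x^c}{T} \left| \frac{-\zeta'(c)}{\zeta(c)} \right| \\
&\ll \frac{x^c}{T |c - 1|} \ll \frac{x^c (\log T)^9}{T},
\end{aligned} \tag{5.28}$$
where the inequality
$$\left| \frac{-\zeta'(c)}{\zeta(c)} \right| \ll \frac{1}{|c - 1|}$$
follows from the fact that
$$\lim_{\sigma \to 1} (\sigma - 1) \left( \frac{-\zeta'(\sigma)}{\zeta(\sigma)} \right) = 1.$$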


To estimate the sum
$$\frac{x^c}{T} \sum_{x/2 < n < 3x/2} \frac{|\Lambda(n)|}{n^c\, |\log(x/n)|}$$
… 1.

Let $q = p_1^{a_1} p_2^{a_2} \cdots p_k^{a_k}$ be the unique factorization of $q$ into prime powers. By a generalization of Exercise 4.4.1.6, we see that
$$(\mathbb{Z}/q\mathbb{Z})^* \simeq (\mathbb{Z}/p_1^{a_1}\mathbb{Z})^* \times (\mathbb{Z}/p_2^{a_2}\mathbb{Z})^* \times \cdots \times (\mathbb{Z}/p_k^{a_k}\mathbb{Z})^*.$$
Therefore, it would be natural to express characters mod $q$ in terms of characters mod $p_i^{a_i}$. This is done with the help of Proposition 5.22, which is fundamental.

Proposition 5.22. Let $k_1$ and $k_2$ be coprime, positive integers. Let $\chi$ be a character mod $k_1 k_2$. Then, there exist characters $\chi_1$ (mod $k_1$) and $\chi_2$ (mod $k_2$) such that $\chi(n) = \chi_1(n)\chi_2(n)$ for every $(n, k_1 k_2) = 1$.

Proof. We start with the assertion that given any integer $n$, we can find integers $n_1$ and $n_2$ such that
$$\begin{aligned}
n &\equiv n_1 \pmod{k_1}, & n_1 &\equiv 1 \pmod{k_2}, \\
n &\equiv n_2 \pmod{k_2}, & n_2 &\equiv 1 \pmod{k_1}.
\end{aligned} \tag{5.30}$$
To see this, we observe (see Exercise 2.3.1.5) that since $(k_1, k_2) = 1$, we can find integers $l_1$ and $l_2$ such that $l_1 k_1 + l_2 k_2 = n - 1$. We now define
$$n_1 = n - l_1 k_1, \qquad n_2 = n - l_2 k_2.$$
We see immediately that $n_1$ and $n_2$ satisfy equation (5.30). For a character $\chi$ (mod $k_1 k_2$), we now define characters $\chi_1$ (mod $k_1$) and $\chi_2$ (mod $k_2$) as follows:
$$\chi_1(n) := \chi(n_1), \qquad \chi_2(n) := \chi(n_2).$$
By equation (5.30), $n \equiv n_1 n_2 \pmod{k_1}$ and $n \equiv n_1 n_2 \pmod{k_2}$. Since $(k_1, k_2) = 1$, we have $n \equiv n_1 n_2 \pmod{k_1 k_2}$. Thus,
$$\chi(n) = \chi(n_1)\chi(n_2) = \chi_1(n)\chi_2(n).$$
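The construction of $n_1$ and $n_2$ in the proof of Proposition 5.22 can be carried out explicitly. The following is a minimal sketch, assuming the Bézout coefficients are obtained from the extended Euclidean algorithm; the function names `extended_gcd` and `split_residue` are illustrative and do not come from the text.

```python
def extended_gcd(a, b):
    """Return (g, u, v) with u*a + v*b == g == gcd(a, b), for a, b >= 0."""
    if b == 0:
        return a, 1, 0
    g, u, v = extended_gcd(b, a % b)
    # g = u*b + v*(a % b) = v*a + (u - (a // b)*v)*b
    return g, v, u - (a // b) * v


def split_residue(n, k1, k2):
    """Return (n1, n2) satisfying the four congruences of (5.30).

    Hypothetical helper: it solves l1*k1 + l2*k2 = n - 1 by scaling
    a Bezout identity for the coprime pair (k1, k2).
    """
    g, u, v = extended_gcd(k1, k2)
    if g != 1:
        raise ValueError("k1 and k2 must be coprime")
    l1 = u * (n - 1)
    l2 = v * (n - 1)
    n1 = n - l1 * k1   # n1 = 1 + l2*k2
    n2 = n - l2 * k2   # n2 = 1 + l1*k1
    return n1, n2


if __name__ == "__main__":
    n, k1, k2 = 7, 4, 9
    n1, n2 = split_residue(n, k1, k2)
    print(n1 % k1 == n % k1, n1 % k2 == 1, n2 % k2 == n % k2, n2 % k1 == 1)
    print((n1 * n2 - n) % (k1 * k2) == 0)
```

The last check mirrors the final step of the proof: $n \equiv n_1 n_2 \pmod{k_1 k_2}$.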



From Proposition 5.22, we immediately deduce Theorem 5.23.

Theorem 5.23. Let $q = p_1^{a_1} p_2^{a_2} \cdots p_k^{a_k}$ be the unique factorization of $q$ into prime powers. Let $\chi$ be a character mod $q$. Then, there exist characters $\chi_i$ (mod $p_i^{a_i}$) such that
$$\chi(n) = \prod_{i=1}^{k} \chi_i(n).$$
Conversely, given characters $\chi_i$ (mod $p_i^{a_i}$), the function $\chi : (\mathbb{Z}/q\mathbb{Z})^* \to \mathbb{C}^*$ defined by
$$\chi(n) := \prod_{i=1}^{k} \chi_i(n)$$
is a character mod $q$.

Proof. The proof follows from properties of characters as well as Proposition 5.22. It is left as an exercise (see Exercise 5.6.1.2). □

Theorem 5.23 has the following important corollary.

Corollary 5.24. Let $q > 1$. If $n \not\equiv 1 \pmod{q}$, then there exists a character $\psi \in G(q)$ such that $\psi(n) \ne 1$.

Proof. Let $q = p_1^{a_1} p_2^{a_2} \cdots p_k^{a_k}$ be the unique factorization of $q$ into prime powers. If $n \not\equiv 1 \pmod{q}$, then there exists $1 \le i \le k$ such that
$$n \not\equiv 1 \pmod{p_i^{a_i}}.$$
By the discussion in the proofs of Lemmas 5.20 and 5.21, we see that there exists a character $\chi_i$ (mod $p_i^{a_i}$) such that $\chi_i(n) \ne 1$. Now, for $j \ne i$, define $\chi_j$ (mod $p_j^{a_j}$) to be the principal character mod $p_j^{a_j}$. Then, for the character $\psi$ (mod $q$) defined as
$$\psi := \prod_{j=1}^{k} \chi_j,$$
we note that $\psi(n) \ne 1$. □

We also deduce the following corollary to Theorem 5.23, which enables us to generalize Lemma 5.20 to the case when $(\mathbb{Z}/q\mathbb{Z})^*$ is not a cyclic group.

5.6. Dirichlet characters and Dirichlet 𝐿-functions


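Corollary 5.25. For any $q \ge 1$, there are $\phi(q)$ distinct Dirichlet characters mod $q$. That is, $|G(q)| = \phi(q)$.

Proof. We deduce from Theorem 5.23 that there is a one-to-one correspondence between $G(q)$ and $G(p_1^{a_1}) \times G(p_2^{a_2}) \times \cdots \times G(p_k^{a_k})$. By Lemma 5.21,
$$|G(p_i^{a_i})| = \phi(p_i^{a_i}).$$
Thus,
$$|G(q)| = \prod_{i=1}^{k} |G(p_i^{a_i})| = \prod_{i=1}^{k} \phi(p_i^{a_i}) = \phi(q).$$
□

The characters mod $q$ satisfy the following orthogonality relations.

Proposition 5.26. (a) For a character $\chi \in G(q)$,
$$\sum_{a \in (\mathbb{Z}/q\mathbb{Z})^*} \chi(a) = \begin{cases} \phi(q) & \text{if } \chi = \chi_0, \\ 0 & \text{if } \chi \ne \chi_0. \end{cases}$$

(b) For $a \in (\mathbb{Z}/q\mathbb{Z})^*$,
$$\sum_{\chi \in G(q)} \chi(a) = \begin{cases} \phi(q) & \text{if } a \equiv 1 \pmod{q}, \\ 0 & \text{otherwise.} \end{cases}$$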
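Both orthogonality relations of Proposition 5.26 can be checked numerically in the simplest case. The sketch below assumes that $q = p$ is an odd prime, so that $(\mathbb{Z}/p\mathbb{Z})^*$ is cyclic and every character is determined by its value on a primitive root; the helper names are illustrative and not taken from the text.

```python
import cmath


def primitive_root(p):
    """Smallest primitive root modulo an odd prime p (brute force)."""
    for g in range(2, p):
        seen = set()
        x = 1
        for _ in range(p - 1):
            x = x * g % p
            seen.add(x)
        if len(seen) == p - 1:
            return g
    raise ValueError("no primitive root found; p must be an odd prime")


def characters_mod_prime(p):
    """Return the p - 1 characters mod p as dicts {a: chi(a)}, 1 <= a <= p - 1.

    The character with number m sends g**k to exp(2*pi*i*m*k/(p - 1)),
    where g is a primitive root; number 0 is the principal character.
    """
    g = primitive_root(p)
    index = {}
    x = 1
    for k in range(p - 1):
        index[x] = k          # x == g**k (mod p)
        x = x * g % p
    chars = []
    for m in range(p - 1):
        chi = {a: cmath.exp(2j * cmath.pi * m * index[a] / (p - 1))
               for a in index}
        chars.append(chi)
    return chars


def sum_over_residues(chi):
    """Left-hand side of Proposition 5.26 (a) for a fixed character."""
    return sum(chi.values())


def sum_over_characters(chars, a):
    """Left-hand side of Proposition 5.26 (b) for a fixed residue a."""
    return sum(chi[a] for chi in chars)


if __name__ == "__main__":
    chars = characters_mod_prime(7)
    print([round(abs(sum_over_residues(chi)), 6) for chi in chars])
    print([round(abs(sum_over_characters(chars, a)), 6) for a in range(1, 7)])
```

Up to floating-point rounding, only the principal character in (a) and the residue $a \equiv 1$ in (b) give a nonzero sum.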

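Proof. (a) The equality for $\chi = \chi_0$ follows immediately since $\chi_0(a) = 1$ for all $a \in (\mathbb{Z}/q\mathbb{Z})^*$. If $\chi \ne \chi_0$, then there exists $b \in (\mathbb{Z}/q\mathbb{Z})^*$ such that $\chi(b) \ne 1$. Note that if $a$ runs over all coprime residue classes mod $q$ and $(b, q) = 1$, then $ab$ mod $q$ also runs over all coprime classes mod $q$. By the multiplicativity of $\chi$,
$$\sum_{a \in (\mathbb{Z}/q\mathbb{Z})^*} \chi(a) = \sum_{a \in (\mathbb{Z}/q\mathbb{Z})^*} \chi(ab) = \chi(b) \sum_{a \in (\mathbb{Z}/q\mathbb{Z})^*} \chi(a).$$
Since $\chi(b) \ne 1$, we get
$$\sum_{a \in (\mathbb{Z}/q\mathbb{Z})^*} \chi(a) = 0.$$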

(b) Note that for $a \equiv 1 \pmod{q}$, $\chi(a) = 1$ for any $\chi \in G(q)$. Thus, by Corollary 5.25,
$$\sum_{\chi \in G(q)} \chi(a) = |G(q)| = \phi(q).$$
If $a \not\equiv 1 \pmod{q}$, then, by Corollary 5.24, there exists a character $\psi \in G(q)$ such that $\psi(a) \ne 1$. Moreover, by Exercise 5.6.1.4, $\{\psi\chi : \chi \in G(q)\} = G(q)$. Thus,
$$\sum_{\chi \in G(q)} \chi(a) = \sum_{\chi \in G(q)} \psi\chi(a) = \psi(a) \sum_{\chi \in G(q)} \chi(a).$$
Since $\psi(a) \ne 1$, we get
$$\sum_{\chi \in G(q)} \chi(a) = 0.$$
□

We observe that since $|\chi(n)| \le 1$ for all $n \in \mathbb{N}$, the series
$$\sum_{n=1}^{\infty} \frac{\chi(n)}{n^s}$$
is absolutely convergent for $\sigma > 1$. We also note here that if $\chi \ne \chi_0$, then by Proposition 5.26 (a),
$$\Big| \sum_{n \le x} \chi(n) \Big| \le q.$$
Thus, by equation (5.1), the series for $L(s, \chi)$ converges when $\sigma > 0$.

Definition 5.27. Let $q$ be a positive integer and $\chi$ be a Dirichlet character mod $q$. The infinite series
$$L(s, \chi) := \sum_{n=1}^{\infty} \frac{\chi(n)}{n^s}, \qquad \Re(s) = \sigma > 1$$
is referred to as a Dirichlet $L$-function corresponding to $\chi$.

It is easily checked that $\chi$ mod $q$ is a completely multiplicative function (see Exercise 5.6.1.3). Using the Euler product representation for $L(s, \chi)$, we have
$$L(s, \chi) = \prod_{p} \left( 1 - \frac{\chi(p)}{p^s} \right)^{-1}, \qquad \sigma > 1. \tag{5.31}$$
This leads to Lemma 5.28.

Lemma 5.28. Let $q$ be a positive integer and $\chi$ be a Dirichlet character mod $q$. Then, for $\sigma > 1$,
$$\frac{L'(s, \chi)}{L(s, \chi)} = -\sum_{n=1}^{\infty} \frac{\Lambda(n)\chi(n)}{n^s}.$$

Proof. For $\sigma > 1$, the product on the right-hand side of (5.31) converges absolutely, since $|\chi(p)/p^s| < 1$. Thus,
$$\log L(s, \chi) = -\sum_{p} \log\left( 1 - \frac{\chi(p)}{p^s} \right).$$
Using the identity $-\log(1 - z) = \sum_{k=1}^{\infty} \frac{z^k}{k}$ for $|z| < 1$, we have
$$\log L(s, \chi) = \sum_{p} \sum_{k=1}^{\infty} \frac{\chi(p)^k}{k p^{ks}}.$$
By the complete multiplicativity of $\chi$,
$$\log L(s, \chi) = \sum_{p} \sum_{k=1}^{\infty} \frac{\chi(p^k)}{k p^{ks}}.$$
Again, the right-hand side is absolutely convergent and therefore analytic for $\sigma > 1$. Differentiating term by term, and recalling that $\Lambda(p^k) = \log p$, we have
$$\frac{L'(s, \chi)}{L(s, \chi)} = -\sum_{p} \sum_{k=1}^{\infty} \frac{\chi(p^k) \log p}{p^{ks}} = -\sum_{n=1}^{\infty} \frac{\Lambda(n)\chi(n)}{n^s}.$$
This proves Lemma 5.28. □
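The Euler product (5.31) can be compared with the defining series numerically. The sketch below is only an illustration under stated assumptions: it takes $\chi$ to be the nonprincipal character mod 4 and $s = 2$, where the value of $L(2, \chi)$ is Catalan's constant $0.9159655941\ldots$; the truncation limits and function names are arbitrary choices, not part of the text.

```python
def chi4(n):
    """The nonprincipal Dirichlet character mod 4."""
    r = n % 4
    if r == 1:
        return 1
    if r == 3:
        return -1
    return 0


def primes_up_to(limit):
    """Sieve of Eratosthenes: list of primes <= limit (limit >= 2)."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            # strike out multiples of i starting at i*i
            sieve[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    return [i for i in range(limit + 1) if sieve[i]]


def l_series(s, terms):
    """Partial sum of the Dirichlet series of L(s, chi4)."""
    return sum(chi4(n) / n ** s for n in range(1, terms + 1))


def l_euler_product(s, prime_limit):
    """Partial Euler product of L(s, chi4) over primes <= prime_limit."""
    prod = 1.0
    for p in primes_up_to(prime_limit):
        prod *= 1.0 / (1.0 - chi4(p) / p ** s)
    return prod


if __name__ == "__main__":
    print(l_series(2.0, 100000))
    print(l_euler_product(2.0, 10000))
```

The two truncations agree to several decimal places, as expected from the absolute convergence of both sides for $\sigma > 1$.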

We now explain the connection between the distribution functions for primes in arithmetic progressions defined in Section 5.5 and Dirichlet $L$-functions. Let $a$ and $q$ be positive integers such that $(a, q) = 1$. For a Dirichlet character $\chi$ mod $q$, we define
$$\psi(x, \chi) := \sum_{n \le x} \chi(n)\Lambda(n).$$
By Proposition 5.26 (b), we have,
$$\sum_{\chi \in G(q)} \chi^{-1}(a)\chi(n) = \begin{cases} \phi(q) & \text{if } n \equiv a \pmod{q}, \\ 0 & \text{otherwise.} \end{cases}$$
Multiplying both sides by $\Lambda(n)$ and summing over $n \le x$,
$$\sum_{n \le x} \Big( \sum_{\chi \in G(q)} \chi^{-1}(a)\Lambda(n)\chi(n) \Big) = \sum_{\chi \in G(q)} \chi^{-1}(a)\psi(x, \chi) = \phi(q)\psi(x, q, a).$$
We also have, for $\sigma > 1$,
$$\begin{aligned}
\sum_{\substack{n=1 \\ n \equiv a \ (\mathrm{mod}\ q)}}^{\infty} \frac{\Lambda(n)}{n^s} &= \frac{1}{\phi(q)} \sum_{n=1}^{\infty} \frac{\Lambda(n)}{n^s} \Big( \sum_{\chi \in G(q)} \chi^{-1}(a)\chi(n) \Big) \\
&= \frac{-1}{\phi(q)} \sum_{\chi \in G(q)} \chi^{-1}(a) \frac{L'(s, \chi)}{L(s, \chi)}.
\end{aligned} \tag{5.32}$$
If $\chi = \chi_0$, then using the properties of Dirichlet series described in Section 5.3, we see that $L(s, \chi)$ can be analytically continued to the half-plane $\Re(s) > 0$ except for a simple pole at $s = 1$ with residue $\phi(q)/q$ (see Exercise 5.6.1.6). Thus, $L'/L(s, \chi_0)$ has a simple pole at $s = 1$ with residue $-1$. Just as the prime number theorem was derived by applying Perron's formula to $-\zeta'/\zeta(s)$, the Siegel–Walfisz theorem can be obtained by applying Perron's formula to each component on the right-hand side of equation (5.32),
$$\frac{-1}{\phi(q)} \chi^{-1}(a) \frac{L'(s, \chi)}{L(s, \chi)}.$$
This requires us to evaluate integrals
$$\frac{1}{2\pi i} \int_{c - iT}^{c + iT} \left( -\frac{L'}{L}(s, \chi) \right) \frac{x^s}{s}\,ds$$
in the spirit of Section 5.4. In fact, just as in Section 5.4, this requires a careful study of zero-free regions for $L(s, \chi)$ in the strip $0 < \Re(s) < 1$. All the details of these calculations will take us well beyond the scope of this textbook. Therefore, we briefly mention key steps and refer the interested reader to sources such as [16, Chapter 20] and [53, Chapter 11] for further details.

A Dirichlet character $\chi \in G(q)$ is said to be a quadratic character if the order of $\chi$ in $G(q)$ is 2. That is, a quadratic character only takes values $-1, 0, 1$ and it must take the value $-1$ at least once. For example, if $q$ is an odd prime, then define
$$\chi(n) := \begin{cases} 0, & \text{if } n \equiv 0 \pmod{q}, \\ 1, & \text{if } x^2 \equiv n \pmod{q} \text{ has a solution}, \\ -1, & \text{if } x^2 \equiv n \pmod{q} \text{ does not have a solution.} \end{cases}$$
Then $\chi$ is a quadratic character mod $q$. Using Theorem 5.23, one can also construct quadratic characters for odd, positive integers $q$.

If $\chi$ is a nonquadratic character mod $q$, then one derives an analogue of Theorem 5.14 for $L'/L(s, \chi)$ by showing that there exists an absolute constant $A > 0$ such that for any nonquadratic character $\chi$, $L(s, \chi)$ does not vanish in the region
$$R_{q,A} := \left\{ s \in \mathbb{C} : \sigma > 1 - \frac{A}{\log q|t|} \right\}.$$
On the other hand, the assertion about the nonvanishing of $L(s, \chi)$ in $R_{q,A}$ cannot be made if $\chi$ is a quadratic character. In this case, what we do know is that $L(s, \chi)$ has at most one zero $\beta$ in $R_{q,A}$, called the exceptional zero (sometimes called the Siegel zero or more accurately the Landau–Siegel zero). Moreover, $\beta$ is real and $0 < \beta < 1$. So, in the case of a quadratic character, in order to derive a suitable analogue of Theorem 5.14, we have to perform an even more delicate analysis, keeping in mind the existence of a possible exceptional zero of $L(s, \chi)$. Finally, combining the information from the analogues of Theorem 5.14 for all $\chi \in G(q)$ in equation (5.32), we derive Theorem 5.17.

5.6.1. Exercises.

Exercise 5.6.1.1. Let $q = p_1^{a_1} p_2^{a_2} \cdots p_k^{a_k}$ be the unique factorization of $q$ into prime powers. Show that
$$(\mathbb{Z}/q\mathbb{Z})^* \simeq (\mathbb{Z}/p_1^{a_1}\mathbb{Z})^* \times (\mathbb{Z}/p_2^{a_2}\mathbb{Z})^* \times \cdots \times (\mathbb{Z}/p_k^{a_k}\mathbb{Z})^*.$$
[Hint: Generalize Exercise 4.4.1.6.]

Exercise 5.6.1.2. Prove Theorem 5.23.

Exercise 5.6.1.3. Show that a Dirichlet character $\chi$ mod $q$ is a completely multiplicative function.


Exercise 5.6.1.4. Show that for any $q \ge 1$ and $\psi \in G(q)$, $\{\psi\chi : \chi \in G(q)\} = G(q)$.

Exercise 5.6.1.5. For any positive integer $q$ and a Dirichlet character $\chi$ (mod $q$), show that
$$\sum_{n \le x} \chi(n) = \begin{cases} \dfrac{\phi(q)}{q}\, x + O(q) & \text{if } \chi = \chi_0, \\ O(q) & \text{if } \chi \ne \chi_0. \end{cases}$$

Exercise 5.6.1.6. For any positive integer $q$ and a Dirichlet character $\chi$ (mod $q$), show that if $\chi \ne \chi_0$, then $L(s, \chi)$ converges for $\sigma > 0$. If $\chi = \chi_0$, show that $L(s, \chi_0)$ is analytic at all points with $\Re(s) > 0$ except for a simple pole at $s = 1$ with residue $\phi(q)/q$.

Exercise 5.6.1.7. For any positive integer $q$ and a Dirichlet character $\chi$ (mod $q$), show that
$$L(s, \chi) = \prod_{p} \left( 1 - \frac{\chi(p)}{p^s} \right)^{-1}, \qquad \sigma > 1.$$
Further, if $\chi = \chi_0$, show that
$$L(s, \chi_0) = \zeta(s) \prod_{p | q} \left( 1 - \frac{1}{p^s} \right), \qquad \sigma > 1.$$

5.7. Ramanujan sums and Ramanujan series

We conclude this chapter by introducing an important class of sums, called the Ramanujan sums. Let $e(\alpha)$ denote the complex number $e^{2\pi i \alpha}$. Let $q$ be a positive integer and $a$ be an integer which is coprime to $q$. Then,
$$\left\{ e\left( \frac{ra}{q} \right) : 1 \le r \le q \right\}$$
consists of all the $q$th roots of unity. It is easy to see that, for $q > 1$, the geometric sum
$$\sum_{r=1}^{q} e\left( \frac{ra}{q} \right) = 0.$$
We start with Lemma 5.29.

Lemma 5.29. Let $a$ and $q$ be coprime positive integers. Then,
$$\sum_{\substack{r=1 \\ (r,q)=1}}^{q} e\left( \frac{ra}{q} \right) = \mu(q).$$


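Proof. Applying Exercise 3.2.1.2, we have
$$\begin{aligned}
\sum_{\substack{r=1 \\ (r,q)=1}}^{q} e\left( \frac{ra}{q} \right) &= \sum_{r=1}^{q} e\left( \frac{ra}{q} \right) \sum_{d | (r,q)} \mu(d) \\
&= \sum_{d | q} \mu(d) \Bigg( \sum_{\substack{r=1 \\ d | r}}^{q} e\left( \frac{ra}{q} \right) \Bigg) \\
&= \sum_{d | q} \mu(d) \Bigg( \sum_{r_1=1}^{q/d} e\left( \frac{d r_1 a}{q} \right) \Bigg) \\
&= \sum_{d | q} \mu(d) \Bigg( \sum_{r_1=1}^{q/d} e\left( \frac{r_1 a}{q/d} \right) \Bigg) \\
&= \sum_{d | q} \mu(d) \left( \begin{cases} 0 & \text{if } \frac{q}{d} \nmid a, \\ \frac{q}{d} & \text{if } \frac{q}{d} \mid a \end{cases} \right).
\end{aligned}$$
Since $(a, q) = 1$, the inner sum will take a nonzero value only when $q = d$. This proves the lemma. □

Remark 5.30. The sum
$$c_q(a) := \sum_{\substack{r=1 \\ (r,q)=1}}^{q} e\left( \frac{ra}{q} \right)$$
is an important sum in number theory and is called the Ramanujan sum. More generally, if $a$ and $q$ are integers not necessarily coprime, then Ramanujan proved that (see Exercise 5.7.1.1)
$$\sum_{\substack{r=1 \\ (r,q)=1}}^{q} e\left( \frac{ra}{q} \right) = \sum_{d | (a,q)} \mu\left( \frac{q}{d} \right) d. \tag{5.33}$$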
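Formula (5.33) is easy to test against the definition of $c_q(a)$. The following sketch computes the Ramanujan sum both ways; the function names are illustrative, and rounding the floating-point exponential sum to the nearest integer is an assumption that is safe only for small $q$.

```python
import cmath
from math import gcd


def mobius(n):
    """Moebius function mu(n) by trial division, for n >= 1."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0      # p**2 divides the original n
            result = -result
        p += 1
    if n > 1:
        result = -result      # one remaining prime factor
    return result


def ramanujan_sum_definition(q, a):
    """c_q(a) as the sum of e(ra/q) over 1 <= r <= q with (r, q) = 1."""
    total = sum(cmath.exp(2j * cmath.pi * r * a / q)
                for r in range(1, q + 1) if gcd(r, q) == 1)
    return round(total.real)


def ramanujan_sum_formula(q, a):
    """c_q(a) via (5.33): sum of mu(q/d) * d over d dividing gcd(a, q)."""
    g = gcd(a, q)
    return sum(mobius(q // d) * d for d in range(1, g + 1) if g % d == 0)


if __name__ == "__main__":
    for q in (1, 4, 6, 12):
        print(q, [ramanujan_sum_formula(q, a) for a in range(1, q + 1)])
```

When $(a, q) = 1$ the only divisor in (5.33) is $d = 1$, so the formula reduces to $\mu(q)$, in agreement with Lemma 5.29.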

In his paper [62], Ramanujan conceived the idea of a “Fourier-like” expansion of an arithmetical function. He showed that many arithmetical functions $f(n)$ admit an expansion of the form
$$f(n) = \sum_{q=1}^{\infty} \hat{f}(q) c_q(n),$$
for appropriate complex numbers $\hat{f}(q)$. In analogy with Fourier series, we can call these expansions “Ramanujan series.” For example, he showed
$$d(n) = -\sum_{q=1}^{\infty} \frac{\log q}{q}\, c_q(n).$$
For $\Re(s) > 0$, we have (see Exercise 5.7.1.4 below)
$$\sigma_s(n) = n^s \zeta(s + 1) \sum_{q=1}^{\infty} \frac{c_q(n)}{q^{s+1}}.$$
We now state the following orthogonality principle (see Exercise 5.7.1.6).

Principle 5.31 (An orthogonality principle for Ramanujan sums).
$$\lim_{x \to \infty} \frac{1}{x} \sum_{n \le x} c_r(n) c_s(n + h) = \begin{cases} c_r(h) & \text{if } r = s, \\ 0 & \text{otherwise.} \end{cases}$$
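The orthogonality principle can be checked exactly for small moduli. Since $c_r(n)$ is periodic in $n$ with period $r$, the product $c_r(n) c_s(n + h)$ is periodic with period dividing $rs$, so the limiting mean value equals the average over one block of $rs$ consecutive integers. The sketch below uses this observation; the function names, and the use of formula (5.33) to evaluate $c_q(n)$, are choices made for the illustration.

```python
from fractions import Fraction
from math import gcd


def mobius(n):
    """Moebius function mu(n) by trial division, for n >= 1."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    if n > 1:
        result = -result
    return result


def c(q, n):
    """Ramanujan sum c_q(n), evaluated through formula (5.33)."""
    g = gcd(n, q)
    return sum(mobius(q // d) * d for d in range(1, g + 1) if g % d == 0)


def mean_value(r, s, h):
    """Exact mean of c_r(n) * c_s(n + h) over one full period of length r*s."""
    period = r * s
    total = sum(c(r, n) * c(s, n + h) for n in range(1, period + 1))
    return Fraction(total, period)


if __name__ == "__main__":
    for r, s, h in [(6, 6, 2), (6, 4, 2), (5, 5, 0), (3, 9, 1)]:
        print(r, s, h, mean_value(r, s, h), c(r, h))
```

For $r = s$ the mean equals $c_r(h)$, and for $h = 0$ it equals $c_r(0) = \phi(r)$, which is the normalisation that appears in the heuristic formula for $\hat{f}(r)$.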

The orthogonality principle for Ramanujan sums allows one to (heuristically) write down Ramanujan expansions for many arithmetical functions. Just as in the classical theory of Fourier series, it is another matter to show that the resulting series actually converges to the arithmetic function. Indeed, if
$$f(n) = \sum_{q=1}^{\infty} \hat{f}(q) c_q(n),$$
then multiplying both sides of this equation by $c_r(n)$ and averaging over $n \le x$, one would expect that
$$\hat{f}(r)\phi(r) = \lim_{x \to \infty} \frac{1}{x} \sum_{n \le x} f(n) c_r(n).$$
The orthogonality principle also suggests that if $f(n)$ has a Ramanujan expansion, then so does any shift $f(n + h)$ and one would expect that
$$f(n + h) = \sum_{q=1}^{\infty} \frac{c_q(h)}{\phi(q)}\, \hat{f}(q) c_q(n).$$
Of course, these are all heuristic results and the question of determining valid Ramanujan expansions of arithmetical functions (provided they exist) is a delicate one that has been researched well. The simplest result in this context is a theorem of Delange [19] that states the following: Suppose that
$$f(n) = \sum_{d | n} g(d)$$
and that
$$\sum_{n=1}^{\infty} 2^{\omega(n)} \frac{|g(n)|}{n} < \infty,$$
where $\omega(n)$ is the number of distinct prime factors of $n$. Then, $f$ admits a Ramanujan expansion
$$\sum_{q=1}^{\infty} \hat{f}(q) c_q(n),$$
with
$$\hat{f}(q) = \sum_{m=1}^{\infty} \frac{g(qm)}{qm}.$$
The reader can find further discussion in the survey by Lutz G. Lucht [49].

5.7.1. Exercises.

Exercise 5.7.1.1. For integers $a$ and $q$ with $q \ge 1$, show that
$$c_q(a) = \sum_{\substack{r=1 \\ (r,q)=1}}^{q} e\left( \frac{ra}{q} \right) = \sum_{d | (a,q)} \mu\left( \frac{q}{d} \right) d.$$

Exercise 5.7.1.2. Show that
$$c_q(a) = \sum_{\substack{r=1 \\ (r,q)=1}}^{q} \cos\left( \frac{2\pi r a}{q} \right).$$

Exercise 5.7.1.3. Let $\varepsilon_d(n)$ equal $d$ if $d | n$ and zero otherwise. Show that
$$c_q(n) = \sum_{d | q} \varepsilon_d(n) \mu(q/d),$$
and deduce via Möbius inversion that
$$\varepsilon_q(n) = \sum_{d | q} c_d(n).$$

Exercise 5.7.1.4. Noting that
$$\frac{\sigma_s(n)}{n^s} = \sum_{d=1}^{\infty} \frac{\varepsilon_d(n)}{d^{s+1}},$$
show that
$$\sigma_s(n) = n^s \zeta(s + 1) \sum_{q=1}^{\infty} \frac{c_q(n)}{q^{s+1}}.$$

Exercise 5.7.1.5. Show that for $q_1, q_2$ coprime, we have
$$c_{q_1}(n) c_{q_2}(n) = c_{q_1 q_2}(n).$$
In other words, for fixed $n$, $c_q(n)$ is a multiplicative function of $q$.

Exercise 5.7.1.6. If $(a, r) = 1$ and $(b, s) = 1$, show that
$$\frac{a}{r} + \frac{b}{s}$$
is not an integer unless $r = s$. Using this, deduce that
$$\lim_{x \to \infty} \frac{1}{x} \sum_{n \le x} c_r(n) c_s(n + h) = c_r(h).$$

Exercise 5.7.1.7. Show that for $\Re(s) > 1$,
$$\sum_{q=1}^{\infty} \frac{\mu(q) c_q(n)}{q^{s-1}\phi(q)} = \prod_{p | n} \left( 1 - \frac{1}{p^{s-1}} \right) \prod_{p \nmid n} \left( 1 + \frac{1}{p^{s-1}(p - 1)} \right).$$

Exercise 5.7.1.8. Show that
$$\sum_{q=1}^{\infty} \frac{\mu(q) c_q(n)}{q^{s-1}\phi(q)} = \Bigg( \prod_{p | n} \frac{(p - 1)(p^{s-1} - 1)}{p^s - p^{s-1} + 1} \Bigg) \zeta(s) h(s),$$
where $h(s)$ is analytic for $\Re(s) > 1$ and nonvanishing there.

Exercise 5.7.1.9. Using Exercise 5.7.1.8, show that
$$\sum_{q=1}^{\infty} \frac{\mu(q) c_q(n)}{q^{s-1}\phi(q)}$$
is zero at $s = 1$ if $n$ is not a prime power and is equal to
$$\frac{(p - 1) \log p}{p}$$
if $n$ is a power of the prime $p$.

Exercise 5.7.1.10. Using Exercise 5.7.1.9, derive the Ramanujan series expansion of $\Lambda(n)$ by showing that
$$\frac{\phi(n)}{n} \Lambda(n) = \sum_{q=1}^{\infty} \frac{\mu(q) c_q(n)}{\phi(q)}.$$

H. Gopalkrishna Gadiyar and Ramanathan Padma [25] observed that one can use Exercise 5.7.1.10 to derive a conjectural formula for the number of twin primes. Their idea is to notice (ignoring convergence issues) that
$$\sum_{n \le x} \frac{\phi(n)\phi(n + 2)}{n(n + 2)} \Lambda(n)\Lambda(n + 2) = \sum_{q=1}^{\infty} \sum_{q'=1}^{\infty} \frac{\mu(q)\mu(q')}{\phi(q)\phi(q')} \sum_{n \le x} c_q(n) c_{q'}(n + 2).$$
Using Exercise 5.7.1.6 (the orthogonality principle for Ramanujan sums), the innermost sum is asymptotic to $c_q(2)x$ when $q = q'$ and $o(x)$ otherwise. Thus, the expected main term is
$$x \sum_{q=1}^{\infty} \frac{\mu^2(q)}{\phi(q)^2}\, c_q(2).$$
Expanding the sum into an infinite product gives precisely the Hardy–Littlewood heuristic conjecture for the number of twin primes, which states that
$$\pi_2(x) \sim \Bigg( \sum_{q=1}^{\infty} \frac{\mu^2(q)}{\phi(q)^2}\, c_q(2) \Bigg) \frac{x}{\log^2 x}.$$
Here, $\pi_2(x)$ denotes the number of primes $p$ up to $x$ such that $p + 2$ is also prime.
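The constant in the twin prime heuristic can be evaluated numerically. Since $c_q(2)$ is multiplicative in $q$ (Exercise 5.7.1.5) and $\mu^2(q)$ restricts the sum to squarefree $q$, the series factors as $\prod_p \left(1 + c_p(2)/(p-1)^2\right)$, with $c_2(2) = 1$ and $c_p(2) = -1$ for odd primes $p$. The sketch below evaluates a truncation of this product; the cut-off and the function names are arbitrary choices for the illustration.

```python
def primes_up_to(limit):
    """Sieve of Eratosthenes: list of primes <= limit (limit >= 2)."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    return [i for i in range(limit + 1) if sieve[i]]


def twin_prime_series_constant(prime_limit):
    """Truncated Euler product 2 * prod_{2 < p <= limit} (1 - 1/(p - 1)**2).

    The factor 2 is the contribution of p = 2, namely 1 + c_2(2)/phi(2)**2.
    """
    prod = 2.0
    for p in primes_up_to(prime_limit):
        if p > 2:
            prod *= 1.0 - 1.0 / (p - 1) ** 2
    return prod


if __name__ == "__main__":
    print(twin_prime_series_constant(100000))
```

The truncated product is close to $1.3203236$, which is twice the usual twin prime constant $0.6601618\ldots$.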

Chapter 6

An introduction to Waring’s problem

As we mentioned in the introduction, Waring, in his book Meditationes Algebraicae, states that every nonnegative integer is the sum of four squares, nine cubes, nineteen fourth powers and so on. While most of this book is dedicated to studying the last part of his statement, namely the phrase “. . . and so on”, we start by proving the first part of his statement. The fact that every positive integer is a sum of at most four squares is a well-known theorem in number theory and was proved by Lagrange in 1770, around the same time that Waring wrote Meditationes. The classical proof of the four square theorem is elementary and puts together fundamental properties of congruence arithmetic and a technique called the method of descent attributed to Fermat. We present this proof in Sections 6.1 and 6.2 of this chapter. We also discuss Waring’s problem for biquadrates, some conjectures related to Waring’s problem and an easier variant in Sections 6.3 and 6.4.

6.1. Fermat’s two square theorem

In 1770, Lagrange proved the following theorem:

Theorem 6.1. Every natural number can be written as a sum of at most four squares of natural numbers.

We will prove this in the next section. In this section, we study numbers that can be written as a sum of two squares. With some minor modifications, the method of classifying such numbers allows us to prove Lagrange’s theorem. We begin with the elementary matrix identity
$$\begin{bmatrix} a & -b \\ b & a \end{bmatrix} \begin{bmatrix} c & -d \\ d & c \end{bmatrix} = \begin{bmatrix} ac - bd & -(ad + bc) \\ ad + bc & ac - bd \end{bmatrix}.$$
Taking determinants, we obtain
$$(a^2 + b^2)(c^2 + d^2) = (ac - bd)^2 + (ad + bc)^2. \tag{6.1}$$
This shows that if $m$ and $n$ can be written as a sum of two squares, so can their product. Thus, we are led to the question of classifying primes $p$ which can be written as a sum of two squares. For $p = 2$, this is clear: $2 = 1^2 + 1^2$. What about odd primes? We now use Fermat’s “method of descent” to prove Theorem 6.2.

Theorem 6.2. A prime $p > 2$ can be written as a sum of two squares if and only if $p \equiv 1 \pmod 4$.

Proof. If $p \equiv 3 \pmod 4$, then it is impossible to write $p$ as a sum of two squares. Indeed, the square of any integer is congruent to 0 or 1 (mod 4). Thus, sums of two squares are congruent to either 0, 1 or 2 mod 4.

On the other hand, by Theorem 4.14, we know that for a prime $p \equiv 1 \pmod 4$, the equation
$$x^2 \equiv -1 \pmod p$$
has a solution. That is, we can find an integer $1 \le x \le p - 1$ and $m \in \mathbb{N}$ such that $pm = x^2 + 1$. We observe that
$$pm = x^2 + 1 \le (p - 1)^2 + 1 < p^2,$$
and therefore $m < p$. We now consider the set
$$F = \{m \in \mathbb{N} : m < p, \ pm \text{ is a sum of two squares}\}.$$
By the observation made above, the set $F$ is nonempty. Let $m_0$ be the smallest element of $F$. We claim that $m_0 = 1$.

Let us assume that $m_0 > 1$. We write $pm_0 = x_0^2 + y_0^2$. By Exercise 4.1.1.5, we can choose $x_1$ and $y_1$ such that
$$x_1 \equiv x_0 \pmod{m_0}, \qquad y_1 \equiv y_0 \pmod{m_0}$$
with $|x_1| \le m_0/2$ and $|y_1| \le m_0/2$. We have
$$x_1^2 + y_1^2 \equiv x_0^2 + y_0^2 \equiv 0 \pmod{m_0}.$$
We observe here that it is not possible for both $x_1$ and $y_1$ to be zero. If $x_1 = y_1 = 0$, then we would get that $m_0$ divides each of $x_0$ and $y_0$. Thus, $m_0^2$ would divide $x_0^2 + y_0^2 = pm_0$ and therefore, $m_0 \mid p$, which is not possible as $1 < m_0 < p$. Thus, we have a positive integer $m_1$ such that
$$m_0 m_1 = x_1^2 + y_1^2$$
with $m_1 \le m_0/2$ (since $x_1^2 + y_1^2 \le m_0^2/2$). We write
$$(pm_0)(m_0 m_1) = (x_0^2 + y_0^2)(x_1^2 + y_1^2) = (x_0 x_1 + y_0 y_1)^2 + (x_1 y_0 - x_0 y_1)^2. \tag{6.2}$$
Since
$$x_0 x_1 + y_0 y_1 \equiv x_0^2 + y_0^2 \equiv 0 \pmod{m_0} \quad \text{and} \quad x_1 y_0 - x_0 y_1 \equiv 0 \pmod{m_0},$$
we may divide both sides of equation (6.2) by $m_0^2$ to get integers $\alpha$ and $\beta$ such that
$$pm_1 = \alpha^2 + \beta^2, \qquad m_1 \le \frac{m_0}{2}.$$
This shows that $m_1 \in F$, contradicting the minimality of $m_0$. Thus, our assumption that $m_0 > 1$ is false. Hence, $m_0 = 1$. This proves Theorem 6.2. □

Remark 6.3. The technique of using the existence of a “nontrivial” smallest element in a subset of natural numbers to show the existence of an even smaller element in the set and thereby arriving at a contradiction is attributed to Fermat and is referred to as Fermat’s method of descent. We use such a method again in the next section to study sums of four squares.

Theorem 6.4 enables us to characterize numbers which can be written as a sum of two squares.

Theorem 6.4. Let $n$ be a natural number with prime factorization
$$n = 2^{\alpha} p_1^{\alpha_1} \cdots p_k^{\alpha_k} q_1^{\beta_1} \cdots q_l^{\beta_l}$$
where the $p_i$'s are distinct primes $\equiv 1 \pmod 4$ and the $q_j$'s are distinct primes $\equiv 3 \pmod 4$. Then, $n$ is a sum of two squares if and only if all $\beta_j$'s are even. That is, $n$ is a sum of two squares if and only if all prime divisors of $n$ that are $\equiv 3 \pmod 4$ occur to an even power in the factorization of $n$.

Proof. The “if” part of the theorem readily follows from equation (6.1) and Theorem 6.2. For the converse, we need only show that if a prime $q$ divides $n$ with $q \equiv 3 \pmod 4$, then $q$ must appear to an even power in the factorization of $n$. We first note that, by Fermat’s little theorem (see Corollary 4.6), the congruence $x^2 \equiv -1 \pmod q$ has no solution. Indeed, if such a solution existed, then
$$-1 \equiv (-1)^{\frac{q-1}{2}} \equiv (x^2)^{\frac{q-1}{2}} \equiv x^{q-1} \equiv 1 \pmod q,$$
which is not possible. Now, suppose $n$ is a sum of two squares, say $n = x^2 + y^2$, and $q \mid n$, so that $x^2 + y^2 \equiv 0 \pmod q$. Let us assume that $(q, x) = 1$. The congruence
$$x^2 \equiv -y^2 \pmod q$$
necessarily implies that $(q, y) = 1$. By Theorem 4.7, we can find an integer $b$ with $by \equiv 1 \pmod q$ and therefore,
$$(bx)^2 \equiv -(by)^2 \equiv -1 \pmod q.$$
This is not possible as $q \equiv 3 \pmod 4$. Our assumption is therefore false and we deduce that $q \mid x$. This immediately implies $q \mid y$. We therefore obtain $q^2 \mid x^2$ and $q^2 \mid y^2$, so that $q^2 \mid n$. That is, whenever a prime $q \equiv 3 \pmod 4$ divides $n$, we must have $q^2 \mid n$. Theorem 6.4 now follows almost immediately (see Exercise 6.1.1.2 below). □

6.1.1. Exercises.

Exercise 6.1.1.1. A number is called squarefree if it is not divisible by the square of a prime number. Let $D$ be squarefree. Show that if natural numbers $m$ and $n$ can be written in the form $a^2 + Db^2$, then so can their product.

Exercise 6.1.1.2. Suppose we have a prime $q \equiv 3 \pmod 4$ and a natural number $n = x^2 + y^2$ such that $q \mid n$. Show that $q$ must occur to an even power in the prime factorization of $n$. [Hint: Use the fact that $q^2 \mid n$.]

Exercise 6.1.1.3. Show that any prime $p \equiv 1 \pmod 4$ can be expressed as a sum of two squares.

Exercise 6.1.1.4. Show that any odd prime can be written as a difference of two consecutive squares.

Exercise 6.1.1.5. Prove that a natural number $n$ can be written as the difference of two square numbers if and only if $n \not\equiv 2 \pmod 4$.

Exercise 6.1.1.6. Show that if $x, y, z$ are integers such that $x^2 + y^2 + z^2 \equiv 0 \pmod 4$, then $x, y, z$ are all even.

Exercise 6.1.1.7. Show that a positive integer of the form $4^k(8n + 7)$ cannot be represented as a sum of three squares.
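The descent in the proof of Theorem 6.2 is constructive and can be turned into a short procedure. The sketch below assumes that the input is a prime $p \equiv 1 \pmod 4$; the way a square root of $-1$ is found (trying small bases $a$ and testing $a^{(p-1)/4}$) is a standard device chosen for this illustration rather than the method of Theorem 4.14, and all function names are hypothetical.

```python
def sqrt_minus_one(p):
    """Return x with x*x + 1 divisible by p, for a prime p = 1 (mod 4).

    If a is a quadratic nonresidue mod p, then a**((p-1)/4) works,
    so we simply try a = 2, 3, ... until the check succeeds.
    """
    for a in range(2, p):
        x = pow(a, (p - 1) // 4, p)
        if (x * x + 1) % p == 0:
            return x
    raise ValueError("expected a prime p with p % 4 == 1")


def centered(v, m):
    """Representative r of v modulo m with -m/2 < r <= m/2."""
    r = v % m
    if 2 * r > m:
        r -= m
    return r


def two_squares(p):
    """Write the prime p = 1 (mod 4) as x**2 + y**2 by Fermat's descent."""
    x, y = sqrt_minus_one(p), 1
    m = (x * x + y * y) // p          # p*m = x^2 + y^2 with m < p
    while m > 1:
        x1, y1 = centered(x, m), centered(y, m)
        # Identity (6.2): both new coordinates are divisible by m.
        x, y = (x * x1 + y * y1) // m, (x1 * y - x * y1) // m
        m = (x * x + y * y) // p      # the new m is at most half the old one
    return abs(x), abs(y)


if __name__ == "__main__":
    for p in (5, 13, 17, 29, 97, 1009):
        print(p, two_squares(p))
```

Each pass through the loop replaces $m_0$ by $m_1 \le m_0/2$, so the number of iterations is at most about $\log_2 p$.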

6.2. Lagrange’s four square theorem

In the previous section, we saw that not all positive integers can be written as sums of two squares or even sums of three squares. We now prove Theorem 6.1 and show that every natural number can be written as a sum of at most four squares. The matrix identity behind equation (6.1) has a natural generalization to complex numbers
$$\begin{bmatrix} a & -\bar{b} \\ b & \bar{a} \end{bmatrix} \begin{bmatrix} c & -\bar{d} \\ d & \bar{c} \end{bmatrix} = \begin{bmatrix} ac - \bar{b}d & -(a\bar{d} + \bar{b}\bar{c}) \\ \bar{a}d + bc & \bar{a}\bar{c} - b\bar{d} \end{bmatrix}, \tag{6.3}$$
which, upon taking determinants, leads to
$$(|a|^2 + |b|^2)(|c|^2 + |d|^2) = |ac - \bar{b}d|^2 + |\bar{a}d + bc|^2. \tag{6.4}$$
Since each complex number $z = x + iy$ leads to $|z|^2 = x^2 + y^2$, we see that (6.4) implies that if $m, n$ can be written as the sum of four squares, then so can their product. Thus, to prove Theorem 6.1, it suffices to prove that every prime number $p$ can be written as a sum of four squares. This is clear for $p = 2$.

Lemma 6.5. For an odd prime $p$, there exist integers $0 \le x, y \le \frac{p-1}{2}$ such that
$$x^2 + y^2 + 1 \equiv 0 \pmod p.$$

Proof. Let us consider the set of residue classes
$$S = \left\{ x^2 \ (\mathrm{mod}\ p), \ 0 \le x \le \frac{p-1}{2} \right\}.$$
We observe that no two elements in the above set are congruent (mod $p$). In fact, if
$$i^2 \equiv j^2 \pmod p \quad \text{for some } 0 \le i \le j \le \frac{p-1}{2},$$
then $p \mid i^2 - j^2$ and therefore $p \mid (i + j)$ or $p \mid (i - j)$. But $0 \le |i \pm j| \le p - 1$. This forces $i = j$. Therefore, the set $S$ has cardinality
$$1 + \frac{p-1}{2} = \frac{p+1}{2}.$$
So does the set
$$T = \left\{ -1 - y^2 \ (\mathrm{mod}\ p), \ 0 \le y \le \frac{p-1}{2} \right\}.$$
The sets $S$ and $T$ cannot be disjoint, for otherwise we get
$$|S| + |T| = \frac{p+1}{2} + \frac{p+1}{2} = p + 1 > p$$
distinct residue classes (mod $p$). This proves Lemma 6.5.
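The pigeonhole argument in the proof of Lemma 6.5 translates directly into a search: build the set $S$ of squares and look for an element of $T$ that lies in it. The sketch below assumes $p$ is an odd prime; the function name is illustrative.

```python
def lemma_6_5_witness(p):
    """Return (x, y) with 0 <= x, y <= (p-1)/2 and x^2 + y^2 + 1 = 0 (mod p)."""
    half = (p - 1) // 2
    # The set S of the proof, remembering which x produced each residue.
    squares = {x * x % p: x for x in range(half + 1)}
    for y in range(half + 1):
        t = (-1 - y * y) % p          # an element of the set T
        if t in squares:              # S and T intersect here
            return squares[t], y
    raise ValueError("no witness found; p must be an odd prime")


if __name__ == "__main__":
    for p in (3, 7, 11, 19, 23, 101):
        x, y = lemma_6_5_witness(p)
        print(p, (x, y), (x * x + y * y + 1) // p)
```

The printed quotient $(x^2 + y^2 + 1)/p$ is the element $m < p$ that shows the set used in the descent argument is nonempty.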



As before, given an odd prime $p$, consider the set
$$\mathfrak{L} = \{m \in \mathbb{N} : m < p, \ pm \text{ is a sum of (at most) four squares}\}.$$
By Lemma 6.5, the set is nonempty. Let $m_0$ be the minimal element of this set. We want to show that $m_0 = 1$. As in the previous section, let us assume that $m_0 > 1$. Thus,
$$m_0 p = x^2 + y^2 + z^2 + w^2. \tag{6.5}$$
If $x, y, z, w$ are all even, then 4 must divide $m_0$. Thus, we can divide both sides of equation (6.5) by 4, contradicting the minimality of $m_0$ in $\mathfrak{L}$. If $x, y, z, w$ are all odd, then again, looking at equation (6.5) modulo 4, we see that $4 \mid m_0$; in particular $m_0$ is even, and
$$\frac{pm_0}{2} = \left( \frac{x + y}{2} \right)^2 + \left( \frac{x - y}{2} \right)^2 + \left( \frac{z + w}{2} \right)^2 + \left( \frac{z - w}{2} \right)^2,$$
contradicting the minimality of $m_0$ in $\mathfrak{L}$. A similar contradiction arises if exactly two of $x, y, z, w$ are even and the other two are odd.

So, we may assume that an odd number of $x, y, z, w$ are even, in which case we deduce that $m_0$ is odd. Now choose $x_0, y_0, z_0, w_0$ so that
$$x \equiv x_0, \quad y \equiv y_0, \quad z \equiv z_0, \quad w \equiv w_0 \pmod{m_0}$$
and $|x_0|, |y_0|, |z_0|, |w_0| < m_0/2$. This inequality is strict since $m_0$ is odd. We therefore have
$$x_0^2 + y_0^2 + z_0^2 + w_0^2 = m_0 m_1$$
for some $0 < m_1 < m_0$ since $x_0^2 + y_0^2 + z_0^2 + w_0^2 < m_0^2$. Note here that $m_1 > 0$ since $x_0, y_0, z_0, w_0$ cannot all be zero (otherwise $m_0^2 \mid m_0 p$, which is not possible as $1 < m_0 < p$). Hence
$$m_0^2 m_1 p = x_2^2 + y_2^2 + z_2^2 + w_2^2$$
with $x_2, y_2, z_2, w_2$ given via equation (6.4). From this equation, it can be easily verified that $m_0$ divides each of $x_2, y_2, z_2, w_2$ so that we deduce that $m_1 p$ can be written as a sum of four squares. But $m_1 < m_0$ and this contradicts the minimality of $m_0$. Hence, in all cases, we contradict the assumption that $m_0 > 1$. Therefore $m_0 = 1$ and this proves Theorem 6.1.

6.3. A conjectured value for $g(k)$

Waring’s conjecture can be stated as follows:

Conjecture 6.6 (Waring’s conjecture). Let $k \ge 2$ be a natural number. There exists $g \in \mathbb{N}$ such that every natural number can be written as a sum of (at most) $g$ $k$th powers of natural numbers. That is, for every $n \ge 1$, one can find integers $x_1, x_2, \ldots, x_g \ge 0$ such that
$$n = x_1^k + x_2^k + \cdots + x_g^k.$$
The minimal value of $g$ that works for a given value of $k$ is denoted $g(k)$. By Lagrange’s theorem, we see that $g(2) \le 4$. Moreover, by the discussion in previous sections, $g(2) \ne 2$ (see Theorem 6.2) and $g(2) \ne 3$ (see Exercise 6.1.1.7). Thus, $g(2) = 4$.

Some questions that emerge from a discussion of Waring’s conjecture are as follows.

(1) [Waring’s problem] Does a finite value of $g(k)$ exist for all $k$?
(2) Can we find a precise formula for $g(k)$ for every $k \ge 2$?
(3) Can we obtain asymptotics, upper bounds or lower bounds for $g(k)$?

With respect to the first and second questions, between 1770 and 1986, the exact value of $g(k)$ was investigated for several values of $k$. For example, in 1909, Arthur Wieferich [78] showed that $g(3) = 9$. There was a gap in Wieferich’s proof, which was filled by Aubrey Kempner [44] three years later. In 1986, Ramachandran Balasubramanian, Jean-Marc Deshouillers and François Dress ([3], [4]) showed that $g(4) = 19$. In Theorem 6.7, we use elementary considerations along with Lagrange’s theorem to discuss a weaker bound for $g(4)$.

Theorem 6.7. $g(4) \le 50$.

Proof. The idea is to use the identities
$$(a^2 + b^2 + c^2 + d^2)^2 = a^4 + b^4 + c^4 + d^4 + 2a^2b^2 + 2a^2c^2 + 2a^2d^2 + 2b^2c^2 + 2b^2d^2 + 2c^2d^2$$
and
$$(a + b)^4 + (a - b)^4 = 2a^4 + 12a^2b^2 + 2b^4$$
so that
$$\begin{aligned}
6(a^2 + b^2 + c^2 + d^2)^2 = {} & (a + b)^4 + (a - b)^4 + (a + c)^4 + (a - c)^4 + (a + d)^4 + (a - d)^4 \\
& + (b + c)^4 + (b - c)^4 + (b + d)^4 + (b - d)^4 + (c + d)^4 + (c - d)^4.
\end{aligned}$$
By Theorem 6.1, it follows that any number of the form $6x^2$ is a sum of 12 fourth powers, since $x$ can be written as a sum of four squares. Now, let $n$ be a natural number and write, using the division algorithm (Theorem 2.5), $n = 6N + r$, $0 \le r \le 5$. By Lagrange’s theorem, $N = x^2 + y^2 + z^2 + w^2$. Our identity above shows that each of $6x^2, 6y^2, 6z^2, 6w^2$ can be written as a sum of 12 fourth powers. Thus, we conclude that $g(4) \le 48 + 5 = 53$.

This bound can be improved slightly by noting that any $n \ge 81$ can be written as $n = 6N' + t$ with $t = 0, 1, 2, 81, 16$ or $17$ according as $n \equiv 0, 1, 2, 3, 4, 5 \pmod 6$ respectively. Observe that
$$1 = 1^4, \quad 2 = 1^4 + 1^4, \quad 81 = 3^4, \quad 16 = 2^4, \quad 17 = 2^4 + 1^4.$$
Using the above mentioned identities again, $6N'$ can be written as a sum of at most 48 fourth powers.


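Thus, any $n \ge 81$ can be written as a sum of at most 50 fourth powers. If $51 \le n \le 80$, then
$$n = 2^4 + 2^4 + m \cdot 1^4, \quad 19 \le m \le 48,$$
which is a sum of $m + 2 \le 50$ fourth powers. Finally, if $n \le 50$, then $n$ is a sum of $n$ copies of $1^4$. Thus, we get $g(4) \le 50$. □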
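The identity used in the proof of Theorem 6.7 and the resulting bound can be checked numerically. The following is a minimal sketch, not part of the original argument: the function names are our own, and the dynamic programme simply computes the least number of positive fourth powers needed for each $n$ up to a chosen limit.

```python
def pair_fourth_powers(a, b, c, d):
    """Sum of the twelve fourth powers (x+y)^4 + (x-y)^4 over the six pairs."""
    vals = (a, b, c, d)
    total = 0
    for i in range(4):
        for j in range(i + 1, 4):
            total += (vals[i] + vals[j]) ** 4 + (vals[i] - vals[j]) ** 4
    return total


def identity_holds(a, b, c, d):
    """Check 6(a^2+b^2+c^2+d^2)^2 against the sum of twelve fourth powers."""
    return 6 * (a * a + b * b + c * c + d * d) ** 2 == pair_fourth_powers(a, b, c, d)


def min_fourth_powers(limit):
    """best[n] = least number of positive fourth powers with sum n, 0 <= n <= limit."""
    powers = []
    x = 1
    while x ** 4 <= limit:
        powers.append(x ** 4)
        x += 1
    best = [0] * (limit + 1)
    for n in range(1, limit + 1):
        # 1 = 1^4 is always available, so the minimum is over a nonempty set.
        best[n] = 1 + min(best[n - p] for p in powers if p <= n)
    return best


if __name__ == "__main__":
    print(identity_holds(1, 2, 3, 4))
    best = min_fourth_powers(1000)
    print(max(best), best.index(max(best)))
```

Within the range tested, the largest value found is far below the elementary bound 50, in line with the theorem $g(4) = 19$ quoted above.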

In 1909, David Hilbert was able to show the existence of $g(k)$ for every $k \ge 2$. The proof of Hilbert’s theorem is highly involved and does not give us any indication towards a formula for $g(k)$. In this regard, as early as 1772, J. A. Euler (son of the celebrated Leonhard Euler) made the following conjecture.

Conjecture 6.8 (J. A. Euler). For every $k \ge 2$,
$$g(k) = 2^k + \left[\left(\frac{3}{2}\right)^k\right] - 2.$$

It was J. A. Euler himself who observed that the above conjectural value does give us a lower bound for $g(k)$. Indeed, let
$$q = \left[\left(\frac{3}{2}\right)^k\right].$$
Let us denote $n = 2^k q - 1$. Since $n < 3^k$, in order to write $n$ as a sum of $k$th powers, we can only use $1^k$ and $2^k$ as summands. To minimize the number of summands, we maximize the number of $2^k$’s, which is clearly $q - 1$. Thus,
$$n = (q - 1)2^k + (2^k - 1)1^k,$$
giving us
$$g(k) \ge 2^k + [(3/2)^k] - 2.$$

Subbayya S. Pillai and Leonard Eugene Dickson are credited with discovering (independently) a conditional formula for $g(k)$, in 1936, though there is some controversy regarding this (see [61]). Their result can be stated as follows:

Theorem 6.9. Write $3^k = 2^k q + r$, $0 < r < 2^k$ and $q = [(3/2)^k]$. If $r \le 2^k - q - 3$, then
$$g(k) = 2^k + \left[\left(\frac{3}{2}\right)^k\right] - 2.$$
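Euler’s lower bound and the hypothesis of Theorem 6.9 are easy to tabulate for small $k$. The sketch below is illustrative only; `euler_data` is a name of our own, and the hypothesis tested is exactly the inequality $r \le 2^k - q - 3$ as stated in Theorem 6.9.

```python
def euler_data(k):
    """Quantities attached to J. A. Euler's conjectured value of g(k)."""
    q, r = divmod(3 ** k, 2 ** k)          # 3^k = 2^k q + r with q = [(3/2)^k]
    n = 2 ** k * q - 1                     # the extremal integer n = 2^k q - 1
    assert n < 3 ** k                      # so only 1^k and 2^k can be used
    twos, ones = divmod(n, 2 ** k)         # use as many copies of 2^k as possible
    return {
        "q": q,
        "r": r,
        "n": n,
        "summands": twos + ones,           # number of k-th powers needed for n
        "conjectured": 2 ** k + q - 2,     # Euler's conjectured value of g(k)
        "hypothesis": r <= 2 ** k - q - 3, # hypothesis of Theorem 6.9 as stated
    }


if __name__ == "__main__":
    for k in range(2, 11):
        print(k, euler_data(k))
```

For $k = 2, 3, 4$ the conjectured values are 4, 9 and 19, which agree with the values of $g(2)$, $g(3)$ and $g(4)$ quoted earlier in this section.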


The proof of this theorem is not simple and we do not discuss it here. However, we will discuss the hypothesis that implies the exact formula for $g(k)$. In 1957, Kurt Mahler [50] proved, using Ridout’s variation of a theorem of Roth, that the hypothesis $r \le 2^k - q - 3$ holds for all $k$ sufficiently large. Unfortunately, the Roth–Ridout theorems are ineffective and one cannot give an explicit lower bound for $k$ beyond which the hypothesis holds. The theorem of Ridout alluded to is the following:

Theorem 6.10. Let $\xi \ne 0$ be a real algebraic number. Let $p_1, p_2, \dots, p_s$ be finitely many distinct primes and $\epsilon > 0$ be fixed. Then there are only finitely many $(s+1)$-tuples of integers $(e_0, e_1, \dots, e_s)$ with $e_0 \ne 0$ such that
$$0 < \left| p_1^{e_1} p_2^{e_2} \cdots p_s^{e_s} - e_0 \xi \right| < e^{-\epsilon E}, \quad \text{where } E = \max_{1 \le j \le s} |e_j|.$$

We apply Theorem 6.10 with $\xi = 1$, $p_1 = 2$, $p_2 = 3$ and conclude that the number of tuples $(e_0, e_1, e_2)$ with $e_0 = q + 1$, $e_1 = -k$, $e_2 = k$ satisfying
$$0 < \left| \left(\frac{3}{2}\right)^k - (q + 1) \right| < e^{-\epsilon k}$$
is finite. In other words, for $k$ sufficiently large,
$$|3^k - (q + 1)2^k| > 2^k e^{-\epsilon k}.$$
That is, $|r - 2^k| > 2^k e^{-\epsilon k}$ for $k$ sufficiently large. Choose
$$\epsilon < \log\left(\frac{4}{3}\right)$$
so that the Pillai–Dickson condition is easily satisfied. An interesting remark has been made by Sinnou David in this context, which connects Euler’s conjecture to the famous $abc$ conjecture, stated as follows:


Conjecture 6.11 ($abc$-conjecture). For any $\epsilon > 0$, there is a $\kappa(\epsilon) > 0$ such that if $a, b, c$ are mutually coprime integers satisfying $a + b = c$, then
$$\max(|a|, |b|, |c|) \le \kappa(\epsilon) \Big( \prod_{p \mid abc} p \Big)^{1+\epsilon},$$
where the product is over distinct primes $p$ dividing $abc$ (and is called the radical of $abc$).

David has proved that if the $abc$ conjecture holds, then the Pillai–Dickson condition holds for $k$ sufficiently large. Moreover, if the $abc$ conjecture is effective, then one can establish an effective lower bound for $k$. He notes that we can take the $abc$ conjecture with any $\epsilon$
$2^k - q - 2$. We will treat
$$3^k = 2^k q + r = 2^k (q + 1) + (r - 2^k)$$
as the relevant equation to apply the $abc$ conjecture. Let $3^\nu = \gcd(3^k, 2^k (q + 1))$. Then
$$3^{k-\nu} = 3^{-\nu} 2^k (q + 1) + (r - 2^k) 3^{-\nu}.$$
The radical is bounded by $6(q + 1)(q + 2) \le 36q^2$ for $q \ge 1$. By the $abc$ conjecture,
$$2^k < \max\left( 3^{k-\nu},\ 2^k \left( \frac{q + 1}{3^\nu} \right),\ \frac{r - 2^k}{3^\nu} \right) < \kappa(\epsilon)(36q^2)^{1+\epsilon} \ll q^{2+2\epsilon}.$$
Since $r > 0$, we have $3^k > 2^k q$ so that $q < (3/2)^k$. Hence
$$2^k \ll \left( \left( \frac{3}{2} \right)^k \right)^{2+2\epsilon},$$


implying that
$$1 < (2 + 2\epsilon) \log\left( \frac{3}{2} \right),$$
which is a contradiction.

6.3.1. Exercises.

Exercise 6.3.1.1. Let $q$ be a natural number such that all numbers of the form $qn^2$ can be written as a sum of a bounded number of $k$th powers. Show that every natural number can be written as a sum of a bounded number of $k$th powers.

Exercise 6.3.1.2. Show that if $16n$ can be written as a sum of 15 biquadrates, then so can $n$.

Exercise 6.3.1.3. Prove that the number of solutions of the equation
$$x_1 + x_2 + \cdots + x_g = n,$$
where the $x_i$ are nonnegative integers, is given by the binomial coefficient
$$\binom{n + g - 1}{n}.$$

Exercise 6.3.1.4. Prove that the number of solutions of
$$x_1 + x_2 + \cdots + x_g \le n,$$
where the $x_i$ are nonnegative integers, is given by the binomial coefficient
$$\binom{n + g}{n}.$$

Exercise 6.3.1.5. Let $G(k)$ denote the smallest value of $g$ such that every sufficiently large number can be written as a sum of $g$ $k$th powers. Show that $G(k) \ge k + 1$. [Hint: Using the previous exercise, show that there are infinitely many natural numbers that cannot be written as a sum of $k$ $k$th powers.]

Exercise 6.3.1.6. Prove that $G(4) \ge 15$. [Hint: First show that any fourth power is $\equiv 0$ or $1 \pmod{16}$.]


6.4. The easier Waring’s problem

In this section, we discuss an easier variant of Waring’s problem. We ask whether every natural number $n$ can be written as
$$n = \pm x_1^k \pm x_2^k \pm \cdots \pm x_s^k.$$
This problem is, in fact, substantially easier to solve using differencing arguments. Indeed, let
$$\Delta f(x) := f(x + 1) - f(x)$$
and define recursively
$$\Delta^{m+1} f(x) := \Delta(\Delta^m f(x)).$$
Observe that if $f(x)$ is a polynomial of degree $k$ with leading coefficient $a$, then $\Delta f(x)$ is a polynomial of degree $k - 1$ with leading coefficient $ka$. We immediately obtain (see Exercise 6.4.1.1)

Lemma 6.12. $\Delta^{k-1} x^k = k!\, x + d$, where $d$ is an integer (depending on $k$).

We can now prove

Theorem 6.13. Every natural number $n$ can be written as
$$n = \pm x_1^k \pm x_2^k \pm \cdots \pm x_s^k$$
with
$$s \le 2^{k-1} + \frac{k!}{2}.$$

Proof. Note that
$$\Delta x^k = (x + 1)^k - x^k$$
and
$$\Delta^2 x^k = (x + 2)^k - 2(x + 1)^k + x^k.$$
By induction we see that $\Delta^{k-1} x^k$ can be written as a sum of $2^{k-1}$ terms of the form $\pm (x + a)^k$. With $d$ as in Lemma 6.12, we write (via the division algorithm)
$$n - d = k!\, x + r \quad \text{with } |r| \le \frac{k!}{2}.$$
In other words,
$$n = \Delta^{k-1} x^k + r.$$
Since $r$ can be written as a sum of $|r| \le k!/2$ terms, each equal to $\pm 1^k$, we deduce the result immediately. □

We now denote by $v(k)$ the smallest value of $s$ permissible in the above theorem. Clearly, $v(k) \le g(k)$. We recall the function $G(k)$ (from Exercise 6.3.1.5), which is defined as the smallest $g$ such that every sufficiently large natural number can be written as a sum of $g$ $k$th powers. Vinogradov proved that
$$G(k) = O(k \log k).$$
Thus, $g(k)$ is conjectured to have exponential growth rate compared to $G(k)$.

There are many unsolved problems regarding $G(k)$. So far, there are only two values of $k$ for which $G(k)$ is known. By Lagrange’s four square theorem, $G(2) = 4$. Harold Davenport [17] has proved that $G(4) = 16$. Further, it is conjectured that $G(3) = 4$. That is, every sufficiently large number can be written as the sum of four cubes. This conjecture is still open. The best known bound so far is $G(3) \le 7$. Questions about $G(k)$ have been extensively studied by Robert C. Vaughan and Trevor D. Wooley ([72], [73], [74], [75]), who have obtained bounds for $G(k)$ for $5 \le k \le 20$. The interested reader can find a survey of general bounds on $G(k)$ and related problems in [76].

6.4.1. Exercises.

Exercise 6.4.1.1. Let $f(x)$ be a polynomial of degree $k$ with integer coefficients. Define
$$\Delta f(x) := f(x + 1) - f(x)$$
and define recursively, for every positive integer $m$,
$$\Delta^{m+1} f(x) := \Delta(\Delta^m f(x)).$$
Show that
$$\Delta^{k-1} x^k = k!\, x + d,$$
where $d$ is an integer (depending on $k$).

Exercise 6.4.1.2. Show that $v(k) \le G(k) + 1$.

Exercise 6.4.1.3. Show that $v(3)$ is either 4 or 5.

Exercise 6.4.1.4. Show that $v(2) = 3$.
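The proof of Theorem 6.13 is constructive, and the construction can be sketched in a few lines. The code below is an illustration under our own naming (`easier_waring` is not from the text). It uses the standard expansion $\Delta^{k-1} f(x) = \sum_{j=0}^{k-1} (-1)^{k-1-j} \binom{k-1}{j} f(x+j)$, takes $d$ to be the value of $\Delta^{k-1} x^k$ at $x = 0$, and replaces a negative base by its absolute value with the appropriate sign.

```python
from math import comb, factorial


def easier_waring(n, k):
    """Return a list of (sign, base) with sum(sign * base**k) == n,
    following the differencing argument in the proof of Theorem 6.13."""
    kf = factorial(k)
    # d = Delta^{k-1} x^k evaluated at x = 0 (Lemma 6.12 gives k! x + d).
    d = sum((-1) ** (k - 1 - j) * comb(k - 1, j) * j ** k for j in range(k))
    # Division with a centred remainder: n - d = k! x + r, |r| <= k!/2.
    x, r = divmod(n - d, kf)
    if 2 * r > kf:
        x, r = x + 1, r - kf
    terms = []
    for j in range(k):
        sign = (-1) ** (k - 1 - j)
        base = x + j
        if base < 0:
            base = -base
            if k % 2 == 1:          # (-b)^k = -(b^k) when k is odd
                sign = -sign
        if base != 0:
            # binom(k-1, j) copies of the term sign * base^k
            terms.extend([(sign, base)] * comb(k - 1, j))
    # The remainder r is a sum of |r| terms, each equal to +1^k or -1^k.
    terms.extend([(1 if r > 0 else -1, 1)] * abs(r))
    return terms


if __name__ == "__main__":
    print(easier_waring(23, 3))
```

The number of terms returned never exceeds $2^{k-1} + k!/2$, which is the bound of Theorem 6.13; it is usually far from the optimal value $v(k)$.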

Chapter 7

Waring’s problem

A fundamental contribution to additive number theory was made by Schnirelmann in the 1930s. Schnirelmann was motivated by the conjecture of Goldbach that every $n > 2$ can be written as a sum of at most two prime numbers if $n$ is even and at most three prime numbers if $n$ is odd. Before the advent of sieve theory, Edmund Landau in 1912 challenged the mathematical community to show that there exists a positive integer $C$ such that every natural number greater than 1 can be written as the sum of at most $C$ prime numbers. Responding to this challenge, Schnirelmann was able to show that $C < 800{,}000$. In order to obtain his theorem, he related the existence of such a constant $C$ to a new way of interpreting the density of the set of primes. This new notion of density, different from the notion of “natural” density, can be generalized to subsets of natural numbers and is amenable to several additive problems including the one posed by Waring. In this chapter, we introduce the notion of Schnirelmann density and show how it leads to a solution of Waring’s problem following an approach used by Linnik.

7.1. Schnirelmann density

Let $A \subseteq \mathbb{N}$ and for every $n \ge 1$, let
$$A(n) = \#\{a \in A : a \le n\}.$$
The Schnirelmann density of $A$, denoted by $\delta(A)$, is defined as
$$\delta(A) := \inf_{n \ge 1} \frac{A(n)}{n}.$$
We observe that $A(n) \ge \delta(A) n$ for all $n \ge 1$. We also observe that $0 \le \delta(A) \le 1$ and $\delta(A) = 1$ if and only if $A = \mathbb{N}$. The Schnirelmann density is different from the natural density or asymptotic density $\sigma(A)$ defined as
$$\sigma(A) = \lim_{n \to \infty} \frac{A(n)}{n}.$$
While $\sigma(A)$ measures the asymptotic behavior of $A(n)/n$ for arbitrarily large values of $n$, the Schnirelmann density $\delta(A)$ is sensitive to all values of $n$. For example, if $\mathbb{E}$ and $\mathbb{O}$ denote the set of even and odd natural numbers respectively, then $\sigma(\mathbb{E}) = \sigma(\mathbb{O}) = 1/2$. On the other hand, $\delta(\mathbb{E}) = 0$ (since $1 \notin \mathbb{E}$) and $\delta(\mathbb{O}) = 1/2$.

Given two sets $A$ and $B$ of integers, let $A + B$ denote the sumset of $A$ and $B$, that is,
$$A + B = \{a + b : a \in A,\ b \in B\}.$$
If $A$ and $B$ are subsets of $\mathbb{N}$, let $A \oplus B$ denote the set
$$A \oplus B := (A \cup \{0\}) + (B \cup \{0\}).$$
More generally, let $A_1, A_2, \dots, A_t \subseteq \mathbb{N}$. For any $t \ge 2$, let us define $\oplus_{i=1}^{t} A_i$ recursively as follows:
$$\oplus_{i=1}^{t} A_i := \left( \oplus_{i=1}^{t-1} A_i \right) \oplus A_t.$$
Furthermore, for any $m \in \mathbb{N}$ and $A \subseteq \mathbb{N}$, we denote
$$mA := \oplus_{i=1}^{m} A_i, \quad A_i = A \text{ for each } 1 \le i \le m.$$

The above concept of sums of sets helps us to restate Waring’s conjecture (which we also call Waring’s problem) in a simple and elegant manner. For 𝑘 ≥ 2, let us consider 𝐴𝑘 = {𝑛𝑘 ∶ 𝑛 ∈ ℕ}. Waring’s problem posits the existence of 𝑔 = 𝑔(𝑘) ∈ ℕ such that 𝑔𝐴𝑘 = ℕ. In this sense, Lagrange’s four square theorem is simply the assertion that 4𝐴2 = ℕ. One can reformulate the Goldbach problem also in a similar framework which was the original motivation for Schnirelmann. In order to understand his treatment of such problems, we start with some fundamental properties of 𝛿(𝐴) proved by him in 1936. Theorem 7.1 (Schnirelmann, 1936). For any two subsets 𝐴 and 𝐵 of ℕ, 𝛿(𝐴 ⊕ 𝐵) ≥ 𝛿(𝐴) + 𝛿(𝐵) − 𝛿(𝐴)𝛿(𝐵).
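The definitions of this section can be explored on finite truncations. The sketch below is an illustration with names of our own choosing: `truncated_density` computes $\min_{1 \le n \le N} A(n)/n$, which is only an upper bound for $\delta(A)$, and `oplus` lists the elements of $A \oplus B$ in $[1, N]$. The inequality of Theorem 7.1 also holds for these truncated quantities, because its proof only uses the values $A(n)$ and $B(n)$ with $n \le N$.

```python
from fractions import Fraction


def truncated_density(A, N):
    """min over 1 <= n <= N of A(n)/n, an upper bound for the Schnirelmann density."""
    count = 0
    best = Fraction(1)
    for n in range(1, N + 1):
        if n in A:
            count += 1
        best = min(best, Fraction(count, n))
    return best


def oplus(A, B, N):
    """Elements of (A ∪ {0}) + (B ∪ {0}) lying in [1, N]."""
    A0 = {0} | {a for a in A if 1 <= a <= N}
    B0 = {0} | {b for b in B if 1 <= b <= N}
    return {a + b for a in A0 for b in B0 if 1 <= a + b <= N}


if __name__ == "__main__":
    N = 200
    evens = set(range(2, N + 1, 2))
    odds = set(range(1, N + 1, 2))
    squares = {x * x for x in range(1, 15)}
    print(truncated_density(evens, N), truncated_density(odds, N))
    dA = truncated_density(squares, N)
    lhs = truncated_density(oplus(squares, squares, N), N)
    print(lhs, dA + dA - dA * dA)
```

Exact rational arithmetic is used so that the comparison between the two sides of the inequality is not affected by rounding.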


Proof. Suppose $A(n) = r$. We order the elements $a_i \le n$ of $A$ as
$$1 \le a_1 < a_2 < a_3 < \cdots < a_r \le n.$$
Let
$$B_1 := \{b \in B : b < a_1\}.$$
For $2 \le i \le r$, let
$$B_i := \{b \in B : a_{i-1} + b < a_i\}.$$
Finally, let
$$B_{r+1} := \{b \in B : a_r + b \le n\}.$$
We denote $a_0 = 0$. Observe that the set $\{a_1, a_2, \dots, a_r\}$ and the sets $a_{i-1} + B_i$, $1 \le i \le r + 1$, are disjoint subsets of $A \oplus B$, and each element in these sets is $\le n$. For notational convenience, let us define $B(0) = 0$. From above, we have
$$(A \oplus B)(n) \ge A(n) + \sum_{i=1}^{r+1} |B_i| \ge A(n) + B(a_1 - 1) + \sum_{i=2}^{r} B(a_i - a_{i-1} - 1) + B(n - a_r).$$
Combining the above inequality with the property that $B(n) \ge \delta(B) n$, we get that for every $n \ge 1$,
$$\begin{aligned}
(A \oplus B)(n) &\ge A(n) + \delta(B) \Big\{ (a_1 - 1) + \sum_{i=2}^{r} (a_i - a_{i-1} - 1) + (n - a_r) \Big\} \\
&= A(n) + \delta(B)(n - r) \\
&= A(n) + \delta(B)(n - A(n)) \\
&= A(n)(1 - \delta(B)) + \delta(B) n \\
&\ge \delta(A) n (1 - \delta(B)) + \delta(B) n \\
&= n \left( \delta(A) + \delta(B) - \delta(A)\delta(B) \right).
\end{aligned}$$
This proves Theorem 7.1. □

A more general version of Schnirelmann’s theorem can be stated as follows:


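Theorem 7.2. For $A_1, A_2, \dots, A_t \subseteq \mathbb{N}$,
$$\delta\left( \oplus_{i=1}^{t} A_i \right) \ge 1 - \prod_{i=1}^{t} (1 - \delta(A_i)).$$

Proof. By Theorem 7.1,
$$\delta(A_1 \oplus A_2) \ge \delta(A_1) + \delta(A_2) - \delta(A_1)\delta(A_2) = 1 - (1 - \delta(A_1))(1 - \delta(A_2)).$$
This proves the theorem for $t = 2$. We now apply induction. We have,
$$\delta\left( \oplus_{i=1}^{t} A_i \right) = \delta\left( \left( \oplus_{i=1}^{t-1} A_i \right) \oplus A_t \right) \ge 1 - \left( 1 - \delta\left( \oplus_{i=1}^{t-1} A_i \right) \right) (1 - \delta(A_t)).$$
By the induction hypothesis for $t - 1$,
$$\delta\left( \oplus_{i=1}^{t-1} A_i \right) \ge 1 - \prod_{i=1}^{t-1} (1 - \delta(A_i)),$$
that is,
$$1 - \delta\left( \oplus_{i=1}^{t-1} A_i \right) \le \prod_{i=1}^{t-1} (1 - \delta(A_i)).$$
Therefore,
$$\delta\left( \oplus_{i=1}^{t} A_i \right) \ge 1 - \left( 1 - \delta\left( \oplus_{i=1}^{t-1} A_i \right) \right)(1 - \delta(A_t)) \ge 1 - (1 - \delta(A_t)) \prod_{i=1}^{t-1} (1 - \delta(A_i)).$$
This proves the theorem for any $t \ge 2$. □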

The Schnirelmann density of a subset $B$ of $\mathbb{N}$ measures the “closeness” of $B$ to $\mathbb{N}$. For example, $\delta(B) = 1$ if and only if $B = \mathbb{N}$. We now show that if $\delta(B)$ is greater than $1/2$, then $2B = \mathbb{N}$.

Lemma 7.3. If $\delta(B) > 1/2$ for some $B \subseteq \mathbb{N}$, then $\delta(2B) = 1$. In other words, $2B = \mathbb{N}$.

Proof. Let $n \in \mathbb{N}$. We will show that $n \in 2B$. If $n \in B$, then we are done. If $n \notin B$, let
$$B_n = \{b \in B : b < n\} \quad \text{and} \quad B_n' = \{n - b : b \in B_n\}.$$
Clearly, $\#B_n = \#B_n'$. Also, $\#B_n = B(n)$, as $n \notin B$. Since $B_n \cup B_n' \subseteq \{1, 2, \dots, n\}$, it is clear that $\#(B_n \cup B_n') \le n$. Since $\delta(B) > 1/2$, we get
$$\#B_n = \#B_n' = B(n) > \frac{n}{2}.$$
Let us assume that $B_n$ and $B_n'$ are disjoint. This implies
$$n < \#(B_n \cup B_n'),$$
which contradicts the fact that $\#(B_n \cup B_n') \le n$. Hence, our assumption is false and $B_n$ and $B_n'$ are not disjoint. In other words, there exist $b_1, b_2 \in B$, with $b_1, b_2 < n$, such that $b_1 = n - b_2$, that is, $b_1 + b_2 = n$. This proves that $n \in 2B$. This proves Lemma 7.3. □

Theorem 7.4 connects the concept of Schnirelmann density with additive problems:

Theorem 7.4 (Schnirelmann, 1936). If $A$ is a subset of $\mathbb{N}$ such that $\delta(A) > 0$, then there exists $m \in \mathbb{N}$ such that $\delta(mA) = 1$, and therefore, $mA = \mathbb{N}$.

Proof. If $\delta(A) = 1$, we are done. If not, we have $0 < \delta(A) < 1$. Since $0 < 1 - \delta(A) < 1$, we may choose $t$ large enough so that
$$(1 - \delta(A))^t < \frac{1}{2}.$$
By Theorem 7.2, $\delta(tA) \ge 1 - (1 - \delta(A))^t > 1/2$. By Lemma 7.3, it follows that $2(tA) = \mathbb{N}$, and we may take $m = 2t$. □

In 1940, Linnik noticed how Schnirelmann’s theorem can be applied to solve Waring’s problem. Let $k \ge 2$ and
$$A_k := \{n^k : n \in \mathbb{N}\}.$$
Theorem 7.4 reduces Waring’s problem to showing the existence of a natural number $m$ such that $\delta(mA_k) > 0$. This is done in the next section. In other words, Waring’s problem is reduced to the “easier problem” of showing that a positive Schnirelmann density of natural numbers can be written as a sum of a bounded number of $k$th powers.

Theorem 7.1 of Schnirelmann has been improved by various authors culminating in the work of Henry Berthold Mann who showed in 1942 that if $0 \in A \cap B$, then
$$\delta(A + B) \ge \min(1, \delta(A) + \delta(B)).$$


Students can find a simple proof of Theorem 7.4 in the combinatorial classic of Heini Halberstam and Klaus Friedrich Roth [29].

7.1.1. Exercises.

Exercise 7.1.1.1. Let $m > 1$. Prove that the Schnirelmann density of the set of natural numbers $\equiv 1 \pmod m$ is equal to $1/m$.

Exercise 7.1.1.2. Show that Theorem 7.4 is false if we replace Schnirelmann density by natural density.

Exercise 7.1.1.3. Let $A \subseteq \mathbb{N}$ with Schnirelmann density $\delta$. Then, $A(n) \ge \delta n$. Show that this inequality is false if $\delta$ is replaced by natural density.

Exercise 7.1.1.4. Let $A \subseteq \mathbb{N}$ be a subset with Schnirelmann density $\omega$ with $0 < \omega < 1$. Show that every natural number can be written as a sum of at most
$$2 \left( 1 + \left[ -\frac{\log 2}{\log(1 - \omega)} \right] \right)$$
elements from $A$.

Exercise 7.1.1.5.
(1) Show that
$$\sum_{p} \frac{1}{p^2} < \frac{1}{2},$$
where the sum is over all prime numbers $p$.
(2) Show that the set of squarefree numbers has Schnirelmann density $> 1/2$. Deduce that every natural number can be written as a sum of at most two squarefree numbers.

7.2. Schnirelmann density and Waring’s problem

In 1940 Linnik [47] used Schnirelmann’s theorem (Theorem 7.4) to provide a solution of Waring’s problem through elementary number-theoretic techniques. We outline Linnik’s arguments in this section. Let $k \ge 2$ and
$$A_k = \{x^k : x \in \mathbb{N}\}.$$
We observe that
$$\frac{1}{n} \le \frac{A_k(n)}{n} \le \frac{n^{1/k}}{n} \quad \text{for every } n \ge 1.$$
Thus,
$$\delta(A_k) \le \sigma(A_k) \le \lim_{n \to \infty} \frac{n^{1/k}}{n} = 0.$$
Hence, $\delta(A_k) = 0$. In view of Theorem 7.4 of Schnirelmann, showing that $\delta(sA_k) > 0$ for some $s \ge 2$ would solve Waring’s problem. For integers $s \ge 1$ and $m \ge 0$, let $r_{s,k}(m)$ denote the number of nonnegative integral solutions of the equation
$$x_1^k + x_2^k + \cdots + x_s^k = m.$$
That is,
$$r_{s,k}(m) = \#\left\{ (x_1, x_2, \dots, x_s) : x_i \in \mathbb{N} \cup \{0\},\ x_1^k + x_2^k + \cdots + x_s^k = m \right\}.$$
Observe that if $x_1^k + x_2^k + \cdots + x_s^k = m$, then $0 \le x_i \le m^{1/k}$ for each $1 \le i \le s$. Thus, for $m \ge 1$,
$$r_{s,k}(m) \le \left( [m^{1/k}] + 1 \right)^s = \sum_{j=0}^{s} \binom{s}{j} \left( [m^{1/k}] \right)^j \ll_s m^{s/k}.$$

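Linnik’s fundamental observation, which we will prove in Section 7.3, was that we can find a sufficiently large natural number $s$ for which the above estimate for $r_{s,k}(m)$ can be sharpened. He proved Theorem 7.5.

Theorem 7.5 (Linnik, 1943). For a natural number $k \ge 2$ there exists $s \in \mathbb{N}$ and a constant $c(k)$ depending only on $k$ such that
$$r_{s,k}(m) \le c(k)\, m^{\frac{s}{k} - 1} \quad \text{for all } m \ge 1.$$

Before proving Linnik’s theorem, we prove Corollary 7.6.

Corollary 7.6. For any natural number $k \ge 2$, there exists $s \in \mathbb{N}$ such that $\delta(sA_k) > 0$.

Proof. By Theorem 7.5, there exists $s \in \mathbb{N}$ such that
$$\sum_{\substack{m=0 \\ r_{s,k}(m) \ne 0}}^{n} r_{s,k}(m) \le 1 + \sum_{\substack{m=1 \\ r_{s,k}(m) \ne 0}}^{n} c(k)\, m^{\frac{s}{k} - 1} \le c'(k)\, n^{\frac{s}{k} - 1} \sum_{\substack{m=0 \\ r_{s,k}(m) \ne 0}}^{n} 1, \tag{7.1}$$
where $c'(k) = \max\{1, c(k)\}$. We also observe that if
$$0 \le x_i \le \frac{n^{1/k}}{s^{1/k}} \quad \text{for each } i,$$
then,
$$\sum_{i=1}^{s} x_i^k \le n.$$
Thus,
$$\sum_{\substack{(x_1, x_2, \dots, x_s) \\ 0 \le x_i \le n^{1/k}/s^{1/k}}} 1 \le \sum_{\substack{m=0 \\ r_{s,k}(m) \ne 0}}^{n} r_{s,k}(m). \tag{7.2}$$
Observe that
$$\sum_{\substack{(x_1, x_2, \dots, x_s) \\ 0 \le x_i \le n^{1/k}/s^{1/k}}} 1 = \left( \left[ \frac{n^{1/k}}{s^{1/k}} \right] + 1 \right)^s \ge \left( \frac{n^{1/k}}{s^{1/k}} \right)^s, \tag{7.3}$$
while $(sA_k)(n)$ is the number of integers $1 \le m \le n$ with $r_{s,k}(m) \ne 0$. Combining equation (7.3) with the inequalities in equations (7.1) and (7.2), we get a positive constant $c'(k)$ such that
$$\left( \frac{n^{1/k}}{s^{1/k}} \right)^s \le c'(k)\, n^{\frac{s}{k} - 1} (sA_k)(n) \quad \text{for every } n \ge 1.$$
Thus,
$$\frac{(sA_k)(n)}{n} \ge \frac{1}{c'(k)\, s^{s/k}} \quad \text{for every } n \ge 1.$$
Hence, by Theorem 7.5, we can find $s \in \mathbb{N}$ and $d(k) > 0$ such that
$$\delta(sA_k) \ge \frac{1}{c'(k)\, s^{s/k}},$$
and therefore, $\delta(sA_k) > 0$. □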

Thus, by Theorem 7.4, Corollary 7.6 solves Waring’s problem. We now prove Linnik’s Theorem 7.5 in Section 7.3.
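The counting function $r_{s,k}(m)$ of this section can be tabulated directly for small parameters. The following sketch uses a function name of our own and a simple repeated convolution; it is meant only to make the definition and the trivial bound $r_{s,k}(m) \le ([m^{1/k}] + 1)^s$ concrete.

```python
from math import isqrt


def representation_counts(s, k, limit):
    """r[m] = number of (x_1, ..., x_s), x_i >= 0, with x_1^k + ... + x_s^k = m,
    for every 0 <= m <= limit."""
    powers = []
    x = 0
    while x ** k <= limit:
        powers.append(x ** k)
        x += 1
    r = [1] + [0] * limit              # zero summands: only the empty sum 0
    for _ in range(s):
        new = [0] * (limit + 1)
        for m, count in enumerate(r):
            if count:
                for p in powers:       # powers is increasing, so we may stop early
                    if m + p > limit:
                        break
                    new[m + p] += count
        r = new
    return r


if __name__ == "__main__":
    r2 = representation_counts(2, 2, 50)
    print(r2[25])                      # ordered pairs (x, y) with x^2 + y^2 = 25
    r4 = representation_counts(4, 2, 200)
    print(min(r4))                     # positive, in line with Lagrange's theorem
```

For $s = 4$ and $k = 2$, every entry is positive, which is Lagrange’s four square theorem restricted to the range computed.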

7.3. Proof of Linnik’s theorem

We start this section with an observation about exponential functions which helps us to express the term $r_{s,k}(m)$ with the help of suitable exponential sums. This helps us to prove Linnik’s theorem and will also enable us later to invoke the circle method. Let $n \in \mathbb{Z}$. We observe that
$$\int_0^1 e^{2\pi i n \alpha}\, d\alpha = \begin{cases} 1 & \text{if } n = 0, \\ 0 & \text{otherwise.} \end{cases} \tag{7.4}$$


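Let $P = m^{1/k}$ and set
$$f(\alpha) := \sum_{0 \le x \le P} e(x^k \alpha), \qquad e(x) := e^{2\pi i x}.$$
As an immediate application of (7.4), we deduce that for any $s \in \mathbb{N}$ and for a nonnegative integer $m$,
$$\begin{aligned}
\int_0^1 (f(\alpha))^s e(-m\alpha)\, d\alpha
&= \int_0^1 \sum_{\substack{(x_1, x_2, \dots, x_s) \\ 0 \le x_1, x_2, \dots, x_s \le P}} e\left( (x_1^k + x_2^k + \cdots + x_s^k) \alpha \right) e(-m\alpha)\, d\alpha \\
&= \sum_{\substack{(x_1, x_2, \dots, x_s) \\ 0 \le x_1, x_2, \dots, x_s \le P}} \int_0^1 e\left( (x_1^k + x_2^k + \cdots + x_s^k - m) \alpha \right) d\alpha \\
&= \sum_{\substack{(x_1, x_2, \dots, x_s) \\ 0 \le x_1, x_2, \dots, x_s \le P}} \begin{cases} 1 & \text{if } x_1^k + x_2^k + \cdots + x_s^k = m, \\ 0 & \text{otherwise} \end{cases} \\
&= r_{s,k}(m).
\end{aligned} \tag{7.5}$$
Thus,
$$r_{s,k}(m) = |r_{s,k}(m)| = \left| \int_0^1 (f(\alpha))^s e(-m\alpha)\, d\alpha \right| \le \int_0^1 \Big| \sum_{0 \le x \le P} e(x^k \alpha) \Big|^s d\alpha. \tag{7.6}$$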
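The identity (7.5) can be tested numerically. The sketch below is an illustration under an assumption of our own: instead of the integral over $[0, 1]$ it uses the discrete analogue of (7.4), namely $\frac{1}{N} \sum_{t=0}^{N-1} e(nt/N) = 1$ if $N \mid n$ and $0$ otherwise, with $N$ chosen larger than every possible value of $|x_1^k + \cdots + x_s^k - m|$. The function names are hypothetical.

```python
import cmath
from itertools import product


def count_by_orthogonality(s, k, m):
    """Compute r_{s,k}(m) from a discrete version of (7.5)."""
    P = 0
    while (P + 1) ** k <= m:           # P = floor(m^(1/k))
        P += 1
    N = s * P ** k + m + 1             # N exceeds |x_1^k + ... + x_s^k - m|
    total = 0
    for t in range(N):
        # f(t/N) = sum over 0 <= x <= P of e(x^k t / N), reduced mod N for accuracy
        f = sum(cmath.exp(2j * cmath.pi * ((x ** k * t) % N) / N) for x in range(P + 1))
        total += f ** s * cmath.exp(-2j * cmath.pi * ((m * t) % N) / N)
    return round((total / N).real)


def count_by_brute_force(s, k, m):
    """Direct enumeration of the solutions of x_1^k + ... + x_s^k = m."""
    P = 0
    while (P + 1) ** k <= m:
        P += 1
    return sum(1 for xs in product(range(P + 1), repeat=s)
               if sum(x ** k for x in xs) == m)


if __name__ == "__main__":
    print(count_by_orthogonality(2, 2, 25), count_by_brute_force(2, 2, 25))
```

The rounding step is safe only for small parameters, where floating-point error stays well below $1/2$.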

Linnik proved Theorem 7.7 with respect to the above exponential sum.

Theorem 7.7. For any natural number $k \ge 2$ and for $P \ge 1$,
$$\int_0^1 \Big| \sum_{0 \le x \le P} e(x^k \alpha) \Big|^{8^{k-1}} d\alpha \le c(k)\, P^{8^{k-1} - k},$$
where $c(k)$ is a real number depending on $k$.

Let $m \ge 1$. Choosing
$$P = m^{\frac{1}{k}} \quad \text{and} \quad s = 8^{k-1},$$
we get, by equation (7.6),
$$r_{s,k}(m) \le c(k)\, m^{\frac{s}{k} - 1},$$
which proves Theorem 7.5 and consequently, Corollary 7.6. For the rest of this section, therefore, we focus our attention on proving Theorem 7.7. We now prove a basic lemma on linear equations.

Lemma 7.8. For a nonnegative integer $n$, let $q(n)$ denote the number of integer solutions $(x_1, x_2, y_1, y_2)$ of the equation
$$x_1 y_1 + x_2 y_2 = n \tag{7.7}$$
such that $|x_i| \le X$ and $|y_i| \le Y$. Then
$$q(0) \ll (XY)^{3/2}$$
and
$$q(n) \ll XY \sum_{d \mid n} \frac{1}{d} \quad \text{for } n \ge 1.$$

Proof. We first consider the case when $n = 0$. Clearly, $x_1$, $x_2$ and $y_1$ can take at most $2X + 1$, $2X + 1$ and $2Y + 1$ values respectively. Once these are chosen, $y_2$ can take at most one value. Thus,
$$q(0) \le (2X + 1)^2 (2Y + 1) \ll X^2 Y.$$
Similarly,
$$q(0) \ll X Y^2.$$
Thus,
$$q(0) \ll \min\{X^2 Y, X Y^2\} \ll \sqrt{X^2 Y \cdot X Y^2} \ll (XY)^{3/2}.$$
We can do better when $n \ne 0$. We assume, without loss of generality, that $X \le Y$. Let $q_1(n)$ be the number of integer solutions to
$$x_1 y_1 + x_2 y_2 = n,$$
such that $|x_2| \le |x_1| \le X$ and $|y_i| \le Y$ for $i = 1, 2$. This ensures that $x_1 \ne 0$. Otherwise, we would get $x_2 = 0$ which implies that $n = 0$. Let us start by fixing $x_1$ and $x_2$ and assume that $(x_1, x_2) = 1$. Let $q(n; x_1, x_2)$ be the number of integral solutions of equation (7.7). Clearly, $q(n; x_1, x_2) > 0$. Given a particular solution $(y_1', y_2')$, all solutions of equation (7.7) are of the form
$$y_1 = y_1' + t x_2, \quad y_2 = y_2' - t x_1, \quad t \in \mathbb{Z}.$$
We observe that
$$|t| = \frac{|y_2' - y_2|}{|x_1|} \le \frac{2Y}{|x_1|}.$$
We conclude that
$$q_1(n) \le \sum_{\substack{1 \le |x_1| \le X \\ |x_2| \le |x_1|}} \left( 2 \cdot \frac{2Y}{|x_1|} + 1 \right) \le \sum_{\substack{1 \le |x_1| \le X \\ |x_2| \le |x_1|}} \left( \frac{4Y + |x_1|}{|x_1|} \right) \le 5Y \sum_{1 \le |x_1| \le X} \frac{2|x_1| + 1}{|x_1|} \ll XY.$$
Thus, equation (7.7) has $\ll XY$ integer solutions. If $(x_1, x_2) = d > 1$, equation (7.7) has an integer solution provided $d \mid n$. In this case, we take
$$x_1' = \frac{x_1}{d}, \quad x_2' = \frac{x_2}{d}.$$
From above, the number of integer solutions to the equation
$$x_1' y_1 + x_2' y_2 = \frac{n}{d}$$
is
$$\ll \frac{XY}{d}.$$
We conclude that
$$q_1(n) \ll XY \sum_{d \mid n} \frac{1}{d}.$$
It immediately follows that
$$q(n) \ll XY \sum_{d \mid n} \frac{1}{d}.$$
□

From Lemma 7.8, we deduce the following:

Lemma 7.9. Let $f(x)$ be a polynomial of degree 2 with integer coefficients, say, $f(x) = a_2 x^2 + a_1 x + a_0$, with $a_2 = O(1)$, $a_1 = O(P)$ and $a_0 = O(P^2)$. The number of solutions in the variables $x_i$ and $y_i$ such that $0 \le x_i, y_i \le P$ for $1 \le i \le 4$ to the equation
$$f(x_1) + f(x_2) + f(x_3) + f(x_4) = f(y_1) + f(y_2) + f(y_3) + f(y_4) \tag{7.8}$$
is $\ll P^6$.

Proof. We observe that
$$f(x_i) - f(y_i) = (x_i - y_i)\left[ a_2 (x_i + y_i) + a_1 \right].$$


We put $z_i = x_i - y_i$ and $w_i = a_2 (x_i + y_i) + a_1$. The number of solutions of equation (7.8) is less than or equal to the number of solutions of the equation
$$z_1 w_1 + z_2 w_2 = -z_3 w_3 - z_4 w_4,$$
where $z_i \ll P$ and $w_i \ll P$. By Lemma 7.8, we see that for a fixed $n \ge 0$, the number $q(n)$ of solutions of $z_1 w_1 + z_2 w_2 = n$ is
$$\ll P^3 \text{ if } n = 0, \quad \text{and} \quad \ll P^2 \sum_{d \mid n} \frac{1}{d} \text{ if } n \ge 1.$$
Thus, the number of solutions of the equation
$$z_1 w_1 + z_2 w_2 = -z_3 w_3 - z_4 w_4,$$
where $z_i \ll P$ and $w_i \ll P$ is
$$\sum_{|n| \ll P^2} q(n)^2 \ll P^6 + \sum_{1 \le n \le cP^2} \Big( P^2 \sum_{d \mid n} \frac{1}{d} \Big)^2 \ll P^6 + P^4 \sum_{1 \le n \le P^2} \sum_{\substack{d_1 \mid n \\ d_2 \mid n}} \frac{1}{d_1 d_2}.$$
We now observe that
$$\begin{aligned}
\sum_{1 \le n \le P^2} \sum_{\substack{d_1 \mid n \\ d_2 \mid n}} \frac{1}{d_1 d_2}
&= \sum_{\substack{1 \le d_1 \le P^2 \\ 1 \le d_2 \le P^2}} \frac{1}{d_1 d_2} \sum_{\substack{1 \le n \le P^2 \\ d_1 \mid n,\ d_2 \mid n}} 1 \\
&= \sum_{\substack{1 \le d_1 \le P^2 \\ 1 \le d_2 \le P^2}} \frac{1}{d_1 d_2} \sum_{\substack{1 \le n \le P^2 \\ [d_1, d_2] \mid n}} 1 \\
&= \sum_{\substack{1 \le d_1 \le P^2 \\ 1 \le d_2 \le P^2}} \frac{1}{d_1 d_2} \left( \frac{P^2}{[d_1, d_2]} + O(1) \right),
\end{aligned}$$
where $[d_1, d_2]$ denotes the least common multiple of $d_1$ and $d_2$. Using the elementary identity
$$[d_1, d_2](d_1, d_2) = d_1 d_2,$$
we have,
$$\begin{aligned}
\sum_{|n| \ll P^2} q(n)^2
&\ll P^6 + \sum_{1 \le n \le cP^2} \Big( P^2 \sum_{d \mid n} \frac{1}{d} \Big)^2 \\
&\ll P^6 + P^4 \sum_{\substack{1 \le d_1 \le P^2 \\ 1 \le d_2 \le P^2}} \frac{1}{d_1 d_2} \cdot \frac{P^2}{[d_1, d_2]} \\
&\ll P^6 + P^6 \sum_{\substack{1 \le d_1 \le P^2 \\ 1 \le d_2 \le P^2}} \frac{(d_1, d_2)}{(d_1 d_2)^2} \\
&\ll P^6 + P^6 \sum_{d_1 = 1}^{\infty} \sum_{d_2 = 1}^{\infty} \frac{1}{(d_1 d_2)^{3/2}} \\
&\ll P^6.
\end{aligned}$$
In the above, we have used the trivial estimate
$$(d_1, d_2) \le \sqrt{d_1 d_2}.$$
This proves Lemma 7.9. □

We now prove the following general version of Linnik’s theorem, due to Hua [40]. Though elementary, the proof will require some intellectual stamina on the part of the reader.

Theorem 7.10. Let $k \ge 2$ and $f(x)$ be a polynomial of degree $k$ with integer coefficients, say,
$$f(x) = a_k x^k + a_{k-1} x^{k-1} + \cdots + a_1 x + a_0$$
such that
$$a_k = O(1),\ a_{k-1} = O(P),\ \dots,\ a_1 = O(P^{k-1}),\ a_0 = O(P^k).$$
Then, there exists a positive real number $c = c(k)$ such that for all $P \ge 1$,
$$\int_0^1 \Big| \sum_{x=0}^{P} e^{2\pi i f(x) \alpha} \Big|^{8^{k-1}} d\alpha \le c(k)\, P^{8^{k-1} - k}. \tag{7.9}$$

Proof. Let us start with $k = 2$. Observe that
$$\begin{aligned}
\int_0^1 \Big| \sum_{x=0}^{P} e^{2\pi i f(x) \alpha} \Big|^{8} d\alpha
&= \int_0^1 \Big( \sum_{x=0}^{P} e^{2\pi i f(x) \alpha} \Big)^4 \Big( \sum_{x=0}^{P} e^{-2\pi i f(x) \alpha} \Big)^4 d\alpha \\
&= \int_0^1 \Bigg( \sum_{\substack{0 \le x_1, x_2, x_3, x_4 \le P \\ 0 \le y_1, y_2, y_3, y_4 \le P}} e^{2\pi i (f(x_1) + f(x_2) + f(x_3) + f(x_4) - f(y_1) - f(y_2) - f(y_3) - f(y_4)) \alpha} \Bigg) d\alpha.
\end{aligned}$$
By equation (7.4), the integral in question is equal to the number of integer solutions $(x_1, \dots, x_4, y_1, \dots, y_4)$ to equation (7.8) such that $0 \le x_i, y_i \le P$. Thus, this integral is $\ll P^6$ by Lemma 7.9. This proves Theorem 7.10 for $k = 2$.

We now proceed by mathematical induction and assume that equation (7.9) holds when we replace $k$ by $k - 1$. Observe that
$$\begin{aligned}
\Big| \sum_{x=0}^{P} e^{2\pi i f(x) \alpha} \Big|^2
&= \sum_{x_1 = 0}^{P} \sum_{x_2 = 0}^{P} e^{2\pi i (f(x_1) - f(x_2)) \alpha} \\
&= \sum_{x=0}^{P} e^{-2\pi i f(x) \alpha} \sum_{h = -x}^{P - x} e^{2\pi i f(x + h) \alpha} \\
&= P + 1 + {\sum_{h \ne 0}}' \, {\sum_{x}}' \, e^{2\pi i (f(x + h) - f(x)) \alpha},
\end{aligned}$$
where the dash on top of the summations refers to all those integers $h$ lying between $-P$ and $P$ and those integers $x$ such that both $x + h$ and $x$ lie between $0$ and $P$. Now,
$$f(x + h) - f(x) = \sum_{j=0}^{k} a_j \left( (x + h)^j - x^j \right) = \sum_{j=0}^{k} a_j \sum_{i=0}^{j-1} \binom{j}{i} x^i h^{j-i}.$$
Thus, $f(x + h) - f(x) = h \phi(x, h)$, where
$$\phi(x, h) = \sum_{i=0}^{k-1} b_i(h) x^i$$
and $b_i(h) = \sum_{j=i+1}^{k} a_j \binom{j}{i} h^{j-i-1}$. Thus, $\phi(x, h)$ is a polynomial whose degree in $x$ is at most $k - 1$. Moreover, since $a_j = O(P^{k-j})$ and $|h| \le P$, the coefficient $b_i(h)$ of $x^i$ for each $0 \le i \le k - 1$ satisfies the bound
$$|b_i(h)| \ll \sum_{j=i+1}^{k} \binom{j}{i} P^{k-j} |h|^{j-i-1} \le \sum_{j=i+1}^{k} \binom{j}{i} P^{k-j} P^{j-i-1} \ll_k P^{k-1-i}. \tag{7.10}$$

Let us define
$$a_h = {\sum_{x}}' \, e^{2\pi i h \phi(x, h) \alpha}.$$
Then, we have
$$\Big| \sum_{x=0}^{P} e^{2\pi i f(x) \alpha} \Big|^2 = P + 1 + {\sum_{h \ne 0}}' \, a_h.$$
Raising both sides to the power $8^{k-2}$, we have
$$\Big| \sum_{x=0}^{P} e^{2\pi i f(x) \alpha} \Big|^{2 \cdot 8^{k-2}} = \Big( P + 1 + {\sum_{h \ne 0}}' \, a_h \Big)^{8^{k-2}} \le 2^{8^{k-2}} \max\Big( P^{8^{k-2}},\ \Big| {\sum_{h \ne 0}}' \, a_h \Big|^{8^{k-2}} \Big). \tag{7.11}$$
We now consider two cases.

Case 1. If
$$\Big| {\sum_{h \ne 0}}' \, a_h \Big| \le P,$$
then
$$\Big| \sum_{x=0}^{P} e^{2\pi i f(x) \alpha} \Big|^{2 \cdot 8^{k-2}} \ll P^{8^{k-2}}.$$
Hence, raising the above equation to the fourth power,
$$\int_0^1 \Big| \sum_{x=0}^{P} e^{2\pi i f(x) \alpha} \Big|^{8^{k-1}} d\alpha \ll P^{4 \cdot 8^{k-2}} \ll P^{8^{k-1} - k},$$
since $4 \cdot 8^{k-2} \le 8^{k-1} - k$ for all $k \ge 2$. This proves the theorem, provided
$$\Big| {\sum_{h \ne 0}}' \, a_h \Big| \le P.$$

Case 2. Suppose now that
$$\Big| {\sum_{h \ne 0}}' \, a_h \Big| \ge P.$$


Then, by equation (7.11), 8𝑘−2

8𝑘−2

| | ′ ≪ || ∑ 𝑎ℎ || |0 0. One may also ask finer questions about 𝑟𝒜,𝑔 (𝑛) or 𝑅𝒜,𝑔 (𝑛), for example, their asymptotic growth. For this, we need to study the relevant exponential sums with more care. In this chapter, we introduce finer techniques to study sums of the form ∑ 𝑒 (𝑔(𝑚)) , 1≤𝑚≤𝑃

where $g(t)$ denotes a suitable arithmetic function. That is, we study the average order of exponential arithmetic functions of the form $e(g(x))$. Such sums are called exponential sums and have numerous applications. The study of certain exponential sums forms a natural bridge between number theory and algebraic geometry. As such, they open into another universe of scintillating ideas. Exponential sums also have an interesting connection with the theory of uniform distribution. In 1916, Hermann Weyl [79] systematically investigated how the behavior of the above exponential sum influences the distribution properties of the sequence $\{g(m)\}_{m \ge 1}$ modulo one. We refer the reader to [54, Chapter 11] and [46, Chapter 1] for a detailed exposition of this theme. In this textbook, we focus on the connection of various types of exponential sums to Waring’s problem and the Goldbach conjectures. This connection is exploited in upcoming chapters. As such, the goal of this chapter is to answer the following questions:


• Let $g(x) = \alpha x$ where $\alpha \in \mathbb{R}$. What can we say about
$$\sum_{1 \le m \le P} e(g(m))$$
if $\alpha$ is an irrational number? This question is addressed in Section 8.1.
• Let $g(x) = \alpha x^k + a_{k-1} x^{k-1} + \cdots + a_1 x + a_0$ where $k \ge 2$. The evaluation of $\sum_{1 \le m \le P} e(g(m))$ is addressed in Section 8.2.
• In each of the above questions, how are the concerned exponential sums influenced by the Diophantine approximation properties of $\alpha$? This will be a recurring theme in all the sections of this chapter and in future chapters.
• What can we say about $\sum_{p \le x} e(p\alpha)$ for $\alpha$ lying in suitable subintervals of $[0, 1]$? Here, the sum runs over primes $p \le x$. As described in Section 5.4, while investigating questions about prime numbers, it is often more feasible to attach weights from the von Mangoldt function to such sums. That is, we study weighted exponential sums over primes or prime powers of the form
$$\sum_{n \le x} T(n) e(n\alpha) \quad \text{and} \quad \sum_{n \le x} \Lambda(n) e(n\alpha).$$

8.1. Exponential sums for polynomials of degree 1

We start with the sum $\sum_{m=1}^{N} e(m\alpha)$ where $\alpha$ is a rational number of the form $a/q$ with $(a, q) = 1$. Clearly, if $m \equiv r \pmod q$, then $e(ma/q) = e(ra/q)$. Thus,
$$\sum_{m=1}^{N} e\left( \frac{ma}{q} \right) = \sum_{r=1}^{q} e\left( \frac{ra}{q} \right) \sum_{\substack{m=1 \\ m \equiv r \ (\mathrm{mod}\ q)}}^{N} 1.$$
From the above, one easily derives the estimate
$$\sum_{m=1}^{N} e\left( \frac{ma}{q} \right) = O(q)$$
using the fact that the sum of the $q$th roots of unity is zero (see Exercise 8.1.1.1). Note here that the implied bound is independent of $N$.
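The boundedness of the rational exponential sum in terms of $q$ alone is easy to observe numerically. The following is a small sketch with a function name of our own; the exponent $ma$ is reduced modulo $q$ before evaluation, which is legitimate because $e(ma/q)$ depends only on $ma \bmod q$.

```python
import cmath


def rational_sum(a, q, N):
    """Sum of e(m a / q) for m = 1, ..., N, where e(x) = exp(2 pi i x)."""
    return sum(cmath.exp(2j * cmath.pi * ((m * a) % q) / q) for m in range(1, N + 1))


if __name__ == "__main__":
    for N in (10, 100, 1000, 10000):
        print(N, abs(rational_sum(3, 7, N)))
```

Each complete block of $q$ consecutive values of $m$ contributes zero when $q > 1$, so only an incomplete block of fewer than $q$ terms survives, whatever the size of $N$.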


8. Exponential sums

The problem becomes more interesting if we consider irrational values of $\alpha$. By a slightly more careful analysis, we derive the dependence of the sum $\sum_m e(m\alpha)$ on the distance between $\alpha$ and the integer closest to it. In what follows, it is convenient to introduce, for $\alpha \in \mathbb{R}$, the notation
$$\|\alpha\| := \min\{|n - \alpha| : n \in \mathbb{Z}\}.$$
Clearly, $0 \le \|\alpha\| \le 1/2$. We prove an elementary lemma from calculus.

Lemma 8.1. If $0 < \alpha < \frac{1}{2}$, then $2\alpha < \sin \pi\alpha < \pi\alpha$.

Proof. The function $f(\alpha) = \sin \pi\alpha - 2\alpha$ is zero for $\alpha = 0, 1/2$. If $f(\alpha) = 0$ for some $0 < \alpha < 1/2$, then $f$ would have more than one point of extremum in the interval $(0, 1/2)$. This is not possible, since $f'(\alpha) = \pi \cos \pi\alpha - 2$ is strictly decreasing from $\pi - 2$ to $-2$ in this interval. Therefore, $f'(\alpha) = 0$ for exactly one value of $\alpha$ in this interval. Thus, $f(\alpha) \ne 0$ for any $0 < \alpha < 1/2$. The continuity and nonvanishing of $f$ in $(0, 1/2)$ implies that either $f(x) < 0$ in the entire interval or $f(x) > 0$ in the entire interval. We know that the latter is true since $f(1/4) > 0$. Thus, $2\alpha < \sin \pi\alpha$ for $0 < \alpha < 1/2$. For the upper bound, consider $g(\alpha) = \pi\alpha - \sin \pi\alpha$. Note that $g(0) = 0$ and
$$g'(\alpha) = \pi - \pi \cos \pi\alpha > 0 \quad \text{for } 0 < \alpha < \frac{1}{2}.$$
So, $g$ is increasing and $g(\alpha) > 0$, as required. □

Using Lemma 8.1, we derive a classical inequality for the exponential sum $\sum_m e(m\alpha)$ over a finite range. This inequality, due to Weyl, shows the dependence of this sum on $\|\alpha\|$.

Lemma 8.2 (Weyl). For $\alpha \in \mathbb{R}$, and integers $N_1$ and $N_2$ with $N_1 < N_2$, we have
$$\Big| \sum_{m = N_1 + 1}^{N_2} e(\alpha m) \Big| \ll \min\left( N_2 - N_1, \frac{1}{\|\alpha\|} \right).$$

Proof. The first bound
$$\Big| \sum_{m = N_1 + 1}^{N_2} e(\alpha m) \Big| \ll N_2 - N_1$$

is clear. In fact, if $\alpha \in \mathbb{Z}$, that is, $\|\alpha\| = 0$, then
$$\sum_{m = N_1 + 1}^{N_2} e(\alpha m) = N_2 - N_1.$$
If $\|\alpha\| = 1/2$, that is, $\alpha = k + 1/2$ for some integer $k$, then
$$\sum_{m = N_1 + 1}^{N_2} e(\alpha m) = \sum_{m = N_1 + 1}^{N_2} (-1)^m.$$
The latter sum is clearly bounded above by 1 in absolute value. Now, suppose $0 < \|\alpha\| < 1/2$. The sum is a geometric progression:
$$\Big| \sum_{m = N_1 + 1}^{N_2} e(\alpha m) \Big| = |e(\alpha(N_1 + 1))| \left| \frac{e(\alpha(N_2 - N_1)) - 1}{e(\alpha) - 1} \right| \le \frac{2}{|e(\alpha) - 1|} = \frac{1}{|\sin \pi\alpha|}.$$
It is easily verified that
$$|\sin \pi\alpha| = |\sin \pi\|\alpha\||.$$
Thus, the result follows from Lemma 8.1. □

We now review important approximation properties of $\alpha \in \mathbb{R}$. We do so in an attempt to understand how $\sum_m e(m\alpha)$ depends on rational approximations to $\alpha$. We start with an application of the pigeonhole principle.

Lemma 8.3 (Dirichlet). Let $\alpha$ and $Q$ be real numbers with $Q \ge 1$. There exist integers $a, q$ with $(a, q) = 1$ and $1 \le q \le Q$ such that
$$\left| \alpha - \frac{a}{q} \right| < \frac{1}{qQ}.$$

Proof. Let $N = [Q]$. It suffices to prove the lemma without the condition $(a, q) = 1$. Consider the numbers
$$\beta_q := \alpha q - [\alpha q], \quad q = 1, 2, \dots, N,$$
which all lie in $[0, 1)$. Partition the interval $[0, 1)$ into $N + 1$ subintervals
$$B_r = \left[ \frac{r - 1}{N + 1}, \frac{r}{N + 1} \right), \quad r = 1, 2, \dots, N + 1.$$

If one of the numbers $\beta_q$ lies in $B_1$, then we are done because
$$|\alpha q - [\alpha q]| \le \frac{1}{N + 1} < \frac{1}{Q},$$
and the result follows upon dividing by $q$.

The same result also follows if some $\beta_q$ lies in $B_{N+1}$ because in this case,
$$|1 - \beta_q| = |1 - (\alpha q - [\alpha q])| \le \frac{1}{N + 1} < \frac{1}{Q}$$
and again the result follows upon dividing by $q$. So, we may suppose that the numbers $\beta_q$ all lie in the $N - 1$ intervals $B_r$, $r = 2, \dots, N$. By the pigeonhole principle, there exists a pair $\beta_u, \beta_v$ with $1 \le u < v \le N$ lying in some interval $B_r$, $2 \le r \le N$. Taking $q = v - u$ and $a = [\alpha v] - [\alpha u]$, we have Lemma 8.3. □

In particular, Lemma 8.3 implies that given real numbers $\alpha$ and $Q$ with $Q \ge 1$, one can find integers $a$ and $q$ with $1 \le q \le Q$ and $(a, q) = 1$ such that
$$\left| \alpha - \frac{a}{q} \right| \le \frac{1}{q^2}.$$
This has some interesting consequences.
• As the integer $m$ varies in an interval of length $q/2$, the values $\|m\alpha\|$ are distinct and spaced apart by a distance of at least $\frac{1}{2q}$.
• Let $\mathcal{S} := \{\|m\alpha\| : m \text{ varies over a complete residue system mod } q\}$. Any interval of the form
$$\left[ \frac{s}{q}, \frac{s + 1}{q} \right), \quad 0 \le s \le q - 1,$$
can have at most eight elements of $\mathcal{S}$.

The above observations are relevant in the study of exponential sums due to Lemma 8.2. In turn, all the above observations have several interesting applications in estimating sums of the form
$$\sum_{n \le X} a(n) e(n\alpha),$$
where $a(n)$ is a sequence. Such estimates are obtained by breaking up the above sum into smaller intervals of length $q/2$ and utilising the above mentioned principles. In later chapters, for example, we will explore how they are applied to study Waring’s problem and Goldbach’s problem. For now, we adapt these principles into precise lemmas, keeping future applications in mind.

Lemma 8.4. Let $\alpha \in \mathbb{R}$ and $a$ and $q$ be integers such that $q \ge 1$ and $(a, q) = 1$. If
$$\left| \alpha - \frac{a}{q} \right| < \frac{1}{q^2},$$
then
$$\sum_{r=1}^{q/2} \frac{1}{\|\alpha r\|} \ll q \log q.$$

Proof. The proof has been expanded in Exercise 8.1.1.3.
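Weyl’s bound (Lemma 8.2) and Dirichlet’s approximation lemma (Lemma 8.3) can both be illustrated numerically. The sketch below is hypothetical in its naming and in two choices of our own: the explicit constant in `weyl_bound` is the one produced by the proof of Lemma 8.2 together with Lemma 8.1, namely $1/|\sin \pi\alpha| \le 1/(2\|\alpha\|)$, and `dirichlet_approximation` finds a fraction by direct search over $1 \le q \le Q$ rather than by the pigeonhole argument. Floating-point arithmetic is used throughout.

```python
import cmath
from math import floor, gcd, sqrt


def dist_to_nearest_integer(alpha):
    """The quantity ||alpha|| = min |n - alpha| over integers n."""
    return abs(alpha - round(alpha))


def weyl_sum(alpha, N1, N2):
    """Sum of e(alpha m) for N1 < m <= N2."""
    return sum(cmath.exp(2j * cmath.pi * alpha * m) for m in range(N1 + 1, N2 + 1))


def weyl_bound(alpha, N1, N2):
    """min(N2 - N1, 1 / (2 ||alpha||)), the explicit form of Lemma 8.2."""
    d = dist_to_nearest_integer(alpha)
    return N2 - N1 if d == 0 else min(N2 - N1, 1 / (2 * d))


def dirichlet_approximation(alpha, Q):
    """Search for coprime (a, q), 1 <= q <= Q, with |alpha - a/q| < 1/(q Q)."""
    for q in range(1, floor(Q) + 1):
        a = round(alpha * q)
        if abs(alpha * q - a) < 1 / Q:
            g = gcd(a, q)
            # Reducing the fraction keeps the value a/q and only shrinks q.
            return a // g, q // g
    raise ValueError("no approximation found (unexpected by Lemma 8.3)")


if __name__ == "__main__":
    alpha = sqrt(2)
    print(abs(weyl_sum(alpha, 0, 1000)), weyl_bound(alpha, 0, 1000))
    print(dirichlet_approximation(alpha, 100))
```

For $\alpha = \sqrt{2}$ the exponential sum over a thousand terms stays small, in contrast with the trivial bound $N_2 - N_1$.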



Lemma 8.5. Let $\alpha \in \mathbb{R}$ and $a$ and $q$ be integers such that $q \ge 1$ and $(a, q) = 1$. If
$$\left| \alpha - \frac{a}{q} \right| < \frac{1}{q^2},$$
then for any nonnegative real number $v$ and integer $h \ge 0$, we have
$$\sum_{r=1}^{q} \min\left( v, \frac{1}{\|\alpha(hq + r)\|} \right) \ll v + q \log q.$$

Proof. The proof has been expanded in Exercise 8.1.1.4.



Lemma 8.6. Let $\alpha \in \mathbb{R}$ and $a$ and $q$ be integers such that $q \ge 1$ and $(a, q) = 1$. If
$$\left| \alpha - \frac{a}{q} \right| < \frac{1}{q^2},$$
then
$$\sum_{0 \le m \le X} \min\left( Y, \frac{1}{\|m\alpha\|} \right) \ll \left( q + X + Y + \frac{XY}{q} \right) \max\{1, \log q\}.$$

Proof. We partition the sum according to the arithmetic progression that $m$ belongs to mod $q$:
$$\sum_{0 \le m \le X} \min\left( Y, \frac{1}{\|m\alpha\|} \right) \le \sum_{0 \le h \le X/q} \sum_{r=1}^{q} \min\left( Y, \frac{1}{\|(hq + r)\alpha\|} \right).$$
By Lemma 8.5, the inner sum is
$$\ll Y + q \log q.$$
Thus, the sum in question is
$$\ll \left( \frac{X}{q} + 1 \right)(Y + q \log q) = \frac{XY}{q} + Y + X \log q + q \log q \ll \left( q + X + Y + \frac{XY}{q} \right) \max\{1, \log q\},$$
as desired. □

Before concluding this section, we record the following estimate.

Lemma 8.7. Let $\alpha$ be an irrational number satisfying
$$\left| \alpha - \frac{a}{q} \right| \le \frac{1}{q^2},$$
for some rational $a/q$ where $1 \le q \le N$ and $(a, q) = 1$. Then, for any real number $X \ge 1$, we have
$$\sum_{1 \le m \le X} \min\left( \frac{N}{m}, \frac{1}{\|m\alpha\|} \right) \ll \left( \frac{N}{q} + X + q \right) \log(2qX).$$

Proof. To prove Lemma 8.7, we write each 𝑚 as 𝑚 = ℎ𝑞 + 𝑟, where 1 ≤ 𝑟 ≤ 𝑞 and 0 ≤ ℎ < 𝑋/𝑞. Thus, ∑ min( 1≤𝑚≤𝑋





𝑁 1 , ) 𝑚 ‖𝑚𝛼‖

∑ min(

0≤ℎ